A Survey on Sentiment and Emotion Analysis for Computational Literary Studies

Evgeny Kim and Roman Klinger
Institut für Maschinelle Sprachverarbeitung (IMS)
University of Stuttgart
70569 Stuttgart, Germany

Emotions have often been a crucial part of compelling narratives: literature tells about people with goals, desires, passions, and intentions. In the past, classical literary studies usually scrutinized the affective dimension of literature within the framework of hermeneutics. However, with the emergence of the research field known as Digital Humanities (DH), some studies of emotions in a literary context have taken a computational turn. Given that DH is still being formed as a science, this direction of research can be regarded as relatively new. At the same time, research in sentiment analysis started in computational linguistics almost two decades ago and is nowadays an established field with dedicated workshops and tracks at the main computational linguistics conferences. This leads us to the question: what are the commonalities and discrepancies between sentiment analysis research in computational linguistics and in digital humanities? In this survey, we offer an overview of the existing body of research on sentiment and emotion analysis as applied to literature. We precede the main part of the survey with a short introduction to natural language processing and machine learning and to psychological models of emotions, and we provide an overview of existing approaches to sentiment and emotion analysis in computational linguistics. The papers presented in this survey come either directly from DH or from computational linguistics venues and are limited to sentiment and emotion analysis as applied to literary text.

{firstname.lastname}@ims.uni-stuttgart.de

1 Introduction and Motivation

1.1 On the Importance of Emotions

Human mental experiences consist of various phenomena that are not directly grounded in the objective perception of the world. A large portion of our daily decisions and interactions with others are driven by subconscious processes, including emotions and affect. Emotions play a crucial role when it comes to the arts (Johnson-Laird and Oatley, 2016). Unintentionally or not, when creating a piece of art, an artist introduces an emotional component into her work, which in turn makes us experience different emotions (Anderson, 2004; Ingermanson and Economy, 2009). When perceiving the arts, for example while reading a novel, people can feel emotions because they are drawn into stories that depict characters who act and feel, have desires and fears, and succeed or fail (Djikic et al., 2009). Readers of fiction have richer emotional experiences and a greater capacity for empathy and for understanding others’ lives than people who do not read literature (Mar et al., 2009; Kidd and Castano, 2013).

This observation has two major implications for the connection between the literature and human emotions. First, literature requires that we use our emotions in order to understand it (Robinson, 2005), or better, we have to use our knowledge about human emotions to understand the feelings and moods of the fictional characters. Second, emotional experiences we draw from the literature are of the same sort we have in real life, which makes literature a valid source of the depiction of human emotions (Hogan, 2010, 2015).

All of the above means that emotions are tightly interwoven with the content of artistic work and thus need to be studied in this context not only by humanities scholars but also by psychologists, because research in this direction can benefit the understanding of both the arts and emotions.

The link between emotions and the arts in general is a matter of debate that dates back to antiquity, in particular to Plato, who viewed passions and desires as the lowest kind of knowledge and treated poets as undesirable members of his ideal society (Plato, 1969). In contrast, Aristotle’s view on the emotive component of poetry, expressed in his Poetics (Aristotle, 1996), differed from Plato’s in that emotions do have great importance, particularly in the moral life of a person (de Sousa, 2017).

For a long period of time, no single word or term existed in the English language to describe “the emotions” as a category of feeling (Downes and McNamara, 2016). However, in the late 19th century the emotion theory of arts stepped into the spotlight of philosophers. One of the first accounts of the topic is given by Leo Tolstoy in 1898 in his essay What is Art? (Tolstoy, 1962). Tolstoy argues that art can express emotions experienced in a fictitious context and that the degree to which the audience is convinced by them defines the success of the artistic work (cf. Anderson and McMaster (1986), Hogan (2010), and Piper and Jean So (2015)). But why do imaginary contexts make people experience emotions? This paradox, which later received the name “paradox of fiction”, was first pinpointed by the English philosopher Colin Radford (Radford and Weston, 1975). The paradox is formulated as follows:

  1. We experience emotions towards fictitious characters, objects, or events.

  2. In order to experience emotions, we must believe that these characters, objects, or events are real.

  3. We do not believe that these characters or situations are real.

This paradox and its possible solutions are discussed (e.g., Walton (1978), Lamarque (1981), and Neill (1991)) and disputed (e.g., Tullmann and Buckwalter (2014)) by others, but we leave it to the reader to explore this philosophical problem. What we would like to highlight though in relation to this paradox is that Radford’s statements contributed to the popularity of the research on emotions and arts in many fields, from literary studies to psychology.

But what exactly can we learn from this interplay of emotion and literature? Emotional intelligence is a prerequisite to understanding literary fiction but reading literature in turn enhances our emotional intelligence (Bal and Veltkamp, 2013; Djikic et al., 2013; Johnson, 2012; Samur et al., 2018; Djikic et al., 2009). Moreover, there is a growing body of literature that recognizes the deliberate choices people make with regard to their emotional states when seeking narrative enjoyment, for example a book or a film (Zillmann et al., 1980; Ross, 1999; Bryant and Zillmann, 1984; Oliver, 2008; Mar et al., 2011). The influence of mood on these choices has been studied by Zillmann (1988). His mood-management theory proposes that readers and viewers when seeking entertainment make choices that will promote or maintain positive moods or reduce the negative ones. Usual objections to the mood-management theory point to the fact that people still enjoy tragedies or horror stories (Oliver, 1993; Oliver et al., 2000; Oliver, 2008), though these genres provoke negative emotions in them, such as sadness, fear, anxiety, and anger. A possible solution was proposed by Vorderer et al. (2004): Enjoyment is explained by the notion of “meta-emotions”, i.e., emotions we experience towards our emotions directed at some object, which are deemed appropriate in a particular situation. Recent research in cognitive psychology suggests possible explanations why such experiences are perceived as positive in the first place (Tamborini et al., 2010).

New methods of quantitative research emerged in humanities scholarship, bringing forth the so-called “digital revolution” (Lanham, 1989) and the transformation of the field into what we know as digital humanities (Berry, 2012; Schreibman et al., 2015). The adoption of computational methods of text analysis and data mining from the then fast-growing areas of computational linguistics and artificial intelligence provided humanities scholars with new tools of text analytics and data-driven approaches to theory formulation (Vanhoutte, 2013; Jockers and Underwood, 2016).

Although one of the first works on the computational treatment of subjective phenomena originated in the area of artificial intelligence (AI) (Carbonell, 1979) (cited by Pang et al. (2008)), it was only a few years later that the first work on the computer-assisted modeling of emotions in literature was published (Anderson and McMaster, 1982). Challenged by the question of why some texts are more interesting than others, Anderson and McMaster concluded in their paper that the “emotional tone” of a story can be responsible for the reader’s interest. The results of their study suggest that a large-scale analysis of the “emotional tone” of a collection of texts is possible with the help of a computer program. There are three implications of this finding. First, they suggested that by identifying the emotional tones of text passages one can model affective patterns of a given text or a collection of texts, which in turn can be used to challenge or test existing literary theories. Second, their approach to affect modeling demonstrates that the stylistic properties of texts can be defined on the basis of their emotional interest and not only their linguistic characteristics. And finally, they suggest that functional texts (speeches, memos, advertisements) can be run through an emotion analysis program to test whether they will have the intended impact. With regard to these implications, the work by Anderson and McMaster (1982) is an important early piece, as it laid out the “roadmap” for some of the basic applications of sentiment and emotion analysis of texts, namely sentiment and emotion pattern recognition from text and computational text characterization based on sentiment and emotion.

1.2 Scope and Structure of the Survey

The goal of this survey is to provide a comprehensive overview of the methods of emotion and sentiment analysis as applied to text. The survey is prepared with a digital humanities scholar in mind who is looking for an introduction to the existing research in the field of sentiment and emotion analysis of (primarily literary) text. All the studies presented in this article either come directly from digital humanities venues or deal with sentiments and emotions in the context of literary text. A substantial number of the works in the latter category originate from the computational linguistics community. Their primary goal is often methodological rather than interpretative. However, these works are still included in the survey, as we believe, and argue in the discussion, that interpretation and methodology should go hand in hand.

The survey does not cover applications of emotion and sentiment analysis in the areas of digital humanities that are not focused on text, e.g., sentiment analysis of visual art and design, movies, or music. It also does not provide an in-depth overview of all possible applications of emotion analysis in the computational context outside of the DH line of research. However, to make the reader aware of these applications, we briefly mention examples of them in Section 1.3.

The survey is structured as follows: Section 1 is an introduction. Section 2 introduces the reader to the field of natural language processing (NLP) and to the standard pipeline used in many NLP projects. Section 3 introduces the most common emotion theories used for the development of methods of computational emotion analysis and provides important background on emotion analysis of literary texts from a classic and a computational perspective. Section 4 is the core of the survey and is an overview of different applications of sentiment and emotion analysis to literary text. Section 5 concludes the paper.

1.3 Other Applications of Emotion and Sentiment Analysis

The survey does not cover every possible work on emotion analysis that exists, even in the digital humanities context. The understanding and automatic analysis of emotions, sentiments, and affects have played an important role in computer science and artificial intelligence in the last decades. They are applied in a variety of studies, a selection of which we discuss in the following.

Robotics and Artificial Intelligence (AI)

While there is a large overlap between robotics and AI, the former is mostly an engineering field that deals with the design and use of robots, while the latter is more concerned with their actual operation, including but not limited to decision making, problem solving, and reasoning (Brady, 1985). This also includes emotional intelligence, as more and more robots developed today serve not only pragmatic goals (e.g., cleaning, warehouse operation) but social ones as well (Breazeal, 2003). The motivation for affective computing in robotics and AI, therefore, is to build robots and virtual agents that are more human-like in terms of communication and reasoning.

Robots and virtual agents that are able to recognize and express emotions have been one of the foci in the fields of robotics and artificial intelligence for decades, both at the conceptual (Sloman and Croucher, 1981; Dorner and Hille, 1995; Wright, 1997; Coeckelbergh, 2012) and implementational (case-study) levels (Velásquez, 1998; Leite et al., 2008; Beck et al., 2010; Klein and Cook, 2012). Some works focus on theoretical implications of emotional robots (Sloman and Croucher, 1981; Frijda and Swagerman, 1987; Evans, 2004; Arbib and Fellous, 2004), engaging in a fundamental discussion of such a possibility. A closely related body of research touches upon moral and ethical implications that arise when we talk about autonomous self-aware robots, which may make decisions that go against human moral judgements (Kahn Jr et al., 2012; Arkin et al., 2012; Malle et al., 2015).

Another thriving line of research related to AI and emotions deals with computational modeling of emotions in robotic and virtual agent applications. For example, Gratch and Marsella (2004) propose a new methodology of emotion modeling based on comparing the behavior of the computational model against human behavior and on the use of standard clinical instruments for assessing emotions. Pereira et al. (2005) outline the belief–desire–intention architecture of emotions based on four modules, namely the Emotional State Manager, the Sensing and Perception Module, Capabilities module, and the Resources module, where each module is responsible for separate processes within the emotion concept. Jiang et al. (2007) put forth an extended belief-desire-intention model introducing primary and secondary emotions into the architecture.

Human-computer interaction (HCI) can be considered a subfield of artificial intelligence. It has also shown an increased interest in emotions. For instance, Cowie et al. (2001) examine basic issues related to the extraction of emotions from the user, consolidating psychological and linguistic analyses of emotions. Pantic and Rothkrantz (2003) argue that next-generation HCI designs will need to include the ability to recognize users’ affective states in order to become more effective and more human-like. Both Beale and Creed (2008) and Beale and Creed (2009) provide an overview of the role of emotions in HCI, highlighting important lessons drawn from different research and providing guidelines for future research.

Computer Games and Virtual Reality

As video games become more complex and engaging, research in the field of game AI gains more popularity. The foci of the research are different (cf. Yannakakis (2012)) but the ones relevant to our discussion are mining of player data and enhancing non-player character behavior. The main motivation for researchers from this field is to study what makes players enjoy or detest a game and consequently make the gaming experience even more enjoyable.

On the one hand, recognition and elicitation of users’ emotions through mining of player data (e.g., recognition of facial expressions and keystroke patterns, chat message analysis) has several applications in the field of game development. For example, by timely and accurately recognizing the player’s emotional state, the system can adaptively respond to it by changing the game environment (changing the pace or the color scheme, or even suggesting that the player take a break). It can also play a role in educational games by customizing the learning process (Zhou, 2003; Conati, 2002; Conati et al., 2003). On the other hand, games that are able to cause an emotional response in players, such as fear or happiness, are more immersive (Sweetser and Johnson, 2004) and are thought to facilitate flow (Johnson and Wiles, 2003), which is a state of profound enjoyment and total immersion in an activity (Csikszentmihalyi and Csikszentmihalyi, 1992). Therefore, for the gaming process to be captivating and realistic, it is important that the player interacts with realistic non-player characters that express emotions in an intelligent way and react to the player’s emotions appropriately (Chaplin and Rhalibi, 2004; Hudlicka and Broekens, 2009; Bosser et al., 2007; Ochs et al., 2008, 2009; Li and Campbell, 2015; Popescu et al., 2014, i.a.).

Emotion Detection from Voice, Face, Body, and Physiology

In contrast to robotics and gaming, the recognition of emotions from bodily reactions focuses on humans: the goal is to identify patterns in acoustic speech signals, facial expressions, body postures, and physiology, and to classify them into different emotions, often with machine learning techniques. Calvo and D’Mello (2010) provide an in-depth survey to which we refer the reader for a comprehensive overview. We will, however, add that since the publication of Calvo and D’Mello’s survey, the methodology has changed in terms of the equipment used. Earlier researchers had to rely on laboratory equipment. Nowadays, more and more studies are done with the help of non-invasive wearable devices (wrist bands and smartphone applications) that monitor the subjects’ emotional state (cf. Dupré et al. (2018), Ghandeharioun et al. (2016)). This turn seems warranted, as it provides researchers and developers with more natural and closer monitoring of the subjects and, hence, a larger amount of research data.

Sentiment Analysis and Opinion Mining

Applications of sentiment analysis and opinion mining outside of a humanities context are not covered by this survey. However, in Section 3.3 we will give an overview of the existing methods used in sentiment analysis, as some of them are relevant in the context of the reviewed papers. In this section, we give a short overview of other application areas of sentiment analysis and opinion mining.

Opinion mining deals with tracking and automatic classification of opinions expressed by people (Liu, 2012). Opinion in a narrow sense is understood as evaluation or attitude towards some object (Liu, 2015). Although opinion mining and sentiment analysis are often used interchangeably in the literature, opinions are not sentiments (Munezero et al., 2014): While sentiment is prompted by emotions, opinions are judgements based on objective or subjective interpretations of the topic that are not necessarily related to emotions.

Because sentiment analysis and opinion mining are concerned with human attitudes and feelings towards anything, they find applications in many areas, for instance in business and sociology. A popular application of opinion mining is automatic review analysis, often performed on a large collection of reviews that can originate from any domain, for example movie reviews (Amolik et al., 2016; Parkhe and Biswas, 2016; Tang et al., 2018), product reviews (books, electronics, DVDs, etc.) (Fang and Zhan, 2015; Xia et al., 2015), and restaurant and tourism product reviews (Kiritchenko et al., 2014; Gan et al., 2017; Marrese-Taylor et al., 2014). The goal of opinion mining in this context is to classify reviews as positive or negative with various levels of classification granularity.

Opinion mining is not limited to reviews. Computational social sciences have also witnessed an increased interest in automatic sentiment analysis, for example, in the political domain (Maragoudakis et al., 2011; Ceron et al., 2014; Rill et al., 2014; Liu and Lei, 2018). A goal of these studies is not only to understand the electoral preferences of the population, but also to gain insight into how these preferences are formed and propagated via social media (Yaqub et al., 2017).

A significant amount of research is concerned with the automatic analysis of social media posts (most commonly, Twitter), for example Khan et al. (2015), Rosenthal et al. (2017), Asghar et al. (2018). The goal here is to enable the automatic classification of users’ posts with respect to sentiments and opinions. This can be useful for automatically monitoring social media for emergency reports, violent language, and the mood of certain user groups.

2 A Very Short Introduction to Natural Language Processing and Machine Learning

In the introduction, we discussed the importance of emotion analysis for literary studies. In Section 3.3, we will provide an overview of research in sentiment and emotion analysis as applied to text. As that section relies on the concepts from both natural language processing (NLP) and machine learning, we provide a short introduction to both disciplines in the following.

A comprehensive overview of NLP is beyond the scope of this survey paper. Therefore, we present NLP tasks as steps of a single pipeline (see Figure 1), which is common for many NLP projects. Readers who are familiar with NLP may skip this section without any hesitation. Readers who feel that they need an in-depth textbook-style introduction to the field are referred to Jurafsky and James (2000), which we follow in this section to describe some important concepts of NLP.

According to the Encyclopedia of Cognitive Science (Allen, 2006), NLP is a field that explores computational methods for interpreting and processing natural language, in either textual or spoken form. NLP addresses a variety of tasks related to language use and text analysis, from machine translation to code switching to named entity recognition to semantic role labeling. Regardless of the task, any NLP project includes several preliminary steps of speech or text processing that are necessary for these and other downstream tasks. We now proceed to the description of these fundamental steps in an NLP pipeline.

Figure 1: Typical NLP pipeline.

2.1 Typical NLP pipeline

Modern NLP pipelines may include a variety of processes and heavy feature engineering combining multiple features. Figure 1 shows the basic NLP pipeline that is most commonly used across various projects.

An NLP pipeline usually starts with speech recognition (if the input is speech) and then continues as if the input is directly text. The next step is tokenization and segmentation followed by morphological analysis, syntactic analysis, and semantic analysis (Uszkoreit, 2001).

2.1.1 Speech Recognition

For a speech signal to be analyzed syntactically and semantically, it is typically first converted to text, so that the same methods can be applied as for text input. First, the analogue speech signal is sampled in time, filtered, and decomposed into the frequency domain. The frequency components are then analyzed for features (the most common of which are mel frequency cepstral coefficients (Imai, 1983)) and converted into acoustic feature vectors. Then, a language model and a vocabulary are used to calculate the phonetic likelihood of each speech sample. The decoded speech signal is then available as hypotheses of textual representations.

2.1.2 Tokenization and Segmentation

Text, whether converted from a speech signal or provided directly as input, is passed to the tokenization and segmentation part of the pipeline, which outputs an array of tokens, i.e., the input text in which units, often words, are separated from each other. This processing is required before a morphological analysis can be applied.

In many languages, words are separated by whitespace and splitting the text on it, in most cases, will produce a meaningful output. However, some words contain whitespaces (e.g., San Francisco, rock and roll) and, depending on the application, some tokenizers may also tokenize multi-word expressions. Some tokenizers can also expand clitic contractions that are marked by apostrophes, for example don’t is converted to do not and I’m to I am.

In some cases, the text should be segmented into sentences first and only then into words. Essentially, the task of sentence segmentation is to separate sentences from each other. The most common cues for segmenting a text into sentences are punctuation marks. Though some symbols, like the question mark or the exclamation point, are unambiguous markers of a sentence boundary, periods are more ambiguous, as they also mark abbreviations (e.g., Mr., Mrs., Inc.). Therefore, it is often more appropriate to address word tokenization and sentence segmentation jointly.
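The interplay of sentence segmentation and tokenization described above can be illustrated with a minimal Python sketch. It is not a production tokenizer: the abbreviation list, the contraction table, and the multi-word expression table are tiny hypothetical resources invented for this example, and the function names are our own.

```python
import re

# Hypothetical, deliberately tiny resources used only for illustration.
ABBREVIATIONS = {"mr.", "mrs.", "inc."}
CONTRACTIONS = {"don't": ["do", "not"], "i'm": ["I", "am"]}
MULTIWORD = {("san", "francisco"): "San Francisco"}


def split_sentences(text):
    """Close a sentence at ., ! or ? unless the chunk is a known abbreviation."""
    sentences, current = [], []
    for chunk in text.split():
        current.append(chunk)
        if chunk[-1] in ".!?" and chunk.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing material without final punctuation
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence):
    """Split off punctuation, expand contractions, merge multi-word expressions."""
    tokens = []
    for chunk in sentence.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)  # keep "Mr." in one piece
            continue
        for piece in re.findall(r"\w+(?:['’]\w+)*|[^\w\s]", chunk):
            key = piece.lower().replace("’", "'")
            tokens.extend(CONTRACTIONS.get(key, [piece]))
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MULTIWORD:
            merged.append(MULTIWORD[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


TEXT = "Mr. Smith lives in San Francisco. I don't know him!"
for sentence in split_sentences(TEXT):
    print(tokenize(sentence))
```

Because the period after "Mr" is looked up in the abbreviation list before a sentence boundary is assumed, segmentation and tokenization share the same knowledge, which is one simple way of treating the two tasks jointly.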

2.1.3 Morphological Analysis

Once the text is available in a segmented form, each word in the text can be analyzed for its morphological properties, e.g., inflection and case markers. For each token, a morphological parser outputs its lemma (a dictionary form) and a part-of-speech category with the morphosyntactic information. In addition, words can be stemmed, that is, reduced to their root by removing affixes. Morphological analysis is an important prerequisite for syntactic analysis.

Lemmatization is a process of casting a word to its base form. It is often required to reduce the variability of surface realizations of the words sharing the same root. While lemmatization involves a complex morphological analysis of a word (the algorithm should learn, for example, that the words sang, sung, sings share the same lemma form sing), stemming takes a simpler approach. In some applications, it is only important to map the word to its root, without full parsing of a word. Stemming does exactly that by chopping off the affixes of the words. For example, in web search, one may want to map foxes to fox, but might not need to know that foxes are plural (Jurafsky and James, 2000, p. 46). One popular algorithm for stemming is the Porter stemmer (Porter, 1980).
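The contrast between the two procedures can be made concrete with a small Python sketch. The stemmer below is a crude suffix stripper written for this illustration and is not the Porter algorithm; the lemma table is a hypothetical stand-in for a real morphological lexicon.

```python
# Hypothetical lookup table standing in for a real morphological lexicon.
LEMMA_TABLE = {"sang": "sing", "sung": "sing", "sings": "sing", "foxes": "fox"}

# Suffixes are tried in this order; a toy rule set, NOT the Porter stemmer.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word, min_stem_length=3):
    """Chop off the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for form in ("foxes", "sings", "sang", "sung"):
    print(form, toy_stem(form), toy_lemmatize(form))
```

The stemmer handles regular forms such as foxes and sings, but it leaves the irregular forms sang and sung untouched, whereas the lexicon-based lemmatizer maps all of them to sing. This is the trade-off described above: stemming is cheap but shallow, lemmatization needs morphological knowledge.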

Morphological parsing is important not only for lemmatization and stemming, but for part-of-speech (POS) tagging as well. In fact, POS tagging is often based on the analysis of word affixes (e.g., adjectives in English are recognized by the suffixes -able, -ful, and -ish, among others, while verbs by -ate and -en). The number of POS tags used varies, from seventeen, as in the Universal POS tagset (Petrov et al., 2012), to forty-five in Penn Treebank (Marcus et al., 1993), to sixty-one used by the Lancaster UCREL project’s CLAWS (the Constituent Likelihood Automatic Wordtagging System) (Rayson and Garside, 1998).
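To illustrate how suffixes can serve as cues for POS tagging, the following Python sketch guesses a coarse tag from the suffixes mentioned above. The tag names, the default tag, and the function name are assumptions made for this example; real taggers combine such cues with context and are usually learned from annotated data.

```python
# Suffix cues taken from the examples in the text; tag names are assumed.
SUFFIX_RULES = [
    ("able", "ADJ"),
    ("ful", "ADJ"),
    ("ish", "ADJ"),
    ("ate", "VERB"),
    ("en", "VERB"),
]


def guess_pos(word, default="NOUN"):
    """Return the tag of the first matching suffix, else a default tag."""
    lowered = word.lower()
    for suffix, tag in SUFFIX_RULES:
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            return tag
    return default


for word in ("readable", "childish", "activate", "strengthen", "table", "garden"):
    print(word, guess_pos(word))
```

The last two words show the limits of the heuristic: table and garden are nouns, yet they match the -able and -en rules. Suffix information is therefore best treated as one feature among several rather than as a tagger on its own.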

2.1.4 Syntactic Analysis

Based on the morphological information obtained in the previous step, the words in the sentence are analyzed for their grammatical function (e.g., whether a word is a subject, object, modifier). This process is called parsing and it is important for analyzing the relationship between words, including disambiguating their meaning. The output of this step of the pipeline is a text represented by its syntactic or dependency tree.

There are two main types of parsing: constituency parsing based on Chomsky’s generative grammar (Chomsky, 1993), and dependency parsing based on dependency rules (Kübler et al., 2009). The main difference between the two types of parsing is that constituency parsing operates on the phrase level, where each type of phrase (e.g., noun phrase, verb phrase) is allowed to be composed of phrases of certain types, while dependency parsing operates on the word level and takes into account dependency relations between words.
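The difference between the two representations can be seen in a small Python sketch that encodes one sentence in both ways. Both analyses are written by hand for illustration and are not the output of a parser; the phrase labels follow Penn Treebank conventions and the relation labels follow Universal Dependencies conventions, which is a choice made for this example.

```python
# Constituency: nested (label, child, ...) tuples; leaves are plain strings.
CONSTITUENCY = (
    "S",
    ("NP", ("DT", "The"), ("NN", "cat")),
    ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "mouse"))),
)

# Dependency: one (head, relation, dependent) triple per word.
# "ROOT" is an artificial node that governs the main verb.
DEPENDENCIES = [
    ("ROOT", "root", "chased"),
    ("chased", "nsubj", "cat"),
    ("cat", "det", "The"),
    ("chased", "obj", "mouse"),
    ("mouse", "det", "a"),
]


def leaves(tree):
    """Return the words covered by a constituency (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def phrases(tree, label):
    """Collect the text of every constituent that carries the given label."""
    if isinstance(tree, str):
        return []
    found = [" ".join(leaves(tree))] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(phrases(child, label))
    return found


def dependents(head):
    """Return (relation, dependent) pairs directly governed by a head word."""
    return [(rel, dep) for h, rel, dep in DEPENDENCIES if h == head]


print(phrases(CONSTITUENCY, "NP"))
print(dependents("chased"))
```

The constituency view answers questions about phrases (which word groups form noun phrases), while the dependency view answers questions about word-to-word relations (what is the subject and object of the verb).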

2.1.5 Semantic Analysis

Finally, the sentences, phrases, or words of the text are analyzed for their meaning based on the information obtained in the preceding parts of the pipeline.

Semantic analysis is needed to disambiguate polysemous words, which is especially difficult given the wide range of meanings a single word can take. The most straightforward approach to word sense disambiguation is through the use of lexical resources, such as WordNet (Fellbaum, 1998). WordNet provides a set of lemmas for nouns, verbs, adjectives, and adverbs, where each lemma is annotated with a set of senses.
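One simple way of using such a sense inventory is a simplified version of the Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the context. The Python sketch below uses this technique as an illustration; the sense identifiers, glosses, and stopword list are hypothetical and are not taken from WordNet.

```python
# Hypothetical mini sense inventory; a real system would read glosses
# from a lexical resource such as WordNet.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or lake",
    }
}

STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "on", "in", "at"}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def simplified_lesk(word, context):
    """Choose the sense whose gloss overlaps most with the context words."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:  # ties keep the sense listed first
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "she sat on the bank of the river"))
print(simplified_lesk("bank", "he deposits money at the bank"))
```

The approach depends entirely on literal word overlap between gloss and context, so it fails when the context uses different vocabulary than the gloss.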

Many of the disambiguation algorithms, however, rely on contextual similarity when choosing the proper sense. There are different approaches to computing the word context. One of the most popular of them is the distributional semantics approach. Distributional semantics deals with semantic properties of words derived from their distribution across texts. The intuition behind its use is that words that occur in the same contexts tend to have similar meanings. Generally, these words are represented as vectors or arrays of numbers that are, in some way, related to word counts. These relationships are captured in a term-context matrix that represents how well each word fits with other words (context) in the corpus. Such a matrix is of dimensionality |V| × |V|, where V is the vocabulary and each cell contains the number of times the row word (target) and the column word (context) co-occur in some context in some corpus. The matrix can then be used to calculate the similarity of the words, with the cosine measure being used most commonly.
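A term-context matrix and the cosine measure can be sketched in a few lines of Python. The toy corpus, the window size of two tokens, and the function names are assumptions made for this illustration; the sparse rows of the matrix are stored as dictionaries of counts.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count, for each target word, the words within +/- `window` positions."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for this example.
CORPUS = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
    "stocks fell on the market".split(),
]

COUNTS = cooccurrence_counts(CORPUS)
print(cosine(COUNTS["cat"], COUNTS["dog"]))
print(cosine(COUNTS["cat"], COUNTS["market"]))
```

In this corpus, cat and dog share more context words than cat and market, so the first similarity is higher. On realistic data, raw counts are dominated by frequent function words such as "the", which is why weighting schemes or dense representations are typically used instead.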

In contrast to count-based sparse vector representations, most approaches nowadays rely on dense representations, obtained either by dimensionality reduction or by predicting the target word or its context. Examples of this group of vector representations include word embeddings (Mikolov et al., 2013; Pennington et al., 2014).

2.2 Machine Learning

Although some of the previously described tasks, such as POS tagging or syntactic parsing, can be performed using rule-based approaches, most modern NLP pipelines make use of machine learning methods. The advantage of machine learning over rule-based systems becomes especially clear in the context of large data that needs to be processed. Writing down rules that capture all the minuscule differences and variety of language in some corpus is a tedious and, by and large, impossible task. That is where machine learning techniques come in handy. Machine learning is a subfield of artificial intelligence widely applied to many other disciplines, including natural language processing and data science. In the remainder of this section, we introduce three main paradigms of machine learning and briefly describe how they are used for solving NLP tasks.

2.2.1 Learning Paradigms

Machine learning is about using the right features to build the right models that achieve the right task (Flach, 2012, p. 13). A machine learning model learns to associate the characteristics of each instance with a class to be predicted. These characteristics are commonly referred to as features. Following this definition, one must acknowledge that there is no single machine learning framework (cf. the “no silver bullet” argument by Brooks (1987)) that applies to all possible scenarios. Generally, machine learning settings bifurcate into supervised and unsupervised learning paradigms, with each of the paradigms encompassing a wide range of models.

Supervised machine learning refers to the methods of labeling unseen data by learning a function from labeled training instances. A classifier is a function f : X → Y, where x ∈ X is an instance of data and Y is a finite set of class labels. Labels can be numerical, ordinal, nominal, or structured, and often denote the class membership of each data instance. For example, in the task of POS tagging, labels are the actual POS tags assigned to the words in the training set. During the training phase, a classification algorithm learns a mapping from instances to labels, and later, during the prediction phase, it assigns class labels to new, unseen instances. Examples of supervised machine learning methods are the naïve Bayes classifier, support vector machines, decision trees, and supervised deep learning algorithms.
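The training and prediction phases can be sketched with a small multinomial naïve Bayes classifier written in plain Python. The four training sentences and their labels are invented for this illustration, add-one smoothing is one common choice among several, and the class and method names are our own.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists with add-one smoothing."""

    def fit(self, documents, labels):
        # Training phase: count documents per class and words per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        # Prediction phase: pick the class with the highest log probability.
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled data, invented for this example.
TRAIN = [
    ("a joyful and happy ending".split(), "positive"),
    ("what a wonderful happy day".split(), "positive"),
    ("a sad and gloomy ending".split(), "negative"),
    ("what a terrible sad day".split(), "negative"),
]

MODEL = NaiveBayes().fit([doc for doc, _ in TRAIN], [label for _, label in TRAIN])
print(MODEL.predict("a happy day".split()))
print(MODEL.predict("a sad ending".split()))
```

The two unseen inputs share most of their words with both classes; the decision is driven by the words whose counts differ between the classes, here happy and sad.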

However, labeled training data is not always available or is prohibitively expensive. Moreover, sometimes researchers do not know the actual labels of the data they have. In this case, another family of the machine learning algorithms, referred to as unsupervised machine learning, comes to the rescue. Clustering is one of the most popular unsupervised models that works by assessing the similarity between instances and arranging them in such a way that similar instances are put in the same cluster while dissimilar instances are put in different clusters. The output of an unsupervised machine learning algorithm can be used to better understand the nature and variance of the data, or as a prerequisite step to develop a supervised learning task with a set of defined labels.
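As a rough sketch of the clustering idea, the following k-means variant groups two-dimensional points by squared Euclidean distance. K-means is only one of many clustering algorithms; the naive initialization with the first k points, the fixed number of iterations, and the toy data are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters of mutually similar instances."""
    centroids = list(points[:k])  # naive initialization (an assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            # Similarity is measured as squared Euclidean distance to each centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Unlabeled toy instances forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, k=2)
print(clusters)
```

No labels are used at any point; the grouping follows from the distances between instances alone.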

2.2.2 Feature-based Learning

A common first approach when developing a machine-learning-based model is to map each instance into a representation of its characteristics, its features. Features are functions mapping instances to a set of values, for instance real numbers, Boolean values (e.g., “is this word an adjective/noun/verb?”, “is this word a proper noun?”, “is the previous word ‘the’?”, etc.), and integers (when the feature is a count of something). In text classification, a common approach is the so-called bag-of-words, in which each instance (for example, a document or a sentence) is represented by the counts of the words it contains.
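A bag-of-words representation can be sketched in a few lines. The tokenization by a simple regular expression and the helper names are assumptions for illustration; real pipelines typically use a dedicated tokenizer.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercased word occurs in one instance."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def vectorize(texts):
    """Map each text to a count vector over a shared, sorted vocabulary."""
    bags = [bag_of_words(t) for t in texts]
    vocabulary = sorted(set().union(*bags))
    return vocabulary, [[bag[word] for word in vocabulary] for bag in bags]

vocabulary, vectors = vectorize(["The whale, the sea.", "The sea was calm."])
print(vocabulary)
print(vectors)
```

Word order is discarded: each instance is reduced to integer count features, one per vocabulary entry.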

Features do not come ready-made with the data and the process of model building and feature engineering is often iterative: features are added, removed, normalized, and fine-tuned until the model achieves the results one expects from it. Traditionally, feature engineering has received great attention in machine learning. However, recent successes in the family of machine learning methods known as deep learning have made manually engineered features a less essential ingredient than the model architecture.

2.2.3 Deep Learning

During the past decade, neural networks have regained their once-lost popularity, which vanished in the late 1990s due to the computational cost associated with them and the rise of other successful methods, for instance support vector machines. One of the factors to which the re-emergence of neural networks can be attributed is the availability of moderately expensive hardware and software capable of processing big data. What was not possible back in the 1990s has become possible now: neural networks can be trained on large amounts of data, with comparably large sets of parameters and “deep” architectures.

The general idea behind deep learning is to build models in which, specifically in NLP, words are represented in a continuous space (following the ideas of distributional semantics). Neural networks usually have several layers, which are trained jointly to fulfill the specific task at hand. Each of these layers can be interpreted as being responsible for different subtasks on the route to the common goal. The layers of the network extract and transform features sequentially. The layers that are close to the data input extract simple features, while higher layers learn more complex features derived from the lower layer features (Zhang et al., 2018, p. 2). Exactly due to such a multi-layered structure of deep neural networks, manually designed features are of lower importance, as every layer extracts them from the input on its own.

Common network substructures for NLP include the following components: embedding layers, convolutional networks or a long short-term memory network, and a dense layer, which we discuss exemplarily in the following. Word embeddings transform words in a vocabulary to vectors of continuous real numbers that represent words as a function of their context and encode linguistic patterns. Word2Vec (Mikolov et al., 2013) is one popular word embedding approach that includes models for predicting a target word from its context and, vice versa, predicting the context words given the target word. Dense layers combine the information received from the preceding components and often perform the final classification. Convolutional neural networks (CNN) (LeCun et al., 1989, 1998) are a special kind of neural network originally used in computer vision, inspired by the human visual cortex. Similar to the visual system, CNNs are able to detect relevant features in the input, which is processed in an “n-gram” fashion. This is achieved by using filters that detect relevant features in the input, and max pooling, an operation that extracts the most representative numeric values from the filtered features. The ability of CNNs to capture the spatial correlation of features has proved useful in the NLP context, as features important for text classification may be located in different places of the input.
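The interplay of filters and max pooling can be made concrete with a toy example. The two-dimensional embeddings, the filter weights, and the ReLU nonlinearity below are all hypothetical values chosen for illustration; in a trained network they would be learned from data.

```python
def conv1d(embeddings, kernel):
    """Slide a filter spanning len(kernel) consecutive word vectors over the sentence."""
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            weight * value
            for kernel_row, vector in zip(kernel, window)
            for weight, value in zip(kernel_row, vector)
        )
        feature_map.append(max(0.0, activation))  # ReLU nonlinearity (an assumption)
    return feature_map

def max_pool(feature_map):
    """Keep only the strongest response of the filter, wherever it occurred."""
    return max(feature_map)

# Hypothetical 2-dimensional embeddings for the words "not", "a", "good", "film".
sentence = [[-1.0, 0.0], [0.0, 0.1], [1.0, 0.5], [0.2, 0.3]]
# Hypothetical bigram filter that sums the first embedding dimension of two adjacent words.
bigram_filter = [[1.0, 0.0], [1.0, 0.0]]

feature_map = conv1d(sentence, bigram_filter)
pooled = max_pool(feature_map)
print(feature_map, pooled)
```

Because pooling keeps the maximum over all positions, the filter's strongest match contributes to the final representation regardless of where in the sentence it appears.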

Long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) have a recurrent structure, interpret the input as a time series, and are capable of learning distant dependencies. In contrast to CNNs, which are limited by their filter sizes, LSTMs have a memory of more distant information. This comes at the cost of computational complexity. The trade-off between efficiency and complexity is realized in the mechanism of a “forget gate”. The gate discards irrelevant information (features) from the previously read input. This makes LSTMs efficient in learning from sequential data: as irrelevant features are discarded, the prediction improves because it is not biased by unimportant details.
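The role of the forget gate can be written down in a commonly used formulation of the LSTM cell. The notation is an assumption and not taken from the works cited here: $x_t$ is the input at time step $t$, $h_{t-1}$ the previous hidden state, $c_t$ the memory cell, $\sigma$ the logistic function, $\odot$ element-wise multiplication, and $W$, $b$ are learned parameters.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1};\,x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1};\,x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1};\,x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\end{aligned}
```

Each component of $f_t$ lies between 0 and 1, so a value close to 0 erases the corresponding entry of the previous memory $c_{t-1}$, while a value close to 1 carries it forward unchanged.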

These and other components make deep learning an efficient tool for solving many problems. However, deep learning has its limitations. First, it often requires large amounts of data to recognize helpful characteristics in the data, and such amounts are not always available, especially in certain domains. Second, deep learning algorithms are not always easily interpretable, which often makes it difficult to understand what meta-parameters of the network should be optimized for a better result.

2.3 Applications

Machine learning finds extensive application in natural language processing. The advancements in machine learning have contributed to the development of the field in recent years, both in terms of methodology and the efficiency of performing certain tasks. Most of the steps of a typical NLP pipeline we describe in Section 2.1 are performed today with the help of machine learning. The presented pipeline is rather fundamental and is often used as a part of other larger pipelines designed for specific applications. These applications include dialogue systems, discourse analysis, document classification, text generation, text mining, machine translation, question answering, text summarization, and, finally, sentiment and emotion analysis.

With this necessary introduction to natural language processing and machine learning, we now may proceed to an overview of what sentiment and emotion analysis is and how it is performed computationally. But before that, we first need to provide a background in the emotion theories that exist in psychology and introduce the role they play in the computational emotion analysis.

3 Background on Sentiment Analysis and Emotion Analysis

3.1 Affect and Emotion in Psychology

Emotion research has a long and rich tradition that followed Darwin’s 1872 publication of The Expression of the Emotions in Man and Animals (Darwin, 1872). The subject of emotion theories is so vast and diverse that it is not possible to even briefly mention all of the theories or name all the prominent psychologists who contributed to emotion research throughout the nineteenth and twentieth centuries (see Gendron and Feldman Barrett (2009) for a brief history of ideas about emotion in psychology). Most emotion theories that appeared in the last century, however, fall into one of three traditions, namely basic, appraisal, and constructionist. In the pages that follow, we briefly discuss models of emotions as they are introduced in psychology. We limit these descriptions to those theories which have been used to formalize computational methods for automatic analysis in the digital humanities and natural language processing. Namely, we will introduce two theories from the basic tradition, and one theory from both the appraisal and constructionist ones.

3.1.1 Ekman’s Theory of Basic Emotions

The basic emotion theory was first articulated by Silvan Tomkins in the early 1960s (Tomkins, 1962). Inspired by Darwin’s view of emotions as mental states that cause stereotypic bodily expressions (Gendron and Feldman Barrett, 2009), Tomkins postulated that certain emotions are automatically triggered by objects or events in the world. Importantly, each episode (or “instance”) of a certain emotion, Tomkins argues, is biologically similar to other instances of the same emotion or shares a common trigger. Tomkins’ own work in turn inspired one of his mentees, Paul Ekman, to formulate a new theory of emotions. Ekman called into question the existing emotion theories, which postulated that facial displays of emotion are socially learned and therefore vary from culture to culture. Together with Sorenson and Friesen, Ekman set out on a field trip to New Guinea, Borneo, the United States, Brazil, and Japan to challenge this view (Ekman et al., 1969). The outcome of their large-scale study led to a conclusion that would revolutionize the field of psychology for many years: facial displays of fundamental emotions are not learned but innate, and are therefore universal across nationalities. However, there are culture-specific prescriptions about how and in which situations emotions are displayed.

To come to this basic emotion definition, Ekman et al. select 30 photographs of adult males and females, children, professional actors, and mental patients. The photographs are selected in such a way that the portrayed faces express one of the six basic affects from Tomkins’ list of affects, excluding interest and shame, namely anger, fear, disgust, surprise, sadness, and happiness. The selection of affects is based on previous research (Ekman et al., 1971, in press at the time of publication of the Ekman et al. (1969) study) that finds that facial expressions pertaining to these emotions are clearly identifiable and can be scored by observers. These selected pictures are then shown to the participants of the study along with the list of six basic affects. The observers’ task is to categorize each picture into one of the six categories.

Ekman’s research boosted interest in emotion and brought forth new challenges and questions about the nature of emotions. In his subsequent studies, Ekman showed that both nature and nurture must be considered in the study of emotions (Ekman, 1971, 1992) and that facial expressions of emotions, even when produced voluntarily, generate the physiology and some subjective feelings pertaining to the true emotional experience (Ekman et al., 1983). The latter findings gave rise to a new line of research in the biology of emotions studying emotion-specific changes in physiology.

Based on the observation of facial behavior in early development or social interaction, Ekman’s theory also postulates that emotions should be considered discrete categories (Ekman, 1993), rather than dimensional. Though this view allows for conceiving of emotions as having different intensities (for example, anger can range in intensity from resentment to rage), it does not allow emotions to blend and leaves no room for more complex affective states in which individuals report the co-occurrence of like-valenced discrete emotions (Barrett, 1998). This and other postulates of the theory were widely criticized and disputed in the literature (cf. Russell (1994), Russell et al. (2003), Gendron et al. (2014), Barrett (2017)).

Regardless of the criticism that Ekman’s theory of basic emotions has undergone in recent years, the theory itself, as well as its methodology, was revolutionary at the time of its appearance and continued to shape emotion research in the late twentieth century. Ekman’s categories of basic emotions are frequently used in research on computational facial emotion recognition (e.g., Essa and Pentland (1997), Pantic and Rothkrantz (2000), Bartlett et al. (2005)) as well as in emotion recognition from text.

3.1.2 Plutchik’s Wheel of Emotions

Robert Plutchik was an American psychologist and a professor of psychiatry at the Albert Einstein College of Medicine, who contributed to the study of emotions, violence, and suicide (based on the information from https://www.the-emotions.com/robert-plutchik.html). In the early 1980s he formulated his psychoevolutionary theory of emotions (Plutchik, 1991, cited by revised version) together with the postulates that shape it, some of which overlap with the assumptions of Ekman’s theory (namely, that there is a small number of basic emotions, which differ from each other both in physiology and behavior, and which can exist in varying degrees of intensity). However, there are important differences from Ekman’s study of emotions.

First and foremost, Plutchik stated that, apart from a small set of basic emotions, all other emotions are mixed and derived from the various combinations of basic ones. He further categorized these other emotions into primary dyads (very likely to co-occur), secondary dyads (less likely to co-occur), and tertiary dyads (seldom co-occur) (Plutchik, 1991, p. 117). Love, for instance, is a primary-dyad emotion derived from both joy and trust (the same applies to friendship). Delight is an example of a secondary-dyad emotion, which takes a little bit from both joy and surprise. Finally, guilt is a tertiary-dyad emotion, being a mixture of fear and joy. Some other examples of blended emotions are optimism (anticipation + joy), aggression (anticipation + anger), shame (fear + disgust), and envy (sadness + anger). Plutchik argues that most of our daily emotions are mixed, while primary emotions almost never exist in their pure form. More importantly, to Plutchik, mixed emotions are the actual personality traits. He writes that “Emotions like pride, aggression, submission, and optimism are usually long-lasting, and in fact are often called personality traits.” (Plutchik, 1991, p. 120) and later concludes that “persisting situations which produce mixed emotions produce personality traits” (Plutchik, 1991, p. 121). In other words, a conflict between two or more emotions produces a new unique personality trait or attitude, which persists over time.

The second radical difference between Plutchik’s emotion theory and the basic emotion theory of Ekman is that emotion is not reduced to physiology only. Plutchik believes that humans recognize and express emotions not with any one particular physiological signal, but in terms of overall behavior. Hence, he claims, we should study emotions through behavior and not by using bodily measurements. Plutchik writes: “Emotion is not a thing in the sense as table or chair is” (Plutchik, 1991, p. 50). For Plutchik, emotion is “a patterned bodily reaction of destruction, reproduction, […] brought about by a stimulus.” (Plutchik, 1991, p. 151), and its properties can only be inferred, but not measured. Like Ekman, Plutchik considers emotions to be innate, but this innateness has nothing to do with certain body parts or neural structures. Emotions are mere adaptive devices inherited by an individual from the process of evolution and the struggle for survival. In this sense, adaptive behavior comes first, and emotion follows. Evolution taught us to explore, protect, reproduce, reject, and destroy, and emotions are evolutionary devices that have relevance to basic biological adaptive processes.

In order to represent the organization and properties of the emotions as they were defined by his psychoevolutionary theory, Plutchik proposed a structural model of emotions, which he called a multidimensional model of emotions and which is better known today as Plutchik’s wheel of emotions (Plutchik, 1991). The wheel (Figure 2) is constructed in the fashion of a color wheel, with similar emotions placed closer together and opposite emotions 180 degrees apart. The wheel is designed as a cone, where the vertical dimension indicates the intensity, ranging from maximum intensity at the top to a state of deep sleep at the bottom. Such a shape implies that emotions become less distinguishable at lower levels of intensity. Essentially, the wheel is constructed from eight basic bipolar emotions: joy versus sorrow, anger versus fear, trust versus disgust, and surprise versus anticipation. The blank spaces between the leaves are so-called primary dyads, that is, emotions that are mixtures of two of the primary emotions.
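The circular arrangement can be encoded as a simple data structure. The ordering of the eight emotions around the circle is an assumption based on common depictions of the wheel; only the four opposite pairs are taken from the description above, and the intensity dimension of the cone is not modeled.

```python
# Assumed clockwise ordering of the eight basic emotions on the wheel.
WHEEL = ["joy", "trust", "fear", "surprise", "sorrow", "disgust", "anger", "anticipation"]

def angle(emotion):
    """Position of an emotion on the circle, in degrees."""
    return WHEEL.index(emotion) * 360 // len(WHEEL)

def opposite(emotion):
    """The emotion placed 180 degrees away on the wheel."""
    return WHEEL[(WHEEL.index(emotion) + len(WHEEL) // 2) % len(WHEEL)]

def angular_distance(first, second):
    """Smallest angle between two emotions; smaller means more similar."""
    difference = abs(angle(first) - angle(second)) % 360
    return min(difference, 360 - difference)

print(opposite("joy"), opposite("anger"))
print(angular_distance("joy", "trust"), angular_distance("joy", "sorrow"))
```

With this encoding, the four bipolar pairs fall out of the index arithmetic, and angular distance serves as a rough similarity measure between emotions.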

Just as Ekman’s theory of basic emotions influenced the research in facial emotion recognition, the wheel model of emotions proposed by Plutchik had a great impact on the field of affective computing. However, in contrast to Ekman’s model, Plutchik’s wheel of emotions is primarily used in emotion recognition from text as a basis for emotion categorization (some examples are Cambria et al. (2012), Kim et al. (2012), Suttles and Ide (2013), Borth et al. (2013), Mohammad and Turney (2013), Abdul-Mageed and Ungar (2017)).

Figure 2: Plutchik’s wheel of emotions

3.1.3 Russell’s Circumplex Model

Despite its wide popularity and influence, the theory of basic emotions elaborated in detail by Ekman is challenged by some theoretical and empirical difficulties. The main objection raised to the theory of basic emotions is that there are no reliable neural, physiological, and facial correlates of specific basic emotions (Posner et al., 2005), which essentially challenges the idea of innate, and hence “universal”, emotions. At the same time, investigations into the subjective experience of emotions suggest that they arise from cognitive interpretations of physiological experiences (Cacioppo et al., 2000). Attempts to overcome the shortcomings of basic emotion theory and its unfitness for clinical studies led researchers to suggest various dimensional models, the most prominent of which is the circumplex model of affect proposed by James Russell (Russell, 1980). The word “circumplex” in the name of the model refers to the fact that emotional episodes do not cluster at the axes but at the periphery of a circle (Figure 3).

At the core of the circumplex model is the notion of two dimensions plotted on a circle along horizontal and vertical axes. These dimensions are valence (how pleasant or unpleasant one feels) and arousal (the degree of calmness or excitement). The number of dimensions is not strictly fixed and there are adaptations of the model that incorporate more dimensions, such as the Valence-Arousal-Dominance model, which adds the dimension of dominance, the degree of control one feels over the situation that causes an emotion (Bradley and Lang, 1994).

Essentially, by moving from discrete categories to a dimensional representation, researchers are able to account for subjective experiences that do not fit neatly into isolated non-overlapping categories. Accordingly, each affective experience can be depicted as a point in a circumplex that is described by only two parameters (valence and arousal) without need for labeling or reference to folk emotional concepts (Russell, 2003). However, the strengths of the model turned out to be its weaknesses: for example, it is not clear if there are basic dimensions in the model (Larsen and Diener, 1992) and what to do with qualitatively different events of fear, anger, embarrassment, and disgust that fall in identical places in the circumplex structure (Russell and Barrett, 1999). Despite these shortcomings, the circumplex model of affect is widely used in psychological and psycholinguistic studies. In computational linguistics, the circumplex model is applied when the interest is in continuous measurements of valence and arousal rather than in specific discrete emotional categories.
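A dimensional representation lends itself directly to numeric treatment. The following sketch treats an affective state as a point with valence and arousal scaled to [-1, 1]; the scaling, the quadrant names, and all coordinates are hypothetical values for illustration, not measurements from the cited studies.

```python
import math

def circumplex_position(valence, arousal):
    """Return (angle in degrees, distance from the neutral center) of an affective state."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    intensity = math.hypot(valence, arousal)
    return angle, intensity

def quadrant(valence, arousal):
    """Coarse description of the region in which the point falls."""
    if valence >= 0:
        return "pleasant-activated" if arousal >= 0 else "pleasant-deactivated"
    return "unpleasant-activated" if arousal >= 0 else "unpleasant-deactivated"

# Hypothetical coordinates (valence, arousal) for three affective states.
states = {"calm": (0.6, -0.6), "fear": (-0.6, 0.7), "anger": (-0.6, 0.7)}
for name, (valence, arousal) in states.items():
    print(name, circumplex_position(valence, arousal), quadrant(valence, arousal))
```

The example also reproduces the weakness noted above: if fear and anger receive the same coordinates, the two-dimensional representation cannot tell them apart.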

Figure 3: Circumplex model of affect: Horizontal axis represents the valence dimension, the vertical axis represents the arousal dimension

3.2 Emotion Analysis in Classical Literary Studies

Until the end of the twentieth century, literary and art theories often disregarded the importance of the aesthetic and affective dimension of literature, which in part stemmed from the rejection of old-fashioned literary history that had explained the meaning of art works by the biography of the author (Sætre et al., 2014a). However, the affective turn taken by a wide range of disciplines in the past two decades – from political and sociological sciences to neurosciences to media studies – has refueled the interest of literary critics in human affects and sentiments.

We already mentioned several works that explore the link between the arts and emotions in the Introduction. In this section, we will talk about several other studies that focus on the emotions expressed in literary art form to set a ground for further discussion of differences between classical and computational approaches to theorizing about emotions.

We said earlier that there seems to be a consensus among literary critics that literary art and emotions go hand in hand. However, one might be challenged to define the specific way in which emotions come into play in the text. The exploration of this problem is presented by van Meel (1995). Underpinning the centrality of human destiny, hopes, and feelings in the themes of many artworks – from painting to literature – van Meel explores how emotions are involved in the production of arts. Pointing out the major differences between the two media in their possibilities to depict human emotions (paintings convey nonverbal behavior directly, but lack the temporal dimension that novels have and use to describe emotions), van Meel provides an analysis of the nonverbal descriptions used by writers to convey the emotional behavior of their characters. Description of visual characteristics, van Meel speculates, responds to a fundamental need of a reader to build an image of a person and her behavior. Moreover, nonverbal descriptions add important information, which can in some cases play a crucial hermeneutical role, as in Kafka’s Der Prozess, where the fatal decisions for K. are made clear by gestures rather than words. However, gestures are not the only nonverbal channels that are used to convey emotions in literature. Van Meel defines eight channels (bodily characteristics, clothing, facial expressions, looking behavior, hand gestures, movements of the body, voice, and spatial relations) and offers a small-scale quantitative systematic analysis of their use in literature (on a sample of six twentieth-century “classics”). The analysis shows that the voice category was the most frequently used, followed by facial expressions and hand gestures. The results, van Meel suggests, show that such types of analysis could contribute to unraveling the hidden presuppositions about inner life and its outer appearance, and can help in reconstructing the emotional universe of individual writers and historical periods.

A hermeneutic approach through the lens of emotions is presented by Kuivalainen (2009), who provides a detailed analysis of linguistic features that contribute to characters’ emotional involvement in Mansfield’s prose. The study shows how, through the extensive use of adjectives, adverbs, deictic markers, and orthography, Mansfield steers the reader towards the protagonist’s climax. Subtly shifting between psycho-narration and free indirect discourse, Mansfield makes use of evaluative and emotive descriptors in psycho-narrative sections, often marking the internal discourse with dashes, exclamation marks, intensifiers, and repetition, which triggers an emotional climax. Various deictic features introduced in the text are used to pinpoint the source of emotions in the text, which helps in creating a picture of characters’ emotional world. Verbs (especially in the present tense), adjectives, and adverbs serve the same goal in Mansfield’s prose, namely to describe the emotional world of characters. Going back and forth from psycho-narration to free indirect discourse provides Mansfield with a tool to point out the significant moments in the protagonists’ lives and draw a separation between characters and narration.

Both van Meel’s and Kuivalainen’s works, separated from each other by more than a decade, underpin the importance of emotional language in the interpretation of characters’ traits, hopes, and tragedy, and this view in fact finds empirical support, for example in Barton (1996) and Van Horn (1997). Of course, the power of linguistic tools in conveying emotions should not be underestimated. But at the same time their role in the creation and depiction of emotion should not be overestimated. That is, saying that someone looked angry or fearful or sad, as well as directly expressing characters’ emotions, are not the only means authors resort to when building a believable fictional space filled with characters, action, and emotions. In fact, many novelists strove to express emotions indirectly by way of figures of speech or catachresis (Hillis Miller, 2014), first of all, because emotional language can be ambiguous and vague, and, second, to avoid any allusions to Victorian emotionalism and pathos.

How can an author convey emotions indirectly? A book chapter by Hillis Miller (2014) in Exploring Text and Emotions (Sætre et al., 2014b) seeks the answer to exactly this question. Using the opening scenes of Conrad’s Nostromo as material, Hillis Miller shows how Conrad’s descriptions of an imaginary space generate emotions in readers without direct communication of emotions.

The opening chapter of Conrad’s Nostromo is an objective description of Sulaco, an imaginary land. The description is mainly topographical and includes occasional architectural metaphors, but it combines wide expanse with hermetically sealed enclosure, which generates “depthless emotional detachment” (Hillis Miller, 2014, p. 93). Through the use of the present tense, Conrad leads readers to assume that the whole scene is timeless and does not change. The topographical descriptions are given in a purely materialist way: there is nothing behind clouds, mountains, rocks, and sea that would matter to humankind, not a single feature of the landscape is personified, not a single topographical shape is symbolic. Knowingly or unknowingly, the author argues, by telling the reader what she should see – with no deviations from truth – Conrad employs a trope that perfectly matches Kant’s concept of the sublime. Kant’s view of poetry was that true poets tell the truth without interpretation; they do not deviate from what their eyes see. Conrad, or to be more specific, his narrator in Nostromo, is an example of sublime seeing with a latent presence of strong emotions. On the one hand, Conrad’s descriptions are cool and detached. This coolness is caused by the indifference of the elements in the scene. On the other hand, by dehumanizing the sea and sky, Conrad generates “awe, fear, and a dark foreboding about the kinds of life stories that are likely to be enacted against such a backdrop” (Hillis Miller, 2014, p. 115).

The analysis by Hillis Miller (2014) resonates with some premises from emotion theory that we have discussed previously, namely with Plutchik’s belief that emotions should be studied not through one particular means of expression, but through the overall behavior of a person. Since such a formula cannot be applied to all literary studies of emotions (not all authors choose to convey emotions indirectly, and not all authors tend to comment on characters’ nonverbal emotional behavior), it seems that one should search for a balance between low-level linguistic feature analysis of emotional language and a rigorous high-level hermeneutic inquiry dissecting the form of the novel and its underlying philosophical layers.

3.3 Computational Methods for Sentiment and Emotion Analysis

In natural language processing, a large number of tasks have been established whose goal is to extract (aspects of) meaning from unstructured natural language texts. Sentiment, opinions, stance, as well as emotions and affect belong to a family of tasks concerned with a subset of nonpropositional meaning aspects, as they do not describe what happens, but what stance is adopted towards something. Sentiment analysis (often used synonymously with opinion mining) is one of the most popular and best-understood of these tasks. The goal is to infer a polarity value (discrete or continuous) from text, based on the polarity of each term in the text. Early approaches detect the polarity of words, for instance by semantic similarity, and combine these values for phrases (Turney, 2002). The task has also been phrased as supervised (Pang et al., 2002) or semi-supervised (Täckström and McDonald, 2011) text classification to generalize from word to phrase and document level. Popular domains of application include product reviews, social media, and news articles.
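One way to make the idea of polarity by semantic similarity concrete is the semantic orientation measure commonly associated with Turney (2002), which compares how strongly a phrase is associated with a positive and a negative reference word by means of pointwise mutual information (PMI). The notation below is an assumption for illustration: $p(\cdot)$ denotes occurrence probabilities estimated from a large corpus, and $p(w_1, w_2)$ the probability that both items occur near each other.

```latex
\begin{aligned}
\mathrm{PMI}(w_1, w_2) &= \log_2 \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)} \\
\mathrm{SO}(\mathit{phrase}) &= \mathrm{PMI}(\mathit{phrase}, \text{``excellent''}) - \mathrm{PMI}(\mathit{phrase}, \text{``poor''})
\end{aligned}
```

A positive value of $\mathrm{SO}$ indicates that the phrase co-occurs more with the positive reference word than with the negative one; averaging the values of all extracted phrases yields a polarity estimate for a longer text.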

A good deal of sentiment analysis research focuses on English, but research is also carried out on other languages, including, for instance, Chinese (Lee and Renganathan, 2011), Arabic (Al Sallab et al., 2015), and German (Klinger and Cimiano, 2014), with several approaches in cross-lingual settings (Barnes et al., 2018a; Klinger and Cimiano, 2015; Jain and Batra, 2015; Wei and Pal, 2010; Wan, 2009). With the aim of reaching stable performance across domains, sentiment analysis-specific domain-adaptation (Barnes et al., 2018b; Glorot et al., 2011), cross-domain (Bollegala et al., 2011), and concept drift-aware models (Guerra et al., 2014) were developed.

A variety of resources to support the development of methods and models has been made available to the research community. Examples include corpora with hierarchical annotations of targets, aspects, holders, valence shifters, subjective phrases with prior and posterior polarities for English (Kessler et al., 2010, i.a.) or for German (Klinger and Cimiano, 2014, i.a.) as well as dictionaries of polarity assignments to separate words (Esuli and Sebastiani, 2006; Hu and Liu, 2004) or combinations of words (Fahrni and Klenner, 2008). For emotion detection, fewer resources exist (exceptions include Alm et al., 2005; Scherer and Wallbott, 1994; Strapparava and Mihalcea, 2007; Aman and Szpakowicz, 2007).

There are several levels of granularity (each of increasing complexity) of sentiment analysis tasks, but the two most popular tasks are document sentiment analysis (also known as coarse-grained sentiment analysis) and fine-grained sentiment analysis, which we discuss below along with the relevant methodology. In the remainder of the section, we follow Liu (2015) when providing the definitions and overview of the methods.

3.3.1 Document Sentiment Analysis

By and large, document-level sentiment classification is the most popular and extensively studied topic in the field of sentiment analysis (Liu, 2015, p. 47). The goal of this task is to classify a document (e.g., a product review) as expressing positive or negative sentiment, which are called sentiment orientations or polarities (Liu, 2015, p. 47). Document sentiment classification considers each document as a whole and ignores such details as who is expressing the sentiment and towards which aspects of the product. The assumption behind document sentiment analysis is that the document expresses sentiments on a single entity and those sentiments come from a single sentiment holder. This assumption does not always hold, because the author of a document may compare multiple products and express opinions on each of them, or express opinions on multiple aspects of one product.

Supervised sentiment classification

Sentiment classification is similar to other text classification problems. The main difference is that, for example, in text genre classification, topic-related words (e.g., from sport, politics, and science domains) are important discriminative features, while in sentiment analysis these are sentiment words that indicate polarity (e.g., great, terrible, funny, sad). Most approaches to document sentiment analysis rely on supervised machine learning techniques, where the task is to assign a polarity to a document. Usually, polarity can be positive, negative, or neutral. In practice, any supervised machine learning technique can be used for sentiment classification, for example naïve Bayes or support vector machines (SVM). As in most supervised learning applications, one of the most important steps is feature engineering. Most features for sentiment classification are similar to the ones used in more traditional text classification problems and include, but are not limited to, bag-of-words, word n-grams, POS tags, and dependency-based syntactic features (Liu, 2015).
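As a minimal sketch of supervised sentiment classification with bag-of-words features, the following multinomial naïve Bayes classifier is trained on a handful of toy reviews. The training sentences, the add-one smoothing, and the decision to ignore words unseen during training are assumptions made for illustration; practical systems are trained on far larger corpora.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (tokens, label) pairs. Collects the counts needed for prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def classify(model, tokens):
    """Return the label with the highest log posterior under the bag-of-words assumption."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token in vocabulary:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][token] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("a great and funny film".split(), "positive"),
    ("great acting".split(), "positive"),
    ("a terrible and sad film".split(), "negative"),
    ("terrible plot".split(), "negative"),
]
model = train_naive_bayes(training)
print(classify(model, "funny and great".split()))
print(classify(model, "sad plot".split()))
```

The sentiment words (great, funny, terrible, sad) drive the decision, while words shared by both classes, such as "film", contribute equally to both scores.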

Among other features that have been developed specifically for sentiment analysis are sentiment shifters and rules of opinion. Sentiment shifters are specific words that flip the polarity of sentiment words. For instance, the sentence “I don’t like this movie” is negative though the word “like” is positive. However, the presence of a negation word does not always imply a shift in polarity (e.g., “not only … but also”). This is where another set of custom features, rules of sentiment, proves to be helpful. Rules of sentiment are lists of language composition rules that can be used to express or imply sentiment. The general idea behind such rules is that each rule represents a scenario that implies a positive or negative sentiment. There are various ways of representing and capturing such rules that cannot be covered here. It is common to combine several features. Some examples of such feature combinations are POS n-grams coupled with semantic relations (Gamon, 2004), dependency relations with various n-grams (Joshi and Penstein-Rosé, 2009), combinations of n-grams with counts of verbs, adjectives, adverbs, nouns, and other parts of speech, as well as binary features of polarity term presence (Kouloumpis et al., 2011), and experiments with various feature weighting techniques (Kim et al., 2009; Martineau and Finin, 2009; Paltoglou and Thelwall, 2010). We refer readers to Liu (2015) for a survey of existing approaches.

In addition to classifying documents into predefined classes, research has also addressed the prediction of rating scores (e.g., number of stars) for product reviews (Pang and Lee, 2005; Ganu et al., 2009). In this case sentiment analysis becomes a regression problem, as the algorithm predicts an ordinal number.

Researchers have also proposed several custom techniques for sentiment classification that do not rely on standard machine learning methods. For example, Dave et al. (2003) introduce a custom scoring function that, for each term, computes the probability of its presence in positive or negative classes of reviews. A term score is a measure of the term’s bias towards either class ranging from -1 to 1. The overall document score is then computed by summing up and normalizing individual scores of terms and each document is then classified based on a threshold parameter.
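The scoring idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of Dave et al. (2003): the normalization of the term score as the difference of the two class probabilities divided by their sum, the averaging of term scores, the toy reviews, and the default threshold of 0 are all assumptions made for the example.

```python
from collections import Counter


def term_scores(pos_docs, neg_docs):
    """Score each term in [-1, 1] by its bias towards the positive class."""
    pos_counts = Counter(w for d in pos_docs for w in d.split())
    neg_counts = Counter(w for d in neg_docs for w in d.split())
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    scores = {}
    for term in set(pos_counts) | set(neg_counts):
        p_pos = pos_counts[term] / pos_total  # relative frequency in positive reviews
        p_neg = neg_counts[term] / neg_total  # relative frequency in negative reviews
        # Assumed normalization: +1 = only in positive reviews, -1 = only in negative ones.
        scores[term] = (p_pos - p_neg) / (p_pos + p_neg)
    return scores


def classify(doc, scores, threshold=0.0):
    """Average the scores of known terms and compare against a threshold."""
    known = [scores[w] for w in doc.split() if w in scores]
    doc_score = sum(known) / len(known) if known else 0.0
    return "positive" if doc_score > threshold else "negative"


# Toy training data (hypothetical).
positive_reviews = ["great phone great screen", "great battery"]
negative_reviews = ["terrible phone", "terrible terrible screen"]
scores = term_scores(positive_reviews, negative_reviews)
print(classify("great battery", scores))
print(classify("terrible screen", scores))
```

Terms that occur in both classes with similar relative frequency receive a score near zero and therefore contribute little to the document score.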

Current state-of-the-art approaches to sentiment analysis rely on embedding-based feature extraction and deep learning architectures (Barnes et al., 2017; Tang et al., 2014; dos Santos and Gatti, 2014; Sohangir et al., 2018). These approaches represent words as a function of their context, which enables machine learning algorithms to generalize over words that have similar contextual representations.

All of the above-mentioned approaches require labeled data that, as we discussed earlier, can be expensive and time-consuming to obtain. Therefore, several unsupervised approaches to sentiment classification have been proposed, which we discuss in Section 3.3.1.

Unsupervised classification

One method of unsupervised sentiment classification that is based on syntactic patterns is presented in Turney (2002). The method consists of extracting sequences of two consecutive words if their POS tags conform to certain patterns (e.g., an adjective followed by a noun). Then, the semantic orientation of each extracted phrase is calculated using pointwise mutual information, based on its association with the positive reference word “excellent” and the negative reference word “poor”. In the final step, the algorithm computes the average semantic orientation of all phrases in the review and classifies the review according to the sign of this average.
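The semantic orientation step can be illustrated as follows. The sketch assumes that co-occurrence counts of each phrase with the two reference words are already available; the count values, the function names, and the small smoothing constant are assumptions for illustration, and the output labels are only placeholders for the two classes.

```python
import math


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    The phrase frequency cancels out in the difference, so only the
    co-occurrence counts and the reference word counts are needed.
    """
    return math.log2(
        ((near_excellent + smoothing) * hits_poor)
        / ((near_poor + smoothing) * hits_excellent)
    )


def classify_review(phrase_counts, hits_excellent, hits_poor):
    """Average the semantic orientation of all phrases in a review."""
    orientations = [
        semantic_orientation(n_exc, n_poor, hits_excellent, hits_poor)
        for n_exc, n_poor in phrase_counts
    ]
    average = sum(orientations) / len(orientations)
    label = "recommended" if average > 0 else "not recommended"
    return average, label


# Hypothetical counts: (co-occurrences with "excellent", co-occurrences with "poor").
review_phrases = [(80, 20), (30, 30)]
average, label = classify_review(review_phrases, hits_excellent=1000, hits_poor=1000)
print(round(average, 2), label)
```

A phrase that co-occurs equally often with both reference words has an orientation of zero and does not move the average in either direction.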

Another family of methods that is often seen as unsupervised are dictionary- or lexicon-based approaches. As we have seen previously, polarity terms are the most discriminative features for sentiment analysis, which makes it possible to classify documents with only a lexical resource containing words of both polarities at hand. Usually, lexicon-based approaches consist of counting and summing the words of known polarity in the document. Each positive expression is given a positive value, and each negative expression a negative value. The overall sentiment of the document is considered positive if the final score of the document is positive, neutral if it is zero (or close to zero), and negative if it is negative. There are many variations of this approach. Some methods of scoring the terms also take into account intensification and negation to compute a sentiment score for each document (Kennedy and Inkpen, 2006; Polanyi and Zaenen, 2006; Taboada et al., 2006a). For example, nice receives a positive score and not nice a negative one. A similar procedure is applied to intensifiers or diminishers. For example, a negative sentiment word may be scored -1 if it is preceded by a downtoner and -3 if it is preceded by an amplifier.
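A lexicon-based scorer with negation and intensity handling can be sketched as below. The toy lexicon, the modifier lists, and all numeric weights are assumptions chosen for illustration; they do not reproduce the values used in any of the cited works.

```python
# Toy resources (hypothetical); a real system would load a published lexicon.
LEXICON = {"nice": 1, "great": 1, "bad": -1, "terrible": -1}
NEGATIONS = {"not", "never", "no"}
AMPLIFIERS = {"very": 1.5, "extremely": 2.0}
DOWNTONERS = {"slightly": 0.5, "somewhat": 0.5}


def document_score(text):
    """Sum the polarity values of lexicon words, adjusted by preceding modifiers."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = float(LEXICON[token])
        prev = tokens[i - 1] if i > 0 else ""
        prev2 = tokens[i - 2] if i > 1 else ""
        modified = prev in AMPLIFIERS or prev in DOWNTONERS
        if prev in AMPLIFIERS:
            value *= AMPLIFIERS[prev]
        elif prev in DOWNTONERS:
            value *= DOWNTONERS[prev]
        # A negation flips the polarity, either directly before the word
        # ("not nice") or before its modifier ("not very nice").
        if prev in NEGATIONS or (modified and prev2 in NEGATIONS):
            value = -value
        total += value
    return total


def polarity(text, margin=0.0):
    """Map the document score to positive, negative, or neutral."""
    score = document_score(text)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


print(polarity("not very nice"))
```

The `margin` parameter corresponds to treating scores close to zero as neutral.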

One of the main disadvantages of the lexicon-based approach is that it may lead to suboptimal results when dealing with domain-specific sentiment expressions. However, this approach is warranted when no training data is available and no supervised machine learning can be applied.

3.3.2 Fine-grained Sentiment Analysis

Document sentiment analysis may be insufficient or even undesirable when dealing with product reviews. The reason is that manufacturers are often interested not only in the overall sentiment of the customer towards their product, but also in more fine-grained information, such as which aspects of the product the customer likes and which not. The task of extracting relevant entities and phrases in text and associating them with polarity values is often referred to as fine-grained sentiment analysis. An example of a product review in which relevant entities are annotated (from Klinger and Cimiano (2014), https://www.amazon.com/gp/customer-reviews/R240ELITUG28KP) is the following text:

this . I live in a tiny condo and I have been washing my dishes by hand for 12 years. I was sick of it. A told me about these and I bought one. The . does a , and I don’t have a sink full of dishes anymore. for and dish wash weary people!

Here, (at least) the following phrases play a role: the theme of the opinion (often a product, here: “dishwasher”); opinion holder (the author, “I”, and a “friend”), aspects of the product (that it is good for “small spaces”, and does a good “job”) and evaluative, subjective phrases (“Awesome”, “love”, “best thing I have ever done”, “great”, “highly recommend”, all positive).

There are various approaches to defining relevant elements of a sentiment structure. For example, these can be aspects of real world objects, e.g., the display of a camera, or the food quality of a restaurant (Jakob and Gurevych, 2010; Nakov et al., 2016). As a simplification, a finite set of such entities can be pre-defined (Ganu et al., 2009). Further fine-grained approaches extend this idea to extract the opinion holder (source) who utters the polar statement (Kim and Hovy, 2006) or to detect polarity shifters and negations (Reitan et al., 2015; Li et al., 2010). The subtasks of detecting different arguments of the sentiment structure influence each other. Therefore, several joint models for these subtasks have been proposed (e.g. Yang and Cardie (2013); Klinger and Cimiano (2013a, b); Titov and McDonald (2008); Lin et al. (2012); Lin and He (2009)).

More current approaches to fine-grained sentiment analysis rely on deep learning methods (Akhtar et al., 2017; Cortis et al., 2017). A detailed overview of the deep learning approaches to sentiment analysis, including fine-grained, can be found in Zhang et al. (2018).

In general, fine-grained sentiment analysis is more informative than document sentiment analysis. This, however, comes at the cost of more expensive labeled data, as each document has to be annotated not only for polarity but also for the relevant elements of a sentiment structure. This increases the complexity of the annotation process, as well as of the subsequent classification, both supervised and unsupervised.

3.3.3 Computational Emotion Analysis

In this section, we give an overview of the methods of emotion recognition from text. This section is limited to an overview of emotion analysis methods in natural language processing and is an important prerequisite for understanding the upcoming discussions in the main part of our survey. For an extensive review of existing datasets for emotion extraction, we refer the reader to Bostan and Klinger (2018).

Although emotion recognition from text is a relatively new task, there are various approaches ranging from simple lexicon matching (Dodds et al., 2011), which is similar to unsupervised sentiment analysis (see Section 3.3.1), to deep learning methods (e.g., Abdul-Mageed and Ungar, 2017), both supervised and unsupervised.

One of the earliest works on emotion detection from text is Alm et al. (2005), who use supervised machine learning to classify sentences from children’s books into emotional categories using a large number of linguistic features, such as exclamation marks, the ratio of adjectives, nouns, and verbs, and direct speech. Aman and Szpakowicz (2007) is another early emotion annotation and classification study that uses lexical features in a bag-of-words fashion to classify sentences from a blog corpus into a set of Ekman’s emotions.

The task of emotion classification of text is challenging. One possible reason is that the text has to be classified into a larger number of classes. In most cases, it is conventional to use emotion classes from existing emotion theories as reference categories. That means that the task of emotion classification stretches beyond binary classification (e.g., positive vs. negative as in sentiment analysis). Ghazi et al. (2010) propose one way to tackle this problem with hierarchical classification, which first classifies the text as emotional or not emotional, and then performs a multiclass categorization on emotional text.

Emotions have strong linguistic markers that define the tone of the text (Johnson-Laird and Oatley, 1989). This warrants different approaches to emotion detection from text that rely on linguistically rich features. For example, Neviarouskaya et al. (2009) propose rules to formulate how nouns, verbs or adjectives dominate the emotion in the corresponding sentence and combine those with modifiers. Gao et al. (2014) use dependency-based features for classification, including negations. Agrawal and An (2012) propose an unsupervised framework for emotion extraction from sentences based on semantic relatedness between words and various emotion concepts.

As with text classification in general, the array of methods seen for emotion classification can be divided into rule-based methods and machine learning, which we discuss in the following.

Rule-based Algorithms

Rule-based text classification typically builds on top of lexical resources of emotionally charged words. These dictionaries can originate from crowdsourcing or expert curation. Examples include WordNetAffect (Strapparava and Valitutti, 2004) and SentiWordNet (Esuli and Sebastiani, 2006), both of which stem from expert annotation. Partly built on top of them is the NRC Word-Emotion Association Lexicon (Mohammad and Turney, 2013), which uses the eight basic emotions from Plutchik’s classification. Warriner et al. (2013) use crowdsourcing to assign values of valence, arousal, and dominance (Russell, 1980). Another related category of lexical resources that has been used for emotion analysis covers concreteness and abstractness (Köper et al., 2017). Brysbaert et al. (2014) publish a lexicon based on crowdsourcing, where the task was to rate the concreteness of 40,000 words on a scale from 1 to 5. Similarly, Köper and im Walde (2016) automatically generate affective norms of abstractness, arousal, imageability, and valence for 350,000 lemmas in German. The Linguistic Inquiry and Word Count (LIWC) is a set of 73 lexicons (Pennebaker et al., 2007), built to gather aspects of lexical meaning regarding psychological tasks. Dictionary and rule-based approaches are particularly common in the field of digital humanities due to their transparency and straightforward use.

Machine Learning

Supervised learning has been observed to improve performance over dictionary lookup. Common features include word n-grams, character n-grams, word embeddings, affect lexicons, negation, punctuation, emoticons, or hashtags (Mohammad, 2012a). This feature representation is then usually used as input to classifiers such as naive Bayes, SVM, MaxEnt, and others to predict the relevant emotion category (Alm et al., 2005; Aman and Szpakowicz, 2007). Similar to the paradigm shift in sentiment analysis from feature-based modelling to deep learning, state-of-the-art models for emotion classification are often based on neural networks. Schuff et al. (2017) apply models from the classes of CNN, BiLSTM, and LSTM and compare them to linear classifiers (SVM and MaxEnt); the BiLSTM shows the best results with the most balanced precision and recall. Abdul-Mageed and Ungar (2017) claim the highest F1 following Plutchik’s emotion model with gated recurrent unit networks (Chung et al., 2015). One approach to tackle the sparsity of datasets is transfer learning, i.e., training a model on similar resources and then transferring it to the actual task. A recent successful example of this procedure is Felbo et al. (2017), who present a neural network model trained on emoji, which is then transferred to different downstream tasks, namely the prediction of sentiment, sarcasm, and emotions.

A recent shared task on implicit emotion detection (IEST, Klinger et al., 2018) showed that the majority of participants built systems on top of deep learning architectures, similarly to participants of the emotion intensity shared tasks in previous years (Mohammad and Bravo-Marquez, 2017; Mohammad et al., 2018). Therefore, we can conclude that the paradigm shift to deep learning has reached the field of emotion analysis.

Column groups: Features; Models (including Naive Bayes, Decision Trees, Deep Learning)

Emotion Class.

Yu (2008) y y y
Barros et al. (2013) y y
Reed (2018) y y
Zehe et al. (2016) y y y
Reagan et al. (2016) y y y
Samothrakis and Fasli (2015) y y y
Kim et al. (2017a) y y y y
Henny-Krahmer (2018) y y


Heuser et al. (2016) y y
Bruggmann and Fabrikant (2014) y y
Rebora (2017) y y
Taboada et al. (2006b, 2008) y y y
Chen et al. (2012) y y
Marchetti et al. (2014) y y y
Sprugnoli et al. (2016) y y y
Buechel et al. (2017) y y
Buechel et al. (2016) y y y

Network Analysis

Nalisnick and Baird (2013a, b) y y
Elsner (2012, 2015) y y y
Kim and Klinger (2018) y y y y
Barth et al. (2018) y y
Jhavar and Mirza (2018) y y
Egloff et al. (2018) y y
Rinaldi et al. (2013) y
Zhuravlev et al. (2014) y
Jafari et al. (2016) y

Emotion Tracking

Anderson and McMaster (1986) y y
Anderson and McMaster (1993) y y
Alm and Sproat (2005) y y
Mohammad (2011, 2012b) y y
Klinger et al. (2016) y y
Kim et al. (2017b) y y y
Gao et al. (2016) y
Kakkonen and Galic Kakkonen (2011) y y y
Koolen (2018) y y
Morin and Acerbi (2017) y y
Bentley et al. (2014) y y
Table 1: Summary of characteristics of methods used in the papers reviewed in this survey.

4 Emotion and Sentiment Analysis in Computational Literary Studies

With this section, we proceed to the main part of the survey and provide an overview of the existing body of research on computational analysis of emotion and sentiment in computational literary studies. An overview of the papers including their properties is shown in Table 1. The table as well as this section is divided into several subsections each of which corresponds to a specific application of sentiment analysis to literature. Section 4.1 reviews the papers that deal with the classification of literary texts in terms of emotions they convey; Section 4.2 examines the papers that address text classification into genres or other story-types based on sentiment and emotion features; Section 4.3 is dedicated to research in modeling sentiments and emotions in texts from previous centuries, as well as research dealing with application of sentiment analysis to the texts written in the past; Section 4.4 provides an overview of sentiment analysis applications to character analysis and character network construction, and Section 4.5 is dedicated to more general applications of sentiment and emotion analysis to literature.

4.1 Emotion Classification

We discussed in Section 3.3.3 that a fundamental question of the emotion classification task is how to find the best features and classification algorithms to classify the data (sentences, paragraphs, entire documents) into the predefined classes. When applied to literature, such a classification may be of use for grouping different literary texts in digital collections based on the emotional properties of the stories. For example, books or poems can be grouped based on the emotions they convey or based on whether they have happy endings or not.

A straightforward task of emotion classification is to separate text that expresses emotions from text that does not (cf. Alm et al. (2005)). For example, Yu (2008) applies support vector machines and naive Bayes to the classification of early American novels belonging to the genre of sentimentalism. The final goal of the study is to explore what linguistic patterns characterize this subgenre. To that end, the author constructs a collection of five novels from the mid-nineteenth century and annotates the emotionality of each of the chapters as “high” or “low”. Consequently, the task is to learn a classifier that categorizes the respective chapters as highly emotional or the opposite. Content words are extracted from the collection of texts, excluding proper names and words that occur fewer than five times, and several feature representations are constructed and fed into the classifier. The results of the classification with both SVM and naive Bayes show that Boolean features (features that represent word absence or presence (0, 1) in the classification unit as feature values) are useful for the task. The author also shows that omitting stemming leads to better classification results. A possible explanation is that stemming conflates and neutralizes a large number of discriminative features. The author provides an example of such a conflation with the words “wilderness” and “wild”: while the latter is used with the same frequency in both high and low emotional pieces, the former is primarily used in highly emotional chapters.
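Boolean presence features of the kind described above can be sketched as follows. This is only an illustration of the feature representation, not the pipeline of Yu (2008): tokenization by whitespace, the helper names, and the toy chapters are assumptions, and the filtering of proper names and non-content words is omitted.

```python
from collections import Counter


def build_vocabulary(chapters, min_count=5):
    """Keep words that occur at least min_count times in the collection."""
    counts = Counter(w for chapter in chapters for w in chapter.lower().split())
    return sorted(w for w, c in counts.items() if c >= min_count)


def boolean_features(chapter, vocabulary):
    """Return 1 if a vocabulary word is present in the chapter, 0 otherwise."""
    present = set(chapter.lower().split())
    return [1 if word in present else 0 for word in vocabulary]


# Toy chapters (hypothetical); a lower threshold is used because the data is tiny.
chapters = ["the wilderness was wild", "wild wild horses", "the wilderness"]
vocabulary = build_vocabulary(chapters, min_count=2)
print(vocabulary)
print([boolean_features(ch, vocabulary) for ch in chapters])
```

Because no stemming is applied, “wild” and “wilderness” remain separate features, which is the property the stemming discussion above relies on.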

Classification of poetry based on emotions

Barros et al. (2013) report on an experiment on the categorization of 185 poems by Francisco de Quevedo that literary scholars divide into four categories (Love, Songs to Lisi, Satiric, and Philosophical-Moral-Religious). The authors aim at answering two research questions: a) Is the classification proposed by the literary scholars consistent with the sentiment reflected by the corresponding poems? and b) Which learning algorithms perform best on the classification? Using the emotions of joy, anger, fear, and sadness as a point of reference, Barros et al. construct a list of emotion words by looking up synonyms of English emotion words and adjectives associated with these four emotions (using https://bab.la/) and translating them to Spanish. Then, each poem is searched for occurrences of emotion words. The number of words associated with each emotion is normalized by the length of the poem and each emotion is represented by a value in a tuple. The experiments with different algorithms show the superiority of decision trees, which achieve an accuracy of almost 60%. However, this result is biased by the unbalanced distribution of classes. To avoid the bias, Barros et al. apply a resampling strategy that leads to a more balanced distribution and repeat the classification experiments. After resampling, the accuracy of decision trees in a 10-fold cross-validation reaches 75.13%, thus improving over the previous classification performance. This result confirms that a meaningful classification of the literary pieces based only on emotion information is possible and is more than twice as good as random picking.
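The length-normalized emotion tuple can be illustrated with a short sketch. The Spanish word lists below are toy examples invented for this illustration and are not the lists compiled by Barros et al.; whitespace tokenization is likewise an assumption.

```python
EMOTIONS = ("joy", "anger", "fear", "sadness")

# Toy emotion word lists (hypothetical).
EMOTION_WORDS = {
    "joy": {"alegría", "feliz"},
    "anger": {"ira", "furia"},
    "fear": {"miedo", "temor"},
    "sadness": {"tristeza", "llanto"},
}


def emotion_tuple(poem):
    """Count emotion words per emotion and normalize by poem length in tokens."""
    tokens = poem.lower().split()
    if not tokens:
        return tuple(0.0 for _ in EMOTIONS)
    return tuple(
        sum(token in EMOTION_WORDS[emotion] for token in tokens) / len(tokens)
        for emotion in EMOTIONS
    )


print(emotion_tuple("miedo y temor en la noche"))
```

Each poem is thus reduced to four numbers, which can serve as the input of a standard classifier such as a decision tree.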

Reed (2018) offers a proof-of-concept for performing sentiment analysis on a corpus of twentieth-century American poetry. Specifically, Reed analyzes the expression of emotions in the poetry of the Black Arts Movement (BAM) of the 1960s and 1970s. The paper describes an on-going project “Measured Unrest in the Poetry of the Black Arts Movement” whose goal is to understand 1) how the feelings associated with injustice are coded in terms of race and gender, and 2) what sentiment analysis can show us about the relations between affect and gender in poetry. Currently, the study uses a small corpus of poetry (twenty-six books) from prominent BAM authors. Reed notes that the two packages that were used for sentiment analysis of the poems, namely Pattern (Smedt and Daelemans, 2012) and VADER (Hutto and Gilbert, 2014), provide sentiment predictions for some of the poems that either contradict or are in line with what critics say about them. For example, Pattern considers Quincy Troupe’s “Come Sing a Song” to be the most negative in the corpus. This is in contrast to Reed’s close reading of the poem that suggests it to be positive and celebratory. At the same time, VADER provides predictions that are in line with critics’ judgement. Reed notes that the surface affective value of the words does not always align with their more nuanced affective meaning shaped by poetic, social, and political contexts. The goal of the project, therefore, will be to analyze such discrepancies and correspondences more closely, and to examine what it is that provides a book, poem, or poetic line with its emotional charge.

Classification of happy ending vs. non-happy endings

Zehe et al. (2016) classify 212 German novels written between 1750 and 1920 as having a happy or non-happy ending. A novel is considered to have a happy ending if the situation of the main characters in the novel improves towards the end or is constantly favorable. The novels were manually annotated with this information by domain experts. For feature extraction, the authors first split each novel into n segments of the same length. Then they calculate sentiment values for each of the segments by counting the occurrences of words that appear in the respective segment and are found in the German version of the NRC Word-Emotion Association Lexicon (Mohammad and Turney, 2013) and dividing this number by the length of the dictionary. Unlike other works previously mentioned in this category, Zehe et al. consider negations by adding a negation marker to any word that is directly preceded by a negation, thus inverting the word’s sentiment score. In addition to segments, Zehe et al. define main, late-main, and final sections in a book. The main section covers 75% to 98% of the segments from the beginning of the book, the late-main section covers the last part of the main section, and the final section covers the rest. Finally, they calculate the sentiment score for each section by taking the average of all sentiment scores of the segments that are part of the section. These steps are followed by classification with an SVM with 10-fold cross-validation that achieves an F1-score of 0.73. Zehe et al. also report on the best parameter configuration, observing that splitting the novels into 75 segments with the NRC leads to the best performance of the classifier.
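The segment and section features can be sketched as follows. This is a simplified illustration, not the feature extraction of Zehe et al.: the toy German polarity lexicon, the negation list, and the 75% boundary of the main section are assumptions, and the late-main section as well as the SVM itself are left out.

```python
# Toy resources (hypothetical); the study uses the German version of the NRC lexicon.
LEXICON = {"glück": 1, "freude": 1, "tod": -1, "angst": -1}
NEGATIONS = {"nicht", "kein", "nie"}


def split_into_segments(tokens, n_segments):
    """Split a token list into n_segments consecutive parts of (almost) equal length."""
    length = len(tokens)
    return [
        tokens[i * length // n_segments:(i + 1) * length // n_segments]
        for i in range(n_segments)
    ]


def segment_score(segment):
    """Sum lexicon values, inverting words directly preceded by a negation."""
    score = 0
    for i, token in enumerate(segment):
        if token in LEXICON:
            value = LEXICON[token]
            if i > 0 and segment[i - 1] in NEGATIONS:
                value = -value
            score += value
    # Normalization by dictionary size, following the description above.
    return score / len(LEXICON)


def section_scores(tokens, n_segments=75, main_share=0.75):
    """Average segment scores over the main and the final section."""
    scores = [segment_score(s) for s in split_into_segments(tokens, n_segments)]
    cut = int(n_segments * main_share)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {"main": mean(scores[:cut]), "final": mean(scores[cut:])}


novel = ["glück"] * 4 + ["nicht", "glück"] + ["tod"] * 2
features = section_scores(novel, n_segments=4, main_share=0.75)
print(features)
```

The resulting section scores, possibly together with the individual segment scores, would then form the feature vector for the classifier.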

4.2 Genre and Story-type Classification

The papers we discussed so far focus on understanding the emotion associated with units of text. This extracted information can further be used for downstream tasks and also for downstream evaluations. We discuss the following downstream classification cases here. The papers in this category use sentiment and emotion features for a higher-level classification, namely story-type clustering and literary genre classification. The assumption behind these works is that different types of literary text may show a different composition and distribution of emotion vocabulary, and thus can be classified based on this information. The hypothesis that different literary genres convey different emotions stems from common knowledge: we know that horror stories instill fear, mysteries evoke anticipation and anger, while romances are filled with joy and love. However, the task of automatic classification of these genres is not always that straightforward and reliable, as we will see in this section.

Story-type clustering

A study by Reagan et al. (2016) is inspired by Kurt Vonnegut’s lecture on emotional arcs of stories (as available on the Web Archive as of March 2010, https://web.archive.org/web/20100326094804/http:/www.laphamsquarterly.org:80/voices-in-time/kurt-vonnegut-at-the-blackboard.php?page=all). Reagan et al. test the idea that the plot of each story can be plotted as an emotional arc, i.e., a time series graph, where the x-axis represents a time point in a story, and the y-axis represents the events happening to the main characters, which can be favorable (peaks on a graph) or unfavorable (troughs on a graph). Moreover, as Vonnegut puts it, the stories can be grouped by these arcs and the number of such groupings is limited. To test this idea, Reagan et al. collect the 1,327 most popular books from Project Gutenberg (https://www.gutenberg.org/), filtering them by length (between 20,000 and 100,000 words) and the total number of downloads (they ignore books with fewer than 40 downloads as well as collections and poems). Each book is then split into uniform-length segments for which sentiment scores are calculated in a sliding window approach with a window size of 10,000 words. The emotional content is rated using the labMT sentiment dictionary (Dodds et al., 2011). To unveil the structure of the corpus hidden behind the constructed emotional arcs, the authors apply several methods, namely singular value decomposition (SVD), hierarchical clustering (Ward Jr., 1963), and a self-organizing map (Kohonen, 1990) (a neural network-based type of unsupervised clustering). The results of the analysis show support for six emotional arcs that are shared between subgroupings of the corpus and can broadly be described by the following patterns:

  • Rise (“Rags to riches”): the arc starts at a low point and steadily increases towards the end;

  • Fall (“Tragedy”): the arc starts at a high point and steadily decreases towards the end;

  • Fall-rise (“Man in a hole”): the arc drops in the middle of the story but increases towards the end;

  • Rise-fall (“Icarus”): the arc hits the high point in the middle of the story and decreases towards the end;

  • Rise-fall-rise (“Cinderella”): the arc fluctuates between high and low points but ends with an increase;

  • Fall-rise-fall (“Oedipus”): the arc fluctuates between high and low points but ends with a decrease.

Additionally, Reagan et al. examine the downloads for all the books that are most similar in terms of SVD and find that the “Icarus”, “Oedipus”, and “Man in a hole” arcs are the three most popular emotional arcs among readers, though the authors are cautious in concluding that the number of downloads is a direct indicator of the success of a story. However, they conclude that the methods they apply provide broad support for the six basic emotional arcs.
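The construction of an emotional arc with a sliding window can be sketched as follows. The word scores, the number of arc points, and the function name are assumptions for illustration; the sketch only shows the windowing step and does not cover the SVD, clustering, or self-organizing map analyses.

```python
def emotional_arc(tokens, word_scores, window=10000, n_points=200):
    """Mean score of the scored words in windows sliding uniformly over the text."""
    if len(tokens) <= window:
        starts = [0]  # the text is shorter than one window
    else:
        step = (len(tokens) - window) / max(n_points - 1, 1)
        starts = [int(i * step) for i in range(n_points)]
    arc = []
    for start in starts:
        values = [word_scores[t] for t in tokens[start:start + window]
                  if t in word_scores]
        arc.append(sum(values) / len(values) if values else 0.0)
    return arc


# Toy example (hypothetical scores): a story that moves from sad to happy words.
toy_scores = {"sad": 2.0, "happy": 8.0}
toy_story = ["sad"] * 5 + ["happy"] * 5
arc = emotional_arc(toy_story, toy_scores, window=4, n_points=3)
print(arc)
```

In this toy case the arc increases monotonically, which corresponds to the “Rise” pattern in the list above.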

Genre classification

Similar in spirit are the studies by Samothrakis and Fasli (2015) and Kim et al. (2017a, b). The study by Samothrakis and Fasli (2015) examines the hypothesis that works of fiction can be characterized by the emotions they portray. To that end, they collect works of the genres mystery, humor, fantasy, horror, science fiction, and western from Project Gutenberg. Using WordNet-Affect (Strapparava and Valitutti, 2004) to detect emotion words as categorized by Ekman’s fundamental emotion classes, they calculate an emotion score for each sentence in the text. Each work is then transformed into six signals, one for each basic emotion. Consequently, each signal is smoothed and the average of a signal for each n sentences is taken. This creates a smoothed version of the signals with 50 timesteps. This process leads to 300 features extracted for each work of fiction (50 for each basic emotion), which are fed into a random forest classifier with 1,500 trees trained and tested in 10-fold cross-validation. All these steps lead to a classification accuracy of 0.52, which is higher than the random guess and the most frequent class baseline (0.36). The error analysis reveals that the classifier mostly misclassifies horror fiction as fantasy or science fiction. Additionally, they estimate the importance of each feature from its contribution to the tree splits and find that the most discriminating feature is fear. Another interesting observation is that there is no particular correlation between joy and different genres, which may be due to the higher number of keywords related to joy compared to other emotions.
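The reduction of sentence-level emotion signals to a fixed-length feature vector can be sketched as follows. The sketch assumes that per-sentence scores for the six emotions are already available and replaces the smoothing step by plain averaging within bins; the function names and the toy signals are assumptions, and the random forest classifier is not shown.

```python
EKMAN = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def resample(signal, n_steps=50):
    """Average consecutive sentence scores into n_steps bins."""
    length = len(signal)
    binned = []
    for i in range(n_steps):
        start = i * length // n_steps
        end = max((i + 1) * length // n_steps, start + 1)  # at least one sentence per bin
        chunk = signal[start:end]
        binned.append(sum(chunk) / len(chunk))
    return binned


def feature_vector(emotion_signals, n_steps=50):
    """Concatenate the binned signals of all emotions into one vector."""
    features = []
    for emotion in EKMAN:
        features.extend(resample(emotion_signals[emotion], n_steps))
    return features


# Toy signals (hypothetical): 120 sentences with one score per emotion.
signals = {emotion: [float(i % 3) for i in range(120)] for emotion in EKMAN}
vector = feature_vector(signals)
print(len(vector))
```

With six emotions and 50 timesteps, every work is represented by a vector of the same length regardless of how many sentences it contains.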

A more recent study by Kim et al. (2017a) originates from the same premise as Samothrakis and Fasli (2015), namely that the emotional content of works of fiction is a discriminative factor for genre classification. Extending the set of tracked emotions to Plutchik’s classification, Kim et al. collect 2,000 books from Project Gutenberg that belong to five genres found in the Brown corpus (Francis and Kucera, 1979), namely adventure, science fiction, mystery, humor, and romance. To extract features, each book is split into n evenly-sized consecutive chunks. The emotion score for each emotion in each chunk is then calculated using the NRC dictionary. In contrast to Samothrakis and Fasli (2015), Kim et al. define a richer feature set and a larger set of classifiers. In particular, in addition to emotion scoring features they also use standard bag-of-words (BoW) features as a baseline, as well as BoW features that are filtered to emotion words only. The authors extend the set of classification algorithms beyond random forests with a multi-layer perceptron and convolutional neural networks (CNN). Among the models that use emotion scoring features, the CNN architecture achieves the best performance (0.59 F1-score), but it is outperformed by the baseline BoW model (0.80 F1-score). At the same time, the BoW model filtered to emotion-bearing words performs statistically better (0.81 F1-score) than the baseline model. In addition, Kim et al. introduce an ensemble SVM classifier that takes as input the predictions of all the classifiers, which results in an increased performance (0.84 F1-score) compared to the best result achieved by the previous models. The authors continue their analysis of classification errors by introducing the notion of prototypicality, which is computed as the average of all emotion scores. With such a point of reference defined for each genre, it is possible to analyze, using Spearman correlation, how uniformly each emotion develops across the books that belong to one genre. The results of this analysis suggest that fear and anger are the most salient plot devices in fiction, while joy is only of mediocre stability, which is in line with the findings of Samothrakis and Fasli (2015).
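One way to read the prototypicality analysis is sketched below. It assumes that a genre prototype is the position-wise average of the emotion arcs of all books in that genre and that each book is compared to the prototype with Spearman correlation; this interpretation, the function names, and the toy arcs are assumptions and may differ from the procedure of Kim et al.

```python
import math


def ranks(values):
    """Assign ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def prototype(arcs):
    """Position-wise average of several arcs of equal length."""
    return [sum(column) / len(column) for column in zip(*arcs)]


# Toy fear arcs (hypothetical) for three books of one genre.
fear_arcs = [[0.1, 0.2, 0.4, 0.8], [0.2, 0.3, 0.5, 0.6], [0.5, 0.4, 0.2, 0.1]]
genre_prototype = prototype(fear_arcs)
print([round(spearman(arc, genre_prototype), 2) for arc in fear_arcs])
```

Books whose arcs correlate strongly with the prototype follow the typical development of that emotion in their genre, whereas low or negative correlations indicate atypical books.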

A study by Henny-Krahmer (2018) reports on the results of a subgenre classification on a corpus of Spanish American novels using sentiment values as features. The paper has two goals: the first is to test whether different subgenres differ in the degree and kind of emotionality, and the second is to test whether emotions in the novels are expressed in the direct speech of characters or in narrated text. To answer the first question, each novel is split into five segments and for each sentence in the segment the emotion score is calculated using the SentiWordNet (Baccianella et al., 2010) and NRC (Mohammad and Turney, 2013) dictionaries. The features are then fed to a decision tree classifier. The classifier achieves an average of 0.52, which is higher than the most-frequent class baseline and, hence, provides support for emotion-based features in subgenre classification. The analysis of feature importance shows that the most salient features come from the sentiment scores calculated from the direct speech of the characters, and that novels with higher values of positive speech are more likely to be sentimental novels.

There are some limitations to the studies presented in this section. On the one hand, it is questionable how reliable coarse emotion scoring is when it takes into account only the presence or absence of words found in specialized dictionaries and overlooks negations and modifiers that can either negate an emotion word or increase or decrease its intensity. On the other hand, a limited view of the emotional content as a sum of emotion-bearing words leaves no room for qualitative interpretation of the texts: it is not clear how one can distinguish between emotion words used by the author to express his or her own sentiment, words used to describe characters’ feelings, and emotion words that characters use to address or describe other characters in a story.

4.3 Temporal Change of Sentiment

The papers that we have reviewed so far approach the problem of sentiment analysis as a downstream classification task. However, as we showed in Section 1.3, applications of sentiment analysis are not only limited to classification. For example, computational social sciences make use of sentiment analysis for detecting political preferences of the electorate or for mining opinions about different products or topics (see Section 1.3). Similarly, several digital humanities studies incorporate sentiment analysis methods in a task of mining sentiments and emotions of people who lived in the past. The goal of these studies is not only to recognize the sentiments, but also to understand how they were formed.

The papers in this section can be grouped into two categories: topography of emotions, and tracking sentiments in texts from previous centuries.

Topography of emotions

A study by Heuser et al. (2016) reports on the effort to build an interactive map of emotions in Victorian London for the project called Emotions of London (https://www.historypin.org/en/victorian-london/). The whole project starts with the premise that emotions occur at a specific moment in time and space, thus making it possible to link emotions to specific geographical locations. Consequently, having such information at hand, one can glimpse which emotions are hidden behind literary landmarks of Victorian London. To construct a corpus for their analysis, Heuser et al. collect a large corpus of English books from the 18th and 19th centuries and extract 383 geographical locations of London that have at least ten mentions each. The resulting corpus includes 15,000 passages, each of which has a toponym in the middle and 100 words directly preceding and following the location mention. The data is then given to annotators who are asked to decide whether each of the passages expresses happiness, fear, or neither (neutral). The same data is also analyzed by a custom sentiment analysis program that assigns each passage one of these emotion categories. Only 12% of passages are annotated by humans as conveying fear and 21% as conveying happiness. The remaining 67% of passages are marked as neutral. The evaluation of automatic emotion prediction shows that the program matches human judgement for happiness (21%) but mismatches it for fear (only 1% of passages were classified as fearful). The program also classifies the majority of the passages as neutral (78%). Some striking observations are made with regard to the data analysis. First, there is a clear discrepancy between fiction and reality: while toponyms from the West End with its Westminster and the City are over-represented in the books, the same does not hold true for the East End with its Tower Hamlets, Southwark, and Hackney. Hence, there is less information about the emotions of this particular part of London. 
Another striking detail is that the resulting map is dominated by the neutral category. Heuser et al. argue that this has nothing to do with the absence of emotions but rather stems from the fact that emotions tend to be silenced in the public domain, which influenced the annotators’ decisions. To test this hypothesis, they take a sample of 200-word passages not including place names and ask annotators to tag them. The results support this claim: the share of fearful passages increases from 12% to 25% and the share of happy passages increases from 21% to 34%.

The space and time context is also used by Bruggmann and Fabrikant (2014) who model sentiments of the Swiss historians towards places in Switzerland in different historical periods. As the authors note, it is unlikely that a historian will directly express attitudes towards certain toponyms, but it is very likely that words they use to describe those can bear some negative connotation (e.g., Cholera, death). Correspondingly, such places should be identified as bearing negative sentiment by sentiment analysis tool. Additionally, they study the changes of sentiment towards a particular place over time. Using the General Inquirer (GI) lexicon (Stone et al., 1968) to identify positive and negative terms in the document, they assign each document a sentiment score by summing up the weights of negative and positive words and normalizing them by the document length. The authors conclude that the results of their analysis look promising, especially regarding negatively scored articles. However, the authors find difficulties interpreting positively ranked documents, which may be due to the fact that negative information is more salient.

Tracking Sentiment

Rebora (2017) introduces preliminary results of a project aimed at modeling the reception of secondary Italian literature in 19th-century England. The project is in the starting phase at the time of writing, with the goal of producing graphs quantifying the amount of text dedicated to each Italian writer with an indication of positive or negative reception. The sentiment analysis module of the project pipeline will make use of existing off-the-shelf tools, e.g. SentiStrength (http://sentistrength.wlv.ac.uk/), as well as available lexica (NRC Word-Emotion Association Lexicon) and manual annotation of the corpus.

Taboada et al. (2006b, 2008) present the results of a sentiment analysis task aimed at tracking the literary reputation of six authors writing in the first half of the 20th century, namely John Galsworthy, Marie Corelli, Arnold Bennett, D.H. Lawrence, Virginia Woolf, and T.S. Eliot. Both studies present preliminary results of pilot studies. The research questions raised in the project are how reputation is made or lost, and how what is written about an author and his or her work correlates with the author’s reputation and subsequent canonicity. To that end, the project’s goal is to examine critical reviews of the six authors’ writing and map information contained in the critical texts to the authors’ reputation. The material they work with includes not only reviews, but also press notes, press articles, and letters to editors (including from the authors themselves). For the pilot project with Galsworthy and Lawrence they have collected and scanned 330 documents (480,000 words). The documents are tagged for parts of speech, and relevant words (positive and negative) are extracted using custom-made sentiment dictionaries. The sentiment orientation of rhetorically important parts of the texts is then measured. However, no information is available at the moment about the next steps of the project, namely mapping semantic orientation to the reputation of the authors.

Chen et al. (2012) use as input personal narratives of Korean “comfort women” who had been forced into sexual slavery by the Japanese military during World War II. Adapting the WordNet-Affect lexicon (Strapparava and Valitutti, 2004), Chen et al. build their own emotion dictionary to spot the emotional keywords in the women’s stories and map the sentences to emotion categories. By adding variables of time and space, Chen et al. provide a unified framework of collective remembering of this historical event as witnessed by the victims.

Finally, though no publications are available at the time of writing, an interesting project to follow is the Oceanic Exchanges project (Cordell et al., 2017) that started in late 2017 (http://oceanicexchanges.org/). The goal of the project is to trace information exchange in 19th-century newspapers and journals, with sentiments being one of the angles of the analysis.

Other papers in this category put emphasis not so much on the sentiments expressed by the writers of the past, but rather on methodology of sentiment detection from old texts. This is especially warranted in DH domain, as without proper methodology interpretation may be less reliable.

Marchetti et al. (2014) and Sprugnoli et al. (2016) present the integration of sentiment analysis in the ALCIDE (Analysis of Language and Content In a Digital Environment) project (http://celct.fbk.eu:8080/Alcide_Demo/). The goal of the project is the analysis of historical texts with particular focus on the writings of Alcide De Gasperi, an Italian politician who founded the Christian Democracy Party in the beginning of the twentieth century. The aim of the sentiment analysis module in the NLP pipeline of the project is “to quantify the general sentiment of single documents, to track the attitude towards a specific topic or entity over time … and to allow specific search based on sentiment.” Sprugnoli et al. integrate this functionality in two ways: 1) based on prior polarity, and 2) based on contextual polarity. Using WordNet-Affect and SentiWordNet (Baccianella et al., 2010) as sources of polarity terms and MultiWordNet (Pianta et al., 2002) as a source of their Italian counterparts, Sprugnoli et al. assign each document a polarity score by summing up the words with prior polarity and dividing by the number of words in the document. A positive global score leads to positive document polarity and a negative global score leads to negative document polarity. To overcome the issue of neglected context in assigning the polarity, they adopt a methodology based on contextual polarity (they identify two topics of interest), for which purpose they have data annotated by two expert annotators and one non-expert annotator. The results indicate that the overall accuracy of assigning polarity is higher with contextual polarity (68.30% contextual vs. 43% prior polarity), and is especially better when dealing with negative and neutral polarity. At the same time, the prior polarity approach shows a more robust performance on positive examples. 
The overall conclusion of their work is that assignment of polarity in the historical domain is an extremely challenging task largely due to lack of agreement on polarity of historical sources between human annotators.
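The prior-polarity scoring described above (sum of word polarities divided by document length) can be sketched as below. This is a hedged illustration rather than the ALCIDE implementation: `PRIOR_POLARITY` is a hypothetical four-word lexicon with +1/-1 weights, and labeling a zero score as neutral is an assumption.

```python
# Hypothetical prior-polarity lexicon: +1 for positive terms, -1 for negative ones.
PRIOR_POLARITY = {"peace": 1, "hope": 1, "war": -1, "crisis": -1}


def document_polarity(tokens, lexicon=PRIOR_POLARITY):
    """Return (score, label): summed prior polarities normalized by document length."""
    if not tokens:
        return 0.0, "neutral"
    score = sum(lexicon.get(t.lower(), 0) for t in tokens) / len(tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"  # assumption: the zero case is treated as neutral
    return score, label


negative_doc = document_polarity("the war brought crisis but also hope".split())
negated_doc = document_polarity("there is no hope".split())
```

The second call returns a positive label for "there is no hope", which illustrates why ignoring context (here, negation) motivates the contextual-polarity approach.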

Buechel et al. (2017) present a novel method for measuring emotions in non-contemporary German texts. Challenged by the problem of applicability of existing emotion lexicons to historical texts, Buechel et al. propose a new method of constructing affective lexicons that would adapt well to German texts written up to three centuries ago. In their study, Buechel et al. use a representation of affect based on the Valence-Arousal-Dominance (VAD) model. Presumably, such a representation provides a finer-grained insight into the literary text (Buechel et al., 2016) and is more expressive than discrete categories, as it quantifies the emotion along three different dimensions. To induce a historical VAD lexicon, they use the lexicon of Schmidtke et al. (2014) as a source of seed values for the German language. They compare several lexicon expansion algorithms and evaluate them by comparing their induced historical lexicon against the judgement of knowledgeable PhD students from the humanities. They find that the Turney-Littman algorithm (Turney and Littman, 2003) performs best in this set-up and use it in the rest of the analysis. As a basis for the analysis, they collect German texts from the Deutsches Textarchiv (http://www.deutschestextarchiv.de/) written between 1690 and 1899. The resulting corpus is split into seven slices, each spanning 30 years. For each slice they compute word similarities using the algorithm of Levy et al. (2015) and then apply the Turney-Littman expansion algorithm, thus obtaining seven distinct emotion lexicons, each corresponding to a specific time period. Such a procedure, the authors argue, makes it possible to trace the shift in emotion association of words over time. To support the claim, they select the words for which they could compute similarity scores in each time step and visualize the overall development of these. 
For example, they show the development of Sünde (sin) coincides with the age of enlightenment, which is often understood as a starting point of secularization, and acquires an additional moral meaning in the end of the analyzed period, which was not present in the beginning (Ausschweifung - excess, Ärgernis - nuisance, and Laster - vice). Finally, Buechel et al. find clear emotional signals for evolving distinctions between the principal German literary forms, Narrative, Lyric, and Drama, finding the most distinct emotional patterns between 1780 and 1809 (roughly corresponding to the Weimar Classicism) and 1870 and 1899 (corresponding to the late German realism).
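The idea behind Turney-Littman-style lexicon expansion can be illustrated with a simplified sketch: a word's orientation is its summed similarity to positive seed words minus its summed similarity to negative seed words. The version below is an assumption-laden simplification, not Buechel et al.'s procedure: it uses binary seed sets instead of graded VAD seed values, cosine similarity, and invented two-dimensional vectors (`slice_vectors`) standing in for the word representations of one time slice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_orientation(word, vectors, pos_seeds, neg_seeds):
    """Similarity to positive seeds minus similarity to negative seeds."""
    w = vectors[word]
    pos = sum(cosine(w, vectors[s]) for s in pos_seeds)
    neg = sum(cosine(w, vectors[s]) for s in neg_seeds)
    return pos - neg


# Invented toy vectors for a single time slice (illustrative only).
slice_vectors = {
    "gut": (1.0, 0.1),
    "böse": (0.1, 1.0),
    "Tugend": (0.9, 0.2),
    "Sünde": (0.2, 0.9),
}

so_tugend = semantic_orientation("Tugend", slice_vectors, ["gut"], ["böse"])
so_suende = semantic_orientation("Sünde", slice_vectors, ["gut"], ["böse"])
```

Running the same function on vectors trained separately for each 30-year slice would give one induced score per word and period, which is the kind of series that can then be compared over time.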

4.4 Character Network Analysis and Relationship Extraction

The papers reviewed in the previous subsections of this survey address sentiment analysis of literary texts mainly on a document level. This abstraction is warranted if the goal is to get an insight into the distribution of emotion lexicon in a corpus of books. However, literary texts, particularly novels, are not abstract categories. Moreover, the emotions depicted in books do not exist in the isolation, but are associated with characters, who are at the core of any literary narrative (Ingermanson and Economy, 2009). This leads us to a question of what sentiment analysis can tell us about the characters. How emotional are they? And what role do emotions play in their interaction?

Character relationships have been analyzed in computational linguistics from a graph theoretic perspective, particularly using social network analysis (e.g., Agarwal et al. (2013) and Elson et al. (2010)). Fewer works, however, address the problem of modeling character relationships in terms of sentiment. Below we provide an overview of several papers that propose the methodology for extracting this information.

Sentiment dynamic between characters

Examples of character network and relationship analysis with sentiment are Nalisnick and Baird (2013a, b), who present automatic methods for analyzing sentiment dynamics between characters in plays. The goal of the study by Nalisnick and Baird (2013a) is to track the emotional trajectories of interpersonal relationships. The structured format of a dialog allows them to identify who is speaking to whom, which makes it possible to mine character-to-character sentiment by summing the valence values of words that appear in the continuous direct speech and are found in the AFINN lexicon (Nielsen, 2011). They also show that their method provides adequate predictions about three of Shakespeare’s best-known relationships, namely Othello vs. Desdemona, Romeo vs. Juliet, and Petruchio vs. Katharina. Nalisnick and Baird (2013b) is an extension of the previous research by the same authors that introduces the concept of a “sentiment network”, a dynamic social network of characters. Changing polarities between characters are modeled as edge weights in the network. Motivated by the desire to explain such networks in terms of a general sociological model, Nalisnick and Baird test whether Shakespeare’s plays obey Structural Balance Theory (SBT) (Marvel et al., 2011), which postulates that a friend of a friend is also your friend. Using the procedure proposed by Marvel et al. on their Shakespearean sentiment networks, Nalisnick and Baird test whether they can predict how a play’s characters will split into factions using only information about the state of the sentiment network after Act II. The results of their analysis are varied and do not provide adequate support for SBT as a benchmark for network analysis in Shakespeare’s plays. One reason for that is the inadequacy of their shallow sentiment analysis methods, which cannot detect such elements of speech as irony and deceit that play a pivotal role in many literary works. 
Unfortunately, Nalisnick and Baird do not mention whether their approach to sentiment networks can be extended to works of literature other than plays.
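A minimal sketch of the two ideas above: directed character-to-character sentiment as a sum of word valences, and a static sign check for structural balance in a triad. The valence table `TOY_VALENCE` and the example utterances are invented for illustration, and the triad check is the textbook sign rule, not the dynamic procedure of Marvel et al. that the authors apply.

```python
from collections import defaultdict

# Illustrative AFINN-style valence values (integers); not the real lexicon.
TOY_VALENCE = {"love": 3, "sweet": 2, "hate": -3, "villain": -3}


def character_sentiment(utterances, valence=TOY_VALENCE):
    """Sum word valences per directed (speaker, addressee) pair.

    utterances: iterable of (speaker, addressee, text) triples.
    """
    edges = defaultdict(int)
    for speaker, addressee, text in utterances:
        edges[(speaker, addressee)] += sum(
            valence.get(word, 0) for word in text.lower().split()
        )
    return dict(edges)


def triad_is_balanced(w_ab, w_bc, w_ac):
    """A triad is balanced when the product of its edge signs is positive."""
    return w_ab * w_bc * w_ac > 0


utterances = [
    ("Romeo", "Juliet", "my sweet love"),
    ("Juliet", "Romeo", "love"),
    ("Tybalt", "Romeo", "thou villain"),
]
edges = character_sentiment(utterances)
```

Under the sign rule, two friends with a common enemy form a balanced triad, whereas two friends who disagree about a third character do not.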

Character analysis and character relationships

Elsner (2012, 2015) deals with plot structure analysis using the notion of a kernel, a similarity measure between two novels in terms of the characters and their relationships. These kernels allow abstracting away from the events of a story and focusing instead on important characters and relationships. Elsner (2012) focuses on 19th-century romances (41 books). Each book is split into chapters and each chapter is split into paragraphs, from which characters are extracted (using a manually predefined list of characters for each book) and dependency tree-based unigrams are computed for each character. There are two types of time-varying features recorded for each character: 1) the character’s frequency as a proportion of all character mentions in the text, and 2) the frequency with which each character is associated with emotional language. Paragraphs that mention only one character are searched for all emotion words, with their counts added to the character’s total. The chapter’s score is an average over paragraphs (normalized by the count of emotional words in the whole text). For pairwise character relationships, Elsner counts the paragraphs in which only two characters are mentioned, using this number as a measurement of the strength of the relationship between the characters. The kernels are evaluated on their ability to distinguish real novels in the dataset from artificially constructed surrogates of these novels that are obtained by permuting the chapters, randomly reassigning character relationships, and putting chapters in reverse order. The performance of the approach is tested against a sentiment-only baseline that tracks only sentiment progress throughout the story without taking into account the characters it is associated with. The kernel performs a ranking task, deciding whether a test novel is a real one or an artificial construction using a weighted nearest neighbor strategy. 
The accuracy achieved by their system is 90% for the order and character task (significantly higher than random baseline of 50%), and 67% for the reverse task (not significantly higher than random guess). Elsner (2015) extends the previous research by adding a more diverse emotion lexicon (NRC) and making use of LDA topic modeling.

Kim and Klinger (2018) contribute the REMAN corpus (http://www.ims.uni-stuttgart.de/data/reman) of literary texts with annotations of emotions, experiencers, causes and targets of the emotions. The goal of the annotation project was to enable the automatic extraction of character relationships and causes of emotions experienced by the characters. The authors suggest that the results of coarse-grained emotion classification (with BoW) in literary text are not readily interpretable, as they do not tell much about who is the experiencer of the emotion. Indeed, if a text mentions two characters, one of whom is angry and another one is scared because of that, BoW models will only tell us that the text is about anger and fear, but nothing else. Hence, a finer-grained approach towards character relationship extraction is warranted. Kim and Klinger conduct small experiments on the annotated dataset showing that a fine-grained approach to emotion prediction with long short-term memory networks outperforms BoW models (an increase of 12 pp). At the same time, the results of their experiments suggest that joint prediction of emotions and experiencers can be more beneficial than learning these categories separately.

Barth et al. (2018) develop a character relation analysis tool, rCAT (http://www.rcat-ims.de/). The tool aims at providing an easy-to-use, reusable solution for automatic extraction of relational information from a text. The tool implements a distance parameter (based on token space) for finding pairs of characters who interact. In addition to the general context words that characterize each pair of characters, the tool provides an emotional filter to restrict character relationship analysis to emotions only. A tool presented in Jhavar and Mirza (2018) provides a similar functionality: given an input of two character names from the Harry Potter series, the EMOFIEL tool (https://gate.d5.mpi-inf.mpg.de/emofiel/) identifies the emotion flow between a given directed pair of story characters. These emotions are identified using categorical (Plutchik, 1991) and dimensional (Russell, 1980) emotion models. At the moment of writing, the tool only works with the books and characters from the Harry Potter series.

Egloff et al. (2018) present ongoing work on the Ontology of Literary Characters (OLC), which makes it possible to capture and infer characters’ psychological traits from their linguistic descriptions. The OLC incorporates the Ontology of Emotion (Patti et al., 2015), which is based on both Plutchik’s model and the Hourglass model (Cambria et al., 2012) of emotions. Two additional models are used: a preliminary ontology of narrative roles, and a model of psychological profiles based on the Big Five Model of personality traits (Digman, 1990). The ontology encodes thirty-two emotional concepts, each of which is associated with the Big Five Model. Based on their natural language description, characters are attributed a psychological profile along the classes of Openness to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. The ontology links each of these profiles to one or more archetypal categories of hero, anti-hero, and villain. Egloff et al. emphasize that, using the semantic connections of the OLC, it is possible to infer characters’ psychological profiles and the role they play in the plot.

Finally, a small body of work exists that focuses on mathematical modeling of character relationships. Rinaldi et al. (2013) contribute a model that describes the love story between the Beauty and the Beast through ordinary differential equations. Zhuravlev et al. (2014) introduce a distance function to model the relationship between the protagonist and other characters in two masochistic short novels by Ivan Turgenev and Sacher-Masoch. Borrowing some instruments from literary criticism and using ordinary differential equations, Zhuravlev et al. are able to reproduce the temporal and spatial dynamics of the love plot in the two novellas more precisely than had been done in previous research. Jafari et al. (2016) present a dynamic model describing the development of character relationships based on differential equations. The proposed model is enriched with complex variables that can represent complex emotions such as coexisting love and hate.

4.5 Emotion Tracking

We have seen that sentiment analysis as applied to literature can be used for a number of downstream tasks, such as classification of texts based on the emotions they convey, genre classification based on emotions, sentiment analysis in the historical domain, and character relationship extraction. However, the application of sentiment analysis is not limited to these tasks. In this concluding part of the survey, we review some papers that do not formulate their approach to sentiment analysis as a downstream task. Often, the goal of these works is to understand how sentiments and emotions are represented in literary text in general, and how sentiment or emotion content varies across specific documents or a collection thereof over time, where time can be either relative to the text in question (from beginning to end) or to the historical changes in language (from past to present). Such information is valuable for gaining a deeper insight into how sentiments and emotions change over time, making it possible to bring forward new theories or shed more light on existing literary or sociological theories.

All papers presented in this section share a common approach to the recognition of affect in text, namely various lexical resources. We start from the earliest research in this field and gradually move towards the present day works that, methodology-wise, have not dramatically changed during the last thirty years. This may come as a surprise to the reader, but we will return to this issue later in the discussion.

Early works

One of the earliest works in this direction is Anderson and McMaster (1986), which starts from the premise that reading enjoyment stems from the affective tones of a text. These affective tones create a “residual tension”, i.e., conflict, that can rise to a climax through a series of crises, which is necessary for a work of fiction to be attractive to the reader. The work is more qualitative in spirit and aims at validating the idea that automatic detection of emotional words and scoring of the corresponding passages leads to an objective interpretation of the emotional component of the stories. The paper illustrates two approaches to the analysis of emotional tones. The first approach is the production of tension scores for the text passages. The second approach is the generation of transition graphs that represent the development of emotional states, similar to the ones used by Reagan et al. (2016); Kim et al. (2017a); Samothrakis and Fasli (2015). Both approaches use the list of the 1,000 most common English words annotated with pleasure, arousal, and dominance ratings (Heise, 1965). To produce a residual tension score, Anderson and McMaster take the mean of the pleasure, arousal, and dominance ratings for each word in a text passage. The more negative the score is, the higher the residual tension is, and vice versa. To illustrate the second approach, Anderson and McMaster calculate residual tension for each consecutive 100 words of the story Boys at a Picnic, plot the resulting numbers on a graph and provide a qualitative analysis of the peaks. The overall conclusion of this analysis is that a reader who has access to the text would be able to find a correlation between events in the story and peaks on the graph. However, the authors still stress that such interpretation remains dependent upon the judgement of the reader. 
The message of the paper is that the presented approaches to the modeling of emotional tones can be used in objective comparisons of different stories and can lead to fresh interpretations of their impact on the reader.
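One way to read the residual tension computation described above is sketched below: average the pleasure, arousal, and dominance ratings of each rated word, average those values over a passage, and repeat for consecutive 100-word windows. The ratings in `TOY_PAD` are invented, and both the per-word averaging and the handling of passages without rated words (returning `None`) are assumptions, since the survey does not spell out these details.

```python
# Invented (pleasure, arousal, dominance) ratings; the original work uses Heise (1965).
TOY_PAD = {
    "picnic": (1.5, 0.5, 0.3),
    "cry": (-1.2, 0.9, -0.6),
    "boys": (0.4, 0.6, 0.2),
}


def tension_score(tokens, ratings=TOY_PAD):
    """Mean of the per-word PAD averages; None if no token is rated."""
    rated = [ratings[t] for t in tokens if t in ratings]
    if not rated:
        return None
    return sum(sum(pad) / 3 for pad in rated) / len(rated)


def tension_curve(tokens, window=100, ratings=TOY_PAD):
    """Tension score for each consecutive, non-overlapping window of tokens."""
    return [
        tension_score(tokens[i:i + window], ratings)
        for i in range(0, len(tokens), window)
    ]


curve = tension_curve(["boys"] * 120 + ["cry"] * 30, window=100)
```

Plotting `curve` against the window index gives the kind of graph whose peaks and troughs are then interpreted qualitatively; more negative values correspond to higher residual tension.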

Anderson and McMaster (1993) extend their previous work by offering a more practically oriented analysis. Challenged by the negative press responses to the modified version of Beatrix Potter’s The Tale of Peter Rabbit with simplified text, Anderson and McMaster propose an objective evaluation of the original and revised versions (each composed of three stories) with their computer system presented in Anderson and McMaster (1982). Similar to their earlier work, they first analyze both versions for emotional tones (as expressed in the pleasure, arousal, and dominance dimensions) and then compare the mean values for both versions using MANOVA (Hair et al., 1998). The results of the analysis show that the revision of these stories did not result in significant differences in mean levels of pleasure, arousal, and dominance, except in the case of one story. As for emotional transitions, the analysis shows that the emotional state spectrographs of all stories have been shifted and their indices of emotional fluctuation were lowered, which in the authors’ opinion lowered the excitement of the revised version. Unfortunately, Anderson and McMaster do not report whether these differences in emotional patterns are statistically significant.

Child-directed texts are also a focus of analysis in Alm and Sproat (2005). The study presents the results of an emotion annotation task on 22 tales by the Brothers Grimm and evaluates patterns of emotional story development. Specifically, Alm and Sproat pose three research questions in their study: 1) Do fairy tales tend to have a neutral beginning and a happy ending? 2) Does neutral emotion (no emotion) dominate in the sentence context? and 3) Which emotions tend to be prolonged and which not? To answer these questions, two annotators are first given the task of annotating each sentence either as neutral or with one of the following basic emotion categories: angry, disgusted, fearful, happy, sad, positively surprised and negatively surprised. The reported inter-annotator agreement scores are lower than in other, more standard linguistic annotation tasks, which confirms the difficulty of the task and shows that emotion annotation is highly sensitive to subjective interpretations. At the same time, the majority of the sentences in the corpus (60%) were annotated as neutral, while other emotion classes represent a much smaller proportion of the resulting data (e.g., anger: 12%, fear: 7%, happy: 7%). Then, the corpus is statistically evaluated by a one-way significance test for the difference between the proportions of time that emotion i and emotion j immediately preceded or followed emotion k. The test shows that the neutral category occurs significantly more frequently in the first sentence and happy in the last sentence. The test also reveals that neutral is indeed the dominating category in the sentence context. Finally, the results of the significance test show that anger and sadness are significantly more often preceded and followed by themselves, that disgust is more often preceded by itself, and that there is no statistically significant evidence for the happy, fearful and two surprised categories. Lastly, Alm and Sproat explore whether tales show a particular emotional trajectory. 
To that end, they combine emotions into positive, negative and neutral categories and divide each story into five parts for which aggregate frequency counts of combined emotion categories are computed. The resulting numbers are plotted on a graph that shows a wave-shaped pattern. From this graph, Alm and Sproat argue, one can see that the first part of the fairy tales is the least emotional, which is probably due to scene setting, while the last part shows an increase in positive emotions, which may signify the happy ending.

Recent studies

Both Mohammad (2011) and Mohammad (2012b) are showcases for sentiment analysis and visualization used to quantify and track emotions in individual books and large collections of books, and provide an insight into differences in emotion word density, as well as emotional trajectories, between books of different genres. Emotion word density is defined as the number of times a reader will encounter an emotion word on reading every X words, where X is set to 10,000. In addition, each text is assigned an emotion score for each emotion, calculated as the ratio of words associated with that emotion to the total number of emotion words occurring in the text. Both metrics use the NRC Affective Lexicon to find occurrences of emotion words. However simple this approach is, the author argues, it is still effective in determining whether a large piece of text has more emotional content compared to others in a corpus. Using the Corpus of English Novels (CEN) (https://perswww.kuleuven.be/~u0044428/cen.htm) and the Fairy Tale Corpus (FTC) (https://www.l2f.inesc-id.pt/wiki/index.php/Fairy_tale_corpus) for the experiments, the author calculates the polarity and emotion word density for each of the novels and each of the fairy tales. The analysis of the resulting numbers suggests that mean densities for anger and sadness across CEN and FTC are not significantly (p < 0.001) different. At the same time, fairy tales have significantly higher anticipation, disgust, joy and surprise word densities, but lower trust word density when compared to novels. While word density scores are used to quantify the distribution of emotion words in different texts, emotion scores are used to visualize the emotional trajectories. To provide an intuitive visualization of the flows of emotions, the author takes three texts from different genres, namely Shakespeare’s Hamlet (tragedy) and As You Like It (comedy), and Mary Shelley’s Frankenstein (horror). 
Each book is then split into 20 consecutive segments and emotion scores for each segment are calculated and plotted on graphs, which show that the novel is progressively darker than the plays, which is especially notable in the final segments. At the same time, the tragedy is depicted with much higher levels of fear and lower levels of trust towards the end, while the comedy is shown to have the opposite trends.
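The two metrics and the segmentation step described above can be sketched as follows. This is a minimal illustration rather than Mohammad's implementation: the tiny `TOY_LEXICON` is a hypothetical stand-in for the NRC lexicon, tokens are assumed to be pre-split, and all function names are ours. The denominator of the emotion score is taken to be the number of tokens found in the lexicon, which is one possible reading of "total number of emotion words".

```python
from collections import Counter

# Hypothetical toy lexicon standing in for the NRC lexicon: word -> emotions.
TOY_LEXICON = {
    "afraid": {"fear"},
    "happy": {"joy"},
    "murder": {"fear", "anger", "sadness"},
    "friend": {"joy", "trust"},
}


def emotion_counts(tokens, lexicon):
    """Count, for each emotion, the tokens the lexicon associates with it."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token.lower(), ()):
            counts[emotion] += 1
    return counts


def emotion_word_density(tokens, lexicon, emotion, per=10_000):
    """Number of `emotion` words a reader meets per `per` words of text."""
    if not tokens:
        return 0.0
    return emotion_counts(tokens, lexicon)[emotion] * per / len(tokens)


def emotion_scores(tokens, lexicon):
    """Ratio of words tied to each emotion to all emotion-word tokens."""
    counts = emotion_counts(tokens, lexicon)
    n_emotion_tokens = sum(1 for t in tokens if t.lower() in lexicon)
    if n_emotion_tokens == 0:
        return {}
    return {emotion: c / n_emotion_tokens for emotion, c in counts.items()}


def split_segments(tokens, n_segments=20):
    """Split a token list into consecutive, near-equal segments."""
    bounds = [i * len(tokens) // n_segments for i in range(n_segments + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


# Toy usage: eight tokens, four of which are in the lexicon.
example_tokens = "my friend was happy but afraid of murder".split()
example_scores = emotion_scores(example_tokens, TOY_LEXICON)
```

An emotional trajectory for a book would then be the list of `emotion_scores(segment, lexicon)` values over `split_segments(tokens)`.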

Klinger et al. (2016) is a case study in automatic emotion analysis of Kafka’s “Amerika” and “Das Schloss”. The goal of the work is to analyze the development of emotions in both texts, as well as to provide a character-oriented emotion analysis that reveals specific character traits in both novels. To that end, Klinger et al. develop German dictionaries of words associated with Ekman’s fundamental emotions plus contempt and apply them to both texts to automatically detect emotion words. To track emotions over the text, they apply a sliding window approach and assign an emotion score to each window of consecutive tokens by counting the occurrences of emotion words in the window and normalizing by the dictionary size. The same procedure is used for the character-oriented analysis, but it is restricted to windows around each mention of a character. The results of their analysis for “Das Schloss” show a striking increase of surprise towards the end and a peak of fear shortly after the start of chapter 3. In the case of “Amerika”, the analysis shows a decrease in enjoyment after a peak in chapter 4.
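A sliding-window score of the kind described above might look like the sketch below. The window size, the step, the radius around character mentions, and the two-word fear dictionary are illustrative assumptions, not the settings or resources used by Klinger et al.

```python
def window_score(chunk, vocab):
    """Emotion-word hits in a chunk, normalized by the dictionary size."""
    return sum(1 for token in chunk if token in vocab) / len(vocab)


def sliding_window_scores(tokens, emotion_words, window=5, step=1):
    """Score every window of `window` consecutive tokens."""
    vocab = {w.lower() for w in emotion_words}
    if not vocab:
        return []
    lowered = [t.lower() for t in tokens]
    last_start = max(len(lowered) - window + 1, 1)
    return [
        window_score(lowered[start:start + window], vocab)
        for start in range(0, last_start, step)
    ]


def character_scores(tokens, emotion_words, character, radius=2):
    """Score only the windows centred on each mention of `character`."""
    vocab = {w.lower() for w in emotion_words}
    if not vocab:
        return []
    lowered = [t.lower() for t in tokens]
    name = character.lower()
    return [
        window_score(lowered[max(0, i - radius):i + radius + 1], vocab)
        for i, token in enumerate(lowered)
        if token == name
    ]


# Toy usage with a hypothetical two-word fear dictionary.
toy_tokens = ["K.", "felt", "fear", "and", "dread", "then", "calm"]
toy_fear = {"fear", "dread"}
toy_curve = sliding_window_scores(toy_tokens, toy_fear, window=3)
toy_character = character_scores(toy_tokens, toy_fear, "K.", radius=2)
```

Normalizing by dictionary size makes scores comparable across emotions whose dictionaries differ in length, at the cost of values that are not bounded by 1 when a window is longer than the dictionary.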

Yet another work that tracks the flow of emotions in a collection of texts is Kim et al. (2017b). The authors hypothesize that literary genres can be linked to the development of emotions over the course of a text. To test this, they collect more than 2,000 books from five genres (adventure, science fiction, mystery, humor and romance) from Project Gutenberg and identify prototypical emotion shapes for each genre. The procedure for obtaining emotion scores for the text passages is similar to Klinger et al. (2016). However, the authors do not use a sliding window approach but rather adopt a simpler scheme of text segmentation inspired by the theory of dramatic structure (Freytag, 1863), which proposes five acts of a dramatic text: exposition, rising action, climax, falling action, and denouement. Consequently, each novel in the corpus is split into five consecutive equally-sized segments that, according to the authors, roughly correspond to the dramatic acts in Freytag’s theory. The results of the corpus analysis show that all five genres correspond closely with regard to sadness, anger, fear and disgust, i.e. a consistent increase of these emotions from Act 1 to Act 5, which may correspond to an entertaining narrative. Mystery and science fiction books show an increase in anger towards the end, and joy shows an inverse decreasing pattern from Act 1 to Act 2, with the exception of humor. Additionally, Kim et al. calculate the top 10 pointwise mutual information (PMI) scores (Church and Hanks, 1990) between the genres and the act in which an emotion peaks. The higher the score, the stronger the association between a genre and an emotion peak in a certain act. Some interesting emotion-genre associations are revealed by this analysis. For example, trust is characteristic of the beginning and the end of adventure and science fiction books, while humor is the only genre that does not contain joy among its top 10 associations.
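The association measure can be written as PMI(g, e) = log2( p(g, e) / (p(g) p(e)) ), where g is a genre and e is the event "emotion X peaks in act Y". The sketch below estimates these probabilities from per-book observations. The input format, the event labels such as "joy@1", and the function names are assumptions made for illustration, not the authors' code.

```python
import math
from collections import Counter


def peak_act(act_scores):
    """Return the 1-based index of the act with the highest emotion score."""
    return max(range(len(act_scores)), key=lambda i: act_scores[i]) + 1


def pmi_table(observations):
    """PMI for each observed (genre, event) pair.

    `observations` holds one (genre, event) tuple per book and emotion peak.
    """
    n = len(observations)
    joint = Counter(observations)
    genre_counts = Counter(genre for genre, _ in observations)
    event_counts = Counter(event for _, event in observations)
    return {
        (genre, event): math.log2(
            (count / n) / ((genre_counts[genre] / n) * (event_counts[event] / n))
        )
        for (genre, event), count in joint.items()
    }


def top_associations(observations, k=10):
    """The k (genre, event) pairs with the highest PMI."""
    table = pmi_table(observations)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy usage: four hypothetical books, events written as "emotion@act".
toy_observations = [
    ("humor", "joy@1"),
    ("humor", "fear@5"),
    ("mystery", "fear@5"),
    ("mystery", "fear@5"),
]
toy_pmi = pmi_table(toy_observations)
```

Because PMI is known to favour rare events, a real analysis would likely need a minimum-count threshold before ranking; that choice is left open here.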

Gao et al. (2016) employ an adaptive filter and perform fractal and multifractal analysis on the sentiment time series of thirteen literary texts. They show that, although each novel has its unique sentiment spikes and troughs, the sentiment time series still exhibit long-range correlations. The sentiment time series of the analyzed novels are characterized by a Hurst parameter larger than 1/2 and less than 1, which, according to the authors, is required for a novel to be both captivating and rich.
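To give an intuition for the Hurst parameter, the sketch below estimates it with classical rescaled-range (R/S) analysis. Note that this is a swapped-in, simpler technique and not the adaptive fractal analysis used by Gao et al.; the chunk sizes and the synthetic input series are assumptions. Values near 1/2 indicate an uncorrelated series, while values between 1/2 and 1 indicate persistent, long-range correlated behaviour.

```python
import itertools
import math
import random


def hurst_rs(series, min_chunk=8):
    """Estimate the Hurst exponent as the slope of log(R/S) on log(chunk size)."""
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_chunk
    while size <= n // 2:
        ratios = []
        for start in range(0, n - size + 1, size):
            chunk = series[start:start + size]
            mean = sum(chunk) / size
            deviations = [x - mean for x in chunk]
            # Range of the cumulative deviations from the chunk mean.
            cumulative = list(itertools.accumulate(deviations))
            value_range = max(cumulative) - min(cumulative)
            std = math.sqrt(sum(d * d for d in deviations) / size)
            if std > 0:
                ratios.append(value_range / std)
        if ratios:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for R/S analysis")
    # Ordinary least-squares slope.
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    sxx = sum((x - mean_x) ** 2 for x in log_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    return sxy / sxx


# Synthetic stand-ins for a sentiment time series (not real novel data).
random.seed(0)
noise = [random.gauss(0, 1) for _ in range(2048)]  # uncorrelated
walk = list(itertools.accumulate(noise))           # strongly persistent
h_noise = hurst_rs(noise)
h_walk = hurst_rs(walk)
```

R/S estimates are biased upward on short series, so on real sentiment curves the result should be read as a rough indicator rather than a precise value.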

Kakkonen and Galic Kakkonen (2011) report on their tool-building effort to support literary analysis of Gothic texts at the sentiment level. The authors introduce a system called SentiProfiler that generates visual representations of affective content in such texts and outlines similarities and differences between them. SentiProfiler uses WordNet-Affect to derive a list of emotion-bearing words that are used for the analysis. Architecturally, the system consists of three modules: an ontology, a sentiment analyzer, and a visualizer. The ontology corresponds to the WordNet-Affect hierarchy and the corresponding synsets from WordNet. It includes 147 classes under the negative emotion branch, each class being a hierarchy of emotion words, with a total of 823 emotion nouns, verbs, and adjectives. The sentiment analyzer creates a sentiment profile (SP) for an input document. To create an SP, the authors first detect sentiment-bearing words, relate each such word to the relevant sentiment class, and construct the hierarchy that describes the SP of a document. Finally, they construct a graph representation of the sentiment class hierarchy, where each graph vertex is associated with the number of times a word from the relevant sentiment class appears in a document. Additionally, they define a score that measures the dominance of each sentiment class in a document by counting the number of times a word instantiating the sentiment class appears in the document and dividing it by the total number of word tokens in the document. The resulting SP and scores are used by SentiProfiler to visualize the presence of sentiment in a particular document and to compare two different texts. To test the system, Kakkonen and Galic Kakkonen analyze SPs of Gothic novels, which are usually divided into the subgenres of terror and horror.
The results of their analysis applied to two books from these subgenres indicate that differences can be observed in the relative frequency and presence of certain sentiment classes. For example, such classes as sorrow, depression, and anxiety are more frequent in the subgenre of terror, while horror, disgust, and repugnance are prevalent in the horror subgenre. The authors conclude that the results produced by SentiProfiler support the literary theoretic distinctions between these two subgenres. The main disadvantage of the system proposed in this study is that, at least at the reported stage, it does not scale to other genres of literature and does not go beyond negative emotions.
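The class dominance score described above, and the comparison of two documents, can be illustrated with a short sketch. The three-class `TOY_ONTOLOGY` is a hypothetical stand-in for the WordNet-Affect hierarchy, and the data layout and function names are our assumptions rather than SentiProfiler's actual design.

```python
from collections import Counter

# Hypothetical miniature ontology: class -> parent class and member words.
TOY_ONTOLOGY = {
    "negative-fear": {"parent": None, "words": {"fear", "dread"}},
    "horror": {"parent": "negative-fear", "words": {"horror", "ghastly"}},
    "sadness": {"parent": None, "words": {"sorrow", "grief"}},
}


def sentiment_profile(tokens, ontology):
    """Map each class present in the text to its count and dominance score."""
    total = len(tokens)
    word_counts = Counter(token.lower() for token in tokens)
    profile = {}
    for name, node in ontology.items():
        hits = sum(word_counts[word] for word in node["words"])
        if hits:
            profile[name] = {
                "parent": node["parent"],
                "count": hits,
                "score": hits / total,  # share of all word tokens
            }
    return profile


def compare_profiles(profile_a, profile_b):
    """Score difference (a minus b) for every class present in either profile."""
    classes = set(profile_a) | set(profile_b)
    return {
        name: profile_a.get(name, {}).get("score", 0.0)
        - profile_b.get(name, {}).get("score", 0.0)
        for name in classes
    }


# Toy usage on two invented nine-word passages.
doc_a = "the ghastly horror filled her with dread and sorrow".split()
doc_b = "his grief and sorrow grew with every passing day".split()
profile_a = sentiment_profile(doc_a, TOY_ONTOLOGY)
profile_b = sentiment_profile(doc_b, TOY_ONTOLOGY)
difference = compare_profiles(profile_a, profile_b)
```

Keeping the parent link on each entry is what would allow a visualizer to draw the profile as a hierarchy rather than a flat list.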

Koolen (2018) uses sentiment information as one of the angles of stylistic analysis of books written by and marketed towards women. Koolen mentions that there is a bias among readers who equate works by female authors with “women’s books”, which are perceived as being of lower literary quality. The author investigates how much “women’s books” (here, romantic novels written by women) differ from novels that are perceived as literary (female and male-authored literary fiction). The corpus used in the study is a collection of European and North-American novels translated into Dutch. Koolen uses a Dutch version of Linguistic Inquiry and Word Count (Boot et al., 2017), a dictionary that contains content and sentiment-related categories of words, to count the number of words from different categories in each type of fiction. The analysis shows that romantic novels contain more positive emotions and words pertaining to friendship than literary fiction. The amount of negative emotions is not significantly different. There are also more job-related words in the romantic novels than in literary fiction written by women. However, female-authored and male-authored literary novels do not differ significantly in any category. The author speculates that readers tend to stress the commonalities between female-authored literary fiction and romantic novels but overlook the commonalities between female and male authors. This might explain why female authors’ novels are judged to have less literary quality.

Other approaches to sentiment and emotion tracking in literature, such as Morin and Acerbi (2017), focus on larger-scale data spanning hundreds of thousands of books. Morin and Acerbi (2017) is an extension of a previous study by the same authors (Acerbi et al., 2013) that revealed a steady decrease in the presence of emotion-related words in 20th-century English-language printed literature. The paper provides grounding for the hypothesis that the decrease in emotion-related words is a linguistic and cultural phenomenon. From the Google Books corpus (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html), they select 307,527 books written between 1900 and 2000 and collect, for each year, the total number of case-insensitive occurrences of emotion terms that are found under the positive and negative taxonomies of the LIWC dictionary (Pennebaker et al., 2007). The number is then normalized by the total number of 1-grams in the sample for each year, and finally the frequencies are summed. The resulting value is used as an indicator of trends in the usage of emotion-related words. For validation purposes, they use two small hand-crafted book corpora that cover two centuries. The main findings of their research are the following: 1) emotionality (both positive and negative emotions) declines with time in all three corpora, 2) this decline is driven by the decrease in usage of positive vocabulary, 3) such a decline cannot be explained by changes in demographic dynamics, as sociological reports confirm that the self-reported happiness of the population steadily increased throughout the twentieth century, and 4) the age of the authors covaries with negative emotionality, with older authors using fewer negative emotion-related words. Among possible reasons for such a decline in the usage of positive vocabulary, Morin and Acerbi propose a “regression-to-the-mean” hypothesis.
The hypothesis starts from the observation that, because it is not possible to get reliable data before 1800, we cannot positively conclude that emotionality has always declined. The Romantic period was dominated by emotionality in writing, which could be the effect of a group of writers who wrote above the mean. If one assumes that each new writer tends to copy the emotional style of his or her predecessors with a random error factor, then writers at one point in time are disproportionately influenced by this group of above-the-mean writers. However, this trend does not last forever and, sooner or later, reverts to the mean, as each writer reverts to a normal level of emotionality.
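The yearly normalization step of the emotionality indicator can be sketched as below. The nested-dictionary input format and the function name are assumptions for illustration; the actual Google Books 1-gram files and the LIWC term lists are not reproduced here.

```python
def yearly_emotion_frequency(unigram_counts, emotion_terms):
    """Summed relative frequency of emotion terms for each year.

    `unigram_counts` maps year -> {word: count}. Matching is
    case-insensitive, and each year's hits are divided by that
    year's total number of 1-grams.
    """
    terms = {term.lower() for term in emotion_terms}
    frequencies = {}
    for year, counts in unigram_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no data for this year
        hits = sum(c for word, c in counts.items() if word.lower() in terms)
        frequencies[year] = hits / total
    return frequencies


# Toy usage with invented counts and a hypothetical two-term emotion list.
toy_counts = {
    1900: {"Happy": 2, "happy": 1, "sad": 1, "the": 6},
    2000: {"happy": 1, "the": 9},
}
toy_trend = yearly_emotion_frequency(toy_counts, {"happy", "sad"})
```

Summing the hits before dividing is equivalent to summing the per-term relative frequencies, since all terms share the same yearly denominator.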

An earlier work (Bentley et al., 2014), written in collaboration with one of the authors of Morin and Acerbi (2017), provides a somewhat different approach to and interpretation of the decline in positive vocabulary in English books of the 20th century. Using the same dataset and lexical resources (plus WordNet-Affect), Bentley et al. find a strong correlation between expressed negative emotions and the U.S. economic misery index, which is especially strong for the books written during and after WW1 (1918), the Great Depression (1935), and the energy crisis (1975). However, in the later study (Morin and Acerbi, 2017), the authors argue that the extent to which positive emotionality correlates with subjective well-being is debatable. Morin and Acerbi provide more possible reasons for this effect, as well as a detailed statistical analysis of the data, so we refer the reader to the original paper for more information.

5 Conclusion

5.1 Summary

We provided an overview of the research related to sentiment and emotion analysis in computational literary studies. We argued that emotion and sentiment are an important dimension of literature, which is rooted in the important role emotions play in human life. Given their status, emotions have often been a crucial part of compelling narratives: literature tells about people who have goals, desires, passions, and intentions. The existing research in media psychology suggests that emotions we encounter in fiction evoke an emotional and cognitive response in the reader. This in turn improves our ability to understand other people and, at the same time, provides us with a tool to interpret literary texts along an affective dimension (see Section 1).

There has been growing interest in computationally-oriented research within the humanities. This interest, combined with the appeal of emotions and their role in literature, has given rise to the computational analysis of literary texts through the prism of affect. While there were only a few related studies in the 1980s and 1990s, nowadays the proceedings of many conferences on digital humanities and computational linguistics include at least several papers that focus on sentiment and emotion analysis of literary texts.

Although the works by researchers from different backgrounds may have different goals, methodological or interpretative ones, application-wise, most papers can be easily sub-grouped into several categories: modelling emotions in the past, classification of literature by emotions, classification of genres and story-types based on emotion and sentiment features, character relationship extraction, and emotion analysis in general.

5.2 Discussion

We now proceed to the discussion, which aims at highlighting some of the observations we made when reviewing the literature. It is not our goal to criticize some papers and praise others. Instead, our goal is to detect commonalities and discrepancies between the digital humanities and computational linguistics communities and to find a common framework of research in sentiment and emotion analysis within a DH paradigm.

We have shown throughout this survey that there is a growing interest in sentiment and emotion analysis within digital humanities. Given that DH has emerged as a thriving science within the past decade, it may safely be said that this direction of research is relatively new. At the same time, research in sentiment analysis started in computational linguistics almost two decades ago and is nowadays an established field that has dedicated workshops and tracks in the main computational linguistics conferences. Moreover, a recent meta-study by Mäntylä et al. (2018) shows that the number of papers in sentiment analysis is rapidly increasing each year. Indeed, the topic has not yet exhausted itself and we should not expect to see it vanish within the next decade or two, provided that no significant paradigm shift in the computational sciences takes place. One may wonder whether the same applies to sentiment analysis in digital humanities scholarship. Will the interest in the topic grow continuously, or will it rise to a peak and vanish in a few years?

There is no decisive answer. However, there is evidence from computational linguistics, where the popularity of sentiment analysis may have reached its peak but is far from fading. This is rooted in the way research in computational linguistics is done. Application-wise, not a lot has changed during the past years: researchers are still interested in predicting sentiment and emotion from text for different purposes. If anything has changed, it is the methodology. Early research in sentiment analysis relied on word polarity and specific dictionaries. Modern state-of-the-art approaches rely on word embeddings and deep learning architectures, achieving prediction accuracy around 90% (e.g., Abdul-Mageed and Ungar (2017)). Having started with simple polarity detection, contemporary sentiment analysis has advanced to a more nuanced analysis of sentiments and emotions (Mäntylä et al., 2018).

The situation is somewhat different in digital humanities research. Most of the works that exist there rely on affective lexicons and word counts, a technique for detecting emotions in literary text first used by Anderson and McMaster (1982). Even the most recent works base the interpretation of the results on the use of dictionaries and counts of emotion-bearing words in a text, passage, or sentence. In fact, around 70% of the papers that we discussed in Section 4 substantially rely on the use of various lexical resources for detecting emotions (see Table 1 for a summary of methods used in the reviewed papers). We have discussed some limitations of this approach in Section 4.2. Let us reiterate its weakness with the following small example. Consider the sentence “Jack was afraid of John because John held a knife in his hand”. Assuming a dictionary of emotion-bearing words is used, the sentence can be categorized as expressing fear because of the two strong fear markers, “afraid” and “knife”. Indeed, the sentence does express fear. But does it do so equally for Jack and John? The answer is no: Jack is the one who is afraid, and John holding a knife is the reason for Jack being afraid. Let us assume that a researcher is interested in the emotional analysis of a book that contains thousands of sentences expressing emotions in different ways: some sentences describe characters who feel emotions just as in the sentence above, some are the narrator’s digressions filled with emotions, and some contain emotion-bearing words (“knife”, “baby”) but do not in fact express any emotion. No doubt, a dictionary and count-based approach will be helpful in understanding the distribution of the emotional lexicon throughout the story. But is it enough for the interpretation? Can hermeneutics, in its traditional form, make use of such knowledge? Barely.
In fact, some of the works that we reviewed (e.g., Reed (2018)) point out that the surface affective value of words does not always align with their more nuanced affective meaning and that sentiment analysis tools make mistakes when classifying a text as emotional or not. If so, how reliable is the interpretation? Said differently, what kind of interpretation should we expect from sentiment and emotion analysis research in the DH community?
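The limitation illustrated by the Jack and John example can be made concrete with a few lines of code. The two-word fear list and the function name below are hypothetical; the point of the sketch is only that a dictionary lookup returns a label and its markers, but no information about who experiences the emotion or why.

```python
import re

# Hypothetical fear dictionary containing the two markers from the example.
TOY_FEAR_WORDS = {"afraid", "knife"}


def lexicon_label(sentence, fear_words):
    """Label a sentence as 'fear' if it contains any dictionary word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    markers = [token for token in tokens if token in fear_words]
    return {"label": "fear" if markers else "none", "markers": markers}


# The sentence from the text: fear is detected, but nothing in the result
# says that Jack is the experiencer and John with the knife is the cause.
jack = lexicon_label(
    "Jack was afraid of John because John held a knife in his hand",
    TOY_FEAR_WORDS,
)

# An invented neutral sentence: the same lookup still reports fear.
cake = lexicon_label("She used a knife to cut the cake", TOY_FEAR_WORDS)
```

Recovering the experiencer and the cause would require structured analysis beyond word matching, which is exactly the information an interpretation of character emotions depends on.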

We do not have a ready answer to that question. At one extreme, there is traditional hermeneutics, examples of which are presented in Section 3.2. At the other extreme is interpretation in the form of “Author A writes more emotionally than author B because the numbers tell so”. We do, however, suggest that a balance should be struck somewhere between these two extremes. In fact, simple as it is, the approach of detecting sentiment and emotion-related words can be used to deliver a high-quality interpretation, as in Heuser et al. (2016) or Morin and Acerbi (2017). However, we note again that there are still limits posed by the simplicity of this approach.

This leads us to the harsh reality of sentiment analysis research in digital humanities: the methods of sentiment analysis used by some DH scholars today have fallen out of use, or nearly so, among computational linguists. This in turn affects the quality of the interpretation.

However, we admit that this criticism may be unfair. In fact, there is a possible reason why DH researchers have taken this approach to sentiment analysis. Digital humanities are still being formed as a discipline in their own right, and it is easier to form something new in a step-by-step fashion. Resorting to a metaphor from the construction world, one should first learn how to stack single bricks to build a wall, rather than starting from the design of a communications system. It is true that much of digital humanities research (especially research dealing with text) uses methods of text analysis that were in fashion in computational linguistics twenty years ago. One can argue that new research in digital humanities should start with state-of-the-art methods. However, it is not yet clear, and there is no evidence, whether state-of-the-art methodology will suit digital humanities scholarship because, again, the final goal of digital humanities as a science has not been established.

What digital humanities can learn from computational linguistics, though, is that methodology cannot stall, because what really matters for digital humanities is interpretation. If methodology does not move forward, interpretation does not either.

References


  • Abdul-Mageed and Ungar (2017) Muhammad Abdul-Mageed and Lyle Ungar. 2017. EmoNet: Fine-grained emotion detection with gated recurrent neural networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pages 718–728.
  • Acerbi et al. (2013) Alberto Acerbi, Vasileios Lampos, Philip Garnett, and R. Alexander Bentley. 2013. The expression of emotions in 20th century books. PLOS ONE 8(3):1–6. https://doi.org/10.1371/journal.pone.0059030.
  • Agarwal et al. (2013) Apoorv Agarwal, Anup Kotalwar, and Owen Rambow. 2013. Automatic extraction of social networks from literary text: A case study on Alice in Wonderland. In Proceedings of the Sixth International Joint Conference on Natural Language Processing. pages 1202–1208.
  • Agrawal and An (2012) Ameeta Agrawal and Aijun An. 2012. Unsupervised emotion detection from text using semantic and syntactic relations. In Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology-Volume 01. IEEE Computer Society, pages 346–353.
  • Akhtar et al. (2017) Md Shad Akhtar, Abhishek Kumar, Deepanway Ghosal, Asif Ekbal, and Pushpak Bhattacharyya. 2017. A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 540–546.
  • Al Sallab et al. (2015) Ahmad Al Sallab, Hazem Hajj, Gilbert Badaro, Ramy Baly, Wassim El Hajj, and Khaled Bashir Shaban. 2015. Deep learning models for sentiment analysis in Arabic. In Proceedings of the Second Workshop on Arabic Natural Language Processing. pages 9–17.
  • Allen (2006) James F Allen. 2006. Natural Language Processing, American Cancer Society. https://doi.org/10.1002/0470018860.s00078.
  • Alm et al. (2005) Cecilia Ovesdotter Alm, Dan Roth, and Richard Sproat. 2005. Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, pages 579–586.
  • Alm and Sproat (2005) Cecilia Ovesdotter Alm and Richard Sproat. 2005. Emotional sequencing and development in fairy tales. In International Conference on Affective Computing and Intelligent Interaction. Springer, pages 668–674.
  • Aman and Szpakowicz (2007) Saima Aman and Stan Szpakowicz. 2007. Identifying expressions of emotion in text. In International Conference on Text, Speech and Dialogue. Springer, pages 196–205.
  • Amolik et al. (2016) Akshay Amolik, Niketan Jivane, Mahavir Bhandari, and M Venkatesan. 2016. Twitter sentiment analysis of movie reviews using machine learning techniques. International Journal of Engineering and Technology 7(6):1–7.
  • Anderson and McMaster (1982) Clifford W. Anderson and George E. McMaster. 1982. Computer assisted modeling of affective tone in written documents. Computers and the Humanities 16(1):1–9.
  • Anderson and McMaster (1986) Clifford W. Anderson and George E. McMaster. 1986. Modeling emotional tone in stories using tension levels and categorical states. Computers and the Humanities 20(1):3–9.
  • Anderson and McMaster (1993) Clifford W. Anderson and George E. McMaster. 1993. Emotional tone in Peter Rabbit before and after simplification. Empirical Studies of the Arts 11(2):177–185.
  • Anderson (2004) Tom Anderson. 2004. Why and how we make art, with implications for art education. Arts Education Policy Review 105(5):31.
  • Arbib and Fellous (2004) Michael A Arbib and Jean-Marc Fellous. 2004. Emotions: from brain to robot. Trends in cognitive sciences 8(12):554–561.
  • Aristotle (1996) Aristotle. 1996. Poetics. Penguin.
  • Arkin et al. (2012) Ronald Craig Arkin, Patrick Ulam, and Alan R Wagner. 2012. Moral decision making in autonomous systems: Enforcement, moral emotions, dignity, trust, and deception. Proceedings of the IEEE 100(3):571–589.
  • Asghar et al. (2018) Muhammad Zubair Asghar, Fazal Masud Kundi, Shakeel Ahmad, Aurangzeb Khan, and Furqan Khan. 2018. T-SAF: Twitter sentiment analysis framework using a hybrid classification scheme. Expert Systems 35(1):e12233.
  • Baccianella et al. (2010) Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC. volume 10, pages 2200–2204.
  • Bal and Veltkamp (2013) P. Matthijs Bal and Martijn Veltkamp. 2013. How does fiction reading influence empathy? an experimental investigation on the role of emotional transportation. PLOS ONE 8(1):1–12. https://doi.org/10.1371/journal.pone.0055341.
  • Barnes et al. (2017) Jeremy Barnes, Roman Klinger, and Sabine Schulte im Walde. 2017. Assessing state-of-the-art sentiment models on state-of-the-art sentiment datasets. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. pages 2–12.
  • Barnes et al. (2018a) Jeremy Barnes, Roman Klinger, and Sabine Schulte im Walde. 2018a. Bilingual sentiment embeddings: Joint projection of sentiment across languages. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Melbourne, Australia.
  • Barnes et al. (2018b) Jeremy Barnes, Roman Klinger, and Sabine Schulte im Walde. 2018b. Projecting embeddings for domain adaptation: Joint modeling of sentiment analysis in diverse domains. In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics. Santa Fe, USA.
  • Barrett (1998) Lisa F. Barrett. 1998. Discrete emotions or dimensions? the role of valence focus and arousal focus. Cognition & Emotion 12(4):579–599.
  • Barrett (2017) Lisa F. Barrett. 2017. How emotions are made: The secret life of the brain. Houghton Mifflin Harcourt.
  • Barros et al. (2013) Linda Barros, Pilar Rodriguez, and Alvaro Ortigosa. 2013. Automatic classification of literature pieces by emotion detection: A study on Quevedo’s poetry. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on. IEEE, pages 141–146.
  • Barth et al. (2018) Florian Barth, Evgeny Kim, Sandra Murr, and Roman Klinger. 2018. A reporting tool for relational visualization and analysis of character mentions in literature. In Book of Abstracts – Digital Humanities im deutschsprachigen Raum. Cologne, Germany.
  • Bartlett et al. (2005) Marian Stewart Bartlett, Gwen Littlewort, Mark Frank, Claudia Lainscsek, Ian Fasel, and Javier Movellan. 2005. Recognizing facial expression: machine learning and application to spontaneous behavior. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). volume 2, pages 568–573 vol. 2. https://doi.org/10.1109/CVPR.2005.297.
  • Barton (1996) James Barton. 1996. Interpreting character emotions for literature comprehension. Journal of Adolescent & Adult Literacy 40(1):22–28.
  • Beale and Creed (2008) Russell Beale and Chris Creed. 2008. Affect and emotion in human-computer interaction. Springer.
  • Beale and Creed (2009) Russell Beale and Chris Creed. 2009. Affective interaction: How emotional agents affect users. International Journal of Human-Computer Studies 67(9):755–776.
  • Beck et al. (2010) Aryel Beck, Antoine Hiolle, Alexandre Mazel, and Lola Cañamero. 2010. Interpretation of emotional body language displayed by robots. In Proceedings of the 3rd international workshop on Affective interaction in natural environments. ACM, pages 37–42.
  • Bentley et al. (2014) R. Alexander Bentley, Alberto Acerbi, Paul Ormerod, and Vasileios Lampos. 2014. Books average previous decade of economic misery. PLOS ONE 9(1):1–7. https://doi.org/10.1371/journal.pone.0083147.
  • Berry (2012) David M Berry. 2012. Introduction: Understanding the digital humanities. In Understanding digital humanities, Springer, pages 1–20.
  • Bollegala et al. (2011) Danushka Bollegala, David Weir, and John Carroll. 2011. Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, Stroudsburg, PA, USA, HLT ’11, pages 132–141.
  • Boot et al. (2017) Peter Boot, Hanna Zijlstra, and Rinie Geenen. 2017. The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary. Dutch Journal of Applied Linguistics 6(1):65–76.
  • Borth et al. (2013) Damian Borth, Rongrong Ji, Tao Chen, Thomas Breuel, and Shih-Fu Chang. 2013. Large-scale visual sentiment ontology and detectors using adjective noun pairs. In Proceedings of the 21st ACM international conference on Multimedia. ACM, pages 223–232.
  • Bosser et al. (2007) Anne-Gwenn Bosser, Guillaume Levieux, Karim Sehaba, Axel Buendia, Vincent Corruble, and Guillaume De Fondaumière. 2007. Dialogs taking into account experience, emotions and personality. In Proceedings of the 2nd international conference on Digital interactive media in entertainment and arts. ACM, pages 9–12.
  • Bostan and Klinger (2018) Laura Ana Maria Bostan and Roman Klinger. 2018. An analysis of annotated corpora for emotion classification in text. In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics. Santa Fe, USA.
  • Bradley and Lang (1994) Margaret M Bradley and Peter J Lang. 1994. Measuring emotion: the self-assessment manikin and the semantic differential. Journal of behavior therapy and experimental psychiatry 25(1):49–59.
  • Brady (1985) Michael Brady. 1985. Artificial intelligence and robotics. Artificial Intelligence 26(1):79 – 121. https://doi.org/10.1016/0004-3702(85)90013-X.
  • Breazeal (2003) Cynthia Breazeal. 2003. Emotion and sociable humanoid robots. International Journal of Human-Computer Studies 59(1):119–155.
  • Brooks (1987) Frederick P Brooks. 1987. No silver bullet. IEEE Computer 20(4):10–19.
  • Bruggmann and Fabrikant (2014) André Bruggmann and Sara Irina Fabrikant. 2014. Spatializing a digital text archive about history. In Krzysztof Janowicz, Benjamin Adams, Grant McKenzie, and Tomi Kauppinen, editors, GIO 2014 Geographic Information Observatories co-located with GIScience 2014: Eight International Conference on Geographic Information Science. CEUR-WS, number 1273 in CEUR Workshop Proceedings, pages 6–14. https://doi.org/10.5167/uzh-103129.
  • Bryant and Zillmann (1984) Jennings Bryant and Dolf Zillmann. 1984. Using television to alleviate boredom and stress: Selective exposure as a function of induced excitational states. Journal of Broadcasting & Electronic Media 28(1):1–20.
  • Brysbaert et al. (2014) Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior research methods 46(3):904–911.
  • Buechel et al. (2016) Sven Buechel, Johannes Hellrich, and Udo Hahn. 2016. Feelings from the past—adapting affective lexicons for historical emotion analysis. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). pages 54–61.
  • Buechel et al. (2017) Sven Buechel, Johannes Hellrich, and Udo Hahn. 2017. The course of emotion in three centuries of german text— a methodological framework. In Digital Humanities 2017: Conference Abstracts. Montreal, Canada.
  • Cacioppo et al. (2000) John T Cacioppo, Gary G Berntson, Jeff T Larsen, Kirsten M Poehlmann, Tiffany A Ito, et al. 2000. The psychophysiology of emotion. Handbook of emotions 2:173–191.
  • Calvo and D’Mello (2010) Rafael A. Calvo and Sidney D’Mello. 2010. Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE Transactions on Affective Computing 1(1):18–37. https://doi.org/10.1109/T-AFFC.2010.1.
  • Cambria et al. (2012) Erik Cambria, Andrew Livingstone, and Amir Hussain. 2012. The hourglass of emotions. Cognitive behavioural systems 7403:144–157.
  • Carbonell (1979) Jaime Guillermo Carbonell. 1979. Subjective understanding: Computer models of belief systems. Technical report, Yale University, Department of Computer Science, New Haven, CT.
  • Ceron et al. (2014) Andrea Ceron, Luigi Curini, Stefano M Iacus, and Giuseppe Porro. 2014. Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France. New Media & Society 16(2):340–358.
  • Chaplin and Rhalibi (2004) David Joseph Chaplin and Abdennour El Rhalibi. 2004. IPD for emotional NPC societies in games. In Proceedings of the 2004 ACM SIGCHI International Conference on Advances in computer entertainment technology. ACM, pages 51–60.
  • Chen et al. (2012) Annie T Chen, Ayoung Yoon, and Ryan Shaw. 2012. People, places and emotions: Visually representing historical context in oral testimonies. In Proceedings of the Third Workshop on Computational Models of Narrative, Istanbul, Turkey. pages 26–27.
  • Chomsky (1993) Noam Chomsky. 1993. Lectures on government and binding: The Pisa lectures. Number 9 in Studies in Generative Grammar. Walter de Gruyter.
  • Chung et al. (2015) Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated feedback recurrent neural networks. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37. JMLR.org, ICML’15, pages 2067–2075. http://dl.acm.org/citation.cfm?id=3045118.3045338.
  • Church and Hanks (1990) Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16(1):22–29.
  • Coeckelbergh (2012) Mark Coeckelbergh. 2012. Are emotional robots deceptive? IEEE Transactions on Affective Computing 3(4):388–393.
  • Conati (2002) Cristina Conati. 2002. Probabilistic assessment of user’s emotions in educational games. Applied artificial intelligence 16(7-8):555–575.
  • Conati et al. (2003) Cristina Conati, Romain Chabbal, and Heather Maclaren. 2003. A study on using biometric sensors for monitoring user emotions in educational games. In Workshop on Assessing and Adapting to User Attitudes and Affect: Why, When and How.
  • Cordell et al. (2017) Ryan Cordell, M. H Beals, Isabel G Russell, Julianne Nyhan, Ernesto Priani, Marc Priewe, and Hannu Salmi. 2017. Oceanic exchanges: Tracing global information networks in historical newspaper repositories, 1840-1914. Online. https://doi.org/10.17605/OSF.IO/WA94S.
  • Cortis et al. (2017) Keith Cortis, André Freitas, Tobias Daudert, Manuela Huerlimann, Manel Zarrouk, Siegfried Handschuh, and Brian Davis. 2017. SemEval-2017 task 5: Fine-grained sentiment analysis on financial microblogs and news. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pages 519–535.
  • Cowie et al. (2001) Roddy Cowie, Ellen Douglas-Cowie, Nicolas Tsapatsoulis, George Votsis, Stefanos Kollias, Winfried Fellenz, and John G Taylor. 2001. Emotion recognition in human-computer interaction. IEEE Signal processing magazine 18(1):32–80.
  • Csikszentmihalyi and Csikszentmihalyi (1992) Mihaly Csikszentmihalyi and Isabella Selega Csikszentmihalyi. 1992. Optimal experience: Psychological studies of flow in consciousness. Cambridge university press.
  • Darwin (1872) Charles Darwin. 1872. The expression of emotion in animals and man. London: Methuen. (1877), A biographical sketch of an infant. Mind 2:285–294.
  • Dave et al. (2003) Kushal Dave, Steve Lawrence, and David M Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th international conference on World Wide Web. ACM, pages 519–528.
  • de Sousa (2017) Ronald de Sousa. 2017. Emotion. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University. Winter 2017 edition.
  • Digman (1990) John M Digman. 1990. Personality structure: Emergence of the five-factor model. Annual review of psychology 41(1):417–440.
  • Djikic et al. (2013) Maja Djikic, Keith Oatley, and Mihnea C Moldoveanu. 2013. Reading other minds: Effects of literature on empathy. Scientific Study of Literature 3(1):28–47.
  • Djikic et al. (2009) Maja Djikic, Keith Oatley, Sara Zoeterman, and Jordan B Peterson. 2009. On being moved by art: How reading fiction transforms the self. Creativity Research Journal 21(1):24–29.
  • Dodds et al. (2011) Peter Sheridan Dodds, Kameron Decker Harris, Isabel M. Kloumann, Catherine A. Bliss, and Christopher M. Danforth. 2011. Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLOS ONE 6(12):1–1. https://doi.org/10.1371/journal.pone.0026752.
  • Dorner and Hille (1995) Dietrich Dorner and Katrin Hille. 1995. Artificial souls: Motivated emotional robots. In Systems, Man and Cybernetics, 1995. Intelligent Systems for the 21st Century., IEEE International Conference on. IEEE, volume 4, pages 3828–3832.
  • dos Santos and Gatti (2014) Cicero dos Santos and Maira Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. pages 69–78.
  • Downes and McNamara (2016) Stephanie Downes and Rebecca F McNamara. 2016. The history of emotions and Middle English literature. Literature Compass 13(6):444–456.
  • Dupré et al. (2018) Damien Dupré, Ben Bland, Andrew Bolster, Gawain Morrison, and Gary McKeown. 2018. Dynamic model of athletes’ emotions based on wearable devices. In Tareq Ahram, editor, Advances in Human Factors in Sports, Injury Prevention and Outdoor Recreation. Springer International Publishing, Cham, pages 42–50.
  • Egloff et al. (2018) Mattia Egloff, Antonio Lieto, and Davide Picca. 2018. An ontological model for inferring psychological profiles and narrative roles of characters. In Digital Humanities 2018: Conference Abstracts. Mexico City, Mexico.
  • Ekman (1971) Paul Ekman. 1971. Universals and cultural differences in facial expressions of emotion. In Nebraska symposium on motivation. University of Nebraska Press.
  • Ekman (1992) Paul Ekman. 1992. An argument for basic emotions. Cognition & emotion 6(3-4):169–200.
  • Ekman (1993) Paul Ekman. 1993. Facial expression and emotion. American psychologist 48(4):384.
  • Ekman et al. (1971) Paul Ekman, Wallace V Friesen, and Silvan S Tomkins. 1971. Facial affect scoring technique: A first validity study. Semiotica 3(1):37–58.
  • Ekman et al. (1983) Paul Ekman, Robert W Levenson, and Wallace V Friesen. 1983. Autonomic nervous system activity distinguishes among emotions. Science 221(4616):1208–1210.
  • Ekman et al. (1969) Paul Ekman, E Richard Sorenson, Wallace V Friesen, et al. 1969. Pan-cultural elements in facial displays of emotion. Science 164(3875):86–88.
  • Elsner (2012) Micha Elsner. 2012. Character-based kernels for novelistic plot structure. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 634–644.
  • Elsner (2015) Micha Elsner. 2015. Abstract representations of plot structure. LiLT (Linguistic Issues in Language Technology) 12:1–29.
  • Elson et al. (2010) David K Elson, Nicholas Dames, and Kathleen R McKeown. 2010. Extracting social networks from literary fiction. In Proceedings of the 48th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pages 138–147.
  • Essa and Pentland (1997) Irfan A. Essa and Alex Paul Pentland. 1997. Coding, analysis, interpretation, and recognition of facial expressions. IEEE transactions on pattern analysis and machine intelligence 19(7):757–763.
  • Esuli and Sebastiani (2006) Andrea Esuli and Fabrizio Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC’06). pages 417–422.
  • Evans (2004) Dylan Evans. 2004. Can robots have emotions? Psychology Review 11:2–5.
  • Fahrni and Klenner (2008) Angela Fahrni and Manfred Klenner. 2008. Old wine or warm beer: target-specific sentiment analysis of adjectives. In Symposium on Affective Language in Human and Machine (AISB) Convention. Aberdeen, Scotland, pages 60–63.
  • Fang and Zhan (2015) Xing Fang and Justin Zhan. 2015. Sentiment analysis using product review data. Journal of Big Data 2(1):5.
  • Felbo et al. (2017) Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 1615–1625.
  • Fellbaum (1998) Christiane Fellbaum. 1998. A semantic network of English verbs. WordNet: An electronic lexical database 3:153–178.
  • Flach (2012) Peter Flach. 2012. Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press.
  • Francis and Kucera (1979) W. Nelson Francis and Henry Kucera. 1979. Brown corpus manual. Received online at http://clu.uni.no/icame/manuals/BROWN/INDEX.HTM.
  • Freytag (1863) Gustav Freytag. 1863. Die Technik des Dramas. Hirzel, Leipzig, Germany.
  • Frijda and Swagerman (1987) Nico H Frijda and Jaap Swagerman. 1987. Can computers feel? theory and design of an emotional system. Cognition and emotion 1(3):235–257.
  • Gamon (2004) Michael Gamon. 2004. Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of the 20th international conference on Computational Linguistics. Association for Computational Linguistics, page 841.
  • Gan et al. (2017) Qiwei Gan, Bo H Ferns, Yang Yu, and Lei Jin. 2017. A text mining and multidimensional sentiment analysis of online restaurant reviews. Journal of Quality Assurance in Hospitality & Tourism 18(4):465–492.
  • Ganu et al. (2009) Gayatree Ganu, Noemie Elhadad, and Amélie Marian. 2009. Beyond the stars: Improving rating predictions using review text content. In WebDB.
  • Gao et al. (2016) Jianbo Gao, Matthew L. Jockers, John Laudun, and Timothy Tangherlini. 2016. A multiscale theory for the dynamical evolution of sentiment in novels. In 2016 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC). pages 1–4. https://doi.org/10.1109/BESC.2016.7804470.
  • Gao et al. (2014) Kai Gao, Hua Xu, and Jiushuo Wang. 2014. Emotion classification based on structured information. In 2014 International Conference on Multisensor Fusion and Information Integration for Intelligent Systems (MFI). pages 1–6. https://doi.org/10.1109/MFI.2014.6997756.
  • Gendron and Feldman Barrett (2009) Maria Gendron and Lisa Feldman Barrett. 2009. Reconstructing the past: A century of ideas about emotion in psychology. Emotion review 1(4):316–339.
  • Gendron et al. (2014) Maria Gendron, Debi Roberson, Jacoba Marietta van der Vyver, and Lisa Feldman Barrett. 2014. Perceptions of emotion from facial expressions are not culturally universal: evidence from a remote culture. Emotion 14(2):251.
  • Ghandeharioun et al. (2016) Asma Ghandeharioun, Asaph Azaria, Sara Taylor, and Rosalind W Picard. 2016. “kind and grateful”: a context-sensitive smartphone app utilizing inspirational content to promote gratitude. Psychology of well-being 6(1):9.
  • Ghazi et al. (2010) Diman Ghazi, Diana Inkpen, and Stan Szpakowicz. 2010. Hierarchical versus flat classification of emotions in text. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text. ACL, Los Angeles, CA, pages 140–146. http://www.aclweb.org/anthology/W10-0217.
  • Glorot et al. (2011) Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning. Omnipress, USA, ICML’11, pages 513–520.
  • Gratch and Marsella (2004) Jonathan Gratch and Stacy Marsella. 2004. Evaluating the modeling and use of emotion in virtual humans. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 1. IEEE Computer Society, pages 320–327.
  • Guerra et al. (2014) Pedro Henrique Calais Guerra, Wagner Meira Jr., and Claire Cardie. 2014. Sentiment analysis on evolving social streams: how self-report imbalances can help. In Seventh ACM International Conference on Web Search and Data Mining, WSDM 2014, New York, NY, USA, February 24-28, 2014. pages 443–452. https://doi.org/10.1145/2556195.2556261.
  • Hair et al. (1998) Joseph F Hair, William C Black, Barry J Babin, Rolph E Anderson, Ronald L Tatham, et al. 1998. Multivariate data analysis, volume 5. Prentice hall Upper Saddle River, NJ.
  • Heise (1965) David R Heise. 1965. Semantic differential profiles for 1,000 most frequent English words. Psychological Monographs: General and Applied 79(8):1.
  • Henny-Krahmer (2018) Ulrike Edith Gerda Henny-Krahmer. 2018. Exploration of sentiments and genre in Spanish American novels. In Digital Humanities 2018: Conference Abstracts. Mexico City, Mexico.
  • Heuser et al. (2016) Ryan Heuser, Franco Moretti, and Erik Steiner. 2016. The emotions of London. Technical report, Stanford University. Pamphlets of the Stanford Literary Lab.
  • Hillis Miller (2014) J. Hillis Miller. 2014. Exploring Text and Emotions, Aarhus University Press, chapter Text; Action; Space; Emotion in Conrad’s Nostromo, pages 91–117.
  • Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
  • Hogan (2010) Patrick Colm Hogan. 2010. Fictions and feelings: On the place of literature in the study of emotion. Emotion Review 2(2):184–195. https://doi.org/10.1177/1754073909352874.
  • Hogan (2015) Patrick Colm Hogan. 2015. What Literature Teaches Us about Emotion, Oxford University Press, USA, pages 273–290.
  • Hu and Liu (2004) Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, KDD ’04, pages 168–177. https://doi.org/10.1145/1014052.1014073.
  • Hudlicka and Broekens (2009) Eva Hudlicka and Joost Broekens. 2009. Foundations for modelling emotions in game characters: Modelling emotion effects on cognition. In Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on. IEEE, pages 1–6.
  • Hutto and Gilbert (2014) C.J. Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media. pages 216–225.
  • Imai (1983) Satoshi Imai. 1983. Cepstral analysis synthesis on the mel frequency scale. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP’83.. IEEE, volume 8, pages 93–96.
  • Ingermanson and Economy (2009) Randy Ingermanson and Peter Economy. 2009. Writing fiction for dummies. John Wiley & Sons.
  • Jafari et al. (2016) Sajad Jafari, Julien C. Sprott, and S. Mohammad Reza Hashemi Golpayegani. 2016. Layla and Majnun: a complex love story. Nonlinear Dynamics 83(1):615–622. https://doi.org/10.1007/s11071-015-2351-3.
  • Jain and Batra (2015) Sarthak Jain and Shashank Batra. 2015. Cross lingual sentiment analysis using modified brae. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 159–168. https://doi.org/10.18653/v1/D15-1016.
  • Jakob and Gurevych (2010) Niklas Jakob and Iryna Gurevych. 2010. Extracting opinion targets in a single- and cross-domain setting with conditional random fields. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA, USA, EMNLP ’10, pages 1035–1045.
  • Jhavar and Mirza (2018) Harshita Jhavar and Paramita Mirza. 2018. EMOFIEL: Mapping emotions of relationships in a story. In Companion Proceedings of The Web Conference 2018. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, WWW ’18, pages 243–246. https://doi.org/10.1145/3184558.3186989.
  • Jiang et al. (2007) Hong Jiang, Jose M Vidal, and Michael N Huhns. 2007. EBDI: an architecture for emotional agents. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems. ACM, page 11.
  • Jockers and Underwood (2016) Matthew L Jockers and Ted Underwood. 2016. Text-mining the humanities. In A New Companion to Digital Humanities, Wiley Online Library, pages 291–306.
  • Johnson (2012) Dan R Johnson. 2012. Transportation into a story increases empathy, prosocial behavior, and perceptual bias toward fearful expressions. Personality and Individual Differences 52(2):150–155.
  • Johnson and Wiles (2003) Daniel Johnson and Janet Wiles. 2003. Effective affective user interface design in games. Ergonomics 46(13-14):1332–1345.
  • Johnson-Laird and Oatley (1989) Philip N. Johnson-Laird and Keith Oatley. 1989. The language of emotions: An analysis of a semantic field. Cognition and emotion 3(2):81–123.
  • Johnson-Laird and Oatley (2016) Philip N. Johnson-Laird and Keith Oatley. 2016. Handbook of emotions, Guilford Publications, chapter Emotions in Music, Literature, and Film, pages 82–97.
  • Joshi and Penstein-Rosé (2009) Mahesh Joshi and Carolyn Penstein-Rosé. 2009. Generalizing dependency features for opinion mining. In Proceedings of the ACL-IJCNLP 2009 conference short papers. Association for Computational Linguistics, pages 313–316.
  • Jurafsky and Martin (2000) Daniel Jurafsky and James H. Martin. 2000. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall.
  • Kahn Jr et al. (2012) Peter H Kahn Jr, Takayuki Kanda, Hiroshi Ishiguro, Brian T Gill, Jolina H Ruckert, Solace Shen, Heather E Gary, Aimee L Reichert, Nathan G Freier, and Rachel L Severson. 2012. Do people hold a humanoid robot morally accountable for the harm it causes? In Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction. ACM, pages 33–40.
  • Kakkonen and Galic Kakkonen (2011) Tuomo Kakkonen and Gordana Galic Kakkonen. 2011. SentiProfiler: Creating comparable visual profiles of sentimental content in texts. In Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage. Association for Computational Linguistics, pages 62–69. http://www.aclweb.org/anthology/W11-4110.
  • Kennedy and Inkpen (2006) Alistair Kennedy and Diana Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational intelligence 22(2):110–125.
  • Kessler et al. (2010) Jason S. Kessler, Miriam Eckert, Lyndsie Clark, and Nicolas Nicolov. 2010. The 2010 ICWSM JDPA Sentiment Corpus for the Automotive Domain. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media Data Workshop Challenge (ICWSM-DWC 2010).
  • Khan et al. (2015) Aamera ZH Khan, Mohammad Atique, and VM Thakare. 2015. Combining lexicon-based and learning-based methods for Twitter sentiment analysis. International Journal of Electronics, Communication and Soft Computing Science & Engineering (IJECSCSE) 89:89.
  • Kidd and Castano (2013) David Comer Kidd and Emanuele Castano. 2013. Reading literary fiction improves theory of mind. Science 342(6156):377–380.
  • Kim and Klinger (2018) Evgeny Kim and Roman Klinger. 2018. Who feels what and why? annotation of a literature corpus with semantic roles of emotions. In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics. Santa Fe, USA.
  • Kim et al. (2017a) Evgeny Kim, Sebastian Padó, and Roman Klinger. 2017a. Investigating the relationship between literary genres and emotional plot development. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. pages 17–26.
  • Kim et al. (2017b) Evgeny Kim, Sebastian Padó, and Roman Klinger. 2017b. Prototypical emotion developments in adventures, romances, and mystery stories. In Digital Humanities 2017: Conference Abstracts. Montreal, Canada.
  • Kim et al. (2009) Jungi Kim, Jin-Ji Li, and Jong-Hyeok Lee. 2009. Discovering the discriminative views: measuring term weights for sentiment analysis. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1. Association for Computational Linguistics, pages 253–261.
  • Kim and Hovy (2006) Soo-Min Kim and Eduard Hovy. 2006. Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in Text. Association for Computational Linguistics, Stroudsburg, PA, USA, SST ’06, pages 1–8.
  • Kim et al. (2012) Suin Kim, JinYeong Bak, and Alice Haeyun Oh. 2012. Do you feel what I feel? Social aspects of emotions in Twitter conversations. In ICWSM.
  • Kiritchenko et al. (2014) Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. 2014. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). pages 437–442.
  • Klein and Cook (2012) Barbara Klein and Glenda Cook. 2012. Emotional robotics in elder care – a comparison of findings in the UK and Germany. Social Robotics 7621:108–117.
  • Klinger and Cimiano (2013a) Roman Klinger and Philipp Cimiano. 2013a. Bi-directional inter-dependencies of subjective expressions and targets and their value for a joint model. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). volume 2, pages 848–854.
  • Klinger and Cimiano (2013b) Roman Klinger and Philipp Cimiano. 2013b. Joint and pipeline probabilistic models for fine-grained sentiment analysis: Extracting aspects, subjective phrases and their relations. In Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on. IEEE, pages 937–944.
  • Klinger and Cimiano (2014) Roman Klinger and Philipp Cimiano. 2014. The USAGE review corpus for fine grained multi lingual opinion analysis. In Proceedings of the Ninth International Conference on Language Resources and Evaluation LREC’14. European Language Resources Association (ELRA), Reykjavik, Iceland.
  • Klinger and Cimiano (2015) Roman Klinger and Philipp Cimiano. 2015. Instance selection improves cross-lingual model training for fine-grained sentiment analysis. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning. ACL, Beijing, China, pages 153–163.
  • Klinger et al. (2018) Roman Klinger, Orphée de Clercq, Saif M. Mohammad, and Alexandra Balahur. 2018. IEST: WASSA-2018 implicit emotions shared task. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics, Brussels, Belgium.
  • Klinger et al. (2016) Roman Klinger, Surayya Samat Suliya, and Nils Reiter. 2016. Automatic Emotion Detection for Quantitative Literary Studies – A case study based on Franz Kafka’s “Das Schloss” and “Amerika”. In Digital Humanities 2016: Conference Abstracts. Kraków, Poland.
  • Kohonen (1990) Teuvo Kohonen. 1990. The self-organizing map. Proceedings of the IEEE 78(9):1464–1480.
  • Koolen (2018) Corina Koolen. 2018. Women’s books versus books by women. In Digital Humanities 2018: Conference Abstracts. Mexico City, Mexico.
  • Köper et al. (2017) Maximilian Köper, Evgeny Kim, and Roman Klinger. 2017. IMS at EmoInt-2017: emotion intensity prediction with affective norms, automatically extended resources and deep learning. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. pages 50–57.
  • Kouloumpis et al. (2011) Efthymios Kouloumpis, Theresa Wilson, and Johanna D. Moore. 2011. Twitter sentiment analysis: The good the bad and the omg! In Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011.
  • Kübler et al. (2009) Sandra Kübler, Ryan McDonald, and Joakim Nivre. 2009. Dependency parsing. Synthesis Lectures on Human Language Technologies 1(1):1–127.
  • Kuivalainen (2009) Päivi Kuivalainen. 2009. Emotions in narrative: A linguistic study of Katherine Mansfield’s short fiction. The Electronic Journal of the Department of English at the University of Helsinki 5.
  • Köper and im Walde (2016) Maximilian Köper and Sabine Schulte im Walde. 2016. Automatically generated affective norms of abstractness, arousal, imageability and valence for 350 000 German lemmas. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, France.
  • Lamarque (1981) Peter Lamarque. 1981. How can we fear and pity fictions? The British Journal of Aesthetics 21(4):291.
  • Lanham (1989) Richard A Lanham. 1989. The electronic word: Literary study and the digital revolution. New Literary History 20(2):265–290.
  • Larsen and Diener (1992) Randy J Larsen and Edward Diener. 1992. Promises and problems with the circumplex model of emotion. Review of personality and social psychology 13:25–29.
  • LeCun et al. (1989) Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural computation 1(4):541–551.
  • LeCun et al. (1998) Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791.
  • Lee and Renganathan (2011) Huey Yee Lee and Hemnaath Renganathan. 2011. Chinese sentiment analysis using maximum entropy. In Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011). Asian Federation of Natural Language Processing, Chiang Mai, Thailand, pages 89–93.
  • Leite et al. (2008) Iolanda Leite, André Pereira, Carlos Martinho, and Ana Paiva. 2008. Are emotional robots more fun to play with? In Robot and human interactive communication, 2008. RO-MAN 2008. The 17th IEEE international symposium on. IEEE, pages 77–82.
  • Levy et al. (2015) Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3:211–225.
  • Li and Campbell (2015) Ling Li and James Campbell. 2015. Emotion Modeling and Interaction of NPCs in Virtual Simulation and Games. International Journal of Virtual Reality (IJVR) 9(4):1–6. https://hal.archives-ouvertes.fr/hal-01530518.
  • Li et al. (2010) Shoushan Li, Sophia Yat Mei Lee, Ying Chen, Chu-Ren Huang, and Guodong Zhou. 2010. Sentiment classification and polarity shifting. In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, COLING ’10, pages 635–643.
  • Lin et al. (2012) C. Lin, Y. He, R. Everson, and S. Ruger. 2012. Weakly supervised joint sentiment-topic detection from text. IEEE Transactions on Knowledge and Data Engineering 24(6):1134–1145. https://doi.org/10.1109/TKDE.2011.48.
  • Lin and He (2009) Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, New York, NY, USA, CIKM ’09, pages 375–384. https://doi.org/10.1145/1645953.1646003.
  • Liu (2012) Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies 5(1):1–167.
  • Liu (2015) Bing Liu. 2015. Sentiment Analysis. Cambridge University Press.
  • Liu and Lei (2018) Dilin Liu and Lei Lei. 2018. The appeal to political sentiment: An analysis of Donald Trump’s and Hillary Clinton’s speech themes and discourse strategies in the 2016 US presidential election. Discourse, Context & Media. https://doi.org/10.1016/j.dcm.2018.05.001.
  • Malle et al. (2015) Bertram F Malle, Matthias Scheutz, Thomas Arnold, John Voiklis, and Corey Cusimano. 2015. Sacrifice one for the good of many?: People apply different moral norms to human and robot agents. In Proceedings of the tenth annual ACM/IEEE international conference on human-robot interaction. ACM, pages 117–124.
  • Mäntylä et al. (2018) Mika V Mäntylä, Daniel Graziotin, and Miikka Kuutila. 2018. The evolution of sentiment analysis—a review of research topics, venues, and top cited papers. Computer Science Review 27:16–32.
  • Mar et al. (2011) Raymond A Mar, Keith Oatley, Maja Djikic, and Justin Mullin. 2011. Emotion and narrative fiction: Interactive influences before, during, and after reading. Cognition & emotion 25(5):818–833.
  • Mar et al. (2009) Raymond A Mar, Keith Oatley, and Jordan B Peterson. 2009. Exploring the link between reading fiction and empathy: Ruling out individual differences and examining outcomes. Communications 34(4):407–428.
  • Maragoudakis et al. (2011) Manolis Maragoudakis, Euripidis Loukis, and Yannis Charalabidis. 2011. A review of opinion mining methods for analyzing citizens’ contributions in public policy debate. In ePart. Springer, pages 298–313.
  • Marchetti et al. (2014) Alessandro Marchetti, Rachele Sprugnoli, and Sara Tonelli. 2014. Sentiment analysis for the humanities: the case of historical texts. In Digital Humanities 2014: Conference Abstracts. University of Lausanne (UNIL) & Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland. pages 254–257.
  • Marcus et al. (1993) Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational linguistics 19(2):313–330.
  • Marrese-Taylor et al. (2014) Edison Marrese-Taylor, Juan D Velásquez, and Felipe Bravo-Marquez. 2014. A novel deterministic approach for aspect-based opinion mining in tourism products reviews. Expert Systems with Applications 41(17):7764–7775.
  • Martineau and Finin (2009) Justin Martineau and Tim Finin. 2009. Delta TFIDF: An improved feature space for sentiment analysis. In International AAAI Conference on Web and Social Media. volume 9, page 106.
  • Marvel et al. (2011) Seth A Marvel, Jon Kleinberg, Robert D Kleinberg, and Steven H Strogatz. 2011. Continuous-time model of structural balance. Proceedings of the National Academy of Sciences 108(5):1771–1776.
  • Mikolov et al. (2013) Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. Curran Associates Inc., USA, NIPS’13, pages 3111–3119.
  • Mohammad (2011) Saif Mohammad. 2011. From once upon a time to happily ever after: Tracking emotions in novels and fairy tales. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Association for Computational Linguistics, pages 105–114.
  • Mohammad and Bravo-Marquez (2017) Saif Mohammad and Felipe Bravo-Marquez. 2017. Wassa-2017 shared task on emotion intensity. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics, Copenhagen, Denmark, pages 34–49. http://www.aclweb.org/anthology/W17-5205.
  • Mohammad et al. (2018) Saif Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. Semeval-2018 task 1: Affect in tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation. Association for Computational Linguistics, New Orleans, Louisiana, pages 1–17. http://www.aclweb.org/anthology/S18-1001.
  • Mohammad (2012a) Saif M Mohammad. 2012a. #Emotional tweets. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation. Association for Computational Linguistics, pages 246–255.
  • Mohammad (2012b) Saif M Mohammad. 2012b. From once upon a time to happily ever after: Tracking emotions in mail and books. Decision Support Systems 53(4):730–741.
  • Mohammad and Turney (2013) Saif M Mohammad and Peter D Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence 29(3):436–465.
  • Morin and Acerbi (2017) Olivier Morin and Alberto Acerbi. 2017. Birth of the cool: a two-centuries decline in emotional expression in anglophone fiction. Cognition and Emotion 31(8):1663–1675. PMID: 27910735. https://doi.org/10.1080/02699931.2016.1260528.
  • Munezero et al. (2014) Myriam D Munezero, Calkin Suero Montero, Erkki Sutinen, and John Pajunen. 2014. Are they different? affect, feeling, emotion, sentiment, and opinion detection in text. IEEE transactions on affective computing 5(2):101–111.
  • Nakov et al. (2016) Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. Semeval-2016 task 4: Sentiment analysis in twitter. In Proceedings of the 10th international workshop on semantic evaluation (semeval-2016). pages 1–18.
  • Nalisnick and Baird (2013a) Eric T Nalisnick and Henry S Baird. 2013a. Character-to-character sentiment analysis in shakespeare’s plays. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). volume 2, pages 479–483.
  • Nalisnick and Baird (2013b) Eric T Nalisnick and Henry S Baird. 2013b. Extracting sentiment networks from shakespeare’s plays. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on. IEEE, pages 758–762.
  • Neill (1991) Alex Neill. 1991. Fear, fiction and make-believe. The Journal of Aesthetics and Art Criticism 49(1):47–56.
  • Neviarouskaya et al. (2009) Alena Neviarouskaya, Helmut Prendinger, and Mitsuru Ishizuka. 2009. Compositionality principle in recognition of fine-grained emotions from text. In Eytan Adar, Matthew Hurst, Tim Finin, Natalie S. Glance, Nicolas Nicolov, and Belle L. Tseng, editors, Third International AAAI Conference on Weblogs and Social Media. The AAAI Press. http://aaai.org/ocs/index.php/ICWSM/09/paper/view/197.
  • Nielsen (2011) Finn Å. Nielsen. 2011. AFINN. http://www2.imm.dtu.dk/pubdb/p.php?6010.
  • Ochs et al. (2008) Magalie Ochs, Nicolas Sabouret, and Vincent Corruble. 2008. Modeling the dynamics of non-player characters’ social relations in video games. In AIIDE.
  • Ochs et al. (2009) Magalie Ochs, Nicolas Sabouret, and Vincent Corruble. 2009. Simulation of the dynamics of nonplayer characters’ emotions and social relations in games. IEEE Transactions on Computational Intelligence and AI in Games 1(4):281–297.
  • Oliver (1993) Mary Beth Oliver. 1993. Exploring the paradox of the enjoyment of sad films. Human Communication Research 19(3):315–342.
  • Oliver (2008) Mary Beth Oliver. 2008. Tender affective states as predictors of entertainment preference. Journal of Communication 58(1):40–61.
  • Oliver et al. (2000) Mary Beth Oliver, James B Weaver III, and Stephanie Lee Sargent. 2000. An examination of factors related to sex differences in enjoyment of sad films. Journal of Broadcasting & Electronic Media 44(2):282–300.
  • Paltoglou and Thelwall (2010) Georgios Paltoglou and Mike Thelwall. 2010. A study of information retrieval weighting schemes for sentiment analysis. In Proceedings of the 48th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pages 1386–1395.
  • Pang and Lee (2005) Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd annual meeting on association for computational linguistics. Association for Computational Linguistics, pages 115–124.
  • Pang et al. (2002) Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10. Association for Computational Linguistics, Stroudsburg, PA, USA, EMNLP ’02, pages 79–86. https://doi.org/10.3115/1118693.1118704.
  • Pang et al. (2008) Bo Pang, Lillian Lee, et al. 2008. Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval 2(1–2):1–135.
  • Pantic and Rothkrantz (2000) Maja Pantic and Leon JM Rothkrantz. 2000. Expert system for automatic analysis of facial expressions. Image and Vision Computing 18(11):881–905.
  • Pantic and Rothkrantz (2003) Maja Pantic and Leon JM Rothkrantz. 2003. Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE 91(9):1370–1390.
  • Parkhe and Biswas (2016) Viraj Parkhe and Bhaskar Biswas. 2016. Sentiment analysis of movie reviews: finding most important movie aspects using driving factors. Soft Computing 20(9):3373–3379.
  • Patti et al. (2015) Viviana Patti, Federico Bertola, and Antonio Lieto. 2015. Arsemotica for arsmeteo.org: Emotion-driven exploration of online art collections. In The Twenty-Eighth International Florida Artificial Intelligence Research Society Conference. Association for the Advancement of Artificial Intelligence, pages 288–293.
  • Pennebaker et al. (2007) JW Pennebaker, CK Chung, M Ireland, A Gonzales, and RJ Booth. 2007. The development and psychometric properties of liwc2007: LIWC.net.
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). pages 1532–1543.
  • Pereira et al. (2005) D. Pereira, E. Oliveira, N. Moreira, and L. Sarmento. 2005. Towards an architecture for emotional bdi agents. In 2005 Portuguese Conference on Artificial Intelligence. pages 40–46. https://doi.org/10.1109/EPIA.2005.341262.
  • Petrov et al. (2012) Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012). European Language Resources Association (ELRA). http://www.aclweb.org/anthology/L12-1115.
  • Pianta et al. (2002) Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. Developing an aligned multilingual database. In Proceedings of the 1st International Conference on Global WordNet.
  • Piper and Jean So (2015) Andrew Piper and Richard Jean So. 2015. Quantifying the weepy bestseller. https://newrepublic.com/article/126123/quantifying-weepy-bestseller.
  • Plato (1969) Plato. 1969. Plato in Twelve Volumes. Harvard University Press, Cambridge, MA. Vols. 5 & 6 translated by Paul Shorey. http://classics.mit.edu/Plato/republic.html.
  • Plutchik (1991) Robert Plutchik. 1991. The emotions. University Press of America.
  • Polanyi and Zaenen (2006) Livia Polanyi and Annie Zaenen. 2006. Contextual valence shifters. Computing Attitude and Affect in Text 20:1–10.
  • Popescu et al. (2014) Alexandru Popescu, Joost Broekens, and Maarten Van Someren. 2014. Gamygdala: An emotion engine for games. IEEE Transactions on Affective Computing 5(1):32–44.
  • Porter (1980) Martin F Porter. 1980. An algorithm for suffix stripping. Program 14(3):130–137.
  • Posner et al. (2005) Jonathan Posner, James A. Russell, and Bradley S. Peterson. 2005. The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev Psychopathol 17(3):715–734. https://doi.org/10.1017/S0954579405050340.
  • Radford and Weston (1975) Colin Radford and Michael Weston. 1975. How can we be moved by the fate of anna karenina? Proceedings of the Aristotelian Society, Supplementary Volumes 49:67–93.
  • Rayson and Garside (1998) Paul Rayson and Roger Garside. 1998. The claws web tagger. ICAME Journal 22:121–123.
  • Reagan et al. (2016) Andrew J Reagan, Lewis Mitchell, Dilan Kiley, Christopher M Danforth, and Peter Sheridan Dodds. 2016. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Science 5(1):31.
  • Rebora (2017) Simone Rebora. 2017. A software pipeline for the reception of italian literature in nineteenth-century england: Preliminary testing. In Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage. ACM, pages 129–134.
  • Reed (2018) Ethan Reed. 2018. Measured unrest in the poetry of the black arts movement. In Digital Humanities 2018: Conference Abstracts. Mexico, Mexico.
  • Reitan et al. (2015) Johan Reitan, Jørgen Faret, Björn Gambäck, and Lars Bungum. 2015. Negation scope detection for twitter sentiment analysis. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. ACL, Lisboa, Portugal, pages 99–108.
  • Rill et al. (2014) Sven Rill, Dirk Reinel, Jörg Scheidt, and Roberto V Zicari. 2014. Politwi: Early detection of emerging political topics on twitter and the impact on concept-level sentiment analysis. Knowledge-Based Systems 69:24–33.
  • Rinaldi et al. (2013) Sergio Rinaldi, Pietro Landi, and Fabio Della Rossa. 2013. Small discoveries can have great consequences in love affairs: the case of beauty and the beast. International Journal of Bifurcation and Chaos 23(11):1330038.
  • Robinson (2005) Jenefer Robinson. 2005. Deeper than reason: Emotion and its role in literature, music, and art. Oxford University Press on Demand.
  • Rosenthal et al. (2017) Sara Rosenthal, Noura Farra, and Preslav Nakov. 2017. Semeval-2017 task 4: Sentiment analysis in twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pages 502–518.
  • Ross (1999) Catherine Sheldrick Ross. 1999. Finding without seeking: the information encounter in the context of reading for pleasure. Information Processing & Management 35(6):783–799.
  • Russell (1980) James A. Russell. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39:1161–1178.
  • Russell (1994) James A. Russell. 1994. Is there universal recognition of emotion from facial expression? a review of the cross-cultural studies. Psychological bulletin 115(1):102.
  • Russell (2003) James A. Russell. 2003. Core affect and the psychological construction of emotion. Psychological review 110(1):145.
  • Russell et al. (2003) James A. Russell, Jo-Anne Bachorowski, and José-Miguel Fernández-Dols. 2003. Facial and vocal expressions of emotion. Annual review of psychology 54(1):329–349.
  • Russell and Barrett (1999) James A. Russell and Lisa F. Barrett. 1999. Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant. J Pers Soc Psychol 76(5):805–819.
  • Sætre et al. (2014a) Lars Sætre, Patrizia Lombardo, and Julien Zanetta. 2014a. Exploring Text and Emotions, Aarhus University Press, chapter Text and Emotions, pages 9–26.
  • Sætre et al. (2014b) Lars Sætre, Patrizia Lombardo, and Julien Zanetta. 2014b. Exploring Text and Emotions, volume 13. Aarhus University Press.
  • Samothrakis and Fasli (2015) Spyridon Samothrakis and Maria Fasli. 2015. Emotional sentence annotation helps predict fiction genre. PloS one 10(11):e0141922.
  • Samur et al. (2018) Dalya Samur, Mattie Tops, and Sander L Koole. 2018. Does a single session of reading literary fiction prime enhanced mentalising performance? four replication experiments of kidd and castano (2013). Cognition and Emotion 32:130–144.
  • Scherer and Wallbott (1994) Klaus R. Scherer and Harald G. Wallbott. 1994. Evidence for universality and cultural variation of differential emotion response patterning. Journal of personality and social psychology 66(2):310–328.
  • Schmidtke et al. (2014) David S. Schmidtke, Tobias Schröder, Arthur M. Jacobs, and Markus Conrad. 2014. Angst: affective norms for german sentiment terms, derived from the affective norms for english words. Behav Res Methods 46(4):1108–1118. https://doi.org/10.3758/s13428-013-0426-y.
  • Schreibman et al. (2015) Susan Schreibman, Ray Siemens, and John Unsworth. 2015. A New Companion to Digital Humanities. John Wiley & Sons.
  • Schuff et al. (2017) Hendrik Schuff, Jeremy Barnes, Julian Mohme, Sebastian Padó, and Roman Klinger. 2017. Annotation, modelling and analysis of fine-grained emotions on a stance and sentiment detection corpus. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. pages 13–23.
  • Sloman and Croucher (1981) Aaron Sloman and Monica Croucher. 1981. Why robots will have emotions. Technical report, Sussex University.
  • Smedt and Daelemans (2012) Tom De Smedt and Walter Daelemans. 2012. Pattern for python. Journal of Machine Learning Research 13(Jun):2063–2067.
  • Sohangir et al. (2018) Sahar Sohangir, Dingding Wang, Anna Pomeranets, and Taghi M Khoshgoftaar. 2018. Big data: deep learning for financial sentiment analysis. Journal of Big Data 5(1):3.
  • Sprugnoli et al. (2016) Rachele Sprugnoli, Sara Tonelli, Alessandro Marchetti, and Giovanni Moretti. 2016. Towards sentiment analysis for historical texts. Digital Scholarship in the Humanities 31(4):762–772. https://doi.org/10.1093/llc/fqv027.
  • Stone et al. (1968) Philip J Stone, Dexter C Dunphy, and Marshall S Smith. 1968. The general inquirer: A computer approach to content analysis. American Journal of Sociology 73(5):634–635.
  • Strapparava and Mihalcea (2007) Carlo Strapparava and Rada Mihalcea. 2007. Semeval-2007 task 14: Affective text. In SemEval. ACL, Prague, Czech Republic, pages 70–74. http://www.aclweb.org/anthology/S/S07/S07-1013.
  • Strapparava and Valitutti (2004) Carlo Strapparava and Alessandro Valitutti. 2004. Wordnet-affect: an affective extension of wordnet. In Language Resources and Evaluation Conference. pages 1083–1086.
  • Suttles and Ide (2013) Jared Suttles and Nancy Ide. 2013. Distant supervision for emotion classification with discrete binary values. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, Berlin, Heidelberg, pages 121–136.
  • Sweetser and Johnson (2004) Penelope Sweetser and Daniel Johnson. 2004. Player-centered game environments: Assessing player opinions, experiences, and issues. In ICEC. Springer, pages 321–332.
  • Taboada et al. (2006a) Maite Taboada, Caroline Anthony, and Kimberly Voll. 2006a. Methods for creating semantic orientation dictionaries. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC’06). pages 427–432.
  • Taboada et al. (2006b) Maite Taboada, Mary Ann Gillies, and Paul McFetridge. 2006b. Sentiment classification techniques for tracking literary reputation. In LREC workshop: towards computational models of literary analysis. pages 36–43.
  • Taboada et al. (2008) Maite Taboada, Mary Ann Gillies, Paul McFetridge, and Robert Outtrim. 2008. Tracking literary reputation with text analysis tools. In Meeting of the Society for Digital Humanities.
  • Täckström and McDonald (2011) Oscar Täckström and Ryan McDonald. 2011. Semi-supervised latent variable models for sentence-level sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2. Association for Computational Linguistics, Stroudsburg, PA, USA, HLT ’11, pages 569–574. http://dl.acm.org/citation.cfm?id=2002736.2002848.
  • Tamborini et al. (2010) Ron Tamborini, Nicholas David Bowman, Allison Eden, Matthew Grizzard, and Ashley Organ. 2010. Defining media enjoyment as the satisfaction of intrinsic needs. Journal of communication 60(4):758–777.
  • Tang et al. (2014) Duyu Tang, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014. Learning sentiment-specific word embedding for twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). volume 1, pages 1555–1565.
  • Tang et al. (2018) Tiffany Y Tang, Pinata Winoto, Aonan Guan, and Guanxing Chen. 2018. The foreign language effect and movie recommendation: A comparative study of sentiment analysis of movie reviews in chinese and english. In Proceedings of the 2018 10th International Conference on Machine Learning and Computing. ACM, pages 79–84.
  • Titov and McDonald (2008) Ivan Titov and Ryan McDonald. 2008. A joint model of text and aspect ratings for sentiment summarization. In Proceedings of ACL-08: HLT. Association for Computational Linguistics, pages 308–316. http://www.aclweb.org/anthology/P08-1036.
  • Tolstoy (1962) Leo Tolstoy. 1962. What is art?: and essays on art, volume 331. Reprint Services Corp.
  • Tomkins (1962) Silvan Tomkins. 1962. Affect imagery consciousness: Volume I: The positive affects. Springer publishing company.
  • Tullmann and Buckwalter (2014) Katherine Tullmann and Wesley Buckwalter. 2014. Does the paradox of fiction exist? Erkenntnis 79(4):779–796. https://doi.org/10.1007/s10670-013-9563-z.
  • Turney (2002) Peter D Turney. 2002. Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. http://www.aclweb.org/anthology/P02-1053.
  • Turney and Littman (2003) Peter D Turney and Michael L Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS) 21(4):315–346.
  • Uszkoreit (2001) Hans Uszkoreit. 2001. Repräsentationen und Prozesse in der Sprachverarbeitung. http://www.coli.uni-saarland.de/~hansu/Verarbeitung.html. Accessed: 2018-07-01.
  • Van Horn (1997) Leigh Van Horn. 1997. The characters within us: Readers connect with characters to create meaning and understanding. Journal of Adolescent & Adult Literacy 40(5):342–347.
  • van Meel (1995) Jacques M van Meel. 1995. Representing emotions in literature and paintings: a comparative analysis. Poetics 23(1-2):159–176.
  • Vanhoutte (2013) Edward Vanhoutte. 2013. The gates of hell: History and definition of digital | humanities | computing, Ashgate Surrey, pages 119–56.
  • Velásquez (1998) Juan David Velásquez. 1998. When robots weep: emotional memories and decision-making. In AAAI-98 Proceedings.
  • Vorderer et al. (2004) Peter Vorderer, Christoph Klimmt, and Ute Ritterfeld. 2004. Enjoyment: At the heart of media entertainment. Communication theory 14(4):388–408.
  • Walton (1978) Kendall L Walton. 1978. Fearing fictions. The Journal of Philosophy 75(1):5–27.
  • Wan (2009) Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Association for Computational Linguistics, pages 235–243. http://www.aclweb.org/anthology/P09-1027.
  • Ward Jr. (1963) Joe H. Ward Jr. 1963. Hierarchical grouping to optimize an objective function. Journal of the American statistical association 58(301):236–244.
  • Warriner et al. (2013) Amy Beth Warriner, Victor Kuperman, and Marc Brysbaert. 2013. Norms of valence, arousal, and dominance for 13,915 english lemmas. Behavior Research Methods 45(4):1191–1207. https://doi.org/10.3758/s13428-012-0314-x.
  • Wei and Pal (2010) Bin Wei and Christopher Pal. 2010. Cross lingual adaptation: An experiment on sentiment classifications. In ACL. Uppsala, Sweden, pages 258–262.
  • Wright (1997) Ian Paul Wright. 1997. Emotional agents. Ph.D. thesis, University of Birmingham.
  • Xia et al. (2015) Rui Xia, Feng Xu, Chengqing Zong, Qianmu Li, Yong Qi, Tao Li, et al. 2015. Dual sentiment analysis: Considering two sides of one review. IEEE Transactions on Knowledge and Data Engineering 27(8):2120–2133. https://doi.org/10.1109/TKDE.2015.2407371.
  • Yang and Cardie (2013) Bishan Yang and Claire Cardie. 2013. Joint inference for fine-grained opinion extraction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria.
  • Yannakakis (2012) Georgios N Yannakakis. 2012. Game ai revisited. In Proceedings of the 9th conference on Computing Frontiers. ACM, pages 285–292.
  • Yaqub et al. (2017) Ussama Yaqub, Soon Ae Chun, Vijayalakshmi Atluri, and Jaideep Vaidya. 2017. Analysis of political discourse on twitter in the context of the 2016 us presidential elections. Government Information Quarterly 34(4):613–626.
  • Yu (2008) Bei Yu. 2008. An evaluation of text classification methods for literary study. Literary and Linguistic Computing 23(3):327–343.
  • Zehe et al. (2016) Albin Zehe, Martin Becker, Lena Hettinger, Andreas Hotho, Isabella Reger, and Fotis Jannidis. 2016. Prediction of happy endings in german novels based on sentiment information. In 3rd Workshop on Interactions between Data Mining and Natural Language Processing, Riva del Garda, Italy. page 9.
  • Zhang et al. (2018) Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8(4):1–25. https://doi.org/10.1002/widm.1253.
  • Zhou (2003) Xiaoming Zhou. 2003. An affective student model to assess emotions in an educational game. Ph.D. thesis, University of British Columbia.
  • Zhuravlev et al. (2014) Mikhail Zhuravlev, Irina Golovacheva, and Polina de Mauny. 2014. Mathematical modelling of love affairs between the characters of the pre-masochistic novel. In 2014 Second World Conference on Complex Systems (WCCS). pages 396–401.
  • Zillmann (1988) Dolf Zillmann. 1988. Mood management through communication choices. American Behavioral Scientist 31(3):327–340.
  • Zillmann et al. (1980) Dolf Zillmann, Richard T Hezel, and Norman J Medoff. 1980. The effect of affective states on selective exposure to televised entertainment fare. Journal of Applied Social Psychology 10(4):323–339.