Human language reveals a universal positivity bias
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
Human language—our great social technology—reflects that which it describes through the stories it allows to be told, and us, the tellers of those stories. While language’s shaping effect on thinking has long been controversial Whorf (1956); Chomsky (1957); Pinker (1994), we know that a rich array of metaphor encodes our conceptualizations Lakoff and Johnson (1980), word choice reflects our internal motives and immediate social roles Campbell and Pennebaker (2003); Newman (2003); Pennebaker (2011), and the way a language represents the present and future may condition economic choices Chen (2013).
In 1969, Boucher and Osgood framed the Pollyanna Hypothesis: a hypothetical, universal positivity bias in human communication Boucher and Osgood (1969). From a selection of small-scale, cross-cultural studies, they marshaled evidence that positive words are likely more prevalent, more meaningful, more diversely used, and more readily learned. However, in being far from an exhaustive, data-driven analysis of language—the approach we take here—their findings could only be regarded as suggestive. Indeed, studies of the positivity of isolated words and word stems have produced conflicting results, some pointing toward a positivity bias Bradley and Lang (1999), others the opposite Stone et al. (1966); Pennebaker et al. (2007), though attempts to adjust for usage frequency tend to recover a positivity signal Jurafsky et al. (2014).
To deeply explore the positivity of human language, we constructed 24 corpora spread across 10 languages (see Supplementary Online Material). Our global coverage of linguistically and culturally diverse languages includes English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese (Simplified), Russian, Indonesian, and Arabic. The sources of our corpora are similarly broad, spanning books (14), news outlets, social media, the web goo (2006), television and movie subtitles, and music lyrics Dodds and Payne (2009). Our work here greatly expands upon our earlier study of English alone, where we found strong evidence for a usage-invariant positivity bias Kloumann et al. (2012).
We address the social nature of language in two important ways: (1) we focus on the words people most commonly use, and (2) we measure how those same words are received by individuals. We take word usage frequency as the primary organizing measure of a word’s importance. Such a data-driven approach is crucial both for understanding the structure of language and for creating linguistic instruments for principled measurements Dodds et al. (2011); Mitchell et al. (2013). By contrast, earlier studies focusing on meaning and emotion have used ‘expert’ generated word lists, and these fail to statistically match frequency distributions of natural language Osgood et al. (1957); Stone et al. (1966); Bradley and Lang (1999); Pennebaker et al. (2007), confounding attempts to make claims about language in general. For each of our corpora we selected between 5,000 and 10,000 of the most frequently used words, choosing the exact numbers so that we obtained approximately 10,000 words for each language.
We then paid native speakers to rate how they felt in response to individual words on a 9 point scale, with 1 corresponding to most negative or saddest, 5 to neutral, and 9 to most positive or happiest Bradley and Lang (1999); Dodds et al. (2011) (see also Supplementary Online Material). This happy-sad semantic differential Osgood et al. (1957) functions as a coupling of two standard 5-point Likert scales. Participants were restricted to certain regions or countries (for example, Portuguese was rated by residents of Brazil). Overall, we collected 50 ratings per word for a total of around 5,000,000 individual human assessments, and we provide all data sets as part of the Supplementary Online Material.
In Fig. 1, we show distributions of the average happiness scores for all 24 corpora, leading to our most general observation of a clear positivity bias in natural language. We indicate the above neutral part of each distribution with yellow, below neutral with blue, and order the distributions moving upwards by increasing median (vertical red line). For all corpora, the median clearly exceeds the neutral score of 5. The background gray lines connect deciles for each distribution. In Fig. S1, we provide the same distributions ordered instead by increasing variance.
As is evident from the ordering in Figs. 1 and S1, while a positivity bias is the universal rule, there are minor differences between the happiness distributions of languages. For example, Latin American-evaluated corpora (Mexican Spanish and Brazilian Portuguese) exhibit relatively high medians and, to a lesser degree, higher variances. For other languages, we see those with multiple corpora have more variable medians, and specific corpora are not ordered by median in the same way across languages (e.g., Google Books has a lower median than Twitter for Russian, but the reverse is true for German and English). In terms of emotional variance, all four English corpora are among the highest, while Chinese and Russian Google Books seem especially constrained.
We now examine how individual words themselves vary in their average happiness score between languages. Owing to the scale of our corpora, we were compelled to use an online service, choosing Google Translate. For each of the 45 language pairs, we translated isolated words from one language to the other and then back. We then found all word pairs that (1) were translationally-stable, meaning the forward and back translation returns the original word, and (2) appeared in our corpora for each language.
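The round-trip criterion above can be illustrated with a minimal sketch. The toy dictionaries below are hypothetical stand-ins for an online translation service, and the function name `translation_stable_pairs` is ours for illustration only; it is not the procedure's actual implementation.

```python
from itertools import combinations

LANGUAGES = ["English", "Spanish", "French", "German", "Portuguese",
             "Korean", "Chinese", "Russian", "Indonesian", "Arabic"]
# Unordered pairs of the 10 languages: 10 * 9 / 2 = 45.
language_pairs = list(combinations(LANGUAGES, 2))


def translation_stable_pairs(words_a, words_b, a_to_b, b_to_a):
    """Return {word_a: word_b} for words that (1) survive the round trip
    a -> b -> a unchanged and (2) whose translation is in words_b."""
    stable = {}
    for word in words_a:
        forward = a_to_b(word)
        if forward is None or forward not in words_b:
            continue  # translation missing or not in the other corpus
        if b_to_a(forward) == word:
            stable[word] = forward
    return stable


# Hypothetical toy data standing in for a translation service.
en_to_es = {"happy": "feliz", "lying": "acostado", "war": "guerra"}
es_to_en = {"feliz": "happy", "acostado": "lying down", "guerra": "war"}
english_words = {"happy", "lying", "war"}
spanish_words = {"feliz", "acostado", "guerra"}

stable = translation_stable_pairs(
    english_words, spanish_words, en_to_es.get, es_to_en.get
)
```

In this toy example, 'lying' is dropped because its back translation does not return the original word.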
We provide the resulting comparison between languages at the level of individual words in Fig. 2. We use the mean of each language’s word happiness distribution derived from its merged corpora to generate a rough overall ordering, acknowledging that frequency of usage is no longer meaningful, and moreover is not relevant as we are now investigating the properties of individual words. Each cell shows a heat map comparison with word density increasing as shading moves from gray to white. The background colors reflect the ordering of each pair of languages, yellow if the row language had a higher average happiness than the column language, and blue for the reverse. In each cell, we display the number of translation-stable words between language pairs along with the difference in average word happiness, where each word is equally weighted.
A linear relationship is clear for each language-language comparison, and is supported by Pearson’s correlation coefficient being in the range 0.73 to 0.89 ($p$-value across all pairs; see Fig. 2 and Tabs. S3, S4, and S5). Overall, this strong agreement between languages, previously observed on a small scale for a Spanish-English translation Redondo et al. (2007), suggests that approximate estimates of word happiness for unscored languages could be generated at no expense from our existing data set. Some words will of course translate unsatisfactorily, with the dominant meaning changing between languages. For example ‘lying’ in English, most readily interpreted as speaking falsehoods by our participants, translates to ‘acostado’ in Spanish, meaning recumbent. Nevertheless, happiness scores obtained by translation will be serviceable for purposes where the effects of many different words are incorporated. (See the Supplementary Online Material for links to an interactive visualization of Fig. 2.)
Stepping back from examining inter-language robustness, we return to a more detailed exploration of the rich structure of each corpus’s happiness distribution. In Fig. 3, we show how average word happiness is largely independent of word usage frequency for four example corpora. We first plot usage frequency rank of the 5000 most frequently used words as a function of their average happiness score (background dots), along with some example evenly-spaced words. (We note that words at the extremes of the happiness scale are ones evaluators agreed upon strongly, while words near neutral range from being clearly neutral (e.g., an average score of 4.98) to contentious with high standard deviation Kloumann et al. (2012).) We then compute deciles for contiguous sets of 500 words, sliding this window through the rank ordering. These deciles form the vertical strands. We overlay randomly chosen, equally-spaced example words to give a sense of each corpus’s emotional texture.
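As a minimal sketch of the sliding-window decile computation described above, the following assumes a list of average happiness scores ordered by usage rank. The step size between windows and the quantile interpolation method are our assumptions, as is the synthetic data.

```python
from statistics import quantiles


def sliding_deciles(scores_by_rank, window=500, step=100):
    """Compute the nine decile cut points for contiguous windows of
    `window` words, sliding through the rank-ordered score list.

    Returns a list of (first_rank_in_window, [d10, ..., d90]) tuples.
    The step size is an assumption; it is not specified in the text.
    """
    strands = []
    for start in range(0, len(scores_by_rank) - window + 1, step):
        chunk = scores_by_rank[start:start + window]
        deciles = quantiles(chunk, n=10, method="inclusive")
        strands.append((start + 1, deciles))
    return strands


# Synthetic scores for 5000 ranked words (hypothetical data, range 3 to 7).
scores = [5 + ((i * 37) % 100 - 50) / 25 for i in range(5000)]
strands = sliding_deciles(scores, window=500, step=100)
```

Plotting each decile against the window position would give strands of the kind described for Fig. 3.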
We chose the four example corpora shown in Fig. 3 to be disparate in nature, covering diverse languages (French, Egyptian Arabic, Brazilian Portuguese, and Chinese), regions of the world (Europe, the Middle East, South America, and Asia), and texts (Twitter, movies and television, the Web goo (2006), and books (14)). In the Supplementary Online Material, we show all 24 corpora yield similar plots (see Figs. S4–S7 and English translated versions, Figs. S12–S15). We also show how the standard deviation for word happiness exhibits an approximate self-similarity (Figs. S8–S11 and their translations, Figs. S16–S19).
Across all corpora, we observe visually that the deciles tend to stay fixed or move slightly toward the negative, with some expected fragility at the 10% and 90% levels (due to the distributions’ tails), indicating that each corpus’s overall happiness distribution approximately holds independent of word usage. In Fig. 3, for example, we see that both the Brazilian Portuguese and French examples show a small shift to the negative for increasingly rare words, while there is no visually clear trend for the Arabic and Chinese cases. Fitting a linear model of average happiness against usage rank typically returns a slope on the order of $-1 \times 10^{-5}$, suggesting that average happiness decreases by about 0.1 per 10,000 words. For standard deviations of happiness scores (Figs. S8–S11), we find a similarly weak drift toward higher values for increasingly rare words (see Tabs. S6 and S7 for correlations and linear fits for average happiness and its standard deviation as a function of word rank for all corpora). We thus find that, to first order, not just the positivity bias, but the happiness distribution itself applies for common words and rare words alike, revealing an unexpected addition to the many well known scalings found in natural language, famously exemplified by Zipf’s law Zipf (1949).
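To make the arithmetic of the rank dependence concrete, the sketch below fits a straight line to happiness versus rank and converts the slope into a change per 10,000 words. We assume ordinary least squares here, which may differ from the fitting procedure actually used, and the data are synthetic.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, noise-free example with a slope of -1e-5 per rank.
ranks = list(range(1, 5001))
scores = [5.5 - 1e-5 * r for r in ranks]

slope, intercept = linear_fit(ranks, scores)
change_per_10k_words = slope * 10_000  # about -0.1 for this example
```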
In constructing language-based instruments for measuring expressed happiness, such as our hedonometer Dodds et al. (2011), this frequency independence provides a way to ‘increase the gain’ that resembles the tuning of standard physical instruments. Moreover, we have earlier demonstrated the robustness of our hedonometer for the English language, showing, for example, that measurements derived from Twitter correlate strongly with Gallup well-being polls and related indices at the state and city level for the United States Mitchell et al. (2013).
Here, we provide an illustrative use of our hedonometer in the realm of literature, inspired by Vonnegut’s shapes of stories Vonnegut (2005); (24). In Fig. 4, we show ‘happiness time series’ for three famous works of literature, evaluated in their original languages English, Russian, and French: A. Melville’s Moby Dick (25), B. Dostoyevsky’s Crime and Punishment (26), and C. Dumas’ Count of Monte Cristo (25). We slide a 10,000-word window through each work, computing the average happiness using a ‘lens’ for the hedonometer in the following manner. We capitalize on our instrument’s tunability to obtain a strong signal by excluding all words whose average happiness lies in a central band around neutral, i.e., we keep words residing in the tails of each distribution Dodds et al. (2011). Denoting a given lens by its corresponding set of allowed words $L$, we estimate the happiness score of any text $T$ as $h_{\rm avg}^{(T)} = \sum_{w \in L} h_{\rm avg}(w) f_w^{(T)} / \sum_{w \in L} f_w^{(T)}$, where $h_{\rm avg}(w)$ is the average happiness score of word $w$ and $f_w^{(T)}$ is the frequency of word $w$ in $T$ Dodds and Danforth (2009).
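The lens-based measurement just described can be sketched as a frequency-weighted average over allowed words, applied within a sliding window. The thresholds `low` and `high`, the window step, and the toy word scores below are all assumptions for illustration and are not values from our data set.

```python
from collections import Counter


def lens_words(scores, low, high):
    """Allowed words: those whose score is NOT strictly between low and high."""
    return {w for w, h in scores.items() if h <= low or h >= high}


def text_happiness(tokens, scores, allowed):
    """Frequency-weighted average happiness over allowed words only.
    Returns None if the text contains no allowed words."""
    counts = Counter(t for t in tokens if t in allowed)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * f for w, f in counts.items()) / total


def happiness_series(tokens, scores, allowed, window=10_000, step=1_000):
    """Slide a fixed-size window through the text (step is an assumption)."""
    last_start = max(len(tokens) - window, 0)
    return [
        text_happiness(tokens[i:i + window], scores, allowed)
        for i in range(0, last_start + 1, step)
    ]


# Hypothetical toy scores on the 1-9 scale.
toy_scores = {"love": 8.4, "the": 5.0, "die": 2.0, "sea": 5.5}
allowed = lens_words(toy_scores, low=4, high=6)  # keeps 'love' and 'die'
tokens = ["the", "love", "the", "sea", "die", "love"]

overall = text_happiness(tokens, toy_scores, allowed)
series = happiness_series(tokens, toy_scores, allowed, window=3, step=3)
```

Because neutral words are excluded, the score responds only to words in the tails, which is what gives the stronger signal.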
The three resulting happiness time series provide interesting, detailed views of each work’s narrative trajectory, revealing numerous peaks and troughs throughout, at times clearly dropping below neutral. Both Moby Dick and Crime and Punishment end on low notes, whereas the Count of Monte Cristo culminates with a rise in positivity, accurately reflecting the finishing arcs of all three. The ‘word shifts’ overlaying the time series compare two distinct regions of each work, showing how changes in word abundances lead to overall shifts in average happiness. Such word shifts are essential tests of any sentiment measurement, and are made possible by the linear form of our instrument Dodds and Danforth (2009); Dodds et al. (2011) (see the sections Explanation of Word Shifts through Full Word Shifts in the Supplementary Online Material for a full explanation). As one example, the third word shift for Moby Dick shows why the average happiness of the last 10% of the book is well below that of the first 25%. The major contribution is an increase in relatively negative words including ‘missing’, ‘shot’, ‘poor’, ‘die’, and ‘evil’. We include full diagnostic versions of all word shifts in Figs. S21–S34.
By adjusting the lens, many other related time series can be formed such as those produced by focusing on only positive or negative words. Emotional variance as a function of text position can also be readily extracted. In the Supplementary Online Material, we provide links to online, interactive versions of these graphs where different lenses and regions of comparisons may be easily explored. Beyond this example tool we have created here for the digital humanities and our hedonometer for measuring population well-being, the data sets we have generated for the present study may be useful in creating a great variety of language-based instruments for assessing emotional expression.
Overall, our major scientific finding is that when experienced in isolation and weighted properly according to usage, words—the atoms of human language—present an emotional spectrum with a universal, self-similar positive bias. We emphasize that this apparent linguistic encoding of our social nature is a system level property, and in no way asserts all natural texts will skew positive (as exemplified by certain passages of the three works in Fig. 4), or diminishes the salience of negative states Forgas (2013). Nevertheless, a general positive bias points towards a positive social evolution, and may be linked to the gradual if haphazard trajectory of modern civilization toward greater human rights and decreases in violence Pinker (2011). Going forward, our word happiness assessments should be periodically repeated, and carried out for new languages, tested on different demographics, and expanded to phrases, both for the improvement of hedonometric instruments and to chart the dynamics of our collective social self.
- Whorf (1956) B. L. Whorf, Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf, edited by J. B. Carroll (MIT Press, Cambridge, MA, 1956).
- Chomsky (1957) N. Chomsky, Syntactic Structures (Mouton, The Hague/Paris, 1957).
- Pinker (1994) S. Pinker, The Language Instinct: How the Mind Creates Language (William Morrow and Company, New York, NY, 1994).
- Lakoff and Johnson (1980) G. Lakoff and M. Johnson, Metaphors We Live By (University of Chicago Press, Chicago, IL, 1980).
- Campbell and Pennebaker (2003) S. R. Campbell and J. W. Pennebaker, Psychological Science 14, 60 (2003).
- Newman (2003) M. E. J. Newman, SIAM Rev. 45, 167 (2003).
- Pennebaker (2011) J. W. Pennebaker, The Secret Life of Pronouns: What Our Words Say About Us (Bloomsbury Press, New York, NY, 2011).
- Chen (2013) M. K. Chen, American Economic Review 103, 690 (2013).
- Boucher and Osgood (1969) J. Boucher and C. E. Osgood, Journal of Verbal Learning and Verbal Behavior 8, 1 (1969).
- Bradley and Lang (1999) M. M. Bradley and P. J. Lang, Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings, Technical report C-1 (University of Florida, Gainesville, FL, 1999).
- Stone et al. (1966) P. J. Stone, D. C. Dunphy, M. S. Smith, and D. M. Ogilvie, The General Inquirer: A Computer Approach to Content Analysis (MIT Press, Cambridge, MA, 1966).
- Pennebaker et al. (2007) J. W. Pennebaker, R. J. Booth, and M. E. Francis, “Linguistic inquiry and word count: Liwc 2007,” at http://bit.ly/S1Dk2L, accessed May 15, 2014. (2007).
- Jurafsky et al. (2014) D. Jurafsky, V. Chahuneau, B. R. Routledge, and N. A. Smith, First Monday 19 (2014).
- (14) Google Labs ngram viewer. Available at http://ngrams.googlelabs.com/. Accessed May 15, 2014.
- goo (2006) (2006), Google Web 1T 5-gram Version 1, distributed by the Linguistic Data Consortium (LDC).
- Dodds and Payne (2009) P. S. Dodds and J. L. Payne, Phys. Rev. E 79, 066115 (2009).
- Kloumann et al. (2012) I. M. Kloumann, C. M. Danforth, K. D. Harris, C. A. Bliss, and P. S. Dodds, PLoS ONE 7, e29484 (2012).
- Dodds et al. (2011) P. S. Dodds, K. D. Harris, I. M. Kloumann, C. A. Bliss, and C. M. Danforth, PLoS ONE 6, e26752 (2011).
- Mitchell et al. (2013) L. Mitchell, M. R. Frank, K. D. Harris, P. S. Dodds, and C. M. Danforth, PLoS ONE 8, e64417 (2013).
- Osgood et al. (1957) C. Osgood, G. Suci, and P. Tannenbaum, The Measurement of Meaning (University of Illinois, Urbana, IL, 1957).
- Redondo et al. (2007) J. Redondo, I. Fraga, I. Padron, and M. Comesana, Behavior Research Methods 39, 600 (August 2007).
- Zipf (1949) G. K. Zipf, Human Behaviour and the Principle of Least-Effort (Addison-Wesley, Cambridge, MA, 1949).
- Vonnegut (2005) K. Vonnegut, Jr., A Man Without a Country (Seven Stories Press, New York, 2005).
- (24) “Kurt Vonnegut on the shapes of stories,” https://www.youtube.com/watch?v=oP3c1h8v2ZQ, accessed May 15, 2014.
- (25) The Gutenberg Project: http://www.gutenberg.org; accessed November 15, 2013.
- (26) F. Dostoyevsky, ‘‘Crime and punishment,” Original Russian text. Obtained from http://ilibrary.ru/text/69/p.1/index.html, accessed December 15, 2013.
- Dodds and Danforth (2009) P. S. Dodds and C. M. Danforth, Journal of Happiness Studies (2009), doi:10.1007/s10902-009-9150-9.
- Forgas (2013) J. P. Forgas, Current Directions in Psychological Science 22, 225 (2013).
- Pinker (2011) S. Pinker, The Better Angels of Our Nature: Why Violence Has Declined (Viking Books, New York, 2011).
- (30) Twitter API. Available at http://dev.twitter.com/. Accessed October 24, 2011.
- Michel et al. (2011) J.-B. Michel, Y. K. Shen, A. P. Aiden, A. Veres, M. K. Gray, The Google Books Team, J. P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, S. Pinker, M. A. Nowak, and E. A. Lieberman, Science 331, 176 (2011).
- Sandhaus (2008) E. Sandhaus, “The New York Times Annotated Corpus,” Linguistic Data Consortium, Philadelphia (2008).
The authors acknowledge I. Ramiscal, C. Burke, P. Carrigan, M. Koehler, and Z. Henscheid, in part for their roles in developing hedonometer.org. The authors are also grateful for conversations with F. Henegan, A. Powers, and N. Atkins. PSD was supported by NSF CAREER Award # 0846668.
Supplementary Online Material
Online, interactive visualizations:
Spatiotemporal hedonometric measurements of Twitter across all 10 languages can be explored at hedonometer.org.
We provide the following resources online at http://www.uvm.edu/~storylab/share/papers/dodds2014a/.
Example scripts for parsing and measuring average happiness scores for texts;
D3 and Matlab scripts for generating word shifts;
Visualizations for exploring translation-stable word pairs across languages;
Interactive time series for Moby Dick, Crime and Punishment, the Count of Monte Cristo, and other works of literature.
|Corpus||# Words||Reference(s)|
|English: Twitter||5000||(30); Dodds et al. (2011)|
|English: Google Books Project||5000||(14); Michel et al. (2011)|
|English: The New York Times||5000||Sandhaus (2008)|
|English: Music lyrics||5000||Dodds and Danforth (2009)|
|Portuguese: Google Web Crawl||7133||goo (2006)|
|Portuguese: Twitter||7119||(30)|
|Spanish: Google Web Crawl||7189||goo (2006)|
|Spanish: Twitter||6415||(30)|
|Spanish: Google Books Project||6379||(14); Michel et al. (2011)|
|French: Google Web Crawl||7056||goo (2006)|
|French: Twitter||6569||(30)|
|French: Google Books Project||6192||(14); Michel et al. (2011)|
|Arabic: Movie and TV subtitles||9999||The MITRE Corporation|
|Indonesian: Twitter||7044||(30)|
|Indonesian: Movie subtitles||6726||The MITRE Corporation|
|Russian: Twitter||6575||(30)|
|Russian: Google Books Project||5980||(14); Michel et al. (2011)|
|Russian: Movie and TV subtitles||6186||goo (2006)|
|German: Google Web Crawl||6902||goo (2006)|
|German: Twitter||6459||(30)|
|German: Google Books Project||6097||(14); Michel et al. (2011)|
|Korean: Twitter||6728||(30)|
|Korean: Movie subtitles||5389||The MITRE Corporation|
|Chinese: Google Books Project||10000||(14); Michel et al. (2011)|
|Language||Participants’ location(s)|
|English||United States of America, India|
|Korean||Korea, United States of America|
We used the services of Appen Butler Hill (http://www.appen.com) for all word evaluations excluding English, for which we had earlier employed Mechanical Turk (https://www.mturk.com/ Kloumann et al. (2012)).
English instructions were translated to all other languages and given to participants along with survey questions, and an example of the English instruction page is below. Non-English language experiments were conducted through a custom interactive website built by Appen Butler Hill, and all participants were required to pass a stringent oral proficiency test in their own language.
Sizes and sources for our 24 corpora are given in Tab. S1.
We used Mechanical Turk to obtain evaluations of the four English corpora Kloumann et al. (2012). For all non-English assessments, we contracted the translation services company Appen Butler Hill. For each language, participants were required to be native speakers, to have grown up in the country where the language is spoken, and to pass a strenuous online aural comprehension test.
Notes on corpus generation
There is no single, principled way to merge corpora to create an ordered list of words for a given language. For example, it is impossible to weight the most commonly used words in the New York Times against those of Twitter. Nevertheless, we are obliged to choose some method for doing so to facilitate comparisons across languages and for the purposes of building adaptable linguistic instruments.
For each language, we created a single quasi-ranked word list by finding the smallest integer $r$ such that the union of all words with rank at most $r$ in at least one corpus formed a set of at least 10,000 words.
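The merging rule above can be sketched as follows, assuming each corpus is given as a list of words sorted by decreasing usage frequency. The function name and the toy word lists are hypothetical.

```python
def merge_corpora(ranked_lists, target=10_000):
    """Find the smallest rank cutoff r such that the union of all words
    with rank <= r in at least one corpus has at least `target` words.

    Returns (r, merged_word_set). If the target cannot be reached, the
    full union and the largest available rank are returned.
    """
    max_len = max(len(words) for words in ranked_lists)
    merged = set()
    for r in range(1, max_len + 1):
        for words in ranked_lists:
            if r <= len(words):
                merged.add(words[r - 1])  # word at rank r in this corpus
        if len(merged) >= target:
            return r, merged
    return max_len, merged


# Hypothetical toy corpora, most frequent word first.
corpus_a = ["the", "of", "and", "to"]
corpus_b = ["the", "rt", "of", "lol"]

cutoff, merged = merge_corpora([corpus_a, corpus_b], target=5)
```

Because corpora share many common words, the cutoff typically lies well above the target divided by the number of corpora.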
For Twitter, we first checked if a string contains at least one valid UTF-8 letter, discarding if not. Next we filtered out strings containing invisible control characters, as these symbols can be problematic. We ignored all strings that start with < and end with > (generally HTML code). We ignored strings with a leading @ or &, or with either preceded by standard punctuation (e.g., Twitter IDs), but kept hashtags. We also removed all strings starting with www. or http: or ending in .com (all websites). We stripped the remaining strings of standard punctuation, and we replaced all double quotes (”) by single quotes (’). Finally, we converted all Latin alphabet letters to lowercase.
A simple example of this tokenization process would be:
|Term||Count|
|love||19|
|#love||3|
|love87||1|
The term ‘@love’ is discarded, and all other terms map to either ‘love’ or ‘love87’.
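A rough sketch of the Twitter filtering and normalization rules described above is given below. The exact set of 'standard punctuation', the use of Unicode categories Cc and Cf for invisible characters, and the raw input counts are all our assumptions; note also that `str.lower` lowercases every cased script, a simplification of the Latin-only rule.

```python
import string
import unicodedata
from collections import Counter

# Assumed definition of "standard punctuation": ASCII punctuation
# except the characters the rules treat specially (#, @, &, ').
PUNCT = "".join(c for c in string.punctuation if c not in "#@&'")


def normalize(token):
    """Return the normalized term, or None if the token is discarded."""
    if not any(ch.isalpha() for ch in token):
        return None  # must contain at least one letter
    if any(unicodedata.category(ch) in ("Cc", "Cf") for ch in token):
        return None  # control or invisible format characters
    if token.startswith("<") and token.endswith(">"):
        return None  # generally HTML code
    core = token.lstrip(PUNCT)
    if core.startswith(("@", "&")):
        return None  # user IDs and entities, even after punctuation
    lowered = core.lower()
    if lowered.startswith(("www.", "http:")) or lowered.endswith(".com"):
        return None  # websites
    cleaned = core.strip(PUNCT).replace('"', "'")
    return cleaned.lower() or None


def tally(raw_counts):
    """Merge raw token counts into counts of normalized terms."""
    merged = Counter()
    for token, count in raw_counts.items():
        term = normalize(token)
        if term is not None:
            merged[term] += count
    return merged


# Hypothetical raw counts chosen to reproduce the example table above.
raw = {"love": 10, "LoVE": 5, "love!": 2, ".love": 2,
       "#love": 3, "@love": 1, "love87": 1}
terms = tally(raw)
```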
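|Language||Spanish||Portuguese||English||Indonesian||French||German||Arabic||Russian||Korean||Chinese|
|Spanish||1.00, 0.00||1.01, 0.03||1.06, -0.07||1.22, -0.88||1.11, -0.24||1.22, -0.84||1.13, -0.22||1.31, -1.16||1.60, -2.73||1.58, -2.30|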
|Portuguese||0.99, -0.03||1.00, 0.00||1.04, -0.03||1.22, -0.97||1.11, -0.33||1.21, -0.86||1.09, -0.08||1.26, -0.95||1.62, -2.92||1.58, -2.39|
|English||0.94, 0.06||0.96, 0.03||1.00, 0.00||1.13, -0.66||1.06, -0.23||1.16, -0.75||1.05, -0.10||1.21, -0.91||1.51, -2.53||1.47, -2.10|
|Indonesian||0.82, 0.72||0.82, 0.80||0.88, 0.58||1.00, 0.00||0.92, 0.48||0.99, 0.06||0.89, 0.71||1.02, 0.04||1.31, -1.53||1.33, -1.42|
|French||0.90, 0.22||0.90, 0.30||0.94, 0.22||1.09, -0.52||1.00, 0.00||1.08, -0.44||0.99, 0.12||1.12, -0.50||1.37, -1.88||1.40, -1.77|
|German||0.82, 0.69||0.83, 0.71||0.86, 0.65||1.01, -0.06||0.92, 0.41||1.00, 0.00||0.91, 0.61||1.07, -0.25||1.29, -1.44||1.32, -1.36|
|Arabic||0.88, 0.19||0.92, 0.08||0.95, 0.10||1.12, -0.80||1.01, -0.12||1.10, -0.68||1.00, 0.00||1.12, -0.63||1.40, -2.14||1.43, -2.01|
|Russian||0.76, 0.88||0.80, 0.75||0.83, 0.75||0.98, -0.04||0.89, 0.45||0.93, 0.24||0.89, 0.56||1.00, 0.00||1.26, -1.39||1.25, -1.05|
|Korean||0.62, 1.70||0.62, 1.81||0.66, 1.67||0.77, 1.17||0.73, 1.37||0.78, 1.12||0.71, 1.53||0.79, 1.10||1.00, 0.00||0.98, 0.28|
|Chinese||0.63, 1.46||0.63, 1.51||0.68, 1.43||0.75, 1.07||0.71, 1.26||0.76, 1.03||0.70, 1.41||0.80, 0.84||1.02, -0.29||1.00, 0.00|
|Spanish: Google Web Crawl||-0.114||3.38||-0.090||1.85||-5.55||6.10|
|Spanish: Google Books||-0.040||1.51||-0.016||1.90||-2.28||5.90|
|Portuguese: Google Web Crawl||-0.085||6.33||-0.060||3.23||-3.98||5.96|
|English: Google Books||-0.042||3.03||-0.013||3.50||-3.04||5.62|
|English: New York Times||-0.056||6.93||-0.044||1.99||-4.17||5.61|
|German: Google Web Crawl||-0.096||1.11||-0.082||6.75||-3.67||5.65|
|French: Google Web Crawl||-0.105||9.20||-0.080||1.99||-4.50||5.68|
|Indonesian: Movie subtitles||-0.039||1.48||-0.063||2.45||-2.04||5.45|
|French: Google Books||-0.043||6.80||-0.030||1.71||-2.31||5.49|
|German: Google Books||-0.003||8.12||+0.014||2.74||-1.38||5.45|
|Russian: Movie and TV subtitles||-0.029||2.36||-0.033||9.17||-1.57||5.43|
|Arabic: Movie and TV subtitles||-0.045||7.10||-0.029||4.19||-1.66||5.44|
|Russian: Google Books||+0.030||2.09||+0.070||5.08||+1.20||5.35|
|English: Music Lyrics||-0.073||2.53||-0.081||1.05||-6.12||5.45|
|Korean: Movie subtitles||-0.187||8.22||-0.180||2.01||-9.66||5.41|
|Chinese: Google Books||-0.067||1.48||-0.050||5.01||-1.72||5.21|
|English: Music Lyrics||+0.129||4.87||+0.134||1.63||2.76||1.33|
|English: New York Times||+0.050||4.56||+0.044||1.91||9.34||1.32|
|Arabic: Movie and TV subtitles||+0.101||7.13||+0.101||3.41||9.41||1.01|
|English: Google Books||+0.180||1.68||+0.176||4.96||3.36||1.27|
|Spanish: Google Books||+0.066||1.23||+0.062||6.53||9.17||1.26|
|Indonesian: Movie subtitles||+0.026||3.43||+0.027||2.81||2.87||1.12|
|Russian: Movie and TV subtitles||+0.083||7.60||+0.075||3.28||1.06||0.89|
|French: Google Books||+0.090||1.02||+0.085||1.67||1.25||1.02|
|Spanish: Google Web Crawl||+0.119||4.45||+0.106||2.60||1.45||1.23|
|Portuguese: Google Web Crawl||+0.093||4.06||+0.083||2.91||1.07||1.26|
|French: Google Web Crawl||+0.104||2.12||+0.088||9.64||1.27||1.01|
|Korean: Movie subtitles||+0.171||1.39||+0.185||8.85||2.58||0.88|
|German: Google Books||+0.157||6.06||+0.162||4.96||2.17||1.03|
|German: Google Web Crawl||+0.099||2.05||+0.085||1.18||1.20||1.07|
|Chinese: Google Books||+0.099||3.07||+0.097||3.81||8.70||1.16|
|Russian: Google Books||+0.187||5.15||+0.177||2.24||2.28||0.81|
Explanation of Word Shifts
In this section, we explain our word shifts in detail, both the abbreviated ones included in Figs. 4 and S20, and the more sophisticated, complementary word shifts which follow in this supplementary section. We expand upon the approach described in Dodds and Danforth (2009) and Dodds et al. (2011) to rank and visualize how words contribute to this overall upward shift in happiness.
Shown below is the third inset word shift used in Fig. 4 for the Count of Monte Cristo, a comparison of words found in the last 10% of the book (the comparison text, with average happiness 6.32) relative to those used between 30% and 40% (the reference text, with average happiness 4.82). For this particular measurement, we employed the ‘word lens’ which excluded words whose average happiness lies in a central band around neutral.
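We will use the following probability notation for the normalized frequency of a given word $w$ in a text $T$:
$$p_w^{(T)} = \frac{f_w^{(T)}}{\sum_{w' \in L} f_{w'}^{(T)}},$$
where $f_w^{(T)}$ is the frequency of word $w$ in $T$ with word lens $L$ applied Dodds and Danforth (2009). (The example word shift above uses the lens stated with it.) We then estimate the happiness score of any text $T$ as
$$h_{\rm avg}^{(T)} = \sum_{w \in L} h_{\rm avg}(w)\, p_w^{(T)},$$
where $h_{\rm avg}(w)$ is the average happiness score of a word as determined by our survey.
We can now express the happiness difference between two texts as follows:
$$h_{\rm avg}^{(\rm comp)} - h_{\rm avg}^{(\rm ref)} = \sum_{w \in L} h_{\rm avg}(w) \left[ p_w^{(\rm comp)} - p_w^{(\rm ref)} \right] = \sum_{w \in L} \left[ h_{\rm avg}(w) - h_{\rm avg}^{(\rm ref)} \right] \left[ p_w^{(\rm comp)} - p_w^{(\rm ref)} \right],$$
where we have introduced $h_{\rm avg}^{(\rm ref)}$ as base reference for the average happiness of a word by noting that
$$\sum_{w \in L} h_{\rm avg}^{(\rm ref)} \left[ p_w^{(\rm comp)} - p_w^{(\rm ref)} \right] = h_{\rm avg}^{(\rm ref)} \left( 1 - 1 \right) = 0.$$
We can now see the change in average happiness between a reference and comparison text as depending on how these two quantities behave for each word:
$$\delta h_w = \underbrace{\left[ h_{\rm avg}(w) - h_{\rm avg}^{(\rm ref)} \right]}_{+/-} \, \underbrace{\left[ p_w^{(\rm comp)} - p_w^{(\rm ref)} \right]}_{\uparrow/\downarrow}.$$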
Words can contribute to or work against a shift in average happiness in four possible ways which we encode with symbols and colors:
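$h_{\rm avg}(w) > h_{\rm avg}^{(\rm ref)}$, $p_w^{(\rm comp)} > p_w^{(\rm ref)}$: Words that are more positive than the reference text’s overall average and are used more in the comparison text ($+\uparrow$, strong yellow).
$h_{\rm avg}(w) < h_{\rm avg}^{(\rm ref)}$, $p_w^{(\rm comp)} < p_w^{(\rm ref)}$: Words that are less positive than the reference text’s overall average but are used less in the comparison text ($-\downarrow$, pale blue).
$h_{\rm avg}(w) > h_{\rm avg}^{(\rm ref)}$, $p_w^{(\rm comp)} < p_w^{(\rm ref)}$: Words that are more positive than the reference text’s overall average but are used less in the comparison text ($+\downarrow$, pale yellow).
$h_{\rm avg}(w) < h_{\rm avg}^{(\rm ref)}$, $p_w^{(\rm comp)} > p_w^{(\rm ref)}$: Words that are less positive than the reference text’s overall average and are used more in the comparison text ($-\uparrow$, strong blue).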
Regardless of usage changes, yellow indicates a relatively positive word, blue a negative one. The stronger colors indicate words with the most simple impact: relatively positive or negative words being used more overall.
We order words by the absolute value of their contribution to or against the overall shift, and normalize them as percentages.
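The ranking of per-word contributions can be sketched as below. Each word's contribution is the product of its happiness relative to the reference text's average and its change in normalized frequency, expressed as a percentage of the total shift. The word scores and counts are hypothetical, and the function names are ours.

```python
def normalized(freqs, allowed):
    """Normalized frequencies over the words passed by the lens."""
    total = sum(f for w, f in freqs.items() if w in allowed)
    return {w: f / total for w, f in freqs.items() if w in allowed}


def word_shift(ref_freqs, comp_freqs, scores, allowed):
    """Return (h_ref, h_comp, rows), where rows are (word, kind, percent)
    sorted by decreasing absolute contribution to the overall shift."""
    p_ref = normalized(ref_freqs, allowed)
    p_comp = normalized(comp_freqs, allowed)
    h_ref = sum(scores[w] * p for w, p in p_ref.items())
    h_comp = sum(scores[w] * p for w, p in p_comp.items())
    delta = h_comp - h_ref
    scale = 100.0 / delta if delta != 0 else 0.0  # percentages undefined if 0
    rows = []
    for w in set(p_ref) | set(p_comp):
        dh = scores[w] - h_ref                      # relative positivity
        dp = p_comp.get(w, 0.0) - p_ref.get(w, 0.0)  # change in usage
        # Words with dh == 0 or dp == 0 contribute nothing; their label
        # falls through to the "-" or "down" branch.
        kind = ("+" if dh > 0 else "-") + ("↑" if dp > 0 else "↓")
        rows.append((w, kind, dh * dp * scale))
    rows.sort(key=lambda row: abs(row[2]), reverse=True)
    return h_ref, h_comp, rows


# Hypothetical toy scores and counts.
toy_scores = {"prison": 2.0, "dream": 8.0, "liberty": 7.5}
reference = {"prison": 6, "dream": 2, "liberty": 2}
comparison = {"prison": 2, "dream": 7, "liberty": 1}

h_ref, h_comp, rows = word_shift(reference, comparison, toy_scores,
                                 allowed=set(toy_scores))
```

The percentages sum to 100 when the comparison text is happier than the reference, with negative entries marking words that work against the overall shift.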
Simple Word Shifts
For simple inset word shifts, we show the 10 top words in terms of their absolute contribution to the shift.
Returning to the inset word shift above, we see that an increase in the abundance of relatively positive words ‘excellence’, ‘mer’, and ‘rêve’ ($+\uparrow$, strong yellow) as well as a decrease in the relatively negative words ‘prison’ and ‘prisonnier’ ($-\downarrow$, pale blue) most strongly contribute to the increase in positivity. Some words go against this trend, and in the abbreviated word shift we see less usage of relatively positive words ‘liberté’ and ‘été’ ($+\downarrow$, pale yellow).
The normalized sum total of each of the four categories of words is shown in the summary bars at the bottom of the word shift. For example, the $+\uparrow$ bar represents the total shift due to all relatively positive words that are more prevalent in the comparison text. The smallest contribution comes from relatively negative words being used more ($-\uparrow$, strong blue).
The bottom bar shows the overall shift with a breakdown of how relatively positive and negative words separately contribute. For the Count of Monte Cristo example, we observe an overall increase in the use of relatively positive words and a drop in the use of relatively negative ones (strong yellow and pale blue).
Full Word Shifts
We turn now to explaining the sophisticated word shifts we include at the end of this document. We break down the full word shift corresponding to the simple one we have just addressed for the Count of Monte Cristo, Fig. S34.
First, each word shift has a summary at the top, which describes both the reference and comparison texts, gives their average happiness scores, shows which is happier through an inequality, and functions as a legend showing that average happiness will be marked on graphs with diamonds (filled for the reference text, unfilled for the comparison one).
We note that if two texts are equal in happiness to two decimal places, the word shift will show them as approximately the same. The word shift is still very much informative as word usage will most likely be different between any two large-scale texts.
Below the summary, and taking up the left column of each figure, is the word shift itself for the first 50 words, ordered by contribution rank.
The right column of each figure contains a series of summary and histogram graphics that show how the underlying word distributions for each text give rise to the overall shift. In all cases, and in the manner of the word shift, data for the reference text is on the left, the comparison is on the right. In the histograms, we indicate the lens with a pale red for inclusion, light gray for exclusion. We mark average happiness for each text by black and unfilled diamonds.
First in plot B, we have the bare frequency distributions for each text. The left hand summary compares the sizes of the two texts (the reference is larger in this case), while the histogram gives a detailed view of how each text’s words are distributed according to average happiness.
In plot C, we then apply the lens and renormalize. We can now also use our colors to show the relative positivity or negativity of words. Note that the strong yellow and blue appear on the side of the comparison text, as these words are being used more relative to the reference text, and we are still considering normalized word counts only. The plot on the left shows the sum of the four kinds of counts. We can see that relatively positive words are dominating in terms of pure counts at this stage of the computation.
We move to plot D, where we weight words by their emotional distance from the reference text, that is, each word’s average happiness minus the reference text’s average happiness. We note that in this particular example, the reference text’s average happiness is near neutral (a score of 5), so the shapes of histograms do not change greatly. Also, since this distance is negative for relatively negative words, the colors for the relatively negative words swap from left to right. More frequently used negative words, for example, drag the comparison text down (strong blue) and must switch toward favoring the reference text.
In plot E, we incorporate the differences in word usage, that is, each word’s normalized frequency in the comparison text minus that in the reference text. The histogram shows the result binned by average happiness, and in this case we see that the comparison text is generally happier across the negativity-positivity scale. The summary plot shows both the sums of relatively positive and negative words, and the overall differential. These three bars match those at the bottom of the corresponding simple word shift.
Finally, we show how the four categories of words combine as we sum their contributions up in descending order of absolute contribution to or against the overall happiness shift. The four outer plots below show the growth for each kind of word separately, and their end points match the bar lengths in Plot D above. The central plot shows how all four contribute together with the black line showing the overall sum. In this example, the shift is positive, and the sum of all contributions gives +100%. The horizontal line in all five plots indicates a word rank of 50, to match the extent of the figure’s word shift.