# Kashmir: A Computational Analysis of the Voice of Peace

Shriphani Palakodety
Onai
spalakod@onai.com
Ashiqur R. KhudaBukhsh
Carnegie Mellon University
akhudabu@cs.cmu.edu
Jaime G. Carbonell
Carnegie Mellon University
jgc@cs.cmu.edu
Shriphani Palakodety and Ashiqur R. KhudaBukhsh are equal contribution first authors.
###### Abstract

The recent Pulwama terror attack (February 14, 2019, Pulwama, Kashmir) triggered a chain of escalating events between India and Pakistan, adding another episode to their 70-year-old dispute over Kashmir. The present era of ubiquitous social media has never seen nuclear powers closer to war. In this paper, we analyze this evolving international crisis via a substantial corpus constructed using comments on YouTube videos (921,235 English comments posted by 392,460 users out of 2.04 million overall comments by 791,289 users on 2,890 videos). Our main contributions in the paper are three-fold. First, we present an observation that polyglot word-embeddings reveal precise and accurate language clusters, and subsequently construct a document language-identification technique with negligible annotation requirements. We demonstrate the viability and utility across a variety of data sets involving several low-resource languages. Second, we present an extensive analysis of temporal trends of pro-peace and pro-war intent through a manually constructed polarity phrase lexicon. We observe that when tensions between the two nations were at their peak, pro-peace intent in the corpus was at its highest point. Finally, in the context of heated discussions in a politically tense situation where two nations are at the brink of a full-fledged war, we argue the importance of automatic identification of user-generated web content that can diffuse hostility and address this prediction task, dubbed hope-speech detection.

Keywords: India-Pakistan conflict, Kashmir issue, Hope-speech

## 1 Introduction

“In peace, sons bury their fathers. In war, fathers bury their sons.”
– Originally from Herodotus, an automatically discovered comment from our war corpus.

On February 14, 2019, a suicide bomber attacked a convoy of vehicles carrying Indian Central Reserve Police Force (CRPF) personnel in Pulwama district, Jammu and Kashmir, resulting in the deaths of 40 CRPF service-personnel and the attacker. A Pakistan-based Islamist militant group claimed responsibility, though Pakistan condemned the attack and denied any connection to it. The Pulwama attack triggered a chain of events where each passing day led to an escalation of tensions between India and Pakistan, reaching a peak on the 27th of February, 2019. With the two nuclear powers coming precariously close to declaring a full-fledged war, the world witnessed a first-of-its-kind specter of war between nuclear adversaries in the modern era of ubiquitous internet, where a unique war-dialogue took place between the two nations’ civilians on social media.

In this paper we focus on the discourse that took place in comments posted on YouTube - one of the most popular social media platforms in the Indian sub-continent. We collected a comprehensive data set of comments posted in response to YouTube videos on news coverage of relevant incidents by Indian, Pakistani and global media, and analyzed several important aspects of the dialogue between the two conflicting neighbors in relation to this crisis.

Contributions: Our contributions are the following:

1. Domain: To the best of our knowledge, ours is the first large-scale analysis of an evolving international crisis between two nuclear adversaries at the brink of a full-fledged war through the lens of social media. India and Pakistan have a long history of political tension that includes four wars and multiple skirmishes resulting in significant military and civilian casualties [1]. As previously presented in [2], social media would play an increasingly important role in understanding and analyzing modern conflicts, and we believe that our work would complement the vast literature of quantitative political science on conflict analysis (see, e.g., [3]).

2. Linguistic: We present a novel observation that polyglot word-embeddings reveal precise and accurate language clusters and subsequently construct a document language-identification technique with negligible annotation requirements. We demonstrate our technique’s competitive performance against strong supervised baselines. Our technique has applications in analysis of social media content in multilingual settings like India, a country with tremendous linguistic diversity (22 major recognized languages) featuring several low-resource languages.

3. Social: Through an extensive polarity phrase lexicon, we analyze the temporal trends of pro-war and pro-peace intent and observe that the pro-peace intent reached its peak when the two nations were closest to declaring a full-fledged war. Using an embedding-space mining technique, we construct a lexicon of hateful terms specific to the India-Pakistan geopolitical situation.

4. Hope speech: We propose a novel task: hope-speech detection to automatically detect web content that may play a positive role in diffusing hostility on social media triggered by heightened political tensions during a conflict. Our results indicate that such web content can be automatically identified with considerable accuracy. Solutions to detect hostility-diffusing comments may also find applications in many other contexts. For instance, hostile messages and rumors on platforms like WhatsApp have been used to incite communal violence in the Indian subcontinent in recent times. The severity of the issue prompted the then administration to disable internet access in the regions of unrest to prevent further spread of hateful messages. Beyond a warlike situation, we expect our work to find application in these and other similar settings.

## 2 Background

A brief history of the conflict: Kashmir has been a point of contention between India and Pakistan for nearly 70 years. A key factor for continued unrest in South Asia, the Kashmir issue has drawn wide attention from the political science community for decades [4, 1, 5, 6]. The root of this conflict can be traced back to the independence struggle of India and the subsequent partition into India and Pakistan in 1947. Overall, India and Pakistan have gone to full-fledged war four times (1947, 1965, 1971 and 1999), of which the 1971 war was the goriest (11,000 killed from both sides) and resulted in the largest number of prisoners of war (90,000 POWs) since the Second World War [7]. In the four wars, overall, an estimated 27,650 soldiers were killed and thousands wounded. A timeline outlining some of the key events in the bilateral conflict lasting decades is presented in the Appendix.

• Feb 14th: A suicide bomber kills 40 CRPF personnel at Pulwama, India. A Pakistan-based Islamist militant group, Jaish-e-Mohammad (JEM), claims responsibility. Pakistan condemns the attack and denies any connection to it.
• India withdraws “most favored nation” status of Pakistan.
• Nine people, including four Indian soldiers and a policeman, are killed in a gun battle in India-controlled Kashmir.
• Pakistan Prime Minister Imran Khan offers assistance to investigate the Pulwama attack. India refuses the offer citing previous attacks.
• India halts a bus-service between India-controlled Kashmir and Pakistan-controlled Kashmir.
• India begins a two-day crackdown of separatists in Kashmir, heightening tensions further.
• India claims an airstrike against a JEM training base at Balakot and reports that a large number of terrorists, trainers and senior commanders have been killed. Pakistan denies any such casualty count.
• As an ominous sign of nuclear threat, Pakistan media reports that Imran Khan chaired a meeting of the National Command Authority, the overseeing body of the country’s nuclear warheads.
• An Indian Air Force pilot, Abhinandan, is captured by Pakistani armed forces inside Pakistan air space.
• Pakistan announces that they will release Abhinandan as a peace gesture.
• Pakistan hands over Wing Commander Abhinandan to India at the Wagah border.

Timeline of the most-recent crisis: We outline some of the key events relevant to the most-recent crisis (presented above). We denote five key events: Pulwama terror attack as PULWAMA (Feb 14, 2019), Balakot air strike claimed by Indian Government as an act of retaliation as BALAKOT (Feb 26, 2019), Indian Air Force (IAF) wing commander Abhinandan’s capture by Pakistan (Feb 27, 2019) as IAFPILOT-CAPTURE, Pakistan Government’s subsequent announcement of his release as IAFPILOT-RELEASE (Feb 28, 2019), and Pakistan Government’s handing over of the captured Indian pilot as IAFPILOT-RETURN (Mar 1, 2019).

Why YouTube? As of April 2019, the platform drew 265 million monthly active users (225 million on mobile) in India accounting for 80% of the population with internet access. In Pakistan, 73% of the population with internet access views YouTube on a regular basis and considers YouTube as the primary online platform for video consumption. The large user base, broad geographic reach, and widespread adoption in the Indian subcontinent make YouTube a high quality source for the analysis in this paper.

Our data set was acquired using the following steps: (i) obtaining a set of search queries to execute against the YouTube search feature, (ii) executing the searches against YouTube search to retrieve a list of relevant videos, and (iii) crawling the comments for these videos using the publicly available YouTube API.

Collecting a set of queries: We start with a seed set of queries relevant to the crisis: [Pulwama], [Balakot], [Abhinandan], [Kashmir], [India Pakistan war], [India Pakistan]. (We noticed that the queries [India Pakistan] and [Pakistan India] yielded slightly different results. Following [8], which found that people tend to place their more-preferred choice first in a pair, whenever a query contained a country pair (e.g., [India Pakistan] or [Pakistan India war]), we ordered the pair to match the location of interest.) We construct News, a set of highly popular news channels in India, Pakistan, and the world (listed in Appendix). Next, we expand the seed set into a set of related queries by searching for each seed query on Google Trends, setting the geographic location to India or Pakistan and including the related queries returned for the time duration of interest (14th February 2019 to 13th March 2019). Finally, for each query in the expanded set and each channel in News, a new query is formulated by concatenating the two. For instance, [Pulwama CNN] is obtained by concatenating the query [Pulwama] and news channel [CNN]. The result is our final set of queries.
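The concatenation step described above can be sketched in a few lines. This is only an illustration: the function name `build_channel_queries` and the toy inputs are assumptions, and [BBC] is a placeholder channel, since the actual channel list is given in the Appendix and not reproduced here.

```python
from itertools import product


def build_channel_queries(queries, channels):
    """Concatenate every query with every channel name, e.g. 'Pulwama' + 'CNN'."""
    # A set drops duplicates that could arise if expanded queries overlap.
    return sorted({f"{q} {c}" for q, c in product(queries, channels)})


# Toy inputs (hypothetical); the real expanded query set is much larger.
expanded_queries = ["Pulwama", "Balakot"]
news_channels = ["CNN", "BBC"]

final_queries = build_channel_queries(expanded_queries, news_channels)
print(final_queries)
```

Each resulting string would then be issued as one search against the video platform.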

Constructing the set of relevant videos: For each query in the final query set, we execute a search using the YouTube search API to retrieve the 200 most relevant videos posted during the period of interest. This step yields an initial result set of 6,157 unique videos. After removing irrelevant and unpopular videos (fewer than 10 comments), we obtained a final set of 2,890 videos.

Constructing the English comments corpus: We next extract English comments using a novel polyglot-embedding-based method first proposed in this paper (described in the results section). Our English comments corpus consists of 921,235 English comments posted by 392,460 users.

Investigating Coverage: It is important that the corpus reflects comments from both conflicting countries. We conducted a text template matching analysis to estimate the origin of the comments posted. We manually inspected the corpus and observed that the majority of the users identified their nationality starting with the phrases [I’m], [I am], [I am from], [I am a], [I am an], [I am in], [I am in the], [I am from the], and [love from]. We used these templates and retrieved five tokens following each phrase (for example, I am from Nepal). Country mentions are extracted from these following tokens and mention frequencies obtained. A log-scale choropleth visualization is shown in Figure 1 and the 10 most mentioned countries are listed in Table 1.
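A minimal sketch of the template-matching step follows, using only the Python standard library. The template phrases are taken from the text; the tiny `COUNTRIES` set, the case-insensitive matching, and the function name `country_mentions` are assumptions made for illustration (a real run would need a full country gazetteer).

```python
import re
from collections import Counter

# Self-identification templates from the text, longest first so that
# "i am from the" is tried before the shorter "i am".
TEMPLATES = [
    "i am from the", "i am in the", "i am from", "i am in", "i am an",
    "i am a", "i am", "i'm", "love from",
]
# Hypothetical, tiny country list used only for this example.
COUNTRIES = {"india", "pakistan", "nepal", "bangladesh"}

# Capture up to five whitespace-separated tokens after a template phrase.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in TEMPLATES) + r")\b((?:\s+\w+){1,5})",
    re.IGNORECASE,
)


def country_mentions(comment):
    """Return country names found within five tokens after a template phrase."""
    found = []
    # Normalize the typographic apostrophe so that "I’m" matches "i'm".
    for match in PATTERN.finditer(comment.replace("’", "'")):
        for token in match.group(1).lower().split():
            if token in COUNTRIES:
                found.append(token)
    return found


comments = ["I am from Nepal and I want peace", "love from Pakistan", "War is bad"]
counts = Counter(c for text in comments for c in country_mentions(text))
print(counts.most_common())
```

The resulting counts are the kind of mention frequencies that would feed a choropleth map or a top-10 table.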

This analysis illustrates that our corpus (i) contains a balanced participation from both conflicting countries with (ii) moderate participation from neighboring countries likely to be affected in the event of a war. Interestingly, the plots indicate participation from nations with a significant Indian and Pakistani origin population (USA, United Kingdom, South Africa). We conjecture that modern migration patterns, the nuclear arsenals of India and Pakistan, and the broad global spread of Indian and Pakistani diaspora could be possible reasons for expressed global attention.

## 4 Results and Analysis

### 4.1 Language Identification

Mining a multilingual corpus for insights requires separating out portions of the corpus written in distinct languages. This is a critical step since annotators might be proficient in only a subset of the languages, and the majority of NLP tools are designed for monolingual corpora. We now present an important result to navigate multilingual social media corpora like those generated in the Indian subcontinent.

Polyglot word embeddings discover language clusters. Polyglot word embeddings are real-valued word embeddings obtained by training a single model on a multilingual corpus. Polyglot word-embeddings have received attention recently for demonstrating performance improvements across a variety of NLP tasks [9]. While the downstream impact of the embeddings has been explored, limited attention has been paid to their actual embedding space. We perform the first qualitative and quantitative analysis of this embedding space for a variety of Indian and European languages and present the following observations: (i) the word embedding space is divided into highly-accurate language clusters, (ii) a simple algorithm like $k$-Means can retrieve these clusters, and (iii) the resulting clusters perform on-par with supervised systems or better in some cases.

For generating the embeddings, we first strip all punctuation and tokenize by splitting on whitespace. Next, 100-dimensional FastText [10] embeddings are trained on the full corpus yielding the polyglot embeddings. A full comment (document) embedding can be obtained by normalizing the word-embedding of each of the tokens in the comment and subsequently averaging these word embeddings.
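The comment-embedding scheme above (strip punctuation, split on whitespace, normalize each word vector, then average) can be sketched as follows. The two-dimensional toy vectors and the function names are assumptions for illustration; in the paper the vectors are 100-dimensional FastText embeddings, which can also produce vectors for unseen words, whereas this sketch simply skips tokens missing from the lookup table.

```python
import math
import string


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def comment_embedding(comment, word_vectors, dim):
    """Average the unit-normalized word vectors of a comment's tokens."""
    # Strip punctuation, then tokenize on whitespace, as described in the text.
    cleaned = comment.translate(str.maketrans("", "", string.punctuation))
    vectors = [normalize(word_vectors[t]) for t in cleaned.split() if t in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-dimensional vectors (hypothetical values).
toy_vectors = {"peace": [3.0, 4.0], "war": [0.0, 2.0]}
embedding = comment_embedding("peace, war!", toy_vectors, dim=2)
print(embedding)
```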

Qualitative analysis: We first show a two-dimensional (2D) visualization of the document (comment) embedding space generated by applying the t-SNE algorithm [11] to the computed document embeddings of a random sample of 10,000 comments. As shown in Figure 2, we observe three clusters in the visualization. We then run $k$-Means on these document embeddings, setting $k$ to 3 based on this observation. A manual inspection of the clusters reveals that they correspond to (i) Hindi in Roman script (green), (ii) Hindi in Devanagari script (blue), and (iii) English (red).

Quantitative analysis: We next construct a technique for comment language identification. First, each comment’s embedding is obtained by the scheme described above. Next, the value of $k$ for the $k$-Means algorithm is chosen using a standard heuristic [12] and $k$-Means is run, which yields $k$ clusters. Finally, a sample of 10 comments is drawn from each of the obtained clusters and the dominant language from this sample is assigned to this cluster. In our experience, at least 8 out of 10 comments in the sample were from the dominant language, i.e., each of the clusters obtained contains a highly dominant language (at least 8 out of 10), and the value of $k$ matches the number of languages present in the corpus. A test comment is assigned a language by (i) computing its embedding (as mentioned above), (ii) assigning this comment (embedding) to the cluster whose center is closest, and (iii) returning the cluster’s assigned language label.
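The last two steps of the technique, labeling each cluster from a small annotated sample and assigning a test comment to the nearest cluster center, can be sketched as below. The cluster centers are assumed to come from a clustering run that is not shown here; the toy centers, the sample annotations, the use of Euclidean distance, and the function names are all illustrative assumptions.

```python
import math
from collections import Counter


def label_clusters(annotated_samples):
    """Map each cluster id to the dominant language in its annotated sample."""
    return {
        cluster_id: Counter(languages).most_common(1)[0][0]
        for cluster_id, languages in annotated_samples.items()
    }


def identify_language(embedding, centers, cluster_language):
    """Return the language label of the cluster whose center is closest."""
    nearest = min(range(len(centers)), key=lambda i: math.dist(embedding, centers[i]))
    return cluster_language[nearest]


# Toy 2-dimensional cluster centers (hypothetical values).
centers = [[0.0, 0.0], [10.0, 10.0]]
# Ten annotated comments per cluster, mirroring the sample size in the text.
annotated_samples = {
    0: ["English"] * 9 + ["Hindi(E)"],
    1: ["Hindi(E)"] * 8 + ["English"] * 2,
}

cluster_language = label_clusters(annotated_samples)
print(identify_language([1.0, 2.0], centers, cluster_language))
```

Only the ten sampled comments per cluster need a human label, which is where the low annotation requirement comes from.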

We evaluate performance on a held-out set of 200 documents and report precision, recall, F1, and accuracy. Three languages were discovered by annotators: English, Hindi in Roman script (denoted Hindi(E)), and Hindi written in Devanagari (see Table 2). Note that Hindi (mainly spoken in India) and Urdu (mainly spoken in Pakistan) are registers of the same language. Neither our annotators nor commercial and open source solutions were able to distinguish between the two, and thus the Hindi(E) cluster is used to denote both. We compare against two strong supervised baselines: (i) fastTextLangID, a popular open source solution supporting 174 languages, and (ii) GoogleLangID, a commercial solution able to identify close to 100 languages. Note that the comparison against supervised solutions has limitations: they support far more languages than considered in our data set, and might not be able to identify the exact languages in our corpus (highly likely in low-resource settings). We include them to contrast the performance of our technique with successful real-world systems on our data set. Results are presented in Table 2. We observed that our method and GoogleLangID achieved near-perfect results while fastTextLangID mislabeled the Hindi(E) cluster comments as English, underscoring the importance of our method.

A thorough treatment of this data set, and additional data sets containing a variety of Indian and European languages, is presented in the Appendix. Our technique’s simplicity, high performance and low annotation requirement lead us to believe in its viability for analyzing multi-lingual corpora featuring low-resource languages (e.g., Bengali and Oriya; see Section 6, Appendix).

Intuition: The SkipGram model [13] used for training the FastText embeddings predicts an input word’s context. In a polyglot setting, the likeliest context predicted for a Hindi word is other Hindi words. The embeddings likely reflect this aspect of the language model and thus we see language clusters. We admit that implementation choices like splitting on whitespace (for instance) can preclude some languages, so we refrain from making claims about the universality of the technique and present empirical results only on Indian and European languages.

Polyglot embeddings to mine conflict-specific slurs. We found frequent use of porkistan (an intra-word code-mixed insult for Pakistan [14]) and randia (a code-mixed derogatory term for India). In order to uncover similar insults, we started with a seed set and expanded it by including the top-ten nearest neighbors in the polyglot embedding space (distance metric: cosine similarity). We conducted this expansion step 3 times, and overall obtained 384 unique terms. We manually annotated them and uncovered 243 insults (to be made publicly available upon acceptance). Our hate lexicon mainly uncovered India-Pakistan specific insults and thus had minimal overlap with previously published hate lexicons [15, 14].
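A rough sketch of the iterative nearest-neighbor expansion is given below. It assumes that each round expands only the terms added in the previous round (re-expanding older terms would add nothing new), and it uses neutral placeholder tokens with made-up two-dimensional vectors instead of real embeddings. The function names are hypothetical; the output is a candidate list that would still need manual annotation, as in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(word, vectors, k):
    """Return the k words most similar to `word` by cosine similarity."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]


def expand_lexicon(seeds, vectors, k=10, rounds=3):
    """Grow a seed set by repeatedly adding top-k neighbors of new terms."""
    lexicon = {s for s in seeds if s in vectors}
    frontier = set(lexicon)
    for _ in range(rounds):
        candidates = set()
        for word in frontier:
            candidates.update(nearest_neighbors(word, vectors, k))
        frontier = candidates - lexicon  # only terms not seen before
        lexicon |= frontier
    return lexicon


# Placeholder vocabulary with hypothetical 2-dimensional vectors.
toy_vectors = {
    "seed": [1.0, 0.0], "near1": [0.9, 0.1], "near2": [0.8, 0.3], "far": [0.0, 1.0],
}
candidates = expand_lexicon(["seed"], toy_vectors, k=2, rounds=3)
print(sorted(candidates))
```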

Through polyglot embeddings, we were able to retrieve oinkistan from pigistan. Several of these insults attack religious beliefs, mock the economy (e.g., slumistan, bhikharistan; bhikhari translates to beggar) and taunt social evils (e.g., terroristan, rapistan). Through the usage pattern of these insults, we observed a “conflict spiral” [16], where parties mirror each other’s aggressive communication tactics. A detailed treatment is presented in the Appendix with the finding that baseline hate remained roughly constant throughout the entire time-period.

### 4.2 Temporal Trends in Pro-peace Intent

State-of-the-art sentiment analysis tools typically target domains like movie reviews, product reviews and so on. Prior sentiment analysis research has been performed on political news content [17] and social media responses to humanitarian crises [18], but to the best of our knowledge, there has been no previous work on war sentiment. Moreover, most of these standard off-the-shelf sentiment analysis tools have been trained on corpora very different from ours. For instance, the OpenAI sentiment analysis tool [19] is trained on Amazon e-commerce product reviews. Consequently, off-the-shelf tools are not sufficient in our case. For instance, Stanford CoreNLP (version 3.9.2) sentiment analysis [20], a popular sentiment analysis model, marks the following three examples: [Say no to war.], [War is not a solution.], and [We will nuke you.] as negative, neutral, and positive, respectively. In a conflict-analysis scenario, these three examples should be marked as positive, positive, and negative instead. Moreover, we observed that the predicted results are sensitive to punctuation and casing, which cannot be guaranteed in a noisy setting. Hence, we address the challenges in modeling sentiment in our corpus by using a comprehensive manually labeled set of phrases to reveal sentiment. Techniques for analyzing the semantic orientation of text have heavily exploited manually curated lexicons [21, 22, 23, 24]. Following [22, 23], we construct an annotated domain-specific phrase lexicon for mining pro-war and pro-peace intent.

We first analyze a set of four high-frequency trigrams expressing collective war/peace intent: [we want peace], [we want war], [we want surgical] (surgical refers to surgical strike), [we want revenge]. These express peace, war, war, and war intents respectively. Out of 9,300,740 unique trigrams, these four trigrams rank 35th, 515th, 875th and 967th in terms of frequency and are the top four collective-intent-expressing trigrams. A comment that contains $n$ instances of a peace-seeking (war-seeking) phrase receives a positive (negative) score of $n$ ($-n$). The overall score of a comment is the sum of these scores. The comment expresses peace-seeking intent if the overall score is greater than 0, neutral intent if the overall score is equal to 0 and war-seeking intent if the overall score is less than 0.
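The trigram-based scoring can be illustrated with a short sketch. It assumes that each occurrence of a peace-seeking trigram adds +1 and each occurrence of a war-seeking trigram adds -1, and that comments are lowercased before matching; the lowercasing and the function names are assumptions, not details stated in the text.

```python
# The four collective-intent trigrams named in the text.
PEACE_TRIGRAMS = {"we want peace"}
WAR_TRIGRAMS = {"we want war", "we want surgical", "we want revenge"}


def intent_score(comment):
    """Count peace-seeking trigrams minus war-seeking trigrams in a comment."""
    tokens = comment.lower().split()
    trigrams = [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    peace = sum(t in PEACE_TRIGRAMS for t in trigrams)
    war = sum(t in WAR_TRIGRAMS for t in trigrams)
    return peace - war


def intent_label(score):
    """Map an overall score to the three intent classes used in the text."""
    if score > 0:
        return "peace-seeking"
    if score < 0:
        return "war-seeking"
    return "neutral"


for text in ["We want peace not war", "we want revenge we want war", "hello"]:
    print(text, "->", intent_label(intent_score(text)))
```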

We summarize the temporal trends of peace-seeking and war-seeking intent using the four frequently used trigrams in Figure 3(c) and 3(d). We normalize war and peace intent frequencies by the total number of likes or comments received on that day, giving us values in the [0,1] interval, and allowing us to compare activities and sentiment across different days. We measure engagement in terms of comments and likes and plot the overall comment activity (Figure 3(a)) and overall like activity (Figure 3(b)). As shown in Figure 3(a) and 3(b), the baseline user activity, both in terms of comments and likes, spiked around the IAFPILOT-CAPTURE and IAFPILOT-RELEASE events (nearly 6 times more user engagement on Feb 27 as compared to Feb 15). Figure 3(c) and Figure 3(d) show that right after PULWAMA, pro-war intent dominated pro-peace intent. Following the pilot’s capture and subsequent release declaration, there was a substantial shift towards pro-peace intent, after which the pro-peace intent generally dominated war-seeking intent. Feb 27 was also the day when Pakistan media reported a meeting between the Pakistan PM and the nuclear warheads body, and several news videos discussed the possibility of a nuclear war. Human evaluation on 200 randomly sampled positive comments suggested the following takeaways: in the context of this particular conflict, (1) pro-peace intent spiked when the possibility of a war became real, and (2) the peace-gesture (IAFPILOT-RELEASE) by the Pakistan Government possibly helped sustain this pro-peace intent.

In order to widen our coverage, we constructed an extensive lexicon of polarity phrases. Overall, we obtained 3,104 phrases, each annotated as one of: (i) peace-seeking (310 phrases), (ii) war-seeking (278 phrases), or (iii) neutral or unclear (2,516 phrases). A random sample of 10 pro-peace and pro-war phrases is presented in Table 3. Our annotators were instructed to label explicit calls for war and peace. Similar to our previous setting, for a given comment, presence of a peace-seeking phrase contributes +1 to the comment’s score, a war-seeking phrase contributes -1 to the overall score, and a neutral phrase contributes a score of 0. The longest matching phrase is considered for computing the sentiment score and all subsumed phrases are ignored. For instance, consider a comment [we want peace but India is not worth it]; if [we want peace] has score +1 and [we want peace but India] has score 0, [we want peace] is disregarded and the overall contribution from these 2 phrases is 0.
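The longest-match rule can be sketched as a greedy left-to-right scan that, at each position, prefers the longest lexicon phrase starting there and then skips past it. The greedy scan is one plausible reading of "all subsumed phrases are ignored" and is an assumption, as are the three-entry toy lexicon, the lowercasing, and the function name.

```python
def longest_match_score(comment, lexicon):
    """Sum phrase scores, letting the longest match at each position win."""
    tokens = comment.lower().split()
    max_len = max((len(phrase.split()) for phrase in lexicon), default=0)
    score, i = 0, 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                score += lexicon[phrase]
                i += n  # skip the matched span so subsumed phrases are ignored
                break
        else:
            i += 1  # no phrase starts here; move on one token
    return score


# Toy lexicon (lowercased keys): +1 peace-seeking, -1 war-seeking, 0 neutral.
toy_lexicon = {
    "we want peace": 1,
    "we want peace but india": 0,
    "we want war": -1,
}
print(longest_match_score("we want peace but India is not worth it", toy_lexicon))
print(longest_match_score("we want peace", toy_lexicon))
```

On the example from the text, the five-token neutral phrase wins over the shorter peace-seeking phrase, so the comment contributes 0.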

As shown in Figure 3(e) and Figure 3(f), the qualitative trends found in our previous analysis hold. Right after PULWAMA, pro-war intent dominated pro-peace intent and a visible shift was observed on and after Feb 27th. Additionally, our coverage (fraction of comments containing at least one intent-expressing phrase) improved; overall, we obtained 7.25% coverage of comments (20x more than before) and 10.42% (24x more than before) coverage of likes.

Did many people change their minds? Unlike YouTube comments, YouTube likes are anonymous and cannot be attributed to individual users. Hence, we focus on the following research question: were there many users who initially clamored for war but later changed their minds? Or, when war became an imminent possibility, did a different sub-population voice their concerns? Analysis reveals the latter case to be true. On our comprehensive intent-expressing phrase set, we found that 4,407 users posted one or more peace-seeking comments, while 7,402 users posted one or more war-seeking comments. 280 users posted both types of comments. The Jaccard index (defined as $|A \cap B| / |A \cup B|$ for sets $A$ and $B$) between the two user sets was 0.02, indicating low overlap.
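As a quick check of the reported overlap, the Jaccard index can be recomputed from the three user counts alone. The sketch assumes the 280 users who posted both types of comments are included in each of the two per-class totals; the function name is illustrative.

```python
def jaccard_from_counts(size_a, size_b, size_both):
    """Jaccard index |A ∩ B| / |A ∪ B| computed from set sizes."""
    union = size_a + size_b - size_both
    return size_both / union if union else 0.0


peace_users, war_users, both_users = 4407, 7402, 280
index = jaccard_from_counts(peace_users, war_users, both_users)
print(round(index, 2))  # 280 / 11529, which rounds to 0.02
```

If the 280 users were instead counted separately from the two totals, the union would be slightly larger and the index would still round to 0.02.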

Focused analysis around the peace-spike: We now focus on a comparative analysis between the two time intervals when war (or revenge) and peace intents were at their respective maximums: a three day period starting on PULWAMA (denoted as war-spike), and a three day period starting on IAFPILOT-CAPTURE (denoted as peace-spike). We compute the respective unigram distributions $P_{war}$ and $P_{peace}$. Next, for each token $t$, we compute the scores $P_{war}(t) - P_{peace}(t)$ and $P_{peace}(t) - P_{war}(t)$, and obtain the top tokens ranked by these scores (indicating increased usage in the respective periods of interest). As listed in Table 4, both war and peace were heavily used tokens during the peace-spike. However, war was predominantly used in the context of peace (e.g., [war is not a solution], [we don’t want war]). Several users also identified themselves as Indian or Pakistani and expressed love for the neighbor country. In contrast, during the war-spike, demands for revenge, or a surgical strike, or an attack on Pakistan dominated. Heavy use of Kashmir specific keywords during the war-spike and greater emphasis at the country level at the later stage was also consistent with the sequence of events that started as a regional terror attack and snowballed into an international crisis between two nuclear adversaries. We conducted a similar analysis on the set of Hindi comments and our observations align with the English corpus.
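One way to rank tokens by increased usage in a period is sketched below. It assumes the score of a token is the difference between its unigram probability in the period of interest and in the other period; that choice of score, the whitespace tokenization, the toy comments, and the function names are assumptions made for illustration.

```python
from collections import Counter


def unigram_distribution(comments):
    """Relative frequency of each lowercased token across a list of comments."""
    counts = Counter(token for c in comments for token in c.lower().split())
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def top_shifted_tokens(p_target, p_other, k):
    """Tokens with the largest probability gain in the target period."""
    vocab = set(p_target) | set(p_other)
    scores = {t: p_target.get(t, 0.0) - p_other.get(t, 0.0) for t in vocab}
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy comments standing in for the two three-day windows (hypothetical).
war_spike = ["we want revenge", "revenge now"]
peace_spike = ["we want peace", "peace please"]

p_war = unigram_distribution(war_spike)
p_peace = unigram_distribution(peace_spike)
print(top_shifted_tokens(p_war, p_peace, 1))
print(top_shifted_tokens(p_peace, p_war, 1))
```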

### 4.3 Hope-speech Detection

Analyzing and detecting hate-speech and hostility in social media [25, 26, 27, 28, 29] have received considerable attention from the research community. Hate-speech detection and subsequent intervention (in the form of moderation or flagging a user) are crucial in maintaining a convivial web environment. However, in our case where the civilians of two conflicting nations are engaging in heated discussions in a politically tense situation, detecting comments that can potentially diffuse hostility and bring the two countries together has particular importance, for instance by highlighting such comments or otherwise giving them more prominence.

Definition 1: A comment is marked as hope-speech, if it exhibits any of the following:

1. The comment explicitly mentions that the author comes from a neutral country (e.g., [great job thanks from Bangladesh make love not war]), and exhibits a positive sentiment towards both countries in the conflict.

2. The comment explicitly mentions that the author comes from one of the conflicting countries, and exhibits a positive sentiment to an entity (all people, media, army, government, specific professionals) of the other country (e.g., [I am from Pakistan I love India and Indian people]).

3. The comment explicitly urges fellow citizens to de-escalate, to stay calm.

4. The comment explicitly mentions that the author comes from one of the conflicting countries, and criticizes some aspect of the author’s own country (e.g.,[I am from India but Indian media very very bad]).

5. The comment criticizes some aspect of both of the conflicting countries.

6. The comment urges both countries to be peaceful.

7. The comment talks about the humanitarian cost of war and seeks to avoid civilian casualties (e.g., [peace is better than war as the price of war is death of innocent peoples]).

8. The comment expresses unconditional peace-seeking intent (e.g., [we want peace]).

If any of the following criteria are met, the comment is not hope-speech:

1. The comment explicitly mentions that the author comes from a conflicting country and expresses no positive sentiment toward the other conflicting country.

2. The comment explicitly mentions that the author comes from a neutral country but takes a position favoring only one of the conflicting countries (e.g., [I m frm Australia I support Pakistan]).

3. The comment actively seeks violence (e.g., [I want to see Hiroshima and Nagasaki type of attack on Pakistan please please]).

4. The comment uses racially, ethnically or nationally motivated slurs (e.g., randia, porkistan).

5. The comment starts the proverbial whataboutism, i.e., we did b because you did a (e.g., [Pakistan started it by causing Pulwama attack killing 44 Indian soldiers]).

Our list of hostility diffusing criteria is not exhaustive and may not cover the full spectrum of hostility diffusing comments. Consequently, we agree that it is possible to have several other reasonable formulations of hope-speech. Also, in a conflict scenario involving more than two conflicting entities, this particular definition may not hold. However, upon manual inspection of the corpus, we found that the definition covers a wide range of potentially hostility-diffusing comments while capturing several nuances.

Hope-speech comment frequency in the wild: On 2000 randomly sampled comments (500 from each week), our annotators found 49 positives (2.45%), 1946 negatives and 5 indeterminate comments. This indicates that detecting hope-speech is essentially a rare positive mining task which underscores the importance of automated detection.

Training set construction using Active Learning: To ensure generalizability and performance in the wild, it is critical that the training set contains sufficient examples from both classes and captures a wide variety of data points. To ensure this, we divided the corpus into four weekly sub-corpora and sampled uniformly from each of these acknowledging the strong temporal aspect in our data; for a data set consisting of sufficient number of positives and negatives, we employed a combination of Active Learning strategies [30] and constructed a data set of 2,277 positives and 7,716 negatives.

Features: We considered the following features:

1. n-grams up to size 3 following existing literature on text classification [31].

2. the previously described sentiment score of a comment obtained using our comprehensive set of intent phrases (denoted as I in Table 5).

3. the previously described 100-dimensional polyglot FastText embeddings (denoted as FT in Table 5).
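The first feature family, n-grams up to size 3, can be extracted with a few lines of code. This sketch assumes word-level n-grams over whitespace tokens and raw counts as feature values; the text does not specify these details, and the function name is hypothetical.

```python
from collections import Counter


def ngrams_up_to(tokens, max_n=3):
    """All contiguous word n-grams of length 1..max_n, shortest first."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


features = Counter(ngrams_up_to("we want peace".split()))
print(sorted(features))
```

In a full pipeline these counts would be combined with the intent score and the comment embedding into a single feature vector.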

Classifier performance: On our final data set, we used an 80/10/10 train/validation/test split. On the training set, we train a logistic regression classifier with L2 regularization with the discussed features and report performance on the test set. The experiment was run 100 times on 100 randomly chosen splits. As shown in Table 5, the results indicate that a hope-speech classifier with good precision and recall can be constructed. We admit that off-the-shelf sentiment analysis tools may perform poorly in our task, and it is not a fair comparison since they are trained for a different domain. However, for the sake of completeness, we ran the Stanford CoreNLP sentiment analyzer on our data set (precision: 27.65%, recall: 41.45%, F1: 33.17%). Our baselines’ stronger performance indicates that the task of hope-speech detection is different from simple sentiment analysis and hence requires a targeted approach.
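The repeated random 80/10/10 splitting can be sketched as follows. Seeding each run with its run index is an assumption made so that the sketch is reproducible, and the integer items stand in for labeled comments; classifier training and evaluation are left out.

```python
import random


def split_80_10_10(items, seed):
    """Shuffle with a fixed seed and cut into train/validation/test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder data set; one split per run, 100 runs as in the text.
examples = list(range(100))
splits = [split_80_10_10(examples, seed=run) for run in range(100)]
train, val, test = splits[0]
print(len(train), len(val), len(test))
```

Metrics from each run would then be averaged over the 100 splits.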

Performance in the wild: We randomly sampled 1000 unlabeled comments from each day and ran our hope-speech classifier on them. Overall, 111 comments were predicted as positive, of which 94 were verified as correct by human evaluation (precision: 84.68%). A random sample of 10 hope-speech comments is presented in Table 6. Recall that simple random sampling uncovered hope-speech in only 2.45% of comments. Hence, our performance in the wild holds promise for substantially reducing manual moderation effort.
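The arithmetic behind this comparison can be checked with a few lines. The ratio computed below is only a rough indication, since the two samples were drawn differently (per day versus per week); the variable names are illustrative.

```python
# Counts reported above.
predicted_positive = 111
verified_correct = 94
precision_in_wild = verified_correct / predicted_positive

# Base rate from the random sample: 49 positives among 2000 comments.
base_rate = 49 / 2000

# How much denser in hope-speech the classifier's output is than a random sample.
enrichment = precision_in_wild / base_rate
```

Under these figures, a moderator reading classifier output would encounter hope-speech roughly 35 times more often than when reading randomly sampled comments.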

## 5 Conclusion

In the era of ubiquitous internet, public opinion on a rapidly evolving global issue can exhibit similarly fast-changing behavior, much of which is visible to a very large fraction of internet users. Consequently, this poses an additional challenge to countries with a history of past conflicts, as comments inciting hostility may push public opinion toward a stronger pro-war stance. In this work, we define a novel task of hope-speech detection to identify hostility-diffusing content. Extreme web-moderation during periods of strife and tension has included completely disabling internet access in a locality. Our work in detecting hostility-diffusing content may find applications in these scenarios as well. We also present a thorough analysis of a novel polyglot-embedding-based language identification module that can facilitate research on social media data generated in this part of the globe, where several low-resource languages are present.

## References

• [1] Victoria Schofield. Kashmir in conflict: India, Pakistan and the unending war. Bloomsbury Publishing, 2010.
• [2] Thomas Zeitzoff. How social media is changing conflict. Journal of Conflict Resolution, 61(9):1970–1991, 2017.
• [3] Charles S Gochman and Russell J Leng. Realpolitik and the road to war: An analysis of attributes and behavior. International Studies Quarterly, 27(1):97–120, 1983.
• [4] Iffat Malik and Robert G Wirsing. Kashmir: Ethnic conflict international dispute. Oxford University Press Oxford, 2002.
• [5] Sumantra Bose. Kashmir: Roots of conflict, paths to peace. Harvard University Press, 2009.
• [6] Paul Staniland. Kashmir since 2003: Counterinsurgency and the Paradox of “Normalcy”. Asian Survey, volume 53, pages 931–957, 2013.
• [7] Tariq Ali. Can Pakistan survive?: the death of a state. Penguin Books London, 1983.
• [8] Seth Stephens-Davidowitz and Andrés Pabon. Everybody lies: Big data, new data, and what the internet can tell us about who we really are. HarperCollins New York, 2017.
• [9] Phoebe Mulcaire, Jungo Kasai, and Noah A. Smith. Polyglot contextual representations improve crosslingual transfer. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 3912–3918, June 2019.
• [10] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
• [11] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
• [12] Peter J Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20:53–65, 1987.
• [13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
• [14] Raghav Kapoor, Yaman Kumar, Kshitij Rajput, Rajiv Ratn Shah, Ponnurangam Kumaraguru, and Roger Zimmermann. Mind your language: Abuse and offense detection for code-switched languages. arXiv preprint arXiv:1809.08652, 2018.
• [15] Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. A dataset of hindi-english code-mixed social media text for hate speech detection. In Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pages 36–41, 2018.
• [16] Jeffrey Z Rubin, Dean G Pruitt, and Sung Hee Kim. Social conflict: Escalation, stalemate, and settlement. Mcgraw-Hill Book Company, 1994.
• [17] Mesut Kaya, Guven Fidan, and Ismail H Toroslu. Sentiment analysis of turkish political news. In Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology-Volume 01, pages 174–180. IEEE Computer Society, 2012.
• [18] Nazan Öztürk and Serkan Ayvaz. Sentiment analysis on twitter: A text mining approach to the syrian refugee crisis. Telematics and Informatics, 35(1):136–147, 2018.
• [19] Alec Radford, Rafal Jozefowicz, and Ilya Sutskever. Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444, 2017.
• [20] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.
• [21] Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-based methods for sentiment analysis. Computational linguistics, 37(2):267–307, 2011.
• [22] Leonid Velikovich, Sasha Blair-Goldensohn, Kerry Hannan, and Ryan McDonald. The viability of web-derived polarity lexicons. In NAACL-HLT, pages 777–785. Association for Computational Linguistics, 2010.
• [23] William L Hamilton, Kevin Clark, Jure Leskovec, and Dan Jurafsky. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of EMNLP, volume 2016, page 595. NIH Public Access, 2016.
• [24] Brendan O’Connor, Ramnath Balasubramanyan, Bryan R Routledge, and Noah A Smith. From tweets to polls: Linking text sentiment to public opinion time series. In Fourth International AAAI Conference on Weblogs and Social Media, 2010.
• [25] Fabio Del Vigna, Andrea Cimino, Felice Dell’Orletta, Marinella Petrocchi, and Maurizio Tesconi. Hate me, hate me not: Hate speech detection on Facebook. Proceedings of the First Italian Conference on Cybersecurity, 2017.
• [26] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media, 2017.
• [27] Eshwar Chandrasekharan, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW):31, 2017.
• [28] Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(3):18, 2012.
• [29] Ping Liu, Joshua Guberman, Libby Hemphill, and Aron Culotta. Forecasting the presence and intensity of hostility on instagram using linguistic and social features. In Twelfth International AAAI Conference on Web and Social Media, 2018.
• [30] Vikas Sindhwani, Prem Melville, and Richard D Lawrence. Uncertainty sampling and transductive experimental design for active dual supervision. In Proceedings of the 26th ICML, pages 953–960. ACM, 2009.
• [31] Christopher D Manning and Hinrich Schütze. Foundations of statistical natural language processing. MIT press, 1999.
• [32] Herman Anthony Carneiro and Eleftherios Mylonakis. Google trends: a web-based tool for real-time surveillance of disease outbreaks. Clinical infectious diseases, 49(10):1557–1564, 2009.
• [33] Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. In MT summit, volume 5, pages 79–86, 2005.
• [34] Jörg Tiedemann. Parallel data, tools and interfaces in opus. In LREC, volume 2012, pages 2214–2218, 2012.

## 6 Appendix

### 6.1 A Brief History of the Conflict

Kashmir has been the disputed region at the center of the 70-year-old India-Pakistan conflict. A key factor in the continual unrest in South Asia, the Kashmir issue has drawn wide attention from the political science community for decades [4, 1, 5]. The root of this conflict can be traced back to the independence struggle of India and the subsequent partition of India and Pakistan in 1947. Overall, India and Pakistan have gone to full-fledged war four times (1947, 1965, 1971 and 1999). Of these, the 1971 war was the deadliest (11,000 killed across both sides) and resulted in the largest number of prisoners of war (90,000 POWs) since the Second World War [7]. Across the four wars, an estimated 27,650 soldiers were killed and thousands were wounded. A timeline outlining some of the key events in the decades-long bilateral conflict is presented below.

• 1947: British rule over the Indian subcontinent ends. The territory is partitioned into the Islamic Republic of Pakistan (Muslim majority) and the Republic of India (Hindu majority). Kashmir (Muslim majority) becomes a sovereign monarchy led by a Hindu dynasty.
• Pakistani tribal military attacks Kashmir. Merely two months after declaring independence from both India and Pakistan, the ruler of Kashmir signs a treaty of accession to India, triggering the first war between India and Pakistan.
• The Kashmir issue is discussed at the United Nations (UN) Security Council.
• A ceasefire is achieved in 1949, but Pakistan does not evacuate its troops, thereby partitioning Kashmir. The modern territories of Pakistan-administered Kashmir (Azad Kashmir and Gilgit-Baltistan, then Northern Areas) and India-administered Jammu and Kashmir are formed. The UN resolution calls for a referendum on the status of the Kashmir region.
• Elections in India-administered Jammu and Kashmir show strong support for Indian accession, and India considers a referendum unnecessary. Pakistan and the UN disagree on the grounds that the Pakistan-administered regions were not considered in the vote.
• India and Pakistan go to war for the second time over Kashmir (1965). The brief war ends in a ceasefire and a return to the original positions.
• India and Pakistan go to war for the third time (1971). The war results in a defeat for Pakistan, and the Simla Agreement is signed. The Kashmir ceasefire line is christened the Line of Control (LoC). Both sides pledge to settle the Kashmir issue through negotiations.
• The Siachen glacier, a vital strategic territorial asset, is seized by the Indian military. The glacier, heretofore not demarcated by the LoC, is the casus belli of several future confrontations between India and Pakistan. The Siachen glacier is the highest warzone in the world.
• India’s Prime Minister declares India a full-fledged nuclear state in a press conference. Shortly after, Pakistan successfully develops and tests its own nuclear weapons, and both nations join the handful of global nuclear powers.
• India and Pakistan go to war for the fourth time (1999). The event is triggered by militant activity in the Indian-administered Kargil district of Jammu and Kashmir. The conflict (called the Kargil war) lasts for approximately 2 months.
• 2019: In a flurry of terror attacks in both nations, where both parties allege the involvement of the other, PULWAMA marks the most recent episode, bringing both nuclear states precariously close to war.

We obtain a comprehensive set of YouTube comments using the publicly available YouTube API on incident-specific videos during our period of interest (14th Feb 2019 to 13th Mar 2019). In what follows, we first provide a brief outline of our data collection process and then describe each of the steps in greater detail.

Our data collection procedure consists of the following steps:

1. Start with a small seed set of potentially relevant search queries.

2. Expand the seed set by including related search queries from Google Trends during the period of interest.

3. Create a set of queries, News, containing ten popular world news channels, and ten popular news channels from India and Pakistan. Expand to which contains all unique queries in and .

4. Query YouTube for videos posted within the specified time range. Obtain a video set, , containing top 10 video search results for each query in and top 200 video recommendations for each query in .

5. Manually inspect the video set, remove irrelevant videos, and obtain a set of relevant videos.

6. Prune the relevant videos further to a set of popular videos by removing any video that has 10 or fewer comments.

7. Obtain the set of user comments on every video in the set of popular videos.

8. Construct the final corpus by filtering the comment set to retain only comments written in English (described later).

We now provide a detailed description of our procedure. In step 1, we constructed the seed set with the following six queries: [pulwama], [balakot], [abhinandan], [kashmir], [india pakistan], [india pakistan war].

We expanded each query in the seed set with related search queries procured from Google Trends. Google Trends queries have been found effective in tackling time-series AI problems such as early detection of disease outbreaks [32]. Google Trends allows the user to specify a geographic location and a period of interest. For each seed query, we set the location of interest to India or Pakistan, the start date to February 14th, 2019, and the end date to March 13th, 2019. Our first four seed queries consist of a single token. However, we noticed that the queries [India Pakistan] and [Pakistan India] yielded slightly different results. Following [8], which revealed that we tend to put our more-preferred choice first in a pair, whenever a query contained a country pair (e.g., [India Pakistan] or [Pakistan India war]), we adjusted the order of the pair to match the location of interest. In step 2, upon expansion using Google Trends and subsequent removal of duplicate queries, we obtained an expanded set of 207 unique queries.

In step 3, with 29 unique news channels in News (listed in Table 7), the Cartesian product between the expanded query set and News produced 6,003 queries.
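The count can be reproduced with a short sketch: 207 expanded queries crossed with 29 news channels give 207 × 29 = 6,003 combined queries. The helper name `cross_queries`, the placeholder query and channel strings, and the choice to join a query and a channel with a space are all assumptions made for illustration.

```python
from itertools import product

def cross_queries(queries, channels):
    """Pair every search query with every news channel name, dropping duplicates."""
    combined = {f"{q} {c}" for q, c in product(queries, channels)}
    return sorted(combined)

# Placeholder names; the real query and channel lists are not reproduced here.
queries = [f"query_{i}" for i in range(207)]
channels = [f"channel_{j}" for j in range(29)]
news_queries = cross_queries(queries, channels)
```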

In step 4, we obtained 6,157 unique videos and, after removing irrelevant and unpopular videos (fewer than 10 comments), we finally obtained a set of 2,890 videos. Two annotators fluent in English, Hindi and Urdu, and familiar with additional European and Indian languages, annotated the videos (inter-rater agreement, Cohen’s κ: 0.87). We considered the consensus labels.
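For readers unfamiliar with the agreement statistic, Cohen’s κ compares the observed agreement between two annotators with the agreement expected by chance. The sketch below is a generic implementation; the function name `cohens_kappa` and the toy labels are illustrative and are not the annotations from this study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy labels (1 = relevant video, 0 = irrelevant); not the study's annotations.
annotator_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
annotator_2 = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In the toy example the annotators agree on 9 of 10 items while chance agreement is 0.5, giving κ = 0.8.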

The resulting comment set consists of 2.04 million user comments from 791,289 unique users. After removing punctuation and emojis and lower-casing the comments written in the Roman alphabet, the average length of a comment was 22.51 ± 38.47 tokens.

Finally, we ran our language identification module and obtained the English-language subset, a set of 921,235 YouTube comments posted by 392,460 users.

### 6.3 Language Identification

We now provide a thorough treatment of our approach on additional data sets. We focus on two low-resource language settings (Bengali and Oriya) and a large mix of well-formed texts from Europe (21 languages, EuroParl data set [33]). We first make two slight digressions: one to provide an intuition for our results, the other to address the issue of fairness in comparison.

Intuition: The SkipGram model used for training the FastText embeddings predicts an input word’s context. In a polyglot setting, the likeliest context predicted for a Hindi word is other Hindi words. The embeddings likely reflect this aspect of the language model and thus we see language clusters. We admit that implementation choices like splitting on whitespace (for instance) can preclude some languages, so we refrain from making claims about the universality of the technique and present empirical results only on Indian and European languages.

Fairness: A fair performance comparison between the two supervised baselines (GoogleLangID and fastTextLangID) and our proposed unsupervised approach is challenging for the following reasons. On one hand, the baselines predict from a larger set of languages. In contrast, our method reveals only those languages observed in the corpus in question, so a limited set of clusters (labels) is obtained; in most cases, this set is substantially smaller than the number of languages supported by industrial-strength baselines. On the other hand, the baselines are supervised methods that have been trained on a vast amount of data, whereas our method requires minimal manual labeling, a critical feature for dealing with corpora featuring low-resource languages, which are a common occurrence in the Indian subcontinent.

We admit that restricting the baselines to predict only from the smaller set may offset the advantage of our method. The API for fastTextLangID provides an ordered list of all languages that it supports, each with a confidence score. Consider the set of all languages present in a corpus. For a given document, we predict the language in this set with the highest confidence score. Suppose the top three predictions for a document from our India-Pakistan data set by fastTextLangID are (1) German (predicted with highest confidence), (2) Spanish and (3) Hindi. Since Hindi is present in the corpus while German and Spanish are not, we consider the predicted label to be Hindi. We treat this restricted setting as an additional fastTextLangID baseline and present its performance in Table 8.
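The restriction described above can be sketched as a simple filter over a language identifier's scored output. The function `restricted_prediction`, the confidence values and the corpus language set below are hypothetical; the snippet does not call the actual fastText API and only illustrates the selection rule.

```python
def restricted_prediction(scored_predictions, corpus_languages):
    """Return the highest-confidence language that is present in the corpus.

    scored_predictions: list of (language, confidence) pairs, in any order.
    corpus_languages: set of languages known to occur in the corpus.
    """
    allowed = [(lang, conf) for lang, conf in scored_predictions
               if lang in corpus_languages]
    if not allowed:
        return None  # no supported language overlaps with the corpus
    return max(allowed, key=lambda pair: pair[1])[0]

# Hypothetical confidences mirroring the German/Spanish/Hindi example.
scored = [("German", 0.41), ("Spanish", 0.33), ("Hindi", 0.20), ("English", 0.06)]
label = restricted_prediction(scored, corpus_languages={"Hindi", "English"})
```

Here German and Spanish are discarded because they are not in the corpus language set, so the restricted label is Hindi even though it was only the third-ranked prediction.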

Data sets: We now describe our additional data sets, two of which are collected from the Indian subcontinent, one is a well-known data set of European languages.

• The ABP Ananda news channel is a Bengali news organization. We crawled the comments on videos uploaded by their YouTube channel and obtained 219,927 comments. Most of the comments are in Bengali, Hindi, and English. Note that internet users in the Indian subcontinent use the Latin script as well as their native script for writing. The use of the Latin script for writing in Hindi and Bengali is significant in this corpus.

• OTV is an Oriya news network with a popular YouTube channel. We crawled videos from this network and subsequently crawled their comments to obtain 153,435 comments, with most of the comments posted in Oriya, Hindi, and English. Latin script is heavily used alongside the native script for Oriya and Hindi.

• The Europarl corpus [33] contains 21 languages with well-written text. The processed version is obtained from [34]. 420,000 documents were reserved for training and 210,000 documents were used for testing.

Performance on EuroParl: We did not evaluate against GoogleLangID on this corpus due to prohibitive costs; given the clean nature of the corpus, it is reasonable to expect very high accuracy from it. Our method is near-perfect and on par with fastTextLangID. Our model’s accuracy is versus for fastTextLangID.

Performance on low-resource languages: We considered two additional data sets consisting of a mix of languages, two of which are low-resource languages (Bengali and Oriya). As shown in Tables 9 and 10, our performance on the India-Pakistan data set translates to other languages in the Indian subcontinent. We observed that our added fairness criterion marginally improved the performance of fastTextLangID, but our method still substantially outperformed it. As already mentioned, we could not construct a similarly restricted GoogleLangID baseline due to a limitation of its API. However, based on our current observations on fastTextLangID, we conjecture that the performance boost would not be substantial.

### 6.4 Analysis of Animus

We found frequent use of porkistan (an intra-word code-mixed insult for Pakistan [14]) and randia (a code-mixed derogatory term for India). In order to uncover similar insults, we started with a seed set and expanded it by including the top-ten nearest neighbors in the polyglot embedding space (distance metric: cosine similarity). We conducted this expansion step 3 times, and overall obtained 384 unique terms. We manually annotated them and uncovered 243 insults. Our hate lexicon mainly uncovered India-Pakistan specific insults and thus had minimal overlap with previously published hate lexicons [15, 14].
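The expansion procedure can be sketched as an iterative nearest-neighbor search. This is a minimal illustration under stated assumptions: the toy two-dimensional vectors and neutral placeholder tokens stand in for the 100-dimensional polyglot embeddings, the function name `expand_lexicon` is hypothetical, and expanding from every term in the lexicon at each round (rather than only from newly added terms) is an assumption about a detail not specified above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_lexicon(seeds, embeddings, k=10, rounds=3):
    """Repeatedly add the k nearest neighbors (by cosine similarity) of every term."""
    lexicon = {s for s in seeds if s in embeddings}
    for _ in range(rounds):
        additions = set()
        for term in lexicon:
            neighbors = sorted(
                (w for w in embeddings if w != term),
                key=lambda w: cosine(embeddings[term], embeddings[w]),
                reverse=True,
            )
            additions.update(neighbors[:k])
        lexicon |= additions
    return lexicon

# Toy vectors with placeholder tokens; not the trained embeddings.
toy_embeddings = {
    "term_a": (1.0, 0.0),
    "term_b": (0.9, 0.1),
    "term_c": (0.7, 0.3),
    "term_x": (0.0, 1.0),
    "term_y": (0.1, 0.9),
}
expanded = expand_lexicon({"term_a"}, toy_embeddings, k=2, rounds=3)
```

The candidate set produced this way still requires manual annotation, since nearest neighbors of an insult are not necessarily insults themselves.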

Through the polyglot embeddings, we were able to retrieve oinkistan from pigistan. Several of these insults attack religious beliefs, mock the economy (e.g., slumistan and bhikharistan; bhikhari translates to beggar) and taunt social evils (e.g., terroristan, rapistan). Through the usage pattern of these insults, we observed a “conflict spiral” [16], in which parties mirror each other’s aggressive communication tactics.

We next provide a more detailed analysis.

Pakistan murdabad: We inspected the bigram distribution and found a high-frequency bigram Pakistan murdabad (murdabad is a Hindi/Urdu word that is used to express disapproval). In Figure 5, we plot the temporal trends of comments containing Pakistan murdabad. We found that the usage of the term peaked around February 15th, following JEM (Pakistan-based militant group) claiming responsibility for PULWAMA and as time progressed, the usage gradually declined.

Conflict-specific slurs: We use our lexicon, obtained by nearest-neighbor search in the polyglot embedding space, to compute the fraction of vitriolic comments targeted at either of the two countries in our corpus. If a comment contains any of the insults in the lexicon, we consider the comment as expressing vitriol. As shown in Figure 5(b), we found that the presence of vitriol was roughly the same throughout our period of interest; i.e., there was no strong vocabulary shift, but rather an attitude shift when the peace-spike happened.
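The lexicon-matching rule can be sketched as follows. The function name `vitriol_fraction`, the placeholder lexicon terms and the toy comments are illustrative, and matching on whole whitespace-separated tokens (rather than substrings) is an assumption.

```python
def vitriol_fraction(comments, lexicon):
    """Fraction of comments containing at least one lexicon term as a token."""
    if not comments:
        return 0.0
    terms = {t.lower() for t in lexicon}
    flagged = sum(1 for c in comments if terms & set(c.lower().split()))
    return flagged / len(comments)

# Placeholder lexicon terms and toy comments.
toy_lexicon = {"slur_a", "slur_b"}
toy_comments = [
    "this has slur_a in it",
    "a peaceful comment",
    "SLUR_B everywhere",
    "nothing to flag here",
]
fraction = vitriol_fraction(toy_comments, toy_lexicon)
```

Computing this fraction separately for the comments posted on each day would give a temporal curve of the kind discussed above.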
