Estimating Causal Effects of Tone in Online Debates

Dhanya Sridhar, Lise Getoor
Columbia University, UC Santa Cruz

Statistical methods applied to social media posts shed light on the dynamics of online dialogue. For example, users’ wording choices predict their persuasiveness [??] and users adopt the language patterns of other dialogue participants [??]. In this paper, we estimate the causal effect of reply tones in debates on linguistic and sentiment changes in subsequent responses. The challenge for this estimation is that a reply’s tone and subsequent responses are confounded by the users’ ideologies on the debate topic and their emotions. To overcome this challenge, we learn representations of ideology using generative models of text. We study debates from an online debate forum and compare annotated reply tones such as emotional versus factual, or reasonable versus attacking. We show that our latent confounder representation reduces bias in average treatment effect (ATE) estimation. Our results suggest that factual and asserting tones affect dialogue, and they provide a methodology for estimating causal effects from text.

1 Introduction

Debates on online forums or social media sites provide observational data for studying discourse. Current understanding draws upon theories such as linguistic accommodation, which states that dialogue participants change and vary their wording styles to mirror one another [??]. Statistical methods applied to social media have shown evidence of linguistic style accommodation [?], power dynamics [?] and varying persuasiveness of argumentation styles [?].

In this paper, we focus on online debates. We ask the causal question of how the tone used to reply in a debate affects changes in linguistic style and sentiment. To illustrate the setting, consider a snippet of a debate between two users, A and B, on a given topic. User A posts her opinion on the topic to which user B replies with a nasty tone. User A writes a second post, responding to B’s post. The goal is to examine the change in A’s sentiment or linguistic style between her first and second post. For example, we may observe that between her first post and her second post in response to B, A’s negative sentiment increased. We study how A’s sentiment might have changed had B been nice instead of nasty in the reply. We consider such sequences of three posts within debates and cast the tone of the first reply as the treatment. Formally, we estimate the average treatment effect of reply tone on changes in sentiment and linguistic style.

The challenge for this estimation is that the ideologies encoded in A and B’s posts, and A’s initial sentiment affect both B’s reply tone and A’s subsequent response. For example, consider a debate between A and B on gun control. Examples of opposing ideologies that influence the debate are strong opposition to gun violence versus strict interpretations of the constitution. Ideological differences or innate negativity from A provoke both B’s nasty tone and A’s subsequent reactions. Valid causal inference requires adjusting for these confounders when estimating the treatment effect. While sentiment analysis tools are available for extracting posts’ sentiment, modeling the latent ideologies that underpin a particular debate requires careful consideration. This paper proposes representations of ideologies learned from debates to adjust for confounding.

In recent social media analyses, adjusting for attributes such as discussion topic, post authors, timing of posts and posting frequency has been useful to understand post likeability, antisocial behavior and emoji use [????]. The adjustments are typically performed by only comparing posts that have similar values of the confounder. Our approach requires adjusting for the underlying facets of a debate, an unobserved and multi-dimensional confounder. We use a generative model of text to learn latent representations of posts to this end.

Main Idea.

The goal of this paper is to estimate the causal effects of tones used in debate replies on other users’ change in linguistic style and sentiment. We identify treatments (tone) and outcome representations which capture the change in sentiment and linguistic style across sequences of posts. We use three plug-in estimators of average treatment effects: regression, inverse propensity weighting (IPW), and augmented IPW. To adjust for confounding in these estimates, we find latent representations of posts that capture the underlying ideological viewpoints of the debate. Our contributions include:

  • Formulating the problem of estimating the effects of tone on subsequent dialogue within the framework of causal inference.

  • Learning latent representations of ideologies in debates from generative language models to represent confounders.

  • Validating the consistency of estimated effects using three different estimators and examining multiple tones of reply in online debates.

We study an online debate forum corpus that includes annotations for multiple reply tones including nasty versus nice, or emotional versus factual [?]. Through comparisons against a naive confounder representation and studies across reply tones, we examine the implications of various modeling choices. With these findings, we highlight guidelines for estimating treatment effects using text from social media. We also find that factual replies significantly affect how users vary their linguistic style and sentiment between posts.

2 Related Work

Prior work on online debate forums primarily focuses on supervised prediction tasks. Debate text and reply structure between users have been used to predict stance, sentiment and reply polarity [??????]. Related work has also used the Change My View forum to predict persuasiveness from styles of argumentation and characterize logical fallacies [???].

Similarly, unsupervised methods have been applied to analyze dialogue. Statistical models have been proposed to quantify linguistic accommodation both on Twitter and U.S. Supreme Court arguments [??]. In contrast, we formulate an approach based on causal inference.

Existing work on applying causal inference methods to social media focuses on controlling for confounding, or inferring treatments and outcomes from text. One line of work controls for observable confounders such as topic [??], timing of posts [?] and the post author [?]. Another line of research uses social media posts to estimate the effect of exercise on mood by inferring both exercise habits and mood from text [??]. In a different line of work, embeddings of text have been used as proxy confounders to study causal effects on paper acceptance [?].

3 Technical Background

We review estimating the average treatment effect (ATE) for binary treatments from observational data. We have $n$ iid observations called units, $i = 1, \dots, n$. Each unit is treated or not, and we denote this treatment assignment $T_i \in \{0, 1\}$. We say $Y_i(1)$ is the potential outcome if we treat unit $i$ (set $T_i = 1$), and analogously, $Y_i(0)$ if we do not treat (set $T_i = 0$). The average treatment effect (ATE) compares potential outcomes:

$$\psi = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)].$$

However, we only observe one outcome for each unit, $Y_i = Y_i(T_i)$, conditioned on its assigned treatment $T_i$. If we compute the ATE above by simply averaging over treated and untreated populations, the estimate will typically be biased because $Y_i(1), Y_i(0)$ are not independent of the assigned treatments $T_i$. Put simply, knowing which units are treated and which are untreated gives us information about their outcomes.

In observational studies, this bias occurs because variables $Z$, called confounders, may affect both the treatment and outcome. If we observe the confounders for each unit, $Z_i$, then we have $Y_i(1), Y_i(0) \perp T_i \mid Z_i$, the condition called ignorability. In this case, the ATE, which we denote $\psi$, is identifiable as a parameter of the observational distribution by a theorem called adjustment:

$$\psi = \mathbb{E}_{Z}\big[\, \mathbb{E}[Y \mid Z, T = 1] - \mathbb{E}[Y \mid Z, T = 0] \,\big].$$

In plain English, the ATE is: how do treated and control units differ in outcome when we average over the varying rates at which units receive treatment? We refer to work by Pearl and Rubin for an in-depth treatment of causal inference [??].
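To make the bias from confounding concrete, the following is a minimal simulation sketch, not part of the original study. It assumes a single binary confounder, a true effect of 1.0, and hypothetical function names (`simulate`, `unadjusted`, `adjusted`); the adjustment is done by stratifying on the confounder and averaging stratum-level differences.

```python
import random


def simulate(n=20000, seed=0):
    """Draw (z, t, y) rows where z raises both treatment probability and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # binary confounder
        p_treat = 0.8 if z == 1 else 0.2            # z affects treatment
        t = 1 if rng.random() < p_treat else 0
        y = 1.0 * t + 2.0 * z + rng.gauss(0.0, 1.0)  # true effect of t is 1.0
        rows.append((z, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def unadjusted(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted(rows):
    """Adjustment formula: average the within-stratum differences over P(z)."""
    total = 0.0
    for z_value in (0, 1):
        stratum = [row for row in rows if row[0] == z_value]
        total += unadjusted(stratum) * len(stratum) / len(rows)
    return total


rows = simulate()
print("unadjusted:", round(unadjusted(rows), 2))
print("adjusted:  ", round(adjusted(rows), 2))
```

Under these assumptions the unadjusted difference overstates the effect because treated rows are more likely to have the confounder switched on, while the stratified estimate stays close to the simulated effect.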

3.1 Estimators for ATE

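Drawing from extensive work on estimating ATEs, we present three estimators for the ATE $\psi$. We will return to using these estimators in our empirical study. The first estimator fits expected outcomes $Q(t, z) = \mathbb{E}[Y \mid T = t, Z = z]$ from the observations, e.g., with linear regression. The corresponding ATE estimate, $\hat{\psi}^{Q}$, is:

$$\hat{\psi}^{Q} = \frac{1}{n} \sum_{i=1}^{n} \Big[ \hat{Q}(1, z_i) - \hat{Q}(0, z_i) \Big].$$

The second estimator reweights observed outcomes using the propensity score, $g(z) = P(T = 1 \mid Z = z)$. The resulting inverse propensity weighting (IPW) estimator is:

$$\hat{\psi}^{g} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{t_i \, y_i}{\hat{g}(z_i)} - \frac{(1 - t_i) \, y_i}{1 - \hat{g}(z_i)} \right].$$

The final estimator, augmented-IPW (AIPW), interpolates between the two estimators [??]. It has been shown that the AIPW estimator satisfies double robustness: it retrieves consistent estimates if either the propensity score or outcome model is correct even if the other is misspecified. It is:

$$\hat{\psi}^{AIPW} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{Q}(1, z_i) - \hat{Q}(0, z_i) + \frac{t_i \, \big(y_i - \hat{Q}(1, z_i)\big)}{\hat{g}(z_i)} - \frac{(1 - t_i) \, \big(y_i - \hat{Q}(0, z_i)\big)}{1 - \hat{g}(z_i)} \right].$$

All three estimators rely on measurements of confounders $Z$. We will see that in debate threads, we must recover the confounders from high-dimensional text. A key idea of this paper is to use text data to find a representation for $Z$.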
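The three plug-in estimators can be sketched in a few lines once the fitted quantities are available. This is an illustrative sketch in their standard textbook form, not the authors' code: the function names are hypothetical, and the inputs `q1`, `q0` (predicted outcomes under treatment and control) and `g` (predicted propensity scores) are assumed to come from models fitted elsewhere.

```python
from typing import Sequence


def ate_regression(q1: Sequence[float], q0: Sequence[float]) -> float:
    """Outcome-regression estimate: mean of predicted treated minus control outcomes."""
    return sum(a - b for a, b in zip(q1, q0)) / len(q1)


def ate_ipw(t: Sequence[int], y: Sequence[float], g: Sequence[float]) -> float:
    """Inverse propensity weighting: reweight observed outcomes by 1/g or 1/(1-g)."""
    terms = [ti * yi / gi - (1 - ti) * yi / (1 - gi) for ti, yi, gi in zip(t, y, g)]
    return sum(terms) / len(terms)


def ate_aipw(t, y, q1, q0, g) -> float:
    """Augmented IPW: regression estimate plus propensity-weighted residual corrections."""
    total = 0.0
    for ti, yi, a, b, gi in zip(t, y, q1, q0, g):
        total += (a - b) + ti * (yi - a) / gi - (1 - ti) * (yi - b) / (1 - gi)
    return total / len(y)


# Toy inputs (assumed values, chosen so every estimator agrees).
t = [1, 0, 1, 0]
y = [3.0, 1.0, 4.0, 2.0]
g = [0.5, 0.5, 0.5, 0.5]
q1 = [3.0, 3.0, 4.0, 4.0]
q0 = [1.0, 1.0, 2.0, 2.0]
print(ate_regression(q1, q0), ate_ipw(t, y, g), ate_aipw(t, y, q1, q0, g))
```

Propensity scores close to 0 or 1 make the IPW terms very large, which is one reason the IPW estimate tends to have higher variance than the other two.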

4 Dataset

Figure 1: We group LIWC categories into three types: linguistic style, positive sentiment, negative sentiment. We use the groupings for vector representations of posts which we can use to measure the outcomes of interest: changes between the first and last post of a triple.

To estimate the ATE of tone, we use the corpus collected and annotated as part of the Internet Argument Corpus [?]. The corpus has been used to predict users’ stances on topics, disagreements between users, and sarcasm use [????]. It is a collection of debate discussions, each belonging to a topic such as “evolution” or “climate change.” For some pairs of posts called quote-response pairs, the corpus includes annotations about the reply, obtained using Amazon Mechanical Turk. A quote-response pair is a post and its reply where the replier quotes the original poster and responds directly to the quoted statement. The response is annotated by multiple annotators along four dimensions which we refer to as reply types: nasty/nice, attacking/reasonable, emotional/factual, questioning/asserting. Each reply type has two opposing polarities (e.g., nasty or nice) which we refer to as its tone. The annotation score for each type ranges from -5 to 5, where negative values correspond to the antagonistic tone such as nasty or attacking and positive values map to tones such as reasonable or factual.

We select the four debate topics with the most quote-response annotations: “abortion”, “gay marriage”, “evolution”, and “gun control”. Each debate topic has roughly 1200 quote-response annotations. Following prior work, we use the mean score across annotators and discard annotations with a mean score between -1 and 1 [???]. In the next section, we formalize the use of these annotations as treatments to estimate causal effects.

5 Problem Statement

To study causal effects in debate threads, we first introduce post triples. A post triple is an ordered sequence of three posts $(p_{i1}, p_{i2}, p_{i3})$ where each post $p_{ij}$ belongs to the $i$-th triple and appears $j$-th in the sequence. The author of post $p_{ij}$ is denoted by $a(p_{ij})$. The triples we consider have the property that $a(p_{i1}) = a(p_{i3})$, i.e., the same user authors the first and last posts. Based on the discussion in which the triple appears, the triple has a debate topic $d_i$. We will refer to $p_{i1}$ as the original post and to $p_{i2}$ as the reply post.

Each triple we study has a quote-response annotation for $p_{i2}$ towards $p_{i1}$. Given the reply type $r$ of the annotations and its mean score, we binarize the values by considering negative mean scores as 0 and positive mean scores as 1. Replies are thus converted to binary negative or positive tone, such as nasty or nice. For each triple $i$ and reply type $r$, the tone of reply post $p_{i2}$ toward $p_{i1}$ gives the treatment assignment $T_i^{r}$ for the triple.
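The conversion from annotator scores to a binary treatment can be sketched as follows. This is a hedged illustration: the function name `binarize_tone` is hypothetical, and treating the discard band as inclusive of -1 and 1 is an assumption, since the text only says mean scores "between -1 and 1" are dropped.

```python
from statistics import mean
from typing import Optional, Sequence


def binarize_tone(scores: Sequence[float]) -> Optional[int]:
    """Map annotator scores in [-5, 5] to a binary treatment.

    Returns None when the mean score is ambiguous and the annotation is discarded,
    0 for the negative tone (e.g., nasty) and 1 for the positive tone (e.g., nice).
    """
    avg = mean(scores)
    if -1 <= avg <= 1:  # assumed inclusive band of ambiguous annotations
        return None
    return 1 if avg > 0 else 0


print(binarize_tone([-3, -2, -4]))  # clearly negative tone
print(binarize_tone([2, 3, 4]))     # clearly positive tone
print(binarize_tone([0, 1, -1]))    # ambiguous, discarded
```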


The next problem is to quantify the outcome of interest: changes between $p_{i1}$ and $p_{i3}$ after receiving reply $p_{i2}$. We rely on the Linguistic Inquiry and Word Count (LIWC) tool [?]. LIWC is a dictionary which maps an extensive set of English words to categories that capture both lexical and semantic choices. Several text classification and statistical analysis tasks have represented posts with counts of LIWC categories [????].

We first combine LIWC categories into groups which we call category types that measure positive sentiment, negative sentiment, and linguistic style. Fig. 1 shows each category type. For the sentiment groupings, we select the categories related to positive and negative emotion as listed on the LIWC website. For linguistic style, we use the categories identified in prior work for linguistic style accommodation [?].

Given a category type, the frequency of words in a post $p_{ij}$ belonging to each category gives a vector representation of the post. We can construct such vector representations for $p_{i1}$ and $p_{i3}$. Formally, for a category type $c$ and reply type $r$, the outcome $Y_i^{c}$ for triple $i$ is the Euclidean distance between the vector representations for $p_{i1}$ and $p_{i3}$. This strategy suggests many possible vector representations of posts including word embeddings [?].
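The outcome construction can be illustrated with a small sketch. LIWC itself is a licensed dictionary, so the lexicon below is a hypothetical two-category stand-in, and normalizing category counts by post length is an assumption about what "frequency" means here.

```python
import math
import re
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for one LIWC category type.
TOY_NEGATIVE_SENTIMENT: Dict[str, Set[str]] = {
    "anger": {"hate", "stupid"},
    "sadness": {"sad", "cry"},
}


def category_frequencies(post: str, lexicon: Dict[str, Set[str]]) -> List[float]:
    """Fraction of tokens in the post that fall in each category of the lexicon."""
    tokens = re.findall(r"[a-z']+", post.lower())
    if not tokens:
        return [0.0] * len(lexicon)
    return [sum(tok in words for tok in tokens) / len(tokens) for words in lexicon.values()]


def outcome(first_post: str, last_post: str, lexicon: Dict[str, Set[str]]) -> float:
    """Euclidean distance between the category vectors of the first and last post."""
    return math.dist(
        category_frequencies(first_post, lexicon),
        category_frequencies(last_post, lexicon),
    )


print(outcome("I hate this stupid law", "This is sad", TOY_NEGATIVE_SENTIMENT))
```

A distance of zero means the author's category profile did not change between the first and last post of the triple; larger values mean a larger shift.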

We state the ATE estimation problem for these debate triples. For all configurations of category type $c$ and reply type $r$, we estimate:

$$\psi^{c,r} = \mathbb{E}_{Z}\big[\, \mathbb{E}[Y^{c} \mid Z, T^{r} = 1] - \mathbb{E}[Y^{c} \mid Z, T^{r} = 0] \,\big].$$

This estimates the mean difference in text changes between users receiving a positive-tone reply and those receiving a negative-tone one. The main challenge is to find a representation for $Z$ that captures plausible confounders in debates.

6 Constructing Confounder Representations

In a post triple of interest, the debate topic, latent ideologies of each author within the topic, and sentiment of the original author are confounders. That is, these variables plausibly influence both the treatment (reply tone) and the outcome (change between posts). Prior work has shown that text in political debates can be mapped into a lower dimensional space that corresponds to the moral or ideological facets of that debate topic [????]. Unsupervised approaches have been used to discover word-clusters that correspond to these frames directly from text [?]. Here, we fit an unsupervised generative model of text to learn ideology representations.

Ideology Representation.

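We use the latent Dirichlet allocation (LDA) topic model [?] to recover unobserved ideologies. The observations are counts $x_{dv}$ of word $v$ in document $d$. The generative process is:

$$\beta_k \sim \mathrm{Dirichlet}(\eta), \qquad \theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad s_{dn} \sim \mathrm{Categorical}(\theta_d), \qquad w_{dn} \sim \mathrm{Categorical}(\beta_{s_{dn}}),$$

where $x_{dv}$ is the number of positions $n$ in document $d$ with $w_{dn} = v$.

Each of the $K$ latent topics $\beta_k$ is a distribution over the words in the vocabulary. Each document-level latent variable $\theta_d$ is a distribution over topics. For a document $d$, each word $w_{dn}$ is drawn by sampling a topic $s_{dn}$ from the document’s distribution over topics and then sampling a word from that topic.

The posterior expected values $\hat{\theta}$ and $\hat{\beta}$ converge to the optimal values of $\theta$ and $\beta$. This convergence property allows us to substitute $\hat{\theta}_d$ as a confounder for each post. LDA is fit using variational inference.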

We fit LDA for each debate topic with the observed word counts across posts from that topic. By conditioning on the debate topic $d_i$, the confounder representation incorporates both the debate topic and finer-grained ideology. The inferred mean proportions over latent topics, $\hat{\theta}_d$, are the embedding for each post. For each triple $i$, we concatenate the embeddings for posts $p_{i1}$ and $p_{i2}$ to include in the confounder $Z_i$. Since the embeddings aim to approximate ideologies, including both the $p_{i1}$ and $p_{i2}$ embeddings helps to further deconfound the effect of users’ opposing or similar views on tone and word change.
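The embedding step can be sketched with scikit-learn, whose `LatentDirichletAllocation` is fit with variational Bayes. This is an illustrative sketch rather than the authors' pipeline: the count matrix is random stand-in data, the number of topics and the `triples` index pairs are assumed values.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Stand-in document-term count matrix for one debate topic: 40 posts x 30 terms.
counts = rng.poisson(1.0, size=(40, 30))

n_topics = 5  # assumed value for illustration only
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda.fit(counts)

# Each row is a post's inferred proportions over latent topics (rows sum to 1).
theta = lda.transform(counts)

# Hypothetical triples given as (row of original post, row of reply post).
triples = [(0, 1), (2, 3), (4, 5)]

# Ideology part of the confounder: concatenated embeddings of both posts.
ideology = np.vstack([np.concatenate([theta[a], theta[b]]) for a, b in triples])
print(ideology.shape)
```

Fitting one model per debate topic, as the text describes, would amount to repeating this on the count matrix of each topic separately.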

Sentiment Representation.

To represent the sentiment encoded in $p_{i1}$, we use the same LIWC category types as we do for the outcomes. As before, we compute the frequency of words in $p_{i1}$ that belong to each category. This gives us a vector representation of sentiment which we include in $Z_i$.

7 Empirical Results

The difficulty in validating causal effects, particularly in debates, is that there is no ground truth. Typically, causal estimation procedures are validated using simulated data, but for text, realistic generative models do not exist. Thus, one of the paper’s contributions is developing an evaluation strategy for text-based causal inferences. Our approach is three-pronged: 1) we assess the predictive performance of the key ingredients for estimation, the propensity score and expected outcome models; 2) we manually inspect the latent ideological topics; 3) we compare causal effects across multiple estimators and against a naive confounder representation.

We found that: 1) causal estimation using our confounder representation reduces bias in the ATE estimates compared to using a naive confounder; 2) if we had not compared multiple estimators and instead used a single estimator like the high-variance IPW (a common practice), we would have incorrectly reported effects; 3) the estimates suggest that emotional/factual and questioning/asserting tones elicit changes in linguistic style and emotion while nice/nasty or reasonable/attacking tones show no effect. Code and data to reproduce all results are available.

Methods and Metrics.

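Using the latent confounder representation proposed, the goal is to estimate the ATE $\psi$ of reply tone $T$ on outcome $Y$. In the empirical results below, we estimate $\psi$ using three estimators (the regression estimator $\hat{\psi}^{Q}$, the IPW estimator $\hat{\psi}^{g}$, and the AIPW estimator $\hat{\psi}^{AIPW}$) for all configurations of LIWC category type and reply type that yield different treatments and outcomes. We report the unadjusted estimate, the difference in mean outcomes between treated and untreated triples, which will be biased. We compare our representation, $Z$-Full, against a naive representation, $Z$-Debate Topics Only, which only uses the debate topic without finer-grained ideologies.

We fit the propensity score $g$ (used by $\hat{\psi}^{g}$ and $\hat{\psi}^{AIPW}$) with logistic regression using the observed treatments and constructed confounder representations. We fit the expected outcomes $Q$ (used by $\hat{\psi}^{Q}$ and $\hat{\psi}^{AIPW}$) with linear regression.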

Experimental Setup.

Besides the processing of quote-response pair annotations described in the Dataset section, we prepare the posts to fit LDA. We obtain all unigram tokens after lemmatizing and removing stop words from posts across all discussions for a given topic. We retain only those tokens which occur in more than 2% but in fewer than 80% of posts. For each topic, this yields a document-term-frequency matrix of roughly 30,000 posts and 400 remaining terms after pre-processing. We fit LDA with $K$ topics. For each reply type, the ATE is averaged over roughly 1500 triples. For the AIPW estimator $\hat{\psi}^{AIPW}$, we use a variant proposed to improve finite sample performance [?].
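The document-frequency filter can be approximated with scikit-learn's `CountVectorizer`. The sketch below is an assumption-laden illustration: the five posts are made up, lemmatization is omitted, and `min_df`/`max_df` keep terms at exactly 2% or 80% document frequency, which differs slightly from the strict "more than" and "fewer than" wording.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up posts for one debate topic; "gun" appears in every post.
posts = [
    "gun rights constitution amendment",
    "gun violence laws",
    "gun owners constitution rights",
    "gun control laws violence",
    "gun ban amendment",
]

# Drop English stop words, then drop terms that are too rare or too common.
vectorizer = CountVectorizer(stop_words="english", min_df=0.02, max_df=0.8)
doc_term = vectorizer.fit_transform(posts)  # sparse posts x terms count matrix

vocabulary = sorted(vectorizer.vocabulary_)
print(doc_term.shape, vocabulary)
```

Here "gun" is removed because it occurs in all posts, which mirrors how overly common debate-topic words carry little information about finer-grained ideology.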

Performance of Outcome (Pos. Sent., RMSE) and Propensity Models (F1)

                         Z-Debate Topics Only        Z-Full
Reply type               RMSE    RMSE    F1          RMSE    RMSE    F1
Nasty/Nice               3.5     2.7     0.89        2.6     2.4     0.89
Attacking/Reasonable     3.6     2.7     0.81        3.0     2.7     0.81
Emotional/Factual        2.4     5.1     0.69        2.2     5.1     0.72
Questioning/Asserting    3.1     4.4     0.80        2.9     4.1     0.79

Table 1: The expected outcome models (shown here for positive sentiment) using $Z$-Full generally improve over those using $Z$-Debate Topics Only. The propensity score model performs comparably in both cases. We evaluate the models using 5-fold cross-validation. We report RMSE for the expected outcomes and F1 for the propensity score.
Figure 2: Top words across latent topics from each debate topic chosen to illustrate ideologies captured by LDA. The latent topics are suggestive of viewpoints like morality and faith versus science and evidence.

Performance of Outcome and Propensity Models.

The first step to validating ATEs is to verify that the models used in various estimators fit the observations well. Table 1 gives the root mean squared error (RMSE) and F1 for the expected outcomes $Q$ and propensity scores $g$, respectively. We perform five-fold cross-validation on the triples. We report performance for all reply types, but for the outcome model, which also depends on the LIWC category type, we show scores for positive sentiment outcomes for conciseness. Our code will reproduce the other RMSEs.
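A check of this kind can be sketched with scikit-learn's cross-validation utilities. The data below are synthetic stand-ins, and including the treatment as a feature of a single linear outcome model is an assumption; the text does not say whether separate outcome models are fit per treatment arm.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins: confounder representation Z, treatment T, outcome Y.
Z = rng.normal(size=(n, 4))
T = (rng.random(n) < 1.0 / (1.0 + np.exp(-Z[:, 0]))).astype(int)
Y = 0.5 * T + Z[:, 1] + rng.normal(scale=0.5, size=n)

# Propensity model: logistic regression of T on Z, scored by 5-fold F1.
f1 = cross_val_score(LogisticRegression(), Z, T, cv=5, scoring="f1").mean()

# Outcome model: linear regression of Y on (T, Z), scored by 5-fold RMSE.
features = np.column_stack([T, Z])
rmse = -cross_val_score(
    LinearRegression(), features, Y, cv=5, scoring="neg_root_mean_squared_error"
).mean()

print(f"propensity F1: {f1:.2f}, outcome RMSE: {rmse:.2f}")
```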

For the positive sentiment expected outcomes, our confounder $Z$-Full predicts with lower RMSE than $Z$-Debate Topics Only. The F1 score is also slightly improved by using $Z$-Full over $Z$-Debate Topics Only when predicting emotional/factual. While this is reassuring, for causal inference, unbiased estimation is more important than predictive performance. Below, we investigate causal estimation, where $Z$-Debate Topics Only shows undesirable consequences.

ATE (and Standard Error) for Nasty/Nice

              Z-Debate Topics Only                       Z-Full
Estimator     Pos.         Neg.          Ling.           Pos.         Neg.          Ling.
Unadjusted    0.0          -0.8          -0.6            0.0          -0.8          -0.6
Regression    0.0 (0.0)    -0.8 (0.1)    -0.6 (0.1)      0.0 (0.1)    -0.3 (0.1)    -0.3 (0.1)
IPW           0.0 (0.2)    -0.7 (0.35)   -0.6 (1.0)      0.0 (0.3)    -0.2 (0.3)    -0.1 (1.0)
AIPW          0.0 (0.2)    -0.8 (0.3)    -0.6 (0.5)      0.0 (0.1)    -0.3 (0.2)    -0.3 (0.4)

Table 2: The debate topics-only approach can overestimate treatment effects; its estimates remain closer to the biased, unadjusted estimate than those obtained with $Z$-Full. We report ATE (and standard error) for the Nasty/Nice reply type.
ATE (and Standard Error) for Remaining Reply Types

              Attacking/Reasonable                       Emotional/Factual                          Questioning/Asserting
Estimator     Pos.         Neg.          Ling.           Pos.          Neg.          Ling.          Pos.          Neg.          Ling.
Unadjusted    -0.1         -0.6          -0.6            -1.0          -0.8          -2.3           -0.5          -0.2          -1.7
Regression    0.1 (0.1)    -0.2 (0.1)    -0.2 (0.1)      -0.6 (0.1)    -0.3 (0.1)    -1.4 (0.1)     -0.3 (0.1)    -0.2 (0.1)    -1.2 (0.1)
IPW           0.1 (0.2)    -0.1 (0.3)    -0.4 (0.4)      -0.4 (0.3)    -0.2 (0.2)    -0.7 (0.8)     -0.3 (0.3)    -0.2 (0.2)    -1.1 (0.8)
AIPW          -0.1 (0.1)   -0.2 (0.9)    -0.4 (0.4)      -0.6 (0.2)    -0.3 (0.1)    -1.2 (0.3)     -0.3 (0.2)    -0.2 (0.1)    -1.2 (0.3)

Table 3: Factual and asserting tones result in the first dialogue participant significantly decreasing changes in linguistic style. Factual tones may provoke decreased change in sentiment. We report ATE (and standard error) for all remaining reply types. Bolded numbers indicate that the ATE is significantly different from zero.

Latent Ideologies.

Even if the ideology representations inferred using LDA are useful for predictive performance above, we carefully inspect the latent topics found by LDA. Since we assumed that ideology is a confounder, we want the latent topics to approximate ideologies. We inspected the top ten words associated with each latent topic across all debate topics. Fig. 2 shows these words for two latent topics from every debate topic as illustrative examples of the ideological views found. For example, in gun control debates, LDA finds topics that align with constitutional rights to bear arms and in evolution debates, there are contrasting topics that align with creationist versus scientific views. Our code includes simple visualization to inspect all latent topics.

ATE Estimation.

We use $Z$-Debate Topics Only and $Z$-Full and apply the three estimators: regression ($\hat{\psi}^{Q}$), IPW ($\hat{\psi}^{g}$) and AIPW ($\hat{\psi}^{AIPW}$). We compare these estimates against the unadjusted estimate. Table 2 shows the ATEs (and standard error) for the nasty/nice reply type. When confounders are missing from adjustment, we expect the estimate to be closer to the biased, unadjusted estimate. Indeed, the results show that using $Z$-Debate Topics Only, omitting sentiment and ideology, consistently yields estimates which are closer to the unadjusted estimate than using $Z$-Full. This is a key finding: estimation bias is reduced with the finer-grained confounder representation. After adjusting for $Z$-Full, the effects on dialogue outcomes from nasty/nice tones are not significant.

In Table 3, we proceed with our confounder representation $Z$-Full and study the remaining reply types: attacking/reasonable, emotional/factual, questioning/asserting. The regression and AIPW estimators find significant effects, particularly for factual and asserting tones. The IPW estimator yields similar ATE estimates but suffers from high variance. This is another key finding: comparing multiple estimators provides a form of validation. The propensity score-based estimator is known to have high variance, and without the two remaining estimators, we may have concluded that no significant effects occur.

The regression and AIPW estimators suggest that factual and asserting tones cause decreased changes in linguistic style: users’ second posts remain closer to their original posts on average across triples. Further, these estimators suggest that both positive and negative sentiment changes are decreased when the tone is factual. The results suggest that users change their overall sentiment less when responding to factual arguments instead of emotionally charged ones. However, users may also maintain their original linguistic styles more when responding to factual or asserting tones. This finding on the role of factual and asserting tones may point to in-depth followup studies on persuasion and argumentation in debates.

Finally, the empirical studies reveal unexpected findings about causal estimation of treatment effects in debates. Interestingly, the choice of outcome representation matters: Table 3 in particular shows that changes to linguistic style are affected more than sentiment changes. A single outcome representation which had concatenated all LIWC categories or used word embeddings may have yielded different results.

8 Discussion

We study treatment effects in debates by estimating unobserved confounders from sequences of posts. We examine these interpretable embeddings in debates to find that they match known ideological views. The exercise of estimating treatment effects yields results of two flavors: 1) evidence that factual replies cause decreased change in linguistic style and sentiment, and 2) guidelines for practitioners to estimate treatment effects from social media text.

We highlight areas of future study. In this work, we focus on ideology and sentiment as confounders. It is interesting to consider possible confounding from posts’ timing and position in a discussion thread. A fruitful area of research is learning deep confounder (and outcome) representations for text while maintaining model interpretability, which we show in this paper is important for validating findings. Finally, validating causal effects in questions of social science research remains an open problem. Simulating outcomes from text is a line of future work.


This work is supported by NSF grants CCF-1740850 and IIS-1703331.


  • [Abbott et al., 2011] Rob Abbott, Marilyn Walker, Jean E. Fox Tree, Pranav Anand, Robeson Bowmani, and Joseph King. How can you say such things?!?: Recognizing disagreement in informal political argument. In ACL Workshop on Language and Social Media, 2011.
  • [Anand et al., 2011] Pranav Anand, Marilyn Walker, Rob Abbott, Jean E Fox Tree, Robeson Bowmani, and Michael Minor. Cats rule and dogs drool!: Classifying stance in online debate. In ACL, 2011.
  • [Blei et al., 2003] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(1), 2003.
  • [Boydstun et al., 2013] Amber E Boydstun, Justin H Gross, Philip Resnik, and Noah A Smith. Identifying media frames and frame dynamics within and across policy issues. In New Directions in Analyzing Text as Data Workshop, 2013.
  • [Cheng et al., 2015] Justin Cheng, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. Antisocial behavior in online discussion communities. In ICWSM, 2015.
  • [Danescu-Niculescu-Mizil et al., 2011] Cristian Danescu-Niculescu-Mizil, Michael Gamon, and Susan Dumais. Mark my words!: linguistic style accommodation in social media. In WWW, 2011.
  • [Danescu-Niculescu-Mizil et al., 2012a] Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, and Lillian Lee. You had me at hello: How phrasing affects memorability. In ACL, 2012.
  • [Danescu-Niculescu-Mizil et al., 2012b] Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang, and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. In WWW, 2012.
  • [Dos Reis and Culotta, 2015] Virgile Landeiro Dos Reis and Aron Culotta. Using matched samples to estimate the effects of exercise on mental health from twitter. In AAAI, 2015.
  • [Gallois and Giles, 2015] Cindy Gallois and Howard Giles. Communication accommodation theory. The International Encyclopedia of Language and Social Interaction, 2015.
  • [Giles and Baker, 2008] Howard Giles and Susan C Baker. Communication accommodation theory. The International Encyclopedia of Language and Social Interaction, 2008.
  • [Habernal et al., 2018] Ivan Habernal, Henning Wachsmuth, Iryna Gurevych, and Benno Stein. Before name-calling: Dynamics and triggers of ad hominem fallacies in web argumentation. In NAACL, 2018.
  • [Hasan and Ng, 2013] Kazi Saidul Hasan and Vincent Ng. Extra-linguistic constraints on stance recognition in ideological debates. In ACL, 2013.
  • [Iyyer et al., 2014] Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political ideology detection using recursive neural networks. In ACL, 2014.
  • [Jaech et al., 2015] Aaron Jaech, Vicky Zayats, Hao Fang, Mari Ostendorf, and Hannaneh Hajishirzi. Talking to the crowd: What do people react to in online discussions? Politics, 7, 2015.
  • [Johnson and Goldwasser, 2018] Kristen Johnson and Dan Goldwasser. Classification of moral foundations in microblog political discourse. In ACL, 2018.
  • [Lukin and Walker, 2013] Stephanie Lukin and Marilyn Walker. Really? well. apparently bootstrapping improves the performance of sarcasm and nastiness classifiers for online dialogue. In NAACL, 2013.
  • [Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
  • [Misra and Walker, 2013] Amita Misra and Marilyn A Walker. Topic independent identification of agreement and disagreement in social media dialogue. In SIGDIAL, 2013.
  • [Misra et al., 2015] Amita Misra, Pranav Anand, Jean E Fox Tree, and Marilyn Walker. Using summarization to discover argument facets in online idealogical dialog. In NAACL, 2015.
  • [Olteanu et al., 2017] Alexandra Olteanu, Onur Varol, and Emre Kiciman. Distilling the outcomes of personal experiences: A propensity-scored analysis of social media. In CSCW, 2017.
  • [Pavalanathan and Eisenstein, 2015] Umashanthi Pavalanathan and Jacob Eisenstein. Emoticons vs. emojis on twitter: A causal inference approach. arXiv:1510.08480, 2015.
  • [Pearl, 2009] Judea Pearl. Causality. 2009.
  • [Pennebaker et al., 2007] James W Pennebaker, Roger J Booth, and Martha E Francis. LIWC2007: Linguistic inquiry and word count. Austin, Texas, 2007.
  • [Robins et al., 1994] James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, pages 846–866, 1994.
  • [Rosenthal and McKeown, 2015] Sara Rosenthal and Kathy McKeown. I couldn’t agree more: The role of conversational structure in agreement and disagreement detection in online discussions. In SIGDIAL, 2015.
  • [Rubin, 2005] Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, pages 322–331, 2005.
  • [Sridhar et al., 2015] Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. Joint models of disagreement and stance in online debate. In ACL, 2015.
  • [Tan et al., 2014] Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic-and author-controlled natural experiments on twitter. In ACL, 2014.
  • [Tan et al., 2016] Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In WWW, 2016.
  • [Van der Laan and Rose, 2011] Mark J Van der Laan and Sherri Rose. Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media, 2011.
  • [Veitch et al., 2019] Victor Veitch, Yixin Wang, and David M Blei. Using embeddings to correct for unobserved confounding. arXiv:1902.04114, 2019.
  • [Walker et al., 2012a] Marilyn Walker, Pranav Anand, Rob Abbott, Jean E. Fox Tree, Craig Martell, and Joseph King. That’s your evidence?: Classifying stance in online political debate. Decision Support Systems, 53(4), 2012.
  • [Walker et al., 2012b] Marilyn Walker, Pranav Anand, Robert Abbott, and Jean E. Fox Tree. A corpus for research on deliberation and debate. In LREC, 2012.
  • [Walker et al., 2012c] Marilyn Walker, Pranav Anand, Robert Abbott, and Richard Grant. Stance classification using dialogic properties of persuasion. In NAACL, 2012.
  • [Wei et al., 2016] Zhongyu Wei, Yang Liu, and Yi Li. Is this post persuasive? ranking argumentative comments in online forum. In ACL, 2016.