#DebateNight: The Role and Influence of Socialbots on Twitter During the 1st U.S. Presidential Debate
Serious concerns have been raised about the role of ‘socialbots’ in manipulating public opinion and influencing the outcome of elections by retweeting partisan content to increase its reach. Here we analyze the role and influence of socialbots on Twitter by determining how they contribute to retweet diffusions. We collect a large dataset of tweets during the 1st U.S. Presidential Debate in 2016 (#DebateNight) and we analyze its 1.5 million users from three perspectives: user influence, political behavior (partisanship and engagement) and botness. First, we define a measure of user influence based on the user’s active contributions to information diffusions, i.e. their tweets and retweets. Given that Twitter does not expose the retweet structure – it associates all retweets with the original tweet – we model the latent diffusion structure using only tweet time and user features, and we implement a novel, scalable approach to estimate influence over all possible unfoldings. Next, we use partisan hashtag analysis to quantify user political polarization and engagement. Finally, we use the BotOrNot API to measure user botness (the likelihood of being a bot). We build a two-dimensional “polarization map” that allows for a nuanced analysis of the interplay between botness, partisanship and influence. We find that socialbots are not only more active on Twitter – starting more retweet cascades and retweeting more – but also 2.5 times more influential than humans, and more politically engaged. Moreover, pro-Republican bots are both more influential and more politically engaged than their pro-Democrat counterparts. However, we caution against blanket statements that software designed to appear human dominates political debates. Firstly, it is known that accounts controlled by teams of humans (e.g. organizational accounts) are often identified as bots. Secondly, we find that many highly influential Twitter users are in fact pro-Democrat and that most pro-Republican users are mid-influential and likely to be human (low botness).
Socialbots are broadly defined as “software processes that are programmed to appear to be human-generated within the context of social networking sites such as Facebook and Twitter” [Gehl and Bakardjieva 2016, p.2]. They have recently attracted much attention and controversy, with concerns that they infiltrated political discourse during the 2016 U.S. Presidential election and manipulated public opinion at scale. Concerns were heightened with the discovery that an influential conservative commentator (@Jenn_Abrams, 70,000 followers) and a user claiming to belong to the Tennessee Republican Party (@TEN_GOP, 136,000 followers) were both in fact Russian-controlled bots operated by the Internet Research Agency in St. Petersburg [Collins and Cox 2017; Timberg, Dwoskin, and Entous 2017].
Several challenges arise when conducting a large-scale empirical analysis of the political influence of bots on Twitter. The first challenge concerns estimating user influence from retweet diffusions, where the retweet relations are unobserved – the Twitter API assigns every retweet to the original tweet in the diffusion. Current state-of-the-art influence estimation methods such as ConTinEst [Du et al. 2013] operate on a static snapshot of the diffusion graph, which needs to be inferred from retweet diffusions using approaches like NetRate [Rodriguez, Balduzzi, and Schölkopf 2011]. This workflow suffers from two major drawbacks: first, the algorithms for uncovering the diffusion graph do not scale to millions of users, as in our application; second, operating on the diffusion graph estimates the “potential of being influential”, but it loses information about user activity – e.g. a less well connected user can still be influential if they tweet a lot. The question is how to estimate, at scale, the influence of millions of users from diffusions in which the retweet relation is not observed. The second challenge lies in determining whether a user is a bot and in characterizing their political behavior, as manually labeling millions of users is infeasible. The question is therefore how to leverage recent automated bot detection approaches such as BotOrNot [Davis et al. 2016] to measure the botness of users and, further, how to analyze political behavior (partisanship and engagement) at scale.
This paper addresses the above challenges on #DebateNight, a large dataset of 6.5 million tweets authored by 1.5 million users that was collected on 26 September 2016 during the first U.S. presidential debate.
To address the first challenge, we introduce, evaluate, and apply a novel algorithm to estimate user influence based on retweet diffusions. We model the latent diffusion structure using only time and user features by introducing the diffusion scenario – a possible unfolding of a diffusion – and its likelihood. We implement a scalable algorithm to estimate user influence over all possible diffusion scenarios associated with a diffusion. We demonstrate that our algorithm obtains state-of-the-art performance on a synthetic dataset with known ground truth. We also show that, unlike simpler alternative measures like the number of followers, or the mean size of initiated cascades, our influence measure assigns high scores to both highly-connected users who never start diffusions and to active retweeters with little followership.
We address the second challenge by proposing three new measures (political polarization, political engagement and botness) and by computing them for each user in #DebateNight. We manually compile a list of partisan hashtags and we estimate political engagement based on the tendency to use these hashtags and political polarization based on whether pro-Democrat or pro-Republican hashtags were predominantly used. We use the BotOrNot API to evaluate botness and to construct four reference populations – Human, Protected, Suspended and Bot. We build a two-dimensional visualization – the polarization map – that enables a nuanced analysis of the interplay between botness, partisanship and influence. We make several new and important findings: (1) bots are more likely to be pro-Republican; (2) bots are more engaged than humans, and pro-Republican bots are more engaged than pro-Democrat bots; (3) the average pro-Republican bot is twice as influential as the average pro-Democrat bot; (4) very highly influential users are more likely to be pro-Democrat; and (5) highly influential bots are mostly pro-Republican.
The main contributions of this work include:
We introduce a scalable algorithm to estimate user influence over all possible unfoldings of retweet diffusions where the cascade structure is not observed;
We develop two new measures of political polarization and engagement based on usage of partisan hashtags;
We measure the botness of a very large population of users engaged in Twitter activity relating to an important political event – the 2016 U.S. Presidential debates;
We propose the polarization map – a novel visualization of political polarization as a function of user influence and botness – and we use it to gain insights into the influence of bots on the information landscape around the U.S. elections.
2 Related Work
We structure the discussion of previous work into two categories: related work on the estimation of user influence and work concerning bot presence and behavior on Twitter.
Estimating user influence on Twitter. Aggregate measures such as the follower count, the number of retweets and the number of mentions have been shown to be indicative of user influence on Twitter [Cha et al. 2010]. More sophisticated estimates of user influence use eigenvector centrality to account for the connectivity of followers or retweeters; for example, TwitteRank [Weng et al. 2010] extends PageRank [Page et al. 1999] by taking into account topical similarity between users and network structure. Other extensions like Temporal PageRank [Rozenshtein and Gionis 2016] explicitly incorporate time into ranking to account for a time-evolving network. However, one limitation of PageRank-based methods is that they require a complete mapping of the social network. More fundamentally, network centrality has the drawback of evaluating only the potential of a user to be influential in spreading ideas or content, and it does not account for the actions of the user (e.g. tweeting on a particular subject). Our influence estimation approach proposed in Sec. 3 is built starting from the user's Twitter activity and it does not require knowledge of the social network.
Recent work [Yates, Joselow, and Goharian 2016; Chikhaoui et al. 2017] has focused on estimating user influence as the contribution to information diffusion. For example, ConTinEst [Du et al. 2013] requires a complete diffusion graph and employs a random sampling algorithm to approximate user influence with scalable complexity. However, constructing the complete diffusion graph might prove problematic, as current state-of-the-art methods for uncovering the diffusion structure (e.g. [Rodriguez, Balduzzi, and Schölkopf 2011; Simma and Jordan 2010; Cho et al. 2013; Li and Zha 2013; Linderman and Adams 2014]) do not scale to the number of users in our dataset. This is because these methods assume that a large number of cascades occur in a rather small social neighborhood, whereas in #DebateNight cascades occur during a short period of time in a very large population of users. Our proposed algorithm estimates influence directly from retweet cascades, without the need to reconstruct the retweet graph, and it scales quadratically with the number of users.
Bot presence and behavior on Twitter.
The ‘BotOrNot’ Twitter bot detection API uses a Random Forest supervised machine learning classifier to calculate the likelihood of a given Twitter user being a bot, based on more than 1,000 features extracted from meta-data, patterns of activity, and tweet content (grouped into six main classes: user-based; friends; network; temporal; content and language; and sentiment) [Davis et al. 2016; Varol et al. 2017].
Previous work has studied the political partisanship of Twitter bots. Kollanyi, Howard, and Woolley (2016) analyzed candidate-oriented hashtag use during the 1st U.S. Presidential Debate and found that highly automated accounts (self-identified bots and/or accounts that post at least 50 times a day) were disproportionately pro-Trump. Bessi and Ferrara (2016) also studied political partisanship by identifying five pro-Trump and four pro-Clinton hashtags and assigning users to a particular political faction. The results suggested that both humans and bots were more pro-Trump in terms of hashtag partisanship. However, the above findings are limited to a comparison between humans and bots of frequency counts of tweets authored and retweets received, and they provide no insight into the importance of users in retweet diffusions. We overcome this limitation by modeling the latent structure of retweet diffusions and computing user influence over all possible scenarios.
3 Estimating influence in retweet cascades
An information cascade $V$ of size $n$ is defined as a series of messages $v_i$ sent by user $u_i$ at time $t_i$, i.e. $V = \{v_i = (u_i, t_i)\}_{i=1:n}$. Here $v_1 = (u_1, t_1)$ is the initial message, and $v_2, \dots, v_n$ with $t_1 < t_2 < \dots < t_n$ are subsequent reposts or relays of the initial message. In the context of Twitter, the initial message is an original tweet and the subsequent messages are retweets (which, by definition, are also tweets) of that original tweet. A latent retweet diffusion graph $G = (V, E)$ has the set of tweets as its vertexes $V$, and additional edges $E = \{(v_i, v_j)\}$ that represent that the $j$-th tweet is a retweet of the $i$-th tweet, and respects the temporal precedence $t_i < t_j$. Web data interfaces such as the Twitter API provide cascades, but not the diffusion edges $E$. Such missing data makes it impossible to directly measure a given user’s contribution to the diffusion process.
3.1 Modeling latent diffusions
Diffusion scenarios. We focus on tree-structured diffusion graphs, i.e. each node $v_j$ has only one incoming link $(v_i, v_j)$, $i < j$. Denote the set of trees that are consistent with the temporal order in cascade $V$ as $\mathcal{G}$; we call each diffusion tree a diffusion scenario $G \in \mathcal{G}$. Fig. 1a contains a cascade visualized as a star graph, denoting that subsequent tweets are cascading effects from the first tweet at $t_1$. Fig. 1b shows four example diffusion scenarios that can lead to this cascade. The main challenge here is to estimate the influence of each user in the cascade, taking into account all possible diffusion trees.
Probability of retweeting. For each tweet $v_j$, we model the probability of it being a direct descendant of each previous tweet in the same cascade as a weighted softmax function, defined as follows. In line with previous work [Crane and Sornette 2008; Mishra, Rizoiu, and Xie 2016; Zhao et al. 2015; Shen et al. 2014; Yu et al. 2015], we model two factors. Firstly, users retweet fresh content [Wu and Huberman 2007]: the probability of the edge $(v_i, v_j)$ decays exponentially with the time difference $t_j - t_i$. Secondly, users prefer to retweet locally influential users, also known as preferential attachment [Barabási 2005]. We measure the local influence $m_i$ of a user $u_i$ using her number of followers [Kwak et al. 2010; Cha et al. 2010; Rizoiu et al. 2017]. We quantify the probability of an edge as:
$$p_{ij} = \frac{m_i \, e^{-r (t_j - t_i)}}{\sum_{k=1}^{j-1} m_k \, e^{-r (t_j - t_k)}} \tag{1}$$
where $r$ controls the temporal decay of the probability and $m_i$ is the number of followers of user $u_i$ (the user of $v_i$).
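The edge probability described above can be sketched in a few lines. This is a minimal illustration, assuming a weighted softmax in which each earlier tweet is weighted by its author's follower count times an exponential decay in the time difference; the function name `edge_probabilities`, the example values, and the uniform fallback for the case where all earlier authors have zero followers are assumptions made for this sketch, not part of the original method description.

```python
import math


def edge_probabilities(times, followers, r):
    """Return P where P[i][j] is the probability that tweet j retweets tweet i.

    times      -- tweet timestamps in increasing order (index 0 is the original tweet)
    followers  -- follower count of the author of each tweet
    r          -- temporal decay rate (hypothetical value supplied by the caller)
    """
    n = len(times)
    P = [[0.0] * n for _ in range(n)]
    for j in range(1, n):
        # Weight of each earlier tweet: followers times exponential time decay.
        weights = [followers[i] * math.exp(-r * (times[j] - times[i])) for i in range(j)]
        total = sum(weights)
        for i in range(j):
            # Assumption of this sketch: fall back to a uniform choice when
            # every earlier author has zero followers.
            P[i][j] = weights[i] / total if total > 0 else 1.0 / j
    return P


# Illustrative cascade of four tweets (times in seconds, made-up follower counts).
P = edge_probabilities([0.0, 10.0, 25.0, 60.0], [1000, 50, 200, 10], r=0.01)
```

Each column `j` of `P` sums to one over the earlier tweets, so it can be read as a distribution over the possible parents of tweet `j`.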
3.2 Tweet influence
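Tweet influence over one diffusion scenario. Let $z(v_i, v_k)$ be a path in the diffusion scenario $G$ – i.e. a sequence of nodes which starts with $v_i$ and ends with $v_k$. $P(z(v_i, v_k))$ is the probability of reaching $v_k$ from $v_i$. We define the influence of $v_i$ in the scenario $G$ as the expected number of users reached from $v_i$, using a model of independent binomials to decide whether or not to take each hop in each path that starts with $v_i$. Formally:
$$\varphi(v_i \mid G) = \sum_{v_k \in V(G)} P(z(v_i, v_k) \mid G) = \sum_{v_k \in V(G)} \mathbb{1}\{z(v_i, v_k) \mid G\} \prod_{(v_a, v_b) \in z(v_i, v_k)} p_{ab} \tag{2}$$
where $\mathbb{1}\{z(v_i, v_k) \mid G\}$ is a function that takes the value 1 when the path from $v_i$ to $v_k$ exists in $G$, and 0 otherwise.
Tweet influence over a retweet cascade. We define the influence of $v_i$ over all valid diffusion scenarios $G \in \mathcal{G}$:
$$\varphi(v_i) = \sum_{G \in \mathcal{G}} P(G) \, \varphi(v_i \mid G) \tag{3}$$
Under the assumption that retweeting events (i.e. edges) occur independently of one another, the probability of a diffusion scenario is:
$$P(G) = \prod_{(v_a, v_b) \in E(G)} p_{ab} \tag{4}$$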
It is intractable to directly evaluate Eq. (3) (together with Eq. (2) and (4)), particularly due to the factorial number of diffusion scenarios (as shown in the online supplement [sup 2018, annex A]). The number of scenarios is already astronomically large for a cascade of 100 retweets. We develop, in the next section, an efficient algorithm to compute the influence of all tweets.
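To make the intractability concrete, the following sketch enumerates every diffusion scenario of a tiny cascade and averages the per-scenario influence, weighting each scenario by its probability. It assumes that each tweet can attach to any earlier tweet (so a cascade of n tweets has (n-1)! scenarios), that a path's probability is the product of its edge probabilities, and that a tweet counts itself as reached with probability 1. The function name `brute_force_influence` and the example matrix are hypothetical.

```python
import itertools
import math


def brute_force_influence(P):
    """Enumerate all diffusion trees and return (influence per tweet, number of trees).

    P[i][j] is the probability that tweet j is a direct retweet of tweet i (i < j).
    """
    n = len(P)
    influence = [0.0] * n
    n_scenarios = 0
    # Tweet j (j >= 1) may attach to any earlier tweet 0..j-1.
    parent_choices = [range(j) for j in range(1, n)]
    for choice in itertools.product(*parent_choices):
        n_scenarios += 1
        parent = (None,) + choice
        # Probability of the whole scenario: product over its edges.
        prob_g = math.prod(P[parent[j]][j] for j in range(1, n))
        for k in range(n):
            # Assumption: a tweet reaches itself with probability 1.
            influence[k] += prob_g
            # Walk from tweet k up to the root, crediting every ancestor.
            path_prob, node = 1.0, k
            while parent[node] is not None:
                path_prob *= P[parent[node]][node]
                node = parent[node]
                influence[node] += prob_g * path_prob
    return influence, n_scenarios


# Three tweets: tweet 2 retweets tweet 0 or tweet 1 with equal probability.
P = [[0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
influence, n_scenarios = brute_force_influence(P)
print(influence, n_scenarios)  # [2.5, 1.25, 1.0] 2
```

The loop over `itertools.product` is what grows factorially, which is why this approach is only usable for cascades of a handful of tweets.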
3.3 Tractable tweet influence computation
The key observation for a tractable computation of Eq. (3) is that tweets are added sequentially: at time $t_k$, $v_k$ is added to each diffusion scenario constructed at time $t_{k-1}$. Adding $v_k$ to a diffusion scenario has two effects. First, new diffusion scenarios are generated, as $v_k$ can attach to any of the existing nodes. Second, $v_k$ contributes only once to the tweet influence of each tweet that is found on the branch it attaches to and it does not make any other contributions at times $t > t_k$. This process is exemplified in Fig. 1c. Node $v_5$ is added to a given diffusion scenario, generating 4 new diffusion scenarios at time $t_5$. The nodes colored in red see their influence increase as a result of adding $v_5$. This allows us to compute tweet influence incrementally, by updating $\varphi(v_i)$ at each time $t_k$. We denote by $\varphi^k(v_i)$ the value of tweet influence of $v_i$ after adding node $v_k$. Thus, we only keep track of how tweet influence increases as nodes attach and we do not need to construct all diffusion scenarios.
We define $M_{ij}$ as the contribution of $v_j$ to the tweet influence of $v_i$, over all possible diffusion scenarios. It can be shown that the influence of $v_i$ increases at time $t_j$ by $M_{ij}$:
$$\varphi^{j}(v_i) = \varphi^{j-1}(v_i) + M_{ij} \tag{5}$$
Recursive computation of tweet influence. $M_{ij}$ can be alternatively interpreted as the influence of tweet $v_i$ over $v_j$. Consequently, we can compute the total influence of $v_i$ as the sum of the individual influences of $v_i$ over each of the other nodes in the diffusion. This can be recursively computed as:
$$\varphi(v_i) = \sum_{j=1}^{n} M_{ij}, \qquad M_{ij} = \begin{cases} \sum_{k=i}^{j-1} M_{ik} \, p_{kj}^{2} & \text{if } i < j \\ 1 & \text{if } i = j \\ 0 & \text{if } i > j \end{cases} \tag{6}$$
Computational complexity for estimating influence. The scalable influence computation algorithm shown in Algorithm 1 uses two matrices: matrix $P = [p_{ij}]$, with $p_{ij}$ defined in Eq. (1), and matrix $M = [M_{ij}]$. It computes each column of $M$ sequentially, using Eq. (6). Recursively computing each column of $M$ takes $O(n^2)$ multiplications/additions, so the total computational complexity is $O(n^3)$. In real cascades containing 1000 tweets, the above algorithm finishes in 34 seconds on a PC. For more details and examples, see the online supplement [sup 2018, annex B]. Throughout the experiments in Sec. 5 and 6, we set the temporal decay hyper-parameter $r$ defined in Eq. (1) to a value tuned using linear search on a sample of 20 real retweet diffusions (details in the online supplement [sup 2018, annex D]).
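The column-by-column computation can be sketched as follows. This is not a reproduction of Algorithm 1; it is a minimal illustration assuming the recursion M[i][j] = sum over k in [i, j) of M[i][k] * P[k][j]**2 with M[i][i] = 1, where the squared term arises because the edge probability appears once in the scenario probability and once in the path probability. The function name `tweet_influence` is hypothetical.

```python
def tweet_influence(P):
    """Return the influence of every tweet in a cascade, without enumerating trees.

    P[i][j] is the probability that tweet j is a direct retweet of tweet i (i < j).
    """
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):              # fill M one column at a time
        M[j][j] = 1.0               # assumption: a tweet fully "reaches" itself
        for i in range(j):
            # Contribution of tweet j to tweet i, via every possible parent k of j.
            M[i][j] = sum(M[i][k] * P[k][j] ** 2 for k in range(i, j))
    # Total influence of tweet i: its contributions from all tweets in the cascade.
    return [sum(row) for row in M]


# Same three-tweet example: tweet 2 retweets tweet 0 or tweet 1 with equal probability.
P = [[0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
print(tweet_influence(P))  # [2.5, 1.25, 1.0]
```

The three nested loops (column, row, inner sum) give the cubic cost in the cascade size, in contrast to the factorial cost of enumerating all scenarios.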
3.4 Computing influence of a user
Given $T(u)$ – the set of tweets authored by user $u$ – we define the user influence of $u$ as the mean tweet influence of tweets $v \in T(u)$:
$$\varphi(u) = \frac{\sum_{v \in T(u)} \varphi(v)}{|T(u)|} \tag{7}$$
To account for the skewed distribution of user influence, we mostly use the normalization – percentiles with a value of 1 for the most influential user in our dataset and 0 for the least influential – denoted $\varphi(u)\%$.
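A small sketch of these two steps, averaging tweet influence per author and then mapping users to percentiles, is given below. The function names and the handling of ties (tied users share the fraction of users strictly below them) are assumptions of this sketch; the text does not specify how ties are resolved.

```python
from bisect import bisect_left
from collections import defaultdict


def user_influence(tweet_authors, tweet_scores):
    """Mean tweet influence per author."""
    sums, counts = defaultdict(float), defaultdict(int)
    for author, score in zip(tweet_authors, tweet_scores):
        sums[author] += score
        counts[author] += 1
    return {author: sums[author] / counts[author] for author in sums}


def percentile_normalize(scores):
    """Map raw scores to [0, 1]: 0 for the least and 1 for the most influential user."""
    ordered = sorted(scores.values())
    n = len(ordered)
    if n == 1:
        return {user: 1.0 for user in scores}
    # Assumption: the rank of a user is the number of users with a strictly lower score.
    return {user: bisect_left(ordered, s) / (n - 1) for user, s in scores.items()}


raw = user_influence(["a", "b", "a", "c"], [2.0, 1.0, 4.0, 5.0])
print(raw)                        # {'a': 3.0, 'b': 1.0, 'c': 5.0}
print(percentile_normalize(raw))  # {'a': 0.5, 'b': 0.0, 'c': 1.0}
```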
4 Dataset and measures of political behavior
In this section, we first describe the #DebateNight dataset that we collected during the first U.S. presidential debate. Next, we introduce three measures for analyzing the political behavior of users who were active on Twitter during the debate. In Sec. 4.1, we introduce political polarization and political engagement. In Sec. 4.2, we introduce the botness score and we describe how we construct the reference bot and human populations.
The #DebateNight dataset contains Twitter discussions that occurred during the first 2016 U.S. presidential debate between Hillary Clinton and Donald Trump.
Using the Twitter Firehose API
4.1 Political polarization and engagement
Protocol. We extracted the 1000 most frequent hashtags in our dataset. Using a content analysis approach [Kim and Kuljis 2010], we coded each hashtag into two categories: Democrat and Republican. Hashtags that did not have a clear political polarity were not labeled and thus excluded from analysis. Our coding methodology is similar to previous work [Kollanyi, Howard, and Woolley 2016; Bessi and Ferrara 2016], with the difference that we extract candidate hashtags from the data instead of using a predefined set of partisan hashtags. Fig. 2 presents the wordclouds of the most frequent partisan hashtags, for Democrats (top) and Republicans (bottom). We chose hashtags indicating either strong support for a candidate (e.g., #imwithher for Clinton and #trump2016 for Trump), or opposition and/or antagonism (e.g., #nevertrump and #crookedhillary). This results in 93 Democrat and 86 Republican hashtags.
Two measures of political behavior. We identify 65,031 tweets in #DebateNight that contain at least one partisan hashtag. 1,917 tweets contain partisan hashtags with both polarities: these are mostly negative tweets towards both candidates (e.g. “Let’s Get READY TO RUMBLE AND TELL LIES. #nevertrump #neverhillary #Obama”) or hashtag spam. We count the number of occurrences of partisan hashtags for each user, and we detect a set of 46,906 politically engaged users that have used at least one partisan hashtag. Each politically engaged user $u$ has two counts: $D(u)$, the number of Democrat hashtags that $u$ used, and $R(u)$, the number of Republican hashtags. We measure the political polarization as the normalized difference between the number of Republican and Democrat hashtags used:
$$\mathcal{P}(u) = \frac{R(u) - D(u)}{R(u) + D(u)}$$
$\mathcal{P}(u)$ takes values between $-1$ (if $u$ emitted only Democrat partisan hashtags) and $1$ (if $u$ emitted only Republican hashtags). We threshold the political polarization to construct a population of Democrat users and a population of Republican users. In the set of politically engaged users, there are 21,711 Democrat users, 22,644 Republican users and 2,551 users with no polarization. We measure the political engagement of users using the total volume of partisan hashtags included in their tweets, $\mathcal{E}(u) = D(u) + R(u)$.
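As an illustration, the two measures can be computed from per-user hashtag lists as sketched below. The two hashtag sets are tiny stand-ins built from the examples mentioned in the protocol (the full coded lists of 93 and 86 hashtags are not reproduced here), the assignment of #nevertrump to the Democrat set and #crookedhillary to the Republican set is an assumption, and the function name `political_measures` is hypothetical.

```python
# Tiny illustrative stand-ins for the manually coded partisan hashtag lists.
DEMOCRAT_TAGS = {"#imwithher", "#nevertrump"}
REPUBLICAN_TAGS = {"#trump2016", "#crookedhillary"}


def political_measures(user_hashtags):
    """Return {user: (polarization, engagement)} for politically engaged users.

    user_hashtags maps each user to the list of hashtags they used (with repeats).
    """
    measures = {}
    for user, tags in user_hashtags.items():
        tags = [t.lower() for t in tags]
        d = sum(t in DEMOCRAT_TAGS for t in tags)    # Democrat hashtag count
        r = sum(t in REPUBLICAN_TAGS for t in tags)  # Republican hashtag count
        if d + r == 0:
            continue  # not politically engaged: no partisan hashtag used
        polarization = (r - d) / (r + d)  # -1 only Democrat, +1 only Republican
        engagement = d + r                # total volume of partisan hashtags
        measures[user] = (polarization, engagement)
    return measures


example = {
    "alice": ["#ImWithHer", "#nevertrump", "#debatenight"],
    "bob": ["#trump2016"],
    "carol": ["#debatenight"],
}
print(political_measures(example))  # {'alice': (-1.0, 2), 'bob': (1.0, 1)}
```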
4.2 Botness score and bot detection
Detecting automated bots. We use the BotOrNot [Davis et al. 2016] API to measure the likelihood of a user being a bot for each of the 1,451,388 users in the #DebateNight dataset. Given a user, the API returns a botness score between 0 and 1 (with 0 being likely human, and 1 likely non-human). Previous work [Varol et al. 2017; Bessi and Ferrara 2016; Woolley and Guilbeault 2017] uses a fixed botness threshold to detect socialbots. However, we manually checked a random sample of 100 users above that threshold and we found several human accounts being classified as bots. A threshold of 0.6 decreases this mis-classification. It has been previously reported by [Varol et al. 2017] that organizational accounts have high botness scores. This, however, is not a concern in this work, as we aim to detect ‘highly automated’ accounts that behave in a non-human way. We chose to use a threshold of 0.6 to construct the Bot population in light of the more encompassing notion of account automation.
Four reference populations. In addition to the Bot population, we construct three additional reference populations. Human contains users with a high likelihood of being regular Twitter users. Protected are the users whose profile access is restricted to their followers and friends (the BotOrNot system cannot return the botness score); we consider these users to be regular Twitter users, since we assume that no organization or broadcasting bot would restrict access to their profile. Suspended are those users who have been suspended by Twitter between the date of the tweet collection (26 September 2016) and the date of retrieving the botness score (July 2017); this population has a high likelihood of containing bots. Table 1 tabulates the size of each population, split over political polarization.
5 Evaluation of user influence estimation
In this section, we evaluate our proposed algorithm and measure of user influence. In Sec. 5.1, we evaluate on synthetic data against a known ground truth. In Sec. 5.2, we compare our user influence measure (defined in Sec. 3.4) against two alternatives: the number of followers and the mean size of initiated cascades.
5.1 Evaluation of user influence
Evaluating user influence on real data presents two major hurdles. The first is the lack of ground truth, as user influence is not directly observed. The second hurdle is that the diffusion graph is unknown, which makes it impossible to compare against state-of-the-art methods that require this information (e.g. ConTinEst [Du et al. 2013]). In this section, we evaluate our algorithm against a known ground truth on a synthetic dataset, using the same evaluation approach used for ConTinEst.
Evaluation on synthetic data. We evaluate on synthetic data using the protocol previously employed in [Du et al. 2013]. We use the simulator in [Du et al. 2013] to generate an artificial social network with 1000 users. We then simulate 1000 cascades through this social network, starting from the same initial user. The generation of the synthetic social network and of the cascades is detailed in the online supplement [sup 2018, annex C]. Similar to the retweet cascades in #DebateNight, each event in the synthetic cascades has a timestamp and an associated user. Unlike the real retweet cascades, we know the real diffusion structure behind each synthetic cascade. For each user $u$, we count the number of nodes reachable from $u$ in the diffusion tree of each cascade. We compute the influence of $u$ as the mean influence over all cascades. ConTinEst [Du et al. 2013] has been shown to asymptotically approximate this synthetic user influence.
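The ground-truth computation on known diffusion trees can be sketched as follows. The representation of a cascade as a list of users plus a list of parent indices is an assumption of this sketch, as are the choices to exclude a node from its own reachable count and to average over all cascades (counting zero for cascades in which a user does not appear); the function names are hypothetical.

```python
from collections import defaultdict


def reachable_counts(parents):
    """Number of descendants of each node in a diffusion tree.

    parents[j] is the index of the node that node j reposted (None for the root);
    nodes are in temporal order, so parents[j] < j for every j >= 1.
    """
    counts = [0] * len(parents)
    # Visit nodes from last to first so every child is processed before its parent.
    for j in range(len(parents) - 1, 0, -1):
        counts[parents[j]] += counts[j] + 1
    return counts


def ground_truth_influence(cascades):
    """Mean number of reachable nodes per user over all cascades.

    cascades is a list of (users, parents) pairs describing known diffusion trees.
    """
    totals = defaultdict(float)
    for users, parents in cascades:
        for user, count in zip(users, reachable_counts(parents)):
            totals[user] += count
    return {user: total / len(cascades) for user, total in totals.items()}


cascades = [
    (["u0", "u1", "u2", "u3"], [None, 0, 0, 1]),  # u3 reposted u1, others reposted u0
    (["u0", "u2"], [None, 0]),
]
print(ground_truth_influence(cascades))  # {'u0': 2.0, 'u1': 0.5, 'u2': 0.0, 'u3': 0.0}
```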
We use our algorithm introduced in Sec. 3.3 on the synthetic data, to compute the user influence measure defined in Eq. (7). We plot in Fig. 3a the 2D scatter-plot and the density plot of the synthetic users, with our influence measure on the y-axis and the ground truth on the x-axis (both in percentiles). Visibly, there is a high agreement between the two measures, particularly for the most influential and the least influential users. The Spearman correlation coefficient computed on the raw values confirms this agreement. This shows that our method can obtain state-of-the-art performance, in the absence of any information about the structure of the diffusions.
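For readers who want to reproduce this kind of comparison, a dependency-free sketch of the Spearman rank correlation is given below: it is the Pearson correlation of the ranks, with tied values receiving their average rank. The function names and the toy inputs are illustrative and do not come from the evaluation itself.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


estimated = [0.1, 0.4, 0.9, 16.0]   # toy estimated influence values
reference = [1.0, 2.0, 3.0, 4.0]    # toy ground-truth values
print(spearman(estimated, reference))
```

Because it depends only on ranks, the coefficient is insensitive to the heavy skew of raw influence values, which is one reason it suits this comparison.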
5.2 Comparison with other influence metrics
We compare our user influence measure against two alternatives that can be computed on #DebateNight.
Mean size of initiated cascades (of a user $u$) is the average number of users reached by original content authored by $u$. It should be noted that this measure does not capture $u$’s role in diffusing content authored by someone else. In the context of Twitter, mean size of initiated cascades is the average number of users who retweeted an original tweet authored by $u$: we compute this for every user in the #DebateNight dataset, and we plot it against our influence measure in Fig. 3b. Few users have a meaningful value for mean cascade size: many users never start a cascade (and they are not accounted for in Fig. 3b), and many of those who do start cascades are never retweeted and are all positioned at the lowest percentile (shown by the 1D histograms in the plot). It is apparent that the mean cascade size metric detects the influential users that start cascades, and it correlates with our influence measure. However, it misses highly influential users who never initiate cascades, but who participate by retweeting. Examples are user @SethMacFarlane (the actor and filmmaker Seth MacFarlane, 10.8 million followers) or user @michaelianblack (comedian Michael Ian Black, 2.1 million followers), both ranked among the most influential users by our measure.
Number of followers is one of the simplest measures of direct influence used in the literature [Mishra, Rizoiu, and Xie 2016; Zhao et al. 2015]. While it is loosely correlated with our influence measure (visible in Fig. 3c), it has the drawback of not accounting for any of the user actions, such as an active participation in discussions or generating large retweet cascades. For example, user @PoliticJames (alt-right and pro-Trump, 2 followers) emitted one tweet in #DebateNight, which was retweeted 18 times, placing him among the most influential users. Similarly, user @tiwtter1tr4_tv (now suspended, 0 followers) initiated a cascade of size 58, which also places this user among the most influential. Interestingly, half of the accounts scoring at the bottom by number of followers and at the top by influence are now suspended or have very high botness scores.
6 Results and findings
In this section, we present an analysis of the interplay between botness, political behavior (polarization and engagement) and influence. In Sec. 6.1, we first profile the activity of users in the four reference populations; next, we analyze the political polarization and engagement, and their relation with the botness measure. Finally, in Sec. 6.2 we tabulate user influence against polarization and botness, and we construct the polarization map.
6.1 Political behavior of humans and bots
Twitter activity across four populations. We measure the behavior of users in the four reference populations defined in Sec. 4.2 using several measures computed from the Twitter API. The number of cascades started (i.e. number of original tweets) and the number of posted retweets are simple measures of activity on Twitter, and they are known to be long-tail distributed [Cha et al. 2010]. Fig. 4a and 4b respectively show the log-log plot of the empirical Complementary Cumulative Distribution Function (CCDF) for each of the two measures. It is apparent that users in the Bot and Suspended populations exhibit higher levels of activity than the general population, whereas the Human and Protected populations exhibit lower levels. Fig. 4c and 4d plot the number of followers and present a more nuanced story: the average bot user has 10 times more followers than the average human user; however, bots have a lower median number of followers than human users. In other words, some bots are very highly followed, but most are simply ignored. Finally, Fig. 4e shows that bots favorite less than humans, indicating that their activity patterns differ from those of humans.
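The empirical CCDF used in these plots can be computed as sketched below. The convention P(X >= x), evaluated at each distinct observed value, is an assumption of this sketch (the strict variant P(X > x) is also common), and the function name `empirical_ccdf` is hypothetical.

```python
def empirical_ccdf(values):
    """Return (xs, probs): distinct sorted values and the fraction of samples >= each."""
    ordered = sorted(values)
    n = len(ordered)
    xs, probs = [], []
    for idx, value in enumerate(ordered):
        # Record each distinct value once, at its first position in sorted order;
        # all samples from that position onward are >= value.
        if idx == 0 or value != ordered[idx - 1]:
            xs.append(value)
            probs.append((n - idx) / n)
    return xs, probs


retweets_per_user = [1, 1, 2, 5]  # toy activity counts
print(empirical_ccdf(retweets_per_user))  # ([1, 2, 5], [1.0, 0.5, 0.25])
```

Plotting `probs` against `xs` on log-log axes yields the kind of curve described above, where a slowly decaying tail indicates a long-tailed distribution.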
Political polarization and engagement. The density distribution of political polarization (Fig. 5a) shows two peaks at -1 and 1, corresponding to strongly pro-Democrat and strongly pro-Republican respectively. The shape of the density plot is consistent with the sizes of Republican and Democrat populations (Sec. 4.1), and the extreme bi-modality can be explained by the clear partisan nature of the chosen hashtags and by the known political polarization of users on Twitter [Conover et al. 2011; Barberá et al. 2015], which will be greatly enhanced in the context of a political debate. Fig. 5b presents the log-log plot of the CCDF of the political engagement, which shows that the political engagement score is long-tail distributed, with pro-Democrats slightly more engaged than pro-Republicans overall.
Botness and political polarization. The distribution of botness exhibits a large peak and a long tail (Fig. 5c). The dashed gray vertical lines show the thresholds used in Sec. 4.2 for constructing the reference Human and Bot populations. The density distribution for politically polarized users is skewed towards higher botness, showing that politically polarized users are more likely to be automated systems. Fig. 5d shows the conditional density of polarization conditioned on botness. While the likelihood of being pro-Democrat or pro-Republican varies significantly with botness, for high botness scores the likelihood of being pro-Republican is consistently higher than that of being pro-Democrat. In other words, socialbot accounts are more likely to be pro-Republican.
Political engagement of bots. Fig. 5e shows the CCDF of political engagement of the four reference populations, and it is apparent that the Bot and Suspended populations exhibit consistently higher political engagement than the Human and Protected populations. Fig. 5f shows the CCDF of political engagement by the political partisanship of bots and we find that pro-Republican Bot accounts are more politically engaged than their pro-Democrat counterparts. In summary, socialbots are more engaged than humans, and pro-Republican bots are more engaged than their pro-Democrat counterparts.
6.2 User influence and polarization map
User influence across four populations. First, we study the allocation of user influence across the four reference populations constructed in Sec. 4.2. We plot the CCDF in Fig. 6a and we summarize user influence as boxplots in Fig. 6b for each population. User influence is long-tail distributed (shown in Fig. 6a) and it is higher for the Bot and Suspended populations than for Human and Protected (shown in Fig. 6b). There is a large discrepancy between the influence of Human and Bot, with the average bot having 2.5 times more influence than the average human. We further break down users in the Bot population based on their political polarization. Fig. 6d aggregates as boxplots the influence of pro-Democrat and pro-Republican bots (note: not all bots are politically polarized). Notably, on a per-bot basis, pro-Republican bots are more influential than their pro-Democrat counterparts – the average pro-Republican bot is twice as influential as the average pro-Democrat bot.
Political polarization and user influence. Next, we analyze the relation between influence and polarization. Fig. c plots the probability distribution of political polarization, conditioned on user influence. While for mid-range influential users the likelihood of being pro-Republican is higher than that of being pro-Democrat, we observe the inverse situation on the higher end of the influence scale. Very highly influential users are more likely to be pro-Democrat, and this is consistent with the fact that many public figures were supportive of the Democrat candidate during the presidential campaign.
The polarization map. Finally, we create a visualization that allows us to jointly account for botness and user influence when studying political partisanship. We project each politically polarized user in #DebateNight onto the two-dimensional space of user influence (x-axis) and botness (y-axis). We compute the 2D density estimates for the pro-Democrat and pro-Republican users (shown in the online supplement [sup 2018, annex E]). For each point in the space we compute a score as the log of the ratio between the density of the pro-Republican users and that of the pro-Democrat users, which is renormalized to the range from -1 (mostly Democrat) to +1 (mostly Republican). The resulting map – dubbed the polarization map – is shown in Fig. 7 and it provides a number of insights. We find a pro-Democrat area corresponding to highly influential users (already shown in Fig. c) that spans most of the range of botness values. However, the largest predominantly pro-Republican area corresponds to mid-range influence (also shown in Fig. c), but concentrates around small botness values, which indicates the presence of a large pro-Republican population of mainly human users with regular user influence. We also observe that the top-right area (with high botness and high influence) is predominantly red. In other words, highly influential bots are mostly pro-Republican.
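The scoring step of the polarization map can be illustrated with a minimal sketch. It is not the authors' implementation: the names `cell_fractions` and `polarization_map`, the grid-based density estimate, the smoothing constant `eps` and the renormalization by the largest absolute score are all assumptions made for illustration, since the text only states that a log density ratio is renormalized to the range -1 to +1.

```python
import math

def cell_fractions(users, x_range, y_range, bins):
    """Fraction of users in each cell of a bins x bins grid.

    users: list of (influence, botness) pairs, assumed to lie inside the ranges.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    grid = [[0.0] * bins for _ in range(bins)]
    for x, y in users:
        i = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        grid[i][j] += 1.0 / len(users)
    return grid

def polarization_map(rep_users, dem_users, x_range, y_range=(0.0, 1.0),
                     bins=20, eps=1e-6):
    """Score each grid cell from -1 (mostly Democrat) to +1 (mostly Republican)."""
    # Grid-based density estimates on a shared grid (a stand-in for a
    # smoother 2D density estimator).
    rep = cell_fractions(rep_users, x_range, y_range, bins)
    dem = cell_fractions(dem_users, x_range, y_range, bins)
    # Log of the density ratio; eps (an assumption) avoids log(0) in empty cells.
    score = [[math.log((rep[i][j] + eps) / (dem[i][j] + eps))
              for j in range(bins)] for i in range(bins)]
    # Assumed renormalization: divide by the largest absolute score.
    max_abs = max(abs(s) for row in score for s in row)
    if max_abs == 0.0:
        return score
    return [[s / max_abs for s in row] for row in score]
```

Cells where both populations are empty receive a score of 0 under this scheme, and the result depends on the choice of `eps`; a kernel density estimate would reduce that sensitivity.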
7 Discussion
In this paper, we study the influence and the political behavior of socialbots. We introduce a novel algorithm for estimating user influence from retweet cascades in which the diffusion structure is not observed. We propose four measures to analyze the role and user influence of bots versus humans on Twitter during the 1st U.S. Presidential Debate of 2016. The first is the user influence, computed over all possible unfoldings of each cascade. Second, we use the BotOrNot API to retrieve the botness score for a large number of Twitter users. Lastly, by examining the 1000 most frequently-used hashtags we measure political polarization and engagement. We analyze the interplay of influence, botness and political polarization using a two-dimensional map – the polarization map. We make several novel findings, for example: bots are more likely to be pro-Republican; the average pro-Republican bot is twice as influential as its pro-Democrat counterpart; very highly influential users are more likely to be pro-Democrat; and highly influential bots are mostly pro-Republican.
Validity of analysis with respect to BotOrNot. The BotOrNot algorithm uses tweet content and user activity patterns to predict botness. However, this does not confound the conclusions presented in Sec. 6. First, political behavior (polarization and engagement) is computed from a list of hashtags specific to #DebateNight, while the BotOrNot predictor was trained before the elections took place and it has no knowledge of the hashtags used during the debate. Second, although a loose relation between political engagement and activity patterns could be drawn, engagement is measured as the number of partisan hashtags used, not the number of tweets – i.e. users can have a high political engagement score after emitting a few very polarized tweets.
Assumptions, limitations and future work. This work makes a number of simplifying assumptions, some of which can be addressed in future work. First, the delay between the tweet crawling (Sept 2016) and computing botness (July 2017) means that a significant number of users were suspended or deleted. A future application could crawl tweets and botness scores simultaneously. Second, our binary hashtag partisanship characterization does not account for independent voters or other spectra of democratic participation, and future work could evaluate our approach against a clustering approach using follower ties to political actors [Barberá et al. 2015]. Last, this work computes the expected influence of users in a particular population, but it does not account for the aggregate influence of the population as a whole. Future work could generalize our approach to entire populations, which would allow answers to questions like “Overall, were the Republican bots more influential than the Democrat humans?”.
Contents (Appendix)
- 1 Introduction
- 2 Related Work
- 3 Estimating influence in retweet cascades
- 4 Dataset and measures of political behavior
- 5 Evaluation of user influence estimation
- 6 Results and findings
- 7 Discussion
- A Derivation of the influence formula
- B Efficient tweet influence computation
- C Generation of synthetic data
- D Choosing the temporal decay parameter
- E Additional 2D densities plots
Appendix A Derivation of the influence formula
In this section, we detail the calculation of the tweet influence $\varphi(v)$, proposed in Eq. (3). In Sec. A.1, we define the notion of diffusion scenario, and we compute its likelihood given an observed retweet cascade. In Sec. A.2, we compute the formula for tweet influence over all possible diffusion scenarios associated with the given cascade.
A.1 Diffusion scenarios
Diffusion trees. We can represent an online diffusion using a directed tree $G(V, E)$, in which each node has a single parent and the direction of the edges indicates the flow of the information. For retweet cascades, the nodes $v \in V$ are individual tweets and each directed edge $(v_a, v_b) \in E$ (showing the direction $v_a \to v_b$) indicates that $v_b$ is a direct retweet of $v_a$. A direct retweet means that $u_b$ – the user that emitted the tweet $v_b$ – clicked on the “Retweet” option under tweet $v_a$. The top panel of Fig. a shows an example of such a diffusion tree. Note that each node $v_i$ has an associated time of arrival $t_i$ and that the diffusion tree respects the order of the times of arrival – i.e. given the edge $(v_a, v_b)$, then $t_a < t_b$. The bottom panel of Fig. a shows the incremental construction of the diffusion tree shown in the top panel: node $v_1$ is the root of the tree and the source of the information diffusion; at each time $t_i$, node $v_i$ attaches to the previous tree constructed at time $t_{i-1}$.
|$G(V, E)$|diffusion tree. In case the tree is unobserved, $G$ is a diffusion scenario.|
|$v_i \in V$|node in the diffusion tree (i.e. retweets).|
|$(v_i, v_j) \in E$|directed edge in the diffusion tree, tweet $v_j$ is a direct retweet of $v_i$.|
|$t_i$|time of arrival of node $v_i$ (timestamp of the tweet).|
|$u_i$|user that has emitted tweet $v_i$.|
|$m_i$|local influence (i.e. number of followers) of user $u_i$.|
Diffusion scenarios for retweet cascades. The diffusion tree is not observed for real Twitter retweet cascades, since the Twitter API does not expose the direct retweet relationships. Instead, it assigns every retweet in the cascade to the original tweet. Every retweet cascade constructed from raw retweet information from the Twitter API resembles the graph in Fig. a. Due to this particular shape, we denote retweet cascades as stars. However, the API exposes the time of arrival $t_i$ of each retweet. We denote as a diffusion scenario any valid diffusion tree that could be associated with the observed retweet star – i.e., the edges in the diffusion tree respect the order of arrival of retweets. Fig. b shows four examples of diffusion scenarios associated with the star in Fig. a.
Constructing diffusion scenarios. Fig. b exemplifies a straightforward method to enumerate all diffusion scenarios associated with the star in Fig. a. The node $v_1$ is the root node and it is published at time $t_1$; tweet $v_2$ occurs at time $t_2$ and it is undoubtedly a direct retweet of $v_1$ – a directed edge is drawn from $v_1$ to $v_2$. Tweet $v_3$ observed at $t_3$ can be a direct retweet of either tweet $v_1$ or tweet $v_2$. Therefore, at time $t_3$ there are two possible diffusion scenarios: $G_1$ with the edge set $\{(v_1, v_2), (v_1, v_3)\}$ and $G_2$ with the edge set $\{(v_1, v_2), (v_2, v_3)\}$. Similarly, $v_4$ can be a direct retweet of $v_1$, $v_2$ or $v_3$, in either $G_1$ or $G_2$. Consequently, at time $t_4$ there are 6 possible diffusion scenarios. The process continues until all nodes have been attached. For an observed star of size $k$, there are $(k-1)!$ associated diffusion scenarios.
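The enumeration described above can be sketched in a few lines. The function name `enumerate_scenarios` and the representation of a scenario as a list of (parent, child) pairs are illustrative choices, not part of the original method description.

```python
from itertools import product
from math import factorial

def enumerate_scenarios(k):
    """Yield every diffusion scenario for a star with k tweets.

    Tweets are numbered 1..k in order of arrival; each scenario is a list
    of (parent, child) edges.
    """
    # Tweet j (j >= 2) may attach to any earlier tweet 1..j-1.
    parent_choices = [range(1, j) for j in range(2, k + 1)]
    for parents in product(*parent_choices):
        yield [(p, j) for j, p in enumerate(parents, start=2)]

# A star with 4 tweets has (4 - 1)! = 6 scenarios.
scenarios = list(enumerate_scenarios(4))
print(len(scenarios), factorial(3))
```

Because the count grows factorially, this enumeration is only practical for very small cascades.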
Probability of a diffusion scenario. Given that in retweet cascades individual edges are not observed, we define the probability $p_{ij}$ of an edge $(v_i, v_j)$ as the likelihood that $u_j$ emitted tweet $v_j$ as a direct retweet of $v_i$. In line with previous work [Crane and Sornette 2008; Mishra, Rizoiu, and Xie 2016; Zhao et al. 2015; Shen et al. 2014; Yu et al. 2015], we model two factors in the likelihood of retweeting. Firstly, users retweet fresh content [Wu and Huberman 2007]: the probability of the edge decays exponentially with the time difference $t_j - t_i$. Secondly, users prefer to retweet locally influential users, also known as preferential attachment [Barabási 2005]. We measure the local influence of a user using their number of followers [Kwak et al. 2010; Cha et al. 2010; Rizoiu et al. 2017]. We quantify the probability of an edge as:
$$p_{ij} = \frac{m_i \, e^{-r (t_j - t_i)}}{\sum_{k=1}^{j-1} m_k \, e^{-r (t_j - t_k)}}$$
where $m_i$ is the number of followers of the user of $v_i$ and $r$ controls the temporal decay of the probability.
Under the assumption that retweeting events (i.e. edges) occur independently one from another, we obtain the probability of a diffusion scenario $G(V, E)$ as:
$$P(G) = \prod_{(v_i, v_j) \in E} p_{ij} \tag{10}$$
Note that the above assumption of independence of retweet events is a strong assumption. Current state-of-the-art approaches [Mishra, Rizoiu, and Xie 2016; Zhao et al. 2015] for modeling retweet cascades employ self-exciting point processes, in which the arrival of one event increases the probability of future events. However, for our application of estimating the probability of a diffusion scenario, it is the simplest assumption. An additional argument in its favor is that we are studying networks of events (in which each event is identified with a unique user), not networks of users. The interdependence often observed in socially generated networks (like triadic closure) can be ignored in edge formation within a particular retweet cascade.
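As an illustration of the two ingredients discussed in this section, the sketch below computes edge probabilities from tweet times and follower counts, and multiplies them into the probability of one diffusion scenario. It assumes that the probabilities of all candidate parents of a tweet are normalized to sum to one (each retweet must come from some earlier tweet); the function names and the 0-based indexing are illustrative.

```python
import math

def edge_probabilities(times, followers, r):
    """p[i][j]: probability that tweet j is a direct retweet of tweet i.

    times and followers are ordered by arrival (0-based); r is the temporal
    decay. Assumes at least one earlier author has a non-zero follower count.
    """
    n = len(times)
    p = [[0.0] * n for _ in range(n)]
    for j in range(1, n):
        # Fresh content and popular authors receive a larger weight.
        weights = [followers[i] * math.exp(-r * (times[j] - times[i]))
                   for i in range(j)]
        total = sum(weights)
        for i in range(j):
            p[i][j] = weights[i] / total  # normalize over candidate parents
    return p

def scenario_probability(parents, p):
    """Probability of one scenario; parents[j] is the tweet retweeted by j."""
    prob = 1.0
    for j in range(1, len(parents)):
        prob *= p[parents[j]][j]  # independent edges: multiply probabilities
    return prob

# Three tweets, no temporal decay: tweet 2 attaches to tweet 0 with 3/4.
p = edge_probabilities([0.0, 1.0, 2.0], [3, 1, 5], r=0.0)
print(p[0][2], scenario_probability([None, 0, 1], p))
```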
A.2 Computing tweet influence
Du et al. [2013] define the user influence of a user $u$ as the average number of users in the social network who get in contact with the content emitted by $u$. For retweet cascades the diffusion tree is not observed, so it is impossible to directly measure user influence, apart from that of the root user. We define the tweet influence over a retweet cascade as the expected number of times a tweet is retweeted (as direct retweets or as descendants in a diffusion scenario), over all possible diffusion scenarios associated with the given star. Finally, we compute the influence of user $u$ as the sum of the influences of the tweets that $u$ authored. We see that the definition in [Du et al. 2013] is a special case of our definition, in which the diffusion tree is observed.
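Tweet influence over one diffusion scenario. Let $z(v_i, v_j)$ be a path in the diffusion scenario $G$ – i.e. a sequence of nodes which starts with $v_i$ and ends with $v_j$. $P(z(v_i, v_j))$ is the probability of reaching $v_j$ from $v_i$. $\varphi(v_i \mid G)$ is the influence of $v_i$ and it is computed as the expected number of users reached from $v_i$ using a model of independent binomials to decide whether or not to take each hop in each path that starts with $v_i$. Formally:
$$\varphi(v_i \mid G) = \sum_{v_j \in V} \mathbb{1}\{z(v_i, v_j) \mid G\} \, P(z(v_i, v_j)), \qquad P(z(v_i, v_j)) = \prod_{(v_a, v_b) \in z(v_i, v_j)} p_{ab}$$
where $p_{ab}$ is the probability of the edge $(v_a, v_b)$ and $\mathbb{1}\{z(v_i, v_j) \mid G\}$ is a function that takes the value 1 when the path from $v_i$ to $v_j$ exists in $G$, and 0 otherwise.
Tweet influence over a retweet cascade. We compute the influence of tweet $v_i$ over $VG$ – all possible diffusion scenarios associated with a retweet cascade – as:
$$\varphi(v_i) = \sum_{G \in VG} P(G) \, \varphi(v_i \mid G) \tag{13}$$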
It is intractable to directly evaluate Eq. (13), plugged in with Eq. (10) and (12), particularly due to the factorial number of diffusion scenarios. For example, a cascade of size 100 has $99! \approx 9.3 \times 10^{155}$ diffusion scenarios. We develop, in the next section, an efficient linear time algorithm to compute the influence of all tweets in a retweet cascade.
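To make the definition concrete, the following brute-force sketch evaluates tweet influence by enumerating every diffusion scenario of a tiny cascade. It takes the matrix of edge probabilities as given and, as an assumption, counts only descendants (the tweet itself is excluded). The function name `brute_force_influence` is illustrative, and the approach is usable only for a handful of tweets.

```python
from itertools import product

def brute_force_influence(p):
    """Expected number of descendants of each tweet over all scenarios.

    p[i][j] is the probability that tweet j is a direct retweet of tweet i
    (0-based, i < j).
    """
    n = len(p)
    influence = [0.0] * n
    # One scenario corresponds to one parent choice for every tweet j >= 1.
    for parents in product(*[range(j) for j in range(1, n)]):
        parent = (None,) + parents
        # Probability of the scenario: product of its edge probabilities.
        p_scenario = 1.0
        for j in range(1, n):
            p_scenario *= p[parent[j]][j]
        # Walk from each tweet j up to the root; every ancestor gains the
        # probability of the path leading from it down to j.
        for j in range(1, n):
            p_path, node = 1.0, j
            while parent[node] is not None:
                p_path *= p[parent[node]][node]
                node = parent[node]
                influence[node] += p_scenario * p_path
    return influence

# Three tweets: tweet 2 retweets tweet 0 with prob. 0.75, tweet 1 with 0.25.
p = [[0.0, 1.0, 0.75],
     [0.0, 0.0, 0.25],
     [0.0, 0.0, 0.0]]
print(brute_force_influence(p))
```

In this small example the root gains 1 from its certain direct retweet plus $0.75^2 + 0.25^2$ from the third tweet, which shows how edge probabilities end up squared once the scenario probability and the path probability are combined.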
Appendix B Efficient tweet influence computation
The key observation is that each tweet $v_j$ is added simultaneously at time $t_j$ to all diffusion scenarios constructed at time $t_{j-1}$. $v_j$ contributes only once to the tweet influence of each tweet that is found on the branch it attached to. This process is exemplified in Fig. c. Node $v_5$ is added to a given diffusion scenario, generating 4 new diffusion scenarios at time $t_5$. We color in red the nodes whose influence increases as a result of adding node $v_5$. This allows us to compute the tweet influence incrementally, by updating $\varphi(v_i)$ at each time $t_j$. We denote by $\varphi^j(v_i)$ the value of the tweet influence of $v_i$ after adding node $v_j$. As a result, we only keep track of how tweet influence increases over time steps and we do not need to construct all diffusion scenarios.
B.1 Complete derivation of the recursive influence formula
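Incremental construction of diffusion scenarios. Let $G^- \in VG^{j-1}$ be a diffusion scenario constructed at time $t_{j-1}$, with the set of nodes $V^- = \{v_1, \dots, v_{j-1}\}$ and the set of edges $E^-$. When $v_j$ arrives, it can attach to any node in $V^-$, generating $j-1$ new diffusion scenarios $G^+_k$, $k = 1, \dots, j-1$, with $V^+ = V^- \cup \{v_j\}$ and $E^+_k = E^- \cup \{(v_k, v_j)\}$. This process is exemplified in Fig. c. We can write the set of scenarios at time $t_j$ as:
$$VG^j = \{ G^+_k \mid G^- \in VG^{j-1},\ 1 \le k \le j-1 \}$$
We write the tweet influence of $v_i$ at time $t_j$ as:
$$\varphi^j(v_i) = \sum_{G^- \in VG^{j-1}} \sum_{k=1}^{j-1} P(G^+_k) \, \varphi(v_i \mid G^+_k) \tag{15}$$
Note that in Eq. (15) we explicitly make use of how diffusion scenarios at time $t_j$ are constructed based on the diffusion scenarios at time $t_{j-1}$.
Attach a new node $v_j$. We concentrate on the right-most factor in Eq. (15) – the tweet influence in scenario $G^+_k$. We observe that the terms in Eq. (11) can be divided into two: the paths from $v_i$ to all other nodes except $v_j$, and the path from $v_i$ to $v_j$. Writing $z(v_i, v_l)$ for the path from $v_i$ to $v_l$ and $\mathbb{1}\{z \mid G\}$ for the indicator that the path $z$ exists in $G$, we obtain:
$$\varphi(v_i \mid G^+_k) = \sum_{l=1}^{j-1} \mathbb{1}\{z(v_i, v_l) \mid G^+_k\} \, P(z(v_i, v_l)) + \mathbb{1}\{z(v_i, v_j) \mid G^+_k\} \, P(z(v_i, v_j))$$
Note that a path that does not involve $v_j$ has the same probability in $G^+_k$ and in its parent scenario $G^-$:
$$\mathbb{1}\{z(v_i, v_l) \mid G^+_k\} \, P(z(v_i, v_l)) = \mathbb{1}\{z(v_i, v_l) \mid G^-\} \, P(z(v_i, v_l)), \quad l < j$$
Consequently, the first term, summed over all scenarios at time $t_j$, is $\varphi^{j-1}(v_i)$, the tweet influence of $v_i$ at the previous time step $t_{j-1}$, and we can write $\varphi^j(v_i) = \varphi^{j-1}(v_i) + M_{ij}$. Note that $\sum_{k=1}^{j-1} p_{kj} = 1$, where $p_{kj}$ is the probability of the edge $(v_k, v_j)$, because $v_j$ is necessarily the direct retweet of one of the previous nodes of the retweet cascade.
Contribution of $v_j$. With $\varphi^{j-1}(v_i)$ being the influence of $v_i$ at the previous time step, intuitively $M_{ij}$ is the contribution of $v_j$ to the influence of $v_i$. Knowing that:
$$P(G^+_k) = P(G^-) \, p_{kj}, \qquad \mathbb{1}\{z(v_i, v_j) \mid G^+_k\} \, P(z(v_i, v_j)) = \mathbb{1}\{z(v_i, v_k) \mid G^-\} \, P(z(v_i, v_k)) \, p_{kj}$$
we write $M_{ij}$ as:
$$M_{ij} = \sum_{k=1}^{j-1} p_{kj}^2 \sum_{G^- \in VG^{j-1}} P(G^-) \, \mathbb{1}\{z(v_i, v_k) \mid G^-\} \, P(z(v_i, v_k)) = \sum_{k=1}^{j-1} M_{ik} \, p_{kj}^2 \tag{19}$$
An alternative interpretation for $M_{ij}$ is the influence of $v_i$ over $v_j$. Eq. (19) can be intuitively understood as follows: the expected influence of $v_i$ over a newly attached node $v_j$ is proportional to the influence of $v_i$ over each already attached node $v_k$, multiplied by the likelihood that $v_j$ attached itself to $v_k$. We obtain the formula of the expected influence of a node $v_i$ over another node $v_j$ as:
$$M_{ij} = \begin{cases} \sum_{k=1}^{j-1} M_{ik} \, p_{kj}^2, & i < j \\ 1, & i = j \\ 0, & i > j \end{cases} \tag{20}$$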
B.2 Tweet influence algorithm
We define two matrices: matrix $P = [P_{ij}]$, with $P_{ij} = p_{ij}^2$, where $p_{ij}$ is the probability of the edge $(v_i, v_j)$. Element $P_{ij}$ of the matrix is the squared probability that tweet $v_j$ is a direct retweet of tweet $v_i$; matrix $M = [M_{ij}]$, with $M_{ij}$ defined in Eq. (20). Element $M_{ij}$ of matrix $M$ is the contribution of $v_j$ to the influence of $v_i$. Alternatively, $M_{ij}$ can be interpreted as the influence of $v_i$ on $v_j$.
From Eq. (20) follows the formula for computing iteratively the columns of matrix $M$:
$$M_{\cdot j} = \left[ \; M_{1..j-1,\,1..j-1} \, P_{1..j-1,\,j} \; ; \; 1 \; ; \; 0 \; ; \; \dots \; ; \; 0 \; \right]^T$$
with the value 1 occurring on line $j$. For each column $j$, we compute the first $j-1$ elements by multiplying the sub-matrix $M_{1..j-1,\,1..j-1}$ with the first $j-1$ elements on the column $j$ of matrix $P$. The computation of matrix $M$ finishes in linear time, after $n$ steps, where $n$ is the total number of retweets in the retweet cascade. Fig. 9 demonstrates the computation of the first four columns in $M$. Algorithm 1 gives an overview of the efficient influence computation algorithm.
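The column-by-column recursion can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' Algorithm 1: it takes the edge probabilities `p[i][j]` as input, uses 0-based indices, and reports influence as the row sum of the matrix minus its diagonal entry, so that a tweet is not counted as its own retweet. The function names are illustrative.

```python
def influence_matrix(p):
    """Fill M column by column, where M[i][j] is the influence of i on j.

    p[i][j] is the probability that tweet j is a direct retweet of tweet i.
    """
    n = len(p)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            # Influence of i on j: influence of i on each earlier tweet k,
            # times the squared probability that j attached to k.
            M[i][j] = sum(M[i][k] * p[k][j] ** 2 for k in range(i, j))
        M[j][j] = 1.0  # value 1 on the diagonal, zeros below it
    return M

def tweet_influence(p):
    """Expected number of descendants of each tweet (diagonal excluded)."""
    return [sum(row) - 1.0 for row in influence_matrix(p)]

# Same three-tweet example: edge probabilities 1, 0.75 and 0.25.
p = [[0.0, 1.0, 0.75],
     [0.0, 0.0, 0.25],
     [0.0, 0.0, 0.0]]
print(tweet_influence(p))
```

The sketch performs one pass per tweet, but each pass costs a matrix-vector product in this naive form, so its total arithmetic grows faster than the number of passes.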
Appendix C Generation of synthetic data
This section completes and details the results concerning the evaluation on synthetic data, presented in the main text Sec. 5.1. This section details the construction of a synthetic random social graph (Sec. C.1) and the sampling of synthetic cascades (Sec. C.2). The purpose is to construct a synthetic dataset of cascades, in which the user influence ground truth is known. Both the graph and cascade generators described here below reproduce closely the synthetic experimental setup described in [\citeauthoryearDu et al.2013].
C.1 Generation of random graphs
In this section, we describe the construction of a synthetic social graph with $N$ nodes, specified by its adjacency matrix $A$. Each node corresponds to a synthetic user, and the edges correspond to synthetic follow relations. We follow the steps below:
Given the number of nodes $N$ in the graph, create a null (all-zero) adjacency matrix $A$ of size $N \times N$;
Randomly choose the number of edges $E$ within a preset range.
Randomly choose $i$ and $j$ between 1 and $N$ and set $A_{ij}$ to 1. Iterate this step until $E$ different edges are generated.
The adjacency matrix $A$ defines the final random graph.
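The steps above can be sketched as follows. The number of nodes and the bounds on the number of edges are left as parameters because their values are not recoverable from the text; the function name `random_graph`, the `seed` argument and the exclusion of self-loops are assumptions made for illustration.

```python
import random

def random_graph(n_nodes, min_edges, max_edges, seed=None):
    """Return the adjacency matrix of a random directed graph."""
    if max_edges > n_nodes * (n_nodes - 1):
        raise ValueError("more edges requested than the graph can hold")
    rng = random.Random(seed)
    # Step 1: all-zero adjacency matrix.
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    # Step 2: draw the number of edges within the given range.
    n_edges = rng.randint(min_edges, max_edges)
    # Step 3: add random edges until n_edges distinct ones exist.
    placed = 0
    while placed < n_edges:
        i = rng.randrange(n_nodes)
        j = rng.randrange(n_nodes)
        if i != j and adj[i][j] == 0:  # skip self-loops (assumed) and repeats
            adj[i][j] = 1
            placed += 1
    return adj

adj = random_graph(10, 15, 30, seed=1)
print(sum(map(sum, adj)))
```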
C.2 Sampling synthetic cascades
In this section we describe how to construct synthetic cascades, given a synthetic social graph constructed as shown in Sec. C.1. To generate one cascade, we associate an exponentially distributed waiting time with each edge in $A$ and we construct a shortest path tree. We detail this procedure in the following steps:
Similar to previous work (ConTinEst [Du et al. 2013]), for each edge $(i, j)$ draw a transmission rate $\alpha_{ij}$ from a Weibull distribution with a fixed shape parameter.
Given the transmission rate $\alpha_{ij}$, we draw an exponentially distributed waiting time $\tau_{ij}$ using inverse transform sampling:
$$\tau_{ij} = -\frac{\ln s}{\alpha_{ij}}$$
where $s$ is drawn uniformly from $(0, 1)$.
Set $\tau_{ij}$ as the weight of edge $(i, j)$.
Starting from a source node, construct the shortest path tree from the source to all the other nodes in the graph;
For each node $v_i$ compute two measures: $t_i$ – its time of occurrence as the total waiting time along the path from the source to $v_i$ – and $\varphi_i$ – the number of reachable nodes from $v_i$ in the shortest path tree;
The generated cascade is where ;
The ground truth influence of node $v_i$ is the mean of $\varphi_i$ over multiple random graphs (here 100).
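A minimal sketch of the cascade sampling procedure is given below. The Weibull `shape` and `scale` defaults are placeholders rather than the values used in the experiments, and the names `sample_cascade`, `time`, `parent` and `reach` are illustrative. The count of reachable nodes is assumed to exclude the node itself.

```python
import heapq
import math
import random

def sample_cascade(adj, source, shape=2.0, scale=1.0, seed=None):
    """Sample one cascade on the graph adj, starting at node source."""
    rng = random.Random(seed)
    n = len(adj)
    # Waiting time per edge: Weibull rate, then inverse transform sampling.
    delay = {}
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                rate = max(rng.weibullvariate(scale, shape), 1e-12)
                s = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
                delay[(i, j)] = -math.log(s) / rate
    # Dijkstra: shortest path tree rooted at the source.
    time, parent, done = {source: 0.0}, {source: None}, set()
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in range(n):
            if adj[u][v] and v not in done:
                t_new = t + delay[(u, v)]
                if t_new < time.get(v, math.inf):
                    time[v], parent[v] = t_new, u
                    heapq.heappush(heap, (t_new, v))
    # Count descendants: children occur later, so visit latest nodes first.
    reach = {v: 0 for v in time}
    for v in sorted(time, key=time.get, reverse=True):
        if parent[v] is not None:
            reach[parent[v]] += reach[v] + 1
    return time, parent, reach

# Chain 0 -> 1 -> 2 plus an isolated node 3, which is never reached.
adj = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(sample_cascade(adj, 0, seed=0)[2])
```

Averaging `reach` for a given node over many sampled graphs and cascades would give a ground truth influence in the spirit of the last step above.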
Appendix D Choosing the temporal decay parameter
The temporal decay parameter $r$ shown in Eq. (1) is determined by linear search. Eq. (1) measures the probabilities of edges in the retweeting (diffusion) network, however the real diffusions are not observed. We make the assumption that diffusions occur along edges in the underlying social graph (follower relation). We measure the fitness of the edge probability by the likelihood of uncovering the ground truth follower graph. That is, if the edge $(v_i, v_j)$ exists in the follower graph (i.e. $u_j$ is a follower of $u_i$), then it should have a higher probability in the diffusion tree than an edge $(v_k, v_j)$ which does not exist in the follower graph. In other words, we are using the retweeting probability to predict the existence of edges in the social graph. We randomly select 20 cascades, and we crawl the following list of every user appearing in the diffusions (the following list for a user $u$ consists of the users followed by $u$). We use this information as ground truth for the following prediction exercise: given a user $u_j$ who emitted tweet $v_j$ in a particular cascade, we want to predict which among the users in the set $\{u_1, \dots, u_{j-1}\}$ are followed by $u_j$. Considering that the edge probabilities are real numbers and the prediction target is binary, we use the AUC (area under the ROC curve) to measure the prediction performance of a particular probability scoring function. For each value of $r$ in Eq. (1) we compute the mean AUC over all predictions. We perform a linear search for the optimal $r$ over a range of candidate values. Finally, the value of $r$ that maximizes the mean AUC is chosen for the experiments in Sec. 5 and 6.
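The search procedure can be illustrated with the sketch below. The data layout (each cascade as a triple of times, follower counts and, for every tweet, the set of earlier tweets whose authors are followed by its author) and the names `auc` and `choose_decay` are assumptions for illustration. The scores are left unnormalized, since rescaling all candidate scores of one tweet by a common positive factor does not change the AUC.

```python
import math

def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def choose_decay(cascades, candidates):
    """Return the (r, mean AUC) pair with the highest mean AUC."""
    best_r, best_auc = None, -1.0
    for r in candidates:
        aucs = []
        for times, followers, follows in cascades:
            for j in range(1, len(times)):
                # Score every earlier tweet as a candidate parent of tweet j.
                scores = [followers[i] * math.exp(-r * (times[j] - times[i]))
                          for i in range(j)]
                labels = [i in follows[j] for i in range(j)]
                a = auc(scores, labels)
                if a is not None:
                    aucs.append(a)
        mean_auc = sum(aucs) / len(aucs) if aucs else float("nan")
        if mean_auc > best_auc:
            best_r, best_auc = r, mean_auc
    return best_r, best_auc

# Toy cascade: the author of tweet 2 follows only the author of tweet 1.
cascade = ([0.0, 10.0, 11.0], [100, 1, 1], [set(), {0}, {1}])
print(choose_decay([cascade], [0.0, 1.0]))
```

In the toy cascade, no decay favors the popular but stale root tweet, while a positive decay ranks the recent tweet first, which matches the follower ground truth.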
Appendix E Additional 2D densities plots
- See: https://botometer.iuni.iu.edu/#!/
- Via the Uberlink Twitter Analytics Service.
References
- Barabási, A.-L. 2005. The origin of bursts and heavy tails in human dynamics. Nature 435(7039):207–11.
- Barberá, P.; Jost, J. T.; Nagler, J.; Tucker, J. A.; and Bonneau, R. 2015. Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science 26(10):1531–1542.
- Bessi, A., and Ferrara, E. 2016. Social bots distort the 2016 U.S. presidential election online discussion. First Monday 21(11).
- Cha, M.; Haddadi, H.; Benevenuto, F.; and Gummadi, K. P. 2010. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM ’10, volume 10, 10–17.
- Chikhaoui, B.; Chiazzaro, M.; Wang, S.; and Sotir, M. 2017. Detecting communities of authority and analyzing their influence in dynamic social networks. ACM Trans. Intell. Syst. Technol. 8(6):82:1–82:28.
- Cho, Y.-S.; Galstyan, A.; Brantingham, P. J.; and Tita, G. 2013. Latent self-exciting point process model for spatial-temporal networks. Disc. and Cont. Dynamic Syst. Series B.
- Collins, B., and Cox, J. 2017. Jenna Abrams, Russia’s clown troll princess, duped the mainstream media and the world. https://www.thedailybeast.com/jenna-abrams-russias-clown-troll-princess-duped-the-mainstream-media-and-the-world.
- Conover, M. D.; Ratkiewicz, J.; Francisco, M.; Goncalves, B.; Menczer, F.; and Flammini, A. 2011. Political polarization on Twitter. In ICWSM’11, 89–96. AAAI.
- Crane, R., and Sornette, D. 2008. Robust dynamic classes revealed by measuring the response function of a social system. PNAS 105(41):15649–15653.
- Davis, C. A.; Varol, O.; Ferrara, E.; Flammini, A.; and Menczer, F. 2016. BotOrNot: A system to evaluate social bots. In WWW Companion, 273–274. ACM.
- Du, N.; Song, L.; Gomez-Rodriguez, M.; and Zha, H. 2013. Scalable Influence Estimation in Continuous-Time Diffusion Networks. In NIPS’13, 3147–3155.
- Gehl, R., and Bakardjieva, M. 2016. Socialbots and Their Friends: Digital Media and the Automation of Sociality.
- Kim, I., and Kuljis, J. 2010. Applying content analysis to web-based content. Jour. of Comp. and Inf. Tech. 18(4):369–375.
- Kollanyi, B.; Howard, P. N.; and Woolley, S. C. 2016. Bots and automation over Twitter during the first U.S. presidential debate: Comprop data memo 2016.1. Oxford, UK.
- Kwak, H.; Lee, C.; Park, H.; and Moon, S. 2010. What is Twitter, a social network or a news media? In WWW’10, 591–600.
- Li, L., and Zha, H. 2013. Dyadic event attribution in social networks with mixtures of Hawkes processes. In CIKM’13, 1667–1672. ACM.
- Linderman, S., and Adams, R. 2014. Discovering latent network structure in point process data. In ICML’14, 1413–1421.
- Mishra, S.; Rizoiu, M.-A.; and Xie, L. 2016. Feature driven and point process approaches for popularity prediction. In CIKM’16, 1069–1078.
- Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1999. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
- Rizoiu, M.-A.; Xie, L.; Sanner, S.; Cebrian, M.; Yu, H.; and Van Hentenryck, P. 2017. Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity. In WWW’17, 735–744.
- Rodriguez, M. G.; Balduzzi, D.; and Schölkopf, B. 2011. Uncovering the Temporal Dynamics of Diffusion Networks. In ICML’11, 561–568.
- Rozenshtein, P., and Gionis, A. 2016. Temporal PageRank. In ECML-PKDD’16, 674–689. Springer.
- Shen, H.; Wang, D.; Song, C.; and Barabási, A. 2014. Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes. In AAAI’14, number 3, 291–297. AAAI Press.
- Simma, A., and Jordan, M. I. 2010. Modeling events with cascades of Poisson processes. UAI’10.
- 2018. Appendix: #DebateNight: The role and the influence of socialbots on Twitter during the 1st U.S. presidential debate. https://www.dropbox.com/s/b5zszszlccb37fu/ICWSM-2018-SI.pdf?dl=0.
- Timberg, C.; Dwoskin, E.; and Entous, A. 2017. Russian Twitter account pretending to be Tennessee GOP fools celebrities, politicians. http://www.chicagotribune.com/bluesky/technology/ct-russian-twitter-account-tennessee-gop-20171018-story.html.
- Varol, O.; Ferrara, E.; Davis, C. A.; Menczer, F.; and Flammini, A. 2017. Online human-bot interactions: Detection, estimation, and characterization. In ICWSM’17, 280–289. AAAI.
- Weng, J.; Lim, E.-P.; Jiang, J.; and He, Q. 2010. TwitterRank: finding topic-sensitive influential twitterers. In WSDM’10, 261–270. ACM.
- Woolley, S. C., and Guilbeault, D. 2017. Computational propaganda in the United States of America: Manufacturing consensus online: Working paper 2017.5.
- Wu, F., and Huberman, B. A. 2007. Novelty and collective attention. PNAS 104(45):17599–601.
- Yates, A.; Joselow, J.; and Goharian, N. 2016. The news cycle’s influence on social media activity. In ICWSM’16.
- Yu, L.; Cui, P.; Wang, F.; Song, C.; and Yang, S. 2015. From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics. In ICDM’15, 559–568.
- Zhao, Q.; Erdogdu, M. A.; He, H. Y.; Rajaraman, A.; and Leskovec, J. 2015. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. In KDD’15.