#DebateNight: The Role and Influence of Socialbots on Twitter
During the 1st U.S. Presidential Debate

Marian-Andrei Rizoiu, Timothy Graham, Rui Zhang, Yifei Zhang, Robert Ackland and Lexing Xie
The Australian National University and Data61 CSIRO, Canberra, Australia.

Serious concerns have been raised about the role of ‘socialbots’ in manipulating public opinion and influencing the outcome of elections by retweeting partisan content to increase its reach. Here we analyze the role and influence of socialbots on Twitter by determining how they contribute to retweet diffusions. We collect a large dataset of tweets during the 1st U.S. Presidential Debate in 2016 (#DebateNight) and we analyze its 1.5 million users from three perspectives: user influence, political behavior (partisanship and engagement) and botness. First, we define a measure of user influence based on the user’s active contributions to information diffusions, i.e. their tweets and retweets. Given that Twitter does not expose the retweet structure – it associates all retweets with the original tweet – we model the latent diffusion structure using only tweet time and user features, and we implement a novel, scalable approach to estimate influence over all possible unfoldings. Next, we use partisan hashtag analysis to quantify user political polarization and engagement. Finally, we use the BotOrNot API to measure user botness (the likelihood of being a bot). We build a two-dimensional “polarization map” that allows for a nuanced analysis of the interplay between botness, partisanship and influence. We find that not only are social bots more active on Twitter – starting more retweet cascades and retweeting more – but they are 2.5 times more influential than humans, and more politically engaged. Moreover, pro-Republican bots are both more influential and more politically engaged than their pro-Democrat counterparts. However, we caution against blanket statements that software designed to appear human dominates political debates. Firstly, it is known that accounts controlled by teams of humans (e.g. organizational accounts) are often identified as bots. Secondly, we find that many highly influential Twitter users are in fact pro-Democrat and that most pro-Republican users are mid-influential and likely to be human (low botness).



Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

Socialbots are broadly defined as “software processes that are programmed to appear to be human-generated within the context of social networking sites such as Facebook and Twitter” (?, p.2). They have recently attracted much attention and controversy, with concerns that they infiltrated political discourse during the 2016 U.S. Presidential election and manipulated public opinion at scale. Concerns were heightened with the discovery that the influential conservative commentator (@Jenn_Abrams, 70,000 followers) and a user claiming to belong to the Tennessee Republican Party (@TEN_GOP, 136,000 followers) were both in fact Russian-controlled bots operated by the Internet Research Agency in St. Petersburg (??).

There are several challenges that arise when conducting large-scale empirical analysis of the political influence of bots on Twitter. The first challenge concerns estimating user influence from retweet diffusions, where the retweet relations are unobserved – the Twitter API assigns every retweet to the original tweet in the diffusion. Current state-of-the-art influence estimation methods such as ConTinEst (?) operate on a static snapshot of the diffusion graph, which needs to be inferred from retweet diffusions using approaches like NetRate (?). This workflow suffers from two major drawbacks: first, the algorithms for uncovering the diffusion graph do not scale to millions of users as required in our application; second, operating on the diffusion graph estimates the “potential of being influential”, but it loses information about user activity – e.g. a less well connected user can still be influential if they tweet a lot. The question is how to estimate, at scale, the influence of millions of users from diffusions in which the retweet relation is not observed. The second challenge lies in determining whether a user is a bot, as well as her political behavior, since manually labeling millions of users is infeasible. The question is therefore how to leverage recent automated bot detection approaches such as BotOrNot (?) to measure the botness of users, and further, how to analyze political behavior (partisanship and engagement) at scale.

This paper addresses the above challenges on #DebateNight, a large dataset of 6.5 million tweets authored by 1.5 million users that was collected on 26 September 2016 during the first U.S. presidential debate.

To address the first challenge, we introduce, evaluate, and apply a novel algorithm to estimate user influence based on retweet diffusions. We model the latent diffusion structure using only time and user features by introducing the diffusion scenario – a possible unfolding of a diffusion – and its likelihood. We implement a scalable algorithm to estimate user influence over all possible diffusion scenarios associated with a diffusion. We demonstrate that our algorithm obtains state-of-the-art performance on a synthetic dataset with known ground truth. We also show that, unlike simpler alternative measures like the number of followers, or the mean size of initiated cascades, our influence measure assigns high scores to both highly-connected users who never start diffusions and to active retweeters with little followership.

We address the second challenge by proposing three new measures (political polarization, political engagement and botness) and by computing them for each user in #DebateNight. We manually compile a list of partisan hashtags, and we estimate political engagement based on the tendency to use these hashtags, and political polarization based on whether pro-Democrat or pro-Republican hashtags were predominantly used. We use the BotOrNot API to evaluate botness and to construct four reference populations: Human, Protected, Suspended and Bot. We build a two-dimensional visualization – the polarization map – that enables a nuanced analysis of the interplay between botness, partisanship and influence. We make several new and important findings: (1) bots are more likely to be pro-Republican; (2) bots are more engaged than humans, and pro-Republican bots are more engaged than pro-Democrat bots; (3) the average pro-Republican bot is twice as influential as the average pro-Democrat bot; (4) very highly influential users are more likely to be pro-Democrat; and (5) highly influential bots are mostly pro-Republican.

The main contributions of this work include:

  • We introduce a scalable algorithm to estimate user influence over all possible unfoldings of retweet diffusions where the cascade structure is not observed;

  • We develop two new measures of political polarization and engagement based on usage of partisan hashtags;

  • We measure the botness of a very large population of users engaged in Twitter activity relating to an important political event – the 2016 U.S. Presidential debates;

  • We propose the polarization map – a novel visualization of political polarization as a function of user influence and botness – and we use it to gain insights into the influence of bots on the information landscape around the U.S. elections.

2 Related Work

We structure the discussion of previous work into two categories: related work on the estimation of user influence and work concerning bot presence and behavior on Twitter.

Estimating user influence on Twitter. Aggregate measures such as the follower count, the number of retweets and the number of mentions have been shown to be indicative of user influence on Twitter (?). More sophisticated estimates of user influence use eigenvector centrality to account for the connectivity of followers or retweeters; for example, TwitteRank (?) extends PageRank (?) by taking into account topical similarity between users and network structure. Other extensions like Temporal PageRank (?) explicitly incorporate time into the ranking to account for a time-evolving network. However, one limitation of PageRank-based methods is that they require a complete mapping of the social network. More fundamentally, network centrality has the drawback of evaluating only the potential of a user to be influential in spreading ideas or content; it does not account for the actions of the user (e.g. tweeting on a particular subject). Our influence estimation approach, proposed in Sec. 3, is built on the users' Twitter activity and does not require knowledge of the social network.

Recent work (??) has focused on estimating user influence as the contribution to information diffusion. For example, ConTinEst (?) requires a complete diffusion graph and employs a random sampling algorithm to approximate user influence with scalable complexity. However, constructing the complete diffusion graph might prove problematic, as current state-of-the-art methods for uncovering the diffusion structure (e.g. (?????)) do not scale to the number of users in our dataset. This is because these methods assume that a large number of cascades occur in a rather small social neighborhood, whereas in #DebateNight cascades occur during a short period of time in a very large population of users. Our proposed algorithm estimates influence directly from retweet cascades, without the need to reconstruct the retweet graph, and it scales quadratically with cascade size.

Figure 4: Modeling latent diffusions. (a) Example of a retweet cascade as provided by the Twitter API, in which all retweets are assigned to the original tweet. (b) Four diffusion scenarios (out of the 120 possible unfoldings) associated with the observed retweet cascade in (a). (c) Illustration of a new node attaching to a diffusion scenario: four new diffusion scenarios are generated, as the new node can attach to any of the existing nodes; the influence of each of the nodes colored in red increases as the new node attaches.

Bot presence and behavior on Twitter. The ‘BotOrNot’ Twitter bot detection API uses a Random Forest supervised machine learning classifier to calculate the likelihood of a given Twitter user being a bot, based on more than 1,000 features extracted from meta-data, patterns of activity, and tweet content (grouped into six main classes: user-based; friends; network; temporal; content and language; and sentiment) (??) (see https://botometer.iuni.iu.edu/#!/). The bot scores are in the range $[0, 1]$, where 0 (resp. 1) means the user is very unlikely (resp. very likely) to be a bot. BotOrNot was used to examine how socialbots affected political discussions on Twitter during the 2016 U.S. Presidential Election (?). That study found that bots accounted for approximately 15% (400,000 accounts) of the Twitter population involved in election-related activity, and authored about 3.8 million (19%) tweets. However, ? (?) sampled the most active accounts, which could bias their estimate of the presence of bots upwards, as activity volume is one of the features used by BotOrNot. They also found that bots were just as effective as humans at attracting retweets from humans. Furthermore, (?) used the BotOrNot API to test 157,504 users randomly sampled from 1,798,127 Twitter users participating in election-related activity during 1-11 November, and found that just over 10% were bots. In the present paper, we use BotOrNot to classify all 1.5 million users in our dataset to obtain a less biased approximation of their numbers and impact.

Previous work has studied the political partisanship of Twitter bots. ? (?) analyzed candidate-oriented hashtag use during the 1st U.S. Presidential Debate and found that highly automated accounts (self-identified bots and/or accounts that post at least 50 times a day) were disproportionately pro-Trump. ? (?) also studied political partisanship by identifying five pro-Trump and four pro-Clinton hashtags and assigning users to a particular political faction. The results suggested that both humans and bots were more pro-Trump in terms of hashtag partisanship. However, the above findings are limited to comparing frequency counts of tweets authored and retweets received between humans and bots, and they provide no insight into the importance of users in retweet diffusions. We overcome this limitation by modeling the latent structure of retweet diffusions and computing user influence over all possible scenarios.

3 Estimating influence in retweet cascades

An information cascade of size $n+1$ is defined as a series of messages $W = \{m_i = (u_i, t_i), i = 0, \dots, n\}$, where message $m_i$ is sent by user $u_i$ at time $t_i$. Here $m_0$ is the initial message, and the $m_i$ with $i \geq 1$ are subsequent reposts or relays of the initial message. In the context of Twitter, the initial message is an original tweet and the subsequent messages are retweets (which, by definition, are also tweets) of that original tweet. A latent retweet diffusion graph has the set of tweets as its vertices, and additional edges $(m_i, m_j)$ that represent that tweet $m_j$ is a retweet of tweet $m_i$ and that respect the temporal precedence $t_i < t_j$. Web data interfaces such as the Twitter API provide cascades, but not the diffusion edges. Such missing data makes it impossible to directly measure a given user’s contribution to the diffusion process.

3.1 Modeling latent diffusions

Diffusion scenarios. We focus on tree-structured diffusion graphs, i.e. each node $m_j$ has only one incoming link $(m_i, m_j)$ with $t_i < t_j$. We denote the set of trees that are consistent with the temporal order in cascade $W$ as $\mathcal{T}(W)$, and we call each such diffusion tree $T \in \mathcal{T}(W)$ a diffusion scenario. Fig. 4(a) shows a cascade visualized as a star graph, denoting that subsequent tweets are cascading effects of the first tweet at time $t_0$. Fig. 4(b) shows four example diffusion scenarios that can lead to this cascade. The main challenge here is to estimate the influence of each user in the cascade, taking into account all possible diffusion trees.

Probability of retweeting. For each tweet $m_j$, we model the probability of it being a direct descendant of each previous tweet $m_i$ in the same cascade as a weighted softmax function, defined as follows. In line with previous work (?????), we model two factors: firstly, users retweet fresh content (?), so the probability of the edge $(m_i, m_j)$ decays exponentially with the time difference $t_j - t_i$; secondly, users prefer to retweet locally influential users, also known as preferential attachment (?). We measure the local influence of a user using her number of followers (???). We quantify the probability of an edge $(m_i, m_j)$ as:

$$p_{ij} = \frac{f_{u_i} \, e^{-r(t_j - t_i)}}{\sum_{k=0}^{j-1} f_{u_k} \, e^{-r(t_j - t_k)}} \qquad (1)$$

where $r$ controls the temporal decay of the probability and $f_{u_i}$ is the number of followers of user $u_i$ (the user of $m_i$).
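The weighted softmax in Eq. (1) can be sketched in a few lines of Python (a minimal illustration, assuming per-tweet timestamps and follower counts are available; the function and variable names are ours, not the authors'):

```python
import math

def attachment_probs(times, followers, r, j):
    """Probability that tweet j attaches to each earlier tweet i < j,
    following Eq. (1): p_ij proportional to f_i * exp(-r * (t_j - t_i))."""
    weights = [followers[i] * math.exp(-r * (times[j] - times[i]))
               for i in range(j)]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with a well-followed early tweet and a sparsely-followed later one, most of the attachment probability mass goes to the early tweet, as preferential attachment dictates.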

3.2 Tweet influence

Tweet influence over one diffusion scenario. Let $\pi_{ik}$ be a path in the diffusion scenario $T$ – i.e. a sequence of nodes which starts with $m_i$ and ends with $m_k$. $P(\pi_{ik}) = \prod_{(m_a, m_b) \in \pi_{ik}} p_{ab}$ is the probability of reaching $m_k$ from $m_i$. We define the influence $\varphi_T(m_i)$ of $m_i$ in the scenario $T$ as the expected number of users reached from $m_i$, using a model of independent binomials to decide whether or not to take each hop in each path that starts with $m_i$. Formally:

$$\varphi_T(m_i) = \sum_{k=i}^{n} \mathbb{1}_T(\pi_{ik}) \, P(\pi_{ik}) \qquad (2)$$

where $\mathbb{1}_T(\pi_{ik})$ is a function that takes the value 1 when the path from $m_i$ to $m_k$ exists in $T$, and 0 otherwise.

Tweet influence over a retweet cascade. We define the influence $\varphi(m_i)$ of $m_i$ over all valid diffusion scenarios $T \in \mathcal{T}(W)$:

$$\varphi(m_i) = \sum_{T \in \mathcal{T}(W)} P(T) \, \varphi_T(m_i) \qquad (3)$$
Under the assumption that retweeting events (i.e. edges) occur independently of one another, the probability of a diffusion scenario $T$ is:

$$P(T) = \prod_{(m_i, m_j) \in T} p_{ij} \qquad (4)$$
It is intractable to directly evaluate Eq. (3) (together with Eq. (2) and (4)), particularly due to the factorial number of diffusion scenarios in $\mathcal{T}(W)$ (as shown in the online supplement (?, annex LABEL:si-sec:infl-derivation)). For example, there are $100! \approx 9.3 \times 10^{157}$ diffusion scenarios for a cascade of 100 retweets, since the $j$-th retweet can attach to any of the $j$ preceding tweets. In the next section, we develop an efficient algorithm to compute the influence of all tweets.

3.3 Tractable tweet influence computation

The key observation for a tractable computation of Eq. (3) is that tweets $m_k$ are added sequentially, at times $t_k$, to each diffusion scenario constructed at time $t_{k-1}$. Adding $m_k$ to a diffusion scenario has two effects. First, new diffusion scenarios are generated, as $m_k$ can attach to any of the existing nodes. Second, $m_k$ contributes only once to the tweet influence of each tweet that is found on the branch it attaches to, and it makes no other contributions at later times. This process is exemplified in Fig. 4(c): a new node is added to a given diffusion scenario, generating four new diffusion scenarios, and the nodes colored in red see their influence increase as a result of the addition. This allows computing tweet influence incrementally, by updating it at each time $t_k$. We denote by $\varphi^k(m_i)$ the value of the tweet influence of $m_i$ after adding node $m_k$. Thus, we only keep track of how tweet influence increases as nodes attach, and we do not need to construct all diffusion scenarios.

We define $\omega_{ki}$ as the contribution of $m_k$ to the tweet influence of $m_i$, over all possible diffusion scenarios. It can be shown that the influence of $m_i$ increases at time $t_k$ by $\omega_{ki}$:

$$\varphi^k(m_i) = \varphi^{k-1}(m_i) + \omega_{ki} \qquad (5)$$
See the online supplement (?, annex LABEL:si-sec:efficient-algo) for the complete derivation from Eq. (3) to Eq. (5) and (6).

Recursive computation of tweet influence. $\omega_{ki}$ can be alternatively interpreted as the influence of tweet $m_i$ over $m_k$. Consequently, we can compute the total influence of $m_i$ as the sum of the individual influences of $m_i$ over each of the other nodes in the diffusion, $\varphi(m_i) = \sum_{k \geq i} \omega_{ki}$. This can be recursively computed as:

$$\omega_{ki} = \sum_{j=i}^{k-1} p_{jk} \, \omega_{ji}, \quad \text{with } \omega_{ii} = 1 \qquad (6)$$

Require: a set of retweets $\{m_k = (u_k, t_k), k = 0, \dots, n\}$
Require: parameter $r$ – temporal decay.
Ensure: influence matrix $M$, where $M_{ik} = \omega_{ki}$
  Initialize matrix $F$
  Initialize matrix $D$
  for $k = 1$ to $n$ do
     for $i = 0$ to $k - 1$ do
        $F_{ik} \leftarrow f_{u_i}$; $D_{ik} \leftarrow e^{-r(t_k - t_i)}$
     end for
  end for
  $P \leftarrow F \circ D$ (element-wise multiplication)
  column normalize $P$   {$P_{ik} = p_{ik}$, Eq. (1)}
  Initialize $M$ with $M_{ii} \leftarrow 1$
  for $k = 1$ to $n$ do
     $M_{ik} \leftarrow \sum_{j=i}^{k-1} P_{jk} M_{ij}$, for all $i < k$   {Eq. (6)}
  end for
Algorithm 1 Compute influence matrix $M$

Computational complexity for estimating influence. The scalable influence computation shown in Algorithm 1 uses two matrices: matrix $P$, which holds the edge probabilities $p_{ik}$, and matrix $M$, which holds the contributions $\omega_{ki}$. It computes each column of $M$ sequentially, using Eq. (6); each column amounts to one matrix–vector multiplication, and the overall cost grows quadratically with cascade size. In real cascades containing 1000 tweets, the above algorithm finishes in 34 seconds on a PC. For more details and examples, see the online supplement (?, annex LABEL:si-sec:efficient-algo). Throughout the experiments in Sec. 5 and 6, we set the temporal decay hyper-parameter $r$ defined in Eq. (1) by linear search on a sample of 20 real retweet diffusions (details in the online supplement (?, annex LABEL:si-sec:choose-temp-decay)).
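The whole incremental computation can be sketched in plain Python (a sketch under Eq. (1) and the recursion in Eq. (6), with $\omega_{ii} = 1$ counting a tweet's own author; it is not the authors' reference implementation, and the names are ours):

```python
import math

def influence_matrix(times, followers, r):
    """omega[k][i] is the expected contribution of tweet k to the
    influence of tweet i, over all diffusion scenarios (Eq. 6)."""
    n = len(times)
    # p[i][k]: probability that tweet k attaches to earlier tweet i (Eq. 1)
    p = [[0.0] * n for _ in range(n)]
    for k in range(1, n):
        weights = [followers[i] * math.exp(-r * (times[k] - times[i]))
                   for i in range(k)]
        total = sum(weights)
        for i in range(k):
            p[i][k] = weights[i] / total
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        omega[i][i] = 1.0  # each tweet reaches its own author
    for k in range(1, n):
        for i in range(k):
            # k is a descendant of i if it attaches to i itself or to
            # any intermediate node j that already descends from i
            omega[k][i] = sum(p[j][k] * omega[j][i] for j in range(i, k))
    return omega

def tweet_influence(omega, i):
    """Total influence of tweet i: itself plus expected descendants."""
    return sum(omega[k][i] for k in range(i, len(omega)))
```

A sanity check on this sketch: the root of a cascade is an ancestor of every later tweet in every scenario, so its influence equals the cascade size.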

3.4 Computing influence of a user

Given $\mathcal{V}(u)$ – the set of tweets authored by user $u$ – we define the user influence $\varphi(u)$ of $u$ as the mean tweet influence of the tweets in $\mathcal{V}(u)$:

$$\varphi(u) = \frac{1}{|\mathcal{V}(u)|} \sum_{m_i \in \mathcal{V}(u)} \varphi(m_i) \qquad (7)$$

To account for the skewed distribution of user influence, we mostly use the percentile normalization – with a value of 1 for the most influential user in our dataset and 0 for the least influential – denoted $\tilde{\varphi}(u)$.
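The per-user aggregation of Eq. (7) and the percentile normalization can be sketched as follows (an illustration with hypothetical names; ties in the ranking are broken arbitrarily in this sketch):

```python
def user_influence(tweet_infl, authors):
    """Mean tweet influence per user (Eq. 7). tweet_infl[i] is the
    influence of tweet i; authors[i] is the user who posted it."""
    totals, counts = {}, {}
    for infl, u in zip(tweet_infl, authors):
        totals[u] = totals.get(u, 0.0) + infl
        counts[u] = counts.get(u, 0) + 1
    return {u: totals[u] / counts[u] for u in totals}

def to_percentile(scores):
    """Percentile normalization: 1 for the most influential user,
    0 for the least influential."""
    ranked = sorted(scores, key=scores.get)
    if len(ranked) == 1:
        return {ranked[0]: 1.0}
    return {u: rank / (len(ranked) - 1) for rank, u in enumerate(ranked)}
```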

4 Dataset and measures of political behavior

In this section, we first describe the #DebateNight dataset that we collected during the first U.S. presidential debate. Next, we introduce three measures for analyzing the political behavior of users who were active on Twitter during the debate. In Sec. 4.1, we introduce political polarization and political engagement. In Sec. 4.2, we introduce the botness score and we describe how we construct the reference bot and human populations.

The #DebateNight dataset contains Twitter discussions that occurred during the first 2016 U.S. presidential debate between Hillary Clinton and Donald Trump. Using the Twitter Firehose API (via the Uberlink Twitter Analytics Service), we collected all the tweets (including retweets) that were authored during the two-hour period from 8:45pm to 10:45pm EDT on 26 September 2016, and which contain at least one of the hashtags: #DebateNight, #Debates2016, #election2016, #HillaryClinton, #Debates, #Hillary2016, #DonaldTrump and #Trump2016. The time range covers the 90 minutes of the presidential debate, as well as the 15 minutes before and after the debate. The resulting dataset contains 6,498,818 tweets, emitted by 1,451,388 Twitter users. For each user, the Twitter API provides aggregate information such as the number of followers and the total numbers (over the lifetime of the user) of emitted tweets, authored retweets, and favorites. For individual tweets, the API provides the timestamp and, if the tweet is a retweet, the original tweet that started the retweet cascade. The #DebateNight dataset contains 200,191 retweet diffusions of size 3 or larger.
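The collection criterion above – keep a tweet if it carries at least one tracked hashtag – can be sketched as a simple case-insensitive filter (an illustration only; the actual collection was done server-side via the Firehose API):

```python
# The eight tracked hashtags, lower-cased for case-insensitive matching
TRACKED = {"#debatenight", "#debates2016", "#election2016", "#hillaryclinton",
           "#debates", "#hillary2016", "#donaldtrump", "#trump2016"}

def matches(tweet_text):
    """True if the tweet text contains at least one tracked hashtag."""
    tokens = tweet_text.lower().split()
    return any(tok.strip(".,!?") in TRACKED for tok in tokens)
```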

4.1 Political polarization and engagement

Figure 5: Wordclouds of partisan hashtags in #DebateNight: Democrat (top) and Republican (bottom). Hashtag sizes are scaled by their frequency.

Protocol. We extracted the 1000 most frequent hashtags in our dataset. Using a content analysis approach (?), we coded each hashtag into one of two categories: Democrat and Republican. Hashtags that did not have a clear political polarity were not labeled and were thus excluded from analysis. Our coding methodology is similar to previous work (??), with the difference that we extract candidate hashtags from the data instead of using a predefined set of partisan hashtags. Fig. 5 presents the wordclouds of the most frequent partisan hashtags, for Democrats (top) and Republicans (bottom). We chose hashtags indicating either strong support for a candidate (e.g., #imwithher for Clinton and #trump2016 for Trump), or opposition and/or antagonism (e.g., #nevertrump and #crookedhillary). This results in 93 Democrat and 86 Republican hashtags.

Two measures of political behavior. We identify 65,031 tweets in #DebateNight that contain at least one partisan hashtag. 1,917 tweets contain partisan hashtags of both polarities: these are mostly tweets negative towards both candidates (e.g. “Let’s Get READY TO RUMBLE AND TELL LIES. #nevertrump #neverhillary #Obama”) or hashtag spam. We count the number of occurrences of partisan hashtags for each user, and we detect a set of 46,906 politically engaged users who have used at least one partisan hashtag. Each politically engaged user $u$ has two counts: the number of Democrat hashtags and the number of Republican hashtags that $u$ used. We measure the political polarization $\mathcal{P}(u)$ as the normalized difference between the number of Republican and Democrat hashtags used:

$$\mathcal{P}(u) = \frac{\#\mathrm{Rep}(u) - \#\mathrm{Dem}(u)}{\#\mathrm{Rep}(u) + \#\mathrm{Dem}(u)} \qquad (8)$$

$\mathcal{P}(u)$ takes values between $-1$ (if $u$ emitted only Democrat partisan hashtags) and $1$ (if $u$ emitted only Republican hashtags). We threshold the political polarization to construct a population of Democrat users with $\mathcal{P}(u) < 0$ and Republican users with $\mathcal{P}(u) > 0$. In the set of politically engaged users, there are 21,711 Democrat users, 22,644 Republican users and 2,551 users with no polarization ($\mathcal{P}(u) = 0$). We measure the political engagement of a user as the total volume of partisan hashtags included in their tweets.
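The polarization measure and the resulting partisan labels can be sketched directly from the two hashtag counts (a minimal illustration; function names are ours):

```python
def polarization(n_rep, n_dem):
    """Normalized difference between Republican and Democrat hashtag
    counts: -1 = only Democrat hashtags, 1 = only Republican hashtags."""
    if n_rep + n_dem == 0:
        return None  # not politically engaged
    return (n_rep - n_dem) / (n_rep + n_dem)

def partisan_label(n_rep, n_dem):
    """Threshold the polarization into Democrat / Republican populations."""
    p = polarization(n_rep, n_dem)
    if p is None or p == 0:
        return "unpolarized"
    return "Republican" if p > 0 else "Democrat"
```

Political engagement is then simply `n_rep + n_dem`, the total volume of partisan hashtags used.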

4.2 Botness score and bot detection

Detecting automated bots. We use the BotOrNot (?) API to measure the likelihood of being a bot for each of the 1,451,388 users in the #DebateNight dataset. Given a user $u$, the API returns the botness score $\beta(u)$ (with 0 being likely human, and 1 likely non-human). Previous work (???) uses a fixed botness threshold to detect social bots. However, we manually checked a random sample of 100 users scored as bots under such a threshold and found several human accounts classified as bots; raising the threshold to 0.6 decreases this mis-classification. It has been previously reported by (?) that organizational accounts have high botness scores. This, however, is not a concern in this work, as we aim to detect ‘highly automated’ accounts that behave in a non-human way. We chose a threshold of $\beta(u) \geq 0.6$ to construct the Bot population, in light of this more encompassing notion of account automation.

              All         Prot.    Human     Susp.    Bot
All           1,451,388   45,316   499,822   10,162   17,561
Polarized     44,299      1,245    11,972    265      435
Democrat      21,676      585      5,376     111      185
Republican    22,623      660      6,596     154      250
Dem. %        48.93%      46.99%   44.90%    41.89%   42.53%
Rep. %        51.07%      53.01%   55.10%    58.11%   57.47%
Table 1: Tabulating population volumes and percentages of politically polarized users over four populations: Protected, Human, Suspended and Bot.

Four reference populations. In addition to the Bot population, we construct three additional reference populations. Human contains users with a high likelihood of being regular Twitter users. Protected contains the users whose profile access is restricted to their followers and friends (the BotOrNot system cannot return a botness score for them); we consider these users to be regular Twitter users, since we assume that no organization or broadcasting bot would restrict access to its profile. Suspended contains the users who were suspended by Twitter between the date of the tweet collection (26 September 2016) and the date of retrieving the botness scores (July 2017); this population has a high likelihood of containing bots. Table 1 tabulates the size of each population, split by political polarization.
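The assignment of a user to a reference population can be sketched as follows. Note that the 0.6 Bot cut-off is stated in the text, while the Human cut-off used here (0.2) is our own illustrative assumption, as are the function and parameter names:

```python
def reference_population(botness, protected=False, suspended=False,
                         human_max=0.2, bot_min=0.6):
    """Assign a user to one of the four reference populations.
    The 0.2 'human' cut-off is an assumption for illustration."""
    if protected:
        return "Protected"   # botness score unavailable for protected profiles
    if suspended:
        return "Suspended"   # suspended between collection and scoring
    if botness >= bot_min:
        return "Bot"
    if botness <= human_max:
        return "Human"
    return None  # between thresholds: not in any reference population
```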

5 Evaluation of user influence estimation

Figure 9: Evaluation of the user influence measure. (a) 2D density plot (shades of blue) and scatter-plot (gray circles) of user influence against the ground truth on a synthetic dataset. (b)(c) Hexbin plots of user influence percentile (x-axis) against mean cascade size percentile (b) and the number of followers (c) (y-axis) on #DebateNight. The color intensity indicates the number of users in each hex bin. 1D histograms of each axis are shown using gray bars. Note that a large share of the users that initiate cascades are never retweeted.
Figure 15: Profiling behavior of the Protected, Human, Suspended and Bot populations in the #DebateNight dataset. The numbers in parentheses in the legend are mean values. (a) CCDF of the number of Twitter diffusion cascades started. (b) CCDF of the number of retweets. (c)(d) CCDF (c) and boxplots (d) of the number of followers. (e) Number of items favorited.

In this section, we evaluate our proposed algorithm and measure of user influence. In Sec. 5.1, we evaluate it on synthetic data against a known ground truth. In Sec. 5.2, we compare our influence measure (defined in Sec. 3.4) against two alternatives: the number of followers and the mean size of initiated cascades.

5.1 Evaluation of user influence

Evaluating user influence on real data presents two major hurdles. The first is the lack of ground truth, as user influence is not directly observed. The second is that the diffusion graph is unknown, which makes it impossible to compare against state-of-the-art methods that require this information (e.g. ConTinEst (?)). In this section, we evaluate our algorithm against a known ground truth on a synthetic dataset, using the same evaluation approach as ConTinEst.

Evaluation on synthetic data. We evaluate on synthetic data using the protocol previously employed in (?). We use the simulator in (?) to generate an artificial social network with 1000 users. We then simulate 1000 cascades through this social network, starting from the same initial user. The generation of the synthetic social network and of the cascades is detailed in the online supplement (?, annex LABEL:si-sec:generation-artificial). Similar to the retweet cascades in #DebateNight, each event in a synthetic cascade has a timestamp and an associated user. Unlike the real retweet cascades, we know the true diffusion structure behind each synthetic cascade. For each user $u$, we count the number of nodes reachable from $u$ in the diffusion tree of each cascade, and we compute the ground-truth influence of $u$ as the mean over all cascades. ConTinEst (?) has been shown to asymptotically approximate this synthetic user influence.
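Because the synthetic diffusion trees are fully observed, the ground-truth reach of each node can be counted with a single backward sweep (a sketch assuming nodes are indexed in temporal order, so every child has a larger index than its parent; the representation is ours):

```python
def ground_truth_influence(parents):
    """Number of nodes reachable from each node in a diffusion tree,
    counting the node itself. parents[i] is the parent of node i;
    parents[0] is None for the root."""
    n = len(parents)
    reached = [1] * n  # every node reaches at least itself
    # children have larger indices than parents, so sweep backwards,
    # folding each subtree size into its parent's count
    for i in range(n - 1, 0, -1):
        reached[parents[i]] += reached[i]
    return reached
```

For a chain the root reaches every node, while for a star only the root reaches more than itself; averaging these counts over cascades gives the ground-truth user influence used in the evaluation.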

We apply the algorithm introduced in Sec. 3.3 to the synthetic data, to compute the measure defined in Eq. (7). We plot in Fig. 9(a) the 2D scatter-plot and the density plot of the synthetic users, with our influence measure on the y-axis and the ground truth on the x-axis (both in percentiles). Visibly, there is high agreement between the two measures, particularly for the most influential and the least influential users, and the Spearman correlation coefficient of the raw values confirms this strong rank agreement. This shows that our method obtains state-of-the-art performance in the absence of any information about the structure of the diffusions.

5.2 Comparison with other influence metrics

We compare the influence measure against two alternatives that can be computed on #DebateNight.

Mean size of initiated cascades (of a user $u$) is the average number of users reached by original content authored by $u$. It should be noted that this measure does not capture $u$'s role in diffusing content authored by someone else. In the context of Twitter, the mean size of initiated cascades is the average number of users who retweeted an original tweet authored by $u$: we compute this for every user in the #DebateNight dataset, and we plot it against user influence in Fig. 9(b). Few users have a meaningful value for mean cascade size: most users never start a cascade (and they are not accounted for in Fig. 9(b)), and many of those who do start cascades are never retweeted, placing them at the lowest percentile (shown by the 1D histograms in the plot). It is apparent that the mean cascade size metric detects the influential users who start cascades, and that it correlates with user influence. However, it misses highly influential users who never initiate cascades, but who participate by retweeting. Examples are @SethMacFarlane (the actor and filmmaker Seth MacFarlane, 10.8 million followers) and @michaelianblack (comedian Michael Ian Black, 2.1 million followers), both among the most influential users in our dataset.

Number of followers is one of the simplest measures of direct influence used in the literature (??). While loosely correlated with user influence (visible in Fig. 9(c)), it has the drawback of not accounting for any of the user's actions, such as active participation in discussions or generating large retweet cascades. For example, user @PoliticJames (alt-right and pro-Trump, 2 followers) emitted a single tweet in #DebateNight, which was retweeted 18 times, placing him among the most influential users. Similarly, user @tiwtter1tr4_tv (now suspended, 0 followers) initiated a cascade of size 58, also ranking among the most influential. Interestingly, half of the accounts ranking at the bottom by number of followers and at the top by influence are now suspended or have very high botness scores.

6 Results and findings

In this section, we present an analysis of the interplay between botness, political behavior (polarization and engagement) and influence. In Sec. 6.1, we first profile the activity of users in the four reference populations; next, we analyze the political polarization and engagement, and their relation with the botness measure. Finally, in Sec. 6.2 we tabulate user influence against polarization and botness, and we construct the polarization map.

Figure 22: Political polarization, engagement and botness. (a) The density distribution of political polarization. (b) Log-log plot of the CCDF of political engagement for the Democrat and Republican populations. (c) The density distribution of botness for the entire population (solid line) and the politically polarized population (dashed line). (d) The conditional density of polarization conditioned on botness; the top panel shows the volumes of politically polarized users in 30 bins. (e)(f) CCDF of political engagement for the reference populations (e) and for the polarized Bot populations (f).

6.1 Political behavior of humans and bots

Twitter activity across four populations. We measure the behavior of users in the four reference populations defined in Sec. 4.2 using several measures computed from the Twitter API. The number of cascades started (i.e. the number of original tweets) and the number of posted retweets are simple measures of activity on Twitter, and they are known to be long-tail distributed (?). Fig. 15(a) and (b) respectively show the log-log plots of the empirical Complementary Cumulative Distribution Function (CCDF) for each of the two measures. It is apparent that users in the Bot and Suspended populations exhibit higher levels of activity than the general population, whereas the Human and Protected populations exhibit lower levels. Fig. 15(c) and (d) plot the number of followers and present a more nuanced story: the average bot has 10 times more followers than the average human user; however, the median bot has fewer followers than the median human user. In other words, some bots are very highly followed, but most are simply ignored. Finally, Fig. 15(e) shows that bots favorite less than humans, indicating that their activity patterns differ from those of humans.

Political polarization and engagement. The density distribution of political polarization (Fig. (a)) shows two peaks at -1 and 1, corresponding to strongly pro-Democrat and strongly pro-Republican users respectively. The shape of the density plot is consistent with the sizes of the Republican and Democrat populations (Sec. 4.1), and the extreme bi-modality can be explained by the clearly partisan nature of the chosen hashtags and by the known political polarization of users on Twitter (??), which is amplified in the context of a political debate. Fig. (b) presents the log-log plot of the CCDF of political engagement, which shows that the political engagement score is long-tail distributed, with pro-Democrats slightly more engaged than pro-Republicans overall.
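A plausible sketch of how the two scores could be computed from partisan hashtags. The formulation below (normalized count difference for polarization, total partisan-hashtag count for engagement) is an illustrative assumption; the paper's exact definitions and hashtag lists may differ in detail:

```python
def polarization_and_engagement(user_hashtags, rep_tags, dem_tags):
    """Polarization in [-1, 1] (-1 pro-Democrat, +1 pro-Republican) and
    engagement as the total number of partisan hashtags the user emitted."""
    r = sum(1 for h in user_hashtags if h in rep_tags)
    d = sum(1 for h in user_hashtags if h in dem_tags)
    engagement = r + d
    polarization = (r - d) / engagement if engagement else 0.0
    return polarization, engagement
```

Under this formulation, a user emitting only pro-Republican hashtags scores +1, which matches the two peaks at -1 and 1 in Fig. (a).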

Botness and political polarization. The distribution of botness exhibits a large peak at low values and a long tail (Fig. (c)). The dashed gray vertical lines show the thresholds used in Sec. 4.2 for constructing the reference Human and Bot populations. The density distribution for politically polarized users is skewed towards higher botness, showing that politically polarized users are more likely to be automated accounts. Fig. (d) shows the conditional density of polarization conditioned on botness. While the likelihood of being pro-Democrat or pro-Republican varies significantly with botness, for high botness scores the likelihood of being pro-Republican is consistently higher than that of being pro-Democrat. In other words, socialbot accounts are more likely to be pro-Republican.
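The conditional density in Fig. (d) can be approximated by normalizing a joint 2D histogram within each botness bin, so that each botness slice sums to one. A sketch using the 30 bins mentioned in the figure caption (the exact estimator used in the paper is an assumption here):

```python
import numpy as np

def conditional_density(polarization, botness, n_bins=30):
    """Estimate P(polarization | botness) by normalizing a joint histogram
    column-wise (one column per botness bin)."""
    joint, pol_edges, bot_edges = np.histogram2d(
        polarization, botness, bins=n_bins, range=[[-1, 1], [0, 1]])
    col_sums = joint.sum(axis=0, keepdims=True)
    # avoid dividing by zero in empty botness bins
    cond = np.divide(joint, col_sums, out=np.zeros_like(joint), where=col_sums > 0)
    return cond, pol_edges, bot_edges
```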

Political engagement of bots. Fig. (e) shows the CCDF of political engagement of the four reference populations, and it is apparent that the Bot and Suspended populations exhibit consistently higher political engagement than the Human and Protected populations. Fig. (f) shows the CCDF of political engagement by the political partisanship of bots, and we find that pro-Republican Bot accounts are more politically engaged than their pro-Democrat counterparts. In summary, socialbots are more engaged than humans, and pro-Republican bots are more engaged than their pro-Democrat counterparts.

Figure 27: Profiling influence, and linking to botness and political behavior. (a)(b) User influence for the reference populations, shown as a log-log CCDF plot (a) and as boxplots (b). (c) Probability distribution of polarization, conditional on user influence. (d) Boxplots of user influence for the pro-Democrat and pro-Republican Bot users. Numbers in parentheses show mean values.
Figure 28: Political polarization by user influence (x-axis) and bot score (y-axis). The y-axis is re-scaled so that an equal-length interval around any botness value contains the same number of users, which allows zooming into denser areas. The gray dashed horizontal line shows the threshold of 0.6 above which a user is considered a bot. The color in the map shows political polarization: areas colored bright blue (red) are areas where Democrats (Republicans) have considerably higher density than Republicans (Democrats). Areas where the two populations have similar densities are colored white.

6.2 User influence and polarization map

User influence across four populations. First, we study the distribution of user influence across the four reference populations constructed in Sec. 4.2. We plot the CCDF in Fig. (a) and summarize user influence as boxplots in Fig. (b) for each population. User influence is long-tail distributed (Fig. (a)) and it is higher for the Bot and Suspended populations than for the Human and Protected populations (Fig. (b)). There is a large discrepancy between the influence of Humans and Bots, with the average bot having 2.5 times more influence than the average human. We further break down the users in the Bot population by their political polarization. Fig. (d) aggregates as boxplots the influence of pro-Democrat and pro-Republican bots (note that not all bots are politically polarized). Notably, on a per-bot basis, pro-Republican bots are more influential than their pro-Democrat counterparts: the average pro-Republican bot is twice as influential as the average pro-Democrat bot.

Political polarization and user influence. Next, we analyze the relation between influence and polarization. Fig. (c) plots the probability distribution of political polarization, conditioned on user influence. While mid-range influential users are more likely to be pro-Republican than pro-Democrat, we observe the inverse situation at the higher end of the influence scale: very highly influential users are more likely to be pro-Democrat, which is consistent with the fact that many public figures were supportive of the Democrat candidate during the presidential campaign.

The polarization map. Finally, we create a visualization that jointly accounts for botness and user influence when studying political partisanship. We project each politically polarized user in #DebateNight onto the two-dimensional space of user influence (x-axis) and botness (y-axis). We compute 2D density estimates for the pro-Democrat and pro-Republican users (shown in the online supplement (?)). For each point in the space, we compute a score as the log of the ratio between the density of pro-Republican users and that of pro-Democrat users, rescaled to the range from -1 (mostly Democrat) to +1 (mostly Republican). The resulting map, dubbed the polarization map, is shown in Fig. 28 and provides a number of insights. We find a pro-Democrat area corresponding to highly influential users (already visible in Fig. (c)) that spans most of the range of botness values. The largest predominantly pro-Republican area corresponds to mid-range influence (also visible in Fig. (c)), but concentrates around small botness values, indicating a large pro-Republican population of mainly human users with regular user influence. We also observe that the top-right area (high botness and high influence) is predominantly red; in other words, highly influential bots are mostly pro-Republican.
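A histogram-based stand-in for the construction just described (the paper uses 2D density estimates; the bin count, the absence of smoothing, and the assumption that both axes are pre-normalized to [0, 1] are ours):

```python
import numpy as np

def polarization_map(dem_xy, rep_xy, bins=50, eps=1e-9):
    """Log-ratio of pro-Republican vs pro-Democrat 2D densities over the
    (influence, botness) plane, rescaled to [-1, 1].

    dem_xy, rep_xy: (n, 2) arrays of (influence, botness), both in [0, 1]."""
    rng = [[0, 1], [0, 1]]
    d_dem, _, _ = np.histogram2d(dem_xy[:, 0], dem_xy[:, 1],
                                 bins=bins, range=rng, density=True)
    d_rep, _, _ = np.histogram2d(rep_xy[:, 0], rep_xy[:, 1],
                                 bins=bins, range=rng, density=True)
    score = np.log((d_rep + eps) / (d_dem + eps))
    # rescale so the strongest imbalance maps to +/-1; 0 means equal density
    return score / max(np.abs(score).max(), eps)
```

Cells near +1 are then colored red (mostly Republican), cells near -1 blue (mostly Democrat), and cells near 0 white.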

7 Discussion

In this paper, we study the influence and the political behavior of socialbots. We introduce a novel algorithm for estimating user influence from retweet cascades in which the diffusion structure is not observed. We propose four measures to analyze the role and user influence of bots versus humans on Twitter during the 1st U.S. Presidential Debate of 2016. The first is the user influence, computed over all possible unfoldings of each cascade. Second, we use the BotOrNot API to retrieve the botness score for a large number of Twitter users. Lastly, by examining the 1000 most frequently-used hashtags we measure political polarization and engagement. We analyze the interplay of influence, botness and political polarization using a two-dimensional map – the polarization map. We make several novel findings, for example: bots are more likely to be pro-Republican; the average pro-Republican bot is twice as influential as its pro-Democrat counterpart; very highly influential users are more likely to be pro-Democrat; and highly influential bots are mostly pro-Republican.

Validity of analysis with respect to BotOrNot. The BotOrNot algorithm uses tweet content and user activity patterns to predict botness. However, this does not confound the conclusions presented in Sec. 6. First, political behavior (polarization and engagement) is computed from a list of hashtags specific to #DebateNight, while the BotOrNot predictor was trained before the elections took place and has no knowledge of the hashtags used during the debate. Second, one could draw a loose relation between political engagement and activity patterns; however, engagement counts the partisan hashtags a user emits, not their tweets, i.e., users can obtain a high political engagement score after emitting only a few very polarized tweets.

Assumptions, limitations and future work. This work makes a number of simplifying assumptions, some of which can be addressed in future work. First, the delay between crawling the tweets (Sept 2016) and computing the botness scores (July 2017) means that a significant number of users were suspended or deleted in the meantime. A future application could crawl tweets and bot scores simultaneously. Second, our binary hashtag-based partisanship characterization does not account for independent voters or other spectra of democratic participation, and future work could evaluate our approach against a clustering approach using follower ties to political actors (?). Last, this work computes the expected influence of users in a particular population, but it does not account for the aggregate influence of the population as a whole. Future work could generalize our approach to entire populations, which would allow answering questions like “Overall, were the Republican bots more influential than the Democrat humans?”.


  • [Barabási 2005] Barabási, A.-L. 2005. The origin of bursts and heavy tails in human dynamics. Nature 435(7039):207–11.
  • [Barberá et al. 2015] Barberá, P.; Jost, J. T.; Nagler, J.; Tucker, J. A.; and Bonneau, R. 2015. Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science 26(10):1531–1542.
  • [Bessi and Ferrara 2016] Bessi, A., and Ferrara, E. 2016. Social bots distort the 2016 U.S. presidential election online discussion. First Monday 21(11).
  • [Cha et al. 2010] Cha, M.; Haddadi, H.; Benevenuto, F.; and Gummadi, K. P. 2010. Measuring User Influence in Twitter: The Million Follower Fallacy. In ICWSM ’10, volume 10, 10–17.
  • [Chikhaoui et al. 2017] Chikhaoui, B.; Chiazzaro, M.; Wang, S.; and Sotir, M. 2017. Detecting communities of authority and analyzing their influence in dynamic social networks. ACM Trans. Intell. Syst. Technol. 8(6):82:1–82:28.
  • [Cho et al. 2013] Cho, Y.-S.; Galstyan, A.; Brantingham, P. J.; and Tita, G. 2013. Latent self-exciting point process model for spatial-temporal networks. Disc. and Cont. Dynamic Syst. Series B.
  • [Collins and Cox 2017] Collins, B., and Cox, J. 2017. Jenna Abrams, Russia’s clown troll princess, duped the mainstream media and the world. https://www.thedailybeast.com/jenna-abrams-russias-clown-troll-princess-duped-the-mainstream-media-and-the-world.
  • [Conover et al. 2011] Conover, M. D.; Ratkiewicz, J.; Francisco, M.; Goncalves, B.; Menczer, F.; and Flammini, A. 2011. Political polarization on Twitter. In ICWSM’11, 89–96. AAAI.
  • [Crane and Sornette 2008] Crane, R., and Sornette, D. 2008. Robust dynamic classes revealed by measuring the response function of a social system. PNAS 105(41):15649–15653.
  • [Davis et al. 2016] Davis, C. A.; Varol, O.; Ferrara, E.; Flammini, A.; and Menczer, F. 2016. Botornot: A system to evaluate social bots. In WWW Companion, 273–274. ACM.
  • [Du et al. 2013] Du, N.; Song, L.; Gomez-Rodriguez, M.; and Zha, H. 2013. Scalable Influence Estimation in Continuous-Time Diffusion Networks. In NIPS’13, 3147–3155.
  • [Gehl and Bakardjieva 2016] Gehl, R., and Bakardjieva, M. 2016. Socialbots and Their Friends: Digital Media and the Automation of Sociality.
  • [Kim and Kuljis 2010] Kim, I., and Kuljis, J. 2010. Applying content analysis to web-based content. Jour. of Comp. and Inf. Tech. 18(4):369–375.
  • [Kollanyi, Howard, and Woolley 2016] Kollanyi, B.; Howard, P. N.; and Woolley, S. C. 2016. Bots and automation over Twitter during the first U.S. presidential debate: Comprop data memo 2016.1. Oxford, UK.
  • [Kwak et al. 2010] Kwak, H.; Lee, C.; Park, H.; and Moon, S. 2010. What is twitter, a social network or a news media? In WWW’10, 591–600.
  • [Li and Zha 2013] Li, L., and Zha, H. 2013. Dyadic event attribution in social networks with mixtures of hawkes processes. In CIKM’13, 1667–1672. ACM.
  • [Linderman and Adams 2014] Linderman, S., and Adams, R. 2014. Discovering latent network structure in point process data. In ICML’14, 1413–1421.
  • [Mishra, Rizoiu, and Xie 2016] Mishra, S.; Rizoiu, M.-A.; and Xie, L. 2016. Feature driven and point process approaches for popularity prediction. In CIKM’16, 1069–1078.
  • [Page et al. 1999] Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1999. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
  • [Rizoiu et al. 2017] Rizoiu, M.-A.; Xie, L.; Sanner, S.; Cebrian, M.; Yu, H.; and Van Hentenryck, P. 2017. Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity. In WWW’17, 735–744.
  • [Rodriguez, Balduzzi, and Schölkopf 2011] Rodriguez, M. G.; Balduzzi, D.; and Schölkopf, B. 2011. Uncovering the Temporal Dynamics of Diffusion Networks. In ICML’11, 561–568.
  • [Rozenshtein and Gionis 2016] Rozenshtein, P., and Gionis, A. 2016. Temporal pagerank. In ECML-PKDD’16, 674–689. Springer.
  • [Shen et al. 2014] Shen, H.; Wang, D.; Song, C.; and Barabási, A. 2014. Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes. In AAAI’14, number 3, 291–297. AAAI Press.
  • [Simma and Jordan 2010] Simma, A., and Jordan, M. I. 2010. Modeling events with cascades of poisson processes. UAI’10.
  • [sup 2018] 2018. Appendix: #DebateNight: The role and the influence of socialbots on Twitter during the 1st U.S. presidential debate. https://www.dropbox.com/s/b5zszszlccb37fu/ICWSM-2018-SI.pdf?dl=0.
  • [Timberg, Dwoskin, and Entous 2017] Timberg, C.; Dwoskin, E.; and Entous, A. 2017. Russian Twitter account pretending to be Tennessee GOP fools celebrities, politicians. http://www.chicagotribune.com/bluesky/technology/ct-russian-twitter-account-tennessee-gop-20171018-story.html.
  • [Varol et al. 2017] Varol, O.; Ferrara, E.; Davis, C. A.; Menczer, F.; and Flammini, A. 2017. Online human-bot interactions: Detection, estimation, and characterization. In ICWSM’17, 280–289. AAAI.
  • [Weng et al. 2010] Weng, J.; Lim, E.-P.; Jiang, J.; and He, Q. 2010. Twitterrank: finding topic-sensitive influential twitterers. In WSDM’10, 261–270. ACM.
  • [Woolley and Guilbeault 2017] Woolley, S. C., and Guilbeault, D. 2017. Computational propaganda in the united states of america: Manufacturing consensus online: Working paper 2017.5.
  • [Wu and Huberman 2007] Wu, F., and Huberman, B. A. 2007. Novelty and collective attention. PNAS 104(45):17599–601.
  • [Yates, Joselow, and Goharian 2016] Yates, A.; Joselow, J.; and Goharian, N. 2016. The news cycle’s influence on social media activity. In ICWSM’16.
  • [Yu et al. 2015] Yu, L.; Cui, P.; Wang, F.; Song, C.; and Yang, S. 2015. From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics. In ICDM’15, 559–568.
  • [Zhao et al. 2015] Zhao, Q.; Erdogdu, M. A.; He, H. Y.; Rajaraman, A.; and Leskovec, J. 2015. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. In KDD’15.
