Twitter Corpus of the #BlackLivesMatter Movement And Counter Protests: 2013 to 2020
Abstract
Black Lives Matter (BLM) is a grassroots movement protesting violence towards Black individuals and communities, with a focus on police brutality. The movement gained significant media and political attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd and the shooting of Jacob Blake in 2020 [1]. Because the movement is decentralized, the #BlackLivesMatter social media hashtag has come both to represent the movement and to serve as a call to action. Similar hashtags have appeared to counter the BLM movement, such as #AllLivesMatter and #BlueLivesMatter. We introduce a data set of 41.8 million tweets from 10 million users, each containing at least one of the following keywords: BlackLivesMatter, AllLivesMatter, and BlueLivesMatter.
Keywords: social media, Twitter, hashtags, social movements, protests, policing
1 Value of the Data
- These data are useful because they showcase the entire course of a large, ongoing social movement (Black Lives Matter) and its counter protests (All Lives Matter and Blue Lives Matter). To our knowledge, no other Twitter data sets exist that cover the entire span of the Black Lives Matter movement to date.
- All researchers interested in systemic racism, social movements, grassroots campaigns, racial inequality, police brutality and counter protests, especially those working in the fields of computational social science, communications, and political science, can benefit from this data.
- The data set contains 41.8 million posts from 10 million users and can be used to identify linguistic patterns associated with the social movements and their counter protests, social networks (through friend/follower user data), temporal and spatial patterns (through the use of timestamps and latitude/longitude coordinates), inter- and intra- movement dialog and the spread of news and misinformation (through retweets and tweets linking news articles).
- Since 2013, the BLM movement has grown exponentially, resulting in global protests and several counter protests. This historical data, starting in 2013 and ending in 2020, permits researchers to track this grassroots movement from its social media beginnings.
2 Data Description
Tweets containing the keywords BlackLivesMatter, AllLivesMatter and BlueLivesMatter were collected from the Twitter API from January 2013 to June 2020. Table 1 reports the total number of tweets and users for the entire data set and for each keyword. It also reports counts of retweets (original tweets which are shared by other users on the platform), replies (tweets which directly respond to another tweet), and geotagged tweets (tweets with associated latitude/longitude coordinates), along with the top languages (the automatically detected language of each tweet). Retweets may or may not contain additional content created by the user doing the retweeting.
Keyword | Tweets | Users | Retweets | Replies | Geotagged | Top Languages |
---|---|---|---|---|---|---|
All | 41,801,153 | 10,136,019 | 30,377,162 | 2,033,245 | 69,969 | en, fr, es, pt, ja |
BlackLivesMatter | 36,892,699 | 9,543,924 | 27,565,206 | 1,583,077 | 61,392 | en, fr, es, pt, ja |
AllLivesMatter | 3,001,012 | 1,462,712 | 1,463,972 | 368,035 | 8,977 | en, es, nl, ja, fr |
BlueLivesMatter | 3,352,437 | 811,805 | 2,174,139 | 195,525 | 2,049 | en, fr, es, ja, de |
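The counts in Table 1 are easier to compare across keywords as proportions. The following is a minimal sketch that copies the table values and expresses retweets, replies, and geotagged tweets as fractions of each row's tweet count. The names `TABLE_1` and `shares` are introduced here for illustration only. The last lines show that the per-keyword tweet counts sum to more than the "All" row, which would be consistent with tweets containing more than one keyword being counted in each matching row (an inference, not something stated in the text).

```python
# Counts copied from Table 1: (tweets, users, retweets, replies, geotagged).
TABLE_1 = {
    "All": (41_801_153, 10_136_019, 30_377_162, 2_033_245, 69_969),
    "BlackLivesMatter": (36_892_699, 9_543_924, 27_565_206, 1_583_077, 61_392),
    "AllLivesMatter": (3_001_012, 1_462_712, 1_463_972, 368_035, 8_977),
    "BlueLivesMatter": (3_352_437, 811_805, 2_174_139, 195_525, 2_049),
}


def shares(row):
    """Return retweet, reply, and geotagged counts as fractions of the row's tweets."""
    tweets, _users, retweets, replies, geotagged = row
    return {
        "retweets": retweets / tweets,
        "replies": replies / tweets,
        "geotagged": geotagged / tweets,
    }


for keyword, row in TABLE_1.items():
    parts = ", ".join(f"{name} {value:.2%}" for name, value in shares(row).items())
    print(f"{keyword:>16}: {parts}")

# Assumption: a tweet matching several keywords appears in each keyword row,
# so the keyword rows can sum to more than the "All" row.
keyword_sum = sum(row[0] for name, row in TABLE_1.items() if name != "All")
overlap_excess = keyword_sum - TABLE_1["All"][0]
print("sum of keyword rows minus All:", overlap_excess)
```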
Tweets also contain a large number of other pieces of metadata, such as user profile data and place information. User profiles contain information such as user handles, free text descriptions and profile images. Places are named locations users decide to associate with a tweet. While Places describe physical locations, they do not imply the tweet originated from this location. Twitter users may manually tag a location when their tweet is about that Place. Due to the large number of additional fields available for each tweet, we do not provide counts for any additional content.
The monthly volume of each keyword is plotted in Figure 1. Here we plot the seven-day running average of the total count (logged) of all tweets containing one of our keywords. Labels consisting of a single name mark the dates of high-profile killings related to police brutality.
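The smoothing described above can be sketched as follows. The text does not specify the order of operations, the logarithm base, or whether the window is trailing or centered, so this sketch assumes a trailing seven-day mean followed by a base-10 logarithm. The function name `running_log_average` and the sample counts are hypothetical.

```python
import math
from collections import deque


def running_log_average(daily_counts, window=7):
    """Trailing `window`-day mean of daily counts, then log10 of that mean.

    Assumptions: mean first, logarithm second; base 10; trailing window.
    Days before a full window is available, and windows whose mean is
    zero, yield None.
    """
    recent = deque(maxlen=window)
    total = 0
    smoothed = []
    for count in daily_counts:
        if len(recent) == window:
            total -= recent[0]  # oldest value is evicted by the append below
        recent.append(count)
        total += count
        if len(recent) < window:
            smoothed.append(None)
        else:
            mean = total / window
            smoothed.append(math.log10(mean) if mean > 0 else None)
    return smoothed


# Synthetic daily counts (not from the data set): a quiet week, then a spike.
example = running_log_average([10, 10, 10, 10, 10, 10, 10, 6310])
print(example)
```

Taking the logarithm after averaging keeps days with zero tweets from producing undefined values inside the window, which is one reason to prefer this order when the series is sparse.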

3 Experimental Design, Materials and Methods
On July 14, 2016, we set up a data puller using the Python package TwitterMySQL
We note that the Twitter API limits such streams to 1% of the total Twitter volume at any given moment. To see if our keyword data set was limited at any point, we compared the monthly keyword volume to a full 1% monthly pull (not limited to any single keyword, location, etc.). Over the 4-year time span, our keyword data set pulled in a monthly average of 1,176,161 tweets (SD = 4,629,878) as compared to a monthly average of 94,893,476 tweets (SD = 27,394,826) from the full 1% pull. Thus, we do not believe our data set was limited by the Twitter API.
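The comparison above amounts to checking how close the keyword stream came to the volume of the full 1% pull. A minimal sketch of that check is given below, using the two monthly means reported in the text. The per-month dictionaries are synthetic, and the function names and the `threshold` value are hypothetical choices for illustration, not the authors' procedure.

```python
def fraction_of_sample(keyword_volume, sample_volume):
    """Keyword stream volume as a fraction of the full 1% sample volume."""
    return keyword_volume / sample_volume


def months_near_cap(keyword_by_month, sample_by_month, threshold=0.9):
    """Return months whose keyword volume reaches `threshold` of the sample volume.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    return [
        month
        for month, volume in sorted(keyword_by_month.items())
        if fraction_of_sample(volume, sample_by_month[month]) >= threshold
    ]


# Monthly means reported in the text.
mean_ratio = fraction_of_sample(1_176_161, 94_893_476)
print(f"keyword mean / sample mean = {mean_ratio:.4f}")

# Synthetic monthly counts (not from the data set) to show the per-month check.
keyword_by_month = {"2016-08": 900_000, "2016-09": 95_000_000}
sample_by_month = {"2016-08": 94_000_000, "2016-09": 96_000_000}
flagged = months_near_cap(keyword_by_month, sample_by_month)
print(flagged)
```

Because the keyword series has a standard deviation several times its mean, a per-month check of this kind is more informative than the ratio of means alone.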
Due to server maintenance, there were periods when we were unable to collect data. These include: October 17, 2016 through November 23, 2016; January 1, 2017 through January 21, 2017; March 11, 2017 through March 16, 2017; May 2, 2018 through December 18, 2018; and March 16, 2019 through March 20, 2019. Additionally, the Black Lives Matter movement began in 2013, roughly three years before the beginning of our data collection. In order to fill these gaps, we used the Python package GetOldTweets3 [9], which pulls historical tweets containing a given keyword. These tweets were pulled in June 2020. Using this method, we collected 4,276,423 historical tweets.
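To make the extent of the outages concrete, the sketch below encodes the periods listed above and computes their lengths. It assumes the end dates are inclusive, which the text does not state. The helper `needs_backfill` is a hypothetical name for the rule implied by the text: dates before streaming began on July 14, 2016, or inside an outage, are covered by the historical pull.

```python
from datetime import date

STREAM_START = date(2016, 7, 14)  # start of prospective collection, from the text

# Outage periods listed in the text; end dates are assumed to be inclusive.
GAPS = [
    (date(2016, 10, 17), date(2016, 11, 23)),
    (date(2017, 1, 1), date(2017, 1, 21)),
    (date(2017, 3, 11), date(2017, 3, 16)),
    (date(2018, 5, 2), date(2018, 12, 18)),
    (date(2019, 3, 16), date(2019, 3, 20)),
]


def inclusive_days(start, end):
    """Number of calendar days from `start` through `end`, counting both."""
    return (end - start).days + 1


def needs_backfill(day):
    """True if `day` falls before streaming began or inside a listed outage."""
    return day < STREAM_START or any(start <= day <= end for start, end in GAPS)


for start, end in GAPS:
    print(start, end, inclusive_days(start, end))

total_outage_days = sum(inclusive_days(start, end) for start, end in GAPS)
print("total outage days:", total_outage_days)
```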
Having two separate methods of pulling tweet data (prospective using the streaming API and retrospective using GetOldTweets3
Due to Twitter’s Terms of Service, only numeric tweet IDs can be publicly shared. The numeric IDs can be used to pull the full tweet set using the Twitter API. There are a number of open source software packages which allow researchers to easily interface with the API.
The authors used the Python package TwitterMySQL, which saves tweet information in a MySQL database.
Other packages exist that do not rely on relational databases, such as the Python package twarc.
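Whichever client is used, recovering tweets from the shared IDs generally means reading the ID list and sending it to the API in batches. The sketch below shows only that preparation step and makes no network calls. The batch size of 100 mirrors the per-request limit of Twitter's v1.1 `statuses/lookup` endpoint and should be treated as an assumption for other endpoints. The function names and the sample lines are hypothetical.

```python
from itertools import islice


def read_tweet_ids(lines):
    """Yield numeric tweet IDs from text lines, skipping blanks and non-numeric lines."""
    for line in lines:
        token = line.strip()
        if token.isdigit():
            yield token  # kept as strings, ready to join into a request parameter


def batched(ids, size=100):
    """Yield lists of at most `size` IDs, preserving order."""
    iterator = iter(ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Synthetic lines standing in for an ID file (these are not real tweet IDs).
sample_lines = ["1001\n", "\n", "not-an-id\n", " 1002 \n", "1003"]
for batch in batched(read_tweet_ids(sample_lines), size=2):
    print(",".join(batch))
```

Each batch would then be passed to the API client of choice. Tweets that have been deleted or made private since collection cannot be recovered this way, so a rehydrated set may be smaller than the ID list.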
4 Ethics Statement
The data used in this article are publicly available and are distributed in accordance with Twitter’s Terms of Service. Additionally, no human subjects were used in the data collection.
5 Acknowledgments
This research was supported in part by the Intramural Research Program of the NIH, National Institute on Drug Abuse (NIDA).
6 Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
References
- (2020-06) #BlackLivesMatter surges on Twitter after George Floyd’s death. Pew Research Center. https://www.pewresearch.org/fact-tank/2020/06/10/blacklivesmatter-surges-on-twitter-after-george-floyds-death/
- (2019) Tweeting for social justice in #Ferguson: affective discourse in Twitter hashtags. New Media & Society 21 (7), pp. 1636–1653.
- (2018) Divergent discourse between protests and counter-protests: #BlackLivesMatter and #AllLivesMatter. PLoS ONE 13 (4), pp. e0195644.
- (2017) The social media response to Black Lives Matter: how Twitter users interact with Black Lives Matter through hashtag use. Ethnic and Racial Studies 40 (11), pp. 1814–1830.
- (2018) Important tweets matter: predicting retweets in the #BlackLivesMatter talk on Twitter. Computers in Human Behavior 85, pp. 106–115.
- (2018) Scaling social movements through social media: the case of Black Lives Matter. Social Media + Society 4 (4), pp. 2056305118807911.
- (2016) Narrative agency in hashtag activism: the case of #BlackLivesMatter. Media and Communication 4 (4), pp. 13.
