Detecting Malicious Content on Facebook

Detecting Malicious Content on Facebook

Prateek Dewan, Ponnurangam Kumaraguru
Indraprastha Institute of Information Technology - Delhi (IIITD), India
Cybersecurity Education and Research Centre (CERC)
{prateekd, pk}

Online Social Networks (OSNs) witness a rise in user activity whenever an event takes place. Malicious entities exploit this spur in user-engagement levels to spread malicious content that compromises system reputation and degrades user experience. It also generates revenue from advertisements, clicks, etc. for the malicious entities. Facebook, the world’s biggest social network, is no exception and has recently been reported to face much abuse through scams and other type of malicious content, especially during news making events. Recent studies have reported that spammers earn $200 million just by posting malicious links on Facebook. In this paper, we characterize malicious content posted on Facebook during 17 events, and discover that existing efforts to counter malicious content by Facebook are not able to stop all malicious content from entering the social graph. Our findings revealed that malicious entities tend to post content through web and third party applications while legitimate entities prefer mobile platforms to post content. In addition, we discovered a substantial amount of malicious content generated by Facebook pages. Through our observations, we propose an extensive feature set based on entity profile, textual content, metadata, and URL features to identify malicious content on Facebook in real time and at zero-hour. This feature set was used to train multiple machine learning models and achieved an accuracy of 86.9%. The intent is to catch malicious content that is currently evading Facebook’s detection techniques. Our machine learning model was able to detect more than double the number of malicious posts as compared to existing malicious content detection techniques. Finally, we built a real world solution in the form of a REST based API and a browser plug-in to identify malicious Facebook posts in real time.

Detecting Malicious Content on Facebook

Prateek Dewan, Ponnurangam Kumaraguru Indraprastha Institute of Information Technology - Delhi (IIITD), India Cybersecurity Education and Research Centre (CERC) {prateekd, pk}

Copyright © 2015, Association for the Advancement of Artificial Intelligence ( All rights reserved.


Social network activity rises considerably during events that make the news, like sports, natural calamities, etc. (?). The FIFA World Cup in 2014, for example, saw a record-breaking 350 million users generating over 3 billion posts on Facebook over a period of 32 days. 111 Such colossal magnitude of activity makes OSNs an attractive venue for malicious entities. Facebook, world’s biggest social network, is no exception. Being the most preferred OSN for users to get news (?), Facebook is potentially one of the most lucrative OSNs for malicious entities. Recently, a group of malicious users exploited the famous biting incident during the 2014 FIFA World Cup, where an Uruguayan player was banned for biting an opponent. Attackers used the viral nature of this incident to spread links on Facebook, pointing to a phishing page prompting visitors to sign a petition in defense of the Uruguayan player. 222 The petition required a user to sign in with details such as name, country of residence, mobile phone number and email address. The petitioner could potentially end up on a spam mailing list, on the receiving end of a malicious attachment or even subjected to a targeted attack. In another recent incident of Malaysian Airline MH17 flight crash, scammers placed dozens of so-called ‘community pages’ on Facebook, dedicated to victims of the tragedy. On the page, Facebook users were tricked into clicking links showing extra or unseen footage of the crash. Instead of seeing a video, they were led to various pop-up ads for porn sites or online casinos. 333 Such activity not only violates Facebook’s terms of service, but also degrades user experience. It has been claimed that Facebook spammers make $200 million just by posting links. 444 Facebook has confirmed spam as a serious issue, and taken steps to reduce spam and malicious content in users’ newsfeed recently (?).

The problem of identifying malicious content is not specific to Facebook and has been widely studied on other OSNs in the past. Researchers have used feature based machine learning models to detect spam and other types of malicious content on OSNs like Twitter, and achieved good results (??). However, existing approaches to detect malicious content in other OSNs like Twitter, cannot be directly ported to Facebook because they heavily rely on features that aren’t publicly available from Facebook. These include profile, and network information, age of the account, total number of messages posted, number of social connections, etc.

In this paper, we highlight that existing techniques used by Facebook for countering malicious content do not eliminate all malicious posts completely. Although Facebook’s immune system (?) seems to perform well at protecting its users from malicious content, our focus is on detecting the fraction of content which evades this system. We identify some key characteristics of malicious content spread on Facebook, which distinguishes it from legitimate content. Our dataset consists of 4.4 million public posts generated by 3.3 million unique entities during 17 events, across a 16 month time frame (April 2013 - July 2014). We then propose an extensive set of 42 features which can be used to distinguish malicious content from legitimate content in real time and at zero-hour. We emphasize on zero-hour detection because content on OSNs spreads like wildfire, and can reach thousands of users within seconds. Such velocity and reach of OSN content makes it hard to control the spread of malicious content, if not detected instantly. We apply machine learning techniques to identify malicious posts on Facebook using this feature set and achieve a maximum accuracy of 86.9% using the Random Forest classifier. We also compare our technique with past research and find that our machine learning model is able to detect more than twice the number of malicious posts as compared to clustering based campaign detection techniques used in the past.

Related work

Facebook has its own immune system (?) to safeguard its users from unwanted malicious content. Researchers at Facebook built and deployed a coherent, scalable, and extensible real time system to protect their users and the social graph. This system performs real time checks and classifications on every read and write action. Designers of this complex system used an exhaustive set of components and techniques to differentiate between legitimate actions and spam. These components were standard classifiers like Random Forest, Support Vector Machines, Logistic Regression, a feature extraction language, dynamic model loading, a policy engine, and feature loops. Interestingly, despite this complex immune system deployed by Facebook, unwanted spam, phishing, and other malicious content continues to exist and thrive on Facebook. Although the immune system deployed by Facebook utilizes a variety of techniques to safeguard its users, authors did not present an evaluation of the system to suggest how accurately and efficiently the system is able to capture malicious content.

Detection of malicious content on Facebook

Gao et al., in 2010, presented an initial study to quantify and characterize spam campaigns launched using accounts on Facebook (?). They studied a large anonymized dataset of 187 million asynchronous “wall” messages between Facebook users, and used a set of automated techniques to detect and characterize coordinated spam campaigns. Authors detected roughly 200,000 malicious wall posts with embedded URLs, originating from more than 57,000 user accounts. Following up their work, Gao et al. presented an online spam filtering system that could be deployed as a component of the OSN platform to inspect messages generated by users in real-time (?). Their approach focused on reconstructing spam messages into campaigns for classification rather than examining each post individually. They were able to achieve a true positive rate of slightly over 80% using this technique, and achieved an average throughput of 1,580 messages/sec with an average processing latency of 21.5ms on their Facebook dataset of 187 million wall posts. However, the clustering approach used by authors always marked a new cluster as non malicious, and was unable to detect malicious posts if the system had not seen a similar post before. We overcome this drawback in our work by eliminating dependence on post similarity, and using classification instead of clustering.

In an attempt to protect Facebook users from malicious posts, Rahman et al. designed an efficient social malware detection method which took advantage of the social context of posts (?). Authors were able to achieve a maximum true positive accuracy rate of 97%, using a SVM based classifier trained on 6 features, and requiring 46 milliseconds to classify a post. This model was then used to develop MyPageKeeper 555, a Facebook app to protect users from malware and malicious posts. Similar to Gao et al’s work (?), this work was also targeted at detecting spam campaigns, and relied on message similarity features.

Detection of malicious content on other OSNs

Multiple machine learning based techniques have been proposed in the past to detect malicious content on other social networks such as Twitter and YouTube (????). The efficiency of such techniques comes from features like age of the account, number of social connections, past messages of the user, etc. (?). However, none of these features are available on Facebook publicly. Other techniques make use of OSN specific features like user replies, user mentions, retweets (Twitter) (?), post views and ratings (YouTube) (?) to identify malicious content, which cannot be ported to Facebook. Blacklists have been shown to be highly ineffective initially, capturing less than 20% URLs at zero-hour (?). None of these techniques can thus be used for efficient zero-hour detection of malicious content on Facebook. To the best of our knowledge, our technique is one of the first attempts towards zero-hour detection of malicious content on Facebook.

Most aforementioned techniques to identify malicious content on Facebook largely rely on message similarity features. Such techniques are reasonably efficient in detecting content which they have seen in the past, for example, campaigns. However, none of the these techniques are capable of detecting malicious posts which their systems haven’t seen in the past. Researchers have acknowledged this drawback (?) and zero-hour detection of malicious content on Facebook remains an unaddressed problem.


There exists a wide range of malicious content on OSNs today. These include phishing URLs, spreading malware, advertising campaigns, content originating from compromised profiles, artificial reputation gained through fake likes, etc. We do not intend to address all such attacks. We focus our analysis on identifying posts containing one or more malicious URLs and creating automated means to detect such posts in real time, without looking at the landing pages of the URLs. We emphasize on not visiting the landing pages of URLs since this process induces time lag and increases the time taken by real time systems to make a judgment on a post. Existing methods involve detection of such malicious posts by grouping them into campaigns (?), or by looking up public blacklists like PhishTank, Google Safebrowsing, etc. to identify malicious URLs. However, as previously discussed, both campaign detection techniques and URL blacklists prove ineffective while the attack is new. For the scope of this work, we refer to a post as malicious if it contains one or more malicious URLs.

Data collection

We collected data using Facebook’s Graph API Search endpoint (?) during 17 events that took place between April 2013 and July 2014. The search endpoint in version 1.0 of the Graph API allows searching over many public objects in the social graph, like users, posts, events, pages etc., using a search query keyword. We used event specific terms for each of the 17 events (see Table 1) to collect relevant public posts. Unlike other social networks like Twitter, Facebook does not provide an API endpoint to collect a continuous random sample of public posts in real time. Thus, we used the search API to collect data. A drawback of the search method is that if a post is deleted or removed (either by the user herself, or by Facebook) before our data collection module queries the API, it would not appear in the search results. We repeated the search every 15 minutes to overcome this drawback to some extent. In all, we collected over 4.4 million public Facebook posts generated by over 3.3 million unique entities. Table 2 shows the descriptive statistics of our final dataset.

Event (keywords) # Posts Description
Missing Air Algerie Flight AH5017 (ah5017; air algerie) 6,767 Air Algerie flight 5017 disappeared from radar 50 minutes after take off on July 24, 2014. Found crashed near Mali; no survivors.
Boston Marathon Blasts (prayforboston; marathon blasts; boston marathon) 1,480,467 Two pressure cooker bombs exploded during the Boston Marathon at 2:49 pm EDT, April 15, 2013, killing 3 and injuring 264.
Cyclone Phailin (phailin; cyclonephailin) 60,016 Phailin was the second-strongest tropical cyclone ever to make landfall in India on October 11, 2013.
FIFA World Cup 2014 (worldcup; fifaworldcup) 67,406 20th edition of FIFA world cup, began on June 12, 2014. Germany beat Argentina in the final to win the tournament.
Unrest in Gaza (gaza) 31,302 Israel launched Operation Protective Edge in the Hamas-ruled Gaza Strip on July 8, 2014.
Heartbleed bug in OpenSSL (heartbleed) 8,362 Security bug in OpenSSL disclosed on April 1, 2014. About 17% of the world’s web servers found to be at risk.
IPL 2013 (ipl; ipl6; ipl2013) 708,483 Edition 6 of IPL cricket tournament hosted in India, April-May 2013.
IPL 2014 (ipl; ipl7) 59,126 Edition 7 of IPL cricket tournament jointly hosted by United Arab Emirates and India, April-May 2013.
Lee Rigby’s murder in Woolwich (woolwich; londonattack) 86,083 British soldier Lee Rigby attacked and murdered by Michael Adebolajo and Michael Adebowale in Woolwich, London on May 22, 2013.
Malaysian Airlines Flight MH17 shot down (mh17) 27,624 Malaysia Airlines Flight 17 crashed on 17 July 2014, presumed to have been shot down, killing all 298 on board.
Metro-North Train Derailment (bronx derailment; metro north derailment; metronorth) 1,165 A Metro-North Railroad Hudson Line passenger train derailed near the Spuyten Duyvil station in the New York City borough of the Bronx on December 1, 2013. Four killed, 59 injured.
Washington Navy Yard Shootings (washington navy yard; navy yard shooting; NavyYardShooting) 4,562 Lone gunman Aaron Alexis killed 12 and injured 3 in a mass shooting at the Naval Sea Systems Command (NAVSEA) headquarters inside the Washington Navy Yard in Washington, D.C. on Sept. 16, 2013.
Death of Nelson Mandela (nelson; mandela; nelsonmandela; madiba) 1,319,783 Nelson Mandela, the first elected President of South Africa, died on December 5, 2013. He was 95.
Birth of the fist Royal Baby (RoyalBabyWatch; kate middleton; royalbaby) 90,096 Prince George of Cambridge, first son of Prince William, and Catherine (Kate Middleton), was born on July 22, 2013.
Typhoon Haiyan (haiyan; yolanda; typhoon philippines) 486,325 Typhoon Haiyan (Yolanda), one of the strongest tropical cyclones ever recorded, devastated parts of Southeast Asia on Nov. 8, 2013.
T20 Cricket World Cup (wt20; wt2014) 25,209 Fifth ICC World Twenty20 cricket competition, hosted in Bangladesh during March-April, 2014. Sri Lanka won the tournament.
Wimbledon Tennis 2014 (wimbledon) 2,633 128th Wimbledon Tennis championship held between June 23, and July 6, 2014. Novak Djokovic from Serbia won the championship.
Table 1: Event name, keywords used as search queries, number of posts, and description for each of the 17 events in our dataset.
Event selection

All events we picked for our analysis, made headlines in international news. To maintain diversity, we selected events covering various domains of news events like political, sport, natural hazards, terror strikes and entertainment news. We tried to pick a mixture of crisis, and non crisis events which took place in, and affected different parts of the world. In terms of data, we selected events with at least 1,000 public Facebook posts. For all 17 events, we started data collection from the time the event took place, and stopped about two weeks after the event ended.

Labeled dataset creation

To create a labeled dataset, we first filtered out all posts containing at least one URL. We used regular expressions to filter out all possible URLs from the message field of the posts. These URLs were added to the set of URLs present in the link field (if available) for each post. These extracted URLs were then visited using Python’s Requests package 666 and LongURL API 777, in case the Requests package failed. Visiting the landing pages of the URLs before the blacklist lookup helped us to eliminate invalid URLs and capture the correct final destination URLs corresponding to shortened URLs, if any. After the extraction and validation process, we were left with a total of 480,407 unique URLs across 1,222,137 unique posts (Table 2). Each URL was then subjected to five blacklist lookups, viz. SURBL (?), Google Safebrowsing (?), PhishTank (?), VirusTotal (?), and Web of Trust (?). This methodology of identifying malicious content using URL blacklists has also been used multiple times in past research (???).

Unique posts 4,465,371
Unique entities 3,373,953
- Unique users 2,983,707
- Unique pages 390,246
Unique URLs 480,407
Unique posts with URLs 1,222,137
Unique entities posting URLs 856,758
Unique posts with malicious URLs 11,217
Unique entities posting malicious URLs 7,962
Unique malicious URLs 4,622
Table 2: Descriptive statistics of complete dataset collected over April 2013 - August 2014.

The scan results returned by the VirusTotal API contain domain information from multiple services like TrendMicro, BitDefender, WebSense ThreatSeeker, etc. for a given domain. We marked a URL as malicious if one or more of these services categorized the domain of the URL as spam, malicious, or phishing. The Web of Trust (WOT) API returns a reputation score for a given domain. Reputations are measured for domains in several components, for example, trustworthiness. For each {domain, component} pair, the system computes two values: a reputation estimate and the confidence in the reputation. Together, these indicate the amount of trust in the domain in the given component. A reputation estimate of below 60 indicates unsatisfactory. Also, the WOT browser add-on requires a confidence value of 10 before it presents a warning about a website. We tested the domain of each URL in our dataset for two components, viz. Trustworthiness and Child Safety. For our experiment, a URL was marked as malicious if both the aforementioned conditions were satisfied. That is, if the reputation estimate for a domain of a URL was below 60 (unsatisfactory, poor or very poor) and the confidence in the reputation was 10, the URL was marked malicious. In addition to reputations, the WOT rating system also computes categories for websites based on votes from users and third parties. We marked a URL as malicious if it fell under the Negative or Questionable category group. 888The exact category labels corresponding to Negative and Questionable categories can be found at Further, a URL was marked malicious if it was present under any category (spam / malware / phishing) in the SURBL, Google Safebrowsing or PhishTank blacklists.

The reason for including WOT reputation scores in our labeled dataset of malicious posts was two-fold. Firstly, as previously discussed, one of the goals of this work is to evaluate Facebook’s current techniques to counter malicious content. Facebook partnered with WOT (?) to protect its users from malicious URLs (discussed further in Analysis and results section). Evaluating the effectiveness of Facebook’s use of WOT was one of the ways to achieve our goal. Secondly, when it comes to real world events, malicious entities tend to engage in spreading fake, untrustworthy and obscene content to degrade user experience (??). This kind of information, despite being malicious, is not captured by blacklists like Google Safebrowsing and SURBL, since they do not fall under the more obvious kinds of threats like malware and phishing. WOT scores helped us to identify and tag such content.

Analysis and results

We now present our findings about the effectiveness of Facebook’s current techniques of malicious content detection and the differences between malicious and legitimate content on Facebook. We then use these difference to identify a set of 42 features and apply standard machine learning techniques to automatically differentiate malicious content from legitimate content.

Efficiency of Facebook’s current techniques

Facebook’s immune system uses multiple URL blacklists to detect malicious URLs in real time and prevent them from entering the social graph (?). Understandably, the inefficiency of blacklists to detect URLs at zero-hour makes this technique considerably ineffective (?). We tried to check if Facebook was taking any measures to overcome this drawback. To this end, we re-queried the Graph API for all malicious posts in our dataset and observed if Facebook was able to detect and remove malicious posts at a later point in time. We also studied the effectiveness of Facebook’s partnership with WOT to protect its users from malicious URLs (?).

Upon re-querying the Graph API in November 2014, we found that only 3,921 out of the 11,217 (34.95%) malicious posts had been deleted. It was surprising to note that almost two thirds of all malicious posts (65.05%) which got past Facebook’s real time detection filters remained undetected even after at least 4 months (between July 2014, when we stopped data collection, and November 2014, when we re-queried the API) from the date of post. Collectively, these posts had gathered likes from 52,169 unique users and comments from 8,784 unique users at the time we recollected their data in November 2014. Figure 1 shows one such malicious post from our dataset which went undetected by Facebook. The short URL in the post points to a scam website which misleads users into earning money by liking posts on Facebook. Using the URL endpoint of the Graph API 999, we also found that the 4,622 unique URLs present in the 11,217 malicious posts had been shared on Facebook over 37 million times. A possible reason for the continued existence of malicious posts could be that Facebook does not rescan a post once it passes through Facebook’s content filter and into the social network.

Figure 1: One of the 7,296 malicious posts from our dataset which were not deleted by Facebook. We revisited this post after 11 months of being posted.

For the 3,921 posts that were deleted, an interesting aspect to study was the amount of time it took for these posts to get removed. Although we were not able capture this information, we used the minimum retention period (time difference between the time of post creation and the time of post capture in our dataset) to estimate a lower bound on this value. We observed that the median time difference between a post’s creation time and capture time was 4.64 hours ( = 41.99 hours, = 128.7 hours, min. = 1 second, max = 54 days). Figure 2 represents the minimum retention period of the 3,921 malicious posts in our dataset, which were not deleted by Facebook until November, 2014.

Figure 2: Minimum retention period (in hours) of malicious posts in our dataset which were removed from Facebook. Approximately 51% of the 3,921 posts were captured within the first 5 hours of being posted.

To get an accurate approximation of the time it took for a malicious post to get deleted, the ideal approach would have been to re-query the Graph API for each post we collected, repeatedly after a fixed (small) interval of time. However, re-querying the API for all 4.4 million posts in our dataset repeatedly and periodically was a computationally expensive and infeasible task.

The above analysis suggests malicious content which goes undetected by Facebook’s real time filters not only remains undetected for at least some time, but thrives on users’ likes and comments. This increases the reach and visibility of malicious content and potentially exposes a much larger section of users than the attacker may have initially intended. A real time zero-hour detection technique can aid the identification and removal of such content, and can further improve Facebook’s existing systems for malicious content detection.

Partnership with Web of Trust

Facebook partnered with Web of Trust in 2011 to protect its users from malicious URLs (?). According to this partnership, Facebook shows a warning page to the user whenever she clicks on a link which has been reported for spam, malware, phishing or any other kind of abuse on WOT (Figure 3). To verify the existence of this warning page, we manually visited a random sample of 100 posts containing a URL marked as malicious by WOT, and clicked on the URL. Surprisingly, the warning page did not appear even once. We also noticed that over 88% of all malicious URLs in our dataset (4,077 out of 4,622) were marked as malicious by Web of Trust. This highlights considerable inefficacy in Facebook’s partnership with Web of Trust to control the spread of malicious URLs.

Figure 3: Warning page shown by Facebook whenever a user clicks on a link reported as abusive on Web of Trust. The user may chose to return or ignore the warning and visit the URL any way.

From the above analysis, it is evident that Facebook’s existing techniques to combat the spread of malicious content through the social graph still have scope for improvement. In the next section, we highlight some key characteristics of the malicious content we found in our dataset. We present some important features which can be used to subdue the shortcomings of blacklist lookups and help Facebook in identifying malicious content from legitimate content efficiently and in real time.

Key characteristics of malicious content

We analyzed the malicious content in our dataset in three aspects; a) textual content and URLs, b) entities who post malicious content, and c) metadata associated with malicious content. We now look at all these three aspects individually in detail.

Textual content and URLs

From our dataset of 11,217 unique malicious posts, we first looked at the most commonly appearing posts. Similar to past work (?), we found various campaigns promoting a particular entity or event. However, campaigns in our dataset were very different than those discussed in the past. Table 3 shows the top 10 campaigns we found in our dataset of malicious posts. We found that most of the campaigns in our dataset were event specific, and targeted at celebrities and famous personalities who were part of the event. Although this seems fairly obvious because of our event based dataset, such campaigns reflect the attackers’ preferences of using the context of an event to target OSN users. Attackers now prefer to exploit users’ curiosity, instead of hijacking trends and posting unrelated content (like promoting free iPhone, illegal drugs, cheap pills, free ringtones, etc.) using topic specific keywords.

Post Summary Count
Sexy Football Worldcup - Bodypainting 155
10 Things Nelson Mandela Said That Everyone Should Know 154
Was Bishop Desmond Tutu Frozen Out of Nelson Mandela’s Funeral? 105
Nude images of Kate Middleton 73
The Gospel Profoundly Affected Nelson Mandela’s Life After Prison 72
Promotion of Obamacare (Affordable Care Act) through Nelson Mandela’s death 67
Radical post about Nelson Mandela 54
Was Nelson Mandela a Christian? 41
R.I.P. Nelson Mandela: How he changed the world 36
Easy free cash 29
Table 3: Top 10 most common posts in our dataset of malicious posts.

We then looked at the various types of malicious posts present in our dataset. We found that the most common type of malicious posts were the ones with URLs pointing to adult content and incidental nudity, and marked unsafe for children (52.0%) by Web of Trust. The second most common types of malicious posts comprised of negative (malware, phishing, scam, etc.) and questionable (misleading claims or unethical, spam, hate, discrimination, potentially unwanted programs, etc.) category URLs (45.2%), closely followed by posts containing untrustworthy sources of information (38.22%). Interestingly, only 325 malicious posts (2.9%) advertised a phishing URL. This is a drastic drop as compared to the observations made by Gao et al. in 2010, where they found that over 70% of all malicious posts in their dataset advertised phishing (?). We also found that 18.4% of the malicious posts in our dataset (2,064 posts out of 11,217) advertised one or more shortened URLs. Past literature has shown wide usage of shortened URLs to spread malicious content on microblogging platforms (??). Short URLs have seen a significant increase in their usage mostly due to restriction of message length on OSNs like Twitter. However, given that this restriction on message length does not apply on Facebook, obfuscation of actual landing pages is, most likely, the primary reason behind usage of shortened URLs.

In addition to post categories, we also looked at the most common URL domains in our dataset. Table 4 shows the 10 most widely shared malicious and legitimate domains in our dataset. It is interesting to note that Facebook and YouTube constituted almost 60% of all legitimate URLs shared during the 17 events. The remaining legitimate URLs largely belonged to news websites. On the contrary, malicious URLs were more evenly distributed across a mixture of categories including news, sports, entertainment, blogs, etc. Our dataset revealed that a large fraction of malicious content comprised of untrustworthy sources of information, which may have inappropriate implications in the real world, especially during events like elections, riots, etc. Most previous studies on detecting malicious content on online social networks have concentrated on identifying more obvious threats like malware and phishing (???). There exists some work on studying trustworthiness of information on other social networks like Twitter (??). However, to the best of our knowledge, no past work addresses the problem of identifying untrustworthy content on Facebook.

Malicious Domain WOT categories VirusTotal Category % Legitimate Domain VirusTotal Category % Untrustworthy News 5.60 Social networks 53.17 Child unsafe Sports 4.69 Social web 6.69 Child unsafe Online Photos 3.45 News 0.72 Untrustworthy, Child unsafe, hate, discrimination News, Traditional religions 2.53 News 0.58 Child unsafe, adult content, gruesome or shocking Entertainment, Streaming media 2.42 Social networks 0.50 Untrustworthy, Child unsafe, spam Society and lifestyle 2.21 News 0.49 Child unsafe, adult content, gruesome or shocking News, Violence, Streaming media 2.01 Search engine 0.47 Child unsafe, adult content Blogs and personal sites, adult content 1.44 News, entertainment 0.39 Adult content Adult content 1.27 Social networks 0.37 Child unsafe, adult content Blogs / social networks 1.21 News 0.31
Table 4: Top 10 malicious and legitimate domains and their VirusTotal categories (and Web of Trust categories for malicious domains) in our dataset. Most legitimate URLs shared on Facebook belonged to their own domain. Malicious domains were much more evenly spread.

Entities posting malicious content

Having found a reasonable number of malicious posts in our dataset, we further investigated the entities (users / pages) who generated these malicious posts. We found that over 25% of the entire 3.3 million unique entities in our dataset (approx. 0.85 million) posted at least one URL, and 0.24% (7,962) unique entities posted at least one malicious URL (see Table 2). Content on Facebook is generated by two types of entities; users and pages. Pages are public profiles specifically created for businesses, brands, celebrities, causes, and other organizations, often for publicity. Unlike users, pages do not gain “friends,” but “fans,” people who choose to “like” a page. In our dataset, we identified pages by the presence of category field in the response returned by Graph API search (?) during the initial data collection process. The category field is specific to pages and we manually verified that it was returned for all pages in our dataset. All remaining entities were users. We queried the Graph API 101010¡entity˙id¿ for profiles of all the 3.3 million entities in our dataset in October 2014. During this process, gender information for all users was acquired and changes in page category (if any) were captured. We found that 10.2% of the pages (39,843) had been deleted, and 8.7% of the pages (33,794) had changed their category.

Upon analyzing users, we found that for users who posted malicious URLs, the gender distribution was skewed towards the male population as compared to the gender distribution for legitimate users in our dataset (see Figure 4). For malicious users, the male:female ratio was 1:2.41 (Figure 4), whereas for legitimate users posting one or more URLs, this ratio was 1:2 (Figure 4). The male:female distribution further dropped to 1:1.62 for all legitimate users (Figure 7). We also found that pages were more active in posting malicious URLs as compared to non malicious URLs. Pages were observed to constitute 21% (1,676 out of 7,962) of all malicious entities (Figure 4), while only 10% of all legitimate URL posting entities were pages (Figure 4). A similar percentage of pages (12%) was found to constitute all legitimate entities in our dataset (Figure 7). We also found 43 verified pages and 1 verified user among entities who posted malicious content. The most common type of verified pages were radio station pages (12), website pages (5) and public figure pages (4). Combined together, the 43 verified pages had over 71 million likes.

Figure 4: Gender and category distribution of (a) Malicious entities (N=7,962); (b) Non malicious entities posting URLs (N=849,190); and (c) All non malicious entities (N=3,365,991) in our dataset.

It is important to note that most of the past attempts at studying malicious content on Facebook did not capture content posted by pages, and concentrated only on users (???). Malicious content originating from pages in our dataset brings out a new element of Facebook, which is yet to be addressed. Facebook limits the number of friends a user can have, but there is not limit on the number of people who can subscribe to (Like, in terms of Facebook terminology) a page. Content posted by a page can thus, have much larger audience than that of a user, making malicious content posted by pages potentially more widespread and dangerous than that posted by individual users. We found that in our dataset, pages posting malicious content had 123,255 Likes on average (min. = 0, max. = 50,034,993), whereas for legitimate pages, the average number of Likes per page was only 45,812 (min. = 0, max. = 502,938,006). Upon further investigation, we found high similarity between the categories of the most famous pages posting legitimate and malicious content. We found that the 10 most famous categories of pages posting malicious content were also among the most famous categories of pages posting legitimate content (Figure 5).

Figure 5: Top ten page categories among Facebook pages posting malicious content. We found similar category distribution among malicious and non-malicious pages.


There are various types of metadata associated with a post, for example, application used to post, time of post, type of post (picture / video / link), location etc. Metadata is a rich source of information that can be used to differentiate between malicious and legitimate users. Figure 6 shows the distribution of the top 25 applications (web / mobile / other) used to post content in our dataset. 111111The top 25 applications were used to generate over 95% of content in all three categories we analysed. We observed that over 51% of all legitimate content was posted through mobile apps. This percentage dropped to below 15% for malicious content. Third party and custom applications (captured in “Other” in Figure 6) were used to generate 11.5% of all malicious content in our dataset as compared to only 1.4% of all legitimate content being generated by such applications. This behavior reflects that malicious entities make use of web and third party applications (possibly for automation) to spread malicious content, and can be an indicator of malicious activity. Legitimate entities, on the other hand, largely resort to standard mobile platforms to post.

Although Facebook has more web users than mobile users (?), our observations may be biased towards mobile users due to our event specific dataset. As described in the Data section, our data was collected during 17 real world events. Past literature has shown high social network activity through mobile devices during such events (?).

Figure 6: Sources of malicious content, legitimate content with URLs, and all legitimate content. Mobile platforms were preferred over web for posting legitimate content.

We also observed significant difference in the content types that constituted malicious and legitimate content. Over 50% of legitimate posts containing a URL were photos or videos whereas this percentage dropped to below 6% for malicious content. A large proportion of these photos and videos were uploaded on Facebook itself. This was one of the main reasons for being the most common legitimate domain in our dataset (see Table 4). We used these, and some other features to train multiple machine learning algorithms for automatic detection of malicious content. The results of our experiments are presented in the next section.

Detecting malicious content automatically

Past efforts for automatic detection of spam and malicious content on Facebook largely focus on detecting campaigns (??), and rely heavily on message similarity features to detect malicious content (?). Researchers using this approach have reported consistent accuracies of over 80% using small feature sets comprising of under 10 features. However, this approach is ineffective in zero-hour detection since the aforementioned models require to have seen similar spam messages in the past. To overcome this inability, we propose an extensive set of 42 features (see Table 5) to detect malicious content, excluding features like message similarity, likes, comments, shares etc., which are absent at zero-hour. We group these 42 features into four categories based on the their source; Entity (E), Text content (T), Metadata (M) and Link (L).

Source Features
Entity (9) is a page / user, gender, page category, has username, username length, name length, num. words in name, locale, likes on page
Text content (18) Presence of !, ?, !!, ??, emoticons (smile, frown), num. words, avg. word length, num. sentences, avg. sentence length, num. English dictionary words, num. hashtags, hashtags per word, num. characters, num. URLs, num. URLs per word, num. uppercase characters, num. words / num. unique words
Metadata (8) App, has URL, has message, has story, has link, has picture, type, link length
Link (7) has HTTP / HTTPS, hyphen count, paramters count, parameter length, num. subdomains, path length
Table 5: Features used for machine learning experiments. We extracted features from four sources, viz. entity, content, metadata, and link.

We trained four classifiers using 11,217 unique malicious posts as the positive class and 11,217 unique legitimate posts, randomly drawn from the 1,210,920 unique legitimate posts containing one or more URLs (see Table 2) as the negative class. All experiments were performed using Weka (?). A 10-fold cross validation on this training set yielded a maximum accuracy of 86.9% using the Random Forest classifier. Table 6 describes the results in detail. We also performed the classification experiments using the four category features (E, T, M, and L) separately, and observed that link (L) features performed the best, yielding an accuracy of 82.3%. A combination of all four category features, however, outperformed the individual category scores, signifying that none of the category features individually could identify malicious posts as accurately as their combination. We also recorded the accuracy results for the Random Forest classifier by varying the number of features according to their information gain values (see Figure 7). We calculated the accuracy for 1 through all 42 features, adding features one by one in decreasing order of their information gain value, and found that the accuracy peaked at the top 10 features. All four classifiers achieved higher accuracy when trained on the top 10 features, as compared to accuracy when trained on all 42 features (see Table 6). Table 7 shows the information gain value and source of the top 10 features.

Feature Set E T M L All Top10
Naive Bayesian 58.9 52.0 75.0 66.3 58.8 74.3
Decision Tree 63.8 65.4 80.8 82.0 85.0 85.8
Random Forest 63.6 65.6 80.9 82.3 85.5 86.9
AdaBoost 59.5 62.8 76.5 71.8 76.8 77.4
Table 6: Ten-fold cross validation accuracies for four classifiers over six different feature sets.
Feature Source Info. Gain
Presence of URL Metadata 0.240
Post type Metadata 0.219
Length of parameter(s) in URL Link 0.216
Application used to post Metadata 0.209
Length of link field Metadata 0.201
Number of parameters in URL Link 0.178
Number of sub-domains in URL Link 0.110
Length of URL path (after domain) Link 0.093
Number of hyphens in URL Link 0.084
Presence of story field Metadata 0.071
Table 7: Source and information gain value of the top 10 features.

We performed further experiments to observe the change in true positive rate, false positive rate and accuracy values with change in training dataset sizes. The Random Forest classifier was used for these experiments since it gave the highest accuracy amongst the four classifiers we used in the previous experiment. We used all 42 features to train the classifier in this experiment. Keeping the size of the positive class constant (11,217 instances), the size of the training set was varied by varying the size of the negative class instances in the ratio 1:1/2, 1:1, 1:2 and 1:5. This yielded 5,609, 11,217, 22,434 and 56,085 negative class instances consecutively, randomly drawn from 1,210,920 unique legitimate posts containing one or more URLs (see Table 2). Figure 7 shows the results of this experiment. We were able to achieve a maximum true positive rate of 97.7% for malicious posts using the 1:1/2 split. The false positive rate dropped to a lowest of 3.4% for 1:5 split. As we increased the size of the negative class, both the true positive and false positive rates for the malicious class constantly decreased. The 1:1 split yielded the lowest average false positive rate across both classes.

Figure 7: (a) Accuracy values of the Random Forest classifier for 1 through 42 features. Accuracy peaked to 86.9% at top 10 features. (b) True positive and false positive rates of malicious and legitimate classes with different sizes of the training set.

To check the effectiveness of our model over time, we collected test data about the Ebola outbreak in Africa during August - October 2014. We collected a total of 59,179 posts containing URLs, out of which, 3,248 post were found to be malicious. 121212We used the same methodology to find malicious posts as we did for the 17 events in our training data. Our final test set consisted of 6,496 posts (3,248 malicious and 3,248 randomly picked legitimate posts). Evaluating our balanced model (trained on 11,217 malicious and 11,217 legitimate posts) on this test set revealed a significant drop in true positive rate, from 93.2% to 81.6%. This value is, however, a slight improvement over previously reported true positive rates for spam detection techniques over time (?).

Comparison with past work

One of the few studies on detecting malicious content on Facebook using a dataset bigger than ours, was conducted by Gao et al., where researchers used a clustering approach to identify spam campaigns (?). Although authors reported a high true positive rate of 93.9%, there was no estimation of the amount of malicious posts that this approach missed (false negatives). To this end, we applied the same clustering technique and threshold values used by Gao et al. on our dataset to get an estimate of the false negatives of their clustering approach. Since our entire dataset was already labeled (as opposed to Gao et al.’s dataset), we did not apply clustering on our entire dataset to find malicious posts. Instead, we applied clustering only on malicious posts in our dataset, and compared how many of those clusters met the distributed and bursty threshold values previously used (5 users per cluster, and 90 minutes median time between consecutive posts respectively). Applying clustering on our 11,217 malicious posts yielded a total of 4,306 clusters. Out of these, only 183 clusters (containing 4,294 posts) met the distributed and bursty thresholds, yielding a high false negative rate of 61.7%. These results indicate that our machine learning models perform considerably better, and are able to detect more than double the amount of malicious posts as compared to existing clustering techniques. We were unable to compare our results with other previous work due to two major reasons; a) absence of features like likes, comments, message similarity etc. at zero-hour (used by Rahman et al.), and b) public unavailability of features like number of friends, message sent, friend choice, active friends, page likes etc. (used by Stringhini et al. (?) and Ahmed et al. (?)).

REST API and browser plug-in

To provide a real world solution for the problem of detecting malicious content on Facebook, we built a REST based API (Application Programming Interface) using the Random Forest classifier trained our labeled dataset. The API is publicly accessible at¡Facebook˙post˙ID¿ and can be queried by sending a HTTP GET request along with a Facebook post ID. Due to Facebook’s API limitations, our API currently works only for public posts which are accessible through Facebook’s Graph API. Our API fetches the post and entity profile information using Facebook’s Graph API and generates a feature vector, which is subjected to a pre-trained classifier. The label returned by the classifier is output by the API in JSON format along with the original Facebook post ID.

For better utility of our REST API, we built a plug-in for Google Chrome browser. Figure 8 shows the flow diagram representing the communication flow between the browser plug-in, REST API, and the classifier. Once installed and enabled, this plug-in loads whenever a user opens her Facebook page on her Google Chrome browser, and extracts the post IDs of all public posts in the user’s newsfeed. The post IDs are then sent to the REST based API. If the API returns the label malicious for a post, the plug-in marks the post with an “alert” symbol (Figure 9).

Figure 8: Flow diagram for our browser plug-in.
Figure 9: Screenshot of a malicious Facebook post labeled by our browser plug-in.


We could not find a way to claim that our dataset is representative of the entire Facebook population. Facebook does not provide any information about what percentage of public posts is returned by Graph API search. However, to the best of our knowledge, our dataset of 4.4 million public posts and 3.3 million users is the biggest dataset in literature, collected using Facebook APIs.

We understand that the WOT ratings that we used to create our labeled dataset of malicious posts are obtained through crowd sourcing, and may suffer biases. However, WOT states that in order to keep ratings more reliable, the system tracks each user’s rating behavior before deciding how much it trusts the user. In addition, the meritocratic nature of WOT makes it far more difficult for spammers to abuse, because bots will have a hard time simulating human behavior over a long period of time. With these measures taken by WOT to control bias, we believe that it is safe to assume the validity of our labeled dataset of malicious posts and hence, our results.


OSNs witness large volumes of content during real world events, providing malicious entities a lucrative environment to spread scams, and other types of malicious content. We studied content generated during 17 such events on Facebook, and found substantial presence of malicious content which evaded Facebook’s existing immune system and made it to the social graph. We observed characteristic differences between malicious and legitimate posts and used them to train machine learning models for automatic detection of malicious posts. Our extensive feature set was completely derived from public information available at zero-hour, and was able to detect more than double the number of malicious posts as compared to existing spam campaign detection techniques. Finally, we deployed a real world solution in the form of a REST based API and a browser plug-in to identify malicious Facebook posts in real time. In future, we would like to test the performance and usability of our browser plug-in. We would also like to investigate Facebook pages spreading malicious content in further detail. Further, we intend to study malicious Facebook posts which do not contain URLs.


  • [Ahmed and Abulaish 2012] Ahmed, F., and Abulaish, M. 2012. An mcl-based approach for spam profile detection in online social networks. In Trust, Security and Privacy in Computing and Communications (TrustCom), 2012 IEEE 11th International Conference on, 602–608. IEEE.
  • [Antoniades et al. 2011] Antoniades, D.; Polakis, I.; Kontaxis, G.; Athanasopoulos, E.; Ioannidis, S.; Markatos, E. P.; and Karagiannis, T. 2011. we. b: The web of short urls. In Proceedings of the 20th international conference on World Wide Web, 715–724. ACM.
  • [Benevenuto et al. 2009] Benevenuto, F.; Rodrigues, T.; Almeida, V.; Almeida, J.; and Gonçalves, M. 2009. Detecting spammers and content promoters in online video social networks. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 620–627. ACM.
  • [Benevenuto et al. 2010] Benevenuto, F.; Magno, G.; Rodrigues, T.; and Almeida, V. 2010. Detecting spammers on twitter. In CEAS, volume 6,  12.
  • [Castillo, Mendoza, and Poblete 2011] Castillo, C.; Mendoza, M.; and Poblete, B. 2011. Information credibility on twitter. In WWW, 675–684. ACM.
  • [Chhabra et al. 2011] Chhabra, S.; Aggarwal, A.; Benevenuto, F.; and Kumaraguru, P. 2011.$ocial: the phishing landscape through short urls. In CEAS, 92–101. ACM.
  • [Chu, Widjaja, and Wang 2012] Chu, Z.; Widjaja, I.; and Wang, H. 2012. Detecting social spam campaigns on twitter. In Applied Cryptography and Network Security, 455–472. Springer.
  • [Facebook Developers 2011] Facebook Developers. 2011. Keeping you safe from scams and spam.
  • [Facebook Developers 2013] Facebook Developers. 2013. Facebook graph api search.
  • [Facebook 2014] Facebook. 2014. Facebook Company Info.
  • [Gao et al. 2010] Gao, H.; Hu, J.; Wilson, C.; Li, Z.; Chen, Y.; and Zhao, B. Y. 2010. Detecting and characterizing social spam campaigns. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, 35–47. ACM.
  • [Gao et al. 2012] Gao, H.; Chen, Y.; Lee, K.; Palsetia, D.; and Choudhary, A. N. 2012. Towards online spam filtering in social networks. In NDSS.
  • [Google 2014] Google. 2014. Safe browsing api.
  • [Grier et al. 2010] Grier, C.; Thomas, K.; Paxson, V.; and Zhang, M. 2010. @ spam: the underground on 140 characters or less. In CCS, 27–37. ACM.
  • [Gupta and Kumaraguru 2012] Gupta, A., and Kumaraguru, P. 2012. Credibility ranking of tweets during high impact events. In PSOSM. ACM.
  • [Gupta et al. 2013] Gupta, A.; Lamba, H.; Kumaraguru, P.; and Joshi, A. 2013. Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In Proceedings of the 22nd international conference on World Wide Web companion, 729–736. International World Wide Web Conferences Steering Committee.
  • [Gupta, Lamba, and Kumaraguru 2013] Gupta, A.; Lamba, H.; and Kumaraguru, P. 2013. $1.00 per rt #bostonmarathon #prayforboston: Analyzing fake content on twitter. In eCRS,  12. IEEE.
  • [Hall et al. 2009] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; and Witten, I. H. 2009. The weka data mining software: an update. ACM SIGKDD explorations newsletter 11(1):10–18.
  • [Hispasec Sistemas S.L. 2013] Hispasec Sistemas S.L. 2013. VirusTotal Public API.
  • [Holcomb, Gottfried, and Mitchell 2013] Holcomb, J.; Gottfried, J.; and Mitchell, A. 2013. News use across social media platforms. Technical report, Pew Research Center.
  • [McCord and Chuah 2011] McCord, M., and Chuah, M. 2011. Spam detection on twitter using traditional classifiers. In Autonomic and Trusted Computing. Springer. 175–186.
  • [OpenDNS 2014] OpenDNS. 2014. Phishtank api.˙info.php.
  • [Owens and Turitzin 2014] Owens, E., and Turitzin, C. 2014. News feed fyi: Cleaning up news feed spam.
  • [Rahman et al. 2012] Rahman, M. S.; Huang, T.-K.; Madhyastha, H. V.; and Faloutsos, M. 2012. Efficient and scalable socware detection in online social networks. In USENIX Security Symposium, 663–678.
  • [Sheng et al. 2009] Sheng, S.; Wardman, B.; Warner, G.; Cranor, L.; Hong, J.; and Zhang, C. 2009. An empirical analysis of phishing blacklists. In Sixth Conference on Email and Anti-Spam (CEAS).
  • [Stein, Chen, and Mangla 2011] Stein, T.; Chen, E.; and Mangla, K. 2011. Facebook immune system. In Proceedings of the 4th Workshop on Social Network Systems,  8. ACM.
  • [Stringhini, Kruegel, and Vigna 2010] Stringhini, G.; Kruegel, C.; and Vigna, G. 2010. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference, 1–9. ACM.
  • [SURBL, URI 2011] SURBL, URI. 2011. Reputation data.
  • [Szell, Grauwin, and Ratti 2014] Szell, M.; Grauwin, S.; and Ratti, C. 2014. Contraction of online response to major events. PLoS ONE 9(2): e89052, MIT.
  • [Thomas et al. 2011] Thomas, K.; Grier, C.; Ma, J.; Paxson, V.; and Song, D. 2011. Design and evaluation of a real-time url spam filtering service. In Security and Privacy (SP), 2011 IEEE Symposium on, 447–462. IEEE.
  • [Wang 2010] Wang, A. H. 2010. Don’t follow me: Spam detection in twitter. In Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on, 1–10. IEEE.
  • [WOT 2014] WOT. 2014. Web of trust api.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description