Adblocking and Counter-Blocking: A Slice of the Arms Race
Adblocking tools like Adblock Plus continue to rise in popularity, potentially threatening the dynamics of advertising revenue streams. In response, a number of publishers have ramped up efforts to develop and deploy mechanisms for detecting and/or counter-blocking adblockers (which we refer to as anti-adblockers), effectively escalating the online advertising arms race. In this paper, we develop a scalable approach for identifying third-party services shared across multiple websites and use it to provide a first characterization of anti-adblocking across the Alexa Top-5K websites. We map websites that perform anti-adblocking as well as the entities that provide anti-adblocking scripts. We study the modus operandi of these scripts and their impact on popular adblockers. We find that at least 6.7% of websites in the Alexa Top-5K use anti-adblocking scripts, acquired from 12 distinct entities – some of which have a direct interest in nourishing the online advertising industry.
Today’s web ecosystem is largely driven by online advertising. However, recent years have seen a large number of users turn to adblocking and tracker-blocking tools111While adblocking differs from tracker-blocking, to ease presentation, we refer to tools that provide any of these properties as adblockers. for the purposes of improving their web-browsing experience, maintaining privacy, and more recently to protect themselves against malware [21, 25]. With a recent study estimating the number of active adblock users to be 198M and revenue losses due to adblockers at $22B , the threat posed by adblockers to the online advertising revenue model has moved from mildly concerning to existential. In response, publishers have started to actively detect users of adblockers, and subsequently block them or otherwise coerce them to disable the adblocker – in the rest of the paper, we refer to these practices as anti-adblocking. Most recently, this practice gained wide attention with the endorsement of the Internet Advertising Bureau (IAB) when, in March 2016, it released a primer on how to deal with users of adblockers, as well as a semi open-source script222The script was only made available to members of the IAB. for detecting the use of adblockers . The tension between key stakeholders in this ecosystem – publishers, users, and a plethora of intermediate beneficiaries – forms part of what has been dubbed as the adblocking arms race .
Motivation. While incidents of anti-adblocking [3, 6, 26, 24], and the legality of such practices [28, 4, 19], have received increasing attention, current understanding thereof is limited to a few forums  and user-generated reports . As a result, we lack quantifiable insights into key questions such as: how prevalent nowadays are such practices on the Web? Are certain categories of websites more likely to employ anti-adblockers? Who are the main suppliers of anti-adblocking services? What mechanisms do these employ to detect the presence of adblockers? Is it possible for adblockers to counter-block anti-adblockers? What are common responses after positive detection of adblockers and their impact on end-users? In this work, we address these questions by presenting the first characterization of anti-adblocking.
Roadmap. We start with characterizing anti-adblocking on the Web by identifying anti-adblocking scripts across Alexa Top-5K sites. To this end, we develop a scalable technique to identify popular third-party services that are shared across multiple websites, and utilize it to flag anti-adblocking scripts. We then map out the entities that serve anti-adblocking scripts and the websites that use these scripts. We find that at least 6.7% of Alexa Top-5K websites conduct some form of anti-adblocking by downloading 14 scripts from 12 unique domains most of which belong to ad services, while one specifically offers anti-adblocking services. Most of the anti-adblocking websites represent popular categories such as news, blogs, and entertainment. We manually visit sample websites from the anti-adblockers and find that the arms race has already entered the next round: at least one of three popular browser extensions (AdBlock Plus, Ghostery, Privacy Badger) can counter-block half of the anti-adblocking scripts. We conclude with a discussion of the anti-adblocking arms race in terms of ethics and legality, also enumerating existing proposals that aim to achieve a sustainable and unintrusive online advertising model.
2 Related Work
Rafique et al.  measure anti-adblocking as an incidental aspect of a broader study of malicious and deceptive advertisements, malware and scams on free live-streaming services. They find that anti-adblocking scripts were used by 16.3% of the 1,000 domains they crawled, which is a bit higher than what we find in the Alexa Top-5K (6.7%), although not surprising given their heavy use of deceptive ads.
All of these studies illuminate the nature and scale of opaque practices on the web, informing our understanding of complex and multidimensional ecosystems. Our work complements previous studies by presenting a novel technique to identify shared objects across multiple websites at scale, and utilizing this approach to provide a first look at how the Web employs anti-adblocking techniques.
Identifying JS objects with common sources. We formulate our problem of finding groups of similar JS as a maximal clique finding problem . We consider each JS file loaded by a website to be a node in a graph. If two nodes are within some margin of similarity of each other (we define our similarity metric below), we say there is an edge between them. We extract classes of JS that have a common source by identifying all maximal cliques in this graph. By intentionally focusing on finding similar JS (rather than identical JS) we allow for the grouping of objects that differ only slightly because they contain website-specific identifiers, features and properties.
Choice of similarity metric and threshold. In order to add an edge between two nodes in the graph (i.e., to indicate that two JS files in two different websites are similar), we need to define a metric for similarity, and a suitable threshold for this metric. To measure the similarity of two JS files, we use Term Frequency–Inverse Document Frequency (TF-IDF) to generate a vector of keyword weights for each JS file after filtering out JS reserved words, such as function and var. We then use the cosine similarity metric to measure the similarity of the two keyword weight vectors. Similar approaches using both TF-IDF and cosine similarity have been used by the information retrieval community for topic identification and similarity checking of source-code [15, 30]. We note that this method is particularly well suited to our task compared to other string matching approaches because it is:
White-space insensitive: Many websites perform script minification using different libraries, yielding different indentation and white-spacing practices. Our approach is unaffected by these complications.
Position insensitive: In scripts that have several functionalities (e.g., tracking and ad-block detection), the position of each specific function is irrelevant to the similarity score.
Reasonably resistant to noise: Small changes (e.g., website specific identifiers) have little impact on the final similarity score.
In order to determine a similarity score threshold, we perform a series of experiments on a small dataset of 4.4K JS files extracted from the Alexa Top-100 websites. In each experiment, we set a similarity threshold between and and compute the cliques in each of the corresponding graphs. We then manually inspect the cliques extracted at each threshold to identify the fraction of cliques containing JS with identical functionality and sources. Using this approach, we find that at a similarity threshold of , 17/20 cliques returned by our program contain scripts with identical functionality and sources, i.e., achieving True Positive Rate (TPR) of 0.85. In Figure 1, we illustrate the change in TPR along with the number of cliques returned as the threshold increases. Although thresholds above 0.90 yield TPR1.0, the number of cliques returned drops significantly, which will result in lower True Negative Rates (TNR). Therefore, following a conservative stance, we use a threshold of 0.80 for the remainder of our experiments.
Word-count filter: We avoid comparing scripts with significant word-count difference. Specifically, if a pair of scripts has a word-count ratio higher than 1.50, we assume that they are unlikely to be a part of the same clique and set their similarity to 0.
Source and functionality identification. Once maximal cliques of similar scripts are identified, the content and meta-data of each script in a clique is used to generate and log: (i) the FQDN (Fully Qualified Domain Name) of the script’s source, (ii) FQDNs of external resources utilized by the script, and (iii) keywords associated with the script. In Section 4, we use these three features, in addition to content of the script, to classify cliques by functionality.
Method limitations. We acknowledge that our method has a few limitations. First, our similarity metric will fail to identify obfuscated JS code. Second, given that we do not compare downloaded with embedded JS code, we may fail to identify small cliques in which a reduced number of sites integrate an anti-adblocking JS in a different way than is normal. Finally, our method may fail to identify similarities between composed JS– i.e., scripts that consist of multiple individual files downloaded as a single object. As a result, our method only provides a lower-bound approximation of the usage of anti-adblocking across websites. We plan on addressing these limitations in future work.
4 Dataset and Results
We manually analyze all the 1,882 cliques (corresponding to 4,017 unique websites) identified for both downloaded and embedded scripts, and tag them as trackers (if they upload information such as IP addresses and cookies to tracking companies), anti-adblockers (if they check for the presence of adblockers), or others. Manual analysis is performed by identifying external libraries and function specific keywords used in the scripts. We note that manual analysis of JS is a tedious process that does not scale to a larger number of scripts, therefore we leave as part of future work to investigate ways to automate JS tagging.
We uncover 22 cliques used for anti-adblocking employed by 335 websites – about 6.7% of Alexa Top-5K websites. We observe that Alexa Top-1K have 60 anti-adblocking websites, and the number increases by about 70 websites for every additional 1K considered, reaching 335 anti-adblocking websites in Top-5K. While studying anti-adblockers, we also identify 456 tracking cliques employed by about 54% of Alexa Top-5K, validating previous studies on the pervasiveness of tracking over the Web .
Anti-adblocking by website categories. In Table 2, we report the categories of the 335 anti-adblocking websites, using McAfee’s URL categorization service . We find that anti-adblocking is common among a diverse mix of publishers, and prevalent among publishers of “General News” (19.5%), “Blogs/Wiki” (9.3%), and “Entertainment” (8.5%) categories, which represent more than one third of all websites. Note that these categories are also among the most popular ones across all Top-5K Alexa domains, although to a lesser extent – respectively, 9.4%, 6.29%, and 5.4%. Whereas, other popular categories among Top-5K domains (e.g., “Internet services”, “Online Shopping”, “Business”, which account for 20% of the Top-5K) are much less prevalent in anti-adblocking websites.
|4.3%||Internet Services||2.2%||Potential Illegal Software|
Website response to detection of adblockers. In order to assess how anti-adblocking websites behave once they identify adblockers, we look at all the screenshots taken by our crawler, respectively, when using the vanilla Firefox browser with no extensions and the Firefox browser with AdBlock Plus enabled (which we assume is more likely to be detected due to its popularity ) .
We note cases where there is an explicit (i.e., warning to disable adblocker) or a discrete (i.e., blank page via AdBlock Plus, but normal appearance without) response to adblocking. For these websites, we also view screenshots when accessed by the Firefox browser with each of the following extensions: Ghostery, Privacy Badger, and NoScript.
We find only 6 explicit and no discrete responses to adblocking. Of the explicit responses, 3 are displayed by porn websites hosted by the same company – MindGeek – and employ the same anti-adblocking script downloaded from DoublePimp. The warning is displayed for both AdBlock Plus and Ghostery. The remaining 3 also employ the same script, but display different messages (only for AdBlock Plus) with the same general theme, i.e., nudging the user to disable the adblocker and/or support the website via subscription or donation.
Some websites display adblocker warning to users after they engage in some form of activity, such as clicking on links or scrolling. To capture such responses, we repeat the above exercise for screenshots taken after mimicking user activity – specifically, clicking on a random link on the page, scrolling down to the bottom of the newly loaded page, waiting three seconds, then scrolling back up to the top of the page, waiting 5 seconds. While the modified methodology validates our previous observations, we do not discover any new responses.
In the attempt of automating the analysis of websites’ response to anti-adblocking, we have also tried to use image comparison tools, such as perceptual hashing. However, this generates a high number of false positives due to dynamic content on many sites as well as false negatives since anti-adblocking warnings and messages generate a relatively small visual difference.
How anti-adblockers work. Next, we manually inspect the 22 anti-adblocking scripts (14 downloaded and 8 embedded) aiming to understand how anti-adblocking scripts detect adblockers. We note that of these only the 14 downloaded scripts are actually useful as the 8 embedded scripts simply redirect to the downloaded scripts. We find that anti-adblockers operate on a simple premise: if a bait object (i.e., an object that is expected to be blocked by ad-blockers – e.g., a JS or DIV element named ads) on the publisher’s website is missing when the page loads, the script concludes that the user has an adblocker installed.
Specifically, the anti-adblocker detects adblockers by one of the following approaches: (1) The anti-adblocker injects a bait advertisement container element (e.g., DIV), and then compares the values of properties representing dimensions (height and width) and/or visual status (display) of the container element with the expected values when properly loaded. (2) The anti-adblocker loads a bait script that modifies the value of a variable, and then checks the value of this variable in the main anti-adblocking script to verify that the bait script was properly loaded. If the bait object is determined to be absent, the anti-adblocking script concludes that an adblocker is present. To track whether the user has turned off the adblocker after being prompted to do so, the anti-adblocker periodically runs the ad-block check and stores the last recorded status in the user’s browser using a cookie or local storage.
Anti-adblocker suppliers. We analyze the source code of the 14 anti-adblocking scripts and the domains from which these are downloaded aiming to infer the suppliers of these scripts. The remaining 8 embedded scripts redirect to anti-adblocking scripts served by Cloudflare and Taboola. Our analysis is summarized in Table 3. We also include a description of these domains – based on the information available on their official websites, Google search, and McAfee URL categorization service  – as well as the number of websites in our dataset that employ the anti-adblocker.
At the top we find Pagefair, a company specialized in anti-adblocking services, followed by a number of domains related to Google, Taboola, Outbrain and Ensighten. Overall the anti-adblockers downloaded from these 5 domains are employed by 48% of all the 315 websites employing anti-adblockers. We note that these domains are direct beneficiaries of anti-adblocking as these inherently thrive on the prevalence of online advertisements. Though not directly related to online advertisement, the ability to detect adblockers is a useful capability for the analytics company HotJar.
We also find two cases where the anti-adblocking script is shared by entities in the same domain or business: TripAdvisor (tacdn.com) distributes the script to its 8 websites with different country code top-level domains. Adult websites, all of which are hosted by MindGeek, turn to DoublePimp for anti-adblocking. Two anti-adblocking scripts are pulled from popular Content Delivery Networks (CDNs), but we could not determine their original supplier. Finally, ytimg (a content server associated with YouTube) serves a script that has the ability to detect if ads were properly loaded, however, it is not clear how it uses this information.
Adblocker response to being blocked. There is anecdotal evidence that the adblocking arms race has entered the next level: some adblockers can detect anti-adblockers and counter-block them . To test for this behaviour, we visit a sample website for each anti-adblocking script via AdBlock Plus, Ghostery and Privacy Badger over Chrome web browser. We repeat the experiment three times and monitor all HTTP requests generated when loading the website using Chrome’s Developer Tools. We infer that adblocker can counter-block if the request to fetch anti-adblocking script fails to be initiated. As reported in Table 3, half of the 12 anti-adblocking suppliers are blocked by at least one adblocker. Ghostery and Privacy Badger detect 4 anti-adblockers each, while AdBlock Plus detects only 1. Anti-adblocking scripts served by Taboola and Outbrain are blocked by both Ghostery and Privacy Badger, PageFair scripts by both AdBlock Plus and Privacy Badger, while Doublepimp, Ensighten and Cloudflare scripts by at most one of the three adblockers. We note that the anti-adblocking suppliers that are never detected are related to content distribution, Google ad services, analytics, or site-wide scripts.
The adblocking arms race involves a plethora of players: between publishers and consumers, a jostling array of intermediaries compete to deliver ads, mostly supported by business models that involve taking a cut of the resultant advertising revenue. At the heart of this rich ecosystem lie important questions regarding the legality and ethics of adblocking and anti-adblocking.
The legality of adblocking is potentially contestable under laws about anti-competitive business conduct and copyright infringement. To date, only Germany has tested these arguments in court, with adblockers winning most , but not all of the cases . On the other hand, anti-adblocking in the EU might in turn breach Article 5(3) of the Privacy and Electronic Communications Directive 2002/58/EC, as it involves interrogating an end-user’s terminal equipment without consent .
Many consider adblocking to be an ethical choice for consumers and publishers to consider from both an individual and societal perspective. In reality, however, both sides have resorted to radical measures to achieve their goals. The Web has empowered publishers and advertisers to track, profile and target users in a way that is unprecedented in the physical realm . In addition, publishers are inadvertantly and increasingly serving up malicious ads . This has resulted in the rise of adblocking, which in turn has led publishers to employ anti-adblocking. The core issue is to get the balance right between ads and information: publishers turn to anti-adblocking to force consumers to reconsider the default blocking of ads for earnest ad-supported publishers but defaults are difficult to shift at scale. Nevertheless, those publishers will fail if they do not redress in a fundamental way the reasons that brought consumers to adblockers in the first place. There exist proposals to provide a compromise, such as privacy-friendly advertising  as well as mechanisms to give users more control over ads and trackers they are exposed to [31, 2]. Our work extends these efforts by providing quantified insights into anti-adblocking, to inform policy that can improve upon the current blocking/counter-blocking deadlock.
Acknowledgements. The authors would like to thank the anonymous reviewers for constructive feedback on preparation of the final version of this paper. Rishab Nithyanand was supported by a Open Technology Fund Emerging Technology Senior Fellowship. Sheharbano Khattak and Steven J. Murdoch were supported by The Royal Society [grant number UF110392]; Engineering and Physical Sciences Research Council [grant number EP/L003406/1]. Emiliano De Cristofaro was supported by a Xerox University Affairs Committee award and EU grant H2020-MSCA-RISE “ENCASE”.
-  G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz. The web never forgets: Persistent tracking mechanisms in the wild. In CCS, 2014.
-  J. P. Achara, J. Parra-Arnau, and C. Castelluccia. MyTrackingChoices: Pacifying the Ad-Block War by Enforcing User Privacy Preferences. In WEIS, 2016.
-  Adblock Plus. Filters for Adblock Plus forum. https://adblockplus.org/forum/viewforum.php?f=2&sid=83e35818f92df8cf921623f2ff27ce70.
-  Adblock Plus. Five and oh â¦ look, another lawsuit upholds users’ rights online. https://adblockplus.org/blog/five-and-oh-look-another-lawsuit-upholds-users-rights-online.
-  C. Bron and J. Kerbosch. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM, 16(9), 1973.
-  Campaign Against The Illegal Detection/Circumvention Of Adblocking Tools. Alexander Hanff. https://adblocking.think-privacy.com.
-  L. Chen, A. Mislove, and C. Wilson. Peeking beneath the hood of Uber. In IMC, 2015.
-  M. Falahrastegar, H. Haddadi, S. Uhlig, and R. Mortier. Tracking Personal Identifiers Across the Web. In PAM, 2016.
-  S. Guha, B. Cheng, and P. Francis. Privad: Practical privacy in online advertising. In NDSI, 2011.
-  A. Hannak, P. Sapiezynski, A. Molavi Kakhki, B. Krishnamurthy, D. Lazer, A. Mislove, and C. Wilson. Measuring personalization of web search. In WWW, 2013.
-  A. Hannak, G. Soeller, D. Lazer, A. Mislove, and C. Wilson. Measuring price discrimination and steering on e-commerce web sites. In IMC, 2014.
-  IAB Tech Lab. Publisher ad blocking primer. Technical report, 2016.
-  M. Ikram, H. J. Asghar, M. A. Kaafar, B. Krishnamurthy, and A. Mahanti. Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning. arXiv preprint 1603.06289, 2016.
-  S. Khattak, D. Fifield, S. Afroz, M. Javed, S. Sundaresan, V. Paxson, S. J. Murdoch, and D. McCoy. Do You See What I See?: Differential Treatment of Anonymous Users. In NDSS, 2016.
-  A. Kuhn, S. Ducasse, and T. Gírba. Semantic clustering: Identifying topics in source code. Information and Software Technology, 49(3), 2007.
-  B. Liu, A. Sheth, U. Weinsberg, J. Chandrashekar, and R. Govindan. Adreveal: Improving transparency into online targeted advertising. In HotNets, 2013.
-  J. Mayer. Tracking the Trackers: Self-help Tools. http://cyberlaw.stanford.edu/blog/2011/09/tracking-trackers-self-help-tools, 2011.
-  McAfee. http://www.trustedsource.org.
-  MEEDIA. Doppelte Attacke: Springers zwei Fronten-Strategie im Kampf gegen Ad-Blocker. http://meedia.de/2015/12/15/doppelte-attacke-springers-zwei-fronten-strategie-im-kampf-gegen-ad-blocker/.
-  J. Mikians, L. Gyarmati, V. Erramilli, and N. Laoutaris. Crowd-assisted search for price discrimination in e-commerce: first results. In CoNEXT, 2013.
-  Mozilla. Firefox: Most Popular Extensions. https://addons.mozilla.org/en-us/firefox/extensions/?sort=users.
-  PageFair. The 2015 Ad Blocking Report. https://blog.pagefair.com/2015/ad-blocking-report/.
-  M. Z. Rafique, T. Van Goethem, W. Joosen, C. Huygens, and N. Nikiforakis. It’s Free for a Reason: Exploring the Ecosystem of Free Live Streaming Services. In NDSS, 2016.
-  Schneier on Security. The Ads vs. Ad Blockers Arms Race. https://www.schneier.com/blog/archives/2016/02/the˙ads˙vs˙ad˙b.html, 2016.
-  The Guardian. Major sites including New York Times and BBC hit by ‘ransomware’ malvertising . https://www.theguardian.com/technology/2016/mar/16/major-sites-new-york-times-bbc-ransomware-malvertising, 2016.
-  The New York Times. The Ad Blocking Wars. http://nyti.ms/1Qs20YB, 2016.
-  The Next Web. This adblocker-blocker helps you get around sites that ban you for hiding ads. http://thenextweb.com/apps/2016/02/11/around-and-around-we-go/#gref, 2016.
-  The Register. Ad-blocker blocking websites face legal peril at hands of privacy bods. http://www.theregister.co.uk/2016/04/23/anti˙ad˙blockers˙face˙legal˙challenges/.
-  X. Xing, W. Meng, D. Doozan, N. Feamster, W. Lee, and A. C. Snoeren. Exposing inconsistent web search results with bobble. In PAM, 2014.
-  T. Yamamoto, M. Matsushita, T. Kamiya, and K. Inoue. Measuring similarity of large software systems based on source code correspondence. In Product Focused Software Process Improvement. Springer, 2005.
-  Z. Yu, S. Macbeth, K. Modi, and J. M. Pujol. Tracking the Trackers. In WWW, 2016.