# Dark Patterns after the GDPR: Scraping Consent Pop-ups and Demonstrating their Influence

## Abstract

New consent management platforms (CMPs) have been introduced to the web to conform with the EU’s General Data Protection Regulation, particularly its requirements for consent when companies collect and process users’ personal data. This work analyses how the most prevalent CMP designs affect people’s consent choices. We scraped the designs of the five most popular CMPs on the top 10,000 websites in the UK (n=680). We found that dark patterns and implied consent are ubiquitous; only 11.8% meet the minimal requirements that we set based on European law. Second, we conducted a field experiment with 40 participants to investigate how the eight most common designs affect consent choices. We found that notification style (banner or barrier) has no effect; removing the opt-out button from the first page increases consent by 22–23 percentage points; and providing more granular controls on the first page decreases consent by 8–20 percentage points. This study provides an empirical basis for the necessary regulatory action to enforce the GDPR, in particular the possibility of focusing on the centralised, third-party CMP services as an effective way to increase compliance.

Keywords: Notice and Consent; Dark patterns; Consent Management Platforms; GDPR; Web scraper; Controlled experiment

#### Apparatus and Materials

The materials and apparatus of this study include a pre-study survey, a browser extension, and a post-study survey.

The pre-study survey consisted of 11 questions designed to gather demographic information (age, employment status, highest degree obtained, country of residence), check whether the participants met the study criteria (devices used to browse the web, main browser, travelling to the EU during the study), and acquire their informed consent.

To expose the participants to the different interface designs in a controlled yet ecologically realistic context, we developed a browser extension that injects different pop-ups into any website that participants would visit during their normal daily browsing (available as open-source after publication). The designs of the eight interfaces (i.e., conditions) were inspired by the designs of the top five Consent Management Platforms also used for the scraper study: QuantCast, OneTrust, TrustArc, Cookiebot, and Crownpeak.

All the text, data processing purposes, and vendor names were created by synthesising those commonly used by a random selection of those CMPs in the top 500 Alexa websites in the UK. The data processing purposes are a combination of the options that the five CMPs give to website owners when they create their own pop-up, or the purposes those websites came up with themselves. The vendor names were copied from existing websites, and picked to represent one of four categories: well-known companies (e.g., “Yahoo!”), foreign companies with English names (e.g., “Beijing Interactive Marketing”), foreign companies with non-English names (e.g., “Programatica de publicidad S.L.”), and gibberish names (e.g., “s_vi_bikx7Becalgbkjkxx”).

The extension used the open-source JavaScript database PouchDB to store the participants’ interactions with the interfaces locally, which was synchronised with a CouchDB instance running on OpenStack over an SSL encrypted connection.

The post-study survey consisted of four questions: it asked the participants to reflect on their general pop-up answering strategy, showed them a visualisation of their actual answers, and asked them to describe how well those answers fit their ideal preferences.

#### Procedure

A recruitment email was sent to potential participants asking them to join a study about web-tracker activity in the United States compared to the European Union, and to answer the pre-study survey. Once approved, the participants were assigned and emailed a participant number and a link to the Chrome extension on the Chrome Web Store. After installing the extension, a welcome screen automatically appeared asking the participants to fill in their assigned number. This connected the installation to the participant number in the CouchDB database, where each participant was matched to a pre-determined experiment and condition order. Once the extension was successfully activated, a pop-up appeared notifying the participants that the experiment had started. To train the participants and homogenise their understanding of the CMPs, they received an additional email that informed them they might sometimes see consent pop-ups (ostensibly when they were shown the European version of a website instead of the US equivalent), explained how those pop-ups worked, and instructed them to answer the pop-ups according to their preferences.

The extension injected a pop-up every fourth URL visited – including navigations on the same page, excluding automatic redirects or URLs for which an answer was already recorded – to approximate the realistic frequency with which consent pop-ups are currently shown⁹. Each interface condition was repeated four times, requiring the participants to answer sixteen pop-ups per experiment. All interactions with the pop-up were recorded and timestamped: clicking on the elements, toggling purposes or vendors, scrolling the lists, navigating back and forth between the pages, and submitting a consent response. Interfaces which were not interacted with were re-appended to the list of conditions and shown again a maximum of five times, after which they were recorded as “not answered” (similar to a participant clicking or scrolling through the interface without providing a consent response). Once all conditions of the first experiment were answered, the participant progressed to the second experiment.
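The scheduling logic described above can be sketched as follows. This is a hypothetical reconstruction rather than the released extension code, and all names (`Injector`, `on_visit`, `on_response`) are illustrative:

```python
from collections import deque

# Hypothetical sketch (not the released extension code) of the injection
# schedule described above: a pop-up on every fourth eligible URL, each
# condition repeated four times, and unanswered conditions re-appended
# up to a maximum of five showings before being logged as "not answered".

INJECT_EVERY = 4   # show a pop-up on every fourth eligible URL
MAX_SHOWS = 5      # give up on a condition after five unanswered showings

class Injector:
    def __init__(self, conditions, repeats=4):
        # each interface condition is queued `repeats` times per experiment
        self.queue = deque([c, 0] for c in conditions for _ in range(repeats))
        self.visits = 0
        self.answers = []

    def on_visit(self, url, answered_urls):
        """Return the condition to inject on this visit, or None."""
        if url in answered_urls:      # never re-ask on an already-answered URL
            return None
        self.visits += 1
        if self.visits % INJECT_EVERY != 0 or not self.queue:
            return None
        return self.queue[0][0]       # next condition's pop-up to inject

    def on_response(self, answered):
        """Record the outcome of the most recently injected pop-up."""
        condition, attempts = self.queue.popleft()
        if answered:
            self.answers.append((condition, "answered"))
        elif attempts + 1 >= MAX_SHOWS:
            self.answers.append((condition, "not answered"))
        else:
            self.queue.append([condition, attempts + 1])  # try again later
```

Note how unanswered conditions return to the back of the queue, so a participant who ignores one interface still progresses through the others before seeing it again.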

After completing both experiments, the participants were notified by email that the study was finished, informed that they could uninstall the extension, and asked to complete the post-study survey. The completion time of the experiment ranged from four days to three weeks, depending on how many unique URLs the participant visited per day (e.g., some participants mostly visited the same websites, some went on holiday during the experiment, some installed the extension on their secondary device and only used it a couple of days per week).

#### Data analysis

Although 48 participants originally finished the experiment, we removed eight of them because they mentioned in the survey that their answers were affected by the study (e.g., some participants said they chose “accept all” because they wanted to give more data, despite the instructions they received). To analyse the effects of interface design on consent answers, we created a linear regression model with fixed effects; we treated the participants as a factor to account for their (assumed) stable privacy preferences.

### 5.2 Results

#### General interaction patterns

Of the four possible consent choices – accept all, reject all, submit preferences, no answer – the vast majority of answers submitted by participants were given through the bulk options (89.3%), with a skew towards accepting: 55.2% (707) accept all versus 34.1% (437) reject all. Just 9.7% (124) of answers used the “submit preferences” option, and 0.9% (12) were “no answer” (but recorded interactions). Of those 124 “submit” answers, merely 21 – given by 6 participants – were personalised answers, instead of submitting the default status (all toggled off). Four of those 21 were personalised by clicking the “Toggle All” button, which means only 17 answers out of 1280 (1.3%) represent a participant consenting to a specific selection of purposes or vendors. Whether this is because users are unable to make such decisions, are not interested in that level of detail, or are fatigued by the form and frequency of the question is unclear; but it does indicate that, in practice, users’ consent is rarely as “specific” as the GDPR requires it to be. It does not follow that specific controls should therefore be removed, but rather that such specificity could be distributed to other actors invited by the user (e.g., browser agents, consent predictors, a knowledgeable friend).

Almost all interactions (93.1%) were limited to the first page of the pop-up the participants were exposed to. Seven out of eight interfaces had a “more options” link to navigate to a second or third page for more information and granular consent choices, but this was clicked only 88 times (6.9%). When participants were exposed to a scrollable list of data collection purposes or vendors on the first or subsequent pages (560 occasions), they ignored it 68.6% (384) of the time. Of the 176 instances they did scroll, 21.6% (38) were between 0 and 25 percent of the list, and 64.2% (113) between 75 and 100 percent. In other words, anything not immediately visible to the user, anything requiring interaction to access, might as well not exist.

#### Notification style

One design element whose validity is still under discussion by policy makers is the notification style [zuiderveenborgesiusTrackingWallsTakeItOrLeaveIt2017]: either a barrier in the middle of the screen, which prevents the user from interacting with the website until a response is recorded, or a banner stretching the width of the screen, which does not block access to the information.

We found that notification style did not affect the consent rate of participants. Two simple linear regressions were calculated to investigate the relationship between the answer given (accept or not) and notification style (banner or barrier). The first, comparing Barrier to Banner with both the Accept and Reject buttons, found no relationship at all (F(1,279) = 0.000, p = 1). The second, comparing Barrier to Banner with just the Accept button, found a non-significant relationship (p = 0.702), with a slope coefficient of 0.013 (95% CI from −0.052 to 0.077) and an R² of 0.001.

While there was no difference in acceptance rate when participants actually answered the pop-up, the banner notification was ignored 3.6 times more often than the barrier. For this statistic we considered as “ignored” any pop-up that was not interacted with, but which had a time difference of at least 3 seconds between being injected and the tab being closed; 133 such instances were found, with only 21.1% (28) for the barrier and 78.9% (106) for the banner.
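The “ignored” heuristic can be expressed compactly. The event fields below are illustrative, not the study's actual data schema:

```python
# Minimal sketch of the "ignored" heuristic described above (field names
# are illustrative, not the study's data schema): a pop-up counts as
# ignored only if it received no interactions and at least three seconds
# passed between injection and the tab being closed.

IGNORE_THRESHOLD_S = 3.0

def is_ignored(event):
    dwell = event["tab_closed_at"] - event["injected_at"]
    return not event["interactions"] and dwell >= IGNORE_THRESHOLD_S

events = [
    {"style": "banner",  "interactions": [],        "injected_at": 0.0, "tab_closed_at": 5.2},
    {"style": "banner",  "interactions": [],        "injected_at": 0.0, "tab_closed_at": 1.1},
    {"style": "barrier", "interactions": ["click"], "injected_at": 0.0, "tab_closed_at": 9.0},
]

# only the first event qualifies: no interaction and over 3 s on screen
ignored_styles = [e["style"] for e in events if is_ignored(e)]
```

The dwell-time floor filters out pop-ups the participant plausibly never saw, such as tabs closed almost immediately after the injection.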

#### Button prominence

Data from our scraper indicates that ‘accept all’ and ‘reject all’ buttons are not displayed with equal prominence: a mere 12.6% of sites show both on the same page. Such unequal prominence of consent options is already considered non-compliant with the GDPR [planet49ag, informationcommissionersofficeGuidanceUseCookies2019] because it is expected to affect consent answers, but the severity of its impact is unknown.

We found that removing the ‘reject all’ button from the first page increased the probability of consent by 22–23 percentage points. We calculated two simple linear regressions to analyse the relationship between the answer given (accept or not) and the consent options on the first page (accept and reject, or just accept). The first, comparing Accept all + Reject all to Accept all for the barrier notification, found a strong positive linear relationship between the two. The significant (p < 0.001) slope coefficient for the consent answer was 0.220, meaning the accept rate increased on average by 22.0 percentage points when the reject all button was removed from the first page. The 95% CI had a minimum and maximum of 0.149 and 0.290 respectively. The R² was 0.117, so 11.7% of the variation in answers for the barrier notification can be explained by the changing prominence of the buttons.

The second regression compared Accept all + Reject all to Accept all for banner notifications and found a similarly strong, positive linear relationship between the button prominence and answer given. The significant (p < 0.001) slope coefficient was 0.231, meaning the accept rate increased on average by 23.1 percentage points when the reject all button was removed from the first page. The 95% CI had a minimum and maximum of 0.163 and 0.299 respectively. The R² was 0.135, so 13.5% of the variation in answers for the banner notification can be explained by the changing prominence of the buttons.

#### Level of granularity

The most common order in which consent options are displayed is bulk first, followed by the data collection purposes on the second page and the vendors on the third page, or some combination of those two on the same page.

We found that displaying more granular consent choices on the first page decreased the probability of consent by 8–20 percentage points. We calculated a simple linear regression to compare a Bulk-only interface to interfaces that combined Bulk + Purposes; Bulk + Vendors; and Bulk + Purposes + Vendors on the same page. We found a significant (p < 0.01) negative relationship between all increases in the level of granularity of consent options and the answer given, with different strengths depending on the kind of options that were available. As illustrated by Table 2, showing just the vendors affected the acceptance rate the most (−0.200), whereas just the purposes (−0.088) and the combination of vendors and purposes (−0.119) were closer together but still lower than the baseline interface with just bulk options. Along the same lines, the 95% CIs overlap most between Purposes and Purposes + Vendors and only a little with Vendors.

#### Participant Strategies and Behaviour Patterns

While the experimental data suggests how different designs affect how “freely given” the consent answers of participants are, it does not provide information about how those answers relate to their preferred privacy settings. In a post-study survey, we asked participants to describe their overall answering strategy, showed them a visualisation of their actual behaviour, and then asked them to state how well their answers reflected their ideal preferences and, if not, why. To structure these findings, we classify participants according to their general consent answers: always accept, mostly accept (≥75%), mixed consent, mostly reject (≥75%), always reject.

When asked what they based their choices on, the answers touched on eleven different topics. The four ‘always accept’ participants cited a general apathy towards privacy concerns and “just did it to make the window go away”. The one participant that ‘always rejected’, no matter whether that required more effort, argued that they would only accept data collection if it was to use a particular feature offered by the site. The eleven participants categorised as ‘mostly reject’ heavily emphasised a disagreement with the practice of tracking in general and stated they would only consent to have their data collected if it was for websites they trusted. Two of those also mentioned that they did not feel a need for any personalisation. The participants that fell into the ‘mostly accept’ and ‘mixed consent’ category were more diverse. Most often mentioned were pragmatic reasons such as just wanting to get to the site as quickly as possible, not believing the controls were meaningful, and not wanting to lose any functionality. Eight decided based on trust, whether it was the website or the vendors, and the sensitivity of the data they would be submitting (e.g., banking information). One participant stated that they relied on other methods to protect their privacy, so did not care that much about their pop-up answers: “I tend to vary my devices/browsers/accounts/use incognito and duckduckgo a lot, I’m not too worried about my data being tracked to every detail.”

After being shown a visualisation of their actual consent behaviour and asked if it matched their ideal settings, the responses were predominantly that it did not. Only those falling into the two extreme categories – ‘always accept’ and ‘always reject’ – all indicated they agreed (3) or strongly agreed (2) with their answers. For the remaining three categories, the sentiments were mostly spread evenly along the spectrum, with 11 somewhat agreeing, 3 neither agreeing nor disagreeing, 7 somewhat disagreeing, 1 disagreeing, and 3 strongly disagreeing.

The 25 participants who indicated their behaviour did not match their ideal privacy settings were asked to explain the reason for this difference. Participants mentioned desires such as simply wanting more privacy (“I would rather companies not collect any information”); the fear of unknown consequences of opting out (“I didn’t want to risk the website not working after that”); and not knowing what their ideal preferences even are. The most common reason mentioned, however, was the interface design. Participants lamented the fact that pop-ups stand in the way of their primary goal (accessing a service), that the frequency of the pop-ups caused frustration and consent fatigue, and even the perception that the pop-up “forced them to accept” – even though reject options were available on the second page.

### 5.3 Interim Discussion

The experimental results indicate how two of the most common consent interface designs – not showing a ‘reject all’ button on the first page, and showing bulk options before granular controls – make it more likely for users to provide consent, violating the principle of “freely given”¹⁰. The notification style, on the other hand, appears to have no effect on the answer, but possibly a large effect on whether an answer is given at all, suggesting that a non-blocking mechanism provides users with a desired third consent option: a neutral middle ground. The qualitative reflections of the participants, however, call into question the entire notice-and-consent model – not because of specific design decisions, but because an action is required before the user can accomplish their main task, and because pop-ups appear too frequently when shown on a website-by-website basis.

### 5.4 Limitations

The participant sample is by no means representative of the general population in the United States: they are almost all young and university-educated, and recruited primarily through an emailing list of a computer science department. Arguably, this means that our results describe a “best case scenario”: these participants should be more knowledgeable about privacy issues and better equipped to understand consent interfaces than the average web user.

There are a number of confounding variables that could have affected the participants’ answers. First, although the condition order was counterbalanced, we cannot guarantee that the participants were actually exposed to the conditions in that order (e.g., if they opened multiple tabs in a row and visited them out of order), meaning order effects might not be fully controlled for. Second, because we showed the same pop-up to each participant until we recorded four answers per interface, some participants were exposed to the different conditions more often than others. Lastly, participants might also have encountered “real” pop-ups at the same time as the injected ones if the website they were visiting was within the territorial scope of the GDPR.

While the GDPR is a European policy, our experiments were conducted in the United States. These populations have been exposed to different legal regimes and different consent controls over the years, something which we expect has affected their mental models of these kinds of pop-ups and, accordingly, how they answer them. This might limit the extent to which these findings can be generalised to a European population, and thus how they should be used to inform EU policy changes.

## 6 Discussion and Conclusion

The results of our empirical survey of CMPs today illustrate the extent to which illegal practices prevail, with vendors of CMPs turning a blind eye to — or worse, incentivising — clearly illegal configurations of their systems. Enforcement in this area is sorely lacking. Data protection authorities should make use of automated tools like the one we have designed to expedite discovery and enforcement. Designers might help here by designing tools for regulators, rather than just for users or for websites. Regulators should also work further upstream and consider placing requirements on the vendors of CMPs to only allow compliant designs to be placed on the market. Such enforcement may be possible, as the Court of Justice indicates that plugin system designers can be ‘joint controllers’ along with websites [fashionid, mahieuResponsibilityDataProtection2019a, vanalsenoy], and the UK’s ICO indicates it may be willing to force advertising trade bodies to alter their standards [informationcommissionersofficeUpdateReportAdtech2019]. If this is the case, regulators must carefully consider how to build a robust and well-maintained evidence base for user-centric CMP design.

A core takeaway from the user study is that placing controls or information below the first layer renders it effectively ignored. This leaves a few options for genuine control of tracking online. If the notice-and-consent model is to continue, it may be necessary to declare that, for example, consent can never be valid with the presence of the (on average) hundreds of third parties we have shown data is sent to and cookies laid by today. This would mean consent would only be valid if a compact but representative and rich description can be placed on the first layer, and could certainly be a possible direction for the Court of Justice to consider if they interpret the principles of data protection in a future case.

An alternative approach would be to overhaul the design pattern of the consent banner or barrier, and have richer, more durable ways to set preferences, potentially within the browser. The key is that such browser settings would be legally binding, rather than weak and self-regulatory in nature. Yet the current heavy lobbying around the EU’s draft ePrivacy Regulation has centred in part on adtech firms trying to prevent browser settings having legally binding effect — part of an ongoing drama for many years about the potential legal status of ‘Do Not Track’ signals [kamaraNotTrackInitiatives2016a]. Designers have a role here: how can users reflect on tracking across the Web, rather than on a per-site basis? If users are not to automatically reject everything, how can advertisers negotiate and present them with reasons that they should consent? Might there be a role for delegation of preferences to a trusted civil society actor, and what kind of relationship, information and interaction might the user have with these? We invite and encourage researchers to bring their skills and views to bear on these important, current issues at the confluence of regulation, design and fundamental rights.

## 7 Acknowledgments

Our thanks to José Juan Dominguez Veiga and Kristian Borup Antonsen for their invaluable assistance programming the scraper and extension, and to Luke Taylor and Benjamin Cowan for their advice with the statistical analysis.

Midas Nouwens was supported by the Aarhus Universitets Forskningsfond and the Danish Agency for Science and Higher Education. Ilaria Liccardi was supported by the William and Flora Hewlett Foundation and the NSF 1639994: Transparency Bridges grant. Michael Veale was supported by the Alan Turing Institute under EPSRC grant no. EP/N510129/1.

## References

### Footnotes

1. As an unalienable fundamental right, it is impossible for an EU resident to ‘sign away’ their right to effective data protection.
2. https://github.com/scrapy/scrapy
3. https://github.com/scrapy-plugins/scrapy-splash
4. Relevant legislation is harmonised across the EU and so a Danish IP and UK IP are the same jurisdiction for our purposes.
5. A company that does server-side ad serving and writes reports about the state of the industry: www.adzerk.com
6. It should be noted that Adzerk’s methodology counts CMPs by the URL endpoints of their JavaScript files; we found during development that websites frequently include inactive CMPs’ .js files. This means that Adzerk’s statistics are likely inflated by double-counting, and that our survey is consequently more representative than the 57.09% figure would indicate.
7. Note that the recent judgement from the European Court of Justice clarified that these requirements have been part of EU law since 2012, rather than just since the GDPR [planet49court].
8. Age was reported using brackets of ten years so we are unable to report the exact range; the answers were assumed to be normally distributed to calculate the mean.
9. Based on Adzerk’s Ad-Tech Insights report: [adzerk]
10. It should be noted this data alone is not enough to establish legal compliance.