Zorro: A Model Agnostic System to Price Consumer Data
Personal data is a key ingredient in showing web users targeted ads - the economic backbone of the web. Still, there are two major inefficiencies in how such data is bought and sold online: (1) users do not decide what information is released nor do they get paid for this privacy loss; (2) algorithmic advertisers are stuck in inefficient long-term contracts where they purchase user data without knowing how much value it provides. This paper proposes a system, Zorro, which aims to rectify the aforementioned two problems.
As the main contribution, we provide a natural, “absolute" definition of the “Value of Data” (VoD) – for any quantity of interest, it is the delta between an individual’s value and the population mean. The challenge remains how to operationalize this definition such that it is independent of a buyer’s model for the VoD. We propose a model agnostic solution by relying on matrix estimation, a rapidly growing field within machine learning, and show how it can be used to estimate click-through-rate (CTR), for example.
Regarding (2), Zorro (and this VoD definition) empowers advertisers to measure value of user data: (i) on a query-by-query basis; (ii) based only on increase in accuracy it provides in estimating CTR. This is in contrast with inefficient long-term contracts advertisers are engaged in now with third-party data sellers. We highlight two experimental results on a large real-world ad click dataset. (i) Our CTR estimation system has a of , in line with state-of-the-art results for comparable problems (e.g. content recommendation). Crucially, our system is model agnostic, i.e., we estimate CTR for a given user and advertiser without accessing an advertiser’s proprietary models, a necessary property of any such pricing system. (ii) With respect to our definition of VoD, experiments show selling user data has incremental value in estimating CTR ranging from 30% to 69% depending on advertiser category. Roughly, this translates to at least a USD 16 Billion loss in value for advertisers if user data is not provided.
Regarding (1), in addition to allowing users to get paid for data they share, we extend our system design for when users provide explicit intent for types of ads they want to see.
Munther Dahleh, Devavrat Shah, Dylan Sleeper, Andrew Tsai, Madeline Wong (MIT)
“Personal data is the new oil of the internet and the new currency of the digital world”
- Meglena Kuneva, European Consumer Commissioner
Over the past two decades, online advertising has become the economic backbone of the web. According to a study by PriceWaterhouseCoopers and the Interactive Advertising Bureau (IAB), non-search online advertising revenue has grown from USD 50 million in 1996 to USD 54 Billion in 2018, with a 23% year-on-year growth from 2017 to 2018 (cf. ). (Non-search includes display ads, audio & video ads, classifieds and lead generation. For simplicity, we collectively call them display ads; see  for details.) Furthermore, personal data has been one of the key resources advertisers use to show users targeted, relevant display ads.
Inefficiencies in Current System. There still exist two major inefficiencies in the way such data is both bought and sold: (1) from a user’s perspective, they do not get to decide what information about themselves is released nor do they get paid for this loss of privacy; (2) from an advertiser’s perspective, they have to purchase user data without knowing how much value it is going to provide to them.
These inefficiencies have led to a swath of societal issues including: (i) increasing ad fraud, where 20% to 30% of online ad data sold has been deemed to be fraudulent (cf. ); (ii) an increasing consumer backlash for loss of privacy from unsanctioned data sharing such as in the Facebook-Cambridge Analytica case (cf. ); (iii) an inability to litigate against companies with massive data breaches such as Equifax (cf. ).
To have a meaningful dialogue about such issues, an important and necessary first step is defining what the “Value of Data" (VoD) is. However, there does not exist a precise definition of the VoD, which is independent of a buyer’s model for VoD. This is why firms are engaged in inflexible long-term data contracts, instead of paying on a query-by-query basis based on the incremental value the data provides.
Why Data is a Unique Commodity. The challenge in designing data pricing systems stems from the nature of data as an asset: (i) user data is freely replicable once sold; (ii) advertisers vary widely and correspondingly, so do their models for the VoD; (iii) the usefulness of user data is difficult to verify a priori before running it through a buyer’s model, something buyers are not willing to do before purchasing the data. See  for a more complete overview.
Our Focus: Click-Through-Rate Estimation. We focus our efforts on the sale of online user cookie data (e.g. a user’s demographic data and browsing history) for display advertising and the prediction task of estimating user click-through-rate (CTR) for a given ad and user, the key quantity of interest in online display advertising. We do so for two reasons. First, the sale of online cookie data for this setting is one of the largest sources of data exchanged (and revenue generated) in modern web-scale systems. Second, the associated network engineering infrastructure is extremely performant: transactions between display advertisers and third-party data sellers normally occur in less than 100 milliseconds, the time required for a user to load a webpage and for an ad to be served (cf. ).
Thus if one can design a feasible solution for this setting, there are many further applications. An important business-to-business example is the sale of potential consumer leads in insurance and mortgage markets using the increasingly adopted Ping-Post Architecture (cf. ).
1.1 Our Contribution
VoD Definition. In Section 3, we give a natural definition of the VoD – for any quantity of interest (in our case, CTR) it is the delta between an individual’s value and the population mean. The key challenge we try to solve through our system design is how to operationalize such a definition in an “absolute" sense, independent of a buyer’s models for the VoD.
System Design. In Section 3 we give an overview of the key functionalities/modules of Zorro. (1) An inference system to operationalize our definition of the VoD. This is a challenging inference problem as we only get access to sparse, noisy click data from users. We propose to solve it by reducing CTR estimation to an instance of matrix estimation, where the rows of the induced matrix are users, columns are advertisers and entries are historical click data. Thus we can exploit the rich matrix estimation literature, and utilize data-driven, model agnostic matrix estimation algorithms that have become a workhorse in large-scale content recommendation systems. (2) A module to identify and categorize online ads in an automated fashion. (3) An interface between users, Zorro and advertisers that ensures user data is not inadvertently “leaked" to advertisers without user permission and ensures advertisers only pay for user data based on the increase in accuracy it provides to them (in particular, to estimate CTR).
Experiments. In Section 4, we perform four sets of experiments to verify the implementation of the modules described above. (1) On a large real-world advertising click dataset, we show we accurately estimate the CTR for a given user and advertiser ( of ) independently of an advertiser’s model, the crucial property we require to operationalize our definition of the VoD. (2) On the same dataset, we show that with respect to our definition of the VoD, there is large variation in the value of user data depending on the advertiser’s category (30% to 69% depending on category). (3) We show we can accurately classify (83% True Positive and 98% True Negative) and categorize ads (53% correct classification out of 341 categories) by testing our implementation on popular websites. (4) We show we can indeed prevent user data from being “leaked" to advertisers by counting the number of repeat ads a user sees depending on whether Zorro is turned “on" or “off" (the number of repeats drops three-fold when “on").
Forward Looking Intent. In Section 5, we show how one can generalize our model to capture forward-looking intent of users. We do so by extending the fundamental data structure of interest, the matrix of (users × advertisers), to an order-3 tensor of (users × advertisers × user-intent) and furnishing an algorithm that exploits the additional structure when estimating CTR. We verify experimentally that our proposed algorithm does effectively exploit the additional latent structure in the (users × advertisers × user-intent) tensor by comparing against an appropriate baseline ( of 0.53 vs. 0.17).
1.2 Related Work
We divide related work into the three major buckets: (1) real-time online systems to sell user data; (2) data mining to estimate value of user data; (3) theoretical market mechanisms to sell information.
Real-Time Systems to Sell User Data. We compare with two proposed systems (cf. [24, 26]) that aim to design real-time pricing mechanisms of online user data in the context of advertising. The key difference is that in Zorro we propose a precise definition of the VoD. Without such a definition, agreed upon by both buyer and seller, price setting remains a challenge.
In , they propose selling historical user data by posting a price each time a user loads a webpage and running an “infinite supply" auction, i.e., if an advertiser bids above the posted price, then it acquires user data for that webpage load. This posted price is then constantly updated for each subsequent webpage load by that user. Such a mechanism potentially has limitations. (i) Optimal online price setting for data in such systems can be infeasible using standard methods (e.g. a no-regret update mechanism). For example, no-regret update schemes are feasible only if user data is relatively static. However, users’ browsing patterns are constantly changing and so it is not immediately clear how to adapt a no-regret update scheme to maximize revenue for such a case. (ii) Advertisers either get all of the data and pay the posted price or receive none of it. This could potentially lead to a large loss in revenue. Zorro aims to circumvent these two problems by dividing the transaction into two segments: (i) an offline segment, where each advertiser negotiates a contract with Zorro on how much they value a marginal increase in accuracy; (ii) an online segment, where Zorro estimates the marginal increase in accuracy without access to the advertiser’s models. Advertisers then pay based on the real-time estimated increase in accuracy and the offline contract (see Section 3.4 for details of the implementation). Importantly, both the offline and online segments of the contract in Zorro are entirely auditable by a third party, leading to a more robust system.
In , they propose a blockchain based system and advocate for a new cryptocurrency named Basic Attention Token (BAT), which allows users to get paid in BAT for viewing ads online. For their system to be widely adopted, it would require an overhaul of the entire, very well-established online advertising ecosystem (see Section 2), and need users, advertisers and publishers to all buy into this new cryptocurrency. In contrast, Zorro can be plugged into the current ad ecosystem in a straightforward manner, agnostic to a buyer’s model. It essentially functions as a novel third-party data seller (i.e. Data Management Platform - see Section 2), where users get paid for providing their data and advertisers pay based on the estimated increase in accuracy they experience. Further, in , they do not provide details on how pricing of user data is done, a crucial part of any such system.
Data Mining to Estimate Value of User Data. Given the importance of understanding the role of user data in ad targeting, there has been a lot of empirical work in this field. We highlight a few representative works, [3, 31, 28, 12]. The focus of such work has been to mine large historical datasets to estimate the average value of user data. They do indeed find that user data can increase value by 20-60%, depending on the user and advertiser type, which is in line with our findings. However, these works propose highly specific approaches that require intimate knowledge of how a buyer will extract information/value from the data. In contrast, by applying matrix estimation to estimate CTR, we propose a general-purpose model agnostic method that allows advertisers to estimate VoD on a query-by-query basis.
Theoretical Market Mechanisms to Sell Information. Some works that propose theoretical market designs to sell information include [2, 4]. We consider our work to be complementary to theirs, as a fundamental quantity that needs to be estimated in such works is the VoD. Zorro proposes a natural, data-driven definition of VoD that can potentially be used to operationalize such systems. Further, the works listed have a single buyer and multiple data sellers (e.g. a supply-chain company buying data streams for demand forecasting). This is in contrast with Zorro, which is meant for direct, repeated sales of data between multiple buyers and a single seller.
1.3 Organization of Paper
The rest of the paper is organized as follows: (1) In Section 2, we give an overview of the online advertising ecosystem that matches advertisers to users and the role of third-party data sellers within it. (2) In Section 3, we precisely define the VoD and justify it. We also give a detailed conceptual overview of all of the key components of the Zorro architecture. (3) In Section 4, we rigorously test the various components of our architecture on large real-world datasets. (4) In Section 5, we show how to extend Zorro to include forward-looking intent and corroborate our proposal with experiments. (5) Lastly, in Section 6, we discuss the role of Zorro in designing a general-purpose data recommendation system.
2 Overview of Real-Time Bidding Ecosystem
The Real-Time Bidding (RTB) ecosystem refers to the entire engineering infrastructure associated with buying and selling ads in real-time as users load webpages. We give a brief summary of the key players in the RTB ecosystem below. Refer to  for a detailed description of RTB. See Figure 1 for an overview of the current ecosystem.
Advertisers and Demand Side Platforms (DSPs). Advertisers are companies that want to market their product online. Due to the engineering complexity of the RTB ecosystem, advertisers normally go through a DSP to make bids into the RTB system on their behalf. Thus DSPs essentially serve as algorithmic online advertising agencies.
For our purposes it is not necessary to differentiate between advertisers and DSPs, and we call the collective entity “advertisers" for simplicity.
Publishers and Supply Side Platforms (SSPs). Websites that users visit (e.g. Nytimes, ESPN) are called publishers. When a user loads a publisher webpage, the publisher sends a request out into the RTB system to serve the user ads. Analogous to advertisers, publishers go through SSPs to make requests in the RTB system on their behalf.
Ad Exchanges (ADX). ADX serve as the interface between DSPs and SSPs by matching ad requests from SSPs to bids made by DSPs. They do so by using a combination of first-price and second-price auctions.
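As an illustration of the two auction formats mentioned above, the following is a minimal sketch. It is not taken from any specific ad exchange; the function name `run_auction` and the bidder labels are hypothetical, and real exchanges add reserve prices, fees and tie-breaking rules that are omitted here.

```python
def run_auction(bids, pricing="second"):
    """Pick the winning bid for one ad request.

    bids: dict mapping a (hypothetical) DSP name to its bid amount.
    pricing: "first" charges the winner its own bid,
             "second" charges the winner the runner-up's bid.
    Returns (winner, price).
    """
    if not bids:
        raise ValueError("at least one bid is required")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_bid = ranked[0]
    if pricing == "first" or len(ranked) == 1:
        return winner, top_bid
    # Second-price: the winner pays the second-highest bid.
    return winner, ranked[1][1]


example_bids = {"dsp_a": 2.0, "dsp_b": 1.5, "dsp_c": 0.7}
first_price_result = run_auction(example_bids, pricing="first")
second_price_result = run_auction(example_bids, pricing="second")
```

In both formats the highest bidder wins; only the price charged differs, which is why the accuracy of an advertiser's CTR estimate directly affects how much it is willing to bid.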
Data Exchanges a.k.a Data Management Platforms (DMPs). DMPs serve DSPs, SSPs and ADXs historical user data (usually in real-time) to feed into their models to allow for better, more targeted matchings between users and ads.
Our Focus is Advertiser-DMP Interface. The majority of demand (and source of revenue) for historical user data is from advertisers. User data is crucial to them as their models require such data to make more accurate bids by evaluating CTR. That is why we focus this paper on resolving inefficiencies in how users, DMPs and advertisers interface (see Section 1 for an overview of the current market inefficiencies). However, none of our system design is specific to the transaction between DSPs and DMPs, and it can be extended to include SSPs and ADXs as well (see Section 3). See Figure 1 for an overview of the proposed change to the ecosystem with Zorro.
3 Zorro System Overview
As mentioned in Section 1.1, we focus our system design on the following functionalities that are required for Zorro to be feasible in practice:
VoD Definition. We provide a natural, “absolute" definition of the VoD that captures the incremental value that data provides in estimating CTR. Importantly, our definition is independent of the models of the buyer.
VoD Computation. To operationalize our definition of the VoD, the key requirement is that for a given advertiser, we need to estimate the value of a user’s data independently of an advertiser’s models. We show how one can achieve this using matrix estimation to accurately estimate the CTR in a model agnostic manner.
Ad Identification & Categorization. A necessary step in CTR estimation is to accurately identify when users have clicked on an ad. Further, to effectively group advertisers, we need to categorize advertisers in a meaningful and automated way. We show how to do both ad identification and categorization by tying together existing open source software packages.
User – Zorro – Advertiser Interface. We propose an interface/dynamic that ensures advertisers pay for user data based only on the increase in accuracy it brings. Correspondingly, it ensures user data is not inadvertently “leaked" to advertisers without them paying for it.
In Sections 3.1 - 3.4, we explain how Zorro performs the above four functions respectively. Our focus will be on “VoD Definition", “VoD Computation" and the “User - Zorro - DSP Interface" as that is where the most significant conceptual contributions are. “Ad Identification & Categorization" is a necessary component to make Zorro work and so we will describe how to implement it briefly. Our proposed system design will be corroborated with experiments in Section 4.
3.1 Defining the Value of Data
3.1.1 Problem Setup - CTR Estimation
We consider the case where there are $m$ users and $n$ advertisers. (A “user" can also refer to a group of users. Similarly, an “advertiser" can refer to a category of advertisers.) For a given User $i$ and Advertiser $j$, we define the probability of an ad click as $A_{ij}$. We represent these CTR quantities for each user, advertiser pair through the (unobserved) matrix $A \in [0, 1]^{m \times n}$ where $A = [A_{ij}]$.
The challenge is that we do not observe the matrix $A$. Rather, we observe sparse, noisy observations of its entries. This is because in reality, most users only see a small percentage of all advertisers; further, for advertisers they do see, they get exposed to their ads only a few times, most likely once.
We denote our observation matrix as $X$. If $X_{ij} = \star$, then user $i$ has not been exposed to advertiser $j$. If $X_{ij} \neq \star$, then $X_{ij}$ indicates the empirical average of the number of times user $i$ has clicked an ad from advertiser $j$.
The aim is to infer the CTR for any User $i$ and Advertiser $j$ (i.e., $A_{ij}$), given only sparse, noisy observations (i.e., $X$).
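To make the setup concrete, here is a minimal sketch of how such an observation matrix could be assembled from a raw impression log. The log format, the function name `build_observation_matrix`, and the use of `None` to mark an unobserved user, advertiser pair are all assumptions made for illustration, not details taken from the Zorro implementation.

```python
from collections import defaultdict


def build_observation_matrix(impressions, n_users, n_advertisers):
    """Aggregate an impression log into a sparse observation matrix.

    impressions: iterable of (user, advertiser, clicked) tuples, where
        user and advertiser are integer indices and clicked is 0 or 1.
    Returns a list of lists; entry [i][j] is the empirical click rate of
    user i on advertiser j, or None if user i never saw advertiser j.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for user, advertiser, clicked in impressions:
        shown[(user, advertiser)] += 1
        clicks[(user, advertiser)] += clicked
    return [
        [
            clicks[(i, j)] / shown[(i, j)] if shown[(i, j)] else None
            for j in range(n_advertisers)
        ]
        for i in range(n_users)
    ]


# Toy log: user 0 saw advertiser 0 twice and clicked once, and so on.
log = [(0, 0, 1), (0, 0, 0), (1, 2, 0), (2, 1, 1)]
X = build_observation_matrix(log, n_users=3, n_advertisers=3)
```

Even in this toy log most entries are unobserved, and the observed ones are averages over one or two impressions, which is the sparsity and noise the inference step has to cope with.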
3.1.2 Latent Variable Model
For any model to effectively learn the underlying CTR given sparse, noisy data, some structure on the matrix must be assumed. A general, flexible way of capturing the underlying structure in social data (e.g. Netflix challenge, product recommendation) is assuming a latent variable model (LVM), arguably the canonical modeling choice from the lens of nonparametric statistics (cf.  and references therein).
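Specifically for our case, we assume that the CTR for a given user and advertiser is described by the following latent variable model, $A_{ij} = f(\theta_i, \rho_j)$, where $\theta_i \in \mathbb{R}^{d_1}$, $\rho_j \in \mathbb{R}^{d_2}$ and $f: \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \to [0, 1]$. Here $\theta_i, \rho_j$ refer to multi-dimensional variables that denote latent factors associated with user $i$ and advertiser $j$. $f$ is a latent function that maps $\theta_i$ and $\rho_j$ to a value between $0$ and $1$ denoting the underlying CTR for that user, advertiser pair.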
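A latent variable model of this kind can be simulated in a few lines, which is useful for testing an estimator against a known ground truth. The sketch below is an assumption-laden illustration: the Gaussian latent factors, the logistic form of the latent function, the offset of 3 (chosen only so that most simulated CTRs are small), and the name `sample_ctr_matrix` are all hypothetical choices, since the model itself places no such restrictions on the latent function.

```python
import math
import random


def sample_ctr_matrix(n_users, n_advertisers, dim=2, seed=0):
    """Draw a synthetic CTR matrix from a simple latent variable model."""
    rng = random.Random(seed)
    # Latent factors for each user and each advertiser.
    theta = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_users)]
    rho = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_advertisers)]

    def latent_function(t, r):
        # Assumed form: logistic function of the inner product, shifted
        # so that typical click probabilities are well below one half.
        score = sum(a * b for a, b in zip(t, r)) - 3.0
        return 1.0 / (1.0 + math.exp(-score))

    return [[latent_function(t, r) for r in rho] for t in theta]


A = sample_ctr_matrix(n_users=4, n_advertisers=3)
```

Every entry is a valid probability, and users with similar latent factors get similar rows, which is the structure matrix estimation exploits.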
3.1.3 Definition of Value of Data
Given the LVM described above, we can now provide a definition of the VoD.
Definition 3.1 (Value of Data)
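For a User $i$ and an Advertiser $j$, the value of User $i$’s data to Advertiser $j$ is given by,
$$\text{VoD}(\text{User } i, \text{Advertiser } j) = \left| \bar{A}_j - A_{ij} \right|, \qquad \bar{A}_j = \frac{1}{m} \sum_{k=1}^{m} A_{kj},$$
where $A_{ij}$ is the CTR of User $i$ for Advertiser $j$ and $m$ is the number of users.
In words, this definition states that the value of User $i$’s data to an Advertiser $j$ is the absolute difference between the average CTR for the advertiser across all users (given by $\bar{A}_j$) and the CTR for user $i$ (given by $A_{ij}$). Though seemingly natural, we consider the definition above as arguably the most important conceptual contribution of this paper and we now make a few remarks regarding it.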
Model Agnostic VoD Definition. Note that our definition of VoD is not dependent on any advertiser specific model, but rather just on the prediction task itself (in this case CTR estimation), which is what renders it model agnostic. Further, it is worth noting that we can easily replace CTR with any other quantity of interest in a two-sided market (e.g. likelihood of lead conversion in an auto-insurance marketplace, estimated average spend in a retail marketplace).
Translation of VoD to a Dollar Amount. To convert this quantity to a dollar amount, we need to multiply VoD(User i, Advertiser j) by how much Advertiser j values a marginal increase in accuracy in estimating CTR. For example, it could be the case that an online healthcare advertiser selling medical drugs values a unit increase in accuracy significantly more than an advertiser selling sports equipment. Hence, the quantity VoD(User i, Advertiser j) will need to be scaled to capture this heterogeneity. In Section 3.4, we show in detail how such advertiser specific adjustments can be made.
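The definition and its dollar translation can be sketched on a toy CTR matrix as follows. This is a minimal illustration under stated assumptions: the population mean is approximated by the plain average over the users in the matrix, the scaling is a single constant per advertiser, and the function names and numbers are hypothetical.

```python
def value_of_data(ctr, i, j):
    """Absolute gap between advertiser j's mean CTR and user i's CTR."""
    column = [row[j] for row in ctr]
    mean_ctr = sum(column) / len(column)
    return abs(mean_ctr - ctr[i][j])


def value_in_dollars(ctr, i, j, dollars_per_unit_accuracy):
    """Scale the VoD by how much the advertiser values accuracy."""
    return dollars_per_unit_accuracy * value_of_data(ctr, i, j)


# Rows are users, columns are advertisers (made-up click probabilities).
ctr = [
    [0.10, 0.02],
    [0.30, 0.02],
    [0.20, 0.02],
]
vod_user1_adv0 = value_of_data(ctr, 1, 0)  # user 1 differs from the mean
vod_user2_adv0 = value_of_data(ctr, 2, 0)  # user 2 sits at the mean
```

For the second advertiser every user has the same CTR, so knowing who the user is adds nothing and the VoD is zero for all users; for the first advertiser, only users who deviate from the average carry value.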
3.2 Computing the Value of Data
3.2.1 CTR Estimation via Matrix Estimation
To make our definition of VoD operational in practice, we need an inference method to estimate CTR without relying on the models of the advertisers, but instead using only observations in terms of historical clicks. Specifically, given the observation matrix $X$, estimate the underlying CTR matrix $A$. This is why we choose matrix estimation, the de facto, non-parametric method to learn the underlying mean of a matrix from sparse, noisy observations. Moreover, it comes with strong theoretical guarantees when an LVM structure is assumed on the underlying mean matrix (cf. ). Importantly, matrix estimation remains an actively researched area within machine learning with a variety of scalable, production-grade implementations (some standard libraries include [23, 1]).
Matrix Estimation Overview. Matrix estimation is the problem of recovering a data matrix from an incomplete and noisy sampling of its entries. Specifically, there exists an underlying matrix, $A$, of interest. We denote the observation matrix as $X$, whose entries are noisy versions of $A$, i.e., $\mathbb{E}[X_{ij}] = A_{ij}$ for every observed entry. In addition to the observations being noisy, it is further assumed that we only observe a small subset of the entries, i.e., each entry is observed with probability $p$.
This setup has become of great interest due to its connection to recommendation systems, social network analysis, and graph learning (graphon estimation). The key realization of this rich literature is that one can estimate the true underlying matrix from noisy, partial observations by simply taking a low-rank approximation of the observed data. We refer an interested reader to  for a broad overview and references therein.
Definition 3.2 (Matrix Estimation)
A matrix estimation algorithm, denoted as $\text{ME}$, takes as input a noisy matrix $X$ and outputs an estimator $\hat{A}$.
CTR estimation via Matrix Estimation. CTR estimation can be viewed as a special case of matrix estimation. Specifically, each observed entry $X_{ij}$ can be modeled as the empirical average of a Bernoulli random variable with parameter $A_{ij}$. Thus, treating unobserved entries as zeros, we have that $\mathbb{E}[X_{ij}] = p A_{ij}$ (recall $p$ is the fraction of entries observed). This is exactly the setting in which matrix estimation applies. Thus by applying a standard matrix estimation algorithm, such as Singular Value Thresholding (SVT) or Alternating Least Squares (ALS) (cf. [14, 8]), we can reliably infer the underlying $A$, simply from sparse, noisy observations (i.e., the small number of click or no click data we have). Recall again that to carry out this procedure, we do not need access to the advertiser’s ML model, simply the click data of any user using the Zorro system.
In Section 4.1 we show that, on a large real-world online advertising dataset, matrix estimation (using ALS) does reliably infer the underlying CTR from partial, noisy click data (after appropriate aggregation of user and advertiser data). We have out-of-sample performance of , which is in line with state-of-the-art performance for matrix estimation.
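To illustrate how alternating least squares fills in unobserved entries, the following is a deliberately simplified sketch. It assumes a rank-1 model (one latent number per user and per advertiser), marks unobserved entries with `None`, and uses a tiny ridge term only to avoid division by zero. The function name `als_rank1` and the toy data are hypothetical, and a production system would use a higher rank and one of the library implementations cited above rather than this code.

```python
def als_rank1(X, n_iters=50, reg=1e-6):
    """Rank-1 alternating least squares on the observed entries of X.

    X: list of lists with None marking unobserved entries.
    Returns a dense estimate with every entry clipped to [0, 1].
    """
    m, n = len(X), len(X[0])
    u = [1.0] * m  # one latent factor per user
    v = [1.0] * n  # one latent factor per advertiser
    for _ in range(n_iters):
        # Hold v fixed and solve a 1-D least-squares problem per user.
        for i in range(m):
            obs = [j for j in range(n) if X[i][j] is not None]
            num = sum(X[i][j] * v[j] for j in obs)
            den = sum(v[j] ** 2 for j in obs) + reg
            u[i] = num / den
        # Hold u fixed and solve a 1-D least-squares problem per advertiser.
        for j in range(n):
            obs = [i for i in range(m) if X[i][j] is not None]
            num = sum(X[i][j] * u[i] for i in obs)
            den = sum(u[i] ** 2 for i in obs) + reg
            v[j] = num / den
    # Entries are click probabilities, so clip the product to [0, 1].
    return [[min(1.0, max(0.0, u[i] * v[j])) for j in range(n)] for i in range(m)]


# Noiseless rank-1 ground truth with three entries hidden.
u_true = [0.2, 0.4, 0.8]
v_true = [0.5, 0.25, 1.0]
X = [[a * b for b in v_true] for a in u_true]
X[0][2] = X[1][0] = X[2][1] = None
A_hat = als_rank1(X)
```

Note that the routine only ever touches observed click data; no advertiser model enters the computation, which is the sense in which the estimate is model agnostic.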
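Estimated Value of Data Definition. Analogous to Definition 3.1, we define a notion of VoD that can be empirically measured. We recall certain quantities. Let $X$ denote the set of sparse noisy observations of user, advertiser data. Let $\hat{A} = \text{ME}(X)$ be the estimated underlying CTR after applying matrix estimation on the sparse, noisy observations $X$. We denote $\bar{\hat{A}}_j = \frac{1}{m} \sum_{k=1}^{m} \hat{A}_{kj}$, where $m$ is the number of users.
Definition 3.3 (Estimated Value of Data)
For a User $i$ and an Advertiser $j$, the estimated value of User $i$’s data to Advertiser $j$ is given by,
$$\widehat{\text{VoD}}(\text{User } i, \text{Advertiser } j) = \left| \bar{\hat{A}}_j - \hat{A}_{ij} \right|.$$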
In Section 4.2, we show that, on a large real-world online ad click dataset, users’ personal data has value (with respect to Definition 3.3) ranging from 30% to 69% depending on advertiser category. Importantly, Zorro allows advertisers to estimate the VoD for each user and hence purchase data on a query-by-query basis, rather than through inefficient long-term contracts as is done now (details in Section 3.4).
It is also worth highlighting that Definition 3.3 can be operationalized using matrix estimation simply using historical user ad clicks. Thus any DMP (or more generally third party data seller) can perform this function if they collect such data.
3.3 Ad Identification & Categorization
3.3.1 Ad Identification
Recall that the focus of this work is display ads rather than search ads (see Section 1).
Definition of Ad. Below we describe a straightforward two-level filter system to identify display ads when a user clicks on a link on a website.
Filter Level 1. We filter for hyperlinks on a website that redirect to a url external to the website itself.
Filter Level 2. We parse the redirect url to check for substrings that are almost always present when the external link redirects to ads hosted by third-party servers (e.g. AdChoices, Outbrain). We do this by filtering the redirect url using the industry standard Easylist filter list, which is a set of rules designed by AdBlock (cf. ) to identify ads.
In Section 4.3, we show that this simple system effectively identifies whether an external hyperlink is an ad (True Positive and True Negative rates of 83% and 98% respectively).
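The two filter levels can be sketched as below. This is a simplified illustration, not the actual filter: real EasyList rules use the Adblock Plus filter syntax, whereas the sketch assumes a short hand-written tuple of URL substrings (`AD_URL_SUBSTRINGS`, hypothetical) as a stand-in, and it treats any difference in hostname, including a different subdomain, as "external".

```python
from urllib.parse import urlparse

# Hypothetical stand-in for EasyList: plain substrings only.
AD_URL_SUBSTRINGS = ("outbrain.com", "doubleclick.net", "/adclick", "adchoices")


def is_external(page_url, link_url):
    """Filter Level 1: does the link leave the current website?"""
    page_host = urlparse(page_url).netloc.lower()
    link_host = urlparse(link_url).netloc.lower()
    # Relative links have no host and therefore stay on the same site.
    return bool(link_host) and link_host != page_host


def looks_like_ad(page_url, link_url, patterns=AD_URL_SUBSTRINGS):
    """Filter Level 2: does the external URL match a known ad pattern?"""
    if not is_external(page_url, link_url):
        return False
    target = link_url.lower()
    return any(pattern in target for pattern in patterns)


page = "https://news.example.com/story"
ad_link = "https://paid.outbrain.com/network/redir?x=1"
plain_link = "https://en.wikipedia.org/wiki/Zorro"
```

Dropping the second check and keeping only `is_external` corresponds to the broader "Filter Level 1 only" notion of an ad.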
Generalization to Non-Traditional Ads. It is worth mentioning that companies not traditionally considered advertisers do purchase user data to personalize their website experience for each user (cf. ). For example, an online marketplace company might purchase user browsing data to decide what products to recommend to that user if he or she comes to their homepage. Again users do not get to choose whether this personal data about them is allowed to be sold nor do they derive income from it. We leave it as future work to generalize our definition of ads (with the necessary adjustments made to the entire Zorro system) to just Filter Level 1.
3.3.2 Ad Categorization
Interactive Advertising Bureau (IAB) Overview. Ad categorization is an important pre-processing step in grouping advertisers together before performing matrix estimation to compute the VoD. Otherwise, the matrix of users and advertisers (i.e., the observation matrix $X$) is too sparse and we cannot effectively infer the underlying CTR for a user, advertiser pair. We rely on a widely adopted online advertising taxonomy (“IAB Tech Lab Content Taxonomy Version 2.0") published by the IAB (cf. ), which advertisers use to serve more relevant content to users. This content taxonomy has four tiers of classification (e.g. “Style & Fashion" → Women’s Fashion → Women’s Clothing → Women’s Business Wear). We focus our categorization system on the top two tiers of IAB categories of where there are and respectively.
Implementing IAB’s Categorization. Given a website, classifying which category it belongs to is a challenging task due to the large number of Tier 1 and 2 categories. To automate the process of classifying websites into IAB categories, we input the text within the body of a webpage into a natural language processing software, called uClassify (cf. ), to classify it to an IAB category.
In Section 4.3, we show this natural language processing system effectively classifies Wikipedia webpages into the correct IAB category, with an average Tier 1 and Tier 2 accuracy of and respectively. Note that a random Tier 1 and Tier 2 guess would have an accuracy of approximately and respectively.
3.4 User — Zorro — Advertiser Interface
In this section, we explain from the advertisers’ perspective, how precisely they will interface with Zorro and gain value from it. Recall this part of the system serves two purposes: (1) ensure advertisers pay for user data based only on the increase in accuracy it brings; (2) ensure user data is not inadvertently “leaked" to advertisers without them paying for it. Our approach consists of two steps: (1) an offline step where advertisers negotiate a contract of how much a marginal increase in accuracy is worth; (2) an online step where advertisers query Zorro for user data, and pay for it based on the estimated increase in accuracy they will receive from purchasing it.
Offline Contract. Recall from Section 3.1.3 that each advertiser can have a very different value for an increase in accuracy (consider an online advertiser selling medical drugs vs. sports equipment: how much each values a unit increase in CTR accuracy can vary vastly). Thus the estimated VoD (given by Definition 3.3) needs to be scaled accordingly. Doing this scaling is precisely the purpose of the offline contract between a DSP and Zorro. We describe the contract below.
Recall from Section 3.3.2 that before doing matrix estimation, we group advertisers together by IAB category (e.g. “Disease and Conditions", “Sports Equipment"). Since we estimate the VoD per advertising category, it follows that this offline contract must be defined per IAB category as well. Let there be $K$ such categories. Then for advertiser $j$ and category $k$, we define a function $g_{jk}$. This function summarizes how much advertiser $j$ values an increase in accuracy for estimating CTR for category $k$.
For example, a sports equipment advertiser might value a given increase in accuracy in the category “Sports Equipment" far more than the same increase in the category “Disease and Conditions". The functions $g_{jk}$ capture this heterogeneity. Thus a negotiated contract between Zorro and advertiser $j$ will be the set $\{g_{jk}\}_{k=1}^{K}$.
Real-Time Interface. In the example below, let there be advertisers denoted that have negotiated an offline contract with Zorro. We describe the proposed sequence of interactions that will occur between advertisers and Zorro, each time a user on Zorro loads a publisher’s webpage.
Whenever a user sends a webpage request, it is routed through a Zorro proxy server and is anonymized by replacing the User ID (i.e. user cookies) by a random ID (i.e. random cookie). Advertiser’s that have signed a contract with Zorro can query the system to ask for user data before deciding what bid to make. They then pay for this data based on the estimated VoD and the offline contract. A key point to highlight is that a new random ID is generated each time a user loads a webpage, which is what anonymizes the user request.
A random Zorro ID cookie is generated and linked with User ’s actual ID cookie.
User with the Zorro extension has their webpage request routed through a Zorro proxy server.
User loads a webpage.
The publisher pings a RTB exchange for ad requests and the RTB forwards the requests to advertisers.
query Zorro to check if User ’s ID cookie belongs to Zorro. (Footnote 4: The random Zorro ID cookie could contain a Zorro-specific numeric signature to signify that the user has the Zorro extension installed.)
If User ’s ID cookie belongs to Zorro, then the following two actions occur:
Zorro applies matrix estimation and sends the following two signals: (i) for each , the estimated VoD of User ’s data; (ii) User ’s actual personal data (e.g. demographic, interest segments, browsing history).
pays Zorro based on the offline contract defined above. Specifically pays,
A new random Zorro ID cookie is generated and linked with User ’s actual ID cookie.
Why User Data is Not “Leaked"? By refreshing a new random Zorro ID cookie every time a user loads a webpage, we ensure that to an advertiser who has not negotiated a contract with Zorro, it seems like a random user is loading the webpage. A crucial benefit of this architecture is that user data is never permanently “leaked" as the random Zorro ID cookie is constantly refreshed. Thus if a user who has installed Zorro decides to go incognito at some point and no longer receive money for providing their personal data, they easily can.
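The ID refresh described above can be sketched as a small in-memory mapping. This is a minimal illustration under stated assumptions: the class `ZorroProxy` and its method names are hypothetical, and a real proxy would also need persistence, expiry, and access control for advertisers under contract.

```python
import secrets
from typing import Dict, Optional


class ZorroProxy:
    """Hypothetical proxy that maps short-lived random IDs to real user IDs."""

    def __init__(self) -> None:
        self._random_to_real: Dict[str, str] = {}
        self._real_to_random: Dict[str, str] = {}

    def new_page_load(self, real_id: str) -> str:
        """Issue a fresh random ID for this page load and retire the old one."""
        old = self._real_to_random.pop(real_id, None)
        if old is not None:
            del self._random_to_real[old]
        random_id = secrets.token_hex(16)  # 128 random bits, hex-encoded
        self._random_to_real[random_id] = real_id
        self._real_to_random[real_id] = random_id
        return random_id

    def resolve(self, random_id: str) -> Optional[str]:
        """Return the real ID behind a current random ID, or None if unknown."""
        return self._random_to_real.get(random_id)
```

Because the previous random ID is deleted on every page load, an ID observed by an advertiser on one page cannot be resolved or matched on the next.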
3.4.1 Model Agnostic Data Contracts
The problem of companies not wanting to share their models or their data before a transaction has happened clearly extends far beyond online advertising. This is why so many firms are stuck in inefficient data contracts, where they purchase data based on a long-term agreement or a coarse metric such as the number of API calls they make to a data seller.
That is why we believe our proposed approach to designing data contracts can serve as a general-purpose, empirically auditable system for companies to reason about the VoD when sourcing from external vendors. In essence, Zorro can serve as a base to allow data buyers to pay based only on the estimated increase in accuracy for a clearly defined prediction task, when purchasing a new dataset from third-parties.
The experiments are organized to answer four sets of questions:
Matrix Estimation. How accurately can matrix estimation estimate the underlying CTR?
VoD. How much variation in VoD for user data is there depending on an advertiser’s category?
Ad Data Collection. How accurately can we identify and categorize ads?
User Data Protection. Can we reliably prevent user data from being “leaked" to advertisers?
4.1 Matrix Estimation Experiments
Matrix Estimation Algorithms Applied. We focus on two well-studied matrix estimation algorithms, Alternating Least Squares (ALS) and Singular Value Thresholding (SVT). See [14, 8] for a detailed description of these algorithms and theoretical guarantees of convergence and quality of the final estimate. For ALS, we rely on a standard PySpark implementation (cf. ) and for SVT, we use the NumPy package (cf. ) to perform the singular value decomposition. Lastly, we implement a hybrid two-step algorithm that first does SVT on the observation matrix and uses the outputted matrix as a “warm start" for ALS.
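To make the hybrid two-step procedure concrete, the following NumPy sketch pairs a rank-truncated SVD with a plain ALS loop. It is a simplified stand-in, not the PySpark pipeline used in the experiments: the function names, the hard rank truncation, the rescaling by the observed fraction, and the default hyper-parameters are all assumptions made for illustration.

```python
import numpy as np


def svt(observed: np.ndarray, mask: np.ndarray, rank: int) -> np.ndarray:
    """Rank-`rank` hard thresholding of the zero-filled observation matrix.

    `mask` is a boolean array marking which entries were observed.
    """
    p = mask.mean()  # fraction of observed entries
    filled = np.where(mask, observed, 0.0)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    s[rank:] = 0.0  # keep only the top `rank` singular values
    return (u * s) @ vt / p  # rescale to compensate for missing entries


def als(observed, mask, rank, reg=0.1, iters=20, init=None, seed=0):
    """Alternating least squares on observed entries, optionally warm-started."""
    m, n = observed.shape
    rng = np.random.default_rng(seed)
    if init is None:
        u = rng.normal(scale=0.1, size=(m, rank))
        v = rng.normal(scale=0.1, size=(n, rank))
    else:
        # Warm start: factor the initial estimate with a truncated SVD.
        a, s, bt = np.linalg.svd(init, full_matrices=False)
        u = a[:, :rank] * np.sqrt(s[:rank])
        v = bt[:rank].T * np.sqrt(s[:rank])
    eye = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(m):  # update each row factor with column factors fixed
            cols = mask[i]
            if cols.any():
                vi = v[cols]
                u[i] = np.linalg.solve(vi.T @ vi + eye, vi.T @ observed[i, cols])
        for j in range(n):  # update each column factor with row factors fixed
            rows = mask[:, j]
            if rows.any():
                uj = u[rows]
                v[j] = np.linalg.solve(uj.T @ uj + eye, uj.T @ observed[rows, j])
    return u @ v.T


def svt_then_als(observed, mask, rank, **kwargs):
    """Hybrid: use the SVT output as the starting point for ALS."""
    return als(observed, mask, rank, init=svt(observed, mask, rank), **kwargs)
```

The warm start only changes where ALS begins; the alternating updates themselves are identical in both variants.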
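Metric Used to Measure Accuracy. We measure quality of performance using $R^2$. (Footnote 5: For a set of values $y_1, \dots, y_n$ and estimates $\hat{y}_1, \dots, \hat{y}_n$, $R^2 = 1 - SS_{res}/SS_{tot}$, where $\bar{y} = \frac{1}{n}\sum_{i} y_i$, $SS_{tot} = \sum_{i} (y_i - \bar{y})^2$, and $SS_{res} = \sum_{i} (y_i - \hat{y}_i)^2$.)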
4.1.1 Avito Dataset
Description of Dataset Used. We test our system on the Avito Context Ad Clicks dataset. To the best of our knowledge, it is the largest and most comprehensive publicly available ad click dataset. It can be found on Kaggle (cf. ). Avito is the largest online marketplace in Russia with 70 million unique monthly visitors. Another reason we use this dataset is that its focus is contextual ads (i.e. banner ads), which is also the focus of our work.
There are context ad views in the dataset, out of which led to clicks. For each view, the dataset includes metadata about the user (e.g. geographic location, parameters of the search) and the ad itself (e.g. category of the ad, historical CTR), along with whether the view led to a click or not.
Grouping Users & Advertisers. We group users by their geographical location (there are 3431 unique location IDs) and ads by their category (there are 31 unique category IDs). We choose location as it is a standard feature used in CTR estimation and related tasks. However, many other user features can and should be incorporated in practice. Indeed, our experiments are meant to showcase that accurate CTR estimation is possible with simple groupings of users. Similarly, in practice advertisers are commonly grouped by category (see Section 3.3.2) and hence we do so as well.
Data Preprocessing. In Figure 6, we plot a histogram of empirical CTR for each user in the Avito dataset (on a log scale). We see a large number of user CTRs that are unreasonably high (e.g. there is a spike at ). Indeed, a known issue in such datasets is “click-bots" which are programs that automatically click on ads to increase revenue in a fraudulent manner. That is why we filter out all users beyond a certain threshold, specifically . We do so because according to historical Google AdWords data, “human" CTR for different advertiser categories ranges between and . (cf. ).
This choice of threshold is justified empirically through Figure 6. We create a matrix of users and advertisers (grouped according to location and category respectively), take its singular value decomposition, and then plot the singular values of this matrix for different thresholds. If we don’t threshold (i.e. keep the threshold at ), then the spectrum decays very gradually. However, as we decrease the threshold, the spectrum decays progressively quicker, and the low-rank structure we expect to see emerges (see Section 3.2). The optimal threshold is approximately at , and at this threshold, we still retain of the original data.
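The thresholding check can be sketched as follows. This is an illustration under assumptions, not the preprocessing code used for the experiments: the function names and the column layout (one record per ad view with a user ID, location, category, and click flag) are hypothetical, and unobserved cells are simply left at zero.

```python
import numpy as np


def ctr_matrix(user_ids, locations, categories, clicks, max_user_ctr):
    """Location-by-category empirical CTR after dropping high-CTR users."""
    user_ids = np.asarray(user_ids)
    locations = np.asarray(locations)
    categories = np.asarray(categories)
    clicks = np.asarray(clicks, dtype=float)

    # Empirical CTR per user: clicks divided by views.
    _, user_idx = np.unique(user_ids, return_inverse=True)
    views_per_user = np.bincount(user_idx)
    clicks_per_user = np.bincount(user_idx, weights=clicks)
    keep = (clicks_per_user / views_per_user)[user_idx] <= max_user_ctr

    _, loc_idx = np.unique(locations[keep], return_inverse=True)
    _, cat_idx = np.unique(categories[keep], return_inverse=True)
    shape = (loc_idx.max() + 1, cat_idx.max() + 1)
    views = np.zeros(shape)
    hits = np.zeros(shape)
    np.add.at(views, (loc_idx, cat_idx), 1.0)
    np.add.at(hits, (loc_idx, cat_idx), clicks[keep])
    # Unobserved cells are left at 0 for the purpose of inspecting the spectrum.
    return np.divide(hits, views, out=np.zeros(shape), where=views > 0)


def spectrum(matrix):
    """Singular values in decreasing order, normalized by the largest."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return s / s[0]
```

Calling `spectrum(ctr_matrix(..., max_user_ctr=t))` for several values of `t` and plotting the results gives the kind of comparison described above: a faster decay indicates a matrix closer to low-rank.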
Figure 6 is encouraging as it gives credence to the theoretical model laid out in Section 3.1.2. An LVM suggests that the underlying matrix ought to be approximately low-rank. And indeed, Figure 6 shows that the underlying data is approximately rank 2! This justifies the LVM and provides a structural reason for the success of the matrix estimation.
4.1.2 Matrix Estimation Results
Hyper-parameter Selection. For both ALS and SVT, there is one major hyper-parameter to choose, the number of singular values to keep. From Figure 6, we see that the top two singular values contain most of the “signal" and hence we choose to keep two singular values. This is further justified through cross-validation. In PySpark’s implementation of ALS, a regularization parameter must also be chosen, which we set to be , again using cross-validation.
For Figures 6-6, we keep the percentage of data withheld for out-of-sample testing fixed at . We see that performance of the various algorithms improves steadily as the fraction of data observed and the number of data points increase (with ALS outperforming SVT). However, this improvement levels off after approximately of the data is observed and million data points are used. This indicates two takeaways with regard to accurate CTR estimation: (1) some amount of grouping of users and advertisers is an important pre-processing step to ensure the resulting matrix is not too sparse; (2) our system needs to collect on the order of tens of millions of user data points to apply matrix estimation effectively.
Signal vs. Noise Thresholding via Matrix Estimation. From Section 3.2, we know that matrix estimation should help uncover the underlying mean of the CTR matrix, , while reducing the noise as long as the underlying matrix is low-rank (which we verified through Figure 6). In particular for the Avito dataset, we expect that the average CTR per advertiser category should remain approximately the same both before and after applying matrix estimation. (Footnote 6: There are a large number of user location IDs (3434 of them). Thus the pre-ME estimate of CTR per advertiser category should be relatively stable.) However, the variance in CTR per advertiser category should drop significantly.
We recall the notation used in Section 3.2. Let
In words, are the pre-ME and post-ME empirical average CTRs for each advertiser category in the Avito dataset. Let,
In words, are the pre-ME and post-ME empirical variance in CTR for each advertiser category in the Avito dataset.
In Figures 6-6, we plot the normalized differences in mean and variance per advertiser category. (Footnote 7: We normalize by to ensure can be appropriately compared between categories. For example, a difference is significant if the underlying average CTR is , but not as much if it is . The same reasoning applies for .)
We see from Figure 6 that is of the order , while is of the order . Thus as desired, is on average approximately three orders of magnitude smaller than , indicating that matrix estimation does indeed effectively retain signal and threshold out noise.
Summary. In summary, ALS outperforms SVT (the two-step procedure of applying SVT and then ALS gives comparable performance). The best $R^2$ achieved is 0.58; this is in line with or better than state-of-the-art performance for well-studied datasets such as Movielens, which have performance in the range of (cf. ). Further, our experiments show that matrix estimation does effectively retain signal and threshold out noise.
Thus matrix estimation (specifically, ALS) can be used as an accurate, model-agnostic method for CTR estimation, as long as Zorro has access to a sufficient amount of user data and users and advertisers are grouped carefully.
4.2 Value of Data Experiments
An important question is whether, with respect to Definition 3.3 of VoD, an advertiser finds value in a user’s personal data. Intuitively, if the CTR distribution for a particular advertiser category is tightly concentrated around its mean, then acquiring user data is not as important for that advertiser (as there is not much variation in CTR across users). In contrast, if the CTR distribution for an advertiser category is diffuse, then advertisers are incentivized to acquire personal user data as it helps them identify which segment of the distribution a user lies in.
Large Variation in VoD Across Advertiser Categories. We plot the distribution of CTR per category using a box-and-whisker plot in Figure 7, and see instances of both cases above. For example, advertisers in Category 18 have relatively lower VoD as the CTR distribution is tightly concentrated, while advertisers in Category 4 have quite a large VoD as the CTR distribution is quite diffuse. Thus we can visually tell that advertisers do indeed have a different VoD depending on their category, giving credence to Definition 3.3.
We can summarize these visual findings through the following quantity,
In words, is the average VoD Advertiser experiences per user. (Footnote 8: Again, as in Section 4.1.2, we normalize by to ensure we can appropriately compare between categories.) In Figure 7, we plot for the various Avito categories. We see that ranges from 30% to 69% depending on advertiser category, indicating that there is large variation in VoD depending on which category an advertiser belongs to.
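A per-category summary of this kind can be computed in a few lines. The sketch below rests on two assumptions that should be checked against Definition 3.3: that a user's VoD is the absolute difference between that user's estimated CTR and the category mean, and that the normalizer is the category mean itself. The function name is hypothetical.

```python
import numpy as np


def average_normalized_vod(ctr: np.ndarray) -> np.ndarray:
    """Per-category mean absolute deviation of user CTR from the category
    mean, divided by the category mean.

    `ctr` has shape (users, categories) and holds estimated CTRs.
    """
    category_mean = ctr.mean(axis=0)
    vod = np.abs(ctr - category_mean)  # per-user, per-category VoD (assumed form)
    return vod.mean(axis=0) / category_mean
```

A category whose users all share the same CTR yields 0, while a diffuse category yields a value closer to (or above) 1, matching the intuition from the box-and-whisker plot.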
Query-by-Query Estimation of VoD. It is important to highlight that in our proposed system, Advertiser can compute the VoD on a query-by-query basis. In other words, each time User loads a webpage, Advertiser can compute . This is in contrast with how advertisers and third-party data providers engage currently, where advertisers pay based on long-term contracts or a coarse metric such as the number of API calls made.
Estimating Dollar Value of VoD. Ideally we would like to translate the estimated VoD to a dollar amount per advertiser category. Unfortunately, the Avito advertiser categories are anonymized, so we instead give a conservative estimate. From Section 1, the size of the (rapidly growing) display advertising market is USD 54 Billion. Since the minimum average VoD across categories is estimated to be 30% (in line with previous work, see Section 1.2), it follows that if Zorro prevents advertisers from getting user data without paying for it, the minimum loss of value for advertisers is on the order of 0.30 × 54 ≈ USD 16 Billion.
4.3 Online Ad Experiments
In this section, we test how well we identify and categorize online ads by tying together open source software packages.
4.3.1 Online Ad Identification
Overview of Ad Identification Module. Recall from Section 3.3.1 that the main filter we use on the redirect url is EasyList, the same filter set used by AdBlock. In addition, we manually add keywords such as “doubleclick” and “criteo” that commonly occur when a user has clicked on an ad.
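A stripped-down version of this check might look as follows. It is only a sketch: the function name is hypothetical, the `extra_domains` argument is a simplified stand-in for EasyList (whose real rule syntax is considerably richer than a list of domains), and only the two keywords named above are included.

```python
from urllib.parse import urlparse

# The two keywords named in the text; a real deployment would also load EasyList.
AD_KEYWORDS = ("doubleclick", "criteo")


def looks_like_ad(redirect_url: str, extra_domains: frozenset = frozenset()) -> bool:
    """Flag a redirect URL as an ad if it matches a keyword or a listed domain."""
    host = (urlparse(redirect_url).hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in extra_domains):
        return True
    return any(keyword in redirect_url.lower() for keyword in AD_KEYWORDS)
```

For instance, `looks_like_ad("https://ad.doubleclick.net/ddm/clk/123")` is flagged through the keyword match, while an ordinary article link matches neither rule.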
Overview of Ad Identification Experiment. To evaluate this simple system, we conducted a crowdsourced experiment. The experiment consisted of iterating through Alexa’s top 500 global sites (cf. ) and clicking ten unique ads on the first 30 valid sites (websites that were not in English or had no ads were skipped). If there were not ten unique ads on a website’s landing page, then ads on different webpages under the same domain were searched for. Additionally, hyperlinks that were not ads were also clicked to test whether our system could also identify non-ads accurately.
Result of Ad Identification Experiment. The results of the above crowdsourced experiment are shown below.
| ||True Ad||Not Ad|
|Classified Ad||239 (83%)||49 (17%)|
|Classified Not Ad||2 (3%)||68 (97%)|
We have a True Positive rate of and a True Negative Rate of , indicating that our system can reliably identify whether a hyperlink on a webpage is an ad or not. Additionally, a large portion of the ads we could not identify were missed because they did not go through an ad exchange (i.e. have a redirect url that spawned a tab in a browser). If we do not include such ads, then our True Positive rate goes up to . Such ads are the result of direct negotiations between a website and advertisers. Thus they are not of relevance to our system as they do not go through the RTB ecosystem, and third-party data is not purchased by advertisers in such instances.
4.3.2 Online Ad Categorization
Overview of Ad Categorization Module. Recall from Section 3.3.2 (and verified in Section 4.1.2) that ad categorization is an important pre-processing step to group advertisers together to get good out-of-sample performance in CTR estimation. As detailed in Section 3.3.2, we use the industry standard IAB taxonomy to categorize advertisers. To do so, we use the open-source natural language processing software uClassify to predict a webpage’s IAB category based on the text in the webpage’s body. We focus on the top two tiers of the IAB taxonomy as the categorization is sufficiently detailed. It is straightforward to extend to the bottom two tiers if required.
Overview of Ad Categorization Experiment. Unfortunately, there do not exist good online datasets with IAB categorizations of websites that we can use to test the accuracy of our system. Thus we designed a proxy experiment using Wikipedia webpages as follows. Wikipedia provides a detailed hierarchical categorization of its own webpages. Thus for each IAB Tier 1 category (28 in total), we randomly sample two Tier 2 categories within it and then define a mapping from each of these IAB Tier 2 categories to a near identical Wikipedia category. For example, the IAB category "Travel Type" was mapped to the Wikipedia category "Types of Travel". If a close Wikipedia category was not found, we resample an IAB Tier 2 category until a matching Wikipedia category is found.
For each Wikipedia category selected, we chose up to 100 articles under that category. If there were not 100 articles for a given category (for many there were not), we iterated through the subcategories of that category, and randomly sampled articles from subcategories until we reached 100. We then ran our classifier on the text from those articles.
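The article-selection step can be sketched as a traversal of the category tree. The code below is an illustration, not the script used in the experiment: the function name, the dictionary-based representation of Wikipedia's categories, and the breadth-first visiting order are assumptions.

```python
import random
from collections import deque
from typing import Dict, List


def sample_articles(root: str,
                    articles: Dict[str, List[str]],
                    subcategories: Dict[str, List[str]],
                    target: int = 100,
                    seed: int = 0) -> List[str]:
    """Take articles from `root`; if short of `target`, top up with random
    articles from its subcategories, visited breadth-first."""
    rng = random.Random(seed)
    chosen = list(articles.get(root, []))[:target]
    seen = set(chosen)
    queue = deque(subcategories.get(root, []))
    visited = {root}
    while queue and len(chosen) < target:
        category = queue.popleft()
        if category in visited:
            continue  # Wikipedia's category graph can contain cycles
        visited.add(category)
        candidates = [a for a in articles.get(category, []) if a not in seen]
        rng.shuffle(candidates)
        for article in candidates[: target - len(chosen)]:
            chosen.append(article)
            seen.add(article)
        queue.extend(subcategories.get(category, []))
    return chosen
```

Visiting shallower subcategories first is a deliberate choice in this sketch, since articles deeper in the hierarchy tend to be less related to the root category.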
Result of Ad Categorization Experiment. The results of the above experiment are shown in Figures 8 and 8. For Tier 1 categories, the classification accuracy was 67.36% and for Tier 2 categories, the classification accuracy was 52.75%. This is consistent as correctly classifying a Tier 1 category is a necessary requirement to correctly classify a Tier 2 category.
It is worth noting that these results are far better than if we randomly selected a category. Recall that there are 28 IAB Tier 1 categories and 341 Tier 2 categories, so a random classification would give approximately 3.6% (1/28) and 0.3% (1/341) accuracy respectively.
Furthermore, it is likely that our system performs better than our tests indicate for the following two reasons. (1) It was often the case that articles from subcategories of the selected Wikipedia category were somewhat unrelated to it, increasingly so the further down in the Wikipedia hierarchy the subcategory was. For example, the category "Environment" was mapped to a Wikipedia subcategory "Underground Laboratories". In such mappings, correct classification is not possible. (2) If an outputted IAB category was incorrect, it was still often strongly related to the correct one. For example, articles on children’s films would be classified as family or fantasy films.
Considering these relatively benign sources of error, and the large number of Tier 1 and Tier 2 categories, our system reliably categorizes an advertiser to a relevant IAB category, based on the advertiser’s webpage.
4.4 User Data Protection Experiments
Recall from Section 3.4 that for our system, it is essential that personally identifiable information (PII) of a user on the Zorro system is not inadvertently “leaked" to advertisers. We begin by giving a brief overview of how advertisers track users across the web and then describe an experiment to verify that user data is indeed not leaked to an advertiser.
Tracking Techniques. As users surf the web, advertisers track users through three main methods (cf. ).
(1) Third Party Cookies: Primary method of tracking. Cookies on a user’s browser not placed there by the primary website he/she is visiting. (Footnote 9: First-party cookies are placed on a user’s browser by the primary website and are used to store site-specific settings such as login information.) Meant solely for tracking users across the web. Hence we identify and remove such cookies by checking that they are unrelated to the primary website.
(2) Cache: Images and other website-specific information are stored on the browser as cookies. Meant for more efficient page loads as website information need not be retrieved from the server. However, websites can encode information in such cookies to identify and track users. Easily solved by clearing the cache periodically. However, from the experimental results, this is not currently necessary as websites do not tend to forward this information on to advertisers.
(3) Flash Cookies: Cookies stored in flash player plugins (e.g. Adobe Flash Player). Flash cookies are harder to clear without leading to a significantly poorer user experience; specifically, clearing flash cookies would require clearing data from every plugin a user has installed. Again as with (2), from experimental results, this is not currently necessary as a minimal number of advertisers go through such a mechanism.
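The third-party check in (1) can be sketched as a comparison of domains. The code below is a simplified illustration under stated assumptions: the function names and the cookie representation (a dictionary with a `"domain"` key) are hypothetical, and taking the last two host labels is a naive approximation that breaks for suffixes such as `co.uk`; a production system would consult the Public Suffix List.

```python
def registrable_domain(host: str) -> str:
    """Naive approximation: the last two labels of the host name."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(cookie_domain: str, page_host: str) -> bool:
    """A cookie is third-party if its domain is unrelated to the visited site."""
    return registrable_domain(cookie_domain) != registrable_domain(page_host)


def strip_third_party(cookies: list, page_host: str) -> list:
    """Keep only cookies whose domain matches the primary website."""
    return [c for c in cookies if not is_third_party(c["domain"], page_host)]
```

Under this rule a cookie set for `.example.com` survives on `www.example.com`, while one set for an unrelated tracker domain is dropped.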
Overview of User Data Protection Experiment. We design a simple test to verify whether our system prevents inadvertent “leakage" of user data to advertisers. We do so by clicking on an ad on an initial website, and counting the number of times that ad is repeated on subsequent websites. We do this experiment with: (1) clearing no cookies; (2) blocking only third-party cookies; (3) blocking all types of cookies listed above. We automate this experiment by using the software packages Selenium and Pyautogui to automatically click on ads on the top 500 websites listed by Alexa (cf. ). On each website, our software clicks every “iframe" element and uses our ad identification module to classify whether it is an ad or not.
Result of User Data Protection Experiment. See Table 4.4 for results of the above experiment. As desired, we see that simply blocking third-party cookies reduces the percentage of repeat ads shown from 29% to 9% compared to no blocking. Blocking all cookies listed above does not further change the number of repeat ads seen, indicating that blocking Cache and Flash cookies is currently less important.
| ||Zorro Off||3rd Party||All|
|Total Ad Clicks||103||107||107|
|# Repeat Ads||30 (29%)||10 (9%)||10 (9%)|
5 Forward Looking Intent
Explicit User Intent in Display Advertising. A large loss of value in the current ecosystem is that user intent is not captured. For example, it is common in online advertising that once a user has made a retail purchase, ads for the purchased item are repeatedly shown to the user after the fact. This is clearly a source of large loss of revenue for advertisers and a great irritant to users.
In a future iteration of Zorro, we intend to provide an API to users that lets them indicate forward-looking intent for the category of ads they would like to see. Below we extend the LVM described in Section 3.1.2 to incorporate user intent. Further, we describe an algorithm to estimate CTR that exploits the additional structure when users provide intent. We then verify that our proposed algorithm does indeed exploit the additional structure when estimating CTR.
CTR Estimation with Intent. Let the set of categories a user can provide intent for be indexed by . Then we can model user intent by generalizing to an order-3 tensor, . Here the -th, “slice" of is a matrix of users and advertisers (appropriately grouped) conditioned on user intent being . As before, we get noisy, sparse observations from , which we denote as . Analogous to the matrix case, if , then User has not been exposed to Advertiser , while providing intent . And if , then indicates the empirical average of the number of times a user has clicked a particular ad while providing a specific intent. The aim is estimating from sparse, noisy observations in .
LVM for CTR estimation with Intent. Recall from Section 3.1.2 that we impose a LVM on the underlying mean matrix , where . We extend the LVM by assuming that , where , , and . , and are defined identically to that in Section 3.1.2. The extension we make to the model now is that is a latent factor that captures the effect of user intent in CTR estimation.
Algorithm for Tensor Based CTR Estimation. We know from an extension of the arguments made in  that if follows a LVM, then it is “approximately" low-rank. Thus we postulate that the flattened tensor, i.e. is also “approximately low-rank". Here is the matrix constructed by appending together the different “slices" of . Thus we propose the following simple algorithm:
(1) Construct from identically to how is defined; (2) Let ; (3) Declare
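The three steps above can be sketched in NumPy as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the tensor is laid out as (users, advertisers, intents), and the default estimator is a plain truncated SVD that ignores missing entries, standing in for whichever matrix estimation routine (e.g. ALS) is actually used.

```python
import numpy as np


def rank_k_estimate(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Stand-in matrix estimator: best rank-`rank` approximation via SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def estimate_with_intent(obs: np.ndarray, rank: int, estimator=rank_k_estimate):
    """Flatten a (users, advertisers, intents) tensor, estimate, and unflatten."""
    users, advertisers, intents = obs.shape
    # Place the `intents` slices side by side: shape (users, intents * advertisers).
    flat = np.concatenate([obs[:, :, k] for k in range(intents)], axis=1)
    flat_hat = estimator(flat, rank)
    # Split the columns back into slices and restack along the intent axis.
    slices = np.split(flat_hat, intents, axis=1)
    return np.stack(slices, axis=2)
```

Because all slices share the same row (user) factors after flattening, a single estimation call can pool information across intents, which is what a per-slice baseline cannot do.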
Experimental Verification of Algorithm. In the Avito dataset, fortunately for our purposes, there is user intent data provided (called “SearchCategoryID"). There are user search category IDs present in the dataset. Analogous to Section 4.1.1, where we create a matrix of user locations and ad category, we create an order-3 tensor of dimension , by incorporating user intent. We now present two experimental results.
In Figure 9, the rank of the flattened user-advertiser-intent matrix (i.e. ) induced from the tensor (i.e. ) is approximately 3! This justifies the generalized LVM above.
In Figure 9, we compare our proposed algorithm against a natural baseline; in the baseline, we run matrix estimation (using ALS) for each of the 40 “slices" separately. By doing so, we do not exploit any of the additional (latent) structure across the user intent dimension. We plot the $R^2$ for the algorithms as a function of the percentage of the tensor that is filled. As desired, we see that the out-of-sample $R^2$ for our algorithm (0.53) is significantly improved over the baseline (), indicating that our algorithm does effectively exploit the latent structure across all three dimensions.
In conclusion, we provide an “absolute" definition of the VoD, which is independent of a buyer’s model for VoD. We operationalize our proposed definition by relying on matrix estimation, and our experiments show that it faithfully (using out-of-sample performance) recovers the VoD. As future work, we intend to open-source and release a Zorro Chrome extension for public use. Lastly, we note that our proposed architecture, where we estimate the VoD in a model agnostic manner, can serve as a base to build a general-purpose data recommendation system, i.e., instead of buyers querying Zorro, Zorro recommends data to buyers if the estimated VoD is high.
- [1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
- [2] Anish Agarwal, Munther A. Dahleh, and Tuhin Sarkar. A marketplace for data: An algorithmic solution. CoRR, abs/1805.08125, 2018.
- [3] Howard Beales. The value of behavioral targeting. Network Advertising Initiative, 1, 2010.
- [4] Dirk Bergemann, Alessandro Bonatti, and Alex Smolin. The design and price of information. American Economic Review, 108(1):1–48, January 2018.
- [5] Hal Berghel. Equifax and the latest round of identity theft roulette. Computer, 50(12):72–76, 2017.
- [6] Internet Advertising Bureau. Iab tech lab content taxonomy. https://www.iab.com/guidelines/iab-quality-assurance-guidelines-qag-taxonomy/, 2018.
- [7] Carole Cadwalladr and Emma Graham-Harrison. Revealed: 50 million facebook profiles harvested for cambridge analytica in major data breach. The Guardian, 17, 2018.
- [8] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
- [9] Sourav Chatterjee et al. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177–214, 2015.
- [10] Mark A Davenport and Justin Romberg. An overview of low-rank matrix recovery from incomplete observations. IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016.
- [11] Magdalini Eirinaki and Michalis Vazirgiannis. Web mining for web personalization. ACM Trans. Internet Technol., 3(1):1–27, February 2003.
- [12] Phillipa Gill, Vijay Erramilli, Augustin Chaintreau, Balachander Krishnamurthy, Konstantina Papagiannaki, and Pablo Rodriguez. Follow the money: understanding economics of online aggregation and advertising. In Proceedings of the 2013 conference on Internet measurement conference, pages 141–148. ACM, 2013.
- [13] Umar Iqbal, Zubair Shafiq, and Zhiyun Qian. The ad wars: Retrospective measurement and analysis of anti-adblock filter lists. In Proceedings of the 2017 Internet Measurement Conference, IMC ’17, pages 171–183, New York, NY, USA, 2017. ACM.
- [14] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 665–674. ACM, 2013.
- [15] Kaggle. Kaggle avito context ad clicks. https://www.kaggle.com/c/avito-context-ad-clicks/, 2015.
- [16] J Kågström, R Karlsson, and E Kågström. uclassify web service. 2013.
- [17] Yashodhan Karandikar. Cse 255 assignment 1: Movie rating prediction using the movielens dataset.
- [18] Christina Lee, Yihua Li, Devavrat Shah, and Dogyoon Song. Blind regression: Nonparametric regression for latent variable models via collaborative filtering. In Advances in Neural Information Processing Systems, pages 2155–2163, 2016.
- [19] Boberdoo LLC. Ping post software – the basics. https://www.boberdoo.com/ping-post-ping-tree, 2019.
- [20] Wordstream LLC. Google ads benchmarks for your industry. https://www.wordstream.com/blog/ws/2016/02/29/google-adwords-industry-benchmarks, 2016.
- [21] Charles Mann. How click fraud could swallow the internet. Wired Magazine, 14(1), 2006.
- [22] Jonathan R Mayer and John C Mitchell. Third-party web tracking: Policy and technology. In 2012 IEEE Symposium on Security and Privacy, pages 413–427. IEEE, 2012.
- [23] Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, et al. Mllib: Machine learning in apache spark. The Journal of Machine Learning Research, 17(1):1235–1241, 2016.
- [24] Christopher Riederer, Vijay Erramilli, Augustin Chaintreau, Balachander Krishnamurthy, and Pablo Rodriguez. For sale: your data: by: you. In Proceedings of the 10th ACM WORKSHOP on Hot Topics in Networks, page 13. ACM, 2011.
- [25] David Silverman. Iab internet advertising revenue report. Interactive Advertising Bureau. New York, 2018.
- [26] Brave Software. Basic attention token (bat) white paper- blockchain based digital advertising. https://basicattentiontoken.org, 2018.
- [27] Alexa. Top 500 global sites, 2018.
- [28] Janice Y Tsai, Serge Egelman, Lorrie Cranor, and Alessandro Acquisti. The effect of online privacy information on purchasing behavior: An experimental study. Information Systems Research, 22(2):254–268, 2011.
- [29] Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. The numpy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22, 2011.
- [30] Jun Wang, Weinan Zhang, Shuai Yuan, et al. Display advertising with real-time bidding (rtb) and behavioural targeting. Foundations and Trends® in Information Retrieval, 11(4-5):297–435, 2017.
- [31] Jun Yan, Ning Liu, Gang Wang, Wen Zhang, Yun Jiang, and Zheng Chen. How much can behavioral targeting help online advertising? In Proceedings of the 18th international conference on World wide web, pages 261–270. ACM, 2009.
- [32] Weinan Zhang, Shuai Yuan, Jun Wang, and Xuehua Shen. Real-time bidding benchmarking with ipinyou dataset. arXiv preprint arXiv:1407.7073, 2014.