Abstract

Contextual bandit algorithms (CBAs) often rely on personal data to provide recommendations, meaning that potentially sensitive data from past interactions is used to personalize services for end-users. Running a local agent on the user's device protects the user's privacy by keeping the data local; however, the agent then requires longer to produce useful recommendations, as it does not leverage feedback from other users.

This paper proposes a technique we call Privacy-Preserving Bandits (P2B), a system that updates local agents by collecting feedback from other agents in a differentially-private manner. Comparisons of our proposed approach with a non-private, as well as a fully-private (local) system, show competitive performance on both synthetic benchmarks and real-world data. Specifically, we observed a decrease of 2.6% and 3.6% in multi-label classification accuracy, and a CTR increase of 0.0025 in online advertising, for our chosen privacy budget. These results suggest P2B is an effective approach to problems arising in on-device privacy-preserving personalization.


Privacy-Preserving Bandits

Mohammad Malekzadeh (Queen Mary University of London; this work was done during an internship at Brave Software), Dimitrios Athanasakis (Brave Software), Hamed Haddadi (Brave Software; Imperial College London), Benjamin Livshits (Brave Software; Imperial College London). Corresponding author: Mohammad Malekzadeh, m.malekzadeh@qmul.ac.uk.


1 Introduction

Personalization is the practice of tailoring a service to individual users by leveraging their interests, profile, and content-related information. For example, a personalized news service would learn to recommend news articles based on past articles the user has interacted with. Contextual Bandit Algorithms Li et al. (2010); Ronen et al. (2016) are frequently the workhorse of such personalized services. CBAs improve the quality of recommendations by dynamically adapting to users’ interests and requirements. The overall goal of the agent in this setting is to learn a policy that maximizes user engagement through time. In doing so, the agent collects potentially sensitive information about a user’s interests and past interactions, a fact that raises privacy concerns.

On-device recommendation addresses privacy concerns, arising from the processing of users' past interactions, by storing and processing any user feedback locally. As no personal data leaves the user's device, this approach naturally maintains a user's privacy (in a practical setting, the service maintainer still needs to do extra work to eliminate potential side-channel attacks). However, the on-device approach is detrimental to personalization, as it fails to incorporate useful information gleaned from other users, limiting its utility in making new recommendations and leading to the cold-start problem, where the quality of initial recommendations is often insufficient for real-life deployment.

Privacy-preserving data analysis techniques such as the Encode-Shuffle-Analyze (ESA) scheme implemented in PROCHLO Bittau et al. (2017) promise to safeguard user privacy, while preserving the quality of resulting recommendations, through a combination of cryptographic, trusted hardware, and statistical techniques. In this paper, we explore the question of how this approach can be used in a distributed personalization system to balance the quality of the received recommendations with maintaining the privacy of users.

This paper proposes P2B—Privacy-Preserving Bandits—a system where individual agents running locally on users’ devices are able to contribute useful feedback to other agents through centralized model updates while providing differential privacy guarantees Dwork & Roth (2013). To achieve this, P2B combines a CBA agent running locally on a user’s device with a privacy-preserving data collection scheme similar to ESA.

Specifically, this paper makes the following contributions:

  • Problem encoding. We propose simple methods for efficiently encoding feedback instances of a contextual bandit problem on the user's device. The encoding process combines simple clustering algorithms, such as k-means, with the favorable spatial structure of normalized contexts. We study the effects of this structure on both privacy and utility, and experimentally demonstrate how this encoding approach is competitive with methods that do not take privacy into consideration.

  • Privacy analysis. We examine the trade-offs between recommendation quality and privacy loss in the system by performing a differential privacy analysis of our proposal according to the crowd-blending privacy model Gehrke et al. (2012). Our analysis proves that P2B results in a small ε value, which can be directly quantified from the probability of an agent participating in the data collection mechanism. This demonstrates mathematically that P2B provides a concrete and desirable privacy guarantee.

  • Experimental results. We construct a testbed for the evaluation of our approach, where our proposal is evaluated on synthetic and real-world data. We include results on a synthetic benchmark, multi-label classification, and online advertising data. Our results experimentally demonstrate that P2B remains competitive in terms of predictive utility with approaches that provide no privacy protections. At the same time, it substantially outperforms on-device cold-start models that do not share data. Code and data are available at: https://github.com/mmalekzadeh/privacy-preserving-bandits.

2 Background

Contextual bandit algorithms present a principled approach that addresses the exploration-exploitation dilemma in dynamic environments, while utilizing additional, contextual information. In the contextual bandit setting, at time t the agent selects an action a_t based on the observed d-dimensional context vector x_t at that time. The agent then obtains the reward associated with the selected action, without observing the rewards associated with alternative actions.

Upper confidence bound (UCB) methods operate by computing upper bounds on the plausible rewards of each arm and consequently selecting the arm with the highest bound. In the implemented P2B, we use LinUCB Chu et al. (2011), which computes the UCB based on a linear combination of rewards encountered in previous rounds to propose the next action. In LinUCB, the balance between exploration and exploitation actions depends on α, the parameter controlling that trade-off.
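To make the arm-selection rule concrete, the following is a minimal LinUCB sketch. The class and variable names, and the default α, are our illustrative choices, not the paper's implementation:

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB agent in the spirit of Chu et al. (2011).
    Illustrative sketch only; names and defaults are our own."""

    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                          # ridge-regression estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Incorporate the observed reward for the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

After repeated positive rewards for one arm in a given context, the agent's selection shifts toward that arm, while the confidence term keeps unexplored arms in play.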

In the following, we provide some preliminary material on privacy-preserving data release mechanisms in relation to personalization with contextual bandits.

2.1 Differential Privacy Preliminaries

In the differential privacy framework Dwork & Roth (2013), a data sharing mechanism violates its users' privacy if data analysis can reveal whether a user is included in the data with a degree of confidence higher than the mechanism's bound.

Definition 1: Differentially-Private Data Sharing. Given ε, δ ≥ 0, we say a data sharing mechanism M satisfies (ε, δ)-differential privacy if for every pair of neighboring datasets D and D′ of context vectors, differing in only one context vector, and for all sets of outputs S,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.

2.2 Crowd-blending

The crowd-blending privacy model Gehrke et al. (2012) is a relaxation of differential privacy. In the crowd-blending framework, a user y blends with a crowd of k users when replacing y's data with that of any other individual in this crowd does not alter the results of statistical queries. A necessary condition for this approach is the existence of k − 1 other users that blend with user y. If the data does not contain a sufficient number of such individuals, the mechanism essentially ignores user y's data. Consequently, an encoding mechanism that satisfies crowd-blending privacy Gehrke et al. (2012) can be formalized as follows:

Definition 2: Crowd-Blending Encoding. Given an integer k ≥ 1 and ε ≥ 0, we say an encoding mechanism E satisfies (k, ε)-crowd-blending privacy if for every context vector x and for every context dataset D we have

|{x′ ∈ D : E(x′) ≈_ε E(x)}| ≥ k − 1, or E ignores x,

where |·| denotes the size of a set (note that D can be any context dataset not including x).

This means that for every context vector x, either its encoded value blends in a crowd of k − 1 other values in the released dataset, or the mechanism ignores x. In this setting, ε = 0 means that if E releases an encoded value for a given x, it should be exactly the same as the k − 1 other encoded values coming from other context vectors x′ ≠ x. An ε-differentially private mechanism also satisfies (k, ε)-crowd-blending privacy for every integer k ≥ 1 Gehrke et al. (2012).

An important aspect of crowd-blending privacy is that if random pre-sampling is used during data collection, before applying crowd-blending to the collected data, the combined pipeline provides zero-knowledge privacy Gehrke et al. (2011) and thus differential privacy Dwork & Roth (2013) as well. Intuitively, if a user y blends in a crowd of k users, then users similar to y can easily take the place of y. Therefore, whether user y is sampled or not has a negligible effect on the final output of the combined mechanism.

2.3 Differentially-private Data Collection

Privacy-preserving data collection has numerous applications. To collect browser settings data, Google has built RAPPOR into the Chrome browser Erlingsson et al. (2014). RAPPOR treats the data to be shared as a string and hashes it into a Bloom filter using several hash functions. It then randomizes the Bloom filter, a binary vector, and uses it as permanent data to generate instantaneous randomized responses for sharing with the server. RAPPOR can be used to collect aggregated statistics over a product's users; for example, given a huge number of users, RAPPOR can estimate the most frequent values, e.g. the most frequent homepage URLs. However, since the shared data is only useful for estimating aggregated statistics like means or frequencies, the utility of each shared report for training a model is often too low: the accuracy of models trained on the privatized data becomes unacceptable even with a large number of users.
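To make the randomization concrete, the following is a deliberately simplified single-bit caricature of RAPPOR's permanent randomized response, together with the standard de-biasing estimator. Real RAPPOR first hashes the value into a Bloom filter and adds a second, instantaneous randomization round; both are omitted here, and the parameter name f is our own:

```python
import random

def permanent_randomized_response(true_bits, f=0.5, rng=random.Random(0)):
    """Flip each bit: with probability f/2 force 1, with probability f/2
    force 0, otherwise report the true bit (survives with prob 1 - f)."""
    noisy = []
    for b in true_bits:
        r = rng.random()
        if r < f / 2:
            noisy.append(1)
        elif r < f:
            noisy.append(0)
        else:
            noisy.append(b)
    return noisy

def estimate_frequency(noisy_rate, f):
    """De-bias the observed rate of 1s across many users to estimate
    the true population frequency."""
    return (noisy_rate - f / 2) / (1 - f)
```

A single report reveals little about its owner, but aggregating many reports and inverting the noise recovers population-level frequencies, which is exactly why RAPPOR suits aggregate statistics rather than per-example model training.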

As an extension to RAPPOR, the ESA architecture Bittau et al. (2017) adds two more layers to the local differential privacy (LDP) layer, called the shuffler and the analyzer, which aim to obscure the identity of the data owner through oblivious shuffling with trusted hardware. The shuffler in the ESA architecture eliminates all metadata attached to users' reports, removing the possibility of linking a report to a single user during data collection. However, if users do not want to trust any other party, the provided privacy is the same as RAPPOR's.

3 Methodology

In P2B, every user runs their own CBA agent that works independently of any other agent. At time t the agent observes the current context x_t, proposes an action a_t, and consequently observes the reward associated with the action. As the interaction proceeds locally, we refer to agents running on a user's device as local agents. An agent may periodically elect to send some observed interactions to a data collection server. Using the collected data, the server updates a central model that is then propagated back to the users.

Figure 1 summarizes the overall architecture of the proposed framework. The rest of this section presents P2B’s operation and details its various components. Section 4 describes why P2B satisfies differential privacy, and Section 5 experimentally demonstrates how this approach improves the performance of local agents.

Figure 1: System architecture for P2B.

3.1 Randomized Data Reporting

Local agents participate in P2B through a randomized participation mechanism. After a number of interactions with the user, the local agent may randomly construct a payload, containing an encoded instance of interaction data, with probability p. Randomized participation is a crucial step in two ways. First, it raises the difficulty of re-identification attacks by randomizing the timing of recorded interactions. More crucially, the participation probability p, as a source of randomness, has a direct effect on the differential privacy parameters ε and δ that we will establish in Section 4. Briefly, by choosing the appropriate participation probability, one can achieve any desired level of privacy guarantee in P2B.
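The participation step above amounts to a single biased coin flip per candidate report; a minimal sketch (function and parameter names are ours):

```python
import random

def maybe_report(encoded_context, action, reward, p=0.5, rng=random.Random()):
    """Randomized data reporting: with probability p the agent emits one
    (code, action, reward) tuple, otherwise it reports nothing. p = 0.5
    mirrors the setting used throughout the paper's experiments."""
    if rng.random() < p:
        return (encoded_context, action, reward)
    return None
```

Because the coin flip happens independently per interaction, an observer of the transmission channel cannot tell which interaction, if any, produced a given report.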

3.2 Encoding

The agent encodes an instance of the context prior to data transmission. The encoding step acts as a function that maps a d-dimensional context vector x into a discrete code c, drawn from a fixed set of possible encoded contexts. Agents represent contexts as normalized vectors of fixed precision, using g decimal digits for each entry in the vector. An example of this representation is a normalized histogram, where the entries sum to 1 and are represented to a precision of g decimal digits. This combination of normalization and finite precision has two important characteristics.

First, it is possible to precisely enumerate all possible contexts according to the stars-and-bars pattern in combinatorics Benjamin & Quinn (2011). Specifically, the cardinality of the set X of d-dimensional normalized context vectors, using a finite precision of g decimal digits, is

|X| = C(10^g + d − 1, d − 1).    (1)

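The stars-and-bars count can be computed directly: a normalized vector with g-digit entries is a way of splitting 10^g unit "stars" among d coordinates.

```python
from math import comb

def context_space_size(d, g):
    """Number of d-dimensional vectors whose entries are non-negative
    multiples of 10**-g and sum to exactly 1. By stars and bars this
    equals C(10**g + d - 1, d - 1)."""
    return comb(10**g + d - 1, d - 1)
```

For example, with d = 2 and one decimal digit there are exactly the 11 vectors (0.0, 1.0), (0.1, 0.9), …, (1.0, 0.0).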
Secondly, sample points in the normalized vector space are distributed uniformly in a grid shape. Given that the agent tends to propose similar actions for similar contexts, neighboring context vectors can be encoded into the same context code c. While this approach may appear limiting, it generalizes to other instances where the context vectors are bounded and exhibit some degree of clustered structure.

Figure 2 provides a concrete example of the encoding process on a low-dimensional vector space being encoded into different codes. The cardinality of the encoding is important in establishing a desired utility-privacy trade-off. The encoding algorithm can be chosen depending on application requirements, but we limited our experimental evaluation to simple k-means clustering Sculley (2010).
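A minimal sketch of the encoding step, assuming precomputed centroids (e.g. from k-means over the enumerable context grid); the function and parameter names are our own:

```python
def encode(context, centroids, g=1):
    """Encode a raw context vector into a discrete code: normalize,
    round each entry to g decimal digits, then return the index of the
    nearest centroid (squared Euclidean distance)."""
    s = sum(context)
    x = [round(v / s, g) for v in context]   # normalized, fixed precision

    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(range(len(centroids)), key=lambda i: dist2(centroids[i]))
```

Because nearby context vectors round and cluster to the same code, many users map onto each code, which is what later lets a user blend into a crowd.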

After electing to participate and encoding an instance of a user interaction, the agent transmits the data, in the form of an (encoded context, action, reward) tuple, to the shuffler.

Figure 2: A normalized vector space, with its cardinality determined by the dimensionality and precision. The size of each circle indicates the value of the corresponding context vector; rectangles show a sample encoding of the vector space subject to a minimum cluster size.

3.3 Shuffler

The trusted shuffler is a critical part of any ESA architecture Bittau et al. (2017), and in P2B it is necessary for ensuring anonymization and crowd-blending. Following the PROCHLO implementation Bittau et al. (2017), the shuffler operates in a secure enclave on trusted hardware and performs three tasks:

  1. Anonymization: eliminating all received metadata (e.g. IP addresses) originating from local agents.

  2. Shuffling: gathering tuples received from different sources into batches and shuffling their order.

  3. Thresholding: removing tuples whose encoded context's frequency in the batch is less than a defined threshold.

After performing these three operations, the shuffler sends the refined batch to a server for updating the model. Upon receiving the new batch of training data, the server updates the global model based on the observed interaction data and distributes it to the local agents that request it.
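The shuffler's batch processing can be sketched as follows, assuming anonymization has already stripped the metadata so a report is just a bare (code, action, reward) tuple; names are ours:

```python
import random
from collections import Counter

def shuffle_batch(reports, threshold=10, rng=random.Random()):
    """Shuffle a batch of (code, action, reward) tuples to destroy
    ordering, then drop tuples whose context code appears fewer than
    `threshold` times in the batch (the crowd-blending floor)."""
    batch = list(reports)
    rng.shuffle(batch)                                # shuffling
    counts = Counter(code for code, _, _ in batch)    # per-code frequencies
    return [t for t in batch if counts[t[0]] >= threshold]  # thresholding
```

The threshold is what guarantees that any code reaching the server is shared by a crowd of reports, which is how k is enforced in practice.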

4 Privacy Analysis

This section analyzes how P2B ensures differential privacy through a combination of pre-sampling and crowd-blending. As context vectors are multi-dimensional real-valued vectors, we assume a context vector in its original form can be uniquely assigned to a specific user. P2B assumes no prior on the possible values of a context vector, meaning context vectors come from a uniform distribution over the underlying vector space. With these assumptions and the provided zero-knowledge privacy, P2B resists strong adversaries with any kind of prior knowledge or side information.

For the different context vectors counted in Equation (1), the optimal encoder (Section 3.2) distributes them evenly over the available codes. Consequently, when a total number of users participate in P2B, each sending a tuple to the server, the optimal encoder satisfies crowd-blending privacy with a k that grows in proportion to the number of participating users. In the case of a suboptimal encoder, we take k to be the size of the smallest cluster in the vector space. Furthermore, situations where the number of users is small, leading to a small k, can be addressed by adjusting the shuffler's threshold to reach the desired k. Essentially, k can always be matched to the shuffler's threshold.

Each user randomly participates in data sharing, transmitting a data tuple with probability p (Section 3.1), and then encodes the pre-sampled data. Following Gehrke et al. (2012), the combination of pre-sampling with probability p and crowd-blending leads to an (ε, δ)-differentially private mechanism, where ε is determined by the sampling probability p and δ is a constant that can be calculated based on the analysis provided by Gehrke et al. (2012).

Our encoding scheme provides ε = 0 for crowd-blending, as the encoded values for all members of a crowd are exactly the same. As a consequence, the ε parameter of the differential privacy of the entire data sharing mechanism depends entirely on the probability of participation p (Figure 3).

Figure 3: The privacy parameter ε as a function of the probability p of participating in the scheme.

For example, by trading half of the potential data (p = 1/2), P2B achieves an ε value that constitutes a strong privacy guarantee. The δ parameter, on the other hand, depends on both k and ε. To understand the effect of δ, Dwork & Roth (2013) prove that an (ε, δ)-differentially private mechanism ensures that for all neighboring datasets, the absolute value of the privacy loss is bounded by ε with probability at least 1 − δ. Therefore, by linearly increasing the crowd-blending parameter k, we can exponentially reduce the δ parameter.

5 Experimental Evaluation

This section explores P2B's privacy-utility trade-offs compared to competing non-private and completely private approaches. The experimental evaluation assumes the standard bandit setting, where the local agent learns a policy based on the contextual Linear Upper Confidence Bound algorithm (LinUCB) Chu et al. (2011); Li et al. (2010). For simplicity, throughout the experiments, the probability of randomized transmission by a local agent was set to p = 1/2, the rounding parameter of the encoder was held fixed, and the α parameter for LinUCB was set such that the local agent is equally likely to propose an exploration or exploitation action. The experiments compare the performance of the following three settings:

Cold. The local agent learns a policy without any communication to the server at any point. As there is no communication, this provides full privacy, but each agent has to learn a policy from a cold start.

Warm and Non-Private. In this setting, local agents communicate the observed context to the server in its original form. Thus, other agents are able to initialize their policy with a model received from the server and start adapting it to their local environment. This is called a warm and non-private start, and it represents the other end of the privacy/utility spectrum, with no privacy afforded to the users.

Warm and Private. In this setting, local agents communicate with the server using P2B. Once more, other agents initialize their internal policy with an updated model received from the server and start to adapt it to their local environment. We term this a warm and private start; the provided privacy guarantees function according to the analysis in Section 4.

These approaches are evaluated on synthetic benchmarks, two multi-label classification datasets, and an online advertising dataset.

5.1 Synthetic Preference Benchmark

These benchmarks consider the setting where there is a stochastic function that relates context vectors to the probability of a proposed action receiving a reward. Specifically, this function is the scaled softmax output of a matrix-vector product of the user preferences x with a randomly generated weight matrix W. We set the mean reward for a proposed action a given context vector x to s · softmax(Wx)_a + η, where softmax(Wx)_a is the a-th component of the softmax applied to Wx, s is a scaling factor with 0 < s ≤ 1, and η is random Gaussian noise.
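A minimal sketch of such a synthetic preference environment, with illustrative values for the scale and noise parameters (the paper fixes its own):

```python
import math
import random

def make_preference_env(d, n_actions, scale=0.5, noise_std=0.1, seed=0):
    """Build a stochastic preference function: mean reward of action a
    for context x is scale * softmax(W x)[a] plus Gaussian noise, for a
    randomly generated weight matrix W."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_actions)]

    def mean_reward(x, a):
        logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]   # numerically stable softmax
        p = exps[a] / sum(exps)
        return scale * p + rng.gauss(0, noise_std)

    return mean_reward
```

Because the softmax components sum to one, the noiseless rewards across all actions sum to exactly the scale factor, so no single action can dominate the reward mass.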

For all synthetic benchmarks the parameters were set as follows. The scaling factor s for the preference function and the variance of the Gaussian noise were held fixed. In terms of local agent settings, the agents use a fixed number of codes for encoding purposes and observe a number of local interactions before randomly transmitting an instance with probability p = 1/2. We varied the dimensionality d of the context vector, as well as the number of actions, over several ranges, and observe the average reward in each setting as the user population grows.

Our results in Figure 4 indicate that for a small number of interactions the cold start local model fails to learn a useful policy. By contrast, the warm models substantially improve as more user data becomes available. Utilizing prior interaction data in this setting, more than doubles the effectiveness of the learned policies, even for relatively small user populations. Overall, the non-private agents have a performance advantage, with the private version trailing.

Figure 5 illustrates how the dimensionality of the context vector affects the agent's expected reward. By increasing d from 6 to 20, the average reward decreases, as agents spend more time exploring the larger context space. P2B remains competitive with its non-private counterparts, and on occasion outperforms them, especially in low-dimensional context settings. The number of actions has a similar effect to the dimensionality of the context, as agents have to spend more time exploring suboptimal actions; the results of Figure 4 indicate that this is the case experimentally.

Figure 4: Synthetic benchmarks for three numbers of actions (top, middle, bottom), with the remaining parameters fixed. The expected reward in this setting has a strong dependence on the number of arms, as agents spend considerable time exploring alternative actions.
Figure 5: Average reward as the dimensionality of the context varies, with the remaining parameters fixed. As the dimensionality of the context increases, the average reward in this setting is reduced, as agents spend more time exploring their environment.
Figure 6: Multi-label dataset accuracy: (Top) MediaMill and (Bottom) TextMining. As local agents observe more interactions they obtain better accuracy. This has a multiplicative effect in the distributed settings, where agents reach the plateau much faster.

5.2 Multi-Label Classification

These experiments examine the performance of the different approaches on multi-label classification with bandit feedback. We consider two datasets, namely (1) MediaMill Snoek et al. (2006), a video classification dataset including 43,907 instances, 120 extracted features from each video, and 101 possible categories, and (2) a TextMining dataset Srivastava & Zane-Ulman (2005) including 28,596 instances, 500 extracted features from each text, and 22 possible categories.

P2B's performance in this setting was evaluated as follows. We consider a fixed number of local agents, each of which has access to, and is able to interact with, a small fraction of the dataset. In particular, every agent has access to a fixed number of samples, randomly selected without replacement from the entire dataset. 70% of agents participate in P2B, and we test the accuracy of the resulting models with the remaining 30%. For both datasets, the local agents use a fixed number of codes for encoding purposes.

This setting is particularly interesting as it allows us to study how the predictive performance of the system changes when the local agents interact more with the user. In terms of relative performance between the different approaches, the results in Figure 6 repeat the findings of the synthetic benchmarks. The non-private warm version is better than the private warm version, which is in turn better than the cold version that utilizes only local feedback. However, we also observe that, given enough interactions, the cold version produces steadily improving results. The centralized update mechanism tends to have a multiplicative effect, especially when there is little local interaction data, before reaching a plateau.

5.3 Online Advertising

Here, we consider an advertisement recommendation scenario where the action is to recommend an ad from one of the existing categories. We use a Criteo dataset from a Kaggle contest (https://labs.criteo.com/category/dataset/). It consists of a portion of Criteo's traffic over a period of 7 days, including 13 numerical features and 26 categorical features. For each record in the dataset there is a label indicating whether the user clicked on the recommended ad. The feature semantics are unknown, and the data publisher hashed the values of categorical features into 32 bits for anonymization purposes.

As the exact semantics of the features were not disclosed, we assume that the numerical features represent the user's context and the categorical features correspond to the type of the proposed product. For each sample in the dataset, we hash the values of the 26 categorical features into an integer value, which is then used as the product category to recommend. The hashing procedure operates as follows. First, the 26 categorical values are reduced to a single hashed value using feature hashing Weinberger et al. (2009). After hashing, the 40 most frequent hash codes are selected and converted into integer values between 1 and 40 based on their frequency (label 1 denotes the most frequent code, and so on). Finally, for evaluation we only use data samples having one of these 40 values as the product label and ignore the remaining data.
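The label-construction pipeline above can be sketched as follows. The bucket count and the use of md5 are our illustrative choices, not necessarily what the paper used:

```python
import hashlib
from collections import Counter

def hash_categoricals(cat_values, n_buckets=2**20):
    """Reduce one record's categorical features to a single hash code,
    in the spirit of feature hashing (Weinberger et al., 2009)."""
    joined = "|".join(str(v) for v in cat_values)
    return int(hashlib.md5(joined.encode()).hexdigest(), 16) % n_buckets

def top_k_labels(records, k=40):
    """Map the k most frequent hash codes to labels 1..k (label 1 is the
    most frequent); records with any other code are dropped later."""
    counts = Counter(hash_categoricals(r) for r in records)
    return {code: rank + 1
            for rank, (code, _) in enumerate(counts.most_common(k))}
```

Records whose hash code falls outside the top-k mapping are simply filtered out of the evaluation set, matching the procedure described above.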

During evaluation, the local agent observes the values of the numerical features as the context vector and, in response, takes one of the 40 possible actions. The agent obtains a reward of 1 if the proposed action matches the logged action and the logged action was clicked through.

The remaining experimental setup for the Criteo dataset is similar to the one used for the multi-label datasets. Specifically, we present experimental results (Figure 7) for the 40 possible actions, comparing two values of the encoding parameter. The sampling probability remains p = 1/2, and the shuffling threshold remains 10. This experimental setting has 3,000 agents, each of which accumulates 300 interactions.

The results in this setting are quite surprising, as private agents attain a better click-through rate than their non-private counterparts. Recent work Nasr et al. (2018) shows that privacy-preserving training of machine learning models can aid generalization as well as protect privacy. There are some factors that we believe help explain these experimental results.

Private agents use the encoded value as the context. As a result, the number of possible contexts is much smaller than when using the original d-dimensional context vector. As contextual bandits need to balance exploration with exploitation, especially in the early stages, a bandit with a smaller context size can reach better results more quickly. Furthermore, P2B clusters similar context vectors into the same categories, which also helps a private bandit act better in similar situations. This effect can also be seen for the lower-dimensional contexts of the synthetic benchmarks (Figure 5).

Figure 7: Criteo results for two values of the encoding parameter (top and bottom). The private and non-private agents obtain similar performance for low numbers of local interactions. As the number of local interactions increases, the private agents perform better than their non-private counterparts.

6 Related Work

User data fuels a variety of useful machine learning applications, but at the same time comes with a number of ethical and legal considerations. The quest to balance the seemingly conflicting concerns of privacy and utility have shaped the resurgence of distributed machine learning approaches that safeguard the privacy of user data. This section compares P2B to some recent proposals in this area.

On-device inference ensures that data does not need to leave the device for inference, an important step towards protecting privacy. Federated learning techniques extend this approach to on-device training through the idea of "bringing the code to the data, instead of the data to the code". Bonawitz et al. (2019) propose and evaluate a high-level system to perform federated learning across devices with varying availability. Their proposal relies on secure aggregation Bonawitz et al. (2017); McMahan & Ramage (2017) to ensure that users' data remain encrypted even in the server's memory.

Compared to P2B, federated learning is substantially more complicated in terms of orchestration. Relying on a local model training procedure puts some overhead on the user's device, and there is substantial communication complexity involved in exchanging model updates and batching training. By contrast, P2B puts almost no overhead on users' devices, as they only need to run a distance-preserving encoding algorithm Sculley (2010); Aghasaryan et al. (2013) with low complexity at inference time, and any data exchange happens opportunistically, without requiring any synchronization between clients.

The "Draw and Discard" machine learning system described in Pihur et al. (2018) is an interesting combination of bringing the code to the data with differential privacy guarantees. The proposed system encompasses decentralized model training for generalized linear models and maintains multiple instances of the model on the server side. The draw-and-discard approach consists of two components: on the server side, it sends a randomly chosen model instance to the client and, upon receiving an updated instance, randomly replaces one of the existing instances with it; on the user side, users update the linear model on their data and add Laplacian noise to the model parameters to satisfy differential privacy. Their privacy analysis is at the feature level and assumes the features of the model are all uncorrelated. As a consequence, to achieve model-level privacy, where most features are correlated with each other, substantially more noise needs to be added to the updated model before sending it to the server. P2B makes no assumption about correlation among context-vector features, and it provides a constant and similar privacy guarantee across all users. Our experiments only consider situations where users participate by sending a single data tuple. However, in the case of collecting n data tuples from each user, because of the composition property of differential privacy Dwork & Roth (2013), P2B still guarantees (nε, nδ)-differential privacy.

Shariff & Sheffet (2018) discuss that adhering to the standard notion of differential privacy requires ignoring the context and thus incurs linear regret. They therefore use a relaxed notion of DP, called joint differential privacy, that allows using the non-privatized data at time t, while guaranteeing that all interactions with all other users at later timepoints have very limited dependence on that user's data. Our approach guarantees differential privacy to all users, independent of their time of participation in P2B. Basu et al. (2019) argue that if a centralized bandit algorithm only observes a privatized version of the sequence of users' actions and responses under local differential privacy, the algorithm's regret scales by a multiplicative factor. Other LDP mechanisms proposed for bandit algorithms Tossou & Dimitrakakis (2017); Gajane et al. (2018) consider situations where only responses are considered private, without incorporating contextual information in the learning process.

7 Conclusions

This paper presents P2B, a privacy-preserving approach for machine learning with contextual bandits. We show experimentally that standalone agents trained on an individual’s data require a substantial amount of time to learn a useful recommendation policy. Our experiments show that sharing data between agents can substantially reduce the number of required local interactions in order to reach a useful local policy.

We introduced a simple distributed system for updating agents based on user interactions. In a series of experiments, P2B shows substantial improvements as the number of users and local interactions increases. With regard to P2B's privacy, the experiments demonstrate that the clustering-based encoding scheme is effective at encoding interactions. In addition, all the experiments relied on a sampling probability of 1/2, which results in a very competitive overall privacy budget.
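The effect of the sampling probability on the overall budget can be sketched with the standard amplification-by-subsampling bound. This is the generic lemma, not necessarily the exact bound used in P2B's analysis, and the function name is our own.

```python
import math


def amplified_epsilon(epsilon, q):
    # Privacy amplification by subsampling: applying an eps-DP
    # mechanism to a uniformly sampled q-fraction of the data yields
    # ln(1 + q * (exp(eps) - 1))-DP. With q = 1 the bound reduces to
    # the original eps; smaller q gives a strictly smaller budget.
    return math.log(1.0 + q * (math.exp(epsilon) - 1.0))
```

For example, with q = 1/2 the effective budget of a 1-DP mechanism drops to about 0.62 under this bound, which is the intuition behind discarding half of the data.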

Given that P2B essentially discards half of the data for the sake of privacy, the system performs surprisingly well. In the synthetic benchmarks, P2B traces a similar trend to its non-private counterpart. While the non-private agents have an advantage for large populations, there are more than a few cases where their private counterparts are competitive and even outperform them. Additionally, as the multi-label classification experiments (Figure 6) indicate, the performance gap between the non-private and differentially-private settings shrinks to within 3.6% and 2.6% for the text mining and MediaMill tasks respectively. P2B's performance on the Criteo dataset is a rare instance where a privacy-preserving regime actually improves generalization. The resulting CTR difference of 0.0025 in favor of the privacy-preserving approach is somewhat surprising, though our evaluation of the results gives reasons why it is not completely unexpected.

Overall, P2B represents a simple approach to privacy-preserving personalization. Our results indicate that P2B is a particularly viable option in settings where large user populations engage in large numbers of local interactions, as the performance penalty for privacy then becomes vanishingly small. As future work, we aim to study the behavior of further encoding approaches, as well as their interplay with alternative contextual bandit algorithms.


  • Aghasaryan et al. (2013) Aghasaryan, A., Bouzid, M., Kostadinov, D., Kothari, M., and Nandi, A. On the use of LSH for privacy preserving personalization. In Proceedings - 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2013, 2013. ISBN 9780769550220. doi: 10.1109/TrustCom.2013.46.
  • Basu et al. (2019) Basu, D., Dimitrakakis, C., and Tossou, A. Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost? arXiv preprint arXiv:1905.12298, 2019.
  • Benjamin & Quinn (2011) Benjamin, A. T. and Quinn, J. J. Proofs that really count: The art of combinatorial proof. 2011. ISBN 9781614442080. doi: 10.5948/9781614442080.
  • Bittau et al. (2017) Bittau, A., Erlingsson, Ú., Maniatis, P., Mironov, I., Raghunathan, A., Lie, D., Rudominer, M., Kode, U., Tinnes, J., and Seefeld, B. Prochlo: Strong Privacy for Analytics in the Crowd. 2017. ISBN 9781450350853. doi: 10.1145/3132747.3132769.
  • Bonawitz et al. (2017) Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., and Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ’17, pp. 1175–1191, 2017. ISSN 15437221. doi: 10.1145/3133956.3133982.
  • Bonawitz et al. (2019) Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konečný, J., Mazzocchi, S., McMahan, H. B., Van Overveldt, T., Petrou, D., Ramage, D., and Roselander, J. Towards Federated Learning at Scale: System Design. 2 2019.
  • Chu et al. (2011) Chu, W., Li, L., Reyzin, L., and Schapire, R. E. Contextual bandits with linear Payoff functions. In Journal of Machine Learning Research, 2011.
  • Dwork & Roth (2013) Dwork, C. and Roth, A. The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211–407, 2013. ISSN 1551-305X. doi: 10.1561/0400000042.
  • Erlingsson et al. (2014) Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. 2014. ISSN 15437221. doi: 10.1145/2660267.2660348.
  • Gajane et al. (2018) Gajane, P., Urvoy, T., and Kaufmann, E. Corrupt Bandits for Preserving Local Privacy. In Algorithmic Learning Theory, pp. 387–412, 2018.
  • Gehrke et al. (2011) Gehrke, J., Lui, E., and Pass, R. Towards privacy for social networks: A zero-knowledge based definition of privacy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011. ISBN 9783642195709. doi: 10.1007/978-3-642-19571-6_26.
  • Gehrke et al. (2012) Gehrke, J., Hay, M., Lui, E., and Pass, R. Crowd-blending privacy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012. ISBN 9783642320088. doi: 10.1007/978-3-642-32009-5_28.
  • Li et al. (2010) Li, L., Chu, W., Langford, J., and Schapire, R. E. A Contextual-Bandit Approach to Personalized News Article Recommendation. 2010. ISBN 9781605587998. doi: 10.1145/1772690.1772758.
  • McMahan & Ramage (2017) McMahan, B. and Ramage, D. Federated Learning : Collaborative Machine Learning without centralized training data. Post, 2017. ISSN 0017-3134. doi: 10.1080/00173130902749999.
  • Nasr et al. (2018) Nasr, M., Shokri, R., and Houmansadr, A. Machine learning with membership privacy using adversarial regularization. In Proceedings of the ACM Conference on Computer and Communications Security, 2018. ISBN 9781450356930. doi: 10.1145/3243734.3243855.
  • Pihur et al. (2018) Pihur, V., Korolova, A., Liu, F., Sankuratripati, S., Yung, M., Huang, D., and Zeng, R. Differentially-Private ”Draw and Discard” Machine Learning. 7 2018.
  • Ronen et al. (2016) Ronen, R., Yom-Tov, E., and Lavee, G. Recommendations meet web browsing: Enhancing collaborative filtering using internet browsing logs. 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, pp. 1230–1238, 2016. doi: 10.1109/ICDE.2016.7498327.
  • Sculley (2010) Sculley, D. Web-scale k-means clustering. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, 2010. ISBN 9781605587998. doi: 10.1145/1772690.1772862.
  • Shariff & Sheffet (2018) Shariff, R. and Sheffet, O. Differentially private contextual linear bandits. In Advances in Neural Information Processing Systems, pp. 4296–4306, 2018.
  • Snoek et al. (2006) Snoek, C. G., Worring, M., Van Gemert, J. C., Geusebroek, J. M., and Smeulders, A. W. The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the 14th Annual ACM International Conference on Multimedia, MM 2006, 2006. ISBN 1595934472. doi: 10.1145/1180639.1180727.
  • Srivastava & Zane-Ulman (2005) Srivastava, A. N. and Zane-Ulman, B. Discovering recurring anomalies in text reports regarding complex space systems. In IEEE Aerospace Conference Proceedings, 2005. ISBN 0780388704. doi: 10.1109/AERO.2005.1559692.
  • Tossou & Dimitrakakis (2017) Tossou, A. C. Y. and Dimitrakakis, C. Achieving Privacy in the Adversarial Multi-Armed Bandit. pp. 2653–2659, 2017.
  • Weinberger et al. (2009) Weinberger, K., Dasgupta, A., Langford, J., Smola, A., and Attenberg, J. Feature hashing for large scale multitask learning. In Proceedings of the 26th International Conference On Machine Learning, ICML 2009, 2009. ISBN 9781605585161.