Amplification by Shuffling:
From Local to Central Differential Privacy via Anonymity
Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users’ private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a user’s value.
More fundamentally, by building on the anonymity of the users’ reports, we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $\varepsilon$-local differential privacy will satisfy $(O(\varepsilon\sqrt{\log(1/\delta)/n}), \delta)$-central differential privacy. By this, we explain how the high noise and overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $\varepsilon$ would indicate, at least if reports are anonymized.
A frequent task in data analysis is the monitoring of the statistical properties of evolving data in a manner that requires repeated computation on the entire evolving dataset. Software applications commonly apply online monitoring, e.g., to establish trends in the software configurations or usage patterns. However, such monitoring may impact the privacy of software users, as it may directly or indirectly expose some of their sensitive attributes (e.g., their location, ethnicity, or gender), either completely or partially. To address this, recent work has proposed a number of mechanisms that provide users with strong privacy-protection guarantees in terms of differential privacy [DMNS06, Dwo06, KLN08]. In particular, mechanisms that provide local differential privacy (LDP) have been deployed by Google, Apple, and Microsoft [EPK14, App17, DKY17].
The popularity and practical adoption of LDP monitoring mechanisms stems largely from their simple trust model: for any single LDP report that a user contributes about one of their sensitive attributes, the user will benefit from strong differential privacy guarantees even if the user’s report becomes public and all other parties collude against them.
However, this apparent simplicity belies the realities of most monitoring applications. Software monitoring, in particular, nearly always involves repeated collection of reports over time, either on a regular basis or triggered by specific software activity; additionally, not just one, but multiple, software attributes may be monitored, and these attributes may all be correlated, as well as sensitive, and may also change in a correlated fashion. Hence, a user’s actual LDP privacy guarantees may be dramatically lower than they might appear, since LDP guarantees can be exponentially reduced by multiple correlated reports (see Tang et al. [TKB17] for a case study). Furthermore, mechanisms that defend against such privacy erosion (e.g., the memoized backstop in Google’s RAPPOR [EPK14]) achieve lower accuracy. Thus, to square this circle, and make good privacy/utility tradeoffs, practical deployments of privacy-preserving monitoring rely on additional assumptions, in particular the assumption that each user’s reports are anonymous at each timestep and unlinkable over time.
In this work, we formalize how the addition of anonymity guarantees can improve differential-privacy protection. Our direct motivation is the Encode, Shuffle, Analyze (ESA) architecture and Prochlo implementation of Bittau et al. [BEM17], which relies on an explicit intermediary that processes LDP reports from users to ensure their anonymity. The ESA architecture is designed to ensure a sufficient number of reports are collected at each timestep so that any one report can “hide in the crowd” and to ensure that those reports are randomly shuffled to eliminate any signal in their order. Furthermore, ESA will also ensure that reports are disassociated and stripped of any identifying metadata (such as originating IP addresses) to prevent the linking of any two reports to a single user, whether over time or within the collection at one timestep. Intuitively, the above steps taken to preserve anonymity will greatly increase the uncertainty in the analysis of users’ reports; however, when introducing ESA, Bittau et al. [BEM17] did not show how that uncertainty could be utilized to provide a tighter upper bound on the worst-case privacy loss.
Improving on this, this paper derives results that account for the benefits of anonymity to provide stronger differential privacy bounds. First, inspired by differential privacy under continual observation, we describe an algorithm for high-accuracy online monitoring of users’ data in the LDP model whose total privacy cost is polylogarithmic in the number of changes to each user’s value. This algorithm shows how LDP guarantees can be established in online monitoring, even when users report repeatedly, over multiple timesteps, and whether they report on the same value, highly-correlated values, or independently-drawn values.
Second, and more fundamentally, we show how—when each report is properly anonymized—any collection of LDP reports (like those at each timestep of our algorithm above) with sufficient privacy () is actually subject to much stronger privacy guarantees in the central model of differential privacy. This improved worst-case privacy guarantee is a direct result of the uncertainty induced by anonymity, which can prevent reports from any single user from being singled out or linked together, whether in the set of reports at each timestep, or over time.
1.1 Background and related work.
Differential privacy is a quantifiable measure of the stability of the output of a randomized mechanism in the face of changes to its input data—specifically, when the input from any single user is changed. (See Section 2 for a formal definition.)
Local differential privacy (LDP).
In the local differential privacy model, formally introduced by Kasiviswanathan et al. [KLN08], the randomized mechanism’s output is the transcript of the entire interaction between a specific user and a data collector (e.g., a monitoring system). Even if a user arbitrarily changes their privately held data, local differential privacy guarantees will ensure the stability of the distribution of all possible transcripts. Randomized response, a disclosure control technique from the 1960s [War65], is a particularly simple technique for designing single-round LDP mechanisms. Due to their attractive trust model, LDP mechanisms have recently received significant industrial adoption for the privacy-preserving collection of heavy hitters [EPK14, App17, DKY17], as well as increased academic attention [BS15, BNST17, QYY16, WBLJ17].
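As a minimal sketch of a single-round LDP mechanism, binary randomized response can be written as below. The function names and the debiasing estimator are illustrative assumptions and are not taken from any of the cited deployments; the sketch only shows that each user perturbs their own bit locally, and that the collector can still recover an approximate population frequency.

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def estimate_frequency(reports, epsilon):
    """Debias the mean of the noisy reports to estimate the fraction of users holding 1."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    mean = sum(reports) / len(reports)
    # E[report] = (1 - p_truth) + f * (2 * p_truth - 1), solved here for f.
    return (mean - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)
```

The ratio between the probabilities of any output under the two possible inputs is at most e^epsilon, which is what makes this randomizer epsilon-LDP.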
Anonymous data collection.
As a pragmatic means for reducing privacy risks, reports are typically anonymized and often aggregated in deployments of monitoring by careful operators (e.g., RAPPOR [EPK14])—even though anonymity is no privacy panacea [DSSU17, Dez18].
To guarantee anonymity of reports, multiple mechanisms have been developed. Many, like Tor [DMS04], are based on the ideas of cryptographic onion routing or mixnets, often trading off latency to offer much stronger guarantees [vdHLZZ15, TGL17, LGZ18]. Some, like those of Prochlo [BEM17], are based on oblivious shuffling, with trusted hardware and attestation used to increase assurance. Others make use of the techniques of secure multi-party computation, and can simultaneously aggregate reports and ensure their anonymity [CGB17, BIK17]. Which of these mechanisms is best used in practice is dictated by what trust model and assumptions apply to any specific deployment.
Central differential privacy.
The traditional, central model of differential privacy applies to a centrally-held dataset for which privacy is enforced by a trusted curator that mediates upon queries posed on the dataset by an untrusted analyst—with curators achieving differential privacy by adding uncertainty (e.g., random noise) to the answers for analysts’ queries. For differential privacy, answers to queries need only be stable with respect to changes in the data of a single user (or a few users); these may constitute only a small fraction of the whole, central dataset, which can greatly facilitate the establishment of differential privacy guarantees. Therefore, the central model can offer much better privacy/utility tradeoffs than the LDP setting. (In certain cases, the noise introduced by the curator may even be less than the uncertainty due to population sampling.)
Online monitoring with privacy was formalized by Dwork et al. as the problem of differential privacy under continual observation [DNPR10]. That work proposed privacy-preserving mechanisms in the central model of differential privacy, later extended and applied by Chan et al. [CSS11] and Jain et al. [JKT12].
Continual observations constitute a powerful attack vector. For example, Calandrino et al. [CKN11] describe an attack on a collaborative-based recommender system via passive measurements that effectively utilizes differencing between a sequence of updates.
In the local model, Google’s RAPPOR [EPK14] proposed a novel memoization approach as a backstop against privacy erosion over time: a noisy answer is memoized by the user and repeated in response to the same queries about a data value. To avoid creating a trackable identifier, RAPPOR additionally randomizes those responses, which may only improve privacy. (See Ding et al. [DKY17] for an alternative approach to memoization.) Although memoization prevents a single data value from ever being fully exposed, over time the privacy guarantees will weaken if answers are given about correlated data or sequences of data values that change in a non-independent fashion.
More recently, Tang et al. [TKB17] performed a detailed analysis of one real-world randomized response mechanism and examined its longitudinal privacy implications.
1.2 Our contributions
Motivated by the gap in accuracy between central and local differential privacy under continual observations, we describe a general technique for obtaining strong central differential privacy guarantees from (relatively) weak privacy guarantees in the local model. Specifically, our main technical contribution demonstrates that random shuffling of data points ensures that the reports from any LDP protocol will also satisfy central differential privacy at a per-report privacy-cost bound that is a factor of $\sqrt{n}$ lower than the LDP privacy bound established in the local model. Here, $n$ is the total number of reports, which can reach into the billions in practical deployments; therefore, the privacy amplification can be truly significant.
Privacy amplification by shuffling.
An immediate corollary of our amplification result is that composing client-side local differential privacy with server-side shuffling allows one to claim strong central differential privacy guarantees without any explicit server-side noise addition.
For this corollary to hold, the LDP reports must be amenable to anonymization via shuffling: the reports cannot have any discriminating characteristics and must, in particular, all utilize the same local randomizer (since the distribution of random values may be revealing). However, even if this assumption holds only partially—e.g., due to faults, software mistakes, or adversarial control of some reports—the guarantees degrade gracefully. Each set of users for which the corollary is applicable (e.g., that utilize the same local randomizer) will still be guaranteed a factor reduction in their worst-case privacy cost in the central model. (See Section 4.1 for a more detailed discussion.)
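The server-side step described above can be sketched as follows. This is only an illustration under stated assumptions: the report layout (a dict with hypothetical `ip` and `payload` fields) and the `min_batch_size` threshold are invented for the example, and a real shuffler would additionally need a trusted or oblivious execution environment.

```python
import random

def anonymize_batch(raw_reports, min_batch_size, rng=random):
    """Strip metadata and shuffle a batch of LDP reports.

    raw_reports: list of dicts with hypothetical keys "ip" and "payload".
    Returns None when the batch is too small for a report to hide in the crowd.
    """
    if len(raw_reports) < min_batch_size:
        return None
    # Keep only the randomized payload; identifying metadata is dropped.
    payloads = [report["payload"] for report in raw_reports]
    # A uniformly random permutation removes any signal in arrival order.
    rng.shuffle(payloads)
    return payloads
```

Note that no noise is added on the server: the only randomness beyond the clients' local randomizers is the permutation itself.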
It is instructive to compare our technique with privacy amplification by subsampling [KLN08]. As in the case of subsampling, we rely on the secrecy of the samples that are used in nested computations. However, unlike subsampling, shuffling, by itself, does not offer any differential privacy guarantees. Yet its combination with a locally differentially private mechanism has an effect that is essentially as strong as that achieved via known applications of subsampling [BST14, ACG16, BBG18]. An important advantage of our reduction over subsampling is that it can include all the data reports (exactly once) and hence need not modify the underlying statistics of the dataset.
In concurrent and independent work, Cheu et al. [CSU18] have also examined an augmented local model of differential privacy that includes an anonymous channel. In this model they demonstrate privacy amplification by the same factor for one-bit randomized response. The analysis in this case relies on a direct estimation of -divergence between two binomial distributions. This simple case was also the starting point of our work but its analysis is unrelated to the general case presented here.
We also remark that another recent privacy amplification technique, via contractive iteration [FMTT18] relies on additional properties of the algorithm and is not directly comparable to results in this work.
Lower bounds in the local model.
Our amplification result can be viewed, conversely, as giving a lower bound for the local model. Specifically, our reduction means that lower bounds in the central model translate—with a penalty factor in the privacy parameter—to a large class of local protocols. In particular, this suggests that our results in the local model are near-optimal unless the corresponding results in the central model can be improved.
LDP monitoring with longitudinal privacy guarantees.
We introduce an online monitoring protocol that guarantees longitudinal privacy to users that report over multiple timesteps, irrespective of whether their reports are about independent or correlated values. By utilizing our protocol, users need not worry about revealing too much over time, and if they are anonymous, their reports may additionally “hide in the crowd” and benefit by amplification-by-shuffling, at least at each timestep.
As a motivating task, we can consider the collection of global statistics from users’ mobile devices, e.g., about users’ adoption of software apps or the frequency of users’ international long-distance travel. This task is a natural fit for a continual-observations protocol with LDP guarantees—since both software use and travel can be highly privacy sensitive—and can be reduced to collecting a boolean value from each user (e.g., whether they are in a country far from home). However, our protocol can be extended to the collection of multi-valued data or even data strings by building on existing techniques [EPK14, App17, BNST17].
Concretely, we consider the collection of user statistics across time periods (e.g., for days) with each user changing their underlying boolean value at most times for some . This is the only assumption we place on the data collection task. For software adoption and international travel, the limit on the number of changes is quite natural. New software is adopted, and then (perhaps) discarded, with only moderate frequency; similarly, speed and distance limit the change rate for even the most ardent travelers. Formally, we show the following: Under the assumption stated above, one can estimate all the frequency statistics from users with error at most in the local differential privacy model.
Motivated by similar issues, in a recent work Joseph et al. [JRUW18] consider the problem of tracking a distribution that changes only a small number of times. Specifically, in their setting each user at each time step receives a random and independent sample from a distribution . It is assumed that changes at most times and they provide utility guarantees that scale logarithmically with the number of time steps. The key difference between this setting and ours is the assumption of independence of each user’s inputs across time steps. Under this assumption even a fixed unbiased coin flip would result in each user receiving values that change in most of the steps. Therefore the actual problems addressed in that work are unrelated to ours and we rely on different algorithmic techniques.
1.3 Organization of the paper
Section 2 introduces our notation and recalls the definition of differential privacy. Section 3 provides an algorithm for collecting statistics and proves its accuracy and privacy in the local model under continual observations. Section 4 contains the derivation of our amplification-by-shuffling result. Section 5 concludes with a discussion.
2 Technical Preliminaries and Background
Notation: For any , denotes the set . For a vector , denotes the value of its ’th coordinate. For an indexed sequence of elements and two indices we denote by the subsequence (or empty sequence if ). denotes , which is also the Hamming weight of when has entries in . All logarithms are meant to be base unless stated otherwise. For a finite set , let denote a sample from drawn uniformly at random.
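Definition 1 ($(\varepsilon, \delta)$-DP [DKM06]).
A randomized algorithm $\mathcal{A}\colon \mathcal{D}^n \to \mathcal{S}$ satisfies $(\varepsilon, \delta)$-differential privacy (DP) if for all $S \subseteq \mathcal{S}$ and for all adjacent $D, D' \in \mathcal{D}^n$ it holds that
$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D') \in S] + \delta.$$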
The notion of adjacent inputs is application-dependent, and it is typically taken to mean that and differ in one of the elements (that corresponds to the contributions of a single individual). We will also say that an algorithm satisfies differential privacy at index if the guarantees hold only for datasets that differ in the element at index . We assume the parameter to be a small constant, and is set to be much smaller than . We repeatedly use the (advanced) composition property of differential privacy.
If are randomized algorithms satisfying -DP, then their composition, defined as for satisfies differential privacy where . Moreover, can be chosen adaptively depending on the outputs of .
A straightforward corollary implies that for , a -fold composition of -DP algorithms leads to an overhead, i.e., .
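To make the benefit of advanced composition concrete, the sketch below compares it with basic (linear) composition. It assumes the standard form of the advanced composition bound, $\varepsilon' = \varepsilon\sqrt{2k\ln(1/\delta')} + k\varepsilon(e^{\varepsilon}-1)$ with total failure probability $k\delta + \delta'$; the function and parameter names (in particular `delta_slack` for $\delta'$) are illustrative.

```python
import math

def basic_composition(epsilon, delta, k):
    """Privacy parameters of k composed (epsilon, delta)-DP algorithms, linear bound."""
    return k * epsilon, k * delta

def advanced_composition(epsilon, delta, k, delta_slack):
    """Privacy parameters under the standard advanced composition bound.

    delta_slack is the extra failure probability (delta') traded for a
    sqrt(k) rather than linear dependence on the number of compositions.
    """
    eps_total = (epsilon * math.sqrt(2.0 * k * math.log(1.0 / delta_slack))
                 + k * epsilon * (math.exp(epsilon) - 1.0))
    delta_total = k * delta + delta_slack
    return eps_total, delta_total
```

For example, with `epsilon = 0.01` and `k = 10000`, basic composition gives a total of 100, whereas the advanced bound with `delta_slack = 1e-6` stays below 6.3.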
It will also be convenient to work with the notion of distance between distributions on which -DP is based more directly. We define it below and describe some of the properties we will use. Given two distributions and , we will say that they are -DP close, denoted by , if for all measurable , we have
For random variables , we write to mean that their corresponding distributions are -DP close. We use to mean that the random variables are identically distributed.
For distributions and , we write to denote the mixture distribution that samples from with probability and from with probability . The following properties are well-known properties of -DP.
The notion of -DP satisfies the following properties:
Let . Then for and , .
- Triangle inequality
Let and . Then .
Let and , then for any , it holds that .
3 Locally Private Protocol for Longitudinal Data
Recall the motivation for collecting statistics from user devices with the intent of tracking global trends. We remain in the local differential privacy model, but our protocol addresses the task of collecting reports from users to derive global statistics that are expected to change across time. We consider the simplified task of collecting a boolean value from each user, e.g., their device being at an international location, far from the user’s home. However, our protocol can be straightforwardly extended to collecting richer data, such as strings, by building on existing techniques [EPK14, App17, BNST17].
In what follows, we consider a natural model of collecting user statistics across time periods. We make two minimal assumptions: we are given the time horizon , or the number of time periods (or days) ahead of time, and each user changes their underlying data at most times. The first assumption is mild: a loose upper bound on suffices, and the error depends only polylogarithmically on the upper bound. The second assumption can be enforced at the client side to ensure privacy, while suffering some loss in accuracy.
Our approach is inspired by the work on privacy under continual observations by Dwork et al. [DNPR10], who give a (central) DP mechanism to maintain a counter that is incrementally updated in response to certain user-driven events. Its (central) differential privacy is defined with respect to a single increment, the so-called event-level privacy. The naïve solution of applying additive noise to all partial sums introduces error proportional to the square root of the time horizon. The key algorithmic contribution of Dwork et al. is an elegant aggregation scheme that reduces the problem of releasing partial sums to the problem of maintaining a binary tree of counters. By carefully correlating noise across updates, they reduce the error to polylogarithmic in the time horizon. (In related work, Chan et al. [CSS11] describe a post-processing procedure that guarantees output consistency; Xiao et al. [XWG11] present a conceptually similar algorithm framed as a basis transformation.)
We now define more formally the problem of collecting global statistics based on reports from users’ devices. Given a time horizon , we consider a population of users reporting a boolean value about their state at each time period . (Without loss of generality, we assume that is a power of 2.) Let denote the states of the ’th user across the time periods with at most changes. The task of collecting statistics requires the server to compute the sum for every time period .
For reasons that will become clear shortly, it is convenient to consider the setup where users report only changes to their state, i.e., a finite derivative of . Let denote the changes in the ’th user’s state between consecutive time periods. Our assumption implies that each has at most non-zero entries. It holds that for all . Let . For the collection task at hand, it suffices to estimate “running counts” or marginal sums .
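The change-based representation can be illustrated with a short sketch. It assumes, for the purpose of the example, that a user's state before the first time period is 0; the function names are hypothetical.

```python
def state_changes(states):
    """Convert a sequence of boolean states into per-period changes in {-1, 0, 1}.

    Assumes the state before the first time period is 0.
    """
    changes, previous = [], 0
    for state in states:
        changes.append(state - previous)
        previous = state
    return changes

def running_sums(changes):
    """Recover the state at each time period as the running sum of changes."""
    sums, total = [], 0
    for change in changes:
        total += change
        sums.append(total)
    return sums
```

For the states `[0, 1, 1, 0, 1]` the changes are `[0, 1, 0, -1, 1]`, with three non-zero entries, and their running sums reproduce the original states. Because the transformation is linear, summing changes over all users and then taking running sums yields the population counts.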
An online client-side algorithm for reporting statistics runs on each client device and produces an output for each time period. Correspondingly, the online server-side algorithm receives reports from clients and outputs estimates for the marginal at each time period .
To demonstrate the key techniques in the design of our algorithm, consider a version of the data collection task with every client’s data known ahead of time. Given for user , the client-side algorithm produces (up to) reports and the server computes estimates of the marginal sum for all . Our algorithm is based on the tree-based aggregation scheme [DNPR10, CSS11] used previously for releasing continual statistics over longitudinal data in the central model. Each client maintains a binary tree over the time steps to track the (up to) changes of their state. The binary tree ensures that each change affects only nodes of the tree. We extend the construction of Dwork et al. [DNPR10] in a natural manner to the local model by having each client maintain and report values in this binary tree with sub-sampling as follows.
In the beginning, the client samples uniformly from the ’th change they would like to report on. Changes other than the ’th one are ignored. The client builds a tree with leaves corresponding to an index vector capturing the ’th change ( everywhere except at the change). The rest of the nodes are populated with the sums of their respective subtrees. The client then chooses a random level of the tree to report on. Then, the client runs randomized response on each node of the selected level (with noise determined by ) and reports the level of the tree along with the randomized response value for each node. In actual implementations and our presentation of the protocol (Algorithm 1) the tree is never explicitly constructed. The state maintained by the client consists of just four integer values (, the level of the tree, and two counters).
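The offline version of the client procedure described above might look roughly like the following sketch. It is not the paper's Algorithm 1: the tree level is materialized explicitly for readability, the per-node randomizer (a sign-flip with parameter `epsilon / 2`, and a uniform sign for zero-valued nodes) is an assumed instantiation of randomized response, and all names are hypothetical.

```python
import math
import random

def client_report(changes, k, epsilon, rng=random):
    """Report one randomly chosen tree level for one randomly chosen change.

    changes: list whose length d is a power of two, entries in {-1, 0, 1},
             with at most k non-zero entries.
    Returns (level, noisy_values) with levels counted from 1 at the leaves.
    """
    d = len(changes)
    num_levels = d.bit_length()                  # levels 1 .. log2(d) + 1
    chosen_change = rng.randrange(k)             # which change to keep (0-based)
    level = rng.randrange(1, num_levels + 1)     # which level to report on

    # Index vector: zero everywhere except at the chosen change (if it exists).
    leaves = [0] * d
    nonzero = [t for t, x in enumerate(changes) if x != 0]
    if chosen_change < len(nonzero):
        leaves[nonzero[chosen_change]] = changes[nonzero[chosen_change]]

    # Node values at the chosen level are sums over blocks of 2^(level-1) leaves.
    width = 1 << (level - 1)
    node_values = [sum(leaves[i:i + width]) for i in range(0, d, width)]

    # Assumed randomizer: keep the sign of a non-zero node with probability
    # e^(eps/2) / (1 + e^(eps/2)); report a uniform sign for a zero node.
    p_keep = math.exp(epsilon / 2.0) / (1.0 + math.exp(epsilon / 2.0))
    noisy = []
    for value in node_values:
        if value == 0:
            noisy.append(rng.choice((-1, 1)))
        else:
            noisy.append(value if rng.random() < p_keep else -value)
    return level, noisy
```

Since only one change is kept, at most one node per level is non-zero, which is what keeps the sensitivity of each report small.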
The server accumulates reports from clients to compute an aggregate tree comprising sums of reports from all clients. To compute the marginal estimate for the time step , the server sums up the respective internal nodes whose subtrees form a disjoint cover of the interval and scales it up by the appropriate factor (to compensate for the client-side sampling).
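The disjoint cover used by the server can be computed from the binary representation of the time step. The sketch below assumes levels are counted from 1 at the leaves and that node `(level, i)` covers time steps `(i-1)*2^(level-1)+1` through `i*2^(level-1)`; the `scale` argument stands in for the compensation factor, whose exact value is not reproduced here, and all names are hypothetical.

```python
def dyadic_cover(t):
    """Return (level, index) pairs of tree nodes whose subtrees partition [1, t]."""
    nodes, covered = [], 0
    level = t.bit_length()
    while level >= 1:
        width = 1 << (level - 1)
        if t - covered >= width:
            # covered is a multiple of width, so this index is exact.
            nodes.append((level, covered // width + 1))
            covered += width
        level -= 1
    return nodes

def build_tree(leaves):
    """Build {(level, index): subtree sum} for a leaf list whose length is a power of two."""
    tree, level, values = {}, 1, list(leaves)
    while True:
        for index, value in enumerate(values, start=1):
            tree[(level, index)] = value
        if len(values) == 1:
            return tree
        values = [values[2 * i] + values[2 * i + 1] for i in range(len(values) // 2)]
        level += 1

def marginal_estimate(tree, t, scale=1.0):
    """Sum the nodes covering [1, t] and rescale to undo client-side sampling."""
    return scale * sum(tree[node] for node in dyadic_cover(t))
```

With an exact (noise-free) tree and `scale=1.0`, `marginal_estimate` returns the prefix sum of the leaves, and the cover never contains more than one node per level.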
To simplify the description of what follows, for a given (that is a power of two), we let (and variants such as and ) denote the ’th level of a balanced binary tree with nodes where leaves have level 1. We let denote the value , the number of nodes at level , and write (resp. ) to denote (resp. ). We let , for and denote the ’th node at level of the binary tree , and as the corresponding value stored at the node. Algorithm 1 describes the client-side algorithm to generate differentially-private reports and Algorithm 2 describes the server-side algorithm that collects these reports and estimates marginals for each time period. Theorems 5 and 6 state the privacy and utility guarantees of the algorithms.
Theorem 5 (Privacy).
The sequence of outputs of (Algorithm 1) satisfies -local differential privacy.
To show the local differential privacy of the outputs, we consider two aspects of the client’s reports: (1) the (randomized) values at the nodes output in Step 20, and (2) the timing of the report. The latter entirely depends on the choice of which is independent of the client’s underlying data record and hence does not affect the privacy analysis. Furthermore, is also independently sampled and for the rest of the proof, we fix and and focus only on the randomized values output in Step 20.
By construction, the client chooses only the ’th change to report on. This would imply that two inputs would affect at most two nodes at level with each of the values changing by at most one.
This bound on the sensitivity of the report enables us to apply standard arguments for randomized response mechanisms [EPK14] to Steps 17 and 18 to show that the noisy values satisfy -local differential privacy.
Theorem 6 (Utility).
For , with probability at least over the randomness of run on data records, the outputs of satisfy:
The leaves of the binary tree with nodes in Algorithms 1 and 2 comprise events in each of the time periods . The marginal counts and comprise the exact (resp., approximate) number of events observed by all clients in the time period .
The set is uniquely defined, i.e., it is independent of the order in which pairs of intervals are selected in the while loop.
The size of the set is at most .
The leaf nodes of the subtrees in partition .
The marginal counts totaling events in the interval can be computed by adding the corresponding nodes in the tree whose subtrees partition the interval .
It follows that the largest error in any is at most times the largest error in any of the subtree counts. We proceed to bound that error.
For a node in the tree, let denote the sum , i.e., the actual sum of values in the subtree at , summed over all . Let denote the contribution of client to . We will argue that is small with high probability, which would then imply the bound on .
Towards that goal, for a client , and node , let be the contribution of client to the sum computed by the server. Thus for any , this value is zero, and for , is a randomized response as defined in steps 14–18 of Algorithm 1. Clearly . We next argue that is equal to . Indeed, for a specific to have an effect on , we should have this be the ’th non-zero entry in (which happens with probability ). Further, should equal (which happens with probability ). Conditioned on these two events, the expected value of is determined in step 17 as multiplied by . When one of these two events fails to happen, the expected value of is determined in line 15, and is zero. It follows that the expectation of is as claimed and thus the expectation of is exactly .
To complete the proof, we note that is the sum of independent (for each ) random variables that come from the range . Standard Chernoff bounds then imply that differs from its expectation by at most , except with probability . This expectation is bounded by . Setting , and taking a union bound, we conclude that except with probability :
Scaling by , and multiplying by to account for the size of , we conclude that except with probability ,
The claim follows. ∎
4 Privacy Amplification via Shuffling
Local differential privacy holds against a strictly more powerful adversary than central differential privacy, namely one that can examine all communications with a particular user. What can we say about central DP guarantees of a protocol that satisfies -local DP? Since local DP implies central DP, this procedure satisfies -central DP. Moreover, without making additional assumptions, we cannot make any stronger statement than that. Indeed, the central mechanism may release the entire transcripts of its communications with all its users annotated with their identities, leaving their local differential privacy as the only check on information leakage via the mechanism’s outputs.
We show that, at a modest cost and with few changes to its data collection pipeline, the analyst can achieve much better central DP guarantees than those suggested by this conservative analysis. Specifically, by shuffling (applying a random permutation), the analyst may amplify local differential privacy by a factor of roughly $\sqrt{n}$, where $n$ is the number of reports.
To make the discussion more formal we define the following general form of locally differentially private algorithms that allows picking local randomizers sequentially on the basis of answers from the previous randomizers (Algorithm 3).
Our main amplification result is for Algorithm 4 that applies the local randomizers after randomly permuting the elements of the dataset. For algorithms that use a single fixed local randomizer it does not matter whether the data elements are shuffled before or after the application of the randomizer. We will discuss the corollaries of this result for algorithms that shuffle the responses later in this section.
Theorem 7 (Amplification by shuffling).
For a domain , let for (where is the range space of ) be a sequence of algorithms such that is -differentially private for all values of auxiliary inputs in . Let be the algorithm that given a dataset , samples a uniform random permutation over , then sequentially computes for and outputs (see Algorithm 4). For any , and , satisfies -differential privacy in the central model, where .
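The shuffled algorithm in the theorem can be sketched as follows. The sketch assumes each local randomizer is a callable taking the tuple of previous outputs (the auxiliary input) and one data element; this calling convention and the function name are illustrative choices rather than the paper's Algorithm 4 verbatim.

```python
import random

def shuffle_then_randomize(dataset, randomizers, rng=random):
    """Permute the dataset uniformly, then apply adaptive local randomizers in order.

    randomizers[i] is called as randomizers[i](previous_outputs, element), so it
    may depend on the outputs of all earlier randomizers.
    """
    permuted = list(dataset)
    rng.shuffle(permuted)                      # uniform random permutation
    outputs = []
    for randomizer, element in zip(randomizers, permuted):
        outputs.append(randomizer(tuple(outputs), element))
    return outputs
```

When every randomizer is the same function and ignores the auxiliary input, shuffling the inputs and shuffling the outputs give the same output distribution.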
We remark that while we state our amplification result for local randomizers operating on a single data element, the result extends immediately to arbitrary -DP algorithms that operate on disjoint batches of data elements.
A natural approach to proving this result is to use privacy amplification via subsampling (Lemma 4). At step in the algorithm, conditioned on the values of , the random variable is uniform over the remaining indices. Thus the ’th step will be -DP. The obvious issue with this argument is that it gives very little amplification for values of that are close to , falling short of our main goal. It is also unclear how to formalize the intuition behind this argument.
Instead our approach crucially relies on a reduction to analysis of the algorithm that swaps the first element in the dataset with a uniformly sampled element in the dataset before applying the local randomizers (Algorithm 5). We show that has the desired privacy parameters for the first element (that is, satisfies the guarantees of differential privacy only for pairs of datasets that differ in the first element). We then show that for every index , can be decomposed into a random permutation that maps element to be the first element followed by . This implies that the algorithm will satisfy differential privacy at .
To argue about the privacy properties of the we decompose it into a sequence of algorithms, each producing one output symbol (given the dataset and all the previous outputs). It is not hard to see that the output distribution of the ’th algorithm is a mixture , where does not depend on (the first element of the dataset) and is the output distribution of the ’th local randomizer applied to . We then demonstrate that the probability is upper bounded by . Hence, using amplification by subsampling, we obtain that is close to . The distribution does not depend on and therefore, by the same argument, (and hence ) is close to the output distribution of the ’th algorithm on any dataset that differs from only in the first element. Applying advanced composition to the sequence of algorithms gives the claim.
We now provide the full details of the proof starting with the description and analysis of .
Theorem 8 (Amplification by swapping).
For a domain , let for (where is the range space of ) be a sequence of algorithms such that is -differentially private for all values of auxiliary inputs in . Let be the algorithm that given a dataset , samples a uniform index , swaps element with element and then applies the local randomizers to the resulting dataset sequentially (see Algorithm 5). For any , and , satisfies -differential privacy at index in the central model, where .
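As a rough illustration of the swapping algorithm in Theorem 8, the sketch below swaps the first element with a uniformly chosen element and then applies the randomizers in order. The function names and the identity placeholder randomizer are hypothetical and serve only to make the effect of the swap visible; a real deployment would use a differentially private randomizer.

```python
import random


def swap_then_randomize(data, randomizers, rng):
    """Swap the first element with a uniformly chosen one, then randomize."""
    swapped = list(data)
    chosen = rng.randrange(len(swapped))  # uniform index; may be 0 (no-op swap)
    swapped[0], swapped[chosen] = swapped[chosen], swapped[0]
    outputs = []
    for i, x in enumerate(swapped):
        outputs.append(randomizers[i](tuple(outputs), x, rng))
    return outputs


def identity_randomizer(previous_outputs, x, rng):
    """Non-private placeholder used only to make the swap visible."""
    return x


data = ["a", "b", "c", "d", "e"]
out = swap_then_randomize(data, [identity_randomizer] * len(data), random.Random(3))
# Positions whose content changed: either none, or slot 0 and the chosen slot.
moved = [i for i, (before, after) in enumerate(zip(data, out)) if before != after]
```

Only the first element and the chosen element change position, which is why the guarantee of Theorem 8 is stated for the first index only.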
The algorithm defines a joint distribution between and the corresponding output sequence of , which we denote by . We first observe that can be seen as the output of a sequence of algorithms with conditionally independent randomness: for . On input and , produces a random sample from the distribution of conditioned on . The outputs of are given as the input to . By definition, this ensures that random bits used by are independent of those used by conditioned on the previous outputs. Therefore in order to upper bound the privacy parameters of we can analyze the privacy parameters of and apply the advanced composition theorem for differential privacy (Theorem 2).
Next we observe that, by the definition of , conditioned on the value of , is independent of . In particular, for , can equivalently be implemented as follows. First, sample an index from the distribution of conditioned on . Then, if output . Otherwise, output . To implement we sample uniformly from and then output .
We now prove that for every , is -differentially private at index . Let and be two datasets that differ in the first element. Let denote the input to . We denote by the probability distribution of , denote by (or ) the probability distribution of conditioned on (, respectively) and by the probability that (where is sampled from the distribution of conditioned on as described above). We also denote by and the corresponding quantities when is run on . By the definition, and .
For , is uniform over and hence . Further, is equal to (since both are equal to the output distribution of conditioned on ). By -local differential privacy of and quasi-convexity of -DP we obtain that . Therefore, privacy amplification by subsampling (Lemma 4) implies that . Similarly, we obtain that and therefore, by the triangle inequality, . In other words, is -differentially private at index .
For , we again observe that since in both cases the output is generated by . Similarly, -local differential privacy of implies that and .
We now claim that . To see this, we first observe that the condition is an event defined over the output space of . Conditioning on reduces to running on . Note that for , differs from in at most two positions. Therefore, by -differential privacy of and group privacy (e.g. [DR14]), we obtain that
By quasi-convexity of -DP we obtain that
This immediately implies our claim since
Privacy amplification by sampling implies that . Applying the same argument to and using the triangle inequality we get that .
Applying the advanced composition theorem for differential privacy with and steps we get that satisfies -DP at index 1 for
Note that for we get that and the first term
Using the fact that we have that . This implies that and using and , we get the following bound on the second term
Combining these terms gives the claimed bound. ∎
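The final step of the proof can be illustrated numerically. The sketch below assumes the standard form of the advanced composition theorem, ε√(2k ln(1/δ)) + kε(e^ε − 1), since Theorem 2 is not reproduced in this section. The per-step value `eps_step` is a hypothetical input of order 1/n and does not reproduce the constants of the proof.

```python
import math


def advanced_composition(eps_step, k, delta):
    """Total epsilon for k adaptive eps_step-DP steps with slack delta,
    using the standard advanced composition bound (assumed form)."""
    first_term = eps_step * math.sqrt(2.0 * k * math.log(1.0 / delta))
    second_term = k * eps_step * math.expm1(eps_step)
    return first_term + second_term


n = 1000
eps_step = 1.0 / n  # hypothetical per-step parameter of order 1/n
delta = 1e-6
total = advanced_composition(eps_step, n, delta)
basic = n * eps_step  # basic composition, for comparison
```

With per-step parameters of order 1/n, the first term scales like the square root of log(1/δ)/n and dominates the second term, whereas basic composition would give no decay in n at all.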
Finally, we describe a reduction of the analysis of to the analysis of .
Proof of Theorem 7.
Let and be two datasets of length that differ at some index . The algorithm can be seen as follows. We first pick a random one-to-one mapping from and let
Namely, we move to the first place and apply a random permutation to the remaining elements. In the second step we apply to . It is easy to see that for a randomly and uniformly chosen and uniformly chosen the distribution of is exactly a random and uniform permutation of elements in .
For a fixed mapping , the datasets and differ only in the element with index . Therefore for and given in Theorem 8. Using the quasi-convexity of -DP over a random choice of we obtain that . ∎
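The decomposition used in the proof of Theorem 7 can be checked exhaustively for small datasets. The sketch below, with the hypothetical helper `composed_orders`, enumerates every way of moving a fixed index to the first slot, permuting the remaining indices, and then swapping the first slot with each possible slot. It counts how often each final ordering arises.

```python
from collections import Counter
from itertools import permutations


def composed_orders(n, i):
    """Count the orderings produced by: move index i to the front, permute
    the remaining indices, then swap slot 0 with slot swap_idx."""
    rest = [j for j in range(n) if j != i]
    counts = Counter()
    for tail in permutations(rest):
        for swap_idx in range(n):
            order = [i, *tail]
            order[0], order[swap_idx] = order[swap_idx], order[0]
            counts[tuple(order)] += 1
    return counts


counts = composed_orders(n=4, i=2)
```

Every ordering appears exactly once, so choosing the mapping and the swap index uniformly at random yields a uniform random permutation, as the proof claims.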
4.1 Shuffling after local randomization
The proof of Theorem 7 relies crucially on shuffling the data elements before applying the local randomizers. However, implementing such an algorithm in a distributed system requires trusting a remote shuffler with sensitive user data, thus negating the key advantage of the LDP model. Conversely, even if shuffling is performed on a set of already-randomized LDP responses, no additional privacy guarantees will be achieved if some attribute of each report (e.g., the choice of a randomizer) can reveal the identity of the reporting user.
Fortunately, it is possible to design software systems where reports are randomized before shuffling and in which the reports coming from large groups of users are indistinguishable, e.g., because they apply the same local randomizer. In such constructions, each user’s report still has its privacy amplified, by a factor proportional to the square root of the cardinality of indistinguishable reports. This follows immediately from the fact that shuffling the responses from the same local randomizers is equivalent to first shuffling the data points and then applying the local randomizers.
We make this claim formal in the following corollary.
For a domain , let for (where is the range space of ) be a sequence of algorithms such that is -differentially private for all values of auxiliary inputs in . Let be the algorithm that given a dataset , computes , samples a random and uniform permutation and outputs . Let be any set of indices such that for all , . Then for , and and every , satisfies -differential privacy at index in the central model, where .
We note that for the conclusion of this corollary to hold it is not necessary to randomly permute all the randomized responses. It suffices to shuffle the elements of . We also clarify that for , by we mean that for all sequences and , the output distributions of and are identical (and, in particular, the output distribution does not depend on ).
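The equivalence behind this corollary can be verified exactly on a toy instance. The sketch below assumes a single fixed, non-adaptive binary randomized-response randomizer with keep probability `keep` (an illustrative parameter) and computes the exact output distribution of both orders of operations using rational arithmetic. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product


def rr_prob(x, z, keep):
    """Probability that randomized response maps input bit x to output z."""
    return keep if x == z else 1 - keep


def shuffle_then_randomize_dist(data, keep):
    """Exact output distribution when shuffling precedes randomization."""
    n = len(data)
    perms = list(permutations(range(n)))
    dist = defaultdict(Fraction)
    for perm in perms:
        for z in product((0, 1), repeat=n):
            p = Fraction(1, len(perms))
            for pos in range(n):
                p *= rr_prob(data[perm[pos]], z[pos], keep)
            dist[z] += p
    return dict(dist)


def randomize_then_shuffle_dist(data, keep):
    """Exact output distribution when randomization precedes shuffling."""
    n = len(data)
    perms = list(permutations(range(n)))
    dist = defaultdict(Fraction)
    for y in product((0, 1), repeat=n):
        p_y = Fraction(1)
        for j in range(n):
            p_y *= rr_prob(data[j], y[j], keep)
        for perm in perms:
            z = tuple(y[perm[pos]] for pos in range(n))
            dist[z] += p_y / len(perms)
    return dict(dist)


data = (0, 1, 1)
keep = Fraction(3, 4)
before = shuffle_then_randomize_dist(data, keep)
after = randomize_then_shuffle_dist(data, keep)
```

The two distributions coincide exactly for this instance. The check covers only the identical-randomizer case; when randomizers differ across users, the two orders are not interchangeable in general, which is why the corollary restricts attention to a set of indices sharing the same randomizer.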
4.2 Lower Bound for Local DP Protocols
The results of this section give us a natural and powerful way to prove lower bounds for protocols in the local differential privacy model. We can apply Theorem 7 in the reverse direction to roughly state that for any given problem, lower bounds on the error of (for some term that might depend on the parameters of the system) of an -centrally differentially private protocol translate to a lower bound on the error of any -locally differentially private protocol of the kind that our techniques apply to.
As an exercise, a lower bound of for the problem of collecting frequency statistics from users across time in the central DP framework with privacy guarantee directly implies that the result in Theorem 6 is tight. We note here that the results of Dwork et al. [DNPR10] do show a lower bound of for the setting when in the central DP framework. This strongly suggests that our bounds might be tight, but we cannot immediately use this lower bound as it is stated only for the pure -differential privacy regime. It is an open problem to extend these results to the approximate differential privacy regime.
5 Discussion and Future Work
Our amplification-by-shuffling result is encouraging, as it demonstrates that the formal guarantees of differential privacy can encompass intuitive privacy-enhancing techniques, such as the addition of anonymity, which are typically part of existing, best-practice privacy processes. By accounting for the uncertainty induced by anonymity, in the central differential privacy model the worst-case, per-user bound on privacy cost can be dramatically lowered.
Our result implies that industrial adoption of LDP-based mechanisms may have offered much stronger privacy guarantees than previously accounted for, since anonymization of telemetry reports is standard privacy practice in industry. This is gratifying, since the direct motivation for our work was to better understand the guarantees offered by one industrial privacy-protection mechanism: the Encode, Shuffle, Analyze (ESA) architecture and Prochlo implementation of Bittau et al. [BEM17].
However, there still remain gaps between our analysis and proposed practical, real-world mechanisms, such as those of the ESA architecture. In particular, our formalization assumes the user population to be static, which undoubtedly it is not. On a related note, our analysis assumes that (almost) all users send reports at each timestep and ignores the privacy implications of timing or traffic channels, although both must be considered, since reports may be triggered by privacy-sensitive events on users’ devices, and it is infeasible to send all possible reports at each timestep. The ESA architecture addresses traffic channels using large-scale batching and randomized thresholding of reports, with elision, but any benefits from that mitigation are not included in our analysis.
Finally, even though it is a key aspect of the ESA architecture, our analysis does not consider how users may fragment their sensitive information and at any timestep send multiple LDP reports, one for each fragment, knowing that each will be anonymous and unlinkable. The splitting of user data into carefully constructed fragments to increase users’ privacy has been explored for specific applications, e.g., by Fanti et al. [FPE16], who fragmented users’ string values into overlapping n-grams to bound sensitivity while enabling an aggregator to reconstruct popular user strings. Clearly, such fragmentation should be able to offer significantly improved privacy/utility tradeoffs, at least in the central model. However, in both the local and central models of differential privacy, the privacy implications of users sending LDP reports about disjoint, overlapping, or equivalent fragments of their sensitive information remain to be formally understood in general.
- [ACG16] Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proc. of the 2016 ACM SIGSAC Conf. on Computer and Communications Security (CCS), pages 308–318, 2016.
- [App17] Apple’s Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 1(9), December 2017.
- [BBG18] Borja Balle, Gilles Barthe, and Marco Gaboardi. Privacy amplification by subsampling: Tight analyses via couplings and divergences. CoRR, abs/1807.01647, 2018.
- [BEM17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In Proc. of the 26th ACM Symp. on Operating Systems Principles (SOSP), 2017.