# Local Differential Privacy for Evolving Data

###### Abstract

There are now several large-scale deployments of differential privacy used to collect statistical information about users. However, these deployments periodically recollect the data and recompute the statistics using algorithms designed for a single use. As a result, these systems do not provide meaningful privacy guarantees over long time scales. Moreover, existing techniques to mitigate this effect do not apply in the “local model” of differential privacy that these systems use.

In this paper, we introduce a new technique for local differential privacy that makes it possible to maintain up-to-date statistics over time, with privacy guarantees that degrade only in the number of changes in the underlying distribution rather than the number of collection periods. We use our technique for tracking a changing statistic in the setting where users are partitioned into an unknown collection of groups, and at every time period each user draws a single bit from a common (but changing) group-specific distribution. We also provide an application to frequency and heavy-hitter estimation.

## 1 Introduction

After over a decade of research, differential privacy [12] is moving from theory to practice, with notable deployments by Google [15, 6], Apple [2], Microsoft [10], and the U.S. Census Bureau [1]. These deployments have revealed gaps between existing theory and the needs of practitioners. For example, the bulk of the differential privacy literature has focused on the central model, in which user data is collected by a trusted aggregator who performs and publishes the results of a differentially private computation [11]. However, Google, Apple, and Microsoft have instead chosen to operate in the local model [15, 6, 2, 10], where users individually randomize their data on their own devices and send it to a potentially untrusted aggregator for analysis [18]. In addition, the academic literature has largely focused on algorithms for performing one-time computations, like estimating many statistical quantities [7, 22, 16] or training a classifier [18, 9, 4]. Industrial applications, however, have focused on tracking statistics about a user population, like the set of most frequently used emojis or words [2]. These statistics evolve over time and so must be recomputed periodically.

Together, the two problems of periodically recomputing a population statistic and operating in the local model pose a challenge. Naïvely repeating a differentially private computation causes the privacy loss to degrade as the square root of the number of recomputations, quickly leading to enormous values of $\epsilon$. This naïve strategy is what is used in practice [15, 6, 2]. As a result, Tang et al. [23] discovered that the privacy parameters guaranteed by Apple’s implementation of differentially private data collection can become unreasonably large even in relatively short time periods. (Footnote 1: Although the value of $\epsilon$ that Apple guarantees over the course of, say, a week is not meaningful on its own, Apple does take additional heuristic steps (as does Google) that make it difficult to combine user data from multiple data collections [2, 15, 6]. Thus, they may still provide a strong, if heuristic, privacy guarantee.) Published research on Google and Microsoft’s deployments suggests that they encounter similar issues [15, 6, 10].
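To make the accumulation concrete, the following rough sketch compares basic composition (privacy loss $k\epsilon$ after $k$ reports) with the advanced composition bound stated in Dwork and Roth [11], $\epsilon' = \epsilon\sqrt{2k\ln(1/\delta')} + k\epsilon(e^{\epsilon}-1)$. The function names, the per-report value $\epsilon = 0.1$, and $\delta' = 10^{-6}$ are illustrative assumptions, not values taken from any deployment.

```python
import math


def basic_composition(eps: float, k: int) -> float:
    """Privacy loss after k reports under basic (linear) composition."""
    return k * eps


def advanced_composition(eps: float, k: int, delta_prime: float) -> float:
    """Advanced composition bound: the leading term grows like sqrt(k)."""
    leading = eps * math.sqrt(2 * k * math.log(1 / delta_prime))
    lower_order = k * eps * (math.exp(eps) - 1)
    return leading + lower_order


# Illustrative assumption: one report per day for a year at eps = 0.1 each.
daily_eps, days = 0.1, 365
linear_loss = basic_composition(daily_eps, days)
sqrt_loss = advanced_composition(daily_eps, days, 1e-6)
```

Even under the square-root bound, the accumulated parameter in this toy setting is far larger than the per-report value, which is the effect the paragraph above describes.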

On inspection, the naïve strategy of regular statistical updates seems wasteful, since aggregate population statistics don’t change very frequently: we expect that the most frequently visited website today will typically be the same as it was yesterday. However, population statistics do eventually change, and if we only recompute them infrequently, then we can be too slow to notice these changes.

The central model of differential privacy allows for an elegant solution to this problem. For large classes of statistics, we can use the sparse vector technique [13, 22, 16, 11] to repeatedly perform computations on a dataset such that the error required for a fixed privacy level grows not with the number of recomputations, but with the number of times the computation’s outcome changes significantly. For statistics that are relatively stable over time, this technique dramatically reduces the overall error. Unfortunately, the sparse vector technique provably has no local analogue [18, 24].

In this paper we present a technique that makes it possible to repeatedly recompute a statistic with error that grows with the number of times the statistic changes significantly, rather than the number of times we recompute the current value of the statistic, all while satisfying local differential privacy. This technique allows for tracking of evolving local data in a way that makes it possible to quickly detect changes, at modest cost, so long as those changes are relatively infrequent. Our approach guarantees privacy under any conditions, and obtains good accuracy by leveraging three assumptions: (1) each user’s data comes from one of evolving distributions; (2) these distributions change relatively infrequently; and (3) users collect a certain amount of data during each reporting period, which we term an epoch. By varying the lengths of the epochs (for example, collecting reports hourly, daily, or weekly), we can trade off more frequent reports versus improved privacy and accuracy.

### 1.1 Our Results and Techniques

Although our techniques are rather general, we first focus our attention on the problem of privately estimating the average of bits, with one bit held by each user. This simple problem is widely applicable because most algorithms in the local model have the following structure: on each individual’s device, data records are translated into a short bit vector using sketching or hashing techniques. The bits in this vector are perturbed to ensure privacy using a technique called randomized response, and the perturbed vector is then sent to a server for analysis. The server collects the perturbed vectors, averages them, and produces a data structure encoding some interesting statistical information about the users as a whole. Thus many algorithms (for example, those based on statistical queries) can be implemented using just the simple primitive of estimating the average of bits.
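The primitive described above can be sketched as follows. This is a minimal illustration of standard randomized response on a single bit together with the server-side debiasing step; the function names and the choice of $\epsilon$ are assumptions made for the example, not the paper's pseudocode.

```python
import math
import random


def randomize_bit(bit: int, eps: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < keep_prob else 1 - bit


def estimate_mean(reports: list, eps: float) -> float:
    """Debias the average of randomized reports to estimate the true mean."""
    keep_prob = math.exp(eps) / (1.0 + math.exp(eps))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - keep_prob) + true_mean * (2 * keep_prob - 1)
    return (observed - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)
```

Each user would call `randomize_bit` on their own device, and the server would call `estimate_mean` on the collected reports; the estimate is unbiased, with noise that shrinks as the number of users grows.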

We analyze our algorithm in the following probabilistic model (see Section 3 for a formal description). The population of users has an unknown partition into subgroups, each of which has size at least , time proceeds in rounds, and in each round each user samples a private bit independently from their subgroup-specific distribution. The private data for each user consists of the vector of bits sampled across rounds, and our goal is to track the total population mean over time. We require that the estimate be private, and ask for the strong (and widely known) notion of local differential privacy—for every user, no matter how other users or the server behave, the distribution of the messages sent by that user should not depend significantly on that user’s private data.

To circumvent the limits of local differential privacy, we consider a slightly relaxed estimation guarantee. Specifically, we batch the rounds into epochs, each consisting of rounds, and aim in each epoch to estimate , the population-wide mean across the subgroups and rounds of epoch . Thus, any sufficiently large changes in this mean will be identified after the current epoch completes, which we think of as introducing a small “delay”.
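As a small illustration of the batching step, the sketch below averages a sequence of per-round population means into per-epoch means. The helper name `epoch_means` and the requirement that the number of rounds be a multiple of the epoch length are assumptions made for the example.

```python
def epoch_means(round_means: list, epoch_length: int) -> list:
    """Average per-round population means over consecutive epochs."""
    if epoch_length <= 0 or len(round_means) % epoch_length != 0:
        raise ValueError("number of rounds must be a positive multiple of epoch_length")
    return [
        sum(round_means[start:start + epoch_length]) / epoch_length
        for start in range(0, len(round_means), epoch_length)
    ]
```

A change that happens partway through an epoch only shows up fully in the following epoch's mean, which is the "delay" referred to above.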

Our main result is an algorithm that takes data generated according to our
model, guarantees a fixed level of local privacy that grows (up to a
certain point) with the number of distributional changes rather than the number
of epochs, and guarantees that the estimates released at the end of each epoch
are accurate up to error that scales sublinearly in and only
polylogarithmically with the total number of epochs . Our method
improves over the naïve solution of simply recomputing the statistic every
epoch (which would lead to either a privacy parameter or an error that scales
linearly with the number of epochs) and offers a quantifiable way to reason
about the interaction of collection times, reporting frequency, and accuracy.
We note that one can alternatively phrase our algorithm so as to have a fixed
error guarantee, and a privacy cost that scales dynamically with the
number of times the distribution changes. (Footnote 2: We can achieve a dynamic,
data-dependent privacy guarantee using the notion of ex-post
differential privacy [19], for example by using a so-called privacy
odometer [21].)

###### Theorem 1.1 (Protocol for Bernoulli Means, Informal Version of Theorem 4.3).

In the above model, there is an $\epsilon$-differentially private local protocol that achieves the following guarantee: with probability at least , while the total number of elapsed epochs where some subgroup distribution has changed is fewer than , the protocol outputs estimates where

where is the smallest subgroup size, is the number of users, is the chosen epoch length, and is the resulting number of epochs.

To interpret the theorem, consider the setting where there is only one subgroup and . Then to achieve error we need, ignoring factors, and that fewer than changes have occurred. We emphasize that our algorithm satisfies $\epsilon$-differential privacy for all inputs without a distributional assumption; only accuracy relies on distributional assumptions.

Finally, we demonstrate the versatility of our method as a basic building block in the design of locally differentially private algorithms for evolving data by applying it to the well-known heavy hitters problem. We do so by implementing a protocol due to Bassily and Smith [3] on top of our simple primitive. This adapted protocol enables us to efficiently track the evolution of histograms rather than single bits. Given a setting in which each user in each round independently draws an object from a discrete distribution over a dictionary of elements, we demonstrate how to maintain a frequency oracle (a computationally efficient representation of a histogram) for that dictionary with accuracy guarantees that degrade with the number of times the distribution over the dictionary changes, and only polylogarithmically with the number of rounds. We summarize this result below.

###### Theorem 1.2 (Protocol for Heavy-Hitters, Informal Version of Theorem 5.2).

In the above model, there is an $\epsilon$-differentially private local protocol that achieves the following guarantee: with probability at least , while the total number of elapsed epochs where some subgroup distribution has changed is fewer than , the protocol outputs estimate oracles such that

where is the number of users, is the smallest subgroup size, is the mean distribution over dictionary elements in epoch , is the number of dictionary elements, is the chosen epoch length, and is the resulting number of epochs.

### 1.2 Related Work

The problem of privacy loss for persistent local statistics has been recognized since at least the original work of Erlingsson et al. [15] on RAPPOR (the first large-scale deployment of differential privacy in the local model). Erlingsson et al. [15] offer a heuristic memoization technique that impedes a certain straightforward attack but does not prevent the differential privacy loss from accumulating linearly in the number of times the protocol is run. Ding et al. [10] give a formal analysis of a similar memoization technique, but the resulting guarantee is not differential privacy; instead, it is a privacy guarantee that depends on the behavior of other users, and may offer no protection to users with idiosyncratic device usage. In contrast, we give a worst-case differential privacy guarantee.

Our goal of maintaining a persistent statistical estimate is similar in spirit to the model of privacy under continual observation of Dwork et al. [14]. The canonical problem for differential privacy under continual observation is to maintain a running count of a stream of bits. However, the problem we study is quite different. In the continual observation model, new users are arriving, while existing users’ data does not change. In our model, each user receives new information in each round. (Also, we work in the local model, which has not been the focus of the work on continual observation.)

The local model was originally introduced by Kasiviswanathan et al. [18], and the canonical algorithmic task performed in this model has become frequency estimation (and heavy hitters estimation). This problem has been studied in a series of theoretical [17, 3, 5, 8, 2] and practical works [15, 6, 2].

## 2 Local Differential Privacy

We require that our algorithms satisfy local differential privacy. Informally, differential privacy is a property of an algorithm , and states that the distribution of the output of is insensitive to changes in one individual user’s input. Formally, for every pair of inputs differing on at most one user’s data, and every set of possible outputs , . A locally differentially private algorithm is one in which each user applies a private algorithm only to their data.
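Written out with assumed symbol names (an algorithm $A$, inputs $x$ and $x'$ differing on at most one user's data, a set of possible outputs $Z$, and privacy parameter $\epsilon$), the standard form of the pure differential privacy guarantee is the following. The symbol choices are illustrative and may differ from the notation used elsewhere in this paper.

```latex
\Pr[A(x) \in Z] \;\le\; e^{\epsilon} \cdot \Pr[A(x') \in Z]
```

Smaller values of $\epsilon$ force the two output distributions closer together, so an observer learns less about any one user's input.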

Most local protocols are non-interactive: each user sends a single message that is independent of all other messages. Non-interactive protocols can thus be written as for some function , where each algorithm satisfies -differential privacy. Our model requires an interactive protocol: each user sends several messages over time, and these may depend on the messages sent by other users. This necessitates a slightly more complex formalism.

We consider interactive protocols among the users and an additional center. Each user runs an algorithm (possibly taking a private input ) and the central party runs an algorithm . We let the random variable denote the transcript containing all the messages sent by all of the parties. For a given party and a set of algorithms , we let denote the messages sent by user in the transcript . As a shorthand we will write , since will be clear from context. We say that the protocol is locally differentially private if the function is differentially private for every user and every (possibly malicious) .

###### Definition 2.1.

An interactive protocol satisfies $\epsilon$-local differential privacy if for every user , every pair of inputs for user , and every set of algorithms , the resulting algorithm is $\epsilon$-differentially private. That is, for every set of possible outputs , .

## 3 Overview: The Thresh Algorithm

Here we present our main algorithm, Thresh. The algorithmic framework is quite general, but for this high level overview we focus on the simplest setting where the data is Bernoulli. In Section 4 we formally present the algorithm for the Bernoulli case and analyze the algorithm to prove Theorem 1.1.

To explain the algorithm we first recall the distributional model. There are users, each of whom belongs to a subgroup for some ; denote user ’s subgroup by . There are rounds divided into epochs of length , denoted . In each round , each user receives a private bit . We define the population-wide mean by . For each epoch , we use to denote the average of the Bernoulli means during epoch . After every epoch , our protocol outputs such that is small.

The goal of Thresh is to maintain some public global estimate of . After any epoch , we can update this global estimate using randomized response: each user submits some differentially private estimate of the mean of their data, and the center aggregates these responses to obtain . The main idea of Thresh is therefore to update the global estimate only when it might become sufficiently inaccurate, and thus take advantage of the possibly small number of changes in the underlying statistic . The challenge is to privately identify when to update the global estimate.

The Voting Protocol. We identify these “update needed” epochs through a voting protocol. Users will examine their data and privately publish a vote for whether they believe the global estimate needs to be updated. If enough users vote to update the global estimate, we do so (using randomized response). The challenge for the voting protocol is that users must use randomization in their voting process, to keep their data private, so we can only detect when a large number of users vote to update.

First, we describe a naïve voting protocol. In each epoch , each user computes a binary vote . This vote is 1 if the user concludes from their own samples that the global estimate is inaccurate, and 0 otherwise. Each user casts a noisy vote using randomized response accordingly, and if the sum of the noisy votes is large enough then a global update occurs.
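The naïve protocol can be sketched as below. Each binary vote is passed through randomized response, and the center triggers an update when the debiased fraction of "yes" votes exceeds a cutoff, which is equivalent to thresholding the sum of noisy votes. The names `noisy_vote` and `update_triggered` and the parameter `cutoff_fraction` are hypothetical, chosen only for this illustration.

```python
import math
import random


def noisy_vote(vote: int, eps: float, rng: random.Random) -> int:
    """Randomized response on a single binary vote."""
    keep_prob = math.exp(eps) / (1.0 + math.exp(eps))
    return vote if rng.random() < keep_prob else 1 - vote


def update_triggered(votes: list, eps: float, cutoff_fraction: float,
                     rng: random.Random) -> bool:
    """Return True when the debiased share of yes votes exceeds the cutoff.

    cutoff_fraction is a hypothetical tuning parameter, not a value from the paper.
    """
    keep_prob = math.exp(eps) / (1.0 + math.exp(eps))
    noisy = [noisy_vote(v, eps, rng) for v in votes]
    observed = sum(noisy) / len(noisy)
    debiased = (observed - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)
    return debiased > cutoff_fraction
```

Because every user randomizes in every epoch, each call spends privacy budget whether or not an update results.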

The problem with this protocol is that small changes in the underlying mean may cause some users to vote 1 and others to vote 0, and this might continue for an arbitrarily long time without inducing a global update. As a result, each voter “wastes” privacy in every epoch, which is what we wanted to avoid. We resolve this issue by having voters also estimate their confidence that a global update needs to occur, and vote proportionally. As a result, voters who have high confidence will lose more privacy per epoch (but the need for a global update will be detected quickly), while voters with low confidence will lose privacy more slowly (but may end up voting for many rounds).

In more detail, each user decides their confidence level by comparing —the difference between the local average of their data in the current epoch and their local average the last time a global update occurred—to a small set of discrete thresholds. Users with the highest confidence will vote in every epoch, whereas users with lower confidence will only vote in a small subset of the epochs. We construct these thresholds and subsets so that in expectation no user votes in more than a constant number of epochs before a global update occurs, and the amount of privacy each user loses from voting will not grow with the number of epochs required before an update occurs.
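One hypothetical way to discretize confidence and thin out voting epochs is sketched below. The ascending threshold list and the power-of-two voting schedule are assumptions made for illustration; they are not the specific thresholds or subsets that Thresh uses.

```python
def confidence_level(current_avg: float, last_update_avg: float,
                     thresholds: list) -> int:
    """Count how many (ascending, hypothetical) thresholds the gap exceeds."""
    gap = abs(current_avg - last_update_avg)
    return sum(1 for tau in thresholds if gap > tau)


def votes_yes(epoch: int, level: int, num_levels: int) -> bool:
    """Decide whether a user at this confidence level votes yes in this epoch.

    Assumed schedule: the highest level votes every epoch, and each lower
    level votes half as often as the level above it. Level 0 never votes yes.
    """
    if level == 0:
        return False
    period = 2 ** (num_levels - level)
    return epoch % period == 0
```

With three levels, for example, a user at level 3 would vote in every epoch, while a user at level 1 would vote only in epochs divisible by 4. The resulting truthful vote would still be passed through randomized response before being sent.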

## 4 Thresh: The Bernoulli Case

### 4.1 The Thresh Algorithm (Bernoulli Case)

We now present pseudocode for the algorithm Thresh, including both the general framework as well as the specific voting and randomized response procedures. We emphasize that the algorithm only touches user data through the subroutines Vote and Est, each of which accesses data from a single user in a single epoch. Thus, it is an online local protocol in which user ’s response in epoch depends only on user ’s data from epoch (and the global information that is viewable to all users). Thresh uses carefully chosen thresholds for to discretize the confidence of each user; see Section 4.2 for details on this choice.

We begin with a privacy guarantee for Thresh. Our proof uses the standard analysis of the privacy properties of randomized response, combined with the fact that users have a cap on the number of updates that prevents the privacy loss from accumulating. We remark that our privacy proof does not depend on distributional assumptions, which are only used for the proof of accuracy. We sketch a proof here. A full proof appears in Section A of the Appendix.

###### Theorem 4.1.

The protocol Thresh satisfies $\epsilon$-local differential privacy (Definition 2.1).

Proof Sketch: Naïvely applying composition would yield a privacy parameter that scales with . Instead, we will rely on our defined privacy “caps” and that limit the number of truthful votes and estimates each user sends. Intuitively, each user sends at most messages that depend on their private data, and the rest are sampled independently of their private data. Thus, we need only bound the privacy “cost” of each of these elements of a user’s transcript coming from a different distribution and bound the sum of the costs by .
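The role of a cap can be illustrated with the following sketch, in which a user sends a data-dependent "yes" message only while a counter remains below the cap, and otherwise samples from a fixed default distribution that does not depend on their data. The class name `CappedVoter` and its parameters are hypothetical; this is an illustration of the idea under those assumptions, not the paper's Vote subroutine.

```python
import math
import random


class CappedVoter:
    """Sends at most `cap` messages whose distribution depends on private data."""

    def __init__(self, eps_per_vote: float, cap: int, rng: random.Random):
        self.eps = eps_per_vote
        self.cap = cap
        self.used = 0
        self.rng = rng

    def send(self, wants_yes: bool) -> int:
        yes_prob = math.exp(self.eps) / (1.0 + math.exp(self.eps))
        default_prob = 1.0 / (1.0 + math.exp(self.eps))
        if wants_yes and self.used < self.cap:
            # Data-dependent message: counts against the cap.
            self.used += 1
            return 1 if self.rng.random() < yes_prob else 0
        # Default message: same distribution regardless of the user's data.
        return 1 if self.rng.random() < default_prob else 0
```

Once the counter reaches the cap, every later message is drawn from the default distribution, so the number of data-dependent messages never exceeds `cap` no matter how many epochs elapse.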

### 4.2 Accuracy Guarantee

Our accuracy theorem needs the following assumption on , the size of the smallest subgroup, to guarantee that a global update occurs whenever any subgroup has all of its member users vote “yes”.

###### Assumption 4.2.

.

This brings us to our accuracy theorem, followed by a proof sketch (see Appendix B for full details).

###### Theorem 4.3.

Given number of users , number of subgroups , smallest subgroup size , number of rounds , privacy parameter , and chosen epoch length and number of epochs , with probability at least , in every epoch such that fewer than

changes have occurred in epochs , Thresh outputs such that

Proof Sketch: We begin by proving correctness of the voting process. Lemma B.1 guarantees that if every user decides that their subgroup distribution has not changed then a global update does not occur, while Lemma B.2 guarantees that if every user in some subgroup decides that a change has occurred, then a global update occurs. By Lemma B.3, for each user the individual user estimates driving these voting decisions are themselves accurate to within of the true . Finally, Lemma B.4 guarantees that if every user decides that a change has occurred, then a global update occurs that produces a global estimate that is within of the true .

To reason about how distribution changes across multiple epochs affect Thresh, we use the preceding results to show that the number of global updates never exceeds the number of distribution changes (Lemma B.6). A more granular guarantee then bounds the number of changes any user detects—and the number of times they vote accordingly—as a function of the number of distribution changes (Lemma B.7). These results enable us, in Lemma B.8, to show that each change increases a user’s vote privacy cap by at most 2 and estimate privacy cap by at most 1.

Finally, recall that Thresh has each user compare their current local estimate to their local estimate in the last global update, , to decide how to vote, with higher thresholds for increasing the likelihood of a “yes” vote. This implies that if every user in some subgroup computes a local estimate such that exceeds the highest threshold, then every user sends a “yes” vote and a global update occurs, bringing with it the accuracy guarantee of Lemma B.4. In turn, we conclude that never exceeds the highest threshold, and our accuracy result follows.

We conclude this section with a few remarks about Thresh. First, while the provided guarantee depends on the number of changes of any size, one can easily modify Thresh to be robust to changes of size , paying an additive term in the accuracy. Second, the accuracy’s dependence on offers guidance for its selection: roughly, for desired accuracy , one should set . Finally, in practice one may want to periodically assess how many users have exhausted their privacy budgets, which we can achieve by extending the voting protocol to estimate the fraction of “live” users. We primarily view this as an implementation detail outside of the scope of the exact problem we study.

## 5 An Application to Heavy Hitters

We now use the methods developed above to obtain similar guarantees for a common problem in local differential privacy known as heavy hitters. In this problem each of users has their own dictionary value (e.g. their homepage), and an aggregator wants to learn the most frequently held dictionary values (e.g. the most common homepages), known as “heavy hitters”, while satisfying local differential privacy for each user. The heavy hitters problem has attracted significant attention [20, 17, 5, 8]. Here, we show how our techniques combine with an approach of Bassily and Smith [3] to obtain the first guarantees for heavy hitters on evolving data. We note that our focus on this approach is primarily for expositional clarity; our techniques should apply just as well to other variants, which can lead to more efficient algorithms.

### 5.1 Setting Overview

As in the simpler Bernoulli case, we divide time into rounds and epochs. Here, in each round each user draws a sample from a subgroup-specific distribution over the values in dictionary , and we track , the weighted average dictionary distribution in each epoch. We will require the same Assumption 4.2 as in the Bernoulli case, and we also suppose that , a common parameter regime for this problem.

In the Bernoulli case users could reason about the evolution of directly from their own samples in each epoch. Since it is reasonable to assume , this is no longer possible in our new setting— is too large an object to estimate from samples. However, we can instead adopt a common approach in heavy hitters estimation and examine a “smaller” object using a hash on dictionary samples. We will therefore have users reason about the distribution over hashes that induces, which is a much smaller joint distribution of (transformed) Bernoulli distributions. Our hope is that users can reliably “detect changes” by analyzing , and the feasibility of this method leans crucially on the properties of the hash in question.

### 5.2 Details and Privacy Guarantee

First we recall the details of the one-shot protocol from Bassily and Smith [3]. In their protocol, each user starts with a dictionary value with an associated basis vector . The user hashes this to a smaller vector using a (population-wide) , a Johnson-Lindenstrauss matrix where . The user then passes this hash to their own local randomizer , and the center aggregates these randomized values into a single which induces a frequency oracle.

We will modify this to produce a protocol HeavyThresh in the vein of Thresh. In each epoch each user computes an estimated histogram and then hashes it into , where (we assume the existence of a subroutine GenProj for generating ). Each user votes on whether or not a global update has occurred by comparing to their estimate during the most recent update, , in HeavyVote. Next, HeavyThresh aggregates these votes to determine whether or not a global update will occur. Depending on the result, each user then calls their own estimation subroutine HeavyEst and outputs a randomized response using accordingly. If a global update occurs, HeavyThresh aggregates these responses into a new published global hash ; if not, HeavyThresh publishes . In either case, HeavyThresh publishes as well. This final output is a frequency oracle, which for any offers an estimate of .
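The hashing and frequency-oracle steps can be illustrated without any privacy noise as follows. The sketch assumes a random sign matrix scaled by $1/\sqrt{w}$ as the Johnson-Lindenstrauss projection; `gen_proj` is a hypothetical stand-in for GenProj, and `hash_histogram` and `oracle_estimate` are likewise illustrative names. The local randomizer and the voting logic are omitted.

```python
import math
import random


def gen_proj(w: int, d: int, rng: random.Random) -> list:
    """Hypothetical stand-in for GenProj: a w x d matrix of +-1/sqrt(w) entries."""
    scale = 1.0 / math.sqrt(w)
    return [[scale if rng.random() < 0.5 else -scale for _ in range(d)]
            for _ in range(w)]


def hash_histogram(phi: list, histogram: list) -> list:
    """Project a length-d histogram down to w coordinates."""
    return [sum(row[v] * histogram[v] for v in range(len(histogram)))
            for row in phi]


def oracle_estimate(phi: list, hashed: list, value: int) -> float:
    """Estimate the frequency of `value` as the inner product of its hashed
    basis vector (column `value` of phi) with the hashed histogram."""
    return sum(row[value] * y for row, y in zip(phi, hashed))
```

Since the columns of such a matrix are nearly orthogonal with unit norm, the inner product approximately recovers the histogram entry for the queried value, with error that shrinks as the projection dimension grows.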

HeavyThresh will use the following thresholds with for . See Section 5.3 for details on this choice. Fortunately, the bulk of our analysis uses tools already developed either in Section 4 or Bassily and Smith [3]. Our privacy guarantee is almost immediate: since HeavyThresh shares its voting protocols with Thresh, the only additional analysis needed is for the estimation randomizer (Lemma C.1). Using the privacy of , privacy for HeavyThresh follows by the same proof as for the Bernoulli case.

###### Theorem 5.1.

HeavyThresh is $\epsilon$-local differentially private.

### 5.3 Accuracy Guarantee

As above, an accuracy guarantee for HeavyThresh unfolds along similar lines as that for Thresh, with additional recourse to results from Bassily and Smith [3]. We again require Assumption 4.2 and also assume (a weak assumption made primarily for neatness in Theorem 1.2). Our result and its proof sketch follow, with details and full pseudocode in Appendix Section D.

###### Theorem 5.2.

With probability at least , in every epoch such that fewer than

changes have occurred in epochs ,

Proof Sketch: Our proof is similar to that of Theorem 4.3 and proceeds by proving analogous versions of the same lemmas, with users checking for changes in the subgroup distribution over observed hashes rather than observed bits. This leads to one new wrinkle in our argument: once we show that the globally estimated hash is close to the true hash, we must translate from closeness of hashes to closeness of the distributions they induce (Lemma D.4). The rest of the proof, which uses guarantees of user estimate accuracy to (1) guarantee that sufficiently large changes cause global updates and (2) ensure that each change incurs a bounded privacy loss, largely follows that of Theorem 4.3.

## References

- Abowd [2016] John M. Abowd. The challenge of scientific reproducibility and privacy protection for statistical agencies. Census Scientific Advisory Committee, 2016.
- Apple [2017] Differential Privacy Team Apple. Learning with privacy at scale. Technical report, Apple, 2017.
- Bassily and Smith [2015] Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 127–135. ACM, 2015.
- Bassily et al. [2014] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Differentially private empirical risk minimization: Efficient algorithms and tight error bounds. arXiv preprint arXiv:1405.7085, 2014.
- Bassily et al. [2017] Raef Bassily, Uri Stemmer, and Abhradeep Guha Thakurta. Practical locally private heavy hitters. In Advances in Neural Information Processing Systems, pages 2285–2293, 2017.
- Bittau et al. [2017] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Usharsee Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. arXiv preprint arXiv:1710.00901, 2017.
- Blum et al. [2013] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM), 60(2):12, 2013.
- Bun et al. [2017] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy. arXiv preprint arXiv:1711.04740, 2017.
- Chaudhuri et al. [2011] Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.
- Ding et al. [2017] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Proceedings of Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
- Dwork and Roth [2014] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
- Dwork et al. [2006] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer, 2006.
- Dwork et al. [2009] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N Rothblum, and Salil Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 381–390. ACM, 2009.
- Dwork et al. [2010] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. Differential privacy under continual observation. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 715–724. ACM, 2010.
- Erlingsson et al. [2014] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067. ACM, 2014.
- Hardt and Rothblum [2010] Moritz Hardt and Guy N Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 61–70. IEEE, 2010.
- Hsu et al. [2012] Justin Hsu, Sanjeev Khanna, and Aaron Roth. Distributed private heavy hitters. In International Colloquium on Automata, Languages, and Programming, pages 461–472. Springer, 2012.
- Kasiviswanathan et al. [2008] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In Proceedings of the 49th Annual Symposium on Foundations of Computer Science, pages 531–540, 2008.
- Ligett et al. [2017] Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, and Steven Z Wu. Accuracy first: Selecting a differential privacy level for accuracy constrained erm. In Advances in Neural Information Processing Systems, pages 2563–2573, 2017.
- Mishra and Sandler [2006] Nina Mishra and Mark Sandler. Privacy via pseudorandom sketches. In Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 143–152. ACM, 2006.
- Rogers et al. [2016] Ryan M Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. Privacy odometers and filters: Pay-as-you-go composition. In Advances in Neural Information Processing Systems, pages 1921–1929, 2016.
- Roth and Roughgarden [2010] Aaron Roth and Tim Roughgarden. Interactive privacy via the median mechanism. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 765–774. ACM, 2010.
- Tang et al. [2017] Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and Xiaofeng Wang. Privacy loss in Apple’s implementation of differential privacy on macOS 10.12. arXiv preprint arXiv:1709.02753, 2017.
- Ullman [2018] Jonathan Ullman. Tight lower bounds for locally differentially private selection. Manuscript, 2018.

## Appendix A Missing Proofs from Section 4

###### Theorem A.1.

The protocol Thresh satisfies $\epsilon$-local differential privacy (Definition 2.1).

###### Proof.

To begin, we fix an arbitrary private user and arbitrary algorithms for the other users and for the center. Fix any pair of inputs for user . To ease notation, let and be the random variables corresponding to the messages sent by user in the protocol with inputs , respectively. Note that we drop the subscript , since user will be fixed throughout. To prove the theorem, it suffices to show

for every possible set of messages .

The structure of the transcripts is as follows: each epoch contributes two elements, first the vote (the output of ) and then the estimate (the output of ). So we can write and

For any execution of the protocol, we can partition the set of epochs into those epochs where in at least one of and user sets VoteYes to True, and those where VoteYes is False in both and ; similarly, we can partition into those epochs where SendEstimate is True in at least one of and and those where SendEstimate is False in both and .

Since every epoch in causes the counter to increase by , contains at most epochs from each of and , so .

For any , user will sample and from in both and . Thus

To complete the proof, we need to bound

which will hold because every factor in the product is at most and . To see why, consider some epoch . The first component of is the vote . The only two possibilities for how is chosen are or . One can easily verify that for any ,

We now consider the second component of , which is . As in the case, since every epoch in causes the counter to increase by , contains at most epochs from each of and , so .

When SendEstimate is False, then is sampled from

and when SendEstimate is True, then is sampled from

depending on the value of the private data , which lies in . Thus, the parameter in the Bernoulli distribution lies in . Again, one can easily verify that for any ,

Putting it together, we have

This completes the proof. ∎
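The argument above has two ingredients: each randomized message contributes a bounded likelihood-ratio factor, and only a bounded number of epochs contribute a factor different from 1. The following minimal sketch illustrates both with generic binary randomized response. The parameter `a` and the count `k` are hypothetical stand-ins, not the specific values used by Thresh.

```python
import math


def rr_prob_one(bit: int, a: float) -> float:
    """Probability that randomized response reports 1 for the given true bit.

    The true bit is kept with probability e^a / (e^a + 1) (assumed form).
    """
    keep = math.exp(a) / (math.exp(a) + 1.0)
    return keep if bit == 1 else 1.0 - keep


def max_likelihood_ratio(a: float) -> float:
    """Worst-case ratio P[output | bit 0] / P[output | bit 1] over both outputs."""
    worst = 0.0
    for out in (0, 1):
        probs = []
        for bit in (0, 1):
            p_one = rr_prob_one(bit, a)
            probs.append(p_one if out == 1 else 1.0 - p_one)
        worst = max(worst, probs[0] / probs[1], probs[1] / probs[0])
    return worst


def composed_bound(a: float, k: int) -> float:
    """Bound on a product of k factors, each of which is at most e^a."""
    return math.exp(a * k)
```

With these assumptions, a single report has worst-case ratio exactly e^a, and k such reports compose to at most e^(a·k), which is why capping the number of non-trivial epochs caps the total privacy loss.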

## Appendix B Missing Proofs From Section 4.2

We begin the proof of our accuracy guarantee with a series of lemmas. Recalling that we set

and

we start by showing that if every user votes that a change has not occurred, then a global update will not occur.

###### Lemma B.1.

With probability at least , in every epoch , if every user sets VoteYes False then GlobalUpdate False.

###### Proof.

Since every user sets VoteYes False, every is an iid draw from a Bern distribution. Thus a Chernoff bound says

Since GlobalUpdate, GlobalUpdate False. Union-bounding across epochs completes the proof. ∎
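The pattern in this proof (a concentration bound per epoch, followed by a union bound across epochs) can be sketched numerically. The code below uses Hoeffding's inequality as the concentration bound; the function names, the Bernoulli parameter `q`, and the failure probability `delta` are illustrative assumptions rather than the paper's settings.

```python
import math
import random


def hoeffding_radius(n: int, delta: float) -> float:
    """Radius t with P(|S - E[S]| > t) <= delta for a sum S of n iid [0, 1] variables.

    Obtained by solving 2 * exp(-2 t^2 / n) = delta for t.
    """
    return math.sqrt(n * math.log(2.0 / delta) / 2.0)


def count_threshold_violations(n: int, q: float, epochs: int,
                               delta: float, seed: int = 0) -> int:
    """Simulate `epochs` rounds of n iid Bernoulli(q) votes.

    Returns the number of rounds in which the vote total leaves the
    Hoeffding interval. The per-round failure probability is delta / epochs,
    so by a union bound all rounds stay inside with probability >= 1 - delta.
    """
    rng = random.Random(seed)
    radius = hoeffding_radius(n, delta / epochs)
    violations = 0
    for _ in range(epochs):
        total = sum(1 for _ in range(n) if rng.random() < q)
        if abs(total - n * q) > radius:
            violations += 1
    return violations
```

Calling `count_threshold_violations(500, 0.5, 20, 0.05)` returns the number of simulated epochs in which the aggregate vote would cross a threshold placed at the Hoeffding radius; the bound predicts that this is zero with probability at least 0.95.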

Similarly, we also want to ensure that if every user in some subgroup votes that an update has occurred then a global update will indeed occur.

###### Lemma B.2.

With probability at least , in every epoch , if there is a subgroup where every user sets VoteYes True then GlobalUpdate True.

###### Proof.

Since , Chernoff bounds imply that the aggregate vote satisfies

Recalling that GlobalUpdate True if and only if , it suffices to show that

Rearranging, it is enough to show that

and using the fact that it is enough that

which follows from our setting of . Union-bounding across subgroups and epochs completes the proof. ∎

We now show that every user in every epoch obtains an estimate of of bounded inaccuracy. This will enable us to—among other things—guarantee that users do not send “false positive” votes.

###### Lemma B.3.

With probability at least , in each epoch each user has

###### Proof.

, so by an additive Chernoff bound

A union bound across users and epochs then completes the proof. ∎

Next, in those epochs in which a global update occurs and no user has hit their estimation privacy cap , in the interest of asymptotic optimality we want to obtain a similar error for the resulting collated estimate .

###### Lemma B.4.

With probability at least , in every epoch where every user sets SendEstimate True,

###### Proof.

Since every user sets SendEstimate True we know that for all

so

Since is an average of -valued random variables, we transform it into the -valued random variable

Applying an additive Chernoff bound as above yields

which implies that

Similarly, as ,

Combining these results in the triangle inequality yields that with probability at least

Since , this implies that

so to get

it is enough for

Substituting in our setting of

and union-bounding over epochs completes the proof. ∎
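The proof above debiases an average of randomized, signed reports and then applies a concentration bound after mapping the signed values to a unit-interval scale. The sketch below shows a generic version of that debiasing step for binary randomized response. It is an illustration under assumed notation (privacy parameter `a`, reports in {-1, +1}), not necessarily the exact estimator computed by Thresh.

```python
import math
import random


def randomize(bit: int, a: float, rng: random.Random) -> int:
    """Report the bit as a sign in {-1, +1}, flipped with probability 1 / (e^a + 1)."""
    keep = math.exp(a) / (math.exp(a) + 1.0)
    signed = 2 * bit - 1
    return signed if rng.random() < keep else -signed


def debiased_mean(reports: list, a: float) -> float:
    """Estimate the mean of the true bits from signed randomized reports.

    E[report] = ((e^a - 1) / (e^a + 1)) * (2 * bit - 1), so we rescale the
    average report by the inverse factor and map [-1, 1] back to [0, 1].
    """
    scale = (math.exp(a) + 1.0) / (math.exp(a) - 1.0)
    signed_mean = scale * sum(reports) / len(reports)
    return (signed_mean + 1.0) / 2.0
```

The rescaling factor (e^a + 1) / (e^a - 1) grows as `a` shrinks, which is the usual reason that the error of such estimates scales inversely with the privacy parameter.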

Finally, we use the above lemmas to reason about how long users’ privacy budgets last. We’ll first define a useful term for this claim.

###### Definition B.5.

We say a change occurs in epoch if there exists subgroup such that . Given changes and where , we say that and are adjacent changes if there does not exist a change such that .
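As a small illustration of the adjacency notion in Definition B.5, the following sketch takes the epochs at which changes occur and returns the pairs of adjacent changes. How a change is detected is left to the caller; the function name and the list-of-integers input format are assumptions made for this example.

```python
def adjacent_changes(change_epochs):
    """Return pairs (t1, t2) of change epochs with t1 < t2 and no change strictly between."""
    ordered = sorted(set(change_epochs))
    # Consecutive entries of the sorted list have no change epoch between them.
    return list(zip(ordered, ordered[1:]))
```

For example, changes at epochs 2, 7, and 11 yield the adjacent pairs (2, 7) and (7, 11), while (2, 11) is not adjacent because the change at epoch 7 lies between them.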

This lets us prove the following lemma bounding the frequency of global updates.

###### Lemma B.6.

With probability at least , given adjacent changes and , GlobalUpdate True in at most one epoch .

###### Proof.

First, with probability at least all of the preceding lemmas in this section apply, and we condition on them for the remainder of this proof.

Assume instead that GlobalUpdate True and GlobalUpdate True as well for , and that GlobalUpdate False for all . Recall that by Lemma B.1, if in epoch every user sets VoteYes False then

which means GlobalUpdate False. Therefore since we know GlobalUpdate True, it follows that at least one user sets VoteYes True. By the thresholding structure of Thresh, this implies that