The Power of The Hybrid Model for Mean Estimation
In this work we explore the power of the hybrid model of differential privacy (DP) proposed in (Avent et al., 2017), where some users desire the guarantees of the local model of DP and others are content with receiving the trusted curator model guarantees. In particular, we study the accuracy of mean estimation algorithms for arbitrary distributions in bounded support. We show that a hybrid mechanism which combines the sample mean estimates obtained from the two groups in an optimally weighted convex combination performs a constant factor better for a wide range of sample sizes than natural benchmarks. We analyze how this improvement factor is parameterized by the problem setting and how it varies with sample size.
Yatharth Dubey (School of Industrial and Systems Engineering, Georgia Institute of Technology; work done while visiting University of Southern California), firstname.lastname@example.org
Aleksandra Korolova (Department of Computer Science, University of Southern California), email@example.com
Accepted as a contributed talk at the Workshop on Privacy Preserving Machine Learning, 32nd NeurIPS, 2018.
1 Introduction
Differential privacy, introduced by (Dwork et al., 2006), has become the de facto standard of privacy in the computer science literature dealing with machine learning and statistical data analysis. The literature traditionally considers two models of trust: the trusted curator model (TCM) and the local model (LM). In the trusted curator model, the analyst sees the true sample data and is able to calibrate the noise needed to achieve DP to the sensitivity of the query. In the local model, the analyst is only able to access samples through some locally randomizing oracle that ensures DP for each sample before it reaches the analyst. This means that each sample is received with noise calibrated to the sensitivity of the domain. It is well understood theoretically and empirically that accuracy guarantees are far better in the trusted curator model than in the local model (Kairouz et al., 2014; Bassily and Smith, 2015; Duchi et al., 2018; Bittau et al., 2017; Fanti et al., 2016). Put differently, to reach the same accuracy under DP, fewer samples are needed in the TCM than in the LM. On the other hand, when it comes to deployments of DP, both companies and users find the local model of privacy better matched to their goals (Greenberg, June 13, 2016; Madden and Rainie, 2015). This poses a challenge for companies with smaller user bases than Apple and Google that are looking to deploy DP: on the one hand, they want to guarantee local DP to their users; on the other hand, if they do, sample complexity results show they will not be able to gain much utility from the data.
Until recently in the DP community, these models were considered mutually exclusive. Recent work of (Avent et al., 2017) observed that it may be natural and beneficial to consider a hybrid model, in which the majority of the users desire privacy in the local model, but a small fraction of users is willing to contribute data with TCM guarantees. Indeed, it is common in industry to have a small group of “power users” or “early adopters” who may be willing to trust the company more than the general user (Merriman, Oct 7, 2014). The work of (Avent et al., 2017) demonstrates empirically that in the hybrid model one can develop algorithms that take advantage of the early adopter data in order to improve the overall accuracy of the statistics learned for the task of local search. However, their results leave open the question of how much improvement can be gained compared with the LM, and how that improvement depends on the sample size and on the percentage of users who opt in to the TCM.
These are exactly the questions we aim to address in this work. In particular, we study the problem of mean estimation for arbitrary distributions in bounded support, where the support and the variance of the distribution are known to the analyst. We consider a hybrid mechanism that calculates a DP estimate of the mean of the opt-in samples (i.e., those who trust the analyst) and calculates the mean of the locally randomized samples and then outputs a convex combination of the two. We compare the performance of this hybrid mechanism with two natural benchmarks: an estimate of the mean obtained by ensuring local DP for all samples and an estimate obtained only from the opt-in samples. We characterize the outputs of these mechanisms as random variables and analyze their performance in terms of mean squared error.
We find that the hybrid mechanism outperforms the benchmarks, and we characterize precisely by how much, as well as how the improvement depends on the opt-in percentage and sample size. Although the improvement factor is bounded roughly by 2, which may not seem like much, it is an encouraging finding for two reasons: first, constants matter in practical deployments of DP; and second, both the problem of mean estimation we consider and the hybrid mechanism we use are very simple. For more complex problems, where the separation in sample complexity between what is achievable in the LM and in the TCM is larger, more sophisticated hybrid mechanisms may yield significantly larger improvements.
In Section 2 we introduce the formal notation for exploring the problem. We present analytical results describing the improvement factor and empirical results showing the performance of the hybrid mechanism against the two benchmarks in Section 3. We conclude with a discussion of future work. Appendix A presents the necessary mathematical background for our analysis. Appendix B details the technical reasons for choosing mean squared error as the accuracy metric.
2 Models and Notation
Each of the $n$ individuals has data $x_i \in [0, m]$ drawn from a distribution $\mathcal{D}$ with variance $\sigma^2$. (We leave the question of analyzing performance when the data of the opt-in and local groups come from different distributions to future work.) It is important to note that in this work the bounds of the support, as stated above, as well as the variance of the distribution are known to the analyst. (Relaxing the assumptions on what is known to the analyst might be an interesting direction for future work.) The analyst only sees the original data of the $cn$ individuals who opt in to the trusted curator model of DP. The remaining $(1-c)n$ individuals prefer the local model of DP, where each individual randomizes their own data to satisfy $\varepsilon$-DP and only then submits the sample to the analyst. Note that $c$ is the opt-in rate, and naturally must be bounded such that $0 < c < 1$. Further, we expect $c$ to be small. The analyst would like to estimate the sample mean of the $n$ individuals such that each individual's trust preference is satisfied and $\varepsilon$-DP is ensured. We use $\mu$ to represent the true sample mean of all $n$ individuals, and $\tilde{\mu}$ to represent the analyst's $\varepsilon$-DP estimate of the sample mean.
As discussed in the introduction, there are two natural benchmarks to compare with:
The model in which we provide all individuals with the stronger notion of local DP. The corresponding mechanism, which we denote by FullLM and whose estimate we write as $\tilde{\mu}_F$, calculates the mean of the samples submitted to it after each has undergone the randomization necessary for ensuring local DP. (Duchi et al., 2018) and (Ding et al., 2017) show that the Laplace mechanism is optimal for the problem of one-dimensional mean estimation in the local model, so, without loss of generality, we assume the samples are submitted after the addition of properly calibrated Laplace noise (Dwork et al., 2006).
The model in which we ignore the data submitted in the local model and rely only on the opt-in samples to calculate the DP mean, a mechanism which we denote by OnlyTCM. We use $\mu_T$ to represent the true sample mean of the opt-in samples and $\tilde{\mu}_T$ for the $\varepsilon$-DP estimate of $\mu_T$ that satisfies TCM differential privacy, which we also compute using properly calibrated Laplace noise (Dwork et al., 2006).
We explore how much we can improve the accuracy of our estimate by using a hybrid mechanism, which we call Hybrid. This mechanism calculates two subsample means while preserving exactly the privacy preferences of each subsample, and outputs a convex combination of the two. Specifically, let $\mu_L$ represent the true sample mean of the individuals who prefer LM differential privacy and $\tilde{\mu}_L$ be the $\varepsilon$-DP estimate of $\mu_L$ in the LM. Hybrid returns an estimate $\tilde{\mu}_H$ using some weight $w \in [0, 1]$, such that $\tilde{\mu}_H = w\,\tilde{\mu}_T + (1 - w)\,\tilde{\mu}_L$. We say that the Hybrid mechanism has optimal weighting, and denote this weight by $w^*$, if the weight minimizes the mean squared error of the estimate. We derive $w^*$ in Lemma 3.3.
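The three mechanisms described above can be sketched in a few lines. The sketch is illustrative only and rests on assumptions that the text does not fix: data lie in an interval $[0, m]$, local randomization adds Laplace noise of scale $m/\varepsilon$ to each sample, the curator adds Laplace noise of scale $m/(k\varepsilon)$ to the mean of $k$ opt-in samples, and the weight `w` multiplies the TCM estimate. The function names (`laplace`, `full_lm`, `only_tcm`, `hybrid`) are hypothetical.

```python
import random


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def full_lm(data, m, eps, rng):
    """FullLM (assumed form): each sample gets Laplace(m/eps) noise, then we average."""
    return sum(x + laplace(m / eps, rng) for x in data) / len(data)


def only_tcm(opt_in, m, eps, rng):
    """OnlyTCM (assumed form): true opt-in mean plus one Laplace draw.

    The mean of k values in [0, m] changes by at most m/k when one value
    changes, so the noise scale is m / (k * eps).
    """
    k = len(opt_in)
    return sum(opt_in) / k + laplace(m / (k * eps), rng)


def hybrid(opt_in, local, m, eps, w, rng):
    """Hybrid: w * (TCM estimate) + (1 - w) * (LM estimate), with 0 <= w <= 1.

    Placing w on the TCM estimate is an assumption of this sketch.
    """
    tcm_estimate = only_tcm(opt_in, m, eps, rng)
    lm_estimate = full_lm(local, m, eps, rng)
    return w * tcm_estimate + (1.0 - w) * lm_estimate
```

Passing an explicit `random.Random` instance keeps runs reproducible; for example, `hybrid(data[:50], data[50:], m=1.0, eps=1.0, w=0.5, rng=random.Random(0))` combines 50 opt-in samples with the remaining local ones.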
3 Performance of Hybrid Mechanism
We now study the accuracy improvement the Hybrid mechanism provides over the benchmarks, OnlyTCM and FullLM. We study the accuracy of these mechanisms by modeling their errors as random variables and studying their expectations, in particular, their second moments about the origin. The derivations of second moments for the Laplace distribution, Normal distribution, and the Normal-Laplace distribution (Reed, 2006), which are needed to compute the squared errors, can be found in Appendix A.
Lemma 3.1 (Expected squared error of FullLM).
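FullLM has expected squared error $\mathbb{E}\big[(\tilde{\mu}_F - \mu)^2\big] = \frac{2m^2}{n\varepsilon^2}$, where $\tilde{\mu}_F$ is the FullLM estimate, $\mu$ is the true sample mean of the $n$ individuals, $m$ is the width of the support, and $\varepsilon$ is the privacy parameter.
FullLM returns the estimate $\tilde{\mu}_F = \frac{1}{n}\sum_{i=1}^{n}(x_i + Y_i)$, where $x_i \in [0, m]$ is the data of individual $i$ and the $Y_i$ are iid with $Y_i \sim \mathrm{Lap}(m/\varepsilon)$. Then clearly $\tilde{\mu}_F - \mu = \frac{1}{n}\sum_{i=1}^{n} Y_i$, and we follow by the moment generating function of the Laplace distribution. Note that $\mathbb{E}[Y_i] = 0$ and, by Lemma A.1, $\mathbb{E}[Y_i^2] = 2m^2/\varepsilon^2$. Then, since the $Y_i$ are independent, we compute the second moment of $\tilde{\mu}_F - \mu$ about the origin as follows: $\mathbb{E}\big[\big(\frac{1}{n}\sum_{i=1}^{n} Y_i\big)^2\big] = \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{E}[Y_i^2] = \frac{2m^2}{n\varepsilon^2}$. Therefore, FullLM has expected squared error $\frac{2m^2}{n\varepsilon^2}$. ∎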
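A quick simulation can serve as a sanity check on this lemma. The sketch below assumes data supported on an interval of width $m$ and local Laplace noise of scale $m/\varepsilon$, in which case the error of FullLM is the average of $n$ independent noise draws and its second moment is $2m^2/(n\varepsilon^2)$. The function names are hypothetical.

```python
import random


def full_lm_mse(n, m, eps):
    """Closed form under the stated assumptions: variance of the mean of n Laplace(m/eps) draws."""
    return 2.0 * m**2 / (n * eps**2)


def simulate_full_lm_mse(n, m, eps, trials, seed=0):
    """Monte Carlo estimate of the FullLM squared error.

    The data cancel in (estimate - true sample mean), so only the noise
    needs to be simulated.
    """
    rng = random.Random(seed)
    scale = m / eps
    total = 0.0
    for _ in range(trials):
        noise_sum = 0.0
        for _ in range(n):
            # Difference of two exponentials is a Laplace(0, scale) draw.
            noise_sum += rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        err = noise_sum / n
        total += err * err
    return total / trials
```

With a few thousand trials the simulated value should land within a few percent of `full_lm_mse(n, m, eps)`.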
Lemma 3.2 (Expected squared error of OnlyTCM).
OnlyTCM has expected squared error
OnlyTCM returns the estimate . Then we would like to study the following decomposition of the error
The first term is a Laplace random variable
The second term is the difference of two sample means, so by the CLT and the difference of Normally distributed random variables
Therefore, our error follows a Normal-Laplace distribution Thus, the conclusion follows from Lemma A.4. ∎
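To make the two-term decomposition concrete, the sketch below evaluates one closed form for the OnlyTCM squared error and checks it by simulation. It is a sketch under assumptions, not the lemma's own statement: data are iid with variance $\sigma^2$ on $[0, m]$, $k = cn$ individuals opt in, and the curator adds Laplace noise of scale $m/(k\varepsilon)$. Under these assumptions the Laplace term contributes $2m^2/(k\varepsilon)^2$. The opt-in mean and the overall mean share samples, so their difference has variance $(1-c)\sigma^2/(cn)$, computed here directly rather than through the normal approximation. The expression in the lemma may be organized differently. All names are hypothetical.

```python
import random


def only_tcm_mse(n, c, m, eps, var):
    """Laplace-noise term plus the variance of (opt-in mean - overall mean)."""
    k = c * n
    return 2.0 * m**2 / (k * eps) ** 2 + (1.0 - c) * var / k


def simulate_only_tcm_mse(n, k, m, eps, trials, seed=0):
    """Monte Carlo estimate with data uniform on [0, m]; the first k samples opt in."""
    rng = random.Random(seed)
    scale = m / (k * eps)
    total = 0.0
    for _ in range(trials):
        data = [rng.uniform(0.0, m) for _ in range(n)]
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Error is measured against the true sample mean of all n individuals.
        err = sum(data[:k]) / k + noise - sum(data) / n
        total += err * err
    return total / trials
```

For uniform data on $[0, m]$ the variance passed to `only_tcm_mse` is $m^2/12$.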
Lemma 3.3 (Expected squared error of Hybrid).
Hybrid has expected squared error where
We prove that is an optimal weighing for the convex combination of the two mean estimates, i.e., that it minimizes the mean squared error of the estimate, at the end of this proof. For now, we study the distribution of the following decomposition of the error .
Each of the terms in the above equation follows a familiar distribution. In particular, the first term is simply a weight times the Laplace noise we add to the sample mean of the opt-in data. So, by Lemma A.5,
The second term is a weight times the sum of the Laplace noise that is added at each local randomization. Thus, by the CLT,
The third term is the difference of two sample means, therefore, by the CLT and the difference of two Normally distributed random variables,
Putting these together, we see that our error is distributed according to the following instantiation of the Normal-Laplace distribution The statement of the Lemma now follows from Lemma A.4.
Now it is easy to see by taking the derivative that minimizes ∎
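The minimization can be illustrated numerically. The sketch below assumes the weight $w$ multiplies the TCM estimate, TCM noise of scale $m/(cn\varepsilon)$, and local noise of scale $m/\varepsilon$. Under those assumptions the three error terms are independent with variances $w^2 \cdot 2m^2/(cn\varepsilon)^2$, $(1-w)^2 \cdot 2m^2/((1-c)n\varepsilon^2)$, and $(w-c)^2 \sigma^2/(c(1-c)n)$. The last uses the identity $w\mu_T + (1-w)\mu_L - \mu = (w-c)(\mu_T - \mu_L)$, where $\mu_T$ and $\mu_L$ are the opt-in and local sample means. Setting the derivative of the resulting quadratic in $w$ to zero gives the minimizer. The names and the exact form of the third term are assumptions of this sketch and may differ from the lemma's expression.

```python
def _terms(n, c, m, eps, var):
    """Coefficients of the quadratic a*w^2 + b*(1-w)^2 + s*(w-c)^2."""
    a = 2.0 * m**2 / (c * n * eps) ** 2          # TCM Laplace noise
    b = 2.0 * m**2 / ((1.0 - c) * n * eps**2)    # averaged local Laplace noise
    s = var / (c * (1.0 - c) * n)                # variance of (mu_T - mu_L)
    return a, b, s


def hybrid_mse(w, n, c, m, eps, var):
    """Expected squared error of the weighted combination for a given weight w."""
    a, b, s = _terms(n, c, m, eps, var)
    return a * w**2 + b * (1.0 - w) ** 2 + s * (w - c) ** 2


def optimal_weight(n, c, m, eps, var):
    """Root of d/dw hybrid_mse = 2*a*w - 2*b*(1 - w) + 2*s*(w - c) = 0."""
    a, b, s = _terms(n, c, m, eps, var)
    return (b + c * s) / (a + b + s)
```

Because the quadratic has positive leading coefficient, the stationary point is the global minimum, and it lies strictly between 0 and 1 whenever $0 < c < 1$.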
We can now compute the improvement afforded by the Hybrid mechanism using the optimal weighting as compared to the best of the two benchmarks.
Theorem 3.4 (Hybrid vs. Benchmarks).
Let . Then,
Theorem 3.4 is difficult to interpret directly because the improvement achievable with the Hybrid approach depends on all parameters of the model: the number of samples, the opt-in rate, the distribution, the data universe, and the desired privacy parameter.
Let us first compute when Let and let By means of algebraic manipulation, it is easy to see that if and , then In other words, as intuitively expected given the sample complexity results, when the opt-in rate is sufficiently large and the number of samples is sufficiently large, the DP mean estimate obtained from the opt-in samples will be more accurate than if all data was received using the LM. In this case, the improvement offered by using the Hybrid model is at most
The more interesting case, given that the motivation for the hybrid model comes from the desire to deploy DP by companies with smaller number of users, is when In that case, and the factor of improvement is It is easy to check algebraically that this expression is maximized at and is equal to
The maximum value of is achieved at and is equal to
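How the improvement factor moves with the sample size and opt-in rate can be explored numerically. The sketch below computes the ratio of the better benchmark's expected squared error to that of the optimally weighted Hybrid, under assumptions not fixed by the text: support $[0, m]$, Laplace noise of scale $m/\varepsilon$ locally and $m/(cn\varepsilon)$ at the curator, and the weight applied to the TCM estimate. The function name and the closed forms are assumptions of this sketch, not the theorem's expressions.

```python
def improvement_factor(n, c, m, eps, var):
    """min(FullLM error, OnlyTCM error) / (Hybrid error at the optimal weight)."""
    a = 2.0 * m**2 / (c * n * eps) ** 2          # TCM Laplace noise
    b = 2.0 * m**2 / ((1.0 - c) * n * eps**2)    # averaged local Laplace noise
    s = var / (c * (1.0 - c) * n)                # variance of (mu_T - mu_L)

    e_full_lm = 2.0 * m**2 / (n * eps**2)        # all n samples randomized locally
    e_only_tcm = a + (1.0 - c) * var / (c * n)   # Hybrid quadratic evaluated at w = 1

    w = (b + c * s) / (a + b + s)                # minimizer of the quadratic in w
    e_hybrid = a * w**2 + b * (1.0 - w) ** 2 + s * (w - c) ** 2
    return min(e_full_lm, e_only_tcm) / e_hybrid
```

Evaluating `improvement_factor` over a range of `n` for a fixed small `c` shows the regime in which the two benchmarks have comparable error and the combination helps most.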
4 Conclusions and Future Directions
We showed that the hybrid model holds some promise for enabling wider adoption of DP by analyzing the performance of a simple hybrid mechanism for the task of mean estimation and demonstrating that in scenarios with small sample sizes it gives a lower error compared to alternatives.
Although for the problem of mean estimation the improvement is only a constant factor, we believe that there is additional promise in considering more sophisticated hybrid mechanisms for more complex problems. We conjecture that a hybrid mechanism that uses the opt-in samples to inform the local randomization (as done by (Avent et al., 2017)) can significantly increase utility. The conjecture stems from an analogy to the power of interactivity, where separation results between the class of problems that can be solved in the non-interactive LM and those that can be solved in the interactive LM are known (Kasiviswanathan et al., 2011). Thus, a natural extension of our work is to show a separation between hybrid DP and local DP.
Acknowledgments
We are grateful to Brendan Avent for constructive comments and an improved proof of Lemma 3.1, and to Rachel Cummings for comments on the draft. This work was supported by NSF Award # CNS-1755992.
References
- Avent et al. (2017) Brendan Avent, Aleksandra Korolova, David Zeber, Torgeir Hovden, and Benjamin Livshits. BLENDER: Enabling local search with a hybrid differential privacy model. In 26th USENIX Security Symposium (USENIX Security 17), pages 747–764. USENIX Association, 2017. ISBN 978-1-931971-40-9. URL https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/avent.
- Bassily and Smith (2015) Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the Symposium on Theory of Computing (STOC), pages 127–135, 2015.
- Bittau et al. (2017) Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 441–459. ACM, 2017.
- Chan et al. (2011) T-H Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Transactions on Information and System Security (TISSEC), 14(3):26, 2011.
- Ding et al. (2017) Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3571–3580. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6948-collecting-telemetry-data-privately.pdf.
- Duchi et al. (2018) John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 113(521):182–201, 2018. doi: 10.1080/01621459.2017.1389735. URL https://doi.org/10.1080/01621459.2017.1389735.
- Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, pages 265–284, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-32732-5.
- Fanti et al. (2016) Giulia Fanti, Vasyl Pihur, and Úlfar Erlingsson. Building a RAPPOR with the unknown: Privacy-preserving learning of associations and data dictionaries. Proceedings on Privacy Enhancing Technologies (PETS), 3:41–61, 2016.
- Greenberg (June 13, 2016) Andy Greenberg. Apple’s differential privacy is about collecting your data – but not your data. In Wired, June 13, 2016.
- Kairouz et al. (2014) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems (NIPS), pages 2879–2887, 2014.
- Kasiviswanathan et al. (2011) Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM J. Comput., 40(3):793–826, June 2011. ISSN 0097-5397. doi: 10.1137/090756090. URL http://dx.doi.org/10.1137/090756090.
- Madden and Rainie (2015) Mary Madden and Lee Rainie. Americans’ attitudes about privacy, security and surveillance. Technical report, Pew Research Center, 2015.
- Merriman (Oct 7, 2014) Chris Merriman. Microsoft reminds privacy-concerned Windows 10 beta testers that they’re volunteers. In The Inquirer, http://www.theinquirer.net/2374302, Oct 7, 2014.
- Papoulis et al. (2002) A. Papoulis and S. U. Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill electrical and electronic engineering series. McGraw-Hill, 2002. ISBN 9780073660110. URL https://books.google.com/books?id=YYwQAQAAIAAJ.
- Reed (2006) William J. Reed. The normal-Laplace distribution and its relatives. In N. Balakrishnan, José María Sarabia, and Enrique Castillo, editors, Advances in Distribution Theory, Order Statistics, and Inference, pages 61–74. Birkhäuser Boston, Boston, MA, 2006. ISBN 978-0-8176-4487-1. doi: 10.1007/0-8176-4487-3_4. URL https://doi.org/10.1007/0-8176-4487-3_4.
Appendix A Preliminaries
We present the moment generating functions (mgf) of the Laplace, Gaussian, and Normal-Laplace distributions and derive the second moment about the origin for each. These results are used in calculating the expected squared errors of the mechanisms of interest in Section 3.
Lemma A.1 (2nd moment about origin, Laplace distribution).
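Let $X \sim \mathrm{Lap}(b)$. Then, $\mathbb{E}[X^2] = 2b^2$.
The moment generating function for $X$ is $M_X(t) = \frac{1}{1 - b^2 t^2}$ for $|t| < 1/b$. Then, $M_X''(t) = \frac{2b^2}{(1 - b^2 t^2)^2} + \frac{8 b^4 t^2}{(1 - b^2 t^2)^3}$. Plugging in $t = 0$, we get $\mathbb{E}[X^2] = M_X''(0) = 2b^2$. ∎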
The Normal random variables relevant for this work have mean $0$, so we focus on the central moments of the Normal distribution.
Lemma A.2 (2nd central moment about origin, Normal distribution [Papoulis et al., 2002]).
Let $X \sim \mathcal{N}(0, \sigma^2)$. Then, $\mathbb{E}[X^2] = \sigma^2$.
We will also need to consider the error of the sum of a Laplace random variable and a Normal random variable.
Definition A.3 (Normal-Laplace distribution [Reed, 2006]).
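A random variable $Z = X + Y$, where $X \sim \mathcal{N}(0, \sigma^2)$ and $Y \sim \mathrm{Lap}(b)$ are independent, is distributed according to the Normal-Laplace distribution, which we denote $Z \sim \mathrm{NL}(\sigma^2, b)$.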
As we did with the Normal distribution above, we focus on central moments of the Normal-Laplace distribution.
Lemma A.4 (2nd central moment about origin, Normal-Laplace distribution).
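Let $Z = X + Y$ with independent $X \sim \mathcal{N}(0, \sigma^2)$ and $Y \sim \mathrm{Lap}(b)$, i.e., $Z \sim \mathrm{NL}(\sigma^2, b)$. Then, $\mathbb{E}[Z^2] = \sigma^2 + 2b^2$.
The moment generating function for $Z$ is the product of the Normal and Laplace moment generating functions, $M_Z(t) = \frac{e^{\sigma^2 t^2 / 2}}{1 - b^2 t^2}$ for $|t| < 1/b$. Then, $\mathbb{E}[Z^2] = M_Z''(0) = \sigma^2 + 2b^2$, as the lemma states. ∎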
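The second moment of a zero-mean Normal plus an independent zero-mean Laplace variable can be checked by simulation. The sketch below assumes the symmetric case used here, with Normal standard deviation `sigma` and Laplace scale `b`, for which the second moment is $\sigma^2 + 2b^2$. The function names are hypothetical.

```python
import random


def normal_laplace_second_moment(sigma, b):
    """Second moment of N(0, sigma^2) + Lap(b) for independent summands."""
    return sigma**2 + 2.0 * b**2


def simulate_second_moment(sigma, b, samples, seed=0):
    """Monte Carlo estimate of E[Z^2] for Z = Normal + Laplace."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        normal_part = rng.gauss(0.0, sigma)
        # Difference of two exponentials is a Laplace(0, b) draw.
        laplace_part = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        z = normal_part + laplace_part
        total += z * z
    return total / samples
```

The cross term vanishes because both summands have mean zero, which is why the two second moments simply add.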
We will also need the following property of the Laplace distribution.
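Lemma A.5.
Let $X \sim \mathrm{Lap}(b)$ and $Y = aX$, where $a > 0$ is a constant. Then, $Y \sim \mathrm{Lap}(ab)$.
Let $F_X$ be the cumulative distribution function (cdf) of $X$ and $f_X$ be the probability density function (pdf) of $X$. Let $F_Y$ and $f_Y$ be the same for $Y$. Then, by definition we have
$$F_Y(y) = \Pr[aX \le y] = \Pr[X \le y/a] = F_X(y/a).$$
Evaluating the cdf of the Laplace distribution at $y/a$, we get
$$F_Y(y) = \begin{cases} \frac{1}{2} e^{y/(ab)}, & y < 0, \\ 1 - \frac{1}{2} e^{-y/(ab)}, & y \ge 0. \end{cases}$$
Finally, we translate the cdfs to pdfs and the lemma follows immediately,
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{1}{2ab}\, e^{-|y|/(ab)}.$$
∎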
Appendix B Sample Complexity Results
Let $\alpha$ be the accuracy parameter and $\beta$ be the confidence parameter. The following table describes the sample complexities of algorithms for mean estimation in the local model and the trusted curator model. Here the sample complexity of a mechanism is the number of samples sufficient to upper bound its absolute error by $\alpha$ with probability at least $1 - \beta$.
|Sample Complexity||Optimal Algorithm [Duchi et al., 2018]|
The sample complexity derivation for the local model uses the concentration of the sum of i.i.d. Laplace random variables given in [Chan et al., 2011]. The sample complexity derivation for the trusted curator model comes from the accuracy statement of the Laplace mechanism in [Dwork et al., 2006].
In addition to comparing the performance of the hybrid mechanism with the alternatives using mean squared error, we also explored comparing them in terms of sample complexity. This proved difficult and uninformative for several reasons. First, we found that the bound on the concentration of the sum of Laplace random variables in [Chan et al., 2011] is convenient to work with analytically, but is far from tight. In addition, the inherent looseness of the union bound limits what we can say about errors with several terms in their algebraic expressions.
Another measure of performance we considered was absolute error. The trouble with this approach came from the difficulty of integrating the one-sided probability density functions that arise from the convolution of several random variables, whether several Laplace random variables or a Laplace random variable and a Normal random variable. In comparison, the mean squared error was both straightforward to calculate and gave us a precise characterization of the performance.