# Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences

###### Abstract

Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called “privacy amplification by subsampling” principle, which ensures that a differentially private mechanism run on a random subsample of a population provides stronger privacy guarantees than when run on the entire population. Several instances of this principle have been studied for different random subsampling methods, each with an ad-hoc analysis. In this paper we present a general method that recovers and improves prior analyses, yields lower bounds and derives new instances of privacy amplification by subsampling. Our method leverages a characterization of differential privacy as a divergence which emerged in the program verification community. Furthermore, it introduces new tools, including advanced joint convexity and privacy profiles, which might be of independent interest.

## 1 Introduction

Subsampling is a fundamental tool in the design and analysis of differentially private mechanisms. Broadly speaking, the intuition behind the “privacy amplification by subsampling” principle is that the privacy guarantees of a differentially private mechanism can be amplified by applying it to a small random subsample of records from a given dataset. In machine learning, many classes of algorithms involve sampling operations, e.g. stochastic optimization methods and Bayesian inference algorithms, and it is not surprising that results quantifying the privacy amplification obtained via subsampling play a key role in designing differentially private versions of these learning algorithms (Bassily et al., 2014; Wang et al., 2015; Abadi et al., 2016; Jälkö et al., 2017; Park et al., 2016b, a). Additionally, from a practical standpoint, subsampling provides a straightforward method to obtain privacy amplification when the final mechanism is only available as a black-box. For example, in Apple’s iOS and Google’s Chrome deployments of differential privacy for data collection, the privacy parameters are hard-coded into the implementation and cannot be modified by the user. In such settings, if the default privacy parameters are not satisfactory, one could achieve a stronger privacy guarantee by devising a strategy that only submits a random sample of the data to the mechanism.

Despite the practical importance of subsampling, existing tools to bound privacy amplification only work for specific forms of subsampling and typically come with cumbersome proofs that provide no information about the tightness of the resulting bounds. In this paper we remedy this situation by providing a general framework for deriving tight privacy amplification results that can be applied to any of the subsampling strategies considered in the literature. Our framework builds on a characterization of differential privacy in terms of $\alpha$-divergences (Barthe and Olmedo, 2013). This characterization has been used before for program verification (Barthe et al., 2012, 2016), while we use it here for the first time in the context of algorithm analysis. In order to do this, we develop several novel analytical tools, including advanced joint convexity (a property of the $\alpha$-divergence with respect to mixture distributions) and privacy profiles (a general tool describing the privacy guarantees that private algorithms provide).

One of our motivations to initiate a systematic study of privacy amplification by subsampling is that this is an important primitive for the design of differentially private algorithms which has received less attention than other building blocks like composition theorems (Dwork et al., 2010; Kairouz et al., 2017; Murtagh and Vadhan, 2016). Given the relevance of sampling operations in machine learning, it is important to understand what the limitations of privacy amplification are and to develop a fine-grained understanding of its theoretical properties. Our results provide a first step in this direction by showing how privacy amplification resulting from different sampling techniques can be analyzed by means of a single set of tools, and by showing how these tools can be used for proving lower bounds. Our analyses also highlight the importance of choosing a sampling technique that is well-adapted to the notion of neighbouring datasets under consideration. A second motivation is that subsampling provides a natural example of mechanisms where the output distribution is a mixture. Because mixtures have an additive structure and differential privacy is defined in terms of a multiplicative guarantee, analyzing the privacy guarantees of mechanisms whose output distribution is a mixture is in general a challenging task. Although our analyses are specialized to mixtures arising from subsampling, we believe the tools we develop in terms of couplings and divergences will also be useful to analyze other types of mechanisms involving mixture distributions. Finally, we want to remark that privacy amplification results also play a role in analyzing the generalization and sample complexity properties of private learning algorithms (Kasiviswanathan et al., 2011; Beimel et al., 2013; Bun et al., 2015; Wang et al., 2016); an in-depth understanding of the interplay between sampling and differential privacy might also have applications in this direction.

## 2 Problem Statement and Methodology Overview

A mechanism $\mathcal{M} : X \to \mathbb{P}(Z)$ with input space $X$ and output space $Z$ is a randomized algorithm that on input $x$ outputs a sample from the distribution $\mathcal{M}(x)$ over $Z$. Here $\mathbb{P}(Z)$ denotes the set of probability measures on the output space $Z$. We implicitly assume $Z$ is equipped with a sigma-algebra of measurable subsets and a base measure, in which case $\mathbb{P}(Z)$ is restricted to probability measures that are absolutely continuous with respect to the base measure. In most cases of interest $Z$ is either a discrete space equipped with the counting measure or a Euclidean space equipped with the Lebesgue measure. We also assume $X$ is equipped with a binary symmetric relation $\simeq_X$ defining the notion of neighbouring inputs.

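Let $\varepsilon \ge 0$ and $\delta \in [0,1]$. A mechanism $\mathcal{M}$ is said to be $(\varepsilon,\delta)$-differentially private w.r.t. $\simeq_X$ if for every pair of inputs $x \simeq_X x'$ and every measurable subset $E \subseteq Z$ we have

$$\Pr[\mathcal{M}(x) \in E] \le e^{\varepsilon} \Pr[\mathcal{M}(x') \in E] + \delta . \tag{1}$$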

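For our purposes, it will be more convenient to express differential privacy in terms of $\alpha$-divergences (also known in the literature as elementary divergences (Österreicher, 2002) and hockey-stick divergences (Sason and Verdú, 2016)). Concretely, for $\alpha \ge 1$ the $\alpha$-divergence between two probability measures $\mu, \mu' \in \mathbb{P}(Z)$ is defined as

$$D_\alpha(\mu \| \mu') = \sup_{E} \big( \mu(E) - \alpha \mu'(E) \big) = \int_Z \left[ \frac{d\mu}{d\mu'}(z) - \alpha \right]_+ d\mu'(z) = \sum_{z \in Z} \big[ \mu(z) - \alpha \mu'(z) \big]_+ , \tag{2}$$

where $E$ ranges over all measurable subsets of $Z$, $[\cdot]_+ = \max\{\cdot, 0\}$, and the last equality is a specialization for discrete $Z$. (Here $d\mu/d\mu'$ denotes the Radon-Nikodym derivative between $\mu$ and $\mu'$. In particular, if $\mu$ and $\mu'$ have densities $p = d\mu/d\nu$ and $p' = d\mu'/d\nu$ with respect to some base measure $\nu$, then $d\mu/d\mu' = p/p'$.) It is easy to see (Barthe and Olmedo, 2013) that $\mathcal{M}$ is $(\varepsilon,\delta)$-differentially private if and only if $D_{e^\varepsilon}(\mathcal{M}(x) \| \mathcal{M}(x')) \le \delta$ for every $x$ and $x'$ such that $x \simeq_X x'$.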
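The equality between the supremum over events and the sum formula in (2) can be checked on a small discrete example. The sketch below is a minimal illustration, assuming distributions are given as Python dicts from outcomes to probabilities; the function names `hockey_stick` and `hockey_stick_sup` are our own and not part of the paper.

```python
from itertools import combinations


def hockey_stick(mu, mu_prime, alpha):
    """D_alpha(mu || mu') = sum_z [mu(z) - alpha * mu'(z)]_+ for discrete distributions (alpha >= 1)."""
    support = set(mu) | set(mu_prime)
    return sum(max(mu.get(z, 0.0) - alpha * mu_prime.get(z, 0.0), 0.0) for z in support)


def hockey_stick_sup(mu, mu_prime, alpha):
    """Brute-force sup over all events E of mu(E) - alpha * mu'(E); only feasible for tiny supports."""
    support = sorted(set(mu) | set(mu_prime))
    best = 0.0  # the empty event contributes 0
    for r in range(1, len(support) + 1):
        for event in combinations(support, r):
            gap = sum(mu.get(z, 0.0) for z in event) - alpha * sum(mu_prime.get(z, 0.0) for z in event)
            best = max(best, gap)
    return best
```

For instance, with $\mu = (0.5, 0.3, 0.2)$ and $\mu' = (0.2, 0.3, 0.5)$ both functions return $0.2$ at $\alpha = 1.5$; by the characterization above, a mechanism whose outputs on some neighbouring pair are $\mu, \mu'$ therefore cannot be $(\log 1.5, \delta)$-DP for any $\delta < 0.2$.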

In order to emphasize the relevant properties of $\mathcal{M}$ from a privacy amplification point of view, we introduce the concepts of privacy profile and group-privacy profiles. The privacy profile $\delta_\mathcal{M}$ of a mechanism $\mathcal{M}$ is a function associating to each privacy parameter $\varepsilon$ a bound on the $e^\varepsilon$-divergence between the results of running the mechanism on two adjacent datasets, i.e. $\delta_\mathcal{M}(\varepsilon) = \sup_{x \simeq_X x'} D_{e^\varepsilon}(\mathcal{M}(x) \| \mathcal{M}(x'))$ (we will discuss the properties of this tool in more detail in the next section). Informally speaking, the privacy profile represents the set of all privacy parameters under which a mechanism provides differential privacy. In particular, recall that an $(\varepsilon,\delta)$-DP mechanism is also $(\varepsilon',\delta')$-DP for any $\varepsilon' \ge \varepsilon$ and any $\delta' \ge \delta$. The privacy profile defines a curve in $[0,\infty) \times [0,1]$ that separates the space of privacy parameters into two regions: the ones for which $\mathcal{M}$ satisfies differential privacy and the ones for which it does not. This curve exists for every mechanism $\mathcal{M}$, even for mechanisms that satisfy pure DP for some value of $\varepsilon$.

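To define group-privacy profiles $\delta_{\mathcal{M},k}$ for $k \ge 1$ we use the path-distance $d$ on $X$ induced by $\simeq_X$:

$$d(x, x') = \min\{ k : \exists\, x_1, \ldots, x_{k-1} \text{ such that } x \simeq_X x_1, \ x_1 \simeq_X x_2, \ \ldots, \ x_{k-1} \simeq_X x' \} . \tag{3}$$

With this notation, we define $\delta_{\mathcal{M},k}(\varepsilon) = \sup_{d(x,x') \le k} D_{e^\varepsilon}(\mathcal{M}(x) \| \mathcal{M}(x'))$. Note that $\delta_\mathcal{M} = \delta_{\mathcal{M},1}$.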

### Problem statement

A well-known method for increasing privacy of a mechanism is to apply the mechanism to a random subsample of the input database, rather than on the database itself. Intuitively, the method decreases the chances of leaking information about a particular individual because nothing about that individual can be leaked in the cases where the individual is not included in the subsample. The question addressed in this paper is to devise methods for quantifying amplification and for proving optimality of the bounds. This turns out to be a surprisingly subtle problem.

Formally, let $X$ and $Y$ be two sets equipped with relations $\simeq_X$ and $\simeq_Y$ respectively. We assume that both $X$ and $Y$ contain databases (modelled as sets, multisets, or tuples) over a universe $\mathcal{U}$ that represents all possible records contained in a database. A subsampling mechanism is a randomized algorithm $\mathcal{S} : X \to \mathbb{P}(Y)$ that takes as input a database $x$ and outputs a finitely supported distribution over datasets. Note that we find it convenient to distinguish between $X$ and $Y$ because $x$ and $y$ might not always have the same type. For example, sampling with replacement from a set $x$ yields a multiset $y$.

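The problem of privacy amplification can now be stated as follows: let $\mathcal{M} : Y \to \mathbb{P}(Z)$ be a mechanism with respect to $\simeq_Y$, and let $\mathcal{S} : X \to \mathbb{P}(Y)$ be a subsampling mechanism. Consider the mechanism $\mathcal{M}^\mathcal{S} : X \to \mathbb{P}(Z)$ given by $\mathcal{M}^\mathcal{S}(x) = \mathcal{M}(\mathcal{S}(x))$, i.e. on input $x$ it first draws $y \sim \mathcal{S}(x)$ and then releases a sample from $\mathcal{M}(y)$. The goal is to relate the privacy profiles of $\mathcal{M}$ and $\mathcal{M}^\mathcal{S}$ via an inequality of the form: for every $\varepsilon \ge 0$, there exists $\varepsilon' \ge 0$ such that $\delta_{\mathcal{M}^\mathcal{S}}(\varepsilon') \le h(\delta_\mathcal{M}(\varepsilon))$ for some function $h$.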

A full specification of this problem requires formalizing the following three ingredients: (i) dataset representation, specifying whether the inputs to the mechanism are sets, multisets, or tuples; (ii) neighbouring relations $\simeq_X$ and $\simeq_Y$, including the usual remove/add-one and substitute-one relations; (iii) subsampling method and its parameters, with the most commonly used being subsampling without replacement, subsampling with replacement, and Poisson subsampling.

Regardless of the specific setting being considered, the main challenge in the analysis of privacy amplification by subsampling resides in the fact that the output distribution of the mechanism $\mathcal{M}^\mathcal{S}$ is a mixture distribution. In particular, writing $\mu_y = \mathcal{M}(y)$ for any $y \in Y$ and taking $\omega = \mathcal{S}(x)$ to be the (finitely supported) distribution over subsamples from $x$ produced by the subsampling mechanism, we can write $\mathcal{M}^\mathcal{S}(x) = \sum_{y} \omega(y) \mu_y = \omega M$, where $M$ denotes the Markov kernel operating on measures defined by $\omega M = \sum_y \omega(y) \mathcal{M}(y)$. Consequently, proving privacy amplification results requires reasoning about the mixtures obtained when sampling from two neighbouring datasets $x \simeq_X x'$, and about how the privacy parameters change in the mixture.
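The mixture $\omega M$ can be computed explicitly when both the subsampling distribution and the base mechanism are finite. The sketch below is an illustration under stated assumptions: it uses Poisson subsampling (defined formally in Section 4) as the example of $\mathcal{S}$, and a hypothetical toy base mechanism `membership_rr` that applies randomized response to the bit "record `target` is in $y$". Neither function is taken from the paper.

```python
from itertools import combinations
from math import exp


def poisson_subsample(x, gamma):
    """omega = S(x) for Poisson subsampling: each record of x is kept independently w.p. gamma."""
    records = sorted(x)
    omega = {}
    for r in range(len(records) + 1):
        for y in combinations(records, r):
            omega[frozenset(y)] = gamma ** r * (1 - gamma) ** (len(records) - r)
    return omega


def apply_kernel(omega, mechanism):
    """omega M = sum_y omega(y) * M(y): the output distribution of the subsampled mechanism."""
    out = {}
    for y, weight in omega.items():
        for z, p in mechanism(y).items():
            out[z] = out.get(z, 0.0) + weight * p
    return out


def membership_rr(target, eps):
    """Hypothetical toy base mechanism: randomized response on the bit [target in y]."""
    keep = exp(eps) / (1 + exp(eps))

    def mechanism(y):
        b = 1 if target in y else 0
        return {b: keep, 1 - b: 1 - keep}

    return mechanism
```

For example, `apply_kernel(poisson_subsample({'a', 'b', 'c'}, 0.25), membership_rr('c', 1.0))` returns a two-point distribution in which outcome `1` has probability $0.25 \cdot \frac{e}{1+e} + 0.75 \cdot \frac{1}{1+e}$, since `'c'` lands in the subsample with probability $0.25$.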

### Our contribution

Our privacy amplification results use properties of divergences and privacy profiles, together with two additional ingredients.

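The first ingredient is a novel advanced joint convexity property useful to give improved upper bounds on the $\alpha$-divergence of a mixture distribution. In the specific context of differential privacy this result yields for every $x \simeq_X x'$:

$$D_{e^{\varepsilon'}}\big(\mathcal{M}^\mathcal{S}(x) \,\|\, \mathcal{M}^\mathcal{S}(x')\big) \le \eta \Big( (1-\beta)\, D_{e^\varepsilon}(\omega_1 M \| \omega_0 M) + \beta\, D_{e^\varepsilon}(\omega_1 M \| \omega_1' M) \Big) \tag{4}$$

for $\varepsilon' = \log(1 + \eta(e^\varepsilon - 1))$, $\beta = e^{\varepsilon'}/e^{\varepsilon}$, for $\eta = \mathrm{TV}(\mathcal{S}(x), \mathcal{S}(x'))$ being the total variation distance between the distributions over subsamples, and for suitable measures $\omega_0, \omega_1, \omega_1'$ over $Y$. We notice that the term $\log(1 + \eta(e^\varepsilon - 1))$ plays a role in most of the known privacy amplification results. Interestingly, the proof of advanced joint convexity uses ideas from probabilistic couplings, and more specifically the maximal coupling construction.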

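The second ingredient in our analysis establishes an upper bound for the divergences occurring in the right hand side of (4) in terms of group-privacy profiles. It states that under suitable conditions, for each pair of distributions $(\omega, \omega')$ appearing there, we have:

$$D_{e^\varepsilon}(\omega M \| \omega' M) \le \sum_{k \ge 1} \omega(Y_k)\, \delta_{\mathcal{M},k}(\varepsilon) \tag{5}$$

for suitable choices of $Y_k \subseteq Y$. Again, the proof of the inequality uses tools from probabilistic couplings.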

The combination of these results yields a bound on the privacy profile of $\mathcal{M}^\mathcal{S}$ as a function of the group-privacy profiles of $\mathcal{M}$. Based on this inequality, we will establish several privacy amplification results and prove tightness results. This methodology can be applied to any of the settings discussed above in terms of dataset representation, neighbouring relation, and type of subsampling.

## 3 Tools: Couplings, Divergences and Privacy Profiles

We next introduce several tools that will be used to support our analyses. The first and second tools are known, whereas the remaining tools are new and of independent interest.

### Divergences

The following characterization follows immediately from the definition of $\alpha$-divergence in terms of the supremum over $E$.

###### Theorem 1 ((Barthe and Olmedo, 2013)).

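A mechanism $\mathcal{M}$ is $(\varepsilon,\delta)$-differentially private with respect to $\simeq_X$ if and only if $\sup_{x \simeq_X x'} D_{e^\varepsilon}(\mathcal{M}(x) \| \mathcal{M}(x')) \le \delta$.

Note that in the statement of the theorem we take $\alpha = e^\varepsilon$. Throughout the paper we will use these two notations interchangeably to make expressions more compact.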

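We now state a few useful consequences of the definition of $\alpha$-divergence: (i) $0 \le D_\alpha(\mu \| \mu') \le 1$; (ii) the function $\alpha \mapsto D_\alpha(\mu \| \mu')$ is monotonically decreasing; (iii) the function $(\mu, \mu') \mapsto D_\alpha(\mu \| \mu')$ is jointly convex. Furthermore, one can show that $\lim_{\alpha \to \infty} D_\alpha(\mu \| \mu') = 0$ if and only if $\mathrm{supp}(\mu) \subseteq \mathrm{supp}(\mu')$.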

### Couplings

Couplings are a standard tool for deriving upper bounds for the statistical distance between distributions. Concretely, it is well-known that the total variation distance between two distributions $\mu, \nu \in \mathbb{P}(Z)$ satisfies $\mathrm{TV}(\mu, \nu) \le \Pr_{(z,z') \sim \pi}[z \ne z']$ for any coupling $\pi$, where equality is attained by taking the so-called maximal coupling. We recall the definition of coupling and provide a construction of the maximal coupling, which we shall use in later sections.

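A coupling between two distributions $\mu, \nu \in \mathbb{P}(Z)$ is a distribution $\pi \in \mathbb{P}(Z \times Z)$ whose marginals along the projections $(z, z') \mapsto z$ and $(z, z') \mapsto z'$ are $\mu$ and $\nu$ respectively. Couplings always exist, and furthermore, there exists a maximal coupling, which exactly characterizes the statistical distance between $\mu$ and $\nu$. Let $\mu, \nu \in \mathbb{P}(Z)$ and let $\eta = \mathrm{TV}(\mu, \nu)$, where $\mathrm{TV}$ denotes the total variation distance. The maximal coupling between $\mu$ and $\nu$ is defined as the mixture $\pi = (1-\eta)\pi_0 + \eta \pi_1$, where $\pi_0(z, z') = \mathbb{I}[z = z'] \min\{\mu(z), \nu(z)\} / (1-\eta)$ and $\pi_1(z, z') = [\mu(z) - \nu(z)]_+ \, [\nu(z') - \mu(z')]_+ / \eta^2$.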
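The maximal coupling construction above can be written out directly for finitely supported distributions. The following is a minimal sketch, assuming dict-based distributions; `total_variation` and `maximal_coupling` are illustrative names of our own. It places the overlap $\min\{\mu(z), \nu(z)\}$ on the diagonal (the $(1-\eta)\pi_0$ part) and pairs the excess masses off the diagonal (the $\eta\pi_1$ part).

```python
def total_variation(mu, nu):
    """TV(mu, nu) = (1/2) * sum_z |mu(z) - nu(z)| for dict-based distributions."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(z, 0.0) - nu.get(z, 0.0)) for z in support)


def maximal_coupling(mu, nu):
    """Return pi = (1 - eta) pi_0 + eta pi_1 as a dict {(z, z'): probability}."""
    support = set(mu) | set(nu)
    eta = total_variation(mu, nu)
    pi = {}
    # (1 - eta) * pi_0: the overlap min(mu(z), nu(z)) sits on the diagonal (z, z).
    for z in support:
        overlap = min(mu.get(z, 0.0), nu.get(z, 0.0))
        if overlap > 0:
            pi[(z, z)] = overlap
    # eta * pi_1 = [mu(z) - nu(z)]_+ * [nu(z') - mu(z')]_+ / eta, supported off the diagonal.
    if eta > 0:
        for z in support:
            excess = max(mu.get(z, 0.0) - nu.get(z, 0.0), 0.0)
            for z_prime in support:
                deficit = max(nu.get(z_prime, 0.0) - mu.get(z_prime, 0.0), 0.0)
                if excess > 0 and deficit > 0:
                    pi[(z, z_prime)] = pi.get((z, z_prime), 0.0) + excess * deficit / eta
    return pi
```

Because $[\mu(z) - \nu(z)]_+$ and $[\nu(z) - \mu(z)]_+$ are never both positive at the same $z$, the off-diagonal part carries exactly mass $\eta$, so $\Pr_{\pi}[z \ne z'] = \mathrm{TV}(\mu, \nu)$ as claimed.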

### Advanced Joint Convexity

The privacy amplification phenomenon is tightly connected to an interesting new form of joint convexity for $\alpha$-divergences, which we call advanced joint convexity.

Proofs of all our results are presented in the appendix.

###### Theorem 2 (Advanced Joint Convexity of $D_\alpha$).

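Let $\mu, \mu' \in \mathbb{P}(Z)$ be measures satisfying $\mu = (1-\eta)\mu_0 + \eta \mu_1$ and $\mu' = (1-\eta)\mu_0 + \eta \mu_1'$ for some $\eta \in [0,1]$, $\mu_0$, $\mu_1$, and $\mu_1'$. Given $\alpha \ge 1$, let $\alpha' = 1 + \eta(\alpha - 1)$ and $\beta = \alpha'/\alpha$. Then the following holds:

$$D_{\alpha'}(\mu \| \mu') = \eta\, D_\alpha\big(\mu_1 \,\|\, (1-\beta)\mu_0 + \beta \mu_1'\big) . \tag{6}$$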

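Note that writing $\alpha = e^\varepsilon$ and $\alpha' = e^{\varepsilon'}$ in the above theorem we get the relation $\varepsilon' = \log(1 + \eta(e^\varepsilon - 1))$ commonly found in privacy amplification bounds. Applying standard joint convexity to the right hand side above we conclude: $D_{\alpha'}(\mu \| \mu') \le (1-\beta)\eta\, D_\alpha(\mu_1 \| \mu_0) + \beta\eta\, D_\alpha(\mu_1 \| \mu_1')$. Note that applying joint convexity directly on $D_{\alpha'}(\mu \| \mu')$ instead of advanced joint convexity yields a weaker bound which implies an amplification on the $\delta$ privacy parameter, but not on the $\varepsilon$ privacy parameter.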
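Since the identity (6) holds for arbitrary mixture components, it can be sanity-checked numerically on random discrete distributions. The sketch below is our own illustration (all helper names are hypothetical): it draws random $\mu_0, \mu_1, \mu_1'$, forms the two mixtures, and returns both sides of (6) together with the joint-convexity upper bound stated above.

```python
import random


def hockey_stick(mu, nu, alpha):
    """D_alpha(mu || nu) for distributions given as equal-length lists of probabilities."""
    return sum(max(m - alpha * n, 0.0) for m, n in zip(mu, nu))


def mix(components):
    """Pointwise mixture sum_i w_i * d_i of a list of (weight, distribution) pairs."""
    size = len(components[0][1])
    return [sum(w * d[i] for w, d in components) for i in range(size)]


def random_dist(k, rng):
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def check_advanced_joint_convexity(eta, alpha, k=6, seed=0):
    """Return (lhs, rhs, bound): both sides of (6) and the joint-convexity upper bound."""
    rng = random.Random(seed)
    mu0, mu1, mu1p = (random_dist(k, rng) for _ in range(3))
    mu = mix([(1 - eta, mu0), (eta, mu1)])
    mu_p = mix([(1 - eta, mu0), (eta, mu1p)])
    alpha_p = 1 + eta * (alpha - 1)  # alpha' in Theorem 2
    beta = alpha_p / alpha
    lhs = hockey_stick(mu, mu_p, alpha_p)
    rhs = eta * hockey_stick(mu1, mix([(1 - beta, mu0), (beta, mu1p)]), alpha)
    bound = eta * ((1 - beta) * hockey_stick(mu1, mu0, alpha) + beta * hockey_stick(mu1, mu1p, alpha))
    return lhs, rhs, bound
```

In our runs `lhs` and `rhs` agree up to floating-point rounding for any choice of `eta`, `alpha` and `seed`, which reflects the fact that $\mu - \alpha'\mu'$ and $\eta(\mu_1 - \alpha((1-\beta)\mu_0 + \beta\mu_1'))$ coincide pointwise.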

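When using advanced joint convexity to analyze privacy amplification we consider two elements $x$ and $x'$ with $x \simeq_X x'$ and fix the following notation. Let $\omega = \mathcal{S}(x)$ and $\omega' = \mathcal{S}(x')$ and $\mu = \omega M$ and $\mu' = \omega' M$, where we use the notation $M$ to denote the Markov kernel associated with mechanism $\mathcal{M}$ operating on measures over $Y$. We then consider the mixture factorization of $\omega$ and $\omega'$ obtained by taking the decompositions induced by projecting the maximal coupling $\pi = (1-\eta)\pi_0 + \eta\pi_1$ on the first and second marginals: $\omega = (1-\eta)\omega_0 + \eta\omega_1$ and $\omega' = (1-\eta)\omega_0 + \eta\omega_1'$. It is easy to see from the construction of the maximal coupling that $\omega_1$ and $\omega_1'$ have disjoint supports and $\eta$ is the smallest probability such that this condition holds. In this way we obtain the canonical mixture decompositions $\mu = (1-\eta)\mu_0 + \eta\mu_1$ and $\mu' = (1-\eta)\mu_0 + \eta\mu_1'$, where $\mu_0 = \omega_0 M$, $\mu_1 = \omega_1 M$, and $\mu_1' = \omega_1' M$.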

### Privacy Profiles

We state some important properties of privacy profiles. Our first result illustrates our claim that the “privacy curve” exists for every mechanism in the context of the Laplace output perturbation mechanism.

###### Theorem 3.

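Let $f : X \to \mathbb{R}$ be a function with global sensitivity $\Delta = \sup_{x \simeq_X x'} |f(x) - f(x')|$. Suppose $\mathcal{M}(x) = f(x) + \mathrm{Lap}(b)$ is a Laplace output perturbation mechanism with noise parameter $b$. The privacy profile of $\mathcal{M}$ is given by $\delta_\mathcal{M}(\varepsilon) = \big[1 - \exp\big(\frac{\varepsilon - \theta}{2}\big)\big]_+$, where $\theta = \Delta/b$.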

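The well-known fact that the Laplace mechanism with $b \ge \Delta/\varepsilon$ is $(\varepsilon, 0)$-DP follows from this result by noting that $\delta_\mathcal{M}(\varepsilon) = 0$ for any $\varepsilon \ge \theta$. However, Theorem 3 also provides more information: it shows that for $\varepsilon < \Delta/b$ the Laplace mechanism with noise parameter $b$ satisfies $(\varepsilon, \delta)$-DP with $\delta = 1 - \exp\big(\frac{\varepsilon - \theta}{2}\big)$.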
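Theorem 3 can be cross-checked by integrating the density form of (2) numerically for a neighbouring pair at maximal distance, i.e. $f(x) = 0$ and $f(x') = \Delta$. The following sketch assumes a simple midpoint rule on a truncated interval (the tails beyond $\pm 60 b$ are negligible); the function names are ours.

```python
import math


def laplace_pdf(z, loc, b):
    return math.exp(-abs(z - loc) / b) / (2 * b)


def laplace_profile(eps, sensitivity, b):
    """Closed form from Theorem 3: [1 - exp((eps - theta) / 2)]_+ with theta = sensitivity / b."""
    theta = sensitivity / b
    return max(1.0 - math.exp((eps - theta) / 2), 0.0)


def laplace_profile_numeric(eps, sensitivity, b, lo=-60.0, hi=60.0, steps=200_000):
    """Midpoint-rule approximation of int [p(z) - e^eps p'(z)]_+ dz for f(x) = 0, f(x') = sensitivity."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        gap = laplace_pdf(z, 0.0, b) - math.exp(eps) * laplace_pdf(z, sensitivity, b)
        total += max(gap, 0.0)
    return total * h
```

With $\Delta = b = 1$ and $\varepsilon = 0.5$, both functions give approximately $1 - e^{-1/4} \approx 0.221$, while `laplace_profile(1.0, 1.0, 1.0)` is exactly zero, matching the pure $(1, 0)$-DP guarantee. By the symmetry of the Laplace density, the reverse ordering of the pair gives the same value.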

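For mechanisms that only satisfy approximate DP, the privacy profile provides information about the behaviour of $\delta_\mathcal{M}(\varepsilon)$ as we increase $\varepsilon$. The classical analysis for the Gaussian output perturbation mechanism provides some information in this respect. Recall that for a function $f$ with global sensitivity $\Delta$ the mechanism $\mathcal{M}(x) = f(x) + \mathcal{N}(0, \sigma^2)$ satisfies $(\varepsilon,\delta)$-DP if $\varepsilon \in (0,1)$ and $\sigma^2 \ge 2\log(1.25/\delta)\Delta^2/\varepsilon^2$ (cf. (Dwork and Roth, 2014, Theorem A.1)). This can be rewritten as $\delta_\mathcal{M}(\varepsilon) \le 1.25\, e^{-\varepsilon^2/(2\theta^2)}$ for $\varepsilon \in (0,1)$, where $\theta = \Delta/\sigma$. Recently, Balle and Wang (Balle and Wang, 2018) gave a new analysis of the Gaussian mechanism that is valid for all values of $\varepsilon \ge 0$. Their analysis can be interpreted as providing an expression for the privacy profile of the Gaussian mechanism in terms of the Gaussian CDF $\Phi$.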

###### Theorem 4 ((Balle and Wang, 2018)).

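Let $f : X \to \mathbb{R}^d$ be a function with global $\ell_2$-sensitivity $\Delta = \sup_{x \simeq_X x'} \|f(x) - f(x')\|_2$. For any $\sigma > 0$ let $\theta = \Delta/\sigma$. The privacy profile of the Gaussian mechanism $\mathcal{M}(x) = f(x) + \mathcal{N}(0, \sigma^2 I)$ is given by

$$\delta_\mathcal{M}(\varepsilon) = \Phi\Big(\frac{\theta}{2} - \frac{\varepsilon}{\theta}\Big) - e^{\varepsilon}\, \Phi\Big(-\frac{\theta}{2} - \frac{\varepsilon}{\theta}\Big) . \tag{7}$$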
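The exact profile (7) and the classical calibration can be compared directly. The sketch below evaluates both using only `math.erf` from the Python standard library; the function names are our own, and the classical expression is only meaningful as a guarantee for $\varepsilon \in (0, 1)$.

```python
import math


def std_normal_cdf(t):
    """Phi(t) via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gaussian_profile(eps, sensitivity, sigma):
    """Exact privacy profile (7) of the Gaussian mechanism, theta = sensitivity / sigma."""
    theta = sensitivity / sigma
    return std_normal_cdf(theta / 2 - eps / theta) - math.exp(eps) * std_normal_cdf(-theta / 2 - eps / theta)


def classical_gaussian_bound(eps, sensitivity, sigma):
    """delta implied by the classical calibration sigma^2 >= 2 log(1.25/delta) Delta^2 / eps^2."""
    theta = sensitivity / sigma
    return 1.25 * math.exp(-eps ** 2 / (2 * theta ** 2))
```

For $\Delta = \sigma = 1$ and $\varepsilon = 0.5$ the exact profile is about $0.238$, whereas the classical expression exceeds $1$ and is vacuous; on the grid $\varepsilon \in \{0.1, 0.5, 0.9\}$, $\sigma \in \{1, 2, 4\}$ the exact profile stays below the classical bound, as it should.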

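Interestingly, the proof of Theorem 4 implicitly provides a characterization of privacy profiles in terms of privacy loss random variables that holds for any mechanism. Recall that the privacy loss random variable of a mechanism $\mathcal{M}$ on inputs $x \simeq_X x'$ is defined as $L_\mathcal{M}^{x,x'} = \log \frac{d\mu}{d\mu'}(\mathbf{z})$, where $\mu = \mathcal{M}(x)$, $\mu' = \mathcal{M}(x')$, and $\mathbf{z} \sim \mu$.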

###### Theorem 5 ((Balle and Wang, 2018)).

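The privacy profile of any mechanism $\mathcal{M}$ satisfies

$$\delta_\mathcal{M}(\varepsilon) = \sup_{x \simeq_X x'} \Big( \Pr\big[L_\mathcal{M}^{x,x'} > \varepsilon\big] - e^{\varepsilon} \Pr\big[L_\mathcal{M}^{x',x} < -\varepsilon\big] \Big) . \tag{8}$$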

The characterization above generalizes the well-known inequality $\delta_\mathcal{M}(\varepsilon) \le \sup_{x \simeq_X x'} \Pr[L_\mathcal{M}^{x,x'} > \varepsilon]$ (e.g. see (Dwork and Roth, 2014)). This bound is often used to derive $(\varepsilon,\delta)$-DP guarantees from other notions of privacy defined in terms of the moment generating function of the privacy loss random variable, including concentrated DP (Dwork and Rothblum, 2016), zero-concentrated DP (Bun and Steinke, 2016), Rényi DP (Mironov, 2017), and truncated concentrated DP (Bun et al., 2018). We conclude this section by showing a reverse implication, namely that privacy profiles can be used to recover all the information provided by the moment generating function of the privacy loss random variable.

###### Theorem 6.

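Given a mechanism $\mathcal{M}$ and inputs $x \simeq_X x'$ let $\mu = \mathcal{M}(x)$ and $\mu' = \mathcal{M}(x')$. For $s \ge 0$, define the moment generating function $\varphi_\mathcal{M}^{x,x'}(s) = \mathbb{E}\big[\exp\big(s L_\mathcal{M}^{x,x'}\big)\big]$. Then we have

$$\varphi_\mathcal{M}^{x,x'}(s) = 1 + s(s+1) \int_0^\infty \Big( e^{s\varepsilon} D_{e^\varepsilon}(\mu \| \mu') + e^{-(s+1)\varepsilon} D_{e^\varepsilon}(\mu' \| \mu) \Big)\, d\varepsilon . \tag{9}$$

In particular, if $D_{e^\varepsilon}(\mu \| \mu') = D_{e^\varepsilon}(\mu' \| \mu)$ holds for every $x \simeq_X x'$ and every $\varepsilon \ge 0$ (e.g. this is satisfied by all output perturbation mechanisms with symmetric noise distributions), then $\sup_{x \simeq_X x'} \varphi_\mathcal{M}^{x,x'}(s) \le 1 + s(s+1) \int_0^\infty \big(e^{s\varepsilon} + e^{-(s+1)\varepsilon}\big)\, \delta_\mathcal{M}(\varepsilon)\, d\varepsilon$.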
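Identity (9) can be verified numerically for a pair of discrete distributions. The sketch below assumes $\mu$ and $\mu'$ are lists with a common, fully positive support, so that the privacy loss is finite and both divergences vanish once $\varepsilon$ exceeds $\max_z |\log(\mu(z)/\mu'(z))|$; the integral is therefore truncated at a user-supplied `eps_max` above that value. Helper names are ours.

```python
import math


def hockey_stick(mu, nu, alpha):
    """D_alpha(mu || nu) for equal-length lists of probabilities."""
    return sum(max(m - alpha * n, 0.0) for m, n in zip(mu, nu))


def mgf_direct(mu, nu, s):
    """E_{z ~ mu}[exp(s * L)] with L = log(mu(z) / nu(z)); assumes nu(z) > 0 wherever mu(z) > 0."""
    return sum(m ** (s + 1) * n ** (-s) for m, n in zip(mu, nu) if m > 0)


def mgf_from_profiles(mu, nu, s, eps_max, steps=100_000):
    """Right-hand side of (9): integral truncated at eps_max and evaluated by the midpoint rule."""
    h = eps_max / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * h
        alpha = math.exp(eps)
        total += math.exp(s * eps) * hockey_stick(mu, nu, alpha)
        total += math.exp(-(s + 1) * eps) * hockey_stick(nu, mu, alpha)
    return 1.0 + s * (s + 1) * total * h
```

For $\mu = (0.5, 0.3, 0.2)$ and $\mu' = (0.2, 0.3, 0.5)$ the largest log-ratio is $\log 2.5 \approx 0.92$, so `eps_max=1.0` suffices; at $s = 1$ both sides equal $\sum_z \mu(z)^2/\mu'(z) = 1.63$.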

### Group-privacy Profiles

It is easy to see that the standard analysis of group privacy (if $\mathcal{M}$ is $(\varepsilon,\delta)$-DP with respect to $\simeq_Y$, then it is $\big(k\varepsilon, \frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta\big)$-DP for inputs at distance at most $k$, cf. (Vadhan, 2017, Lemma 2.2)) yields the bounds $\delta_{\mathcal{M},k}(\varepsilon) \le \frac{e^{\varepsilon}-1}{e^{\varepsilon/k}-1}\, \delta_\mathcal{M}(\varepsilon/k)$. However, “white-box” approaches based on full knowledge of the privacy profile of $\mathcal{M}$ can be used to improve this result for specific mechanisms. For example, it is not hard to see that, combining the expressions from Theorems 3 and 4 with the triangle inequality on the global sensitivity of changing $k$ records in a dataset, one obtains bounds that improve on the “black-box” approach for all ranges of parameters. This is one of the reasons why we state our bounds directly in terms of (group-)privacy profiles.
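The gap between the two approaches is easy to see for the Laplace mechanism. In the sketch below (names are ours), the white-box group profile replaces $\theta$ by $k\theta$ in Theorem 3, since changing $k$ records moves $f$ by at most $k\Delta$; the black-box bound is $\delta_{\mathcal{M},k}(\varepsilon) \le \frac{e^\varepsilon - 1}{e^{\varepsilon/k} - 1}\delta_\mathcal{M}(\varepsilon/k)$, obtained by chaining $k$ neighbouring steps at level $\varepsilon/k$.

```python
import math


def laplace_profile(eps, theta):
    """Theorem 3: [1 - exp((eps - theta) / 2)]_+ with theta = sensitivity / b."""
    return max(1.0 - math.exp((eps - theta) / 2), 0.0)


def laplace_group_profile_whitebox(eps, theta, k):
    """k changed records move f by at most k * sensitivity, so theta becomes k * theta."""
    return laplace_profile(eps, k * theta)


def group_profile_blackbox(eps, k, profile):
    """Generic bound (e^eps - 1) / (e^{eps/k} - 1) * delta_M(eps / k), valid for eps > 0."""
    return (math.exp(eps) - 1) / (math.exp(eps / k) - 1) * profile(eps / k)
```

For $\theta = 1$, $k = 2$ and $\varepsilon = 1$, the white-box value is $1 - e^{-1/2} \approx 0.39$, while the black-box bound is about $0.59$.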

### Distance-compatible Coupling

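The last property we need to prove general privacy amplification bounds based on $\alpha$-divergences is the existence of a certain type of couplings between two distributions like the ones occurring in the right hand side of (6). Recall that any coupling $\pi$ between two distributions $\omega, \omega' \in \mathbb{P}(Y)$ can be used to rewrite the mixture distributions $\mu = \omega M$ and $\mu' = \omega' M$ as $\mu = \sum_{y,y'} \pi_{y,y'} \mathcal{M}(y)$ and $\mu' = \sum_{y,y'} \pi_{y,y'} \mathcal{M}(y')$. Using the joint convexity of $D_\alpha$ and the definition of group-privacy profiles (with the convention $\delta_{\mathcal{M},0} \equiv 0$), we get the bound

$$D_{e^\varepsilon}(\omega M \| \omega' M) \le \sum_{y,y'} \pi_{y,y'}\, D_{e^\varepsilon}(\mathcal{M}(y) \| \mathcal{M}(y')) \le \sum_{y,y'} \pi_{y,y'}\, \delta_{\mathcal{M}, d_Y(y,y')}(\varepsilon) . \tag{10}$$

Since this bound holds for any coupling $\pi$, one can set out to optimize it by finding a coupling that minimizes the right hand side of (10). We show that the existence of couplings whose support is contained inside a certain subset of $Y \times Y$ is enough to obtain an optimal bound. Furthermore, we show that in that case the resulting bound depends only on $\omega$ and the group-privacy profiles of $\mathcal{M}$. We say that the distributions $\omega, \omega'$ are $d_Y$-compatible if there exists a coupling $\pi$ between $\omega$ and $\omega'$ such that for any $(y, y') \in \mathrm{supp}(\pi)$ we have $d_Y(y, y') = d_Y(y, \mathrm{supp}(\omega'))$, where $d_Y(y, \mathrm{supp}(\omega')) = \min_{y'' \in \mathrm{supp}(\omega')} d_Y(y, y'')$.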

###### Theorem 7.

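Let $\mathcal{C}(\omega, \omega')$ be the set of all couplings between $\omega$ and $\omega'$ and for $k \ge 1$ let $Y_k = \{ y \in \mathrm{supp}(\omega) : d_Y(y, \mathrm{supp}(\omega')) = k \}$. If $\omega$ and $\omega'$ are $d_Y$-compatible, then the following holds:

$$\min_{\pi \in \mathcal{C}(\omega,\omega')} \sum_{y,y'} \pi_{y,y'}\, \delta_{\mathcal{M}, d_Y(y,y')}(\varepsilon) = \sum_{k \ge 1} \omega(Y_k)\, \delta_{\mathcal{M},k}(\varepsilon) . \tag{11}$$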

Applying this result to the bound resulting from the right hand side of (6) yields most of the concrete privacy amplification results presented in the next section.

## 4 Examples

In this section we provide explicit privacy amplification bounds for the most common subsampling methods and neighbouring relations found in the literature on differential privacy. For our analysis we work with order-independent representations of datasets without repetitions. This is mostly for technical convenience, since all our results also hold if one considers datasets represented as tuples or multisets. Note however that subsampling with replacement from a set can yield a multiset; hence we introduce suitable notations for sets and multisets.

Fix a universe of records $\mathcal{U}$.
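We write $2^\mathcal{U}$ and $\mathbb{N}^\mathcal{U}$ for the space of all sets and multisets with records from $\mathcal{U}$. Note every set is also a multiset. For $n \ge 0$ we also write $2^\mathcal{U}_n$ and $\mathbb{N}^\mathcal{U}_n$ for the space of all sets and multisets containing exactly $n$ records from $\mathcal{U}$ (in the case of multisets, records are counted with multiplicity). Given $x \in \mathbb{N}^\mathcal{U}$ we write $x_u$ for the number of occurrences of $u$ in $x$. The support of a multiset $x$ is the set $\mathrm{supp}(x) = \{u \in \mathcal{U} : x_u > 0\}$. Given multisets $x, x' \in \mathbb{N}^\mathcal{U}$ we write $x' \subseteq x$ to denote that $x'_u \le x_u$ for all $u \in \mathcal{U}$.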

For order-independent datasets represented as multisets it is natural to consider the two following neighbouring relations. The remove/add-one relation $\simeq_r$ is obtained by letting $x \simeq_r x'$ hold whenever $x \subseteq x'$ with $|x'| = |x| + 1$ or $x' \subseteq x$ with $|x| = |x'| + 1$; i.e. $x'$ is obtained by removing or adding a single element to $x$. The substitute-one relation $\simeq_s$ is obtained by letting $x \simeq_s x'$ hold whenever $\|x - x'\|_1 = 2$ and $|x| = |x'|$, where $\|x - x'\|_1 = \sum_{u} |x_u - x'_u|$; i.e. $x'$ is obtained by replacing an element in $x$ with a different element from $\mathcal{U}$. Note how $\simeq_r$ relates pairs of datasets with different sizes, while $\simeq_s$ only relates pairs of datasets with the same size.

### Poisson Subsampling

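Perhaps the most well-known privacy amplification result concerns the analysis of Poisson subsampling with respect to the remove/add-one relation. In this case the subsampling mechanism $\mathcal{S}_{\mathrm{po}(\gamma)} : 2^\mathcal{U} \to \mathbb{P}(2^\mathcal{U})$ takes a set $x$ and outputs a sample from the distribution $\omega = \mathcal{S}_{\mathrm{po}(\gamma)}(x)$ supported on all sets $y \subseteq x$ given by $\omega(y) = \gamma^{|y|}(1-\gamma)^{|x| - |y|}$. This corresponds to independently adding to $y$ with probability $\gamma$ each element from $x$. Now, given a mechanism $\mathcal{M} : 2^\mathcal{U} \to \mathbb{P}(Z)$ with privacy profile $\delta_\mathcal{M}$ with respect to $\simeq_r$, we are interested in bounding the privacy profile of the subsampled mechanism $\mathcal{M}^{\mathcal{S}_{\mathrm{po}(\gamma)}}$ with respect to $\simeq_r$.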

###### Theorem 8.

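For any $\varepsilon \ge 0$ we have $\delta_{\mathcal{M}^{\mathcal{S}_{\mathrm{po}(\gamma)}}}(\varepsilon') \le \gamma\, \delta_\mathcal{M}(\varepsilon)$ where $\varepsilon' = \log(1 + \gamma(e^\varepsilon - 1))$.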

Privacy amplification with Poisson sampling was used in (Chaudhuri and Mishra, 2006; Beimel et al., 2010; Kasiviswanathan et al., 2011; Beimel et al., 2014), which considered loose bounds. A proof of this tight result in terms of -DP was first given in (Li et al., 2012). In the context of the moments accountant technique based on the moment generating function of the privacy loss random variable, (Abadi et al., 2016) provide an amplification result for Gaussian output perturbation mechanisms under Poisson subsampling.
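Theorem 8 can be checked by brute force on a toy instance. The sketch below is an illustration under explicit assumptions: the base mechanism is a hypothetical randomized response on the bit "record `v` is in the subsample" (pure $\varepsilon_0$-DP with respect to $\simeq_r$), and the neighbouring pair is $x = \{a, b\}$, $x' = \{a, b, v\}$. The helper names are ours.

```python
import math
from itertools import combinations


def poisson_amplified_eps(eps, gamma):
    """eps' = log(1 + gamma * (e^eps - 1)) from Theorem 8."""
    return math.log1p(gamma * math.expm1(eps))


def hockey_stick(mu, nu, alpha):
    keys = set(mu) | set(nu)
    return sum(max(mu.get(z, 0.0) - alpha * nu.get(z, 0.0), 0.0) for z in keys)


def subsampled_output(x, gamma, target, eps0):
    """Law of M^S(x): Poisson subsampling, then hypothetical randomized response on [target in y]."""
    keep = math.exp(eps0) / (1 + math.exp(eps0))
    records = sorted(x)
    out = {0: 0.0, 1: 0.0}
    for r in range(len(records) + 1):
        for y in combinations(records, r):
            weight = gamma ** r * (1 - gamma) ** (len(records) - r)
            b = 1 if target in y else 0
            out[b] += weight * keep
            out[1 - b] += weight * (1 - keep)
    return out


def poisson_check(gamma, eps0, eps):
    """Return (delta of M^S at eps' on the pair x, x'; gamma * delta_M(eps))."""
    keep = math.exp(eps0) / (1 + math.exp(eps0))
    rr0, rr1 = {0: keep, 1: 1 - keep}, {0: 1 - keep, 1: keep}
    alpha = math.exp(eps)
    base = max(hockey_stick(rr1, rr0, alpha), hockey_stick(rr0, rr1, alpha))
    mu = subsampled_output({"a", "b"}, gamma, "v", eps0)
    mu_prime = subsampled_output({"a", "b", "v"}, gamma, "v", eps0)
    alpha_prime = math.exp(poisson_amplified_eps(eps, gamma))
    amplified = max(hockey_stick(mu, mu_prime, alpha_prime), hockey_stick(mu_prime, mu, alpha_prime))
    return amplified, gamma * base
```

For `poisson_check(0.3, 2.0, 1.0)` the two returned numbers coincide (up to rounding), so for this particular toy mechanism the bound of Theorem 8 is attained, in line with the tightness results of Section 5.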

### Sampling Without Replacement

Another known result on privacy amplification concerns the analysis of sampling without replacement with respect to the substitution relation. In this case one considers the subsampling mechanism $\mathcal{S}_{\mathrm{wo}(n,m)} : 2^\mathcal{U}_n \to \mathbb{P}(2^\mathcal{U}_m)$ that given a set $x \in 2^\mathcal{U}_n$ of size $n$ outputs a sample from the uniform distribution over all subsets $y \subseteq x$ of size $m \le n$. Then, for a given mechanism $\mathcal{M} : 2^\mathcal{U}_m \to \mathbb{P}(Z)$ with privacy profile $\delta_\mathcal{M}$ with respect to the substitution relation $\simeq_s$ on sets of size $m$, we are interested in bounding the privacy profile of the mechanism $\mathcal{M}^{\mathcal{S}_{\mathrm{wo}(n,m)}}$ with respect to the substitution relation on sets of size $n$.

###### Theorem 9.

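For any $\varepsilon \ge 0$ we have $\delta_{\mathcal{M}^{\mathcal{S}_{\mathrm{wo}(n,m)}}}(\varepsilon') \le \frac{m}{n}\, \delta_\mathcal{M}(\varepsilon)$ where $\varepsilon' = \log\big(1 + \frac{m}{n}(e^\varepsilon - 1)\big)$.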

This setting has been used in (Beimel et al., 2013; Bassily et al., 2014; Wang et al., 2016) with non-tight bounds. A proof of this tight bound formulated in terms of -DP can be found in Ullman’s class notes (Ullman, 2017). Recently, privacy amplification results for subsampling without replacement under Rényi DP were developed in (Wang et al., 2018).
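In Theorem 9 the factor $m/n$ plays the role of $\eta = \mathrm{TV}(\mathcal{S}(x), \mathcal{S}(x'))$ in (4): it is the probability that the substituted record ends up in the subsample. The sketch below (names are ours) checks this with exact rational arithmetic on a small example and computes the amplified $\varepsilon'$.

```python
import math
from fractions import Fraction
from itertools import combinations


def without_replacement(x, m):
    """Uniform distribution over the size-m subsets of the set x, with exact rational weights."""
    subsets = [frozenset(c) for c in combinations(sorted(x), m)]
    weight = Fraction(1, len(subsets))
    return {y: weight for y in subsets}


def total_variation(p, q):
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2


def amplified_eps(eps, eta):
    """eps' = log(1 + eta * (e^eps - 1))."""
    return math.log1p(eta * math.expm1(eps))
```

For $x = \{1,2,3,4,5\}$, $x' = \{1,2,3,4,6\}$ and $m = 2$, `total_variation` returns exactly `Fraction(2, 5)`, i.e. $m/n$.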

### Sampling With Replacement

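Next we consider the case of sampling with replacement with respect to the substitution relation $\simeq_s$. The subsampling with replacement mechanism $\mathcal{S}_{\mathrm{wr}(n,m)} : 2^\mathcal{U}_n \to \mathbb{P}(\mathbb{N}^\mathcal{U}_m)$ takes a set $x$ of size $n$ and outputs a sample from the multinomial distribution over all multisets $y$ of size $m$ with $\mathrm{supp}(y) \subseteq x$, given by $\omega(y) = \frac{m!}{n^m} \prod_{u \in x} \frac{1}{y_u!}$. In this case we suppose the base mechanism $\mathcal{M} : \mathbb{N}^\mathcal{U}_m \to \mathbb{P}(Z)$ is defined on multisets and has group-privacy profiles $\delta_{\mathcal{M},k}$ with respect to $\simeq_s$. We are interested in bounding the privacy profile of the subsampled mechanism $\mathcal{M}^{\mathcal{S}_{\mathrm{wr}(n,m)}}$ with respect to $\simeq_s$.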

###### Theorem 10.

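For any $\varepsilon \ge 0$ we have

$$\delta_{\mathcal{M}^{\mathcal{S}_{\mathrm{wr}(n,m)}}}(\varepsilon') \le \sum_{k=1}^{m} \binom{m}{k} \Big(\frac{1}{n}\Big)^{k} \Big(1 - \frac{1}{n}\Big)^{m-k} \delta_{\mathcal{M},k}(\varepsilon) , \tag{12}$$

where $\varepsilon' = \log\big(1 + \big(1 - (1 - 1/n)^m\big)(e^\varepsilon - 1)\big)$.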

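Note that if $m = 1$, then (12) reduces to $\delta_{\mathcal{M}^{\mathcal{S}_{\mathrm{wr}(n,1)}}}(\varepsilon') \le \frac{1}{n}\delta_\mathcal{M}(\varepsilon)$ with $\varepsilon' = \log\big(1 + \frac{1}{n}(e^\varepsilon - 1)\big)$, the same guarantee as sampling a single record without replacement. A version of this bound in terms of $(\varepsilon,\delta)$-DP that implicitly uses the group privacy property can be found in (Bun et al., 2015).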
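The right-hand side of (12) is a binomial average of group-privacy profiles: the number $k$ of copies of the substituted record in the subsample follows a $\mathrm{Binomial}(m, 1/n)$ distribution. The sketch below evaluates $\varepsilon'$ and the bound for any user-supplied group profile; the function name and the idea of passing the profile as a callable `group_profile(eps, k)` are our own assumptions.

```python
import math


def with_replacement_bound(eps, n, m, group_profile):
    """Return (eps', RHS of (12)) for sampling m records with replacement from a set of size n."""
    eta = 1 - (1 - 1 / n) ** m  # probability that the substituted record is drawn at least once
    eps_prime = math.log1p(eta * math.expm1(eps))
    delta_prime = sum(
        math.comb(m, k) * (1 / n) ** k * (1 - 1 / n) ** (m - k) * group_profile(eps, k)
        for k in range(1, m + 1)
    )
    return eps_prime, delta_prime
```

As a sanity check, a constant group profile $c$ gives $\delta' = c\,\big(1 - (1 - 1/n)^m\big)$. A white-box Laplace group profile such as `lambda eps, k: max(1 - math.exp((eps - k * theta) / 2), 0.0)` (with a hypothetical `theta`) can be plugged in the same way.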

### Hybrid Neighbouring Relations

Using our method it is also possible to analyze new settings which have not been considered before. One interesting example occurs when there is a mismatch between the two neighbouring relations arising in the analysis. For example, suppose one knows the group-privacy profiles $\delta_{\mathcal{M},k}$ of a base mechanism $\mathcal{M}$ with respect to the substitution relation $\simeq_s$. In this case one could ask whether it makes sense to study the privacy profile of the subsampled mechanism with respect to the remove/add relation $\simeq_r$. In principle, this makes sense in settings where the size of the inputs to $\mathcal{M}$ is restricted due to implementation constraints (e.g. limited by the memory available in a GPU used to run a private mechanism that computes a gradient on a mini-batch of size $m$). In this case one might still be interested in analyzing the privacy loss incurred from releasing such stochastic gradients under the remove/add relation. Note that this setting cannot be implemented using sampling without replacement since under the remove/add relation we cannot a priori guarantee that the input dataset will have size at least $m$, because the size of the dataset must be kept private. Furthermore, one cannot hope to get a meaningful result about the privacy profile of the subsampled mechanism across all input sets in $2^\mathcal{U}$; instead the privacy guarantee will depend on the size of the input dataset, as shown in the following result.

###### Theorem 11.

For any and we have

(13) |

where .

### When the Neighbouring Relation is “Wrong”

Now we consider a simple example where distance-compatible couplings are not available: Poisson subsampling with respect to the substitution relation. Suppose are sets of size related by the substitution relation . Let and and note that . Let and , . In this case the factorization induced by the maximal coupling is obtained by taking , , and . Note the support of contains sets of sizes between and , while the supports of and contain sets of sizes between and . From this observation one can deduce that and are not -compatible, and and are not -compatible.

This argument shows that the method we used to analyze the previous settings cannot be extended to analyze Poisson subsampling under the substitution relation, regardless of whether the privacy profile of the base mechanism is given in terms of the remove/add or the substitution relation. One can interpret this observation as saying that some pairings between subsampling method and neighbouring relation are more natural than others. Nonetheless, even without distance-compatible couplings it is possible to provide privacy amplification bounds for Poisson subsampling with respect to the substitution relation, although the resulting bound is quite cumbersome. We provide the corresponding statement and analysis in the appendix.

## 5 Lower Bounds

In this section we show that many of the results given in the previous section are tight by constructing a randomized membership mechanism that attains these upper bounds. For the sake of generality, we state the main construction in terms of tuples instead of multisets. Furthermore, we prove a general lemma that can be used to obtain tightness results for any subsampling mechanism and any neighbouring relation satisfying a minimal set of assumptions.

For let be the randomized response mechanism that given returns with probability and with probability . Note that for this mechanism is -DP. Let and . For any and define . Then it is easy to show that . Now let be a universe containing at least two elements. For and we define the randomized membership mechanism that given a tuple returns . We say that a subsampling mechanism defined on some is natural if the following two conditions are satisfied: (1) for any and , if then there exists such that ; (2) for any and , if then we have for every .

###### Lemma 12.

Let be a set of tuples equipped with a neighbouring relation such that there exist with and . Suppose is a natural subsampling mechanism and let . Given and we have

(14) |

We can now apply this lemma to show that the first three results from the previous section are tight. This requires specializing from tuples to (multi)sets, and plugging in the definitions of neighbouring relation, subsampling mechanism, and $\varepsilon'$ used in each of these theorems.

## 6 Conclusions

We have developed a general method for reasoning about privacy amplification by subsampling. Our method is applicable to many different settings, some of which have already been studied in the literature, and others which are new. Technically, our method leverages two new tools of independent interest: advanced joint convexity and privacy profiles. In the future, it would be interesting to study whether our tools can be extended to give concrete bounds on privacy amplification for other privacy notions such as concentrated DP (Dwork and Rothblum, 2016), zero-concentrated DP (Bun and Steinke, 2016), Rényi DP (Mironov, 2017), and truncated concentrated DP (Bun et al., 2018). A good starting point is Theorem 6, which establishes relations between privacy profiles and moment generating functions of the privacy loss random variable. An alternative approach is to extend the recent results for Rényi DP amplification by subsampling without replacement given in (Wang et al., 2018) to more general notions of subsampling and neighbouring relations.

## References

- Abadi et al. [2016] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.
- Balle and Wang [2018] Borja Balle and Yu-Xiang Wang. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In Proceedings of the 35th International Conference on Machine Learning, ICML, 2018.
- Barthe and Olmedo [2013] Gilles Barthe and Federico Olmedo. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In International Colloquium on Automata, Languages, and Programming, pages 49–60. Springer, 2013.
- Barthe et al. [2012] Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella Béguelin. Probabilistic relational reasoning for differential privacy. In Symposium on Principles of Programming Languages (POPL), pages 97–110, 2012.
- Barthe et al. [2016] Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. Proving differential privacy via probabilistic couplings. In Symposium on Logic in Computer Science (LICS), pages 749–758, 2016.
- Bassily et al. [2014] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 464–473. IEEE, 2014.
- Beimel et al. [2010] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. In Theory of Cryptography Conference, pages 437–454. Springer, 2010.
- Beimel et al. [2013] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 97–110. ACM, 2013.
- Beimel et al. [2014] Amos Beimel, Hai Brenner, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. Machine learning, 94(3):401–437, 2014.
- Bun and Steinke [2016] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography - 14th International Conference, TCC 2016-B, Beijing, China, October 31 - November 3, 2016, Proceedings, Part I, pages 635–658, 2016.
- Bun et al. [2015] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil Vadhan. Differentially private release and learning of threshold functions. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 634–649. IEEE, 2015.
- Bun et al. [2018] Mark Bun, Cynthia Dwork, Guy Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. In Symposium on Theory of Computing, STOC, 2018.
- Chaudhuri and Mishra [2006] Kamalika Chaudhuri and Nina Mishra. When random sampling preserves privacy. In Annual International Cryptology Conference, pages 198–213. Springer, 2006.
- Dwork and Roth [2014] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
- Dwork and Rothblum [2016] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.
- Dwork et al. [2010] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 51–60. IEEE, 2010.
- Jälkö et al. [2017] Joonas Jälkö, Antti Honkela, and Onur Dikmen. Differentially private variational inference for non-conjugate models. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017, 2017.
- Kairouz et al. [2017] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037–4049, 2017.
- Kasiviswanathan et al. [2011] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
- Li et al. [2012] Ninghui Li, Wahbeh Qardaji, and Dong Su. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, pages 32–33. ACM, 2012.
- Mironov [2017] Ilya Mironov. Rényi differential privacy. In 30th IEEE Computer Security Foundations Symposium, CSF 2017, Santa Barbara, CA, USA, August 21-25, 2017, pages 263–275, 2017.
- Murtagh and Vadhan [2016] Jack Murtagh and Salil Vadhan. The complexity of computing the optimal composition of differential privacy. In Theory of Cryptography Conference, pages 157–175. Springer, 2016.
- Österreicher [2002] F Österreicher. Csiszár’s f-divergences: basic properties. RGMIA Res. Rep. Coll, 2002.
- Park et al. [2016a] Mijung Park, James R. Foulds, Kamalika Chaudhuri, and Max Welling. Private topic modeling. CoRR, abs/1609.04120, 2016a.
- Park et al. [2016b] Mijung Park, James R. Foulds, Kamalika Chaudhuri, and Max Welling. Variational Bayes in private settings (VIPS). CoRR, abs/1611.00340, 2016b.
- Sason and Verdú [2016] Igal Sason and Sergio Verdú. f-divergence inequalities. IEEE Transactions on Information Theory, 62(11):5973–6006, 2016.
- Ullman [2017] Jonathan Ullman. CS7880: Rigorous approaches to data privacy. http://www.ccs.neu.edu/home/jullman/PrivacyS17/HW1sol.pdf, 2017.
- Vadhan [2017] Salil P. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography., pages 347–450. 2017.
- Wang et al. [2015] Yu-Xiang Wang, Stephen Fienberg, and Alex Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 2493–2502, 2015.
- Wang et al. [2016] Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. Learning with differential privacy: Stability, learnability and the sufficiency and necessity of ERM principle. Journal of Machine Learning Research, 17(183):1–40, 2016.
- Wang et al. [2018] Yu-Xiang Wang, Borja Balle, and Shiva Kasiviswanathan. Subsampled Rényi differential privacy and analytical moments accountant. Preprint, 2018.

## Appendix A Proofs from Section 3

###### Proof of Theorem 2.

It suffices to check that for any ,

Plugging this bound in the definition of we get the desired equality

∎

###### Proof of Theorem 3.

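Let $\alpha = e^{\varepsilon}$, $\theta = \Delta / b$, $\mu = \mathrm{Lap}(0, b)$ and $\mu' = \mathrm{Lap}(\Delta, b)$. Plugging the density of the Laplace distribution into the definition of the $\alpha$-divergence we get

$$
D_{\alpha}(\mu \| \mu') = \frac{1}{2b} \int_{\mathbb{R}} \left[ e^{-|z|/b} - e^{\varepsilon - |z - \Delta|/b} \right]_+ dz . \tag{15}
$$

Now we observe that the quantity inside the integral above is positive if and only if $|z - \Delta| - |z| > \varepsilon b$. Since $|z - \Delta| - |z| \leq \Delta$, we see that the divergence is zero for $\varepsilon \geq \theta$. On the other hand, for $\varepsilon < \theta$ we have (assuming $\Delta > 0$ without loss of generality) $|z - \Delta| - |z| > \varepsilon b$ if and only if $z < z^* = (\Delta - \varepsilon b)/2$. Thus, we have

$$
D_{\alpha}(\mu \| \mu') = \frac{1}{2b} \int_{-\infty}^{z^*} e^{-|z|/b} \, dz - \frac{e^{\varepsilon}}{2b} \int_{-\infty}^{z^*} e^{-|z - \Delta|/b} \, dz . \tag{16}
$$

Now we can compute both integrals as probabilities under the Laplace distribution, using that $z^* > 0$ and $z^* - \Delta = -(\Delta + \varepsilon b)/2 < 0$:

$$
\frac{1}{2b} \int_{-\infty}^{z^*} e^{-|z|/b} \, dz = \Pr_{Z \sim \mathrm{Lap}(0, b)}[Z < z^*] \tag{17}
$$

$$
= 1 - \frac{1}{2} e^{-z^*/b} = 1 - \frac{1}{2} e^{(\varepsilon - \theta)/2} , \tag{18}
$$

$$
\frac{e^{\varepsilon}}{2b} \int_{-\infty}^{z^*} e^{-|z - \Delta|/b} \, dz = e^{\varepsilon} \Pr_{Z \sim \mathrm{Lap}(\Delta, b)}[Z < z^*] \tag{19}
$$

$$
= \frac{e^{\varepsilon}}{2} e^{(z^* - \Delta)/b} = \frac{1}{2} e^{(\varepsilon - \theta)/2} . \tag{20}
$$

Putting these two quantities together we finally get

$$
D_{\alpha}(\mu \| \mu') = 1 - \frac{1}{2} e^{(\varepsilon - \theta)/2} - \frac{1}{2} e^{(\varepsilon - \theta)/2} \tag{21}
$$

$$
= 1 - e^{(\varepsilon - \theta)/2} . \tag{22}
$$

Since this expression is positive exactly when $\varepsilon < \theta$, both cases combine into $D_{\alpha}(\mu \| \mu') = [1 - e^{(\varepsilon - \theta)/2}]_+$.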

∎
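As a sanity check on the Laplace calculation in the proof of Theorem 3, one can compare a direct numerical evaluation of the $\alpha$-divergence $\int [p(z) - e^{\varepsilon} p'(z)]_+ \, dz$ between $\mathrm{Lap}(0, b)$ and $\mathrm{Lap}(\Delta, b)$ against the closed form $[1 - e^{(\varepsilon - \Delta/b)/2}]_+$. The Python sketch below is illustrative only: the function names, the truncation window $[-60, 60]$ and the midpoint-rule grid are assumptions made for the example, not part of the analysis, and it only probes a handful of parameter values.

```python
import math


def laplace_pdf(z: float, loc: float, b: float) -> float:
    """Density of the Laplace distribution Lap(loc, b) at z."""
    return math.exp(-abs(z - loc) / b) / (2.0 * b)


def alpha_divergence_numeric(eps: float, sens: float, b: float,
                             lo: float = -60.0, hi: float = 60.0,
                             n: int = 100_000) -> float:
    """Midpoint-rule estimate of int [p(z) - e^eps * p'(z)]_+ dz,
    with p = Lap(0, b) and p' = Lap(sens, b).

    The integral is truncated to [lo, hi]; we assume the Laplace tails
    outside this window are negligible for the small scales used here."""
    alpha = math.exp(eps)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h  # midpoint of the i-th cell
        gap = laplace_pdf(z, 0.0, b) - alpha * laplace_pdf(z, sens, b)
        total += max(gap, 0.0)  # positive part [.]_+
    return total * h


def laplace_profile(eps: float, sens: float, b: float) -> float:
    """Closed form [1 - exp((eps - theta) / 2)]_+ with theta = sens / b."""
    theta = sens / b
    return max(1.0 - math.exp((eps - theta) / 2.0), 0.0)


if __name__ == "__main__":
    b, sens = 1.0, 1.0  # illustrative scale and L1-sensitivity
    for eps in (0.0, 0.25, 0.5, 1.0, 1.5):
        numeric = alpha_divergence_numeric(eps, sens, b)
        exact = laplace_profile(eps, sens, b)
        print(f"eps={eps:.2f}  numeric={numeric:.6f}  closed form={exact:.6f}")
```

At $\varepsilon = 0$ the divergence reduces to the total variation distance $1 - e^{-\Delta/(2b)}$, and for $\varepsilon \geq \Delta/b$ both columns should be zero up to floating-point error.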

###### Proof of Theorem 6.

Let , , , and . Recall that for any non-negative random variable $Z$ one has $\mathbb{E}[Z] = \int_0^\infty \Pr[Z > t] \, dt$. We use this to write the moment generating function of the corresponding privacy loss random variable for as follows:

where , and and represent the densities of and with respect to a fixed base measure. Next we observe that the probability inside the integral above can be decomposed in terms of a divergence and a second integral with respect to :