Lower Bounds for Differential Privacy from Gaussian Width
We study the optimal sample complexity of a given workload of linear queries under the constraints of differential privacy. The sample complexity of a query answering mechanism under error parameter $\alpha$ is the smallest database size $n$ for which the mechanism answers the workload with error at most $\alpha$ on any database of size $n$. Following a line of research started by Hardt and Talwar [STOC 2010], we analyze sample complexity using the tools of asymptotic convex geometry. We study the sensitivity polytope, a natural convex body associated with a query workload that quantifies how query answers can change between neighboring databases. This is the information that, roughly speaking, is protected by a differentially private algorithm, and, for this reason, we expect that a "bigger" sensitivity polytope implies larger sample complexity. Our results identify the Gaussian mean width as an appropriate measure of the size of the polytope, and show sample complexity lower bounds in terms of this quantity. Our lower bounds completely characterize the workloads for which the Gaussian noise mechanism is optimal up to constants: they are exactly those whose sensitivity polytope has asymptotically maximal Gaussian mean width.
Our techniques also yield an alternative proof of Pisier's Volume Number Theorem, one which moreover suggests an approach to improving the parameters of the theorem.
The main goal of private data analysis is to estimate aggregate statistics while preserving individual privacy guarantees. Intuitively, we expect that, for statistics that do not depend too strongly on any particular individual, a sufficiently large database allows computing an estimate that is both accurate and private. A natural question then is to characterize the sample complexity under privacy constraints: the smallest database size for which we can privately estimate the answers to a given collection of queries within some allowable error tolerance. Moreover, it is desirable to identify algorithms that are simple, efficient, and have close to the best possible sample complexity. In this work, we study these questions for collections of linear queries under the constraints of approximate differential privacy.
We model a database $D$ of size $n$ as a multiset of $n$ elements (counted with repetition) from an arbitrary finite universe $\mathcal{X}$. Each element of the database corresponds to the data of a single individual. To define a privacy-preserving computation on $D$, we use the strong notion of differential privacy. Informally, an algorithm is differentially private if it has almost identical behavior on any two databases $D$ and $D'$ that differ in the data of a single individual. To capture this concept formally, let us define two databases to be neighboring if their symmetric difference has size at most $1$ (counted with multiplicity). Then differential privacy is defined as follows:
Definition 1 ([DMNS06]).
A randomized algorithm $\mathcal{A}$ that takes as input a database and outputs a random element from a set $\mathcal{Y}$ satisfies $(\varepsilon, \delta)$-differential privacy if for all neighboring databases $D, D'$ and all measurable $S \subseteq \mathcal{Y}$ we have that:
$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D') \in S] + \delta,$$
where probabilities are taken with respect to the randomness of $\mathcal{A}$.
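As a concrete illustration of the neighboring relation (taking it to mean multiset symmetric difference of size at most $1$, counted with multiplicity), it can be checked in a few lines of Python; the helper below is our own sketch, not part of the paper:

```python
from collections import Counter

def are_neighboring(d1, d2):
    """True if the multisets d1, d2 have symmetric difference of size
    at most 1, counted with multiplicity (i.e. they differ in the data
    of at most a single individual)."""
    c1, c2 = Counter(d1), Counter(d2)
    # Counter subtraction keeps only positive counts, so summing both
    # directions gives the size of the multiset symmetric difference.
    diff = sum((c1 - c2).values()) + sum((c2 - c1).values())
    return diff <= 1
```

Under this convention, adding or removing one individual's record yields a neighboring database, while replacing a record in place (one removal plus one addition) changes the symmetric difference by two.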
One of the most basic primitives in private data analysis, and data analysis in general, are counting queries and, slightly more generally, linear queries. While interesting and natural in themselves, they are also quite powerful: any statistical query (SQ) learning algorithm can be implemented using noisy counting queries as a black box [Kea98]. In our setting, we specify a linear query by a function $q : \mathcal{X} \to [-1, 1]$ (given by its truth table). Slightly abusing notation, we define the value of the query on a database $D$ of size $n$ as $q(D) = \frac{1}{n} \sum_{e \in D} q(e)$, where the elements of $D$ are counted with multiplicity. For example, when $q : \mathcal{X} \to \{0, 1\}$, we can think of $q$ as a property defined on $\mathcal{X}$ and of $q(D)$ as the fraction of elements of $D$ that satisfy the property: this is a counting query. We call a set $Q$ of linear queries a workload and an algorithm that answers a query workload a mechanism. We denote by $Q(D) = (q(D))_{q \in Q}$ the vector of answers to the queries in $Q$. Throughout the paper, we will use the letter $k$ for the size of a workload $Q$.
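To make the definitions concrete, here is a small Python sketch (our own, with hypothetical helper names) that evaluates a workload of linear queries on a multiset database, normalizing each answer by the database size:

```python
def query_value(q, database):
    """Value of a linear query q on a multiset database: the average of
    q over the elements, counted with multiplicity."""
    n = len(database)
    return sum(q(e) for e in database) / n

def workload_answers(queries, database):
    """The vector of answers to all queries in the workload."""
    return [query_value(q, database) for q in queries]

# A counting query: the fraction of database elements that are even.
def is_even(e):
    return 1 if e % 2 == 0 else 0
```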
Starting from the work of Dinur and Nissim [DN03], it is known that we cannot hope to answer too many linear queries too accurately while preserving even a very weak notion of privacy. For this reason, we must allow our private mechanisms to make some error. We focus on average error (in an $\ell_2$ sense). We define the average error of an algorithm $\mathcal{A}$ on a query workload $Q$ and databases of size at most $n$ as:
$$\mathrm{err}(\mathcal{A}, Q, n) = \max_{D} \left( \mathbb{E}\, \frac{1}{k} \sum_{q \in Q} \left( q(D) - \mathcal{A}(D)_q \right)^2 \right)^{1/2},$$
where the maximum is over all databases $D$ of size at most $n$, $\mathcal{A}(D)_q$ is the answer to query $q$ given by the algorithm $\mathcal{A}$ on input $D$, and expectations are taken with respect to the random choices of $\mathcal{A}$. This is a natural notion of error that also works particularly well with the geometric tools that we use.
In this work we study sample complexity: the smallest database size which allows us to answer a given query workload $Q$ with error at most $\alpha$. The sample complexity of an algorithm $\mathcal{A}$ with error $\alpha$ is defined as:
$$\mathrm{sc}(\mathcal{A}, \alpha) = \min\{ n : \mathrm{err}(\mathcal{A}, Q, n) \le \alpha \}.$$
The sample complexity of answering the linear queries $Q$ with error $\alpha$ under $(\varepsilon, \delta)$-differential privacy is defined by:
$$\mathrm{sc}_{\varepsilon,\delta}(Q, \alpha) = \inf\{ \mathrm{sc}(\mathcal{A}, \alpha) : \mathcal{A} \text{ is } (\varepsilon, \delta)\text{-differentially private} \}.$$
The two main questions we are interested in are:
Can we characterize $\mathrm{sc}_{\varepsilon,\delta}(Q, \alpha)$ in terms of a natural property of the workload $Q$?
Can we identify conditions under which simple and efficient $(\varepsilon, \delta)$-differentially private mechanisms have nearly optimal sample complexity?
We make progress on both questions. We identify a geometrically defined property of the workload $Q$ that gives lower bounds on the sample complexity. The lower bounds also characterize when one of the simplest differentially private mechanisms, the Gaussian noise mechanism, has nearly optimal sample complexity in the regime of constant $\alpha$.
Before we can state our results, we need to define a natural geometric object associated with a workload of linear queries. This object has been important in applying geometric techniques to differential privacy [HT10, BDKT12, NTZ13, Nik15].
Definition 2 (Sensitivity polytope).
The sensitivity polytope of a workload $Q$ of linear queries is equal to $K_Q = \mathrm{conv}\{ \pm (q(e))_{q \in Q} : e \in \mathcal{X} \}$.
From the above definition, we see that $K_Q$ is a symmetric (i.e. $K_Q = -K_Q$) convex polytope in $\mathbb{R}^k$. The importance of $K_Q$ lies in the fact that it captures how query answers can change between neighboring databases: for any two neighboring databases $D$ and $D'$ of size $n$ and $n'$ respectively, $n\, Q(D) - n'\, Q(D') \in K_Q$. This is exactly the information that a differentially private algorithm is supposed to hide. Intuitively, we expect that the larger $K_Q$ is, the larger $\mathrm{sc}_{\varepsilon,\delta}(Q, \alpha)$ should be.
We give evidence for the above intuition, and propose the width of $K_Q$ in a random direction as a measure of its "size". Let $h_{K_Q}$ be the support function of $K_Q$: $h_{K_Q}(y) = \max\{ \langle x, y \rangle : x \in K_Q \}$. For a unit vector $y$, $h_{K_Q}(y)$ is the width of $K_Q$ in the direction of $y$; for arbitrary $y$, $h_{K_Q}(y)$ scales linearly with $\|y\|_2$ (and is, in fact, a norm). We define the $\ell^*$-norm of $K_Q$, also known as its Gaussian mean width, as $\ell^*(K_Q) = \mathbb{E}\, h_{K_Q}(g)$, where $g \sim N(0, I)$ is a standard Gaussian random vector in $\mathbb{R}^k$. The following theorem captures our main result.
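When the body is a polytope given by its vertices, the support function is a maximum of finitely many inner products, so the Gaussian mean width can be estimated by straightforward Monte Carlo sampling. The sketch below is our own illustration, not part of the paper:

```python
import random

def support(vertices, y):
    """Support function h_K(y) = max over vertices x of <x, y>."""
    return max(sum(xi * yi for xi, yi in zip(x, y)) for x in vertices)

def gaussian_mean_width(vertices, dim, trials=2000, seed=0):
    """Monte Carlo estimate of l*(K) = E h_K(g), g a standard Gaussian."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += support(vertices, g)
    return total / trials

# For the square [-1,1]^2 we have h_K(g) = |g_1| + |g_2|, so the exact
# Gaussian mean width is 2*sqrt(2/pi), roughly 1.596.
square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
```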
Let $Q$ be a workload of $k$ linear queries, and let $K_Q$ be its sensitivity polytope. The following holds for all $\alpha$, $\varepsilon$, and any $\delta$, where $C$ is an absolute constant:
The upper bound on sample complexity is achieved by a mechanism running in time polynomial in $n$, $k$, and $|\mathcal{X}|$. Moreover, if $\ell^*(K_Q) \ge c k$ for an absolute constant $c > 0$, then the Gaussian noise mechanism is optimal up to constants for any constant $\alpha$.
The sample complexity upper bounds in the theorem above are known from prior work: one is given by the projection mechanism from [NTZ13], with the sample complexity upper bound in terms of $\ell^*(K_Q)$ shown in [DNT14]; the other upper bound is given by the Gaussian noise mechanism [DN03, DN04, DMNS06]. The main new contribution of this work is the lower bounds on sample complexity. The gap between upper and lower bounds is small when $\ell^*(K_Q)$ is close to its maximal value of $\Theta(k)$. Indeed, when $\ell^*(K_Q) = \Omega(k)$, our results imply that the Gaussian noise mechanism has optimal sample complexity up to constants. This is, to the best of our knowledge, the first example of a general geometric condition under which a simple and efficient mechanism has optimal sample complexity up to constant factors. Moreover, in the constant error regime this condition is also necessary for the Gaussian mechanism to be optimal up to constants: when $\ell^*(K_Q) = o(k)$ and $\alpha$ is a constant, the projection mechanism has asymptotically smaller sample complexity than the Gaussian mechanism.
We can prove somewhat stronger results for another natural problem in private data analysis, which we call the mean point problem. In this problem, we are given a closed convex set $K \subseteq \mathbb{R}^N$, and we are asked to approximate the mean $\bar{D} = \frac{1}{n} \sum_{x \in D} x$ of the database $D$, where $D$ is a multiset of points in $K$ and $n = |D|$. This problem, which will be the focus for most of this paper, has a more geometric flavor, and is closely related to the query release problem for linear queries. In fact, Theorem 1 will essentially follow from a reduction from the results below for the mean point problem.
With respect to the mean point problem, we define the error of an algorithm $\mathcal{A}$ as:
$$\mathrm{err}(\mathcal{A}, K, n) = \sup_{D} \left( \mathbb{E}\, \| \mathcal{A}(D) - \bar{D} \|_2^2 \right)^{1/2},$$
where the supremum is over databases $D$ consisting of at most $n$ points from $K$, and the expectation is over the randomness of the algorithm. The sample complexity of an algorithm $\mathcal{A}$ with error $\alpha$ is defined as:
$$\mathrm{sc}(\mathcal{A}, \alpha) = \min\{ n : \mathrm{err}(\mathcal{A}, K, n) \le \alpha \}.$$
The sample complexity of solving the mean point problem with error $\alpha$ over $K$ is defined by:
$$\mathrm{sc}_{\varepsilon,\delta}(K, \alpha) = \inf\{ \mathrm{sc}(\mathcal{A}, \alpha) : \mathcal{A} \text{ is } (\varepsilon, \delta)\text{-differentially private} \}.$$
Our main result for the mean point problem is given in the following theorem:
Let $K$ be a symmetric convex body contained in the unit Euclidean ball $B_2^N$ in $\mathbb{R}^N$. The following holds for all $\alpha$, $\varepsilon$, and any $\delta$, where $C$ is an absolute constant:
The upper bound on sample complexity is achieved by a mechanism running in time polynomial in $n$, $N$, and $1/\alpha$. Moreover, when $\ell^*(K) \ge c \sqrt{N}$ for an absolute constant $c > 0$, the Gaussian mechanism is optimal up to constants for any constant $\alpha$.
The upper bounds again follow from prior work, and in fact are also given by the projection mechanism and the Gaussian noise mechanism, which can be defined for the mean point problem as well. Notice that the upper and lower bounds match only up to logarithmic factors and the restriction on the error parameter $\alpha$. If the lower bound were valid for all values of the error parameter less than a fixed constant, rather than only for small enough $\alpha$, Theorem 2 would nearly characterize the optimal sample complexity for the mean point problem for all constant $\alpha$. Unfortunately, the restriction on $\alpha$ is, in general, necessary (up to the logarithmic terms) for lower bounds on sample complexity in terms of $\ell^*(K)$. For example, we can take $K = r B_2^N$, i.e. a Euclidean ball in $\mathbb{R}^N$ of radius $r < 1$. Then $\ell^*(K)$ is on the order of $r \sqrt{N}$, but the sample complexity is $0$ when $\alpha \ge r$, since the trivial algorithm which ignores the database and outputs the center $0$ achieves error at most $r$. Thus, a more sensitive measure of the size of $K$ is necessary to prove optimal lower bounds. We do, nevertheless, trust that the techniques introduced in this paper bring us closer to this goal.
We conclude this section with a high-level overview of our techniques. Our starting point is a recent tight lower bound on the sample complexity of a special class of linear queries: the $1$-way marginal queries. These queries achieve the worst-case sample complexity among linear queries [BUV14, SU15]. The sensitivity polytope of the $1$-way marginals is the cube $[-1, 1]^N$, and it can be shown that the lower bound on the sample complexity of $1$-way marginals implies an analogous lower bound on the sample complexity of the mean point problem with $K$ equal to the cube, rescaled to lie in the unit ball. For the mean point problem, it is easy to see that when $K \subseteq L$, the sample complexity for $K$ is no larger than the sample complexity for $L$. Moreover, we can show that the sample complexity of any projection of $K$ is no bigger than the sample complexity of $K$ itself. So, our strategy then is to find a large scaled copy of a cube inside a projection of $K$ onto a large-dimensional subspace whenever $\ell^*(K)$ is large. We solve this geometric problem using deep results from asymptotic convex geometry, namely the Dvoretzky criterion, the low $M^*$ estimate, and the $MM^*$ estimate.
Our techniques also yield an alternative proof of the volume number theorem of Milman and Pisier [MP87]. Besides avoiding the quotient of subspace theorem, our proof yields an improvement in the volume number theorem, conditional on the well-known conjecture that any symmetric convex body $K$ has a position (affine image) for which $\ell(K)\, \ell^*(K) \le C N \sqrt{\log N}$, where $\ell(K) = \mathbb{E}\, \|g\|_K$ is the expected $K$-norm of a standard Gaussian $g$. More details about this connection are given in Section 7.
1.1 Prior Work
Most closely related to our work are the results of Nikolov, Talwar, and Zhang [NTZ13], who gave a private mechanism (also based on the projection mechanism, but more involved) which has nearly optimal sample complexity (with respect to average error), up to factors polynomial in $\log |\mathcal{X}|$ and $\log(1/\delta)$. This result was subsequently improved by Nikolov [Nik15], who showed that the factors polynomial in $\log |\mathcal{X}|$ can be replaced by factors polynomial in $\log(1/\delta)$. While these results are nearly optimal for subconstant values of the error parameter $\alpha$, i.e. the optimality guarantees do not depend on $\alpha$, factors polynomial in $\log |\mathcal{X}|$ can be prohibitively large. Indeed, in many natural settings, such as that of marginal queries, $|\mathcal{X}|$ is exponential in the number of queries $k$, so the competitiveness ratio can be polynomial in $k$.
The line of work that applies techniques from convex geometry to differential privacy started with the beautiful paper of Hardt and Talwar [HT10], whose results were subsequently strengthened in [BDKT12]. These papers focused on the "large database" regime (or, in our language, the setting of subconstant error), and pure differential privacy ($\delta = 0$).
We begin with the introduction of some notation. Throughout the paper we use $c$, $C$, etc., for absolute constants, whose value may change from line to line. We use $\|\cdot\|_2$ for the Euclidean norm, and $\|\cdot\|_1$ for the $\ell_1$ norm. We define $B_2^N$ and $B_1^N$ to be the $\ell_2$ and $\ell_1$ unit balls in $\mathbb{R}^N$ respectively, while $\widehat{B}_\infty^N = N^{-1/2} [-1, 1]^N$ will refer to the $N$-dimensional hypercube, normalized to be contained in the unit Euclidean ball. We use $I$ for the identity operator on $\mathbb{R}^N$, as well as for the identity matrix. For a given subspace $E \subseteq \mathbb{R}^N$, we define $P_E$ as the orthogonal projection operator onto $E$. Moreover, when $T$ is a linear operator between the subspaces $E$ and $F$, we define $\|T\|$ as its operator norm, which is also equal to its largest singular value $\sigma_{\max}(T)$. For the diameter of a set $S$ we use the nonstandard, but convenient, definition $\mathrm{diam}(S) = \sup_{x \in S} \|x\|_2$. For sets symmetric around $0$, this is equivalent to the standard definition, scaled down by a factor of $2$. We use $N(\mu, \Sigma)$ to refer to the Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$, and we use the notation $X \sim N(\mu, \Sigma)$ to denote that $X$ is distributed as a Gaussian random variable with mean $\mu$ and covariance $\Sigma$. For a matrix $M$ (or equivalently an operator from $\mathbb{R}^N$ to $\mathbb{R}^N$) we use $M \succeq 0$ to denote that $M$ is positive semidefinite. For positive semidefinite matrices/operators $M$, $M'$, we use the notation $M \preceq M'$ to denote $M' - M \succeq 0$.
3 Probability Theory
We make use of some basic comparison theorems from the theory of stochastic processes. First we state the well-known symmetrization lemma. We also give the short proof for completeness. Recall that $\varepsilon_1, \ldots, \varepsilon_n$ is a sequence of Rademacher random variables if each $\varepsilon_i$ is uniformly and independently distributed in $\{-1, +1\}$.
Lemma 1 (Symmetrization).
Let $n \ge 1$, and let $\|\cdot\|$ be a norm on $\mathbb{R}^N$. Then, for any sequence $X_1, \ldots, X_n$ of independent random variables in $\mathbb{R}^N$ such that $\mathbb{E} \|X_i\|$ is finite for every $i$, we have
$$\mathbb{E} \left\| \sum_{i=1}^n (X_i - \mathbb{E} X_i) \right\| \le 2\, \mathbb{E} \left\| \sum_{i=1}^n \varepsilon_i X_i \right\|,$$
where $\varepsilon_1, \ldots, \varepsilon_n$ are Rademacher random variables, independent of $X_1, \ldots, X_n$. Each expectation above is with respect to all random variables involved.
Let $Y_1, \ldots, Y_n$ be independent copies of $X_1, \ldots, X_n$. Then $\mathbb{E} X_i = \mathbb{E} Y_i$ for every $i$, and, by convexity of the norm and Jensen's inequality,
$$\mathbb{E} \left\| \sum_{i=1}^n (X_i - \mathbb{E} X_i) \right\| = \mathbb{E} \left\| \sum_{i=1}^n (X_i - \mathbb{E} Y_i) \right\| \le \mathbb{E} \left\| \sum_{i=1}^n (X_i - Y_i) \right\|.$$
Because $X_i$ and $Y_i$ are independent and identically distributed, the random variables $X_i - Y_i$ and $\varepsilon_i (X_i - Y_i)$ are also identically distributed. Therefore, $\mathbb{E} \| \sum_{i=1}^n (X_i - Y_i) \| = \mathbb{E} \| \sum_{i=1}^n \varepsilon_i (X_i - Y_i) \|$, where $\varepsilon_1, \ldots, \varepsilon_n$ are independent Rademacher random variables, as in the statement of the lemma. Finally, by Minkowski's inequality (i.e. the triangle inequality for the norm $\|\cdot\|$), we have
$$\mathbb{E} \left\| \sum_{i=1}^n \varepsilon_i (X_i - Y_i) \right\| \le \mathbb{E} \left\| \sum_{i=1}^n \varepsilon_i X_i \right\| + \mathbb{E} \left\| \sum_{i=1}^n \varepsilon_i Y_i \right\| = 2\, \mathbb{E} \left\| \sum_{i=1}^n \varepsilon_i X_i \right\|,$$
as desired. ∎
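The inequality is easy to check numerically. Below is a small Monte Carlo sanity check of ours, using the sup norm and i.i.d. Bernoulli coordinates (so that the $X_i$ are not centered):

```python
import random

def mc_symmetrization(n=5, dim=3, trials=4000, seed=1):
    """Estimate E||sum_i (X_i - E X_i)|| and 2 E||sum_i eps_i X_i|| in the
    sup norm, for X_i uniform on {0,1}^dim (so E X_i has all entries 1/2)."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(trials):
        xs = [[1 if rng.random() < 0.5 else 0 for _ in range(dim)]
              for _ in range(n)]
        eps = [rng.choice((-1, 1)) for _ in range(n)]
        centered = [sum(x[j] - 0.5 for x in xs) for j in range(dim)]
        signed = [sum(e * x[j] for e, x in zip(eps, xs)) for j in range(dim)]
        lhs += max(abs(v) for v in centered)
        rhs += max(abs(v) for v in signed)
    return lhs / trials, 2 * rhs / trials
```

On these inputs the left side sits comfortably below twice the symmetrized expectation, as the lemma guarantees.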
Next we state a simple comparison theorem for Gaussian random variables.
Lemma 2 (Gaussian comparison).
Let $X \sim N(0, \Sigma)$ and $Y \sim N(0, \Sigma')$ be Gaussian random variables in $\mathbb{R}^N$, and assume $\Sigma \preceq \Sigma'$. Then, for any norm $\|\cdot\|$ on $\mathbb{R}^N$, we have $\mathbb{E} \|X\| \le \mathbb{E} \|Y\|$.
Couple $X$ and $Y$ so that they are independent, and define a new random variable $Z \sim N(0, \Sigma' - \Sigma)$, independent of $X$ and $Y$. Then the random variables $X + Z$ and $Y$ are identically distributed, and, by convexity of the norm and Jensen's inequality applied conditionally on $X$, we have
$$\mathbb{E} \|X\| = \mathbb{E}\, \big\| \mathbb{E}[X + Z \mid X] \big\| \le \mathbb{E} \|X + Z\| = \mathbb{E} \|Y\|.$$
This completes the proof. ∎
Note that the same conclusion follows under weaker assumptions from Slepian’s lemma.
3.1 Convex Geometry
In this section, we outline the main geometric tools we use in later sections. For a more detailed treatment, we refer to the lecture notes by Vershynin [Ver09] and the books by Pisier [Pis89] and Artstein-Avidan, Giannopoulos, and Milman [AAGM15].
Throughout, we define a convex body as a compact convex subset of $\mathbb{R}^N$ with non-empty interior. A convex body $K$ is (centrally) symmetric if and only if $K = -K$. We define the polar body of $K$ as $K^\circ = \{ y \in \mathbb{R}^N : \langle x, y \rangle \le 1 \text{ for all } x \in K \}$. The following basic facts are easy to verify and very useful.
Fact 1.
For convex bodies $K \subseteq L$, we have $L^\circ \subseteq K^\circ$.
Fact 2 (Section/Projection Duality).
For a convex body $K \subseteq \mathbb{R}^N$ and a subspace $E \subseteq \mathbb{R}^N$:
$$(K \cap E)^\circ = P_E(K^\circ), \qquad (P_E K)^\circ = K^\circ \cap E.$$
In both cases, the polar is taken in the subspace $E$.
Fact 3.
For any invertible linear map $T$ and any convex body $K$, $(TK)^\circ = (T^*)^{-1}(K^\circ)$, where $(T^*)^{-1}$ is the inverse of the adjoint operator $T^*$.
Fact 4.
For a convex body $K$, a subspace $E$, and any $r > 0$, the following two statements are equivalent:
$r (B_2^N \cap E) \subseteq P_E K$;
$K^\circ \cap E \subseteq r^{-1} (B_2^N \cap E)$;
where, as before, taking the polar set is considered in the subspace $E$. Notice that the second statement is also equivalent to $\mathrm{diam}(K^\circ \cap E) \le r^{-1}$.
Our work relies on appropriately quantifying the "size" of (projections and sections of) a convex body. It turns out that, for our purposes, the right measure of size is related to the notion of width, captured by the support function. Recall from the introduction that the support function of a convex body $K$ is given by $h_K(y) = \max\{ \langle x, y \rangle : x \in K \}$ for every $y \in \mathbb{R}^N$. The support function is intimately related to the Minkowski norm $\|\cdot\|_K$, defined for a symmetric convex body $K$ by $\|x\|_K = \min\{ r \ge 0 : x \in rK \}$ for every $x \in \mathbb{R}^N$. It is easy to verify that $\|\cdot\|_K$ is indeed a norm. The support function is identical to the Minkowski norm of the polar body (which is also the dual norm to $\|\cdot\|_K$): $h_K(y) = \|y\|_{K^\circ}$ for every $y \in \mathbb{R}^N$.
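As a concrete example of these definitions (our own illustration), for the box $K = [-b, b]^N$ the Minkowski norm is the scaled $\ell_\infty$ norm, while the support function is the scaled $\ell_1$ norm, which is exactly the Minkowski norm of the polar body:

```python
def minkowski_norm_box(x, b=1.0):
    """||x||_K for K = [-b, b]^N: the smallest r >= 0 with x in rK,
    which is max_i |x_i| / b."""
    return max(abs(v) for v in x) / b

def support_box(y, b=1.0):
    """h_K(y) for K = [-b, b]^N: the maximum of <x, y> is attained at the
    vertex whose signs match those of y, giving b * sum_i |y_i|."""
    return b * sum(abs(v) for v in y)
```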
Now we come to the measure of the “size” of a convex body which will be central to our results: the Gaussian mean width of the body, defined next.
Definition 3 (Gaussian mean width).
The Gaussian mean width and Gaussian mean norm of a symmetric convex body $K \subseteq \mathbb{R}^N$ are defined respectively as:
$$\ell^*(K) = \mathbb{E}\, h_K(g) = \mathbb{E}\, \|g\|_{K^\circ}, \qquad \ell(K) = \mathbb{E}\, \|g\|_K,$$
where $g \sim N(0, I)$ is a standard Gaussian random variable in $\mathbb{R}^N$.
The next lemma gives an estimate of how the mean width changes when applying a linear transformation to the body.
Lemma 3.
For any symmetric convex body $K \subseteq \mathbb{R}^N$ and any linear operator $T : \mathbb{R}^N \to \mathbb{R}^N$:
$$\ell^*(TK) \le \|T\|\, \ell^*(K).$$
Notice that, for a standard Gaussian $g$,
$$\ell^*(TK) = \mathbb{E}\, h_{TK}(g) = \mathbb{E}\, h_K(T^* g).$$
Treating $T$ as an $N \times N$ matrix in the natural way, we see that $T^* g \sim N(0, T^* T)$. Since $T^* T \preceq \|T\|^2 I$, by applying Lemma 2 to $T^* g$ and $\|T\| g$, we have that
$$\mathbb{E}\, h_K(T^* g) \le \mathbb{E}\, h_K(\|T\| g) = \|T\|\, \ell^*(K).$$
This finishes the proof of the lemma. ∎
Definition 4 ($\ell$-position).
A convex body $K$ is in $\ell$-position if for all linear operators $T : \mathbb{R}^N \to \mathbb{R}^N$:
$$\ell(K)\, \ell^*(K) \le \ell(TK)\, \ell^*(TK).$$
Clearly, $K$ is in $\ell$-position if and only if $K^\circ$ is in $\ell$-position, since $\ell(K) = \ell^*(K^\circ)$ and $\ell^*(K) = \ell(K^\circ)$ for any convex body $K$. Note further that the product $\ell(K)\, \ell^*(K)$ is scale-invariant, in the sense that $\ell(tK)\, \ell^*(tK) = \ell(K)\, \ell^*(K)$ for any real $t > 0$. This is because, for any $x$ and $y$, $\|x\|_{tK} = t^{-1} \|x\|_K$ and $h_{tK}(y) = t\, h_K(y)$, so $\ell(tK) = t^{-1} \ell(K)$ and $\ell^*(tK) = t\, \ell^*(K)$.
We will relate the Gaussian mean width of a body to another measure of its size, and the size of its projections and sections, known as the Gelfand width. A definition follows.
Definition 5 (Gelfand width).
For two symmetric convex bodies $K, L \subseteq \mathbb{R}^N$, the Gelfand width of order $m$ of $K$ with respect to $L$ is defined as:
$$c_m(K, L) = \inf_{E}\, \inf\{ r > 0 : K \cap E \subseteq r (L \cap E) \},$$
where the first infimum is over subspaces $E$ of co-dimension at most $m$ (i.e. of dimension at least $N - m$). When $m \ge N$, we define $c_m(K, L) = 0$. We denote $c_m(K) = c_m(K, B_2^N)$, and we call $c_m(K)$ simply the Gelfand width of $K$ of order $m$.
Note that $c_m(K) = \inf_E \mathrm{diam}(K \cap E)$, where the infimum is over subspaces $E$ of codimension at most $m$. Observe also that for any $K$ and $L$, $c_m(K, L)$ is non-increasing in $m$. It is well-known that the infimum in the definition is actually achieved [Pin85].
3.2 Composition of Differential Privacy
One of the most important properties of differential privacy is that it behaves nicely under (adaptively) composing mechanisms.
Lemma 4 (Composition).
For randomized algorithms $\mathcal{A}_1$ and $\mathcal{A}_2$ satisfying $(\varepsilon_1, \delta_1)$- and $(\varepsilon_2, \delta_2)$-differential privacy respectively, the algorithm that on input $D$ outputs $(\mathcal{A}_1(D), \mathcal{A}_2(D))$ satisfies $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-differential privacy.
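Basic composition can be tracked mechanically. Below is a tiny privacy-accountant sketch of ours (the class name is hypothetical) that adds up budgets as in Lemma 4:

```python
class BasicCompositionAccountant:
    """Track the cumulative (epsilon, delta) guarantee under basic
    composition: the budgets of composed mechanisms simply add up."""

    def __init__(self):
        self.eps = 0.0
        self.delta = 0.0

    def compose(self, eps, delta):
        """Record one more (eps, delta)-DP mechanism; return the totals."""
        self.eps += eps
        self.delta += delta
        return self.eps, self.delta
```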
3.3 Known Bounds
In this section, we recall some known differentially private mechanisms, with bounds on their sample complexity, as well as a lower bound on the optimal sample complexity. We start with the lower bound:
Next we recall one of the most basic mechanisms in differential privacy, the Gaussian mechanism. A proof of the privacy guarantee, with the constants given below, can be found in [NTZ13].
Let $F$ be a function mapping databases to $\mathbb{R}^N$ such that $\|F(D) - F(D')\|_2 \le \Delta$ for any two neighboring databases $D$ and $D'$. If $\sigma = c(\varepsilon, \delta)\, \Delta$, where $c(\varepsilon, \delta) = O\big( \sqrt{\log(1/\delta)} / \varepsilon \big)$, and $I$ is the identity matrix, then the algorithm defined by $\mathcal{A}(D) = F(D) + N(0, \sigma^2 I)$ is $(\varepsilon, \delta)$-differentially private.
For any symmetric convex $K \subseteq B_2^N$:
In the rest of the paper we will use the notation $c(\varepsilon, \delta)$ from the theorem statement above.
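A minimal sketch of the Gaussian mechanism in Python (our own; for concreteness the noise is calibrated with the standard $\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \varepsilon$ rule from the differential privacy literature, valid for $\varepsilon < 1$, rather than the exact constants of [NTZ13]):

```python
import math
import random

def gaussian_mechanism(answer, l2_sensitivity, eps, delta, seed=None):
    """Release the vector `answer` with i.i.d. Gaussian noise per coordinate,
    calibrated to the L2 sensitivity of the answer vector.
    Standard calibration: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / eps,
    which gives (eps, delta)-differential privacy for eps < 1."""
    sigma = l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, sigma) for a in answer]
```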
Finally, we also present the projection mechanism from [NTZ13], which post-processes the output of the Gaussian mechanism by projecting it onto $K$.
Let $K \subseteq \mathbb{R}^N$ be a symmetric convex body, and define $\mathcal{A}_{\mathrm{proj}}$ to be the algorithm that, on input a database $D$ of $n$ points from $K$, outputs:
$$\tilde{y} = \arg\min_{y \in K} \| y - (\bar{D} + Z) \|_2,$$
where $Z \sim N(0, \sigma^2 I)$, $\sigma = c(\varepsilon, \delta)/n$. Then $\mathcal{A}_{\mathrm{proj}}$ satisfies $(\varepsilon, \delta)$-differential privacy and has sample complexity:
$$\mathrm{sc}(\mathcal{A}_{\mathrm{proj}}, \alpha) \le \frac{C\, \ell^*(K) \sqrt{\log(1/\delta)}}{\alpha^2\, \varepsilon}.$$
For any symmetric convex $K \subseteq B_2^N$:
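For intuition, the projection step is particularly simple when $K$ is a box, where Euclidean projection is coordinate-wise clipping; general bodies require a convex-projection oracle. The sketch below is our own simplified illustration of the projection mechanism for this special case:

```python
import random

def project_onto_box(y, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^N is coordinate-wise
    clipping, since the coordinates decouple in the squared distance."""
    return [min(max(v, lo), hi) for v in y]

def projection_mechanism_box(mean_vector, sigma, seed=None):
    """Add Gaussian noise to the (privately computed) mean, then project
    the noisy vector back onto the box to reduce error."""
    rng = random.Random(seed)
    noisy = [v + rng.gauss(0.0, sigma) for v in mean_vector]
    return project_onto_box(noisy)
```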
4 Basic Properties of Sample Complexity
In this section, we prove some fundamental properties of sample complexity that will be extensively used in later sections.
Observe first that for any algorithm $\mathcal{A}$ and any $K \subseteq L$, $\mathrm{err}(\mathcal{A}, K, n) \le \mathrm{err}(\mathcal{A}, L, n)$, because $\mathrm{err}(\mathcal{A}, L, n)$ is a supremum over a larger set of databases than $\mathrm{err}(\mathcal{A}, K, n)$. This implies that the sample complexity over $K$ is at most the sample complexity over $L$ for any algorithm $\mathcal{A}$, and, in particular, for the $(\varepsilon, \delta)$-differentially private algorithm that achieves $\mathrm{sc}_{\varepsilon,\delta}(L, \alpha)$. Then, we have:
$$\mathrm{sc}_{\varepsilon,\delta}(K, \alpha) \le \mathrm{sc}_{\varepsilon,\delta}(L, \alpha),$$
as desired. ∎
For all , and :
For any $\alpha > 0$, any linear operator $T : \mathbb{R}^N \to \mathbb{R}^N$, and any symmetric convex body $K \subseteq \mathbb{R}^N$:
$$\mathrm{sc}_{\varepsilon,\delta}(TK, \|T\|\, \alpha) \le \mathrm{sc}_{\varepsilon,\delta}(K, \alpha).$$
Let $\mathcal{A}$ be an $(\varepsilon, \delta)$-differentially private algorithm that achieves $\mathrm{sc}_{\varepsilon,\delta}(K, \alpha)$. Fix a function $f : TK \to K$ so that for every $y \in TK$, $T f(y) = y$. We define a new algorithm $\mathcal{B}$ that takes as input a database $D' = (y_1, \ldots, y_n)$ of points in $TK$ and outputs $T \mathcal{A}(D)$, where $D = (f(y_1), \ldots, f(y_n))$. We claim that $\mathcal{B}$ is $(\varepsilon, \delta)$-differentially private and that $\mathrm{err}(\mathcal{B}, TK, n) \le \|T\|\, \mathrm{err}(\mathcal{A}, K, n)$ holds for every $n$. This claim is sufficient to prove the lemma, because it implies:
$$\mathrm{sc}(\mathcal{B}, \|T\|\, \alpha) \le \mathrm{sc}(\mathcal{A}, \alpha) = \mathrm{sc}_{\varepsilon,\delta}(K, \alpha).$$
To show the claim, first observe that, by linearity, $\bar{D'} = T \bar{D}$. We get:
$$\left( \mathbb{E}\, \| \mathcal{B}(D') - \bar{D'} \|_2^2 \right)^{1/2} = \left( \mathbb{E}\, \| T(\mathcal{A}(D) - \bar{D}) \|_2^2 \right)^{1/2} \le \|T\| \left( \mathbb{E}\, \| \mathcal{A}(D) - \bar{D} \|_2^2 \right)^{1/2},$$
where the first inequality follows by the definition of the operator norm. Since this holds for an arbitrary database $D'$ of at most $n$ points of $TK$, it implies the claim on the error bound of $\mathcal{B}$. It remains to show that $\mathcal{B}$ is $(\varepsilon, \delta)$-differentially private. Note that for every two neighboring databases $D'_1$ and $D'_2$ of points in $TK$, the corresponding databases $D_1$ and $D_2$ of points in $K$ are also neighboring. Then $\mathcal{A}(D)$ is $(\varepsilon, \delta)$-differentially private as a function of $D'$, and the privacy of $\mathcal{B}$ follows from Lemma 4. ∎
For any $t > 0$:
$$\mathrm{sc}_{\varepsilon,\delta}(tK, t\alpha) = \mathrm{sc}_{\varepsilon,\delta}(K, \alpha).$$
Taking $T = tI$ in Lemma 6, where $I$ is the identity on $\mathbb{R}^N$, the lemma implies $\mathrm{sc}_{\varepsilon,\delta}(tK, t\alpha) \le \mathrm{sc}_{\varepsilon,\delta}(K, \alpha)$. Since this inequality holds for any $K$, $\alpha$, and $t$, we may apply it to $tK$, $t\alpha$, and $1/t$, and we get $\mathrm{sc}_{\varepsilon,\delta}(K, \alpha) \le \mathrm{sc}_{\varepsilon,\delta}(tK, t\alpha)$. ∎
Since for any subspace $E$ of $\mathbb{R}^N$, the corresponding orthogonal projection $P_E$ has operator norm $\|P_E\| = 1$, we also immediately get the following corollary of Lemma 6:
For any subspace $E \subseteq \mathbb{R}^N$:
$$\mathrm{sc}_{\varepsilon,\delta}(P_E K, \alpha) \le \mathrm{sc}_{\varepsilon,\delta}(K, \alpha).$$
In the next theorem, we combine the lower bound in Corollary 4 and the properties we proved above in order to give a lower bound on the sample complexity of an arbitrary symmetric convex body in terms of its geometric properties. In the following sections we will relate this geometric lower bound to the Gaussian mean width of $K$.
Theorem 6 (Geometric Lower Bound).
For all , , any convex symmetric body , any and any :
Notice that is the Euclidean unit ball in the subspace , and, therefore:
Finally, by Corollary 4, we get the following lower bound, as long as :
Combining the inequalities completes the proof. ∎
5 Optimality of the Gaussian Mechanism
In this section, we present the result that the Gaussian mechanism is optimal, up to constant factors, when $\ell^*(K)$ is sufficiently large. More specifically, if the Gaussian mean width of $K$ is asymptotically maximal, then we can get a lower bound on the sample complexity matching the guarantee of the Gaussian mechanism. This is summarized in the theorem below.
For all $\varepsilon$, $\delta$, any sufficiently small constant $\alpha$, and any symmetric convex body $K \subseteq B_2^N$: if $\ell^*(K) \ge c \sqrt{N}$ for an absolute constant $c > 0$, then $\mathrm{sc}_{\varepsilon,\delta}(K, \alpha)$ is achieved, up to constants, by the Gaussian mechanism.
By Corollary 2 we have an upper bound on the sample complexity of the Gaussian mechanism defined previously. To prove its optimality, we use a classical result from convex geometry, known as Dvoretzky's criterion, to show a matching lower bound on the sample complexity. This result relates the existence of a nearly-spherical section of a given convex body to its Gaussian mean norm. It was a key ingredient in Milman's probabilistic proof of Dvoretzky's theorem: see Matoušek's book [Mat02] for an exposition.
Theorem 8 ([Mil71]; Dvoretzky’s Criterion).
For every symmetric convex body $K$ such that $B_2^N \subseteq K$, and every $\epsilon \in (0, 1)$, there exists a constant $c(\epsilon) > 0$ and a subspace $E$ with dimension $\dim E \ge c(\epsilon)\, \ell(K)^2$ for which:
$$(1 - \epsilon)\, \frac{\ell(K)}{\sqrt{N}}\, \|x\|_2 \le \|x\|_K \le (1 + \epsilon)\, \frac{\ell(K)}{\sqrt{N}}\, \|x\|_2 \quad \text{for all } x \in E.$$
Proof of Theorem 7.
Given the matching upper bound on sample complexity in Corollary 2, it suffices to show the equivalent lower bound, namely that:
To this end, we will show that there exists a subspace $E$ of dimension $d \ge c' N$, for an absolute constant $c' > 0$, so that $c'' (B_2^N \cap E) \subseteq P_E K$ for an absolute constant $c'' > 0$. Then the lower bound will follow directly from Theorem 6.
We will prove the claim above by applying Dvoretzky's criterion to $K^\circ$. By Fact 1, $K \subseteq B_2^N$ implies $B_2^N \subseteq K^\circ$. We can then apply Dvoretzky's criterion with $\epsilon = 1/2$, ensuring that there exists a subspace $E$ of dimension $d \ge c\, \ell(K^\circ)^2$ for which:
$$\|x\|_{K^\circ} \ge \frac{\ell(K^\circ)}{2 \sqrt{N}}\, \|x\|_2 \quad \text{for all } x \in E.$$
Let us define $d = \dim E$; then every $x \in K^\circ \cap E$ satisfies $\|x\|_2 \le 2 \sqrt{N} / \ell(K^\circ)$. Since, by assumption, $\ell(K^\circ) = \ell^*(K) \ge c \sqrt{N}$, there exists a constant $c'' > 0$ so that $\mathrm{diam}(K^\circ \cap E) \le 1/c''$. Finally, by Fact 4 and the definition of the Gelfand width, $c'' (B_2^N \cap E) \subseteq P_E K$, as desired. This completes the proof. ∎
6 Gaussian Width Lower Bounds in $\ell$-position
In Section 5 we showed that the Gaussian mechanism is optimal when the Gaussian mean width of $K$ is asymptotically as large as possible. Our goal in this and the following section is to show general lower bounds on sample complexity in terms of $\ell^*(K)$. This is motivated by the sample complexity upper bound in terms of $\ell^*(K)$ provided by the projection mechanism.
It is natural to follow the strategy from Section 5: use Dvoretzky's criterion to find a nearly-spherical projection of $K$ of appropriate radius and dimension. An inspection of the proof of Theorem 7 shows, however, that the sample complexity lower bound we get this way falls short of the lower bound in terms of $\ell^*(K)$ that we are aiming for (ignoring the dependence on $\alpha$, $\varepsilon$, and $\delta$ here, and in the rest of this informal discussion). Roughly speaking, the problem is that Dvoretzky's criterion does too much: it guarantees a spherical section of $K^\circ$, while we only need a bound on the diameter of the section. In order to circumvent this difficulty, we use a different result from asymptotic convex geometry, the low $M^*$-estimate, which bounds the diameter of a random section of $K^\circ$, without also producing a large ball contained inside the section. A technical difficulty is that the resulting upper bound on the diameter is in terms of the Gaussian mean norm $\ell(K^\circ)$, rather than the (reciprocal of the) mean width. When $K$ is in $\ell$-position, this is not an issue, because results of Pisier, Figiel, and Tomczak-Jaegermann show that in that case $\ell(K)\, \ell^*(K) = O(N \log N)$. In this section we assume that $K$ is in $\ell$-position, and we remove this requirement in the subsequent section.
The main result of this section is summarized below.
For all $\varepsilon$ and $\delta$, all symmetric convex bodies $K$ in $\ell$-position, and for all $\alpha \le c$, where $c$ is an absolute constant:
The following two theorems are the main technical ingredients we need in the proof of Theorem 9.
There exists a constant $C$ such that for every symmetric convex body $K \subseteq \mathbb{R}^N$ in $\ell$-position:
$$\ell(K)\, \ell^*(K) \le C\, N \log N.$$
It is an open problem whether this bound can be improved to $\ell(K)\, \ell^*(K) \le C\, N \sqrt{\log N}$. This would be tight for the cube $[-1, 1]^N$. This improvement would lead to a corresponding improvement in our bounds.
Theorem 11 ([PTJ86]; Low $M^*$ estimate).
There exists a constant such that for every symmetric conv