# The Geometry of Differential Privacy: The Sparse and Approximate Cases

Aleksandar Nikolov, Department of Computer Science, Rutgers University, Piscataway, NJ 08854. This work was done while the author was at Microsoft Research SVC.

Kunal Talwar, Microsoft Research SVC, Mountain View, CA 94043.

Li Zhang, Microsoft Research SVC, Mountain View, CA 94043.

###### Abstract

In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work [BLR08, RR10, DRV10, HT10, HR10, LHR10, BDKT12]. For a given set of $d$ linear queries over a database $x$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [HT10, BDKT12] give an approximation to the optimal mechanism. Our first contribution is to give an approximation guarantee for the case of $(\varepsilon,\delta)$-differential privacy. Our mechanism is simple and efficient, and adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [MN12], using tools from convex geometry.

We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n$. The lower bounds used in the previous approximation algorithm no longer apply, and in fact better mechanisms are known in this setting [BLR08, RR10, HR10, GHRU11, GRU12]. Our second main contribution is to give an $(\varepsilon,\delta)$-differentially private mechanism that, for a given query set $A$ and an upper bound $n$ on $\|x\|_1$, has mean squared error within a polylogarithmic factor of the optimal for $A$ and $n$. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the $\ell_1$ ball. Additionally, we show a similar polylogarithmic approximation guarantee for the best $\varepsilon$-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. $A$ with entries in $\{0,1\}$, there is an $\varepsilon$-differentially private mechanism with expected error $\tilde{O}(\sqrt{n})$ per query, improving on the $\tilde{O}(n^{2/3})$ bound of [BLR08], and matching the lower bound implied by [DN03] up to logarithmic factors.

The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.

## 1 Introduction

Differential privacy [DMNS06] is a recent privacy definition that has quickly become the standard notion of privacy in statistical databases. Informally, a mechanism (a randomized function on databases) satisfies differential privacy if the distribution of the outcome of the mechanism does not change noticeably when one individual’s input to the database is changed. Privacy is measured by how small this change must be: an $\varepsilon$-differentially private ($\varepsilon$-DP) mechanism $M$ satisfies $\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S]$ for any pair $x, x'$ of neighboring databases, and for any measurable subset $S$ of the range. A relaxation of this definition is approximate differential privacy. A mechanism $M$ is $(\varepsilon,\delta)$-differentially private ($(\varepsilon,\delta)$-DP) if $\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S] + \delta$ with $x, x', S$ as before. Here $\delta$ is thought of as negligible in the size of the database. Both these definitions satisfy several desirable properties such as composability, and are resistant to post-processing of the output of the mechanism.

In recent years, a large body of research has shown that this strong privacy definition still allows for very accurate analyses of statistical databases. At the same time, answering a large number of adversarially chosen queries accurately is inherently impossible with any semblance of privacy. Indeed Dinur and Nissim [DN03] show that answering $\tilde{O}(n)$ random subset sums of a set of $n$ bits with (per query) error $o(\sqrt{n})$ allows an attacker to reconstruct (an arbitrarily good approximation to) all the private information. (Here $\tilde{O}$ hides polylogarithmic factors in $n$.) Thus there is an inherent trade-off between privacy and accuracy when answering a large number of queries. In this work, we study this trade-off in the context of counting queries, and more generally linear queries.

We think of the database as being given by a multiset of database rows, one for each individual. We will let $N$ denote the size of the universe that these rows come from, and we will denote by $n$ the number of individuals in the database. We can represent the database as its histogram $x \in \mathbb{R}^N$ with $x_i$ denoting the number of occurrences of the $i$-th element of the universe. Thus $x$ would in fact be a vector of non-negative integers with $\|x\|_1 = n$. We will be concerned with reporting reasonably accurate answers to a given set of $d$ linear queries over this histogram $x$. This set of queries can naturally be represented by a matrix $A \in \mathbb{R}^{d \times N}$ with the vector $Ax \in \mathbb{R}^d$ giving the correct answers to the queries. When $A \in \{0,1\}^{d \times N}$, we call such queries counting queries. We are interested in the (practical) regime where $N \gg d \gg n$, although our results hold for all settings of the parameters.

A differentially private mechanism will return a noisy answer to the query $A$ and, in this work, we measure the performance of the mechanism in terms of its worst case total expected squared error. Suppose that $X \subseteq \mathbb{R}^N$ is the set of all possible databases. The error of a mechanism $M$ is defined as $\mathrm{err}_M(A, X) = \max_{x \in X} \mathbb{E}[\|M(x) - Ax\|_2^2]$. Here the expectation is taken over the internal coin tosses of the mechanism itself, and we look at the worst case of this expected squared error over all the databases in $X$. Unless stated otherwise, the error of the mechanism will refer to this worst case expected error. Phrased thus, the Gaussian noise mechanism of Dwork et al. [DKM06a] gives error at most $O(d^2)$ for any counting query and guarantees $(\varepsilon,\delta)$-DP over all the databases, i.e. $X = \mathbb{R}^N$. (Here and in the rest of the introduction, we suppress the dependence of the error on $\varepsilon$ and $\delta$.) Moreover, the aforementioned lower bounds imply that there exist counting queries for which this bound cannot be improved. For $\varepsilon$-DP, Hardt and Talwar [HT10] gave a mechanism with error and showed that this is the best possible for random counting queries. Thus the worst case accuracy for counting queries is fairly well-understood in this measure.

Specific sets of counting queries of interest can however admit much better mechanisms than adversarially chosen queries for which the lower bounds are shown. Indeed several classes of specific queries have attracted attention. Some, such as range queries, are “easier”, and asymptotically better mechanisms can be designed for them. Others, such as constant dimensional contingency tables, are nearly as hard as general counting queries, and asymptotically better mechanisms can be ruled out in some ranges of the parameters. These query-specific upper bounds are usually proved by carefully exploiting the structure of the query, and query-specific lower bounds have been proved by reconstruction attacks that exploit a lower bound on the smallest singular value of an appropriately chosen  [DN03, DMT07, DY08, KRSU10, De12, KRS13]. It is natural to address this question in a competitive analysis framework: can we design an efficient algorithm that given any query , computes (even approximately) the minimum error differentially private mechanism for ?

Hardt and Talwar [HT10] answered this question in the affirmative for -DP mechanisms, and gave a mechanism that has error within factor of the optimal assuming a conjecture from convex geometry known as the hyperplane conjecture or the slicing conjecture. Bhaskara et al. [BDKT12] removed the dependence on the hyperplane conjecture and improved the approximation ratio to . Can relaxing the privacy requirement to -DP help with accuracy? In many settings, -DP mechanisms can be simpler and more accurate than the best known -DP mechanisms. This motivates the first question we address.

###### Question 1

Given $A$, can we efficiently approximate the optimal error $(\varepsilon,\delta)$-DP mechanism for it?

Hardt and Talwar [HT10] showed that for some , the lower bound for -DP mechanism can be larger than known -DP mechanisms. For non-linear Lipschitz queries, De [De12] showed that this gap can be as large as (even when ). This leads us to ask:

###### Question 2

How large can the gap between the optimal $\varepsilon$-DP mechanism and the optimal $(\varepsilon,\delta)$-DP mechanism be for linear queries?

When the databases are sparse, e.g. when , one may obtain better mechanisms. Blum, Ligett and Roth [BLR08] gave an -DP mechanism that can answer any set of counting queries with error . A series of subsequent works [DNR09, DRV10, RR10, HR10, GHRU11, HLM12, GRU12] led to -DP mechanisms that have error only . Thus when , the lower bound of for arbitrary databases can be breached by exploiting the sparsity of the database. This motivates a more refined measure of error that takes the sparsity of into account. Given an and , one can ask for the mechanism that minimizes the sparse case error . The next set of questions we study address this measure.

###### Question 3

Given $A$ and $n$, can we approximate the optimal sparse case error $(\varepsilon,\delta)$-DP mechanism for $A$ when restricted to databases of size at most $n$?

###### Question 4

Given $A$ and $n$, can we approximate the optimal sparse case error $\varepsilon$-DP mechanism for $A$ when restricted to databases of size at most $n$?

The gap between the error -DP mechanism of [BLR08] and the error -DP mechanism of [HR10] leads us to ask:

###### Question 5

Is there an -DP mechanism with error for databases of size at most ?

### 1.1 Results

In this work, we answer Questions 1-5 above. Denote by $B_1^N$ the $N$-dimensional $\ell_1$ ball. Recall that for any query matrix $A$ and any set $X \subseteq \mathbb{R}^N$, the (worst-case expected squared) error of $M$ is defined as

$$\mathrm{err}_M(A, X) \triangleq \max_{x \in X} \mathbb{E}\left[\|M(x) - Ax\|_2^2\right].$$

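In this paper, we are interested in both the case when $X = \mathbb{R}^N$, called the dense case, and when $X = nB_1^N$ for $n < d$, called the sparse case. We also write $\mathrm{err}_M(A) \triangleq \mathrm{err}_M(A, \mathbb{R}^N)$ and $\mathrm{err}_M(A, n) \triangleq \mathrm{err}_M(A, nB_1^N)$.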
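To make the error measure concrete, the following is a minimal sketch that estimates the worst-case expected squared error by Monte Carlo over a small finite set of histograms. The helper names (`matvec`, `estimate_error`, `independent_gaussian`) and the noise scale are illustrative assumptions, not taken from the paper, and the baseline mechanism is not calibrated for any privacy guarantee.

```python
import random

def matvec(A, x):
    """Exact answers Ax for a query matrix A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def estimate_error(mechanism, A, X, trials=2000, seed=0):
    """Monte Carlo estimate of max over x in X of E ||M(x) - Ax||_2^2."""
    rng = random.Random(seed)
    worst = 0.0
    for x in X:
        exact = matvec(A, x)
        total = 0.0
        for _ in range(trials):
            noisy = mechanism(A, x, rng)
            total += sum((m - e) ** 2 for m, e in zip(noisy, exact))
        worst = max(worst, total / trials)
    return worst

def independent_gaussian(sigma):
    """Hypothetical baseline: add independent N(0, sigma^2) noise per query."""
    def mechanism(A, x, rng):
        return [e + rng.gauss(0.0, sigma) for e in matvec(A, x)]
    return mechanism

A = [[1, 0, 1], [0, 1, 1]]              # two counting queries, universe size 3
X = [[1, 0, 0], [0, 2, 1], [3, 0, 0]]   # a few histograms standing in for X
est = estimate_error(independent_gaussian(2.0), A, X)
# For independent noise the exact value is d * sigma^2 = 2 * 4 = 8;
# the estimate should land close to that.
```

Since the noise here does not depend on the histogram, every database gives the same expected error; the maximum over $X$ matters for mechanisms whose noise or post-processing depends on $x$.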

Our first result is a simple and efficient mechanism that for query matrix gives an approximation to the optimal error.

###### Theorem 1

Given a query matrix , there is an efficient -DP mechanism and an efficiently computable lower bound such that

• , and

• for any -DP mechanism , .

We also show that the gap of between -DP and -DP mechanisms shown in [HT10] is essentially the worst possible, within factor, for linear queries. More precisely, the lower bound on -DP mechanisms used in [HT10] is always within of the lower bound computed by our algorithm above. Let denote the -DP generalized -norm mechanism in [HT10].

###### Theorem 2

For any -DP mechanism , .

We next move to the sparse case. Here we give results analogous to the dense case with a slightly worse approximation ratio.

###### Theorem 3

Given and a bound , there is an efficient -DP mechanism and an efficiently computable lower bound such that

• , and

• For any -DP mechanism , .

###### Theorem 4

Given and a bound , there is an efficient -DP mechanism and an efficiently computable lower bound such that

• , and

• For any -DP mechanism , .

We remark that in these theorems, our upper bounds hold for all $x$ with $\|x\|_1 \le n$, whereas the lower bounds hold even when $x$ is an integer vector.

The $(\varepsilon,\delta)$-DP mechanism of Theorem 3 when run on any counting query has error no larger than the best known bounds [GRU12] for counting queries, up to constants (not ignoring logarithmic factors). The $\varepsilon$-DP mechanism of Theorem 4 when run on any counting query can be shown to have nearly the same asymptotics, answering Question 5 in the affirmative.

###### Theorem 5

For any counting query , there is an -DP mechanism such that .

We will summarize some key ideas we use to achieve these results. More details will follow in Section 1.2.

For the upper bounds, the first crucial step is to decompose into “geometrically nice” components and then add Gaussian noise to each component. This is similar to the approach in [HT10, BDKT12] but we use the minimum volume enclosing ellipsoid, rather than the -ellipsoid used in those works, to facilitate the decomposition process. This allows us to handle the approximate and the sparse cases. In addition, it simplifies the mechanism as well as the analysis. For the sparse case, we further couple the mechanism with least squares estimation of the noisy answer with respect to . By utilizing techniques from statistical estimation, we can show that this process can reduce the error when , and prove an error upper bound dependent on the size of the smallest projection of .

For the lower bounds, we first lower bound the accuracy of -DP mechanism by the hereditary discrepancy of the query matrix , which we in turn lower bound in terms of the least singular values of submatrices of . Finally, we close the loop by utilizing the restricted invertibility principle by Bourgain and Tzafriri [BT87] and its extension by Vershynin [Ver01] which, informally, shows that if there does not exist a “small” projection of then has a “large” submatrix with a “large” least singular value.

Approximating Hereditary Discrepancy

The discrepancy of a matrix is defined to be . The hereditary discrepancy of a matrix is defined as , where denotes the matrix restricted to the columns indexed by .
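As an illustration, the sketch below computes both quantities by brute force for tiny matrices. It assumes the standard definitions, $\mathrm{disc}(A) = \min_{x \in \{-1,+1\}^N} \|Ax\|_\infty$ and $\mathrm{herdisc}(A) = \max_{S \subseteq [N]} \mathrm{disc}(A|_S)$; the function names are hypothetical, and the running time is exponential in the number of columns, so this is only meant for intuition.

```python
from itertools import combinations, product

def disc(A):
    """Brute-force discrepancy: min over sign vectors of the largest |row sum|."""
    n_cols = len(A[0]) if A else 0
    if n_cols == 0:
        return 0
    best = None
    for signs in product((-1, 1), repeat=n_cols):
        worst_row = max(abs(sum(a * s for a, s in zip(row, signs))) for row in A)
        best = worst_row if best is None else min(best, worst_row)
    return best

def restrict(A, cols):
    """Submatrix of A keeping only the columns indexed by cols."""
    return [[row[j] for j in cols] for row in A]

def herdisc(A):
    """Brute-force hereditary discrepancy: max of disc over column subsets."""
    n_cols = len(A[0])
    return max(
        disc(restrict(A, S))
        for k in range(1, n_cols + 1)
        for S in combinations(range(n_cols), k)
    )

# The single query [1, 1] can be balanced perfectly (disc 0),
# but restricting to one column forces discrepancy 1.
pair = [[1, 1]]
# The three pairwise-sum queries on three elements have disc 2.
triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

The first example shows why the hereditary version is the more robust notion: a matrix can have zero discrepancy while one of its restrictions does not.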

As hereditary discrepancy is a maximum over exponentially many submatrices, it is not a priori clear if there even exists a polynomial-time verifiable certificate for low hereditary discrepancy. Additionally, we can show that it is -hard to approximate hereditary discrepancy to within a factor of . Bansal [Ban10] gave a pseudo-approximation algorithm for hereditary discrepancy, which efficiently computes a coloring of discrepancy at most a factor of larger than for a matrix . His algorithm allows efficiently computing a lower bound on for any restriction ; however, such a lower bound may be arbitrarily loose, and before our work it was not known how to efficiently compute nearly matching lower and upper bounds on .

Muthukrishnan and Nikolov [MN12] show that for a query matrix , the error of any -DP mechanism is lower bounded by (an version of) (up to logarithmic factors). Moreover, the lower bound used in Theorem 1 is in fact a lower bound on this version of . Using the von Neumann minimax theorem, we can go between the and the versions of these concepts, allowing us to sandwich the hereditary discrepancy of between two quantities: a determinant based lower bound and the efficiently computable expected error of the private mechanism. As the two quantities are nearly matching, our work therefore leads to a polylogarithmic approximation to the hereditary discrepancy of any matrix .

### 1.2 Techniques

In addition to known techniques from the differential privacy literature, our work borrows tools from discrepancy theory, convex geometry and statistical estimation. We next briefly describe how they fit in.

Central to designing a provably good approximation algorithm is an efficiently computable lower bound on the optimum. Muthukrishnan and Nikolov [MN12] proved that (a slight variant of) the hereditary discrepancy of $A$ leads to a lower bound for the error of any $(\varepsilon,\delta)$-DP mechanism. Lovász, Spencer and Vesztergombi [LSV86] showed that hereditary discrepancy itself can be lower bounded by a quantity called the determinant lower bound. Geometrically, this lower bound corresponds to picking the columns of $A$ that (along with the origin) give us a simplex with the largest possible volume. The volume of this simplex, appropriately normalized, gives us a lower bound on OPT. More precisely for any simplex , gives a lower bound on the error. The factor can be removed by using a lower bound based on the least singular values of submatrices of $A$. Geometrically, for the least singular value lower bound we need to find a simplex of large volume whose non-zero vertices are also nearly pairwise orthogonal.

If the columns of $A$ all lie in a ball of radius $R$, it can be shown that adding Gaussian noise proportional to $R$ suffices to guarantee $(\varepsilon,\delta)$-DP, resulting in a mechanism having total squared error . Can we relate this quantity to the lower bound? It turns out that if the ball of radius $R$ is the minimum volume ellipsoid containing the columns of $A$, this can be done. In this case, a result of Vershynin [Ver01], building on the restricted invertibility results by Bourgain and Tzafriri [BT87], tells us that one can find vertices of that touch the minimum containing ellipsoid, and are nearly orthogonal. The simplex formed by these vertices therefore has large volume, giving us an $(\varepsilon,\delta)$-DP lower bound of . In this case, the Gaussian mechanism with the optimal $R$ is within a constant factor of the lower bound. When the minimum volume enclosing ellipsoid is not a ball, we need to project the query along the shortest axes of this ellipsoid, answer this projection using the Gaussian mechanism, and recurse on the orthogonal projection. Using the full power of the restricted invertibility result by Vershynin allows us to construct a large simplex and prove our competitive ratio.

Hardt and Talwar [HT10] also used a volume based lower bound, but for $\varepsilon$-DP mechanisms, one can take $K$, the symmetric convex hull of all the columns of $A$, and use its volume instead of the volume of the simplex in the lower bound above. How do these lower bounds compare? By a result of Bárány and Füredi [BF88] and Gluskin [Glu07], one can show that the volume of the convex hull of points can be bounded by times that of the minimum enclosing ellipsoid. This, along with the aforementioned restricted invertibility results, allows us to prove that the $\varepsilon$-DP lower bound is within of the $(\varepsilon,\delta)$-DP lower bound.

How do we handle sparse queries? The first observation is that the lower bounding technique gives us columns of and the resulting lower bound holds not just for but even for the submatrix of corresponding to the maximum volume simplex ; moreover, the lower bound holds even when all databases are restricted to individuals. Thus the lower bound holds when and this value marks the transition between the sparse and the dense cases. Moreover, when the minimum volume ellipsoid containing the columns of is a ball, the restricted invertibility principle of Bourgain and Tzafriri and Vershynin gives us a -dimensional simplex with nearly pairwise orthogonal vertices, and, therefore any -dimensional face of this simplex is another simplex of large volume. The large -dimensional simplex gives a lower bound on error when databases are restricted to have at most individuals.

For smaller , the error added by the Gaussian mechanism may be too large, and even though the value lies in , the noisy answer will likely fall outside this set. A common technique in statistical estimation for handling such error is to “project” the noisy point back into , i.e. report the point in that minimizes the Euclidean distance to the noisy answer . This projection step provably reduces the expected error! Geometrically, we use well known techniques from statistics to show that the error after projection is bounded by the “shadow” that leaves on the noise vector; this shadow is much smaller than the length of the noise vector when . In fact, when the noise is a spherical Gaussian, it can be shown that is only about . This gives near optimal bounds for the case when the minimum volume ellipsoid is a ball; the general case is handled using a recursive mechanism as before.

To get an -DP mechanism, we use the -norm mechanism [HT10] instead of Gaussian noise. To bound the shadow of on , where is the noise vector generated by the -norm mechanism, we first analyze the expectation of for any column of , and we use the log concavity of the noise distribution to prove concentration of this random variable. A union bound helps complete the argument as in the Gaussian case.

### 1.3 Related Work

Dwork et al. [DMNS06] showed that any query can be released while adding noise proportional to the total sensitivity of the query. This motivated the question of designing mechanisms with good guarantees for any set of low sensitivity queries. Nissim, Raskhodnikova and Smith [NRS07] showed that adding noise proportional to (a smoothed version of) the local sensitivity of the query suffices for guaranteeing differential privacy; this may be much smaller than the worst case sensitivity for non-linear queries. Lower bounds on the amount of noise needed for general low sensitivity queries have been shown in [DN03, DMT07, DY08, DMNS06, RHS07, HT10, De12]. Kasiviswanathan et al. [KRSU10] showed upper and lower bounds for contingency table queries, and more recently [KRS13] showed lower bounds on publishing error rates of classifiers or even M-estimators. Muthukrishnan and Nikolov [MN12] showed that combinatorial discrepancy lower bounds the noise for answering any set of linear queries.

Using learning theoretic techniques, Blum, Ligett and Roth [BLR08] first showed that one can exploit sparsity of the database, and answer a large number of counting queries with error small compared to the number of individuals in the database. This line of work has been further extended and improved in terms of error bounds, efficiency, generality and interactivity in several subsequent works [DNR09, DRV10, RR10, HR10, GHRU11, HLM12].

Ghosh, Roughgarden and Sundararajan [GRS09] showed that for any one dimensional counting query, a discrete version of the Laplacian mechanism is optimal for pure privacy in a very general utilitarian framework, and Gupte and Sundararajan [GS10] extended this to risk averse agents. Brenner and Nissim [BN10] showed that such universally optimal private mechanisms do not exist for two counting queries or for a single non-binary sum query. As mentioned above, Hardt and Talwar [HT10], and Bhaskara et al. [BDKT12] gave relative guarantees for multi-dimensional queries under pure privacy with respect to total squared error. De [De12] unified and strengthened these bounds and showed stronger lower bounds for the class of non-linear low sensitivity queries.

For specific queries of interest, improved upper bounds are known. Barak et al. [BCD07] studied low dimensional marginals and showed that by running the Laplace mechanism on a different set of queries, one can reduce error. Using a similar strategy, improved mechanisms were given by [XWG10, CSS10] for orthogonal counting queries, and near optimal mechanisms were given by Muthukrishnan and Nikolov [MN12] for halfspace counting queries. The approach of answering a set of queries different from the target query set has also been studied in more generality and for other sets of queries by [LHR10, DWHL11, RHS07, XWG10, XXY10, YZW12]. Li and Miklau [LM12a, LM12b] study a class of mechanisms called extended matrix mechanisms and show that one can efficiently find the best mechanisms from this class. Hay et al. [HRMS10] show that in certain settings such as unattributed histograms, correcting noisy answers to enforce a consistency constraint can improve accuracy.

Very recently, Fawaz et al. [FMN] used the hereditary discrepancy lower bounds of Muthukrishnan and Nikolov, as well as the determinant lower bound on discrepancy of Lovász, Spencer, and Vesztergombi, to prove that a certain Gaussian noise mechanism is nearly optimal (in the dense setting) for computing any given convolution map. Like our algorithms, their algorithm adds correlated Gaussian noise; however, they always use the Fourier basis to correlate the noise.

We refer the reader to texts by Chazelle [Cha00] and Matoušek [Mat99] and the chapter by Beck and Sós [BS95] for an introduction to discrepancy theory. Bansal [Ban10] showed that a semidefinite relaxation can be used to design a pseudo-approximation algorithm for hereditary discrepancy. Matoušek [Mat11] showed that the determinant based lower bound of Lovász, Spencer and Vesztergombi [LSV86] is tight up to polylogarithmic factors. Larsen [Lar11] showed applications of hereditary discrepancy to data structure lower bounds, and Chandrasekaran and Vempala [CV11] recently showed applications of hereditary discrepancy to problems in integer programming.

In Section 2 we introduce relevant preliminaries. In Section 3 we present our main results for approximate differential privacy, and in Section 4 we present our main results for pure differential privacy. In Section 5 we prove absolute upper bounds on the error required for privately answering sets of counting queries. In Section 6 we give some extensions and applications of our main results, namely an optimal efficient mechanism for error in the dense case, and the efficient approximation to hereditary discrepancy implied by that mechanism. We conclude in Section 7.

## 2 Preliminaries

We start by introducing some basic notation.

Let , and be, respectively, the and unit balls in . Also, let be the convex hull of the vectors . Equivalently, where is a matrix whose columns equal .

For a matrix and a set , we denote by the submatrix of consisting of those columns of indexed by elements of . Occasionally we refer to a matrix whose columns form an orthonormal basis for some subspace of interest as the orthonormal basis of . is the set of orthogonal projections onto -dimensional subspaces of .

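By $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ we denote, respectively, the smallest and largest singular value of $A$. I.e., $\sigma_{\min}(A) = \min_{x : \|x\|_2 = 1} \|Ax\|_2$ and $\sigma_{\max}(A) = \max_{x : \|x\|_2 = 1} \|Ax\|_2$. In general, $\sigma_i(A)$ is the $i$-th largest singular value of $A$, and $\lambda_i(A)$ is the $i$-th largest eigenvalue of $A$. We recall the minimax characterization of eigenvalues for symmetric matrices: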

$$\lambda_i = \max_{V : \dim V = i} \ \min_{x \in V : \|x\|_2 = 1} x^T A x.$$

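For a matrix $A$ (and the corresponding linear operator), we denote by $\|A\|_2 = \sigma_{\max}(A)$ the spectral norm of $A$ and by $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$ the Frobenius norm of $A$. By $\ker A$ we denote the kernel of $A$, i.e. the subspace of vectors $x$ for which $Ax = 0$.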

### 2.1 Geometry

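For a set $K \subseteq \mathbb{R}^d$, we denote by $\mathrm{vol}_d(K)$ its $d$-dimensional volume. Often we use instead the volume radius

$$\mathrm{vrad}_d(K) \triangleq \left(\frac{\mathrm{vol}_d(K)}{\mathrm{vol}_d(B_2^d)}\right)^{1/d}.$$

Subscripts are omitted when this does not cause confusion. When $K$ lies in a $k$-dimensional affine subspace of $\mathbb{R}^d$, $\mathrm{vol}(K)$ and $\mathrm{vrad}(K)$ (without subscripts) are understood to imply $\mathrm{vol}_k(K)$ and $\mathrm{vrad}_k(K)$, respectively.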

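For a convex body $K \subseteq \mathbb{R}^d$, the polar body $K^\circ$ is defined by $K^\circ = \{y : \langle y, x \rangle \le 1 \ \forall x \in K\}$. The fundamental fact about polar bodies we use is that for any two convex bodies $K$ and $L$

$$K \subseteq L \Leftrightarrow L^\circ \subseteq K^\circ. \tag{1}$$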

In the remainder of this paper, when we claim that a fact follows “by convex duality,” we mean that it is implied by (1).

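A convex body $K$ is (centrally) symmetric if $-K = K$. The Minkowski norm $\|x\|_K$ induced by a symmetric convex body $K$ is defined as $\|x\|_K \triangleq \min\{r \in \mathbb{R} : x \in rK\}$. The Minkowski norm induced by the polar body $K^\circ$ of $K$ is the dual norm of $\|x\|_K$ and also has the form $\|y\|_{K^\circ} = \max_{x \in K} \langle x, y \rangle$. For convex symmetric $K$, the induced norm and dual norm satisfy Hölder’s inequality:

$$|\langle x, y \rangle| \le \|x\|_K \, \|y\|_{K^\circ}. \tag{2}$$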

An ellipsoid in $\mathbb{R}^d$ is the image of $B_2^d$ under an affine map. All ellipsoids we consider are symmetric, and therefore are equal to an image $FB_2^d$ of the ball $B_2^d$ under a linear map $F$. A full dimensional ellipsoid $E = FB_2^d$ can be equivalently defined as $E = \{x : x^T (FF^T)^{-1} x \le 1\}$. The polar body of a symmetric ellipsoid $E = FB_2^d$ is the ellipsoid (or cylinder with an ellipsoid as its base in case $E$ is not full dimensional) $E^\circ = \{x : x^T FF^T x \le 1\}$.

We repeatedly use a classical theorem of Fritz John, characterizing the (unique) minimum volume enclosing ellipsoid (MEE) of any convex body $K$. We note that John’s theorem is frequently stated in terms of the maximum volume enclosed ellipsoid in $K$; the two variants of the theorem are equivalent by convex duality. The MEE of $K$ is also known as the Löwner or Löwner-John ellipsoid of $K$.

###### Theorem 6 ([Joh48])

Any convex body is contained in a unique ellipsoid of minimal volume. This ellipsoid is if and only if there exist unit vectors and positive reals such that

$$\sum_i c_i u_i = 0, \qquad \sum_i c_i u_i u_i^T = I.$$

According to John’s characterization, when the MEE of is the ball , the contact points of and satisfy a structural property — the identity decomposes into a linear combination of the projection matrices onto the lines of the contact points. Intuitively, this means that “hits” in all directions — it has to, or otherwise can be “pinched” in order to produce a smaller ellipsoid that still contains . This intuition is formalized by a theorem of Vershynin, which generalizes the work of Bourgain and Tzafriri on restricted invertibility [BT87]. Vershynin ([Ver01] Theorem 3.1) shows that there exist contact points of and which are approximately pairwise orthogonal.
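As a small sanity check of John's decomposition, the sketch below verifies the two conditions of Theorem 6 for the cross-polytope $B_1^d$, which is contained in $B_2^d$ and touches it exactly at the points $\pm e_i$. The choice of weights $c_i = 1/2$ and the function names are illustrative assumptions made for this example.

```python
def john_conditions_hold(points, weights, tol=1e-9):
    """Check sum c_i u_i = 0 and sum c_i u_i u_i^T = I for unit vectors u_i."""
    d = len(points[0])
    # Contact points must be unit vectors and the weights must be positive.
    if any(abs(sum(t * t for t in u) - 1.0) > tol for u in points):
        return False
    if any(c <= 0 for c in weights):
        return False
    for i in range(d):
        first_moment = sum(c * u[i] for c, u in zip(weights, points))
        if abs(first_moment) > tol:
            return False
        for j in range(d):
            entry = sum(c * u[i] * u[j] for c, u in zip(weights, points))
            target = 1.0 if i == j else 0.0
            if abs(entry - target) > tol:
                return False
    return True

def cross_polytope_contacts(d):
    """Contact points +e_i and -e_i of B_1^d with B_2^d, each with weight 1/2."""
    points = []
    for i in range(d):
        for sign in (1.0, -1.0):
            points.append([sign if j == i else 0.0 for j in range(d)])
    return points, [0.5] * (2 * d)

points, weights = cross_polytope_contacts(3)
ok = john_conditions_hold(points, weights)  # True: B_2^3 is the MEE of B_1^3
```

Here the contact points are exactly orthogonal, which is the ideal situation that Vershynin's theorem recovers approximately for general symmetric bodies.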

###### Theorem 7 ([Ver01])

Let be a symmetric convex body whose minimum volume enclosing ellipsoid is the unit ball . Let be a linear map with spectral norm . Then for any , there exist constant , and contact points with such that the matrix satisfies

$$C_1(\beta)\frac{\|T\|_F}{\sqrt{d}} \le \sigma_{\min}(TX) \le \sigma_{\max}(TX) \le C_2(\beta)\frac{\|T\|_F}{\sqrt{d}}.$$

### 2.2 Statistical Estimation

A key element in our algorithms for the sparse case is the use of least squares estimation to reduce error. Below we present a bound on the error of least squares estimation with respect to symmetric convex bodies. This analysis appears to be standard in the statistics literature; a special case of it appears for example in [RWY11].

###### Lemma 1

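Let $L \subseteq \mathbb{R}^d$ be a symmetric convex body, and let $y \in L$ and $\tilde{y} = y + w$ for some $w \in \mathbb{R}^d$. Let, finally, $\hat{y} = \arg\min_{z \in L} \|z - \tilde{y}\|_2^2$. We have

$$\|\hat{y} - y\|_2^2 \le \min\{4\|w\|_2^2,\ 4\|w\|_{L^\circ}\}.$$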

###### Proof.

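First we show the easier bound $\|\hat{y} - y\|_2 \le 2\|w\|_2$, which follows by the triangle inequality and the fact that $\hat{y}$ is the point of $L$ closest to $\tilde{y}$:

$$\|\hat{y} - y\|_2 \le \|\hat{y} - \tilde{y}\|_2 + \|\tilde{y} - y\|_2 \le 2\|\tilde{y} - y\|_2.$$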

The second bound is based on Hölder’s inequality and the following simple but very useful fact, illustrated schematically in Figure 1:

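$$\|\hat{y} - y\|_2^2 = \langle \hat{y} - y, \tilde{y} - y \rangle + \langle \hat{y} - y, \hat{y} - \tilde{y} \rangle \le 2\langle \hat{y} - y, \tilde{y} - y \rangle. \tag{3}$$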

The inequality (3) follows from

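$$\langle \hat{y} - y, \tilde{y} - y \rangle = \|\tilde{y} - y\|_2^2 + \langle \hat{y} - \tilde{y}, \tilde{y} - y \rangle \ge \|\hat{y} - \tilde{y}\|_2^2 + \langle \hat{y} - \tilde{y}, \tilde{y} - y \rangle = \langle \hat{y} - \tilde{y}, \hat{y} - y \rangle.$$

Inequality (3), $w = \tilde{y} - y$, and Hölder’s inequality imply

$$\|\hat{y} - y\|_2^2 \le 2\langle \hat{y} - y, w \rangle \le 2\|\hat{y} - y\|_L \|w\|_{L^\circ} \le 4\|w\|_{L^\circ},$$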

which completes the proof. ∎
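The bound of Lemma 1 can be checked numerically in a simple special case. The sketch below takes $L = B_1^d$, for which the dual norm $\|w\|_{L^\circ}$ is $\|w\|_\infty$, and uses the well-known sort-based Euclidean projection onto the $\ell_1$ ball as the least squares step. The dimension, noise scale and function names are assumptions chosen for illustration; the bodies used in the paper's mechanisms are more general.

```python
import random

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {z : ||z||_1 <= radius} (sort-based)."""
    if sum(abs(t) for t in v) <= radius:
        return list(v)
    u = sorted((abs(t) for t in v), reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - radius) / k
        if u_k > candidate:
            theta = candidate  # threshold from the largest valid k is kept
    return [(1.0 if t > 0 else -1.0) * max(abs(t) - theta, 0.0) for t in v]

def lemma1_sides(d=5, sigma=0.5, seed=0):
    """Return (||y_hat - y||_2^2, min(4||w||_2^2, 4||w||_inf)) for L = B_1^d."""
    rng = random.Random(seed)
    y = project_l1_ball([rng.uniform(-1.0, 1.0) for _ in range(d)])  # y in L
    w = [rng.gauss(0.0, sigma) for _ in range(d)]
    y_tilde = [a + b for a, b in zip(y, w)]
    y_hat = project_l1_ball(y_tilde)  # least squares estimate over L
    lhs = sum((a - b) ** 2 for a, b in zip(y_hat, y))
    bound = min(4.0 * sum(t * t for t in w), 4.0 * max(abs(t) for t in w))
    return lhs, bound

lhs, bound = lemma1_sides()
# Lemma 1 guarantees lhs <= bound (up to floating point error).
```

When the noise is large, the $\|w\|_{L^\circ}$ term is the smaller of the two, which is the regime in which the projection step pays off.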

### 2.3 Differential Privacy

Following recent work in differential privacy, we model private data as a database $D$ of $n$ rows, where each row of $D$ contains information about an individual. Formally, a database $D$ is a multiset of size $n$ of elements of the universe $U = \{t_1, \ldots, t_N\}$ of possible user types. Our algorithms take as input a histogram $x \in \mathbb{R}^N$ of the database $D$, where the $i$-th component $x_i$ of $x$ encodes the number of individuals in $D$ of type $t_i$. Notice that in this histogram representation, we have $\|x\|_1 = n$ when $D$ is a database of size $n$. Also, two neighboring databases $D$ and $D'$ that differ in the presence or absence of a single individual correspond to two histograms $x$ and $x'$ satisfying $\|x - x'\|_1 = 1$.
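The histogram representation can be sketched in a few lines. The universe, the toy database and the helper names below are hypothetical and serve only to illustrate the correspondence between neighboring databases and histograms at $\ell_1$ distance one.

```python
from collections import Counter

def histogram(database, universe):
    """Histogram x of a database (a multiset of user types) over the universe."""
    counts = Counter(database)
    return [counts[t] for t in universe]

def l1_distance(x, y):
    """The l1 distance between two histograms of the same length."""
    return sum(abs(a - b) for a, b in zip(x, y))

universe = ["a", "b", "c"]      # N = 3 possible user types
D = ["a", "c", "c"]             # n = 3 individuals
D_neighbor = ["a", "c"]         # one individual removed

x = histogram(D, universe)                    # [1, 0, 2]
x_neighbor = histogram(D_neighbor, universe)  # [1, 0, 1]
```

The $\ell_1$ norm of `x` equals the database size, and removing one individual changes exactly one coordinate by one.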

Through most of this paper, we work under the notion of approximate differential privacy. The definition follows.

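###### Definition 1 ([DMNS06, DKM+06b])

A (randomized) algorithm $M$ with input domain $\mathbb{R}^N$ and output range $Y$ is $(\varepsilon,\delta)$-differentially private if for every $n$, every $x, x'$ with $\|x - x'\|_1 \le 1$, and every measurable $S \subseteq Y$, $M$ satisfies

$$\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S] + \delta.$$

When $\delta = 0$, we are in the regime of pure differential privacy.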

An important basic property of differential privacy is that the privacy guarantees degrade smoothly under composition and are not affected by post-processing.

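###### Lemma 2 ([DMNS06, DKM+06b])

Let $M_1$ and $M_2$ satisfy $(\varepsilon_1,\delta_1)$- and $(\varepsilon_2,\delta_2)$-differential privacy, respectively. Then the algorithm which on input $x$ outputs the tuple $(M_1(x), M_2(M_1(x), x))$ satisfies $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-differential privacy.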

#### 2.3.1 Optimality for Linear Queries

In this paper we study the necessary and sufficient error incurred by differentially private algorithms for approximating linear queries. A set of $d$ linear queries is given by a $d \times N$ query matrix or workload $A$; the exact answers to the queries on a histogram $x$ are given by the $d$-dimensional vector $y = Ax$.

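We define error as total squared error. More precisely, for an algorithm $M$ and a subset $X \subseteq \mathbb{R}^N$, we define

$$\mathrm{err}_M(A, X) \triangleq \sup_{x \in X} \mathbb{E}\|Ax - M(A, x)\|_2^2.$$

We also write $\mathrm{err}_M(A, nB_1^N)$ as $\mathrm{err}_M(A, n)$. The optimal error achievable by any $(\varepsilon,\delta)$-differentially private algorithm for queries $A$ and databases of size up to $n$ is

$$\mathrm{opt}_{\varepsilon,\delta}(A, n) \triangleq \inf_M \mathrm{err}_M(A, n),$$

where the infimum is taken over all $(\varepsilon,\delta)$-differentially private algorithms. When no restrictions are placed on the size of the database, the appropriate notion of optimal error is $\mathrm{opt}_{\varepsilon,\delta}(A) \triangleq \sup_n \mathrm{opt}_{\varepsilon,\delta}(A, n)$. Similarly, for an algorithm $M$, the error when database size is not bounded is $\mathrm{err}_M(A) \triangleq \sup_n \mathrm{err}_M(A, n)$. A priori it is not clear that these quantities are necessarily finite, but we will show that this is the case.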

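In order to get tight dependence on the privacy parameter $\varepsilon$ in our analyses, we will use the following relationship between $\mathrm{opt}_{\varepsilon,\delta}(A, n)$ and $\mathrm{opt}_{k\varepsilon,\delta'}(A, n/k)$.

###### Lemma 3

For any $\varepsilon$, any $\delta$, any integer $k$ and for $\delta' = \frac{e^{k\varepsilon} - 1}{e^{\varepsilon} - 1}\delta$,

$$\mathrm{opt}_{\varepsilon,\delta}(A, n) \ge k^2 \, \mathrm{opt}_{k\varepsilon,\delta'}(A, n/k).$$
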
###### Proof.

Let be an -differentially private algorithm achieving . We will use as a black box to construct a -differentially private algorithm which satisfies the error guarantee .

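The algorithm $M'$ on input $x$ satisfying $\|x\|_1 \le n/k$ outputs $\frac{1}{k}M(kx)$. We need to show that $M'$ satisfies $(k\varepsilon,\delta')$-differential privacy. Let $x$ and $x'$ be two neighboring inputs to $M'$, i.e. $\|x - x'\|_1 \le 1$, and let $S$ be a measurable subset of the output range of $M'$. Denote $p_1 = \Pr[M'(x) \in S]$ and $p_2 = \Pr[M'(x') \in S]$. We need to show that $p_1 \le e^{k\varepsilon}p_2 + \delta'$. To that end, define $x_0 = kx$, $x_1 = kx + (x' - x)$, $x_2 = kx + 2(x' - x)$, $\ldots$, $x_k = kx'$. Applying the $(\varepsilon,\delta)$-privacy guarantee of $M$ to each of the pairs of neighboring inputs $(x_0,x_1)$, $(x_1,x_2)$, $\ldots$, $(x_{k-1},x_k)$ in sequence gives us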

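$$p_1 \le e^{k\varepsilon}p_2 + \left(1 + e^{\varepsilon} + \ldots + e^{(k-1)\varepsilon}\right)\delta = e^{k\varepsilon}p_2 + \frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta.$$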

This finishes the proof of privacy for $M'$. It is straightforward to verify that $\mathrm{err}_{M'}(A,n/k) = \frac{1}{k^2}\mathrm{err}_M(A,n)$. ∎
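The scaling reduction used in this proof can be sketched in Python as follows. The function names are illustrative and not from the paper; `mechanism` stands for any black-box mechanism that maps a histogram (a list of floats) to a list of answers. The first two functions compute the $\delta$ term of the proof, once as the geometric sum and once in closed form, so the two can be compared numerically.

```python
import math


def delta_prime_sum(eps, delta, k):
    """(1 + e^eps + ... + e^((k-1) eps)) * delta, as accumulated in the proof."""
    return sum(math.exp(i * eps) for i in range(k)) * delta


def delta_prime_closed_form(eps, delta, k):
    """Closed form (e^(k eps) - 1) / (e^eps - 1) * delta; requires eps != 0."""
    return (math.exp(k * eps) - 1.0) / (math.exp(eps) - 1.0) * delta


def scaled_mechanism(mechanism, k):
    """Wrap a black-box mechanism M into M'(x) = (1/k) * M(k * x)."""
    def m_prime(x):
        answers = mechanism([k * xi for xi in x])
        return [a / k for a in answers]
    return m_prime
```

Because the query answers are linear in the histogram, dividing the output of `mechanism` by `k` divides the additive noise by `k`, which is where the factor $k^2$ in the squared error comes from.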

Above, we state the error and optimal error definitions for histograms $x$, which can be arbitrary real vectors. All our algorithms work in this general setting. Recall, however, that the histograms arising from our definition of databases are integer vectors. Our lower bounds do hold against integer histograms as well. Therefore, defining $\mathrm{err}$ and $\mathrm{opt}$ in terms of integer histograms (i.e. taking ) does not change the asymptotics of our theorems.

#### 2.3.2 Gaussian Noise Mechanism

A basic mechanism for achieving $(\varepsilon,\delta)$-differential privacy for linear queries is adding appropriately scaled independent Gaussian noise to each query. This approach goes back to the work of Blum et al. [BDMN05], predating the definition of differential privacy. Next we define this basic mechanism formally and give a privacy guarantee. The privacy analysis of the Gaussian mechanism in the context of $(\varepsilon,\delta)$-differential privacy was first given in [DKM+06b]. We give the full proof here for completeness.

###### Lemma 4

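Let $A = (a_i)_{i=1}^N$ be a $d \times N$ matrix such that $\|a_i\|_2 \le \sigma$ for all $i$. Then a mechanism which on input $x \in \mathbb{R}^N$ outputs $Ax + w$, where $w \sim N(0, c_{\varepsilon,\delta}^2\sigma^2 I_d)$, satisfies $(\varepsilon,\delta)$-differential privacy.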

###### Proof.

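Let $C = c_{\varepsilon,\delta}$ and let $p$ be the probability density function of $N(0, C^2\sigma^2 I_d)$. Let also $K = AB_1$, so $\|x - x'\|_1 \le 1$ implies $A(x - x') \in K$. Define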

$$D_v(w) \triangleq \ln\frac{p(w)}{p(w+v)}.$$

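We will prove that when $w \sim N(0, C^2\sigma^2 I_d)$, for all $v \in K$, $\Pr[|D_v(w)| > \varepsilon] \le \delta$. This suffices to prove $(\varepsilon,\delta)$-differential privacy. Indeed, let the algorithm output $Ax + w$ and fix any $x'$ s.t. $\|x - x'\|_1 \le 1$. Let $v = A(x - x')$ and $S = \{w : |D_v(w)| > \varepsilon\}$. For any measurable $T \subseteq \mathbb{R}^d$ we have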

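$$\begin{aligned}
\Pr[Ax + w \in T] &= \Pr[w \in T - Ax] \\
&= \int_{S \cap (T - Ax)} p(w)\,dw + \int_{\bar{S} \cap (T - Ax)} p(w)\,dw \\
&\le \delta + e^{\varepsilon}\int_{\bar{S} \cap (T - Ax')} p(w)\,dw \\
&\le \delta + e^{\varepsilon}\Pr[w \in T - Ax'] = \delta + e^{\varepsilon}\Pr[Ax' + w \in T].
\end{aligned}$$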

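We fix an arbitrary $v \in K$ and proceed to prove $|D_v(w)| \le \varepsilon$ with probability at least $1 - \delta$. We will first compute $\mathbb{E}D_v(w)$ and then apply a tail bound. Recall that $p(w) \propto \exp\left(-\|w\|_2^2/(2C^2\sigma^2)\right)$. Notice also that, since $v \in K$ can be written as $\sum_{i=1}^N \alpha_i a_i$ where $\sum_{i=1}^N |\alpha_i| \le 1$, we have $\|v\|_2 \le \sigma$. Then we can write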

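$$\mathbb{E}D_v(w) = \mathbb{E}\,\frac{\|v + w\|_2^2 - \|w\|_2^2}{2C^2\sigma^2} = \mathbb{E}\,\frac{\|v\|_2^2 + 2v^Tw}{2C^2\sigma^2} \le \frac{1}{2C^2}.$$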

Note that to bound we simply need to bound from above and below. Since , we can apply a Chernoff bound and we get

$$\Pr\left[|v^Tw| > \frac{1}{C}\sqrt{2\ln(1/\delta)}\right] \le \delta.$$

Therefore, with probability ,

$$\frac{1}{2C} - \frac{\sqrt{2\ln(1/\delta)}}{C} \le D_v(w) \le \frac{1}{2C} + \frac{\sqrt{2\ln(1/\delta)}}{C}.$$

Substituting completes the proof. ∎
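A minimal Python sketch of the basic Gaussian mechanism is given below, using only the standard library. Two things in it are assumptions rather than statements from this section: the constant returned by `noise_scale`, which is chosen so that $(1/2 + \sqrt{2\ln(1/\delta)})/C \le \varepsilon$ holds for the final bound in the proof, and the choice to scale the noise by the largest column norm of the query matrix. All function and parameter names are illustrative.

```python
import math
import random


def noise_scale(eps, delta):
    """Assumed constant C = (1 + sqrt(2 ln(1/delta))) / eps (not taken from this section)."""
    return (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps


def gaussian_mechanism(A, x, eps, delta, rng=random):
    """Return Ax + w with independent Gaussian noise on every query.

    A is a list of d rows of length N; x is a histogram of length N.
    The noise standard deviation is C times the largest column l2 norm of A.
    """
    d, n_cols = len(A), len(x)
    # Largest l2 norm over the columns of A (the sensitivity of x -> Ax).
    sigma = max(math.sqrt(sum(A[r][c] ** 2 for r in range(d)))
                for c in range(n_cols))
    std = noise_scale(eps, delta) * sigma
    exact = [sum(A[r][c] * x[c] for c in range(n_cols)) for r in range(d)]
    return [y + rng.gauss(0.0, std) for y in exact]
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible, which is convenient for testing but should not be done in a deployed private mechanism.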

The following corollary is a useful geometric generalization of Lemma 4.

###### Corollary 8

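Let $A = (a_i)_{i=1}^N$ be a $d \times N$ matrix of rank $d$ and let $K = AB_1$. Let $E = FB_2^d$ ($F$ is a linear map) be an ellipsoid containing $K$. Then a mechanism that outputs $Ax + Fw$ where $w \sim N(0, c_{\varepsilon,\delta}^2 I_d)$ satisfies $(\varepsilon,\delta)$-differential privacy.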

###### Proof.

Since $K$ is full dimensional (by $\operatorname{rank} A = d$) and $E$ contains $K$, $E$ is full dimensional as well, and, therefore, $F$ is an invertible linear map. Define $G = F^{-1}A$. For each column $g_i$ of $G$, we have $\|g_i\|_2 \le 1$. Therefore, by Lemma 4, a mechanism that outputs $Gx + w$ (where $w$ is distributed as in the statement of the corollary) satisfies $(\varepsilon,\delta)$-differential privacy. Therefore, $F(Gx + w) = Ax + Fw$ is $(\varepsilon,\delta)$-differentially private by the post-processing property of differential privacy. ∎
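To make the geometric picture concrete, here is a small Python sketch for the special case of an axis-aligned ellipsoid, i.e. a diagonal linear map with positive diagonal entries `f`. This restriction, the function names, and the caller-supplied per-coordinate noise standard deviation `c` are all assumptions made for illustration. Since the symmetric convex hull of the columns of $A$ lies in the ellipsoid exactly when every column does, the containment check only looks at the columns.

```python
import random


def columns_in_ellipsoid(A, f, tol=1e-12):
    """Check that every column a of A satisfies sum_r (a[r] / f[r])^2 <= 1."""
    d, n_cols = len(A), len(A[0])
    return all(
        sum((A[r][c] / f[r]) ** 2 for r in range(d)) <= 1.0 + tol
        for c in range(n_cols)
    )


def ellipsoid_gaussian_mechanism(A, x, f, c, rng=random):
    """Return Ax + Fw for F = diag(f) and w with i.i.d. N(0, c^2) coordinates."""
    if not columns_in_ellipsoid(A, f):
        raise ValueError("ellipsoid does not contain the columns of A")
    d, n_cols = len(A), len(x)
    exact = [sum(A[r][col] * x[col] for col in range(n_cols)) for r in range(d)]
    # Coordinate r of the noise is stretched by the axis length f[r].
    return [exact[r] + f[r] * rng.gauss(0.0, c) for r in range(d)]
```

With a diagonal map the noise stays independent across queries but has a different scale per query; a general linear map would produce correlated noise.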

We present a composition theorem, specific to composing Gaussian noise mechanisms. We note that a similar composition result in a much more general setting but with slightly inferior dependence on the parameters is proven in [DRV10].

###### Corollary 9

Let be vector spaces of respective dimensions , such that , and . Let be a matrix of rank and let . Let be the projection matrix for and let be an ellipsoid such that . Then the mechanism that outputs where for each , , satisfies -differential privacy.

###### Proof.

Let . Since the random variables are pairwise independent Gaussian random variables, and has covariance matrix , we have that is a Gaussian random variable with covariance , where . By Corollary 8, it is sufficient to show that the ellipsoid contains . By convex duality, this is equivalent to showing , which is in turn equivalent to . Recalling that and , we need to establish

$$\forall x \in \mathbb{R}^d,\ \forall j \in [N]:\quad \langle a_j, x\rangle^2 \le x^TGG^Tx. \tag{4}$$

We proceed by establishing (4). Since for all , , by duality and the same reasoning as above, we have that for all and , . Therefore, by the Cauchy-Schwarz inequality,

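$$\begin{aligned}
\langle a_j, x\rangle^2 &= \left(\sum_{i=1}^k \langle \Pi_i a_j, x\rangle\right)^2 \\
&\le k\sum_{i=1}^k \langle \Pi_i a_j, x\rangle^2 \\
&\le k\sum_{i=1}^k x^TF_iF_i^Tx = x^TGG^Tx.
\end{aligned}$$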

This completes the proof. ∎

#### 2.3.3 Noise Lower Bounds

We will make extensive use of a lower bound on the noise complexity of $(\varepsilon,\delta)$-differentially private mechanisms in terms of combinatorial discrepancy. First we need to define the notion of hereditary $\alpha$-discrepancy:

$$\mathrm{herdisc}_\alpha(A,n) \triangleq \max_{S \subseteq [N]:\,|S| \le n}\ \min_{\substack{x \in \{-1,0,+1\}^S \\ \|x\|_1 \ge \alpha|S|}} \|(A|_S)x\|_2.$$

We denote . An equivalent notation is . When the $\ell_2$ norm is substituted with $\ell_\infty$, we have the classical notion of hereditary discrepancy, here denoted .
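For very small matrices the definition of hereditary $\alpha$-discrepancy can be evaluated by exhaustive search, which may help in checking examples by hand. The sketch below is illustrative only: the function name is hypothetical, it assumes $0 < \alpha \le 1$ so that every column subset has a feasible coloring, and its running time is exponential in the number of columns.

```python
import itertools
import math


def herdisc_alpha(A, n, alpha):
    """Brute-force hereditary alpha-discrepancy of A (list of d rows, N columns).

    Maximizes over nonempty column subsets S with |S| <= n, and minimizes
    ||(A|_S) x||_2 over x in {-1, 0, +1}^S with ||x||_1 >= alpha * |S|.
    """
    d, n_cols = len(A), len(A[0])
    best = 0.0
    for size in range(1, min(n, n_cols) + 1):
        for S in itertools.combinations(range(n_cols), size):
            smallest = math.inf
            for signs in itertools.product((-1, 0, 1), repeat=size):
                if sum(abs(s) for s in signs) < alpha * size:
                    continue  # coloring is too sparse to be feasible
                norm = math.sqrt(sum(
                    sum(A[r][c] * s for c, s in zip(S, signs)) ** 2
                    for r in range(d)
                ))
                smallest = min(smallest, norm)
            best = max(best, smallest)
    return best
```

For the $2 \times 2$ identity matrix with $\alpha = 1$ every coloring of both columns has norm $\sqrt{2}$, while the single row $(1, 1)$ can be balanced exactly by the coloring $(+1, -1)$, so only single-column subsets contribute.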

Next we present the lower bound, which is a simple extension of the discrepancy lower bound on noise recently proved by Muthukrishnan and Nikolov [MN12].

###### Theorem 10 ([MN12])

Let $A$ be a $d \times N$ real matrix. For any constant $\alpha$ and sufficiently small constant $\varepsilon$ and $\delta$,

$$\mathrm{opt}_{\varepsilon,\delta}(A,n) = \Omega(1)\,\mathrm{herdisc}_\alpha(A,n)^2.$$

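We further develop two lower bounds for $\mathrm{herdisc}_\alpha(A,n)$ which are more convenient to work with. The first lower bound uses spectral techniques. Observe first that, since the $\ell_2$-norm of any vector does not increase under projection, we have $\mathrm{herdisc}_\alpha(A,n) \ge \mathrm{herdisc}_\alpha(\Pi A,n)$ for any projection matrix $\Pi$. Furthermore, recall that for a matrix $B$, $\sigma_{\min}(B) = \min_{x:\|x\|_2 = 1}\|Bx\|_2$. For any $x \in \{-1,0,1\}^S$ satisfying $\|x\|_1 \ge \alpha|S|$, we have $\|x\|_2^2 \ge \alpha|S|$. Therefore,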

$$\mathrm{herdisc}_\alpha(A,n)^2 \ge \max_{S \subseteq [N]:\,|S| \le n} \alpha|S|\,\sigma_{\min}^2(A|_S). \tag{5}$$
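For completeness, the chain of inequalities behind (5) can be written out as follows. This is a sketch of the omitted step, using only that $\|Bx\|_2 \ge \sigma_{\min}(B)\|x\|_2$ for every matrix $B$ and that $\|x\|_2^2 = \|x\|_1$ for vectors with entries in $\{-1,0,+1\}$. For any $S \subseteq [N]$ with $|S| \le n$ and any $x \in \{-1,0,+1\}^S$ with $\|x\|_1 \ge \alpha|S|$,

$$\|(A|_S)x\|_2^2 \;\ge\; \sigma_{\min}^2(A|_S)\,\|x\|_2^2 \;=\; \sigma_{\min}^2(A|_S)\,\|x\|_1 \;\ge\; \alpha|S|\,\sigma_{\min}^2(A|_S).$$

Taking the minimum over all feasible $x$ and then the maximum over all such $S$ gives (5).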

Let’s define

 specLB(A,n)≜