The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
Privacy-preserving data analysis is a rising challenge in contemporary statistics, as the privacy guarantees of statistical methods are often achieved at the expense of accuracy. In this paper, we investigate the tradeoff between statistical accuracy and privacy in mean estimation and linear regression, under both the classical low-dimensional and modern high-dimensional settings. A primary focus is to establish minimax optimality for statistical estimation with the (ε, δ)-differential privacy constraint. To this end, we find that classical lower bound arguments fail to yield sharp results, and new technical tools are called for.
By refining the “tracing adversary” technique for lower bounds in the theoretical computer science literature, we formulate a general lower bound argument for minimax risks with differential privacy constraints, and apply this argument to high-dimensional mean estimation and linear regression problems. We also design computationally efficient algorithms that attain the minimax lower bounds up to a logarithmic factor. In particular, for high-dimensional linear regression, a novel private iterative hard thresholding pursuit algorithm is proposed, based on a privately truncated version of stochastic gradient descent. The numerical performance of these algorithms is demonstrated by simulation studies and applications to real data containing sensitive information, for which privacy-preserving statistical methods are necessary.
Running title: The Cost of Privacy. The research was supported in part by NSF grant DMS-1712735 and NIH grants R01-GM129781 and R01-GM123056.
URL: https://linjunz.github.io/
MSC subject classifications: Primary 62F30; secondary 62F12, 62J05.
Keywords: High-dimensional data; differential privacy; mean estimation; linear regression; minimax optimality.
With the unprecedented availability of datasets containing sensitive personal information, there are increasing concerns that statistical analysis of such datasets may compromise individual privacy. These concerns give rise to statistical methods that provide privacy guarantees at the cost of statistical accuracy, but there has been very limited understanding of the optimal tradeoff between accuracy and privacy in many important problems, which we investigate in this paper.
A rigorous definition of privacy is a prerequisite for such an understanding. Differential privacy, introduced in Dwork et al. , is arguably the most widely adopted definition of privacy in statistical data analysis. The promise of a differentially private algorithm is protection of any individual’s privacy from an adversary who has access to the algorithm’s output and, in some cases, even the rest of the data.
Differential privacy has gained significant attention in the machine learning communities over the past few years [17, 1, 20, 14] and found its way into real world applications developed by Google , Apple , Microsoft , and the U.S. Census Bureau .
A usual approach to developing differentially private algorithms is to perturb the output of a non-private algorithm by random noise. When the observations are continuous, differential privacy can be guaranteed by adding Laplace/Gaussian noise to the non-private output . For discrete data, differential privacy can be achieved by adding Gumbel noise to utility score functions (also known as the exponential mechanism ). Naturally, the processed output suffers from some loss of accuracy, which has been observed and studied in the literature; see, for example, Wasserman and Zhou , Smith , Lei , Bassily et al. , Dwork et al. . The goal of this paper is to provide a quantitative characterization of the tradeoff between differential privacy guarantees and statistical accuracy, under the statistical minimax framework. Specifically, we consider this problem for mean estimation and linear regression models, in both classical and high-dimensional settings, with the (ε, δ)-differential privacy constraint, which is formally defined as follows.
Definition 1 (Differential Privacy ).
A randomized algorithm M is (ε, δ)-differentially private if and only if for every pair of adjacent datasets X and X′, and for any measurable set A,
P(M(X) ∈ A) ≤ e^ε · P(M(X′) ∈ A) + δ,
where we say two datasets X and X′ are adjacent if and only if they differ in exactly one entry.
According to the definition, the two parameters ε and δ control the level of privacy against an adversary who attempts to detect the presence of a certain subject in the sample. Roughly speaking, ε is an upper bound on the amount of influence an individual’s record has on the information released, and δ is the probability that this bound fails to hold; the privacy constraint therefore becomes more stringent as ε and δ tend to 0.
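To make these roles concrete, the following Python check (our own illustration, not from the paper) verifies that for the Laplace mechanism applied to a counting query, the privacy loss at every output value is bounded by ε in absolute value, so the guarantee holds with δ = 0.

```python
# Illustration (not from the paper): for the Laplace mechanism on a counting
# query, the log-likelihood ratio between output distributions on adjacent
# datasets is bounded by epsilon, matching the definition with delta = 0.
import math

def laplace_pdf(x, loc, scale):
    return math.exp(-abs(x - loc) / scale) / (2 * scale)

epsilon = 0.5
sensitivity = 1.0                  # a counting query changes by at most 1
scale = sensitivity / epsilon

count_x, count_xprime = 10.0, 11.0  # adjacent datasets differ in one record

# The privacy loss at any output value z is at most epsilon in magnitude.
for z in [-5.0, 0.0, 10.5, 25.0]:
    loss = math.log(laplace_pdf(z, count_x, scale)
                    / laplace_pdf(z, count_xprime, scale))
    assert abs(loss) <= epsilon + 1e-12
```

The bound follows because the exponent of the Laplace density is 1-Lipschitz in its location, scaled by 1/scale = ε/Δ.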
Sharp lower bounds based on tracing attacks: We establish the necessary cost of privacy by providing sharp minimax risk lower bounds, under the (ε, δ)-differential privacy constraint, for high-dimensional sparse mean estimation and for linear regression in both low-dimensional and high-dimensional settings. To this end, a key technical tool is presented in Theorem 2.1, which reduces lower bounding the minimax risk to designing a tracing adversary that aims to detect the presence of an individual in a dataset using the output of a differentially private statistic computed from the dataset. This reduction to tracing adversaries was first introduced in the theoretical computer science literature to provide sample complexity lower bounds for attaining a given level of in-sample accuracy, and was later applied to establishing lower bounds for estimating population quantities such as discrete and Gaussian mean vectors (see also [19, 27]). Compared to these existing lower bounds, as shown in Section 2.3, our refinement leads to tight dependence on the privacy parameters.
Rate-optimal differentially private algorithms: For mean estimation and linear regression problems, we construct efficient algorithms that attain matching upper bounds up to logarithmic factors. The high-dimensional mean estimation algorithm is based on several differentially private subroutines, such as the Gaussian mechanism, reporting the noisy top coordinates, and their modifications. For high-dimensional linear regression, we propose a novel private iterative hard thresholding pursuit algorithm, based on a privately truncated version of stochastic gradient descent. Such a private truncation step effectively enforces the sparsity of the resulting estimator and leads to optimal control of the privacy cost (see more details in Section 4.2). To the best of our knowledge, these algorithms are the first to achieve the minimax optimal rates of convergence in high-dimensional statistical estimation problems with the (ε, δ)-differential privacy guarantee.
In theoretical computer science, it has been shown that, under strong conditions on the privacy parameters, some point estimators attain the statistical convergence rates and hence privacy can be gained for free. [4, 18, 42] proposed differentially private algorithms for convex empirical risk minimization, principal component analysis, and high-dimensional sparse regression.
The tracing adversary argument for lower bounds originated in the theoretical computer science literature [6, 40]. Early works in this direction were primarily concerned with the accuracy of releasing in-sample quantities, such as -way marginals, with differential privacy constraints. Some more recent works [19, 27] applied the idea to obtain lower bounds for estimating population quantities such as discrete and Gaussian mean vectors. We shall compare existing lower bounds for Gaussian mean estimation with our results in Section 2.3.
In the statistics literature, there has also been a series of works studying differential privacy in the context of statistical estimation.  observed that ε-local differentially private schemes seem to yield slower convergence rates than the optimal minimax rates in general;  developed a framework for statistical minimax rates with the ε-local privacy constraint; in addition,  showed minimax optimal rates of convergence under ε-local differential privacy and exhibited a mechanism, based on randomized response, that is minimax optimal for nearly linear functionals. However, ε-local privacy is a much stronger notion of privacy than (ε, δ)-differential privacy and is hardly compatible with high-dimensional problems . As we shall see in this paper, the cost of (ε, δ)-differential privacy in statistical estimation behaves quite differently from that of ε-local privacy.
Organization of the paper
The rest of the paper is organized as follows. Section 2 describes a general technical tool for lower bounding the minimax risk with differential privacy constraint. This technical tool is then applied in Section 3 to the high-dimensional mean estimation problem, where both the minimax lower bound and an algorithm with a matching upper bound are obtained. Section 4 further applies the general lower bound technique to investigate the minimax lower bounds of the linear regression problem with differential privacy constraint, in both low-dimensional and high-dimensional settings. Matching upper bounds are also obtained by providing novel differentially private algorithms and analyzing their risks. The results together show that our bounds are rate-optimal up to logarithmic factors. Simulation studies are carried out in Section 5 to show the advantages of our proposed algorithms. Section 6 applies our algorithms to real data sets with potentially sensitive information that warrants privacy-preserving methods. Section 7 discusses extensions to other statistical estimation problems with privacy constraints. The proofs are given in Section 8 and the supplementary materials .
Definitions and notation
We conclude this section by introducing notation that will be used in the rest of the paper. For a positive integer n, [n] denotes the set {1, 2, …, n}. For a vector v, we use ‖v‖_p and ‖v‖_0 to denote the vector ℓ_p norm and the ℓ_0 “norm”, respectively, where the ℓ_0 “norm” counts the number of nonzero entries in a vector. For a set S, we use S^c to denote its complement, and 1(S) denotes the indicator function of S. For an index set T and a vector v, v_T denotes the |T|-dimensional vector consisting of the entries v_j with j ∈ T. The Frobenius norm of a matrix A is denoted by ‖A‖_F, and the spectral norm of A is ‖A‖_2. In addition, we use λ_min(A) and λ_max(A) to denote the smallest and the largest eigenvalues of A. For two sequences of positive numbers a_n and b_n, a_n ≲ b_n means that a_n ≤ C b_n for some constant C > 0 and all n. Similarly, a_n ≳ b_n means that a_n ≥ c b_n for some constant c > 0 and all n, and a_n ≍ b_n if a_n ≲ b_n and a_n ≳ b_n. We use c, c′, C, C′ to denote generic constants which may vary from place to place.
2 A General Lower Bound for Minimax Risk with Differential Privacy
Sections 2.1 and 2.2 present a general minimax lower bound technique for statistical estimation problems with differential privacy constraint. As an application, we use this technique to establish a tight lower bound for differentially private mean estimation in Section 2.3.
The lower bound technique is based on a tracing adversary that attempts to detect the presence of an individual data entry in a data set, given knowledge of an estimator computed from that data set. If one can construct a tracing adversary that is effective at this task whenever the estimator is accurate, an argument by contradiction yields a lower bound on the accuracy of differentially private estimators: if a differentially private estimator computed from a data set were sufficiently accurate, the tracing adversary would be able to determine the presence of an individual data entry in the data set, contradicting the differential privacy guarantee. In other words, the privacy guarantee and the tracing adversary together ensure that a differentially private estimator cannot be “too accurate”.
This tracing adversary argument, originally proposed by , has proven to be a powerful tool for obtaining lower bounds in the context of releasing sample quantities [40, 41] and for Gaussian mean estimation . We shall refine the tracing adversary argument to formulate a minimax lower bound technique that will be applied to a series of statistical estimation problems later in the paper.
2.1 Background and problem formulation
Let denote a family of distributions supported on a set , and let denote a population quantity of interest. The statistician has access to a data set of i.i.d. samples, , drawn from a statistical model . We denote the empirical distribution over by .
With the data, our goal is to estimate a population parameter by an estimator that belongs to , the collection of all (ε, δ)-differentially private procedures. The performance of the estimator is measured by its distance to the truth : formally, let be a metric induced by a norm on , namely , and let be a loss function that is monotonically increasing. This paper studies the minimax risk for differentially private estimation of the population parameter:
In this paper, the privacy parameters are assumed to satisfy ε = O(1) and δ = o(1/n). This is essentially the most permissive setting under which (ε, δ)-differential privacy is a nontrivial guarantee:  shows that δ = o(1/n) is essentially the weakest privacy requirement that is still meaningful.
2.2 Lower bound by tracing
Consider a tracing adversary that outputs IN if it determines that a certain sample is in the data set after seeing , and outputs OUT otherwise. We define , the index set of samples that are determined as IN by the adversary . A survey of tracing adversaries and their relationship with differential privacy can be found in  and the references therein.
Our general lower bound technique requires some regularity conditions for and : we assume that there exists a distribution P₀ in the class such that, for every P in the class and every t ∈ [0, 1], the mixture (1 − t)P₀ + tP remains in the class (star-convexity), and θ((1 − t)P₀ + tP) = (1 − t)θ(P₀) + tθ(P) (linearity).
The star-convexity condition is weaker than convexity and holds for many commonly seen distribution classes, such as sub-Gaussian distributions. A similar star-convexity condition was also assumed in  when proving lower bounds with differential privacy constraints. The linearity condition is natural for the moment-based estimands studied in this paper, namely the mean vector and the regression coefficient. For their technical roles, we refer interested readers to Section 8.1.
The following theorem shows that minimax lower bounds for statistical estimation problems with privacy constraint can be constructed if there exist effective tracing adversaries:
Assume that and satisfy the regularity conditions described above. Suppose there exist a distribution and a tracing adversary such that the following conditions are satisfied for every and with .
Soundness. for every , where is an adjacent dataset of with replaced by an independent copy .
Closure under resampling. The empirical distribution over , , belongs to with probability at least .
If and for some , then for and sufficiently large, we have
The detailed proof can be found in Section 8.1.
Completeness and soundness roughly correspond to “true positive” and “false positive” rates in classification: completeness requires the adversary to return a nontrivial result when its input is accurate relative to its non-private counterpart; soundness guarantees that an individual is unlikely to be identified as IN if the estimator supplied to the adversary is independent of that individual. When a tracing adversary satisfies these properties, Theorem 2.1 conveniently leads to a minimax risk lower bound; that is, Theorem 2.1 reduces constructing minimax risk lower bounds to finding complete and sound tracing adversaries. Closure under resampling is a technical condition needed for obtaining the correct dependence on the privacy parameters in the lower bounds. For the two classes of distributions considered in this paper, namely bounded distributions and sub-Gaussian distributions, it is straightforward to find members of these classes that satisfy the condition.
Let be the class of all -dimensional distributions with bounded support. The closure under resampling condition is satisfied by every : the empirical distribution of a sample drawn from remains bounded.
Let be the class of one-dimensional zero-mean sub-Gaussian () distributions. As long as the sample size , the closure under resampling condition is satisfied by every that is sub-Gaussian with parameter at most .
Let be a sub-Gaussian distribution and consider . By sub-Gaussianity, we have for each , where is a universal constant. Then, by Chebyshev’s inequality, we have
as long as .
In this paper, we shall return to these examples to help us verify closure under resampling.
The connection between tracing attacks and minimax lower bounds has long been observed in differential privacy community [6, 19, 40]. While prior works on tracing attacks and lower bounds primarily focused on estimating -way marginals, Theorem 2.1 provides an abstract lower bound statement that is potentially applicable to more general statistical estimation problems.
In the next section, we apply the abstract formulation of Theorem 2.1 to the concrete setting of Gaussian mean estimation. The results, by comparison with existing lower bounds for Gaussian mean estimation in the differential privacy literature, demonstrate the utility of the general lower bound tool.
2.3 A first application: private mean estimation in the classical setting
Consider the -dimensional sub-Gaussian distribution family :
where is the mean of , and denotes the th standard basis vector of .
Following the notation introduced in Section 2.1, and . Further we take and , so that our risk function is simply the error. The minimax risk is then denoted by
To establish a lower bound, let be the product distribution supported on with the mean vector drawn from : for where , and
The associated tracing adversary is defined as
where is a fresh independent draw from .
The chosen distribution and tracing adversary satisfy the properties needed to establish a lower bound.
If , for every and , we have
, where is an adjacent dataset of with replaced by .
belongs to with probability one.
Intuitively, this adversary is constructed as follows. Without privacy constraints, a natural estimator for is the sample mean . When does not belong to , is a sum of independent zero-mean random variables and we have . When belongs to , we will have , and is more likely to output IN than OUT.
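The intuition above can be checked numerically. The sketch below is our own illustration (the constants, the reference draw z, and the batch of OUT individuals are illustrative, not the paper's exact construction): the tracing statistic is larger on average for individuals inside the sample than for independent draws from the same distribution.

```python
# Illustration: the inner product between (individual - reference draw) and
# the sample mean separates IN individuals from OUT individuals on average.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2000
mu = rng.choice([-1.0, 1.0], size=d) * 0.5   # an illustrative mean vector

X = mu + rng.standard_normal((n, d))         # the sample the estimator sees
mean_hat = X.mean(axis=0)                    # non-private estimator

z = mu + rng.standard_normal(d)              # fresh reference draw
X_out = mu + rng.standard_normal((n, d))     # independent "OUT" individuals

stats_in = (X - z) @ mean_hat                # tracing statistics, IN
stats_out = (X_out - z) @ mean_hat           # tracing statistics, OUT
# On average the IN statistics exceed the OUT statistics by roughly d / n,
# because each x_i contributes weight 1/n to mean_hat.
```

With d much larger than n, this gap dominates the fluctuations, which is exactly the regime in which tracing succeeds.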
In view of Theorem 2.1, and ; it follows that
Combining with the well-known statistical minimax lower bound (see for example, ), we arrive at the minimax lower bound result for differentially private mean estimation.
Let denote the collection of all (ε, δ)-differentially private algorithms, and let be an i.i.d. sample drawn from . Suppose that for some and . Then, for sufficiently large ,
The theorem is proved in Section A.1 of the supplementary materials .
The minimax lower bound characterizes the cost of privacy in the mean estimation problem: the cost of privacy dominates the statistical risk when .
This minimax lower bound matches the sample complexity lower bound in , which considered the deterministic worst case instead of the i.i.d. statistical setting.  studied the Gaussian mean estimation problem but did not obtain a tight bound with respect to the privacy parameters. Theorem 2.2 improves the lower bound in , and the improvement is shown to be sharp up to logarithmic factors by Theorem 3.3.
3 Privacy Cost of High-dimensional Mean Estimation
In this section and the subsequent Section 4, we consider the high-dimensional setting where and the population parameters of interest, such as the mean vector or the regression coefficient , are sparse. In each statistical problem investigated, we present a minimax risk lower bound with differential privacy constraint, as well as a procedure with differential privacy guarantee that attains the lower bound up to factor(s) of .
3.1 Private high-dimensional mean estimation
We first consider the problem of estimating the sparse mean vector of a -dimensional sub-Gaussian distribution, where can possibly be much larger than the sample size . We denote the parameter space of interest by
where the sparsity level is controlled by the parameter .
For the lower bound, we construct a distribution as follows: the mean vector is obtained by first generating an i.i.d. sample from a distribution such that , keeping the top values, and setting the other coordinates to 0. Given the mean vector, follows a product distribution with each coordinate’s distribution defined as follows:
For sparse statistical models, the design and analysis of tracing adversaries is closely connected to the problem of differentially private selection, as we would like to characterize the difficulty of estimating the index set of non-zero coordinates of . For more background on differentially private selection, we refer interested readers to  and the references therein.
Let us consider the tracing adversary proposed by :
where is an independent draw from , and
Given computed from a data set , this tracing adversary attempts to identify whether an individual belongs to , by calculating the difference of and over those coordinates where has a large value. If belongs to , the former should be correlated with and is likely to be larger than the latter.
Formally, the tracing adversary is complete and sound under appropriate sample size constraint:
If , for every and , we have
, where is an adjacent data set of with replaced by .
belongs to with probability one.
The lemma is proved in Section B.3 of the supplementary materials .
In conjunction with our general lower bound result Theorem 2.1, we have
Let denote the collection of all (ε, δ)-differentially private algorithms, and let be an i.i.d. sample drawn from . Suppose that for some , , and . Then, for sufficiently large ,
The theorem is proved in Section A.3 of the supplementary materials .
The first term is the statistical minimax lower bound of sparse mean estimation (see, for example, ), and the second term is due to the privacy constraint. Comparing the two terms shows that, in high-dimensional sparse mean estimation, the cost of differential privacy is significant when
In the next section, we present a differentially private procedure that attains this convergence rate up to a logarithmic factor.
3.2 Rate-optimal procedures
The rate-optimal algorithms in this paper utilize some classical subroutines in the differential privacy literature, such as the Laplace and Gaussian mechanisms and reporting the noisy maximum of a vector. Before describing our rate-optimal algorithms in detail, it is helpful to review some relevant results, which will also serve as the building blocks of the differentially private linear regression methods in Section 4.
Basic differentially private procedures
It is frequently the case that differential privacy can be attained by adding properly scaled noises to the output of a non-private algorithm. Among the most prominent examples are the Laplace and Gaussian mechanisms.
The Laplace and Gaussian mechanisms
As the name suggests, the Laplace and Gaussian mechanisms achieve differential privacy by perturbing an algorithm with Laplace and Gaussian noises respectively. The scale of such noises is determined by the sensitivity of the algorithm:
For any algorithm M mapping a dataset to ℝ^d, the ℓ_p-sensitivity of M is defined as Δ_p(M) = max ‖M(X) − M(X′)‖_p, where the maximum is taken over all pairs of adjacent datasets X and X′.
For algorithms with finite ℓ₁-sensitivity, the differential privacy guarantee can be attained by adding noise sampled from a Laplace distribution.
Lemma 3.2.1 (The Laplace mechanism ).
For any algorithm M mapping a dataset to ℝ^d such that Δ₁(M) < ∞, the Laplace mechanism, given by
M_L(X) = M(X) + (w₁, w₂, …, w_d),
where w₁, w₂, …, w_d is an i.i.d. sample drawn from the Laplace distribution with scale Δ₁(M)/ε, achieves ε-differential privacy.
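As a concrete illustration, here is a minimal Python sketch of the Laplace mechanism applied to the mean of records bounded in [0, 1]^d, whose ℓ₁-sensitivity is d/n; the function name and data are our own, not the paper's.

```python
# Illustration: Laplace mechanism for a bounded-data mean.
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng):
    """Add i.i.d. Laplace(l1_sensitivity / epsilon) noise to each coordinate."""
    scale = l1_sensitivity / epsilon
    return value + rng.laplace(0.0, scale, size=np.shape(value))

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(1000, 5))    # records bounded in [0, 1]^5
n, d = X.shape
# Replacing one record moves each coordinate of the mean by at most 1/n,
# so the l1-sensitivity of the sample mean is d/n.
private_mean = laplace_mechanism(X.mean(axis=0), d / n, epsilon=1.0, rng=rng)
```

With n = 1000 and ε = 1, the added noise has scale 0.005 per coordinate, negligible next to the sampling error of the mean.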
Similarly, adding Gaussian noise to algorithms with finite ℓ₂-sensitivity guarantees (ε, δ)-differential privacy.
Lemma 3.2.2 (The Gaussian mechanism ).
For any algorithm M mapping a dataset to ℝ^d such that Δ₂(M) < ∞, the Gaussian mechanism, given by
M_G(X) = M(X) + (w₁, w₂, …, w_d),
where w₁, w₂, …, w_d is an i.i.d. sample drawn from N(0, 2 log(1.25/δ) · Δ₂(M)²/ε²), achieves (ε, δ)-differential privacy.
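Analogously, the Gaussian mechanism can be sketched with the standard calibration σ = √(2 log(1.25/δ)) · Δ₂/ε; the setup below is again our own illustration.

```python
# Illustration: Gaussian mechanism for a bounded-data mean.
import math
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng):
    """Add N(0, sigma^2) noise with sigma = sqrt(2 log(1.25/delta)) * Delta_2 / epsilon."""
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(1000, 5))
n, d = X.shape
# Replacing one record moves the mean by at most sqrt(d)/n in l2 norm.
private_mean = gaussian_mechanism(X.mean(axis=0), math.sqrt(d) / n,
                                  epsilon=1.0, delta=1e-5, rng=rng)
```

Unlike the Laplace mechanism, the Gaussian mechanism only yields (ε, δ)-privacy with δ > 0, but its noise scales with the ℓ₂-sensitivity, which is smaller in high dimensions.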
An important application of these mechanisms is differentially private selection of the maximum/minimum, which also plays a crucial role in our high-dimensional mean estimation algorithm. Next we review some algorithms for differentially private selection, to provide some concrete examples and prepare us for stating the main algorithms.
Differentially private selection
Selecting the maximum (in absolute value) coordinate of a vector is a straightforward application of the Laplace mechanism, as follows:
Algorithm 1: PrivateMax:
1: Sample .
2: For , compute the noisy version .
3: Return and , where is an independent draw from .
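A hedged Python sketch of this idea follows; the exact budget split between the selection step and the release of the noisy value is our own illustrative choice, not necessarily the calibration used in Algorithm 1.

```python
# Illustrative PrivateMax-style selection: noisy argmax of |v|, plus a noisy
# copy of the selected entry, with the budget split between the two stages.
import numpy as np

def private_max(v, sensitivity, epsilon, rng):
    """Return a noisy argmax index of |v| and a noisy value of that entry."""
    scale = 2 * sensitivity / epsilon          # half the budget per stage
    noisy = np.abs(v) + rng.laplace(0.0, scale, size=v.shape)
    j = int(np.argmax(noisy))
    return j, v[j] + rng.laplace(0.0, scale)

rng = np.random.default_rng(3)
v = np.array([0.1, -0.2, 5.0, 0.3])
j, val = private_max(v, sensitivity=0.01, epsilon=1.0, rng=rng)
```

When the gap between the largest entry and the rest dominates the noise scale, the selected index is correct with overwhelming probability.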
Lemma 3.2.3 ().
If , then PrivateMax is -differentially private.
In applications, we are often interested in finding the top- numbers with . A natural method for this task is an iterative “Peeling” algorithm that runs the PrivateMax algorithm times, with appropriately chosen privacy parameters in each iteration.
Algorithm 2: Peeling:
1: Set .
2: for to do
3:   Run PrivateMax to obtain .
4:   Remove from .
5: end for
6: Report the selected pairs.
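The Peeling idea can be sketched as follows (our own illustration; the even split of the budget across iterations is the simplest choice, and sharper accounting is possible via advanced composition).

```python
# Illustrative Peeling: run a PrivateMax-style step k times, removing the
# selected coordinate each round, with the budget split evenly.
import numpy as np

def peeling(v, k, sensitivity, epsilon, rng):
    """Privately select k coordinates of v, largest in magnitude first."""
    eps_step = epsilon / k
    scale = 2 * sensitivity / eps_step
    remaining = list(range(len(v)))
    selected = []
    for _ in range(k):
        noise = rng.laplace(0.0, scale, size=len(remaining))
        pos = int(np.argmax(np.abs(v[remaining]) + noise))
        j = remaining.pop(pos)
        selected.append((j, v[j] + rng.laplace(0.0, scale)))
    return selected

rng = np.random.default_rng(4)
v = np.array([0.01, 9.0, -0.02, 7.0, 0.03, 8.0])
top3 = peeling(v, k=3, sensitivity=0.001, epsilon=1.0, rng=rng)
```

By basic composition, running k steps each with budget ε/k yields ε-differential privacy overall for the selection sequence.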
Lemma 3.2.4 ().
If , then Peeling is -differentially private.
With these differentially private selection subroutines in place, we are ready to present the high-dimensional mean estimation algorithm in the next section.
Differentially-private mean estimation in high dimensions
Let denote projection onto the ball of radius centered at the origin in , where is a tuning parameter for the truncation level. With suitably chosen , the following algorithm attains the minimax lower bound in Theorem 3.1, up to at most a logarithmic factor in .
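Algorithm 3 itself is not reproduced in this excerpt; as a hedged sketch under our own assumptions, the pipeline suggested above (truncate each observation to control sensitivity, then privately select and release the top coordinates of the mean with a Peeling-style step) might look as follows. All parameter choices below are illustrative.

```python
# Hedged sketch of a private sparse mean estimator: truncation plus a
# Peeling-style noisy top-s selection. Not the paper's exact Algorithm 3.
import numpy as np

def private_sparse_mean(X, s, R, epsilon, rng):
    n, d = X.shape
    Xt = np.clip(X, -R, R)          # truncation bounds the sensitivity
    mean = Xt.mean(axis=0)
    sens = 2 * R / n                # per-coordinate sensitivity of the mean
    eps_step = epsilon / s
    est = np.zeros(d)
    remaining = list(range(d))
    for _ in range(s):              # noisy top-s selection, one coord per round
        noise = rng.laplace(0.0, 2 * sens / eps_step, size=len(remaining))
        pos = int(np.argmax(np.abs(mean[remaining]) + noise))
        j = remaining.pop(pos)
        est[j] = mean[j] + rng.laplace(0.0, 2 * sens / eps_step)
    return est

rng = np.random.default_rng(5)
d, n, s = 200, 2000, 3
mu = np.zeros(d)
mu[[7, 42, 99]] = [2.0, -2.0, 1.5]           # a 3-sparse mean
X = mu + rng.standard_normal((n, d))
est = private_sparse_mean(X, s=s, R=5.0, epsilon=2.0, rng=rng)
```

With the signal coordinates well above the noise scale, the selected support matches the true support and the estimator is s-sparse by construction.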
In view of Theorem 3.1, the theorem below shows that the high-dimensional mean estimation algorithm is rate-optimal up to a factor of .
For with , if and , then Algorithm 3 is (ε, δ)-differentially private, and
if there exists a constant such that , when ,
otherwise, with the choice of for a sufficiently large constant ,
The proof of this theorem can be found in Section C.2 of the supplementary materials .
The truncation parameter serves to control the sensitivity of the sample mean so that the Laplace/Gaussian mechanisms apply. It can be replaced by a differentially private estimator that consistently estimates the sample’s range (for example, the methods in  and ). We demonstrate the selection of this parameter in practice in Section 5.
Differentially private algorithms in the classical setting
In the classical setting of , the optimal rate of convergence for the mean estimation problem can be achieved simply by a noisy, truncated sample mean: given an i.i.d. sample , the estimator is defined as , where denotes projection onto the ball of radius centered at the origin in , and is an independent noise draw. The theoretical guarantees for this estimator are summarized in the theorem below.
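A minimal sketch of this noisy, truncated sample mean follows (our own illustration, using Gaussian noise and an ℓ₂-ball projection; the paper's exact noise distribution and constants may differ).

```python
# Hedged sketch: project each observation onto an l2 ball, average, and add
# Gaussian noise calibrated to the l2-sensitivity 2R/n of the truncated mean.
import math
import numpy as np

def private_mean_lowdim(X, R, epsilon, delta, rng):
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xp = X * np.minimum(1.0, R / np.maximum(norms, 1e-12))  # ball projection
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * (2 * R / n) / epsilon
    return Xp.mean(axis=0) + rng.normal(0.0, sigma, size=d)

rng = np.random.default_rng(6)
X = 1.0 + rng.standard_normal((5000, 3))     # true mean (1, 1, 1)
est = private_mean_lowdim(X, R=10.0, epsilon=1.0, delta=1e-5, rng=rng)
```

For a fixed dimension, the added noise is of order R/(nε) per coordinate and the truncation bias vanishes once R exceeds the bulk of the data, so the estimator tracks the optimal rate.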
For an i.i.d. sample with satisfying , is an -differentially private procedure, and:
if there exists a constant such that , when ,
otherwise, with the choice of for a sufficiently large constant ,
4 Privacy Cost of Linear Regression
In this section, we investigate the cost of differential privacy in linear regression problems, with primary focus on the high-dimensional setting where and the regression coefficient is assumed to be sparse; the classical low-dimensional case will also be covered. Through the general lower bound technique described in Section 2, we establish minimax lower bounds that match the convergence rates of our differentially private procedures up to factor(s) of .
4.1 Lower bound of high-dimensional linear regression
We consider the distribution space , defined as
where the parameter of interest is defined as . Let denote the design matrix and denote the vector of response variables; then is the best linear approximation of . As for sparsity, we assume that there exists an index set with satisfying
For the lower bound, let us specify some .
We first generate an i.i.d. sample from distribution such that , keep the top values, and set the remaining coordinates to 0. The resulting vector is our choice of . We denote .
Let denote the standard basis vectors of . The rows of the design matrix are assumed to be independently drawn from a uniform mixture of two distributions:
, with probability and with probability for every .
, is drawn from a continuous distribution supported on the -ball with radius and centered at the origin in .
Next, we consider where
Let denote an i.i.d. sample drawn from . We propose the tracing adversary
where , and is a fresh independent sample with covariates . That is,
This adversary satisfies the following properties:
If and , then for every satisfying and drawn from , we have
where , and .
, where is an adjacent data set of with replaced by .
belongs to with probability at least .
The lemma is proved in Section B.4 of the supplementary materials . We note that the extra assumption in Lemma 4.1.1 can be gained “for free”: when it fails to hold, there is an automatic lower bound of the desired order. On the other hand, when the assumption holds, the general lower bound result in Theorem 2.1 is applicable, and we obtain the following lower bound result.
Let denote the collection of all (ε, δ)-differentially private algorithms. Suppose that for some and . Then, for sufficiently large ,
Specifically, the second term in the lower bound is a consequence of Lemma 4.1.1 and Theorem 2.1. The first term is due to the statistical minimax lower bound for high-dimensional linear regression (see, for instance,  and ). The proof of this theorem is in Section A.4 of the supplementary materials .
4.2 Upper bound of high-dimensional linear regression
For high-dimensional sparse linear regression, we propose the following differentially private LASSO algorithm, which iterates truncated gradient descent with random perturbation.
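As a preview of the idea, the following simplified sketch is entirely our own: it uses full-batch rather than stochastic gradients, and the noise level sigma is left as a free parameter rather than calibrated to an (ε, δ) budget via composition. It iterates clipped-gradient descent with hard thresholding to enforce sparsity.

```python
# Hedged sketch of noisy projected gradient descent for sparse regression:
# clip per-sample gradients, add Gaussian noise, hard-threshold each iterate.
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def private_sparse_regression(X, y, s, T, eta, B, sigma, rng):
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(T):
        grads = (X @ beta - y)[:, None] * X              # per-sample gradients
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads *= np.minimum(1.0, B / np.maximum(norms, 1e-12))  # clipping
        g = grads.mean(axis=0) + rng.normal(0.0, sigma, size=d)  # noisy grad
        beta = hard_threshold(beta - eta * g, s)         # enforce sparsity
    return beta

rng = np.random.default_rng(7)
n, d, s = 2000, 100, 4
beta_star = np.zeros(d)
beta_star[:s] = [1.0, -1.0, 0.5, 2.0]
X = rng.standard_normal((n, d))
y = X @ beta_star + 0.1 * rng.standard_normal(n)
beta_hat = private_sparse_regression(X, y, s=s, T=100, eta=0.5,
                                     B=50.0, sigma=0.01, rng=rng)
```

The hard thresholding step plays the role of the private truncation discussed above: it keeps every iterate s-sparse, so the noise needed for privacy only has to hide a low-dimensional update.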