The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
Abstract
Privacy-preserving data analysis is a rising challenge in contemporary statistics, as the privacy guarantees of statistical methods are often achieved at the expense of accuracy. In this paper, we investigate the tradeoff between statistical accuracy and privacy in mean estimation and linear regression, under both the classical low-dimensional and modern high-dimensional settings. A primary focus is to establish minimax optimality for statistical estimation with the differential privacy constraint. To this end, we find that classical lower bound arguments fail to yield sharp results, and new technical tools are called for.
By refining the “tracing adversary” technique for lower bounds in the theoretical computer science literature, we formulate a general lower bound argument for minimax risks with differential privacy constraints, and apply this argument to high-dimensional mean estimation and linear regression problems. We also design computationally efficient algorithms that attain the minimax lower bounds up to a logarithmic factor. In particular, for high-dimensional linear regression, a novel private iterative hard thresholding pursuit algorithm is proposed, based on a privately truncated version of stochastic gradient descent. The numerical performance of these algorithms is demonstrated by simulation studies and applications to real data containing sensitive information, for which privacy-preserving statistical methods are necessary.
The research was supported in part by NSF grant DMS-1712735 and NIH grants R01-GM129781 and R01-GM123056.
URL: https://linjunz.github.io/
MSC subject classifications: Primary 62F30; secondary 62F12, 62J05.
Keywords: High-dimensional data; differential privacy; mean estimation; linear regression; minimax optimality.
1 Introduction
With the unprecedented availability of datasets containing sensitive personal information, there are increasing concerns that statistical analysis of such datasets may compromise individual privacy. These concerns give rise to statistical methods that provide privacy guarantees at the cost of statistical accuracy, but there has been very limited understanding of the optimal tradeoff between accuracy and privacy in many important problems, which we investigate in this paper.
A rigorous definition of privacy is a prerequisite for such an understanding. Differential privacy, introduced in Dwork et al. [16], is arguably the most widely adopted definition of privacy in statistical data analysis. The promise of a differentially private algorithm is protection of any individual’s privacy from an adversary who has access to the algorithm output and even sometimes the rest of the data.
Differential privacy has gained significant attention in the machine learning communities over the past few years [17, 1, 20, 14] and found its way into real-world applications developed by Google [22], Apple [10], Microsoft [11], and the U.S. Census Bureau [2].
A common approach to developing differentially private algorithms is to perturb the output of a non-private algorithm with random noise. When the observations are continuous, differential privacy can be guaranteed by adding Laplace/Gaussian noise to the non-private output [17]. For discrete data, differential privacy can be achieved by adding Gumbel noise to utility score functions (also known as the exponential mechanism [32]). Naturally, the perturbed output suffers some loss of accuracy, which has been observed and studied in the literature; see, for example, Wasserman and Zhou [44], Smith [39], Lei [30], Bassily et al. [4], Dwork et al. [18]. The goal of this paper is to provide a quantitative characterization of the tradeoff between differential privacy guarantees and statistical accuracy, under the statistical minimax framework. Specifically, we consider this problem for mean estimation and linear regression models in both classical and high-dimensional settings with the differential privacy constraint, which is formally defined as follows.
Definition 1 (Differential Privacy [16]).
A randomized algorithm $M$ is $(\varepsilon, \delta)$-differentially private if and only if for every pair of adjacent datasets $X$ and $X'$, and for any measurable set $S$,
$$\mathbb{P}\left(M(X) \in S\right) \leq e^{\varepsilon} \cdot \mathbb{P}\left(M(X') \in S\right) + \delta,$$
where we say two datasets $X$ and $X'$ are adjacent if and only if they differ by exactly one entry.
According to the definition, the two parameters $\varepsilon$ and $\delta$ control the level of privacy against an adversary who attempts to detect the presence of a certain subject in the sample. Roughly speaking, $\varepsilon$ is an upper bound on the amount of influence an individual’s record has on the information released, and $\delta$ is the probability that this bound fails to hold; the privacy constraint therefore becomes more stringent as $\varepsilon$ and $\delta$ tend to $0$.
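To make the definition concrete, the following toy check (our illustration, not from the paper) verifies the $(\varepsilon, 0)$ privacy inequality for the Laplace mechanism applied to a counting query: the query has sensitivity 1, so adding Laplace noise of scale $1/\varepsilon$ makes the output densities under any two adjacent datasets differ pointwise by a factor of at most $e^{\varepsilon}$.

```python
import math

# Toy check of the (epsilon, 0)-DP inequality for the Laplace mechanism.
# A counting query has sensitivity 1; with Laplace(1/epsilon) noise, the
# output densities under adjacent datasets (here, counts 10 vs. 11) differ
# by at most a factor of e^epsilon at every point.
epsilon = 0.5
scale = 1.0 / epsilon

def laplace_pdf(x, loc):
    """Density of Laplace(loc, scale) at x."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)

for x in [-5.0, 0.0, 10.5, 25.0]:
    ratio = laplace_pdf(x, 10.0) / laplace_pdf(x, 11.0)
    # | |x - 11| - |x - 10| | <= 1, so the ratio lies in [e^-eps, e^eps].
    assert math.exp(-epsilon) - 1e-12 <= ratio <= math.exp(epsilon) + 1e-12
```

Because the density ratio is uniformly bounded, the probability of any output set under one dataset is at most $e^{\varepsilon}$ times its probability under the adjacent dataset, which is exactly the definition with $\delta = 0$.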
Our contributions
Sharp lower bounds based on tracing attacks: We establish the necessary cost of privacy by providing sharp minimax risk lower bounds, under the differential privacy constraint, for high-dimensional sparse mean estimation and for linear regression in both low-dimensional and high-dimensional settings. To this end, a key technical tool is presented in Theorem 2.1, which reduces lower bounding the minimax risk to designing a tracing adversary that aims to detect the presence of an individual in a dataset using the output of a differentially private statistic computed from the dataset. This reduction to tracing adversaries was first introduced by [6] and [40] to provide sample complexity lower bounds for attaining a given level of in-sample accuracy, and it has been applied to establishing lower bounds for estimating population quantities such as discrete and Gaussian mean vectors (see also [19, 27]). Compared to these existing lower bounds, as shown in Section 2.3, our refinement leads to tight dependence on the privacy parameter.
Rate-optimal differentially private algorithms: For the mean estimation and linear regression problems, we construct efficient algorithms that attain matching upper bounds up to logarithmic factors. The high-dimensional mean estimation algorithm is based on several differentially private subroutines, such as the Gaussian mechanism, reporting the noisy maximum, and their modifications. For high-dimensional linear regression, we propose a novel private iterative hard thresholding pursuit algorithm, based on a privately truncated version of stochastic gradient descent. This private truncation step effectively enforces the sparsity of the resulting estimator and leads to optimal control of the privacy cost (see more details in Section 4.2). To the best of our knowledge, these algorithms are the first to achieve the minimax optimal rates of convergence in high-dimensional statistical estimation problems with the differential privacy guarantee.
Related literature
In theoretical computer science, [39] showed that under strong conditions on the privacy parameters, some point estimators attain the statistical convergence rates, and hence privacy can be gained for free. [4, 18, 42] proposed differentially private algorithms for convex empirical risk minimization, principal component analysis, and high-dimensional sparse regression.
The tracing adversary argument for lower bounds originated in the theoretical computer science literature [6, 40]. Early works in this direction were primarily concerned with the accuracy of releasing in-sample quantities, such as $k$-way marginals, with differential privacy constraints. Some more recent works [19, 27] applied the idea to obtain lower bounds for estimating population quantities such as discrete and Gaussian mean vectors. We shall compare existing lower bounds for Gaussian mean estimation with our results in Section 2.3.
In the statistics literature, there has also been a series of works that study differential privacy in the context of statistical estimation. [44] observed that local differentially private schemes seem to yield slower convergence rates than the optimal minimax rates in general; [13] developed a framework for statistical minimax rates with the local privacy constraint; in addition, [38] established minimax optimal rates of convergence under local differential privacy and exhibited a mechanism, based on randomized response, that is minimax optimal for nearly linear functionals. However, local privacy is a much stronger notion of privacy than differential privacy and is hardly compatible with high-dimensional problems [13]. As we shall see in this paper, the cost of differential privacy in statistical estimation behaves quite differently from that of local privacy.
Organization of the paper
The rest of the paper is organized as follows. Section 2 describes a general technical tool for lower bounding the minimax risk with differential privacy constraint. This technical tool is then applied in Section 3 to the high-dimensional mean estimation problem. Both the minimax lower bound and an algorithm with a matching upper bound are obtained. Section 4 further applies the general lower bound technique to investigate the minimax lower bounds of the linear regression problem with differential privacy constraint, in both low-dimensional and high-dimensional settings. The upper bounds are also obtained by providing novel differentially private algorithms and analyzing their risks. The results together show that our bounds are rate-optimal up to logarithmic factors. Simulation studies are carried out in Section 5 to show the advantages of our proposed algorithms. Section 6 applies our algorithms to real data sets with potentially sensitive information that warrants privacy-preserving methods. Section 7 discusses extensions to other statistical estimation problems with privacy constraints. The proofs are given in Section 8 and the supplementary materials [8].
Definitions and notation
We conclude this section by introducing notation that will be used in the rest of the paper. For a positive integer $n$, $[n]$ denotes the set $\{1, 2, \dots, n\}$. For a vector $v$, we use $\|v\|_p$ and $\|v\|_0$ to denote the usual vector $\ell_p$ norm and $\ell_0$ norm, respectively, where the $\ell_0$ “norm” counts the number of nonzero entries in a vector. For a set $S$, we use $S^c$ to denote its complement, and $\mathbb{1}(S)$ denotes the indicator function on $S$. For any set $S \subseteq [d]$ and $v \in \mathbb{R}^d$, let $v_S$ denote the $|S|$-dimensional vector consisting of the entries $v_j$ such that $j \in S$. The Frobenius norm of a matrix $A$ is denoted by $\|A\|_F$, and the spectral norm of $A$ is $\|A\|_2$. In addition, we use $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ to denote the smallest and the largest eigenvalues of $A$. For two sequences of positive numbers $a_n$ and $b_n$, $a_n \lesssim b_n$ means that $a_n \leq c\, b_n$ for some constant $c > 0$ and all $n$. Similarly, $a_n \gtrsim b_n$ means that $a_n \geq c\, b_n$ for some constant $c > 0$ and all $n$, and $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$. We use $c$, $C$, $c_0$, $c_1, \dots$ to denote generic constants which may vary from place to place.
2 A General Lower Bound for Minimax Risk with Differential Privacy
Sections 2.1 and 2.2 present a general minimax lower bound technique for statistical estimation problems with differential privacy constraint. As an application, we use this technique to establish a tight lower bound for differentially private mean estimation in Section 2.3.
The lower bound technique is based on a tracing adversary that attempts to detect the presence of an individual data entry in a data set with the knowledge of an estimator computed from the data set. If one can construct a tracing adversary that is effective at this task given an accurate estimator, an argument by contradiction leads to a lower bound on the accuracy of differentially private estimators: if a differentially private estimator computed from a data set is sufficiently accurate, the tracing adversary will be able to determine the presence of an individual data entry in the data set, thus contradicting the differential privacy guarantee. In other words, the privacy guarantee and the tracing adversary together ensure that a differentially private estimator cannot be “too accurate”.
This tracing adversary argument, originally proposed by [6], has proven to be a powerful tool for obtaining lower bounds in the context of releasing sample quantities [40, 41] and for Gaussian mean estimation [27]. We shall refine the tracing adversary argument to formulate a minimax lower bound technique that will be applied to a series of statistical estimation problems later in the paper.
2.1 Background and problem formulation
Let denote a family of distributions supported on a set , and let denote a population quantity of interest. The statistician has access to a data set of i.i.d. samples, , drawn from a statistical model . We denote the empirical distribution over by .
With the data, our goal is to estimate a population parameter by an estimator that belongs to , the collection of all differentially private procedures. The performance of is measured by its distance to the truth : formally, let be a metric induced by a norm on , namely , and let be a loss function that is monotonically increasing. This paper studies the minimax risk for differentially private estimation of the population parameter:
In this paper, the privacy parameters are set to and . This is essentially the most permissive setting under which differential privacy is a nontrivial guarantee: [40] shows that is essentially the weakest privacy guarantee that is still meaningful.
2.2 Lower bound by tracing
Consider a tracing adversary that outputs IN if it determines that a certain sample is in the data set after seeing , and outputs OUT otherwise. We define , the index set of samples that are determined as IN by the adversary . A survey of tracing adversaries and their relationship with differential privacy can be found in [20] and the references therein.
Our general lower bound technique requires some regularity conditions for and : for every , we assume that there exists a with , such that for every , , and .
The star-convexity condition is weaker than convexity and holds for many commonly seen distribution spaces, such as sub-Gaussian distributions. A similar star-convexity condition has also been assumed in [3] when proving lower bounds with differential privacy constraints. The linearity condition is natural for the moment-based estimands, namely the mean vector and the regression coefficient, that we study in this paper. For their technical roles, we refer interested readers to Section 8.1.
The following theorem shows that minimax lower bounds for statistical estimation problems with privacy constraint can be constructed if there exist effective tracing adversaries:
Theorem 2.1.
Assume that and satisfy the regularity conditions described above. Suppose there exist a distribution and a tracing adversary such that the following conditions are satisfied for every and with .

1. Completeness.
2. Soundness. for every , where is an adjacent dataset of with replaced by an independent copy .
3. Closure under resampling. The empirical distribution over , , belongs to with probability at least .
If and for some , then for and sufficiently large, we have
The detailed proof can be found in Section 8.1.
Completeness and soundness roughly correspond to “true positives” and “false positives” in classification: completeness requires the adversary to return some nontrivial result when its input is accurate relative to its non-private counterpart ; soundness guarantees that an individual is unlikely to be identified as IN if the estimator the adversary sees is independent of that individual. When a tracing adversary satisfies these properties, Theorem 2.1 conveniently leads to a minimax risk lower bound; that is, Theorem 2.1 is a reduction from constructing minimax risk lower bounds to finding complete and sound tracing adversaries. Closure under resampling is a technical condition needed for obtaining the correct dependence on in the lower bounds. For the two classes of distributions considered in this paper, namely bounded distributions and sub-Gaussian distributions, it is straightforward to find members of these classes that satisfy the condition.
Example 1.
Let be the class of all dimensional distributions with bounded support. The closure under resampling condition is satisfied by every : the empirical distribution of a sample drawn from remains bounded.
Example 2.
Let be the class of one-dimensional zero-mean sub-Gaussian () distributions. As long as the sample size , the closure under resampling condition is satisfied by every that is sub-Gaussian with parameter at most .
Let be a sub-Gaussian distribution and consider . By sub-Gaussianity, we have for each , where is a universal constant. Then, by Chebyshev’s inequality, we have
as long as .
In this paper, we shall return to these examples to help us verify closure under resampling.
The connection between tracing attacks and minimax lower bounds has long been observed in the differential privacy community [6, 19, 40]. While prior works on tracing attacks and lower bounds primarily focused on estimating $k$-way marginals, Theorem 2.1 provides an abstract lower bound statement that is potentially applicable to more general statistical estimation problems.
In the next section, we apply the abstract formulation of Theorem 2.1 to the concrete setting of Gaussian mean estimation. The results, by comparison with existing lower bounds for Gaussian mean estimation in the differential privacy literature, demonstrate the utility of the general lower bound tool.
2.3 A first application: private mean estimation in the classical setting
Consider the dimensional subGaussian distribution family :
where is the mean of , and denotes the th standard basis vector of .
Following the notation introduced in Section 2.1, and . Further we take and , so that our risk function is simply the error. The minimax risk is then denoted by
To establish a lower bound, let be the product distribution supported on with the mean vector drawn from : for where , and
The associated tracing adversary is defined as
where is a fresh independent draw from .
The chosen and tracing adversary satisfy the properties needed to establish a lower bound.
Lemma 2.3.1.
If , for every and , we have


, where is an adjacent dataset of with replaced by .

belongs to with probability one.
The third point can be seen from the boundedness of and Example 1. This lemma is proved in Section B.1 of the supplementary materials [8].
Intuitively, this adversary is constructed as follows. Without privacy constraints, a natural estimator for is the sample mean . When does not belong to , is a sum of independent zero-mean random variables and we have . When belongs to , we will have , and is more likely to output IN than OUT.
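The separation the adversary exploits can be illustrated numerically. The sketch below (our simplified illustration in the style of the membership attacks of [6], not the paper's exact construction) computes the inner-product score of each individual against the released sample mean: in-sample scores concentrate around $d/n$ while scores of independent individuals concentrate around $0$, so when $d \gg n$ an accurate released mean leaks membership.

```python
import numpy as np

# Simplified tracing statistic: score each individual x against the
# released statistic qhat (here, the exact sample mean) via <x, qhat>.
# The data are i.i.d. +/-1 coordinates with known population mean 0.
rng = np.random.default_rng(0)
n, d = 50, 5000
X = rng.choice([-1.0, 1.0], size=(n, d))        # the sample
qhat = X.mean(axis=0)                           # released (non-private) mean

in_scores = X @ qhat                            # scores of in-sample rows
out = rng.choice([-1.0, 1.0], size=(n, d))      # independent individuals
out_scores = out @ qhat

# In-sample scores are larger on average by roughly d/n = 100, far above
# the fluctuation of the out-of-sample scores, so thresholding separates
# IN from OUT reliably.
assert in_scores.mean() > out_scores.mean() + 20.0
```

A differentially private release must blunt exactly this statistic, which is why an accurate private estimator and an effective tracing adversary cannot coexist.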
In view of Theorem 2.1, and ; it follows that
Combining this with the well-known statistical minimax lower bound (see, for example, [29]), we arrive at the minimax lower bound result for differentially private mean estimation.
Theorem 2.2.
Let denote the collection of all differentially private algorithms, and let be an i.i.d. sample drawn from . Suppose that , for some and , then for sufficiently large ,
The theorem is proved in Section A.1 of the supplementary materials [8].
Remark 1.
The minimax lower bound characterizes the cost of privacy in the mean estimation problem: the cost of privacy dominates the statistical risk when .
Remark 2.
This minimax lower bound matches the sample complexity lower bound in [40], which considered the deterministic worst case instead of the i.i.d. statistical setting. [27] studied the Gaussian mean estimation problem but did not obtain a tight bound with respect to . Theorem 2.2 improves the lower bound in [27] by , which is shown to be sharp up to by Theorem 3.3.
3 Privacy Cost of High-dimensional Mean Estimation
In this section and the subsequent Section 4, we consider the high-dimensional setting where and the population parameters of interest, such as the mean vector or the regression coefficient , are sparse. For each statistical problem investigated, we present a minimax risk lower bound under the differential privacy constraint, as well as a procedure with a differential privacy guarantee that attains the lower bound up to factor(s) of .
3.1 Private high-dimensional mean estimation
We first consider the problem of estimating the sparse mean vector of a -dimensional sub-Gaussian distribution, where can possibly be much larger than the sample size . We denote the parameter space of interest by
where the sparsity level is controlled by the parameter .
For the lower bound, we construct a distribution as follows: the mean vector is obtained by first generating an i.i.d. sample from a distribution such that , keeping the top values, and setting the other coordinates to 0. Given the mean vector, follows a product distribution with each coordinate’s distribution defined as follows:
For sparse statistical models, the design and analysis of tracing adversaries is closely connected to the problem of differentially private selection, as we would like to characterize the difficulty of estimating the index set of nonzero coordinates of . For more background on differentially private selection, we refer interested readers to [41] and the references therein.
Given computed from a data set , this tracing adversary attempts to identify whether an individual belongs to , by calculating the difference of and over those coordinates where has a large value. If belongs to , the former should be correlated with and is likely to be larger than the latter.
Formally, the tracing adversary is complete and sound under an appropriate sample size constraint:
Lemma 3.1.1.
If , for every and , we have


, where is an adjacent data set of with replaced by .

belongs to with probability one.
The lemma is proved in Section B.3 of the supplementary materials [8].
In conjunction with our general lower bound result Theorem 2.1, we have
Theorem 3.1.
Let denote the collection of all differentially private algorithms, and let be an i.i.d. sample drawn from . Suppose that , for some , , and , then for sufficiently large ,
The theorem is proved in Section A.3 of the supplementary materials [8].
The first term is the statistical minimax lower bound of sparse mean estimation (see, for example, [26]), and the second term is due to the privacy constraint. Comparing the two terms shows that, in highdimensional sparse mean estimation, the cost of differential privacy is significant when
In the next section, we present a differentially private procedure that attains this convergence rate up to a logarithmic factor.
3.2 Rateoptimal procedures
The rate-optimal algorithms in this paper rely on some classical subroutines from the differential privacy literature, such as the Laplace and Gaussian mechanisms and reporting the noisy maximum of a vector. Before describing our rate-optimal algorithms in detail, it is helpful to review these results, which will also serve as building blocks for the differentially private linear regression methods in Section 4.
Basic differentially private procedures
It is frequently the case that differential privacy can be attained by adding properly scaled noise to the output of a non-private algorithm. Among the most prominent examples are the Laplace and Gaussian mechanisms.
The Laplace and Gaussian mechanisms
As the names suggest, the Laplace and Gaussian mechanisms achieve differential privacy by perturbing the output of an algorithm with Laplace and Gaussian noise, respectively. The scale of the noise is determined by the sensitivity of the algorithm:
Definition 2.
For any algorithm mapping a dataset to , the sensitivity of is
For algorithms with finite sensitivity, the differential privacy guarantee can be attained by adding noise sampled from a Laplace distribution.
Lemma 3.2.1 (The Laplace mechanism [17]).
For any algorithm mapping a dataset to such that , the Laplace mechanism, given by
where is an i.i.d. sample drawn from , achieves differential privacy.
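A minimal sketch of the mechanism follows; the function name and parameter names are ours, and the scale (the $\ell_1$-sensitivity divided by $\varepsilon$) is the standard calibration.

```python
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng=None):
    """Release `value` with i.i.d. Laplace(l1_sensitivity / epsilon) noise.

    `value` is the output of a non-private algorithm and `l1_sensitivity`
    bounds how much that output can change (in l1 norm) between adjacent
    datasets; the perturbed release is (epsilon, 0)-differentially private.
    """
    rng = np.random.default_rng() if rng is None else rng
    value = np.asarray(value, dtype=float)
    noise = rng.laplace(0.0, l1_sensitivity / epsilon, size=value.shape)
    return value + noise
```

For instance, for the mean of $n$ points truncated coordinatewise to a bounded range, the $\ell_1$-sensitivity scales like the range times $d/n$, so the added noise vanishes as the sample grows.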
Similarly, adding Gaussian noise to algorithms with finite sensitivity guarantees differential privacy.
Lemma 3.2.2 (The Gaussian mechanism [17]).
For any algorithm mapping a dataset to such that , the Gaussian mechanism, given by
where is an i.i.d. sample drawn from , achieves differential privacy.
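The Gaussian analogue can be sketched as below; the names are ours, and the noise level uses the classical calibration $\sigma = \sqrt{2\log(1.25/\delta)}\,\Delta_2/\varepsilon$ for $\ell_2$-sensitivity $\Delta_2$, which is valid for $\varepsilon \leq 1$.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Release `value` with Gaussian noise calibrated to its l2-sensitivity.

    Uses the classical calibration
        sigma = sqrt(2 * log(1.25 / delta)) * l2_sensitivity / epsilon,
    which yields an (epsilon, delta)-differentially private release.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    value = np.asarray(value, dtype=float)
    return value + rng.normal(0.0, sigma, size=value.shape)
```

Compared with the Laplace mechanism, the Gaussian mechanism pays a small $\delta$ but calibrates to $\ell_2$ rather than $\ell_1$ sensitivity, which is favorable in high dimensions.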
An important application of these mechanisms is differentially private selection of the maximum/minimum, which also plays a crucial role in our high-dimensional mean estimation algorithm. Next, we review some algorithms for differentially private selection, to provide concrete examples and to prepare for stating the main algorithms.
Differentially private selection
Selecting the maximum (in absolute value) coordinate of is a straightforward application of the Laplace mechanism, as follows:
Algorithm 1: PrivateMax
1: Sample .
2: For , compute the noisy version .
3: Return and , where is an independent draw from .
Lemma 3.2.3 ([21]).
If , then PrivateMax is differentially private.
In applications, we are often interested in finding the top values with . This can be done by an iterative “Peeling” algorithm that runs the PrivateMax algorithm times, with appropriately chosen privacy parameters in each iteration.
Algorithm 2: Peeling
1: Set .
2: for to do
3:   Run PrivateMax to obtain .
4:   Remove from .
5: end for
6: Report the selected pairs.
Lemma 3.2.4 ([21]).
If , then Peeling is differentially private.
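Both subroutines can be sketched as follows. This is an illustrative implementation under assumed calibrations (splitting the budget between selection and release inside PrivateMax, and evenly across the peeling rounds via basic composition); [21] gives the precise constants.

```python
import numpy as np

def private_max(v, sensitivity, epsilon, rng):
    """Noisy argmax of |v| plus a noisy value at that index (sketch).

    The Laplace scale 2 * sensitivity / epsilon is an illustrative split of
    the budget between selecting the index and releasing its value.
    """
    v = np.asarray(v, dtype=float)
    scale = 2.0 * sensitivity / epsilon
    noisy = np.abs(v) + rng.laplace(0.0, scale, size=v.shape)
    j = int(np.argmax(noisy))
    return j, v[j] + rng.laplace(0.0, scale)

def peeling(v, s, sensitivity, epsilon, rng=None):
    """Select s coordinates by running private_max s times on what remains,
    giving each round an epsilon / s share of the budget."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    remaining = list(range(len(v)))
    selected = []
    for _ in range(s):
        j_local, val = private_max(v[remaining], sensitivity, epsilon / s, rng)
        selected.append((remaining.pop(j_local), val))   # map back to a global index
    return selected
```

With a generous privacy budget the procedure recovers the true top coordinates; as the budget shrinks, the Laplace perturbation makes the selection increasingly randomized.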
With these differentially private selection subroutines in hand, we are ready to present the high-dimensional mean estimation algorithm in the next section.
Differentially private mean estimation in high dimensions
Let denote the projection onto the ball of radius centered at the origin in , where is a tuning parameter for the truncation level. With suitably chosen , the following algorithm attains the minimax lower bound in Theorem 3.1, up to at most a logarithmic factor in .
Algorithm 3: Private Highdimensional Mean Estimation
1:Compute
2:Find the top components of by running Peeling and set the remaining components to 0. Denote the resulting vector by .
3:Return
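The three steps above can be sketched in code. This is a hedged illustration: the truncation radius `R`, the noise scale, and the budget split are assumptions of ours, not the paper's exact calibration.

```python
import numpy as np

def private_sparse_mean(X, s, R, epsilon, rng=None):
    """Illustrative sketch of Algorithm 3 (constants are assumptions).

    1. Truncate each coordinate to [-R, R] and average.
    2. Select the top-s coordinates of the truncated mean by peeling
       with Laplace noise, splitting the budget across the s rounds.
    3. Release the selected coordinates with fresh Laplace noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mean = np.clip(X, -R, R).mean(axis=0)
    # Each truncated coordinate of the mean has sensitivity 2R/n; an
    # illustrative per-round budget of epsilon / (2s) gives this scale.
    scale = 4.0 * R * s / (n * epsilon)
    remaining = list(range(d))
    support = []
    for _ in range(s):
        noisy = np.abs(mean[remaining]) + rng.laplace(0.0, scale, size=len(remaining))
        support.append(remaining.pop(int(np.argmax(noisy))))
    out = np.zeros(d)
    out[support] = mean[support] + rng.laplace(0.0, scale, size=s)
    return out
```

The noise scale grows with the sparsity $s$ and shrinks with $n\varepsilon$, matching the intuition that the privacy cost enters through the number of coordinates that must be selected and released privately.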
In view of Theorem 3.1, the theorem below shows that the high-dimensional mean estimation algorithm is rate-optimal up to a factor of .
Theorem 3.2.
For with , if and , then Algorithm 3 is differentially private, and

if there exists a constant such that , when ,

otherwise, with the choice of for a sufficiently large constant ,
The proof of this theorem can be found in Section C.2 of the supplementary materials [8].
Remark 3.
Remark 4.
The truncation parameter serves to control the sensitivity of the sample mean so that the Laplace/Gaussian mechanisms apply. It can be replaced by a differentially private estimator that consistently estimates the sample’s range (for example, the methods in [30] and [28]). We demonstrate the selection of in practice in Section 5.
Differentially private algorithms in the classical setting
In the classical setting of , the optimal rate of convergence for the mean estimation problem can be achieved simply by a noisy, truncated sample mean: given an i.i.d. sample , the estimator is defined as , where denotes the projection onto the ball of radius centered at the origin in , and is an independent draw from . The theoretical guarantees for this estimator are summarized in the theorem below.
Theorem 3.3.
For an i.i.d. sample with satisfying , is a differentially private procedure, and:

if there exists a constant such that , when ,

otherwise, with the choice of for a sufficiently large constant ,
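The noisy truncated sample mean is simple enough to sketch directly; the truncation radius `R` is the tuning parameter discussed above, and the Gaussian noise level is our illustrative calibration, not the paper's exact constant.

```python
import numpy as np

def private_mean(X, R, epsilon, delta, rng=None):
    """Noisy truncated sample mean in the classical (low-dimensional) setting.

    Each row is projected onto the l2 ball of radius R, so replacing one
    observation changes the truncated mean by at most 2R/n in l2 norm;
    Gaussian noise calibrated to that sensitivity (classical calibration,
    used here as an illustration) gives an (epsilon, delta)-DP release.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xt = X * np.minimum(1.0, R / np.maximum(norms, 1e-12))   # l2 projection
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * (2.0 * R / n) / epsilon
    return Xt.mean(axis=0) + rng.normal(0.0, sigma, size=d)
```

The added noise has standard deviation of order $R/(n\varepsilon)$ per coordinate, so the privacy cost fades relative to the statistical error as $n\varepsilon$ grows, in line with Theorem 3.3.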
4 Privacy Cost of Linear Regression
In this section, we investigate the cost of differential privacy in linear regression, with a primary focus on the high-dimensional setting where and the regression coefficient is assumed to be sparse; the classical, low-dimensional case () will also be covered. Through the general lower bound technique described in Section 2, we establish minimax lower bounds that match the convergence rates of our differentially private procedures up to factor(s) of .
4.1 Lower bound for high-dimensional linear regression
We consider the distribution space , defined as
where the parameter of interest is defined as . Let denote the design matrix and the vector of response variables; then is the best linear approximation of . As for sparsity, we assume that there exists an index set with satisfying
For the lower bound, let us specify some .
We first generate an i.i.d. sample from distribution such that , keep the top values, and set the remaining coordinates to 0. The resulting vector is our choice of . We denote .
Let denote the standard basis vectors of . The rows of the design matrix are assumed to be independently drawn from a uniform mixture of two distributions:

, with probability and with probability for every .

, is drawn from a continuous distribution supported on the ball with radius and centered at the origin in .
Next, we consider where
Let denote an i.i.d. sample drawn from . We propose the tracing adversary
where , and is a fresh independent sample with covariates . That is,
This adversary satisfies the following properties:
Lemma 4.1.1.
If and , then for every satisfying and drawn from , we have

where , and .

, where is an adjacent data set of with replaced by .

belongs to with probability at least .
The lemma is proved in Section B.4 of the supplementary materials [8]. We note that the extra assumption in Lemma 4.1.1 that comes “for free”: when it fails to hold, there is an automatic lower bound of . On the other hand, when , the general lower bound result in Theorem 2.1 is applicable, and we obtain the following lower bound result.
Theorem 4.1.
Let denote the collection of all differentially private algorithms. Suppose that , for some , and , then for sufficiently large ,
Specifically, the second term in the lower bound is a consequence of Lemma 4.1.1 and Theorem 2.1. The first term is due to the statistical minimax lower bound for highdimensional linear regression (see, for instance, [37] and [45]). The proof of this theorem is in Section A.4 of the supplementary materials [8].
4.2 Upper bound for high-dimensional linear regression
For high-dimensional sparse linear regression, we propose the following differentially private LASSO algorithm, which iterates truncated gradient descent steps with random perturbation.
Algorithm 4: Differentially Private LASSO
1: Inputs: privacy parameters , design matrix , response vector , step size , sparsity tuning parameter , truncation tuning parameter , and the number of iterations .
2: Initialize the algorithm with an -sparse vector .
3: for do
4:
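Since the listing above is abridged here, the overall iteration can be sketched as follows. This is a hedged illustration of the private truncated-gradient-plus-hard-thresholding idea: the residual clipping radius `R`, the Gaussian noise calibration spread over the `T` iterations, and the assumption of bounded covariates are ours, and do not reproduce the paper's exact constants.

```python
import numpy as np

def dp_sparse_linreg(X, y, s, eta, R, epsilon, delta, T, rng=None):
    """Illustrative sketch of differentially private iterative hard thresholding.

    Each step clips the residuals to [-R, R] (the private truncation, which
    bounds the gradient's sensitivity when covariates are bounded), perturbs
    the gradient with Gaussian noise whose scale is an illustrative
    calibration spread over T iterations, and hard-thresholds the iterate
    to its s largest coordinates to enforce sparsity.
    """
    rng = np.random.default_rng() if rng is None else rng
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, d = X.shape
    sigma = np.sqrt(2.0 * T * np.log(1.25 / delta)) * (2.0 * R / n) / epsilon
    beta = np.zeros(d)
    for _ in range(T):
        resid = np.clip(X @ beta - y, -R, R)      # privately truncated residuals
        grad = X.T @ resid / n
        beta = beta - eta * (grad + rng.normal(0.0, sigma, size=d))
        small = np.argsort(np.abs(beta))[:-s]     # zero all but the top-s entries
        beta[small] = 0.0
    return beta
```

The hard thresholding step is what keeps the per-iteration privacy noise from accumulating across all $d$ coordinates: only $s$ coordinates survive each round, which is the mechanism behind the sparsity-dependent privacy cost in Theorem 4.1.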