Differentially Private Empirical Risk Minimization with Input Perturbation
Abstract
We propose a novel framework for differentially private ERM: input perturbation. Existing differentially private ERM methods implicitly assume that data contributors submit their private data to a database and expect the database to invoke a differentially private mechanism when publishing the learned model. In input perturbation, each data contributor independently randomizes her/his own data and submits the perturbed data to the database. We show that the input perturbation framework theoretically guarantees that the model learned from the randomized data satisfies differential privacy with the prescribed privacy parameters. At the same time, input perturbation guarantees local differential privacy of the contributed data against the server. We also show that the excess risk bound of the model learned with input perturbation is under a certain condition, where is the sample size. This is the same as the excess risk bound of the state of the art.
1 Introduction
In recent years, differential privacy has become widely recognized as a theoretical definition of output privacy (Dwork et al., 2006b). Suppose a database collects private information from data contributors, and analysts can submit queries to learn from the database. Query-answering algorithms that satisfy differential privacy return responses whose distribution does not change significantly depending on whether the database contains the private information submitted by any single data contributor. Based on this idea, a great deal of effort has been devoted to guaranteeing differential privacy for various problems. For example, there are algorithms for privacy-preserving classification (Jain and Thakurta, 2014), regression (Lei, 2011), etc.
Differentially private empirical risk minimization (ERM), or more generally, differentially private convex optimization, has attracted a great deal of research interest in machine learning, for example, (Chaudhuri et al., 2011; Kifer et al., 2012; Jain and Thakurta, 2014; Bassily et al., 2014). These works basically follow the standard setting of differentially private mechanisms; the database collects examples and builds a model with the collected examples so that the released model satisfies differential privacy.
Recently, the data collection process has also been recognized as an important step in privacy preservation. With this motivation, local privacy was introduced as a privacy notion for the data collection process (Wainwright et al., 2012; Duchi et al., 2013; Kairouz et al., 2014). However, the existing methods of differentially private ERM are specifically derived to satisfy differential privacy of the released model, and thus provide no guarantee of local privacy.
In this work, we aim to preserve the local privacy of the data and the differential privacy of the released model simultaneously in the setting where a model constructed by ERM is released. The goal of this paper is to derive a differentially private mechanism with a utility guarantee that, at the same time, satisfies local privacy in the data collection process.
Table 1: Comparison of differentially private ERM methods.
Method | Perturbation | Privacy | Utility | Additional requirements
Objective (Chaudhuri et al., 2011; Kifer et al., 2012) | obj. func. | DP for model | | smooth
Gradient Descent (Bassily et al., 2014) | grad. | DP for model | |
Input (proposal) | example | DP for model; DLP for data s.t. | | smooth; quadratic loss
Related Work. Chaudhuri et al. (2011) formulated the problem of differentially private empirical risk minimization (ERM) and presented two different approaches: output perturbation and objective perturbation. Kifer et al. (2012) improved the utility of objective perturbation by adding an extra regularizer to the objective function. Moreover, they introduced a variant of objective perturbation that employs a Gaussian distribution for the random linear term, which improves dimensional dependency from to whereas the satisfying privacy is relaxed from differential privacy to differential privacy (Table 1, line 1). Objective perturbation works well for smooth losses, whereas Bassily et al. (2014) proved that it is suboptimal for nonsmooth losses. They developed an optimal algorithm for differentially private ERM, named differentially private gradient descent, which runs stochastic gradient descent with the gradient perturbed by Gaussian noise. They showed that the expected empirical excess risk of differentially private gradient descent is optimal up to multiplicative factor of and even for nonsmooth losses (Table 1, line 2). They also provide optimal mechanisms that satisfy differential privacy for strongly convex and non-strongly convex losses. Jain and Thakurta (2014) showed that, for specific applications, the dimensional dependency of the excess risk can be improved from polynomial to constant or logarithmic. These studies assume that the database collects raw data from the data contributors, and so no attention has been paid to the data collection phase.
Recently, a new privacy notion referred to as local privacy (Wainwright et al., 2012; Duchi et al., 2013; Kairouz et al., 2014) has been presented. In these studies, data are drawn from a distribution by each contributor independently and communicated to the data collector via a noisy channel; local privacy is a privacy notion that ensures that data cannot be accurately estimated from individual privatized data. Duchi et al. (2013) introduced a private convex optimization mechanism that satisfies local privacy. Their method guarantees differential privacy for the model, but its privacy level is the same as that of the differential local privacy.
Our Contribution. In this study, we propose a novel framework for the differentially private ERM, input perturbation (Table 1, line 3). In contrast to the existing methods, input perturbation allows data contributors to take part in the process of privacy preservation of model learning. The mechanism of input perturbation is quite simple: each data contributor independently randomizes her/his data with a Gaussian distribution, in which the noise variance is determined by a function of privacy parameters , sample size , and some constants related to the loss function.
In this paper, we prove that models learned with randomized examples following our input perturbation scheme are guaranteed to satisfy differential privacy under some conditions, especially, differential privacy if (Table 1, line 3, column 3). The guarantee of differential privacy is proved using the fact that the difference between the objective function of input perturbation and that of objective perturbation is probabilistically bounded. To achieve this approximation with randomization by independent data contributors, input perturbation requires that the loss function be quadratic with respect to the model parameter, (Table 1, line 3, column 5).
From the perspective of data contributors, data collection with input perturbation satisfies the local privacy with the privacy parameter where (Table 1, line 3, column 3). In the input perturbation framework, not only differential privacy of the learned models, but also privacy protection of data against the database is attained. From this perspective, we theoretically and empirically investigate the influence of input perturbation on the excess risk.
We compared the utility analysis of input perturbation with those of the output and objective perturbation methods in terms of the expectation of the excess empirical risk. We show that the excess risk of the model learned with input perturbation is (Table 1, line 3, column 4). If , the utility and the privacy guarantee of the model are equivalent to that of objective perturbation.
All proofs are deferred to the full version of this paper due to space limitations.
2 Problem Definition and Preliminary
Let be the domain of examples. The objective of supervised prediction is to learn a parameter on a closed convex domain from a collection of given examples , where parametrizes a predictor that outputs from . Let be a loss function. Learning algorithms following the empirical risk minimization principle choose the model that minimizes the empirical risk:
(1) 
where is a convex regularizer. We suppose that the following assumptions hold throughout this paper: 1) is bounded, i.e., there is s.t. for all , 2) is twice continuously differentiable w.r.t. , 3) is Lipschitz, i.e., for any and , and 4) is smooth, i.e., for any and where is the matrix norm.
Three stakeholders appear in the problem we consider: data contributors, the database, and the model user. Each data contributor owns a single example . The goal is for the model user to obtain the model learned by ERM while the privacy of the data contributors is ensured against both the database and the model user. Let us consider the following process of data collection and model learning.
1) All the stakeholders reach an agreement on the privacy parameters before data collection.
2) Each data contributor independently perturbs its own example and sends it to the database.
3) The database conducts model learning at the request of the model user with the collected perturbed examples and publishes the model.
Note that once a data contributor sends her perturbed example to the database, she can no longer interact with the database. This setting suits real-world use: for example, if the data contributors send their data to the database from their smartphones, it is difficult for the database to interact with them at all times because of unstable internet connections. In this process, privacy concerns arise on two occasions: when the data contributors release their own data to the database (data privacy), and when the database publishes the learned model to the model user (model privacy).
Model privacy. Model privacy is preserved by guaranteeing differential privacy, a privacy definition for a randomization mechanism , which is a stochastic mapping from a set of examples to an output on an arbitrary domain . Given two databases and , we say and are neighbor databases, or , if the two databases differ in at most one element. Then, differential privacy is defined as follows:
Definition 1 (differential privacy (Dwork et al., 2006a)).
A randomization mechanism is differentially private if, for all pairs s.t. and for any subset of ranges ,
(2) 
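For reference, the standard statement of this condition in the differential privacy literature (Dwork et al., 2006a) can be written as below. The symbols $\mathcal{M}$ for the mechanism, $D \sim D'$ for neighbor databases, $S$ for the output subset, and $(\epsilon, \delta)$ for the privacy parameters are notational assumptions made here for illustration.

```latex
\Pr\left[\mathcal{M}(D) \in S\right]
  \le e^{\epsilon}\, \Pr\left[\mathcal{M}(D') \in S\right] + \delta .
```

Setting $\delta = 0$ in this form recovers pure $\epsilon$-differential privacy.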
Data privacy. To define data privacy, we introduce differential local privacy (Wainwright et al., 2012; Duchi et al., 2013; Kairouz et al., 2014). Given the data collection and model learning process above, the noninteractive case of local privacy should be considered, in which each individual releases his/her private data without seeing the other individuals' private data. Under the noninteractive setting, differential local privacy is defined as follows.
Definition 2 (differential local privacy (Wainwright et al., 2012; Evfimievski et al., 2003; Kasiviswanathan et al., 2011)).
A randomization mechanism is differentially locally private, if, for all pairs s.t. and for any subset of ranges ,
(3) 
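In the noninteractive local setting, the same kind of inequality is usually imposed on a single contributor's example rather than on a whole database. A sketch in assumed notation, where $z \neq z'$ are two possible examples of one contributor and $\mathcal{M}$ is the randomizer that the contributor runs locally:

```latex
\Pr\left[\mathcal{M}(z) \in S\right]
  \le e^{\epsilon}\, \Pr\left[\mathcal{M}(z') \in S\right] + \delta .
```

Read this way, the difference from the database-level definition is that the guarantee must hold for any two inputs of the same contributor, so it protects the example even from the party that collects it.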
Utility. To assess utility, we use the empirical excess risk. Let . Given a randomization mechanism that (randomly) outputs over , the empirical excess risk of is defined as .
3 Input Perturbation
In this section, we introduce a novel framework for differentially private ERM. The objective of the input perturbation framework is threefold:
1) (data privacy) The data released from the data contributors to the database is differentially locally private,
2) (model privacy) The model resulting from the process is eventually differentially private,
3) (utility) The expectation of the excess empirical risk of the resulting models is , which is equivalent to that obtained with non-privacy-preserving model learning.
Furthermore, we show that by adjusting the noise variance that the input perturbation injects, the input perturbation satisfies differential privacy and differential local privacy with the excess empirical risk where .
3.1 Loss Function for Input Perturbation
The strategy of input perturbation is to minimize a function that is close to the objective function of the objective perturbation method. The requirements on the loss and objective function thus basically follow the objective perturbation method with Gaussian noise (Kifer et al., 2012). Input perturbation allows any (possibly nondifferentiable) convex regularizer supported by objective perturbation. However, for simplicity, we consider the nonregularized case where .
In addition to the requirements inherited from objective perturbation, input perturbation imposes one restriction: the loss function must be quadratic in . Let and be dimensional vectors and be a scalar. Then, our quadratic loss function has the form:
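One familiar loss that is quadratic in the model parameter is the squared loss of linear regression. The expansion below is an illustration only, in assumed notation with feature vector $x$, target $y$, and parameter $\theta$; it is not necessarily the exact parametrization used in this paper.

```latex
\ell(\theta, (x, y)) = (y - x^{\top}\theta)^{2}
  = \theta^{\top} (x x^{\top})\, \theta \;-\; 2 y\, x^{\top}\theta \;+\; y^{2}.
```

In this example the data enter the loss only through the matrix $x x^{\top}$, the vector $y x$, and the scalar $y^{2}$, which is what makes it possible to randomize an example once and reuse it for any $\theta$.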
3.2 Input Perturbation Method
In this subsection, we introduce the input perturbation method. Algorithm 1 describes input perturbation in detail; Algorithm 2 describes model learning with examples randomized by input perturbation. In Algorithm 1, each data contributor transforms her own example into , where . Then, she adds perturbation to in Step 3. We denote the example after perturbation by , which is submitted to the database independently by each data contributor.
In Algorithm 2, the database collects the perturbed examples from the data contributors. Then, the database learns a model with these randomized examples by minimizing
(4) 
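The division of labor between contributors and database can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Algorithm 1 or Algorithm 2 themselves: the noise level `sigma` is a free parameter here (the paper derives it from the privacy parameters and the sample size), noise is added directly to the raw features and target rather than to a transformed example, and the database solves an unconstrained least-squares problem. The function names `perturb_example`, `fit_least_squares`, and `collect_and_fit` are hypothetical.

```python
import numpy as np

def perturb_example(x, y, sigma, rng):
    """Run by one contributor: add i.i.d. Gaussian noise to her own (x, y)."""
    x_noisy = x + rng.normal(0.0, sigma, size=x.shape)
    y_noisy = y + rng.normal(0.0, sigma)
    return x_noisy, y_noisy

def fit_least_squares(X, y):
    """Run by the database: minimize the squared loss on the received data."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def collect_and_fit(X, y, sigma, seed=0):
    """Simulate the whole process; the database only ever sees noisy data."""
    rng = np.random.default_rng(seed)
    perturbed = [perturb_example(xi, yi, sigma, rng) for xi, yi in zip(X, y)]
    X_noisy = np.stack([p[0] for p in perturbed])
    y_noisy = np.array([p[1] for p in perturbed])
    return fit_least_squares(X_noisy, y_noisy)
```

With `sigma=0.0` the sketch reduces to ordinary least squares, which gives a quick sanity check; with `sigma > 0` the fitted parameter becomes a random function of the contributors' noise.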
In the following subsections, we show the privacy guarantee of the input perturbation in the sense of the differential local privacy and the differential privacy. The utility analysis of models obtained following the input perturbation framework is also shown.
3.3 Privacy of Input Perturbation
In this subsection, we analyze the privacy of the input perturbation in the sense of the data privacy and the model privacy.
Data privacy of input perturbation. In Algorithm 1, each data contributor adds Gaussian noise to the data she releases. Adding Gaussian noise to the released data satisfies differential local privacy in the same way as the Gaussian mechanism (Dwork et al., 2014). As a result, we get the following corollary, which gives the level of differential local privacy of Algorithm 1.
Corollary 1.
Since we have as , Algorithm 1 is differentially locally private.
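For orientation, the classical calibration of the Gaussian mechanism from Dwork et al. (2014) can be computed as below. This is the textbook rule for a query with a given L2 sensitivity and is valid for $\epsilon \in (0, 1)$; it is not the noise variance prescribed by Algorithm 1, and the function name `gaussian_mechanism_sigma` is hypothetical.

```python
import math

def gaussian_mechanism_sigma(epsilon, delta, l2_sensitivity):
    """Standard deviation sufficient for (epsilon, delta)-DP under the
    classical Gaussian mechanism: sigma = sqrt(2 ln(1.25/delta)) * Delta / epsilon.
    The classical analysis requires 0 < epsilon < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical bound requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon
```

Read in the other direction, the same formula says that a fixed noise level corresponds to a weaker guarantee (larger $\epsilon$) when the noise is small relative to the sensitivity.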
Model privacy of input perturbation. The following theorem states the guarantee of differential privacy of models that the database learns from examples randomized by the input perturbation scheme.
Theorem 1.
The main idea of the proof is that the objective function of the input perturbation scheme contains the same linear perturbation term as that of objective perturbation. The objective function of input perturbation in Eq. 4 can be rearranged as:
(5) 
where and . The derivation can be found in the proof of Theorem 1. In the linear term, forms a random vector generated from , which is exactly the same as the random linear regularization term introduced in the objective perturbation method. By noting that and , the objective function of input perturbation is equivalent to that of objective perturbation with an infinitely large number of samples.
To guarantee differential privacy with a finite number of samples , we use the following probabilistic bound on .
Lemma 1.
Let , where . Let . Then, for any , with probability at least , we get the following bound:
where
The proof can be found in Appendix B. This bound shows how far generated with samples is from . Setting as in Algorithm 1, we can get w.p. . Thus, the output of input perturbation guarantees differential privacy w.p. . The proof of Theorem 1 is obtained by incorporating the probabilistic bound on into the privacy proof of (Kifer et al., 2012).
3.4 Utility Analysis
The following lemma gives a bound on the excess empirical risk of the model learned by input perturbation:
Lemma 2.
Let be the output of Algorithm 2. If and examples are randomized by Algorithm 1, w.p. at least the bound of the excess empirical risk is
(6) 
On the right-hand side of the bound, the first two terms of are the same as the excess empirical risk of objective perturbation (Kifer et al., 2012). The third term of and the last term of are introduced by input perturbation. The same holds for the expectation of the excess risk, as stated in the following theorem.
Theorem 2.
Let be the output of Algorithm 2. If , , and examples are randomized by Algorithm 1, expectation of the excess empirical risk by setting and as the lowest value specified in Algorithm 1.
3.5 Balancing Local Privacy and Utility
The privacy parameters of the differential local privacy that input perturbation satisfies are . Unfortunately, the privacy level of input perturbation becomes weaker as the sample size increases. However, input perturbation can satisfy stronger differential local privacy by adjusting . If the data contributors require stronger local privacy, we set for small , which enables input perturbation to satisfy differential local privacy. Such a setting of results in a stronger privacy guarantee for the published model, namely differential privacy, and a lower utility of .
4 Experiments
In this section, we examine the performance of input perturbation experimentally. As predicted by Theorem 2, under the same privacy guarantee and the optimal setting of in input perturbation, the expectation of the excess empirical risk of the models learned with input perturbation and objective perturbation is the same as . We experimentally evaluate the difference between input perturbation and objective perturbation on real datasets while changing the size of the training data and the privacy parameters. We compared the performance of the input perturbation method (Input) against two methods, namely, output perturbation with the Laplace mechanism (Output) (Chaudhuri et al., 2011) and objective perturbation with the Gaussian mechanism (ObjGauss) (Kifer et al., 2012). We evaluated all approaches on learning a linear regression model and a logistic regression model. As the performance measure, the root mean squared error (RMSE) was used for the linear regression model and the prediction accuracy was used for the logistic regression model. For regularization parameter tuning, for each method we found the best parameter for the largest training dataset size and then used it for the other training dataset sizes.
In each experiment, we randomly divided the examples into a training dataset and a test dataset with the ratio ; we trained the model with the training dataset and evaluated the performance measure with the test dataset. The average results over 100 trials were reported. We used IBM ILOG CPLEX Optimizer to optimize the objective function.
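The evaluation protocol described above (random split, train, score on the held-out part, average over trials) can be sketched as follows. The helper names and the `train_fraction` parameter are hypothetical; the split ratio used in the experiments is not assumed here and must be supplied by the caller, and the sketch does not depend on CPLEX or on any particular learner.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return float(np.mean(np.asarray(labels) == np.asarray(predictions)))

def average_over_trials(X, y, fit, score, train_fraction, trials=100, seed=0):
    """Average a test score over repeated random train/test splits.

    fit(X_train, y_train) returns a model; score(model, X_test, y_test)
    returns a number such as an RMSE or an accuracy.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_fraction * n))
    scores = []
    for _ in range(trials):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        model = fit(X[train], y[train])
        scores.append(score(model, X[test], y[test]))
    return float(np.mean(scores))
```

Passing a private learner as `fit` and a non-private one in a second call gives the kind of side-by-side comparison reported in this section.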
4.1 Dataset and Preprocessing
We used a dataset from the Integrated Public Use Microdata Series (IPMS) (Minnesota, 2014), which contains census records collected in the US for the year 2000 after removing unknown values and missing values. We performed an experiment with , the size of the training dataset, by each times from to . We set privacy parameters .
The IPMS dataset originally contained features. The binary status Marital status was transformed into two attributes: Is Single and Is Married. Hence, attributes were employed. For linear regression model learning, Annual Income, a continuous feature, was used as the prediction target. For logistic regression model learning, we converted Annual Income into a binary attribute and used it as the label, in which values higher than a predefined threshold were mapped to , and otherwise. In both kinds of model learning, the remaining attributes were used for the features. As preprocessing, we scaled the feature values so that the norm of each feature vector was at most ; annual income used as the prediction target was scaled so that the norm was at most before transformation to the binary label.
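The norm-based rescaling in the preprocessing step can be sketched as below. This is one simple way to enforce a norm bound, assuming the Euclidean norm and a single global scaling factor; the default `bound=1.0` and the function name `scale_to_max_norm` are assumptions, not values taken from the experiments.

```python
import numpy as np

def scale_to_max_norm(X, bound=1.0):
    """Rescale all rows by one common factor so that the largest
    Euclidean row norm equals `bound` (rows keep their directions)."""
    X = np.asarray(X, dtype=float)
    max_norm = np.linalg.norm(X, axis=1).max()
    if max_norm == 0.0:
        return X.copy()  # all-zero data already satisfies any bound
    return X * (bound / max_norm)
```

Using one factor for the whole dataset keeps the relative geometry of the examples intact, at the cost of letting a single large example shrink all the others.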
4.2 Results
Figure 2 (a) and (b) show the experimental results of differentially private linear regression model learning. In Figure 2, the horizontal axis shows the logarithmic scale of the example size , and the vertical axis shows the average RMSE of the compared methods. As predicted by the theorem, the results show that the average RMSE of input perturbation approaches the RMSE of non-private linear regression as increases. Therefore, when the number of instances is very large, the performance of input perturbation is almost the same as that of the non-private model, in agreement with Theorem 2.
Input perturbation is an approximation of objective perturbation with the Gaussian mechanism. So, at the limit of , the behavior of input perturbation is equivalent to that of objective perturbation with the Gaussian mechanism. This can be confirmed from the results, too. Even with small , we can see that the RMSEs of Input and ObjGauss are still quite close in both figures. This is because the difference of the excess risk of objective and input perturbation is in .
4.3 Differentially Private Logistic Regression Model Learning
Figure 2 shows the experimental results of differentially private logistic regression model learning. In Figure 2, the horizontal axis shows the logarithmic scale of the example size , and the vertical axis shows the average accuracy of the compared methods. As with linear regression, the average accuracy of input perturbation is almost the same as that of objective perturbation with the Gaussian mechanism when the example size is large, because the average accuracies of both methods approach the accuracy of non-private logistic regression as increases. However, when is small, the accuracy of input perturbation is slightly lower than that of objective perturbation with the Gaussian mechanism. This behavior may be caused by the approximation error of the logistic loss function.
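To give a feel for how a quadratic surrogate of the logistic loss can deviate from the true loss, the sketch below compares the logistic loss with its second-order Taylor expansion around a zero margin. Which approximation the experiments actually used is not assumed here; the Taylor expansion and the names `logistic_loss` and `quadratic_approx` are illustrative choices.

```python
import numpy as np

def logistic_loss(z):
    """log(1 + exp(-z)) for margin z, computed in a numerically stable way."""
    return np.logaddexp(0.0, -z)

def quadratic_approx(z):
    """Second-order Taylor expansion of the logistic loss around z = 0:
    log(2) - z/2 + z^2/8."""
    return np.log(2.0) - z / 2.0 + z ** 2 / 8.0

def approximation_gap(z):
    """Absolute difference between the surrogate and the true loss."""
    return np.abs(quadratic_approx(z) - logistic_loss(z))
```

The gap is tiny for margins near zero and grows quickly for large margins, so a quadratic surrogate penalizes confidently correct predictions that the logistic loss would barely penalize.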
5 Conclusion
In this study, we proposed a novel framework for differentially private ERM: input perturbation. In contrast to objective perturbation, input perturbation allows data contributors to take part in the process of privacy preservation of model learning. From the privacy analysis of the data released by the data contributors, the data collection process in input perturbation satisfies differential local privacy. Thus, from the perspective of data contributors, data collection with input perturbation can be preferable.
Models learned from examples randomized following the scheme of input perturbation are guaranteed to satisfy differential privacy. To achieve this approximation with randomization by independent data contributors, input perturbation requires that the loss function be quadratic with respect to the model parameter, . Applying other loss functions in our proposed method remains future work.
We compared the utility analysis and the empirical evaluation of input perturbation with those of output and objective perturbation in terms of the excess empirical risk against non-privacy-preserving ERM. We showed that the excess empirical risk of the model learned with input perturbation is , which is equivalent to that of objective perturbation in the optimal setting of for every data contributor.
References
 Bassily et al. (2014) Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 464–473. IEEE, October 2014. ISBN 9781479965175. doi: 10.1109/FOCS.2014.56.
 Chaudhuri et al. (2011) Kamalika Chaudhuri, Claire Monteleoni, and Anand D Sarwate. Differentially private empirical risk minimization. The Journal of Machine Learning Research, 12:1069–1109, 2011.
 Dasgupta and Schulman (2007) Sanjoy Dasgupta and Leonard Schulman. A probabilistic analysis of em for mixtures of separated, spherical gaussians. The Journal of Machine Learning Research, 8:203–226, 2007.
 Duchi et al. (2013) John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429–438. IEEE, 2013.
 Dwork et al. (2006a) Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, pages 486–503. Springer, 2006a.
 Dwork et al. (2006b) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265–284. Springer, 2006b.
 Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
 Evfimievski et al. (2003) Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 211–222. ACM, 2003.
 Jain and Thakurta (2014) Prateek Jain and Abhradeep Guha Thakurta. (near) dimension independent risk bounds for differentially private learning. In Proceedings of The 31st International Conference on Machine Learning, pages 476–484, 2014.
 Kairouz et al. (2014) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems, pages 2879–2887, 2014.
 Kasiviswanathan et al. (2011) Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
 Kifer et al. (2012) Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. Journal of Machine Learning Research, 1:41, 2012.
 Laurent and Massart (2000) Béatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302–1338, 2000.
 Lei (2011) Jing Lei. Differentially private M-estimators. In Advances in Neural Information Processing Systems, pages 361–369, 2011.
 Minnesota (2014) Population Center Minnesota. Integrated public use microdata series, international: Version 6.3 [machine-readable database]. Minneapolis: University of Minnesota, 2014.
 Rao (2009) C Radhakrishna Rao. Linear statistical inference and its applications, volume 22. John Wiley & Sons, 2009.
 Wainwright et al. (2012) Martin J Wainwright, Michael I Jordan, and John C Duchi. Privacy aware learning. In Advances in Neural Information Processing Systems, pages 1430–1438, 2012.
Appendix A Notation
Here, we summarize the notation in Table 2.
Notation  Description 

domain of the dimensional feature vector  
output domain  
database of examples  
the ith example of database  
a vector computed by  
a vector computed by  
domain of the model parameter  
the model parameter  
the upper bound of for any  
the loss function  
the upper bound of  
the upper bound of  
the average loss function  
the optimal parameter of the average loss function  
the objective function of ERM  
the optimal parameter of ERM  
the differential privacy parameters  
the variance of Gaussian distribution  
the variance of Gaussian distribution  
is added with noise from  
is added with noise from  
with noise added  
the objective function of output perturbation  
the optimal parameter of output perturbation  
the objective function of objective perturbation  
the optimal parameter of objective perturbation  
the objective function of input perturbation  
the optimal parameter of input perturbation 
Appendix B Proof of Lemma 1
We first introduce known results in order to prove this lemma.
Lemma 3 (Rao (2009)).
Let be a random matrix drawn from a Wishart distribution with degrees of freedom and variance matrix . Let be a nonzero constant vector. Then,
where is the chi-squared distribution with degrees of freedom and . (Note that is a constant; it is positive because is positive definite.)
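The classical fact behind this lemma is that a quadratic form in a Wishart matrix, divided by the matching quadratic form in the scale matrix, follows a chi-squared distribution whose degrees of freedom equal those of the Wishart distribution. The Monte Carlo sketch below checks this numerically; the names `Sigma`, `a`, and `dof` are assumed notation, and the Wishart samples are built as sums of outer products of Gaussian vectors.

```python
import numpy as np

def quadratic_form_ratios(Sigma, a, dof, trials, seed=0):
    """Sample (a^T W a) / (a^T Sigma a) for W ~ Wishart(dof, Sigma)."""
    rng = np.random.default_rng(seed)
    d = Sigma.shape[0]
    scale = a @ Sigma @ a  # positive when Sigma is positive definite, a != 0
    ratios = np.empty(trials)
    for t in range(trials):
        # dof independent N(0, Sigma) rows; G^T G is a Wishart sample
        G = rng.multivariate_normal(np.zeros(d), Sigma, size=dof)
        W = G.T @ G
        ratios[t] = (a @ W @ a) / scale
    return ratios
```

If the lemma holds, the returned samples should have mean close to `dof` and variance close to `2 * dof`, the first two moments of a chi-squared variable.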
Lemma 4 (Laurent and Massart (2000)).
Let . Then, for any ,
(7) 
Also, for any ,
(8) 
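For reference, the chi-squared tail bounds of Laurent and Massart (2000) are commonly stated in the following form, for a chi-squared variable $U$ with $k$ degrees of freedom and any $x > 0$. The symbols $U$, $k$, and $x$ are assumed notation, and the lemma above may state a rescaled variant of the same inequalities.

```latex
\Pr\left[\, U - k \ge 2\sqrt{k x} + 2x \,\right] \le e^{-x},
\qquad
\Pr\left[\, k - U \ge 2\sqrt{k x} \,\right] \le e^{-x}.
```

The first inequality controls the upper tail and the second the lower tail; the lower tail is lighter, which is why it needs no additional $2x$ term.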
Lemma 5.
Let , then for all , we have
Lemma 6.
Let , where , and . Then, with probability at least we get the following bound:
Proof.
holds because . By using Lemma 3, we thus get . Noting that , the upper bound of is derived by applying Eq. (7) of Lemma 4 as follows:
(9) 
In a similar manner, by applying Eq. (8) of Lemma 4, the lower bound of is given as follows:
(10) 
By setting , then we get . Replacing the value of as and combining Eq. (9) and Eq. (10) gives our claim. ∎
Next, we investigate the tail bound of .
Lemma 7.
Let , where . For , with probability at least ,
Proof.
Let . Since , we have . From the property of the sum of the normally distributed independent random variables, we have . Since holds, we get . . Application of Lemma 5 thus yields
(11) 
By setting , then we get . To make sure , we need to have . This is always true for any . Replacing the value of , with probability at least we get the following bound.
We get the claim since . ∎
Proof of Lemma 1.
Here is a corollary of Lemma 1.
Corollary 2.
Let , where . Let . Then for any , we get the following:
(12) 
Proof.
Corollary 3.
When , we have

the upper bound of is with

the lower bound of is .
Proof.
We derive the lower bound of as
We derive the upper bound of as
Letting , we have . Hence,