Adversarial Privacy Preservation under Attribute Inference Attack
Abstract
With the prevalence of machine learning services, crowdsourced data containing sensitive information poses substantial privacy challenges. Existing work that focuses on protecting against membership inference attacks under the rigorous framework of differential privacy is vulnerable to attribute inference attacks. In light of the current gap between theory and practice, we develop a novel theoretical framework for privacy preservation under the attack of attribute inference. Under our framework, we propose a minimax optimization formulation to protect the given attribute and analyze its privacy guarantees against arbitrary adversaries. On the other hand, it is clear that a privacy constraint may cripple utility when the protected attribute is correlated with the target variable. To this end, we also prove an information-theoretic lower bound to precisely characterize the fundamental tradeoff between utility and privacy. Empirically, we conduct extensive experiments to corroborate our privacy guarantee and validate the inherent tradeoffs in different privacy preservation algorithms. Our experimental results indicate that the adversarial representation learning approaches achieve the best tradeoff in terms of privacy preservation and utility maximization.
1 Introduction
With the growing demand for machine learning systems provided as services, a massive amount of data containing sensitive information, such as race, income level, and age, is generated and collected from local users. This poses a substantial privacy challenge, and it has become an imperative object of study in machine learning (Abadi et al., 2016; Gilad-Bachrach et al., 2016), computer vision (Chou et al., 2018; Wu et al., 2018), healthcare (Beaulieu-Jones et al., 2018b, a), security (Shokri et al., 2017), and many other domains. In this paper, we consider a practical scenario where the prediction vendor requests crowdsourced data for a target task, e.g., scientific modeling. The data owner agrees to the data usage for the target task but does not want her other private information (e.g., age, race) to be leaked. The goal of privacy preservation in this context is then to protect private attributes of the sanitized data released by the data owner from potential attribute inference attacks of a malicious adversary. For example, in an online advertising scenario, while the user (data owner) may agree to share her historical purchasing events, she also wants to protect her age information so that no malicious adversary can infer her age range from the shared data. Note that simply removing the age attribute from the shared data is insufficient for this purpose, due to the redundant encoding in data, i.e., other attributes may have a high correlation with age.
Among many other techniques, differential privacy (DP) has been proposed and extensively investigated to protect the privacy of collected data (Dwork & Nissim, 2004; Dwork et al., 2006). DP provides formal guarantees for privacy problems such as defending against membership query attacks (Abadi et al., 2016; Papernot et al., 2016), or ensuring that the distributions of any two data records are statistically indistinguishable (Erlingsson et al., 2014; Duchi et al., 2013; Bassily & Smith, 2015). However, DP remains vulnerable to attribute inference attacks (Fredrikson et al., 2015; Cormode, 2011; Gong & Liu, 2016), as it only prevents an adversary from gaining additional knowledge by inclusion/exclusion of a subject, not from gaining knowledge from the data itself (Dwork et al., 2014). As a result, an adversary can still accurately infer sensitive attributes of data owners from differentially private datasets. Such a gap between theory and practice calls for an important and appealing challenge: Can we find a representation of the raw data that removes private information related to a sensitive attribute while still preserving our utility on the target task? If not, what is the fundamental tradeoff between privacy preservation and utility maximization?
Clearly, under the setting of attribute inference attacks, the notion of privacy preservation should be attribute-specific: the goal is to protect specific attributes from being inferred by malicious adversaries. Note that this is in sharp contrast with differential privacy, where mechanisms are usually designed to resist worst-case membership queries among all the data owners. From this perspective, our relaxed definition of privacy also allows for a more flexible design of algorithms with better utility.
Our Contributions
In this paper, we first formally define the notions of utility and privacy. We justify why our definitions are particularly suited to the setting of attribute inference attacks. Through the lens of representation learning, we then formulate the problem of utility maximization with a privacy constraint as a minimax optimization problem that can be effectively and practically implemented. To provide a formal guarantee on privacy preservation, we prove an information-theoretic lower bound on the inference error of the protected attribute under attacks from arbitrary adversaries. To investigate the relationship between privacy preservation and utility maximization, we also provide a theoretical result to formally characterize the inherent tradeoff between these two concepts. Empirically, we conduct extensive experiments to corroborate our privacy guarantee and validate the inherent tradeoffs in different privacy preservation algorithms. From our empirical results, we conclude that the adversarial representation learning approach achieves the best tradeoff in terms of privacy preservation and utility maximization, among various state-of-the-art privacy preservation algorithms.
2 Preliminary
We first introduce our problem setting and the notation used throughout the paper, and then formally define the notions of utility and privacy discussed in this paper.
2.1 Problem Setup and Notation
Problem Setup
We focus on the setting where the goal of the adversary is to perform attribute inference. This setting is ubiquitous in the server-client paradigm where machine learning is provided as a service (MLaaS, Ribeiro et al. (2015)). Formally, there are two parties in the system, namely the prediction vendor and the data owner. We consider the practical scenarios where users agree to contribute their data for training a machine learning model for specific purposes but do not want others to infer their private attributes in the data, such as health information, race, gender, etc. The prediction vendor will not collect raw user data but only processed user data and the target attribute for the target task. In our setting, we assume the adversary has no auxiliary information beyond the processed user data. In this case, the adversary can be anyone who can get access to the processed user data to some extent and wants to infer other private information. For example, malicious machine learning service providers are motivated to infer more information from users to do user profiling and targeted advertisements. The goal of the data owner is to provide as much information as possible to the prediction vendor to maximize the vendor's own utility, but under the constraint that the data owner should also protect the private information of the data source.
Notation
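We use $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{A}$ to denote the input, output and adversary's output space, respectively. Accordingly, we use $X$, $Y$, $A$ to denote the random variables which take values in $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{A}$. We note that in our framework the input space $\mathcal{X}$ may or may not contain the private attribute $A$. For two random variables $X$ and $Y$, $I(X; Y)$ denotes the mutual information between $X$ and $Y$. We use $H(X)$ to mean the Shannon entropy of random variable $X$. Similarly, we use $H(X \mid Y)$ to denote the conditional entropy of $X$ given $Y$. We assume there is a joint distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y} \times \mathcal{A}$ from which the data are sampled. To make our notation consistent, we use $\mathcal{D}_X$, $\mathcal{D}_Y$ and $\mathcal{D}_A$ to denote the marginal distribution of $\mathcal{D}$ over $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{A}$. Given a feature map function $f: \mathcal{X} \to \mathcal{Z}$ that maps instances from the input space $\mathcal{X}$ to feature space $\mathcal{Z}$, we define $\mathcal{D}^f := \mathcal{D} \circ f^{-1}$ to be the induced (pushforward) distribution of $\mathcal{D}$ under $f$, i.e., for any event $E' \subseteq \mathcal{Z}$, $\Pr_{\mathcal{D}^f}(E') := \Pr_{\mathcal{D}}(f^{-1}(E')) = \Pr_{\mathcal{D}}(\{x \in \mathcal{X} \mid f(x) \in E'\})$.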
2.2 Utility and Privacy
To simplify the exposition, we mainly discuss the attribute inference setting where $\mathcal{Y} = \mathcal{A} = \{0, 1\}$, but the underlying theory and methodology could easily be extended to the categorical case as well. In what follows, we shall first formally define both the utility of the prediction vendor and the privacy of the data owner. It is worth pointing out that our definition of privacy is attribute-specific, and this is in contrast with the classic framework of differential privacy where the goal is to preserve privacy in the general and worst-case query scenario. In particular, we seek to keep the utility of the data while protecting specific information against attacks from an adversary.
A hypothesis is a function $h: \mathcal{X} \to \mathcal{Y}$. The error of a hypothesis $h$ under the distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$ is defined as: $\mathrm{Err}(h) := \mathbb{E}_{\mathcal{D}}[|Y - h(X)|]$. Similarly, we use $\widehat{\mathrm{Err}}(h)$ to denote the empirical error of $h$ on a sample from $\mathcal{D}$. For binary classification problems, when $h(x) \in \{0, 1\}$, the above loss also reduces to the error rate of classification. Let $\mathcal{H}$ be the Hilbert space of hypotheses. In the context of binary classification, we define the utility of a hypothesis as the opposite of error:
Definition 2.1 (Utility).
The utility of a hypothesis $h \in \mathcal{H}$ is $\mathrm{Util}(h) := 1 - \mathrm{Err}(h)$.
For binary classification, we always have $0 \le \mathrm{Util}(h) \le 1$. Now we proceed to define a measure of privacy in our framework:
Definition 2.2 (Privacy).
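The privacy w.r.t. task $A$ under attacks from $\mathcal{H}_A$ is defined as $\mathrm{Priv}(\mathcal{H}_A) := 1 - \max_{h_A \in \mathcal{H}_A} \left|\Pr_{\mathcal{D}}(h_A(X) = 1 \mid A = 1) - \Pr_{\mathcal{D}}(h_A(X) = 1 \mid A = 0)\right|$.
Again, it is straightforward to verify that $0 \le \mathrm{Priv}(\mathcal{H}_A) \le 1$. Based on our definition, $\mathrm{Priv}(\mathcal{H}_A)$ then measures the privacy of data under possible attacks from adversaries in $\mathcal{H}_A$. We can also refine the above definition to a particular hypothesis $h_A: \mathcal{X} \to \{0, 1\}$ to measure its ability to steal information about $A$: $\mathrm{Priv}(h_A) := 1 - \left|\Pr_{\mathcal{D}}(h_A(X) = 1 \mid A = 1) - \Pr_{\mathcal{D}}(h_A(X) = 1 \mid A = 0)\right|$.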
Proposition 2.1.
Let $h_A: \mathcal{X} \to \{0, 1\}$ be a hypothesis, then $\mathrm{Priv}(h_A) = 1$ iff $I(h_A(X); A) = 0$ and $\mathrm{Priv}(h_A) = 0$ iff $h_A(X) = A$ almost surely or $h_A(X) = 1 - A$ almost surely.
Proposition 2.1 justifies our definition of privacy: when $\mathrm{Priv}(h_A) = 1$, it means that $h_A(X)$ contains no information about the sensitive attribute $A$. On the other hand, if $\mathrm{Priv}(h_A) = 0$, then $h_A(X)$ fully predicts $A$ (or equivalently, $1 - A$) from input $X$. In the latter case $h_A(X)$ also contains perfect information of $A$ in the sense that $I(h_A(X); A) = H(A)$, i.e., the Shannon entropy of $A$. It is worth pointing out that our definition of privacy is insensitive to the marginal distribution of $A$, and hence is more robust than other definitions such as the error rate of predicting $A$. In that case, if $A$ is extremely imbalanced, even a naive predictor can attain small prediction error by simply outputting a constant. We call a hypothesis space $\mathcal{H}_A$ symmetric if $\forall h_A \in \mathcal{H}_A$, $1 - h_A \in \mathcal{H}_A$ as well. Interestingly, when $\mathcal{H}_A$ is symmetric, we can also relate the privacy to a binary classification problem:
Proposition 2.2.
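If $\mathcal{H}_A$ is symmetric, then $\mathrm{Priv}(\mathcal{H}_A) = \min_{h_A \in \mathcal{H}_A} \Pr(h_A(X) = 1 \mid A = 0) + \Pr(h_A(X) = 0 \mid A = 1)$.
Table 1: Confusion matrix between the actual private attribute $A$ and its predicted variable $h_A(X)$.
        | $h_A(X) = 0$ | $h_A(X) = 1$
$A = 0$ | TN           | FP
$A = 1$ | FN           | TP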
Remark
Consider the confusion matrix between the actual private attribute $A$ and its predicted variable $h_A(X)$ in Table 1. The false positive rate (eqv. Type-I error) is defined as FPR = FP / (FP + TN) and the false negative rate (eqv. Type-II error) is similarly defined as FNR = FN / (FN + TP). Using the terminology of the confusion matrix, it is then clear that $\Pr(h_A(X) = 1 \mid A = 0) = \mathrm{FPR}$ and $\Pr(h_A(X) = 0 \mid A = 1) = \mathrm{FNR}$. In other words, Proposition 2.2 says that if $\mathcal{H}_A$ is symmetric, then the privacy of a hypothesis space $\mathcal{H}_A$ corresponds to the minimum sum of Type-I and Type-II error that is achievable under attacks from $\mathcal{H}_A$.
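As an illustration of Definitions 2.1 and 2.2, the following minimal Python sketch computes empirical utility and privacy from lists of binary labels and predictions. It assumes that the privacy of a single adversary is one minus the absolute gap between $\Pr(h_A(X) = 1 \mid A = 1)$ and $\Pr(h_A(X) = 1 \mid A = 0)$; the function names and the toy samples are hypothetical and not taken from the paper's experiments.

```python
def error_rate(y_true, y_pred):
    """Empirical Err(h): mean absolute difference between labels and predictions."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def utility(y_true, y_pred):
    """Empirical Util(h) = 1 - Err(h)."""
    return 1.0 - error_rate(y_true, y_pred)


def conditional_rate(a_true, a_pred, a_value, predicted):
    """Empirical Pr(h_A(X) = predicted | A = a_value)."""
    group = [p for a, p in zip(a_true, a_pred) if a == a_value]
    return sum(1 for p in group if p == predicted) / len(group)


def privacy(a_true, a_pred):
    """Assumed form: Priv(h_A) = 1 - |Pr(h_A=1 | A=1) - Pr(h_A=1 | A=0)|."""
    tpr = conditional_rate(a_true, a_pred, 1, 1)
    fpr = conditional_rate(a_true, a_pred, 0, 1)
    return 1.0 - abs(tpr - fpr)


def fpr_plus_fnr(a_true, a_pred):
    """Sum of Type-I and Type-II error rates of the adversary."""
    fpr = conditional_rate(a_true, a_pred, 0, 1)
    fnr = conditional_rate(a_true, a_pred, 1, 0)
    return fpr + fnr


# Balanced toy sample (hypothetical): the adversary is right on 6 of 8 records.
a_true = [0, 0, 0, 0, 1, 1, 1, 1]
a_pred = [0, 0, 0, 1, 1, 1, 1, 0]

# Imbalanced toy sample (hypothetical): a constant adversary has a low error
# rate but reveals nothing about A, so its privacy value stays at 1.
rare_true = [0] * 9 + [1]
rare_pred = [0] * 10

print(privacy(a_true, a_pred), fpr_plus_fnr(a_true, a_pred))
print(error_rate(rare_true, rare_pred), privacy(rare_true, rare_pred))
```

On the balanced sample the privacy value coincides with FPR + FNR, which matches the reading of Proposition 2.2 for an adversary whose true positive rate exceeds its false positive rate. The imbalanced sample illustrates why the error rate alone is a poor privacy measure.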
3 Minimax Optimization against Attribute Inference Attacks
3.1 Minimax Formulation
Given a set of samples $S = \{(x_i, y_i, a_i)\}_{i=1}^{n}$ drawn i.i.d. from the joint distribution $\mathcal{D}$, how can the data owner keep the utility of the data while keeping the sensitive attribute $A$ private under potential attacks from a malicious adversary? Through the lens of representation learning, we seek to find a (nonlinear) feature representation $f: \mathcal{X} \to \mathcal{Z}$ from input space $\mathcal{X}$ to feature space $\mathcal{Z}$ such that $f$ still preserves relevant information w.r.t. the target task of inferring $Y$ while hiding the sensitive attribute $A$. Specifically, we can solve the following unconstrained regularized problem with $\lambda > 0$:
$$\min_{h \in \mathcal{H},\, f}\ \max_{h_A \in \mathcal{H}_A}\ \widehat{\mathrm{Err}}(h \circ f) - \lambda \left( \widehat{\Pr}(h_A(f(X)) = 1 \mid A = 0) + \widehat{\Pr}(h_A(f(X)) = 0 \mid A = 1) \right) \tag{1}$$
It is worth pointing out that the optimization formulation in (1) admits an interesting game-theoretic interpretation, where two agents $f$ and $h_A$ play a game whose score is defined by the objective function in (1). Intuitively, $h_A$ seeks to minimize the sum of Type-I and Type-II error while $f$ plays against $h_A$ by learning a transformation that removes information about the sensitive attribute $A$. Algorithmically, for the data owner to achieve the goal of hiding information about the sensitive attribute $A$ from a malicious adversary, it suffices to learn a representation $Z = f(X)$ that is independent of $A$. Formally:
Proposition 3.1.
Let $f: \mathcal{X} \to \mathcal{Z}$ be a deterministic function and $\mathcal{H}_A$ be a hypothesis class over $\mathcal{Z}$. For any joint distribution $\mathcal{D}$ over $(X, A, Y)$, if $I(f(X); A) = 0$, then $\mathrm{Priv}(\mathcal{H}_A \circ f) = 1$.
Note that in this sequential game, $f$ is the first-mover and $h_A$ is the second. Hence without an explicit constraint $f$ possesses a first-mover advantage, so that $f$ can dominate the game by simply mapping all the input $X$ to a constant or to uniformly random noise. To avoid these degenerate cases, the first term in the objective function of (1) acts as an incentive to encourage $f$ to preserve task-related information. But will this incentive compromise our privacy? As an extreme case, if the target variable $Y$ and the sensitive attribute $A$ are perfectly correlated, then it should be clear that there is a tradeoff between achieving utility and preserving privacy. In Sec. 4 we shall provide an information-theoretic bound to precisely characterize such inherent tradeoff. Furthermore, although the formulation in (1) only works for defense over a single attribute, it is straightforward to extend the formulation so that it can protect against attacks over multiple attributes. Due to space limits, we defer the discussion of this extension to Section C in the appendix.
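To make the game-theoretic reading of (1) concrete, the sketch below solves a tiny instance by brute force. It assumes the objective is the error of $h \circ f$ minus $\lambda$ times the adversary's FPR + FNR, evaluated on a population rather than a sample. The four-point population, the candidate feature maps, and all function names are hypothetical and chosen only for illustration; practical methods optimize neural networks with gradient-based updates instead of enumeration.

```python
from itertools import product

# Hypothetical toy population: each tuple is (x, y, a, probability).
# Here y = x // 2 and a = x % 2, so Y and A are independent.
POPULATION = [
    (0, 0, 0, 0.25),
    (1, 0, 1, 0.25),
    (2, 1, 0, 0.25),
    (3, 1, 1, 0.25),
]

# Candidate feature maps f: X -> Z (illustrative only).
FEATURE_MAPS = {
    "identity": lambda x: x,
    "constant": lambda x: 0,
    "keep_y_bit": lambda x: x // 2,
    "keep_a_bit": lambda x: x % 2,
}


def all_binary_functions(domain):
    """Enumerate every function from a finite domain to {0, 1} as a dict."""
    domain = sorted(domain)
    for bits in product([0, 1], repeat=len(domain)):
        yield dict(zip(domain, bits))


def objective(f, h, h_a, lam):
    """Err(h o f) - lam * (FPR + FNR of h_a o f on the attribute A)."""
    err = sum(p for x, y, _, p in POPULATION if h[f(x)] != y)
    p_a0 = sum(p for _, _, a, p in POPULATION if a == 0)
    p_a1 = 1.0 - p_a0
    fpr = sum(p for x, _, a, p in POPULATION if a == 0 and h_a[f(x)] == 1) / p_a0
    fnr = sum(p for x, _, a, p in POPULATION if a == 1 and h_a[f(x)] == 0) / p_a1
    return err - lam * (fpr + fnr)


def solve(lam):
    """Return (value, feature_map_name) of the brute-force minimax solution."""
    best = None
    for name, f in FEATURE_MAPS.items():
        z_values = {f(x) for x, _, _, _ in POPULATION}
        for h in all_binary_functions(z_values):
            # The adversary moves second and picks the worst case for (f, h).
            worst = max(
                objective(f, h, h_a, lam)
                for h_a in all_binary_functions(z_values)
            )
            if best is None or worst < best[0]:
                best = (worst, name)
    return best


print(solve(1.0))
```

In this toy instance the map that keeps only the bit determining $Y$ wins for $\lambda = 1$: the task error is zero and no adversary does better than FPR + FNR = 1. The constant map also blinds the adversary but pays for it in task error, which is the degenerate case that the first term of (1) discourages.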
3.2 Privacy Guarantees on Attribute Inference Attacks
In the last section we proposed the unconstrained minimax formulation (1) to optimize both our utility and the defined privacy measure. Clearly, the hyperparameter $\lambda$ measures the tradeoff between utility and our privacy. On one hand, if $\lambda \to 0$, we barely care about the privacy and devote all the focus to maximizing our utility. On the other extreme, if $\lambda \to \infty$, we are only interested in protecting the privacy. In what follows we analyze the true error that an optimal adversary has to incur in the limit when both the task classifier and the adversary have unlimited capacity, i.e., they can be any randomized functions from $\mathcal{Z}$ to $\{0, 1\}$. To study the true error, we hence use the population loss rather than the empirical loss in our objective function. Furthermore, since the binary classification error in (1) is NP-hard to optimize even for the hypothesis class of linear predictors, in practice we consider the cross-entropy loss function as a convex surrogate loss. The cross-entropy loss of a probabilistic hypothesis $h: \mathcal{X} \to [0, 1]$ w.r.t. $Y$ on a distribution $\mathcal{D}$ is defined as follows:
$$\mathrm{CE}_Y(h) := -\mathbb{E}_{\mathcal{D}}\left[\mathbb{I}(Y = 0) \log(1 - h(X)) + \mathbb{I}(Y = 1) \log(h(X))\right] \tag{2}$$
With a slight abuse of notation, we use $\mathrm{CE}_A(h_A)$ to mean the cross-entropy loss of the adversary $h_A$ w.r.t. $A$. Using the same notation, the optimization formulation with cross-entropy loss becomes:
$$\min_{h \in \mathcal{H},\, f}\ \max_{h_A \in \mathcal{H}_A}\ \mathrm{CE}_Y(h \circ f) - \lambda \cdot \mathrm{CE}_A(h_A \circ f) \tag{3}$$
Given a feature map $f: \mathcal{X} \to \mathcal{Z}$, assume that $\mathcal{H}$ contains all the possible randomized classifiers from the feature space $\mathcal{Z}$ to $\{0, 1\}$. For example, a randomized classifier can be constructed by first defining a probabilistic function $h: \mathcal{Z} \to [0, 1]$ followed by a random coin flip to determine the output label, where the probability of the coin being 1 is given by $h(Z)$. Under such assumptions, the following lemma shows that the optimal target classifier under $f$ is given by the conditional distribution $h^*(Z) := \Pr(Y = 1 \mid Z)$.
Lemma 3.1.
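For any feature map $f: \mathcal{X} \to \mathcal{Z}$, assume that $\mathcal{H}$ contains all the randomized binary classifiers, then $\min_{h \in \mathcal{H}} \mathrm{CE}_Y(h \circ f) = H(Y \mid Z)$ and $h^*(Z) = \arg\min_{h \in \mathcal{H}} \mathrm{CE}_Y(h \circ f) = \Pr(Y = 1 \mid Z = f(X))$.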
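Lemma 3.1 can be checked numerically on a small discrete example. The sketch below assumes the cross-entropy loss takes the standard form $-\mathbb{E}[\mathbb{I}(Y=1)\log h(Z) + \mathbb{I}(Y=0)\log(1 - h(Z))]$ with logarithms in base 2, so that losses and entropies are both measured in bits. The joint distribution over $(Z, Y)$ and the function names are hypothetical.

```python
import math

# Hypothetical joint distribution over (z, y): probability of each pair.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def cross_entropy(h):
    """Population cross-entropy (bits) of a classifier h(z) = Pr(predict 1 | z)."""
    total = 0.0
    for (z, y), p in JOINT.items():
        q = h(z) if y == 1 else 1.0 - h(z)
        total -= p * math.log2(q)
    return total


def conditional_prob(z):
    """Pr(Y = 1 | Z = z) under JOINT, the candidate optimal classifier."""
    p_z = JOINT[(z, 0)] + JOINT[(z, 1)]
    return JOINT[(z, 1)] / p_z


def conditional_entropy():
    """H(Y | Z) in bits under JOINT."""
    total = 0.0
    for (z, y), p in JOINT.items():
        p_z = JOINT[(z, 0)] + JOINT[(z, 1)]
        total -= p * math.log2(p / p_z)
    return total


best = cross_entropy(conditional_prob)
uninformed = cross_entropy(lambda z: 0.5)
perturbed = cross_entropy(lambda z: 0.3 if z == 0 else 0.7)
print(best, conditional_entropy(), uninformed, perturbed)
```

The loss of the conditional-probability classifier matches $H(Y \mid Z)$, while the two other classifiers incur a strictly larger loss, in line with the lemma. The same computation with $A$ in place of $Y$ gives the loss of the optimal adversary.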
By a symmetric argument, we can also see that the worst-case (optimal) adversary under $f$ is the conditional distribution $h_A^*(Z) := \Pr(A = 1 \mid Z)$ and $\min_{h_A \in \mathcal{H}_A} \mathrm{CE}_A(h_A \circ f) = H(A \mid Z)$. Hence we can further simplify the optimization formulation (3) to the following form where the only optimization variable is the feature map $f$:
$$\min_{Z = f(X)}\ H(Y \mid Z) - \lambda \cdot H(A \mid Z) \tag{4}$$
Since $Z = f(X)$ is a deterministic feature map, it follows from the basic properties of Shannon entropy that
$$H(Y \mid X) \le H(Y \mid Z) \le H(Y), \qquad H(A \mid X) \le H(A \mid Z) \le H(A),$$
which means that $H(Y \mid X) - \lambda \cdot H(A)$ is a lower bound of the optimum of the objective function in (4). However, such a lower bound is not necessarily achievable. To see this, consider the simple case where $Y = A$ almost surely. In this case there exists no deterministic feature map $Z = f(X)$ that is a sufficient statistic of $X$ w.r.t. $Y$ while simultaneously filtering out all the information w.r.t. $A$, except in the degenerate case where $A$ is constant. On the other hand, to show that solving the optimization problem in (4) helps to protect our privacy, the following theorem gives a bound of privacy in terms of the error that has to be incurred by the optimal adversary:
Theorem 3.1.
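Let $f^*$ be the optimal feature map of (4) and define $H^* := H(A \mid Z = f^*(X))$. Then for any adversary $\widehat{A}$ such that $I(\widehat{A}; A \mid Z) = 0$, $\Pr(\widehat{A} \neq A) \ge H^* / \left(2 \lg(6 / H^*)\right)$.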
Remark
Theorem 3.1 shows that whenever the conditional entropy $H^* = H(A \mid Z = f^*(X))$ is large, the inference error of the protected attribute incurred by any (randomized) adversary has to be at least $\Omega(H^* / \log(1 / H^*))$. As we have already shown above, the conditional entropy essentially corresponds to the second term in our objective function, whose optimal value could further be flexibly adjusted by tuning the tradeoff parameter $\lambda$. As a final note, Theorem 3.1 also shows that representation learning helps to protect the privacy about $A$, since we always have $H(A \mid Z) \ge H(A \mid X)$ for any deterministic feature map $f$, so that the lower bound of inference error by any adversary is larger after learning the representation $Z$.
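For a feel of the magnitude of this guarantee, the sketch below evaluates the bound numerically. It assumes the bound takes the form $H^* / (2 \lg(6 / H^*))$ with $H^*$ measured in bits, in the style of the lower bound on the inverse binary entropy function from Calabro (2009); this exact form is an assumption of the sketch. The function name is hypothetical.

```python
import math


def inference_error_lower_bound(h_star):
    """Evaluate H* / (2 * log2(6 / H*)) for a conditional entropy H* in (0, 1] bits."""
    if not 0.0 < h_star <= 1.0:
        raise ValueError("for a binary attribute, H* must lie in (0, 1] bits")
    return h_star / (2.0 * math.log2(6.0 / h_star))


# Hypothetical conditional entropies of the protected attribute given Z.
for h_star in (0.1, 0.5, 1.0):
    print(h_star, inference_error_lower_bound(h_star))
```

Under this assumed form the certificate grows with $H^*$, which is why the estimate of the conditional entropy reached during training can be reported as a privacy guarantee. It is loose at the top of the range: when $H^* = 1$ bit the true error of any adversary is $1/2$, whereas the bound gives roughly $0.19$.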
4 Inherent tradeoff between Utility and Privacy
As we briefly mentioned in Sec. 3.1, when the protected sensitive attribute and the target variable are perfectly correlated, it is impossible to simultaneously achieve the goals of privacy preservation and utility maximization. But what is the exact tradeoff between utility and privacy when they are correlated? In this section we shall provide an information-theoretic bound to quantitatively characterize the inherent tradeoff between privacy and utility, due to the discrepancy between the conditional distributions of the target variable given the sensitive attribute. Our result is algorithm-independent, hence it applies to a general setting where there is a need to preserve both utility and privacy. To the best of our knowledge, this is the first information-theoretic result to precisely quantify such a tradeoff. Due to space limits, we defer all the proofs to the appendix.
Before we proceed, we first define several information-theoretic concepts that will be used in our analysis. For two distributions $\mathcal{D}$ and $\mathcal{D}'$, the Jensen-Shannon (JS) divergence $D_{\mathrm{JS}}(\mathcal{D} \,\|\, \mathcal{D}')$ is:
$$D_{\mathrm{JS}}(\mathcal{D} \,\|\, \mathcal{D}') := \frac{1}{2} D_{\mathrm{KL}}(\mathcal{D} \,\|\, \mathcal{D}_M) + \frac{1}{2} D_{\mathrm{KL}}(\mathcal{D}' \,\|\, \mathcal{D}_M),$$
where $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ is the Kullback–Leibler (KL) divergence and $\mathcal{D}_M := (\mathcal{D} + \mathcal{D}') / 2$. The JS divergence can be viewed as a symmetrized and smoothed version of the KL divergence, and it is upper bounded by the $L_1$ distance (total variation) between two distributions through Lin's Lemma:
Lemma 4.1 (Lin (1991)).
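Let $\mathcal{D}$ and $\mathcal{D}'$ be two distributions, then $D_{\mathrm{JS}}(\mathcal{D} \,\|\, \mathcal{D}') \le \frac{1}{2} \|\mathcal{D} - \mathcal{D}'\|_1$.
Unlike the KL divergence, the JS divergence is bounded: $0 \le D_{\mathrm{JS}}(\mathcal{D} \,\|\, \mathcal{D}') \le 1$. Additionally, from the JS divergence, we can define a distance metric between two distributions as well, known as the JS distance (Endres & Schindelin, 2003): $d_{\mathrm{JS}}(\mathcal{D}, \mathcal{D}') := \sqrt{D_{\mathrm{JS}}(\mathcal{D} \,\|\, \mathcal{D}')}$. With respect to the JS distance, for any feature space $\mathcal{Z}$ and any deterministic mapping $f: \mathcal{X} \to \mathcal{Z}$, we can prove the following lemma via the celebrated data processing inequality: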
Lemma 4.2.
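Let $\mathcal{D}$ and $\mathcal{D}'$ be two distributions over $\mathcal{X}$ and let $\mathcal{D}^f$ and $\mathcal{D}'^f$ be the induced distributions of $\mathcal{D}$ and $\mathcal{D}'$ over $\mathcal{Z}$ by function $f$, then $d_{\mathrm{JS}}(\mathcal{D}^f, \mathcal{D}'^f) \le d_{\mathrm{JS}}(\mathcal{D}, \mathcal{D}')$.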
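The quantities in Lemma 4.1 and Lemma 4.2 are easy to compute for discrete distributions. The following sketch uses base-2 logarithms, so that the JS divergence lies in $[0, 1]$, and checks both lemmas on one hypothetical pair of distributions over four points with a hypothetical coarsening map; the function names are not from the paper.

```python
import math


def kl_divergence(p, q):
    """KL divergence in bits between two discrete distributions given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_distance(p, q):
    """JS distance: square root of the JS divergence."""
    return math.sqrt(js_divergence(p, q))


def l1_distance(p, q):
    """L1 distance between two discrete distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def pushforward(p, f, num_outputs):
    """Distribution of f(X) when X ~ p over {0, ..., len(p) - 1}."""
    out = [0.0] * num_outputs
    for x, px in enumerate(p):
        out[f(x)] += px
    return out


# Hypothetical distributions over X = {0, 1, 2, 3}.
d0 = [0.5, 0.3, 0.1, 0.1]
d1 = [0.1, 0.1, 0.3, 0.5]

# Hypothetical deterministic feature map that merges neighbouring points.
coarsen = lambda x: x // 2
d0_f = pushforward(d0, coarsen, 2)
d1_f = pushforward(d1, coarsen, 2)

print(js_divergence(d0, d1), 0.5 * l1_distance(d0, d1))  # Lin's lemma
print(js_distance(d0_f, d1_f), js_distance(d0, d1))      # data processing
```

Merging points can only bring the two induced distributions closer in JS distance, which is the property that the analysis of the Markov chain relies on.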
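Without loss of generality, any method aiming to predict the target variable $Y$ defines a Markov chain as $X \xrightarrow{f} Z \xrightarrow{h} \widehat{Y}$, where $\widehat{Y}$ is the predicted target variable given by hypothesis $h$ and $Z$ is the intermediate representation defined by the feature mapping $f$. Hence for any distribution $\mathcal{D}_a$ of $X$, this Markov chain also induces a distribution $\mathcal{D}_a^{f}$ of $Z$ and a distribution $\mathcal{D}_a^{h \circ f}$ of $\widehat{Y}$. Now let $\mathcal{D}_a^{Y}$ be the underlying true conditional distribution of $Y$ given $A = a$. Since the JS distance is a metric, the following chain of triangle inequalities holds:
$$d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_1^{Y}) \le d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_0^{h \circ f}) + d_{\mathrm{JS}}(\mathcal{D}_0^{h \circ f}, \mathcal{D}_1^{h \circ f}) + d_{\mathrm{JS}}(\mathcal{D}_1^{h \circ f}, \mathcal{D}_1^{Y}).$$
Combining the above inequality with Lemma 4.2, which shows
$$d_{\mathrm{JS}}(\mathcal{D}_0^{h \circ f}, \mathcal{D}_1^{h \circ f}) \le d_{\mathrm{JS}}(\mathcal{D}_0^{f}, \mathcal{D}_1^{f}),$$
we immediately have:
$$d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_1^{Y}) \le d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_0^{h \circ f}) + d_{\mathrm{JS}}(\mathcal{D}_0^{f}, \mathcal{D}_1^{f}) + d_{\mathrm{JS}}(\mathcal{D}_1^{h \circ f}, \mathcal{D}_1^{Y}).$$
Intuitively, $d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_0^{h \circ f})$ and $d_{\mathrm{JS}}(\mathcal{D}_1^{Y}, \mathcal{D}_1^{h \circ f})$ measure the distance between the predicted and the true target distribution on the $A = 0$ and $A = 1$ cases, respectively. Formally, let $\mathrm{Err}_a(h \circ f)$ be the prediction error of function $h \circ f$ conditioned on $A = a$. With the help of Lemma 4.1, the following result establishes a relationship between $d_{\mathrm{JS}}(\mathcal{D}_a^{Y}, \mathcal{D}_a^{h \circ f})$ and the utility of $h \circ f$: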
Lemma 4.3.
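Let $\widehat{Y} = h(f(X)) \in \{0, 1\}$ be the predictor, then for $a \in \{0, 1\}$, $d_{\mathrm{JS}}(\mathcal{D}_a^{Y}, \mathcal{D}_a^{h \circ f}) \le \sqrt{\mathrm{Err}_a(h \circ f)}$.
Combining Lemma 4.2 and Lemma 4.3, we get the following key lemma, which is the backbone for proving the main results in this section: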
Lemma 4.4 (Key lemma).
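Let $\mathcal{D}_0$, $\mathcal{D}_1$ be two distributions over $\mathcal{X} \times \mathcal{Y}$ conditioned on $A = 0$ and $A = 1$ respectively. Assume the Markov chain $X \xrightarrow{f} Z \xrightarrow{h} \widehat{Y}$ holds, then $\forall h \in \mathcal{H}$:
$$d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_1^{Y}) \le d_{\mathrm{JS}}(\mathcal{D}_0^{f}, \mathcal{D}_1^{f}) + \sqrt{\mathrm{Err}_0(h \circ f)} + \sqrt{\mathrm{Err}_1(h \circ f)} \tag{5}$$
We emphasize that for $a \in \{0, 1\}$, the term $\mathrm{Err}_a(h \circ f)$ measures the conditional error of the predicted variable $\widehat{Y}$ by the composite function $h \circ f$ over $\mathcal{D}_a$. Similarly, we can define the conditional utility $\mathrm{Util}_a(h \circ f) := 1 - \mathrm{Err}_a(h \circ f)$ for $a \in \{0, 1\}$. The following main theorem then characterizes a fundamental tradeoff between utility and privacy: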
Theorem 4.1.
Let $\mathcal{H}_A$ contain all the classifiers from $\mathcal{Z}$ to $\{0, 1\}$. Given the conditions in Lemma 4.4, $\forall h \in \mathcal{H}$, $\mathrm{Util}_0(h \circ f) + \mathrm{Util}_1(h \circ f) + \mathrm{Priv}(\mathcal{H}_A \circ f) \le 3 - \frac{1}{3} D_{\mathrm{JS}}(\mathcal{D}_0^{Y} \,\|\, \mathcal{D}_1^{Y})$.
A few remarks follow. First, note that the maximal value achievable by the sum of the three terms on the L.H.S. is 3. In light of this, the upper bound given in Theorem 4.1 shows that when the marginal distribution of the target variable $Y$ differs between the two cases $A = 0$ and $A = 1$, it is impossible to perfectly maximize utility and privacy. Furthermore, the tradeoff due to the difference in marginal distributions is precisely given by the JS divergence $D_{\mathrm{JS}}(\mathcal{D}_0^{Y} \,\|\, \mathcal{D}_1^{Y})$. Note that in Theorem 4.1 the upper bound holds for any hypothesis $h$ in the richest hypothesis class $\mathcal{H}$ that contains all the possible binary classifiers. Put another way, if we would like to maximally preserve privacy w.r.t. the sensitive attribute $A$, then we have to incur a large joint error:
Theorem 4.2.
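Assume the conditions in Theorem 4.1 hold. If $\mathrm{Priv}(\mathcal{H}_A \circ f) \ge 1 - D_{\mathrm{JS}}(\mathcal{D}_0^{Y} \,\|\, \mathcal{D}_1^{Y})$, then $\forall h \in \mathcal{H}$, $\mathrm{Err}_0(h \circ f) + \mathrm{Err}_1(h \circ f) \ge \frac{1}{2} \left( d_{\mathrm{JS}}(\mathcal{D}_0^{Y}, \mathcal{D}_1^{Y}) - \sqrt{1 - \mathrm{Priv}(\mathcal{H}_A \circ f)} \right)^2$.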
Remark
The above lower bound characterizes a fundamental tradeoff between privacy and joint error. In particular, once the privacy exceeds the threshold required by Theorem 4.2, the larger the privacy, the larger the joint error. In light of Proposition 3.1, this means that although the data owner, or the first-mover $f$, could try to maximally preserve the privacy via constructing $f$ such that $f(X)$ is independent of $A$, such a construction will also inevitably compromise the joint utility of the prediction vendor. It is also worth pointing out that our results in both Theorem 4.1 and Theorem 4.2 are attribute-independent in the sense that neither of the bounds depends on the marginal distribution of $A$. Instead, all the terms in our results only depend on the conditional distributions given $A = 0$ and $A = 1$. This is often more desirable than bounds involving mutual information, e.g., $I(A; Y)$, since $I(A; Y)$ is close to $0$ if the marginal distribution of $A$ is highly imbalanced.
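The two bounds can be evaluated directly from the base rates of the target variable in the two groups. The sketch below assumes that Theorem 4.1 reads $\mathrm{Util}_0 + \mathrm{Util}_1 + \mathrm{Priv} \le 3 - \frac{1}{3} D_{\mathrm{JS}}$ and that Theorem 4.2 reads $\mathrm{Err}_0 + \mathrm{Err}_1 \ge \frac{1}{2}(d_{\mathrm{JS}} - \sqrt{1 - \mathrm{Priv}})^2$ whenever $\mathrm{Priv} \ge 1 - D_{\mathrm{JS}}$; both forms follow from Lemma 4.4 and Lemma 4.1 as stated in this sketch's assumptions. Function names and base rates are hypothetical.

```python
import math


def binary_js_divergence(p0, p1):
    """JS divergence (bits) between Bernoulli(p0) and Bernoulli(p1)."""
    def kl(p, q):
        total = 0.0
        for a, b in ((p, q), (1.0 - p, 1.0 - q)):
            if a > 0:
                total += a * math.log2(a / b)
        return total

    m = (p0 + p1) / 2.0
    return 0.5 * kl(p0, m) + 0.5 * kl(p1, m)


def tradeoff_upper_bound(p0, p1):
    """Assumed Theorem 4.1 bound on Util_0 + Util_1 + Priv."""
    return 3.0 - binary_js_divergence(p0, p1) / 3.0


def joint_error_lower_bound(p0, p1, priv):
    """Assumed Theorem 4.2 bound on Err_0 + Err_1 at a given privacy level."""
    d_js_squared = binary_js_divergence(p0, p1)
    if priv < 1.0 - d_js_squared:
        # Precondition of the theorem fails; only the trivial bound applies.
        return 0.0
    gap = math.sqrt(d_js_squared) - math.sqrt(max(0.0, 1.0 - priv))
    return 0.5 * gap ** 2


# Hypothetical base rates Pr(Y = 1 | A = 0) and Pr(Y = 1 | A = 1).
for p0, p1 in ((0.5, 0.5), (0.3, 0.7), (0.0, 1.0)):
    print(p0, p1, tradeoff_upper_bound(p0, p1), joint_error_lower_bound(p0, p1, 1.0))
```

When the base rates coincide the bounds are vacuous, so perfect utility and privacy are not ruled out. When $Y = A$ the sum of the three terms is capped at $3 - 1/3$ and perfect privacy forces a joint error of at least $1/2$.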
5 Experiments
Our theoretical results on the privacy guarantee against attribute inference attacks imply that the inference error of the protected attribute incurred by any (randomized) adversary has to be at least $\Omega(H^* / \log(1 / H^*))$. In this section we conduct extensive experiments on two real-world benchmark datasets, the UCI Adult dataset (Dua & Graff, 2017) and the UTKFace dataset (Zhang et al., 2017), to verify that: 1) our guarantee on privacy can be used as a certificate for different privacy preservation methods; 2) inherent tradeoffs between privacy and utility exist in all methods; 3) among all the privacy preservation algorithms, including differential privacy, the adversarial representation learning approach achieves the best tradeoff in terms of privacy preservation and utility maximization.
5.1 Datasets and Setup
Datasets
1) Adult dataset: The Adult dataset is a benchmark dataset for privacy preservation. The task is to predict whether an individual's income is greater or less than 50K/year based on census data. The attributes in the dataset include gender, education, occupation, age, etc. In this experiment we set the target task as income prediction and the private task as inferring gender, age and education, respectively. 2) UTKFace dataset: The UTKFace dataset is a large-scale face dataset containing more than 20,000 images with annotations of age, gender, and ethnicity. It is one of the benchmark datasets for age estimation and for gender and race classification. In this experiment, we set our target task as gender classification and we use age and ethnicity as the protected attributes. We refer readers to Section D in the appendix for detailed descriptions of the data preprocessing pipeline.
Methods
We conduct extensive experiments with the following methods to verify our theoretical results and provide a thorough practical comparison among these methods: 1) Privacy Partial Least Squares (PPLS) (Enev et al., 2012), 2) Privacy Linear Discriminant Analysis (PLDA) (Whitehill & Movellan, 2012), 3) Minimax filter with alternative update (ALTUP) (Hamm, 2017), 4) Gradient Reversal Layer (GRL) (Ganin et al., 2016), 5) Principal Component Analysis (PCA), 6) No defense (NODEF), and 7) Differential Privacy (DP). Among the first six methods, the first four are state-of-the-art minimax methods for protecting against attribute inference attacks while the latter two are non-private baselines for comprehensive comparison. Although DP is not tailored to attribute inference attacks, we can still add noise to the raw data based on the Laplacian mechanism. Our goal here is to provide a thorough comparison in terms of the utility-privacy tradeoff by using methods from both representation learning and differential privacy.
To make sure the comparison is fair among different methods, we conduct a controlled experiment by using the same network structure as the baseline hypothesis among all the methods for each dataset. We repeat each experiment on the Adult dataset five times and report both the average performance and the standard deviation. Similarly, we repeat each experiment on the UTKFace dataset three times, and we also report both the means and standard deviations. Details on network structures and implementation of the above algorithms are provided in the appendix.
Note that in practice, due to the nonconvex nature of optimizing deep neural nets, we cannot guarantee finding the globally optimal conditional entropy $H^*$. Hence in order to compute the privacy guarantee given by our lower bound in Theorem 3.1, we use the cross-entropy loss of the optimal adversary found by our algorithm on inferring the sensitive attribute $A$. Furthermore, since our analysis only applies to representation learning based approaches, we do not have a privacy guarantee for DP-related methods in our context.
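For readers unfamiliar with the DP baseline, the sketch below shows one common way to perturb a feature vector with Laplace noise of scale sensitivity / epsilon. It is only an illustration: the sensitivity, the privacy budget epsilon, the feature scaling and the function names are hypothetical, and the configuration actually used in the experiments is described in the appendix rather than here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(record, sensitivity, epsilon, rng):
    """Add independent Laplace noise with scale sensitivity / epsilon to each feature."""
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in record]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
record = [0.2, 0.7, 0.5]          # hypothetical features scaled to [0, 1]
noisy = laplace_mechanism(record, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller epsilon gives a larger noise scale. Because the noise is added to every feature regardless of its relation to the protected attribute, it degrades task-relevant and sensitive information alike.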
5.2 Results and Analysis
We visualize the performance of the aforementioned algorithms on privacy preservation and utility maximization in Figure 1 and Figure 2, respectively. First, from Figure 1, we can see that among all the methods, DP, PLDA, ALTUP and GRL are effective in privacy preservation by forcing the optimal adversary to incur a large inference error. On the other hand, PCA and NODEF are the least effective ones. This is expected, as neither NODEF nor PCA tries to filter information in data about the sensitive attribute $A$. We can also see that with a larger tradeoff value $\lambda$, both ALTUP and GRL achieve better privacy preservation.
Second, from Figure 2, we can also see a sharp contrast between DP and other methods in terms of the joint conditional error on the target task: DP incurs much more utility loss compared with other methods. Combining this with our previous observation from Figure 1, we can see that DP makes data private by adding a large amount of noise that effectively filters out all the information available in the data, including both target-related and sensitive information. In comparison, representation learning based approaches lead to a much better tradeoff. Among all the representation learning methods, PLDA, ALTUP and GRL perform the best in privacy preservation. Compared to ALTUP and GRL, PLDA is more effective in privacy preservation in some cases, but at the cost of a significant drop in utility. PPLS is less effective in privacy preservation compared to other private representation learning based approaches and it causes a large utility loss on the UTKFace dataset.
6 Related Work
The attribute inference attack problem has close connections to both differential privacy and algorithmic fairness. In this section we mainly focus on discussing the connections and differences between these problems. As a summary, we visualize their relationships in the diagram shown in Figure 3.
Differential Privacy
DP has been proposed to bound the difference of algorithmic output between any two "neighboring" datasets from the released data (Dwork & Nissim, 2004; Dwork et al., 2006; Erlingsson et al., 2014) and was recently used in the training of deep neural networks (Abadi et al., 2016; Papernot et al., 2016). Our definition of privacy is different from (local) differential privacy, since DP aims to make any two neighboring datasets have close probabilities of producing the same output. In the setting of learning algorithms, this means that the models trained from two neighboring datasets should be close to each other. However, this does not necessarily imply that the learned model itself is free from attribute inference attacks. In comparison, our goal of defending against attribute inference attacks is to learn a representation such that the protected attributes cannot be accurately inferred. Put another way, given a dataset matrix, the goal of DP is to ensure that it is hard to infer a row of the matrix, while our privacy definition seeks to ensure that it is hard to infer a specific column of the data matrix. From this perspective, DP is more closely related to the well-known membership inference attack (Shokri et al., 2017). It is observed (Dwork et al., 2012) that the notion of individual fairness may be viewed as a generalization of DP.
Algorithmic Fairness
The privacy defined in this work is related to the notion of group fairness in the literature of algorithmic fairness (Dwork et al., 2012; Edwards & Storkey, 2015). In particular, adversarial learning methods have been used as a tool in both fields to achieve the corresponding goals. However, the motivations and goals significantly differ between these two fields. Specifically, the widely adopted notion of group fairness, namely equalized odds (Hardt et al., 2016), requires equalized false positive and false negative rates across different demographic subgroups. In comparison, in applications where privacy is a concern, we mainly want to ensure that adversaries cannot steal sensitive information from the data. Hence our goal is to give a worst-case guarantee on the inference error that any adversary has to incur. To the best of our knowledge, our result in Theorem 3.1 is the first to analyze the performance of privacy preservation in such scenarios. Furthermore, no prior theoretical results exist on the tradeoff between privacy and utility in defending against attribute inference attacks. The proof techniques developed in this work could also be used to derive information-theoretic lower bounds in related problems (Zhao et al., 2019; Zhao & Gordon, 2019).
7 Conclusion
We develop a theoretical framework for privacy preservation under the setting of attribute inference attacks. Under this framework, we propose a minimax formulation that suggests using adversarial learning techniques to protect the private attribute. We further analyze the privacy guarantee of the defense method in the limit of worst-case adversaries and prove an information-theoretic lower bound to quantify the inherent tradeoff between utility and privacy. Following our formulation, we conduct extensive experiments to corroborate our theoretical results and to empirically compare different state-of-the-art privacy preservation algorithms. Experimental results show that the adversarial representation learning approaches are very effective in defending against attribute inference attacks and often achieve the best tradeoff in terms of privacy preservation and utility maximization. We believe our work takes an important step towards better understanding the privacy-utility tradeoff, and it also helps to stimulate the future design of privacy preservation algorithms with adversarial learning techniques.
References
 Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM, 2016.
 Bassily & Smith (2015) Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pp. 127–135. ACM, 2015.
 Beaulieu-Jones et al. (2018a) Brett K Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P Bhavnani, James Brian Byrd, and Casey S Greene. Privacy-preserving generative deep neural networks support clinical data sharing. BioRxiv, pp. 159756, 2018a.
 Beaulieu-Jones et al. (2018b) Brett K Beaulieu-Jones, William Yuan, Samuel G Finlayson, and Zhiwei Steven Wu. Privacy-preserving distributed deep learning for clinical data. arXiv preprint arXiv:1812.01484, 2018b.
 Calabro (2009) Chris Calabro. The exponential complexity of satisfiability problems. PhD thesis, UC San Diego, 2009.
 Chou et al. (2018) Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque, and Li Fei-Fei. Faster cryptonets: Leveraging sparsity for real-world encrypted inference. arXiv preprint arXiv:1811.09953, 2018.
 Cormode (2011) Graham Cormode. Personal privacy vs population privacy: learning to attack anonymization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1253–1261. ACM, 2011.
 Daskalakis & Panageas (2018) Constantinos Daskalakis and Ioannis Panageas. The limit points of (optimistic) gradient descent in min-max optimization. In Advances in Neural Information Processing Systems, pp. 9236–9246, 2018.
 Dua & Graff (2017) Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.
 Duchi et al. (2013) John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 429–438. IEEE, 2013.
 Dwork & Nissim (2004) Cynthia Dwork and Kobbi Nissim. Privacy-preserving data mining on vertically partitioned databases. In Annual International Cryptology Conference, pp. 528–544. Springer, 2004.
 Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp. 265–284. Springer, 2006.
 Dwork et al. (2012) Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pp. 214–226. ACM, 2012.
 Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
 Edwards & Storkey (2015) Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.
 Endres & Schindelin (2003) Dominik Maria Endres and Johannes E Schindelin. A new metric for probability distributions. IEEE Transactions on Information theory, 2003.
 Enev et al. (2012) Miro Enev, Jaeyeon Jung, Liefeng Bo, Xiaofeng Ren, and Tadayoshi Kohno. Sensorsift: balancing sensor data privacy and utility in automated face understanding. In Proceedings of the 28th Annual Computer Security Applications Conference, pp. 149–158. ACM, 2012.
 Erlingsson et al. (2014) Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1054–1067. ACM, 2014.
 Fredrikson et al. (2015) Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333. ACM, 2015.
 Ganin et al. (2016) Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
 Gilad-Bachrach et al. (2016) Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210, 2016.
 Gong & Liu (2016) Neil Zhenqiang Gong and Bin Liu. You are who you know and how you behave: Attribute inference attacks via users’ social friends and behaviors. In 25th USENIX Security Symposium (USENIX Security 16), pp. 979–995, 2016.
 Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
 Hamm (2017) Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. The Journal of Machine Learning Research, 18(1):4704–4734, 2017.
 Hardt et al. (2016) Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pp. 3315–3323, 2016.
 Lin (1991) Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.
 Papernot et al. (2016) Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755, 2016.
 Ribeiro et al. (2015) Mauro Ribeiro, Katarina Grolinger, and Miriam AM Capretz. Mlaas: Machine learning as a service. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), pp. 896–902. IEEE, 2015.
 Shokri et al. (2017) Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE, 2017.
 Whitehill & Movellan (2012) Jacob Whitehill and Javier Movellan. Discriminately decreasing discriminability with learned image filters. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2488–2495. IEEE, 2012.
 Wu et al. (2018) Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, and Hailin Jin. Towards privacy-preserving visual recognition via adversarial training: A pilot study. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 606–624, 2018.
 Zagoruyko & Komodakis (2016) Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
 Zhang et al. (2017) Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5810–5818, 2017.
 Zhao & Gordon (2019) Han Zhao and Geoffrey J Gordon. Inherent tradeoffs in learning fair representations. In Advances in neural information processing systems, 2019.
 Zhao et al. (2019) Han Zhao, Remi Tachet Des Combes, Kun Zhang, and Geoffrey Gordon. On learning invariant representations for domain adaptation. In International Conference on Machine Learning, pp. 7523–7532, 2019.
Appendix
In this appendix we provide the missing proofs of theorems and claims in our main paper. We also describe detailed experimental settings here.
Appendix A Technical Tools
In this section we list the lemmas and theorems used in our proofs.
Lemma A.1 (Theorem 2.2, (Calabro, 2009)).
Let $H_2^{-1}(s)$ be the inverse binary entropy function for $s \in [0, 1]$, then $H_2^{-1}(s) \geq s / (2 \lg(6/s))$.
Theorem A.1 (Data processing inequality).
Let $X \perp Z \mid Y$, then $I(X; Z) \leq I(X; Y)$.
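The data processing inequality can be illustrated on a small discrete example. The sketch below assumes the standard form of the statement: if $X$ and $Z$ are conditionally independent given $Y$ (a Markov chain $X \to Y \to Z$), then $I(X; Z) \leq I(X; Y)$. The distributions and variable names are hypothetical and chosen only for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information I(U; V) in bits for a joint table joint[u][v]."""
    p_u = [sum(row) for row in joint]
    p_v = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for u, row in enumerate(joint):
        for v, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_u[u] * p_v[v]))
    return mi

# Hypothetical Markov chain X -> Y -> Z over binary variables.
p_x = [0.3, 0.7]
p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]  # rows indexed by x
p_z_given_y = [[0.7, 0.3], [0.4, 0.6]]  # rows indexed by y

joint_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(2)] for x in range(2)]
# Z depends on X only through Y, so marginalize Y out to get the joint of (X, Z).
joint_xz = [
    [sum(joint_xy[x][y] * p_z_given_y[y][z] for y in range(2)) for z in range(2)]
    for x in range(2)
]

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
```

Here `i_xz` cannot exceed `i_xy`, since the second channel can only discard information about $X$.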
Appendix B Missing Proofs
See Proposition 2.1.
Proof.
We first prove the first part of the proposition. By definition, iff , which is also equivalent to . It then follows that .
For the second part of the proposition, again, by definition of , it is clear to see that we either have and , or and . Hence we discuss by these two cases. For ease of notation, we omit the subscript from when it is obvious from the context which probability distribution we are referring to.

If and , then we know that:

If and , similarly, we have:
Combining the above two parts completes the proof. ∎
See 2.2
Proof.
By definition, we have:
where the third equality holds due to the fact that . To see this, for any specific such that the term inside the absolute value is negative, we can find such that it becomes positive, due to the assumption that is symmetric. ∎
See 3.1
Proof.
First, by the celebrated data-processing inequality, :
By Proposition 2.1, this means that , , which further implies that by definition. ∎
See 3.1
Proof.
Let be the induced (pushforward) distribution of under the map . By the definition of cross-entropy loss, we have:
It is also clear from the above proof that the minimum value of the cross-entropy loss is achieved when equals the conditional probability , i.e., . ∎
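The claim that the cross-entropy loss is minimized by the conditional probability can be checked numerically. The sketch below assumes a discrete representation $z$ with three values and a binary attribute $a$. The distributions and the names `p_z` and `p_a1_given_z` are hypothetical. At the minimizer the loss equals the conditional entropy of the attribute given the representation (in nats), and any other predictor incurs a strictly larger loss.

```python
import math

# Hypothetical distribution over a discrete representation z and P(a = 1 | z).
p_z = [0.5, 0.3, 0.2]
p_a1_given_z = [0.1, 0.6, 0.9]

def cross_entropy(h):
    """Expected cross-entropy loss of a predictor h, where h[z] estimates P(a = 1 | z)."""
    total = 0.0
    for pz, q, hz in zip(p_z, p_a1_given_z, h):
        total -= pz * (q * math.log(hz) + (1 - q) * math.log(1 - hz))
    return total

def conditional_entropy():
    """Conditional entropy of a given z, in nats."""
    return -sum(
        pz * (q * math.log(q) + (1 - q) * math.log(1 - q))
        for pz, q in zip(p_z, p_a1_given_z)
    )

optimal = cross_entropy(p_a1_given_z)

# Shift the predictor away from the conditional probability; clip to keep logs finite.
perturbed = [
    cross_entropy([min(max(q + d, 0.01), 0.99) for q in p_a1_given_z])
    for d in (-0.05, 0.05, 0.2, -0.08)
]
```

Each entry of `perturbed` exceeds `optimal`, and the excess is the expected KL divergence between the true conditional distribution and the perturbed predictor.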
See Theorem 3.1.
Proof.
To prove this theorem, let be the binary random variable that takes value 1 iff , i.e., . Now consider the joint entropy of and . On one hand, we have:
Note that the second equation holds because is a deterministic function of and , that is, once and are known, is also known, hence . On the other hand, we can also decompose as follows:
Combining the above two equalities yields
Furthermore, since conditioning cannot increase entropy, we have , which further implies
Now consider . Since , by definition of the conditional entropy, we have:
To lower bound , realize that
Since is a randomized function of such that , due to the celebrated data-processing inequality, we have , which implies
Combining everything above, the following chain of inequalities holds:
which implies
where is the inverse function of the binary entropy when . To conclude the proof, we apply Lemma A.1 to further lower bound the inverse binary entropy function by
completing the proof. ∎
See Lemma 4.2.
Proof.
Let be a uniform random variable taking value in and let the random variable with distribution (resp. with distribution ) be the mixture of and (resp. and ) according to . It is easy to see that , and we have:
Similarly, we have:
Since (resp. ) is induced by from (resp. ), by linearity, is also induced by from . Hence and the following Markov chain holds:
Applying the data processing inequality, we have
Taking the square root of both sides of the above inequality completes the proof. ∎
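The two facts used in this proof can be checked on a small discrete example: the JS divergence between two distributions equals the mutual information between a uniform binary variable and a sample drawn from the mixture it selects, and pushing both distributions through the same (possibly randomized) map cannot increase the JS distance. The distributions `d0`, `d1` and the stochastic matrix `channel` below are hypothetical stand-ins for the objects in the lemma.

```python
import math

def kl(p, q):
    """KL divergence in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def push_forward(p, channel):
    """Distribution induced by sending x ~ p through channel[x][y] = P(y | x)."""
    n_out = len(channel[0])
    return [sum(p[x] * channel[x][y] for x in range(len(p))) for y in range(n_out)]

def mixture_mutual_information(p, q):
    """I(Z; X) where Z is uniform on {0, 1} and X ~ p if Z = 0, X ~ q if Z = 1."""
    joint = [[0.5 * pi for pi in p], [0.5 * qi for qi in q]]
    marginal = [a + b for a, b in zip(*joint)]
    return sum(
        pj * math.log2(pj / (0.5 * marginal[x]))
        for row in joint
        for x, pj in enumerate(row)
        if pj > 0
    )

d0 = [0.7, 0.2, 0.1]
d1 = [0.1, 0.3, 0.6]
channel = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]  # randomized map from 3 to 2 outcomes

js_before = js_divergence(d0, d1)
js_after = js_divergence(push_forward(d0, channel), push_forward(d1, channel))
mi_before = mixture_mutual_information(d0, d1)

js_distance_before = math.sqrt(js_before)
js_distance_after = math.sqrt(js_after)
```

Since the square root is monotone, the inequality between the divergences carries over directly to the JS distances, which is the final step of the proof.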
See Lemma 4.3.
Proof.
For , by definition of the JS distance:
where the expectation is taken over the joint distribution of . Taking the square root of both sides then completes the proof. ∎
See 4.1
Proof.
Before we delve into the details, we first give a high-level sketch of the main idea. The proof can be partitioned into two parts. In the first part, we show that when contains all the measurable prediction functions, can be used to upper bound . The second part combines Lemma 4.3 and Lemma 4.2 to complete the proof.
In this part we first show that :
where denotes the total variation distance and is the sigma algebra that contains all the measurable subsets of . On the other hand, when contains all the measurable functions in , we have:
where the last equality follows from the fact that is complete and contains all the measurable functions. Combining the above two parts, we immediately have .
Now using the key lemma, we have: