Understanding Compressive Adversarial Privacy
Abstract
Designing a data sharing mechanism without sacrificing too much privacy can be considered as a game between data holders and malicious attackers. This paper describes a compressive adversarial privacy framework that captures the tradeoff between the data privacy and utility. We characterize the optimal data releasing mechanism through convex optimization when assuming that both the data holder and attacker can only modify the data using linear transformations. We then build a more realistic data releasing mechanism that can rely on a nonlinear compression model while the attacker uses a neural network. We demonstrate in a series of empirical applications that this framework, consisting of compressive adversarial privacy, can preserve sensitive information.
I Introduction
Machine learning has progressed dramatically in many real-life tasks such as classifying images [1], processing natural language [2], predicting electricity consumption [3], and many more. These tasks rely on large datasets that are usually saturated with private information. Data holders who want to apply machine learning techniques may not be cautious about what additional information the model can capture from training data, as long as the primary task can be solved by some model with high accuracy.
In this paper, we propose a privatization mechanism to avoid the potential exposure of sensitive information while still preserving the necessary utility of the data that is going to be released. This mechanism leverages a game-theoretic approach in which the data are perturbed and the model is retrained iteratively between the data holder and a malicious attacker. Such data perturbation is closely related to feature transformation and selection applied to the raw input features that are correlated with private labels.
Privacy protection has been explored extensively in the literature. A popular procedure is to anonymize the identifiable personal information in datasets (e.g. removing name, social security number, etc.). Yet anonymization doesn’t provide good immunity against correlation attacks. A previous study [4] successfully de-anonymized watch histories in the Netflix Prize, a public recommender system competition. Another study designed re-identification attacks on anonymized fMRI (functional magnetic resonance imaging) datasets [5]. On the other hand, differential privacy (DP) [6] offers a strong standard of privacy guarantee and is applicable to many problems beyond database release [7]. The DP mechanism has been introduced in data privacy analysis in control and networks [8, 9, 10, 11]. In particular, [8] gave a thorough investigation of centralized and distributed optimization under differential privacy constraints. In this line of research, [10] and [11] focused on the cases of dynamic data perturbations in control systems, and [9] presented a noise-adding mechanism to protect the differential privacy of network topology.
However, training machine learning models with DP guarantees using randomized data often leads to significantly reduced utility and comes with a tremendous hit in sample complexity [12, 13]. A recent work [14] applied the DP concept to a deep neural network to demonstrate that a modest accuracy loss can be obtained at certain worst-case privacy levels. However, this was still a “context-free” approach that didn’t leverage the full structure between the data input and output.
To overcome the aforementioned challenges, we take a new holistic approach towards enabling private data publishing that considers both privacy and utility. Instead of adopting worst-case, context-free notions of data privacy (such as differential privacy), we introduce a context-aware model of privacy that allows the data holder to cleverly alter the data where it matters.
Our main contributions are as follows. First, with the goal of having a “distribution-free” data releasing mechanism and inspired by general min-max games, we investigate compression, a typical way of perturbing data, using a data-driven approach. Second, we formulate the interaction between data holders and attackers through convex optimization during the min-max game when both players apply linear models. A corresponding equilibrium can be found and used as the optimal strategy for the data holder to yield the altered data. Third, our thorough evaluations on realistic datasets demonstrate the effectiveness of our compressive adversarial privacy framework. Finally, we leverage mutual information to validate that sensitive information is protected in the privatized data.
The remainder of our paper is arranged as follows. In section II, we introduce the general adversarial privacy game. Section III describes the compressive adversarial privacy game with several cases of realistic data analyses. Section IV describes the quantification of privacy. Section V concludes the paper.
II Privacy Preserved Releasing
We propose a general data publishing framework by incorporating the following game concept. In general there are two roles in this data releasing game: a data holder and a data consumer. Among data consumers, some are good users who explore patterns in the data and extract its value. We merge the good data consumers with data holders and address their roles from the data holder perspective, since good data users are irrelevant in this game. Yet there are also people who deliberately try to learn personal sensitive information from the data. We define those malicious users as attackers. We focus on the data holders and attackers in the following description of the game.
Consider a dataset which contains both the original data and the associated customers’ sensitive demographic information (e.g. account holder age, house square footage, gender, etc.). Thus a sample has a record . We denote the function as a general mechanism for a data holder to release the data. The released data are denoted as for the customer , which can also be described as . Notice we don’t release to the public because it’s private information. Generally speaking, . Let the function represent the adversarial hypothesis, e.g. the estimated outcome . The attacker would like to minimize the inference loss on private labels, namely given some loss function , while the data holder would like to maximize the attacker’s loss and, in principle, also wants to preserve the quality of the released data for research purposes. This data quality is characterized by some distance function measuring the discrepancy between the original data and the altered data. Therefore, we formulate a min-max game between the data holder and the attacker as follows:
(1)  
(2) 
where could be some distance function, such as Total Variation (TV), Wasserstein-1, or Frobenius norm, etc. [15, 16], and is a hyperparameter. The constraint ensures that the released data will not be distorted too much from the original data.
This framework allows an attacker to incorporate any loss function and design various adversarial inference models, which typically take the released data to predict the personal information. Given such a challenge, the data publisher has to design a good privatization mechanism to deteriorate the attacker’s performance, and this mechanism is also data dependent. For simplicity, we focus on the supervised learning setting in this work, but the concept can potentially be extended to unsupervised learning.
III Compressive Adversarial Privacy
A typical method to enforce data privacy is data compression. This method is well studied in [17] from the theoretical viewpoint of differential privacy. In practice, data compression is used in many applications such as text messaging and video transmission to protect privacy. In this section, we extend the general min-max framework to a compression approach, namely a compressive adversarial privacy framework, as shown in Figure 2.
We focus on two scenarios to illustrate concrete privatization mechanisms. The first is linear compression when an attacker uses a linear model. The second is nonlinear compression when an attacker uses a neural network. We evaluate both cases on real data.
III-A Linear compression with continuous label
We introduce the case of an attacker who uses a linear model and the least squares loss function to infer private information from released data. This case is practical especially when private labels are continuous and have a linear relationship with the original data. We denote the original data matrix and altered data matrix , where is the number of samples, is the number of features, and is the set of real numbers. Such a data matrix contains individual samples and respectively, where . The private-information matrix consists of , where each sample has types of private labels.
Consider a data holder who has a simple data-releasing mechanism that applies a linear transformation of , namely projecting it down to lower dimensions to protect confidential information, i.e. , where the matrix . Hence, . In order to release meaningful data that can still be utilized by a majority of good users, the data holder performs a linear operation by multiplying on and recovers it back to the same dimension as , yielding . The attacker fits a linear model to minimize the mean squared loss, that is, , where . Because the domain of is contained in , which is in , this attacker’s loss is lower bounded by
(3) 
where . Therefore, when the data holder maximizes the attacker’s loss, we can maximize this lower bound, which automatically maximizes the minimum loss of the attacker. The resulting min-max problem can be formulated as
(4)  
(5) 
where is the Frobenius norm. Given , we further simplify the expression by finding the best recovering matrix in place of as follows (see Appendix VI-B for details):
(6) 
We denote by the pseudoinverse of . The best predictor for the attacker can be expressed as , where . Substituting and , we have the following problem:
(7)  
(8) 
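The lower-bound argument and the pseudoinverse predictor used in this subsection can be checked numerically. The sketch below is only an illustration under stated assumptions: the names `X`, `Y`, `A`, `B` and the random data are hypothetical stand-ins, and the compression matrix `A` is drawn at random rather than optimized. It fits the attacker's least-squares predictor with a pseudoinverse and compares the loss on the original data with the loss on a compressed-then-recovered release.

```python
import numpy as np

def attacker_loss(released, private):
    """Squared Frobenius loss of the best linear predictor fitted on `released`."""
    weights = np.linalg.pinv(released) @ private   # least-squares minimizer
    residual = released @ weights - private
    return float(np.sum(residual ** 2))

rng = np.random.default_rng(0)
n, p, k = 40, 8, 3                       # samples, features, compressed dimension (hypothetical)
X = rng.normal(size=(n, p))              # stand-in for the original data matrix
Y = X @ rng.normal(size=(p, 1)) + 0.1 * rng.normal(size=(n, 1))  # private label, roughly linear in X

A = rng.normal(size=(p, k))              # arbitrary compression matrix (not optimized)
B = np.linalg.pinv(X @ A) @ X            # least-squares recovery back to p columns
X_released = X @ A @ B

loss_original = attacker_loss(X, Y)
loss_released = attacker_loss(X_released, Y)
print(loss_original, loss_released)
```

Because the column space of the released matrix is contained in that of the original matrix, the attacker's in-sample loss on the release cannot be smaller than on the original data, which is the inequality the data holder exploits.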
Notice that . By flipping the sign of the maximization and denoting , which is a positive semidefinite matrix (see Appendix VI-C), we have the following problem:
(9)  
(10)  
(11)  
(12) 
We put a rank constraint (12) because the dimension is . This problem can be further relaxed to a convex optimization by regularizing the nuclear norm of matrix as follows:
(13)  
(14)  
(15) 
where is the nuclear norm of a matrix, which heuristically controls the rank of the matrix. Such a convex relaxation allows the data publisher to find an optimal solution of , and correspondingly yields the appropriate (see Appendix VI-A). Thus, both players can achieve an equilibrium in this game. To ensure the problem is feasible, one caveat is that we cannot pick arbitrarily small without considering the aforementioned rank . We note that is a low-rank approximation of the original data matrix .
Theorem 1
Suppose a rank matrix consists of the singular values . With the best rank approximation under Frobenius norm, the distortion threshold is at least .
We put the proof in Appendix VI-F. The theorem reveals the relationship between setting the distortion tolerance and the rank . Hence, a simple algorithm (Algorithm 1) is proposed for the data holder to generate .
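The relationship between rank and attainable distortion in Theorem 1 follows from the Eckart-Young theorem, and a small numerical check makes it concrete. The sketch below assumes the distortion is measured by the (unsquared) Frobenius norm of the difference between the original and released matrices; under a squared-norm convention the bound would be the sum of squared discarded singular values instead. The matrix `X` and the rank `k` are hypothetical.

```python
import numpy as np

def best_rank_k(X, k):
    """Truncated SVD: the best rank-k approximation in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :], s

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))             # stand-in data matrix
k = 2
X_k, s = best_rank_k(X, k)

distortion = np.linalg.norm(X - X_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))       # energy in the discarded singular values
print(distortion, tail)                  # the two values agree up to rounding
```

Any distortion tolerance below `tail` is therefore infeasible for a rank-`k` release of this matrix, which is why the tolerance cannot be chosen independently of the rank.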
Remark 1
This approach can be interpreted as releasing a low dimensional approximation to a set of data, incorporating the relation between the original data and private labels, while still maintaining a certain distortion between the released data and the original data.
We also discovered that a similar scheme can be applied to compressing original data with additive Gaussian noise. See Appendix VI-E for details.
III-B Case study: Power consumption data
The first experiment of our analysis uses the CER dataset, which was collected during a smart metering trial conducted in Ireland by the Irish Commission for Energy Regulation (CER) [18]. The dataset contains measurements of electricity consumption gathered from over 4000 households every 30 minutes between July 2009 and December 2010. Each participating household was asked to fill out a questionnaire about the household’s socioeconomic status, appliance stock, properties of the dwelling, etc. [19]. To demonstrate our concepts, we sampled a portion of the customers who have valid entries of demographic information, e.g. number of appliances and floor area of the individual house. In the following experiment, we treat floor area as private data .
Throughout the case simulations, we extract the four-week time series in September 2010. Since the power consumption (in kilowatts) is recorded every 30 minutes, there are 4 × 7 × 48 = 1344 entries for a single household. To simplify the input dimension and avoid overfitting on the raw input, we compute a set of features on the electricity consumption records of a household. The features then serve as the input to the prediction model. Table III lists all 23 features we calculated from electricity consumption data, which are also used in [19]. We treat these features as and normalize them such that they range from 0 to 1. Data normalization is required in our experiment because it removes the scale inconsistency across the different features.
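The preprocessing described above amounts to simple arithmetic plus a per-feature rescaling. The following sketch is a minimal illustration, assuming column-wise min-max scaling is what is meant by normalizing the features to the range 0 to 1; the helper name `min_max_normalize` and the toy values are hypothetical.

```python
import numpy as np

def min_max_normalize(features):
    """Rescale each column of `features` to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0                # constant columns map to 0 instead of dividing by zero
    return (features - lo) / span

readings_per_day = 24 * 2                # one reading every 30 minutes
entries_per_household = 4 * 7 * readings_per_day   # four weeks of readings

raw = np.array([[0.2, 10.0], [0.8, 30.0], [0.5, 20.0]])   # toy feature table (hypothetical values)
scaled = min_max_normalize(raw)
print(entries_per_household)
print(scaled)
```

Scaling each feature separately keeps a feature measured in kilowatts from dominating a ratio-valued feature when distances or norms are computed.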
In the linear transformation model, given that is the private information, we run Algorithm 1 to release . This procedure involves solving a semidefinite program, which could be slow when the dimension of input samples is large. So we partition the samples into several groups with a reasonable number of households in each group (e.g. 30 to 40, as long as the number of households is larger than the number of features). After running experiments on several rank conditions of data matrices, we found that lower rank indicates better privacy (higher prediction error), as shown in Table I. With a low rank condition that the data holder maintains, the attacker can barely predict the private label (see Table I). We also partitioned the data into 80% for training and 20% for testing. Table I shows the corresponding results with different ranks of the compression matrix for the testing set. A batch of released data differs from the original when rank is and . The difference is shown in Figure 4.
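One way to realize the grouping step is sketched below. It is an assumption-laden illustration rather than the procedure used in the experiments: the function name `partition_households`, the default target size of 35, and the contiguous assignment of indices are all hypothetical choices. The only property taken from the text is that every group should contain more households than there are features.

```python
def partition_households(n_households, n_features, target_size=35):
    """Split household indices into contiguous groups, each larger than n_features."""
    if n_households <= n_features:
        raise ValueError("need more households than features")
    n_groups = max(1, round(n_households / target_size))
    # Reduce the number of groups until the smallest group exceeds n_features.
    while n_groups > 1 and n_households // n_groups <= n_features:
        n_groups -= 1
    base, extra = divmod(n_households, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)   # spread the remainder over the first groups
        groups.append(list(range(start, start + size)))
        start += size
    return groups

groups = partition_households(n_households=100, n_features=23)
print([len(g) for g in groups])   # [34, 33, 33]
```

Each group can then be passed to the semidefinite program separately, which keeps the size of every individual problem small.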
Rank  RMSE  distortion  

4  6.7e+04  1.61e-08  0.616 
10  7.1e+01  2.28e-03  0.081 
18  8.7e-01  1.12e-02  0.079 
23  3.9e-02  8.01e-01  0 
III-C Nonlinear compression with categorical variable
Another common type of data has publishable features that are high-dimensional and continuous, while the private labels are discrete, for instance, images with some discrete labels (e.g. gender). Generally speaking, a sample has and where is the th row of the data matrix . The data holder designs a nonlinear compression mechanism to reduce the classification accuracy of given , where . We assume the attacker can use an advanced model, e.g. neural networks, to estimate the private labels. We further specify that and are functions parametrized by and . The attacker minimizes the estimation loss, that is, . The data holder designs a compressive function to maximize the attacker’s loss as well as maintain a certain distortion as aforementioned in equations (1), (2). The equilibrium point of this min-max game is difficult to find in the context of neural networks with constraints, because the objective functions are nonconvex with respect to the parameters. Therefore, we use a heuristic to cast the constrained optimization into an unconstrained optimization with regularization as follows:
(16)  
(17)  
(18) 
where and are the hyperparameters that control how strictly the iterates satisfy the constraints. The distortion is characterized by the averaged Euclidean norm of the difference in samples. We propose a simple alternating min-max algorithm (Algorithm 2) to obtain the parameter for the function and yield the corresponding .
Similar to the idea of the augmented Lagrangian method [20], the scales of and gradually increase as the iteration step increases. The term (18) is added to ensure the solution strictly satisfies the constraint in expression (2). Alternative approaches are also proposed in [21, 22, 23]. In contrast to those works, we construct a convex approximation with distortion constraints that is applied in privacy games.
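The effect of gradually increasing the penalty scale can be seen on a one-dimensional toy problem. The sketch below is not the paper's Algorithm 2 and involves no attacker or neural network: it assumes a plain quadratic penalty, a scalar parameter `theta` whose value stands in for the attacker's loss, and a distortion of `theta**2` bounded by a hypothetical budget `gamma`. It only illustrates how a growing penalty weight `rho` drives the iterate toward the constraint boundary.

```python
def penalized_objective_grad(theta, gamma, rho):
    """Gradient of  theta - rho * max(0, theta**2 - gamma)**2  with respect to theta."""
    violation = max(0.0, theta ** 2 - gamma)
    return 1.0 - rho * 2.0 * violation * 2.0 * theta

gamma = 0.25          # distortion budget (hypothetical)
theta = 0.0           # scalar "perturbation strength" standing in for the holder's parameters
for rho in (1.0, 10.0, 100.0, 1000.0):       # penalty weight grows across outer rounds
    step = 0.1 / rho                          # smaller steps keep gradient ascent stable
    for _ in range(2000):
        theta += step * penalized_objective_grad(theta, gamma, rho)
print(theta, theta ** 2 - gamma)
```

With a small penalty weight the iterate overshoots the budget noticeably; as the weight grows, the remaining violation shrinks and `theta` approaches the constrained maximizer `gamma ** 0.5`.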
III-D Case study: Images of people
To perform our experiment of the nonlinear compressive model with a categorical response variable, we use the Groups of People dataset [24]. The dataset contains 4550 images from Flickr of human faces with labeled attributes such as age and gender. These images are in grayscale, with pixel values ranging from 0 to 255, and are split into 3500 training and 1050 testing samples. In this experiment, the images are and the label of gender, which is evenly spread in both the training and testing sets, is . We label female or male as 1 or 1. Sampled raw images are shown in Appendix VI-H.
For the data holder to perform nonlinear compression, we implement a three-layer neural network, which shares a similar concept with the autoencoder [25]. The first two layers serve as an encoder. The initial layer has 2989 units and takes the original vectorized images [], followed by a ReLU activation and batch normalization. We vary the number of second-layer units among 2048, 512, and 128 for several cases, which are denoted as compression-rank. We define the corresponding compression-rank rates 0.685, 0.171, and 0.043 to be high, medium and low respectively (the compression-rank rate is the number of bottleneck units divided by the number of input units, e.g. 2048/2989 ≈ 0.685). The last layer, connected with ReLU activation, has the same dimension as the vectorized image input and performs the role of a decoder. The attacker is represented by a three-layer neural network, comprising an initial layer of 2989 units, followed by a layer of 2048 units, and lastly a two-unit layer as softmax output. We apply leaky ReLU activation and batch normalization between each layer.
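The compression-rank rates quoted above can be reproduced directly from the layer sizes. The short sketch below uses the bottleneck sizes reported in Table II (2048, 512, and 128 units) and the 2989 input units; the dictionary and variable names are chosen only for illustration.

```python
input_units = 2989
bottleneck_units = {"high": 2048, "medium": 512, "low": 128}

# Rate = bottleneck units divided by input units.
rates = {name: units / input_units for name, units in bottleneck_units.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
# high: 0.685
# medium: 0.171
# low: 0.043
```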
Before considering adversarial compression, we first classify reconstructed images with different compression-rank rates without having a min-max game. This operation serves two purposes: a) investigating the accuracy of gender classification; b) fetching the minimum distortion threshold in the context of mean squared error loss (i.e. min yields the smallest ). The following results are evaluated on the testing set. Figure 5 displays a sampled image associated with different scenarios. A lower compression-rank rate yields worse image quality. Table II shows that compressing images with the high and medium ranks doesn’t reduce the gender classification accuracy too much, while incurring a relatively low image quality loss. In the example of high compression-rank, the average distortion per pixel is 0.0166, which is not too large.
compression-rank  accuracy (gender)  distortion/pixel  distortion 

raw (2989)  0.692  0  0 
high (2048)  0.685  0.0166  0.195 
medium (512)  0.664  0.0259  0.304 
low (128)  0.627  0.0312  0.365 
Utilizing the previous result as a reference, we pick several proper values of to further understand the adversarial privacy compression. In the high compression-rank case, we test three scenarios where is and respectively. We discover that the encoder-decoder tends to alter pixels near eyes, mouths, and rims of hair. A similar pattern can also be observed when we test the low compression-rank case where is and . We also notice that the low compression-rank scenario has a more scattered dotted pattern of black/gray pixels at the large tolerance level, whereas the high compression-rank case has more concentrated black pixels, as shown in Figure 6. We believe the reason is that the data holder always adjusts the pixels that are highly correlated with gender. Since the high compression-rank encoder-decoder preserves more information than the low compression-rank one, it’s much easier for the data holder to alter the target pixel features within limited total distortion. The privatized images generated through min-max training indeed yield lower prediction accuracy of gender than the original encoded-decoded images. Table III-D depicts the gender classification results, indicating that it is harder to predict gender with increased distortion. The table also reveals that higher compression rank performs better in terms of decreasing the accuracy if the distortion is sufficiently large.
compression-rank 
high (2048)  0.628  0.600  0.573  0.486 
medium (512)^{1}  0.607  0.594  0.512 
low (128)  0.602  0.585  0.521 

^{1} is unattainable, since the compression rank is small enough that the minimum reconstruction loss (mean squared error) already reaches 0.3.
IV Privacy Guarantee
Our previous experiments show that a (local) equilibrium can be achieved through this min-max game approach. However, we cannot preclude that there may be other equilibria in the context of a neural network. Thus, a quantifiable metric is needed to give a privacy guarantee between sensitive data and altered data.
In this section we introduce the empirical mutual-information concept to quantify the privatization quality of this min-max game approach, i.e. measuring the correlation between the sensitive response data and the released feature data pre- and post-privatization. Mutual information (MI) [26] is a well-established tool that has been widely adopted to quantify the correlation between two streams of data by a nonnegative scalar [27]. From the data-driven perspective, we have the empirical MI , where characterizes the empirical entropy. This empirical entropy can be calculated using the classical k-th nearest neighbor method [28].
Continuous response label: Given that is continuous, the mutual information can be expressed as , where can be obtained directly from the method in [28], given samples . The joint empirical entropy is calculated by concatenating each and together as one sample and using the nearest neighbor entropy estimation again.
Categorical response label: For the discrete response , we have the mutual information , where can be approximated by the sample frequency in the dataset, and can be obtained by the aforementioned k-th nearest neighbor method with samples partitioned according to the value of .
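The two estimators described above can be sketched with a nearest-neighbor entropy estimate. The code below is a hedged illustration, not the exact estimator of [28]: it assumes the Kozachenko-Leonenko form of the k-nearest-neighbor entropy estimate with Euclidean distances, combines entropies as H(X) + H(Y) - H(X, Y) for a continuous label and as H(X) minus the frequency-weighted conditional entropies for a categorical label, and uses synthetic Gaussian data. All function names are hypothetical.

```python
import math
import numpy as np

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -euler_gamma + sum_{i<n} 1/i."""
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, n))

def knn_entropy(samples, k=5):
    """Kozachenko-Leonenko k-nearest-neighbor entropy estimate, in nats."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    kth = np.sort(dists, axis=1)[:, k]           # column 0 is the point itself
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return (digamma_int(n) - digamma_int(k) + math.log(unit_ball)
            + d * float(np.mean(np.log(kth + 1e-12))))

def mi_continuous(x, y, k=5):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a continuous label."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    return knn_entropy(x, k) + knn_entropy(y, k) - knn_entropy(np.hstack([x, y]), k)

def mi_categorical(x, labels, k=5):
    """I(X;Y) = H(X) - sum_y p(y) H(X | Y=y) for a discrete label."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    labels = np.asarray(labels)
    conditional = sum(np.mean(labels == v) * knn_entropy(x[labels == v], k)
                      for v in np.unique(labels))
    return knn_entropy(x, k) - conditional

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y_dep = 0.9 * x + math.sqrt(1 - 0.9 ** 2) * rng.normal(size=1000)  # correlation 0.9 with x
y_ind = rng.normal(size=1000)                                       # independent of x
mi_dep, mi_ind = mi_continuous(x, y_dep), mi_continuous(x, y_ind)
mi_cat = mi_categorical(x, x > 0)       # label is the sign of x
print(mi_dep, mi_ind, mi_cat)
```

On this synthetic data the estimate for the dependent pair should be clearly larger than for the independent pair, mirroring how a drop in empirical MI after privatization is read as reduced leakage.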
For the experiment with the continuous response variable, we first calculate the empirical MI between the power consumption statistics and floor areas. The original MI between power usage statistics data and floor area is . The resulting MI values between altered power usage data and floor areas, denoted by , are 0.995, 0.494, and 0.216 when the ranks of the compression matrices are , , and . For the categorical response variable experiment, the empirical MI between the image data and gender data is obtained as follows. The original MI between raw images and gender label is . When we pick high compression-rank with and , the are , and . The medium compression-rank yields to be , and for and respectively. In the low compression-rank case, are and with the aforementioned . We notice the empirical MI indeed decreases as the distortion increases. Due to the challenge of high-dimensional data, we apply principal component analysis to project the down to 16 dimensions and find the approximate . This is an alternative attempt to demonstrate the effectiveness of our framework. The changes in mutual information show the privacy guarantee between the released data and sensitive labels under various distortion conditions. Yet we believe a more advanced neural network architecture can be applied to extract embeddings of semantic features, resulting in better estimates of empirical mutual information. We will explore this potential direction in our future research.
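The dimensionality reduction step used before estimating MI on images can be sketched as follows. This is an illustrative assumption about how such a projection might be done: principal component analysis is computed through an SVD of the centered data, the array `images` is random noise standing in for vectorized 2989-pixel images, and the helper name `pca_project` is hypothetical.

```python
import numpy as np

def pca_project(data, n_components=16):
    """Project rows of `data` onto the top principal components (SVD of centered data)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
images = rng.random(size=(200, 2989))     # random stand-in for vectorized images
embedded = pca_project(images, n_components=16)
print(embedded.shape)                     # (200, 16)
```

Nearest-neighbor entropy estimates degrade quickly as dimension grows, so estimating MI on the 16-dimensional projection trades some information for a more stable estimate.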
V Conclusion
Recent breakthroughs in artificial intelligence require a huge amount of data to support the learning quality of various models. Yet the risk to data privacy is often overlooked in current data sharing processes. The recent news of data leakage by Facebook shows that privacy risk could significantly impact some issues in politics. Therefore, securely designing a good data privatization mechanism is important in the context of utilizing machine learning models. Through thorough evaluations, we show that our new min-max adversarial compressive privacy framework provides an effective and robust approach to protecting private information. We leverage a data-driven approach without posing assumptions on the data distribution. This is crucial in practical implementation, since real data is often more complicated than a simple characterization by a parametric probability distribution. Along this line of research, many interesting extensions can be built on our framework to create a robust privacy protector for data holders.
VI Appendix
VI-A Casting linear problem
Consider the following problem
(19) 
where matrix is , matrix is , and matrix is . We give a brief minimizer derivation as follows:
The derivative of the first term with respect to is
The derivative of the second term with respect to is
Thus, we set the derivative equal to zero and obtain the minimizer as follows:
(20)  
(21)  
(22)  
(23) 
Now we design the such that
(24)  
(25) 
Instead of explicitly designing a low rank matrix , we solve an alternative equivalent problem of determining a low rank matrix to compress the data.
VI-B Recovering Linear Operation
Claim: is the pseudoinverse of , i.e. .
We apply singular value decomposition (SVD, which is closely related to PCA) to the data matrix . is , is , is . We have
VI-C Proof of positive semidefinite property of a matrix
Claim: is positive semidefinite. Proof: We first show that is positive semidefinite. Since , for any vector , we have
Therefore, we can apply singular value decomposition to and get , where . The resulting can be expressed as
where . Because all are positive, we denote . Hence . And
(26) 
For any , we have
(27)  
(28) 
Thus is positive semidefinite.
VI-D Convexity of a reparameterized problem
Claim: The following optimization is convex:
(29)  
s.t.  (30) 
It is easy to see that the first term is convex, since is positive semidefinite and the trace operator is linear with respect to . The second and third terms are also convex. For any norm, given and two matrices , we have
Hence the Frobenius norm and the nuclear norm, being specific norms, are convex with respect to . The first term in the objective is just linear in . Thus, the problem is convex.
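The convexity inequality for norms used in this argument can be spot-checked numerically. The sketch below is only a sanity check on random matrices, not a proof: it draws hypothetical matrices `A` and `B` and a mixing weight `t`, and records the largest value of the left-hand side minus the right-hand side of the inequality for the Frobenius and nuclear norms.

```python
import numpy as np

rng = np.random.default_rng(0)
worst_gap = {"fro": -np.inf, "nuc": -np.inf}
for _ in range(200):
    A, B = rng.normal(size=(2, 5, 4))     # two random 5x4 matrices
    t = rng.uniform()                     # mixing weight in [0, 1)
    for name in worst_gap:
        lhs = np.linalg.norm(t * A + (1 - t) * B, name)
        rhs = t * np.linalg.norm(A, name) + (1 - t) * np.linalg.norm(B, name)
        worst_gap[name] = max(worst_gap[name], lhs - rhs)
print(worst_gap)    # both gaps stay at or below zero, up to rounding
```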
VI-E Deriving linear compression with noise
Consider the case . We have the minmax game as follows
(31)  
(32) 
For the attacker, we have the following minimization problem
(33)  
(34)  
(35) 
We first find the minimizer for the attacker. By taking the derivative over , we have
(36)  
(37) 
We also find the best recovering matrix by considering the following relation
(38)  
(39)  
(40) 
Taking the derivative over and setting it equal to 0, we have . Hence the data holder’s maximization can be cast into
(41)  
(42) 
It is not difficult to discover that is also positive semidefinite. Thus the problem can be relaxed to convex optimization.
VI-F Proof of Theorem 1
Proof: Denote , where is a singular value and are the corresponding left and right singular vectors. The best rank approximation in Frobenius norm is achieved by the SVD, by the Eckart-Young theorem [29]. Then
VI-G Features extracted from power data
Index  Description 

1  Week total mean 
2  Weekday total mean 
3  Weekend total mean 
4  Day (6am - 10pm) total mean 
5  Evening (6pm - 10pm) total mean 
6  Morning (6am - 10am) total mean 
7  Noon (10am - 2pm) total mean 
8  Night (1am - 5am) total mean 
9  Week max power 
10  Week min power 
11  ratio of Mean over Max 
12  ratio of Min over Mean 
13  ratio of Morning over Noon 
14  ratio of Noon over Day 
15  ratio of Night over Day 
16  ratio of Weekday over Weekend 
17  proportion of time with 
18  proportion of time with 
19  proportion of time with 
20  sample variance of 
21  sum of difference 
22  sample cross correlation of subsequent days 
23  number of counts that 
VI-H Images
VI-I Low rank linear transformation
References
 [1] B. Oshri, A. Hu, P. Adelson, X. Chen, P. Dupas, J. Weinstein, M. Burke, D. Lobell, and S. Ermon, “Infrastructure quality assessment in Africa using satellite imagery and deep learning,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’18. New York, NY, USA: ACM, 2018.
 [2] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104–3112.
 [3] R. Sevlian and R. Rajagopal, “A scaling law for short term load forecasting on varying levels of aggregation,” International Journal of Electrical Power & Energy Systems, vol. 98, pp. 350–361, 2018.
 [4] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008, pp. 111–125.
 [5] E. S. Finn, X. Shen, D. Scheinost, M. D. Rosenberg, J. Huang, M. M. Chun, X. Papademetris, and R. T. Constable, “Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity,” Nature neuroscience, vol. 18, no. 11, pp. 1664–1671, 2015.
 [6] C. Dwork, “Differential privacy: A survey of results,” in International Conference on Theory and Applications of Models of Computation. Springer, 2008, pp. 1–19.
 [7] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
 [8] J. Cortés, G. E. Dullerud, S. Han, J. Le Ny, S. Mitra, and G. J. Pappas, “Differential privacy in control and network systems,” in Decision and Control (CDC), 2016 IEEE 55th Conference on. IEEE, 2016, pp. 4252–4272.
 [9] V. Katewa, A. Chakrabortty, and V. Gupta, “Protecting privacy of topology in consensus networks,” in American Control Conference (ACC), 2015. IEEE, 2015, pp. 2476–2481.
 [10] Z. Huang, Y. Wang, S. Mitra, and G. Dullerud, “Controller synthesis for linear time-varying systems with adversaries,” arXiv preprint arXiv:1501.04925, 2015.
 [11] F. Koufogiannis and G. J. Pappas, “Differential privacy for dynamical sensitive data,” in Decision and Control (CDC), 2017 IEEE 56th Annual Conference on. IEEE, 2017, pp. 1118–1125.
 [12] J. C. Duchi, M. I. Jordan, and M. J. Wainwright, “Local privacy and statistical minimax rates,” in Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on. IEEE, 2013, pp. 429–438.
 [13] S. E. Fienberg, A. Rinaldo, and X. Yang, “Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables,” in International Conference on Privacy in Statistical Databases. Springer, 2010, pp. 187–199.
 [14] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 308–318.
 [15] X. Nguyen, M. J. Wainwright, and M. I. Jordan, “On surrogate loss functions and f-divergences,” The Annals of Statistics, pp. 876–904, 2009.
 [16] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXiv preprint arXiv:1701.07875, 2017.
 [17] S. Zhou, K. Ligett, and L. Wasserman, “Differential privacy with compression,” in Information Theory, 2009. ISIT 2009. IEEE International Symposium on. IEEE, 2009, pp. 2718–2722.
 [18] Commission for Energy Regulation (CER), “CER smart metering project - electricity customer behaviour trial, 2009-2010 [dataset],” Irish Social Science Data Archive. SN: 001200. 1st Edition., 2012. [Online]. Available: www.ucd.ie/issda/CERelectricity
 [19] C. Beckel, L. Sadamori, T. Staake, and S. Santini, “Revealing household characteristics from smart meter data,” Energy, vol. 78, pp. 397–410, 2014.
 [20] C. Wu and X.-C. Tai, “Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models,” SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 300–339, 2010.
 [21] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
 [22] J. Hamm, “Minimax filter: Learning to preserve privacy from inference attacks,” arXiv preprint arXiv:1610.03577, 2016.
 [23] C. Huang, P. Kairouz, X. Chen, L. Sankar, and R. Rajagopal, “Context-aware generative adversarial privacy,” Entropy, vol. 19, no. 12, 2017.
 [24] A. Gallagher and T. Chen, “Understanding images of groups of people,” in Proc. CVPR, 2009.
 [25] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” science, vol. 313, no. 5786, pp. 504–507, 2006.
 [26] C. E. Shannon, “A mathematical theory of communication,” ACM SIGMOBILE Mobile Computing and Communications Review, vol. 5, no. 1, pp. 3–55, 2001.
 [27] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
 [28] A. Kraskov, H. Stögbauer, and P. Grassberger, “Estimating mutual information,” Physical review E, vol. 69, no. 6, p. 066138, 2004.
 [29] G. H. Golub, A. Hoffman, and G. W. Stewart, “A generalization of the Eckart-Young-Mirsky matrix approximation theorem,” Linear Algebra and its Applications, vol. 88, pp. 317–327, 1987.