Maximal adversarial perturbations for obfuscation: Hiding certain attributes while preserving rest

Indu Ilanchezian, Praneeth Vepakomma, Abhishek Singh, Otkrist Gupta, G. N. S. Prasanna, Ramesh Raskar
Massachusetts Institute of Technology, Cambridge, U.S.A.
Indian Institute of Information Technology, Bangalore, India
Abstract

In this paper, we investigate the use of adversarial perturbations for privacy against both human perception and model (machine) based detection. We employ adversarial perturbations to obfuscate certain attributes in raw data while preserving the rest. Current adversarial perturbation methods are used for data poisoning, applying minimal perturbations to the raw data so that the machine learning model's performance is adversely impacted while human vision cannot perceive the difference in the poisoned dataset, owing to the minimal nature of the perturbations. We instead apply relatively maximal perturbations to the raw data to selectively damage the model's classification of one attribute while preserving the model's performance on another attribute. In addition, the maximal nature of the perturbation impairs human perception of the hidden attribute as well as model performance. We validate our results qualitatively by showing the obfuscated dataset and quantitatively by showing that models trained on clean data cannot predict the hidden attribute from the perturbed dataset while still being able to predict the remaining attributes.

1 Introduction

In this paper, we investigate the use of adversarial perturbations for privacy against both human and model (machine) based classification of hidden attributes from the perturbed data. With the advent of distributed learning, methods such as federated learning and split learning have become prominent for distributed deep learning. At the same time, privacy-preserving machine learning is a very active area of research. Adversarial approaches that minimally perturb the data to degrade model performance while keeping the perturbation imperceptible to humans have become popular. We investigate whether an adversarial perturbation can be used to achieve the following goals:

  1. Damage the model's performance in predicting a chosen sensitive attribute while keeping its performance in predicting another attribute intact.

  2. Obfuscate with a maximal perturbation so that it is difficult for humans to infer the hidden attribute (a sketch of the combined objective follows below).
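A hedged sketch of this two-part objective, written in our own notation (δ is the perturbation, f_h and f_p are the hidden- and public-attribute classifiers, ε is the deliberately large perturbation budget, λ trades off the two goals, and the losses are standard classification losses):

$$
\max_{\|\delta\|_\infty \le \epsilon} \;\; \mathcal{L}_{\mathrm{hidden}}\!\big(f_h(x+\delta),\, y_{\mathrm{hidden}}\big) \;-\; \lambda\, \mathcal{L}_{\mathrm{public}}\!\big(f_p(x+\delta),\, y_{\mathrm{public}}\big)
$$

Maximizing the first term damages prediction of the hidden attribute, while the second term penalizes harm to the public attribute.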

Such perturbations are necessitated in sectors like finance (Bateni et al., 2018; Srinivasan et al., 2019; Chen et al., 2018; Chen, 2018), healthcare (Vepakomma et al., 2018; Chang et al., 2018), government, retail (Zhao et al., 2017; Yao & Huang, 2017) and hiring (Kay et al., 2015; Harrison et al., 2018) due to privacy, fairness, ethical and regulatory issues.

We validate our results qualitatively by presenting the perturbed datasets and quantitatively by showing that models trained on clean data cannot predict the hidden attribute from the perturbed dataset while still being able to predict the remaining attributes. We consider these to be useful intermediate experiments and results towards the goal of using adversarial methods to generate perturbations such that, when a model is trained on the perturbed data to predict the hidden attribute, its performance remains under control. We also note that these approaches can motivate a theoretical study of the privacy guarantees of adversarial approaches under worst-case settings.

1.1 Contributions and method

Figure 1: The architecture of the model used is shown above, with two forks: one modeling the hidden label attribute and the other modeling the public label attribute. Adversarial perturbations are added at the layer preceding the hidden attribute fork.

We employ adversarial perturbation methods with a larger ε-ball of perturbations to generate images that are hard for humans to classify with respect to the hidden attribute. To the best of our knowledge, prior work has used adversarial perturbations only with small ε-balls. The goal of this work, however, is to obfuscate the data so as to hide certain latent information from both the model and humans; we therefore conduct our experiments over a broad spectrum of epsilon values that spans the range needed to fool machines as well as humans. In this study we consider only images, but our technique is generic in design and applicable wherever adversarial perturbations can be performed successfully. To generate the adversarial perturbation, we use a VGG-16 (Simonyan & Zisserman, 2014) model with layers pretrained on ImageNet (Deng et al., 2009). We employ the architecture shown in Figure 1, where a fork is created after three blocks of the VGG-16 architecture, each block consisting of two convolution layers and one pooling layer. The hidden attribute fork consists of a few DNN layers for local computation, while the rest of the network after the red fork is used to predict the label attribute that is to be preserved. We then train the network. After training, we employ the adversarial methods of the fast gradient sign method (FGSM) and projected gradient descent (PGD) to perturb the layer preceding the hidden attribute fork (shown by the grey arrow). We choose a larger ε-ball of possible perturbations when generating the perturbation of this layer with respect to the loss function corresponding only to the hidden label attribute. We show detailed results on the quality of our approach in the experiments section. In addition, we weight the two loss functions; the weights used are listed in Table 1.
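The following is a minimal PyTorch-style sketch of this setup, assuming a torchvision VGG-16 backbone; the split index, branch layer sizes, class counts, and the single FGSM step are illustrative assumptions rather than the exact configuration used here.

```python
# Sketch (not the paper's exact code): a VGG-16 trunk with two forks, and one
# FGSM step on the activation preceding the hidden-attribute fork, driven only
# by the hidden-attribute loss with a deliberately large epsilon.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class ForkedVGG(nn.Module):
    def __init__(self, n_hidden_classes, n_public_classes):
        super().__init__()
        backbone = vgg16(weights="IMAGENET1K_V1").features  # ImageNet-pretrained, as in the paper
        layers = list(backbone.children())
        self.trunk = nn.Sequential(*layers[:17])   # first three VGG-16 blocks (assumed split point)
        self.rest = nn.Sequential(*layers[17:])    # remaining VGG-16 blocks for the public attribute
        self.hidden_head = nn.Sequential(          # small DNN fork for the hidden attribute
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, n_hidden_classes))
        self.public_head = nn.Sequential(          # head for the public (preserved) attribute
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, n_public_classes))

    def forward(self, x):
        h = self.trunk(x)                          # activation preceding the hidden-attribute fork
        return self.hidden_head(h), self.public_head(self.rest(h)), h

def fgsm_on_fork_activation(model, x, y_hidden, epsilon):
    """One FGSM step on the fork activation, using only the hidden-attribute loss."""
    _, _, h = model(x)
    h = h.detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model.hidden_head(h), y_hidden)
    loss.backward()
    h_perturbed = h + epsilon * h.grad.sign()      # maximize the hidden-attribute loss
    public_logits = model.public_head(model.rest(h_perturbed))
    return h_perturbed, public_logits
```

A PGD variant would iterate this step several times, projecting the perturbation back onto the ε-ball after each update.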

1.2 Related work

In Figure 2 we show the landscape of deep learning based approaches for hiding certain attributes in data. We categorize it broadly into four approaches: a) perturbation of raw data, which includes our approach; b) overlaying a mask on raw data to hide certain parts of an image (Wang et al., 2018); c) modifying the output of intermediate (encoded) representations (Lample et al., 2017; Vepakomma et al., 2018, 2019); d) transforming the entire (or part of a) natural image into another natural image (Du et al., 2014; Chen et al., 2019; Wu et al., 2018). We show six example methods within these categories. Our approach belongs to the category of 'adversarial attack based perturbations'. As a side note, all the above approaches can be further categorized into sub-approaches that fool humans and/or machines, while our intermediate work targets both.

Figure 2: The landscape of deep learning based approaches for hiding certain attributes in data. We categorize it broadly into four approaches: a) perturbation of raw data; b) overlaying a mask on raw data to hide certain parts of an image; c) modifying the output of intermediate (encoded) representations; d) transforming the entire (or part of a) natural image into another natural image. We show six example methods within these categories. Our approach belongs to the category of 'adversarial attack based perturbations'. As a side note, all the above approaches can be further categorized into sub-approaches that fool humans and/or machines.

2 Results

We detail our experimental setup and results in this section. A condensed version of the architecture in our setup is shown in Figure 1. We perform experiments in which VGG-16 is the initially trained model that is adversarially perturbed with a large ε-ball of perturbations. We then predict on the perturbed data with a clean model trained on unperturbed data. Three clean models were trained with VGG-16 and VGG-19 architectures. The two adversarial perturbation methods used with a large choice of ε-ball were the fast gradient sign method (FGSM) (Goodfellow et al., 2014) and the projected gradient descent method (PGD) (Athalye et al., 2017). The dataset used was UTKFace, a large-scale face dataset with a long age span. The dataset consists of over 20,000 face images with annotations of age, gender, and ethnicity. The images cover a large variation in pose, facial expression, illumination, occlusion, resolution, etc., and can be used for a variety of tasks, e.g., face detection, age estimation, age progression/regression and landmark localization. In Table 1, we show results of the PGD and FGSM methods employed in the large ε-ball setting with different weights for the weighted loss, along with the accuracy of predicting on the perturbed data generated by our approach using clean VGG-16 and VGG-19 models trained on unperturbed data. We note that a large share of the race predictions made by the clean model after our perturbation belong to the majority race class. This shows that our method is able to push the clean model's predictions of the hidden label attribute (race) towards the no-information rate while preserving gender accuracy. A comparable share of the ground-truth race labels belong to the same class as well. We therefore reach the required level of obfuscation.
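A minimal sketch of this evaluation protocol is given below; the function and loader names are ours, and separate clean race and gender classifiers (trained only on unperturbed UTKFace) are assumed.

```python
# Sketch of the evaluation: clean models trained on unperturbed data are applied
# to the perturbed images. Obfuscation succeeds if race accuracy collapses toward
# the majority-class (no-information) rate while gender accuracy stays high.
import torch
from collections import Counter

@torch.no_grad()
def evaluate_obfuscation(clean_race_model, clean_gender_model, perturbed_loader):
    race_hits = gender_hits = n = 0
    race_preds = []
    for images, race_labels, gender_labels in perturbed_loader:
        race_pred = clean_race_model(images).argmax(dim=1)
        gender_pred = clean_gender_model(images).argmax(dim=1)
        race_hits += (race_pred == race_labels).sum().item()
        gender_hits += (gender_pred == gender_labels).sum().item()
        race_preds.extend(race_pred.tolist())
        n += images.size(0)
    # Share of race predictions that fall in the most frequent predicted class:
    # a high value indicates collapse toward the no-information rate.
    majority_share = Counter(race_preds).most_common(1)[0][1] / n
    return race_hits / n, gender_hits / n, majority_share
```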

Figure 3: Original and perturbed images obtained by projected gradient descent based adversarial perturbation, which yields a much higher quality of perturbed images. The ε used here is 0.2, and the data belong to the UTKFace dataset. Race is the hidden attribute and gender is the public attribute. Table 1 shows the corresponding results, where the race accuracy is brought down to 0%, as ideally desired, while the gender accuracy is relatively preserved.
Figure 4: Original and perturbed images obtained by fast gradient sign method based adversarial perturbation. The data belong to the UTKFace dataset. The ε values used here are 0.3, 0.4 and 0.5. Race is the hidden attribute and gender is the public attribute. Table 1 shows the corresponding results, where the race accuracy is brought down to 40% while the gender accuracy is relatively preserved at 54-65%. This was a good method to investigate, although the projected gradient descent based method has better qualitative and quantitative performance.

Loss weights   Epsilon   Race Accuracy   Gender Accuracy   Method   Clean Model
1, 1           0.2       0%              72.9%             PGD      VGG16
1, 1           0.2       30%             54%               PGD      VGG19
1, 1e-05       0.3       40%             65%               FGSM     VGG16
1, 1e-05       0.4       40%             58%               FGSM     VGG16
1, 1e-05       0.5       40%             54%               FGSM     VGG16
Table 1: Results of the PGD and FGSM methods employed in the large ε-ball setting with different weights for the weighted loss, along with the accuracy of predicting on perturbed data generated by our approach using clean VGG-16 and VGG-19 models trained on unperturbed data. The baseline race and gender accuracies of the clean models on unperturbed data, prior to perturbation, serve as the reference for these values. We note that a large share of the race predictions made by the clean model after our perturbation belong to the majority race class. This shows that our method is able to push the clean model's predictions of the hidden label attribute (race) towards the no-information rate while preserving gender accuracy. A comparable share of the ground-truth race labels belong to the same class as well. We therefore reach the required obfuscation target.

3 Conclusion and future work

We investigate large ε-ball perturbations obtained via adversarial methods for obfuscating one label attribute while preserving the rest. We show better performance in fooling humans with the projected gradient descent based approach and better utility in preserving the accuracy of the public label attribute with the fast gradient sign method. For future work, we aim to enhance this approach with information-theoretic and other statistical dependency-minimizing loss functions such as distance correlation, the Hilbert-Schmidt Independence Criterion and Kernel Target Alignment. We note that a large share of the race predictions made by the clean model after our perturbation belong to the majority race class. This shows that our method is able to push the predictions of the hidden label attribute (race) towards the no-information rate while preserving gender accuracy. A comparable share of the ground-truth race labels belong to the same class as well. Having reached the required obfuscation performance on the hidden label attribute, a further goal is to raise the performance of predicting the public label attribute.
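As an illustration of one such dependency measure, a minimal sketch of a (biased) sample distance correlation in PyTorch is given below; minimizing it between perturbed representations and the hidden labels (e.g., one-hot encoded) is one possible future direction, not something implemented in this work, and the function name and usage are ours.

```python
# Sketch: biased sample distance correlation as a differentiable dependency penalty.
import torch

def distance_correlation(x, y, eps=1e-12):
    """x, y: (n, d) batches with rows as samples. Returns a scalar in [0, 1];
    values near 0 indicate weak statistical dependence, so this can serve as a
    dependency-minimizing loss between representations and a hidden label."""
    def doubly_centered_distances(z):
        d = torch.cdist(z, z)  # pairwise Euclidean distances within the batch
        return d - d.mean(dim=0, keepdim=True) - d.mean(dim=1, keepdim=True) + d.mean()
    a = doubly_centered_distances(x)
    b = doubly_centered_distances(y)
    dcov2_xy = (a * b).mean()
    dcov2_xx = (a * a).mean()
    dcov2_yy = (b * b).mean()
    dcor2 = dcov2_xy / (dcov2_xx.sqrt() * dcov2_yy.sqrt() + eps)
    return dcor2.clamp_min(0.0).sqrt()

# Hypothetical usage: penalize dependence between perturbed activations and race labels.
# penalty = distance_correlation(h_perturbed.flatten(1),
#                                torch.nn.functional.one_hot(y_race, 5).float())
```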

References
