Adversarial Data Encryption



In the big data era, many organizations face the dilemma of data sharing. Regular data sharing is often necessary for human-centered discussion and communication, especially in medical scenarios. However, unprotected data sharing may also lead to data leakage. Inspired by adversarial attack, we propose a method for data encryption, so that for human beings the encrypted data look identical to the original version, but for machine learning methods they are misleading.

To show the effectiveness of our method, we collaborate with the Beijing Tiantan Hospital, which has a world leading neurological center. We invite doctors to manually inspect our encryption method based on real world medical images. The results show that the encrypted images can be used for diagnosis by the doctors, but not by machine learning methods.


Yingdong Hu, Liang Zhang, Wei Shan, Xiaoxiao Qin, Jin Qi, Zhenzhou Wu and Yang Yuan




Keywords: Adversarial Examples, Adversarial Attack, Healthcare, Data Sharing, Data Encryption

1 Introduction

Data sharing is a crucial and necessary component in many human-centered activities. For example, imagine a radiologist who wants to discuss Magnetic Resonance Imaging (MRI) findings with her/his research collaborators. To support her/his findings in a scientific way, she/he may need to send thousands of medical images to other experts.

With the advancement of deep learning, data like medical images has become a valuable asset that is highly sought after. As a result, for many data-sensitive organizations such as hospitals, unprotected data sharing can be very risky. One of the major concerns is that instead of just being used for intellectual discussion, the data might be used for commercial purposes in training machine learning models without the owner’s consent.

Figure 1: Our goal of data sharing: for experts the encrypted data and original data can both be used for discussion, but for ML models the encrypted data are useless in training.

Our goal is to derive a technique that encrypts each image such that it is visually indistinguishable from the original version, while a model trained on the encrypted data performs poorly on the original data, as illustrated in Figure 1. Doing so makes both the encrypted data and any model trained on it useless to data stealers.

We get our inspiration of encryption and content preservation from adversarial examples (Szegedy et al., 2014; Biggio et al., 2013; Engstrom et al., 2017; Athalye et al., 2018), where the perturbations are visually imperceptible but can easily fool a network.

Such adversarial examples arise because the convolutional filters tend to emphasize local features like textures or patterns (Brendel and Bethge, 2019), while humans are able to focus on global structure. Local features are non-robust because they are sensitive to minor perturbations. Following Ilyas et al. (2019), we name the local features/global structures as non-robust/robust features, respectively.


[Figure 2 panels. Original training data: robust features "dog", non-robust features "dog". Encrypted training data: robust features "dog", non-robust features "cat". Evaluated on the encrypted test set (robust features "dog", non-robust features "cat"): good accuracy. Evaluated on the original test set (robust features "dog", non-robust features "dog"): bad accuracy.]

Figure 2: A conceptual diagram of the encryption principle. We encrypt the data by changing the non-robust features of the data.

While adversarial examples have commonly been used for attacking ML systems (Athalye et al., 2018; Evtimov et al., 2017; Brown et al., 2017; Li et al., 2019a, b), it turns out that we can use them as a data encryption method by exploiting the weakness of existing models in handling local features. See Figure 2 for details. The goal is to overwrite these local features with easy-to-learn but misleading information, so that a trained network readily overfits to this information and becomes prone to wrong judgments on the original images.

The security of our method relies on the hardness of solving the adversarial example problem. In other words, our encryption method fails if there are models that are trained in the adversarial domain but work equally well in the non-adversarial domain (or vice versa), or if there are methods for projecting adversarially perturbed data back onto the natural data manifold. There have been many explorations in both directions (Meng and Chen, 2017; Song et al., 2017; Samangouei et al., 2018a; Santhanam and Grnarova, 2018; Xie et al., 2019), but so far no satisfactory solution has been found.

To show the effectiveness of our method, we first conduct experiments on CIFAR-10. Besides checking that the encryption works, we also make sure that the difference between an image before and after encryption is negligible. To further investigate whether the encryption destroys minute local information, we collaborate with Beijing Tiantan Hospital, which has the largest neurological center in China. We conduct the experiment on MRI images, and also invite doctors to manually evaluate the difference between the original and the encrypted images. Our results show that our encryption does not affect the diagnosis of the doctors, but significantly lowers the performance of machine learning methods.

2 Preliminaries

We consider the standard classification setting. Let $\mathcal{X}$ be the input domain and $\mathcal{Y} = \{0, 1, \dots, C-1\}$ be the output domain, where $C$ is the number of classes. Let $(x_i, y_i)$ be the $i$-th instance in the original training set, where $x_i \in \mathcal{X}$ is the input and $y_i \in \mathcal{Y}$ is the corresponding true label. The encryption process is to change every pair $(x_i, y_i)$ to the encrypted pair $(x_i', y_i)$, where $x_i' \in \mathcal{X}$. In this paper, we consider the case that $x_i'$ is close to $x_i$ in terms of $\ell_2$ distance.

The learning system aims to learn a classifier $f: \mathcal{X} \to \mathcal{Y}$ using the encrypted training set, which predicts the label $y$ given an input $x$. We hope that the training will “fail”: after training, $f$ has high classification accuracy on the encrypted data, but low classification accuracy on the original data. This means the training was successful on the encrypted data, but the resulting model cannot be used for natural data.

2.1 Robust and Non-robust Features

In order to explain adversarial examples, Ilyas et al. (2019) proposed the notion of robust and non-robust features. There are no formal definitions, but intuitively, each data point $x$ may contain both “robust” and “non-robust” features. Robust features correspond to patterns that are predictive of the true label even when $x$ is adversarially perturbed under some pre-defined perturbation set, e.g. the $\ell_2$ ball. Conversely, non-robust features correspond to patterns that are also predictive, but can be easily “flipped” by adversarial perturbations.

Humans can only perceive robust features, so after adversarial perturbations, we can hardly see the changes in the data. However, ML models will use both types of features to minimize the loss during the training, therefore flipping non-robust features will have a huge impact on their prediction accuracy.

3 Basic Encryption

In this section, we introduce the basic encryption method based on adversarial attack. We will first present the specific steps of our encryption method and explain the underlying principle. Afterward, we run simulations on CIFAR-10 to validate our method, and also show one potential weakness of the basic encryption method.

3.1 Basic Encryption Method

We first define a permutation $g: \mathcal{Y} \to \mathcal{Y}$ over all the classes, and we call $g$ a class correspondence. For a given class $y$, $g$ decides the target class $g(y)$, so that the inputs belonging to class $y$ will be perturbed with non-robust features of class $g(y)$.
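As a small illustration of the class correspondence, the sketch below draws a random permutation of the class indices. The function name `random_class_correspondence` and the seed are illustrative assumptions, not part of the method as published. Note that a uniformly random permutation may map a class to itself; whether such fixed points should be rejected is not specified in the text, so the sketch leaves them in.

```python
import random


def random_class_correspondence(num_classes, seed=0):
    """Return a list g where g[y] is the target class assigned to source class y.

    Hypothetical helper: a plain random permutation of the class indices.
    Fixed points (g[y] == y) are possible and are not filtered out here.
    """
    rng = random.Random(seed)
    targets = list(range(num_classes))
    rng.shuffle(targets)
    return targets


g = random_class_correspondence(10, seed=0)
for source, target in enumerate(g):
    print(f"class {source} -> target class {target}")
```

Keeping the correspondence as a plain list makes it easy to store alongside the other secret encryption settings.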

We also need to train a base classifier $f$ on the original training set, where $f$ can be any modern neural network structure (e.g. ResNet or DenseNet). Given any input-label pair $(x, y)$ in the original training set, we compute the encrypted input $x'$ as follows.

$$x' = \arg\min_{\tilde{x} \,:\, \|\tilde{x} - x\|_2 \le \epsilon} L(f(\tilde{x}), g(y))$$

where $L$ is the loss function defined for the prediction $f(\tilde{x})$ and the target label $g(y)$ (the target class that the class correspondence $g$ assigns to $y$), and $\epsilon$ is a small constant. We solve this optimization problem using projected gradient descent (PGD) (Madry et al., 2017). Because $\epsilon$ is small, the resulting encrypted input $x'$ looks almost the same as the original input $x$.
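The encryption step above is a targeted PGD attack: repeatedly step against the gradient of the loss towards the target class, then project back into a small $\ell_2$ ball around the original input. The sketch below shows this on a toy linear softmax classifier in pure Python, so that the gradient can be written by hand. The classifier, the function names, and the toy input are assumptions made for illustration; the paper uses deep networks, and this sketch also omits clipping to the valid pixel range.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    # Toy linear classifier: one weight row per class.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def grad_loss_wrt_input(W, b, x, target):
    # Cross-entropy towards `target`: dL/dx = W^T (softmax(Wx + b) - onehot(target)).
    p = softmax(logits(W, b, x))
    p[target] -= 1.0
    return [sum(W[c][j] * p[c] for c in range(len(W))) for j in range(len(x))]


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def pgd_encrypt(W, b, x, target, eps=0.5, step=0.1, iters=100):
    """Targeted L2 PGD: minimise the loss towards `target` within ||x' - x||_2 <= eps."""
    x_adv = list(x)
    for _ in range(iters):
        grad = grad_loss_wrt_input(W, b, x_adv, target)
        norm = l2_norm(grad)
        if norm == 0.0:
            break
        # Normalised descent step of fixed L2 length `step`.
        x_adv = [a - step * gi / norm for a, gi in zip(x_adv, grad)]
        # Project back onto the L2 ball of radius eps around the original x.
        delta = [a - o for a, o in zip(x_adv, x)]
        dist = l2_norm(delta)
        if dist > eps:
            x_adv = [o + eps * d / dist for o, d in zip(x, delta)]
    return x_adv


# Toy usage: a 2-class identity classifier and an input of class 0.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [1.0, 0.8]
x_enc = pgd_encrypt(W, b, x, target=1)
```

The default values of `eps`, `step`, and `iters` mirror the 0.5, 0.1, and 100 reported for the CIFAR-10 simulation; in a real pipeline the gradient would come from automatic differentiation through the base classifier.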

All the encrypted input-label pairs make up the new encrypted training set. The overall encryption process is similar to the normal adversarial attack, except that we pick a fixed target class for each source class, and the attack is made for the training set instead of the test set. As we will see in Section 3.3, using a fixed target class is necessary in our setting.

3.2 Underlying Intuition

When applying the encryption, we change the non-robust features of the data. All the inputs in the encrypted training set exhibit non-robust features correlated with the target class $g(y)$ chosen by the class correspondence $g$, but robust features correlated with the source class $y$. For example, for every image containing a dog, we may modify its non-robust features to be those of a cat. Hence, during training, robust dog features and non-robust cat features always come together and are labeled as “dog”. Similarly, robust bird features and non-robust dog features always come together and are labeled as “bird”. The classifier will easily learn this pattern. However, when a natural dog image comes, it has both robust dog features and non-robust dog features. The classifier will get confused because robust dog features are correlated with the label “dog”, but non-robust dog features are correlated with the label “bird”. These two conflicting cues confuse the model and cause it to make incorrect predictions. A conceptual description of the encryption principle can be found in Figure 2.

3.3 Simulation of Basic Technique

We have run simulations based on the CIFAR-10 dataset, which contains 10 classes. Our base classifier is a ResNet-50, and its classification accuracy on the original test set is . We then randomly generate a class correspondence function $g$ for the source classes. The correspondence we use is: .

During the encryption process, our perturbations are constrained in $\ell_2$-norm, while each PGD step is normalized to a fixed step size. We only add a small perturbation: $\epsilon$, the step size, and the number of iterations are set to 0.5, 0.1, and 100, respectively.

In Table 1, we report the performance of three different models (all models are implemented in PyTorch and used in Li, 2019) in four different settings: Enc, Orig, Orig+P, R+Orig. In all settings, the models are trained using the encrypted training set, but in the first three settings, the training set is encrypted using our fixed class correspondence (fixed class correspondence is a key step in the basic encryption method), while in R+Orig, we encrypt each data point by picking random class targets. The test data set in Enc is encrypted using our class correspondence function, and for other three settings, we use the original test set. On top of that, in Orig+P, we add a post-processing step, where the output of the model is for the input and classifier .

Model Enc Orig Orig+P R+Orig
DenseNet-100bc (Huang et al., 2019) 94.70% 22.61% 32.00% 94.26%
PreResNet-110 (He et al., 2016) 94.64% 20.67% 41.95% 93.26%
VGG-19 (Simonyan and Zisserman, 2014) 93.58% 28.77% 34.43% 92.17%
Table 1: Classification accuracies of different models on the CIFAR-10 encrypted test set and original test set.

Class Proportion
0 9.4%
1 0.3%
2 0.2%
3 36.8%
4 30.6%
5 0.9%
6 0.3%
7 0.2%
8 21.2%
9 0.1%
Table 2: The prediction distribution for images in class 0.

From the first two columns of Table 1 (Enc and Orig), we can see that our encrypted dataset has achieved its purpose: the model has high accuracy on the encrypted test set (similar to the accuracy obtained by training on the original training set and predicting on the original test set), and has extremely low accuracy on the original test set. We remark that the high accuracy on the encrypted test set does not mean that the model will be useful if one knows how to encrypt the data. This is because when creating an encrypted data set, we need the extra information of the correct label of each input, and then apply the correspondence function to perturb the input. In practice, without knowing the true label, one cannot encrypt the data even with the correspondence function $g$. In other words, the accuracies for Enc are vacuous in practice, and we include them here to serve as a strong benchmark for comparison.

If the model relied more on robust features, the accuracies in Orig (second column in Table 1) would not be extremely low. If the model relied more on non-robust features, the accuracies in Orig+P (third column in Table 1) would be much higher than in Orig. However, we see that the accuracies in Orig are very low, and the accuracies in Orig+P are only slightly higher than in Orig. This shows that the trained model gets confused when seeing the original images (as we explained in Section 3.2), so it may make predictions different from the true class $y$ or the target class $g(y)$. Indeed, as we show in Table 2, the prediction distribution of the trained model for images in class 0 is not well concentrated in class 0 or class 8 (the target class of 0 is 8).

Moreover, the accuracies in R+Orig show that using a fixed target class is necessary: otherwise the trained model is about as accurate on the original test set as a normally trained model, which means the encryption fails.

3.4 Recovery of Correspondence Function

As we mentioned before, decrypting the encrypted data is as hard as solving the adversarial example problem. Instead of directly decrypting the data and cracking our method, in this subsection we consider a simpler problem of recovering the correspondence function $g$. According to Table 1, by recovering $g$ the attacker can only slightly increase the accuracy, but in practice $g$ is also extra hidden information that we do not want to share.

Assume that there exists an attacker who knows our encryption method and also has the encrypted data set $D_{enc}$ (e.g., the CIFAR-10 dataset we encrypted in the previous subsection). He may also have some different labeled data $D_{orig}$ sampled from the original data distribution, obtained from other sources.

He can learn the secret class correspondence function $g$ as follows. First, he trains a classifier $f_{enc}$ on $D_{enc}$ and a classifier $f_{orig}$ on $D_{orig}$. Then he uses $f_{orig}$ to simulate our encryption process, that is, based on a candidate correspondence function $g'$, he modifies the data points in $D_{orig}$ to have incorrect non-robust features according to $g'$. After encrypting $D_{orig}$, he uses $f_{enc}$ to make predictions on it. Usually, we assume $g'$ is a permutation, but here we relax this requirement and allow a target class to appear in multiple locations. Now the attacker can fix the target class for classes 1-9, and enumerate the target class for class 0. For each possible correspondence $g'$, he evaluates the performance of $f_{enc}$, and finally picks the one with the highest accuracy for class 0, which will be the actual target class in $g$ (see Table 3). After processing the images of classes 0-9 in this way, the attacker can get the correct correspondence within 100 attempts.
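The per-class enumeration described above can be sketched as a simple search loop. In this sketch, `accuracy_for` is a hypothetical callback standing in for the expensive part of the attack (re-encrypting held-out original images of one class towards a candidate target, then scoring the classifier trained on the leaked encrypted data); the toy oracle and its accuracy values are made up purely to exercise the loop.

```python
def recover_correspondence(num_classes, accuracy_for):
    """Greedy per-class search for a hidden class correspondence.

    accuracy_for(source, target) is a hypothetical callback: it should return
    the accuracy, on images of class `source` re-encrypted towards `target`,
    of the classifier trained on the leaked encrypted data.
    """
    recovered = []
    attempts = 0
    for source in range(num_classes):
        best_target, best_acc = None, -1.0
        for target in range(num_classes):
            acc = accuracy_for(source, target)
            attempts += 1
            if acc > best_acc:
                best_target, best_acc = target, acc
        recovered.append(best_target)
    return recovered, attempts


# Toy oracle (illustrative values only): the true target scores high, others low.
secret = [8, 3, 1, 0, 2, 4, 9, 6, 7, 5]


def toy_accuracy(source, target):
    return 0.9 if target == secret[source] else 0.3


recovered, attempts = recover_correspondence(10, toy_accuracy)
print(recovered == secret, attempts)
```

The loop makes one call per (source, target) pair, so the cost grows as the square of the number of classes.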

Test set Test set acc
(orig test set) 22.61%
Table 3: We simulate the attack process. Suppose we use the correspondence defined in Section 3.3. Assume the attacker has a PreResNet-20 trained on the CIFAR-10 encrypted training set as classifier $f_{enc}$, and has a ResNet-50 trained on the CIFAR-10 original training set as classifier $f_{orig}$. The attacker fixes the target class for classes 1-9, and enumerates the target class for class 0. This table shows the 10 accuracies he gets from $f_{enc}$, where target class 8 has the highest accuracy.

4 Combined Encryption

For the basic encryption method, the correspondence function is easy to recover, because if each class of data only corresponds to one specific class, the number of correspondences is limited. In this section, we present the combined encryption method, which addresses this problem. Therefore, it is not only hard to decrypt the data, but also hard to recover the specific encryption method.



[Figure 3 panels. Original training data. Horiz. Concat. encrypted data: left half carries non-robust features of deer, right half carries non-robust features of bird. Mixup encrypted data: a share of the total non-robust features comes from deer and the rest from bird.]

Figure 3: Combined encryption method with different combining functions. The first row shows the Horizontal Concat method: the left 50% of image 1 is horizontally concatenated with the right 50% of image 2. The second row shows the Mixup method, which creates a new encrypted image by interpolating between two images with a constant coefficient $\alpha = 0.50$.

4.1 Combined Encryption Method

For the combined encryption methods, each class $y$ corresponds to multiple target classes (i.e. we select multiple target classes $g_1(y), \dots, g_k(y)$). Then we modify each input-label pair $(x, y)$ from the original training set as follows. For each target class $g_j(y)$, we construct an encrypted input $x'_j$ using the basic encryption method described above. Then, we combine all the encrypted inputs into a new encrypted input $x'$. Formally:

$$x' = h(x'_1, \dots, x'_k)$$

where $h$ is a function that maps multiple inputs into a single new encrypted input, and at the same time keeps $\|x' - x\|_2$ small. In other words, humans can hardly notice the difference between $x'$ and $x$. For example, $h$ may be a function that concatenates two examples horizontally, or one that generates a mixing coefficient $\lambda$ and produces the new example as a convex combination (similar to Mixup, Zhang et al., 2017). See Figure 3 for an illustration.
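The two combining functions mentioned above can be sketched on images stored as nested lists (rows of pixel values). The names `horizontal_concat` and `mixup`, the `left_ratio` and `lam` parameters, and the tiny single-channel images are assumptions for illustration; a practical version would operate on image tensors.

```python
def horizontal_concat(img_a, img_b, left_ratio=0.5):
    """Take the left `left_ratio` of img_a's columns and the remaining columns from img_b."""
    width = len(img_a[0])
    cut = int(round(width * left_ratio))
    return [row_a[:cut] + row_b[cut:] for row_a, row_b in zip(img_a, img_b)]


def mixup(img_a, img_b, lam=0.5):
    """Pixel-wise convex combination lam * img_a + (1 - lam) * img_b."""
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


# Two toy 2x4 "encrypted" versions of the same image (values are made up).
enc_1 = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
enc_2 = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

concat_img = horizontal_concat(enc_1, enc_2, left_ratio=0.5)
mixed_img = mixup(enc_1, enc_2, lam=0.5)
```

Since both inputs are small perturbations of the same original image, either combination stays close to it: each output pixel is a copy of, or a convex combination of, pixels that were already within the perturbation budget.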

The search space of our combined encryption method is exponentially larger than the basic encryption method. First, each class of data corresponds to multiple classes (the number can vary for different classes), which greatly increases the number of class correspondences. Secondly, there are many different valid functions for combining data. For example, various data augmentation methods such as Mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), and Random square (Summers and Dinneen, 2019) can be adapted as methods to combine multiple inputs into new encrypted inputs. As a result, it is very difficult for an attacker to recover the encryption method (see detailed explanation in Section 4.3).

4.2 Simulation Results

In this subsection, we demonstrate two combined encryption methods on CIFAR-10 dataset: 1) Horizontal Concat; 2) Mixup And Concat.

Horizontal Concat

We select two target classes for each source class (as shown in Table 4), then use PGD to add adversarial perturbations to the image based on its two targets. For each image, we get two slightly changed images. The left 50% of Image 1 is horizontally concatenated with the right 50% of Image 2. In practice, we may pick other composition ratios for each source class. As a result, each image contains non-robust features of two target classes.

class 0 1 2 3 4 5 6 7 8 9
target1 8 3 1 0 2 4 9 6 7 5
target2 4 2 3 5 7 1 8 0 6 9
Table 4: Encryption correspondence used in Horizontal Concat.

Test set Test set acc
orig test set 31.70%
50%left + 50%right(encrypted) 94.44%
40%left + 60%right 94.10%
30%left + 70%right 92.26%
20%left + 80%right 90.28%
10%left + 90%right 85.96%
100%right 82.79%
Table 5: Classification accuracies obtained by changing the composition ratio of the two target images.

One benefit of this method is that the ratio between the two images is unknown to the attacker (it is not necessarily 50%-50%), so it provides additional protection for the encryption process. If the attacker wants to recover the correspondence function as described in Section 3.4, he needs to know not only which target classes each source class corresponds to, but also how the two pictures are concatenated together. Table 5 shows the accuracy obtained by using the correct target class set but different composition ratios, with the ratio fixed across all source classes (this is a simplification, because in practice one may pick a different composition ratio for each source class). Hence, it is no longer easy for the attacker to pick the correct target class set by only looking at the relationship between accuracy and correspondence class, because the composition ratio has a great impact on the final accuracy.

class 0 1 2 3 4 5 6 7 8 9
target1 8 3 1 0 2 4 9 6 7 5
target2 4 2 3 5 7 1 8 0 6 9
target3 6 2 9 1 0 7 5 3 4 8
target4 3 9 6 1 5 8 4 7 0 2
Table 6: Encryption correspondence used in Mixup And Concat.

Model   Enc (Horizontal Concat)   Orig (Horizontal Concat)   Enc (Mixup And Concat)   Orig (Mixup And Concat)
DenseNet-100bc 94.62% 29.69% 94.45% 32.92%
PreResNet-110 94.49% 32.65% 94.03% 37.21%
VGG-19 94.30% 48.13% 93.06% 55.00%
Table 7: Classification accuracies of different models on the CIFAR-10 encrypted test set and original test set when using Horizontal Concat and Mixup And Concat. The first and second columns correspond to Horizontal Concat, the third and fourth columns correspond to Mixup And Concat.
Test set Test set acc
orig test set 33.29%
(8,4) 41.31%
(4,8) 41.28%
(8,6) 40.37%
(8,5) 40.76%
(8,3) 39.82%
(1,4) 33.06%
(2,4) 38.53%
(3,4) 36.50%
(9,8) 34.81%
(7,8) 39.83%
(4,2) 38.16%
(4,3) 36.45%
(7,0) 32.85%
(5,0) 32.90%
(9,2) 33.51%
(0,3) 32.60%
(6,0) 32.52%
Table 8: Classification accuracies obtained by adding the non-robust features of two target classes to class 0 images.

Test set Test set acc
mixup and concat(encrypted) 93.85%
orig test set 48.14%
target1 81.53%
target1+target2(horiz concat) 82.50%
Table 9: Classification accuracy obtained by guessing different numbers of targets.

Mixup And Concat

Mixup And Concat is a more complicated combination method, where each class of image corresponds to four target classes, as shown in Table 6. Similar to the previous method, for each image we first get four new target images. Then we mix up Image 1 and Image 2 to get Image 5, and mix up Image 3 and Image 4 to get Image 6. Finally, the left part of Image 5 is horizontally concatenated with the right part of Image 6.

Compared with Horizontal Concat, this approach is more secure. If the attacker does not know the encryption correspondence in advance, it is difficult for him to figure out the encryption method, as we will see in the next section. Table 7 shows that both Horizontal Concat and Mixup And Concat methods work well, in the sense that after encryption, the data can no longer be used for ML training.

4.3 Recovery of the Encryption Method

To show that it is difficult to recover the encryption method, we start with Horizontal Concat, where each source class corresponds to two other classes, and the encryption correspondence is defined in Table 4. As in Section 3.4, we assume that the attacker has the encrypted dataset and some labeled original data. To find out the encryption method, he first needs to decide the number of target classes for each source class, which is very difficult.

Assume he can set this number to 2 for all source classes, as we did in our encryption, and he needs to figure out the specific correspondence function. For example, as in Section 3.4, we may start with source class 0. It may correspond to one hundred combinations (e.g. (8,4), (8,6), etc.). Table 8 shows some combinations and their accuracies, assuming the encryption uses target classes (8,4) for class 0 and a 50%-50% composition ratio. After trying all combinations, the attacker will find the two with the highest accuracy ((8,4) and (4,8)), which reveal the correct correspondence. However, for classes 0-9, he has to repeat this process and try a total of 1000 times to find all the target classes. This overhead is quite large, and this is just the case when Horizontal Concat is used. If each class corresponds to more target classes, as in Mixup And Concat, he cannot find the correspondence efficiently. See Table 9: if Mixup And Concat is used, even if the attacker knows the exact composition ratio and correspondence function for two of the targets, he can only get 82.50% test accuracy, compared with 93.85% if he knows the exact encryption method. Moreover, 82.50% is fairly close to 81.53%, which is the accuracy one can get by knowing only one of the target classes used in Mixup And Concat.
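The enumeration cost discussed above can be tallied with a short sketch. It assumes 10 classes, a known and fixed composition ratio, and that a candidate for one source class is an ordered pair of target classes with repetition allowed, which gives the one hundred combinations per class mentioned in the text. These counting conventions are assumptions made for the illustration.

```python
from itertools import product

num_classes = 10

# Basic encryption: one candidate target per source class.
basic_per_class = num_classes
basic_total = basic_per_class * num_classes

# Horizontal Concat with two targets: ordered pairs (target1, target2).
pairs_per_class = len(list(product(range(num_classes), repeat=2)))
concat_total = pairs_per_class * num_classes

print(basic_per_class, basic_total)    # 10 100
print(pairs_per_class, concat_total)   # 100 1000
```

If the composition ratio is also unknown, each candidate pair has to be tried once per ratio under consideration, which multiplies these totals further.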

In addition, the various hyperparameters in the encryption process (the value of $\epsilon$, the step size, and the number of iterations) and the way multiple images are combined (Mixup, Concat, or CutMix) are also unknown to the attacker. All these factors show that it is hard for the attacker to even recover the encryption method.

4.4 Domain Adaptation

Although the encrypted data cannot be used for training, maybe it can still provide some other extra information about the data distribution because the encrypted data look similar to the original data. However, in this subsection, we demonstrate that the encrypted data are from a distribution different from the natural data distribution, by applying techniques from domain adaptation.

The rationale is the following. If the two datasets are close to each other, then training on the first one and testing on the second one will give us good accuracy. However, if the two datasets are far away from each other, the training on the first one and testing on the second one will give us bad accuracy. In the latter case, we may use domain adaptation methods to improve the test accuracy.

In Table 10, we use Generate To Adapt (GTA), described in Sankaranarayanan et al. (2018), to illustrate the idea. As a comparison, we test the effect of using GTA on the original training set and the new CIFAR-10 test set (as described in Recht et al. (2018), there exists a small distribution shift between the original CIFAR-10 dataset and the new test set). From Table 10, we observe that GTA improves the accuracy on the new test set from 83.80% to 89.28%. This suggests that GTA can handle general domain shift problems. But it can only help the attacker improve the accuracy on the original test set from 28.73% to 49.72%, which is still far too low to be useful. This shows that GTA has limited effect on our encrypted dataset. In other words, the encrypted data distribution and the original data distribution are far away from each other, and cracking our encryption method is much harder than solving a general domain shift problem.

Method   orig train set → new test set   enc train set → orig test set
source only 83.80% 28.73%
GTA 89.28% 49.72%
Table 10: Classification accuracies of GTA on original CIFAR-10 and encrypted CIFAR-10

5 Real World Experiments on Medical Data

Figure 4: Brain MRIs from the original test set (top row) and corresponding images from the encrypted test set (bottom row). For each original image, the tumor is marked out using red bounding box.

Magnetic Resonance Imaging (MRI) provides excellent soft tissue contrast of brain tumors without exposing the patient to radiation; consequently, it is widely used in the clinical diagnosis of brain tumors. We collaborate with a world-leading neurological center and use their preprocessed, isotropically interpolated brain MRIs as experiment data. For each MRI, we choose the cross-section with the largest tumor size and form a 2D image dataset. The dataset consists of 5314 MRIs (320×320 pixels each) in 5 classes, and the number of images in each class is not the same. 20% of the images from each class are selected as the test set, and the remaining images are used as the training set (as shown in Table 11).
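As a quick consistency check on the per-class 20% split, the sketch below recomputes the split sizes from the per-class totals implied by Table 11. The rounding rule (taking the integer part of 20% of each class) is an assumption; the text does not state how fractional counts were handled, but this rule reproduces the table.

```python
# Per-class totals: training count + test count from Table 11.
class_totals = {
    "Meningioma": 879 + 219,
    "Chordoma": 1149 + 287,
    "Schwannoma": 541 + 135,
    "Pituitary Adenoma": 1451 + 362,
    "Craniopharyngioma": 233 + 58,
}


def split_counts(total, test_percent=20):
    """Return (n_train, n_test), rounding the test share down (assumed rule)."""
    n_test = total * test_percent // 100
    return total - n_test, n_test


splits = {name: split_counts(total) for name, total in class_totals.items()}
n_train = sum(train for train, _ in splits.values())
n_test = sum(test for _, test in splits.values())

for name, (train, test) in splits.items():
    print(f"{name}: {train} train, {test} test")
print(n_train, n_test, n_train + n_test)  # 4253 1061 5314
```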

Class Number in training set Number in test set
Meningioma 879 219
Chordoma 1149 287
Schwannoma 541 135
Pituitary Adenoma 1451 362
Craniopharyngioma 233 58
Total 4253 1061
Table 11: Number of brain tumor MRIs of each class in the dataset.

Architecture Enc test set Orig test set
DenseNet-100bc 79.01% 38.84%
PreResNet-110 79.92% 33.14%
PreResNet-20 78.10% 43.64%
VGG-19 80.01% 48.84%
VGG-11 79.98% 50.15%
Table 12: Classification accuracies of different models on the brain tumor MRI encrypted test set and original test set.

[Figure 5 panels: Original MRIs; Encrypted MRIs; Diagnosis distribution; Error assay.]




Figure 5: Doctor’s evaluation process for original MRIs and encrypted MRIs

[Figure 6 labels: correct on encrypted, wrong on original; correct on original, wrong on encrypted.]



Figure 6: Results of doctors’ diagnoses on the original test set and the encrypted test set.

We use the Horizontal Concat encryption technique. The base classifier is ResNet-18 (since the dataset is not huge, we choose a smaller model to avoid overfitting), and its classification accuracy on the original test set is 79.92%. The correspondence we used is: . The adversarial perturbations are constrained in $\ell_2$-norm. The hyperparameter $\epsilon$, the step size, and the number of iterations are set to 2, 0.1, and 100, respectively. The experimental process is similar to that in Section 3.3. After constructing the encrypted dataset using the Horizontal Concat encryption method, we train different models on the encrypted training set and observe their performance on the encrypted test set and the original test set.

Results are shown in Table 12. We can see that all the models trained on the encrypted training set have extremely low accuracies on the original test set. We also sample some images from the original and encrypted brain image test sets and display them in Figure 4.

We invite 3 doctors from Beijing Tiantan Hospital, who are experts on brain MRIs, to evaluate the difference between the original and encrypted brain image test sets. The entire evaluation process is shown in Figure 5. First, images in the original test set and the encrypted test set are shuffled randomly. Then, the doctors examine each image in the original test set and make a diagnosis. Thereafter, the doctors examine each image in the encrypted test set and make a diagnosis.

In the end, diagnoses of each patient from the two test sets are compared, and the results are summarised in Figure 6. In cases, doctors make the same diagnoses for both original and encrypted images, with cases being correct diagnoses and being wrong diagnoses (this is similar to other brain MRI . In the remaining cases, doctors make different diagnoses, with being correct on the original image and wrong on the encrypted image, and being wrong on the original image and correct on the encrypted image. These results indicate that encryption does not affect the doctors’ diagnosis of brain tumors.
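The comparison described above amounts to a per-patient tally over the true label, the diagnosis on the original image, and the diagnosis on the encrypted image. The sketch below shows one way to compute such a tally; the function name, the category names, and the five sample cases are hypothetical and are not the study's data. With more than two classes, the two diagnoses can also differ while both being wrong, so the sketch keeps that as a separate bucket.

```python
from collections import Counter


def tally_diagnoses(truth, on_original, on_encrypted):
    """Count agreement categories between diagnoses on original and encrypted images."""
    counts = Counter()
    for t, o, e in zip(truth, on_original, on_encrypted):
        if o == e:
            key = "same_correct" if o == t else "same_wrong"
        elif o == t:
            key = "orig_correct_enc_wrong"
        elif e == t:
            key = "orig_wrong_enc_correct"
        else:
            key = "different_both_wrong"
        counts[key] += 1
    return counts


# Made-up example with five patients (M, C, S, P are placeholder class codes).
truth = ["M", "C", "S", "P", "M"]
on_original = ["M", "C", "P", "P", "C"]
on_encrypted = ["M", "S", "P", "S", "M"]

counts = tally_diagnoses(truth, on_original, on_encrypted)
print(dict(counts))
```

Roughly balanced counts in the two disagreement categories would be consistent with ordinary diagnostic variability rather than a systematic effect of the encryption.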

Overall, we think our encryption method works well on real medical data and achieves its goal: for humans the encrypted data and original data can both be used for diagnosis, but for machine learning models the encrypted data are useless in training.

6 Related Work

Researchers have been studying the data sharing problem for a long time. For example, multi-party computation (Yao, 1982, 1986; Goldreich et al., 1987; Chaum et al., 1988; Ben-Or et al., 1988; Bogetoft et al., 2009) considers the setting that multiple parties jointly compute a function, without the need of revealing each other’s private inputs. As another example, differential privacy (Dwork and Nissim, 2004; Blum et al., 2005; Dwork et al., 2006; Dwork, 2008; Abadi et al., 2016) considers the mechanism design problem for database privacy, where adding or removing any single element in the database will only slightly change the outcome for the query to the database.

Adversarial examples are an active research area in deep learning. Many attack strategies have been proposed to fool neural networks (Szegedy et al., 2014; Goodfellow et al., 2015; Madry et al., 2017; Dong et al., 2018; Moosavi-Dezfooli et al., 2016; Carlini and Wagner, 2017). On the other side, researchers have proposed defense mechanisms against such attacks in order to train robust networks (Gu and Rigazio, 2014; Madry et al., 2017; Zheng et al., 2016; Samangouei et al., 2018b; Schott et al., 2018; Cohen et al., 2019; Lee et al., 2019). There are also many papers proposing models to explain adversarial examples (Ilyas et al., 2019; Gilmer et al., 2018; Fawzi et al., 2018; Ford et al., 2019; Tanay and Griffin, 2016; Shafahi et al., 2019; Mahloujifar et al., 2018; Bubeck et al., 2018), among which Ilyas et al. (2019) propose that adversarial perturbations arise from well-generalizing, yet brittle, features (non-robust features).

7 Conclusion

In this paper, we present a new encryption method for the data-sharing problem, so that the encrypted data can be used for human-centered activities, but not for machine learning training purposes. Using the encrypted data, the data stealers cannot train a model that generalizes to original natural data. Our method is based on adversarial attack and can be divided into basic encryption method and combined encryption method. The basic encryption method solves our data sharing problem, and the combined encryption method further improves its security. We present a series of simulations on CIFAR-10 to validate both methods. We also apply our combined encryption to the real-world clinical data and find that our encryption does not affect the doctors’ diagnosis of brain tumors. Our method heavily relies on the hardness of adversarial examples. Hence, for future work, it would be interesting to understand the limitation of adversarial examples theoretically.


Acknowledgments

This work has been supported in part by the Zhongguancun Haihua Institute for Frontier Information Technology, the Institute for Guo Qiang, Tsinghua University under grant 2019GQG1002, and Beijing Academy of Artificial Intelligence.


References


  1. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §6.
  2. Synthesizing robust adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 284–293. Cited by: §1, §1.
  3. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In STOC, pp. 1–10. Cited by: §6.
  4. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Cited by: §1.
  5. Practical privacy: the SuLQ framework. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 128–138. Cited by: §6.
  6. Secure multiparty computation goes live. In International Conference on Financial Cryptography and Data Security, pp. 325–343. Cited by: §6.
  7. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. arXiv preprint arXiv:1904.00760. Cited by: §1.
  8. Adversarial patch. CoRR abs/1712.09665. Cited by: §1.
  9. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204. Cited by: §6.
  10. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. ISSN 2375-1207. Cited by: §6.
  11. Multiparty unconditionally secure protocols. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pp. 11–19. Cited by: §6.
  12. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918. Cited by: §6.
  13. Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §6.
  14. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp. 265–284. Cited by: §6.
  15. Privacy-preserving datamining on vertically partitioned databases. In Annual International Cryptology Conference, pp. 528–544. Cited by: §6.
  16. Differential privacy: a survey of results. In International conference on theory and applications of models of computation, pp. 1–19. Cited by: §6.
  17. A rotation and a translation suffice: fooling CNNs with simple transformations. CoRR abs/1712.02779. Cited by: §1.
  18. Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945. Cited by: §1.
  19. Adversarial vulnerability for any classifier. CoRR abs/1802.08686. Cited by: §6.
  20. Adversarial examples are a natural consequence of test error in noise. Cited by: §6.
  21. Adversarial spheres. CoRR abs/1801.02774. Cited by: §6.
  22. How to play any mental game, or a completeness theorem for protocols with honest majority. In STOC, pp. 218–229. Cited by: §6.
  23. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, Cited by: §6.
  24. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068. Cited by: §6.
  25. Identity mappings in deep residual networks. CoRR abs/1603.05027. Cited by: Table 1.
  26. Convolutional networks with dense connectivity. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: Table 1.
  27. Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems 32, Cited by: §1, §2.1, §6.
  28. A stratified approach to robustness for randomly smoothed classifiers. arXiv preprint arXiv:1906.04948. Cited by: §6.
  29. Adversarial music: real world audio adversary against wake-word detection system. In Advances in Neural Information Processing Systems, pp. 11908–11918. Cited by: §1.
  30. Adversarial camera stickers: a physical camera-based attack on deep learning systems. In International Conference on Machine Learning, pp. 3896–3904. Cited by: §1.
  31. CIFAR-zoo: PyTorch implementation of CNNs for CIFAR dataset. Cited by: §3.3.
  32. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §3.1, §6.
  33. The curse of concentration in robust learning: evasion and poisoning attacks from concentration of measure. CoRR abs/1809.03063. Cited by: §6.
  34. MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147. Cited by: §1.
  35. DeepFool: a simple and accurate method to fool deep neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §6.
  36. Do CIFAR-10 classifiers generalize to CIFAR-10?. CoRR abs/1806.00451. Cited by: §4.4.
  37. Defense-GAN: protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605. Cited by: §1.
  38. Defense-GAN: protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, Cited by: §6.
  39. Generate to adapt: aligning domains using generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §4.4.
  40. Defending against adversarial attacks by leveraging an entire GAN. arXiv preprint arXiv:1805.10652. Cited by: §1.
  41. Robust perception through analysis by synthesis. CoRR abs/1805.09190. Cited by: §6.
  42. Are adversarial examples inevitable?. In International Conference on Learning Representations, Cited by: §6.
  43. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556. Cited by: Table 1.
  44. PixelDefend: leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766. Cited by: §1.
  45. Improved mixed-example data augmentation. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1262–1270. ISSN 1550-5790. Cited by: §4.1.
  46. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, Cited by: §1, §6.
  47. A boundary tilting perspective on the phenomenon of adversarial examples. CoRR abs/1608.07690. Cited by: §6.
  48. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509. Cited by: §1.
  49. Protocols for secure computations. In 23rd annual symposium on foundations of computer science (sfcs 1982), pp. 160–164. Cited by: §6.
  50. How to generate and exchange secrets. In 27th Annual Symposium on Foundations of Computer Science (sfcs 1986), pp. 162–167. Cited by: §6.
  51. CutMix: regularization strategy to train strong classifiers with localizable features. CoRR abs/1905.04899. Cited by: §4.1.
  52. Mixup: beyond empirical risk minimization. CoRR abs/1710.09412. Cited by: §4.1, §4.1.
  53. Improving the robustness of deep neural networks via stability training. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §6.