Identifying Classes Susceptible to Adversarial Attacks
Despite numerous attempts to defend deep learning based image classifiers, they remain susceptible to adversarial attacks. This paper proposes a technique to identify susceptible classes, i.e., those classes that are more easily subverted. To identify the susceptible classes, we apply distance-based measures to a trained model. Based on the distances among the original classes, we create a mapping between original classes and adversarial classes that significantly reduces the randomness of a model in an adversarial setting. We analyze the high dimensional geometry among the feature classes and identify the most susceptible target classes in an adversarial attack. We conduct experiments using the MNIST, Fashion MNIST, and CIFAR-10 datasets, with ImageNet and ResNet-32 models for CIFAR-10. Finally, we evaluate our techniques to determine which distance-based measure works best and how the randomness of a model changes with perturbation.
Rangeet Pan Department of Computer Science Iowa State University firstname.lastname@example.org Md Johirul Islam Department of Computer Science Iowa State University email@example.com Shibbir Ahmed Department of Computer Science Iowa State University firstname.lastname@example.org Hridesh Rajan Department of Computer Science Iowa State University email@example.com
Preprint. Under review.
Protecting against adversarial attacks has become an important concern for machine learning (ML) models, since an adversary can cause a model to misclassify an input with high confidence by adding a small perturbation. A number of prior works [25, 5, 33, 22, 34, 30, 31, 11, 4, 28, 27] have tried to understand the characteristics of adversarial attacks. This work focuses on adversarial attacks on deep neural network (DNN) based image classifiers.
Our Contributions. Our work is driven by two fundamental questions. Can an adversary fool all classes equally well? If not, which classes are more susceptible to adversarial attacks than others? Identifying such classes can be important for developing better defense mechanisms. We introduce a technique for identifying the top susceptible classes. Our technique analyzes the DNN model to understand the high dimensional geometry in the feature space. We have used four different distance-based measures (t-SNE, N-D Euclidean, N-D Euclidean Cosine, and Nearest Neighbor Hopping distance) for understanding the feature space. To determine the top susceptible classes, we create an adversarial map, which takes the distances among classes in feature space as input and outputs a mapping of probable adversarial classes for each actual class. To create the adversarial map, we introduce the concept of the forbidden distance, i.e., a distance measured in high dimension which describes the capability of a model to defend against an adversarial attack.
We conduct experiments on the FGSM attack using the MNIST, Fashion MNIST, and CIFAR-10 datasets (with ImageNet and ResNet-32 models for CIFAR-10) to evaluate our technique. Finally, we compare our results with cross-entropy (CE) and reverse cross-entropy (RCE) based training techniques that defend against adversarial attacks. Our evaluation suggests that, in comparison to the previous state-of-the-art training-based techniques, our proposed approach performs better and does not require additional computational resources.
Next, we describe related work, then our methodology, followed by detailed results with the experimental setup, and finally our conclusions.
2 Related Work
The work on adversarial frameworks can be categorized into attack-related and defense-related studies.
Attack-related studies. Several studies have crafted attacks on ML models, e.g. FGSM , CW , JSMA , Graph-based attack , attack on stochastic bandit algorithms , black-box attacks [20, 24], etc.  studies transferability of attacks, and  surveys the kinds of attack using synthesizing robust adversarial examples for any classifier.
Defense techniques (our work fits here). These works are primarily focused on improving robustness [33, 22, 34], detecting adversarial samples [30, 34], image manipulation [31, 11], attack bounds [4, 28], distillation , geometric understanding , etc. There are other studies on the geometrical understanding of adversarial attack [25, 5, 29]. Papernot et al.’s  work is closely related to ours, where the authors built a capability-based adversarial saliency map between benign class and adversarial class to craft perturbation in the input. In contrast, we utilize distance-based measures to understand a DNN model and detect -susceptible target classes.  utilizes the decision boundary to understand the model and the authors have observed a relationship between the decision boundary and the Gaussian noise added to the input.  has conducted a similar study to understand how decision boundary learned by a model helps to understand high dimensional data and proposed the bound over the error of a model. Our approach finds the relation of high dimensional geometry with adversarial attacks and identifies susceptible classes.
3 Our Approach: Identifying Susceptible Classes
We use distances to create an adversarial map and use it to pick susceptible classes.
This study concentrates on the feed-forward DNN classifiers. A DNN can be represented as a function , where is the set of tunable parameters, is the input, is the number of labeled classes and is the number of features and . In this study, the feature space for a model has been represented using . The focus of the paper is to understand the high dimensional geometry to identify susceptible target classes for a model. We calculate distance between two classes and , where . We utilize four different distance-based measures and compare them. In adversarial setting, , where is the perturbation added to the input which causes, . Our assumption in this study is that depends on the of a model. To compare different distances, we calculate the randomness in a model using the entropy. We define the entropy of a model as where,
denotes the probability of input , which has been misclassified to class given the actual label . In this study, we use terms e.g., actual class and adversarial class, which represents the label of a data point predicted by a model and the label after a model has gone through an attack respectively. We introduce a term forbidden distance as , a measured distance which provides the upper bound of displacement of data points in . In this context, displacement represents the distance between the adversarial class and the actual class. In this study, we have conducted our experiment using the Fast Gradient Sign Method (FGSM) . Here, we have chosen single attack based on the Adversarial transferable property [25, 26], which defines that adversarial examples created for one model are highly probable to be misclassified by a different model.
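To make the attack used in this study concrete, the following is a minimal sketch of a single FGSM step. It is not the Cleverhans implementation used in the experiments: the logistic-regression model, the helper names `fgsm_linear` and `predict`, and the toy weights are all assumptions chosen so the example stays self-contained.

```python
import math


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, w, b):
    """Predicted label (0 or 1) of a toy linear classifier (hypothetical model)."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def fgsm_linear(x, y, w, b, eps):
    """One FGSM step against p = sigmoid(w.x + b) with true label y in {0, 1}.

    FGSM perturbs the input by eps times the sign of the gradient of the
    loss with respect to the input.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    # For cross-entropy loss, d(loss)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Toy usage: a point classified as 1 is pushed across the decision boundary.
w, b = [1.0, -2.0], 0.0
x = [0.5, 0.1]
x_adv = fgsm_linear(x, 1, w, b, eps=0.2)
print(predict(x, w, b), predict(x_adv, w, b))
```

In this toy setting a larger `eps` moves the point further from its original position, which is the displacement that the forbidden distance is meant to bound.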
3.2 Hypothesis ()
According to linearity hypothesis proposed in , there is still a significant amount of linearity present in a model even though a DNN model utilizes non-linear transformation. The primary reason behind this is the usage of LSTM , ReLu [15, 7], etc, which possess a significant amount of linear components to optimize the complexity. Here, we assume that the input examples can be misclassified to neighboring classes in the during adversarial attack.
3.3 Distance Calculation
Calculation of t-SNE distance
To understand representation, we utilize t-SNE  dimension reduction technique. t-SNE uses the Euclidean distance of data points in dimension as input and converts it into the probability of similarity, where, represents the probability of similarity between two input data points and in . We calculate the distance based on the . We convert -dimensional problem to a -dimensional problem. In this process, we do not consider the error due to the curse of dimensionality . In 2-D feature space, we have the co-ordinate for a data point and we calculate the center of mass , where is the class. Here, mass of each point , is assumed to be unit, then center of mass represents, , where, represents the data point of class .
Calculation of N-D Euclidean Distance
Furthermore, we calculate the N-dimensional Euclidean distance between classes. Each data point can be represented by a feature vector whose components are its coordinates in the feature space. We calculate the center of mass of each class as in the t-SNE based approach; the main difference is that the calculated center of mass is a vector of N coordinates. The distance between two classes is then the Euclidean distance between their centers of mass.
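The center-of-mass computation and the resulting class distance can be sketched as follows. This is an illustrative sketch only: the function names `centers_of_mass` and `euclidean` are hypothetical, every point is given unit mass as stated in the t-SNE subsection, and the same code applies unchanged to 2-D t-SNE coordinates.

```python
import math
from collections import defaultdict


def centers_of_mass(features, labels):
    """Mean feature vector per class, assuming unit mass for every point."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for i, v in enumerate(vec):
            sums[lab][i] += v
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy usage with two classes in a 2-D feature space.
features = [[0, 0], [2, 0], [3, 4], [5, 4]]
labels = [0, 0, 1, 1]
cm = centers_of_mass(features, labels)
print(euclidean(cm[0], cm[1]))
```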
Calculation of N-D Euclidean Cosine Distance
We use the N-dimensional angular distance as our next measure. In this process, we calculate the N-dimensional Euclidean distance as in the prior technique. In N dimensions, the angular similarity between the centers of mass of two classes is the cosine of the angle between them, i.e., their dot product divided by the product of their magnitudes. We leverage this angular similarity to calculate the N-dimensional angular distance between the centers of mass of two classes.
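A small sketch of an angular distance between two centers of mass is given below. The exact normalization used in the paper is not recoverable from the text, so this example assumes the common form arccos(similarity) / π, which maps the distance into [0, 1]; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def angular_distance(a, b):
    """Assumed form: arccos(cosine similarity) / pi, in the range [0, 1]."""
    # Clamp to guard against rounding slightly outside [-1, 1].
    sim = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(sim) / math.pi


# Orthogonal centers of mass are half of the maximum angular distance apart.
print(angular_distance([1.0, 0.0], [0.0, 1.0]))
```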
Calculation of Nearest Neighbor Hopping Distance
Here, we use the nearest neighbor algorithm to understand the behavior of an adversarial attack. For distance calculation, we develop an algorithm which computes the hopping distance between two classes. Initially, we calculate nearest neighbors for each data point. Due to the reflexive property, a data point is inevitably a neighbor to itself, which approves different neighbors for a data point. The boundary learned by the nearest neighbor algorithm distinguishes classes by dividing them into clusters. Then, nearest neighbors will belong to the same class for most of the data points, except the data points located near the boundary. We leverage that information and compute the classes nearest to a particular class.
Let us assume the points in class , , . For example, we find the closest point of outside is , of is , of is , is . As depicted in Figure 1(b), we observe that of the data points in have their closest neighbors in . So, we can say that shares more boundary with and is the closest neighboring class to .
Hopping Distance computes how many hops a data point needs to travel to reach the closest data point in the target class. From the nearest neighbor algorithm, we get the unique neighbors to each data point. Algorithm 1 takes the output from the nearest neighbor, the actual predicted data point and the misclassified label as input. This problem has been converted to a problem of tree generation in lines 2 - 4. In lines 5 - 21, we expand the tree when a new neighbor has been found and traverse it using BFS. Finally, we calculate the depth of the expanded tree to obtain the minimum distance that a data point has to travel to reach the misclassified class in . This algorithm has the same time and space complexity as BFS, which is for time and for space, as in the worst case we need to traverse all the neighbors () for an actual class. We also calculate the forbidden distance based on the average hopping distance () for a model. In Eq. 4, and denote the total number of classes and data points respectively. We use Eq. 4 both for calculating the forbidden distance () and for the average displacement of data points in under an attack. For calculating the latter, is the actual class and is the adversarial class. In order to create the adversarial map, we compute a matrix storing the distance among all classes () using Eq. 5, where is the total number of data points in class .
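The hopping distance described above can be sketched with a plain breadth-first search. This is not a transcription of Algorithm 1, whose listing is not reproduced here: the neighbor graph is assumed to be precomputed as a dictionary from each point to its nearest neighbors, and the names `hopping_distance` and `average_hopping_distance` are hypothetical.

```python
from collections import deque


def hopping_distance(neighbors, labels, start, target_class):
    """Minimum number of neighbor hops from `start` to any point labeled
    `target_class`; returns None if no such point is reachable."""
    if labels[start] == target_class:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in neighbors[node]:
            if nxt in seen:
                continue
            if labels[nxt] == target_class:
                return depth + 1
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return None


def average_hopping_distance(neighbors, labels, source_class, target_class):
    """Mean hopping distance over all points of `source_class` that can
    reach `target_class`; returns None if none can."""
    hops = [
        hopping_distance(neighbors, labels, p, target_class)
        for p in labels
        if labels[p] == source_class
    ]
    hops = [h for h in hops if h is not None]
    return sum(hops) / len(hops) if hops else None


# Toy chain 0 - 1 - 2 - 3, where only point 3 belongs to class "b".
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "a", 3: "b"}
print(hopping_distance(neighbors, labels, 0, "b"))
print(average_hopping_distance(neighbors, labels, "a", "b"))
```

Averaging the per-point hops, as in the second function, is one plausible reading of the average hopping distance; the exact normalization of Eq. 4 is an assumption here.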
if average hopping distance , then is closer to than to i.e., the distance of center of mass .
. Without the loss of generality, we can say that, . As center of mass will always be within the polygon surrounding a class, without loss of generality we assume all the are at the same location and all the are at the same location .
Assuming the balanced dataset, , So, the center of mass of is closer to than the center of mass of to . ∎
In , if a class has been misclassified to a closer class , the entropy will decrease.
The entropy . With the increase of , also increases. So, we can say that . From (§3.2), we assume that if a class is close to class , we allocate a higher probability to . So, if classes are mostly misclassified to the closer one, the entropy of the entire model will decrease. ∎
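The effect claimed above can be illustrated numerically. The sketch below assumes Shannon entropy with base-2 logarithms and assumes that the distance-based weighting is proportional to the inverse of the distance; neither detail is confirmed by the text, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (base 2) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uniform_probs(n_classes):
    """No prior information: each of the other n - 1 classes is equally likely."""
    return [1.0 / (n_classes - 1)] * (n_classes - 1)


def distance_weighted_probs(distances):
    """Assumed weighting: probability proportional to 1 / distance."""
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return [w / total for w in inv]


# With 10 classes and no prior knowledge the entropy is log2(9).
print(entropy(uniform_probs(10)))
# Favoring closer classes lowers the entropy relative to the uniform case.
print(entropy(distance_weighted_probs([1.0, 2.0, 4.0])), entropy([1 / 3] * 3))
```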
3.4 Adversarial Map
Forbidden distance (): When a model encounters an adversarial attack, each input class requires to travel a certain distance in to accomplish the attack. Based on the attack type, maximum distance changes. We call this forbidden distance () as beyond this distance adversarial attack will not be successful. For example, to accomplish the adversarial attack given a forbidden distance of a model and to misclassify as , distance constraint between and is .
In this section, we describe how we create the adversarial map annotated with distance to neighbors. Here, we utilize the forbidden distance while creating the adversarial map. Hypothetically, any class as shown in Figure 1(a) can be misclassified to any other class by traveling the same distance . But our hypothesis is that every attack has a limitation. A data point in might need to travel different distances for misclassifying to different classes , where . If we represent the distance between and as then the attack can be accomplished more easily where is minimum. In the above equation if is minimum then the attack can misclassify as represented by the function . We create this adversarial map by using the distance between classes and as described in §3.3. Then we introduce the notion of forbidden distance . We claim that the attack on a certain class can misclassify as class if and only if as mentioned in the following equation:
Now, we create the adversarial map from the distance between different classes as depicted in Algorithm 2, which takes the distance between different classes () as input. Then, different edges are added to the graph mentioned in lines 3 - 7. Finally, the adversarial map is returned in line 7. This algorithm runs in time and space complexity.
The attack can misclassify a class only to one of its neighbors in adversarial map.
Let us assume an attack has a prior knowledge of a model, training example of class and can misclassify to which is not a neighbor. We know an attack can only misclassify as if and only if . According to Algorithm 2 if then, is the neighbor of . This leads to a contradiction. So the attack can only misclassify as one of its neighboring classes. ∎
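The construction of the adversarial map can be sketched as a thresholding of the class distance matrix. This is an illustrative reading of Algorithm 2 rather than its listing: the name `build_adversarial_map` is hypothetical, and an edge is kept when the distance does not exceed the forbidden distance, consistent with the statement that a class farther away than the forbidden distance cannot be reached.

```python
def build_adversarial_map(dist, forbidden_distance):
    """Map each class to the classes it can be misclassified as.

    `dist` is a square matrix of class-to-class distances. A class j is a
    neighbor of class i when dist[i][j] <= forbidden_distance. Neighbors
    are returned as (class, distance) pairs sorted by increasing distance.
    """
    n = len(dist)
    adv_map = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j and dist[i][j] <= forbidden_distance:
                adv_map[i].append((j, dist[i][j]))
        adv_map[i].sort(key=lambda edge: edge[1])
    return adv_map


# Toy distance matrix for three classes and a forbidden distance of 2.
dist = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
print(build_adversarial_map(dist, forbidden_distance=2))
```

In the toy matrix, class 0 cannot be misclassified as class 2 because their distance of 3 exceeds the forbidden distance, which mirrors the argument of the proof above.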
3.5 Susceptible Class Identification
Here, we use the best among four distance-based measures and identify susceptible target classes for a model. In an adversarial setting, we find the target classes which are most likely being misclassified from the actual class. Our primary hypothesis () claims that any class will be misclassified to the nearest class under an attack. In order to identify susceptible classes, we use our mapping between the actual class and adversarial class mentioned in §3.4. For a particular class , we assign weighted probability to all misclassified classes , where based on the distance computed using the best distance-based measure. Higher the distance between two classes, lower the probability of one class being misclassified as another. We perform a cumulative operation on individual probability of being misclassified given the actual input label for . The top classes with highest probability will be identified as the susceptible classes under an adversarial attack.
Cumulative of the individual probability of adversarial classes given the actual classes determines the most susceptible classes of a model.
For a DNN model , the dataset is categorized into classes. For each class , there is a list of at most classes which can be close to . For each class, we determine them based on the hypothesis . The probability of a class being misclassified as can be determined based on the adversarial map. The smaller the distance between and , the higher the probability of being misclassified as . So, . Here, denotes the probability of being misclassified as . As are all independent events, the total probability of an adversarial class is and as if class has been misclassified as , we do not consider that as an adversarial effect. Hence, . Without loss of generality, . So, the probability of an adversarial class is the cumulative of the individual probabilities of that class given all the actual classes. ∎
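The cumulative scoring described above can be sketched as follows. The weighting is an assumption: each actual class spreads a total probability of 1 over its neighbors in proportion to the inverse of their distance, and the scores of each adversarial class are then summed over all actual classes. The function name `susceptible_classes` and the toy map are hypothetical.

```python
def susceptible_classes(adv_map, k):
    """Return the k classes with the highest cumulative misclassification
    score, together with the score of every class.

    `adv_map` maps each actual class to a list of (adversarial class,
    distance) pairs, for example the output of an adversarial map builder.
    """
    scores = {}
    for src, nbrs in adv_map.items():
        # Assumed weighting: probability proportional to 1 / distance.
        inv = [(tgt, 1.0 / d) for tgt, d in nbrs if d > 0]
        total = sum(w for _, w in inv)
        for tgt, w in inv:
            scores[tgt] = scores.get(tgt, 0.0) + w / total
    # Sort by descending score, breaking ties by class id.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k], scores


# Toy adversarial map over three classes.
adv_map = {0: [(1, 1)], 1: [(0, 1), (2, 2)], 2: [(1, 2)]}
top, scores = susceptible_classes(adv_map, k=2)
print(top, scores)
```

Here class 1 accumulates the highest score because it is the closest neighbor of both other classes.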
4.1 Experimental Setup
In this study, we have used the MNIST, Fashion MNIST (F-MNIST), and CIFAR-10 datasets. The number of labeled classes is 10 for each dataset. MNIST and F-MNIST each contain 60,000 training images and 10,000 test images. Both the training and test sets are equally partitioned into 10 classes, so each class has 6,000 training and 1,000 test images. CIFAR-10 contains 50,000 training images and 10,000 test images. We have worked on one model each for MNIST and F-MNIST, with accuracy 98% and 89% respectively, whereas for CIFAR-10 we have performed our experiments on two models, Simplified ImageNet and ResNet-32, with accuracy 72% and 82% respectively. For crafting the FGSM attack, we have utilized the Cleverhans library. We have experimented using four distance-based measures on each dataset with the label of each data point predicted by the model. We have run our susceptible class detection on the entire dataset with the FGSM attack for each model and determined -susceptible target classes. For all experiments with variable perturbation, we change the for each simulation and run the simulation from to . Hence, 20 simulations have been executed for each experiment.
4.2 Usability of adversarial map for susceptible class detection
We have claimed that using the adversarial map we can identify the susceptible classes. We will discuss the accuracy of the best distance-based measure in §4.4. We have evaluated our approach on four separate models. In §4.1, we have briefly described each model. For MNIST and F-MNIST, we have utilized a simple model with one input, one dense and one output layer. We have used state-of-the-art models for CIFAR-10 to evaluate our techniques. To calculate the distance, we have implemented four different measures and compared among them by computing the entropy of the model. Initially, we have calculated the entropy of a model, by applying an adversarial attack with a fixed perturbation. In the equation, , we assume that without any prior information, the probability of an input misclassified as an adversarial class given the actual class is . For a fixed model, the value of is constant for all data points based on the previous assumption. However, we have leveraged adversarial map to provide weighted probability to each adversarial class based on the calculated distance between them. We have calculated the entropy based on the weighted probability using calculated distance from the actual class e.g., for actual class has neighbors and . Here, is closer to . In this scenario, . Our goal is to reduce the entropy in a model with the distance-based measures. In Figure 3, we evaluate the change of randomness by computing the entropy for all four models with four distance-based measures and compare them. In all the cases, Nearest Neighbor Hopping distance based measure performs best in decreasing the entropy of a model under an adversarial setting. We have found that with increasing perturbation, the randomness typically increases. In contrast, the entropy in all the cases becomes more or less constant after a certain amount of perturbation. 
This indicates that almost all images are misclassified beyond a certain perturbation, so the entropy no longer changes with the perturbation. Surprisingly, we have found that for the CIFAR-10 ImageNet and ResNet-32 models, the entropy decreases with increasing perturbation. In Figure 3(c) and (d), the entropy initially increases with increasing perturbation but starts to decrease after a certain simulation. We found that data points that have been misclassified with lower perturbation were classified correctly with higher perturbation. In Figure 5, the image is initially classified correctly as frog by the ImageNet model. With perturbation , the image has been misclassified as deer, whereas with perturbation , the image has been classified correctly.
4.3 Effect of forbidden distance on adversarial attack
In this section, we show the forbidden distance for different attacks and models. Moreover, we evaluate the effectiveness of in misclassification. We have claimed that the attack cannot travel more than under a particular adversarial setting. We support our claim by computing on actual training data and demonstrating that the average hopping distance () traveled under an attack remains less than . After calculating using Eq. 4, we have simulated with increasing perturbation for each model. We have assumed that with increasing perturbation, the force of the attack increases, and so does the average distance (displacement) traveled by data points in . This is analogous to Hooke's law in physics, under which displacement is proportional to the applied force. We have also evaluated the forbidden distance for each model and checked whether the assumption regarding and applies. In Figure 2, we have simulated four cases and found that increases with perturbation and remains under the bound given by . Hence, provides an upper-bound distance for a model. In our approach, we define it as the capacity of a particular model to withstand an attack.
Using our approach, if the hopping distance between the classes and is more than , then cannot be misclassified to . We have compared the model’s with after a model has undergone an attack. We have found that prior knowledge of a model provides a good estimation for describing the behavior of the adversarial examples.
4.4 Effect of adversarial map and susceptible class identifier
In this section, we take advantage of our adversarial map and susceptible class identifier to analyze the threats to a DNN model. We have utilized the adversarial map and computed the top susceptible target classes as described in the . In Figure 4, we have simulated our approach varying perturbation and . It is apparent that with a larger value of , the accuracy of predicting susceptible target classes will increase. However, we want to maximize the accuracy of our approach with the smallest value of , so there is a trade-off between and accuracy. From Figure 4, we have found that our approach performs best with for all models used in the evaluation. We have compared our work in Table 1 with reverse cross-entropy (RCE) and common cross-entropy (CE). Our approach can identify the susceptible target class with higher accuracy using CIFAR-10 with ResNet-32 and ImageNet respectively. In contrast, the accuracy for the DNN model using MNIST is lower than in the previous work. To understand the reason, we have examined the adversarial classes for models using MNIST. We have found that the model using MNIST has a different adversarial map for each actual class, and almost all adversarial classes are susceptible to attack. To check further, we have visualized the MNIST-based model using t-SNE in 2-D space and have observed that the visualization shows a distinct separation among classes. In contrast, the 2-D visualization of the CIFAR-10 based DNN model depicts some overlaps among the features, and our distance-based approach has discovered a certain pattern in the adversarial map. Thus, we can conclude that our approach works better for models with high complexity, e.g., CIFAR-10 based DNN models.
In this paper, we have presented a technique to detect susceptible classes using the prior information of a model. First, we analyze a DNN model to compute the distance among classes in feature space. Then, we utilize that information to identify classes that are susceptible to attack. We found that with , our approach performs best. To compare the four distance-based measures, we have presented a technique to create an adversarial map to identify susceptible classes. We have evaluated the utility of four different measures in creating the adversarial map. We have also introduced the idea of forbidden distance in the construction of the adversarial map. We have experimentally shown that the adversary cannot misclassify to a target beyond distance . We have found that Nearest Neighbor hopping is able to describe the adversarial behavior by decreasing the entropy of a model and computing the upper bound distance () accurately. Our approach is also able to detect susceptible target classes that can detect adversarial examples with high accuracy for the CIFAR-10 dataset (ImageNet and ResNet-32). In addition, for MNIST and F-MNIST, our approach possesses an accuracy of and respectively. Currently, our susceptible class detection identifies the source class of an adversarial example with probability. In the future, we want to find and study more properties of adversarial attacks to detect adversarial examples with a lower bound guarantee. Analyzing model analysis techniques and algorithms to achieve this goal remains future work.
[1] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.
[2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
[3] Gamaleldin Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alexey Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial examples that fool both computer vision and time-limited humans. In Advances in Neural Information Processing Systems, pages 3910–3920, 2018.
[4] Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. In Advances in Neural Information Processing Systems, pages 1178–1187, 2018.
[5] Jean-Yves Franceschi, Alhussein Fawzi, and Omar Fawzi. Robustness of classifiers to uniform lp and gaussian noise. arXiv preprint arXiv:1802.07971, 2018.
[6] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. The relationship between high-dimensional geometry and adversarial examples. arXiv preprint arXiv:1801.02774v3, 2018.
[7] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315–323, 2011.
[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
[9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[10] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[11] Gaurav Goswami, Nalini Ratha, Akshay Agarwal, Richa Singh, and Mayank Vatsa. Unravelling robustness of deep learning based face recognition against adversarial attacks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
[14] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613. ACM, 1998.
[15] Kevin Jarrett, Koray Kavukcuoglu, Yann LeCun, et al. What is the best multi-stage architecture for object recognition? In 2009 IEEE 12th international conference on computer vision, pages 2146–2153. IEEE, 2009.
[16] Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Jerry Zhu. Adversarial attacks on stochastic bandits. In Advances in Neural Information Processing Systems, pages 3640–3649, 2018.
[17] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
[19] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[20] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
[21] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of machine learning research, 9(Nov):2579–2605, 2008.
[22] Tianyu Pang, Chao Du, Yinpeng Dong, and Jun Zhu. Towards robust detection of adversarial examples. In Advances in Neural Information Processing Systems, pages 4579–4589, 2018.
[23] Nicolas Papernot, Ian Goodfellow, Ryan Sheatsley, Reuben Feinman, and Patrick McDaniel. cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 10, 2016.
[24] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pages 506–519. ACM, 2017.
[25] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
[26] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
[27] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
[28] Jonathan Peck, Joris Roels, Bart Goossens, and Yvan Saeys. Lower bounds on the robustness to adversarial perturbations. In Advances in Neural Information Processing Systems, pages 804–813, 2017.
[29] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pages 5014–5026, 2018.
[30] Guanhong Tao, Shiqing Ma, Yingqi Liu, and Xiangyu Zhang. Attacks meet interpretability: Attribute-steered detection of adversarial samples. In Advances in Neural Information Processing Systems, pages 7717–7728, 2018.
[31] Shixin Tian, Guolei Yang, and Ying Cai. Detecting adversarial examples through image transformation. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[32] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
[33] Ziang Yan, Yiwen Guo, and Changshui Zhang. Deep defense: Training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, pages 419–428, 2018.
[34] Zhihao Zheng and Pengyu Hong. Robust detection of adversarial attacks by modeling the intrinsic properties of deep neural networks. In Advances in Neural Information Processing Systems, pages 7913–7922, 2018.
[35] Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2847–2856. ACM, 2018.