Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization
Abstract
We improve the robustness of deep neural nets to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation function remarkably improves both the classification accuracy and the stability to adversarial perturbations. Together with total variation minimization of adversarial images and augmented training, under the strongest attack, we achieve up to 20.6%, 50.7%, and 68.7% accuracy improvement w.r.t. the fast gradient sign method, iterative fast gradient sign method, and Carlini-Wagner attacks, respectively. Our defense strategy is additive to many of the existing methods. We give an intuitive explanation of our defense strategy via analyzing the geometry of the feature space. For reproducibility, the code is made available at: https://github.com/BaoWangMath/DNNDataDependentActivation.
1 Introduction
The adversarial vulnerability [27] of deep neural nets (DNNs) threatens their applicability in security-critical tasks, e.g., autonomous cars [1], robotics [9], and DNN-based malware detection systems [21, 8]. Since the pioneering work by Szegedy et al. [27], many advanced adversarial attack schemes have been devised to generate imperceptible perturbations that suffice to fool DNNs [7, 20, 6, 30, 12, 3]. Adversarial attacks succeed not only in the white-box setting, i.e., when the adversary has access to the DNN parameters, but also in the black-box setting, i.e., when it has no access to the parameters. Black-box attacks succeed because an image perturbed to be misclassified by one DNN has a significant chance of being misclassified by another DNN; this is known as the transferability of adversarial examples [23]. Due to this transferability, it is very easy to attack neural nets in a black-box fashion [15, 5]. In fact, there exist universal perturbations that can imperceptibly perturb any image and cause misclassification for any given network [17]. There is much recent research on designing advanced adversarial attacks and on defending against adversarial perturbations.
In this work, we propose to defend against adversarial attacks by changing the DNNs' output activation function to a manifold-interpolating function, in order to seamlessly utilize the training data's information when performing inference. Together with total variation minimization (TVM) and augmented training, we show state-of-the-art defense results on the CIFAR-10 benchmark. Moreover, we show that adversarial images generated by attacking DNNs with an interpolating function are more transferable to other DNNs than those generated by attacking standard DNNs.
2 Related Work
Defensive distillation was recently proposed to increase the stability of DNNs, dramatically reducing the success rate of adversarial attacks [22], and a related approach [28] cleverly modifies the training data to increase robustness against black-box attacks, and adversarial attacks in general. To counter adversarial perturbations, [10] proposed input transformations, e.g., bit-depth reduction, JPEG compression, TVM, and image quilting. A similar idea of denoising the input was later explored by [18], who divide the input into patches, denoise each patch, and then reconstruct the image. These input transformations are intended to be non-differentiable, making adversarial attacks, especially gradient-based attacks, more difficult. Song et al. [26] noticed that small adversarial perturbations shift the distribution of adversarial images far from the distribution of clean images; they therefore proposed to purify adversarial images by PixelDefend. Adversarial training is another family of defense methods that improves the stability of DNNs [7, 16, 19]. GANs have also been employed for adversarial defense [25]. In [2], the authors proposed a straight-through estimation of the gradient to attack defense methods that are based on obfuscated gradients. Meanwhile, many advanced attack methods have been proposed to attack DNNs [30, 12].
Instead of using the softmax function as the DNNs' output activation, Wang et al. [29] utilized a class of non-parametric interpolating functions. This is a combination of deep and manifold learning that enables DNNs to sufficiently utilize the geometric information of the training data. The authors show a significant improvement in generalization accuracy, and the results are more stable when only a limited amount of training data is available.
3 Deep Neural Nets with Data-Dependent Activation Function
In this section, we summarize the architecture, training, and testing procedures of DNNs with the data-dependent activation [29]. An overview of training and testing of the standard DNN with softmax output activation is shown in Fig. 1 (a) and (b), respectively. In the $k$-th iteration of training, given a mini-batch of training data $(X, Y)$, the procedure is:
Forward propagation: Transform $X$ into deep features by the DNN block (an ensemble of convolutional layers, nonlinearities, and others) parametrized by $\Theta$, and then pass them through the softmax activation to get the predictions $\hat{Y}$: $\hat{Y} = \mathrm{softmax}\big(W \cdot \mathrm{DNN}(X, \Theta)\big)$, where $W$ is the weight of the linear classifier.
Then the loss (e.g., cross entropy) between the prediction $\hat{Y}$ and the label $Y$ is computed: $\mathcal{L} = \ell(\hat{Y}, Y)$.
Backpropagation: Update the weights $(\Theta, W)$ by gradient descent with learning rate $\gamma$: $W \leftarrow W - \gamma \frac{\partial \mathcal{L}}{\partial W}$, $\Theta \leftarrow \Theta - \gamma \frac{\partial \mathcal{L}}{\partial \Theta}$.
Once the model is optimized, the predicted label for a test point $x$ is: $\hat{y} = \arg\max\, \mathrm{softmax}\big(W \cdot \mathrm{DNN}(x, \Theta)\big)$.
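This standard pipeline can be sketched in a few lines of numpy. Here the DNN block is replaced by fixed toy features `X_feat` (a stand-in of ours, not part of the paper's setup), so only the linear classifier `W`, the softmax, and the gradient-descent update are shown:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerically stabilized
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(X_feat, Y_onehot, W, lr=0.5):
    """One forward/backward iteration of the pipeline above, with the DNN
    block replaced by fixed features X_feat: softmax predictions,
    cross-entropy loss, and a gradient-descent update of W."""
    probs = softmax(X_feat @ W)                    # predictions
    loss = -np.mean(np.sum(Y_onehot * np.log(probs + 1e-12), axis=1))
    grad_W = X_feat.T @ (probs - Y_onehot) / len(X_feat)   # dL/dW
    return W - lr * grad_W, loss
```

Iterating `train_step` on separable toy features drives the loss down and yields correct predictions; in the actual networks the gradient also flows into the DNN block parameters $\Theta$.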
[29] proposed to replace the dataagnostic softmax activation by a datadependent interpolating function, defined in the next section.
[Figure 1: (a) training and (b) testing of the standard DNN with softmax activation; (c) training and (d) testing of the DNN with WNLL activation.]
3.1 Manifold Interpolation: A Harmonic Extension Approach
Let $X = \{x_1, x_2, \dots, x_n\}$ be a set of points in a high-dimensional manifold and $S \subset X$ be a subset which is labeled with label function $g(x)$. We want to interpolate a function $u$ that is defined on the entire manifold and can be used to label the entire dataset $X$. The harmonic extension is a natural and elegant approach to find such an interpolating function, which is defined by minimizing the Dirichlet energy functional:
(1)   $\mathcal{E}(u) = \frac{1}{2} \sum_{x, y \in X} w(x, y)\, \big(u(x) - u(y)\big)^2$
with the boundary condition: $u(x) = g(x)$ for $x \in S$,
where $w(x, y)$ is a weight function, typically chosen to be Gaussian: $w(x, y) = \exp\big(-\|x - y\|^2 / \sigma^2\big)$ with $\sigma$ being a scaling parameter. The Euler-Lagrange equation for Eq. (1) is:
(2)   $\sum_{y \in X} \big(w(x, y) + w(y, x)\big)\,\big(u(x) - u(y)\big) = 0, \quad x \in X \setminus S; \qquad u(x) = g(x), \quad x \in S.$
By solving the linear system (Eq. (2)), we obtain labels $u(x)$ for the unlabeled data $x \in X \setminus S$. This interpolation becomes invalid when the labeled set is tiny, i.e., $|S| \ll |X|$. To resolve this issue, the weights of the labeled data are increased in the Euler-Lagrange equation, which gives:
(3)   $\sum_{y \in X} \big(w(x, y) + w(y, x)\big)\,\big(u(x) - u(y)\big) + \Big(\frac{|X|}{|S|} - 1\Big) \sum_{y \in S} w(y, x)\,\big(u(x) - u(y)\big) = 0, \quad x \in X \setminus S; \qquad u(x) = g(x), \quad x \in S.$
The solution to Eq. (3) is named the weighted nonlocal Laplacian (WNLL), denoted as $\mathrm{WNLL}(X, S, g)$. For classification tasks, $g(x)$ is the one-hot label of the example $x$.
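For concreteness, the linear system of Eq. (3) can be assembled and solved directly on a toy point cloud. The numpy sketch below (function name and toy data are ours, not from the released code) up-weights the labeled template exactly as in Eq. (3):

```python
import numpy as np

def wnll_interpolate(feats, labeled_idx, g, sigma=1.0):
    """Solve the WNLL equation (3) for the unlabeled points.
    feats: (n, d) array of points X; labeled_idx: indices of S;
    g: (|S|, C) one-hot labels; returns (n - |S|, C) soft labels."""
    n = len(feats)
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / sigma ** 2)                  # Gaussian weights w(x, y)
    sym = w + w.T                                 # w(x, y) + w(y, x)
    mu = n / len(labeled_idx) - 1.0               # up-weighting |X|/|S| - 1
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    A = np.zeros((len(unlabeled), len(unlabeled)))
    b = np.zeros((len(unlabeled), g.shape[1]))
    for k, i in enumerate(unlabeled):
        c = sym[i].copy()
        c[labeled_idx] += mu * w[labeled_idx, i]  # extra weight on the set S
        c[i] = 0.0                                # the y = i term vanishes
        A[k, k] = c.sum()
        for m, j in enumerate(unlabeled):
            if j != i:
                A[k, m] = -c[j]
        b[k] = c[labeled_idx] @ g                 # known boundary values g(x)
    return np.linalg.solve(A, b)
```

On two well-separated clusters with a single labeled point each, the interpolated soft labels assign every unlabeled point to its own cluster; the system is strictly diagonally dominant, so the solve is well posed.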
3.2 Training and Testing the DNNs with DataDependent Activation Function
In both training and testing of the WNLL-activated DNN, we need to reserve a small portion of data/label pairs, denoted as $(X^{te}, Y^{te})$, to interpolate the labels of new data. We name this reserved data the template. Directly replacing softmax by WNLL causes difficulties in backpropagation: the true gradient is difficult to compute since WNLL defines a very complex implicit function. Instead, to train WNLL-activated DNNs, a proxy via an auxiliary neural net (Fig. 1 (c)) is employed. On top of the original DNN, we add a buffer block (a fully connected layer followed by a ReLU), followed by two parallel branches: the WNLL and the linear (fully connected) layers. The auxiliary DNN is trained by alternating between training the DNN with the linear and the WNLL activation, respectively. The training loss of the WNLL activation is back-propagated via a straight-through estimation approach [2, 4]. At test time, we remove the linear classifier from the net and use the DNN and buffer blocks together with WNLL to predict new data (Fig. 1 (d)); here, for simplicity, we merge the buffer block into the DNN block. For a given set of testing data $X$ and the labeled template $(X^{te}, Y^{te})$, the predicted labels are given by $\hat{Y} = \mathrm{WNLL}\big(\mathrm{DNN}(\{X, X^{te}\}, \Theta),\, Y^{te}\big)$.
4 Adversarial Attacks
We consider three benchmark attack methods in this work, namely, the fast gradient sign method (FGSM) [7], iterative FGSM (IFGSM) [14], and the Carlini-Wagner (CWL2) [6] attacks. We denote the classifier defined by the DNN with softmax activation as $\hat{y} = f(\theta, x)$ for a given instance $(x, y)$. FGSM finds the adversarial image $x'$ by maximizing the loss $L(x', y)$, subject to the $\ell_\infty$ perturbation constraint $\|x' - x\|_\infty \le \epsilon$, with $\epsilon$ the attack strength. Under the first-order approximation, i.e., $L(x', y) \approx L(x, y) + \nabla_x L(x, y) \cdot (x' - x)$, the optimal perturbation is given by
(4)   $x' = x + \epsilon\, \mathrm{sign}\big(\nabla_x L(x, y)\big).$
IFGSM iterates FGSM to generate enhanced adversarial images, i.e.,
(5)   $x^{(m)} = \mathrm{Clip}_{x, \epsilon}\Big\{ x^{(m-1)} + \frac{\epsilon}{M}\, \mathrm{sign}\big(\nabla_x L(x^{(m-1)}, y)\big) \Big\}, \quad m = 1, 2, \dots, M,$
where $x^{(0)} = x$ and $x' = x^{(M)}$, with $M$ being the number of iterations.
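Eqs. (4) and (5) can be sketched directly in numpy, with the model-specific loss gradient passed in as a callable (a hypothetical `grad_loss`; real attacks obtain it by backpropagation through the network):

```python
import numpy as np

def fgsm(x, grad_loss, eps):
    """Eq. (4): one signed-gradient step of strength eps, clipped to [0, 1]."""
    return np.clip(x + eps * np.sign(grad_loss(x)), 0.0, 1.0)

def ifgsm(x, grad_loss, eps, M=10):
    """Eq. (5): M iterations of step size eps / M, each clipped to [0, 1].
    With step eps / M, the total displacement stays within the eps-ball,
    so the Clip_{x, eps} projection is implicit here."""
    x_adv = x.copy()
    for _ in range(M):
        x_adv = np.clip(x_adv + (eps / M) * np.sign(grad_loss(x_adv)), 0.0, 1.0)
    return x_adv
```

For a constant-gradient toy loss both attacks reach the same corner of the $\epsilon$-ball; on a real network the iterative variant follows the changing gradient and is considerably stronger.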
The CWL2 attack was proposed to circumvent defensive distillation. For a given image-label pair $(x, y)$ and a target label $t \ne y$, CWL2 searches for the adversarial image that will be classified to class $t$ by solving the optimization problem:
(6)   $\min_{\delta}\; \|\delta\|_2^2 \quad \text{subject to} \quad f(x + \delta) = t, \quad x + \delta \in [0, 1]^n,$
where $\delta$ is the adversarial perturbation (for simplicity, we omit the dependence of $\delta$ on $x$ and $t$ in the notation).
The equality constraint in Eq. (6) is hard to satisfy, so instead Carlini et al. consider the surrogate condition
(7)   $g(x + \delta) \le 0, \quad \text{where} \quad g(x) = \max\Big( \max_{i \ne t} Z(x)_i - Z(x)_t,\; 0 \Big),$
where $Z(x)$ is the logit vector for an input $x$, i.e., the output of the neural net before the softmax layer, and $Z(x)_i$ is the logit value corresponding to class $i$. It is easy to see that $f(x + \delta) = t$ is equivalent to $g(x + \delta) \le 0$. Therefore, the problem in Eq. (6) can be reformulated as
(8)   $\min_{\delta}\; \|\delta\|_2^2 + c\, g(x + \delta) \quad \text{subject to} \quad x + \delta \in [0, 1]^n,$
where $c > 0$ is the Lagrangian multiplier.
By letting $\delta = \frac{1}{2}\big(\tanh(w) + 1\big) - x$, Eq. (8) can be converted to an unconstrained optimization problem. Moreover, Carlini et al. introduce a confidence parameter $\kappa$ into the above formulation. Altogether, the CWL2 attack seeks adversarial images by solving the following problem:
(9)   $\min_{w}\; \Big\| \tfrac{1}{2}\big(\tanh(w) + 1\big) - x \Big\|_2^2 + c\, \max\Big( \max_{i \ne t} Z\big(\tfrac{1}{2}(\tanh(w) + 1)\big)_i - Z\big(\tfrac{1}{2}(\tanh(w) + 1)\big)_t,\; -\kappa \Big).$
This unconstrained optimization problem can be solved efficiently by the Adam optimizer [13]. All three attacks clip the values of the adversarial image to between 0 and 1.
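A toy instantiation of Eq. (9) is sketched below, with the tanh change of variables but plain finite-difference gradient descent in place of Adam; the function name and the `logits` callable are ours, purely for illustration:

```python
import numpy as np

def cw_l2(x, logits, target, c=5.0, kappa=0.1, steps=500, lr=0.05):
    """Toy CWL2 attack (Eq. (9)): tanh change of variables plus plain
    finite-difference gradient descent (the paper uses Adam [13])."""
    w = np.arctanh(np.clip(2.0 * x - 1.0, -0.9999, 0.9999))  # invert tanh map

    def obj(w):
        x_adv = (np.tanh(w) + 1.0) / 2.0
        z = logits(x_adv)
        others = np.max(np.delete(z, target))
        g = max(others - z[target], -kappa)   # surrogate with confidence kappa
        return np.sum((x_adv - x) ** 2) + c * g

    h = 1e-4
    for _ in range(steps):
        grad = np.zeros_like(w)
        for i in range(w.size):               # finite differences: toy-sized only
            e = np.zeros_like(w)
            e[i] = h
            grad[i] = (obj(w + e) - obj(w - e)) / (2.0 * h)
        w = w - lr * grad
    return (np.tanh(w) + 1.0) / 2.0
```

Because the optimization variable is $w$, the returned image is automatically in $[0, 1]^n$; the $c$ term trades off perturbation size against pushing the target logit above the others by the margin $\kappa$.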
4.1 Adversarial Attack for DNNs with WNLL Activation Function
In this work, we focus on untargeted attacks and defenses against them. For a given small batch of testing images $X = \{x_1, \dots, x_m\}$ with labels $Y$ and template $(X^{te}, Y^{te})$, we denote the DNN modified with WNLL as output activation as $\hat{Y} = \mathrm{WNLL}(\tilde{X}, Y^{te})$, where $\tilde{X}$ is the output of the composition of the DNN and buffer blocks defined in Fig. 1 (c). Ignoring the dependence of the loss function on the parameters, the loss function for DNNs with WNLL activation can be written as $\tilde{L}(X, Y, X^{te}, Y^{te})$. The above attacks for DNNs with WNLL activation on the batch of images $X$ are formulated below.

FGSM
(10)   $x_i' = x_i + \epsilon\, \mathrm{sign}\big(\nabla_{x_i} \tilde{L}(X, Y, X^{te}, Y^{te})\big), \quad i = 1, 2, \dots, m.$
IFGSM
(11)   $x_i^{(k)} = \mathrm{Clip}_{x_i, \epsilon}\Big\{ x_i^{(k-1)} + \frac{\epsilon}{M}\, \mathrm{sign}\big(\nabla_{x_i} \tilde{L}(X^{(k-1)}, Y, X^{te}, Y^{te})\big) \Big\}, \quad k = 1, 2, \dots, M,$ where $x_i^{(0)} = x_i$ and $x_i' = x_i^{(M)}$.

CWL2
(12)   $\min_{w_i}\; \Big\| \tfrac{1}{2}\big(\tanh(w_i) + 1\big) - x_i \Big\|_2^2 + c\, \max\Big( \max_{j \ne t} \tilde{Z}\big(\tfrac{1}{2}(\tanh(w_i) + 1)\big)_j - \tilde{Z}\big(\tfrac{1}{2}(\tanh(w_i) + 1)\big)_t,\; -\kappa \Big),$ where $\tilde{Z}$ denotes the logit values of the input images $X$.
Based on our numerical experiments, the batch size of $X$ has minimal influence on the adversarial attack and defense. We use a fixed batch size for $X$ in all of our experiments and, similar to [29], a fixed size for the template.
We apply the above attack methods to ResNet56 [11] with either softmax or WNLL as the output activation function. For IFGSM, we run 10 iterations of Eqs. (5) and (11) to attack the DNNs with the two output activations, respectively. For the CWL2 attacks (Eqs. (9) and (12)) in both scenarios, we fix the parameters $c$ and $\kappa$. Figure 2 depicts three randomly selected images (horse, automobile, airplane) from the CIFAR-10 dataset, their adversarial versions produced by the different attack methods on ResNet56 with the two kinds of activation functions, and the TV-minimized images. All attacks successfully fool the classifiers: none of these images is classified correctly. Figure 2 (a) shows that FGSM and IFGSM with small perturbation $\epsilon$ change the contrast of the images, while it is still easy for humans to classify them correctly. The adversarial images of the CWL2 attacks are imperceptibly perturbed, yet they are extremely effective at fooling DNNs. Figure 2 (b) shows the images of (a) under a stronger attack; with a larger $\epsilon$, the adversarial images become noisier. The TV-minimized images of Fig. 2 (a) and (b) are shown in Fig. 2 (c) and (d), respectively. TVM removes a significant amount of detailed information from the original and adversarial images, which also makes it harder for humans to classify the TV-minimized versions of both. Visually, it is hard to discern the adversarial images resulting from attacking the DNNs with the two types of output layers.
[Figure 2: (a) clean and adversarial CIFAR-10 images (horse, automobile, airplane) under FGSM, IFGSM, and CWL2 attacks; (b) the same images under a stronger attack; (c), (d) the TV-minimized versions of (a) and (b).]
5 Analysis of the Geometry of Features
We consider the geometry of the features of the original and adversarial images. We randomly select 1000 training and 100 testing images from each of the airplane and automobile classes. We consider two visualization strategies for ResNet56 with softmax activation: (1) extract the original 64D features output from the layer before the softmax and apply principal component analysis (PCA) to reduce them to 2D; however, the principal components (PCs) do not encode the entire geometric information of the features. (2) Alternatively, we add a 2 by 2 fully connected (FC) layer before the softmax and utilize the 2D features output from this newly added layer. We verify that the newly added layer does not change the performance of ResNet56: as shown in Fig. 3, the training and testing performance remains essentially the same in both cases.
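The PCA step in strategy (1) amounts to projecting the 64D features onto their top two principal directions; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def pca_2d(feats):
    """Project features (n, d) onto their first two principal components,
    via SVD of the mean-centered feature matrix."""
    centered = feats - feats.mean(axis=0)
    # rows of Vt are the principal directions, ordered by singular value
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T
```

By construction the first returned coordinate carries at least as much variance as the second, which is why well-separated classes remain visible in the 2D scatter plots when the leading directions capture the class structure.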
[Figure 3: (a), (b) training and testing accuracy of ResNet56 with and without the additional 2D fully connected layer.]
Figure 4 (a) and (b) show the 2D features generated by ResNet56 with the additional FC layer for the original and adversarial testing images, respectively, where the adversarial images are generated by FGSM. Before adversarial perturbation (Fig. 4 (a)), a straight line can easily separate the two classes. The small perturbation causes the features to overlap, and no linear classifier can easily separate the two classes (Fig. 4 (b)). The first two PCs of the 64D features of the clean and adversarial images are shown in Fig. 4 (c) and (d), respectively. Again, the PCs are well separated for clean images, while the adversarial perturbation causes overlap and concentration.
The bottom charts of Fig. 4 depict the first two PCs of the 64D features output from the layer before the WNLL. The distributions of the unperturbed training and testing data are the same, as illustrated in panels (e) and (f). These features are better separated, which indicates that DNNs with WNLL are more robust to small random perturbations. Panels (g) and (h) plot the features of the adversarial and TV-minimized adversarial images in the test set. The adversarial attacks move the automobiles' features into the airplanes' region, and TVM helps to eliminate the outliers. Based on our computation, most of the adversarial images of the airplane class can be correctly classified with the interpolating function: the training data guides the interpolating function to classify adversarial images correctly. The fact that adversarial perturbations change the features' distribution was also noticed in [26].
[Figure 4: (a), (b) 2D features of clean and FGSM-adversarial test images from ResNet56 with the additional FC layer; (c), (d) first two principal components of the 64D softmax features for clean and adversarial images; (e)-(h) first two principal components of the features before the WNLL layer for clean training, clean testing, adversarial, and TV-minimized adversarial images.]
6 Adversarial Defense by Interpolating Function and TVM
To defend against adversarial attacks, we combine the ideas of the data-dependent activation, input transformation, and training data augmentation. We train ResNet56, respectively, on the original training data, the TV-minimized training data, and the combination of the two. On top of the data-dependent activation and augmented training, we further apply the TVM [24] used by [10] to transform the adversarial images to boost defensive performance. The basic idea is to reconstruct the simplest image $z$ from the subsampled image $X \odot x$, with the mask $X$ filled by a Bernoulli binary random variable, by solving the following TVM problem:
$\min_{z}\; \|X \odot (z - x)\|_2 + \lambda \cdot \mathrm{TV}(z),$
where $\lambda$ is the regularization constant.
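A small numpy sketch of this reconstruction; here the fidelity term is squared and the TV term is smoothed so that plain gradient descent applies (the paper follows the solver of [10, 24], so these solver details are illustrative assumptions of ours):

```python
import numpy as np

def tvm_reconstruct(x, keep_prob=0.5, lam=0.03, steps=400, lr=0.1, seed=0):
    """Drop pixels of a grayscale image x in [0, 1] with a Bernoulli mask,
    then reconstruct z by gradient descent on
        || mask * (z - x) ||^2 + lam * TV_smooth(z)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < keep_prob       # Bernoulli mask of kept pixels
    z = x * mask                                 # start from the subsampled image
    eps = 1e-6
    for _ in range(steps):
        grad = 2.0 * mask * (z - x)              # fidelity on the kept pixels
        dzx = np.diff(z, axis=0, append=z[-1:])  # forward differences
        dzy = np.diff(z, axis=1, append=z[:, -1:])
        tvx = dzx / np.sqrt(dzx ** 2 + eps)      # smoothed TV derivative terms
        tvy = dzy / np.sqrt(dzy ** 2 + eps)
        grad -= lam * (np.diff(tvx, axis=0, prepend=tvx[:1] * 0)
                       + np.diff(tvy, axis=1, prepend=tvy[:, :1] * 0))
        z = np.clip(z - lr * grad, 0.0, 1.0)
    return z
```

On a constant test image, the reconstruction keeps the retained pixels close to their values while the TV term fills the dropped pixels from their neighbors, which is exactly the smoothing effect the defense relies on.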
7 Numerical Results
7.1 Transferability of the Adversarial Images
To verify the efficacy of the attack methods for DNNs with WNLL output activation, we consider the transferability of adversarial images. We train ResNet56 on the aforementioned three types of training data with either softmax or WNLL activation. After the DNNs are trained, we attack them by FGSM, IFGSM, and CWL2 with different attack strengths $\epsilon$. Finally, we classify the adversarial images using ResNet56 with the opponent activation. We list the mutual classification accuracies on adversarial images in Table 1. The adversarial images resulting from attacking DNNs with the two types of activation functions are both transferable, as the mutual classification accuracy is significantly lower than the accuracy on clean images. Overall, when applying ResNet56 with WNLL activation to classify the adversarial images resulting from attacking ResNet56 with softmax activation, the network has remarkably higher accuracy. For instance, for DNNs that are trained on the original images and attacked by FGSM, the DNN with the WNLL classifier has at least 5.4% higher accuracy (56.3% vs. 61.7%). The accuracy improvement is more significant in many other scenarios.
Attack Method  Training data  Accuracy (%) at five increasing attack strengths ε
Classification accuracy of ResNet56 with softmax on adversarial images produced by attacking ResNet56 with WNLL
FGSM  Original data  59.6  59.5  58.0  56.3  54.3 
FGSM  TVM data  50.7  40.6  41.2  37.4  34.5 
FGSM  Original + TVM data  62.9  61.7  60.6  59.4  58.9 
IFGSM  Original data  49.1  43.6  40.4  36.8  34.8 
IFGSM  TVM data  30.3  23.7  20.1  18.0  17.3 
IFGSM  Original + TVM data  53.9  49.2  44.7  41.9  39.9 
CWL2  Original data  54.7  54.2  54.4  53.8  54.0 
CWL2  TVM data  59.8  59.5  58.7  59.8  59.1 
CWL2  Original + TVM data  81.5  81.5  81.8  81.2  81.5 
Classification accuracy of ResNet56 with WNLL on adversarial images produced by attacking ResNet56 with softmax  
FGSM  Original data  65.4  65.9  63.6  61.7  60.5 
FGSM  TVM data  61.5  56.7  50.8  44.7  41.0 
FGSM  Original + TVM data  69.7  67.6  65.5  64.8  63.4 
IFGSM  Original data  51.9  43.9  38.9  35.4  34.2 
IFGSM  TVM data  32.1  22.8  19.5  17.8  16.1 
IFGSM  Original + TVM data  60.0  53.0  47.5  41.6  38.4 
CWL2  Original data  81.5  81.4  81.5  81.6  81.4 
CWL2  TVM data  57.6  58.4  57.8  58.4  58.4 
CWL2  Original + TVM data  90.6  90.6  90.5  90.1  90.4 
7.2 Adversarial Defense
Figure 5 plots the results of adversarial defense by combining the WNLL activation, TVM, and training data augmentation. Panels (a), (b), and (c) show the testing accuracy of ResNet56 with and without defense on CIFAR-10 for FGSM, IFGSM, and CWL2 attacks, respectively. We observe that with increasing attack strength $\epsilon$, the testing accuracy decreases rapidly. FGSM is a relatively weak attack method, as the accuracy remains above 53.5% even under the strongest attack, while the defense maintains accuracy above 71.8%. Figure 5 (b) and (c) show that both IFGSM and CWL2 can fool ResNet56 almost completely even with small $\epsilon$. The defense maintains the accuracy above 68.0% and 57.2% under the CWL2 and IFGSM attacks, respectively. Compared to the state-of-the-art defensive method on CIFAR-10, PixelDefend, our method is much simpler and faster. Without adversarial training, our defense is more stable to IFGSM, and more stable to all three attacks under the strongest attack, than PixelDefend [26]. Moreover, our defense strategy is additive to adversarial training and many other defenses, including PixelDefend.
[Figure 5: testing accuracy of ResNet56 with and without defense on CIFAR-10 under (a) FGSM, (b) IFGSM, and (c) CWL2 attacks.]
To analyze the contribution from each component of the defensive strategy, we separate the three parts and list the testing accuracies in Table 2. TVM alone cannot defend against FGSM attacks except when the DNNs are trained on the augmented data, as shown in the first and fourth horizontal blocks of the table. The WNLL activation improves the testing accuracy under adversarial attack significantly and persistently. Augmented training consistently improves stability as well.
Attack Method  Training data  Clean  Accuracy (%) without/with TVM of the attacked images, at five increasing attack strengths ε
Vanilla ResNet56
FGSM  Original data  93.0  60.4/39.4  60.3/39.4  58.2/40.2  55.8/30.9  53.5/40.1 
FGSM  TVM data  88.3  54.1/39.6  49.5/41.6  43.6/44.3  39.5/45.1  35.9/45.0 
FGSM  Original + TVM data  93.1  63.2/66.6  62.7/67.8  62.4/68.7  62.0/68.1  61.3/68.7 
IFGSM  Original data  93.0  20.6/35.0  11.6/32.3  8.6/31.0  7.5/28.8  6.5/27.6 
IFGSM  TVM data  88.3  10.3/32.9  6.7/31.1  6.1/31.7  6.1/30.8  6.0/29.2 
IFGSM  Original + TVM data  93.1  32.1/61.5  24.5/57.4  20.1/54.1  17.1/51.3  15.9/48.9 
CWL2  Original data  93.0  4.7/36.8  3.5/36.4  0/36.8  0/36.8  0/35.9 
CWL2  TVM data  88.3  8.2/36.5  8.1/36.0  8.0/35.9  8.0/35.8  8.0/36.3 
CWL2  Original + TVM data  93.1  13.6/62.2  13.6/62.2  13.0/62.1  12.0/62.1  12.0/61.9 
DataDependent Activated ResNet56  
FGSM  Original data  94.5  71.1/49.9  72.1/51.1  71.3/51.7  70.6/52.2  67.3/51.8 
FGSM  TVM data  90.6  62.6/49.3  56.8/54.1  52.1/56.2  46.0/56.6  41.0/57.1 
FGSM  Original + TVM data  94.7  70.6/71.8  68.8/73.1  67.2/74.9  66.9/73.6  63.7/74.1 
IFGSM  Original data  94.5  43.7/44.7  35.3/42.1  31.3/39.5  28.2/37.8  27.0/35.5 
IFGSM  TVM data  90.6  12.1/44.3  7.1/41.1  7.2/37.4  6.9/37.2  6.8/35.3 
IFGSM  Original + TVM data  94.7  35.0/67.4  25.1/64.9  20.5/61.9  17.5/58.7  16.3/57.2 
CWL2  Original data  94.5  11.9/40.1  11.7/40.8  11.0/40.8  10.8/41.2  10.8/40.5 
CWL2  TVM data  90.6  52.6/48.5  52.7/48.4  52.2/45.8  52.8/47.7  51.9/44.8 
CWL2  Original + TVM data  94.7  61.6/68.6  61.1/68.0  61.9/68.1  61.2/69.2  61.5/68.7 
8 Concluding Remarks
In this paper, by analyzing the influence of adversarial perturbations on the geometric structure of the DNNs' features, we propose to defend against adversarial attacks by applying a data-dependent activation function, total variation minimization on the adversarial images, and training data augmentation. Results on ResNet56 with the CIFAR-10 benchmark reveal that the defense improves robustness to adversarial perturbations significantly. Total variation minimization simplifies the adversarial images, which is very useful in removing adversarial perturbations. An interesting direction to explore is to apply other denoising methods to remove adversarial perturbations. Moreover, we noticed that adversarial perturbations change the features' distribution severely; one possible way to correct this is to design algorithms that purify the adversarial images.
Acknowledgments
This material is based on research sponsored by the Air Force Research Laboratory and DARPA under agreement number FA8750-18-2-0066; by the U.S. Department of Energy, Office of Science, and by the National Science Foundation, under Grant Numbers DOE-SC0013838 and DMS-1554564 (STROBE); and by NSF DMS-1737770 and the Simons Foundation. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
References
 [1] N. Akhtar and A. Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. arXiv preprint arXiv:1801.00553, 2018.
 [2] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. International Conference on Machine Learning, 2018.
 [3] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. International Conference on Machine Learning, 2018.
 [4] Y. Bengio, N. Leonard, and A. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
 [5] W. Brendel, J. Rauber, and M. Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
 [6] N. Carlini and D.A. Wagner. Towards evaluating the robustness of neural networks. IEEE European Symposium on Security and Privacy, pages 39–57, 2016.
 [7] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 [8] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435, 2016.
 [9] A. Giusti, J. Guzzi, D. C. Ciresan, F. L. He, J. P. Rodriguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, and et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, pages 661–667, 2016.
 [10] C. Guo, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformation. International Conference on Learning Representations, 2018.
 [11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
 [12] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Blackbox adversarial attacks with limited queries and information. International Conference on Machine Learning, 2018.
 [13] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [14] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 [15] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
 [16] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
 [17] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
 [18] Seyed-Mohsen Moosavi-Dezfooli, Ashish Shrivastava, and Oncel Tuzel. Divide, denoise, and defend against adversarial attacks. CoRR, abs/1802.06806, 2018.
 [19] T. Na, J. H. Ko, and S. Mukhopadhyay. Cascade adversarial machine learning regularized with a unified embedding. International Conference on Learning Representations, 2018.
 [20] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z.B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. IEEE European Symposium on Security and Privacy, pages 372–387, 2016.
 [21] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman. SoK: Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
 [22] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. IEEE European Symposium on Security and Privacy, 2016.
 [23] Nicolas Papernot, Patrick D. McDaniel, and Ian J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR, abs/1605.07277, 2016.
 [24] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, pages 259–268, 1992.
 [25] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. International Conference on Learning Representations, 2018.
 [26] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. International Conference on Learning Representations, 2018.
 [27] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, and I. Goodfellow. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 [28] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
 [29] B. Wang, X. Luo, Z. Li, W. Zhu, Z. Shi, and S. Osher. Deep neural nets with interpolating function as output activation. Advances in Neural Information Processing Systems, 2018.
 [30] X. Wu, U. Jang, J. Chen, L. Chen, and S. Jha. Reinforcing adversarial robustness using model confidence induced by adversarial training. International Conference on Machine Learning, 2018.