Towards Robust Neural Networks via Random Self-ensemble

Xuanqing Liu, Minhao Cheng, Huan Zhang and Cho-Jui Hsieh
University of California, Davis
{xqliu, mhcheng, ecezhang, chohsieh}@ucdavis.edu
Abstract

Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defensive algorithm called Random Self-Ensemble (RSE) that combines two important concepts: randomness and ensemble. To protect a targeted model, RSE adds random noise layers to the neural network to prevent state-of-the-art gradient-based attacks, and ensembles the predictions over random noise to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models without any additional memory overhead, and that the proposed training procedure based on noisy stochastic gradient descent ensures the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real datasets. For instance, on CIFAR-10 with a VGG network (which has 92% accuracy without any attack), under the state-of-the-art C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than 10%, while our method maintains a substantially higher prediction accuracy than the best previous defense technique under the same level of attack. Finally, our method is simple and easy to integrate into any neural network.

1 Introduction

Deep neural networks have demonstrated their success in many machine learning and computer vision applications, including image classification [9, 5], object recognition [21] and image captioning [25]. Despite having near-perfect prediction performance, recent studies have revealed the vulnerability of deep neural networks to adversarial examples: given a correctly classified image, a carefully designed perturbation of the image can make a well-trained neural network misclassify. Algorithms that craft these adversarial images, called attack algorithms, are designed to minimize the perturbation, making adversarial images hard to distinguish from natural images. This leads to security concerns, especially when deep neural networks are applied to security-sensitive systems such as self-driving cars and medical imaging.

To make deep neural networks more robust to adversarial attacks, several defensive algorithms have been proposed recently [16, 27, 12, 11, 26]. However, recent studies have shown that these defensive algorithms can only marginally improve accuracy under adversarial attacks [1, 2].

In this paper, we propose a new defensive algorithm: Random Self-Ensemble (RSE). More specifically, we introduce a new “noise layer” that fuses its input with randomly generated noise, and we add this layer before each convolution layer of a deep network. In the training phase, the gradient is still computed by back-propagation, but it is perturbed by random noise when passing through the noise layers. In the inference phase, we perform several forward propagations, each with freshly generated random noise in the noise layers, and ensemble the resulting predictions. We show that RSE makes the network resistant to adversarial attacks, while with the proposed training and testing scheme the noise only slightly affects test accuracy. The algorithm is easy to implement and can be applied to any deep neural network to improve robustness against adversarial attacks.

Intuitively, RSE works well because of two important concepts: ensemble and randomness. It is known that an ensemble of several trained models can improve robustness [20], but it also multiplies the model size by the number of models. In comparison, without any additional memory requirement, RSE constructs an infinite number of models f_ε(w, x) (the network f with weights w evaluated under noise ε), where ε is generated randomly, and ensembles their results to improve robustness. But how do we guarantee that the ensemble of these models achieves good accuracy? Indeed, if we train the original model without noise and only add noise layers in the inference phase, the algorithm performs poorly. This suggests that adding random noise to an existing well-trained network significantly degrades its performance. Instead, we show that if the noise layers are taken into account in the training phase, the training procedure can be viewed as minimizing an upper bound of the loss of the ensemble model, and thus our algorithm achieves good prediction accuracy.

Our contributions in this paper can be summarized as follows:

  • We propose a Random Self-Ensemble (RSE) approach for improving the robustness of deep neural networks. The main idea is to add a “noise layer” before each convolution layer in both the training and prediction phases. The algorithm is equivalent to ensembling an infinite number of random models to defend against attackers.

  • We explain why RSE can significantly improve the robustness against adversarial attacks and show that adding noise layers is equivalent to training the original network with an extra regularization of its Lipschitz constant.

  • RSE significantly outperforms existing defensive algorithms in all our experiments. For example, on CIFAR-10 with a VGG network (which has 92% accuracy without any attack), under the C&W attack the accuracy of the unprotected model drops to less than 10%, while RSE maintains a much higher prediction accuracy than the best previous defense technique under the same level of attack. Moreover, RSE is easy to implement and can be combined with any neural network.

2 Related Work

The security of deep neural networks has been studied recently. Let us denote the neural network as f(w, x), where w is the model parameters (weights) and x is the input image. Given a correctly classified image x_0 (i.e., f(w, x_0) yields the true label y_0), an attacking algorithm tries to find a slightly perturbed image x' such that (1) the neural network misclassifies this perturbed image; and (2) the distortion ‖x' − x_0‖ is small, so that the perturbation is hard for humans to detect. A defensive algorithm is designed to improve the robustness of neural networks against attackers, usually by slightly changing the loss function or the training procedure. In the following, we summarize recent works along this line.

2.1 White-box attack

In the white-box setting, attackers have full information about the targeted neural network, including the network structure and the network weights (denoted by w). Using this information, attackers can compute the gradient with respect to the input data by back-propagation. Note that the gradient is very informative for attackers, since it characterizes the sensitivity of the prediction with respect to the input image.

To craft an adversarial example, [7] proposed the fast gradient sign method (FGSM), where the adversarial example is constructed by

x' = x_0 + \epsilon \cdot \operatorname{sign}\big(\nabla_x \ell(f(w, x_0), y_0)\big)

with some small ε > 0. In fact, FGSM can be viewed as one step of gradient descent, and several works have tried to improve over it, including Rand-FGSM [23] and I-FGSM [12].
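For concreteness, a minimal PyTorch sketch of this one-step FGSM update follows; the model, the cross-entropy loss, the value of ε, and the [0, 1] pixel range are illustrative assumptions rather than the exact setup used in our experiments.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """One-step FGSM: move the input in the direction that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in a valid range
```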

Recently, Carlini & Wagner [2] showed that constructing an adversarial example can be formulated as solving the following optimization problem:

\min_{x'} \; c \cdot f(x') + \|x' - x_0\|_2^2, \qquad (1)

where the first term f(x') is a loss function that characterizes the success of the attack and the second term enforces a small distortion. The parameter c > 0 is used to balance these two terms. Several variants have been proposed recently [3, 14], but most of them follow a similar framework. The C&W attack has been recognized as the most successful attacking algorithm.

For an untargeted attack, where the goal is to find an adversarial example that is close to the original example but yields a different class prediction, the loss function in (1) can be defined as

f(x') = \max\Big\{ [Z(x')]_{y_0} - \max_{i \neq y_0} [Z(x')]_i, \; 0 \Big\},

where y_0 is the predicted label of the original image and Z(x') is the network’s output before the softmax layer (the logits).
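The following PyTorch sketch illustrates how (1) with this margin loss can be minimized directly over the image; the Adam optimizer, step count, fixed constant c, and pixel clamping are illustrative simplifications (the full C&W attack additionally uses a change of variables and a binary search over c).

```python
import torch
import torch.nn.functional as F

def cw_untargeted(model, x0, y0, c=1.0, steps=100, lr=0.01):
    """Minimize c * f(x') + ||x' - x0||_2^2 with f the margin loss defined above."""
    x_adv = x0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        z = model(x_adv)                                   # logits Z(x')
        z_true = z.gather(1, y0.unsqueeze(1)).squeeze(1)   # [Z(x')]_{y0}
        mask = F.one_hot(y0, z.size(1)).bool()
        z_other = z.masked_fill(mask, float('-inf')).max(dim=1).values
        f = torch.clamp(z_true - z_other, min=0.0)         # margin loss f(x')
        dist = ((x_adv - x0) ** 2).flatten(1).sum(dim=1)   # squared l2 distortion
        loss = (c * f + dist).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        x_adv.data.clamp_(0.0, 1.0)                        # keep pixels valid
    return x_adv.detach()
```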

For a targeted attack, the loss function can be defined to force the classifier to predict the target label. For attackers, a targeted attack is strictly harder than an untargeted attack (since success of a targeted attack implies success of an untargeted attack). Conversely, for defenders, untargeted attacks are strictly harder to defend against than targeted attacks. Therefore, we focus on defending against untargeted attacks in our experiments.

2.2 Black-box attack

The white-box setting is often impractical because real-world systems usually do not release their internal states. Therefore, several recent papers focus on the black-box setting [17, 4]. In the black-box setting, the only thing attackers can do is make queries to the targeted neural network and observe the corresponding outputs. In this setting, a common approach is to train a “substitute model” [16] based on many input/output pairs and then attack this substitute model instead of the real one. This relies on the transferability of adversarial examples [13]. However, this approach has a high failure rate, since the substitute model can be very different from the targeted network.

Recently, [4] proposed a black-box attack algorithm called ZOO. The main idea is to solve the same objective function (1) as the C&W attack using zeroth-order optimization. To solve (1), the C&W attack needs to compute the gradient and apply gradient descent. However, in the black-box setting the gradient cannot be computed by backpropagation, so ZOO [4] estimates it by

\frac{\partial f(x)}{\partial x_i} \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h} \qquad (2)

for all coordinates i, where h is a small number that controls the estimation accuracy and e_i is the i-th indicator vector. In fact, if h is small enough, ZOO can find a solution of similar quality to C&W’s white-box attack (see [4]).
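A minimal Python sketch of the coordinate-wise estimator (2) is given below; the `objective` callable and the step size h are placeholders, and in practice ZOO only estimates a random subset of coordinates per iteration rather than looping over all of them.

```python
import numpy as np

def zoo_gradient_estimate(objective, x, h=1e-4):
    """Estimate the gradient of a scalar black-box objective by symmetric
    finite differences, one coordinate at a time, as in (2)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0                                   # i-th indicator vector
        grad[i] = (objective(x + h * e_i) - objective(x - h * e_i)) / (2.0 * h)
    return grad
```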

Note that our proposed method can perfectly prevent the ZOO attack. Since our algorithm adds randomness into the neural network, two queries with the same input are answered by networks with different noise, so the finite differences in (2) no longer estimate the true gradient. The details will be discussed in Section 3.

2.3 Defensive Algorithms

Because of the vulnerability to adversarial examples [22], several methods have been proposed to strengthen a network’s ability to defend against adversarial examples. [18] proposed defensive distillation, which first trains a teacher network using a softmax layer with a temperature, and then uses the prediction probabilities (soft labels) of the teacher network to train a student network with the same structure as the teacher. However, as shown in [2], this method does not work against the C&W attack. [27] showed that by using a modified ReLU activation (BReLU) and adding Gaussian noise to the original images to augment the training dataset, the learned network gains some ability to defend against adversarial attacks. Another popular defense approach is adversarial training [12, 11]: it generates adversarial examples with a certain attack algorithm and adds them to the training set, which helps the network learn to resist such adversarial examples. Combining adversarial training with enlarged model capacity, [14] is able to create an MNIST model that is robust to first-order attacks, but this approach does not work as well on larger datasets such as CIFAR-10. In addition to changing the network structure, there are other methods [26, 15, 6, 8] that “detect” adversarial examples; these are outside the scope of this paper.
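As a reference for the adversarial-retraining baseline used later in Section 4, here is a minimal PyTorch sketch of one training step; the FGSM step size and the mixing probability are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_retraining_step(model, opt, x, y, eps=0.03, p_adv=0.5):
    """With probability p_adv, replace the clean batch by an FGSM batch."""
    if torch.rand(1).item() < p_adv:
        x_in = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_in), y).backward()
        x = (x_in + eps * x_in.grad.sign()).clamp(0.0, 1.0).detach()
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```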

3 Proposed Algorithm: Random Self-Ensemble

Figure 1: Our proposed noisy VGG-style network: we add a noise layer before each convolution layer. For simplicity, we call the noise layer before the first convolution layer the “init-noise” and all other noise layers the “inner-noise”. For these two kinds of layers we adopt different variances of Gaussian noise. Note that a similar design can be transplanted to other architectures such as ResNet.

In this section, we propose our self-ensemble algorithm to improve the robustness of neural networks. We will first motivate and introduce our algorithm and then discuss several theoretical reasons behind it.

It is known that an ensemble of several different models can improve robustness. However, an ensemble of finitely many models is not very practical, because it multiplies the model size by the number of models. For example, an AlexNet model on ImageNet requires 240MB of storage, so storing 100 of them would require 24GB of memory. Moreover, it is hard to find many heterogeneous models with similar accuracy. To improve the robustness of practical systems, we propose the following self-ensemble algorithm that can generate an infinite number of models on the fly without any additional memory cost.

Our main idea is to add randomness into the network structure. More specifically, we introduce a new “noise layer” that fuses its input with randomly generated noise, i.e., x ← x + ε with ε ∼ N(0, σ²I) when passing through the noise layer. We then add this layer before each convolution layer, as shown in Figure 1. Since most attacks require computing or estimating the gradient, the noise level in our model controls the success rate of those attacking algorithms. In fact, we can integrate this layer into any other neural network.
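A minimal PyTorch sketch of such a noise layer is shown below; the module name is ours, and the standard deviations follow the CIFAR-10 + VGG16 setting in Table 1.

```python
import torch
import torch.nn as nn

class NoiseLayer(nn.Module):
    """Adds zero-mean Gaussian noise to its input at both training and test time."""
    def __init__(self, std):
        super().__init__()
        self.std = std

    def forward(self, x):
        # A fresh noise sample is drawn on every forward pass.
        return x + self.std * torch.randn_like(x)

# Example: placing a noise layer before every convolution layer.
block = nn.Sequential(
    NoiseLayer(std=0.4),                        # "init-noise" before the first conv
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    NoiseLayer(std=0.1),                        # "inner-noise" before later convs
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)
```

Unlike dropout-style noise, this layer stays active at inference time, which is essential for the self-ensemble described below.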

If we denote the original neural network as f(w, x), where w is the weights and x is the input image, then with the random noise layers the network can be denoted as f_ε(w, x) with random ε. Therefore we have an infinite number of models f_ε(w, x) in our pocket (one for each ε) without any memory overhead. However, adding randomness also affects the prediction accuracy of the model. How can we make sure that the ensemble of these random models has sufficient accuracy?

Training phase:
for iter = 1, 2, … do
     Randomly sample (x_i, y_i) from the dataset
     Randomly generate ε ∼ N(0, σ²I) for each noise layer.
     Compute Δw = ∇_w ℓ(f_ε(w, x_i), y_i) (noisy gradient)
     Update weights: w ← w − η·Δw.
end for
Testing phase:
Given a testing image x, initialize p ← 0
for j = 1, 2, …, n do
     Randomly generate ε_j ∼ N(0, σ²I) for each noise layer.
     Forward propagation to calculate the probability output p_j = f_{ε_j}(w, x)
     Update p: p ← p + p_j.
end for
Predict the class with maximum score: ŷ = argmax_k p_k
Algorithm 1 Training and Testing of Random Self-Ensemble (RSE)

A critical observation is that we need to add the random noise layers in both the training and testing phases. The training and testing procedures are listed in Algorithm 1. In the training phase, the gradient is computed as ∇_w ℓ(f_ε(w, x_i), y_i), which includes the noise layers, and the noise is regenerated for each stochastic gradient descent update. In the testing phase, we draw n random noise vectors ε_1, …, ε_n and ensemble their probability outputs by

\hat{y} = \arg\max_k \sum_{j=1}^{n} \big[f_{\varepsilon_j}(w, x)\big]_k.

If we do not care about the prediction time, n can be very large, but in practice we found the performance to be quite stable once n goes beyond 10 (see Figure 5).
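A PyTorch sketch of Algorithm 1 is given below, assuming the noise layers (such as the module sketched earlier in this section) draw fresh noise on every forward pass; the optimizer, learning rate, and loss are generic placeholders.

```python
import torch
import torch.nn.functional as F

def train_rse(model, loader, epochs=1, lr=0.1):
    """Training phase of Algorithm 1: every SGD step sees freshly sampled noise."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(model(x), y)   # forward pass draws new noise
            opt.zero_grad()
            loss.backward()                       # noisy gradient
            opt.step()

@torch.no_grad()
def predict_rse(model, x, n_ensemble=10):
    """Testing phase of Algorithm 1: sum softmax outputs over n noisy forward passes."""
    p = None
    for _ in range(n_ensemble):
        probs = F.softmax(model(x), dim=1)        # each pass uses different noise
        p = probs if p is None else p + probs
    return p.argmax(dim=1)                        # class with the maximum score
```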

This approach is different from the Gaussian data augmentation in [27]: they only add Gaussian noise to images during training, while we add noise before each convolution layer at both training and inference time. During training, the noise helps the optimization algorithm find convolution filters that are stable under perturbed inputs; during testing, the role of the noise is two-fold: it perturbs the gradient to fool gradient-based attacks, and it yields different outputs across multiple forward passes, so a simple ensemble can improve the testing accuracy.

3.1 Mathematical explanations

Training and testing of RSE

Here we explain our training and testing procedure. In the training phase, our algorithm is solving the following optimization problem:

\min_w \; \mathbb{E}_{(x, y)\sim\mathcal{D}} \; \mathbb{E}_{\varepsilon\sim\mathcal{N}(0, \sigma^2 I)} \; \ell\big(f_\varepsilon(w, x), y\big), \qquad (3)

where ℓ(·, ·) is the loss function and D is the data distribution. Note that for simplicity we assume ε follows a zero-mean Gaussian N(0, σ²I), but in general our algorithm can work with any noise distribution.

At testing time, we ensemble the outputs of several forward propagations; specifically:

\hat{y} = \arg\max_k \; \mathbb{E}_{\varepsilon\sim\mathcal{N}(0, \sigma^2 I)} \big[f_\varepsilon(w, x)\big]_k, \qquad (4)

where argmax returns the index of the maximum element of a vector. The reason that our RSE algorithm achieves prediction accuracy similar to that of the original network is that (3) minimizes an upper bound of the loss of (4). If we choose the negative log-likelihood loss, then ℓ(f_ε(w, x), y) = −log [f_ε(w, x)]_y and

\mathbb{E}_{\varepsilon}\big[-\log\, [f_\varepsilon(w, x)]_y\big] \;\ge\; -\log \mathbb{E}_{\varepsilon}\big[ [f_\varepsilon(w, x)]_y \big]. \qquad (5)

The inequality follows from Jensen’s inequality, and the right-hand side is exactly the negative log-likelihood of the ensemble prediction in (4). So by minimizing (3) we are actually minimizing an upper bound of the inference loss, which validates our ensemble inference procedure.
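The inequality (5) is easy to check numerically; the probabilities below are made up purely for illustration.

```python
import numpy as np

# Probabilities assigned to the correct class by a few noisy forward passes.
p = np.array([0.9, 0.6, 0.75, 0.3])

avg_nll = np.mean(-np.log(p))        # left-hand side of (5): average training loss
ensemble_nll = -np.log(np.mean(p))   # right-hand side: loss of the ensembled prediction
assert ensemble_nll <= avg_nll       # Jensen's inequality
print(avg_nll, ensemble_nll)         # approximately 0.527 and 0.450
```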

RSE is equivalent to Lipschitz regularization

Another point of view is that training with perturbation is equivalent to Lipschitz regularization, which further helps defend against gradient-based attacks. If we fix the output label y, then the loss function can simply be denoted as ℓ_f(x). The Lipschitz constant of ℓ_f is a constant L_{ℓ_f} such that

|\ell_f(x_1) - \ell_f(x_2)| \;\le\; L_{\ell_f}\,\|x_1 - x_2\| \qquad (6)

for all x_1, x_2. In fact, it has been proved recently that the Lipschitz constant can be used to measure the robustness of a machine learning model [10]. If L_{ℓ_f} is large, even a tiny change of the input can significantly change the loss and eventually yield an incorrect prediction. On the contrary, by keeping L_{ℓ_f} small, we obtain a more robust network.

Next we show that our noisy network indeed controls the Lipschitz constant. Following the notation of (3), writing ℓ_f(w, x, y) := ℓ(f(w, x), y) and considering for simplicity the noise added to the input, we can see that

\mathbb{E}_{\varepsilon\sim\mathcal{N}(0,\sigma^2 I)}\,\ell_f(w, x+\varepsilon, y) \;\overset{(i)}{\approx}\; \mathbb{E}_{\varepsilon}\Big[\ell_f(w, x, y) + \varepsilon^\top \nabla_x \ell_f(w, x, y) + \tfrac{1}{2}\,\varepsilon^\top \nabla_x^2 \ell_f(w, x, y)\,\varepsilon\Big] \;\overset{(ii)}{=}\; \ell_f(w, x, y) + \tfrac{\sigma^2}{2}\operatorname{Tr}\big(\nabla_x^2 \ell_f(w, x, y)\big). \qquad (7)

For (i), we do a Taylor expansion at x; since we set the variance of the noise very small, we only keep terms up to second order. For (ii), we note that the Gaussian vector ε is i.i.d. with zero mean, so the linear term in ε has zero expectation, while the expectation of the quadratic term depends only on the variance of the noise and the Hessian of the loss. By the norm inequality \operatorname{Tr}(A) \le \sqrt{d}\,\|A\|_F for A \in \mathbb{R}^{d\times d}, we can rewrite (7) as

\mathbb{E}_{\varepsilon\sim\mathcal{N}(0,\sigma^2 I)}\,\ell_f(w, x+\varepsilon, y) \;\lesssim\; \ell_f(w, x, y) + \tfrac{\sqrt{d}\,\sigma^2}{2}\,\big\|\nabla_x^2 \ell_f(w, x, y)\big\|_F, \qquad (8)

which means that training the noisy network is equivalent to training the original model with an extra regularization of the Lipschitz constant, and by controlling the variance of the noise we can balance the robustness of the network against its training loss.

3.2 Discussions

Here we show that both randomness and ensemble are important in our algorithm: if we remove either component, the performance drops significantly. Moreover, some naive ways of adding random noise and ensembling do not work.

First, as mentioned before, the main idea of our model is to have an infinite number of models f_ε(w, x), each with a different ε, and to ensemble their results. A naive way to achieve this goal is to take a model pre-trained without noise and then, in the testing phase, create many noisy copies by turning on the noise layers with different small ε. However, Figure 2 shows that this approach (denoted as “Test noise only”) results in much worse performance (about 20% accuracy even without any attack). Therefore it is non-trivial to maintain good accuracy after adding random noise to a model trained without it. In our Random Self-Ensemble algorithm, in addition to adding noise in the testing phase, we also add the noise layers in the training phase, and this is important for obtaining good performance.

Second, we found that adding noise in the testing phase and then ensembling the predictions is also important. In Figure 2, we compare RSE with the version that only adds the noise layers in the training phase but not in the testing phase (so the prediction is f(w, x) instead of the ensemble over f_ε(w, x)). The results clearly show that the performance drops under weaker attacks. This shows that ensembling in the testing phase is important.

Figure 2: We test three models on CIFAR10 with the VGG16 network: in the first model, noise is added at both training and testing time; in the second, noise is added only at training time; in the last, noise is added only at testing time. As a comparison, we also plot the baseline model trained conventionally. For all models that are noisy at testing time, we automatically enable self-ensemble.

3.3 Resistance against black-box attacks (ZOO)

As discussed in Section 2.2, ZOO [4] is the most successful attack algorithm in the black-box setting and outperforms transfer attacks by a significant margin (see [4]). Interestingly, the accuracy of the ZOO attack is controlled by the noise added in our noise layers. Recall that ZOO crafts the adversarial example by solving an optimization problem similar to (1) with zeroth-order optimization, where the gradient is estimated by finite differences. However, with RSE, the gradient estimator computed by ZOO becomes

\frac{f_{\varepsilon_1}(x + h e_i) - f_{\varepsilon_2}(x - h e_i)}{2h},

where ε_1 and ε_2 are the (different) noise realizations drawn for the two queries. This is no longer an estimator of ∂f(x)/∂x_i even when h → 0, because ε_1 ≠ ε_2. Therefore, ZOO will not even converge.
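The toy sketch below illustrates this effect on a simple quadratic function: with a deterministic objective, the estimator (2) is accurate, while with fresh per-query noise (as in RSE) an error term of order σ/(2h) dominates the estimate; the function and noise level are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.ones(5)
h, sigma = 1e-4, 0.05

def f_clean(v):
    return float(v @ v)                                  # true gradient at x is 2*x

def f_noisy(v):
    return f_clean(v) + sigma * rng.standard_normal()    # fresh noise on every query

def fd_estimate(f, x, i, h):
    e = np.zeros_like(x)
    e[i] = 1.0
    return (f(x + h * e) - f(x - h * e)) / (2.0 * h)

print(fd_estimate(f_clean, x, 0, h))   # close to the true value 2.0
print(fd_estimate(f_noisy, x, 0, h))   # dominated by the noise term, far from 2.0
```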

4 Experiments

Datasets and network structure

We test our method on two datasets: CIFAR10 and STL10. We do not report results on MNIST, since it is a much easier dataset and existing defense methods such as [16, 27, 12, 11] can already effectively increase the image distortion required by adversarial attacks. On CIFAR10, we evaluate the performance on both VGG-16 [19] and ResNeXt [24]; on STL10 we copy and slightly modify a simple publicly available model (https://github.com/aaron-xichen/pytorch-playground), which we call “Model A”.

Defensive algorithms

We include the following defensive algorithms into comparison (their parameter settings can be found in Table 1):

  • Random Self-Ensemble (RSE): our proposed method.

  • Defensive distillation [18]: first train a teacher network at temperature T, then use the teacher network to train a student network with the same architecture and the same temperature. The student network is called the distilled network.

  • Robust optimization combined with BReLU activation [27]: we first replace all ReLU activations with BReLU activations, and then at training time we randomly perturb the training data with Gaussian noise, using the standard deviation suggested in [27].

  • Adversarial retraining with FGSM attacks [12, 11]: we first pre-train a neural network without adversarial examples. After that, at each step we select either an original data batch or an adversarial data batch generated by FGSM, each with a fixed probability, and continue training until convergence.

Attack models

Although there are many attacking methods, as discussed in Section 2, they differ greatly in attack strength. White-box attacks have more information about the targeted model and thus a higher success rate, making them the stronger challengers to defense models. We therefore choose the C&W attack [2] as a representative, since it is a powerful white-box attack, although it requires more computation. Moreover, we test our algorithm under untargeted attacks, since untargeted attacks are strictly harder to defend against than targeted attacks. In fact, the untargeted C&W attack is the most challenging attack for a defensive algorithm, and as the experiments in [2] show, the C&W attack should be the benchmark for defensive methods.

Measure

Unlike attacking models, which only need to operate on correctly classified images, a competitive defense must not only protect the model when attackers are present, but also keep good performance on clean data. Based on this, we compare the accuracy of the guarded models under different strengths of the C&W attack; the strength can be measured by the ℓ2-norm of the image distortion and is further controlled by the parameter c in (1). Note that an adversarial image is counted as correctly predicted under the C&W attack if and only if the original image is correctly classified and the C&W attack cannot find an adversarial example within a certain distortion level.
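A minimal sketch of this evaluation criterion follows; the `predict` and `attack` callables (e.g., the self-ensemble prediction and a C&W attack) and the distortion budget are placeholders.

```python
import torch

def robust_accuracy(predict, attack, loader, max_distortion):
    """An image counts as correct iff it is classified correctly AND the attack
    fails to find an adversarial example within the l2 distortion budget."""
    correct, total = 0, 0
    for x, y in loader:
        total += y.numel()
        clean_ok = predict(x) == y
        x_adv = attack(x, y)                              # e.g. the C&W attack
        dist = (x_adv - x).flatten(1).norm(dim=1)         # l2 distortion per image
        attack_failed = (predict(x_adv) == y) | (dist > max_distortion)
        correct += (clean_ok & attack_failed).sum().item()
    return correct / total
```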

Figure 3: The effect of noise level on robustness and generalization ability. Clearly random noise can improve the robustness of the model.
Figure 4: Comparison of accuracy under different levels of attack for the VGG16+CIFAR10 combination. We can see that the ensemble model achieves better accuracy under weak attacks.
Methods Settings
No defense Baseline model
RSE(for CIFAR10 + VGG16) Initial noise: 0.4, inner noise: 0.1, 50-ensemble
RSE(for CIFAR10 + ResNeXt) Initial noise: 0.1, inner noise 0.1, 50-ensemble
RSE(for STL10 + Model A) Initial noise: 0.4, inner noise: 0.1, 50-ensemble
Defensive distill Temperature = 40
Adversarial retraining FGSM adversarial examples
Robust Opt. + BReLU Following [27]
Table 1: Experiment setting for defensive methods
Figure 5: Testing accuracy (without attack) for different numbers of random models used in the ensemble.
Method (accuracy under C&W attack; attack strength c increases from left to right)
RSE (ours) 90.00% 86.06% 79.44% 67.19% 34.75%
Adv retraining 27.00% 9.81% 4.13% 3.69% 1.44%
Robust Opt+BReLU 75.06% 47.93% 30.94% 20.69% 13.50%
Distill 49.88% 17.69% 4.56% 3.13% 1.44%
No defense 30.38% 8.93% 5.06% 3.56% 2.19%
Table 2: Prediction accuracy of defense methods under the C&W attack with increasing values of c. We can clearly observe that RSE is the most robust model: its accuracy remains above 75% while the other methods drop to around 30% or below.
Figure 6: Comparison of accuracy on CIFAR10+{VGG16, ResNeXt} and STL10+Model A. We show both the change of accuracy and the average distortion with respect to the attack strength parameter c in the C&W attack. Our model (RSE) clearly outperforms all existing methods under strong attacks, in both accuracy and average distortion.
Figure 7: Targeted adversarial image distortion; each column indicates a defensive algorithm and each row an adversarial target (the original image is from the “ship” class, shown on the right side). The targeted C&W attack is run with a fixed value of c. Visually, the colored spots indicate the distortion of the images, so a successful defense should lead to more spots.
bird car cat deer dog frog horse plane truck
No defense 1.94 0.31 0.74 4.72 7.99 3.66 9.22 0.75 1.32
Defensive distill 6.55 0.70 13.78 2.54 13.90 2.56 11.36 0.66 3.54
Adv. retraining 2.58 0.31 0.75 6.08 0.75 9.01 6.06 0.31 4.08
Robust Opt. + BReLU 17.11 1.02 4.07 13.50 7.09 15.34 7.15 2.08 17.57
RSE(ours) 12.87 2.61 12.47 21.47 31.90 19.09 9.45 10.21 22.15
Table 3: Image distortion required for targeted attacks.

4.1 The effect of noise level

We first test the performance of RSE under different noise levels. We use Gaussian noise in all the noise layers of our network, and the standard deviation σ of the Gaussian controls the noise level. Recall that we call the noise layer before the first convolution layer the “init-noise” and all other noise layers the “inner-noise”.

In this experiment, we apply different noise levels in both the training and testing phases to see how different variances change the robustness as well as the generalization ability of the network. As an example, we use the VGG16+CIFAR10 combination; the result is shown in Figure 3.

As we can see, both the “init-noise” and the “inner-noise” are beneficial to the robustness of the neural network, but at the same time higher noise reduces the accuracy under weak attacks (small c). From Figure 3, we observe that a relatively large “init-noise” combined with a smaller “inner-noise” works well for input images scaled to a fixed range; the exact values we use are listed in Table 1, and we fix these parameters for all the experiments.

4.2 Self-ensemble

Next, we show that self-ensemble helps improve the test accuracy of our noisy model. As an example, we choose the VGG16+CIFAR10 combination, where the standard deviation of the initial noise layer is 0.4 and that of the other noise layers is 0.1 (see Table 1). We compare the 50-ensemble with the 1-ensemble (i.e., a single forward pass); the result can be found in Figure 4.

We find that the 50-ensemble outperforms the 1-ensemble when the attack is relatively weak (small c); this is because when the attack is weak, the majority vote of the noisy networks has lower variance and higher accuracy. On the other hand, when c (or equivalently the average distortion) is large enough, the ensemble model is worse. We conjecture that this is because, when the attack is strong enough, the majority of the random sub-models make a wrong prediction, while an individual model’s random behavior can occasionally beat the group decision; in this situation, self-ensemble may have a negative effect on accuracy.

Practically, if running time is a primary concern, it is not necessary to average over many ensemble copies. In fact, we find the accuracy saturates quickly with respect to the number of models; moreover, if we inject smaller noise, the ensemble effect is weaker and the accuracy saturates even earlier. Therefore, we find a 10-ensemble is good enough for testing accuracy; see Figure 5.

4.3 Comparing defense methods

Finally, we compare our RSE method with the other existing defensive algorithms. Note that we test all of them under the untargeted C&W attack, which is the most difficult setting for defenders.

The comparison across different datasets and networks can be found in Table 2 and Figure 6. As we can see, previous defense methods have little effect against C&W attacks. For example, Robust Opt+BReLU [27] is useful for CIFAR10+ResNeXt, but its accuracy is even worse than the undefended model for STL10+Model A. In contrast, our RSE method acts as a good defense across all cases. Specifically, RSE forces the attacker to find much more distorted adversarial images in order to launch a successful attack: as shown in Figure 6, for the same average distortion budget on CIFAR10+VGG16, the C&W attack succeeds far more often against the undefended network than against the network protected by RSE.

Apart from the accuracy under the C&W attack, we find that the distortion of the adversarial images also increases significantly; this can be seen in Figure 6 (2nd row): once c is large enough (so that none of the defensive algorithms works anymore), our RSE method achieves the largest distortion.

Although all the experiments above concern untargeted attacks, this does not mean targeted attacks are not covered: as discussed earlier, targeted attacks are harder for the attacker and easier to defend against. As an example, we test all the defensive algorithms on the CIFAR-10 dataset under a targeted attack. We randomly pick an image from CIFAR10 and plot the perturbations in Figure 7 (the exact numbers are in Table 3); to make them easier to print, we subtract the RGB channels from 255. One can easily see that the RSE method forces the adversarial images to be more distorted.

5 Conclusion

In this paper, we propose a new defensive algorithm called Random Self-Ensemble (RSE) to improve the robustness of deep neural networks against adversarial attacks. We show that our algorithm is equivalent to ensembling an infinite number of noisy models, and that the proposed training process ensures the ensemble model generalizes well. We further show that the algorithm is equivalent to adding a Lipschitz regularization, which improves the robustness of neural networks. Experimental results demonstrate that our method is very robust against state-of-the-art white-box attacks. Moreover, our method is simple, easy to implement, and can be easily embedded into an existing network.

References

  • [1] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. arXiv preprint arXiv:1705.07263, 2017.
  • [2] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
  • [3] P.-Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C.-J. Hsieh. Ead: Elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.04114, 2017.
  • [4] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999, 2017.
  • [5] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231, 2012.
  • [6] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
  • [7] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • [8] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [10] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. arXiv preprint arXiv:1705.08475, 2017.
  • [11] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015.
  • [12] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  • [13] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
  • [14] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • [15] D. Meng and H. Chen. Magnet: a two-pronged defense against adversarial examples. arXiv preprint arXiv:1705.09064, 2017.
  • [16] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697, 2016.
  • [17] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
  • [18] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.
  • [19] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [20] T. Strauss, M. Hanselmann, A. Junginger, and H. Ulmer. Ensemble methods as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1709.03423, 2017.
  • [21] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • [22] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • [23] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
  • [24] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431, 2016.
  • [25] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057, 2015.
  • [26] W. Xu, D. Evans, and Y. Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
  • [27] V. Zantedeschi, M.-I. Nicolae, and A. Rawat. Efficient defenses against adversarial attacks. arXiv preprint arXiv:1707.06728, 2017.