Adversarial Robustness via Adversarial Label-Smoothing

Morgane Goibert (m.goibert@criteo.com)
Elvis Dohmatob (e.dohmatob@criteo.com)
Criteo AI Lab
Abstract

We study Label-Smoothing as a means of improving the adversarial robustness of supervised deep-learning models. After establishing a thorough and unified framework, we propose several novel Label-Smoothing methods: adversarial, Boltzmann and second-best Label-Smoothing. On various datasets (MNIST, CIFAR10, SVHN) and models (linear models, MLPs, LeNet, ResNet), we show that these methods improve adversarial robustness against a variety of attacks (FGSM, BIM, DeepFool, Carlini-Wagner) by better taking account of the dataset geometry. The proposed Label-Smoothing methods have two main advantages: they can be implemented as a modified cross-entropy loss, thus do not require any modification of the network architecture nor do they lead to increased training times, and they improve both standard and adversarial accuracy.


1 Introduction

Neural Networks (NNs) have proved their efficiency in solving classification problems in areas such as computer vision krizhevsky2012imagenet (). Despite these successes, recent works have shown that NNs are sensitive to adversarial examples (e.g. szegedy2013intriguing ()), which is problematic for critical applications sitawarin2018darts (). Many strategies have thus been developed to improve robustness, and different attacks have been proposed to test these defenses. Broadly speaking, an adversarial attack succeeds when an image looks to a human like it belongs to a specific class, but the classifier misclassifies it. Despite the number of works on the subject goodfellow2014explaining (); fawzi2016robustness (); tanay2016boundary (); tramer2017space (), there is still no complete understanding of the adversarial phenomenon. Yet, the vulnerability of NNs to adversarial attacks suggests a shortcoming in the generalization of the network. As overconfidence in predictions hinders generalization, addressing it can be a good way to tackle adversarial attacks zheng2018improvement (). Label-Smoothing (LS) is a method which creates uncertainty in the labels of the dataset used to train a NN. This uncertainty helps to tackle the over-fitting issue, and LS can thus be an efficient way to address the adversarial attack phenomenon.

1.1 Notations and terminology

General.

We denote by $\mathcal{X} \subset \mathbb{R}^d$ the input space of dimension $d$, and by $\mathcal{Y} = \{1, \ldots, K\}$ the label space; $P$ denotes the true (unknown) joint distribution of $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$. $\Delta_K$ is the probability simplex $\{q \in \mathbb{R}_+^K \mid \sum_{k=1}^K q_k = 1\}$, identified with the set of probability distributions on $\mathcal{Y}$. An iid sample drawn from $P$ is written $\{(x_i, y_i)\}_{i=1}^n$. To avoid any ambiguity with the label $y$, we use boldface $\mathbf{y}$ to denote the one-hot encoding of $y$. The empirical distribution of the input-label pairs is written $\hat{P}_n$. A classifier is a measurable function $x \mapsto p_\theta(x)$ depending on a set of real parameters $\theta$, here a NN with several hidden layers, the last always being a softmax. The logits of the classifier (pre-softmax) are written $z(x) = (z_1(x), \ldots, z_K(x))$, with $z_k(x)$ the component for the $k$th class. The prediction vector of the classifier (post-softmax) is written $p_\theta(x) = \mathrm{softmax}(z(x))$.

Adversarial attacks.

An attacker constructs an adversarial example $x^{adv} = x + \delta$ based on a clean input $x$ by adding a perturbation $\delta$ to it. The goal of the attack is to have $x^{adv}$ classified differently from the true label of $x$. The norm of the perturbation vector $\delta$ measures the size of the attack. In this work, we limit ourselves to $\ell_\infty$-norm attacks, wherein the size of the attack is $\|\delta\|_\infty = \max_j |\delta_j|$. A tolerance threshold $\varepsilon$ controls the size of the attack: the attacker is only allowed to inflict perturbations of size $\|\delta\|_\infty \le \varepsilon$.

1.2 Related works

Works on adversarial robustness can be mainly divided into three fields: attacks, defenses, and understanding of the adversarial phenomenon.

Our work focuses on untargeted, white-box attacks, i.e. threat models that only seek to fool the NN (as opposed to tricking it into predicting a specific class) and have unlimited access to the NN parameters. State-of-the-art attacks include FGSM goodfellow2014explaining (), a very simple, fast and popular attack, BIM kurakin2016adversarial (), an iterative attack based on FGSM, DeepFool moosavi2016deepfool () and C&W carlini2017towards (). Note that the tolerance threshold $\varepsilon$ can be explicitly tuned in FGSM and BIM, but not in DeepFool and C&W.
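For concreteness, the following PyTorch sketch (ours, not the attack implementations used in our experiments) spells out the standard formulations of the two $\varepsilon$-tunable attacks, FGSM and its iterative variant BIM; it assumes a model that maps a batch of inputs in $[0, 1]$ to logits.

```python
# Hedged sketch (not the authors' code): minimal FGSM and BIM attacks in PyTorch.
# `model` is assumed to map a batch of inputs to logits; `eps` is the l-infinity budget;
# inputs are assumed to live in [0, 1] (e.g. images).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x CE(model(x), y))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()

def bim_attack(model, x, y, eps, step=0.01, n_iter=10):
    """Iterative FGSM (BIM): small signed-gradient steps, projected back into the eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into the l-inf ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```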

Many ideas have been proposed regarding defenses. The main one is adversarial training goodfellow2014explaining (), which consists in feeding a NN with both clean and adversarially-crafted data at training time. This defense method is used in this paper as a baseline for comparative purposes. Another important method is defensive distillation papernot2016distillation (); papernot2016effectiveness (), which is quite closely related to LS. This method trains a separate NN and uses its outputs as the input labels for the main NN. It was considered an efficient defense method until it was broken by the C&W attack carlini2017towards ().

LS was first introduced as a regularization method pereyra2017regularizing (); labelsmoothing (), but it was also briefly studied as a defense method in shafahi2018label (). In this paper, we generalize the idea of LS proposed and used in these three papers and propose three novel methods relevant to the adversarial issue. We develop theoretical as well as empirical results on the defensive potential of LS.

For a more thorough introduction to the field, interested readers can refer to surveys like akhtar2018threat (); zhang2018adversarial ().

1.3 Contributions overview

In Section 2, we develop a unified framework for Label-Smoothing (LS), propose a variety of new LS methods, the main one being Adversarial Label-Smoothing (ALS), and show that these LS methods all induce some kind of logit-squeezing, which results in robustness to adversarial attacks. In Section 3, we give a complete mathematical treatment of the effect of LS in a simple case, with regard to robustness to adversarial attacks. Section 4 reports empirical results on real datasets. In Section 5, we conclude and provide ideas for future work.

2 A unified framework for Label-Smoothing

In standard classification datasets, each example $x_i$ is hard-labeled with exactly one class $y_i$. Such overconfidence in the labels can lure a classification algorithm into over-fitting the input distribution labelsmoothing (). LS labelsmoothing (); labelsmoothingbis () is a resampling technique wherein one replaces the vector of probability one on the true class (the one-hot encoding $\mathbf{y}$) with a different vector $q$ which is "close" to $\mathbf{y}$. Precisely, LS withdraws a fraction of probability mass from the "real" class label and reallocates it to the other classes. As we will see, the choice of redistribution method is quite flexible, and leads to different LS methods.

Let $\mathrm{TV}(q, q') := \frac{1}{2}\|q - q'\|_1$ denote the Total-Variation distance between two probability vectors $q, q' \in \Delta_K$. For $\alpha \in [0, 1]$, define the uncertainty set $U_\alpha$ of acceptable label distributions as the set of joint distributions on the dataset whose conditional label distribution $q(\cdot \mid x_i)$ is within TV distance at most $\alpha$ of the one-hot encoding $\mathbf{y}_i$ of the observed label $y_i$, for every example $i$.

By direct computation, one has that $\mathrm{TV}(q, \mathbf{y}) = 1 - q_y$, and so the uncertainty set can be rewritten as the set of conditional label distributions satisfying $q(y_i \mid x_i) \ge 1 - \alpha$ for every example.

Any conditional label distribution from the uncertainty set can be written

$$q(\cdot \mid x_i) = (1 - \alpha)\,\mathbf{y}_i + \alpha\,\tilde{q}(\cdot \mid x_i), \quad \text{for some } \tilde{q}(\cdot \mid x_i) \in \Delta_K. \tag{1}$$

Different choices of $\tilde{q}$ thus lead to different Label-Smoothing methods, and the training of the NN corresponds to the following optimization problem:

$$\min_\theta\ \frac{1}{n}\sum_{i=1}^n \mathrm{SCE}\big(q(\cdot \mid x_i),\, p_\theta(x_i)\big), \tag{2}$$

where $\mathrm{SCE}$ is the smoothed cross-entropy loss (generalizing the standard cross-entropy loss), defined by $\mathrm{SCE}(q, p) := -\sum_{k=1}^K q_k \log p_k$.
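As a concrete reading of this definition, the following minimal PyTorch sketch (ours, not the code used in the experiments) implements the smoothed cross-entropy and checks that it reduces to the usual cross-entropy on one-hot labels.

```python
# Hedged sketch: the smoothed cross-entropy (SCE) loss, i.e. -sum_k q_k log p_k with an
# arbitrary label distribution q in place of the one-hot target. Names are illustrative.
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, q):
    """logits: (batch, K) raw scores; q: (batch, K) rows summing to 1."""
    log_p = F.log_softmax(logits, dim=1)
    return -(q * log_p).sum(dim=1).mean()

# Sanity check: with one-hot labels q, SCE reduces to the usual cross-entropy.
logits = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
q_onehot = F.one_hot(y, num_classes=10).float()
assert torch.allclose(smoothed_cross_entropy(logits, q_onehot),
                      F.cross_entropy(logits, y), atol=1e-6)
```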

It turns out that the optimization problem (2) can be rewritten as the optimization of a usual cross-entropy loss, plus a penalty term on the gap between the components of the logits (one logit per class) produced by the model on each example $x_i$.

Theorem 1 (General Label-Smoothing enforces logit-squeezing).

The optimization problem (2) is equivalent to the logit-regularized problem

$$\min_\theta\ \frac{1}{n}\sum_{i=1}^n \Big( \mathrm{CE}\big(\mathbf{y}_i,\, p_\theta(x_i)\big) + \alpha\, R_{\tilde{q}}(x_i, y_i; \theta) \Big),$$

where $\mathrm{CE}$ is the standard cross-entropy loss, and

$$R_{\tilde{q}}(x_i, y_i; \theta) := z_{y_i}(x_i) - \sum_{k=1}^K \tilde{q}(k \mid x_i)\, z_k(x_i),$$

where $z(x_i)$ is the logits vector for example $x_i$.

Proof.

See Appendix A.1. ∎
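For intuition, here is a sketch of the computation behind Theorem 1, in the notation introduced above (writing $z = z(x)$, $p = p_\theta(x) = \mathrm{softmax}(z)$ and $q = (1-\alpha)\mathbf{y} + \alpha\tilde{q}$ for a single example):

```latex
\begin{align*}
\mathrm{SCE}(q, p)
  &= -\sum_{k} q_k \log p_k
   = -(1-\alpha)\log p_y \;-\; \alpha \sum_k \tilde{q}_k \log p_k \\
  &= -(1-\alpha)\bigl(z_y - \operatorname{logsumexp}(z)\bigr)
     \;-\; \alpha \sum_k \tilde{q}_k \bigl(z_k - \operatorname{logsumexp}(z)\bigr) \\
  &= \underbrace{-z_y + \operatorname{logsumexp}(z)}_{\mathrm{CE}(\mathbf{y},\, p)}
     \;+\; \alpha \Bigl(z_y - \sum_k \tilde{q}_k z_k\Bigr).
\end{align*}
```

The second term is the logit-squeezing penalty: the gap between the true-class logit and a $\tilde{q}$-weighted average of the logits. Each choice of $\tilde{q}$ in Table 1 yields the corresponding penalty.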

In the next two parts, we present four different LS methods that are relevant for tackling the adversarial robustness issue. A summary of these methods is presented in Table 1.

Paper Name Induced logit penalty
 labelsmoothing () standard label-smoothing (SLS) $\alpha\,\big(z_{y_i}(x_i) - \tfrac{1}{K-1}\sum_{k \ne y_i} z_k(x_i)\big)$, see (5)
Our paper adversarial label-smoothing (ALS) $\alpha\,\big(z_{y_i}(x_i) - \min_k z_k(x_i)\big)$, see (4)
Our paper Boltzmann label-smoothing (BLS) $\alpha\,\big(z_{y_i}(x_i) - \sum_k \tilde{q}_T(k \mid x_i)\, z_k(x_i)\big)$, see (6)
Our paper second-best label-smoothing (SBLS) $\alpha\,\big(z_{y_i}(x_i) - \max_{k \ne y_i} z_k(x_i)\big)$, see (7)
Table 1: The different LS methods. They all derive from the general equation (1).

2.1 Adversarial Label-Smoothing

Adversarial Label-Smoothing (ALS) arises from using the worst admissible smoothed label for each example. To this end, consider the two-player game:

$$\min_\theta\ \max_{q \in U_\alpha}\ \frac{1}{n}\sum_{i=1}^n \mathrm{SCE}\big(q(\cdot \mid x_i),\, p_\theta(x_i)\big). \tag{3}$$

The inner problem in (3) has an analytic solution (see Appendix A.2), given by, for every example $(x_i, y_i)$:

$$q^\star(\cdot \mid x_i) = (1 - \alpha)\,\mathbf{y}_i + \alpha\,\mathbf{e}_{k^\star(x_i)}, \tag{4}$$

where $k^\star(x) \in \operatorname{argmin}_k z_k(x)$ is the index of the smallest component of the logits vector for input $x$, and $\mathbf{e}_{k^\star(x)}$ is its one-hot encoding.

Interpretation of ALS.

$\alpha$ acts as a smoothing parameter: if $\alpha = 0$, then $q^\star = \mathbf{y}$ and we recover hard labels. If $\alpha = 1$, the adversarial weights live in the sub-simplex spanned by the smallest components of the prediction vector $p_\theta(x)$. For $0 < \alpha < 1$, $q^\star$ is a proper convex combination of the two previous cases. Applying Theorem 1, we have:

Corollary 1 (ALS enforces logit-squeezing).

The logit-regularized problem equivalent to the ALS problem (3) is given by $\min_\theta\ \frac{1}{n}\sum_{i=1}^n \big( \mathrm{CE}(\mathbf{y}_i, p_\theta(x_i)) + \alpha\, R(x_i, y_i; \theta) \big)$, where $R(x_i, y_i; \theta) = z_{y_i}(x_i) - \min_k z_k(x_i)$.

For each data point $x_i$ with true label $y_i$, the logit-squeezing penalty term forces the model to refrain from making over-confident predictions, corresponding to a large gap $z_{y_i}(x_i) - \min_k z_k(x_i)$, which can lead to overfitting. This means that every class label receives a positive prediction output. The resulting models are less vulnerable to adversarial perturbations of the input $x$. One can also see ALS as the label analog of adversarial training goodfellow2014explaining (); kurakin2016adversarial (): instead of modifying the input data $x$, we modify the label data $y$. However, unlike adversarial training, ALS is attack-independent: it does not require choosing a specific attack method to train against.
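Concretely, the adversarial labels of Eqn. (4) can be built directly from the current logits. The following is a minimal PyTorch sketch (ours, with an illustrative helper name), not the exact code used in our experiments:

```python
# Hedged sketch: ALS smoothed labels (Eqn. (4)). For each example, keep mass (1 - alpha)
# on the true class and put the remaining alpha on the class with the smallest logit.
import torch
import torch.nn.functional as F

def als_labels(logits, y, alpha):
    """logits: (batch, K); y: (batch,) integer labels; returns (batch, K) label distributions."""
    K = logits.size(1)
    y_onehot = F.one_hot(y, num_classes=K).float()
    k_star = logits.argmin(dim=1)                          # index of the smallest logit
    worst_onehot = F.one_hot(k_star, num_classes=K).float()
    return (1.0 - alpha) * y_onehot + alpha * worst_onehot
```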

ALS implementation.

We noted that ALS only consists in redefining a loss, taking the smoothed cross-entropy instead of the classical cross-entropy. It is very simple to implement, and computationally as efficient as traditional training.

See algorithm 1 for an easy implementation of ALS.

  Input: training data $\{(x_i, y_i)\}_{i=1}^n$; a given model; smoothing parameter $\alpha$;
  for each epoch do
     for each mini-batch $(x, y)$ do
        Smooth labels via (4)
        Get predictions
        Compute loss
        Update model parameters via back-prop
     end for
  end for
Algorithm 1 Adversarial Label-Smoothing (ALS) training
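The following is a hedged, self-contained PyTorch sketch of Alg. 1 (ours, not the exact experimental code): it inlines the ALS labels of Eqn. (4) and the smoothed cross-entropy loss of Section 2; the model, data and hyperparameters are illustrative placeholders.

```python
# Hedged sketch of Algorithm 1 (ALS training) in PyTorch; names and hyperparameters are
# illustrative placeholders, not the paper's exact experimental settings.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def als_training(model, loader, alpha=0.1, epochs=5, lr=1e-3, device="cpu"):
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:                                   # each mini-batch
            x, y = x.to(device), y.to(device)
            logits = model(x)                                 # get predictions (logits)
            with torch.no_grad():                             # smooth labels via Eqn. (4)
                K = logits.size(1)
                q = (1 - alpha) * F.one_hot(y, K).float()
                q[torch.arange(len(y)), logits.argmin(dim=1)] += alpha
            loss = -(q * F.log_softmax(logits, dim=1)).sum(1).mean()  # smoothed CE loss
            opt.zero_grad()
            loss.backward()                                   # update parameters via back-prop
            opt.step()
    return model

# Toy usage on random data with a small MLP (illustrative only).
if __name__ == "__main__":
    X, Y = torch.randn(256, 784), torch.randint(0, 10, (256,))
    loader = DataLoader(TensorDataset(X, Y), batch_size=64, shuffle=True)
    mlp = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
    als_training(mlp, loader, alpha=0.1, epochs=1)
```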

2.2 Other Label-Smoothing methods

Standard Label-Smoothing.

Standard Label-Smoothing (SLS) is the method developed in labelsmoothing (). It corresponds to uniformly distributing the mass removed from the real class over the other classes. That is, the term $\tilde{q}$ in (1) is given by

$$\tilde{q}(\cdot \mid x_i) = \frac{1}{K-1}\sum_{k \ne y_i} \mathbf{e}_k. \tag{5}$$

In this case, the induced penalty is the gap between the true-class logit and the average of the other logits, $\alpha\,\big(z_{y_i}(x_i) - \frac{1}{K-1}\sum_{k \ne y_i} z_k(x_i)\big)$; for $\alpha = 1$ and a perfect model, this reduces to a norm penalty on the logits.

Boltzmann Label-Smoothing.

ALS puts weight on only two classes: the true class label (due to the constraint defining the uncertainty set) and the class label which minimizes the logit vector. It thus gives "two-hot" labels rather than truly "smoothed" labels. Replacing the hard-min with a soft-min in (4) leads to the so-called Boltzmann Label-Smoothing (BLS), defined by setting the term $\tilde{q}$ in (1) to

$$\tilde{q}(k \mid x_i) = \frac{\exp\big(-z_k(x_i)/T\big)}{\sum_{k'} \exp\big(-z_{k'}(x_i)/T\big)}, \tag{6}$$

i.e. the Boltzmann distribution with energy levels $z_k(x_i)$ at temperature $T > 0$. It interpolates between ALS (corresponding to $T \to 0$) and SLS (corresponding to $T \to \infty$).

Second-Best Label-Smoothing.

SLS, ALS and BLS give positive prediction outputs for every label because we add weight to either every label or the "worst" wrong label. However, in the problem we consider, it does not matter whether we fool the classifier by making it predict the "worst" or the "closest" wrong class. Therefore, a completely different approach consists in concentrating our effort and adding all the available mass on the "closest" wrong class label only. This leads to Second-Best Label-Smoothing (SBLS), defined by

$$\tilde{q}(\cdot \mid x_i) = \mathbf{e}_{k_2(x_i)}, \quad \text{where } k_2(x_i) \in \operatorname*{argmax}_{k \ne y_i} z_k(x_i). \tag{7}$$

The training problem can then be rewritten as $\min_\theta \frac{1}{n}\sum_{i=1}^n \big( \mathrm{CE}(\mathbf{y}_i, p_\theta(x_i)) + \alpha\,(z_{y_i}(x_i) - \max_{k \ne y_i} z_k(x_i)) \big)$. Note the correspondence with the opposite of the Hinge loss in the second term: this penalty tends to make the margin between the true-class prediction and the closest wrong-class prediction smaller.

Training with each of these Label-Smoothing methods (SLS, BLS, SBLS) can be implemented via Alg. 1, using Eqn. (5), (6) or (7) respectively instead of Eqn. (4) in the label-smoothing step of Alg. 1 (see the sketch after this paragraph). We finally obtain four different LS methods: ALS, BLS, SBLS and SLS. The effects of ALS in particular and of LS as a general method are investigated in Section 3, and each of the four methods is tested as a defense method in Section 4.
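The sketch below (ours, with illustrative names) shows the three other reallocation rules as drop-in replacements for the ALS labels used in the Alg. 1 sketch above; the argument `temp` plays the role of the Boltzmann temperature.

```python
# Hedged sketches of the SLS, BLS and SBLS label constructions (Eqns. (5)-(7)).
import torch
import torch.nn.functional as F

def sls_labels(logits, y, alpha):
    """SLS: spread the removed mass uniformly over the K-1 wrong classes."""
    K = logits.size(1)
    y_onehot = F.one_hot(y, num_classes=K).float()
    uniform_others = (1.0 - y_onehot) / (K - 1)
    return (1.0 - alpha) * y_onehot + alpha * uniform_others

def bls_labels(logits, y, alpha, temp=1.0):
    """BLS: reallocate according to a soft-min (Boltzmann) distribution over the logits."""
    K = logits.size(1)
    y_onehot = F.one_hot(y, num_classes=K).float()
    boltzmann = F.softmax(-logits / temp, dim=1)       # low temp -> ALS, high temp -> SLS
    return (1.0 - alpha) * y_onehot + alpha * boltzmann

def sbls_labels(logits, y, alpha):
    """SBLS: put all the removed mass on the runner-up (closest wrong) class."""
    K = logits.size(1)
    y_onehot = F.one_hot(y, num_classes=K).float()
    masked = logits.masked_fill(y_onehot.bool(), float("-inf"))   # exclude the true class
    second_best = F.one_hot(masked.argmax(dim=1), num_classes=K).float()
    return (1.0 - alpha) * y_onehot + alpha * second_best
```

Passing any of these functions in place of the ALS labels in the training sketch above reproduces the corresponding LS variant.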

3 Understanding the effects of LS

We now explore a simple example illustrating some of the implications of using LS with regard to standard accuracy and robustness to adversarial attacks.

3.1 Case study: the "triangular" dataset

(a) Illustration: std. accuracies are very close (Bayes: ; ALS: ); adv. accuracies () are not (Bayes: ; ALS: ). Correctly classified points within the dark area are adversarially misclassified.
(b) Results: triangular experiment run for different values of . The adversarial accuracy of ALS classifiers is better, especially for greater .
Figure 1: Triangular experiment

In a very simple set-up (univariate inputs, linear model, binary classification), we want to show how ALS impacts the decision boundary of the classifier, and how this improves adversarial robustness.

Let us consider a two-class problem where one class follows a triangular law and the other follows a law with overlapping density. The two density functions are represented in Figure 1(a).

We use a simple linear classifier whose post-softmax output gives the predicted class probabilities. Without loss of generality, the model predicts class 1 if and only if the input exceeds a threshold, and the optimal adversarial attack of strength $\varepsilon$ simply moves a correctly classified point by $\varepsilon$ towards this threshold.

Bayes classifier.

We easily check that if and if , so the Bayes classifier predicts class 1 if and only if .

The standard accuracy is then

and the adversarial one (i.e. accuracy for adversarial inputs )

ALS classifier.

To derive the optimal value for the ALS classifier, we need to compute and set it to 0, where

(8)

To do so, let’s explore the different possible values for the loss. The loss for an input depends on the value taken by . With ALS, we will have as much mass as possible on the class corresponding to the smallest prediction value. Thus:

  • Case (1):

  • Case (2):

  • Case (3):

  • Case (4):

Writing for the probabilities of each case respectively, we can now separate the integral in Eqn. 8, obtaining:

where , which depends on . Thus,

which yields

(9)

The constant depends on , and we can prove that Eqn. (9) admits a unique solution written . The ALS classifier thus predicts class 1 iff .

Analytic formulae for the adversarial accuracy of this ALS classifier are given in Appendix A.3. The results of these experiments are presented in Fig. 1(b). The boundary between the two classes shifts away from the Bayes boundary, which improves the adversarial accuracy without decreasing the standard accuracy.

We can see in Figure 1(a) that, unlike the Bayes boundary, the ALS decision boundary is shifted towards a region of low density for both laws. Therefore, few points are close to the decision boundary, so few adversarial attacks succeed. Traditional training does not take this geometrical aspect of the classification problem into account, while LS does.
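The following sketch (ours) illustrates this geometric effect numerically. The densities used here are assumed for illustration only and are not necessarily the exact laws of the triangular experiment: we fit a one-dimensional two-class logistic model with plain cross-entropy and with ALS, and print where the learned decision thresholds land.

```python
# Hedged 1-D illustration (assumed densities, not the paper's exact setup): train a
# univariate logistic model with plain CE and with ALS, and compare decision thresholds.
import numpy as np
import torch
import torch.nn.functional as F

rng = np.random.default_rng(0)
n = 2000
x0 = rng.triangular(left=-1.0, mode=0.0, right=1.0, size=n)   # class 0 (assumed density)
x1 = rng.triangular(left=0.0, mode=2.0, right=4.0, size=n)    # class 1 (assumed density)
x = torch.tensor(np.concatenate([x0, x1]), dtype=torch.float32).unsqueeze(1)
y = torch.tensor([0] * n + [1] * n)

def train(alpha):
    lin = torch.nn.Linear(1, 2)
    opt = torch.optim.SGD(lin.parameters(), lr=0.1)
    for _ in range(2000):
        logits = lin(x)
        with torch.no_grad():                                  # ALS labels (Eqn. (4))
            q = (1 - alpha) * F.one_hot(y, 2).float()
            q[torch.arange(len(y)), logits.argmin(dim=1)] += alpha
        loss = -(q * F.log_softmax(logits, dim=1)).sum(1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    w, b = lin.weight.detach().numpy(), lin.bias.detach().numpy()
    # Decision threshold: the point where the two logits are equal.
    return (b[0] - b[1]) / (w[1, 0] - w[0, 0])

print("CE threshold :", train(alpha=0.0))
print("ALS threshold:", train(alpha=0.2))
```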

3.2 Logit-squeezing and gradient-based method

Applying LS generates a logit-squeezing effect (see Theorem 1) which tends to prevent the model from being over-confident in its predictions. This effect was investigated in pereyra2017regularizing () and is illustrated in Fig. 2(a).

In addition to this impact on the logits and predictions, LS also has an effect on the logits' gradients with respect to the input $x$ (see Fig. 2(b)), which helps explain why ALS-trained models are more robust to adversarial attacks. As described in shafahi2018label (), using a linear approximation, an attack with perturbation $\delta$ is successful if $z_y(x + \delta) < z_k(x + \delta)$ for some class $k \ne y$. With an FGSM-like attack of strength $\varepsilon$, it thus works if $z_y(x) - z_k(x) < \varepsilon\,\big\| \nabla_x z_k(x) - \nabla_x z_y(x) \big\|_1$, i.e. if the logit gap is small compared to the gap between the corresponding logit gradients.

By reducing this logit-gradient gap, LS provides more robust models, at least against gradient-based attack methods.
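The following hedged sketch (ours) shows how this gradient gap can be measured empirically for a given model: for each example it computes the $\ell_1$ norm of the gradient of the difference between the strongest competing logit and the true-class logit, mirroring the linear-approximation argument above.

```python
# Hedged sketch: per-example "gradient gap" diagnostic under the linear approximation.
# For each input, compute || grad_x z_k(x) - grad_x z_y(x) ||_1 for the strongest wrong class k.
import torch

def logit_gradient_gaps(model, x, y):
    gaps = []
    for xi, yi in zip(x, y):
        xi = xi.unsqueeze(0).clone().requires_grad_(True)
        z = model(xi).squeeze(0)
        masked = z.clone().detach()
        masked[yi] = float("-inf")
        k = masked.argmax()                              # strongest competing class
        diff = z[k] - z[yi]
        grad, = torch.autograd.grad(diff, xi)            # grad_x (z_k - z_y)
        gaps.append(grad.abs().sum().item())             # l1 norm drives the FGSM bound
    return torch.tensor(gaps)

# Usage (illustrative names): compare the gap distributions of an ALS-trained and a
# baseline model, e.g. logit_gradient_gaps(model_als, x_batch, y_batch).median().
```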

(a) Regularization effect: illustration of logit squeezing as a function of the smoothing parameter. Darker indicates more confident predictions.
(b) Gradient gap reduction: density of the gradient gap for a linear model.
Figure 2: Effects of LS

3.3 Why does LS help adversarial robustness?

Pointwise, the smoothed cross-entropy loss induces different costs compared to the traditional CE loss. As discussed in Sec. 3.2, over-confidently classified points are more penalized. Likewise, very badly classified points are also more penalized. The model is thus forced to put the decision boundary in a region with few data points (see Section 3.1); otherwise, either the logit-squeezing penalty term or the cross-entropy term defined in Sec. 2.1 will be too high. The underlying geometry of the dataset is thus better taken into account than with traditional training: boundaries are closer to "the middle", i.e. the margin between two classes is bigger (similar to how SVMs operate), leading to increased robustness. The traditional CE loss, by contrast, induces a direct power relationship: the boundary between two classes is pushed close to the smaller class.

4 Experiments

We run the four different attacks on different set-ups (datasets: MNIST, CIFAR10, SVHN; models: MLP, LeNet, ResNet18). (In the CIFAR10 ResNet and SVHN LeNet set-ups, the number of iterations for the C&W attack is sub-optimal; more iterations would have broken the models, but C&W is hardly scalable because it requires a long time to run, especially on sophisticated models like ResNet.) For comparison purposes, we also run the attacks on reference models: the same models used in the experiments but without any LS or regularization (see black lines in Fig. 3(d) and values for Figs. 3(a), 3(b), 3(c)), and models trained using adversarial training against FGSM as defined in goodfellow2014explaining (), whose modified loss mixes the cross-entropy on clean inputs with the cross-entropy on FGSM-perturbed inputs (see purple lines in Fig. 3). Some results are shown in Fig. 3, and more numerical results are presented in Tables 2 to 6 (see Appendix A.4).

One can see that for all attacks but FGSM, at least one of our LS methods performs better than adversarial training (see Figs. 3(a), 3(b), 3(c)). This suggests that LS is a general defense method that can perform well against many different attacks (even on FGSM, where adversarial training is better, LS still performs well), unlike adversarial training, which is excellent but only against the attack it was trained on. However, the optimal value of the smoothing parameter is not universal, and seems to depend both on the set-up (dataset and NN architecture) and on the attack.

On the whole, ALS and BLS give better results than SLS and SBLS, except in one set-up (C&W attack on MNIST LeNet, Fig. 3(a)). They should thus be preferred when implementing LS. Furthermore, the temperature hyperparameter of the BLS method does not seem to have a great impact on the results, so a single default value works well in practice.

Altogether, we see that LS is a good candidate for improving the adversarial robustness of NNs.
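For completeness, here is a hedged sketch (ours, not the exact evaluation code) of how an adversarial accuracy of the kind reported in the appendix tables can be computed for a trained model under a single-step FGSM attack; the data loader is a placeholder.

```python
# Hedged sketch: adversarial accuracy of a trained model under a single-step FGSM attack.
import torch
import torch.nn.functional as F

def adversarial_accuracy(model, loader, eps, device="cpu"):
    model.to(device).eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()   # FGSM perturbation
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```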

(a) MNIST LeNet. for FGSM and BIM
(b) CIFAR10 ResNet. for FGSM and BIM
(c) SVHN LeNet. for FGSM and BIM
(d) FGSM: all set-ups
Figure 3: Experiments: Figs. 3(a), 3(b), 3(c) show the evolution of adversarial accuracy as a function of the smoothing parameter for different models against all attacks. Fig. 3(d) shows the evolution of adversarial accuracy for all models against FGSM.

5 Conclusion

We have proposed a general framework for Label-Smoothing (LS) as well as a variety of new LS methods (Section 2) as a way to alleviate the vulnerability of deep-learning image classification algorithms. We developed a theoretical understanding of LS (Theorem 1 and Section 3), and our results have been demonstrated empirically via experiments on real datasets (CIFAR10, MNIST, SVHN), neural-network models (MLP, LeNet, ResNet), and state-of-the-art attacks (FGSM, BIM, DeepFool, C&W).

LS improves the adversarial accuracy of neural networks, and can also boost standard accuracy, suggesting a connection between adversarial robustness and generalization. Even though our results (see Section 3) provide evidence that LS classifiers are more robust because they better take the dataset geometry into consideration, a deeper understanding of the adversarial phenomenon and of the representations learned by NNs would be desirable.

Moreover, compared to other robustification methods (e.g. adversarial training), the ease of implementation of LS is very appealing: it is simple, fast, and has one interpretable hyperparameter. Being essentially costless is one of the major benefits of LS. The experimental results (Section 4) could be extended to more NN architectures and datasets, which is left for future work.

References

  • (1) Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
  • (2) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • (3) Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pages 1632–1640, 2016.
  • (4) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • (5) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • (6) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  • (7) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016.
  • (8) Nicolas Papernot and Patrick McDaniel. On the effectiveness of defensive distillation. arXiv preprint arXiv:1607.05113, 2016.
  • (9) Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
  • (10) Gabriel Pereyra, George Tucker, Jan Chorowski, Lukasz Kaiser, and Geoffrey Hinton. Regularizing neural networks by penalizing confident output distributions. arXiv preprint arXiv:1701.06548, 2017.
  • (11) Ali Shafahi, Amin Ghiasi, Furong Huang, and Tom Goldstein. Label smoothing and logit squeezing: A replacement for adversarial training? 2018.
  • (12) Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Mung Chiang, and Prateek Mittal. Darts: Deceiving autonomous cars with toxic signs. arXiv preprint arXiv:1802.06430, 2018.
  • (13) C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, June 2016.
  • (14) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • (15) Thomas Tanay and Lewis Griffin. A boundary tilting perspective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690, 2016.
  • (16) Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017.
  • (17) David Warde-Farley. Adversarial perturbations of deep neural networks. 2016.
  • (18) Jiliang Zhang and Xiaoxiong Jiang. Adversarial examples: Opportunities and challenges. arXiv preprint arXiv:1809.04790, 2018.
  • (19) Qinghe Zheng, Mingqiang Yang, Jiajie Yang, Qingrui Zhang, and Xinxin Zhang. Improvement of generalization ability of deep cnn via implicit regularization in two-stage training process. IEEE Access, 6:15844–15869, 2018.

Appendix A Appendix

A.1 LS optimization program

Proof of Theorem 1.

Recall that $\mathbf{y}_i$ is the one-hot encoding of the label $y_i$ of example $x_i$, and write $q(\cdot \mid x_i) = (1-\alpha)\,\mathbf{y}_i + \alpha\,\tilde{q}(\cdot \mid x_i)$ as in (1). By direct computation, one has

$$\mathrm{SCE}\big(q(\cdot \mid x_i),\, p_\theta(x_i)\big) = \mathrm{CE}\big(\mathbf{y}_i,\, p_\theta(x_i)\big) + \alpha\,\Big(z_{y_i}(x_i) - \sum_{k=1}^K \tilde{q}(k \mid x_i)\, z_k(x_i)\Big),$$

where $z(x_i)$ is the logits vector for example $x_i$. Summing over the examples yields the claim. ∎

A.2 Analytic solution for the ALS formula

Lemma 1.

Let , , and . The general solution of the problem

(10)

is , where is any solution to the problem with , namely

Proof.

Consider the invertible change of variable which maps the simplex onto itself, with an explicit inverse.

It follows that

which is attained by

yielding . ∎

A.3 Triangular experiment: formulae for the ALS classifier accuracies

We have, assuming :

and thus

The ALS classifier predicts 1 iff , where is the solution of the equation

Assuming , the standard accuracy of this ALS classifier is given by:

and the adversarial one by:

A.4 Experiments: numerical results

The following tables show the adversarial accuracy for different model set-ups, defenses and attacks. Each table gives the adversarial accuracies for one attack (or the standard accuracies in Table 6). Accuracies for LS-regularized models are presented for three different choices of the smoothing parameter. For FGSM and BIM (Tables 2 and 3), we chose three different values of the attack strength. As an example, the adversarial accuracy against the FGSM attack for the BLS-regularized model in the MNIST LeNet set-up is shown in Table 3.

Moreover, we highlighted in color the best accuracy for each set-up and attack (or attack and strength in the case of FGSM and BIM). Each set-up corresponds to one color (e.g. light yellow for MNIST Linear and red for SVHN LeNet). If the best accuracy is lower than the accuracy obtained with random predictions (i.e. 0.1 in all our set-ups), it is not highlighted. For example, the best accuracy against FGSM in the CIFAR LeNet set-up is obtained by both an ALS- and a BLS-regularized NN. Overall, adversarial training is better on FGSM (more colors on the adversarial training lines in Table 3 compared to other defenses), but ALS and BLS are better on the other attacks. SBLS is better only against C&W. In Table 6, we see that ALS, BLS and SLS NNs are always better than or equivalent to a normal classifier (no regularization, no defense method) in terms of standard accuracy.

val.
ALS MNIST Linear 0.832 0.870 0.857 0.507 0.526 0.489 0.450 0.465 0.432
MNIST LeNet 0.957 0.959 0.954 0.847 0.732 0.639 0.822 0.561 0.130
CIFAR LeNet 0.098 0.160 0.002 0.069 0.147 0.002 0.067 0.151 0.001
CIFAR ResNet 0.245 0.365 0.433 0.099 0.125 0.123 / / /
SVHN LeNet 0.450 0.458 0.497 0.370 0.302 0.381 / / /
SLS MNIST Linear 0.818 0.848 0.840 0.532 0.470 0.412 0.506 0.437 0.384
MNIST LeNet 0.956 0.956 0.954 0.840 0.764 0.656 0.823 0.699 0.229
CIFAR LeNet 0.119 0.153 0.127 0.097 0.134 0.101 0.092 0.136 0.093
CIFAR ResNet 0.254 0.345 0.389 0.098 0.115 0.116 / / /
SVHN LeNet 0.404 0.472 0.395 0.353 0.312 0.195 / / /
BLS MNIST Linear 0.821 0.868 0.858 0.494 0.525 0.494 0.442 0.457 0.441
MNIST LeNet 0.956 0.958 0.954 0.838 0.741 0.642 0.812 0.616 0.131
CIFAR LeNet 0.085 0.160 0.122 0.055 0.141 0.107 0.055 0.130 0.114
CIFAR ResNet 0.252 0.346 0.450 0.096 0.129 0.138 / / /
SVHN LeNet 0.430 0.442 0.327 0.352 0.293 0.153 / / /
SBLS MNIST Linear 0.763 0.804 0.639 0.530 0.353 0.327 0.431 0.212 0.186
MNIST LeNet 0.955 0.956 0.929 0.855 0.695 0.491 0.836 0.279 0.141
CIFAR LeNet 0.103 0.143 0.136 0.061 0.098 0.068 0.058 0.085 0.053
CIFAR ResNet 0.191 0.159 0.231 0.077 0.079 0.093 / / /
SVHN LeNet 0.449 0.375 0.277 0.378 0.157 0.149 / / /
Normal classifier MNIST Linear 0.791 0.003 0.000
MNIST LeNet 0.956 0.498 0.022
CIFAR LeNet 0.049 0.002 0.009
CIFAR ResNet 0.090 0.052 /
SVHN LeNet 0.195 0.006 /
Adv. training MNIST Linear 0.914 0.829 0.936
MNIST LeNet 0.970 0.931 0.754
CIFAR LeNet 0.011 0.542 0.515
CIFAR ResNet 0.078 0.525 0.222
SVHN LeNet 0.191 0.815 0.724
Table 3: BIM
val.
ALS MNIST Linear 0.816 0.854 0.845 0.429 0.485 0.456 0.420 0.472 0.435
MNIST LeNet 0.947 0.946 0.945 0.813 0.614 0.606 0.811 0.560 0.484
CIFAR LeNet 0.064 0.140 0.000 0.055 0.137 0.000 0.054 0.136 0.000
CIFAR ResNet 0.102 0.167 0.248 0.046 0.076 0.074 / / /
SVHN LeNet 0.403 0.408 0.470 0.386 0.324 0.435 / / /
SLS MNIST Linear 0.801 0.829 0.826 0.495 0.447 0.398 0.492 0.439 0.391
MNIST LeNet 0.946 0.942 0.942 0.817 0.700 0.398 0.815 0.689 0.165
CIFAR LeNet 0.093 0.131 0.100 0.087 0.128 0.096 0.087 0.128 0.094
CIFAR ResNet 0.135 0.142 0.230 0.051 0.035 0.046 / / /
SVHN LeNet 0.371 0.421 0.322 0.362 0.345 0.193 / / /
BLS MNIST Linear 0.803 0.853 0.841 0.534 0.481 0.458 0.528 0.470 0.443
MNIST LeNet 0.946 0.944 0.947 0.835 0.645 0.532 0.834 0.603 0.389
CIFAR LeNet 0.062 0.140 0.103 0.056 0.136 0.097 0.055 0.135 0.097
CIFAR ResNet 0.153 0.142 0.265 0.060 0.043 0.075 / / /
SVHN LeNet 0.435 0.393 0.266 0.424 0.322 0.142 / / /
SBLS MNIST Linear 0.736 0.772 0.597 0.470 0.227 0.299 0.370 0.091 0.252
MNIST LeNet 0.945 0.945 0.914 0.841 0.359 0.239 0.808 0.126 0.072
CIFAR LeNet 0.074 0.114 0.090 0.057 0.077 0.036 0.055 0.068 0.028
CIFAR ResNet 0.054 0.021 0.045 0.010 0.008 0.018 / / /
SVHN LeNet 0.423 0.265 0.200 0.383 0.073 0.116 / / /
Normal classifier MNIST Linear 0.776 0.001 0.000
MNIST LeNet 0.946 0.114 0.000
CIFAR LeNet 0.015 0.000 0.000
CIFAR ResNet 0.003 0.000 /
SVHN LeNet 0.117 0.000 /
Adv. training MNIST Linear 0.899 0.128 0.001
MNIST LeNet 0.968 0.894 0.316
CIFAR LeNet 0.000 0.000 0.000
CIFAR ResNet 0.000 0.000 0.000
SVHN LeNet 0.007 0.000 0.000
Table 2: FGSM
Adv. acc.
val.
ALS MNIST Linear 0.037 0.092 0.341
MNIST LeNet 0.047 0.299 0.529
CIFAR LeNet 0.005 0.006 0.001
CIFAR ResNet 0.040 0.030 0.33
SVHN LeNet 0.020 0.010 0.008
SLS MNIST Linear 0.036 0.080 0.683
MNIST LeNet 0.035 0.183 0.380
CIFAR LeNet 0.006 0.007 0.008
CIFAR ResNet 0.073 0.049 0.057
SVHN LeNet 0.020 0.007 0.007
BLS MNIST Linear 0.033 0.088 0.414
MNIST LeNet 0.031 0.314 0.500
CIFAR LeNet 0.007 0.008 0.008
CIFAR ResNet 0.051 0.059 0.032
SVHN LeNet 0.024 0.006 0.006
SBLS MNIST Linear 0.035 0.152 0.516
MNIST LeNet 0.028 0.142 0.305
CIFAR LeNet 0.005 0.006 0.006
CIFAR ResNet 0.050 0.037 0.064
SVHN LeNet 0.024 0.014 0.038
Normal classifier MNIST Linear 0.025
MNIST LeNet 0.050
CIFAR LeNet 0.009
CIFAR ResNet 0.036
SVHN LeNet 0.031
Adv. training MNIST Linear 0.192
MNIST LeNet 0.175
CIFAR LeNet 0.020
CIFAR ResNet 0.052
SVHN LeNet 0.057
Table 5: CW
Adv. acc.
val.
ALS MNIST Linear 0.014 0.056 0.073
MNIST LeNet 0.020 0.042 0.084
CIFAR LeNet 0.000 0.002 0.000
CIFAR ResNet 0.022 0.098 0.205
SVHN LeNet 0.010 0.039 0.056
SLS MNIST Linear 0.013 0.025 0.031
MNIST LeNet 0.022 0.046 0.058
CIFAR LeNet 0.000 0.004 0.000
CIFAR ResNet 0.029 0.087 0.159
SVHN LeNet 0.006 0.044 0.044
BLS MNIST Linear 0.014 0.035 0.059
MNIST LeNet 0.018 0.046 0.062
CIFAR LeNet 0.000 0.004 0.000
CIFAR ResNet 0.019 0.089 0.231
SVHN LeNet 0.006 0.028 0.045
SBLS MNIST Linear 0.017 0.071 0.007
MNIST LeNet 0.016 0.128 0.474
CIFAR LeNet 0.002 0.000 0.014
CIFAR ResNet 0.063 0.173 0.106
SVHN LeNet 0.009 0.062 0.052
Normal classifier MNIST Linear 0.015
MNIST LeNet 0.026
CIFAR LeNet 0.000
CIFAR ResNet 0.015
SVHN LeNet 0.031
Adv. training MNIST Linear 0.037
MNIST LeNet 0.433
CIFAR LeNet 0.020
CIFAR ResNet 0.052
SVHN LeNet 0.195
Table 6: Std. accuracies
Std. acc.
val.
ALS MNIST Linear 0.979 0.981 0.976
MNIST LeNet 0.990 0.990 0.989
CIFAR LeNet 0.623 0.664 0.148
CIFAR ResNet 0.887 0.890 0.889
SVHN LeNet 0.890 0.894 0.879
SLS MNIST Linear 0.978 0.981 0.975
MNIST LeNet 0.989 0.990 0.986
CIFAR LeNet 0.628 0.638 0.643
CIFAR ResNet 0.885 0.894 0.895
SVHN LeNet 0.892 0.894 0.889
BLS MNIST Linear 0.978 0.981 0.976
MNIST LeNet 0.989 0.990 0.989
CIFAR LeNet 0.639 0.651 0.622
CIFAR ResNet 0.888 0.889 0.897
SVHN LeNet 0.891 0.890 0.841
SBLS MNIST Linear 0.977 0.977 0.949
MNIST LeNet 0.989 0.988 0.975
CIFAR LeNet 0.628 0.619 0.572
CIFAR ResNet 0.883 0.881 0.840
SVHN LeNet 0.886 0.883 0.816
Normal classifier MNIST Linear 0.980
MNIST LeNet 0.990
CIFAR LeNet 0.635
CIFAR ResNet 0.886
SVHN LeNet 0.884
Adv. training MNIST Linear 0.987
MNIST LeNet 0.983
CIFAR LeNet 0.621
CIFAR ResNet 0.747
SVHN LeNet 0.875
Table 4: DeepFool