Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers


Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, Sébastien Bubeck
Microsoft Research AI
{hadi.salman, gregyang, jerrl, penzhan, t-huzhan, ilyaraz, sebubeck}@microsoft.com
Reverse alphabetical order. Work done as part of the Microsoft AI Residency Program.
Abstract

Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural network-based classifiers that are provably robust to ℓ2-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably ℓ2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state of the art for provable ℓ2-defenses. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial.

 


1 Introduction

Neural networks have been very successful in tasks such as image classification and speech recognition, but have been shown to be extremely brittle to small, adversarially chosen perturbations of their inputs (Szegedy et al., 2013; Goodfellow et al., 2015). A classifier (e.g., a neural network) that correctly classifies an image x can be fooled by an adversary into misclassifying x + δ, where δ is an adversarial perturbation so small that x and x + δ are indistinguishable to the human eye. Recently, many works have proposed heuristic defenses intended to train models robust to such adversarial perturbations. However, most of these defenses were broken using more powerful adversaries (Carlini and Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018). This encouraged researchers to develop defenses that lead to certifiably robust classifiers, i.e., classifiers whose predictions for most test examples x can be verified to be constant within a neighborhood of x (Wong and Kolter, 2018; Raghunathan et al., 2018a). Unfortunately, these techniques do not immediately scale to the large neural networks that are used in practice.

To mitigate this limitation of prior certifiable defenses, a number of papers (Lecuyer et al., 2018; Li et al., 2018; Cohen et al., 2019) consider the randomized smoothing approach, which transforms any classifier f (e.g., a neural network) into a new smoothed classifier g that has certifiable ℓ2-norm robustness guarantees. This transformation works as follows.

Let f be an arbitrary base classifier which maps inputs in ℝ^d to classes in Y. Given an input x, the smoothed classifier g labels x as having the class that is most likely to be returned by the base classifier f when fed a noisy corruption x + δ, where δ is a vector sampled according to an isotropic Gaussian distribution N(0, σ²I).

As shown in Cohen et al. (2019), one can derive certifiable robustness for such smoothed classifiers via the Neyman-Pearson lemma. They demonstrate that for ℓ2 perturbations, randomized smoothing outperforms other certifiably robust classifiers that have been previously proposed. It is scalable to networks of any architecture and size, which makes it suitable for building robust real-world neural networks.

Our contributions

In this paper, we employ adversarial training to substantially improve on the previous certified robustness results of randomized smoothing (Lecuyer et al., 2018; Li et al., 2018; Cohen et al., 2019). We present, for the first time, a direct attack for smoothed classifiers. We then demonstrate how to use this attack to adversarially train smoothed models with not only boosted empirical robustness but also substantially improved certifiable robustness using the certification method of Cohen et al. (2019).

We demonstrate that our method outperforms all existing provably ℓ2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state of the art for provable ℓ2-defenses. For instance, at an ℓ2 radius of 0.5, our ResNet-50 ImageNet classifier achieves 56% provable top-1 accuracy, compared to the previous best provable accuracy of 49%. Similarly, our ResNet-110 CIFAR-10 classifier improves on the previous state of the art by up to 16 percentage points of certified accuracy. Our main results are reported in Tables 1 and 2 for ImageNet and CIFAR-10, respectively.

ℓ2 radius (ImageNet) 0.5 1.0 1.5 2.0 2.5 3.0 3.5
Cohen et al. (2019) (%) 49 37 29 19 15 12 9
Ours (%) 56 43 37 27 25 20 16
Table 1: Certified top-1 accuracy of our best ImageNet classifiers at various radii.
ℓ2 radius (CIFAR-10)
Cohen et al. (2019) (%) 60 43 32 23 17 14 12 10 8
Ours (%) 74 57 48 38 33 29 24 19 17
Table 2: Certified top-1 accuracy of our best CIFAR-10 classifiers at various radii.

2 Our techniques

Here, we describe our techniques for adversarial attacks and training on smoothed classifiers. We first require some background on randomized smoothing classifiers. For a more detailed description of randomized smoothing, see Cohen et al. (2019).

2.1 Background on randomized smoothing

Consider a classifier f from ℝ^d to classes Y. Randomized smoothing is a method that constructs a new, smoothed classifier g from the base classifier f. The smoothed classifier g assigns to a query point x the class that is most likely to be returned by the base classifier f under isotropic Gaussian noise perturbation of x, i.e.,

    g(x) = argmax_{c ∈ Y}  P_{δ ∼ N(0, σ²I)} [ f(x + δ) = c ]        (1)

The noise level σ is a hyperparameter of the smoothed classifier g which controls a robustness/accuracy tradeoff. Equivalently, this means that g(x) returns the class c whose decision region {x′ ∈ ℝ^d : f(x′) = c} has the largest measure under the distribution N(x, σ²I). Cohen et al. (2019) recently presented a tight robustness guarantee for the smoothed classifier g and gave Monte Carlo algorithms for certifying the robustness of g around x, or predicting the class of x using g, that succeed with high probability. This guarantee can in fact be obtained alternatively by explicitly computing the Lipschitz constant of the smoothed classifier, as we do in Appendix A.

Robustness guarantee for smoothed classifiers

The robustness guarantee presented by Cohen et al. (2019) is as follows: suppose that when the base classifier f classifies N(x, σ²I), the most probable class c_A is returned with probability p_A, and the “runner-up” class is returned with probability p_B. Then the smoothed classifier g is robust around x within the ℓ2 radius

    R = (σ/2) · ( Φ⁻¹(p_A) − Φ⁻¹(p_B) )        (2)

where Φ⁻¹ is the inverse of the standard Gaussian CDF. It is not clear how to compute p_A and p_B exactly (if f is given by a deep neural network, for example). Instead, Monte Carlo sampling is used to estimate a lower bound on p_A and an upper bound on p_B that hold with arbitrarily high probability over the samples. The result of (2) still holds if we replace p_A and p_B with these bounds.
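As a concrete illustration of how this guarantee is evaluated in practice, the following is a minimal sketch (not the authors' implementation) of the radius computation in (2), assuming scipy is available; the Clopper-Pearson bound used here to lower-bound p_A from Monte Carlo votes is one standard choice and may differ in details from the confidence procedure in the paper's Certify routine.

from scipy.stats import norm, beta

def clopper_pearson_lower(k, n, alpha=0.001):
    """One-sided lower confidence bound on a binomial proportion (k successes out of n)."""
    return 0.0 if k == 0 else beta.ppf(alpha, k, n - k + 1)

def certified_radius(pA_lower, pB_upper, sigma):
    """Radius from equation (2) with estimated bounds plugged in for p_A and p_B."""
    if pA_lower <= pB_upper:
        return 0.0  # cannot certify; a real Certify routine would abstain here
    return 0.5 * sigma * (norm.ppf(pA_lower) - norm.ppf(pB_upper))

# Example: 990 of 1000 noisy samples voted for the top class, sigma = 0.5.
pA = clopper_pearson_lower(990, 1000)
print(certified_radius(pA, 1.0 - pA, 0.5))  # uses the common choice p_B <= 1 - p_A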

2.2 SmoothAdv: Attacking smoothed classifiers

We now describe our attack against smoothed classifiers. To do so, it will first be useful to describe smoothed classifiers in a more general setting. Specifically, we consider a generalization of (1) to soft classifiers, namely, functions F : ℝ^d → P(Y), where P(Y) is the set of probability distributions over Y. Neural networks typically learn such soft classifiers, then use the argmax of the soft classifier as the final hard classifier. Given a soft classifier F, its associated smoothed soft classifier G is defined as

    G(x) = E_{δ ∼ N(0, σ²I)} [ F(x + δ) ]        (3)

Let f and F denote the hard and soft classifiers learned by the neural network, respectively, and let g and G denote the associated smoothed hard and smoothed soft classifiers. Directly finding adversarial examples for the smoothed hard classifier g is a somewhat ill-behaved problem because of the argmax, so we instead propose to find adversarial examples for the smoothed soft classifier G. Empirically, we found that doing so also finds good adversarial examples for the smoothed hard classifier. More concretely, given a labeled data point (x, y), we wish to find a point x̂ which maximizes the loss of G in an ℓ2 ball of radius ε around x, for some choice of loss function. As is canonical in the literature, we focus on the cross-entropy loss ℓ_CE. Thus, given a labeled data point (x, y), our (ideal) adversarial perturbation is given by the formula:

    x̂ = argmax_{‖x′ − x‖₂ ≤ ε}  ℓ_CE( G(x′), y )
       = argmax_{‖x′ − x‖₂ ≤ ε}  ( − log E_{δ ∼ N(0, σ²I)} [ F(x′ + δ)_y ] )        (S)

We will refer to (S) as the SmoothAdv objective, where F(·)_y denotes the probability assigned to the true class y. The SmoothAdv objective is highly non-convex, so, as is common in the literature, we optimize it via projected gradient descent (PGD) and variants thereof. It is hard to compute exact gradients for (S), so in practice we must use an estimator based on random Gaussian samples. There are a number of natural estimators for the gradient of the objective in (S), and the choice of estimator can dramatically change the performance of the attack. For more details, see Section 3.

We note that (S) should not be confused with the similar-looking objective

    argmax_{‖x′ − x‖₂ ≤ ε}  E_{δ ∼ N(0, σ²I)} [ ℓ_CE( F(x′ + δ), y ) ]        (4)

as suggested in Section G.3 of Cohen et al. (2019). There is a subtle, but very important, distinction between (S) and (4). Conceptually, solving (4) corresponds to finding an adversarial example of F that is robust to Gaussian noise. In contrast, (S) directly attacks the smoothed model, i.e., it tries to find adversarial examples that decrease the probability of correct classification under the smoothed soft classifier G. From this point of view, (S) is the right optimization problem for finding adversarial examples of G. This distinction turns out to be crucial in practice: empirically, Cohen et al. (2019) found attacks based on (4) not to be effective.
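To make the distinction concrete, here is a small illustrative sketch of our own (hypothetical names, assuming a PyTorch soft classifier F that outputs class probabilities for a batch of inputs): the first function estimates the SmoothAdv loss in (S), the log of the average probability, while the second estimates the objective in (4), the average of the log-probabilities.

import torch

def smoothadv_loss(F, x_adv, y, sigma, m):
    # -log E_delta[ F(x'+delta)_y ]: cross entropy of the *smoothed* soft classifier G
    noise = torch.randn(m, *x_adv.shape) * sigma
    probs = F(x_adv.unsqueeze(0) + noise)          # (m, num_classes)
    return -torch.log(probs[:, y].mean() + 1e-12)  # log of the average probability

def gaussian_augmentation_loss(F, x_adv, y, sigma, m):
    # E_delta[ -log F(x'+delta)_y ]: average cross entropy of the *base* soft classifier
    noise = torch.randn(m, *x_adv.shape) * sigma
    probs = F(x_adv.unsqueeze(0) + noise)
    return -torch.log(probs[:, y] + 1e-12).mean()  # average of the log-probabilities

By Jensen's inequality the first quantity is never larger than the second, and maximizing the two leads to different adversarial examples in general.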

Interestingly, for a large class of classifiers, including neural networks, one can alternatively derive the objective (S) from an optimization perspective, by attempting to directly find adversarial examples for the smoothed hard classifier g that the neural network provides. While this perspective ultimately yields the same objective, it may also be enlightening, so we include it in Appendix B.

2.3 Adversarial training using SmoothAdv

We now wish to use our new attack to boost the adversarial robustness of smoothed classifiers. We do so using the well-studied adversarial training framework (Kurakin et al., 2016; Madry et al., 2017). In adversarial training, given a current set of model weights w and a labeled data point (x, y), one finds an adversarial perturbation x̂ of x for the current model, and then takes a gradient step for the model parameters evaluated at the point (x̂, y). Intuitively, this encourages the network to learn to minimize the worst-case loss over a neighborhood around the input.

At a high level, we propose to instead do adversarial training using an adversarial example for the smoothed classifier. We combine this with the approach suggested in Cohen et al. (2019) and train at Gaussian perturbations of this adversarial example. That is, given a current set of weights w and a labeled data point (x, y), we find x̂ as a solution to (S), and then take a gradient step for w based at Gaussian perturbations of x̂. In contrast to standard adversarial training, we are training the base classifier so that its associated smoothed classifier minimizes worst-case loss in a neighborhood around the current point. For more details of our implementation, see Section 3.2. We emphasize that although we are training using adversarial examples for the smoothed soft classifier, in the end we certify the robustness of the smoothed hard classifier we obtain after training.
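In our own notation (which does not appear verbatim above), the training procedure just described can be summarized in LaTeX as the following min-max problem over the smoothed soft classifier, where θ denotes the weights of the base soft classifier F_θ and D the data distribution:

\min_{\theta} \; \mathbb{E}_{(x,y)\sim \mathcal{D}} \Big[ \max_{\|x'-x\|_2 \le \epsilon} \ell_{\mathrm{CE}}\big(G_\theta(x'),\, y\big) \Big],
\qquad \text{where} \quad G_\theta(x') = \mathbb{E}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[F_\theta(x'+\delta)\big].

In practice the inner maximization is approximated by the SmoothAdv attack of Section 2.2, and the outer expectation by mini-batch SGD with Gaussian data augmentation.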

We make two important observations about our method. First, adversarial training is an empirical defense, and typically offers no provable guarantees. However, we demonstrate that by combining our formulation of adversarial training with randomized smoothing, we are able to substantially boost the certifiable robust accuracy of our smoothed classifiers. Thus, while adversarial training using SmoothAdv is still ultimately a heuristic, and offers no provable robustness by itself, the smoothed classifier that we obtain using this heuristic has strong certifiable guarantees.

Second, we found empirically that to obtain strong certifiable numbers using randomized smoothing, it is insufficient to use standard adversarial training on the base classifier. While such adversarial training does indeed offer good empirical robust accuracy, the resulting classifier is not optimized for randomized smoothing. In contrast, our method specifically finds base classifiers whose smoothed counterparts are robust. As a result, the certifiable numbers for standard adversarial training are noticeably worse than those obtained using our method. See Appendix C.1 for an in-depth comparison.

3 Implementing SmoothAdv via first order methods

As mentioned above, it is difficult to optimize the SmoothAdv objective, so we will approximate it via first order methods. We focus on two such methods: the well-studied projected gradient descent (PGD) method (Kurakin et al., 2016; Madry et al., 2017), and the recently proposed decoupled direction and norm (DDN) method (Rony et al., 2018) which achieves robust accuracy competitive with PGD on CIFAR-10.

The main task when implementing these methods is, given a data point (x, y), to compute the gradient of the objective function in (S) with respect to x′. If we let J(x′) := − log E_{δ ∼ N(0, σ²I)} [ F(x′ + δ)_y ] denote the objective function in (S), we have

    ∇_{x′} J(x′) = − ∇_{x′} log ( E_{δ ∼ N(0, σ²I)} [ F(x′ + δ)_y ] )        (5)

However, it is not clear how to evaluate (5) exactly, as it takes the form of a complicated high-dimensional integral. Therefore, we use Monte Carlo approximations: we sample i.i.d. Gaussians δ_1, …, δ_m ∼ N(0, σ²I) and use the plug-in estimator for the expectation:

    ∇_{x′} J(x′) ≈ − ∇_{x′} log ( (1/m) Σ_{i=1}^{m} F(x′ + δ_i)_y )        (6)

It is not hard to see that if F is smooth, this estimator converges to (5) as we take more samples. In practice, if we take m samples, then evaluating (6) requires m evaluations of the network. This becomes expensive for large m, especially if we want to plug it into the adversarial training framework, which is already slow. Thus, when we use this estimator for adversarial training, we use a small number of samples; when we run this attack to evaluate the empirical adversarial accuracy of our models, we use substantially larger choices of m. Empirically, we found that increasing m beyond that point did not substantially improve performance.

While this estimator does converge to the true gradient given enough samples, note that it is not an unbiased estimator of the gradient. Despite this, we found that using (6) performs very well in practice. Indeed, using (6) yields our strongest empirical attacks, as well as our strongest certifiable defenses when we use this attack in adversarial training. In the remainder of the paper, we let SmoothAdv_PGD denote the PGD attack with gradient steps given by (6), and similarly we let SmoothAdv_DDN denote the DDN attack with gradient steps given by (6).
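As a concrete, simplified sketch of what a SmoothAdv_PGD attack looks like with the estimator (6), the following PyTorch code attacks a single example; the step size, projection, and clipping to [0, 1] are our own choices and need not match the released implementation, and `model` is assumed to output logits.

import torch
import torch.nn.functional as F

def smoothadv_pgd(model, x, y, sigma, eps, steps=10, m=128):
    """l2 PGD on the SmoothAdv objective; gradients estimated as in (6)."""
    x_adv = x.clone()
    step_size = 2.0 * eps / steps
    for _ in range(steps):
        x_adv.requires_grad_(True)
        noise = torch.randn(m, *x.shape, device=x.device) * sigma
        probs = F.softmax(model(x_adv.unsqueeze(0) + noise), dim=1)   # (m, num_classes)
        loss = -torch.log(probs[:, y].mean() + 1e-12)                 # plug-in estimate of (S)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # ascent step along the normalized gradient, then projection onto the l2 ball
            x_adv = x_adv + step_size * grad / (grad.norm() + 1e-12)
            delta = x_adv - x
            delta = delta * torch.clamp(eps / (delta.norm() + 1e-12), max=1.0)
            x_adv = (x + delta).clamp(0.0, 1.0)
    return x_adv.detach()

Here steps, m, eps, and sigma play the role of the hyperparameters discussed in Section 4.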

3.1 An unbiased, gradient free method

We note that there is an alternative way to optimize (S) using first-order methods. Notice that the logarithm in (S) does not change the argmax, so it suffices to find a minimizer of E_{δ ∼ N(0, σ²I)} [ F(x′ + δ)_y ] subject to the constraint ‖x′ − x‖₂ ≤ ε. We then observe that

    ∇_{x′} E_{δ ∼ N(0, σ²I)} [ F(x′ + δ)_y ]  =^{(a)}  E_{δ ∼ N(0, σ²I)} [ (δ / σ²) · F(x′ + δ)_y ]        (7)

The equality (a) is known as Stein’s lemma (Stein, 1981), although we note that something similar can be derived for more general distributions. There is a natural unbiased estimator for (7): sample i.i.d. Gaussians δ_1, …, δ_m ∼ N(0, σ²I) and form the estimator (1/m) Σ_{i=1}^{m} (δ_i / σ²) F(x′ + δ_i)_y. This estimator has a number of nice properties. As mentioned previously, it is an unbiased estimator of (7), in contrast to (6). It also requires no computation of the gradient of F; if F is a neural network, this saves both time and memory by not storing preactivations during the forward pass. Finally, it is very general: the derivation of (7) actually holds even if F is a hard classifier (or, more precisely, the one-hot embedding of a hard classifier). In particular, this implies that this technique can even be used to directly find adversarial examples of the smoothed hard classifier.
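The following is a small sketch of that estimator (names and shapes are ours); only forward passes through the soft classifier are needed, and a descent step on E_δ[F(x′+δ)_y] would move in the direction of the negative of this estimate.

import torch

@torch.no_grad()
def stein_gradient_estimate(soft_classifier, x_adv, y, sigma, m=128):
    """Unbiased Monte Carlo estimate of grad_{x'} E_delta[ F(x'+delta)_y ] via (7)."""
    noise = torch.randn(m, *x_adv.shape, device=x_adv.device) * sigma
    probs = soft_classifier(x_adv.unsqueeze(0) + noise)[:, y]   # (m,) values of F(x'+delta_i)_y
    weights = probs.view(m, *([1] * x_adv.dim()))               # reshape for broadcasting against noise
    return (noise * weights).mean(dim=0) / sigma**2             # (1/m) sum_i (delta_i / sigma^2) F(x'+delta_i)_y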

Despite these appealing features, in practice we find that this attack is quite weak. We speculate that this is because the variance of the gradient estimator is too high. For this reason, in the empirical evaluation we focus on attacks using (6), but we believe that investigating this attack in practice is an interesting direction for future work. See Appendix C.6 for more details.

  function TrainMiniBatch((x_1, y_1), …, (x_B, y_B))
   Attacker ← SmoothAdv_PGD (or SmoothAdv_DDN)
   Generate noise samples δ_i^(1), …, δ_i^(m) ∼ N(0, σ²I) for i = 1, …, B
   L ← [ ]   # List of adversarial examples for training
   for i = 1, …, B do
    x_i′ ← x_i   # Adversarial example
    for t = 1, …, T do
     Update x_i′ according to the t-th step of Attacker, where we use
     the noise samples δ_i^(1), δ_i^(2), …, δ_i^(m) to estimate a gradient of the loss of the smoothed
     model according to (6)
     # We are reusing the same noise samples between different steps of the attack
    end
    Append (x_i′ + δ_i^(1), y_i), …, (x_i′ + δ_i^(m), y_i) to L
    # Again, we are reusing the same noise samples for the augmentation
   end
   Run backpropagation on L with an appropriate learning rate
Pseudocode 1: SmoothAdv-ersarial Training

3.2 Implementing adversarial training for smoothed classifiers

We incorporate adversarial training into the approach of Cohen et al. (2019), changing as few moving parts as possible in order to enable a direct comparison. In particular, we use the same network architectures, batch size, and learning rate schedule. For CIFAR-10, we change the number of epochs, but for ImageNet, we leave it the same. We discuss more of these specifics in Appendix D; here we describe how to perform adversarial training on a single mini-batch. The algorithm is shown in Pseudocode 1, with the following parameters: B is the mini-batch size, m is the number of noise samples used for gradient estimation in (6) as well as for Gaussian noise data augmentation, and T is the number of steps of the attack. Note that we reuse the same noise samples during every step of our attack as well as during augmentation; intuitively, this helps to stabilize the attack process. A simplified PyTorch rendering of a single-example training step appears below.
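This is a minimal single-example rendering of Pseudocode 1 (batching, the learning-rate schedule, and the ε warm-up of Appendix D are omitted); it reuses one fixed set of noise samples for both the attack steps and the Gaussian augmentation, as described above. All names are our own, and the inlined attack is a simplified ℓ2 PGD rather than the exact released implementation.

import torch
import torch.nn.functional as F

def smoothadv_training_step(model, optimizer, x, y, sigma, eps, steps, m):
    noise = torch.randn(m, *x.shape, device=x.device) * sigma      # reused throughout
    # --- attack: T steps of l2 PGD on the SmoothAdv objective, with fixed noise ---
    x_adv, step_size = x.clone(), 2.0 * eps / steps
    for _ in range(steps):
        x_adv.requires_grad_(True)
        avg_prob = F.softmax(model(x_adv.unsqueeze(0) + noise), dim=1)[:, y].mean()
        grad, = torch.autograd.grad(-torch.log(avg_prob + 1e-12), x_adv)
        with torch.no_grad():
            x_adv = x_adv + step_size * grad / (grad.norm() + 1e-12)
            delta = x_adv - x
            x_adv = x + delta * torch.clamp(eps / (delta.norm() + 1e-12), max=1.0)
    x_adv = x_adv.detach()
    # --- training: Gaussian augmentation of the adversarial example, same noise ---
    logits = model(x_adv.unsqueeze(0) + noise)                     # (m, num_classes)
    targets = torch.full((m,), y, dtype=torch.long, device=x.device)
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()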

4 Experiments

We primarily compare with Cohen et al. (2019), as it was shown to outperform all other scalable provable defenses by a wide margin. As our experiments will demonstrate, our method consistently and significantly outperforms Cohen et al. (2019), establishing the state of the art for provable ℓ2-defenses. We run experiments on ImageNet (Deng et al., 2009) and CIFAR-10 (Krizhevsky and Hinton, 2009). We use the same base classifiers as Cohen et al. (2019): a ResNet-50 (He et al., 2016) on ImageNet, and a ResNet-110 on CIFAR-10. Other than the choice of attack (SmoothAdv_PGD or SmoothAdv_DDN) for adversarial training, our experiments are distinguished by four main hyperparameters:

  • ε: the maximum allowed ℓ2 perturbation of the attack used during training;
  • T: the number of steps of the attack;
  • m: the number of noise samples used to estimate the gradient in (6) during the attack;
  • σ: the noise level used for Gaussian data augmentation and for the resulting smoothed classifier.

Given a smoothed classifier g, we use the same prediction and certification algorithms, Predict and Certify, as Cohen et al. (2019). Both algorithms sample base classifier predictions under Gaussian noise. Predict outputs the majority vote if the vote count passes a binomial hypothesis test, and abstains otherwise. Certify certifies that the majority vote is robust if the fraction of such votes is higher, by a calculated margin, than the fraction of the next most popular votes, and abstains otherwise. For details of these algorithms, we refer the reader to Cohen et al. (2019).

The certified accuracy at radius r is defined as the fraction of the test set which g classifies correctly (without abstaining) and certifies robust at an ℓ2 radius r. Unless otherwise specified, we use the same σ for certification as the one used for training the base classifier f. Note that g is a randomized smoothing classifier, so this reported accuracy is approximate, but it can get arbitrarily close to the true certified accuracy as the number of Monte Carlo samples increases (see Cohen et al. (2019) for more details). Similarly, the empirical accuracy is defined as the fraction of the SmoothAdv-ersarially attacked test set which g classifies correctly (without abstaining).

Both Predict and Certify have a parameter α defining the failure rate of these algorithms. Throughout the paper, we set α = 0.001 (similar to Cohen et al. (2019)), which means there is at most a 0.1% chance that Predict returns a class other than the most probable class under the smoothed classifier g, or that Certify falsely certifies a non-robust input.
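For concreteness, here is a sketch of the Predict procedure as we understand it from the description above (majority vote plus a two-sided binomial test between the two most frequent classes); the exact test and batching in the released code may differ, and all names here are our own.

import torch
from collections import Counter
from scipy.stats import binomtest

ABSTAIN = -1

@torch.no_grad()
def predict(base_classifier, x, sigma, n=100, alpha=0.001, batch=100):
    votes = Counter()
    remaining = n
    while remaining > 0:
        b = min(batch, remaining)
        noise = torch.randn(b, *x.shape, device=x.device) * sigma
        votes.update(base_classifier(x.unsqueeze(0) + noise).argmax(dim=1).tolist())
        remaining -= b
    (top_class, n1), (_, n2) = (votes.most_common(2) + [(None, 0)])[:2]
    # abstain unless the top class is significantly more frequent than the runner-up
    if binomtest(n1, n1 + n2, 0.5).pvalue > alpha:
        return ABSTAIN
    return top_class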

Figure 1: Comparing our SmoothAdv-ersarially trained CIFAR-10 classifiers vs. Cohen et al. (2019). (Left) Upper envelopes of certified accuracies over all experiments. (Middle) Upper envelopes of certified accuracies per σ. (Right) Certified accuracies of one representative model per σ. Details of each model used to generate these plots and their certified accuracies are in Tables 4-12 in Appendix E.
Figure 2: Comparing our SmoothAdv-ersarially trained ImageNet classifiers vs Cohen et al. (2019). Subfigure captions are same as Fig. 1. Details of each model used to generate these plots and their certified accuracies are in Table 3 in Appendix E.

4.1 SmoothAdv-ersarial training

To assess the effectiveness of our method, we learn smoothed classifiers that are adversarially trained using (S). We then compute the certified accuracies over a range of ℓ2 radii r. Tables 1 and 2 report the certified accuracies of our method compared to Cohen et al. (2019). For all radii, we outperform the certified accuracies of Cohen et al. (2019) by a significant margin on both ImageNet and CIFAR-10. These results are elaborated below.

For CIFAR-10

Fig. 1 (left) plots the upper envelope of the certified accuracies that we get by choosing the best model for each radius over a grid of hyperparameters. This grid consists of several values of ε, σ, and m (see Section 4 for an explanation), and one of the attacks {SmoothAdv_PGD, SmoothAdv_DDN} run for T steps. The certified accuracies of each model can be found in Tables 4-12 in Appendix E. These results are compared to those of Cohen et al. (2019) by plotting their reported certified accuracies. Fig. 1 (left) also plots the corresponding empirical accuracies under our SmoothAdv attack. Note that our certified accuracies are higher than the empirical accuracies of Cohen et al. (2019).

Fig. 1 (middle) plots our best models vs. those of Cohen et al. (2019) for varying noise level σ. Fig. 1 (right) plots a representative model for each σ from our adversarially trained models. Observe that we outperform Cohen et al. (2019) in all three plots.

For ImageNet

The results are summarized in Fig. 2, which is the ImageNet analogue of Fig. 1, the difference being the set of smoothed models we certify. This set includes smoothed models trained using several values of ε, σ, and m, and attacks run with 1 or 2 steps. Again, our models outperform those of Cohen et al. (2019), both overall and per σ. The certified accuracies of each model can be found in Table 3 in Appendix E.

We point out, as mentioned by Cohen et al. (2019), that σ controls a robustness/accuracy trade-off. When σ is low, small radii can be certified with high accuracy, but large radii cannot be certified at all. When σ is high, larger radii can be certified, but smaller radii are certified at a lower accuracy. This can be observed in the middle and right plots of Figs. 1 and 2.

Effect on clean accuracy

Training smoothed classifiers using SmoothAdv improves upon the certified accuracy of Cohen et al. (2019) for each σ, as shown above. However, this comes with the well-known effect of adversarial training of decreasing the standard accuracy, so we sometimes see small drops in the accuracy at radius 0, as observed in Fig. 1 (right) and Fig. 2 (right).

Additional experiments and observations

We compare the effectiveness of smoothed classifiers when they are trained SmoothAdv-ersarially vs. when their base classifier is trained via standard adversarial training (we will refer to the latter as vanilla adversarial training). As expected, because the training objective of SmoothAdv models aligns with the actual certification objective, those models achieve noticeably higher certified robustness over all radii compared to smoothed classifiers resulting from vanilla adversarial training. We defer the results and details to Appendix C.1.

Furthermore, SmoothAdv requires the evaluation of (6), as discussed in Section 3. In Appendix C.2, we analyze how the number of Gaussian noise samples m used in (6) to find adversarial examples affects the robustness of the resulting smoothed models. As expected, we observe that models trained with higher m tend to have higher certified accuracies.

Finally, we analyze the effect of the maximum allowed ℓ2 perturbation ε used in SmoothAdv on the robustness of smoothed models in Appendix C.3.

4.2 Attacking trained models with SmoothAdv

Figure 3: Certified and empirical robust accuracy of the models of Cohen et al. (2019) on CIFAR-10. For each radius r, the certified/empirical accuracy is the maximum over randomized smoothing models trained using various σ. The empirical accuracies are found using 20 steps of SmoothAdv_PGD. The closer an empirical curve is to the certified curve, the stronger the corresponding attack (the lower the better).

In this section, we assess the performance of our attack, in particular SmoothAdv_PGD, at finding adversarial examples for the CIFAR-10 randomized smoothing models of Cohen et al. (2019).

SmoothAdv_PGD requires the evaluation of (6), as discussed in Section 3. Here, we analyze how sensitive our attack is to the number of samples m used in (6) for estimating the gradient of the adversarial objective. Fig. 3 shows the empirical accuracies for various values of m; lower accuracy corresponds to a stronger attack. SmoothAdv with a single sample performs worse than the vanilla PGD attack on the base classifier, but as m increases, our attack becomes stronger, decreasing the gap between certified and empirical accuracies. We did not observe any noticeable improvement beyond a certain number of samples. More details are in Appendix C.4.

While, as discussed here, the success rate of the attack is affected by the number of Gaussian noise samples m used by the attacker, it is also affected by the number of Gaussian noise samples n used in Predict by the classifier. Indeed, as n increases, abstention due to low confidence becomes rarer, increasing the prediction quality of the smoothed classifier. See a detailed analysis in Appendix C.5.

5 Related Work

Recently, many approaches (defenses) have been proposed to build adversarially robust classifiers, and these approaches can be broadly divided into empirical defenses and certified defenses.

Empirical defenses are empirically robust to existing adversarial attacks, and the best empirical defense so far is adversarial training (Kurakin et al., 2016; Madry et al., 2017). In this kind of defense, a neural network is trained to minimize the worst-case loss over a neighborhood around the input. Although such defenses seem powerful, nothing guarantees that a more powerful, not yet known, attack would not break them; the most that can be said is that known attacks are unable to find adversarial examples around the data points. In fact, most empirical defenses proposed in the literature were later “broken” by stronger adversaries (Carlini and Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018; Athalye and Carlini, 2018). To stop this arms race between defenders and attackers, a number of works have focused on building certified defenses, which enjoy formal robustness guarantees.

Certified defenses are provably robust to a specific class of adversarial perturbations and can guarantee that for any input x, the classifier’s prediction is constant within a neighborhood of x. These are typically based on certification methods which are either exact (a.k.a. “complete”) or conservative (a.k.a. “sound but incomplete”). Exact methods, usually based on Satisfiability Modulo Theories solvers (Katz et al., 2017; Ehlers, 2017) or mixed integer linear programming (Tjeng et al., 2019; Lomuscio and Maganti, 2017; Fischetti and Jo, 2017), are guaranteed to find an adversarial example around a datapoint if one exists. Unfortunately, they are computationally inefficient and difficult to scale up to large neural networks. Conservative methods are also guaranteed to detect an adversarial example if one exists, but they might mistakenly flag a safe data point as vulnerable to adversarial examples. On the bright side, these methods are more scalable and efficient, which makes some of them useful for building certified defenses (Wong and Kolter, 2018; Wang et al., 2018a, b; Raghunathan et al., 2018a, b; Wong et al., 2018; Dvijotham et al., 2018b, a; Croce et al., 2018; Gehr et al., 2018; Mirman et al., 2018; Singh et al., 2018; Gowal et al., 2018; Weng et al., 2018; Zhang et al., 2018). However, none of them have yet been shown to scale to practical networks that are large and expressive enough to perform well on ImageNet, for example. To scale up to practical networks, randomized smoothing has been proposed as a probabilistically certified defense.

Randomized smoothing

A randomized smoothing classifier is not itself a neural network, but uses a neural network as its base classifier. Randomized smoothing was proposed by several works (Liu et al., 2018; Cao and Gong, 2017) as a heuristic defense without proving any guarantees. Lecuyer et al. (2018) first proved robustness guarantees for randomized smoothing classifiers, utilizing inequalities from the differential privacy literature. Subsequently, Li et al. (2018) gave a stronger robustness guarantee using tools from information theory. Recently, Cohen et al. (2019) provided a tight robustness guarantee for randomized smoothing and consequently achieved the state of the art in ℓ2-norm certified defenses.

6 Conclusions

In this paper, we designed an adapted attack for smoothed classifiers, and we showed how this attack can be used in an adversarial training setting to substantially improve the provable robustness of smoothed classifiers. We demonstrated through extensive experimentation that our adversarially trained smoothed classifiers consistently outperform all existing provably ℓ2-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state of the art for provable ℓ2-defenses.

Acknowledgements

We would like to thank Zico Kolter, Jeremy Cohen, Elan Rosenfeld, Aleksander Madry, Andrew Ilyas, Dimitris Tsipras, Shibani Santurkar, Jacob Steinhardt for comments and discussions.

References

  • Athalye and Carlini (2018) Anish Athalye and Nicholas Carlini. On the robustness of the cvpr 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286, 2018.
  • Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
  • Cao and Gong (2017) Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287. ACM, 2017.
  • Carlini and Wagner (2017) Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
  • Cohen et al. (2019) Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
  • Croce et al. (2018) Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018.
  • Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
  • Dvijotham et al. (2018a) Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O’Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018a.
  • Dvijotham et al. (2018b) Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. UAI, 2018b.
  • Ehlers (2017) Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.
  • Fischetti and Jo (2017) Matteo Fischetti and Jason Jo. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174, 2017.
  • Gehr et al. (2018) Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2018.
  • Goodfellow et al. (2015) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.
  • Gowal et al. (2018) Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Katz et al. (2017) Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
  • Krizhevsky and Hinton (2009) Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  • Lecuyer et al. (2018) Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.
  • Li et al. (2018) Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.
  • Liu et al. (2018) Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.
  • Lomuscio and Maganti (2017) Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward relu neural networks. arXiv preprint arXiv:1706.07351, 2017.
  • Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Mirman et al. (2018) Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3575–3583, 2018.
  • Raghunathan et al. (2018a) Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. International Conference on Learning Representations (ICLR), arXiv preprint arXiv:1801.09344, 2018a.
  • Raghunathan et al. (2018b) Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877–10887, 2018b.
  • Rony et al. (2018) Jérôme Rony, Luiz G Hafemann, Luis S Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses. arXiv preprint arXiv:1811.09600, 2018.
  • Singh et al. (2018) Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10825–10836, 2018.
  • Stein (1981) Charles M Stein. Estimation of the mean of a multivariate normal distribution. The annals of Statistics, pages 1135–1151, 1981.
  • Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • Tjeng et al. (2019) Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyGIdiRqtm.
  • Uesato et al. (2018) Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.
  • Wang et al. (2018a) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Mixtrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018a.
  • Wang et al. (2018b) Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pages 6369–6379, 2018b.
  • Weng et al. (2018) Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S Dhillon, and Luca Daniel. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 2018.
  • Wong and Kolter (2018) Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), pages 5283–5292, 2018.
  • Wong et al. (2018) Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. Advances in Neural Information Processing Systems (NIPS), 2018.
  • Zhang et al. (2018) Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.

Appendix A Alternative proof of the robustness guarantee of Cohen et al. [2019] via explicit Lipschitz constants of smoothed classifier

Fix σ > 0 and a function f : ℝ^d → [0, 1], and define its Gaussian smoothing f̂ : ℝ^d → [0, 1] by:

    f̂(x) = E_{δ ∼ N(0, σ²I)} [ f(x + δ) ]

Lemma 1.

The function f̂ is Lipschitz, with a Lipschitz constant proportional to 1/σ.

Proof.

It suffices to bound u · ∇f̂(x) for any unit direction u. Note that:

    ∇f̂(x) = E_{δ ∼ N(0, σ²I)} [ (δ / σ²) f(x + δ) ]        (8)

and the claimed bound follows (using f ∈ [0, 1] and classical integration of the Gaussian density).

A very nice observation of Cohen et al. [2019] is that the function Φ⁻¹ ∘ f̂ actually satisfies a stronger smoothness property:

Lemma 2.

Let Φ denote the standard Gaussian CDF. Assume that f̂(x) ∈ (0, 1) for all x. Then the function Φ⁻¹ ∘ f̂ is (1/σ)-Lipschitz.

Proof.

Note that:

    ∇(Φ⁻¹ ∘ f̂)(x) = ∇f̂(x) / φ(Φ⁻¹(f̂(x))),

where φ denotes the standard Gaussian density, and thus we need to prove that for any unit direction u, denoting p = f̂(x),

    u · ∇f̂(x) ≤ (1/σ) φ(Φ⁻¹(p)).

Note that the left-hand side can be written as follows (recall (8)):

    u · ∇f̂(x) = E_{δ ∼ N(0, σ²I)} [ (δ · u / σ²) f(x + δ) ].

Invoking a simple symmetry argument, one can actually compute the supremum of the above quantity over all functions f : ℝ^d → [0, 1], subject to the constraint that E_{δ ∼ N(0, σ²I)} [ f(x + δ) ] = p; namely, it is equal to

    (1/σ) φ(Φ⁻¹(p)),

thus concluding the proof. ∎

Both lemmas give the same robustness guarantee for small gaps, but the second lemma is much better for large gaps (in fact, in the limit of the gap going to its maximum value, the second lemma gives an infinite radius while the first lemma only gives a radius of order σ).

Appendix B Another perspective for deriving SmoothAdv

In this section we provide an alternative motivation for the SmoothAdv objective presented in Section 2.2. We assume that we have a hard classifier which takes the form f(x) = argmax_{c ∈ Y} u_c(x), for some function u : ℝ^d → ℝ^{|Y|}. If f is a neural network classifier, u can for instance be taken to be the map from the input to the logit layer immediately preceding the softmax. If f is of this form, then the smoothed soft classifier with parameter σ associated to (the one-hot encoding of) f can be written as

    E_{δ ∼ N(0, σ²I)} [ M( u(x + δ) ) ]        (9)

for all x, where M is the argmax-to-one-hot function which, at input z, has k-th coordinate equal to 1 if and only if z_k = max_j z_j, and zero otherwise. The function M is somewhat hard to work with, therefore we will approximate it with a smooth function, namely, the softmax function. Recall that the softmax function with inverse temperature parameter β is the function s_β given by s_β(z)_k = exp(β z_k) / Σ_j exp(β z_j). Observe that for any z with a unique maximal coordinate, we have that s_β(z) → M(z) as β → ∞. Thus we can approximate (9) with

    E_{δ ∼ N(0, σ²I)} [ s_β( u(x + δ) ) ]        (10)

To find an adversarial perturbation of the smoothed classifier at a data point (x, y), it is sufficient to find a perturbation x′ so that the y-th coordinate of (9) is minimized. Combining this with the approximation (10), we find that a heuristic for finding an adversarial example for the smoothed classifier at (x, y) is to solve the following optimization problem:

    argmin_{‖x′ − x‖₂ ≤ ε}  E_{δ ∼ N(0, σ²I)} [ s_β( u(x′ + δ) )_y ]        (11)

and as we let β → ∞, this converges to finding an adversarial example for the true smoothed classifier.

To conclude, we simply observe that for neural networks, s_β(u(·)) is exactly the soft classifier F that is thresholded to form the hard classifier, if β is taken to be 1. Therefore the solutions to (S) and to (11) with β = 1 are the same, since the logarithm is a monotonic function.

An interesting direction is to investigate whether varying β in (11) allows us to improve our adversarial attacks, and if it does, whether this gives us stronger adversarial training as well. Intuitively, as we take β → ∞, the quality of the optimal solution should increase, but the optimization problem becomes increasingly ill-behaved, and so it is not clear whether the actual solution we obtain to this problem via first-order methods becomes better or not.
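As a small illustration of the β-parameterized surrogate in (10)-(11) (our own sketch; `logits_fn` is a hypothetical function mapping a batch of inputs to the logit vectors u), minimizing the quantity below over an ℓ2 ball around x recovers the attack in (11), and β = 1 corresponds, up to the monotone logarithm, to the SmoothAdv objective (S).

import torch
import torch.nn.functional as F

def tempered_smoothed_prob(logits_fn, x, y, sigma, beta, m=128):
    """Monte Carlo estimate of E_delta[ softmax(beta * u(x + delta))_y ]."""
    noise = torch.randn(m, *x.shape, device=x.device) * sigma
    probs = F.softmax(beta * logits_fn(x.unsqueeze(0) + noise), dim=1)   # (m, num_classes)
    # as beta -> infinity this mean approaches the smoothed hard classifier's probability of class y
    return probs[:, y].mean()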

Appendix C Additional Experiments

C.1 Adversarially attacking the base model instead of the smoothed model

We compare SmoothAdv-ersarial training (training the smoothed classifier g) to:

  1. using vanilla adversarial training (PGD) to find adversarial examples of the base classifier f and train on them. We refer to this as Vanilla PGD training.

  2. using vanilla adversarial training (PGD) to find adversarial examples of the base classifier f, add Gaussian noise to them, and then train on the resulting inputs. We refer to this as Vanilla PGD+noise training.

For our method and the above two methods, we use the same attack settings (number of steps and maximum allowed perturbation ε) and train over the same set of noise levels σ.

Fig. 4 plots the best certified accuracies over all ε and σ values, for each radius, using our trained classifiers vs. smoothed models trained via Vanilla PGD or Vanilla PGD+noise. Fig. 4 also plots the results of Cohen et al. [2019] as a baseline. Observe that SmoothAdv-ersarially trained models are more robust overall.

Figure 4: Certified defenses: ours vs. Cohen et al. [2019] vs. vanilla PGD vs. vanilla PGD + noise

C.2 Effect of the number of noise samples m in (6) during SmoothAdv-ersarial training on the certified accuracy of smoothed classifiers

As presented in Section 4.2, more noise samples lead to a stronger SmoothAdv-ersarial attack. Here, we demonstrate that if we train with such improved attacks, we get higher certified accuracies for the smoothed classifier. Fig. 5 plots the best certified accuracies over models trained using SmoothAdv_PGD or SmoothAdv_DDN, across various numbers of noise samples m used for the attack. Observe that models trained with higher m tend to have higher certified accuracies.

Figure 5: Varying the number of noise samples m used for the attack during training.

C.3 Effect of ε during training on the certified accuracy of smoothed classifiers

Here, we analyze the effect of the maximum allowed perturbation ε of SmoothAdv during adversarial training on the robustness of the obtained smoothed classifier. Fig. 6 plots the best certified accuracies over models trained using our attack with various values of ε. Observe that as ε increases, the certified accuracies for small radii decrease, but those for large radii increase, which is expected.

Figure 6: Varying ε. Observe that as ε increases, the certified accuracies for small radii decrease, but those for large radii increase, which is expected.

C.4 Effect of the number of samples m in (6) during the SmoothAdv attack on the empirical accuracies

SmoothAdv_PGD requires the evaluation of (6), as discussed in Section 3. Here, we analyze how sensitive our attack is to the number of samples m used in (6). Fig. 3 shows the empirical accuracies for various values of m. Lower accuracies correspond to stronger attacks. For small m, the vanilla PGD attack (attacking the base classifier instead of the smoothed classifier) performs better than SmoothAdv, but as m increases, our attack becomes stronger, decreasing the gap between certified and empirical accuracies. We did not observe any noticeable improvement beyond a certain number of samples.

C.5 Effect of the number of Monte Carlo samples n in Predict on the empirical accuracies

Fig. 7 plots the empirical accuracies of g under a SmoothAdv attack across different numbers of Monte Carlo samples n that are used by Predict. Observe that the empirical accuracies increase as n increases, since the prediction quality of the smoothed classifier improves, i.e., fewer predictions are abstained.

Figure 7: Empirical accuracies for varying numbers of Monte Carlo samples n in Predict. The higher the better.

C.6 Performance of the gradient-free estimator (7)

Despite the appealing features of the gradient-free estimator (7), presented in Section 3.1 as an alternative to (6), in practice we find that this attack is quite weak. This is shown in Fig. 8 for various values of m.

We speculate that this is because the variance of the gradient estimator is too high. We believe that investigating this attack in practice is an interesting direction for future work.

Figure 8: The empirical accuracies found by our attack using the plug-in estimator (6) vs. the gradient-free estimator (7). The closer an empirical curve is to the certified curve, the stronger the attack.

Appendix D Experiments Details

Here we include details of all the experiments conducted in this paper.

Attacks used in the paper

We use two of the strongest attacks in the literature: projected gradient descent (PGD) (Madry et al., 2017) and decoupled direction and norm (DDN) (Rony et al., 2018). We adapt these attacks such that their gradient steps are given by (6), and we call the resulting attacks SmoothAdv_PGD and SmoothAdv_DDN, respectively.

For PGD (SmoothAdv_PGD), we use a constant step size determined by the number of attack steps T and the maximum allowed perturbation ε of the input.

For DDN (SmoothAdv_DDN), the attack objective is in fact different from that of PGD (i.e., different from (S)). DDN tries to find the “closest” adversarial example to the input instead of finding the “best” adversarial example (in terms of maximizing the loss in a given neighborhood of the input). We stick to the hyperparameters used in the original paper (Rony et al., 2018), including an initial step size that is reduced with cosine annealing to 0.01 in the last iteration (see Rony et al. (2018) for the definition of these parameters). We experimented with very few iterations compared to the original paper, but we still got good results.

We emphasize that we are not using PGD and DDN to attack the base classifier of a smoothed model; instead, we are using them to adversarially train smoothed classifiers (see Pseudocode 1).

Training details

In order to report certified radii in the original coordinates, we first add Gaussian noise and/or perform adversarial attacks, and then standardize the data (in contrast to importing a standardized dataset). Specifically, in our PyTorch implementation, the first layer of the base classifier is a normalization layer that performs a channel-wise standardization of its input.

For both ImageNet and CIFAR-10, we trained the base classifier with random horizontal flips and random crops (in addition to the Gaussian data augmentation discussed in Section 3.2).

The main training algorithm is shown in Pseudocode 1. It has the following parameters: B is the mini-batch size, m is the number of noise samples used for gradient estimation in (6) as well as for Gaussian noise data augmentation, and T is the number of steps of the attack.

We point out a few remarks.

  1. First, an important parameter is the radius ε of the attack. During the first epoch, it is set to zero; we then increase it linearly over the first ten epochs, after which it stays constant.

  2. Second, we are reusing the same noise samples during every step of our attack as well as during augmentation. Intuitively, this helps to stabilize the attack process.

  3. Finally, the way training is described in Pseudocode 1 is not efficient; it needs to be appropriately batched so that we compute adversarial examples for every input in a batch at the same time.

Compute details and training time

On CIFAR-10, we trained using SGD on one NVIDIA P100 GPU. We train for 150 epochs. We use a batch size of 256 and an initial learning rate of 0.1 which drops by a factor of 10 every 50 epochs. Training time varies between a few hours and a few days, depending on how many attack steps and noise samples are used in Pseudocode 1.

On ImageNet, we trained with synchronous SGD on four NVIDIA V100 GPUs. We train for 90 epochs. We use a batch size of 400 and an initial learning rate of 0.1 which drops by a factor of 10 every 30 epochs. Training time varies between 2 and 6 days, depending on whether we are doing SmoothAdv-ersarial training or just Gaussian noise training (similar to Cohen et al. (2019)).

Models used

The models used in this paper are the same as those used in Cohen et al. (2019): a ResNet-50 (He et al., 2016) on ImageNet, and a ResNet-110 on CIFAR-10. These models can be found in the GitHub repository accompanying Cohen et al. (2019): https://github.com/locuslab/smoothing/blob/master/code/architectures.py.

Parameters of Certify and Predict

For details of these algorithms, please see the Pseudocode in Cohen et al. [2019].

For Certify, unless otherwise specified, we use a failure probability of α = 0.001 and a fixed number of Monte Carlo samples.

For Predict, unless otherwise specified, we use α = 0.001 and a fixed number of Monte Carlo samples n.

Source code

Our code and trained models are publicly available at http://github.com/Hadisalman/smoothing-adversarial. The repository also includes all our training/certification logs, which enables the replication of all the results of this paper by running a single piece of code. Check the repository for more details.

Appendix E ImageNet and CIFAR-10 Detailed Results

In this appendix, we include the certified accuracies of each model that we use in the paper. For each radius, we highlight the best accuracy across all models. Note that we outperform the models of Cohen et al. [2019] (the first rows of each table) over all radii by wide margins.

Radius (ImageNet)
Cohen et al. [2019] 0.67 0.49 0.00 0.00 0.00 0.00 0.00 0.00
0.57 0.46 0.37 0.29 0.00 0.00 0.00 0.00
0.44 0.38 0.33 0.26 0.19 0.15 0.12 0.09
   0.63 0.54 0.00 0.00 0.00 0.00 0.00 0.00
   0.62 0.54 0.00 0.00 0.00 0.00 0.00 0.00
   0.56 0.52 0.00 0.00 0.00 0.00 0.00 0.00
   0.49 0.45 0.00 0.00 0.00 0.00 0.00 0.00
   0.56 0.48 0.42 0.34 0.00 0.00 0.00 0.00
   0.54 0.49 0.43 0.37 0.00 0.00 0.00 0.00
   0.48 0.45 0.42 0.37 0.00 0.00 0.00 0.00
   0.44 0.42 0.39 0.37 0.00 0.00 0.00 0.00
   0.44 0.38 0.34 0.29 0.24 0.20 0.15 0.11
   0.41 0.36 0.34 0.31 0.26 0.21 0.18 0.14
   0.40 0.37 0.34 0.30 0.27 0.25 0.20 0.15
   0.34 0.31 0.29 0.27 0.25 0.22 0.19 0.16
   0.66 0.52 0.00 0.00 0.00 0.00 0.00 0.00
   0.65 0.56 0.00 0.00 0.00 0.00 0.00 0.00
   0.65 0.54 0.00 0.00 0.00 0.00 0.00 0.00
   0.67 0.55 0.00 0.00 0.00 0.00 0.00 0.00
   0.59 0.48 0.38 0.29 0.00 0.00 0.00 0.00
   0.55 0.49 0.40 0.32 0.00 0.00 0.00 0.00
   0.58 0.49 0.42 0.34 0.00 0.00 0.00 0.00
   0.58 0.51 0.41 0.32 0.00 0.00 0.00 0.00
   0.44 0.37 0.31 0.26 0.20 0.16 0.11 0.08
   0.46 0.39 0.32 0.26 0.22 0.17 0.11 0.09
   0.45 0.39 0.34 0.27 0.23 0.16 0.13 0.09
   0.44 0.39 0.34 0.28 0.22 0.16 0.12 0.08
Table 3: Approximate certified test accuracy on ImageNet. Each row is a setting of the training hyperparameters; each column is an ℓ2 radius. The entry of the best model for each radius is bolded. For comparison, random guessing would attain 0.001 accuracy.
Radius (CIFAR-10)
Cohen et al. [2019] 0.81 0.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.75 0.60 0.43 0.27 0.00 0.00 0.00 0.00 0.00 0.00
0.65 0.55 0.41 0.32 0.23 0.15 0.09 0.05 0.00 0.00
0.47 0.39 0.34 0.28 0.22 0.17 0.14 0.12 0.10 0.08
   0.84 0.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.75 0.63 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.75 0.63 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.78 0.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.77 0.65 0.49 0.33 0.00 0.00 0.00 0.00 0.00 0.00
   0.70 0.57 0.50 0.40 0.00 0.00 0.00 0.00 0.00 0.00
   0.72 0.58 0.45 0.35 0.00 0.00 0.00 0.00 0.00 0.00
   0.71 0.60 0.48 0.36 0.00 0.00 0.00 0.00 0.00 0.00
   0.66 0.55 0.44 0.34 0.25 0.18 0.12 0.08 0.00 0.00
   0.68 0.55 0.49 0.33 0.26 0.17 0.11 0.10 0.00 0.00
   0.63 0.52 0.44 0.33 0.25 0.18 0.14 0.09 0.00 0.00
   0.63 0.52 0.44 0.36 0.28 0.20 0.15 0.10 0.00 0.00
   0.50 0.42 0.34 0.27 0.22 0.19 0.15 0.13 0.10 0.07
   0.47 0.39 0.34 0.27 0.23 0.18 0.16 0.13 0.11 0.08
   0.48 0.41 0.35 0.29 0.25 0.20 0.16 0.14 0.12 0.09
   0.43 0.40 0.34 0.28 0.25 0.21 0.17 0.14 0.12 0.10
   0.82 0.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.79 0.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.71 0.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.54 0.49 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.77 0.65 0.51 0.39 0.00 0.00 0.00 0.00 0.00 0.00
   0.74 0.64 0.53 0.41 0.00 0.00 0.00 0.00 0.00 0.00
   0.64 0.59 0.53 0.45 0.00 0.00 0.00 0.00 0.00 0.00
   0.53 0.49 0.46 0.42 0.00 0.00 0.00 0.00 0.00 0.00
   0.64 0.56 0.46 0.38 0.30 0.23 0.15 0.10 0.00 0.00
   0.63 0.55 0.47 0.39 0.30 0.24 0.19 0.14 0.00 0.00
   0.57 0.52 0.46 0.41 0.33 0.28 0.23 0.18 0.00 0.00
   0.47 0.45 0.41 0.39 0.35 0.31 0.26 0.23 0.00 0.00
   0.48 0.42 0.36 0.30 0.25 0.21 0.17 0.14 0.12 0.09
   0.47 0.41 0.37 0.31 0.27 0.22 0.20 0.17 0.14 0.12
   0.46 0.41 0.37 0.33 0.28 0.24 0.22 0.18 0.16 0.14
   0.39 0.37 0.34 0.30 0.27 0.25 0.22 0.20 0.18 0.15
Table 4: SmoothAdv-ersarial training steps, sample.
Radius (CIFAR-10)
Cohen et al. [2019] 0.81 0.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.75 0.60 0.43 0.27 0.00 0.00 0.00 0.00 0.00 0.00
0.65 0.55 0.41 0.32 0.23 0.15 0.09 0.05 0.00 0.00
0.47 0.39 0.34 0.28 0.22 0.17 0.14 0.12 0.10 0.08
   0.82 0.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.80 0.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.78 0.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.78 0.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.77 0.64 0.50 0.38 0.00 0.00 0.00 0.00 0.00 0.00
   0.70 0.61 0.50 0.40 0.00 0.00 0.00 0.00 0.00 0.00
   0.72 0.61 0.53 0.42 0.00 0.00 0.00 0.00 0.00 0.00
   0.72 0.63 0.54 0.40 0.00 0.00 0.00 0.00 0.00 0.00
   0.65 0.57 0.47 0.37 0.27 0.19 0.12 0.07 0.00 0.00
   0.64 0.54 0.45 0.35 0.28 0.20 0.15 0.10 0.00 0.00
   0.63 0.54 0.46 0.38 0.30 0.23 0.16 0.11 0.00 0.00
   0.63 0.53 0.44 0.36 0.29 0.22 0.17 0.10 0.00 0.00
   0.48 0.41 0.34 0.29 0.22 0.19 0.17 0.14 0.10 0.09
   0.47 0.40 0.34 0.28 0.23 0.20 0.17 0.14 0.11 0.09
   0.47 0.39 0.34 0.28 0.24 0.21 0.18 0.15 0.13 0.09
   0.48 0.40 0.35 0.30 0.25 0.21 0.17 0.14 0.12 0.09
   0.83 0.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.81 0.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.72 0.63 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.56 0.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.76 0.66 0.51 0.39 0.00 0.00 0.00 0.00 0.00 0.00
   0.69 0.63 0.53 0.42 0.00 0.00 0.00 0.00 0.00 0.00
   0.66 0.59 0.53 0.46 0.00 0.00 0.00 0.00 0.00 0.00
   0.53 0.49 0.45 0.42 0.00 0.00 0.00 0.00 0.00 0.00
   0.65 0.57 0.47 0.37 0.29 0.23 0.16 0.09 0.00 0.00
   0.62 0.54 0.48 0.40 0.29 0.25 0.19 0.14 0.00 0.00
   0.56 0.50 0.44 0.39 0.34 0.30 0.23 0.18 0.00 0.00
   0.47 0.44 0.41 0.38 0.34 0.31 0.27 0.24 0.00 0.00
   0.49 0.42 0.36 0.30 0.25 0.21 0.18 0.14 0.12 0.10
   0.48 0.43 0.37 0.30 0.26 0.24 0.19 0.16 0.14 0.12
   0.45 0.40 0.37 0.34 0.30 0.25 0.21 0.19 0.17 0.15
   0.37 0.35 0.32 0.30 0.28 0.25 0.23 0.19 0.17 0.15
Table 5: SmoothAdv-ersarial training steps, sample.
Radius (CIFAR-10)
Cohen et al. [2019] 0.81 0.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.75 0.60 0.43 0.27 0.00 0.00 0.00 0.00 0.00 0.00
0.65 0.55 0.41 0.32 0.23 0.15 0.09 0.05 0.00 0.00
0.47 0.39 0.34 0.28 0.22 0.17 0.14 0.12 0.10 0.08
   0.81 0.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.81 0.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.76 0.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.80 0.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.77 0.65 0.49 0.36 0.00 0.00 0.00 0.00 0.00 0.00
   0.75 0.64 0.51 0.37 0.00 0.00 0.00 0.00 0.00 0.00
   0.72 0.63 0.53 0.41 0.00 0.00 0.00 0.00 0.00 0.00
   0.71 0.63 0.52 0.40 0.00 0.00 0.00 0.00 0.00 0.00
   0.68 0.56 0.47 0.36 0.25 0.19 0.12 0.08 0.00 0.00
   0.67 0.58 0.45 0.38 0.30 0.22 0.16 0.11 0.00 0.00
   0.62 0.52 0.43 0.35 0.29 0.25 0.18 0.12 0.00 0.00
   0.63 0.54 0.45 0.36 0.27 0.22 0.16 0.11 0.00 0.00
   0.48 0.41 0.35 0.30 0.23 0.19 0.15 0.12 0.10 0.08
   0.47 0.40 0.35 0.30 0.23 0.19 0.17 0.13 0.10 0.09
   0.47 0.40 0.35 0.30 0.24 0.21 0.17 0.15 0.13 0.09
   0.45 0.40 0.34 0.30 0.24 0.18 0.17 0.15 0.12 0.09
   0.81 0.65 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.78 0.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.70 0.62 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.56 0.53 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
   0.76 0.63 0.54 0.40 0.00 0.00 0.00 0.00 0.00 0.00
   0.72 0.61 0.52 0.43 0.00 0.00 0.00 0.00 0.00 0.00