Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
Abstract
Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural-network-based classifiers that are provably robust to ℓ2-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable defenses. Our code and trained models are available at http://github.com/Hadisalman/smoothingadversarial.
Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, Sébastien Bubeck (reverse alphabetical order; work done as part of the Microsoft AI Residency Program). Microsoft Research AI. {hadi.salman, gregyang, jerrl, penzhan, thuzhan, ilyaraz, sebubeck}@microsoft.com
Preprint.
1 Introduction
Neural networks have been very successful in tasks such as image classification and speech recognition, but have been shown to be extremely brittle to small, adversarially-chosen perturbations of their inputs (Szegedy et al., 2013; Goodfellow et al., 2015). A classifier (e.g., a neural network) that correctly classifies an image x can be fooled by an adversary into misclassifying x + δ, where δ is an adversarial perturbation so small that x and x + δ are indistinguishable to the human eye. Recently, many works have proposed heuristic defenses intended to train models robust to such adversarial perturbations. However, most of these defenses were broken using more powerful adversaries (Carlini and Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018). This encouraged researchers to develop defenses that lead to certifiably robust classifiers, i.e., classifiers whose predictions for most of the test examples x can be verified to be constant within a neighborhood of x (Wong and Kolter, 2018; Raghunathan et al., 2018a). Unfortunately, these techniques do not immediately scale to the large neural networks that are used in practice.
To mitigate this limitation of prior certifiable defenses, a number of papers (Lecuyer et al., 2018; Li et al., 2018; Cohen et al., 2019) consider the randomized smoothing approach, which transforms any classifier (e.g., a neural network) into a new smoothed classifier that has certifiable ℓ2-norm robustness guarantees. This transformation works as follows.
Let f be an arbitrary base classifier which maps inputs in ℝ^d to classes in 𝒴. Given an input x, the smoothed classifier g labels x as having the class which is most likely to be returned by the base classifier f when fed a noisy corruption x + δ, where δ is a vector sampled according to an isotropic Gaussian distribution.
As shown in Cohen et al. (2019), one can derive certifiable robustness for such smoothed classifiers via the Neyman–Pearson lemma. They demonstrate that for ℓ2 perturbations, randomized smoothing outperforms other certifiably robust classifiers that have been previously proposed. It is scalable to networks of any architecture and size, which makes it suitable for building robust real-world neural networks.
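To make the construction concrete, the following is a minimal sketch of estimating the smoothed classifier's prediction by majority vote over Gaussian noise samples. The one-dimensional base classifier here is hypothetical, standing in for a neural network:

```python
import random
from collections import Counter

def base_classifier(x):
    """Toy hard classifier on the real line: class 0 if x < 0, else class 1.
    A stand-in for a neural network (hypothetical)."""
    return 0 if x < 0.0 else 1

def smoothed_predict(f, x, sigma, n=10000, rng=None):
    """Monte Carlo estimate of the smoothed classifier g(x): the class most
    often returned by f under N(x, sigma^2) noise."""
    rng = rng or random.Random(0)
    votes = Counter(f(x + rng.gauss(0.0, sigma)) for _ in range(n))
    return votes.most_common(1)[0][0]

# A point well inside class 1 keeps its label after smoothing.
print(smoothed_predict(base_classifier, 1.0, sigma=0.5))  # -> 1
```

With x = 1.0 and σ = 0.5, the base classifier returns class 1 for roughly Φ(2) ≈ 97.7% of noise draws, so the vote is decisive.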
Our contributions
In this paper, we employ adversarial training to substantially improve on the previous certified robustness results of randomized smoothing (Lecuyer et al., 2018; Li et al., 2018; Cohen et al., 2019). We present, for the first time, a direct attack for smoothed classifiers. We then demonstrate how to use this attack to adversarially train smoothed models with not only boosted empirical robustness but also substantially improved certifiable robustness using the certification method of Cohen et al. (2019).
We demonstrate that our method outperforms all existing provably robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable defenses. For instance, our ResNet-50 ImageNet classifier achieves 56% provable top-1 accuracy (compared to the best previous provable accuracy of 49%) under adversarial perturbations with ℓ2 norm less than 0.5. Similarly, our ResNet-110 CIFAR-10 classifier achieves up to 16 points of improvement over the previous state-of-the-art. Our main results are reported in Tables 1 and 2 for ImageNet and CIFAR-10.
ℓ2 radius (ImageNet)       0.5   1.0   1.5   2.0   2.5   3.0   3.5
Cohen et al. (2019) (%)     49    37    29    19    15    12     9
Ours (%)                    56    43    37    27    25    20    16
ℓ2 radius (CIFAR-10)       0.25  0.5   0.75  1.0   1.25  1.5   1.75  2.0   2.25
Cohen et al. (2019) (%)     60    43    32    23    17    14    12    10     8
Ours (%)                    74    57    48    38    33    29    24    19    17
2 Our techniques
Here, we describe our techniques for adversarial attacks and training on smoothed classifiers. We first require some background on randomized smoothing classifiers. For a more detailed description of randomized smoothing, see Cohen et al. (2019).
2.1 Background on randomized smoothing
Consider a classifier f from ℝ^d to classes 𝒴. Randomized smoothing is a method that constructs a new, smoothed classifier g from the base classifier f. The smoothed classifier g assigns to a query point x the class which is most likely to be returned by the base classifier f under isotropic Gaussian noise perturbation of x, i.e.,
g(x) = argmax_{c ∈ 𝒴} P[ f(x + δ) = c ],  where δ ~ N(0, σ²I).   (1)
The noise level σ is a hyperparameter of the smoothed classifier g which controls a robustness/accuracy tradeoff. Equivalently, this means that g returns the class whose decision region has the largest measure under the distribution N(x, σ²I). Cohen et al. (2019) recently presented a tight robustness guarantee for the smoothed classifier g and gave Monte Carlo algorithms for certifying the robustness of g around x, or predicting the class of x using g, that succeed with high probability. This guarantee can in fact be obtained alternatively by explicitly computing the Lipschitz constant of the smoothed classifier, as we do in Appendix A.
Robustness guarantee for smoothed classifiers
The robustness guarantee presented by Cohen et al. (2019) is as follows: suppose that when the base classifier f classifies N(x, σ²I), the most probable class c_A is returned with probability p_A, and the “runner-up” class c_B is returned with probability p_B. The smoothed classifier g is robust around x within the ℓ2 radius
R = (σ/2) ( Φ⁻¹(p_A) − Φ⁻¹(p_B) ),   (2)
where Φ⁻¹ is the inverse of the standard Gaussian CDF. It is not clear how to compute p_A and p_B exactly (if f is given by a deep neural network, for example). Monte Carlo sampling is used to estimate some p̲_A and p̅_B for which p̲_A ≤ p_A and p̅_B ≥ p_B with arbitrarily high probability over the samples. The result of (2) still holds if we replace p_A with p̲_A and p_B with p̅_B.
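The radius in (2) is straightforward to compute from the estimated bounds. Below is a minimal sketch using Python's standard-library Gaussian quantile function; the bounds p̲_A and p̅_B are passed in as given rather than estimated from samples:

```python
from statistics import NormalDist

def certified_radius(pA_lower, pB_upper, sigma):
    """l2 radius from Eq. (2): R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB)),
    evaluated at the high-probability bounds pA_lower <= pA, pB_upper >= pB."""
    if pA_lower <= pB_upper:
        return 0.0  # cannot certify: abstain
    phi_inv = NormalDist().inv_cdf
    return 0.5 * sigma * (phi_inv(pA_lower) - phi_inv(pB_upper))

# e.g. pA >= 0.8 and pB <= 0.2 at sigma = 0.5 certifies radius ~0.421
print(round(certified_radius(0.8, 0.2, 0.5), 3))  # -> 0.421
```

In the practical Certify procedure of Cohen et al. (2019), one takes p̅_B = 1 − p̲_A, in which case the radius simplifies to σ Φ⁻¹(p̲_A).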
2.2 SmoothAdv: Attacking smoothed classifiers
We now describe our attack against smoothed classifiers. To do so, it will first be useful to describe smoothed classifiers in a more general setting. Specifically, we consider a generalization of (1) to soft classifiers, namely, functions F : ℝ^d → P(𝒴), where P(𝒴) is the set of probability distributions over 𝒴. Neural networks typically learn such soft classifiers, then use the argmax of the soft classifier as the final hard classifier. Given a soft classifier F, its associated smoothed soft classifier G is defined as
G(x) = E_{δ ~ N(0, σ²I)} [ F(x + δ) ].   (3)
Let f and F denote the hard and soft classifiers learned by the neural network, respectively, and let g and G denote the associated smoothed hard and smoothed soft classifiers. Directly finding adversarial examples for the smoothed hard classifier g is a somewhat ill-behaved problem because of the argmax, so we instead propose to find adversarial examples for the smoothed soft classifier G. Empirically, we found that doing so also finds good adversarial examples for the smoothed hard classifier. More concretely, given a labeled data point (x, y), we wish to find a point x̂ which maximizes the loss of G in an ℓ2 ball of radius ε around x, for some choice of loss function. As is canonical in the literature, we focus on the cross-entropy loss ℓ_CE. Thus, given a labeled data point (x, y), our (ideal) adversarial perturbation is given by the formula:
x̂ = argmax_{‖x′ − x‖₂ ≤ ε} ℓ_CE(G(x′), y) = argmax_{‖x′ − x‖₂ ≤ ε} ( −log E_{δ ~ N(0, σ²I)} [ F(x′ + δ)_y ] ).   (S)
We will refer to (S) as the SmoothAdv objective. The SmoothAdv objective is highly non-convex, so as is common in the literature, we will optimize it via projected gradient descent (PGD), and variants thereof. It is hard to compute exact gradients of (S), so in practice we must use some estimator based on random Gaussian samples. There are a number of different natural estimators for the derivative of the objective function in (S), and the choice of estimator can dramatically change the performance of the attack. For more details, see Section 3.
We note that (S) should not be confused with the similar-looking objective
argmax_{‖x′ − x‖₂ ≤ ε} E_{δ ~ N(0, σ²I)} [ ℓ_CE(F(x′ + δ), y) ],   (4)
as suggested in section G.3 of Cohen et al. (2019). There is a subtle, but very important, distinction between (S) and (4). Conceptually, solving (4) corresponds to finding an adversarial example of F that is robust to Gaussian noise. In contrast, (S) directly attacks the smoothed model, i.e., it tries to find adversarial examples that decrease the probability of correct classification of the smoothed soft classifier G. From this point of view, (S) is the right optimization problem to use to find adversarial examples of G. This distinction turns out to be crucial in practice: empirically, Cohen et al. (2019) found attacks based on (4) not to be effective.
Interestingly, for a large class of classifiers, including neural networks, one can alternatively derive the objective (S) from an optimization perspective, by attempting to directly find adversarial examples for the smoothed hard classifier g that the neural network provides. While the two perspectives ultimately yield the same objective, the latter may also be enlightening, and so we include it in Appendix B.
2.3 Adversarial training using SmoothAdv
We now wish to use our new attack to boost the adversarial robustness of smoothed classifiers. We do so using the well-studied adversarial training framework (Kurakin et al., 2016; Madry et al., 2017). In adversarial training, given the current model weights and a labeled data point (x, y), one finds an adversarial perturbation x̂ of x for the current model, and then takes a gradient step for the model parameters, evaluated at the point (x̂, y). Intuitively, this encourages the network to learn to minimize the worst-case loss over a neighborhood around the input.
At a high level, we propose to instead do adversarial training using an adversarial example for the smoothed classifier. We combine this with the approach suggested in Cohen et al. (2019), and train at Gaussian perturbations of this adversarial example. That is, given the current set of weights and a labeled data point (x, y), we find x̂ as a solution to (S), and then take a gradient step for the weights based at Gaussian perturbations of x̂. In contrast to standard adversarial training, we are training the base classifier so that its associated smoothed classifier minimizes worst-case loss in a neighborhood around the current point. For more details of our implementation, see Section 3.2. We emphasize that although we train using adversarial examples for the smoothed soft classifier, in the end we certify the robustness of the smoothed hard classifier we obtain after training.
We make two important observations about our method. First, adversarial training is an empirical defense, and typically offers no provable guarantees. However, we demonstrate that by combining our formulation of adversarial training with randomized smoothing, we are able to substantially boost the certifiable robust accuracy of our smoothed classifiers. Thus, while adversarial training using SmoothAdv is still ultimately a heuristic, and offers no provable robustness by itself, the smoothed classifier that we obtain using this heuristic has strong certifiable guarantees.
Second, we found empirically that to obtain strong certifiable numbers using randomized smoothing, it is insufficient to use standard adversarial training on the base classifier. While such adversarial training does indeed offer good empirical robust accuracy, the resulting classifier is not optimized for randomized smoothing. In contrast, our method specifically finds base classifiers whose smoothed counterparts are robust. As a result, the certifiable numbers for standard adversarial training are noticeably worse than those obtained using our method. See Appendix C.1 for an indepth comparison.
3 Implementing SmoothAdv via first order methods
As mentioned above, it is difficult to optimize the SmoothAdv objective directly, so we will approximate it via first order methods. We focus on two such methods: the well-studied projected gradient descent (PGD) method (Kurakin et al., 2016; Madry et al., 2017), and the recently proposed decoupled direction and norm (DDN) method (Rony et al., 2018), which achieves robust accuracy competitive with PGD on CIFAR-10.
The main task when implementing these methods is, given a data point (x, y), to compute the gradient of the objective function in (S) with respect to x′. If we let J(x′) denote the objective function in (S), we have
∇_{x′} J(x′) = ∇_{x′} ( −log E_{δ ~ N(0, σ²I)} [ F(x′ + δ)_y ] ).   (5)
However, it is not clear how to evaluate (5) exactly, as it takes the form of a complicated high-dimensional integral. Therefore, we will use Monte Carlo approximations. We sample i.i.d. Gaussians δ_1, …, δ_m ~ N(0, σ²I), and use the plug-in estimator for the expectation:
∇_{x′} ( −log ( (1/m) Σ_{i=1}^{m} F(x′ + δ_i)_y ) ).   (6)
It is not hard to see that if F is smooth, this estimator converges to (5) as we take more samples. In practice, evaluating (6) with m samples requires evaluating the network m times. This becomes expensive for large m, especially inside the adversarial training framework, which is already slow. Thus, when we use this estimator for adversarial training, we use a small number of samples m_train. When we run this attack to evaluate the empirical adversarial accuracy of our models, we use a substantially larger number of samples m_test. Empirically, we found that increasing m_test further did not substantially improve performance.
While this estimator does converge to the true gradient given enough samples, note that it is not an unbiased estimator of the gradient. Despite this, we found that using (6) performs very well in practice. Indeed, using (6) yields our strongest empirical attacks, as well as our strongest certifiable defenses when we use this attack in adversarial training. In the remainder of the paper, we let SmoothAdv_PGD denote the PGD attack with gradient steps given by (6), and similarly we let SmoothAdv_DDN denote the DDN attack with gradient steps given by (6).
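As an illustration, the sketch below applies the plug-in estimator (6) to a hypothetical one-dimensional soft classifier whose derivative is available in closed form, standing in for backpropagation through a network:

```python
import math, random

def F(x):
    """Toy soft classifier on the real line: the probability it assigns to
    the true class y. A stand-in for a network's softmax output (hypothetical)."""
    return 1.0 / (1.0 + math.exp(-x))

def dF(x):
    """Analytic derivative of F, playing the role of backprop."""
    s = F(x)
    return s * (1.0 - s)

def smoothadv_grad(x, sigma, m, rng):
    """Plug-in estimator (6) of d/dx [ -log E_delta F(x + delta) ]:
    sample m Gaussians and differentiate through the sample mean."""
    deltas = [rng.gauss(0.0, sigma) for _ in range(m)]
    num = sum(dF(x + d) for d in deltas)   # d/dx of the sample mean
    den = sum(F(x + d) for d in deltas)    # the sample mean itself
    return -num / den

rng = random.Random(0)
g = smoothadv_grad(0.0, sigma=0.5, m=128, rng=rng)
print(g < 0)  # decreasing x lowers the true-class probability, raising the loss
```

A PGD attack would repeatedly step x′ along this estimated gradient and project back onto the ε-ball around x.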
3.1 An unbiased, gradient free method
We note that there is an alternative way to optimize (S) using first order methods. Notice that the logarithm in (S) does not change the argmax, so it suffices to minimize E_{δ ~ N(0, σ²I)} [ F(x′ + δ)_y ] subject to the constraint ‖x′ − x‖₂ ≤ ε. We then observe that
∇_{x′} E_{δ ~ N(0, σ²I)} [ F(x′ + δ)_y ]  =(a)  (1/σ²) E_{δ ~ N(0, σ²I)} [ δ F(x′ + δ)_y ].   (7)
The equality (a) is known as Stein’s lemma (Stein, 1981), although we note that something similar can be derived for more general distributions. There is a natural unbiased estimator for (7): sample i.i.d. Gaussians δ_1, …, δ_m ~ N(0, σ²I), and form the estimator (1/m) Σ_{i=1}^{m} (δ_i/σ²) F(x′ + δ_i)_y. This estimator has a number of nice properties. As mentioned previously, it is an unbiased estimator of (7), in contrast to (6). It also requires no computation of the gradient of F; if F is a neural network, this saves both time and memory by not storing pre-activations during the forward pass. Finally, it is very general: the derivation of (7) actually holds even if F is a hard classifier (or, more precisely, the one-hot embedding of a hard classifier). In particular, this implies that this technique can even be used to directly find adversarial examples of the smoothed hard classifier.
Despite these appealing features, in practice we find that this attack is quite weak. We speculate that this is because the variance of the gradient estimator is too high. For this reason, in the empirical evaluation we focus on attacks using (6), but we believe that investigating this attack in practice is an interesting direction for future work. See Appendix C.6 for more details.
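The sketch below illustrates the Stein estimator on a hypothetical one-dimensional hard classifier, for which the smoothed expectation and its derivative are known in closed form and so can be checked against the estimate:

```python
import math, random

def f_hard(x):
    """Hard classifier's one-hot probability of class 1: the indicator of
    x > 0. Stein's identity needs no gradient of f, so a hard classifier works."""
    return 1.0 if x > 0.0 else 0.0

def stein_grad(f, x, sigma, m, rng):
    """Unbiased estimator of d/dx E_{delta ~ N(0, sigma^2)} f(x + delta)
    via Stein's lemma: the sample mean of (delta / sigma^2) * f(x + delta)."""
    total = 0.0
    for _ in range(m):
        d = rng.gauss(0.0, sigma)
        total += (d / sigma**2) * f(x + d)
    return total / m

rng = random.Random(0)
est = stein_grad(f_hard, 0.0, sigma=1.0, m=100_000, rng=rng)
# Here E f(x + delta) = Phi(x / sigma), whose derivative at x = 0 is
# 1 / sqrt(2 * pi) ~ 0.3989; the estimate lands close to it.
print(abs(est - 1.0 / math.sqrt(2.0 * math.pi)) < 0.02)
```

The weakness noted above is visible here too: many samples are needed before the estimate settles, consistent with the high variance of the estimator.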
3.2 Implementing adversarial training for smoothed classifiers
We incorporate adversarial training into the approach of Cohen et al. (2019), changing as few moving parts as possible in order to enable a direct comparison. In particular, we use the same network architectures, batch size, and learning rate schedule. For CIFAR-10, we change the number of epochs, but for ImageNet, we leave it the same. We discuss these specifics in Appendix D; here we describe how to perform adversarial training on a single minibatch. The algorithm is shown in Pseudocode 1, with the following parameters: the minibatch size, the number of noise samples m_train used both for gradient estimation in (6) and for Gaussian noise data augmentation, and the number of steps T of the attack. (Note that we reuse the same noise samples during every step of our attack as well as during augmentation; intuitively, this helps to stabilize the attack process.)
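The following is a minimal single-example sketch of this procedure, using a hypothetical one-parameter base classifier with closed-form gradients in place of a network with backpropagation; in one dimension the projection onto the ℓ2 ball reduces to a clamp, and we take the label to be y = 1 throughout:

```python
import math, random

def F(w, x):
    """Toy base soft classifier: sigmoid(w * x) is the probability of
    class y = 1. A stand-in for a deep network (hypothetical)."""
    return 1.0 / (1.0 + math.exp(-w * x))

def smoothadv_train_step(w, x, sigma, eps, T, m, lr, rng):
    """One SmoothAdv training step on a single example with label y = 1.
    As in the minibatch algorithm, the same m noise samples are reused for
    every attack step and for the final Gaussian-augmented weight update."""
    deltas = [rng.gauss(0.0, sigma) for _ in range(m)]
    # (1) attack: T signed gradient-ascent steps on the plug-in loss
    # -log((1/m) sum_i F(x' + d_i)), projected onto the eps-ball around x
    xp = x
    for _ in range(T):
        mean = sum(F(w, xp + d) for d in deltas) / m
        dmean = sum(F(w, xp + d) * (1.0 - F(w, xp + d)) * w
                    for d in deltas) / m
        grad_x = -dmean / mean                   # plug-in gradient wrt x'
        xp += eps * math.copysign(1.0, grad_x)   # ascend the loss
        xp = min(max(xp, x - eps), x + eps)      # project (1-D clamp)
    # (2) weight update: cross-entropy gradient at Gaussian perturbations
    # of the adversarial point x'
    grad_w = -sum((1.0 - F(w, xp + d)) * (xp + d) for d in deltas) / m
    return w - lr * grad_w, xp

rng = random.Random(0)
w_new, x_adv = smoothadv_train_step(1.0, 1.0, sigma=0.5, eps=0.5,
                                    T=2, m=64, lr=0.1, rng=rng)
print(x_adv)  # the attack saturates the ball: 1.0 - eps = 0.5
```

The fixed per-step size eps is for illustration only; the actual attacks use PGD or DDN step schedules over the ℓ2 ball.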
4 Experiments
We primarily compare with Cohen et al. (2019), as it was shown to outperform all other scalable provable defenses by a wide margin. As our experiments will demonstrate, our method consistently and significantly outperforms Cohen et al. (2019) even further, establishing the state-of-the-art for provable defenses. We run experiments on ImageNet (Deng et al., 2009) and CIFAR-10 (Krizhevsky and Hinton, 2009). We use the same base classifiers as Cohen et al. (2019): a ResNet-50 (He et al., 2016) on ImageNet, and a ResNet-110 on CIFAR-10. Other than the choice of attack (SmoothAdv_PGD or SmoothAdv_DDN) for adversarial training, our experiments are distinguished based on four main hyperparameters:
the noise level σ used to train and certify the smoothed classifier; the maximum allowed ℓ2 perturbation ε of the attack; the number of attack steps T; and the number of Gaussian noise samples m_train used to estimate the gradient in (6).
Given a smoothed classifier g, we use the same prediction and certification algorithms, Predict and Certify, as Cohen et al. (2019). Both algorithms sample base classifier predictions under Gaussian noise. Predict outputs the majority vote if the vote count passes a binomial hypothesis test, and abstains otherwise. Certify certifies the majority vote as robust if the fraction of such votes is higher, by a calculated margin, than the fraction of the next most popular votes, and abstains otherwise. For details of these algorithms, we refer the reader to Cohen et al. (2019).
The certified accuracy at radius r is defined as the fraction of the test set which g classifies correctly (without abstaining) and certifies robust at an ℓ2 radius r. Unless otherwise specified, we use the same σ for certification as the one used for training the base classifier f. Note that g is a randomized smoothing classifier, so this reported accuracy is approximate, but it can be made arbitrarily close to the true certified accuracy as the number of noise samples of f increases (see Cohen et al. (2019) for more details). Similarly, the empirical accuracy is defined as the fraction of the SmoothAdversarially attacked test set which g classifies correctly (without abstaining).
Both Predict and Certify have a parameter α defining the failure rate of these algorithms. Throughout the paper, we set α = 0.001 (as in Cohen et al. (2019)), which means there is at most a 0.1% chance that Predict does not return the most probable class under the smoothed classifier g, or that Certify falsely certifies a non-robust input.
4.1 SmoothAdversarial training
To assess the effectiveness of our method, we learn smoothed classifiers whose base classifiers are adversarially trained using (S). We then compute the certified accuracies over a range of ℓ2 radii r. Tables 1 and 2 report the certified accuracies of our method compared to Cohen et al. (2019). For all radii, we outperform the certified accuracies of Cohen et al. (2019) by a significant margin on both ImageNet and CIFAR-10. These results are elaborated below.
For CIFAR10
Fig. 1(left) plots the upper envelope of the certified accuracies obtained by choosing the best model for each radius over a grid of hyperparameters. This grid varies the noise level σ, the perturbation bound ε, and the number of noise samples m_train (see Section 4 for explanation), together with either the SmoothAdv_PGD or the SmoothAdv_DDN attack run for varying numbers of steps. The certified accuracies of each model can be found in Tables 4–12 in Appendix E. These results are compared to those of Cohen et al. (2019) by plotting their reported certified accuracies. Fig. 1(left) also plots the corresponding empirical accuracies obtained using SmoothAdv_PGD. Note that our certified accuracies are higher than the empirical accuracies of Cohen et al. (2019).
For ImageNet
The results are summarized in Fig. 2, which is analogous to Fig. 1 for CIFAR-10, the difference being the set of smoothed models we certify. This set includes smoothed models trained over a grid of σ, ε, and m_train, with either a 1-step or a 2-step attack. Again, our models outperform those of Cohen et al. (2019), both overall and per σ. The certified accuracies of each model can be found in Table 3 in Appendix E.
We point out, as mentioned by Cohen et al. (2019), that σ controls a robustness/accuracy tradeoff. When σ is low, small radii can be certified with high accuracy, but large radii cannot be certified at all. When σ is high, larger radii can be certified, but smaller radii are certified at a lower accuracy. This can be observed in the middle and right plots of Figs. 1 and 2.
Effect on clean accuracy
Training smoothed classifiers using SmoothAdv as shown improves upon the certified accuracy of Cohen et al. (2019) for each σ, although this comes with the well-known tendency of adversarial training to decrease standard accuracy, so we sometimes see small drops in the accuracy at r = 0, as observed in Fig. 1(right) and Fig. 2(right).
Additional experiments and observations
We compare the effectiveness of smoothed classifiers when they are trained SmoothAdversarially vs. when their base classifier is trained via standard adversarial training (we will refer to the latter as vanilla adversarial training). As expected, because the training objective of SmoothAdv models aligns with the actual certification objective, those models achieve noticeably higher certified robustness over all radii compared to smoothed classifiers resulting from vanilla adversarial training. We defer the results and details to Appendix C.1.
Furthermore, SmoothAdv requires the evaluation of (6), as discussed in Section 3. We analyze in Appendix C.2 how the number of Gaussian noise samples m_train, used in (6) to find adversarial examples, affects the robustness of the resulting smoothed models. As expected, we observe that models trained with higher m_train tend to have higher certified accuracies.
Finally, we analyze the effect of the maximum allowed ℓ2 perturbation ε used in SmoothAdv on the robustness of smoothed models in Appendix C.3.
4.2 Attacking trained models with SmoothAdv
In this section, we assess the performance of our attack, particularly SmoothAdv_PGD, for finding adversarial examples for the CIFAR-10 randomized smoothing models of Cohen et al. (2019).
SmoothAdv_PGD requires the evaluation of (6), as discussed in Section 3. Here, we analyze how sensitive our attack is to the number of samples m_test used in (6) to estimate the gradient of the adversarial objective. Fig. 3 shows the empirical accuracies for various values of m_test; lower accuracy corresponds to a stronger attack. SmoothAdv_PGD with a single sample performs worse than the vanilla PGD attack on the base classifier, but as m_test increases, our attack becomes stronger, decreasing the gap between certified and empirical accuracies. We did not observe any noticeable improvement from increasing m_test further. More details are in Appendix C.4.
While, as discussed here, the success rate of the attack is affected by the number of Gaussian noise samples used by the attacker, it is also affected by the number of Gaussian noise samples used by the classifier in Predict. Indeed, as this number increases, abstention due to low confidence becomes rarer, increasing the prediction quality of the smoothed classifier. See the detailed analysis in Appendix C.5.
5 Related Work
Recently, many approaches (defenses) have been proposed to build adversarially robust classifiers, and these approaches can be broadly divided into empirical defenses and certified defenses.
Empirical defenses are empirically robust to existing adversarial attacks, and the best empirical defense so far is adversarial training (Kurakin et al., 2016; Madry et al., 2017). In this kind of defense, a neural network is trained to minimize the worst-case loss over a neighborhood around the input. Although such defenses seem powerful, nothing guarantees that a more powerful, not yet known, attack would not break them; the most that can be said is that known attacks are unable to find adversarial examples around the data points. In fact, most empirical defenses proposed in the literature were later “broken” by stronger adversaries (Carlini and Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018; Athalye and Carlini, 2018). To stop this arms race between defenders and attackers, a number of works have focused on building certified defenses, which enjoy formal robustness guarantees.
Certified defenses are provably robust to a specific class of adversarial perturbations, and can guarantee that for any input x, the classifier’s prediction is constant within a neighborhood of x. These are typically based on certification methods which are either exact (a.k.a. “complete”) or conservative (a.k.a. “sound but incomplete”). Exact methods, usually based on Satisfiability Modulo Theories solvers (Katz et al., 2017; Ehlers, 2017) or mixed integer linear programming (Tjeng et al., 2019; Lomuscio and Maganti, 2017; Fischetti and Jo, 2017), are guaranteed to find an adversarial example around a datapoint if one exists. Unfortunately, they are computationally inefficient and difficult to scale up to large neural networks. Conservative methods are also guaranteed to detect an adversarial example if one exists, but they might mistakenly flag a safe data point as vulnerable to adversarial examples. On the bright side, these methods are more scalable and efficient, which makes some of them useful for building certified defenses (Wong and Kolter, 2018; Wang et al., 2018a, b; Raghunathan et al., 2018a, b; Wong et al., 2018; Dvijotham et al., 2018b, a; Croce et al., 2018; Gehr et al., 2018; Mirman et al., 2018; Singh et al., 2018; Gowal et al., 2018; Weng et al., 2018; Zhang et al., 2018). However, none of them have yet been shown to scale to practical networks that are large and expressive enough to perform well on ImageNet, for example. To scale up to practical networks, randomized smoothing has been proposed as a probabilistically certified defense.
Randomized smoothing
A randomized smoothing classifier is not itself a neural network, but uses a neural network as its base for classification. Randomized smoothing was proposed by several works (Liu et al., 2018; Cao and Gong, 2017) as a heuristic defense without proving any guarantees. Lecuyer et al. (2018) first proved robustness guarantees for randomized smoothing classifiers, utilizing inequalities from the differential privacy literature. Subsequently, Li et al. (2018) gave a stronger robustness guarantee using tools from information theory. Recently, Cohen et al. (2019) provided a tight robustness guarantee for randomized smoothing and consequently achieved the state of the art in ℓ2-norm certified defense.
6 Conclusions
In this paper, we designed an adapted attack for smoothed classifiers, and we showed how this attack can be used in an adversarial training setting to substantially improve the provable robustness of smoothed classifiers. We demonstrated through extensive experimentation that our adversarially trained smoothed classifiers consistently outperform all existing provably robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state of the art for provable defenses.
Acknowledgements
We would like to thank Zico Kolter, Jeremy Cohen, Elan Rosenfeld, Aleksander Madry, Andrew Ilyas, Dimitris Tsipras, Shibani Santurkar, and Jacob Steinhardt for comments and discussions.
References
 Athalye and Carlini (2018) Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286, 2018.
 Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
 Cao and Gong (2017) Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via regionbased classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287. ACM, 2017.
 Carlini and Wagner (2017) Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
 Cohen et al. (2019) Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
 Croce et al. (2018) Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018.
 Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, LiJia Li, Kai Li, and Li FeiFei. Imagenet: A largescale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
 Dvijotham et al. (2018a) Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O’Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018a.
 Dvijotham et al. (2018b) Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. UAI, 2018b.
 Ehlers (2017) Ruediger Ehlers. Formal verification of piecewise linear feedforward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.
 Fischetti and Jo (2017) Matteo Fischetti and Jason Jo. Deep neural networks as 01 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174, 2017.
 Gehr et al. (2018) Timon Gehr, Matthew Mirman, Dana DrachslerCohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2018.
 Goodfellow et al. (2015) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.
 Gowal et al. (2018) Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
 He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
 Katz et al. (2017) Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
 Krizhevsky and Hinton (2009) Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
 Lecuyer et al. (2018) Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.
 Li et al. (2018) Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.
 Liu et al. (2018) Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.
 Lomuscio and Maganti (2017) Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.
 Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
 Mirman et al. (2018) Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3575–3583, 2018.
 Raghunathan et al. (2018a) Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. International Conference on Learning Representations (ICLR), arXiv preprint arXiv:1801.09344, 2018a.
 Raghunathan et al. (2018b) Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877–10887, 2018b.
 Rony et al. (2018) Jérôme Rony, Luiz G Hafemann, Luis S Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. arXiv preprint arXiv:1811.09600, 2018.
 Singh et al. (2018) Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10825–10836, 2018.
 Stein (1981) Charles M Stein. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, pages 1135–1151, 1981.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Tjeng et al. (2019) Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyGIdiRqtm.
 Uesato et al. (2018) Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.
 Wang et al. (2018a) Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018a.
 Wang et al. (2018b) Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pages 6369–6379, 2018b.
 Weng et al. (2018) Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S Dhillon, and Luca Daniel. Towards fast computation of certified robustness for ReLU networks. In International Conference on Machine Learning, 2018.
 Wong and Kolter (2018) Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), pages 5283–5292, 2018.
 Wong et al. (2018) Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. Advances in Neural Information Processing Systems (NIPS), 2018.
 Zhang et al. (2018) Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.
Appendix A Alternative proof of the robustness guarantee of Cohen et al. [2019] via explicit Lipschitz constants of the smoothed classifier
Fix $\sigma > 0$ and a measurable function $f : \mathbb{R}^d \to [0,1]$, and define $\hat f : \mathbb{R}^d \to [0,1]$ by:
$$\hat f(x) := \mathop{\mathbb{E}}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[f(x+\delta)\big] = \frac{1}{(2\pi\sigma^2)^{d/2}} \int_{\mathbb{R}^d} f(t)\, e^{-\frac{\|x-t\|^2}{2\sigma^2}}\, dt.$$
Lemma 1.
The function $\hat f$ is $\frac{1}{\sigma}\sqrt{2/\pi}$-Lipschitz.
Proof.
It suffices to prove that for any unit direction $u$ one has $u \cdot \nabla \hat f(x) \le \frac{1}{\sigma}\sqrt{2/\pi}$. Note that:
$$\nabla \hat f(x) = \frac{1}{(2\pi\sigma^2)^{d/2}} \int_{\mathbb{R}^d} f(t)\, \frac{t-x}{\sigma^2}\, e^{-\frac{\|x-t\|^2}{2\sigma^2}}\, dt \qquad (8)$$
and thus (using $f \le 1$, and classical integration of the Gaussian density)
$$u \cdot \nabla \hat f(x) \le \frac{1}{(2\pi\sigma^2)^{d/2}} \int_{\mathbb{R}^d} \frac{|u \cdot (t-x)|}{\sigma^2}\, e^{-\frac{\|x-t\|^2}{2\sigma^2}}\, dt = \frac{1}{\sigma^2}\, \mathbb{E}\,|u \cdot \delta| = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}. \qquad ∎$$
A very nice observation of Cohen et al. [2019] is that the function $\Phi^{-1} \circ \hat f$, where $\Phi$ is the standard Gaussian CDF, actually satisfies a stronger smoothness property:
Lemma 2.
Let $\Phi$ be the CDF of a standard one-dimensional Gaussian. Assume that $\hat f(x) \in (0,1)$ for all $x$. Then the function $x \mapsto \sigma\, \Phi^{-1}(\hat f(x))$ is $1$-Lipschitz.
Proof.
Note that:
$$\nabla\big(\Phi^{-1} \circ \hat f\big)(x) = \frac{\nabla \hat f(x)}{\Phi'\big(\Phi^{-1}(\hat f(x))\big)},$$
and thus we need to prove that for any unit direction $u$, denoting $p := \hat f(x)$,
$$u \cdot \nabla \hat f(x) \le \frac{1}{\sigma}\, \Phi'\big(\Phi^{-1}(p)\big).$$
Note that the left-hand side can be written as follows (recall (8))
$$u \cdot \nabla \hat f(x) = \frac{1}{\sigma^2}\, \mathop{\mathbb{E}}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[f(x+\delta)\,(u \cdot \delta)\big].$$
Invoking a simple symmetry argument, one can actually compute the supremum of the above quantity over all functions $f : \mathbb{R}^d \to [0,1]$, subject to the constraint that $\mathbb{E}[f(x+\delta)] = p$ (the supremum is attained by the indicator $f(x+\delta) = \mathbf{1}\{u \cdot \delta \ge -\sigma\,\Phi^{-1}(p)\}$), namely it is equal to:
$$\frac{1}{\sigma^2} \int_{-\sigma\Phi^{-1}(p)}^{+\infty} \frac{s}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{s^2}{2\sigma^2}}\, ds = \frac{1}{\sigma}\, \Phi'\big(\Phi^{-1}(p)\big),$$
thus concluding the proof. ∎
Both lemmas give the same robustness guarantee for small gaps, but the second lemma is much better for large gaps (in fact, in the limit of a gap going to $1$, the second lemma gives an infinite radius while the first lemma only gives a finite radius of order $\sigma$).
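In one dimension these bounds are easy to check numerically. The sketch below (the threshold function $f(z) = \mathbf{1}\{z > 0\}$ and $\sigma = 0.5$ are illustrative choices, not from the paper) verifies that a Monte Carlo estimate of $\hat f$ matches $\Phi(x/\sigma)$, that $\sigma\,\Phi^{-1}(\hat f(x)) = x$ (so the Lemma 2 map is exactly $1$-Lipschitz for this $f$), and that the maximum slope of $\hat f$ respects the Lemma 1 bound:

```python
import math
import random
from statistics import NormalDist

# Toy setting: d = 1, f(z) = 1{z > 0}. The smoothed function has the closed
# form f_hat(x) = Phi(x / sigma), so sigma * Phi^{-1}(f_hat(x)) = x, i.e.,
# the map of Lemma 2 is exactly 1-Lipschitz here (the bound is tight).
sigma = 0.5
nd = NormalDist()  # standard Gaussian: .cdf = Phi, .inv_cdf = Phi^{-1}

def f_hat(x, n=200_000, seed=0):
    """Monte Carlo estimate of E_{delta ~ N(0, sigma^2)} [ 1{x + delta > 0} ]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if x + rng.gauss(0.0, sigma) > 0)
    return hits / n

for x in (0.1, 0.3, 0.6):
    est = f_hat(x)
    assert abs(est - nd.cdf(x / sigma)) < 5e-3       # matches Phi(x / sigma)
    assert abs(sigma * nd.inv_cdf(est) - x) < 2e-2   # Lemma 2 map recovers x

# Lemma 1: the slope of f_hat is at most sqrt(2/pi)/sigma everywhere. For this
# f the maximum slope is Phi'(0)/sigma = 1/(sigma*sqrt(2*pi)), which indeed
# satisfies the (looser) Lemma 1 bound.
max_slope = 1.0 / (sigma * math.sqrt(2 * math.pi))
assert max_slope <= math.sqrt(2 / math.pi) / sigma
```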
Appendix B Another perspective for deriving SmoothAdv
In this section we provide an alternative motivation for the SmoothAdv objective presented in Section 2.2. We assume that we have a hard classifier $F$ which takes the form $F(x) = \arg\max_{c \in \mathcal{Y}} f_c(x)$, for some function $f : \mathbb{R}^d \to \mathbb{R}^{|\mathcal{Y}|}$. If $F$ is a neural network classifier, $f$ can be taken for instance to be the map from the input to the logit layer immediately preceding the softmax. If $F$ is of this form, then the smoothed soft classifier with parameter $\sigma$ associated to (the one-hot encoding of) $F$ can be written as
$$G(x) = \mathop{\mathbb{E}}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[M(f(x+\delta))\big] \qquad (9)$$
for all $x$, where $M$ is the $\arg\max$ function, which at input $z \in \mathbb{R}^{|\mathcal{Y}|}$ has $c$-th coordinate equal to $1$ if and only if $z_c = \max_{c'} z_{c'}$, and zero otherwise. The function $M$ is somewhat hard to work with, therefore we will approximate it with a smooth function, namely, the softmax function. Recall that the softmax function with inverse temperature parameter $\beta$ is the function $S^\beta : \mathbb{R}^{|\mathcal{Y}|} \to \mathbb{R}^{|\mathcal{Y}|}$ given by $S^\beta(z)_c = e^{\beta z_c} / \sum_{c'} e^{\beta z_{c'}}$. Observe that for any $z$ with a unique maximal coordinate, we have that $S^\beta(z) \to M(z)$ as $\beta \to \infty$. Thus we can approximate (9) with
$$G^\beta(x) = \mathop{\mathbb{E}}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[S^\beta(f(x+\delta))\big] \qquad (10)$$
To find an adversarial perturbation of $G$ at data point $(x, y)$, it is sufficient to find a perturbation $\hat x$ so that $G(\hat x)_y$ is minimized. Combining this with the approximation (10), we find that a heuristic to find an adversarial example for the smoothed classifier at $(x, y)$ is to solve the following optimization problem:
$$\min_{\|\hat x - x\|_2 \le \epsilon}\; \mathop{\mathbb{E}}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[S^\beta(f(\hat x+\delta))_y\big] \qquad (11)$$
and as we let $\beta \to \infty$, this converges to finding an adversarial example for the true smoothed classifier $G$.
To conclude, we simply observe that for neural networks, $S^1(f(x))$ is exactly the soft classifier that is thresholded to form the hard classifier, if $f$ is taken to be the logits. Therefore the solutions to the SmoothAdv objective of Section 2.2 and to (11) with $\beta = 1$ are the same, since $\log$ is a monotonic function.
An interesting direction is to investigate whether varying $\beta$ in (11) allows us to improve our adversarial attacks, and if so, whether this gives us stronger adversarial training as well. Intuitively, as we take $\beta \to \infty$, the quality of the optimal solution should increase, but the optimization problem becomes increasingly ill-behaved, and so it is not clear whether the actual solution we obtain via first-order methods becomes better or not.
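The convergence of the softmax surrogate (10) to the smoothed classifier (9) as $\beta \to \infty$ can be illustrated on a toy one-dimensional example. The logits $f(z) = (z, 0)$, the value $\sigma = 0.5$, and the sample sizes below are illustrative assumptions, not settings from the paper:

```python
import math
import random
from statistics import NormalDist

# Toy logits f(z) = (z, 0): the hard classifier picks class 0 iff z > 0, so
# the smoothed soft classifier (9) has the closed form G(x)_0 = Phi(x / sigma).
# The softmax surrogate (10) is G_beta(x)_0 = E[ sigmoid(beta * (x + delta)) ],
# which should approach G(x)_0 as the inverse temperature beta grows.
sigma, x = 0.5, 0.3
nd = NormalDist()

def G_beta(beta, n=100_000, seed=1):
    rng = random.Random(seed)  # common random numbers across beta values
    return sum(1.0 / (1.0 + math.exp(-beta * (x + rng.gauss(0.0, sigma))))
               for _ in range(n)) / n

target = nd.cdf(x / sigma)  # the exact smoothed probability G(x)_0
errs = [abs(G_beta(b) - target) for b in (1.0, 10.0, 100.0)]
assert errs[0] > errs[1] > errs[2]   # the approximation improves with beta
assert errs[2] < 1e-2                # beta = 100 is already close
```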
Appendix C Additional Experiments
C.1 Adversarially attacking the base model instead of the smoothed model
We compare SmoothAdversarial training (adversarially training the smoothed classifier $g$) to:

using vanilla adversarial training (PGD) to find adversarial examples of the base classifier $f$ and train on them. We refer to this as Vanilla PGD training.

using vanilla adversarial training (PGD) to find adversarial examples of the base classifier $f$, add Gaussian noise to them, then train on the resulting inputs. We refer to this as Vanilla PGD+noise training.
For our method and the above two methods, we use the same number of attack steps $T$, the same maximum allowed perturbation $\epsilon$, and the same set of training noise levels $\sigma$.
Fig. 4 plots the best certified accuracies over all hyperparameter settings, at each radius, using our trained classifiers vs. smoothed models trained via Vanilla PGD or Vanilla PGD+noise. Fig. 4 also plots the results of Cohen et al. [2019] as a baseline. Observe that SmoothAdversarially trained models are more robust overall.
C.2 Effect of the number of noise samples in (6) during SmoothAdversarial training on the certified accuracy of smoothed classifiers
As presented in Section 4.2, more noise samples lead to a stronger SmoothAdv attack. Here, we demonstrate that if we train with such improved attacks, we get higher certified accuracies for the smoothed classifier. Fig. 5 plots the best certified accuracies of models trained with various numbers of noise samples $m_{\text{train}}$ used by the attack. Observe that models trained with higher $m_{\text{train}}$ tend to have higher certified accuracies.
C.3 Effect of $\epsilon$ during training on the certified accuracy of smoothed classifiers
Here, we analyze the effect of the maximum allowed perturbation $\epsilon$ of SmoothAdv during adversarial training on the robustness of the obtained smoothed classifier. Fig. 6 plots the best certified accuracies of models trained with various values of $\epsilon$. Observe that as $\epsilon$ increases, the certified accuracies for small radii decrease, but those for large radii increase, which is expected.
C.4 Effect of the number of samples in (6) during the SmoothAdv attack on the empirical accuracies
SmoothAdv requires the evaluation of (6), as discussed in Section 3. Here, we analyze how sensitive our attack is to the number of samples $m_{\text{test}}$ used in (6). Fig. 3 shows the empirical accuracies for various values of $m_{\text{test}}$. Lower accuracies correspond to stronger attacks. For small $m_{\text{test}}$, the vanilla PGD attack (attacking the base classifier instead of the smoothed classifier) performs better than SmoothAdv, but as $m_{\text{test}}$ increases, our attack becomes stronger, decreasing the gap between certified and empirical accuracies. We did not observe any noticeable improvement beyond a certain value of $m_{\text{test}}$.
C.5 Effect of the number of Monte Carlo samples in Predict on the empirical accuracies
Fig. 7 plots the empirical accuracies under a SmoothAdv-DDN attack across different numbers of Monte Carlo samples $n$ that are used by Predict. Observe that the empirical accuracies increase as $n$ increases, since the prediction quality of the smoothed classifier improves, i.e., fewer predictions are abstained.
C.6 Performance of the gradient-free estimator (7)
Despite the appealing features of the gradient-free estimator (7) presented in Section 3.1 as an alternative to (6), in practice we find that this attack is quite weak. This is shown in Fig. 8.
We speculate that this is because the variance of the gradient estimator is too high. We believe that investigating this attack in practice is an interesting direction for future work.
Appendix D Experiments Details
Here we include details of all the experiments conducted in this paper.
Attacks used in the paper
We use two of the strongest attacks in the literature: projected gradient descent (PGD) [Madry et al., 2017] and decoupled direction and norm (DDN) [Rony et al., 2018]. We adapt these attacks so that their gradient steps are given by (6), and we call the resulting attacks SmoothAdv-PGD and SmoothAdv-DDN, respectively.
For PGD (SmoothAdv-PGD), we use a constant step size set from the number of attack steps $T$ and the maximum allowed perturbation $\epsilon$ of the input.
For DDN (SmoothAdv-DDN), the attack objective is in fact different from that of PGD: DDN tries to find the "closest" adversarial example to the input instead of the "best" adversarial example (in terms of maximizing the loss in a given neighborhood of the input). We stick to the hyperparameters used in the original paper [Rony et al., 2018], with an initial step size that is reduced with cosine annealing to 0.01 in the last iteration (see Rony et al. [2018] for the definition of these parameters). We experimented with very few iterations compared to the original paper, but we still got good results.
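For concreteness, the cosine-annealed step-size schedule described above can be sketched as follows; the initial step size and the number of iterations here are illustrative assumptions, and only the final value 0.01 comes from the text:

```python
import math

def cosine_annealed_steps(gamma0, gamma_final, T):
    """Step sizes gamma_t decaying from gamma0 to gamma_final over T
    iterations via the standard cosine-annealing schedule."""
    return [gamma_final + 0.5 * (gamma0 - gamma_final)
            * (1 + math.cos(math.pi * t / (T - 1))) for t in range(T)]

# Illustrative values: gamma0 = 1.0 and T = 20 are assumptions, not the
# paper's settings; the final step size 0.01 matches the text above.
steps = cosine_annealed_steps(1.0, 0.01, 20)
assert abs(steps[0] - 1.0) < 1e-12 and abs(steps[-1] - 0.01) < 1e-12
assert all(a >= b for a, b in zip(steps, steps[1:]))  # monotone decreasing
```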
We emphasize that we are not using PGD and DDN to attack the base classifier of a smoothed model; instead, we are using their adapted variants to adversarially train smoothed classifiers (see Pseudocode 1).
Training details
In order to report certified radii in the original coordinates, we first add Gaussian noise and/or compute adversarial attacks, and then standardize the data (in contrast to importing a standardized dataset). Specifically, in our PyTorch implementation, the first layer of the base classifier is a normalization layer that performs a channel-wise standardization of its input.
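As a sketch of this design, the snippet below shows such a normalization first layer in pure Python; the channel statistics are commonly used CIFAR-10 values and are illustrative, not necessarily the ones in our code:

```python
# Sketch of a normalization first layer: certification and attacks operate in
# the original [0, 1] image coordinates, and the network standardizes its
# input internally. Channel statistics below are illustrative CIFAR-10 values.
MEAN = [0.4914, 0.4822, 0.4465]
STD = [0.2470, 0.2435, 0.2616]

def normalize(image):
    """Channel-wise standardization of an image given as [channel][pixel]."""
    return [[(v - m) / s for v in channel]
            for channel, m, s in zip(image, MEAN, STD)]

def denormalize(image):
    """Inverse map, back to the original [0, 1] coordinates."""
    return [[v * s + m for v in channel]
            for channel, m, s in zip(image, MEAN, STD)]

img = [[0.0, 0.5, 1.0], [0.25, 0.5, 0.75], [1.0, 0.0, 0.5]]  # 3 channels
z = normalize(img)
back = denormalize(z)
assert all(abs(a - b) < 1e-12 for ch_a, ch_b in zip(back, img)
           for a, b in zip(ch_a, ch_b))
```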
For both ImageNet and CIFAR10, we trained the base classifier with random horizontal flips and random crops (in addition to the Gaussian data augmentation discussed in Section 3.2).
The main training algorithm is shown in Pseudocode 1. It has the following parameters: the minibatch size, the number of noise samples $m_{\text{train}}$ used both for gradient estimation in (6) and for Gaussian noise data augmentation, and the number of attack steps $T$.
We point out a few remarks.

First, an important parameter is the radius $\epsilon$ of the attack. During the first epoch, it is set to zero; we then increase it linearly over the first ten epochs, after which it stays constant.

Second, we reuse the same noise samples during every step of our attack as well as for augmentation. Intuitively, this helps to stabilize the attack process.

Finally, the way training is described in Pseudocode 1 is not efficient; it needs to be appropriately batched so that we compute adversarial examples for every input in a batch at the same time.
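Putting the pieces together, the following is a minimal, dependency-free sketch of a SmoothAdv-PGD attack on a toy two-class linear soft classifier. All names, the analytic gradient, and the constants are illustrative; in practice the gradient of (6) is obtained by backpropagation through the network:

```python
import math
import random

# Toy 2-class linear soft classifier: p_y(z) = sigmoid(w . z) for the true
# class y. The attack ascends the Monte Carlo objective -log(mean_i p_y(x'+d_i))
# (cf. (6)), reuses the SAME noise draws d_i at every step (as in Pseudocode 1),
# and projects onto the l2 ball of radius eps. All constants are illustrative.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def smooth_adv_pgd(x, w, deltas, eps, steps, lr):
    x_adv = list(x)
    for _ in range(steps):
        # p_i and the analytic gradient of -log(mean_i p_i) w.r.t. x_adv
        ps = [sigmoid(sum(wj * (xj + dj) for wj, xj, dj in zip(w, x_adv, d)))
              for d in deltas]
        coef = -sum(p * (1 - p) for p in ps) / sum(ps)
        grad = [coef * wj for wj in w]
        # l2 steepest-ascent step, then project back onto the eps-ball around x
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        x_adv = [xj + lr * g / norm for xj, g in zip(x_adv, grad)]
        diff = [xa - xj for xa, xj in zip(x_adv, x)]
        dnorm = math.sqrt(sum(d * d for d in diff))
        if dnorm > eps:
            x_adv = [xj + d * eps / dnorm for xj, d in zip(x, diff)]
    return x_adv

rng = random.Random(0)
w, x, sigma = [1.0, -2.0], [0.5, -0.5], 0.25
deltas = [[rng.gauss(0, sigma), rng.gauss(0, sigma)] for _ in range(8)]

def mean_p(z):  # Monte Carlo smoothed probability of the true class at z
    return sum(sigmoid(sum(wj * (zj + dj) for wj, zj, dj in zip(w, z, d)))
               for d in deltas) / len(deltas)

x_adv = smooth_adv_pgd(x, w, deltas, eps=0.5, steps=10, lr=0.1)
assert mean_p(x_adv) < mean_p(x)                     # smoothed score went down
assert math.sqrt(sum((a - b) ** 2 for a, b in zip(x_adv, x))) <= 0.5 + 1e-9
```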
Compute details and training time
On CIFAR-10, we train using SGD on one NVIDIA P100 GPU. We train for 150 epochs. We use a batch size of 256 and an initial learning rate of 0.1, which drops by a factor of 10 every 50 epochs. Training time varies from a few hours to a few days, depending on how many attack steps and noise samples are used in Pseudocode 1.
On ImageNet, we train with synchronous SGD on four NVIDIA V100 GPUs. We train for 90 epochs. We use a batch size of 400 and an initial learning rate of 0.1, which drops by a factor of 10 every 30 epochs. Training time varies between 2 and 6 days, depending on whether we are doing SmoothAdversarial training or just Gaussian noise training (similar to Cohen et al. [2019]).
Models used
The models used in this paper are similar to those used in Cohen et al. [2019]: a ResNet-50 [He et al., 2016] on ImageNet, and a ResNet-110 on CIFAR-10. These models can be found in the GitHub repo accompanying Cohen et al. [2019]: https://github.com/locuslab/smoothing/blob/master/code/architectures.py.
Parameters of Certify and Predict
For details of these algorithms, please see the Pseudocode in Cohen et al. [2019].
For Certify, unless otherwise specified, we use fixed values of the numbers of Monte Carlo samples $n$ and $n_0$ and of the failure probability $\alpha$.
For Predict, unless otherwise specified, we use fixed values of the number of Monte Carlo samples $n$ and of the failure probability $\alpha$.
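For reference, the abstention rule of Predict (following Cohen et al. [2019]) amounts to a two-sided binomial test on the counts of the top two classes; a sketch with illustrative counts and $\alpha$:

```python
from math import comb

# Sketch of the abstention rule in Predict (following Cohen et al. [2019]):
# draw n noisy samples, count the top two classes, and return the top class
# only if a two-sided binomial test rejects "the two classes are equally
# likely" at level alpha. The counts below are illustrative, not measured.

def binom_p_value(k, n, p=0.5):
    """Two-sided p-value for k successes (k >= n/2) under Binomial(n, 0.5)."""
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def predict(counts, alpha):
    """counts: per-class counts from n noisy samples; returns class or None."""
    (top, n_a), (_, n_b) = sorted(counts.items(), key=lambda kv: -kv[1])[:2]
    return top if binom_p_value(n_a, n_a + n_b) <= alpha else None

assert predict({"cat": 90, "dog": 10}, alpha=0.001) == "cat"  # clear majority
assert predict({"cat": 52, "dog": 48}, alpha=0.001) is None   # abstain
```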
Source code
Our code and trained models are publicly available at http://github.com/Hadisalman/smoothingadversarial. The repository also includes all our training/certification logs, which enables the replication of all the results of this paper by running a single piece of code. Check the repository for more details.
Appendix E ImageNet and CIFAR-10 Detailed Results
In this appendix, we include the certified accuracies of each model that we use in the paper. For each radius, we highlight the best accuracy across all models. Note that we outperform the models of Cohen et al. [2019] (first rows of each table) over all radii by wide margins.
Radius $r$ (ImageNet): columns correspond to $r = 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5$.
Cohen et al. [2019]  0.67  0.49  0.00  0.00  0.00  0.00  0.00  0.00  
0.57  0.46  0.37  0.29  0.00  0.00  0.00  0.00  
0.44  0.38  0.33  0.26  0.19  0.15  0.12  0.09  
0.63  0.54  0.00  0.00  0.00  0.00  0.00  0.00  
0.62  0.54  0.00  0.00  0.00  0.00  0.00  0.00  
0.56  0.52  0.00  0.00  0.00  0.00  0.00  0.00  
0.49  0.45  0.00  0.00  0.00  0.00  0.00  0.00  
0.56  0.48  0.42  0.34  0.00  0.00  0.00  0.00  
0.54  0.49  0.43  0.37  0.00  0.00  0.00  0.00  
0.48  0.45  0.42  0.37  0.00  0.00  0.00  0.00  
0.44  0.42  0.39  0.37  0.00  0.00  0.00  0.00  
0.44  0.38  0.34  0.29  0.24  0.20  0.15  0.11  
0.41  0.36  0.34  0.31  0.26  0.21  0.18  0.14  
0.40  0.37  0.34  0.30  0.27  0.25  0.20  0.15  
0.34  0.31  0.29  0.27  0.25  0.22  0.19  0.16  
0.66  0.52  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.56  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.54  0.00  0.00  0.00  0.00  0.00  0.00  
0.67  0.55  0.00  0.00  0.00  0.00  0.00  0.00  
0.59  0.48  0.38  0.29  0.00  0.00  0.00  0.00  
0.55  0.49  0.40  0.32  0.00  0.00  0.00  0.00  
0.58  0.49  0.42  0.34  0.00  0.00  0.00  0.00  
0.58  0.51  0.41  0.32  0.00  0.00  0.00  0.00  
0.44  0.37  0.31  0.26  0.20  0.16  0.11  0.08  
0.46  0.39  0.32  0.26  0.22  0.17  0.11  0.09  
0.45  0.39  0.34  0.27  0.23  0.16  0.13  0.09  
0.44  0.39  0.34  0.28  0.22  0.16  0.12  0.08 
Radius $r$ (CIFAR-10): columns correspond to $r = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25$.
Cohen et al. [2019]  0.81  0.59  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.75  0.60  0.43  0.27  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.55  0.41  0.32  0.23  0.15  0.09  0.05  0.00  0.00  
0.47  0.39  0.34  0.28  0.22  0.17  0.14  0.12  0.10  0.08  
0.84  0.69  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.75  0.63  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.75  0.63  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.78  0.64  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.77  0.65  0.49  0.33  0.00  0.00  0.00  0.00  0.00  0.00  
0.70  0.57  0.50  0.40  0.00  0.00  0.00  0.00  0.00  0.00  
0.72  0.58  0.45  0.35  0.00  0.00  0.00  0.00  0.00  0.00  
0.71  0.60  0.48  0.36  0.00  0.00  0.00  0.00  0.00  0.00  
0.66  0.55  0.44  0.34  0.25  0.18  0.12  0.08  0.00  0.00  
0.68  0.55  0.49  0.33  0.26  0.17  0.11  0.10  0.00  0.00  
0.63  0.52  0.44  0.33  0.25  0.18  0.14  0.09  0.00  0.00  
0.63  0.52  0.44  0.36  0.28  0.20  0.15  0.10  0.00  0.00  
0.50  0.42  0.34  0.27  0.22  0.19  0.15  0.13  0.10  0.07  
0.47  0.39  0.34  0.27  0.23  0.18  0.16  0.13  0.11  0.08  
0.48  0.41  0.35  0.29  0.25  0.20  0.16  0.14  0.12  0.09  
0.43  0.40  0.34  0.28  0.25  0.21  0.17  0.14  0.12  0.10  
0.82  0.69  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.79  0.67  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.71  0.64  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.54  0.49  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.77  0.65  0.51  0.39  0.00  0.00  0.00  0.00  0.00  0.00  
0.74  0.64  0.53  0.41  0.00  0.00  0.00  0.00  0.00  0.00  
0.64  0.59  0.53  0.45  0.00  0.00  0.00  0.00  0.00  0.00  
0.53  0.49  0.46  0.42  0.00  0.00  0.00  0.00  0.00  0.00  
0.64  0.56  0.46  0.38  0.30  0.23  0.15  0.10  0.00  0.00  
0.63  0.55  0.47  0.39  0.30  0.24  0.19  0.14  0.00  0.00  
0.57  0.52  0.46  0.41  0.33  0.28  0.23  0.18  0.00  0.00  
0.47  0.45  0.41  0.39  0.35  0.31  0.26  0.23  0.00  0.00  
0.48  0.42  0.36  0.30  0.25  0.21  0.17  0.14  0.12  0.09  
0.47  0.41  0.37  0.31  0.27  0.22  0.20  0.17  0.14  0.12  
0.46  0.41  0.37  0.33  0.28  0.24  0.22  0.18  0.16  0.14  
0.39  0.37  0.34  0.30  0.27  0.25  0.22  0.20  0.18  0.15 
Radius $r$ (CIFAR-10): columns correspond to $r = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25$.
Cohen et al. [2019]  0.81  0.59  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.75  0.60  0.43  0.27  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.55  0.41  0.32  0.23  0.15  0.09  0.05  0.00  0.00  
0.47  0.39  0.34  0.28  0.22  0.17  0.14  0.12  0.10  0.08  
0.82  0.68  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.80  0.67  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.78  0.69  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.78  0.67  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.77  0.64  0.50  0.38  0.00  0.00  0.00  0.00  0.00  0.00  
0.70  0.61  0.50  0.40  0.00  0.00  0.00  0.00  0.00  0.00  
0.72  0.61  0.53  0.42  0.00  0.00  0.00  0.00  0.00  0.00  
0.72  0.63  0.54  0.40  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.57  0.47  0.37  0.27  0.19  0.12  0.07  0.00  0.00  
0.64  0.54  0.45  0.35  0.28  0.20  0.15  0.10  0.00  0.00  
0.63  0.54  0.46  0.38  0.30  0.23  0.16  0.11  0.00  0.00  
0.63  0.53  0.44  0.36  0.29  0.22  0.17  0.10  0.00  0.00  
0.48  0.41  0.34  0.29  0.22  0.19  0.17  0.14  0.10  0.09  
0.47  0.40  0.34  0.28  0.23  0.20  0.17  0.14  0.11  0.09  
0.47  0.39  0.34  0.28  0.24  0.21  0.18  0.15  0.13  0.09  
0.48  0.40  0.35  0.30  0.25  0.21  0.17  0.14  0.12  0.09  
0.83  0.69  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.81  0.69  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.72  0.63  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.56  0.52  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.76  0.66  0.51  0.39  0.00  0.00  0.00  0.00  0.00  0.00  
0.69  0.63  0.53  0.42  0.00  0.00  0.00  0.00  0.00  0.00  
0.66  0.59  0.53  0.46  0.00  0.00  0.00  0.00  0.00  0.00  
0.53  0.49  0.45  0.42  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.57  0.47  0.37  0.29  0.23  0.16  0.09  0.00  0.00  
0.62  0.54  0.48  0.40  0.29  0.25  0.19  0.14  0.00  0.00  
0.56  0.50  0.44  0.39  0.34  0.30  0.23  0.18  0.00  0.00  
0.47  0.44  0.41  0.38  0.34  0.31  0.27  0.24  0.00  0.00  
0.49  0.42  0.36  0.30  0.25  0.21  0.18  0.14  0.12  0.10  
0.48  0.43  0.37  0.30  0.26  0.24  0.19  0.16  0.14  0.12  
0.45  0.40  0.37  0.34  0.30  0.25  0.21  0.19  0.17  0.15  
0.37  0.35  0.32  0.30  0.28  0.25  0.23  0.19  0.17  0.15 
Radius $r$ (CIFAR-10): columns correspond to $r = 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25$.
Cohen et al. [2019]  0.81  0.59  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.75  0.60  0.43  0.27  0.00  0.00  0.00  0.00  0.00  0.00  
0.65  0.55  0.41  0.32  0.23  0.15  0.09  0.05  0.00  0.00  
0.47  0.39  0.34  0.28  0.22  0.17  0.14  0.12  0.10  0.08  
0.81  0.66  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.81  0.67  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.76  0.66  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.80  0.67  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.77  0.65  0.49  0.36  0.00  0.00  0.00  0.00  0.00  0.00  
0.75  0.64  0.51  0.37  0.00  0.00  0.00  0.00  0.00  0.00  
0.72  0.63  0.53  0.41  0.00  0.00  0.00  0.00  0.00  0.00  
0.71  0.63  0.52  0.40  0.00  0.00  0.00  0.00  0.00  0.00  
0.68  0.56  0.47  0.36  0.25  0.19  0.12  0.08  0.00  0.00  
0.67  0.58  0.45  0.38  0.30  0.22  0.16  0.11  0.00  0.00  
0.62  0.52  0.43  0.35  0.29  0.25  0.18  0.12  0.00  0.00  
0.63  0.54  0.45  0.36  0.27  0.22  0.16  0.11  0.00  0.00  
0.48  0.41  0.35  0.30  0.23  0.19  0.15  0.12  0.10  0.08  
0.47  0.40  0.35  0.30  0.23  0.19  0.17  0.13  0.10  0.09  
0.47  0.40  0.35  0.30  0.24  0.21  0.17  0.15  0.13  0.09  
0.45  0.40  0.34  0.30  0.24  0.18  0.17  0.15  0.12  0.09  
0.81  0.65  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.78  0.68  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.70  0.62  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.56  0.53  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  
0.76  0.63  0.54  0.40  0.00  0.00  0.00  0.00  0.00  0.00  
0.72  0.61  0.52  0.43  0.00  0.00  0.00  0.00  0.00  0.00  