Improved robustness to adversarial examples using Lipschitz regularization of the loss
Adversarial training is an effective method for improving robustness to adversarial attacks. We show that adversarial training using the Fast Signed Gradient Method can be interpreted as a form of regularization. We implemented a more effective form of adversarial training, which in turn can be interpreted as regularization of the loss in the 2-norm, $\|\nabla \ell(x)\|_2$. We obtained further improvements to adversarial robustness, as well as provable robustness guarantees, by augmenting adversarial training with Lipschitz regularization.
|Chris Finlay, Adam Oberman & Bilal Abbasi (Bilal Abbasi completed this work during his PhD at McGill. He is now at Eidos Montréal.)|
|Department of Mathematics and Statistics|
|Montréal, Québec, Canada|
1.1 Contributions of this work
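Adversarial training is an effective method for improving robustness to adversarial attacks. We show that adversarial training using the Fast Signed Gradient Method (Goodfellow et al., 2014) can be interpreted as regularization by the average of the 1-norm of the gradient of the loss over the data,
$$\mathbb{E}_{x}[\ell(x)] + \varepsilon\, \mathbb{E}_{x}[\|\nabla \ell(x)\|_1].$$
The choice of norm for the adversarial perturbation can lead to different interpretations: using the 2-norm for adversarial training corresponds to
$$\mathbb{E}_{x}[\ell(x)] + \varepsilon\, \mathbb{E}_{x}[\|\nabla \ell(x)\|_2].$$
Here $\ell$ is the loss of the model and $\varepsilon$ is the norm of the adversarial perturbation.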
We consider Lipschitz regularization in §3, where we work with the Lipschitz constant of the loss of the model. We found existing methods of Lipschitz regularization based on norms of weight matrices (Bartlett, 1996; Szegedy et al., 2013) to be ineffective. As an alternative, we consider a tractable Lipschitz regularization of the loss of the model, obtained by taking the maximum over the data of the norm of the gradient of the loss of the model.
Moreover, we show in Lemma 3.2 that the Lipschitz constant of the loss controls the adversarial robustness of the model. Thus we interpret adversarial training (in the 2-norm) augmented with Lipschitz regularization as minimization of the objective function
$$\mathbb{E}_{x}[\ell(x)] + \varepsilon\, \mathbb{E}_{x}[\|\nabla \ell(x)\|_2] + \lambda \max_{x} \|\nabla \ell(x)\|,$$
which we refer to as (tulip). In practice, this objective outperforms adversarial training alone, in either its 1-norm or its 2-norm form. For example, on CIFAR-10 with a ResNeXt model, adversarial training alone reduced adversarial error by 29% over an undefended model, measured at a fixed adversarial distance. In contrast, adversarial training with Lipschitz regularization reduces adversarial error by 42% over baseline. See Table 1. (Footnote 1: Apologies for overloading ‘$\ell$’ for both the loss and for norms; we hope the meaning is clear from context.) We trained with one fixed setting of the hyperparameters. Other values may work better; we did not tune these hyperparameters. See §4 for empirical results.
1.2 Background on adversarial examples and adversarial training
Improving robustness to adversarial samples is a first step towards model verification (Szegedy et al., 2013; Goodfellow et al., 2018). However robustness guarantees to adversarial samples are difficult to obtain, since in practice it is only possible to generate suboptimal adversarial attacks.
Adversarial samples are unlikely to occur randomly. Rather, they are generated by an adversary. Adversarial attacks are classified according to the amount of information available to the attacker. White box attacks occur when the attacker has full access to the loss, model, and label. Typically white box attacks are generated using loss gradients: these attacks include L-BFGS (Szegedy et al., 2013), Fast Signed Gradient (Goodfellow et al., 2014), Jacobian Saliency (Papernot et al., 2016a), and Projected Gradient Descent (Madry et al., 2017). Black box attacks rely on less information, using model outputs rather than model gradients. Black box attacks require more effort to implement (Papernot et al., 2017), but their brute force approach may make them more effective at evading adversarial defences (Brendel et al., 2018).
The recent review Goodfellow et al. (2018) discusses defences against adversarial attacks and their limitations. The earliest and most successful defense is adversarial training (Szegedy et al. (2013); Goodfellow et al. (2014); Tramèr et al. (2018); Madry et al. (2017)). Top entries in a recent adversarial defence competition (Kurakin et al. (2017)) used Ensemble Adversarial Training (Tramèr et al. (2018)), where a model is adversarially trained with inputs generated by an ensemble of other models.
In adversarial training, the model, , is trained to solve the minimax problem
However in practice this problem is not computationally feasible. Instead, (1) is approximated. A popular and effective approximation is the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), which also defines an attack.
Other forms of defences against gradient based attacks besides adversarial training include (Papernot et al., 2017; 2016b) as well as adding stochastic noise to the model, using a non-differentiable classifier (Lu et al. (2017)), or defense distillation (Hinton et al. (2015); Papernot et al. (2016c)). Gradient based methods may be less successful against black box attacks (Brendel et al. (2018)).
Other possible defences discussed in Goodfellow et al. (2018) include input validation and preprocessing, which would potentially allow adversarial samples to be recognized before being input to the model, and architecture modifications designed to improve robustness to adversarial samples. For more information we refer to the review (Goodfellow et al. (2018)) and the discussion of attack methods in (Brendel et al. (2018)).
1.3 Background on Lipschitz Regularization of the model
One form of robustness guarantee for a network is provided by the global Lipschitz constant of the model. Weng et al. (2018) show that the Lipschitz constant of the model gives a certifiable minimum adversarial distance: a successful attack on image will have adversarial distance at least
where is the Lipschitz constant of the model, , and is the correct label of . Thus training models to have small Lipschitz constant could improve adversarial robustness (Hein & Andriushchenko (2017); Tsuzuku et al. (2018)). Oberman & Calder (2018) recently showed that Lipschitz regularization leads to a proof of generalization. The Lipschitz constant of a model may be estimated using only the product of the norms of model weight matrices (Bartlett (1996); Szegedy et al. (2013)), which is independent of the data. Models have been trained using this estimate as a regularization term in (Cissé et al., 2017; Gouk et al., 2018; Miyato et al., 2018a; Tsuzuku et al., 2018).
2 Adversarial training and regularization
Definition 2.1 (Adversarial attacks).
Write for the correct label and for the classifier. An adversarial attack , is a perturbation of the input which leads to incorrect classification
Adversarial attacks seek to find the minimum norm attack vector, which is an intractable problem (Athalye et al., 2018). An alternative which permits loss gradients to be used, is to consider the attack vector of a given norm which most increases the loss, .
2.1 Derivation of attack directions
The solution of (3) can be approximated using the dual norm (Boyd & Vandenberghe, 2004, A.1.6). If the -norm is used, we recover the Signed Gradient (Goodfellow et al., 2014). However a different attack vector is obtained if we measure attacks in the 2-norm.
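The optimal attack vector defined by (3) in a generic norm $\|\cdot\|$ can be approximated to $O(\varepsilon^2)$ with the vector $\varepsilon v^*$, where $v^*$ is the solution of
$$v^* \cdot \nabla \ell(x) = \|\nabla \ell(x)\|_*, \qquad \|v^*\| = 1, \tag{4}$$
and $\|\cdot\|_*$ is the dual norm. In particular $v^*$ is given by
$$v^* = \begin{cases} \operatorname{sign}(\nabla \ell(x)) & \text{for the } \infty\text{-norm}, \\ \nabla \ell(x) / \|\nabla \ell(x)\|_2 & \text{for the 2-norm}. \end{cases} \tag{5}$$
Write the attack vector as $\varepsilon v$ with $\|v\| \le 1$ and use the Taylor expansion of the loss,
$$\ell(x + \varepsilon v) = \ell(x) + \varepsilon\, v \cdot \nabla \ell(x) + O(\varepsilon^2).$$
Then we can approximate (3) by solving
$$\max_{\|v\| \le 1} v \cdot \nabla \ell(x). \tag{6}$$
By definition of the dual norm, the maximum in (6) equals $\|\nabla \ell(x)\|_*$. In the case of the $\infty$-norm, the dual norm is the 1-norm, and the solution is given by the Signed Gradient vector $v^* = \operatorname{sign}(\nabla \ell(x))$. In the case of the 2-norm the dual norm is itself the 2-norm and the solution of (6) is given by $v^* = \nabla \ell(x) / \|\nabla \ell(x)\|_2$. ∎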
The 2-norm attack vector, points in the direction of the gradient of the loss, while the signed gradient attack vector points in the direction of the optimal dual vector.
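The two attack directions can be compared numerically. The following is a minimal Python sketch, not the authors' code: the gradient values and the helper names (`signed_gradient_direction`, `normalized_gradient_direction`) are hypothetical and chosen only for illustration. It checks that each direction attains the dual norm of the gradient, namely the 1-norm for the signed gradient and the 2-norm for the normalized gradient.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def signed_gradient_direction(grad):
    """Steepest-ascent direction under an infinity-norm constraint."""
    return [float((g > 0) - (g < 0)) for g in grad]

def normalized_gradient_direction(grad):
    """Steepest-ascent direction under a 2-norm constraint."""
    norm2 = math.sqrt(dot(grad, grad))
    return [g / norm2 for g in grad]

# Hypothetical loss gradient at some input (illustrative values only).
grad = [0.3, -1.2, 0.05, 0.8]

v_inf = signed_gradient_direction(grad)
v_two = normalized_gradient_direction(grad)

norm1 = sum(abs(g) for g in grad)
norm2 = math.sqrt(dot(grad, grad))

# Each direction attains the dual norm of the gradient.
print("signed gradient:    ", dot(v_inf, grad), "vs 1-norm", norm1)
print("normalized gradient:", dot(v_two, grad), "vs 2-norm", norm2)
```

In a real model the gradient would come from automatic differentiation of the loss with respect to the input; the list above only stands in for that vector.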
2.2 Interpretation of adversarial training
Adversarial training can be interpreted as minimizing
Adversarial training using the attack vector (5) can be interpreted as augmenting the loss function with the regularization where
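As a rough numerical illustration of this interpretation, the sketch below uses a toy logistic loss (an assumption made here for illustration; it is not one of the models studied) and compares the loss at the signed gradient perturbation with the loss augmented by `eps` times the 1-norm of its gradient. The function names and the values of `w`, `x` and `y` are hypothetical.

```python
import math

def logistic_loss(w, x, y):
    """Toy loss: log(1 + exp(-y * <w, x>))."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-z))

def logistic_loss_grad(w, x, y):
    """Gradient of the toy loss with respect to the input x."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(z))  # sigmoid(-z)
    return [-y * wi * s for wi in w]

def regularization_gap(w, x, y, eps):
    """Loss at the signed gradient attack minus (loss + eps * ||grad||_1)."""
    grad = logistic_loss_grad(w, x, y)
    x_adv = [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
    norm1 = sum(abs(g) for g in grad)
    return logistic_loss(w, x_adv, y) - (logistic_loss(w, x, y) + eps * norm1)

# Hypothetical weights, input and label.
w = [0.5, -1.0, 2.0]
x = [1.0, 0.2, -0.3]
y = 1.0

gap_large = regularization_gap(w, x, y, 0.01)
gap_small = regularization_gap(w, x, y, 0.001)
print(gap_large, gap_small)
```

For this toy loss, shrinking `eps` by a factor of ten shrinks the gap by roughly a factor of one hundred, which is what a second-order error term would produce.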
2.3 Iterative attacks based on gradient norms
The angle $\theta$ between the signed gradient $\operatorname{sign}(\nabla \ell(x))$ and the gradient $\nabla \ell(x)$ is given by
$$\cos\theta = \frac{\|\nabla \ell(x)\|_1}{\sqrt{n}\,\|\nabla \ell(x)\|_2},$$
where $n$ is the input dimension. Because $\|\nabla \ell(x)\|_1 \le \sqrt{n}\,\|\nabla \ell(x)\|_2$, this ratio is always between zero and one. On the networks we studied, the ratio above could be as small as 0.32. To illustrate, Figure 1(b) shows the angle between iterative FGSM and the iterative gradient ascent on a toy loss (convex quadratic) in two dimensions. In practice we find iterative attacks using the steepest ascent direction are more effective than iterative FGSM based attacks, see Section 4.2.
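The ratio can be checked directly. The sketch below is illustrative only: the gradient vectors are made up, and the function names are hypothetical. It computes the cosine of the angle between the signed gradient and the gradient from the vectors themselves, and compares it with the closed form given by the 1-norm divided by the square root of the dimension times the 2-norm. The closed form assumes no gradient entry is exactly zero.

```python
import math

def cosine_sign_vs_gradient(grad):
    """Cosine of the angle between sign(grad) and grad, from the vectors."""
    signed = [(g > 0) - (g < 0) for g in grad]
    inner = sum(s * g for s, g in zip(signed, grad))
    norm_signed = math.sqrt(sum(s * s for s in signed))
    norm_grad = math.sqrt(sum(g * g for g in grad))
    return inner / (norm_signed * norm_grad)

def norm_ratio(grad):
    """Closed form: ||grad||_1 / (sqrt(n) * ||grad||_2)."""
    n = len(grad)
    norm1 = sum(abs(g) for g in grad)
    norm2 = math.sqrt(sum(g * g for g in grad))
    return norm1 / (math.sqrt(n) * norm2)

# Hypothetical gradients: one uneven, one with equal-magnitude entries.
uneven = [0.3, -1.2, 0.05, 0.8]
balanced = [1.0, -1.0, 1.0, -1.0]

cos_uneven = cosine_sign_vs_gradient(uneven)
cos_balanced = cosine_sign_vs_gradient(balanced)
print(cos_uneven, norm_ratio(uneven))
print(cos_balanced, norm_ratio(balanced))
```

The two directions coincide only when all gradient entries have the same magnitude; the more the gradient concentrates on a few coordinates, the smaller the ratio becomes.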
3 Lipschitz Regularization
3.1 Evaluating the Lipschitz constant of a model
The Lipschitz constant of a function $f: X \to Y$ is given by
$$\operatorname{Lip}(f) = \sup_{x_1 \neq x_2} \frac{\|f(x_1) - f(x_2)\|_Y}{\|x_1 - x_2\|_X}.$$
When $f$ is differentiable on a closed, bounded domain, $\Omega$, then
$$\operatorname{Lip}(f) = \max_{x \in \Omega} \|\nabla f(x)\|. \tag{10}$$
Here for vector-valued functions, $f$, the induced matrix norm must be used, based on the norms for $X$ and $Y$ (Horn et al., 1990, Chapter 5.6.4). The result is standard in analysis: it follows from the Mean Value Theorem and the definition of the derivative. Using (10), we can approximate the Lipschitz constant by testing on the data $x_1, \dots, x_m$,
$$\operatorname{Lip}(f) \approx \max_{i} \|\nabla f(x_i)\|. \tag{11}$$
Because the loss is a scalar, Lipschitz regularization of the loss is implemented by taking $f = \ell$ in (11) and minimizing the regularized loss function
$$\mathbb{E}_{x}[\ell(x)] + \lambda \max_{i} \|\nabla \ell(x_i)\|.$$
The first term in this objective is the expected loss, and the second term is the approximation of the Lipschitz constant of the loss coming from (11). During training with Stochastic Gradient Descent, both terms are evaluated over mini-batches.
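A mini-batch evaluation of such a regularized objective might look as follows. This is a sketch under stated assumptions, not the training code used for the experiments: the toy logistic loss, the use of the 2-norm for the gradient, the weight `lam`, and all function names are hypothetical choices made for illustration. In a deep learning framework the input gradient would be obtained by automatic differentiation instead of the closed form used here.

```python
import math

def loss_and_input_grad(w, x, y):
    """Toy logistic loss and its gradient with respect to the input x."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = math.log1p(math.exp(-z))
    s = 1.0 / (1.0 + math.exp(z))  # sigmoid(-z)
    grad = [-y * wi * s for wi in w]
    return loss, grad

def lipschitz_regularized_objective(w, batch, lam):
    """Mean loss plus lam times the largest input-gradient norm in the batch."""
    losses, grad_norms = [], []
    for x, y in batch:
        loss, grad = loss_and_input_grad(w, x, y)
        losses.append(loss)
        grad_norms.append(math.sqrt(sum(g * g for g in grad)))
    expected_loss = sum(losses) / len(losses)
    lipschitz_estimate = max(grad_norms)  # lower bound on the true constant
    objective = expected_loss + lam * lipschitz_estimate
    return objective, expected_loss, lipschitz_estimate

# Hypothetical weights and mini-batch of (input, label) pairs.
w = [0.5, -1.0, 2.0]
batch = [
    ([1.0, 0.2, -0.3], 1.0),
    ([0.0, 1.0, 0.5], -1.0),
    ([-1.0, 0.5, 0.2], 1.0),
]

objective, expected_loss, lipschitz_estimate = lipschitz_regularized_objective(w, batch, 0.1)
print(objective, expected_loss, lipschitz_estimate)
```

Because the maximum is taken over sampled points only, the estimate can never exceed the true Lipschitz constant of the loss; for this toy loss that constant is bounded by the 2-norm of `w`.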
3.2 Lipschitz constant of data and optimal extensions
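Define the Lipschitz constant of the data $(x_i, y_i)$, in given norms on the input space $X$ and the label space $Y$, to be
$$\max_{x_i \neq x_j} \frac{\|y_i - y_j\|_Y}{\|x_i - x_j\|_X}.$$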
Table 4.1 lists the Lipschitz constant of the training data for common datasets, which are all small: all but one are below 1.
The Lipschitz extension theorem (Valentine, 1945) says that given function values on the data, there exists an extension which perfectly fits the data, and has the same Lipschitz constant, provided the appropriate norms are used on the input and label spaces. This can be done using, for example, the 2-norm for the inputs and the $\infty$-norm on the label space. In other norms, we can also make an extension, but the Lipschitz constant may increase (Johnson & Lindenstrauss, 1984). Of course, such a function may not be consistent with a given architecture.
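To make the notion of the Lipschitz constant of the data concrete, here is a small sketch that computes it for a toy labelled dataset as the largest ratio of label distance to input distance over all pairs. The choice of the 2-norm on inputs and the infinity-norm on one-hot labels is an assumption made for this example, as are the data values and the function name `data_lipschitz_constant`.

```python
import math
from itertools import combinations

def data_lipschitz_constant(xs, ys):
    """Largest ratio ||y_i - y_j||_inf / ||x_i - x_j||_2 over pairs of points.

    Pairs with identical inputs are skipped; if such a pair carried
    different labels the constant would be infinite.
    """
    best = 0.0
    for (xi, yi), (xj, yj) in combinations(list(zip(xs, ys)), 2):
        dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
        dy = max(abs(a - b) for a, b in zip(yi, yj))
        if dx > 0.0:
            best = max(best, dy / dx)
    return best

# Hypothetical dataset: three 2-d inputs with one-hot labels.
xs = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

lip_data = data_lipschitz_constant(xs, ys)
print(lip_data)
```

With one-hot labels the numerator is either 0 or 1, so the constant is the reciprocal of the smallest distance between two inputs with different labels. The pairwise loop is quadratic in the number of points and is meant only for small examples.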
3.3 Robustness guarantees from the Lipschitz constant
The following Lemma shows that the Lipschitz constant of the loss function gives a robustness guarantee for the loss incurred by an adversarial perturbation of norm . An analogous formula gives the corresponding robustness result using the Lipschitz constant of the model (2).
Lemma 3.2 (Stability of network).
Suppose the composed loss function is -Lipschitz continuous. Let be an adversarial perturbation of norm . Then
By Lipschitz continuity of
There are two cases for the left-hand side, depending on the sign. In both cases we obtain (13). ∎
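The stability bound can be checked numerically on a loss whose Lipschitz constant is known exactly. In the sketch below, the absolute-error loss of a linear model is an assumption made for illustration: its Lipschitz constant in the 2-norm is the 2-norm of the weight vector. The code samples perturbations of a fixed norm `eps` and confirms that the loss never changes by more than the Lipschitz constant times `eps`. All names and values are hypothetical.

```python
import math
import random

def abs_loss(w, x, y):
    """Toy loss |<w, x> - y|, which is ||w||_2-Lipschitz in x (2-norm)."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) - y)

w = [1.5, -2.0, 0.5]
x = [0.2, -0.1, 0.4]
y = 1.0
eps = 0.1
lipschitz = math.sqrt(sum(wi * wi for wi in w))

rng = random.Random(0)  # fixed seed so the run is repeatable
worst_random = 0.0
for _ in range(1000):
    direction = [rng.gauss(0.0, 1.0) for _ in w]
    scale = eps / math.sqrt(sum(d * d for d in direction))
    x_pert = [xi + scale * d for xi, d in zip(x, direction)]
    change = abs_loss(w, x_pert, y) - abs_loss(w, x, y)
    worst_random = max(worst_random, abs(change))

# Perturbing against the weight vector attains the bound for this input.
x_worst = [xi - eps * wi / lipschitz for xi, wi in zip(x, w)]
worst_case = abs_loss(w, x_worst, y) - abs_loss(w, x, y)

print(worst_random, worst_case, lipschitz * eps)
```

Random perturbations typically fall well short of the bound, while the gradient-aligned perturbation attains it, which is why gradient-based attacks are the relevant worst case.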
3.4 Regularization of the model versus the loss of the model
If the goal is adversarial robustness, then regularization of the loss is just as effective (empirically) as regularizing the model, at a much lower cost. Since the loss is a scalar, regularizing by the Lipschitz constant of the loss corresponds to regularization of the model in one direction. By the chain rule,
For example, when is the KL divergence, and when then
Thus, in this case, regularizing corresponds to regularization of in the direction .
3.5 Upper bounds on the Lipschitz constant
The estimate (11) is a lower bound of the Lipschitz constant of the loss. It is well known that data independent upper bounds on the Lipschitz constant of the model are available (Bartlett, 1996) using the product of the norm of the weight matrices. See also Szegedy et al. (2013); Cissé et al. (2017); Gouk et al. (2018); Miyato et al. (2018b; a); Tsuzuku et al. (2018). Other estimates are also available, for example Weng et al. (2018) used Extreme value theory to estimate the local Lipschitz constant of a model.
Let be the weight matrix of the -th layer of a network comprised of layers, and suppose all non linearities of a network are at most 1-Lipschitz. Then via the chain rule and properties of induced matrix norms, it can be shown
with and . Certain conditions on the ’s must be met. For a proof with see Tsuzuku et al. (2018). A similar bound is available with and . However, for the deep networks we studied, we found that this bound was far too large: by a factor on the order of to .
For many networks, a tighter bound is available. Here we prove a bound on the Lipschitz constant (in the -norm), as the norm of the product of weight matrices.
Let be the last layer of model before the layer. Suppose the only non-linearities in the model up to the layer are entry-wise activation functions which are 1-Lipschitz. Let be the weight matrix of the -th layer of a network with layers. Then
The proof is in §C. This is a tighter bound, but we found empirically that it is still an over-estimate of the Lipschitz constant for deep networks. Networks with other non-linearities, such as -pooling, are not captured by the lemma, but we believe a generalization is possible.
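The difference between the two upper bounds can be seen on a tiny example. The sketch below compares the product of the induced norms of the weight matrices with the induced norm of their product. The three matrices and the use of the induced infinity-norm (maximum absolute row sum) are assumptions chosen for illustration, and activation functions are ignored.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def induced_inf_norm(m):
    """Induced infinity-norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)

# Hypothetical weights of a three-layer network; W1 is applied first.
W1 = [[1.0, -1.0], [1.0, 1.0]]
W2 = [[1.0, 1.0], [-1.0, 1.0]]
W3 = [[0.5, 0.5]]
weights = [W1, W2, W3]

# Bound 1: product of the norms of the individual weight matrices.
product_of_norms = 1.0
for W in weights:
    product_of_norms *= induced_inf_norm(W)

# Bound 2: norm of the product W3 W2 W1.
product = weights[0]
for W in weights[1:]:
    product = matmul(W, product)
norm_of_product = induced_inf_norm(product)

print(product_of_norms, norm_of_product)
```

Because induced norms are submultiplicative, the norm of the product can never exceed the product of the norms. The gap comes from cancellation between layers, and it can compound as more layers are multiplied together.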
In practice, computing from a predefined network is straightforward. The following is a simple method for calculating the product of weights. Let be a linearized network defined using the weight matrices of , but without activation functions or a final . Then the product is determined by evaluating the linearized network at the identity , . If the layers also have biases, then the linearized bias is recovered by evaluating at . In this case the product of weights is given by subtracting off the linearized bias.
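The procedure just described can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: the network is represented as a plain list of hypothetical `(weights, bias)` pairs, and `linearized_network` applies the affine layers with no activation functions. Evaluating it on each column of the identity and subtracting its value at the zero input recovers the product of the weight matrices.

```python
def linearized_network(layers, x):
    """Apply the affine layers in order, with no activation functions."""
    for weights, bias in layers:
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return x

def product_of_weights(layers, input_dim):
    """Recover the product of the weight matrices from network evaluations."""
    # Evaluating at the zero input returns the linearized bias.
    linearized_bias = linearized_network(layers, [0.0] * input_dim)
    columns = []
    for j in range(input_dim):
        e_j = [1.0 if i == j else 0.0 for i in range(input_dim)]
        output = linearized_network(layers, e_j)
        # Column j of the product is the output at e_j minus the bias.
        columns.append([o - b for o, b in zip(output, linearized_bias)])
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

# Hypothetical two-layer network with biases.
layers = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.5, -0.5]),
    ([[1.0, 1.0], [-1.0, 1.0]], [1.0, 2.0]),
]

weight_product = product_of_weights(layers, 2)
print(weight_product)
```

In a framework with batched evaluation, the identity matrix can be passed through the linearized network as a single batch, so the cost is one forward pass on a batch whose size equals the input dimension.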
4 Empirical results
We considered two toy problems, using image classification on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton (2009)). We tested our methods on three networks, chosen to represent a broad range of architectures: AllCNN-C (Springenberg et al. (2014)), a 34 layer ResNet (He et al. (2016)), and a 34 layer ResNeXt (Xie et al. (2017)). Training and model details are provided in Appendix A.
4.1 Error curves for models and robustness metrics
We define the error curve of a model given an attack. The curve provides information about the robustness of a model to attacks of different norms.
The error curve of the model for the attack is the probability over the test data that an attack of size leads to a misclassification
See Figure 1(a) for error curves for a given model over a range of attacks. We also plot the data stability curve, which is the probability that a perturbation can move one data point in the direction of another data point with a different label (which can be interpreted as a very weak attack).
We can compare how two different models perform against various attacks using the model error curve. In Table 1 we report and , using the 2-norm on . These values corresponded to the test error, and noise which is slightly smaller than a human perceptible perturbation (see Figure 3). We also report the median distance which corresponds to the -intercept of 50% error on the curve.
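Given the adversarial distance found for each test image, the error curve and the median distance are simple to compute. The sketch below assumes one distance per image, with a distance of zero standing for an image that is already misclassified without any perturbation. The distances and the function names are hypothetical.

```python
import statistics

def error_curve(adversarial_distances, eps):
    """Fraction of test images misclassified by an attack of size at most eps."""
    hits = sum(1 for d in adversarial_distances if d <= eps)
    return hits / len(adversarial_distances)

def median_adversarial_distance(adversarial_distances):
    """Distance at which the error curve reaches 50% error."""
    return statistics.median(adversarial_distances)

# Hypothetical adversarial distances, one per test image.
distances = [0.0, 0.05, 0.1, 0.2, 0.4]

clean_error = error_curve(distances, 0.0)   # error with no perturbation
error_at_01 = error_curve(distances, 0.1)   # error at distance 0.1
median_distance = median_adversarial_distance(distances)
print(clean_error, error_at_01, median_distance)
```

Evaluating `error_curve` on a grid of `eps` values gives the full curve, so the test error, the error at a fixed distance, and the median distance can all be read from the same list of distances.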
4.2 Attack evaluation
We attacked each model on the test/validation set using six untargeted attack methods: gradient attack; projected gradient descent (constrained in ); the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014); Iterative FGSM (I-FGSM) (Kurakin et al., 2016); DeepFool (Moosavi-Dezfooli et al., 2016); and Boundary attack (Brendel et al., 2018). The first five methods are white-box attacks, while the last is a black-box attack. I-FGSM and the projected gradient attack are iterative methods, whereas FGSM and the gradient attack are single step. All attacks were implemented with Foolbox 1.3.1 (Rauber et al., 2017). Hyperparameters were set to Foolbox defaults, except for the Boundary attack. (Footnote 2: The Boundary attack is computationally demanding, so due to resource constraints we ran it for only 500 iterations per test image. It should perform better with more iterations.) For each image and attack method, each attack reports an adversarial distance (in ).
On each model, dataset, and regularization method, we tested all six attack methods on the entire test/validation set. We compared attack methods using the attack error curve. For example, see Figure 1(a), where we plot attack error curves for each attack method on an undefended model. The attack error curve plots the percent of misclassified test images as a function of adversarial distance. We report against Euclidean distance, , because it is an often reported measure of adversarial robustness, although other choices (MSE, distance in -norm) are equally valid. Common adversarial metrics are easily read off the attack error curve. For example, the median Euclidean adversarial distance is the distance at which the curve crosses 50% error. The percent error given a maximum adversarial distance (for example ) is also readily available.
On all models and for all defence methods studied, projected gradient descent (constrained in ) consistently outperformed the other attack methods: projected gradient descent had the smallest mean adversarial distance and the highest attack error curve. See Figure 1(a) for an illustration. A close second was I-FGSM, the other iterative attack method tested. The next two best attack methods were gradient attack, followed by FGSM. The Boundary attack outperformed DeepFool. We observed the same ranking of attacks on all models and defences studied. For this reason, in the following section we only report model statistics using projected gradient descent, constrained in , which was the strongest of the attacks listed.
4.3 Evaluation of defence methods
Each model was trained with combinations of up to three adversarial defences. The methods were: (i) , the baseline undefended model; (ii) , adversarial training with FGSM; (iii) , adversarial training with 2-norm; each of can be augmented with Lipschitz regularization, which in the last case we call ; we also considered adding a final sigmoid layer to the network, prior to the .
The sigmoid we choose is , and is inspired by (but not equivalent to) -estimators used in classical statistics as a robust estimator (Hampel et al., 2011, Chapter 2). The intuition behind this choice is to normalize the logit scores of the model, which we believe should improve robustness to outliers. Outside of deep learning, -estimators have been successfully used to normalize scores and improve robustness, for example in machine learning biometrics (Jain et al., 2005). See Appendix A.1 for layer details.
Model robustness is evaluated on the entire test/validation set using the median adversarial distance (in ), and the percent misclassified at adversarial distance . We chose because at this magnitude attacks are still imperceptible to the human eye. We argue it is reasonable to ask that models classify images with imperceptible perturbations correctly. At attacks are perceptible, albeit only slightly. See Figure 3. We also plot the attack error curve for each model. These statistics were generated with the projected gradient attack. Table 1 and Figure 2 present results for ResNeXt-34. The best statistics are in bold.
Here we summarize our results for ResNeXt-34, the model studied with the greatest capacity, and defer results for the other models to Appendix B. Without adversarial perturbations, all ResNeXt-34 models achieve roughly 4% test error on CIFAR-10. However, the undefended (baseline, ) model achieves 54% test error at adversarial distance . Adversarial training via FGSM () reduces test error to 24.6%, whereas adversarial training () reduces test error to 13.5%. A combination of all defenses ( with ) further reduces test error to 12.1%. The models are ranked in the same order when instead measured with median adversarial distance. The model with all defenses has median adversarial distance six times that of the undefended model. FGSM () only doubles the median adversarial distance relative to the baseline undefended model. Figure 2(a) illustrates that this ranking of defenses holds over all distances of adversarial perturbations.
We observe a similar ranking on CIFAR-100. See Figure 2(b). Unperturbed, all models achieve between 21% and 22% test error. Without adversarial defenses, ResNeXt-34 (4x32d) has a test error of 74% at adversarial distance . Adversarial training alone brings the test error down to 56.3% with FGSM and to 53.7% with 2-norm adversarial training. A combination of all defenses further reduces test error to 42.6%. Median adversarial distance increases from 0.05 on the undefended model to 0.14 on the model with all defenses.
In Table 1 we also report statistics measuring the model’s Lipschitz constant. The columns and give the maximum of these norms over the test/validation set. The norm of the product of weights is independent of test data, and is an upper bound on the global Lipschitz constant of the model. Employing all defenses dramatically decreases the norm of the model Jacobian on the test data, and hence improves model robustness. On CIFAR-10 the model with all defenses has Jacobian norm nearly 10 times smaller than the undefended model, whereas adversarial training only improves the Jacobian norm by a factor of three at most. On CIFAR-100, adversarial training alone does not appear to improve the norm of the Jacobian significantly. However a combination of all defenses decreases the norm of the model Jacobian by a factor of two.
In Appendix B we report results for all models and combinations of defense methods. Of the individual defenses by themselves, adversarial training ( or ) improves model robustness the most. We find adversarial training () to be more effective than FGSM (). We observe the same ranking of defense methods for AllCNN and ResNet-34. Adversarial training improves model robustness. However model robustness is further improved by adding Lipschitz regularization, which empirically decreases the Jacobian norm of the model on the test data.
Both adversarial training and Lipschitz regularization increase training time by a factor of no more than four. In contrast, adding a final layer to normalize the logits is nearly free, and consistently improves model robustness by itself.
Rather than using as the Lipschitz penalty, we also tried training models with direct estimates of the Lipschitz constant. We tried both the product of layer weight norms , and the tighter estimate . However, neither of these direct estimates were effective as regularizers. The gap between the empirical Lipschitz constant on the data (the modulus of continuity on the data), and the estimated Lipschitz constant is too large. See for example Table 1, where we report the maximum Jacobian norm and . These two statistics differ by at least four orders of magnitude. The estimate is worse, and is numerically infeasible for models with more than a few layers. For example, on the two 34-layer networks we studied, this estimate was at least , and was as large as . Another estimate of the local Lipschitz constant is available using a statistic from Extreme value theory (Weng et al. (2018)). However this estimate requires at a minimum many tens of model evaluations for each image, and so is not tractable as a Lipschitz estimate during training.
The authors thank Bill Tubbs, Alex Iannantuono and Aram Pooladian for their assistance designing the experimental pipeline. The authors acknowledge the support of a Google gift which was used to support Bilal Abbasi during a collaboration at Google Brain Montreal. Adam Oberman was partially supported by AFOSR grant FA9550-18-1-0167.
- Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 274–283, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/athalye18a.html.
- Bartlett (1996) Peter L. Bartlett. For valid generalization the size of the weights is more important than the size of the network. In Advances in Neural Information Processing Systems 9, NIPS, Denver, CO, USA, December 2-5, 1996, pp. 134–140, 1996. URL http://papers.nips.cc/paper/1204-for-valid-generalization-the-size-of-the-weights-is-more-important-than-the-size-of-the-network.
- Boyd & Vandenberghe (2004) Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University press, 2004.
- Brendel et al. (2018) Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyZI0GWCZ.
- Cissé et al. (2017) Moustapha Cissé, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 854–863, 2017. URL http://proceedings.mlr.press/v70/cisse17a.html.
- Devries & Taylor (2017) Terrance Devries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. CoRR, abs/1708.04552, 2017. URL http://arxiv.org/abs/1708.04552.
- Goodfellow et al. (2018) Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Communications of the ACM, 61(7):56–66, June 2018. URL http://dl.acm.org/citation.cfm?doid=3234519.3134599.
- Goodfellow et al. (2014) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.
- Gouk et al. (2018) Henry Gouk, Eibe Frank, Bernhard Pfahringer, and Michael Cree. Regularisation of neural networks by enforcing lipschitz continuity. CoRR, abs/1804.04368, 2018. URL http://arxiv.org/abs/1804.04368.
- Hampel et al. (2011) Frank R Hampel, Elvezio M Ronchetti, Peter J Rousseeuw, and Werner A Stahel. Robust statistics: the approach based on influence functions, volume 196. John Wiley & Sons, 2011.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pp. 630–645, 2016. URL https://doi.org/10.1007/978-3-319-46493-0_38.
- Hein & Andriushchenko (2017) Matthias Hein and Maksym Andriushchenko. Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 2263–2273, 2017. URL http://papers.nips.cc/paper/6821-formal-guarantees-on-the-robustness-of-a-classifier-against-adversarial-manipulation.
- Hinton et al. (2015) Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015. URL http://arxiv.org/abs/1503.02531.
- Horn et al. (1990) Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1990.
- Jain et al. (2005) Anil K. Jain, Karthik Nandakumar, and Arun Ross. Score normalization in multimodal biometric systems. Pattern Recognition, 38(12):2270–2285, 2005. URL https://doi.org/10.1016/j.patcog.2005.01.012.
- Johnson & Lindenstrauss (1984) William B Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189-206):1, 1984.
- Krizhevsky & Hinton (2009) Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- Kurakin et al. (2016) Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016. URL http://arxiv.org/abs/1607.02533.
- Kurakin et al. (2017) Alexey Kurakin, Ian J Goodfellow, and Samy Bengio. Nips 2017: Defense against adversarial attack. https://www.kaggle.com/c/nips-2017-defense-against-adversarial-attack, 2017.
- Lu et al. (2017) Jiajun Lu, Theerasit Issaranon, and David A. Forsyth. SafetyNet: Detecting and rejecting adversarial examples robustly. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 446–454, 2017. URL https://doi.org/10.1109/ICCV.2017.56.
- Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083, 2017. URL http://arxiv.org/abs/1706.06083.
- Miyato et al. (2018a) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. CoRR, abs/1802.05957, 2018a. URL http://arxiv.org/abs/1802.05957.
- Miyato et al. (2018b) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018b. URL https://openreview.net/forum?id=B1QRgziT-.
- Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2574–2582, 2016. URL https://doi.org/10.1109/CVPR.2016.282.
- Oberman & Calder (2018) Adam M. Oberman and Jeff Calder. Lipschitz regularized deep neural networks converge and generalize. CoRR, abs/1808.09540, 2018. URL http://arxiv.org/abs/1808.09540.
- Papernot et al. (2016a) Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pp. 372–387, 2016a. URL https://doi.org/10.1109/EuroSP.2016.36.
- Papernot et al. (2016b) Nicolas Papernot, Patrick D. McDaniel, Arunesh Sinha, and Michael P. Wellman. Towards the science of security and privacy in machine learning. CoRR, abs/1611.03814, 2016b. URL http://arxiv.org/abs/1611.03814.
- Papernot et al. (2016c) Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016, pp. 582–597, 2016c. URL https://doi.org/10.1109/SP.2016.41.
- Papernot et al. (2017) Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, April 2-6, 2017, pp. 506–519, 2017. URL http://doi.acm.org/10.1145/3052973.3053009.
- Rauber et al. (2017) Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. CoRR, abs/1707.04131, 2017. URL http://arxiv.org/abs/1707.04131.
- Springenberg et al. (2014) Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2014. URL http://arxiv.org/abs/1412.6806.
- Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL http://arxiv.org/abs/1312.6199.
- Tramèr et al. (2018) Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkZvSe-RZ.
- Tsuzuku et al. (2018) Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. CoRR, abs/1802.04034, 2018. URL http://arxiv.org/abs/1802.04034.
- Valentine (1945) Frederick Albert Valentine. A Lipschitz condition preserving extension for a vector function. American Journal of Mathematics, 67(1):83–93, 1945.
- Weng et al. (2018) Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BkUHlMZ0b.
- Xie et al. (2017) Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5987–5995, 2017. URL https://doi.org/10.1109/CVPR.2017.634.
Appendix A Model and training details
We used standard data augmentation for the CIFAR datasets, consisting of horizontal flips and random crops of images padded by four pixels per side. We used square cutout (Devries & Taylor, 2017) of width 16 on CIFAR-10 and width 8 on CIFAR-100, but no dropout. Batch normalization was used after every convolution layer. We used SGD with an initial learning rate of 0.1, momentum of 0.9, and a batch size of 128. On CIFAR-10, networks were trained for 200 epochs, and the learning rate was dropped by a factor of five after epochs 60, 120, and 180. On CIFAR-100, networks were trained for 300 epochs, and the learning rate was dropped by a factor of 10 after epochs 150 and 225. For CIFAR-10 weight decay (Tikhonov/$\ell_2$ regularization) was set to ; on CIFAR-100 it was .
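The step schedule described above can be sketched in a few lines. This is a minimal illustration rather than the training code used for the experiments: the function name `step_lr` and the convention that epochs are 0-indexed (so "after epoch 60" means from epoch index 60 onward) are assumptions made for this example.

```python
def step_lr(epoch, base_lr=0.1, milestones=(60, 120, 180), factor=5.0):
    """Learning rate in effect during `epoch` (0-indexed, an assumed convention).

    The rate starts at `base_lr` and is divided by `factor` once for every
    milestone that has already been reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / factor ** drops


# CIFAR-10 settings from the text: factor of five after epochs 60, 120, 180.
cifar10_lrs = [step_lr(e) for e in range(200)]

# CIFAR-100 settings from the text: factor of ten after epochs 150 and 225.
cifar100_lrs = [step_lr(e, milestones=(150, 225), factor=10.0) for e in range(300)]
```

With these settings the CIFAR-10 rate ends at 0.1/125 and the CIFAR-100 rate ends at 0.1/100.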
For networks with Lipschitz regularization, the Lagrange multiplier of the excess Lipschitz term was set to . Adversarially trained models were trained with images perturbed to an distance of . We did not tune either of these hyperparameters.
For CIFAR-10, the ResNeXt architecture we used had a depth of 34 layers, cardinality 2 and width 32, with a basic residual block rather than a bottleneck. The branches (convolution groups) of the blocks were aggregated via a mean, rather than using a fully connected layer. For CIFAR-100 the architecture was the same, but had cardinality 4.
A.1 Pre-sigmoid layer
We found that inserting a sigmoid activation function prior to the final layer improved model robustness. In this case, the sigmoid layer consisted of batch normalization (without learnable parameters), followed by the activation function , where is a single learnable parameter, common across all layer inputs.
Appendix B Further experimental results
Here we present complete results for all regularization types, on all models and datasets considered. Because adversarial training in the 2-norm outperforms FGSM, we report results only for the former.
|Model||variant||Without adversarial training||Adversarial training ()|
|Median||% error at||Median||% error at|
|penalty||5.41||27.72||(footnote 3: We believe this value is an error, but we report it regardless.)||3.55||19.19|
Appendix C Proofs
Proof of Lemma 3.3.
Let be the -th layer of a network, with activation . Then the gradient of the -th layer is
Note that is a vector, with components, where is the number of rows of . For brevity let be the -th component of this vector, and let be the entries of .
The Jacobian is defined entry-wise. The entry of in the -th row and -th column is given by
Because each activation function is at most 1-Lipschitz, . Pulling the maximum of this term out of the matrix multiplication bounds
as desired. ∎
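The single-layer step of the argument can be checked numerically. The sketch below is illustrative only and rests on several assumptions not taken from the proof: a layer of the form x -> sigma(Wx + b), the choice sigma = tanh as an example of a 1-Lipschitz activation, and the induced infinity norm (maximum absolute row sum), chosen because it is easy to compute without libraries. All function names are hypothetical. Under these assumptions the Jacobian entry in row j and column k is sigma'(z_j) W_jk, so bounding |sigma'| by 1 gives a Jacobian norm no larger than the norm of W.

```python
import math
import random


def tanh_layer_jacobian(W, b, x):
    """Jacobian of x -> tanh(W x + b); entry (j, k) equals tanh'(z_j) * W[j][k]."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    dz = [1.0 - math.tanh(zj) ** 2 for zj in z]  # tanh' takes values in [0, 1]
    return [[d * w for w in row] for d, row in zip(dz, W)]


def inf_norm(M):
    """Induced infinity norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


# Random layer with 3 outputs and 4 inputs (sizes chosen arbitrarily).
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
b = [random.uniform(-1, 1) for _ in range(3)]
x = [random.uniform(-1, 1) for _ in range(4)]

J = tanh_layer_jacobian(W, b, x)

# Each row of J is a row of W scaled by a factor in [0, 1],
# so no absolute row sum can grow.
assert inf_norm(J) <= inf_norm(W)
```

Composing layers and applying the chain rule, the same per-layer bound multiplies across layers, which is the product bound the proof arrives at.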