
# Improved robustness to adversarial examples using Lipschitz regularization of the loss

Chris Finlay, Adam Oberman & Bilal Abbasi
Department of Mathematics and Statistics
McGill University
Montréal, Québec, Canada
{christopher.finlay,bilal.abbasi}@mail.mcgill.ca
adam.oberman@mcgill.ca
Bilal Abbasi completed this work during his PhD at McGill. He is now at Eidos Montréal.
###### Abstract

Adversarial training is an effective method for improving robustness to adversarial attacks. We show that adversarial training using the Fast Gradient Sign Method can be interpreted as a form of regularization. We implemented a more effective form of adversarial training, which in turn can be interpreted as regularization of the loss in the 2-norm, $\|\nabla_x \ell(x)\|_2$. We obtained further improvements to adversarial robustness, as well as provable robustness guarantees, by augmenting adversarial training with Lipschitz regularization.

## 1 Introduction

### 1.1 Contributions of this work

Adversarial training is an effective method for improving robustness to adversarial attacks. We show that adversarial training using the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014) can be interpreted as regularization by the average over the data of the 1-norm of the gradient of the loss,

$$J_1[w] = \mathbb{E}_{(x,y)\sim D}\left[\ell(x) + \varepsilon\|\nabla\ell(x)\|_1\right] \tag{J1}$$

The choice of norm for the adversarial perturbation can lead to different interpretations: using the 2-norm for adversarial training corresponds to

$$J_2[w] = \mathbb{E}_{(x,y)\sim D}\left[\ell(x) + \varepsilon\|\nabla\ell(x)\|_2\right] \tag{J2}$$

We present theoretical justification and empirical evidence that training with (J2) is more adversarially robust than training with (J1).
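To make (J1) and (J2) concrete, the following is a minimal NumPy sketch that evaluates both objectives on a minibatch. It assumes, purely for illustration, a linear softmax classifier with logits $z = Wx + b$ and cross-entropy loss, for which the input gradient has the closed form $\nabla_x\ell = W^\top(\mathrm{softmax}(z) - e_y)$. The helper names `loss_and_input_grad` and `penalized_objective` are hypothetical; for a deep network the input gradients would come from automatic differentiation instead.

```python
import numpy as np

def softmax(Z):
    # Numerically stable softmax over the last axis.
    Z = Z - Z.max(axis=-1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def loss_and_input_grad(W, b, X, y):
    """Per-example cross-entropy loss and its gradient w.r.t. the inputs.

    Illustrative linear model: logits Z = X @ W.T + b, W of shape (classes, inputs).
    """
    P = softmax(X @ W.T + b)               # (batch, classes)
    rows = np.arange(len(y))
    loss = -np.log(P[rows, y])             # (batch,)
    G = P.copy()
    G[rows, y] -= 1.0                      # d loss / d logits = softmax(z) - e_y
    return loss, G @ W                     # input gradients, (batch, inputs)

def penalized_objective(W, b, X, y, eps, p):
    """Minibatch estimate of J1 (p=1) or J2 (p=2): mean of loss + eps * ||grad_x loss||_p."""
    loss, grad_x = loss_and_input_grad(W, b, X, y)
    return np.mean(loss + eps * np.linalg.norm(grad_x, ord=p, axis=1))

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), np.zeros(3)
X, y = rng.normal(size=(8, 5)), rng.integers(0, 3, size=8)
J1 = penalized_objective(W, b, X, y, eps=0.1, p=1)
J2 = penalized_objective(W, b, X, y, eps=0.1, p=2)
```

Since $\|v\|_1 \ge \|v\|_2$ for every vector, the (J1) penalty is never smaller than the (J2) penalty at the same $\varepsilon$.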

We consider Lipschitz regularization in §3. Write $L_{\ell\circ f}$ for the Lipschitz constant of the loss of the model, $\ell\circ f$. We found existing methods of Lipschitz regularization based on norms of weight matrices (Bartlett, 1996; Szegedy et al., 2013) to be ineffective. As an alternative, we consider a tractable Lipschitz regularization of the loss of the model: the maximum over the data of the norm of the gradient of the loss of the model, which is a lower bound on $L_{\ell\circ f}$,

$$\max_{x\in D}\|\nabla\ell(x)\|_2 \le L_{\ell\circ f}.$$

Moreover, we show in Lemma 3.2 that $L_{\ell\circ f}$ controls the adversarial robustness of the model. Thus we interpret adversarial training (in the 2-norm) augmented with Lipschitz regularization as minimization of the objective function

$$J_{2\text{-Lip}}[w] = \mathbb{E}_{(x,y)\sim D}\left[\ell(x) + \varepsilon\|\nabla\ell(x)\|_2\right] + \lambda\max_{(x,y)\in D}\|\nabla_x\ell(x)\|_2, \tag{J2-Lip}$$

which we refer to as "tulip". In practice, (J2-Lip) outperforms (J1) and (J2). For example, on CIFAR-10 with a ResNeXt model, adversarial training alone reduced adversarial error by 29% over an undefended model (measured at a fixed adversarial distance in the $\ell_2$ norm; we apologize for overloading $\ell$ for both the loss and for norms, and hope the meaning is clear from context). In contrast, adversarial training with Lipschitz regularization (J2-Lip) reduced adversarial error by 42% over the baseline; see Table 1. We trained with fixed values of the hyperparameters $\varepsilon$ and $\lambda$. Other values may work better, since we did not tune these hyperparameters. See §4 for empirical results.

Improving robustness to adversarial samples is a first step towards model verification (Szegedy et al., 2013; Goodfellow et al., 2018). However, robustness guarantees against adversarial samples are difficult to obtain, since in practice it is only possible to generate suboptimal adversarial attacks.

The recent review Goodfellow et al. (2018) discusses defences against adversarial attacks and their limitations. The earliest and most successful defense is adversarial training (Szegedy et al. (2013); Goodfellow et al. (2014); Tramèr et al. (2018); Madry et al. (2017)). Top entries in a recent adversarial defence competition (Kurakin et al. (2017)) used Ensemble Adversarial Training (Tramèr et al. (2018)), where a model is adversarially trained with inputs generated by an ensemble of other models.

In adversarial training, the model, $f(x;w)$, is trained to solve the minimax problem

$$\min_w \mathbb{E}_{(x,y)\sim D}\left[\max_{\|\delta\|\le\varepsilon}\ell(f(x+\delta;w),y)\right]. \tag{1}$$

However, in practice this problem is computationally intractable, so (1) is approximated instead. A popular and effective approximation is the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), which also defines an attack.

Besides adversarial training, other defences against gradient-based attacks have been proposed (Papernot et al., 2017; 2016b), including adding stochastic noise to the model, using a non-differentiable classifier (Lu et al. (2017)), and defensive distillation (Hinton et al. (2015); Papernot et al. (2016c)). Gradient-based methods may be less successful against black-box attacks (Brendel et al. (2018)).

Other possible defences discussed in Goodfellow et al. (2018) include input validation and preprocessing, which would potentially allow adversarial samples to be recognized before being input to the model, and architecture modifications designed to improve robustness to adversarial samples. For more information we refer to the review (Goodfellow et al. (2018)) and the discussion of attack methods in (Brendel et al. (2018)).

### 1.3 Background on Lipschitz Regularization of the model

A form of robustness guarantee for a network is provided by the global Lipschitz constant of the model. Weng et al. (2018) show that the Lipschitz constant of the model gives a certifiable minimum adversarial distance: a successful attack on image $x$ will have adversarial distance at least

$$\delta \ge \min_{j\ne i^*}\frac{f_{i^*}(x) - f_j(x)}{2L_f} \tag{2}$$

where $L_f$ is the Lipschitz constant of the model, $f$, and $i^*$ is the correct label of $x$. Thus training models to have a small Lipschitz constant could improve adversarial robustness (Hein & Andriushchenko (2017); Tsuzuku et al. (2018)). Oberman & Calder (2018) recently showed that Lipschitz regularization leads to a proof of generalization. The Lipschitz constant of a model may be bounded above using only the product of the norms of the model weight matrices (Bartlett (1996); Szegedy et al. (2013)), a bound which is independent of the data. Models have been trained using this estimate as a regularization term (Cissé et al., 2017; Gouk et al., 2018; Miyato et al., 2018a; Tsuzuku et al., 2018).
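The bound (2) is cheap to evaluate once a Lipschitz constant is known. The sketch below, with the hypothetical helper `certified_radius`, assumes `lip_f` is a valid upper bound on $L_f$ in the $(2,\infty)$ norms (for instance one obtained from model weights); a data-based estimate, which is a lower bound on $L_f$, would not give a certificate.

```python
import numpy as np

def certified_radius(logits, label, lip_f):
    """Lower bound (2) on the l2 size of any attack that changes the predicted label.

    logits: model outputs f(x), shape (classes,)
    label:  index i* of the correct class
    lip_f:  assumed upper bound on the (2, inf) Lipschitz constant of f
    """
    logits = np.asarray(logits, dtype=float)
    margins = logits[label] - np.delete(logits, label)
    # Each logit moves by at most lip_f * ||delta||, so the smallest margin survives
    # any perturbation with ||delta|| < margin / (2 lip_f). Misclassified points get 0.
    return max(margins.min(), 0.0) / (2.0 * lip_f)

print(certified_radius([3.0, 1.0, 0.5], label=0, lip_f=2.0))  # 0.5
```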

For deep neural networks, we argue that there is a large gap between the empirical Lipschitz constant of a model on the data and the estimate of the model Lipschitz constant provided using the model weights (Bartlett, 1996), see §4.

## 2 Adversarial training and regularization

Write $c^*(x)$ for the correct label and $c(x)$ for the classifier. An adversarial attack, $a$, is a perturbation of the input which leads to incorrect classification, $c(x+a)\ne c^*(x)$.

Adversarial attacks seek to find the minimum-norm attack vector, which is an intractable problem (Athalye et al., 2018). An alternative, which permits loss gradients to be used, is to consider the attack vector of a given norm which most increases the loss, $\ell$:

$$\max_{\|a\|\le\varepsilon}\ell(f(x+a),y) \tag{3}$$

### 2.1 Derivation of attack directions

The solution of (3) can be approximated using the dual norm (Boyd & Vandenberghe, 2004, A.1.6). If the $\infty$-norm is used, we recover the signed gradient (Goodfellow et al., 2014). However, a different attack vector is obtained if we measure attacks in the 2-norm.

###### Theorem 2.2.

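The optimal attack vector defined by (3) in a generic norm can be approximated, to first order in $\varepsilon$, by the vector $\varepsilon a$, where $a$, with $\|a\| = 1$, is the solution of

$$a\cdot v = \|v\|_*, \quad\text{with } v = \nabla_x\ell(f(x),y) \tag{4}$$

and $\|\cdot\|_*$ is the dual norm. In particular, $a$ is given by

$$\begin{cases} a^{SG}_i = \dfrac{\nabla\ell(x)_i}{|\nabla\ell(x)_i|} & \text{for the } \infty\text{-norm} \\ a^{\ell_2} = \dfrac{\nabla\ell(x)}{\|\nabla\ell(x)\|_2} & \text{for the 2-norm} \end{cases} \tag{5}$$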
###### Proof.

Write $g(x) = \ell(f(x),y)$ and use the Taylor expansion of $g$,

$$g(x+a) = g(x) + a\cdot\nabla_x g(x) + O(\|a\|^2)$$

Then we can approximate (3) by solving

$$\max_{\|a\|\le\varepsilon}\nabla_x\ell(f(x),y)\cdot a \tag{6}$$

The value of the solution of (6) is given by the dual norm (Boyd & Vandenberghe, 2004, A.1.6) of the gradient, $\varepsilon\|v\|_*$, and the optimal vector is then given by the $\varepsilon$-scaled solution of

$$a\cdot v = \|v\|_*.$$

In the case of the $\infty$-norm, the dual norm is the 1-norm, and the solution is given by the signed gradient vector $a^{SG}$. In the case of the 2-norm, the dual norm is itself the 2-norm, and the solution is given by $a^{\ell_2}$. ∎

The 2-norm attack vector, $a^{\ell_2}$, points in the direction of the gradient of the loss, while the signed gradient attack vector, $a^{SG}$, is the optimal dual vector for the $\infty$-norm and in general points in a different direction.
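Both directions in (5) can be computed from a single gradient. The following sketch (the helper name `attack_direction` is hypothetical) returns the unit-norm vector $a$ and checks that it attains $a\cdot v = \|v\|_*$ as in (4).

```python
import numpy as np

def attack_direction(grad, norm):
    """Unit-norm first-order attack direction a solving a . v = ||v||_* (eq. 4).

    norm="inf": signed gradient a^SG (dual norm is the 1-norm). np.sign maps zero
                entries to 0, which still attains a . v = ||v||_1.
    norm="2":   normalized gradient a^{l2} (the 2-norm is its own dual).
    """
    grad = np.asarray(grad, dtype=float)
    if norm == "inf":
        return np.sign(grad)
    if norm == "2":
        return grad / np.linalg.norm(grad)
    raise ValueError(f"unsupported norm: {norm!r}")

v = np.array([3.0, -4.0])
print(attack_direction(v, "inf") @ v, np.abs(v).sum())  # 7.0 7.0
print(attack_direction(v, "2") @ v, np.linalg.norm(v))  # equal up to floating-point rounding
```

An FGSM step is then `x + eps * attack_direction(grad, "inf")`, and the corresponding 2-norm step uses `"2"` instead.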

### 2.2 Interpretation of adversarial training

Adversarial training can be interpreted as minimizing

$$\mathbb{E}_{(x,y)\sim D}\left[\ell(x+a(x))\right], \quad\text{where } a \text{ is } \varepsilon \text{ times the vector given by (5)} \tag{7}$$
###### Theorem 2.3.

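Adversarial training using the attack vector (5) can be interpreted as augmenting the loss function with the regularization $\varepsilon R[\ell]$, where

$$R[\ell] = \begin{cases} \mathbb{E}_{(x,y)\sim D}\|\nabla\ell(x)\|_1, & \text{FGSM} \\ \mathbb{E}_{(x,y)\sim D}\|\nabla\ell(x)\|_2, & \text{2-norm} \end{cases} \tag{8}$$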
###### Proof.

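The adversarial vector given by (5), scaled by $\varepsilon$, combined with the Taylor expansion gives

$$\ell(x+a) = \ell(x) + \varepsilon\|\nabla\ell(x)\|_* + O(\varepsilon^2)$$

Substitute the last equation into the adversarial training objective (7) to obtain

$$\mathbb{E}_{(x,y)\sim D}\left[\ell(x) + \varepsilon\|\nabla\ell(x)\|_*\right] + O(\varepsilon^2)$$

which, up to $O(\varepsilon^2)$, gives the regularization term (8). ∎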
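Theorem 2.3 rests on the expansion $\ell(x+\varepsilon a) = \ell(x) + \varepsilon\|\nabla\ell(x)\|_* + O(\varepsilon^2)$. A quick numerical check, assuming an illustrative smooth logistic loss $\ell(x) = \log(1 + e^{w\cdot x})$ with a fixed, made-up $w$: for both attack directions, the remainder divided by $\varepsilon^2$ should settle to a constant as $\varepsilon$ shrinks.

```python
import numpy as np

# Illustrative smooth loss l(x) = log(1 + exp(w . x)) for a fixed, arbitrary w.
w = np.array([1.0, -2.0, 0.5])

def loss(x):
    return np.logaddexp(0.0, w @ x)

def grad(x):
    return w / (1.0 + np.exp(-(w @ x)))    # sigmoid(w . x) * w

def first_order_gap(x, eps, norm):
    """l(x + eps*a) - [l(x) + eps*||grad l(x)||_*]; Theorem 2.3 says this is O(eps^2)."""
    g = grad(x)
    if norm == "inf":
        a, dual = np.sign(g), np.abs(g).sum()              # signed gradient, 1-norm
    else:
        a, dual = g / np.linalg.norm(g), np.linalg.norm(g)  # l2 direction, 2-norm
    return loss(x + eps * a) - (loss(x) + eps * dual)

x = np.array([0.3, 0.1, -0.2])
for eps in (1e-1, 1e-2, 1e-3):
    # The ratio gap / eps^2 approaches a constant as eps shrinks.
    print(eps, first_order_gap(x, eps, "inf") / eps**2, first_order_gap(x, eps, "2") / eps**2)
```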

### 2.3 Iterative attacks based on gradient norms

Iterative attacks based on gradient ascent, such as iterative FGSM (Madry et al., 2017), should be performed using the 2-norm direction $a^{\ell_2}$, since this follows the gradient ascent curve; see Figure 0(b).

The angle between $a^{SG}$ and $a^{\ell_2}$ is given by

$$\cos\theta = \frac{\|\nabla\ell\|_1}{\sqrt{n}\,\|\nabla\ell\|_2},$$

where $n$ is the input dimension. Because $\|\nabla\ell\|_1 \le \sqrt{n}\,\|\nabla\ell\|_2$, this ratio is always between zero and one. On the networks we studied, the ratio could be as small as 0.32. To illustrate, Figure 0(b) shows the angle between iterative FGSM and iterative gradient ascent on a toy loss (a convex quadratic) in two dimensions. In practice we find that iterative attacks using the steepest ascent direction are more effective than iterative FGSM-based attacks; see Section 4.2.
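The ratio above can be computed directly from a gradient. The hypothetical helper below implements the formula, which assumes every gradient entry is nonzero (so that $\|a^{SG}\|_2 = \sqrt{n}$). As a point of comparison, a dense Gaussian vector gives a ratio close to $\sqrt{2/\pi}\approx 0.80$, well above the value of 0.32 reported above.

```python
import numpy as np

def sign_vs_l2_cosine(grad):
    """cos(theta) between the signed-gradient and gradient directions.

    Uses cos(theta) = ||g||_1 / (sqrt(n) ||g||_2), which assumes every entry of g
    is nonzero so that ||sign(g)||_2 = sqrt(n).
    """
    g = np.asarray(grad, dtype=float).ravel()
    return np.abs(g).sum() / (np.sqrt(g.size) * np.linalg.norm(g))

g = np.array([3.0, 1.0])
direct = np.sign(g) @ g / (np.linalg.norm(np.sign(g)) * np.linalg.norm(g))
print(sign_vs_l2_cosine(g), direct)  # both about 0.894

# A dense Gaussian "gradient" gives a ratio near sqrt(2/pi); much smaller values
# indicate a gradient concentrated on relatively few coordinates.
rng = np.random.default_rng(0)
print(sign_vs_l2_cosine(rng.normal(size=100_000)))
```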

## 3 Lipschitz Regularization

### 3.1 Evaluating the Lipschitz constant of a model

###### Definition 3.1.

The Lipschitz constant of a function $f$ is given by

$$\mathrm{Lip}_{2,\infty}(f) = \max_{x_1\ne x_2}\frac{\|f(x_1) - f(x_2)\|_\infty}{\|x_1 - x_2\|_2} \tag{9}$$

When $f$ is differentiable on a closed, bounded, convex domain $X$, then

$$\mathrm{Lip}(f) = \max_{x\in X}\|\nabla f(x)\|_{2,\infty}. \tag{10}$$

Here, for vector-valued functions $f$, the induced matrix norm must be used, based on the norms on the domain and the range (Horn et al., 1990, Chapter 5.6.4). The result is standard in analysis; it follows from the Mean Value Theorem and the definition of the derivative. Using (10), we can approximate the Lipschitz constant by testing on the data

$$\max_{x\in D}\|\nabla f(x)\|_{2,\infty} \le \mathrm{Lip}(f) \tag{11}$$
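The induced $(2,\infty)$ norm in (10) and (11) has a simple closed form: it is the largest Euclidean norm of a row of the Jacobian. The sketch below (hypothetical helper names) uses this to evaluate the data-based estimate (11) for an illustrative model $f(x) = \tanh(Wx)$ with a made-up random $W$.

```python
import numpy as np

def norm_2_inf(J):
    """Induced (2 -> inf) matrix norm: max of ||J v||_inf over ||v||_2 <= 1.

    This equals the largest Euclidean norm of a row of J.
    """
    return np.linalg.norm(J, axis=1).max()

def empirical_lipschitz(jacobian_fn, X):
    """Data-based estimate (11): max over the samples x of ||grad f(x)||_{2,inf}."""
    return max(norm_2_inf(jacobian_fn(x)) for x in X)

# Illustrative model f(x) = tanh(W x); its Jacobian is diag(1 - tanh(Wx)^2) W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
jac = lambda x: (1.0 - np.tanh(W @ x) ** 2)[:, None] * W
estimate = empirical_lipschitz(jac, rng.normal(size=(50, 4)))
# tanh is 1-Lipschitz, so norm_2_inf(W) bounds the true constant from above.
print(estimate, "<=", norm_2_inf(W))
```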

Because the loss is a scalar, Lipschitz regularization of the loss is implemented by applying (11) to the composition $\ell\circ f$ and minimizing the regularized loss function

$$J_{\mathrm{Lip}}(w) = \mathbb{E}_{(x,y)\sim D}\left[\ell(f(x;w),y)\right] + \lambda\max_{x\in D}\|\nabla_x\ell(f(x;w),y)\|_2. \tag{JLip}$$

The first term in (JLip) is the expected loss, and the second term is the approximation of the Lipschitz constant of the loss coming from (11). During training with stochastic gradient descent, both terms are evaluated over mini-batches.
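On a mini-batch, the maximum over the data is replaced by the maximum over the batch. The sketch below reuses an illustrative linear softmax model (an assumption standing in for a real network; helper names are hypothetical) and covers both (JLip) (`eps=0`) and (J2-Lip) (`eps>0`). It only evaluates the objective: training on it requires differentiating the input-gradient norm with respect to the weights, which in an autodiff framework means double backpropagation.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=-1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def loss_and_input_grad(W, b, X, y):
    # Illustrative linear softmax model with logits X @ W.T + b.
    P = softmax(X @ W.T + b)
    rows = np.arange(len(y))
    G = P.copy()
    G[rows, y] -= 1.0                      # d loss / d logits
    return -np.log(P[rows, y]), G @ W      # per-example loss, input gradients

def lipschitz_regularized_objective(W, b, X, y, eps, lam):
    """Mini-batch estimate of J2-Lip (eps=0 gives JLip):

    mean(loss + eps * ||grad_x loss||_2) + lam * max over the batch of ||grad_x loss||_2
    """
    loss, grad_x = loss_and_input_grad(W, b, X, y)
    gnorm = np.linalg.norm(grad_x, axis=1)
    return np.mean(loss + eps * gnorm) + lam * gnorm.max()

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), np.zeros(3)
X, y = rng.normal(size=(16, 5)), rng.integers(0, 3, size=16)
print(lipschitz_regularized_objective(W, b, X, y, eps=0.1, lam=0.1))
```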

### 3.2 Lipschitz constant of data and optimal extensions

Define the Lipschitz constant of the data (in the $2,\infty$ norms) to be

$$\mathrm{Lip}_{2,\infty}(D) = \max_{x_1,x_2\in D}\left\{\frac{\|c^*(x_1) - c^*(x_2)\|_\infty}{\|x_1 - x_2\|_2} \;\middle|\; c^*(x_1)\ne c^*(x_2)\right\} \tag{12}$$

Table 4.1 lists the Lipschitz constant of the training data for common datasets. These constants are all small: all but one are below 1.

The Lipschitz extension theorem (Valentine, 1945) says that, given the function values $c^*(x)$ on the data, there exists an extension which perfectly fits the data and has the same Lipschitz constant, provided appropriate norms are used on the input and label spaces. This can be done using, for example, the 2-norm on the input space together with a suitable norm on the label space. In other norms, we can also make an extension, but the Lipschitz constant may increase (Johnson & Lindenstrauss, 1984). Of course, such a function may not be representable by a given architecture.
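For one-hot labels, the numerator in (12) equals 1 whenever the labels differ, so the data Lipschitz constant is the reciprocal of the smallest Euclidean distance between two differently labelled points. A brute-force sketch follows (hypothetical helper name). It forms all pairwise distances, so it is only practical for modest subsets; a full dataset would have to be processed in chunks.

```python
import numpy as np

def data_lipschitz(X, labels):
    """Lipschitz constant (12) of one-hot labelled data in the (2, inf) norms.

    For one-hot labels, ||c*(x1) - c*(x2)||_inf = 1 whenever the labels differ,
    so (12) reduces to 1 / (smallest distance between differently labelled points).
    Requires at least two distinct labels.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)   # flatten images to vectors
    labels = np.asarray(labels)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T       # pairwise squared distances
    differ = labels[:, None] != labels[None, :]
    min_dist = np.sqrt(max(d2[differ].min(), 0.0))
    return 1.0 / min_dist

X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
labels = np.array([0, 1, 0])
print(data_lipschitz(X, labels))  # = 1 / sqrt(18), about 0.2357
```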

### 3.3 Robustness guarantees from the Lipschitz constant

The following lemma shows that the Lipschitz constant of the loss function gives a robustness guarantee for the loss incurred by an adversarial perturbation of norm $\varepsilon$. An analogous formula gives the corresponding robustness result using the Lipschitz constant of the model, (2).

###### Lemma 3.2 (Stability of network).

Suppose the composed loss function $\ell\circ f$ is $L$-Lipschitz continuous. Let $a(x)$ be an adversarial perturbation of norm $\varepsilon$. Then

$$\ell(x+a(x)) \le \ell(x) + L\varepsilon \tag{13}$$
###### Proof.

By Lipschitz continuity of $\ell\circ f$,

$$|\ell(x+a(x)) - \ell(x)| \le L\|a(x)\| = L\varepsilon \tag{14}$$

Since $\ell(x+a(x)) - \ell(x)$ is bounded above by its absolute value, (13) follows. ∎
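Lemma 3.2 can be checked numerically whenever a global Lipschitz constant is known. For an illustrative linear softmax model with cross-entropy loss, $\|\nabla_x\ell\|_2 = \|W^\top(p - e_y)\|_2 \le \sqrt{2}\,\|W\|_2$, since $\|p - e_y\|_2 \le \sqrt{2}$ for any probability vector $p$. The sketch (made-up sizes and random data) samples perturbations of norm $\varepsilon$ and compares the largest loss increase with $L\varepsilon$. Random perturbations are not attacks, so this only illustrates the bound rather than stress-testing it.

```python
import numpy as np

def xent(W, x, y):
    # Cross-entropy of a linear softmax model with logits W @ x.
    z = W @ x
    return np.logaddexp.reduce(z) - z[y]

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))
# ||grad_x l||_2 = ||W^T (p - e_y)||_2 <= ||W||_2 * ||p - e_y||_2 <= sqrt(2) * ||W||_2,
# so L below is a valid global Lipschitz constant for this loss.
L = np.sqrt(2.0) * np.linalg.norm(W, 2)

eps = 0.5
worst_gap = -np.inf
for _ in range(1000):
    x = rng.normal(size=10)
    y = rng.integers(4)
    a = rng.normal(size=10)
    a *= eps / np.linalg.norm(a)           # random perturbation with ||a||_2 = eps
    worst_gap = max(worst_gap, xent(W, x + a, y) - xent(W, x, y))
print(f"largest observed increase {worst_gap:.3f} <= L * eps = {L * eps:.3f}")
```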

### 3.4 Regularization of the model versus the loss of the model

If the goal is adversarial robustness, then regularization of the loss is (empirically) just as effective as regularizing the model, at a much lower cost. Since the loss is a scalar, regularizing by the Lipschitz constant of the loss corresponds to regularizing the model in one direction. By the chain rule,

$$\nabla_x\ell(f(x),y) = \nabla_f\ell(f(x),y)\,\nabla_x f(x)$$

For example, when $\ell$ is the KL divergence and $f(x) = \mathrm{softmax}(z(x))$, then

$$\nabla_x\ell(f(x),y) = (f(x) - y)^\top\,\nabla_x z(x)$$

Thus, in this case, regularizing $\nabla_x\ell$ corresponds to regularizing $\nabla_x z$ in the direction $f(x) - y$.
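The softmax/KL identity above can be verified with finite differences. The sketch assumes a small, made-up two-layer logit map $z(x) = V\tanh(Ux)$ and a target probability vector $y$; the hypothetical `kl_input_grad` applies the chain-rule formula directly.

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(6, 4))   # hypothetical hidden layer
V = rng.normal(size=(3, 6))   # hypothetical logit layer

def logits(x):
    return V @ np.tanh(U @ x)

def logits_jacobian(x):
    h = np.tanh(U @ x)
    return V @ ((1.0 - h ** 2)[:, None] * U)    # dz/dx, shape (3, 4)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_loss(x, y):
    # KL(y || softmax(z(x))); entries with y_i = 0 contribute nothing.
    p = softmax(logits(x))
    m = y > 0
    return np.sum(y[m] * (np.log(y[m]) - np.log(p[m])))

def kl_input_grad(x, y):
    # Chain rule: grad_x l = (f(x) - y)^T dz/dx, with f = softmax(z).
    return (softmax(logits(x)) - y) @ logits_jacobian(x)

x = rng.normal(size=4)
y = np.array([0.2, 0.5, 0.3])                   # target probability vector
print(kl_input_grad(x, y))
```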

### 3.5 Upper bounds on the Lipschitz constant

The estimate (11) is a lower bound on the Lipschitz constant of the loss. It is well known that data-independent upper bounds on the Lipschitz constant of the model are available (Bartlett, 1996) using the product of the norms of the weight matrices. See also Szegedy et al. (2013); Cissé et al. (2017); Gouk et al. (2018); Miyato et al. (2018b; a); Tsuzuku et al. (2018). Other estimates are also available; for example, Weng et al. (2018) used extreme value theory to estimate the local Lipschitz constant of a model.

Let $W_k$ be the weight matrix of the $k$-th layer of a network comprising $N$ layers, and suppose all nonlinearities of the network are at most 1-Lipschitz. Then via the chain rule and properties of induced matrix norms, it can be shown that

$$\mathrm{Lip}_{p,q}(f) \le \prod_{k=1}^{N}\|W_k\|_{p_k,p_{k-1}} \tag{15}$$

with $p_0 = p$ and $p_N = q$. Certain conditions on the $p_k$ must be met. For a proof with $p_k = 2$ for all $k$, see Tsuzuku et al. (2018). A similar bound is available for another choice of norms. However, for the deep networks we studied, we found this bound to be far too large: by several orders of magnitude.
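The following toy comparison computes the product bound (15) with spectral norms ($p_k = 2$) for a small, randomly initialized two-layer ReLU network, alongside the analogous data-based estimate in the 2-norm obtained from Jacobians at sample inputs. Layer sizes and inputs are arbitrary assumptions; a shallow toy network will not reproduce the large gap reported for deep networks, but the ordering of the two quantities is guaranteed.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 8)) / np.sqrt(8)     # hypothetical first layer
W2 = rng.normal(size=(10, 32)) / np.sqrt(32)   # hypothetical second layer

def jacobian(x):
    # f(x) = W2 relu(W1 x); relu' is 0 or 1, so J = W2 diag(relu'(W1 x)) W1.
    active = (W1 @ x > 0).astype(float)
    return W2 @ (active[:, None] * W1)

upper = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)        # bound (15), all p_k = 2
X = rng.normal(size=(200, 8))
empirical = max(np.linalg.norm(jacobian(x), 2) for x in X)   # data-based estimate
print(f"data estimate {empirical:.3f} <= product bound {upper:.3f}")
```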

For many networks, a tighter bound is available. Here we prove a bound on the Lipschitz constant (in the $(2,\infty)$-norm) given by the norm of the product of the weight matrices.

###### Lemma 3.3.

Let $f$ be the last layer of the model before the softmax layer. Suppose the only nonlinearities in the model up to the softmax layer are entry-wise activation functions which are 1-Lipschitz. Let $W_k$ be the weight matrix of the $k$-th layer of a network with $N$ layers. Then

$$\mathrm{Lip}_{2,\infty}(f) \le \left\|\prod_{k=0}^{N}W_k\right\|_{2,\infty} \tag{16}$$

The proof is in §C. This is a tighter bound, but we found empirically that it is still an overestimate of the Lipschitz constant for deep networks. Networks with other nonlinearities, such as max-pooling, are not captured by the lemma, but we believe a generalization is possible.

In practice, computing $\prod_k W_k$ from a predefined network is straightforward. The following is a simple method for calculating the product of weights. Let $\tilde f$ be a linearized network defined using the weight matrices of $f$, but without activation functions or a final softmax. Then the product is determined by evaluating the linearized network at the identity $I$: $\prod_k W_k = \tilde f(I)$. If the layers also have biases, then the linearized bias $\tilde b$ is recovered by evaluating $\tilde f$ at $0$, $\tilde b = \tilde f(0)$. In this case the product of weights is given by subtracting off the linearized bias, $\prod_k W_k = \tilde f(I) - \tilde b$.
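The linearization trick can be sketched with fully connected layers stored as (weight, bias) pairs, using a column-vector convention; the layer widths and helper name are hypothetical. Feeding the columns of the identity matrix as a batch returns $\tilde f(I)$, and subtracting $\tilde f(0)$ removes the accumulated bias. For convolutional layers the identity batch contains one input per pixel and channel, so this can be expensive for large inputs.

```python
import numpy as np

def linearized_forward(layers, X):
    """Apply affine layers without activations; X holds one input per column."""
    for W, b in layers:
        X = W @ X + b[:, None]
    return X

rng = np.random.default_rng(0)
sizes = [5, 7, 6, 3]  # hypothetical layer widths
layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]

n_in = sizes[0]
bias = linearized_forward(layers, np.zeros((n_in, 1)))      # linearized bias f~(0)
product = linearized_forward(layers, np.eye(n_in)) - bias   # f~(I) - f~(0) = W3 W2 W1
bound_16 = np.linalg.norm(product, axis=1).max()            # ||prod W_k||_{2,inf}
print(product.shape, bound_16)
```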

## 4 Empirical results

We considered two image classification problems, CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton (2009)). We tested our methods on three networks, chosen to represent a broad range of architectures: AllCNN-C (Springenberg et al. (2014)), a 34-layer ResNet (He et al. (2016)), and a 34-layer ResNeXt (Xie et al. (2017)). Training and model details are provided in Appendix A.

### 4.1 Error curves for models and robustness metrics

We define the error curve of a model given an attack. The curve provides information about the robustness of a model to attacks of different sizes.

###### Definition 4.1.

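The error curve of the model for the attack $a$ is the probability over the test data that an attack of size at most $\varepsilon$ leads to a misclassification

$$C_{err}(\varepsilon) = P_D\left\{c(x+a(x))\ne c^*(x) \;\middle|\; \text{for an attack } \|a(x)\|_X\le\varepsilon\right\} \tag{17}$$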

See Figure 0(a) for error curves for a given model over a range of attacks. We also plot the data stability curve, which is the probability that a perturbation can move one data point in the direction of another data point with a different label (which can be interpreted as a very weak attack).

We can compare how two different models perform against various attacks using the model error curve. In Table 1 we report $C_{err}(0)$ and $C_{err}(\varepsilon)$, using the 2-norm on $X$. These values correspond respectively to the test error and to the error under noise slightly smaller than a human-perceptible perturbation (see Figure 3). We also report the median adversarial distance, which is the value of $\varepsilon$ at which the error curve reaches 50% error.
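Given the adversarial distance found for each test image, the empirical error curve (17) and the median adversarial distance follow directly. In this sketch (hypothetical helper names), images that are misclassified without any perturbation are assigned distance 0, and images on which the attack failed are assigned `inf`. Because each distance comes from a particular attack, the resulting curve describes that attack; a stronger attack yields a higher curve.

```python
import numpy as np

def error_curve(distances, eps):
    """Empirical C_err(eps): fraction of test images with adversarial distance <= eps.

    distances[i]: size of the smallest successful attack found on image i
    (0 if already misclassified, inf if the attack failed).
    """
    d = np.asarray(distances, dtype=float)
    return (d[:, None] <= np.atleast_1d(eps)[None, :]).mean(axis=0)

def median_adversarial_distance(distances):
    # The eps at which the error curve reaches 50%.
    return float(np.median(distances))

distances = [0.0, 0.05, 0.1, 0.2, np.inf]
print(error_curve(distances, [0.0, 0.1]))          # [0.2 0.6]
print(median_adversarial_distance(distances))      # 0.1
```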

### 4.2 Attack evaluation

We attacked each model on the test/validation set using six untargeted attack methods: the gradient attack; projected gradient descent (constrained in ); the Fast Gradient Sign Method (FGSM) (Goodfellow et al. (2014)); Iterative FGSM (I-FGSM) (Kurakin et al. (2016)); DeepFool (Moosavi-Dezfooli et al. (2016)); and the Boundary attack (Brendel et al., 2018). The first five methods are white-box attacks, while the last is a black-box attack. I-FGSM and the projected gradient attack are iterative methods, whereas FGSM and the gradient attack are single-step. All attacks were implemented with Foolbox 1.3.1 (Rauber et al. (2017)). Hyperparameters were set to Foolbox defaults, except for the Boundary attack, which is computationally demanding: due to resource constraints we ran it for only 500 iterations per test image, and it should perform better with more iterations. For each image and attack method, the attack reports an adversarial distance (in $\ell_2$).

On each model, dataset, and regularization method, we tested all six attack methods on the entire test/validation set. We compared attack methods using the attack error curve; for example, see Figure 0(a), where we plot attack error curves for each attack method on an undefended model. The attack error curve plots the percentage of misclassified test images as a function of adversarial distance. We report against Euclidean distance, $\|a\|_2$, because it is a commonly reported measure of adversarial robustness, although other choices (MSE, or distance in other norms) are equally valid. Common adversarial metrics are easily read off the attack error curve. For example, the median Euclidean adversarial distance is the distance at which the curve reaches 50% error. The percentage error at a given maximum adversarial distance is also readily available.

On all models and for all defence methods studied, projected gradient descent (constrained in ) consistently outperformed the other attack methods: it had the smallest mean adversarial distance and the highest attack error curve. See Figure 0(a) for an illustration. A close second was I-FGSM, the other iterative attack method tested. The next two best attack methods were the gradient attack, followed by FGSM. The Boundary attack outperformed DeepFool. We observed the same ranking of attacks on all models and defences studied. For this reason, in the following section we only report model statistics using projected gradient descent, constrained in , which we regard as the strongest of the attacks listed.

### 4.3 Evaluation of defence methods

Each model was trained with combinations of up to three adversarial defences. The methods were: (i) the baseline undefended model; (ii) (J1), adversarial training with FGSM; (iii) (J2), adversarial training in the 2-norm. Each of these can be augmented with Lipschitz regularization, which in the last case we call (J2-Lip). We also considered adding a final sigmoid layer to the network, prior to the softmax.

The sigmoid we chose is inspired by (but not equivalent to) robust estimators used in classical statistics (Hampel et al., 2011, Chapter 2). The intuition behind this choice is to normalize the logit scores of the model, which we believe should improve robustness to outliers. Outside of deep learning, such estimators have been used successfully to normalize scores and improve robustness, for example in multimodal biometric systems (Jain et al. (2005)). See Appendix A.1 for layer details.

Model robustness is evaluated on the entire test/validation set using the median adversarial distance (in $\ell_2$) and the percentage misclassified at a fixed adversarial distance $\varepsilon$. We chose this $\varepsilon$ because at this magnitude attacks are still imperceptible to the human eye, and we argue it is reasonable to ask that models correctly classify images with imperceptible perturbations. At somewhat larger distances attacks become perceptible, albeit only slightly; see Figure 3. We also plot the attack error curve for each model. These statistics were generated with the projected gradient attack. Table 1 and Figure 2 present results for ResNeXt-34; the best statistics are in bold.

Here we summarize our results for ResNeXt-34, the model studied with the greatest capacity, and defer results for the other models to Appendix B. Without adversarial perturbations, all ResNeXt-34 models achieve roughly 4% test error on CIFAR-10. However, the undefended (baseline) model achieves 54% test error at adversarial distance $\varepsilon$. Adversarial training via FGSM (J1) reduces test error to 24.6%, whereas 2-norm adversarial training (J2) reduces test error to 13.5%. A combination of all defenses ((J2-Lip) with the final sigmoid layer) further reduces test error to 12.1%. The models are ranked in the same order when measured instead by median adversarial distance. The model with all defenses has a median adversarial distance six times that of the undefended model, whereas FGSM (J1) only doubles it. Figure 1(a) illustrates that this ranking of defenses holds over all adversarial perturbation distances.

We observe a similar ranking on CIFAR-100; see Figure 1(b). Unperturbed, all models achieve between 21% and 22% test error. Without adversarial defenses, ResNeXt-34 (4x32d) has a test error of 74% at adversarial distance $\varepsilon$. Adversarial training alone brings the test error down to 56.3% with FGSM and to 53.7% with 2-norm adversarial training. A combination of all defenses further reduces test error to 42.6%. Median adversarial distance increases from 0.05 on the undefended model to 0.14 on the model with all defenses.

In Table 1 we also report statistics measuring the model's Lipschitz constant. The columns reporting gradient norms give their maximum over the test/validation set. The norm of the product of weights is independent of the test data, and is an upper bound on the global Lipschitz constant of the model. Employing all defenses dramatically decreases the norm of the model Jacobian on the test data, and hence improves model robustness. On CIFAR-10 the model with all defenses has a Jacobian norm nearly 10 times smaller than the undefended model, whereas adversarial training only improves the Jacobian norm by a factor of three at most. On CIFAR-100, adversarial training alone does not appear to improve the norm of the Jacobian significantly. However, a combination of all defenses decreases the norm of the model Jacobian by a factor of two.

In Appendix B we report results for all models and combinations of defense methods. Of the individual defenses, adversarial training ((J1) or (J2)) improves model robustness the most. We find 2-norm adversarial training (J2) to be more effective than FGSM (J1). We observe the same ranking of defense methods for AllCNN and ResNet-34. Adversarial training improves model robustness; however, model robustness is further improved by adding Lipschitz regularization, which empirically decreases the Jacobian norm of the model on the test data.

Both adversarial training and Lipschitz regularization increase training time by a factor of no more than four. In contrast, adding a final sigmoid layer to normalize the logits is nearly free, and by itself consistently improves model robustness.

Rather than using $\max_x\|\nabla_x\ell(x)\|_2$ as the Lipschitz penalty, we also tried training models with direct estimates of the Lipschitz constant: both the product of layer weight norms, $\prod_k\|W_k\|$, and the tighter estimate $\|\prod_k W_k\|_{2,\infty}$. However, neither of these direct estimates was effective as a regularizer. The gap between the empirical Lipschitz constant on the data (the modulus of continuity on the data) and the estimated Lipschitz constant is too large. See for example Table 1, where we report the maximum Jacobian norm and $\|\prod_k W_k\|_{2,\infty}$: these two statistics differ by at least four orders of magnitude. The estimate $\prod_k\|W_k\|$ is worse still, and is numerically infeasible for models with more than a few layers; on the two 34-layer networks we studied, it was extremely large. Another estimate of the local Lipschitz constant is available using a statistic from extreme value theory (Weng et al. (2018)). However, this estimate requires at a minimum many tens of model evaluations for each image, and so is not tractable as a Lipschitz estimate during training.

#### Acknowledgments

The authors thank Bill Tubbs, Alex Iannantuono and Aram Pooladian for their assistance designing the experimental pipeline. The authors acknowledge the support of a Google gift which was used to support Bilal Abbasi during a collaboration at Google Brain Montreal. Adam Oberman was partially supported by AFOSR grant FA9550-18-1-0167.

## References

• Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 274–283, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
• Bartlett (1996) Peter L. Bartlett. For valid generalization the size of the weights is more important than the size of the network. In Advances in Neural Information Processing Systems 9, NIPS, Denver, CO, USA, December 2-5, 1996, pp. 134–140, 1996.
• Boyd & Vandenberghe (2004) Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
• Brendel et al. (2018) Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018.
• Cissé et al. (2017) Moustapha Cissé, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pp. 854–863, 2017.
• Devries & Taylor (2017) Terrance Devries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. CoRR, abs/1708.04552, 2017.
• Goodfellow et al. (2018) Ian Goodfellow, Patrick McDaniel, and Nicolas Papernot. Making machine learning robust against adversarial inputs. Communications of the ACM, 61(7):56–66, June 2018.
• Goodfellow et al. (2014) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
• Gouk et al. (2018) Henry Gouk, Eibe Frank, Bernhard Pfahringer, and Michael Cree. Regularisation of neural networks by enforcing Lipschitz continuity. CoRR, abs/1804.04368, 2018.
• Hampel et al. (2011) Frank R Hampel, Elvezio M Ronchetti, Peter J Rousseeuw, and Werner A Stahel. Robust statistics: the approach based on influence functions, volume 196. John Wiley & Sons, 2011.
• He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pp. 630–645, 2016.
• Hein & Andriushchenko (2017) Matthias Hein and Maksym Andriushchenko. Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 2263–2273, 2017.
• Hinton et al. (2015) Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015.
• Horn et al. (1990) Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1990.
• Jain et al. (2005) Anil K. Jain, Karthik Nandakumar, and Arun Ross. Score normalization in multimodal biometric systems. Pattern Recognition, 38(12):2270–2285, 2005.
• Johnson & Lindenstrauss (1984) William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
• Krizhevsky & Hinton (2009) Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
• Kurakin et al. (2016) Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
• Kurakin et al. (2017) Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. NIPS 2017: Defense against adversarial attack.
• Lu et al. (2017) Jiajun Lu, Theerasit Issaranon, and David A. Forsyth. SafetyNet: Detecting and rejecting adversarial examples robustly. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 446–454, 2017.
• Miyato et al. (2018a) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. CoRR, abs/1802.05957, 2018a.
• Miyato et al. (2018b) Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018b.
• Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2574–2582, 2016.
• Oberman & Calder (2018) Adam M. Oberman and Jeff Calder. Lipschitz regularized deep neural networks converge and generalize. CoRR, abs/1808.09540, 2018.
• Papernot et al. (2016a) Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pp. 372–387, 2016a.
• Papernot et al. (2016b) Nicolas Papernot, Patrick D. McDaniel, Arunesh Sinha, and Michael P. Wellman. Towards the science of security and privacy in machine learning. CoRR, abs/1611.03814, 2016b.
• Papernot et al. (2016c) Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016, pp. 582–597, 2016c.
• Papernot et al. (2017) Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, April 2-6, 2017, pp. 506–519, 2017.
• Rauber et al. (2017) Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models. CoRR, abs/1707.04131, 2017.
• Springenberg et al. (2014) Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2014.
• Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
• Tramèr et al. (2018) Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
• Tsuzuku et al. (2018) Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. CoRR, abs/1802.04034, 2018.
• Valentine (1945) Frederick Albert Valentine. A Lipschitz condition preserving extension for a vector function. American Journal of Mathematics, 67(1):83–93, 1945.
• Weng et al. (2018) Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations, 2018.
• Xie et al. (2017) Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5987–5995, 2017.

## Appendix A Model and training details

We used standard data augmentation for the CIFAR datasets, comprising horizontal flips and random crops of images padded by four pixels per side. We used square cutout (Devries & Taylor, 2017) of width 16 on CIFAR-10 and width 8 on CIFAR-100, but no dropout. Batch normalization was used after every convolution layer. We used SGD with an initial learning rate of 0.1, momentum 0.9, and a batch size of 128. On CIFAR-10, networks were trained for 200 epochs, and the learning rate was divided by five after epochs 60, 120, and 180. On CIFAR-100, networks were trained for 300 epochs, and the learning rate was divided by 10 after epochs 150 and 225. For CIFAR-10, weight decay (Tikhonov/ℓ2 regularization) was set to ; on CIFAR-100 it was .
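As a minimal sketch, the step learning-rate schedules described above can be written as a function of a 0-indexed epoch. It assumes that "after epoch 60" means the reduced rate applies from epoch index 60 onward; `step_lr` and the two configuration dictionaries are hypothetical names, not code from this work.

```python
def step_lr(epoch, base_lr=0.1, milestones=(60, 120, 180), factor=5.0):
    """Learning rate for a 0-indexed `epoch` under a step schedule.

    The base rate is divided by `factor` once for each milestone reached.
    Assumption: a milestone m takes effect starting at epoch index m.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / factor ** drops


# Hypothetical configurations mirroring the schedules described in the text.
CIFAR10_SCHEDULE = {"base_lr": 0.1, "milestones": (60, 120, 180), "factor": 5.0}
CIFAR100_SCHEDULE = {"base_lr": 0.1, "milestones": (150, 225), "factor": 10.0}

if __name__ == "__main__":
    for epoch in (0, 60, 120, 180):
        print(epoch, step_lr(epoch, **CIFAR10_SCHEDULE))
```

In PyTorch, the same schedules correspond to `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.2` (CIFAR-10) or `gamma=0.1` (CIFAR-100), stepped once per epoch.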

For networks with Lipschitz regularization, the Lagrange multiplier of the excess Lipschitz term was set to . Adversarially trained models were trained with images perturbed to an ℓ2 distance of . We did not tune either of these hyperparameters.
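Adversarial training in the 2-norm perturbs each input a fixed distance ε along the normalized loss gradient; to first order, ℓ(x + δ) ≈ ℓ(x) + ε‖∇ℓ(x)‖₂, which is the regularized objective (J2) from Section 1.1. The sketch below shows only this single-step picture, assuming a toy logistic loss with hypothetical weights `W`; the attack actually used during training may differ (for example, it may be iterative).

```python
import math


def l2_adversarial_step(x, grad, eps):
    """Move x a distance eps (in the 2-norm) along the loss gradient."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)  # zero gradient: no ascent direction, leave x unchanged
    return [xi + eps * gi / norm for xi, gi in zip(x, grad)]


# Toy logistic loss for a hypothetical linear model W and label +1.
W = [1.0, -2.0]


def loss(x):
    margin = sum(wi * xi for wi, xi in zip(W, x))
    return math.log1p(math.exp(-margin))


def loss_grad(x):
    margin = sum(wi * xi for wi, xi in zip(W, x))
    s = 1.0 / (1.0 + math.exp(margin))  # equals sigmoid(-margin)
    return [-wi * s for wi in W]


x = [0.5, 0.3]
eps = 1e-3
g = loss_grad(x)
x_adv = l2_adversarial_step(x, g, eps)
gain = loss(x_adv) - loss(x)
first_order = eps * math.sqrt(sum(gi * gi for gi in g))
print(gain, first_order)  # nearly equal for small eps
```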

For CIFAR-10, the ResNeXt architecture we used had a depth of 34 layers, cardinality 2 and width 32, with a basic residual block rather than a bottleneck. The branches (convolution groups) of the blocks were aggregated via a mean, rather than using a fully connected layer. For CIFAR-100 the architecture was the same, but had cardinality 4.
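The mean aggregation of the convolution groups can be illustrated with a toy block in which each branch is a plain function on a feature vector. This is a structural sketch only: the branches stand in for the convolutional paths, `mean_aggregated_block` is a hypothetical name, and placing a ReLU after the residual addition is an assumption borrowed from standard ResNet blocks.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def mean_aggregated_block(x, branches):
    """Toy ResNeXt-style basic block whose branch outputs are averaged.

    `branches` stands in for the convolution groups (cardinality = len(branches));
    each branch maps a feature vector to a vector of the same length. Branch
    outputs are combined by an element-wise mean instead of a learned
    (fully connected) projection, then added to the identity shortcut.
    """
    outputs = [branch(x) for branch in branches]
    mean = [sum(vals) / len(outputs) for vals in zip(*outputs)]
    # Assumption: ReLU after the residual addition, as in a standard ResNet block.
    return relu([xi + mi for xi, mi in zip(x, mean)])


# Cardinality 2, as in the CIFAR-10 model; these branches are placeholders.
branches = [lambda v: [2.0 * a for a in v], lambda v: [0.0 for _ in v]]
print(mean_aggregated_block([1.0, -2.0], branches))  # [2.0, 0.0]
```

Averaging, rather than summing, keeps the scale of the aggregated residual independent of the cardinality, so the CIFAR-100 variant (cardinality 4) only changes the length of `branches`.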

### A.1 Pre-softmax sigmoid layer

We found that inserting a sigmoid activation function before the final layer improved model robustness. This sigmoid layer consisted of batch normalization (without learnable parameters), followed by the activation function , where is a single learnable parameter shared across all layer inputs.
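The exact activation formula is not recoverable from this text, so the following is only a sketch under a stated assumption: a logistic sigmoid whose input is scaled by the single shared learnable parameter, written here as `alpha`. Batch normalization uses training-mode batch statistics with no affine parameters; running averages for inference are omitted, and all function names are hypothetical.

```python
import math


def batch_norm_no_affine(batch, eps=1e-5):
    """Normalize each feature over the batch (training-mode statistics).

    No learnable scale or shift is applied. Running averages are omitted.
    """
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
    return [
        [(v - m) / math.sqrt(var + eps) for v, m, var in zip(row, means, variances)]
        for row in batch
    ]


def stable_sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def pre_softmax_sigmoid(batch, alpha):
    """Assumed form: sigmoid(alpha * z), z batch-normalized, one alpha shared by all inputs."""
    return [[stable_sigmoid(alpha * z) for z in row] for row in batch_norm_no_affine(batch)]


features = [[1.0, 10.0], [3.0, 10.0]]
print(pre_softmax_sigmoid(features, alpha=2.0))
```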

## Appendix B Further experimental results

Here we present complete results for all regularization types on all models and datasets considered. Because adversarial training in the 2-norm outperforms FGSM adversarial training, we report results only for the former.

## Appendix C Proofs

###### Proof of Lemma 3.3.

Let $l_k(x) = \sigma_k\big(W_k\, l_{k-1}(x)\big)$ be the $k$-th layer of a network, with activation $\sigma_k$. Then the gradient of the $k$-th layer is

$$\nabla l_k(x) = \operatorname{diag}\big(\sigma_k'(W_k\, l_{k-1}(x))\big)\, W_k\, \nabla l_{k-1}(x) \tag{18}$$

Note that $\sigma_k'(W_k\, l_{k-1}(x))$ is a vector whose number of components equals the number of rows of $W_k$. For brevity, let $d^k_i(x)$ be the $i$-th component of this vector, and let $w^k_{i,j}$ be the entries of $W_k$.

The Jacobian $\nabla u$ of the $N$-layer network $u$ is defined entry-wise. The entry of $\nabla u$ in the $i_N$-th row and $i_0$-th column is given by

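$$
\begin{aligned}
[\nabla u]_{i_N, i_0} &= \sum_{i_1,\dots,i_{N-1}} \prod_{k=1}^{N} d^k_{i_k}(x)\, w^k_{i_k, i_{k-1}} && (19)\\
&= \sum_{i_1,\dots,i_{N-1}} \Big(\prod_{k=1}^{N} d^k_{i_k}(x)\Big)\Big(\prod_{k=1}^{N} w^k_{i_k, i_{k-1}}\Big) && (20)
\end{aligned}
$$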

Because each activation function is at most 1-Lipschitz, $|d^k_{i_k}(x)| \le 1$ for all $k$, $i_k$, and $x$. Pulling the maximum of this term out of the matrix multiplication bounds

$$\max_x \|\nabla u(x)\|_{2,\infty} \le \Big\|\prod_{k=1}^{N} W_k\Big\|_{2,\infty} \tag{21}$$

as desired. ∎
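The exact identities used in this proof, the layer-wise recursion (18) and the path sum (19), can be sanity-checked numerically. The sketch below assumes bias-free fully connected layers with $l_0(x) = x$ and $u = l_N$, uses tanh as an example of a 1-Lipschitz activation, and uses arbitrary illustrative weights; all function names are hypothetical. It compares both formulas against central finite differences and does not test the bound (21).

```python
import itertools
import math


def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def forward(Ws, x):
    """u(x) = l_N(x), with l_0(x) = x and l_k = tanh(W_k l_{k-1})."""
    layer = x
    for W in Ws:
        layer = [math.tanh(z) for z in matvec(W, layer)]
    return layer


def layer_derivatives(Ws, x):
    """d^k = sigma'(W_k l_{k-1}(x)) for k = 1..N; for tanh, sigma' = 1 - tanh^2 in (0, 1]."""
    ds, layer = [], x
    for W in Ws:
        z = matvec(W, layer)
        ds.append([1.0 - math.tanh(zi) ** 2 for zi in z])
        layer = [math.tanh(zi) for zi in z]
    return ds


def jacobian_recursive(Ws, x):
    """Recursion (18): grad l_k = diag(d^k) W_k grad l_{k-1}, starting from grad l_0 = I."""
    n = len(x)
    J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for W, d in zip(Ws, layer_derivatives(Ws, x)):
        J = [[d_i * v for v in row] for d_i, row in zip(d, matmul(W, J))]
    return J


def jacobian_entry_path_sum(Ws, x, i_N, i_0):
    """Formula (19): sum over i_1..i_{N-1} of prod_k d^k_{i_k} w^k_{i_k, i_{k-1}}."""
    ds = layer_derivatives(Ws, x)
    hidden = [range(len(W)) for W in Ws[:-1]]
    total = 0.0
    for middle in itertools.product(*hidden):
        idx = (i_0, *middle, i_N)
        term = 1.0
        for k in range(1, len(Ws) + 1):
            term *= ds[k - 1][idx[k]] * Ws[k - 1][idx[k]][idx[k - 1]]
        total += term
    return total


def jacobian_finite_diff(Ws, x, h=1e-6):
    """Central differences, used only as an independent check."""
    columns = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(forward(Ws, xp), forward(Ws, xm))])
    return [list(row) for row in zip(*columns)]


# Hypothetical two-layer example: x in R^2, hidden width 3, output in R^2.
Ws = [
    [[0.5, -1.0], [0.3, 0.8], [-0.7, 0.2]],
    [[1.0, -0.5, 0.4], [-0.3, 0.9, 0.6]],
]
x = [0.4, -0.2]
J = jacobian_recursive(Ws, x)
J_fd = jacobian_finite_diff(Ws, x)
print(max(abs(a - b) for ra, rb in zip(J, J_fd) for a, b in zip(ra, rb)))
```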
