
Towards Stable and Efficient Training of Verifiably Robust Neural Networks

Huan Zhang1  Hongge Chen2  Chaowei Xiao3  Sven Gowal4  Robert Stanforth4
  Bo Li5  Duane Boning2  Cho-Jui Hsieh1
1 UCLA   2 MIT   3 University of Michigan   4 DeepMind   5 UIUC
huan@huan-zhang.com, chenhg@mit.edu, xiaocw@umich.edu
sgowal@google.com, stanforth@google.com
lbo@illinois.edu, boning@mtl.mit.edu, chohsieh@cs.ucla.edu
Work partially done during an internship at DeepMind.
Abstract

Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hundreds depending on the underlying network architectures. Meanwhile, interval bound propagation (IBP) based training is efficient and significantly outperforms linear relaxation based methods on many tasks, yet it may suffer from stability issues since the bounds are much looser, especially at the beginning of training. In this paper, we propose a new certified adversarial training method, CROWN-IBP, by combining the fast IBP bounds in a forward bounding pass and a tight linear relaxation based bound, CROWN, in a backward bounding pass. CROWN-IBP is computationally efficient and consistently outperforms IBP baselines on training verifiably robust neural networks. We conduct large scale experiments on MNIST and CIFAR datasets, and outperform all previous linear relaxation and bound propagation based certified defenses in ℓ∞ robustness. Notably, we achieve 7.02% verified test error on MNIST at ε = 0.3, and 66.94% on CIFAR-10 with ε = 8/255.

1 Introduction

The success of deep neural networks (DNNs) has motivated their deployment in some safety-critical environments, such as autonomous driving and facial recognition systems. Applications in these areas make understanding the robustness and security of deep neural networks urgently needed, especially their resilience under malicious, finely crafted inputs. Unfortunately, the performance of DNNs is often so brittle that even imperceptibly modified inputs, also known as adversarial examples, are able to completely break the model (goodfellow2014explaining; szegedy2013intriguing). The robustness of DNNs under adversarial examples is well-studied from both attack (crafting powerful adversarial examples) and defense (making the model more robust) perspectives (athalye2018obfuscated; carlini2017adversarial; carlini2017towards; goodfellow2014explaining; madry2018towards; papernot2016distillation; xiao2019meshadv; xiao2018generating; xiao2018spatially; eykholt2018robust; chen2018attacking; xu2018structured; zhang2019limitations). Recently, it has been shown that defending against adversarial examples is a very difficult task, especially under strong and adaptive attacks. Early defenses such as distillation (papernot2016distillation) have been broken by stronger attacks like C&W (carlini2017towards). Many defense methods have been proposed recently (guo2017countering; song2017pixeldefend; buckman2018thermometer; ma2018characterizing; samangouei2018defense; xiao2018characterizing; xiao2019advit), but their robustness improvement cannot be certified: no provable guarantees can be given to verify their robustness. In fact, most of these uncertified defenses become vulnerable under stronger attacks (athalye2018obfuscated; he2017adversarial).

Several recent works in the literature seek to give provable guarantees on robustness, using techniques such as linear relaxations (wong2018provable; mirman2018differentiable; wang2018mixtrain; dvijotham2018training; weng2018towards; zhang2018crown), interval bound propagation (mirman2018differentiable; gowal2018effectiveness), ReLU stability regularization (xiao2018training), distributionally robust optimization (sinha2018certifying) and semidefinite relaxations (raghunathan2018certified; dvijothamefficient2019). Linear relaxation of neural networks, first proposed by wong2018provable, is one of the most popular categories among these certified defenses. These methods use the dual of a linear program or several similar approaches to provide a linear relaxation of the network (referred to as a “convex adversarial polytope”), and the resulting bounds are tractable for robust optimization. However, these methods are both computationally and memory intensive, and can increase model training time by a factor of hundreds. On the other hand, interval bound propagation (IBP) is a simple and efficient method for training verifiable neural networks (gowal2018effectiveness), which achieved state-of-the-art verified error on many datasets. However, since IBP bounds are very loose during the initial phase of training, the training procedure can be unstable and sensitive to hyperparameters.

In this paper, we first discuss the strengths and weaknesses of existing linear relaxation based and interval bound propagation based certified robust training methods. We then propose a new certified robust training method, CROWN-IBP, which marries the efficiency of IBP and the tightness of a linear relaxation based verification bound, CROWN (zhang2018crown). CROWN-IBP bound propagation involves an IBP based fast forward bounding pass, and a tight convex relaxation based backward bounding pass (CROWN) which scales linearly with the size of the neural network output and is very efficient for problems with low output dimensions. Additionally, CROWN-IBP provides flexibility for exploiting the strengths of both IBP and convex relaxation based verifiable training methods.

The efficiency, tightness and flexibility of CROWN-IBP allow it to outperform state-of-the-art methods for training neural networks with verifiable ℓ∞ robustness under all settings on the MNIST and CIFAR-10 datasets. In our experiments, on MNIST we reach 7.02% and 12.06% IBP verified error under ℓ∞ distortions ε = 0.3 and ε = 0.4, respectively, outperforming the state-of-the-art baseline results by IBP (8.55% and 15.01%). On CIFAR-10, at ε = 2/255, CROWN-IBP decreases the verified error from 55.88% (IBP) to 46.03% and matches convex relaxation based methods; at the larger ε = 8/255, CROWN-IBP outperforms all other methods by a noticeable margin.

2 Related Work and Background

2.1 Robustness Verification and Relaxations of Neural Networks

Neural network robustness verification algorithms seek upper and lower bounds of an output neuron for all possible inputs within a set S, typically a norm bounded perturbation set around a nominal input. Most importantly, the margins between the ground-truth class and any other class determine model robustness. However, it has already been shown that finding the exact output range is a non-convex problem and NP-complete (katz2017reluplex; weng2018towards). Therefore, recent works resort to giving relatively tight but computationally tractable bounds of the output range with necessary relaxations of the original problem. Many of these robustness verification approaches are based on linear relaxations of non-linear units in neural networks, including CROWN (zhang2018crown), DeepPoly (Singh2019robustness), Fast-Lin (weng2018towards), DeepZ (singh2018fast) and Neurify (wang2018efficient). We refer the readers to (salman2019convex) for a comprehensive survey on this topic. After linear relaxation, these methods bound the output of a neural network by linear upper/lower hyperplanes:

$$\underline{A}_{j,:}\, x + \underline{b}_j \;\le\; [f(x)]_j \;\le\; \overline{A}_{j,:}\, x + \overline{b}_j, \qquad \forall x \in S, \tag{1}$$

where the row vector A̲_{j,:} (and similarly Ā_{j,:}) is the product of the network weight matrices and diagonal matrices reflecting the ReLU relaxations for output neuron j; b̲_j and b̄_j are two bias terms unrelated to x. Additionally, dvijotham2018dual; dvijotham18verification; qin2018verification solve the Lagrangian dual of the verification problem; raghunathan2018certified; raghunathan2018semidefinite; dvijothamefficient2019 propose semidefinite relaxations, which are tighter compared to linear relaxation based methods but computationally expensive. Bounds on a neural network's local Lipschitz constant can also be used for verification (zhang2018recurjac; hein2017formal). Besides these deterministic verification approaches, randomized smoothing can be used to certify the robustness of any model in a probabilistic manner (cohen2019certified; salman2019provably; lecuyer2018certified; li2018second).

2.2 Robust Optimization and Verifiable Adversarial Defense

To improve the robustness of neural networks against adversarial perturbations, a natural idea is to generate adversarial examples by attacking the network and then use them to augment the training set (kurakin2016adversarial). More recently, madry2018towards showed that adversarial training can be formulated as solving a minimax robust optimization problem as in (2). Given a model with parameters θ, loss function L, and training data distribution 𝒳, the training algorithm aims to minimize the robust loss, which is defined as the maximum loss within a neighborhood S = {δ : ‖δ‖∞ ≤ ε} of each data point (x, y), leading to the following robust optimization problem:

$$\min_\theta \; \mathbb{E}_{(x,y) \sim \mathcal{X}} \left[ \max_{\delta \in S} L(x + \delta;\, y;\, \theta) \right]. \tag{2}$$

madry2018towards proposed to use projected gradient descent (PGD) to approximately solve the inner max and then use the loss on the perturbed example to update the model. Networks trained by this procedure achieve state-of-the-art test accuracy under strong attacks (athalye2018obfuscated; wang2018mixtrain; zheng2018distributionally). Despite being robust under strong attacks, models obtained by this PGD-based adversarial training do not have verified error guarantees. Due to the nonconvexity of neural networks, a PGD attack can only compute a lower bound of the robust loss (the inner maximization problem). Minimizing a lower bound of the inner max cannot guarantee that (2) is minimized. In other words, even if a PGD attack cannot find a perturbation with large loss, that does not mean no such perturbation exists. This becomes problematic in safety-critical applications since those models need certified safety.
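As an illustration of this inner maximization, below is a minimal PyTorch sketch of an ℓ∞-bounded PGD attack (the function name, step size α and random initialization are our choices for illustration, not taken from any specific released implementation):

```python
import torch

def pgd_inner_max(model, loss_fn, x, y, eps, alpha, steps=200):
    """Approximately solve the inner max of Eq. (2) over the l_inf ball.

    PGD may miss the true worst case, so the resulting loss is only a
    lower bound of the robust loss.
    """
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()   # gradient ascent on the loss
            delta.clamp_(-eps, eps)        # project back onto the l_inf ball
    return (x + delta).detach()
```

The perturbed examples returned by such a procedure are then used in place of the clean examples when updating θ.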

Verifiable adversarial training methods, on the other hand, aim to obtain a network with good robustness that can be verified efficiently. This can be done by combining adversarial training and robustness verification: instead of using PGD to find a lower bound of the inner max, certified adversarial training uses a verification method to find an upper bound of the inner max, and then updates the parameters based on this upper bound of the robust loss. Minimizing an upper bound of the inner max guarantees that the true robust loss is also minimized. Two certified robust training methods are closely related to our work, and we describe them in detail below.

Linear Relaxation Based Verifiable Adversarial Training.

One of the most popular verifiable adversarial training methods was proposed by wong2018provable, using linear relaxations of neural networks to give an upper bound of the inner max. Other similar approaches include mirman2018differentiable; wang2018mixtrain; dvijotham2018training. Since the bound propagation process of a convex adversarial polytope is too expensive, several methods were proposed to improve its efficiency, like Cauchy projection (wong2018scaling) and dynamic mixed training (wang2018mixtrain). However, even with these speed-ups, the training process is still slow. Also, this method may significantly reduce a model's standard accuracy (accuracy on the natural, unmodified test set). As we will demonstrate shortly, we find that this method tends to over-regularize the network during training, which is harmful for obtaining good accuracy.

Interval Bound Propagation (IBP).

Interval Bound Propagation (IBP) uses a very simple rule to compute the pre-activation outer bounds for each layer of the neural network. Unlike linear relaxation based methods, IBP does not relax ReLU neurons and does not consider the correlations between neurons of different layers, yielding much looser bounds. mirman2018differentiable proposed a variety of abstract domains to give sound over-approximations for neural networks, including the “Box/Interval Domain” (referred to as IBP in gowal2018effectiveness), and showed that it could scale to much larger networks than other works (raghunathan2018certified) could at the time. gowal2018effectiveness demonstrated that IBP could outperform many state-of-the-art results by a large margin with more precise approximations for the last linear layer and better training schemes. However, IBP can be unstable and hard to tune in practice, since the bounds can be very loose, especially during the initial phase of training, posing a challenge to the optimizer. To mitigate instability, gowal2018effectiveness use a mixture of regular and minimax robust cross-entropy losses as the model's training loss.

3 Methodology

Notation.

We define an L-layer feed-forward neural network recursively as:

$$f(x) = z^{(L)}, \qquad z^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)}, \qquad h^{(l)} = \sigma\big(z^{(l)}\big), \qquad l \in \{1, \dots, L-1\},$$

where h^(0) = x, n_0 represents the input dimension, n_L is the number of classes, W^(l) ∈ ℝ^{n_l × n_{l−1}}, and σ is an element-wise activation function. We use z^(l) to represent pre-activation neuron values and h^(l) to represent post-activation neuron values. Considering an input example x_k with ground-truth label y_k, we define a set S(x_k, ε) = {x : ‖x − x_k‖∞ ≤ ε}, and we desire a robust network to have the property y_k = argmax_j [f(x)]_j for all x ∈ S(x_k, ε). We define element-wise upper and lower bounds for z^(l) and h^(l) as z̄^(l), z̲^(l), h̄^(l) and h̲^(l).

Verification Specifications.

Neural network verification literature typically defines a specification vector c ∈ ℝ^{n_L} that gives a linear combination of the neural network output: c⊤f(x). In robustness verification, typically we set c_y = 1, where y is the ground-truth class label, c_i = −1, where i is the attack target label, and all other elements in c to 0. This represents the margin between class y and class i. For an n_L-class classifier and a given label y, we define a specification matrix C ∈ ℝ^{n_L × n_L} as:

$$C_{i,j} = \begin{cases} 1, & \text{if } j = y,\ i \neq y \quad \text{(output of the ground-truth class)} \\ -1, & \text{if } i = j,\ i \neq y \quad \text{(output of the other classes, negated)} \\ 0, & \text{otherwise} \quad \text{(note that the } y\text{-th row is all zeros)} \end{cases} \tag{3}$$

Importantly, each element in the vector m := Cf(x) gives us a margin between class y and one of the other classes. We define the lower bound of Cf(x) for all x ∈ S(x_k, ε) as m̲(x_k, ε), which is a very important quantity: when all elements of m̲(x_k, ε) > 0, x_k is verifiably robust for any perturbation with ℓ∞ norm less than ε. m̲(x_k, ε) can be obtained by a neural network verification algorithm, such as convex adversarial polytope, IBP, or CROWN. Additionally, wong2018provable showed that for cross-entropy (CE) loss:

$$\max_{x \in S(x_k, \epsilon)} L\big(f(x);\, y_k;\, \theta\big) \;\le\; L\big(-\underline{m}(x_k, \epsilon);\, y_k;\, \theta\big). \tag{4}$$

(4) gives us the opportunity to solve the robust optimization problem (2) via minimizing this tractable upper bound of the inner max. This guarantees that the inner max of (2) is also minimized.
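As a concrete illustration, the specification matrix in (3) can be constructed in a few lines (a PyTorch sketch; the tensor conventions are ours, not necessarily those of any released code):

```python
import torch

def specification_matrix(y, num_classes=10):
    """Build C as in Eq. (3) for ground-truth label y.

    Row i of C encodes the margin [f(x)]_y - [f(x)]_i; row y is all zeros,
    so the margin of the true class is exactly 0.
    """
    C = -torch.eye(num_classes)
    C[:, y] += 1.0   # +1 on the ground-truth class column
    C[y, :] = 0.0    # the y-th row is all zeros
    return C         # m = C @ f(x) gives margins against every class

# Example: x_k is provably robust if every entry of m (except index y) is > 0.
```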

Dataset         ε (ℓ∞ norm)   CAP verified error   CROWN verified error   IBP verified error
MNIST           0.1           8.90%                7.05%                  5.83%
                0.2           45.37%               24.17%                 7.37%
                0.3           97.77%               65.26%                 10.68%
                0.4           99.98%               99.57%                 16.76%
Fashion-MNIST   0.1           44.64%               36.85%                 23.49%
CIFAR-10        2/255         62.94%               60.83%                 58.75%
                8/255         91.44%               82.68%                 73.34%
Table 1: IBP trained models have low IBP verified errors, but when verified with typically much tighter bounds, including convex adversarial polytope (CAP) (wong2018scaling) and CROWN (zhang2018crown), the verified errors increase significantly. CROWN is generally tighter than convex adversarial polytope; however, the gap between CROWN and IBP is still large, especially at large ε. We used a 4-layer CNN network for all datasets to compute these bounds.¹

3.1 Analysis of IBP and Linear Relaxation based Verifiable Training Methods

¹ We implemented CROWN with efficient CNN support on GPUs: https://github.com/huanzhang12/CROWN-IBP

Interval Bound Propagation (IBP)

Interval Bound Propagation (IBP) uses a simple bound propagation rule. For the input layer we set x_L ≤ x ≤ x_U element-wise. For affine layers we have:

$$\overline{z}^{(l)} = W^{(l)} \frac{\overline{h}^{(l-1)} + \underline{h}^{(l-1)}}{2} + \left|W^{(l)}\right| \frac{\overline{h}^{(l-1)} - \underline{h}^{(l-1)}}{2} + b^{(l)}, \tag{5}$$

$$\underline{z}^{(l)} = W^{(l)} \frac{\overline{h}^{(l-1)} + \underline{h}^{(l-1)}}{2} - \left|W^{(l)}\right| \frac{\overline{h}^{(l-1)} - \underline{h}^{(l-1)}}{2} + b^{(l)}, \tag{6}$$

where |W^(l)| takes element-wise absolute value. Note that h̄^(0) = x_U and h̲^(0) = x_L.² For element-wise monotonic increasing activation functions σ,

$$\overline{h}^{(l)} = \sigma\big(\overline{z}^{(l)}\big), \qquad \underline{h}^{(l)} = \sigma\big(\underline{z}^{(l)}\big). \tag{7}$$

² For inputs bounded with general ℓ_p norms, IBP can be applied as long as the norm can be converted to per-neuron intervals after the first affine layer. For example, Hölder's inequality can be applied at the first affine layer to obtain z̄^(1) and z̲^(1), and the IBP rules for later layers do not change.
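For concreteness, the rules (5)-(7) can be sketched in a few lines of PyTorch for a network given as an alternating list of `nn.Linear` and activation modules (this layout is an assumption for illustration; convolutional layers follow the same center/radius arithmetic):

```python
import torch
import torch.nn as nn

def ibp_bounds(layers, x, eps):
    """Propagate element-wise interval bounds through a feed-forward net.

    Affine layers transform the interval center and radius as in
    Eqs. (5)-(6); monotonic activations are applied to both endpoints
    as in Eq. (7).
    """
    lb, ub = x - eps, x + eps                            # input l_inf ball
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mu, r = (ub + lb) / 2, (ub - lb) / 2         # center and radius
            mu = layer(mu)                               # W mu + b
            r = torch.matmul(r, layer.weight.abs().t())  # |W| r
            lb, ub = mu - r, mu + r
        else:                                            # monotonic activation
            lb, ub = layer(lb), layer(ub)
    return lb, ub
```

The cost is comparable to two forward passes, which is what makes IBP attractive for training.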

We found that IBP can be viewed as training a simple augmented ReLU network which is friendly to optimizers (see Appendix A for more discussion). We also found that a network trained using IBP can obtain good verified errors when verified using IBP, but it can get much worse verified errors when verified using linear relaxation based methods, including convex adversarial polytope (CAP) by wong2018provable (equivalently, Fast-Lin by weng2018towards) and CROWN (zhang2018crown). Table 1 demonstrates that this gap can be very large at large ε.

However, IBP is a very loose bound during the initial phase of training, which makes training unstable and hard to tune; purely using IBP frequently leads to divergence. gowal2018effectiveness proposed to use an ε schedule where ε is gradually increased during training, and a mixture of robust cross-entropy loss with natural cross-entropy loss as the objective to stabilize training:

$$\min_\theta \; \mathbb{E}_{(x,y) \sim \mathcal{X}} \Big[ \kappa\, L(x;\, y;\, \theta) + (1-\kappa)\, L\big(-\underline{m}_{\text{IBP}}(x, \epsilon);\, y;\, \theta\big) \Big]. \tag{8}$$

Issues with linear relaxation based training.

Since IBP hugely outperforms linear relaxation based methods in recent work (gowal2018effectiveness) in many settings, we want to understand what goes wrong with linear relaxation based methods. We found that, empirically, the norm of the weights in the models produced by linear relaxation based methods such as (wong2018provable) and (wong2018scaling) does not increase, or even decreases, during training.

Figure 1: Verified error and the 2nd CNN layer's induced norm for a model trained using (wong2018scaling) and a model trained using CROWN-IBP. ε is increased from 0 to 0.3 in 60 epochs.

In Figure 1 we train a small 4-layer MNIST model, linearly increasing ε from 0 to 0.3 over 60 epochs. We plot the induced norm of the 2nd CNN layer during the training processes of CROWN-IBP and (wong2018scaling). The norm of the weight matrix trained using (wong2018scaling) does not increase. When ε becomes larger (roughly at epoch 40), the norm even starts to decrease slightly, indicating that the model is forced to learn weights with smaller norms. Meanwhile, the verified error also starts to ramp up, possibly due to the lack of capacity. We conjecture that linear relaxation based training over-regularizes the model, especially at larger ε. In CROWN-IBP, by contrast, the norms of the weight matrices keep increasing during training, and the verified error does not significantly increase when ε reaches 0.3.
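For reference, an induced norm of the kind tracked in Figure 1 is cheap to compute; taking the ℓ∞ operator norm as an example, it is the maximum absolute row sum of the weight matrix (a sketch; for a convolutional layer one would apply the same formula to its unfolded weight matrix):

```python
def linf_induced_norm(W):
    """l_inf -> l_inf operator norm of a 2D weight tensor W:
    the maximum absolute row sum."""
    return W.abs().sum(dim=1).max()
```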

Another issue with current linear relaxation based training or verification methods is their high computational and memory cost, and poor scalability. For the small network in Figure 1, convex adversarial polytope (with 50 random Cauchy projections) is 8 times slower and takes 4 times more memory than CROWN-IBP (without using random projections). Convex adversarial polytope scales even worse for larger networks; see Appendix J for a comparison.

3.2 The proposed algorithm: CROWN-IBP

Overview.

We have reviewed IBP and linear relaxation based methods above. As shown in gowal2018effectiveness, IBP performs well at large ε with much smaller verified error, and also efficiently scales to large networks; however, it can be sensitive to hyperparameters due to its very imprecise bounds in the beginning phase of training. On the other hand, linear relaxation based methods can give tighter lower bounds at the cost of high computational expense, but they over-regularize the network at large ε and prevent us from achieving good standard and verified accuracy. We propose CROWN-IBP, a new certified defense where we optimize the following problem (θ represents the network parameters):

$$\min_\theta \; \mathbb{E}_{(x,y) \sim \mathcal{X}} \Big[ \kappa\, L(x;\, y;\, \theta) + (1-\kappa)\, L\Big(-\big((1-\beta)\,\underline{m}_{\text{IBP}}(x, \epsilon) + \beta\,\underline{m}_{\text{CROWN-IBP}}(x, \epsilon)\big);\, y;\, \theta\Big) \Big], \tag{9}$$

where our lower bound of the margin, m̲(x, ε), is a combination of two bounds with different natures: the IBP bound m̲_IBP(x, ε) and a CROWN-style bound m̲_CROWN-IBP(x, ε) (detailed below); L is the cross-entropy loss. Note that the combination is taken inside the loss function and is thus still a valid lower bound; therefore (4) still holds and we remain within the minimax robust optimization theoretical framework. Similar to IBP and TRADES (zhang2019theoretically), we use a mixture of natural and robust training losses with parameter κ, allowing us to explicitly trade off between clean accuracy and verified accuracy.

At a high level, the computation of the CROWN-IBP lower bound m̲_CROWN-IBP(x, ε) consists of IBP bound propagation in a forward bounding pass and CROWN-style bound propagation in a backward bounding pass. We discuss the details of the CROWN-IBP algorithm below.
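A sketch of the training objective (9) in PyTorch follows. The helpers `ibp_margin_lb` and `crown_ibp_margin_lb` are placeholders for the forward and backward bounding passes described below; the function and argument names are ours:

```python
import torch.nn.functional as F

def crown_ibp_loss(model, x, y, eps, kappa, beta):
    """Mixed natural/verified loss of Eq. (9).

    The two margin lower bounds are combined *inside* the loss, so the
    combination is still a valid lower bound and Eq. (4) applies.
    """
    m_ibp = ibp_margin_lb(model, x, y, eps)          # forward bounding pass
    m_crown = crown_ibp_margin_lb(model, x, y, eps)  # backward bounding pass
    m = (1 - beta) * m_ibp + beta * m_crown          # convex combination
    natural = F.cross_entropy(model(x), y)
    robust = F.cross_entropy(-m, y)                  # -m plays the role of logits
    return kappa * natural + (1 - kappa) * robust
```

Setting β = 0 recovers the IBP objective (8), while β = 1 uses the CROWN-style bound alone.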

Forward Bound Propagation in CROWN-IBP.

In CROWN-IBP, we first obtain z̄^(l) and z̲^(l) for all layers l by applying (5), (6) and (7). Then we obtain m̲_IBP(x, ε) = z̲^(L) (assuming C is merged into W^(L)). The time complexity is comparable to two forward propagation passes of the network.

Linear Relaxation of ReLU Neurons.

Given z̲^(l) and z̄^(l) computed in the previous step, we first check if some neurons are always active (z̲_k^(l) ≥ 0) or always inactive (z̄_k^(l) ≤ 0), since they are effectively linear and no relaxations are needed. For the remaining unstable neurons, zhang2018crown; wong2018provable give a linear relaxation for the ReLU activation function:

$$\alpha_k\, z^{(l)}_k \;\le\; \sigma\big(z^{(l)}_k\big) \;\le\; \frac{\overline{z}^{(l)}_k}{\overline{z}^{(l)}_k - \underline{z}^{(l)}_k}\Big(z^{(l)}_k - \underline{z}^{(l)}_k\Big), \tag{10}$$

where 0 ≤ α_k ≤ 1; zhang2018crown propose to adaptively select α_k = 1 when z̄_k^(l) ≥ |z̲_k^(l)| and 0 otherwise, which minimizes the relaxation error. Following (10), for an input vector z^(l), we effectively replace the ReLU layer with a linear layer, giving upper and lower bounds of the output:

$$\underline{D}^{(l)}\, z^{(l)} \;\le\; \sigma\big(z^{(l)}\big) \;\le\; \overline{D}^{(l)}\, z^{(l)} + \overline{c}^{(l)}, \tag{11}$$

where D̲^(l) and D̄^(l) are two diagonal matrices representing the “weights” of the relaxed ReLU layer and c̄^(l) is the bias term of the upper bound. Other general activation functions can be supported similarly. In the following we focus on conceptually presenting the algorithm, while more details of each term can be found in the Appendix.
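A sketch of (10)-(11) with the adaptive lower-bound slope (variable names are ours):

```python
import torch

def relu_relaxation(lb, ub):
    """Diagonal 'weights' and upper-bound bias of the linearized ReLU.

    Always-active neurons (lb >= 0) are the identity; always-inactive
    ones (ub <= 0) are zero; unstable neurons get the relaxation (10).
    """
    slope_u = ub.clamp(min=0) / (ub - lb + 1e-12)  # upper slope ub/(ub - lb)
    one, zero = torch.ones_like(ub), torch.zeros_like(ub)
    d_upper = torch.where(lb >= 0, one, torch.where(ub <= 0, zero, slope_u))
    # the upper line passes through (lb, 0): its intercept is -slope * lb
    c_upper = torch.where((lb < 0) & (ub > 0), -slope_u * lb, zero)
    # adaptive alpha of zhang2018crown: 1 if ub >= |lb|, else 0
    alpha = (ub >= lb.abs()).float()
    d_lower = torch.where(lb >= 0, one, torch.where(ub <= 0, zero, alpha))
    return d_lower, d_upper, c_upper
```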

Backward Bound Propagation in CROWN-IBP.

Unlike IBP, CROWN-style bounds start bounding from the last layer, so we refer to this as backward bound propagation (not to be confused with the back-propagation algorithm used to obtain gradients). Suppose we want to obtain the lower bound m̲(x, ε) = z̲^(L) (we assume the specification matrix C has been merged into W^(L)). The input to layer W^(L) is σ(z^(L−1)), which can be bounded linearly by Eq. (11). CROWN-style bounds choose the lower bound of σ(z^(L−1)) (the LHS of (11)) when the corresponding element of W^(L) is positive, and choose the upper bound otherwise. We then merge W^(L) and the linearized ReLU layer together and define:

$$A^{(L-1)} = W^{(L)} D^{(L-1)}, \tag{12}$$

where D^(L−1) takes its entries from D̲^(L−1) or D̄^(L−1), selected according to the signs of the entries of W^(L). Now we have a lower bound A^(L−1) z^(L−1) + b̂^(L−1) ≤ z^(L), where b̂^(L−1) collects all terms not related to z^(L−1). Note that the diagonal matrix D^(L−1) implicitly depends on W^(L). Then, we merge A^(L−1) with the next linear layer, which is straightforward by plugging in z^(L−1) = W^(L−1) h^(L−2) + b^(L−1).

We then continue to unfold the next ReLU layer using its linear relaxations, and compute a new matrix A^(L−2) in a similar manner as in (12). Along with the bound propagation process, we need to compute a series of matrices A^(L−1), …, A^(0), where A^(l) = A^(l+1) W^(l+1) D^(l) and A^(0) = A^(1) W^(1). At this point, we have merged all layers of the network into a single linear layer: A^(0) x + b ≤ z^(L), where b collects all terms not related to x. A lower bound for z^(L) with x ∈ S(x_k, ε) can then easily be given as

$$\big[\underline{m}(x_k, \epsilon)\big]_i = A^{(0)}_{i,:}\, x_k \;-\; \epsilon\, \big\|A^{(0)}_{i,:}\big\|_1 \;+\; b_i. \tag{13}$$

For ReLU networks, convex adversarial polytope (wong2018provable) uses a very similar bound propagation procedure. CROWN-style bounds allow an adaptive selection of α_k in (10), and thus often give better bounds (e.g., see Table 1). We give details on each term in Appendix L.
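Once the whole network has been collapsed into a single linear layer, the concretization step (13) over the ℓ∞ ball has a simple closed form (a sketch; batching is omitted):

```python
import torch

def concretize_linf(A0, b, x0, eps):
    """Lower bound of A0 @ x + b over ||x - x0||_inf <= eps, Eq. (13).

    The worst case of a linear function over an l_inf ball is given by
    the dual (l_1) norm of each row of A0.
    """
    # A0: (num_margins, input_dim), b: (num_margins,), x0: (input_dim,)
    return A0 @ x0 - eps * A0.abs().sum(dim=1) + b
```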

Computational Cost.

Ordinary CROWN (zhang2018crown) and convex adversarial polytope (wong2018provable) use (13) to compute every intermediate layer's bounds z̄^(l) and z̲^(l) (l < L), by considering each z^(l) as the final layer of the network. For each layer l, we need a different set of A matrices, denoted A^(l,k) (k < l). This causes three computational issues:


  • Unlike the last layer z^(L), an intermediate layer z^(l) typically has a much larger output dimension n_l ≫ n_L, thus all A^(l,k) have large dimensions n_l × n_k.

  • Computation of all A^(l,k) matrices is expensive. Suppose the network has n neurons for all intermediate and input layers and n_L neurons for the output layer (assuming n ≫ n_L); the time complexity of ordinary CROWN or convex adversarial polytope is O(L²n³). An ordinary forward propagation only takes O(Ln²) time per example, thus ordinary CROWN does not scale up to large networks for training, due to its quadratic dependency on L and an extra factor of n overhead.

  • When both W^(l) and W^(k) represent convolutional layers with small kernel tensors K^(l) and K^(k), there are no efficient GPU operations to form the matrix A^(l,k) using K^(l) and K^(k). Existing implementations either unfold at least one of the convolutional kernels into a fully connected weight matrix, or use sparse matrices to represent A^(l,k); both suffer from poor hardware efficiency on GPUs.

In CROWN-IBP, we use IBP to obtain the bounds of intermediate layers, which takes only twice the regular forward propagation time (O(Ln²)), thus we do not have the first and second issues. The time complexity of the backward bound propagation in CROWN-IBP is O(Ln²n_L), only n_L times slower than forward propagation and significantly more scalable than ordinary CROWN (which is Ln times slower than forward propagation, where typically n ≫ n_L). The third convolution issue is also not a concern, since we start from the last specification layer, which is a small fully connected layer. Suppose we need to compute A^(l−1) = A^(l) W^(l) and W^(l) is a convolutional layer with kernel K^(l); we can efficiently compute A^(l−1) on GPUs using the transposed convolution operator with kernel K^(l), without unfolding any convolutional layers. Conceptually, the backward pass of CROWN-IBP propagates a small specification matrix backwards, replacing affine layers with their transposed operators and activation function layers with a diagonal matrix product. This allows efficient implementation and better scalability.
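The convolution trick can be sketched in one call (a sketch assuming unit groups and dilation and that no output_padding is needed; each specification row is reshaped to the geometry of the layer's output):

```python
import torch
import torch.nn.functional as F

def backprop_spec_through_conv(A, conv):
    """Propagate the bound matrix A backward through a conv layer.

    A has shape (num_specs, C_out, H_out, W_out), i.e. each specification
    row is shaped like the conv output. The product A W is computed with
    the transposed convolution, so the kernel is never unfolded into a
    dense matrix. The conv bias is excluded here: bias terms are
    accumulated separately into b.
    """
    return F.conv_transpose2d(A, conv.weight,
                              stride=conv.stride, padding=conv.padding)
```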

Benefits of CROWN-IBP.

Tightness, efficiency and flexibility are unique benefits of CROWN-IBP:


  • CROWN-IBP is based on CROWN, a tight linear relaxation based lower bound, which can greatly improve the quality of the bounds obtained by IBP to guide verifiable training and improve stability;

  • CROWN-IBP avoids the high computational cost of convex relaxation based methods: the time complexity is reduced from O(L²n³) to O(Ln²n_L), well suited to problems where the output size n_L is much smaller than the input and intermediate layers' sizes; also, there is no quadratic dependency on L. Thus, CROWN-IBP is efficient on relatively large networks;

  • The objective (9) is strictly more general than IBP and allows the flexibility to exploit the strengths of both IBP (good for large ε) and convex relaxation based methods (good for small ε). We can slowly decrease β to 0 during training to avoid the over-regularization problem, while keeping the initial phase of IBP training more stable by providing a much tighter bound; we can also keep β = 1, which helps to outperform convex relaxation based methods in the small ε regime (e.g., ε = 2/255 on CIFAR-10).

4 Experiments

Models and training schedules.

We evaluate CROWN-IBP on three models that are similar to the models used in (gowal2018effectiveness), on the MNIST and CIFAR-10 datasets with different ℓ∞ perturbation norms. We denote the small, medium and large models in gowal2018effectiveness as DM-small, DM-medium and DM-large. During training, we first warm up (regular training without robust loss) for a fixed number of epochs and then increase ε from 0 to ε_train using a ramp-up schedule. Similar techniques are also used in many other works (wong2018scaling; wang2018mixtrain; gowal2018effectiveness). For both IBP and CROWN-IBP, a natural cross-entropy (CE) loss with weight κ (as in Eq (9)) may be added, and κ is scheduled to linearly decrease from κ_start to κ_end within the ramp-up epochs. gowal2018effectiveness used κ_start = 1 and κ_end = 0.5. To understand the trade-off between verified accuracy and standard (clean) accuracy, we explore two more settings: κ_start = κ_end = 0 (without natural CE loss) and κ_start = 1, κ_end = 0. For β, a linear schedule during the ramp-up period is used, and we always set β_start = 1 and β_end = 0, except that we set β_end = 1 for CIFAR-10 at ε = 2/255. Detailed model structures and hyperparameters are in Appendix C. Our training code for IBP and CROWN-IBP and pre-trained models are publicly available.³
³ TensorFlow implementation and pre-trained models: https://github.com/deepmind/interval-bound-propagation/
PyTorch implementation and pre-trained models: https://github.com/huanzhang12/CROWN-IBP

Metrics.

Verified error is the percentage of test examples where at least one element in the lower bound m̲(x, ε) is < 0. It is a guaranteed upper bound of the test error under any ℓ∞ perturbation with norm at most ε. We obtain m̲(x, ε) using IBP or CROWN-IBP (Eq. 13). We also report standard (clean) errors and errors under a 200-step PGD attack. PGD errors are lower bounds of test errors under ℓ∞ perturbations.
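A sketch of the verified-error computation from the margin lower bounds (batch conventions are ours):

```python
import torch

def verified_error(margin_lb, y):
    """Fraction of examples that are not provably robust.

    margin_lb: (batch, classes) lower bounds of C f(x). The entry at the
    true class y is 0 by construction, so it is masked out before the
    positivity check.
    """
    mask = torch.ones_like(margin_lb, dtype=torch.bool)
    mask.scatter_(1, y.unsqueeze(1), False)            # drop the true class
    margins = margin_lb[mask].view(margin_lb.size(0), -1)
    robust = (margins > 0).all(dim=1)                  # provably robust examples
    return 1.0 - robust.float().mean().item()
```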

Comparison to IBP.

Table 2 presents the standard, verified and PGD errors under different ε for each dataset and different κ settings. We test CROWN-IBP on the same model structures as in Table 1 of gowal2018effectiveness. These three models' architectures are presented in Table A in the Appendix. Here we only report results for the DM-large model structure, as it performs best under all settings; small and medium models are deferred to Table C in the Appendix. When κ_start = κ_end = 0, no natural CE loss is added and the model focuses on minimizing verified error, but the lack of natural CE loss may lead to unstable training, especially for IBP; the κ_start = 1, κ_end = 0.5 setting emphasizes minimizing standard error, usually at the cost of slightly higher verified error rates; κ_start = 1, κ_end = 0 typically achieves the best balance. We can observe that under the same settings, CROWN-IBP outperforms IBP in both standard error and verified error. The benefits of CROWN-IBP are especially significant when the model is large and ε is large. We highlight that CROWN-IBP reduces the verified error rate obtained by IBP from 8.21% to 7.02% on MNIST at ε = 0.3 and from 55.88% to 46.03% on CIFAR-10 at ε = 2/255 (it is the first time that an IBP based method outperforms results from (wong2018scaling), and our model also has better standard error). We also note that we are the first to obtain a non-trivial verified error bound on CIFAR-10 at ε = 16/255.

Figure 2: Standard and verified errors of IBP and CROWN-IBP models trained with different κ_start and κ_end values.

Trade-off Between Standard Accuracy and Verified Accuracy.

To show the trade-off between standard and verified accuracy, we evaluate the DM-large CIFAR-10 model at ε = 8/255 under different κ settings. For each κ_start, we uniformly choose 11 κ_end values while keeping all other hyperparameters unchanged. A larger κ_start or κ_end tends to produce better standard errors, so we can explicitly control the trade-off between standard accuracy and verified accuracy. In Figure 2 we plot the standard and verified errors of IBP and CROWN-IBP models trained with these different κ settings. Each cluster in the figure has 11 points, representing 11 different κ_end values. Models with lower verified errors tend to have higher standard errors. However, CROWN-IBP clearly outperforms IBP, improving both standard and verified accuracy and pushing the Pareto front towards the lower left corner, indicating overall better performance. To reach the same verified error of 70%, CROWN-IBP can reduce the standard error from roughly 55% to 45%.

Dataset, ε (ℓ∞)     Method       κ_start κ_end   Standard  Verified  PGD      Best in literature: source (standard / verified)
MNIST  ε=0.1        IBP          0    0          1.13      2.89      2.24     gowal2018effectiveness (– / –*)
                    IBP          1    0.5        1.08      2.75      2.02     dvijotham2018training (1.2 / 4.44)
                    IBP          1    0          1.14      2.81      2.11     xiao2018training (– / –)
                    CROWN-IBP    0    0          1.17      2.36      1.91     wong2018scaling (1.08 / 3.67)
                    CROWN-IBP    1    0.5        0.95      2.38      1.77     mirman2018differentiable (1.0 / 3.4)
                    CROWN-IBP    1    0          1.17      2.24      1.81
MNIST  ε=0.2        IBP          0    0          3.45      6.46      6.00     gowal2018effectiveness (– / –*)
                    IBP          1    0.5        2.12      4.75      4.24     xiao2018training (– / –)
                    IBP          1    0          2.74      5.46      4.89
                    CROWN-IBP    0    0          2.84      5.15      4.90
                    CROWN-IBP    1    0.5        1.82      4.13      3.81
                    CROWN-IBP    1    0          2.17      4.31      3.99
MNIST  ε=0.3        IBP          0    0          3.45      9.76      8.42     gowal2018effectiveness (1.66 / 8.21*)
                    IBP          1    0.5        2.12      8.47      6.78     wong2018scaling (14.87 / 43.1)
                    IBP          1    0          2.74      8.73      7.37     xiao2018training (2.67 / 19.32)
                    CROWN-IBP    0    0          2.84      7.65      6.90
                    CROWN-IBP    1    0.5        1.82      7.02      6.05
                    CROWN-IBP    1    0          2.17      7.03      6.12
MNIST  ε=0.4        IBP          0    0          3.45      16.19     12.73    gowal2018effectiveness (1.66 / 15.01*)
                    IBP          1    0.5        2.12      15.37     11.05
                    IBP          1    0          2.74      14.80     11.14
                    CROWN-IBP    0    0          2.84      12.74     10.39
                    CROWN-IBP    1    0.5        1.82      12.59     9.58
                    CROWN-IBP    1    0          2.17      12.06     9.47
CIFAR-10 ε=2/255‡   IBP          0    0          38.54     55.21     49.72    gowal2018effectiveness (29.84 / 55.88*)
                    IBP          1    0.5        33.77     58.48     50.54    mirman2018differentiable (38.0 / 47.8)
                    IBP          1    0          39.22     55.19     50.40    wong2018scaling (31.72 / 46.11)
                    CROWN-IBP§   0    0          28.48     46.03     40.28    xiao2018training (38.88 / 54.07)
                    CROWN-IBP§   1    0.5        26.19     50.53     40.24
                    CROWN-IBP§   1    0          28.91     46.43     40.27
CIFAR-10 ε=8/255‡   IBP          0    0          59.41     71.22     68.96    gowal2018effectiveness (50.51 / (68.44)†)
                    IBP          1    0.5        49.01     72.68     68.14    dvijotham2018training (51.36 / 73.33)
                    IBP          1    0          58.43     70.81     68.73    xiao2018training (59.55 / 79.73)
                    CROWN-IBP    0    0          54.02     66.94     65.42    wong2018scaling (71.33 / 78.22)
                    CROWN-IBP    1    0.5        45.47     69.55     65.74    mirman2019provable (59.8 / 76.8)
                    CROWN-IBP    1    0          55.27     67.76     65.71
CIFAR-10 ε=16/255‡  IBP          0    0          68.97     78.12     76.66    None; our best verified test error (76.80%) and
                    IBP          1    0.5        59.46     80.85     76.97    standard test error (66.06%) are both better than
                    IBP          1    0          68.88     78.91     76.95    wong2018scaling at ε = 8/255, despite our ε being
                    CROWN-IBP    0    0          67.17     77.27     75.76    twice as large.
                    CROWN-IBP    1    0.5        56.73     78.20     74.87
                    CROWN-IBP    1    0          66.06     76.80     75.23

  * Verified errors reported in Table 4 of gowal2018effectiveness are evaluated using mixed integer programming (MIP) and linear programming (LP), which produce strictly smaller verified errors than IBP but are computationally expensive. For a fair comparison, we use the IBP verified errors reported in their Table 3.

  † According to direct communication with the authors of gowal2018effectiveness, achieving the 68.44% IBP verified error requires adding an extra PGD adversarial training loss. Without PGD, the verified error is 72.91% (LP/MIP verified) or 73.52% (IBP verified). Our result should be compared to 73.52%.

  ‡ Although not explicitly mentioned, the CIFAR-10 models in (gowal2018effectiveness) are trained using ε_train = 1.1ε. We thus follow their settings.

  § We use β_end = 1 for this setting, and thus the CROWN-IBP bound (m̲_CROWN-IBP) is used to evaluate the verified error.

Table 2: The verified, standard (clean) and PGD attack errors for models trained using IBP and CROWN-IBP on MNIST and CIFAR-10. We only present the performance of the DM-large model here due to limited space (see Table C for a full comparison). CROWN-IBP outperforms IBP under all settings, and achieves state-of-the-art performance on both MNIST and CIFAR-10 for all ε.

Training Stability.

To discourage hand-tuning on a small set of models and to demonstrate the stability of CROWN-IBP over a broader range of models, we evaluate IBP and CROWN-IBP on a variety of small and medium sized model architectures (18 for MNIST and 17 for CIFAR-10), detailed in Appendix D. To evaluate training stability, we compare verified errors under different ramp-up schedule lengths and different κ settings. Instead of reporting just the best model, we compare the best, worst and median verified errors over all models. Our results are presented in Figure 3: panel (a) is for MNIST, and panels (b) and (c) are for CIFAR-10. We can observe that CROWN-IBP achieves better performance consistently under different schedule lengths. In addition, IBP with κ = 0 cannot stably converge on all models when the schedule is short; under other settings, CROWN-IBP always performs better. We conduct additional training stability experiments on the MNIST and CIFAR-10 datasets under other model and ε settings, and the observations are similar (see Appendix H).

Figure 3: Verified error vs. schedule length on 8 medium MNIST models (panel a) and 8 medium CIFAR-10 models (panels b and c). The solid bars show median values of verified errors over all models; the upper and lower ends of an error bar are the worst and best verified errors, respectively. For each schedule length, three color groups represent three different κ settings.

5 Conclusions

We propose a new certified defense method, CROWN-IBP, which combines the fast interval bound propagation (IBP) bound with a tight linear relaxation based bound, CROWN. Our method enjoys the high computational efficiency provided by IBP while leveraging the tight CROWN bound to stabilize training under the robust optimization framework, and it provides the flexibility to trade off between the two. Our experiments show that CROWN-IBP consistently outperforms IBP baselines in both standard errors and verified errors, and achieves state-of-the-art verified test errors for ℓ∞ robustness.


Appendix A IBP as a Simple Augmented Network

Despite achieving great success, it is still an open question why IBP based methods significantly outperform convex relaxation based methods, despite the fact that convex relaxations usually provide significantly tighter bounds. We conjecture that IBP performs better because the bound propagation process can be viewed as a ReLU network with the same depth as the original network, and the IBP training process is effectively training this equivalent network for standard accuracy, as explained below.

Given a fixed neural network (NN) f(x), IBP gives a very loose estimation of the output range of f(x). However, during training, since the weights of this NN can be updated, we can equivalently view IBP as training an augmented neural network, which we denote as an IBP-NN (Figure A). Unlike a usual network which takes an input x_k with label y_k, IBP-NN takes two points x_L = x_k − ε and x_U = x_k + ε as inputs (where x_L ≤ x ≤ x_U, element-wise). The bound propagation process can be equivalently seen as forward propagation in a specially structured neural network, as shown in Figure A. After the last specification layer C (typically merged into W^(L)), we can obtain m̲(x_k, ε). Then, −m̲(x_k, ε) is sent to a softmax layer for prediction. Importantly, since [m̲(x_k, ε)]_{y_k} = 0 (as the y_k-th row in C is always 0), the top-1 prediction of the augmented IBP network is y_k if and only if all other elements of m̲(x_k, ε) are positive, i.e., the original network will predict correctly for all x_L ≤ x ≤ x_U. When we train the augmented IBP network with ordinary cross-entropy loss and desire it to predict correctly on an input x_k, we are implicitly doing robust optimization (Eq. (2)).

Figure A: Interval Bound Propagation viewed as training an augmented neural network (IBP-NN). The inputs of IBP-NN are two images, x_k + ε and x_k − ε. The output of IBP-NN is a vector of lower bounds of margins (denoted as m̲) between the ground-truth class and all classes (including the ground-truth class itself) for all x_L ≤ x ≤ x_U. This vector is negated and sent into a regular softmax function to get the model prediction. The top-1 prediction of the softmax is correct if and only if all margins between the ground-truth class and the other classes (except the ground-truth class itself) are positive, i.e., the model is verifiably robust. Thus, an IBP-NN with low standard error guarantees low verified error on the original network.

The simplicity of IBP-NN may help a gradient based optimizer find better solutions. On the other hand, while the computation of convex relaxation based bounds can also be cast as an equivalent network (e.g., the “dual network” in wong2018provable), its construction is significantly more complex, and sometimes requires non-differentiable indicator functions (the sets I⁻, I⁺ and I in wong2018provable). As a consequence, it can be challenging for the optimizer to find a good solution, and the optimizer tends to make the bounds tighter naively, by reducing the norms of the weight matrices and over-regularizing the network, as demonstrated in Figure 1.

Appendix B Tightness comparison between IBP and CROWN-IBP

Both IBP and CROWN-IBP produce lower bounds m̲(x, ε) of the margin, and a larger lower bound has better quality. To measure the relative tightness of the two bounds, we average the difference m̲_CROWN-IBP(x, ε) − m̲_IBP(x, ε) over all margin entries and over all training examples. A positive value indicates that CROWN-IBP is tighter than IBP. In Figure B we plot this averaged bound difference during the ε schedule for one MNIST model and one CIFAR-10 model. We can observe that during the early phase of training, when the ε schedule just starts, CROWN-IBP produces significantly better bounds than IBP. A tighter lower bound m̲(x, ε) gives a tighter upper bound for the inner max, making the minimax optimization problem (2) more effective to solve. As the training schedule proceeds, the model gradually learns how to make IBP bounds tighter, and eventually the difference between the two bounds becomes close to 0.

Figure B: Bound differences between IBP and CROWN-IBP for DM-large models during training. The bound difference is only computed during the ε schedule (epochs 10 to 60 for MNIST, and 320 to 1920 for CIFAR-10), as we do not compute CROWN-IBP bounds during the warm-up period or after the schedule ends.

Why does CROWN-IBP stabilize IBP training?

For a randomly initialized network or a naturally trained network, IBP bounds are very loose. But as Table 1 shows, a network trained using IBP can eventually obtain quite tight IBP bounds and high verified accuracy: the network can adapt to IBP bounds and learn a specific set of weights that make IBP tight while also correctly classifying examples. However, since training has to start from weights that produce loose IBP bounds, the beginning phase of IBP training can be challenging and is vitally important.

We observe that IBP training can have a large performance variance across models and initializations. IBP is also more sensitive to hyperparameters such as κ or the ε schedule length; in Figure 3, many IBP models converge sub-optimally (large worst/median verified errors). The reason for this instability is that during the beginning phase of training, the loose bounds produced by IBP make the robust loss in (9) ineffective, and it is challenging for the optimizer to reduce this loss and find a set of good weights that produce tight IBP verified bounds in the end.

Conversely, if our bounds are much tighter at the beginning, the robust loss (9) always remains in a reasonable range during training, and the network can gradually learn to find a good set of weights that make the IBP bounds increasingly tighter (this is obvious in Figure B). Initially, tighter bounds can be provided by a convex relaxation based method like CROWN, and they are gradually replaced by IBP bounds (via the β schedule decreasing from 1 to 0), eventually leading to a model with learned tight IBP bounds.

Appendix C Models and Hyperparameters for comparison to IBP

The goal of these experiments is to reproduce the performance reported in (gowal2018effectiveness) and demonstrate the advantage of CROWN-IBP under the same experimental settings. Specifically, to reproduce the IBP results, for CIFAR-10 we train using a large batch size and long training schedule on TPUs (we can also replicate these results on multi-GPUs using a reasonable amount of training time; see Section F). Also, for this set of experiments we use the same code base as in gowal2018effectiveness. For model performance on a comprehensive set of small and medium sized models trained on a single GPU, please see Table D in Section F, as well as the training stability experiments in Section 4 and Section H.

The models structures (DM-small, DM-medium and DM-large) used in Table C and Table 2 are listed in Table A. These three model structures are the same as in gowal2018effectiveness. Training hyperparameters are detailed below:

  • For the MNIST IBP baseline results, we follow exactly the same set of hyperparameters as in (gowal2018effectiveness). We train 100 epochs (60K steps) with a batch size of 100, and use warm-up and ramp-up durations of 2K and 10K steps. The learning rate for the Adam optimizer is decayed by 10X at steps 15K and 25K. Our IBP results match their reported numbers. Note that we always use IBP verified errors rather than MIP verified errors. We use the same schedule for CROWN-IBP in Table C and Table 2. For ε_train = 0.4, this schedule obtains verified error rates of 4.22%, 7.01% and 12.84% at ε = 0.2, 0.3 and 0.4 using the DM-Large model, respectively.

  • For MNIST CROWN-IBP at ε_train = 0.4 in Table C and Table 2, we train 200 epochs with a batch size of 256, using the Adam optimizer. We warm up with 10 epochs of regular training, and gradually ramp up ε from 0 to ε_train over 50 epochs. We reduce the learning rate by 10X at epochs 130 and 190. Using this schedule, IBP's performance becomes worse (by about 1-2% in all settings), but the schedule improves the verified error for CROWN-IBP at ε = 0.4 from 12.84% to 12.06% and does not noticeably affect verified errors at other ε levels.

  • For CIFAR-10, we follow the setting in gowal2018effectiveness and train 3200 epochs on 32 TPU cores, with a batch size of 1024. We warm up for 320 epochs, and ramp up ε over 1600 epochs. The learning rate is reduced by 10X at epochs 2600 and 3040. We use random horizontal flips and random crops as data augmentation, and normalize images according to per-channel statistics. Note that this schedule is slightly different from the schedule used in (gowal2018effectiveness): we use a smaller batch size due to TPU memory constraints (we used TPUv2, which has half the memory capacity of the TPUv3 used in (gowal2018effectiveness)), and we also decay the learning rate later. We found that this schedule improves both the IBP baseline performance and CROWN-IBP performance by around 1%; for example, at ε = 8/255, the improved schedule reduces the verified error from 73.52% to 72.68% for the IBP baseline (κ_start = 1, κ_end = 0.5) using the DM-Large model.

Hyperparameters κ and β.

We use a linear schedule for both hyperparameters during the ramp-up period, decreasing κ from κ_start to κ_end while decreasing β from β_start to β_end. The schedule length is set to the same length as the ε schedule.
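A sketch of such a linear schedule (the function name and per-step granularity are our assumptions):

```python
def linear_schedule(step, start_step, end_step, start_value, end_value):
    """Linearly interpolate a hyperparameter (eps, kappa, or beta) over
    the ramp-up period; constant before and after."""
    if step <= start_step:
        return start_value
    if step >= end_step:
        return end_value
    frac = (step - start_step) / (end_step - start_step)
    return start_value + frac * (end_value - start_value)

# Example: eps ramps up while kappa and beta ramp down over the same period.
# eps   = linear_schedule(t, warmup, warmup + ramp, 0.0, eps_train)
# kappa = linear_schedule(t, warmup, warmup + ramp, 1.0, 0.5)
# beta  = linear_schedule(t, warmup, warmup + ramp, 1.0, 0.0)
```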

In both IBP and CROWN-IBP, the hyperparameter κ is used to trade off between clean accuracy and verified accuracy. Figure 2 shows that κ_end can significantly affect this trade-off, while κ_start has minor impact compared to κ_end. In general, we recommend κ_start = 1 and κ_end = 0 as a safe starting point, and we can adjust κ_end to a larger value if better standard accuracy is desired. The κ_start = κ_end = 0 setting (pure minimax optimization) can be challenging for IBP, as there is no natural loss as a stabilizer; under this setting CROWN-IBP usually produces a model with good (sometimes the best) verified accuracy but noticeably worse standard accuracy (on CIFAR-10 the difference can be as large as 10%), so this setting is only recommended when a model with the best possible verified accuracy is desired at the cost of noticeably reduced standard accuracy.

Compared to IBP, CROWN-IBP adds one additional hyperparameter, β. β has a clear meaning: balancing between the convex relaxation based bound and the IBP bound. β_start is always set to 1, as we want CROWN-IBP to provide tighter bounds to stabilize the early phase of training when IBP bounds are very loose; β_end determines whether we use a convex relaxation based bound (β_end = 1) or an IBP based bound (β_end = 0) after the ε schedule. Thus, we set β_end = 1 for the case where a convex relaxation based method (wong2018scaling) can outperform IBP (e.g., CIFAR-10 ε = 2/255), and β_end = 0 for the cases where IBP outperforms convex relaxation based bounds. We do not tune or grid-search this hyperparameter.

DM-Small        DM-Medium       DM-Large
Conv 16 4×4+2   Conv 32 3×3+1   Conv 64 3×3+1
Conv 32 4×4+1   Conv 32 4×4+2   Conv 64 3×3+1
FC 100          Conv 64 3×3+1   Conv 128 3×3+2
                Conv 64 4×4+2   Conv 128 3×3+1
                FC 512          Conv 128 3×3+1
                FC 512          FC 512
Table A: Model structures from gowal2018effectiveness. “Conv k w×w+s” represents a 2D convolutional layer with k filters of size w×w using a stride of s in both dimensions. “FC n” represents a fully connected layer with n outputs. The last fully connected layer is omitted. All networks use ReLU activation functions.

Appendix D Hyperparameters and Model Structures for Training Stability Experiments

In all our training stability experiments, we use a large number of relatively small models and train them on a single GPU. These small models cannot achieve state-of-the-art performance but they can be trained quickly and cheaply, allowing us to explore training stability over a variety of settings, and report min, median and max statistics. We use the following hyperparameters:

  • For MNIST, we train 100 epochs with batch size 256, using the Adam optimizer. The first epoch is standard training for warming up. We gradually increase ε linearly per batch during training, with a ramp-up schedule length of 60 epochs. We reduce the learning rate by 50% every 10 epochs after the ε schedule ends. No data augmentation technique is used and the whole 28×28 images are used (normalized to the 0-1 range).

  • For CIFAR, we train 200 epochs with batch size 128. We use the Adam optimizer and a learning rate of 0.1%. The first 10 epochs are standard training for warming up. We gradually increase ε linearly per batch during training, with a ramp-up schedule length of 120 epochs. We reduce the learning rate by 50% every 10 epochs after the ε schedule ends. We use random horizontal flips and random crops as data augmentation. The three channels are normalized with mean (0.4914, 0.4822, 0.4465) and standard deviation (0.2023, 0.1994, 0.2010). These numbers are per-channel statistics from the training set used in (gowal2018effectiveness).

All verified error numbers are evaluated on the test set using IBP, since the networks are trained using IBP (β = 0 after ε reaches the target ε_train), except for CIFAR ε = 2/255, where we set β_end = 1 and thus use the CROWN-IBP bound to compute the verified error.

Table B gives the 18 model structures used in our training stability experiments. These model structures are designed by us and are not used in gowal2018effectiveness. Most CIFAR-10 models share the same structures as the MNIST models (unless noted in the table), except that their input dimensions are different. Model A is too small for CIFAR-10, thus we remove it in the CIFAR-10 experiments. Models A-J are the “small models” reported in Figure 3. Models K-T are the “medium models” reported in Figure 3. For the results in Table 1, we use a small model (model structure B) for all three datasets. These MNIST and CIFAR-10 models can each be trained on a single NVIDIA RTX 2080 Ti GPU within a few hours.


Name                Model structure (the last FC 10 layer is omitted in all models)
A (MNIST only)      Conv 4 +2, Conv 8 +2, FC 128
B                   Conv 8 +2, Conv 16 +2, FC 256
C                   Conv 4 +1, Conv 8 +1, Conv 8 +4, FC 64
D                   Conv 8 +1, Conv 16 +1, Conv 16 +4, FC 128
E                   Conv 4 +1, Conv 8 +1, Conv 8 +4, FC 64
F                   Conv 8 +1, Conv 16 +1, Conv 16 +4, FC 128
G                   Conv 4 +1, Conv 4 +2, Conv 8 +1, Conv 8 +2, FC 256, FC 256
H                   Conv 8 +1, Conv 8 +2, Conv 16 +1, Conv 16 +2, FC 256, FC 256
I                   Conv 4 +1, Conv 4 +2, Conv 8 +1, Conv 8 +2, FC 512, FC 512
J                   Conv 8 +1, Conv 8 +2, Conv 16 +1, Conv 16 +2, FC 512, FC 512
K                   Conv 16 +1, Conv 16 +2, Conv 32 +1, Conv 32 +2, FC 256, FC 256
L                   Conv 16 +1, Conv 16 +2, Conv 32 +1, Conv 32 +2, FC 512, FC 512
M                   Conv 32 +1, Conv 32 +2, Conv 64 +1, Conv 64 +2, FC 512, FC 512
N                   Conv 64 +1, Conv 64 +2, Conv 128 +1, Conv 128 +2, FC 512, FC 512
O (MNIST only)      Conv 64 +1, Conv 128 +1, Conv 128 +4, FC 512
P (MNIST only)      Conv 32 +1, Conv 64 +1, Conv 64 +4, FC 512
Q                   Conv 16 +1, Conv 32 +1, Conv 32 +4, FC 512
R                   Conv 32 +1, Conv 64 +1, Conv 64 +4, FC 512
S (CIFAR-10 only)   Conv 32 +2, Conv 64 +2, FC 128
T (CIFAR-10 only)   Conv 64 +2, Conv 128 +2, FC 256

Table B: Model structures used in our training stability experiments. We use ReLU activations for all models. We omit the last fully connected layer, as its output dimension is always 10. In the table, “Conv k +s” represents a 2D convolutional layer with k filters and a stride of s. Models A-J are referred to as “small models” and models K-T are referred to as “medium models”.

Appendix E Omitted Results on DM-Small and DM-Medium Models

In Table 2 we report results from the best DM-Large model. Table C presents the verified, standard (clean) and PGD attack errors for all three model structures used in (gowal2018effectiveness) (DM-Small, DM-Medium and DM-Large), trained on the MNIST and CIFAR-10 datasets. We evaluate IBP and CROWN-IBP under the same three κ settings as in Table 2. We use the hyperparameters detailed in Section C to train these models. We can see that given any model structure and any κ setting, CROWN-IBP consistently outperforms IBP.

Dataset, ε (ℓ∞)     Method      κ_start κ_end   DM-Small (Std/Ver/PGD)   DM-Medium (Std/Ver/PGD)   DM-Large (Std/Ver/PGD)
MNIST  ε=0.1        IBP         0 0             1.92 / 4.16 / 3.88       1.53 / 3.26 / 2.82        1.13 / 2.89 / 2.24
                    IBP         1 0.5           1.68 / 3.60 / 3.34       1.46 / 3.20 / 2.57        1.08 / 2.75 / 2.02
                    IBP         1 0             2.14 / 4.24 / 3.94       1.48 / 3.21 / 2.77        1.14 / 2.81 / 2.11
                    CROWN-IBP   0 0             1.90 / 3.50 / 3.21       1.44 / 2.77 / 2.37        1.17 / 2.36 / 1.91
                    CROWN-IBP   1 0.5           1.60 / 3.51 / 3.19       1.14 / 2.64 / 2.23        0.95 / 2.38 / 1.77
                    CROWN-IBP   1 0             1.67 / 3.44 / 3.09       1.34 / 2.76 / 2.39        1.17 / 2.24 / 1.81
MNIST  ε=0.2        IBP         0 0             5.08 / 9.80 / 9.36       3.68 / 7.38 / 6.77        3.45 / 6.46 / 6.00
                    IBP         1 0.5           3.83 / 8.64 / 8.06       2.55 / 5.84 / 5.33        2.12 / 4.75 / 4.24
                    IBP         1 0             6.25 / 11.32 / 10.84     3.89 / 7.21 / 6.68        2.74 / 5.46 / 4.89
                    CROWN-IBP   0 0             3.78 / 6.61 / 6.40       3.84 / 6.65 / 6.42        2.84 / 5.15 / 4.90
                    CROWN-IBP   1 0.5           2.96 / 6.11 / 5.74       2.37 / 5.35 / 4.90        1.82 / 4.13 / 3.81
                    CROWN-IBP   1 0             3.55 / 6.29 / 6.13       3.16 / 5.82 / 5.44        2.17 / 4.31 / 3.99
MNIST  ε=0.3        IBP         0 0             5.08 / 14.42 / 13.30     3.68 / 10.97 / 9.66       3.45 / 9.76 / 8.42
                    IBP         1 0.5           3.83 / 13.99 / 12.25     2.55 / 9.51 / 7.87        2.12 / 8.47 / 6.78
                    IBP         1 0             6.25 / 16.51 / 15.07     3.89 / 10.4 / 9.17        2.74 / 8.73 / 7.37
                    CROWN-IBP   0 0             3.78 / 9.60 / 8.90       3.84 / 9.25 / 8.57        2.84 / 7.65 / 6.90
                    CROWN-IBP   1 0.5           2.96 / 9.44 / 8.26       2.37 / 8.54 / 7.74        1.82 / 7.02 / 6.05
                    CROWN-IBP   1 0             3.55 / 9.40 / 8.50       3.16 / 8.62 / 7.65        2.17 / 7.03 / 6.12
MNIST  ε=0.4        IBP         0 0             5.08 / 23.40 / 20.15     3.68 / 18.34 / 14.75      3.45 / 16.19 / 12.73
                    IBP         1 0.5           3.83 / 24.16 / 19.97     2.55 / 16.82 / 12.83      2.12 / 15.37 / 11.05
                    IBP         1 0             6.25 / 26.81 / 22.78     3.89 / 16.99 / 13.81      2.74 / 14.80 / 11.14
                    CROWN-IBP   0 0             3.78 / 15.21 / 13.34     3.84 / 14.58 / 12.69      2.84 / 12.74 / 10.39
                    CROWN-IBP   1 0.5           2.96 / 16.04 / 12.91     2.37 / 14.97 / 12.47      1.82 / 12.59 / 9.58
                    CROWN-IBP   1 0             3.55 / 15.55 / 13.11     3.16 / 14.19 / 11.31      2.17 / 12.06 / 9.47
CIFAR-10 ε=2/255    IBP         0 0             44.66 / 56.38 / 54.15    39.12 / 53.86 / 49.77     38.54 / 55.21 / 49.72
                    IBP         1 0.5           38.90 / 57.94 / 53.64    34.19 / 56.24 / 49.63     33.77 / 58.48 / 50.54
                    IBP         1 0             44.08 / 56.32 / 54.16    39.30 / 53.68 / 49.74     39.22 / 55.19 / 50.40
                    CROWN-IBP   0 0             39.43 / 53.93 / 49.16    32.78 / 49.57 / 44.22     28.48 / 46.03 / 40.28
                    CROWN-IBP   1 0.5           34.08 / 54.28 / 51.17    28.63 / 51.39 / 42.43     26.19 / 50.53 / 40.24
                    CROWN-IBP   1 0             38.15 / 52.57 / 50.35    33.17 / 49.82 / 44.64     28.91 / 46.43 / 40.27
CIFAR-10 ε=8/255    IBP         0 0             61.91 / 73.12 / 71.75    61.46 / 71.98 / 70.07     59.41 / 71.22 / 68.96
                    IBP         1 0.5           54.01 / 73.04 / 70.54    50.33 / 73.58 / 69.57     49.01 / 72.68 / 68.14
                    IBP         1 0             62.66 / 72.25 / 70.98    61.61 / 72.60 / 70.57     58.43 / 70.81 / 68.73
                    CROWN-IBP   0 0             59.94 / 70.76 / 69.65    59.17 / 69.00 / 67.60     54.02 / 66.94 / 65.42
                    CROWN-IBP   1 0.5           53.12 / 73.51 / 70.61    48.51 / 71.55 / 67.67     45.47 / 69.55 / 65.74
                    CROWN-IBP   1 0             60.84 / 72.47 / 71.18    58.19 / 68.94 / 67.72     55.27 / 67.76 / 65.71
CIFAR-10 ε=16/255   IBP         0 0             70.02 / 78.86 / 77.67    67.55 / 78.65 / 76.92     68.97 / 78.12 / 76.66
                    IBP         1 0.5           63.43 / 81.58 / 78.81    60.07 / 81.01 / 77.32     59.46 / 80.85 / 76.97
                    IBP         1 0             67.73 / 78.71 / 77.52    70.28 / 79.26 / 77.43     68.88 / 78.91 / 76.95
                    CROWN-IBP   0 0             67.42 / 78.41 / 76.86    68.06 / 77.92 / 76.89     67.17 / 77.27 / 75.76
                    CROWN-IBP   1 0.5           61.47 / 79.62 / 77.13    59.56 / 79.30 / 76.43     56.73 / 78.20 / 74.87
                    CROWN-IBP   1 0             68.75 / 78.71 / 77.91    67.94 / 78.46 / 77.21     66.06 / 76.80 / 75.23

Table C: Standard, verified and PGD errors (%) for the DM-Small, DM-Medium and DM-Large models trained with IBP and CROWN-IBP on MNIST and CIFAR-10.