CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks
Abstract
Verifying the robustness of neural network classifiers has attracted great interest and attention due to the success of deep neural networks and their unexpected vulnerability to adversarial perturbations. Although finding the minimum adversarial distortion of neural networks (with ReLU activations) has been shown to be an NP-complete problem, obtaining a non-trivial lower bound of the minimum distortion as a provable robustness guarantee is possible. However, most previous works focused only on simple fully-connected layers (multilayer perceptrons) and were limited to ReLU activations. This motivates us to propose a general and efficient framework, CNN-Cert, that is capable of certifying robustness on general convolutional neural networks. Our framework is general: it handles various architectures including convolutional layers, max-pooling layers, batch normalization layers and residual blocks, as well as general activation functions. Our approach is efficient: by exploiting the special structure of convolutional layers, we achieve up to 17 and 11 times speedup compared to the state-of-the-art certification algorithms (e.g. Fast-Lin, CROWN) and 366 times speedup compared to the dual-LP approach, while our algorithm obtains similar or even better verification bounds. In addition, CNN-Cert generalizes state-of-the-art algorithms such as Fast-Lin and CROWN. We demonstrate by extensive experiments that our method outperforms state-of-the-art lower-bound-based certification algorithms in terms of both bound quality and speed.
Akhilan Boopathy^{1}, Tsui-Wei Weng^{1}, Pin-Yu Chen^{2}, Sijia Liu^{2} and Luca Daniel^{1}
^{1} Massachusetts Institute of Technology, Cambridge, MA 02139
^{2} MIT-IBM Watson AI Lab, IBM Research
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Introduction
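Method | Non-trivial bound | Multi-layer | Scalability & Efficiency | Beyond ReLU | Exploit CNN structure | Pooling and other structures

Reluplex (Katz et al. 2017), Planet (Ehlers 2017) | ✓ | ✓
Global Lipschitz constant (Szegedy et al. 2013) | ✓ | ✓ | ✓ | ✓
Local Lipschitz constant (Hein and Andriushchenko 2017) | ✓ | ✓ | differentiable
SDP approach (Raghunathan, Steinhardt, and Liang 2018) | ✓ | ✓
Dual approach (Kolter and Wong 2018) | ✓ | ✓ | ✓
Dual approach (Dvijotham et al. 2018) | ✓ | ✓ | codes not yet released | ✓ | ✓
Fast-Lin / Fast-Lip (Weng et al. 2018a) | ✓ | ✓ | ✓
CROWN (Zhang et al. 2018) | ✓ | ✓ | ✓ | ✓
CNN-Cert (this work) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓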
Recently, studies on the adversarial robustness of state-of-the-art machine learning models, particularly neural networks (NNs), have received great attention due to interest in model explainability (?) and rapidly growing concerns about security implications (?). Taking object recognition as a motivating example, imperceptible adversarial perturbations of natural images can be easily crafted to manipulate the model predictions, known as prediction-evasive adversarial attacks. One widely-used threat model to quantify attack strength is the norm-ball bounded attack, where the distortion between an original example and the corresponding adversarial example is measured by the $\ell_p$ norm of their difference in real-valued vector representations (e.g., pixel values for images or embeddings for texts). Popular norm choices are $\ell_1$ (?), $\ell_2$ (?), and $\ell_\infty$ (?).
The methodology of evaluating model robustness against adversarial attacks can be divided into two categories: game-based or verification-based. Game-based approaches measure the success in mitigating adversarial attacks by mounting empirical validation against a (self-chosen) set of attacks. However, many defense methods have been shown to be broken or bypassed by attacks that are adaptive to these defenses under the same threat model (?; ?), and therefore their robustness claims may not extend to untested attacks. On the other hand, verification-based approaches provide certified defense against any possible attack under a threat model. In the case of an $\ell_p$ norm-ball bounded threat model, a verified robustness certificate $\epsilon$ means that the (top-1) model prediction on the input data cannot be altered if the attack strength (distortion measured by the $\ell_p$ norm) is smaller than $\epsilon$. Unlike game-based approaches, verification methods are attack-agnostic and hence can formally certify robustness guarantees, which is crucial to security-sensitive and safety-critical applications.
Although verification-based approaches can provide robustness certification, finding the minimum distortion (i.e., the maximum certifiable robustness) of NNs with ReLU activations has been shown to be an NP-complete problem (Katz et al. 2017). While the minimum distortion can be attained in small and shallow networks (?; ?; ?; ?), these approaches do not scale even to moderate-sized NNs. Recent works aim to circumvent the scalability issue by efficiently solving for a non-trivial lower bound on the minimum distortion (?; ?; ?). However, existing methods may lack generality in supporting different network architectures and activation functions. In addition, current methods often deal with convolutional layers by simply converting them back to fully-connected layers, which may lose efficiency if not fully optimized with respect to the NNs, as demonstrated in our experiments. To bridge this gap, we propose CNN-Cert, a general and efficient verification framework for certifying the robustness of a broad range of convolutional neural networks (CNNs). The generality of CNN-Cert enables robustness certification of various architectures, including convolutional layers, max-pooling layers, batch normalization layers and residual blocks, and general activation functions. The efficiency of CNN-Cert comes from exploiting the convolution operation. A full comparison of verification-based methods is given in Table 1.
We highlight the contributions of this paper as follows.

CNN-Cert is general: it can certify robustness on general CNNs with various building blocks, including convolutional/pooling/batch-norm layers and residual blocks, as well as general activation functions such as ReLU, tanh, sigmoid and arctan. Other variants can easily be incorporated. Moreover, the certification algorithms Fast-Lin (Weng et al. 2018a) and CROWN (Zhang et al. 2018) are special cases of CNN-Cert.

CNN-Cert is computationally efficient: its cost is similar to forward-propagation, in contrast to the NP-completeness of formal verification methods such as Reluplex (Katz et al. 2017). Extensive experiments show that CNN-Cert achieves up to 17 times speedup compared to the state-of-the-art certification algorithm Fast-Lin and up to 366 times speedup compared to dual-LP approaches, while CNN-Cert obtains similar or even better verification bounds.
Background and Related Work
Adversarial Attacks and Defenses.
In the white-box setting where the target model is entirely transparent to an adversary, recent works have demonstrated adversarial attacks on machine learning applications empowered by neural networks, including object recognition (?), image captioning (?), machine translation (?), and graph learning (?). Even worse, adversarial attacks are still plausible in the black-box setting, where the adversary is only allowed to access the model output but not the model internals (?; ?; ?; ?). For improving the robustness of NNs, adversarial training with adversarial attacks is by far one of the most effective strategies that showed strong empirical defense performance (?; ?). In addition, verification-based methods have validated that NNs with adversarial training can indeed improve robustness (?; ?).
Robustness Verification for Neural Networks.
Under the $\ell_p$ norm-ball bounded threat model, for NNs with ReLU activation functions, although the minimum adversarial distortion gives the best possible certified robustness, computing it is intractable because the problem is NP-complete (Katz et al. 2017). Alternatively, solving for a non-trivial lower bound of the minimum distortion as a provable robustness certificate is a more promising option, at the cost of obtaining a more conservative robustness certificate. Some analytical lower bounds depending solely on model weights can be derived (?; ?; ?; ?), but they are in general too loose to be useful or limited to 1 or 2 hidden layers. The robustness of NNs can be efficiently certified for ReLU activations (?; ?) and general activations (?), but mostly on models with fully-connected layers. The approach of (Dvijotham et al. 2018) can also be applied to different activation functions, but its bound quality might decrease considerably as a trade-off for computational efficiency due to its "anytime" property. This paper falls within this line of research, with the aim of providing both a general and efficient certification framework for CNNs (see Table 1 for detailed comparisons).
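Threat model, minimum adversarial distortion $\epsilon_{\min}$ and certified lower bound $\epsilon_{\text{cert}}$.
Throughout this paper, we consider the $\ell_p$ norm-ball bounded threat model with full access to all the model parameters. Given an input image $x_0$ and a neural network classifier $f(x)$, let $c = \arg\max_i f_i(x_0)$ be the class that $f$ predicts for $x_0$. The minimum distortion $\epsilon_{\min}$ is the size of the smallest perturbation $\delta$ that results in $\arg\max_i f_i(x_0 + \delta) \neq c$, i.e. $\epsilon_{\min} = \|\delta\|_p$. A certified lower bound $\epsilon_{\text{cert}}$ satisfies the following: (i) $\epsilon_{\text{cert}} < \epsilon_{\min}$ and (ii) for all $\delta \in \mathbb{R}^d$ with $\|\delta\|_p \le \epsilon_{\text{cert}}$, $\arg\max_i f_i(x_0 + \delta) = c$. In other words, a certified bound guarantees a region (an $\ell_p$ ball with radius $\epsilon_{\text{cert}}$) such that the classifier decision can never be altered for all possible perturbations in that region. Note that $\epsilon_{\text{cert}}$ is also known as untargeted robustness, and the targeted robustness $\epsilon_{\text{cert},t}$ is defined as satisfying (i) but with (ii) slightly modified as: for all $\delta \in \mathbb{R}^d$ with $\|\delta\|_p \le \epsilon_{\text{cert},t}$, $f_c(x_0 + \delta) > f_t(x_0 + \delta)$, given some targeted class $t \neq c$.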
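CNN-Cert: A General and Efficient Framework for Robustness Certification
Blocks | $A^r_U$ | $B^r_U$

(i) Act-Conv Block | $A^r_{U,(\vec{x},z),(\vec{i},k)} = W^{r+}_{(z),(\vec{i},k)}\,\alpha_{U,(\vec{x}+\vec{i},k)} + W^{r-}_{(z),(\vec{i},k)}\,\alpha_{L,(\vec{x}+\vec{i},k)}$ | $W^{r+} * (\alpha_U \odot \beta_U) + W^{r-} * (\alpha_L \odot \beta_L) + b^r$
(ii) Residual Block | $A^r_{U,act} * W^{r-1} + I$ | $A^r_{U,act} * b^{r-1} + B^r_{U,act}$
(iv) Pooling Block | at location $(\vec{x}, z)$: |
Note 1: $(\vec{i}, k)$ denotes filter coordinate indices and $(\vec{x}, z)$ denotes output tensor indices.
Note 2: $A$, $B$, $W$, $\alpha$, $\beta$ are all tensors. $W^{r+}$, $W^{r-}$ contain only the positive, negative entries of $W^r$ with other entries equal to 0.
Note 3: $A_L$, $B_L$ for the pooling block are slightly different. Please see Appendix (c) for details.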
Overview of our results.
In this section, we present a general and efficient framework, CNN-Cert, for computing certified lower bounds of the minimum adversarial distortion with general activation functions in CNNs. We derive the range of the network output in closed form by applying a pair of linear upper/lower bounds on the neurons (e.g. the activation functions, the pooling functions) when the input of the network is perturbed with noise bounded in $\ell_p$ norm ($p \ge 1$). Our framework can incorporate general activation functions and various architectures. In particular, we provide results on convolutional layers with activations (a.k.a. Act-Conv block), max-pooling layers (a.k.a. Pooling block), residual blocks (a.k.a. Residual block) and batch normalization layers (a.k.a. BN block). In addition, we show that the state-of-the-art Fast-Lin algorithm (Weng et al. 2018a) and CROWN (Zhang et al. 2018) are special cases under the CNN-Cert framework.
General framework
When an input data point is perturbed within an $\ell_p$ ball with radius $\epsilon$, we are interested in the change of the network output because this information can be used to find a certified lower bound of the minimum adversarial distortion (as discussed in the section Computing certified lower bound $\epsilon_{\text{cert}}$). Toward this goal, the first step is to derive explicit output bounds for neural network classifiers with various popular building blocks, as shown in Figure 1, Table 2 and Table 9 (with general strides and padding). The fundamental idea of our method is to apply linear bounding techniques separately on the non-linear operations in the neural networks, e.g. the non-linear activation functions, residual blocks and pooling operations. Our proposed techniques are general and allow efficient computation of certified lower bounds. We begin the formal introduction to CNN-Cert by giving the notation and the intuition behind the explicit bounds for each building block, followed by a description of how such explicit bounds are used to compute certified lower bounds in our proposed framework.
Notations.
Let $f(x)$ be a neural network classifier function and $x_0$ be an input data point. We use $\sigma(\cdot)$ to denote the coordinate-wise activation function in the neural networks. Some popular choices of $\sigma$ include ReLU: $\sigma(y) = \max(y, 0)$, hyperbolic tangent: $\sigma(y) = \tanh(y)$, sigmoid: $\sigma(y) = 1/(1+e^{-y})$ and arctan: $\sigma(y) = \tan^{-1}(y)$. The symbol $*$ denotes the convolution operation and $\Phi^r(x)$ denotes the output of the $r$-th layer building block, which is a function of an input $x$. We use superscripts to denote the index of layers and subscripts to denote upper bound ($U$), lower bound ($L$) and the corresponding building block (e.g. act is short for activation, conv is short for convolution, res is short for residual block, bn is short for batch normalization and pool is short for pooling). Sometimes subscripts are also used to indicate the element index in a vector/tensor, which is clear from context. We will often write $\Phi^r(x)$ as $\Phi^r$ for simplicity and we will sometimes use $\Phi^m(x)$ to denote the output of the classifier, i.e. $\Phi^m = f(x)$. Note that the weights $W$, bias $b$, input $x$ and the output $\Phi^r$ of each layer are tensors since we consider a general CNN in this paper.
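(i) Tackling the non-linear activation functions and convolutional layer.
For the convolutional layer with an activation function $\sigma(\cdot)$, let $\Phi^{r-1}$ be the input of the activation layer and $\Phi^r$ be the output of the convolutional layer. The input/output relation is as follows:
$$\Phi^r = W^r * \sigma(\Phi^{r-1}) + b^r. \tag{1}$$
Given the range of $\Phi^{r-1}$, we can bound the range of $\Phi^r$ by applying two linear bounds on each activation function $\sigma(y)$:
$$\alpha_L (y + \beta_L) \le \sigma(y) \le \alpha_U (y + \beta_U). \tag{2}$$
When the input $y$ is in the range $[l, u]$, the parameters $\alpha_L, \alpha_U, \beta_L, \beta_U$ can be chosen appropriately based on $y$'s lower bound $l$ and upper bound $u$. If we use (2) and consider the signs of the weights associated with the activation functions, it is possible to show that the output $\Phi^r$ in (1) can be bounded as follows:
$$\Phi^r \le A^r_{U,act} * \Phi^{r-1} + B^r_{U,act}, \tag{3}$$
$$\Phi^r \ge A^r_{L,act} * \Phi^{r-1} + B^r_{L,act}, \tag{4}$$
where $A^r_{U,act}, A^r_{L,act}, B^r_{U,act}, B^r_{L,act}$ are constant tensors related to the weights $W^r$ and bias $b^r$ as well as the corresponding parameters $\alpha_L, \alpha_U, \beta_L, \beta_U$ in the linear bounds of each neuron. See Table 2 for full results. Note that the bounds in (3) and (4) are element-wise inequalities; the derivations are given in Appendix (a). On the other hand, if $\Phi^{r-1}$ is also the output of a convolutional layer, i.e.
$$\Phi^{r-1} = W^{r-1} * \sigma(\Phi^{r-2}) + b^{r-1},$$
then the bounds in (3) and (4) can be rewritten as follows:
$$\Phi^r \le A^r_{U,act} * \left(W^{r-1} * \sigma(\Phi^{r-2}) + b^{r-1}\right) + B^r_{U,act} = A^{r-1}_{U,conv} * \sigma(\Phi^{r-2}) + B^{r-1}_{U,conv} + B^r_{U,act} \tag{5}$$
and similarly
$$\Phi^r \ge A^{r-1}_{L,conv} * \sigma(\Phi^{r-2}) + B^{r-1}_{L,conv} + B^r_{L,act} \tag{6}$$
by letting $A^{r-1}_{U,conv} = A^r_{U,act} * W^{r-1}$, $B^{r-1}_{U,conv} = A^r_{U,act} * b^{r-1}$, and $A^{r-1}_{L,conv} = A^r_{L,act} * W^{r-1}$, $B^{r-1}_{L,conv} = A^r_{L,act} * b^{r-1}$. Observe that the upper bound in (5) and the lower bound in (6) again have the same convolutional form as (1). Therefore, for a neural network that consists of convolutional layers and activation layers, the above technique can be used iteratively to obtain the final upper and lower bounds of the output $\Phi^r(x)$ in terms of the input $x$ of the neural network in the following convolutional form:
$$A^0_L * x + B^0_L \le \Phi^r(x) \le A^0_U * x + B^0_U.$$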
In fact, the above framework is very general and is not limited to the convolution-activation building blocks. The framework can also incorporate popular residual blocks, pooling layers and batch normalization layers, etc. The key idea is to derive linear upper bounds and lower bounds for each building block in the form of (3) and (4), and then plug in the corresponding bounds and backpropagate to the previous layer.
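As a minimal sketch of the activation bounds in (2), the following NumPy function picks the parameters for ReLU given pre-activation bounds `l` and `u`, written in the form `alpha_L*(y + beta_L) <= relu(y) <= alpha_U*(y + beta_U)`. It uses the same slope for both lines on unstable neurons (the Fast-Lin-style choice); the function name and array layout are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def relu_linear_bounds(l, u):
    """Return (alpha_L, beta_L, alpha_U, beta_U) such that
    alpha_L*(y+beta_L) <= relu(y) <= alpha_U*(y+beta_U) for all y in [l, u]."""
    l = np.atleast_1d(np.asarray(l, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    alpha_L = np.zeros_like(l)
    beta_L = np.zeros_like(l)
    alpha_U = np.zeros_like(l)
    beta_U = np.zeros_like(l)

    # Always-active neurons (l >= 0): relu(y) = y, so both lines are y itself.
    active = l >= 0
    alpha_L[active] = 1.0
    alpha_U[active] = 1.0

    # Unstable neurons (l < 0 < u): the upper line is the chord from (l, 0)
    # to (u, u); the lower line reuses the same slope through the origin.
    unstable = (l < 0) & (u > 0)
    slope = u[unstable] / (u[unstable] - l[unstable])
    alpha_U[unstable] = slope
    beta_U[unstable] = -l[unstable]
    alpha_L[unstable] = slope

    # Always-inactive neurons (u <= 0): relu(y) = 0, all parameters stay 0.
    return alpha_L, beta_L, alpha_U, beta_U
```

For other activations such as tanh or sigmoid, the slopes and offsets would be chosen differently, but the same two-line form applies.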
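(ii) Tackling the residual blocks operations.
For the residual block, let $\Phi^{r+2}$ denote the output of the residual block (before activation), $\Phi^{r+1}$ be the output of the first convolutional layer and $\Phi^r$ be the input of the residual block. The input/output relation is as follows:
$$\Phi^{r+2} = W^{r+2} * \sigma(\Phi^{r+1}) + b^{r+2} + \Phi^r, \qquad \Phi^{r+1} = W^{r+1} * \Phi^r + b^{r+1}.$$
Similar to the linear bounding techniques for unwrapping the non-linear activation functions, the output of the residual block can be bounded as:
$$\Phi^{r+2} \le A^{r+2}_{U,res} * \Phi^r + B^{r+2}_{U,res}, \qquad \Phi^{r+2} \ge A^{r+2}_{L,res} * \Phi^r + B^{r+2}_{L,res},$$
where $A^{r+2}_{U,res}, A^{r+2}_{L,res}, B^{r+2}_{U,res}, B^{r+2}_{L,res}$ are constant tensors related to the weights $W^{r+2}$, $W^{r+1}$, bias $b^{r+2}$, $b^{r+1}$, and the corresponding parameters $\alpha_L, \alpha_U, \beta_L, \beta_U$ in the linear bounds of each neuron; see Table 2 for details. Note that in Table 2, all indices are shifted from $r+2$ to $r$. The full derivations are provided in Appendix (b).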
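(iii) Tackling the batch normalization.
The batch normalization layer performs scaling and shifting operations at inference time. Let $\Phi^r$ be the output and $\Phi^{r-1}$ be the input; the input/output relation is the following:
$$\Phi^r = \gamma_{bn}\,\frac{\Phi^{r-1} - \mu_{bn}}{\sqrt{\sigma^2_{bn} + \epsilon_{bn}}} + \beta_{bn},$$
where $\gamma_{bn}$, $\beta_{bn}$ are the learned training parameters, $\mu_{bn}$, $\sigma^2_{bn}$ are the running averages of the batch mean and variance during training, and $\epsilon_{bn}$ is a small constant for numerical stability. Thus, it is simply scaling and shifting on both the upper bounds and lower bounds:
$$A^r_{L,bn} * \Phi^{r-1} + B^r_{L,bn} \le \Phi^r \le A^r_{U,bn} * \Phi^{r-1} + B^r_{U,bn},$$
where $A^r_{U,bn} = A^r_{L,bn} = \dfrac{\gamma_{bn}}{\sqrt{\sigma^2_{bn} + \epsilon_{bn}}}$ and $B^r_{U,bn} = B^r_{L,bn} = -\dfrac{\gamma_{bn}\,\mu_{bn}}{\sqrt{\sigma^2_{bn} + \epsilon_{bn}}} + \beta_{bn}$.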
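Because inference-time batch normalization is an affine map, its upper and lower bounding coefficients coincide. The sketch below folds the batch-norm parameters into a per-channel scale and shift; the function name and the default `eps` value are illustrative assumptions.

```python
import numpy as np

def batchnorm_affine(gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm into scale * x + shift.

    The same (scale, shift) pair serves as both the upper and the lower
    linear bound, since the layer is exactly affine in its input.
    """
    scale = gamma / np.sqrt(var + eps)
    shift = beta - scale * mean
    return scale, shift
```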
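(iv) Tackling the pooling operations.
Let $\Phi^r$ and $\Phi^{r-1}$ be the output and input of the pooling layer. For max-pooling operations, the input/output relation is the following:
$$\Phi^r_k = \max_{S_k} \Phi^{r-1}_{S_k},$$
where $S_k$ denotes the pooled input index set associated with the $k$-th output. When the input $\Phi^{r-1}$ is bounded in the range $[l^{r-1}, u^{r-1}]$, it is possible to bound the output $\Phi^r$ by linear functions as follows:
$$A^r_{L,pool} * \Phi^{r-1} + B^r_{L,pool} \le \Phi^r \le A^r_{U,pool} * \Phi^{r-1} + B^r_{U,pool},$$
where $A^r_{U,pool}, A^r_{L,pool}, B^r_{U,pool}, B^r_{L,pool}$ are constant tensors related to $l^{r-1}$ and $u^{r-1}$. For the average pooling operation, the range of the output is simply the average of $l^{r-1}$ and $u^{r-1}$ on the corresponding pooling indices. See Table 2 and the derivation details in Appendix (c).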
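To make the pooling ranges concrete, the sketch below computes simple interval bounds for non-overlapping 1D pooling windows. For average pooling this is the range described above. For max pooling it returns only the constant interval `[max l, max u]`, which is a valid but coarser bound than the linear bounds of Appendix (c); it is shown purely for illustration, and the function name and 1D layout are assumptions.

```python
import numpy as np

def pool_interval_bounds(l, u, size, mode="max"):
    """Interval bounds on the output of non-overlapping 1D pooling.

    l, u : elementwise lower/upper bounds on the pooling input
    size : pooling window length (len(l) must be a multiple of size)
    """
    l = np.asarray(l, dtype=float).reshape(-1, size)
    u = np.asarray(u, dtype=float).reshape(-1, size)
    if mode == "avg":
        # The average is monotone in every input, so average the bounds.
        return l.mean(axis=1), u.mean(axis=1)
    # The max is monotone in every input, so take the max of each bound.
    return l.max(axis=1), u.max(axis=1)
```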
Computing global bounds $\eta_{j,U}$ and $\eta_{j,L}$ of network output $\Phi^m(x)$.
Let $\Phi^m(x)$ be the output of an $m$-layer neural network classifier. We have shown that when the input of each building block is bounded and lies in some range $[l, u]$, then the output of the building block can be bounded by two linear functions in the form of a convolution with the input. Since a neural network can be regarded as a cascade of building blocks, where the input of the current building block is the output of the previous building block, we can propagate the bounds from the last building block, which relates to the network output, backward to the first building block, which relates to the network input $x$. The final upper and lower bounds connect the network output and input through the following linear relationship:
$$A^0_L * x + B^0_L \le \Phi^m(x) \le A^0_U * x + B^0_U. \tag{7}$$
Recall that the input $x$ is constrained within an $\ell_p$ ball $\mathbb{B}_p(x_0, \epsilon)$ centered at the input data point $x_0$ with radius $\epsilon$. Thus, maximizing (minimizing) the right-hand side (left-hand side) of (7) over $x \in \mathbb{B}_p(x_0, \epsilon)$ leads to a global upper (lower) bound of the $j$-th output $\Phi^m_j(x)$:
$$\eta_{j,U} = \epsilon\,\|\mathrm{vec}(A^0_U)\|_q + A^0_U * x_0 + B^0_U, \tag{8}$$
$$\eta_{j,L} = -\epsilon\,\|\mathrm{vec}(A^0_L)\|_q + A^0_L * x_0 + B^0_L, \tag{9}$$
where $\|\cdot\|_q$ is the $\ell_q$ norm and $1/p + 1/q = 1$ with $p, q \ge 1$.
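The global bounds in (8) and (9) follow from the dual-norm (Hölder) inequality applied to a linear function over an $\ell_p$ ball. A minimal sketch for a single output $j$, assuming the final coefficient tensors have the same shape as the input so that the convolution reduces to an inner product (all names here are illustrative):

```python
import numpy as np

def dual_exponent(p):
    """Return q with 1/p + 1/q = 1 (p = 1 <-> q = inf)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)

def global_bounds(A_U, B_U, A_L, B_L, x0, eps, p=np.inf):
    """Bounds on one network output over the l_p ball of radius eps at x0.

    A_U, A_L : coefficient tensors with the same shape as x0
    B_U, B_L : scalar offsets of the linear upper/lower bounds
    """
    q = dual_exponent(p)
    a_u, a_l, x = np.ravel(A_U), np.ravel(A_L), np.ravel(x0)
    upper = eps * np.linalg.norm(a_u, ord=q) + a_u @ x + B_U
    lower = -eps * np.linalg.norm(a_l, ord=q) + a_l @ x + B_L
    return lower, upper
```

For example, with `p = np.inf` the perturbation term becomes `eps` times the $\ell_1$ norm of the coefficients.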
Computing certified lower bound $\epsilon_{\text{cert}}$.
Recall that the predicted class of the input data $x_0$ is $c$ and let $t$ be a targeted class. Given the magnitude of the largest input perturbation $\epsilon$, we can check whether the output satisfies $\Phi^m_c(x) - \Phi^m_t(x) > 0$ by applying the global bounds derived in (8) and (9). In other words, given an $\epsilon$, we check the condition $\eta_{c,L} - \eta_{t,U} > 0$. If the condition is true, we can increase $\epsilon$; otherwise we decrease $\epsilon$. Thus, the largest certified lower bound can be attained by a bisection on $\epsilon$. Note that although there is an explicit $\epsilon$ term in (8) and (9), they are not linear functions of $\epsilon$ because all the intermediate bounds of $\Phi^r$ depend on $\epsilon$. Fortunately, we can still find $\epsilon_{\text{cert}}$ numerically via the aforementioned bisection method. On the other hand, also note that the derivation of the output bounds in each building block depends on the range $[l^{r-1}, u^{r-1}]$ of the building block input (say $\Phi^{r-1}$), which we call the intermediate bounds. The values of the intermediate bounds can be computed similarly by treating $\Phi^{r-1}$ as the final output of the sub-network which consists of all building blocks before layer $r$ and deriving the corresponding $A^0_U, A^0_L, B^0_U, B^0_L$ in (7). Thus, all the intermediate bounds also have the same explicit forms as (8) and (9) but with the corresponding $A^0_U, A^0_L, B^0_U, B^0_L$ substituted.
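The bisection described above can be sketched as follows. Here `margin_lower_bound` is a hypothetical callback that, for a given perturbation radius, recomputes all intermediate bounds and returns the certified margin (the lower bound of the predicted-class output minus the upper bound of the target-class output). The sketch assumes this margin does not increase as the radius grows and that `eps_hi` is not itself certifiable; the search range and step count are illustrative choices.

```python
def certify_by_bisection(margin_lower_bound, eps_hi=1.0, steps=20):
    """Largest eps in [0, eps_hi] (up to bisection resolution) whose
    certified margin is positive."""
    lo, hi = 0.0, eps_hi
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if margin_lower_bound(mid) > 0:
            lo = mid   # certified at mid: try a larger radius
        else:
            hi = mid   # not certified: shrink the radius
    return lo

# Toy usage with a made-up margin that crosses zero at eps = 0.25.
eps_cert = certify_by_bisection(lambda eps: 1.0 - 4.0 * eps)
```

Returning `lo` rather than the midpoint keeps the result on the certified side of the threshold.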
Discussion: Fast-Lin and CROWN are special cases of CNN-Cert.
Fast-Lin (Weng et al. 2018a) and CROWN (Zhang et al. 2018) are special cases of CNN-Cert. In Fast-Lin, two linear bounds with the same slope (i.e. $\alpha_U = \alpha_L$ in (2)) are applied on the ReLU activation, while in CROWN and CNN-Cert different slopes are possible ($\alpha_U$ and $\alpha_L$ can be different). However, both Fast-Lin and CROWN only consider fully-connected layers (MLP), while CNN-Cert can handle various building blocks and architectures such as residual blocks, pooling blocks and batch normalization blocks and is hence a more general framework. We show in Table 13 (appendix) that when using the same linear bounds on ReLU activations, CNN-Cert obtains the same robustness certificate as CROWN; meanwhile, for general activations, CNN-Cert uses more accurate linear bounds and thus achieves better certificate quality, by up to 260% compared with CROWN (if we use exactly the same linear bounds, then CNN-Cert and CROWN indeed get the same certificate). Note that in all cases, CNN-Cert is much faster than CROWN (2.5 to 11.4 times speedup) due to the advantage of explicit convolutional bounds in CNN-Cert.
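To illustrate the difference in slopes, the sketch below returns the lower-bound slope for an unstable ReLU neuron (pre-activation range $l < 0 < u$, lower line through the origin). The non-adaptive branch reuses the upper-bound slope, as in Fast-Lin; the adaptive branch uses the heuristic commonly associated with CROWN (slope 1 when $u \ge |l|$, else 0). This is an assumed illustration of the idea, not necessarily the exact rule used in the authors' implementation; any slope in $[0, 1]$ gives a valid lower bound.

```python
import numpy as np

def relu_lower_slope(l, u, adaptive=True):
    """Lower-bound slope alpha_L for unstable ReLU neurons (l < 0 < u),
    so that alpha_L * y <= relu(y) on [l, u]."""
    l = np.asarray(l, dtype=float)
    u = np.asarray(u, dtype=float)
    if not adaptive:
        return u / (u - l)              # same slope as the upper bound
    return np.where(u >= -l, 1.0, 0.0)  # adaptive 0-or-1 choice
```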
Discussion: CNN-Cert is computationally efficient.
CNN-Cert has a similar cost to forward-propagation for general convolutional neural networks: it takes polynomial time, unlike algorithms that find the exact minimum adversarial distortion, such as Reluplex (Katz et al. 2017), whose underlying problem is NP-complete. As shown in the experiment sections, CNN-Cert demonstrates an empirical speedup compared to (a) the original version of Fast-Lin, (b) an optimized sparse-matrix version of Fast-Lin (implemented by us) and (c) Dual-LP approaches, while maintaining similar or better certified bounds (the improvement is around 8-20%). For a pure CNN network with $m$ layers, $k$-by-$k$ filter size, $n$ filters per layer, input size $r$-by-$r$, and stride 1-by-1, the time complexity of CNN-Cert is $O(r^2 m^6 k^4 n^3)$. The equivalent fully connected network requires $O(r^6 m^2 n^3)$ time to certify.
Discussion: Training-time operations are independent of CNN-Cert.
Since CNN-Cert certifies the robustness of a fixed classifier at testing time, techniques that only apply to the training phase, such as dropout, will not affect the operation of CNN-Cert (though the given model to be certified might vary if model weights differ).
Experiments
We conduct extensive experiments comparing CNN-Cert with other lower-bound-based verification methods on 5 classes of networks: (I) pure CNNs; (II) general CNNs (ReLU) with pooling and batch normalization; (III) residual networks (ReLU); (IV) general CNNs and residual networks with non-ReLU activation functions; (V) small MLP models. Due to page constraints, we refer readers to the appendix for additional results. Our code is available at https://github.com/AkhilanB/CNNCert.
Comparative Methods.

Certification algorithms: (i) Fast-Lin provides certificates on ReLU networks (Weng et al. 2018a); (ii) Global-Lips provides certificates using the global Lipschitz constant (Szegedy et al. 2013); (iii) Dual-LP solves dual problems of the LP formulation in (Kolter and Wong 2018), and is the best result that (Dvijotham et al. 2018) can achieve, although it might not be attainable due to the anytime property; (iv) Reluplex (Katz et al. 2017) obtains the exact minimum distortion but is computationally expensive.

Robustness estimation and attack methods: (i) CLEVER (Weng et al. 2018b) is a robustness estimation score without certification; (ii) CW/EAD are attack methods (Carlini and Wagner 2017b; Chen et al. 2018).

Our methods: CNN-Cert-Relu is CNN-Cert with the same linear bounds on ReLU used in Fast-Lin, while CNN-Cert-Ada uses adaptive bounds for all activation functions. CNNs are converted into equivalent MLP networks before evaluation for methods that only support MLP networks.
Implementations, Models and Dataset. CNN-Cert is implemented in Python (numpy with numba). We also implement a version of Fast-Lin using sparse matrix multiplication for comparison with CNN-Cert, since convolutional layers correspond to sparse weight matrices. Experiments are conducted on an AMD Zen server CPU. We evaluate CNN-Cert and other methods on CNN models trained on the MNIST, CIFAR-10 and tiny Imagenet datasets. All pure convolutional networks use 3-by-3 convolutions. The general 7-layer CNNs use two max-pooling layers and use 32 and 64 filters for two convolution layers each. LeNet uses a similar architecture to LeNet-5 (LeCun et al. 1998), with the no-pooling version applying the same convolutions over larger inputs. The residual networks (ResNet) evaluated use simple residual blocks with two convolutions per block, and a ResNet with $k$ residual blocks is denoted as ResNet-$k$. We evaluate all methods on 10 random test images and attack targets (in order to accommodate slow verification methods), and also report results on 100 images for some networks in Table 5. The results averaged over 100 images are similar to those averaged over 10 images. We train all models for 10 epochs and tune hyperparameters to optimize validation accuracy.
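The correspondence between convolutional layers and sparse weight matrices can be seen in a small 1D example. The sketch below builds the dense matrix equivalent to a stride-1, unpadded 1D convolution (in the cross-correlation convention used by CNN layers); the function name is an illustrative assumption. Each row holds only `len(w)` non-zeros, which is why MLP-based certifiers pay for many zero entries unless sparsity is exploited.

```python
import numpy as np

def conv1d_as_matrix(w, n_in):
    """Dense matrix M such that M @ v equals the valid 1D cross-correlation
    of v with filter w (stride 1, no padding)."""
    w = np.asarray(w, dtype=float)
    k = len(w)
    n_out = n_in - k + 1
    M = np.zeros((n_out, n_in))
    for x in range(n_out):
        M[x, x:x + k] = w   # filter placed at output position x
    return M
```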
Results (I): pure CNNs with ReLU activation.
Table 3 demonstrates that CNN-Cert bounds consistently improve on Fast-Lin across network sizes. CNN-Cert also improves on Dual-LP. Attack results show that all certified methods leave a significant gap to the attack-based distortion bounds (i.e. upper bounds on the minimum distortions). Table 4 gives the runtimes of various methods and shows that CNN-Cert is faster than Fast-Lin, with over an order of magnitude speedup for the smallest network. CNN-Cert is also faster than the sparse version of Fast-Lin. The runtime improvement of CNN-Cert decreases with network size. Notably, CNN-Cert is multiple orders of magnitude faster than the Dual-LP method. Global-Lips is an analytical bound, but it provides very loose lower bounds by merely using the product of layer weights as the Lipschitz constant. In contrast, CNN-Cert takes into account the network output at the neuron level and thus can certify significantly larger lower bounds, which are around 8-20% larger than those of the Fast-Lin and Dual-LP approaches.
Results (II), (III): general CNNs and ResNet with ReLU activation.
Table 5 gives certified lower bounds for various general CNNs including networks with pooling layers and batch normalization. CNN-Cert improves upon Fast-Lin style ReLU bounds (CNN-Cert-Relu). Interestingly, the LeNet style network without pooling layers has certified bounds much larger than the pooling version, while the network with batch normalization has smaller certified bounds. These findings provide some new insights on the relation between certified robustness and network architecture, and CNN-Cert could potentially be leveraged to search for more robust networks. Table 6 gives ResNet results and shows that CNN-Cert improves upon Fast-Lin.
Results (IV): general CNNs and ResNet with general activations.
Table 7 reports certified lower bounds for networks with 4 different activation functions. Some sigmoid network results are omitted due to poor test set accuracy. We conclude that CNN-Cert can indeed efficiently find non-trivial lower bounds for all the tested activation functions and that computing certified lower bounds for general activation functions incurs no significant computational penalty.
Results (V): Small MLP networks.
Table 8 shows results on small MNIST MLPs with 20 nodes per layer. For the small 2-layer network, we are able to run Reluplex and compute the minimum adversarial distortion. It can be seen that the certified lower bounds of the methods here are all within a factor of about 2 of the minimum distortion, while CLEVER and the attack methods are close to Reluplex, though without guarantees.
Conclusion and Future Work
In this paper, we propose CNN-Cert, a general and efficient verification framework for certifying the robustness of CNNs. By applying our proposed linear bounding technique on each building block, CNN-Cert can handle a wide variety of network architectures including convolution, pooling, batch normalization and residual blocks, as well as general activation functions. Extensive experimental results under four different classes of CNNs consistently validate the superiority of CNN-Cert over other methods in terms of its effectiveness in solving for tighter non-trivial certified bounds and its run time efficiency.
Acknowledgement
Akhilan Boopathy, Tsui-Wei Weng and Luca Daniel are partially supported by MIT-IBM Watson AI Lab.
References
 [Athalye, Carlini, and Wagner 2018] Athalye, A.; Carlini, N.; and Wagner, D. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.
 [Biggio and Roli 2017] Biggio, B., and Roli, F. 2017. Wild patterns: Ten years after the rise of adversarial machine learning. arXiv preprint arXiv:1712.03141.
 [Carlini and Wagner 2017a] Carlini, N., and Wagner, D. 2017a. Adversarial examples are not easily detected: Bypassing ten detection methods. arXiv preprint arXiv:1705.07263.
 [Carlini and Wagner 2017b] Carlini, N., and Wagner, D. 2017b. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), 39–57.
 [Chen et al. 2017a] Chen, H.; Zhang, H.; Chen, P.-Y.; Yi, J.; and Hsieh, C.-J. 2017a. Show-and-fool: Crafting adversarial examples for neural image captioning. arXiv preprint arXiv:1712.02051.
 [Chen et al. 2017b] Chen, P.-Y.; Zhang, H.; Sharma, Y.; Yi, J.; and Hsieh, C.-J. 2017b. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In ACM Workshop on Artificial Intelligence and Security, 15–26.
 [Chen et al. 2018] Chen, P.-Y.; Sharma, Y.; Zhang, H.; Yi, J.; and Hsieh, C.-J. 2018. EAD: Elastic-net attacks to deep neural networks via adversarial examples. AAAI.
 [Cheng et al. 2018a] Cheng, M.; Le, T.; Chen, P.-Y.; Yi, J.; Zhang, H.; and Hsieh, C.-J. 2018a. Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457.
 [Cheng et al. 2018b] Cheng, M.; Yi, J.; Zhang, H.; Chen, P.-Y.; and Hsieh, C.-J. 2018b. Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. arXiv preprint arXiv:1803.01128.
 [Cheng, Nührenberg, and Ruess 2017] Cheng, C.-H.; Nührenberg, G.; and Ruess, H. 2017. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer.
 [Dvijotham et al. 2018] Dvijotham, K.; Stanforth, R.; Gowal, S.; Mann, T.; and Kohli, P. 2018. A dual approach to scalable verification of deep networks. UAI.
 [Ehlers 2017] Ehlers, R. 2017. Formal verification of piecewise linear feedforward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 269–286. Springer.
 [Fischetti and Jo 2017] Fischetti, M., and Jo, J. 2017. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174.
 [Goodfellow, Shlens, and Szegedy 2015] Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. ICLR.
 [Hein and Andriushchenko 2017] Hein, M., and Andriushchenko, M. 2017. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS.
 [Ilyas et al. 2018] Ilyas, A.; Engstrom, L.; Athalye, A.; and Lin, J. 2018. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598.
 [Katz et al. 2017] Katz, G.; Barrett, C.; Dill, D. L.; Julian, K.; and Kochenderfer, M. J. 2017. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97–117. Springer.
 [Kolter and Wong 2018] Kolter, J. Z., and Wong, E. 2018. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML.
 [Kurakin, Goodfellow, and Bengio 2017] Kurakin, A.; Goodfellow, I.; and Bengio, S. 2017. Adversarial machine learning at scale. ICLR.
 [LeCun et al. 1998] LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
 [Lomuscio and Maganti 2017] Lomuscio, A., and Maganti, L. 2017. An approach to reachability analysis for feedforward relu neural networks. arXiv preprint arXiv:1706.07351.
 [Madry et al. 2018] Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2018. Towards deep learning models resistant to adversarial attacks. ICLR.
 [Peck et al. 2017] Peck, J.; Roels, J.; Goossens, B.; and Saeys, Y. 2017. Lower bounds on the robustness to adversarial perturbations. In NIPS.
 [Raghunathan, Steinhardt, and Liang 2018] Raghunathan, A.; Steinhardt, J.; and Liang, P. 2018. Certified defenses against adversarial examples. ICLR.
 [Sinha, Namkoong, and Duchi 2018] Sinha, A.; Namkoong, H.; and Duchi, J. 2018. Certifiable distributional robustness with principled adversarial training. ICLR.
 [Szegedy et al. 2013] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
 [Tu et al. 2018] Tu, C.-C.; Ting, P.; Chen, P.-Y.; Liu, S.; Zhang, H.; Yi, J.; Hsieh, C.-J.; and Cheng, S.-M. 2018. Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. arXiv preprint arXiv:1805.11770.
 [Weng et al. 2018a] Weng, T.-W.; Zhang, H.; Chen, H.; Song, Z.; Hsieh, C.-J.; Boning, D.; Dhillon, I. S.; and Daniel, L. 2018a. Towards fast computation of certified robustness for relu networks. ICML.
 [Weng et al. 2018b] Weng, T.-W.; Zhang, H.; Chen, P.-Y.; Yi, J.; Su, D.; Gao, Y.; Hsieh, C.-J.; and Daniel, L. 2018b. Evaluating the robustness of neural networks: An extreme value theory approach. ICLR.
 [Zhang et al. 2018] Zhang, H.; Weng, T.-W.; Chen, P.-Y.; Hsieh, C.-J.; and Daniel, L. 2018. Efficient neural network robustness certification with general activation functions. In NIPS.
 [Zügner, Akbarnejad, and Günnemann 2018] Zügner, D.; Akbarnejad, A.; and Günnemann, S. 2018. Adversarial attacks on neural networks for graph data. In KDD.
Appendix
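(a) Derivation of Act-Conv block:
Our goal.
We are going to show that the output $\Phi^r$ in (1) can be bounded as follows:
$$A^r_{L,act} * \Phi^{r-1} + B^r_{L,act} \le \Phi^r \le A^r_{U,act} * \Phi^{r-1} + B^r_{U,act},$$
where $A^r_{U,act}, A^r_{L,act}, B^r_{U,act}, B^r_{L,act}$ are constant tensors related to the weights $W^r$ and bias $b^r$ as well as the corresponding parameters $\alpha_L, \alpha_U, \beta_L, \beta_U$ in the linear bounds of each neuron.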
Notations.
Below, we will use the subscript $(x,y,z)$ to denote the location of $\Phi^r(x)$, and its corresponding weight filter is denoted as $W^r_{(z)}$. Meanwhile, we will use subscripts $(i,j,k)$ to denote the location in the weight filter.
Derivations of upper bounds.
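By definition, the $(x,y,z)$-th output $\Phi^r_{(x,y,z)}$ is a convolution of the previous output $\sigma(\Phi^{r-1})$ with its corresponding filter $W^r_{(z)}$:
$$\Phi^r_{(x,y,z)} = W^r_{(z)} * \sigma(\Phi^{r-1}) + b^r_{(z)} \tag{10}$$
$$= \sum_{i,j,k} W^r_{(z),(i,j,k)} \left[\sigma(\Phi^{r-1})\right]_{(x+i,y+j,k)} + b^r_{(z)} \tag{11}$$
$$\le \sum_{i,j,k} \Big[ W^{r+}_{(z),(i,j,k)}\, \alpha_{U,(x+i,y+j,k)} \big(\Phi^{r-1}_{(x+i,y+j,k)} + \beta_{U,(x+i,y+j,k)}\big) + W^{r-}_{(z),(i,j,k)}\, \alpha_{L,(x+i,y+j,k)} \big(\Phi^{r-1}_{(x+i,y+j,k)} + \beta_{L,(x+i,y+j,k)}\big) \Big] + b^r_{(z)} \tag{12}$$
$$= A^r_{U,act,(x,y,z)} * \Phi^{r-1} + B^r_{U,act,(x,y,z)}, \tag{13}$$
where, collecting the coefficients of $\Phi^{r-1}$ and the constant terms in (12),
$$A^r_{U,act,(x,y,z),(i,j,k)} = W^{r+}_{(z),(i,j,k)}\, \alpha_{U,(x+i,y+j,k)} + W^{r-}_{(z),(i,j,k)}\, \alpha_{L,(x+i,y+j,k)},$$
$$B^r_{U,act} = W^{r+} * (\alpha_U \odot \beta_U) + W^{r-} * (\alpha_L \odot \beta_L) + b^r.$$
From (10) to (11), we expand the convolution into summation form. From (11) to (12), we apply the linear upper and lower bounds on each activation $[\sigma(\Phi^{r-1})]_{(x+i,y+j,k)}$ as described in (2); the inequalities hold when multiplying by a positive weight and are reversed (the RHS and LHS are swapped) when multiplying by a negative weight. Since here we are deriving the upper bound, we only need to look at the RHS inequality. This is indeed the key idea for all the derivations. The tensor $W^{r+}_{(z)}$ contains only the positive entries of the weights $W^r_{(z)}$ with all others set to zero, while $W^{r-}_{(z)}$ contains only the negative entries of $W^r_{(z)}$ and sets the other entries to zero. Note that, with a slight abuse of notation, the $\alpha_U, \alpha_L, \beta_U, \beta_L$ here are tensors with the same dimensions as $\Phi^{r-1}$ (while the $\alpha_U, \alpha_L, \beta_U, \beta_L$ in (2) are scalars), and we use subscripts to denote the entry of a tensor, e.g. $\alpha_{U,(x+i,y+j,k)}$.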
Derivations of lower bounds.
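The lower bounds can be derived similarly:
$$\Phi^r_{(x,y,z)} \ge \sum_{i,j,k} \Big[ W^{r+}_{(z),(i,j,k)}\, \alpha_{L,(x+i,y+j,k)} \big(\Phi^{r-1}_{(x+i,y+j,k)} + \beta_{L,(x+i,y+j,k)}\big) + W^{r-}_{(z),(i,j,k)}\, \alpha_{U,(x+i,y+j,k)} \big(\Phi^{r-1}_{(x+i,y+j,k)} + \beta_{U,(x+i,y+j,k)}\big) \Big] + b^r_{(z)} = A^r_{L,act,(x,y,z)} * \Phi^{r-1} + B^r_{L,act,(x,y,z)},$$
where
$$A^r_{L,act,(x,y,z),(i,j,k)} = W^{r+}_{(z),(i,j,k)}\, \alpha_{L,(x+i,y+j,k)} + W^{r-}_{(z),(i,j,k)}\, \alpha_{U,(x+i,y+j,k)}, \tag{17}$$
$$B^r_{L,act} = W^{r+} * (\alpha_L \odot \beta_L) + W^{r-} * (\alpha_U \odot \beta_U) + b^r. \tag{18}$$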
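The sign-splitting argument above can be checked numerically on a toy 1D Act-Conv block $z_x = \sum_i w_i\,\mathrm{relu}(y_{x+i}) + b$. The sketch below is an illustration under simplifying assumptions (1D input, one filter, stride 1, no padding, ReLU with a same-slope relaxation); the helper names are hypothetical and not taken from the authors' code.

```python
import numpy as np

def relu_bounds(l, u):
    """(alpha_L, beta_L, alpha_U, beta_U) with
    alpha_L*(y+beta_L) <= relu(y) <= alpha_U*(y+beta_U) on [l, u]."""
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    unstable = (l < 0) & (u > 0)
    slope = u / np.where(unstable, u - l, 1.0)   # only used where unstable
    alpha = np.where(l >= 0, 1.0, np.where(unstable, slope, 0.0))
    beta_U = np.where(unstable, -l, 0.0)
    return alpha, np.zeros_like(alpha), alpha, beta_U

def act_conv_bounds_1d(w, b, l, u):
    """Position-dependent filters A_U, A_L and offsets B_U, B_L such that
    sum_i A_L[x,i]*y[x+i] + B_L[x] <= z[x] <= sum_i A_U[x,i]*y[x+i] + B_U[x]."""
    w = np.asarray(w, dtype=float)
    aL, bL, aU, bU = relu_bounds(l, u)
    w_pos, w_neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
    k = len(w)
    n_out = len(aL) - k + 1
    A_U, A_L = np.empty((n_out, k)), np.empty((n_out, k))
    B_U, B_L = np.empty(n_out), np.empty(n_out)
    for x in range(n_out):
        s = slice(x, x + k)
        # Positive weights take the upper line, negative weights the lower
        # line for the upper bound; the roles swap for the lower bound.
        A_U[x] = w_pos * aU[s] + w_neg * aL[s]
        A_L[x] = w_pos * aL[s] + w_neg * aU[s]
        B_U[x] = np.sum(w_pos * aU[s] * bU[s] + w_neg * aL[s] * bL[s]) + b
        B_L[x] = np.sum(w_pos * aL[s] * bL[s] + w_neg * aU[s] * bU[s]) + b
    return A_U, B_U, A_L, B_L
```

Sampling inputs inside `[l, u]` and comparing the true output against both linear expressions is a quick sanity check of the derivation.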
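(b) Derivation of Residual block:
Our goal.
We are going to show that the output $\Phi^{r+2}$ in the residual block can be bounded as follows:
$$A^{r+2}_{L,res} * \Phi^r + B^{r+2}_{L,res} \le \Phi^{r+2} \le A^{r+2}_{U,res} * \Phi^r + B^{r+2}_{U,res},$$
where $\Phi^{r+2}$ denotes the output of the residual block (before activation), $\Phi^{r+1}$ is the output of the first convolutional layer and $\Phi^r$ is the input of the residual block, and $A^{r+2}_{U,res}, A^{r+2}_{L,res}, B^{r+2}_{U,res}, B^{r+2}_{L,res}$ are constant tensors related to the weights $W^{r+2}$, $W^{r+1}$, bias $b^{r+2}$, $b^{r+1}$, and the corresponding parameters $\alpha_L, \alpha_U, \beta_L, \beta_U$ in the linear bounds of each neuron. The input/output relation of the residual block is as follows:
$$\Phi^{r+2} = W^{r+2} * \sigma(\Phi^{r+1}) + b^{r+2} + \Phi^r, \qquad \Phi^{r+1} = W^{r+1} * \Phi^r + b^{r+1}.$$
Notations.
Below, we will use the subscript $(x,y,z)$ to denote the location of $\Phi^r(x)$, and its corresponding weight filter is denoted as $W^r_{(z)}$. Meanwhile, we will use subscripts $(i,j,k)$ to denote the location in the weight filter.
Derivations of upper bounds.
Writing out $\Phi^{r+2} = W^{r+2} * \sigma(\Phi^{r+1}) + b^{r+2} + \Phi^r$ and applying the act-conv bound on the term $W^{r+2} * \sigma(\Phi^{r+1}) + b^{r+2}$, we obtain
$$\Phi^{r+2} \le A^{r+2}_{U,act} * \Phi^{r+1} + B^{r+2}_{U,act} + \Phi^r.$$
Plugging in the equation $\Phi^{r+1} = W^{r+1} * \Phi^r + b^{r+1}$, we get
$$\Phi^{r+2} \le A^{r+2}_{U,act} * \left(W^{r+1} * \Phi^r + b^{r+1}\right) + B^{r+2}_{U,act} + \Phi^r = \left(A^{r+2}_{U,act} * W^{r+1} + I\right) * \Phi^r + A^{r+2}_{U,act} * b^{r+1} + B^{r+2}_{U,act}.$$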