Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods of computing such a bound are either time-consuming or deliver low-quality bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient algorithms (Fast-Lin and Fast-Lip) that are able to certify non-trivial lower bounds of minimum distortions, by bounding the ReLU units with appropriate linear functions (Fast-Lin), or by bounding the local Lipschitz constant (Fast-Lip). Experiments show that (1) our proposed methods deliver bounds close to the exact minimum distortion found by Reluplex in small MNIST networks (the gap is 2-3X) while our algorithms are more than 10,000 times faster; (2) our methods deliver bounds of similar quality (the gap is within 35% and usually around 10%; sometimes our bounds are even better) for larger networks compared to the methods based on solving linear programming problems, while our algorithms are 33-14,000 times faster; (3) our method is capable of solving large MNIST and CIFAR networks up to 7 layers with more than 10,000 neurons within tens of seconds on a single CPU core.
In addition, we show that, in fact, there is no polynomial time algorithm that can approximately find the minimum $\ell_1$ adversarial distortion of a ReLU network with a $0.99 \ln n$ approximation ratio unless $\mathsf{NP} = \mathsf{P}$, where $n$ is the number of neurons in the network.
- 1 Introduction
- 2 Background and related work
- 3 Robustness guarantees for ReLU networks
  - 3.1 Finding the minimum distortion with a $0.99 \ln n$ approximation ratio is hard
  - 3.2 ReLU network and activation patterns under perturbations
  - 3.3 Approach 1: Certified lower bounds via linear approximations
  - 3.4 Approach 2: Certified lower bounds via bounding the local Lipschitz constant
- 4 Experimental Results
- 5 Conclusions
- A Hardness
- B Proof of Theorem 3.5
- C Proof of Corollary 3.7
- D An alternative bound on the Lipschitz constant
1 Introduction
Since the discovery of adversarial examples in deep neural network (DNN) image classifiers [SZS13], researchers have successfully found adversarial examples in many machine learning tasks applied to different areas, including object detection [XWZ17], image captioning [CZC18], speech recognition [CANK17], malware detection [WGZ17] and reading comprehension [JL17]. Moreover, black-box attacks have also been shown to be possible, where an attacker can find adversarial examples without knowing the architecture and parameters of the DNN [CZS17, PMG17, LCLS17].
The existence of adversarial examples poses a huge threat to the application of DNNs in mission-critical tasks including security cameras, self-driving cars and aircraft control systems. Many researchers have thus proposed defensive or detection methods in order to increase the robustness of DNNs. Notable examples are defensive distillation [PMW16], adversarial retraining/training [KGB17, MMS18] and model ensembles [TKP18, LCZH17]. Despite many published contributions that aim at increasing the robustness of DNNs, theoretical results are rarely given and there is no guarantee that the proposed defensive methods can reliably improve the robustness. Indeed, many of these defensive mechanisms have been shown to be ineffective when more advanced attacks are proposed [CW17c, CW17a, CW17b, HWC17].
The robustness of a DNN can be verified, for instance, by examining a neighborhood (typically, an $\ell_2$ or $\ell_\infty$ ball) near a data point $x_0$. The idea is to find the largest ball with radius $r_0$ that guarantees no points inside the neighborhood can ever change the classifier's output. Typically, $r_0$ can be found as follows: given a radius $R$, a global optimization algorithm can be used to find an adversarial example within this ball. A bisection on $R$ can then produce $r_0$. Reluplex [KBD17] is one example that uses such a technique. It encodes a ReLU network into statements that can be solved by satisfiability modulo theories (SMT) and linear programming (LP) solvers. However, it takes hours or more to verify the robustness property of a single data point on a small feed-forward network with a total of 300 neurons. Furthermore, it is computationally infeasible even on a small MNIST classifier. In general, verifying the robustness property of a ReLU network is NP-complete [KBD17, SND18].
On the other hand, a lower bound $\beta_L$ of the radius $r_0$ can be given, which guarantees that no examples within a ball with radius $\beta_L$ can ever change the network classification outcome. [HA17] is a pioneering work on giving such a lower bound for neural networks that are continuously differentiable, although only a 2-layer MLP network with differentiable activation is investigated. [WZC18] has extended the lower bound result to non-differentiable functions and proposed a sampling-based algorithm to estimate $r_0$ via extreme value theory. Their approach is feasible for large state-of-the-art DNNs but the computed quantity is an estimate of $r_0$ without certificates. Ideally, we would like to obtain a certified (a certified lower bound $\beta_L$ guarantees that $\beta_L \le r_0$) and non-trivial (a trivial $\beta_L$ is 0, which is not useful) $\beta_L$ that is reasonably close to $r_0$ within a reasonable amount of computational time; this is indeed the main motivation of this paper.
In this paper, we develop two fast algorithms for obtaining a tight and certified lower bound on ReLU networks. In addition to the certified lower bounds, we also provide a complementary theoretical result to [KBD17, SND18] by further showing there does not even exist a polynomial time algorithm that can approximately find the minimum adversarial distortion with guaranteed approximation ratio. Our contributions can be summarized as follows:
We fully exploit the ReLU networks to give two computationally efficient methods of computing tighter and guaranteed robustness lower bounds via (1) linear approximation on the ReLU units (see Theorem 3.5, Corollary 3.7, Algorithm 1 Fast-Lin) and (2) bounding network local Lipschitz constant (see Section 3.4, Algorithm 2 Fast-Lip). Unlike the per-layer operator-norm-based lower bounds which are often very loose (close to 0, as verified in our experiments) for deep networks, our bounds are much closer to the upper bound given by the best adversarial examples, and thus can be used to evaluate the robustness of DNNs with theoretical guarantee.
We show that our proposed methods are at least four orders of magnitude faster than finding the exact minimum distortion (with Reluplex), and also around two orders of magnitude (or more) faster than linear programming (LP) based methods. For instance, we can compute a reasonable robustness lower bound within a minute for a ReLU network with up to 7 layers or over ten thousand neurons (to the best of our knowledge, this is so far the best available result in the literature).
We also show that there is no polynomial time algorithm that can find a lower bound of the minimum $\ell_1$ adversarial distortion with a $(1 - o(1)) \ln n$ approximation ratio (where $n$ is the total number of neurons) unless $\mathsf{NP} = \mathsf{P}$ (see Theorem 3.1).
We discuss related work on computing the minimum adversarial distortion, or a lower bound on it, in Section 2. We present our two certified lower bounds (Fast-Lin, Fast-Lip) and our hardness result in Section 3. We compare our algorithms with existing methods and show the experimental results in Section 4. Appendix A presents the full proof of our inapproximability result. Appendices B and C provide proofs of our guaranteed lower bounds. Appendix D provides an alternative method of bounding the Lipschitz constant.
2 Background and related work
2.1 Solving the minimum adversarial distortion
For ReLU networks, the verification problem can be transformed into a Mixed Integer Linear Programming (MILP) problem [LM17, CNR17, FJ17] by using binary variables to encode the states of ReLU activation in each neuron. [KBD17] proposed a satisfiability modulo theories (SMT) based framework, Reluplex, which also encodes the network into a set of linear constraints with special rules to handle ReLU activations. Reluplex uses an algorithm similar to Simplex, and it splits the problem into two linear programming (LP) problems based on a ReLU's activation status on demand. Similarly, [Ehl17] proposed Planet, another splitting-based approach using satisfiability (SAT) solvers. All of these approaches are guaranteed to find the exact minimum distortion of an adversarial example, and thus can be used for formal verification. However, because of the NP-hard nature of the underlying problem, these approaches can only be applied to very small networks. For example, using Reluplex to verify a feed-forward network with 5 inputs, 5 outputs and a total of 300 hidden neurons on a single data point can take a few hours [KBD17].
2.2 Existing methods of computing lower bounds of minimum distortion
The earliest attempt in this line of work dates back to 2001 [Zak01], where the authors bound the approximation error between a trained neural network (with sigmoid or hyperbolic tangent activations) and a pre-defined multi-dimensional look-up table using second derivatives. [SZS13] gives a lower bound on the minimum distortion in a ReLU network by investigating the product of the weight matrices' operator norms, but this bound is usually too loose to be useful in practice, as pointed out in [HA17] and verified in our experiments (see Table 2). A tighter bound was given by [HA17] using the local Lipschitz constant on a network with one hidden layer, but their approach requires the network to be continuously differentiable, and thus cannot be directly applied to ReLU networks. The approach in [WZC18] extends the lower bound guarantee in [HA17] to non-differentiable functions under a Lipschitz continuity assumption, and shows promising results of estimating lower bounds of large DNNs with ReLU activations via extreme value theory. The CLEVER score [WZC18], which is the estimated lower bound, has been shown to be capable of reflecting the relative robustness of different networks and is the first robustness metric that can scale to large ImageNet networks. As also shown in our experiments in Section 4, the CLEVER score is a good robustness estimate close to the true minimum distortion given by Reluplex, albeit without providing certificates. Recently, [KW17] proposed using a convex outer adversarial polytope to compute a certified lower bound in ReLU networks. They provide a convex relaxation of the MILP verification problem [LM17, CNR17, FJ17], which reduces the MILP to a linear program (LP) when the distortion is in $\ell_\infty$ norm, and they propose to solve the dual of their relaxed LP problem for the sake of computational efficiency. They focus on the $\ell_\infty$ norm, which allows them to get layer-wise bounds by looking into the dual problem.
It is also worth noting that Reluplex [KBD17] cannot deal with general $\ell_p$ norms, though it can find the minimum distortion in terms of the $\ell_\infty$ norm (and $\ell_1$ is possible via an extension). To address this issue, and motivated by the convex outer polytope approach [KW17], we provide another convex approximation based on a neuron's activation status, which enables fast computation of a certified lower bound. As we will show in Section 3.3, we can derive explicit output bounds that do not involve solving any expensive LP or its dual problems on-the-fly, and that can be combined with binary search to give a certified lower bound efficiently. Our techniques allow computing certified lower bounds on large fully-connected networks with multiple layers on MNIST and CIFAR datasets, which current state-of-the-art techniques [KBD17, KW17] do not seem able to handle.
2.3 Hardness and approximation algorithms
$\mathsf{P} \neq \mathsf{NP}$ has been the most important and popular assumption in computational complexity over the last several decades. It can be used to show that solving a problem exactly is hard. However, in several cases, solving a problem approximately is much easier than solving it exactly. For example, there is no polynomial time algorithm to solve the MAX-CUT problem, but there is a simple $0.5$-approximation polynomial time algorithm. Previous works [KBD17, SND18] show that there is no polynomial time algorithm to find the minimum adversarial distortion exactly. Therefore, a natural question to ask is: does there exist a polynomial time algorithm to solve the robustness problem approximately? In other words, can we give a lower bound of $r_0$ with a guaranteed approximation ratio?
From another perspective, only rules out the polynomial running time. Some problems might not even have a sub-exponential time algorithm. To rule out that, the most well-known assumption used is the “Exponential Time Hypothesis” [IPZ98]. The hypothesis states that cannot be solved in sub-exponential time in the worst case. Another example is that while tensor rank calculation is NP-hard [Hås90], a recent work [SWZ17b] proved that there is no time algorithm to give a constant approximation of the rank of the tensor. There are also some stronger versions of the hypothesis than ETH, e.g., Strong ETH [IP01], Gap ETH [Din16, MR17], and average case ETH [Fei02, RSW16].
3 Robustness guarantees for ReLU networks
Overview of our results.
We begin with a motivating theorem in Section 3.1 showing that there does not exist a polynomial time algorithm able to find the minimum adversarial distortion with a $0.99 \ln n$ approximation ratio. We then introduce basic properties and notations in Section 3.2 and state our main results in Sections 3.3 and 3.4, where we develop two approaches that are guaranteed to obtain a lower bound of the minimum adversarial distortion. In Section 3.3, we first demonstrate a general approach to directly derive the output bounds of a ReLU network with linear approximation when inputs are perturbed by a general $\ell_p$ norm noise. The analytic output bounds, combined with a binary search, allow us to numerically compute a certified lower bound of the minimum distortion. In Section 3.4, we present another method to obtain a certified lower bound of the minimum distortion by deriving upper bounds for the local Lipschitz constant. Both methods are highly efficient and allow fast computation of certified lower bounds on large ReLU networks.
3.1 Finding the minimum distortion with a $0.99 \ln n$ approximation ratio is hard
[KBD17] shows that verifying robustness for a general ReLU network is NP-complete; in other words, there is no efficient (polynomial time) algorithm to find the exact minimum adversarial distortion. Here, we further show that even approximately finding the minimum adversarial distortion with a guaranteed approximation ratio can be hard. Suppose the $\ell_p$ norm of the true minimum adversarial distortion is $r_0$, and a robustness verification program A gives a guarantee that no adversarial examples exist within an $\ell_p$ ball of radius $r$ ($r$ is a lower bound of $r_0$). The approximation ratio is $\alpha := r_0 / r \ge 1$. We hope that $\alpha$ is close to 1 with a guarantee; for example, if $\alpha$ is a constant regardless of the scale of the network, we can always be sure that $r_0$ is at most $\alpha$ times as large as the lower bound $r$ found by A. Here we relax this requirement and allow the approximation ratio to increase with the number of neurons $n$. In other words, when $n$ is larger, the approximation becomes more inaccurate, but this "inaccuracy" can be bounded. However, the following theorem shows that no efficient algorithms exist to give a $0.99 \ln n$ approximation in the special case of $\ell_1$ robustness:
Theorem 3.1. Unless $\mathsf{NP} = \mathsf{P}$, there is no polynomial time algorithm that gives a $(1 - o(1)) \ln n$-approximation to the $\ell_1$ ReLU robustness verification problem with $n$ neurons.
Our proof is based on a well-known inapproximability result for the SET-COVER problem [RS97, AMS06, DS14] and a novel reduction from SET-COVER to our problem. We defer the proof to Appendix A. The formal definition of the $\ell_1$ ReLU robustness verification problem can be found in Definition A.7. Theorem 3.1 implies that no efficient (polynomial time) algorithm can give a better than $(1 - o(1)) \ln n$-approximation guarantee.
Under the Exponential Time Hypothesis (ETH) and the Projection Games Conjecture (PGC), there is no $2^{o(n^c)}$ time algorithm that gives a $(1 - o(1)) \ln n$-approximation to the $\ell_1$ ReLU robustness verification problem with $n$ neurons, where $c \in (0, 1)$ is some fixed constant.
3.2 ReLU network and activation patterns under perturbations
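Let $x \in \mathbb{R}^{n_0}$ be the input vector for an $m$-layer neural network with $m-1$ hidden layers and let the number of neurons in each layer be $n_k$, $\forall k \in [m]$. We use $[n]$ to denote the set $\{1, 2, \dots, n\}$. The weight matrix $W^{(k)}$ and bias vector $b^{(k)}$ for the $k$-th layer have dimension $n_k \times n_{k-1}$ and $n_k$, respectively. Let $\phi_k : \mathbb{R}^{n_0} \to \mathbb{R}^{n_k}$ be the operator mapping from the input layer to layer $k$ and $\sigma(y)$ be the coordinate-wise activation function; for each $k \in [m-1]$, the relation between layer $k-1$ and layer $k$ can be written as
$$\phi_k(x) = \sigma\left(W^{(k)} \phi_{k-1}(x) + b^{(k)}\right).$$
For the input layer and the output layer, we have $\phi_0(x) = x$ and $\phi_m(x) = W^{(m)} \phi_{m-1}(x) + b^{(m)}$. The output of the neural network is $f(x) = \phi_m(x)$, which is a vector of length $n_m$, and the $j$-th output is its $j$-th coordinate, denoted as $f_j(x) = [\phi_m(x)]_j$. For ReLU activation, the activation function $\sigma(y) = \max(y, 0)$ is an element-wise operation on the input vector $y$.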
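The layer recursion described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`affine`, `relu`, `forward`) and the toy weights are assumptions made here, and weights are stored as lists of rows.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Return W x + b for a dense matrix W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(y: Vector) -> Vector:
    """Element-wise max(y, 0)."""
    return [max(v, 0.0) for v in y]


def forward(weights: List[Matrix], biases: List[Vector], x: Vector) -> Vector:
    """Evaluate the network: ReLU after every layer except the last one."""
    phi = x
    for W, b in zip(weights[:-1], biases[:-1]):
        phi = relu(affine(W, b, phi))
    return affine(weights[-1], biases[-1], phi)


# Toy 2-layer network (hypothetical numbers chosen for easy hand checking).
weights = [[[1.0, -1.0], [0.5, 2.0]], [[1.0, 1.0], [-1.0, 3.0]]]
biases = [[0.0, -1.0], [0.5, 0.0]]
out = forward(weights, biases, [1.0, 2.0])
```

Here the hidden pre-activations are -1 and 3.5, so only the second hidden neuron is active for this input.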
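Given an input data point $x_0 \in \mathbb{R}^{n_0}$ and a bounded $\ell_p$-norm perturbation $\epsilon \in \mathbb{R}_+$, the input $x$ is constrained in an $\ell_p$ ball $B_p(x_0, \epsilon) := \{x \mid \|x - x_0\|_p \le \epsilon\}$. With all possible perturbations in $B_p(x_0, \epsilon)$, the pre-ReLU activation of each neuron has a lower and upper bound $l \in \mathbb{R}$ and $u \in \mathbb{R}$, where $l \le u$. Let us use $l^{(k)}_r$ and $u^{(k)}_r$ to denote the lower and upper bound for the $r$-th neuron in the $k$-th layer, and let $z^{(k)}_r$ be its pre-ReLU activation, where $z^{(k)}_r = W^{(k)}_{r,:} \phi_{k-1}(x) + b^{(k)}_r$, $l^{(k)}_r \le z^{(k)}_r \le u^{(k)}_r$, and $W^{(k)}_{r,:}$ is the $r$-th row of $W^{(k)}$. There are three categories of possible activation patterns and we use $\mathcal{I}^+_k$, $\mathcal{I}^-_k$ and $\mathcal{I}_k$ to denote the sets of $k$-th layer neurons that fit the categories (i)-(iii) respectively:
(i) the neuron is always activated: $l^{(k)}_r \ge 0$,
(ii) the neuron is always inactivated: $u^{(k)}_r \le 0$,
(iii) the neuron could be either activated or inactivated: $l^{(k)}_r < 0 < u^{(k)}_r$.
The formal definition of $\mathcal{I}^+_k$, $\mathcal{I}^-_k$ and $\mathcal{I}_k$ is
$$\mathcal{I}^+_k := \{r \in [n_k] \mid u^{(k)}_r \ge l^{(k)}_r \ge 0\}, \quad \mathcal{I}^-_k := \{r \in [n_k] \mid l^{(k)}_r \le u^{(k)}_r \le 0\}, \quad \mathcal{I}_k := \{r \in [n_k] \mid l^{(k)}_r < 0 < u^{(k)}_r\}.$$
Obviously, $\{\mathcal{I}^+_k, \mathcal{I}^-_k, \mathcal{I}_k\}$ is a partition of the set $[n_k]$.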
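As a small illustration of the three categories, the sketch below sorts the neurons of one layer by their pre-ReLU bounds. The function name `partition_neurons` is hypothetical, indices are 0-based, and the degenerate case where both bounds are exactly zero is assigned to the always-activated set by assumption.

```python
from typing import List, Sequence, Tuple


def partition_neurons(
    lower: Sequence[float], upper: Sequence[float]
) -> Tuple[List[int], List[int], List[int]]:
    """Split neuron indices into (always_on, always_off, uncertain)."""
    always_on: List[int] = []
    always_off: List[int] = []
    uncertain: List[int] = []
    for r, (l, u) in enumerate(zip(lower, upper)):
        if l > u:
            raise ValueError(f"neuron {r}: lower bound exceeds upper bound")
        if l >= 0.0:          # category (i): pre-activation never negative
            always_on.append(r)
        elif u <= 0.0:        # category (ii): pre-activation never positive
            always_off.append(r)
        else:                 # category (iii): sign depends on the input
            uncertain.append(r)
    return always_on, always_off, uncertain


on, off, unsure = partition_neurons([0.5, -2.0, -1.0, 0.0], [1.0, -0.1, 3.0, 0.0])
```

Every index lands in exactly one list, mirroring the partition property stated in the text.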
3.3 Approach 1: Certified lower bounds via linear approximations
3.3.1 Derivation of the output bounds via linear upper and lower bounds for ReLU
In this section, we propose a methodology to directly derive upper bounds and lower bounds of the output of an $m$-layer feed-forward ReLU network. The central idea is to derive an explicit upper/lower bound based on the linear approximations for the neurons in category (iii) and the signs of the weights associated with the activations. Although the idea of using linear approximations is similar to [KW17], our technique is different in three aspects. First, we show that it is possible to derive explicit output bounds without solving any relaxed linear program or its dual. Second, our framework can easily handle general $\ell_p$ norm constrained adversarial distortions, and we demonstrate full results for general $\ell_p$ distortions on both MNIST and CIFAR networks in Section 4. Lastly, though our technique can work with more general convex approximations, we focus on a specific pair of linear bounds for ReLU (see Figure 1(c)) different from [KW17] (see Figure 1(b)) and show that the adopted linear bounds are beneficial to efficient computation in large networks. Figure 2 illustrates the idea of our proposed method.
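We start with a 2-layer network and then extend it to $m$ layers. The $j$-th output of a 2-layer network is:
$$f_j(x) = \sum_{r \in [n_1]} W^{(2)}_{j,r}\, \sigma\left(W^{(1)}_{r,:} x + b^{(1)}_r\right) + b^{(2)}_j.$$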
For neurons in category (i), i.e., , we have ; for neurons in category (ii), i.e., , we have For the neurons in category (iii), [KW17] uses the following convex outer bounds to replace the ReLU activation (illustrated in Figure 0(b)):
where and are the lower bound and upper bound of the pre-ReLU activation of the neuron (i.e. we can think of , ). If this approximation is applied, then for the following holds with :
where . To obtain this upper bound , we take the upper bound of for and its lower bound for . Both cases share a common term of , which is combined into the first summation term in (5) with . Similarly we get the bound for .
We note that this technique also works for other linear upper and lower bounds. For example, we can also obtain and using (1) by considering the signs of . However, our linear approximation (3) gives additional benefits on computational efficiency especially when we extend the bounds to multiple layers. Because (3) has the same slope on both sides, the multiplier before in (5) and (6) are the same regardless of the sign of . This is crucial to efficient computation for multiple layers, as we will be able to recursively compute the multiplier before efficiently. On the other hand, the approximation (1) does not exhibit such a good property.
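The claim that the two linear bounds share one slope can be checked numerically. The sketch below is an illustration under an assumption stated here rather than taken from the text: for an uncertain neuron with bounds l < 0 < u, both lines have slope u / (u - l), the lower line passes through the origin, and the upper line passes through (l, 0) and (u, u). The helper name `relu_linear_bounds` is hypothetical.

```python
from typing import Tuple


def relu_linear_bounds(l: float, u: float) -> Tuple[float, float, float]:
    """Return (slope, lower_intercept, upper_intercept) for an uncertain neuron.

    Assumed relaxation on [l, u]:
        slope * y + lower_intercept <= max(y, 0) <= slope * y + upper_intercept
    """
    if not l < 0.0 < u:
        raise ValueError("the relaxation is only needed when l < 0 < u")
    slope = u / (u - l)
    return slope, 0.0, -slope * l


def relaxation_holds(l: float, u: float, samples: int = 1001) -> bool:
    """Check both inequalities on an evenly spaced grid over [l, u]."""
    slope, low_c, up_c = relu_linear_bounds(l, u)
    for i in range(samples):
        y = l + (u - l) * i / (samples - 1)
        relu_y = max(y, 0.0)
        if slope * y + low_c > relu_y + 1e-12:
            return False
        if relu_y > slope * y + up_c + 1e-12:
            return False
    return True


slope, low_c, up_c = relu_linear_bounds(-1.0, 3.0)
ok = relaxation_holds(-1.0, 3.0)
```

Because the slope is identical on both sides, the coefficient multiplying the pre-activation does not depend on the sign of the outgoing weight; only the constant term does.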
For a general -layer ReLU network with the linear approximation (3), we will show in Theorem 3.5 that the network output can be bounded by two explicit functions when the input is perturbed with a -bounded noise. We start by defining the activation matrix and the additional equivalent bias terms and for the -th layer in Definition 3.3 and the two explicit functions in 3.4.
Definition 3.3 ().
Given matrices and vectors . We define as an identity matrix. For each , we define matrix as follows
We define matrix to be , and for each , matrix is defined recursively as
For each , we define matrices , where
Definition 3.4 (Two explicit functions : and ).
Let matrices , and be defined as Definition 3.3. We define two functions as follows. For each input vector ,
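Now, we are ready to state our main theorem.
Theorem 3.5 (Two side bounds with explicit functions).
Given an $m$-layer ReLU neural network function $f : \mathbb{R}^{n_0} \to \mathbb{R}^{n_m}$, there exist two explicit functions $f^L$ and $f^U$ (see Definition 3.4) such that, for all $j \in [n_m]$ and all $x \in B_p(x_0, \epsilon)$,
$$f^L_j(x) \le f_j(x) \le f^U_j(x).$$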
The proof of Theorem 3.5 is in Appendix B. Since the input $x \in B_p(x_0, \epsilon)$, we can maximize (5) and minimize (6) within this set to obtain a global upper and lower bound of $f_j(x)$. This optimization has an analytic solution for any $p \ge 1$; the result is formally shown in Corollary 3.7, with the proof in Appendix C. In other words, we have analytic bounds that can be computed efficiently without resorting to any optimization solvers for general $\ell_p$ distortion, and this is the key to enabling fast computation of layer-wise output bounds.
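The reason a linear bound can be optimized over a norm ball in closed form is the dual-norm (Hölder) identity: a linear function a·x + c over the ball of radius eps around x0 in the p-norm ranges over a·x0 + c plus or minus eps times the q-norm of a, with 1/p + 1/q = 1. The sketch below illustrates only this generic fact; the function names are hypothetical and it is not the paper's Corollary 3.7 itself.

```python
import math
from typing import Sequence, Tuple


def dual_exponent(p: float) -> float:
    """Return q with 1/p + 1/q = 1, for p >= 1 (including math.inf)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


def lp_norm(v: Sequence[float], q: float) -> float:
    """The q-norm of v, with q = math.inf giving the max-norm."""
    if q == math.inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** q for t in v) ** (1.0 / q)


def linear_range_over_ball(
    a: Sequence[float], c: float, x0: Sequence[float], eps: float, p: float
) -> Tuple[float, float]:
    """Exact (min, max) of a.x + c over the p-norm ball of radius eps at x0."""
    center = sum(ai * xi for ai, xi in zip(a, x0)) + c
    radius = eps * lp_norm(a, dual_exponent(p))
    return center - radius, center + radius


lo2, hi2 = linear_range_over_ball([3.0, -4.0], 1.0, [1.0, 1.0], 0.5, 2)
lo_inf, hi_inf = linear_range_over_ball([3.0, -4.0], 1.0, [1.0, 1.0], 0.5, math.inf)
```

For the l_inf ball the relevant norm of the coefficient vector is its l_1 norm, and for the l_1 ball it is the max-norm.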
We first formally define the global upper bound and lower bound of ,
Definition 3.6 ().
Given a point , a neural network function , parameters . Let matrices , and , be defined as Definition 3.3. For each , we define as
where q satisfy and are defined as
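Corollary 3.7 (Two side bounds in closed-form).
Given a point $x_0 \in \mathbb{R}^{n_0}$, an $m$-layer neural network function $f : \mathbb{R}^{n_0} \to \mathbb{R}^{n_m}$, and parameters $p$ and $\epsilon$. For each $j \in [n_m]$, there exist two fixed values $\gamma^L_j$ and $\gamma^U_j$ (see Definition 3.6) such that, for all $x \in B_p(x_0, \epsilon)$,
$$\gamma^L_j \le f_j(x) \le \gamma^U_j.$$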
3.3.2 Computing pre-ReLU activation bounds
Theorem 3.5 and Corollary 3.7 give us a global lower bound $\gamma^L_j$ and upper bound $\gamma^U_j$ of the $j$-th neuron at the $m$-th layer if we know all the pre-ReLU activation bounds $l^{(k)}$ and $u^{(k)}$, from layer $1$ to $m-1$, as the construction of the matrices in Definition 3.3 requires $l^{(k)}$ and $u^{(k)}$. Here, we show how this can be done easily, layer by layer. We start from $m = 1$, where the network is a single affine map $W^{(1)} x + b^{(1)}$, so the upper and lower bounding functions coincide with it. Then, we can apply Corollary 3.7 to get the output bounds of each neuron and set them as $l^{(1)}$ and $u^{(1)}$. Then, we can proceed to $m = 2$ with $l^{(1)}$ and $u^{(1)}$, compute the output bounds of the second layer by Corollary 3.7, and set them as $l^{(2)}$ and $u^{(2)}$. Repeating this procedure for all $m-1$ layers, we will get all the $l^{(k)}$ and $u^{(k)}$ needed to compute the output range of the $m$-th layer.
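The layer-by-layer procedure can be illustrated on the smallest non-trivial case. The sketch below is not Algorithm 1; it is a simplified instance under assumptions made here: a 2-layer network with a single scalar output, l_inf perturbations only (so the dual norm is l_1), and the same-slope relaxation with slope u / (u - l) for uncertain neurons. All function names and the toy weights are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l1_norm(v):
    return sum(abs(t) for t in v)


def evaluate(W1, b1, w2, b2, x):
    """f(x) = w2 . relu(W1 x + b1) + b2 for a scalar-output 2-layer network."""
    hidden = [max(dot(row, x) + b, 0.0) for row, b in zip(W1, b1)]
    return dot(w2, hidden) + b2


def first_layer_bounds(W1, b1, x0, eps):
    """Exact pre-ReLU ranges over the l_inf ball of radius eps around x0."""
    lower, upper = [], []
    for row, b in zip(W1, b1):
        center, radius = dot(row, x0) + b, eps * l1_norm(row)
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper


def output_bounds(W1, b1, w2, b2, x0, eps):
    """Certified (lower, upper) bounds on f over the l_inf ball."""
    lower, upper = first_layer_bounds(W1, b1, x0, eps)
    coeff = [0.0] * len(x0)   # shared linear coefficient on x
    const = b2                # shared constant term
    upper_shift = 0.0         # extra constant used only by the upper bound
    lower_shift = 0.0         # extra constant used only by the lower bound
    for row, b, w, l, u in zip(W1, b1, w2, lower, upper):
        if l >= 0.0:          # always active: ReLU is the identity
            slope = 1.0
        elif u <= 0.0:        # always inactive: contributes nothing
            continue
        else:                 # uncertain: same slope on both sides
            slope = u / (u - l)
            shift = -w * slope * l
            if w > 0.0:
                upper_shift += shift   # shift >= 0 loosens the upper bound
            else:
                lower_shift += shift   # shift <= 0 loosens the lower bound
        for k, w1 in enumerate(row):
            coeff[k] += w * slope * w1
        const += w * slope * b
    center = dot(coeff, x0) + const
    radius = eps * l1_norm(coeff)
    return center + lower_shift - radius, center + upper_shift + radius


W1 = [[1.0, -1.0], [2.0, 1.0], [-1.0, 0.5]]
B1 = [0.0, -1.0, 0.5]
W2 = [1.0, -2.0, 1.5]
B2 = 0.1
X0 = [0.5, 0.25]
EPS = 0.3
lo, hi = output_bounds(W1, B1, W2, B2, X0, EPS)
```

With these numbers all three hidden neurons are uncertain, so the bounds are strictly looser than the true output range; sampling points inside the ball should never produce a value outside `[lo, hi]`.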
Note that when computing and , the constructed can be saved and reused for bounding the next layer, which facilitates efficient implementations. Moreover, the time complexity of computing the output bounds of an -layer ReLU network with Theorem 3.5 and Corollary 3.7 is polynomial time in contrast to the approaches in [KBD17] and [LM17] where SMT solvers and MIO solvers have exponential time complexity. The major computation cost is to form for the -th layer, which involves multiplications of layer weights in a similar cost of forward propagation.
3.3.3 Deriving maximum certified lower bounds of minimum adversarial distortion
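Suppose $c = \arg\max_i f_i(x_0)$ is the predicted class of the input data point $x_0$ and the targeted attack class is $j$. With Theorem 3.5, the maximum possible lower bounds for the targeted attacks, $\tilde{\epsilon}_j$, and un-targeted attacks, $\tilde{\epsilon}$, are
$$\tilde{\epsilon}_j = \max_{\epsilon}\ \epsilon \ \ \text{s.t.}\ \ \gamma^L_c(\epsilon) - \gamma^U_j(\epsilon) > 0 \quad \text{and} \quad \tilde{\epsilon} = \min_{j \ne c} \tilde{\epsilon}_j.$$
Though it is hard to get analytic forms of $\gamma^L_c(\epsilon)$ and $\gamma^U_j(\epsilon)$ in terms of $\epsilon$, fortunately, we can still obtain $\tilde{\epsilon}_j$ via binary search. This is because Corollary 3.7 allows us to efficiently compute the numerical values of $\gamma^L_c(\epsilon)$ and $\gamma^U_j(\epsilon)$ given $\epsilon$. It is worth noting that we can further improve the bound by considering $g(x) := f_c(x) - f_j(x)$ at the last layer and applying the same procedure to compute the lower bound of $g(x)$ (denoted as $\tilde{\gamma}^L$); this can be done easily by redefining the last layer's weights to be the row vector $W^{(m)}_{c,:} - W^{(m)}_{j,:}$. The corresponding maximum possible lower bound for the targeted attacks is $\tilde{\epsilon}_j = \max_{\epsilon} \epsilon$ s.t. $\tilde{\gamma}^L(\epsilon) > 0$. Our proposed algorithm, Fast-Lin, is shown in Algorithm 1.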
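The binary search step can be sketched independently of how the output bounds are computed. The code below assumes a callable that returns a certified lower bound on the class margin for a given radius, and assumes this lower bound does not increase as the radius grows; both the function name `largest_certified_eps` and the linear toy classifier used to exercise it are hypothetical.

```python
from typing import Callable


def largest_certified_eps(
    margin_lower_bound: Callable[[float], float],
    eps_max: float,
    tol: float = 1e-6,
) -> float:
    """Largest eps in [0, eps_max], up to tol, whose margin bound stays positive.

    Assumes margin_lower_bound(eps) is non-increasing in eps.
    """
    if margin_lower_bound(0.0) <= 0.0:
        return 0.0                      # the point is not even classified correctly
    if margin_lower_bound(eps_max) > 0.0:
        return eps_max                  # certified on the whole search interval
    lo, hi = 0.0, eps_max               # invariant: certified at lo, not at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if margin_lower_bound(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo                           # always return the certified endpoint


# Toy check with a linear margin g(x) = w.x + b under l_inf perturbations,
# where the exact margin lower bound is g(x0) - eps * ||w||_1.
w, b, x0 = [3.0, -4.0], 0.5, [1.0, 0.0]
g_x0 = sum(wi * xi for wi, xi in zip(w, x0)) + b
w_l1 = sum(abs(wi) for wi in w)
eps_star = largest_certified_eps(lambda eps: g_x0 - eps * w_l1, eps_max=1.0)
```

For the linear toy case the exact answer is the margin divided by the l_1 norm of `w`, so the search result can be compared against that value directly. Returning the lower endpoint keeps the result certified even though it is slightly conservative.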
3.4 Approach 2: Certified lower bounds via bounding the local Lipschitz constant
[WZC18] shows that a non-trivial lower bound of the minimum adversarial distortion for an input example $x_0$ in targeted attacks is $\min\left(g(x_0) / L^j_{q,x_0},\, R\right)$, where $g(x) = f_c(x) - f_j(x)$, $L^j_{q,x_0}$ is the local Lipschitz constant of $g(x)$ in $B_p(x_0, R)$, $j$ is the target class, $c$ is the original class, and $1/p + 1/q = 1$. For un-targeted attacks, the lower bound can be presented in a similar form. [WZC18] uses sampling techniques to estimate the local Lipschitz constant and computes a lower bound without certificates.
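The Lipschitz-based bound is a one-line computation once an upper bound on the local Lipschitz constant is available. The sketch below assumes the bound has the form of the margin at the input divided by the Lipschitz constant, capped at the radius on which that constant is valid; the function name and the numbers are hypothetical.

```python
def lipschitz_certified_radius(margin: float, lipschitz: float, radius: float) -> float:
    """Lower bound on the minimum distortion from a local Lipschitz bound.

    margin    : g(x0) = f_c(x0) - f_j(x0), positive if x0 is classified as c
    lipschitz : an upper bound on the local Lipschitz constant of g in the ball
    radius    : radius R of the ball on which the Lipschitz bound holds
    """
    if margin <= 0.0:
        return 0.0                      # nothing can be certified
    if lipschitz <= 0.0:
        return radius                   # g is constant on the ball
    return min(margin / lipschitz, radius)


# Margin 3.5 and Lipschitz bound 7.0: the ratio is 0.5, capped by the ball radius.
r_wide = lipschitz_certified_radius(3.5, 7.0, radius=1.0)
r_narrow = lipschitz_certified_radius(3.5, 7.0, radius=0.3)
```

The cap matters because the Lipschitz constant is only local: nothing is known about the function outside the ball of radius R.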
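Here, we propose a new algorithm to compute a certified lower bound of the minimum adversarial distortion by upper bounding the local Lipschitz constant with guarantees. To start with, let us rewrite the relations of subsequent layers in the following form: $\phi_k(x) = \Lambda^{(k)}\left(W^{(k)} \phi_{k-1}(x) + b^{(k)}\right)$, where $\sigma(\cdot)$ is replaced by the diagonal activation pattern matrix $\Lambda^{(k)}$ that encodes the status of neurons $r$ in the $k$-th layer:
$$\Lambda^{(k)}_{r,r} = \begin{cases} 1 \text{ or } 0 & \text{if } r \in \mathcal{I}_k, \\ 1 & \text{if } r \in \mathcal{I}^+_k, \\ 0 & \text{if } r \in \mathcal{I}^-_k, \end{cases}$$
and $\Lambda^{(m)} = I_{n_m}$. With a slight abuse of notation, let us define $\Lambda^{(k)}_a$ as the diagonal activation matrix for neurons in the $k$-th layer that are always activated, i.e. the $r$-th diagonal entry is $1$ if $r \in \mathcal{I}^+_k$ and $0$ otherwise, and $\Lambda^{(k)}_u$ as the diagonal activation matrix for $k$-th layer neurons whose status is uncertain, i.e. the $r$-th diagonal entry is $1$ or $0$ (to be determined) if $r \in \mathcal{I}_k$, and $0$ otherwise. Therefore, we have $\Lambda^{(k)} = \Lambda^{(k)}_a + \Lambda^{(k)}_u$. We can obtain $\Lambda^{(k)}$ for $x \in B_p(x_0, \epsilon)$ by applying Algorithm 1 and checking the lower and upper bounds for each neuron $r$ in layer $k$.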
3.4.1 A general upper bound of Lipschitz constant in $\ell_q$ norm
The central idea is to compute upper bounds of $L^j_{q,x_0}$ by exploiting the three categories of activation patterns in ReLU networks when the allowable inputs are in $B_p(x_0, \epsilon)$. $L^j_{q,x_0}$ can be defined as the maximum directional derivative as shown in [WZC18]. The maximum directional derivative in an $\ell_p$ ball is the same as the maximum $\ell_q$ norm of the gradient in that ball. For the ReLU network, the maximum gradient norm can be found by examining all the possible activation patterns and taking the one (the worst case) that results in the largest gradient norm. However, as the number of possible activation patterns grows exponentially with the number of neurons, it is impossible to examine all of them by brute force. Fortunately, computing the worst-case pattern on each element of $\nabla f_j(x)$ (i.e. $[\nabla f_j(x)]_k$, $k \in [n_0]$) is much easier and more efficient. In addition, we apply the simple fact that the maximum norm of a vector (which is $\nabla f_j(x)$, $x \in B_p(x_0, \epsilon)$ in our case) is upper bounded by the norm of the vector of per-component maximum absolute values. By computing the worst-case pattern on $[\nabla f_j(x)]_k$ and then taking the norm, we can obtain an upper bound of the local Lipschitz constant, which results in a certified lower bound of the minimum distortion. (Footnote: we can use the triangle inequality to derive an upper bound on the Lipschitz constant (see Appendix D), but this bound is loose as the activation patterns on the neurons are not exploited.)
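The per-coordinate worst-case idea can be illustrated on a network with one hidden layer and a scalar output, where the gradient with respect to input coordinate k is a sum of terms `w2[r] * W1[r][k]` over the active hidden neurons. This sketch is not Algorithm 2; the function names and the toy weights are hypothetical, the l_1 norm of the gradient is used (the case matching l_inf perturbations), and a brute-force enumeration is included only to show that the bound is valid but not always tight.

```python
from itertools import product


def gradient_abs_bounds(W1, w2, always_on, uncertain):
    """Upper bound on |d f / d x_k| for each k over all uncertain patterns."""
    bounds = []
    for k in range(len(W1[0])):
        base = sum(w2[r] * W1[r][k] for r in always_on)
        terms = [w2[r] * W1[r][k] for r in uncertain]
        largest = base + sum(t for t in terms if t > 0.0)    # switch on helpful terms
        smallest = base + sum(t for t in terms if t < 0.0)   # switch on harmful terms
        bounds.append(max(abs(largest), abs(smallest)))
    return bounds


def lipschitz_upper_bound_l1(W1, w2, always_on, uncertain):
    """l_1 norm of the per-coordinate worst cases."""
    return sum(gradient_abs_bounds(W1, w2, always_on, uncertain))


def brute_force_max_grad_l1(W1, w2, always_on, uncertain):
    """Exact maximum gradient l_1 norm by enumerating every activation pattern."""
    best = 0.0
    for pattern in product([0, 1], repeat=len(uncertain)):
        active = list(always_on) + [r for r, on in zip(uncertain, pattern) if on]
        grad = [sum(w2[r] * W1[r][k] for r in active) for k in range(len(W1[0]))]
        best = max(best, sum(abs(g) for g in grad))
    return best


W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 1.0]]
w2 = [2.0, -1.0, 1.0]
bound = lipschitz_upper_bound_l1(W1, w2, always_on=[0], uncertain=[1, 2])
exact = brute_force_max_grad_l1(W1, w2, always_on=[0], uncertain=[1, 2])
```

The gap between `bound` and `exact` arises because each coordinate picks its own worst-case pattern, and those patterns need not be achievable simultaneously. The enumeration costs time exponential in the number of uncertain neurons, whereas the coordinate-wise bound is linear in it.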
Below, we first show how to derive an upper bound of the Lipschitz constant by computing the worst-case activation pattern on $[\nabla f_j(x)]_k$ for 2 layers. Next, we will show how to apply it repeatedly for a general $m$-layer network. Note that for simplicity, we will use $\nabla f_j(x)$ to illustrate our derivation; however, it is easy to extend to $\nabla g(x)$ as $g(x) = f_c(x) - f_j(x)$. Thus, the result for $\nabla g(x)$ is simply the result of $\nabla f_j(x)$ but replacing the last layer weight vector by $W^{(m)}_{c,:} - W^{(m)}_{j,:}$.