Interpreting Adversarial Robustness:
A View from Decision Surface in Input Space
Abstract
One popular hypothesis of neural network generalization is that flat local minima of the loss surface in parameter space lead to good generalization. However, we demonstrate that the loss surface in parameter space has no obvious relationship with generalization, especially under adversarial settings. By visualizing decision surfaces in both parameter space and input space, we instead show that the geometric properties of the decision surface in input space correlate well with adversarial robustness. We then propose an adversarial robustness indicator, which can evaluate a neural network’s intrinsic robustness without testing its accuracy under adversarial attacks. Guided by it, we further propose a robust training method. Without involving adversarial training, our method can enhance a network’s intrinsic robustness against various adversarial attacks.
Fuxun Yu, Chenchen Liu, Yanzhi Wang, Xiang Chen 

Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030
Department of Electrical and Computer Engineering, Clarkson University, Potsdam, NY 13699
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115
fyu2@gmu.edu, chliu@clarkson.edu, yanz.wang@northeastern.edu, xchen26@gmu.com 
1 Introduction
It is commonly believed that a neural network’s generalization is correlated with the geometric properties of its loss surface, i.e. the flatness of the local minima in parameter space (Kawaguchi (2016); Im et al. (2017)). For example, Keskar et al. (2017) showed that sharp local minima lead to poorer generalization. Li et al. (2017) visualized the loss surface and demonstrated that the outstanding performance of ResNet comes from the “wide valley” around its local minima. Chaudhari et al. (2017) then proposed Entropy-SGD, which utilizes the “wide valley” property to guide neural network training toward better generalization.
However, such a generalization estimation approach has recently been challenged by adversarial examples (Szegedy et al. (2013)): even models with good generalization may still suffer from adversarial examples, resulting in extremely low accuracy. For example, a ResNet model usually converges to a wide and flat local minimum on the loss surface in parameter space, as visualized in Fig. 1(a), which indicates good test accuracy according to the aforementioned evaluation approach. Yet when tested on adversarial examples, the model’s accuracy is severely degraded by adversarial noise, which is small but can cause dramatic loss increments. Therefore, the conventional generalization estimation in parameter space fails under adversarial settings, and how to estimate the generalization over adversarial examples, i.e. the adversarial robustness, remains a significant challenge.
Fortunately, the loss increments introduced by adversarial noise are well reflected by the loss surface in input space, as visualized in Fig. 1(b), where the non-smoothness indicates high sensitivity to adversarial noise. The differences in Fig. 1 suggest the ineffectiveness of generalization estimation in parameter space and its potential in input space. Therefore, unlike prior works that focus on parameter space, we explore robustness estimation mainly in input space.
In this work, we have the following contributions:

We analyze the geometric properties of the loss surface in both parameter and input space. We demonstrate that the input space is more essential in evaluating the generalization and adversarial robustness of a neural network;

We reveal the shared mechanism of various adversarial attack methods. To do so, we first extend the concept of the loss surface to the decision surface for clearer geometric visualization. By visualizing adversarial attack trajectories on decision surfaces in input space, we show that various adversarial attacks all exploit the geometric properties of the decision surface to cross the decision boundary within the shortest distance.

We then formalize an adversarial robustness indicator based on the geometric properties of the Jacobian and the Hessian’s eigenvalues. Such an indicator can effectively evaluate a neural network’s intrinsic robustness against various adversarial attacks without field accuracy testing on massive numbers of adversarial examples. It also indicates that a wide and flat plateau of the decision surface in input space enables better generalization and robustness.

We also propose a robust training method guided by our adversarial robustness indicator, which smooths the decision surface and enhances adversarial robustness by regularizing the Jacobian during training.
Our robustness estimation approach has a provable relationship with a neural network’s robustness performance and its geometric properties in input space. This enables us to evaluate the adversarial robustness of a neural network without conducting massive adversarial attacks as a field test. Guided by this estimation approach, our robust training method can also effectively strengthen the neural network against various adversarial attacks without involving time-consuming adversarial training.
2 Ineffectiveness of Adversarial Robustness Estimation from the Loss Surface in Parameter Space
In this section, we compare the effectiveness of the adversarial robustness estimation from the loss surface in both parameter and input space with an effective visualization method.
2.1 Visualizing the Neural Network Loss Surface by Projection
Consider a neural network with a loss function $F(\theta, x)$, where $\theta$ denotes the network parameters (weights and biases) and $x$ is the input. Since the function inputs usually lie in a high-dimensional space, directly visualizing the loss surface is impossible. Therefore, following the methods proposed by Goodfellow et al. (2015) and Li et al. (2017), we project the high-dimensional loss surface onto a low-dimensional space, e.g. a 2D hyperplane, to visualize it. In such methods, two projection vectors $\alpha$ and $\beta$ are chosen and normalized as the base vectors for the two plotting axes. Then, given a starting point $o$, the points around it are interpolated and the corresponding loss values are calculated:
$V(i, j, \alpha, \beta) = F(o + i \cdot \alpha + j \cdot \beta)$ (1)
Here, the original point $o$ in function $F$ can be either in parameter space, which is mostly studied by prior work (e.g. Li et al. (2017); Im et al. (2017)), or in input space, which is our major focus in this paper. The coordinate $(i, j)$ denotes how far the original point moves along the $\alpha$ and $\beta$ directions. After calculating the loss values of enough points, the function $F$ with high-dimensional inputs can be projected onto the chosen hyperplane spanned by the vectors $\alpha$ and $\beta$.
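The interpolation described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors’ implementation: the names `project_surface`, `toy_loss`, `steps`, and `radius` are assumptions, and a quadratic bowl stands in for a real network loss.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def project_surface(f, origin, alpha, beta, steps=5, radius=1.0):
    """Evaluate f at origin + i*alpha + j*beta on a (2*steps+1)^2 grid."""
    alpha, beta = normalize(alpha), normalize(beta)
    coords = [radius * k / steps for k in range(-steps, steps + 1)]
    grid = []
    for i in coords:
        row = []
        for j in coords:
            point = [o + i * a + j * b for o, a, b in zip(origin, alpha, beta)]
            row.append(f(point))
        grid.append(row)
    return coords, grid

def toy_loss(p):
    """Hypothetical stand-in for a loss: quadratic bowl with minimum at 0."""
    return sum(c * c for c in p)

# Project a 3-D function onto the plane spanned by two chosen directions.
coords, grid = project_surface(
    toy_loss, origin=[0.0, 0.0, 0.0], alpha=[1.0, 1.0, 0.0], beta=[0.0, 0.0, 2.0]
)
center = len(coords) // 2
```

Passing the trained weights as `origin` gives a parameter-space surface, while passing an input image gives an input-space surface; `grid` can then be handed to any contour or 3D plotting routine.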
Fig. 1 has already shown two visualized loss surface examples in both parameter space and input space, which give an intuition of the dramatic difference between the two approaches. In the next section, we will further demonstrate that the loss surface geometry in input space is more essential regarding the neural network robustness properties.
2.2 The Loss Surface in Parameter Space vs. Input Space
To support our statement, we examine the robustness of a pair of neural networks with the same ResNet model setting, one trained naturally and the other with the MinMax robust training scheme. Both neural networks achieve high accuracy on the CIFAR10 dataset. However, their degrees of adversarial robustness differ significantly: 0.0% and 44.71% accuracy under adversarial attacks, respectively. To analyze this difference, the loss surfaces and the corresponding contour maps are visualized in both parameter space (Fig. 2) and input space (Fig. 3). Here the vertical axis of the 3D visualization denotes the loss value (as do the numbers on the contour lines in the 2D visualization), and the two horizontal axes correspond to the two projection vectors.
When illustrated in parameter space, both neural networks’ loss surfaces around the local minima are wide and flat, which aligns well with their high accuracy, as stated in Li et al. (2017). However, comparing Fig. 2 (a) and (b), even though the two neural networks have distinct degrees of robustness, there is no obvious difference between their loss surfaces in parameter space.
However, when illustrated in input space as shown in Fig. 3, obvious differences emerge between the natural and robust neural networks: (1) In the 3D surface visualization, the natural neural network’s loss surface in input space has a deep and sharp bottom, while the local minimum on the loss surface of the robust neural network is much flatter; (2) In the contour map visualization, the original inputs on the natural neural network’s surface are located in a very small valley, while those on the robust one lie in a wide area. Thus, in the natural neural network’s case, once small perturbations are injected into the inputs and move them out of the small valley, the loss increases significantly and the prediction can easily be flipped.
These significant distinctions indicate the ineffectiveness of generalization estimation from the loss surface in parameter space, which cannot effectively guide neural network enhancement. Therefore, in terms of generalization and adversarial robustness, we should focus not only on the so-called “wide valley” of the loss surface in parameter space, but also on that in input space. In the next section, we will visualize the adversarial attack trajectory and further demonstrate the close relation between adversarial vulnerability and decision surface geometry in input space.
3 Revealing Adversarial Attacks’ Mechanism through Decision Surface Analysis
Previously, we showed the potential of adversarial robustness estimation in input space. In this section, we further explore the adversarial attacks’ mechanism with input space decision surface.
3.1 Extend Loss Surface to Decision Surface
Directly visualizing loss surfaces in input space has certain disadvantages. The major issue is that there is no explicit decision boundary between correct and wrong predictions for a given input image. Another deficiency of the cross-entropy-based loss surface is that it cannot demonstrate the geometry well, especially when the loss is relatively low (for the same input image, a high-confidence prediction can have a loss similar to that of a low-confidence prediction due to the nonlinear operations; a detailed analysis can be found in the Appendix). This usually causes large blank regions with no useful information in the visualization. To resolve these problems, we extend the concept of the loss surface to the decision surface, which enriches the geometric information and offers a clear decision boundary.
Here, we introduce the definitions of the neural network decision function, decision boundary, and decision surface. For an input image $x$ with label $t$, the decision function of a neural network is defined as follows:
$S(x) = Z(x)_t - \max\{Z(x)_i,\ i \neq t\}$ (2)
Here $Z(x)$ is the logit-layer output before the softmax layer, and $t$ is the true class of input $x$. The decision function $S(x)$ evaluates the confidence of the prediction, i.e. by how much the correct logit exceeds the largest incorrect logit. For correct predictions, $S(x)$ is always positive, and a higher value is usually better (unlike the cross-entropy loss, where lower is better). $S(x) = 0$ indicates equal confidence in the correct class and a wrong class, and is thus the decision boundary between correct and wrong predictions. The surface formed by the function $S(x)$ is called the decision surface, both to distinguish it from the cross-entropy-based loss surface and because it contains the explicit decision boundary.
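A minimal sketch of the decision function described above, next to the cross-entropy loss, may make the contrast concrete. The function names and the toy logits are illustrative assumptions: two predictions with very different margins both have a loss close to zero, which is why the loss surface shows blank regions where the decision surface still carries information.

```python
import math

def decision_value(logits, true_class):
    """Margin between the true-class logit and the largest other logit."""
    best_other = max(z for i, z in enumerate(logits) if i != true_class)
    return logits[true_class] - best_other

def cross_entropy(logits, true_class):
    """Cross-entropy of softmax(logits) against the true class (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_class]

# Positive when the prediction is correct, negative when it is wrong.
correct = decision_value([2.0, 5.5, 1.0], true_class=1)   # 3.5
wrong = decision_value([2.0, 5.5, 1.0], true_class=0)     # -3.5

# Two confident predictions: margins differ by 10, losses are both near zero.
margin_a = decision_value([10.0, 0.0], true_class=0)
margin_b = decision_value([20.0, 0.0], true_class=0)
loss_a = cross_entropy([10.0, 0.0], true_class=0)
loss_b = cross_entropy([20.0, 0.0], true_class=0)
```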
Fig. 4 compares the cross-entropy-based loss surface visualization (first row) and the decision surface visualization (second row). The decision surfaces closely resemble the loss surfaces in their informative areas, but also include the confidence information in the blank areas of the loss surfaces. Meanwhile, the explicit decision boundary, i.e. the contour line where the decision function equals zero, enables us to clearly see where the network decision changes, which is very useful in analyzing adversarial examples, as we show next. In the rest of the paper, we use decision surface visualization by default.
3.2 Shared Mechanism of Various Adversarial Attacks
As shown in Eq. 1, we can project the decision surface onto a hyperplane spanned by two base vectors. Therefore, using the adversarial attack direction as one projection vector (the adversarial axis), we can visualize the adversarially projected decision surfaces and the corresponding attack trajectory along that axis.
For generality, we compare four cases with different projection vectors: the first is a random direction, while the other three are produced from three representative adversarial attack objective functions: the cross-entropy non-targeted loss, the cross-entropy targeted loss (least-likely class), and the C&W loss (Kurakin et al. (2016); Carlini & Wagner (2017)):
(3) 
where the true class label and the least-likely class label are both one-hot vectors.
The decision surface and adversarial trajectory visualization results are shown in Fig. 4. The lengths of the blue and green arrows denote the distance needed to cross the decision boundary. In the random projection (a), the green arrows are long. This indicates that along a random direction, the original input is far from the neural network’s decision boundary and from the wrong-classification regions, where the decision value is negative. This explains the common observation that adding random noise to natural images does not significantly degrade neural network accuracy. However, in the adversarially projected hyperplanes (b)-(d), the wrong regions are much closer, as indicated by the extremely short arrows: along the adversarial axis, adversarial examples can easily be found within a very short distance.
Comparing the different adversarial attack trajectories in Fig. 4(b)-(d), we find that they all demonstrate similar behavior even though their objective functions are designed differently: all adversarial attack directions show extremely dense contour lines. These mark the steepest descent direction for crossing the decision boundary and entering the wrong-classification regions. Therefore, we can summarize the mechanism shared by different adversarial attacks: they exploit the geometric information of the decision surface to cross the decision boundary within the shortest distance.
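The gap between random and adversarial directions can be illustrated on a toy linear margin, where the distance to the boundary has a closed form. This is a sketch under strong assumptions: the decision function is taken to be linear, $w \cdot x + b$, which a real network is not, and the names `distance_to_boundary`, `w`, `b`, and `dim` are hypothetical.

```python
import math
import random

def distance_to_boundary(w, b, x, direction):
    """Distance to travel from x along `direction` until w.x + b reaches zero.

    Assumes x is correctly classified (w.x + b > 0). Returns math.inf when
    the margin does not decrease along the given direction.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    slope = sum(wi * ui for wi, ui in zip(w, unit))
    return margin / -slope if slope < 0 else math.inf

dim = 100
w, b = [1.0] * dim, 0.0
x = [0.01] * dim                       # margin w.x + b is about 1

# Adversarial direction: steepest descent of the margin, i.e. -w.
adversarial = distance_to_boundary(w, b, x, [-wi for wi in w])

# Random directions drawn from an isotropic Gaussian with a fixed seed.
rng = random.Random(0)
random_distances = [
    distance_to_boundary(w, b, x, [rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(20)
]
```

In this toy setting the steepest descent direction reaches the boundary after a distance of about 0.1, and by the Cauchy-Schwarz inequality no other direction can be shorter.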
Meanwhile, our visualization results reveal the nature of adversarial examples: although a neural network’s training loss in parameter space seems to converge well after training, there still exist large regions of points that the neural network fails to classify correctly (as shown by the large negative regions on the adversarially projected hyperplanes). Some of these regions are extremely close to the original input points. Since the data points in such regions lie in the close neighborhood of natural input images, they look no different to human vision, and they are what is conventionally recognized as adversarial examples.
Therefore, we conclude that rather than being “crafted” by adversarial attacks, adversarial examples are “naturally existing” points. Rather than defending against “adversarial attacks”, the essential solution for robustness enhancement is to solve the “neighborhood underfitting” issue of neural networks. We then propose an adversarial robustness evaluation approach, which uses the differential geometric properties of the decision surface to interpret and evaluate neural network robustness.
4 Adversarial Robustness Indicator with Decision Surface Geometry
4.1 Theoretical Robustness Bound based on Second-Order Taylor Expansion
Suppose the neural network decision function $S(x)$ is second-order differentiable. As noted above, the decision surface in input space captures more information about adversarial vulnerability. Since $S(x)$ has no explicit formulation, we use the second-order Taylor approximation w.r.t. the input $x$ to approximate it within a neighborhood:
$S(x + \Delta x) \approx S(x) + J^{\top} \Delta x + \frac{1}{2} \Delta x^{\top} H \Delta x$ (4)
where the network parameters $\theta$ are held fixed and $x$ is the input feature vector. The Jacobian vector $J$ has the same dimension as $x$, and the Hessian matrix $H$ is the square matrix of second-order partial derivatives of $S$ with respect to $x$.
Consider a correctly classified input, i.e. one whose decision value (confidence) is positive. In adversarial settings, the adversarial robustness of a neural network means that, given a feasible perturbation set (e.g. a norm-bounded constraint), no perturbation in this set can change the decision. Formally, it can be defined as follows:
(5) 
To connect this objective function with our decision surface, we enforce a new constraint:
(6) 
Under this constraint, the decision value at the perturbed input has the same sign as at the original input, so Eq. 5 is strictly guaranteed. Meanwhile, this formulation enforces a stronger constraint: when a neural network makes a prediction, its neighboring points should not only share the same decision but also have similar confidence, with a bounded absolute difference, similar to mixup (Zhang et al. (2017)).
Then, combining Taylor Approximation in Eq. 4 and Eq. 6, the following inequality can be derived:
(7) 
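Since the Hessian matrix $H$ is real and symmetric, it is orthogonally diagonalizable and can be decomposed by eigendecomposition: $H = Q \Lambda Q^{\top}$. Here $Q$ is the matrix composed of $H$’s eigenvectors, and $\Lambda$ is the diagonal matrix whose only nonzero entries are $H$’s eigenvalues $\lambda_i$ on the diagonal. Let $v = Q^{\top} \Delta x$, where $\Delta x$ is the input perturbation; we have:
$\Delta x^{\top} H \Delta x = \Delta x^{\top} Q \Lambda Q^{\top} \Delta x = v^{\top} \Lambda v = \sum_i \lambda_i v_i^2$ (8)
Therefore, the perturbation term in Eq. 7 is upper-bounded as follows:
$\left| J^{\top} \Delta x + \frac{1}{2} \Delta x^{\top} H \Delta x \right| \le \sum_i |J_i| \, |\Delta x_i| + \frac{1}{2} \sum_i |\lambda_i| \, v_i^2$ (9)
Here $v_i$ and $\Delta x_i$ are the entries of the vectors $v$ (the perturbation expressed in the Hessian’s eigenvector basis) and $\Delta x$, which depend on the choice of perturbation. $J_i$ is the $i$-th entry of the Jacobian vector, and $\lambda_i$ is the $i$-th eigenvalue of the Hessian $H$. Intuitively, given the constraints on the perturbation (e.g. norm constraints), the upper bound depends heavily on the Jacobian and the Hessian. As long as the magnitude of every Jacobian entry and every Hessian eigenvalue is kept minimal, e.g. near zero, the influence of the perturbation is constrained to a certain range, i.e. the network is robust to any noise within the constraints. Therefore, the average magnitudes of these two parameter sets of a neural network can be defined as its robustness indicator.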
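As a worked instance of this kind of bound, here is a sketch for an $\ell_\infty$-bounded perturbation. The notation is an assumption made for illustration: $J$ and $H$ are the Jacobian and Hessian of the decision function at the input, $\lambda_i$ are the eigenvalues of $H$, $\Delta x \in \mathbb{R}^n$ is a perturbation with $\|\Delta x\|_\infty \le \epsilon$, and $v$ is $\Delta x$ expressed in the orthonormal eigenvector basis of $H$, so that $\|v\|_2 = \|\Delta x\|_2$.

```latex
\begin{align*}
\left| J^{\top}\Delta x + \tfrac{1}{2}\,\Delta x^{\top} H \Delta x \right|
  &\le \sum_{i} |J_i|\,|\Delta x_i| + \tfrac{1}{2}\sum_{i} |\lambda_i|\, v_i^{2} \\
  &\le \epsilon\,\|J\|_{1} + \tfrac{1}{2}\,\max_i |\lambda_i|\,\|\Delta x\|_2^{2} \\
  &\le \epsilon\,\|J\|_{1} + \tfrac{1}{2}\, n\,\epsilon^{2}\,\max_i |\lambda_i| .
\end{align*}
```

Both terms shrink as the Jacobian entries and the Hessian eigenvalues approach zero, which is the sense in which their magnitudes can serve as a robustness indicator.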
4.2 The Geometric Explanation of Robustness Indicator
As shown in Eq. 9, model robustness relies heavily on the magnitudes of the Jacobian entries and the Hessian eigenvalues. In differential geometry, these parameters have specific geometric meanings:
For a multivariable function, the $i$-th Jacobian entry measures the slope of the tangent at the given point along the $i$-th axis, where a low value denotes a flat neighborhood. Therefore, it is easy to understand that a smaller Jacobian leads to a flatter minimum. Meanwhile, the magnitude of the Hessian eigenvalues denotes the curvature (Alain et al. (2018); Cheeger & Ebin (2008)), which is defined as:
(10) 
The curvature is the reciprocal of the radius of the osculating circle at the current point. The concept in the simple 1D case is shown in Fig. 5. Clearly, lower curvature (smaller eigenvalues) means that the surface bends less, leading to a wider neighborhood around the original point.
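The 1D case can be checked numerically. The sketch below uses the standard textbook curvature of a plane curve $y = f(x)$, $|f''| / (1 + f'^2)^{3/2}$, which is assumed here rather than taken from the paper’s own equation, and estimates the derivatives by central differences. At a point with zero slope the curvature reduces to $|f''|$, the 1D analogue of a Hessian eigenvalue magnitude.

```python
def curvature_1d(f, x, h=1e-4):
    """Curvature |f''| / (1 + f'^2)^(3/2) of the graph of f at x."""
    first = (f(x + h) - f(x - h)) / (2 * h)               # central difference f'
    second = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)   # central difference f''
    return abs(second) / (1 + first * first) ** 1.5

# Two toy valleys with their minimum at x = 0.
sharp = curvature_1d(lambda x: 5.0 * x * x, 0.0)   # narrow valley
flat = curvature_1d(lambda x: 0.1 * x * x, 0.0)    # wide valley

# Radius of the osculating circle is the reciprocal of the curvature.
sharp_radius = 1.0 / sharp
flat_radius = 1.0 / flat
```

The narrow valley has curvature of about 10 (osculating radius about 0.1), while the wide valley has curvature of about 0.2 (radius about 5).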
Based on the differential geometry meaning of Jacobian and Hessian, we could conclude that both constraints on Jacobian and Hessian in Eq. 9 appeal for a wider and flatter local neighborhood of input space decision surface (not parameter space). This is consistent with our preliminary visualization results in Fig. 3. Next, we will qualitatively and quantitatively demonstrate the effectiveness of our neural network robustness indicator.
4.3 Robustness Indicator Evaluation: A Case Study
Decision Surface of Natural Model vs. Robust Model. In this section, we compare two pairs of robust and natural models on MNIST and CIFAR10. The two pairs of models were released in the MNIST/CIFAR adversarial challenges (Madry et al. (2018)), with the same structure and comparable accuracy in natural test settings but different degrees of robustness. The robust MNIST model is trained by MinMax optimization and achieves 88% accuracy under all attacks within the given perturbation constraints on a (0, 1) pixel range; it is believed to be currently the most robust model on the MNIST dataset. By contrast, the natural model can be totally broken, with 0.0% accuracy under the same constraints. The CIFAR models are the same as in Sec. 2. To support our geometric robustness theory, we first visualize the two pairs of models’ decision surfaces in Fig. 6 (MNIST) and Fig. 7 (CIFAR10).
From Fig. 6, we can see a significant difference between the natural and robust decision surfaces. On the robust decision surfaces (c) and (d), whether we choose a random or an adversarial projection, all neighborhood points around the original input point lie on a high plateau with positive decision values. The surface in the neighborhood is rather flat, with minimal slopes, until it reaches the given adversarial attack constraint. This explains its exceptional robustness against all adversarial attacks. By contrast, the natural decision surfaces show sharp peaks and large slopes, on which the decision confidence quickly drops to negative values along the adversarial axis. For the CIFAR10 models shown in Fig. 7, similar conclusions can be drawn: the degree of adversarial robustness depends on how well a model fits the neighborhood of the input points, and a flat and wide plateau around the original points on the decision surface is one of the most desired properties of a robust model (more examples can be found in the Appendix).
Jacobian and Hessian Statistics Analysis. We also analyze the statistics of the Jacobian and Hessian matrices of the natural and robust MNIST models above.
We randomly take input images from test set and calculate their Jacobian and Hessian using natural and robust model, respectively. The data distribution and visualization results are shown in Fig. 8.
First, the norm of the robust model’s Jacobian is much smaller than the natural model’s: on average, it is about ten times smaller. The norm of the robust model’s Hessian is likewise about two times smaller than the natural model’s. Therefore, the statistics of the Jacobian and Hessian also support our robustness indicator theory in Eq. 9. Another significant difference is that, compared to the natural model, the robust model’s Jacobian and Hessian are sparser. For the natural and robust models, the ratios of zeros in the Jacobian are 1.1% and 54.0%, respectively. As for the Hessian, the ratios of zero eigenvalues for the natural and robust models are 65.6% and 97.1%, respectively (since most Jacobian and Hessian entries are near zero but not exactly zero, values below a small threshold are counted as zeros). One intuitive explanation of the relationship between robustness and Jacobian sparsity is that the natural model’s Jacobian contains many meaningless, noisy gradients, while most nonzero entries of the robust model’s Jacobian concentrate on the main trace of the digits, the so-called main features. This explanation is based on our observation that for most input images, the robust model’s Jacobian precisely captures the main trace of the digits or the main patterns of the object, as shown in Fig. 8. Consequently, if adversarial noise is injected uniformly into every pixel of the input image, only a small portion of it is likely to influence the decision, making the model more robust to perturbations.
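The statistics discussed above (Jacobian norm and ratio of near-zero entries) can be computed as in the following sketch. Everything here is an illustrative assumption: `toy_decision` stands in for a network’s decision function, the Jacobian is estimated by central differences rather than backpropagation, and the threshold `1e-6` is a placeholder, not the value used in the paper.

```python
def numerical_jacobian(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def l1_norm(values):
    """Sum of absolute values of the entries."""
    return sum(abs(v) for v in values)

def zero_ratio(values, threshold):
    """Fraction of entries whose magnitude is below `threshold`."""
    return sum(1 for v in values if abs(v) < threshold) / len(values)

def toy_decision(x):
    """Hypothetical decision function that ignores its third input."""
    return 3.0 * x[0] - 0.5 * x[1] ** 2

jac = numerical_jacobian(toy_decision, [1.0, 2.0, 3.0])   # about [3, -2, 0]
jac_norm = l1_norm(jac)
jac_sparsity = zero_ratio(jac, threshold=1e-6)
```

For a real model the same two statistics would be averaged over many test inputs, with the Jacobian obtained from the framework’s automatic differentiation.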
5 Toward Robustness Enhancement against Adversarial Attacks
5.1 Robust Training for Smoothing Decision Surface in Input Space
As shown in Eq. 9, better robustness of a neural network requires Jacobian and Hessian magnitudes that are low or zero. Here, we use the Jacobian of our decision function w.r.t. the input. Therefore, to improve network robustness and flatten the local minima, we propose a simple yet effective approach: adding a regularizer on the Jacobian of the decision function to the network training process. This can be done through double backpropagation (Drucker & Le Cun (1991)). As for the Hessian, calculating its eigenvalues is computationally expensive in high-dimensional spaces, and there is currently no efficient way to regularize them in the neural network context. Possible techniques include Hessian diagonal approximation (Martens (2010); Becker et al. (1988)), which we leave as future work.
To regularize the Jacobian of the decision function, we add a regularization term on it to the network training loss:
(11) 
where one term is the cross-entropy loss, and a hyperparameter controls the penalty strength. For the regularization term, different norms of the Jacobian can be chosen. As mentioned above, introducing the gradient loss into the training loss requires solving a second-order gradient computation problem, for which double backpropagation (Drucker & Le Cun (1991)) is needed. The cross-entropy loss is first computed by forward propagation, and the gradients are then calculated by backpropagation. Note that both the gradient w.r.t. the network parameters and the gradient w.r.t. the input need to be calculated, as required in Eq. 12. Then, to minimize the gradient loss, the second-order mixed partial derivative of the gradient loss w.r.t. the network parameters is calculated. Note that this mixed partial derivative is different from the Hessian (the pure second-order derivative w.r.t. either the parameters or the input alone), and is thus tractable to compute (Ororbia et al. (2016)). After this, a second backpropagation operation is performed, and the weights of the neural network are updated by gradient descent:
(12) 
Compared to adversarial-training-based defense techniques, our proposed robust training method does not rely on any adversarial example generation method, and is thus capable of defending against different adversarial attacks. The main extra computation overhead is the doubled backpropagation time, which we quantify in the following experiments.
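The structure of this training objective can be sketched on a toy model where every derivative is available in closed form. The sketch below is an assumption-laden illustration, not the authors’ training code: the model is linear with decision function $S(x) = y \, (w \cdot x)$, the classification loss is the logistic loss, the penalty is the squared $\ell_2$ norm of the input gradient, and the names `train`, `penalty`, and `input_gradient_norm` are hypothetical. For this linear model the input gradient is $y \, w$, so the mixed derivative of the penalty w.r.t. the weights is simply $2w$; in a deep network it would instead be obtained by a second backpropagation pass.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, penalty, lr=0.1, epochs=200):
    """Gradient descent on mean logistic loss + penalty * ||dS/dx||^2.

    Toy model: S(x) = y * (w . x), so dS/dx = y * w and ||dS/dx||^2 = ||w||^2.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Derivative of the mean logistic loss w.r.t. (w . x).
            coeff = -sigmoid(-margin) * y / len(data)
            grad = [g + coeff * xi for g, xi in zip(grad, x)]
        # Mixed derivative d/dw ||dS/dx||^2 = 2 * w for this linear model.
        grad = [g + penalty * 2.0 * wi for g, wi in zip(grad, w)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def input_gradient_norm(w):
    """L2 norm of dS/dx, which equals ||w|| for the toy model."""
    return math.sqrt(sum(wi * wi for wi in w))

data = [([2.0, 1.0], 1), ([-2.0, -1.0], -1)]
w_plain = train(data, penalty=0.0)
w_smooth = train(data, penalty=0.05)
```

Both weight vectors classify the two toy points correctly, but the regularized one has a smaller input-gradient norm, i.e. a flatter decision surface around the inputs.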
5.2 Robust Training Performance Evaluation
To compare the effect of robustness enhancement, we evaluate several previous techniques alongside ours: adversarial training (Kurakin et al. (2016)), cross-entropy gradient regularization (Ross & Doshi-Velez (2018)), MinMax training (Madry et al. (2018)), our method, etc. The evaluated adversarial attacks include the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), and the C&W attack. Detailed experiment settings can be found in the Appendix. Final results are shown in Tables 1 and 2.
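| Models | Natural | FGSM 0.1 | FGSM 0.2 | FGSM 0.3 | BIM 0.1 | BIM 0.2 | BIM 0.3 | C&W 0.1 | C&W 0.2 | C&W 0.3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Natural Model | 99.1 | 67.3 | 12.9 | 4.7 | 22.5 | 0.0 | 0.0 | 21.6 | 0.0 | 0.0 |
| AdvTrain | 99.1 | 73.0 | 52.7 | 10.9 | 62.0 | 6.5 | 0.0 | 71.09 | 17.0 | 2.1 |
| CrossEntropy | 99.2 | 91.6 | 60.4 | 18.3 | 87.9 | 19.9 | 0.0 | 88.09 | 20.0 | 0.0 |
| Ours | 98.4 | 91.6 | 70.3 | 41.6 | 88.1 | 64.9 | 26.7 | 89.2 | 72.6 | 37.6 |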
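| Models | Natural | FGSM 3 | FGSM 6 | FGSM 9 | BIM 3 | BIM 6 | BIM 9 | C&W 3 | C&W 6 | C&W 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Natural Model | 87.2 | 5.8 | 2.4 | 1.6 | 0.7 | 0.0 | 0.0 | 0.6 | 0.0 | 0.0 |
| AdvTrain* | 84.5 | 10.2 | 5.8 | 2.6 | 1.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| CrossEntropy* | 86.2 | 19.1 | 9.5 | 6.1 | 2.6 | 0.7 | 0.4 | 2.1 | 1.5 | 1.4 |
| Ours | 84.2 | 59.8 | 41.9 | 31.0 | 54.6 | 29.5 | 20.3 | 53.7 | 29.8 | 20.1 |
| Ours+AdvTrain | 83.1 | 68.5 | 48.5 | 38.2 | 62.7 | 39.3 | 30.3 | 60.5 | 39.0 | 30.3 |
| MinMax* | 79.4 | 65.8 | 55.6 | 47.4 | 64.2 | 49.3 | 41.1 | 62.9 | 48.5 | 40.7 |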
On the MNIST dataset, our proposed gradient regularization method matches or exceeds the accuracy of the other methods under every considered attack setting. Cross-entropy gradient regularization (Ross & Doshi-Velez (2018)) achieves robustness similar to ours at the smallest perturbation size, but its robustness drops very quickly as the attack strength increases, e.g. at the largest perturbation size. The reason is that the softmax and cross-entropy operations introduce unnecessary nonlinearity, in which case the gradients of the cross-entropy loss are already very small.
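The saturation effect mentioned above can be seen directly at the logit level. In this sketch (function names and toy logits are illustrative assumptions), the gradient of the cross-entropy loss w.r.t. the logits is softmax minus the one-hot label, which nearly vanishes for a confident prediction, whereas the gradient of the logit margin keeps unit magnitude. A penalty on the cross-entropy gradient therefore has little left to regularize once the prediction is confident.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad_wrt_logits(logits, true_class):
    """d(cross-entropy)/d(logits) = softmax(logits) - onehot(true_class)."""
    probs = softmax(logits)
    return [p - (1.0 if i == true_class else 0.0) for i, p in enumerate(probs)]

def margin_grad_wrt_logits(logits, true_class):
    """d(margin)/d(logits): +1 at the true class, -1 at the runner-up, else 0."""
    runner_up = max(
        (i for i in range(len(logits)) if i != true_class),
        key=lambda i: logits[i],
    )
    return [
        1.0 if i == true_class else (-1.0 if i == runner_up else 0.0)
        for i in range(len(logits))
    ]

logits = [12.0, 0.0, -1.0]             # a confident, correct prediction
ce_grad = ce_grad_wrt_logits(logits, true_class=0)
margin_grad = margin_grad_wrt_logits(logits, true_class=0)
largest_ce_entry = max(abs(g) for g in ce_grad)
```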
Improving robustness on the CIFAR10 dataset is much harder than on MNIST. State-of-the-art MinMax training achieves the highest accuracy under the strongest attacks in the considered settings. However, this method relies heavily on a huge amount of adversarial data augmentation, which incurs over 10 times the training overhead. By contrast, our method does not need adversarial example generation and achieves comparable robustness under small perturbation constraints. We measure the average time per epoch for both natural training and gradient-regularized training: on average, our method takes 2.1 times as long per epoch as natural training. Note that the robustness enhancement of our method diminishes as the perturbation size grows. This shows one limitation of gradient regularization methods: our approach is based on a Taylor approximation in a small neighborhood, so when adversarial examples exceed the reasonable approximation range, the effect of gradient regularization is also exhausted. Empirically, we found that robust training usually takes more epochs to converge: on CIFAR10, natural training takes about 30 epochs, our method usually needs 100 epochs, and MinMax robust training takes over 400 epochs to converge in our implementation.
5.3 Analyzing the Input Space Decision Surface and Statistics
To test whether our robust training method flattens the local minima of the decision surface, we also visualize and compare the decision surfaces of the natural model and our model, as shown in Fig. 9. Compared to the natural model’s surface, our model’s clearly has a wider local neighborhood and lower slopes, as expected. Meanwhile, the statistics of the Jacobian and Hessian of the MNIST models also align well with our robustness indicator: the average norms of the Jacobian and Hessian of our model are both smaller than those of the natural model.
Fig. 10 shows several examples and the Jacobian visualizations of both the natural and robust models (normalized for visualization).
The robust model’s Jacobian demonstrates better capability to capture the main feature of images on both MNIST and CIFAR10, as mentioned before.
6 Related Work
One popular previous hypothesis is that a neural network’s good generalization comes from flat local minima of the loss function in parameter space (Im et al. (2017); Keskar et al. (2017); Dinh et al. (2017); Kawaguchi (2016)). For example, Li et al. (2017) propose a visualization technique that establishes a good connection between minima geometry and generalization on ResNet. However, the recent introduction of adversarial examples challenges this generalization theory. Many adversarial attack methods have been proposed (Szegedy et al. (2013); Kurakin et al. (2016); Carlini & Wagner (2017); Papernot et al. (2016a)). Current defense techniques include adversarial training (Ian J. Goodfellow (2014)), defensive distillation (Papernot et al. (2016b)), Parseval networks (Cisse et al. (2017)), MinMax robustness optimization (Madry et al. (2018)), adversarial logit pairing (Kannan et al. (2018)), etc. The original adversarial training techniques augment the natural training samples with the corresponding adversarial examples together with their correct labels. The recently proposed MinMax robustness optimization (Madry et al. (2018)) augments the training dataset with a large number of adversarial examples that cause the maximum loss increment within a norm ball, and is currently the strongest defense.
7 Conclusion
In this work, through visualizing network loss surface in parameter and input space, we point out the ineffectiveness of previous generalization theory under adversarial settings. Meanwhile, we show that adversarial examples are essentially the neighborhood underfitting issue of neural networks in input space. We then derive the connection between network robustness and decision surface geometry as an indicator of the neural network’s adversarial robustness. Guided by the indicator, we propose a practical robust training method, which involves no adversarial example generation. Extensive visualization results and experiments verify our theory and demonstrate the effectiveness of our proposed robustness enhancement method.
References
 Alain et al. (2018) Guillaume Alain, Nicolas Le Roux, and Pierre-Antoine Manzagol. Negative eigenvalues of the Hessian in deep neural networks. 2018.
 Becker et al. (1988) Sue Becker, Yann Le Cun, et al. Improving the convergence of backpropagation learning with second order methods. In Proceedings of the 1988 connectionist models summer school, pp. 29–37. San Mateo, CA: Morgan Kaufmann, 1988.
 Carlini & Wagner (2017) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
 Chaudhari et al. (2017) P Chaudhari, Anna Choromanska, S Soatto, Yann LeCun, C Baldassi, C Borgs, J Chayes, Levent Sagun, and R Zecchina. Entropy-SGD: Biasing gradient descent into wide valleys. In International Conference on Learning Representations (ICLR), 2017.
 Cheeger & Ebin (2008) Jeff Cheeger and David G Ebin. Comparison theorems in Riemannian geometry, volume 365. American Mathematical Soc., 2008.
 Cisse et al. (2017) Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In International Conference on Machine Learning, pp. 854–863, 2017.
 Dinh et al. (2017) Laurent Dinh, Razvan Pascanu, Samy Bengio, and Yoshua Bengio. Sharp minima can generalize for deep nets. In International Conference on Machine Learning, pp. 1019–1028, 2017.
 Drucker & Le Cun (1991) Harris Drucker and Yann Le Cun. Double backpropagation increasing generalization performance. In Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on, volume 2, pp. 145–150. IEEE, 1991.
 Goodfellow et al. (2015) Ian J Goodfellow, Oriol Vinyals, and Andrew M Saxe. Qualitatively characterizing neural network optimization problems. In International Conference on Learning Representations (ICLR), 2015.
 Ian J. Goodfellow (2014) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 Im et al. (2017) Daniel Jiwoong Im, Michael Tao, and Kristin Branson. An empirical analysis of deep network loss surfaces. arXiv preprint arXiv:1612.04010, 2017.
 Kannan et al. (2018) Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.
 Kawaguchi (2016) Kenji Kawaguchi. Deep learning without poor local minima. In Advances in Neural Information Processing Systems, pp. 586–594, 2016.
 Keskar et al. (2017) Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations (ICLR), 2017.
 Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 Li et al. (2017) Hao Li, Zheng Xu, Gavin Taylor, and Tom Goldstein. Visualizing the loss landscape of neural nets. arXiv preprint arXiv:1712.09913, 2017.
 Madry et al. (2018) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. 2018.
 Martens (2010) James Martens. Deep learning via Hessian-free optimization. In International Conference on Machine Learning, 2010.
 Ororbia et al. (2016) Alexander G Ororbia II, C Lee Giles, and Daniel Kifer. Unifying adversarial training algorithms with flexible deep data gradient regularization. arXiv preprint arXiv:1601.07213, 2016.
 Papernot et al. (2016a) Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pp. 372–387. IEEE, 2016a.
 Papernot et al. (2016b) Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016b.
 Ross & Doshi-Velez (2018) Andrew Slavin Ross and Finale Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI, 2018.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Zhang et al. (2017) Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
Appendix A Appendix
A.1 The Explanation of Ineffectiveness of Loss Surface with Blanks
In Sec. 3, we mentioned that loss surfaces often exhibit large blank regions. The reason is the exponential and log operations involved in the cross entropy calculation. In this section, we give a simple case to show this ineffectiveness and demonstrate how the blank regions arise. Consider a 10-class neural network and one input image with label 0. Suppose we have ten different logit outputs whose confidence scores for the correct class range from low to high, and we compute the corresponding cross entropy loss for each of the ten predictions. In the low-confidence cases, the cross entropy loss shows an informative trend as the confidence increases. But once the prediction confidence becomes sufficiently high, the loss hardly changes, which produces large blank regions when visualizing the loss surface, consistent with Fig. 11.
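The saturation effect described above can be reproduced with a few lines of Python. This is a minimal sketch, not the paper's original example: the confidence values in `confidences` are hypothetical choices, and the helper `logits_for_confidence` (which sets every wrong-class logit to zero) is an assumption made only to produce logits with a prescribed true-class confidence.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross entropy loss of a single prediction against an integer label."""
    return -math.log(softmax(logits)[label])

def logits_for_confidence(p, num_classes=10):
    """Hypothetical helper: logits whose softmax gives class 0 probability p.

    All wrong-class logits are 0, so p = e^z / (e^z + K - 1),
    which gives z = log(p * (K - 1) / (1 - p)).
    """
    z = math.log(p * (num_classes - 1) / (1.0 - p))
    return [z] + [0.0] * (num_classes - 1)

# Hypothetical true-class confidences, from chance level to near certainty.
confidences = [0.1, 0.3, 0.5, 0.7, 0.9, 0.99, 0.999]
losses = [cross_entropy(logits_for_confidence(p), 0) for p in confidences]

for p, loss in zip(confidences, losses):
    print(f"confidence={p:<6} loss={loss:.4f}")
```

Because the loss equals -log(p) for the true-class confidence p, moving from 0.1 to 0.5 lowers the loss by about 1.61, while moving from 0.99 to 0.999 lowers it by only about 0.009. On a plotted loss surface, the high-confidence region therefore appears nearly flat.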
A.2 Decision Surface Visualization with More Input Points
A.2.1 Comparisons of Loss Surface and Decision Surface With MNIST Input Images
A.2.2 Comparisons of Robust and Natural Models With CIFAR Input Images
a.3 Experiment Settings
In the evaluation on MNIST dataset, a fourlayer neural network model with two convolutional layers and two fully connected layers is adopted. After natural training, the baseline model achieves 99.17% accuracy. And for CIFAR10, we use a regular ConvNet with five convolutional layers and one global average pooling layer. For iterative methods BIM and C&W attack, we use 10 iterations and step size = 0.1 on MNIST and 1 on CIFAR. In adversarial training method, we use C&W attack to generate adversarial examples: For MNIST, we use 10 iterations, step size = 0.1, on pixel range . For CIFAR10, we use 10 iterations, step size = 1, on pixel range . The gradient regularization coefficient c (in Eq. 12) is set to 500 for gradient regularization.