# Generating Minimal Adversarial Perturbations with Integrated Adaptive Gradients

###### Abstract

We focus on the problem of generating gradient-based adversarial perturbations in the image classification domain. Substantial pixel perturbations alter the features that deep neural networks extract from clean images and thereby fool deep neural models into making incorrect predictions. However, large-scale pixel modification makes the changes visible even when the attack succeeds. To find optimal perturbations that directly quantify the boundary distance between clean images and adversarial examples in latent space, we propose a novel method for generating integrated adversarial perturbations, which formulates perturbations at the level of adaptive and integrated gradients. Our approach invokes a few adaptive gradient operators to seek the decision boundary between original images and their adversarial counterparts directly. We compare our method for crafting adversarial perturbations with other state-of-the-art gradient-based attack methods. Experimental results suggest that adversarial samples generated by our approach fool deep neural classification networks efficiently with lower pixel modification and show good transferability across image classification models.

## 1 Introduction

Deep Neural Networks (DNNs) [1, 2] have achieved state-of-the-art performance in image classification [3, 4], natural language processing [5], and audio processing [6]. However, as research in these domains has deepened, it has been found that deep neural models are astonishingly susceptible to adversarial attacks [7, 8]; in the image classification domain especially, small-magnitude perturbations added to the input can cause DNNs to misclassify [7, 9]. In general, adversarial attack methods generate perturbations by estimating the decision distances between different classes [10, 9, 11]; that is, fooling a deep neural model with adversarial examples amounts to finding the boundary distance that misleads the DNN into mislabeling the input [8, 12]. Several techniques have been proposed to craft adversarial perturbations, such as single-step gradient attacks [7], iterative gradient attacks [9], and white-box or black-box attacks [13, 8, 14, 7, 11, 12].

Attacks can also be classified as targeted or non-targeted. Targeted attacks [12, 11, 15] aim to find a perturbation that fools a deep neural model into predicting a specific target label, while non-targeted attacks [9, 10] find a perturbation that causes any prediction other than the ground-truth label. In this paper, we evaluate these adversarial attack strategies in white-box and black-box, targeted and non-targeted settings, and report the experimental results in the following sections.

We summarize contributions in this paper as follows:

- We propose an attack algorithm that crafts perturbations with integrated adaptive gradients, which helps escape local minima when seeking the minimal boundary distance between classes.

- Our attack achieves higher success rates in both white-box and black-box settings, with lower image distortion than other state-of-the-art gradient-based attacks under the $L_2$ and $L_\infty$ norms.

- To our knowledge, we are the first to craft adversarial perturbations with an integrated adaptive gradient; the resulting examples show good transferability with high attack success rates across deep neural models.

## 2 Related Work

Adversarial attack methods for image classification seek a perturbation that, added to an original image, fools a deep neural model into misclassifying the image from its correct label to another label with high confidence.

### 2.1 Methods for Generating Adversarial Perturbations

Modifying a substantial number of pixels in an image can cause misclassification. Gradient-based perturbation methods such as the Fast Gradient Sign Method (FGSM, Eq. (1)) [7], Iterative FGSM (I-FGSM, Eq. (2)) [9], and the momentum-based method (MI-FGSM, Eq. (3)) [15] try to find a proper perturbation vector by maximizing the loss function $J(\theta, x, y)$.

$$x^{adv} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big) \tag{1}$$

$$x_{t+1}^{adv} = \operatorname{Clip}_{x,\epsilon}\Big\{ x_t^{adv} + \alpha \cdot \operatorname{sign}\big(\nabla_x J(\theta, x_t^{adv}, y)\big) \Big\} \tag{2}$$

$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(\theta, x_t^{adv}, y)}{\big\|\nabla_x J(\theta, x_t^{adv}, y)\big\|_1}, \qquad x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \operatorname{sign}(g_{t+1}) \tag{3}$$
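As a concrete illustration, here is a minimal NumPy sketch of the single-step and iterative sign updates above. The quadratic toy loss standing in for a network loss, and the step sizes, are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """Single-step FGSM (Eq. (1)): move each pixel by eps in the sign of the loss gradient."""
    return x + eps * np.sign(grad_fn(x))

def i_fgsm(x, grad_fn, eps, alpha, steps):
    """Iterative FGSM (Eq. (2)): repeat small sign steps, clipping back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # L_inf projection onto the eps-ball
    return x_adv

# Toy loss J(x) = 0.5 * ||x - t||^2 with gradient (x - t); a stand-in for a network loss.
t = np.array([1.0, -1.0])
grad_fn = lambda x: x - t
x0 = np.zeros(2)
one_step = fgsm(x0, grad_fn, eps=0.1)
adv = i_fgsm(x0, grad_fn, eps=0.1, alpha=0.02, steps=5)
```

Ascending the sign of the gradient pushes the input away from the loss minimum while the clipping step keeps the total distortion within the $L_\infty$ bound $\epsilon$.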

Carlini and Wagner proposed a targeted attack method, the C&W attack [11], which generates adversarial examples that lower the detection rates of defenses under an $L_p$ norm constraint, where $p$ is set to 0, 1, 2, or $\infty$.

$$\min_{\delta}\; \|\delta\|_p + c \cdot f(x + \delta) \quad \text{s.t.}\quad x + \delta \in [0, 1]^n \tag{4}$$
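The function $f$ in Eq. (4) is typically the targeted margin loss on the logits. A minimal sketch follows; the confidence parameter $\kappa$ follows [11], and the example logits are placeholders:

```python
import numpy as np

def cw_margin_loss(logits, target, kappa=0.0):
    """C&W targeted objective f: max(max_{i != t} Z_i - Z_t, -kappa).
    Non-positive exactly when the target class leads all others by margin kappa."""
    z = np.asarray(logits, dtype=float)
    others = np.delete(z, target)              # logits of all non-target classes
    return max(others.max() - z[target], -kappa)

# With logits [1, 5, 2]: class 1 already dominates, so the targeted loss is non-positive,
# while targeting class 0 leaves a positive margin still to close.
loss_hit = cw_margin_loss([1.0, 5.0, 2.0], target=1)
loss_miss = cw_margin_loss([1.0, 5.0, 2.0], target=0)
```

Minimizing this term alongside $\|\delta\|_p$ trades off misclassification confidence against perturbation size.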

[16] introduce the Jacobian-based Saliency Map Attack (JSMA), an iterative method for targeted attacks that perturbs the input features with the highest adversarial saliency.

$$S(x, t)[i] = \begin{cases} 0, & \text{if } \dfrac{\partial F_t(x)}{\partial x_i} < 0 \text{ or } \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i} > 0 \\[6pt] \dfrac{\partial F_t(x)}{\partial x_i} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i} \right|, & \text{otherwise} \end{cases} \tag{5}$$

[13] seek a single universal perturbation $v$ that fools a deep neural model on most inputs drawn from the data distribution. In Eq. (6), $\delta$ controls the desired attack success rate and $\xi$ bounds the perturbation norm.

$$\mathbb{P}_{x \sim \mu}\big( \hat{k}(x + v) \neq \hat{k}(x) \big) \geq 1 - \delta, \qquad \|v\|_p \leq \xi \tag{6}$$

We have presented several approaches for generating adversarial perturbations in this part, most of which fool DNNs effectively.

### 2.2 Methods for Adversarial Example Defense

Defending DNNs against adversarial examples is a broad task, and the defense mechanism needs to understand the principles of the attack. [17] introduce ensemble adversarial training, which augments the training set with adversarial examples produced on other models and defends well against black-box and fast-gradient attack methods. [18] present defensive distillation, which reduces the effectiveness of adversarial samples on DNNs and lowers attack success rates via a distillation mechanism. [19] propose Feature Squeezing, which reduces the search space available to an adversary by coalescing samples that correspond to different feature vectors in the original space into a single sample.

Attack and defense are mutually influencing processes that together improve the robustness of deep neural networks.

## 3 Proposed Method

In this section, we describe an adversarial attack method that generates perturbations with an integrated adaptive gradient. Adversarial examples generated by our approach fool deep neural networks with high attack success rates and low pixel perturbation in both white-box and black-box settings, and the method is effective at seeking the optimal boundary distance in latent space.

Before introducing our approach, let $X = \{x_1, x_2, \dots, x_n\}$ be a set of images drawn from the distribution $\mu$, and let $F(\cdot)$ be a classifier given by a pre-trained deep neural model for image classification.

### 3.1 Integrated Adaptive Gradients for Adversarial Example Generation

The adaptive gradient method is a modification of stochastic gradient descent with per-parameter learning-rate updating. Informally, the adaptive mechanism increases the learning rate for sparser parameters and decreases it for less sparse ones, improving convergence over standard stochastic gradient descent. In adversarial perturbation generation, considering the gradient $g_t$ at time $t$, the adaptive mechanism adjusts the update pace by accumulating the squared gradients of the loss and using this accumulation, with a decay factor, as the denominator of the perturbation update. It thus performs small updates on frequently occurring features and substantial updates on infrequently occurring ones, which helps escape local minima. This class of adaptive methods for adversarial perturbation generation shows good attack ability and transferability across deep neural models; we elaborate on them in the following sections.
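A minimal sketch of this adaptive mechanism in the style of RMSProp; the decayed squared-gradient accumulator is our reading of the description above, and the toy loss and hyperparameters are illustrative assumptions:

```python
import numpy as np

def adaptive_ascent(x, grad_fn, lr=0.05, decay=0.9, steps=20, tiny=1e-8):
    """Per-coordinate adaptive update: a decayed running average of squared
    gradients serves as the denominator, so coordinates with persistently
    large gradients take small steps and rarely-active ones take large steps."""
    x_adv = x.copy()
    sq_avg = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        sq_avg = decay * sq_avg + (1.0 - decay) * g ** 2   # decay-weighted accumulator
        x_adv = x_adv + lr * g / (np.sqrt(sq_avg) + tiny)  # ascent: the attacker maximizes the loss
    return x_adv

# Toy loss J(x) = 0.5 * ||x - t||^2; ascending it pushes x away from t.
t = np.array([1.0, -1.0])
loss = lambda x: 0.5 * np.sum((x - t) ** 2)
x0 = np.zeros(2)
adv = adaptive_ascent(x0, lambda x: x - t)
```

Dividing by the accumulator normalizes the step size per coordinate, which is what lets the update break out of flat, frequently-visited directions.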

Seeking the decision boundary is the key problem in adversarial attacks on image classification. Because a category is represented as a high-dimensional region in latent space, the boundary distance changes as the center of the category region changes. A single gradient update can rarely find the optimal direction at once, and plain stochastic gradient descent easily gets trapped in local minima when seeking the decision boundary. The integrated gradient mechanism [20] offers a way to search for the optimal gradient update direction.

We consider an optimal path from a clean image in class A to the decision boundary beside class B (see Fig. 2), along with several candidate paths from A to class B. Our proposed method tries to find the optimal perturbation that minimizes the boundary distance between the classes and maps that distance into latent space, satisfying the following constraint:

$$\arg\min_{\delta}\; \|\delta\|_p \quad \text{s.t.}\quad F(x + \delta) \neq F(x) \tag{7}$$

If there is an optimal path from class A to class C with gradient $g^*$ along each input dimension for an input $x$, we seek different decision paths from the original to the adversarial domain based on gradient descent. First, we give each path gradient $g_k$ a weight $w_k$ to control its magnitude, with $w_k \geq 0$ and $\sum_k w_k = 1$.

The direction gradient can then be expressed as the weighted combination $g = \sum_k w_k\, g_k$.

In particular, each weight $w_k$ is a variable coefficient expressing the proportional relationship between the path gradient $g_k$ and the optimal gradient $g^*$. Intuitively, the direction gradient $g$ is a combination of the path gradients $g_k$ and the weights $w_k$, so $g$ is determined by how each $g_k$ relates to $g^*$. When the angle between $g_k$ and $g^*$ in a given dimension is smaller, the corresponding deviation is smaller (near 0), and the combined gradient $g$ approaches the optimal gradient $g^*$, which yields the optimal boundary distance in adversarial perturbation generation.
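The weighted direction gradient can be sketched as follows; the per-path gradient functions and weights here are hypothetical placeholders, and we assume the weights form a convex combination (non-negative, summing to 1):

```python
import numpy as np

def direction_gradient(x, path_grad_fns, weights):
    """Integrate the gradients of several candidate decision paths into one
    direction gradient g = sum_k w_k * g_k."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0), "weights must form a convex combination"
    grads = np.stack([g(x) for g in path_grad_fns])  # one gradient per candidate path
    return np.tensordot(w, grads, axes=1)            # weighted sum over the paths

# Two hypothetical path gradients pointing along different axes.
g1 = lambda x: np.array([1.0, 0.0])
g2 = lambda x: np.array([0.0, 1.0])
g = direction_gradient(np.zeros(2), [g1, g2], [0.25, 0.75])
```

Shifting weight toward the path whose gradient aligns best with the optimal direction moves the combined gradient closer to $g^*$.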

### 3.2 Adversarial Perturbations for Ensemble Deep Neural Models

In general, adversarial samples crafted on ensemble models [17] show good transferability to other networks, because the features are learned across multiple models. In ensemble processing, this strategy attacks by fusing the models' logits before the activation function, which works well because logits preserve the logarithmic relationships among the models' predictions. We combine the ensemble strategy with our proposed method to attack multiple neural models; the algorithm is described below.
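A minimal sketch of the logit-fusing strategy: per-model logits are combined with ensemble weights $w_k$ (all positive, summing to 1) before the attack loss is computed. The toy logit functions are placeholders for real model outputs:

```python
import numpy as np

def fused_logits(x, logit_fns, weights):
    """Weighted sum of per-model logits; the attack loss (e.g. cross-entropy)
    is then taken on the fused logits rather than on any single model."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w > 0) and np.isclose(w.sum(), 1.0)
    return sum(wk * f(x) for wk, f in zip(w, logit_fns))

def cross_entropy(logits, label):
    """Attack loss on the fused logits (to be maximized by the attacker)."""
    z = logits - logits.max()                  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# Two hypothetical models that each favor a different class, fused with equal weights.
f1 = lambda x: np.array([2.0, 0.0, 0.0])
f2 = lambda x: np.array([0.0, 2.0, 0.0])
z = fused_logits(None, [f1, f2], [0.5, 0.5])
```

A perturbation that raises this fused loss must degrade the ensemble as a whole, which is what drives the improved cross-model transferability.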

## 4 Experimental Results

We conduct our experiments on the ImageNet dataset [21] to validate the effectiveness of our proposed method, with the attack settings given below. The experimental setup is kept the same under both the $L_2$ and $L_\infty$ norm constraints, and the number of iterations is set to 10. We report attack success rates and perturbation magnitudes on a preprocessed ILSVRC2012 validation set containing 3,000 images, 3 per category.

Table 1: Attack success rates (%); each row group gives the model on which the adversarial examples were crafted, and each column the attacked model.

| Crafted on | Attack | Inc-v3 | Inc-v4 | IR-v2 |
|---|---|---|---|---|
| Inc-v3 | I-FGSM | 98.41% | 28.86% | 27.25% |
| | MI-FGSM | 99.69% | 25.22% | 25.27% |
| | Ours | 99.95% | 30.17% | 27.08% |
| Inc-v4 | I-FGSM | 29.36% | 96.72% | 25.45% |
| | MI-FGSM | 28.17% | 95.27% | 26.41% |
| | Ours | 32.02% | 97.04% | 28.70% |
| IR-v2 | I-FGSM | 28.77% | 28.26% | 96.65% |
| | MI-FGSM | 26.67% | 25.65% | 97.01% |
| | Ours | 28.54% | 27.54% | 97.14% |

| Crafted on | Attack | VGG16 | VGG19 | Res152 |
|---|---|---|---|---|
| VGG16 | I-FGSM | 94.83% | 58.29% | 30.21% |
| | MI-FGSM | 94.22% | 57.69% | 31.75% |
| | Ours | 96.70% | 59.01% | 32.69% |
| VGG19 | I-FGSM | 58.70% | 92.62% | 32.53% |
| | MI-FGSM | 56.72% | 95.17% | 31.31% |
| | Ours | 59.46% | 96.67% | 32.04% |
| Res152 | I-FGSM | 33.63% | 34.92% | 100.00% |
| | MI-FGSM | 34.11% | 34.17% | 100.00% |
| | Ours | 34.65% | 36.07% | 100.00% |

### 4.1 Parameter Setting

In the algorithm, we combine three different gradient-based operators in our experiments, because these three algorithms can self-adjust their gradient update pace, which is important in the process of finding the decision boundary. We analyze the weights as follows:

$$g = \sum_{i=1}^{3} w_i\, g_i, \qquad \sum_{i=1}^{3} w_i = 1 \tag{8}$$

We conducted experiments on the effect of the parameters on the attack success rate, using different perturbation sizes $\epsilon$ (1, 10, 100) under the $L_\infty$ norm and different weight parameters $w_1$, $w_2$, and $w_3$. The results show that under the $L_\infty$ constraint, setting $w_1$, $w_2$, and $w_3$ to 0.1, 0.1, and 0.8 is effective in fooling deep neural models with low image distortion. For ensemble models, $w_k$ is the relative ensemble weight of the $k$-th model; all weights are positive and sum to 1. For fairness, we set each $w_k$ to 1/3 in our experiments.

### 4.2 Results on Attack Success Rates

We report results under both the $L_2$ and $L_\infty$ norm bounds; our method performs well under both. We also conduct black-box attack experiments under the two norm limits; see Tables 1 and 2.

Table 2: Attack success rates (%) under the second norm constraint; rows give the model on which the adversarial examples were crafted, columns the attacked model.

| Crafted on | Attack | Inc-v3 | Inc-v4 | IR-v2 |
|---|---|---|---|---|
| Inc-v3 | I-FGSM | 98.92% | 55.17% | 57.31% |
| | MI-FGSM | 99.89% | 56.55% | 58.11% |
| | Ours | 99.72% | 58.55% | 56.67% |
| Inc-v4 | I-FGSM | 59.41% | 98.34% | 60.41% |
| | MI-FGSM | 61.65% | 95.88% | 61.01% |
| | Ours | 67.64% | 99.43% | 60.89% |
| IR-v2 | I-FGSM | 63.86% | 60.17% | 95.15% |
| | MI-FGSM | 66.20% | 59.88% | 97.41% |
| | Ours | 67.30% | 63.06% | 98.36% |

| Crafted on | Attack | VGG16 | VGG19 | Res152 |
|---|---|---|---|---|
| VGG16 | I-FGSM | 99.31% | 69.01% | 62.45% |
| | MI-FGSM | 98.11% | 65.47% | 61.65% |
| | Ours | 99.99% | 68.90% | 64.21% |
| VGG19 | I-FGSM | 71.36% | 98.98% | 57.88% |
| | MI-FGSM | 66.52% | 90.17% | 57.01% |
| | Ours | 74.01% | 99.31% | 58.76% |
| Res152 | I-FGSM | 65.41% | 59.65% | 99.79% |
| | MI-FGSM | 63.55% | 55.25% | 99.61% |
| | Ours | 64.84% | 60.94% | 99.73% |

Table 3: Attack success rates (%) at perturbation sizes $\epsilon = 10$ and $\epsilon = 1500$.

| Perturbation size | Attack | Inception-v3 | Resnet152 |
|---|---|---|---|
| $\epsilon = 10$ | I-FGSM | 36.41% | 39.54% |
| | MI-FGSM | 40.18% | 42.20% |
| | Ours | 40.30% | 43.27% |
| $\epsilon = 1500$ | I-FGSM | 42.17% | 42.05% |
| | MI-FGSM | 42.02% | 42.70% |
| | Ours | 42.85% | 42.19% |

First, under both norm constraints, we find that I-FGSM's ASR grows quickly while the perturbation size is below 5 ($L_\infty$) and 600 ($L_2$); in this phase, the perturbation size strongly affects ASR, whereas MI-FGSM and our method grow slowly but keep ASR at a high level (near 98%). All methods hold high attack success rates once the perturbation size reaches $\epsilon = 8$ and $\epsilon = 1200$; in this phase, the attack mechanism matters more, because of its ability to escape local minima when seeking the decision boundary, and our approach reaches a higher ASR than the other two methods. See Fig. 5.

The results also show that ASR on different models with the white-box strategy is higher than with the black-box approach, indicating that knowledge of a model's structure and parameters is effective for improving attack success rates; that is, the white-box strategy provides more learned features for adversarial perturbation generation. Under the black-box limitation, ASR is higher between models with similar structures, for example VGG16 and VGG19: similar structures mean similar feature representations and good transferability. See Tables 1 and 2.

We then conduct experiments on ensemble models under both norm bounds; in general, our method shows good attack capability in fooling ensemble models at different resolutions. In particular, Table 5 shows that, with the same experimental setting, the perturbation generated by our method is close to the minimum, which means lower image distortion than the other methods.

In this part, we have shown ASR for different gradient-based attack methods under two norm constraints in both attack strategies. The experiments show that our proposed method fools deep neural models efficiently.

Table 4: Attack success rates (U-ASR, %) on ensemble models at two input resolutions; sizes 1/5/10 are $L_\infty$ bounds and 300/900/1500 are $L_2$ bounds.

| Metric | Attack | 1 | 5 | 10 |
|---|---|---|---|---|
| U-ASR ($L_\infty$), 299×299 | I-FGSM | 13.50% | 23.42% | 35.74% |
| | MI-FGSM | 16.16% | 28.88% | 41.09% |
| | Ours | 14.92% | 26.96% | 42.31% |
| U-ASR ($L_\infty$), 224×224 | I-FGSM | 39.81% | 63.23% | 77.27% |
| | MI-FGSM | 43.38% | 67.67% | 78.69% |
| | Ours | 43.64% | 71.19% | 78.89% |

| Metric | Attack | 300 | 900 | 1500 |
|---|---|---|---|---|
| U-ASR ($L_2$), 299×299 | I-FGSM | 60.76% | 77.98% | 83.78% |
| | MI-FGSM | 67.31% | 78.25% | 84.18% |
| | Ours | 65.18% | 74.26% | 86.34% |
| U-ASR ($L_2$), 224×224 | I-FGSM | 87.38% | 96.09% | 97.90% |
| | MI-FGSM | 87.92% | 96.36% | 98.09% |
| | Ours | 88.78% | 97.78% | 98.58% |

Table 5: Absolute mean perturbation (U-AMP; lower is better) on ensemble models at two input resolutions; sizes 1/5/10 are $L_\infty$ bounds and 300/900/1500 are $L_2$ bounds.

| Metric | Attack | 1 | 5 | 10 |
|---|---|---|---|---|
| U-AMP ($L_\infty$), 299×299 | I-FGSM | 0.004 | 0.021 | 0.040 |
| | MI-FGSM | 0.007 | 0.026 | 0.049 |
| | Ours | 0.005 | 0.015 | 0.028 |
| U-AMP ($L_\infty$), 224×224 | I-FGSM | 0.005 | 0.023 | 0.038 |
| | MI-FGSM | 0.008 | 0.021 | 0.035 |
| | Ours | 0.007 | 0.016 | 0.024 |

| Metric | Attack | 300 | 900 | 1500 |
|---|---|---|---|---|
| U-AMP ($L_2$), 299×299 | I-FGSM | 0.002 | 0.007 | 0.013 |
| | MI-FGSM | 0.003 | 0.008 | 0.014 |
| | Ours | 0.003 | 0.007 | 0.012 |
| U-AMP ($L_2$), 224×224 | I-FGSM | 0.002 | 0.008 | 0.012 |
| | MI-FGSM | 0.003 | 0.009 | 0.014 |
| | Ours | 0.003 | 0.008 | 0.012 |

### 4.3 Results on Absolute Mean Perturbation

We use an evaluation function to quantify adversarial perturbations: the absolute mean perturbation (AMP), $\mathrm{AMP} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i^{adv} - x_i \rvert$, which represents the magnitude of the disturbance added to the pixels of the clean image. From the perspective of perturbation size, the AMP value correlates with the norm distance: the larger the value, the greater the image distortion. We find that the adversarial examples generated by MI-FGSM fool DNNs with high success rates but show visible changes in the images, because its pixel modification is larger than that of our method and of I-FGSM. See Fig. 7.
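The AMP metric can be computed directly; a short sketch, assuming the two images are arrays of pixel values on the same scale:

```python
import numpy as np

def amp(x_clean, x_adv):
    """Absolute mean perturbation: the average absolute per-pixel change
    between the clean image and its adversarial counterpart."""
    return float(np.mean(np.abs(np.asarray(x_adv) - np.asarray(x_clean))))

# A uniform +0.02 shift on every pixel yields an AMP of exactly 0.02.
clean = np.zeros((4, 4))
adv = clean + 0.02
```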

Under both norm constraints, compared to I-FGSM and MI-FGSM, the perturbation generated by our algorithm remains small. Especially under the $L_2$ norm, MI-FGSM and I-FGSM produce perturbations with large variation, while adversarial samples generated by our method fool models with equally high attack success rates at a very low variation level in the image. This means the pixel modifications are effective in influencing the models' predictions; that is, the boundary distance our method accumulates is close to the minimum boundary distance, as shown in Fig. 6.

## 5 Conclusion

We have described an adversarial attack method that generates perturbations with an integrated, adaptive gradient to seek the minimal boundary distance between class representations in latent space. Our proposed method creates adversarial examples that fool deep neural networks with high probability, and the perturbations, generated with lower variation, show good transferability across deep neural models. Experiments indicate that the integrated adaptive method is a fast and effective way to find the decision boundary. The set of integrated components is not yet complete in our experiments; we believe that more integrated components and appropriate weight parameters would be more conducive to obtaining the optimal boundary distance. We will enrich this class of attack algorithms in future research.

## References

- [1] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
- [2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630–645. Springer, 2016.
- [3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, pages 1097–1105, 2012.
- [4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- [5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014.
- [6] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops(SPW), pages 1–7, May 2018.
- [7] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
- [8] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
- [9] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
- [10] Pedro Tabacof and Eduardo Valle. Exploring the space of adversarial images. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 426–433. IEEE, 2016.
- [11] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
- [12] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Xiaodong Song. Generating adversarial examples with adversarial networks. In IJCAI, 2018.
- [13] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94. IEEE, 2017.
- [14] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
- [15] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- [16] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy, pages 372–387, 2016.
- [17] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
- [18] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
- [19] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
- [20] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. CoRR, abs/1703.01365, 2017.
- [21] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.