Generating Minimal Adversarial Perturbations with Integrated Adaptive Gradients

Yatie Xiao, Chi-Man Pun
ytxiao18@gmail.com
cmpun@um.edu.mo
Abstract

We focus on the problem of generating gradient-based adversarial perturbations in the image classification domain. Substantial pixel perturbations alter the features that deep neural networks learn from clean images and thereby fool the models into making incorrect predictions; however, large-scale pixel modification makes the changes visible even when the attack succeeds. To find minimal perturbations that directly quantify the boundary distance between clean images and adversarial examples in latent space, we propose a novel method for generating integrated adversarial perturbations, which formulates the perturbations at the level of integrated adaptive gradients. Our approach combines a few adaptive gradient operators to seek the decision boundary between original images and their adversarial counterparts directly. We compare the proposed method with other state-of-the-art gradient-based attack methods. Experimental results suggest that adversarial examples generated by our approach fool deep neural classification networks efficiently with lower pixel modification and show good transferability across image classification models.

1 Introduction

Deep Neural Networks (DNNs) [1, 2] have achieved state-of-the-art performance in image classification [3, 4], natural language processing [5], and audio processing [6]. However, as research in these domains has deepened, it has been found that deep neural models are astonishingly susceptible to adversarial attacks [7, 8]; in the image classification domain in particular, small-magnitude perturbations added to the input data can cause DNNs to misclassify [7, 9]. In general, adversarial attack methods generate perturbations by estimating the decision distance between different classes [10, 9, 11], which means that fooling a deep neural model with adversarial examples amounts to finding the boundary distance that misleads the DNN into mislabeling the input [8, 12]. Several techniques have been proposed to craft adversarial perturbations, such as single-step gradient attacks [7], iterative gradient attacks [9], and white-box or black-box attacks [13, 8, 14, 7, 11, 12].

Figure 1: Clean images and the corresponding adversarial images crafted on a deep neural network (Inception-v3) by our proposed algorithm under two different norm constraints. The left and right columns show adversarial images generated under the two norm constraints, respectively. All adversarial examples are generated within 10 iterations on the deployed models (perturbations are amplified by a factor of 5).

Attacks can also be classified into targeted and non-targeted methods: a targeted attack [12, 11, 15] seeks a perturbation that fools a deep neural model into predicting a specific target label, while a non-targeted attack [9, 10] seeks a perturbation that fools the model into predicting any label other than the ground-truth label. In this paper, we evaluate these adversarial attack strategies in white-box and black-box, targeted and non-targeted settings, and report the experimental results in the following sections.

We summarize contributions in this paper as follows:

  • An attack algorithm for generating adversarial examples is proposed that crafts perturbations with integrated adaptive gradients, which helps escape local minima when seeking the minimal boundary distance between classes.

  • Our attack method achieves higher attack success rates than other state-of-the-art gradient-based attack methods under both white-box and black-box strategies and under both norm constraints, with lower image distortion.

  • To the best of our knowledge, we are the first to propose an attack method that uses integrated adaptive gradients to craft adversarial perturbations; the resulting examples show good transferability and high attack success rates across deep neural models.

Figure 2: Boundary Distance in Latent Space (2D)

2 Related Work

Adversarial attack methods on image classification try to find a perturbation that, when added to an original image, fools a deep neural model into misclassifying the image from its correct label to another label with high confidence.

2.1 Methods for Generating Adversarial Perturbations

Substantial pixel modifications in an image can lead to misclassification. Gradient-based methods such as the Fast Gradient Sign Method (FGSM, Eq. (1)) [7], Iterative FGSM (Eq. (2)) [9], and the momentum-based method (Eq. (3)) [15] try to find a proper perturbation vector by maximizing the loss function $J(x, y)$.

$$x^{adv} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(x, y)\big) \tag{1}$$
$$x^{adv}_{t+1} = \mathrm{Clip}_{x,\epsilon}\big\{x^{adv}_{t} + \alpha \cdot \mathrm{sign}\big(\nabla_x J(x^{adv}_{t}, y)\big)\big\} \tag{2}$$
$$g_{t+1} = \mu \cdot g_{t} + \frac{\nabla_x J(x^{adv}_{t}, y)}{\|\nabla_x J(x^{adv}_{t}, y)\|_1}, \qquad x^{adv}_{t+1} = \mathrm{Clip}_{x,\epsilon}\big\{x^{adv}_{t} + \alpha \cdot \mathrm{sign}(g_{t+1})\big\} \tag{3}$$
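For concreteness, the following is a minimal PyTorch sketch of the iterative update in Eq. (2); the model, loss, step size, and pixel range are illustrative assumptions rather than details taken from this paper.

```python
import torch
import torch.nn.functional as F

def ifgsm_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Minimal I-FGSM sketch: step along the sign of the input gradient
    and project back into the eps-ball around the clean image."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)          # J(x_adv, y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()     # gradient-sign step
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project into eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)             # keep a valid image
    return x_adv
```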

Carlini and Wagner proposed a targeted attack (the C&W attack) [11], which generates adversarial examples that lower the detection rates of defenses under a norm constraint; the parameter $p$ of the norm constraint is set to 0, 1, 2, or $\infty$.

$$\min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta), \quad \text{s.t. } x + \delta \in [0, 1]^n \tag{4}$$

Papernot et al. [16] introduce a simple iterative method, the Jacobian-based Saliency Map Attack (JSMA), for targeted attacks.

$$S(x, t)[i] = \begin{cases} 0, & \text{if } \dfrac{\partial F_t(x)}{\partial x_i} < 0 \ \text{or} \ \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i} > 0 \\[2ex] \dfrac{\partial F_t(x)}{\partial x_i} \cdot \Big|\displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i}\Big|, & \text{otherwise} \end{cases} \tag{5}$$

Moosavi-Dezfooli et al. [13] seek a universal adversarial perturbation, i.e., a single perturbation that fools a deep neural model on most inputs. In Eq. (6), the parameter $\delta$ controls the attack success rate of the adversarial examples.

$$\mathbb{P}_{x \sim \mu}\big(\hat{k}(x + v) \neq \hat{k}(x)\big) \geq 1 - \delta, \quad \text{s.t. } \|v\|_p \leq \xi \tag{6}$$

We have presented several representative approaches for generating adversarial perturbations in this part, most of which fool DNNs effectively.

2.2 Methods for Adversarial Example Defense

Defending against adversarial examples in DNNs is a challenging task, since a defense mechanism needs to account for the principles of the attack. Tramèr et al. [17] introduce an ensemble adversarial training mechanism that augments the training data with generated adversarial examples and yields good robustness against black-box and fast-gradient attacks. Papernot et al. [18] present a distillation defense that reduces the effectiveness of adversarial samples on DNNs and lowers adversarial attack success rates. Xu et al. [19] propose feature squeezing, which reduces the search space available to an adversary by coalescing samples that correspond to different feature vectors in the original space into a single sample.

Attack and defense are mutually influencing processes that together improve the robustness of deep neural networks.

3 Proposed Method

In this section, we describe our adversarial attack method, which generates adversarial perturbations with integrated adaptive gradients. Adversarial examples generated by our approach fool deep neural networks with high attack success rates and low pixel distortion in both white-box and black-box settings, and the method is effective at seeking the optimal boundary distance in latent space.

Before introducing our approach, we define $X = \{x_1, \dots, x_m\}$ as a set of images drawn from the distribution $\mu$, and $f$ as a classifier, i.e., a pre-trained deep neural model for image classification.

Figure 3: Adversarial Example Generation with Integrated Adaptive Gradients
Figure 4: Generating Integrated Adversarial Perturbations on Ensemble models
0:  Input: a clean image x with true label y from dataset X; a classifier f; perturbation size ε
0:  Output: an adversarial example x^adv with ||x^adv − x|| ≤ ε
1:  initialize: g_0 = 0, δ = 10e-8, x_0^adv = x
2:  for t = 0 : k do
3:     for i = 0 : n do
4:        while f(x_t^adv) = y do
5:           compute the loss J(x_t^adv, y) based on f
6:           update the accumulated squared-gradient terms of the i-th adaptive operator from ∇_x J(x_t^adv, y)
7:           update the perturbation with the adaptive step and the decay factor
8:        end while
9:     end for
10:  update x_{t+1}^adv by combining the weighted adaptive gradient components
11:  end for
12:  return x^adv
Algorithm 1 Adversarial Examples Generation with Integrated Adaptive Gradients
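As a rough illustration of Algorithm 1, the following PyTorch sketch shows one possible instantiation in which the integrated adaptive gradients are realized as Adagrad-, RMSProp-, and Adam-style operators fused with fixed weights. The specific operators, step size, and loss are assumptions made for illustration and do not reproduce the exact update rules of the algorithm; the weights (0.1, 0.1, 0.8) follow the setting reported in Section 4.1.

```python
import torch
import torch.nn.functional as F

def integrated_adaptive_attack(model, x, y, eps=8 / 255, alpha=2 / 255,
                               steps=10, weights=(0.1, 0.1, 0.8),
                               gamma=0.9, delta=1e-8):
    """Sketch of an integrated adaptive-gradient attack: several adaptive
    gradient operators are evaluated at each step and their directions are
    fused with fixed weights before the perturbation update."""
    x_adv = x.clone().detach()
    sq_accum = [torch.zeros_like(x) for _ in range(3)]  # squared-gradient accumulators (3 operators)
    momentum = torch.zeros_like(x)                      # first-moment term for the Adam-like operator

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)         # J(x_adv, y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        directions = []
        # Operator 0: Adagrad-style (accumulate all squared gradients).
        sq_accum[0] = sq_accum[0] + grad ** 2
        directions.append(grad / (sq_accum[0].sqrt() + delta))
        # Operator 1: RMSProp-style (decayed squared-gradient accumulator).
        sq_accum[1] = gamma * sq_accum[1] + (1 - gamma) * grad ** 2
        directions.append(grad / (sq_accum[1].sqrt() + delta))
        # Operator 2: Adam-style (decayed first and second moments).
        momentum = gamma * momentum + (1 - gamma) * grad
        sq_accum[2] = gamma * sq_accum[2] + (1 - gamma) * grad ** 2
        directions.append(momentum / (sq_accum[2].sqrt() + delta))

        # Fuse the adaptive directions with fixed weights summing to one.
        fused = sum(w * d.sign() for w, d in zip(weights, directions))

        x_adv = x_adv.detach() + alpha * fused.sign()            # perturbation update
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)    # project into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)                     # keep a valid image
    return x_adv
```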

3.1 Integrated Adaptive Gradients for Adversarial Example Generation

The adaptive gradient method is a modification of stochastic gradient descent with per-parameter learning-rate updating. Informally, the adaptive mechanism increases the learning rate for sparser parameters and decreases it for less sparse ones, improving convergence over standard stochastic gradient descent. In the adversarial perturbation generation process, considering the gradient at step $t$, the adaptive mechanism adjusts the gradient update pace based on the sign function and the accumulated squared gradients of the loss, using this quantity as the denominator of the perturbation update together with a decay factor. It therefore performs small updates for frequently occurring features and substantial updates for infrequently occurring ones, which helps escape local minima. This class of adaptive methods for adversarial perturbation generation shows good attack ability and transferability across different deep neural models; we elaborate on them in the following sections.
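As a sketch, one adaptive update of the kind described above, with a decayed squared-gradient accumulator used as the denominator, can be written as

$$G_t = \gamma\, G_{t-1} + (1 - \gamma)\, g_t^{2}, \qquad x^{adv}_{t+1} = x^{adv}_{t} + \alpha \cdot \mathrm{sign}\!\left(\frac{g_t}{\sqrt{G_t} + \delta}\right),$$

where $g_t = \nabla_x J(x^{adv}_t, y)$, $\gamma$ is the decay factor, and $\delta$ is a small stabilizing constant; the exact form used in Algorithm 1 may differ.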

Seeking the decision boundary is the key problem in adversarial attacks on image classification. Because a category is represented by a high-dimensional matrix in latent space, the boundary distance changes as the center of the category domain changes, and a single gradient update has difficulty finding the optimal update direction in one step; plain stochastic gradient descent easily becomes trapped in local minima when seeking the decision boundary. The integrated gradient mechanism [20] offers a solution for searching for the optimal gradient update direction.
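The integrated gradients of [20] can be approximated with a Riemann sum along the straight-line path from a baseline to the input. The sketch below assumes a zero baseline and uses the classification loss as the scalar function along the path; both are illustrative choices, not details from this paper.

```python
import torch
import torch.nn.functional as F

def integrated_gradients(model, x, y, baseline=None, m=20):
    """Riemann-sum approximation of integrated gradients along the
    straight path from a baseline image to the input x."""
    if baseline is None:
        baseline = torch.zeros_like(x)               # assumed zero baseline
    total = torch.zeros_like(x)
    for k in range(1, m + 1):
        point = baseline + (k / m) * (x - baseline)  # point on the path
        point.requires_grad_(True)
        loss = F.cross_entropy(model(point), y)      # scalar function (here: the loss)
        total = total + torch.autograd.grad(loss, point)[0]
    return (x - baseline) * total / m                # (x - x') * average path gradient
```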

Consider an optimal path from a clean image in class A to the decision boundary beside class B (see Fig. 2); several candidate paths lead to class B. Our proposed method tries to find the perturbation that minimizes the boundary distance between the two classes and maps this distance into latent space, which satisfies the following constraint:

(7)

If there is an optimal path from class A to class C, with gradient components along each dimension of the input, we seek different decision paths from the original domain to the adversarial domain based on gradient descent. First, we give each gradient a weight to control its size, with the weights constrained to sum to one, so that the direction gradient can be expressed as a weighted combination of the individual gradients; a sketch of this combination is given below.

In particular, the weight is a variable coefficient that expresses the proportional relationship between an individual gradient and the combined direction gradient. Intuitively, the direction gradient can be seen as a correlation of the individual gradients and their weights: when the angle along a dimension between an individual gradient and the optimal gradient is smaller, the corresponding weight deviation is closer to 0, and the combined gradient is closer to the optimal gradient, which corresponds to the optimal boundary distance in adversarial perturbation generation.
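One way to write this weighted combination of gradient components is

$$g = \sum_{i=1}^{n} w_i\, g_i, \qquad \sum_{i=1}^{n} w_i = 1, \quad w_i \geq 0,$$

where $g_i$ denotes the gradient produced by the $i$-th adaptive operator and $w_i$ its weight; Section 4.1 reports the weights 0.1, 0.1, and 0.8 for the three operators used in our experiments.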

3.2 Adversarial Perturbations for Ensemble Deep Neural Models

In general, adversarial samples crafted on ensemble models [17] show good transferability to other networks because more complete features are learned across multiple models. In the ensemble setting, this strategy attacks the fused logits of the ensembled networks and gains good efficiency, because logits reveal the logarithmic relationship between the models' predictions well. We combine the ensemble strategy with our proposed method to attack multiple neural models; the procedure is given in Algorithm 2, and a sketch of the logit-fusion step follows it.

0:  Input: a clean image x with true label y from dataset X; perturbation size ε; logits classifiers f_1, …, f_K with corresponding ensemble weights w_1, …, w_K
0:  Output: an adversarial example x^adv with ||x^adv − x|| ≤ ε
1:  initialize: g_0 = 0, δ = 10e-8, x_0^adv = x
2:  for t = 1 : k do
3:     for i = 0 : n do
4:        while the ensemble still predicts y for x_t^adv do
5:           get the logits by feeding x_t^adv into each classifier f_j
6:           fuse the logits with the ensemble weights
7:           compute the loss based on the fused logits
8:           get the adaptive gradient estimates from the fused loss
9:           get the perturbation update
10:       end while
11:    end for
12:    update x_{t+1}^adv
13:  end for
14:  return x^adv
Algorithm 2 Adversarial Examples Generation for Ensemble Models
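The following is a minimal PyTorch sketch of the logit-fusion step described above; the choice of loss and the single gradient step are illustrative assumptions, and equal per-model weights correspond to the 1/3 setting reported in Section 4.1.

```python
import torch
import torch.nn.functional as F

def ensemble_gradient_step(models, weights, x_adv, y, alpha=2 / 255):
    """One gradient step against an ensemble: fuse the models' logits with
    fixed weights, compute the loss on the fused logits, and step along
    the sign of the input gradient."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    fused_logits = sum(w * m(x_adv) for w, m in zip(weights, models))  # weighted logit fusion
    loss = F.cross_entropy(fused_logits, y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return x_adv.detach() + alpha * grad.sign()
```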

4 Experimental Results

We conduct experiments on the ImageNet dataset [21] to validate the effectiveness of our proposed method, with the attack settings described below. The experimental setup is kept the same under both norm constraints, and the number of iterations is set to 10. We report attack success rates and perturbation magnitudes on a preprocessed ILSVRC2012 (Val) subset containing 3000 images, 3 images per category.

Figure 5: Attack Success Rate (ASR) on different DNNs with different perturbation sizes. The left and right figures show ASR under the two norm constraints, respectively, in the non-targeted setting.
Model Attack Inc-v3 Inc-v4 IR-v2
Inc-v3 I-FGSM 98.41% 28.86% 27.25%
MI-FGSM 99.69% 25.22% 25.27%
Ours 99.95% 30.17% 27.08%
Inc-v4 I-FGSM 29.36% 96.72% 25.45%
MI-FGSM 28.17% 95.27% 26.41%
Ours 32.02% 97.04% 28.70%
IR-v2 I-FGSM 28.77% 28.26% 96.65%
MI-FGSM 26.67% 25.65% 97.01%
Ours 28.54% 27.54% 97.14%
Model Attack VGG16 VGG19 Res152
VGG16 I-FGSM 94.83% 58.29% 30.21%
MI-FGSM 94.22% 57.69% 31.75%
Ours 96.70% 59.01% 32.69%
VGG19 I-FGSM 58.70% 92.62% 32.53%
MI-FGSM 56.72% 95.17% 31.31%
Ours 59.46% 96.67% 32.04%
Res152 I-FGSM 33.63% 34.92% 100.00%
MI-FGSM 34.11% 34.17% 100.00%
Ours 34.65% 36.07% 100.00%
Table 1: ASR on six deep neural models under one of the two norm constraints (perturbation size 10). * indicates white-box attacks; IR-v2 denotes Inception-ResNet-v2.

4.1 Parameter Setting

In the algorithm, we use three different gradient-based methods in our experiments, because these three algorithms can self-adjust the gradient update pace, which is very important in the process of finding the decision boundary. We analyze the weights as follows:

(8)

We conducted experiments on the effect of the parameters on the attack success rate, for different perturbation sizes (1, 10, 100) under one norm constraint and for different weight parameters. The experimental results show that setting the three weights to 0.1, 0.1, and 0.8 yields good effectiveness in fooling deep neural models with lower image distortion. For ensemble models, each weight is the relative ensemble weight of the k-th model; all weights are positive and sum to 1. For fairness, we set each ensemble weight to 1/3 in our experiments.

4.2 Results on Attack Success Rates

We report results under both norm constraints, and our method performs well under both. We also conduct black-box attack experiments under the two norm limitations; see Tables 1 and 2.

Figure 6: Absolute Mean Perturbation (AMP) value on different DNNs with different perturbation sizes. The left and right figures show the AMP value under the two norm constraints, respectively, in the non-targeted setting.
Model Attack Inc-v3 Inc-v4 IR-v2
Inc-v3 I-FGSM 98.92% 55.17% 57.31%
MI-FGSM 99.89% 56.55% 58.11%
Ours 99.72% 58.55% 56.67%
Inc-v4 I-FGSM 59.41% 98.34% 60.41%
MI-FGSM 61.65% 95.88% 61.01%
Ours 67.64% 99.43% 60.89%
IR-v2 I-FGSM 63.86% 60.17% 95.15%
MI-FGSM 66.20% 59.88% 97.41%
Ours 67.30% 63.06% 98.36%
Model Attack VGG16 VGG19 Res152
VGG16 I-FGSM 99.31% 69.01% 62.45%
MI-FGSM 98.11% 65.47% 61.65%
Ours 99.99% 68.90% 64.21%
VGG19 I-FGSM 71.36% 98.98% 57.88%
MI-FGSM 66.52% 90.17% 57.01%
Ours 74.01% 99.31% 58.76%
Res152 I-FGSM 65.41% 59.65% 99.79%
MI-FGSM 63.55% 55.25% 99.61%
Ours 64.84% 60.94% 99.73%
Table 2: ASR on six deep neural models under the other norm constraint (perturbation size 1500). * indicates white-box attacks; IR-v2 denotes Inception-ResNet-v2.
Attack Inception-v3 Resnet152
Size = 10 I-FGSM 36.41% 39.54%
MI-FGSM 40.18% 42.20%
Ours 40.30% 43.27%
Size = 1500 I-FGSM 42.17% 42.05%
MI-FGSM 42.02% 42.70%
Ours 42.85% 42.19%
Table 3: Top-1 target accuracy in the targeted setting under the two norm constraints. The target label is "crane".

Firstly, under both norm constraints, we find that I-FGSM's ASR grows quickly for perturbation sizes below 5 and 600, respectively; in this phase the perturbation size strongly affects ASR, while MI-FGSM and our method grow more slowly but keep ASR at a high level (near 98%). All methods attain high attack success rates when the perturbation size reaches 8 and 1200, respectively; in this phase the attack mechanism matters more, because of its ability to escape local minima when seeking the decision boundary, and our approach reaches a higher ASR than the other two methods. See Fig. 5.

ASR on different models under the white-box strategy is higher than under the black-box approach, indicating that knowledge of a model's structure and parameters is effective for improving attack success rates; that is, the white-box setting exposes more learned features for adversarial perturbation generation. Under the black-box limitation, ASR is higher between models with similar structures, for example VGG16 and VGG19, since a similar structure implies similar feature representations and thus good transferability. See Tables 1 and 2.

We then conduct experiments on ensemble models under the two norm constraints; in general, our method shows good attack capability in fooling ensemble models at different input resolutions. In particular, Table 5 shows that, under the same experimental setting, the adversarial perturbations generated by our method are close to the minimum, which means lower image distortion than the other methods.

In this part, we reported ASR for different gradient-based attack methods under two norm constraints and in both attack strategies. The experiments show that our proposed method fools deep neural models efficiently.

Size 1 5 10
U-ASR (299×299) I-FGSM 13.50% 23.42% 35.74%
MI-FGSM 16.16% 28.88% 41.09%
Ours 14.92% 26.96% 42.31%
U-ASR (224×224) I-FGSM 39.81% 63.23% 77.27%
MI-FGSM 43.38% 67.67% 78.69%
Ours 43.64% 71.19% 78.89%
Size 300 900 1500
U-ASR (299×299) I-FGSM 60.76% 77.98% 83.78%
MI-FGSM 67.31% 78.25% 84.18%
Ours 65.18% 74.26% 86.34%
U-ASR (224×224) I-FGSM 87.38% 96.09% 97.90%
MI-FGSM 87.92% 96.36% 98.09%
Ours 88.78% 97.78% 98.58%
Table 4: ASR on ensemble models under the two norm constraints.
Size 1 5 10
U-AMP (299×299) I-FGSM 0.004 0.021 0.040
MI-FGSM 0.007 0.026 0.049
Ours 0.005 0.015 0.028
U-AMP (224×224) I-FGSM 0.005 0.023 0.038
MI-FGSM 0.008 0.021 0.035
Ours 0.007 0.016 0.024
Size 300 900 1500
U-AMP (299×299) I-FGSM 0.002 0.007 0.013
MI-FGSM 0.003 0.008 0.014
Ours 0.003 0.007 0.012
U-AMP (224×224) I-FGSM 0.002 0.008 0.012
MI-FGSM 0.003 0.009 0.014
Ours 0.003 0.008 0.012
Table 5: AMP value on ensemble models under the two norm constraints.

4.3 Results on Absolute Mean Perturbation

We use an evaluation function to quantify adversarial perturbations, the absolute mean perturbation (AMP), defined as the mean of the absolute per-pixel disturbance added to the clean image, $\mathrm{AMP} = \frac{1}{N}\sum_{i=1}^{N} |x^{adv}_i - x_i|$. From the perspective of perturbation size, the AMP value correlates with the norm distance: the larger the value, the higher the image distortion. We find that the adversarial examples generated by MI-FGSM fool DNNs with high success rates but show visible changes in the images, because the degree of pixel modification is larger than for our method and I-FGSM. See Fig. 7.
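A minimal NumPy sketch of this metric, assuming both images are arrays on the same intensity scale:

```python
import numpy as np

def absolute_mean_perturbation(x_clean: np.ndarray, x_adv: np.ndarray) -> float:
    """Absolute mean perturbation: mean absolute per-pixel difference
    between the adversarial image and the clean image."""
    diff = x_adv.astype(np.float64) - x_clean.astype(np.float64)
    return float(np.mean(np.abs(diff)))
```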

Under both norm constraints, compared with I-FGSM and MI-FGSM, the perturbations generated by our algorithm remain small. Under one of the norm constraints in particular, MI-FGSM and I-FGSM produce perturbations with large variations, whereas adversarial samples generated by our method fool the models with the same high attack success rate while keeping image variation very small. This means the pixel modification is effective in influencing the models' predictions; that is, the boundary distance accumulated by our method is close to the minimum boundary distance, as shown in Fig. 6.

Figure 7: Adversarial samples with confidence and AMP values on Inception-v3 with perturbation size 10, generated by different adversarial attack methods.

5 Conclusion

We have described an adversarial attack method that generates perturbations with integrated adaptive gradients to seek the minimal boundary distance between class representations in latent space. Our proposed method creates adversarial examples that fool deep neural networks with high probability, and the perturbations, generated with lower variation, show good transferability across different deep neural models. The experiments indicate that the integrated adaptive method is a fast and effective way of finding the decision boundary. The set of integrated components in our experiments is not yet complete; we believe that more integrated components and appropriate weight parameters would be more conducive to obtaining the optimal boundary distance. We will enrich this class of attack algorithms in future research.

References

  • [1] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • [2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630–645. Springer, 2016.
  • [3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, pages 1097–1105, 2012.
  • [4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014.
  • [6] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops(SPW), pages 1–7, May 2018.
  • [7] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
  • [8] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
  • [9] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
  • [10] Pedro Tabacof and Eduardo Valle. Exploring the space of adversarial images. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 426–433. IEEE, 2016.
  • [11] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • [12] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Xiaodong Song. Generating adversarial examples with adversarial networks. In IJCAI, 2018.
  • [13] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94. IEEE, 2017.
  • [14] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
  • [15] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [16] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy, pages 372–387, 2016.
  • [17] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
  • [18] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
  • [19] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
  • [20] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. CoRR, abs/1703.01365, 2017.
  • [21] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. Ieee, 2009.