ASP: A Fast Adversarial Attack Example Generation Framework based on Adversarial Saliency Prediction
With their excellent accuracy and feasibility, Neural Networks (NNs) have been widely applied in novel intelligent applications and systems. However, with the appearance of the Adversarial Attack, NN based systems have become extremely vulnerable: image classification results can be arbitrarily misled by adversarial examples, which are images crafted with pixel-level perturbations that humans cannot perceive. As this raises a significant system security issue, we carry out a series of investigations on the adversarial attack in this work. We first identify an image’s pixel vulnerability to the adversarial attack based on adversarial saliency analysis. By comparing the analyzed saliency map with the adversarial perturbation distribution, we propose a new evaluation scheme to comprehensively assess adversarial attack precision and efficiency. Then, with a novel adversarial saliency prediction method, we propose a fast adversarial example generation framework, namely “ASP”, with significant attack efficiency improvement and dramatic computation cost reduction. Experiments show that, compared to previous methods, ASP achieves up to 12× speed-up for adversarial example generation, 2× lower perturbation rate, and a high attack success rate of 87% on both MNIST and Cifar10. ASP can also be used to support the data-hungry NN adversarial training. By reducing the attack success rate by as much as 90%, ASP can quickly and effectively enhance the defense capability of NN based systems against adversarial attacks.
Nowadays, the Neural Network (NN) is considered one of the most representative machine learning technologies, and has been widely applied in intelligent applications and embedded systems, such as augmented reality devices, mobile natural language processing, and autonomous-driving systems. However, a considerable security issue has recently emerged with an NN dedicated attack method, namely the Adversarial Attack.
Current adversarial attacks are usually designed to manipulate NN image classification results arbitrarily, which is achieved by injecting adversarial examples into the NN testing phase. Those adversarial examples are generated by dedicated adversarial attack algorithms, which distort the original images with pixel-level perturbations that human vision cannot perceive. Although imperceptible, these perturbations can effectively fool state-of-the-art NN systems. As shown in Fig. 1, although a human can still recognize the adversarial examples as the correct classes of the originals, such adversarial examples can cause a 90% misclassification rate in a well-trained NN system.
Adversarial attacks not only demonstrate the vulnerability of NNs, with significant security implications for practical systems, but also reveal a significant cognitive difference between NNs and human vision, which makes the adversarial attack an important tool for NN study. Therefore, more and more effort has recently been devoted to adversarial attack research. However, without deep understanding and comprehensive evaluation, most adversarial attack methods still suffer from inconsistent attack success rates, large perturbation areas, and considerable computation cost.
To gain a better understanding of the adversarial attack, we carry out a series of investigations in terms of attack evaluation, attack generation, and attack defense. In this work, we make the following contributions:
We propose a comprehensive adversarial attack evaluation scheme. By identifying an image’s pixel vulnerability distribution to the adversarial attack with saliency analysis, we can evaluate the precision of the perturbation distribution generated on the adversarial example, and therefore the overall adversarial attack efficiency;
We design ASP, an innovative fast adversarial example generation framework. ASP is based on a new adversarial saliency prediction method with comprehensive adversarial pattern analysis and extraction. With the predicted adversarial pattern, high-quality adversarial examples can be generated quickly without considerable computation overhead;
We apply ASP to support the data-hungry adversarial training process. With massive numbers of generated adversarial examples included in the NN training phase as vaccines, the immunity of NNs to the adversarial attack can be effectively enhanced;
We implement the proposed fast adversarial example generation framework, as well as the fast adversarial training framework, and quantitatively evaluate their performance against previous methods.
Experiments show that, compared to previous adversarial attack methods, ASP achieves significant cost reduction, with 2× to 100× computation speed-up on the MNIST dataset and 2× to 12× speed-up on Cifar10. The generated adversarial examples also demonstrate high quality, with 1.5× to 2× lower perturbation rate and a high attack success rate of 87% to 92%. When ASP is used in adversarial training, the adversarial attack can be effectively defended against, with a dramatic attack success rate reduction of 45% to 90%.
2.1 Gradient-based Adversarial Attacks
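An NN can be seen as a large-scale non-linear function composed of massive numbers of neurons, weights ($w$) and bias values ($b$). For neuron $i$ in layer $l$, the activation is:

$$a_i^{l} = \sigma\Big(\sum_{j} w_{ij}^{l}\, a_j^{l-1} + b_i^{l}\Big) \tag{1}$$

where $\sigma$ is the non-linear activation function and $a^{0} = x$ is the input. The whole NN with $L$ layers can then be written as a function $f(x)$:

$$f(x) = a^{L}(x), \qquad a^{l} = \sigma\big(W^{l} a^{l-1} + b^{l}\big), \quad l = 1, \dots, L \tag{2}$$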
The training phase of an NN can be seen as a loss function optimization process for $f(x)$, which reduces the error between the predicted labels and the true ones. When the error is iteratively reduced by modifying the weights with gradient-based backpropagation, the NN classification accuracy correspondingly increases and finally reaches a satisfactory level.
On the other hand, the adversarial attack is similar to the training phase but with the opposite objective. As shown in Fig. 2, when a certain attack targeted error is injected at the very back end of the classifier, the false gradients are propagated back into the image and cause pixel-level perturbation. Then, during forward propagation in the testing phase, the distorted image causes the corresponding classification failure, in other words, a successful adversarial attack.
To achieve the optimal image perturbation, we suppose $f$ is a classifier mapping an $n$-dimensional input vector to a discrete label set. For an original image $x$ and an attack targeted false label $t$, adversarial example generation can be defined as a perturbation minimization process:

$$\min_{\delta}\ \|\delta\|_p \quad \text{s.t.} \quad f(x + \delta) = t, \quad x + \delta \in [0, 1]^n \tag{3}$$

where $x$ and $x' = x + \delta$ are the original image and the adversarial example respectively, $\delta$ is the perturbation vector, $p$ is the regulation factor ($p = 1$ for the L1-norm, $p = 2$ for the L2-norm, and $p = \infty$ for the L∞-norm), and [0, 1] is the image pixel value bound constraint. ([0, 1] is normalized from the original pixel value ranges of the MNIST and Cifar10 datasets.)
2.2 State-of-the-Art Attacking Methods
Currently, several representative adversarial attack methods have been proposed for adversarial example generation, such as the Fast Gradient Method (FGM), the Basic Iterative Method (BIM), and DeepFool. All these methods can generate effective adversarial examples with pixel-level perturbations imperceptible to humans, as shown in Fig. 3.
Fast Gradient Method: FGM is one of the simplest and fastest adversarial attack methods. It uses a relatively large perturbation step with a hyper-parameter $\epsilon$ to generate the perturbation without a specific targeted false label:

$$x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big) \tag{4}$$

where $J$ is the NN loss function, usually the cross-entropy, $\theta$ is the NN parameter set, and $y$ is the true label. As a non-targeted adversarial attack, this method costs only one call to backpropagation, offering high attacking speed.
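As a rough illustration of the single-step update, the sketch below applies FGM to a toy binary logistic-regression classifier rather than a full NN, so that the input gradient has a closed form. The model, the helper names (`loss_gradient`, `fgm`) and the parameter values are assumptions made for this example, not the setup used in the experiments.

```python
import math


def loss_gradient(x, y, w, b):
    """Gradient of the cross-entropy loss w.r.t. the input x for a
    logistic-regression model p = sigmoid(w.x + b) (toy stand-in for an NN)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgm(x, y, w, b, eps):
    """One FGM step: move every pixel by eps along the sign of the loss
    gradient, then clip back into the valid pixel range [0, 1]."""
    grad = loss_gradient(x, y, w, b)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]
```

Only one gradient evaluation is needed, which is why the method is fast; every pixel with a non-zero gradient is moved, which is also why its perturbation rate tends to be high.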
Basic Iterative Method: BIM is an improved method composed of iterative FGM, which runs FGM multiple times with a small step size $\alpha$. Pixel values need to be clipped after each iteration to ensure that they stay within [0, 1] and in an $\epsilon$-neighborhood of the original image $x$:

$$x'_0 = x, \qquad x'_{i+1} = \mathrm{Clip}_{x,\epsilon}\big\{ x'_i + \alpha \cdot \mathrm{sign}\big(\nabla_x J(\theta, x'_i, y)\big) \big\} \tag{5}$$

This method is also a non-targeted attack, but takes more iterations than FGM. Both FGM and BIM are optimized for the L∞-norm because they restrict the perturbation step to be smaller than a certain threshold $\epsilon$.
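The iterative variant can be sketched in the same toy setting. As before, the logistic-regression model and the names `grad_x` and `bim` are assumptions for illustration only; the point is the per-iteration projection back into the ε-neighborhood and the [0, 1] pixel range.

```python
import math


def grad_x(x, y, w, b):
    """Input gradient of the cross-entropy loss for a toy logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def bim(x, y, w, b, eps, alpha, steps):
    """Repeat small sign-gradient steps of size alpha, clipping after each
    step so the result stays within eps of x and inside [0, 1]."""
    adv = list(x)
    for _ in range(steps):
        g = grad_x(adv, y, w, b)
        adv = [a + alpha * ((gi > 0) - (gi < 0)) for a, gi in zip(adv, g)]
        # Project into the eps-neighborhood of the original image.
        adv = [min(xi + eps, max(xi - eps, a)) for xi, a in zip(x, adv)]
        # Keep pixel values valid.
        adv = [min(1.0, max(0.0, a)) for a in adv]
    return adv
```

Each iteration costs one gradient evaluation, so the run time grows linearly with `steps`, in line with the higher computation cost reported for BIM.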
DeepFool: DeepFool is also a non-targeted attack technique, optimized for the L2-norm. This method assumes that an NN is a linear function with a hyperplane separating each class from the others. Under this assumption, it analytically derives the optimal perturbation that pushes the example across the hyperplane. Since NNs are not actually linear, it repeats this process iteratively until a successful adversarial example is found.
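For the special case of a binary affine classifier f(x) = w·x + b, the linear assumption holds exactly and the minimal L2 perturbation is the orthogonal projection onto the hyperplane, r = -f(x) w / ||w||². The following minimal sketch covers only this case; the function name and the small `overshoot` factor that pushes the point just past the boundary are assumptions for illustration.

```python
def deepfool_linear(x, w, b, overshoot=0.02):
    """Closed-form DeepFool step for a binary affine classifier
    f(x) = w.x + b: project x onto the decision hyperplane and
    overshoot slightly so that the predicted sign flips."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    r = [-(1.0 + overshoot) * f * wi / norm_sq for wi in w]
    return [xi + ri for xi, ri in zip(x, r)]
```

For a real NN the same step is applied to a local linearization and repeated until the label changes, which accounts for the extra computation time compared with FGM.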
2.3 Adversarial Training for Defense
For a given adversarial example, the adversarial attack succeeds when the NN wrongly recognizes the hidden adversarial pattern as the main pattern. This happens because NNs cannot generalize well on pictures containing both adversarial and main patterns. Thus, training on both adversarial and original examples can be seen as a data augmentation scheme: augmenting the training data to improve the generalization of NNs to adversarial examples. Previous works show that, when adversarial examples are added as a “vaccine” subset of the training data, the NN becomes more immune to adversarial attacks, with a significantly lower attack success rate. In this work, we also apply adversarial training to enhance the NN immunity to the adversarial attack; more details are presented in Section 5.
3 Performance Evaluation for Adversarial Attacks
In this section, we will investigate the performance of current adversarial attacks, and propose a novel comprehensive evaluation scheme based on the adversarial attack analysis.
3.1 Current Evaluation Metrics
In current adversarial attack works, the evaluation metrics mainly focus on error manipulation levels, such as: Attack Success Rate (1 - Prediction Accuracy), Perturbation Rate (ratio of the number of manipulated pixels to the image resolution), and Perturbation Degree (ratio of the total manipulated pixel value to the sum of all image pixel values). Table 1 shows how these metrics evaluate the performance of FGM, BIM and DeepFool when attacking the MNIST dataset.
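Table 1: Evaluation of FGM, BIM and DeepFool on MNIST with the current metrics.
| Metric | FGM | BIM | DeepFool |
| Attack Success Rate | 92.4% | 99.0% | 98.9% |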
However, those metrics cannot fully explain the effectiveness of these attacking methods. For example, FGM has both the highest perturbation rate and the highest perturbation degree, which should lead to the highest attack success rate, since more perturbation means a higher chance of a successful attack. In fact, however, FGM has the lowest success rate of 92.4%, compared to 99.0% for BIM and 98.9% for DeepFool. On the other hand, DeepFool has a much lower perturbation degree of 3.41% and a medium perturbation rate of 65.92%, yet achieves a success rate of 98.9%, nearly the same as BIM. Moreover, even though BIM and DeepFool have similar success rates, BIM has 4× the perturbation degree of DeepFool.
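The three conventional metrics follow directly from their definitions. The sketch below computes them for images stored as flat lists of pixel values in [0, 1]; the function names and the tolerance `tol` used to decide whether a pixel counts as manipulated are assumptions for this example.

```python
def attack_success_rate(true_labels, predicted_labels):
    """1 - prediction accuracy on the adversarial examples."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 1.0 - correct / len(true_labels)


def perturbation_rate(original, adversarial, tol=1e-12):
    """Fraction of pixels whose value was changed by the attack."""
    changed = sum(1 for o, a in zip(original, adversarial) if abs(a - o) > tol)
    return changed / len(original)


def perturbation_degree(original, adversarial):
    """Total absolute pixel change relative to the sum of the original
    pixel values (assumes the image is not entirely black)."""
    total_change = sum(abs(a - o) for o, a in zip(original, adversarial))
    return total_change / sum(original)
```

None of these quantities looks at where the perturbation lands, which is the gap the saliency-based metric in Section 3.2 is meant to fill.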
3.2 Adversarial Saliency Efficiency
In this work, we propose a new evaluation scheme, Adversarial Saliency Efficiency (ASE). Rather than only analyzing the attack result, ASE examines the precision and efficiency of an adversarial attack through adversarial saliency analysis. The intuition behind this scheme is to tell whether the adversarial example generation algorithm can precisely find the most vulnerable or sensitive pixels to perturb. Here, sensitivity is defined as how much prediction error a unit perturbation on the pixel can cause.
Mathematically, ASE is derived from pixel saliency analysis based on the Jacobian matrix, which describes the pixel vulnerability distribution for classification:

$$J_f(x) = \frac{\partial f(x)}{\partial x} = \left[ \frac{\partial f_j(x)}{\partial x_i} \right]_{i,j} \tag{6}$$

where $\partial f_j(x) / \partial x_i$ is the differential impact of pixel $x_i$ on the image being classified as label $j$. Based on the Jacobian matrix, the Adversarial Saliency Map (ASM) is defined by the following equation:
where a mask scheme is applied to separate out the attack robust pixels (identified by the true label’s derivative score or the sum of the false labels’ derivative scores). Hence, the ASM offers a comprehensive statistic of the vulnerable pixels of an image.
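One way to compute such a map from a Jacobian is sketched below. It follows the Jacobian-based saliency map of Papernot et al., on which this analysis builds: a pixel is masked to zero when increasing it would lower the score of label `t` or raise the combined score of the other labels. Whether the paper uses exactly these sign conventions is an assumption, as are the function name and the `jacobian[j][i]` layout.

```python
def saliency_map(jacobian, t):
    """Adversarial saliency scores for label t.

    jacobian[j][i] holds df_j/dx_i, the derivative of the score of
    label j with respect to pixel i (assumed layout)."""
    n_pixels = len(jacobian[0])
    scores = []
    for i in range(n_pixels):
        d_t = jacobian[t][i]
        d_others = sum(row[i] for j, row in enumerate(jacobian) if j != t)
        if d_t < 0 or d_others > 0:
            # Masked: perturbing this pixel does not help the attack.
            scores.append(0.0)
        else:
            scores.append(d_t * abs(d_others))
    return scores
```

Pixels with the largest scores are the ones regarded as most vulnerable in the rest of this section.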
With the ASM, ASE is then calculated from the divergence between the ASM distribution and the adversarial perturbation distribution generated by a specific adversarial attack method, which is defined as:

$$\mathrm{ASE} = \frac{\sum_{i \in \Omega_k} |\delta_i|}{\sum_{i=1}^{n} |\delta_i|} \tag{8}$$

where $\delta_i$ is the perturbation on pixel $i$ and $\Omega_k$ is the set of the $k$ pixels with the highest ASM scores. In Eq. 8, rather than evaluating every individual pixel in the adversarial attack, we choose the $k$ most vulnerable pixels with the highest derivative scores. The value of $k$ is selected with regard to the resolution of the input image, since an improper value could cause a significant interpretability issue for ASE. In our experiments, we choose $k$ = 50 for the MNIST dataset and $k$ = 100 for the Cifar10 dataset. The sum of the effective perturbations on the $k$ pixels is further normalized by $\sum_{i=1}^{n} |\delta_i|$, which is the perturbation summation over all the pixels of the image regardless of their vulnerability.
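Read this way, ASE is the share of the total perturbation that falls on the k most salient pixels. A minimal sketch under that reading, with an assumed function name and flat-list inputs:

```python
def adversarial_saliency_efficiency(saliency, perturbation, k):
    """Share of the total absolute perturbation that lands on the k
    pixels with the highest saliency scores."""
    top_k = sorted(range(len(saliency)),
                   key=lambda i: saliency[i], reverse=True)[:k]
    effective = sum(abs(perturbation[i]) for i in top_k)
    total = sum(abs(p) for p in perturbation)
    return effective / total if total > 0 else 0.0
```

As a sanity check on this reading, a perturbation spread uniformly over a 28×28 MNIST image with k = 50 would give 50/784 ≈ 6.4%, so an attack scoring near that value is barely more precise than uniform noise.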
To evaluate the effectiveness of the proposed ASE, we applied it to the three methods, as shown in Fig. 4. The ASEs of the three methods are 5.6%, 10.1% and 17.9% respectively, and ASE describes their efficiency very well. For FGM, even though its perturbation rate is very high, its ASE is low, which means most of the perturbation is not useful for the attack. This leaves the classification accuracy higher than for the other two attacks, which means a lower attack success rate. On the other hand, the low perturbation rate and high ASE of DeepFool indicate outstanding pixel attack precision, which makes its attack success rate optimal. For BIM, both its ASE and its perturbation degree are relatively high, indicating the best performance with a precise and concentrated attack.
With our proposed evaluation scheme, adversarial attack efficiency can be comprehensively evaluated, and the results can be applied to guide fast adversarial attack as well as defense.
4 ASP Framework Design
Due to complex pixel vulnerability analysis and backpropagation computation, current adversarial attack methods all suffer from high computation cost. In this work, we propose our fast adversarial example generation framework, ASP. Building on the previous ASE analysis, we first propose a new adversarial saliency prediction method, which can effectively analyze and extract a general adversarial pattern for a dedicated attack. With this prediction, the dedicated pixel analysis for each adversarial example can be avoided, which significantly improves the attack speed.
4.1 Adversarial Saliency Prediction
From the previous analysis, we know that the ASM score can be used to find the most sensitive or vulnerable pixels to attack, and most adversarial attacks should follow a specific adversarial pattern along the distribution of those vulnerable pixels. An example is shown in Fig. 2, which is an adversarial example attacking MNIST classification with a target of 3 → 8. The best attack performance is achieved by perturbing the most vulnerable pixels, which mainly lie in the trace difference between 3 and 8. In fact, not only for 3 → 8 but for any other pair of original class and target class, the main patterns should be analogous.
Therefore, a general pattern in the ASM should be predictable for a given pair of classes, and can be used for attack analysis and computation optimization. Suppose we have an N-class dataset. For each pair of original class and target class, we can predict the ASM pattern by producing a large ASM training dataset, on which we use a linear regression algorithm to predict the ASM distribution, that is, the general pattern of the saliency maps of that pair. The detailed algorithm for adversarial saliency prediction is shown in Algorithm 1.
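Algorithm 1 is not reproduced here, so the sketch below shows only the simplest possible instance of the idea: with no input features, the least-squares estimate of each pixel's score is the mean over the training saliency maps of a class pair. Treating the regression as this intercept-only fit is an assumption, as are the function names and the `(source, target, map)` sample layout.

```python
def predict_pattern(saliency_maps):
    """Per-pixel least-squares constant fit, i.e. the mean saliency
    score of each pixel over all training maps of one class pair."""
    n = len(saliency_maps)
    return [sum(column) / n for column in zip(*saliency_maps)]


def build_patterns(samples):
    """Group (source_class, target_class, saliency_map) samples by
    class pair and predict one general pattern per pair."""
    grouped = {}
    for source, target, smap in samples:
        grouped.setdefault((source, target), []).append(smap)
    return {pair: predict_pattern(maps) for pair, maps in grouped.items()}
```

For an N-class dataset this yields at most N(N-1) patterns, which are computed once and then reused for every image attacked with the same class pair.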
A sample prediction for MNIST and Cifar10 is shown in Fig. 5. For the MNIST dataset, we can see clear patterns for a series of target adversarial saliency maps, which are quite easy to understand: attacking these pixels alters the original handwriting shape towards the target class’s shape. For the Cifar10 dataset, since the images have three channels (RGB), the patterns are not as straightforward to interpret, but they all perform well in the following attack phase, which will be discussed in Section 6.
4.2 Fast Adversarial Example Generation
With the predicted adversarial pattern, attackers can directly use it to replace the time-consuming gradient computation for individual adversarial example generation. Specifically, we first distort the values of the most sensitive pixels according to the order of our ASP pattern scores and a certain perturbation rate. Most examples succeed in causing misclassification after this step. Fig. 6 shows one such adversarial image generation example on MNIST. From it we can see that the predicted pattern is quite accurate and matches the intuitive adversarial pattern. In addition, for unsuccessfully attacked images, we further adaptively distort more pixels. The overall algorithm is shown in Algorithm 1.
To maximize adversarial attack effectiveness, we further optimize the adversarial example generation process. A higher perturbation rate not only increases the attack success rate but also makes the adversarial feature perceivable. So, to analyze the trade-off between the perturbation rate and the attack success rate, we ran a series of tests and eventually chose 21.7% as the perturbation rate, resulting in 1.4% test accuracy (or a 98.6% attack success rate) with only a few unsuccessful examples.
For those ineffective examples, we further adaptively increase their perturbation rate, modifying additional pixels according to the ASM scores in steps of 10 per prediction test iteration. Once the prediction result becomes false, we stop the process. Since the number of these examples is small, the adaptive perturbation process does not take considerable time or substantially affect the average perturbation rate. Thus, our attacking method achieves an attack success rate as high as possible while keeping the perturbation rate as low as possible.
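The two-stage procedure described above can be sketched as follows. This is a simplified illustration under several assumptions: `predict` is any callable returning a label, the perturbation adds a fixed `delta` to each chosen pixel (the direction and size of the change are not specified in the text), and the adaptive step is interpreted as 10 additional pixels per test.

```python
def asp_attack(image, pattern, predict, true_label, n_init, step=10, delta=1.0):
    """Perturb the n_init highest-scoring pixels of the predicted pattern,
    then keep adding `step` more pixels until the prediction changes.

    Returns the adversarial image and the number of perturbed pixels."""
    order = sorted(range(len(pattern)), key=lambda i: pattern[i], reverse=True)
    adv = list(image)
    n_used, n_target = 0, n_init
    while n_used < len(order):
        for i in order[n_used:n_target]:
            adv[i] = min(1.0, max(0.0, adv[i] + delta))
        n_used = min(n_target, len(order))
        if predict(adv) != true_label:
            break  # attack succeeded, stop adding perturbation
        n_target = n_used + step
    return adv, n_used
```

No gradient is computed at attack time; the only per-image cost is the forward passes inside `predict`, and most images need just one.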
4.3 ASP Framework Overview
In this section, we combine the previously proposed algorithms and methods to form an effective fast adversarial example generation framework, namely ASP:
During the prediction training: we first use the chain forward derivative to calculate the Jacobian matrix for each image with different false labels in the training set. Then we build the saliency maps based on the Jacobian matrix. With the saliency map training data, we use a high performance server to predict the map pattern for each class pair. We call the above steps the training session.
During the adversarial attack: with the perturbation distribution given by the predicted general pattern, we can directly apply the perturbation to the image under attack. This costs far fewer resources than other algorithms because no gradients are calculated. By choosing the best-fit parameters to trade off adversarial effectiveness against perturbation rate, the number of ineffective adversarial examples can be minimized. Additional perturbation is then applied iteratively to the ineffective examples until the attack succeeds. The overview of the proposed fast adversarial example generation framework is shown in Algorithm 1.
5 ASP based Adversarial Training
As mentioned above, adversarial training can effectively enhance the NN defense capability. In this section, we propose a fast adversarial training framework based on the proposed ASP.
5.1 Mechanism of Adversarial Training
Adversarial training can be seen as a network regularization process with a data augmentation scheme, which augments the training data with adversarial examples to improve the NN generalization capability for better tolerance of adversarial examples.
The loss functions of normal training and adversarial training can be formulated as:

$$J(\theta, x, y) \quad \text{and} \quad J(\theta, x + \delta, y) \tag{9}$$

where $\theta$ is the parameter set of the NN, $\delta$ is the perturbation value from the adversarial attack, and $x + \delta$ therefore represents the adversarial example. By integrating these two loss functions, the adversarial training procedure can be formulated as minimizing the following function:

$$\tilde{J}(\theta, x, y) = \lambda\, J(\theta, x, y) + (1 - \lambda)\, J(\theta, x + \delta, y) \tag{10}$$

where $\lambda$ is the parameter that adjusts the weight ratio between the original dataset and the adversarial examples, for which we choose $\lambda = 0.5$, considering that the original dataset and the adversarial examples are equally important.
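The combined objective can be illustrated for a single training sample. The sketch assumes a cross-entropy loss and a convex combination of the clean and adversarial terms with equal weights; the function names and the way class probabilities are passed in are assumptions for this example.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy loss of one sample given its class probabilities."""
    return -math.log(probs[label])


def adversarial_loss(probs_clean, probs_adv, label, weight=0.5):
    """Weighted sum of the loss on the original example and the loss on
    its adversarial counterpart, both scored against the true label."""
    clean_term = cross_entropy(probs_clean, label)
    adv_term = cross_entropy(probs_adv, label)
    return weight * clean_term + (1.0 - weight) * adv_term
```

Every evaluation of this loss needs a fresh adversarial counterpart of the sample, which is what makes the training data-hungry.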
This adversarial training process eventually minimizes the prediction error as well as the adversarial attack success rate. However, from Eq. 10, we can also tell that adversarial training is a data-hungry process with respect to adversarial examples. Considering the low adversarial example generation speed of current adversarial attack methods, the adversarial training efficiency is highly compromised. (The computation time of the adversarial attacks will be quantitatively investigated in Section 6.)
5.2 Fast Adversarial Training Framework
The proposed high performance ASP provides an optimal solution to the data-hungry adversarial training process. As shown in Fig. 7, we combine the fast adversarial example generation with adversarial training to improve the defense capability of the NN.
Suppose we have an N-class dataset. We first use our fast adversarial example generation method, ASP, to attack the training dataset samples. Specifically, ASP produces N-1 adversarial examples for each training sample, one for each false target class. With ASP, large numbers of adversarial examples can be produced efficiently. Combining these adversarial examples with their original labels, we get an adversarial training dataset N times the size of the original dataset (N classes for the original examples and N(N-1) class pairs for the adversarial examples). Then the adversarial loss function is used to train the neural network. When the adversarial training phase is done, the neural network becomes more robust to adversarial attacks, as discussed in Section 6.4.
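The dataset growth can be checked with a few lines of arithmetic, assuming one adversarial example per false target class for every training sample. The helper names are hypothetical.

```python
def attack_pairs(num_classes):
    """All (original class, target class) pairs with distinct classes."""
    return [(s, t) for s in range(num_classes)
            for t in range(num_classes) if s != t]


def augmented_size(num_samples, num_classes):
    """Original samples plus (num_classes - 1) adversarial examples each."""
    return num_samples + num_samples * (num_classes - 1)
```

For a 10-class dataset this gives 90 class pairs, and a training set of 60,000 images grows to 600,000 samples.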
6 Performance and Evaluation
6.1 Experiment Setup
We mainly test our algorithms on the two most popular datasets for image classification, MNIST and Cifar10. For MNIST handwritten digit classification, we use a four-layer convolutional neural network as the test object, which contains 3 convolutional layers with the ReLU activation function and 1 fully-connected layer with a SoftMax output. This model achieves 99.2% classification accuracy after 10 training epochs. For Cifar10 image classification, we use a ten-layer convolutional neural network trained with dropout as the test object, which includes 6 convolutional layers, 3 max pooling layers and 1 fully-connected layer with a SoftMax output. This model achieves 85% classification accuracy on test images after 100 training epochs. We evaluate all attack methods on these two models within the same test environment: TensorFlow 1.3 with CUDA support on a GTX 1080 (8 GB) GPU. In addition, for computation time evaluation, we set the parameters of all algorithms to cause just above 90% and 85% misclassification rates on MNIST and Cifar10 respectively, to ensure they produce the same adversarial effect. The adversarial attacks FGM, BIM and DeepFool are run using v2.0.0 of the CleverHans library.
6.2 Performance of ASP on MNIST
In this section, we compare our ASP algorithm with the existing algorithms FGM, BIM and DeepFool, using the evaluation metrics of perturbation rate, attack success rate, computing cost and ASE. First, the results under different perturbation degrees are shown in Fig. 8.
As Fig. 8 shows, under low perturbation degree, the ASP algorithm performs better than FGM and BIM, which indicates that ASP is more precise and effective than gradient propagation. The DeepFool algorithm achieves the best attacking performance with the lowest perturbation degree of 0.05, but this comes with a much higher computation overhead. Note that DeepFool is a heuristic searching algorithm, so its perturbation degree is a fixed value.
From Fig. 9, we can first clearly see that, except for FGM, BIM, DeepFool and ASP all achieve 1% test accuracy, which means a 99% attack success rate. Suffering from imprecise adversarial pattern calculation, FGM has the lowest attack success rate, since its test accuracy of 7.2% is the highest of all. Another significant result is that DeepFool needs much more computation time to attack 1000 MNIST images than the other algorithms: DeepFool takes 44.3s, while our ASP algorithm takes only 0.44s. By comparison, FGM and BIM take 0.94s and 7.65s respectively. Benefiting from the pre-trained pattern, ASP has the shortest attacking time and also the lowest computation requirements. In addition, ASP has the lowest perturbation rate (3 times lower than all other algorithms) and the highest ASE, which further supports the effectiveness of the ASP prediction.
6.3 Performance of ASP on Cifar10
For the Cifar10 dataset, we first evaluate the ASP attack success rate under different perturbation rates, as shown in Fig. 10. Here, the perturbation step denotes the per-pixel perturbation step size. As Fig. 10 shows, with just a 20% to 30% perturbation rate and a perturbation step of 0.2, ASP achieves an 85% attack success rate. This indicates that the predicted saliency pattern in the ASP framework can also deliver accurate and effective attacks on more complex images.
For performance comparison, we use the same perturbation step size for FGM and BIM. Due to the different attack mechanisms, however, different perturbation degrees are produced. Thus, in order to compare their performance more precisely, we use three different parameter sets for ASP so that they produce nearly the same perturbation degrees as our benchmark algorithms. Specifically, ASP_1 produces the same perturbation degree as FGM (21%), ASP_2 the same as BIM (11%), and ASP_3 the same as DeepFool (3%). Fig. 11 shows the ASP performance comparison results. For all three sets, ASP achieves the same or a better attack success rate (all over 83%). In addition, the perturbation rates of all three ASP settings are lower than those of their counterparts, which means the ASP attack is more precise and concentrated. Most importantly, ASP takes only 2.1s to attack 1000 images, which is about 1.6× faster than FGM (3.4s) and about 12× faster than BIM (25.1s) and DeepFool (26.9s).
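Table 2: Test accuracy of the NN on MNIST before and after adversarial training.
| Test | FGM | BIM | ASP |
| Normal Test on original DNN | 99.0% | 99.0% | 99.0% |
| Ad.Test on original DNN | 7.38% | 0.79% | 1.71% |
| Ad.Test on Ad-trained DNN | 96.5% | 52.3% | 45.94% |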
In summary, ASP significantly outperforms FGM, BIM and DeepFool on both MNIST and Cifar10, with 2× to 3× lower perturbation rates, 1.5× higher attack efficiency, and most importantly, a maximum attack speed-up of 12× to 100×.
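The speed-up figures can be re-derived from the timings reported in Sections 6.2 and 6.3 (seconds per 1000 attacked images). The dictionary layout and function name below are only for illustration.

```python
# Reported time in seconds to attack 1000 images.
timings = {
    "MNIST": {"FGM": 0.94, "BIM": 7.65, "DeepFool": 44.3, "ASP": 0.44},
    "Cifar10": {"FGM": 3.4, "BIM": 25.1, "DeepFool": 26.9, "ASP": 2.1},
}


def speedups(times, ours="ASP"):
    """Ratio of each baseline's run time to the run time of `ours`."""
    return {name: t / times[ours] for name, t in times.items() if name != ours}


for dataset, times in timings.items():
    ratios = {name: round(r, 1) for name, r in speedups(times).items()}
    print(dataset, ratios)
```

The largest ratios are about 100× on MNIST and about 12.8× on Cifar10, both against DeepFool, which matches the 12× to 100× maximum speed-up quoted above.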
6.4 Adversarial Training Performance
We used the FGM, BIM and ASP algorithms to generate large numbers of adversarial examples on the MNIST dataset for training purposes. Using the adversarial training objective function, we re-trained our previous model with the augmented training dataset. Table 2 shows the accuracy of the NN before and after adversarial training, which indicates that adversarial training can effectively enhance the defense capability of the NN against adversarial attacks.
In this work, we proposed a new fast adversarial example generation framework based on adversarial saliency prediction. Compared with current state-of-the-art methods, ASP achieves up to 12× speed-up for adversarial example generation, 2× lower perturbation rate, and a high attack success rate of 87% on both MNIST and Cifar10. In addition, we also used ASP to support the data-hungry NN adversarial training, which effectively enhances the robustness of NNs to adversarial attacks by reducing the attack success rate by 45% to 90%.
- R. T. Azuma, “A survey of augmented reality,” Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pp. 355–385, 1997.
- J. Hirschberg et al., “Advances in natural language processing,” Science, vol. 349, no. 6245, pp. 261–266, 2015.
- M. Bojarski et al., “End to end learning for self-driving cars,” arXiv:1604.07316, 2016.
- C. Szegedy et al., “Intriguing properties of neural networks,” arXiv:1312.6199, 2013.
- I. Goodfellow et al., “Explaining and harnessing adversarial examples,” arXiv:1412.6572, 2014.
- A. Kurakin et al., “Adversarial examples in the physical world,” arXiv:1607.02533, 2016.
- I. Evtimov et al., “Robust physical-world attacks on deep learning models,” arXiv:1707.08945, 2017.
- N. Papernot et al., “Distillation as a defense to adversarial perturbations against deep neural networks,” in Security and Privacy (SP), 2016 IEEE Symposium on, 2016, pp. 582–597.
- S.-M. Moosavi-Dezfooli et al., “DeepFool: a simple and accurate method to fool deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2574–2582.
- N. Carlini et al., “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP), 2017 IEEE Symposium on, 2017, pp. 39–57.
- Y. LeCun, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist/, 1998.
- A. Krizhevsky et al., “The CIFAR-10 dataset,” http://www.cs.toronto.edu/~kriz/cifar.html, 2014.
- Y. LeCun et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
- N. Papernot et al., “The limitations of deep learning in adversarial settings,” in Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, 2016, pp. 372–387.
- M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv:1603.04467, 2016.
- N. Papernot et al., “cleverhans v2.0.0: an adversarial machine learning library,” arXiv:1610.00768, 2017.