Improving Transferability of Adversarial Examples with Input Diversity
Abstract
Though convolutional neural networks have achieved state-of-the-art performance on various vision tasks, they are extremely vulnerable to adversarial examples, which are obtained by adding human-imperceptible perturbations to the original images. Adversarial examples can thus be used as a useful tool to evaluate and select the most robust models in safety-critical applications. However, most existing adversarial attacks achieve only relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate adversarial examples, our method applies random transformations to the input images at each iteration. Extensive experiments on ImageNet show that the proposed attack method can generate adversarial examples that transfer much better to different networks than existing baselines. To further improve the transferability, we (1) integrate the recently proposed momentum method into the attack process; and (2) attack an ensemble of networks simultaneously. By evaluating our method against top defense submissions and official baselines from the NIPS 2017 adversarial competition, this enhanced attack reaches an average success rate of 73.0%, which outperforms the top attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a benchmark for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. The code is publicly available at https://github.com/cihangxie/DI2FGSM.
Keywords:
Adversarial Examples, Black-Box Attacks
1 Introduction
The recent success of convolutional neural networks (CNNs) has led to dramatic performance improvements on various vision tasks, including image classification [13, 28, 11], object detection [8, 24, 36] and semantic segmentation [18, 3]. However, CNNs are extremely vulnerable to small perturbations to the input images, i.e., human-imperceptible additive perturbations can cause CNNs to make wrong predictions. These intentionally crafted images are known as adversarial examples [32]. Learning how to generate adversarial examples can help us investigate the robustness of different models [1] and understand the insufficiency of current training algorithms [9, 15, 33].
Several methods [9, 32, 14] have been proposed recently to find adversarial examples. In general, these attacks can be categorized into two types, single-step attacks [9] and iterative attacks [32, 14], according to the number of steps of gradient computation. Under the white-box setting, where the attackers have perfect knowledge of the network structure and weights, iterative attacks can generate adversarial examples with much higher success rates than those generated by single-step attacks. However, if these adversarial examples are tested on a different network (either in terms of network structure, weights or both), i.e., the black-box setting, single-step attacks achieve higher success rates than iterative attacks. This tradeoff arises because iterative attacks tend to overfit the specific network parameters (i.e., have high white-box success rates), so the generated adversarial examples rarely transfer to other networks (i.e., have low black-box success rates), while single-step attacks usually underfit the network parameters (i.e., have low white-box success rates) and thus produce adversarial examples with slightly better transferability. Given this phenomenon, one interesting question is whether we can generate adversarial examples with high success rates under both white-box and black-box settings.
Data augmentation [13, 28, 11] has been shown to be an effective way to prevent networks from overfitting during the training process. Specifically, a set of label-preserving transformations, e.g., resizing, cropping and rotating, are applied to the images to enlarge the training set. Consequently, the trained networks generalize better to unseen images. Meanwhile, [34, 10] showed that image transformations can defend against adversarial examples under certain situations, which indicates that adversarial examples cannot generalize well under different transformations. These transformed adversarial examples are known as hard examples [26, 27] for attackers, and can then serve as good samples to produce more transferable adversarial examples.
To this end, we propose the Diverse Input Iterative Fast Gradient Sign Method (DI^{2}FGSM) to improve the transferability of adversarial examples. At each iteration, unlike the traditional methods which maximize the loss function directly w.r.t. the original inputs, we apply random and differentiable transformations to the input images with probability $p$ and maximize the loss function w.r.t. these transformed inputs. In particular, the transformations used here are random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner. Note that these randomized operations were previously used to defend against adversarial examples [34], while here we incorporate them into the attack process to create hard and diverse input patterns. Figure 1 shows an adversarial example generated by our proposed attack method, DI^{2}FGSM, and compares its success rates to other attack methods under both white-box and black-box settings.
We test the proposed attack method on several networks under both white-box and black-box settings. Compared with traditional iterative attacks, the results on ImageNet (see Section 4.2) show that DI^{2}FGSM gets significantly higher success rates for black-box models, and maintains similar success rates for white-box models. To improve the transferability of adversarial examples further, we (1) integrate a momentum term [7] into the attack process; and (2) attack multiple networks simultaneously [17]. By evaluating our attack method w.r.t. the top defense submissions and official baselines from the NIPS 2017 adversarial competition, this enhanced attack reaches an average success rate of 73.0%, which outperforms the top attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a benchmark for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.
2 Related Work
2.1 Generating Adversarial Examples
Traditional machine learning algorithms are known to be vulnerable to adversarial examples [5, 12, 2]. Recently, Szegedy et al. [32] pointed out that CNNs are also fragile to adversarial examples, and proposed a box-constrained L-BFGS method to find adversarial examples reliably. Due to the expensive computation in [32], Goodfellow et al. [9] proposed the fast gradient sign method to generate adversarial examples efficiently by performing a single gradient step. This method was then extended by [14] to an iterative version, which showed that the generated adversarial examples can exist in the physical world. Dong et al. [7] proposed a broad class of momentum-based iterative algorithms to boost the transferability of adversarial examples. The transferability can also be improved by attacking an ensemble of networks simultaneously [17]. Besides image classification, adversarial examples also exist in object detection [35], semantic segmentation [35, 4], speech recognition [4], deep reinforcement learning [16], etc. Unlike adversarial examples, which can be recognized by humans, Nguyen et al. [21] generated fooling images that are different from natural images and difficult for humans to recognize, but that CNNs classify as recognizable objects with high confidence.
2.2 Defending Against Adversarial Examples
Conversely, many methods have been proposed recently to defend against adversarial examples. [9, 15] proposed to inject adversarial examples into the training data to increase the network robustness. Tramèr et al. [33] pointed out that such adversarially trained models still remain vulnerable to adversarial examples, and proposed ensemble adversarial training, which augments training data with perturbations transferred from other models, in order to improve the network robustness further. [34, 10] applied randomized image transformations to inputs at inference time to mitigate adversarial effects. Dhillon et al. [6] pruned a random subset of activations according to their magnitude to enhance network robustness. Prakash et al. [23] proposed a framework which combines pixel deflection with soft wavelet denoising to defend against adversarial examples. [20, 29, 25] leveraged generative models to purify adversarial images by moving them back towards the distribution of clean images.
3 Methodology
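Let $X$ denote an image, and $y^{\text{true}}$ denote the corresponding ground-truth label. We use $\theta$ to denote the network parameters, and $L(X, y^{\text{true}}; \theta)$ to denote the loss. For the adversarial example generation, the goal is to maximize the loss $L(X + r, y^{\text{true}}; \theta)$ for the image $X$, under the constraint that the generated adversarial example $X^{\text{adv}} = X + r$ should look visually similar to the original image $X$ and the corresponding predicted label $y^{\text{adv}} \neq y^{\text{true}}$. In this paper, we use the $l_\infty$-norm to measure the perceptibility of adversarial perturbations, i.e., $\|r\|_\infty \leq \epsilon$. The loss function is defined as
$$L(X, y^{\text{true}}; \theta) = -\mathbb{1}_{y^{\text{true}}} \cdot \log\left(\mathrm{softmax}(l(X; \theta))\right), \qquad (1)$$
where $\mathbb{1}_{y^{\text{true}}}$ is the one-hot encoding of the ground-truth label $y^{\text{true}}$, and $l(X; \theta)$ is the logits output. Note that all the baseline attacks have been implemented in the cleverhans library [22], which can be used directly for our experiments.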
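As a concrete reading of the loss in Equation (1), the following minimal Python sketch computes the softmax cross-entropy from a logits vector and an integer ground-truth label. The function names `softmax` and `cross_entropy` are illustrative and are not taken from the cleverhans library; the sketch assumes that the dot product with the one-hot vector simply selects the log-probability of the true class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, y_true):
    """Negative log-probability of the ground-truth class y_true.

    Equivalent to -onehot(y_true) . log(softmax(logits)).
    """
    return -math.log(softmax(logits)[y_true])

# A confident correct prediction gives a small loss,
# a confident wrong prediction gives a large one.
print(cross_entropy([5.0, 0.0], 0))
print(cross_entropy([0.0, 5.0], 0))
```

An attack that maximizes this quantity w.r.t. the input pushes probability mass away from the ground-truth class.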
3.1 Family of Fast Gradient Sign Methods
In this section, we give an overview of the family of fast gradient sign methods:

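Fast Gradient Sign Method (FGSM): FGSM [9] is the first member in this attack family, which finds the adversarial perturbations in the direction of the loss gradient $\nabla_X L(X, y^{\text{true}}; \theta)$. The update equation is
$$X^{\text{adv}} = X + \epsilon \cdot \mathrm{sign}\left(\nabla_X L(X, y^{\text{true}}; \theta)\right). \qquad (2)$$

Iterative Fast Gradient Sign Method (IFGSM): Kurakin et al. [15] extended FGSM to an iterative version, which can be expressed as
$$X_0^{\text{adv}} = X, \qquad (3)$$
$$X_{n+1}^{\text{adv}} = \mathrm{Clip}_X^{\epsilon}\left\{ X_n^{\text{adv}} + \alpha \cdot \mathrm{sign}\left(\nabla_X L(X_n^{\text{adv}}, y^{\text{true}}; \theta)\right) \right\}, \qquad (4)$$
where $\mathrm{Clip}_X^{\epsilon}$ indicates that the resulting image is clipped within the $\epsilon$-ball of the original image $X$, $n$ is the iteration number and $\alpha$ is the step size.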

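Momentum Iterative Fast Gradient Sign Method (MIFGSM): MIFGSM [7] integrates a momentum term into the attack process to stabilize update directions and escape from poor local maxima. The updating procedure is similar to IFGSM, with the replacement of Equation (4) by:
$$g_{n+1} = \mu \cdot g_n + \frac{\nabla_X L(X_n^{\text{adv}}, y^{\text{true}}; \theta)}{\left\|\nabla_X L(X_n^{\text{adv}}, y^{\text{true}}; \theta)\right\|_1}, \qquad (5)$$
$$X_{n+1}^{\text{adv}} = \mathrm{Clip}_X^{\epsilon}\left\{ X_n^{\text{adv}} + \alpha \cdot \mathrm{sign}(g_{n+1}) \right\}, \qquad (6)$$
where $\mu$ is the decay factor of the momentum term and $g_n$ is the accumulated gradient at iteration $n$.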
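The three update rules above can be sketched in plain Python on a toy problem. This is only an illustration under stated assumptions: the "network" is a hypothetical two-class linear classifier with an analytic gradient (`loss_and_grad`), inputs are short lists rather than images, and no pixel-range clipping is applied. The function names are illustrative and do not correspond to the cleverhans API.

```python
import math

def loss_and_grad(W, x, y):
    """Cross-entropy of a toy linear classifier (logits = W x) and its gradient w.r.t. x."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[y])
    # d(loss)/dx_i = sum_k (probs[k] - onehot[k]) * W[k][i]
    grad = [sum((probs[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def clip_eps(x_adv, x, eps):
    """Project x_adv back into the l_inf ball of radius eps around x."""
    return [min(max(a, o - eps), o + eps) for a, o in zip(x_adv, x)]

def fgsm(W, x, y, eps):
    """One step of size eps along the gradient sign."""
    _, g = loss_and_grad(W, x, y)
    return [v + eps * sign(gi) for v, gi in zip(x, g)]

def ifgsm(W, x, y, eps, alpha, n_iter):
    """Repeated steps of size alpha, each clipped to the eps-ball."""
    x_adv = list(x)
    for _ in range(n_iter):
        _, g = loss_and_grad(W, x_adv, y)
        x_adv = clip_eps([a + alpha * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv

def mifgsm(W, x, y, eps, alpha, n_iter, mu):
    """Like ifgsm, but steps follow the sign of an accumulated, l1-normalized gradient."""
    x_adv, acc = list(x), [0.0] * len(x)
    for _ in range(n_iter):
        _, g = loss_and_grad(W, x_adv, y)
        norm = sum(abs(gi) for gi in g) or 1.0  # guard against a zero gradient
        acc = [mu * a + gi / norm for a, gi in zip(acc, g)]
        x_adv = clip_eps([a + alpha * sign(ai) for a, ai in zip(x_adv, acc)], x, eps)
    return x_adv

W = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical 2-class, 2-feature model
x, y = [0.5, 0.0], 0
for name, x_adv in [("FGSM", fgsm(W, x, y, 0.3)),
                    ("IFGSM", ifgsm(W, x, y, 0.3, 0.1, 5)),
                    ("MIFGSM", mifgsm(W, x, y, 0.3, 0.1, 5, 1.0))]:
    print(name, x_adv, loss_and_grad(W, x_adv, y)[0])
```

On a linear toy model all three variants end up at the same corner of the $\epsilon$-ball, so the sketch illustrates the update rules only, not the difference in transferability discussed in the text.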
3.2 Diverse Inputs Iterative Fast Gradient Sign Method
Overfitting Phenomenon
Let $\hat{\theta}$ denote the unknown network parameters. In general, a strong adversarial example should have high success rates on both white-box models, i.e., $L(X^{\text{adv}}, y^{\text{true}}; \theta)$, and black-box models, i.e., $L(X^{\text{adv}}, y^{\text{true}}; \hat{\theta})$. On one hand, the traditional single-step attacks, e.g., FGSM, tend to underfit the specific network parameters $\theta$ due to an inaccurate linear approximation of the loss $L(X, y^{\text{true}}; \theta)$, and thus cannot reach high success rates on white-box models. On the other hand, the traditional iterative attacks, e.g., IFGSM, greedily perturb the images in the direction of the sign of the loss gradient $\nabla_X L(X, y^{\text{true}}; \theta)$ at each iteration, and thus easily fall into poor local maxima and overfit the specific network parameters $\theta$. These overfitted adversarial examples rarely transfer to black-box models. In order to generate adversarial examples with strong transferability, we need to find a better way to optimize the loss $L(X, y^{\text{true}}; \theta)$ to alleviate this overfitting phenomenon.
Data augmentation [13, 28, 11] has been shown to be an effective way to prevent networks from overfitting during the training process. Meanwhile, [34, 10] showed that adversarial examples are no longer malicious if simple image transformations are applied, which indicates that these transformed adversarial images can serve as good samples for better optimization.
Our Solution
Based on the analysis above, we propose the Diverse Inputs Iterative Fast Gradient Sign Method (DI^{2}FGSM), which applies image transformations $T(\cdot)$ to the original inputs $X$ with probability $p$ at each iteration to alleviate the overfitting phenomenon. Specifically, the image transformations applied here are random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner [34]. The transformation probability $p$ controls the tradeoff between success rates on white-box models and success rates on black-box models, which can be observed from Figure 3. If $p = 0$, DI^{2}FGSM degrades to IFGSM and leads to overfitting. If $p = 1$, i.e., only transformed inputs are used for the attack, the generated adversarial examples tend to have much higher success rates on black-box models but lower success rates on white-box models, since the original inputs are not seen by the attackers.
In general, the updating procedure of DI^{2}FGSM is similar to IFGSM, with the replacement of Equation (4) by:
$$X_{n+1}^{\text{adv}} = \mathrm{Clip}_X^{\epsilon}\left\{ X_n^{\text{adv}} + \alpha \cdot \mathrm{sign}\left(\nabla_X L(T(X_n^{\text{adv}}; p), y^{\text{true}}; \theta)\right) \right\}, \qquad (7)$$
where the stochastic transformation function $T(X_n^{\text{adv}}; p)$ is:
$$T(X_n^{\text{adv}}; p) = \begin{cases} T(X_n^{\text{adv}}) & \text{with probability } p \\ X_n^{\text{adv}} & \text{with probability } 1 - p \end{cases} \qquad (8)$$
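The stochastic transformation can be illustrated with a short, standard-library-only sketch. It assumes a square single-channel image stored as a list of lists, uses nearest-neighbor resizing as a stand-in for whatever interpolation an actual implementation uses, and introduces the hypothetical parameters `min_size` and `out_size` for the resize range and the padded output size. It shows only the forward sampling; in a real attack the transformation must be differentiable so that gradients flow back to the input.

```python
import random

def resize_nearest(img, new_size):
    """Nearest-neighbor resize of a square image (list of rows) to new_size x new_size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_size][c * w // new_size] for c in range(new_size)]
            for r in range(new_size)]

def random_pad(img, out_size, rng):
    """Place img at a random offset inside an out_size x out_size canvas of zeros."""
    size = len(img)
    top = rng.randint(0, out_size - size)
    left = rng.randint(0, out_size - size)
    out = [[0.0] * out_size for _ in range(out_size)]
    for r in range(size):
        for c in range(size):
            out[top + r][left + c] = img[r][c]
    return out

def diverse_input(img, p, min_size, out_size, rng):
    """With probability p return a randomly resized and padded copy, else img itself."""
    if rng.random() >= p:
        return img
    rnd = rng.randrange(min_size, out_size)  # random size in [min_size, out_size)
    return random_pad(resize_nearest(img, rnd), out_size, rng)

rng = random.Random(0)
img = [[1.0, 2.0], [3.0, 4.0]]
for row in diverse_input(img, 1.0, 3, 5, rng):
    print(row)
```

Setting `p` to 0 always returns the untouched input, and setting it to 1 always returns a transformed one, which matches the two limiting cases described above.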
3.3 Momentum Diverse Inputs Iterative Fast Gradient Sign Method
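Intuitively, momentum and diverse inputs are two completely different ways to alleviate the overfitting phenomenon. We can combine them naturally to form a much stronger attack, i.e., the Momentum Diverse Inputs Iterative Fast Gradient Sign Method (MDI^{2}FGSM). The overall updating procedure of MDI^{2}FGSM is similar to MIFGSM, with only the replacement of Equation (5) by:
$$g_{n+1} = \mu \cdot g_n + \frac{\nabla_X L(T(X_n^{\text{adv}}; p), y^{\text{true}}; \theta)}{\left\|\nabla_X L(T(X_n^{\text{adv}}; p), y^{\text{true}}; \theta)\right\|_1}. \qquad (9)$$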
3.4 Relationships between Different Attacks
The attacks mentioned above all belong to the family of Fast Gradient Sign Methods, and can be related via different parameter settings, as shown in Figure 2. In summary:

If the transformation probability $p = 0$, MDI^{2}FGSM degrades to MIFGSM, and DI^{2}FGSM degrades to IFGSM;

If the decay factor $\mu = 0$, MDI^{2}FGSM degrades to DI^{2}FGSM, and MIFGSM degrades to IFGSM;

If the total iteration number $N = 1$, IFGSM degrades to FGSM.
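These relationships can be made concrete with a single parameterized routine. The sketch below is a toy illustration under loud assumptions: the model is a hypothetical linear classifier with an analytic gradient, and the resize-and-pad transformation is replaced by a random per-coordinate scaling, chosen only because its gradient is trivial to propagate by hand. The name `attack` and its signature are illustrative.

```python
import math
import random

def loss_and_grad(W, x, y):
    """Cross-entropy of a toy linear classifier (logits = W x) and its gradient w.r.t. x."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    grad = [sum((probs[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]
    return -math.log(probs[y]), grad

def sign(v):
    return (v > 0) - (v < 0)

def attack(W, x, y, eps, alpha, n_iter, mu=0.0, p=0.0, rng=None):
    """Generic sign-gradient attack.

    p = 0            -> no input transformation (IFGSM / MIFGSM)
    mu = 0           -> no momentum (IFGSM / DI^2-FGSM style update)
    n_iter = 1 with alpha = eps -> a single FGSM step
    """
    rng = rng or random.Random(0)
    x_adv, acc = list(x), [0.0] * len(x)
    for _ in range(n_iter):
        # Stand-in transformation: random positive scaling, applied with probability p.
        if rng.random() < p:
            scale = [rng.uniform(0.5, 1.0) for _ in x_adv]
        else:
            scale = [1.0] * len(x_adv)
        _, g_t = loss_and_grad(W, [s * a for s, a in zip(scale, x_adv)], y)
        g = [s * gi for s, gi in zip(scale, g_t)]  # chain rule through the scaling
        norm = sum(abs(gi) for gi in g) or 1.0     # guard against a zero gradient
        acc = [mu * a + gi / norm for a, gi in zip(acc, g)]
        stepped = [a + alpha * sign(ai) for a, ai in zip(x_adv, acc)]
        x_adv = [min(max(a, o - eps), o + eps) for a, o in zip(stepped, x)]
    return x_adv

W = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical 2-class, 2-feature model
x, y = [0.5, 0.0], 0
print("FGSM    ", attack(W, x, y, eps=0.3, alpha=0.3, n_iter=1))
print("IFGSM   ", attack(W, x, y, eps=0.3, alpha=0.1, n_iter=5))
print("MIFGSM  ", attack(W, x, y, eps=0.3, alpha=0.1, n_iter=5, mu=1.0))
print("DI2FGSM ", attack(W, x, y, eps=0.3, alpha=0.1, n_iter=5, p=0.5))
print("MDI2FGSM", attack(W, x, y, eps=0.3, alpha=0.1, n_iter=5, mu=1.0, p=0.5))
```

Because the toy model is linear, all variants reach the same point here; the value of the sketch is in showing that one loop with the switches $p$, $\mu$ and $N$ covers the whole family.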
3.5 Attacking an Ensemble of Networks
Liu et al. [17] suggested that attacking an ensemble of multiple networks simultaneously can generate much stronger adversarial examples. The motivation is that if an adversarial image remains adversarial for multiple networks, then it is more likely to transfer to other networks as well. Therefore, we can use this strategy to improve the transferability even further.
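We follow the ensemble strategy proposed in [7], which fuses the logit activations together to attack multiple networks simultaneously. Specifically, to attack an ensemble of $K$ models, the logits are fused by:
$$l(X; \theta_1, \ldots, \theta_K) = \sum_{k=1}^{K} w_k \, l_k(X; \theta_k), \qquad (10)$$
where $l_k(X; \theta_k)$ is the logits output of the $k$-th model with the parameters $\theta_k$, and $w_k$ is the ensemble weight with $w_k \geq 0$ and $\sum_{k=1}^{K} w_k = 1$.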
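The logit fusion amounts to a weighted sum of per-model logit vectors. A minimal sketch, assuming each model's logits are given as a plain list of equal length (the function name `fuse_logits` is illustrative):

```python
def fuse_logits(logits_per_model, weights):
    """Weighted sum of logit vectors, one vector per model.

    weights must be non-negative and sum to 1.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("ensemble weights must be non-negative and sum to 1")
    n_classes = len(logits_per_model[0])
    return [sum(w * logits[j] for w, logits in zip(weights, logits_per_model))
            for j in range(n_classes)]

# Two hypothetical models with equal weights.
print(fuse_logits([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))
```

The fused logits are then fed into the same softmax cross-entropy loss as for a single network, so the attack loop itself does not change.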
4 Experiment
4.1 Experiment Setup
Dataset
It is less meaningful to attack images that are already classified wrongly. Therefore, we randomly choose images from the ImageNet validation set that are classified correctly by all the networks we test on to form our test dataset. All these images are resized to a common size beforehand.
Networks
We consider four normally trained networks, i.e., Inceptionv3 (Incv3) [31], Inceptionv4 (Incv4) [30], Resnetv2152 (Res152) [11] and InceptionResnetv2 (IncResv2) [30], and three adversarially trained networks [33], i.e., ens3advInceptionv3 (Incv3_{ens3}), ens4advInceptionv3 (Incv3_{ens4}) and ensadvInceptionResNetv2 (IncResv2_{ens}). All networks are publicly available.
Implementation details
For the parameters of different attackers, we follow the default settings in [14] for the step size $\alpha$ and the total iteration number $N$. We set the maximum perturbation $\epsilon$ to a level that is still imperceptible to human vision [19]. For the momentum term, the decay factor $\mu$ is set as in [7]. For the stochastic transformation function $T(X; p)$, the probability $p$ is set to $0.5$, i.e., attackers pay equal attention to the original inputs and the transformed inputs. For the transformation operations $T(\cdot)$, the input $X$ is first resized to a random size and then zero-padded to a fixed size in a random manner.
4.2 Attacking a Single Network
Table 1: Success rates of adversarial examples crafted on a single network (rows: attacked network; columns: tested network).

| Model | Attack | Incv3 | Incv4 | IncResv2 | Res152 | Incv3_{ens3} | Incv3_{ens4} | IncResv2_{ens} |
|---|---|---|---|---|---|---|---|---|
| Incv3 | FGSM | 64.6% | 23.5% | 21.7% | 21.7% | 8.0% | 7.5% | 3.6% |
| Incv3 | IFGSM | 99.9% | 14.8% | 11.6% | 8.9% | 3.3% | 2.9% | 1.5% |
| Incv3 | DI^{2}FGSM (Ours) | 99.9% | 35.5% | 27.8% | 21.4% | 5.5% | 5.2% | 2.8% |
| Incv3 | MIFGSM | 99.9% | 36.6% | 34.5% | 27.5% | 8.9% | 8.4% | 4.7% |
| Incv3 | MDI^{2}FGSM (Ours) | 99.9% | 63.9% | 59.4% | 47.9% | 14.3% | 14.0% | 7.0% |
| Incv4 | FGSM | 26.4% | 49.6% | 19.7% | 20.4% | 8.4% | 7.7% | 4.1% |
| Incv4 | IFGSM | 22.0% | 99.9% | 13.2% | 10.9% | 3.2% | 3.0% | 1.7% |
| Incv4 | DI^{2}FGSM (Ours) | 43.3% | 99.7% | 28.9% | 23.1% | 5.9% | 5.5% | 3.2% |
| Incv4 | MIFGSM | 51.1% | 99.9% | 39.4% | 33.7% | 11.2% | 10.7% | 5.3% |
| Incv4 | MDI^{2}FGSM (Ours) | 72.4% | 99.5% | 62.2% | 52.1% | 17.6% | 15.6% | 8.8% |
| IncResv2 | FGSM | 24.3% | 19.3% | 39.6% | 19.4% | 8.5% | 7.3% | 4.8% |
| IncResv2 | IFGSM | 22.2% | 17.7% | 97.9% | 12.6% | 4.6% | 3.7% | 2.5% |
| IncResv2 | DI^{2}FGSM (Ours) | 46.5% | 40.5% | 95.8% | 28.6% | 8.2% | 6.6% | 4.8% |
| IncResv2 | MIFGSM | 53.5% | 45.9% | 98.4% | 37.8% | 15.3% | 13.0% | 8.8% |
| IncResv2 | MDI^{2}FGSM (Ours) | 71.2% | 67.4% | 96.1% | 57.4% | 25.1% | 20.7% | 14.9% |
| Res152 | FGSM | 34.4% | 28.5% | 27.1% | 75.2% | 12.4% | 11.0% | 6.0% |
| Res152 | IFGSM | 20.8% | 17.2% | 14.9% | 99.1% | 5.4% | 4.6% | 2.8% |
| Res152 | DI^{2}FGSM (Ours) | 53.8% | 49.0% | 44.8% | 99.2% | 13.0% | 11.1% | 6.9% |
| Res152 | MIFGSM | 50.1% | 44.1% | 42.2% | 99.0% | 18.2% | 15.2% | 9.0% |
| Res152 | MDI^{2}FGSM (Ours) | 78.9% | 76.5% | 74.8% | 99.2% | 35.2% | 29.4% | 19.0% |

We first perform adversarial attacks on a single network, using FGSM, IFGSM, DI^{2}FGSM, MIFGSM and MDI^{2}FGSM, respectively. We craft adversarial examples only on normally trained networks, and test them on all seven networks. The success rates are shown in Table 1, where the diagonal blocks indicate white-box attacks and the off-diagonal blocks indicate black-box attacks. We list the networks that we attack in rows, and the networks that we test on in columns.
From Table 1, first and foremost, we observe that MDI^{2}FGSM outperforms all other baseline attacks by a large margin on all black-box models, and maintains high success rates on all white-box models. For example, if adversarial examples are crafted on IncResv2, MDI^{2}FGSM has success rates of 67.4% on Incv4 (normally trained black-box model) and 25.1% on Incv3_{ens3} (adversarially trained black-box model), while a strong baseline like MIFGSM only obtains the corresponding success rates of 45.9% and 15.3%, respectively. This convincingly demonstrates the effectiveness of combining input diversity and momentum for improving the transferability of adversarial examples.
We then compare the success rates of IFGSM and DI^{2}FGSM to see the effectiveness of diverse input patterns alone. By generating adversarial examples with input diversity, DI^{2}FGSM significantly improves the success rates of IFGSM on challenging black-box models, regardless of whether the model is adversarially trained, and maintains high success rates on white-box models. For example, if adversarial examples are crafted on Res152, DI^{2}FGSM has success rates of 99.2% on Res152 (white-box model), 53.8% on Incv3 (normally trained black-box model) and 11.1% on Incv3_{ens4} (adversarially trained black-box model), while IFGSM only obtains the corresponding success rates of 99.1%, 20.8% and 4.6%, respectively. Compared with FGSM, DI^{2}FGSM also reaches much higher success rates on the normally trained black-box models, and comparable performance on the adversarially trained black-box models.
4.3 Attacking an Ensemble of Networks
Table 2: Success rates of ensemble attacks on the ensembled network (white-box) and on the hold-out network (black-box). Each column names the network that is held out.

| Setting | Attack | Incv3 | Incv4 | IncResv2 | Res152 | Incv3_{ens3} | Incv3_{ens4} | IncResv2_{ens} |
|---|---|---|---|---|---|---|---|---|
| Ensemble | IFGSM | 96.6% | 96.9% | 98.7% | 96.2% | 97.0% | 97.3% | 94.3% |
| Ensemble | DI^{2}FGSM (Ours) | 88.9% | 89.6% | 93.2% | 87.7% | 91.7% | 91.7% | 93.2% |
| Ensemble | MIFGSM | 96.9% | 96.9% | 98.8% | 96.8% | 96.8% | 97.0% | 94.6% |
| Ensemble | MDI^{2}FGSM (Ours) | 90.1% | 91.1% | 94.0% | 89.3% | 92.8% | 92.7% | 94.9% |
| Holdout | IFGSM | 43.7% | 36.4% | 33.3% | 25.4% | 12.9% | 15.1% | 8.8% |
| Holdout | DI^{2}FGSM (Ours) | 69.9% | 67.9% | 64.1% | 51.7% | 36.3% | 35.0% | 30.4% |
| Holdout | MIFGSM | 71.4% | 65.9% | 64.6% | 55.6% | 22.8% | 26.1% | 15.8% |
| Holdout | MDI^{2}FGSM (Ours) | 80.7% | 80.6% | 80.7% | 70.9% | 44.6% | 44.5% | 39.4% |

Though the results in Table 1 show that momentum and input diversity can significantly improve the transferability of adversarial examples, they are still relatively weak at attacking an adversarially trained network under the black-box setting, e.g., the highest black-box success rate on IncResv2_{ens} is only 19.0%. Therefore, we follow the strategy in [17] to attack multiple networks simultaneously in order to further improve transferability. We consider all seven networks here. Adversarial examples are generated on an ensemble of six networks, and tested on the ensembled network and the hold-out network, using IFGSM, DI^{2}FGSM, MIFGSM and MDI^{2}FGSM, respectively. FGSM is ignored here due to its low success rates on white-box models. All ensembled models are assigned equal weight, i.e., $w_k = 1/6$.
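The hold-out protocol described above can be sketched as a leave-one-out loop over the seven networks. The sketch assumes that each network is held out exactly once and that the remaining six are attacked jointly with equal weights; the network names are the abbreviations used in this paper, and `holdout_splits` is an illustrative helper, not part of any library.

```python
NETWORKS = ["Incv3", "Incv4", "IncResv2", "Res152",
            "Incv3_ens3", "Incv3_ens4", "IncResv2_ens"]

def holdout_splits(networks):
    """Yield (held_out, ensemble, weights) with each network held out once.

    The ensemble contains all remaining networks, weighted equally.
    """
    for held_out in networks:
        ensemble = [n for n in networks if n != held_out]
        weights = [1.0 / len(ensemble)] * len(ensemble)
        yield held_out, ensemble, weights

for held_out, ensemble, weights in holdout_splits(NETWORKS):
    print(f"hold out {held_out}: attack {len(ensemble)} networks, "
          f"weight {weights[0]:.4f} each")
```

Adversarial examples crafted on each six-network ensemble would then be evaluated twice: on the ensemble itself (white-box) and on the held-out network (black-box).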
The results are summarized in Table 2, where the top rows show the success rates on the ensembled network (white-box setting), and the bottom rows show the success rates on the hold-out network (black-box setting). Under the challenging black-box setting, we observe that MDI^{2}FGSM always generates adversarial examples with better transferability than other methods on all networks. For example, by keeping Incv3_{ens3} as the hold-out model, MDI^{2}FGSM can fool Incv3_{ens3} with a success rate of 44.6%, while IFGSM, DI^{2}FGSM and MIFGSM only have success rates of 12.9%, 36.3% and 22.8%, respectively. Besides, compared with MIFGSM, we observe that using diverse input patterns alone, i.e., DI^{2}FGSM, can reach a much higher success rate if the hold-out model is an adversarially trained network, and a comparable success rate if the hold-out model is a normally trained network.
Under the white-box setting, we see that DI^{2}FGSM and MDI^{2}FGSM reach slightly lower (but still very high) success rates on ensemble models compared with IFGSM and MIFGSM. This is due to the fact that attacking multiple networks simultaneously is much harder than attacking a single model. However, the white-box success rates can be improved if we assign the transformation probability $p$ a smaller value, increase the total iteration number $N$ or use a smaller step size $\alpha$ (see Section 4.4).
4.4 Ablation Studies
In this section, we conduct a series of ablation experiments to study the impact of different parameters, e.g., the step size $\alpha$, on DI^{2}FGSM and MDI^{2}FGSM. We only consider attacking an ensemble of networks here, since this is much stronger than attacking a single network and thus provides a more accurate evaluation of the network robustness. The maximum perturbation $\epsilon$ is kept fixed for all experiments.
Transformation Probability
We first study the influence of the transformation probability $p$ on the success rates under both white-box and black-box settings. We fix the step size $\alpha$ and the total iteration number $N$. The transformation probability $p$ is varied from $0$ to $1$. According to the relationships shown in Figure 2, if $p = 0$, MDI^{2}FGSM degrades to MIFGSM and DI^{2}FGSM degrades to IFGSM.
We show the success rates on various networks in Figure 3. We observe that both DI^{2}FGSM and MDI^{2}FGSM achieve higher black-box success rates but lower white-box success rates as $p$ increases. Moreover, for all attacks, if $p$ is small, i.e., only a small amount of transformed inputs is utilized, black-box success rates can increase significantly, while white-box success rates only drop a little. This phenomenon indicates the importance of adding transformed inputs into the attack process.
The trends shown in Figure 3 also provide useful suggestions for constructing strong adversarial attacks in practice. For example, if you know the black-box model is a new network that is totally different from any existing network, you can set $p = 1$ to reach the maximum transferability. If the black-box model is a mixture of new networks and existing networks, you can choose a moderate value of $p$ to maximize the black-box success rates under a predefined white-box success rate, e.g., a required minimum white-box success rate.
Total Iteration Number
We here study the influence of the total iteration number $N$ on the success rates under both white-box and black-box settings. We fix the transformation probability $p$ and the step size $\alpha$, and vary the total iteration number $N$; the results are plotted in Figure 4. For DI^{2}FGSM, we see that the black-box success rates and white-box success rates always increase as the total iteration number $N$ increases. Similar trends can also be observed for MDI^{2}FGSM except for the black-box success rates on adversarially trained models, i.e., performing more iterations cannot bring extra transferability on adversarially trained models. Moreover, we observe that the success rate gap between MDI^{2}FGSM and DI^{2}FGSM diminishes as $N$ increases.
Step Size
We finally study the influence of the step size $\alpha$ on the success rates under both white-box and black-box settings. We fix the transformation probability $p$. In order to reach the maximum perturbation $\epsilon$ even for a small step size $\alpha$, we set the total iteration number inversely proportional to the step size, i.e., $N = \epsilon / \alpha$. The results are plotted in Figure 5. We observe that the white-box success rates of both DI^{2}FGSM and MDI^{2}FGSM can be boosted if a smaller step size is provided. Under the black-box setting, the success rates of DI^{2}FGSM are insensitive to the step size, while the success rates of MDI^{2}FGSM can still be improved with a smaller step size.
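The coupling between step size and iteration count follows from a simple reachability argument: each iteration moves a pixel by at most the step size, so the boundary of the perturbation ball is only reachable when the number of iterations times the step size is at least the maximum perturbation. A small sketch of that arithmetic, where the budget of 16 is a placeholder chosen for illustration (the value used in the experiments is not restated here) and rounding up for non-divisible cases is an assumption:

```python
import math

def iterations_for(eps, alpha):
    """Smallest N with N * alpha >= eps, so the eps-ball boundary is reachable."""
    # The small tolerance absorbs floating-point noise in eps / alpha.
    return math.ceil(eps / alpha - 1e-9)

eps = 16  # placeholder perturbation budget, in pixel-intensity units
for alpha in (1, 2, 4):
    n = iterations_for(eps, alpha)
    print(f"alpha={alpha}: N={n}, reachable perturbation={n * alpha}")
```

Halving the step size therefore doubles the number of gradient computations, which is the cost of the higher white-box success rates reported for small step sizes.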
4.5 Reproducing NIPS Adversarial Competition
In order to examine the effectiveness of our proposed attack methods in practice, we here reproduce the top defense submissions, which are black-box models to us, and the official baselines from the NIPS 2017 adversarial competition. Due to resource limitations, we only consider the top three defense submissions, i.e., TsAIL, iyswim and Anil Thomas, and three official baselines, i.e., Incv3_{adv}, IncResv2_{ens} and Incv3.
Generating Adversarial Examples
When generating adversarial examples, we follow the procedure
Attacker Configurations
For the attacker configuration, we follow exactly the same settings as in [7], which attacks an ensemble of Incv3, Incv4, IncResv2, Res152, Incv3_{ens3}, Incv3_{ens4}, IncResv2_{ens} and Incv3_{adv} [15]. The ensemble weights are set equally for the first seven models, with a separate weight for Incv3_{adv}. The total iteration number $N$ and the decay factor $\mu$ are also taken from [7]. This configuration for MIFGSM won 1st place in the NIPS adversarial attack competition. For DI^{2}FGSM and MDI^{2}FGSM, we choose $p$ according to the trends shown in Figure 3.
Table 3: Success rates on the top defense submissions and official baselines from the NIPS 2017 adversarial competition.

| Attack | TsAIL | iyswim | Anil Thomas | Incv3_{adv} | IncResv2_{ens} | Incv3 | Avg. |
|---|---|---|---|---|---|---|---|
| IFGSM | 14.0% | 35.6% | 30.9% | 98.2% | 96.4% | 99.0% | 62.4% |
| DI^{2}FGSM (Ours) | 22.7% | 58.4% | 48.0% | 91.5% | 90.7% | 97.3% | 68.1% |
| MIFGSM | 14.9% | 45.7% | 46.6% | 97.3% | 95.4% | 98.7% | 66.4% |
| MIFGSM* | 13.6% | 43.2% | 43.9% | 94.4% | 93.0% | 97.3% | 64.2% |
| MDI^{2}FGSM (Ours) | 20.0% | 69.8% | 64.4% | 93.3% | 92.4% | 97.9% | 73.0% |

Results
The results are summarized in Table 3. We also report the official results of MIFGSM (named MIFGSM*) as a reference to validate our implementation. The performance difference between MIFGSM and MIFGSM* is due to the randomness of the maximum perturbation magnitude introduced in the attack process. Compared with MIFGSM, DI^{2}FGSM has higher success rates on the top submissions but slightly lower success rates on the baseline models, which results in these two attack methods having similar average success rates. By integrating both diverse inputs and the momentum term, the enhanced attack, MDI^{2}FGSM, reaches an average success rate of 73.0%, which is far better than the other methods. For example, the top attack submission in the NIPS competition, MIFGSM, only gets an average success rate of 66.4%. We believe the same advantage can be observed even if we test on all defense submissions. These results also indicate that our proposed attack method can be used as a better tool to evaluate the robustness of various newly developed networks and defense methods.
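The averages quoted above can be cross-checked directly from the per-defense numbers in Table 3. The sketch below assumes that the "Avg." column is an unweighted mean over the six defenses, which is consistent with the reported values; the dictionary simply transcribes the table.

```python
# Per-defense success rates (%) from Table 3, in column order:
# TsAIL, iyswim, Anil Thomas, Incv3_adv, IncResv2_ens, Incv3
TABLE3 = {
    "IFGSM":    [14.0, 35.6, 30.9, 98.2, 96.4, 99.0],
    "DI2FGSM":  [22.7, 58.4, 48.0, 91.5, 90.7, 97.3],
    "MIFGSM":   [14.9, 45.7, 46.6, 97.3, 95.4, 98.7],
    "MIFGSM*":  [13.6, 43.2, 43.9, 94.4, 93.0, 97.3],
    "MDI2FGSM": [20.0, 69.8, 64.4, 93.3, 92.4, 97.9],
}

def average(row):
    """Unweighted mean of a list of success rates."""
    return sum(row) / len(row)

for name, row in TABLE3.items():
    print(f"{name}: {average(row):.2f}%")

# Margin between the reported (one-decimal) averages of MDI2FGSM and MIFGSM.
margin = round(average(TABLE3["MDI2FGSM"]), 1) - round(average(TABLE3["MIFGSM"]), 1)
print(f"margin: {margin:.1f} percentage points")
```

The margin is computed from the rounded averages because that is how the 73.0% and 66.4% figures are reported in the table.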
4.6 Discussion
We provide a brief discussion of why diverse patterns help generate adversarial examples with better transferability. One hypothesis is that the decision boundaries of different networks share similar inherent structures because the networks are trained on the same dataset, e.g., ImageNet. For example, as shown in Figure 1, different networks make similar mistakes in the presence of adversarial examples. By incorporating diverse patterns at each step, the optimization produces adversarial examples that are more robust to small transformations. These adversarial examples remain malicious within a region around the network decision boundary, which increases their chance of fooling other networks, i.e., they achieve better black-box success rates than existing methods. In the future, we plan to validate this hypothesis theoretically or empirically.
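To make "incorporating diverse patterns at each step" concrete, the following is a minimal toy sketch in plain Python, not the released implementation. It rests on several assumptions that are ours: the "image" is a 1-D list with values in [0, 1], the "network" is a linear score with logistic loss for the true label, the only random transformation is zero-padding at a random offset (applied with probability p, otherwise a fixed centered padding is used), and the momentum update accumulates L1-normalized gradients. All function names and default hyperparameters are hypothetical.

```python
import math
import random


def pad_at(x, offset, size):
    """Embed x in a zero signal of length `size`, starting at index `offset`."""
    out = [0.0] * size
    out[offset:offset + len(x)] = x
    return out


def diverse_input(x, size, p, rng):
    """With probability p pad at a random offset, else at the centered offset."""
    max_offset = size - len(x)
    if rng.random() < p:
        offset = rng.randint(0, max_offset)  # inclusive on both ends
    else:
        offset = max_offset // 2
    return pad_at(x, offset, size), offset


def loss_grad(w, x_pad, offset, n):
    """Gradient of log(1 + exp(-score)) w.r.t. the unpadded input (toy model)."""
    score = sum(wi * xi for wi, xi in zip(w, x_pad))
    d_score = -1.0 / (1.0 + math.exp(score))  # dL/dscore
    # Padding only copies entries, so the input gradient is a slice of w.
    return [d_score * w[offset + i] for i in range(n)]


def sign(v):
    return (v > 0) - (v < 0)


def m_di2_fgsm_toy(x0, w, eps=0.1, steps=10, p=0.5, mu=1.0, seed=0):
    """Iterative sign-gradient ascent with momentum on randomly padded inputs."""
    rng = random.Random(seed)
    alpha = eps / steps                      # per-step size (assumed schedule)
    x, g = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x_pad, offset = diverse_input(x, len(w), p, rng)
        grad = loss_grad(w, x_pad, offset, len(x))
        l1 = sum(abs(v) for v in grad) or 1.0
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        # Step, then clip to the eps-ball around x0 and to the range [0, 1].
        x = [
            min(max(xi + alpha * sign(gi), max(x0i - eps, 0.0)),
                min(x0i + eps, 1.0))
            for xi, gi, x0i in zip(x, g, x0)
        ]
    return x
```

For instance, `m_di2_fgsm_toy([0.5] * 8, [1.0] * 12)` perturbs an 8-element input against a 12-weight linear model. Setting `p=0` recovers a plain momentum attack on the fixed, centered input, and `mu=0` removes the momentum term, so the same loop covers the variants compared in Table 3 at the level of this toy.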
5 Conclusions
In this paper, we propose to improve the transferability of adversarial examples with input diversity. Specifically, our method applies random transformations to the input images at each iteration of the attack process. Compared with traditional iterative attacks, the results on ImageNet show that our proposed attack method achieves significantly higher success rates on black-box models while maintaining similar success rates on white-box models. We improve the transferability further by integrating a momentum term and attacking multiple networks simultaneously. By evaluating this enhanced attack against the top defense submissions and official baselines from the NIPS adversarial competition, we show that it reaches an average success rate of 73.0%, which outperforms the top attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a benchmark for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. The code is publicly available at https://github.com/cihangxie/DI2FGSM.
Footnotes
 email: {cihangxie306, zhshuai.zhang, zhouyuyiner, alan.l.yuille}@gmail.com
 email: wjyouch@gmail.com
 email: zhou.ren@snapchat.com
 https://github.com/tensorflow/models/tree/master/research/slim
 https://github.com/tensorflow/models/tree/master/research/adv_imagenet_models
 https://github.com/lfz/GuidedDenoise
 https://github.com/cihangxie/NIPS2017_adv_challenge_defense
 https://github.com/anlthms/nips2017/tree/master/mmd
 https://www.kaggle.com/c/nips-2017-non-targeted-adversarial-attack
References
 Arnab, A., Miksik, O., Torr, P.H.: On the robustness of semantic segmentation models to adversarial attacks. arXiv preprint arXiv:1711.09856 (2017)
 Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., Roli, F.: Evasion attacks against machine learning at test time. In: Joint European conference on machine learning and knowledge discovery in databases. pp. 387–402. Springer (2013)
 Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
 Cisse, M., Adi, Y., Neverova, N., Keshet, J.: Houdini: Fooling deep structured prediction models. arXiv preprint arXiv:1707.05373 (2017)
 Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al.: Adversarial classification. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM (2004)
 Dhillon, G.S., Azizzadenesheli, K., Bernstein, J.D., Kossaifi, J., Khanna, A., Lipton, Z.C., Anandkumar, A.: Stochastic activation pruning for robust adversarial defense. In: International Conference on Learning Representations (2018)
 Dong, Y., Liao, F., Pang, T., Su, H., Hu, X., Li, J., Zhu, J.: Boosting adversarial attacks with momentum. arXiv preprint arXiv:1710.06081 (2017)
 Girshick, R.: Fast R-CNN. In: International Conference on Computer Vision. IEEE (2015)
 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)
 Guo, C., Rana, M., Cissé, M., van der Maaten, L.: Countering adversarial images using input transformations. In: International Conference on Learning Representations (2018)
 He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision. Springer (2016)
 Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.: Adversarial machine learning. In: Proceedings of the 4th ACM workshop on Security and artificial intelligence. ACM (2011)
 Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (2012)
 Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. In: International Conference on Learning Representations Workshop (2017)
 Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. In: International Conference on Learning Representations (2017)
 Lin, Y.C., Hong, Z.W., Liao, Y.H., Shih, M.L., Liu, M.Y., Sun, M.: Tactics of adversarial attack on deep reinforcement learning agents. In: International Joint Conference on Artificial Intelligence. AAAI (2017)
 Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. In: International Conference on Learning Representations (2017)
 Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Computer Vision and Pattern Recognition. IEEE (2015)
 Luo, Y., Boix, X., Roig, G., Poggio, T., Zhao, Q.: Foveation-based mechanisms alleviate adversarial examples. arXiv preprint arXiv:1511.06292 (2015)
 Meng, D., Chen, H.: MagNet: a two-pronged defense against adversarial examples. arXiv preprint arXiv:1705.09064 (2017)
 Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Computer Vision and Pattern Recognition. IEEE (2015)
 Papernot, N., Goodfellow, I., Sheatsley, R., Feinman, R., McDaniel, P.: cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768 (2016)
 Prakash, A., Moran, N., Garber, S., DiLillo, A., Storer, J.: Deflecting adversarial attacks with pixel deflection. arXiv preprint arXiv:1801.08926 (2018)
 Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
 Samangouei, P., Kabkab, M., Chellappa, R.: Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In: International Conference on Learning Representations (2018)
 Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: Computer Vision and Pattern Recognition. IEEE (2016)
 Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Fua, P., Moreno-Noguer, F.: Discriminative learning of deep convolutional feature point descriptors. In: International Conference on Computer Vision. IEEE (2015)
 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
 Song, Y., Kim, T., Nowozin, S., Ermon, S., Kushman, N.: Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766 (2017)
 Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI (2017)
 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Computer Vision and Pattern Recognition. IEEE (2016)
 Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014)
 Tramèr, F., Kurakin, A., Papernot, N., Boneh, D., McDaniel, P.: Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204 (2017)
 Xie, C., Wang, J., Zhang, Z., Ren, Z., Yuille, A.: Mitigating adversarial effects through randomization. In: International Conference on Learning Representations (2018)
 Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.: Adversarial Examples for Semantic Segmentation and Object Detection. In: International Conference on Computer Vision. IEEE (2017)
 Zhang, Z., Qiao, S., Xie, C., Shen, W., Wang, B., Yuille, A.L.: Single-shot object detection with enriched semantics. arXiv preprint arXiv:1712.00433 (2017)