WITCHcraft: Efficient PGD attacks with random step size
State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points. Iterative FGSM-based methods without restarts trade off performance for computational efficiency because they do not adequately explore the image space and are highly sensitive to the choice of step size. We propose a variant of Projected Gradient Descent (PGD) that uses a random step size to improve performance without resorting to expensive random restarts. Our method, Wide Iterative Stochastic crafting (WITCHcraft), achieves results superior to the classical PGD attack on the CIFAR-10 and MNIST data sets but without additional computational cost. This simple modification of PGD makes crafting attacks more economical, which is important in situations like adversarial training where attacks need to be crafted in real time.
Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum, Tom Goldstein, Renkun Ni, Steven Reich, Ali Shafahi
Authors contributed equally and are listed in alphabetical order.
University of Maryland, College Park
University of Siegen
Keywords: Adversarial, Attack, PGD, CNN, CIFAR
Neural networks trained using stochastic gradient descent (SGD) are easily fooled by adversarial examples, small perturbations to inputs that change the output of the network [szegedy2013intriguing]. Adversarial attacks can expose serious security vulnerabilities in real-world applications such as object detection in self-driving cars [sitawarin2018darts] and classification in medical imaging [finlayson2018adversarial]. In response to this threat, subsequent work has developed training methods for producing neural networks robust to these attacks [gu2014towards, madry2018towards]. The back-and-forth between new defenses and adversarial attacks that break them has spawned an array of powerful new attack methods.
Among these, typical untargeted adversarial attacks operate by maximizing the loss of a neural network with respect to image space, within a small ball surrounding the input, using various optimization algorithms. Targeted attacks, on the other hand, minimize loss on a particular incorrect label. In the white-box attack setting, an attacker has access to the parameters of the network, while black-box attacks operate by querying the network or transferring attacks computed on other networks. We focus on the white-box setting, a space which is dominated by optimization methods.
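The difference between the two objectives can be sketched in a few lines. This is a minimal illustration, not code from the paper: `cross_entropy` and `attack_objective` are hypothetical helper names, and the attacker is assumed to work with softmax cross-entropy on a single example's logits.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy of a single example (numerically stable)."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

def attack_objective(logits, true_label, target_label=None):
    """Quantity the attacker tries to MAXIMIZE within the perturbation ball.

    Untargeted: the loss on the true label.
    Targeted: the negated loss on the chosen incorrect label, so that
    maximizing this value minimizes the loss on the target.
    """
    if target_label is None:
        return cross_entropy(logits, true_label)
    return -cross_entropy(logits, target_label)
```

Writing both cases as a maximization lets the same ascent routine serve targeted and untargeted attacks.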
Input spaces in computer vision are high-dimensional, and finding small perturbations that effectively fool a network requires non-convex optimization [sinha2017certifying]. The outputs of neural networks oscillate in these neighborhoods, so classical gradient descent is ineffective [athalye2018obfuscated], and signed gradient descent methods [bernstein2018signsgd] are more successful. Even so, a single run of gradient descent is not guaranteed to solve the problem, so state-of-the-art attacks restart many times from random initializations to introduce randomness and aggressively explore input space. However, this technique increases computational cost, which may make it impractical for an adversary to attack a system dynamically in real time. The cost is particularly problematic for adversarial training, a process in which attacks are generated on-the-fly during network training and used to harden a network against attacks.
In this work, we develop a novel method, Wide Iterative Stochastic crafting (WITCHcraft), for introducing randomness into adversarial attacks without running the attack multiple times with different initializations. We modify the classical PGD attack, which is similar to the Basic Iterative Method with a random restart and projections, by using a coordinate-wise random step size, meaning each entry of the signed gradient is scaled by a factor chosen uniformly at random. We find that this randomization scheme decreases sensitivity to the choice of the step size parameters and initialization. We compare our method to standard PGD attacks and PGD with random restarts. We find that our method outperforms both of these attacks on the CIFAR-10 and MNIST data sets when all attackers are granted an equal compute budget. In Figure 1, we see an example of WITCHcraft perturbing an image of a man so that it is classified as an ImageNet [deng2009imagenet] fish class, without visibly changing the class to human observers.
2 Related Work
Szegedy et al. first demonstrated the existence of adversarial examples comprising small perturbations of input pixels [szegedy2013intriguing]. Their regularized gradient descent method spawned numerous subsequent attacks. These later attacks include a variety of new objectives from universal adversarial perturbations, in which a single perturbation is effective on most test images, to realistic perturbations which are not sensitive to certain transformations [moosavi2017universal, athalye2017synthesizing]. Similarly, defense methods have sprung up to create networks robust to these attacks [gu2014towards, goodfellow2014explaining, ross2018improving]. The most successful of these defense methods, adversarial training, involves exposing the network to adversarial examples instead of clean examples during training [madry2018towards]. Since the advent of these defenses, even more attack methods have emerged for defeating adversarially robust networks [brendel2017decision, obfuscated-gradients]. A recent result of this war between attacks and defenses is the high computational cost of effective attacks against adversarially trained models.
The foundation of most popular attack methods is the PGD attack described by Madry et al. [madry2018towards]. Their version of this attack starts with a randomly initialized perturbation $\delta^{0}$, which is updated at each step via

$$\delta^{k+1} = \Pi_{\epsilon}\left(\delta^{k} + \alpha\,\mathrm{sign}\left(\nabla_{\delta}\,\mathcal{L}(x + \delta^{k}, y)\right)\right),$$

where $\alpha$ is a fixed step size, $x$ is the input, $y$ is the corresponding label, $\mathcal{L}$ is the loss of the network, and $\Pi_{\epsilon}$ denotes projection onto the ball of radius $\epsilon$ around the input. This method uses just the sign of the gradient, a strategy first adopted for attacks in [goodfellow2014explaining]. The superiority of signed gradients to raw gradients for producing adversarial examples has puzzled the robustness community since its discovery, but the strong fluctuations in the signed gradient signal possibly help the attack to escape suboptimal solutions with low gradient. Signed gradient descent methods are closely connected to adaptive gradient methods such as Adam [kingma2014adam], as discussed in [bernstein2018signsgd].
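The fixed-step update can be sketched as follows. This is an illustrative toy rather than the implementation of [madry2018towards]: a linear softmax classifier with hypothetical parameters `W` and `b` stands in for the network so that the input gradient has a closed form, the names `eps` and `alpha` are assumptions, and clipping to a valid pixel range is omitted for brevity.

```python
import numpy as np

def loss_grad_wrt_input(x, y, W, b):
    """Gradient of softmax cross-entropy w.r.t. the input of a linear model."""
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0                      # softmax(logits) - onehot(y)
    return W.T @ p

def pgd_attack(x, y, W, b, eps, alpha, steps, rng):
    """Untargeted l_inf PGD with a single scalar step size `alpha`."""
    delta = rng.uniform(-eps, eps, size=x.shape)    # random initialization
    for _ in range(steps):
        g = loss_grad_wrt_input(x + delta, y, W, b)
        delta = delta + alpha * np.sign(g)          # signed ascent step
        delta = np.clip(delta, -eps, eps)           # project onto the l_inf ball
    return delta
```

For the $\ell_\infty$ ball the projection reduces to coordinate-wise clipping, which is why `np.clip` suffices here.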
In the aforementioned paper, Madry et al. demonstrate that adversarial training against PGD results in a model that is robust to norm-bounded attacks. Surprisingly, their experiments (as well as later experiments by other authors [zheng2019distributionally]) show that models adversarially trained specifically against the PGD attacker are also robust against other attacks.
The current best reported (white-box) attack on the Madry PGD-trained model is the multi-targeted attack described in [qin2019adversarial], which uses a targeted PGD attack (in which the attacker chooses the label) on each incorrect class to find the best class in which to perturb the clean input. This method exhibits numerical results superior to previous methods but has the drawback of being highly computationally expensive as it both employs random restarts and scales linearly with the number of classes. This makes it necessary to run the attack on massive servers when training on large data sets with high-resolution images, such as ImageNet or similar.
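The cost argument can be made concrete with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that every attack step costs one gradient evaluation and that a multi-targeted attack runs once per incorrect class; `gradient_evaluations` is a hypothetical helper and the configurations are examples, not measurements.

```python
def gradient_evaluations(steps, restarts=1, num_targets=1):
    """Count gradient evaluations spent attacking one input.

    Assumes one gradient evaluation per attack step (an illustrative
    simplification; early stopping would lower the count).
    """
    return steps * restarts * num_targets

# A single 20-step PGD run.
single_run = gradient_evaluations(steps=20)
# The same attack with 10 random restarts.
with_restarts = gradient_evaluations(steps=20, restarts=10)
# A multi-targeted variant on a 10-class problem:
# one targeted run per incorrect class (9 targets).
multi_targeted = gradient_evaluations(steps=20, restarts=10, num_targets=9)
```

Under these assumptions, restarts and per-class targeting multiply the cost of a single run, which is the overhead a restart-free attack aims to avoid.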
3 Our Algorithm
In our work, we combine the PGD attack with a randomly chosen coordinate-wise step size (see Algorithm 1). Effectively, a random step size is chosen independently for each entry in the gradient, so that different pixels are perturbed by different amounts at each iteration. WITCHcraft still incorporates a random initialization, which has been found to improve results in previous work [madry2018towards] and comes at no cost to the attack scheme. We terminate the algorithm as soon as the attack succeeds in fooling the image classifier.
This strategy of randomly perturbing the gradient signal can be understood as a specific form of stochastic preconditioning of the PGD step, which increases the exploratory power of the optimization scheme. Due to the stochasticity, the algorithm does not easily stagnate or oscillate between two points. As a result, the method avoids getting trapped in local minima or cycles that inhibit progress.
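One way to make the preconditioning view concrete is the following sketch. The notation is assumed for illustration and is not taken from Algorithm 1: the image is flattened to a vector, $\delta^{k}$ is the perturbation at step $k$, $g^{k}$ is the loss gradient with respect to the input, $A^{k}$ is the random step size array with entrywise mean $\alpha$, and $\Pi_{\epsilon}$ is the projection onto the ball of radius $\epsilon$.

```latex
\begin{align*}
\text{PGD:} \quad
  & \delta^{k+1} = \Pi_{\epsilon}\bigl(\delta^{k} + \alpha\,\mathrm{sign}(g^{k})\bigr) \\
\text{WITCHcraft:} \quad
  & \delta^{k+1} = \Pi_{\epsilon}\bigl(\delta^{k} + D^{k}\,\mathrm{sign}(g^{k})\bigr),
  \qquad D^{k} = \mathrm{diag}\bigl(\mathrm{vec}(A^{k})\bigr), \\
  & \mathbb{E}\bigl[D^{k}\bigr] = \alpha I .
\end{align*}
```

Conditioned on the current iterate, the random diagonal preconditioner $D^{k}$ reproduces the fixed-step update in expectation before projection, while each individual step moves different coordinates by different amounts.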
Note that the step size in Algorithm 1 is a 2-dimensional or 3-dimensional array of values (the same dimensions as the image being crafted), as opposed to the single scalar value conventionally used for standard PGD attacks. The step size array is multiplied into the gradient update using a Hadamard (i.e., coordinate-wise) product, denoted $\odot$. The entries in the step size array are independent and identically distributed and are drawn from a uniform distribution on a fixed interval. In our experiments, which appear below, we compare this step size choice to a deterministic version whose fixed step size equals the expected value of the randomized version.
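The coordinate-wise random step can be sketched as below. This is a hedged toy illustration, not a reproduction of Algorithm 1: a linear softmax classifier with hypothetical parameters `W` and `b` stands in for the network, and the sampling interval is assumed to be $[0, 2\alpha]$ so that the expected step size equals a chosen `alpha`; the interval used in the paper may differ.

```python
import numpy as np

def input_gradient(x, y, W, b):
    """Gradient of softmax cross-entropy w.r.t. the input of a linear model."""
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0                      # softmax(logits) - onehot(y)
    return W.T @ p

def witchcraft_attack(x, y, W, b, eps, alpha, steps, rng):
    """PGD with an i.i.d. uniform step size per coordinate.

    Returns the perturbation and the number of gradient steps taken.
    """
    delta = rng.uniform(-eps, eps, size=x.shape)    # random initialization
    for k in range(steps):
        # Stop as soon as the classifier is fooled.
        if np.argmax(W @ (x + delta) + b) != y:
            return delta, k
        g = input_gradient(x + delta, y, W, b)
        # Assumed interval [0, 2*alpha]: every entry has mean alpha.
        A = rng.uniform(0.0, 2.0 * alpha, size=x.shape)
        # Hadamard product of the step size array with the signed gradient,
        # followed by projection onto the l_inf ball.
        delta = np.clip(delta + A * np.sign(g), -eps, eps)
    return delta, steps
```

The only change relative to a fixed-step PGD loop is that the scalar step size is replaced by the array `A`, so the per-step cost is essentially unchanged.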
4.1 Comparison to PGD benchmarks
We test our method on the CIFAR-10 and MNIST data sets against a WideResNet(34-10) model [zagoruyko2016wide] and a CNN with two convolutional layers, respectively, both trained by the authors of [madry2018towards] using their 7-step PGD adversarial training algorithm. These robust models are canonical for testing attacks and are used for competitive robustness leaderboards [madry2019github]. We focus on $\ell_\infty$ attacks, since this choice of norm dominates the robustness literature. Perturbations are restricted to an $\ell_\infty$ ball around the clean image, with a separate radius for CIFAR-10 and for MNIST.
| 20-PGD w/ 10 restarts | 45.21% |
WITCHcraft outperforms PGD with Madry’s choice of hyperparameters and the same number of updates as shown in Table 1 and Table 2. It is especially interesting to note that WITCHcraft is able to continuously improve during the attack iterations, whereas the standard PGD method quickly saturates as shown in Figure 4 and Figure 5.
4.2 Exploring the effect of expected step size
Following up on this apparent success, we investigate the sensitivity of PGD and WITCHcraft by comparing the expected step size of our approach with the corresponding fixed step size for PGD. The plot in Figure 2 shows that the standard PGD attack can, in fact, be further enhanced over Madry’s results by fine-tuning the step size. WITCHcraft shows some sensitivity to expected step size but performs at least as well as standard PGD except at very small step sizes, where randomness seemingly has little to no effect on the optimization. Of particular note is that the best overall reduction in accuracy among these trials is achieved by WITCHcraft.
The advantage of WITCHcraft over a range of expected step sizes is especially pronounced when attacking the difficult robust MNIST model, as Figure 3 shows. We note that in this comparison, every value achieved by WITCHcraft surpasses any achieved by PGD.
4.3 Exploring the benefits of additional attack steps
A third way to compare our method to PGD is to see how quickly their success rates saturate as the number of attack steps increases. Figure 4 and Figure 5 show the results of these comparisons on CIFAR-10 and MNIST data, respectively.
We note that in both cases, WITCHcraft suffers less from diminishing returns as the number of steps grows. We hypothesize that this can be explained by the effect of randomness on exploration. The stochastic step size choice in the WITCHcraft algorithm seems to better escape local minima. The result is an algorithm that more aggressively explores the space of permissible attack images than a standard PGD attack with fixed step size.
5 Discussion & Conclusions
In this work, we develop a method for introducing randomness into adversarial attacks without running the attack multiple times at different initializations. This simple modification of the popular PGD adversarial attack improves performance on benchmark data sets against robust models, while avoiding the high cost of conventional random restart methods. We believe that attack algorithms that perform many sequential iterations in a deterministic fashion lose efficiency due to stagnating exploration, and the WITCHcraft algorithm seems to supply a remedy for this problem.
We hope that the proposed method can increase the efficiency of attack generation in situations like adversarial training, where attacks are crafted on-the-fly during training. A reduction in the cost of crafting attacks has the potential to make adversarial training more affordable on large industrial problems. Future work may uncover new ways to introduce randomness into attacks for increased efficiency.