A principled approach for generating adversarial images under nonsmooth dissimilarity metrics
Abstract.
Deep neural networks perform well on real-world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by $\ell_p$ norms, but in fact any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, $\ell_1$, $\ell_2$, and $\ell_\infty$ perturbations; the counting $\ell_0$ "norm" (i.e. true sparseness); and the total variation seminorm, which is a (non-$\ell_p$) convolutional dissimilarity measuring local pixel changes. Our approach is a natural extension of a recent adversarial attack method, and eliminates the differentiability requirement of the metric. We demonstrate our algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet1k datasets. We consider undefended and defended models, and show that our algorithm easily transfers to various datasets. We observe that ProxLogBarrier outperforms a host of modern adversarial attacks specialized for the $\ell_0$ case. Moreover, by altering images in the total variation seminorm, we shed light on a new class of perturbations that exploit neighboring pixel information.
1. Introduction
Deep neural networks (DNNs) have strong classification abilities on training and validation datasets. However, they are vulnerable to adversarial images, which are formally defined as imperceptibly small changes (in a given dissimilarity metric) to model input that lead to misclassification [origin_adversarial, FGSM]. This behavior could mean several things: the model is overfitting on some level; the model is underregularized; or this is simply due to complex nonlinearities in the model. This has led to several lines of work in the deep learning community: the generation of adversarial images, defending against these adversarial attacks, and lastly determining which dissimilarity metric to consider.
Regarding the latter, it is not obvious what "imperceptibly small" means, and recent work has demonstrated adversarial image generation beyond $\ell_p$ norms by considering deformations instead of perturbations [adef]. There is also the problem of generating "realistic" attacks, such as through sparse attacks. For example, these include small stickers on a road sign, which may tamper with autonomous vehicles [eykholt2017robust]. The purpose of this work is adversarial image generation for a broad class of (possibly nondifferentiable) dissimilarity metrics, for both undefended and defended networks. We do not pass judgment regarding which metric is "best"; instead we are interested in an attack framework that works well for a broad class of metrics.
Adversarial attacks are often broadly categorized into one of two types: whitebox attacks, where the full structure of the neural network is provided to the attacker, including gradient information, or blackbox attacks, where the attacker is only given the model decision. One of the first proposed adversarial attacks is the Fast Gradient Sign Method (FGSM), which generates an adversarial image with respect to the $\ell_\infty$ norm, along with its iterative form, dubbed Iterative FGSM (IFGSM) [FGSM, ifgsm]. A similar iterative attack was also developed with respect to the $\ell_2$ norm. In their purest form, the above attacks perform gradient ascent on the training loss function subject to a norm constraint on the perturbation, either with one step in the case of FGSM, or multiple steps in the case of IFGSM, and their $\ell_2$ equivalents. Apart from training loss maximization, attacks have been developed using loss functions that directly measure misclassification [cw, deepfool]. Others have considered the $\ell_0$ and $\ell_1$ norms; these both induce sparsity in the perturbations [sparsefool]. In the blackbox setting, adversarial examples are generated using only model decisions, which is a much more expensive endeavor. However, blackbox methods often perform better, most notably by avoiding gradient obfuscation, since they take advantage of sampling properties near the decision boundary of the model. Notable examples of blackbox (decision-based) attacks are the Boundary Attack [boundaryattack] and the recent HopSkipJumpAttack [bapp].
The development of new and improved adversarial attacks has occurred in parallel with various defensive training regimes to provide robustness against adversarial perturbations. The task of training a robust network is twofold: models must be resistant to perturbations of a certain magnitude, while also maintaining classification ability on clean data. It has been argued that these two objectives are inherently “at odds” [atodds]. A popular method for training robust networks is adversarial training, where adversarial examples are added to the training data (see for example [madryLinf]).
Contributions
This paper introduces an attack methodology for not just $\ell_p$ norms, but any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, $\ell_1$, $\ell_2$, $\ell_\infty$, the counting $\ell_0$ "norm", i.e. a true measurement of sparseness of the perturbation, and total variation, a non-$\ell_p$ dissimilarity. Our approach adopts the relaxation structure of the recently proposed LogBarrier attack [logbarrier], which required differentiable metrics. We extend this work to include a broad class of nonsmooth (nondifferentiable) metrics. Our algorithm, ProxLogBarrier, uses the proximal gradient method for generating adversarial perturbations. We demonstrate our attack on the MNIST, CIFAR10, and ImageNet1k datasets. ProxLogBarrier shows significant improvement over both the LogBarrier attack and the other attacks we considered. In particular, in the $\ell_0$ case, we achieve state-of-the-art results with respect to a suite of attacks typically used for this problem class. Finally, by using the total variation dissimilarity, we shed light on a new class of imperceptible adversaries that incorporates neighboring pixel information, which can be viewed as an adversarial attack measured in a convolutional norm.
2. Background material
2.1. Adversarial attacks
Let $X = [0,1]^n$ be the image space, and $Y$ be the label space (the unit simplex for $K$ classes). An image-label pair is defined by $(x, c)$, with the image $x \in X$ belonging to one of $K$ classes. The trained model is defined by $f: X \to Y$. An adversarial perturbation $\tilde{x}$ should be small with respect to a dissimilarity metric (henceforth simply called the metric) $d(\cdot,\cdot)$, e.g. $d(x,\tilde{x}) = \|x - \tilde{x}\|_p$. Formally, the optimal adversarial perturbation is the minimizer of the following optimization problem:
(1) $\min_{\tilde{x} \in X}\; d(x, \tilde{x}) \quad \text{subject to} \quad \operatorname{arg\,max}_i f_i(\tilde{x}) \neq c.$
DNNs might be powerful classifiers, but that does not mean their decision boundaries are well-behaved. Instead, researchers have popularized using the training loss, often the cross-entropy loss, as a surrogate for the decision boundary: typically a model is trained until the loss is very low, which is often related to good classification performance. Thus, instead of solving (1), one can perform Projected Gradient Descent (PGD) on the cross-entropy loss:
(2) $\max_{\tilde{x} \in X,\; \|\tilde{x} - x\| \leq \varepsilon} \mathcal{L}\left(f(\tilde{x}), c\right),$
where $\|\cdot\|$ is typically taken to be either the $\ell_2$ or $\ell_\infty$ norm, and $\varepsilon$ defines the perturbation threshold of interest.
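For concreteness, iteration (2) with an $\ell_\infty$ constraint reduces to a signed-gradient ascent step followed by two projections (onto the $\varepsilon$-ball and onto the image space). The sketch below is purely illustrative: `grad_loss` is a hypothetical callable standing in for the gradient of the training loss, and the parameter values are placeholders.

```python
import numpy as np

def pgd_linf(x, grad_loss, eps=0.1, step=0.02, iters=20):
    """Projected gradient ascent sketch of (2) under an l-infinity
    constraint. `grad_loss` is a hypothetical callable returning the
    gradient of the training loss at a point."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = grad_loss(x + delta)
        delta = delta + step * np.sign(g)          # steepest ascent in l-inf geometry
        delta = np.clip(delta, -eps, eps)          # project onto the eps-ball
        delta = np.clip(x + delta, 0.0, 1.0) - x   # keep pixels in [0, 1]
    return x + delta
```

With a loss whose gradient is everywhere positive, the iterates saturate the $\varepsilon$ budget, illustrating why PGD adversaries typically sit on the boundary of the constraint set.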
Some adversarial attack methods try to solve the problem posed in (1) without incorporating the loss function used to train the network. For example, Carlini & Wagner attack the logit layer of a network and solve a different optimization problem, which depends on the choice of norm [cw]. Regarding adversarial defense methods, the same authors demonstrated how a significant number of prior defenses fail because of "gradient obfuscation", where gradients are informative only locally around the image [obfuscated_cw]. Another metric of adversarial dissimilarity is the $\ell_0$ "norm", which counts the number of pixels that differ between the adversary and the clean image [sparsefool, jsma]. This is of interest because an adversary might be required to also budget the number of allowed pixels to perturb, while still remaining "imperceptible" to the human eye. For example, the sticker attack [eykholt2017robust] is a practical attack with real-world consequences, and does not interfere with every single part of the image.
2.2. Proximal gradient method
Our adversarial attack amounts to a proximal gradient method. Proximal algorithms are a driving force for nonsmooth optimization problems, and are receiving more attention in the deep learning community on a myriad of problems [proxquant, admmprox, learningprox, catalyst]. For a full discussion on this topic, we suggest [beckbook].
We consider the following framework for proximal algorithms, namely a composite minimization problem
(3) $\min_{u \in E}\; F(u) := f(u) + g(u),$
where $E$ is a Euclidean space. We make the following assumptions:
- $g: E \to (-\infty, \infty]$ is a proper, closed, convex function over $E$;
- $f: E \to (-\infty, \infty]$ is a proper, closed function, with $\mathrm{dom}(f)$ convex, and has Lipschitz continuous gradients over the interior of its domain;
- $\mathrm{dom}(g) \subseteq \mathrm{int}(\mathrm{dom}(f))$;
- the solution set, $X^* := \operatorname{arg\,min}_{u \in E} F(u)$, is nonempty.
Generating a stationary point of (3) amounts to finding a fixed point of the following sequence:
(4) $x^{k+1} = \mathrm{prox}_{\lambda g}\!\left( x^k - \lambda \nabla f(x^k) \right),$
where $\lambda > 0$ is some step size, and the proximal operator is defined as $\mathrm{prox}_{\lambda g}(z) := \operatorname{arg\,min}_{u \in E} \left\{ g(u) + \tfrac{1}{2\lambda} \|u - z\|^2 \right\}.$
Despite $f$ not being convex, there are still convergence properties we can get from a sequence of iterates generated in this way. The following theorem is a simplified version of what can be found in [beckbook] (Section 10.3, with proof), and is the main motivation for our proposed method.
Theorem 1. Suppose the assumptions above hold, and let $\{x^k\}$ be generated by (4) with step size $\lambda \in (0, 1/L)$, where $L$ is the Lipschitz constant of $\nabla f$. Then the sequence $\{F(x^k)\}$ is nonincreasing, and any limit point of $\{x^k\}$ is a stationary point of (3).
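As a toy illustration of iteration (4), the sketch below applies the proximal gradient method to a lasso-style problem $\min_u \tfrac{1}{2}\|u - b\|^2 + \lambda\|u\|_1$, whose prox step is soft thresholding. The objective and all names here are illustrative, not part of our method.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (componentwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_gradient(x0, grad_f, prox_g, step=0.1, iters=200):
    """Fixed-point iteration (4): x <- prox_{step*g}(x - step*grad_f(x))."""
    x = x0
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)
    return x
```

For this toy objective the minimizer is known in closed form (soft thresholding of $b$), so the iteration's fixed point can be checked directly.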
3. Our method: ProxLogBarrier
Following the previous theoretical ideas, we reformulate (1) in the following way:
(5) $\min_{\tilde{x} \in X}\; d(x, \tilde{x}) \quad \text{subject to} \quad \max_{i \neq c} Z_i(\tilde{x}) \geq Z_c(\tilde{x}).$
Here, $Z$ is the model output before the softmax layer $\sigma$ that "projects" onto $Y$, and so $f = \sigma \circ Z$ and $\operatorname{arg\,max}_i f_i(\tilde{x}) = \operatorname{arg\,max}_i Z_i(\tilde{x})$. In other words, we want to perturb the clean image minimally in such a way that the model misclassifies it. This problem is difficult, as the decision boundary has virtually no exploitable structure. Thus the problem can be relaxed using a logarithmic barrier, a technique often used in traditional optimization [nocedal],
(6) $\min_{\tilde{x} \in X}\; d(x, \tilde{x}) - \mu \log\left( \max_{i \neq c} Z_i(\tilde{x}) - Z_c(\tilde{x}) \right), \quad \mu > 0.$
This objective function now includes the constraint that enforces misclassification. In [logbarrier], (6) was originally solved via gradient descent, which necessarily assumes that $d$ is at least differentiable. The assumption of differentiability is not a given, and may be impracticable. For example, consider the subdifferential of the $\ell_\infty$ norm at a point $x \neq 0$:
$\partial \|x\|_\infty = \mathrm{conv}\left\{ \mathrm{sign}(x_i)\, e_i : i \in I(x) \right\},$
where $I(x) = \{ i : |x_i| = \|x\|_\infty \}$, and the $e_i$ are the standard basis vectors. At each subgradient step, very little information is obtained: only the maximal components of the iterate are updated. Indeed, in the original LogBarrier paper, a smooth approximation of this norm was used to get around this issue. We shall see that this does not occur with our proposed ProxLogBarrier method.
For brevity, let
$\Phi(\tilde{x}) := -\log\left( \max_{i \neq c} Z_i(\tilde{x}) - Z_c(\tilde{x}) \right).$
The optimization problem (6) becomes
(7) $\min_{\tilde{x} \in X}\; \mu\, \Phi(\tilde{x}) + d(x, \tilde{x}).$
One can draw several similarities between (7) and (3). As before, we have no guarantees of convexity on $\Phi$, which plays the role of $f$ in the composite problem, but it is smooth provided $Z$ is (that is, provided the model is smooth from a computational perspective). Our dissimilarity metric $d$ plays the role of $g$, as it usually has a closed-form proximal operator. Thus, we simply turn to the proximal gradient method to solve the minimization problem in (7).
We iteratively find a minimizer for the problem; the attack is outlined in Algorithm 1. Due to the highly nonconvex nature of the decision boundary, we perform a backtracking step to ensure the proposed iterate is in fact adversarial. We remark that the adversarial attack problem is constrained by the image space, and thus requires a further projection step back onto the image space (pixels must be in the range [0,1]). In traditional nonconvex optimization, best practice is to also record the "best iterate", as valleys are likely pervasive throughout the decision boundary. This way, even if at some point the gradient sends our image far off and it is unable to return in the remaining iterations, we already have a better candidate. The algorithm begins with a misclassified image, and moves the iterates towards the original image by minimizing the dissimilarity metric. Misclassification is maintained by the log barrier function, which prevents the iterates from crossing the decision boundary. Refer to Figure 1. Contrast this with PGD-based algorithms, which begin at or near the original image, and iterate away from the original image.
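A minimal sketch of the structure of Algorithm 1 follows, under simplifying assumptions: the callables `logits`, `grad_barrier`, `prox_d`, and `dist` are hypothetical stand-ins for the model logits $Z$, the gradient of the barrier term $\Phi$, the proximal operator of $d$, and $d$ itself. The actual implementation operates on batches and uses the hyperparameters of Section 4.

```python
import numpy as np

def prox_log_barrier(x, label, logits, grad_barrier, prox_d, dist,
                     x_init, mu=1.0, step=0.1, iters=200):
    """Sketch of the ProxLogBarrier loop: prox-gradient steps on (7),
    with projection onto the image space, backtracking to preserve
    misclassification, and best-iterate tracking."""
    def is_adv(z):
        return np.argmax(logits(z)) != label

    assert is_adv(x_init), "initialization must already be misclassified"
    xk, best = x_init, x_init
    for _ in range(iters):
        cand = prox_d(xk - step * mu * grad_barrier(xk), step)
        cand = np.clip(cand, 0.0, 1.0)     # project back onto the image space
        while not is_adv(cand):            # backtrack toward the (adversarial) iterate
            cand = 0.5 * (cand + xk)
        xk = cand
        if dist(x, xk) < dist(x, best):    # record the best iterate seen so far
            best = xk
    return best
```

On a toy two-class linear "model" the loop stays on the adversarial side of the boundary while strictly reducing the distance to the clean point, which is the qualitative behavior described above.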
Proximal operators for dissimilarities
To complete the algorithm, it remains to compute the proximal operator for various choices of $d$. One can turn to [beckbook] for complete derivations of the proximal operators for the adversarial metrics we are considering, namely the $\ell_p$ norms and the cardinality function. Consider measuring the $\ell_\infty$ distance between the clean image and our desired adversarial image:
$d(x, \tilde{x}) = \|\tilde{x} - x\|_\infty.$
Due to the Moreau Decomposition Theorem [RWbible], the proximal operator of this function relies on projecting onto the unit $\ell_1$ ball:
$\mathrm{prox}_{\lambda \|\cdot - x\|_\infty}(z) = z - \lambda\, P_{B_{\ell_1}}\!\left( \tfrac{z - x}{\lambda} \right),$
where $P_{B_{\ell_1}}$ denotes Euclidean projection onto the unit $\ell_1$ ball.
We make use of the algorithm from [duchi] to perform the projection step, implemented over batches of vectors for efficiency. Similarly, one obtains the proximal operators for $\ell_1$ and $\ell_2$ via the same theorem:
$\mathrm{prox}_{\lambda \|\cdot\|_1}(z) = S_\lambda(z), \qquad \mathrm{prox}_{\lambda \|\cdot\|_2}(z) = \left( 1 - \tfrac{\lambda}{\max\{\|z\|_2, \lambda\}} \right) z,$
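The $\ell_1$-ball projection, and with it the $\ell_\infty$ proximal operator via the Moreau decomposition, can be written compactly. This is a single-vector sketch of the sort-based algorithm of [duchi] (our implementation batches it), applied here with $x = 0$ for simplicity.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection onto the l1 ball of the given radius,
    via the sort-based algorithm of [duchi]; O(n log n) per vector."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)    # shrinkage threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(z, t):
    """prox of t*||.||_inf via Moreau decomposition:
    z minus t times the projection of z/t onto the unit l1 ball."""
    return z - t * project_l1_ball(z / t, 1.0)
```

Note that the prox of $\ell_\infty$ shrinks only the largest-magnitude components, in contrast with the uninformative subgradient step discussed earlier.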
where $S_\lambda$ is the soft-thresholding operator. In the case that one wants to minimize the number of perturbed pixels in the adversarial image, one can turn to the counting "norm", called $\ell_0$, which counts the number of nonzero entries in a vector. While this function is nonconvex, the proximal operator still has a closed form:
$\mathrm{prox}_{\lambda \|\cdot\|_0}(z) = H_{\sqrt{2\lambda}}(z),$
where $H_\alpha$ is a hard-thresholding operator ($H_\alpha(t) = t$ if $|t| > \alpha$, and $0$ otherwise), and acts componentwise in the case of vector arguments.
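For completeness, the remaining proximal operators above have one-line implementations; a single-vector NumPy sketch (our batched implementation differs only in bookkeeping):

```python
import numpy as np

def prox_l1(z, t):
    """prox of t*||.||_1: componentwise soft thresholding S_t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_l2(z, t):
    """prox of t*||.||_2: shrink the whole vector toward the origin."""
    n = np.linalg.norm(z)
    return np.zeros_like(z) if n <= t else (1.0 - t / n) * z

def prox_l0(z, t):
    """prox of t*||.||_0: hard thresholding, zeroing entries
    with |z_i| <= sqrt(2t)."""
    return np.where(np.abs(z) > np.sqrt(2.0 * t), z, 0.0)
```

Note the qualitative difference: the $\ell_1$ prox shrinks every component, the $\ell_2$ prox rescales the vector as a whole, and the $\ell_0$ prox zeroes small components outright, which is what drives true sparsity in the perturbation.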
Example of a non-$\ell_p$ dissimilarity: Total variation
We let $X = \mathbb{R}^{m \times n}$ denote the image space, and for the time being assume the images are grayscale, and let $D$ denote the finite-difference operator on the grid space defined by the image. Then $D: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n \times 2}$, where
(8) $(Dx)_{i,j} = \left( x_{i+1,j} - x_{i,j},\; x_{i,j+1} - x_{i,j} \right),$
where $(i,j)$ are the pixel indices of the image in row-column notation. The anisotropic total variation seminorm is defined by
(9) $\|x\|_{TV} = \sum_{i,j} \left\| (Dx)_{i,j} \right\|_1 = \sum_{i,j} |x_{i+1,j} - x_{i,j}| + |x_{i,j+1} - x_{i,j}|,$
where the norm is applied to each pair of finite differences. Heuristically, this is a measure of large changes between neighboring pixels. In practice $D$ can be implemented via a convolution. In the case of color images, we aggregate the total variation for each channel. Total variation (TV) is not a true norm, in that nonzero images can have zero TV (constant images, for instance). In what follows, we omit the distinction and write TV-norm to mean the total variation seminorm. Traditionally, TV has been used in the context of image denoising [ROF].
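The anisotropic seminorm (9) amounts to summing absolute forward differences along rows and columns; a direct sketch for a grayscale image:

```python
import numpy as np

def tv_aniso(x):
    """Anisotropic total variation (9) of a grayscale image:
    sum of absolute forward differences along both grid axes."""
    dy = np.abs(np.diff(x, axis=0)).sum()   # vertical neighbor jumps
    dx = np.abs(np.diff(x, axis=1)).sum()   # horizontal neighbor jumps
    return dy + dx
```

A nonzero constant image has TV equal to zero, which is exactly the seminorm degeneracy noted above.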
What does this mean in the context of adversarial perturbations? The TV-norm of the perturbation will be small when the perturbation has few jumps between pixels. That is, small TV-norm perturbations have locally flat regions. This is primarily because the TV-norm is convolutional in nature: the finite-difference gradient operator incorporates neighboring pixel information. We note that this is not the first instance of TV being used as a dissimilarity metric [spatially]; however our approach is quite different and is not derived from a flow. An outline for the proximal operator can be found in [beckbook]; we use a standard package for efficient computation [proxtv_1, proxtv_2].
4. Experimental methodology
Outline
We compare the ProxLogBarrier attack with several other adversarial attacks on MNIST [mnist_dataset], CIFAR10 [cifar10_dataset], and ImageNet1k [imagenet_dataset]. For MNIST, we use the network described in [jsma]; on CIFAR10, we use a ResNeXt network [resnext]; and for ImageNet1k, ResNet50 [resnet50, DAWNBench]. We also consider defended models for the aforementioned networks. This is to further benchmark the attack capability of ProxLogBarrier, and to reaffirm previous work in the area. For defended models, we consider Madry-style adversarial training for CIFAR10 and MNIST [madryLinf]. On ImageNet1k, we use the recently proposed scaleable input gradient regularization for adversarial robustness [finlay2019scaleable]. We randomly select 1000 (test) images to evaluate performance on MNIST and CIFAR10, and 500 (test) images on ImageNet1k. We consider the same images on their defended counterparts. We note that for ImageNet1k, we consider the problem of Top-5 misclassification, where the log barrier is with respect to the following constraint set
$\left\{ \tilde{x} \in X : Z_{(5)}(\tilde{x}) \geq Z_c(\tilde{x}) \right\},$ where $Z_{(5)}$ denotes the fifth-largest entry of $Z(\tilde{x})$ over indices $i \neq c$.
We compare the ProxLogBarrier attack with a wide range of attack algorithms that are available through the FoolBox adversarial attack library [foolbox]. For perturbations in $\ell_0$, we compare against SparseFool [sparsefool], the Jacobian Saliency Map Attack (JSMA) [jsma], and Pointwise [pointwise] (this latter attack is blackbox). For $\ell_2$ attacks, we consider the Carlini-Wagner attack (CW) [cw], Projected Gradient Descent (PGD) [ifgsm], DeepFool [deepfool], and the original LogBarrier attack [logbarrier]. Finally, for $\ell_\infty$ norm perturbations, we consider PGD, DeepFool, and LogBarrier. All hyperparameters are left to their implementation defaults, with the exception of SparseFool, where we used the exact parameters indicated in the paper. We omit the OnePixel attack [onepixel], as [sparsefool] showed that this attack is quite weak on MNIST and CIFAR10, and not tractable on ImageNet1k.
Implementation details for our algorithm
Depending on the metric being optimized, we initialize the adversarial image with either sufficiently large Gaussian noise or uniform noise. The hyperparameters (the barrier parameter $\mu$, the proximal parameter, and the step size) were held fixed across MNIST and CIFAR10. We observed some computational drawbacks for ImageNet1k: firstly, the proximal operator for the $\ell_0$ norm is far too strict. We decided to use the $\ell_1$ norm to induce sparseness in our adversarial perturbation (adjusting both the prox parameter and the step size accordingly). Other parameter changes for the ImageNet1k dataset are a larger proximal parameter and 2500 algorithm iterations. Finally, we found that using the softmax layer outputs helps with ImageNet1k attacks against both the defended and undefended network. For TV-norm perturbations, we set a smaller proximal parameter and used far fewer iterations than before.
Reporting
For perturbations in $\ell_2$ and $\ell_\infty$, we report the percent misclassification at various threshold levels that are somewhat standard [atodds]. Our choices of distance thresholds were arbitrary; however, we supplement them with median perturbation distances in all attack norms to mitigate cherry-picking. For attacks that were unable to successfully perturb at least half the sampled images, we do not report anything. If the attack was able to perturb more than half but not all, we add an asterisk to the median distance. We denote the defended models by "(D)" (recall that for MNIST and CIFAR10, we are using Madry-style adversarial training, and scaleable input-gradient regularization for ImageNet1k).
Perturbations in $\ell_0$
Results for $\ell_0$ perturbations are found in Table 1, with examples available in Figure 2 and Figure 3(b). Across all datasets considered, ProxLogBarrier outperforms all other attack methods, for both defended and undefended networks. It also appears immune to Madry-style adversarial training on both MNIST and CIFAR10. This is entirely reasonable, for Madry-style adversarial training is targeted towards $\ell_\infty$ attacks. In contrast, on ImageNet1k, the defended model trained with input-gradient regularization performs significantly better than the undefended model, even though this defence is not aimed at $\ell_0$ attacks. Neither JSMA nor Pointwise scales to networks on ImageNet1k. Pointwise excels on smaller images, since it takes fewer than 1000 iterations to cycle over every pixel and check whether it can be zeroed out. We remark that SparseFool was unable to adversarially attack all images, whereas ProxLogBarrier always succeeded.
Perturbations in $\ell_\infty$
Results for $\ell_\infty$ perturbations are found in Table 2. Our attack stands out on MNIST, in both the defended and undefended case. On CIFAR10, our attack is best on the undefended network, and only slightly worse than PGD when adversarially defended. On ImageNet1k, our method suffers dramatically. This is likely due to very poor decision boundaries with respect to this norm, as our method necessarily performs better when the boundaries are not muddled. PGD does not focus on the decision boundaries explicitly, and thus has more room to find something adversarial quickly.
Perturbations in $\ell_2$
Results for perturbations measured in Euclidean distance are found in Table 3. For MNIST and ImageNet1k, on both defended and undefended networks, our attack performs better than all other methods, both in median distance and at a given perturbation norm threshold. On CIFAR10, we are best on undefended but lose to CW in the defended case. However, the CW attack did not scale to ImageNet1k using the implementation in the FoolBox attack library.
Perturbations in the TV-norm
To our knowledge, there are no other TV-norm attacks against which to compare our method. However, we present the median total variation across the data in question, and a handful of pictures for illustration. On MNIST, adversarial images with minimal total variation are often as expected: near-flat perturbations or very few pixels perturbed (see Figure 2(a)). For CIFAR10 and ImageNet1k, we have found that adversarial images with small TV-norm have an adversarial "tint": they appear nearly identical to the original, with a small color shift. When the adversary is not a tint, perturbations are highly localized or localized in several regions. See for example Figures 2(b) and 3(a).
Algorithm runtime
We strove to implement ProxLogBarrier so that it could be run in a reasonable amount of time. For that reason, ProxLogBarrier was implemented to work over a batch of images. Using one consumer-grade GPU, we can comfortably attack several MNIST and CIFAR10 images simultaneously, but only one ImageNet1k image at a given time. We report our algorithm runtimes in Table 5. Algorithms implemented from the FoolBox repository were not written to take advantage of the GPU, hence we omit runtime comparisons. Heuristically speaking, PGD is one of the faster algorithms, whereas CW, SparseFool, and DeepFool are slower. We omit runtimes for the total variation attack, since its proximal operator is coded in C rather than Python.
We are not surprised that our attack in $\ell_0$ takes longer than in the other norms; this is likely due to the backtracking step to ensure misclassification of the iterate. On ImageNet1k, the ProxLogBarrier attack in the $\ell_\infty$ metric is quite slow due to the projection step onto the $\ell_1$ ball, which is $O(d \log d)$, where $d$ is the input dimension [duchi].
5. Conclusion
We have presented a concise framework for generating adversarial perturbations by incorporating the proximal gradient method. We have expanded upon the LogBarrier attack, which was originally only effective in the $\ell_2$ and $\ell_\infty$ norms, by addressing the $\ell_0$ norm case and the total variation seminorm. Thus we have proposed a method unifying all three common perturbation scenarios. Our approach requires fewer hyperparameter tweaks than LogBarrier, and performs significantly better than many of the attack methods we compared against, both on defended and undefended models, and across all norm choices. We highlight that our method is, to our knowledge, the best choice for perturbations measured in $\ell_0$, compared to all other methods available in FoolBox. We also perform better than all other attacks considered on the MNIST network, both in median distance and at commonly reported thresholds. The proximal gradient method points towards new forms of adversarial attacks, such as those measured in the TV-norm, provided the attack's dissimilarity metric has a closed proximal form.