Adversarial Framing for Image and Video Classification
Neural networks are prone to adversarial attacks. In general, such attacks deteriorate the quality of the input by either slightly modifying most of its pixels, or by occluding it with a patch. In this paper, we propose a method that keeps the image unchanged and only adds an adversarial framing on the border of the image. We show empirically that our method is able to successfully attack state-of-the-art methods on both image and video classification problems. Notably, the proposed method results in a universal attack which is very fast at test time. Source code can be found at github.com/zajaczajac/adv_framing.
Michał Zając* (1, 2), Konrad Żołna (1, 3), Negar Rostamzadeh (3), Pedro O. Pinheiro (3)
(1) Jagiellonian University, Kraków, Poland; (2) Nomagic, Warsaw, Poland; (3) Element AI, Montréal, Canada
* Equal contribution
The remarkable success of deep convolutional networks for image and video classification (?; ?) has spurred interest in analyzing their robustness. Unfortunately, even though neural networks often achieve human-level performance (?), they are susceptible to adversarial attacks (?): the output of a neural network-based classifier may be drastically changed by applying a small perturbation to its input. We divide such perturbations into two categories: fully-affecting and partially-affecting.
Fully-affecting attacks generate small pixel intensity modifications which are optimized to be hardly visible to humans. These attacks typically have their $L_2$ or $L_\infty$ norm constrained (?; ?) and hence affect the whole image.
Partially-affecting attacks usually have their $L_0$ norm constrained. They introduce a perceptible but small occlusion to the image, such as a patch (?; ?) or a single pixel (?).
The attacks mentioned above either slightly modify all the pixels of the image or occlude parts of it. However, attackers may find this to be a serious limitation and seek new types of attacks. For instance, consider a scenario where they upload videos containing forbidden content. Their goal is to bypass a video-sharing website's filters. At the same time, the perturbations introduced should not be distracting and all information should be retained.
In this paper, we demonstrate a new attack that is well suited to this scenario. The method, dubbed adversarial framing (AF), simply adds a thin border around the original input (which may be an image or a video), keeping the whole content unchanged (see Figure 1 and youtu.be/PrU9R6eFNTs for some qualitative results). The attack is universal (?), which means the same AF is applied to all inputs. Substantial computation is required only during training; at test time, the only extra computation is appending the precomputed framing to the input.
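The test-time step described above can be illustrated with a minimal sketch. It assumes a single-channel image stored as a nested list and a framing stored as a slightly larger grid; the function name `apply_framing` and this data layout are hypothetical and are not taken from the released source code.

```python
def apply_framing(image, framing, width):
    """Surround `image` (an H x W grid) with a precomputed framing.

    `framing` is an (H + 2*width) x (W + 2*width) grid; only its border of
    thickness `width` survives. The original pixels are copied unchanged.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h + 2 * width, w + 2 * width
    if len(framing) != out_h or len(framing[0]) != out_w:
        raise ValueError("framing must be 2*width larger than the image in each dimension")
    framed = [row[:] for row in framing]  # start from a copy of the framing
    for i in range(h):
        for j in range(w):
            framed[i + width][j + width] = image[i][j]  # paste content in the centre
    return framed


# Hypothetical 2x2 grey image and an all-white framing of width 1.
image = [[0.5, 0.5], [0.5, 0.5]]
framing = [[1.0] * 4 for _ in range(4)]
framed = apply_framing(image, framing, width=1)
```

Because the same framing is reused for every input, this copy is the only per-input work at attack time.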
Similarly to (?), we believe that research on attack techniques deepens the understanding of the inner workings of neural networks. We hope that our work, and the analysis of adversarial attacks in general, can be helpful in designing defenses and/or robust methods.
In this work, we consider a white-box setting, in which access to the architecture and weights of the trained classifier is given. Previous work has shown that if only black-box access is given, a surrogate model can be leveraged to obtain an attack that transfers well to the original model (?). Therefore, the white-box setting is a realistic assumption and, in fact, is the most commonly considered paradigm in the literature.
Computing the adversarial framing
Suppose a labeled dataset $D$ of images or videos is given. Moreover, a differentiable classifier $f$ has been trained so that for each input $x$ and class $y$, a probability $f(x)_y$ is assigned to $x$ being in class $y$.
We now present a procedure to train the adversarial framing to attack $f$. During training, a minibatch is sampled from $D$. Every example is surrounded with the same framing, which is the current version of the trained AF. In the case of videos, every frame of each example is surrounded with the same framing. Then the classification loss is backpropagated and the framing is updated using its gradients to maximize the loss. Training continues until convergence. The framing's width is a tunable hyperparameter fixed at the beginning of the training procedure.
For a detailed explanation see Algorithm 1. The algorithm is presented for image datasets. The modification for video datasets is straightforward.
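To make the training loop concrete, here is a toy sketch of the idea, not a reproduction of Algorithm 1. It assumes a tiny linear-plus-sigmoid classifier in place of a deep network, plain gradient ascent in place of Adam, single-channel images, and pixel values in [0, 1]; all function names and data are hypothetical.

```python
import math
import random


def frame_positions(h, w, width):
    """Coordinates of the border cells of an (h + 2*width) x (w + 2*width) grid."""
    rows, cols = h + 2 * width, w + 2 * width
    return [(i, j) for i in range(rows) for j in range(cols)
            if i < width or i >= rows - width or j < width or j >= cols - width]


def predict(weights, image, frame, width):
    """Toy stand-in classifier: probability of class 1 for the framed input."""
    score = sum(weights[i + width][j + width] * value
                for i, row in enumerate(image) for j, value in enumerate(row))
    score += sum(weights[i][j] * value for (i, j), value in frame.items())
    return 1.0 / (1.0 + math.exp(-score))


def train_framing(weights, data, width, steps=200, lr=0.1, batch_size=2, seed=0):
    """Gradient ascent on the cross-entropy loss with respect to the framing only."""
    rng = random.Random(seed)
    h, w = len(data[0][0]), len(data[0][0][0])
    frame = {pos: 0.5 for pos in frame_positions(h, w, width)}  # grey initial framing
    for _ in range(steps):
        minibatch = [rng.choice(data) for _ in range(batch_size)]
        grad = {pos: 0.0 for pos in frame}
        for image, label in minibatch:
            p = predict(weights, image, frame, width)
            # For sigmoid + cross-entropy, dLoss/dScore = p - label, and
            # dScore/dFramePixel is the classifier weight at that position.
            for (i, j) in frame:
                grad[(i, j)] += (p - label) * weights[i][j] / batch_size
        for pos in frame:
            # Ascend the loss, then clip to the valid pixel range [0, 1].
            frame[pos] = min(1.0, max(0.0, frame[pos] + lr * grad[pos]))
    return frame


# Hypothetical 2x2 "images", all labelled 1, and a fixed checkerboard classifier.
weights = [[1.0 if (i + j) % 2 == 0 else -1.0 for j in range(4)] for i in range(4)]
data = [([[1.0, 0.0], [0.0, 1.0]], 1), ([[0.9, 0.1], [0.2, 0.8]], 1)]
grey = {pos: 0.5 for pos in frame_positions(2, 2, 1)}
trained = train_framing(weights, data, width=1)
```

In this toy setup both images are classified correctly with the neutral grey framing, while the single trained framing pushes both predictions to the wrong side of 0.5 without touching any content pixel.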
Note that the input size is modified due to the addition of the framing. This does not pose any issue to the CNN-based classifier, as most modern architectures (such as ResNet (?) or ResNeXt (?)) accept various input sizes. If the classifier’s input size is fixed, the proposed algorithm can be simply modified so that the image is resized before applying adversarial framing. We investigate performance under various resizing strategies further in the paper.
We performed untargeted attacks against state-of-the-art classifiers for the ImageNet (?) and UCF101 (?) datasets. We compare our AF to two simple baselines, neither of which requires training: a random framing (RF) filled with uniformly distributed random noise, and a black framing (BF) consisting of black pixels only.
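The two untrained baselines are simple enough to sketch directly. The snippet below assumes single-channel pixel values in [0, 1], with 0.0 standing for black; the helper names are hypothetical, and interior cells are left as `None` because they would be filled with the untouched original content.

```python
import random


def make_framing(h, w, width, fill):
    """Build an (h + 2*width) x (w + 2*width) grid whose border cells come from fill()."""
    rows, cols = h + 2 * width, w + 2 * width
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            on_border = i < width or i >= rows - width or j < width or j >= cols - width
            row.append(fill() if on_border else None)
        grid.append(row)
    return grid


def black_framing(h, w, width):
    """BF baseline: every border pixel is black."""
    return make_framing(h, w, width, lambda: 0.0)


def random_framing(h, w, width, seed=0):
    """RF baseline: every border pixel is drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return make_framing(h, w, width, rng.random)


bf = black_framing(2, 2, 1)
rf = random_framing(2, 2, 1)
```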
ImageNet is a large-scale image dataset containing over a million images from 1000 classes. It serves as a popular benchmark for image classification. We performed attacks against a ResNet-50 (?) model pretrained on ImageNet. The model was taken from the PyTorch Model Zoo (?). Results are reported in Table 0(a).
UCF101 is a dataset containing realistic videos. Each video shows a person performing an action from one of 101 possible classes. We tested our method by attacking a ResNeXt-101 based spatio-temporal 3D CNN; we used the model pretrained by (?). This model takes clips as input, each containing 16 consecutive frames. Results are reported in Table 0(b).
As can be seen in Figure 1 and Figure 2, adversarial framing usually fools the classifier into wrongly recognizing one particular class. In the case of ImageNet, this adversarial class is usually maypole, even across different training runs. We hypothesize that this is due to the colorfulness of this object.
To make sure that the performance of our attack does not depend on the presence of such special classes, we also performed attacks in a targeted setting. In these experiments, instead of minimizing the output score for the ground-truth class, we maximize the score for a randomly selected target class. We report the success rate (i.e. the percentage of images classified as the given target) for different target classes.
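One way to write the two objectives side by side, as a sketch: here $D$ is the training set, $\theta$ denotes the framing's pixel values, $F(x; \theta)$ is the input $x$ surrounded by that framing, $f(\cdot)_k$ is the classifier's probability for class $k$, and $t$ is the chosen target class. These symbols and the use of a log-likelihood are assumptions made for illustration; the text only states which score is lowered or raised.

```latex
\begin{align*}
\text{untargeted:} \quad & \max_{\theta} \; \mathbb{E}_{(x, y) \sim D} \left[ -\log f\big(F(x; \theta)\big)_{y} \right] \\
\text{targeted:} \quad & \max_{\theta} \; \mathbb{E}_{(x, y) \sim D} \left[ \log f\big(F(x; \theta)\big)_{t} \right]
\end{align*}
```

In both cases only $\theta$ is optimized; the classifier weights and the content pixels stay fixed.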
In all the experiments we used the Adam optimizer (?). For all the hyperparameters of the optimizer except the learning rate, we used the default values from the PyTorch (?) implementation.
On ImageNet (?) we trained for 5 epochs with batch size 32 and an initial learning rate of 0.1, decayed by a factor of 0.1 every 2 epochs. On UCF101 (?) we trained for 60 epochs with batch size 32 and an initial learning rate of 0.03, decayed by a factor of 0.3 every 15 epochs. On both datasets, we trained the adversarial framing using training data only. All the reported results were computed on validation data.
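The learning-rate schedules can be tabulated with a few lines of code. This sketch assumes that "decaying by 0.1 every 2 epochs" means multiplying the rate by 0.1 at each such boundary (a standard step decay); the function name `step_decay` is hypothetical.

```python
def step_decay(initial_lr, gamma, step_size, epoch):
    """Learning rate at a 0-indexed epoch under multiplicative step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Values quoted in the text: (initial rate, decay factor, epochs per step, total epochs).
imagenet_schedule = [step_decay(0.1, 0.1, 2, epoch) for epoch in range(5)]
ucf101_schedule = [step_decay(0.03, 0.3, 15, epoch) for epoch in range(60)]
```

Under this reading, ImageNet training would use rates of roughly 0.1, 0.1, 0.01, 0.01 and 0.001 over its five epochs, and UCF101 training would end at roughly 0.00081.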
On ImageNet, we applied the framing to images previously resized to $224 \times 224$. On UCF101, we applied the framing to images previously resized to $112 \times 112$. These are standard input dimensions for the aforementioned datasets.
Grad-CAM (?) is a method for producing visual explanations for a convolutional neural network's predictions. For a given classifier $f$, input $x$ and class $y$, it computes a heatmap visualizing how much particular regions of $x$ contribute to the score of class $y$ output by $f$.
We computed such visualizations for the pretrained ResNet-50 from the PyTorch Model Zoo, taking as input images from ImageNet. We consider both the cases with and without an adversarial framing. A few qualitative results are presented in Figure 2. (We use the following Grad-CAM implementation: github.com/kazuto1011/grad-cam-pytorch.)
Classifier’s input resizing
Our method does not modify pixels of the original input (with dimensions $H \times W$) and only adds a framing around it. As a result, the dimensions of the classifier input become $(H + 2w) \times (W + 2w)$, where $w$ is the framing's width. This is fine for most state-of-the-art image classification architectures; however, to make sure the approach also works for classifiers with a fixed input size, we conducted experiments attacking the ImageNet classifier under several image resizing strategies:
1. No resizing; the input dimensions are changed (Vanilla). The framing is trained with Algorithm 1.
2. First the framing is added, and then the whole image is rescaled back to $H \times W$ (Frame & Resize, F&R). We use the same framing as in 1.
3. The image is first scaled to $(H - 2w) \times (W - 2w)$ and then the framing is added, so that the size is again $H \times W$ (Resize & Frame, R&F). We train the framing separately because the number of parameters is smaller than in 1.
4. The framing is put on the original image, occluding its border pixels; the size remains unchanged (Occlude). We use the same framing as in 3.
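The bookkeeping behind the four strategies can be sketched as follows. The snippet assumes an original image of height `h` and width `w`, a framing of thickness `width`, and counts framing cells per channel; the function name, the strategy labels, and the 224 x 224 example size are all illustrative assumptions.

```python
def resizing_plan(h, w, width, strategy):
    """Return classifier input size, framing canvas size, and number of framing cells."""
    big = (h + 2 * width, w + 2 * width)      # canvas when the framing is added outside
    small = (h - 2 * width, w - 2 * width)    # content area when the framing fits inside
    outer_cells = big[0] * big[1] - h * w
    inner_cells = h * w - small[0] * small[1]
    if strategy == "vanilla":
        # Framing added outside; the classifier sees the larger input.
        return {"input_size": big, "framing_canvas": big, "framing_cells": outer_cells}
    if strategy == "frame_and_resize":
        # Same framing as vanilla, but the framed image is rescaled back to h x w.
        return {"input_size": (h, w), "framing_canvas": big, "framing_cells": outer_cells}
    if strategy == "resize_and_frame":
        # Content is shrunk first, so the framing lives inside an h x w canvas.
        return {"input_size": (h, w), "framing_canvas": (h, w), "framing_cells": inner_cells}
    if strategy == "occlude":
        # Same framing as resize_and_frame, overwriting the border pixels instead.
        return {"input_size": (h, w), "framing_canvas": (h, w), "framing_cells": inner_cells}
    raise ValueError("unknown strategy: " + strategy)


plans = {name: resizing_plan(224, 224, 1, name)
         for name in ("vanilla", "frame_and_resize", "resize_and_frame", "occlude")}
```

The counts show why two separately trained framings are needed: a framing that sits inside the fixed canvas has slightly fewer cells than one added outside it.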
While we see differences in results, all the variants prove very efficient for . Compared to other resizing strategies, performance is especially degraded in Frame & Resize. This is expected since the adversarial framing itself is resized and mixed with neighbouring pixels there.
Based on these results, if the input dimensions can be changed, the Vanilla approach performs best; otherwise, Resize & Frame leads to the highest error rate. Results are shown in Table 3.
Universal partially-affecting attacks
Since existing attacks are quite different from our approach, it is hard to perform a direct comparison. However, we try to compare our work with universal partially-affecting attacks that use localized patches. We are aware of two works that perform this kind of attack, LaVAN (?) and Adversarial patch (?). Both methods were tested on ImageNet and hence we will focus on that case.
Unfortunately, each of these works consider different percentages of the image pixels that may be altered. We thus first recall our results for various framing sizes and then relate it to results from other works. With AF of width 1, we use less than of the image’s pixels and accuracy in untargeted setting drops to . For , we use less than of the image’s pixels and the accuracy is only. Finally, for we use less than of the image’s pixels to make the classifier almost completely confused ( accuracy) in untargeted setting and achieve average success rate in targeted setting.
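The pixel budget of a framing can be worked out directly. As a sketch, with $H \times W$ the size of the original image and $w$ the framing width (symbols chosen here for illustration), the ratio of added framing pixels to original pixels is:

```latex
\[
\rho(w) = \frac{(H + 2w)(W + 2w) - HW}{HW} = \frac{2w(H + W) + 4w^{2}}{HW}
\]
```

For example, assuming a 224 x 224 input (an illustrative value), a framing of width 1 adds 900 pixels to the original 50176, a ratio of roughly 1.8%.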
In LaVAN, a patch occluding about of the image is used (which is comparable to our AF of width 1). Their universal attack has success rate in targeted setting. When they use the same patch to measure untargeted performance, they change the output class of the classifier for only of data, which suggests that the accuracy of the classifier is higher than achieved by our method.
Adversarial patch is a method that creates localized perturbations which can be deployed in a real world. The authors consider targeted setting only. They measure success rate as a function of percentage of pixels used. They need to occlude at least of pixels to obtain success rate.
As mentioned before, the comparison to prior works is difficult due to the differences in shape, localization and design of the other approaches. However, when we put these characteristics aside and focus on performance with respect to the ratio of perturbed pixels to original ones, our method appears to perform better than prior approaches. Additionally, our method is shown to generalize to videos.
Attacking video classifiers
Although extensive literature exists on attacks against image classifiers, we are aware of only a few works on video classifier attacks (?; ?; ?). While resulting in successful attacks, these approaches are fully-affecting and hence introduce adversarial artifacts in the video. In contrast, output from our attack contains the original video and no information is lost. Moreover, the framing is constant over all video frames, removing any “flickering” effect that could potentially be distracting to viewers.
In this work, we present a simple method for attacking both image and video classifiers. The proposed attack is universal (i.e. the same adversarial framing can be applied to different images or videos), efficient and effective. Moreover, our method does not modify the original content of the input and only adds a small border around it.
Michał Zając is co-financed by National Centre for Research and Development as a part of EU supported Smart Growth Operational Programme 2014-2020 (POIR.01.01.01-00-0392/17-00).
Konrad Żołna is financially supported by National Science Centre, Poland (2017/27/N/ST6/00828).
- [Athalye, Carlini, and Wagner 2018] Athalye, A.; Carlini, N.; and Wagner, D. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420.
- [Brown et al. 2017] Brown, T. B.; Mané, D.; Roy, A.; Abadi, M.; and Gilmer, J. 2017. Adversarial patch. CoRR abs/1712.09665.
- [Carlini and Wagner 2017] Carlini, N., and Wagner, D. A. 2017. Towards evaluating the robustness of neural networks. In Symposium on Security and Privacy.
- [Hara, Kataoka, and Satoh 2018] Hara, K.; Kataoka, H.; and Satoh, Y. 2018. Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet? In CVPR.
- [He et al. 2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR.
- [Karmon, Zoran, and Goldberg 2018] Karmon, D.; Zoran, D.; and Goldberg, Y. 2018. LaVAN: Localized and visible adversarial noise. In ICML.
- [Karpathy et al. 2014] Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; and Fei-Fei, L. 2014. Large-scale video classification with convolutional neural networks. In CVPR.
- [Kingma and Ba 2014] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980.
- [Krizhevsky, Sutskever, and Hinton 2012] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In NIPS.
- [Li et al. 2018] Li, S.; Neupane, A.; Paul, S.; Song, C.; Krishnamurthy, S. V.; Roy-Chowdhury, A. K.; and Swami, A. 2018. Adversarial perturbations against real-time video classification systems. CoRR abs/1807.00458.
- [Moosavi-Dezfooli et al. 2017] Moosavi-Dezfooli, S.-M.; Fawzi, A.; Fawzi, O.; and Frossard, P. 2017. Universal adversarial perturbations. In CVPR.
- [Moosavi-Dezfooli, Fawzi, and Frossard 2016] Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR.
- [Papernot, McDaniel, and Goodfellow 2016] Papernot, N.; McDaniel, P. D.; and Goodfellow, I. J. 2016. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR abs/1605.07277.
- [Paszke et al. 2017] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in pytorch. In NIPS-W.
- [Rey-de Castro and Rabitz 2018] Rey-de Castro, R., and Rabitz, H. 2018. Targeted nonlinear adversarial perturbations in images and videos. CoRR abs/1809.00958.
- [Russakovsky et al. 2015] Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. 2015. Imagenet large scale visual recognition challenge. IJCV.
- [Selvaraju et al. 2017] Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; and Batra, D. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV.
- [Soomro, Zamir, and Shah 2012] Soomro, K.; Zamir, A. R.; and Shah, M. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR abs/1212.0402.
- [Su, Vargas, and Sakurai 2017] Su, J.; Vargas, D. V.; and Sakurai, K. 2017. One pixel attack for fooling deep neural networks. CoRR abs/1710.08864.
- [Szegedy et al. 2014] Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In ICLR.
- [Taigman et al. 2014] Taigman, Y.; Yang, M.; Ranzato, M.; and Wolf, L. 2014. Deepface: Closing the gap to human-level performance in face verification. In CVPR.
- [Wei, Zhu, and Su 2018] Wei, X.; Zhu, J.; and Su, H. 2018. Sparse adversarial perturbations for videos. CoRR abs/1803.02536.
- [Xie et al. 2017] Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; and He, K. 2017. Aggregated residual transformations for deep neural networks. In CVPR.