Adversarial Defense by Suppressing High-frequency Components
Recent works show that deep neural networks trained on image classification datasets are biased towards textures. Such models are easily fooled by adding small high-frequency perturbations to clean images. In this paper, we learn robust image classification models by removing high-frequency components. Specifically, we develop a differentiable high-frequency suppression module based on the discrete Fourier transform (DFT). Combined with adversarial training, our method won 5th place in the IJCAI-2019 Alibaba Adversarial AI Challenge. Our code is available online.
Deep neural networks (DNNs) have achieved state-of-the-art performance on many tasks, such as image classification. However, DNNs have been shown to be vulnerable to adversarial attacks [Szegedy et al.2014] [Goodfellow et al.2015]. Adversarial attacks are small, carefully designed perturbations of clean data that significantly change the predictions of target models. This lack of robustness to adversarial attacks raises security concerns about DNNs.
In this paper, we focus on defending DNNs against adversarial attacks on image classification. Many algorithms have been proposed for this purpose. Roughly, they fall into three categories:
data preprocessing, such as JPEG compression [Das et al.2017] and feature squeezing [Xu et al.2018].
adding stochastic components into DNNs to hide gradient information [Athalye et al.2018].
adversarial training [Madry et al.2018].
Data preprocessing and stochastic components are usually combined with adversarial training, since adversarial training is the most successful defense algorithm.
Recent works show that deep neural networks trained on image classification datasets are biased towards textures, which correspond to the high-frequency components of images [Geirhos et al.2019]. Meanwhile, researchers have empirically found that the perturbations generated by adversarial attacks are also high-frequency signals. This suggests that DNNs are mainly fooled by carefully designed textures. Together, these facts suggest that suppressing the high-frequency components of images can reduce the effect of adversarial attacks and improve the robustness of DNNs. On the other hand, the essential information in clean images is retained when high-frequency components are suppressed, because their spectral energy is concentrated at low frequencies. In this paper, we aim to develop a high-frequency suppression module with the following properties:
separability: it should suppress high-frequency components while keeping low-frequency ones.
efficiency: its computational cost should be low compared with that of a standard DNN.
differentiability: it should be differentiable, so that it can be optimized jointly with adversarial training.
controllability: it should be easy to control the degree of high-frequency suppression and how much the original image is modified (e.g., the $\ell_2$ distance between the original and the modified image).
The discrete Fourier transform (DFT), which maps images into the frequency domain, is a natural tool for achieving these goals. Based on the DFT and its inverse, we propose a high-frequency suppression module that has all of these properties. We evaluate our method in the IJCAI-2019 Alibaba Adversarial AI Challenge [AAAC]. Our code is available at https://github.com/zzd1992/Adversarial-Defense-by-Suppressing-High-Frequencies.
2.1 High-frequency suppression
As mentioned earlier, suppressing high-frequency components helps reduce the effect of adversarial attacks and improve the robustness of DNNs. Given an input image, we transform it into the frequency domain via the DFT, attenuate the high-frequency components there, and finally transform the modified spectrum back to the spatial domain.
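Formally, denote $x \in \mathbb{R}^{N \times N}$ as the input image and $z \in \mathbb{C}^{N \times N}$ as its frequency representation:
$$z_{u,v} = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x_{m,n}\, e^{-i 2\pi (um + vn)/N}.$$
To suppress the high-frequency components, we modify $z$ as follows:
$$\hat{z} = z \odot w,$$
where $w \in \mathbb{R}^{N \times N}$ and $\odot$ is element-wise multiplication. $w$ controls how each frequency is scaled. Intuitively, $w$ should be close to $0$ for high-frequency components and close to $1$ for low-frequency ones. In this paper, we set $w$ to a box window with fixed radius $r$. With the spectrum shifted so that the zero frequency lies at the center $(c, c)$, $c = \lfloor N/2 \rfloor$, that is
$$w_{u,v} = \begin{cases} 1, & |u - c| \le r \text{ and } |v - c| \le r, \\ 0, & \text{otherwise.} \end{cases}$$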
Denoting the DFT by $\mathcal{F}$ and its inverse by $\mathcal{F}^{-1}$, the overall function of our high-frequency suppression module is
$$\hat{x} = \mathcal{F}^{-1}\big(\mathcal{F}(x) \odot w\big).$$
An image is processed first by this module and then by a standard DNN.
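The module can be sketched in a few lines of NumPy. This sketch assumes a square single-channel image and a square box window centered on the zero frequency, with the spectrum centered by `fftshift`. The function names `box_window` and `suppress_high_freq` are hypothetical and are not taken from the authors' repository. PyTorch's `torch.fft` module provides the same `fft2`, `ifft2`, `fftshift` and `ifftshift` operations, and using them would make the module differentiable end to end.

```python
import numpy as np


def box_window(n: int, r: int) -> np.ndarray:
    """Square box window w (n x n): 1 within radius r of the centre, 0 elsewhere.

    Assumes a centred spectrum, i.e. the zero frequency sits at (n // 2, n // 2).
    """
    idx = np.abs(np.arange(n) - n // 2)
    inside = (idx[:, None] <= r) & (idx[None, :] <= r)
    return inside.astype(np.float64)


def suppress_high_freq(x: np.ndarray, r: int) -> np.ndarray:
    """Compute x_hat = F^{-1}(F(x) * w) for a square, single-channel image x."""
    n = x.shape[0]
    z = np.fft.fftshift(np.fft.fft2(x))            # centred spectrum z
    z_hat = z * box_window(n, r)                   # element-wise multiplication
    x_hat = np.fft.ifft2(np.fft.ifftshift(z_hat))  # back to the spatial domain
    # The window is symmetric about the zero frequency, so the imaginary part is
    # only floating-point noise for a real input image.
    return x_hat.real
```

With $r \ge N/2$ the window keeps every coefficient and the module is the identity. With $r = 0$ only the zero frequency survives, and the output is the constant mean intensity of the image.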
Now we analyze the properties of our proposed module.
separability: because $w$ is a box window, high-frequency components are completely removed and low-frequency ones are perfectly preserved.
efficiency: the computational cost is dominated by the DFT. For an $N \times N$ image, the DFT computed with the fast Fourier transform takes $O(N^2 \log N)$ time. In practice, computing the DFT of a color image is faster than evaluating a convolutional layer. Thus our module is cheap compared with a DNN.
differentiability: the DFT can be expressed in matrix form:
$$z = A x A,$$
where $A \in \mathbb{C}^{N \times N}$, with $A_{u,m} = e^{-i 2\pi um/N}$, is the so-called Fourier transform matrix. Since the DFT is linear, it is differentiable, and so is its inverse. This property means our module is not limited to image pre-processing: it can be integrated into DNNs and optimized jointly with adversarial training.
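The matrix form can be checked numerically. This check assumes a square image and the unnormalized DFT convention of `numpy.fft.fft2`. With $A_{u,m} = e^{-i2\pi um/N}$, the product $AxA$ reproduces the 2D DFT. Flattening the image shows that the whole map is a single linear operator $A \otimes A$, so its Jacobian is constant. The helper names are hypothetical.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Unnormalised DFT matrix A with A[u, m] = exp(-2j * pi * u * m / n)."""
    u = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(u, u) / n)


def dft2_matrix_form(x: np.ndarray) -> np.ndarray:
    """2D DFT of a square image as two matrix products, z = A x A (A is symmetric)."""
    a = dft_matrix(x.shape[0])
    return a @ x @ a
```

Because $z$ depends linearly on $x$, backpropagating through the module only requires multiplying by the (conjugate) transposes of these constant matrices. FFT routines compute the same products in $O(N^2 \log N)$ time.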
controllability: denote $\hat{x}$ as the output of the proposed module. By Parseval's theorem, we have
$$\|x - \hat{x}\|_2^2 = \frac{1}{N^2}\,\|z - \hat{z}\|_2^2 = \frac{1}{N^2} \sum_{(u,v):\, w_{u,v} = 0} |z_{u,v}|^2.$$
Thus both the degree of high-frequency suppression and the $\ell_2$ distance between the original and the modified image are easily controlled by varying the radius $r$ of the box window. For natural images, spectral energy is concentrated in low-frequency regions. Therefore $\|x - \hat{x}\|_2$ remains small even when most of the frequency components are suppressed (i.e., when $r$ is small).
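The Parseval identity can be verified numerically. This check assumes NumPy's unnormalized DFT (hence the $1/N^2$ factor), a square single-channel image, and a square box window centered on the zero frequency. `box_window` and `distance_two_ways` are hypothetical helper names.

```python
import numpy as np


def box_window(n: int, r: int) -> np.ndarray:
    """Square box window on a centred n x n spectrum (1 inside radius r, else 0)."""
    idx = np.abs(np.arange(n) - n // 2)
    return ((idx[:, None] <= r) & (idx[None, :] <= r)).astype(np.float64)


def distance_two_ways(x: np.ndarray, r: int) -> tuple:
    """Return ||x - x_hat||_2^2 computed in the spatial and in the frequency domain."""
    n = x.shape[0]
    w = box_window(n, r)
    z = np.fft.fftshift(np.fft.fft2(x))
    x_hat = np.fft.ifft2(np.fft.ifftshift(z * w)).real
    spatial = float(np.sum((x - x_hat) ** 2))
    # Parseval with the unnormalised DFT: only suppressed coefficients (w == 0) contribute.
    spectral = float(np.sum(np.abs(z[w == 0]) ** 2) / n**2)
    return spatial, spectral
```

The frequency-domain form makes the effect of $r$ explicit. Shrinking the window moves more coefficients into the $w = 0$ set, and the distance grows by exactly their energy divided by $N^2$.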
2.2 Adversarial training
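The idea of adversarial training is to optimize DNNs with respect to both clean samples and adversarial samples:
$$\min_{\theta}\; \mathbb{E}_{(x, y)} \Big[ \alpha\, L\big(f_\theta(x), y\big) + (1 - \alpha)\, L\big(f_\theta(x'), y\big) \Big],$$
where $f_\theta$ maps an image to classification probabilities, $\theta$ denotes the parameters of $f$, and $L$ is the cross-entropy loss. The adversarial sample $x'$ is obtained by (iterative) projected gradient descent (PGD), and $\alpha \in [0, 1]$ controls the tradeoff between clean samples and adversarial samples.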
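The objective above can be illustrated with a toy NumPy sketch. The sketch assumes a linear softmax classifier $f_\theta(x) = \mathrm{softmax}(Wx + b)$, so that input gradients are analytic, together with an $L_\infty$ PGD attack with step size `step` and budget `eps`, and pixel values in $[0, 1]$. All names and default values are hypothetical. In the actual setting, $f_\theta$ is a DNN preceded by the suppression module, and gradients come from automatic differentiation.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def cross_entropy(x, y, W, b) -> float:
    """L(f(x), y) for a linear softmax classifier f(x) = softmax(W x + b)."""
    return float(-log_softmax(W @ x + b)[y])


def pgd_linf(x, y, W, b, eps=0.03, step=0.01, iters=10):
    """Iterative L-inf PGD: signed gradient ascent on the loss, projected onto the eps-ball."""
    x_adv = x.copy()
    for _ in range(iters):
        p = np.exp(log_softmax(W @ x_adv + b))
        onehot = np.zeros_like(p)
        onehot[y] = 1.0
        grad = W.T @ (p - onehot)                 # dL/dx for a linear softmax model
        x_adv = x_adv + step * np.sign(grad)      # ascent step that increases the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep pixels in a valid range
    return x_adv


def mixed_adv_loss(x, y, W, b, alpha=0.5, **pgd_kwargs):
    """alpha * L(f(x), y) + (1 - alpha) * L(f(x'), y), with x' generated by PGD."""
    x_adv = pgd_linf(x, y, W, b, **pgd_kwargs)
    clean = cross_entropy(x, y, W, b)
    adv = cross_entropy(x_adv, y, W, b)
    return alpha * clean + (1 - alpha) * adv, clean, adv
```

Despite its name, PGD here takes ascent steps on the loss. The "descent" refers to minimizing the negated loss, and the projection keeps $x'$ within the allowed perturbation budget.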
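Recently, [Zhang et al.2019] proposed a novel adversarial training method called TRADES. TRADES is formalized as follows:
$$\min_{\theta}\; \mathbb{E}_{(x, y)} \Big[ L\big(f_\theta(x), y\big) + \beta \max_{\|x' - x\| \le \epsilon} \mathrm{KL}\big(f_\theta(x) \,\|\, f_\theta(x')\big) \Big],$$
where $\beta$ balances accuracy and robustness and $\epsilon$ is the perturbation budget. Instead of minimizing the difference between $f_\theta(x')$ and the true label $y$, TRADES minimizes the difference between $f_\theta(x)$ and $f_\theta(x')$, which encourages the output to be smooth. In this paper, we use TRADES as our adversarial training method because it achieves a better tradeoff between robustness and accuracy. Refer to [Zhang et al.2019] for more information.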
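The TRADES objective for a single example can be sketched as follows. The sketch assumes that the model's softmax outputs for $x$ and for an already-found $x'$ are given as probability vectors. The inner maximization over $x'$ (typically PGD on the KL term) is not shown, and `beta=6.0` is only an illustrative default. The function names are hypothetical.

```python
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) for two probability vectors (clipped to avoid log(0))."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def trades_loss(p_clean: np.ndarray, p_adv: np.ndarray, y: int, beta: float = 6.0) -> float:
    """CE(f(x), y) + beta * KL(f(x) || f(x')), given softmax outputs for x and x'."""
    ce = -np.log(max(p_clean[y], 1e-12))
    return float(ce + beta * kl_divergence(p_clean, p_adv))
```

When $f_\theta(x') = f_\theta(x)$, the regularizer vanishes and the loss reduces to the clean cross-entropy. The KL term penalizes only how much the prediction moves inside the $\epsilon$-ball, which is the smoothness the text refers to.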
Table 1: Ablation study in AAAC. Columns: high-frequency suppression, adversarial training, model ensemble, score.
We first analyze the statistics of clean images and adversarial images in the frequency domain. Then we evaluate the proposed method in the IJCAI-2019 Alibaba Adversarial AI Challenge (AAAC).
In AAAC, models are evaluated on an e-commerce image classification task. The challenge provides a training set of color images from a set of classes and a separate set of test images. Given an image, the score of a defense model is computed from its predicted label $\hat{y}$ according to the challenge rules [AAAC]. The final score is averaged over all images and all black-box attack models. Note that images are resized to a fixed resolution before the score is computed.
We use ResNet-18 as the DNN architecture for all experiments. Our method is implemented with PyTorch.
3.1 Statistics in frequency domain
We analyze the statistics of clean samples and adversarial samples in the frequency domain. Specifically, we study the distribution of the cumulative spectrum energy (CSE) with respect to frequency. Given a 2D signal $z$ in the frequency domain, we define the CSE at radius $r$ as
$$\mathrm{CSE}(r) = \frac{\sum_{u,v} w^{(r)}_{u,v}\, |z_{u,v}|^2}{\sum_{u,v} |z_{u,v}|^2},$$
where $w^{(r)}$ is the box window of radius $r$. We randomly select a subset of images from AAAC, calculate the CSE of each image, and average the results. We also calculate the averaged CSE of the corresponding adversarial perturbations, which are generated by iterative PGD. The results are shown in Fig. 1(a). As we can see, the CSE of clean images is concentrated in low-frequency regions, while the CSE of adversarial perturbations is nearly uniform. Therefore, suppressing the high-frequency components significantly reduces the effect of adversarial attacks while retaining most of the information in clean images. This is the main motivation of our work.
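The CSE curve can be computed with a short NumPy sketch. The sketch assumes the normalized box-window definition, a centered spectrum, and square single-channel inputs. The smooth cosine image and the Gaussian white noise used to exercise it are illustrative stand-ins, not AAAC images or real PGD perturbations. `cse_curve` is a hypothetical name.

```python
import numpy as np


def cse_curve(x: np.ndarray) -> np.ndarray:
    """CSE(r) for r = 0 .. n // 2: fraction of spectral energy inside a centred box of radius r."""
    n = x.shape[0]
    energy = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    idx = np.abs(np.arange(n) - n // 2)
    dist = np.maximum(idx[:, None], idx[None, :])  # Chebyshev distance to the zero frequency
    total = energy.sum()
    return np.array([energy[dist <= r].sum() / total for r in range(n // 2 + 1)])


# Usage: a smooth image versus white noise of the same size.
m = np.arange(32)
smooth = 1.0 + 0.5 * np.cos(2 * np.pi * m / 32)[:, None] * np.ones((1, 32))
noise = np.random.default_rng(5).normal(size=(32, 32))
cse_smooth = cse_curve(smooth)
cse_noise = cse_curve(noise)
```

For white noise the spectrum is flat, so $\mathrm{CSE}(r)$ grows roughly like the window area $((2r+1)/N)^2$. The smooth image instead reaches $1$ almost immediately, which mirrors the contrast between perturbations and clean images in Fig. 1(a).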
We also calculate the CSE for CIFAR-10, as shown in Fig. 1(b). The distribution is similar to that of AAAC.
3.2 AAAC results
As analyzed earlier, removing the high-frequency components makes the model more robust to adversarial attacks but decreases its accuracy on clean images. We evaluate this tradeoff for different radii $r$, without adversarial training. The accuracy is measured on clean validation images, and the robustness is measured by the AAAC score. The results are shown in Fig. 2. As $r$ decreases, the robustness to adversarial attacks increases substantially.
We then perform an ablation study of three strategies and their combinations: 1) the proposed high-frequency suppression module; 2) adversarial training via TRADES; 3) ensembles of models with different radii $r$. As shown in Tab. 1, our proposed module alone even outperforms adversarial training alone in this challenge, and the two methods are complementary. The best score is obtained by ensembling models with different $r$, each of which is trained with both the proposed module and adversarial training. We secured 5th place in this challenge (the score for the 1st solution is ).
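The text does not specify how the ensemble combines its members. A common choice, assumed in the sketch below, is to average the softmax outputs of models that each apply the suppression module with a different radius $r$. Each toy member is a linear softmax classifier standing in for ResNet-18. `make_member`, `ensemble_predict`, and the weights are hypothetical.

```python
from typing import Callable, Sequence

import numpy as np


def suppress_high_freq(x: np.ndarray, r: int) -> np.ndarray:
    """x_hat = F^{-1}(F(x) * w) with a centred square box window of radius r."""
    n = x.shape[0]
    idx = np.abs(np.arange(n) - n // 2)
    w = (np.maximum(idx[:, None], idx[None, :]) <= r).astype(np.float64)
    z = np.fft.fftshift(np.fft.fft2(x))
    return np.fft.ifft2(np.fft.ifftshift(z * w)).real


def make_member(r: int, W: np.ndarray, b: np.ndarray) -> Callable[[np.ndarray], np.ndarray]:
    """Toy ensemble member: suppression with radius r, then a linear softmax classifier."""
    def member(x: np.ndarray) -> np.ndarray:
        logits = W @ suppress_high_freq(x, r).ravel() + b
        e = np.exp(logits - logits.max())
        return e / e.sum()
    return member


def ensemble_predict(x, members: Sequence[Callable]) -> int:
    """Average class probabilities over members (assumed rule) and return the arg-max label."""
    probs = np.mean([m(x) for m in members], axis=0)
    return int(np.argmax(probs))
```

Members with small $r$ see only coarse image structure, while members with large $r$ keep more detail. Averaging their outputs is one plausible way to trade off robustness against clean accuracy.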
4 Conclusions and discussions
Motivated by the difference between the frequency spectra of clean images and adversarial perturbations, we have proposed a high-frequency suppression module to improve the robustness of DNNs. This module is efficient, differentiable, and easy to control. We have evaluated our method in AAAC.
We list several directions and questions that are worth exploring further:
Is it helpful to change the radius of box window dynamically?
Is it helpful to suppress the high-frequency components of intermediate convolutional features?
We evaluated our method on image classification. Does it work for other tasks or other kinds of data, such as speech recognition?
- [AAAC] AAAC. IJCAI-2019 Alibaba Adversarial AI Challenge. https://security.alibaba.com/alibs2019.
- [Athalye et al.2018] Anish Athalye, Nicholas Carlini, and David A Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. International Conference on Machine Learning, pages 274–283, 2018.
- [Das et al.2017] Nilaksh Das, Madhuri Shanbhogue, Shangtse Chen, Fred Hohman, Li Chen, Michael E Kounavis, and Duen Horng Chau. Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression. arXiv: Computer Vision and Pattern Recognition, 2017.
- [Geirhos et al.2019] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. International Conference on Learning Representations, 2019.
- [Goodfellow et al.2015] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2015.
- [Madry et al.2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
- [Szegedy et al.2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J Goodfellow, and Rob Fergus. Intriguing properties of neural networks. International Conference on Learning Representations, 2014.
- [Xu et al.2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. Network and Distributed System Security Symposium, 2018.
- [Zhang et al.2019] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv: Learning, 2019.