Adversarial Defense by Suppressing High-frequency Components
Abstract
Recent works show that deep neural networks trained on image classification datasets are biased towards textures. Such models are easily fooled by small high-frequency perturbations applied to clean images. In this paper, we learn robust image classification models by removing high-frequency components. Specifically, we develop a differentiable high-frequency suppression module based on the discrete Fourier transform (DFT). Combined with adversarial training, our method won 5th place in the IJCAI-2019 Alibaba Adversarial AI Challenge. Our code is available online.
1 Introduction
Deep neural networks (DNNs) have achieved state-of-the-art performance on many tasks, such as image classification. However, DNNs have been shown to be vulnerable to adversarial attacks [Szegedy et al.2014] [Goodfellow et al.2015]. Adversarial attacks are carefully designed small perturbations to clean data which significantly change the predictions of target models. The lack of robustness of DNNs w.r.t. adversarial attacks raises security concerns.
In this paper, we focus on defending DNNs against adversarial attacks for image classification. Many algorithms have been proposed for this purpose. Roughly, they fall into three categories:

data preprocessing, such as JPEG compression [Das et al.2017] and image denoising [Xu et al.2018].

adding stochastic components into DNNs to hide gradient information [Athalye et al.2018].

adversarial training [Madry et al.2018].
Data preprocessing or stochastic components are usually combined with adversarial training, since adversarial training is the most successful defense algorithm.
Recent works show that deep neural networks trained on image classification datasets are biased towards textures, which are the high-frequency components of images [Geirhos et al.2019]. Meanwhile, researchers empirically find that the perturbations generated by adversarial attacks are also high-frequency signals. This means DNNs are mainly fooled by carefully designed textures. These facts suggest that suppressing the high-frequency components of images helps to reduce the effects of adversarial attacks and to improve the robustness of DNNs. On the other hand, the basic information in clean images is retained when high-frequency components are suppressed, because that information is concentrated at low frequencies. In this paper, we aim to develop a high-frequency suppression module which is expected to have the following properties:

separability: it should suppress high-frequency components while keeping low-frequency ones.

efficiency: it should have low computational cost compared with standard DNNs.

differentiability: it should be differentiable, so that it can be optimized jointly with adversarial training.

controllability: it should be easy to control the degree of high-frequency suppression and the degree to which the original images are modified (e.g. the $\ell_2$ distance).
The discrete Fourier transform (DFT), which maps images into the frequency domain, is a good tool to achieve these goals. Based on the (inverse) DFT, we propose a high-frequency suppression module which has all of these properties. We evaluate our method in the IJCAI-2019 Alibaba Adversarial AI Challenge [AAAC]. Our code is available at https://github.com/zzd1992/AdversarialDefensebySuppressingHighFrequencies.
2 Method
2.1 High-frequency suppression
As mentioned earlier, suppressing high-frequency components helps to reduce the effects of adversarial attacks and to improve the robustness of DNNs. Given an input image, we transform it into the frequency domain via the DFT. Then we reduce the high-frequency components in the frequency domain. Finally, we transform the modified frequency representation back to the spatial domain.
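Formally, denote $x$ as the input image of size $H \times W$ and $z$ as its frequency representation:
$$z(u,v) = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} x(h,w)\, e^{-2\pi i \left(\frac{uh}{H} + \frac{vw}{W}\right)}. \tag{1}$$
To suppress the high-frequency components, we modify $z$ as follows:
$$z' = z \odot M, \tag{2}$$
where $M \in [0,1]^{H \times W}$ and $\odot$ is element-wise multiplication. $M$ controls how each frequency is scaled. Intuitively, $M$ should be close to $0$ for high-frequency components and close to $1$ for low-frequency ones. In this paper, we set $M$ to a box window with fixed radius $r$. That is,
$$M(u,v) = \begin{cases} 1, & |u| \le r \text{ and } |v| \le r, \\ 0, & \text{otherwise,} \end{cases} \tag{3}$$
where the frequency indices $u$ and $v$ are taken modulo $H$ and $W$ and centered at zero, so that $(0,0)$ is the zero-frequency component.
To simplify the notation, we set $z = \mathcal{F}(x)$ and $x = \mathcal{F}^{-1}(z)$. The overall function of our high-frequency suppression module is
$$x' = \mathcal{F}^{-1}\bigl(\mathcal{F}(x) \odot M\bigr), \tag{4}$$
where $\mathcal{F}$ means the DFT. An image is processed first by this module and then by a standard DNN.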
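The steps above can be sketched in a few lines. The following is a minimal NumPy sketch, not the authors' PyTorch implementation: it assumes an unnormalized DFT, a box window that keeps every frequency whose wrapped distance to the zero frequency is at most `r` along both axes, and independent processing of each channel. The names `box_mask` and `suppress_high_freq` are hypothetical.

```python
import numpy as np


def box_mask(height, width, r):
    """Box window M in NumPy's unshifted FFT layout.

    Index k along an axis of length n corresponds to frequency k or k - n,
    so min(k, n - k) is its distance to the zero frequency.
    """
    ku = np.arange(height)
    kv = np.arange(width)
    u = np.minimum(ku, height - ku)
    v = np.minimum(kv, width - kv)
    return ((u[:, None] <= r) & (v[None, :] <= r)).astype(float)


def suppress_high_freq(x, r):
    """Return F^{-1}(F(x) * M), applied over the last two axes of x."""
    mask = box_mask(x.shape[-2], x.shape[-1], r)
    z = np.fft.fft2(x)               # DFT over the last two axes
    x_low = np.fft.ifft2(z * mask)   # inverse DFT of the masked spectrum
    # The mask is symmetric under u -> -u, so for a real input the
    # imaginary part of the result is only numerical noise.
    return x_low.real
```

Because `np.fft.fft2` acts on the last two axes, the same function applies to a batch of color images of shape `(B, 3, H, W)`. With `r = 0` only the mean of each channel survives, and once `r` is at least half the image size the image passes through unchanged.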
Now we analyze the properties of our proposed module.
separability: because $M$ is a box window, high-frequency components are completely removed and low-frequency ones are perfectly preserved.
efficiency: the computational cost is dominated by the DFT. For an $N \times N$ image (we suppose $H = W = N$), the time complexity of the DFT is $O(N^2 \log N)$ when it is computed with the fast Fourier transform. In practice, the DFT of a color image is faster than a convolutional layer. Thus the cost of our proposed module is low compared with that of DNNs.
differentiability: for an $N \times N$ image $x$, the DFT can be expressed in matrix form:
$$\mathcal{F}(x) = F x F, \tag{5}$$
where $F \in \mathbb{C}^{N \times N}$, with entries $F_{jk} = e^{-2\pi i jk/N}$, is the so-called Fourier transform matrix. Clearly, the DFT is linear and therefore differentiable. Unlike an image preprocessing method, our module can thus be integrated into DNNs and optimized jointly with adversarial training.
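As a quick sanity check of the matrix form, the sketch below builds the $N \times N$ Fourier transform matrix explicitly and applies it on both sides of an image. It assumes the unnormalized convention used by `numpy.fft`; `dft_matrix` and `dft2_matrix_form` are hypothetical helper names.

```python
import numpy as np


def dft_matrix(n):
    """n x n Fourier transform matrix with entries exp(-2*pi*i*j*k/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def dft2_matrix_form(x):
    """2D DFT of a square image written as two matrix products."""
    f = dft_matrix(x.shape[0])
    # F is symmetric, so F @ x @ F equals F @ x @ F.T
    return f @ x @ f
```

Since the transform is a pair of matrix products, it is linear in the image, which is why gradients can flow through it during training.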
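controllability: denote $x'$ as the output of the proposed module for an $H \times W$ input image $x$ with (unnormalized) DFT $z$, and $M$ as the box window of radius $r$. Based on Parseval's theorem, we have
$$\|x - x'\|_2^2 = \frac{1}{HW}\sum_{u,v}\bigl(1 - M(u,v)\bigr)^2\, |z(u,v)|^2. \tag{6}$$
Thus the degree of high-frequency suppression and the $\ell_2$ distance between the original image and the modified image are easily controlled by varying the radius $r$ of the box window. For natural images, spectral energy is concentrated in low-frequency regions. Thus $\|x - x'\|_2$ is small even when most of the frequency components are suppressed ($r$ is small).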
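The Parseval relation can be checked numerically. The sketch below is an illustration under stated assumptions: it uses NumPy's unnormalized DFT, which gives the factor $1/(HW)$, and a box window defined by the wrapped distance to the zero frequency. The helper names `low_pass` and `distortion_both_sides` are hypothetical.

```python
import numpy as np


def low_pass(x, r):
    """Return the low-passed image, the spectrum of x and the box mask."""
    n_h, n_w = x.shape
    ku, kv = np.arange(n_h), np.arange(n_w)
    u = np.minimum(ku, n_h - ku)   # wrapped distance to zero frequency
    v = np.minimum(kv, n_w - kv)
    mask = ((u[:, None] <= r) & (v[None, :] <= r)).astype(float)
    z = np.fft.fft2(x)
    return np.fft.ifft2(z * mask).real, z, mask


def distortion_both_sides(x, r):
    """Squared l2 distortion measured in the image and in the spectrum."""
    x_low, z, mask = low_pass(x, r)
    spatial = np.sum((x - x_low) ** 2)
    spectral = np.sum((1.0 - mask) ** 2 * np.abs(z) ** 2) / x.size
    return spatial, spectral
```

Both returned values agree up to floating-point error, and the distortion can only shrink as `r` grows, because a larger box removes a subset of the frequencies removed by a smaller one.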
2.2 Adversarial training
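The idea of adversarial training is to optimize DNNs w.r.t. both clean samples and adversarial samples:
$$\min_{\theta}\; \mathbb{E}_{(x,y)}\bigl[\mathcal{L}(f_\theta(x), y) + \lambda\, \mathcal{L}(f_\theta(x_{adv}), y)\bigr], \tag{7}$$
where $f_\theta$ maps an image to classification probabilities, $\theta$ denotes the parameters of $f_\theta$ and $\mathcal{L}$ is the cross-entropy loss. The adversarial sample $x_{adv}$ is obtained by (iterative) projected gradient descent (PGD). $\lambda$ controls the trade-off between clean samples and adversarial samples.
Recently, [Zhang et al.2019] proposed a novel adversarial training method called TRADES. TRADES is formalized as follows:
$$\min_{\theta}\; \mathbb{E}_{(x,y)}\bigl[\mathcal{L}(f_\theta(x), y) + \lambda\, \mathcal{L}(f_\theta(x), f_\theta(x_{adv}))\bigr]. \tag{8}$$
Instead of minimizing the difference between $f_\theta(x_{adv})$ and the true label, TRADES minimizes the difference between $f_\theta(x_{adv})$ and $f_\theta(x)$, which encourages the output to be smooth. In this paper, we use TRADES as the adversarial training method because it achieves a better trade-off between robustness and accuracy. Refer to [Zhang et al.2019] for more information.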
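To make the difference between the two objectives concrete, the following NumPy sketch evaluates both losses from precomputed logits. It is an illustration only: it assumes the form "clean loss plus $\lambda$ times a second term", it measures the TRADES difference with a KL divergence between the clean and adversarial predictions (the choice made in the TRADES paper), and it leaves out the PGD step that produces the adversarial logits. All function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, labels):
    """Mean cross-entropy between predictions and integer labels."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()


def kl_divergence(logits_p, logits_q):
    """Mean KL(p || q) between the two softmax distributions."""
    logp, logq = log_softmax(logits_p), log_softmax(logits_q)
    return (np.exp(logp) * (logp - logq)).sum(axis=-1).mean()


def adversarial_training_loss(logits_clean, logits_adv, labels, lam):
    """Clean loss plus lam times the loss of adversarial samples vs labels."""
    return cross_entropy(logits_clean, labels) + lam * cross_entropy(logits_adv, labels)


def trades_loss(logits_clean, logits_adv, labels, lam):
    """Clean loss plus lam times the clean-vs-adversarial prediction gap."""
    return cross_entropy(logits_clean, labels) + lam * kl_divergence(logits_clean, logits_adv)
```

When the adversarial logits equal the clean ones, the TRADES term vanishes and only the clean cross-entropy remains, whereas the standard objective still counts the label loss a second time.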
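Table 1: Ablation study on AAAC.
High-frequency suppression | Adversarial training | Model ensemble | Score
no | no | no | 2.0350
no | yes | no | 9.9880
yes | no | no | 14.9736
yes | yes | no | 19.0510
yes | yes | yes | 19.7531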
3 Experiments
We first analyze the statistics of clean images and adversarial images in the frequency domain. Then we evaluate the proposed method in the IJCAI-2019 Alibaba Adversarial AI Challenge (AAAC).
In AAAC, models are evaluated on an image classification task for e-commerce. There are about color images from classes for training. There are images for test. Given an image, the score of a defense model is calculated as follows:
(9) 
where is the predicted label. The final score is averaged over all images and all black-box attack models. Note that before computing the score, images are resized to .
We use ResNet-18 as the DNN architecture for all experiments. Our method is implemented in PyTorch.
3.1 Statistics in frequency domain
We analyze the statistics of clean samples and adversarial samples in the frequency domain. Specifically, we study the distribution of the cumulative spectrum energy (CSE) w.r.t. frequency. Given a 2D signal in the frequency domain, we define the CSE as follows:
(10) 
where . We randomly select images from AAAC. We calculate the CSE score of each image and average all scores. We also calculate the averaged CSE score for the corresponding adversarial perturbations, which are generated by iterative PGD. The results are shown in Fig. 1(a). As we can see, the CSE of clean images is concentrated in low-frequency regions, while the CSE of adversarial perturbations is nearly uniform. Thus, when we suppress the high-frequency components, the effects of adversarial attacks are significantly reduced while most of the information in clean images is retained. This is the main motivation of our work.
We also calculate the CSE score for CIFAR-10, as shown in Fig. 1(b). The distribution is similar to that of AAAC.
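A cumulative spectrum curve of this kind can be computed as in the sketch below. This is one plausible reading rather than the exact definition used for Fig. 1: it assumes that the CSE at radius $r$ is the fraction of squared spectral magnitude inside the box window of radius $r$, so both the squared magnitude and the normalization to 1 are assumptions. The function name is hypothetical.

```python
import numpy as np


def cumulative_spectrum_energy(x):
    """Fraction of spectral energy inside the box of radius r, for every r."""
    n_h, n_w = x.shape
    energy = np.abs(np.fft.fft2(x)) ** 2
    ku, kv = np.arange(n_h), np.arange(n_w)
    u = np.minimum(ku, n_h - ku)   # wrapped distance to zero frequency
    v = np.minimum(kv, n_w - kv)
    # Smallest box radius that contains each frequency bin
    radius = np.maximum(u[:, None], v[None, :])
    per_radius = np.bincount(radius.ravel(), weights=energy.ravel())
    return np.cumsum(per_radius) / energy.sum()
```

For a constant image the curve is 1 from radius 0 onwards, while a single-pixel impulse, whose spectrum is perfectly flat, gives a curve that grows with the area of the box. These two extremes mirror the contrast between clean images and adversarial perturbations described above.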
3.2 AAAC results
As analyzed earlier, when we remove the high-frequency components, the model becomes more robust w.r.t. adversarial attacks while the accuracy on clean images decreases. We evaluate this trade-off with different $r$, without adversarial training. The accuracy is measured on clean validation images and the robustness is measured by the AAAC score. We show the results in Fig. 2. As $r$ decreases, the robustness w.r.t. adversarial attacks increases substantially.
Then we conduct an ablation study of three strategies and their combinations: 1) the proposed high-frequency suppression module; 2) adversarial training via TRADES; 3) ensembles of models with different $r$. As we can see in Tab. 1, our proposed module alone scores higher than adversarial training alone in this challenge, and the two methods are complementary. The best score is obtained by ensembling models with different $r$, each of which is trained with both the proposed module and adversarial training. We secured 5th place in this challenge (the score for the 1st solution is ).
4 Conclusions and discussions
Motivated by the difference in frequency spectrum distributions between clean images and adversarial perturbations, we have proposed a high-frequency suppression module to improve the robustness of DNNs. This module is efficient, differentiable and easy to control. We have evaluated our method in AAAC.
We list several directions or questions which are worth exploring further:

Is it helpful to change the radius of the box window dynamically?

Is it helpful to suppress the high-frequency components of intermediate convolutional features?

We evaluate our method for image classification. Does this method work for other tasks or other kinds of data, such as speech recognition?
References
 [AAAC] AAAC. IJCAI-2019 Alibaba Adversarial AI Challenge. https://security.alibaba.com/alibs2019.
 [Athalye et al.2018] Anish Athalye, Nicholas Carlini, and David A Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. International Conference on Machine Learning, pages 274–283, 2018.
 [Das et al.2017] Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Li Chen, Michael E Kounavis, and Duen Horng Chau. Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression. arXiv: Computer Vision and Pattern Recognition, 2017.
 [Geirhos et al.2019] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. International Conference on Learning Representations, 2019.
 [Goodfellow et al.2015] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2015.
 [Madry et al.2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
 [Szegedy et al.2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J Goodfellow, and Rob Fergus. Intriguing properties of neural networks. International Conference on Learning Representations, 2014.
 [Xu et al.2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. Network and Distributed System Security Symposium, 2018.
 [Zhang et al.2019] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv: Learning, 2019.