Bandlimiting Neural Networks Against Adversarial Attacks
Abstract
In this paper, we study the adversarial attack and defence problem in deep learning from the perspective of Fourier analysis. We first explicitly compute the Fourier transform of deep ReLU neural networks and show that there exist decaying but nonzero high-frequency components in the Fourier spectrum of neural networks. We demonstrate that the vulnerability of neural networks to adversarial samples can be attributed to these insignificant but nonzero high-frequency components. Based on this analysis, we propose a simple post-averaging technique that smooths out these high-frequency components to improve the robustness of neural networks against adversarial attacks. Experimental results on the ImageNet dataset show that our proposed method is universally effective in defending against many existing adversarial attacking methods proposed in the literature, including the FGSM, PGD, DeepFool and C&W attacks. Our post-averaging method is simple since it does not require any retraining; meanwhile, it can successfully defend over 95% of the adversarial samples generated by these methods without introducing any significant performance degradation (less than 1%) on the original clean images.
1 Introduction
Although deep neural networks (DNNs) have been shown to be powerful in many machine learning tasks, researchers (Szegedy et al., 2013) found that they are vulnerable to adversarial samples. Adversarial samples are subtly altered inputs that can fool a trained model into producing erroneous outputs. They are most commonly seen in image classification tasks, where the perturbations to the original images are typically so small that they are imperceptible to the human eye.
Research on adversarial attacks and defences has been highly active in recent years. On the attack side, many attacking methods have been proposed (Szegedy et al., 2013; Goodfellow et al., 2014; Papernot et al., 2016a, 2017; Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Madry et al., 2017; Carlini and Wagner, 2017a; Chen et al., 2017; Alzantot et al., 2018; Brendel et al., 2017), with various ways to generate effective adversarial samples that circumvent newly proposed defence methods. However, since different attacks may be more effective against different defences or datasets, there is no consensus on which attack is the strongest. Hence, for the sake of simplicity, in this work we evaluate our proposed defence approach against four popular and relatively strong attacks for empirical analysis. On the defence side, various defence mechanisms have also been proposed, including adversarial training (Rozsa et al., 2016; Kurakin et al., 2016; Tramèr et al., 2017; Madry et al., 2017), network distillation (Papernot et al., 2016b), gradient masking (Nguyen and Sinha, 2017), adversarial detection (Feinman et al., 2017) and adding modifications to neural networks (Xie et al., 2017). Nonetheless, many of them were quickly defeated by new types of attacks (Carlini and Wagner, 2016, 2017b, 2017c, 2017a; Athalye et al., 2018; Athalye and Carlini, 2018; Alzantot et al., 2018). Madry et al. (2017) tried to provide a theoretical security guarantee for adversarial training through a min-max loss formulation, but the difficulties of non-convex optimization and of finding the ultimate adversarial samples for training may loosen this robustness guarantee. As a result, so far there is no defence that is universally robust to all adversarial attacks.
Along this line of research, there have also been investigations into the properties and existence of adversarial samples. Szegedy et al. (2013) first observed the transferability of adversarial samples across models trained with different hyper-parameters and across different training sets. They also attributed adversarial samples to low-probability blind spots in the manifold. Goodfellow et al. (2014) explained adversarial samples as "a result of models being too linear, rather than too nonlinear." In a later paper, Papernot et al. (2016) showed that the transferability occurs across models with different structures and even across machine learning techniques other than neural networks. In summary, the general existence and transferability of adversarial samples are well known, but the reason for adversarial vulnerability still needs further investigation.
In general, the observation that some small imperceptible perturbations in the inputs of neural networks lead to huge unexpected fluctuations in the outputs must correspond to high-frequency components in the Fourier spectrum of neural networks. In this paper, we start with the Fourier analysis of neural networks and elucidate why there always exist some decaying but nonzero high-frequency response components in neural networks. Based on this analysis, we show that neural networks are inherently vulnerable to adversarial samples due to the underlying model structure, and why simple parameter regularization fails to solve this problem. Next, we propose a simple post-averaging method to tackle this problem. Our proposed method is fairly simple since it works as a post-processing stage for any given neural network model and does not require retraining the neural network at all. Furthermore, we have evaluated the post-averaging method against four popular adversarial attacking methods, and our method is shown to be universally effective in defending against all examined attacks. Experimental results on the ImageNet dataset show that our simple post-averaging method can successfully defend over 95% of the adversarial samples generated by these attacks with little performance degradation (less than 1%) on the original clean images.
2 Fourier analysis of neural networks
In order to understand the behaviour of adversarial samples, it is essential to find the Fourier transform of neural networks. Fortunately, for some widely used neural networks, namely fully-connected neural networks using ReLU activation functions, we may explicitly derive their Fourier transform under some minor conditions. As we will show, these theoretical results shed light on how adversarial samples arise in neural networks.
2.1 Fourier transform of fully-connected ReLU neural networks
As we know, any fully-connected ReLU neural network (prior to the softmax layer) essentially forms a piecewise linear function in the input space. Due to space limits, we only present the main results in this section; the proofs and more details may be found in the Appendix.
Definition 2.1.
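A piecewise linear function is a continuous function $f: \mathbb{R}^n \to \mathbb{R}$ such that there are some hyperplanes passing through the origin and dividing $\mathbb{R}^n$ into $M$ pairwise disjoint regions $\mathcal{R}_m$, $m = 1, 2, \dots, M$, on each of which $f$ is linear:
$$ f(x) = w_m \cdot x \quad \text{for } x \in \mathcal{R}_m . $$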
Lemma 2.2.
Composition of a piecewise linear function with a ReLU activation function is also a piecewise linear function.
Theorem 2.3.
The output of any hidden unit in an unbiased fully-connected ReLU neural network is a piecewise linear function.
This is straightforward because the input to any hidden node is a linear combination of piecewise linear functions, and this input is composed with the ReLU activation function to yield the output, which is also piecewise linear. However, each region is the intersection of a different number of half-spaces, enclosed by various hyperplanes in $\mathbb{R}^n$. In general, these regions do not have simple shapes. For the purpose of mathematical analysis, we need to decompose each region into a union of some well-defined shapes having a uniform form, each of which is called an infinite simplex.
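One easily checked consequence of Theorem 2.3 is that, because the dividing hyperplanes pass through the origin, an unbiased ReLU network is positively homogeneous: scaling the input by any positive factor scales the output by the same factor. The following is a minimal sketch of that check; the layer widths, random weights and helper names are hypothetical and chosen only for illustration.

```python
import random

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, t) for t in v]

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * t for w, t in zip(row, x)) for row in W]

def unbiased_relu_net(weights, x):
    """Forward pass of a bias-free fully-connected ReLU network (pre-softmax)."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)

rng = random.Random(0)
sizes = [4, 8, 8, 3]  # hypothetical layer widths
weights = [[[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes, sizes[1:])]

x = [rng.uniform(-1, 1) for _ in range(sizes[0])]
alpha = 2.5  # any positive scale keeps x inside the same cone-shaped region

y = unbiased_relu_net(weights, x)
y_scaled = unbiased_relu_net(weights, [alpha * t for t in x])

# f(alpha * x) should equal alpha * f(x) up to floating-point error.
max_gap = max(abs(a - alpha * b) for a, b in zip(y_scaled, y))
```

Here `max_gap` stays at the level of floating-point rounding, which is what one would expect if each linear region is a cone with its apex at the origin.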
Definition 2.4.
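Let $\mathbf{V} = \{v_1, v_2, \dots, v_n\}$ be a set of $n$ linearly independent vectors in $\mathbb{R}^n$. An infinite simplex, $\mathcal{R}^+_{\mathbf{V}}$, is defined as the region linearly spanned by $\mathbf{V}$ using only positive weights:
$$ \mathcal{R}^+_{\mathbf{V}} = \Big\{ \sum_{k=1}^{n} \alpha_k v_k \;\Big|\; \alpha_k > 0,\ k = 1, \dots, n \Big\} \qquad (1) $$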
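Definition 2.4 suggests a direct membership test: a point lies in the infinite simplex exactly when its coordinates with respect to the spanning vectors are all positive. The sketch below solves the corresponding linear system with plain Gaussian elimination; the function names and the two-dimensional example vectors are hypothetical.

```python
def solve(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][j] * a[j] for j in range(r + 1, n))) / M[r][r]
    return a

def in_infinite_simplex(vectors, x):
    """True if x is a combination of the given vectors with all weights > 0."""
    n = len(x)
    # Column k of A is the k-th spanning vector, so A @ alpha = x.
    A = [[vectors[k][i] for k in range(n)] for i in range(n)]
    return all(alpha > 0 for alpha in solve(A, x))

vectors = [[1.0, 0.0], [1.0, 1.0]]              # hypothetical spanning set in R^2
inside = in_infinite_simplex(vectors, [2.0, 1.0])    # = 1*v1 + 1*v2
outside = in_infinite_simplex(vectors, [0.0, 1.0])   # = -1*v1 + 1*v2
```

Because the weights are only required to be positive, any positive rescaling of a point inside the region stays inside, which is why the region is unbounded.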
Theorem 2.5.
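Each piecewise linear function $f(x)$ can be formulated as a summation of some simpler functions, $f(x) = \sum_{l} f_l(x)$, each of which is linear and nonzero only in an infinite simplex as follows:
$$ f_l(x) = \begin{cases} w_l \cdot x & x \in \mathcal{R}^+_{\mathbf{V}_l} \\ 0 & \text{otherwise} \end{cases} \qquad (2) $$
where $\mathbf{V}_l$ is a set of $n$ linearly independent vectors, and $w_l$ is a weight vector.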
In practice, we can always assume that the input to neural networks, , is bounded. As a result, for computational convenience, we may normalize all inputs into the unit hypercube, . Obviously, this assumption can be easily incorporated into the above analysis by multiplying each in eq.(2) by where is the Heaviside step function. Alternatively, we may simplify this term by adding additional hyperplanes to further split the input space to ensure all the elements of do not change signs within each region . In this case, within each region , the largest absolute value among all elements of is always achieved by a specific element, which is denoted as . In other words, the dimension achieves the largest absolute value inside . Similarly, the normalized piecewise linear function may be represented as a summation of some functions: , where each has the following form:
For every , there exists an invertible matrix to linearly transform all vectors of into standard basis vectors in . As a result, each function may be represented in terms of standard bases as follows:
where , and .
Lemma 2.6.
Fourier transform of the following function:
may be presented as:
(3) 
where is the th component of frequency vector , and .
Finally we derive the Fourier transform of fullyconnected ReLU neural networks as follows.
Theorem 2.7.
The Fourier transform of the output of any hidden node in a fully-connected unbiased ReLU neural network may be represented as , where denotes the differential operator. (Footnote 1: For mathematical convenience, we assume neural networks have no biases here. However, regular neural networks with biases may be reformulated as unbiased ones by adding another dimension of constants. Thus, the main results here are equally applicable to both cases. Note that regular neural networks with biases are used in our experiments in this paper.)
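The remark that biased networks can be reformulated as unbiased ones by adding a constant input dimension can be illustrated with a small sketch. The two-layer network, its weights and the helper `absorb_bias` below are hypothetical; the construction appends each bias as an extra weight column acting on a constant input of 1 and passes that constant on to the next layer.

```python
def relu(v):
    return [max(0.0, t) for t in v]

def matvec(W, x):
    return [sum(w * t for w, t in zip(row, x)) for row in W]

def absorb_bias(W, b, keep_constant=True):
    """Append each bias as an extra column that multiplies a constant input of 1.

    With keep_constant=True an extra row copies the constant to the next layer
    (the ReLU leaves it unchanged because relu(1) == 1).
    """
    rows = [list(row) + [bi] for row, bi in zip(W, b)]
    if keep_constant:
        rows.append([0.0] * len(W[0]) + [1.0])
    return rows

# Hypothetical two-layer network with biases.
W1, b1 = [[1.0, -2.0], [0.5, 0.3], [-1.0, 1.0]], [0.1, -0.2, 0.05]
W2, b2 = [[0.7, -0.4, 1.2]], [0.3]
x = [0.4, -0.9]

# Ordinary forward pass with biases.
hidden = relu([s + c for s, c in zip(matvec(W1, x), b1)])
biased_out = [s + c for s, c in zip(matvec(W2, hidden), b2)]

# Equivalent bias-free forward pass on the augmented input [x, 1].
A1 = absorb_bias(W1, b1)                        # last row passes the constant on
A2 = absorb_bias(W2, b2, keep_constant=False)   # output layer needs no constant
unbiased_out = matvec(A2, relu(matvec(A1, x + [1.0])))
```

Both forward passes produce the same output, so results stated for unbiased networks carry over at the cost of one extra input dimension.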
Obviously, neural networks are the so-called approximated bandlimited models as defined in (Jiang, 2019), which have decaying high-frequency components in the Fourier spectrum. Theorem 2.7 further shows how the transformation matrices of the individual regions contribute to the high-frequency components when the corresponding regions are too small. This is clear since the determinant of each such matrix is proportional to the volume of the corresponding region. As we will show later, these small regions may be explicitly exploited to generate adversarial samples for neural networks.
2.2 Understanding adversarial samples
As shown in Theorem 2.3, neural network may be viewed as a sequential division of the input space into many small regions, as illustrated in Figure 1. Each layer is a further division of the existing regions from the previous layers, with each region being divided differently. Hence a neural network with multiple layers would result in a tremendous amount of subregions in the input space. For example, when cutting an dimensional space using hyperplanes, the maximum number of regions may be computed as . For a hidden layer of nodes and input dimension is , the maximum number of regions is roughly equal to . In other words, even a middlesized neural network can partition input space into a huge number of subregions, which can easily exceed the total number of atoms in the universe. When we learn a neural network, we can not expect there is at least one sample inside each region. For those regions that do not have any training sample, the resultant linear functions in them may be arbitrary since they do not contribute to the training objective function at all. Of course, most of these regions are extremely small in size. When we measure the expected loss function over the entire space, their contributions are negligible since the chance for a randomly sampled point to fall into these tiny regions is extremely small. However, adversarial attacking is imposing a new challenge since adversarial samples are not naturally sampled. Given that the total number of regions is huge, those tiny regions are almost everywhere in the input space. For any data point in the input space, we almost surely can find such a tiny region in proximity where the linear function is arbitrary. If a point inside this tiny region is selected, the output of the neural network may be unexpected. These tiny regions are the fundamental reason why neural networks are vulnerable to adversarial samples.
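To get a feel for how quickly the number of regions grows, the sketch below evaluates the standard upper bound on the number of regions that $N$ hyperplanes in general position can cut out of an $n$-dimensional space, $\sum_{k=0}^{n} \binom{N}{k}$. The layer width and input dimension used here are hypothetical values chosen for illustration, and $10^{80}$ is only a commonly quoted rough estimate of the number of atoms in the observable universe.

```python
from math import comb

def max_regions(num_hyperplanes, dim):
    """Upper bound on regions cut by hyperplanes in general position in R^dim."""
    return sum(comb(num_hyperplanes, k) for k in range(dim + 1))

# Sanity check: three lines in the plane create at most 7 regions.
small_case = max_regions(3, 2)

# Hypothetical middle-sized layer: 1000 hidden nodes, 200 input dimensions.
big_case = max_regions(1000, 200)
digits = len(str(big_case))        # number of decimal digits in the bound
exceeds_atoms = big_case > 10**80  # rough atom-count estimate (assumption)
```

Python integers are arbitrary precision, so the bound is computed exactly rather than approximated.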
In layered deep neural networks, the linear functions in all regions are not totally independent. If we use to denote the weight matrix in layer , the resultant linear weight in eq.(2) is actually the sum of all concatenated along all active paths. When we make a small perturbation to any input , the fluctuation in the output of any hidden node can be approximated represented as:
(4) 
where denotes the total number of hyperplanes to be crossed when moving to . In any practical neural network, we normally have at least tens of thousands of hyperplanes crossing the hypercube . In other words, for any input in a highdimensional space, we can always move it to cross a large number of hyperplanes to enter a tiny region. When is fairly large, the above equation indicates that the output of a neural network can still fluctuate dramatically even after all weight vectors are regularized by or norm.
Finally, we believe the existence of many unlearned tiny regions is an intrinsic property of neural networks given their current model structure. Therefore, simple retraining strategies or small structural modifications will not be able to completely get rid of adversarial samples. In principle, neural networks must be strictly bandlimited to filter out those decaying high-frequency components in order to completely eliminate all adversarial samples. More research effort is needed to figure out how to do this effectively and efficiently for neural networks.
3 The proposed defence approach: post-averaging
3.1 Post-averaging
In this paper, we propose a simple post-processing method to smooth out those high-frequency components as much as possible, which relies on a simple idea similar to the moving average in one-dimensional sequential data. Instead of generating a prediction merely from one data point, we use the averaged value within a small neighborhood around the data point, which is called post-averaging here. Mathematically, post-averaging is computed as an integral over a small neighborhood centered at the input:
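$$ f_C(x) = \frac{1}{V_C} \int_{x' \in C} f(x + x')\, dx' \qquad (5) $$
where $x$ is the input and $f(x)$ represents the output of the neural network, $C$ denotes a small neighborhood centered at the origin, and $V_C$ denotes its volume. When we choose $C$ to be an $n$-sphere in $\mathbb{R}^n$ of radius $r$, we may derive the Fourier transform of $f_C(x)$ as follows (written here for the angular-frequency convention $F(\omega) = \int f(x)\, e^{-i \omega \cdot x}\, dx$):
$$ F_C(\omega) = F(\omega)\, \Gamma\!\left(\tfrac{n}{2} + 1\right) \left( \frac{2}{r |\omega|} \right)^{n/2} J_{n/2}(r |\omega|) \qquad (6) $$
where $J_{n/2}(\cdot)$ is the Bessel function of the first kind of order $n/2$. Since the Bessel functions $J_\nu(z)$ decay at rate $1/\sqrt{z}$ as $z \to \infty$ (Watson, 1995), we have $F_C(\omega) \sim F(\omega) / (r |\omega|)^{(n+1)/2}$ as $|\omega| \to \infty$. Therefore, if $r$ is chosen properly, the post-averaging operation can significantly bandlimit neural networks by smoothing out high-frequency components. Note that similar ideas have been used in (Jiang et al., 1999; Jiang and Lee, 2003) to improve robustness in speech recognition.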
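The smoothing effect is easiest to see in one dimension, where averaging over the interval $[x - r, x + r]$ multiplies a sinusoid of angular frequency $\omega$ by $\sin(\omega r) / (\omega r)$, a factor that shrinks as $\omega$ grows. The sketch below checks this numerically with a midpoint rule; the radius, evaluation point and frequencies are hypothetical values chosen for illustration.

```python
import math

def post_average_1d(f, x, r, num=4000):
    """Midpoint-rule estimate of (1 / 2r) * integral of f over [x - r, x + r]."""
    h = 2.0 * r / num
    return sum(f(x - r + (i + 0.5) * h) for i in range(num)) / num

r = 0.5   # hypothetical averaging radius
x = 0.3   # hypothetical evaluation point

gains = {}
for omega in (1.0, 10.0, 40.0):
    original = math.cos(omega * x)
    averaged = post_average_1d(lambda t: math.cos(omega * t), x, r)
    # Ratio of averaged to original value; theory predicts sin(omega r) / (omega r).
    gains[omega] = averaged / original

predicted = {omega: math.sin(omega * r) / (omega * r) for omega in gains}
```

The measured gains track the predicted values, and the magnitude of the gain at $\omega = 40$ is far smaller than at $\omega = 1$, which is the low-pass behaviour that post-averaging relies on.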
3.2 Sampling methods
However, it is intractable to compute the above integral for any meaningful neural network used in practical applications. In this work, we propose to use a simple numerical method to approximate it. For any input $x$, we select $S$ points $\{x_1, x_2, \dots, x_S\}$ in the neighborhood centered at $x$, i.e. $x_s - x \in C$, and the integral is approximately computed as
$$ f_C(x) \approx \frac{1}{S} \sum_{s=1}^{S} f(x_s) \qquad (7) $$
Obviously, in order to defend against adversarial samples, it is important to have samples outside the current unlearned tiny region. In the following, we use a simple sampling strategy based on directional vectors. To generate a relatively even set of samples for eq.(7), we first determine some directional vectors $\hat{d}$, and then move the input $x$ along these directions using several step sizes within the sphere of radius $r$:
$$ x' = x + \lambda\, \hat{d} \qquad (8) $$
where $\lambda$ is a step size with $|\lambda| \le r$, and $\hat{d}$ is a selected unit-length directional vector. For each selected direction, we generate six samples within $C$ along both the positive and the negative directions to ensure efficiency and even sampling. Here, we propose two different methods to sample directional vectors:

random: Random sampling is the simplest and most efficient method that one can come up with. We fill the directional vectors with random numbers drawn from a standard normal distribution, and then normalize each vector to have unit length.

approx: Instead of using random directions, it would be much more efficient to move out of the original region if we use the normal directions of the closest hyperplanes. In ReLU neural networks, each hidden node represents a hyperplane in the input space. For any input $x$, the distance to each hyperplane may be computed as $d_k = |f_k(x)| / \|s_k\|$, where $f_k(x)$ denotes the output of the corresponding hidden node and $s_k = \nabla_x f_k(x)$. Based on the distances computed for all hidden nodes, we can select the normal directions of the closest hyperplanes. However, computing the exact distances is computationally expensive as it requires backpropagation for all hidden nodes. In our implementation, we simply estimate relative distances among all hidden units in the same layer using the weight matrix of this layer, and select some of the closest hidden units in each layer based on these relative distances. In this way, we only need to backpropagate for the selected units. We refer to this implementation as "approx" in the experimental results.
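The random sampling strategy and the sample average of eq.(7) can be sketched as follows. This is only an illustration under stated assumptions: the step sizes are taken to be equally spaced (three on each side of the input, six per direction), the input itself is included among the samples, and `toy_model` is a hypothetical stand-in for a real network. All function names are invented for this sketch.

```python
import math
import random

def random_unit_directions(num_dirs, dim, rng):
    """Draw directions from a standard normal and normalise them to unit length."""
    dirs = []
    for _ in range(num_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in d))
        dirs.append([t / norm for t in d])
    return dirs

def sample_points(x, directions, radius, steps_per_side=3):
    """Input plus points x + lam * d, with lam = +/- radius * j / steps_per_side.

    The equal spacing of the step sizes is an assumption of this sketch.
    """
    lambdas = [sign * radius * j / steps_per_side
               for j in range(1, steps_per_side + 1) for sign in (1.0, -1.0)]
    points = [list(x)]
    for d in directions:
        for lam in lambdas:
            points.append([xi + lam * di for xi, di in zip(x, d)])
    return points

def post_average(f, x, directions, radius):
    """Average the model output over the sampled neighborhood, as in eq.(7)."""
    outputs = [f(p) for p in sample_points(x, directions, radius)]
    return [sum(col) / len(outputs) for col in zip(*outputs)]

rng = random.Random(0)
x = [0.2, -0.1, 0.7, 0.4]
dirs = random_unit_directions(60, len(x), rng)

def toy_model(p):
    """Hypothetical linear stand-in for a network's output vector."""
    return [sum(p), p[0] - 2.0 * p[1]]

points = sample_points(x, dirs, radius=0.3)
smoothed = post_average(toy_model, x, dirs, radius=0.3)
```

For a linear stand-in model the symmetric samples cancel and the smoothed output equals the original output, which is a convenient sanity check; for a real network the samples could be stacked into a single mini-batch and evaluated in one forward pass.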
4 Experiments
In this section, we evaluate the above post-averaging method as a defence against several popular adversarial attacking methods on the challenging ImageNet task.
4.1 Experimental setup

Dataset: Since our proposed post-averaging method does not need to retrain neural networks, we do not need any training data in our experiments. For evaluation purposes, we use the validation set of the ImageNet task (Russakovsky et al., 2015), which consists of 50,000 images labelled into 1,000 categories. Following the settings in (Prakash et al., 2018; Liao et al., 2017; Xie et al., 2017; Athalye et al., 2018), for computational efficiency, we randomly choose 5,000 images from the ImageNet validation set and evaluate our approach on these 5,000 images.

Target model: We use a pretrained VGG16 network (Simonyan and Zisserman, 2014) with batch normalization that is available from PyTorch. In our experiments, we directly use this pretrained model without any modification.

Source of adversarial attacking methods: We use Foolbox (Rauber et al., 2017), an open-source toolbox, to generate adversarial samples using different adversarial attacking methods. In this work, we have chosen four of the most popular attacking methods used in the literature: the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), the Projected Gradient Descent (PGD) method (Kurakin et al., 2016; Madry et al., 2017), the DeepFool (DF) attack method (Moosavi-Dezfooli et al., 2016) and the Carlini & Wagner (C&W) L2 attack method (Carlini and Wagner, 2017a).
4.2 Evaluation criteria
For each experiment, we define:

Clean set: The dataset that consists of the 5000 images randomly sampled from ImageNet.

Attacked set: For every correctly classified image in the Clean set, if an adversarial sample is successfully generated under the attacking criteria, the original sample is replaced with the adversarial sample; if no adversarial sample is found, the original sample is kept in the dataset. Meanwhile, all the misclassified images are kept in the dataset without any change. Therefore the dataset also has 5000 images.
In our experiments, we evaluate the original model and the model defended using post-averaging on both the Clean and the Attacked sets. The performance is measured in terms of:

Accuracy: the number of correctly classified images divided by the total number of images in the dataset.

Defence rate: the number of successfully defended adversarial samples divided by the total number of adversarial samples in the Attacked set. An adversarial sample counts as "successfully defended" if it is correctly classified after the original model is defended by the post-averaging approach.
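The two metrics defined above can be written down in a few lines. The sketch below uses a tiny made-up Attacked set of five images purely to show the bookkeeping; the labels, flags and predictions are hypothetical and are not taken from the experiments.

```python
def accuracy(predictions, labels):
    """Fraction of images whose predicted class equals the true label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def defence_rate(is_adversarial, defended_predictions, labels):
    """Fraction of adversarial samples that the defended model classifies correctly."""
    outcomes = [p == y
                for adv, p, y in zip(is_adversarial, defended_predictions, labels)
                if adv]
    if not outcomes:
        raise ValueError("the Attacked set contains no adversarial samples")
    return sum(outcomes) / len(outcomes)

# Hypothetical Attacked set: entries flagged True were replaced by adversarial samples.
labels               = [0, 1, 2, 3, 4]
is_adversarial       = [True, True, False, True, False]
defended_predictions = [0, 1, 2, 0, 3]   # outputs of the defended model

attacked_accuracy = accuracy(defended_predictions, labels)
rate = defence_rate(is_adversarial, defended_predictions, labels)
```

Note that accuracy is computed over the whole set, while the defence rate only looks at the entries that were actually replaced by adversarial samples.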
4.3 Experimental results on top-1-miss criterion
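Table 1: Performance of post-averaging against different attacking methods under the top-1-miss criterion. All accuracy columns report top-1 accuracy.
attack, defence | Original, Clean | Original, Attacked | Defended, Clean | Defended, Attacked | Defence Rate | #Adv
FGSM, random    | 0.7252 | 0.0224 | 0.7192 | 0.6958 | 0.9363 | 3514
FGSM, approx    | 0.7252 | 0.0224 | 0.6786 | 0.6388 | 0.8372 | 3514
PGD, random     | 0.7252 | 0.0010 | 0.7190 | 0.7048 | 0.9508 | 3621
PGD, approx     | 0.7252 | 0.0010 | 0.6786 | 0.6540 | 0.8630 | 3621
DF, random      | 0.7252 | 0.0120 | 0.7180 | 0.7052 | 0.9521 | 3571
DF, approx      | 0.7252 | 0.0120 | 0.6786 | 0.6578 | 0.8681 | 3571
C&W, random     | 0.7252 | 0.0012 | 0.7188 | 0.7064 | 0.9533 | 3620
C&W, approx     | 0.7252 | 0.0012 | 0.6786 | 0.6600 | 0.8713 | 3620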
In the experiments reported in this subsection, we generated adversarial samples based on the top-1-miss criterion, which defines adversarial samples as images whose predicted classes are not the same as their true labels.
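Table 2: Performance of post-averaging against the FGSM attack for different neighborhood radii r. All accuracy columns report top-1 accuracy.
attack, defence      | Original, Clean | Original, Attacked | Defended, Clean | Defended, Attacked | Defence Rate | #Adv
FGSM, random (r=4)   | 0.7252 | 0.0224 | 0.7238 | 0.4888 | 0.6614 | 3514
FGSM, random (r=15)  | 0.7252 | 0.0224 | 0.7242 | 0.6860 | 0.9303 | 3514
FGSM, random (r=30)  | 0.7252 | 0.0224 | 0.7192 | 0.6958 | 0.9363 | 3514
FGSM, approx (r=4)   | 0.7252 | 0.0224 | 0.7246 | 0.6534 | 0.8739 | 3514
FGSM, approx (r=15)  | 0.7252 | 0.0224 | 0.7056 | 0.6534 | 0.8617 | 3514
FGSM, approx (r=30)  | 0.7252 | 0.0224 | 0.6786 | 0.6388 | 0.8372 | 3514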
Table 1 shows the performance of our defence approach against different attacking methods. In this table, the samples for post-averaging are selected within an $n$-sphere of radius $r = 30$ as in eq.(8), with $K = 60$ different directions. For the approx sampling method, we select 20 directional vectors from each of the last three fully-connected layers in the original VGG16 model, while for random sampling we simply generate 60 random directions. Both methods therefore use the same total number of samples (including the input itself) for each input image in eq.(7). Moreover, all the adversarial samples generated are restricted to lie within a fixed perturbation range. We show the top-1 accuracy of the original model and of the defended model on both the Clean and the Attacked sets, as well as the defence rate of the defended model. In addition, the last column shows the number of adversarial samples successfully generated by each attacking method.
From Table 1, we can see that our proposed defence approach is robust to all of the attacking methods we have examined. It achieves defence rates of roughly 85% or higher in all the experiments with only a minor performance degradation on the Clean set. In particular, when using the random sampling method, our method can defend about 95% of the adversarial samples while incurring very little performance degradation on the Clean set (less than 1%). This is because random sampling can provide more evenly distributed sampling directions than approx sampling when the neighborhood is large enough ($r = 30$).
However, when the neighborhood used is small, we may anticipate that random sampling is more sensitive to the neighborhood size $r$. In this case, the randomly sampled directions are usually not the normal directions of the closest hyperplanes, so a small radius may not be sufficient to move out of the current region. To investigate this problem, we have tested both sampling methods with three different radii. Experimental results are shown in Table 2. As we can see, the defence rate of random sampling drops significantly, from 94% to 66%, when a smaller radius is used, while approx sampling even obtains a slight performance improvement with a smaller radius. Therefore, we recommend using relatively larger radii for random sampling and relatively smaller radii for approx sampling. Moreover, we may improve and stabilize the performance by combining the two methods in an ensemble model, which is left for future investigation.
Finally, we have also investigated the effect on performance of using different numbers of sampling directions $K$. As shown in Table 3, the defence performance does not vary much when far fewer sampling directions are used. For example, our defence approach still retains very good performance even when $K = 6$ is used, in which case far fewer samples need to be evaluated for each input image. These samples can be easily packed into a mini-batch for very fast computation on GPUs. Hence, when time efficiency is a concern, we can significantly reduce the number of sampling directions for faster defensive evaluation.
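Table 3: Performance of post-averaging against the FGSM attack for different numbers of sampling directions K. All accuracy columns report top-1 accuracy.
attack, defence      | Original, Clean | Original, Attacked | Defended, Clean | Defended, Attacked | Defence Rate | #Adv
FGSM, random (K=6)   | 0.7252 | 0.0224 | 0.7180 | 0.6944 | 0.9351 | 3514
FGSM, random (K=15)  | 0.7252 | 0.0224 | 0.7194 | 0.6948 | 0.9351 | 3514
FGSM, random (K=60)  | 0.7252 | 0.0224 | 0.7192 | 0.6958 | 0.9363 | 3514
FGSM, approx (K=6)   | 0.7252 | 0.0224 | 0.6754 | 0.6128 | 0.7999 | 3514
FGSM, approx (K=15)  | 0.7252 | 0.0224 | 0.6802 | 0.6280 | 0.8190 | 3514
FGSM, approx (K=60)  | 0.7252 | 0.0224 | 0.6786 | 0.6388 | 0.8372 | 3514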
4.4 Experimental results on top-5-miss criterion
For image classification on the ImageNet task, it is usually more reasonable to use top-5 accuracy due to the large number of confusable classes and multi-label cases. In this subsection, we therefore also evaluate our defence approach against adversarial samples that are generated based on the top-5-miss criterion. Under the top-5-miss criterion, adversarial samples are defined as images whose true labels are not among their top 5 predictions. Note that although the adversarial samples are easier to defend under the top-5-miss criterion, the adversarial samples generated are actually much stronger since the true labels are pushed out of the top-5 predictions. Experimental results are shown in Table 4. As shown in the table, our defence approach using random sampling achieves defence rates of over 95% against all four attacking methods. Meanwhile, when measured by top-5 accuracy, the defended models using random sampling yield almost no performance degradation on the Clean set and only a small performance degradation (about 1-3%) on the Attacked set.
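The top-1-miss and top-5-miss criteria differ only in how many of the highest-scoring classes are inspected, so both can be expressed with one helper. This is a minimal sketch with a hypothetical function name and made-up class scores; it is not the attack toolbox's own implementation.

```python
def is_top_k_miss(scores, true_label, k):
    """True if the true label is not among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return true_label not in ranked[:k]

# Hypothetical class scores for a 7-class problem.
scores = [0.10, 0.50, 0.05, 0.20, 0.08, 0.04, 0.03]

top1_miss = is_top_k_miss(scores, true_label=3, k=1)       # class 1 is ranked first
top5_miss = is_top_k_miss(scores, true_label=3, k=5)       # class 3 is ranked second
strong_miss = is_top_k_miss(scores, true_label=6, k=5)     # class 6 is ranked last
```

An input can therefore be adversarial under the top-1-miss criterion without being adversarial under the top-5-miss criterion, which is why samples satisfying the latter are considered stronger.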
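Table 4: Performance of post-averaging against different attacking methods under the top-5-miss criterion.
attack, defence | Original Top-1, Clean | Original Top-1, Attacked | Original Top-5, Clean | Original Top-5, Attacked | Defended Top-1, Clean | Defended Top-1, Attacked | Defended Top-5, Clean | Defended Top-5, Attacked | Defence Rate | #Adv
FGSM, random    | 0.7252 | 0.1306 | 0.9136 | 0.1544 | 0.7184 | 0.3436 | 0.9106 | 0.8892 | 0.9565 | 3796
FGSM, approx    | 0.7252 | 0.1306 | 0.9136 | 0.1544 | 0.6786 | 0.3874 | 0.8856 | 0.7914 | 0.8177 | 3796
PGD, random     | 0.7252 | 0.0074 | 0.9136 | 0.0096 | 0.7190 | 0.3172 | 0.9112 | 0.9014 | 0.9768 | 4520
PGD, approx     | 0.7252 | 0.0074 | 0.9136 | 0.0096 | 0.6786 | 0.3902 | 0.8856 | 0.8290 | 0.8883 | 4520
DF, random      | 0.7252 | 0.0454 | 0.9136 | 0.0534 | 0.7180 | 0.5006 | 0.9110 | 0.9034 | 0.9788 | 4301
DF, approx      | 0.7252 | 0.0454 | 0.9136 | 0.0534 | 0.6786 | 0.5528 | 0.8856 | 0.8642 | 0.9247 | 4301
C&W, random     | 0.7252 | 0.1116 | 0.9136 | 0.3000 | 0.7196 | 0.3214 | 0.9116 | 0.9054 | 0.9889 | 3068
C&W, approx     | 0.7252 | 0.1116 | 0.9136 | 0.3000 | 0.6786 | 0.4294 | 0.8856 | 0.8458 | 0.9263 | 3068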
5 Final remarks
In this paper, we have presented some theoretical results on the Fourier analysis of ReLU neural networks. These results help us understand why neural networks are vulnerable to adversarial samples. As a possible defence strategy, we have proposed a simple post-averaging method. Experimental results on ImageNet have demonstrated that our simple defence technique turns out to be very effective against many popular attack methods in the literature. Finally, it will be interesting to see whether our post-averaging method remains robust against new attack methods in the future.
Acknowledgments
This work is supported partially by a research donation from iFLYTEK Co., Ltd., Hefei, China, and a discovery grant from Natural Sciences and Engineering Research Council (NSERC) of Canada.
References
 Alzantot et al. (2018) Moustafa Alzantot, Yash Sharma, Supriyo Chakraborty, and Mani B. Srivastava. GenAttack: Practical black-box attacks with gradient-free optimization. CoRR, abs/1805.11090, 2018. URL http://arxiv.org/abs/1805.11090.
 Athalye and Carlini (2018) Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286, 2018.
 Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. CoRR, abs/1802.00420, 2018. URL http://arxiv.org/abs/1802.00420.
 Brendel et al. (2017) Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
 Carlini and Wagner (2016) Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016.
 Carlini and Wagner (2017a) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017a.
 Carlini and Wagner (2017b) Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017b.
 Carlini and Wagner (2017c) Nicholas Carlini and David Wagner. Magnet and "efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478, 2017c.
 Carlini et al. (2019) Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian J. Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. CoRR, abs/1902.06705, 2019. URL http://arxiv.org/abs/1902.06705.
 Chen et al. (2017) Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec '17, pages 15–26, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-5202-4. doi: 10.1145/3128572.3140448. URL http://doi.acm.org/10.1145/3128572.3140448.
 Feinman et al. (2017) Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, and Andrew B. Gardner. Detecting adversarial samples from artifacts. CoRR, abs/1703.00410, 2017.
 Goodfellow et al. (2014) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 Jiang (2019) Hui Jiang. A new perspective on machine learning: How to do perfect supervised learning. arXiv preprint arXiv:1901.02046, 2019.
 Jiang and Lee (2003) Hui Jiang and Chin-Hui Lee. A new approach to utterance verification based on neighborhood information in model space. IEEE Transactions on Speech and Audio Processing, 11(5), 2003.
 Jiang et al. (1999) Hui Jiang, Keikichi Hirose, and Qiang Huo. A new approach to utterance verification based on neighborhood information in model space. IEEE Transactions on Speech and Audio Processing, 7(4), 1999.
 Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
 Liao et al. (2017) Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Jun Zhu, and Xiaolin Hu. Defense against adversarial attacks using high-level representation guided denoiser. CoRR, abs/1712.02976, 2017. URL http://arxiv.org/abs/1712.02976.
 Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
 Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
 Nguyen and Sinha (2017) Linh Nguyen and Arunesh Sinha. A learning approach and masking approach to secure learning. CoRR, abs/1709.04447, 2017. URL http://arxiv.org/abs/1709.04447.
 Papernot et al. (2016a) N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387, March 2016a. doi: 10.1109/EuroSP.2016.36.
 Papernot et al. (2016b) N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597, May 2016b. doi: 10.1109/SP.2016.41.
 Papernot et al. (2016) Nicolas Papernot, Patrick D. McDaniel, and Ian J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR, abs/1605.07277, 2016. URL http://arxiv.org/abs/1605.07277.
 Papernot et al. (2017) Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 506–519, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4944-4. doi: 10.1145/3052973.3053009. URL http://doi.acm.org/10.1145/3052973.3053009.
 Prakash et al. (2018) Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, and James A. Storer. Deflecting adversarial attacks with pixel deflection. CoRR, abs/1801.08926, 2018. URL http://arxiv.org/abs/1801.08926.
 Rauber et al. (2017) Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017. URL http://arxiv.org/abs/1707.04131.
 Rozsa et al. (2016) Andras Rozsa, Ethan M. Rudd, and Terrance E. Boult. Adversarial diversity and hard positive generation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2016.
 Russakovsky et al. (2015) Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.
 Simonyan and Zisserman (2014) Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 Stein and Weiss (1971) E.M. Stein and G. Weiss. Introduction to Fourier Analysis on Euclidean Spaces. Mathematical Series. Princeton University Press, 1971. ISBN 9780691080789. URL https://books.google.ca/books?id=YUCV678MNAIC.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Tramèr et al. (2017) Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
 Watson (1995) George Neville Watson. A Treatise on the Theory of Bessel Functions. Cambridge University Press, 1995.
 Xie et al. (2017) Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan L. Yuille. Mitigating adversarial effects through randomization. CoRR, abs/1711.01991, 2017. URL http://arxiv.org/abs/1711.01991.
 Yuan et al. (2019) Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE transactions on neural networks and learning systems, 2019.
Appendix: Mathematical proofs
Definition B.1.
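A piecewise linear function is a continuous function $f:\mathbb{R}^n \to \mathbb{R}$ such that there are some hyperplanes passing through the origin and dividing $\mathbb{R}^n$ into $M$ pairwise disjoint regions $R_m$, $m = 1, \dots, M$, on each of which $f$ is linear:
$$f(x) = w_m \cdot x, \qquad x \in R_m.$$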
Lemma B.2.
Composition of a piecewise linear function with a ReLU activation function is also a piecewise linear function.
Proof.
Let $r(\cdot)$ denote the ReLU activation function, and let $f$ be a piecewise linear function with regions $R_m$. If $f$ takes both positive and negative values on a region $R_m$, then $r(f(x))$ breaks it into two regions $R_m^+$ and $R_m^-$. On the former $r(f(x)) = f(x)$ and on the latter $r(f(x)) = 0$, both of which are linear functions. As $f$ is linear on $R_m$, the common boundary of $R_m^+$ and $R_m^-$ lies inside a hyperplane passing through the origin, namely the kernel of that linear function. Therefore, if $f$ is a piecewise linear function defined by $k$ hyperplanes resulting in $M$ regions, $r(f(x))$ is a piecewise linear function defined by at most $k + M$ hyperplanes. ∎
Theorem B.3.
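The output of any hidden unit in an unbiased fully-connected ReLU neural network is a piecewise linear function.
Proof.
This follows from Lemma B.2 by induction over the layers: the input to each unit is a linear combination of piecewise linear functions, which is again piecewise linear on the regions cut out by the union of their hyperplanes, and by Lemma B.2 the ReLU activation preserves this property. ∎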
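Theorem B.3 has a consequence that is easy to check numerically. Because the regions are bounded by hyperplanes through the origin, each region is a cone, so the output must be positively homogeneous: $f(\alpha x) = \alpha f(x)$ for every $\alpha > 0$. The sketch below tests this on a small random network without bias terms. The layer sizes, helper names and random seed are illustrative assumptions and do not come from the paper.

```python
import random

def relu(v):
    return [max(0.0, t) for t in v]

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W x.
    return [sum(w * t for w, t in zip(row, x)) for row in W]

def unbiased_relu_net(weights, x):
    """Forward pass with no bias terms; ReLU after every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)

def random_weights(sizes, rng):
    # One (n_out x n_in) Gaussian matrix per pair of consecutive layer sizes.
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

rng = random.Random(0)                      # hypothetical seed
weights = random_weights([3, 5, 4, 1], rng)  # hypothetical layer sizes
x = [rng.gauss(0.0, 1.0) for _ in range(3)]
alpha = 2.5

y = unbiased_relu_net(weights, x)[0]
y_scaled = unbiased_relu_net(weights, [alpha * t for t in x])[0]
# For a piecewise linear function with conic regions this gap is zero
# up to floating-point rounding.
homogeneity_gap = abs(y_scaled - alpha * y)
```

The check would fail for a network with bias terms, which is why the theorem is stated for the unbiased case.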
Definition B.4.
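Let $V = \{v_1, \dots, v_n\}$ be a set of $n$ independent vectors in $\mathbb{R}^n$. An infinite simplex, $R^+_V$, is defined as the region linearly spanned by $V$ using only positive weights:
(9) $R^+_V = \left\{ \sum_{i=1}^{n} \alpha_i v_i \;:\; \alpha_i > 0,\ i = 1, \dots, n \right\}$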
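Membership in an infinite simplex can be tested directly from the definition: since the spanning vectors are independent, any point has a unique set of weights, and the point lies in the region exactly when all weights are positive. The following is a minimal sketch of that test; the function names and the tolerance used to detect dependent vectors are assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("vectors are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (M[i][n] - s) / M[i][i]
    return a

def in_infinite_simplex(vectors, x):
    """True if x = sum_i a_i * vectors[i] with every weight a_i > 0."""
    n = len(x)
    # Column i of A is vectors[i], so A a = x yields the weights a.
    A = [[vectors[i][j] for i in range(n)] for j in range(n)]
    return all(ai > 0 for ai in solve_linear(A, x))
```

For example, with the spanning vectors `[1, 0]` and `[1, 1]` the region is the open wedge between the positive first axis and the diagonal, so `[2, 1]` is inside and `[1, 2]` is not.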
Theorem B.5.
Each piecewise linear function $f$ can be formulated as a summation of some functions, $f(x) = \sum_{l=1}^{L} f_l(x)$, each of which is linear and nonzero only in an infinite simplex as follows:
$$f_l(x) = \begin{cases} w_l \cdot x & x \in R^+_{V_l} \\ 0 & \text{otherwise} \end{cases}$$
where $V_l$ is a set of $n$ independent vectors spanning the infinite simplex $R^+_{V_l}$, and $w_l$ is a weight vector.
Proof.
Intersecting each region $R_m$ of a piecewise linear function that describes the behavior of a ReLU node with an affine hyperplane results in a convex polytope. This convex polytope can be triangulated into simplices. Define $V_l$, $l = 1, \dots, L$, as the sets of vertices of these simplices. The infinite simplices created by these vector sets have the desired property, and $f$ can be written as $f(x) = \sum_{l=1}^{L} f_l(x)$. ∎
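Theorem B.5 can be illustrated on a toy case in two dimensions. The sketch below takes $f(x) = \mathrm{ReLU}(x_1)$, splits the plane into the four open quadrants (each an infinite simplex spanned by two signed basis vectors), assigns one weight vector per quadrant, and checks that the pieces sum to $f$. The choice of function, the quadrant decomposition and all names are assumptions made for this example. The pieces are defined on open regions, so agreement is only expected away from the region boundaries.

```python
import random

def cone_weights(v1, v2, x):
    """Solve x = a1*v1 + a2*v2 by Cramer's rule (v1, v2 independent)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    a1 = (x[0] * v2[1] - x[1] * v2[0]) / det
    a2 = (v1[0] * x[1] - v1[1] * x[0]) / det
    return a1, a2

def piece(v1, v2, w, x):
    """w . x inside the open cone spanned by v1 and v2, zero elsewhere."""
    a1, a2 = cone_weights(v1, v2, x)
    return w[0] * x[0] + w[1] * x[1] if a1 > 0 and a2 > 0 else 0.0

def f(x):
    # Target function: piecewise linear with the single hyperplane x1 = 0.
    return max(0.0, x[0])

e1, e2 = (1.0, 0.0), (0.0, 1.0)
n1, n2 = (-1.0, 0.0), (0.0, -1.0)
# (spanning vectors, weight vector) for each quadrant.
pieces = [
    (e1, e2, (1.0, 0.0)),  # x1 > 0, x2 > 0
    (e1, n2, (1.0, 0.0)),  # x1 > 0, x2 < 0
    (n1, e2, (0.0, 0.0)),  # x1 < 0, x2 > 0
    (n1, n2, (0.0, 0.0)),  # x1 < 0, x2 < 0
]

def f_sum(x):
    return sum(piece(v1, v2, w, x) for v1, v2, w in pieces)

rng = random.Random(1)  # hypothetical seed
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
max_gap = max(abs(f(p) - f_sum(p)) for p in points)
```

Two of the four weight vectors are zero here because the ReLU is inactive on the half-plane $x_1 < 0$; a deeper network would produce more cones and nonzero weights on each active one.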
As explained in the main text, the output of a ReLU node can be re-expressed by adding hyperplanes to those defining the piecewise linear function. The added hyperplanes are those perpendicular to the standard basis vectors and to the difference of any two of them, that is, the hyperplanes $x_i = 0$ and $x_i = x_j$. Given this representation, the final step towards the Fourier transform is the following lemma:
Lemma B.6.
Fourier transform of the following function:
may be presented as:
(10) 
where is the th component of frequency vector , and .
Proof.
Alternatively, may be represented as:
(11) 
Therefore, we need to compute Fourier transform of :
(12)  
(13) 
By taking the inverse Fourier transform of the function:
(14) 
where is dimensional Dirac Delta function, it can be shown that it is the Fourier transform of :
(15)  
(16)  
(17)  
(18)  
(19) 
Now we can find the Fourier transform of
(20)  
(21) 
where $*$ denotes the convolution operator. The final integrand may be represented as:
(22)  
(23)  
(24)  
(25) 
where , is the summation over elements of and . Therefore:
(26)  
(27)  
(28)  
(29)  
(30) 
If does not contain and have at least 2 elements then the terms for and will cancel each other out. Also, will vanish if has only one element. Therefore, there only remains empty set and sets with two elements one of them being . Given the fact that , the result of the integral will be:
(31)  
(32) 
Finally, substituting (32) into (21) yields the desired result. ∎
Theorem B.7.
The Fourier transform of the output of any hidden node in a fully-connected ReLU neural network may be represented as , where denotes the differential operator.
Proof.
As discussed in the original paper, where:
(33) 
or equivalently:
(34) 
Therefore:
(35)  
(36) 
where . ∎
Derivation of Eq. (6)
The integral in Eq. (6), which appears in the Fourier transform computed in Section 3.1, is itself the Fourier transform of:
(37) 
which can be derived using the property of Fourier transforms of radially symmetric functions (Stein and Weiss, 1971):
(38)  
(39)  
(40) 
Given this transform:
(41)  
(42) 
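The radial-symmetry property used in this derivation reduces a multidimensional Fourier integral to a one-dimensional one involving Bessel functions. A standard instance, chosen here for illustration and not necessarily the function in (37), is the indicator of the unit disk in $\mathbb{R}^2$: with the unnormalized convention $\int e^{-i\omega\cdot x}\,dx$ (an assumption of this sketch), its transform is $2\pi J_1(\lVert\omega\rVert)/\lVert\omega\rVert$. The sketch below compares a direct midpoint-rule integration in polar coordinates with this closed form, using a power series for $J_1$; grid sizes and function names are hypothetical.

```python
import math

def bessel_j1(z, terms=40):
    """Power series J_1(z) = sum_k (-1)^k / (k! (k+1)!) * (z/2)^(2k+1)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + 1))
               * (z / 2) ** (2 * k + 1) for k in range(terms))

def disk_fourier_numeric(omega_norm, n_rho=200, n_theta=200):
    """Midpoint-rule value of the integral of exp(-i w.x) over the unit disk.

    The imaginary part vanishes by symmetry, so only the cosine is integrated.
    """
    d_rho, d_theta = 1.0 / n_rho, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            # Polar area element contributes the extra factor rho.
            total += math.cos(omega_norm * rho * math.cos(theta)) * rho
    return total * d_rho * d_theta

def disk_fourier_closed_form(omega_norm):
    return 2.0 * math.pi * bessel_j1(omega_norm) / omega_norm

omega_norm = 3.0  # hypothetical frequency magnitude
numeric = disk_fourier_numeric(omega_norm)
closed = disk_fourier_closed_form(omega_norm)
```

The result depends only on the magnitude of the frequency vector, and the closed form tends to the area of the disk, $\pi$, as the frequency goes to zero.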