Applying Tensor Decomposition for the Robustness against Adversarial Attack
Deep learning is advancing rapidly and has shown remarkable performance in computer vision. However, deep learning models turn out to be highly vulnerable to small perturbations known as adversarial attacks. Although many defense mechanisms have been proposed to mitigate the effect of adversarial attacks, all of them rely on strict assumptions. Our approach, in contrast, is not tied to any such assumptions, since our insight stems from tensor decomposition. In this paper, we experimentally demonstrate that decomposing the input tensor is an effective countermeasure against several adversarial attacks. We conduct experiments on well-known benchmarks: MNIST, CIFAR-10, and ImageNet. Our experimental results show that this simple method provides resilience and robustness against adversarial attacks. To the best of our knowledge, this is the first approach to leverage tensor decomposition as a defense mechanism. We hope that tensor decomposition becomes a universal approach to addressing the inherent corner cases of deep learning models.
Keywords: Adversarial example, Tensor decomposition
Over the past several years, advances in deep neural networks (DNNs) have greatly expanded what machines can do. In particular, DNNs have achieved remarkable success in image classification [14, 24], even surpassing human capability. Owing to this performance, deep learning has begun to be applied in various areas. However, several papers [28, 7, 2, 16, 17, 20, 6, 5] showed that even DNNs can be easily fooled by small changes to the input that are imperceptible to the human eye. According to these studies, carefully crafted perturbations can induce vision-based applications to behave in unexpected ways. Such perturbations are small enough to be inconspicuous, yet some studies show that their influence can be larger than expected, since even state-of-the-art models fall to almost zero classification accuracy under attack.
Because deep learning models do not hesitate when producing an output, this vulnerability might cause serious accidents. For instance, Fig. 1 shows adversarial examples in an image classification task. Although a human would intuitively see every image as an ostrich, the deep learning model outputs clearly different labels because it lacks such intuition. From a more theoretical perspective, misclassification occurs when the adversarial perturbation crosses the decision boundary, and the existing classifier has no such intuition to ward it off. This corner case of DNNs, which adversarial attacks exploit, is becoming more pervasive and more sophisticated. As a result, vulnerability to adversarial attacks hinders the adoption of DNNs in safety-critical systems and security-sensitive applications, including autonomous driving [6, 25].
Since the advent of adversarial attacks, many researchers and vendors have paid significant attention to adversarial examples, because they would rather verify the robustness of their applications and models than accept the risks. However, resistance against adversarial examples poses another challenge, for no method is a cure-all against adversarial attacks. To make up for the corner cases of DNNs, several studies [7, 28, 19, 21, 26, 10, 18, 32] have proposed defense mechanisms that mitigate the risk posed by an adversary. These defense mechanisms fall into two main approaches: (1) changing the model itself, which improves robustness by training with adversarial examples, e.g., adversarial training [7, 28, 29]; and (2) preprocessing the inputs to diminish the effect of adversarial noise, e.g., Magnet, Comdefend, PixelDefend, Defense-GAN, HGD, etc. [19, 10, 26, 23, 18, 32].
However, approaches of type (1) are designed with specific adversarial attack strategies in mind, so their generalization is likely to be limited. This implies that models using these methods may remain vulnerable to other attacks optimized with knowledge of those strategies. Approaches of type (2), on the other hand, use models trained on a vast amount of legitimate data to purify the inputs themselves, instead of assuming particular attack strategies. Their main idea is to measure the distance between the inputs and the manifold of legitimate images, and then to approximate or guide the adversarial images closer to that manifold. Purifying the inputs therefore requires a well-generalized model to guarantee performance. Given that adversarial images occupy the low-probability regions of a distribution trained on legitimate samples, poorly generalized models cannot ensure that these approaches give good results. In addition, an adversary who knows the model's structure might create another attack technique. In a nutshell, since these approaches are likely to be temporary expedients, universal defense approaches that can cover a myriad of risks should be explored.
Here, we propose a novel intuition for deep learning models that can make a model universally robust. To the best of our knowledge, this is the first approach to explore defense from a universal point of view. Our approach leverages tensor decomposition to diminish the effect of adversarial noise by using reconstructed images as the input of a deep learning model. The reconstructed inputs are fed into the classifier, and we experimentally demonstrate that such simple preprocessing can be an effective countermeasure against adversarial attacks. On MNIST, the degradation of top-1 accuracy on adversarial examples is less than 1% against four adversarial attacks, and it is less than 10% on CIFAR-10 and ImageNet. This result outperforms recent defense mechanisms [18, 10]. For deep learning applications to extend toward other domains, their robustness should be taken into account. For those who want to avoid doubts about cherry-picking and make a model more general across a variety of risks, including adversarial attacks, our insight is an interesting candidate. Our contributions are as follows:
High Compatibility. Our approach uses tensor decomposition to preprocess inputs that might have been affected by an adversary. We assume nothing about attack strategies or classifiers; we simply replace each input with its reconstruction obtained by tensor decomposition. Our method is therefore relatively easy to use and can be applied to any classifier.
Efficient Engineering Complexity. As mentioned above, tensor decomposition depends only on the input itself, so it is not tied to attack strategies or classifiers. We therefore do not need to consider how the model classifies the input, since tensor decomposition is free from model dependency. This implies that retraining the model or augmenting the training data is no longer required. The method requires only the processing time needed to reconstruct the inputs, and this processing time is negligible.
Integrity of the inputs. During the reconstruction process, some information that plays an important role might be lost. Although state-of-the-art approaches [18, 10] have achieved remarkable performance, their models degrade performance even on clean images. This trade-off makes defense mechanisms difficult to apply. Tensor decomposition, however, incurs fewer adverse effects on clean images and maintains its performance even on a high-dimensional dataset such as ImageNet.
2 Related work
Szegedy et al. found the existence of adversarial perturbations that break an image classifier by solving an adversarial optimization problem [28, 7]. They show that model accuracy drops even though the perturbed image looks similar to the original to human eyes. Goodfellow et al. [7] use the sign of the gradient of the loss function of the target model with respect to the input. This method is called the Fast Gradient Sign Method (FGSM); it updates the input only once. With a similar idea, [16] uses FGSM in an iterative way. Chen et al. [5] leverage distortion to generate effective adversarial examples and improve attack transferability, which refers to the attack success rate of adversarial examples generated from substitute models. In other words, high transferability implies that the performance of the target model might degrade even without knowledge of the model, i.e., in a black-box attack. Carlini and Wagner [2] change the optimization problem defined in [28] to achieve a more powerful attack. DeepFool [20] measures the minimum perturbation size required for the attack: it approximates the decision boundary of the model and updates the input repeatedly until the model misclassifies it. Beyond the image classification task, [6, 25] demonstrated that adversarial attacks can be applied outside the digital space, so security concerns arise even in physical settings such as autonomous driving.
To counter adversarial attacks, some works train the model with adversarial examples so that the model becomes resilient against them, which is called adversarial training. During training, they generate adversarial images to improve performance. Although this works, it depends on the particular adversarial data used in the training process. For instance, prior work shows that this approach is robust against a simple attack but not against a more sophisticated one. In addition, it carries an engineering penalty since it requires retraining the model: the longer it takes to create an adversarial example, the longer it takes to retrain the model. Instead of data augmentation, methods that change the model itself were also proposed. They change the objective function of the problem to obtain robustness. However, this approach also has to retrain the model, so it likewise increases engineering complexity.
In recent years, several papers [19, 10, 26, 18, 32, 23] preprocess the inputs before feeding them into the classifier. They propose models that guide the distribution of adversarial images as close as possible to the decision boundary. All of these methods require a well-generalized defense model, so even clean images can be affected when the model is poorly generalized, which damages the integrity of the inputs. To guarantee this integrity, all of them require well-generalized classifiers to detect whether the input is adversarial or to approximate the manifold of legitimate samples. Our approach is similar to those approaches in that it preprocesses the inputs, but it differs in that it needs no such premises, including a detector or a well-generalized model. Consequently, our method does not hurt performance in terms of integrity.
3.1 Adversarial Attack
Basically, all of the attacks use the gradient of the loss function of the target model with respect to the input data. In this section, we briefly review the basic adversarial attack methods.
Fast Gradient Sign Method (FGSM): The FGSM was proposed by [7]. It is a simple and effective attack method. The image is perturbed as follows.
$$X^{adv} = X + \epsilon \cdot \mathrm{sign}\left(\nabla_X J(X, y_{true})\right)$$
where $\epsilon$ is the magnitude of the noise and $J$ is the loss with respect to the true label $y_{true}$ of the image. The method adjusts the input $X$ by adding the sign of the gradient of the loss with respect to $X$. This increases the loss of the target model so that the model misjudges the adjusted input. Since it updates the input $X$ only once, it is also called a single-step method.
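The single-step update can be illustrated with a short NumPy sketch. The linear softmax classifier, its random weights `W` and `b`, and the value of `eps` are all assumptions chosen so that the gradient can be written by hand; they are not the models or settings used in the experiments.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def loss_and_grad(x, y_true, W, b):
    """Cross-entropy loss J(x, y_true) of a linear softmax model and dJ/dx."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y_true] = 1.0
    return -np.log(p[y_true]), W.T @ (p - onehot)

def fgsm(x, y_true, W, b, eps):
    """Single step: x + eps * sign(dJ/dx), clipped to the valid pixel range."""
    _, grad_x = loss_and_grad(x, y_true, W, b)
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# Toy 3-class model on 8 "pixels" (hypothetical stand-in for a real network).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
x, y_true, eps = rng.uniform(0.2, 0.8, size=8), 1, 0.05

x_adv = fgsm(x, y_true, W, b, eps)
loss_clean, _ = loss_and_grad(x, y_true, W, b)
loss_adv, _ = loss_and_grad(x_adv, y_true, W, b)
print(f"loss before: {loss_clean:.4f}, after: {loss_adv:.4f}")
```

For a deep network the gradient would come from automatic differentiation instead of the closed form used here; the update rule itself stays the same.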
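Basic Iterative Method (BIM): The BIM is an iterative version of FGSM [16]. It is a more powerful attack than FGSM and is also called Iterative FGSM. It uses the following equations:
$$X^{adv}_0 = X, \qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X^{adv}_N + \alpha \cdot \mathrm{sign}\left(\nabla_X J(X^{adv}_N, y_{true})\right) \right\}$$
where $J$ is the loss with respect to the true label $y_{true}$, $\alpha$ is the step size for adjusting $X^{adv}_N$, and the clip function ensures that every pixel of $X^{adv}_N$ stays within $\epsilon$ of the corresponding pixel of $X$ for all $N$. It is also called a multi-step method. In the original paper, these quantities are specified on the pixel scale of 0 to 255.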
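A minimal sketch of the iterative update, again on a toy linear softmax model. The model, `eps`, `alpha`, and the `[0, 1]` pixel range are illustrative assumptions; only the iteration count of 5 mirrors the setting mentioned later in the experiments.

```python
import numpy as np

def input_gradient(x, y_true, W, b):
    """Gradient of the cross-entropy loss of a linear softmax model w.r.t. x."""
    z = W @ x + b
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y_true] -= 1.0  # softmax probabilities minus the one-hot label
    return W.T @ p

def bim(x, y_true, W, b, eps, alpha, n_iter):
    """Repeat small FGSM steps, clipping back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + alpha * np.sign(input_gradient(x_adv, y_true, W, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the original
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
x = rng.uniform(0.2, 0.8, size=8)
x_adv = bim(x, y_true=1, W=W, b=b, eps=0.05, alpha=0.02, n_iter=5)
print("max per-pixel change:", np.max(np.abs(x_adv - x)))
```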
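DeepFool: The DeepFool attack approximates the decision boundary of a classifier and measures the minimal perturbation that is sufficient to fool the classifier [20]. For an affine multiclass classifier, the distance from an input $x_0$ to the closest decision boundary is calculated as follows.
$$\min_{k \neq \hat{k}(x_0)} \frac{\left| f_k(x_0) - f_{\hat{k}(x_0)}(x_0) \right|}{\left\| w_k - w_{\hat{k}(x_0)} \right\|_2}$$
Here $f_k$ and $f_{\hat{k}(x_0)}$ are the classifier outputs for the $k$-th and $\hat{k}(x_0)$-th class, where $\hat{k}(x_0)$ denotes the predicted class and $w_k$ is the weight vector of class $k$. Similar to BIM, the input is updated in an iterative way. For nonlinear classifiers, the decision boundary is approximated linearly at each step to find the distance needed to fool the classifier.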
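For the affine case the closest boundary and the corresponding perturbation have a closed form, sketched below. The classifier `f(x) = W x + b` with random weights is a hypothetical example, and the 2% overshoot used to push the point just past the boundary is an assumption taken as a typical small value.

```python
import numpy as np

def deepfool_affine(x, W, b):
    """Closest decision boundary of the affine classifier f(x) = W x + b.

    Returns the predicted class, the L2 distance to the nearest boundary,
    and the perturbation that projects x onto that boundary.
    """
    f = W @ x + b
    k_hat = int(np.argmax(f))
    best_dist, best_r = np.inf, None
    for k in range(W.shape[0]):
        if k == k_hat:
            continue
        w_diff = W[k] - W[k_hat]
        f_diff = f[k] - f[k_hat]
        dist = abs(f_diff) / np.linalg.norm(w_diff)
        if dist < best_dist:
            best_dist = dist
            best_r = abs(f_diff) / np.dot(w_diff, w_diff) * w_diff
    return k_hat, best_dist, best_r

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 6)), rng.normal(size=4)
x = rng.normal(size=6)

k_hat, dist, r = deepfool_affine(x, W, b)
x_adv = x + 1.02 * r  # small overshoot so the point crosses the boundary
print("original class:", k_hat, "new class:", int(np.argmax(W @ x_adv + b)))
```

For a nonlinear classifier the same computation would be repeated on a local linearization until the label changes.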
Carlini & Wagner (C&W): Carlini and Wagner [2] define an optimization problem to find an adversarial example. They use the following formulation.
$$\min_{\delta} \; D(x, x + \delta) + c \cdot f(x + \delta) \quad \text{subject to} \quad x + \delta \in [0, 1]^n$$
Here $D$ is a distance metric that measures the distance between the clean image and the adversarial image, $f$ is an objective function that controls the result of the original classifier $C$, and $c$ is a constant that balances the two terms. The C&W attack is one of the most powerful attacks in the literature. We visualize the adversarial images generated by each method in Fig. 2.
3.2 Tensor Decomposition
A tensor is a multi-dimensional array. For instance, a color image is a tensor consisting of height, width, and color channel. A tensor decomposition method decomposes a tensor into low-dimensional tensors. The CANDECOMP/PARAFAC [4, 8] decomposition approximates a tensor as a sum of outer products of vectors, one vector per dimension, as shown in Fig. 3. We refer to this as the CP decomposition.
Let $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, and let $a_r \in \mathbb{R}^{I}$, $b_r \in \mathbb{R}^{J}$, $c_r \in \mathbb{R}^{K}$ for $r = 1, \dots, R$. Here $R$ is the number of components, and it is a hyperparameter. If $R$ is small, the tensor is approximated by low-dimensional tensors. For convenience, we therefore refer to deciding $R$ as choosing the dimension. Then $\mathcal{X}$ is approximated as follows.
$$\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r$$
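To make the CP approximation concrete, the sketch below reconstructs a tensor from three factor matrices and fits them with a plain alternating least squares (ALS) loop. This is an illustrative NumPy implementation under simple assumptions (fixed iteration count, random initialization, no normalization); it is not the library routine used in the experiments, and the tensor size and rank are arbitrary.

```python
import numpy as np

def cp_to_tensor(A, B, C):
    """Sum over r of the outer products a_r o b_r o c_r (columns of A, B, C)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_als(X, rank, n_iter=50, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Each line solves a least-squares problem for one factor, others fixed.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Synthetic "image": an exact rank-2 tensor plus a little noise.
rng = np.random.default_rng(1)
X = cp_to_tensor(rng.random((16, 2)), rng.random((16, 2)), rng.random((3, 2)))
X += 0.01 * rng.standard_normal(X.shape)

A, B, C = cp_als(X, rank=2)
X_hat = cp_to_tensor(A, B, C)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

A smaller `rank` gives a coarser reconstruction, which is the knob the text calls choosing the dimension.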
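The Tucker decomposition [30, 31] is another way to decompose a tensor. It is a kind of higher-order principal component analysis [12]. It decomposes a tensor into a core tensor and factor tensors, as shown in Fig. 4.
Let $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, $\mathcal{G} \in \mathbb{R}^{P \times Q \times R}$, and let $a_p \in \mathbb{R}^{I}$, $b_q \in \mathbb{R}^{J}$, $c_r \in \mathbb{R}^{K}$ for $p = 1, \dots, P$, $q = 1, \dots, Q$, $r = 1, \dots, R$. Here $P$, $Q$, and $R$, the sizes of the core tensor $\mathcal{G}$, are the numbers of components, and they are hyperparameters. Once the size of the core tensor is fixed, the sizes of the factor tensors follow from it. Similar to the number of components of the CP decomposition, we refer to deciding the size of the core tensor as choosing the dimension. Then $\mathcal{X}$ is approximated as follows.
$$\mathcal{X} \approx \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr} \, a_p \circ b_q \circ c_r$$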
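The core-plus-factors structure can be sketched with a truncated higher-order SVD (HOSVD) in NumPy. HOSVD is used here only because it is short to write down; it is an assumption of this example and may differ from the algorithm run by the library used in the experiments. The tensor shape and core size are arbitrary.

```python
import numpy as np

def unfold(X, mode):
    """Matricize X so that the chosen mode indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated HOSVD: leading left singular vectors per mode, then the core."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    A, B, C = factors
    core = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C)  # project onto the factors
    return core, factors

def tucker_to_tensor(core, factors):
    """Sum over p, q, r of g_pqr * (a_p o b_q o c_r)."""
    A, B, C = factors
    return np.einsum("pqr,ip,jq,kr->ijk", core, A, B, C)

rng = np.random.default_rng(0)
X = rng.random((8, 8, 3))

# Compress height and width, keep all three channels.
core, factors = hosvd(X, ranks=(4, 4, 3))
X_hat = tucker_to_tensor(core, factors)
print("core shape:", core.shape)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

With the core as large as the input, the reconstruction is exact; shrinking the core trades reconstruction quality for a coarser approximation.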
Our insight for mitigating the effect of adversarial attacks stems from tensor decomposition. This section describes how tensor decomposition can be applied as a defense mechanism against adversarial attacks.
4.1 Tensor decomposition as a preprocessing
As shown in Fig. 2, adversarial examples are too subtle to be recognized by human senses or even by well-generalized deep learning models. To prevent this potential threat, our model simply uses the image reconstructed from the tensor decomposition as its input. We conjecture that the effect of adversarial perturbations can be reduced by approximating the tensors in a low dimension. To shed light on this hypothesis, we conduct a brief visual experiment. Fig. 5 shows examples of the noise. Intuitively, the adversarial noise crafted by FGSM seems distinguishable from the others, while the rest are relatively similar to each other. In other words, tensor decomposition can transform adversarial noise into random noise, e.g., Gaussian noise. In this regard, tensor decomposition is able to purify adversarial noise. Even though such random noise might also degrade performance, it is no worse than the original adversarial noise, which is crafted with adversarial intent.
In this light, we use the CP and Tucker decomposition methods to verify how effective they actually are under various attack strategies. Before the main experiments, both methods require the dimension to be set, i.e., the number of components for CP and the size of the core tensor for Tucker. As the dimension increases, the quality of the reconstructed image improves, as shown in Fig. 6. A higher-quality reconstruction is not always better, however, so the dimension needs to be decided heuristically. We therefore conducted an ablation study to find effective hyperparameters for CP and Tucker, considering two factors: accuracy and time complexity. We randomly sampled 1,000 images from CIFAR-10 and generated adversarial images by applying FGSM, BIM, DeepFool, and C&W, respectively. Next, those images were reconstructed by CP and Tucker decomposition. Finally, we measured the accuracy and time complexity using the reconstructed images.
The results are summarized in Fig. 7. Accuracy increases with the dimension up to a midpoint and then tends to decrease gradually, while the processing time increases as the dimension increases. We therefore decided to use 40% of the original dimension. For instance, the size of an image in the CIFAR-10 dataset is 32 by 32. In the case of CP decomposition, the sizes of the three tensors in Fig. 3 are 32, 32, and 3, so we choose the rank as 40% of 32. We use a similar argument when choosing the size of the core tensor for Tucker decomposition. In particular, we do not compress the channel dimension in Tucker decomposition since we do not want to lose color information. Thus, the core tensor of the Tucker decomposition has the form (height, width, channel), with the channel size left unchanged. We use an open-source library [13] for each decomposition method.
4.2 Denoise Autoencoder as a supplement
One more thing should be considered before feeding the input into the classifier. Reconstructing an image loses information, which might affect the classification result. For clean images, the loss could make the result even worse. To diminish this adverse effect, we add a denoise autoencoder to the procedure. Our method follows a coarse-to-fine approach. By reconstructing the inputs from the decomposed tensors, we remove the coarse-grained adversarial features. We expect that fine-grained features lost by this coarse-grained step, which is more likely to occur in high dimensions, can be compensated for by passing the result through the denoise autoencoder. With this in mind, we set up the denoise autoencoder architecture as follows. The numerical values in Table 1 stand for the input channels and output channels, respectively. And the filter size is .
We use CIFAR-10 images in RGB and MNIST images in grayscale. We set the learning rate to and used Adam [11] as an optimizer. We use mean squared error (MSE) as the loss function. For both models, we train the autoencoder for 10 epochs. We henceforth denote the autoencoder as AE.
4.3 Overall architecture
We now describe the overall architecture in detail. Fig. 8 shows the overall flow of our proposed method. First, we approximate the input image via the tensor decomposition method. The inputs can be adversarial images or clean images. Our method does not spend time deciding whether the input is adversarial or not, which is why it does not require a well-generalized model. In other words, whatever the input is, our method splits it into several tensors based on CP or Tucker decomposition and then reconstructs it. Next, the reconstructed images are passed through the denoise autoencoder, which can compensate for lost information that may play an important role in the image. Note that our method has no model dependency, so it can be applied in conjunction with any classifier.
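The flow described above amounts to composing three stages, which the following sketch makes explicit. Every component here is a hypothetical stand-in: `low_rank_reconstruct` uses a truncated SVD of one unfolding in place of a CP or Tucker reconstruction, the denoiser is a simple clip instead of a trained autoencoder, the classifier is a toy rule, and the rank of 8 is arbitrary.

```python
import numpy as np

def low_rank_reconstruct(image, rank):
    """Stand-in for the decomposition step: truncated SVD of the height unfolding."""
    h, w, c = image.shape
    U, s, Vt = np.linalg.svd(image.reshape(h, w * c), full_matrices=False)
    return ((U[:, :rank] * s[:rank]) @ Vt[:rank]).reshape(h, w, c)

def defend_and_classify(image, reconstruct, denoise, classify):
    """Reconstruct every input (clean or adversarial), denoise it, then classify."""
    return classify(denoise(reconstruct(image)))

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))  # placeholder for a CIFAR-10-sized input

label = defend_and_classify(
    image,
    reconstruct=lambda img: low_rank_reconstruct(img, rank=8),
    denoise=lambda img: np.clip(img, 0.0, 1.0),                   # placeholder for the AE
    classify=lambda img: int(np.argmax(img.mean(axis=(0, 1)))),  # placeholder classifier
)
print("predicted label:", label)
```

Because the classifier is passed in as a plain callable, the preprocessing stages never inspect it, which reflects the model-independence claimed in the text.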
For the MNIST and CIFAR-10 data, we test on the full test data, which comprise 10,000 images for MNIST and 50,000 images for CIFAR-10. For ImageNet, we randomly select 1,000 images, following a setting similar to [10, 15]. Since our proposed method decomposes the input regardless of whether it is a clean or an adversarial image, we test on both clean images and adversarial images.
We measure the top-1 accuracy on clean images and adversarial images for each dataset. For MNIST, we use a simple model consisting of two convolutional layers. For CIFAR-10 and ImageNet, we use a pre-trained Resnet101 [9]. In particular, for CIFAR-10, we fine-tune the pre-trained Resnet101 for 10 classes. We use the FGSM, BIM, DeepFool, and C&W attack methods. For the distance metric, a related research area mainly uses and norm [3, 10]. In detail, we use for FGSM, BIM, and DeepFool attack. And for the C&W attack, we use a metric. We try to find a small perturbation when applying each adversarial attack, since the noise becomes visible when the perturbation is not small enough. We generate adversarial images using the open-source library Foolbox [22]. In detail, we try 100 epsilons from 0 to 1 for FGSM. For BIM, we set the number of iterations to 5. For DeepFool, we set the maximum number of steps to 50, and for C&W we set the maximum number of iterations to 50. We also measure the preprocessing time to calculate the additional time consumed by the proposed method, and we compare our results to other state-of-the-art defense models. For a fair comparison, we compare the ratio between the accuracy on clean images and the accuracy on adversarial images, since the accuracy on clean images differs slightly depending on the setting.
We achieve remarkably high accuracy against adversarial attacks. In most cases, CP is better than the Tucker decomposition method. In some cases on the ImageNet dataset, the Tucker decomposition method is better than CP. For instance, under the FGSM and C&W attacks, the best result was obtained with Tucker decomposition. For clean images, the accuracy reduction was about 1% on all datasets, which means that we do not harm the original model in the normal case where the input image is clean. The autoencoder is highly effective on the MNIST dataset. Although it has little effect on clean images, it improves performance under various adversarial attacks on MNIST. Beyond MNIST, the denoise autoencoder yields small performance improvements on the other datasets. The numerical results are summarized in Tables 2, 3, and 4.
Even though the DeepFool and C&W attacks are more accurate and powerful than the FGSM and BIM attacks, the accuracy after decomposition is higher for them than for the FGSM and BIM attacks.
5.4 Comparison with other defense methods
We measure the ratio of accuracy on clean images and adversarial images generated by FGSM, BIM, DeepFool and C&W attack for a fair comparison. Here the is restricted to in scale. The defense ratio is defined as follows.
We compare against two recent defense methods, HGD [18] and Comdefend [10]. Fig. 9 shows the results. We select Resnet101 [9] and Inception V3 (IncV3) [27] as base models and test on 1,000 images from the ImageNet data. Our method outperforms both recent defense methods in all cases, which verifies that it is effective. Moreover, our method does not depend on the attack method or the target classifier, so it can be easily combined with any model.
5.5 Time analysis
We measure the preprocessing time of each method. We randomly pick 1,000 images from each of MNIST, CIFAR-10, and ImageNet and calculate the average processing time per image. In most cases, the CP decomposition requires more time than the Tucker decomposition. For MNIST and CIFAR-10, the reconstruction time is similar for both methods. For ImageNet, however, the CP decomposition takes about 10 times longer than the Tucker method. Table 5 summarizes the preprocessing time of each method on each dataset.
Table 5: Average preprocessing time per image (columns: dataset, CP, CP+AE, Tucker, Tucker + AE).
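The per-image average can be measured with a small wall-clock helper such as the one below. The helper name and the dummy workload are illustrative assumptions; in practice the callable would be the CP or Tucker reconstruction, optionally followed by the AE.

```python
import time

def average_time_per_item(fn, items):
    """Average wall-clock seconds that `fn` spends on one item."""
    if not items:
        raise ValueError("items must not be empty")
    start = time.perf_counter()
    for item in items:
        fn(item)
    return (time.perf_counter() - start) / len(items)

# Dummy workload standing in for a reconstruction routine.
avg = average_time_per_item(lambda n: sum(i * i for i in range(n)), [1000] * 100)
print(f"average time per item: {avg:.6f} s")
```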
5.6 White box scenario
In the white-box scenario, we should assume that the adversary knows the full defense mechanism, according to [1]. In our method, note that the input image is always decomposed and reconstructed, and the decomposed components always start from random tensors. In detail, the component tensors of each decomposition method are initialized to random tensors and then trained to approximate the original tensor. Thus, the starting point is always a random tensor, and the original image serves as its own label, as in unsupervised learning. There are therefore no fixed weights, so the adversary cannot generate adversarial examples that target the tensor decomposition method. Hence, our proposed method is robust in the white-box attack scenario.
In this work, we verify that tensor decomposition is a simple and powerful method for purifying adversarial perturbations. When we combine a denoise autoencoder with the tensor decomposition method, the proposed method achieves higher accuracy against adversarial attacks. We evaluate our method against various adversarial attacks, such as the DeepFool and C&W attacks, and discuss why it is robust in the white-box scenario.
Our intuition for applying tensor decomposition against adversarial attacks is as follows. Since the adversarial perturbation is so small that the difference is hard to notice, such a small perturbation is removed by approximating the image tensor with low-dimensional tensors. Since there is no straightforward algorithm for choosing the dimension of the component tensors of the CP and Tucker decompositions, finding the best dimension remains future work. Establishing a theoretical basis for why tensor decomposition is robust against adversarial attacks is also left to future work.
[1] (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14.
[2] (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
[3] (2018) Adversarial examples detection in features distance spaces. In Proceedings of the European Conference on Computer Vision (ECCV).
[4] (1970) Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika 35 (3), pp. 283–319.
[5] (2017) EAD: elastic-net attacks to deep neural networks via adversarial examples. ArXiv abs/1709.04114.
[6] (2018-06) Robust physical-world attacks on deep learning visual classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
[8] (1970) Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multimodal factor analysis.
[9] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[10] (2019) Comdefend: an efficient image compression model to defend adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6084–6092.
[11] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[12] (2009) Tensor decompositions and applications. SIAM Review 51 (3), pp. 455–500.
[13] (2019) TensorLy: tensor learning in Python. The Journal of Machine Learning Research 20 (1), pp. 925–930.
[14] (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
[15] (2018) Adversarial attacks and defences competition. In The NIPS'17 Competition: Building Intelligent Systems, pp. 195–231.
[16] (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
[17] (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
[18] (2018-06) Defense against adversarial attacks using high-level representation guided denoiser. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] (2017) Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147.
[20] (2016) DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582.
[21] (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597.
[22] (2017) Foolbox: a Python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131.
[23] (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605.
[24] (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229.
[25] (2018) Darts: deceiving autonomous cars with toxic signs. arXiv preprint arXiv:1802.06430.
[26] (2017) PixelDefend: leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766.
[27] (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
[28] (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
[29] (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204.
[30] (1963) Implications of factor analysis of three-way matrices for measurement of change. Problems in Measuring Change 15, pp. 122–137.
[31] (1966) Some mathematical notes on three-mode factor analysis. Psychometrika 31 (3), pp. 279–311.
[32] (2019) Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509.