Block Switching: A Stochastic Approach for Deep Learning Security
Recent studies of adversarial attacks have revealed the vulnerability of modern deep learning models: subtly crafted perturbations of the input can make a highly accurate trained network produce arbitrary incorrect predictions, while remaining imperceptible to the human visual system. In this paper, we introduce Block Switching (BS), a defense strategy against adversarial attacks based on stochasticity. BS replaces a block of model layers with multiple parallel channels, and the active channel is randomly assigned at run time, hence unpredictable to the adversary. We show empirically that BS leads to a more dispersed input-gradient distribution and superior defense effectiveness compared with other stochastic defenses such as stochastic activation pruning (SAP). Compared to other defenses, BS is also characterized by the following features: (i) BS causes less test accuracy drop; (ii) BS is attack-independent; and (iii) BS is compatible with other defenses and can be used jointly with them.
Powered by rapid improvements of learning algorithms (he2016deep; lecun2015lenet; krizhevsky2012imagenet; Zhao2019fault; zhao2018admm), computing platforms (abadi2016tensorflow; jia2014caffe), and hardware implementations (han2016eie; li2019rnn), deep neural networks have become the workhorse of more and more real-world applications, many of which are security-critical, such as self-driving cars (bojarski2016end) and image recognition (parkhi2015deep; he2016deep; krizhevsky2012imagenet; zhao2017aircraft; wang2018using), where malfunctions of these deep learning models can lead to serious losses.
However, Szegedy et al. (szegedy2013intriguing) discovered the vulnerability of deep neural networks to adversarial attacks: in the context of classification, malicious perturbations can be crafted and added to the input, leading to arbitrary erroneous predictions of the target neural network, while the perturbations remain small in size and scale, or even invisible to human eyes.
This phenomenon has triggered wide interest among researchers, and a large number of attack methods have been developed. Typical attack methods include the Fast Gradient Sign Method (FGSM) by Goodfellow et al. (Goodfellow2015explaining), the Jacobian-based Saliency Map Attack (JSMA) by Papernot et al. (papernot2016limitations), and the CW attack by Carlini and Wagner (carlini2017towards). These attacks utilize gradients of a specific objective function with respect to the input, and design perturbations accordingly in order to obtain a desired output from the network. Among these attacks, the CW attack is known to be the strongest and is often used as a benchmark for evaluating model robustness.
In the meantime, a rich body of defense methods has been developed, attempting to improve model robustness in different aspects. Popular directions include adversarial training (madry2017towards), detection (grosse2017statistical; metzen2017detecting), input rectification (das2017keeping; xie2017mitigating), and stochastic defense (s.2018stochastic; wang2018defensive; wang2018defending; wang2019protecting). However, although these defenses alleviate the vulnerability of deep learning to some extent, they are either shown to be invalid against counter-measures of the adversary (carlini2017adversarial) or require additional resources or sacrifices. A significant trade-off of these methods is between defense effectiveness and test accuracy, where a stronger defense is often achieved at the cost of worse performance on clean examples (wang2019protecting).
Motivated by designing a defense method with less harm to test accuracy, in this article we introduce Block Switching (BS) as an effective stochastic defense strategy against adversarial attacks. BS assembles a switching block consisting of a number of parallel channels. Since the active channel at run time is random, the adversary is prevented from exploiting the weakness of a fixed model structure. On the other hand, with proper training, the BS model is capable of adapting to the switching of active channels and maintains high accuracy on clean examples. As a result, BS achieves drastic model variation, and thus has strong resistance against the adversary without a noticeable drop in legitimate accuracy. The nature of BS also enables its use jointly with other types of defenses such as adversarial training.
Our experimental results show that a BS model with 5 channels can reduce the fooling ratio (the percentage of generated adversarial examples that successfully fool the target model) of the CW attack from 100% to 21.0% on the MNIST dataset and to 22.2% on the CIFAR-10 dataset, respectively, with very minor testing accuracy loss on legitimate inputs. By comparison, another recent stochastic defense, stochastic activation pruning (SAP), only reduces the fooling ratio to 32.1% and 93.3% under the same attack. The fooling ratio can be further decreased with more parallel channels.
The rest of this article is organized as follows: In Section 2, we introduce related work on both the attacking and defending sides. The defense strategy and its analysis are given in Section 3. Experimental results are presented in Section 4, and Section 5 concludes this work.
2. Adversarial Attack
FGSM. The Fast Gradient Sign Method (FGSM) (Goodfellow2015explaining) utilizes the gradient of the loss function to determine the direction in which to modify the pixels. It is designed to be fast rather than optimal.
Specifically, adversarial examples are generated as follows:

$$x' = x - \epsilon \cdot \mathrm{sign}\big(\nabla_x J(x, t)\big)$$

where $\epsilon$ is the magnitude of the added distortion and $t$ is the target label. Since it only performs a single step of gradient descent, it is a typical example of a "one-shot" attack.
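The single-step update above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; `grad_J` is an assumed helper that returns the gradient of the attacker's loss with respect to the input:

```python
import numpy as np

def fgsm_targeted(x, grad_J, epsilon):
    """One-step targeted FGSM: move against the sign of dJ/dx so the
    loss toward the target label decreases. `grad_J` is assumed to
    map an input to the gradient of the attacker's loss w.r.t. it."""
    return x - epsilon * np.sign(grad_J(x))
```

For an untargeted attack on the true label, the sign of the step is flipped (ascending the loss instead of descending it).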
CW. The Carlini & Wagner (CW) attack (carlini2017towards) generates adversarial examples by solving the following optimization problem:

$$\min_{\delta} \; \|\delta\|_2^2 + c \cdot f(x + \delta)$$

where $c$ controls the relative importance between the distortion term $\|\delta\|_2^2$ and the loss term $f(x + \delta)$. The loss term takes the following form:

$$f(x') = \max\Big(\max_{i \neq t} Z(x')_i - Z(x')_t, \; -\kappa\Big)$$

where $Z(\cdot)$ denotes the logits of the network, $t$ is the target class, and $\kappa$ controls the confidence of the attack.
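The loss term can be computed directly from the logits. The sketch below is an illustrative rendering of the formula above, not a full CW implementation (it omits the binary search over $c$ and the gradient-descent loop):

```python
import numpy as np

def cw_loss(logits, target, kappa=0.0):
    """CW loss term f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa).
    It reaches its floor (-kappa) once the target logit dominates
    every other logit by at least the confidence margin kappa."""
    z = np.asarray(logits, dtype=float)
    best_other = np.max(np.delete(z, target))  # strongest competing class
    return max(best_other - z[target], -kappa)
```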
3.1. Block Switching Implementation
Training a Block Switching model involves two phases. In the first phase, a number of sub-models with the same architecture are trained individually from random weight initialization. With the training process and data being the same, these models tend to have similar characteristics in terms of classification accuracy and robustness, yet different model parameters due to random initialization and stochasticity in the training process.
After the first round of training, each sub-model is split into two parts. The lower parts are grouped together to form the parallel channels of the switching block, while the upper parts are discarded. The switching block is then connected to a randomly initialized common upper model, as shown in Fig. 1. At run time, a random channel is selected to be active and processes the input, while all other channels remain inactive, resulting in a stochastic model that behaves differently at different times.
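Structurally, the run-time behavior amounts to the following sketch, in which channels and the upper model are abstracted as plain callables (an assumption made here for brevity; in practice they would be trained network segments):

```python
import numpy as np

class BlockSwitching:
    """Sketch of a BS model: several parallel lower channels feed one
    common upper model; one channel is drawn at random per forward pass,
    so the adversary cannot predict which instance it is attacking."""
    def __init__(self, channels, upper, seed=None):
        self.channels = channels              # list of lower sub-models
        self.upper = upper                    # common upper model
        self.rng = np.random.default_rng(seed)

    def forward(self, x):
        k = self.rng.integers(len(self.channels))  # active channel; others idle
        return self.upper(self.channels[k](x))
```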
The whole BS model is then trained for a second round on the same training dataset in order to regain classification accuracy. In this phase, the common upper model is forced to adapt to inputs given by different channels, so that a legitimate example can be correctly classified whichever channel is active. Usually, this phase is much faster than the first round of training, since the parallel channels are already trained.
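The second-round procedure can be sketched as follows, using a single linear map as an illustrative stand-in for the common upper model (an assumption of this sketch; the actual upper model is a trained network segment):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_phase2(channels, upper_w, data, lr=0.1, epochs=1):
    """Phase-2 sketch: per step, one randomly chosen channel processes
    the example, and only the common upper weights are updated toward
    the label via a plain squared-error gradient step."""
    for _ in range(epochs):
        for x, y in data:
            k = rng.integers(len(channels))   # random active channel
            h = channels[k](x)                # that channel's representation
            err = upper_w @ h - y             # residual of the upper model
            upper_w -= lr * np.outer(err, h)  # gradient step on upper model only
    return upper_w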
3.2. Defense Analysis
Let $F$ denote the learned mapping of a stochastic model. Note that $F$ is a stochastic function, and its output $F(x)$ is now a random variable. The defense against adversarial attacks can be explained in two aspects.
Stochasticity of Inference: Since $F$ is stochastic, an adversarial example that fools an instance of the stochastic model sampled at time $t_1$ may not be able to fool another instance sampled at time $t_2$.
Stochasticity of Gradient: Due to the stochasticity of the network, the gradient of the attacker's objective loss with respect to the input is also stochastic. That is, the gradient backpropagated to the input is just one instance sampled from the gradient distribution, and this instance may not represent the most promising gradient descent direction.
Note that these two aspects are actually correlated. From the attacker's point of view, the goal is to find

$$\delta^* = \arg\max_{\delta} \; \mathbb{E}\big[\mathbb{1}\big(F(x+\delta) = t\big)\big]$$

where $\mathbb{1}(\cdot)$ outputs 1 if the attack is successful and 0 otherwise, and $t$ is the target class. Therefore, the attacker benefits from using stochastic gradients rather than gradients from a fixed model instance, in order to generate adversarial examples that are robust to model variation. In other words, the adversary cannot benefit from simply disabling the variation of the stochastic model and crafting perturbations using a fixed model instance.
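An adaptive attacker would therefore estimate the expected input gradient by Monte-Carlo sampling over the model's randomness, as in the following sketch (`grad_sample` is an assumed helper that draws one model instance, e.g. one active channel, and returns its input gradient):

```python
import numpy as np

def expected_input_gradient(grad_sample, x, n_samples=16):
    """Monte-Carlo estimate of E[dJ/dx] for a stochastic model: each call
    to `grad_sample` samples one model instance and backpropagates its
    gradient to the input; the samples are then averaged."""
    grads = [np.asarray(grad_sample(x)) for _ in range(n_samples)]
    return np.mean(grads, axis=0)
```

The more dispersed the gradient distribution, the less informative this average becomes, which is exactly the property the analysis above asks of a good randomization strategy.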
The above analysis holds for any stochastic model, but the question is: what makes a good randomization strategy against adversarial attacks? Intuitively, a good randomization strategy should cause the input gradients to have wider distributions. In an extreme case, if the gradient direction is uniformly distributed, performing gradient descent is no better than a random walk, which means the attacker cannot take any advantage of the target model.
Knowing this, we explain why block switching performs better than existing stochastic strategies such as SAP. In Fig. 2 we visualize the gradient distributions under CW attacks against a SAP model and a BS model, respectively. We observe that the gradient (of the attacker's objective function w.r.t. the input) distribution of the SAP model is unimodal and concentrated, while the gradient of BS has a multimodal distribution over a wider range. This distribution indicates that it is harder to attack BS than SAP, which is verified by our experimental results in Section 4.
Usually, dramatic variations of a stochastic model tend to harm classification accuracy on clean inputs; that is why in SAP, smaller activation outputs have a higher chance of being dropped. The reason that Block Switching maintains high test accuracy despite drastic model change is that each channel connected to the common upper model is able to function independently. As long as the common upper model can learn to adapt to the different knowledge representations given by different channels, the stochastic model will not suffer significant test accuracy loss.
An interesting question that readers may ask is: why does the stochasticity of the model not impede the second round of training? The answer is that although the gradients with respect to the input are random variables, the gradients with respect to the model parameters are not. Since the gradients of the inactive channels are simply zero, only the weight parameters in the active channel are updated in each training step. Therefore, although the set of weights being updated alternates, the gradients with respect to the model parameters are deterministic at any given time.
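This can be seen in a toy computation. For linear channels $y = w_i \cdot x$ (an illustrative simplification; real channels are network segments), the parameters of any inactive channel never enter the forward pass, so their loss gradients are exactly zero:

```python
def channel_param_grads(weights, active, x, upstream=1.0):
    """Toy linear channels y = w_i * x: the loss gradient w.r.t. w_i is
    `upstream * x` for the active channel (chain rule through y = w*x)
    and exactly zero for every inactive channel, since those parameters
    do not appear in the forward computation at all."""
    return [upstream * x if i == active else 0.0
            for i in range(len(weights))]
```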
4. Experiments

In this section, we compare the defense effectiveness of the regular, SAP, and BS models against the FGSM (Goodfellow2015explaining) and CW (carlini2017towards) attacks on the MNIST (lecun1998mnist) and CIFAR-10 (krizhevsky2009learning) datasets. FGSM is a typical "one-shot" method that performs only one gradient descent step, and the CW attack is known to be the strongest attack method so far (akhtar2018threat).
Both datasets contain separate training and testing sets. In our experiments, the training sets are used to train the defending models, and the testing sets are used to evaluate classification performance and generate adversarial examples.
This section is organized as follows: Details about the defending models, including their architectures and training methods, are given in Section 4.1. Defense records against the FGSM and CW attacks are shown in Section 4.2. A study of how the number of channels in the block switching influences its defense effectiveness and classification accuracy is provided in Section 4.3.
4.1. Model Details
We use two standard Convolutional Neural Network (CNN) architectures for the MNIST and CIFAR-10 datasets, respectively, as they have served as baseline models repeatedly in previous works (papernot2016distillation). Both CNNs have 4 convolutional layers, 2 pooling layers, and 2 fully-connected layers, but the kernel size of the convolution filters and the layer widths differ.
Both models are trained using stochastic gradient descent with a mini-batch size of 128. Dropout (srivastava2014dropout) is used as regularization during training.
SAP can be applied post hoc to a pre-trained model (dhillon2018stochastic). Therefore, in order to make the experimental results more comparable, we use the same trained weights for the SAP model as for the regular model. Stochastic activation pruning is added between the first and second fully-connected layers.
The switching block in this experiment consists of 5 channels. During the first round of training, 5 regular models are trained as described above. Each regular model is split into a lower part, containing all convolutional layers and the first fully-connected layer, and an upper part, containing the second fully-connected layer. The lower parts of the regular models are kept, providing the parallel channels of the block switching, while the upper parts are discarded. An upper model, which has the same architecture as the upper part of the regular models except that its weights are randomly initialized, is added on top of all channels. The whole block switching model is then trained on the original training set for a second time. We found that the second round of training is much faster than the first: on the MNIST dataset, block switching is retrained for 1 epoch, and on the CIFAR-10 dataset, for 5 epochs.
[Table 1: Test accuracy of each model on MNIST and CIFAR-10.]
The test classification accuracy of all models is summarized in Table 1. The direct comparisons are between the regular model and the SAP model, since they share the same weights, and between the average of the sub-models used to construct block switching and block switching itself. We can conclude that both SAP and block switching are excellent at maintaining testing accuracy.
4.2. Defense against Adversarial Attacks
We use the fooling ratio, which is the percentage of adversarial examples generated by an attack method that successfully fool a neural network model into predicting the target label, to evaluate the defense effectiveness of the target model. The lower the fooling ratio, the stronger the model is at defending against adversarial attacks.
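Computing this metric is straightforward; the sketch below assumes a `predict` callable that maps an input to a predicted class label:

```python
def fooling_ratio(predict, adv_examples, target_labels):
    """Percentage of adversarial examples classified as the attacker's
    target label by `predict` (an assumed input -> class callable)."""
    hits = sum(predict(x) == t for x, t in zip(adv_examples, target_labels))
    return 100.0 * hits / len(adv_examples)
```

For a stochastic model such as BS, each prediction already samples a random active channel, so repeated evaluations estimate the expected fooling ratio.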
We also record the average $L_2$ norm of the distortion of the generated adversarial examples from the legitimate input images, since it is only fair to compare two attacks at similar distortion levels. For attacks like the CW attack that use a weighted objective function balancing distortion against misclassification, a large distortion also indicates that it is hard for the attacking algorithm to find an adversarial example within a small region.
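The distortion statistic can be computed as follows; this is a minimal sketch assuming examples are given as paired arrays, and assuming the $L_2$ norm over flattened images:

```python
import numpy as np

def average_l2_distortion(clean, adversarial):
    """Mean L2 norm of (x_adv - x) over paired clean/adversarial examples,
    with each image flattened before taking the norm."""
    return float(np.mean([np.linalg.norm(np.ravel(a) - np.ravel(c))
                          for c, a in zip(clean, adversarial)]))
```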
Experiments on MNIST Dataset
For the sake of reproducibility of our experiments, we report the hyper-parameter settings we use for the FGSM and CW attacks. FGSM has one hyper-parameter: the attacking strength $\epsilon$, as shown in Equation (1). When using an $\epsilon$ for which the $L_2$ norm of the adversarial examples roughly matches that of CW, the fooling ratio is far too small; thus we also test a larger $\epsilon$ in order to provide a more meaningful comparison, although the norm is then significantly larger. For the CW attack, gradient descent is performed for 100 iterations with a step size of 0.1. The number of binary search iterations for $c$ in Equation (2) is set to 10.
We use FGSM and CW attacks to generate adversarial examples targeting the regular model, the SAP model and block switching respectively. Experimental results are shown in Table 2.
Although the SAP model demonstrates extra robustness against both FGSM and CW compared to the regular model, block switching is clearly superior and decreases the fooling ratio further.
Experiments on CIFAR-10 Dataset
We choose $\epsilon$ for FGSM in this experiment so that the adversarial examples have a distortion level similar to that of the examples generated by the CW attack. The hyper-parameter setting for the CW attack is the same as above.
Experimental results on the CIFAR-10 dataset are shown in Table 3. Block switching significantly decreases the fooling ratios of FGSM and CW to 8.1% and 22.2%, respectively, while the SAP model only shows minor advantages over the regular model.
4.3. The Effect of Channel Number
To analyze how the number of channels in a block switching model affects its defense effectiveness as well as its testing accuracy, we run the CW attack on BS models with the number of channels ranging from 1 (which is a regular model) to 9.
In Fig. 3 we plot the fooling ratio, distortion, and test accuracy over different channel numbers. In general, the defense becomes stronger with more channels, and the fooling ratio is lowest, 12.1%, when using 9 channels. The fooling ratio drops rapidly from 1 channel to 4 channels, while the drop decelerates after 5 channels, indicating that the effectiveness provided by switching channels starts to saturate. The increase in the distortion of adversarial examples also indicates that BS with more channels is stronger at defending against adversarial attacks. The trend of testing accuracy, on the other hand, is almost flat, with a very slight descent from 78.31% to 78.17%. This indicates that BS is very effective at defending against adversarial attacks with very minor classification accuracy loss.
5. Conclusion

In this paper, we investigate block switching as a defense against adversarial perturbations. We provide an analysis of how the switching scheme defends against adversarial attacks, as well as empirical results showing that a block switching model can decrease the fooling ratio of the CW attack from 100% to 12.1%. We also show that a stronger defense can be achieved by using more channels, at the cost of a slight classification accuracy drop.
Block switching is easy to implement and requires neither additional training data nor information about the potential adversary. Also, it incurs no extra computational cost compared to a regular model in the inference phase, since only one channel is used at a time. In practice, the parallel channels can be stored in a distributed fashion with periodic updating, which provides extra protection against leaking important model information.
More importantly, BS demonstrates that it is possible to enhance model variation while maintaining test accuracy at the same time, and we hope this paper can inspire more work in this direction.
- footnotetext: This work is supported by the Air Force Research Laboratory FA8750-18-2-0058.
- conference: AdvML’19: Workshop on Adversarial Learning Methods for Machine Learning and Data Mining at KDD; August 5th, 2019; Anchorage, Alaska, USA