Discretization based Solutions for Secure Machine Learning against Adversarial Attacks
Adversarial examples are perturbed inputs that are designed (from a deep learning network’s (DLN) parameter gradients) to mislead the DLN during test time. Intuitively, constraining the dimensionality of inputs or parameters of a network reduces the ‘space’ in which adversarial examples exist. Guided by this intuition, we demonstrate that discretization greatly improves the robustness of DLNs against adversarial attacks. Specifically, discretizing the input space (reducing the allowed pixel levels from 256 values, or 8-bit, to 4 values, or 2-bit) substantially improves the adversarial robustness of DLNs over a wide range of perturbations, for minimal loss in test accuracy. Furthermore, we find that Binary Neural Networks (BNNs) and related variants are intrinsically more robust than their full-precision counterparts in adversarial scenarios. Combining input discretization with BNNs further improves robustness, even waiving the need for adversarial training for certain perturbation magnitudes. We evaluate the effect of discretization on the MNIST, CIFAR10, CIFAR100 and Imagenet datasets. Across all datasets, we observe maximal adversarial resistance with input discretization, which incurs only a small loss in adversarial accuracy compared to clean test accuracy.
Deep Learning Networks (DLNs) have exhibited better-than-human performance in several vision-related tasks [1]. However, they have recently been shown to be vulnerable to adversarial attacks [2, 3, 4]: slight changes to input pixel intensities that fool a DLN into misclassifying an input with high confidence (Fig. 1). What is more worrying is that such small changes (that craft adversaries) are visually imperceptible to humans, yet mislead a DLN. This vulnerability severely limits the safe use and deployment of DLNs in real-world scenarios. For instance, an attacker can fool a DLN deployed on a self-driving car into mispredicting a STOP sign as a GO signal, and cause fatal accidents.
Subsequently, there have been several theories pertaining to the adversarial susceptibility of DLNs. The most common one suggests that the presence of adversaries is an outcome of the excessive linearity of a DLN (a property of high-dimensional dot-products). While one can argue that ReLU-type activations impose non-linearity in a model, linear operations such as convolution and pooling outnumber the non-linear ReLU operations. Further, ReLU is itself linear in the positive-input regime, and hence leaves a DLN sufficiently linear. This linearity causes a model to extrapolate its behavior to points in the hyper-space (of data and model parameters) that lie outside the training/test data manifold. Adversarial inputs, essentially, are images synthesized such that they ‘lie far’ from the typical data manifold and hence get misclassified.
Fig. 2(a) demonstrates this data manifold intuition and adversarial input creation with a cartoon. Since DLNs are discriminative models, they partition a very high-dimensional input space into different classes by learning appropriate decision boundaries. The class-specific decision boundaries simply divide the space into hyper-volumes. Interestingly, these hyper-volumes encompass the training data examples as well as large areas of unpopulated space that is arbitrary and untrained. This extrapolation of the decision boundary beyond the training data space is a result of ‘linearity’, which in turn gives rise to generalization ability. The fact that a model trained only on training data is able to predict well on unseen test data (termed generalization) is a favorable outcome of this extrapolation property. Unfortunately, this property also exposes a model to adversarial attacks. Adversarial data are created by simply adding small perturbations to an input data point, which shift it from its manifold (or hyper-volume) to a different hyper-volume (on which the model has not been trained and shows extrapolated behavior), causing misclassification.
From the above intuition, one can deduce that adding regularization features to a DLN’s training will improve its generalization ability and, in turn, its adversarial robustness. In fact, the most effective form of adversarial defense so far is training a model with adversarial data augmentation (called adversarial training). Explicit training on adversarial data increases the model’s capability to generalize and hence predict correctly on unseen adversarial data. However, the above discussion on excessive linearity and hyper-space dimensionality points to an alternate and unexplored regularization possibility: discretizing or constraining the data manifold to achieve adversarial robustness. For instance, discretizing the input data (say from 256 pixel value levels (or 8-bit) to 4 levels (or 2-bit)) reduces the regions into which data can be perturbed. In other words, the minimum perturbation required to shift a particular data point from one hyper-volume to another will increase in a discretized space (Fig. 2(b)). This in turn will intrinsically improve the resistance of a DLN. Similarly, discretizing the parameter space (as in binarized neural networks (BNNs) [6]) will introduce discontinuities and quantization in the manifold (that is non-linear by nature). This will further decrease the extent of hyper-volume space that is arbitrary/untrained and thus reduce adversarial susceptibility (Fig. 2(b)). Such discretization methods have the added advantage of computational efficiency. In fact, low-precision neural networks (BNNs and related XNOR-Nets [7]) were introduced with the key motive of reducing memory and power consumption for hardware deployment of DLNs.
In this paper, we demonstrate that discretization, besides offering obvious efficiency improvements, has far-reaching implications for a model’s adversarial resistance. We focus on three different discretization themes and illustrate their suitability for improving a DLN’s adversarial robustness, as follows:
Discretization of input space: We reduce the input dimensionality by quantizing the RGB pixel intensities from their full 8-bit range to lower bit-widths. We show that for minimal loss in accuracy, the adversarial robustness of a model substantially improves (with only a small accuracy difference between clean test and adversarial test data), even without any adversarial training. Furthermore, we show that combining adversarial training with 2-bit input discretization makes a model substantially more robust (than adversarial training with full 8-bit input precision) for large perturbation ranges.
Discretization of parameter space: We show that models trained with low-precision weights and activations, such as BNNs, are intrinsically more robust to adversarial perturbations than full precision networks. Furthermore, we find that training BNNs with adversarial data augmentation is difficult. However, increasing the capacity of the BNN (with more neurons and weights) minimizes the adversarial training difficulty. For sufficient model capacity, adversarially trained BNNs yield higher adversarial robustness than their full-precision counterpart.
Discretization of both input & parameter space: We demonstrate that combining input discretization with binarized weight/activation training greatly improves a model’s robustness. In fact, training a BNN with input discretization (say, 2-bit input) yields adversarial accuracy similar to or better than that of an adversarially trained full-precision model. Thus, the combined discretization scheme can be seen as an efficient alternative for achieving adversarial robustness without the expensive data augmentation procedure.
To the best of our knowledge, we are the first to formally evaluate and analyze the impact of input and parameter space discretization on robustness. A recent work [8] demonstrated the effectiveness of BNNs against adversarial attacks and observed a similar difficulty in adversarial training with BNNs. However, it did not consider input space discretization and its impact on robustness. While complementing those results, we show that quantizing the input pixels of a BNN during training greatly improves its robustness, even waiving the need for expensive and time-consuming adversarial training, for certain perturbation ranges.
2 Background on Adversarial
Generating Adversaries: Adversarial examples are created using a trained DLN’s parameters and gradients. As shown in Fig. 1, the adversarial perturbation, $\Delta$, is not just random noise, but is carefully designed to bias the network’s prediction on a given input towards a wrong class. Goodfellow et al. [3] proposed a simple method called the Fast Gradient Sign Method (FGSM) to craft adversarial examples by linearizing a trained model’s loss function ($L$, say cross-entropy) with respect to the input ($X$):
$$X_{adv} = X + \epsilon \, \mathrm{sign}\left(\nabla_X L(\theta, X, y_{true})\right)$$
Here, $y_{true}$ is the true class label for the input $X$, $\theta$ denotes the model parameters (weights, biases etc.) and $\epsilon$ quantifies the magnitude of distortion. The net perturbation added to the input ($\Delta = \epsilon \, \mathrm{sign}(\nabla_X L(\theta, X, y_{true}))$) is, thus, regulated by $\epsilon$. Distorting the input image in the direction of steepest gradient has the maximal effect on the loss function during prediction. Intuitively, referring to Fig. 2, this distortion shifts the data point from the trained region or hyper-volume to an arbitrary region, thereby fooling the model.
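As a minimal sketch of the FGSM step described above, the following uses a tiny logistic-regression model whose input gradient is available in closed form. The model, its weights, the toy input and all function names are illustrative assumptions and are not the networks evaluated in this work.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(w, b, x, y_true):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))
    grad_x = [(p - y_true) * wi for wi in w]  # dL/dx for logistic regression
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y_true, eps):
    """X_adv = X + eps * sign(grad_X L), clipped to the valid pixel range [0, 1]."""
    _, grad_x = loss_and_input_grad(w, b, x, y_true)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad_x)]

# Hypothetical toy weights and a 4-pixel input with intensities in [0, 1].
w, b = [2.0, -1.5, 0.5, 1.0], -0.2
x, y_true = [0.6, 0.2, 0.5, 0.7], 1
x_adv = fgsm(w, b, x, y_true, eps=0.1)
clean_loss, _ = loss_and_input_grad(w, b, x, y_true)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y_true)
```

Every pixel moves by at most `eps`, and for this toy model the loss on `x_adv` is larger than on `x`, which is the effect the gradient-sign step is designed to produce.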
Types of Attacks: In machine learning literature, two kinds of attacks are used to study adversarial robustness: Black-Box (BB) and White-Box (WB). WB adversaries are created using the target model’s parameters, that is, the attacker has full knowledge of the target model’s training information. BB attacks refer to the case when the attacker has no knowledge of the target model’s parameters. In this case, adversaries are created using the parameters of a different source model trained on the same classification task as the target model. Since BB attacks are transferred onto the target model, they are weaker than WB attacks. Security against WB attacks is a stronger notion, and robustness against WB attacks guarantees robustness against BB attacks for a similar perturbation ($\epsilon$) range.
Adversarial Training: Adversarial training simply injects adversarial examples into the training dataset of a model. For each training sample in the dataset, an adversary is created using FGSM. There are several forms of adversarial training. For instance, instead of using the same $\epsilon$ for all training examples, [10, 9] propose to sample a unique $\epsilon$ (from a random normal distribution) for each training example. This increases the variation in the adversaries created, thereby increasing the robustness of a network to a larger range of $\epsilon$ values. The authors in [11] use WB adversaries created using a multi-step variant of FGSM to guarantee a strong defense against both BB and WB attacks. Note, the common theme across all adversarial training methods is data augmentation.
In this work, we focus on adversarial attacks created using FGSM and evaluate the robustness of models against WB adversaries. We evaluate a model’s robustness by reporting its adversarial accuracy on the adversarial dataset created from the test data for a given task.
We conduct a series of experiments for each discretization theme, primarily using MNIST [12] (Fully Connected Network, FCN) and CIFAR10 [13] (AlexNet architecture), detailing the advantages and limitations of each approach. We compare the adversarial robustness of each discretization approach with its full-precision counterpart (with and without adversarial training), using $\epsilon$ values reported in recent works [9, 11]. For adversarial training, we employ Random-step FGSM (R-FGSM) proposed in [9] to create a variety of training set adversaries. R-FGSM perturbs the input with a small random step (sampled from a normal distribution $\mathcal{N}(0, I)$) before adding the loss gradient to the input: $X_{adv} = X' + (\epsilon - \alpha)\,\mathrm{sign}(\nabla_{X'} L(\theta, X', y_{true}))$, where $X' = X + \alpha\,\mathrm{sign}(\mathcal{N}(0, I))$. We use WB adversarial training to confer strong robustness toward all forms of attacks. Note, the perturbation settings used during adversarial training differ between MNIST and CIFAR. For evaluating the robustness of parameter space discretization, we use BNNs [6, 14] on the CIFAR10 and MNIST datasets. We also evaluate the robustness of discretization methods on large-scale datasets, CIFAR100 (ResNet20 architecture [15]) and Imagenet [16] (AlexNet architecture), using XNOR networks [7, 17]. Please note, for MNIST we use two different FCN architectures: FCN1, a 4-hidden-layer network with 6144 neurons each (784-6144(4)-10), and FCN2, a 4-hidden-layer network with 600 neurons each (784-600(4)-10). We imported GitHub models [17, 14] for implementing our experiments, and used the same hyperparameters (such as weight decay value, learning rate etc.) as in [17, 14] to train our models. It is worth mentioning that our paper is the only other work besides [8] demonstrating the effectiveness of a discretized/binarized parameter space against adversarial attacks. While [8] conducted experiments with various forms of attacks (primarily on MNIST), we restrict ourselves to the WB attack scenario and extend our analysis to larger datasets.
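The random-step variant of FGSM used for adversarial training can be sketched as follows, assuming the formulation of Tramèr et al. in which a random sign step of size `alpha` is followed by a gradient-sign step of size `eps - alpha`. The logistic model, the toy input and the values of `eps` and `alpha` are hypothetical and are not the settings used in the experiments.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def clip01(v):
    return min(1.0, max(0.0, v))

def input_grad(w, b, x, y_true):
    """Gradient of the logistic cross-entropy loss w.r.t. the input x."""
    p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    return [(p - y_true) * wi for wi in w]

def r_fgsm(w, b, x, y_true, eps, alpha, rng):
    """Random step of size alpha, then a gradient-sign step of size (eps - alpha)."""
    assert 0.0 <= alpha <= eps
    # X' = X + alpha * sign(N(0, 1)), sampled independently for every pixel.
    x_prime = [clip01(xi + alpha * sign(rng.gauss(0.0, 1.0))) for xi in x]
    grad = input_grad(w, b, x_prime, y_true)
    # X_adv = X' + (eps - alpha) * sign(grad_{X'} L)
    return [clip01(xp + (eps - alpha) * sign(g)) for xp, g in zip(x_prime, grad)]

rng = random.Random(0)
w, b = [2.0, -1.5, 0.5, 1.0], -0.2
x, y_true = [0.6, 0.2, 0.5, 0.7], 1
eps, alpha = 0.1, 0.05  # hypothetical values, not the settings of the paper
x_adv = r_fgsm(w, b, x, y_true, eps, alpha, rng)
max_shift = max(abs(a - c) for a, c in zip(x_adv, x))
```

Because the two steps have sizes `alpha` and `eps - alpha`, the total per-pixel shift never exceeds `eps`, while the random first step varies the adversaries generated from the same training example.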
3.1 Discretization of Input Space
With input space discretization, we convert the raw integer pixel intensities ($X_{8b}$), which are typically 8-bit (256-level) values, to a lower-precision range of $n$ bits ($X_{nb}$) as:
$$X_{nb} = X_{8b} \,//\, 2^{8-n}$$
where $//$ denotes integer division. Such quantization reduces the number of data points (given a grayscale input image of size $N \times N$) in the manifold from $256^{N \times N}$ to $(2^n)^{N \times N}$. This can be broadly interpreted as reducing the redundancy in the input data. Fig. 3 illustrates sample CIFAR10 images discretized to varying bit-widths. The corresponding accuracy (trained on AlexNet for 20 epochs) is shown in Table 1. There is a natural tradeoff between input discretization and the overall accuracy of a network. Yet, the test accuracy loss from full 8-bit precision to moderately reduced input precision is small (Table 1). This verifies the presence of unnecessary and redundant input dimensions that do not substantially contribute to the classification task or accuracy. More aggressive discretization decreases the accuracy by a larger margin. Note, this accuracy loss can be minimized by training for more epochs. However, for iso-comparison, we fix the number of epochs across all experiments for a given dataset.
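The quantization step and the resulting shrinkage of the input space can be sketched as below. This assumes that discretization is a plain integer division of each 8-bit intensity by 2^(8-n); whether the resulting levels are rescaled afterwards is not specified in the text, so the sketch leaves them as level indices. The sample pixel row is hypothetical.

```python
def discretize(pixels, n_bits):
    """Map 8-bit intensities (0..255) to 2**n_bits levels by integer division."""
    if not 1 <= n_bits <= 8:
        raise ValueError("n_bits must be between 1 and 8")
    step = 2 ** (8 - n_bits)  # width of each quantization bin
    return [p // step for p in pixels]

row = [0, 37, 64, 100, 128, 200, 255]  # hypothetical 8-bit pixel values
row_2b = discretize(row, 2)            # 4 allowed levels: 0..3
row_4b = discretize(row, 4)            # 16 allowed levels: 0..15

# Number of distinct N x N grayscale images, expressed as a power of two:
# 256^(N*N) = 2^(8*N*N) before and (2^n)^(N*N) = 2^(n*N*N) after discretization.
N = 28
log2_images_8b = 8 * N * N
log2_images_2b = 2 * N * N
```

For a 28 x 28 grayscale image, 2-bit discretization reduces the count of representable images from 2^6272 to 2^1568.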
A remarkable outcome of this discretization method is the substantial improvement in a model’s adversarial accuracy. Fig. 4 illustrates the evolution of adversarial accuracy of the CIFAR10 models (from Table 1) with increasing level of perturbation, $\epsilon$. Note, $\epsilon = 0$ corresponds to clean test accuracy. It is clear that clamping the input dimensionality to lower values increases the resistance of the model to larger magnitudes of distortion. We can thus deduce that removing the redundant dimensions in the input reduces the overall hyper-volume space, thereby leaving ‘less’ space for shifting or adversarially perturbing a data point (referring to the Fig. 2 intuition). The full-precision (8-bit) input model shows a decline in accuracy even for a small value of $\epsilon$. This further confirms that the presence of redundant input dimensions allows even small perturbations to shift a data point. In contrast, increasing discretization increases the minimum $\epsilon$ that affects a model’s accuracy catastrophically. What is surprising is that for low-precision input, a model’s adversarial accuracy for large $\epsilon$ is almost the same as its clean accuracy. For still larger $\epsilon$, the accuracy of all models but one declines substantially. This is a very interesting result, since we have not employed any adversarial training and still achieve substantial adversarial resistance for a large $\epsilon$ range.
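One way to make the ‘minimum perturbation’ intuition concrete is to compute, for every 8-bit pixel value, the smallest integer change that moves it into a different quantization bin. The sketch below assumes that a perturbed image is re-quantized by integer division by 2^(8-n) before reaching the model; this is an assumption made for illustration and not a description of the evaluation pipeline.

```python
def min_flip_perturbation(pixel, n_bits):
    """Smallest integer |delta| that moves an 8-bit pixel into a different bin."""
    step = 2 ** (8 - n_bits)
    offset = pixel % step      # position inside the current bin
    up = step - offset         # distance to the start of the next bin
    down = offset + 1          # distance to the end of the previous bin
    candidates = []
    if pixel + up <= 255:
        candidates.append(up)
    if pixel - down >= 0:
        candidates.append(down)
    return min(candidates)

def mean_min_flip(n_bits):
    """Average of the minimum bin-changing perturbation over all 256 pixel values."""
    return sum(min_flip_perturbation(p, n_bits) for p in range(256)) / 256

averages = {n: mean_min_flip(n) for n in (8, 4, 2, 1)}
```

Under this assumption the average minimum perturbation grows from 1 intensity level at 8 bits to 5, 24.5 and 64.5 levels at 4, 2 and 1 bits respectively, which mirrors the trend that coarser inputs tolerate larger distortions.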
Fig. 4(b) shows the adversarial accuracy results for MNIST (trained on FCN2 for 10 epochs). We observe a similar trend of increasing adversarial resistance with increasing discretization for larger $\epsilon$. Since MNIST is a simple dataset with a predominantly black background, moderate input discretization does not contribute much to adversarial resistance until we go to extremely low precision. In fact, at extremely low precision, discretization yields adversarial accuracy similar to the clean test accuracy, exhibiting the effectiveness of this technique even for simple datasets.
Next, we trained the input-discretized CIFAR10 and MNIST models with adversarial training to observe the improvement in adversarial accuracy compared to adversarial training with 8-bit input (Table 2). Compared to the results in Fig. 4(a, b), adversarial training substantially improves the robustness of a model with full 8-bit input for larger $\epsilon$ values. Input discretization greatly furthers this robustness, with accuracy gains across different perturbation ranges in both MNIST and CIFAR10. It is worth mentioning that the CIFAR10 accuracy (79%) obtained without adversarial training on discretized input is comparable to the accuracy (83%) obtained with adversarial training. This shows that input discretization is a good regularization scheme that improves the generalization capability of a network on adversarial data. Note, for the largest perturbations in the case of CIFAR10, the accuracy is similar across input precisions, since the adversarial training was conducted with adversaries created using a smaller $\epsilon$. Including larger-perturbation adversaries during adversarial training will yield improved accuracy gains.
3.2 Discretization of Parameter Space
Since input discretization gave us such promising results, we were naturally inclined toward analyzing a binarized neural network’s (BNN) behavior against adversarial attacks. Here, the weights and activations (or parameters) are discretized to extremely low-precision values ($\pm 1$). The discretization constraints are imposed on a BNN during training, wherein the parameters are clamped to $\{+1, -1\}$ after every backpropagation step. One can view this discretization as an implicit form of regularization. In fact, it is this extreme form of regularization that makes a BNN difficult to train (the clean test accuracy observed with BNNs is typically lower than that of full-precision networks). As suggested in [8], the difficulty in training a BNN translates to difficulty in attacking the BNN as well. Referring to the data manifold intuition (Fig. 2), we can deduce that constraining the parameter space during a model’s training will introduce discontinuities and non-smoothness in its decision boundary. Since adversaries are created using the gradients of a model (a property of the model’s decision boundary), generating gradients (and hence adversaries) for non-smooth functions will be difficult. This in turn will make a BNN less susceptible to adversaries. Note, the input image to a BNN is full precision.
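The binarization constraint can be sketched as follows, assuming the commonly used BinaryNet-style scheme in which real-valued latent weights are kept for the update, binarized with a sign function for the forward pass, and clamped to [-1, 1] after each step. The single-neuron setting, the gradient values and the function names are hypothetical.

```python
def sign_binarize(weights):
    """Deterministic binarization: map each latent weight to +1 or -1."""
    return [1.0 if w >= 0 else -1.0 for w in weights]

def clamp(weights, lo=-1.0, hi=1.0):
    """Keep latent weights inside [-1, 1] so they stay near the binary levels."""
    return [min(hi, max(lo, w)) for w in weights]

def binary_neuron(latent_w, x):
    """Forward pass of one neuron with binarized weights and a sign activation."""
    w_bin = sign_binarize(latent_w)
    pre = sum(wb * xi for wb, xi in zip(w_bin, x))
    return 1.0 if pre >= 0 else -1.0

def sgd_step(latent_w, grads, lr):
    """Update the real-valued latent weights, then clamp them after the step."""
    return clamp([w - lr * g for w, g in zip(latent_w, grads)])

latent = [0.3, -0.8, 0.05, 0.9]
grads = [0.5, -4.0, 1.0, -3.0]  # hypothetical gradients passed straight through
latent = sgd_step(latent, grads, lr=0.1)
binary = sign_binarize(latent)
out = binary_neuron(latent, [0.2, 0.4, 0.1, 0.9])
```

The sign function has zero gradient almost everywhere, which is why such schemes rely on a surrogate gradient for training and why gradient-based adversary generation is expected to be harder for these networks.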
Fig. 5(a) compares the adversarial accuracy obtained for varying $\epsilon$ values for a CIFAR10 BNN (AlexNet architecture) against a full-precision network of similar architecture (with 32-bit precision for weights and activations). We trained the networks for 40 epochs, since BNNs require more training iterations to attain accuracy comparable to that of a full-precision network. Here, we do not incorporate input discretization in our analyses. All networks are fed 8-bit inputs. In Fig. 5(a), for small $\epsilon$, the BNN shows better adversarial resistance (i.e. adversarial accuracy is closer to clean accuracy). However, the BNN’s accuracy declines steeply as we move toward larger perturbation ranges. We note a similar trend for MNIST (trained for 10 epochs on the FCN2 architecture), wherein the full-precision network yields better robustness than the BNN for larger $\epsilon$. These results contradict our intuition that increased discretization of BNNs should result in lower adversarial susceptibility.
To understand this, we calculated the L1 norm of the first hidden layer activation of the FCN2 network in response to clean input images. We found that BNNs generally have a larger variance and range of values than a full-precision network. Since a BNN uses weight values (+/-1) that are typically of greater magnitude than the small weight values of a full-precision network, we observe a larger range in the former case. Interestingly, we find that the L1 norms of the BNN activations (in response to adversarial images perturbed with lower $\epsilon$ values) lie approximately within the same range as in the clean input case. In contrast, the L1 norms for higher-$\epsilon$ adversaries have a much higher range. For a full-precision network, the L1 norm ranges of the different adversaries and clean data typically intersect with each other owing to the lower weight values (Fig. 6). We believe that the extreme quantization of weight values in BNNs to higher magnitudes causes adversarial susceptibility for larger perturbations. While the L1 norm analysis is not very rigorous from a mathematical standpoint, it prompted us to increase the capacity (more neurons and weights) of the network. The motive here is that increasing the capacity would increase the overall range of activation values, which might then accommodate larger perturbations. Greatly expanding the network capacity for MNIST (FCN1 architecture) yielded a sizable improvement in adversarial resistance with the BNN as compared to its corresponding full-precision counterpart (Fig. 5(b)). This is a crucial detail of our analysis: while BNNs are intrinsically robust to adversaries (for small $\epsilon$), only models with sufficient capacity can withstand large $\epsilon$ values.
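The activation-norm comparison described above can be sketched as follows. The sketch assumes that the norm is taken over the pre-activation values of the first layer (the exact point at which the norm was measured is not specified), and it uses a hypothetical randomly initialized layer with small Gaussian weights together with its +/-1 binarized version rather than the trained FCN2 network.

```python
import random

def l1_norm_first_layer(weights, x):
    """L1 norm of the first-layer pre-activations for one input vector."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return sum(abs(a) for a in pre)

rng = random.Random(0)
n_in, n_hidden = 16, 8

# Hypothetical layers: small full-precision weights and their +/-1 binarization.
full_w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
bin_w = [[1.0 if w >= 0 else -1.0 for w in row] for row in full_w]

# Hypothetical inputs with intensities in [0, 1).
inputs = [[rng.random() for _ in range(n_in)] for _ in range(50)]
full_norms = [l1_norm_first_layer(full_w, x) for x in inputs]
bin_norms = [l1_norm_first_layer(bin_w, x) for x in inputs]

full_range = max(full_norms) - min(full_norms)
bin_range = max(bin_norms) - min(bin_norms)
```

Comparing `bin_range` with `full_range` on clean and adversarial inputs is one way to reproduce the kind of range analysis referred to in the text; with unit-magnitude weights one would expect the binarized layer to produce the wider spread.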
Even with adversarial training, we observed the same trend: binarized networks of insufficient capacity do not yield adversarial robustness as good as that of a full-precision network (Table 3). For CIFAR10, the full-precision network is the clear winner, while for MNIST (with the excess-capacity FCN1 architecture), the BNN yields improved robustness. A noteworthy observation here is that adversarial training substantially improves the robustness of a full-precision network (see CIFAR10 results in Fig. 5(a), Table 3), while BNNs do not benefit much from it. In fact, we find that BNNs are difficult to train with adversarial training. The learning rate and other hyperparameters need to be tuned carefully to ensure that the BNN model converges to lower error values during adversarial training. [8] also observed a similar trend and explained that binarized weights are not as ‘malleable’ as full-precision weights and hence cannot easily adjust to all possible variations of adversarial data augmented to the training dataset. We think that increasing the capacity of the network compensates for the ‘non-malleability’ of the constrained parameters to a certain extent. As a result, we see improved accuracy for MNIST in Table 3 with the FCN1 architecture.
3.3 Discretization of Input and Parameter Space
Next, we combined both discretization strategies and analyzed the adversarial robustness of BNNs with varying image-level discretization. We compare the adversarial accuracy of BNNs to that of a full-precision network for iso-input discretization scenarios, as shown in Table 4 for CIFAR10 (AlexNet architecture trained for 40 epochs). In Table 4, BNN-2b (Full-2b) refers to a binarized (full-precision) model with 2-bit input precision. Full-precision models have 32-bit weights and activations. While input discretization for a full-precision network suffers a sizeable accuracy loss, the BNN’s accuracy fluctuation is marginal. This is expected, since BNNs (owing to +/-1 binarized parameters) do not have as many dimensions (as a full-precision network with 32-bit weights and activations) to fit the extra information in the input data. Thus, BNNs fit data of different input precisions alike, yielding similar generalization error. As opposed to the results seen earlier with 8-bit inputs, BNNs with lower input precision have significantly higher adversarial resistance than their full-precision counterparts, even for large $\epsilon$ values. Model capacity does not restrict the adversarial resistance in this case. This is an artefact of the two-step quantization, which increases the minimum perturbation required to shift a data point. We can also draw an alternate insight from this result: the constrained parameter space of BNNs restricts their overall exploration of the data manifold during training. Referring to Fig. 2(b), this increases the probability of untrained or arbitrary hyper-volumes (for BNNs), thereby increasing their adversarial susceptibility. Increasing the capacity enables a BNN to explore the manifold better during training. By discretizing the input, we are restricting the overall data manifold space. This allows even a model with lower capacity to explore the manifold well, thereby decreasing the extent of arbitrary hyper-volumes.
Table 5 illustrates the accuracy results for MNIST (FCN1 architecture trained for 20 epochs).
We conducted adversarial training with input-discretized BNNs to find out whether it helps build adversarial robustness. The results are shown in Table 6. Compared to the 8-bit input BNN adversarial training results in Table 3, we observe a substantial gain in adversarial accuracy. However, contrasting the BNN results against Table 2 (input-discretized full-precision networks), we observe similar performance gains. In fact, the accuracy gains for the input-discretized CIFAR10 BNN with and without adversarial training (Table 4/Table 6) are nearly the same. Earlier, we saw that the accuracy (for low $\epsilon$ values) of a full-precision network working on discretized input data without adversarial training is similar to that of an adversarially trained network on 8-bit inputs (Table 2, Fig. 4). Combining the adversarial training results so far, we can deduce the following: 1) In the low input-precision regime, adversarial training does not compound the adversarial resistance of a network (irrespective of binarized or full-precision parameters) for lower $\epsilon$ values. Adversarial training helps when the input has higher precision. 2) Input discretization, in general, offers very strong adversarial defense for lower $\epsilon$ values. Discretizing the input as well as the parameter space furthers adversarial robustness. Adversarial training in a discretized input and parameter space does not add much benefit and hence can be waived. However, in the case of stronger multi-step attack scenarios, and to gain robustness against larger perturbations on CIFAR10 and MNIST, the network needs to be adversarially trained with correspondingly large $\epsilon$ values.
3.4 Analysis on CIFAR100 and Imagenet
Scaling up the discretization analysis to larger datasets yielded results similar to those observed with CIFAR10 and MNIST. Fig. 7 shows the evolution of adversarial accuracy for CIFAR100 (trained on the ResNet20 architecture for 164 epochs) for binarized XNOR networks (1-bit weights and activations) and the corresponding full-precision models (32-bit weights and activations). Note, XNOR networks are similar to BNNs (1-bit weights/activations), with additional scaling factors to achieve higher accuracy on complex datasets. It is evident that input discretization is the most beneficial for obtaining adversarial robustness. Input-discretized models in both cases yield adversarial accuracy close to the clean accuracy for a large range of perturbations. The accuracy loss between clean and adversarial data for the input-discretized XNOR network is slightly smaller than that of the input-discretized full-precision model. This can be attributed to the intrinsic robustness offered by discretizing the parameter space of XNOR networks. Furthermore, the fact that the XNOR network yields higher adversarial accuracy at iso-perturbation values than the corresponding full-precision model further demonstrates the ability of binarized networks to counter adversarial attacks. A noteworthy observation here is that the loss in clean accuracy between 8-bit and discretized inputs for the full-precision network is small compared to the large loss observed earlier with CIFAR10 (Table 1). As we scale up the complexity of the dataset, the redundancies in the input dimensions increase. Discretizing the input for a complex dataset thus eliminates more redundant dimensions that do not contribute to the accuracy. In contrast, simpler datasets have fewer redundant dimensions and are thus at risk of suffering a large accuracy drop with input discretization.
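The role of the additional scaling factors in XNOR networks can be sketched as follows, assuming the weight approximation of XNOR-Net [7] in which a real-valued filter is replaced by its sign pattern multiplied by the mean absolute weight. The four-element filter is a hypothetical example.

```python
def xnor_approx(weights):
    """Approximate a real-valued filter W by alpha * sign(W), alpha = mean(|W|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs

def sq_error(weights, approx):
    """Squared approximation error between the filter and its approximation."""
    return sum((w - a) ** 2 for w, a in zip(weights, approx))

w = [0.2, -0.4, 0.1, -0.3]            # hypothetical full-precision filter
alpha, signs = xnor_approx(w)
scaled = [alpha * s for s in signs]   # XNOR-style: binary weights with a scale
err_scaled = sq_error(w, scaled)
err_plain = sq_error(w, signs)        # plain +/-1 weights without a scale
```

For this filter the scale is 0.25 and the scaled binary approximation is much closer to the original weights than the unscaled one, which is the reason such factors help accuracy on complex datasets.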
Fig. 8 shows the accuracy results for Imagenet (trained on AlexNet). We only show the top-5 adversarial accuracy. We see trends similar to those for CIFAR100. Note, the XNOR models are trained for 50 epochs, while the full-precision models are trained for 90 epochs. As a result, we see lower baseline accuracy in the former case. As with CIFAR100, the loss in clean test accuracy between 8-bit and discretized inputs is minimal for each model. Also, the accuracy difference between clean and adversarial data for the XNOR models is much lower than for the full-precision models. This highlights the intrinsic robustness of binarized networks even for large-scale datasets.
Low-precision models and quantization techniques have, so far, been explored to reduce the resource utilization of DLNs for energy-efficient deployment on edge devices. We have demonstrated that discretization also provides security against adversarial attacks, offering a key benefit of robustness in hardware implementation. In summary, the main findings/recommendations from this work are: 1) Input discretization is a major contributor to adversarial robustness (for both binarized and full-precision models). Input-discretized models (without adversarial training) yield adversarial accuracy similar to that of adversarially trained 8-bit input models for lower perturbation values. Robustness against higher perturbations and multi-step attacks requires adversarial training. 2) Binarized models are intrinsically more robust than full-precision models (with 32-bit weights/activations). Adversarial training needs to be done carefully, on binarized networks of sufficient capacity, to attain improved adversarial accuracy.
- [1] Y. LeCun et al., “Deep learning,” Nature, 2015.
- [2] A. Kurakin et al., “Adversarial examples in the physical world,” arXiv:1607.02533, 2016.
- [3] I. J. Goodfellow et al., “Explaining & harnessing adversarial examples,” 2015.
- [4] P. Panda et al., “Explainable Learning: Implicit Generative Modelling during Training for Adversarial Robustness,” arXiv:1807.02188, 2018.
- [5] C. Szegedy et al., “Intriguing properties of neural networks,” 2013.
- [6] I. Hubara et al., “Binarized neural networks,” NIPS, 2016.
- [7] M. Rastegari et al., “Xnor-net: Imagenet classification using binary convolutional neural networks,” ECCV, 2016.
- [8] A. Galloway et al., “Attacking binarized neural networks,” ICLR, 2018.
- [9] F. Tramèr et al., “Ensemble adversarial training: Attacks & defenses,” 2018.
- [10] A. Kurakin et al., “Adversarial machine learning at scale,” ICLR, 2016.
- [11] A. Madry et al., “Towards deep learning models resistant to adversarial attacks,” ICLR, 2018.
- [12] Y. LeCun, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist/, 1998.
- [13] A. Krizhevsky et al., “CIFAR-10 and CIFAR-100 datasets,” URL: https://www.cs.toronto.edu/~kriz/cifar.html, 2009.
- [14] https://github.com/itayhubara/BinaryNet.pytorch.
- [15] K. He et al., “Deep residual learning for image recognition,” CVPR, 2016.
- [16] J. Deng et al., “Imagenet: A large-scale hierarchical image database,” 2009.
- [17] https://github.com/jiecaoyu/XNOR-Net-PyTorch.