A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning
Although deep convolutional neural networks (CNNs) have demonstrated remarkable performance on multiple computer vision tasks, research on adversarial learning has shown that deep models are vulnerable to adversarial examples, which are crafted by adding visually imperceptible perturbations to the input images. Most existing adversarial attack methods create only a single adversarial example for each input, which gives just a glimpse of the underlying data manifold of adversarial examples. An attractive alternative is to explore the solution space of adversarial examples and generate a diverse set of them, which could potentially improve the robustness of real-world systems and help prevent severe security threats and vulnerabilities. In this paper, we present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), that aims to generate a sequence of adversarial examples. To improve the efficiency of HMC, we propose a new regime to automatically control the length of trajectories, which allows the algorithm to move with adaptive step sizes along the search direction at different positions. Moreover, we revisit the reason for the high computational cost of adversarial training from the viewpoint of MCMC and design a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples within only a few iterations by building on small modifications of standard Contrastive Divergence (CD), and achieves a trade-off between efficiency and accuracy. Both quantitative and qualitative analyses on several natural image datasets and practical systems confirm the superiority of the proposed algorithm.
With the rapid development and superior performance achieved in various vision tasks, deep convolutional neural networks (CNNs) have eventually led to pervasive and dominant applications in many industries. However, most deep CNN models could be easily misled by natural images with imperceptible but deceptive perturbations. These crafted images are known as adversarial examples, which have become one of the biggest threats in real-world applications with security-sensitive purposes[51, 53, 62]. Devising an effective algorithm to generate such deceptive examples can not only help to evaluate the robustness of deep models, but also promote better understanding about deep learning for the future community development.
In the past literature, most state-of-the-art methods are designed to generate only a single adversarial example, for example by maximizing the empirical risk over the target model, and might not be able to exhaustively explore the solution space of adversarial examples. In our opinion, the adversarial examples of a deep model might form an underlying data manifold[11, 55, 58, 56] rather than scattered outliers of the classification surface.
Therefore, we argue that it is desirable and critical for adversarial attack and learning methods to be able to generate multiple diverse adversarial examples in one run, for the following reasons. First, the diversity of adversarial examples can more fully verify the robustness of an unknown system. Second, developing an attack with multiple distinct adversarial examples would enable adversarial training with such examples, which could make the model more robust against white-box attacks. Third, it is necessary to preserve multiple adversarial examples, since the solution space of adversarial examples depends only on the targeted model and its input image, even if the objective energy function of adversarial examples is constantly being improved[4, 5, 73, 48], e.g. by mapping the clipped gradient descent into tanh space or adding a KL-divergence term. A series of adversarial samples can better depict the manifold of the solution space than a single global optimum, which can also bring more stable and superior attack performance. In fact, training these representative generative models also suffers from instability due to the difficulty of finding the exact Nash equilibrium[17, 10] or tackling memorization[46, 63, 14].
Motivated by the aforementioned observations, we rethink the generation of adversarial examples from the viewpoint of probability distributions and develop an innovative paradigm called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM) for generating a sequence of adversarial examples in one run. Given the attack objective energy function, the HMCAM method first constructs a joint distribution by Hamiltonian equations, and the Metropolis-Hastings algorithm is then used to determine whether to transition to the candidate sample via the acceptance function based upon the proposal distribution and the candidate-generating density. To improve the efficiency of HMC, we further propose a new regime called accumulated momentum to adaptively control the step sizes, which allows the algorithm to move with different step sizes along the search direction at different positions. Conceptually, our HMCAM paradigm also reveals the roles of the well-known FGSM family of algorithms, including FGSM, I-FGSM, PGD and MI-FGSM. These methods can be considered special cases of HMC with minor modifications. Inspired by our new paradigm, we further design a new generative method, called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples within only a few iterations by building on small modifications of standard Contrastive Divergence. We verify the effectiveness of both the adversarial attack and the training algorithms in multiple scenarios. For the investigation of adversarial attack, we test our algorithm on single and ensemble models in both white-box and black-box manners. Extensive experiments conducted on the CIFAR10 dataset show that our method achieves much higher success rates with fewer iterations for black-box models and maintains similar success rates for white-box models. We also evaluate the proposed HMCAM on the CAAD 2018 defense champion solution.
It outperforms the official baseline attack and M-PGD (PGD with momentum) by a large margin, which clearly demonstrates the effectiveness of the proposed adversarial method. To further show the practical applicability of our proposed method, we launch our attack on real-world celebrity recognition systems such as Clarifai, AWS and Azure. Compared with traditional iterative attack methods, HMCAM is able to generate more successful malicious examples to fool the systems through sampling from the likelihood models. For adversarial training, our CAT algorithm achieves much higher robustness than other state-of-the-art adversarial training methods on both the CIFAR-10 and MNIST datasets and reaches a balance of performance and efficiency. In summary, this paper makes the following contributions:
We formulate the problem of generating adversarial examples in an HMC framework, which can produce multiple fair samples and better represent the underlying distribution of the adversarial examples. These fair samples reflect the typical state of the underlying system well.
We design a new regime called accumulated momentum to adaptively control the step sizes, which allows the algorithm to move with different step sizes along the search direction at different positions, and thus improves the efficiency of HMC.
We thoroughly compare the effectiveness of our algorithms in various settings against several iterative attack methods on both CIFAR10 and ImageNet, including the champion solution in the defense track of the CAAD 2018 competition. We also investigate the efficiency of the HMC framework in adversarial training and show the practical applicability of our HMCAM by successfully attacking real-world celebrity recognition systems.
2 Related Work
Adversarial Attacks. Since Szegedy et al. first revealed that deep learning models are vulnerable to adversarial attacks, learning how to generate adversarial examples has quickly attracted wide research interest. Goodfellow et al. developed a single gradient step method to generate adversarial examples, known as the fast gradient sign method (FGSM). Kurakin et al. extended FGSM to an iterative version and obtained much stronger adversarial examples. Building on these works, Madry et al. started projected gradient descent (PGD) from several random points in a norm ball around the natural example and iterated PGD. Dong et al. proposed adding a momentum term to the iterative process to boost adversarial attacks, which won first places in the NIPS 2017 Adversarial Attacks and Defenses Competition. Due to their high efficiency and high success rates, the last two methods have been widely used as baseline attack models in many competitions. Our method also belongs to the iterative attack family but has much faster convergence and better transferability than alternative methods. Compared with recent similar works on distributional attack[29, 73], our HMC-based methods can better explore the distribution space of adversarial samples and reveal the reason for the high computational cost of adversarial training from the perspective of MCMC.
Adversarial Defense. To deal with the threat of adversarial examples, different strategies have been studied with the aim of finding countermeasures to protect ML models. These approaches can be roughly categorized into two main types: (a) detection only and (b) complete defense. The goal of the former approaches[28, 3, 39, 35, 26, 59, 70] is to reject potentially malignant samples before feeding them to the ML models. However, rejecting samples does not help pinpoint the defects that must be fixed to develop more robust ML models.
Complementary to the previous defending techniques, the latter defense methods often involve modifications to the training process. For example, gradient masking[43, 44, 2] or randomized models[31, 65, 61, 25, 33] obfuscate the gradient information of the classifiers to confuse the attack mechanisms. There are also some add-on modules[66, 1, 13, 30, 27, 16] that are appended to the targeted network to protect deep networks against adversarial attacks. Among all the above methods, adversarial training[12, 23, 21, 37, 32] is the most effective, as has been widely verified in many works and competitions. However, few works[50, 71] focus on boosting robust accuracy within a reasonable training time.
Markov Chain Monte Carlo Methods. Markov chain Monte Carlo (MCMC) provides a powerful framework for drawing a series of fair samples from the target distribution. But MCMC is known for its slow convergence rate, which prevents its wide use in time-critical fields. To address this issue, the Hamiltonian (or Hybrid) Monte Carlo method (HMC) [9, 41] was introduced to take advantage of the gradient information in the target solution space and accelerate convergence to the target distribution. Multiple variants of HMC [45, 49, 19] were also developed to integrate adaptive strategies for tuning the step size or the number of iterations of the leapfrog integrator. Recently, the fusion of MCMC and machine learning has enabled a wide range of applications, including data-driven MCMC[60, 6], adversarial training and cooperative learning, which shows the great potential of MCMC in deep learning.
In this section, we briefly review the Markov chain Monte Carlo (MCMC) method and the Hamiltonian Monte Carlo (HMC) method [9, 41]. Then we explain that most of the existing methods for generating adversarial examples are special cases of HMC. Finally, we illustrate how to modify the update policy of the momentum term in HMC to obtain a better trajectory.
3.1 Review: MCMC and Hamiltonian Monte Carlo
We now give an overall description of the Metropolis-Hastings based MCMC algorithm. Given a target distribution over some space, MCMC methods construct a Markov chain that has the desired distribution as its stationary distribution. At the first step, MCMC chooses an arbitrary point as the initial state. Then it repeatedly performs a dynamic process consisting of the following steps: (1) Generate a candidate sample as a “proposed” value for the next state from the candidate-generating density, which depends on the current state of the process. (2) Compute the acceptance probability, which is used to decide whether to accept or reject the candidate. (3) Accept the candidate sample as the next state with this probability; otherwise, reject the proposal and remain at the current state. Although MCMC makes it possible to sample from any desired distribution, its random-walk nature makes the Markov chain converge slowly to the stationary distribution.
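The three steps above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes a one-dimensional target, a symmetric Gaussian random-walk proposal, and hypothetical names (`metropolis_hastings`, `log_target`, `proposal_std`) chosen only for this example.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, proposal_std=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    samples = np.empty(n_samples)
    accepted = 0
    for t in range(n_samples):
        # (1) Propose a candidate from the candidate-generating density.
        candidate = x + proposal_std * rng.normal()
        # (2) Log acceptance probability. The proposal is symmetric, so its
        #     density cancels and only the target ratio remains.
        log_alpha = min(0.0, log_target(candidate) - log_target(x))
        # (3) Accept with that probability, otherwise keep the current state.
        if np.log(rng.uniform()) < log_alpha:
            x = candidate
            accepted += 1
        samples[t] = x
    return samples, accepted / n_samples

# Toy target (an assumption for the demo): standard normal, up to a constant.
samples, rate = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_samples=20000)
```

With a random-walk proposal, consecutive samples are strongly correlated, which is the slow-convergence behaviour the paragraph above refers to.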
In contrast, HMC employs physics-driven dynamics to explore the target distribution, which is much more efficient than the alternative MCMC methods. Before introducing HMC, we start with an analogy of Hamiltonian systems. Consider a hockey puck sliding over a surface of varying height, where both the puck and the surface are frictionless. The state of the puck is determined by its potential energy, a function of its position, and its kinetic energy, a function of its momentum. The evolution of the system is given by Hamilton's equations:
Under these dynamics, which are also reversible, the total energy of the system remains constant:
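For reference, the two statements above can be written out in the usual textbook notation. The symbols are an assumption, since they are not fixed by the surrounding text: $\theta$ is the position, $p$ the momentum, $U(\theta)$ the potential energy, $K(p)$ the kinetic energy and $H(\theta, p) = U(\theta) + K(p)$ the Hamiltonian. The equations are written for a single coordinate.

```latex
\begin{aligned}
\frac{d\theta}{dt} &= \frac{\partial H}{\partial p} = \frac{\partial K(p)}{\partial p}, \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial \theta} = -\frac{\partial U(\theta)}{\partial \theta}, \\
\frac{dH}{dt} &= \frac{\partial H}{\partial \theta}\frac{d\theta}{dt}
              + \frac{\partial H}{\partial p}\frac{dp}{dt}
             = \frac{\partial H}{\partial \theta}\frac{\partial H}{\partial p}
              - \frac{\partial H}{\partial p}\frac{\partial H}{\partial \theta} = 0 .
\end{aligned}
```

The second line shows that conservation of the total energy follows directly from substituting Hamilton's equations into the chain rule.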
As for HMC, it contains three major parts: (1) Hamiltonian system construction; (2) leapfrog integration; (3) Metropolis-Hastings correction. First, the Hamiltonian is an energy function for the joint density of the variables of interest and an auxiliary momentum variable, so HMC defines a joint distribution via the concept of a canonical distribution:
where the temperature is set to 1 in the common setting. Then, HMC discretizes the system and approximately simulates Eq. (1) over time via the leapfrog integrator. Finally, because of inaccuracies caused by the discretization, HMC performs a Metropolis-Hastings correction without reducing the acceptance rate. The full procedure of HMC is described in Algorithm 1.
Since the kinetic energy is an auxiliary term that standard HMC always defines with an identity mass matrix, our aim is to define the potential energy from the target density, so that the density is explored more efficiently than with a proposal probability distribution. If we can calculate the gradient of this potential energy, then we can simulate Hamiltonian dynamics that can be used in an MCMC technique.
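The three parts of HMC described above can be illustrated with a minimal sketch of textbook HMC. It is not a reproduction of Algorithm 1; it assumes an identity mass matrix, a fixed step size and a fixed number of leapfrog steps, and the names `hmc`, `U` and `grad_U` are hypothetical.

```python
import numpy as np

def hmc(U, grad_U, theta0, n_samples, step_size=0.1, n_leapfrog=20, seed=0):
    """Textbook HMC with kinetic energy K(p) = p.p / 2 (identity mass matrix)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p0 = rng.normal(size=theta.shape)        # resample the auxiliary momentum
        q, p = theta.copy(), p0.copy()
        # Leapfrog integration of Hamilton's equations.
        p -= 0.5 * step_size * grad_U(q)         # half step for momentum
        for i in range(n_leapfrog):
            q += step_size * p                   # full step for position
            if i < n_leapfrog - 1:
                p -= step_size * grad_U(q)       # full step for momentum
        p -= 0.5 * step_size * grad_U(q)         # final half step for momentum
        # Metropolis-Hastings correction on the change in total energy.
        h_old = U(theta) + 0.5 * p0 @ p0
        h_new = U(q) + 0.5 * p @ p
        if np.log(rng.uniform()) < h_old - h_new:
            theta = q
        samples.append(theta.copy())
    return np.array(samples)

# Toy target (an assumption for the demo): 2-D standard normal,
# so U(theta) = theta.theta / 2 and grad_U(theta) = theta.
draws = hmc(lambda t: 0.5 * t @ t, lambda t: t, np.zeros(2), n_samples=2000)
```

Only the gradient of the potential energy is needed to run the integrator, which is why a differentiable classifier loss can play the role of `U` in the next subsection.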
3.2 Simulating Adversarial Examples Generating by HMC
Considering a common classification task, we have a dataset that contains normalized data and their one-hot labels. We identify a target DNN model with a hypothesis from a hypothesis space. The cross-entropy loss function is used to train the model. Assume that the adversarial examples for an input with a given label are distributed over a solution space. Given any input pair, for a specified model with fixed parameters, the adversary wants to find examples that can mislead the model:
where the search is restricted to a neighboring region of the natural input. From the perspective of Bayesian statistics, we can make inference about adversarial examples over a solution space from the posterior distribution of the adversarial examples given the natural inputs and labels.
| Methods | Hamiltonian system construction | | Iteration | | | Metropolis-Hastings correction |
| | potential energy? | kinetic energy? | sampling? | update position? | update momentum? | |
| FGSM | ✓, but implicit | ✓, but implicit | | | | |
| I-FGSM | ✓, but implicit | ✓, but implicit | | ✓ | | |
| PGD | ✓, but implicit | ✓, but implicit | ✓, but independent | ✓ | | |
| MI-FGSM | ✓, but implicit | ✓, but implicit | | ✓ | ✓ | |
In the Hamiltonian system, the task becomes generating samples from the joint distribution of position and momentum. According to Eq. (6) and (4), we can express the posterior distribution as a canonical distribution using a potential energy function defined as:
Since the likelihood term is the usual classification likelihood measure, the question remains how to define the prior. A sensible choice is a uniform distribution over the ball around the natural input, which means we can directly use a DNN classifier to construct a Hamiltonian system for generating adversarial examples as the base step of HMC.
Recall that the development of adversarial attacks is mainly based on the improvement of the vanilla fast gradient sign method, which derives I-FGSM, PGD and MI-FGSM. For clarity, we omit some details about the correction due to the constraint of adversarial examples. The core policy of the family of fast gradient sign methods is:
where the update direction is the gradient of the loss at the current iteration. It is clear that the above methods are special cases of HMC obtained by setting:
More specifically, I-FGSM can be considered a degenerate form of HMC, which explicitly updates the position term but only implicitly changes the momentum term at every iteration. One derivative of I-FGSM, MI-FGSM, explicitly updates both the position and the momentum by introducing an additional momentum update with a decay factor after Eq. (8) at each step. The other derivative, PGD, runs Eq. (8) from a set of initial points perturbed with different noise, which can be treated as a parallel HMC whose results are mutually independent.
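As a rough illustration of how these attacks share one update rule, the sketch below implements the commonly published forms of I-FGSM and MI-FGSM. It is not a transcription of Eq. (8) or Eq. (9); the function name, the `grad_fn` callback, the $\ell_\infty$ projection and the [0, 1] pixel range are assumptions made for this example.

```python
import numpy as np

def iterative_sign_attack(grad_fn, x, eps, alpha, n_iter, decay=None):
    """I-FGSM when decay is None, MI-FGSM when decay is a float.

    FGSM corresponds to n_iter=1 with alpha=eps.
    """
    x_adv = x.copy()
    g = np.zeros_like(x)                       # plays the role of the momentum
    for _ in range(n_iter):
        grad = grad_fn(x_adv)                  # gradient of the loss w.r.t. the input
        if decay is None:
            g = grad                           # momentum changes only implicitly
        else:
            # Explicit momentum update with L1-normalised gradient.
            g = decay * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)     # explicit position update
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay inside the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv

# Toy loss (an assumption for the demo): linear in x, so the gradient is constant.
x = np.full(4, 0.5)
w = np.array([1.0, -2.0, 3.0, -4.0])
x_ifgsm = iterative_sign_attack(lambda z: w, x, eps=0.1, alpha=0.02, n_iter=10)
x_mifgsm = iterative_sign_attack(lambda z: w, x, eps=0.1, alpha=0.02, n_iter=10, decay=1.0)
```

PGD with random starts would call the same routine from several noisy copies of `x`, which yields the independent parallel chains mentioned above.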
3.3 Adaptively Exploring the Solution Space with Accumulated Momentum
Although the above formulation shows that HMC can be used to simulate the generation of adversarial examples, one major problem of these methods is that the position and the momentum are not independent, as discussed for Eq. (9). The other disadvantage lies in optimization: SGD scales the gradient uniformly in all directions, which can be particularly detrimental for ill-scaled problems. As with the need to choose a step size in HMC, laborious learning rate tuning is also troublesome.
To overcome the above two problems, we present Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM) for generating adversarial examples. The resulting HMCAM algorithm is shown in Algorithm 2. The core of our accumulated momentum strategy is to use an exponential moving average (EMA) to approximate the first and second moments of the stochastic gradient by accumulating the weighted history of moment information. Let us initialize the exponential moving average to zero. After the inner-loop steps, the accumulated momentum is:
The derivation for the second moment estimate is completely analogous. Since decay rates close to 1 are typically recommended in practice, the contribution of older gradients decreases exponentially. Meanwhile, we can observe in Eq. (10) that the current gradient only receives a weight of one minus the decay rate, which is much smaller than the decay rate itself. This indicates that performing exponential moving averages for the step in lieu of the gradient greatly reduces the dependence between the momentum and the current position, which makes the sequence of samples an approximate Markov chain.
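The weighting argument above can be checked numerically. The sketch below unrolls a zero-initialized EMA and returns the weight each past gradient receives; the decay rate 0.9 is an assumed value for illustration, not a setting taken from the paper.

```python
import numpy as np

def ema_weights(beta, n_steps):
    """Weight of gradient g_i (i = 1..n_steps) in the EMA
    m_n = beta * m_{n-1} + (1 - beta) * g_n with m_0 = 0."""
    i = np.arange(1, n_steps + 1)
    return (1.0 - beta) * beta ** (n_steps - i)

w = ema_weights(beta=0.9, n_steps=10)
# w[-1] is the weight of the current gradient, w[0] that of the oldest one.
```

The weights sum to one minus the decay rate raised to the number of steps, which is exactly the factor that the bias correction discussed next divides out.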
As for the step size, there is always a trade-off between using long trajectories to make HMC more efficient and using shorter trajectories to update more frequently. Ignoring a small constant, our accumulated momentum updates the position by:
where the bias-correction factor corrects the biased estimation of the moments towards their initial values at early stages, which is a property of the EMA. When approaching the minima, the ratio of the first-moment estimate to the square root of the second-moment estimate automatically decreases the size of the gradient steps along different coordinates. Because this ratio leads to smaller effective steps in the solution space as it gets closer to zero, the anisotropic scaling of the step size helps to escape sharp local minima at some coordinates in the later period of the learning process, which leads to better generalization. We follow a similar idea to prior work by replacing the second-moment estimate with the maximum over its whole history, which keeps the step size non-increasing. To guarantee that the step size does not exceed the magnitude of adversarial perturbations, we confine the step size to a predefined maximum by applying an element-wise minimum.
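The inner loop described in this subsection might look roughly like the sketch below. It is a reading of the prose, not a transcription of Eq. (11) or Algorithm 2: the decay rates, the base step `alpha`, the cap `max_lr`, the order of bias correction and maximum, and all names are assumptions.

```python
import numpy as np

def accumulated_momentum_attack(grad_fn, x, eps, alpha, n_inner, max_lr=1.0,
                                beta1=0.9, beta2=0.999, delta=1e-8):
    """Inner loop with EMA moments, bias correction and a capped step size."""
    x_adv = x.copy()
    m = np.zeros_like(x)        # first-moment EMA, initialised to zero
    v = np.zeros_like(x)        # second-moment EMA, initialised to zero
    v_max = np.zeros_like(x)    # running maximum of the corrected second moment
    for t in range(1, n_inner + 1):
        g = grad_fn(x_adv)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
        v_max = np.maximum(v_max, v_hat)          # keeps the step size non-increasing
        # Per-coordinate step size, capped by an element-wise minimum.
        lr = np.minimum(alpha / (np.sqrt(v_max) + delta), max_lr)
        x_adv = np.clip(x_adv + lr * m_hat, x - eps, x + eps)
    return x_adv

# Toy loss (an assumption for the demo): linear in x, so the gradient is constant.
x = np.full(4, 0.5)
w = np.array([1.0, -2.0, 3.0, -4.0])
x_adv = accumulated_momentum_attack(lambda z: w, x, eps=0.1, alpha=0.02, n_inner=10)
```

For a constant gradient the effective step has magnitude `alpha` in every coordinate regardless of the gradient scale, which illustrates the insensitivity to ill-scaled problems that motivates the design.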
After every full inner iteration, we calculate the acceptance rate of the candidate sample by M-H sampling, reinitialize the first and second moments as well as the maximum of the second moment to zero, and then perform the next generation. Through the acceptance probability, the M-H algorithm keeps the generated samples mostly in high-density regions of the candidate distribution, with only occasional visits to low-density regions. As more and more samples are produced, the distribution of samples more closely approximates the desired distribution, and the returned samples are more in line with this distribution than those of other works such as PGD with random starts.
4 Contrastive Adversarial Training
Assume softmax is employed for the output layer of the model, and consider the softmax output of a given input, a probability vector with one entry per category. We also assume that there exists an oracle mapping function, which pinpoints the membership of the input in every category through accurate confidence scores. The common training approach is to minimize the cross-entropy (CE) loss, which is defined as:
where the label is the manual one-hot annotation of the input, used because the oracle is not observable. The goal of Eq. (12) is to update the parameters of the model so that it better approaches the oracle, which leads to:
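For concreteness, the standard softmax cross-entropy with one-hot labels can be sketched as below. This is the usual textbook form and is only assumed to match Eq. (12); the helper names and the small constant `tiny` are choices made for this example.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, one_hot, tiny=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    return -(one_hot * np.log(probs + tiny)).sum(axis=-1).mean()

# Uniform predictions over 3 classes give a loss of log(3).
logits = np.zeros((2, 3))
labels = np.eye(3)[[0, 2]]
loss = cross_entropy(softmax(logits), labels)
```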
Even if the target DNN model correctly classifies most inputs after hundreds of iterations, it will still badly misclassify adversarial examples. In adversarial training, these constructed adversarial examples are used to update the model using minibatch SGD. The objective of this min-max game can be formulated as the following robust optimization problem:
As mentioned in Section 3.2, the inner maximization problem can be reformulated as an HMC process. The high time consumption of adversarial training is evidently caused by the long trajectory of HMC. But running a full trajectory for many steps is too inefficient, since the model changes only slightly between parameter updates. Thus, we take advantage of this by initializing an HMC chain at the state in which it ended for the previous model. This initialization is often fairly close to the model distribution, even though the model has changed a bit in the parameter update. Besides, the high acceptance rate of HMC indicates that it is not necessary to run a long Markov chain from the initial point. Therefore, instead of running the full trajectory to equilibrium, we can simply run the chain for one full step and then update the parameters to reduce the tendency of the chain to wander away from the initial distribution on the first step. We take a small number of transitions, with the data samples as the initial values of the MCMC chains, and then use these few-step MCMC samples to approximate the gradient for updating the parameters of the model. Algorithm 3 summarizes the full algorithm.
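The warm-start idea in this paragraph can be illustrated on a toy problem. The sketch below is not Algorithm 3: it swaps the HMC transitions for plain sign-gradient steps, uses a linear softmax classifier in NumPy, and every name and hyper-parameter in it is an assumption. What it keeps is the structure: a few transitions per model update, with each chain resumed from where it ended under the previous model.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_grads(W, x, y):
    """Gradients of the mean cross-entropy of a linear softmax model
    with respect to the weights W and the inputs x."""
    d = (softmax(x @ W) - y) / len(x)     # dLoss / dlogits
    return x.T @ d, d @ W.T

def short_run_adv_training(x, y, eps=0.1, alpha=0.05, k=2, epochs=50, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.normal(size=(x.shape[1], y.shape[1]))
    x_adv = x.copy()                      # chains start at the data samples
    for _ in range(epochs):
        # k short transitions, resumed from the previous chain state.
        for _ in range(k):
            _, gx = loss_grads(W, x_adv, y)
            x_adv = np.clip(x_adv + alpha * np.sign(gx), x - eps, x + eps)
        # Model update using the short-run samples.
        gW, _ = loss_grads(W, x_adv, y)
        W -= lr * gW
    return W, x_adv

# Toy data (an assumption for the demo): two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(size=(100, 2)) + 2.0, rng.normal(size=(100, 2)) - 2.0])
y = np.vstack([np.tile([1.0, 0.0], (100, 1)), np.tile([0.0, 1.0], (100, 1))])
W, x_adv = short_run_adv_training(x, y)
```

Each epoch costs only `k` extra gradient evaluations, compared with a full multi-step attack per update in standard adversarial training.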
Moreover, we also present a new training objective function, which minimizes the difference of KL divergence between two adjacent sampling steps, to substitute for the common KL loss:
where the divergence terms are Kullback-Leibler divergences and the two coefficients are balancing factors. The intuitive motivation is that we would like every state explored by HMC to leave the initial distribution, yet never overshoot before the chain reaches the equilibrium distribution. We then analyze how this objective function influences the partial derivative of the output probability vector with respect to the input. Since the equilibrium distribution is considered a fixed distribution, and by the chain rule, we only need to focus on the derivative of the softmax output vector with respect to its input vector in the last layer, as follows:
where . Based on this abbreviation, we can easily get the relationship between Eq. (16) and . For each adversarial example generation, Eq. (16) makes an amendment of which is determined by the difference of current and the last -step HMC samples output probability. Since and are more closer to and than and , each update of would be better corrected.
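The derivative of a softmax output with respect to its input, which the analysis above relies on, has a standard closed form. The sketch below states it and checks it against finite differences; it covers only this generic building block, not the specific form of Eq. (16), and the function names are chosen for this example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jacobian(z):
    """J[i, j] = d softmax(z)_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)

z = np.array([0.5, -1.0, 2.0])
J = softmax_jacobian(z)

# Central finite differences; column j holds the derivative w.r.t. z_j.
h = 1e-6
J_num = np.stack(
    [(softmax(z + h * e) - softmax(z - h * e)) / (2 * h) for e in np.eye(3)],
    axis=1,
)
```

Each column of the Jacobian sums to zero, reflecting that the softmax outputs always sum to one.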
In this section, we conduct extensive experimental evaluations of our proposed methods on three benchmarks: CIFAR10, ImageNet and MNIST. First, we briefly introduce the major implementation settings in Section 5.1, and perform comprehensive comparisons to verify the superiority of our HMCAM method on single and ensemble models in both white-box and black-box manners in Section 5.2 and Section 5.3. Then, we perform detailed ablation studies to demonstrate the influence of different aspects of HMCAM and explore the possibility of few-sample learning for competitive results in adversarial training in Section 5.4. To further test the efficiency of the CAT method in adversarial training, we provide detailed quantitative comparison results of our proposed models in Section 5.5. Finally, to investigate the generalization of our approach, we also perform experiments on ImageNet against the champion solution in the defense track of the CAAD 2018 competition in Section 5.6.1 and attempt to launch attacks on public face recognition systems in Section 5.6.2.
5.1 Datasets and Implementation Details
Datasets. We employ the following three benchmark datasets for a comprehensive evaluation to validate the effectiveness of our HMCAM and CAT methods.
CIFAR10 is a widely used dataset consisting of 60,000 colour images of 10 categories. Each category has 6,000 images. Due to the resource limitation, we mainly focus on the CIFAR10 dataset with extensive experiments to validate the effectiveness of the proposed methods on both adversarial attack and training.
ImageNet is a large dataset with 1,283,166 images in the training set and 50,000 images in the validation set, collected from the Web. It has 1,000 synsets used to label the images. As it is extremely time-consuming to train a model from scratch on ImageNet, we only use it to test the generalization of our approach against the champion solution in the defense track of the CAAD 2018 competition.
MNIST is a database for handwritten digit classification. It consists of 60,000 training images and 10,000 test images, all of which are greyscale images representing the digits 0 to 9. In this experiment, we only apply the different adversarial training methods to MNIST.
Implementation details. For adversarial attack, we pick six models, including four normally trained single models (ResNet32, VGG16 (without BN), ResNetXt29-8-64 and DenseNet121) and one adversarially trained ensemble model. The hyper-parameters of different attack methods follow the default settings in and the total iteration number is set to (in most cases except HMCAM). We fix and for HMCAM, and the decay rate is set to 1.0 for M-PGD (MI-FGSM+PGD). The magnitude of maximum perturbation at each pixel is . For simplicity, we only report the results based on norm for the non-targeted attack.
For adversarial training, we follow the training scheme used in Free and YOPO on CIFAR10. We choose the standard Wide ResNet-34 and Preact-ResNet18 following previous works[37, 71]. For PGD adversarial training, we set the total epoch number as a common practice. The initial learning rate is set to 5e-2 and divided by 10 at epochs 79, 90 and 100. We use a batch size of 256, a weight decay of 5e-4 and a momentum of 0.9 for both algorithms. During evaluation, we test the robustness of the model under CW, M-PGD and 20 steps of PGD with step size and magnitude of perturbation based on norm. When performing YOPO and Free, we train the models for 40 epochs, and the initial learning rate is set to 0.2 and divided by 10 at epochs 30 and 36. As for ImageNet, we fix the total loop times to be the same as Free-4 for fair comparison. For all methods, we use a batch size of 256 and an SGD optimizer with momentum 0.9 and a weight decay of 1e-4. The initial learning rate is 0.1 and the learning rate is decayed by 10 every epochs. We also set step size and magnitude of perturbation based on norm.
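The step-wise learning rate schedules quoted above can be written as a small helper. This is only an illustration of the stated numbers; whether the decay takes effect at the start or the end of the listed epoch is not specified in the text, and the sketch assumes it applies from that epoch onward.

```python
def step_lr(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate after one decay by `gamma` for every milestone reached."""
    n_decays = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** n_decays

# PGD adversarial training on CIFAR10: 5e-2, divided by 10 at epochs 79, 90, 100.
pgd_schedule = [step_lr(e, 5e-2, (79, 90, 100)) for e in range(105)]
# YOPO and Free: 0.2, divided by 10 at epochs 30 and 36, for 40 epochs.
yopo_schedule = [step_lr(e, 0.2, (30, 36)) for e in range(40)]
```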
5.2 Attacking a Single Model
We compare the attack success rates of HMCAM with those of the FGSM family on a single network in Table II. The adversarial examples are created on each of the six networks in turn and tested on all of them. The italic columns in each block indicate white-box attacks and the others refer to black-box attacks. From Table II, we can observe that HMCAM outperforms all other FGSM family attacks by a large margin in the black-box scenario, and maintains results comparable with M-PGD on all white-box attacks. For example, HMCAM obtains success rates of 74.92% on ResNetXt29-8-64 (white-box attack), 78.37% on DenseNet121 (black-box attack on a normally trained model) and 14.11% on the adversarially trained model (black-box attack) if adversarial examples are crafted on ResNetXt29-8-64, while M-PGD only reaches the corresponding success rates of 72.81%, 42.53% and 10.11%, respectively. Considering that the white-box attack is usually used as a launch pad for the black-box attack, this demonstrates the practicality and effectiveness of our HMCAM for improving the transferability of adversarial examples.
Note that AI-FGSM is a special case of HMCAM, which means AI-FGSM only carries out the inner loop in Algorithm 2 for position and momentum updating. Even so, AI-FGSM also reaches much higher success rates than the FGSM family. This shows the superiority of our accumulated momentum strategy.
5.3 Attacking an Ensemble of Models
Although our AI-FGSM and HMCAM improve the success rates for attacking models in the black-box scenario, the results of all the attack methods on the adversarially trained model are far from satisfactory. To solve this problem, generating adversarial examples on an ensemble of models[34, 8, 67] rather than a single model has been broadly adopted in the black-box scenario to enhance transferability, and has shown its effectiveness.
For the ensemble-based strategy, each one of the six models introduced above will be selected as the hold-out model while the rest build up an ensemble model. The ensemble weights are set equally for all the six models. The results are shown in Table III. The ensemble block consists of the white-box attack which uses the ensemble model to attack itself, and the hold-out block is composed of the black-box attack that utilizes the ensemble model to generate adversarial examples for its corresponding hold-out model.
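The hold-out protocol can be sketched as follows. The text does not state whether the ensemble fuses logits, probabilities or losses; this example assumes an equally weighted average of logits, and the helper names are hypothetical.

```python
import numpy as np

def ensemble_logits(logits_list, weights=None):
    """Weighted average of per-model logits; equal weights by default."""
    logits = np.stack(logits_list)                  # (n_models, batch, classes)
    if weights is None:
        weights = np.full(len(logits_list), 1.0 / len(logits_list))
    return np.tensordot(weights, logits, axes=1)    # (batch, classes)

def hold_out_splits(names):
    """Each model in turn is held out; the remaining ones form the ensemble."""
    return [(held, [n for n in names if n != held]) for held in names]

models = ["ResNet32", "VGG16", "ResNetXt29-8-64", "DenseNet121"]
splits = hold_out_splits(models)
fused = ensemble_logits([np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]])])
```

Adversarial examples would be crafted against the fused output (white-box) and then evaluated on the held-out model (black-box).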
We can observe from Table III that our AI-FGSM and HMCAM always show much better transferability than other methods, no matter which target model is selected. For example, the adversarial examples generated by an ensemble of ResNet32, VGG16 and DenseNet121 (ResNetXt29-8-64 hold-out) can fool ResNetXt29-8-64 with an 83.07% success rate. Moreover, our proposed methods can remarkably boost the transferability of adversarial examples on the adversarially trained model.
5.4 Ablation Study on Adversarial Attack
In the following sections, we perform several ablation experiments to investigate how different aspects of HMCAM influence its effectiveness. For simplicity, we only attack the five single models introduced in the previous section, and focus on comparing our HMCAM with M-PGD, since M-PGD is one of the most effective iterative attack methods so far. We report the results in both white-box and black-box scenarios.
Influence of Iteration Number
To further demonstrate how fast our proposed method converges, we first study the influence of the total iteration number on the success rates. We clip a snippet over a time span of 10 iterations from the very beginning. Results are shown in Fig. 2.
These results indicate that (1) the success rates of HMCAM against both white-box and black-box models are higher than those of M-PGD at all stages, which, combined with the extensive comparisons in Table II, shows the strength of our HMCAM. (2) Even when the number of iterations is one order of magnitude lower than that in Table II, the success rates of both HMCAM and M-PGD are still higher than that of PGD in the black-box scenario. Moreover, HMCAM () reaches higher values than PGD (), demonstrating that HMCAM has strong attack ability and converges fast in both the white-box and black-box scenarios.
|Methods||Natural||PGD-20 Attack||M-PGD-20 Attack||CW Attack||Speed (mins)|
Influence of Step Size
We also study the influence of the step size on the success rates under both white-box and black-box settings. For simplicity, we fix the total iteration and set for HMCAM. We control the step size in the range of . The results are plotted in Fig. 3. It can be observed that HMCAM outperforms M-PGD at both small and large step sizes. Under both the white-box and the black-box settings, our HMCAM is insensitive to the step size, owing to the accumulated momentum strategy.
Fewer Samples for Competitive Results
Since HMCAM is able to explore the distribution of adversarial examples, we finally investigate what aspects of systems are strengthened by our method. We also investigate whether competitive results can be achieved with fewer samples than regular adversarial training requires. We generate adversarial images using FGSM, BIM and PGD to adversarially retrain the model and keep M-PGD as the attacker. We fix the total iteration . To test the diversity of our generated samples, we select only samples from the whole training set for generating adversarial samples, which are then mixed into the training set for adversarial training. For a fair comparison, we allow the methods other than HMCAM to select more samples satisfying . We sweep the sampling number among . The results are plotted in Fig. 4. The system trained with our HMCAM, using two orders of magnitude fewer natural samples than any other method, clearly achieves comparable robustness. Considering that the compared methods use extra samples that truly lie on the adversarial manifold, this indicates that our HMCAM indeed captures the distribution of adversarial examples with few samples.
5.5 Efficiency for Adversarial Training
In this subsection, we investigate whether the training time of adversarial training can benefit from the view of HMC, since the high computational cost of adversarial training can be attributed to the long MCMC trajectory needed to find the stationary distribution of adversarial examples. We take a fixed but small number of transitions, using the data sample as the initial value of the MCMC chains, and then use these -step MCMC samples to approximate the gradient for updating the model parameters. We calculate the deviation value of the last 5 evaluations and report the average over 5 runs. Results for Preact-ResNet18 and Wide ResNet34 on CIFAR10 are shown in Table IV and Table V, respectively. Our CAT method greatly boosts the robust accuracy at a reasonable training speed.
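The idea of starting short chains at the data sample and training on the resulting few-step samples can be illustrated with a toy sketch. It is not the CAT algorithm itself: the transitions below are plain signed-gradient steps projected onto an L-infinity ball, used as a stand-in for the HMCAM transitions, and the model is a small logistic regressor. All names (`k_step_sample`, `train_epoch`) and hyperparameter values are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_grads(w: Sequence[float], b: float, x: Sequence[float], y: int):
    """Gradients of the logistic loss w.r.t. the input, the weights and the bias."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = p - y  # derivative of the cross-entropy loss w.r.t. the logit
    grad_x = [err * wi for wi in w]
    grad_w = [err * xi for xi in x]
    return grad_x, grad_w, err


def k_step_sample(w, b, x, y, k: int, step: float, eps: float) -> List[float]:
    """Run k short transitions that start at the clean sample x."""
    adv = list(x)
    for _ in range(k):
        grad_x, _, _ = loss_grads(w, b, adv, y)
        for i, g in enumerate(grad_x):
            moved = adv[i] + step * math.copysign(1.0, g)
            # Project back into the L-infinity ball of radius eps around x.
            adv[i] = min(max(moved, x[i] - eps), x[i] + eps)
    return adv


def train_epoch(w, b, data, k=2, step=0.05, eps=0.1, lr=0.5) -> Tuple[List[float], float]:
    """One pass that updates the model on k-step samples, not converged chains."""
    w = list(w)
    for x, y in data:
        adv = k_step_sample(w, b, x, y, k, step, eps)
        _, grad_w, grad_b = loss_grads(w, b, adv, y)
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Tiny linearly separable toy set (hypothetical data).
data = [([1.0, 1.0], 1), ([-1.0, -1.0], 0), ([1.0, 0.8], 1), ([-0.9, -1.0], 0)]
w, b = [0.1, 0.1], 0.0
for _ in range(20):
    w, b = train_epoch(w, b, data)
```

The cost per parameter update grows linearly with `k`, which is why keeping the chain short trades some fidelity to the stationary distribution for training speed.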
We also present a comparison of both clean accuracy and robust accuracy per iteration for all methods evaluated during training in Fig. 5. Compared with YOPO, the robust accuracy of our CAT method rises steadily and quickly, while that of YOPO fluctuates greatly and frequently.
For ImageNet, we report the average results over the last three runs. The comparison between free adversarial training and ours is shown in Table VI. Although the 2-PGD trained ResNet-50 model still achieves the best robust accuracy, it takes three times longer than our CAT method. Given the high computational cost of ImageNet training, this performance gain can be considered inefficient or even impractical for resource-limited entities. We also compare the ResNet-50 model trained by our CAT method with the Free-4 trained one: CAT produces a much more robust model than Free-4 against different attacks in almost the same order of time.
|Methods||Clean Data||PGD-10 Attack||PGD-20 Attack||PGD-50 Attack||MI-FGSM-20 Attack||Speed (mins)|
We also investigate our CAT method on MNIST. We choose a simple ConvNet with four convolutional layers followed by three fully connected layers, which is the same as . For PGD adversarial training, we train the models for 55 epochs. The initial learning rate is set to 0.1 and reduced by a factor of 10 at epoch 45. We use a batch size of 256, a weight decay of 5e-4 and a momentum of 0.9. For evaluation, we perform PGD-40 and CW attacks against our model and set the size of perturbation as based on norm as a common practice [37, 71, 72]. Results are shown in Table VII.
|Clean Data||PGD-40 Attack||CW Attack|
5.6 Competitions and Real World Systems Attack
Attack CAAD 2018 Defense Champion
Adversarial Attacks and Defenses (CAAD) 2018 is an open competition involving an exciting security challenge, which stimulates the interest of a wide range of talents from industry and academia in adversarial learning. In the defense track of CAAD 2018, the champion solution devised new network architectures with novel non-local means blocks and a better adversarial training scheme, which greatly surpassed the runner-up approach under a strict criterion. We download the meticulously pretrained models
|Methods||10/100-step Success Rate (%)|
Attack on Public Face Recognition Systems
To further show the practical applicability of attack, we apply our HMCAM to the real-world celebrity recognition APIs in Clarifai
|Geekpwn CAAD 2018||3||0||0|
We choose 10 pairs of images from the LFW dataset and learn perturbations from a local FaceNet model to launch a targeted attack, whose goal is to mislead the API into recognizing the adversarial images as our selected identity. We randomly pick 10 celebrities as victims from Google and 10 existing celebrities as targets from LFW, ensuring that all colors and genders are taken into account. Then we apply the same strategy as the Geekpwn CAAD 2018 method, which pulls victims towards their corresponding targets by the inner product of their feature vectors and generates noise for them. Finally, we examine their categories and confidence scores by uploading these adversarial examples to the online systems' API.
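As a rough illustration of pulling a victim towards a target through the inner product of feature vectors, the sketch below maximizes that inner product with signed steps inside an L-infinity budget. It assumes a toy linear embedding `W` in place of a face feature extractor, and the function names are hypothetical; the details of the actual Geekpwn CAAD 2018 method are not given in the text.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def embed(W: Matrix, x: Sequence[float]) -> List[float]:
    """Toy linear feature extractor f(x) = W x (stand-in for a face model)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def pull_towards_target(W: Matrix, victim: Sequence[float], target: Sequence[float],
                        eps: float, step: float, iters: int) -> List[float]:
    """Increase <f(adv), f(target)> while staying within eps of the victim."""
    t_feat = embed(W, target)
    # For a linear embedding, the gradient of <W x, t> w.r.t. x is W^T t.
    grad = [sum(W[i][j] * t_feat[i] for i in range(len(W)))
            for j in range(len(victim))]
    adv = list(victim)
    for _ in range(iters):
        for j, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            moved = adv[j] + step * sign
            # Keep the perturbation inside the L-infinity budget.
            adv[j] = min(max(moved, victim[j] - eps), victim[j] + eps)
    return adv


W = [[1.0, 0.0], [0.0, 1.0]]
victim, target = [0.0, 0.0], [1.0, -1.0]
adv = pull_towards_target(W, victim, target, eps=0.2, step=0.1, iters=3)
```

With a nonlinear feature extractor the gradient would have to be recomputed at every step, for example by automatic differentiation.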
We fix and total iteration number . Besides, we also set to generate a sequence of adversarial examples to test the robustness of these online systems. Here we propose a strict evaluation criterion derived from for our HMCAM attacker, which we also call “all-or-nothing”: an attack is considered successful only if all the adversarial examples in our generated sequence can deceive the system. This is a challenging evaluation scenario. As shown in Table IX, a considerable portion of them pass the recognition of the online systems and output the results we want. The qualitative results are given in the supplementary document. Note that we also compare our HMCAM method with NAttack, one of the state-of-the-art black-box attack methods, which aims at finding a probability density distribution around the input and estimates the gradient by a modified NES method. Comparisons between NAttack and HMCAM show that the samples generated by our proposed method have stronger transferability, even though HMCAM is just a white-box attack method.
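The all-or-nothing criterion can be stated compactly in code. The sketch below is only an illustration: `predict` stands for any callable that returns the label assigned by the system under test, and treating an empty sequence as a failure is an assumption made here, not something stated in the text.

```python
from typing import Callable, Hashable, Iterable, Sequence, TypeVar

Example = TypeVar("Example")


def all_or_nothing(sequence: Iterable[Example],
                   predict: Callable[[Example], Hashable],
                   target: Hashable) -> bool:
    """Succeed only if every example in the sequence is recognized as the target."""
    examples = list(sequence)
    # An empty sequence is counted as a failed attack (assumption).
    return bool(examples) and all(predict(x) == target for x in examples)


def success_rate(sequences: Sequence[Iterable[Example]],
                 predict: Callable[[Example], Hashable],
                 targets: Sequence[Hashable]) -> float:
    """Fraction of attacked inputs whose whole sequence fools the system."""
    hits = sum(all_or_nothing(s, predict, t) for s, t in zip(sequences, targets))
    return hits / len(targets)


# Toy recognizer (hypothetical): positive numbers are labelled "A", others "B".
toy_predict = lambda x: "A" if x > 0 else "B"
rate = success_rate([[1, 2, 3], [1, -1, 2]], toy_predict, ["A", "A"])
print(rate)
```

A single failing example in a sequence makes the whole attack count as unsuccessful, which is what makes the criterion stricter than per-example success rates.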
In this paper, we formulate the generation of adversarial examples as an MCMC process and present an efficient paradigm called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM). In contrast to traditional iterative attack methods that aim to generate a single optimal adversarial example in one run, HMCAM can efficiently explore the distribution space to search for multiple solutions and generate a sequence of adversarial examples. We also develop a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples with only a few iterations by building on small modifications of the standard Contrastive Divergence. Extensive comparisons on CIFAR10 show not only that HMCAM attains much higher success rates than other methods against black-box models and comparable results against white-box models in adversarial attack, but also that CAT achieves a trade-off between efficiency and accuracy in adversarial training. When further evaluated against the champion solution in the defense track of the CAAD 2018 competition, HMCAM outperforms the official baseline attack and M-PGD. To demonstrate its practical applicability, we apply the proposed HMCAM method to investigate the robustness of real-world celebrity recognition systems and compare it against the Geekpwn CAAD 2018 method. The results show that existing real-world celebrity recognition systems are extremely vulnerable to adversarial attacks in the black-box scenario, since most examples generated by our approach can mislead the system with high confidence, which raises security concerns and calls for more robust celebrity recognition models. The proposed attack strategy leads to a new paradigm for generating adversarial examples, which can potentially be used to assess the robustness of networks and inspire stronger adversarial learning methods in the future.
Hongjun Wang (S’20) received his B.E. degree in information security from Sun Yat-Sen University, Guangzhou, China, in 2018. He is currently working toward the M.E. degree at Sun Yat-Sen University. His current research interests include computer vision and the security of machine learning, particularly adversarial attacks and defenses.
Guanbin Li (M’15) is currently an associate professor in the School of Data and Computer Science, Sun Yat-sen University. He received his PhD degree from the University of Hong Kong in 2016. His current research interests include computer vision, image processing, and deep learning. He is a recipient of the ICCV 2019 Best Paper Nomination Award. He has authored and co-authored more than 60 papers in top-tier academic journals and conferences. He serves as an area chair for the conference of VISAPP. He has been serving as a reviewer for numerous academic journals and conferences such as TPAMI, IJCV, TIP, TMM, TCyb, CVPR, ICCV, ECCV and NeurIPS.
Xiaobai Liu is currently an Associate Professor of Computer Science at San Diego State University (SDSU), San Diego. He received his PhD from the Huazhong University of Science and Technology, China. His research interests focus on scene parsing with a variety of topics, e.g. joint inference for recognition and reconstruction, commonsense reasoning, etc. He has published 60+ peer-reviewed articles in top-tier conferences (e.g. ICCV, CVPR etc.) and leading journals (e.g. TPAMI, TIP etc.). He has received a number of awards for his academic contributions, including the 2013 outstanding thesis award from the CCF (China Computer Federation). He is a member of IEEE.
Liang Lin (M’09, SM’15) is a full Professor at Sun Yat-sen University. He is an Excellent Young Scientist of the National Natural Science Foundation of China. From 2008 to 2010, he was a Post-Doctoral Fellow at the University of California, Los Angeles. From 2014 to 2015, as a senior visiting scholar, he was with The Hong Kong Polytechnic University and The Chinese University of Hong Kong. He currently leads the SenseTime R&D teams to develop cutting-edge and deliverable solutions on computer vision, data analysis and mining, and intelligent robotic systems. He has authored and co-authored more than 100 papers in top-tier academic journals and conferences. He has been serving as an associate editor of IEEE Trans. Human-Machine Systems, The Visual Computer and Neurocomputing. He has served as area/session chair for numerous conferences, such as ICME, ACCV, ICMR. He was the recipient of the Best Paper Runners-Up Award in ACM NPAR 2010, the Google Faculty Award in 2012, the Best Paper Diamond Award in IEEE ICME 2017, and the Hong Kong Scholars Award in 2014. He is a Fellow of IET.
- (2018) Defense against universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3389–3398. Cited by: §2.
- (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In ICML, pp. 274–283. Cited by: §2.
- (2018) Enhancing robustness of machine learning systems via data transformations. In 2018 52nd Annual Conference on Information Sciences and Systems (CISS), pp. 1–5. Cited by: §2.
- (2017) Exploring the space of black-box attacks on deep neural networks. arXiv preprint arXiv:1712.09491. Cited by: §1.
- (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: §1, §5.1.
- (2014) Stochastic gradient hamiltonian monte carlo. In International conference on machine learning, pp. 1683–1691. Cited by: §2.
- (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: 2nd item, §5.
- (2018) Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9185–9193. Cited by: Fig. 1, §1, §2, TABLE I, §5.3, TABLE VIII.
- (1987) Hybrid monte carlo. Physics letters B 195 (2), pp. 216–222. Cited by: §2, §3.
- (2020) GANs may have no nash equilibria. arXiv preprint arXiv:2002.09124. Cited by: §1.
- (2018) Adversarial spheres. In ICLR, Cited by: §1.
- (2015) Explaining and harnessing adversarial examples. In ICLR, Cited by: §2, TABLE I.
- (2015) Towards deep neural network architectures robust to adversarial examples. In ICLR, Cited by: §2.
- (2019) Towards GAN benchmarks which require generalization. Cited by: §1.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §5.1.
- (2019) Non-local context encoder: robust biomedical image segmentation against adversarial attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 8417–8424. Cited by: §2.
- (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems, pp. 6626–6637. Cited by: §1.
- (2002) Training products of experts by minimizing contrastive divergence. Neural computation 14 (8), pp. 1771–1800. Cited by: §1.
- (2014) The no-u-turn sampler: adaptively setting path lengths in hamiltonian monte carlo.. Journal of Machine Learning Research 15 (1), pp. 1593–1623. Cited by: §2.
- (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §5.1.
- (2018) Adversarial logit pairing. arXiv preprint arXiv:1803.06373. Cited by: §2, §5.6.1.
- (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: 1st item, §5.
- (2017) Adversarial machine learning at scale. In ICLR, Cited by: §1, §2, TABLE I.
- (1998) The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Cited by: 3rd item, §5.
- (2019) Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 656–672. Cited by: §2.
- (2018) A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pp. 7167–7177. Cited by: §2.
- (2019) ROSA: robust salient object detection against adversarial attacks. IEEE transactions on cybernetics. Cited by: §2.
- (2017) Adversarial examples detection in deep networks with convolutional filter statistics. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5764–5772. Cited by: §2.
- (2019) Nattack: learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. arXiv preprint arXiv:1905.00441. Cited by: §2, §5.6.2, TABLE IX.
- (2018) Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1787. Cited by: §2.
- (2018) Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 369–385. Cited by: §2.
- (2019) Rob-gan: generator, discriminator and adversarial attacker. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §2.
- (2019) Adv-bnn: improved adversarial defense through robust bayesian neural network. In ICLR, Cited by: §2.
- (2017) Delving into transferable adversarial examples and black-box attacks. In ICLR, Cited by: §5.3.
- (2018) Characterizing adversarial subspaces using local intrinsic dimensionality. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Cited by: §2.
- (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: Fig. 1.
- (2018) Towards deep learning models resistant to adversarial attacks. In ICLR, Cited by: Fig. 1, §1, §2, TABLE I, §4, §5.1, §5.5, TABLE IV, TABLE V, TABLE VIII.
- (1953) Equation of state calculations by fast computing machines. The journal of chemical physics 21 (6), pp. 1087–1092. Cited by: §3.1.
- (2017) On detecting adversarial perturbations. In ICLR, Cited by: §2.
- (1993) Probabilistic inference using markov chain monte carlo methods. Department of Computer Science, University of Toronto Toronto, Ontario, Canada. Cited by: §2, §3.
- (2011) MCMC using hamiltonian dynamics. Handbook of markov chain monte carlo 2 (11), pp. 2. Cited by: §2, §3.1, §3.
- (2018) Adversarial robustness toolbox v1.0.1. CoRR 1807.01069. Cited by: §5.1.
- (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. Cited by: §2.
- (2017) Extending defensive distillation. arXiv preprint arXiv:1705.05264. Cited by: §2.
- (2010) Adaptively scaling the metropolis algorithm using expected squared jumped distance. Statistica Sinica, pp. 343–364. Cited by: §2.
- (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. Cited by: §1.
- (2018) On the convergence of adam and beyond. In ICLR, Cited by: §3.3.
- (2019) Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4322–4330. Cited by: §1.
- (2015) Markov chain monte carlo and variational inference: bridging the gap. In International Conference on Machine Learning, pp. 1218–1226. Cited by: §2.
- (2019) Adversarial training for free!. In Advances in Neural Information Processing Systems, pp. 3353–3364. Cited by: §2, §5.1, TABLE IV, TABLE V.
- (2016) Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 23rd ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.
- (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §5.1.
- (2018) DARTS: deceiving autonomous cars with toxic signs. CoRR abs/1802.06430. Cited by: §1.
- (2017) A-nice-mc: adversarial training for mcmc. In Advances in Neural Information Processing Systems, pp. 5140–5150. Cited by: §2.
- (2018) PixelDefend: leveraging generative models to understand and defend against adversarial examples. In ICLR, Cited by: §1.
- (2019) Disentangling adversarial robustness and generalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6976–6987. Cited by: §1.
- (2014) Intriguing properties of neural networks. In ICLR, Cited by: §1, §2.
- (2016) A boundary tilting persepective on the phenomenon of adversarial examples. CoRR abs/1608.07690. Cited by: §1.
- (2018) Attacks meet interpretability: attribute-steered detection of adversarial samples. In Advances in Neural Information Processing Systems, pp. 7717–7728. Cited by: §2.
- (2002) Image segmentation by data-driven markov chain monte carlo. IEEE Transactions on pattern analysis and machine intelligence 24 (5), pp. 657–673. Cited by: §2.
- (2019) CamDrop: a new explanation of dropout and a guided regularization method for deep neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1141–1149. Cited by: §2.
- (2020-06) Transferable, controllable, and inconspicuous adversarial attacks on person re-identification with deep mis-ranking. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §1.
- (2019) Detecting overfitting of deep generative networks via latent recovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11273–11282. Cited by: §1.
- (2011) Natural evolution strategies. Cited by: §5.6.2.
- (2018) Mitigating adversarial effects through randomization. In ICLR, Cited by: §2.
- (2019) Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 501–509. Cited by: §1, §2, §5.6.1, §5.6.2.
- (2019) Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2730–2739. Cited by: §5.3.
- (2018) Cooperative learning of energy-based model and latent variable model via mcmc teaching. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §2.
- (2017) Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492–1500. Cited by: §5.1.
- (2018) Detecting adversarial perturbations with saliency. In 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP), pp. 271–275. Cited by: §2.
- (2019) You only propagate once: accelerating adversarial training via maximal principle. In Advances in Neural Information Processing Systems, pp. 227–238. Cited by: §2, §5.1, §5.5, TABLE IV, TABLE V.
- (2019) Theoretically principled trade-off between robustness and accuracy. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pp. 7472–7482. Cited by: §5.5.
- (2019) Distributionally adversarial attack. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 2253–2260. Cited by: §1, §2, §5.6.1, TABLE VIII.