AdvGAN++ : Harnessing latent layers for adversary generation
Adversarial examples are fabricated examples, indistinguishable from the original images, that mislead neural networks and drastically lower their performance. The recently proposed AdvGAN, a GAN-based approach, takes the input image as a prior for generating adversaries that target a model. In this work, we show that latent features can serve as better priors than input images for adversary generation, and propose AdvGAN++, a version of AdvGAN that achieves higher attack success rates than AdvGAN while generating perceptually realistic images on the MNIST and CIFAR-10 datasets.
1 Introduction and Related Work
Deep Neural Networks (DNNs) have become a common ingredient in solving tasks such as classification, object recognition, segmentation, reinforcement learning, and speech recognition. However, recent works [18, 4, 15, 13, 19, 6] have shown that DNNs can be easily fooled by carefully fabricated examples that are indistinguishable from the original input. Such fabricated examples, known as adversarial examples, mislead neural networks by drastically changing their latent features and hence their output.
Adversarial attacks are broadly classified into white-box and black-box attacks. White-box attacks such as FGSM and DeepFool have access to the full target model. In black-box attacks, by contrast, the attacker does not have access to the structure or parameters of the target model and only observes the labels it assigns to the selected input images.
Gradient-based attack methods such as the Fast Gradient Sign Method (FGSM) obtain an optimal max-norm constrained perturbation
$$\eta = \epsilon \, \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right),$$
where $J$ is the cost function and the gradient is computed with respect to the input example $x$.
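As a concrete illustration, here is a minimal PyTorch sketch of a single-step FGSM attack; `model`, `x`, `y` and the value of `epsilon` are placeholders rather than the exact setup used in this paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    """One-step FGSM: perturb x by epsilon in the direction of the sign
    of the gradient of the loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Max-norm constrained perturbation: eta = epsilon * sign(grad_x J)
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```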
Optimization-based methods like Carlini-Wagner optimize the adversarial perturbation subject to several constraints. This approach targets the $L_0$, $L_2$ and $L_\infty$ distance metrics for attack purposes. The optimization objective used in the approach makes it slow, since it optimizes one perturbation instance at a time.
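For contrast with the single-step FGSM above, the following is a simplified sketch of an optimization-based $L_2$ attack in the spirit of Carlini-Wagner; the margin loss and the constant `c` are illustrative stand-ins, not the exact objective of the original method.

```python
import torch
import torch.nn.functional as F

def l2_optimization_attack(model, x, y, steps=100, lr=0.01, c=1.0):
    """Iteratively optimize a perturbation delta, trading off L2 distortion
    against a margin loss that pushes the true-class logit below the others."""
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).clamp(0.0, 1.0))
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # Mask out the true class to find the strongest competing logit.
        masked = logits - 1e9 * F.one_hot(y, logits.size(1)).float()
        other_logit = masked.max(dim=1).values
        margin = torch.clamp(true_logit - other_logit, min=0.0)
        loss = (delta.flatten(1).norm(dim=1) ** 2 + c * margin).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (x + delta).clamp(0.0, 1.0).detach()
```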
In contrast to these, AdvGAN uses a GAN with an encoder-decoder based generator to produce perceptually more realistic adversarial examples that stay close to the original distribution. The generator produces an adversarial perturbation when an original image instance $x$ is provided as input, while the discriminator tries to distinguish the adversarial image from the original instance $x$. Apart from the standard GAN loss, AdvGAN uses a hinge loss to bound the magnitude of the maximum perturbation and an adversarial loss to steer the generated image towards being adversarial. Although AdvGAN is able to generate realistic examples, it fails to exploit latent features as priors, which have recently been shown to be more susceptible to adversarial perturbations.
Our contributions in this work are:
- We show that latent features serve as a better prior for adversary generation than the whole input image for untargeted attacks, thereby utilizing the observation from prior work and at the same time eliminating the need for an encoder-decoder based generator, which reduces training/inference overhead.
- Through quantitative and qualitative evaluation, we show that our adversarial examples look perceptually very similar to the real images and achieve higher attack success rates than AdvGAN.
2.1 Problem definition
Given a model $M$ that accurately maps an image $x$, sampled from a distribution $p_{\text{data}}$, to its corresponding label $y$, we train a generator $G$ to generate an adversary $x_{adv}$ of the image $x$ using its feature map $f(x)$ (extracted from a feature extractor) as prior. Mathematically:
$$M(x_{adv}) \neq y \quad \text{s.t.} \quad x_{adv} = G(z, f(x)), \;\; \lVert x_{adv} - x \rVert_\infty \leq \epsilon,$$
where $f$ represents a feature extractor and $\epsilon$ is the maximum magnitude of perturbation allowed.
2.2 Harnessing latent features for adversary generation
We now propose our attack, AdvGAN++, which takes the latent feature map of the original image as the prior for adversary generation. Figure 1 shows the architecture of our proposed network. It contains the target model $M$, a feature extractor $f$, a generator network $G$ and a discriminator network $D$. The generator receives the feature map $f(x)$ of an image $x$ together with a noise vector $z$ (as a concatenated vector) and generates an adversary $x_{adv}$ corresponding to $x$. The discriminator $D$ distinguishes the distribution of the generator output from the actual data distribution $p_{\text{data}}$. In order to fool the target model $M$, the generator minimizes $M_y(x_{adv})$, the softmax probability that the adversary $x_{adv}$ is assigned to the true class $y$. To bound the magnitude of the perturbation, we also minimize the $L_2$ loss between the adversary $x_{adv}$ and the original image $x$. The final loss function is expressed as:
$$\mathcal{L} = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(G(z, f(x)))\right)\right] + \alpha\,\mathbb{E}_{x,z}\left[M_y(G(z, f(x)))\right] + \beta\,\mathbb{E}_{x,z}\left[\lVert G(z, f(x)) - x\rVert_2\right].$$
Here $\alpha$, $\beta$ are hyper-parameters that control the weight of each objective. The feature map $f(x)$ is extracted from one of the intermediate convolutional layers of the target model $M$. By solving the min-max game $\arg\min_G \max_D \mathcal{L}$ we obtain optimal parameters for $G$ and $D$. The training procedure thus ensures that we learn to generate adversarial images, close to the input distribution, that harness the susceptibility of latent features to adversarial perturbations. Algorithm 1 summarizes the training procedure of AdvGAN++.
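As a rough illustration of Algorithm 1, the sketch below implements one AdvGAN++ training step in PyTorch under the loss above; the module names `G`, `D`, `target_model`, the feature extractor `f`, and the way features and noise are concatenated are assumptions for illustration, and a non-saturating GAN loss is used for stability.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, target_model, f, x, y, opt_G, opt_D,
               alpha=1.0, beta=1.0, z_dim=100):
    """One AdvGAN++ update: D separates real from generated images, while G
    fools D and the target model and keeps the adversary close to x in L2."""
    z = torch.randn(x.size(0), z_dim, device=x.device)
    # One possible way to combine the prior and the noise: flatten the
    # feature map and concatenate it with z before feeding the generator.
    g_in = torch.cat([f(x).flatten(1), z], dim=1)
    x_adv = G(g_in)

    # Discriminator update (assuming D outputs a probability in (0, 1)).
    d_real, d_fake = D(x), D(x_adv.detach())
    loss_D = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator update: GAN loss + adversarial loss + L2 perturbation bound.
    d_fake = D(x_adv)
    gan_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    probs = F.softmax(target_model(x_adv), dim=1)
    adv_loss = probs.gather(1, y.unsqueeze(1)).mean()        # prob. of true class
    pert_loss = (x_adv - x).flatten(1).norm(dim=1).mean()    # L2 distance to x
    loss_G = gan_loss + alpha * adv_loss + beta * pert_loss
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_G.item(), loss_D.item()
```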
Table 1: Attack success rate (%) of AdvGAN and AdvGAN++ against adversarially trained target models.

| Data | Target Model | Defense | AdvGAN | AdvGAN++ |
|---|---|---|---|---|
| MNIST | LeNet C | FGSM adv. training | 18.7 | 20.02 |
| MNIST | LeNet C | Iter. FGSM adv. training | 13.5 | 27.31 |
| CIFAR-10 | ResNet-32 | FGSM adv. training | 16.03 | 29.36 |
| CIFAR-10 | ResNet-32 | Iter. FGSM adv. training | 14.32 | 32.34 |
| CIFAR-10 | Wide-ResNet-34-10 | FGSM adv. training | 14.26 | 26.12 |
| CIFAR-10 | Wide-ResNet-34-10 | Iter. FGSM adv. training | 13.94 | 43.2 |
3 Experiments
In this section we evaluate the performance of AdvGAN++, both quantitatively and qualitatively. We start by describing the datasets and model architectures, followed by implementation details and results.
Datasets and Model Architectures: We perform experiments on the MNIST and CIFAR-10 datasets, training AdvGAN++ on the training set and evaluating on the test set. For MNIST, we use LeNet architecture C as our target model. For CIFAR-10, we report results on ResNet-32 and Wide-ResNet-34-10.
3.1 Implementation details
We use encoder- and decoder-based architectures for the discriminator and generator, respectively. For the feature extractor $f$ we use the last convolutional layer of the target model $M$. The Adam optimizer with learning rate 0.01, $\beta_1 = 0.5$ and $\beta_2 = 0.99$ is used to optimize both the generator and the discriminator. We sample the noise vector $z$ from a normal distribution and use label smoothing to stabilize the training procedure.
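A minimal sketch of how such a feature extractor and the optimizers could be set up in PyTorch is shown below; the hook-based extractor and the layer name in the usage comment are assumptions, while the Adam settings mirror the ones stated above.

```python
import torch

class LastConvFeatures:
    """Caches the output of a chosen layer of the target model so it can be
    fed to the generator as the latent-feature prior f(x)."""
    def __init__(self, target_model, layer):
        self.model = target_model
        self.features = None
        layer.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        self.features = output

    def __call__(self, x):
        with torch.no_grad():  # the prior needs no gradient through M
            self.model(x)
        return self.features

# Hypothetical usage: `target_model.layer3` stands in for the last conv block.
# f = LastConvFeatures(target_model, target_model.layer3)
# opt_G = torch.optim.Adam(G.parameters(), lr=0.01, betas=(0.5, 0.99))
# opt_D = torch.optim.Adam(D.parameters(), lr=0.01, betas=(0.5, 0.99))
```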
Attack under no defense: We compare the attack success rate of examples generated by AdvGAN and AdvGAN++ on target models without any defense strategy applied to them. The results in Table 2 show that, with much less training/inference overhead, AdvGAN++ performs better than AdvGAN.
Attack under defense: We perform experiments to compare the attack success rate of AdvGAN++ with AdvGAN when the target model is trained using various defense mechanisms such as FGSM adversarial training, iterative FGSM adversarial training and ensemble adversarial training. For this, we first generate adversarial examples using the original model as target (without any defense) and then evaluate the attack success rate of these adversarial examples on the same model, now trained using one of the aforementioned defense strategies. Table 1 shows that AdvGAN++ performs better than AdvGAN under the various defense settings.
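A possible sketch of this evaluation, assuming the adversarial examples were already generated against the undefended model, is given below; the data-loader and model names are placeholders.

```python
import torch

@torch.no_grad()
def attack_success_rate(defended_model, adv_loader):
    """Fraction (in %) of pre-generated adversarial examples that the
    defended model misclassifies; adv_loader yields (x_adv, true_label)."""
    defended_model.eval()
    fooled, total = 0, 0
    for x_adv, y in adv_loader:
        preds = defended_model(x_adv).argmax(dim=1)
        fooled += (preds != y).sum().item()
        total += y.size(0)
    return 100.0 * fooled / total
```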
Visual results: Figure 2 shows adversarial images generated by AdvGAN++ on the MNIST and CIFAR-10 datasets, illustrating the ability of AdvGAN++ to generate perceptually realistic adversarial images.
Transferability to other models: Table 3 shows the attack success rate of adversarial examples generated by AdvGAN++ and evaluated on a different model trained for the same task. From the table we can see that the adversaries produced by AdvGAN++ transfer significantly to other models performing the same task, and can therefore also be used to attack a model in a black-box fashion.
Table 3: Transferability of adversarial examples generated by AdvGAN++.

| Data | Target Model | Other Model | Attack Success Rate |
|---|---|---|---|
| MNIST | LeNet C | LeNet B | 20.24 |
4 Conclusion
In this work, we study the gaps left by AdvGAN, focusing mainly on the observation that latent features are more prone to alteration by adversarial noise than the input image. This vulnerability makes latent features a better candidate as the starting point for generation, and allows us to propose a generator that directly converts latent features into an adversarial image, which not only reduces training time but also increases the attack success rate.
References
- (2017) Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 39–57.
- (2014) Generative adversarial networks.
- (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
- (2017) Adversarial examples for malware detection. In Computer Security – ESORICS 2017, S. N. Foley, D. Gollmann and E. Snekkenes (Eds.), Cham, pp. 62–79.
- (2015) Deep residual learning for image recognition.
- (2017) Adversarial attacks on neural network policies. CoRR abs/1702.02284.
- (2016) Image-to-image translation with conditional adversarial networks.
- CIFAR-10 (Canadian Institute For Advanced Research).
- (2016) Adversarial examples in the physical world. CoRR abs/1607.02533.
- (2010) MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/
- (2014) Conditional generative adversarial nets.
- (2016) DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, New York, NY, USA, pp. 1528–1540.
- (2019) Harnessing the vulnerability of latent layers in adversarially trained models.
- (2018) Targeted adversarial examples for black box audio systems. CoRR abs/1805.07820.
- (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204.
- (2018) Generating adversarial examples with adversarial networks. In IJCAI.
- (2017) Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision.
- (2018) Adaptive adversarial attack on scene text recognition. CoRR abs/1807.03326.
- (2016) Wide residual networks.