On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses

Anish Athalye    Nicholas Carlini
Abstract

Neural networks are known to be vulnerable to adversarial examples. In this note, we evaluate the two white-box defenses that appeared at CVPR 2018 and find they are ineffective: when applying existing techniques, we can reduce the accuracy of the defended models to 0%.

Machine Learning, ICML
\pdfstringdefDisableCommands

1 Introduction

Training neural networks so they will be robust to adversarial examples (Szegedy et al., 2013) is a major challenge. Two defenses that appear at CVPR 2018 attempt to address this problem: “Deflecting Adversarial Attacks with Pixel Deflection” (Prakash et al., 2018) and “Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser” (Liao et al., 2018).

In this note, we show these two defenses are not effective in the white-box threat model. We construct adversarial examples that reduce the classifier accuracy to on the ImageNet dataset (Deng et al., 2009) when bounded by a small perturbation of , a stricter bound than considered in the original papers. Our attacks can construct targeted adversarial examples with over success.

Our methods are a direct application of existing techniques.

2 Background

We assume familiarity with neural networks, adversarial examples (Szegedy et al., 2013), generating strong attacks against adversarial examples (Madry et al., 2018), and computing adversarial examples for neural networks with non-differentiable layers (Athalye et al., 2018). We briefly review the key details and notation.

Adversarial examples (Szegedy et al., 2013) are instances that are very close to an instance with respect to some distance metric ( distance, in this paper), but where the classification of is not the same as the classification of . Targeted adversarial examples are instances whose label is equal to a given target label .

We examine two defenses: Pixel Deflection and High-level Representation Guided Denoiser. We are grateful to the authors of these defenses for releasing their source code and pre-trained models.

Figure 1: Original images from ImageNet validation set (row 1). Targeted adversarial examples (with randomly chosen targets) for Pixel Deflection (row 2) and High-level representation Guided Denoiser (row 3), with a perturbation of .

Pixel Deflection (Prakash et al., 2018) proposes a non-differentiable preprocessing of inputs. Some pixels (a tunable hyperparameter) are randomly replaced with near-by pixels. This resulting image is often noisy, and to restore accuracy, a denoising operation is applied.

High-level representation Guided Denoiser (HGR) (Liao et al., 2018) proposes denoising inputs using a trained neural network before passing them to a standard classifier. This denoiser is a differentiable, non-randomized neural network. This defense has also been evaluated by Uesato et al. (2018) and found to be ineffective.

2.1 Methods

We evaluate these defenses under the white-box threat model. We generate adversarial examples with Projected Gradient Descent (PGD) (Madry et al., 2018) maximizing the cross-entropy loss and bounding distortion by .

What is the right threat model to evaluate against? Many papers only claim white-box security against an attacker who is completely unaware the defense is being applied. HGD, for example, says “the white-box attacks defined in this paper should be called oblivious attacks according to Carlini and Wagner’s definition” (Liao et al., 2018).

Unfortunately, security against oblivious attacks is not useful. We only defined this threat model in our prior work (Carlini & Wagner, 2017) to study the case of an extremely weak attacker, to show that some defenses are not even robust under this model. Furthermore, many previously published schemes already achieve security against oblivious attacks. In practice, any serious attacker would certainly consider the possibility that a defense is in place and try to circumvent it, if there is a reasonable way to do so.

Thus, security against oblivious attacks is far from sufficient to be interesting or useful in practice. Even the black-box threat model allows for an attacker to be aware that the defense is being applied, and only holds the exact parameters of the defense as private data. Also, our experience is that schemes that are insecure against white-box attacks also tend to be insecure against black-box attacks (Carlini & Wagner, 2017). Accordingly, in this note, we evaluate schemes against white-box attacks.

3 Methodology

3.1 Pixel Deflection

We now show that Pixel Deflection is not robust. We analyze the defense as implemented by the authors 111https://github.com/iamaaditya/pixel-deflection. Our evaluation code is publicly available 222https://github.com/carlini/pixel-deflection.

We apply BPDA (Athalye et al., 2018) to Pixel Deflection for its non-differentiable replacement operation. Our attack reduces the accuracy of the defended classifier to .

In a targeted setting, we succeed with probability. (Because the defense is randomized, we report success only if the image is classified as the adversarial target label times out of .)

3.2 High-Level Representation Guided Denoiser

Next, we show that using a High-level representation Guided Denoiser is not robust in the white-box threat model. We analyze the defense as implemented by the authors 333https://github.com/lfz/Guided-Denoise. Our evaluation code is publicly available 444https://github.com/anishathalye/Guided-Denoise.

We apply PGD (Madry et al., 2018) end-to-end with no modification. It reduces the accuracy of the defended classifier to and achieves success at generating targeted adversarial examples.

4 Conclusion

As this note demonstrates, Pixel Deflection and High-level representation Guided Denoiser (HGD) are not robust to adversarial examples.

Acknowledgements

We are grateful to Aleksander Madry and David Wagner for comments on an early draft of this paper.

We thank Aaditya Prakash and Fangzhou Liao for discussing their defenses with us, and we thank the authors of both papers for releasing source code and pre-trained models.

References

  • Athalye et al. (2018) Athalye, Anish, Carlini, Nicholas, and Wagner, David. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
  • Carlini & Wagner (2017) Carlini, Nicholas and Wagner, David. Adversarial examples are not easily detected: Bypassing ten detection methods. AISec, 2017.
  • Deng et al. (2009) Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, and Fei-Fei, Li. Imagenet: A large-scale hierarchical image database. In CVPR, pp. 248–255. IEEE, 2009.
  • Liao et al. (2018) Liao, Fangzhou, Liang, Ming, Dong, Yinpeng, Pang, Tianyu, Zhu, Jun, and Hu, Xiaolin. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, 2018.
  • Madry et al. (2018) Madry, Aleksander, Makelov, Aleksandar, Schmidt, Ludwig, Tsipras, Dimitris, and Vladu, Adrian. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018.
  • Prakash et al. (2018) Prakash, Aaditya, Moran, Nick, Garber, Solomon, DiLillo, Antonella, and Storer, James. Deflecting adversarial attacks with pixel deflection. In CVPR, 2018.
  • Szegedy et al. (2013) Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian, and Fergus, Rob. Intriguing properties of neural networks. ICLR, 2013.
  • Uesato et al. (2018) Uesato, Jonathan, O’Donoghue, Brendan, Oord, Aaron van den, and Kohli, Pushmeet. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666, 2018.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
230573
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description