AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation
Due to the inherent robustness of segmentation models, traditional norm-bounded attack methods show limited effect on such models. In this paper, we focus on generating unrestricted adversarial examples for semantic segmentation models. We demonstrate a simple and effective method to generate unrestricted adversarial examples using conditional generative adversarial networks (CGANs) without any hand-crafted metric. A naïve implementation of a CGAN, however, yields inferior image quality and a low attack success rate. Instead, we leverage the SPADE (Spatially-adaptive denormalization) structure with an additional loss term to generate effective adversarial attacks in a single step. We validate our approach on the popular Cityscapes and ADE20K datasets, and demonstrate that our synthetic adversarial examples are not only realistic, but also improve the attack success rate by up to 41.0% compared with state-of-the-art adversarial attack methods including PGD.
Despite their impressive accuracy and wide adoption, deep learning (DL) models remain fragile to adversarial attacks [41, 5, 32], which raises serious concerns about deploying them in real-world applications, especially in safety- and security-critical systems.
Extensive efforts have been made to combat these adversarial attacks: robust models are trained such that they are not easily evaded by adversarial examples [15, 33, 27]. Although these defense methods improve the models' robustness, they are mostly limited to addressing norm-bounded attacks such as PGD, in which humans are not supposed to be able to differentiate between original clean images and adversarial examples. However, two similar but non-identical images may also carry consistent semantic information from a human perspective; e.g., a slightly rotated variant of an image is semantically similar to the original one. Since such changes (e.g., lighting conditions, certain textures, rotation, etc.) do not interfere with human perception, a robust ML model should also remain invariant to these realistic changes.
However, such realistic variants of clean images tend to have a high attack success rate against robust models designed to defend against norm-bounded attacks. Thus, realistic adversarial attacks beyond the norm bound remain a major concern for those robust models, which spurs extensive efforts to explore stronger and more realistic adversarial attacks, e.g., using Wasserstein distance bounds, realistic image transformations, etc. In particular, Song et al. propose unrestricted adversarial attacks using a conditional GAN for image classification models, a big step toward realistic attacks beyond human-crafted constraints. However, due to their model design, they are mostly restricted to low-resolution images; at high resolution, the generated images are not very realistic.
The problem of achieving realistic adversarial attacks and defenses aggravates further for more difficult visual recognition tasks such as semantic segmentation, where one needs to attack orders of magnitude more pixels while maintaining a consistent human perception. It is essential to make segmentation models robust against adversarial attacks, especially given their applicability in autonomous driving, medical imaging [36, 38], and computer-aided diagnosis systems. Unfortunately, we show that existing attack methods primarily designed for simple classification tasks do not generalize well to semantic segmentation. Besides, the inherent robustness of segmentation models makes existing adversarial attack methods less effective. For instance, following prior work, we show that for segmentation tasks the norm-bounded perturbation becomes visible to humans, since larger bounds are required to launch a successful attack. Unrestricted adversarial attacks, on the other hand, are not constrained by a norm-bounded budget and can expose more vulnerabilities of a given machine learning model. However, the quality and resolution of the unrestricted adversarial images generated by existing methods are low and limited to simple images like handwritten digits.
In this paper, we present the first realistic unrestricted adversarial attack, AdvSPADE, for semantic segmentation models. Figure 1 illustrates the effectiveness of our proposed method. To generate realistic images, we use SPADE, a state-of-the-art conditional generative model, to generate high-resolution images (up to 1 million pixels). We then add an adversarial loss term to the original SPADE architecture to fool the target model. Thus, we can create a wide variety of adversarial examples from a single image in a single step. Empirical results show that we can generate more realistic and more successful adversarial attacks than existing norm-bounded or GAN-based attacks. We further show that augmenting the training data with such realistic adversarial examples has the potential to improve the models' robustness.
This paper makes the following contributions: (1) We propose a new realistic attack for semantic segmentation which defeats existing state-of-the-art robust models. We demonstrate the existence of a rich variety of unrestricted adversarial examples beyond the previously known ones. (2) We demonstrate that augmenting the training dataset with our new adversarial examples has the potential to improve the robustness of existing models. (3) We present an empirical evaluation of our approach on two popular semantic segmentation datasets. First, we evaluate the quality of our generated adversarial examples using Amazon Mechanical Turk and demonstrate that our samples are indistinguishable from natural images for humans. Second, we further show that our adversarial samples improve the attack success rate by up to 41.0%.
2 Related Work
Semantic Segmentation. Semantic segmentation can be considered a multi-output classification task that provides more fine-grained information in the prediction. Although plenty of network architectures have been proposed to address the semantic segmentation task efficiently [36, 25, 6, 3], very few studies [1, 48] look into the robustness of this class of networks against adversarial examples.
Adversarial Attacks. Adversarial examples are carefully crafted to mislead DL models' predictions while still being perceived by humans as carrying semantics identical or similar to the original images. Researchers have proposed multiple methods for generating adversarial examples for image classification tasks [15, 23, 5, 27], where the target model is fooled by the adversarial images. Hand-crafted metrics, such as norm bounds [27, 14] and the Wasserstein distance, are applied during generation to preserve the semantic meaning of the adversarial examples to humans.
Generative Adversarial Networks (GANs), popular for image manipulation [50, 24, 18, 53], image-to-image translation [20, 43, 35], etc., have also been leveraged to generate adversarial examples. Xiao et al. proposed to generate norm-bounded perturbations for the classification task. Song et al. used GANs to generate unrestricted adversarial attacks for image classification. By leveraging an Auxiliary Classifier Generative Adversarial Network (AC-GAN), their model was able to generate low-quality adversarial examples from scratch, beyond any norm bound. Later work proposed a new generative model to learn a transformation between a pre-trained GAN and an adversarial one. Concurrent to our work, other researchers leveraged generative models to produce adaptive unrestricted examples for classification. Compared to restricted image manipulation or local image editing attacks, one of the key features of these unrestricted attacks is that they can generate more realistic images semantically similar (though not necessarily identical) to the original benign images. Thus, these images can fool even robust ML models that withstand norm-bounded restricted adversarial examples.
However, these GAN-based attack methods mainly focus on classification tasks and, as we will show in Section 5, do not generalize well to segmentation tasks. The work closest to ours, Song et al., adopts a two-step procedure: first train an AC-GAN to generate benign images, then optimize an adversarial loss objective function w.r.t. the input noise of the AC-GAN. In this setting, the variety of unrestricted adversarial examples is restricted by the dimension of the input noise vector, usually fixed to a small number (10, 100, or 256). Thus, such attacks tend not to work well on high-resolution datasets. In contrast, we combine image generation and unrestricted adversarial example generation into a single stage by adding an adversarial loss term to the objective function and optimizing them jointly. This leads to a better attack success rate and a larger mIoU drop than Song's method, as shown in Section 5.
A few studies have focused on adversarial attacks against modern semantic segmentation networks. One conducted the first systematic analysis of the effect of multiple adversarial attack methods on different modern semantic segmentation architectures across two large-scale datasets. Another proposed an attack method called Dense Adversary Generation, which generates a group of adversarial examples for a set of state-of-the-art segmentation and detection networks. However, all of these attack methods rely on norm-bounded perturbations, which cover only a small fraction of all feasible adversarial examples. Later work further advances restricted attacks using surrogate loss functions for semantic segmentation, making the non-differentiable IoU loss approximately differentiable; however, due to the computational expense, traditional norm-bounded attack methods are more widely used in practice.
Defense Methods. Adversarial training is the state-of-the-art method for training robust classifiers [15, 26, 37, 41, 17, 34, 49, 27]. Besides, other work evaluated input-transformation defenses, including rescaling, JPEG compression, Gaussian blur, HSV jitter, and grayscale conversion, against adversarial attacks on semantic segmentation networks. These input transformation methods, however, were shown to rely on obfuscated gradients and give a false sense of robustness. Thus, the robustness of models trained with adversarial training was endorsed. In this paper, we demonstrate that robustness can be improved by training with examples generated by our method.
3 Generating Adversarial Examples
We will now introduce our methodology for generating unrestricted adversarial examples for semantic segmentation. For this purpose, we leverage a conditional Generative Adversarial Network, SPADE. The main goal of a standard conditional GAN is to synthesize realistic images that fool the discriminator. Generating an adversarial attack, however, also requires fooling the segmentation model under attack. Hence, we add an additional loss function to fool both the discriminator and the segmentation model. Figure 2 shows the overall workflow. The rest of this section describes the steps in detail.
Unrestricted Adversarial Examples. Consider a set of images I, and let C be the set of all possible categories for I. Suppose o is an oracle that maps any image from its domain to C correctly. A classification model f can also provide a class prediction for any given image in I. Under the assumption that the domain of f contains the domain of o, an unrestricted adversarial example is any image x that meets the following requirements: x lies in the domain of o, and o(x) ≠ f(x).
Conditional Generative Adversarial Networks. A Conditional Generative Adversarial Network consists of a generator G and a discriminator D, both conditioned on auxiliary information c. Combining random noise z and the extra information c as input, G learns to map them to a realistic image. The discriminator aims to distinguish real images from the generator's synthetic images. G and D play a minimax two-player game, which can be formalized as

min_G max_D E_{x ~ p_data}[log D(x | c)] + E_{z ~ p_z}[log(1 − D(G(z | c) | c))].
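For intuition, the following minimal NumPy sketch (not part of the paper's implementation) evaluates a Monte-Carlo estimate of this value function given discriminator scores on real and generated batches:

```python
import numpy as np

def cgan_value(d_real, d_fake):
    """Estimate of the cGAN value function
    E[log D(x|c)] + E[log(1 - D(G(z|c)|c))]
    from discriminator outputs on real and generated batches."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# D is rewarded for scoring real images near 1 and fakes near 0,
# so a confident discriminator yields a larger (less negative) value.
confident = cgan_value([0.9, 0.95], [0.1, 0.05])
unsure = cgan_value([0.5, 0.5], [0.5, 0.5])
assert confident > unsure
```

The discriminator ascends this quantity while the generator descends it, which is what the min-max notation expresses.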
Unrestricted Adversarial Loss. We design an adversarial loss term for unrestricted adversarial example generation. We mainly focus on the untargeted attack in this paper, though our approach is general and can simply be applied to targeted attacks. Intuitively, the SPADE generator is trained to mislead the prediction of the target segmentation network. The synthetic images are not only required to fool the discriminator of the conditional GAN but also need to be mis-segmented by the target segmentation network. To achieve this goal, we introduce the target segmentation network into the training phase and aim to maximize the loss of the segmentation model while keeping the quality and semantic meaning of the synthetic images. We denote the target segmentation network by S, the SPADE generator by G, the input semantic label by m, and the input random vector by z. The untargeted version of the Unrestricted Adversarial Loss can then be written as

L_adv = −E_z[ℓ(S(G(z, m)), m)],

where ℓ is the segmentation objective; minimizing L_adv maximizes the segmentation loss on the synthetic images.
We select Dice Loss as the objective function ℓ. An image encoder processes a real image and outputs a mean vector μ and a variance vector; the noise input z is then computed via the reparameterization trick, z = μ + σ ⊙ ε with ε ~ N(0, I).
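A minimal NumPy sketch of these two ingredients, assuming a one-hot target layout of shape (H, W, K); the actual AdvSPADE implementation operates on network tensors:

```python
import numpy as np

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss between predicted class probabilities
    probs (H, W, K) and a one-hot target (H, W, K)."""
    inter = np.sum(probs * target)
    union = np.sum(probs) + np.sum(target)
    return 1.0 - (2.0 * inter + eps) / (union + eps)

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# A perfect, confident prediction gives (near-)zero Dice loss;
# a completely wrong one gives a loss near 1.
target = np.eye(3)[np.array([[0, 1], [2, 1]])]   # (2, 2, 3) one-hot map
assert dice_loss(target, target) < 1e-5
assert dice_loss(1.0 - target, target) > 0.9
z = reparameterize(np.zeros(3), np.zeros(3), np.random.default_rng(0))
assert z.shape == (3,)
```

Because the adversarial term negates this loss, the generator is pushed toward images whose Dice score against the conditioning label map is as bad as possible.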
The complete objective function of AdvSPADE can then be written as the weighted sum

L = L_GAN + λ_FM L_FM + λ_VGG L_VGG + λ_KLD L_KLD + λ_adv L_adv,

where the λ terms balance the individual losses.
We follow the definitions of the feature matching loss and the perceptual loss from SPADE. The feature matching loss, L_FM, stabilizes GAN training by minimizing the distance between features of synthetic and real images taken from multiple layers of the discriminator. The VGG perceptual loss (L_VGG) plays a similar role to L_FM by introducing a pretrained VGG network. For L_KLD, we borrow the definition from the variational autoencoder: it is the KL divergence between the variational distribution q(z|x) and the prior distribution p(z), a standard Gaussian. To speed up the generation process and improve the quality of the synthesized images, we adopt spatially-adaptive denormalization.
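The KL term for a diagonal-Gaussian encoder against a standard Gaussian prior has the usual VAE closed form, which can be sketched as:

```python
import numpy as np

def kld_loss(mu, logvar):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), as in the VAE
    objective: 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0)

# The KL term vanishes exactly when the encoder matches the prior.
assert kld_loss(np.zeros(4), np.zeros(4)) == 0.0
# mu = 1, sigma = 1 contributes 0.5 per dimension.
assert kld_loss(np.ones(4), np.zeros(4)) == 2.0
```

This term keeps the encoder's posterior close to the prior so that sampling z from N(0, I) at test time still produces plausible styles.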
Spatially-adaptive denormalization. Our model uses the SPADE architecture as the conditional GAN, in which Batch Normalization is replaced with spatially-adaptive denormalization. This method has been shown to preserve semantic segmentation information that would otherwise be lost during subsampling. Please refer to the supplementary materials for details.
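A simplified NumPy sketch of the mechanism: activations are normalized, then scaled and shifted by parameters that depend on the semantic map at each pixel. Here a per-class lookup table stands in for the small convolutional networks that SPADE actually learns to produce the modulation parameters:

```python
import numpy as np

def spade_denorm(x, seg_map, gamma_table, beta_table, eps=1e-5):
    """Spatially-adaptive denormalization (sketch).

    x:          activations of shape (C, H, W)
    seg_map:    integer semantic map of shape (H, W)
    gamma_table, beta_table: per-class modulation of shape (K, C);
    in real SPADE these come from conv nets over the label map.
    """
    # Normalize each channel (batch-norm style, single sample).
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / (std + eps)
    # Look up a (C, H, W) modulation from the semantic map, so the
    # scale and shift vary spatially with the layout.
    gamma = gamma_table[seg_map].transpose(2, 0, 1)
    beta = beta_table[seg_map].transpose(2, 0, 1)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4))
seg = rng.integers(0, 3, size=(4, 4))
# With unit gamma and zero beta this reduces to plain normalization.
out = spade_denorm(x, seg, np.ones((3, 2)), np.zeros((3, 2)))
assert out.shape == (2, 4, 4)
assert abs(out.mean()) < 1e-9
```

The key point is that the label map re-enters the network at every normalization layer, which is why the semantic layout survives deep into the generator.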
4 Experimental Set-up
Datasets. We evaluate AdvSPADE on two large image segmentation datasets: Cityscapes and ADE20K. Cityscapes contains street-view images from German cities annotated with semantic classes, split into training and validation images. ADE20K covers semantic classes in multiple real-world scenes, with separate training and validation sets.
Training. Following SPADE, we apply Spectral Norm in all layers of the generator and discriminator. We train AdvSPADE on Cityscapes and, for ADE20K, run fewer epochs than reported in the original SPADE paper due to ADE20K's large size and our computation limits. We set the learning rates of the generator and discriminator equal and linearly decay the learning rate when training on ADE20K. We employ the ADAM optimizer. In Equation 3, we set the loss weights per dataset, with separate values for Cityscapes and ADE20K where noted. All experiments are done on a single NVIDIA TITAN Xp GPU.
Baseline Models. We compare AdvSPADE-generated attacks with traditional norm-bounded attacks in two settings: (i) real images with perturbation and (ii) GAN-generated clean images with perturbation. For (ii), we generate clean images with vanilla SPADE and then add norm-bounded perturbation to the synthetic images. For a fair comparison, we choose the same target segmentation networks for each dataset as in prior work: DRN-D-105 for Cityscapes and Upernet-101 for ADE20K. Besides, we also select several state-of-the-art open-source segmentation networks to evaluate the transferability of AdvSPADE in a black-box setting: DRN-38, DRN-22 [51, 52], DeepLab-V3, PSPNet-34-8s for Cityscapes, and PPM-18, MobilenetV2, Upernet-50, PPM-101 for ADE20K.
We also compare AdvSPADE with two GAN-based attacks [46, 39] on Cityscapes. For AdvGAN, we adapt its GAN structure to the segmentation task. First, we change the classification target network to the segmentation target network DRN-105 for Cityscapes. Second, we change the adversarial loss term from cross-entropy to Dice loss for a fair comparison with ours. We reproduce their method with two different hyper-parameter settings (see Table 5).
We implement Song et al.'s method on SPADE. We first train a SPADE model to generate clean images and fix its parameters. Second, we apply their objective function, substitute the target classification network with DRN-105, and change the adversarial loss term from cross-entropy to Dice loss. We set the dimension of the input random noise to 256. We run SPADE for 50 epochs and then another 50 epochs to optimize the adversarial loss on Cityscapes. We also test the transferability of Song et al.'s attack in a black-box setting (see Table 1).
Evaluation metric. Due to the dense output of the semantic segmentation task, evaluating the attack success rate differs from the classification setting. Let I be a set of RGB images of height H, width W, and C channels. Let Y be the set of semantic label maps for the corresponding images in I. Suppose o is an oracle that correctly maps any image from its domain, which contains all images that look realistic to humans, to Y. A segmentation model f provides pixel-wise predictions for any given image in I. We evaluate the following two categories of adversarial examples: (1) Given a constant ε and a hand-crafted norm ||·||, a restricted adversarial example derived from a benign image x is an image x' that meets the following conditions: x' ∈ I, ||x' − x|| ≤ ε, and the fraction of pixels (i, j) with f(x')_{i,j} ≠ o(x')_{i,j} exceeds τ. (2) An unrestricted adversarial example is an image x that meets the following requirements: x lies in the domain of o, and the fraction of pixels (i, j) with f(x)_{i,j} ≠ o(x)_{i,j} exceeds τ. Here, o(x)_{i,j} and f(x)_{i,j} stand for the predictions given by the oracle and the segmentation network at pixel (i, j), respectively, and τ is a hyperparameter thresholding the misclassified-pixel ratio.
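The unrestricted success criterion can be sketched as a simple predicate on the fraction of misclassified pixels; the threshold value 0.9 below is illustrative, not the paper's setting:

```python
import numpy as np

def is_successful_attack(pred, oracle, tau=0.9):
    """An image counts as a successful segmentation attack only if the
    fraction of pixels where the model disagrees with the oracle
    labeling exceeds the threshold tau (0.9 is an illustrative value)."""
    pred = np.asarray(pred)
    oracle = np.asarray(oracle)
    mis_rate = np.mean(pred != oracle)
    return bool(mis_rate > tau)

oracle = np.zeros((8, 8), dtype=int)
one_pixel_off = oracle.copy()
one_pixel_off[0, 0] = 1
assert not is_successful_attack(one_pixel_off, oracle)          # 1/64 pixels
assert is_successful_attack(np.ones((8, 8), dtype=int), oracle)  # all pixels
```

This makes explicit why a single flipped pixel never counts as a successful attack in the dense-prediction setting.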
Given the nature of the semantic segmentation task, misclassifying a single pixel does not make an image an adversarial example. A legitimate adversarial example should have the property that the majority of its pixels are misclassified (measured by the mIoU score) while the adversarial image still looks realistic to humans (measured by the FID score) and carries the same semantic meaning as the original image (measured via Amazon Mechanical Turk). In particular, we use the following three measures: 1. Mean Intersection-over-Union (mIoU): To measure the effect of different attack methods on the target networks, we measure the drop in recognition accuracy using the mIoU score, which is widely used in semantic segmentation tasks [9, 55]; a lower mIoU score means a better adversarial example. 2. Fréchet Inception Distance (FID): We use FID to compute the distance between the distribution of our adversarial examples and the distribution of real images; a small FID indicates high quality of the generated images. 3. Amazon Mechanical Turk (AMT): AMT is used to verify the success of our unrestricted adversarial attack. We randomly select generated adversarial images under two experimental settings from each dataset to create AMT assignments. Each assignment is answered by multiple workers, each with a fixed time limit to make a decision. We use the result of a majority vote as each assignment's final answer.
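A reference NumPy implementation of the mIoU measure over integer label maps (classes absent from both prediction and ground truth are skipped):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union across classes that appear in
    either the prediction or the ground truth."""
    ious = []
    for k in range(num_classes):
        inter = np.sum((pred == k) & (target == k))
        union = np.sum((pred == k) | (target == k))
        if union > 0:                      # skip absent classes
            ious.append(inter / union)
    return float(np.mean(ious))

target = np.array([[0, 0], [1, 1]])
assert mean_iou(target, target, 2) == 1.0          # perfect prediction
assert mean_iou(1 - target, target, 2) == 0.0      # fully wrong
```

Under an attack, the attacker's goal is to drive this score toward 0 while FID and the AMT judgments stay close to the clean baseline.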
5 Experimental Results
[Table 1: mIoU of each segmentation model on real images and on images generated by vanilla SPADE, Song et al., and AdvSPADE.]
Evaluating Generated Adversarial Images. Here, we compare the adversarial images generated from the original real images and from the clean synthetic images created by vanilla SPADE, using mIoU (Table 1) and FID scores (Table 2). Table 1 shows that, compared to vanilla SPADE, images generated under the white-box attack lead to a dramatic decline in mIoU (from 0.62 to 0.010 for DRN-105 on Cityscapes, with a comparable drop for Upernet-101 on ADE20K). Across different network architectures, our adversarial examples also decrease the mIoU substantially on both Cityscapes and ADE20K, showing strong transferability of our examples across models.
Compared to vanilla SPADE, the FID of our adversarial examples increases only slightly on both Cityscapes and ADE20K (see Table 2), indicating that our samples have comparable quality and variety. Note that we train AdvSPADE for only half the epochs reported in the SPADE paper, yet its FID on ADE20K is still smaller than that of other leading semantic image synthesis models. Figure 3 shows qualitative results. Moreover, by introducing an image encoder and a KL divergence loss, we can generate multi-modal stylized adversarial examples, which are shown in the supplementary materials.
Norm-bounded Adversarial Attacks. We compare the attack success rate of AdvSPADE with state-of-the-art norm-bounded adversarial attacks, including FGSM and PGD [15, 27], on both datasets. We use the same norm bound sizes for FGSM and PGD. For PGD, we follow [23, 1] in setting the number of attack iterations. We apply FGSM and PGD to both real images and synthetic images from vanilla SPADE, and compare their mIoU and FID scores with ours. The results (see Table 3) show that PGD and FGSM can barely attack the target networks at small bound sizes. For instance, at the smallest bound size, FGSM on real and vanilla-SPADE-generated images achieves a negligible attack success rate on the DRN-105 network on Cityscapes. In contrast, AdvSPADE achieves a high attack success rate on both DRN-105 and Upernet-101.
| Attack | Bound Size | DRN-105 (Cityscapes): Real Images | DRN-105 (Cityscapes): Vanilla SPADE (FID) | Upernet-101 (ADE20K): Real Images | Upernet-101 (ADE20K): Vanilla SPADE (FID) |
|---|---|---|---|---|---|
| FGSM | 0.25 | 0.557 | 0.431 (63.354) | 0.346 | 0.286 (33.821) |
| FGSM | 1 | 0.408 | 0.355 (64.455) | 0.278 | 0.221 (35.254) |
| FGSM | 8 | 0.196 | 0.152 (82.144) | 0.178 | 0.152 (60.563) |
| FGSM | 32 | 0.009 | 0.009 (248.175) | 0.070 | 0.048 (166.724) |
| PGD | 0.25 | 0.557 | 0.431 (63.354) | 0.346 | 0.286 (33.821) |
| PGD | 1 | 0.339 | 0.287 (63.971) | 0.276 | 0.181 (34.876) |
| PGD | 8 | 0.036 | 0.022 (69.162) | 0.070 | 0.022 (62.289) |
| PGD | 32 | 0.013 | 0.009 (89.998) | 0.013 | 0.007 (113.553) |
| AdvSPADE | n/a | n/a | 0.01 (67.302) | n/a | 0.011 (53.49) |
Table 4 further reveals that for both FGSM and PGD, decreasing the mIoU to the same level as AdvSPADE requires perturbations so conspicuous that humans can easily distinguish the adversarial examples from clean images. FID also reflects this decline in adversarial image quality. Secondly, adversarial examples generated by FGSM and PGD cannot drive the mIoU down to the same level as AdvSPADE if the quality of the samples must be maintained. Consider the adversarial samples on Cityscapes generated by perturbing vanilla SPADE images at bound size 1: their FID (64.455) is comparable with our samples, but their mIoU (0.355) is much larger than ours (0.010). Figure 4 illustrates the difference between AdvSPADE samples and norm-bounded samples at the same mIoU level: the noise pattern is easily visible in the norm-bounded samples but not in ours.
We further compare the transferability of FGSM and PGD with AdvSPADE in a black-box setting and observe a similar conclusion. Detailed results are shown in the supplementary material.
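For intuition on the gradient-sign attacks compared above, here is FGSM on a toy logistic model with an analytic gradient; real attacks on segmentation networks obtain the input gradient by backpropagation instead, and the model, data, and ε here are purely illustrative:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fgsm(x, y, w, eps):
    """One FGSM step on a toy logistic model p = sigmoid(w . x):
    x_adv = clip(x + eps * sign(dL/dx), 0, 1), using the analytic
    cross-entropy gradient dL/dx = (p - y) * w."""
    p = sigmoid(np.dot(w, x))
    grad = (p - y) * w
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

w = np.array([4.0, -4.0])
x = np.array([0.9, 0.1])          # confidently class 1 under this model
x_adv = fgsm(x, 1.0, w, eps=0.5)
p_clean = sigmoid(np.dot(w, x))
p_adv = sigmoid(np.dot(w, x_adv))
assert p_adv < p_clean            # the step reduces the true-class score
```

PGD iterates this step with a smaller step size and projects back into the ε-ball after each iteration, which is why it is the stronger of the two baselines.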
GAN-based Adversarial Attack. Here, we compare the mIoU drop between Song's unrestricted attack and AdvSPADE on Cityscapes (see Table 1). The results show that examples generated by their method lead to only a small mIoU drop in a white-box setting (from 0.62 to 0.461), while AdvSPADE-generated images drive the mIoU down to 0.010. The transferability of their examples under the black-box setting is also limited: an mIoU drop of 3% (Song et al.) vs. 15% (AdvSPADE), demonstrating the limitations of their attack.
| Method | Bound Size | Attack Success Rate | mIoU |
|---|---|---|---|
| AdvGAN ( = -100) | 8 | 7.8% | 0.055 |
We also compare the attack success rate and mIoU drop between AdvGAN and AdvSPADE (see Table 5). Since AdvGAN can be considered a special norm-bounded attack method, we follow the same bound-size settings as for the traditional norm-bounded attacks. With the hyper-parameter setting suggested by its authors, AdvGAN-generated examples show a 0% attack success rate and the mIoU drops only to 0.351, even when the bound size is as large as 32. After adjusting the hyper-parameters, the generated adversarial examples can attack the target segmentation network successfully only with a large bound size (a 7.8% attack success rate and an mIoU of 0.055 at bound size 8). However, in that case, the noise pattern is clearly visible in the adversarial examples. Our experimental results show that AdvGAN's attack effectiveness is even worse than that of the traditional norm-bounded attacks, indicating that it is applicable only to classification tasks.
| Method | FID | mIoU | Attack Success Rate |
|---|---|---|---|
| w/o Feature Matching Loss | 79.76 | 0.027 | 57.6% |
| w/o VGG Loss | 80.142 | 0.026 | 56.0% |
| w/o Adv Loss | 62.939 | 0.62 | 0% |
Table 6 further shows an ablation study on the Cityscapes dataset evaluating the effectiveness of each component of AdvSPADE. The results show that both the Feature Matching loss and the VGG loss are important for the quality and effectiveness of adversarial example generation: removing either causes the FID to increase and the attack success rate to drop. Moreover, removing the adversarial loss term degenerates AdvSPADE to vanilla SPADE, which can only generate benign images.
Human Evaluation. Using Amazon Mechanical Turk (AMT), we evaluate how humans perceive the generated adversarial images. Detailed results are presented in the supplementary material. This is done in two settings:
(1) Semantic Consistency Test: If the semantic meanings of our adversarial examples are consistent with their respective ground-truth labels, humans will segment the examples correctly. However, asking workers to segment every pixel is time-consuming and inefficient. Instead, we give AMT workers a pair of images, a generated adversarial image and a semantic label (half of the pairs are matched and the rest are mismatched), and ask whether the semantic meaning of the synthetic image is consistent with the given semantic label. We find that workers identify the semantic meaning of our adversarial examples precisely on both Cityscapes and ADE20K. This shows that although a segmentation network completely fails on our adversarial examples, humans can successfully identify their semantic meaning, which proves our attack's success.
(2) Fidelity AB Test: We compare the visual fidelity of AdvSPADE with vanilla SPADE. We give workers the ground-truth semantic label and two generated images, one from AdvSPADE and one from vanilla SPADE, and ask them to select the image that better corresponds to the label. The fraction of workers favoring our examples over vanilla SPADE's on the Cityscapes and ADE20K datasets indicates competitive visual fidelity of our adversarial images.
Robustness Evaluation. We first show that robust training with norm-bounded adversarial images can defend against restricted adversarial attacks, where the perturbation is added either to real or synthetic images; however, our unrestricted adversarial examples still attack these robust models successfully. We then present experimental results for a more robust segmentation model built with our unrestricted examples. We follow a standard adversarial training setup: we select PGD as the attack method and perform adversarial training on Cityscapes and ADE20K with a fixed norm-bound size, number of attack iterations, and step size. After the training phase, we use PGD with the same setting to generate norm-bounded perturbations and add them to both real images and synthetic images from vanilla SPADE. We find that real and synthesized images with PGD perturbation leave the robust DRN-105 and robust Upernet-101 with comparatively high mIoU, whereas our adversarial examples drive the mIoU of both robust models far lower, indicating that our examples successfully surpass robust models trained with norm-bounded adversarial examples. Next, we train a model for 50 epochs with our unrestricted adversarial examples on the Cityscapes dataset and then attack it with PGD. The result shows that the PGD attack achieves only a low attack success rate on DRN-105. Since norm-bounded examples are unseen by the model defended with our samples, the low success rate reflects that the model gains stronger robustness from adversarial training with AdvSPADE examples.
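The adversarial training loop described above can be sketched on a toy logistic model: an inner PGD loop maximizes the loss within an L-infinity ball, and the outer loop fits the weights on the perturbed inputs. Everything here (model, data, hyperparameters) is illustrative, not the paper's setup:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pgd_attack(x, y, w, eps, step, iters):
    """L-inf PGD on a toy logistic model p = sigmoid(w . x)."""
    x_adv = x.copy()
    for _ in range(iters):
        p = sigmoid(np.dot(w, x_adv))
        x_adv = x_adv + step * np.sign((p - y) * w)   # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project to the ball
    return x_adv

def adv_train(data, eps, step, iters, lr=0.5, epochs=200):
    """Adversarial training: fit w on PGD-perturbed inputs."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(2) * 0.01
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack(x, y, w, eps, step, iters)
            p = sigmoid(np.dot(w, x_adv))
            w = w - lr * (p - y) * x_adv              # descend the loss
    return w

data = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 0.0)]
w = adv_train(data, eps=0.1, step=0.05, iters=5)
# The robustly trained model still classifies PGD-perturbed points correctly.
for x, y in data:
    x_adv = pgd_attack(x, y, w, eps=0.1, step=0.05, iters=5)
    assert (sigmoid(np.dot(w, x_adv)) > 0.5) == (y == 1.0)
```

The same min-max structure applies when the inner maximization is replaced by sampling AdvSPADE examples, which is how the unrestricted robust model above is trained.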
This paper explores the existence of adversarial examples beyond norm-bounded metrics for state-of-the-art semantic segmentation networks. By modifying the loss function of the SPADE architecture, we are able to generate high-quality, unrestricted, realistic adversarial examples that mislead segmentation networks' behavior. We demonstrate the effectiveness and robustness of our method by comparison with traditional norm-bounded attacks. We also show that our generated adversarial examples easily surpass the state-of-the-art defense method, which raises new concerns about the security of segmentation networks.
- (2018) On the robustness of semantic segmentation models to adversarial attacks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples.
- (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12), pp. 2481–2495.
- (1981) Interpreting line drawings as three-dimensional surfaces. Artificial Intelligence 17 (1-3), pp. 75–116.
- (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
- (2017) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4), pp. 834–848.
- (2017) Rethinking atrous convolution for semantic image segmentation.
- (2017) Houdini: fooling deep structured prediction models. arXiv preprint arXiv:1707.05373.
- (2016) The Cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2019) Adaptive generation of unrestricted adversarial inputs.
- (2017) A rotation and a translation suffice: fooling CNNs with simple transformations.
- (2017) Exploring the landscape of spatial robustness.
- (2009) Segmentation-based urban traffic scene understanding. In BMVC, Vol. 1, pp. 2.
- (2017) Detecting adversarial samples from artifacts. arXiv abs/1703.00410.
- (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
- (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium.
- (2015) Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop.
- (2018) Learning hierarchical semantic image manipulation through structured representations.
- (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift.
- (2017) Image-to-image translation with conditional adversarial networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- (2013) Auto-encoding variational Bayes. arXiv:1312.6114.
- (2016) Adversarial machine learning at scale.
- (2018) Context-aware synthesis and placement of object instances. In Advances in Neural Information Processing Systems 31, pp. 10393–10403.
- (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
- (2015) A unified gradient regularization family for adversarial examples. 2015 IEEE International Conference on Data Mining.
- (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations.
- (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
- (2014) Conditional generative adversarial nets.
- (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957.
- (2016) Conditional image synthesis with auxiliary classifier GANs. In ICML.
- (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples.
- (2016) Distillation as a defense to adversarial perturbations against deep neural networks. 2016 IEEE Symposium on Security and Privacy (SP).
- (2017) Extending defensive distillation.
- (2019) Semantic image synthesis with spatially-adaptive normalization.
- (2015) U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), pp. 234–241.
- (2018) Understanding adversarial training: increasing local stability of supervised models through robust optimization. Neurocomputing 307, pp. 195–204.
- (2018) Brain tumor segmentation using concurrent fully convolutional networks and conditional random fields. In Proceedings of the 3rd International Conference on Multimedia and Image Processing, pp. 24–30.
-  (2018) Constructing unrestricted adversarial examples with generative models. External Links: Cited by: §1, §2, §2, §3, §4, §4, §4, Table 1, §5.
-  (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Lecture Notes in Computer Science, pp. 240–248. External Links: Cited by: §3.
-  (2013) Intriguing properties of neural networks. External Links: Cited by: §1, §2.
-  (2018) Deeptest: automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th international conference on software engineering, pp. 303–314. Cited by: §1.
-  (2018-06) High-resolution image synthesis and semantic manipulation with conditional gans. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. External Links: Cited by: §2, §3.
-  (2019) AT-gan: a generative attack model for adversarial transferring on generative adversarial nets. External Links: Cited by: §2.
-  (2019) Wasserstein adversarial examples via projected sinkhorn iterations. External Links: Cited by: §1, §2.
-  (2018-07) Generating adversarial examples with adversarial networks. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. External Links: Cited by: §2, §4, Table 5, §5.
-  (2018) Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 418–434. Cited by: §4.
-  (2017-10) Adversarial examples for semantic segmentation and object detection. 2017 IEEE International Conference on Computer Vision (ICCV). External Links: Cited by: §2, §2.
-  (2018) Feature squeezing: detecting adversarial examples in deep neural networks. Proceedings 2018 Network and Distributed System Security Symposium. External Links: Cited by: §2.
-  (2018) 3D-aware scene manipulation via inverse graphics. External Links: Cited by: §2.
-  (2017) Dilated residual networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 472–480. Cited by: §4.
-  (2016) Multi-scale context aggregation by dilated convolutions. In International Conference on Learning Representations (ICLR), Cited by: §4.
-  (2019) Free-form image inpainting with gated convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4471–4480. Cited by: §2.
-  (2017-07) Pyramid scene parsing network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). External Links: Cited by: §4.
-  (2016) Semantic understanding of scenes through the ade20k dataset. arXiv preprint arXiv:1608.05442. Cited by: §4, §4.
-  (2018) Semantic understanding of scenes through the ade20k dataset. International Journal on Computer Vision. Cited by: §4.