AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation

Guangyu Shen
Purdue University
shen447@purdue.edu
   Chengzhi Mao
Columbia University
cm3797@columbia.edu
   Junfeng Yang
Columbia University
junfeng@cs.columbia.edu
   Baishakhi Ray
Columbia University
rayb@cs.columbia.edu
Abstract

Due to the inherent robustness of segmentation models, traditional norm-bounded attack methods show limited effect on such models. In this paper, we focus on generating unrestricted adversarial examples for semantic segmentation models. We demonstrate a simple and effective method to generate unrestricted adversarial examples using conditional generative adversarial networks (CGAN) without any hand-crafted metric. A naïve implementation of CGAN, however, yields inferior image quality and a low attack success rate. Instead, we leverage the SPADE (Spatially-Adaptive Denormalization) structure with an additional loss term to generate effective adversarial attacks in a single step. We validate our approach on the popular Cityscapes and ADE20K datasets, and demonstrate that our synthetic adversarial examples are not only realistic, but also improve the attack success rate by up to 41.0% compared with state-of-the-art adversarial attack methods including PGD.

1 Introduction

Figure 1: Illustrating the effectiveness of generated unrestricted adversarial examples compared to norm-bounded attacks. The first column shows a real image from the Cityscapes dataset and its prediction by the DRN-105 segmentation network. The second and third columns show the result of applying PGD- and AdvGAN-generated perturbations to real images, respectively. In the fourth column, PGD is applied to a synthesized image generated by SPADE, while the fifth column shows AdvSPADE's unrestricted adversarial image that completely fools DRN-105. Note that there is conspicuous noise in the second, third, and fourth column images, and the segmentation results are still good for the first, second, and fourth columns. While the third column shows better attack performance than traditional norm-bounded attacks, it is still worse than ours. In contrast, in our case (fifth column) the image is free from noise and preserves the objects critical to driving (e.g., street lamps, cars, buildings), with slight differences in colors and textures that do not affect the semantics for a human driver. However, the prediction of the segmentation model is totally wrong. This demonstrates the effectiveness of our unrestricted adversarial attacks for segmentation models.

Despite their impressive accuracy and wide adoption, deep learning (DL) models remain fragile to adversarial attacks [41, 5, 32], which raises serious concerns for deploying them in real-world applications, especially in safety- and security-critical systems.

Extensive efforts have been made to combat these adversarial attacks: robust models are trained such that they are not easily evaded by adversarial examples [15, 33, 27]. Although these defense methods improve the models' robustness, they are mostly limited to addressing norm-bounded attacks such as PGD [27], in which humans are not supposed to be able to differentiate between original clean images and adversarial examples. However, two similar but non-identical images may also convey consistent semantic information from a human perspective; e.g., a slightly rotated variant of an image is semantically similar to the original one [12]. Since such changes (e.g., lighting conditions, certain textures, rotation, etc.) do not interfere with human perception, a robust ML model should also remain invariant to these realistic changes [42].

However, such realistic variants of clean images tend to have a high attack success rate against robust models designed to defend against norm-bounded attacks [39]. Thus, realistic adversarial attacks beyond a norm bound remain a major concern for those robust models, which spurs extensive efforts to explore stronger and more realistic adversarial attacks, e.g., using the Wasserstein distance [45], realistic image transformations [11], etc. In particular, [39] proposed unrestricted adversarial attacks using a conditional GAN for image classification models, a big step toward realistic attacks beyond human-crafted constraints. However, due to their model design, they are mostly restricted to low-resolution images; at high resolutions, the generated images are not very realistic.

The problem of achieving realistic adversarial attacks and defenses aggravates further for more difficult visual recognition tasks such as semantic segmentation, where one needs to attack orders of magnitude more pixels while maintaining consistent human perception. It is essential to make segmentation models robust against adversarial attacks, especially given their applicability in autonomous driving [13], medical imaging [36, 38], and computer-aided diagnosis systems [28]. Unfortunately, we show that existing attack methods primarily designed for simple classification tasks do not generalize well to semantic segmentation. Besides, the inherent robustness of segmentation models [1] makes existing adversarial attack methods less effective. For instance, following the work of [1], we show that for segmentation tasks the norm-bounded perturbation becomes visible to humans, since larger bounds are required to launch a successful attack. Unrestricted adversarial attacks, on the other hand, are not constrained by a norm-bounded budget, which can expose more vulnerabilities of a given machine learning model. However, the quality and resolution of the unrestricted adversarial images generated by previous methods are low and limited to simple images such as handwritten digits.

In this paper, we present the first realistic unrestricted adversarial attack, AdvSPADE, for semantic segmentation models. Figure 1 illustrates the effectiveness of our proposed method. To generate realistic images, we use SPADE [35], a state-of-the-art conditional generative model that produces high-resolution images (up to 1 million pixels). We then add an adversarial loss term to the original SPADE architecture to fool the target model. Thus, we create a wide variety of adversarial examples from a single image in a single step. Empirical results show that we can generate realistic and more successful adversarial attacks than existing norm-bounded or GAN-based attacks. We further show that augmenting the training data with such realistic adversarial examples has the potential to improve the models' robustness.

This paper makes the following contributions: (1) We propose a new realistic attack for semantic segmentation which defeats the existing state-of-the-art robust models. We demonstrate the existence of a rich variety of unrestricted adversarial examples besides the previously known ones. (2) We demonstrate that augmenting the training dataset with our new adversarial examples has the potential to improve the robustness of existing models. (3) We present an empirical evaluation of our approach using two popular semantic segmentation datasets. First, we evaluate the quality of our generated adversarial examples using Amazon Mechanical Turk and demonstrate that our samples are indistinguishable from natural images for humans. Second, we further show that our adversarial samples improve the attack success rate by up to 41.0%.

2 Related Work

Semantic Segmentation. This task can be considered a multi-output classification problem that provides finer-grained information in the prediction [4]. Although plenty of network architectures have been proposed to address the semantic segmentation task efficiently [36, 25, 6, 3], very few studies [1, 48] look into the robustness of this class of networks against adversarial examples.

Adversarial Attacks. Adversarial examples are carefully crafted to mislead DL models' predictions while still being perceived by humans as carrying semantics identical or similar to the original images. Researchers have proposed multiple methods for generating adversarial examples for image classification tasks [15, 23, 5, 27], where the target model is fooled by the adversarial images. Hand-crafted metrics, such as norm bounds [27, 14] and the Wasserstein distance [45], are applied to the generation process to preserve the semantic meaning of the adversarial examples to humans.

Generative Adversarial Networks (GANs), popular for image manipulation [50, 24, 18, 53], image-to-image translation [20, 43, 35], etc., have also been leveraged to generate adversarial examples. Xiao et al. [46] proposed to generate norm-bounded perturbations for the classification task. Song et al. [39] used GANs to generate unrestricted adversarial attacks for image classification. By leveraging the Auxiliary Classifier Generative Adversarial Network (AC-GAN) [31], their model was able to generate low-quality adversarial examples from scratch and beyond any norm bound. [44] further proposed a new generative model, AT-GAN, to learn a transformation between a pre-trained GAN and an adversarial GAN. Concurrent to our work, [10] leveraged GANs to generate adaptive unrestricted examples for classification. Compared to restricted image manipulation or local image editing attacks, one of the key features of these unrestricted attacks is that they can generate more realistic images semantically similar (though not necessarily identical) to original benign images. Thus, these images can fool even the robust ML models that can withstand norm-bounded restricted adversarial examples.

However, these GAN-based attack methods mainly focus on classification tasks and, as we will show in Section 5, do not generalize well to segmentation tasks. The one closest to ours, Song et al. [39], adopts a two-step procedure: first train an AC-GAN to generate benign images and then optimize an adversarial loss objective function w.r.t. the input noise of the AC-GAN. In this setting, the variety of unrestricted adversarial examples is restricted by the dimension of the input noise vector, usually fixed to a small number (10, 100, or 256). Thus, they tend not to work well on high-resolution datasets. In contrast, we combine image generation and unrestricted adversarial example generation into a single stage by adding an adversarial loss term to the objective function and optimizing them together. This leads to a better attack success rate and a larger mIoU drop than Song's method, as shown in Section 5.

A few studies have focused on adversarial attacks against modern semantic segmentation networks. [1] conducted the first systematic analysis of the effect of multiple adversarial attack methods on different modern semantic segmentation network architectures across two large-scale datasets. [48] proposed a new attack method called Dense Adversary Generation (DAG), which generates a group of adversarial examples for a set of state-of-the-art segmentation and detection deep networks. However, all of these attack methods rely on norm-bounded perturbations, which only cover a small fraction of all feasible adversarial examples. [8] further advanced restricted attacks using surrogate loss functions for semantic segmentation that make the non-differentiable IoU loss approximately differentiable; however, due to its computational expense, traditional norm-bounded attack methods such as PGD are more widely used in practice.

Defense Methods. Adversarial training is the state-of-the-art method for training robust classifiers [15, 26, 37, 41, 17, 34, 49, 27]. Besides, [1] evaluated other defense methods based on input transformations, including rescaling, JPEG compression, Gaussian blur, HSV jitter, and grayscale conversion, against adversarial attacks on semantic segmentation networks. These input transformation methods, however, were shown to rely on obfuscated gradients and give a false sense of robustness [2]. Thus, [2] endorsed the robustness of models trained with adversarial training. In this paper, we demonstrate that robustness can be improved by training with examples generated by our method.

3 Generating Adversarial Examples

Figure 2: Illustration of the proposed AdvSPADE architecture for generating unrestricted adversarial examples. An image encoder takes real images x as input, computes mean and variance vectors (μ, σ), and applies the reparameterization trick to generate random noise z. The SPADE generator G takes z and the semantic label m and generates a synthetic image G(z, m). Next, G(z, m) is fed into a fixed pre-trained target segmentation network F and encouraged to mislead F's predictions by maximizing the adversarial loss between the predictions and the semantic label. Meanwhile, G(z, m), as the input of the SPADE discriminator D, also aims to fool D, which is trained to reliably distinguish between generated images G(z, m) and real images x. The randomly sampled z brings randomness into the model so that G can generate various adversarial examples. Notice that, due to the adversarial loss term, the prediction results at the top left corner of the figure are completely mis-segmented.

We will now introduce our methodology for generating unrestricted adversarial examples for semantic segmentation. For this purpose, we leverage a conditional generative adversarial network, SPADE. The main goal of a standard conditional GAN is to synthesize realistic images that fool the discriminator. Generating an adversarial attack, however, also requires fooling the segmentation model under attack. Hence, we add an additional loss function to fool both the discriminator and the segmentation model. Figure 2 shows the overall workflow. The rest of the section describes the steps in detail.

Unrestricted Adversarial Examples. Consider $\mathcal{I}$ as a set of images and $\mathcal{Y}$ as the set of all possible categories for $\mathcal{I}$. Suppose $o$ is an oracle that can map any image from its domain to the correct category in $\mathcal{Y}$. A classification model $f$ can also provide a class prediction for any given image in $\mathcal{I}$. Under the assumption that $f$ agrees with $o$ on clean images, an unrestricted adversarial example is any image $x$ which meets the following requirements [39]: $x \in \mathrm{dom}(o)$ and $o(x) \neq f(x)$.

Conditional Generative Adversarial Networks. A conditional generative adversarial network (CGAN) [29] consists of a generator G and a discriminator D, both conditioned on auxiliary information $m$. Combining random noise $z$ and the extra information $m$ as input, G learns to map them to a realistic image. The discriminator aims to distinguish real images from the synthetic images produced by the generator. G and D play a minimax two-player game, which can be formalized as

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x \mid m)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z \mid m))\big)\big].$$
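To make the two-player objective concrete, below is a minimal PyTorch-style sketch of standard conditional GAN losses. The callables `G(z, m)` and `D(x, m)` are illustrative placeholders, not the SPADE implementation (SPADE itself uses a more elaborate multi-scale discriminator).

```python
import torch
import torch.nn.functional as F_nn


def cgan_losses(G, D, x_real, m, z):
    """Standard (non-saturating) conditional GAN losses.

    x_real: real images, m: conditioning semantic label maps, z: random noise.
    G and D are assumed to take (input, condition); names are illustrative.
    """
    x_fake = G(z, m)

    # Discriminator: real images should score 1, generated images 0.
    d_real = D(x_real, m)
    d_fake = D(x_fake.detach(), m)
    loss_D = (F_nn.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F_nn.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    # Generator: fool the discriminator into scoring generated images as real.
    d_fake_for_g = D(x_fake, m)
    loss_G = F_nn.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
    return loss_D, loss_G
```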

Unrestricted Adversarial Loss. We design an adversarial loss term for unrestricted adversarial example generation. We mainly focus on the untargeted attack in this paper, though our approach is general and can be applied to targeted attacks as well. Intuitively, the SPADE generator is trained to mislead the prediction of the target segmentation network. The synthetic images are not only required to fool the discriminator of the conditional GAN but also need to be mis-segmented by the target segmentation network. To achieve this goal, we introduce the target segmentation network into the training phase and aim to maximize the loss of the segmentation model while keeping the quality and semantic meaning of the synthetic images. We denote the target segmentation network by $F$, the SPADE generator by $G$, the input semantic label by $m$, and the input random vector by $z$. We define the untargeted version of the Unrestricted Adversarial Loss as follows:

$$\mathcal{L}_{\mathrm{adv}}(G) = \mathbb{E}_{z}\big[\,\ell\big(F(G(z, m)),\, m\big)\big] \qquad (1)$$

We select the Dice loss [40] as the objective function $\ell$. An image encoder processes a real image $x$ and generates a mean vector $\mu$ and a variance vector $\sigma$; the noise input $z$ is then computed according to the reparameterization trick [22]:

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \qquad (2)$$
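As an illustration of Equations (1) and (2), the sketch below draws the style code via the reparameterization trick and evaluates the Dice loss of a frozen target segmentation network on the generated image. The helper names (`encoder`, `dice_loss`, `adversarial_loss`) and tensor layouts are assumptions for exposition, not our released code.

```python
import torch


def reparameterize(mu, logvar):
    # z = mu + sigma * eps, with eps ~ N(0, I)  (Eq. 2, standard VAE trick).
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps


def dice_loss(logits, target_onehot, eps=1e-6):
    # Soft Dice loss over classes; logits and target_onehot are (N, C, H, W).
    probs = torch.softmax(logits, dim=1)
    inter = (probs * target_onehot).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + target_onehot.sum(dim=(2, 3))
    return 1.0 - (2.0 * inter / (union + eps)).mean()


def adversarial_loss(G, F_seg, encoder, x_real, m_onehot):
    # Eq. 1: Dice loss of the target network F on G(z, m). F_seg is a frozen,
    # pre-trained segmentation network (its parameters are not updated), but its
    # forward pass stays differentiable w.r.t. the generated image so gradients
    # reach the generator, which is trained to *maximize* this quantity.
    mu, logvar = encoder(x_real)          # assumed encoder API: returns (mu, logvar)
    z = reparameterize(mu, logvar)
    x_fake = G(z, m_onehot)               # m_onehot: one-hot semantic label map
    logits = F_seg(x_fake)
    return dice_loss(logits, m_onehot)
```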

The complete objective function of AdvSPADE then can be written as:

$$\min_G \max_D \; \mathcal{L}_{\mathrm{GAN}}(G, D) + \lambda_{\mathrm{FM}} \mathcal{L}_{\mathrm{FM}}(G, D) + \lambda_{\mathrm{VGG}} \mathcal{L}_{\mathrm{VGG}}(G) + \lambda_{\mathrm{KLD}} \mathcal{L}_{\mathrm{KLD}} - \lambda_{\mathrm{adv}} \mathcal{L}_{\mathrm{adv}}(G) \qquad (3)$$

We follow the definitions of the feature matching loss and the perceptual loss in [43]. The feature matching loss, $\mathcal{L}_{\mathrm{FM}}$, stabilizes GAN training by minimizing the distance between features of synthesized and real images extracted from multiple layers of the discriminator. The VGG perceptual loss $\mathcal{L}_{\mathrm{VGG}}$ plays a similar role by introducing a pre-trained VGG network. For the KL divergence loss $\mathcal{L}_{\mathrm{KLD}}$, we borrow the definition from [35]: $\mathcal{L}_{\mathrm{KLD}} = D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z)\big)$, which computes the KL divergence between the prior distribution $p(z)$, a standard Gaussian, and the variational distribution $q(z \mid x)$ produced by the image encoder. To speed up the generation process and improve the quality of the synthesized images, we follow spatially-adaptive denormalization, as proposed by [35].
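A hedged sketch of how the terms of Equation (3) could be combined into the generator objective is shown below; the default weight values are placeholders rather than the exact settings used in our experiments.

```python
def generator_total_loss(gan_g, fm, vgg, kld, adv,
                         lambda_fm=10.0, lambda_vgg=10.0,
                         lambda_kld=0.05, lambda_adv=1.0):
    # Each argument is a scalar tensor for one loss term of Eq. 3. The
    # adversarial (Dice) term enters with a negative sign because the generator
    # maximizes the segmentation loss while minimizing everything else.
    # The default weights are illustrative placeholders.
    return gan_g + lambda_fm * fm + lambda_vgg * vgg + lambda_kld * kld - lambda_adv * adv
```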

Spatially-adaptive denormalization. Our model uses the SPADE architecture [35] as the conditional GAN model, where Batch Normalization [19] is replaced with spatially-adaptive denormalization. This method has been shown to preserve the semantic information that would otherwise be lost during normalization and downsampling. Please refer to the supplementary materials for details.
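For reference, a minimal sketch of a spatially-adaptive denormalization layer in the spirit of [35] is shown below; the hidden width and kernel sizes are illustrative choices, not the exact SPADE configuration.

```python
import torch.nn as nn
import torch.nn.functional as F_nn


class SPADENorm(nn.Module):
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization; the affine modulation comes from the label map.
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, segmap):
        # Resize the one-hot semantic map to the feature resolution, then predict
        # spatially-varying scale (gamma) and shift (beta) per feature channel.
        segmap = F_nn.interpolate(segmap, size=x.shape[2:], mode='nearest')
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```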

4 Experimental Set-up

Datasets. We evaluate AdvSPADE on two large image segmentation datasets: Cityscapes [9] and ADE20K [55]. Cityscapes contains street-view images from German cities with pixel-level semantic annotations, split into training and validation sets. ADE20K covers a broad range of semantic classes in diverse real-world scenes and is likewise split into training and validation sets.

Training. Following [35], we apply spectral normalization [30] to all layers of the generator and discriminator. We train AdvSPADE on Cityscapes and, due to ADE20K's large size and our computation limits, for only half the number of epochs reported in [35] on ADE20K. The generator and discriminator use the same learning rate, which is decayed linearly starting from a fixed epoch when training on ADE20K. We employ the ADAM optimizer [21]. In Equation 3, the loss weights are kept identical for Cityscapes and ADE20K, except for the adversarial loss weight, which is set separately for each dataset. All experiments are done on a single NVIDIA TITAN Xp GPU.
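As a sketch of the spectral normalization step, the helper below wraps every convolutional and linear layer of a network with PyTorch's built-in spectral_norm utility; applying it recursively in this way is an illustrative choice, not the exact training script.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm


def apply_spectral_norm(module):
    # Wrap every conv/linear layer of a network with spectral normalization [30].
    for name, child in module.named_children():
        if isinstance(child, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
            setattr(module, name, spectral_norm(child))
        else:
            apply_spectral_norm(child)
    return module
```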

Baseline Models. We compare AdvSPADE-generated attacks with traditional norm-bounded attacks in two settings: (i) real images with perturbation and (ii) GAN-generated clean images with perturbation. For (ii), we generate clean images with vanilla SPADE and then add norm-bounded perturbation to the synthetic images. For a fair comparison, we choose the same target segmentation networks for each dataset as in [35]: DRN-D-105 [51] for Cityscapes and Upernet-101 [47] for ADE20K. Besides, we also select several state-of-the-art open-source segmentation networks to evaluate the transferability of AdvSPADE in a black-box setting: DRN-38, DRN-22 [51, 52], DeepLab-V3 [7], and PSPNet-34-8s [54] for Cityscapes; PPM-18, MobilenetV2, Upernet-50, and PPM-101 [56] for ADE20K.

We also compare AdvSPADE with two GAN-based attacks [46, 39] on Cityscapes. For AdvGAN, we adapt the GAN structure of [46] to the segmentation task. First, we change the classification target network to the segmentation target network DRN-105 for Cityscapes. Second, we change the loss function of the adversarial loss term from cross-entropy to Dice loss for a fair comparison with ours. We reproduce their method with two different hyper-parameter settings (see Table 5).

We implement Song et al.'s [39] method on top of SPADE. We first train a SPADE model to generate clean images and fix its parameters. Second, we apply their objective function, substitute the target classification network with DRN-105, and change the loss function of the adversarial loss term from cross-entropy to Dice loss. We set the dimension of the input random noise to 256. We run SPADE for 50 epochs and then run another 50 epochs to optimize the adversarial loss on Cityscapes. We also test the transferability of Song et al.'s attack in a black-box setting (see Table 1).
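For reference, our adaptation of this two-step baseline can be summarized by the sketch below: the pre-trained SPADE generator and the target network stay frozen, and only the input noise vector is optimized to maximize the Dice loss (reusing the dice_loss helper sketched in Section 3). The optimizer choice, step count, and learning rate are illustrative assumptions, and the original method of [39] includes additional constraints on the noise that are omitted here.

```python
import torch


def song_style_attack(G, F_seg, m_onehot, z_dim=256, steps=200, lr=0.05, device='cuda'):
    # Two-step baseline: G and F_seg are pre-trained and frozen; only z is optimized.
    z = torch.randn(1, z_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_fake = G(z, m_onehot)
        loss = -dice_loss(F_seg(x_fake), m_onehot)  # maximize the segmentation loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z, m_onehot).detach()
```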

Figure 3: Visual results on ADE20K. As we can see, the semantic meaning of the generated adversarial examples is well aligned with the original images, but they differ in style and are mis-predicted by the segmentation model. For example, the color and stripes of the bed in the first two columns are changed, but humans still perceive them as a bed while the segmentation model predicts the wrong label. The results demonstrate the effectiveness of our method for generating realistic adversarial examples that mislead the target model.

Evaluation metric. Due to the dense output of the semantic segmentation task, the evaluation of the attack success rate differs from that of classification [39]. Let $\mathcal{I} \subseteq \mathbb{R}^{H \times W \times 3}$ be a set of RGB images with height $H$, width $W$, and 3 channels, and let $\mathcal{M}$ be the set of semantic label maps for the corresponding images in $\mathcal{I}$. Suppose $o$ is an oracle that maps any image from its domain, which contains all images that look realistic to humans, to the correct label map in $\mathcal{M}$. A segmentation model $f$ provides pixel-wise predictions for any given image in $\mathcal{I}$. We evaluate the following two categories of adversarial examples: (1) Given a constant $\epsilon$ and a hand-crafted norm $\|\cdot\|$, a restricted adversarial example is an image $x'$ derived from a clean image $x$ that meets the following conditions: $x' \in \mathrm{dom}(o)$, $\|x' - x\| \le \epsilon$, and $\frac{1}{HW}\sum_{i,j}\mathbb{1}\big[f(x')_{i,j} \ne o(x')_{i,j}\big] \ge \tau$. (2) An unrestricted adversarial example is an image $x'$ that meets the following requirements: $x' \in \mathrm{dom}(o)$ and $\frac{1}{HW}\sum_{i,j}\mathbb{1}\big[f(x')_{i,j} \ne o(x')_{i,j}\big] \ge \tau$. Here, $o(x')_{i,j}$ and $f(x')_{i,j}$ stand for the predictions given by the oracle and the segmentation network at pixel $(i,j)$, respectively, and $\tau$ is a hyperparameter controlling the fraction of mis-segmented pixels required for a successful attack.
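The per-image success criterion above can be checked directly from a predicted and a ground-truth label map, as in the small sketch below; the threshold is passed in as a parameter.

```python
def is_successful_attack(pred_labels, gt_labels, tau):
    # pred_labels, gt_labels: (H, W) integer label tensors.
    # The attack succeeds if at least a fraction tau of pixels are mis-segmented.
    mismatched = (pred_labels != gt_labels).float().mean().item()
    return mismatched >= tau
```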

Given the nature of the semantic segmentation task, misclassifying a single pixel does not make an image an adversarial example. A legitimate adversarial example should have the property that the majority of its pixels are misclassified (measured by the mIoU score), while the adversarial image still looks realistic to humans (measured by the FID score) and carries the same semantic meaning as the original image (measured by Amazon Mechanical Turk). In particular, we use the following three measures: 1. Mean Intersection-over-Union (mIoU): To measure the effect of different attack methods on the target networks, we measure the drop in recognition accuracy using the mIoU score, which is widely used in semantic segmentation tasks [9, 55]; a lower mIoU score means a better adversarial example. 2. Fréchet Inception Distance (FID): We use FID [16] to compute the distance between the distribution of our adversarial examples and the distribution of the real images; a small FID indicates high quality of the generated images. 3. Amazon Mechanical Turk (AMT): AMT is used to verify the success of our unrestricted adversarial attack. Here, we randomly select generated adversarial images under two experimental settings from each dataset to create AMT assignments. Each assignment is answered by several different workers, and each worker has a fixed number of minutes to make a decision. We use the result of a majority vote as each assignment's final answer.
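For mIoU, we assume the standard confusion-matrix computation along the following lines; this is the usual semantic segmentation definition rather than code from a specific release.

```python
import numpy as np


def mean_iou(preds, gts, num_classes):
    # preds, gts: iterables of (H, W) integer label maps (numpy arrays).
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(preds, gts):
        p, g = p.reshape(-1), g.reshape(-1)
        valid = (g >= 0) & (g < num_classes)          # skip ignore labels (e.g., 255)
        idx = g[valid] * num_classes + p[valid]
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(0) + conf.sum(1) - inter
    return (inter / np.maximum(union, 1)).mean()
```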

5 Experimental Results

Cityscapes
Attack                    Seg Model      Real Images   SPADE   Song et al. [39]   AdvSPADE
Whitebox                  DRN-105        0.756         0.620   0.461              0.010
Transfer-based blackbox   DRN-38         0.714         0.551   0.520              0.407
                          DRN-22         0.68          0.526   0.489              0.387
                          DeepLab-V3     0.68          0.54    0.501              0.425
                          PSPNet-34-8s   0.691         0.529   0.495              0.441

ADE20K
Attack                    Seg Model      Real Images   SPADE   AdvSPADE
Whitebox                  Upernet-101    0.420         0.403   0.011
Transfer-based blackbox   MobilenetV2    0.348         0.317   0.110
                          PPM-18         0.340         0.362   0.102
                          Upernet-50     0.404         0.395   0.096
                          PPM-101        0.422         0.409   0.078
Table 1: The effectiveness of our proposed attack (AdvSPADE) under the mIoU metric for both whitebox and transfer-based blackbox attacks. Lower mIoU means a better attack. Results show that AdvSPADE successfully misleads the segmentation models while both the real and synthetic images are predicted correctly by the models.
Dataset      Vanilla SPADE   Pix2PixHD   CRN     AdvSPADE
Cityscapes   62.939          95.0        104.7   67.302
ADE20K       33.9            81.8        73.3    53.49
Table 2: FID comparison between our AdvSPADE and state-of-the-art semantic image synthesis models. The results show that AdvSPADE outperforms Pix2PixHD and CRN and achieves an FID comparable to vanilla SPADE on Cityscapes.

Evaluating Generated Adversarial Images. Here, we compare the adversarial images generated from the original real images and the clean synthetic images created by vanilla SPADE using mIoU (Table 1) and FID scores (Table 2). Table 1 shows that, compared to vanilla SPADE, the generated images under the whitebox attack lead to a dramatic decline in mIoU score (from 0.620 to 0.010 for DRN-105, from 0.403 to 0.011 for Upernet-101). On different network architectures, our adversarial examples also decrease mIoU substantially (by roughly 0.09-0.14 on Cityscapes and 0.21-0.33 on ADE20K relative to vanilla SPADE), showing strong transferability of our examples across models.

Compared to vanilla SPADE, the FID of our adversarial examples increases only slightly (62.939 to 67.302 on Cityscapes, 33.9 to 53.49 on ADE20K; see Table 2), indicating that our samples have comparable quality and variety. Note that we only train AdvSPADE for half the number of epochs reported in [35] and achieve an FID of 53.49 on ADE20K, still smaller than other leading semantic image synthesis models. Figure 3 shows qualitative results. Moreover, by introducing an image encoder and a KL divergence loss, we can generate multi-modal stylized adversarial examples, which are shown in the supplementary materials.

DRN-105 (Cityscapes)
Method     Bound Size (ε)   Real Images + Perturbation   Vanilla SPADE + Perturbation
FGSM       0.25             0%                           0%
           1                0%                           0%
           8                0%                           0%
           32               15.6%                        16.4%
PGD        0.25             0%                           0%
           1                0%                           0%
           8                22.2%                        43.4%
           32               33.8%                        47.2%
AdvSPADE   -                84.4%

Upernet-101 (ADE20K)
Method     Bound Size (ε)   Real Images + Perturbation   Vanilla SPADE + Perturbation
FGSM       0.25             0.4%                         0.9%
           1                0.9%                         1.8%
           8                2.6%                         2.6%
           32               6.1%                         8.0%
PGD        0.25             0.4%                         0.9%
           1                0.8%                         2.8%
           8                11.5%                        24.2%
           32               39.0%                        44.1%
AdvSPADE   -                57.7%
Table 3: Attack success rate under the white-box setting for our AdvSPADE and norm-bounded attacks (FGSM, PGD) with bound sizes ε ∈ {0.25, 1, 8, 32}. AdvSPADE achieves significantly higher attack success rates on both datasets.

Norm-bounded Adversarial Attacks. We compare the attack success rate of AdvSPADE with state-of-the-art norm-bounded adversarial attacks, including FGSM and PGD [15, 27], on the two datasets. We set the norm bound size ε to 0.25, 1, 8, and 32 for both FGSM and PGD. For PGD, we follow [23, 1] to set the number of attack iterations. We apply FGSM and PGD to both real images and synthetic images generated by vanilla SPADE, and compare their mIoU and FID scores with ours. The results (see Table 3) show that PGD and FGSM can barely attack the target networks with small bound sizes. For instance, FGSM with bound sizes up to 8 achieves a 0% attack success rate on the DRN-105 network on Cityscapes for both real and vanilla SPADE generated images. In contrast, AdvSPADE achieves a high attack success rate (84.4% and 57.7% on DRN-105 and Upernet-101, respectively).
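For the norm-bounded baselines we use the standard L∞ formulation; a hedged PGD sketch against a segmentation network is shown below. It assumes images scaled to [0, 1] and takes the loss function (e.g., the Dice loss sketched in Section 3 or a pixel-wise cross-entropy) as an argument; step size and iteration count are placeholders rather than the exact settings of [23, 1].

```python
import torch


def pgd_attack(F_seg, x, m_onehot, eps, alpha, iters, loss_fn):
    # L_inf PGD [27] against a segmentation network: at each step, ascend the
    # gradient sign of loss_fn and project back into the eps-ball around the
    # clean image x, keeping pixel values in [0, 1].
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = loss_fn(F_seg(x_adv), m_onehot)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()
```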

DRN-105 (Cityscapes)
Method     Bound Size (ε)   Real Images + Perturbation   Vanilla SPADE + Perturbation (FID)
FGSM       0.25             0.557                        0.431 (63.354)
           1                0.408                        0.355 (64.455)
           8                0.196                        0.152 (82.144)
           32               0.009                        0.009 (248.175)
PGD        0.25             0.557                        0.431 (63.354)
           1                0.339                        0.287 (63.971)
           8                0.036                        0.022 (69.162)
           32               0.013                        0.009 (89.998)
AdvSPADE   -                                             0.01 (67.302)

Upernet-101 (ADE20K)
Method     Bound Size (ε)   Real Images + Perturbation   Vanilla SPADE + Perturbation (FID)
FGSM       0.25             0.346                        0.286 (33.821)
           1                0.278                        0.221 (35.254)
           8                0.178                        0.152 (60.563)
           32               0.070                        0.048 (166.724)
PGD        0.25             0.346                        0.286 (33.821)
           1                0.276                        0.181 (34.876)
           8                0.070                        0.022 (62.289)
           32               0.013                        0.007 (113.553)
AdvSPADE   -                                             0.011 (53.49)
Table 4: Attack effectiveness under the white-box setting for our AdvSPADE and norm-bounded attacks (FGSM, PGD) with bound sizes ε ∈ {0.25, 1, 8, 32}. We show mIoU and FID scores (FID in parentheses) of AdvSPADE-generated examples and of norm-bounded attacks on both real and standard SPADE generated images. The results indicate that traditional norm-bounded attacks need large perturbations to achieve an mIoU similar to AdvSPADE, which are easily detectable by humans. In contrast, our unrestricted adversarial examples remain inconspicuous to humans, which reveals the effectiveness of our proposed method.
Figure 4: Comparison of norm-bounded vs. AdvSPADE-generated samples at the same mIoU level on ADE20K. We apply different attack methods to drive the mIoU score down to the same level and show the visual comparison. FGSM and PGD are applied with a large bound size. The noise patterns visible in the norm-bounded adversarial images (first four columns) but not in our examples indicate that our examples can attack the target networks successfully while remaining undetectable to humans.

Table 4 further reveals that for both FGSM and PGD, decreasing the mIoU to the same level as AdvSPADE (mIoU = 0.01) requires a conspicuous perturbation (bound size 32), so that humans can easily distinguish the adversarial examples from clean images. The FID also reflects the decline in the adversarial images' quality. Secondly, adversarial examples generated by FGSM and PGD cannot drive the mIoU down to the level of AdvSPADE if they are required to maintain sample quality. Consider the adversarial samples on Cityscapes generated by perturbing vanilla SPADE images with FGSM at bound size 1: their FID (64.455) is comparable with our samples, but their mIoU (0.355) is much larger than ours (0.01). Figure 4 illustrates the difference between AdvSPADE samples and norm-bounded samples at the same mIoU level. We can easily see the noise pattern in the norm-bounded samples but not in our examples.

We further compare the transferability of FGSM and PGD with our AdvSPADE in a black-box setting and observe a similar conclusion. The detailed results are shown in the supplementary material.

GAN-based Adversarial Attack. Here, we compare the mIoU drop between Song et al.'s unrestricted attack [39] and AdvSPADE on Cityscapes (see Table 1). The results show that the examples generated by [39] only lead to a small mIoU drop in the white-box setting (from 0.620 to 0.461), while AdvSPADE-generated images drive it down to 0.010. The transferability of their examples under the black-box setting is also limited (an mIoU drop of roughly 3% for Song et al. vs. 15% for AdvSPADE), demonstrating the limitations of their attack.

Method                             Bound Size   Attack Success Rate   mIoU
AdvGAN                             0.25         0%                    0.350
                                   1            0%                    0.344
                                   8            0%                    0.264
                                   32           0%                    0.351
AdvGAN (loss weight = -100)        0.25         0%                    0.340
                                   1            0%                    0.338
                                   8            7.8%                  0.055
                                   32           6.4%                  0.044
AdvSPADE (Ours)                    -            84.4%                 0.01
Table 5: Comparison between AdvGAN and our AdvSPADE on DRN-105. We present the mIoU score and attack success rate of AdvGAN [46] with two different hyper-parameter settings on Cityscapes. The results show the ineffectiveness of AdvGAN on the segmentation task even with a large norm-bound size.

We also compare the attack success rate and mIoU drop between AdvGAN [46] and AdvSPADE (see Table 5). Since AdvGAN can be considered a special norm-bounded attack method, we follow the same bound-size settings as for the traditional norm-bounded attacks. With the hyper-parameter setting suggested in [46], AdvGAN-generated examples show a 0% attack success rate, with mIoU dropping only to 0.351 even when the bound size is as large as 32. After adjusting the hyper-parameters, the generated adversarial examples can only attack the target segmentation network successfully with a large bound size (7.8% attack success rate and 0.055 mIoU at bound size 8). However, in such a case, we can clearly see the noise pattern in the adversarial examples. Our experimental results indicate that AdvGAN's attack effectiveness is even worse than that of traditional norm-bounded attacks, showing that it is applicable only to classification tasks.

Method                      FID       mIoU    Attack Success Rate
w/o Feature Matching Loss   79.76     0.027   57.6%
w/o VGG Loss                80.142    0.026   56.0%
w/o Adv Loss                62.939    0.62    0%
AdvSPADE                    67.302    0.01    84%
Table 6: Ablation Study Results for Cityscapes Dataset

Table 6 further shows an ablation study on the Cityscapes dataset to evaluate the effectiveness of each component of AdvSPADE. The results show that both the feature matching loss and the VGG loss terms are important for the quality and effectiveness of adversarial example generation: removing either causes the FID to increase and the attack success rate to drop. Besides, removing the adversarial loss term degenerates AdvSPADE into vanilla SPADE, which can only generate benign images.

Human Evaluation. Using Amazon Mechanical Turk (AMT), we evaluate how humans perceive the generated adversarial images. Detailed results are presented in the supplementary material. This is done in two settings:

(1) Semantic Consistency Test: If the semantic meanings of our adversarial examples are consistent with their respective ground-truth labels, humans will segment the examples correctly. However, asking workers to segment every pixel is time-consuming and inefficient. Instead, we give AMT workers a pair of images, a generated adversarial image and a semantic label (half of the image pairs are matched, and the rest are mismatched), and ask them whether the semantic meaning of the given synthetic image is consistent with the given semantic label. We observe that users can identify the semantic meaning of our adversarial examples precisely on both Cityscapes and ADE20K (exact accuracies are reported in the supplementary material). This shows that although a segmentation network completely fails to handle our adversarial examples, humans can successfully identify their semantic meaning, which proves our attack's success.

(2) Fidelity AB Test: We compare the visual fidelity of AdvSPADE with vanilla SPADE. We give workers the ground-truth semantic label and two generated images, one from AdvSPADE and one from vanilla SPADE, and ask them to select the image that better corresponds to the ground-truth label. The fractions of users favoring our examples over vanilla SPADE on Cityscapes and ADE20K (reported in the supplementary material) indicate competitive visual fidelity of our adversarial images.

Robustness Evaluation. We first show that robust training with norm-bounded adversarial images can defend against restricted adversarial attacks in which the perturbation is added either to real or to synthetic images; however, our unrestricted adversarial examples can still attack these robust models successfully [15]. We then present experimental results for a more robust segmentation model built with our unrestricted examples. We follow the adversarial training setting introduced by [27]: we select PGD as the attack method and adversarially train the target networks on Cityscapes and ADE20K with a fixed norm-bound size, number of attack iterations, and step size. After the training phase, we use PGD with the same setting to generate norm-bounded perturbations and add them to both real images and synthetic images produced by vanilla SPADE. We find that real and synthesized images with PGD perturbation only moderately decrease the mIoU of the robust DRN-105 and the robust Upernet-101. In contrast, our adversarial examples drive the mIoU of both robust models far lower, indicating that our examples successfully bypass robust models trained with norm-bounded adversarial examples. Next, we train a model for 50 epochs with our unrestricted adversarial examples on the Cityscapes dataset and then apply PGD to attack it. The result shows that the PGD attack can only achieve a low attack success rate on DRN-105. Since norm-bounded examples are unseen by the robust model defended with our samples, the low success rate reflects that the model gains stronger robustness from adversarial training with AdvSPADE examples.
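The adversarial training procedure follows the standard min-max recipe of [27]; a compact sketch, reusing the pgd_attack helper sketched earlier and leaving the hyperparameters as placeholders, looks like the following.

```python
def adversarial_training_epoch(F_seg, loader, optimizer, loss_fn, eps, alpha, iters):
    # One epoch of Madry-style adversarial training [27]: generate PGD examples
    # on the fly and update the segmentation network on them. Assumes the
    # pgd_attack helper from the earlier sketch is in scope; eps, alpha, and
    # iters are placeholder hyperparameters.
    F_seg.train()
    for x, m_onehot in loader:
        x_adv = pgd_attack(F_seg, x, m_onehot, eps=eps, alpha=alpha, iters=iters, loss_fn=loss_fn)
        optimizer.zero_grad()
        loss = loss_fn(F_seg(x_adv), m_onehot)
        loss.backward()
        optimizer.step()
```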

6 Conclusion

This paper explores the existence of adversarial examples beyond norm-bounded metrics for state-of-the-art semantic segmentation neural networks. By modifying the loss function of the SPADE architecture, we are able to generate high-quality, unrestricted, realistic adversarial examples which mislead segmentation networks' behavior. We demonstrate the effectiveness and robustness of our method by comparing it with traditional norm-bounded attacks. We also show that our generated adversarial examples can easily bypass the state-of-the-art defense method, which raises new concerns about the security of segmentation networks.

References

  • [1] A. Arnab, O. Miksik, and P. H.S. Torr (2018-06) On the robustness of semantic segmentation models to adversarial attacks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. External Links: ISBN 9781538664209, Link, Document Cited by: §1, §2, §2, §2, §5.
  • [2] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. External Links: 1802.00420 Cited by: §2.
  • [3] V. Badrinarayanan, A. Kendall, and R. Cipolla (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence 39 (12), pp. 2481–2495. Cited by: §2.
  • [4] H. G. Barrow and J. M. Tenenbaum (1981-08) Interpreting line drawings as three-dimensional surfaces. Artif. Intell. 17 (1-3), pp. 75–116. External Links: ISSN 0004-3702, Link, Document Cited by: §2.
  • [5] N. Carlini and D. A. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pp. 39–57. Cited by: §1, §2.
  • [6] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40 (4), pp. 834–848. Cited by: §2.
  • [7] L. Chen, G. Papandreou, F. Schroff, and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. External Links: 1706.05587 Cited by: §4.
  • [8] M. Cisse, Y. Adi, N. Neverova, and J. Keshet (2017) Houdini: fooling deep structured prediction models. arXiv preprint arXiv:1707.05373. Cited by: §2.
  • [9] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele (2016) The cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §4, §4.
  • [10] I. Dunn, H. Pouget, T. Melham, and D. Kroening (2019) Adaptive generation of unrestricted adversarial inputs. External Links: 1905.02463 Cited by: §2.
  • [11] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry (2017) A rotation and a translation suffice: fooling cnns with simple transformations. External Links: 1712.02779 Cited by: §1.
  • [12] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry (2017) Exploring the landscape of spatial robustness. External Links: 1712.02779 Cited by: §1.
  • [13] A. Ess, T. Mueller, H. Grabner, and L. J. Van Gool (2009) Segmentation-based urban traffic scene understanding.. In BMVC, Vol. 1, pp. 2. Cited by: §1.
  • [14] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner (2017) Detecting adversarial samples from artifacts. ArXiv abs/1703.00410. Cited by: §2.
  • [15] I. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2, §2, §5, §5.
  • [16] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local nash equilibrium. External Links: 1706.08500 Cited by: §4.
  • [17] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, External Links: Link Cited by: §2.
  • [18] S. Hong, X. Yan, T. Huang, and H. Lee (2018) Learning hierarchical semantic image manipulation through structured representations. External Links: 1808.07535 Cited by: §2.
  • [19] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. External Links: 1502.03167 Cited by: §3.
  • [20] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017-07) Image-to-image translation with conditional adversarial networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). External Links: ISBN 9781538604571, Link, Document Cited by: §2.
  • [21] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.
  • [22] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. Note: cite arxiv:1312.6114 External Links: Link Cited by: §3.
  • [23] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial machine learning at scale. External Links: 1611.01236 Cited by: §2, §5.
  • [24] D. Lee, S. Liu, J. Gu, M. Liu, M. Yang, and J. Kautz (2018) Context-aware synthesis and placement of object instances. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 10393–10403. External Links: Link Cited by: §2.
  • [25] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §2.
  • [26] C. Lyu, K. Huang, and H. Liang (2015-11) A unified gradient regularization family for adversarial examples. 2015 IEEE International Conference on Data Mining. External Links: ISBN 9781467395045, Link, Document Cited by: §2.
  • [27] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, External Links: Link Cited by: §1, §2, §2, §5, §5.
  • [28] F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. Cited by: §1.
  • [29] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. External Links: 1411.1784 Cited by: §3.
  • [30] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957. Cited by: §4.
  • [31] A. Odena, C. Olah, and J. Shlens (2016) Conditional image synthesis with auxiliary classifier gans. In ICML, Cited by: §2.
  • [32] N. Papernot, P. McDaniel, and I. Goodfellow (2016) Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. External Links: 1605.07277 Cited by: §1.
  • [33] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami (2016-05) Distillation as a defense to adversarial perturbations against deep neural networks. 2016 IEEE Symposium on Security and Privacy (SP). External Links: ISBN 9781509008247, Link, Document Cited by: §1.
  • [34] N. Papernot and P. McDaniel (2017) Extending defensive distillation. External Links: 1705.05264 Cited by: §2.
  • [35] T. Park, M. Liu, T. Wang, and J. Zhu (2019) Semantic image synthesis with spatially-adaptive normalization. External Links: 1903.07291 Cited by: §1, §2, §3, §3, §4, §4, §5.
  • [36] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pp. 234–241. External Links: ISBN 9783319245744, ISSN 1611-3349, Link, Document Cited by: §1, §2.
  • [37] U. Shaham, Y. Yamada, and S. Negahban (2018) Understanding adversarial training: increasing local stability of supervised models through robust optimization. Neurocomputing 307, pp. 195–204. Cited by: §2.
  • [38] G. Shen, Y. Ding, T. Lan, H. Chen, and Z. Qin (2018) Brain tumor segmentation using concurrent fully convolutional networks and conditional random fields. In Proceedings of the 3rd International Conference on Multimedia and Image Processing, pp. 24–30. Cited by: §1.
  • [39] Y. Song, R. Shu, N. Kushman, and S. Ermon (2018) Constructing unrestricted adversarial examples with generative models. External Links: 1805.07894 Cited by: §1, §2, §2, §3, §4, §4, §4, Table 1, §5.
  • [40] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. Jorge Cardoso (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Lecture Notes in Computer Science, pp. 240–248. External Links: ISBN 9783319675589, ISSN 1611-3349, Link, Document Cited by: §3.
  • [41] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. External Links: 1312.6199 Cited by: §1, §2.
  • [42] Y. Tian, K. Pei, S. Jana, and B. Ray (2018) Deeptest: automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th international conference on software engineering, pp. 303–314. Cited by: §1.
  • [43] T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro (2018-06) High-resolution image synthesis and semantic manipulation with conditional gans. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. External Links: ISBN 9781538664209, Link, Document Cited by: §2, §3.
  • [44] X. Wang, K. He, and J. E. Hopcroft (2019) AT-gan: a generative attack model for adversarial transferring on generative adversarial nets. External Links: 1904.07793 Cited by: §2.
  • [45] E. Wong, F. R. Schmidt, and J. Z. Kolter (2019) Wasserstein adversarial examples via projected sinkhorn iterations. External Links: 1902.07906 Cited by: §1, §2.
  • [46] C. Xiao, B. Li, J. Zhu, W. He, M. Liu, and D. Song (2018-07) Generating adversarial examples with adversarial networks. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. External Links: ISBN 9780999241127, Link, Document Cited by: §2, §4, Table 5, §5.
  • [47] T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun (2018) Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 418–434. Cited by: §4.
  • [48] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille (2017-10) Adversarial examples for semantic segmentation and object detection. 2017 IEEE International Conference on Computer Vision (ICCV). External Links: ISBN 9781538610329, Link, Document Cited by: §2, §2.
  • [49] W. Xu, D. Evans, and Y. Qi (2018) Feature squeezing: detecting adversarial examples in deep neural networks. Proceedings 2018 Network and Distributed System Security Symposium. External Links: ISBN 1891562495, Link, Document Cited by: §2.
  • [50] S. Yao, T. M. H. Hsu, J. Zhu, J. Wu, A. Torralba, W. T. Freeman, and J. B. Tenenbaum (2018) 3D-aware scene manipulation via inverse graphics. External Links: 1808.09351 Cited by: §2.
  • [51] F. Yu, V. Koltun, and T. Funkhouser (2017) Dilated residual networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 472–480. Cited by: §4.
  • [52] F. Yu and V. Koltun (2016) Multi-scale context aggregation by dilated convolutions. In International Conference on Learning Representations (ICLR), Cited by: §4.
  • [53] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang (2019) Free-form image inpainting with gated convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4471–4480. Cited by: §2.
  • [54] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017-07) Pyramid scene parsing network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). External Links: ISBN 9781538604571, Link, Document Cited by: §4.
  • [55] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba (2016) Semantic understanding of scenes through the ade20k dataset. arXiv preprint arXiv:1608.05442. Cited by: §4, §4.
  • [56] B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso, and A. Torralba (2018) Semantic understanding of scenes through the ade20k dataset. International Journal on Computer Vision. Cited by: §4.