Intelligent image synthesis to attack a segmentation CNN using adversarial learning
Deep learning approaches based on convolutional neural networks (CNNs) have been successful in solving a number of problems in medical imaging, including image segmentation. In recent years, it has been shown that CNNs are vulnerable to attacks in which the input image is perturbed by relatively small amounts of noise, so that the CNN is no longer able to segment the perturbed image with sufficient accuracy. Exploring how to attack CNN-based models, as well as how to defend models against attacks, has therefore become a popular topic, as it also provides insights into the performance and generalization abilities of CNNs. However, most of the existing work assumes unrealistic attack models, i.e. the resulting attacks were specified in advance. In this paper, we propose a novel approach for generating adversarial examples to attack CNN-based segmentation models for medical images. Our approach has three key features: 1) the generated adversarial examples exhibit anatomical variations (in the form of deformations) as well as appearance perturbations; 2) the adversarial examples attack segmentation models so that the Dice scores decrease by a pre-specified amount; 3) the attack is not required to be specified beforehand. We have evaluated our approach on CNN-based approaches for the multi-organ segmentation problem in 2D CT images. We show that the proposed approach can be used to attack different CNN-based segmentation models.
1 Introduction
CNNs have been amongst the most popular models for image classification and segmentation problems thanks to their efficiency and effectiveness in learning representative image features. However, it has been widely reported that even the most well-established CNN models, such as GoogLeNet [14], are vulnerable to almost imperceptible intensity changes to the input images [5]. These small intensity changes can be regarded as adversarial attacks on CNNs. In medical image classification, adversarial attacks can also fool CNN-based classifiers [16, 3]. Therefore, it is important to verify the robustness of CNNs before deploying them in practical use.
The verification of CNNs requires a good understanding of the mechanism of adversarial attacks. In this paper, we aim to develop a novel method to generate adversarial examples which are able to attack CNN models for medical image segmentation. Generating adversarial examples to attack semantic image segmentation models is challenging because: 1) Semantic segmentation means assigning a label to each pixel (or voxel) instead of a single label per image, as in the conventional adversarial attacks typically described in computer vision scenarios. Therefore, attacking a segmentation model is more challenging than attacking a classification model; 2) It is not straightforward to evaluate the success of the attack. A good adversarial example for a classification model results in an incorrect prediction on the whole image, while a good adversarial example for a segmentation model does not necessarily lead to an incorrect prediction for every pixel (voxel); 3) Conventional adversarial attacks perturb the image intensity by a small amount. In medical imaging scenarios, however, deformations are also useful to attack segmentation models. For instance, organs can be present in various configurations in images. Any segmentation model is therefore in principle susceptible to unseen poses or shapes of organs.
Generative adversarial networks (GANs) [6] and variational autoencoders (VAEs) [9] are both unsupervised methods that can learn latent feature representations from training images. A GAN learns the latent feature representations implicitly while a VAE learns them explicitly. Training a GAN is difficult: it can suffer from mode collapse and produce unreasonable results, e.g. a dog with two heads. In contrast, training a VAE is fairly simple. However, while a GAN can generate realistic images, images generated by a VAE are blurry because of the L2 loss employed during training. Inspired by these observations, we propose to combine the advantages of VAEs and GANs to generate realistic and reasonable image deformations and appearance changes so that the transformed images can attack medical segmentation models.
Our main contributions can be summarised as follows: 1) We propose a novel approach to generate adversarial examples to attack a CNN model for abdominal organ segmentation in CT images; 2) We measure the success of an attack by observing significant reductions in the Dice score compared to ground truth segmentations; 3) The proposed approach for attacking the segmentation model does not require any a priori specification of particular attacks. In our application, we do not specify which organ is attacked.
2 Related Work
The work in [4], [11], and [19] represents the state of the art in attacking segmentation models. Fischer et al. [4] proposed to attack segmentation models so that the models cannot segment objects of a specified class (e.g. ignoring pedestrians on the street). Metzen et al. [11] proposed to generate adversarial examples so that the segmentation model incorrectly segments one cityscape as another one. The adversarial examples generated by these two methods attack the segmentation models with specified targets, e.g. pedestrians. In contrast, Xie et al. [19] proposed an approach to generate adversarial examples for semantic image segmentation and object detection without attack targets. However, a random segmentation result must be specified so that the adversaries can be inferred. The adversarial attacks generated by these three methods often appear as pure noise that has no semantic meaning. Therefore, these attacks do not represent real-world situations that can occur in medical imaging applications.
3 Our Approach
We propose a novel end-to-end approach to generate adversarial examples for medical image segmentation scenarios. Formally, consider an original image of a given height and width together with its segmentation, obtained from a fixed CNN-based segmentation model. Each pixel is assigned one of a fixed number of labels, e.g. the organs of interest. The adversarial attack model allows a deformation and an intensity variation to be applied to the original image. The deformation is a dense deformation field, i.e. a displacement vector for each pixel (or voxel), while the intensity variation is a smooth intensity perturbation which can be interpreted, e.g., as a bias field. The transformed image after the adversarial attack is therefore obtained by transforming the original image according to the deformation field and applying the intensity perturbation to the result.
Figure 1 shows the framework, which learns an appropriate deformation and intensity variation such that the transformed image can attack the segmentation model. The whole framework consists of two key components: a CNN model for generation and its learning algorithm.
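The attack model described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: images are plain nested lists, the deformation is applied by backward warping with bilinear interpolation, and the intensity perturbation is added to the warped image. The text does not fix how the perturbation is combined with the image, so the additive form is an assumption, and the names `bilinear_sample` and `apply_attack` are hypothetical.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2D image (list of rows) at a fractional position, clamping at the borders."""
    height, width = len(image), len(image[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = (1.0 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1.0 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1.0 - wy) * top + wy * bottom


def apply_attack(image, dx, dy, intensity):
    """Warp `image` with the displacement field (dx, dy), then add `intensity`.

    dx[r][c] and dy[r][c] are the horizontal and vertical displacements of
    pixel (r, c); the additive intensity term is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    return [
        [
            bilinear_sample(image, r + dy[r][c], c + dx[r][c]) + intensity[r][c]
            for c in range(width)
        ]
        for r in range(height)
    ]
```

With an all-zero deformation and an all-zero intensity perturbation, `apply_attack` returns the input image unchanged, which is a convenient sanity check.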
3.1 Model for generating adversarial attack
The CNN architecture which generates the deformation and the intensity variation is similar to a multi-task VAE. First, the original image is processed by an encoding CNN, resulting in several feature maps which are then used to learn two latent feature representations, one for the deformation and one for the intensity variation. These latent representations are then decoded into the dense deformation field and the dense intensity variation by two CNN decoders, respectively. The two decoding CNNs share the same architecture but do not share weights. Learning the deformation and the intensity variation explicitly ensures that the transformed image looks reasonable.
The dense deformation consists of two channels of feature maps, representing pixel position changes in the horizontal and vertical directions (i.e. along the x and y axes). In addition, we propose to limit the norms of the deformation and the intensity variation so that they are difficult for human observers to perceive. To this end, two regularization terms are used, one for the deformation and one for the intensity variation.
Each term is weighted by a fixed hyper-parameter. The regularization ensures the smoothness of the deformation field and the intensity variation.
Each branch of the CNN generating the deformation or the intensity variation differs from a VAE, since the input and output of the CNN are not the same. In fact, it is an image-to-image CNN, and we can sample the learned latent space to generate multiple instances of the deformation and of the intensity variation. This idea is similar to the one proposed in [10], where a latent space was learned to sample multiple realistic image segmentations.
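Sampling the latent space to obtain several attack candidates can be sketched as follows. The sketch assumes the usual VAE parameterisation, in which the encoder predicts a mean and a log-variance per latent dimension and samples are drawn with the reparameterisation trick; the text only states that the architecture is similar to a multi-task VAE, so this parameterisation, the toy `decode` callable and the function names are assumptions.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Draw z = mean + sigma * eps with eps ~ N(0, 1) (reparameterisation trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def sample_attacks(mean, log_var, decode, n_samples, seed=0):
    """Decode several independent latent samples into candidate outputs.

    `decode` stands in for one decoder branch (deformation or intensity);
    it is a hypothetical callable that maps a latent vector to an output.
    """
    rng = random.Random(seed)
    return [decode(sample_latent(mean, log_var, rng)) for _ in range(n_samples)]
```

Each call to `decode` receives a different latent sample, so one input image can yield many distinct deformation or intensity candidates.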
Since no ground truth is available for the outputs of the generating CNN, it is not possible to learn the parameters of the encoding and decoding CNNs in an explicitly supervised manner. To address this problem, we propose to learn the parameters implicitly based on two conditions. First, the transformed image should look realistic compared with the original image. Secondly, the accuracy of the segmentation of the transformed image should decrease significantly compared with that of the original image. This decrease can be measured in terms of a reduction of the Dice score.
An adversarial learning method is employed to ensure that the transformed image looks realistic compared with the original image. To this end, the generating CNN is regarded as a generator CNN. An additional discriminator CNN is used to predict the realism of the transformed image compared with the original image. Adversarial training of the generator and the discriminator results in realistic transformed images. We adopt the Wasserstein GAN (WGAN) [1] loss function during the adversarial training.
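For reference, the generic WGAN objective can be written as below. This is the standard formulation from the WGAN literature, not a reconstruction of the exact expression used in this work; the symbols $X$ (original image), $\hat{X}$ (transformed image produced by the generator) and $C$ (the discriminator, often called the critic) are notation assumed for this note.

```latex
\begin{aligned}
\mathcal{L}_{C} &= \mathbb{E}\big[C(\hat{X})\big] - \mathbb{E}\big[C(X)\big], \\
\mathcal{L}_{G} &= -\,\mathbb{E}\big[C(\hat{X})\big].
\end{aligned}
```

The critic minimises $\mathcal{L}_{C}$ while being kept approximately 1-Lipschitz (by weight clipping in the original WGAN), and the generator minimises $\mathcal{L}_{G}$.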
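The goal of this work is to generate a transformed image which is able to attack a given segmentation CNN model, e.g. a U-Net [13]. This means that the segmentation of the transformed image differs from the segmentation of the original image, where both are produced by the same fixed segmentation model. When training the segmentation model, we use cross-entropy as the loss function between the predicted segmentation and the ground truth, which leads to a satisfactory segmentation of the original image. To constrain the difference between the two segmentations, we propose a dedicated loss function.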
This loss contains a hyper-parameter which controls the difference between the two segmentations. If it is too small, the segmentation of the transformed image tends to be similar to that of the original image, so that the deformation and the intensity variation tend to zero. In contrast, if it is very large, the norms of the deformation and the intensity variation become so large that the discriminator CNN is difficult to fool. As such, the training process is likely to collapse. Therefore, the hyper-parameter should be kept within a proper range. In addition, we propose to mask the standard cross-entropy function so that the region of interest (ROI) covering the organs of interest is emphasized. Specifically, in the masked cross-entropy function the cross-entropy is multiplied element-wise by a mask highlighting the organs of interest.
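The masking idea can be sketched in Python as follows. This is an illustration under assumptions rather than the loss used in this work: the image is flattened to a list of pixels, the mask is binary, and the masked loss is normalised by the number of masked pixels. The function name and the normalisation are assumptions.

```python
import math


def masked_cross_entropy(probs, labels, mask, eps=1e-12):
    """Average per-pixel cross-entropy, weighted element-wise by a mask.

    probs[i][k] : predicted probability of class k at pixel i
    labels[i]   : reference class index at pixel i
    mask[i]     : weight of pixel i (e.g. 1 inside the organ ROI, 0 outside)
    """
    weighted = [
        m * -math.log(p[y] + eps)  # element-wise product of mask and loss
        for p, y, m in zip(probs, labels, mask)
    ]
    total_weight = sum(mask)
    # Normalising by the mask sum is an assumption of this sketch.
    return sum(weighted) / total_weight if total_weight > 0 else 0.0
```

Pixels outside the mask contribute nothing, so errors in the background do not dilute the signal coming from the organs of interest.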
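In summary, the framework is trained with the loss functions introduced above: the WGAN adversarial losses for the generator and the discriminator, the two regularization terms on the deformation and the intensity variation, and the masked cross-entropy-based loss that constrains the difference between the two segmentations.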
3.3 Implementation Details
In this paper, CNNs are implemented using TensorFlow. The adversarial learning is optimised using the RMSProp algorithm [17] with a decay of 0.9. We use a fixed learning rate for both the generator and the discriminator CNNs. Batch normalization [7] is used after convolutions. A leaky rectified linear unit (LReLU) is used as the nonlinear activation function to ease the adversarial training. The two regularization hyper-parameters are set to 0.1 and 0.01, respectively.
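For readers unfamiliar with the optimiser, one RMSProp update and the leaky ReLU can be written in plain Python as below. Only the decay of 0.9 comes from the text; the learning rate, the stabilising constant `eps` and the LReLU slope are arguments with placeholder values because they are not stated here, and the function names are hypothetical.

```python
import math


def rmsprop_step(params, grads, cache, lr, decay=0.9, eps=1e-8):
    """One RMSProp update; `cache` holds the running average of squared gradients."""
    new_cache = [
        decay * c + (1.0 - decay) * g * g for c, g in zip(cache, grads)
    ]
    new_params = [
        p - lr * g / (math.sqrt(c) + eps)
        for p, g, c in zip(params, grads, new_cache)
    ]
    return new_params, new_cache


def leaky_relu(x, slope):
    """Leaky ReLU: identity for positive inputs, a small linear slope otherwise."""
    return x if x > 0 else slope * x
```

Dividing each gradient by the root of its running squared magnitude keeps the step size comparable across parameters.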
4 Experiments and Results
Experiments were performed on an abdominal CT dataset with multiple organs manually annotated by human experts. The image acquisition details and the patient demographics can be found in [18]. The dataset consists of 150 subjects, and for each subject the annotated organs include the pancreas, the kidneys, the liver, and the spleen. The dataset was randomly split into a training set, a validation set, and a testing set, with 60, 15, and 75 subjects respectively. The voxel intensities of each subject were normalized to zero mean and unit standard deviation.
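The data preparation described above can be sketched as follows. The split sizes (60, 15, 75) and the per-subject normalisation come from the text; the use of the population standard deviation, the seed and the function names are assumptions.

```python
import random
import statistics


def normalize_intensities(voxels):
    """Rescale one subject's voxel intensities to zero mean and unit standard deviation."""
    mean = statistics.fmean(voxels)
    std = statistics.pstdev(voxels)  # population std; an assumption of this sketch
    if std == 0:
        raise ValueError("constant image cannot be normalised")
    return [(v - mean) / std for v in voxels]


def split_subjects(subject_ids, n_train=60, n_val=15, n_test=75, seed=0):
    """Randomly split subjects into training, validation and testing sets."""
    ids = list(subject_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must add up to the number of subjects")
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Splitting by subject rather than by slice keeps all slices of one patient in the same set.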
We trained a standard U-Net [13] to segment all abdominal organs. Due to limitations in GPU memory, the U-Net operates on 2D images rather than 3D volumes. Following [2], the U-Net was trained on image patches and tested on image slices. The trained U-Net was used as the fixed CNN subjected to adversarial attacks in this work.
The Dice score was used to assess the segmentation quality for each organ. We define a 30% decrease in the Dice score of an organ as a successful attack. Similar to [12, 15], we compute the perceptibility of the adversarial perturbation, which measures the similarity of the real image to the synthetic image.
The smaller the perceptibility value, the less likely the adversarial perturbation is to be perceived by human observers.
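The evaluation quantities used in this section can be sketched in Python. The Dice score is standard. Two points are assumptions of this sketch: the 30% decrease is read as a relative decrease with respect to the Dice score before the attack, and perceptibility is computed as the root-mean-square intensity difference between the real and the synthetic image, a form commonly used in the adversarial-example literature. Function names are hypothetical.

```python
import math


def dice_score(pred, truth, label):
    """Dice overlap in [0, 1] for one label between two flat label maps."""
    pred_hits = [p == label for p in pred]
    truth_hits = [t == label for t in truth]
    overlap = sum(p and t for p, t in zip(pred_hits, truth_hits))
    total = sum(pred_hits) + sum(truth_hits)
    return 2.0 * overlap / total if total > 0 else 1.0


def is_successful_attack(dice_before, dice_after, relative_drop=0.30):
    """True if the Dice score fell by at least `relative_drop` (relative reading assumed)."""
    return dice_after <= (1.0 - relative_drop) * dice_before


def perceptibility(real, synthetic):
    """Root-mean-square intensity difference between two flat images (assumed form)."""
    squared = [(r - s) ** 2 for r, s in zip(real, synthetic)]
    return math.sqrt(sum(squared) / len(squared))
```

Under the relative reading, an organ segmented with a Dice score of 0.9 counts as successfully attacked once its Dice score drops to 0.63 or below.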
4.1 Adversarial Examples
By sampling the learned latent spaces, the deformation and the intensity variation are generated, and realistic adversarial examples are thereby obtained. Figure 2 shows two such examples. Thanks to the regularization imposed on the deformation and the intensity variation, both are smooth and difficult for humans to recognise. Nevertheless, the derived adversarial examples attack the U-Net successfully. More examples are shown in Figure 3.
4.2 Attacking the Segmentation Model
Table 1 shows the segmentation results of the standard U-Net on multiple organs. As defined above, an attack on an organ counts as successful when its Dice score decreases by 30%; this borderline Dice score is also listed in Table 1. In the proposed attack approach, the hyper-parameter controlling the difference between the two segmentations is important in deciding the success of attacking the U-Net. Values ranging from 0.5 to 3.0 were tested and the results are shown in Table 1. The larger this hyper-parameter, the more the Dice scores decrease and the larger the perceptibility. With a sufficiently large value, the U-Net can be attacked successfully on all organs.
In terms of different organs, the segmentations of the pancreas and the kidneys are more difficult to attack than the segmentations of the liver and the spleen. Specifically, the segmentations of the liver and the spleen can be attacked successfully with smaller values of the hyper-parameter than those required for the pancreas and the kidneys.
The proposed adversarial examples feature both deformations and intensity variations. We studied the effect of each individually for a fixed value of the hyper-parameter. The results are reported in Table 1. For the kidneys, the deformation leads to a larger decrease of the Dice score, while for the other organs the intensity variation has more impact on attacking the U-Net model. This means that the segmentation model is more sensitive to the intensity variation. The abdominal organs naturally vary in terms of pose on 2D image slices in the training set. Therefore, small deformations do not significantly decrease the Dice scores. In contrast, the intensity variations introduce shadows and artefacts which are likely to influence the segmentation CNN.
| Setting | Pancreas | Kidneys | Liver | Spleen | Perceptibility |
|---|---|---|---|---|---|
| U-Net 30% decrease | 56.05 | 66.32 | 66.30 | 66.33 | – |
| on U-Net () | 74.77 | 93.88 | 89.81 | 81.87 | 0.061 |
| on U-Net () | 70.66 | 90.06 | 59.51 | 37.12 | 0.060 |
| on U-Net () | 64.25 | 66.97 | 26.45 | 35.81 | 0.075 |
| on U-Net () | 53.59 | 60.07 | 11.19 | 17.21 | 0.074 |
| on U-Net () | 40.57 | 45.47 | 9.16 | 17.60 | 0.084 |
| on U-Net () | 31.49 | 43.43 | 5.82 | 26.58 | 0.085 |
| on U-Net () | 70.46 | 82.04 | 75.06 | 70.47 | 0.056 |
| on U-Net () | 68.15 | 88.25 | 50.05 | 46.94 | 0.061 |
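The comparison against the borderline can be checked directly from the numbers in Table 1. The sketch below makes two assumptions that the table itself does not state: the four Dice columns follow the order in which the organs are listed in the text (pancreas, kidneys, liver, spleen), and the first six attack rows correspond to the six tested hyper-parameter values in increasing order. It reports the first of those rows at which each organ falls to or below its borderline.

```python
# Borderline ("U-Net 30% decrease") row of Table 1; the column order is assumed.
BORDERLINE = {"pancreas": 56.05, "kidneys": 66.32, "liver": 66.30, "spleen": 66.33}

# First six attack rows of Table 1 (Dice scores in %), in table order.
ATTACK_ROWS = [
    (74.77, 93.88, 89.81, 81.87),
    (70.66, 90.06, 59.51, 37.12),
    (64.25, 66.97, 26.45, 35.81),
    (53.59, 60.07, 11.19, 17.21),
    (40.57, 45.47, 9.16, 17.60),
    (31.49, 43.43, 5.82, 26.58),
]


def first_successful_row(organ_index, threshold):
    """Return the 1-based index of the first row at or below the borderline, or None."""
    for row_number, row in enumerate(ATTACK_ROWS, start=1):
        if row[organ_index] <= threshold:
            return row_number
    return None


first_success = {
    organ: first_successful_row(index, threshold)
    for index, (organ, threshold) in enumerate(BORDERLINE.items())
}
print(first_success)
# prints {'pancreas': 4, 'kidneys': 4, 'liver': 2, 'spleen': 2}
```

Under these assumptions the liver and the spleen cross their borderline two rows earlier than the pancreas and the kidneys, which matches the observation that the latter two organs are harder to attack.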
5 Discussion and Conclusion
In this paper, we have proposed a novel approach to generate adversarial examples to attack an existing CNN model for medical image segmentation. The generated adversarial examples include geometrical deformations, which model anatomical variations, as well as intensity variations, which model appearance variations. These examples attack CNN-based segmentation models such as a U-Net [13] by decreasing the Dice score by a pre-specified amount. The training process is end-to-end without any predefined requirements. In fact, the U-Net can be replaced by any other CNN-based model. In the future, we will investigate the use of the proposed approach to generate additional training images so that the segmentation model becomes more robust and can defend against attacks. In addition, the proposed approach can be used to verify whether a CNN model is robust or not. Specifically, our approach can generate adversarial examples for the CNN model. If the adversarial examples are reasonable and realistic, then the CNN model is not robust enough.
References
[1] (2017) Wasserstein generative adversarial networks. In ICML, pp. 214–223.
[2] (2018) DRINet for medical image segmentation. IEEE TMI 37 (11), pp. 2453–2462.
[3] (2018) Adversarial attacks against medical deep learning systems. arXiv preprint arXiv:1804.05296.
[4] (2017) Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101.
[5] (2015) Explaining and harnessing adversarial examples. In ICLR.
[6] (2014) Generative adversarial nets. In NIPS, pp. 2672–2680.
[7] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, pp. 448–456.
[8] (2018) The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734.
[9] (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[10] (2018) A probabilistic U-Net for segmentation of ambiguous images. In NIPS, pp. 6965–6975.
[11] (2017) Universal adversarial perturbations against semantic image segmentation. In ICCV, pp. 2755–2764.
[12] (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In CVPR, pp. 427–436.
[13] (2015) U-Net: convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241.
[14] (2015) Going deeper with convolutions. In CVPR, pp. 1–9.
[15] (2014) Intriguing properties of neural networks. In ICLR.
[16] (2018) Vulnerability analysis of chest X-ray image classification against adversarial attacks. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pp. 87–94.
[17] (2012) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4 (2), pp. 26–31.
[18] (2015) Discriminative dictionary learning for abdominal multi-organ segmentation. Medical Image Analysis 23 (1), pp. 92–104.
[19] (2017) Adversarial examples for semantic segmentation and object detection. In ICCV, pp. 1369–1378.