Adversarial Deep Structured Nets for Mass Segmentation from Mammograms



Mass segmentation provides effective morphological features that are important for mass diagnosis. In this work, we propose a novel end-to-end network for mammographic mass segmentation that employs a fully convolutional network (FCN) to model a potential function, followed by a conditional random field (CRF) to perform structured learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a position prior. Further, we employ adversarial training to reduce the over-fitting caused by the small sizes of mammogram datasets, and a multi-scale FCN to improve segmentation performance. Experimental results on two public datasets, INbreast and DDSM-BCRP, demonstrate that our end-to-end network achieves better performance than state-of-the-art approaches.


Wentao Zhu, Xiang Xiang, Trac D. Tran, Gregory D. Hager, Xiaohui Xie
University of California, Irvine; Johns Hopkins University
{wentaoz1, xhx}, {xxiang, trac, ghager1}

Keywords: Adversarial deep structured networks, segmentation, adversarial fully convolutional networks

1 Introduction

According to the American Cancer Society, breast cancer is the most frequently diagnosed solid cancer and the second leading cause of cancer death among U.S. women. Mammographic screening has proven to be an effective means of early detection and diagnosis, significantly decreasing breast cancer mortality. Mass segmentation provides morphological features, which play crucial roles in diagnosis.

Traditional studies on mass segmentation rely heavily on hand-crafted features. Model-based methods build classifiers and learn features from masses [1, 2]. Few works have applied deep networks to mammograms [3, 4, 5]. Dhungel et al. employed multiple deep belief networks (DBNs), a Gaussian mixture model (GMM) classifier, and a prior as potential functions, and a structured SVM to perform segmentation [6]. They further used a CRF with tree re-weighted belief propagation to boost segmentation performance [7]. A recent work used the output of a CNN as a complementary potential function, yielding state-of-the-art performance [8]. However, the two-stage training used in these methods produces potential functions that easily over-fit the training data.

In this work, we propose an end-to-end trained adversarial deep structured network to perform mass segmentation (Fig. 1). The proposed network is designed to learn robustly from a small dataset of low-contrast mammographic images. Specifically, an end-to-end trained fully convolutional network (FCN) with a CRF is applied. Adversarial training is introduced into the network to learn robustly from scarce mammographic images. Unlike DI2IN-AN, which uses a generative framework for segmentation [9], our network directly optimizes a pixel-wise labeling loss. To further exploit the statistical properties of mass regions, a spatial prior is integrated into the FCN. We validate the adversarial deep structured network on two public mammographic mass segmentation datasets, where it consistently outperforms other mass segmentation algorithms.

Our main contributions in this work are: (1) To the best of our knowledge, this is the first work employing adversarial training to address challenges in image segmentation (the first version of this work appeared on arXiv in 2016). We propose a unified end-to-end training framework integrating FCN+CRF and adversarial training. (2) We employ an end-to-end network for mass segmentation, whereas previous works require many hand-designed features or multi-stage training. (3) Our model achieves state-of-the-art results on the two most commonly used mammographic mass segmentation datasets.

Figure 1: The proposed adversarial deep FCN-CRF network with four convolutional layers followed by CRF for structured learning.

2 FCN-CRF Network

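A fully convolutional network (FCN) is a commonly used model for image segmentation, consisting of convolution, transposed convolution, and pooling layers [10]. For training, the FCN minimizes the negative log-likelihood loss

$$\mathcal{L}_{\text{fcn}}(\theta) = -\frac{1}{N N_i}\sum_{n=1}^{N}\sum_{i=1}^{N_i}\log P_{\text{fcn}}(y_n^i \mid \mathcal{I}_n;\theta),$$

where $y_n^i$ is the label of the $i$th pixel in the $n$th image $\mathcal{I}_n$, $N$ is the number of training mammograms, $N_i$ is the number of pixels in an image, and $\theta$ denotes the FCN parameters. Here the image size is fixed to $40 \times 40$, so $N_i$ is 1,600.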
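As a concrete illustration of this loss, the following NumPy sketch computes the mean pixel-wise negative log-likelihood from a batch of predicted mass-probability maps. It assumes one binary (background/mass) output per pixel and averages over all $N \times N_i$ pixels; the function name `fcn_nll` is hypothetical and not taken from the authors' code.

```python
import numpy as np

def fcn_nll(prob_mass, labels, eps=1e-12):
    """Mean pixel-wise negative log-likelihood (hypothetical helper).

    prob_mass: (N, H, W) predicted mass probabilities P(y = 1 | I; theta)
    labels:    (N, H, W) binary ground truth (1 = mass, 0 = background)
    """
    p = np.clip(prob_mass, eps, 1.0 - eps)  # keep log() finite
    # log-likelihood of the observed label at every pixel
    log_lik = labels * np.log(p) + (1 - labels) * np.log(1.0 - p)
    # average over the N images and the N_i = H * W pixels per image
    return -log_lik.mean()

# Usage: two 40 x 40 ROIs (N_i = 1,600) with an uninformative prediction
rng = np.random.default_rng(0)
labels = (rng.random((2, 40, 40)) > 0.5).astype(float)
print(fcn_nll(np.full((2, 40, 40), 0.5), labels))  # about 0.693 (= log 2)
```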

A CRF is a classical model for structured learning, well suited for image segmentation. It models pixel labels as random variables in a Markov random field conditioned on an observed input image. To keep the notation consistent, we use $\mathbf{y} = (y_1, y_2, \ldots, y_{N_i})$ to denote the random variables of pixel labels in an image, where $y_i \in \{0, 1\}$: zero denotes that a pixel belongs to the background, and one denotes that it belongs to the mass region. The Gibbs energy of a fully connected pairwise CRF is [11]

$$E(\mathbf{y}) = \sum_{i}\psi_u(y_i) + \sum_{i<j}\psi_p(y_i, y_j),$$

where the unary potential function $\psi_u(y_i)$ is the loss of the FCN in our case, and the pairwise potential function $\psi_p(y_i, y_j)$ defines the cost of labeling the pair $(y_i, y_j)$. The pairwise potential function can be defined as

$$\psi_p(y_i, y_j) = \mu(y_i, y_j)\sum_{m}\omega^{(m)} k_G^{(m)}(\mathbf{f}_i, \mathbf{f}_j),$$

where the label compatibility function $\mu$ is given by the Potts model, $\mu(y_i, y_j) = [y_i \neq y_j]$, in our case, $k_G^{(m)}$ is a Gaussian kernel applied to the feature vectors [11], and $\omega^{(m)}$ is a learned weight. Pixel values $I_i$ and positions $\mathbf{p}_i$ can be used as the feature vector $\mathbf{f}_i$.
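To make the energy concrete, the sketch below evaluates $E(\mathbf{y})$ for a small image with a Potts compatibility and the two Gaussian kernels of [11]: an appearance kernel on position and intensity, and a smoothness kernel on position. The dense $P \times P$ kernel matrices, the bandwidths `theta_*`, and all function names are illustrative assumptions; [11] itself uses fast high-dimensional filtering rather than dense matrices.

```python
import numpy as np

def gaussian_kernels(image, theta_alpha=3.0, theta_beta=0.1, theta_gamma=3.0):
    """Appearance and smoothness kernels on pixel features, as dense (P, P) arrays.

    image: (H, W) intensities, assumed normalized to [0, 1]; P = H * W.
    The bandwidths theta_* are hypothetical placeholders, not values from the paper.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)   # positions p_i
    inten = image.reshape(-1, 1).astype(float)                        # pixel values I_i
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)   # |p_i - p_j|^2
    d_int = ((inten[:, None, :] - inten[None, :, :]) ** 2).sum(axis=-1)
    k_app = np.exp(-d_pos / (2 * theta_alpha**2) - d_int / (2 * theta_beta**2))
    k_smooth = np.exp(-d_pos / (2 * theta_gamma**2))
    return [k_app, k_smooth]

def gibbs_energy(labels, unary, kernels, weights):
    """E(y) = sum_i psi_u(y_i) + sum_{i<j} psi_p(y_i, y_j) with a Potts compatibility.

    labels:  (P,) integer labels in {0, 1}
    unary:   (P, 2) unary costs psi_u(y_i = l)
    kernels: list of (P, P) symmetric kernel matrices k_G^(m)
    weights: list of scalar weights omega^(m)
    """
    e_unary = unary[np.arange(labels.size), labels].sum()
    potts = (labels[:, None] != labels[None, :]).astype(float)  # mu = [y_i != y_j]
    k = sum(w * km for w, km in zip(weights, kernels))
    # the full double sum counts each unordered pair twice; potts has a zero diagonal
    e_pair = 0.5 * (potts * k).sum()
    return e_unary + e_pair
```

Materializing dense kernels is feasible at this scale because a $40 \times 40$ ROI has only 1,600 pixels, that is, about 2.6 million kernel entries per matrix.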

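Efficient inference can be obtained by mean field approximation [11]. The update rule is

$$\begin{aligned}
\tilde{Q}_i^{(m)}(l) &= \sum_{j \neq i} k_G^{(m)}(\mathbf{f}_i, \mathbf{f}_j)\, Q_j(l),\\
\check{Q}_i(l) &= \sum_{m}\omega^{(m)}\tilde{Q}_i^{(m)}(l),\\
\hat{Q}_i(l) &= \sum_{l' \in \{0,1\}}\mu(l, l')\,\check{Q}_i(l'),\\
\breve{Q}_i(l) &= -\psi_u(y_i = l) - \hat{Q}_i(l),\\
Q_i(l) &= \frac{1}{Z_i}\exp\big(\breve{Q}_i(l)\big), \qquad Z_i = \sum_{l' \in \{0,1\}}\exp\big(\breve{Q}_i(l')\big),
\end{aligned}$$

where the first equation is message passing from label $l$ of pixel $j$ to label $l$ of pixel $i$, the second is re-weighting with the learned weights $\omega^{(m)}$, the third is the compatibility transformation, the fourth adds the unary potentials, and the last step is normalization. Here $l \in \{0, 1\}$ denotes background or mass. Inference is initialized with the unary potential function as $Q_i(l) = \frac{1}{Z_i}\exp\big(-\psi_u(y_i = l)\big)$. The above mean field approximation can be interpreted as a recurrent neural network [12].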
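The five update steps can be sketched as one dense mean-field loop. This minimal NumPy version assumes binary labels, a Potts compatibility, precomputed dense kernel matrices $k_G^{(m)}$, and unary potentials given as costs (for example $-\log P_{\text{fcn}}$); the function names are hypothetical, and `num_iters=5` simply mirrors the number of training-time steps reported in the experiments.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable soft-max along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, kernels, weights, num_iters=5):
    """Dense mean-field inference for a binary, fully connected CRF.

    unary:   (P, 2) unary costs psi_u(y_i = l)
    kernels: list of (P, P) Gaussian kernel matrices k_G^(m)
    weights: list of scalar weights omega^(m)
    Returns Q: (P, 2) approximate marginals Q_i(l).
    """
    mu = 1.0 - np.eye(2)               # Potts: mu(l, l') = 1 if l != l' else 0
    q = softmax(-unary)                # initialization Q_i(l) ∝ exp(-psi_u(y_i = l))
    for _ in range(num_iters):
        q_check = np.zeros_like(q)
        for w, k in zip(weights, kernels):
            k_off = k - np.diag(np.diag(k))  # restrict the sum to j != i
            q_tilde = k_off @ q              # step 1: message passing
            q_check += w * q_tilde           # step 2: re-weighting
        q_hat = q_check @ mu.T               # step 3: sum_l' mu(l, l') Q_check_i(l')
        q = softmax(-unary - q_hat)          # steps 4 and 5: add unary, normalize
    return q
```

With all pairwise weights set to zero the loop returns the unary-only initialization, which is a quick way to check an implementation; increasing the weights pulls an isolated, weakly labeled pixel toward the label of its strongly labeled neighbors.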

3 Adversarial FCN-CRF Nets

Shape and appearance priors play important roles in mammographic mass segmentation [13, 8]. The distribution of labels varies greatly with position in mammographic mass segmentation: empirically, most masses are located at the center of the region of interest (ROI), while the boundary areas of the ROI are more likely to be background (Fig. 2(a)).

Figure 2: (a) Empirical estimates of the position prior on the INbreast (left) and DDSM-BCRP (right) training datasets. (b) Trimap visualizations on the DDSM-BCRP dataset: segmentation ground truth (first column) and trimaps of two different widths (second and third columns).

The conventional FCN provides independent pixel-wise predictions and accounts for differences in the global class distribution only through the bias of its last layer. Here we take a position prior into consideration and integrate it into the FCN as

$$P(y_i = 1 \mid \mathcal{I};\theta) \propto w_i \, P_{\text{fcn}}(y_i = 1 \mid \mathcal{I};\theta),$$

where $w_i$ is the empirical estimate of the mass probability at pixel position $i$, and $P_{\text{fcn}}(y_i = 1 \mid \mathcal{I};\theta)$ is the mass probability predicted by the conventional FCN. In the implementation, we add an image-sized bias holding the empirical mass estimate to the FCN and train the network. The resulting $P(y_i \mid \mathcal{I};\theta)$ is used to define the unary potential function $\psi_u(y_i)$ in the CRF as RNN. When multi-scale FCNs provide the potential functions, the unary potential is defined as $\psi_u(y_i) = \sum_{s}\lambda^{(s)}\psi_u^{(s)}(y_i)$, where $\lambda^{(s)}$ is the learned weight for the unary potential of scale $s$, and $\psi_u^{(s)}$ is the potential function provided by the FCN of that scale.
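One way to read the "image-sized bias" is as a per-pixel additive term $\log w_i$ on the mass logit before the soft-max, which multiplies the unnormalized mass probability by $w_i$. The sketch below estimates $w_i$ as the per-pixel mass frequency over aligned training masks and applies it this way, then shows the weighted combination of per-scale unary potentials. This interpretation, the clipping constant, and the function names are assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_position_prior(train_masks, eps=1e-3):
    """Empirical per-pixel mass frequency over aligned training ROIs, shape (H, W)."""
    prior = np.asarray(train_masks, dtype=float).mean(axis=0)
    return np.clip(prior, eps, 1.0 - eps)  # keep log() finite at never-mass pixels

def apply_position_prior(logits, prior):
    """Add log(prior) as an image-sized bias on the mass logit, then soft-max.

    logits: (H, W, 2) background/mass scores from a (hypothetical) FCN.
    Before normalization, the mass score becomes prior_i * exp(logit_i(1)).
    """
    biased = np.array(logits, dtype=float)
    biased[..., 1] += np.log(prior)
    biased -= biased.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(biased)
    return e / e.sum(axis=-1, keepdims=True)

def multiscale_unary(unaries, scale_weights):
    """psi_u(y_i) = sum_s lambda^(s) * psi_u^(s)(y_i) over a list of per-scale unaries."""
    return sum(w * u for w, u in zip(scale_weights, unaries))
```

Because the bias is fixed per position, it plays the role of the spatially varying counterpart to the single global class bias of a conventional FCN's last layer.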

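Adversarial training provides strong regularization for deep networks [14]. The idea is that a sufficiently robust model should be invariant to small perturbations of the training examples that yield the largest increase in the loss (adversarial examples [15]). The perturbation can be obtained as

$$\mathbf{r}_{\text{adv}} = \arg\min_{\mathbf{r}:\,\|\mathbf{r}\|_2 \le \epsilon}\log P(\mathbf{y} \mid \mathcal{I} + \mathbf{r};\theta).$$

In general, computing the exact $\mathbf{r}_{\text{adv}}$ is intractable, especially for complicated models such as deep networks. A linear approximation with an $L_2$ norm box constraint can be used to compute the perturbation as $\mathbf{r}_{\text{adv}} = -\epsilon\,\mathbf{g}/\|\mathbf{g}\|_2$, where $\mathbf{g} = \nabla_{\mathcal{I}}\log P(\mathbf{y} \mid \mathcal{I};\theta)$ [14]. For the adversarial FCN, the network predicts the label of each pixel independently as $P(\mathbf{y} \mid \mathcal{I};\theta) = \prod_i P(y_i \mid \mathcal{I};\theta)$. For the adversarial CRF as RNN, the prediction of the network relies on mean field approximation inference as $P(\mathbf{y} \mid \mathcal{I};\theta) \approx \prod_i Q_i(y_i)$.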
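The linearized perturbation can be illustrated without a deep-learning framework by using a toy pixel-wise logistic model whose input gradient is available in closed form. This sketch assumes an $L_2$ constraint, so $\mathbf{r}_{\text{adv}} = -\epsilon\,\mathbf{g}/\|\mathbf{g}\|_2$; the linear model `W @ x + b` is a hypothetical stand-in for the FCN, whose input gradient would in practice come from automatic differentiation.

```python
import numpy as np

def log_likelihood_and_grad(x, y, W, b):
    """Toy model with independent pixels and logits z = W @ x + b (hypothetical).

    Returns log P(y | x) = sum_i log P(y_i | x) and its gradient w.r.t. the input x.
    """
    z = W @ x + b
    log_p1 = -np.logaddexp(0.0, -z)   # log sigmoid(z), computed stably
    log_p0 = -np.logaddexp(0.0, z)    # log(1 - sigmoid(z))
    ll = np.sum(y * log_p1 + (1 - y) * log_p0)
    grad_x = W.T @ (y - np.exp(log_p1))  # chain rule: d ll / d z = y - sigmoid(z)
    return ll, grad_x

def adversarial_perturbation(x, y, W, b, epsilon):
    """Linearized worst case under ||r||_2 <= epsilon: r = -epsilon * g / ||g||_2."""
    _, g = log_likelihood_and_grad(x, y, W, b)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(x)
    return -epsilon * g / norm
```

Because this toy log-likelihood is concave in the input, the perturbed input is guaranteed to lower $\log P(\mathbf{y} \mid \mathcal{I};\theta)$ by at least $\epsilon\|\mathbf{g}\|_2$; for a deep network the linearization is only approximate.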

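Adversarial training forces the model to also fit examples perturbed in the worst-case direction. The adversarial loss is defined as

$$\mathcal{L}_{\text{adv}}(\theta) = -\sum_{n=1}^{N}\log P(\mathbf{y}_n \mid \mathcal{I}_n + \mathbf{r}_{\text{adv},n};\theta).$$

In training, the total loss is defined as the weighted sum of the empirical loss on the training samples and the adversarial loss:

$$\mathcal{L}(\theta) = -\sum_{n=1}^{N}\log P(\mathbf{y}_n \mid \mathcal{I}_n;\theta) - \beta\sum_{n=1}^{N}\log P(\mathbf{y}_n \mid \mathcal{I}_n + \mathbf{r}_{\text{adv},n};\theta),$$

where $\beta$ is the regularization factor for $\mathcal{L}_{\text{adv}}$, and $P(\mathbf{y}_n \mid \mathcal{I}_n;\theta)$ is either the mass probability predicted by the FCN or the posterior approximated by mean field inference in the CRF as RNN for the $n$th image $\mathcal{I}_n$.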
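Putting the pieces together, the total loss can be sketched as a loop over training images that evaluates the clean log-likelihood, forms the linearized $L_2$ perturbation, and adds the regularization factor `beta` times the adversarial negative log-likelihood. The model `toy_log_prob_and_grad` (independent pixels with a hypothetical fixed logit $a x_i + c$) stands in for the FCN or the CRF as RNN; all names and default values are illustrative assumptions.

```python
import numpy as np

def toy_log_prob_and_grad(x, y, a=2.0, c=-1.0):
    """Hypothetical stand-in model: independent pixels with logit_i = a * x_i + c.

    Returns log P(y | x) and its gradient with respect to the input image x.
    """
    z = a * x + c
    log_p1 = -np.logaddexp(0.0, -z)   # log sigmoid(z)
    log_p0 = -np.logaddexp(0.0, z)    # log(1 - sigmoid(z))
    ll = np.sum(y * log_p1 + (1 - y) * log_p0)
    grad = a * (y - np.exp(log_p1))   # d ll / d x_i
    return ll, grad

def adversarial_total_loss(images, labels, log_prob_and_grad, beta, epsilon):
    """Empirical NLL plus beta times the NLL on L2-bounded adversarial images."""
    total = 0.0
    for x, y in zip(images, labels):
        ll_clean, g = log_prob_and_grad(x, y)
        norm = np.linalg.norm(g)  # Frobenius norm = L2 norm over all pixels
        r_adv = -epsilon * g / norm if norm > 0 else np.zeros_like(x)
        ll_adv, _ = log_prob_and_grad(x + r_adv, y)
        total += -ll_clean - beta * ll_adv
    return total
```

A common implementation choice, assumed here, is to treat $\mathbf{r}_{\text{adv}}$ as a constant when computing parameter gradients in an automatic-differentiation framework, so the parameters are updated through both terms while the perturbation itself is not differentiated.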

4 Experiments

We validate the proposed model on the two most commonly used public mammographic mass segmentation datasets: INbreast [16] and DDSM-BCRP [17]. We use the same ROI extraction and resizing protocol as [6, 8, 7]. Because mammograms have low contrast, we apply the first 9 steps of the image enhancement procedure in [18] to the extracted ROI images, followed by pixel-position-dependent normalization. This preprocessing makes training converge quickly. We further augment each training set by flipping horizontally, flipping vertically, and flipping both horizontally and vertically, which makes the training set 4 times larger than the original.
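The four-fold flip augmentation can be sketched in a few lines of NumPy, assuming ROIs and masks are stored as `(N, H, W)` arrays; the function name is hypothetical. Masks are flipped together with the images so that labels stay aligned.

```python
import numpy as np

def flip_augment(images, masks):
    """Original ROIs plus horizontal, vertical, and horizontal+vertical flips.

    images, masks: arrays of shape (N, H, W). Outputs have 4N samples each,
    ordered as [original, horizontal, vertical, both].
    """
    def flips(a):
        return [a, a[:, :, ::-1], a[:, ::-1, :], a[:, ::-1, ::-1]]
    return np.concatenate(flips(images), axis=0), np.concatenate(flips(masks), axis=0)
```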

For consistent comparison, the Dice index is used to evaluate segmentation performance; it is defined as $\text{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$, where $TP$, $FP$, and $FN$ are the numbers of true positive, false positive, and false negative pixels. For a fair comparison, we re-implement the two-stage deep structured learning with CNN model [8] and obtain a similar Dice index on the INbreast dataset. To investigate the impact of each component of our model, we conduct extensive experiments under the following configurations.

  • FCN is the network integrating a position prior into the FCN (denoted as FCN 1 in Table 1).

  • Adversarial FCN is FCN with adversarial training.

  • Joint FCN-CRF is the FCN followed by CRF as RNN with an end-to-end training scheme.

  • Adversarial FCN-CRF is the Joint FCN-CRF with end-to-end adversarial training.

  • Multi-FCN, Adversarial multi-FCN, Joint multi-FCN-CRF, Adversarial multi-FCN-CRF employ 4 FCNs with multi-scale kernels.

The prediction of Multi-FCN and Adversarial multi-FCN is the average prediction of the 4 FCNs. The configurations of the FCNs are given in Table 1. Each convolutional layer is followed by max pooling. The last layer of each of the four FCNs consists of two transposed convolution kernels with a soft-max activation function. We use the hyperbolic tangent activation function in the middle layers. The FCN parameters are set such that the number of parameters in each layer is almost the same as in the CNN used in [8]. We use Adam with a learning rate of 0.003. The regularization factor for the adversarial loss is set to the same value on both datasets. Different perturbation sizes $\epsilon$ are used in adversarial training for INbreast and DDSM-BCRP: because the mass boundaries in DDSM-BCRP are smoother than those in INbreast, we use a larger perturbation $\epsilon$ for DDSM-BCRP. For mean field approximation in the CRF as RNN, we use 5 time steps in training and 10 time steps in testing.
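For reference, the Dice index used throughout this section can be computed from binary masks as follows. Returning 1.0 when both masks are empty is an assumed convention, and the function name is hypothetical; how per-ROI scores are aggregated for Table 2 (for example, by averaging over test ROIs) is likewise an assumption not covered by this sketch.

```python
import numpy as np

def dice_index(pred, gt):
    """Dice = 2|A ∩ B| / (|A| + |B|) = 2TP / (2TP + FP + FN) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()  # |A| + |B| = 2TP + FP + FN
    return 1.0 if denom == 0 else 2.0 * tp / denom
```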


Network   First layer   Second layer   Third layer
FCN 1     conv.
FCN 2     conv.
FCN 3     conv.
FCN 4     conv.

Table 1: Kernel sizes of the sub-nets (#kernels × width × height).


Methodology                          INbreast   DDSM-BCRP

Cardoso et al. [2]                   88         N/A
Beller et al. [1]                    N/A        70
Deep Structure Learning [6]          88         87
TRW Deep Structure Learning [7]      89         89
Deep Structure Learning + CNN [8]    90         90

FCN                                  89.48      90.21
Adversarial FCN                      89.71      90.78
Joint FCN-CRF                        89.78      90.97
Adversarial FCN-CRF                  90.07      91.03
Multi-FCN                            90.47      91.17
Adversarial multi-FCN                90.71      91.20
Joint multi-FCN-CRF                  90.76      91.26
Adversarial multi-FCN-CRF            90.97      91.30

Table 2: Dice indices (%) on the INbreast and DDSM-BCRP datasets.

The INbreast dataset is a recently released mammographic mass analysis dataset that provides accurate lesion contours and high-quality mammograms. For mass segmentation, it contains 116 mass regions. We use the first 58 masses for training and the rest for testing, following the same protocol as [6, 8, 7]. The DDSM-BCRP dataset contains 39 cases (156 images) for training and 40 cases (160 images) for testing [17]. After ROI extraction, there are 84 ROIs for training and 87 ROIs for testing. We compare our schemes with other recently published mammographic mass segmentation methods in Table 2.

Table 2 shows that CNN features provide superior performance on mass segmentation, outperforming hand-crafted feature based methods [2, 1]. Our enhanced FCN achieves a 0.25% Dice index improvement over the traditional FCN on the INbreast dataset. Adversarial training yields a 0.4% improvement on average. Incorporating spatially structured learning produces a further 0.3% improvement. Using the multi-scale model contributes the most to segmentation results, which shows that multi-scale features are effective for pixel-wise classification in mass segmentation. Combining all the components achieves the best performance, with relative reductions in segmentation error (100% − Dice) of 9.7% and 13% compared with [8] on INbreast and DDSM-BCRP, respectively. A possible reason for the improvement is that the adversarial scheme mitigates the over-fitting of [8]. We calculate the p-value of McNemar's chi-square test to compare our model with [8] on the INbreast dataset; the resulting p-value shows that our model is significantly better than the model of [8].
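The relative improvements for the combined model are consistent with relative reductions of the error $100 - \text{Dice}$ with respect to the 90% of [8]: $(10 - 9.03)/10 = 9.7\%$ on INbreast and $(10 - 8.70)/10 = 13\%$ on DDSM-BCRP. The sketch below reproduces this arithmetic and adds a McNemar test with continuity correction. The function names are hypothetical, and what the discordant counts `b` and `c` refer to (for example, pixels or ROIs that one model classifies correctly and the other does not) is an assumption, since the unit of comparison is not stated in the text.

```python
import math

def relative_error_reduction(dice_base, dice_new):
    """Relative reduction of the error (100 - Dice), with Dice given in percent."""
    err_base, err_new = 100.0 - dice_base, 100.0 - dice_new
    return (err_base - err_new) / err_base

def mcnemar_p_value(b, c):
    """McNemar chi-square test with continuity correction (1 degree of freedom).

    b: number of items model A gets right and model B gets wrong
    c: number of items model A gets wrong and model B gets right
    """
    if b + c == 0:
        return 1.0
    chi2 = max(abs(b - c) - 1.0, 0.0) ** 2 / (b + c)
    # for 1 dof, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

print(round(relative_error_reduction(90, 90.97), 3))  # 0.097 (INbreast)
print(round(relative_error_reduction(90, 91.30), 3))  # 0.13 (DDSM-BCRP)
```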

To better understand adversarial training, we visualize segmentation results in Fig. 3. The segmentations in the second and fourth rows have more accurate boundaries than those in the first and third rows, suggesting that adversarial training improves both FCN and FCN-CRF on mass segmentation.

Figure 3: Visualization of segmentation results using the FCN (first row), Adversarial FCN (second row), Joint FCN-CRF (third row), Adversarial FCN-CRF (fourth row) on the test sets of INbreast dataset. Each column denotes a test sample. Red lines denote the ground truth. Green lines or points denote the segmentation results. Adversarial training provides sharper and more accurate segmentation boundaries than methods without adversarial training.

We further use prediction accuracy within trimaps to specifically evaluate segmentation accuracy at boundaries [19]. We calculate the accuracies within trimaps surrounding the actual mass boundaries (ground truth) in Fig. 4. Trimaps on the DDSM-BCRP dataset are visualized in Fig. 2(b). From the figure, the accuracies of Adversarial FCN-CRF are 2-3% higher than those of Joint FCN-CRF on average, and the accuracies of Adversarial FCN are higher than those of FCN. These results demonstrate that adversarial training improves FCN and Joint FCN-CRF both for whole-image segmentation (Dice index) and for boundary-region segmentation.
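A trimap evaluation can be sketched as follows: extract the ground-truth boundary, dilate it into a band of a given pixel width, and measure pixel accuracy only inside that band. This version assumes a square (Chebyshev-distance) structuring element and defines boundary pixels as mass pixels with at least one background 8-neighbor; [19] and the experiments here may define the band differently, and all function names are hypothetical.

```python
import numpy as np

def dilate_square(mask, r):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out

def trimap_band(gt, width):
    """Pixels within Chebyshev distance `width` of the ground-truth mass boundary."""
    gt = np.asarray(gt, dtype=bool)
    boundary = gt & dilate_square(~gt, 1)  # mass pixels touching background
    return dilate_square(boundary, width)

def trimap_accuracy(pred, gt, width):
    """Pixel accuracy of a binary prediction restricted to the trimap band."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    band = trimap_band(gt, width)
    if not band.any():
        return float("nan")  # no boundary, so the band is empty
    return float((pred[band] == gt[band]).mean())
```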

Figure 4: Accuracy comparisons among FCN, Adversarial FCN, Joint FCN-CRF, and Adversarial FCN-CRF in trimaps of five different pixel widths on the INbreast dataset (a) and the DDSM-BCRP dataset (b). Adversarial training improves segmentation accuracy around boundaries.

5 Conclusion

In this work, we propose an end-to-end adversarial FCN-CRF network for mammographic mass segmentation. To integrate the prior distribution of masses and fully exploit the power of the FCN, a position prior is added to the network. Furthermore, adversarial training is used to handle the small size of the training data by reducing over-fitting and increasing robustness. Experimental results demonstrate the state-of-the-art performance of the adversarial FCN-CRF on the two most commonly used public mammogram datasets.


  • [1] M. Beller et al., "An example-based system to support the segmentation of stellate lesions," Springer, 2005.
  • [2] J. S. Cardoso et al., "Closed shortest path in the original coordinates with an application to breast cancer," IJPRAI, 2015.
  • [3] M. Kallenberg et al., "Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring," IEEE TMI, 2016.
  • [4] H. Greenspan et al., "Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique," IEEE TMI, 2016.
  • [5] W. Zhu et al., "Deep multi-instance networks with sparse label assignment for whole mammogram classification," in MICCAI, 2017.
  • [6] N. Dhungel et al., "Deep structured learning for mass segmentation from mammograms," in ICIP. IEEE, 2015.
  • [7] N. Dhungel et al., "Tree re-weighted belief propagation using deep learning potentials for mass segmentation from mammograms," in ISBI. IEEE, 2015.
  • [8] N. Dhungel et al., "Deep learning and structured prediction for the segmentation of mass in mammograms," in MICCAI, 2015.
  • [9] D. Yang et al., "Automatic liver segmentation using an adversarial image-to-image network," in MICCAI. Springer, 2017.
  • [10] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in CVPR, 2015.
  • [11] P. Krähenbühl and V. Koltun, "Efficient inference in fully connected CRFs with Gaussian edge potentials," in NIPS, 2011.
  • [12] S. Zheng et al., "Conditional random fields as recurrent neural networks," in ICCV, 2015.
  • [13] M. Jiang et al., "Mammographic mass segmentation with online learned shape and appearance priors," in MICCAI, 2016.
  • [14] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in ICLR, 2015.
  • [15] C. Szegedy et al., "Intriguing properties of neural networks," in ICLR, 2014.
  • [16] I. C. Moreira et al., "INbreast: Toward a full-field digital mammographic database," Academic Radiology, 2012.
  • [17] M. Heath et al., "Current status of the digital database for screening mammography," in Digital Mammography, 1998.
  • [18] J. E. Ball and L. M. Bruce, "Digital mammographic computer aided diagnosis (CAD) using adaptive level set segmentation," in EMBS. IEEE, 2007.
  • [19] P. Kohli et al., "Robust higher order potentials for enforcing label consistency," IJCV, 2009.