# Deep Structured Learning for Mass Segmentation from Mammograms

###### Abstract

In this paper, we present a novel method for the segmentation of breast masses from mammograms exploring structured and deep learning. Specifically, using structured support vector machine (SSVM), we formulate a model that combines different types of potential functions, including one that classifies image regions using deep learning. Our main goal with this work is to show the accuracy and efficiency improvements that these relatively new techniques can provide for the segmentation of breast masses from mammograms. We also propose an easily reproducible quantitative analysis to assess the performance of breast mass segmentation methodologies based on widely accepted accuracy and running time measurements on public datasets, which will facilitate further comparisons for this segmentation problem. In particular, we use two publicly available datasets (DDSM-BCRP and INbreast) and propose the computation of the running time taken for the methodology to produce a mass segmentation given an input image and the use of the Dice index to quantitatively measure the segmentation accuracy. For both databases, we show that our proposed methodology produces competitive results in terms of accuracy and running time.


Neeraj Dhungel, Gustavo Carneiro, Andrew P. Bradley

ACVT, School of Computer Science, The University of Adelaide

School of Information Technology and Electrical Engineering, The University of Queensland

This work was partially supported by the Australian Research Council’s Discovery Projects funding scheme (project DP140102794). Prof. Bradley is the recipient of an Australian Research Council Future Fellowship (FT110100623).

Index Terms— Mammograms, mass segmentation, structured learning, structured inference

## 1 Introduction

Breast cancer is among the most common types of cancer in women. According to a WHO report [1], breast cancer accounts for 22.9% of diagnosed cancers and 13.7% of cancer-related deaths worldwide. Early detection of breast cancer using imaging techniques is vital to improve survival rates, and the most commonly used screening technique is X-ray mammography (MG), which enables the detection of suspicious masses and micro-calcifications that are subsequently used for classification [2, 3]. It has been observed that there is a trade-off between sensitivity and specificity in the manual analysis of MG, which in general can reduce the efficacy of the diagnosis process [4]. The development of computer-aided diagnosis (CAD) systems has the potential to improve this trade-off, but it has been observed that the use of these systems in MG reduces the accuracy of screening by increasing the rate of biopsies without improving the detection of invasive breast cancer [5]. We believe that this issue can be addressed with a more easily reproducible and reliable assessment mechanism that provides a clear comparison between competing methodologies, which can lead to a better informed decision process for the selection of appropriate algorithms for CAD systems in MG. Another reason for this poor performance lies in the reliance of current approaches on more traditional image processing and segmentation techniques, such as active contours, which typically produce sub-optimal results due to their non-convex cost functions and their reliance on strong contour and appearance priors (e.g., smooth contours, strong edges, etc.). Therefore, we propose a statistical pattern recognition approach that estimates optimal (or near optimal) models directly from annotated data [6].

The main contribution of this paper is the use of a structured support vector machine (SSVM) that learns to produce a structured output, representing the mass segmentation, from an input test image. We also propose a potential function (to be used in this structured learning problem) based on deep belief networks (DBN) that can learn complex features directly from MG. In addition, we propose an easily reproducible assessment that measures both the accuracy and the efficiency of breast mass segmentation methodologies on the publicly available databases INbreast [7] and DDSM-BCRP [8]. We show that our methodology produces mass segmentation results that are competitive with the state of the art on these two databases.

## 2 Methodology

In this section, we describe our statistical model, SSVM for learning and the DBN-based potential function for mass segmentation.

### 2.1 Statistical Model for Mass Segmentation

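Let $\mathcal{D} = \{(\mathbf{x}, \mathbf{y})_n\}_{n=1}^{|\mathcal{D}|}$ be a collection of mammograms, with $\mathbf{x}: \Omega \rightarrow \mathbb{R}$ ($\Omega \subset \mathbb{R}^2$ denotes the image lattice) representing the region of interest (ROI) in the MG containing the mass, and $\mathbf{y}: \Omega \rightarrow \{-1, +1\}$ representing the segmentation of $\mathbf{x}$, with $\mathbf{y}(i) \in \{-1, +1\}$ ($+1$ represents mass and $-1$, background). Our model is denoted by the following probabilistic function [9]:

$$P(\mathbf{y}|\mathbf{x};\theta) = \frac{1}{Z(\theta)} \exp\{-E(\mathbf{y},\mathbf{x};\theta)\}, \tag{1}$$

where $\theta$ represents the model parameters and $Z(\theta)$ the partition function. This model can be represented by a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with nodes $\mathcal{V}$ and edges $\mathcal{E}$ between nodes, with $E(\mathbf{y},\mathbf{x};\theta)$ in (1) defined as:

$$E(\mathbf{y},\mathbf{x};\theta) = \sum_{k=1}^{K}\sum_{i\in\mathcal{V}} \theta_{1,k}\,\psi^{(1,k)}(\mathbf{y}(i),\mathbf{x}) + \sum_{l=1}^{L}\sum_{(i,j)\in\mathcal{E}} \theta_{2,l}\,\psi^{(2,l)}(\mathbf{y}(i),\mathbf{y}(j),\mathbf{x}), \tag{2}$$

where $\psi^{(1,k)}(\cdot,\cdot)$ represents one of the $K$ potential functions that link label (hidden) nodes and pixel (observed) nodes, $\psi^{(2,l)}(\cdot,\cdot,\cdot)$ denotes one of the $L$ potential functions on the edges between label nodes, and $\theta_{1,k}$ ($\theta_{2,l}$) is the $k^{th}$ ($l^{th}$) component of vector $\theta_1$ ($\theta_2$), with $\theta = [\theta_1, \theta_2] \in \mathbb{R}^{K+L}$.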
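To make the structure of (2) concrete, the following is a minimal NumPy sketch that evaluates the energy of a given labeling on a 4-connected pixel grid. It assumes labels in $\{-1, +1\}$, unary potentials stored as per-pixel cost pairs, and pairwise potentials that only penalize neighbouring pixels with different labels (the form of (12) and (13)). The function and argument names are hypothetical and not taken from the authors' implementation.

```python
import numpy as np


def label_index(y):
    """Map labels {-1, +1} to array indices {0, 1}."""
    return (np.asarray(y) + 1) // 2


def energy(y, unaries, pairwise, theta1, theta2):
    """Evaluate E(y, x; theta) of Eq. (2) on a 4-connected grid.

    y        : (H, W) int array with entries in {-1, +1}
    unaries  : list of K arrays of shape (H, W, 2); [..., 0] is the cost of
               label -1 and [..., 1] the cost of label +1 at each pixel
    pairwise : list of L (w_h, w_v) tuples; w_h has shape (H, W-1) and w_v
               shape (H-1, W), holding the cost paid when the two endpoints
               of an edge take different labels
    theta1, theta2 : weights of the unary and pairwise terms
    """
    y = np.asarray(y)
    idx = label_index(y)
    rows, cols = np.indices(y.shape)
    e = 0.0
    for th, psi in zip(theta1, unaries):
        e += th * psi[rows, cols, idx].sum()   # each pixel's cost for its own label
    cut_h = y[:, 1:] != y[:, :-1]              # horizontal edges with differing labels
    cut_v = y[1:, :] != y[:-1, :]              # vertical edges with differing labels
    for th, (w_h, w_v) in zip(theta2, pairwise):
        e += th * (w_h[cut_h].sum() + w_v[cut_v].sum())
    return float(e)


# Toy example: top row labelled mass (+1), bottom row background (-1).
y = np.array([[1, 1], [-1, -1]])
unary = np.zeros((2, 2, 2))
unary[..., 1] = 1.0                            # cost 1 for every mass pixel
potts = (np.ones((2, 1)), np.ones((1, 2)))     # unit cost per cut edge, as in (12)
print(energy(y, [unary], [potts], theta1=[1.0], theta2=[0.5]))  # 2 + 0.5 * 2 = 3.0
```

Storing each pairwise potential as two edge-weight arrays keeps the evaluation vectorized; a general pairwise potential that also charges agreeing labels would need a 2 × 2 cost table per edge instead.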

### 2.2 Structured Learning and Inference

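Learning the model parameters $\theta$ in (2) follows the SSVM procedure [10, 9], as follows:

$$\begin{aligned}
\min_{\theta,\,\xi}\quad & \frac{1}{2}\|\theta\|^2 + \frac{C}{|\mathcal{D}|}\sum_{n=1}^{|\mathcal{D}|}\xi_n \\
\text{s.t.}\quad & E(\hat{\mathbf{y}},\mathbf{x}_n;\theta) - E(\mathbf{y}_n,\mathbf{x}_n;\theta) \geq \Delta(\mathbf{y}_n,\hat{\mathbf{y}}) - \xi_n, \quad \forall \hat{\mathbf{y}} \neq \mathbf{y}_n, \\
& \xi_n \geq 0, \quad n = 1,\ldots,|\mathcal{D}|,
\end{aligned} \tag{3}$$

where $C$ weights the slack variables $\xi_n$ and $\Delta(\mathbf{y}_n,\hat{\mathbf{y}})$ measures a distance in the label space, satisfying the conditions $\Delta(\mathbf{y}_n,\hat{\mathbf{y}}) \geq 0$ and $\Delta(\mathbf{y}_n,\mathbf{y}_n) = 0$. This optimization is a quadratic programming problem involving an intractably large number of constraints. To keep the number of constraints manageable, we use the cutting plane algorithm, where the most violated constraint $\hat{\mathbf{y}}_n$ for the $n^{th}$ training sample is found by:

$$\hat{\mathbf{y}}_n = \arg\min_{\hat{\mathbf{y}}} \; E(\hat{\mathbf{y}},\mathbf{x}_n;\theta) - \Delta(\mathbf{y}_n,\hat{\mathbf{y}}). \tag{4}$$

This algorithm is an iterative process that runs until no more violated inequalities are found (i.e., until, for every training sample, the labeling $\hat{\mathbf{y}}_n$ returned by (4) satisfies its constraint in (3)). This loss-augmented inference is efficiently solved with graph cuts [11] if the function $\Delta(\cdot,\cdot)$ can be decomposed in the label space. A simple example that works with graph cuts is $\Delta(\mathbf{y}_n,\hat{\mathbf{y}}) = \sum_{i\in\Omega}\left(1 - \delta(\mathbf{y}_n(i) - \hat{\mathbf{y}}(i))\right)$, which represents the Hamming distance and can be decomposed in the label space, with $\delta(\cdot)$ denoting the Dirac delta function (this is the function used in this paper).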
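Because the Hamming loss is a sum over pixels, $-\Delta$ can be folded into the unary terms, so the loss-augmented problem (4) keeps the form of (2) and the same minimizer can be used for it. The sketch below checks this on a toy 2 × 3 image with one unary term and a Potts term (weights fixed to 1 and 0.3). Exhaustive search over all labelings stands in for graph cuts [11] and is only viable for tiny lattices; all names are hypothetical.

```python
import itertools

import numpy as np


def energy(y, unary, w_pair):
    """Toy instance of Eq. (2): one unary term (weight 1) plus a Potts term
    (Eq. (12)) with weight w_pair on a 4-connected grid. Labels are in
    {-1, +1}; unary[..., 0] and unary[..., 1] are the costs of -1 and +1."""
    idx = (y + 1) // 2
    rows, cols = np.indices(y.shape)
    cuts = (y[:, 1:] != y[:, :-1]).sum() + (y[1:, :] != y[:-1, :]).sum()
    return float(unary[rows, cols, idx].sum() + w_pair * cuts)


def hamming(y_true, y_hat):
    """Delta(y_n, y_hat) = sum_i (1 - delta(y_n(i) - y_hat(i)))."""
    return int((y_true != y_hat).sum())


def loss_augmented_unary(unary, y_true):
    """Fold -Delta into the unaries: at each pixel the label that disagrees
    with the ground truth has its cost lowered by 1, so minimizing
    E(y_hat) - Delta(y_true, y_hat) is again a problem of the form (2)."""
    aug = unary.copy()
    idx = (y_true + 1) // 2
    rows, cols = np.indices(y_true.shape)
    aug[rows, cols, 1 - idx] -= 1.0
    return aug


def brute_force_argmin(unary, w_pair, shape):
    """Exhaustive minimization over all 2^(H*W) labelings; a stand-in for
    graph cuts that is only feasible on toy lattices."""
    best_e, best_y = np.inf, None
    for labels in itertools.product((-1, 1), repeat=shape[0] * shape[1]):
        y = np.array(labels).reshape(shape)
        e = energy(y, unary, w_pair)
        if e < best_e:
            best_e, best_y = e, y
    return best_y, best_e


rng = np.random.default_rng(0)
shape, w_pair = (2, 3), 0.3
unary = rng.random(shape + (2,))
y_true = np.array([[1, 1, -1], [1, -1, -1]])

# Most violated constraint, Eq. (4), found via the loss-augmented unaries.
y_hat, e_hat = brute_force_argmin(loss_augmented_unary(unary, y_true), w_pair, shape)

# Margin of that constraint in (3); a cutting-plane step adds y_hat to the
# working set when this value is below -xi_n. It is never positive (up to
# rounding) because y_true itself is a candidate with zero loss.
margin = energy(y_hat, unary, w_pair) - energy(y_true, unary, w_pair) - hamming(y_true, y_hat)
print(y_hat, margin)
```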

### 2.3 Potential Functions

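One of the advantages of learning the model parameters $\theta$ in (2) is that we can define and use any number of potential functions between observed and hidden nodes and between hidden nodes. Specifically, we use three different types of potential functions between observed and hidden nodes.

The first type, $\psi^{(1,1)}(\mathbf{y}(i),\mathbf{x})$, represents a prior of the location, size and shape of the mass (see Fig. 1-(d)). This prior is the mean annotation estimated from the training set, as follows:

$$\psi^{(1,1)}(\mathbf{y}(i),\mathbf{x}) = -\log P_{prior}(\mathbf{y}(i)), \tag{6}$$

where $P_{prior}(\mathbf{y}(i) = 1) = \frac{1}{|\mathcal{D}|}\sum_{n=1}^{|\mathcal{D}|}\delta(\mathbf{y}_n(i) - 1)$ and $P_{prior}(\mathbf{y}(i) = -1) = 1 - P_{prior}(\mathbf{y}(i) = 1)$. The second potential function is represented by a generative model based on a Gaussian mixture model (GMM; see Fig. 1-(e)), as in

$$\psi^{(1,2)}(\mathbf{y}(i),\mathbf{x}) = -\log P(\mathbf{y}(i)|\mathbf{x}(i),\Theta_{GMM}), \quad P(\mathbf{y}(i)|\mathbf{x}(i),\Theta_{GMM}) = \frac{1}{Z_{GMM}}\,P(\mathbf{y}(i))\sum_{m=1}^{G}\pi_{\mathbf{y}(i),m}\,\mathcal{N}(\mathbf{x}(i);\mu_{\mathbf{y}(i),m},\sigma^2_{\mathbf{y}(i),m}), \tag{7}$$

where $\Theta_{GMM} = \{\mu_{c,m},\sigma^2_{c,m},\pi_{c,m}\}_{c\in\{-1,+1\},\,m=1,\ldots,G}$ denotes the parameters of the model (means $\mu$, variances $\sigma^2$ and weights $\pi$ of the $G$ components for each label), $\mathcal{N}(\cdot)$ is the Gaussian function, $Z_{GMM}$ is the normalizer, $\mathbf{x}(i)$ represents the pixel value at image lattice position $i$, and $P(\mathbf{y}(i))$ is the prior probability of label $\mathbf{y}(i)$. The model parameters in (7) are learned from the annotated training set using the expectation-maximization (EM) algorithm [12]. Finally, the third function between observed and hidden nodes, $\psi^{(1,3)}(\mathbf{y}(i),\mathbf{x})$, is based on the following free energy computed from a deep belief network (DBN) [13]:

$$\psi^{(1,3)}(\mathbf{y}(i),\mathbf{x}) = -\log P(\mathbf{y}(i)|\mathbf{x}_S(i),\Theta_{DBN}), \tag{8}$$

where $\mathbf{x}_S(i)$ represents a patch of size $S \times S$ pixels extracted around image lattice position $i$ (a patch is used instead of the whole image essentially to reduce the computational complexity of the training and inference procedures; see Fig. 1(f)-(g), where (f) uses $S = 3$ and (g) uses $S = 5$), and $\Theta_{DBN}$ represents the network weights and biases (hereafter, we drop the dependence on $\Theta_{DBN}$ for notational simplicity). The DBN is a generative model with a multi-layer architecture containing a large number of layers (typically more than three) and a large number of nodes per layer. The underlying DBN model with $Q$ layers is represented by

$$P(\mathbf{x}_S, y, \mathbf{h}_1,\ldots,\mathbf{h}_Q) = P(\mathbf{h}_Q,\mathbf{h}_{Q-1},y)\left(\prod_{q=1}^{Q-2}P(\mathbf{h}_q|\mathbf{h}_{q+1})\right)P(\mathbf{x}_S|\mathbf{h}_1), \tag{9}$$

where $y$ stands for the label $\mathbf{y}(i)$ of the patch centre and $\mathbf{h}_q \in \{0,1\}^{|\mathbf{h}_q|}$ denotes the hidden variables at layer $q$ containing $|\mathbf{h}_q|$ nodes. The first term in (9) can be written as:

$$P(\mathbf{h}_Q,\mathbf{h}_{Q-1},y) = \frac{1}{Z_Q}\exp\{\mathbf{b}_Q^\top\mathbf{h}_Q + \mathbf{b}_{Q-1}^\top\mathbf{h}_{Q-1} + b_y\,y + \mathbf{h}_{Q-1}^\top\mathbf{W}_Q\mathbf{h}_Q + y\,\mathbf{w}_y^\top\mathbf{h}_Q\}, \tag{10}$$

where $\mathbf{b}_q$, $\mathbf{W}_q$ (and $b_y$, $\mathbf{w}_y$ for the label) are the biases and weights of the network and $Z_Q$ is a normalizer, while the conditional probabilities in the remaining two terms can be factorized as $P(\mathbf{h}_q|\mathbf{h}_{q+1}) = \prod_{j}P(\mathbf{h}_q(j)|\mathbf{h}_{q+1})$ because the nodes in layer $q$ are independent from each other given $\mathbf{h}_{q+1}$, which is a consequence of the DBN structure (note that $P(\mathbf{x}_S|\mathbf{h}_1)$ is similarly defined). Finally, assuming that each node is activated by a sigmoid function $\sigma(\cdot)$, we have $P(\mathbf{h}_q(j) = 1|\mathbf{h}_{q+1}) = \sigma(\mathbf{b}_q(j) + \mathbf{W}_{q+1}(j,:)\,\mathbf{h}_{q+1})$. Then (8) is computed with:

$$P(y|\mathbf{x}_S) \propto \sum_{\mathbf{h}_1}\cdots\sum_{\mathbf{h}_Q} P(\mathbf{x}_S, y, \mathbf{h}_1,\ldots,\mathbf{h}_Q), \tag{11}$$

which can be estimated by the mean field approximation of the values in layers $\mathbf{h}_1$ to $\mathbf{h}_{Q-1}$, followed by the computation of the free energy on the top layer [13]. The training of the DBN involves the estimation of the parameters $\Theta_{DBN}$ in (8), which is achieved with an iterative layer-by-layer training of restricted Boltzmann machines using contrastive divergence [13].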
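The three unary potentials can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: labels are in $\{-1, +1\}$; the GMM potential uses one 1-D mixture per label with already-estimated parameters (EM fitting is omitted) and a hypothetical uniform label prior; the DBN weights are given (contrastive-divergence training is omitted) and random here; and the label is scored with the free energy of a top-layer RBM holding a scalar label unit, which is our reading of the DBN equations above. All function names, layer sizes and GMM parameters are hypothetical.

```python
import numpy as np

EPS = 1e-12


def prior_potential(train_masks):
    """Eq. (6): -log of the mean training annotation (masks in {-1, +1}).
    Returns an (H, W, 2) array: [..., 0] is the cost of -1, [..., 1] of +1."""
    p_mass = np.mean(np.asarray(train_masks) == 1, axis=0)
    p_mass = np.clip(p_mass, EPS, 1 - EPS)
    return np.stack([-np.log(1 - p_mass), -np.log(p_mass)], axis=-1)


def gmm_potential(img, gmms, label_prior=(0.5, 0.5)):
    """Eq. (7): -log P(y(i) | x(i)) from one 1-D GMM per label.
    gmms maps each label to (weights, means, variances), e.g. fitted with EM;
    label_prior = (P(-1), P(+1)) is a hypothetical uniform choice."""
    x = np.asarray(img, dtype=float)[..., None]            # (H, W, 1)
    joint = []
    for label, p_y in zip((-1, 1), label_prior):
        w, mu, var = (np.asarray(a, dtype=float) for a in gmms[label])
        dens = w * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        joint.append(p_y * dens.sum(axis=-1))              # p(x(i), y(i) = label)
    joint = np.stack(joint, axis=-1)
    post = joint / np.maximum(joint.sum(axis=-1, keepdims=True), EPS)  # 1 / Z_GMM
    return -np.log(np.maximum(post, EPS))


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def dbn_potential(patch, weights, biases, w_y, b_y):
    """Eqs. (8)-(11): [-log P(-1 | x_S), -log P(+1 | x_S)] for one patch.
    weights[q] connects layer q to layer q+1 (layer 0 is the flattened patch)
    and biases[q] is the bias of layer q+1. Mean field propagates the patch up
    to the layer below the top one; the top RBM, which also holds the scalar
    label unit y, is then scored with its free energy."""
    h = np.ravel(patch).astype(float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(b + h @ W)                             # mean-field activations
    W_top, b_top = weights[-1], biases[-1]
    neg_free = np.array([
        # -F(h, y), dropping the bias term of h, which does not depend on y
        b_y * y + np.logaddexp(0.0, b_top + h @ W_top + y * w_y).sum()
        for y in (-1, 1)
    ])
    return -(neg_free - np.logaddexp(neg_free[0], neg_free[1]))


# Usage with random (untrained) parameters.
rng = np.random.default_rng(0)
sizes = [9, 6, 4, 3]                                       # 3x3 patch, three hidden layers
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
dbn_costs = dbn_potential(rng.random((3, 3)), weights, biases,
                          w_y=0.1 * rng.standard_normal(3), b_y=0.0)
masks = rng.choice([-1, 1], size=(5, 4, 4))
gmms = {-1: ([0.6, 0.4], [0.2, 0.4], [0.01, 0.02]),
        1: ([1.0], [0.7], [0.02])}
# Unweighted sum for illustration; in (2) each term is scaled by its theta_{1,k}.
total_unary = prior_potential(masks) + gmm_potential(rng.random((4, 4)), gmms)
```

Each function returns costs in the same $(\ldots, 2)$ layout, so the outputs can be weighted and summed directly into the unary part of (2).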

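We use two different types of potential functions between label (hidden) nodes in (2), which encode label- and contrast-dependent labelling homogeneity. More precisely, the first potential function $\psi^{(2,1)}$ in (2) represents a label transition penalty [9], as follows:

$$\psi^{(2,1)}(\mathbf{y}(i),\mathbf{y}(j),\mathbf{x}) = 1 - \delta(\mathbf{y}(i) - \mathbf{y}(j)), \tag{12}$$

and the second function $\psi^{(2,2)}$ denotes a contrast penalty, as follows:

$$\psi^{(2,2)}(\mathbf{y}(i),\mathbf{y}(j),\mathbf{x}) = \left(1 - \delta(\mathbf{y}(i) - \mathbf{y}(j))\right) C(\mathbf{x}(i),\mathbf{x}(j)), \tag{13}$$

with $\mathbf{x}(i)$ representing the pixel value at position $i$, and $C(\mathbf{x}(i),\mathbf{x}(j)) = e^{-(\mathbf{x}(i) - \mathbf{x}(j))^2}$.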
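A minimal NumPy sketch of the two pairwise potentials on a 4-connected grid is shown below; each returns one weight per horizontal and vertical edge, paid only when the two endpoints take different labels. It assumes the contrast term $C(\mathbf{x}(i),\mathbf{x}(j)) = e^{-(\mathbf{x}(i)-\mathbf{x}(j))^2}$ applied to pre-processed intensities, so its behaviour depends on the intensity scale. Function names are hypothetical.

```python
import numpy as np


def potts_weights(shape):
    """Eq. (12): unit penalty on each horizontal and vertical edge of an
    (H, W) grid, paid when its two endpoints take different labels."""
    h, w = shape
    return np.ones((h, w - 1)), np.ones((h - 1, w))


def contrast_weights(img):
    """Eq. (13): the disagreement penalty scaled by
    C(x(i), x(j)) = exp(-(x(i) - x(j))**2), so cutting between similar pixels
    is expensive while cutting across a strong intensity edge is cheap."""
    img = np.asarray(img, dtype=float)
    w_h = np.exp(-(img[:, 1:] - img[:, :-1]) ** 2)
    w_v = np.exp(-(img[1:, :] - img[:-1, :]) ** 2)
    return w_h, w_v


def pairwise_cost(y, w_h, w_v):
    """Sum over 4-connected edges of (1 - delta(y(i) - y(j))) * w_ij."""
    y = np.asarray(y)
    return float(w_h[y[:, 1:] != y[:, :-1]].sum() + w_v[y[1:, :] != y[:-1, :]].sum())


img = np.array([[0.0, 0.1, 2.0],
                [0.0, 0.1, 2.0]])
y = np.array([[1, 1, -1],
              [1, 1, -1]])                                  # boundary on the intensity edge
print(pairwise_cost(y, *potts_weights(img.shape)))          # 2 cut edges -> 2.0
print(pairwise_cost(y, *contrast_weights(img)))             # 2 * exp(-1.9**2), about 0.054
```

Moving the boundary one column to the left, between the similar intensities 0.0 and 0.1, leaves the label transition penalty unchanged but makes the contrast penalty almost 2, which is how (13) pulls the segmentation towards intensity edges.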

## 3 Materials and Methods

The performance evaluation is carried out on two publicly available datasets: DDSM-BCRP [8] and INbreast [7]. DDSM-BCRP [8] is part of the DDSM database used to evaluate CAD algorithms and consists of four datasets, from which we use the two datasets focused on spiculated masses, with 9 cases (77 annotated images) for training and 40 cases (81 annotated images) for testing. However, it is important to acknowledge that the annotations provided with DDSM-BCRP are inaccurate [15, 7], so most of the literature uses subsets of DDSM with bespoke annotations that are not publicly available. The recently proposed INbreast database [7] has been developed to provide a high-quality, publicly available mammogram database containing accurate annotations. INbreast has a total of 56 cases containing 116 accurately annotated masses, which have been divided into mutually exclusive training and test sets of 58 images each. Note that some cases in DDSM-BCRP and INbreast contain multiple masses, and that each case presents the craniocaudal (CC) and mediolateral oblique (MLO) views.

We use the Dice index, $DI = \frac{2TP}{2TP + FP + FN}$, to quantitatively measure the segmentation accuracy. Here $TP$ denotes the number of mass pixels correctly segmented, $FP$ the background pixels falsely segmented as mass, $TN$ the correctly identified background pixels and $FN$ the mass pixels not identified. The running time is the average execution time per image of the segmentation algorithm in Alg. 1 on a standard computer (Intel(R) Core(TM) i5-2500K 3.30 GHz CPU with 8 GB RAM). The ROI is produced by a manual annotation of the mass centre and scale, where the size of the final ROI is two times the manually annotated scale. The ROI is then resized to 40 × 40 pixels using bicubic interpolation. We adopt the image pre-processing described by Ball and Bruce [14] (see Fig. 1(c)). This pre-processing step improves the contrast of the input image, which can increase the separation between mass and background samples, facilitating the training and segmentation tasks.
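The Dice index above can be computed from binary masks as in the following sketch (hypothetical function name; masks with labels in $\{-1, +1\}$ would first be converted with `y == 1`). $TN$ does not enter the formula. Returning 1.0 when both masks are empty is our own convention, since DI is undefined there.

```python
import numpy as np


def dice_index(pred, gt):
    """DI = 2TP / (2TP + FP + FN) for binary masks (True = mass)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)     # mass pixels correctly segmented
    fp = np.count_nonzero(pred & ~gt)    # background segmented as mass
    fn = np.count_nonzero(~pred & gt)    # mass pixels missed
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom   # both masks empty: our convention


pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_index(pred, gt))   # TP = 2, FP = 1, FN = 1 -> 4 / 6
```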

## 4 Results

Figure 2 shows the performance of several combinations of potential functions (Sec. 2.3) using the proposed model in (1) on INbreast (our results on DDSM-BCRP are similar to those on INbreast).
Specifically, Fig. 2 shows that the best performance on the test set is obtained using a combination of all potential functions.
Note that the Dice index of our methodology on the training set is similar to the one on the test set shown in Fig. 2, indicating good generalization capability.
Also, the best Dice index of our methodology on the test set is lower when we do not adopt the pre-processing described by Ball and Bruce [14], which indicates that this pre-processing is important for reaching the test-set Dice index of 0.88 reported in Tab. 1.
In Fig. 2, “Prior”, “GMM” and “DBN” represent the potential functions defined in (6), (7) and (8), respectively, with “DBN3x3” and “DBN5x5” denoting the DBN potential computed on 3 × 3 and 5 × 5 image patches; “Binary1” is the label transition penalty in (12), “Binary2” the pairwise contrast penalty in (13), and “Binary12” indicates the use of both “Binary1” and “Binary2”.
Finally, the running time of each method is given in brackets.

Tab. 1 shows the accuracy and running time results of our approach with potential functions DBN3x3 + DBN5x5 + GMM + Prior + Binary12 on the test sets of DDSM-BCRP and INbreast. The results from the other methods are as reported by Horsch et al. [15] or by their original authors. However, note that the majority of the results on DDSM cannot be compared directly because they have been obtained with training and test sets that are not publicly available and so cannot be reproduced (indicated in the “Rep.” column). Also, not all performance measures were reported (indicated by “?”).

| Method | Rep. | Images | Dataset | DI | Time |
|---|---|---|---|---|---|
| Proposed | yes | 158 | DDSM-BCRP | 0.87 | 0.8 s |
| Beller et al. [3] | yes | 158 | DDSM-BCRP | 0.70 | ? |
| Ball et al. [14] | no | 60 | DDSM | 0.85 | ? |
| Hao et al. [16] | no | 1095 | DDSM | 0.85 | 5.1 s |
| Rahmati et al. [2] | no | 100 | DDSM | 0.93 | ? |
| Song et al. [17] | no | 337 | DDSM | 0.83 | 0.96 s |
| Yuan et al. [18] | no | 483 | DDSM | 0.78 | 4.7 s |
| Proposed | yes | 116 | INbreast | 0.88 | 0.8 s |
| Cardoso et al. [19] | yes | 116 | INbreast | 0.88 | ? |

## 5 Discussion and Conclusion

Fig. 2 demonstrates that segmentation accuracy improves with the introduction of each potential function, at a relatively small computational cost. Also, our method shows good generalization ability given the small difference between the results on the training and test sets. It is interesting to note that the pre-processing stage provides a substantial increase in accuracy. Moreover, the “Prior” and “DBN” potential functions produce the best results among the potential functions in Sec. 2.3, but their integration in the model (1) is essential to produce the state-of-the-art results displayed in Fig. 2. Also notice that although the Dice index shown by the “Prior” potential function is relatively high, the shape produced by “Prior” is mostly circular, which means that the DBN and GMM potentials play important roles in segmenting the irregular boundaries of breast masses.

The comparison with the state of the art in Tab. 1 shows that our approach is computationally efficient, running in 0.8 seconds per image; to the best of our knowledge, this is the most efficient methodology reported for this problem. Our method shows the best result on DDSM-BCRP. Compared with methods evaluated on other subsets and annotations of DDSM (which are not publicly available), our method still appears competitive, with the second best overall result, behind [2]. However, because we do not have access to the annotations and images used in [2], it is impossible to reproduce their experiment, making a direct comparison difficult. Finally, on INbreast our method ties with the approach by Cardoso et al. [19], which is the current state of the art. The main limitation affecting our algorithm on both databases is the small size of the training set and the limited appearance and shape variations of the masses in this training set. These two issues induce the learning algorithm to put more weight on the potential function encoding the shape prior (“Prior”, see (6)) in (2), which results in large bias and small variance. By increasing the training sets, we can reduce the bias significantly without necessarily increasing the variance. This aspect is worth noting because it increases the potential of our approach to produce more accurate results if richer and larger training sets become available.

We have shown that structured and deep learning produce competitive results on breast mass segmentation in terms of accuracy and efficiency. We strongly recommend that other researchers interested in the problem of breast mass segmentation use the publicly available annotated databases DDSM-BCRP and INbreast. This will allow clearer comparisons between different methodologies, which can be used to determine the most effective approaches for this problem.

## References

- [1] A. Jemal, R. Siegel, E. Ward, Y. Hao, J. Xu, T. Murray, and M. Thun, “Cancer statistics, 2008,” CA: A Cancer Journal for Clinicians, vol. 58, no. 2, pp. 71–96, 2008.
- [2] P. Rahmati, A. Adler, and G. Hamarneh, “Mammography segmentation with maximum likelihood active contours,” Medical Image Analysis, vol. 16, no. 6, pp. 1167–1186, 2012.
- [3] M. Beller, R. Stotzka, T. Müller, and H. Gemmeke, “An example-based system to support the segmentation of stellate lesions,” in Bildverarbeitung für die Medizin 2005, Springer, 2005, pp. 475–479.
- [4] J. Elmore, S. Jackson, et al., “Variability in interpretive performance at screening mammography and radiologists’ characteristics associated with accuracy,” Radiology, vol. 253, no. 3, pp. 641–651, 2009.
- [5] J. Fenton, S. Taplin, et al., “Influence of computer-aided detection on performance of screening mammography,” New England Journal of Medicine, vol. 356, no. 14, pp. 1399–1409, 2007.
- [6] G. Carneiro, B. Georgescu, S. Good, and D. Comaniciu, “Detection and measurement of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree,” IEEE TMI, vol. 27, no. 9, pp. 1342–1355, 2008.
- [7] I. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. Cardoso, and J. Cardoso, “INbreast: toward a full-field digital mammographic database,” Academic Radiology, vol. 19, no. 2, pp. 236–248, 2012.
- [8] M. Heath, K. Bowyer, D. Kopans, R. Moore, and P. Kegelmeyer, “The digital database for screening mammography,” in Proceedings of the 5th International Workshop on Digital Mammography, 2000, pp. 212–218.
- [9] M. Szummer, P. Kohli, and D. Hoiem, “Learning CRFs using graph cuts,” in ECCV 2008, Springer, 2008, pp. 582–595.
- [10] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” Journal of Machine Learning Research, vol. 6, pp. 1453–1484, 2005.
- [11] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE TPAMI, vol. 23, no. 11, pp. 1222–1239, 2001.
- [12] A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
- [13] G. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
- [14] J. Ball and L. Bruce, “Digital mammographic computer aided diagnosis (CAD) using adaptive level set segmentation,” in Proceedings of the 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2007), IEEE, 2007, pp. 4973–4978.
- [15] A. Horsch, A. Hapfelmeier, and M. Elter, “Needs assessment for next generation computer-aided mammography reference image databases and evaluation studies,” International Journal of Computer Assisted Radiology and Surgery, vol. 6, no. 6, pp. 749–767, 2011.
- [16] X. Hao, Y. Shen, and S-R. Xia, “Automatic mass segmentation on mammograms combining random walks and active contour,” Journal of Zhejiang University SCIENCE C, vol. 13, no. 9, pp. 635–648, 2012.
- [17] E. Song, L. Jiang, R. Jin, L. Zhang, Y. Yuan, and Q. Li, “Breast mass segmentation in mammography using plane fitting and dynamic programming,” Academic Radiology, vol. 16, no. 7, pp. 826–835, 2009.
- [18] Y. Yuan, M. Giger, H. Li, K. Suzuki, and C. Sennett, “A dual-stage method for lesion segmentation on digital mammograms,” Medical Physics, vol. 34, p. 4180, 2007.
- [19] J. Cardoso, I. Domingues, and H. Oliveira, “Closed shortest path in the original coordinates with an application to breast cancer,” accepted for publication in International Journal of Pattern Recognition and Artificial Intelligence.