Breast Cancer Diagnosis with Transfer Learning and Global Pooling
Breast cancer is one of the most common causes of cancer-related death in women worldwide. Early and accurate diagnosis of breast cancer may significantly increase the survival rate of patients. In this study, we aim to develop a fully automatic, deep learning-based method that uses descriptor features extracted by Deep Convolutional Neural Network (DCNN) models and a pooling operation to classify hematoxylin and eosin (H&E) stained histological breast cancer images provided as part of the International Conference on Image Analysis and Recognition (ICIAR) 2018 Grand Challenge on BreAst Cancer Histology (BACH) Images. Different data augmentation methods are applied to optimize the DCNN performance. We also investigate the efficacy of different stain normalization methods as a pre-processing step. The proposed network architecture using a pre-trained Xception model yields 92.50% average classification accuracy.
Breast cancer is the most common form of cancer in women aged 20–59 years worldwide. According to data provided by the American Cancer Society, an estimated 268,600 new cases of invasive breast cancer and about 62,930 new cases of in situ breast cancer will be diagnosed in 2019, and nearly 41,760 women will die from breast cancer. Tumors can be subdivided into malignant (cancerous) and benign (non-cancerous) types, based on a variety of cell characteristics. Malignant tumors can be further categorized as in situ (remaining in place) or invasive. In situ carcinomas can form in the ducts or lobules of the breast and are not considered to be invasive but, if left untreated, could increase the risk of developing invasive breast cancer. Early detection of breast cancer is therefore important for increasing the survival rates of patients. The high morbidity and considerable healthcare cost associated with cancer have inspired researchers to develop more accurate models for cancer detection. Over the last five years, data mining and machine learning models have been used in a variety of research areas to dramatically improve our ability to discover emergent patterns within large datasets [33, 2, 31, 41, 14, 25]. Developing computer-aided diagnosis (CAD) systems, integrated with medical image computing and machine learning methods, has become one of the major research paradigms for life-critical diagnosis. CAD systems have been widely used in different fields, including mass detection, lung cancer screening, mammography and breast histopathology image analysis, and medical ultrasound analysis. Fig 1 shows some examples of breast histopathology images from the BACH dataset.
I-A Related studies
Rakhlin et al. proposed a deep learning-based method for the classification of H&E stained breast tissue images. For each image, 20 crops of 400×400 pixels and of 650×650 pixels were extracted. Then, pre-trained ResNet-50, InceptionV3 and VGG-16 networks were used as feature extractors. The extracted features were combined through 3-norm pooling into a single feature vector. A LightGBM classifier with 10-fold cross-validation was used to classify the extracted deep features. That method achieved an average accuracy of 87.2 ± 2.6% across all folds for classification of the breast cancer histology images.
In another study, Kwok employed four DCNN architectures, i.e., VGG19, InceptionV3, InceptionV4 and InceptionResnetV2, for both multi-class and binary classification of H&E stained histological breast cancer images. In that study, 5600 patches of size 1495×1495 pixels were extracted from the images with a stride of 99 pixels. Different data augmentation methods were also employed to improve the accuracy of the method. In Kwok's study, InceptionResnetV2 achieved the highest accuracy: 79% for multi-class and 91% for binary classification.
Vang et al. proposed an ensemble-based InceptionV3 architecture for multi-class breast cancer image classification. Their ensemble classifier combined majority voting, a gradient boosting machine (GBM), and logistic regression to obtain the final image-wise prediction. The Vahadane stain-normalization method was used to normalize the stained images and, with refinement, the method achieved 87.50% accuracy.
In another study, Sarmiento et al. proposed a machine learning approach using feature vectors built from image characteristics such as shape, color, texture and nuclei. A Support Vector Machine (SVM) classifier with a quadratic kernel, evaluated with 10-fold cross-validation, was used to classify the images and achieved an overall accuracy of 79.2%.
Finally, Nawaz et al. employed a fine-tuned AlexNet architecture for automatic breast cancer classification. Patches of 512×512 pixels were extracted from the training images, and the method achieved an overall accuracy of 75.73% on the patch-wise dataset and 81.25% on the image-wise dataset.
The rest of the paper is organized as follows. Motivation and contributions are explained in the next subsection. Section II provides a detailed description of the materials and the proposed approach. Section III presents the experimental results obtained with the proposed network architecture. Finally, Section IV concludes the paper and provides future directions.
I-B Motivations and contributions
Deep Convolutional Neural Network (DCNN) models have achieved promising results in various medical imaging applications [1, 17, 12, 22]. Moreover, it has been shown that data augmentation and stain normalization pre-processing steps help to obtain more robust and accurate performance [15, 32, 6]. These studies motivated us to explore the performance of different well-established DCNNs and to verify the effectiveness of pre-processing and data augmentation techniques for breast cancer assessment from histopathological images. The main contributions of this study are as follows:
We demonstrate a new strategy for extracting bottleneck features from breast histological images using modified well-established DCNN networks. To learn more discriminative feature maps, we combine features extracted from the convolutional layers that follow max pooling layers into a single feature vector, and then use the resulting features as input to train a multilayer perceptron with 256 hidden neurons that classifies the breast cancer histopathology images into four classes.
For further improvement of the classification performance, pre-processing steps using different stain-normalization methods are employed. These preprocessing steps help to reduce the color inconsistency and therefore lead to improved efficiency in learning high-level features.
The dataset provided for this study is very small. To increase the dataset size and improve the performance of our model, we utilized different data augmentation techniques such as horizontal and vertical flips, rotation, random contrast and random brightness.
In this section, we explain the proposed DCNN-based method for training and prediction on breast cancer images. In the first step, stained histological breast cancer tissue images are pre-processed using stain normalization techniques. In the second step, data augmentation is performed to address the limited size of the dataset and to optimize the performance of the DCNN models. In the third step, high-level features are extracted from the pre-processed images using the proposed network architecture built on well-established DCNN models. Next, the extracted features are used as input to a standard multilayer perceptron classifier. Finally, the performance of the proposed model is evaluated and reported on test images.
II-A Network architecture
Feature extraction using DCNN models has achieved promising results in extracting high-level features for different classification tasks [42, 21, 5]. Since fine-tuning of well-established DCNN architectures has not previously achieved good performance on this dataset, in this study we employ the DCNN descriptor approach [11, 10, 7] to extract features that sufficiently represent the discriminative characteristics of the different classes. In the proposed approach, features are extracted from the convolutional layer immediately after each max pooling layer and passed through a global average pooling layer. The pooled features are then fused into a single feature vector, which is fed into a multilayer perceptron classifier for prediction. Fig 2 illustrates the proposed architecture for breast cancer classification. For example, as shown in Fig 2, we extract features from layers 4, 7, 11 and 15 of the VGG16 architecture, apply global average pooling to the extracted features, and then fuse them to produce the final feature vector.
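The pooling and fusion step described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the function names `global_average_pool` and `fuse_descriptors` are hypothetical, and the random arrays stand in for intermediate activations whose spatial sizes and channel counts are illustrative rather than taken from the paper.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Average an (H, W, C) feature map over its spatial axes, giving a (C,) vector."""
    return feature_map.mean(axis=(0, 1))


def fuse_descriptors(feature_maps) -> np.ndarray:
    """Pool every intermediate feature map and concatenate the pooled vectors."""
    return np.concatenate([global_average_pool(fm) for fm in feature_maps])


# Hypothetical activations of four intermediate layers for a single image.
# Shapes are illustrative assumptions, not values reported in the paper.
rng = np.random.default_rng(0)
maps = [
    rng.random((64, 64, 64)),
    rng.random((32, 32, 128)),
    rng.random((16, 16, 256)),
    rng.random((8, 8, 512)),
]

descriptor = fuse_descriptors(maps)
print(descriptor.shape)  # one vector whose length is the sum of the channel counts
```

Because global average pooling removes the spatial dimensions, layers with different resolutions can be concatenated directly; the length of the fused descriptor depends only on the channel counts of the selected layers.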
II-B Data pre-processing
Standardization of the H&E stained images is an essential step before feeding the images to the deep networks. Therefore, in the first step, we stain-normalize all histopathological images to reduce color variation and obtain better color consistency. We investigate the effectiveness of two popular stain normalization techniques, proposed by Macenko et al. and by Reinhard et al. The original and stain-normalized images are shown in Fig 3.
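To make the idea behind Reinhard-style normalization concrete, the sketch below shows only its statistic-matching step: each channel of a source image is shifted and scaled so that its mean and standard deviation match those of a target image. Reinhard et al. perform this matching in a decorrelated color space (lαβ); the conversion to and from that space is omitted here, so the arrays are assumed to be already converted. The function name `match_channel_statistics` and the random inputs are hypothetical, and the Macenko method is not covered.

```python
import numpy as np


def match_channel_statistics(source: np.ndarray, target: np.ndarray,
                             eps: float = 1e-8) -> np.ndarray:
    """Match the per-channel mean and std of `source` to those of `target`.

    Both inputs are float arrays of shape (H, W, C), assumed to be already
    expressed in the color space in which the matching should take place.
    """
    src_mean = source.mean(axis=(0, 1))
    src_std = source.std(axis=(0, 1))
    tgt_mean = target.mean(axis=(0, 1))
    tgt_std = target.std(axis=(0, 1))
    # Standardize the source channels, then rescale to the target statistics.
    return (source - src_mean) / (src_std + eps) * tgt_std + tgt_mean


# Hypothetical stand-ins for a source slide and a reference (target) slide.
rng = np.random.default_rng(0)
source = rng.random((16, 16, 3)) * 2.0 + 1.0
target = rng.random((16, 16, 3))

normalized = match_channel_statistics(source, target)
```

After the call, every channel of `normalized` has approximately the same mean and standard deviation as the corresponding channel of `target`, which is what reduces slide-to-slide color variation.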
Before feeding images into the DCNN architectures, we also apply a second normalization step by subtracting the mean RGB value of the ImageNet dataset from all images of the training and test datasets. The ImageNet mean RGB value is a precomputed constant derived from the ImageNet database.
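A minimal sketch of this mean-subtraction step is given below. The constant used is the widely circulated ImageNet per-channel mean in RGB order for pixel values in [0, 255]; the paper does not state the exact values, so treating this constant as the one used is an assumption, as are the names `IMAGENET_MEAN_RGB` and `subtract_imagenet_mean`.

```python
import numpy as np

# Commonly used ImageNet per-channel mean (RGB order, pixel range 0-255).
# Assumption: the text does not list the exact constant.
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def subtract_imagenet_mean(images: np.ndarray) -> np.ndarray:
    """Center a batch of (N, H, W, 3) RGB images by the ImageNet channel means."""
    return images.astype(np.float32) - IMAGENET_MEAN_RGB


# Tiny synthetic batch: two 4x4 images with every pixel equal to 128.
batch = np.full((2, 4, 4, 3), 128, dtype=np.uint8)
centered = subtract_imagenet_mean(batch)
```

The same constant is applied to training and test images, so no statistic is estimated from the test set.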
II-C Data augmentation
The performance of DCNN predictive models may degrade when the training dataset is small. To address this, different data augmentation techniques such as horizontal and vertical flips, rotation, contrast adjustment and brightness correction are applied to enlarge the dataset and improve the classification performance. Some examples of in situ cases after the pre-processing and data augmentation steps are shown in Fig 4.
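The augmentation operations listed above can be illustrated with the following NumPy sketch. The flip probabilities, the restriction to rotations by multiples of 90 degrees, and the contrast and brightness ranges are all assumptions chosen for illustration; the text does not specify them, and the function name `augment` is hypothetical.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly augmented copy of a square (H, W, 3) uint8 image."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip
    # Rotation by a random multiple of 90 degrees (assumed angle set).
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    # Random contrast around the image mean, then a random brightness shift
    # (both ranges are illustrative assumptions).
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-20.0, 20.0)
    mean = out.mean()
    out = (out - mean) * contrast + mean + brightness
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3)).astype(np.uint8)  # synthetic image
augmented = augment(image, rng)
```

Calling `augment` several times on the same training image yields different variants, which is how a small training set can be enlarged; for non-square inputs a 90-degree rotation swaps height and width.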
II-D Pre-trained DCNN feature extractors
In this study, five DCNN architectures are employed as feature extractors, namely InceptionV3, InceptionResNetV2, Xception and two VGGNet models. Transfer learning is widely used across different tasks. In this method, the weights learned on a large dataset from a source task are reused when training a model for the target task. The main advantages of transfer learning are improved classifier accuracy and a faster learning process. Previous studies have demonstrated that transfer learning also has the potential to reduce overfitting. Although the source and target datasets are not the same, the low-level features learned from the source dataset are generic (e.g., edges, contours and curves) and similar to the low-level features of the target dataset.
III Experiment and Results
III-A Dataset description
The dataset used for this research is the publicly available ICIAR 2018 Grand Challenge on BreAst Cancer Histology (BACH) Images dataset. The goal of this challenge is to develop computer analysis systems that assist pathologists in accurate breast cancer assessment from histopathological images. The dataset consists of 400 H&E stained histological breast tissue images in four evenly distributed categories (100 images per class): benign, normal, in situ carcinoma and invasive carcinoma. All images are stored in tagged image file format (TIFF) with a magnification factor of 200× and a pixel size of 0.42 µm × 0.42 µm. All images have the same size of 2048 × 1536 pixels. We randomly divide the dataset into two parts: 300 images are used for training and 100 images for testing. To increase the size of the training dataset, we applied different data augmentation techniques. The class distributions of the dataset before and after data augmentation are presented in Table I.
Table I: Number of images for each class before and after data augmentation.
|Data|Benign|Normal|In situ|Invasive|Total|
|Original training data|75|75|75|75|300|
|Augmented training data|1155|1155|1155|1155|4620|
|Original test data|25|25|25|25|100|
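The 300/100 split with 75 training and 25 test images per class can be reproduced with a simple stratified random split. The sketch below is an assumption about the procedure, since the text only states that the dataset was divided randomly; the function name `stratified_split`, the seed and the label strings are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_per_class, seed=0):
    """Return (train_idx, test_idx) with `test_per_class` test items per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random order within the class
        test_idx.extend(indices[:test_per_class])
        train_idx.extend(indices[test_per_class:])
    return train_idx, test_idx


# 400 images, 100 per class, as in the dataset description.
classes = ["benign", "normal", "in situ", "invasive"]
labels = [c for c in classes for _ in range(100)]

train_idx, test_idx = stratified_split(labels, test_per_class=25)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps both subsets balanced, which matches the per-class counts reported in Table I.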
III-B Experimental Setup
Unlike the majority of previous studies, we do not extract patches for this experiment. All images are downsized to 512×512 pixels using bicubic interpolation and normalized by subtracting the mean image computed from the training set. The classifier is a fully connected layer with a ReLU activation function, followed by dropout with a rate of 0.5 to prevent overfitting. The β1, β2 and learning rate parameters of the Adam optimizer were set to 0.6, 0.8 and 0.001, respectively. Following earlier work, the weights of all DCNNs are initialized from weights trained on ImageNet. The batch size is set to 32, and all models are trained for 1000 epochs. Our experiment is implemented in Python using the Keras package with TensorFlow as the deep learning backend and run on an Nvidia GeForce GTX 1080 Ti GPU with 11 GB of RAM. For the proposed network architectures, descriptor features are extracted from the blocks listed in Table II for each pre-trained DCNN model.
Table II: Blocks used for descriptor feature extraction in each pre-trained DCNN model.
|Model|Blocks|
|InceptionResNetV2|11, 18, 275, 618|
|Xception|26, 36, 126|
|VGG16|4, 11, 15|
|VGG19|4, 7, 17|
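To show what the optimizer settings quoted in the experimental setup mean in practice, the sketch below performs a single Adam update on a scalar parameter. It assumes that the values 0.6 and 0.8 are Adam's exponential decay rates β1 and β2 and that 0.001 is the learning rate; the `eps` value, the function name `adam_step` and the toy gradient are illustrative assumptions.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.6, beta2=0.8, eps=1e-7):
    """Apply one Adam update to a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy example: one step from w = 1.0 with gradient 0.5.
w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=0.5, m=m, v=v, t=1)
print(w)  # the first step moves the parameter by roughly the learning rate
```

Compared with the common defaults of 0.9 and 0.999, smaller decay rates make the moment estimates follow recent gradients more closely.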
III-C Results and discussion
The proposed framework is trained on five DCNN architectures, i.e., InceptionV3, InceptionResNetV2, Xception and two VGGNet models. The results are compared across different existing stain-normalization techniques. We started our experiments by examining the effect of stain normalization on performance. The performance of all architectures is evaluated based on the overall prediction accuracy. The results of the plain architectures are summarized in Table III. As shown in this table, the Xception and InceptionV3 architectures give the best average classification accuracies, 88.50% and 84.50%, respectively.
Table IV shows the results of the proposed network architectures together with their average classification accuracies. The asterisk (*) indicates that a DCNN model is modified according to the proposed network architecture. As the results in Tables III and IV suggest, the Reinhard stain-normalization technique achieves better classification accuracy than the Macenko stain-normalization technique for most of the architectures. As shown in Table IV, the Xception* and InceptionV3* architectures give the best average classification accuracies, 92.50% and 90.00%, respectively. VGG19* and VGG16* have average accuracies of 82.00% and 85.00%, so the gap in favor of Xception* is 10.50 and 7.50 percentage points, respectively. The gap between Xception* and InceptionResNetV2* and InceptionV3* is 3.50 and 2.50 percentage points, respectively. Xception* therefore has the best average accuracy and VGG19* the worst among all models. Similar conclusions can be drawn for the other models. It can also be inferred from Table IV that the Reinhard stain-normalization method tends to improve overall accuracy by a margin of 3.00 percentage points compared to the Macenko stain-normalization method. As illustrated by the confusion matrices in Fig 5, the Xception* architecture proved most effective at classifying examples of the Normal and Invasive classes with Macenko stain normalization, and of the Benign and Invasive classes with Reinhard stain normalization.
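For readers who want to relate confusion matrices to the overall accuracy used in this section, the sketch below computes overall accuracy and per-class recall from a four-class confusion matrix. The matrix values are made up for illustration and are not results from this study; the function names `overall_accuracy` and `per_class_recall` are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of correctly classified samples: trace divided by total count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Recall per class, with rows as true classes and columns as predictions."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


# Made-up confusion matrix for 100 test images (25 per class).
# Row and column order is assumed: benign, normal, in situ, invasive.
cm = [
    [23, 1, 1, 0],
    [2, 22, 1, 0],
    [1, 1, 22, 1],
    [0, 0, 1, 24],
]

accuracy = overall_accuracy(cm)
recall = per_class_recall(cm)
```

Per-class recall makes visible which classes a model separates well, which overall accuracy alone cannot show.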
III-D Comparative analysis of accuracy with other methods
To evaluate the effectiveness of the proposed method, a comparison with the results of previously published work on the same dataset is presented in Table V. It can be observed from Table V that the methods of Vang et al., Rakhlin et al., Nawaz et al., Sarmiento et al. and Kwok give accuracies of 87.50%, 87.20%, 81.25%, 79.20% and 79.00%, respectively, whereas the network architecture used here gives an accuracy of 92.50%. These results indicate that the proposed method achieves higher accuracy than these previously reported methods.
In this paper, we proposed an effective deep learning-based method using a DCNN descriptor and a pooling operation for the classification of breast cancer. We also employed different data augmentation techniques to boost classification performance. The effect of different stain normalization methods was also investigated. Experimental results demonstrate that the proposed network architecture using a pre-trained Xception model outperforms all other DCNN architectures, with an average classification accuracy of 92.50%. For future work, we aim to further improve the classification accuracy by utilizing deep learning-based ensemble models and better stain normalization techniques.
[1] (2018) A fully integrated computer-aided diagnosis system for digital x-ray mammograms via deep learning detection, segmentation, and classification. International journal of medical informatics 117, pp. 44–54. Cited by: §I-B, §I.
[2] (2019) Identification of significant features and data mining techniques in predicting heart disease. 36, pp. 82–93. Cited by: §I.
[3] (2019) BACH: grand challenge on breast cancer histology images. 56, pp. 122–139. Cited by: §III-A.
[4] (2012) Automated malignancy detection in breast histopathological images. In Medical Imaging 2012: Computer-Aided Diagnosis, Vol. 8315, pp. 831515. Cited by: §III-B.
[5] (2019) EmbraceNet: a robust deep learning architecture for multimodal classification. 51, pp. 259–270. Cited by: §II-A.
[6] (2017) The importance of stain normalization in colorectal tissue classification with convolutional networks. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 160–163. Cited by: §I-B.
[7] (2015) Multi-level photo quality assessment with multi-view features. Neurocomputing 168, pp. 308–319. Cited by: §II-A.
[8] (2018) Automatic classification of tissue malignancy for breast carcinoma diagnosis. 96, pp. 41–51. Cited by: §I.
[9] (2017) Plant identification using deep neural networks via optimization of transfer learning parameters. 235, pp. 228–235. Cited by: §II-D.
[10] (2019) Probabilistic density-based estimation of the number of clusters using the dbscan-martingale process. Pattern Recognition Letters. Cited by: §II-A.
[11] (2018) A rapidly deployable classification system using visual data for the application of precision weed management. Computers and Electronics in Agriculture 148, pp. 107–120. Cited by: §II-A.
[12] (2018) Deep learning in mammography and breast histology, an overview and future trends. Medical image analysis 47, pp. 45–67. Cited by: §I-B, §I.
[13] ICIAR 2018 grand challenge. In: 15th International Conference on Image Analysis and Recognition. Cited by: §III-A.
[14] (2016) Pseudoinverse matrix decomposition based incremental extreme learning machine with growth of hidden nodes. 16 (2), pp. 125–130. Cited by: §I.
[15] (2019) A comparative study of deep learning architectures on melanoma detection. Tissue and Cell. Cited by: §I-B.
[16] (2019) A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recognition Letters 125, pp. 1–6. Cited by: §II-D.
[17] (2019) A collaborative computer aided diagnosis (c-cad) system with eye-tracking, sparse attentional model, and deep learning. Medical image analysis 51, pp. 101–115. Cited by: §I-B, §I.
[18] (2012) ImageNet Classification with Deep Convolutional Neural Networks. Cited by: §II-B.
[19] (2018) Multiclass classification of breast cancer in whole-slide images. In International Conference Image Analysis and Recognition, pp. 931–940. Cited by: §I-A, §III-D, TABLE V.
[20] (2018) A deeply supervised residual network for HEp-2 cell classification via cross-modal transfer learning. Pattern Recognition. Cited by: §II-D.
[21] (2019) Benign and malignant classification of mammogram images based on deep learning. 51, pp. 347–354. Cited by: §II-A.
[22] (2019) Deep learning in medical ultrasound analysis: a review. Engineering. Cited by: §I-B, §I.
[23] (2019) Pathological brain detection based on alexnet and transfer learning. Journal of Computational Science 30, pp. 41–47. Cited by: §II-D.
[24] (2009) A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1107–1110. Cited by: §II-B.
[25] (2019-06) Crop Lodging Prediction from UAV-Acquired Images of Wheat and Canola using a DCNN Augmented with Handcrafted Texture Features. Cited by: §I.
[26] (2017-04) Knowledge transfer for melanoma screening with deep learning. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 297–300. Cited by: §III-B.
[27] (2018) Classification of breast cancer histology images using alexnet. In International Conference Image Analysis and Recognition, pp. 869–876. Cited by: §I-A, §III-D, TABLE V.
[28] (2018) Two-stage convolutional neural network for breast cancer histology image classification. In Image Analysis and Recognition, A. Campilho, F. Karray, and B. ter Haar Romeny (Eds.), Cham, pp. 717–726. Cited by: §III-B.
[29] (2018) Deep convolutional neural networks for breast cancer histology image analysis. In International Conference Image Analysis and Recognition, pp. 737–744. Cited by: §I-A, §III-D, TABLE V.
[30] (2001) Color transfer between images. IEEE Computer graphics and applications 21 (5), pp. 34–41. Cited by: §II-B.
[31] (2019) A data mining-based method for revealing occupant behavior patterns in using mechanical ventilation systems of dutch dwellings. 193, pp. 99–110. Cited by: §I.
[32] (2019) Multi-grade brain tumor classification using deep cnn with extensive data augmentation. Journal of computational science 30, pp. 174–182. Cited by: §I-B.
[33] (2015) Introducing a hybrid model of DEA and data mining in evaluating efficiency. Case study: Bank Branches. 3 (2). Cited by: §I.
[34] (2018) Automatic breast cancer grading of histological images based on colour and texture descriptors. In International Conference Image Analysis and Recognition, pp. 887–894. Cited by: §I-A, §III-D, TABLE V.
[35] (2014) Dropout: a simple way to prevent neural networks from overfitting. 15 (1), pp. 1929–1958. Cited by: §III-B.
[36] (2018) Computational normalization of h&e-stained histological images: progress, challenges and future potential. Cited by: §I.
[37] U.S. Breast Cancer Statistics. Cited by: §I.
[38] (2015) Structure-preserved color normalization for histological images. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pp. 1012–1015. Cited by: §I-A.
[39] (2018) Deep learning framework for multi-class breast cancer histology image classification. In International Conference Image Analysis and Recognition, pp. 914–922. Cited by: §I-A, §III-D, TABLE V.
[40] (2017) Aggregating deep convolutional features for melanoma recognition in dermoscopy images. In Machine Learning in Medical Imaging, Q. Wang, Y. Shi, H. Suk, and K. Suzuki (Eds.), Cham, pp. 238–246. Cited by: §II-B.
[41] (2019) EEG signal analysis for epileptic seizures detection by applying data mining techniques. pp. 100048. Cited by: §I.
[42] (2019) Medical image classification using synergic deep learning. 54, pp. 10–19. Cited by: §II-A.