Classification of Histopathological Biopsy Images Using Ensemble of Deep Learning Networks

Sara Hosseinzadeh Kassani (ORCID 0000-0002-5776-7929; University of Saskatchewan, Saskatoon, Canada), Peyman Hosseinzadeh Kassani (Tulane University, New Orleans, USA), Michal J. Wesolowski (University of Saskatchewan, Saskatoon, Canada), Kevin A. Schneider (University of Saskatchewan, Saskatoon, Canada), and Ralph Deters (University of Saskatchewan, Saskatoon, Canada)

Breast cancer is one of the leading causes of death among women worldwide. Early diagnosis of this type of cancer is critical for treatment and patient care. Computer-aided detection (CAD) systems using convolutional neural networks (CNNs) could assist in the classification of abnormalities. In this study, we propose an ensemble deep learning-based approach for automatic binary classification of breast histology images. The proposed ensemble model adapts three pre-trained CNNs, namely VGG19, MobileNet, and DenseNet, for the feature representation and extraction steps. The extracted features are then fed into a multi-layer perceptron classifier to carry out the classification task. Various pre-processing and CNN tuning techniques, such as stain normalization, data augmentation, hyperparameter tuning, and fine-tuning, are used to train the model. The proposed method is validated on four publicly available benchmark datasets, i.e., ICIAR, BreakHis, PatchCamelyon, and Bioimaging. The proposed multi-model ensemble method obtains better predictions than single classifiers and machine learning algorithms, with accuracies of 98.13%, 95.00%, 94.64%, and 83.10% for the BreakHis, ICIAR, PatchCamelyon, and Bioimaging datasets, respectively.

Computer-aided diagnosis, Deep learning, Feature extraction, Multi-model ensemble, Transfer learning
journalyear: 2019; copyright: rightsretained; conference: CASCON'19, November 4-6, 2019, Toronto, ON, Canada; doi: 10.1145/3306307.3328180; isbn: 978-1-4503-6317-4/19/07; ccs: Computing methodologies / Artificial intelligence; Computing methodologies / Object recognition; Computing methodologies / Machine learning approaches; Computing methodologies / Supervised learning by classification

1. Introduction

Breast cancer has become one of the major causes of cancer-related death in women worldwide (Khan et al., 2019). According to World Health Organization reports (41), an estimated 627,000 women died from invasive breast cancer in 2018, approximately 15% of all cancer-related deaths among women, and breast cancer rates are increasing in nearly every country globally. It is evident that early detection and diagnosis play an essential role in effective treatment planning and patient care. Cancer screening using breast tissue biopsies aims to distinguish between benign and malignant lesions. However, manual assessment of large-scale histopathological images is a challenging task due to variations in appearance, heterogeneous structure, and textures (Li et al., 2019a). Such manual analysis is laborious and time-intensive, and often dependent on subjective human interpretation. For this reason, developing CAD systems is a possible solution for the classification of Hematoxylin-Eosin (H&E) stained histological breast cancer images. In recent years, deep learning has outperformed state-of-the-art methods in various fields of machine learning and medical image analysis, such as classification (Mardanisamani et al., 2019), detection (Herent et al., 2019), segmentation (Lateef and Ruichek, 2019), and computer-based diagnosis (Maier et al., 2019). The merit of deep learning compared to other types of learners is its ability to achieve performance similar to or better than that of humans. Feature extraction is a critical step, since classifier performance directly depends on the quality of the extracted low- and high-level features. Several feature fusion methods employing pre-trained CNN models have been proposed in the literature and effectively applied to medical imaging applications (Perdomo et al., 2019; Ma and Chu, 2019; Amin-Naji et al., 2019).
Motivated by the success of ensemble learning models in computer vision, we propose a novel multi-model ensemble method for binary classification of breast histopathological images. The experimental results on four publicly available datasets demonstrate that the proposed ensemble method generates more accurate cancer prediction than single classifiers and widely-used machine learning algorithms.

2. Related works

Developing CAD systems using digital image processing and deep learning algorithms can assist pathologists in achieving better diagnostic accuracy in less time. In (Vo et al., 2019), a combination of a CNN and a boosting-trees classifier was proposed for breast cancer detection on the BreakHis dataset. The proposed model employed the Inception-ResNet-v2 model for visual feature extraction from multi-scale images; a gradient-boosting-trees classifier was then used for the final classification step. In (Pratiher and Chattoraj, 2019), an ensemble of histological hashing and class-specific manifold learning was proposed for both binary and multi-class breast cancer detection on the BreakHis dataset. In (Roy et al., 2019), a patch-based CNN classifier with majority voting was used for breast cancer histopathology classification on the augmented ICIAR dataset; the proposed classifier predicts class labels for both binary and multi-class tasks. In (Gandomkar et al., 2018), a framework using a deep residual network was developed for H&E histopathological image classification. In (Han et al., 2017), a deep learning method based on the GoogLeNet architecture was used for the image classification task, with majority voting for patient-level classification. In (Bejnordi et al., 2017b), a context-aware stacked convolutional neural network architecture was used for classifying whole-slide images; the proposed method was trained on large input patches extracted from tissue structures. Finally, in (Spanhol et al., 2016), a deep learning method based on the AlexNet architecture was used to classify breast histopathological images as benign or malignant.

A number of visual characteristics, such as variations in acquisition devices, different stain-normalization protocols, variations in color, and heterogeneous textures in histopathological slide images, can affect the performance of deep CNNs (Li et al., 2019b). Hence, developing a robust automated analysis tool that can handle the heterogeneity of data collected from multiple sources is a major challenge. To address this challenge, we propose a novel three-path ensemble architecture for binary classification of breast histopathological images collected from different datasets. Figure 1 depicts some examples of histology images acquired from different datasets; the variability and similarity of the provided datasets can be observed in this figure.

Figure 1. Examples of variability in tissue patterns. Bioimaging 2015 (first row), BreakHis (second row), ICIAR 2018 (third row) and, PatchCamelyon dataset (fourth row).

The main contribution of this work is a generic method that does not need handcrafted features and can be easily adapted to different datasets, with the aim of reducing the generalization error and obtaining more accurate predictions. We compared the obtained results with traditional machine learning algorithms and with each selected CNN individually. Experimental results showed that the proposed method outperforms both the state-of-the-art architectures and the traditional machine learning algorithms on the provided datasets. The proposed model employs three well-established pre-trained CNNs, VGG19, MobileNet, and DenseNet, which together incorporate complementary components, i.e., standard convolutions, separable convolutions, depthwise convolutions, long skip connections, and short-cut connections. In doing so, we are able to overcome the data heterogeneity constraint and efficiently extract discriminative image features.

The rest of this paper is organized as follows. The proposed methodology for automatically classifying benign and malignant tissues is explained in Section 3. The datasets’ description, experimental settings, hyperparameter optimization and performance metrics are given in Section 4. A brief discussion and results analysis are provided in Section 5, and finally, the conclusion is presented in Section 6.

3. Methodology

3.1. Proposed Network architecture

Few studies have been published on the application of ensemble deep learning methods to breast histopathology images. Each of the adapted CNN architectures in the proposed model is constructed from different types of convolution layers in order to promote feature extraction and aggregation of fundamental information from a given input image. The block diagram of the proposed methodology is shown in Figure 2. As can be seen in this figure, the methodology is divided into six steps: collecting H&E microscopic breast cancer histology images, data pre-processing, data augmentation, feature extraction using the proposed network, classification, and finally model evaluation. We first improve the quality of the visual information in each input image using different pre-processing strategies. The training dataset is then enlarged with various data augmentation techniques. Once the input images are prepared, they are fed into the feature extraction phase of the proposed ensemble architecture. The extracted features from each architecture are flattened and concatenated to create the final multi-view feature vector, which is fed into a multi-layer perceptron to classify each image into its corresponding class. Finally, the performance of the proposed method is evaluated on test images using the trained model. We validated the performance of our proposed CNN architecture on four publicly available datasets, namely ICIAR, BreakHis, PatchCamelyon, and Bioimaging.

Figure 2. Block diagram of the proposed methodology.

3.2. Feature extraction using transfer learning

Considering the high visual complexity of histopathological images, proper feature extraction is essential because of its impact on the performance of the classifier. However, due to the privacy issue in the medical domain (Uchibeke et al., 2018), the provided datasets are not large enough to sufficiently train a CNN (Hu et al., 2018). Recently, blockchain technology has been foreseen as a solution in the area of healthcare for secure data ownership management of electronic medical data or medical IoT devices (Samaniego et al., 2018; Samaniego and Deters, 2019). Aiming to tackle this challenge, a transfer learning strategy has been widely investigated to exploit the knowledge learned from cross domains instead of training a model from scratch with randomly initialized weights. In this method, we transfer knowledge learned by a dataset into the new dataset in another domain. Using a transfer learning approach, the model can learn general features from a source dataset that do not exist in the current dataset. Transfer learning has advantages such as speeding up the convergence of the network, reducing the computational power, and optimizing the network performance (Lu et al., 2019).
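As a concrete illustration of this strategy, the following minimal Keras sketch reuses a VGG19 backbone pre-trained on ImageNet as a frozen feature extractor. The function name and arguments are illustrative, not taken from the paper's code.

```python
import tensorflow as tf

def build_feature_extractor(weights="imagenet"):
    """Reuse a VGG19 backbone as a fixed feature extractor (transfer learning)."""
    base = tf.keras.applications.VGG19(
        weights=weights,            # ImageNet weights carry over learned filters
        include_top=False,          # drop the 1000-class ImageNet head
        input_shape=(224, 224, 3),
        pooling="avg",              # global average pool -> 512-d feature vector
    )
    base.trainable = False          # freeze: reuse features, do not retrain them
    return base

# weights=None here only to skip the ImageNet download in this sketch;
# the transfer-learning setup described above uses weights="imagenet".
extractor = build_feature_extractor(weights=None)
# features = extractor.predict(images)  # (batch, 512) per input patch
```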

3.3. Three-path ensemble architecture for breast cancer classification

Three well-known architectures, VGG19 (Simonyan and Zisserman, 2014), MobileNetV2 (Howard et al., 2017), and DenseNet201 (Huang et al., 2017), are selected based on their (i) satisfactory performance in different computer vision tasks, (ii) usefulness for real-time (or near real-time) applications, and (iii) feasibility of transfer learning on limited datasets. Considering that each method has shortcomings with regard to variations in the shape and texture of the input image, and inspired by the work of (Moeskops et al., 2016), we propose a three-path ensemble prediction approach that makes use of the advantages of multiple classifiers to improve overall accuracy. We selected these networks based on the results of an exhaustive grid search over different state-of-the-art architectures (i.e., InceptionV3, InceptionResNetV2, Xception, ResNet50, MobileNetV2, DenseNet201, VGG19, and VGG16) with different combinations of hyperparameters, including optimizer, learning rate, weight initialization, batch size, and dropout rate, to obtain the best possible performance for breast cancer detection. Figure 3 illustrates the proposed ensemble architecture for breast cancer classification. As shown in Figure 3, the proposed architecture is constructed from three independent CNN architectures. The final fully connected layers of each CNN architecture are combined to produce the final feature vector. This combination allows capturing more informative features, making it possible to achieve more robust accuracy.
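The three-path design can be sketched in Keras roughly as follows. The MLP head (256 ReLU units, dropout 0.5, two outputs) follows the settings reported in the experiments section, while the function name and exact wiring are our own assumptions, not the authors' released code.

```python
import tensorflow as tf
from tensorflow.keras import applications, layers, models

def build_ensemble(input_shape=(224, 224, 3), weights=None):
    """Three-path ensemble: each backbone extracts its own feature vector,
    the vectors are concatenated, and an MLP head classifies the result."""
    inputs = layers.Input(shape=input_shape)
    backbones = [
        applications.VGG19(include_top=False, weights=weights,
                           input_shape=input_shape, pooling="avg"),      # 512-d
        applications.MobileNetV2(include_top=False, weights=weights,
                                 input_shape=input_shape, pooling="avg"),  # 1280-d
        applications.DenseNet201(include_top=False, weights=weights,
                                 input_shape=input_shape, pooling="avg"),  # 1920-d
    ]
    # Multi-view feature vector: concatenation of all three paths
    features = layers.Concatenate()([b(inputs) for b in backbones])
    x = layers.Dense(256, activation="relu")(features)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(2, activation="softmax")(x)  # benign vs malignant
    return models.Model(inputs, outputs)

model = build_ensemble()  # weights=None avoids ImageNet download in this sketch
```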

VGGNet (Simonyan and Zisserman, 2014) was introduced by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group (VGG) at the University of Oxford in 2014, and achieved one of the top performances in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014. The network stacks 3x3 convolutional layers on top of each other, alternated with max-pooling layers, followed by two fully-connected layers of 4096 nodes each and, finally, a softmax classifier.

The MobileNet (Howard et al., 2017) architecture is the second model used in this study. MobileNet, designed by Google researchers, is aimed mainly at mobile phones and embedded applications. The MobileNet architecture is built on depthwise separable convolutions: a depthwise convolution followed by a pointwise (1x1) convolution. In a standard convolution layer, each kernel is applied to all channels of the input, while a depthwise convolution is applied to each channel separately. This approach significantly reduces the number of parameters compared to standard convolutions of the same depth. MobileNet achieved inspiring performance over various applications with fewer parameters and lower computational cost.
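To make the parameter saving concrete, the following short calculation compares the weight counts of a standard convolution and a depthwise separable convolution for one illustrative layer; the channel sizes are generic, not taken from MobileNet itself.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1x1 pointwise
    convolution that mixes channels."""
    depthwise = k * k * c_in        # one spatial filter per channel
    pointwise = c_in * c_out        # 1x1 convolution across channels
    return depthwise + pointwise

# Example layer: 3x3 kernels, 128 input channels, 256 output channels
standard = conv_params(3, 128, 256)             # 294,912 weights
separable = separable_conv_params(3, 128, 256)  # 33,920 weights
print(f"reduction: {standard / separable:.1f}x")  # -> reduction: 8.7x
```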

As our third feature extractor, we employed the DenseNet (Huang et al., 2017) architecture. DenseNet, which stands for Densely-Connected Convolutional Networks, was proposed by Huang et al. (Huang et al., 2017). DenseNet introduces the dense block, a sequence of convolutional layers in which every layer has a direct connection to all subsequent layers. This structure alleviates the vanishing-gradient problem and improves feature propagation by using very short connections between input and output layers throughout the network.
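A dense block can be sketched in Keras as follows. This is a simplified version that omits the 1x1 bottleneck layers DenseNet201 uses; it is intended only to show the dense connectivity pattern.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """DenseNet-style block: each layer's output is concatenated with all
    previous feature maps, so every layer sees the features of all its
    predecessors (the short connections that ease gradient flow)."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])   # dense connectivity
    return x

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = dense_block(inputs)              # channels grow: 64 + 4 * 32 = 192
block_model = tf.keras.Model(inputs, outputs)
```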

Figure 3. The proposed ensemble network with a three-path CNN of VGGNet, MobileNet and DenseNet.

4. Experiments

4.1. Datasets description

Four benchmark datasets are used for evaluating the performance of the proposed model. The BreakHis (Spanhol et al., 2016) dataset consists of 7,909 H&E-stained microscopic images collected from 82 anonymous patients, divided into benign and malignant tumor biopsies. Small patches were extracted at four magnification factors: 40x, 100x, 200x, and 400x. The benign tumors are classified into four subclasses, adenosis (A), tubular adenoma (TA), phyllodes tumor (PT), and fibroadenoma (F), and the malignant tumors into four subclasses, ductal carcinoma (DC), mucinous carcinoma (MC), lobular carcinoma (LC), and papillary carcinoma (PC).

A modified version of the PatchCamelyon (PCam) benchmark dataset (Veeling et al., 2018; Bejnordi et al., 2017a), publicly available at (15), consisting of benign and malignant breast tumor biopsies, is also used to evaluate the proposed classification model. The dataset consists of 327,680 microscopy images of 96x96-pixel patches extracted from whole-slide images, each with a binary label indicating the presence of metastatic tissue. We used the modified version of this database since the original PatchCamelyon database contained duplicated images.

Additionally, two other datasets, the Bioimaging 2015 (7) challenge dataset and the ICIAR 2018 (Aresta et al., 2019) dataset, are used in this work. The ICIAR 2018 dataset, available as part of the BACH challenge, is an extended version of the Bioimaging 2015 dataset. Both datasets consist of 24-bit RGB H&E-stained breast histology images extracted from whole-slide biopsies, with a pixel size of 0.42 µm × 0.42 µm, acquired at 200× magnification. Each image is classified into one of four classes: normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma. The Bioimaging dataset contains 249 microscopy training images and 36 microscopy testing images in total, equally distributed among the four classes. The ICIAR dataset contains 100 images in each category, i.e., a total of 400 training images. To create a binary database from these two datasets, we grouped the normal and benign classes into the benign category and the in situ and invasive classes into the malignant category.
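This grouping step can be expressed as a simple label mapping; the class-name strings below are hypothetical placeholders for however the dataset folders are actually named.

```python
# Map the four ICIAR / Bioimaging classes onto the binary labels used here:
# normal and benign -> benign (0); in situ and invasive carcinoma -> malignant (1)
BINARY_LABEL = {
    "normal": 0,
    "benign": 0,
    "in_situ": 1,
    "invasive": 1,
}

def to_binary(class_names):
    """Convert a list of four-class labels to benign/malignant labels."""
    return [BINARY_LABEL[name] for name in class_names]

print(to_binary(["normal", "invasive", "benign", "in_situ"]))  # -> [0, 1, 0, 1]
```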

4.2. Data preparation and pre-processing techniques

We adopted different data preparation techniques such as data augmentation, stain-normalization and image normalization strategies to optimize the training process. In the following, we briefly explain each of them.

4.2.1. Data augmentation

Due to the limited size of the input samples, training a CNN is prone to over-fitting, leading to a low detection rate (Li et al., 2019c). One solution to alleviate this issue is data augmentation, which aims to generate more training data from the existing training set (Kassani and Kassani, 2019). Different data augmentation techniques, such as horizontal flipping, rotation, and zooming, are applied to the datasets to create more training samples. The data augmentation parameters used for all datasets are presented in Table 1. Examples of histopathological images after augmentation are shown in Figure 4.

Figure 4. Images obtained after data augmentation. The left image is the original image and the right images are artificially generated images produced by different data augmentation methods.
Parameter Value
Horizontal Flip True
Vertical Flip True
Contrast Enhancement True
Zoom Range 0.2
Shear Range 0.2
Rotational Range 90
Fill Mode Nearest
Table 1. Data augmentation parameters.
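The Table 1 parameters map closely onto Keras' ImageDataGenerator, as sketched below. Note that contrast enhancement is not a built-in ImageDataGenerator option and would have to be supplied through its `preprocessing_function` hook (omitted here), so this mapping is an assumption rather than the authors' exact pipeline.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Table 1 parameters expressed with Keras' ImageDataGenerator. Contrast
# enhancement would go through `preprocessing_function` (not shown).
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.2,
    shear_range=0.2,
    rotation_range=90,
    fill_mode="nearest",
)

# Draw one augmented batch from a dummy image to illustrate usage.
dummy = np.random.rand(1, 224, 224, 3).astype("float32")
batch = next(augmenter.flow(dummy, batch_size=1, seed=42))
print(batch.shape)  # -> (1, 224, 224, 3)
```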

4.2.2. Stain-normalization

Tissue slices are stained with Haematoxylin and Eosin (H&E) to differentiate nuclei, stained purple, from other tissue structures, stained pink and red, helping pathologists analyze nuclear shape, density, variability, and overall tissue structure. However, H&E staining variability between acquired images exists due to different staining protocols, scanners, and raw materials, which is a common problem in histological image analysis. Therefore, stain normalization of H&E-stained histology slides is a necessary step to reduce color variation and obtain better color consistency before feeding input images into the proposed architecture. Different approaches have been proposed for stain normalization in histological images, including those of Macenko et al. (Macenko et al., 2009), Reinhard et al. (Reinhard et al., 2001), and Vahadane et al. (Vahadane et al., 2015). For this experiment, the Macenko et al. (Macenko et al., 2009) approach is applied to standardize the color intensity of the tissue, due to its promising performance in many studies (Xu et al., 2018; Roy et al., 2019; Albarqouni et al., 2016; Saraswat and Arya, 2014). The Macenko method is based on singular value decomposition (SVD). In this method, a logarithmic function (Macenko et al., 2009) is used to adaptively transform the color concentration of the original histopathological image into its optical density (OD) image, as given in Equation 1.


\[ \mathrm{OD} = -\log_{10}\!\left(\frac{I}{I_0}\right) \tag{1} \]

where OD is the matrix of optical density values, $I$ is the image intensity in RGB space, and $I_0$ is the illuminating intensity incident on the histological sample.
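Equation 1 is straightforward to implement. The sketch below uses NumPy, with I0 = 255 for 8-bit images and a +1 offset to keep the logarithm finite at zero intensity; both are common implementation choices, not details specified in the paper.

```python
import numpy as np

def rgb_to_od(image, i0=255.0):
    """Transform RGB intensities to optical density (Equation 1):
    OD = -log10(I / I0). The +1 offset keeps log10 finite at I = 0
    (a common implementation detail, not specified in the paper)."""
    image = image.astype(np.float64)
    return -np.log10((image + 1.0) / i0)

patch = np.array([[[254.0, 128.0, 64.0]]])  # a single RGB pixel
od = rgb_to_od(patch)
# Bright pixels (I close to I0) have OD near 0; darker, denser stain -> larger OD
```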

4.2.3. Image normalization

Another necessary pre-processing step is intensity normalization. The primary purpose of image normalization (Yu et al., 2017) is to bring every input image into the same range of values before it is fed to the CNN model, which also helps speed up the convergence of the model. Input images are normalized to the intensity range [0, 1] by min-max normalization, computed as:


\[ X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{2} \]

where $X$ is the pixel intensity and $X_{\min}$ and $X_{\max}$ are the minimum and maximum intensity values of the input image.
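A minimal NumPy implementation of Equation 2 might look like this:

```python
import numpy as np

def min_max_normalize(image):
    """Rescale pixel intensities to [0, 1] (Equation 2):
    X_norm = (X - X_min) / (X_max - X_min)."""
    x = image.astype(np.float64)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

patch = np.array([[0, 64], [128, 255]], dtype=np.uint8)
normalized = min_max_normalize(patch)
print(normalized.min(), normalized.max())  # -> 0.0 1.0
```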

4.3. Experimental settings

All images were resized to 224x224 pixels using bicubic interpolation, according to the input size of the selected pre-trained models. The batch size was set to 32 and all models were trained for 1000 epochs. A fully connected layer with 256 hidden neurons and a rectified linear unit (ReLU) activation function was followed by a dropout layer with a probability of 0.5; the dropout layer further reduces over-fitting by randomly eliminating neurons' contributions during training. For the Adam optimizer, β₁, β₂, and the learning rate were set to 0.6, 0.8, and 0.0001, respectively. For fine-tuning, we modified the last dense layer in all architectures to output two classes, corresponding to benign and malignant lesions, instead of the 1000 classes proposed for ImageNet. All pre-trained deep CNN models were fine-tuned separately, with network weights initialized from weights trained on ImageNet. The operating system is Windows, with an Intel(R) Core(TM) i7-8700K 3.7 GHz processor and 32 GB RAM. The training and testing of the proposed architecture were implemented in Python using the Keras package with TensorFlow as the deep learning backend, and run on an Nvidia GeForce GTX 1080 Ti GPU with 11 GB RAM.
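The head and optimizer settings from this section can be reproduced in Keras roughly as follows. The GlobalAveragePooling2D layer is only a stand-in for a fine-tuned backbone, so this is a sketch of the reported settings, not the full model.

```python
import tensorflow as tf

# Optimizer configuration as reported: Adam with beta_1=0.6, beta_2=0.8,
# learning rate 0.0001.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4,
    beta_1=0.6,
    beta_2=0.8,
)

head = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),   # placeholder for a backbone
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),               # over-fitting control
    tf.keras.layers.Dense(2, activation="softmax"),  # benign / malignant
])
head.compile(optimizer=optimizer, loss="categorical_crossentropy",
             metrics=["accuracy"])
```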

4.4. Evaluation criteria

The performance of the proposed classification model was evaluated based on recall, precision, F1-score, and accuracy. Given the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), the measures are mathematically expressed as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \qquad \text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
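These measures are simple to compute from confusion-matrix counts; the sketch below uses toy counts, not values from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the four evaluation measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy confusion counts (illustrative only):
m = classification_metrics(tp=90, fp=5, tn=85, fn=10)
```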
5. Discussion

In this research, we focused on binary classification of histopathological images using a three-path ensemble architecture with transfer learning and fine-tuning. To verify the effectiveness of the presented methodology, different comparative analyses were conducted. First, we compare the results of the proposed ensemble model on the four provided datasets. Then, a comparison between the proposed ensemble architecture and the individual CNN classifiers is provided, and finally we compare the proposed ensemble architecture with machine learning algorithms. In Table 2 and Figure 5, the accuracy, precision, recall, and F-score of the proposed approach on each benchmark dataset are reported. The proposed method achieved the highest accuracy, precision, recall, and F-score on the BreakHis dataset, with values of 98.13%, 98.75%, 98.54%, and 98.64%, respectively.

Accuracy Precision Recall F-score
BreakHis 98.13% 98.75% 98.54% 98.64%
PatchCamelyon* 94.64% 95.70% 95.27% 95.50%
ICIAR 95.00% 95.91% 94.00% 94.94%
Bioimaging 83.10% 92.60% 71.42% 80.64%
Table 2. Results of accuracy, precision, recall, and F-score of the proposed method on four open access datasets.

On the other hand, the results also show that the detection rate is worst on the Bioimaging dataset, with 83.10% accuracy, 92.60% precision, 71.42% recall, and 80.64% F-score. Table 3 and Figure 6 present the performance of the single classifiers on the four datasets. Analyzing them, we observe that the maximum accuracies of 97.42%, 96.41%, and 92.40% on the BreakHis dataset are produced by the DenseNet201, VGG19, and MobileNetV2 models, respectively.

VGG19 MobileNetV2 DenseNet201
BreakHis 96.41% 92.40% 97.42%
PatchCamelyon* 90.84% 89.09% 87.84%
ICIAR 90.00% 92.00% 85.00%
Bioimaging 81.69% 78.87% 80.28%
Table 3. Results of accuracies obtained by single classifiers on four open access datasets.
Figure 5. Results of accuracy, precision, recall, and F-score of the proposed method on four open access datasets
Figure 6. Classification accuracy of single classifiers of VGG19, MobileNetV2, DenseNet201
BreakHis PatchCamelyon* ICIAR Bioimaging
InceptionV3 87.66% 87.52% 83.00% 85.00%
Xception 86.37% 88.05% 83.00% 78.77%
ResNet50 79.48% 79.06% 80.00% 63.38%
InceptionResNetV2 92.40% 89.93% 89.00% 76.06%
VGG16 93.54% 88.39% 89.00% 83.10%
Table 4. Classification results of different state-of-the-art CNN classifiers on four datasets.
Method Dataset Accuracy
Roy et al. (Roy et al., 2019) ICIAR 92.50%
Vo et al. (Vo et al., 2019) BreakHis 96.30%
Pratiher et al. (Pratiher and Chattoraj, 2019) BreakHis 98.70%
Spanhol et al. (Spanhol et al., 2016) BreakHis 84.60%
Han et al. (Han et al., 2017) BreakHis 96.90%
Gandomkar et al. (Gandomkar et al., 2018) BreakHis 97.90%
Brancati et al. (Brancati et al., 2018) Bioimaging 88.90%
Arujo et al. (Araújo et al., 2017) Bioimaging 83.30%
Vo et al. (Vo et al., 2019) Bioimaging 99.50%
Table 5. Comparative analysis with presented methods in the literature.
BreakHis PatchCamelyon* ICIAR Bioimaging
Decision Tree 91.67% 76.24% 77.00% 71.83%
Random Forest 92.10% 82.54% 85.00% 69.01%
XGBoost 94.11% 87.15% 89.00% 78.87%
AdaBoost 91.82% 76.49% 79.00% 63.38%
Bagging 94.97% 88.05% 87.00% 81.69%
Table 6. Comparison of classification accuracies obtained by different machine learning models.

The classification results of different well-established CNN architectures, including InceptionV3, Xception, ResNet50, InceptionResNetV2, and VGG16, are summarized in Table 4. Analyzing Table 4, we observe a level of variation across all datasets. The results confirm that the proposed architecture delivered higher accuracy than these classifiers on all of the datasets, except for the InceptionV3 architecture on the Bioimaging dataset, where InceptionV3 obtained 85.00% accuracy, 1.9% higher than the 83.10% accuracy obtained by the proposed architecture.

For the sake of comparison, the performance of the proposed ensemble model is compared with previously published results for binary classification of breast cancer in Table 5. Referring to Table 5, on the BreakHis dataset, our proposed approach (98.13% accuracy) achieved better performance than the methods in (Spanhol et al., 2016; Vo et al., 2019; Han et al., 2017), with accuracies of 84.6%, 96.3%, and 96.9%, respectively. However, the result reported in the study of (Pratiher and Chattoraj, 2019), with 98.7% accuracy, exceeded our proposed method's 98.13%, an accuracy gap of 0.57%. On binary classification of the ICIAR dataset, the study in (Roy et al., 2019) achieved 92.5%, while the proposed method achieved 95.00%. On binary classification of the Bioimaging dataset, the proposed model obtained poorer results compared with the studies of (Vo et al., 2019; Brancati et al., 2018) and only outperformed the study in (Araújo et al., 2017), with a slightly higher performance and an accuracy gap of 0.7%. Finally, for the PatchCamelyon* dataset, no results have been reported in the literature yet.

To validate the performance of the proposed model, we also compare it with five machine learning models, namely Decision Tree, Random Forest, XGBoost, AdaBoost, and Bagging Classifier; Table 6 summarizes their performance. As given in this table, the best of these results was obtained by the Bagging classifier, with 94.97% accuracy on the BreakHis dataset. Random Forest produced 69.01% accuracy on the Bioimaging dataset, the worst accuracy achieved in the classification of benign and malignant cases.
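Baselines of this kind can be reproduced with scikit-learn as sketched below. Synthetic feature vectors stand in for the CNN-extracted features, and XGBoost is omitted since it lives in a separate library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic feature vectors stand in for the ensemble's extracted features;
# a real run would use those features and the datasets' binary labels.
X, y = make_classification(n_samples=500, n_features=64, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baselines = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}
scores = {}
for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)  # accuracy on held-out features
    print(f"{name}: {scores[name]:.3f}")
```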

Our proposed model achieved 95.00% overall accuracy on the ICIAR dataset, the highest result reported in the literature for binary classification of this dataset, with accuracy gaps of 5.00% over VGG19, 3.00% over MobileNetV2, and 10.00% over DenseNet201. On the same dataset, the proposed model also outperforms the other machine learning models, by 18.00% for Decision Tree, 10.00% for Random Forest, 6.00% for XGBoost, 16.00% for AdaBoost, and 8.00% for Bagging Classifier. The largest gap is observed on the Bioimaging dataset between the proposed model and the AdaBoost classifier, where the difference is more than 19.00%. The second most significant gap is on the modified PatchCamelyon dataset between the proposed model and the Decision Tree classifier, where the difference is 18.40%. The smallest gap is seen on the BreakHis dataset between the proposed model and the DenseNet201 architecture, where the difference is less than 1.00%. Similar conclusions can be drawn for the other models. The experimental results indicate that the proposed ensemble method yields satisfactory results and outperforms both the state-of-the-art CNNs and the machine learning algorithms in cancer classification on four publicly available benchmark datasets, with a large gap in accuracy. The proposed method is generic, as it does not need handcrafted features, requires minimal pre-processing, and can be easily adapted to different detection tasks. These datasets were collected from multiple sources with different shapes, textures, and morphological characteristics. The transfer learning strategy successfully transferred knowledge from the source to the target domain despite the limited size of the ICIAR and Bioimaging databases. Throughout the experiments, we observed no over-fitting that adversely impacted classification accuracy.

The performance of all of the single classifiers and the proposed ensemble model was poor on the Bioimaging dataset. For this dataset, benign cases are confused with malignant cases, since the morphology of some benign classes is similar to that of malignant samples. Intuitively, the main reason is that the Bioimaging dataset is not large enough for deep learning models to capture high-level features and distinguish the classes from each other. Although data augmentation strategies were employed to tackle this problem, it would be more appropriate to collect more training data rather than artificially increase the size of the dataset through augmentation. In addition, employing pre-trained models requires input images to be resized to a fixed dimension, which may discard discriminating information from this dataset.

6. Conclusion

This paper presents an ensemble-based deep learning approach for computer-aided diagnosis of breast cancer. Three well-established CNN architectures, namely VGG19, MobileNetV2, and DenseNet201, are ensembled for feature representation and extraction using their different components. The combination of such varied features leads to better generalization performance than that of single classifiers. The experimental results showed that the proposed model not only outperformed the individual CNN classifiers but also outperformed state-of-the-art machine learning algorithms on all the test sets of the provided datasets. The highest and lowest performances were obtained on the BreakHis and Bioimaging datasets, respectively. Thus, the deep learning-based multi-model ensemble method can make full use of local and global features at different levels and improve the prediction performance of the base architectures across different datasets. This research is a foundation for our future work on the integration of deep learning and blockchain technology.

