A Hybrid Deep Learning Architecture for Leukemic B-lymphoblast Classification
Automatic detection of leukemic B-lymphoblast cancer in microscopic images is very challenging due to the complicated nature of histopathological structures. To tackle this issue, an automatic and robust diagnostic system is required for early detection and treatment. In this paper, an automated deep learning-based method is proposed to distinguish between immature leukemic blasts and normal cells. The proposed hybrid method, which is enriched by different data augmentation techniques, is able to extract high-level features from input images. Results demonstrate that the proposed model yields better predictions than the individual models for leukemic B-lymphoblast classification, with 96.17% overall accuracy, 95.17% sensitivity and 98.58% specificity. By fusing the features extracted from intermediate layers, our approach has the potential to improve the overall classification performance.
Leukemia is a type of cancer associated with white blood cells that originates in the bone marrow and affects both children and adults. Leukemia can be divided into acute and chronic categories based on how quickly it progresses. There are four types of leukemia: Acute Myelogenous Leukemia (AML), Acute Lymphoblastic Leukemia (ALL), Chronic Myeloid Leukemia (CML), and Chronic Lymphocytic Leukemia (CLL). The most common types of leukemia that affect young children are AML and ALL. In ALL, lymphocytes, a type of white blood cell (WBC), do not function properly and reproduce out of control, leading to anemia. This can lead to premature death if the disease is diagnosed in later stages or if treatment is delayed. Subject age is an important risk factor affecting prognosis, since the risk of developing ALL is highest in children below the age of 7-8 years. The risk then decreases until the mid-20s and begins to increase again after age 50. According to published statistics for 2018, about 5930 new cases of ALL were expected to be diagnosed and about 1500 patients were expected to die of ALL in the United States, including both children and adults. The risk of getting ALL is slightly higher in males than in females, and higher in whites than in African-Americans. However, leukemia diagnosed in its early stages is highly curable, which increases the survival rate of patients. Given the large scale of histopathology images, conventional assessment can be laborious, error-prone and hugely time-consuming, since some images are highly variable in morphology and therefore difficult to analyze. Developing accurate and reliable approaches for leukemia detection is thus important for early treatment. Numerous studies have shown that, with the advancement of computational capabilities, hidden trends, patterns and relationships can be discovered by applying data mining approaches in many different areas [24, 21, 28, 14].
Fig 1 illustrates examples of ALL and healthy cells.
The details of our approach are shown in Fig 2 and are described in the subsequent sections. Briefly, we present an automatic leukemic B-lymphoblast classification system that uses a hybrid of two Convolutional Neural Networks (CNNs) and transfer learning to extract features from each input image. Unlike previous approaches, which use deep features extracted from entire pre-trained architectures, our approach fuses features from specific abstraction layers; these act as auxiliary features and lead to further improvement of the classification accuracy. Features extracted from the lower levels are combined into higher-dimensional feature maps, which helps improve the discriminative capability of the intermediate features and mitigates the problem of vanishing or exploding gradients.
II Related Studies
Several methods for automated leukemia detection on microscopic images have been reported in the literature over the years. Singhal et al. used a support vector machine (SVM) classifier for automatic detection of Acute Lymphoblastic Leukemia based on geometric features and local binary pattern (LBP) texture features. Experimental results showed that the LBP texture features (89.72% accuracy) perform better than the shape features (88.79% accuracy).
The model proposed by Yu et al. combines state-of-the-art convolutional neural networks, including ResNet50, InceptionV3, VGG16, VGG19 and Xception, into an automatic cell recognition system. The results of the proposed model were compared with traditional machine learning algorithms such as K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT). The CNN models achieved 88.50% accuracy.
In the method proposed by Mohamed et al., the color space of each image is converted to YCbCr and the Gaussian distribution of the Cb and Cr values is then constructed. To train a classifier, various texture, size and morphological features are computed. The proposed model achieved 94.3% accuracy using a Random Forest classifier for diagnosing leukemia (ALL and AML) and myeloma.
Mohapatra et al. described a method for acute leukemia detection in stained blood smear and bone marrow microscopic images. An ensemble model was trained using features extracted from the input images. The results showed that the ensemble of classifiers achieved 94.73% average prediction accuracy, with average sensitivity and average specificity both greater than 90%, outperforming other standard classifiers, i.e., naive Bayesian (NB), K-nearest neighbor, multilayer perceptron (MLP), radial basis functional network (RBFN), and SVM.
In the method proposed by Patel et al., leukemia detection was modeled by k-means clustering. The model was also able to calculate the percentage of leukemia infection in microscopic images. Patel's method achieved 93.57% accuracy.
Finally, Mourya et al. proposed a deep learning-based hybrid architecture that uses two CNN architectures to improve the classification accuracy. The model was tested on 636 samples of normal and ALL cells and showed 89.70% accuracy.
III Materials and Methods
Our approach consists of the following stages. First, we enhance the visual quality of each input image using different pre-processing and augmentation techniques to increase the visibility of crucial structures. The prepared images are then passed to the feature extraction phase of the proposed hybrid architecture. We explore two architectures, VGG16 and MobileNet, for our hybrid model. VGG16 is a simple yet effective architecture consisting of 13 convolutional layers with 3×3 convolution filters, interleaved with max pooling layers, followed by two fully-connected layers of 4096 units each and a softmax classifier. The MobileNet architecture is designed for object recognition on mobile devices. It consists of depth-wise convolutions and 1×1 point-wise convolutions. Evaluated on the ImageNet dataset, MobileNet achieves accuracy at the same level as VGG16 with 32 times fewer parameters and 27 times less computation. Since each architecture has its own shortcomings, we integrate them so as to exploit the advantages of both and improve overall prediction accuracy. The extracted features are used to train a multi-layer perceptron that maps each image to class probabilities. Finally, the performance of the proposed architecture is evaluated on test images.
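The parameter savings of depth-wise separable convolutions can be illustrated with a small count. The sketch below compares a single standard convolution layer with its depth-wise separable counterpart. The layer sizes are illustrative assumptions, not the actual VGG16 or MobileNet configuration, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep)  # 73728 8768
```

For this single hypothetical layer the separable form needs roughly eight times fewer weights; the whole-network ratios quoted in the text also depend on the fully-connected layers and the overall depth.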
III-B Experimental Dataset
The dataset used for this study is the Classification of Normal versus Malignant Cells in B-ALL White Blood Cancer Microscopic Images dataset, provided by SBILab as part of the ISBI 2019 challenge and publicly available. The images are stored at a resolution of 450×450 pixels in the 24-bit RGB color system, and each cell occupies approximately 300×300 pixels. The images were annotated by an experienced oncologist for the classification procedure. The methods developed in [1, 7, 6, 5, 22] were employed for segmentation and stain normalization of the provided dataset. The dataset covers 76 individual subjects (47 ALL subjects and 29 normal subjects) and contains 7272 ALL cell images and 3389 normal cell images.
III-C Data Pre-processing
Two normalization methods are compared in this experiment. In the first, we subtract the mean RGB value computed over all training images from each input image and divide by the corresponding standard deviation, as suggested in the literature. In the second, we normalize images by subtracting the ImageNet mean, a pre-computed constant derived from the ImageNet database.
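The two normalization schemes can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the array shapes, the function names, and the use of per-channel statistics are assumptions, and the ImageNet channel means shown are the commonly quoted RGB values on the 0-255 scale.

```python
import numpy as np


def channel_stats(train_images):
    """Per-channel mean and std over a batch shaped (N, H, W, 3)."""
    mean = train_images.mean(axis=(0, 1, 2))
    std = train_images.std(axis=(0, 1, 2))
    return mean, std


def normalize(images, mean, std=None):
    """Subtract the channel mean and, if a std is given, divide by it."""
    out = images.astype(np.float32) - mean
    if std is not None:
        out = out / (std + 1e-7)  # small constant guards against division by zero
    return out


# Commonly quoted ImageNet RGB channel means (assumed values, 0-255 scale).
IMAGENET_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)

rng = np.random.default_rng(0)
# Synthetic stand-in for the training images.
train = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.float32)

mean, std = channel_stats(train)
dataset_normalized = normalize(train, mean, std)       # dataset mean/std scheme
imagenet_normalized = normalize(train, IMAGENET_MEAN)  # ImageNet mean subtraction
```

The statistics are computed on the training set only and reused unchanged for validation and test images, so no information from held-out data leaks into the pre-processing.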
To deal with the black margin of each image, illustrated in Fig 1, we resized all images about the image center from the original 450×450 pixels to 380×380 pixels using bicubic interpolation. This ensures that each cell is located at the center and reduces the non-informative background regions around it.
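One way to read this step is as a crop of the central 380×380 window. The sketch below implements only that reading, which is an assumption: it does not reproduce the bicubic interpolation mentioned in the text, and `center_crop` is a hypothetical helper name.

```python
import numpy as np


def center_crop(image, size):
    """Return the central size x size window of an (H, W, C) image."""
    height, width = image.shape[:2]
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


# Placeholder for one 450 x 450 RGB image from the dataset.
image = np.zeros((450, 450, 3), dtype=np.uint8)
cropped = center_crop(image, 380)
print(cropped.shape)  # (380, 380, 3)
```

With a 450-pixel side and a 380-pixel window, 35 pixels of margin are removed on each edge.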
III-C3 Data Augmentation
CNNs have demonstrated state-of-the-art performance in different tasks [3, 18, 13]. However, the performance of CNNs depends heavily on the size of the training data. Due to data privacy issues in the medical domain, collecting adequate clinical images is a challenge. To address the limited dataset size and avoid over-fitting, we applied various data augmentation techniques, as suggested in recent studies, including contrast adjustments, brightness correction, horizontal and vertical flips, and intensity adjustments. The class distribution of the dataset before and after data augmentation is presented in Table I.
|Number of images|
|Cell type||Before augmentation||After augmentation|
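The listed augmentations can be sketched with NumPy as below. The parameter ranges (a brightness offset of ±20 and a contrast factor between 0.8 and 1.2), the 0.5 flip probability, and the function names are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1]


def vertical_flip(image):
    """Mirror an (H, W, C) image top to bottom."""
    return image[::-1]


def adjust_brightness(image, delta):
    """Add a constant offset and clip to the valid 0-255 range."""
    return np.clip(image.astype(np.float32) + delta, 0, 255)


def adjust_contrast(image, factor):
    """Scale deviations from the image mean by factor and clip to 0-255."""
    pixels = image.astype(np.float32)
    mean = pixels.mean()
    return np.clip((pixels - mean) * factor + mean, 0, 255)


def augment(image, rng):
    """Apply a random combination of flips, brightness and contrast changes."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = vertical_flip(out)
    out = adjust_brightness(out, rng.uniform(-20, 20))
    out = adjust_contrast(out, rng.uniform(0.8, 1.2))
    return out


rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(380, 380, 3)).astype(np.uint8)  # synthetic image
augmented = augment(sample, rng)
```

Calling `augment` repeatedly on the same image yields different variants, which is how a minority class can be enlarged before training.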
III-D Proposed Deep CNN Architecture with Auxiliary Components
The main contribution of our approach is a hybrid CNN model that combines low-level features from intermediate layers to generate high-level discriminative feature maps for immature leukemic blast classification. Two well-established CNN architectures, MobileNet and VGG16, which have shown excellent performance in many computer vision tasks, are used [23, 2]. For the VGG16 architecture, the initial weights are those learned on ImageNet, following a transfer learning strategy. As illustrated in Fig 2, features are extracted from five convolution layers of the MobileNet architecture, and each is followed by an average pooling layer. The pooled features are then concatenated into a single feature vector, which feeds a new fully connected (FC) layer with 256 hidden units and a rectified linear unit (ReLU) activation function. Finally, the classifier layer uses two output neurons, corresponding to the normal and malignant classes, with a softmax activation function. The features extracted from the selected intermediate layers act as a complementary set that helps the model learn highly discriminative features alongside the existing deep features. This allows more complex patterns to be detected in each input image and gives higher accuracy with a lower error rate. Training a very deep architecture on a limited number of samples can suffer from vanishing gradients and poor local minima. The main benefit of a global average pooling layer is that it reduces the number of parameters in very deep architectures. This reduction helps to avoid getting stuck in poor local minima in a high-dimensional space, which often occurs when training very deep CNNs. Additionally, with fewer parameters, gradient flow through the deep network is easier to maintain, so the learning process remains stable regardless of the network depth, i.e. the number of hidden layers.
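The fusion head described above can be sketched in NumPy as a forward pass: pool five intermediate feature maps, concatenate them, apply a 256-unit ReLU layer, and finish with a two-way softmax. The feature-map shapes are hypothetical, the weights are random stand-ins for trained parameters, and dropout is left out because it is inactive at inference time.

```python
import numpy as np


def global_average_pool(feature_map):
    """Collapse an (H, W, C) feature map to a C-dimensional vector."""
    return feature_map.mean(axis=(0, 1))


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    shifted = x - x.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


rng = np.random.default_rng(0)

# Hypothetical activations from five intermediate layers, shaped (H, W, C).
shapes = [(56, 56, 128), (28, 28, 256), (14, 14, 512), (14, 14, 512), (7, 7, 1024)]
feature_maps = [rng.standard_normal(s) for s in shapes]

# Pool each map and concatenate into one fused vector (128+256+512+512+1024 = 2432).
fused = np.concatenate([global_average_pool(f) for f in feature_maps])

# Randomly initialised weights stand in for trained parameters.
w_hidden = rng.standard_normal((fused.size, 256)) * 0.01
b_hidden = np.zeros(256)
w_out = rng.standard_normal((256, 2)) * 0.01
b_out = np.zeros(2)

hidden = relu(fused @ w_hidden + b_hidden)       # FC layer with 256 ReLU units
probabilities = softmax(hidden @ w_out + b_out)  # [P(normal), P(malignant)]
```

Pooling before concatenation keeps the fused vector at 2432 values here; flattening the same five maps would instead produce close to a million inputs to the FC layer, which is the parameter reduction the text attributes to global average pooling.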
III-E Evaluation Metrics
To evaluate the performance of the proposed method, three widely used evaluation metrics are considered: accuracy, sensitivity and specificity. Accuracy is the number of correctly classified cases divided by the total number of test images and denotes the overall correctness. With $TP$, $TN$, $FP$ and $FN$ denoting the numbers of true positives, true negatives, false positives and false negatives, it is defined as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
In detecting disease, sensitivity or True Positive Rate (TPR) is the proportion of true positive results among all real positives (subjects that have the disease). If cancer samples in the provided dataset are limited, the model has to be sensitive.
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity or True Negative Rate (TNR) is the proportion of true negative results among all real negatives (subjects that do not have the disease). High specificity means that the model is good at detecting healthy cases.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
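The three metrics can be computed directly from confusion-matrix counts. The following is a minimal sketch that assumes ALL is the positive class; the counts are made up for illustration and are not the paper's results.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity (TPR) and specificity (TNR) from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of real positives that are detected
        "specificity": tn / (tn + fp),  # share of real negatives that are detected
    }


# Hypothetical counts: 100 ALL cells (90 detected) and 50 normal cells (45 detected).
metrics = classification_metrics(tp=90, tn=45, fp=5, fn=10)
print(metrics)
```

On an imbalanced test set, accuracy alone can hide poor performance on the smaller class, which is why sensitivity and specificity are reported alongside it.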
IV Experiment and Results
IV-A Experimental Setup
For our experiments, 70% of the images of each class are assigned to the training set, 20% to the validation set, and the remaining 10% to the test set. To obtain the best accuracy, the hyper-parameters were tuned using an exhaustive grid search. We investigated the effect of three optimizers: adaptive moment estimation (Adam), stochastic gradient descent (SGD) with momentum, and root mean square propagation (RMSProp). For the SGD optimizer, the momentum term was set to 0.9. For the Adam optimizer, $\beta_1$ and $\beta_2$ were set to 0.7 and 0.999, respectively. For the RMSProp optimizer, rho and epsilon were set to 0.8 and None, respectively. The learning rate was set to 0.001 for the Adam optimizer and to 0.0001 for both the RMSProp and SGD optimizers. We used the ReLU activation function and dropout in the fully-connected layer with a rate of 0.4 to prevent over-fitting. The batch size was set to 32 in order to fit into GPU memory. All models were trained for 1000 epochs. Our experiments were implemented in Python using the Keras package with TensorFlow as the deep learning backend and run on an Nvidia GeForce GTX 1080 Ti GPU with 11GB of memory.
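The per-class 70/20/10 split can be sketched in plain Python as follows. The function name, the fixed seed, and the toy label list are assumptions made for illustration; the text does not state how the split was randomised.

```python
import random


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split sample indices into train/validation/test sets class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Toy labels: 200 ALL cells and 100 normal cells.
labels = ["ALL"] * 200 + ["normal"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 210 60 30
```

Splitting within each class keeps the ALL-to-normal ratio the same in all three subsets, which matters here because the two classes are not equally represented.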
IV-B Results and Discussion
The results are obtained on the 967 test images of the ISBI 2019 challenge, which were not used in the training phase. This test set consists of 312 normal cases and 655 ALL cases. We first examine the effect of image normalization and of different optimizers on the classification performance. The accuracy, sensitivity and specificity obtained are tabulated in Table II.
As the results confirm, performance varies when the experiments are run with different optimizers and image normalization techniques. From Table II, we observe that the proposed model delivers its highest accuracy (96.17%) with dataset mean normalization and the Adam optimizer. The highest sensitivity (95.92%) is achieved with the Adam optimizer and ImageNet mean normalization, and the highest specificity (99.53%) with dataset mean normalization and the SGD optimizer. Surprisingly, the worst classifier is obtained with the SGD optimizer and dataset mean normalization, with an accuracy of 89.76%, a sensitivity of 86.96%, and a specificity of 99.53% (the last row in Table II).
To justify the performance of the proposed approach, each architecture is also evaluated individually. Table III compares the individual VGG16 and MobileNet architectures with the proposed model. From Table III, it can be seen that our proposed method significantly outperforms the individual architectures on the provided dataset. In terms of accuracy, our model improves on VGG16 by 15.40 percentage points and on MobileNet by 8.17 percentage points. Moreover, the plain MobileNet architecture (88.00%) performs better than the VGG16 architecture (80.77%), a gap of 7.23 percentage points in favor of MobileNet. This is probably due to the benefit of the depth-wise and point-wise blocks in MobileNet compared to the regular convolutional blocks in VGG16.
For the sake of comparison, our proposed ensemble is compared with some recent studies from the literature in Table IV. As shown in Table IV, the proposed approach achieves better accuracy than the other studies.
|Dataset||Study||Accuracy||Year|
|DTH||Yu et al.||88.50%||2017|
|ISBI||Mourya et al.||89.62%||2018|
|ALL-IDB2||Singhal et al.||89.72%||2014|
|MISP||Mohamed et al.||93.00%||2018|
|ALL-IDB||Patel et al.||93.75%||2015|
|IGH||Mohapatra et al.||94.73%||2013|
The experimental results in Table IV confirm that the proposed ensemble, by aggregating features from intermediate layers, outperforms all counterparts and achieves the highest accuracy. This indicates the important role of ensemble-based deep learning combined with highly descriptive features. Our proposed learner achieves an accuracy of 96.17% on the recent ISBI 2019 dataset, while the counterpart study by Mourya et al. in Table IV achieves an accuracy of 89.62% on the same dataset.
We presented an automatic hybrid CNN method for the classification of ALL and healthy cells. Two well-established CNNs, VGG16 and MobileNet, are used to extract features from multiple abstraction levels. The features fused from selected intermediate layers can be regarded as an auxiliary feature set that leads to further improvement of the classification accuracy. This approach not only helps to learn more complex patterns but also addresses the issues of vanishing gradients and poor local minima by reducing the number of parameters. The results suggest that combining features learned by deep models improves performance and yields a more accurate result (96.17%) than the individual state-of-the-art networks. As a future research direction, we intend to build ensembles of other CNN architectures and observe the change in accuracy.
- [1] GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images.
- [2] (2018) A convolutional neural network with feature fusion for real-time hand posture recognition. 73, pp. 748–766.
- [3] (2019) EmbraceNet: a robust deep learning architecture for multimodal classification. 51, pp. 259–270.
- [4] Classification of Normal vs Malignant Cells in B-ALL White Blood Cancer Microscopic Images: ISBI 2019.
- [5] (2017) SD-Layer: Stain Deconvolutional Layer for CNNs in Medical Microscopic Imaging. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2017, M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, and S. Duchesne (Eds.), Cham, pp. 435–443.
- [6] (2016) Overlapping Cell Nuclei Segmentation in Microscopic Images Using Deep Belief Networks. In Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP '16, New York, NY, USA, pp. 82:1–82:8.
- [7] (2017-02) Stain Color Normalization and Segmentation of Plasma Cells in Microscopic Images as a Prelude to Development of Computer Assisted Automated Disease Diagnostic Tool in Multiple Myeloma. 17 (1), pp. e99.
- [8] (2017-04) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
- [9] (2019) A comparative study of deep learning architectures on melanoma detection. 58, pp. 76–83.
- [10] Key Statistics for Acute Lymphocytic Leukemia (ALL).
- [11] (2012) ImageNet Classification with Deep Convolutional Neural Networks.
- [12] (2018) Classification of acute leukemia using medical-knowledge-based morphology and CD marker. 44, pp. 127–137.
- [13] (2019) Deep learning for variational multimodality tumor segmentation in PET/CT.
- [14] (2019-06) Crop Lodging Prediction from UAV-Acquired Images of Wheat and Canola using a DCNN Augmented with Handcrafted Texture Features.
- [15] (2018-03) Automated detection of white blood cells cancer diseases. In 2018 First International Workshop on Deep and Representation Learning (IWDRL), pp. 48–54.
- [16] (2014) An ensemble classifier system for early diagnosis of acute lymphoblastic leukemia in blood microscopic images.
- [17] (2018-10) LeukoNet: DCT-based CNN architecture for the classification of normal versus Leukemic blasts in B-ALL Cancer.
- [18] (2019) Automatic detection, localization and segmentation of nano-particles with deep learning in microscopy images. 120, pp. 113–119.
- [19] (2015) Automated leukaemia detection using microscopic images. 58, pp. 635–642. Note: Second International Symposium on Computer Vision and the Internet (VisionNet'15).
- [20] (2018) Deep CNN and data augmentation for skin lesion classification. In Asian Conference on Intelligent Information and Database Systems, pp. 573–582.
- [21] (2019) Prediction of kidney disease stages using data mining algorithms. 15, pp. 100178.
- [22] (2016) Segmentation of overlapping/touching white blood cell nuclei using artificial neural networks.
- [23] (2019) Detecting abnormal thyroid cartilages on CT using deep learning. 100 (4), pp. 251–257.
- [24] (2015) Introducing a hybrid model of DEA and data mining in evaluating efficiency. Case study: Bank Branches. 3 (2).
- [25] (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. abs/1409.1.
- [26] (2014) Local Binary Pattern for automatic detection of Acute Lymphoblastic Leukemia. In 2014 20th National Conference on Communications, NCC 2014.
- [27] (2014) Dropout: a simple way to prevent neural networks from overfitting. 15 (1), pp. 1929–1958.
- [28] (2018) Data mining and clustering in chemical process databases for monitoring and knowledge discovery. 67, pp. 160–175. Note: Big Data: Data Science for Process Control and Operations.
- [29] (2018) Blood cell images segmentation using deep learning semantic segmentation. In 2018 IEEE International Conference on Electronics and Communication Engineering (ICECE), pp. 13–16.
- [30] (2018) Leukemia diagnosis in blood slides using transfer learning in CNNs and SVM for classification. 72, pp. 415–422.
- [31] (2017-10) Automatic classification of leukocytes using deep neural network. In 2017 IEEE 12th International Conference on ASIC (ASICON), pp. 1041–1044.
- [32] (2017) Aggregating Deep Convolutional Features for Melanoma Recognition in Dermoscopy Images. In Machine Learning in Medical Imaging, Q. Wang, Y. Shi, H. Suk, and K. Suzuki (Eds.), Cham, pp. 238–246.