Hierarchical ResNeXt Models for Breast Cancer Histology Image Classification
Microscopic histology image analysis is a cornerstone of early breast cancer detection. However, these images are very large, and manual analysis is error-prone and very time-consuming, so automating this process is in high demand. We propose a hierarchical system of convolutional neural networks (CNNs) that automatically classifies patches of these images into four pathologies: normal, benign, in situ carcinoma and invasive carcinoma. We evaluated our system on the BACH challenge dataset for image-wise classification and on a small dataset that we used to extend it. Using a 75%/25% train/test split, we achieved an accuracy of 0.99 on the test split of the BACH dataset and 0.96 on that of the extension. On the test set of the BACH challenge, we reached an accuracy of 0.81, which ranked us 8th out of 51 teams.
Keywords: CNN, ResNeXt, learning rate, histology images
Every year, breast cancer kills more than 500,000 women around the world. Early detection makes it possible to act before cancerous tissue spreads, and this has been shown to reduce the death rate in the US. Histology image analysis is necessary for early diagnosis. However, these images are very large, so analyzing them manually is time-consuming and error-prone. An automated system would therefore reduce the burden of manual analysis.
In this scope, the BreAst Cancer Histology (BACH) Challenge
Our main contribution is a hierarchy of Convolutional Neural Network (CNN) models that classifies images gradually: first into two general pathological groups, carcinoma and non-carcinoma, and then into the four groups cited above. We trained our system on 75% of the challenge dataset (Sect. 2.2) and evaluated it on the remaining 25% (Sect. 3.3). We also extended this dataset with another one and repeated the same training and evaluation.
Instead of classifying images directly into the four pathological groups, which may be difficult to differentiate, our approach starts with a simplified version of the problem, namely classifying images into two categories:
Carcinoma: which includes In Situ and Invasive pathologies.
Non Carcinoma: which includes Normal and Benign pathologies.
Then we classify the images of each category into the two pathologies that compose it. We use one CNN model for each classification. A CNN model that we call the general model handles the simplified version of the problem. Two specialized CNN models then classify the Carcinoma category into In Situ and Invasive and the Non Carcinoma category into Normal and Benign, respectively. We thus have a hierarchy of three CNN models in a binary tree structure in which each model (node) classifies incoming images into two types. Figure 1 summarizes the structure of the hierarchy, where each model answers one question. We describe below the underlying CNN model and how we trained it.
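As an illustration, the routing through the binary tree can be sketched as follows. This is a minimal sketch, not our actual implementation: the three callables stand in for the trained CNNs, and the function name, the 0.5 threshold and the toy models are assumptions introduced only for this example.

```python
from typing import Callable, Sequence

# A binary model maps an image to the probability of its "positive" class.
# These callables are placeholders for the trained CNNs (assumption).
BinaryModel = Callable[[Sequence[float]], float]


def classify_hierarchically(
    image: Sequence[float],
    general: BinaryModel,        # returns P(carcinoma)
    carcinoma: BinaryModel,      # returns P(invasive | carcinoma)
    non_carcinoma: BinaryModel,  # returns P(benign | non-carcinoma)
    threshold: float = 0.5,      # hypothetical decision threshold
) -> str:
    """Route an image through the tree and return one of the four labels."""
    if general(image) >= threshold:
        return "Invasive" if carcinoma(image) >= threshold else "In Situ"
    return "Benign" if non_carcinoma(image) >= threshold else "Normal"


# Toy stand-ins that simply read fixed positions of a 3-element "image".
def toy_general(x: Sequence[float]) -> float:
    return x[0]


def toy_carcinoma(x: Sequence[float]) -> float:
    return x[1]


def toy_non_carcinoma(x: Sequence[float]) -> float:
    return x[2]


label = classify_hierarchically(
    [0.9, 0.2, 0.7], toy_general, toy_carcinoma, toy_non_carcinoma
)
print(label)  # In Situ
```

Only two of the three models are evaluated for any given image: the general model and the specialized model of the branch it selects.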
For all three models, we used the ResNeXt50 architecture, which, like general CNNs, is structured in repeated blocks composed of convolutions and non-linear operations. However, in a ResNeXt block, operations are performed across many parallel branches and the results are aggregated together with the block input (see Figure 2(a)). As ResNeXt is a 1000-category ImageNet classifier, we replaced its last fully connected layer with some custom layers to make it a 2-category classifier (see Figure 2(b)).
Instead of learning from scratch, we started from a pretrained ResNeXt50 for each model of our hierarchical system. This is one form of transfer learning, which consists in taking a model trained for one task and re-targeting it to another related task. Razavian et al. suggest this as a baseline for any recognition task.
We then fine-tuned each model on its corresponding two classes. For the general model, we used the ImageNet pretrained weights. For each of the two specialized models, we kept the better of two candidates: one fine-tuned from ImageNet and the other from the general model. The idea behind fine-tuning from the general model is that it has already seen these images and learned to extract meaningful features from them.
We trained each CNN model of our system following these steps:
choosing an optimal learning rate.
training for 3 epochs the last layers randomly initialized using  while keeping pretrained ones fixed.
training middle and last layers with different learning rates:
for middle layers
for last layers
Additionally, during each training run, we vary the learning rate with a specific scheduling method. We go over this method and each of the steps stated above, with explanations, below.
Optimal learning rate choice:
The learning rate is one of the most important hyperparameters when training any CNN. In general it is set by trial and error. We used an existing method for choosing an optimal learning rate. The idea is to make one training run of a few epochs while increasing the learning rate from a very small value after each iteration. We then plot the accuracy against the learning rate and note the value at which the accuracy starts to diverge, or to decrease after having increased. The optimal learning rate is 1/3 or 1/4 of that value. However, we used an implementation of this method that relies on the loss plot instead, and we found that in this case it works better to use 1/10 of that value.
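The selection rule on the loss plot can be sketched as below. This is only an illustration under stated assumptions: the function names, the geometric sweep and the loss values are made up, and taking the learning rate at the lowest recorded loss is used here as a simple proxy for "the value where the loss starts to diverge".

```python
from typing import List


def exponential_lr_sweep(lr_min: float, lr_max: float, steps: int) -> List[float]:
    """Learning rates growing geometrically from lr_min to lr_max, one per iteration."""
    ratio = (lr_max / lr_min) ** (1.0 / (steps - 1))
    return [lr_min * ratio ** i for i in range(steps)]


def pick_learning_rate(lrs: List[float], losses: List[float], divisor: float = 10.0) -> float:
    """Return the learning rate at the lowest recorded loss, divided by `divisor`."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / divisor


# Made-up loss curve: the loss improves, then diverges once the rate is too large.
lrs = exponential_lr_sweep(1e-5, 1.0, steps=6)
losses = [1.40, 1.35, 1.10, 0.80, 1.90, 6.00]
chosen = pick_learning_rate(lrs, losses)  # one tenth of the rate at the lowest loss
```

In practice the recorded losses are noisy, so a smoothed curve would normally be used before looking for the minimum.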
Learning rate scheduling via Stochastic Gradient Descent with Warm Restarts (SGDR):
SGDR is a method for scheduling the variation of the learning rate during training so as to converge rapidly. It was proposed by Loshchilov and Hutter and achieved new state-of-the-art results on the CIFAR-10 and CIFAR-100 datasets. In this approach, we decrease the learning rate from its optimal value to nearly zero following a cosine annealing scheme. Then we suddenly reset it to its initial value and repeat. This sudden jump of the learning rate lets the optimizer leave the current local minimum and look for a nearby one that may be better. That is the idea of “warm restarts”.
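The shape of such a schedule can be sketched with the usual cosine annealing formula. This is a simplified sketch: it assumes restarts at a fixed interval, whereas SGDR also allows the cycle length to grow after each restart, and the function and parameter names are chosen only for this example.

```python
import math


def sgdr_learning_rate(step: int, lr_max: float, cycle_len: int, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate with a warm restart every `cycle_len` steps."""
    t_cur = step % cycle_len  # steps elapsed since the last restart
    cosine = math.cos(math.pi * t_cur / cycle_len)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Two cycles of four steps each, starting from an illustrative rate of 0.01.
schedule = [sgdr_learning_rate(s, lr_max=0.01, cycle_len=4) for s in range(8)]
```

Within each cycle the rate decreases monotonically from `lr_max` towards `lr_min`, and at step 4 it jumps back to `lr_max`, which is the warm restart.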
Training with different learning rates:
This idea was introduced by Jeremy Howard. It is based on the observation that the first layers of CNNs learn to extract generic features such as edges, corners and blobs, while later layers are more specialized for the task at hand. We therefore avoid altering the first layers, as their features are useful for any task. In the same way, we alter the middle layers only slightly, because they are becoming specialized for the task at hand, and we alter the last layers with the optimal learning rate found. That is why we use a reduced learning rate for the middle layers and the optimal learning rate for the last layers. The reduction factor is arbitrary and chosen by trial and error, but the idea of decreasing learning rates from the last to the first layers remains. Here we set the learning rate of the first layers to zero.
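The assignment of learning rates to layer groups can be sketched as follows. The group names and the function are introduced for illustration only, and the divisor of 3 for the middle layers is a hypothetical value, not necessarily the one we used.

```python
from typing import Dict


def layer_group_learning_rates(optimal_lr: float, middle_divisor: float = 3.0) -> Dict[str, float]:
    """Per-group learning rates: frozen first layers, reduced middle, full last."""
    return {
        "first": 0.0,                           # generic features are left untouched
        "middle": optimal_lr / middle_divisor,  # slight updates (divisor is hypothetical)
        "last": optimal_lr,                     # new layers use the optimal rate
    }


rates = layer_group_learning_rates(optimal_lr=0.01)
```

Whatever the divisor, the ordering first < middle < last is the property that matters.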
3 Experiments and Results
The dataset is a set of hematoxylin and eosin (H&E) stained breast histology microscopy images. It comes from one of the two tracks of the ICIAR 2018 Grand Challenge on BreAst Cancer Histology (BACH). It has 400 images equally distributed among four pathologies: Normal, Benign, In situ carcinoma and Invasive carcinoma. Images are conventional color images with a high resolution of 2048 × 1536 pixels and a pixel scale of 0.42 µm × 0.42 µm. This dataset is an extension of that of the BioImaging 2015 challenge.
We also used an additional public dataset from Bio-Image Semantic Query User Environment (BISQUE)
We trained our system in an environment equipped with an Nvidia Tesla K80 GPU and used the following settings for training all three constituent CNNs:
splitting the dataset into 75% for training and 25% for validation.
resizing all images to .
augmenting the dataset by random rotations, reflections and cropping.
setting batch size to 10.
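The 75%/25% split from the list above can be sketched as below. The sketch assumes a stratified split, so that each pathology keeps the same proportion in both parts; this is an assumption of the example, as are the file names, the seed and the function name.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    items_by_class: Dict[str, List[str]], train_fraction: float = 0.75, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split each class separately so both parts keep the class balance."""
    rng = random.Random(seed)
    train: List[str] = []
    valid: List[str] = []
    for label in sorted(items_by_class):
        items = list(items_by_class[label])
        rng.shuffle(items)
        cut = int(len(items) * train_fraction)
        train.extend(items[:cut])
        valid.extend(items[cut:])
    return train, valid


# 400 hypothetical file names, 100 per pathology, as in the BACH dataset.
classes = ["Normal", "Benign", "InSitu", "Invasive"]
dataset = {c: [f"{c}_{i:03d}.tif" for i in range(100)] for c in classes}
train_files, valid_files = stratified_split(dataset)
```

With 100 images per class this gives 300 training images and 100 validation images, 25 of each pathology.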
Concerning the resize operation, we first resized each image to a lower resolution that preserves its 4/3 aspect ratio. We then cropped a square at the center of the resized image. Cropping at the center reduces the risk of missing important parts of the image, which are likely to be around its center. We resized the images to accelerate training.
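The geometry of this operation can be sketched as follows. The 2048 × 1536 source size is simply a 4/3 example and the 512-pixel target is a hypothetical value chosen for illustration; the function names are also assumptions of the sketch.

```python
from typing import Tuple


def resize_keep_ratio(width: int, height: int, short_side: int) -> Tuple[int, int]:
    """Scale (width, height) so that the shorter side equals `short_side`."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width: int, height: int, size: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered size x size square."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


# A 4/3 image resized to a 512-pixel short side, then cropped to a square.
w, h = resize_keep_ratio(2048, 1536, short_side=512)
box = center_crop_box(w, h, size=512)
print((w, h), box)  # (683, 512) (85, 0, 597, 512)
```

The crop discards a strip on the left and right of the resized image, which is the loss of image parts discussed later.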
3.3 Results based on the training set
We evaluated our method on the 25% of the dataset left unused during training. Table 1 presents the results. We performed better on the initial dataset than on the extended one. Figure 3 shows the confusion matrix of the results on the BACH dataset: only one out of 100 images was misclassified. We obtained these results in fewer than 60 epochs of training for each model, which is very fast thanks to the training approach.
|Models||Init.||Init. + Ext.||Competition|
3.4 Final system for the competition and result
For the competition, we assembled four different versions of the general model (Carci) and three versions of each of the specialized ones (NorBe and InvIs), as named in Table 1.
Carci: two versions are in Table 1. They correspond to the ‘Init.’ and ‘Init. + Ext.’ columns. We built the third version by training on the whole dataset (BACH and BISQUE combined). The last one is a snapshot of the ‘Init.’ model taken when its accuracy was 0.99.
NorBe: similar to the first three versions of the Carci model.
InvIs: Table 1 contains one version which we obtained by using Carci as a pretrained model. The second version used the ImageNet pretrained model. Finally, we trained the model on the whole BACH dataset to build the last version.
When training on the whole dataset, there is no way to check for overfitting, whereas with separate training and validation sets the losses on both help to detect it. We therefore trusted the models trained with a validation set more than those trained on the whole dataset. As reported in Table 1, this system reached an accuracy of 0.81 on the preliminary competition test set, which ranked us 8th out of 50.
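One possible way to combine several versions of a model while trusting some of them more is a weighted average of their predicted probabilities. The sketch below only illustrates that idea under loud assumptions: the text above does not specify the combination rule, and the function name, probabilities and weights are all made up.

```python
from typing import Sequence


def weighted_average(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of the positive-class probabilities of several model versions."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per model version is required")
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


# Four hypothetical Carci versions scoring one image. The third one stands for a
# version trained on the whole dataset and gets a smaller, made-up weight.
carci_probs = [0.90, 0.80, 0.40, 0.85]
carci_weights = [1.0, 1.0, 0.5, 1.0]
p_carcinoma = weighted_average(carci_probs, carci_weights)
```

Here the less trusted version disagrees with the others, and its reduced weight keeps the combined probability above 0.5.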
4 Related work and Discussions
Computer-Aided Diagnosis (CAD) has become a major area of research in medical imaging, including histology images. There are many works on breast cancer detection from histology images, but they use different datasets, which makes it difficult to compare methods fairly. Competitions like the BACH challenge are therefore well suited to such comparisons and help advance research.
Araújo et al. developed a CNN-based model to classify histology images into the same four classes as in our problem. They used the BioImaging 2015 challenge dataset, which is the basis of the BACH dataset. Our reported accuracy is higher than theirs: 0.81 vs 0.778. However, this does not imply that our method is better, since the BACH dataset is larger. The best accuracy in the preliminary results of the BACH challenge is 0.87, which is not far from ours. Besides, looking at Table 1, we may suspect our system of overfitting, as there is a gap of 0.19 between the validation and competition test set accuracies. To combat overfitting, we used dropout twice, with probabilities 0.25 and 0.5 respectively (see Figure 2(b)). We are therefore inclined to attribute the gap to our choice of validation set, which may be too similar to the training set. Additionally, training mainly on 75% of the data could have prevented our system from discovering, in the validation data, new patterns that would have improved its performance. Likewise, the resize operation not only discards parts of the image but may also prevent the system from capturing helpful details.
5 Conclusion and Perspectives
We proposed a hierarchical system of three CNN models to solve the image-wise classification task of the BACH challenge. This system classifies images gradually, first into two categories, carcinoma and non-carcinoma, and then into the four classes of the challenge. When training the CNN models, we followed a scheme that accelerates convergence. We obtained an accuracy of 0.81 on the competition test set and ranked 8th out of 51 teams. The first way to improve our system is to train on the whole dataset using a previously described strategy: training continues on the whole dataset until the loss reaches the value observed at the best accuracy obtained during validation. Regarding the resize operation, we could use left and right crops along with the center crop during training. This would not only avoid losing parts of the images but also serve as a data augmentation strategy. Finally, we could increase the image size, without exceeding the original size.
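The left, center and right crops mentioned above can be sketched as follows. This is an illustration only: it assumes landscape images, and the function name and the 683 × 512 example size are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def three_square_crops(width: int, height: int) -> List[Box]:
    """Left, center and right square crops of a landscape image."""
    if width < height:
        raise ValueError("expected a landscape image (width >= height)")
    center_left = (width - height) // 2
    right_left = width - height
    return [
        (0, 0, height, height),                          # left crop
        (center_left, 0, center_left + height, height),  # center crop
        (right_left, 0, width, height),                  # right crop
    ]


crops = three_square_crops(683, 512)
print(crops)  # [(0, 0, 512, 512), (85, 0, 597, 512), (171, 0, 683, 512)]
```

Taken together, the three crops cover the full width of the image, so no region is systematically discarded.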
This article should be cited as: Koné I., Boulmane L. (2018) Hierarchical ResNeXt Models for Breast Cancer Histology Image Classification. In: Campilho A., Karray F., ter Haar Romeny B. (eds) Image Analysis and Recognition. ICIAR 2018. Lecture Notes in Computer Science, vol 10882. Springer, Cham.
- Araújo, T., Aresta, G., Castro, E., Rouco, J., Aguiar, P., Eloy, C., et al.: Classification of breast cancer histology images using convolutional neural networks. PLOS ONE 12(6), 1–14 (2017). https://doi.org/10.1371/journal.pone.0177544
- Berry, D., Cronin, K., Plevritis, S., Fryback, D., Clarke, L., Zelen, M., et al.: Effect of screening and adjuvant therapy on mortality from breast cancer. New England Journal of Medicine 353(17), 1784–1792 (2005). https://doi.org/10.1056/NEJMoa050518
- Gelasca, E., Byun, J., Obara, B., Manjunath, B.: Evaluation and benchmark for biological image segmentation. In: 15th IEEE International Conference on Image Processing. pp. 1816–1819 (2008). https://doi.org/10.1109/ICIP.2008.4712130
- Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: Proceedings of the 30th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 28, pp. 1319–1327 (2013)
- He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV). pp. 1026–1034 (2015). https://doi.org/10.1109/ICCV.2015.123
- Howard, J.: Lesson 2: Deep learning v2. practical deep learning for coders (2018)
- Howard, J., Others: The fast.ai deep learning library, github (2018)
- Doi, K.: Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Computerized Medical Imaging and Graphics 31(4), 198–211 (2007). https://doi.org/10.1016/j.compmedimag.2007.02.002
- Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 464–472 (2017). https://doi.org/10.1109/WACV.2017.58
- Loshchilov, I., Hutter, F.: SGDR: Stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations (ICLR) (2017)
- Pan, S., Yang, Q.: A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22(10), 1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191
- Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop Autodiff (2017)
- Pêgo, A., Aguiar, P.: Bioimaging 2015, 4th international symposium in applied bioimaging the pre-clinical challenge in 3d (2015)
- Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: An astounding baseline for recognition. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 512–519 (2014). https://doi.org/10.1109/CVPRW.2014.131
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: ImageNet large scale visual recognition challenge. Int J Comput Vis 115, 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
- Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1492–1500 (2017)
- Society, A.C.: Breast cancer facts & figures 2017-2018 (2017)
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1929–1958 (2014)
- WHO: WHO position paper on mammography screening. WHO Press, 20 Avenue Appia, 1211 Geneva 27, Switzerland (2014)
- Zeiler, M., Fergus, R.: Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. pp. 818–833 (2014)