Quantification of Lung Abnormalities in Cystic Fibrosis using Deep Networks
Cystic fibrosis is a genetic disease which may appear in early life with structural abnormalities in lung tissues. We propose to detect these abnormalities using a texture classification approach. Our method is a cascade of two convolutional neural networks. The first network detects the presence of abnormal tissues. The second network identifies the type of the structural abnormalities: bronchiectasis, atelectasis or mucus plugging. We also propose a network computing pixel-wise heatmaps of abnormality presence, learning only from the patch-wise annotations. Our database consists of CT scans of 194 subjects. We use 154 subjects to train our algorithms and the remaining 40 as a test set. We compare our method with a random forest and a single neural network approach. The first network reaches an accuracy of 0.94 for disease detection, 0.18 higher than the random forest classifier and 0.37 higher than the single neural network. Our cascade approach yields a final class-averaged F1-score of 0.33, outperforming the random forest baseline and the single network by 0.10 and 0.12, respectively.
F. Marques: E-mail: email@example.com
Keywords: Cystic Fibrosis, Deep Learning, Cascade Network, Reconstruction, Visualization
Cystic fibrosis (CF) is the most lethal genetic disorder in the Caucasian population. The disease first expresses itself by changing the structure of lung tissue, leading to structural abnormalities. The early stages of cystic fibrosis have not yet been thoroughly studied. Early, automatic and quantitative analysis could give better insight into which changes lead to Severe Advanced Lung Disease (SALD) [2] and help reduce irreversible lung damage. The types of abnormalities studied in this article are bronchiectasis (destruction or widening of the airway), mucus plugging and atelectasis (deflation of alveoli), as shown in Fig. 1.
There are many approaches to automatic lung tissue classification. They can rely on handcrafted or learnt features. Handcrafted approaches use predefined filter banks designed to capture the distinctive details of lung tissue. Ciompi et al. [4], for instance, cascade supervised and unsupervised learning based on handcrafted features.
Recently, deep learning techniques have been used successfully for medical image analysis. However, few articles [5, 6] propose to use deep learning for texture classification. A common problem is the imbalance between classes, which leads to poorly performing networks. Wang et al. [7] proposed a CNN where the classes are balanced by oversampling patches of the rare classes.
In this paper we propose an automatic method for patch-wise texture classification of different structural lung abnormalities. Our method is based on a cascade of two convolutional neural networks. To the best of our knowledge it is the first time that structural abnormalities in early stages of cystic fibrosis are automatically scored. Besides, we train and evaluate our algorithm on CT scans of children, acquired with a variety of scanner models and low dose scan protocols. All of this makes the problem more challenging.
In addition, we compute precise pixel-wise heatmaps of structural abnormalities using only the global patch labels during training, to visualize where certain abnormalities are most likely to be present.
We propose a cascade of two convolutional neural networks (CNN) to classify 2D patches from lung CT scans. The first network performs a binary classification between healthy and diseased tissue. The second network refines the prediction by classifying the predicted diseased patches into four specific subgroups: bronchiectasis, atelectasis, mucus or abnormal. As a side result, we were also able to compute pixel-wise heatmaps of abnormality presence from patch-wise annotations.
Only inspiration scans of the subjects are available. From these scans, 2D slices containing annotations were extracted. Annotations are patch-wise with variable size. Patches were resampled to have the same number of voxels. The image intensity of each CT scan was rescaled between 0 and 1. Since the subjects are in an early stage of cystic fibrosis, the changes in lung texture are mild and it can be challenging to distinguish them from healthy tissue. The classified patches are therefore selected larger than the annotations to include anatomical context.
2.2 Cascade of Convolutional Neural Networks.
In our dataset, the vast majority of patches are healthy (Table 1). Direct classification approaches tend to significantly overestimate the presence of abnormalities in order to maximize their global learning objective. To overcome this problem we train two different convolutional neural networks with different loss functions. The first network detects the presence of abnormalities. The second network classifies these abnormalities into several subgroups.
These two networks have the same architecture except for the last layer. The first network performs a binary classification and ends with a sigmoid activation function. The second network performs a multi-class classification and ends with a softmax function.
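The way the two outputs combine at inference time can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `cascade_predict`, the 0.5 decision threshold and the ordering of `DISEASE_CLASSES` are assumptions made for the example.

```python
import numpy as np

# Assumed ordering of the softmax outputs of the second network.
DISEASE_CLASSES = ["bronchiectasis", "atelectasis", "mucus", "abnormal"]


def cascade_predict(p_disease, p_subtype, threshold=0.5):
    """Combine the outputs of the two networks into one label per patch.

    p_disease: shape (n_patches,), sigmoid output of the first network.
    p_subtype: shape (n_patches, 4), softmax output of the second network.
    threshold: hypothetical decision threshold on the sigmoid output.
    """
    p_disease = np.asarray(p_disease, dtype=float)
    p_subtype = np.asarray(p_subtype, dtype=float)

    # Every patch starts as healthy; only detected patches get a subtype.
    labels = np.full(p_disease.shape[0], "healthy", dtype=object)
    diseased = p_disease >= threshold

    # Most probable abnormality type for the patches flagged as diseased.
    subtype_idx = p_subtype[diseased].argmax(axis=1)
    labels[diseased] = np.array(DISEASE_CLASSES, dtype=object)[subtype_idx]
    return labels
```

With this arrangement the second network never sees patches that the first one considers healthy, which is what keeps the healthy majority from dominating the multi-class decision.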
Fig. 2 presents the network architecture. No pooling is performed between the layers, since texture consists of low-level features that pooling can discard; there is only a global average pooling at the end. The architecture is similar to the one proposed by Anthimopoulos et al. [5], with some adaptations. The convolution kernels are 3x3 instead of 2x2. In addition to dropout and data augmentation, a batch normalization layer [8] is inserted after every convolution. This allowed us to use higher learning rates in the optimizer.
The first network is optimized with a Dice coefficient loss function. With $D$ the Dice coefficient between the predictions and the ground truth, the loss function is $L = 1 - D$.
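As an illustration of this loss, a soft Dice coefficient is commonly computed as twice the overlap between predictions and labels, divided by the sum of their totals. The sketch below uses that common form in NumPy. The smoothing constant `eps` and the absence of squared terms in the denominator are assumptions, since the exact variant is not specified here.

```python
import numpy as np


def dice_loss(y_true, y_pred, eps=1e-7):
    """Soft Dice loss L = 1 - D for binary labels and predicted probabilities.

    eps is a small smoothing constant (an assumption of this sketch) that
    avoids a division by zero when both inputs are all zeros.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()

    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)
    return 1.0 - dice
```

Because the coefficient is driven by the overlap on the positive class, a batch dominated by healthy patches does not get a low loss simply by predicting "healthy" everywhere.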
The second network is optimized with a class-weighted categorical cross-entropy defined as:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} w_c \, y_{i,c} \log(\hat{y}_{i,c})$$

where $N$ is the number of samples, $C$ the number of classes, $w_c$ the weight of class $c$, $y$ the ground truth and $\hat{y}$ the prediction. Weighting the cross-entropy compensates, to some extent, for the class imbalance in the dataset. As we show in the experiments, this approach alone cannot overcome the significant healthy/disease imbalance of our dataset. To minimize the loss we use Stochastic Gradient Descent (SGD). To accelerate convergence we use large dense layers. Before each dense layer, a dropout layer prevents over-fitting. Each layer uses a Leaky ReLU activation [9] to avoid the stagnation of a neuron that may happen after a large gradient update.
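A class-weighted categorical cross-entropy of this kind can be written in a few lines of NumPy. The sketch below assumes one-hot ground-truth labels, averages over the samples, and clips predictions with a small `eps` to avoid `log(0)`; these choices and the function name are assumptions made for the example.

```python
import numpy as np


def weighted_cross_entropy(y_true, y_pred, class_weights, eps=1e-7):
    """Class-weighted categorical cross-entropy, averaged over the samples.

    y_true: one-hot labels, shape (n_samples, n_classes).
    y_pred: predicted probabilities, same shape.
    class_weights: one weight per class, shape (n_classes,).
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip to keep the logarithm finite (assumption of this sketch).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    weights = np.asarray(class_weights, dtype=float)

    per_sample = -np.sum(weights * y_true * np.log(y_pred), axis=1)
    return per_sample.mean()
```

A mistake on a class with weight 1.8 then costs 1.8 times as much as the same mistake on a class with weight 1, which is how rarer classes are given more influence on the gradient.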
2.3 Heatmaps of disease presence.
We aim at computing a heatmap of pixel-wise disease presence using only the patch-wise information during training. This can be seen as a way to automatically refine the precision of the annotations. In this problem setting, the ground truth patches are considered as weak labels.
To compute this side result we use a different network from the one in Section 2.2. It is a CNN with a U-net architecture [11] followed by a global pooling layer, based on the network described in Dubost et al. [10]. To compute the heatmap on the complete axial slice (Fig. 3), we average the patch-wise heatmap predictions in the areas where they overlap.
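The averaging of overlapping patch-wise heatmaps can be sketched as below. The function name, the use of top-left corner coordinates, and the convention that pixels covered by no patch are set to 0 are assumptions of this example; patches are assumed to lie fully inside the slice.

```python
import numpy as np


def assemble_slice_heatmap(patch_heatmaps, top_left_corners, slice_shape):
    """Average patch-wise heatmaps into a single slice-level heatmap.

    patch_heatmaps: list of 2D arrays predicted for individual patches.
    top_left_corners: list of (row, col) positions of each patch in the slice.
    slice_shape: (height, width) of the axial slice.
    """
    total = np.zeros(slice_shape, dtype=float)
    count = np.zeros(slice_shape, dtype=float)

    for heatmap, (row, col) in zip(patch_heatmaps, top_left_corners):
        height, width = heatmap.shape
        total[row:row + height, col:col + width] += heatmap
        count[row:row + height, col:col + width] += 1.0

    # Mean over the patches covering each pixel; uncovered pixels stay at 0.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)
```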
The method is evaluated on a test set of 40 CT scans. We compare the performance of the proposed method with a baseline method based on a random forest classifier and with a direct application of our CNN on all classes. The two networks of the cascade are evaluated separately. For detection we evaluate the true positive rate (TPR), true negative rate (TNR) and accuracy. For multi-class classification we evaluate the average F1-score over all classes, the F1-score being the harmonic mean of precision and recall. Each network is compared to the baseline methods. At the end of the cascade, the results of the two networks are combined and evaluated.
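For reference, these metrics can be computed as in the following NumPy sketch. The function names are illustrative, and returning 0 when a denominator is empty is a convention chosen for the example.

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """Return (TPR, TNR, accuracy) for binary labels, with 1 = diseased."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)

    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)

    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / y_true.size
    return tpr, tnr, accuracy


def class_averaged_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1-scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    scores = []
    for c in classes:
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = 2 * tp + fp + fn
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))
```

Averaging the per-class scores without weighting gives rare classes such as mucus and atelectasis the same influence on the final number as frequent ones.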
The scans were acquired at Erasmus MC-Sophia and annotated using an in-house developed annotation tool, PRAGMA [3]. The dataset consists of 194 scans of children aged from 1 year and 11 months to 18 years, with an average age of 9 years. 144 patients have abnormalities as a consequence of CF, while the remaining 50 patients have no annotated abnormalities. The slice thickness ranges from 0.75 to 3 mm, with an average of 1.85 mm. The slice spacing varies, up to 7 mm. The scans were reconstructed with different kernels, B60S and B75f being the most common. All scans were taken at full-inspiration breath-hold. Each scan has 10 slices annotated patch-wise. Each original patch was first resized to 20 by 20 pixels. To incorporate the necessary surrounding information, the area around each patch is included in the network input, so that every extracted patch has a final size of 60 by 60 pixels. The number of patches and their distribution are presented in Table 1.
The image is overlaid with a square grid. The grid size varies across scans and is defined as one-twentieth of the lung width at the carina (ridge at the base of the trachea). If a grid cell/patch is covered by diseased lung tissue for at least 50% of its surface, the patch is annotated as either bronchiectasis, mucus plug, atelectasis or abnormal (presence of unusual lung texture not falling into the other categories). Otherwise the patch is annotated as healthy. This leads to a five-class patch classification problem. In case several diseases are present within the same patch, only one label is assigned. The class is selected according to a hierarchical system, from highest to lowest priority: bronchiectasis, mucus plug, bronchial wall thickening, atelectasis and normal structure.
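One way to obtain a 60 by 60 input in which the annotated cell occupies the central 20 by 20 pixels is to cut out a window three cells wide around the annotation and resample it. The sketch below does this with NumPy. Edge padding at the slice border and nearest-neighbour resampling are assumptions of the example, as are the function and parameter names.

```python
import numpy as np


def extract_context_patch(slice_2d, row, col, cell_size, out_size=60):
    """Extract an annotated grid cell together with its surrounding context.

    slice_2d: 2D array holding one axial slice.
    row, col: top-left corner of the annotated cell.
    cell_size: side length of the annotated cell in pixels.
    Returns an (out_size, out_size) array in which the annotated cell
    occupies the central third.
    """
    slice_2d = np.asarray(slice_2d)
    # Pad by one cell so that cells at the border still get context
    # (edge padding is an assumption of this sketch).
    padded = np.pad(slice_2d, cell_size, mode="edge")

    # In padded coordinates the 3x3-cell window starts at (row, col).
    region = padded[row:row + 3 * cell_size, col:col + 3 * cell_size]

    # Nearest-neighbour resampling to the fixed network input size.
    idx = np.arange(out_size) * region.shape[0] // out_size
    return region[np.ix_(idx, idx)]
```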
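The labelling rule for a single grid cell can be sketched as follows. This is one reading of the rule, not the annotation tool's actual logic: it assumes that the per-label coverage fractions do not overlap, so that their sum is the diseased fraction of the cell, and it falls back to "abnormal" when the diseased tissue matches none of the listed categories.

```python
# Priority order as listed in the text, from highest to lowest.
PRIORITY = (
    "bronchiectasis",
    "mucus plug",
    "bronchial wall thickening",
    "atelectasis",
)


def label_grid_cell(coverage, min_fraction=0.5):
    """Assign one label to a grid cell.

    coverage: dict mapping an abnormality name to the fraction of the
        cell surface it covers (assumed non-overlapping).
    min_fraction: minimum diseased fraction for a non-healthy label.
    """
    diseased_fraction = sum(coverage.values())
    if diseased_fraction < min_fraction:
        return "healthy"

    # Several abnormalities may be present: keep the highest priority one.
    for label in PRIORITY:
        if coverage.get(label, 0.0) > 0.0:
            return label

    # Diseased tissue outside the listed categories (assumed fallback).
    return "abnormal"
```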
3.2 Baseline Methods.
We compare our method with two competitive non-cascaded approaches: a single (direct) CNN with the same architecture as our networks, and a random forest classifier with features similar to those presented in Ciompi et al. [4]. For the random forest classifier, the feature vector of a 60x60 patch is computed as follows. We first compute 15 filtered versions of the patch, based on Gaussian filters, gradient magnitude, Laplacian and eigenvalues of the Hessian matrix, each at several kernel sizes. We then compute 16 intensity histograms, one for each of the 15 filtered patches plus one for the original intensities. Each histogram has 100 bins, and histogram equalization is subsequently applied. These 16 histograms are then concatenated into a single feature vector.
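A feature vector of this shape can be sketched with `scipy.ndimage`. Several details below are assumptions made for the example: the three scales in `SIGMAS` are hypothetical, the 15 filters are obtained as five responses (Gaussian, gradient magnitude, Laplacian, two Hessian eigenvalues) at three scales, each histogram is normalized to sum to 1, and the histogram equalization step is omitted.

```python
import numpy as np
from scipy import ndimage

SIGMAS = (1.0, 2.0, 4.0)  # hypothetical scales, not taken from the paper


def hessian_eigenvalues(patch, sigma):
    """Eigenvalues of the 2D Hessian from Gaussian second derivatives."""
    h_rr = ndimage.gaussian_filter(patch, sigma, order=(2, 0))
    h_cc = ndimage.gaussian_filter(patch, sigma, order=(0, 2))
    h_rc = ndimage.gaussian_filter(patch, sigma, order=(1, 1))
    mean = 0.5 * (h_rr + h_cc)
    delta = np.sqrt((0.5 * (h_rr - h_cc)) ** 2 + h_rc ** 2)
    return mean + delta, mean - delta


def patch_features(patch, n_bins=100):
    """Concatenate 16 histograms: the raw patch plus 15 filter responses."""
    patch = np.asarray(patch, dtype=float)
    images = [patch]
    for sigma in SIGMAS:
        images.append(ndimage.gaussian_filter(patch, sigma))
        images.append(ndimage.gaussian_gradient_magnitude(patch, sigma))
        images.append(ndimage.gaussian_laplace(patch, sigma))
        images.extend(hessian_eigenvalues(patch, sigma))

    histograms = []
    for image in images:
        hist, _ = np.histogram(image, bins=n_bins)
        histograms.append(hist / hist.sum())  # normalize each histogram
    return np.concatenate(histograms)
```

For a 60x60 patch this yields a vector of 16 x 100 = 1600 values, which could then be passed to any random forest implementation.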
3.3 Experimental Settings.
The algorithms are implemented in Python with the Keras and Theano libraries. The experiments ran on an NVIDIA GeForce GTX 1070 GPU. Data processing was done in MATLAB and Python. The class weights in the cross-entropy loss function of the second network are the following: bronchiectasis 1.2, abnormal 1, mucus 1.8 and atelectasis 1.8. For the direct CNN baseline, the disease weights are the same and the weight of the healthy class is 0.005. Mucus and atelectasis occur less frequently and hence have higher weights. The abnormal class has a low intra-observer agreement and a low clinical priority; we therefore assign a lower weight to this class. To compensate for the healthy/disease class imbalance, the disease patches are replicated 16 times for the training of the first network. This leads to a 1/4 disease/healthy balance. As our batch size is 64, both labels are, on average, present in each batch.
Table 2 compares the class-averaged F1-score (as defined above) of the random forest, the direct CNN and the cascaded CNNs. The cascaded CNNs outperform the random forest by 0.10 and the direct CNN by 0.12.
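The replication step can be sketched as below. It assumes that "replicated 16 times" means each disease patch appears 16 times in the training set; the function name and the shuffling seed are illustrative. Under that reading, a 1/4 disease/healthy balance after replication corresponds to roughly one disease patch per 64 healthy patches beforehand.

```python
import numpy as np


def oversample_disease(patches, is_disease, n_copies=16, seed=0):
    """Replicate disease patches so that each appears n_copies times.

    patches: array of samples, indexed along the first axis.
    is_disease: boolean array, True for diseased patches.
    Returns the shuffled, oversampled patches and labels.
    """
    patches = np.asarray(patches)
    is_disease = np.asarray(is_disease, dtype=bool)

    disease_idx = np.flatnonzero(is_disease)
    healthy_idx = np.flatnonzero(~is_disease)
    idx = np.concatenate([healthy_idx, np.repeat(disease_idx, n_copies)])

    # Shuffle so that replicated patches are spread across the batches.
    rng = np.random.default_rng(seed)
    rng.shuffle(idx)
    return patches[idx], is_disease[idx]
```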
Table 2: Class-averaged F1-score of the Random Forest, the Direct CNN and the Cascaded CNNs.
In Fig. 3 we detail the classification results for every class and compare them with the baseline methods. For the binary healthy/disease classification, in order to compare the direct CNN with the cascaded CNNs, all disease predictions of the direct CNN are grouped together in the disease class. The cascaded CNNs greatly outperform the two other approaches in every metric.
For abnormality scoring, the direct CNN and the cascaded CNNs show similar performance. The cascaded CNNs fail at scoring the abnormal class. This happens because the abnormal class represents unusual structures that do not resemble the other diseases. As mentioned before, the abnormal class is poorly annotated by the clinicians. In the intra-observer agreement study, this class has an intraclass correlation of 0.13. Therefore this class is not included in the class-averaged F1-score reported in Table 2. In the confusion matrices for disease classification, mucus appears, for all methods, to often be predicted as bronchiectasis. This can be explained by the fact that mucus plugging often coexists with bronchiectasis, filling dilated airways, in which case the PRAGMA score would label this as bronchiectasis.
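The grouping used for this binary comparison amounts to mapping every non-healthy prediction to a single disease class, as in this short sketch. The label string "healthy" and the function name are assumptions of the example.

```python
import numpy as np


def to_binary(labels, healthy_label="healthy"):
    """Map multi-class predictions to a binary healthy/disease decision.

    Returns a boolean array that is True wherever the predicted label
    is any of the disease classes.
    """
    return np.array([label != healthy_label for label in labels], dtype=bool)
```

The resulting boolean array can be scored with the same TPR, TNR and accuracy as the output of the first network of the cascade.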
Cascade CNN outperforms all other approaches for the overall classification. The gap in average F1-score between cascade CNN and other approaches is due mainly to an accurate detection of disease in the binary classification, reducing significantly the number of false detections in comparison with the other methods.
In Fig. 4, we show some examples of the pixel-wise heatmaps of disease presence computed as explained in Section 2.3. The heatmaps are thresholded to highlight the strongest predictions. Bronchiectasis is localized quite accurately. Atelectasis and mucus seem more difficult to detect. This might be because of the imbalance of the data. As mentioned in the data section, bronchiectasis is annotated in our dataset with the highest (clinical) priority. This may have introduced a bias leading to an overestimation of this class. In Fig. 3, all approaches overestimate bronchiectasis, leading to a high number of false positives and lowering the F1-score for bronchiectasis.
4 Discussion and Conclusion
Our results show that the proposed method is able to detect abnormality and estimate a score for each disease in early-stage CF lung disease. The computed heatmaps highlight abnormal regions and provide more precise quantification. The random forest approach presented in Ciompi et al. [4] showed good performance in cases of severe advanced lung disease but is outperformed by our method on data with only mild disease.
Our network is similar to the one in Anthimopoulos et al. [5]. Both networks have a similar architecture and are trained for multi-class texture classification in lungs. In Anthimopoulos et al. [5] the network is designed for detection of irregularities in the pulmonary interstitium, while our network is trained for classification of early signs of structural lung damage in cystic fibrosis. These two problems are different and present different challenges.
We used a 2D approach. The use of 3D convolutions could be beneficial for the thin-slice scans; however, because of the large variation in slice spacing (up to 7 mm) in our data, we opted for 2D convolutions.
We proposed a cascade of two convolutional neural networks for lung texture classification in early stages of cystic fibrosis. The method combines a binary classification, which discriminates between healthy and abnormal lung tissue, with a second network that performs a multi-class classification to score the different types of abnormalities. Our method outperforms the random forest baseline by 0.10 in class-averaged F1-score. We also propose to compute pixel-wise abnormality maps using only patch-wise information for training. This can be considered as a way to refine the manual annotations and circumvent the ambiguity inherent to patch-wise annotations.
- [1] Brody, A.S., 2004. Early morphologic changes in the lungs of asymptomatic infants and young children with cystic fibrosis.
- [2] Loeve, M., Van Hal, P.T.W., Robinson, P., Williams, T.J., Nossent, G. and Tiddens, H., 2009. The spectrum of structural abnormalities on CT scans from CF patients with severe advanced lung disease. Thorax.
- [3] Rosenow, T., Oudraad, M.C., Murray, C.P., Turkovic, L., Kuo, W., de Bruijne, M., Ranganathan, S.C., Tiddens, H.A. and Stick, S.M., 2015. PRAGMA-CF. A quantitative structural lung disease computed tomography outcome in young children with cystic fibrosis. American Journal of Respiratory and Critical Care Medicine, 191(10), pp.1158-1165.
- [4] Ciompi, F., Palaioroutas, A., Loeve, M., Pujol, O., Radeva, P., Tiddens, H. and de Bruijne, M., 2011. Lung tissue classification in severe advanced cystic fibrosis from CT scans. Fourth Int Work Pulm Image Anal, pp.57-68.
- [5] Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A. and Mougiakakou, S., 2016. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Transactions on Medical Imaging, 35(5), pp.1207-1216.
- [6] Gao, M., Bagci, U., Lu, L., Wu, A., Buty, M., Shin, H.C., Roth, H., Papadakis, G.Z., Depeursinge, A., Summers, R.M. and Xu, Z., 2016. Holistic classification of CT attenuation patterns for interstitial lung diseases via deep convolutional neural networks. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, pp.1-6.
- [7] Wang, Q., Zheng, Y., Jin, W. and Chen, X., 2017. Multi-Scale Rotation-Invariant Convolutional Neural Networks for Lung Texture Classification. IEEE Journal of Biomedical and Health Informatics.
- [8] Ioffe, S. and Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning.
- [9] Maas, A.L., Hannun, A.Y. and Ng, A.Y., 2013. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML (Vol. 30, No. 1).
- [10] Dubost, F., Bortsova, G., Adams, H., Ikram, A., Niessen, W., Vernooij, M. and De Bruijne, M., 2017. GP-Unet: Lesion Detection from Weak Labels with a 3D Regression Network. MICCAI 2017.
- [11] Ronneberger, O., Fischer, P. and Brox, T. U-net: Convolutional networks for biomedical image segmentation. MICCAI 2015.
- [12] Lin, M., Chen, Q. and Yan, S., 2013. Network in network. ICLR 2014.