RUN: Residual U-Net for Computer-Aided Detection of Pulmonary Nodules without Candidate Selection
The early detection and diagnosis of lung cancer are crucial to improving the survival rate of lung cancer patients, and pulmonary nodule detection results have a significant impact on the later diagnosis. In this work, we propose a new network named RUN that completes nodule detection in a single step by bypassing candidate selection. The system introduces the shortcut of the residual network into the traditional U-Net, thereby addressing the poor results caused by U-Net’s lack of depth. Furthermore, we compare the experimental results with those of the traditional U-Net. We validate our method on the LUng Nodule Analysis 2016 (LUNA16) Nodule Detection Challenge. We achieve a sensitivity of 90.90% at 2 false positives per scan and therefore perform better than the state-of-the-art approaches we compare against.
Keywords: computer-aided detection, lung cancer, pulmonary nodules, deep learning, residual network, U-Net
Lung cancer has the highest morbidity and mortality of any cancer in China. At present, surgery is still the only treatment that can cure lung cancer. Since the disease has no symptoms in its early stages, 70–80% of lung cancer patients are diagnosed when the cancer is already at an advanced stage and have thereby lost the chance of successful surgical treatment. This situation is now improving with the development of artificial intelligence for medical image processing.
The early manifestation of lung cancer in medical imaging is usually a solitary pulmonary nodule (SPN). However, because an entire image contains a large amount of information, it is easy for the human eye to miss small nodules, and experts are also prone to misdiagnosis when they are tired. Therefore, computer-aided diagnosis (CAD) systems have gradually been developed. Early research on CAD of lung cancer mainly used X-ray film. However, since X-ray imaging is based on the density of each detection site, it often misses tiny nodules and nodules hidden behind the heart and blood vessels, leading to poor end results. With the continuous development and improvement of imaging technology, low-dose CT scanning has gradually shown its superiority: it can detect tumors only millimeters in size and has become one of the most effective methods for detecting solitary pulmonary nodules in the early stages of lung cancer. Accurate lung nodule detection is the key to CAD for lung cancer diagnosis and is usually divided into two main phases: nodule candidate detection and false positive reduction. At present, nodule detection methods fall into two types: 1) Traditional machine learning methods Messay et al. (2010); Filho et al. (2016); Gonçalves et al. (2016); Javaid et al. (2016); Lan et al. (2018): the region of interest (ROI) is extracted, its features are calculated, and finally classifiers are used for classification. Both the choice of features and the choice of classifier have a great impact on the final result. 2) Deep learning methods Simonyan and Zisserman (2014); Gruetzemacher and Gupta (2016); Ding et al. (2017); Li et al. (2016): the network structure is built, the model is trained, and the trained model is then used to classify the data. Since convolutional neural networks learn features autonomously, the manual feature selection process of traditional methods is avoided.
In this paper, we mainly study the classification of true and false nodules based on deep learning algorithms. At present, there are many deep learning models, such as CNN Krizhevsky et al. (2012); Li et al. (2016), DBN Mohamed et al. (2012); Li et al. (2015), RNN Liu et al. (2014); Karpathy and Li (2014), and GAN Goodfellow et al. (2014); Radford et al. (2015). In our experiments, we focus on two other kinds of models: U-Net Ronneberger et al. (2015) and the residual network He et al. (2015).
U-Net, which was proposed in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox Ronneberger et al. (2015), won the International Symposium on Biomedical Imaging (ISBI) competition in 2015. The entire network contains a total of 23 convolutional layers and is built from convolutions, max poolings, up-convolutions, and a final 1×1 convolution. In general, it can be regarded as an encoder-decoder structure: the encoder gradually reduces the spatial dimension through pooling layers, and the decoder gradually recovers the details of the object and restores the spatial dimensions. Skip connections link the encoder and the decoder. The residual network, which was proposed by Kaiming He, Xiangyu Zhang, and colleagues in 2015 He et al. (2015), won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It increases the network depth without degradation by stacking additional layers (called identity mappings) on top of a shallower network. It introduces the concept of a “shortcut”, which skips one or more layers and adds the input directly to the output of the skipped layers. Features are extracted by adding the output of several cascaded layers to the input, which reduces the training parameters. Both U-Net and the residual network have a simple structure and train quickly, but U-Net’s depth is somewhat insufficient, whereas the residual network effectively solves the degradation problem of extremely deep convolutional neural networks. Therefore, we combine the two networks and propose a new network called RUN. Compared with other methods, our biggest advantage is that only one network is used to implement an end-to-end classification system directly, without candidate nodule extraction.
This paper presents an easy-to-implement lung nodule detection framework. On the one hand, the feature extraction process of traditional methods is optimized. On the other hand, we use just one network to obtain the nodule detection results directly, which guarantees effectiveness and simplifies the detection process. The implementation of the system is shown in Figure 1; it comprises only three stages: preprocessing (lung extraction), model training, and classification.
Preprocessing improves overall system accuracy by enhancing image quality. At this stage, to remove the influence of the background, the image is denoised and then segmented using a threshold method and a morphology-based method, yielding a refined lung image Gomathi and Thangaraj (2009).
2.2 Improved network structure
Although the original U-Net model is easy to train, its lack of depth limits the accuracy of the experimental results to a certain extent. The Residual Attention Network Wang et al. (2017) reaches great depth by stacking Attention Modules. We therefore propose to introduce the main idea of the residual network into U-Net, which deepens the network while keeping training effective. Similar to the Residual Attention Network, we stack the main component of the residual network, the residual unit; each unit contains a “shortcut connection” and an “identity mapping.” This deepens the network and preserves more detailed features at the same time. The entire network still has a U-shaped structure, with downsampling followed by upsampling, and each downsampled feature map is merged with the corresponding upsampled feature map. Finally, the classification result is obtained through the fully-connected layer.
2.2.1 Residual unit
Each residual unit can be expressed by the following formula:

$$x_{l+1} = H(x_l) + F(x_l, W_l) \qquad (1)$$

Here $x_l$ and $x_{l+1}$ represent the input and output of the $l$-th residual unit, respectively. $W_l = \{W_{l,k} \mid 1 \le k \le K\}$ is the set of weights (and biases) of the $l$-th residual unit, and $K$ is the number of weighted layers contained in each residual unit ($K = 2$). $F$ represents the residual function, which stacks two 3×3 convolutional layers. The function $H$ is an identity mapping: $H(x_l) = x_l$. Rectified Linear Unit (ReLU) and Batch Normalization (BN) are used as “pre-activation” of the weight layers. The specific residual unit design is shown in Figure 2, which differs from the traditional structure He et al. (2016).
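The residual unit described above can be illustrated with a minimal NumPy sketch. It is not the design of Figure 2: it assumes a single-channel feature map, a simplified normalization without learned scale and shift in place of BN, and hypothetical helper names (`conv3x3`, `bn_relu`, `residual_unit`). It only shows the pre-activation ordering and the identity shortcut.

```python
import numpy as np

def conv3x3(x, w):
    """Zero-padded ('same') 3x3 cross-correlation of a 2D map x with kernel w."""
    h, wd = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += w[dy, dx] * padded[dy:dy + h, dx:dx + wd]
    return out

def bn_relu(x, eps=1e-5):
    """Pre-activation: normalize to zero mean and unit variance, then ReLU.
    Assumption: no learned scale/shift, statistics taken over the whole map."""
    x = (x - x.mean()) / np.sqrt(x.var() + eps)
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    """Output = input + F(input), where F is two pre-activated 3x3 convolutions."""
    f = conv3x3(bn_relu(x), w1)
    f = conv3x3(bn_relu(f), w2)
    return x + f  # identity shortcut: the input is added back unchanged

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
w1 = rng.normal(size=(3, 3))
w2 = rng.normal(size=(3, 3))
y = residual_unit(x, w1, w2)
# With a zero second kernel the residual branch vanishes and the unit is the identity.
y_zero = residual_unit(x, w1, np.zeros((3, 3)))
```

When the residual branch contributes nothing, the unit passes its input through unchanged, which is the property that lets such units be stacked without degrading a shallower network.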
2.2.2 Network structure
The whole network is stacked into a U shape and consists of a downsampling path and an upsampling path. In the downsampling path, residual units are introduced to deepen the network. In the upsampling path, to prevent the residual units from propagating additional noise, we use only simple convolution operations.
The network consists of 10 residual units, 4 max-pooling layers, 4 up-convolution layers, 8 convolutional layers, and a final fully-connected layer. The structure of the network is shown in Figure 3. After each downsampling step, the size of the feature map is halved and the number of feature maps is doubled; after each upsampling step, the number of feature maps is halved and the size is doubled, and the result is then merged with the corresponding feature maps from the downsampling path. To prevent over-fitting, we apply dropout during downsampling Srivastava et al. (2014). At the same time, to speed up convergence and further ease the difficulty of training deep neural networks, we use BN during upsampling Ioffe and Szegedy (2015).
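The halving and doubling rule can be traced with a short sketch. It assumes a 512×512 input (Section 3.3) and 32 feature maps at the first level (Table 1); the function names are hypothetical and the code only tracks shapes, not the network itself.

```python
def encoder_shapes(input_size=512, base_filters=32, num_poolings=4):
    """Return (feature map size, number of feature maps) at each encoder level."""
    shapes = [(input_size, base_filters)]
    for _ in range(num_poolings):
        size, filters = shapes[-1]
        # Each max-pooling halves the size; the next level doubles the feature maps.
        shapes.append((size // 2, filters * 2))
    return shapes

def decoder_shapes(bottom, num_upsamplings=4):
    """Return (size, number of feature maps) after each up-convolution."""
    size, filters = bottom
    shapes = []
    for _ in range(num_upsamplings):
        # Each up-convolution doubles the size and halves the feature maps.
        size, filters = size * 2, filters // 2
        shapes.append((size, filters))
    return shapes

enc = encoder_shapes()
dec = decoder_shapes(enc[-1])
for level, (size, filters) in enumerate(enc):
    print(f"encoder level {level}: {size}x{size}, {filters} feature maps")
```

Under these assumptions every decoder level has the same shape as the encoder level it is merged with, which is what makes the merge possible.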
As in U-Net, downsampling extracts features and upsampling performs localization; at the same time, the “shortcut” operation increases network depth and retains more of the original detail features. In contrast to the residual network, which stacks residual units directly, our approach stacks units of different sizes into a U-shaped structure, which avoids over-reliance on hardware performance.
2.3 Loss function
The loss function measures the degree of inconsistency between the model’s predicted value and the true value. In this experiment we use the simple Dice coefficient loss function, which is defined as follows:

$$\mathrm{Loss} = 1 - \mathrm{Dice}(X, Y) \qquad (2)$$

The Dice coefficient, which is a similarity measure, is calculated as follows:

$$\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|} \qquad (3)$$

$X$ represents the predicted value and $Y$ represents the true value. $\cap$ denotes the intersection of the two sets; the denominator $|X| + |Y|$ equals $|X \cup Y| + |X \cap Y|$, where $\cup$ denotes the union. The more similar the two samples are, the closer the coefficient is to 1. Therefore, the larger the Dice coefficient, the smaller the loss and the better the model performs.
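As an illustration of the Dice coefficient described above, the following NumPy sketch computes it for two masks using the standard form, twice the intersection divided by the sum of the two set sizes. Taking the loss as one minus the coefficient is an assumption made here so that a larger coefficient gives a smaller loss; the function names are hypothetical.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Standard Dice: 2 * |X ∩ Y| / (|X| + |Y|) for binary or soft masks."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(pred * target)
    total = np.sum(pred) + np.sum(target)
    if total == 0:
        return 1.0  # assumed convention: two empty masks count as identical
    return 2.0 * intersection / total

def dice_loss(pred, target):
    """Assumed loss form: decreases as the Dice coefficient increases."""
    return 1.0 - dice_coefficient(pred, target)

truth = np.array([[1, 1], [0, 0]])
partial = np.array([[1, 0], [0, 0]])   # overlaps one of the two true pixels
disjoint = np.array([[0, 0], [1, 1]])  # no overlap at all

print(dice_coefficient(partial, truth))  # 2 * 1 / (1 + 2)
print(dice_loss(truth, truth), dice_loss(disjoint, truth))
```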
2.4 Optimization function
The essence of most learning algorithms is to establish an optimization model and to optimize the objective function (or loss function) with an optimization method in order to train the best model. We use the adaptive moment estimation (Adam) optimization algorithm Kingma and Ba (2014) because, after bias correction, the learning rate of each iteration stays within a certain range, which makes the parameter updates more stable. Essentially, the algorithm is RMSprop with a momentum term: it dynamically adjusts the learning rate of each parameter using first and second moment estimates of the gradient. The full calculations are listed in Eq. (4) to Eq. (8):

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \qquad (4)$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (5)$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \qquad (6)$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \qquad (7)$$

$$\theta_{t+1} = \theta_t - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \qquad (8)$$

where $t$ represents the training iteration number, $g_t$ is the gradient, $m_t$ and $v_t$ are the first moment estimate and the second moment estimate respectively, $\hat{m}_t$ and $\hat{v}_t$ denote the bias-corrected first moment and the bias-corrected second moment respectively, and Eq. (8) finally updates the parameters $\theta$. $\beta_1$, $\beta_2$, and $\epsilon$ are adjustable parameters (the general defaults are $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$), and $\alpha$ represents the learning rate.
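The Adam update described above can be written out for a single scalar parameter. This is a sketch of the standard algorithm of Kingma and Ba with its commonly used defaults, not the Keras implementation used in the experiments; the toy objective is an assumption chosen only to exercise the update.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based iteration number."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# After bias correction, the very first step has magnitude close to the learning rate.
first_theta, _, _ = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)

# Toy problem (assumption): minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
print(first_theta, theta)
```

The first step illustrates the stability argument in the text: regardless of the gradient scale, the bias-corrected ratio keeps the step size bounded by roughly the learning rate.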
3 Data and experiment
3.1 Data processing
In our experiments, all the data come from LUNA16. The data set is derived from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) database, which includes 1018 cases acquired from 1010 different patients. After excluding scans with a slice thickness greater than 2.5 mm, 888 CT scans are included in the challenge, and every scan contains annotations made by 4 experienced radiologists. The challenge reference standard consists of all nodules (≥ 3 mm) accepted by at least 3 out of 4 radiologists. The data are divided into 10 subsets; we use 9 of them for training and 1 for testing. To reduce the impact of ribs, each scan’s intensity is clipped to the range from -1200 to 600 Hounsfield units (HU) and subsequently normalized to the range [0, 1].
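The intensity clipping and normalization step can be sketched in a few lines of NumPy. The bounds of -1200 and 600 HU come from the text; the function name and the assumption that the input array is already expressed in Hounsfield units are illustrative.

```python
import numpy as np

HU_MIN, HU_MAX = -1200.0, 600.0  # clipping range stated in the text

def normalize_hu(scan_hu):
    """Clip intensities to [HU_MIN, HU_MAX] and rescale linearly to [0, 1].
    Assumption: scan_hu is already in Hounsfield units."""
    clipped = np.clip(np.asarray(scan_hu, dtype=np.float32), HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)

example = np.array([-2000, -1200, -300, 600, 3000])
normalized = normalize_hu(example)
print(normalized)
```

Values outside the range, such as dense bone above 600 HU, saturate at 0 or 1, so they no longer dominate the dynamic range of the input.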
3.2 Evaluation criteria
We use two evaluation criteria to analyze the performance of the network architecture on the detection task. (1) We use the Dice coefficient to evaluate the predicted results: when the Dice coefficient between the predicted value and the true value is greater than 50%, we count the prediction as a hit. (2) When the predicted nodule area contains the coordinates of a nodule center labeled by the experts, it is also counted as a hit. Otherwise, the prediction is not a hit and is counted as a false positive.
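The two hit criteria can be combined in a small helper. This is a sketch under stated assumptions: masks are 2D binary arrays for one slice, the annotated center is given as a (row, column) pixel index, and the names `dice` and `is_hit` are hypothetical.

```python
import numpy as np

def dice(pred_mask, true_mask):
    """Dice overlap between two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    total = pred_mask.sum() + true_mask.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred_mask, true_mask).sum() / total

def is_hit(pred_mask, true_mask, center_rc, dice_threshold=0.5):
    """Hit if Dice exceeds the threshold OR the prediction covers the annotated center."""
    row, col = center_rc
    overlap_hit = dice(pred_mask, true_mask) > dice_threshold
    center_hit = np.asarray(pred_mask, dtype=bool)[row, col]
    return bool(overlap_hit or center_hit)

true_mask = np.zeros((8, 8), dtype=bool)
true_mask[2:6, 2:6] = True            # 4x4 nodule
center = (3, 3)                       # annotated center inside the nodule

shifted = np.zeros((8, 8), dtype=bool)
shifted[3:7, 3:7] = True              # overlaps 9 of 16 pixels
far_away = np.zeros((8, 8), dtype=bool)
far_away[0:2, 0:2] = True             # no overlap, center not covered
oversized = np.ones((8, 8), dtype=bool)  # low Dice but covers the center

print(is_hit(shifted, true_mask, center),
      is_hit(far_away, true_mask, center),
      is_hit(oversized, true_mask, center))
```

The oversized prediction shows why the second criterion matters: its Dice value is below the threshold, yet it still counts as a hit because it contains the annotated center.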
3.3 Experiment and results
Instead of developing a whole pulmonary nodule detection system, which usually integrates a candidate detector and an FP reducer, our method completes the detection task with only one network. Because of the high computational cost, we train on axial slices instead of entire cases Ding et al. (2017). Throughout training, each model’s inputs are 512×512 images with the per-pixel mean subtracted. The dropout strategy is applied in the convolutional and fully connected layers to improve the generalization capability of each model. The batch size depends on the GPU memory, and Adam is used to optimize the model. We train RUN for 60 epochs in total; the learning rate starts at 0.01, is reduced to 0.001 after epoch 10, and is reduced to 0.0001 halfway through training. At the testing stage, CT images are preprocessed in the same way as in the training stage. The networks are implemented in Python using the deep learning framework Keras with a TensorFlow backend, on a GeForce GTX 1080 Ti GPU.
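The step-wise learning rate schedule can be expressed as a plain function of the epoch index. The exact boundaries are an interpretation: the sketch assumes 0-based epoch indices, so that the first ten epochs use 0.01 and the second half of training (index 30 onward for 60 epochs) uses 0.0001. The function name is hypothetical.

```python
def learning_rate(epoch, total_epochs=60):
    """Step schedule; epoch is a 0-based index (assumed convention)."""
    if epoch < 10:
        return 0.01          # first ten epochs
    if epoch < total_epochs // 2:
        return 0.001         # until the halfway point
    return 0.0001            # second half of training

schedule = [learning_rate(e) for e in range(60)]
print(schedule[0], schedule[10], schedule[30])
```

A function of this shape could be passed to a scheduler callback such as Keras's `LearningRateScheduler`; whether the original experiments were configured that way is not stated in the text.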
Furthermore, to demonstrate the effectiveness of the RUN network structure, we adjust the corresponding parameter settings to train a U-Net and a dual-path residual U-Net. As Table 1 shows, their network structures are similar to the RUN structure we propose. In U-Net, neither the downsampling path nor the upsampling path contains residual units. In the dual-path residual U-Net, by contrast, both the downsampling and upsampling paths use the residual unit as the basic building block.
Table 1: Network structures of U-Net, dual-path residual U-Net, and RUN.

| U-Net | Dual-Path Residual U-Net | RUN: Residual U-Net |
|---|---|---|
| conv-32 | residual unit-32 | residual unit-32 |
| conv-32 | residual unit-32 | residual unit-32 |
| conv-64 | residual unit-64 | residual unit-64 |
| conv-64 | residual unit-64 | residual unit-64 |
| conv-128 | residual unit-128 | residual unit-128 |
| conv-128 | residual unit-128 | residual unit-128 |
| conv-256 | residual unit-256 | residual unit-256 |
| conv-256 | residual unit-256 | residual unit-256 |
| conv-512 | residual unit-512 | residual unit-512 |
| conv-512 | residual unit-512 | residual unit-512 |
Table 2: Comparison of the experimental results.

| Network Structure | Training Parameters | Dice Coefficient |
|---|---|---|
| Dual-Path Residual U-Net | 8,993,157 | 63.05% |
The comparison of the experimental results is shown in Table 2 and Figure 4. Table 2 shows that, because introducing residual units deepens the network, RUN has more training parameters than U-Net but fewer than the dual-path residual U-Net. In addition, the segmentation performance of RUN is significantly better than that of U-Net, with an improvement of about 6%, whereas the dual-path residual U-Net produces more erroneous segmentations.
In Figure 4, we show the segmentation results of the three networks on the same CT images. The CT images in the rows show a small nodule, a smaller nodule, a medium nodule, a larger nodule, and a large nodule, respectively. These results clearly show that, when the segmentation target is a small nodule, RUN performs better than the other two methods.
Currently, FPs/scan rates between 1 and 4 are mostly preferred in clinical practice Ginneken et al. (2010). As Figure 5 shows, our method yields a sensitivity of 79.05% at 0.11 FPs/scan and 90.90% at 2 FPs/scan. Therefore, our results fall within the range preferred for clinical use. In addition, to show the performance of the proposed method, we compare it with other state-of-the-art methods designed for lung nodule detection. The result is shown in Table 3.
Table 3: Comparison with other lung nodule detection systems.

| Lung nodule detection system | Cases | Sensitivity (%) | FPs per case |
|---|---|---|---|
| Messay et al. (2010) | 84 | 82.66 | 3 |
| Bergtholdt et al. (2016) | 243 | 85.9 | 2.5 |
| Li et al. (2016) | 1010 | 87.1 | 4.622 |
| Golan et al. (2016) | 1018 | 71.2 | 10 |
| Huang et al. (2017) | 99 | 90 | 5 |
| Setio et al. (2016) | 888 | 90.1 | 4 |
| Dou et al. (2017) | 888 | 90.6 | 2 |
| The proposed method | 888 | 90.9 | 2 |
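Operating points such as "sensitivity at a given number of FPs per scan" can be computed from a list of scored detections. The sketch below is illustrative only: it assumes each detection carries a hypothetical confidence score and a flag saying whether it hit an annotated nodule, that every true positive matches a distinct nodule, and that the toy numbers are invented for the example rather than taken from the experiments.

```python
def operating_point(detections, total_nodules, num_scans, threshold):
    """Return (sensitivity, false positives per scan) at a score threshold.
    detections: list of (score, is_true_positive) pairs over all scans."""
    kept = [d for d in detections if d[0] >= threshold]
    true_positives = sum(1 for _, is_tp in kept if is_tp)
    false_positives = len(kept) - true_positives
    return true_positives / total_nodules, false_positives / num_scans

# Toy data (assumption): 6 detections over 2 scans containing 4 annotated nodules.
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.3, False), (0.2, False)]

strict = operating_point(toy, total_nodules=4, num_scans=2, threshold=0.85)
medium = operating_point(toy, total_nodules=4, num_scans=2, threshold=0.5)
loose = operating_point(toy, total_nodules=4, num_scans=2, threshold=0.1)
print(strict, medium, loose)
```

Lowering the threshold can only add detections, so sensitivity and FPs per scan both grow or stay the same; sweeping the threshold traces the kind of curve from which a single operating point is read off.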
We introduce the “shortcut” of the residual network into the traditional U-Net to obtain RUN for pulmonary nodule detection. Because pulmonary nodules vary greatly in size (diameters range from 3 mm to 30 mm), many successful detection and diagnosis systems have employed a multi-scale architecture Setio et al. (2016); Gori (2007); Wei et al. (2015); Dou et al. (2017). However, these methods need nodule patches of different sizes as input and must adjust the size of the receptive field according to the nodule size, so the setting of the receptive field strongly affects the results. Therefore, in our experiments the entire slice, rather than a patch, is used as the network input during training. We also examine how the spatial information of nodules of different sizes influences the detection task by comparing single-slice and multi-slice training. We find that very small nodules may cause multi-slice training to learn incorrectly and lead to more FPs, but in general, taking more spatial information into account helps reduce misdiagnosis. Therefore, if the number of slices used for training can be chosen according to nodule size, the results should improve significantly.
In this paper, we present a network called RUN for computer-aided detection of pulmonary nodules from volumetric CT scans that skips the candidate selection stage. Using LUNA16, we show that this network learns well on complex and variable lung nodules. In principle, the proposed framework is generic and can easily be extended to other object detection tasks in medical images. Future work includes evaluation on more clinical data and the investigation of further methods to achieve better results for clinical use, such as the three-class classification problem of pulmonary nodules.
- Messay et al. (2010) Messay, T., Hardie, R. C., Rogers, S. K., 2010. A new computationally efficient cad system for pulmonary nodule detection in ct imagery. Medical Image Analysis 14 (3), 390–406.
- Filho et al. (2016) Filho, A. O. D. C., Silva, A. C., Paiva, A. C. D., Nunes, R. A., Gattass, M., 2016. 3d shape analysis to reduce false positives for lung nodule detection systems. Medical & Biological Engineering & Computing 55 (8), 1–15.
- Gonçalves et al. (2016) Gonçalves, L., Novo, J., Campilho, A., 2016. Hessian based approaches for 3D lung nodule segmentation. Pergamon Press, Inc.
- Javaid et al. (2016) Javaid, M., Javid, M., Rehman, M. Z. U., Shah, S. I. A., 2016. A novel approach to cad system for the detection of lung nodules in ct images. Computer Methods & Programs in Biomedicine 135 (C), 125–139.
- Lan et al. (2018) Lan, T., Chen, S., Li, Y., Ding, Y., Qin, Z., Wang, X., 2018. Lung nodule detection based on the combination of morphometric and texture features. Med. Imaging Health Inf., 464–471.
- Simonyan and Zisserman (2014) Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. Computer Science.
- Gruetzemacher and Gupta (2016) Gruetzemacher, R., Gupta, A., 2016. Using deep learning for pulmonary nodule detection & diagnosis.
- Ding et al. (2017) Ding, J., Li, A., Hu, Z., Wang, L., 2017. Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks, 559–567.
- Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., Hinton, G. E., 2012. Imagenet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems. pp. 1097–1105.
- Li et al. (2016) Li, W., Cao, P., Zhao, D., Wang, J., 2016. Pulmonary nodule classification with deep convolutional neural networks on computed tomography images. Computational and Mathematical Methods in Medicine 2016, 1–7.
- Mohamed et al. (2012) Mohamed, A. R., Hinton, G., Penn, G., 2012. Understanding how deep belief networks perform acoustic modelling. In: IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 4273–4276.
- Li et al. (2015) Li, T., Zhang, J., Zhang, Y., 2015. Classification of hyperspectral image based on deep belief networks. In: IEEE International Conference on Image Processing. pp. 5132–5136.
- Liu et al. (2014) Liu, S., Yang, N., Li, M., Zhou, M., 2014. A recursive recurrent neural network for statistical machine translation. In: Meeting of the Association for Computational Linguistics. pp. 1491–1500.
- Karpathy and Li (2014) Karpathy, A., Li, F. F., 2014. Deep visual-semantic alignments for generating image descriptions. IEEE Transactions on Pattern Analysis & Machine Intelligence 39 (4), 664–676.
- Goodfellow et al. (2014) Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial networks. Advances in Neural Information Processing Systems 3, 2672–2680.
- Radford et al. (2015) Radford, A., Metz, L., Chintala, S., 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. Computer Science.
- Ronneberger et al. (2015) Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation 9351, 234–241.
- He et al. (2015) He, K., Zhang, X., Ren, S., Sun, J., 2015. Deep residual learning for image recognition, 770–778.
- Gomathi and Thangaraj (2009) Gomathi, M., Thangaraj, P., 2009. Computer aided medical diagnosis system for detection of lung cancer nodules: a survey. International Journal of Computational Intelligence Research.
- Wang et al. (2017) Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X., 2017. Residual attention network for image classification, 6450–6458.
- He et al. (2016) He, K., Zhang, X., Ren, S., Sun, J., 2016. Identity mappings in deep residual networks, 630–645.
- Srivastava et al. (2014) Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (1), 1929–1958.
- Ioffe and Szegedy (2015) Ioffe, S., Szegedy, C., 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. pp. 448–456.
- Kingma and Ba (2014) Kingma, D., Ba, J., 2014. Adam: A method for stochastic optimization. Computer Science.
- Bergtholdt et al. (2016) Bergtholdt, M., Wiemker, R., Klinder, T., 2016. Pulmonary nodule detection using a cascaded svm classifier. In: SPIE Medical Imaging. p. 978513.
- Setio et al. (2016) Setio, A. A., Ciompi, F., Litjens, G., Gerke, P., Jacobs, C., Van, R. S., Winkler, W. M., Naqibullah, M., Sanchez, C., Van, G. B., 2016. Pulmonary nodule detection in ct images: false positive reduction using multi-view convolutional networks. IEEE Transactions on Medical Imaging 35 (5), 1160–1169.
- Dou et al. (2017) Dou, Q., Chen, H., Jin, Y., Lin, H., Qin, J., Heng, P. A., 2017. Automated pulmonary nodule detection via 3d convnets with online sample filtering and hybrid-loss residual learning, 630–638.
- Golan et al. (2016) Golan, R., Jacob, C., Denzinger, J., 2016. Lung nodule detection in ct images using deep convolutional neural networks. In: International Joint Conference on Neural Networks. pp. 243–250.
- Huang et al. (2017) Huang, X., Shan, J., Vaidya, V., 2017. Lung nodule detection in ct using 3d convolutional neural networks. In: IEEE International Symposium on Biomedical Imaging.
- Ginneken et al. (2010) Ginneken, B. V., Iii, S. G. A., Hoop, B. D., Duindam, T., Niemeijer, M., Murphy, K., Schilham, A., Retico, A., Fantacci, M. E., 2010. Comparing and combining algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: The anode09 study. Medical Image Analysis 14 (6), 707–722.
- Gori (2007) Gori, I., 2007. A multi-scale approach to lung nodule detection in computed tomography. International Journal of Computer Assisted Radiology & Surgery 2, S353–S355.
- Wei et al. (2015) Wei, S., Zhou, M., Yang, F., Yang, C., Tian, J., 2015. Multi-scale Convolutional Neural Networks for Lung Nodule Classification. Springer International Publishing.
- Dou et al. (2017) Dou, Q., Chen, H., Yu, L., Qin, J., Heng, P. A., 2017. Multi-level contextual 3d cnns for false positive reduction in pulmonary nodule detection. IEEE Transactions on Biomedical Engineering 64 (7), 1558–1567.