Ensemble of Deep Convolutional Neural Networks for Learning to Detect Retinal Vessels in Fundus Images
Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However, the challenge with large-scale screening is the inability to exhaustively detect the fine blood vessels crucial to disease diagnosis. In this work, we present a computational imaging framework using deep and ensemble learning for reliable detection of blood vessels in fundus color images. An ensemble of deep convolutional neural networks is trained to segment vessel and non-vessel areas of a color fundus image. During inference, the responses of the individual ConvNets of the ensemble are averaged to form the final segmentation. In experimental evaluation with the DRIVE database, we achieve the objective of vessel detection with a maximum average accuracy of 94.7% and an area under the ROC curve of 0.9283.
Computational imaging, deep learning, convolutional neural network, ensemble learning, vessel detection.
Regular screening for pathological conditions of the retina [1, 2] can greatly assist in preventing blindness. Fundus imaging is the most widely used modality for early screening and detection of blindness-causing conditions such as diabetic retinopathy, glaucoma, age-related macular degeneration, and hypertension- and stroke-induced changes. Imaging of the fundus has improved considerably with the progress from film-based photography to electronic imaging sensors, as well as red-free imaging, stereo photography, hyperspectral imaging, angiography, etc., thereby reducing inter- and intra-observer reporting variability. Retinal image analysis has also contributed significantly to this technological development [5, 6]. Since fundus imaging is predominantly used for first-level abnormality screening, research focus includes: (i) detection and segmentation of retinal structures (vessels, fovea, optic disc), (ii) segmentation of abnormalities, and (iii) quality quantification of acquired images to assess their fitness for reporting.
Related Work: The process of clinical reporting of retinal abnormalities is systematic, and lesions are reported with respect to their location relative to the vessels or the optic disc. Computer assisted diagnosis systems are accordingly being developed to improve the clinical workflow. Some of the developments include assessment of image quality; blood vessel detection, branch pattern, diameter and vascular tree analysis [3, 4, 9, 10]; followed by reporting of lesions and their location with respect to the vessels [4, 10]. An important challenge in this context is robust and exhaustive detection of retinal vessels in color fundus images, which is needed to leverage the potential of computer assisted diagnosis [5, 11] and to assist in routine screening [3, 4].
Challenge: Methods for vessel detection and segmentation [5, 6, 7, 9, 10, 12] predominantly use image filters, vector geometry, statistical distribution studies, and machine learning of low-level features and photon distribution models. Such methods rely on handcrafted features or heuristic assumptions and are not generalized to learn pattern attributes from the data itself, which makes their performance subject to each method's inherent weaknesses. Recently, fully data-driven, deep learning based models have been proposed. However, they perform worse than the state-of-the-art methods that use the conventional paradigm. The primary challenge here is to design an end-to-end framework which learns pattern representation from the data without any heuristic information based on domain knowledge, identifies both coarse and fine vascular structures, and is at least on par with, if not better than, the heuristic-based models.
Approach: This paper attempts to reduce subjectivity-induced bias in feature representation by training an ensemble of Convolutional Neural Networks (ConvNets) on raw color fundus images to discriminate vessel pixels from non-vessel ones. Fig. 1 illustrates an example of exhaustive retinal vessel detection using this approach. Each ConvNet has three convolutional layers and two fully connected layers and is trained independently on randomly selected patches from the training images. At inference time, the vesselness probabilities output independently by each ConvNet are averaged to form the final vesselness probability of each pixel.
§II gives a brief theoretical background of the proposed method. The problem statement is formally defined in §III and the proposed approach is described in §IV. The results of experimental evaluation on the DRIVE dataset are presented in §V. The paper concludes in §VI with a summary of the proposed method and a discussion of the possible impact of end-to-end deep learning based solutions for medical image analysis.
II Theoretical Background
This section introduces some concepts regarding ConvNets and ensemble learning which form the pillars of the proposed solution.
Convolutional Neural Networks: Convolutional neural networks (CNN or ConvNet) are a special category of artificial neural networks designed for processing data with a grid-like structure [14, 15]. The ConvNet architecture is based on sparse interactions and parameter sharing and is highly effective for efficient learning of spatial invariances in images [16, 17]. There are four kinds of layers in a typical ConvNet architecture: convolutional (conv), pooling (pool), fully-connected (affine) and rectified linear unit (ReLU). Each convolutional layer transforms one set of feature maps into another set of feature maps by convolution with a set of filters. Mathematically, if $W_i^l$ and $b_i^l$ denote the weights and the bias of the $i^{th}$ filter of the $l^{th}$ convolutional layer and $h_i^l$ is its activation map, then:
$$h_i^l = W_i^l \ast h^{l-1} + b_i^l$$
where $\ast$ is the convolution operator. Pooling layers perform a spatial downsampling of the input feature maps. Pooling helps to make the representation invariant to small translations of the input. Fully-connected layers are similar to the layers in a vanilla neural network. Let $W^l$ denote the incoming weight matrix and $b^l$ the bias vector of a fully-connected layer $l$. Then:
$$h^l = \mathrm{flatten}(h^{l-1}) \otimes W^l \oplus b^l$$
where the $\mathrm{flatten}(\cdot)$ operator tiles the feature maps of the input volume along the height, $\otimes$ is matrix multiplication and $\oplus$ is element-wise addition. ReLU layers perform a pointwise rectification of the input and correspond to the activation function. For the $i^{th}$ unit of layer $l$:
$$h_i^l = \max(h_i^{l-1}, 0)$$
In a deep ConvNet, units in the deeper layers indirectly interact with a larger area of the input, thus forming a high level abstraction of the input data.
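The layer types described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this work: the function names, the array shapes, the choice of max pooling, and the use of cross-correlation (the usual deep learning convention for "convolution") are all assumptions made for the example.

```python
import numpy as np

def conv_layer(h_prev, W, b):
    """Stride-1, unpadded convolutional layer (cross-correlation convention).

    h_prev: (C_in, H, W) input feature maps
    W:      (C_out, C_in, k, k) filters (hypothetical layout)
    b:      (C_out,) biases
    """
    c_out, _, k, _ = W.shape
    _, height, width = h_prev.shape
    out = np.zeros((c_out, height - k + 1, width - k + 1))
    for i in range(c_out):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                patch = h_prev[:, r:r + k, c:c + k]
                out[i, r, c] = np.sum(patch * W[i]) + b[i]
    return out

def relu(h):
    """Pointwise rectification: max(h, 0)."""
    return np.maximum(h, 0.0)

def max_pool(h, size=2, stride=2):
    """Spatial downsampling; max pooling is assumed here."""
    c, height, width = h.shape
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    out = np.empty((c, out_h, out_w))
    for r in range(out_h):
        for col in range(out_w):
            window = h[:, r * stride:r * stride + size,
                       col * stride:col * stride + size]
            out[:, r, col] = window.max(axis=(1, 2))
    return out

def affine(h_prev, W, b):
    """Fully-connected layer: flatten the input volume, then x @ W + b."""
    return h_prev.reshape(-1) @ W + b

# Toy forward pass with arbitrary (hypothetical) sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))                      # 3-channel toy patch
h1 = relu(conv_layer(x, rng.standard_normal((4, 3, 3, 3)), np.zeros(4)))
p1 = max_pool(h1)                                       # shape (4, 3, 3)
scores = affine(p1, rng.standard_normal((36, 2)), np.zeros(2))
```

The explicit loops keep the correspondence with the equations visible; a practical implementation would rely on a vectorized library routine instead.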
Ensemble learning: Ensemble learning is a technique of using multiple models or experts for solving a particular artificial intelligence problem. Ensemble methods seek to promote diversity among the models they combine and reduce the problem of overfitting to the training data. The outputs of the individual models of the ensemble are combined (e.g. by averaging) to form the final prediction. Concretely, if $\{M_1, M_2, \dots, M_k\}$ are the $k$ models of an ensemble and $p(x = y_i \mid M_j)$ is the probability that the input $x$ is classified as $y_i$ under the model $M_j$, then the ensemble predicts:
$$p(x = y_i \mid M_1, \dots, M_k) = \frac{1}{k} \sum_{j=1}^{k} p(x = y_i \mid M_j)$$
Ensemble learning promotes better generalization and often provides higher accuracy of prediction than the individual models.
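As a small worked illustration of combining models by averaging, the sketch below averages per-pixel class probabilities from several models. The probability values, the number of models and the function name `ensemble_average` are made up for the example.

```python
import numpy as np

def ensemble_average(prob_maps):
    """Average class probabilities over models.

    prob_maps: array-like of shape (k, ..., n_classes), one entry per model.
    Returns an array of shape (..., n_classes).
    """
    return np.asarray(prob_maps, dtype=float).mean(axis=0)

# Three hypothetical models, two pixels, classes ordered (non-vessel, vessel).
model_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.7, 0.3], [0.2, 0.8]],
]
avg = ensemble_average(model_probs)   # per-pixel averaged probabilities
labels = avg.argmax(axis=-1)          # most probable class per pixel
```

Because each model outputs a valid probability distribution, the average is also a valid distribution, so no renormalization is needed.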
III Problem Statement
Let $I$ be an image acquired by the RGB sensor of a color fundus camera. The intensity observed at location $x$ is denoted by $I(x)$. Let $N(x)$ be a set of pixels in the local neighborhood of $x$. Let $\Omega = \{\text{vessel}, \text{non-vessel}\}$ be the set of class labels for the pixel at location $x$. In a machine learning framework, the probability of finding a tissue of type $\omega \in \Omega$ at location $x$ is modelled by a class of functions $\mathcal{H}(\omega \mid I, N(x); \theta, \phi)$, where $\theta$ is a set of parameters which are learned from the training data and $\phi$ is a set of hyperparameters tuned using the validation data. In the proposed method, $\mathcal{H}$ is an ensemble of ConvNets, the architecture of which is described next.
IV Proposed Solution
Each layer of a ConvNet transforms one volume of features into another. A volume of features is described as $C \times H \times W$, where $C$ is the number of feature maps of spatial dimension $H \times W$. The input to each ConvNet of the proposed ensemble is a color fundus image patch. The ConvNets have the same organization of layers, which can be described as: input - [conv - relu] - [conv - relu - pool] x 2 - affine - relu - [affine with dropout] - softmax. Fig. 2 gives a schematic diagram of the organization. Each conv layer has receptive field size - , stride - and output volume - . The pool layers have receptive field size - and stride - . Dropout is a regularization method for neural networks that enforces sparsity, prevents co-adaptation of features and promotes better generalization by forcing a fraction of neurons to be inactive during each episode of learning. The output of the final layer is passed to a softmax function which converts the outputs into class probabilities. Let $a_i$ denote the activation of the $i^{th}$ neuron of the fully-connected output layer and $p(y_i)$ denote the posterior probability of the $i^{th}$ output class. Then:
$$p(y_i) = \frac{e^{a_i}}{\sum_{j} e^{a_j}}$$
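The softmax step that converts output-layer activations into class probabilities can be sketched as follows. Subtracting the maximum activation before exponentiating is a common numerical-stability trick and is an addition of this example, not something stated in the text; it leaves the result unchanged because the common factor cancels in the ratio.

```python
import numpy as np

def softmax(a):
    """Convert activations to class probabilities along the last axis."""
    a = np.asarray(a, dtype=float)
    shifted = a - a.max(axis=-1, keepdims=True)  # avoids overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical output-layer activations for one patch, two classes.
probs = softmax([0.0, np.log(3.0)])  # exponentials are 1 and 3
```

For these activations the exponentials are 1 and 3, so the two class probabilities are 1/4 and 3/4.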
V Experimental Results and Discussion
This section presents experimental validation of the proposed technique and a comparison of its performance with earlier methods.
Dataset: The ensemble of ConvNets is evaluated by learning with the DRIVE training set (image id. 21-40) and testing over the DRIVE test set (image id. 1-20). The DRIVE dataset is available at http://www.isi.uu.nl/Research/Databases/DRIVE/.
Learning mechanism: Each ConvNet is trained independently on a set of randomly chosen patches. Learning rate and annealing rate were kept constant across models at and respectively. Dropout probability, regularization coefficient and number of hidden units in the penultimate affine layer of the different models were sampled respectively from , and where denotes uniform probability distribution over a given range. The models were trained using RMSProp algorithm  with minibatch size .
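The per-model sampling of hyperparameters described above can be illustrated with a short sketch. All ranges, the set of hidden-unit choices, the log-uniform treatment of the regularization coefficient and the number of models below are hypothetical placeholders chosen for the example; they are not the values used in the experiments.

```python
import numpy as np

def sample_hyperparameters(rng,
                           dropout_range=(0.5, 0.9),        # hypothetical
                           log10_reg_range=(-5.0, -3.0),    # hypothetical
                           hidden_choices=(128, 256, 512)): # hypothetical
    """Draw one model's hyperparameters from uniform distributions."""
    return {
        "dropout": float(rng.uniform(*dropout_range)),
        # Uniform in the exponent, i.e. log-uniform in the coefficient.
        "reg": float(10.0 ** rng.uniform(*log10_reg_range)),
        "hidden_units": int(rng.choice(hidden_choices)),
    }

rng = np.random.default_rng(0)
# One independent configuration per ensemble member (5 is arbitrary here).
configs = [sample_hyperparameters(rng) for _ in range(5)]
```

Sampling a different configuration for each member is one simple way to encourage diversity among the models of an ensemble.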
Performance assessment: Table I presents the accuracy and consistency of detection in comparison with those reported for earlier techniques. Although our approach does not have the highest accuracy among the compared methods, it outperforms the previously proposed deep learning based method for learning vessel representations from data. The kappa score, a measure of observer consistency, indicates the sensitivity of the technique in detecting both coarse and fine vessels, as desired. Typical responses for detection of both coarse and fine vessels are presented in Fig. 3.
|Method||Max. avg. accuracy||Kappa|
|Maji et al.||0.9327||0.6287|
|Sheet et al.||0.9766||0.8213|
|Staal et al.||0.9422||-|
|Niemeijer et al.||0.9416||0.7145|
|Zana et al.||0.9377||0.6971|
|Jiang et al.||0.9212||0.6399|
|Martínez-Pérez et al.||0.9181||0.6389|
|Chaudhuri et al.||0.8773||0.3357|
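For reference, the two quantities reported in the table can be computed from a binary prediction mask and a ground-truth mask as sketched below. This is a generic illustration of pixel accuracy and Cohen's kappa, not the evaluation code used for the table; the function name and the toy masks are hypothetical.

```python
import numpy as np

def accuracy_and_kappa(pred, truth):
    """Pixel accuracy and Cohen's kappa for two binary masks."""
    pred = np.asarray(pred).astype(bool).ravel()
    truth = np.asarray(truth).astype(bool).ravel()
    n = float(pred.size)
    tp = float(np.sum(pred & truth))
    tn = float(np.sum(~pred & ~truth))
    fp = float(np.sum(pred & ~truth))
    fn = float(np.sum(~pred & truth))
    p_o = (tp + tn) / n                      # observed agreement (accuracy)
    # Agreement expected by chance from the marginal label frequencies.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else 1.0
    return p_o, kappa

# Toy 8-pixel example (1 = vessel, 0 = non-vessel).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
acc, kappa = accuracy_and_kappa(pred, truth)
```

In this toy case the accuracy is 6/8 = 0.75 while kappa is 7/15 (about 0.467), which shows how kappa discounts the agreement that the class imbalance alone would produce.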
Fig. 4 gives the receiver operating characteristic (ROC) curve of the proposed method. The area under the ROC curve obtained is 0.9283.
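An ROC curve and its area can be obtained from per-pixel vesselness probabilities by sweeping the decision threshold, as in the sketch below. This is a generic illustration with made-up scores, not the code that produced the reported value; the function names are hypothetical, and the area is computed with the trapezoidal rule.

```python
import numpy as np

def roc_curve_points(scores, truth):
    """False and true positive rates for every distinct threshold."""
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth).astype(bool).ravel()
    order = np.argsort(-scores, kind="mergesort")   # descending by score
    scores_sorted = scores[order]
    truth_sorted = truth[order]
    tps = np.cumsum(truth_sorted)                   # true positives so far
    fps = np.cumsum(~truth_sorted)                  # false positives so far
    # Keep the last index of each run of tied scores.
    last = np.r_[np.nonzero(np.diff(scores_sorted))[0], scores.size - 1]
    tpr = np.r_[0.0, tps[last] / truth.sum()]
    fpr = np.r_[0.0, fps[last] / (~truth).sum()]
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: six pixels with hypothetical vesselness probabilities.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
truth = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve_points(scores, truth)
area = auc(fpr, tpr)
```

Here 8 of the 9 vessel/non-vessel pixel pairs are ranked correctly, so the area is 8/9.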
VI Conclusion
This paper presents a ConvNet-ensemble based framework for processing color fundus images for detection of coarse and fine vessels. The method is evaluated experimentally on the DRIVE dataset. The ability of ConvNets to recognize images and that of ensemble learning to generalize are leveraged to design a heuristics-independent, data-driven approach to analyzing medical images. This presents a feasible solution to subjectivity-induced bias in medical image analysis and is an improvement over our previous work on data-driven analysis of fundus images. More generally, deep learning combined with ensemble learning provides a strong alternative approach for solving complex medical data analysis problems.
References
- [1] A. Tuulonen, P. J. Airaksinen, A. Montagna, and H. Nieminen, “Screening for glaucoma with a non-mydriatic fundus camera,” Acta Ophthalmol., vol. 68, no. 4, pp. 445–449, 1990.
- [2] E. Stefánsson, T. Bek, M. Porta, N. Larsen, J. K. Kristinsson, and E. Agardh, “Screening and prevention of diabetic blindness,” Acta Ophthalmol., vol. 78, no. 4, pp. 374–385, 2000.
- [3] C. Heneghan, J. Flynn, M. O'Keefe, and M. Cahill, “Characterization of changes in blood vessel width and tortuosity in retinopathy of prematurity using image analysis,” Medical Image Analysis, vol. 6, no. 4, pp. 407–429, 2002.
- [4] T.-Y. Wong, R. Klein, B. E. K. Klein, J. M. Tielsch, L. Hubbard, and F. J. Nieto, “Retinal microvascular abnormalities and their relationship with hypertension, cardiovascular disease, and mortality,” Surv. Ophthalmol., vol. 46, no. 1, pp. 59–80, 2001.
- [5] M. D. Abràmoff, M. K. Garvin, and M. Sonka, “Retinal imaging and image analysis,” IEEE Rev. Biomed. Eng., vol. 3, pp. 169–208, 2010.
- [6] N. Patton, T. M. Aslam, T. MacGillivray, I. J. Deary, B. Dhillon, R. H. Eikelboom, K. Yogesan, and I. J. Constable, “Retinal image analysis: Concepts, applications and potential,” Progress in Retinal and Eye Research, vol. 25, no. 1, pp. 99–127, 2006.
- [7] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” IEEE Trans. Med. Imaging, vol. 23, no. 4, pp. 501–509, Apr. 2004.
- [8] M. E. Gegundez-Arias, A. Aquino, J. M. Bravo, and D. Marin, “A function for quality evaluation of retinal vessel segmentations,” IEEE Trans. Med. Imaging, vol. 31, no. 2, pp. 231–239, Feb. 2012.
- [9] M. Sofka and C. V. Stewart, “Retinal vessel centerline extraction using multiscale matched filters, confidence and edge measures,” IEEE Trans. Med. Imaging, vol. 25, no. 12, pp. 1531–1546, Dec. 2006.
- [10] J. Jan, J. Odstrcilik, J. Gazarek, and R. Kolar, “Retinal image analysis aimed at blood vessel tree segmentation and early detection of neural-layer deterioration,” Comput. Med. Imaging and Graphics, vol. 36, no. 6, pp. 431–441, 2012.
- [11] M. Niemeijer, J. Staal, B. van Ginneken, M. Loog, and M. D. Abràmoff, “Comparative study of retinal vessel segmentation methods on a new publicly available database,” in SPIE Medical Imaging. Proc. SPIE, 2004, vol. 5370, pp. 648–656.
- [12] D. Sheet, S. P. K. Karri, S. Conjeti, S. Ghosh, J. Chatterjee, and A. K. Ray, “Detection of retinal vessels in fundus images through transfer learning of tissue specific photon interaction statistical physics,” in Proc. Int. Symp. Biomed. Imaging, 2013, pp. 1452–1456.
- [13] D. Maji, A. Santara, S. Ghosh, D. Sheet, and P. Mitra, “Deep neural network and random forest hybrid architecture for learning to detect retinal vessels in fundus images,” in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, 2015, pp. 3029–3032.
- [14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, pp. 2278–2324, 1998.
- [15] I. Goodfellow, Y. Bengio, and A. Courville, “Deep learning,” Book in preparation for MIT Press, 2016.
- [16] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
- [17] S. Mallat, “Understanding deep convolutional networks,” arXiv:1601.04920, 2016.
- [18] T. G. Dietterich, “Ensemble methods in machine learning,” LNCS, vol. 1857, pp. 1–15, 2001.
- [19] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.
- [20] G. Hinton, N. Srivastava, and K. Swersky, “Overview of minibatch gradient descent.”