A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks


Inspired by the success of Convolutional Neural Networks (CNN), we develop a novel Computer Aided Detection (CADe) system using CNN for Glioblastoma Multiforme (GBM) detection and segmentation from multi-channel MRI data. A two-stage approach first identifies the presence of GBM. This is followed by GBM localization in each “abnormal” MR slice. As part of the CADe system, two CNN architectures, viz. Classification CNN (c-cnn) and Detection CNN (d-cnn), are employed. The CADe system takes as input MRI data consisting of four sequences ($T_1$, $T_{1C}$, $T_2$, and FLAIR), and automatically generates bounding boxes encompassing the tumor regions in each slice deemed abnormal. Experimental results demonstrate that the proposed CADe system, when used as a preliminary step before segmentation, allows improved delineation of the tumor region while reducing false positives arising in normal areas of the brain. The GrowCut method, employed for tumor segmentation, typically requires a foreground and a background seed region for initialization. Here the algorithm is initialized with seeds automatically generated from the output of the proposed CADe system, resulting in improved performance as compared to that using random seeds.

Convolutional neural network, deep learning, gliomas, MRI, brain tumor segmentation, bounding box, CADe.

I Introduction

Brain tumors are one of the deadliest cancers with a high mortality rate [1, 2]. They can be primary, i.e. directly originating in the brain, or metastatic, i.e. spreading from other parts of the body. Gliomas constitute 70% of malignant primary brain tumors in adults [2], and are usually classified as High Grade Gliomas (HGG) and Low Grade Gliomas (LGG). The HGG encompasses grades III and IV of the WHO categorization [3], exhibiting a rapidly proliferating behaviour with a patient survival period of only about a year [2].

Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and Computed Tomography (CT) are some of the standard radio imaging techniques used for diagnosing abnormalities in the brain. MRI has been extensively employed in diagnosing brain and nervous system abnormalities over the last few decades, due to its improved soft tissue contrast as compared to plain radiography or CT [4, 5, 6]. MR images are usually procured in multiple sequences or modalities, depending on the different excitation and repetition times used during the scan. This enables the capture of distinct structures of interest, by producing noticeably different tissue contrasts [2, 7]. The sequences include $T_1$-weighted, $T_2$-weighted, $T_1$-weighted with contrast enhancement ($T_{1C}$), and $T_2$-weighted with fluid-attenuated inversion recovery (FLAIR). The rationale behind using these four sequences is that different tumor regions may be visible in different sequences, allowing for a more accurate composite marking of the tumor region. Delineation of the tumor region in MRI sequences is of great importance since it allows: i) volumetric measurement of the tumor, ii) monitoring of tumor growth in the patient between multiple MRI scans, and iii) treatment planning with follow-up evaluation.

Tumor segmentation from brain MRI sequences is usually done manually by the radiologist. Being a highly tedious and error prone task, mainly due to factors such as human fatigue, overabundance of MRI slices per patient, and increasing number of patients, manual operations often lead to inaccurate delineation. Moreover, use of qualitative measures of evaluation by radiologists results in high inter- and intra-observer error rates, which are often difficult to characterize [2, 8, 9, 10]. The need for an automated or semi-automated Computer Aided Detection/Diagnosis (CADe/CADx) system thus becomes apparent. Such a system improves the overall performance of the detection and subsequent segmentation of abnormalities, particularly when used as an assistant to the radiologist.

Automated detection is a challenging task due to the variety of shapes, textures and orientations exhibited by the tumor region. Typically the tumor acts as a mass and pushes the normal tissue, thereby changing the overall structure of the brain. Besides, brain MRI slices are known to be affected by Bias Field Distortion (BFD) and other artifacts that change the homogeneity of tissue intensities in different slices of the same brain. Existing methods leave significant room for increased automation, applicability and improved accuracy. Since segmentation of tumors in the brain is typically preceded by their detection, which plays an important role in curbing improper segmentation of the tumor region, we investigate here the automated tumor detection problem in brain MR images.

Recently, deep learning research has witnessed a growing interest for data analysis. Deep learning is a branch of machine learning consisting of a set of algorithms that attempt to model high level abstractions in data by using a deep model with multiple processing layers, composed of both linear and non-linear transformations [11, 12, 13]. Among these, Convolutional Neural Networks (CNNs) provided impressive performance on image recognition and classification problems [14, 15, 16, 17].

Convolutional Neural Networks (also called ConvNets or CNNs) [13] are suitable for processing input that comes in the form of a grid-like topology, for instance – time series and image data. Unlike a traditional Artificial Neural Network (ANN), a CNN uses a convolution operation instead of matrix multiplication in some or all of its layers. The design of CNNs is motivated by the functioning of the mammalian vision system, which hierarchically captures semantically rich visual features [18, 19, 13]. User-provided bounding boxes are a simple and popular form of annotation used in computer vision to initialize object segmentation as well as induce spatial constraints. DeepCut [20] combines CNN with iterative graphical optimization to recover pixelwise object segmentations, from an image database with existing bounding box annotation. Bounding boxes are manually generated from user-provided segmentation. A fully connected conditional random field serves to regularize the segmentation. Experimental results demonstrate segmentation of the brain and lung of fetal MRI. However such manual annotation entails human bias, is prone to error, and is also time consuming.

Our research focuses on the design and development of a fully automated CADe System for the detection of HGG using CNNs. The novel CADe system first identifies the presence of a tumor from the 3D MR slices of the brain. The bounding box approach automatically localizes the tumor in each “abnormal” slice, encompassing sequences. Subsequent segmentation enables improved tumor delineation, with reduced false positives. Initial seeds for segmentation are automatically generated by the system.

The rest of the paper is organized as follows. Section II provides a brief literature review on detection of brain tumors. Section III highlights the characteristics and merits of the proposed CADe system, while outlining the architecture and methodology. Section IV describes the experimental results on the BRATS 2015 dataset, demonstrating the effectiveness of subsequent segmentation both qualitatively and quantitatively with respect to existing related methods. Finally conclusions are presented in Section V.

II Overview of Brain Tumor Detection

Over the years, a number of techniques have been successfully devised to automatically detect brain tumors. Generative model based approaches define an a-priori model of the normal brain, and detect abnormal regions by looking for outliers [21, 22, 23, 24]. Other generative models may use asymmetry cues to identify abnormalities in the brain MRI, with the underlying assumption that the left and right halves of the brain are symmetric for a normal patient [25, 26]. Saha et al. [26] employed the concept of bounding boxes for glioma and edema detection from brain MRI slices. The method uses symmetry based a-priori assumptions, with the left and right halves of the brain being expected to be asymmetric in case of possible tumors. A scoring function based on the Bhattacharyya coefficient is computed with gray level intensity histograms. Although generative models have been shown to generalize well on unseen data due to their simple hypothesis functions, their dependence on a-priori knowledge makes them unsuitable for applications where this is not available. Moreover, these models rely heavily on accurate registration for aligning images of different modalities, which is sometimes problematic in the presence of an abnormality in the brain [27]. Some of the “atlas” based methods [22] may also lead to incorrect learning in the presence of large deformations in brain structures [25].

Image processing based methods, on the other hand, perform various operations on the MRI slices to detect abnormal (tumor) regions. They exploit underlying differences in intensity values between normal and abnormal regions. This encompasses watershed segmentation [28] followed by the application of some morphological operations to detect tumor regions in an MRI slice [8, 29]. However image processing based approaches often suffer from severe over-segmentation and noise, in the form of false positive regions, resulting in inappropriately delineated tumor region.

Advances in machine learning have made an impact over research in brain tumor detection from MRI slices. Most of the literature in this field proposed the use of hand-crafted features such as fractals [30], Gabor coefficients [31, 32], or their combination [33]. These features are then used to train AdaBoost [30], Bayesian classifier [31], decision trees, forests and SVMs [33, 34] which then detect and delineate the tumor region in the MRI slice(s). Although the above approaches demonstrate good performance on BRATS datasets, they rely heavily on hand-crafted features requiring extensive domain knowledge of the data source. Manual design of features typically demands greater insight into the exact characteristics of normal and abnormal tissues in the brain. Moreover, such features may not be able to accurately capture the important representative features in the abnormal tumor regions of the brain; leading to hindrance in classifier performance.

CNNs essentially revolutionized the field of computer vision and have since become the de-facto standard for various object detection and recognition tasks [15, 35, 16, 17]. These networks automatically learn mid-level and high-level representations or abstractions from the input training data in the form of convolution filters, which get updated during the training process. They work directly on raw input (image) data, and learn the underlying representative features of hierarchically complex input, thereby removing the need for specialized hand-crafted image features. Moreover, CNNs require no prior domain knowledge and learn the task directly from the training data.

A CNN is built using two fundamental types of layers, namely the Convolution Layer and the Pooling Layer. The inputs percolating through the network are the responses of convolving the images with various filters. These filters act as detectors of simple patterns like lines, edges, and corners, from spatially contiguous regions in an image. When arranged in many layers, the filters can automatically detect prevalent patterns while blocking irrelevant regions. The pooling layers serve to downsample the convolved response maps. This helps lessen the number of trainable parameters, thereby reducing the possibility of overfitting. Deeper layers help the CNN extract higher levels of feature abstractions. These layers are usually followed by a classifier, which in most cases is a multi-layer perceptron (MLP). Apart from connection weights inside the MLP, the other trainable parameters in a CNN are the filters in each convolution layer.

The adoption rate of CNNs in medical imaging has been on the rise [36], with recent research focusing on topics ranging from lesion detection [37, 38, 39, 40] to segmentation and shape modelling [41, 42, 10] from 2D/3D CT and MR images. Inspired by their success, many medical imaging researchers have applied CNNs as pixel-level classifiers for abnormality detection and segmentation in brain MRI. Urban et al. [43] used 3D CNNs for detecting abnormal voxels from volumetric MRI sequences. Havaei et al. [44] designed a 2-way CNN architecture that exploits both the local and global context of an input image. Each pixel in every 2D slice of the MRI data is classified into either normal or a part of the tumor region. Recently, Pereira et al. [10] demonstrated impressive results by developing two separate CNN architectures corresponding to pixel-wise label prediction for detecting HGG and LGG tumor regions in brain MRI slices.

However, existing literature using CNNs mainly focuses on pixel (or voxel) level tumor detection by labelling normal or abnormal categories [43, 45, 44, 10]. In this process, the two distinct phases of detection and segmentation of tumor regions get merged. Although this might appear to be an ideal scenario, where the detection phase gets bypassed, it may lead to high false positive rates because the algorithm works on every pixel in the MRI slice and is not constrained inside a specific region. Even in clinical settings, the demarcation between a normal and an abnormal patient, followed by the detection of an abnormal region, assumes greater significance; and this always precedes the actual segmentation and volumetric analysis of the tumor region. The ever-increasing deluge of data that radiologists are regularly besieged with becomes a major hindrance to accurate delineation, thereby highlighting the need for an automated detection system.

The premise of this paper is that optimal tumor segmentation can be achieved through a preceding approximate tumor detection or localization step, that can aid accurate segmentation by acting as a seed towards constrained segmentation. Hence we take a detection-first approach in which the tumor region is approximately detected by our proposed system. Next this information is used to generate the seed for segmentation, resulting in the whole tumor region getting accurately delineated.

III The CADe System

A novel Computer Aided Detection (CADe) system is designed for tumors in brain MRI slices, employing a combination of classification and regression phases. A pair of convolution network architectures, viz. c-cnn and d-cnn, constitute the CADe system. At the entry point to the system, the Classification Convolutional Neural Network (c-cnn) determines whether the patient’s brain MRI study is normal (or abnormal) based on the absence (or presence) of suspicious regions. Once an abnormal sample is identified, the Detection Convolutional Neural Network (d-cnn) is invoked to approximately identify the abnormal regions in each MRI slice. d-cnn works by predicting a bounding box around the tumor region to identify the abnormality.

We tackle the problem of tumor detection in brain MRI using a bounding box based localization approach, as is common in the computer vision community. The proposed method is robust to anatomical changes in the appearance of the brain, as well as to improper registration of MRI slices. The schematic diagram of the CADe system is provided in Fig. 1. The input to the system is a patient study containing 4-sequence MRI slices, and the output is an approximate localization of any abnormality in the slices in the form of bounding box coordinates. When used as a preceding step to tumor segmentation, it can provide a seed for constrained demarcation of the abnormal region, thereby leading to improved delineation of the tumor region while simultaneously reducing the number of false positives. The approximate tumor position predicted by the CADe system is used as the seed for GrowCut [46] in the subsequent segmentation of the tumor region from the MRI slice.

Fig. 1: Flowchart illustrating the CADe system for brain MRI

III-A Contribution

The merits of our CADe system, over existing tumor detection methodologies for brain MRI, are outlined below.

  • Due to the discriminative nature of our CADe system, there is neither any requirement of inherent a-priori domain knowledge nor assumption of brain symmetry (as in generative models [21, 22, 23, 24]). The deterministic nature of the system also rules out any inter-observer error, as is prevalent in clinical setting.

  • Compared to earlier machine learning based models [34, 30, 31, 32, 33], our system eliminates the need of hand-crafted features for slice classification and tumor localization by automatically extracting / learning the underlying highly representational and hierarchical features.

  • Due to the preceding approximate localization step, the final tumor segmentation can be constrained to the specific suspicious region(s); thereby ruling out any false positives in other (normal) regions of the brain.

  • Unlike atlas based approaches [25], the proposed system is highly robust to significant changes and deformations in brain anatomy caused by the presence of abnormality (tumor).

III-B Preprocessing

MRI sequence slices usually suffer from an inconsistent image intensity problem, better known as Bias Field Distortion (BFD). This causes the intensity of the same tissue to vary across different slices of a sequence for a single patient. Thus the input training data is first subjected to Bias Field Distortion correction using N4ITK [47], to obtain a homogeneous intensity range throughout each sequence. Further, the images are processed with a median filter to suppress high frequency image noise. The images in both training and testing sets are standardized to zero mean and unit variance, using the mean intensity value and standard deviation of pixels computed on the training set.
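The standardization step can be illustrated with a minimal sketch. It assumes that the statistics are computed once on the training pixels and then reused unchanged for the test pixels; the function names and the toy intensity values are hypothetical and are not taken from the original implementation.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_pixels):
    """Return (mean, std) computed from training-set intensities only."""
    mean = fmean(train_pixels)
    std = pstdev(train_pixels, mu=mean)
    return mean, std


def standardize(pixels, mean, std):
    """Shift and scale intensities with previously fitted statistics."""
    return [(p - mean) / std for p in pixels]


# Toy intensities (hypothetical, not real MRI data).
train = [10.0, 12.0, 14.0, 16.0]
test = [13.0, 19.0]

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)  # zero mean, unit variance
test_z = standardize(test, mean, std)    # same transform, training statistics
```

Reusing the training statistics on the test set keeps the two sets on the same intensity scale without letting test data influence the transform.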

III-C Proposed architecture

The two-stage architecture, consisting of the classifier and detection modules c-cnn and d-cnn, serves to classify a 2D brain MRI slice into normal (or abnormal) followed by an approximate localization of the tumor region in the specified slices. This is outlined in Fig. 1.

Classification ConvNet (c-cnn)

A 12 layer Classification network c-cnn, consisting of three sets of stacked convolution and pooling layers followed by two fully connected layers, is illustrated in Fig. 2. This network serves as the entry point of the CADe system. It takes each 2D brain MRI slice (a four-sequence MRI slice of size $4 \times 96 \times 96$) as input and provides the probability of that slice being normal or abnormal as output. The network thus classifies each slice, and computes the overall number of slices flagged as abnormal in a particular study. If more than 5% of the slices are flagged as abnormal, the patient study is passed down the pipeline to the localization network d-cnn. The value 5% was chosen empirically using a small validation set.
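The study-level decision rule described above can be sketched as follows. The 5% study fraction comes from the text; the slice-level cutoff of 0.5 on the abnormal probability, the function name and the toy probabilities are assumptions made only for illustration.

```python
def study_is_abnormal(slice_probs, slice_cutoff=0.5, study_fraction=0.05):
    """Flag a study when more than `study_fraction` of its slices are abnormal.

    slice_probs: per-slice probability of the "abnormal" class.
    slice_cutoff: assumed probability cutoff for flagging one slice.
    """
    flagged = sum(1 for p in slice_probs if p > slice_cutoff)
    return flagged / len(slice_probs) > study_fraction


# Hypothetical studies with 155 slices each.
study_a = [0.9] * 8 + [0.1] * 147  # 8 of 155 slices flagged (about 5.2%)
study_b = [0.9] * 7 + [0.1] * 148  # 7 of 155 slices flagged (about 4.5%)

send_a_to_dcnn = study_is_abnormal(study_a)
send_b_to_dcnn = study_is_abnormal(study_b)
```

Only studies for which the rule returns `True` would be forwarded to the localization network.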

Fig. 2: Network c-cnn

The c-cnn network consists of six convolution layers (C1_1, C1_2, C2_1, C2_2, C3_1, C3_2), all with filter (or kernel) size $3 \times 3$ but with increasing filter numbers ($32$, $64$, $128$) over the layers. There are three pooling layers P1, P2, P3 with filter size $2 \times 2$ each. The classifier at the end is a fully connected MLP of connectivity $8192 \times 550 \times 550 \times 2$. Table I summarizes the entire c-cnn architecture. Smaller kernels produce better regularization due to the smaller number of trainable weights, with the possibility of constructing deeper networks without losing too much information in the layers [10, 17]. A greater number of filters in the deeper convolution layers allows more feature maps to be generated, thereby compensating for the decrease in the size of each feature map caused by “valid” convolution and pooling layers. A convolution layer is said to be of type “valid” when the input to the layer is not zero-padded before the convolution operation, such that the resulting output becomes gradually smaller in spatial dimension down the layers.

Layer    Filter Size  Stride  FC Units  Conv Type  Input       Output
C1_1     32x3x3       1       -         valid      4x96x96     32x94x94
C1_2     32x3x3       1       -         valid      32x94x94    32x92x92
P1       2x2          2       -         -          32x92x92    32x46x46
C2_1     64x3x3       1       -         valid      32x46x46    64x44x44
C2_2     64x3x3       1       -         valid      64x44x44    64x42x42
P2       2x2          2       -         -          64x42x42    64x21x21
C3_1     128x3x3      1       -         valid      64x21x21    128x19x19
C3_2     128x3x3      1       -         valid      128x19x19   128x17x17
P3       2x2          2       -         -          128x17x17   128x8x8
FC1      -            -       550       -          8192        550
FC2      -            -       550       -          550         550
Out (K)  -            -       2         -          550         2
TABLE I: Architecture configuration of c-cnn for 2D MRI slice classification
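The shapes in Table I can be checked with a short sketch that walks through the layers and tallies trainable parameters. It assumes one bias term per convolution filter and per fully connected unit, and floor division for the pooling output; the helper names are hypothetical. Under these assumptions the flattened vector has 8192 entries and the total comes to 5,097,598, which agrees with the parameter count stated for c-cnn in the text.

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size - kernel + 2 * pad) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial output size of non-overlapping max pooling."""
    return (size - window) // stride + 1


def trace_ccnn(channels=4, size=96):
    """Follow Table I and return (flattened feature length, parameter count)."""
    params = 0
    for filters in (32, 64, 128):          # three conv-conv-pool blocks
        for _ in range(2):                 # two "valid" 3x3 convolutions
            params += filters * channels * 3 * 3 + filters  # weights + biases
            channels, size = filters, conv_out(size, 3)
        size = pool_out(size)
    flat = channels * size * size
    units = flat
    for out_units in (550, 550, 2):        # FC1, FC2, output layer
        params += units * out_units + out_units
        units = out_units
    return flat, params


flat, params = trace_ccnn()
print(flat, params)  # 8192 5097598
```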

The output feature map dimension, from a convolution layer, is calculated as

$$W_o = \frac{W - f + 2p}{s} + 1, \qquad H_o = \frac{H - f + 2p}{s} + 1, \tag{1}$$

where $W$ is the input image width, $H$ is the input image height, $W_o$ is the effective output width, and $H_o$ is the output height. Here $p$ denotes the input padding, which (in our case) is set to zero due to “valid” convolution involving no zero-padding. The displacement (stride) is $s = 1$, with $f$ being the receptive field (kernel size) of the neurons in a particular layer. Input downsizing in the max pooling layer, with filter size fixed at $2 \times 2$ and a stride of two for a non-overlapping pooling operation, results in downsampling by a factor of $2$.

The final feature maps from the layer P3 are flattened into a feature vector of length $8192$, before being fed to the fully connected layer FC1 of the classifier MLP. Two fully connected layers, with $550$ hidden neurons each, constitute the MLP having two final outputs. The number of hidden neurons is chosen through automatic hyperparameter estimation using cross-validation. Non-linearity in the form of the Rectified Linear Unit (ReLU) [48] is applied after each convolution as well as fully connected layer, thereby transforming negative activation values to zero using $f(x) = \max(0, x)$. Finally, the predicted distribution is computed by taking the softmax

$$\hat{y}_i = \frac{e^{a_i}}{\sum_{k=1}^{K} e^{a_k}}, \tag{2}$$

where $K$ corresponds to the number of output neurons and $a_i$ is the activation value of the $i$th neuron. The number of trainable parameters in c-cnn is 5,097,598.

Fig. 3: Network d-cnn

Detection ConvNet (d-cnn)

The 15 layer Detection network d-cnn, for predicting an approximate bounding box around the tumor region, is depicted in Fig. 3. Its input is a 4-sequence brain MRI slice of size $4 \times 96 \times 96$, and its output consists of four real numbers $(x, y, w, h)$, where $x$, $y$ are the abscissa and ordinate of the upper left corner of the bounding rectangle, respectively, with $w$ and $h$ referring to its width and height. d-cnn consists of four sets of stacked convolution and pooling layers, followed by two fully connected layers. Due to the complexity of the bounding box prediction problem, the d-cnn network architecture is deeper than the c-cnn. Table II summarizes the entire d-cnn architecture.

Convolution layers in d-cnn have filter numbers ($32$, $32$, $64$, $128$) over the four pairs of layers, while the filter sizes are $3 \times 3$ in the first three pairs of layers and $5 \times 5$ in the last pair. The convolution type in the first three pairs of layers of the d-cnn is set to “same” (allowing input zero-padding to preserve spatial size), with the last pair of layers being of type “valid”, involving no zero-padding at input. The feature map generated after layer P4 is flattened into a feature vector of length $512$, and fed into the first fully connected layer FC1 with $1200$ hidden neurons (chosen through automatic cross-validation). As in c-cnn, the non-linearities after each convolution layer are set to ReLU, although no such non-linearity is applied after the last output layer. Note that the total number of trainable parameters in d-cnn is 2,760,612, which is lower than that of the c-cnn.

Layer           Filter Size  Stride  FC Units  Conv Type  Input       Output
C1_1            32x3x3       1       -         same       4x96x96     32x96x96
C1_2            32x3x3       1       -         same       32x96x96    32x96x96
P1              2x2          2       -         -          32x96x96    32x48x48
C2_1            32x3x3       1       -         same       32x48x48    32x48x48
C2_2            32x3x3       1       -         same       32x48x48    32x48x48
P2              2x2          2       -         -          32x48x48    32x24x24
C3_1            64x3x3       1       -         same       32x24x24    64x24x24
C3_2            64x3x3       1       -         same       64x24x24    64x24x24
P3              2x2          2       -         -          64x24x24    64x12x12
C4_1            128x5x5      1       -         valid      64x12x12    128x8x8
C4_2            128x5x5      1       -         valid      128x8x8     128x4x4
P4              2x2          2       -         -          128x4x4     128x2x2
FC1             -            -       1200      -          512         1200
FC2             -            -       1200      -          1200        1200
Out             -            -       4         -          1200        4
(Bounding Box)
TABLE II: Architectural configuration of d-cnn for approximate tumor localization
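As a consistency check on Table II, the sketch below tallies the trainable parameters of d-cnn using the channel counts in the Output column (32, 32, 64 and 128 feature maps for the four pairs of convolution layers). It assumes one bias per filter and per fully connected unit; the function name is hypothetical. Under these assumptions the total is 2,760,612, the figure quoted in the text.

```python
def trace_dcnn(channels=4, size=96):
    """Follow Table II and return (flattened feature length, parameter count)."""
    # (feature maps, kernel size, convolution type) for each conv-conv-pool block
    blocks = [(32, 3, "same"), (32, 3, "same"), (64, 3, "same"), (128, 5, "valid")]
    params = 0
    for filters, kernel, mode in blocks:
        for _ in range(2):
            params += filters * channels * kernel * kernel + filters
            channels = filters
            if mode == "valid":            # no zero-padding: the map shrinks
                size = size - kernel + 1
        size //= 2                         # 2x2 max pooling with stride 2
    flat = channels * size * size
    units = flat
    for out_units in (1200, 1200, 4):      # FC1, FC2, bounding box output
        params += units * out_units + out_units
        units = out_units
    return flat, params


dcnn_flat, dcnn_params = trace_dcnn()
print(dcnn_flat, dcnn_params)  # 512 2760612
```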

III-D Methodology

In this section we briefly describe issues related to parameter selection, cost function, and network evaluation.

Parameter selection

The final architecture is chosen heuristically, with a deep network developed to overfit followed by regularization using Dropout [49] with a probability . A value of is used in c-cnn (d-cnn).

Name Hyperparameter Value
Glorot Uniform (initializer) [50]
Glorot Uniform (initializer) [50]
iterations (c-cnn)
iterations (d-cnn)
learning rate
TABLE III: Hyperparameters chosen using cross-validation

The hyperparameters required for the training process, provided in Table III, were chosen through automatic cross-validation. While slices were used to train the c-cnn, the system had abnormal slices for the d-cnn. Since deep CNNs entail a large number of free trainable parameters, the effective number of training samples was artificially increased using real time data augmentation in the form of horizontal and vertical image flipping. This type of augmentation runs on the CPU in parallel with the training process running on the GPU, thereby saving computing time and making use of the CPU when it would otherwise be idle during training. The weights were updated by Adadelta [51], a variant of Stochastic Gradient Descent (SGD) that adapts the learning rate using first order information. Its main advantages are that it avoids manual tuning of the learning rate and is robust to noisy gradient values, different model architectures, various data modalities and the selection of hyperparameters [51].
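A minimal sketch of flip-based augmentation is given below. It assumes that, for the d-cnn, the target bounding box must be mirrored together with the image, using the (upper left corner, width, height) box convention; this box handling, the function names and the toy slice are illustrative assumptions rather than details taken from the original implementation.

```python
def flip_horizontal(image, box):
    """Mirror a 2D slice left-right and move its (x, y, w, h) box with it."""
    width = len(image[0])
    x, y, w, h = box
    flipped = [row[::-1] for row in image]
    return flipped, (width - x - w, y, w, h)


def flip_vertical(image, box):
    """Mirror a 2D slice top-bottom and move its (x, y, w, h) box with it."""
    height = len(image)
    x, y, w, h = box
    return image[::-1], (x, height - y - h, w, h)


# Toy 4x6 slice (hypothetical) with a 2x2 "tumor" whose box is (1, 0, 2, 2).
image = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = (1, 0, 2, 2)

h_image, h_box = flip_horizontal(image, box)  # box moves to (3, 0, 2, 2)
v_image, v_box = flip_vertical(image, box)    # box moves to (1, 2, 2, 2)
```

For the c-cnn only the image needs to be flipped, since the slice label is unaffected by mirroring.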

Cost function

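The cost function for c-cnn was chosen as binary cross-entropy (for the two-class problem) as

$$L_{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right], \tag{3}$$

where $n$ is the number of samples, $y_i$ is the true label of a sample and $\hat{y}_i$ is its predicted label.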

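In the case of d-cnn the Mean Squared Error (MSE) was used as the cost function,

$$L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left\lVert \mathbf{y}_i - \hat{\mathbf{y}}_i \right\rVert_2^2, \tag{4}$$

where $\mathbf{y}_i, \hat{\mathbf{y}}_i$ are vectors with four components corresponding to the four output values.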
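The two cost functions can be written out as a short sketch. It assumes the standard binary cross-entropy averaged over samples, and a squared error summed over the four box components and averaged over samples (some libraries additionally average over the components). The clipping constant `eps`, the function names and the toy values are assumptions for illustration.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def mean_squared_error(boxes_true, boxes_pred):
    """Squared error summed over box components, averaged over samples."""
    total = 0.0
    for t, p in zip(boxes_true, boxes_pred):
        total += sum((a - b) ** 2 for a, b in zip(t, p))
    return total / len(boxes_true)


# Toy values (hypothetical).
bce = binary_cross_entropy([1, 0], [0.5, 0.5])           # equals log(2)
mse = mean_squared_error([(0, 0, 2, 2)], [(1, 0, 2, 0)])  # 1 + 0 + 0 + 4
```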

Network evaluation

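The c-cnn was evaluated on the basis of classification accuracy, Area Under the ROC Curve (AUC), precision, recall, and $F_\beta$ scores. Let $TP$ = true positives, $TN$ = true negatives, $P$ = total number of positive samples, $N$ = total number of negative samples, $FP$ = false positives, and $FN$ = false negatives. We have

$$\text{Accuracy} = \frac{TP + TN}{P + N}, \tag{5}$$
$$\text{Precision} = \frac{TP}{TP + FP}, \tag{6}$$
$$\text{Recall} = \frac{TP}{TP + FN}, \tag{7}$$
$$F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}, \tag{8}$$

with $\beta$ being chosen as 1 to provide equal weight to both precision and recall scores.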
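The classification scores can be computed from the four confusion counts as in the sketch below, which uses the standard definitions of accuracy, precision, recall and the F-score with weight beta. The function name and the toy counts are hypothetical and do not correspond to any result reported here.

```python
def classification_scores(tp, tn, fp, fn, beta=1.0):
    """Return (accuracy, precision, recall, F_beta) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return accuracy, precision, recall, f_beta


# Hypothetical counts: 8 true positives, 9 true negatives,
# 2 false positives, 1 false negative.
accuracy, precision, recall, f1 = classification_scores(8, 9, 2, 1)
```

With beta equal to 1 the F-score reduces to the harmonic mean of precision and recall, so neither of the two dominates the score.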

Evaluation of d-cnn, with respect to bounding box detection, was performed using Mean Absolute Error (MAE) and Dice Similarity Coefficient (DSC). Here

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left\lVert \mathbf{y}_i - \hat{\mathbf{y}}_i \right\rVert_1 \tag{9}$$

denotes the number of pixels by which the predicted bounding box is displaced from the original ground truth rectangle, with lower values implying better prediction.

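A measure of the overlap between the predicted and target bounding boxes was obtained as

$$\text{DSC} = \frac{2 \, \lvert M_P \cap M_T \rvert}{\lvert M_P \rvert + \lvert M_T \rvert}, \tag{10}$$

where $M_P, M_T$ denote the binary prediction and target masks, respectively. The intensity values of the masks are either 0 (area outside the rectangle) or 1 (area inside the rectangle), with $\text{DSC} \in [0, 1]$ and “one” implying a perfect overlap.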
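For axis-aligned rectangles the overlap measure can be computed directly from the box coordinates, without rasterizing the masks. The sketch below assumes boxes given as (upper left x, upper left y, width, height) and an absolute error summed over the four coordinates and averaged over samples; the function names and the toy boxes are hypothetical.

```python
def box_dice(pred, target):
    """Dice overlap between two axis-aligned (x, y, w, h) rectangles."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target
    overlap_w = max(0, min(px + pw, tx + tw) - max(px, tx))
    overlap_h = max(0, min(py + ph, ty + th) - max(py, ty))
    intersection = overlap_w * overlap_h
    return 2 * intersection / (pw * ph + tw * th)


def mean_absolute_error(boxes_true, boxes_pred):
    """Absolute error summed over box coordinates, averaged over samples."""
    total = 0.0
    for t, p in zip(boxes_true, boxes_pred):
        total += sum(abs(a - b) for a, b in zip(t, p))
    return total / len(boxes_true)


# Toy boxes (hypothetical), in pixels.
predicted = (10, 10, 20, 20)
ground_truth = (12, 9, 20, 22)

dsc = box_dice(predicted, ground_truth)                   # 720 / 840
mae = mean_absolute_error([ground_truth], [predicted])    # 2 + 1 + 0 + 2
```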

III-E Segmentation

The detected tumor region was next segmented by GrowCut [46], using seeds automatically generated by the seed-generation algorithm of the proposed CADe system.

Fig. 4: Choice of seeds by the seed-generation algorithm on a sample slice. The yellow rectangle denotes the predicted bounding box, the red circle indicates the foreground region, and the green circle denotes the background region.

The iterative method grows a spline or boundary, inside and outside the bounding box, to distinguish between the foreground (tumor) and background regions. The “seed pixels” are chosen along the circumference of two circular regions, each defined by a center and a radius, as depicted in Fig. 4 for a sample slice. The red circle marks the region selected as foreground, and the green circle bounds the background brain region. The bounding box is drawn in yellow in the figure.

IV Experimental Results

The CADe system was modeled on the BRATS 2015 dataset [52], consisting of 220 patients with High Grade Glioma (HGG), each with 155 slices from the four MRI modalities $T_1$, $T_{1C}$, $T_2$, and FLAIR, along with their segmented “ground truth” for four intra-tumoral classes, viz. edema, enhancing tumor, non-enhancing tumor, and necrosis. The data was aligned as , skull stripped, and interpolated to voxel resolution. The total slice count for the entire dataset was , with each slice being of size . The slices were resized to $96 \times 96$ before training on samples and testing on the remaining , with the final bounding box being interpolated back to the original input slice dimension. The training phase of c-cnn consisted of labeling slices as “normal” or “abnormal”, based on the ground truth. In the case of d-cnn, the model generated the rectangular bounding box fully enclosing the tumor region (for “abnormal” slices), encoded as $(x, y, w, h)$.

The c-cnn and d-cnn networks were developed using Theano [53], with a wrapper library Keras [54] in Python. The experiments were performed on a Dell Precision 7810 Tower with 2x Intel Xeon E5-2600 v3, totalling 12 cores, 256GB RAM, and NVIDIA Quadro K6000 GPU with 12GB VRAM. The operating system was Ubuntu 14.04. Segmentation of tumor regions was performed using ITK-SNAP [55] software.

After classification and detection by the c-cnn and d-cnn, the bounding box was used to select seeds for subsequent segmentation by the seed-generation algorithm. This constitutes the Automated GrowCut (AGC) segmentation. A comparative study is also provided with a manual initialization from seeds, using ground truth about the foreground and background regions. This is termed Semi-Automated GrowCut (SGC) segmentation.

IV-A Detection

The performance of the two networks was quantitatively evaluated using eqns. (5)-(10). The c-cnn achieved an accuracy of , with an area under the ROC curve of . The precision and recall values were observed to be and , respectively. The score, with , was . The high recall rate implies detection of a large number of abnormal slices, while the high precision demonstrates accurate distinction between normal and abnormal slices. In case of d-cnn, the MAE was pixels with standard deviation of while generating the bounding box. The DSC measured the overlap to be .

Method DSC
Proposed CADe
Saha et al. [26]  
TABLE IV: Comparative study of DSC for detection

We also present a comparison of DSC, with that of the earlier approach by Saha et al. [26], in Table IV. The overlap between the ground truth (target) and predicted regions (by bounding box) is found to be higher in our proposed CADe system. The qualitative result for the CADe system is presented in Fig. 5. The results demonstrate that the bounding boxes predicted by the d-cnn closely resemble the original ground truth.

Fig. 5: Bounding box on 10 sample patient slices generated by the CADe system. Red rectangle denotes ground truth and green rectangle indicates the predicted response.
Fig. 6: Comparative study of segmentation of ten sample patients

IV-B Segmentation

Fig. 6 presents a qualitative comparison (over 10 sample patients) of the segmentation obtained by the semi-automated SGC, involving manual insertion of seeds in the foreground and background, with that of our fully automated CADe system using AGC. It is observed that the output of the proposed model closely matches the ground truth.

Table V provides a quantitative comparison between these algorithms, by computing the mean and standard deviation (SD) over three runs, corresponding to DSC for segmentation for the and sequences. The semi-automated SGC involves three independent observers to insert the seed points in the foreground and background regions. It is observed from the table that inter-observer SD exists in SGC. On the other hand, the deterministic nature of our automated CADe system enables complete elimination of any deviation over seed initialization. As SD , over three runs, it was considered to be approximately zero and hence is not reported in the table. The last row of the table presents the corresponding average DSC (for segmentation) over the 10 sample patients. It is evident that the automated AGC, used by our CADe system, provides an overall better match over both and sequences.

TABLE V: Comparative study of DSC [Mean(Standard Deviation)] for segmentation in 10 sample patients

IV-C Analysis of architecture

The proposed network designs for c-cnn and d-cnn were next evaluated with respect to several variations in architecture. Considering the architecture of Sec. III-C as the “baseline” model, four experiments (denoted E1 to E4) were performed, as enumerated below.

  1. Training without any data augmentation. Absence of data augmentation typically leads to overfitting, with most artificial augmentation involving random rotations, width or height shifts, horizontal or vertical flipping, etc. [10].

  2. Using larger than kernels in the convolution layers.

  3. Employing deeper layers (networks).

  4. Exploring LeakyReLU [56] as layer-wise non-linearity, instead of standard ReLU.
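The kind of augmentation referred to in the first experiment can be sketched in a few lines. The example below is a dependency-free illustration on a 2D slice stored as nested lists, covering flips and integer shifts only (rotations are omitted); the function names, the flip probability and `max_shift` are assumptions and not the settings used for training.

```python
import random


def hflip(img):
    """Mirror each row (horizontal flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the row order (vertical flip)."""
    return [row[:] for row in img[::-1]]


def shift(img, dx, dy, fill=0):
    """Translate by (dx, dy) pixels, padding uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img, rng, max_shift=1):
    """Apply a random combination of flips and a small random shift."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift(img, dx, dy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(augment(sample, rng))
```

For a localization network, the same geometric transform would also have to be applied to the target bounding box so that image and label stay consistent.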

To establish the statistical significance of our baseline model, a pairwise t-test is performed between the corresponding DSC scores (with a null hypothesis that the models being compared are similar). We set a threshold of , with the null hypothesis being rejected when the computed p-value from a test between a pair of models becomes lower than this threshold. A rejection implies that the difference between mean DSCs is likely to represent an actual difference between the pair of models being compared.
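The test procedure can be illustrated with a small self-contained sketch. It assumes a paired (dependent-samples) two-sided t-test on per-case DSC values; the scores and the `alpha` threshold below are hypothetical placeholders, and the p-value is obtained by numerically integrating the Student t density, an implementation choice made here only to avoid external dependencies.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev uses the n-1 denominator; identical differences would give zero SD.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def t_pdf(x, df):
    """Density of the Student t distribution (moderate df only: math.gamma overflows for very large df)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical per-case DSC values for two models (not results from this study).
baseline = [0.90, 0.80, 0.85, 0.95]
variant = [0.80, 0.75, 0.80, 0.85]
alpha = 0.05  # assumed significance threshold

t, df = paired_t_statistic(baseline, variant)
p = two_sided_p(t, df)
print(t, df, p, p < alpha)
```

In practice a library routine such as `scipy.stats.ttest_rel` would normally be used for the same computation.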


Table VI provides a study of comparative classification performance of the four variants over the baseline c-cnn architecture. Quantitative evaluation is made in terms of accuracy, Precision, Recall, of eqns. (5)-(8), and Area Under the Curve (AUC).

Experiment Accuracy Precision Recall
TABLE VI: Comparative study of c-cnn variants

It is observed from Table VI that E1 leads to overfitting, thereby causing a drop in detection Accuracy, Recall and , with poor generalization. On the other hand, Precision over the training set was higher than the baseline model. Analyzing Table VI, we observe that data augmentation improves delineation between normal and abnormal tissues in c-cnn.

Employing larger filter (or kernel) sizes in each convolution block (by E2) as compared to the size the baseline model, and going up to increasing by units over each pair of convolution layers, resulted in an increase of approximately 1.8 times in the number of tunable parameters. The larger network incurred greater computational overhead, with longer training and testing times. Table VI shows that larger kernels lead to overfitting and degraded generalization performance, owing to the increased number of trainable parameters.
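The dependence of parameter count on kernel size can be made concrete with a short calculation. The sketch below counts weights and biases of a single 2D convolution layer; the kernel sizes and channel widths are illustrative assumptions, not the configuration of the c-cnn, so the resulting ratio is not expected to reproduce the whole-network factor quoted above (which also includes layers unaffected by kernel size).

```python
def conv2d_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters of a square-kernel 2D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


# Hypothetical layer: 64 input and 64 output channels.
small = conv2d_params(3, 64, 64)  # 3*3*64*64 + 64 = 36928
large = conv2d_params(5, 64, 64)  # 5*5*64*64 + 64 = 102464
print(small, large, round(large / small, 2))
```

The weight count grows with the square of the kernel width, which is why modest increases in kernel size inflate the number of tunable parameters quickly.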

A recurring observation in the deep learning literature is that going deeper with convolutions may increase performance, and much of deep learning practice aims at training networks as deep as is feasible. Since a conventional c-cnn employs pooling layers which reduce the dimension of its input, there is an inherent upper limit to the depth: beyond it, the feature maps become too small to be pooled further. In experiment E3, we tested with three additional layers [a convolution block (two convolution layers) and a pooling layer] being added just before the fully connected layers. The parameters of these newly added layers mimic the ones before them. The E3 version of c-cnn, with 15 layers, exhibited poorer performance than the baseline model, as observed from Table VI. However, Precision on the training set was higher.
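The depth limit imposed by pooling can be illustrated with a short calculation. The sketch assumes non-overlapping pooling with stride equal to the window size, no padding, and size-preserving convolutions; the input sizes are hypothetical.

```python
def max_pooling_stages(size, pool=2, min_size=1):
    """Number of pooling layers that fit before the feature map drops below min_size."""
    stages = 0
    while size // pool >= min_size:
        size //= pool  # floor division mirrors pooling without padding
        stages += 1
    return stages


print(max_pooling_stages(128))  # 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1, i.e. 7 stages
print(max_pooling_stages(240))  # 240 -> 120 -> 60 -> 30 -> 15 -> 7 -> 3 -> 1, i.e. 7 stages
```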

It is argued that imposing a strict condition to zero out the negative neuron activation, in ReLU, may lead to gradient impairment and subsequent adjustment of weights in the network. As a result a new variant called LeakyReLU [56], with activation function , where is the leakiness parameter, was employed. The function is designed to “leak” negative gradient instead of zeroing it [10]. Here we investigate the use of LeakyReLU, instead of standard ReLU, under E4 with (since higher values resulted in divergence of training). It is clear from Table VI that the generalization performance was poorer over the baseline model, while the Precision on training set was higher.
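For reference, the difference between the two non-linearities can be sketched as below. The code uses the commonly cited form of LeakyReLU, in which negative inputs are scaled by a small slope rather than zeroed; the parameter name `alpha` and the example value are assumptions and are not the setting used in E4.

```python
def relu(x):
    """Standard ReLU: negative inputs are zeroed."""
    return x if x > 0 else 0.0


def leaky_relu(x, alpha):
    """LeakyReLU: negative inputs are scaled by the leakiness parameter alpha."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha):
    """Derivative with respect to x: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha


alpha = 0.01  # hypothetical leakiness value
for x in (-2.0, 0.5):
    print(x, relu(x), leaky_relu(x, alpha), leaky_relu_grad(x, alpha))
```

The non-zero gradient for negative inputs is what allows weights feeding an inactive unit to keep receiving updates.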


Table VII presents a comparative analysis of the four variants over the baseline d-cnn architecture. Quantitative evaluation is provided in terms of MAE and DSC of eqns. (9) and (10). It is observed that absence of data augmentation led to poorer generalization performance, as compared to our baseline model. On subjecting the baseline model to a t-test against E1, a p-value = demonstrated its statistical significance.
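A minimal sketch of the localization error is given below. It assumes that the MAE is the mean absolute difference, in pixels, between predicted and target box coordinates, with a hypothetical encoding `(x_min, y_min, x_max, y_max)`; eqn. (9) may define the quantity over a different parameterization.

```python
from statistics import mean, pstdev


def bbox_mae(pred, target):
    """Mean absolute error over the coordinates of one predicted box."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def summarize(errors):
    """Mean and (population) standard deviation of per-slice errors."""
    return mean(errors), pstdev(errors)


# Hypothetical predictions and targets for two slices (not results from this study).
preds = [(10, 20, 50, 60), (30, 30, 80, 90)]
targets = [(12, 18, 50, 64), (30, 34, 78, 90)]
per_slice = [bbox_mae(p, t) for p, t in zip(preds, targets)]
print(per_slice, summarize(per_slice))
```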

Experiment MAE DSC
TABLE VII: Comparative study of d-cnn variants

For large kernels, the increase was up to due to the extra convolution block. This resulted in an increase of tunable parameters by around 3.3 times, with poorer performance in Table VII. A deeper architecture, by E3, generated a network of 18 layers. However, its generalization performance in Table VII was poorer than that of our baseline model, and the larger size increased training and testing overheads. The pairwise t-test performed between the baseline d-cnn and E3, over DSC, returned a p-value = , demonstrating its statistical significance. Hence it can be inferred that going deeper with convolutions did not improve performance. Use of LeakyReLU in E4 resulted in poorer performance as well. Statistical significance of our baseline model was supported by a p-value of .

V Conclusions

An automated Computer Aided Detection (CADe) system has been developed, using Convolutional Neural Networks, for detecting and segmenting high grade gliomas from brain MRI. The concept of bounding box is employed to detect tumor cases, with subsequent localization of the abnormality from individual MR slices. Two ConvNet models, c-cnn and d-cnn, were designed for the purpose. The detection and delineation results on the BRATS 2015 database demonstrated the effectiveness of the system. The effect of the choices of hyperparameters was also studied. Comparative studies with related methods established the superiority of our CADe system.


  1. L. M. DeAngelis, “Brain tumors,” New England Journal of Medicine, vol. 344, no. 2, pp. 114–123, 2001.
  2. S. Bauer, R. Wiest, L.-P. Nolte, and M. Reyes, “A survey of MRI-based medical image analysis for brain tumor studies,” Physics in Medicine and Biology, vol. 58, no. 13, pp. R97–R129, 2013.
  3. D. N. Louis, H. Ohgaki, O. D. Wiestler, W. K. Cavenee, P. C. Burger, A. Jouvet, B. W. Scheithauer, and P. Kleihues, “The 2007 WHO classification of tumours of the central nervous system,” Acta Neuropathologica, vol. 114, no. 2, pp. 97–109, 2007.
  4. R. R. Edelman and S. Warach, “Magnetic Resonance Imaging,” New England Journal of Medicine, vol. 328, no. 10, pp. 708–716, 1993.
  5. S. Mitra and B. U. Shankar, “Integrating radio imaging with gene expressions toward a personalized management of cancer,” IEEE Transactions on Human-Machine Systems, vol. 44, no. 5, pp. 664–677, 2014.
  6. S. Mitra and B. U. Shankar, “Medical image analysis for cancer management in natural computing framework,” Information Sciences, vol. 306, pp. 111–131, 2015.
  7. S. Banerjee, S. Mitra, B. U. Shankar, and Y. Hayashi, “A novel GBM saliency detection model using multi-channel MRI,” PloS ONE, vol. 11, no. 1, p. e0146388, 2016.
  8. S. Banerjee, S. Mitra, and B. U. Shankar, “Single seed delineation of brain tumor using multi-thresholding,” Information Sciences, vol. 330, pp. 88–103, 2016.
  9. S. Banerjee, S. Mitra, and B. U. Shankar, “ROI segmentation from brain MR images with a fast multilevel thresholding,” in Proc. International Conference on Computer Vision and Image Processing, pp. 249–259, 2017.
  10. S. Pereira, A. Pinto, V. Alves, and C. A. Silva, “Brain tumor segmentation using convolutional neural networks in MRI images,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1240–1251, 2016.
  11. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  12. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Adaptive Computation and Machine Learning series, MIT Press, 2016.
  13. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  14. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
  15. C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2013.
  16. P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” in Proc. of International Conference on Learning Representations (Computing Research Repository (CoRR)), vol. arXiv:1312.6229, 2014.
  17. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Computing Research Repository (CoRR), vol. arXiv:1409.1, 2014.
  18. D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,” The Journal of Physiology, vol. 160, no. 1, pp. 106–154, 1962.
  19. K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.
  20. M. Rajchl, M. C. H. Lee, O. Oktay, K. Kamnitsas, J. Passerat-Palmbach, W. Bai, M. Damodaram, M. A. Rutherford, J. V. Hajnal, B. Kainz, and D. Rueckert, “Deepcut: Object segmentation from bounding box annotations using convolutional neural networks,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 674–683, 2017.
  21. M. Prastawa, E. Bullitt, S. Ho, and G. Gerig, “A brain tumor segmentation framework based on outlier detection,” Medical Image Analysis, vol. 8, no. 3, pp. 275–283, 2004.
  22. M. B. Cuadra, C. Pollo, A. Bardera, O. Cuisenaire, J.-G. Villemure, and J.-P. Thiran, “Atlas-based segmentation of pathological MR brain images using a model of lesion growth,” IEEE Transactions on Medical Imaging, vol. 23, no. 10, pp. 1301–1314, 2004.
  23. E. I. Zacharaki, D. Shen, S.-K. Lee, and C. Davatzikos, “ORBIT: A multiresolution framework for deformable registration of brain tumor images,” IEEE Transactions on Medical Imaging, vol. 27, no. 8, pp. 1003–1017, 2008.
  24. B. H. Menze, K. van Leemput, D. Lashkari, M.-A. Weber, N. Ayache, and P. Golland, “A generative model for brain tumor segmentation in multi-modal images,” in Proc. of MICCAI, pp. 151–159, Springer, 2010.
  25. H. Khotanlou, O. Colliot, J. Atif, and I. Bloch, “3D brain tumor segmentation in MRI using fuzzy classification, symmetry analysis and spatially constrained deformable models,” Fuzzy Sets and Systems, vol. 160, no. 10, pp. 1457–1473, 2009.
  26. B. N. Saha, N. Ray, R. Greiner, A. Murtha, and H. Zhang, “Quick detection of brain tumors and edemas: A bounding box method using symmetry,” Computerized Medical Imaging and Graphics, vol. 36, no. 2, pp. 95–107, 2012.
  27. S. Parisot, H. Duffau, S. Chemouny, and N. Paragios, “Joint tumor segmentation and dense deformable registration of brain MR images,” in Proc. of MICCAI, pp. 651–658, Springer, 2012.
  28. A. Mustaqeem, A. Javed, and T. Fatima, “An efficient brain tumor detection algorithm using watershed & thresholding based segmentation,” International Journal of Image, Graphics and Signal Processing, vol. 4, no. 10, pp. 34–39, 2012.
  29. Y. Sharma and Y. K. Meghrajani, “Brain tumor extraction from MRI image using mathematical morphological reconstruction,” in Proc. of 2nd International Conference on Emerging Technology Trends in Electronics, Communication and Networking (ET2ECN), pp. 1–4, IEEE, 2014.
  30. A. Islam, S. M. S. Reza, and K. M. Iftekharuddin, “Multifractal texture estimation for detection and segmentation of brain tumors,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 11, pp. 3204–3215, 2013.
  31. N. K. Subbanna, D. Precup, D. L. Collins, and T. Arbel, “Hierarchical probabilistic Gabor and MRF segmentation of brain tumours in MRI volumes,” in Proc. of International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 751–758, Springer, 2013.
  32. W. Wu, A. Y. C. Chen, L. Zhao, and J. J. Corso, “Brain tumor detection and segmentation in a CRF (conditional random fields) framework with pixel-pairwise affinity and superpixel-level features,” International Journal of Computer Assisted Radiology and Surgery, vol. 9, no. 2, pp. 241–253, 2014.
  33. M. Soltaninejad, G. Yang, T. Lambrou, N. Allinson, T. L. Jones, T. R. Barrick, F. A. Howe, and X. Ye, “Automated brain tumour detection and segmentation using superpixel-based extremely randomized trees in FLAIR MRI,” International Journal of Computer Assisted Radiology and Surgery, vol. 12, no. 2, pp. 183–203, 2017.
  34. D. Zikic, B. Glocker, E. Konukoglu, A. Criminisi, C. Demiralp, J. Shotton, O. M. Thomas, T. Das, R. Jena, and S. J. Price, “Decision forests for tissue-specific segmentation of high-grade gliomas in multi-channel MR,” in Proc. of MICCAI, pp. 369–376, Springer, 2012.
  35. I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. D. Shet, “Multi-digit number recognition from street view imagery using deep convolutional neural networks,” Computing Research Repository (CoRR), vol. arXiv:1312.6, 2013.
  36. H. Greenspan, B. van Ginneken, and R. M. Summers, “Deep learning in medical imaging: Overview and future promise of an exciting new technique,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1153–1159, 2016.
  37. A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. van Riel, M. M. W. Wille, M. Naqibullah, C. I. Sánchez, and B. van Ginneken, “Pulmonary nodule detection in CT images: False positive reduction using multi-view convolutional networks,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1160–1169, 2016.
  38. H. R. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cherry, L. Kim, and R. M. Summers, “Improving computer-aided detection using convolutional neural networks and random view aggregation,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1170–1181, 2016.
  39. Q. Dou, H. Chen, L. Yu, L. Zhao, J. Qin, D. Wang, V. C. T. Mok, L. Shi, and P.-A. Heng, “Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1182–1195, 2016.
  40. K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. J. Snead, I. A. Cree, and N. M. Rajpoot, “Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.
  41. F. C. Ghesu, E. Krubasik, B. Georgescu, V. Singh, Y. Zheng, J. Hornegger, and D. Comaniciu, “Marginal space deep learning: Efficient architecture for volumetric image parsing,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1217–1228, 2016.
  42. T. Brosch, L. Y. W. Tang, Y. Yoo, D. K. B. Li, A. Traboulsee, and R. Tam, “Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1229–1239, 2016.
  43. G. Urban, M. Bendszus, F. A. Hamprecht, and J. Kleesiek, “Multi-modal brain tumor segmentation using deep convolutional neural networks,” in Proc. of MICCAI-BRATS (Winning Contribution), pp. 31–35, 2014.
  44. M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P. M. Jodoin, and H. Larochelle, “Brain tumor segmentation with deep neural networks,” Medical Image Analysis, vol. 35, pp. 18–31, 2017.
  45. D. Zikic, Y. Ioannou, M. Brown, and A. Criminisi, “Segmentation of brain tumor tissues with convolutional neural networks,” Proc. of MICCAI-BRATS, pp. 36–39, 2014.
  46. V. Vezhnevets and V. Konouchine, “GrowCut: Interactive multi-label ND image segmentation by cellular automata,” in Proc. of Graphicon, vol. 1, pp. 150–156, 2005.
  47. N. J. Tustison, B. B. Avants, P. A. Cook, Y. Zheng, A. Egan, P. A. Yushkevich, and J. C. Gee, “N4ITK: Improved N3 bias correction,” IEEE Transactions on Medical Imaging, vol. 29, no. 6, pp. 1310–1320, 2010.
  48. V. Nair and G. E. Hinton, “Rectified linear units improve Restricted Boltzmann Machines,” in Proc. of 27th International Conference on Machine Learning (ICML-10), pp. 807–814, 2010.
  49. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  50. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. of International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
  51. M. D. Zeiler, “ADADELTA: An adaptive learning rate method,” Computing Research Repository (CoRR), vol. arXiv:1212.5701, 2012.
  52. B. H. Menze, A. Jakab, et al., “The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS),” IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1–32, 2015.
  53. J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: A CPU and GPU Math Expression Compiler,” in Proc. of 9th Python in Science Conference (SciPy 2010), pp. 1–7, 2010.
  54. F. Chollet, “Keras: Deep learning library for tensorflow and theano.” https://github.com/fchollet/keras, in GitHub Repository, 2015.
  55. P. A. Yushkevich, J. Piven, H. Cody Hazlett, R. Gimpel Smith, S. Ho, J. C. Gee, and G. Gerig, “User-guided 3D Active Contour segmentation of anatomical structures: Significantly improved efficiency and reliability,” Neuroimage, vol. 31, no. 3, pp. 1116–1128, 2006.
  56. A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. of 30th International Conference on Machine Learning, JMLR, vol. 28, pp. 1–6, 2013.