Multi-region segmentation of bladder cancer structures in MRI with progressive dilated convolutional networks


Jose Dolz Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA) École de technologie supérieure, Montréal, Canada.    Xiaopan Xu School of Biomedical Engineering, Fourth Military Medical University, Xi’an, China.    Jérôme Rony Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA) École de technologie supérieure, Montréal, Canada.    Jing Yuan Xidian University, School of Mathematics and Statistics, Xi’an, China.    Yang Liu School of Biomedical Engineering, Fourth Military Medical University, Xi’an, China.    Éric Granger Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA) École de technologie supérieure, Montréal, Canada.    Christian Desrosiers Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA) École de technologie supérieure, Montréal, Canada.    Xi Zhang School of Biomedical Engineering, Fourth Military Medical University, Xi’an, China.    Ismail Ben Ayed Laboratory for Imagery, Vision and Artificial Intelligence (LIVIA) École de technologie supérieure, Montréal, Canada.    Hongbing Lu School of Biomedical Engineering, Fourth Military Medical University, Xi’an, China.

Purpose: Precise segmentation of bladder walls and tumor regions is an essential step towards non-invasive identification of tumor stage and grade, which is critical for treatment decisions and the prognosis of patients with bladder cancer (BC). However, the automatic delineation of bladder walls and tumors in magnetic resonance images (MRI) is a challenging task, due to important bladder shape variations, strong intensity inhomogeneity in urine and very high variability across the population, particularly in tumor appearance. To tackle these issues, we propose to use a deep fully convolutional neural network.

Methods: The proposed network includes dilated convolutions to increase the receptive field without incurring extra computational cost or degrading performance. Furthermore, we introduce progressive dilations in each convolutional block, thereby enabling extensive receptive fields without the need for large dilation rates. The proposed network is evaluated on 3.0T T2-weighted MRI scans from 60 pathologically confirmed patients with BC.

Results: Experiments show the proposed model to achieve high accuracy, with a mean Dice similarity coefficient of 0.98, 0.84 and 0.69 for the inner wall, outer wall and tumor region, respectively. These results represent very good agreement with reference contours and an increase in performance compared to existing methods. In addition, inference times are less than a second for a whole 3D volume, which is two to three orders of magnitude faster than related state-of-the-art methods for this application.

Conclusion: We showed that a CNN can yield precise segmentation of bladder walls and tumors in bladder cancer patients on MRI. The whole segmentation process is fully-automatic and yields results in very good agreement with the reference standard, demonstrating the viability of deep learning models for the automatic multi-region segmentation of bladder cancer MRI images.

Bladder cancer, T2-weighted MRI, Convolutional neural networks, Deep learning, Bladder segmentation
Jose Dolz and Xiaopan Xu contributed equally to this work.

I Introduction

Urinary bladder cancer (BC) is a life-threatening disease with high morbidity and mortality rates Antoni2017Bladder (); Woo2017Diagnostic (); Alfred2016Updated (); AmericanCancerSociety2016 (); Kamat2016 (); Choi2015 (). Accurate identification of tumor stage and grade is of extreme clinical importance for treatment decisions and the prognosis of patients with BC Kamat2016 (); Choi2015 (); Knowles2015 (); Cancer2014 (); Duan2010 (). The clinical standard of reference for this task is optical cystoscopy (OCy) with transurethral resection (TUR) biopsies; however, this procedure is often limited by its invasiveness and discomfort for patients. With a narrow field-of-view (FOV) for lumen observation and only local characterization of tissue samples, single transurethral biopsies have exhibited relatively high misdiagnosis rates, especially for staging Kamat2016 (); Knowles2015 (); Cancer2014 (); Duan2010 (). Recent advances in magnetic resonance imaging (MRI) and image processing have made radiomics methods, which predict tumor stage and grade from image features, a potential alternative for the non-invasive evaluation of bladder cancer Duan2010 (); qin2014adaptive (); Xu2017 (); Xu2017Preoperative (); Zhang2017Radiomics ().

Previous studies indicate that the inner and outer bladder walls (IW and OW), as well as attached tumor regions, in MRI images have great potential for reflecting tumor subtypes, properties and muscle invasiveness through useful radiomics descriptors Xu2017Preoperative (); Zhang2017Radiomics (); xiao20163d (); Xu2017 (); qin2014adaptive (); duan2012adaptive (). In order to precisely extract radiomics features from regions of interest for the quantitative evaluation of carcinomatous properties, an accurate and automated segmentation of the IW, OW and tumor regions is needed Duan2010 (); qin2014adaptive (); duan2012adaptive ().

Figure 1: Challenges in computer-assisted segmentation of T2-weighted bladder MR images. (a) Intensity inhomogeneity in the lumen. (b) Weak wall boundaries. (c) Complex background intensity distribution. (d) Disconnected tumor region in the lumen.
Reference Method Target
Li et al., 2004 li2004new () Markov random field (MRF) IW
Li et al., 2008 li2008segmentation () Markov random field (MRF) IW
Duan et al., 2010 Duan2010 () Coupled Level-sets IW/OW
Chi et al., 2011 chi2011segmentation () Coupled Level-sets IW/OW
Garnier et al., 2011 garnier2011bladder () Active region growing IW
Ma et al., 2011 ma2011novel () Geodesic active contour (GAC) + Shape-guided Chan-Vese IW/OW
Duan et al., 2012 duan2012adaptive () Coupled Level-set + Bladder wall thickness Prior Tumor
Han et al., 2013 han2013unified () Adaptive Markov random field (MRF) + Coupled Level-set IW/OW
Qin et al., 2014 qin2014adaptive () Coupled directional Level-sets IW/OW
Xiao et al., 2016 xiao20163d () Coupled directional Level-sets + Fuzzy c-means IW/OW/Tumor
Xu et al., 2017 xu2017simultaneous () Continuous max-flow + Bladder wall thickness Prior IW/OW
Table 1: Previously proposed methods for segmenting regions of interest in bladder cancer MRI.

However, the automatic delineation of the IW and OW in MRI images is a challenging task, due to important bladder shape variations, strong intensity inhomogeneity in urine caused by motion artifacts, weak boundaries and complex background intensity distributions (Figure 1) Duan2010 (); qin2014adaptive (); duan2012adaptive (). When the presence of cancer is further considered, the problem becomes much harder, as it introduces more variability across the population. This might explain why the literature on multi-region bladder segmentation remains scarce, with few techniques proposed to date (Table 1). Initial attempts used Markov random fields to tackle the segmentation of the IW li2004new (); li2008segmentation (). Garnier et al. garnier2011bladder () proposed a fast deformable model based on an active region growing strategy that solved the leakage issue of standard region growing algorithms. The algorithm combined an inflation force, which acts like a region growing process, with an internal force that constrains the shape of the surface. Nevertheless, it would be difficult to apply these approaches directly to OW segmentation due to the complex distribution of the tissues surrounding the bladder.

Several level-set based segmentation methods have been introduced to extract both the inner and outer bladder walls Duan2010 (); chi2011segmentation (); han2013unified (); qin2014adaptive (). In Duan2010 (), Duan et al. developed a coupled level-set framework which adopts a modified Chan-Vese model to locate both the IW and OW in T1-weighted MRI in a 2D slice-wise fashion. Chi et al. chi2011segmentation () applied a geodesic active contour (GAC) model to T2-weighted MRI images to segment the IW, and then coupled a constraint on the maximum wall thickness in T1-weighted MRI images to segment the OW. The limitation of this work arises from the difficulty of registering the slices between the two sequences. To overcome these limitations, Qin et al. qin2014adaptive () proposed an adaptive shape prior constrained level-set algorithm that evolves both the IW and OW simultaneously on T2-weighted images. Despite its precision, this algorithm can be sensitive to initialization. Extending these approaches, Xiao et al. xiao20163d () introduced a second step based on fuzzy c-means bezdek1984fcm () to include tumor segmentation in the pipeline. However, this extended method showed inconsistent results across datasets. A main limitation of these level-set approaches is their computational burden and the difficulty of defining a stopping criterion. As an alternative, a modified geodesic active contour (GAC) model and a shape-guided Chan-Vese model were proposed in ma2011novel () to segment the bladder walls. Recently, Xu et al. xu2017simultaneous () introduced a continuous max-flow framework with global convex optimization to achieve a more accurate segmentation of both the IW and OW. Nevertheless, an important limitation of all previous methods is their high sensitivity to initialization, which makes full automation of the segmentation very challenging. Further, most methods focus only on the bladder walls and cannot segment both the bladder walls and tumors simultaneously.

Deep learning has recently emerged as a powerful modeling technique, demonstrating significant improvements in various computer vision tasks such as image classification huang2017densely (), object detection redmon2016yolo9000 () and semantic segmentation yu2015multi (). In particular, convolutional neural networks (CNNs) have been applied with enormous success to many medical image segmentation problems DolzNeuro2017 (); Fechter_Esophagus (); dolz2018hyperdense (). Bladder segmentation has also been addressed with deep learning techniques; however, the imaging modality studied has been limited to computed tomography (CT) cha2016urinary (); cha2016bladder (); men2017automatic (). For example, Cha et al. cha2016urinary () proposed a convolutional neural network followed by a level-set to segment the IW and OW. Considering the significant advantages of MRI, including its high soft-tissue contrast and lack of ionizing radiation, it may be more suitable for characterizing bladder wall and tumor properties. Surprisingly, the application of deep learning to the multi-region segmentation of bladder cancer in MRI images remains, to the best of our knowledge, unexplored.

In light of these limitations, and inspired by the success of deep learning in medical image segmentation, we propose to address the task of multi-region bladder segmentation in MRI using a CNN. Specifically, we use a deep CNN that builds on UNet ronneberger2015u (), a well-established segmentation model that combines a contracting path and an expansive path to produce a high-resolution output of the same size as the input. To increase the receptive field spanned by the network, we propose to use a sequence of convolutional layers with progressively increasing dilation rates. This strategy allows us to span broader regions of the input images without resorting to large dilation rates, which can degrade segmentation performance. The current work is the first attempt to apply CNNs to the multi-region segmentation of bladder cancer in MRI.

II Methods

II.1 Fully convolutional neural networks

Convolutional neural networks (CNNs) are a special type of artificial neural network that learns a hierarchy of increasingly complex features through successive convolution, pooling and non-linear activation operations krizhevsky2012imagenet (); lecun1998gradient (). Originally designed for image recognition and classification, CNNs are now commonly used in semantic image segmentation. A naive approach follows a sliding-window strategy, where the regions defined by the window are processed independently. This technique presents two main drawbacks: reduced segmentation accuracy and low efficiency. An alternative approach, known as fully convolutional neural networks (FCNNs) FCN (), mitigates these limitations by considering the network as a single non-linear convolution trained in an end-to-end fashion. An important advantage of FCNNs over standard CNNs is that they can be applied to images of arbitrary size. Moreover, because the spatial map of class scores is obtained in a single dense inference step, FCNNs avoid redundant convolution operations, making them computationally more efficient.

The networks explored in this work are built on the UNet architecture, which has shown satisfactory performance in various medical segmentation tasks christ2016automatic (); cciccek20163d (); dong2017automatic (); sirinukunwattana2017gland (); zotti2017gridnet (). This network consists of a contracting and an expanding path: the former collapses an image into a set of high-level features, while the latter uses these features to construct a pixel-wise segmentation mask. The original architecture also introduced skip-connections between layers at the same level in both paths, by-passing information from early feature maps to the deeper layers of the network. These skip-connections allow high-level features and fine pixel-wise details to be incorporated simultaneously.

Unlike in natural images, targeting clinical structures in segmentation tasks requires some knowledge of the global context. This information may, for example, indicate how an organ is arranged with respect to other ones. Standard convolutions have difficulty integrating global context, even when pooling operations are sequentially added to the network. For instance, in the original UNet model, the receptive field spanned by the deepest layer is only 128×128 pixels. This means that the context of the entire image is not fully considered by the architecture when generating its final prediction. A straightforward solution for increasing the receptive field is to include additional pooling operations in the network. However, this strategy usually decreases performance, since relevant information is lost in the added down-sampling operations.

II.2 Dilated convolutions

The dilated convolution operator, also referred to in the literature as the atrous convolution, was originally developed for wavelet decomposition holschneider1990real (). Recently, Yu and Koltun yu2015multi () adopted this operation for semantic segmentation to increase the receptive field of deep CNNs, as an alternative to down-sampling feature maps. The main idea is to insert "holes" (i.e., zeros) between the pixels of convolutional kernels to increase the image resolution of intermediate feature maps, thus enabling dense feature extraction in deep CNNs with an enlarged field of view of convolutional kernels (Figure 2). This ultimately leads to more accurate predictions wolterink2016dilated (); wu2016high (); moeskops2017adversarial (); lopez2017dilated (); chen2018deeplab (); anthimopoulos2018semantic ().

Let us consider a convolutional kernel $k_l$ in layer $l$ with size $s_l \times s_l$. The receptive field of $k_l$, also known as its effective kernel size, can be defined as

$$\hat{s}_l = s_l + (s_l - 1)\,(r_l - 1),$$

where $r_l$ represents the dilation rate of kernel $k_l$, which specifies the number of zeros (or holes) placed between pixels. Note that, in standard convolutions, $r_l$ is equal to 1. Furthermore, the stride is assumed to be 1 for simplicity.
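For concreteness, the effective kernel size defined above can be computed in a few lines; the function name is illustrative, and values are shown for a 3×3 kernel under increasing dilation rates:

```python
def effective_kernel_size(s: int, r: int) -> int:
    """Effective size of an s x s kernel with dilation rate r (stride 1):
    inserting (r - 1) zeros between kernel elements enlarges the kernel
    from s to s + (s - 1) * (r - 1)."""
    return s + (s - 1) * (r - 1)

# A 3x3 kernel under dilation rates 1, 2, 4 and 8:
print([effective_kernel_size(3, r) for r in (1, 2, 4, 8)])  # -> [3, 5, 9, 17]
```

This makes explicit why large dilation rates span very broad regions: the effective size grows linearly with the rate while the number of parameters stays fixed.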

Figure 2: Examples of some dilation kernels and their effect on the receptive field. It is important to note that the number of parameters associated with each layer is identical.

II.3 Architecture details

In this study, we propose to use dilated convolutions in a CNN architecture based on UNet. To evaluate the impact of dilated convolutions on segmentation performance, three models are investigated. First, we build a baseline using the original UNet configuration with a few modifications (Section II.3.1). In the second network, the first standard convolution of each block in the baseline model is replaced by a dilated convolution (Section II.3.2). In the third model, each standard block of the baseline is replaced by the proposed progressive dilated convolutional block (Section II.3.3).

II.3.1 UNet baseline

Our baseline network has three main differences with respect to the original version of UNet. First, we employ convolutions with stride 2 instead of max-pooling in the contracting path. Second, the deconvolutional blocks in the decoding path are replaced by upsampling followed by convolutional blocks, which has been shown to improve performance badrinarayanan2017segnet () (Fig. 3a). Third, to obtain a more compact representation of the learned features, a bottleneck block with residual connections (Fig. 3b) is introduced between the contracting and expanding paths. The objective of these connections is to let information flow from the block's input to its output without modification, thus encouraging the path through the non-linearities to learn a residual representation of the input data he2016deep (). In addition, the number of kernels in the first convolutional block was reduced from 64 to 32, since no improvement was observed with the heavier model, yielding a more efficient network.
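The residual bottleneck can be sketched in PyTorch as follows; this is a minimal illustration of the residual-connection idea rather than the exact block used in the paper, and all class and attribute names are illustrative:

```python
import torch
import torch.nn as nn

class ResidualBottleneck(nn.Module):
    """Sketch of a bottleneck block with a residual connection: the input
    is added back to the output of the convolutional path, so the
    non-linear path only needs to learn a residual of its input."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv_path = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.activation = nn.PReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.activation(self.conv_path(x) + x)  # identity skip
```

Because the identity path carries the input unchanged, the convolutional path only has to model the deviation from the identity, which eases optimization in deep networks.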

Figure 3: Diagram depicting some of the blocks employed in our architectures. W denotes the width or number of feature maps.

Furthermore, each convolutional layer in the proposed models is followed by batch normalization ioffe2015batch (). By reducing variations between training samples in mini-batch learning, this technique has been shown to accelerate convergence of the parameter learning process and to make the model more robust at test time. In addition, all activation functions in our networks are parametric rectified linear units (PReLUs) he2015delving ().

II.3.2 Dilated UNet

Our first dilated CNN model follows the general architecture of UNet, but introduces a context module in each block of the encoding path. The context module contains a dilated convolution as the first operation of each block to systematically aggregate multi-scale contextual information. An inherent problem when employing dilated convolutions is gridding wang2017understanding () (Fig. 4, top). As zeros are padded between the pixels of a dilated convolutional kernel, the receptive field spanned by this kernel only covers an area with a checkerboard pattern, sampling only locations with non-zero values. This results in the loss of neighboring information, which might be relevant for effective learning. As the dilation rate increases, this issue worsens, since the convolution kernel becomes too sparse to capture any local information. To alleviate this problem, we follow the strategy proposed in other works paszke2016enet (); romera2017efficient (), where dilated convolutions are alternated with standard convolutions and dilation rates are progressively increased. Accordingly, the dilation rates in the convolutional blocks of this model are equal to 1, 2, 4 and 8, from shallow to deep layers, respectively.

Figure 4: Examples of some dilation kernels and their effect on the receptive field. It is important to note that the number of parameters associated with each layer is identical. Image from yu2015multi ().

II.3.3 UNet with progressive dilated convolutional blocks

Instead of gradually increasing the dilation rate across layers, we propose to increase it within each context module. The main idea is that the features learned at each block should capture multi-scale information. Therefore, within each block, the dilation rates are equal to 1, 2 and 4. In this way, we avoid large dilation rates, which span overly broad regions, while maintaining the same overall network receptive field.
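A minimal PyTorch sketch of such a progressive dilated block is shown below, assuming 3×3 kernels with padding equal to the dilation rate (so spatial resolution is preserved), each convolution followed by batch normalization and a PReLU activation as described above; class and argument names are illustrative:

```python
import torch
import torch.nn as nn

class ProgressiveDilatedBlock(nn.Module):
    """Sketch of one encoder block with progressive dilation rates (1, 2, 4).

    Each 3x3 convolution uses padding equal to its dilation rate so that
    the spatial size is preserved, and is followed by batch normalization
    and a PReLU activation."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        layers, channels = [], in_channels
        for rate in (1, 2, 4):  # progressive dilation within the block
            layers += [
                nn.Conv2d(channels, out_channels, kernel_size=3,
                          dilation=rate, padding=rate),
                nn.BatchNorm2d(out_channels),
                nn.PReLU(),
            ]
            channels = out_channels
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```

Because every dilation rate in the block stays small (at most 4), no single kernel becomes sparse enough to cause severe gridding, yet the three stacked convolutions together cover a receptive field comparable to a much larger dilation rate.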

Figure 5 gives a schematic of the proposed model. As shown, the network consists of two main components: an encoder and a decoder. The encoder path is composed of 16 convolutional layers that learn visual features from the input data, while the decoder path contains 17 convolutional layers responsible for creating the dense segmentation mask and recovering the original resolution. The dilation rates are shown at the bottom of the convolutional layers of the first block; in the rest of the network, blocks with the same color use the same dilation rates.

Figure 5: Overall framework of the proposed deep model. The convolutional blocks in the encoding path contain progressive dilated convolutions; each block includes dilated convolutions with rates 1, 2 and 4.

III Experiments

III.1 Materials

III.1.1 Patient population

The study was approved by the Ethics Committee of Tangdu Hospital of the Fourth Military Medical University. Informed consent was obtained from each enrolled subject. Sixty patients with pathologically confirmed BC, examined between October 2013 and May 2016, were included in this study.

III.1.2 Image acquisition

All subjects were examined with a clinical whole-body MR scanner (GE Discovery MR 750 3.0T) using a phased-array body coil before treatment. A high-resolution 3D axial Cube T2-weighted (T2W) MR sequence was adopted due to its high soft-tissue contrast and relatively fast image acquisition. Prior to scanning, each patient was asked to drink enough mineral water and then to wait for an adequate period of time so that the bladder was sufficiently distended. The acquisition time for the three-dimensional scan ranged from 160.456 to 165.135 s. The repetition and echo times were 2500 ms and 135 ms, respectively. Each scan contained 80 to 124 slices of 512×512 pixels, with a pixel resolution of 0.5×0.5 mm. The slice thickness was 1 mm, and the spacing between slices was also 1 mm.

III.1.3 Ground truth

For each dataset, the urine, bladder walls and tumor regions were manually delineated slice by slice by two experts – each with more than 9 years of experience in MR image interpretation – using a custom-developed MATLAB 2016b package. During delineation, all regions were first outlined independently by the experts, who were blinded to the clinico-pathological information of the patients. The boundaries of the outlined regions were then reviewed slice by slice in consensus, together with the corresponding clinical diagnosis of the patient and, when available, functional MRI data such as diffusion-weighted or dynamic contrast-enhanced MRI images.

III.1.4 Evaluation

Similarity between two segmentations can be assessed using several comparison metrics. Since each yields different information, their choice is important and must be considered in the appropriate context. In terms of overlap, the Dice similarity coefficient (DSC) dice1945measures () has been widely employed to compare volume similarity. The DSC for two volumes $A$ and $B$ is defined as

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}.$$
However, volume-based metrics generally lack sensitivity to the segmentation outline, and segmentations showing a high degree of spatial overlap may still present clinically relevant differences between their contours. This is particularly important in medical applications, such as radiation treatment planning, where contours serve as critical input to compute the delivered dose or to estimate prognostic factors. An additional analysis of the outline's fidelity is highly recommended, since any over-inclusion of the target region might lead to higher radiation exposure of healthy tissues and, vice-versa, any under-inclusion might leave tumor regions insufficiently irradiated. Thus, a distance-based metric, the average symmetric surface distance (ASSD), was also considered in our evaluation. The ASSD between contours $A$ and $B$ is defined as

$$\mathrm{ASSD}(A, B) = \frac{1}{|A| + |B|} \left( \sum_{a \in A} d(a, B) + \sum_{b \in B} d(b, A) \right),$$

where $d(a, B) = \min_{b \in B} \lVert a - b \rVert$ is the distance between point $a$ and contour $B$.
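For reference, both metrics can be sketched in a few lines of NumPy; the surface-distance implementation below is a naive O(N·M) version, suitable only for small point sets, and the function names are illustrative:

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """DSC(A, B) = 2 |A intersect B| / (|A| + |B|) for binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def assd(a_pts: np.ndarray, b_pts: np.ndarray) -> float:
    """Average symmetric surface distance between two contours given as
    (N, 2) and (M, 2) point arrays; d(p, S) is the minimum Euclidean
    distance from point p to point set S."""
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(a_pts) + len(b_pts))
```

A perfect overlap gives a DSC of 1 and an ASSD of 0; the two metrics are complementary, as a high DSC does not guarantee a small surface distance.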

III.2 Implementation details

The three networks were trained from scratch using the Adam optimizer, minimizing the cross-entropy between the predicted probability distributions and the ground truth. The initial learning rate was set to 1e and was halved after 20 epochs without improvement on the validation set. The three models were implemented in PyTorch paszke2017automatic (), and experiments were run on a machine equipped with an NVIDIA TITAN X GPU with 12 GB of memory.
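The training recipe above can be sketched in PyTorch as follows; the stand-in model and the initial learning rate are illustrative placeholders (the exact initial rate is not reproduced here), and the scheduler assumes the monitored quantity is a score to maximize, such as the validation DSC:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(1, 4, kernel_size=3, padding=1)    # stand-in for the segmentation CNN
criterion = nn.CrossEntropyLoss()                    # cross-entropy, as in the paper
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # initial rate: placeholder value

# Halve the learning rate after 20 epochs without improvement of the
# monitored validation metric (mode="max" assumes a score such as the DSC).
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=20)
```

In a training loop, `scheduler.step(val_dsc)` would be called once per epoch with the validation score, and the optimizer's learning rate is reduced automatically when the plateau condition is met.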

III.3 Results

The performance of the UNet-Progressive model was compared with that of the UNet-Base and UNet-Dilated models introduced in Sections II.3.1 and II.3.2. For these experiments, the dataset was split into training, validation and testing sets composed of 40, 5 and 15 patients, respectively. These splits remained the same for training and testing all models. Figure 6 depicts the evolution of the DSC measured on the validation set at different training epochs. In general, all models yield similar performance for the inner wall (IW) and outer wall (OW) regions. However, for tumor regions, the UNet-Progressive model obtained higher accuracy than the other models once training converged.

Figure 6: Evolution of DSC on validation set during training for the inner and outer wall, as well as tumor.

Tables 2 and 3 report the accuracy, in terms of DSC and ASSD, obtained by the compared models on the testing set. The three models achieve comparable results on inner and outer wall segmentation. However, as observed on the validation set, the UNet-Progressive model performed better than the baseline and UNet-Dilated when segmenting the tumor. This is particularly evident for the ASSD, where the differences between the three models are larger. Figure 7 details the distribution of DSC and ASSD values for the three models.

Model Inner Wall Outer Wall Tumor
UNet-Base 0.9839 ± 0.0030 0.8344 ± 0.0214 0.6276 ± 0.0963
UNet-Dilated 0.9844 ± 0.0030 0.8386 ± 0.0232 0.6791 ± 0.0818
UNet-Progressive 0.9836 ± 0.0033 0.8391 ± 0.0247 0.6856 ± 0.0827
Table 2: DSC (± 95% confidence interval) of the three models on the independent testing set.
Model Inner Wall Outer Wall Tumor
UNet-Base 0.3379 ± 0.0796 0.4503 ± 0.0919 3.7432 ± 1.6923
UNet-Dilated 0.3210 ± 0.0632 0.4238 ± 0.0725 3.4320 ± 1.9224
UNet-Progressive 0.3517 ± 0.0874 0.4299 ± 0.0859 2.8352 ± 1.1865
Table 3: ASSD (± 95% confidence interval) of the three models on the independent testing set.

The same trend is observed in Fig. 8, which depicts segmentation results for the baselines, the proposed model and the ground truth. These images illustrate the variable sizes of tumors, some of them quite small and thus hard to segment (e.g., the tumor in the bottom row). Once again, the three models achieve similar segmentations for the inner and outer walls, and differences arise when comparing the tumor segmentations. Even though the tumor is typically identified by all models, the proposed UNet-Progressive model produces the contours most consistent with the ground truth. UNet-Base underestimates the tumor region in two of the three examples and generates a blobby contour in the third case (top). UNet-Dilated improves on the version without dilated convolutions, but fails to separate the outer wall from carcinogenic regions in some cases (top of the figure). By employing progressive dilated modules, our UNet-Progressive network successfully differentiates tumor and outer walls, as shown in the top-right image of Fig. 8.

Figure 7: Distribution of the results for the three networks analyzed.
Figure 8: Examples of the segmentation results using the three models.
Inference times

In this section we compare the three architectures in terms of efficiency (Table 4). Inference times per 2D slice are very similar across the three deep models. Given that a volume contains between 80 and 124 2D slices, a whole volume is segmented in less than a second, regardless of the architecture.

Method Mean inference time (ms/2D slice)
UNet baseline 4.42
UNet dilated 5.22
UNet progressive dilated 5.87
Table 4: Inference times for the three analyzed architectures.

IV Discussion

In this study, a deep CNN model with progressive dilated convolutional modules was proposed to accurately segment multiple regions in MRI images of bladder cancer patients. The proposed network extends the well-known UNet model with dilated convolutions, which have been shown to improve the performance of deep CNNs over standard convolutions. We evaluated the proposed model on MRI datasets acquired from an in-house cohort of 60 patients with bladder cancer. Results demonstrate that the proposed approach achieves state-of-the-art results compared to existing approaches for this task, in a fraction of the time.

Evaluation of the three compared models showed similar results for inner and outer bladder wall segmentation. However, a large improvement was observed between models for tumor segmentation, particularly between the baseline and the models with dilated convolutions. The difference between the baseline model and the model with exponentially growing dilation rates can be explained by the larger receptive field of the latter, which leverages more contextual information. When using progressive dilated convolutions instead, the ability to span similarly sized regions while avoiding large dilation rates, which insert many holes between neighboring pixels, might explain the improved accuracy compared to the other dilated convolution model.

Research on automatic segmentation of multiple bladder regions in MRI images remains very limited (Table 1). Many works relied on level-sets to achieve this task. However, even though level-sets have dominated several fields in the past, they present some important drawbacks. First, these variational approaches are based on local optimization techniques, making them highly sensitive to initialization and image quality. Second, if multiple objects are embedded in another object, multiple initializations of the active contours are required, which is very time-consuming. Third, if gaps exist in the target, evolving contours will simply leak into those gaps and represent objects with incomplete contours. Fourth, processing times can be prohibitive, particularly in medical applications where segmentation is typically performed on volumes. As reported in previous works, segmentation times usually exceed 20 minutes for a single 3D volume. In this work, we have demonstrated that deep models, i.e., CNNs, can overcome these limitations and achieve satisfactory results for the task at hand.

Multiple metrics have been reported in the literature to evaluate the performance of bladder segmentation approaches. Furthermore, some works only report qualitative results, merely indicating whether segmentations are good garnier2011bladder (); han2013unified (), which is subject to user interpretation. This makes it difficult to carry out a fair and complete comparison between the proposed model and previous approaches. Ma et al. ma2011novel () reported a mean DSC of 0.97 for the IW, but performance on the OW was not assessed. More recently, Qin et al. qin2014adaptive () evaluated their method on 11 subjects, reporting mean DSC values of 0.96 and 0.71, and average surface distances (ASD) of 1.45 and 1.94, for the IW and OW, respectively. In another study, Xu et al. Xu2017 () achieved a mean DSC of 0.87 measured over the IW and OW. Thus, in light of these results and the advantages of the proposed model with respect to state-of-the-art methods, we believe that approaches similar to the one proposed in this work should henceforth be considered for the segmentation of BC images.

Although our results demonstrate the high performance of the proposed models, there are some regions where segmentation might not be satisfactory in a clinical setting (e.g., Fig. 9). To improve CNN-based segmentation, recent works have considered casting the probability maps from the CNN as unary potentials in an energy minimization framework (Fechter_Esophagus; chen2018deeplab; kamnitsas2017efficient). In these works, contour length is typically employed as a regularizer in the energy function. Nevertheless, more complex regularizers, e.g., convexity (gorelick2017convexity) or compactness (dolz2017unbiased), have been shown to boost the performance of segmentation techniques. Employing such regularizers could further improve performance in the current application, given the compact shape of the bladder.
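A minimal sketch of such an energy, assuming a binary problem with a pixel-wise foreground probability map from the CNN and a 4-connected length regularizer, could look as follows; the weight `lam` is a hypothetical hyperparameter, not a value used in this work:

```python
import numpy as np

def segmentation_energy(prob, labels, lam=1.0):
    """Unary term (negative log-likelihood of the CNN probabilities
    under the assigned labels) plus a length regularizer that counts
    label transitions between 4-connected neighbours."""
    eps = 1e-8
    # unary: -log p for foreground pixels, -log(1-p) for background
    unary = -np.log(np.where(labels == 1, prob, 1.0 - prob) + eps).sum()
    # length: number of horizontal + vertical label discontinuities
    length = (np.abs(np.diff(labels, axis=0)).sum()
              + np.abs(np.diff(labels, axis=1)).sum())
    return unary + lam * length

# A smooth labeling agreeing with confident probabilities scores lower
# (better) than a fragmented, checkerboard-like labeling
prob = np.full((2, 2), 0.9)
smooth = np.ones((2, 2), dtype=int)
checker = np.array([[1, 0], [0, 1]])
print(segmentation_energy(prob, smooth) < segmentation_energy(prob, checker))
```

In practice this energy would be minimized with graph cuts or a CRF solver rather than evaluated directly; the sketch only illustrates how the CNN probabilities enter as unary potentials and how length penalizes boundary irregularity.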

Figure 9: Some visual examples of failure on segmentations. Ground truth are shown in the green frame (top), whereas CNN segmentations are depicted in the red frame (bottom).

The annotated data used in this study were limited and acquired with a single scanner using the same imaging parameters, which may reduce the generalization of the proposed scheme and impair its overall segmentation performance. A larger validation cohort, including datasets acquired at multiple clinical centers with different scanners and imaging parameters, would further demonstrate its potential in real clinical applications.

Even though segmentation is a fundamental task in the medical field, it rarely represents the final objective of the clinical pipeline. In the assessment of bladder cancer patients, segmentation of the IW, OW and tumor is employed to evaluate the muscle invasiveness and grade of BC, which play a crucial role in treatment decision and prognosis (Xu2017Preoperative; Zhang2017Radiomics; Liu2017Relationship; Xu2017). In future work, we aim to combine the proposed multi-region segmentation scheme with a radiomics strategy for automatic pre-operative evaluation of BC. In addition, we expect to verify whether significant differences exist in radiomics predictions when using automatic rather than manual delineations.

V Conclusion

We proposed a multi-region semantic segmentation approach with progressive dilated convolution blocks for bladder cancer in MRI. Progressive dilated blocks provide the same receptive field as standard dilated blocks but with lower dilation rates. The proposed network achieved higher segmentation accuracy than its counterparts, particularly when segmenting tumors. Moreover, the proposed model outperformed state-of-the-art methods for the task at hand, bringing three important advantages: i) it segments multiple regions simultaneously, ii) it requires no contour initialization, and iii) inference is typically two to three orders of magnitude faster. Therefore, deep CNNs in general, and the proposed network in particular, are well suited to this task.
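The receptive-field claim above can be checked with simple arithmetic: for a stack of stride-1 dilated convolutions, the effective receptive field is 1 + Σᵢ (k − 1)·dᵢ. The rates below (1, 2, 4 within a block) are illustrative assumptions rather than the exact configuration of the proposed network:

```python
def receptive_field(kernel_size, dilations):
    """Effective receptive field of stride-1 dilated convolutions
    stacked in sequence: RF = 1 + sum_i (k - 1) * d_i."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Three 3x3 convolutions with progressively increasing dilations
print(receptive_field(3, [1, 2, 4]))  # 15
# A single 3x3 convolution needs dilation 7 to cover the same field,
# at the cost of a much sparser sampling of the input ("gridding")
print(receptive_field(3, [7]))        # 15
```

This illustrates why progressive dilations attain an extensive receptive field without resorting to the large individual rates that cause gridding artifacts.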


Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through its Discovery Grant program, the National Natural Science Foundation of China under grant No. 81230035, the National Key Research and Development Program of China under grant No. 2017YFC0107400, a key project supported by the Military Science and Technology Foundation under grant No. BWS14C030, and the ÉTS Research Chair on Artificial Intelligence in Medical Imaging.

Disclosure of Conflicts of Interest: The authors have no relevant conflicts of interest to disclose.



  • (1) W. J. Alfred, T. Lebret, E. M. Compérat, N. C. Cowan, S. M. De, H. M. Bruins, V. Hernández, E. L. Espinós, J. Dunn, and M. Rouanne. Updated 2016 EAU guidelines on muscle-invasive and metastatic bladder cancer. European Urology, 2016.
  • (2) American Cancer Society. Cancer Facts & Figures 2016. Cancer Facts & Figures 2016, pages 1–9, 2016.
  • (3) M. Anthimopoulos, S. Christodoulidis, L. Ebner, T. Geiser, A. Christe, and S. Mougiakakou. Semantic segmentation of pathological lung tissue with dilated fully convolutional networks. arXiv preprint arXiv:1803.06167, 2018.
  • (4) S. Antoni, J. Ferlay, I. Soerjomataram, A. Znaor, A. Jemal, and F. Bray. Bladder cancer incidence and mortality: A global overview and recent trends. European Urology, 71(1):96, 2017.
  • (5) V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
  • (6) J. C. Bezdek, R. Ehrlich, and W. Full. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2-3):191–203, 1984.
  • (7) Cancer Genome Atlas Research Network and others. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature, 507(7492):315–322, 2014.
  • (8) K. H. Cha, L. Hadjiiski, R. K. Samala, H.-P. Chan, E. M. Caoili, and R. H. Cohan. Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets. Medical physics, 43(4):1882–1896, 2016.
  • (9) K. H. Cha, L. M. Hadjiiski, R. K. Samala, H.-P. Chan, R. H. Cohan, E. M. Caoili, C. Paramagul, A. Alva, and A. Z. Weizer. Bladder cancer segmentation in CT for treatment response assessment: application of deep-learning convolution neural network—a pilot study. Tomography: a journal for imaging research, 2(4):421, 2016.
  • (10) L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2018.
  • (11) J. W. Chi, M. Brady, N. R. Moore, and J. A. Schnabel. Segmentation of the bladder wall using coupled level set methods. In Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on, pages 1653–1656. IEEE, 2011.
  • (12) W. Choi, S. Porten, S. Kim, D. Willis, E. R. Plimack, B. Roth, T. Cheng, M. Tran, I.-l. Lee, J. Melquist, J. Bondaruk, T. Majewski, S. Zhang, S. Pretzsch, and K. Baggerly. Identification of distinct basal and luminal subtypes of muscle-invasive bladder cancer with different sensitivities to frontline chemotherapy. Cancer Cell, 25(2):152–165, 2015.
  • (13) P. F. Christ, M. E. A. Elshaer, F. Ettlinger, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, M. Armbruster, F. Hofmann, M. D’Anastasi, et al. Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 415–423. Springer, 2016.
  • (14) Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.
  • (15) L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
  • (16) J. Dolz, I. Ben Ayed, and C. Desrosiers. Unbiased shape compactness for segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 755–763. Springer, 2017.
  • (17) J. Dolz, C. Desrosiers, and I. Ben Ayed. 3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study. NeuroImage, 2017.
  • (18) J. Dolz, K. Gopinath, J. Yuan, H. Lombaert, C. Desrosiers, and I. Ben Ayed. Hyperdense-Net: A hyper-densely connected CNN for multi-modal image segmentation. arXiv preprint arXiv:1804.02967, 2018.
  • (19) H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo. Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In Annual Conference on Medical Image Understanding and Analysis, pages 506–517. Springer, 2017.
  • (20) C. Duan, Z. Liang, S. Bao, H. Zhu, S. Wang, G. Zhang, J. J. Chen, and H. Lu. A coupled level set framework for bladder wall segmentation with application to MR cystography. IEEE Transactions on Medical Imaging, 29(3):903–915, 2010.
  • (21) C. Duan, K. Yuan, F. Liu, P. Xiao, G. Lv, and Z. Liang. An adaptive window-setting scheme for segmentation of bladder tumor surface via MR cystography. IEEE Transactions on Information Technology in Biomedicine, 16(4):720–729, 2012.
  • (22) T. Fechter, S. Adebahr, D. Baltas, I. Ben Ayed, C. Desrosiers, and J. Dolz. Esophagus segmentation in CT via 3D fully convolutional neural network and random walk. Medical physics, 44(12):6341–6352, 2017.
  • (23) C. Garnier, W. Ke, and J.-L. Dillenseger. Bladder segmentation in MRI images using active region growing model. In Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE, pages 5702–5705. IEEE, 2011.
  • (24) L. Gorelick, O. Veksler, Y. Boykov, and C. Nieuwenhuis. Convexity shape prior for binary segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2):258–271, 2017.
  • (25) H. Han, L. Li, C. Duan, H. Zhang, Y. Zhao, and Z. Liang. A unified EM approach to bladder wall segmentation with coupled level-set constraints. Medical image analysis, 17(8):1192–1205, 2013.
  • (26) K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
  • (27) K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • (28) M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets, pages 286–297. Springer, 1990.
  • (29) G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, volume 1, page 3, 2017.
  • (30) S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456, 2015.
  • (31) A. M. Kamat, N. M. Hahn, J. A. Efstathiou, S. P. Lerner, P.-u. Malmström, W. Choi, C. C. Guo, and Y. Lotan. Bladder cancer. The Lancet, 388, 2016.
  • (32) K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis, 36:61–78, 2017.
  • (33) M. A. Knowles and C. D. Hurst. Molecular biology of bladder cancer : new insights into pathogenesis and clinical diversity. Nature Publishing Group, 15(1):25–41, 2015.
  • (34) A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • (35) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • (36) L. Li, Z. Liang, S. Wang, H. Lu, X. Wei, M. Wagshul, M. Zawin, E. J. Posniak, and C. S. Lee. Segmentation of multispectral bladder MR images with inhomogeneity correction for virtual cystoscopy. In Medical Imaging 2008: Physiology, Function, and Structure from Medical Images, volume 6916, page 69160U. International Society for Optics and Photonics, 2008.
  • (37) L. Li, Z. Wang, X. Li, X. Wei, H. L. Adler, W. Huang, S. A. Rizvi, H. Meng, D. P. Harrington, and Z. Liang. A new partial volume segmentation approach to extract bladder wall for computer-aided detection in virtual cystoscopy. In Medical Imaging 2004: Physiology, Function, and Structure from Medical Images, volume 5369, pages 199–207. International Society for Optics and Photonics, 2004.
  • (38) Y. Liu, X. Xu, L. Yin, X. Zhang, L. Li, and H. Lu. Relationship between glioblastoma heterogeneity and survival time: An MR imaging texture analysis. Ajnr Am J Neuroradiol, 38(9), 2017.
  • (39) J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
  • (40) M. M. Lopez and J. Ventura. Dilated convolutions for brain tumor segmentation in MRI scans. In International MICCAI Brainlesion Workshop, pages 253–262. Springer, 2017.
  • (41) Z. Ma, R. N. Jorge, T. Mascarenhas, and J. M. R. Tavares. Novel approach to segment the inner and outer boundaries of the bladder wall in T2-weighted magnetic resonance images. Annals of biomedical engineering, 39(8):2287–2297, 2011.
  • (42) K. Men, J. Dai, and Y. Li. Automatic segmentation of the clinical target volume and organs at risk in the planning CT for rectal cancer using deep dilated convolutional neural networks. Medical physics, 44(12):6377–6389, 2017.
  • (43) P. Moeskops, M. Veta, M. W. Lafarge, K. A. Eppenhof, and J. P. Pluim. Adversarial training and dilated convolutions for brain MRI segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 56–64. Springer, 2017.
  • (44) A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello. Enet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147, 2016.
  • (45) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in pytorch. 2017.
  • (46) X. Qin, X. Li, Y. Liu, H. Lu, and P. Yan. Adaptive shape prior constrained level sets for bladder MR image segmentation. IEEE journal of biomedical and health informatics, 18(5):1707–1716, 2014.
  • (48) J. Redmon and A. Farhadi. YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017.
  • (49) E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo. Efficient convnet for real-time semantic segmentation. In Intelligent Vehicles Symposium (IV), 2017 IEEE, pages 1789–1794. IEEE, 2017.
  • (50) O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
  • (51) K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, P.-A. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez, et al. Gland segmentation in colon histology images: The GlaS Challenge contest. Medical image analysis, 35:489–502, 2017.
  • (52) P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.
  • (53) J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum. Dilated convolutional neural networks for cardiovascular MR segmentation in congenital heart disease. In Reconstruction, Segmentation, and Analysis of Medical Images, pages 95–102. Springer, 2016.
  • (54) S. Woo, C. H. Suh, S. Y. Kim, J. Y. Cho, and S. H. Kim. Diagnostic performance of MRI for prediction of muscle-invasiveness of bladder cancer: A systematic review and meta-analysis. European Journal of Radiology, pages 46–55, 2017.
  • (55) Z. Wu, C. Shen, and A. v. d. Hengel. High-performance semantic segmentation using very deep fully convolutional networks. arXiv preprint arXiv:1604.04339, 2016.
  • (56) D. Xiao, G. Zhang, Y. Liu, Z. Yang, X. Zhang, L. Li, C. Jiao, and H. Lu. 3D detection and extraction of bladder tumors via MR virtual cystoscopy. International journal of computer assisted radiology and surgery, 11(1):89–97, 2016.
  • (57) X. Xu, Y. Liu, X. Zhang, Q. Tian, Y. Wu, G. Zhang, J. Meng, Z. Yang, and H. Lu. Preoperative prediction of muscular invasiveness of bladder cancer with radiomic features on conventional MRI and its high-order derivative maps. Abdominal Radiology, 42(7):1–10, 2017.
  • (58) X. Xu, X. Zhang, Y. Liu, Q. Tian, G. Zhang, Z. Yang, H. Lu, and J. Yuan. Simultaneous segmentation of multiple regions in 3D bladder MRI by efficient convex optimization of coupled surfaces. In International Conference on Image and Graphics, pages 528–542. Springer, 2017.
  • (59) X. Xu, X. Zhang, Q. Tian, G. Zhang, Y. Liu, G. Cui, J. Meng, Y. Wu, T. Liu, Z. Yang, and H. Lu. Three-dimensional texture features from intensity and high-order derivative maps for the discrimination between bladder tumors and wall tissues via MRI. International Journal of Computer Assisted Radiology and Surgery, 2017.
  • (60) F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
  • (61) X. Zhang, X. Xu, Q. Tian, B. Li, Y. Wu, Z. Yang, Z. Liang, Y. Liu, G. Cui, and H. Lu. Radiomics assessment of bladder cancer grade using texture features from diffusion-weighted imaging. Journal of Magnetic Resonance Imaging Jmri, 46, 2017.
  • (62) C. Zotti, Z. Luo, O. Humbert, A. Lalande, and P.-M. Jodoin. GridNet with automatic shape prior registration for automatic MRI cardiac segmentation. arXiv preprint arXiv:1705.08943, 2017.