Weakly-Supervised White and Grey Matter Segmentation in 3D Brain Ultrasound


Beatrice Demiray* (Technische Universität München, Munich, Germany), Julia Rackerseder* (Technische Universität München, Munich, Germany), Stevica Bozhinoski (Technische Universität München, Munich, Germany), Nassir Navab (Technische Universität München, Munich, Germany; Johns Hopkins University, Baltimore, USA)
Abstract

* Both authors contributed equally to this work.

Although the segmentation of brain structures in ultrasound helps initialize image-based registration, assist brain shift compensation, and provide interventional decision support, the task of segmenting grey and white matter in cranial ultrasound is very challenging and has not been addressed yet. We train a multi-scale fully convolutional neural network simultaneously for two classes in order to segment real clinical 3D ultrasound data. Parallel pathways working at different levels of resolution account for high-frequency speckle noise as well as global 3D image features. To ensure reproducibility, the publicly available RESECT dataset is utilized for training and cross-validation. In the absence of a ground truth, we train with weakly annotated labels obtained by label transfer from MRI to US, which is prone to a residual but inevitable registration error. To further improve results, we perform transfer learning using synthetic US data. The resulting method achieves Dice scores of 0.7080, 0.8402 and 0.9315 for grey matter, white matter and background, respectively. Our proposed methodology sets a new standard for white and grey matter segmentation in 3D intracranial ultrasound.

We received funding from the Horizon 2020 program EDEN2020 under grant agreement No 688279 and from the German Research Foundation (DFG) funded SFB 824. We gratefully acknowledge the support of the GPU grant program from NVIDIA.

1 Introduction

The standard modality for imaging brain tissue is magnetic resonance imaging (MRI), where segmentation of grey and white matter (GM and WM) can be performed automatically and used for many clinical purposes, such as detecting brain abnormalities [6] and assessing the severity of dementia [4]. Although its advantages have long been proven, intra-operative MRI has not been introduced into many operating theaters, due to time requirements and workflow disruptions. Ultrasound (US) does not have these constraints; however, its images are usually harder to interpret, due to poor contrast, artifacts, low anatomical detail and induced deformations. Compared to MRI and CT, medical US suffers from a limited field of view and from speckle noise, caused by scattering of the US beam at tissue inhomogeneities. This can decrease the contrast between anatomically distinct structures, reducing the ability of human observers to resolve fine detail, which complicates and sometimes inhibits expert annotation. It also impedes the robust automated computation of segmentation masks in US.

Nevertheless, intra-operative US (iUS) is increasingly used in the neurosurgical setting, owing to its ease of integration into the clinical routine and its small footprint. It is thus no longer used only for registration tasks, but also for tissue classification, tracking, intra-operative monitoring of patient and surgical processes, drug perfusion, and decision support systems. In this setting, segmentation of brain structures can initialize image-based registration [12] or assist brain shift and tracking error compensation [10], and thus provide interventional decision support. Nonetheless, segmentation of soft tissue in US, especially intracranially, is a demanding task for algorithms as well as for experts. Combined with the small size of most medical datasets, this complicates the creation of training data of sufficient quality even further. As a solution, labels can be transferred from other modalities that are easier to annotate or can even be annotated automatically, e.g. CT or MRI.

Although challenging, the segmentation of structures in brain US has extensive relevance in clinical practice and research. Early diagnosis of Parkinson's disease is facilitated by midbrain segmentation from transcranial US (TCUS); Hough-CNN [8] automatically segments the midbrain in TCUS and deep brain regions in MRI. Another active field of research on cranial US segmentation is the automatic detection of ventricles in infants for the diagnosis of brain anomalies, where US can spare children the sedation required to avoid motion artifacts in MRI. A first fully automatic approach to determine the volume of the ventricles [11] is based on level sets; however, it is difficult to integrate into the clinical routine, due to a mean processing time of 54 minutes per volume. Recent developments in deep learning enable the segmentation of ventricles in infant US: a combination of U-Net and SegNet was applied to 2D US [15], followed by a solely U-Net-based implementation achieving segmentation in 3D US [7].

For the segmentation of WM and GM in MRI, standard approaches rely on the fuzzy C-means clustering technique [4]. Methods tackling this problem with deep learning include SegNet [1] and VoxResNet [2]. As a pre-processing step for lesion detection, DeepMedic [5] segments GM, WM and ventricles from MRI. The segmentation of WM and GM in MRI thus appears to be a well-addressed problem; this is not the case for US.

Figure 1: DeepMedicUS with three pathways at different input resolutions.

To the best of our knowledge, we present the first work tackling the challenging problem of GM and WM segmentation in 3D US. We use and evaluate an extended DeepMedic [5] architecture on the publicly available RESECT dataset [16]. The original network implementation is available online, and we provide an elaborate description of all hyperparameters, settings and experimental setups. By using a public dataset, this work allows full reproducibility, easy benchmarking and comparison within the research community. To gain insight into how to address the specific appearance of ultrasound data, we perform a study on two different activation functions to evaluate their behavior in fully convolutional networks applied to US image analysis. We show that pre-training network models on synthetic US data can improve their performance. We present unparalleled results with average Dice scores of 0.7080, 0.8402 and 0.9315 for GM, WM and background, respectively. Given that ground truth labels are unavailable, we train only with weakly annotated labels, which are prone to a residual but inevitable registration error. Yet, for some patient cases the model even improves over the uncertain transferred labels.

2 Methods

In this work we propose a CNN designed for the domain of US, inspired by the architecture presented by Kamnitsas et al. [5]. Their segmentation network for MRI outputs label probability maps, regularized by a Conditional Random Field (CRF) to produce highly accurate segmentation labels. To account for the high-frequency speckle present in ultrasound images, as well as for more global image features in the 3D US data, we add another parallel pathway working at an even lower resolution. In addition, we implement a cyclic learning rate and empirically adapt the size of the hidden fully convolutional layers to further tailor our network to US. We train on two classes simultaneously, with the goal of segmenting multiple anatomical regions in 3D data acquired in a real clinical setting. In the following, we present the dataset preparation and elaborate on the network architecture.

Dataset Preparation We utilize T1-weighted MRI and co-registered pre-resection reconstructed 3D US volumes of the public RESECT dataset [16] from 23 patients with low-grade glioma. Patient numbering is kept according to the original publication. Even for clinical experts it is challenging to distinguish GM and WM in US, rendering it impossible to obtain a ground truth. This leaves us with the challenge of training our network in a weakly supervised fashion, at the risk of introducing a residual but inevitable registration error. Thus, for each US volume we generate labelmaps from segmentations in the co-registered MRI volumes via label transfer, as sketched below. To generate the MRI labelmaps, skull stripping and cortical parcellation of the MRI volumes are performed automatically with FreeSurfer (http://surfer.nmr.mgh.harvard.edu/fswiki/). The parcellation labelmap is converted to GM, WM and background (BG) annotations. MRI and US volumes are co-registered rigidly. To increase the quality of the propagated labelmaps, we additionally employ an affine registration of MRI and US using the LC² metric [3] in the ImFusion Suite (Version 1.1.8, ImFusion GmbH, Munich, Germany, https://www.imfusion.com/). The annotations for GM, WM and BG are mapped to the US volumes and resampled to a 0.4 mm isotropic resolution. This yields a label distribution of 23 % BG, 30 % GM and 47 % WM.
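As an illustration of the label transfer step, a minimal SimpleITK sketch is given below. It assumes the MRI-to-US registration has already been computed; the file names and the exported transform are hypothetical, not part of the original pipeline.

```python
import SimpleITK as sitk

# Load the FreeSurfer-derived MRI labelmap and the reconstructed 3D US volume.
# File names are placeholders for illustration.
labelmap = sitk.ReadImage("mri_labels.nii.gz")
us_volume = sitk.ReadImage("us_reconstruction.nii.gz")

# Affine MRI-to-US transform, e.g. exported from the registration step.
transform = sitk.ReadTransform("mri_to_us_affine.tfm")

# Define a 0.4 mm isotropic reference grid covering the US volume.
size = [int(round(sz * sp / 0.4))
        for sz, sp in zip(us_volume.GetSize(), us_volume.GetSpacing())]
reference = sitk.Image(size, sitk.sitkUInt8)
reference.SetSpacing((0.4, 0.4, 0.4))
reference.SetOrigin(us_volume.GetOrigin())
reference.SetDirection(us_volume.GetDirection())

# Nearest-neighbor interpolation keeps the labels discrete during warping.
warped_labels = sitk.Resample(labelmap, reference, transform,
                              sitk.sitkNearestNeighbor, 0)
sitk.WriteImage(warped_labels, "us_labels.nii.gz")
```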

Simulated US Sweeps To cope with the small dataset size, we evaluate pre-training the network on synthetic US sweeps. The simulation allows approximating different imaging conditions by modifying the imaging parameters and the acoustic properties of the tissue types. Synthetic volumes are generated with a hybrid ray-tracing and convolutional method [13], based on the MRI labelmaps. A large number of sweeps is generated and filtered, resulting in five high-quality volumes per patient.

Pre-Processing and Data Augmentation US volumes and simulated sweeps are resampled to an isotropic voxel size of 0.4 mm. Standardization is performed by subtracting the mean intensity value and dividing by the standard deviation, ensuring stable behavior during training. US volumes are masked to the areas that contain image information only. To tackle the challenges associated with small, homogeneous datasets, and to encourage convergence to robust models while reducing overfitting, we augment the data per patch during training with a certain probability: patches are randomly flipped along one of the main axes and rotated by 90° around one arbitrarily chosen main axis. A balanced distribution of foreground and background classes is enforced at a ratio of 1:1 during training.
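A minimal NumPy sketch of the standardization and per-patch augmentation described above; the helper names and the probability p are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def standardize(volume):
    """Zero-mean, unit-variance standardization per volume."""
    return (volume - volume.mean()) / volume.std()

def augment_patch(patch, labels, p=0.5, rng=np.random.default_rng()):
    """Randomly flip and rotate a 3D patch and its labelmap in tandem."""
    # Flip along one randomly chosen main axis.
    if rng.random() < p:
        axis = rng.integers(0, 3)
        patch = np.flip(patch, axis=axis)
        labels = np.flip(labels, axis=axis)
    # Rotate by 90 degrees around one arbitrarily chosen main axis,
    # i.e. within the plane spanned by the two remaining axes.
    if rng.random() < p:
        axis = rng.integers(0, 3)
        plane = tuple(a for a in range(3) if a != axis)
        patch = np.rot90(patch, k=1, axes=plane)
        labels = np.rot90(labels, k=1, axes=plane)
    return patch, labels
```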

Figure 2: Examples of best and worst performing patients in the dataset. We show US, label map (WM in white, GM in grey and BG in black), prediction by the network, and probability maps for GM, WM and BG (probability indicated by increasing intensity).

Network Architecture Conventional CNNs tend to lose spatial information in their last, fully connected layers, which conflicts with the high importance of this information in semantic segmentation. Fully convolutional networks (FCN) can mitigate this problem by replacing the fully connected layers with convolutional ones and applying learned up-sampling, e.g. transposed convolutions, to low-resolution feature maps. We utilize a multi-scale CNN architecture that has achieved promising results for anatomical whole-brain [1] and lesion segmentation [9] in MR brain images. As a baseline for comparison, the original architecture [5] was implemented. We refer to the proposed network as DeepMedicUS. We use a three-pathway approach similar to [5], with eight convolutional layers per pathway, followed by concatenation blocks and three fully convolutional layers (Fig. 1). Batch normalization is applied after each convolutional layer and before each activation layer. The first pathway takes the 3D input patch at the original resolution, while the two parallel pathways downsample it by factors of 3 and 5, respectively. This ensures that global features are captured without straining memory. The kernel size is set to [3,3,3], except for the two final fully convolutional layers with size [1,1,1]. Interconnecting different network levels preserves high-level image features and speeds up training; thus, residual connections are introduced at layers 4, 6 and 8. Cross entropy is used as the loss function, and the network is trained with mini-batch gradient descent using the Adam optimizer. To minimize the need for manual refinement of the learning rate, we implement a cyclic learning rate and derive the optimal parameters as described in [14]; this lowers the risk of slow convergence or divergence. Tailored to our model, we use a triangular policy with a base learning rate of , a maximum bound of and a step size of 1600. To improve segmentation accuracy, a CRF is used as a post-processing step to integrate smoothness terms; variable updates can thus be executed efficiently using Gaussian filtering in feature space to maximize label consistency between similar pixels.
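To make the multi-pathway idea concrete, here is a strongly simplified Keras sketch of a three-pathway FCN. It is not the authors' implementation: the filter counts and patch size are assumptions, the downsampled pathways are approximated with pooling and up-sampling, and the residual connections and CRF post-processing described above are omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

def pathway(x, n_layers=8, filters=30):
    # Stack of 3x3x3 convolutions; batch normalization before each activation.
    for _ in range(n_layers):
        x = layers.Conv3D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.PReLU(shared_axes=[1, 2, 3])(x)
    return x

def build_model(patch=60, n_classes=3):
    inp = tf.keras.Input(shape=(patch, patch, patch, 1))
    # Pathway 1: original resolution.
    p1 = pathway(inp)
    # Pathways 2 and 3: inputs downsampled by factors 3 and 5,
    # then up-sampled back for concatenation with pathway 1.
    p2 = layers.UpSampling3D(3)(pathway(layers.AveragePooling3D(3)(inp)))
    p3 = layers.UpSampling3D(5)(pathway(layers.AveragePooling3D(5)(inp)))
    x = layers.Concatenate()([p1, p2, p3])
    # Fully convolutional classification head: 1x1x1 convolutions.
    x = layers.Conv3D(60, 1, activation="relu")(x)
    x = layers.Conv3D(n_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inp, x)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy")
```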

Experiments We analyze the effect of two activation functions: 1) the rectified linear unit (ReLU), where all negative values are set to zero, and 2) the parametric ReLU (PReLU), which adaptively learns a slope for negative inputs, preventing neurons from dying. For complex US data, we expect the latter to improve performance at negligible extra computational cost. In addition, we analyze the effect of transfer learning using synthetic US data, of which we have five times the amount of real data. For this purpose, we pre-train a model from scratch on the synthetic data; we then fine-tune and test this model exclusively on real data. To examine the effect of different pre-training dataset sizes, we repeat this experiment with 100 %, 50 % and 25 % of the available synthetic data. In order to estimate the performance of our models on unseen data despite the small dataset, we evaluate all trained models with N-fold cross-validation, separating cases at the patient level. We randomly separate the 23 patient cases into folds containing [5,5,5,4,4] cases, respectively, and keep this distribution consistent across all experiments to ensure comparability. All architectures are implemented in the TensorFlow framework. All training and testing is performed on an NVIDIA TITAN X Pascal GPU (CUDA 10).
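For reference, the triangular cyclic learning rate policy [14] used in the Methods can be expressed in a few lines. This is our own sketch: the step size of 1600 comes from the text, while the base and maximum learning rates in the usage example are placeholders, since the exact values are not reproduced here.

```python
import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate policy (Smith [14]).

    The rate ramps linearly from base_lr to max_lr over step_size
    iterations, then back down, repeating every 2 * step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Example with placeholder bounds: halfway up the first cycle.
lr = triangular_lr(iteration=800, base_lr=1e-4, max_lr=1e-3, step_size=1600)
```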

Network           Activation  Class  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Avg ± Std
Kamnitsas et al.  PReLU       GM     0.5216  0.4522  0.4376  0.5086  0.4542  0.4749 ± 0.0376
                              WM     0.4899  0.4507  0.4187  0.4874  0.4908  0.4675 ± 0.0320
                              BG     0.6271  0.5832  0.4540  0.5551  0.5986  0.5636 ± 0.0666
DeepMedicUS       PReLU       GM     0.7025  0.7137  0.6608  0.7105  0.7524  0.7080 ± 0.0327
                              WM     0.8343  0.8790  0.7885  0.8271  0.8719  0.8402 ± 0.0367
                              BG     0.9485  0.9557  0.8805  0.9247  0.9480  0.9315 ± 0.0308
DeepMedicUS       ReLU        GM     0.5834  0.5234  0.5228  0.4875  0.5897  0.5414 ± 0.0438
                              WM     0.7852  0.7197  0.6495  0.6515  0.7777  0.7167 ± 0.0656
                              BG     0.9115  0.8846  0.8036  0.8569  0.8969  0.8707 ± 0.0425
Table 1: Comparison of Dice scores for different network architectures and activation functions. Results are shown per cross-validation test fold and per label class.

3 Results and Discussion

The quantitative results of our network comparison are shown in Table 1. On average, DeepMedicUS with PReLU achieves the highest Dice scores, at 0.7080, 0.8402 and 0.9315 for GM, WM and BG, respectively. The additional pathway at a lower scale significantly improved the performance of the model tailored to US data. For GM and WM, specificity (0.8957 and 0.9311) is generally higher than sensitivity (0.8021 and 0.8648), i.e. the number of false positives is comparatively low, which is desirable in clinical applications. While the classification of BG pixels appears to be less challenging for the model, WM and especially GM pose a more complex task. Over all testing folds, WM predictions show higher accuracy than GM predictions. This could be because WM structures in general have a more homogeneous appearance and intensity profile, and are thus easier for the network to identify. Training the network by Kamnitsas et al. [5] took 13.6 hours on one GPU. Training DeepMedicUS took 14.7 hours, increasing the computational cost by only 8.1 %, while achieving more accurate segmentations. PReLU increased the computational cost over ReLU by 12.0 %; given the improvement in segmentation accuracy of 30.8 %, 17.2 % and 7.0 % for GM, WM and BG, respectively, this is an acceptable trade-off. Segmenting one full patient volume took 14 seconds on average. We also compared the accuracy of pre-trained models at different pre-training dataset sizes. A reduction to 25 % of the synthetic data impaired Dice scores by 8 %, 15 % and 4 % for GM, WM and BG, respectively, due to overfitting. Doubling the amount of synthetic data for pre-training, however, caused the average accuracy of the model to plateau, yielding no further improvement in Dice.
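For clarity on how such overlap metrics are obtained, the following NumPy sketch computes Dice, sensitivity and specificity per label class from a prediction and a (transferred) labelmap. It is our own illustration, not the evaluation code used for the paper.

```python
import numpy as np

def dice(pred, target, label):
    """Dice overlap for one label class."""
    p, t = pred == label, target == label
    return 2.0 * np.logical_and(p, t).sum() / (p.sum() + t.sum())

def sensitivity_specificity(pred, target, label):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    p, t = pred == label, target == label
    tp = np.logical_and(p, t).sum()
    tn = np.logical_and(~p, ~t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return tp / (tp + fn), tn / (tn + fp)
```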

A qualitative comparison of the predicted segmentations is given in Fig. 2, which depicts US, labelmaps and predictions for the patients with the best and worst results. The Dice ranges between 0.3964 (patient 5) and 0.8836 (patient 6) for GM, between 0.3884 (patient 3) and 0.9375 (patient 12) for WM, and between 0.7011 (patient 5) and 0.9375 (patient 12) for BG. Visual inspection of the predictions leaves only few complaints, as shown for patients 6 and 12; the network is even able to correctly interpret tumor areas as BG (blue arrows). The poorer results can be explained partially by tracking inaccuracies that cause reconstruction problems, and by the use of different US probes that yield lower-quality images, see for example patient 3. For patient 5, the network falsely classifies too large an area as GM (purple arrow). It is able to correctly label an area with incorrect transferred labels as non-BG (yellow arrows), which, however, leads to a low Dice. Although some volumes seem more challenging for the network, all results are in a clinically acceptable range.

DeepMedicUS achieves accurate segmentation with Dice scores of 0.7080, 0.8402 and 0.9315 for GM, WM and BG, respectively. In comparison, [5] report an average Dice score of 91.4 % for whole-tumor segmentation in MRI. However, while tumor tissue usually shows good contrast against surrounding healthy tissue in MRI, segmenting anatomical structures inside the tumor was shown to be a more challenging task; hence, [5] also report scores of 50.0 and 35.1 for such tasks, further reduced at 50 % and 20 % training dataset size. These findings on the influence of data size reduction are coherent with the outcomes presented in our work. Averaged over all labels, we achieve a comparable Dice of 0.83: Milletari et al. [8] report an average Dice score of 0.82 for 3D midbrain segmentation in TCUS, and for the similar domain of 3D reconstructed transfontanelle ultrasound, [7] report a Dice of 0.816 for the task of cerebral ventricle segmentation.

4 Conclusion

In this work we demonstrated that automatic and robust segmentation of complex anatomical structures in 3D US is feasible in clinical settings, and validated this on intra-operative cranial US. We used a multi-pathway FCN specifically tailored to the image domain at hand to address the difficult task of segmenting WM and GM in US, mitigating the loss of spatial information and preserving high-level image features. Despite US segmentation generally being harder to automate than MRI segmentation, we achieved good Dice scores. Finally, we substantiated that pre-training neural networks with synthetic data can improve model robustness and accuracy when only small medical training datasets are available.

References

  • [1] de Brebisson, A., Montana, G.: Deep neural networks for anatomical brain segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 20–28. IEEE (2015)
  • [2] Chen, H., Dou, Q., Yu, L., Qin, J., Heng, P.A.: VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images. NeuroImage 170, 446–455 (2018)
  • [3] Fuerst, B., Wein, W., Müller, M., Navab, N.: Automatic ultrasound–MRI registration for neurosurgery using the 2D and 3D LC² metric. Medical Image Analysis 18(8), 1312–1319 (2014)
  • [4] Goyal, A., Arya, M.K., Agrawal, R., Agrawal, D., Hossain, G., Challoo, R.: Automated segmentation of gray and white matter regions in brain MRI images for computer aided diagnosis of neurodegenerative diseases. In: 2017 International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT). pp. 204–208. IEEE (2017)
  • [5] Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J., Kane, A., Menon, D., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis 36, 61–78 (2017)
  • [6] Kebir, S.T., Mekaoui, S.: An efficient methodology of brain abnormalities detection using CNN deep learning network. In: 2018 International Conference on Applied Smart Systems (ICASS). pp. 1–5. IEEE (2018)
  • [7] Martin, M., Sciolla, B., Sdika, M., Wang, X., Quetin, P., Delachartre, P.: Automatic segmentation of the cerebral ventricle in neonates using deep learning with 3D reconstructed freehand ultrasound imaging. In: 2018 IEEE International Ultrasonics Symposium (IUS). pp. 1–4. IEEE (2018)
  • [8] Milletari, F., Ahmadi, S.A., Kroll, C., Plate, A., Rozanski, V., Maiostre, J., Levin, J., Dietrich, O., Ertl-Wagner, B., Bötzel, K., et al.: Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound. Computer Vision and Image Understanding 164, 92–102 (2017)
  • [9] Moeskops, P., Viergever, M.A., Mendrik, A.M., de Vries, L.S., Benders, M.J., Išgum, I.: Automatic segmentation of MR brain images with a convolutional neural network. IEEE Transactions on Medical Imaging 35(5), 1252–1261 (2016)
  • [10] Nitsch, J., Klein, J., Moltz, J.H., Miller, D., Sure, U., Kikinis, R., Meine, H.: Neural-network-based automatic segmentation of cerebral ultrasound images for improving image-guided neurosurgery. In: Medical Imaging 2019: Image-Guided Procedures, Robotic Interventions, and Modeling. vol. 10951, p. 109511N. International Society for Optics and Photonics (2019)
  • [11] Qiu, W., Chen, Y., Kishimoto, J., de Ribaupierre, S., Chiu, B., Fenster, A., Yuan, J.: Automatic segmentation approach to extracting neonatal cerebral ventricles from 3D ultrasound images. Medical Image Analysis 35, 181–191 (2017)
  • [12] Rackerseder, J., Baust, M., Göbl, R., Navab, N., Hennersperger, C.: Initialize globally before acting locally: Enabling landmark-free 3D US to MRI registration. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 827–835. Springer (2018)
  • [13] Salehi, M., Ahmadi, S.A., Prevost, R., Navab, N., Wein, W.: Patient-specific 3D ultrasound simulation based on convolutional ray-tracing and appearance optimization. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 510–518. Springer (2015)
  • [14] Smith, L.N.: Cyclical learning rates for training neural networks. In: IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 464–472 (2017)
  • [15] Wang, P., Cuccolo, N.G., Tyagi, R., Hacihaliloglu, I., Patel, V.M.: Automatic real-time CNN-based neonatal brain ventricles segmentation. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI). pp. 716–719. IEEE (2018)
  • [16] Xiao, Y., Fortin, M., Unsgård, G., Rivaz, H., Reinertsen, I.: REtroSpective Evaluation of Cerebral Tumors (RESECT): A clinical database of pre-operative MRI and intra-operative ultrasound in low-grade glioma surgeries. Medical Physics (2017)