Evaluation of Multi-Slice Inputs to Convolutional Neural Networks for Medical Image Segmentation

Abstract

When using Convolutional Neural Networks (CNNs) for segmentation of organs and lesions in medical images, the conventional approach is to work with inputs and outputs either as single slices (2D) or whole volumes (3D). One common alternative, in this study denoted as pseudo-3D, is to use a stack of adjacent slices as input and produce a prediction for at least the central slice. This approach gives the network the possibility to capture 3D spatial information, at only a minor additional computational cost.

In this study, we systematically evaluate the segmentation performance and computational costs of this pseudo-3D approach as a function of the number of input slices, and compare the results to conventional end-to-end 2D and 3D CNNs. The standard pseudo-3D method regards the neighboring slices as multiple input image channels. We additionally evaluate a simple approach where the input stack is treated as a volume that is repeatedly convolved in 3D to obtain a single 2D feature map. This 2D map is in turn fed into a standard 2D network. We conducted experiments using two different CNN backbone architectures and on five diverse data sets covering different anatomical regions, imaging modalities, and segmentation tasks.

We found that while both pseudo-3D methods can process a large number of slices at once and still be computationally much more efficient than fully 3D CNNs, a significant improvement over a regular 2D CNN was only observed for one of the five data sets. An analysis of the structural properties of the segmentation masks revealed no relation between these properties and the segmentation performance with respect to the number of input slices.

The conclusion is therefore that in the general case, multi-slice inputs appear to not significantly improve segmentation results over using 2D or 3D CNNs.


Keywords: Medical Image Segmentation, Convolutional Neural Network, Multi-Slice, Deep Learning

I Introduction

Segmentation of organs and pathologies is a common activity for radiologists and routine work for radiation oncologists. Nowadays, manual annotation of such regions of interest is aided by various software toolkits for image enhancement, automated contouring, and structure analysis in all fields of image-guided radiotherapy [32, 4, 9]. Over recent years, \glsdl has emerged as a very powerful concept in the field of medical image analysis. The ability to train complex neural networks by example to independently perform a vast spectrum of annotation tasks has proven to be a promising way to produce segmentations of organs and lesions with expert-level accuracy [38, 24].

For both organ segmentation and lesion segmentation, the most common \glsdl model is the \glscnn. Whereas the classic approach to segmenting 3D medical volumes with \glsplcnn consists of training on and predicting the individual 2D slices independently, interest has shifted in recent years towards full 3D convolutions in volumetric neural networks [37, 24, 28, 6, 11]. Volumetric convolution kernels have the advantage of taking inter-slice context into account, thus preserving more of the spatial information than what is possible when using 2D convolutions within slices. However, volumetric operations require a much larger amount of computational resources. For medical image applications, the lack of sufficient \glsgpu memory to fit entire volumes at once requires, in almost all cases, a patch-based approach, reduced input sizes, and/or small batch sizes, and therefore longer training times.

I-A Related Work

In terms of end-to-end, fully 3D networks, studies often attempt to compensate for the small patch size that can maximally fit into the \glsgpu memory at once by creating more efficient architectures or by utilizing post-processing methods. The original U-Net by Ronneberger et al. [34], an architecture which was, at that time, and still is, a popular and powerful network for semantic medical image segmentation, was first reintroduced as a 3D variant by Çiçek et al. [6]. The 3D U-Net was used by Vu et al. [41, 42] in a cascaded approach where a first coarse prediction was used to generate a candidate region in which a second, finer-grained prediction was performed; this proved to be an effective way of reducing the amount of input data for the final prediction. V-Net by Milletari et al. [28] extended the network of [6] by adding residual connections to the 3D U-Net.

Li et al. [22] reduced the computational cost required for a fully 3D \glscnn by replacing the deconvolution steps in the upsampling phase with dilated convolutions to preserve the spatial resolution of the feature maps. VoxResNet [5] is a very deep residual network that was trained on small 3D patches. The resulting output probability map was combined with the original multimodal volumes into a second VoxResNet to obtain a more accurate output. A related approach from Yu et al. [46] extended this architecture by implementing long residual connections between residual blocks, in addition to the short connections within the residual blocks. The same group proposed another densely connected architecture called DenseVoxNet [45], where each layer had access to the feature maps of all its preceding layers, decreasing the number of parameters and possibly avoiding the learning of redundant feature maps.

Lu et al. [25] used a graph cut model to refine the output of their coarse 3D \glscnn. A 3D network composed of two separate convolutional pathways, at low and high resolution, was introduced by Kamnitsas et al. [18]. For improvement, the resulting segmentation was, in turn, post-processed by a Conditional Random Field. A variant of this multi-scale feature extraction during convolution was used by Lian et al. [23], who used this procedure in the encoding phase of their U-Net-like 3D \glscnn. Ren et al. [33] exploited the small size of regions of interest in the head and neck area (i.e. the optic nerves and chiasm) to build an interleaved combination of small-input, shallow \glsplcnn trained at different scales and in different regions. Feng et al. [12] used a two-step procedure: a first 3D U-Net was used to localize thoracic organs in a substantially downsampled volume, and crop to a bounding box around each organ. Then, individual 3D U-Nets were trained to segment each organ inside its subvolume at the original resolution. Another example of 3D convolutions applied only on a small region of interest is from the work of Anirudh et al. [1], who randomly sampled subvolumes in lung images for which the centroid pixel intensity was above a certain intensity threshold, to classify the subvolume as containing a lung nodule or not.

While these studies have shown that 3D \glsplcnn are worth the effort, alternative approaches have been investigated that involve volumetric context to improve segmentation while avoiding 3D convolutions altogether. One of the more common methods, usually called 2.5D, is to combine tri-planar 2D \glsplcnn applied to intersecting orthogonal patches [31, 35, 8, 44, 26, 29, 20, 14]. This can be a computationally efficient way of incorporating more 3D spatial information, and these studies all present promising results. However, this method is limited in the volumetric information it can encompass at once.

We therefore investigate a method that uses a volumetric input but is still largely 2D-based, with a minimal amount of 3D operations. Instead of a method that takes a single 2D slice as input and outputs the 2D segmentation of that slice, one can also incorporate neighboring slices to provide 3D context and enhance the segmentation performance. A common approach is to include the neighboring slices of a central slice as multiple input image channels. Novikov et al. [30] included the preceding and succeeding axial slice for vertebrae and liver segmentation. Such a three-slice input was also used by Kitrungrotsakul et al. [21] for the detection of mitotic cells in 4D data (spatial + temporal). This was a cascaded approach where a first detection step with a three-slice input produced results for these three slices. In the second step, the number of false positives was reduced by including, for each slice, the time frames before and after. In a deep \glscnn for liver segmentation, Han [16] used five neighboring slices. Ghavami et al. [15] compared incorporating three, five, and seven slices for prostate segmentation from ultrasound images. While their method produced promising segmentation results, no significant difference was found between these three input sizes. In a recent paper, Ganaye et al. [13] employed a seven-slice input producing an output for the three central slices, which the authors refer to as 2.5D. This model was used to evaluate a loss function that penalized anatomically unrealistic transitions between adjacent slices. The authors did not report a significant improvement of the 2.5D model over the baseline 2D model, but the 2.5D model did outperform it in terms of the Hausdorff distance when the non-adjacency loss was employed.

I-B Contributions

In this paper, we systematically investigate the use of multiple adjacent slices as input to predict the segmentation of the central slice in that subset, and we investigate this for segmentation tasks in medical images. We will henceforth refer to any method based on this principle as pseudo-3D. We compare the segmentation performance of a range of multi-slice input sizes (from 3 to 13 slices) to conventional end-to-end 2D and fully 3D input-output \glsplcnn. We employ the common approach from the literature where each neighboring slice is put as a separate channel in the input, and we will refer to this method as the channel-based method. Further, we introduce a second pseudo-3D method that, to the best of our knowledge, has not been proposed in the literature before. This pseudo-3D method consists of two main components: a transition block that transforms a multi-slice input into a single-slice (i.e. 2D) feature map by using 3D convolutions, and a standard 2D convolutional network, such as the U-Net [34] or the SegNet [2], that takes this feature map and produces the final segmentation labels. This method shall be referred to as the proposed method.

The main contributions of our work are:

  1. We systematically compare the segmentation performance of 2D, pseudo-3D (with varying number of input slices, $m$), and 3D approaches.

  2. We introduce a novel pseudo-3D method, using a transition block that transforms a multi-slice subvolume into a 2D feature map that can be processed by a 2D network. This method is compared to the channel-based pseudo-3D method.

  3. We compare the computational efficiency of fully 2D and 3D \glsplcnn to the pseudo-3D methods in terms of graphical memory use, number of model parameters, \glsflops, training time, and prediction time.

  4. We conduct all experiments on a diverse range of data sets, covering a broad range of data set sizes, imaging modalities, segmentation tasks, and body regions.

II Proposed Method

The underlying concept of the pseudo-3D methods is similar to that of standard slice-by-slice predictions using 2D \glsplcnn, but the input is now a subvolume with an odd number of slices, $m$, extracted from the whole volume with a total of $D$ slices. The output of the model is compared to the ground truth of the central slice. If $m = 1$, the method is equivalent to a 2D \glscnn. A fully 3D \glscnn would correspond to $m = D$ for both input and output, where all operations in the network are in 3D and the output volume is compared to the ground truth of the whole volume. See Figure 1 for an illustration of the proposed method. In this study, the number of slices in the input subvolume ranged from $m = 3$ to $m = 13$. In order to isolate the contribution of using multi-slice inputs, this work did not include multi-slice outputs, where the multiple outputs for each slice are usually aggregated using e.g. means or medians.

Fig. 1: A comparison of the 2D, pseudo-3D and 3D approaches. With a 2D network, the volume is segmented with a single-slice input and output. Pseudo-3D uses multiple adjacent slices as input to produce an output for the central slice of the input. 3D approaches take in the whole volume at once, and return a prediction for the whole volume as well. In the figure, $W$, $H$, and $D$ are the original width, height, and depth of the input volume, respectively.

Let the input volume be of width $W$, height $H$, depth $D$, and with $C$ channels. A common way of utilizing depth information to train with regard to the central slice is as follows: group the channel and depth dimensions together as one, and consider the $m$-slice input to be of shape $W \times H \times (C \cdot m)$, i.e. with $C \cdot m$ channels. By incorporating the slices in the channel dimension, the multi-slice input can be processed by a regular 2D network. As was mentioned in Section I-B, this method is denoted here as the channel-based method.
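
To make the channel-based input construction concrete, the following is a minimal sketch, assuming NumPy arrays with axes ordered (depth, height, width, channels); the function name and the lack of edge handling are illustrative choices, not taken from the original implementation:

```python
import numpy as np

def to_channel_input(volume, center, m):
    """Stack the m slices around `center` (axis 0 = depth) as channels.

    volume: array of shape (D, H, W, C); returns array of shape (H, W, m * C).
    Assumes the subvolume lies fully inside the volume (padding or mirroring
    at the volume boundaries is omitted here).
    """
    half = m // 2
    sub = volume[center - half:center + half + 1]       # (m, H, W, C)
    sub = np.moveaxis(sub, 0, -2)                        # (H, W, m, C)
    return sub.reshape(sub.shape[0], sub.shape[1], -1)   # (H, W, m * C)
```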

The channel-based method is compared to a novel pseudo-3D approach, denoted the proposed method. Consider the input to be of shape $W \times H \times m \times C$. This is fed through a transition block with $\lfloor m/2 \rfloor$ layers (where $\lfloor \cdot \rfloor$ is the floor function). In each layer, a 3D convolution with a kernel of size $3 \times 3 \times 3$ is applied to the volume, after it has been padded in the $W$ and $H$ dimensions, but not in the depth dimension. Thus, after each layer in the transition block, the depth of the image is reduced by two slices, while the width and height stay the same size. After the final convolution, the depth dimension (now of size one) is removed. Hence, the shapes change as

$$W \times H \times m \rightarrow W \times H \times (m - 2) \rightarrow \cdots \rightarrow W \times H \times 1 \rightarrow W \times H,$$

with the channel dimension omitted for brevity.

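The transition block can be sketched in Keras (the framework used in this study) as follows; the filter count, the $3 \times 3 \times 3$ kernel, and the depth-first axis ordering are assumptions for illustration rather than the exact settings of the original model:

```python
from keras import layers, models

def transition_block(m, height, width, channels, filters=8):
    """A sketch of the transition block: floor(m/2) 3D convolutions, each
    padded in height/width but not in depth, so the depth shrinks by two
    slices per layer until a single slice remains.
    """
    inp = layers.Input(shape=(m, height, width, channels))
    x = inp
    for _ in range(m // 2):
        x = layers.ZeroPadding3D(padding=(0, 1, 1))(x)            # pad H and W only
        x = layers.Conv3D(filters, kernel_size=3, padding='valid',
                          activation='relu')(x)                    # depth: -2 per layer
    # Depth is now 1; drop it to obtain a 2D feature map of shape (H, W, filters)
    x = layers.Reshape((height, width, filters))(x)
    return models.Model(inp, x)
```

The resulting 2D feature map would then be passed to the first layer of the 2D backbone (U-Net or SegNet).
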
In both the proposed method and the channel-based method, the output layer of the network is the segmentation mask, with an output shape of $W \times H$ (times the number of classes). Hence, it produces a single segmentation slice, corresponding to the central slice of the input subvolume. See Figure 1 for an illustration of this.

Fig. 2: The proposed methods illustrated with the U-Net backbone. The output is the prediction for the central slice of the input. The numbers in the transition block indicate the depth, and those in the backbone the number of filters. Top: the proposed method, where the transition block uses 3D convolutions and 2D padding to iteratively reduce the input from depth $m$ to $1$, while the width and height remain unchanged. Bottom: the channel-based method, where neighboring slices are input as separate channels, and the input can be fed into a 2D \glscnn right away.

The network architectures that were evaluated in this work were the U-Net [34] and the SegNet [2], two popular variants of encoder-decoder architectures that have been successful in semantic medical image segmentation. An illustration of both pseudo-3D methods, with the U-Net as the main network architecture, is given in Figure 2. Another illustration of the networks with a SegNet backbone can be seen in Figure 11 in the Supplementary Material.

We evaluate the two pseudo-3D methods for $m \in \{3, 5, 7, 9, 11, 13\}$ and compare them to the corresponding conventional end-to-end 2D and 3D networks, all with the U-Net or SegNet architectures. This yields a total of 28 different experiments for each data set (six input sizes for the two pseudo-3D methods, plus the 2D and 3D methods, all with two network architectures). Apart from the segmentation performance, the computational cost is also evaluated across experiments in terms of the number of network parameters, the maximum required amount of \glsgpu memory, the number of \glsflops, the training time per epoch, and the prediction time per sample.

III Experiments

We here present the data sets on which the experiments were conducted, as well as the general setup and parameters used in the experiments.

III-A Materials

To test the generalizing capabilities of the methods, we ran experiments on five different data sets, covering a variety of modalities, data set sizes, segmentation tasks, and body areas. Three of the data sets are publicly available, as they were part of segmentation challenges. On top of those, we further used two in-house data sets collected at the University Hospital of Umeå, Umeå, Sweden.

\acrlongpros

An in-house data set containing \glsct images of the pelvic region from patients that underwent radiotherapy for prostate cancer at the University Hospital of Umeå, Umeå, Sweden. We denote this data set \glspros. The delineated structures include the prostate (in most cases annotated as the clinical or gross target volume) and some organs at risk, among them the bladder and rectum. The individual structure masks were merged into a single multilabel truth image, with separate pixel values for the prostate, the bladder, and the rectum (see Figure 3). Patients without the complete set of structures were excluded, resulting in a final data set containing 1 148 patients.

Fig. 3: \acrlongpros data set. From top to bottom: images and ground truth images of the prostate (red), bladder (green) and rectum (blue).

\acrlonghene

An in-house data set containing \glsct images of the head and neck region. This data set comprises the patients from the University Hospital of Umeå, Umeå, Sweden, that participated in the ARTSCAN study [47]. We denote this data set \glshene. For each \glsct image, manual annotations of the target volumes and various organs at risk were provided. The organ structures that were included with these data were the bilateral submandibular glands, bilateral parotid glands, larynx, and medulla oblongata (see Figure 4). After removal of faulty \glsct volumes, where the slice spacing changed within a volume, and exclusion of patients in which not all of the six aforementioned structures were present, the final data set contained 73 patients.

Fig. 4: \acrlonghene. From top to bottom: images and ground truth images at different slices of the left and right submandibular glands (red and green), left and right parotid glands (dark blue and yellow), larynx (light blue), and medulla oblongata (pink).

\acrlongbrats

The \glsbrats [27, 3] was part of the MICCAI 2019 conference. It contains multimodal pre-operative \glsmri data of 335 patients with pathologically confirmed \glshgg or \glslgg from 19 different institutes. For each patient, \glst1, \glst1c, \glst2, and \glsflair scans were available, acquired with different protocols and various scanners.

Manual segmentations were carried out by one to four raters and approved by neuroradiologists. The necrotic and non-enhancing tumor core, peritumoral edema, and contrast-enhancing tumor were assigned the labels 1, 2, and 4, respectively (see Figure 5). The images were co-registered to the same anatomical template, interpolated to a uniform voxel size of $1 \times 1 \times 1$ mm, and skull-stripped.

Fig. 5: Manual expert annotation of two patients with \glshgg from the \acrlongbrats data set. Shown are image patches with the tumor structures that are annotated in the different modalities. The image patches show (from left to right): (1) the whole tumor visible in \glsflair, (2-3) the enhancing and tumor structures visible in \glst1 and \glst1c, respectively, and (4) the final labels visible in \glst2. The segmentations are combined to generate the final labels of the tumor structures: the necrotic and non-enhancing tumor core (NCR/NET—label 1, red), the peritumoral edema (ED—label 2, green) and the GD-enhancing tumor (ET—label 4, yellow).

\acrlongkits

The data set for the \glskits challenge [17], part of the MICCAI 2019 conference, contains preoperative \glsct data from randomly selected kidney cancer patients that underwent radical nephrectomy at the University of Minnesota Medical Center between 2010 and 2018. Medical students annotated, under supervision, the contours of the whole kidney including any tumors and cysts (label 1), and the contours of only the tumor component excluding all kidney tissue (label 2) (see Figure 6). Afterward, voxels with a radiodensity of less than $-30$ HU were excluded from the kidney contours, as they were most likely perinephric fat.

Fig. 6: \acrlongkits data set. From top to bottom: images and ground truth images of the kidney (red) and kidney tumor (green).

\acrlongibsr

The \glsibsr data set [7] is a publicly available data set with 18 \glst1 \glsmri volumes, and is commonly used as a standard data set for tissue quantification and segmentation evaluation. Whole-brain segmentations of the cerebrospinal fluid (CSF), gray matter, and white matter were included, each with its own label (see Figure 7).

Fig. 7: \acrlongibsr data set. Axial slices of three patients with the ground truth of the cerebrospinal fluid (red), white matter (green) and gray matter (blue).

III-B Preprocessing

Due to the diverse range of data sets, it must be ensured that the training data is as similar as possible across experiments in order to achieve a fair comparison.

material/data set \glsbrats \glskits \glsibsr \glshene \glspros
type \acrshortmri \acrshortct \acrshortmri \acrshortct \acrshortct
#modalities 4 1 1 1 1
#classes 3 2 3 6 3
#patients 335 210 18 73 1 148
train 268 168 15-16 59 734
val 67 42 2-3 14 184
test 67 42 2-3 14 230
original shape 240-240-155 512-512-var 256-128-256 var 512-512-var
original voxel size (in mm) 1.0-1.0-1.0 var 1.0-1.0-1.0 var var
preprocessed shape 160-192-128 256-256-128 256-128-256 256-256-64 256-256-128
preprocessed voxel size (in mm) 1.0-1.0-1.0 2.3-2.3-2.3 1.0-1.0-1.0 1.3-1.0-5.8 2.7-2.7-3.9
augmentation flip left-right, elastic transform, rotation, shear, zoom
TABLE I: Data sets and augmentation techniques in this study. Entries marked "var" denote that the volume shape or voxel spacing varied between patients. All listed augmentation techniques were applied to all data sets, except that left-right flipping was not applied to \glskits (see Section III-C).

Magnetic Resonance Image Preprocessing

The \glsbrats and \glsibsr data sets were N4ITK bias field corrected [40] and normalized to zero mean and unit variance. The \glsbrats volumes were cropped around the center to a resolution of $160 \times 192 \times 128$ voxels, to increase processing speed. This last step was skipped for the \glsibsr data set because of the much smaller number of data samples.

Computed Tomography Image Preprocessing

In the \glspros, \glshene, and \glskits data sets, all images had a fixed in-plane (i.e. sagittal–coronal) resolution and a varying slice count. The voxel size also varied between patients, so a preprocessing pipeline (see Figure 8) was set up to transform these three data sets to a uniform resolution and voxel size.

Fig. 8: Preprocessing pipeline as applied to the \glspros data set. Given are the resolutions, with the voxel dimensions in mm in parentheses. Volume dimensions and voxel spacings that vary between patients are marked as varying.

First, the data were resampled to an equal voxel size within the same data set. The volumes were then zero-padded to the size of the single largest volume from that set after resampling. In order to increase processing speed and lower the memory consumption, the \glspros and \glskits volumes were thereafter downsampled to $256 \times 256 \times 128$ voxels, and the \glshene volumes were downsampled to $256 \times 256 \times 64$ voxels. An example of this pipeline is shown in Figure 8.
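
A rough sketch of this resample-pad-downsample pipeline, using SciPy for the interpolation, is given below; the function and parameter names, as well as the interpolation order, are illustrative assumptions and may differ from the original implementation:

```python
import numpy as np
from scipy import ndimage

def preprocess_geometry(volume, voxel_size, common_voxel_size,
                        padded_shape, final_shape):
    """Sketch of the resample-pad-downsample pipeline in Figure 8.

    `padded_shape` is the shape of the largest resampled volume in the set.
    """
    # 1. Resample to the common voxel size of the data set (linear interpolation)
    factors = [v / c for v, c in zip(voxel_size, common_voxel_size)]
    volume = ndimage.zoom(volume, factors, order=1)

    # 2. Zero-pad to the size of the largest resampled volume in the set
    pad = [(0, max(0, p - s)) for p, s in zip(padded_shape, volume.shape)]
    volume = np.pad(volume, pad, mode='constant')

    # 3. Downsample to the final shape to save memory and processing time
    factors = [f / s for f, s in zip(final_shape, volume.shape)]
    return ndimage.zoom(volume, factors, order=1)
```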

As a final step, the images were normalized by clipping each case to a fixed intensity range and then subtracting and dividing by constant values to rescale the intensities.
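
Since the exact clipping range and normalization constants are not reproduced here, the following sketch uses placeholder bounds only to illustrate the operation:

```python
import numpy as np

def normalize_intensities(volume, low=-1000.0, high=1000.0):
    """Clip a CT volume to a fixed intensity window and rescale to [0, 1].

    The window bounds are placeholders for illustration only, not the values
    used in the study.
    """
    clipped = np.clip(volume, low, high)
    return (clipped - low) / (high - low)
```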

III-C Training Details

Our method was implemented in Keras 2.2.4, using TensorFlow 1.12.0 as the backend. The experiments were run either on a desktop computer with an NVIDIA RTX 2080 Ti \glsgpu, or on the NVIDIA Tesla V100 GPUs of the \glshpc2n at Umeå University, Sweden. Depending on the model, the convergence speed, and the data set size, a single experiment took from a few minutes to multiple days to complete.

Experiment Setup

For the 3D experiments, the \glsbrats data set was the only data set where the whole volumes could be fed into the network at once, because of constraints in GPU memory. For the other data sets, we resorted to a patch-based approach, where the input was a subvolume of 32 slices, the largest size possible for our available hardware.

In all experiments, we employed the Adam optimizer [19] with a fixed initial learning rate. If the validation loss did not improve after a certain number of epochs, we used a patience callback that dropped the learning rate by a constant factor and an early stopping callback that terminated the experiment. Because of the differences in data set sizes, these callbacks had to be determined from initial exploratory experiments for each separate data set, to ensure that the experiments did not run for too long or too short. The patience callbacks were set to five epochs for the \glsbrats, \glskits, and \glspros experiments, six epochs for the \glshene data set, and ten epochs for the \glsibsr data set. The early stopping callbacks were set to 11 epochs for the \glspros data, 12 epochs for the \glsbrats and \glskits data, 14 epochs for the \glshene data, and 25 epochs for the \glsibsr data. The maximum number of epochs an experiment could run for, regardless of any changes in the validation loss, was set to 100 for the \glshene and \glspros data and 200 for the other data sets. Batch normalization and an $L_2$ norm regularization were applied to all convolutional layers, both in the transition block and in the main network. The \glsrelu function was used as the intermediate activation function. The activation function of the final layer was the softmax function. Each data set was split into a training and a test set, with the training set, in turn, being split into subsets for training and validation (see Table I).
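
In Keras terms, such a training setup could look roughly as follows; the learning rate and drop factor are placeholders (the exact values are not reproduced here), while the patience values correspond to the \glsbrats/\glskits configuration described above:

```python
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

optimizer = Adam(lr=1e-4)                                 # placeholder initial learning rate
callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5,     # placeholder drop factor
                      patience=5),
    EarlyStopping(monitor='val_loss', patience=12),
]
# model.compile(optimizer=optimizer, loss=combined_loss)
# model.fit(..., callbacks=callbacks)
```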

As loss function, we employed a combination of the \glsdsc and the \glsce. The \glsdsc is typically defined as

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad (1)$$

with $A$ the output segmentation and $B$ its ground truth. However, a differentiable version of Equation 1, the so-called soft \glsdsc, was used. The soft \glsdsc is defined as

$$\mathrm{DSC}_{\mathrm{soft}} = \frac{1}{L} \sum_{l=1}^{L} \frac{2 \sum_{i} u_{l,i} v_{l,i} + \epsilon}{\sum_{i} u_{l,i} + \sum_{i} v_{l,i} + \epsilon}, \qquad (2)$$

where for each label $l$ (of $L$ labels), the $u_l$ is the output of the network and $v_l$ is a one-hot encoding of the ground truth segmentation map, with $i$ indexing the pixels. The $\epsilon$ is a small constant added to avoid division by zero.

The \glsdsc is a good objective for segmentation, as it directly represents the degree of overlap between structures. However, for unbalanced data sets with small structures and where the vast majority of pixels are background, it may converge to poorly generalizing local minima, since misclassifying only a few pixels can lead to large deviations in \glsdsc. A common way [36, 43] to resolve this is to combine the \glsdsc loss with the \glsce loss, defined as

$$\mathcal{L}_{\mathrm{CE}} = -\sum_{i} \sum_{l=1}^{L} v_{l,i} \log(u_{l,i}), \qquad (3)$$

and we did this as well. Hence, the final loss function was

$$\mathcal{L} = \bigl(1 - \mathrm{DSC}_{\mathrm{soft}}\bigr) + \mathcal{L}_{\mathrm{CE}}. \qquad (4)$$
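
A sketch of this combined loss with the Keras backend is given below, assuming channels-last tensors, one-hot ground truths, and softmax outputs; the epsilon value and the one-to-one weighting of the two terms are assumptions:

```python
import keras.backend as K

def soft_dice(y_true, y_pred, eps=1e-5):
    """Soft Dice coefficient averaged over classes.

    Assumes tensors of shape (batch, H, W, L) with one-hot ground truths and
    softmax predictions; eps is a small illustrative constant.
    """
    axes = (1, 2)  # sum over the spatial dimensions of each 2D output slice
    intersection = K.sum(y_true * y_pred, axis=axes)
    union = K.sum(y_true, axis=axes) + K.sum(y_pred, axis=axes)
    return K.mean((2.0 * intersection + eps) / (union + eps))

def combined_loss(y_true, y_pred):
    """Sum of the soft Dice loss and the categorical cross-entropy."""
    dice_loss = 1.0 - soft_dice(y_true, y_pred)
    ce_loss = K.mean(K.categorical_crossentropy(y_true, y_pred))
    return dice_loss + ce_loss
```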

Data Augmentation

In order to artificially increase the data set size and to diversify the data, we employed various common methods for on-the-fly data augmentation: flipping along the horizontal axis, random rotations, shears, and zooms within small ranges, and adding small elastic deformations as described in [39]. The data augmentation implementation we used was based on [10]. The images in the \glskits data are asymmetric in the left–right direction because of the liver; flipping was therefore not applied to that data set, as it would result in anatomically unrealistic images (see Table I).
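
For illustration, a partial sketch of such on-the-fly augmentation with the Keras ImageDataGenerator is shown below; the numeric ranges are placeholders, the same transform would also have to be applied to the ground truth masks, and the elastic deformations [39] would be handled separately:

```python
from keras.preprocessing.image import ImageDataGenerator

# Placeholder ranges; the exact values used in the study are not reproduced here.
augmenter = ImageDataGenerator(
    horizontal_flip=True,   # disabled for KiTS19 (see Table I)
    rotation_range=10,      # degrees
    shear_range=0.1,
    zoom_range=0.1,
)
```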

Evaluation

For evaluation of the segmentation performance, we employed the conventional \glsdsc as defined in Equation 1. In order to ensure a fair comparison and to investigate the variability of the results within experiments, we used five-fold cross-validation in each experiment (except for the \glspros). Due to its much larger size, the experiments on the \glspros data set were run only once.

To compare the computational cost of our proposed models to the corresponding 2D and 3D \glscnn models, we extracted the number of trainable parameters, the maximum amount of \glsgpu memory used, the number of \glsflops, training time per epoch, and prediction time per sample.

IV Results

Model #slices #params memory \acrshortflops t per epoch p per sample
2D 1 493k 467MB 2.450M 49s 10.17s
proposed 3 495k 497MB 2.463M 73s 10.46s
5 502k 519MB 2.497M 88s 11.11s
7 509k 541MB 2.532M 109s 11.43s
9 516k 563MB 2.567M 156s 12.15s
11 523k 586MB 2.602M 204s 12.46s
13 530k 601MB 2.637M 241s 14.73s
channel-based 3 493k 485MB 2.451M 72s 10.33s
5 493k 497MB 2.453M 82s 10.49s
7 493k 510MB 2.454M 101s 11.01s
9 493k 523MB 2.454M 138s 11.17s
11 494k 534MB 2.457M 190s 11.33s
13 495k 541MB 2.459M 249s 12.39s
3D 128 1 461k 16 335MB 7.306M 370s 2.36s
TABLE II: Architecture comparison. Experiments with the U-Net architecture on the multimodal \glsbrats data set. The patch shape was set at $160 \times 192 \times m$, where $m$ is the number of slices. Here, t and p denote the training time per epoch and prediction time per sample, respectively.
Fig. 9: Mean and standard deviation of 5 runs on BraTS19, KiTS19, IBSR18, U-HAND and U-PRO data sets.

The segmentation performance in terms of the \glsdsc of all models is illustrated in Figure 9. For each data set, the mean \glsdsc scores are plotted (with point-wise standard deviation bars) as a function of the input size, and are given for the 2D, pseudo-3D with $m = 3$ to $m = 13$, and 3D models, and for the U-Net and SegNet backbones. These results are also tabulated in Table III, along with summaries of the experiment setups per data set.

Randomly selected example segmentations are illustrated in Figure 10. For each data set, predictions from the 2D models, the pseudo-3D models with the proposed method, and the 3D models are given, along with their respective ground truths. The \glsbrats, \glskits and \glspros segmentations are cropped for ease of viewing. We chose to omit examples for the channel-based pseudo-3D models because of their high level of similarity to the proposed method. Segmentations with the channel-based method, along with additional exemplary segmentations, can be found in Figures 14–16 in Section F of the Supplementary Material.

The computational costs of the models used for the \glsbrats experiments are presented in Table II. The number of model parameters, graphical memory use, and \glsflops depend only on the model type, and the corresponding columns in Table II are therefore equal for all other data sets. The same variables are shown for the other data sets in Tables IX–XII in Section B of the Supplementary Material, where the only differences are in the training and inference times, due to the different numbers of samples; these two parameters scale with the data set size.

V Discussion

This study evaluated the inclusion of neighboring spatial context as input to \glsplcnn for medical image segmentation. Such pseudo-3D methods with a multi-slice input and single-slice output are commonly implemented by regarding the adjacent slices as additional channels of the central slice. Apart from this approach, we also proposed an alternative pseudo-3D method, based upon multiple preliminary 3D convolution steps before processing by a 2D \glscnn. Across five different data sets and using U-Net and SegNet \glscnn backbones, we compared both these pseudo-3D methods, for input sizes up to $m = 13$, to end-to-end 2D and 3D \glsplcnn with single-slice and whole-volume inputs and outputs, respectively. Additionally, we evaluated a number of computational parameters to get a sense of each model's hardware requirements and load.

V-A Computational costs

As seen in Table II, the computational costs are in line with what would be expected. The transition block adds a relatively small number of extra parameters on top of the main 2D network, and the required amount of GPU memory and number of \glsflops scale accordingly with $m$. Since the input is the same size as for the channel-based method, the training times per epoch are largely similar. One advantage of the fully 3D \glsplcnn demonstrated in these results is that the prediction time is significantly shorter, because samples can be processed all at once instead of slice by slice.

The high computational cost of end-to-end 3D convolution is also demonstrated in Table II. The memory footprint is almost 35 times larger than that of the 2D U-Net; over 16 GB is required to train on the complete volumes, which is at or above the limit of most modern commodity GPUs. Both pseudo-3D methods use less than 5 % of the GPU memory consumed by the end-to-end 3D network, even at $m = 13$. It can thus be concluded that both pseudo-3D methods are computationally very efficient ways of including more inter-slice information, with the proposed method being slightly more expensive in terms of GPU memory consumption compared to the channel-based method.

V-B Quantitative analysis

As can be seen in Figure 9, overall, all experiments managed to produce acceptable segmentation results, even for data sets with complex structures, such as the \glsbrats images, or with organs that can be hard to visually distinguish, such as in the \glshene set. One obvious similarity between these data sets is that using a U-Net backbone outperforms the SegNet in nearly every case. Regarding the behavior as a function of the input size, $m$, the results in Figure 9 are inconclusive for almost all data sets.

In the plots for the \glsbrats, \glskits, \glsibsr and \glshene data in Figure 9, there does not seem to be an additional benefit from adding more slices as input compared to an end-to-end 2D approach. There seem to be some exceptions, like the surge at $m = 13$ in the \glshene results, but in these four data sets the variance is either too high or the rate of increase too low to draw any strong conclusions. For these cases, it would be doubtful whether the accessory downsides, e.g. increased training time, are worth the at most marginal improvements in segmentation performance. Likewise, there seems to be no significant difference between our proposed method and the channel-based method in these four data sets.

The only data set in this study where the \glsdsc does seem to improve significantly with $m$ is the \glspros. As more slices are included in the input volume, the segmentation performance approaches that of a fully 3D network, and the proposed method outperforms the channel-based method by an increasing margin. While the overall improvement when going from 2D to pseudo-3D with $m = 13$ is arguably small, we can regard the \glspros case as a demonstration of the possibility that pseudo-3D models can improve the segmentation performance over 2D methods.

Fully 3D \glsplcnn seem to produce equal or worse results than their 2D and pseudo-3D counterparts in most cases. Again, the only exception seems to be in the \glspros results, and in this case only when the U-Net is used as the backbone network (see the respective plot in Figure 9). This could be explained by the much higher number of parameters of 3D \glsplcnn, which makes them prone to overfitting. The high number of data samples in the \glspros set, combined with the skip-connections that differentiate the U-Net from the SegNet, might have been enough to overcome this problem in this specific case.

There does not seem to be a straightforward explanation as to why the \glspros data set is an exception compared to the other data sets. In an attempt to connect the \glsdsc behaviour with respect to $m$ to differences in data set properties, a feature-based regression analysis was performed. We computed features of the structures (ground truth masks) that describe each mask's structural properties: structure depth (i.e. the average number of consecutive slices a structure is present in), structure size relative to the total volume, and average structural inter-slice spatial displacement. The extracted feature values for all data sets and their respective structures can be found in Tables IV–VIII in the Supplementary Material. We found no significant agreements between models that could connect any of these data set features to the \glsdsc with respect to $m$. For more details about the feature extraction and regression analysis, see Sections A.1 and A.2 of the Supplementary Material.

Another distinction between the \glspros set and the others included in this study is its much larger number of samples. As mentioned above, this could have been a contributing factor to the higher performance of the \glspros 3D U-Net compared to the 3D \glscnn results on the other data sets. This feature was also hypothesized to influence the relation between $m$ and the \glsdsc, and therefore the following analysis was performed: the same experiments were run again, but now training on five distinct subsets of 200 samples each from the \glspros data set. The average scores obtained from the five distinct subsets can be found in Figure 12 in the Supplementary Material, where we see a similar behavior as in Figure 9. Hence, we rule out the data set size as the main cause as well.

We, therefore, conclude that pseudo-3D methods have the potential to increase segmentation performance, but in the general case will not yield better results compared to conventional, end-to-end 2D and 3D \glsplcnn.

Further analysis to explain the behaviour of the \glspros data set might be performed. The regression-based feature analysis was somewhat rudimentary, and could likely be extended with e.g. more sophisticated models and more data.

Another possible follow-up study might be to investigate whether it is the multi-slice output (e.g. producing segmentations for all input slices) in pseudo-3D methods that improves the results in other studies. While this was out of the scope of this work, aggregating multiple outputs may be the main reason why pseudo-3D methods sometimes improve the segmentation performance. Based on our conclusion that multi-slice inputs do not seem to improve the results on their own, the added benefit might only come from the aggregation of multiple outputs. In this case, using something like Bayesian dropout could prove just as beneficial.

V-C Qualitative analysis

It is important to emphasize that the images in Figure 10 are randomly selected single slices from thousands of samples and are therefore presented purely for illustrative purposes; they might not always be representative of the overall segmentation performance on a particular data set. However, some remarks can be made that relate to the quantitative results in Figure 9. The relatively large variance in segmentation performance between experiments on the \glsbrats data is demonstrated in Figure 10; as seen, the predictions can differ quite drastically within the same model and with varying $m$. This reflects the \glspldsc of the \glsbrats set presented in Figure 9.

It also appears that the U-Net is better at capturing fine structural details, while the SegNet segmentations seem to be coarser and simpler. This becomes particularly noticeable in data sets with complex structures, such as the gray matter–white matter border in the \glsibsr images (Figure 10). This, in turn, results in an overall large difference in mean \glsdsc between the U-Net and the SegNet. When the ground truth structures are more coarsely shaped, such as in the \glshene set, the SegNet can keep up much better with the U-Net performance.

V-D Effect of the Loss Function

In an earlier stage of this project, we employed a different experimental setup with a pure \glsdsc loss function. However, these initial experiments proved this loss not to be sufficient for all data sets. Particularly the \glskits and \glshene data sets yielded unacceptably unstable results which, even with exactly equal hyperparameters, could either result in fairly accurate segmentations or complete failure. Investigation of the \glspldsc of individual structures demonstrated that in these failed experiments, multiple structures did not improve beyond a \glsdsc on the order of 0.1. After adapting the loss function to include also the \glsce term (see Equation 4), the results improved substantially for all data sets. Performance details for each run using the pure \glsdsc and final loss function can be seen in Figure 13 and Table XIII in Section E of the Supplementary Material.

VI Conclusion

This study systematically evaluated pseudo-3D \glsplcnn, where a stack of adjacent slices is used as input for a prediction on the central slice. The hypothesis underlying this approach is that the added neighboring spatial information would improve segmentation performance, with only a small amount of added computational cost compared to an end-to-end 2D \glscnn. However, whether or not this is actually a sensible approach had not previously been evaluated in the literature.

Aside from the conventional method, where the multiple slices are input as multiple channels, we here introduced a novel pseudo-3D method where a subvolume is repeatedly convolved in 3D to obtain a final 2D feature map. This 2D feature map is then, in turn, fed into a standard 2D network.

We investigated the segmentation performance in terms of the \glsdsc and the computational cost for a large range of input sizes, for the U-Net and SegNet backbone architectures, and for five diverse data sets covering different anatomical regions, imaging modalities, and segmentation tasks. While pseudo-3D networks can have a large input image size and still be computationally less costly than fully 3D \glsplcnn by a large factor, a significant improvement from using multiple input slices was only observed for one of the data sets. We also observed no significant improvement of 3D network performance over 2D networks, regardless of data set size.

Because of ambiguity in the underlying cause of the behavior on the U-PRO data set compared to the results on the other data sets, we conclude that in the general case pseudo-3D approaches appear to not significantly improve segmentation results over 2D methods.

Acknowledgments

This research was conducted using the resources of the High Performance Computing Center North (HPC2N) at Umeå University, Umeå, Sweden. We are grateful for the financial support obtained from the Cancer Research Fund in Northern Sweden, Karin and Krister Olsson, Umeå University, The Västerbotten regional county, and Vinnova, the Swedish innovation agency.

material/data set \glsbrats \glskits \glsibsr \glshene \glspros
#epochs 200 200 200 100 100
optimizer Adam Adam Adam Adam Adam
learning rate
learning rate drop
patience 5 5 10 6 5
early-stopping 12 12 25 14 11
U-Net [34]
2D 0.768 (0.018) 0.766 (0.016) 0.898 (0.005) 0.656 (0.012) 0.807 (0.005)
proposed ($m=3$) 0.765 (0.017) 0.775 (0.014) 0.907 (0.005) 0.646 (0.008) 0.811 (0.005)
proposed ($m=5$) 0.766 (0.018) 0.789 (0.014) 0.909 (0.004) 0.654 (0.005) 0.817 (0.004)
proposed ($m=7$) 0.769 (0.013) 0.797 (0.017) 0.911 (0.003) 0.645 (0.012) 0.818 (0.004)
proposed ($m=9$) 0.771 (0.022) 0.790 (0.017) 0.912 (0.003) 0.642 (0.007) 0.819 (0.004)
proposed ($m=11$) 0.766 (0.018) 0.792 (0.023) 0.913 (0.003) 0.654 (0.008) 0.827 (0.004)
proposed ($m=13$) 0.772 (0.018) 0.796 (0.026) 0.916 (0.003) 0.690 (0.004) 0.831 (0.004)
channel-based ($m=3$) 0.770 (0.014) 0.789 (0.013) 0.904 (0.003) 0.665 (0.007) 0.809 (0.005)
channel-based ($m=5$) 0.767 (0.017) 0.804 (0.009) 0.904 (0.003) 0.663 (0.008) 0.810 (0.004)
channel-based ($m=7$) 0.765 (0.019) 0.800 (0.015) 0.896 (0.005) 0.648 (0.009) 0.813 (0.004)
channel-based ($m=9$) 0.763 (0.016) 0.787 (0.013) 0.902 (0.004) 0.659 (0.011) 0.809 (0.005)
channel-based ($m=11$) 0.764 (0.016) 0.777 (0.017) 0.902 (0.007) 0.663 (0.003) 0.809 (0.005)
channel-based ($m=13$) 0.769 (0.016) 0.772 (0.015) 0.905 (0.008) 0.674 (0.006) 0.814 (0.005)
3D 0.769 (0.016) 0.763 (0.014) 0.924 (0.002) 0.635 (0.009) 0.841 (0.004)
SegNet [2]
2D 0.744 (0.021) 0.755 (0.020) 0.782 (0.011) 0.643 (0.006) 0.763 (0.005)
proposed ($m=3$) 0.746 (0.017) 0.766 (0.017) 0.786 (0.007) 0.625 (0.011) 0.773 (0.005)
proposed ($m=5$) 0.756 (0.017) 0.767 (0.018) 0.789 (0.006) 0.635 (0.005) 0.772 (0.005)
proposed ($m=7$) 0.752 (0.018) 0.767 (0.015) 0.790 (0.008) 0.627 (0.004) 0.774 (0.005)
proposed ($m=9$) 0.760 (0.013) 0.760 (0.018) 0.792 (0.013) 0.623 (0.012) 0.784 (0.005)
proposed ($m=11$) 0.751 (0.016) 0.761 (0.015) 0.796 (0.008) 0.620 (0.011) 0.791 (0.005)
proposed ($m=13$) 0.763 (0.016) 0.778 (0.014) 0.798 (0.010) 0.662 (0.011) 0.802 (0.005)
channel-based ($m=3$) 0.752 (0.019) 0.767 (0.016) 0.784 (0.006) 0.638 (0.010) 0.772 (0.005)
channel-based ($m=5$) 0.755 (0.018) 0.769 (0.015) 0.766 (0.004) 0.629 (0.010) 0.771 (0.005)
channel-based ($m=7$) 0.753 (0.016) 0.755 (0.016) 0.766 (0.010) 0.639 (0.007) 0.770 (0.005)
channel-based ($m=9$) 0.748 (0.019) 0.737 (0.014) 0.749 (0.008) 0.628 (0.006) 0.767 (0.005)
channel-based ($m=11$) 0.747 (0.015) 0.735 (0.008) 0.742 (0.011) 0.627 (0.005) 0.762 (0.005)
channel-based ($m=13$) 0.754 (0.018) 0.741 (0.012) 0.762 (0.009) 0.649 (0.009) 0.775 (0.005)
3D 0.726 (0.013) 0.735 (0.012) 0.744 (0.012) 0.599 (0.005) 0.771 (0.005)
TABLE III: Mean \glsdsc (and standard deviation) of five runs on BraTS19, KiTS19, IBSR18, U-HAND and U-PRO data sets. Models were trained using the summation of the soft \glsdsc with the \glsce loss.
Fig. 10: Qualitative results of the proposed method on all data sets (see Figures 14–16 in the Supplementary Material for qualitative results of the channel-based method on the same examples). From top to bottom: (i) \glsbrats tumor structures: the necrotic and non-enhancing tumor core (NCR/NET, label 1, red), the peritumoral edema (ED, label 2, green) and the GD-enhancing tumor (ET, label 4, yellow); (ii) \glskits class structure: the kidney (red) and kidney tumor (green); (iii) \glsibsr class structure: cerebrospinal fluid (red), white matter (green) and gray matter (blue); (iv) \glshene class structure: left and right submandibular glands (red and green), left and right parotid glands (dark blue and yellow), larynx (light blue), and medulla oblongata (pink); (v) \glspros class structure: prostate (red), bladder (green) and rectum (blue). From left to right: 2D, pseudo-3D, 3D, and ground truth (GT).

Supplementary Material

VI-A Structure Analysis

Feature Extraction

We selected three data set features that describe each data set's structural properties: structure depth, structure size relative to the total volume, and average structural inter-slice spatial displacement (see Tables IV–VIII). These structural properties are computed as follows.

The structure depth of class $c$ is computed as

$$\bar{d}_c = \frac{1}{P} \sum_{p=1}^{P} \frac{1}{R_{p,c}} \sum_{r=1}^{R_{p,c}} d_{p,c,r}, \qquad (5)$$

where $p$ denotes the patient, $r$ represents an unconnected region of class $c$ in patient $p$, and $d_{p,c,r}$ denotes the number of consecutive slices (in the axial dimension) spanned by region $r$. Here, $P$ and $R_{p,c}$ are the number of patients and the number of unconnected regions of class $c$ in patient $p$, respectively.

The structure size relative to the total volume of class $c$ is defined as

$$\bar{s}_c = \frac{1}{P} \sum_{p=1}^{P} \frac{n_{p,c}}{H \cdot W \cdot D}, \qquad (6)$$

where $n_{p,c}$ denotes the total number of voxels labeled as class $c$ in patient $p$. The $H$, $W$, and $D$ are the height, width, and depth of the input volume, respectively.

To compute the structure spatial displacement, we first compute the center of mass, $\mathbf{m}_{p,c,k}$, of class $c$ of patient $p$ at slice $k$ (in the axial dimension) as

$$\mathbf{m}_{p,c,k} = \frac{1}{n_{p,c,k}} \sum_{x=1}^{W} \sum_{y=1}^{H} v_{p,c,k}(x, y) \begin{pmatrix} x \\ y \end{pmatrix}, \qquad (7)$$

where $v_{p,c,k}(x, y)$ is the value of a voxel at coordinates $(x, y)$ in class $c$ in patient $p$ and slice $k$, and where $n_{p,c,k}$ denotes the total number of voxels labeled as class $c$ in patient $p$ in slice $k$.

With these, the structure spatial displacement of class $c$ is computed as

$$\bar{\delta}_c = \frac{1}{P} \sum_{p=1}^{P} \frac{1}{K_{p,c} - 1} \sum_{k=1}^{K_{p,c} - 1} \bigl\lVert \mathbf{m}_{p,c,k+1} - \mathbf{m}_{p,c,k} \bigr\rVert_2, \qquad (8)$$

where $\lVert \cdot \rVert_2$ denotes the Euclidean distance between two coordinate points, and $K_{p,c}$ is the number of axial slices in which class $c$ is present in patient $p$.
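
A sketch of how these three features could be computed for a single class of a single patient with NumPy and SciPy is given below; the axis conventions and helper names are assumptions, not taken from the original implementation:

```python
import numpy as np
from scipy import ndimage

def structure_features(mask):
    """Structure features for one class of one patient, following Eqs. (5)-(8).

    mask: binary array of shape (D, H, W) for a single class (axial axis first).
    Returns (mean region depth, relative size, mean inter-slice displacement).
    """
    # Eq. (5): average number of consecutive axial slices per unconnected region
    labels, n_regions = ndimage.label(mask)
    depths = [np.ptp(np.where(labels == r)[0]) + 1 for r in range(1, n_regions + 1)]
    depth = float(np.mean(depths)) if depths else 0.0

    # Eq. (6): number of voxels of the class relative to the total volume size
    rel_size = mask.sum() / mask.size

    # Eqs. (7)-(8): per-slice centers of mass and their mean displacement
    centers = [ndimage.center_of_mass(s) for s in mask if s.any()]
    steps = [np.linalg.norm(np.subtract(a, b))
             for a, b in zip(centers[:-1], centers[1:])]
    displacement = float(np.mean(steps)) if steps else 0.0

    return depth, rel_size, displacement
```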

Regression Analysis

To find explanations for why the \acrshortpros data set is an exception compared to the other data sets, we attempted to connect the \acrshortdsc behaviour with respect to $m$, that is, the number of slices extracted from the whole volume of $D$ slices as the subvolume input, to differences in data set properties, including structure depth, structure size relative to the total volume, average structural inter-slice spatial displacement, and the number of training samples.

We aggregated all these structure properties to generate the minimum, mean, and maximum of $\bar{d}_c$, $\bar{s}_c$, and $\bar{\delta}_c$ over all classes in each data set, such that for each data set we obtained a fixed set of nine features. Overall, we formed a regression task to compute regression models over all models (U-Net and SegNet), architectures (2D, 3D, proposed, and channel-based), values of $m$ (the number of slices extracted from the whole volume), and data sets, including \glsbrats, \glskits, \glsibsr, \glshene and \glspros, with the following input features:

  • $\min_c \bar{d}_c$, minimum of structure depth over classes,

  • $\operatorname{mean}_c \bar{d}_c$, average of structure depth over classes,

  • $\max_c \bar{d}_c$, maximum of structure depth over classes,

  • $\min_c \bar{s}_c$, minimum of structure size relative to the total volume over classes,

  • $\operatorname{mean}_c \bar{s}_c$, mean of structure size relative to the total volume over classes,

  • $\max_c \bar{s}_c$, maximum of structure size relative to the total volume over classes,

  • $\min_c \bar{\delta}_c$, minimum of spatial displacement over classes,

  • $\operatorname{mean}_c \bar{\delta}_c$, average of spatial displacement over classes,

  • $\max_c \bar{\delta}_c$, maximum of spatial displacement over classes.

We used the bootstrap to compute the mean regression coefficient vectors and the corresponding confidence intervals for several regularised linear regression models. In the regression analysis, we used Ridge regression, the Lasso, the Elastic Net, and Bayesian ARD regression. The analysis was performed using scikit-learn 0.22. However, the regression analysis was inconclusive, revealing no relation between the structure features and the segmentation performance as a function of the number of input slices, $m$.
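
A sketch of this bootstrap procedure with scikit-learn is given below; the number of rounds and the model hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet, ARDRegression
from sklearn.utils import resample

def bootstrap_coefficients(X, y, model_cls=Ridge, n_rounds=1000, **kwargs):
    """Bootstrap the coefficients of a regularised linear regression model.

    X: matrix of the nine structure features per data set; y: DSC scores.
    Returns the mean coefficient vector and a 95 % confidence interval.
    """
    coefs = []
    for seed in range(n_rounds):
        X_b, y_b = resample(X, y, random_state=seed)   # sample with replacement
        model = model_cls(**kwargs).fit(X_b, y_b)
        coefs.append(model.coef_)
    coefs = np.asarray(coefs)
    mean = coefs.mean(axis=0)
    ci = np.percentile(coefs, [2.5, 97.5], axis=0)
    return mean, ci

# The same procedure would be repeated with Lasso, ElasticNet and ARDRegression.
```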

Extracted Features

See Tables IV–VIII for more details.

class $\bar{d}_c$ $\bar{s}_c$ $\bar{\delta}_c$
non-enhancing tumor core 33.3 (17.20) 25.20 (33.70) 1.76 (0.94)
peritumoral edema 60.6 (16.50) 61.90 (47.60) 1.65 (0.85)
GD-enhancing tumor 31.9 (18.10) 18.80 (20.50) 1.36 (1.99)

TABLE IV: Mean and standard deviation of each class's depth, $\bar{d}_c$, count over the total number of pixels, $\bar{s}_c$, and spatial displacement, $\bar{\delta}_c$, in the axial direction over all volumes in the BraTS19 data set.
class $\bar{d}_c$ $\bar{s}_c$ $\bar{\delta}_c$
kidney 37.34 (4.34) 182.20 (50.00) 4.12 (0.98)
kidney tumor 15.64 (10.84) 53.49 (107.85) 1.90 (1.39)

TABLE V: Mean and standard deviation of each class's depth, $\bar{d}_c$, count over the total number of pixels, $\bar{s}_c$, and spatial displacement, $\bar{\delta}_c$, in the axial direction over all volumes in the KiTS19 data set.
class $\bar{d}_c$ $\bar{s}_c$ $\bar{\delta}_c$
cerebrospinal fluid 55.67 (13.42) 18.31 (7.56) 1.89 (0.29)
white matter 140.06 (12.55) 815.62 (205.51) 0.82 (0.79)
gray matter 114.78 (13.01) 431.49 (77.95) 0.98 (0.11)

TABLE VI: Mean and standard deviation of each class's depth, $\bar{d}_c$, count over the total number of pixels, $\bar{s}_c$, and spatial displacement, $\bar{\delta}_c$, in the axial direction over all volumes in the IBSR18 data set.
class $\bar{d}_c$ $\bar{s}_c$ $\bar{\delta}_c$
left submandibular glands 5.19 (1.29) 2.38 (0.81) 1.67 (0.66)
right submandibular glands 5.21 (1.43) 2.34 (0.78) 1.87 (0.82)
left parotid glands 7.53 (2.09) 7.16 (2.96) 1.99 (0.82)
right parotid glands 7.41 (2.15) 6.89 (2.83) 2.01 (0.65)
larynx 5.38 (1.31) 10.62 (3.35) 1.45 (0.74)
medulla oblongata 30.81 (5.35) 11.52 (2.88) 2.01 (0.54)

TABLE VII: Mean and standard deviation of each class's depth, $\bar{d}_c$, count over the total number of pixels, $\bar{s}_c$, and spatial displacement, $\bar{\delta}_c$, in the axial direction over all volumes in the U-HAND data set.
class $\bar{d}_c$ $\bar{s}_c$ $\bar{\delta}_c$
prostate 10.7 (2.30) 2.34 (1.02) 0.63 (0.25)
bladder 10.2 (3.90) 4.80 (3.18) 1.45 (0.69)
rectum 25.0 (3.50) 2.99 (1.10) 0.95 (0.26)

TABLE VIII: Mean and standard deviation of each class's depth, $\bar{d}_c$, count over the total number of pixels, $\bar{s}_c$, and spatial displacement, $\bar{\delta}_c$, in the axial direction over all volumes in the U-PRO data set.

VI-B Supplementary Computational Results

To compare the computational cost of our proposed models to the corresponding 2D and 3D \glscnn models, we extracted the number of trainable parameters, the maximum amount of \glsgpu memory used, the number of \glsflops, training time per epoch, and prediction time per sample.

The computational costs of the models used for the \glsbrats experiments are presented in Table 2 of the main paper. The number of model parameters, graphical memory use, and \glsflops depend only on the model type, and are therefore equal for all other data sets. The same variables are shown here for the other data sets in Tables IX–XII, where the only differences are in the training and inference times, due to the different numbers of samples; these two parameters scale with the data set size.

Model #slices #params memory \acrshortflops t per epoch p per sample
2D 1 494k 468MB 2.456M 87s 19.04s
proposed 3 497k 498MB 2.469M 118s 19.40s
5 504k 520MB 2.504M 131s 20.10s
7 511k 542MB 2.538M 201s 21.35s
9 518k 565MB 2.573M 255s 22.07s
11 525k 587MB 2.608M 408s 22.69s
13 530k 609MB 2.643M 576s 23.07s
channel-based 3 495k 486MB 2.457M 91s 19.30s
5 495k 498MB 2.459M 93s 19.04s
7 495k 511MB 2.460M 169s 19.57s
9 496k 524MB 2.462M 240s 19.57s
11 496k 535MB 2.463M 395s 19.83s
13 497k 546MB 2.465M 551s 20.01s
3D 32 1 460k 16 315MB 7.297M 1 041s 3.08s

TABLE IX: Architecture comparison. Experiment with the U-Net architecture on the KiTS19 data set. The patch shape was the full in-plane resolution with $m$ slices, where $m$ is the number of slices. Here, t and p denote the training time per epoch and prediction time per sample, respectively. This experiment was performed on an NVIDIA GeForce GTX 1080 Ti.
Model #slices #params memory \acrshortflops t per epoch p per sample
2D 1 494k 457MB 2.398M 11s 10.02s
proposed 3 497k 486MB 2.410M 15s 10.21s
5 504k 507MB 2.441M 17s 10.58s
7 511k 528MB 2.472M 25s 11.24s
9 518k 550MB 2.505M 32s 11.62s
11 525k 571MB 2.537M 52s 11.95s
13 530k 592MB 2.569M 73s 12.15s
channel-based 3 495k 472MB 2.386M 12s 10.16s
5 495k 484MB 2.390M 12s 10.02s
7 495k 497MB 2.393M 21s 10.30s
9 496k 510MB 2.396M 30s 10.30s
11 496k 521MB 2.399M 50s 10.44s
13 497k 532MB 2.402M 70s 10.53s
3D 32 1 460k 15 897MB 7.11M 132s 1.62s

TABLE X: Architecture comparison. Experiment with the U-Net architecture on the IBSR18 data set. The patch shape was the full in-plane resolution with $m$ slices, where $m$ is the number of slices. Here, t and p denote the training time per epoch and prediction time per sample, respectively. This experiment was performed on an NVIDIA GeForce GTX 1080 Ti.
{adjustbox}

max width= Model #slices #params memory #FLOPS t per epoch p per sample 2D 1 494k 468MB 2.456M 16s 19.04s proposed 3 497k 498MB 2.469M 22s 19.40s 5 504k 520MB 2.504M 24s 20.10s 7 511k 542MB 2.538M 37s 21.35s 9 518k 565MB 2.573M 47s 22.07s 11 525k 587MB 2.608M 75s 22.69s 13 530k 609MB 2.643M 106s 23.07s channel-based 3 495k 486MB 2.457M 17s 19.30s 5 495k 498MB 2.459M 17s 19.04s 7 495k 511MB 2.460M 31s 19.57s 9 496k 524MB 2.462M 44s 19.57s 11 496k 535MB 2.463M 73s 19.83s 13 497k 546MB 2.465M 101 20.01s 3D 32 1 460k 16 315MB 7.297M 191 3.08 s

TABLE XI: Architecture comparison using the U-Net architecture on the U-HAND data set. The patch depth was set to the number of input slices. Here, t and p denote the training time per epoch and the prediction time per sample, respectively. This experiment was performed on an NVIDIA GeForce GTX 1080 Ti.
Model | #slices | #params | memory | #FLOPs | t per epoch | p per sample
2D | 1 | 494k | 468MB | 2.456M | 250s | 19.04s
proposed | 3 | 497k | 498MB | 2.469M | 340s | 19.40s
proposed | 5 | 504k | 520MB | 2.504M | 376s | 20.10s
proposed | 7 | 511k | 542MB | 2.538M | 578s | 21.35s
proposed | 9 | 518k | 565MB | 2.573M | 732s | 22.07s
proposed | 11 | 525k | 587MB | 2.608M | 1 171s | 22.69s
proposed | 13 | 530k | 609MB | 2.643M | 1 654s | 23.07s
channel-based | 3 | 495k | 486MB | 2.457M | 262s | 19.30s
channel-based | 5 | 495k | 498MB | 2.459M | 268s | 19.04s
channel-based | 7 | 495k | 511MB | 2.460M | 485s | 19.57s
channel-based | 9 | 496k | 524MB | 2.462M | 689s | 19.57s
channel-based | 11 | 496k | 535MB | 2.463M | 1 136s | 19.83s
channel-based | 13 | 497k | 546MB | 2.465M | 1 583s | 20.01s
3D | 32 | 1 460k | 16 315MB | 7.297M | 2 990s | 3.08s

TABLE XII: Architecture comparison using the U-Net architecture on the U-PRO data set. The patch depth was set to the number of input slices. Here, t and p denote the training time per epoch and the prediction time per sample, respectively. This experiment was performed on an NVIDIA GeForce GTX 1080 Ti.
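For the pseudo-3D models in Tables IX–XII, each training sample is a stack of adjacent axial slices centered on the slice to be segmented. A minimal NumPy sketch of extracting such a stack is shown below; the function name and the edge-replication policy at the volume borders are illustrative assumptions, not necessarily what was used in the experiments.

```python
import numpy as np

def extract_slice_stack(volume, center_index, n_slices):
    """Return a stack of n_slices adjacent axial slices centered on center_index.

    volume is assumed to have shape (depth, height, width); indices outside the
    volume are replaced by the nearest edge slice (one possible border policy).
    """
    assert n_slices % 2 == 1, "use an odd number of slices so a central slice exists"
    half = n_slices // 2
    indices = np.clip(np.arange(center_index - half, center_index + half + 1),
                      0, volume.shape[0] - 1)
    return volume[indices]  # shape: (n_slices, height, width)

# Example: a 7-slice input stack around axial slice 10 of a dummy volume.
volume = np.random.rand(64, 256, 256).astype(np.float32)
stack = extract_slice_stack(volume, center_index=10, n_slices=7)
print(stack.shape)  # (7, 256, 256)
```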

VI-C SegNet Pseudo-3D Architecture

Figure 2 in the main paper shows both pseudo-3D methods with a U-Net backbone. Here, Figure 11 shows the same methods but with a SegNet backbone.

Fig. 11: Proposed methods with a SegNet backbone. The output is the prediction for the central slice of the input. The numbers in the transition block indicate the depth, and those in the backbone the number of filters. Top: the transition block uses 3D convolutions and 2D padding to iteratively reduce the input depth to a single slice, while the width and height remain unchanged. Bottom: the neighboring slices are regarded as multiple channels, so the input can be fed directly into the 2D \acrshortcnn.
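As one way to make the transition-block idea concrete, the sketch below implements it in Keras/TensorFlow: the stack of input slices is zero-padded only in the spatial dimensions and repeatedly convolved with 3x3x3 kernels, so each convolution removes two slices of depth until a single slice remains, which is then squeezed into a 2D feature map for the 2D backbone. The filter count and layer arrangement are illustrative assumptions, not the exact configuration used in the paper.

```python
import tensorflow as tf

def transition_block(n_slices, height, width, channels, filters=16):
    """Reduce a (n_slices, H, W, C) stack to a 2D (H, W, filters) feature map."""
    assert n_slices % 2 == 1, "expects an odd number of input slices"
    inputs = tf.keras.Input(shape=(n_slices, height, width, channels))
    x = inputs
    # Each step pads H and W (but not the depth) and applies a 3x3x3 convolution,
    # so the depth shrinks by 2 while the spatial size is preserved.
    for _ in range((n_slices - 1) // 2):
        x = tf.keras.layers.ZeroPadding3D(padding=(0, 1, 1))(x)
        x = tf.keras.layers.Conv3D(filters, kernel_size=3, padding="valid",
                                   activation="relu")(x)
    # The depth is now 1; drop that axis to obtain a 2D feature map.
    x = tf.keras.layers.Lambda(lambda t: tf.squeeze(t, axis=1))(x)
    return tf.keras.Model(inputs, x, name="transition_block")

# Example: a 7-slice stack of single-channel 256x256 slices.
block = transition_block(n_slices=7, height=256, width=256, channels=1)
print(block.output_shape)  # (None, 256, 256, 16)
```

The channel-based variant needs no such block: the input slices are simply treated as that many channels of an ordinary 2D network.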

VI-D \acrshortpros Subset Experiment Results

A distinction between the \glspros data set and the others included in this study is its much larger number of samples. We hypothesized that this could influence the relation between the number of input slices and the \glsdsc, and therefore repeated the same experiments while training on distinct subsets of samples from the \glspros data set. The average scores obtained from the five distinct subsets are shown in Figure 12, where we observe a behavior similar to that in Figure 9 of the main paper. Hence, we rule out the data set size as the main cause of the \glspros performance behavior.
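A minimal sketch of how such disjoint patient subsets can be drawn is given below. The subset size of 200 patients and the five subsets match Figure 12, while the total patient count and the NumPy-based shuffling are illustrative assumptions.

```python
import numpy as np

def disjoint_subsets(patient_ids, subset_size, n_subsets, seed=42):
    """Split patient IDs into n_subsets disjoint subsets of subset_size each."""
    rng = np.random.default_rng(seed)       # fixed seed for reproducibility
    ids = rng.permutation(patient_ids)
    assert len(ids) >= subset_size * n_subsets, "not enough patients"
    return [ids[i * subset_size:(i + 1) * subset_size] for i in range(n_subsets)]

# Example: five disjoint training subsets of 200 patients each,
# assuming hypothetical patient identifiers 0..N-1.
subsets = disjoint_subsets(np.arange(1200), subset_size=200, n_subsets=5)
print([len(s) for s in subsets])  # [200, 200, 200, 200, 200]
```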

Fig. 12: Mean and standard error of the test set \glsdsc on the U-PRO data set. Each run was trained on a distinct subset of 200 patients.

VI-E Supplementary Quantitative Results

In an earlier stage of this project, we employed a different experimental setup with a pure \glsdsc loss function. However, these initial experiments showed that this loss is not sufficient for all data sets. In particular, the \glskits and \glshene data sets yielded unacceptably unstable results: with exactly the same hyperparameters, a run could produce either fairly accurate segmentations or a complete failure. Investigation of the \glspldsc of individual structures showed that in these failed runs, several structures never improved beyond a \glsdsc on the order of 0.1. After adapting the loss function to also include a \glsce term, the results improved substantially for all data sets. Performance details for each run with the pure \glsdsc loss are shown in Figure 13 and Table XIII; the results with the final loss function are reported in Table III of the main paper.
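A minimal sketch of such a combined loss, assuming a Keras/TensorFlow setup with one-hot ground truth and softmax outputs, is shown below. The equal weighting of the two terms and the smoothing constant are illustrative assumptions; the paper's exact formulation may differ.

```python
import tensorflow as tf

def soft_dice_loss(y_true, y_pred, smooth=1e-5):
    """Soft DSC loss averaged over classes; expects one-hot y_true and softmax y_pred."""
    axes = tuple(range(1, len(y_pred.shape) - 1))  # spatial axes; keep batch and class
    intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
    union = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
    dice = (2.0 * intersection + smooth) / (union + smooth)
    return 1.0 - tf.reduce_mean(dice)

def dice_ce_loss(y_true, y_pred):
    """Soft DSC loss plus categorical cross-entropy (equal weights assumed)."""
    ce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))
    return soft_dice_loss(y_true, y_pred) + ce

# Usage, e.g.: model.compile(optimizer="adam", loss=dice_ce_loss)
```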

Fig. 13: Mean and standard deviation of the \glsdsc over five runs on the BraTS19, KiTS19, IBSR18, U-HAND, and U-PRO data sets using the soft \glsdsc loss.
data set | \acrshortbrats | \acrshortkits | \acrshortibsr | \acrshorthene | \acrshortpros
#epochs | 200 | 200 | 200 | 100 | 100
optimizer | Adam | Adam | Adam | Adam | Adam
learning rate |  |  |  |  |
learning rate drop patience | 5 | 5 | 10 | 6 | 5
early-stopping patience | 12 | 12 | 25 | 14 | 11

U-Net
2D | 0.741 (0.004) | 0.479 (0.010) | 0.898 (0.001) | 0.537 (0.272) | 0.788 (0.005)
proposed () | 0.749 (0.005) | 0.494 (0.015) | 0.884 (0.006) | 0.495 (0.137) | 0.785 (0.005)
proposed () | 0.743 (0.006) | 0.486 (0.016) | 0.892 (0.003) | 0.423 (0.222) | 0.847 (0.004)
proposed () | 0.744 (0.003) | 0.495 (0.009) | 0.894 (0.004) | 0.445 (0.208) | 0.861 (0.004)
proposed () | 0.736 (0.011) | 0.505 (0.011) | 0.896 (0.003) | 0.404 (0.171) | 0.840 (0.004)
proposed () | 0.749 (0.005) | 0.492 (0.011) | 0.886 (0.008) | 0.479 (0.081) | 0.818 (0.004)
channel-based () | 0.736 (0.005) | 0.489 (0.008) | 0.903 (0.001) | 0.667 (0.022) | 0.789 (0.005)
channel-based () | 0.742 (0.008) | 0.487 (0.009) | 0.903 (0.001) | 0.512 (0.264) | 0.855 (0.004)
channel-based () | 0.742 (0.010) | 0.477 (0.013) | 0.901 (0.001) | 0.504 (0.181) | 0.859 (0.004)
channel-based () | 0.739 (0.005) | 0.481 (0.009) | 0.903 (0.001) | 0.519 (0.168) | 0.853 (0.005)
channel-based () | 0.742 (0.004) | 0.485 (0.015) | 0.902 (0.001) | 0.501 (0.195) | 0.792 (0.005)
3D | 0.724 (0.004) | 0.436 (0.037) | 0.699 (0.041) | 0.318 (0.159) | 0.827 (0.004)

SegNet
2D | 0.716 (0.003) | 0.429 (0.010) | 0.766 (0.002) | 0.406 (0.145) | 0.756 (0.005)
proposed () | 0.719 (0.010) | 0.435 (0.009) | 0.778 (0.002) | 0.395 (0.223) | 0.816 (0.005)
proposed () | 0.717 (0.007) | 0.443 (0.009) | 0.774 (0.002) | 0.465 (0.229) | 0.797 (0.005)
proposed () | 0.718 (0.005) | 0.443 (0.007) | 0.772 (0.006) | 0.409 (0.217) | 0.809 (0.005)
proposed () | 0.719 (0.009) | 0.447 (0.010) | 0.761 (0.008) | 0.484 (0.242) | 0.828 (0.005)
proposed () | 0.721 (0.004) | 0.444 (0.007) | 0.767 (0.009) | 0.449 (0.166) | 0.751 (0.005)
channel-based () | 0.720 (0.004) | 0.428 (0.012) | 0.777 (0.002) | 0.446 (0.136) | 0.805 (0.005)
channel-based () | 0.713 (0.005) | 0.417 (0.015) | 0.781 (0.002) | 0.336 (0.193) | 0.805 (0.005)
channel-based () | 0.717 (0.004) | 0.417 (0.008) | 0.778 (0.003) | 0.349 (0.251) | 0.816 (0.005)
channel-based () | 0.722 (0.006) | 0.424 (0.013) | 0.772 (0.004) | 0.385 (0.147) | 0.807 (0.005)
channel-based () | 0.717 (0.002) | 0.410 (0.016) | 0.777 (0.003) | 0.387 (0.239) | 0.744 (0.005)
3D | 0.687 (0.016) | 0.252 (0.091) | 0.381 (0.054) | 0.037 (0.017) | 0.635 (0.005)

TABLE XIII: Mean \glsdsc (and standard deviation) over five runs on the BraTS19, KiTS19, IBSR18, U-HAND, and U-PRO data sets. Models were trained using the soft \glsdsc loss.
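The "learning rate drop patience" and "early-stopping" rows in Table XIII correspond to standard Keras callbacks. A minimal sketch of wiring them up for the BraTS19 setting is given below; the monitored quantity, the learning-rate reduction factor, and the initial learning rate are illustrative assumptions, as the exact values are not stated in this section.

```python
import tensorflow as tf

# Patience values from Table XIII for BraTS19 (200 epochs); other data sets differ.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=12,
                                     restore_best_weights=True),
]

# Usage (model, data, and loss defined elsewhere; learning rate is illustrative):
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss=dice_ce_loss)
# model.fit(train_data, validation_data=val_data, epochs=200, callbacks=callbacks)
```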

VI-F Supplementary Qualitative Results

Example segmentations are illustrated in Figures 14–16. It is important to emphasize that the images are randomly selected single slices from thousands of samples; they are therefore presented purely for illustrative purposes and may not be representative of the overall segmentation performance on a particular data set.

Fig. 14: Qualitative results of the channel-based method on all data sets (see Figure 10 in the main paper for the qualitative results of the proposed method on the same examples). From top to bottom: (i) \acrshortbrats tumor structures: the necrotic and non-enhancing tumor core (NCR/NET, label 1, red), the peritumoral edema (ED, label 2, green), and the GD-enhancing tumor (ET, label 4, yellow); (ii) \acrshortkits structures: the kidney (red) and kidney tumor (green); (iii) \acrshortibsr structures: cerebrospinal fluid (red), white matter (green), and gray matter (blue); (iv) \acrshorthene structures: left and right submandibular glands (red and green), left and right parotid glands (dark blue and yellow), larynx (light blue), and medulla oblongata (pink); (v) \acrshortpros structures: prostate (red), bladder (green), and rectum (blue). From left to right: 2D, the channel-based variants, 3D, and ground truth (GT).
Fig. 15: Qualitative results of the proposed method on all data sets (see Figure 16 for the qualitative results of the channel-based method on the same examples). From top to bottom: (i) \acrshortbrats tumor structures: the necrotic and non-enhancing tumor core (NCR/NET, label 1, red), the peritumoral edema (ED, label 2, green), and the GD-enhancing tumor (ET, label 4, yellow); (ii) \acrshortkits structures: the kidney (red) and kidney tumor (green); (iii) \acrshortibsr structures: cerebrospinal fluid (red), white matter (green), and gray matter (blue); (iv) \acrshorthene structures: left and right submandibular glands (red and green), left and right parotid glands (dark blue and yellow), larynx (light blue), and medulla oblongata (pink); (v) \acrshortpros structures: prostate (red), bladder (green), and rectum (blue). From left to right: 2D, the proposed variants, 3D, and ground truth (GT).
Fig. 16: Qualitative results of the channel-based method on all data sets (see Figure 15 for the qualitative results of the proposed method on the same examples). From top to bottom: (i) \acrshortbrats tumor structures: the necrotic and non-enhancing tumor core (NCR/NET, label 1, red), the peritumoral edema (ED, label 2, green), and the GD-enhancing tumor (ET, label 4, yellow); (ii) \acrshortkits structures: the kidney (red) and kidney tumor (green); (iii) \acrshortibsr structures: cerebrospinal fluid (red), white matter (green), and gray matter (blue); (iv) \acrshorthene structures: left and right submandibular glands (red and green), left and right parotid glands (dark blue and yellow), larynx (light blue), and medulla oblongata (pink); (v) \acrshortpros structures: prostate (red), bladder (green), and rectum (blue). From left to right: 2D, the channel-based variants, 3D, and ground truth (GT).

Footnotes

  1. https://keras.io
  2. https://tensorflow.org
  3. https://scikit-learn.org/stable/
