ErrorNet: A Unified Error Injection-Prediction Framework Applied to Retinal Vessel Segmentation


Abstract

Deep convolutional neural networks have proved effective at segmenting lesions and anatomy in various medical imaging modalities. However, in the presence of the small sample size and domain shift problems, these models often produce masks with non-intuitive segmentation mistakes. In this paper, we propose a segmentation framework called ErrorNet, which learns to correct these segmentation mistakes through the repeated process of injecting systematic segmentation errors into the segmentation result, based on a learned shape prior, followed by attempting to predict the injected error. During inference, ErrorNet corrects segmentation mistakes by adding the predicted error map to the initial segmentation result. ErrorNet has advantages over alternatives based on domain adaptation or CRF-based post-processing, because it requires neither domain-specific parameter tuning nor any data from the target domains. We have evaluated ErrorNet using five public datasets for the task of retinal vessel segmentation. The selected datasets differ in size and patient population, allowing us to evaluate the effectiveness of ErrorNet in handling the small sample size and domain shift problems. Our experiments demonstrate that ErrorNet outperforms a base segmentation model, a CRF-based post-processing scheme, and a domain adaptation method, with a greater performance gain in the presence of the aforementioned dataset limitations.

Nima Tajbakhsh, Brian Lai, Shilpa P. Ananth, Xiaowei Ding
VoxelCloud, Inc.; Shanghai Jiao Tong University

Keywords: retinal vessel segmentation, limited data, domain shift, error prediction, error correction


1 Introduction

Medical imaging datasets are often unrepresentative of the patient population, either lacking adequate annotated images or being limited to a particular clinical site or scanner vendor. The former leads to the small sample size problem, whereas the latter causes the domain shift problem. In the presence of unrepresentative training datasets, even the most sophisticated architectures (e.g., [18, 5]) may generate non-intuitive segmentation errors, such as holes in the segmentations of connected organs or breaks along segmented vessels. Although acquiring additional annotations is a natural workaround for reducing systematic segmentation errors caused by unrepresentative datasets, it incurs substantial annotation cost. Recent active learning and interactive segmentation methodologies [13, 9, 11] provide cost-effective solutions for expanding medical datasets, but they still require highly trained medical experts in the loop. Unsupervised domain adaptation is an expert-free solution that aims to expand medical datasets by bridging the domain shift between the current training set and the target test sets. However, this approach requires unlabeled data from the target domains, which not only is scarcely available but also does not scale well to many target domains (e.g., widespread clinical deployment). Post-processing methods based on conditional random fields (CRFs) are another approach to reducing systematic segmentation errors caused by limited datasets. While effective in natural images, the application of CRFs to medical images has shown inconclusive results [14]. Furthermore, post-processing with CRFs often requires extensive application-specific parameter tuning. There is thus a need for a methodology that can reduce the systematic errors caused by dataset limitations without requiring experts, additional datasets, or extensive parameter tuning.

Figure 1: Overview of the suggested segmentation framework, ErrorNet. Given a training image $x$, the segmentation network generates the initial segmentation result $\hat{y}$, which is then degraded by the error injection network based on a learned shape prior of the vessels, resulting in the degraded segmentation result $\hat{y}_d$. The prediction network takes as input the original image stacked with the degraded segmentation result, and outputs an error map $\hat{e}$, which attempts to predict the true error, $e = y - \hat{y}_d$, between the ground truth $y$ and the degraded segmentation map at the pixel level.

In this paper, we propose ErrorNet, a segmentation framework capable of reducing segmentation errors caused by domain shift or limited datasets. ErrorNet consists of a base segmentation network, an error injection network, and an error prediction network. During training, the error injection network degrades the segmentation result from the base network according to the shape prior of the region of interest. The degraded segmentation is then fed to the error prediction network, which aims to estimate the error map between the degraded input and the ground truth. Essentially, the segmentation network and error injection network work together to prepare a diverse set of input data for training the error prediction network. At test time, we feed the segmentation results from the base network directly to the error prediction network, and then obtain the corrected segmentation result by adding the predicted error map to the initial segmentation result. We have evaluated ErrorNet for the task of retinal vessel segmentation in fundus images using five public datasets. This choice of datasets allows us to assess the error correction capability of ErrorNet in both same-domain and cross-domain evaluation settings. Our experiments demonstrate that ErrorNet outperforms a base segmentation model, a CRF-based post-processing scheme, and a domain adaptation method, with a greater performance gain for smaller training datasets that face larger domain shifts.

Contributions. Our contributions include: 1) a novel segmentation framework with an error correction mechanism based on shape prior, 2) extensive evaluation using five public datasets in both same- and cross-dataset evaluation, 3) demonstrated gain over several baselines including CRF-based post processing and a domain adaptation method.

2 Related work

Due to limited space, we contrast our approach against domain adaptation, post-segmentation refinement, and conditional prior networks, and refer the readers to [14] for a comprehensive survey on methods for medical image segmentation with limited datasets.

Domain adaptation: These methods typically require unlabeled [4, 1] or even labeled data [3, 17] from the target domains. Thus, they are hardly applicable when the target domain is unknown a priori or too diverse to collect data from. In contrast, ErrorNet does not require data from the target domains, as it corrects segmentation errors based on a shape prior learned during the training stage.

Conditional Prior Networks (CPNs): CPNs can faithfully learn a shape prior for flows between images [16]. We adopt a similar structure, but instead of learning a prior distribution of flows, we use the CPN architecture to learn the prior distribution of segmentation masks. We additionally extend CPNs so that they can generate examples that lie within the learned segmentation prior [8]. Learning this shape prior allows ErrorNet to train itself to correct imperfect segmentation masks even when only limited data is available.

Post-processing schemes: Methods based on different variants of CRFs can be used to enforce connectivity constraints in segmentation results [7, 15]. However, these methods often require extensive parameter tuning and have shown only mixed results for medical image segmentation [14]. In contrast, ErrorNet is end-to-end trainable and application-agnostic, eliminating the need for heuristic designs. Denoising autoencoders have also recently been used for post-processing [10], but they require handcrafted, domain-specific error patterns for training. ErrorNet overcomes this limitation by learning the error patterns systematically.

3 Method

Figure 1 shows an overview of our proposed segmentation framework, ErrorNet, which consists of three consecutive networks: a base segmentation network, an error injection network, and an error prediction network. We describe each network below.

Base segmentation network: We choose the widely used U-Net as the base segmentation network, which is trained by minimizing the cross-entropy loss, $\mathcal{L}_{seg}$.
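
To make the setup concrete, the following is a minimal PyTorch sketch of the base segmentation step. The single convolution is a stand-in for the full U-Net (detailed in Table A.1), and the tensor shapes follow the 640 x 640 inputs used in our experiments.

```python
import torch
import torch.nn as nn

# Stand-in for the full U-Net of Table A.1; any encoder-decoder
# backbone that outputs one logit per pixel fits here.
seg_net = nn.Conv2d(1, 1, kernel_size=3, padding=1)
bce = nn.BCEWithLogitsLoss()  # pixel-wise cross-entropy, L_seg

x = torch.rand(2, 1, 640, 640)                  # batch of fundus images
y = (torch.rand(2, 1, 640, 640) > 0.5).float()  # binary vessel masks
y_hat_logits = seg_net(x)                       # initial segmentation (logits)
loss_seg = bce(y_hat_logits, y)
```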

Error injection network: The task of the error injection network is to degrade the segmentation result by injecting error patterns into the initial segmentation result. However, it is critical for the error patterns to be representative; otherwise, the subsequent error prediction network would learn an unrelated and perhaps trivial vision task, leading to an ineffective error correction mechanism. The error patterns must also be diverse; otherwise, the subsequent error prediction network will overfit to the limited set of error patterns in the training set, particularly when the training set is small. The importance of diverse error patterns is even more pronounced for cross-domain model evaluation, where the base segmentation model may produce segmentation maps with error patterns that are partially or largely different from those of the source training dataset. The choice of error injection network is thus critical to the success of the suggested framework.

For this purpose, we use a variational autoencoder (VAE), which is trained by minimizing $\mathcal{L}_{VAE} = \|\hat{y}_d - \hat{y}\|_2^2 + \mathrm{KL}\left(q(z \mid \hat{y}) \,\|\, \mathcal{N}(0, I)\right)$, where the first term constrains the degraded mask $\hat{y}_d$ to be similar to the initial segmentation result $\hat{y}$, and the second term constrains the latent space of the VAE to follow a standard normal distribution. A VAE generates representative and diverse error patterns because, during training, it learns a distribution over segmentation maps in a high-dimensional feature space. By sampling from this distribution, the VAE can generate diverse variants of a given segmentation map, thereby enlarging the space of imperfect segmentations and enabling us to train a more robust error prediction network.
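
A sketch of this objective is shown below, assuming the encoder outputs a mean and log-variance for the latent code; the equal weighting of the two terms is our illustrative choice rather than a tuned hyperparameter.

```python
import torch

def vae_loss(y_hat, y_hat_degraded, mu, log_var):
    """L_VAE: reconstruction term plus KL divergence to N(0, I)."""
    # Keep the degraded mask close to the initial segmentation.
    recon = ((y_hat_degraded - y_hat) ** 2).mean()
    # Closed-form KL between N(mu, sigma^2) and the standard normal.
    kl = -0.5 * torch.mean(1.0 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```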

Error prediction network: We use a shallow U-Net for the task of error prediction, which takes as input the degraded segmentation map stacked with the original image, and returns a $k$-channel error prediction map $\hat{e}$ at the original image resolution, where $k$ is the number of classes. We use the hyperbolic tangent function in the output layer, because its output ranges between -1 and 1, allowing class corrections in both the positive (strengthening) and negative (weakening) directions. We train this network by minimizing $\mathcal{L}_{err} = \frac{1}{N} \sum_{i=1}^{N} \|\hat{e}_i - e_i\|_2^2$, where $e_i = y_i - \hat{y}_{d,i}$, with $y_i$ being the ground truth mask for the $i$-th image in the batch.
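
In code, the target and loss can be sketched as follows; we assume a mean-squared error between the predicted and true error maps, matching the continuous, tanh-bounded output.

```python
import torch

def error_loss(e_hat, y, y_hat_degraded):
    """L_err: regress the predicted error map onto the true error."""
    e_true = y - y_hat_degraded   # in [-1, 1]: positive where a vessel pixel
                                  # was lost, negative where background was
                                  # over-segmented
    return ((e_hat - e_true) ** 2).mean()
```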

During testing, we bypass the error injection network, directly passing the initial segmentation result to the error prediction model. The final segmentation mask is obtained by adding the predicted error map to the initial segmentation.
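
The inference-time correction therefore reduces to one extra forward pass, as in the sketch below; clamping the corrected mask to [0, 1] is an implementation detail we assume rather than a prescribed step.

```python
import torch

def correct_segmentation(seg_net, err_net, x):
    """Bypass error injection: predict the error map, then add it."""
    y_hat = torch.sigmoid(seg_net(x))              # initial segmentation
    e_hat = err_net(torch.cat([x, y_hat], dim=1))  # tanh-activated error map
    return (y_hat + e_hat).clamp(0.0, 1.0)         # corrected mask
```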

4 Experiments

Architecture details: We use a U-Net as the base segmentation model. The U-Net consists of 4 downsampling blocks in the encoder and 4 upsampling blocks in the decoder. All convolution layers use 3x3 kernels. We have followed the best practices suggested in [6] for configuring and training our U-Net architecture: among other measures, we use instance normalization in the downsampling blocks and leaky ReLU activation functions, and exclude large black background regions while normalizing the training images. For the error injection module, we use a VAE with 3 downsampling blocks, a 6400-dimensional latent feature space, and 3 upsampling blocks. To ensure that the injected error patterns do not transform the segmentation mask completely, we sample from the latent space with a variance of 0.0001. The error prediction module follows a U-Net architecture with 3 downsampling blocks followed by 3 upsampling blocks with skip connections. Both the VAE and the error prediction network use batch normalization and ReLU activation functions. The numbers of kernels in the VAE and error prediction network were chosen so that a GPU with 12 GB of RAM can hold the entire ErrorNet in memory. We refer the readers to the appendix for architecture details.
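
One reading of this low-variance sampling is the reparameterization below, where the fixed variance of 1e-4 replaces the encoder's learned variance so that sampled latent codes, and hence the injected errors, stay close to the input mask; this interpretation is ours.

```python
import torch

def sample_latent(mu, variance=1e-4):
    """Reparameterized sampling around the encoded mean, z = mu + sigma * eps,
    with a fixed small variance so perturbations remain mild."""
    eps = torch.randn_like(mu)
    return mu + (variance ** 0.5) * eps
```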

Training details: While ErrorNet can be trained end-to-end, we have found that stage-wise training facilitates convergence. Specifically, we first train the base segmentation network by minimizing the segmentation loss, $\mathcal{L}_{seg}$. We then freeze the weights of the base segmentation network and train the error injection network by minimizing the VAE loss, $\mathcal{L}_{VAE}$. Once the VAE is trained, we train the error prediction network by minimizing $\mathcal{L}_{err}$, while freezing the weights of the base segmentation and error injection networks. We refer to this training scheme as stage-wise training henceforth. ErrorNet can then be trained jointly in an end-to-end fashion, while keeping the weights of the error injection module frozen, effectively minimizing $\mathcal{L}_{seg} + \mathcal{L}_{err}$. We refer to this training scheme as Joint Tr in Table 2.
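
A sketch of this schedule is given below; freezing is implemented by toggling gradient flow, and the network names are placeholders for the three modules described above.

```python
def set_frozen(net, frozen=True):
    """Freeze or unfreeze all parameters of a sub-network."""
    for p in net.parameters():
        p.requires_grad = not frozen

# Stage 1: train seg_net alone on L_seg.
# Stage 2: set_frozen(seg_net); train the VAE on L_VAE.
# Stage 3: set_frozen(vae); train err_net on L_err.
# Stage 4: set_frozen(seg_net, frozen=False); keep the VAE frozen and
#          minimize L_seg + L_err end to end (joint training, "Joint Tr").
```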

Datasets: Table 1 summarizes the 5 datasets used to evaluate ErrorNet for the task of retinal vessel segmentation. The selected datasets vary in terms of size, patient population, and acquisition machine, allowing us to evaluate the effectiveness of ErrorNet under different sample sizes and domain shifts.

Dataset   Total  Diseased  Healthy  Train  Val  Test
DRIVE       40       7        33      18    2    20
STARE       20      10        10      10    2     8
CHASE       20       0        20      17    5     6
ARIA       143      82        61     121    5    17
HRF         45      30        15      26    5    14
Table 1: Datasets used in our experiments. Total, Diseased, and Healthy give image counts; Train, Val, and Test give the data splits.

Train on                              CHASE                               ARIA
Architecture / Test on                CHASE DRIVE ARIA STARE HRF  Avg.    CHASE DRIVE ARIA STARE HRF  Avg.
U-Net [6]                             79.3  67.6  60.3  59.5  61.5 65.6   76.7  77.3  72.0  71.3  72.3 73.9
U-Net [6] + CRF                       81.2  65.4  62.6  56.4  63.6 65.8   78.4  69.5  73.0  64.6  73.5 71.8
V-GAN [12]                            79.7  71.5  64.2  61.0  66.4 68.5   68.7  75.8  69.9  66.2  69.3 70.0
DA-ADV [2]                            72.3  69.3  68.2  64.7  67.4 68.4   71.5  72.9  73.2  71.3  70.7 71.9
ErrorNet (Err Pred)                   80.2  68.6  60.7  60.2  62.7 66.4   76.8  72.1  72.0  71.9  72.2 73.0
ErrorNet (Err Pred + VAE)             80.1  71.8  61.1  59.8  67.2 68.0   76.2  77.3  72.2  72.2  72.8 74.1
ErrorNet (Err Pred + VAE + Joint Tr)  81.5  73.2  66.5  65.2  68.6 71.0   76.7  78.9  72.0  74.0  72.6 74.8
Table 2: Comparison between ErrorNet and the performance baselines, using Dice (see the appendix for IoU). Columns where the test set matches the training set (CHASE→CHASE, ARIA→ARIA) are same-domain evaluations; the remaining columns are cross-domain evaluations. The last three rows ablate ErrorNet's components: the error prediction network alone (Err Pred), with the VAE-based error injection network (VAE), and with joint end-to-end training (Joint Tr). ErrorNet outperforms the baselines on average, with a larger gain in the presence of domain shift (cross-domain evaluation) and small sample size (the small Chase dataset used for training). The ablations show that the VAE and joint training are both effective in improving the performance of ErrorNet.

Performance baselines: We compare ErrorNet against 1) a U-Net carefully optimized according to the best practices suggested in [6], 2) the same U-Net with CRF-based post-processing, 3) a recent unsupervised domain adaptation method [2], and 4) V-GAN [12], a modern retinal vessel segmentation network trained in an adversarial manner.

Ablation study: We compare ErrorNet with and without VAE to study the impact of the error injection network. Without VAE, the error prediction network only sees the error patterns in the training dataset. To study the effect of joint training, we also compare ErrorNet with and without joint training.

Evaluation scenarios: We evaluate ErrorNet in the presence of the small sample size and domain shift problems. To study the effect of sample size, we train ErrorNet using Chase, which is a small dataset, and Aria, which is the largest dataset under study. To study the domain shift problem, we evaluate the above models on the datasets other than the one they were trained on.

Results: Table 2 summarizes the results of both evaluation scenarios. When the Chase dataset is used for training and testing, ErrorNet achieves a Dice of 81.5, outperforming all performance baselines. The ablation study also shows that joint training yields roughly a 1-point increase in Dice. Inclusion of the VAE, on the other hand, shows no significant gain, because the training and test domains are the same (Chase). In the cross-domain evaluation, ErrorNet, and in particular the VAE module, achieves greater performance gains over the baselines. Specifically, ErrorNet achieves an average Dice score of 71.0, outperforming the second-best [12] and third-best [2] methods by 2.4 and 2.5 points, respectively. The VAE module alone enables a 1.6-point increase in Dice. The widened performance gap is due to the domain shift caused by the different patient populations and pathologies present in the datasets. While Chase contains only healthy fundus images of children's eyes with a central vessel reflex, all the other datasets used for testing contain pathological cases from adult populations. We hypothesize that ErrorNet effectively learns the general structure of eye vessels, and can thus help correct mis-segmentations introduced by dataset limitations.

ErrorNet trained with Aria continues to improve the segmentation performance over the baseline models, with both VAE and joint training features showing consistent performance gains. However, the superiority over baselines is not as drastic as in the case where ErrorNet was trained with Chase. Intuitively, these results make sense. Aria is a larger, more varied dataset with images from both healthy and diseased patients; and thus, the models trained on Aria generalize better to other datasets. As a result, the improvements made by the error correction module are smaller.

Qualitative comparison: Figure 2 compares the segmentation results before and after error correction by the error prediction network. Recall that the error injection network is not used during inference—segmentation results are directly sent to the error prediction network for error correction. As highlighted by the yellow boxes, ErrorNet has connected fragmented vessels or sharpened vessel structures.

Figure 2: ErrorNet is effective in bridging breaks along segmented vessels. Each row compares the segmentation results before (left) and after (right) error correction. The yellow boxes indicate regions where ErrorNet has connected fragmented vessels or sharpened vessel structures. Full-image results are available in the appendix.

5 Conclusion

We presented ErrorNet, a framework for systematic handling of segmentation errors caused by limited datasets. We evaluated ErrorNet using 5 public datasets for the task of retinal vessel segmentation. Our results demonstrated the effectiveness of ErrorNet in both same-dataset and cross-dataset evaluations, particularly when the training set was small and the domain shift was large. Our future work will focus on evaluating ErrorNet on other medical image segmentation tasks, as well as on assessing its effectiveness for active learning.

Appendix

This appendix consists of 6 figures and 4 tables. The figures illustrate how the error correction mechanism of ErrorNet improves the segmentation results in high-resolution, uncropped images. Note that, due to limited space, we show only low-resolution cropped results in the main text. The tables contain our segmentation results based on IoU, as well as architecture details for the base segmentation network, error injection network, and error prediction network. Readers are welcome to contact Nima Tajbakhsh at ntajbakhsh@voxelcloud.io for further clarification on our method, results, or architecture details.

Figure A.1: [Chase → HRF] Effectiveness of ErrorNet for cross-dataset evaluation, where the training set comes from the Chase dataset but the test set comes from the HRF dataset. Top: Fundus image. Bottom-Left: Segmentation result for an HRF dataset image from the segmentation network (before error correction). Bottom-Right: corresponding segmentation result generated by ErrorNet after error correction. The yellow boxes indicate example regions where the ErrorNet model has connected fragmented vessels or sharpened vessel structures.
Figure A.2: [Chase → Drive] Effectiveness of ErrorNet for cross-dataset evaluation, where the training set comes from the Chase dataset but the test set comes from the Drive dataset. Top: Fundus image. Bottom-Left: Segmentation result for a Drive dataset image from the segmentation network (before error correction). Bottom-Right: corresponding segmentation result generated by ErrorNet after error correction. The yellow boxes indicate example regions where the ErrorNet model has connected fragmented vessels or sharpened vessel structures.
Figure A.3: [Chase → Chase] Effectiveness of ErrorNet for same-dataset evaluation, where the training and test sets both come from the Chase dataset. Top: Fundus image. Bottom-Left: Segmentation result for a Chase dataset image from the segmentation network (before error correction). Bottom-Right: corresponding segmentation result generated by ErrorNet after error correction. The yellow boxes indicate example regions where the ErrorNet model has connected fragmented vessels or sharpened vessel structures. As expected, the improvement is not as drastic as that of cross-dataset evaluation.
Figure A.4: [Aria → HRF] Effectiveness of ErrorNet for cross-dataset evaluation, where the training set comes from the Aria dataset but the test set comes from the HRF dataset. Top: Fundus image. Bottom-Left: Segmentation result for an HRF dataset image from the segmentation network (before error correction). Bottom-Right: corresponding segmentation result generated by ErrorNet after error correction. The yellow boxes indicate example regions where the ErrorNet model has connected fragmented vessels or sharpened vessel structures.
Figure A.5: [Aria → Stare] Effectiveness of ErrorNet for cross-dataset evaluation, where the training set comes from the Aria dataset but the test set comes from the Stare dataset. Top: Fundus image. Bottom-Left: Segmentation result for a Stare dataset image from the segmentation network (before error correction). Bottom-Right: corresponding segmentation result generated by ErrorNet after error correction. The yellow boxes indicate example regions where the ErrorNet model has connected fragmented vessels or sharpened vessel structures.
Figure A.6: [Aria → Drive] Effectiveness of ErrorNet for cross-dataset evaluation, where the training set comes from the Aria dataset but the test set comes from the Drive dataset. Top: Fundus image. Bottom-Left: Segmentation result for a Drive dataset image from the segmentation network (before error correction). Bottom-Right: corresponding segmentation result generated by ErrorNet after error correction. The yellow boxes indicate example regions where the ErrorNet model has connected fragmented vessels or sharpened vessel structures.
Name Feature maps (input) Feature maps (output)
Encoder Pathway Conv layer - 1a 640 x 640 x 1 640 x 640 x 32
Conv layer - 1b 640 x 640 x 32 640 x 640 x 32
Max pool - 1 640 x 640 x 32 320 x 320 x 32
Conv layer - 2a 320 x 320 x 32 320 x 320 x 64
Conv layer - 2b 320 x 320 x 64 320 x 320 x 64
Max pool - 2 320 x 320 x 64 160 x 160 x 64
Conv layer - 3a 160 x 160 x 64 160 x 160 x 128
Conv layer - 3b 160 x 160 x 128 160 x 160 x 128
Max pool - 3 160 x 160 x 128 80 x 80 x 128
Conv layer - 4a 80 x 80 x 128 80 x 80 x 256
Conv layer - 4b 80 x 80 x 256 80 x 80 x 256
Max pool - 4 80 x 80 x 256 40 x 40 x 256
Conv layer - 5a 40 x 40 x 256 40 x 40 x 512
Conv layer - 5b 40 x 40 x 512 40 x 40 x 512
Decoder Pathway Upsample - 1 40 x 40 x 512 80 x 80 x 512
Concat - 1 80 x 80 x 512 (up sample - 1) 80 x 80 x 768
80 x 80 x 256 (conv - 4b)
Conv layer - 6a 80 x 80 x 768 80 x 80 x 256
Conv layer - 6b 80 x 80 x 256 80 x 80 x 256
Upsample - 2 80 x 80 x 256 160 x 160 x 256
Concat - 2 160 x 160 x 256 (upsample - 2) 160 x 160 x 384
160 x 160 x 128 (conv - 3b)
Conv layer - 7a 160 x 160 x 384 160 x 160 x 128
Conv layer - 7b 160 x 160 x 128 160 x 160 x 128
Upsample - 3 160 x 160 x 128 320 x 320 x 128
Concat - 3 320 x 320 x 128 (upsample - 3) 320 x 320 x 192
320 x 320 x 64 (conv - 2b)
Conv layer - 8a 320 x 320 x 192 320 x 320 x 64
Conv layer - 8b 320 x 320 x 64 320 x 320 x 64
Upsample - 4 320 x 320 x 64 640 x 640 x 64
Concat - 4 640 x 640 x 64 (upsample - 4) 640 x 640 x 96
640 x 640 x 32 (conv - 1b)
Conv layer - 9a 640 x 640 x 96 640 x 640 x 32
Conv layer - 9b 640 x 640 x 32 640 x 640 x 32
Output layer 640 x 640 x 32 640 x 640 x 1
Table A.1: Architecture details for the Segmentation Network. All convolution layers use 3x3 kernels. As suggested by [6], instance normalization and leaky ReLU activation functions are used.
Name Feature maps (input) Feature maps (output)
Encoder Pathway Conv layer - 1a 640 x 640 x 1 640 x 640 x 32
Conv layer - 1b 640 x 640 x 32 640 x 640 x 32
Max pool - 1 640 x 640 x 32 320 x 320 x 32
Conv layer - 2a 320 x 320 x 32 320 x 320 x 64
Conv layer - 2b 320 x 320 x 64 320 x 320 x 64
Max pool - 2 320 x 320 x 64 160 x 160 x 64
Conv layer - 3a 160 x 160 x 64 160 x 160 x 128
Conv layer - 3b 160 x 160 x 128 160 x 160 x 128
Max pool - 3 160 x 160 x 128 80 x 80 x 128
encoder conv - 4a 80 x 80 x 128 80 x 80 x 512
encoder conv - 4b 80 x 80 x 512 80 x 80 x 1
encoder dense - mu 80 x 80 x 1 6400
encoder dense - sigma 80 x 80 x 1 6400
VAE latent space sampling sampling - 1 6400 (encoder dense - mu) 6400
6400 (encoder dense - sigma)
reshape - 1 6400 80 x 80 x 1
Decoder Pathway Conv transpose - 1 80 x 80 x 1 160 x 160 x 64
Conv layer - 5a 160 x 160 x 64 160 x 160 x 64
Conv layer - 5b 160 x 160 x 64 160 x 160 x 64
Conv transpose - 2 160 x 160 x 64 320 x 320 x 32
Conv layer - 6a 320 x 320 x 32 320 x 320 x 32
Conv layer - 6b 320 x 320 x 32 320 x 320 x 32
Upsample - 3 320 x 320 x 32 640 x 640 x 32
Conv layer - 7a 640 x 640 x 32 640 x 640 x 32
Conv layer - 7b 640 x 640 x 32 640 x 640 x 32
Output layer 640 x 640 x 32 640 x 640 x 2
Sigmoid layer 640 x 640 x 2 640 x 640 x 2
Table A.2: Architecture details for the Error Injection Network. All convolution layers use 3x3 kernels. Batch normalization and ReLU activation functions are used throughout the network.
Name Feature maps (input) Feature maps (output)
Encoder Pathway Concat - input 640 x 640 x 1 (input image) 640 x 640 x 2
640 x 640 x 1 (degraded segmentation)
Conv layer - 1a 640 x 640 x 2 640 x 640 x 32
Conv layer - 1b 640 x 640 x 32 640 x 640 x 32
Max pool - 1 640 x 640 x 32 320 x 320 x 32
Conv layer - 2a 320 x 320 x 32 320 x 320 x 64
Conv layer - 2b 320 x 320 x 64 320 x 320 x 64
Max pool - 2 320 x 320 x 64 160 x 160 x 64
Conv layer - 3a 160 x 160 x 64 160 x 160 x 128
Conv layer - 3b 160 x 160 x 128 160 x 160 x 128
Max pool - 3 160 x 160 x 128 80 x 80 x 128
Conv layer - 4a 80 x 80 x 128 80 x 80 x 256
Conv layer - 4b 80 x 80 x 256 80 x 80 x 256
Decoder Pathway Upsample - 2 80 x 80 x 256 160 x 160 x 256
Concat - 2 160 x 160 x 256 (upsample - 2) 160 x 160 x 384
160 x 160 x 128 (conv - 3b)
Conv layer - 7a 160 x 160 x 384 160 x 160 x 128
Conv layer - 7b 160 x 160 x 128 160 x 160 x 128
Upsample - 3 160 x 160 x 128 320 x 320 x 128
Concat - 3 320 x 320 x 128 (upsample - 3) 320 x 320 x 192
320 x 320 x 64 (conv - 2b)
Conv layer - 8a 320 x 320 x 192 320 x 320 x 64
Conv layer - 8b 320 x 320 x 64 320 x 320 x 64
Upsample - 4 320 x 320 x 64 640 x 640 x 64
Concat - 4 640 x 640 x 64 (upsample - 4) 640 x 640 x 96
640 x 640 x 32 (conv - 1b)
Conv layer - 9a 640 x 640 x 96 640 x 640 x 32
Conv layer - 9b 640 x 640 x 32 640 x 640 x 32
Output layer 640 x 640 x 32 640 x 640 x 1
Table A.3: Architecture details for the Error Prediction Network. All convolution layers use 3x3 kernels. Batch normalization and ReLU activation functions are used throughout the network.

Train on                              CHASE                               ARIA
Architecture / Test on                CHASE DRIVE ARIA STARE HRF  Avg.    CHASE DRIVE ARIA STARE HRF  Avg.
U-Net [6]                             65.7  51.1  43.2  42.3  44.4 49.3   62.2  63.0  56.2  55.4  56.6 58.6
U-Net [6] + CRF                       68.4  48.6  45.6  39.3  46.6 49.7   64.5  53.3  57.5  47.7  58.1 56.2
V-GAN [12]                            66.3  55.6  47.3  43.9  49.7 52.5   52.3  61.0  53.7  49.5  53.0 53.9
DA-ADV [2]                            56.6  53.0  51.7  47.8  50.8 52.0   55.6  57.4  57.7  55.4  54.7 56.2
ErrorNet (Err Pred)                   66.9  52.2  43.6  43.1  45.7 50.3   62.3  56.4  56.2  56.1  56.5 57.5
ErrorNet (Err Pred + VAE)             66.8  56.0  44.0  42.7  50.6 52.0   61.6  63.0  56.5  56.5  57.2 59.0
ErrorNet (Err Pred + VAE + Joint Tr)  68.8  57.7  49.8  48.4  52.2 55.3   62.2  65.2  56.2  58.7  57.0 59.9
Table A.4: This table mirrors Table 2 in the main text, with Dice replaced by IoU. Comparing the IoU- and Dice-based results shows that the winner in each category remains unchanged. As before, columns where the test set matches the training set (CHASE→CHASE, ARIA→ARIA) are same-domain evaluations, whereas the remaining columns contain the results for cross-domain evaluation. ErrorNet outperforms the competing baselines on average, with a wider performance gap in the presence of domain shift (cross-domain evaluation) and small sample size (the small Chase dataset used for training). The ablations show that the VAE and joint training are both effective in improving the performance of ErrorNet.

Footnotes

  1. Authors contributed equally

References

  1. C. Chen, Q. Dou, H. Chen, J. Qin and P. Heng (2019) Synergistic image and feature adaptation: towards cross-modality domain adaptation for medical image segmentation. arXiv preprint arXiv:1901.08211. Cited by: §2.
  2. N. Dong, M. Kampffmeyer, X. Liang, Z. Wang, W. Dai and E. Xing (2018) Unsupervised domain adaptation for automatic estimation of cardiothoracic ratio. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 544–552. Cited by: Table 2, §4, §4, Table A.4.
  3. Q. Dou, C. Ouyang, C. Chen, H. Chen, B. Glocker, X. Zhuang and P. Heng (2018) PnP-adanet: plug-and-play adversarial domain adaptation network with a benchmark at cross-modality cardiac segmentation. arXiv preprint arXiv:1812.07907. Cited by: §2.
  4. Y. Huo, Z. Xu, H. Moon, S. Bao, A. Assad, T. K. Moyo, M. R. Savona, R. G. Abramson and B. A. Landman (2018) Synseg-net: synthetic segmentation without target modality ground truth. IEEE transactions on medical imaging 38 (4), pp. 1016–1025. Cited by: §2.
  5. A. Imran, A. Hatamizadeh, S. P. Ananth, X. Ding, D. Terzopoulos and N. Tajbakhsh (2019) Automatic segmentation of pulmonary lobes using a progressive dense v-network. arXiv preprint arXiv:1902.06362. Cited by: §1.
  6. F. Isensee, P. Kickingereder, W. Wick, M. Bendszus and K. H. Maier-Hein (2018) No new-net. In International MICCAI Brainlesion Workshop, pp. 234–244. Cited by: Table 2, §4, §4, Table A.1, Table A.4.
  7. K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert and B. Glocker (2017) Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Medical image analysis 36, pp. 61–78. Cited by: §2.
  8. D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §2.
  9. W. Kuo, C. Häne, E. Yuh, P. Mukherjee and J. Malik (2018) Cost-sensitive active learning for intracranial hemorrhage detection. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 715–723. Cited by: §1.
  10. A. J. Larrazabal, C. Martinez and E. Ferrante (2019) Anatomical priors for image segmentation via post-processing with denoising autoencoders. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, pp. 585–593. Cited by: §2.
  11. T. Sakinis, F. Milletari, H. Roth, P. Korfiatis, P. Kostandy, K. Philbrick, Z. Akkus, Z. Xu, D. Xu and B. J. Erickson (2019) Interactive segmentation of medical images through fully convolutional neural networks. arXiv preprint arXiv:1903.08205. Cited by: §1.
  12. J. Son, S. J. Park and K. Jung (2017) Retinal vessel segmentation in fundoscopic images with generative adversarial networks. arXiv preprint arXiv:1706.09318. Cited by: Table 2, §4, §4, Table A.4.
  13. J. Sourati, A. Gholipour, J. G. Dy, X. Tomas-Fernandez, S. Kurugol and S. K. Warfield (2019) Intelligent labeling based on fisher information for medical image segmentation using deep learning. IEEE transactions on medical imaging. Cited by: §1.
  14. N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. Chiang, Z. Wu and X. Ding (2019) Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation. arXiv preprint arXiv:1908.10454. Cited by: §1, §2, §2.
  15. C. Wachinger, M. Reuter and T. Klein (2018) DeepNAT: deep convolutional neural network for segmenting neuroanatomy. NeuroImage 170, pp. 434–445. Cited by: §2.
  16. Y. Yang, A. Wong and S. Soatto (2019) Dense depth posterior (ddp) from single image and sparse range. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3353–3362. Cited by: §2.
  17. Z. Zhang, L. Yang and Y. Zheng (2018) Translating and segmenting multimodal medical volumes with cycle-and shape-consistency generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9242–9251. Cited by: §2.
  18. Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh and J. Liang (2018) Unet++: a nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11. Cited by: §1.