U-Net Training with Instance-Layer Normalization


Xiao-Yun Zhou¹, Peichao Li¹, Zhao-Yang Wang¹, Guang-Zhong Yang¹,²
¹The Hamlyn Centre for Robotic Surgery, Imperial College London, UK
²Institute of Medical Robotics, Shanghai Jiao Tong University, China
xiaoyun.zhou14@imperial.ac.uk
Abstract

Normalization layers are essential in a Deep Convolutional Neural Network (DCNN). Various normalization methods have been proposed, with the statistics used to normalize the feature maps computed at batch, channel, or instance level. However, in most existing methods, the normalization for each layer is fixed. Batch-Instance Normalization (BIN) is one of the first methods to combine two different normalization methods and achieve diverse normalization for different layers. However, two potential issues exist in BIN: first, the Clip function is not differentiable at input values of 0 and 1; second, the combined feature map does not have a normalized distribution, which is harmful for signal propagation in a DCNN. In this paper, an Instance-Layer Normalization (ILN) layer is proposed, which uses the Sigmoid function for the feature map combination and cascades group normalization. The performance of ILN is validated on image segmentation of the Right Ventricle (RV) and Left Ventricle (LV) using U-Net as the network architecture. The results show that the proposed ILN outperforms previous traditional and popular normalization methods with noticeable accuracy improvements for most validations, supporting the effectiveness of the proposed ILN.

Keywords:
Instance-Layer Normalization · Deep Convolutional Neural Network · U-Net · Biomedical Image Segmentation.

1 Introduction

Biomedical image segmentation is a fundamental step in medical image analysis, e.g., 3D shape instantiation for organs [16] and prostheses [15, 14]. Most current popular methods are based on Deep Convolutional Neural Networks (DCNNs), which train multiple non-linear modules for feature extraction and pixel classification, achieving both higher automation and higher performance. One fundamental component of a DCNN is the normalization layer. Initially, one of the main motivations for normalization was to alleviate internal covariate shift, where the distribution of each layer's input changes [3]. However, recent work attributes the benefit of normalization layers to an increased robustness of the network to the fluctuation associated with random initialization [2], or to a smoother optimization landscape [9]. In this paper, we keep this motivation question open and focus on normalization strategies.

For a feature map of dimension (N, H, W, C), where N is the batch size, H is the feature height, W is the feature width, and C is the feature channel, Batch Normalization (BN) [3][4] was the first proposed normalization method: it calculates the mean and variance of the feature map along the (N, H, W) dimensions, then re-scales and re-translates the normalized feature map with additional trainable parameters to preserve the DCNN representation ability. Instance Normalization (IN) [10], which calculates the mean and variance along the (H, W) dimensions, was proposed for fast stylization. Layer Normalization (LN) [1], which calculates the mean and variance along the (H, W, C) dimensions, was proposed for recurrent networks. Group Normalization (GN) [12] calculates the mean and variance along the (H, W) dimensions and multiple channels, and was validated on image classification and instance segmentation. A review of these four normalization methods for training U-Net for medical image segmentation can be found in [17]. Weight normalization [8][13], based on a re-parameterization of the weights, was used in recurrent models and reinforcement learning. Batch Kalman normalization estimated the mean and variance considering all preceding layers [11].
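These four methods differ only in the axes over which the statistics are computed. Below is a minimal sketch of that difference, assuming an NHWC feature map and eager TensorFlow 2; the helper name normalize is ours, not a library function:

```python
import tensorflow as tf

def normalize(x, axes, eps=1e-5):
    # Compute mean/variance over the given axes and standardize.
    mean, var = tf.nn.moments(x, axes=axes, keepdims=True)
    return (x - mean) / tf.sqrt(var + eps)

x = tf.random.normal([8, 32, 32, 16])   # (N, H, W, C)
bn = normalize(x, axes=[0, 1, 2])       # BN: over (N, H, W), per channel
inst = normalize(x, axes=[1, 2])        # IN: over (H, W), per sample & channel
ln = normalize(x, axes=[1, 2, 3])       # LN: over (H, W, C), per sample
# GN: split channels into G groups, normalize over (H, W, C // G) per group.
g = 4
n, h, w, c = x.shape
xg = tf.reshape(x, [n, h, w, g, c // g])
gn = tf.reshape(normalize(xg, axes=[1, 2, 4]), [n, h, w, c])
```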

Recently, Nam et al. proposed Batch-Instance Normalization (BIN) [5], which combined BN and IN with a trainable parameter. However, two potential risks exist: 1) the trainable parameter was restricted to the range [0, 1] with the Clip function, which is not differentiable at input values of 0 and 1; 2) the combined feature map no longer had a normalized distribution, which is harmful for signal propagation in a DCNN. In this paper, Instance-Layer Normalization (ILN) is proposed to combine IN and LN: 1) the Sigmoid function is used to resolve the non-differentiability of the Clip function at input values of 0 and 1; 2) an additional GN16 (GN with a group number of 16) is applied to the combined feature map to ensure that it has a normalized distribution. A widely-applied and popular network architecture, U-Net [7], is used to validate the proposed ILN on Right Ventricle (RV) and Left Ventricle (LV) image segmentation. The proposed ILN outperforms existing normalization methods with noticeable accuracy improvements in most validations in terms of the Dice Similarity Coefficient (DSC).

2 Methodology

2.1 Instance-Layer Normalization

2.1.1 Instance Normalization

With a feature map F of dimension (N, H, W, C), IN calculates the mean and variance of F as:

\mu_{nc} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} F_{nhwc}, \qquad \sigma_{nc}^{2} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} (F_{nhwc} - \mu_{nc})^{2} (1)

Then, the feature map is normalized as F̂^IN:

\hat{F}^{\mathrm{IN}}_{nhwc} = \frac{F_{nhwc} - \mu_{nc}}{\sqrt{\sigma_{nc}^{2} + \epsilon}} (2)

where ε is a small value added for division stability. For the same feature map F, LN calculates the mean and variance as:

\mu_{n} = \frac{1}{HWC}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C} F_{nhwc}, \qquad \sigma_{n}^{2} = \frac{1}{HWC}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C} (F_{nhwc} - \mu_{n})^{2} (3)

where F is normalized in a similar way to Equ. (2) as F̂^LN. A trainable parameter ρ is added to combine F̂^IN and F̂^LN. In the original BIN [5], ρ was clipped to the range [0, 1] with the Clip function, as shown in Figure 1.

Figure 1: The curves of the Clip and Sigmoid functions.

However, the Clip function is not differentiable at input values of 0 and 1. In this paper, the Sigmoid function, which is differentiable everywhere, is applied to solve this potential issue:

\hat{F} = \mathrm{Sigmoid}(\rho)\,\hat{F}^{\mathrm{IN}} + (1 - \mathrm{Sigmoid}(\rho))\,\hat{F}^{\mathrm{LN}} (4)
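A small illustration of this point (hypothetical values, eager TensorFlow 2): outside [0, 1] the Clip function passes no gradient to ρ, whereas Sigmoid always passes a nonzero gradient:

```python
import tensorflow as tf

rho = tf.Variable(1.5)  # a value outside the clipping range [0, 1]
with tf.GradientTape(persistent=True) as tape:
    clipped = tf.clip_by_value(rho, 0.0, 1.0)
    gated = tf.math.sigmoid(rho)
print(tape.gradient(clipped, rho).numpy())  # 0.0: rho receives no learning signal
print(tape.gradient(gated, rho).numpy())    # ~0.149: rho remains trainable
```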

An additional potential issue in the original BIN is that the combined feature map no longer has a mean of 0 and a variance of 1; this non-normalized distribution may be harmful for signal propagation in a DCNN. In this paper, we solve this issue by applying an additional GN16 to the combined F̂:

\mu_{ng} = \frac{1}{HWM}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=(g-1)M+1}^{gM} \hat{F}_{nhwc} (5)

\sigma_{ng}^{2} = \frac{1}{HWM}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=(g-1)M+1}^{gM} (\hat{F}_{nhwc} - \mu_{ng})^{2} (6)

where M = C//16 is the channel number in each feature group, // is exact division, and g ∈ {1, …, 16}. The feature map F̂ is normalized in a similar way to Equ. (2) as F̂^ILN. Following BN [3], additional trainable parameters γ and β are added to preserve the DCNN representation ability: F^out = γ·F̂^ILN + β.
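Putting Equ. (1)-(6) together, a minimal sketch of the ILN forward pass follows, assuming an NHWC feature map and eager TensorFlow 2; the function and argument names are ours, not taken from any released code:

```python
import tensorflow as tf

def iln(x, rho, gamma, beta, groups=16, eps=1e-5):
    """Sketch of the ILN forward pass, Equ. (1)-(6); NHWC layout assumed."""
    # Equ. (1)-(2): IN statistics over (H, W).
    mu_in, var_in = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    f_in = (x - mu_in) / tf.sqrt(var_in + eps)
    # Equ. (3): LN statistics over (H, W, C).
    mu_ln, var_ln = tf.nn.moments(x, axes=[1, 2, 3], keepdims=True)
    f_ln = (x - mu_ln) / tf.sqrt(var_ln + eps)
    # Equ. (4): combine with a Sigmoid-gated trainable weight rho.
    w = tf.math.sigmoid(rho)
    f = w * f_in + (1.0 - w) * f_ln
    # Equ. (5)-(6): GN16 on the combined map to restore a normalized distribution.
    n, h, wd, c = f.shape
    m = c // groups                            # M: channels per group
    fg = tf.reshape(f, [n, h, wd, groups, m])
    mu_g, var_g = tf.nn.moments(fg, axes=[1, 2, 4], keepdims=True)
    f = tf.reshape((fg - mu_g) / tf.sqrt(var_g + eps), [n, h, wd, c])
    # Re-scale and re-translate to preserve the representation ability.
    return gamma * f + beta
```

Here rho would be one scalar tf.Variable per layer (initialized as 0.5, see Section 2.2), and gamma and beta one vector of shape [C] each.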

2.2 Experimental Setup

Network Architecture

A widely adopted network architecture in medical image segmentation, U-Net [7], was used as the fundamental network framework, with four max-pooling layers. The start feature channel number is 16. The normalization layer was added between the convolutional and ReLU layers. Cross-entropy was used as the loss function. Momentum Stochastic Gradient Descent (SGD) was used as the optimizer, with the momentum set to 0.9. Weights were initialized with a truncated normal distribution with a stddev of √(2/C), where C is the channel number. Biases were initialized as 0.1. The combination parameter ρ was initialized as 0.5.
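A hedged sketch of this setup follows; the He-style √(2/C) stddev is our reading of the (garbled) original text, and the learning rate shown is a placeholder, since five initial values were tested per experiment:

```python
import tensorflow as tf

def init_conv(k, c_in, c_out):
    # Truncated normal weights with stddev sqrt(2 / C); constant 0.1 biases.
    w = tf.Variable(tf.random.truncated_normal(
        [k, k, c_in, c_out], stddev=(2.0 / c_in) ** 0.5))
    b = tf.Variable(0.1 * tf.ones([c_out]))
    return w, b

rho = tf.Variable(0.5)  # per-layer ILN combination parameter
opt = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)  # lr: placeholder
```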

Data Collections

6082 RV images [16], scanned with a 1.5T Magnetic Resonance Imaging (MRI) machine (Sonata, Siemens, Erlangen, Germany) with a slice gap of 10mm and a pixel spacing of 1.52mm, were used for validation. They cover 37 subjects, a mixture of Hypertrophic Cardiomyopathy (HCM) patients and asymptomatic subjects, and span from the atrioventricular ring to the apex. The ground truth was labeled by one expert with Analyze (AnalyzeDirect, Inc, Overland Park, KS, USA). Rotations were applied to augment the training images. The subjects were split randomly into groups of 12, 12, and 13 for three-fold cross validation. 805 LV images [6] from the SunnyBrook MRI dataset, covering 45 subjects, were used for validation as well. Rotations were applied to augment the training images. The subjects were split randomly into three groups of 15 for three-fold cross validation.

Implementation

As the proposed ILN needs to manipulate intermediate feature maps, the U-Net framework was implemented with low-level Tensorflow functions (tf.nn). To ensure a fair comparison, all normalization methods were re-implemented in the same framework as the ILN implementation, instead of using the high-level Tensorflow Application Programming Interface (API) that exists for some normalization methods in the Tensorflow library, such as those used in [17].
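For illustration, a convolution-normalization-ReLU block built from such low-level tf.nn ops might look as follows; this is a sketch and the helper names are ours:

```python
import tensorflow as tf

def conv_norm_relu(x, w, b, norm_fn):
    # Convolution, then normalization, then ReLU, as described in Section 2.2.
    x = tf.nn.conv2d(x, w, strides=1, padding="SAME") + b
    x = norm_fn(x)  # e.g. the iln(...) sketch above, or IN/LN/GN baselines
    return tf.nn.relu(x)
```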

Experiments

Following [17] and [18], each experiment was trained for two epochs, with the learning rate divided by 5 at the second epoch. Five initial learning rates were tested for each experiment and the best result is shown. DSC was used as the evaluation metric.
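For reference, a minimal sketch of the DSC computation on binary masks; the smoothing term eps is our addition for stability on empty masks:

```python
import tensorflow as tf

def dsc(pred, truth, eps=1e-7):
    # DSC = 2 * |pred ∩ truth| / (|pred| + |truth|), on binary masks.
    pred = tf.cast(pred, tf.float32)
    truth = tf.cast(truth, tf.float32)
    inter = tf.reduce_sum(pred * truth)
    return (2.0 * inter + eps) / (tf.reduce_sum(pred) + tf.reduce_sum(truth) + eps)
```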

3 Result

To prove the advantage of using the Sigmoid function over the Clip function (in the original BIN [5]), three comparison experiments were set up: 1) using the Clip function with one trainable parameter ρ, weighting the IN feature map by Clip(ρ) and the LN feature map by 1 − Clip(ρ); 2) using the Sigmoid function with one trainable parameter ρ, weighting the IN feature map by Sigmoid(ρ) and the LN feature map by 1 − Sigmoid(ρ); 3) using the Softmax function with two trainable parameters, one each for the IN and LN feature maps. Comparison results are shown in Section 3.1; the three variants are sketched below.
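A sketch of the three gating variants compared in Table 1 (the function names are ours):

```python
import tensorflow as tf

def combine_clip(f_in, f_ln, rho):
    w = tf.clip_by_value(rho, 0.0, 1.0)     # variant 1: Clip gate
    return w * f_in + (1.0 - w) * f_ln

def combine_sigmoid(f_in, f_ln, rho):
    w = tf.math.sigmoid(rho)                # variant 2: Sigmoid gate
    return w * f_in + (1.0 - w) * f_ln

def combine_softmax(f_in, f_ln, rho2):
    w = tf.nn.softmax(rho2)                 # variant 3: two parameters, sum to 1
    return w[0] * f_in + w[1] * f_ln
```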

To prove the advantage of adding GN16 after the combined feature map, two comparison experiments, with and without GN16, were conducted. Results are shown in Section 3.2. Eight randomly-selected segmentation examples are shown in Section 3.3 for intuitive illustration. As GN16 performed similarly to IN [17], no normalization, IN, LN, and GN4 were chosen as the baselines to validate the performance of the proposed ILN, as presented in detail in Section 3.4. The training curves of ρ at eight randomly-selected layers are shown in Section 3.5. In this paper, RV-1 refers to the cross validation that uses the first group of RV data for testing and the second and third groups for training; the notations RV-2, RV-3, LV-1, LV-2, and LV-3 follow the same fashion.

3.1 Sigmoid vs. Clip vs. Softmax Function

The mean±std segmentation DSCs of using the Clip, Sigmoid, and Softmax functions to combine the IN and LN feature maps are shown in Table 1. The Sigmoid function achieves the highest DSC for all cross validations except the RV-1 experiment, which proves the effectiveness of the method proposed in this paper - replacing the Clip function in the original BIN [5] with the Sigmoid function.

Method RV-1 RV-2 RV-3 LV-1 LV-2 LV-3
Clip 0.702±0.295 0.707±0.299 0.666±0.319 0.900±0.099 0.864±0.184 0.804±0.246
Sigmoid 0.692±0.304 0.724±0.284 0.675±0.301 0.903±0.118 0.888±0.135 0.828±0.189
Softmax 0.688±0.290 0.720±0.279 0.664±0.323 0.895±0.151 0.866±0.153 0.827±0.228
Table 1: Mean±std segmentation DSCs of using the Clip, Sigmoid, and Softmax functions to combine the feature maps of IN and LN; highest DSCs are in blue colour.

3.2 With or Without GN16

The mean±std segmentation DSCs with and without GN16 after the combined feature map of IN and LN are shown in Table 2. The method with GN16 achieves the highest DSC for all cross validations except the LV-3 experiment. This result proves the effectiveness of adding GN16 after the combined feature map, and also the importance of maintaining a normalized distribution of feature maps.

Method RV-1 RV-2 RV-3 LV-1 LV-2 LV-3
No 0.692±0.304 0.724±0.284 0.675±0.301 0.903±0.118 0.888±0.135 0.828±0.189
Yes 0.714±0.290 0.737±0.267 0.680±0.305 0.919±0.098 0.893±0.127 0.827±0.211
Table 2: Mean±std segmentation DSCs with and without GN16 after the combined feature map of IN and LN; highest DSCs are in blue colour.

3.3 Segmentation Examples

Eight segmentation examples were selected randomly from the RV and LV data to show the segmentation details in Figure 2. For most cases, both the RV and LV are segmented properly. However, for cases near the RV apex, e.g., the fourth image in the first row, the segmentation quality is worse. This might be due to tissue adhesion and the small size of the RV.

Figure 2: Eight examples were selected randomly from the RV and LV segmentation results, where red indicates the ground truth, green indicates the segmentation result, and yellow indicates the true positives of the prediction.

3.4 Comparison to Other Methods

The mean±std segmentation DSCs of using no normalization, IN, LN, GN4, and the proposed ILN with the U-Net framework are shown in Table 3. Except for the LV-3 experiment, the proposed ILN outperforms all other traditional methods with considerable accuracy improvements. This result proves the effectiveness of the proposed ILN in medical image segmentation.

Method RV-1 RV-2 RV-3 LV-1 LV-2 LV-3
None 0.688±0.296 0.678±0.318 0.661±0.323 0.899±0.134 0.872±0.167 0.784±0.280
IN 0.709±0.266 0.715±0.278 0.655±0.327 0.905±0.114 0.876±0.131 0.836±0.207
LN 0.702±0.287 0.718±0.270 0.662±0.309 0.898±0.120 0.858±0.187 0.793±0.262
GN4 0.679±0.303 0.701±0.291 0.671±0.309 0.908±0.113 0.841±0.196 0.800±0.255
ILN 0.714±0.290 0.737±0.267 0.680±0.305 0.919±0.098 0.893±0.127 0.827±0.211
Table 3: Mean±std segmentation DSCs of using no normalization, IN, LN, GN4, and the proposed ILN with the U-Net framework; highest DSCs are in blue colour.

3.5 Training Curves of ρ

The training curves of ρ at eight randomly-selected layers from the LV-1 experiment are shown in Figure 3. We can see that ρ was trained to different values, and hence the proposed ILN achieved diverse normalization at different layers. As the ground truth of ρ is unknown and it is impossible to judge the correctness of the curves, a comparison of the training curves of ILN and BIN is not illustrated.

Figure 3: The training curves of ρ at eight layers selected randomly from the 22 layers in U-Net.

The CPU used is an Intel Xeon(R) E5-1650 v4 @ 3.60GHz × 12. The GPU used is an Nvidia Titan XP. Compared to IN, ILN increases the parameter number by 22, as one parameter is added to each of the 22 layers. The training time for 200 iterations increases from 34.8s to 36.5s, due to the additional GN16 calculation.

4 Discussion

The proposed ILN strategy is generic and flexible. The three components, IN, LN, and GN16, could be replaced with other normalization methods. The proposed ILN framework is validated on medical image segmentation with a U-Net framework; we believe it could also be useful for other tasks, which needs further validation and exploration. The proposed ILN failed to achieve the highest DSC for the LV-3 experiment, which may be because the combination of IN, LN, and GN16 is not suitable for this experiment. In the future, the proposed ILN framework will be extended to combine more normalization methods.

5 Conclusion

To improve the accuracy of biomedical image segmentation based on U-Net, the ILN was proposed: the feature maps of IN and LN are combined with an additional trainable parameter and the Sigmoid function, and GN16 is then applied to the combined feature map. Although various normalization methods have been proposed, the noticeable accuracy improvement of the proposed ILN proves the importance of carefully tuning the normalization strategy when training DCNNs.

References

  • [1] J. L. Ba, J. R. Kiros, and G. E. Hinton (2016) Layer normalization. Stat 1050, pp. 21.
  • [2] N. Bjorck, C. P. Gomes, B. Selman, and K. Q. Weinberger (2018) Understanding batch normalization. In NeurIPS, pp. 7705–7716.
  • [3] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, pp. 448–456.
  • [4] S. Ioffe (2017) Batch renormalization: towards reducing minibatch dependence in batch-normalized models. In NeurIPS, pp. 1945–1953.
  • [5] H. Nam and H. Kim (2018) Batch-instance normalization for adaptively style-invariant neural networks. In NeurIPS, pp. 2563–2572.
  • [6] P. Radau, Y. Lu, K. Connelly, G. Paul, A. Dick, and G. Wright (2009) Evaluation framework for algorithms segmenting short axis cardiac MRI. The MIDAS Journal - Cardiac MR Left Ventricle Segmentation Challenge 49.
  • [7] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241.
  • [8] T. Salimans and D. P. Kingma (2016) Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In NeurIPS, pp. 901–909.
  • [9] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry (2018) How does batch normalization help optimization? In NeurIPS, pp. 2488–2498.
  • [10] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
  • [11] G. Wang, J. Peng, P. Luo, X. Wang, and L. Lin (2018) Batch Kalman normalization: towards training deep neural networks with micro-batches. arXiv preprint arXiv:1802.03133.
  • [12] Y. Wu and K. He (2018) Group normalization. In ECCV, pp. 3–19.
  • [13] Y. Xu and X. Wang (2018) Understanding weight normalized deep neural networks with rectified linear units. In NeurIPS, pp. 130–139.
  • [14] X. Zhou, J. Lin, C. Riga, G. Yang, and S. Lee (2018) Real-time 3D shape instantiation from single fluoroscopy projection for fenestrated stent graft deployment. IEEE RAL 3 (2), pp. 1314–1321.
  • [15] X. Zhou, C. Riga, S. Lee, and G. Yang (2018) Towards automatic 3D shape instantiation for deployed stent grafts: 2D multiple-class and class-imbalance marker segmentation with equally-weighted focal U-Net. In 2018 IEEE/RSJ IROS, pp. 1261–1267.
  • [16] X. Zhou, G. Yang, and S. Lee (2018) A real-time and registration-free framework for dynamic shape instantiation. MedIA 44, pp. 86–97.
  • [17] X. Zhou and G. Yang (2019) Normalization in training U-Net for 2D biomedical semantic segmentation. IEEE RAL.
  • [18] X. Zhou, J. Zheng, and G. Yang (2019) Atrous convolutional neural network (ACNN) for biomedical semantic segmentation with dimensionally lossless feature maps. arXiv preprint arXiv:1901.09203.