Self-Supervised Training For Low Dose CT Reconstruction



Ionizing radiation has been the biggest concern in CT imaging. To reduce the dose level without compromising image quality, low dose CT reconstruction has become feasible with compressed sensing based reconstruction methods. Recently, data-driven methods have gained attention with the rise of deep learning, the availability of high computational power, and large datasets. Deep learning based methods have also been applied to the low dose CT reconstruction problem in different ways. Usually, the success of these methods depends on clean labeled data. However, recent studies have shown that training can succeed without clean datasets. In this study, we define a training scheme that uses low dose sinograms as their own training targets. We apply the self-supervision principle in the projection domain, where the noise is element-wise independent, as these methods require. Using self-supervised training, the filtering part of the FBP method and the parameters of a denoiser neural network are optimized. We demonstrate that our method outperforms both conventional and compressed sensing based iterative reconstruction methods, qualitatively and quantitatively, in the reconstruction of analytic CT phantoms and real-world CT images in the low dose CT reconstruction task.


Mehmet Ozan Unal, Metin Ertas, Isa Yildirim
Istanbul Technical University, Electronics and Communication Engineering Department, Istanbul, Turkey
Keywords: Low dose computed tomography, deep learning, reconstruction

1 Introduction

X-ray Computed Tomography (CT) uses ionizing radiation to noninvasively image the inside of the human body. Ionizing radiation can be harmful to the human body; therefore, reducing the radiation dose without sacrificing image quality is crucial. Recently, deep learning (DL) has become an alternative for solving the low dose CT problem. In earlier studies, supervised convolutional neural network (CNN) training methods were applied to denoise low dose CT reconstructions [2, 5]. Generative adversarial networks (GANs) have become a widely used tool for generative problems [3]. GAN training has also been applied with various loss variants and training methods for low dose CT reconstruction [12, 8, 13]. Although DL based methods have shown promising results, their success usually depends on clean labeled datasets. Besides, unlike in the natural image domain, evaluating the quality of the datasets requires domain experts such as radiologists. Therefore, solving this problem with classical DL methods is both cost and time intensive. However, it was recently shown that successful training is also possible with noisy labels for image denoising problems [7]. It was also shown that noisy images can be used as their own training targets [1, 6, 9]. These methods, however, were designed for image denoising problems. We therefore extended their self-supervised approach to the low dose CT reconstruction problem by designing an algorithm that applies self-supervision in the sinogram domain with the help of differentiable backward and forward operators of CT reconstruction. Our method aims to optimize the filtering part of the filtered back projection (FBP) method and to learn the parameters of the denoiser neural network via self-supervised training. Our method is realized in three different ways: i) single sinogram self-supervised, ii) learned single shot, iii) learned self-supervised.

2 Related Work

A classical supervised training procedure for denoising problems can be defined as:

$$\arg\min_{\theta} \; \mathbb{E}_{x,y} \left\| f_{\theta}(x) - y \right\|^2$$

where $\mathbb{E}_{x,y}$ is the expected value over the sample pairs, $(x, y)$ are paired samples: $x$ is the noisy measurement and $y$ is the clean measurement of the same information, and $f_{\theta}$ is a deep neural network parameterized by $\theta$. During the optimization process, a function that maps noisy images to clean images is learned. The recently proposed Noise2Noise method showed that supervised training is possible with only noisy measurements, provided that the noise is independent and additive. In addition, it was shown that training with noisy image pairs should give results similar to training with noisy-clean image pairs [7].
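The Noise2Noise claim can be checked on a toy problem. The sketch below is an illustration only, not the setup of [7]: it replaces the neural network with a single scalar weight and uses synthetic data, and every name in it (`fit_scalar_denoiser`, `noisy_in`, `noisy_tgt`) is hypothetical. It fits the same least-squares "denoiser" once with clean targets and once with independently noisy targets.

```python
import numpy as np

def fit_scalar_denoiser(inputs, targets):
    """Closed-form minimizer of sum((w * inputs - targets)**2) over the scalar w."""
    return float(np.dot(inputs, targets) / np.dot(inputs, inputs))

rng = np.random.default_rng(0)
n = 100_000
clean = rng.uniform(0.0, 1.0, size=n)              # clean signal values (synthetic)
noisy_in = clean + rng.normal(0.0, 0.1, size=n)    # noisy input to the "denoiser"
noisy_tgt = clean + rng.normal(0.0, 0.5, size=n)   # second, independent noisy measurement

w_clean = fit_scalar_denoiser(noisy_in, clean)      # supervised: clean targets
w_noisy = fit_scalar_denoiser(noisy_in, noisy_tgt)  # Noise2Noise-style: noisy targets
print(f"w (clean targets) = {w_clean:.4f}, w (noisy targets) = {w_noisy:.4f}")
```

Because the target noise is zero-mean and independent of the input, it averages out of the least-squares solution, so the two weights agree closely when enough samples are available.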

The Noise2Noise method enabled training with noisy targets, but it still requires two noisy measurements of the same information. Noise2Self is a method that enables training with only single noisy measurements via self-supervision [1]. The study proposed that a noisy image can be used as its own target. Since the neural network could minimize such a loss simply by converging to the identity function, Noise2Self uses a masking mechanism that perturbs the image according to certain rules and prevents the network from converging to the identity function; a function with this property is called J-invariant. The cost function proposed by Noise2Self is formulated as follows:

$$\mathcal{L}(f_{\theta}) = \mathbb{E}_{x} \left\| f_{\theta}(x_{J^c})_J - x_J \right\|^2$$

where $J$ is a subset of pixels, $x_J$ is the subset of $x$ restricted to $J$, and $x_{J^c}$ is the modified form of $x$ in which the pixels of $J$ are modified using the pixel values of $x$ except $x_J$. In other words, the $J$ subset of $x$ is perturbed using any pixels other than those in the subset itself. $x_{J^c}$ is used as the input of the denoiser ($f_{\theta}$), and the reconstruction loss is calculated only for the $J$ subset of the pixels. The Noise2Self study showed that training with the self-supervised loss should give results similar to training with clean targets if the following conditions are met: i) $x_J$ and $x_{J^c}$ should be conditionally independent given the clean signal; ii) the noise should be element-wise independent.
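The masking mechanism can be illustrated with a short sketch. It is one possible perturbation rule, chosen here as an assumption (a grid-shaped subset J whose pixels are replaced by the mean of their four neighbours); the helper names `make_grid_mask`, `perturb`, and `masked_mse` are hypothetical and do not come from the Noise2Self code base.

```python
import numpy as np

def make_grid_mask(shape, step, phase):
    """Boolean mask J selecting one pixel out of every step x step block."""
    mask = np.zeros(shape, dtype=bool)
    mask[phase // step::step, phase % step::step] = True
    return mask

def perturb(x, mask):
    """Replace the pixels in J by the mean of their four neighbours.

    Reflect padding is used so that a border pixel never sees its own value.
    With step >= 2 no neighbour of a J pixel lies in J, so the result does
    not depend on the values of x inside J.
    """
    padded = np.pad(x, 1, mode="reflect")
    neighbour_mean = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                      + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return np.where(mask, neighbour_mean, x)

def masked_mse(prediction, target, mask):
    """Mean squared error evaluated only on the pixels in J."""
    return float(np.mean((prediction[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))                      # stand-in for a noisy image
mask = make_grid_mask(x.shape, step=4, phase=5)  # subset J
x_perturbed = perturb(x, mask)

identity = lambda z: z                           # the trivial "denoiser"
loss = masked_mse(identity(x_perturbed), x, mask)
print(f"masked loss of the identity function: {loss:.4f}")
```

Since the input at the masked positions no longer contains the target values, the identity function has a non-zero loss and stops being a trivial minimizer.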

The ability to train on noisy labels could be quite valuable for medical linear inverse problems such as low dose CT reconstruction. However, the Noise2Self method may not be applied directly to the low dose CT reconstruction problem, since the noise and artifacts formed during the reconstruction process are not element-wise independent. Therefore, we developed a method that applies self-supervision in the projection domain, where the noise can be modeled as independent. There is also a method that applies self-supervision on a single image and tries to learn an image domain denoiser [4]. In our approach, by contrast, we use self-supervision to train both a frequency-domain filter that optimizes the filtering part of FBP and a neural network that denoises the reconstruction.
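To make the idea of a trainable frequency-domain filter concrete, the sketch below shows how the filtering step of FBP can be written as an element-wise product in the Fourier domain. It is a simplified illustration under stated assumptions: the sinogram layout (one projection per row), the unscaled ramp initialization, and the function names `ramp_filter` and `filter_sinogram` are hypothetical, and the back-projection step is omitted.

```python
import numpy as np

def ramp_filter(n_detectors):
    """Ramp filter |frequency| sampled on the FFT grid (a natural initial value)."""
    return np.abs(np.fft.fftfreq(n_detectors))

def filter_sinogram(sinogram, freq_filter):
    """Filter each projection (row) along the detector axis in the Fourier domain.

    freq_filter holds one weight per frequency bin; in a learned setting these
    weights would be the trainable parameters of the filtering step.
    """
    spectrum = np.fft.fft(sinogram, axis=1)
    return np.real(np.fft.ifft(spectrum * freq_filter, axis=1))

rng = np.random.default_rng(0)
sinogram = rng.normal(size=(3, 16))              # 3 views, 16 detector bins (toy size)
filtered = filter_sinogram(sinogram, ramp_filter(16))
print(filtered.shape)
```

Because the filter is just a vector of per-frequency weights, it can be updated by gradient descent together with the network weights in any framework that provides a differentiable FFT.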

3 Method

Figure 1: Proposed working schema for self-supervised low dose CT reconstruction.

In this section, we explain how the self-supervision principle of the Noise2Self method is extended to low dose CT image reconstruction. Since FBP creates linearly dependent artifacts, it is not feasible to apply the Noise2Self method to low dose CT imaging directly. Self-supervision should therefore be applied in a domain where the requirements of the Noise2Self method are met. To this end, we designed a training scheme that learns a function mapping low dose CT images to standard dose CT images without any dataset of standard dose and low dose image pairs. To enable training with low dose CT data only, we applied the J-invariance principle of the Noise2Self method in the projection domain, where the noise can be modeled as element-wise independent. The suggested optimization process can be formulated as:

Figure 2: From left to right: ground truth, FBP, SART, SART+TV, SART+BM3D, the proposed method (learned self-supervised).

Table 1: Average performance of the methods, given as PSNR and SSIM, respectively. Columns: Ellipses Phantoms (32 view, 64 view) and Real CT (32 view, 64 view). Rows: N2S Self Sup., N2S Sng. Shot, N2S Learned.

The self-supervised objective is:

$$\arg\min_{\theta,\phi} \; \mathbb{E}_{y} \left\| \left[ A\, f_{\theta}\!\left( \mathrm{FBP}_{\phi}(y_{J^c}) \right) \right]_J - y_J \right\|^2$$

where $\mathbb{E}_{y}$ is the expectation operator over the measured projections $y$, $A$ is the forward operator of the inverse problem (in our case, the Radon transform), $f_{\theta}$ is a deep neural network parameterized by $\theta$, $\mathrm{FBP}_{\phi}$ is the filtered back-projection reconstruction operator with a modified frequency-domain filter parameterized by $\phi$, $y_{J^c}$ is the perturbed form of the projections, obtained by perturbing the subset $y_J$ of the projections, and $J$ is the perturbed set of pixels. The working principle and training schema of the reconstruction are given in Fig. 1. The working principle can be examined in three parts:

Preprocessing: Ground truth images are converted to projections via the Radon transform and contaminated with additive white Gaussian noise (AWGN). These sparsely sampled sinograms are split into training and test sets.

Training: The only inputs to training are the sparsely sampled sinograms, and the loss is calculated with self-supervision. First, the projections are perturbed and passed to the FBP operator with the learned filter to calculate the initial image. The initial image is denoised with a deep neural network. The denoised image is transformed back to the projection domain via the Radon transform. The loss is calculated between these re-projected measurements and the real measurements, only at the pixels that were modified by the perturbation operation at the beginning of the reconstruction. In other words, only the modified pixels are used to calculate the loss, which satisfies the J-invariance principle of the Noise2Self method.

Reconstruction: During reconstruction, the projections are passed to the trained FBP operator without any perturbation to calculate the initial reconstruction. After the initial reconstruction, the images are denoised with the trained deep neural network.
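The three parts above can be sketched end to end on a toy problem. The following is a minimal illustration under loud assumptions, not the authors' implementation: a random matrix `A` stands in for the Radon transform, its pseudo-inverse stands in for FBP, the denoiser is an identity placeholder, and the perturbation rule (replace masked entries by the mean of the unmasked ones) is chosen only for simplicity. The sketch evaluates the loss once; minimizing it would require an automatic differentiation framework.

```python
import numpy as np

def perturb(y, mask):
    """Replace the masked sinogram entries by the mean of the unmasked ones."""
    y_pert = y.copy()
    y_pert[mask] = y[~mask].mean()
    return y_pert

def self_supervised_loss(y, mask, forward, fbp, denoiser):
    """Squared error between re-projected and measured data on masked entries only."""
    reprojected = forward(denoiser(fbp(perturb(y, mask))))
    return float(np.mean((reprojected[mask] - y[mask]) ** 2))

rng = np.random.default_rng(0)
n_pixels, n_measurements = 16, 48
A = rng.normal(size=(n_measurements, n_pixels))   # toy stand-in for the Radon transform
A_pinv = np.linalg.pinv(A)                        # toy stand-in for FBP

# Preprocessing: project a ground-truth image and add white Gaussian noise.
image = rng.uniform(size=n_pixels)
y = A @ image + rng.normal(0.0, 0.1, size=n_measurements)

# Training step: choose the subset J and evaluate the masked loss.
mask = np.zeros(n_measurements, dtype=bool)
mask[rng.choice(n_measurements, size=8, replace=False)] = True

loss = self_supervised_loss(
    y, mask,
    forward=lambda x: A @ x,
    fbp=lambda s: A_pinv @ s,
    denoiser=lambda x: x,                         # identity placeholder for the network
)
print(f"masked self-supervised loss: {loss:.4f}")

# Reconstruction: no perturbation, the same operators applied to the raw data.
reconstruction = A_pinv @ y
```

The ground-truth image appears only in the data simulation; the loss itself touches nothing but the noisy measurements, which is the point of the scheme.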

4 Experiments

The source code and the experiments are available in the accompanying code repository.

4.1 Experiment Settings

The DeepLesion dataset [11] and the ellipses dataset were used as training data. SkipNet [10] was selected as the denoiser neural network architecture. The image resolution was selected as , and all projections were uniformly distributed between . Our method was compared with FBP, SART, SART+TV and SART+BM3D. The proposed idea was implemented in three different approaches:

Self-Supervised: It was tested without any pre-learning process. A randomly initialized neural network was trained only with the single noisy image, using the self-supervision principle given in the training part of Fig. 1. During the experiments, the complete reconstructions took iterations with the learning rate .

Learned Single Shot: The neural network was trained with a dataset, and it calculates the denoised low dose CT reconstruction in a single shot. In our case, the neural network was trained on samples from the ellipses dataset for iterations with a batch size of .

Learned Self-Supervised: The neural network was trained with a dataset and it was fine-tuned with self-supervised training. This method generated the most successful results during the experiments.

To quantitatively analyze the methods, images from the ellipses dataset and images from the deep lesion dataset were selected to cover a more comprehensive part of different tissue intensity and feature scenarios. Images were reconstructed at different settings and the results are given in Table 1.
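For reference, PSNR can be computed as in the sketch below. The choice of data range (here the dynamic range of the reference image unless given explicitly) is an assumption, since this section does not state which convention was used; SSIM is more involved and is not reproduced here.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=None):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    if data_range is None:
        # Assumed convention: dynamic range of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

reference = np.zeros((4, 4))
reference[0, 0] = 1.0                 # toy image with dynamic range 1
value = psnr(reference, reference + 0.1)
print(f"PSNR: {value:.2f} dB")
```

In this toy case the mean squared error is 0.01 and the data range is 1, which gives a PSNR of 20 dB.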

4.2 Results

Fig. 2 shows the reconstruction of an image from the ellipses dataset. The proposed method produced a sharper image with better noise reduction performance.

Medical CT image results are given in Fig. 3. The proposed method reconstructed the features more precisely while keeping the background smooth and clean. Although in some cases the SART+BM3D method gives close or better quantitative results than the proposed method, the images show that the proposed method preserves fine features more accurately.

Figure 3: From top left to bottom right: ground truth, FBP, SART, SART+TV, SART+BM3D, the proposed method (learned self-supervised).

5 Conclusion

In this study, we showed that self-supervised training can be a suitable candidate to solve the low dose CT reconstruction problem. Its performance was evaluated with both real CT images and analytical phantoms. Since it does not require clean datasets, it is possible to use this method for numerous domains, particularly when collecting the clean data is challenging.

6 Compliance with Ethical Standards

This research study was conducted retrospectively using human subject data made available in open access by the National Institutes of Health’s Clinical Center as the DeepLesion dataset [11]. Ethical approval was not required, as confirmed by the license attached to the open access data.

7 Acknowledgments

No funding was received for conducting this study. The authors have no relevant financial or non-financial interests to disclose.






  1. J. Batson and L. Royer. Noise2Self: Blind denoising by self-supervision. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97, pages 524–533. PMLR, 2019.
  2. H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Transactions on Medical Imaging, 36(12):2524–2535, 2017.
  3. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
  4. A. A. Hendriksen, D. M. Pelt, and K. J. Batenburg. Noise2Inverse: Self-supervised deep convolutional denoising for tomography. IEEE Transactions on Computational Imaging, page 1–1, 2020.
  5. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522, 2017.
  6. A. Krull, T. Buchholz, and F. Jug. Noise2Void - learning denoising from single noisy images. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 2129–2137. Computer Vision Foundation / IEEE, 2019.
  7. J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2Noise: Learning image restoration without clean data. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 2971–2980. PMLR, 2018.
  8. Z. Liu, T. Bicer, R. Kettimuthu, D. Gursoy, F. D. Carlo, and I. Foster. TomoGAN: low-dose synchrotron x-ray tomography with generative adversarial networks: discussion. J. Opt. Soc. Am. A, 37(3):422–434, Mar 2020.
  9. Y. Quan, M. Chen, T. Pang, and H. Ji. Self2Self with dropout: Learning self-supervised denoising from single image. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1887–1895, 2020.
  10. X. Wang, F. Yu, Z. Dou, T. Darrell, and J. E. Gonzalez. SkipNet: Learning dynamic routing in convolutional networks. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XIII, volume 11217, pages 420–436. Springer, 2018.
  11. K. Yan, X. Wang, L. Lu, and R. M. Summers. DeepLesion: Automated deep mining, categorization and detection of significant radiology image findings using large-scale clinical lesion annotations, 2017.
  12. Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Transactions on Medical Imaging, 37(6):1348–1357, 2018.
  13. C. You, G. Li, Y. Zhang, X. Zhang, H. Shan, M. Li, S. Ju, Z. Zhao, Z. Zhang, W. Cong, M. W. Vannier, P. K. Saha, E. A. Hoffman, and G. Wang. CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE). IEEE Transactions on Medical Imaging, 39(1):188–203, 2020.