Sharpness-aware Low dose CT denoising using conditional generative adversarial network
Low Dose Computed Tomography (LDCT) offers tremendous benefits in radiation-restricted applications, but the quantum noise resulting from an insufficient number of photons can harm diagnostic performance. Current image-based denoising methods tend to produce a blur effect on the final reconstructed results, especially at high noise levels. In this paper, a deep learning based approach was proposed to mitigate this problem. An adversarially trained network and a sharpness detection network were used to guide the training process. Experiments on both simulated and real datasets show that the results of the proposed method suffer very little resolution loss and achieve better performance than state-of-the-art methods both quantitatively and visually.
Keywords: Low Dose CT, Denoising, Conditional Generative Adversarial Networks, Deep Learning, Sharpness, Low Contrast
1 Introduction

The use of Computed Tomography (CT) has rapidly increased over the past decade, with an estimated 80 million CT scans performed in 2015 in the United States. Although CT offers tremendous benefits, its use has led to significant concern regarding radiation exposure. To address this issue, the as low as reasonably achievable (ALARA) principle has been adopted to avoid excessive radiation dose to the patient.
Diagnostic performance should not be compromised when lowering the radiation dose. One of the most effective ways to reduce radiation dose is to reduce tube current, which has been adopted in many imaging protocols. However, low dose CT (LDCT) inevitably introduces more noise than conventional CT (convCT), which may impede subsequent diagnosis or require more advanced reconstruction algorithms. Many works have been devoted to CT denoising, with promising results achieved by a variety of techniques in the image and sinogram domains as well as with iterative reconstruction. One technique of rapidly increasing interest is deep learning (DL).
DL has been shown to exhibit superior performance on many image related tasks, from low level edge detection bertasius2015deepedge and image segmentation yi2016lbp to high level vision problems such as image recognition he2016deep and image captioning vinyals2015show, and these advances are now being brought into the medical domain chen2016low; chen2017low; kang2016deep; yang2017ct. In this paper, we explore the possibility of applying generative adversarial networks (GAN) goodfellow2014generative to the task of LDCT denoising.
In many image reconstruction tasks, e.g. super resolution and inpainting, it is known that minimizing the per-pixel loss between the output image and the ground truth alone generates blurring and visually unappealing results huang2017beyond; ledig2016photo; zhang2017image. We have observed the same effect in traditional neural network based CT denoising works chen2016low; chen2017low; kang2016deep; yang2017ct. The adversarial loss introduced by GAN can be treated as a driving force that pushes the generated image to reside in the manifold of convCTs, reducing the blurring effect. Furthermore, an additional sharpness detection network is introduced to measure the sharpness of the denoised image, with a focus on low contrast regions. SAGAN (sharpness-aware generative adversarial network) will be used to denote the proposed denoising method in the remainder of the paper.
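Concretely, the combination of losses described above can be sketched as a single objective for the generator G. This is an illustrative sketch, not the paper's exact formulation; the weights \(\lambda_{\mathrm{adv}}\) and \(\lambda_{\mathrm{sharp}}\) are placeholder trade-off coefficients:

```latex
\mathcal{L}(G) \;=\;
\underbrace{\mathbb{E}_{x,y}\!\left[\lVert G(x) - y \rVert_1\right]}_{\text{per-pixel loss}}
\;+\; \lambda_{\mathrm{adv}}\, \mathcal{L}_{\mathrm{adv}}(G, D)
\;+\; \lambda_{\mathrm{sharp}}\, \mathcal{L}_{\mathrm{sharp}}(G)
```

Here \(x\) is the LDCT input and \(y\) the convCT target; the adversarial term pushes \(G(x)\) toward the convCT manifold, while the sharpness term penalizes discrepancies between the sharpness maps of \(G(x)\) and \(y\).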
2 Related Works
LDCT denoising algorithms can be broadly categorized into three groups: those conducted in the sinogram domain, those conducted in the image domain, and iterative reconstruction methods (which iterate back and forth between the sinogram and image domains).
The CT sinogram represents the attenuation line integrals from the radial views and constitutes the raw projection data of the CT scan. Since the sinogram is also a 2-D signal, traditional image processing techniques have been applied to it for noise reduction, such as bilateral filtering manduca2009projection and structural adaptive filtering balda2012ray. The filtered data can then be reconstructed to a CT image with methods like filtered back projection (FBP). Although the statistical properties of the noise can be well characterized in this domain, these methods require access to the raw data, which is not always available. In addition, edge-preserving smoothing operations such as bilateral filtering inevitably filter out small edges, leading to loss of structure and spatial resolution in the reconstructed CT image.
Note that the above methods perform only a single back projection to reconstruct the image. Another stream of works performs an additional forward projection, mapping the reconstructed image back to the sinogram domain by modelling the acquisition process. Corrections can then be made by iterating this forward and backward process, a methodology referred to as model-based iterative reconstruction (MBIR). MBIR methods usually model the scanner geometry and the physical properties of the imaging process, e.g. the photon counting statistics and the polychromatic nature of the x-ray source beister2012iterative. Some works add prior object information to the model to regularize the reconstructed image, such as total variation minimization tian2011low; zhu2010duality and Markov random field based roughness or sparsity penalties bouman1993generalized. Due to their iterative nature, MBIR methods tend to consume excessive computation time. Several works attempt to accelerate the convergence of the optimization, for example by variable splitting of the data fidelity term ramani2012splitting or by combining Nesterov's momentum with the ordered subsets method kim2015combining.
To employ MBIR methods, one also needs access to the raw sinogram data. Image-based denoising methods do not have this limitation: the input and output are both images. Many denoising methods for LDCT are borrowed from the natural image processing field, such as non-local means buades2005non and BM3D dabov2007image. The former computes a weighted average of similar patches in the image domain while the latter does so in a transform domain; both methods assume redundancy of image information. Inspired by these two seminal works, many applications have emerged applying them to LDCT chen2009bayesian; chen2012thoracic; green2016efficient; ha2015low; ma2011low; zhang2014statistical; zhang2015statistical. Another line of work focuses on compressive sensing, with the underlying assumption that every local patch can be represented as a sparse combination of a set of bases. Initially, the bases came from generic analytic transforms, e.g. the discrete gradient, contourlet po2006directional, and curvelet candes2002recovering transforms. Chen et al. built a prior image constrained compressed sensing (PICCS) algorithm for dynamic CT reconstruction under reduced views based on the discrete gradient transform lubner2011reduced. However, these transforms were found to be sensitive to both true structures and noise. Later on, bases learned directly from the source images were used, with promising results. Algorithms like K-SVD aharon2006rm have made dictionary learning very efficient and have inspired many applications in the medical domain chen2013improving; chen2014artifact; li2012efficient; lubner2011reduced; xu2012low.
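To make the patch-redundancy idea behind non-local means concrete, here is a minimal, unoptimized NumPy sketch. The function name, window sizes, and smoothing parameter h are illustrative choices, not any published implementation:

```python
import numpy as np

def nl_means(img, patch=3, search=7, h=0.1):
    """Toy non-local means: each output pixel is a weighted average of
    nearby pixels, weighted by the similarity of their surrounding patches."""
    pad = patch // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            # reference patch centered at (i, j)
            ref = padded[i:i + patch, j:j + patch]
            weights, acc = 0.0, 0.0
            # restrict comparisons to a local search window for speed
            for di in range(max(0, i - search // 2), min(H, i + search // 2 + 1)):
                for dj in range(max(0, j - search // 2), min(W, j + search // 2 + 1)):
                    cand = padded[di:di + patch, dj:dj + patch]
                    d2 = np.mean((ref - cand) ** 2)  # patch dissimilarity
                    w = np.exp(-d2 / h ** 2)
                    weights += w
                    acc += w * img[di, dj]
            out[i, j] = acc / weights
    return out
```

Practical implementations vectorize these loops and often subtract the expected noise contribution from the patch distance; this sketch only illustrates the weighted-average principle.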
Convolutional neural network (CNN) based methods have recently achieved great success in image related tasks. Although their origins can be traced back to the 1980s, the resurgence of CNNs can largely be attributed to increased computational power and recently introduced techniques for the efficient training of deep networks, such as batch normalization ioffe2015batch, rectified linear units glorot2011deep and residual connections he2016deep. Chen et al. chen2016low first used a CNN to denoise CT images by learning a patch-based network, and later refined it with an encoder-decoder structure for end-to-end training chen2017low. Kang et al. kang2016deep devised a 24-layer convolutional network with bypass connections and a contracting path; instead of mapping in the image domain, it performed end-to-end training in the wavelet domain. Yang et al. yang2017ct adopted a perceptual loss, which measures the difference between the processed image and the ground truth in a high level feature space projected by a pre-trained CNN. Suzuki et al. suzuki2017neural proposed a massive-training artificial neural network (MTANN) for CT denoising; the network accepts local patches of the LDCT and regresses to the center value of the corresponding convCT patch.
The generative adversarial network (GAN) was first introduced in 2014 by Goodfellow et al. goodfellow2014generative. It is a generative model that tries to generate real world images by employing a min-max optimization framework in which two networks, a generator G and a discriminator D, are trained against each other. G tries to synthesize real-appearing images from random noise whereas D tries to distinguish between generated and real images. If the generator becomes sufficiently well trained, the discriminator will eventually be unable to tell whether a generated image is fake.
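The min-max framework corresponds to the original objective of goodfellow2014generative:

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

where \(D(x)\) is the probability the discriminator assigns to \(x\) being a real image and \(z\) is the noise input to the generator.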
The original GAN setup contains no constraints to control which modes of data it generates. However, if auxiliary information is provided during generation, a GAN can be driven to output images with specific modes. A GAN in this scenario is usually referred to as a conditional GAN (cGAN), since the output is conditioned on additional information. Mirza et al. supplied class labels encoded as one-hot vectors to generate MNIST digits mirza2014conditional. Other works have exploited the same class label information but with different network architectures odena2016conditional; odena2016semi. Reed et al. fed the GAN text descriptions and object locations reed2016generative; reed2016learning. Isola et al. proposed image-to-image translation with GANs by directly supplying images as conditions isola2016image; in this framework, the training images must be aligned pairs. Later, Zhu et al. relaxed this restriction by introducing a cycle consistency loss so that images can be translated between two sets of unpaired samples zhu2017unpaired, although, as noted in their paper, paired training remains the upper bound. Pathak et al. generated missing image patches conditioned on the surrounding image context pathak2016context. Sangkloy et al. generated images constrained by sketched boundaries and sparse colour strokes sangkloy2016scribbler. Shrivastava et al. refined synthetic images with a GAN to narrow the gap between synthetic and real images shrivastava2016learning. Walker et al. adopted a cGAN conditioned on predicted future pose information to synthesize future frames of a video walker2017pose.
Two works have also applied cGANs to CT denoising. In both works, as in ours, the denoised image is generated by conditioning on the low dose counterpart. Wolterink et al. employed a vanilla cGAN in which the generator was a 7-layer all-convolutional network and the discriminator differentiated real from denoised cardiac CT using a cross entropy loss wolterink2017generative. Yang et al. adopted the Wasserstein distance as the discriminator loss and incorporated a perceptual loss to ensure visual similarity yang2017low. The Wasserstein distance is claimed to stabilize GAN training, but not to generate images of better quality arjovsky2017wasserstein; gulrajani2017improved. Our work differs in several ways, and we highlight the key points here. First, our generator uses a U-Net style network with residual components and is deeper than those of the other two works; the superiority of this architecture in retaining small details is shown in our simulated noise experiment. Second, our discriminator differentiates patches rather than full images, which results in a network with fewer parameters that is applicable to images of arbitrary size. Third, CT scans over a range of dose levels and anatomic regions were evaluated to assess generality; noise and artifacts differ throughout the body, and our work shows that a single network can potentially denoise all anatomies. Finally, a sharpness loss was introduced to ensure the sharpness of the final image and the faithful reconstruction of low contrast regions.
Sharpness Detection The sharpness detection network should be sensitive to low contrast regions. Traditional methods based on local image energy have an intrinsic limitation: sensitivity to both the blur kernel and the image content. Recent works have proposed more sophisticated measures by exploiting statistical differences in specific properties between blurred and sharp regions, e.g. gradients shi2014discriminative, local binary patterns yi2016lbp, power spectrum slope shi2014discriminative; vu2012spectral, and Discrete Cosine Transform (DCT) coefficients golestaneh2017spatially. Shi et al. used sparse coding to decompose local patches and quantized local sharpness by the number of reconstruction atoms shi2015. Some research tries to directly estimate the blur kernel, but the estimated maps tend to be very coarse and the optimization process is time consuming zhu2013estimating; chakrabarti2010analyzing. Other works can also produce a sharpness map, such as depth map estimation zhuo2011defocus and blur segmentation tang2016spectral, but the depth map does not necessarily correspond to the amount of sharpness, and these methods tend to highlight blurred edges or are insensitive to small amounts of blur. In this work, we adopted the method of yi2016lbp given its sensitivity to sharp low contrast regions. A detailed description can be found in sections LABEL:sec:sharp and LABEL:sharp.
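For intuition, the descriptor underlying yi2016lbp is the local binary pattern (LBP). Below is a minimal NumPy sketch of the basic 8-neighbour LBP code; note that the full sharpness metric of yi2016lbp additionally counts particular patterns inside local windows, which this sketch does not implement:

```python
import numpy as np

def lbp8(img):
    """Basic 8-neighbour local binary pattern: each interior pixel gets an
    8-bit code marking which neighbours are >= the centre value."""
    # neighbour offsets, clockwise from the top-left corner
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    c = img[1:-1, 1:-1]                       # interior (centre) pixels
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (di, dj) in enumerate(offs):
        # neighbour plane shifted by (di, dj), aligned with the interior
        nb = img[1 + di:img.shape[0] - 1 + di, 1 + dj:img.shape[1] - 1 + dj]
        code |= (nb >= c).astype(np.uint8) << bit
    return code
```

On a perfectly flat region every neighbour ties with the centre, so the code saturates; sharpness measures built on LBP exploit how the distribution of such codes changes between sharp and blurred regions.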
The rest of the paper is organized as follows. The proposed method is described in section 3. Experiments and results are presented in sections LABEL:exp and LABEL:result. The potential of the proposed method is discussed in section LABEL:discussion, with conclusions drawn in section LABEL:conclusion.