CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE)
Computed tomography (CT) is a popular medical imaging modality for screening, diagnosis, and image-guided therapy. However, CT has its limitations, especially the ionizing radiation dose involved. Practically, it is highly desirable to have ultrahigh-quality CT imaging for fine structural details at a minimized radiation dosage. In this paper, we propose a semi-supervised deep learning approach to recover high-resolution (HR) CT images from low-resolution (LR) counterparts. Specifically, with the generative adversarial network (GAN) as the basic component, we enforce the cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised HR outputs. In this deep imaging process, we incorporate deep convolutional neural networks (CNNs), residual learning, and network in network techniques for feature extraction and restoration. In contrast to the current trend of increasing network depth and complexity to boost the CT imaging performance, which limits real-world applications by imposing considerable computational and memory overheads, we apply a parallel 1x1 CNN to reduce the dimensionality of the output of the hidden layer. Furthermore, we optimize the number of layers and the number of filters for each CNN layer. Quantitative and qualitative evaluations demonstrate that our proposed model is accurate, efficient and robust for SR image restoration from noisy LR input images. In particular, we validate our composite SR networks on two large-scale CT datasets, and obtain very encouraging results as compared to the other state-of-the-art methods.
X-ray computed tomography (CT) is one of the most popular medical imaging methods for screening, diagnosis, and image-guided intervention. Higher-resolution (HR) CT (HRCT) images usually allow better radiomics features and more accurate diagnosis. Therefore, super-resolution (SR) methods in the CT field are receiving more and more attention. The image resolution of a CT imaging system is constrained by x-ray focal spot size, detector element pitch, reconstruction algorithm, and other factors. While physiological and pathological units in the human body are on an order of 10 microns, the in-plane and through-plane resolution of clinical CT systems is on an order of submillimeter or 1 mm [2, 3]. Even though modern CT imaging and visualization software can generate arbitrarily small voxels, the intrinsic resolution is still far lower than what is ideal for applications such as early tumor characterization and coronary artery analysis. Consequently, how to reconstruct HRCT images at a minimum radiation dose level is a holy grail of the CT field.
In general, there are two strategies for improving CT image resolution: (1) hardware-oriented and (2) computational. First, more sophisticated hardware components can be used, including an x-ray tube with a fine focal spot size, detector elements of small pitch, and better mechanical precision for CT scanning. These hardware-oriented methods are generally expensive, increase the CT system cost and radiation dose, and compromise the imaging speed. In particular, it is well known that a high x-ray radiation dose in a patient could induce genetic damage and cancerous diseases [5, 6, 7]. As a result, the second type of method for resolution improvement [8, 9, 10, 11, 12, 13, 14], which is to obtain HRCT images from LRCT images, is more attractive. This computational deblurring job is a major challenge, representing a seriously ill-posed inverse problem [2, 15]. Our neural network approach proposed in this paper is computational, utilizing advanced network architectures. More details are as follows.
To reconstruct HRCT images, various algorithms have been proposed. These algorithms can be broadly categorized into the following classes: (1) Model-based reconstruction methods [16, 17, 18, 19, 20]: These techniques explicitly model the image degradation process and regularize the reconstruction according to the characteristics of projection data. These algorithms promise an optimal image quality under the assumption that model-based priors can be effectively imposed; and (2) Learning-based (before deep learning) SR methods [21, 22, 23, 24, 25, 26, 27, 28]: These methods learn a nonlinear mapping from a training dataset consisting of paired LR and HR images to recover missing high-frequency details. In particular, sparse representation-based approaches have attracted increasing interest since they exhibit strong robustness in preserving image features and suppressing noise and artifacts. For example, Dong et al. applied adaptive sparse domain selection and adaptive regularization to obtain excellent SR results in terms of both visual perception and PSNR. Zhang et al. proposed a patch-based technique for SR enhancement of 4D-CT images. These results demonstrate that learning-based SR methods can greatly enhance overall image quality, but the outcomes may still lose image subtleties and yield a blocky appearance.
Recently, deep learning (DL) has been instrumental for computer vision tasks [29, 30, 31]. Convolutional Neural Networks (CNNs) learn hierarchical features and representations, leading to a large number of CNN-based SR models for natural images [32, 33, 34, 35, 36, 37, 38, 39]. The key to the success of DL-based methods is their independence from explicit imaging models and their backing by big domain-specific data. The image quality is optimized by learning features in an end-to-end manner. More importantly, once a CNN-based SR model is trained, achieving SR is a purely feed-forward propagation, which demands a very low computational overhead.
In the medical imaging field, DL is an emerging approach which has exhibited great potential [40, 41, 42]. For several imaging modalities, DL-based SR methods were successfully developed [43, 44, 45, 46]. Chen et al. proposed a deep densely connected super-resolution network to reconstruct HR brain magnetic resonance (MR) images. Chaudhari et al. developed a CNN-based network termed DeepResolve to learn a residual transformation from LR images to the corresponding HR images. More recently, Yu et al. proposed two advanced CNN-based models with a skip connection to promote high-frequency textures which are then fused with up-sampled images to produce SR images.
[Table or figure omitted; surviving column labels: Feature extraction network, Reconstruction network.]
Very recently, adversarial learning [47, 48] has attracted considerable attention; it enables the CNN to learn the hierarchy of feature representations from data, with unprecedented successes. Adversarial learning is performed based on a generative adversarial network (GAN), defined as a mini-max game in which the two competing players are a generator $G$ and a discriminator $D$. In the game, $G$ is trained to learn a mapping from source images in a source domain to target images in the target domain. On the other hand, $D$ distinguishes the generated images from the real images with a binary label. Once well trained, a GAN is able to model a high-dimensional distribution of target images. Wolterink et al. proposed an unsupervised conditional GAN to optimize the nonlinear mapping from LR images to HR images, successfully enhancing the overall image quality.
However, there are still several major limitations in the DL-based SR imaging. First, existing supervised DL-based algorithms cannot address blind SR tasks without LR-HR pairs. In clinical practice, the limited number of LR and HR CT image pairs makes the supervised learning methods impractical, since it is infeasible to ask patients to take multiple CT scans with additional radiation doses for paired CT images. Thus, it is essential to resort to semi-supervised learning. Second, utilizing the adversarial strategy can push the generator to learn an inter-domain mapping and produce compelling target images, but there is a potential risk that the network may yield features that are not exhibited in target images due to the degeneracy of the mapping. Since the optimal generator is capable of translating the source images to outputs distributed identically to the target images, the GAN cannot ensure that the noisy input and predicted output are paired in a meaningful way: there exist many mappings that may yield the same distribution over the outputs. Consequently, the mapping is highly under-constrained. Furthermore, it is undesirable to optimize the adversarial objective in isolation: the mode collapse problem may occur, mapping all inputs to the same output image [48, 51]. To address this issue, Cycle-Consistent GANs (cycleGAN) were designed to improve the performance of generic GANs, and utilized for SR imaging. Third, other limitations of GANs were also pointed out in [52, 53]. Steering a GAN learning process is not easy since the generator may collapse into a narrow distribution which cannot represent diverse samples from a real data distribution. Also, there is no interpretable metric for training progress. Fourth, as the number of layers increases, deep neural networks can derive a hierarchy of increasingly more complex and more abstract features. Frequently, to improve the SR imaging capability of a network, complex networks with hundreds of millions of parameters are tried.
However, given the associated computational overheads, they are hard to use in real-world applications. Fifth, local feature parts in the CT image have different scales. This feature hierarchy can provide more information to reconstruct images, but most DL-based methods [35, 36, 37] neglect to use hierarchical features. Finally, the $\ell_2$ distance between the network output and the target image is commonly used as the loss function to guide the training process of the network. However, the output optimized by the $\ell_2$ norm may suffer from over-smoothing as discussed in [54, 55], since minimizing the $\ell_2$ distance amounts to maximizing the peak signal-to-noise ratio (PSNR).
Motivated by the aforementioned drawbacks, in this study we made efforts in the following aspects. First, we present a novel residual CNN-based network in the CycleGAN framework to preserve high-resolution anatomical details with no task-specific regularization or prior knowledge. Specifically, we utilize the cycle-consistency constraint to enforce a strong cross-domain consistency between the two domains. Second, to further address the training problem of GANs [48, 53], we use the Wasserstein distance or "Earth Mover's" distance (EM distance) instead of the Jensen-Shannon (JS) divergence. Third, inspired by recent work, we optimize the network structure by the use of several fundamental design principles to alleviate the computational overheads [57, 58, 59, 60], which also helps prevent the network from over-fitting. Fourth, we cascade multiple layers to learn highly interpretable and disentangled hierarchical features. Moreover, we enable the information flow across the skip-connected layers to prevent gradient vanishing. Finally, we employ the $\ell_1$ norm instead of the $\ell_2$ distance to facilitate deblurring. Extensive experiments on two real datasets demonstrate that our proposed composite network can achieve a satisfactory CT SR imaging performance comparable to or better than that of the state-of-the-art methods [35, 34, 38, 36, 21].
Let us first review the SR problems in the medical imaging field. Then, we introduce the proposed adversarial nets framework and present our SR imaging network architecture. Finally, we describe the optimization process.
II-A Problem Statement
Let a vector $x$ represent an LR image and a vector $y$ an SR image. The conventional formulation of the ill-posed linear SR problem can be written as
$$x = SHy + \epsilon,$$
where $SH$ denotes the down-sampling and blurring system matrix, and $\epsilon$ the noise and other factors. Note that in practice, both the system matrix and the unmodeled factors can be non-linear, instead of being linear (i.e., neither scalable nor additive).
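As an illustration only, the linear degradation model described above (blurring, down-sampling, and additive noise) can be simulated in a few lines. This is a minimal sketch under stated assumptions: the mean filter standing in for the blurring operator, the down-sampling factor of 2, the Gaussian noise level, and the function names `box_blur` and `degrade` are all hypothetical choices, not the settings used in this paper.

```python
import numpy as np


def box_blur(img, k=3):
    """Blur a 2-D image with a k x k mean filter (assumed stand-in for H)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def degrade(hr, scale=2, noise_std=0.01, seed=0):
    """Simulate a low-resolution image from a high-resolution one."""
    rng = np.random.default_rng(seed)
    blurred = box_blur(hr)              # blurring (H)
    lr = blurred[::scale, ::scale]      # down-sampling (S): keep every scale-th pixel
    return lr + rng.normal(0.0, noise_std, lr.shape)  # additive noise


hr = np.random.default_rng(1).random((64, 64))
lr = degrade(hr)  # 32 x 32 noisy, blurred, down-sampled image
```

A real CT degradation is not exactly linear, as noted above, so such a simulation is only a convenient approximation for generating training pairs.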
Our goal is to computationally improve noisy LRCT images obtained on a low-dose CT (LDCT) scanner to HRCT images. The main challenges in recovering HRCT images can be listed as follows. First, LRCT images contain more complex spatial variations, correlations and statistical properties than natural images, which limits the SR imaging performance of the traditional interpolation-based and blind-deblurring-based methods. Second, the noise in raw projection data is introduced to the image domain during the reconstruction process, resulting in unique noise and artifact patterns. This makes it difficult for analytical and iterative CT reconstruction algorithms to produce perfect image quality. Finally, since the sampling and degradation operations are coupled and ill-posed, SR tasks cannot be performed beyond a marginal degree using the traditional methods, which cannot effectively restore some fine features and risk producing a blurry appearance and new artifacts. To address the aforementioned limitations, here we develop an advanced neural network by composing a number of non-linear SR functional blocks for SR CT (SRCT) imaging and a residual module to learn high-frequency details. Then, we adopt adversarial learning in a cyclic manner to generate perceptually better SRCT images.
II-B Deep Cycle-Consistent Adversarial SRCT Model
II-B1 Cycle-Consistent Adversarial Model
Current DL-based algorithms use feed-forward CNNs to learn non-linear mappings parametrized by $\theta$, which can be written as:
$$\hat{y} = G_\theta(x).$$
In order to obtain a decent $\hat{y}$, a suitable loss function $\mathcal{L}$ must be specified to encourage $G_\theta$ to generate an SR image based on the training samples so that
$$\hat{\theta} = \arg\min_\theta \sum_i \mathcal{L}\big(G_\theta(x_i), y_i\big),$$
where $(x_i, y_i)$ are paired LRCT and SRCT images for training. To address the limitations mentioned in II-A, our cyclic SRCT model is shown in Fig. 1. The proposed model includes two generative mappings $G: X \to Y$ and $F: Y \to X$ given training samples $x_i \in X$ and $y_i \in Y$. Note that we denote the two mappings $G: X \to Y$ and $F: Y \to X$ as $G$ and $F$ respectively for brevity. The two mappings $G$ and $F$ are jointly trained to produce synthesized images in a way that confuses the adversarial discriminators $D_Y$ and $D_X$ respectively, which intend to identify whether the output of each generative mapping is real or artificial. For example, given an LRCT image $x$, $G$ attempts to generate a synthesized image $\hat{y}$ highly similar to a real image $y$ so as to fool $D_Y$. In a similar way, $D_X$ attempts to discriminate between a reconstructed $\hat{x}$ from $F$ and a real $x$. The key idea is that the generators and discriminators are jointly/alternately trained to improve their performance metrics synergistically. Thus, we have the following optimization problem:
$$\min_{G,F} \max_{D_Y, D_X} \; \mathcal{L}_{\mathrm{GAN}}(G, D_Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X).$$
To enforce the mappings between the source and target domains and regularize the training procedure, our proposed network combines three types of loss functions: adversarial loss (adv); cycle-consistency loss (cyc); identity loss (idt).
II-B2 Adversarial Loss
For marginal matching, we employ adversarial losses to urge the generated images to obey the empirical distributions in the source and target domains. To improve the training quality, we apply the Wasserstein distance instead of the negative log-likelihood. Thus, we have the adversarial objective with respect to $G$:
$$\mathcal{L}_{\mathrm{WGAN}}(D_Y, G) = -\mathbb{E}_{y}\big[D_Y(y)\big] + \mathbb{E}_{x}\big[D_Y(G(x))\big] + \lambda\, \mathbb{E}_{\tilde{x}}\Big[\big(\lVert \nabla_{\tilde{x}} D_Y(\tilde{x}) \rVert_2 - 1\big)^2\Big],$$
where the first two terms are in terms of the Wasserstein estimation, the third term penalizes the deviation of the gradient norm of its input relative to one, $\tilde{x}$ is uniformly sampled along straight lines for pairs of $G(x)$ and $y$, and $\lambda$ is a regularization parameter. A similar adversarial loss is defined for marginal matching in the reverse direction. We call this modified cycleGAN the GAN-CIRCLE, as summarized in the title of this paper.
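To make the three terms of the adversarial objective concrete, the following toy computation evaluates them for a linear critic, whose input gradient is known in closed form. This is a sketch only: the linear critic, the flattened-image layout, the default weight `lam=10.0`, and the name `wgan_gp_critic_loss` are assumptions for illustration; a real critic is a CNN and its gradient would come from automatic differentiation.

```python
import numpy as np


def wgan_gp_critic_loss(w, real, fake, lam=10.0, seed=0):
    """Wasserstein critic loss with gradient penalty for D(v) = v @ w.

    real, fake: arrays of shape (batch, n_pixels) holding flattened images.
    lam: gradient-penalty weight (hypothetical default).
    """
    rng = np.random.default_rng(seed)
    d_real = real @ w
    d_fake = fake @ w
    # Sample points uniformly on straight lines between real and generated images.
    t = rng.uniform(size=(real.shape[0], 1))
    x_tilde = t * real + (1.0 - t) * fake
    # For a linear critic, the gradient with respect to the input is w everywhere.
    grad = np.broadcast_to(w, x_tilde.shape)
    grad_norm = np.linalg.norm(grad, axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    # Wasserstein estimate (first two terms) plus the gradient penalty (third term).
    return -d_real.mean() + d_fake.mean() + lam * penalty


real = np.ones((4, 3))
fake = np.zeros((4, 3))
loss_unit = wgan_gp_critic_loss(np.array([1.0, 0.0, 0.0]), real, fake)
loss_steep = wgan_gp_critic_loss(np.array([2.0, 0.0, 0.0]), real, fake)
```

With a unit-norm weight vector the penalty vanishes, while a steeper critic is penalized for a gradient norm that deviates from one.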
II-B3 Cycle Consistency Loss
Adversarial training is for marginal matching [47, 48]. However, in these earlier studies, it was found that using adversarial losses alone cannot ensure that the learned function transforms a source input successfully to a target output. To promote the consistency between $F(G(x))$ and $x$, the cycle-consistency loss can be expressed as:
$$\mathcal{L}_{\mathrm{CYC}}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big],$$
where $\lVert \cdot \rVert_1$ denotes the $\ell_1$ norm. Since the cycle-consistency loss encourages $F(G(x)) \approx x$ and $G(F(y)) \approx y$, they are referred to as forward cycle consistency and backward cycle consistency respectively. Clearly, the cycle consistency can help prevent the degeneracy problem in adversarial learning.
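The forward and backward cycle terms can be sketched with two toy mappings. This is an illustration under assumptions: the generators here are simple scalings rather than CNNs, the mean absolute error is used as a normalized stand-in for the norm, and the names `cycle_loss`, `G`, `F`, and `F_bad` are hypothetical.

```python
import numpy as np


def mean_abs_error(a, b):
    """Mean absolute difference, a normalized stand-in for the l1 norm."""
    return np.mean(np.abs(a - b))


def cycle_loss(G, F, x_batch, y_batch):
    """Forward cycle x -> G(x) -> F(G(x)) plus backward cycle y -> F(y) -> G(F(y))."""
    forward = mean_abs_error(F(G(x_batch)), x_batch)
    backward = mean_abs_error(G(F(y_batch)), y_batch)
    return forward + backward


G = lambda x: 2.0 * x      # toy forward mapping
F = lambda y: 0.5 * y      # its exact inverse
F_bad = lambda y: y        # a mapping that does not invert G

x = np.ones((2, 4, 4))
y = np.ones((2, 4, 4))
consistent = cycle_loss(G, F, x, y)
inconsistent = cycle_loss(G, F_bad, x, y)
```

The loss is zero only when each mapping undoes the other, which is the constraint that rules out degenerate mappings.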
II-B4 Identity Loss
Since an SR image should be a refined version of the LR counterpart, it is necessary to use the identity loss to regularize the training procedure. Compared with the $\ell_2$ loss, the $\ell_1$ loss does not over-penalize large differences or tolerate small errors between estimated and target images. Thus, the $\ell_1$ loss is preferred to alleviate the limitations of the $\ell_2$ loss in this context. Additionally, the $\ell_1$ loss enjoys the same fast convergence speed as that of the $\ell_2$ loss. The identity loss is formulated as follows:
$$\mathcal{L}_{\mathrm{IDT}}(G, F) = \mathbb{E}_{y}\big[\lVert G(y) - y \rVert_1\big] + \mathbb{E}_{x}\big[\lVert F(x) - x \rVert_1\big].$$
Moreover, to impose image sparsity we can use the total variation (TV) regularization as follows:
where $\nabla_h$ and $\nabla_v$ compute the horizontal and vertical gradients of the generated image, respectively.
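The identity and total variation terms can be illustrated as follows. This sketch makes several assumptions: the identity loss feeds each generator an image already in its target domain and uses the mean absolute error, and the TV term is the anisotropic form (sum of absolute finite differences), which may differ from the exact form used in this paper; the function names are hypothetical.

```python
import numpy as np


def identity_loss(G, F, x_batch, y_batch):
    """Penalize changes when a generator receives an image from its own target domain."""
    return (np.mean(np.abs(G(y_batch) - y_batch))
            + np.mean(np.abs(F(x_batch) - x_batch)))


def total_variation(img):
    """Anisotropic TV of a 2-D image from horizontal and vertical finite differences."""
    dh = img[:, 1:] - img[:, :-1]   # horizontal gradient
    dv = img[1:, :] - img[:-1, :]   # vertical gradient
    return np.abs(dh).sum() + np.abs(dv).sum()


flat = np.ones((4, 4))
step = np.zeros((4, 4))
step[:, 2:] = 1.0                   # one vertical edge crossing all four rows

tv_flat = total_variation(flat)
tv_step = total_variation(step)
idt = identity_loss(lambda v: v, lambda v: v, flat, step)
```

A constant image has zero TV, while each row of the step image contributes one unit jump, so TV rewards piecewise-smooth outputs.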
II-B5 Overall Objective Function
In the training process, our proposed network is fine-tuned in an end-to-end manner to minimize the following overall objective function:
where and are parameters to balance among different penalties.
II-B6 Supervised Learning with GAN-CIRCLE
In the case where we have access to a paired dataset, we can cast the SRCT problem as supervised training of our model. Given the paired training data from the true joint distribution, i.e. , we can define a supervision loss as follows:
II-C Network Architecture
II-C1 Generative Networks
Although more layers and a larger model size usually result in performance gains, for real applications we designed a lightweight model to validate the effectiveness of GAN-CIRCLE. The two generative networks are shown in Fig. 2. The network architecture has been optimized for SR CT imaging. It consists of two processing streams: the feature extraction network and the reconstruction network.
In the feature extraction network, we concatenate sets of non-linear SR feature blocks composed of Convolution (Conv) kernels, bias, Leaky ReLU, and a dropout layer. We utilize Leaky ReLU to prevent the ’dead ReLU’ problem thanks to the nature of leaky rectified linear units (Leaky ReLU): $\max(0, x) - \alpha \max(0, -x)$. The dropout layer is applied to prevent overfitting. The number of filters is shown in Table I. In practice, we avoid normalization, which is not suitable for SR because it discards the range flexibility of the features. Then, to capture both local and global image features, all outputs of the hidden layers are concatenated before the reconstruction network through skip connections. The skip connections help prevent training saturation and overfitting. Diverse features which represent different details of the HRCT components can be constructed at the end of the feature extraction network.
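The activation and the concatenation of hidden-layer outputs described above can be sketched as follows. This is a minimal illustration: the slope `alpha=0.1`, the channel-last layout, the channel counts, and the helper name `concat_hidden_outputs` are assumptions, not the configuration of the actual generators.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    """Leaky ReLU written as max(0, x) - alpha * max(0, -x)."""
    return np.maximum(0.0, x) - alpha * np.maximum(0.0, -x)


def concat_hidden_outputs(feature_maps):
    """Concatenate hidden-layer outputs of shape (H, W, C_i) along the channel axis."""
    return np.concatenate(feature_maps, axis=-1)


activated = leaky_relu(np.array([-2.0, 0.0, 3.0]))

# Hypothetical outputs of two feature blocks with 4 and 6 filters.
block1 = np.zeros((8, 8, 4))
block2 = np.zeros((8, 8, 6))
features = concat_hidden_outputs([block1, block2])  # shape (8, 8, 10)
```

Negative inputs keep a small non-zero slope, so the corresponding units still receive gradient, which is what avoids the dead-unit problem.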
[Figure or table omitted; surviving column labels: Tibia Case (two columns), Abdominal Case (two columns).]
In the reconstruction network, we stack two reconstruction branches and integrate the information flows. Because all the outputs from the feature extraction network are densely connected, we use parallelized CNNs (Network in Network) to restore image details. There are several benefits with the Network in Network strategy. First, the $1 \times 1$ Conv layer can significantly reduce the dimensionality of the filter space for faster computation with less information loss. Second, the $1 \times 1$ Conv layer can increase the non-linearity of the network to better learn a complex mapping at the finer levels. For up-sampling, we adopt the transposed convolutional (up-sampling) layers by a scale of . The last Conv layer fuses all the feature maps, resulting in an entire residual image containing mostly high-frequency details. In the supervised setting, the image up-sampled by the bicubic interpolation layer is combined (via element-wise addition) with the residual image to produce an HR output. In the unsupervised and semi-supervised settings, no interpolation is involved across the skip connection.
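The dimensionality reduction performed by a 1x1 convolution, and the element-wise addition of a residual image, can be sketched as follows. This is an illustration under assumptions: the channel counts (64 reduced to 8), the random weights, and the function name `conv1x1` are hypothetical, and the stand-in arrays replace the actual interpolated image and network output.

```python
import numpy as np


def conv1x1(features, weights, bias=None):
    """1x1 convolution: a per-pixel linear map over the channel axis.

    features: (H, W, C_in); weights: (C_in, C_out); bias: (C_out,) or None.
    """
    out = np.einsum("hwc,cd->hwd", features, weights)
    if bias is not None:
        out = out + bias
    return out


rng = np.random.default_rng(0)
concat_features = rng.random((16, 16, 64))     # concatenated hidden outputs
weights = rng.normal(0.0, 0.1, size=(64, 8))   # reduce 64 channels to 8
reduced = conv1x1(concat_features, weights)    # shape (16, 16, 8)

# Supervised setting: add the predicted residual to an interpolated image.
upsampled = rng.random((16, 16))               # stand-in for the interpolated input
residual = rng.normal(0.0, 0.01, (16, 16))     # stand-in for the network output
hr_estimate = upsampled + residual
```

Because the kernel has no spatial extent, the cost of this layer scales with the product of input and output channel counts only, which is why it is a cheap way to compress densely concatenated features.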
It should be noted that the generator $F$ shares the same architecture as $G$ in both the supervised and unsupervised scenarios. The default stride size is . However, for unsupervised feature learning, the stride of the Conv layers is in the feature blocks. Also, for supervised feature learning, the stride of the Conv layers is in the and feature blocks of . We refer to the forward generative network as G-Forward.
II-C2 Discriminative Networks
As shown in Fig. 3, in reference to the recent successes with GANs [65, 34], the discriminator is designed to have stages of Conv, bias, instance norm (IN) and Leaky ReLU, followed by two fully-connected layers, of which the first has outputs and the other has a single output. In addition, inspired by  no sigmoid cross entropy layer is applied at the end of the discriminator. We apply filter size for the Conv layers, which have different numbers of filters, which are respectively.
III Experiments and Results
We discuss our experiments in this section. We first introduce the datasets we utilize and then describe the implementation details and parameter settings in our proposed methods. We also compare our proposed algorithm with the state-of-the-art SR methods [37, 36, 34, 35] quantitatively and qualitatively. Finally, we present the detailed diagnostic quality assessments from expert radiologists. Note that we use the default parameters of all the evaluated methods.
III-A Training Datasets
In this study, we used two high-quality sets of training images to demonstrate the fidelity and robustness of the proposed GAN-CIRCLE. As shown in Figs. 4 - 10, these two datasets are of very different characteristics.
III-A1 Tibia Dataset
This micro-CT image dataset reflects twenty-five fresh-frozen cadaveric ankle specimens which were removed at mid-tibia from 17 body donors. After the soft tissue was removed and the tibia was dislocated from the ankle joint, each specimen was scanned on a Siemens microCAT II (Preclinical Solutions, Knoxville, TN, USA) in the cone beam imaging geometry. The micro-CT parameters are briefly summarized as follows: a tube voltage kV, a tube current mAs, projections over a range of degrees, an exposure time of s per projection, and the filtered backprojection (FBP) method was utilized to produce isotropic voxels. Since the registration of CT scans involves an interpolation step that compromises the micro-architectural measurement, we applied the transformation to volumes of interest (VOIs) instead of entire images to improve reproducibility. The micro-CT images we utilized as HR images were prepared at voxel size, as the target for SR imaging based on the corresponding LR images at voxel size. The full description is in . We target X resolution improvement.
III-A2 Abdominal Dataset
This clinical dataset is authorized by Mayo Clinic for the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge. The dataset contains full dose CT images from 10 patients with the reconstruction interval and slice thickness of and respectively. The original CT images were generated by multidetector row CT (MDCT) with image size of . The projection data are from views per scan. The HR images, with voxel size , were reconstructed using the FBP method from all projection views. More detailed information on the dataset is given in .
We performed image pre-processing for all CT images through the following workflow. The original CT images were first scaled from CT Hounsfield unit (HU) values to the unit interval [0,1], and treated as the ground-truth HRCT images. In addition, we followed the convention in [25, 69, 70] to generate LR images by adding noise to the original images and then lowering the spatial resolution by a factor of 2. For convenience in training our proposed network, we up-sampled the LR image via proximal interpolation to ensure that the LR and HR images are of the same size.
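The intensity scaling and the proximal (nearest-neighbor) up-sampling steps can be sketched as follows. This is a minimal illustration under assumptions: the HU window of [-1000, 2000] and the function names `hu_to_unit` and `nearest_upsample` are hypothetical, since the actual window is not stated here; the factor of 2 follows the text.

```python
import numpy as np


def hu_to_unit(img_hu, hu_min=-1000.0, hu_max=2000.0):
    """Clip to an assumed HU window and rescale linearly to [0, 1]."""
    clipped = np.clip(img_hu, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def nearest_upsample(img, scale=2):
    """Nearest-neighbor up-sampling: repeat every pixel scale times per axis."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)


scaled = hu_to_unit(np.array([-1000.0, 500.0, 2000.0]))
small = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
large = nearest_upsample(small)   # 4 x 4 image, same grid size as the HR image
```

After this step the LR and HR images share the same pixel grid, so patches can be cut from both at identical coordinates.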
Since the amount of training data plays a significant role in training neural networks, we extracted overlapping patches from LRCT and HRCT images instead of directly feeding the entire CT images to the training pipeline. The overlapped patches were obtained with a predefined sliding size. This strategy preserves local anatomical details and boosts the number of samples. We randomly cropped HRCT images into patches of , along with their corresponding LRCT patches of size at the same center point for supervised learning. With the unsupervised learning methods, the sizes of the HRCT and LRCT patches are in batches of size .
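Overlapping patch extraction with a sliding window can be sketched as follows. This is an illustration only: the patch size of 64, the stride of 32, the assumption that the LR image has already been up-sampled to the HR grid, and the name `extract_paired_patches` are hypothetical and do not reflect the actual settings of the experiments.

```python
import numpy as np


def extract_paired_patches(lr, hr, patch=64, stride=32):
    """Cut overlapping patches from LR and HR images at identical locations."""
    if lr.shape != hr.shape:
        raise ValueError("LR and HR images must share the same grid")
    lr_patches, hr_patches = [], []
    for top in range(0, hr.shape[0] - patch + 1, stride):
        for left in range(0, hr.shape[1] - patch + 1, stride):
            window = (slice(top, top + patch), slice(left, left + patch))
            lr_patches.append(lr[window])
            hr_patches.append(hr[window])
    return np.stack(lr_patches), np.stack(hr_patches)


hr_img = np.arange(128 * 128, dtype=float).reshape(128, 128)
lr_img = 0.5 * hr_img             # stand-in for the degraded, up-sampled image
lr_p, hr_p = extract_paired_patches(lr_img, hr_img)
```

With a stride smaller than the patch size, neighboring patches overlap, which multiplies the number of training samples obtained from each slice.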
III-B Implementation Details
In the proposed GAN-CIRCLE, we initialized the weights of the Conv layers based on . We computed std in the manner of where std is the standard deviation, , the filter size, and the number of filters. For example, given and , std= and all biases were initialized to . In the training process, dropout with was applied to each Conv layer. All the Conv and transposed Conv layers were followed by Leaky ReLU with a slope . To make the size of all feature maps the same as that of the input, we padded zeros around the boundaries before the convolution. We utilized the Adam optimizer with to minimize the loss function of the proposed network. We set the learning rate to for all layers and then decreased it by a factor of for every epochs and terminated the training after epochs. All experiments were conducted using the TensorFlow library on an NVIDIA TITAN XP GPU.
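The step-wise learning-rate schedule described above can be sketched as follows. All numeric values here (base rate `1e-4`, division by 2 every 10 epochs) and the name `step_decay_lr` are placeholders for illustration only, and reading "decreased by a factor" as a division is itself an assumption.

```python
def step_decay_lr(epoch, base_lr=1e-4, factor=2.0, step=10):
    """Divide the learning rate by `factor` once every `step` epochs."""
    return base_lr / (factor ** (epoch // step))


# Learning rate over a hypothetical 30-epoch run.
schedule = [step_decay_lr(e) for e in range(30)]
```

The rate stays constant within each block of `step` epochs and drops at the block boundaries.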
III-C Performance Comparison
In this study, we compared the proposed GAN-CIRCLE with the state-of-the-art methods: FSRCNN, ESPCN, LapSRN, and SRGAN. For clarity, we categorized the methods into the following classes: the interpolation-based, dictionary-based, CNN-based, and GAN-based methods. Specifically, we trained the publicly available FSRCNN, ESPCN, LapSRN and SRGAN with our paired LR and HR images. To demonstrate the effectiveness of the DL-based methods, we first denoised the input LR images and then super-resolved the denoised CT images using the typical interpolation methods: nearest neighbor (NN) up-sampling, bilinear interpolation, bicubic interpolation, and Lanczos interpolation. BM3D is one of the classic image-domain denoising algorithms, which is efficient and powerful. Thus, we preprocessed the noisy LRCT images with BM3D, and then super-resolved the denoised images by the interpolation methods and adjusted anchored neighborhood regression (A+). We refer to the interpolation-based methods as NN, Bilinear, Bicubic, and Lanczos.
We evaluated three variations of the proposed method: (1) G-Forward, which is the forward generator of GAN-CIRCLE, (2) G-Adversarial, which uses the adversarial learning strategy, and (3) the full-fledged GAN-CIRCLE. To emphasize the effectiveness of the GAN-CIRCLE structure, we first trained the three models using the supervised learning strategy, then trained our proposed GAN-CIRCLE in the semi-supervised scenario (GAN-CIRCLE$^s$), and finally implemented GAN-CIRCLE in the unsupervised manner (GAN-CIRCLE$^u$). In the semi-supervised settings, two datasets were created separately by randomly splitting the dataset into paired and unpaired datasets with respect to three variants: , , and paired. To better evaluate the performance of each method, we use the same size of the dataset for training and testing.
We validated the SR performance in terms of three widely-used image quality metrics: Peak signal-to-noise ratio (PSNR), Structural Similarity (SSIM) , and Information Fidelity Criterion (IFC) .
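Of the three metrics, PSNR has the simplest closed form and can be computed directly from the mean squared error. The sketch below assumes images scaled to the unit interval, so the peak value is 1; the function name `psnr` and the `data_range` parameter are illustrative choices.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given intensity range."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)   # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, estimate)
```

Since PSNR is a monotone function of the mean squared error, a network trained on a squared-error loss is directly optimizing this metric, which does not by itself guarantee perceptually sharp results.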
Through extensive experiments, we compared all the above-mentioned methods on the two benchmark datasets described in Section III-A.
III-D Experimental Results with the Tibia Dataset
We evaluated the proposed algorithms against the state-of-the-art algorithms on the tibia dataset. We present typical results in Figs. 4 and 6. It is observed that BM3D can effectively remove the noise, but it over-smooths the noisy LR images. The interpolation-based methods (NN, Bilinear, Bicubic, Lanczos) then yield noticeable artifacts caused by partial aliasing. On the other hand, the DL-based methods suppress such artifacts effectively. It can be seen in Figs. 5 and 7 that our proposed GAN-CIRCLE recovers more fine subtle details and captures more anatomical information. It is worth mentioning that Figs. (k) and (k) show that there are severe distortions of the original images but SRGAN generates compelling results in Figs. (k) and (k), which indicates that the VGG network is a task-specific network which can generate images with excellent image quality. We argue that the possible reason is that the VGG network is a CNN-based network pre-trained on natural images, whose structural characteristics correlated with the content differ from those of medical images. Figs. 5 and 7 show that the proposed GAN-CIRCLE can predict images with sharper boundaries and richer textures than GAN-CIRCLE, and GAN-CIRCLE which learns additional anatomical information from the unpaired samples. The quantitative results are in Table II. The results demonstrate that G-Forward achieves the highest scores in terms of the evaluation metrics PSNR and SSIM, outperforming all other methods. However, it has been pointed out that high PSNR and SSIM cannot guarantee a visually favorable result. Non-GAN based methods (FSRCNN, ESPCN, LapSRN) may fail to recover some fine structures for diagnostic evaluation, such as in Figs. 5 and 7. Quantitatively, GAN-CIRCLE achieves the second-best values in terms of SSIM and IFC; it has been pointed out that the IFC value correlates well with the human perception of SR images.
Our GAN-CIRCLE obtained comparable results qualitatively and quantitatively. Table II shows that the proposed semi-supervised method approaches the purely supervised methods on the tibia dataset. In general, our proposed GAN-CIRCLE can generate more pleasant results with sharper image contents.
[Table III, header only (score rows not recovered). For each of the Tibia Dataset and the Abdominal Dataset, the columns are: Image Sharpness, Image Noise, Contrast Resolution, Diagnostic Acceptance, Overall Quality.]
III-E Experimental Results on the Abdominal Dataset
We further compared the above-mentioned algorithms on the abdominal benchmark dataset. A similar trend can be observed on this dataset. Our proposed GAN-CIRCLE can better preserve anatomical information and more clearly visualize the portal vein, as shown in Figs. 9 and 11. These results demonstrate that CNN-based methods (FSRCNN, ESPCN, LapSRN) can significantly suppress the noise and artifacts. However, they suffer from low image quality as judged by the human observer, since they assume that the impact of noise is independent of local image features, while the sensitivity of the Human Visual System (HVS) to noise depends on local contrast, intensity and structural variations. Figs. 9 and 11 display the LRCT images processed by GAN-based methods (SRGAN, G-Adv, and the supervised, semi-supervised, and unsupervised GAN-CIRCLE variants) with improved structural identification. It can also be observed that the GAN-based models introduce strong noise into the results. As shown in Fig. (p), there exist tiny artifacts in the results of GAN-CIRCLE. Figs. (o) and (o) show that GAN-CIRCLE and GAN-CIRCLE are capable of retaining high-frequency details to reconstruct more realistic images with relatively lower noise compared with the other GAN-based methods (G-Adv, SRGAN). Table II shows that G-Fwd achieves the best performance in PSNR. Our proposed methods GAN-CIRCLE and GAN-CIRCLE both obtain pleasing results in terms of SSIM and IFC. In other words, the results show that the proposed GAN-CIRCLE and GAN-CIRCLE generate more visually pleasant results with sharper edges on the abdominal dataset than the competing state-of-the-art methods.
III-F Diagnostic Quality Assessment
We invited three board-certified radiologists with a mean clinical CT experience of 12.3 years to perform independent qualitative image analysis on sets of images from the two benchmark datasets. Each set includes the same image slice but generated using different methods. We label HRCT and LDCT images in each set as reference. The sets of images from the two datasets were randomized and deidentified so that the radiologists were blind to the post-processing algorithms. Image sharpness, image noise, contrast resolution, diagnostic acceptability and overall image quality were graded on a scale from 1 (worst) to 5 (best). A score of 1 refers to a ‘non-diagnostic’ image, while a score of 5 means an ‘excellent’ diagnostic image quality. The mean scores with their standard deviations are presented in Table III. The radiologists confirmed that GAN-based methods (G-Adv, SRGAN, and the supervised, semi-supervised, and unsupervised GAN-CIRCLE variants) provide sharper images with better texture details, while CNN-based algorithms (FSRCNN, ESPCN, LapSRN, G-Fwd) give higher noise suppression scores. Table III shows that our proposed GAN-CIRCLE and GAN-CIRCLE achieve comparable results, while outperforming the other methods in terms of image sharpness, contrast resolution, diagnostic acceptability and overall image quality.
SR imaging holds tremendous promise for practical medical applications, for example, depicting bony details, lung structures, and implanted stents, and more generally enhancing radiomics analysis. In practice, the physical constraints of system hardware components and radiation dose considerations limit imaging performance, and computational means are necessary to optimize image resolution. For the same reason, high-quality/high-dose CT images are often unavailable, which means there are often not enough paired data to train a hierarchical deep generative model.
Our results suggest that adding unpaired data can improve performance relative to training without any unpaired data. Furthermore, the use of adversarial learning as a regularization term for SR imaging is a new mechanism to capture anatomical information. However, it should be noted that the existing GAN-based methods introduce additional noise to the results, as seen in Sections III-D and III-E. To cope with this limitation, we have incorporated cycle-consistency so that the network can learn a complex deterministic mapping to improve image quality. The enforcement of identity and supervision allows the model to master more latent structural information to augment resolution. Also, we have used the Wasserstein distance to stabilize the GAN training process. Moreover, typical prior studies used complex inference to learn a hierarchy of latent variables for HR imaging, which is hard to use in real medical applications. Thus, we have designed an efficient CNN-based network with skip-connection and network-in-network techniques. In the feature extraction network, we have optimized the network structures and reduced the computational complexity by applying a small number of filters in each Conv layer and utilizing the ensemble learning model. Both local and global features are cascaded through skip connections before being fed into the restoration/reconstruction network. In that network, we have adopted the network-in-network architecture for superior SR CT performance.
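The cycle-consistency and identity constraints mentioned above can be illustrated with a minimal sketch of the generic L1 formulation used in cycle-consistent adversarial training. The names `G` (LR to HR), `F` (HR to LR), and the flat-list image representation are assumptions for illustration; the full objective of the proposed model combines further terms and weights that are not reproduced in this section.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally long flat sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, lr_batch, hr_batch):
    """F(G(x)) should recover the LR input x; G(F(y)) should recover the HR input y."""
    forward = sum(l1_distance(F(G(x)), x) for x in lr_batch) / len(lr_batch)
    backward = sum(l1_distance(G(F(y)), y) for y in hr_batch) / len(hr_batch)
    return forward + backward


def identity_loss(G, F, lr_batch, hr_batch):
    """G should leave HR inputs nearly unchanged, and F should do so for LR inputs."""
    hr_term = sum(l1_distance(G(y), y) for y in hr_batch) / len(hr_batch)
    lr_term = sum(l1_distance(F(x), x) for x in lr_batch) / len(lr_batch)
    return hr_term + lr_term


# Toy stand-ins for the two generators: exact inverses of each other.
def toy_G(x):
    return [2.0 * v for v in x]


def toy_F(y):
    return [v / 2.0 for v in y]


lr_samples = [[1.0, 2.0]]
hr_samples = [[2.0, 4.0]]
print("cycle loss:", cycle_consistency_loss(toy_G, toy_F, lr_samples, hr_samples))
print("identity loss:", identity_loss(toy_G, toy_F, lr_samples, hr_samples))
```

The toy mappings are exact inverses, so the cycle term vanishes while the identity term does not. This shows that the two constraints penalize different failure modes and are therefore not redundant.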
Although our model has achieved compelling results, some limitations remain. First, the proposed GAN-CIRCLE requires a much longer training time, generally 1-2 days, than other standard GAN-based methods. Future work in this aspect should consider more principled ways of designing more efficient architectures that allow for learning more complex structural features with less complex networks at lower computational cost and model complexity. Second, although our proposed model can generate more plausible details and better preserve local anatomical details, subtle structures may not always be faithfully reconstructed. It has also been observed in the recent literature that the Wasserstein distance may yield biased sample gradients, which carries the risk of converging to an incorrect minimum and is not well suited to stochastic gradient descent. In the future, experimenting with variants of GANs is highly recommended. Finally, we notice that the network trained with adversarial learning can generate more realistic images. However, the restored images are not uniformly consistent with the original high-resolution images. To make further progress, we may also add more constraints, such as sinogram consistency and a low-dimensional manifold constraint, to decipher the relationship between noise, blurry appearances of images and structural ground truth, and even develop an adaptive and/or task-specific structural loss function.
In this paper, we have established a cycle Wasserstein regression adversarial training framework for CT SR imaging. Aided by unpaired data, our approach learns complex structured features more effectively with a limited amount of paired data. At a low computational cost, the proposed network G-Forward can achieve a significant SR gain. In general, the proposed GAN-CIRCLE has produced promising results in terms of preserving anatomical information and suppressing image noise in both purely supervised and semi-supervised learning fashions. Visual evaluations by the expert radiologists confirm that our proposed GAN-CIRCLE networks have brought superior diagnostic quality, which is consistent with systematic quantitative evaluations in terms of traditional image quality measures.
The authors would like to thank the NVIDIA Corporation for the donation of the TITAN XP GPU to Dr. Ge Wang’s laboratory, which was used for this study.
-  D. J. Brenner, C. D. Elliston, E. J. Hall, and W. E. Berdon, “Estimated risks of radiation-induced fatal cancer from pediatric ct,” Am. J. Roentgenol., vol. 176, no. 2, pp. 289–296, 2001.
-  H. Greenspan, “Super-resolution in medical imaging,” Comput. J., vol. 52, no. 1, pp. 43–63, 2008.
-  G. Schwarzband and N. Kiryati, “The point spread function of spiral ct,” Phys. Med. Biol., vol. 50, no. 22, p. 5307, 2005.
-  A. Hassan, S. A. Nazir, and H. Alkadhi, “Technical challenges of coronary ct angiography: today and tomorrow,” Eur. J. Radiol., vol. 79, no. 2, pp. 161–171, 2011.
-  d. G. A. Berrington and S. Darby, “Risk of cancer from diagnostic x-rays: estimates for the uk and 14 other countries.” Lancet., vol. 363, no. 9406, p. 345, 2004.
-  D. J. Brenner and E. J. Hall, “Computed tomography - an increasing source of radiation exposure,” New Eng. J. Med., vol. 357, no. 22, pp. 2277–2284, 2007.
-  A. B. De Gonzalez and S. Darby, “Risk of cancer from diagnostic x-rays: estimates for the UK and 14 other countries,” Lancet., vol. 363, no. 9406, pp. 345–351, Jan. 2004.
-  P. J. La Rivière, J. Bian, and P. A. Vargas, “Penalized-likelihood sinogram restoration for computed tomography,” IEEE Trans. Med. Imag., vol. 25, no. 8, pp. 1022–1036, 2006.
-  M. W. Vannier, “Iterative deblurring for ct metal artifact reduction,” IEEE Trans. Med. Imag., vol. 15, no. 5, p. 651, 1996.
-  G. Wang, M. W. Vannier, M. W. Skinner, M. G. Cavalcanti, and G. W. Harding, “Spiral ct image deblurring for cochlear implantation,” IEEE Trans. Med. Imag., vol. 17, no. 2, pp. 251–262, 1998.
-  D. D. Robertson, J. Yuan, G. Wang, and M. W. Vannier, “Total hip prosthesis metal-artifact suppression using iterative deblurring reconstruction,” J. Comput. Assist. Tomogr., vol. 21, no. 2, pp. 293–298, 1997.
-  M. Jiang, G. Wang, M. W. Skinner, J. T. Rubinstein, and M. W. Vannier, “Blind deblurring of spiral ct images,” IEEE Trans. Med. Imag., vol. 22, no. 7, pp. 837–845, 2003.
-  M. Jiang, G. Wang, M. Skinner, J. Rubinstein, and M. Vannier, “Blind deblurring of spiral ct images - comparative studies on edge-to-noise ratios,” Med. Phys., vol. 29, no. 5, pp. 821–829, 2002.
-  J. Wang, G. Wang, and M. Jiang, “Blind deblurring of spiral ct images based on enr and wiener filter,” J. X-Ray Sci. Technol., vol. 13, no. 1, pp. 49–60, 2005.
-  J. Tian and K.-K. Ma, “A survey on super-resolution imaging,” Signal Image Video P., vol. 5, no. 3, pp. 329–342, 2011.
-  R. Zhang, J.-B. Thibault, C. A. Bouman, K. D. Sauer, and J. Hsieh, “Model-based iterative reconstruction for dual-energy x-ray ct using a joint quadratic likelihood model,” IEEE Trans. Med. Imag., vol. 33, no. 1, pp. 117–134, 2014.
-  C. A. Bouman and K. Sauer, “A unified approach to statistical tomography using coordinate descent optimization,” IEEE Trans. Image Process., vol. 5, no. 3, pp. 480–492, 1996.
-  Z. Yu, J.-B. Thibault, C. A. Bouman, K. D. Sauer, J. Hsieh et al., “Fast model-based x-ray ct reconstruction using spatially nonhomogeneous icd optimization,” IEEE Trans. Image Process., vol. 20, no. 1, pp. 161–175, 2011.
-  K. Sauer and C. Bouman, “A local update strategy for iterative reconstruction from projections,” IEEE Trans. Signal Process., vol. 41, no. 2, pp. 534–548, 1993.
-  J.-B. Thibault, K. D. Sauer, C. A. Bouman, and J. Hsieh, “A three-dimensional statistical approach to improved image quality for multislice helical ct,” Med. Phys., vol. 34, no. 11, pp. 4526–4544, 2007.
-  J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, 2010.
-  Z. Wang, J. Yang, H. Zhang, Z. Wang, Y. Yang, D. Liu, and T. S. Huang, Sparse Coding and its Applications in Computer Vision. World Scientific, 2016.
-  Z. Wang, Y. Yang, Z. Wang, S. Chang, J. Yang, and T. S. Huang, “Learning super-resolution jointly from external and internal examples,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4359–4371, 2015.
-  D. H. Trinh, M. Luong, F. Dibos, J.-M. Rocchisani, C. D. Pham, and T. Q. Nguyen, “Novel example-based method for super-resolution and denoising of medical images.” IEEE Trans. Image Process., vol. 23, no. 4, pp. 1882–1895, 2014.
-  C. Jiang, Q. Zhang, R. Fan, and Z. Hu, “Super-resolution ct image reconstruction based on dictionary learning and sparse representation,” Sci. Rep., vol. 8, no. 1, p. 8799, 2018.
-  Y. Zhang, G. Wu, P.-T. Yap, Q. Feng, J. Lian, W. Chen, and D. Shen, “Reconstruction of super-resolution lung 4d-ct using patch-based sparse representation,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR). IEEE, 2012, pp. 925–931.
-  W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Trans. Image Process., vol. 20, no. 7, pp. 1838–1857, 2011.
-  S. Yang, M. Wang, Y. Chen, and Y. Sun, “Single-image super-resolution reconstruction via learned geometric dictionaries and clustered sparse coding,” IEEE Trans. Image Process., vol. 21, no. 9, pp. 4016–4028, 2012.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
-  W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, and D. Shen, “Deep convolutional neural networks for multi-modality isointense infant brain image segmentation,” NeuroImage, vol. 108, pp. 214–224, 2015.
-  S. Wang, M. Kim, G. Wu, and D. Shen, “Scalable high performance image registration framework by unsupervised deep feature representations learning,” in IEEE Trans. Biomed. Eng. Elsevier, 2017, pp. 245–269.
-  C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network.” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), vol. 2, no. 3, 2017, p. 4.
-  C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Eur. Conf. Comp. Vis. (ECCV). Springer, 2016, pp. 391–407.
-  W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2017.
-  B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), vol. 1, no. 2, 2017, p. 4.
-  W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2016, pp. 1874–1883.
-  Y. Yuan, S. Liu, J. Zhang, Y. Zhang, C. Dong, and L. Lin, “Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), June 2018.
-  G. Wang, M. Kalra, and C. G. Orton, “Machine learning will transform radiology significantly within the next 5 years,” Med. Phys., vol. 44, no. 6, pp. 2041–2044, 2017.
-  G. Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
-  G. Wang, J. C. Ye, K. Mueller, and J. A. Fessler, “Image reconstruction is a new frontier of machine learning,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1289–1296, 2018.
-  Y. Chen, F. Shi, A. G. Christodoulou, Z. Zhou, Y. Xie, and D. Li, “Efficient and accurate mri super-resolution using a generative adversarial network and 3d multi-level densely connected network,” CoRR, vol. abs/1803.01417, 2018.
-  H. Yu, D. Liu, H. Shi, H. Yu, Z. Wang, X. Wang, B. Cross, M. Bramler, and T. S. Huang, “Computed tomography super-resolution using convolutional neural networks,” in Proc. IEEE Intl. Conf. Image Process., 2017, pp. 3944–3948.
-  J. Park, D. Hwang, K. Y. Kim, S. K. Kang, Y. K. Kim, and J. S. Lee, “Computed tomography super-resolution using deep convolutional neural network,” Phys. Med. Biol., 2018.
-  A. S. Chaudhari, Z. Fang, F. Kogan, J. Wood, K. J. Stevens, E. K. Gibbons, J. H. Lee, G. E. Gold, and B. A. Hargreaves, “Super-resolution musculoskeletal mri using deep learning,” Magn. Reson. Med., 2018.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.
-  I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” CoRR, 2016.
-  J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Generative adversarial networks for noise reduction in low-dose ct,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2536–2545, 2017.
-  D. Nie, R. Trullo, J. Lian, L. Wang, C. Petitjean, S. Ruan, Q. Wang, and D. Shen, “Medical image synthesis with deep convolutional adversarial networks,” IEEE Trans. Biomed. Eng., 2018.
-  J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2017.
-  A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” Int. Conf. Learn. Representations. (ICLR), 2016.
-  M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 214–223.
-  H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. Comput. Imag., vol. 3, no. 1, pp. 47–57, 2017.
-  C. You, Q. Yang, H. Shan, L. Gjesteby, L. Guang, S. Ju, Z. Zhang, Z. Zhao, Y. Zhang, W. Cong et al., “Structure-sensitive multi-scale deep neural network for low-dose ct denoising,” IEEE Access, 2018.
-  J. Yamanaka, S. Kuwashima, and T. Kurita, “Fast and accurate image super resolution by deep cnn with skip connection and network in network,” in Proc. NIPS. Springer, 2017, pp. 217–225.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2016, pp. 770–778.
-  G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks.” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), vol. 1, no. 2, 2017, p. 3.
-  N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Machine Learning Res., vol. 15, no. 1, pp. 1929–1958, 2014.
-  J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” CoRR, vol. abs/1607.06450, 2016.
-  I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of wasserstein gans,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5769–5779.
-  C. Li, H. Liu, C. Chen, Y. Pu, L. Chen, R. Henao, and L. Carin, “Alice: Towards understanding adversarial learning for joint distribution matching,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5495–5503.
-  M. Lin, Q. Chen, and S. Yan, “Network in network,” Int. Conf. Learn. Representations. (ICLR), 2014.
-  M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, “Deconvolutional networks,” in Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR). IEEE, 2010, pp. 2528–2535.
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Int. Conf. Learn. Representations. (ICLR), 2015.
-  D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” CoRR, vol. abs/1607.08022, 2016.
-  C. Chen, X. Zhang, J. Guo, D. Jin, E. M. Letuchy, T. L. Burns, S. M. Levy, E. A. Hoffman, and P. K. Saha, “Quantitative imaging of peripheral trabecular bone microarchitecture using mdct,” Med. Phys., vol. 45, no. 1, pp. 236–249, 2018.
-  AAPM, “Low dose ct grand challenge,” 2017. [Online]. Available: http://www.aapm.org/GrandChallenge/LowDoseCT/#
-  D. Glasner, S. Bagon, and M. Irani, “Super-resolution from a single image,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV). IEEE, 2009, pp. 349–356.
-  C.-H. Pham, A. Ducournau, R. Fablet, and F. Rousseau, “Brain mri super-resolution using deep 3d convolutional networks,” in Proc. IEEE Int. Symp. Biomed. Imag. IEEE, 2017, pp. 197–200.
-  D. Liu, Z. Wang, Y. Fan, X. Liu, Z. Wang, S. Chang, X. Wang, and T. S. Huang, “Learning temporal dynamics for video super-resolution: A deep learning approach,” IEEE Trans. Image Process., vol. 27, no. 7, pp. 3432–3445, 2018.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2015, pp. 1026–1034.
-  G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” CoRR, vol. abs/1207.0580, 2012.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Int. Conf. Learn. Representations. (ICLR), 2015.
-  P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3D random noise filtering for absorption optical projection tomography,” Phys. Med. Biol., vol. 55, no. 18, p. 5401, 2010.
-  Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
-  H. R. Sheikh, A. C. Bovik, and G. De Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, 2005.
-  D. Shen, G. Wu, and H.-I. Suk, “Deep learning in medical image analysis,” Annu. Rev. Biomed. Eng., vol. 19, pp. 221–248, 2017.
-  Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low dose ct image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” IEEE Trans. Med. Imag., 2018.
-  C.-Y. Yang, C. Ma, and M.-H. Yang, “Single-image super-resolution: A benchmark,” in Eur. Conf. Comp. Vis. (ECCV). Springer, 2014, pp. 372–386.
-  M. G. Bellemare, I. Danihelka, W. Dabney, S. Mohamed, B. Lakshminarayanan, S. Hoyer, and R. Munos, “The cramer distance as a solution to biased wasserstein gradients,” CoRR, vol. abs/1705.10743, 2017.