A Hybrid Frequency-domain/Image-domain Deep Network for Magnetic Resonance Image Reconstruction
Decreasing magnetic resonance (MR) image acquisition times can potentially reduce procedural cost and make MR examinations more accessible. Compressed sensing (CS)-based image reconstruction methods, for example, decrease MR acquisition time by reconstructing high-quality images from data that were sampled at rates below those required by the Nyquist-Shannon sampling theorem. Iterative algorithms with data regularization are the standard approach to solving ill-posed CS inverse problems. These solutions are usually slow, however, preventing near-real-time image reconstruction. Recently, deep-learning methods have been used to solve the CS MR reconstruction problem. These methods have the advantage of being able to quickly reconstruct images in a single pass using an appropriately trained network. Some recent studies have demonstrated that the quality of their reconstructions equals, and sometimes even surpasses, that of conventional iterative approaches. A variety of network architectures (e.g., U-nets and residual U-nets) have been proposed to tackle the CS reconstruction problem. A drawback of these architectures is that they typically operate only on image domain data: for undersampled data, the images computed by applying the inverse fast Fourier transform (iFFT) are aliased. In this work, we propose a hybrid architecture that works in both the k-space (or frequency) domain and the image (or spatial) domain. Our network is composed of a complex-valued residual U-net in the k-space domain, an iFFT operation, and a real-valued U-net in the image domain. Our experiments demonstrated, using raw MR k-space data, that the proposed hybrid approach can potentially improve CS reconstruction compared to deep-learning networks that operate only in the image domain.
In this study, we compare our method with four previously published deep neural networks and examine their ability to reconstruct images that are subsequently used to generate regional volume estimates. We evaluated undersampling ratios of 75% and 80%. Our technique ranked second in the quantitative analysis, but qualitative analysis indicated that our reconstruction performed best in hard-to-reconstruct regions, such as the cerebellum. All images reconstructed with our method were successfully post-processed and showed good volumetric agreement with the fully sampled reconstruction measures.
Magnetic resonance (MR) is a key medical imaging modality that has critical roles in both patient care and medical research. MR scanner installations, however, are expensive. Another major limitation of MR imaging is its long image acquisition times, especially compared to other modalities such as computed tomography. Lengthy acquisition times make MR less patient friendly and increase the per-patient examination cost. MR-based compressed sensing (CS) methods seek to leverage the implicit sparsity of medical images [RN280], potentially allowing for significant k-space undersampling during acquisition and, by consequence, reducing examination times.
Traditional MR CS reconstruction techniques are iterative algorithms that usually require a sparsifying transform which, combined with regularization terms, can find a solution to these ill-posed inverse problems [RN280, RN247]. The drawback of these iterative approaches, however, is that they are time-consuming, making them difficult to incorporate in a near-real-time MR imaging scenario (i.e., where images are reconstructed and displayed on the scanner during the procedure).
Deep learning [RN250] has recently been applied to these reconstruction challenges. It has the advantage of being able to rapidly reconstruct images in a single pass using a suitably trained network. Some deep-learning-based reconstruction methods have arguably surpassed traditional iterative CS reconstruction techniques [RN307].
Several different deep learning approaches have recently been proposed to tackle the CS reconstruction problem. Jin et al. [RN254] proposed to use a U-net [RN196]. Lee et al. experimentally showed that residual U-nets can potentially improve image reconstruction. Residual blocks have subsequently been incorporated in the majority of the latest studies (cf., [RN307, RN305, RN306]).
Yang et al. [RN307] proposed a deep de-aliasing generative adversarial network (DAGAN) that uses a residual architecture as the generator responsible for reconstructing the images, trained with a loss function that has four components: an image domain loss, a frequency domain loss, a perceptual loss, and an adversarial loss. Quan et al. [RN305] proposed a generative adversarial network (GAN) with a cyclic loss [RN315]. Their method consists of a cascade of a reconstruction network followed by a refinement network, with a cyclic loss component that tries to enforce that the model is bijective. Schlemper et al. [RN306] proposed a deep cascade of convolutional neural networks (CNNs) that has data consistency (DC) blocks between consecutive subnetworks in the cascade. Their hypothesis is that the DC blocks reduce the issue of overfitting, therefore allowing the training of deeper models.
The aforementioned techniques principally work in the image domain, with a few exceptions where k-space domain information is used in the loss function and/or to implement DC layers. All of these networks [RN307, RN305, RN306] receive the zero-filled reconstruction of the undersampled k-space as input and output an unaliased image.
Zhu et al. [RN289] recently proposed a method that tries to learn the domain transform. Their method first processes the undersampled input data in k-space, learns the inverse discrete Fourier transform (iDFT), and then refines the result in the image domain. In the case of CS reconstruction, the domain transform can be considered as learning an approximation of the iDFT for undersampled k-space data. The domain transform is modeled as a sequence of fully connected layers, and the image domain refinement is modeled as a series of convolutional layers. A disadvantage of this approach is that the number of learned parameters in the domain transform grows quadratically with the number of pixels in the input image, which makes it challenging to use their model for typical MR image sizes due to current hardware limitations.
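As a hedged illustration of this scaling (the matrix sizes below are arbitrary examples, not values from [RN289]), a single fully connected layer mapping every complex k-space sample to every image pixel grows with the fourth power of the matrix side length:

```python
# Hypothetical illustration of domain-transform parameter scaling; the sizes
# below are arbitrary examples, not values from the cited work.
def fc_domain_transform_params(n):
    # A dense layer mapping 2*n*n real/imaginary k-space values to n*n pixels.
    return (2 * n * n) * (n * n)

for n in (64, 128, 256):
    print(n, fc_domain_transform_params(n))
```

Doubling the matrix side length multiplies the weight count by 16, which is what makes this approach memory-bound at typical MR image sizes.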
Based on these studies, we hypothesize that a hybrid approach that works with information as presented in both the k-space domain and the image domain (such as the proposal of [RN289]) can improve MR CS reconstruction. In this work, we propose a model that consists of a cascade of a k-space domain network and an image domain network connected by the magnitude of the iDFT operation. Our model does not need to learn the domain transform, which reduces its parameter count by orders of magnitude relative to [RN289]: the convolutional layers we use do not grow quadratically with the number of image pixels. Our proposed method takes advantage of information as presented in both the k-space and image domains, as opposed to image-domain-only approaches [RN250, RN307, RN254, RN305, RN306]. K-space and image domain information are equivalent, because they are related by a global linear transformation. Operating in both domains with non-linear methods, however, can potentially be advantageous and improve network learning.
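A minimal sketch of this cascade's forward pass, with identity functions standing in for the two U-nets and illustrative (identity) normalization statistics:

```python
import numpy as np

# Sketch of the hybrid forward pass. The two nets are identity placeholders
# standing in for the residual U-net (k-space) and U-net (image domain).
def freq_net(k_norm):
    return k_norm

def image_net(x_norm):
    return x_norm

def hybrid_forward(k_under, mu_F, sigma_F, mu_I, sigma_I):
    k_hat = freq_net((k_under - mu_F) / sigma_F)   # frequency-domain network
    k_hat = sigma_F * k_hat + mu_F                 # undo k-space normalization
    x1 = np.abs(np.fft.ifft2(k_hat))               # magnitude of the iDFT
    return image_net((x1 - mu_I) / sigma_I)        # image-domain network

k = np.fft.fft2(np.random.rand(8, 8))              # toy "fully sampled" k-space
x_hat = hybrid_forward(k, 0.0, 1.0, 0.0, 1.0)
print(x_hat.shape)  # (8, 8)
```

With identity stand-ins and a fully sampled input, the cascade reduces to the magnitude of the iDFT, so the toy image is recovered exactly; the trained networks replace the placeholders in the real model.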
In this work, we compare our proposal against four recently published, publicly available, deep learning-based reconstruction methods [RN307, RN254, RN305, RN306]. These approaches were evaluated at 75% and 80% undersampling levels (corresponding to acquisition acceleration factors of 4 and 5). The experiments were done using raw MR data acquired from 45 subjects scanned with a volumetric T1-weighted MR imaging sequence. The proposed reconstruction code has been made publicly available at https://github.com/rmsouza01/Hybrid-CS-Model-MRI. The dataset will be made publicly available for benchmarking purposes at https://sites.google.com/view/calgary-campinas-dataset/home and is part of the Calgary-Campinas datasets [RN136].
II Hybrid Network
The flowchart of our proposed method is depicted in Figure 1. There are three main components to our approach: 1) a k-space (or frequency-domain) network that essentially tries to fill in missing k-space samples, 2) the magnitude of the iDFT, and 3) an image-domain network that acts as an anti-aliasing filter. These components, along with the proposed network loss function, are described in the following subsections.
II-A Frequency-domain Network
The frequency-domain network, $f_F$, attempts to recover a fully sampled k-space, $F$, given undersampled k-space data, $F_u$. This can be mathematically formalized as:

$$\hat{F} = f_F(\bar{F}_u),$$

where $\bar{F}_u$ is the normalized undersampled k-space data given by:

$$\bar{F}_u = \frac{F_u - \mu_F}{\sigma_F},$$

where $\mu_F$ and $\sigma_F$ are the average and standard deviation of the undersampled k-spaces in the training set. The specific architecture used for the frequency-domain network is a residual U-net (Figure 2, left side). The input complex k-space data are split into two channels: one for the real and one for the imaginary component of the k-space data.
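For concreteness, the normalization and two-channel split could be implemented as follows (the mean and standard deviation passed in are illustrative placeholders for the training-set statistics):

```python
import numpy as np

# Normalize complex k-space with training-set statistics and split it into
# real/imaginary channels for the network input (statistics are placeholders).
def normalize_and_split(k_under, mu_F, sigma_F):
    k_norm = (k_under - mu_F) / sigma_F
    return np.stack([k_norm.real, k_norm.imag], axis=-1)

k = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
channels = normalize_and_split(k, mu_F=0.0, sigma_F=1.0)
print(channels.shape)  # (4, 4, 2)
```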
II-B Magnitude of the iDFT
Before applying the iDFT, we have to undo the previous k-space normalization step, because adding a constant to the k-space data superimposes an impulse, $\delta$, on the image after transformation. Undoing the normalization is accomplished by:

$$\hat{F}' = \sigma_F \hat{F} + \mu_F.$$

The next step is to transform from the frequency domain to the image domain using the iDFT and magnitude operations:

$$\hat{x}_1 = \left| \mathcal{F}^{-1}\{\hat{F}'\} \right|,$$

where $\mathcal{F}^{-1}$ represents the iDFT operation and $\hat{x}_1$ is the initial estimate of the reconstructed image.
This component of our model has no trainable parameters, and the iDFT is efficiently computed using the fast Fourier transform algorithm running on a graphics processing unit (GPU).
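A sketch of this fixed, parameter-free stage, with a round-trip sanity check (the normalization statistics here are illustrative placeholders):

```python
import numpy as np

# Undo the k-space normalization, then take the magnitude of the inverse FFT
# (the statistics mu_F and sigma_F are illustrative placeholders).
def magnitude_idft(k_hat_norm, mu_F, sigma_F):
    k_hat = sigma_F * k_hat_norm + mu_F   # denormalize before transforming
    return np.abs(np.fft.ifft2(k_hat))

# Sanity check: with identity statistics, a fully sampled k-space should
# transform back to the (nonnegative) image it came from.
x = np.random.rand(8, 8)
x1 = magnitude_idft(np.fft.fft2(x), mu_F=0.0, sigma_F=1.0)
print(np.allclose(x1, x))  # True
```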
II-C Image Domain Network
The last component of our method is the image domain network, $f_I$. In order to improve training convergence of the network, we again normalize the initial estimate of the reconstructed image obtained in the previous step:

$$\bar{x}_1 = \frac{\hat{x}_1 - \mu_I}{\sigma_I},$$

where $\mu_I$ and $\sigma_I$ are the mean and standard deviation of the reconstructed images in the training set. The normalized image is fed as input to the image domain network:

$$\hat{x} = f_I(\bar{x}_1).$$

The final reconstructed image is $\hat{x}$. The architecture used for the image domain network is a U-net (Figure 2, right side).
II-D Loss Function
Our loss function was a weighted sum of normalized root mean squared errors (NRMSE) in each domain:

$$\mathcal{L} = \lambda_1 \frac{1}{N} \sum_{i=1}^{N} \mathrm{NRMSE}(F_i, \hat{F}_i) + \lambda_2 \frac{1}{N} \sum_{i=1}^{N} \mathrm{NRMSE}(x_i, \hat{x}_i),$$

where $F_i$ and $x_i$ are the reference fully sampled k-space and image reconstruction, respectively, of the $i$-th sample in the training set, and $N$ is the number of training samples. The first term of the loss function acts as a data fidelity term, i.e., a regularizer, and the second term represents the reconstruction error. The values of the weights $\lambda_1$ and $\lambda_2$ were determined empirically.
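This loss can be sketched in NumPy as follows (the λ weights here are placeholders, since the paper's values were set empirically, and the range-based NRMSE normalization shown is one common choice, not necessarily the paper's):

```python
import numpy as np

# Weighted two-domain NRMSE loss; lambda weights are illustrative placeholders.
def nrmse(ref, est):
    rmse = np.sqrt(np.mean(np.abs(est - ref) ** 2))
    return rmse / (np.abs(ref).max() - np.abs(ref).min())

def hybrid_loss(F_ref, F_hat, x_ref, x_hat, lam1=0.5, lam2=0.5):
    k_term = nrmse(F_ref, F_hat)   # data-fidelity (k-space) term
    i_term = nrmse(x_ref, x_hat)   # image reconstruction term
    return lam1 * k_term + lam2 * i_term

x = np.random.rand(8, 8)
print(hybrid_loss(np.fft.fft2(x), np.fft.fft2(x), x, x))  # 0.0
```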
III Experimental Setup
III-A Network Implementations
We compared our method, which we will refer to as the Hybrid method, against 1) a plain vanilla U-net [RN196] with a residual connection, referred to as UNET; 2) RefineGAN [RN305]; 3) DAGAN [RN307]; and 4) Deep-Cascade [RN306] with a cascade of five CNNs and five DC blocks, the configuration for which the authors reported their best results.
We used the Keras application program interface [chollet2015keras] with TensorFlow [tensorflow2015-whitepaper] as the backend to implement our hybrid network and the UNET. For RefineGAN, DAGAN and Deep-Cascade, we used the source code provided by the authors. All networks were trained on our data for acceleration factors of 4 and 5 using two-dimensional Gaussian undersampling patterns. The networks were trained and tested on Amazon Elastic Compute Cloud using a p3.2xlarge instance (https://aws.amazon.com/ec2/instance-types/p3/), which has an NVIDIA Tesla V100 GPU.
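The exact sampling-pattern parameters are not restated here; as a sketch, a centre-weighted two-dimensional Gaussian undersampling mask can be generated as follows (the `keep_fraction` and `sigma_frac` values are assumptions, not the parameters used in the experiments):

```python
import numpy as np

# Centre-weighted 2D Gaussian undersampling mask; keep_fraction and sigma_frac
# are illustrative assumptions, not the parameters used in the experiments.
def gaussian_mask(h, w, keep_fraction=0.25, sigma_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy - h / 2) ** 2 + (xx - w / 2) ** 2
    prob = np.exp(-d2 / (2.0 * (sigma_frac * min(h, w)) ** 2))
    prob *= keep_fraction * h * w / prob.sum()   # scale to the target rate
    return rng.random((h, w)) < np.clip(prob, 0.0, 1.0)

mask = gaussian_mask(128, 128, keep_fraction=0.25)   # ~4x acceleration
k_under = np.fft.ifftshift(mask) * np.fft.fft2(np.random.rand(128, 128))
```

`np.fft.ifftshift` moves the mask's densely sampled centre onto the DC component of NumPy's FFT layout; retrospective undersampling is then just the element-wise product with the mask.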
III-B Training, Validation and Testing Dataset
Our dataset consists of 45 volumetric T1-weighted, fully sampled k-space datasets acquired on a clinical MR scanner (Discovery MR750; General Electric (GE) Healthcare, Waukesha, WI) that were collected as part of the ongoing Calgary Normative Study [tsang2017white]. The data were acquired with a multi-channel imaging coil, and the acquired k-space data were zero-filled to the final image matrix size. The multi-coil k-space data were reconstructed using vendor-supplied tools (Orchestra Toolbox; GE Healthcare). Coil sensitivity maps were normalized to produce a single complex-valued image set that could be back-transformed to regenerate complex k-space samples. In our experiments, we performed retrospective undersampling, effectively simulating a single-coil acquisition scenario. Our train/validation/test split was 25/10/10 subjects, equivalent to 4,524/1,700/1,700 slices. There was no overlap of slices from the same subject across the training, validation, and test sets.
III-C Performance Metrics
The performance metrics used to assess the networks were:

Normalized root mean squared error (NRMSE):

$$\mathrm{NRMSE}(x, \hat{x}) = \frac{\sqrt{\frac{1}{M} \sum_{j=1}^{M} (\hat{x}_j - x_j)^2}}{x_{\max} - x_{\min}},$$

where $M$ is the number of pixels (or voxels) in the image and $x_{\max}$ and $x_{\min}$ are the largest and smallest intensities in the reference image $x$.
Structural similarity (SSIM) [RN316]:

$$\mathrm{SSIM}(x, \hat{x}) = \frac{(2 \mu_x \mu_{\hat{x}} + c_1)(2 \sigma_{x\hat{x}} + c_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2)},$$

where $c_1$ and $c_2$ are two variables used to stabilize the division, $\mu_x$ and $\mu_{\hat{x}}$ are the mean gray-level intensities of the two images, $\sigma_x^2$ and $\sigma_{\hat{x}}^2$ are their variances, and $\sigma_{x\hat{x}}$ is their covariance.
Peak signal-to-noise ratio (PSNR):

$$\mathrm{PSNR}(x, \hat{x}) = 20 \log_{10} \left( \frac{x_{\max}}{\sqrt{\frac{1}{M} \sum_{j=1}^{M} (\hat{x}_j - x_j)^2}} \right).$$
These metrics were chosen because they are commonly used to assess image reconstruction quality. Higher SSIM and PSNR values indicate better results; for NRMSE, smaller values represent better reconstructions.
III-D Volumetric Assessment
For the top two performing reconstruction techniques, we performed volumetric analysis with commonly used neuroimaging analysis software (FreeSurfer [fischl2012freesurfer]). We considered the fully sampled reconstruction results as our gold standard. Only the ten test volumes were analyzed. We recorded the number of processing failures and analyzed the average absolute deviation of the total intra-cranial, white-matter, gray-matter, hippocampus, and amygdala volumes.
III-E Statistical Methods
We report the mean and standard deviation of all average measures. We used a one-way analysis of variance (ANOVA) to determine statistically significant differences, and post-hoc paired t-tests to determine statistically significant pair-wise differences. Statistical significance was determined from the resulting p-values.
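This procedure can be sketched with SciPy on synthetic per-subject scores (the data below are simulated for illustration, not the study's measurements):

```python
import numpy as np
from scipy import stats

# Synthetic per-subject SSIM-like scores for three hypothetical methods.
rng = np.random.default_rng(0)
scores = {
    "method_a": rng.normal(0.90, 0.01, 10),
    "method_b": rng.normal(0.92, 0.01, 10),
    "method_c": rng.normal(0.95, 0.01, 10),
}

# One-way ANOVA across all methods, then a post-hoc paired t-test between two.
_, p_anova = stats.f_oneway(*scores.values())
_, p_pair = stats.ttest_rel(scores["method_b"], scores["method_c"])
print(p_anova < 0.05, p_pair < 0.05)
```

`ttest_rel` is the paired test, appropriate here because each method reconstructs the same set of subjects.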
IV Results

IV-A Quantitative Assessment
The reconstruction metrics are summarized in Table I. Deep-Cascade achieved the best quantitative results, followed by our Hybrid approach, across all performance metrics. The one-way ANOVA tests showed statistically significant differences across all metrics and acceleration factors. The paired t-tests showed that Deep-Cascade was statistically better than the other methods in the comparison, and that our Hybrid method, while worse than Deep-Cascade, was statistically better than UNET, DAGAN, and RefineGAN. We note that the absolute differences between the Deep-Cascade and Hybrid networks, while statistically significant, were small.
The average NRMSE across the slices of the ten test volumes is depicted in Figure 3. Over the central slices, NRMSE followed a predictable pattern; towards the edges, where little of the brain was present, the error increased. Sample reconstructions for each technique are shown in Figure 4. Qualitatively, the Hybrid network produces the most pleasing images, particularly in regions with large signal differences such as the cerebellum.
IV-B Volumetric Analysis
Volumetric analysis results are summarized in Table II. An example of a subject where processing failed for the fully sampled reconstruction and for the Deep-Cascade network (at both acceleration factors) is presented in Figure 5. Box-plots of the estimated total intra-cranial, white-matter, gray-matter, hippocampus, and amygdala volumes are shown in Figure 6. These plots represent the distributions over only eight subjects in the test set, because volumetric analysis failed to process two of the fully sampled image sets. Due to the reduced sample size and the correspondingly reduced statistical power, we did not perform statistical tests on the volumetric data.
Table I. Reconstruction performance (SSIM, NRMSE, and PSNR) for each model and acceleration factor.
Table II. Number of volumetric-analysis processing failures for each model and acceleration factor; the fully sampled reconstruction failed for 2 of 10 subjects (20%).
Our hybrid method operates in the frequency domain using a residual U-net and in the image domain using a U-net. The two networks are connected through the magnitude of the iDFT operation, and the model is trained fully end-to-end. The input, intermediate, and output results of our model for one subject are depicted in Figure 7. Improvement in image quality and NRMSE is noticeable at the output of the frequency domain network, and further improvement can be seen at the output of the image domain network.
Our Hybrid model achieved the second best metrics in the quantitative assessment, behind only Deep-Cascade. RefineGAN was the third best method. It is important to mention that in the original RefineGAN paper the authors did not use a test set; they reported results on a validation set. The poorest performing technique was DAGAN, though its NRMSE and SSIM values were still reasonable. It is also important to highlight that in the original DAGAN paper, the authors did not use raw MR k-space data in their experiments. They computed the FFT of magnitude images followed by retrospective undersampling. This is not a realistic scenario, because applying the FFT operator to magnitude images yields k-space data with Hermitian symmetry, while raw k-space data are not Hermitian.
Visual assessment of the reconstructions (Figure 4) indicates that Hybrid and Deep-Cascade are the best at preserving fine details, as is most noticeable in the cerebellum region. With the Hybrid reconstruction, edges appear sharper, which may explain why the volumetric analysis software was able to successfully process all ten Hybrid image data sets, while it failed twice when processing the fully sampled reconstructions and once with Deep-Cascade at each acceleration factor. The quality of the brain structure segmentations varies according to the reconstruction; this is especially noticeable in the brain extraction step (Figure 5).
The volumes estimated from the fully sampled reconstructions were used as the gold-standard (reference) measures for the eight successfully processed subjects. Good agreement was found between the reference and the volumetric results for the Deep-Cascade and Hybrid image sets: the average absolute volume deviations for total intra-cranial volume, cerebral white-matter, and gray-matter were small in all subjects, as were the average differences for the hippocampus and amygdala volumes. These results suggest the feasibility of incorporating these acceleration factors in a clinical or research imaging setting.
We did not perform a systematic comparison of processing times, because the different methods were implemented in different deep learning frameworks and many of the networks were not optimized. Nevertheless, all five techniques assessed take only a few seconds to reconstruct an entire MR volume using a cloud-based GPU (Amazon Elastic Compute Cloud). In terms of training times, UNET, DAGAN, Deep-Cascade, and Hybrid each trained within hours, while RefineGAN took considerably longer to train.
In this work, we proposed a hybrid frequency-domain/image-domain CS MR reconstruction method that leverages the mathematical formulation of the iDFT, reducing the number of model parameters by orders of magnitude compared to [RN289]. Our method was the second best in the quantitative comparison, and it was the only one that did not fail in the volumetric analysis processing pipeline. Our Hybrid model also produced the most visually pleasing images (Figure 4).
As future work, we would like to investigate the use of DC layers in our architecture and to add an adversarial component to potentially improve reconstruction of the high-frequency content of the image. We will also extend our model to a more generic framework that can potentially deal with the parallel imaging scenario (cf., [sense, grappa]).
The authors would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC) for operating support, and Amazon Web Services for access to cloud-based GPU services. We would also like to thank Dr. Louis Lauzon for setting up the script to save the raw MR data at the Seaman Family MR Centre. R.S. was supported by an NSERC CREATE I3T Award and currently holds the T. Chen Fong Scholarship in Medical Imaging from the University of Calgary. R.F. holds the Hopewell Professorship of Brain Imaging at the University of Calgary.