A Hybrid Frequencydomain/Imagedomain Deep Network for Magnetic Resonance Image Reconstruction
Abstract
Decreasing magnetic resonance (MR) image acquisition times can potentially reduce procedural cost and make MR examinations more accessible. Compressed sensing (CS)based image reconstruction methods, for example, decrease MR acquisition time by reconstructing highquality images from data that were originally sampled at rates inferior to the NyquistShannon sampling theorem. Iterative algorithms with data regularization are the standard approach to solving illposed, CS inverse problems. These solutions are usually slow, therefore, preventing nearreal time image reconstruction. Recently, deeplearning methods have been used to solve the CS MR reconstruction problem. These proposed methods have the advantage of being able to quickly reconstruct images in a single pass using an appropriately trained network. Some recent studies have demonstrated that the quality of their reconstruction equals and sometimes even surpasses the quality of the conventional iterative approaches. A variety of different network architectures (e.g., Unets and Residual Unets) have been proposed to tackle the CS reconstruction problem. A drawback of these architectures is that they typically only work on image domain data. For undersampled data, the images computed by applying the inverse Fast Fourier Transform (iFFT) are aliased. In this work we propose a hybrid architecture that works both in the kspace (or frequencydomain) and the image (or spatial) domains. Our network is composed of a complexvalued residual Unet in the kspace domain, an iFFT operation, and a realvalued Unet in the image domain. Our experiments demonstrated, using MR raw kspace data, that the proposed hybrid approach can potentially improve CS reconstruction compared to deeplearning networks that operate only in the image domain. In this study we compare our method with four previously published deep neural networks and examine their ability to reconstruct images that are subsequently used to generate regional volume estimates. We evaluated undersampling ratios of 75% and 80%. Our technique was ranked second in the quantitative analysis, but qualitative analysis indicated that our reconstruction performed the best in hard to reconstruct regions, such as the cerebellum. All images reconstructed with our method were successfully postprocessed, and showed good volumetry agreement compared with the fully sampled reconstruction measures.
I Introduction
Magnetic resonance (MR) is a key medical imaging modality that has critical roles in both patient care and medical research. MR scanner installations, however, are expensive. Another major limitation to MR imaging is the comparatively long image acquisition times, especially when compared to other modalities like computerized tomography. Lengthy acquisition times make MR less patient friendly and increase the per patient examination cost. MRbased compressed sensing (CS) methods seek to leverage the implicit sparsity of medical images [RN280], potentially allowing for significant kspace undersampling during acquisition, and by consequence, reducing examination times.
Traditional MR CS reconstruction techniques are iterative algorithms that usually require a sparsifying transform that when combined with regularization parameters are able to find a solution for these illposed inverse problems [RN280, RN247]. The drawback of these iterative approaches, however, is that they are timeconsuming, making them more difficult to incorporate in a near realtime MR imaging scenario (i.e., where images are reconstructed and displayed on the scanner during the procedure).
Deep learning [RN250] is a new method that has been applied to reconstruction challenges. It has the advantage of being able to rapidly reconstruct images in a singlepass using a suitably trained network. Some deeplearning based reconstruction methods have arguably surpassed traditional iterative CS reconstruction techniques [RN307].
A few different deep learning approaches have been recently proposed to tackle the CS reconstruction problem. Jin et al. [RN254] proposed to use a Unet [RN196]. Lee et al. experimentally showed that residual Unets can potentially improve image reconstruction. Residual blocks have subsequently been incorporated in the majority of the latest studies (cf., [RN307, RN305, RN306]).
Yang et al. [RN307] proposed a deep dealiasing generative adversarial network (DAGAN) that uses a residual architecture as the network generator responsible for reconstructing the images associated to a loss function that has four components: an image domain loss, a frequency domain loss, a perceptual loss, and an adversarial loss. Quan et al. [RN305] proposed a generative adversarial network (GAN) with a cyclic loss [RN315]. Their method consists of a cascade of a reconstruction network followed by a refinement network with a cyclic loss component that tries to enforce that the model is bijective. Schlemper et al. [RN306] proposed a deep cascade of convolutional neural networks (CNNs) that has data consistency (DC) blocks between consecutive subnetworks in the cascade. Their hypothesis is that the DC blocks reduce the issue of overfitting, therefore allowing the training of deeper models.
The aforementioned techniques principally work in the image domain, with a few exceptions, where kspace domain information is used in the loss function and/or to implement DC layers. All of these networks [RN307, RN305, RN306] receive as input the undersampled kspace zerofilled reconstruction, and output an unaliased image.
Zhu et al. [RN289] recently proposed a method that tries to learn the domain transform. Their method first processes the undersampled input data in kspace, learns the inverse discrete Fourier transform (iDFT), and then refines the result in the image domain. In the case of CS reconstruction, the domain transform can be considered as learning an approximation for the iDFT for undersampled kspace data. The domain transform is modeled as a sequence of connected layers, and the image domain refinement is modeled as a series of convolutional layers. A disadvantage of this approach is that the domain transform has a quadratic complexity with respect to the size of the input image. For example, when dealing with images, the number of learned parameters in their model would be greater . The quadratic order of the algorithm makes it challenging to use their model for typical MR image sizes due to current hardware limitations.
Based on these studies, we hypothesize that a hybrid approach that works with both information as presented in kspace domain and image domain (such as the proposal of [RN289]) can improve MR CS reconstruction. In this work, we propose a model that consists of a cascade of a kspace domain network and an image domain network connected by the magnitude of the iDFT operation. Our model does not need to learn the domain transform, which essentially reduces our model parameter complexity to . For a input image, the number of learned parameters is reduced by a factor of for our method compared to [RN289] ( parameters). Our proposed method takes advantage of information as presented in kspace and image domain, as opposed to other image domain only approaches [RN250, RN307, RN254, RN305, RN306]. KSpace and image domain information are equivalent, because they are related by a global linear transformation. Operating in both domains with nonlinear methods, however, can potentially be advantageous and improve the network learning.
In this work, we compare our proposal against four recently published, publicly available, deep learningbased reconstruction methods [RN307, RN254, RN305, RN306]. These approaches were evaluated at and undersampling levels (corresponding to acquisition acceleration factors of and ). The experiments were done using MR rawdata acquired from subjects scanned using a volumetric weighted MR imaging sequence. The proposed reconstruction code has been made publicly available at https://github.com/rmsouza01/HybridCSModelMRI. The dataset will be made publicly available for benchmarking purposes at https://sites.google.com/view/calgarycampinasdataset/home and is part of the CalgaryCampinas datasets [RN136].
Ii Hybrid Network
The flowchart of our proposed method is depicted in Figure 1. There are three main components to our approach: 1) a kspace (or frequencydomain) network that essentially tries to fill missing kspace samples 2) the magnitude of the iDFT, and 3) an imagedomain network that acts as an antialiasing filter. These components along with the proposed network loss function are described in the following subsections.
Iia Frequencydomain Network
The frequencydomain network, , attempts to recover a fully sampled kspace, , given undersampled kspace data, . This can be mathematically formalized as:
(1) 
where is the normalized undersampled kspace data given by:
(2) 
where and are the average and standarddeviation of undersampled kspaces in the training set. The specific architecture used for the frequencydomain network is a residual Unet (Figure 2, left side). The input complex kspace image is split in twochannels: one for the real and other for the imaginary components of the kspace data.
IiB Magnitude of the iDFT
Before applying the iDFT, we have to undo the previous kspace normalization step. Adding a constant to the kspace data results in superposition of an impulse, , signal to the image after transformation. Undoing the normalization is accomplished by:
(3) 
The next step is to transform from the frequency domain to the image domain using the iDFT and magnitude operations:
(4) 
where represents the iDFT operation and is the initial estimate of the reconstructed image.
This component of our model has no trainable parameters, and the iDFT is efficiently computed using the fast Fourier transform algorithm running on a graphics processing unit (GPU).
IiC Image Domain Network
The last component of our method is the image domain network (). In order to improve training convergence of the network, we again normalize the initial estimate of the reconstructed image obtained in the previous step:
(5) 
where and are the mean and standarddeviation of the reconstructed images in the training set. The normalized image is fed as input to the image domain network:
(6) 
The final reconstructed image is . The architecture used for the image domain network is a Unet (Figure 2, right side).
IiD Loss Function
Our loss function was a weighted sum of normalized root mean squared errors (NRMSE) in each domain given by:
(7) 
where and are the reference fully sampled kspace and image reconstruction, respectively, of the th sample in the training set, and is the number of training samples. The first term of the loss function acts as a data fidelity term, i.e., a regularizer, and the second term represents the reconstruction error. In our experiments and . These values were empirically determined.
Iii Experimental Setup
Iiia Network Implementations
We compared our method, which we will refer to as the Hybrid method, against 1) a plain vanilla Unet [RN196] with a residual connection, referred to as UNET; 2) RefineGAN [RN305]; 3) DAGAN [RN307]; and 4) DeepCascade [RN306] with a cascade of five CNNs and five DC blocks, which is the network configuration that the authors reported best results.
We used the Keras application program interface [chollet2015keras] using TensorFlow as backend [tensorflow2015whitepaper] to implement our hybrid network and the UNET. For RefineGAN, DAGAN and DeepCascade, we used the source code provided by the authors. All networks were trained using our data for acceleration factors of and using twodimensional Gaussian undersampling patterns. The networks were trained and tested on Amazon Elastic Compute Cloud services using a p3.2xlarge^{1}^{1}1https://aws.amazon.com/ec2/instancetypes/p3/ instance, which has a NVIDIA Tesla V100 GPU.
IiiB Training, Validation and Testing Dataset
Our dataset consists of 45 volumetric weighted, fully sampled kspace datasets acquired on a clinical MR scanner (Discovery MR750; General Electric (GE) Healthcare, Waukesha, WI) that were collected as part of the ongoing Calgary Normative Study [tsang2017white]. The data was acquired with a channel imaging coil and an acquisition matrix of size . Data were zerofilled to an image matrix size of . The multicoil kspace data was reconstructed using vendor supplied tools (Orchestra Toolbox; GE Healthcare). Coil sensitivity maps were normalized to produce a single complexvalued image set that could be backtransformed to regenerate complex kspace samples. In our experiments, we performed retrospective undersampling, effectively simulating a singlecoil acquisition scenario. Our train/validation/test data split was 25/10/10, equivalent to 4,524 slices/1,700 slices/1,700 slices. There was no overlap of same subject slices in the train, validations and test sets.
IiiC Performance Metrics
The performance metrics used to assess the networks were:

NRMSE
(8) where is the number of pixels (or voxels) in the image.

Structural Similarity (SSIM) [RN316]
(9) where and are two variables used to stabilize the division and and represent mean and variance values of the graylevel intensities of the images.

Peak Signal to Noise Ratio (PSNR):
(10)
These metrics were used because they are commonly used to assess image reconstruction. The higher the SSIM and PSNR values the better the result. For NRMSE, smaller values represent better reconstructions.
IiiD Volumetric Assessment
For the top two performing reconstruction techniques, we performed volumetric analysis with a commonly used software for neuroimaging analysis (FreeSurfer [fischl2012freesurfer]). We considered the fully sampled reconstruction results as our goldstandard. Only the ten test volumes were analyzed. We recorded number of processing failures and analyzed the average absolute deviation of total intracranial, whitematter, graymatter, hippocampus, and amygdala volumes.
IiiE Statistical Methods
We report mean and standard deviation of all average measures. We used a oneway analysis of variance (ANOVA) to determine statistically significant changes and posthoc paired ttests to determine statistically significant pairwise differences. A value was used to determine statistical significance.
Iv Results
Iva Quantitative Assessment
The reconstruction metrics are summarized in Table I. DeepCascade achieved the best quantitative results followed by our Hybrid approach across all performance metrics. The oneway ANOVA tests showed statistically significant differences () across all metrics and acceleration factors. The paired ttests showed that DeepCascade was statistically better () than the other methods in the comparison and our Hybrid, while worse than DeepCascade, was statistically better than UNET, DAGAN, and RefineGAN (). We note that the absolute differences between DeepCascade and Hybrid networks, while statistically significant, were small.
The average NRMSE across the slices in the ten test volumes is depicted in Figure 3. Over the central slices NMRSE followed a predictable pattern. Towards the edges, where little of the brain was present, the error increased. Sample reconstructions for each technique are shown in Figure 4. Qualitatively the Hybrid network produces the most pleasing image, particularly in regions with large signal differences like the cerebellum.
IvB Volumetric Analysis
Volumetric analysis results are summarized in Table II. An example of a subject where processing failed for the fully sampled reconstruction and the DeepCascade network (for both and acceleration factors) is presented in Figure 5. Boxplots of the estimated volumes of total intracranial volume, whitematter, graymatter, hippocampus, and amygdala volumes are shown in Figure 6. These plots represent the distribution of eight volumes from only eight subjects in the test set (because volumetric analysis failed to process two fully sampled image sets). Due to the reduced sample size () and because of reduced statistical power, we did not perform statistical tests on the volumetric data.
Acceleration factor  Model  SSIM  NRMSE ()  PSNR 

UNET  
DAGAN  
RefineGAN  
DeepCascade  
Hybrid  
UNET  
DAGAN  
RefineGAN  
DeepCascade  
Hybrid 
Model  Acceleration factor  # failures (%) 

Fully sampled  2 (20%)  
DEEPCASCADE  2 (20%)  
1 (10%)  
Hybrid  0 (0%)  
0 (0%) 
V Discussion
Our hybrid method operates in frequency domain using a residual Unet and image domain using a Unet. The two networks are connected through the magnitude of the iDFT operation, and the model is fully trained endtoend. An example for one subject of the input, intermediary and output results of our model are depicted in Figure 7. Improvement in the image quality and NMRSE is noticeable at the output of the frequency domain network (), and further improvement can be seen at the output of the image domain network (). NMRSE for the input image was .
Our Hybrid model achieved the second best metrics in the quantitative assessment, losing only to DeepCascade. RefineGAN was the third best method. It is important to mention that in the original RefineGAN paper the authors did not use a test set. They reported results on a validation set. The poorest performing technique was DAGAN, though it still had NMRSE and SSIM . It is also important to highlight that in the original DAGAN paper, the authors did not use MR raw kspace data in their experiments. They computed the FFT of magnitude images followed by retrospective undersampling. This is not a realistic scenario, because when applying the FFT operator to magnitude images, the output is a kspace with Hermitian symmetry, while raw kspace is not Hermitian.
Visual assessment of the reconstructions (Figure 4) indicate that Hybrid and DeepCascade reconstruction are the best at preserving fine details as can be more noticeably seen in the cerebellum region. With the Hybrid reconstruction edges seem sharper, which might be an explanation to the fact that volumetric analysis software was able to successfully process all ten image dates sets, while it failed twice when processing the fully sampled reconstruction, and once with DeepCascade (acceleration factor of ). The volumetric analysis also failed to process another subject reconstructed with a DeepCascade reconstruction (acceleration factor of ). The quality of the brain structure segmentations vary according to the reconstruction. This is specially noticeable in the brain extraction step (Figure 5).
The deviation of the volumes estimated from the fully sampled reconstruction measurements were used as goldstandard (reference) measure for the eight successful subjects. Good agreement was found between the reference and volumetric results for DeepCascade and Hybrid image sets. For total intracranial volume the average absolute volume deviation was in all subjects. For cerebral whitematter the absolute volume deviation was in all subjects, and for graymatter, it was in all subjects. The average differences for the hippocampus and amygdala volumes was between and . These results are a suggestive of the feasibility of incorporating a acceleration factor in a clinical or research imaging setting.
In terms of processing time, we did not perform a systematic comparison due to the fact that the different methods were implemented in different deep learning frameworks and many of the networks were not optimized. Nevertheless, all five techniques assessed take only a few seconds to reconstruct an entire MR volume using a cloudbased GPU (Amazon Elastic Compute Cloud). In terms of processing times, UNET, DAGAN, DeepCascade and Hybrid took hours to train, while RefineGAN training time was hours.
Vi Conclusion
In this work, we proposed a hybrid frequency domain/image domain CS MR reconstruction method that leverages the information of the iDFT mathematical formulation, essentially reducing the number of parameters of our model by orders of magnitude (for a image, the Hybrid model required fewer coefficients compared to [RN289]). Our model method was the second best in the quantitative comparison and it was the only one that did not fail in the volumetric analysis processing pipeline. Also, our Hybrid model produced the most visually pleasing images (Figure 4).
As future work, we would like to investigate the use of DC layers in our architecture and add an adversarial component for potentially improving reconstruction of highfrequency contents of the image. We will also extend our model to a more generic framework that can potentially deal with the parallel imaging scenario (cf., [sense, grappa]).
Acknowledgment
The authors would like to thank the Natural Science and Engineering Council of Canada (NSERC) for operating support, and Amazon Web Services for access to cloudbased GPU services. We would also like to thank Dr. Louis Lauzon for setting up the script to save the raw MR data at the Seaman Family MR Centre. R.S. was supported by an NSERC CREATE I3T Award and currently holds the T. Chen Fong Scholarship in Medical Imaging from the University of Calgary. R.F. holds the Hopewell Professor of Brain Imaging at the University of Calgary.