A Deep Information Sharing Network for Multicontrast Compressed Sensing MRI Reconstruction
Abstract
In multicontrast magnetic resonance imaging (MRI), compressed sensing theory can accelerate imaging by sampling fewer measurements within each contrast. Conventional optimization-based models suffer from several limitations: a strict assumption of shared sparse support, time-consuming optimization, and “shallow” models that have difficulty encoding the rich patterns hidden in massive MRI data. In this paper, we propose the first deep learning model for multicontrast MRI reconstruction. We achieve information sharing through feature sharing units, which significantly reduces the number of parameters. The feature sharing unit is combined with a data fidelity unit to form an inference block. These inference blocks are cascaded with dense connections, which allows information to be transmitted efficiently across different depths of the network. Our extensive experiments on various multicontrast MRI datasets show that the proposed model outperforms both state-of-the-art single-contrast and multicontrast MRI methods in accuracy and efficiency. We show that the improved reconstruction quality can bring great benefits to the later medical image analysis stage. Furthermore, the robustness of the proposed model to the nonregistration environment shows its potential in real MRI applications.
I Introduction
Magnetic resonance imaging has been widely used to generate anatomically precise images of in vivo tissue. A major limitation of MRI is its relatively slow data acquisition speed. Compressed sensing (CS) has therefore been used to accelerate MRI by reducing the number of k-space (i.e., Fourier) measurements directly acquired by the machine [5]. CS theory shows how accurate or even perfect reconstruction can be achieved via appropriate optimization to fill in the missing Fourier coefficients of k-space [2]. Recently, compressed sensing MRI has been approved by the FDA for two main MRI vendors, GE and Siemens [6]. Hence more MRI scans are expected to be produced using compressed sensing methods in the clinic, where maintaining high reconstruction quality at rapid imaging speed is important for the performance of the later analysis stage and for patient comfort. Compressed sensing for magnetic resonance imaging (CSMRI) is also an active research topic in medical imaging, and is now one of the classic inverse imaging problems in the field of computer vision.
Similar to other tasks in image restoration and reconstruction, research on CSMRI has been driven by proposing effective optimization models for MRI reconstruction. For example, MRI has been modeled with sparsity constraints in fixed transform bases, e.g., SparseMRI [19], TVCMRI [20], RecPF [33] and FCSA [11, 10]. Because models with nonadaptive transform bases have limited representation ability, some work has been devoted to exploiting the geometric information within image patches, such as PBDW [22], PANO [23], FDLCP [34] and GBRWT [15]. Ravishankar and Bresler [24] and Huang et al. [12] also introduced dictionary learning into CSMRI.
As [15, 24] show, models with adaptive transform bases achieve higher reconstruction quality, but at the expense of a heavy computational burden. Furthermore, conventional optimization-based CSMRI methods are implemented in situ, meaning they do not rely on information from MRI training data. The first issue is a clear drawback. The second may have positive aspects, but deep learning has shown a clear advantage in exploiting big data resources with a deep model.
Thus, deep neural networks have recently been introduced to CSMRI. For example, Wang et al. [30] use a vanilla CNN model to learn the mapping from zero-filled MRI to fully-sampled MRI via a massive MRI training set. (Note that the term “zero-filled MRI” means the missing Fourier coefficients are replaced by zeros, followed by an inverse 2D FFT.) Sun et al. [28] proposed ADMMNET as a modification of the alternating direction method of multipliers (ADMM) algorithm in which the parameters are inferred via backpropagation. Lee et al. [16] proposed a modified UNet to learn the mapping in the residual domain. Notably, Schlemper et al. [27, 26] proposed a deep cascade convolutional neural network (DCCNN) that unrolls the standard paradigm of CSMRI into a deep learning architecture. The DCCNN represents state-of-the-art performance in single-contrast CSMRI in both imaging quality and speed.
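The zero-filled reconstruction described above can be sketched in a few lines of NumPy. This is only an illustration: the function name, the toy 8×8 random image, and the every-other-row mask are assumptions made here, and the mask is taken to be in NumPy's unshifted FFT layout.

```python
import numpy as np

def zero_filled_recon(image, mask):
    """Undersample the k-space of `image` with a binary `mask`, then invert."""
    kspace = np.fft.fft2(image, norm="ortho")      # fully sampled k-space
    kspace_zf = kspace * mask                      # unsampled coefficients become zero
    return np.fft.ifft2(kspace_zf, norm="ortho")   # zero-filled image (complex-valued)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))                # stand-in for an MR slice

full_mask = np.ones((8, 8))
recon_full = zero_filled_recon(image, full_mask)   # full sampling recovers the image

half_mask = np.zeros((8, 8))
half_mask[::2, :] = 1                              # keep every other k-space row
recon_half = zero_filled_recon(image, half_mask)   # aliased zero-filled image
```

With the full mask the output matches the input up to floating-point error, while the undersampled mask yields the aliased image that a reconstruction network would take as input.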
The work mentioned above is based on single-contrast CSMRI reconstruction. Usually, an MRI scan can obtain images of the same anatomical section under different contrasts, such as T1, T2, and proton-density (PD) weighted MRI, by applying different MRI protocols [18]. Multicontrast MRI contains similar but not identical image structures. By comparing multiple contrast MRI of the same region, radiologists can detect subtle abnormalities such as a developing tumor. This is illustrated in Figure 1(a), 1(b) and 1(c), where the PD, T1 and T2 MRI in the SRI24 [25] datasets exhibit similar structures. In the second row we show the root of the sum of squares of the horizontal and vertical gradients of the multicontrast MR images. Rather than reconstructing each contrast independently, joint reconstruction can provide higher quality images by exploiting such structural similarity.
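The gradient maps mentioned above (the root of the sum of squares of the horizontal and vertical gradients) can be computed as in the following sketch. The forward-difference scheme and the replicated last row and column are assumptions made here; the source does not state which difference operator was used for the figure.

```python
import numpy as np

def gradient_magnitude(image):
    """Root of the sum of squared horizontal and vertical forward differences."""
    # Replicating the last column/row makes the boundary difference zero.
    dx = np.diff(image, axis=1, append=image[:, -1:])   # horizontal gradient
    dy = np.diff(image, axis=0, append=image[-1:, :])   # vertical gradient
    return np.sqrt(np.abs(dx) ** 2 + np.abs(dy) ** 2)

# Toy image with a single vertical edge between columns 1 and 2.
edge = np.zeros((4, 4))
edge[:, 2:] = 1.0
edge_map = gradient_magnitude(edge)
```

Applied to each contrast separately, such maps make it easy to see that edges tend to occur at the same positions even when the intensities differ.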
In this paper, we propose the first deep models for multicontrast CSMRI reconstruction. We start with two basic networks called the deep independent reconstruction network (DIRN) and the deep feature sharing network (DFSN). DIRN uses separate parallel networks to reconstruct each contrast of the MRI, with each network a state-of-the-art DCCNN architecture [27]. DFSN takes the further step of applying a feature sharing strategy that significantly reduces the number of network parameters. Our final deep model, which improves on the results of DFSN, uses a dense connection strategy to transfer information across layers in the network. We call this end-to-end model a deep information sharing network (DISN) for multicontrast CSMRI inversion. DISN comprises cascaded and densely connected inference blocks consisting of feature sharing units and data fidelity units. In the feature sharing units, all multicontrast MRI share the same feature maps. We use dense connections to help share information across different depths.
Our contributions can be summarized as follows:

In the proposed basic DFSN model, the feature sharing unit fully exploits the similarity among the multicontrast MRI. Comparative experiments show that the DFSN model outperforms the DIRN model, even though the independent parallel networks of DIRN use several times as many parameters.

In the proposed DISN model, dense connections propagate information from lower blocks to deeper blocks directly. The number of parameters increases only linearly, rather than quadratically as in the regular DenseNet [7]. Even with far fewer network parameters, the dense connection strategy still shows advantages.

Experiments on various multicontrast MRI datasets show that the proposed DISN model achieves state-of-the-art performance compared with both single-contrast and multicontrast MRI methods in imaging quality and speed. We also show that the improved reconstruction quality can significantly benefit the later analysis stage.

Because of its large model capacity, the DISN model is robust to nonregistration, which is the usual situation in real MRI applications.
The rest of this paper is organized as follows: Section II reviews related work in the field of multicontrast MRI reconstruction. Section III elaborates on the basic DIRN and DFSN models and the proposed DISN model. Section IV compares the different deep models and reports the experimental results on various multicontrast MRI datasets including SRI24 [25], MRBrainS13 [21] and NeoBrainS12 [13]. Section V discusses convergence, network size, testing running time, and the nonregistration environment. Finally, in Section VI we draw conclusions.
II Related Works: Compressed Sensing for Multicontrast MRI Reconstruction
Previous work has exploited the structural correlations in multicontrast MRI using non-deep approaches. Suppose we aim to reconstruct multicontrast MRI images, for example when PD, T1 and T2 MRI are used. One can formulate this problem as
(1) $\hat{X} = \arg\min_{X} \sum_{i=1}^{L} \| F_u x_i - y_i \|_2^2 + R(X)$
where $x_i$ denotes the $i$-th contrast of the complex-valued MR image to be reconstructed, $L$ is the number of contrasts, and $X$ indicates the set of all $x_i$. $F_u$ denotes the undersampled Fourier matrix and $y_i$ denotes the k-space measurements of the $i$-th contrast. Note that in the field of multicontrast MRI, it is common to undersample all the multicontrast MRI data using different undersampling masks with the same undersampling ratio. The first term is called the data fidelity and ensures consistency between the reconstructed image and the measurements. $R(X)$ encodes a regularization for the MRI contrast images.
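To illustrate the sampling convention noted above (a different mask per contrast, but the same undersampling ratio), here is a minimal sketch of a 1D Cartesian mask generator. The function name, the `center_fraction` parameter, the 64×64 size, and the 25% ratio are all illustrative assumptions, and the mask is written in a centered k-space layout.

```python
import numpy as np

def cartesian_mask_1d(shape, ratio, rng, center_fraction=0.25):
    """Binary mask keeping whole k-space columns; centered (fftshift) layout."""
    rows, cols = shape
    n_keep = max(1, int(round(ratio * cols)))               # columns to sample
    n_center = min(n_keep, max(1, int(round(center_fraction * n_keep))))
    start = (cols - n_center) // 2
    center = np.arange(start, start + n_center)             # always keep low frequencies
    others = np.setdiff1d(np.arange(cols), center)
    extra = rng.choice(others, size=n_keep - n_center, replace=False)
    mask = np.zeros(shape)
    mask[:, np.concatenate([center, extra])] = 1.0
    return mask

rng = np.random.default_rng(0)
# One mask per contrast (e.g. PD, T1, T2): different patterns, same ratio.
masks = [cartesian_mask_1d((64, 64), 0.25, rng) for _ in range(3)]
ratios = [m.mean() for m in masks]
```

Each mask samples exactly a quarter of the columns, so the contrasts are accelerated equally while the sampled locations differ.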
Two notable approaches to multicontrast CSMRI with which we compare are Bayesian Compressed Sensing by Bilgic et al. [1] and FCSAMT by Huang et al. [8].
II-A Bayesian Compressed Sensing
Bilgic et al. base their approach on a modification of Bayesian compressed sensing (BCS) [14] that exploits structural similarity across contrasts. To exploit this similarity, the authors cast the problem in the gradient domain: the vertical and horizontal gradients of the multicontrast MRI are given zero-mean Gaussian priors. The prior distributions for each contrast share the same precision, which is estimated in maximum likelihood fashion from the MRI of all contrasts. By conjugacy, the posteriors of the gradients are also Gaussian. Combining the inferred horizontal and vertical gradients with the k-space data fidelity, a least squares problem is solved to yield the reconstruction for each contrast.
We note that all the multicontrast MRI images contribute to the estimation of the precision of the gradients, and that this precision is shared. Instead of imposing a strict shared sparse support assumption on the multicontrast MRI, the BCS model controls the similarity by expressing uncertainty, while also allowing for the idiosyncrasies of each contrast. However, the BCS method suffers from several limitations: (1) The sparsity is imposed in the gradient domain, which in essence makes the model an improved variant of total variation regularization. (2) Each coefficient in the gradient domain is given a unimodal Gaussian distribution, which has difficulty capturing the diverse patterns in MRI images. (3) More importantly, the running time of the BCS algorithm is long, e.g., about 26 hours for processing a set of multicontrast MRI data. The running time of the initial algorithm is impractical in real scenarios, although the authors later accelerated the model at the expense of performance [3].
II-B Group Sparsity
Huang et al. extended the FCSA [11, 10] algorithm, originally designed for single-contrast MRI, to multicontrast MRI; the extension is called FCSAMT [8, 9]. The FCSAMT model is based on two key observations: 1) Across the multicontrast MRIs, the variance of the gradients should be similar at the same spatial positions. 2) The wavelet coefficients across the multicontrast MRIs should also have similar nonzero supports in the same anatomical sections. In the FCSAMT model, a least squares data fidelity term with joint total variation (JTV) regularization and group wavelet sparsity regularization is proposed as the loss function,
(2) $\min_{X} \frac{1}{2} \sum_{i=1}^{L} \| F_u x_i - y_i \|_2^2 + \alpha \| X \|_{\mathrm{JTV}} + \beta \| \Phi X \|_{2,1}$
where the vectorized multicontrast MRI images are arranged in columns to form the data matrix $X = [x_1, \dots, x_L]$, the joint total variation is defined as $\| X \|_{\mathrm{JTV}} = \sum_{n} \sqrt{\sum_{i=1}^{L} \left( |(D_h x_i)_n|^2 + |(D_v x_i)_n|^2 \right)}$, and the wavelet group sparsity is regularized in the form of the $\ell_{2,1}$ norm $\| \Phi X \|_{2,1} = \sum_{n} \sqrt{\sum_{i=1}^{L} |(\Phi x_i)_n|^2}$, where $D_h$ and $D_v$ stand for the horizontal and vertical difference operators, $\Phi$ stands for the orthogonal wavelet transform matrix, $n$ indexes spatial positions or wavelet coefficients, and $\alpha$ and $\beta$ are regularization weights. The FCSAMT model achieves a balance between model performance and computational efficiency. In this approach, structural correlations are modeled as group sparsity, which clearly outperforms the single-contrast FCSA counterpart on this problem. However, the model is limited by its fixed transform domains, e.g., finite differences and wavelets, and it is designed in situ, so no external data is used to provide further information. Moreover, the group sparsity assumption that the multicontrast MRI have similar sparse supports is too strict, especially in a nonregistration environment. Huang et al. later accelerated FCSAMT using fast conditioning [17].
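As a rough numerical illustration of the two group regularizers named above, the sketch below evaluates a joint total variation and an ℓ2,1 norm on toy arrays. It assumes forward differences with a replicated boundary and leaves out the wavelet transform itself; the function names and toy inputs are hypothetical and are not taken from the FCSAMT implementation.

```python
import numpy as np

def joint_total_variation(images):
    """JTV of a stack with shape (L, H, W): gradients are pooled across contrasts."""
    dx = np.diff(images, axis=2, append=images[:, :, -1:])   # horizontal differences
    dy = np.diff(images, axis=1, append=images[:, -1:, :])   # vertical differences
    pooled = (np.abs(dx) ** 2 + np.abs(dy) ** 2).sum(axis=0)  # sum over contrasts
    return np.sqrt(pooled).sum()                              # sum over pixels

def l21_norm(coeffs):
    """l2,1 norm of an (N, L) matrix: l2 across contrasts, l1 across positions."""
    return np.sqrt((np.abs(coeffs) ** 2).sum(axis=1)).sum()

# Two contrasts whose single edge sits at the same column ...
aligned = np.zeros((2, 4, 4))
aligned[:, :, 2:] = 1.0
# ... versus two contrasts whose edges sit at different columns.
misaligned = np.zeros((2, 4, 4))
misaligned[0, :, 2:] = 1.0
misaligned[1, :, 1:] = 1.0

jtv_aligned = joint_total_variation(aligned)
jtv_misaligned = joint_total_variation(misaligned)
group_norm = l21_norm(np.array([[3.0, 4.0], [0.0, 0.0]]))
```

The aligned stack has the smaller JTV, which is how the penalty rewards shared edge locations. It also shows why the assumption becomes costly once contrasts are misregistered.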
Compared with the BCS and FCSAMT models, the proposed DISN model can encode the complex patterns within the multicontrast MRI datasets. After the training stage, the forward pass is highly efficient because no iterative optimization is required. Another major advantage over previous methods is its powerful nonlinear mapping ability, which overcomes the strict sparse support assumption and is valuable in real MRI applications.
III A deep information sharing network (DISN)
We propose a deep model that takes a set of subsampled Fourier k-space measurements at multiple contrasts and outputs the corresponding image reconstruction at each contrast. The model learns how to exploit structural similarities across these contrasts to produce an output that is significantly better than can be obtained via independent inversion algorithms. Because this represents the first deep learning approach to the problem, we experiment with three different deep structures; the best-performing structure is a “deep information sharing network” (DISN). This consists of cascaded blocks with dense connections. Within each block, we adopt a feature sharing unit combined with a data fidelity unit. Below, we elaborate on the feature sharing unit and data fidelity unit, as well as how they are combined to form the inference block. We then discuss how the blocks are connected densely in an efficient way.
III-A Feature sharing unit
An intuitive approach to multicontrast CSMRI inversion is to simply reconstruct each contrast separately with a deep model, e.g., as shown in Figure 2(a). We call this a deep independent reconstruction network (DIRN). The DIRN model we show is made up of several parallel subnetworks; here we plot three subnetworks, for the PD, T1 and T2 contrasts. The architecture of each subnetwork is the state-of-the-art DCCNN architecture [27]. We name the model after the number of inference blocks in each subnetwork (all the subnetworks share the same number of blocks); with five blocks, for example, it is DIRN5B. Each building block consists of convolutional layers with a global shortcut and a data fidelity unit (discussed later). In each block, the first convolutional layer maps the MRI images to multiple feature maps and the last convolutional layer integrates the feature maps into a single reconstruction in the residual domain. Leaky ReLU is used as the activation function except for the last convolutional layer, where the identity mapping is used. There are no interactions among these subnetworks. In such a deep network setting, massive MRI data is used to learn the complex patterns of each contrast separately.
Although DIRN may provide powerful modeling ability for each contrast of the MRI, the number of network parameters is tripled when there are three subnetworks. As we showed in Figure 1, the multicontrast images are structurally similar, and this similarity should be exploited in deep neural network architectures, both to achieve better reconstruction and to reduce the number of parameters. Hence we also consider a deep feature sharing network (DFSN) as shown in Figure 2(b). Similar to DIRN, the DFSN consists of cascaded inference blocks, e.g., DFSN5B, while each block is made up of a feature sharing unit and a data fidelity unit. The multicontrast zero-filled MR magnitude images, such as T1, T2 and PD, are fed to the DFSN as a stack. The DFSN network can therefore reconstruct multicontrast MRI data simultaneously. We show the feature sharing unit in Figure 3. In traditional multicontrast MRI methods, the structural similarity is modeled in the finite difference domain; instead we adopt residual learning in the feature sharing unit. Similarly, each feature sharing unit contains convolutional layers with the same number of filters in each layer as a single subnetwork of DIRN, and all activation functions are Leaky ReLU, except for the identity mapping in the last convolutional layer.
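The parameter saving from feature sharing can be checked with simple arithmetic. The sketch below counts convolution weights and biases for one DIRN block (three independent units) and one DFSN block (one shared unit). The layer count, the width of 32 feature maps, and the 3×3 kernels are hypothetical values chosen for illustration, not the settings reported in this paper.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

def unit_params(channels, k=3):
    """Total parameters of a chain of conv layers with the given channel sizes."""
    return sum(conv_params(a, b, k) for a, b in zip(channels[:-1], channels[1:]))

# Hypothetical settings (assumptions, not the paper's values).
n_contrasts, width, depth = 3, 32, 5

# DIRN: one independent unit per contrast, each mapping 1 -> width -> ... -> 1.
dirn_block = n_contrasts * unit_params([1] + [width] * (depth - 1) + [1])

# DFSN: a single shared unit mapping n_contrasts -> width -> ... -> n_contrasts.
dfsn_block = unit_params([n_contrasts] + [width] * (depth - 1) + [n_contrasts])

saving = dirn_block / dfsn_block
```

Under these assumptions the shared unit needs roughly a third of the parameters, because only the first and last layers grow with the number of contrasts.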
III-A1 Discussion
The proposed feature sharing strategy has a similar motivation to traditional sparse representation methods. In each feature sharing unit, we denote the residual for the $i$-th contrast MRI as $r_i$. As mentioned previously, the activation function of the last convolutional layer in the unit is the identity, thus the following equation holds: $r_i = \sum_{k} W_{i,k} f_k$, where $f_k$ denotes the $k$-th feature map for the last convolutional layer in the unit, and $W_{i,k}$ denotes the corresponding kernel in Toeplitz matrix form. In the classic dictionary learning formulation, the signal $x$ can be approximated as $x \approx D\alpha$, or equivalently $x \approx \sum_{k} \alpha_k d_k$, where $d_k$ is the $k$-th column of the dictionary $D$ and $\alpha_k$ is the $k$-th entry of the sparse coefficients $\alpha$.
In previous work such as the ScSR model for image super-resolution [31, 32], the patches of high- and low-resolution images share the same sparse coefficients yet use different dictionaries. In such a setting, the correlation between the low- and high-resolution image patches may be overlooked. In the DFSN model, by contrast, the kernels of the last convolutional layer function similarly to the high-resolution and low-resolution dictionaries, and are inferred simultaneously with the representation coefficients from a large dataset. In Figure 4, we show the feature maps for the last convolutional layer in the feature sharing unit of a block of the DFSN model. We observe that they contain enough diversity to represent the structures in PD, T1 and T2 MRI, which validates the feature sharing strategy.
III-B Data fidelity unit
We also use the data fidelity unit [27] within each block to reduce bias by enforcing more accurate values at the sampled positions in k-space. Following Equation 1, we solve the following objective function in the data fidelity unit for each contrast,
(3) $\hat{x}_i = \arg\min_{x} \frac{\lambda}{2} \| F_u x - y_i \|_2^2 + \frac{1}{2} \| x - \tilde{x}_i \|_2^2$
where $\tilde{x}_i$ is the input to the data fidelity unit and $\lambda$ is the regularization parameter. To enforce consistency between the reconstruction and the measurements $y_i$, we set $\lambda$ to a large value, which only penalizes deviation at the measured locations. The second term can be viewed as the prior guess, where the input image $\tilde{x}_i$ is the output of the feature sharing unit. We observe that these fidelity units are calculated independently for each contrast, but each input is constructed by sharing information across contrasts in the deep model.
We can simplify by working in the Fourier domain, after which the solution is (using elementwise division below)
(4) $\hat{x}_i = F^H \left[ \frac{\lambda F F_u^H y_i + F \tilde{x}_i}{\lambda F F_u^H F_u F^H + I} \right]$
where $F$ is the full Fourier matrix, the term $F F_u^H y_i$ is the Fourier transform of the zero-filled reconstruction, and the term $F F_u^H F_u F^H$ is a diagonal matrix with ones at the sampled locations and zeros otherwise. Calling the feed forward function for this unit $f(\tilde{x}_i)$, the relevant gradient for model training is
(5) $\frac{\partial f}{\partial \tilde{x}_i^{T}} = F^H \left( \lambda F F_u^H F_u F^H + I \right)^{-1} F$
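The closed-form data fidelity step can be sketched in NumPy as follows: in k-space, sampled locations are pulled toward the measured values and unsampled locations keep the network's prediction. The value `lam=1e6`, the function name, and the toy 8×8 arrays are assumptions made for illustration; the unit in the paper operates inside a trained network.

```python
import numpy as np

def data_fidelity(x_in, measured_kspace, mask, lam=1e6):
    """Blend the k-space of `x_in` with zero-filled measurements (elementwise)."""
    k_in = np.fft.fft2(x_in, norm="ortho")                  # k-space of the input image
    # Sampled entries: (lam * y + k_in) / (lam + 1); unsampled entries: k_in.
    k_out = (lam * measured_kspace + k_in) / (lam * mask + 1.0)
    return np.fft.ifft2(k_out, norm="ortho")

rng = np.random.default_rng(0)
truth = rng.standard_normal((8, 8))                          # stand-in for the true image
mask = np.zeros((8, 8))
mask[:, ::2] = 1.0                                           # sample every other column
measured = mask * np.fft.fft2(truth, norm="ortho")           # zero-filled measurements

guess = rng.standard_normal((8, 8))                          # stand-in for the unit's input
out = data_fidelity(guess, measured, mask)
k_out = np.fft.fft2(out, norm="ortho")
k_guess = np.fft.fft2(guess, norm="ortho")
```

With a large weight the sampled coefficients of the output agree with the measurements almost exactly, while the unsampled ones are left as predicted.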
III-C Dense connections
We proposed DFSN to share information across contrasts of the MRI. We visualize the intermediate reconstructions of T2 brain MRI by the inference blocks of the DFSN5B model in Figure 5. The test data is undersampled with the 1D Cartesian mask given in Figure 7. We observe that the outputs of the inference blocks keep improving as the network goes deeper. However, we conjecture that intermediate reconstructions from lower blocks contain valuable information that is lost at deeper levels. We plot the pixelwise absolute reconstruction error maps of the T2 intermediate reconstructions of the DFSN5B model from the first to the fifth block in the third row of Figure 5. In the fourth and fifth rows, we show positive error map differences between lower blocks and higher blocks, meaning we focus only on the positions where the lower block achieves higher reconstruction accuracy. As the blocks go deeper, we observe that the intermediate reconstructions from lower blocks show fewer advantages.
Inspired by this observation, we densely connect the inference blocks in DFSN and propose a deep information sharing network (DISN). In Figure 2(c), we show the network architecture of the DISN5B model. The “information sharing” concept is expressed in two ways: (1) The information between the multicontrast MRI is shared via the feature sharing unit. (2) The information in the lower inference blocks and deeper inference blocks is shared by dense connections using concatenations. Each block in DISN receives the output from all previous blocks as its input.
In DenseNet [7], the feature maps are densely fed from lower to deeper layers by concatenation, so the dimension of the channels in deeper layers may explode quadratically, limiting the depth of the model. Although it is inspired by DenseNet and the similar MemNet [29], the DISN is different in that only the intermediate reconstructed MRI images are concatenated, rather than the large number of feature maps. As a consequence, the channel dimensionality increases only linearly, in proportion to the number of contrasts.
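The difference in channel growth can be made concrete with a short calculation. The sketch assumes that block k receives the zero-filled input plus the outputs of the k−1 earlier blocks, and compares this with a hypothetical variant that concatenates 32 feature maps per earlier block; both the bookkeeping and the width of 32 are illustrative assumptions.

```python
def disn_input_channels(n_blocks, n_contrasts):
    """Input channels per block when only reconstructed images are concatenated."""
    # Block k sees the zero-filled stack plus k - 1 earlier reconstructions.
    return [n_contrasts * k for k in range(1, n_blocks + 1)]

def feature_concat_input_channels(n_blocks, n_contrasts, width):
    """Hypothetical alternative: concatenate `width` feature maps per earlier block."""
    return [n_contrasts + width * (k - 1) for k in range(1, n_blocks + 1)]

disn = disn_input_channels(5, 3)
dense = feature_concat_input_channels(5, 3, 32)

# Extra first-layer input channels accumulated over all five blocks.
disn_total, dense_total = sum(disn), sum(dense)
```

Concatenating images adds only as many channels per block as there are contrasts, so the overhead of the dense connections stays small compared with concatenating full feature maps.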
IV Experiments
IV-A Datasets
We conduct experiments on three multicontrast MRI datasets including the SRI24 atlas [25], MRBrainS13 benchmark [21] and NeoBrainS12 benchmark [13].
IV-A1 SRI24 Atlas
The multicontrast brain MRI atlas data was obtained on a 3.0T GE scanner with an 8-channel head coil, using three different contrast settings:

For T1weighted MRI data: 3D axial IRprep SPoiled Gradient Recalled (SPGR), TR = ms, TE = ms, number of slices = , slice thickness = mm.

For T2weighted (late echo) and PDweighted (early echo) MRI data: 2D axial dualecho fast spin echo (FSE), TR = s, TE = ms, number of slices = , slice thickness = mm.
The fieldofview covers a region of mm with resolution pixels. The SRI24 dataset contains T1wT2wPD MRI training pairs. We randomly select 36 multicontrast MRI data pairs as test data, while the others are used for training.
IV-A2 MRBrainS
Twenty fully annotated multicontrast (T1-weighted, T1-weighted inversion recovery and T2-FLAIR) 3T MRI brain scans with the size are provided in the Grand Challenge on MR Brain Image Segmentation (MRBrainS) workshop at MICCAI 2013. The voxel size of T1, T1IR and T2FLAIR MRI is , and respectively. These scans were acquired at the UMC Utrecht (the Netherlands) from patients with diabetes and matched controls (with increased cardiovascular risk) with varying degrees of atrophy and white matter lesions (age 50). These abnormalities have different appearances in different contrasts. The 20 scans contain 320 pairs of multicontrast MRI data in total. We randomly select of the slices for training and the others for testing.
IV-A3 NeoBrainS
Unlike the MRBrainS data, which was acquired from adults, the MICCAI 2012 grand challenge called Neonatal Brain Segmentation (NeoBrainS) provides multicontrast (T1-weighted and T2-weighted) MRI scans of neonatal brains. All 7 scans, containing 175 multicontrast MRI data pairs of the size , were acquired using a Philips 3T system at the University Medical Center Utrecht, The Netherlands. The detailed imaging parameters can be found in [13]. We also randomly select of the slices as training datasets and as testing datasets.
IV-B Training and parameter details
The loss function for DIRN, DFSN and DISN is
(6) 
where the two image terms are the zero-filled and fully-sampled magnitude MRI, respectively, for each contrast. The parameter term represents the network parameters: one set per subnetwork for DIRN, while in DFSN and DISN they are incorporated in the single feature sharing unit.
During training, the deep models are implemented in TensorFlow for Python on an NVIDIA GeForce GTX 1080Ti with 11GB GPU memory and an Intel Xeon CPU E5-2683 at 2.00GHz. We use data augmentation on the entire training set. For the feature sharing unit within each block, we adopt convolutional layers followed by Leaky ReLU activation function with negative slope except for the last convolutional layer, where identity mapping is applied. For each convolutional layer, we obtain shared feature maps except for the first and last convolutional layer, where feature maps are used for the contrast residuals. These settings are applied to both DFSN and DISN. For the DIRN model, the first and last convolutional layers in the inference block have only one feature map, since the contrasts are reconstructed using different deep networks. The kernel size is set to and padding is used to keep the size of feature maps unchanged. We apply Xavier initialization for all models. We train for iterations using ADAM. We select the initial learning rate to be 0.0005, the first-order momentum to be 0.9 and the second momentum to be 0.999. Each minibatch contains MRI data pairs.
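For reference, the optimizer settings quoted above (learning rate 0.0005, first and second momentum 0.9 and 0.999) correspond to the standard Adam update, sketched below in NumPy together with a Leaky ReLU. The negative slope of 0.1 and the `eps` value are placeholder assumptions, since they are not given in this text, and the paper's actual training uses TensorFlow rather than this hand-written step.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU; the slope value here is a placeholder assumption."""
    return np.where(x > 0, x, negative_slope * x)

def adam_step(param, grad, m, v, t, lr=5e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([0.0])
grad = np.array([1.0])
param, m, v = adam_step(param, grad, m=np.zeros(1), v=np.zeros(1), t=1)
activated = leaky_relu(np.array([-1.0, 2.0]))
```

On the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient scale, which is one reason a small initial rate such as 0.0005 gives stable early training.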
IV-C Deep model comparison
We compare DIRN, DFSN and DISN to check the utility of the feature sharing and dense connection strategies on the SRI24 atlas datasets. In Figure 6, we show the network performance on DIRNB, DFSNB and DISNB using all the test data, whose orders have been shuffled. We use the undersampling pattern in Figure 7. Compared with DIRN, DFSN shows the advantage of feature sharing, while DISN has further improvement by exploiting information across depths by the dense connections. The number of parameters of DFSNB is , much fewer than DIRNB having , while DISNB has slightly more parameters () because of extra kernels used in the dense connections. We also ran the experiment with feature maps for DFSNB, resulting in parameters and found that DISNB still achieves higher reconstruction quality with feature maps. In Figure 6 we show all results for maps and one result for maps.
IV-D The model comparisons on the SRI24 datasets
On the SRI24 datasets, we compare the proposed DISN5B model and its more basic versions, the DIRN5B and DFSN5B models, with the single-contrast MRI methods PANO [23] and GBRWT [15] and with state-of-the-art multicontrast methods such as BCS [1] and FCSAMT [9], using three different 1D Cartesian masks with the same sampling ratio of . The parameter settings of the non-deep optimization models have been tuned for best performance. The reconstructions and error residual images are shown in Figure 7. We see that the visual quality of DISN is better than that of the other methods, with better preservation of structural details. This is supported by the objective quantitative measures of PSNR and SSIM, which we show in Figures 8(a) and 8(b). We find that plain deep models like DIRN and DFSN already achieve good performance on the task, while the proposed DISN model achieves the best performance. We further test DISN using three different D random masks with sampling ratio of , and compare with FCSAMT, DIRN and DFSN. These reconstruction results are shown in Figure 9. The experiment indicates that the DISN model generalizes well to different sampling patterns with different undersampling ratios.
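For readers unfamiliar with the PSNR measure used above, a minimal NumPy sketch follows. Taking the peak value as the dynamic range of the reference image is an assumption made here; the source does not state which normalization was used. SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=None):
    """Peak signal-to-noise ratio in dB between two real-valued images."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    if data_range is None:
        data_range = reference.max() - reference.min()   # assumed peak definition
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")                              # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

reference = np.array([[0.0, 1.0], [1.0, 0.0]])
noisy = reference + 0.1            # constant error of 0.1 at every pixel
score = psnr(reference, noisy)
perfect = psnr(reference, reference)
```

A uniform error of 0.1 on a unit-range image gives a mean squared error of 0.01, so a lower error translates directly into a higher PSNR on the logarithmic dB scale.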
IV-E The model comparisons on the MRBrainS datasets
Table I: PSNR (dB) / SSIM on the MRBrainS test data, for the whole brain and for regions of white matter lesions (WML)
Method | Whole Brain T1 | Whole Brain T1IR | Whole Brain T2FL | WML T1 | WML T1IR | WML T2FL
BCS | 28.77/0.854 | 30.92/0.922 | 29.40/0.788 | 27.80/0.816 | 28.27/0.852 | 29.39/0.851
PANO | 32.82/0.928 | 32.81/0.953 | 33.48/0.926 | 30.83/0.893 | 29.77/0.901 | 31.89/0.902
GBRWT | 33.16/0.939 | 33.10/0.958 | 33.91/0.943 | 31.00/0.896 | 30.17/0.909 | 32.07/0.901
FCSAMT | 32.59/0.934 | 34.73/0.967 | 31.57/0.915 | 31.12/0.899 | 32.22/0.928 | 31.53/0.892
DIRN5B | 36.48/0.967 | 37.54/0.978 | 34.52/0.935 | 33.87/0.935 | 33.70/0.944 | 32.90/0.916
DFSN5B | 37.17/0.969 | 39.80/0.984 | 35.71/0.949 | 34.77/0.942 | 36.41/0.963 | 34.22/0.937
DISN5B | 37.65/0.972 | 40.54/0.985 | 36.53/0.955 | 35.01/0.943 | 37.07/0.966 | 35.27/0.942
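Table II: Dice coefficients (DC, in percent) of the UNet segmentation on reconstructed MRBrainS images
Tissue | FCSAMT | GBRWT | DIRN5B | DFSN5B | DISN5B | FullSampled
GM | 66.53 | 70.09 | 70.58 | 75.88 | 76.41 | 77.93
WM | 77.66 | 79.30 | 79.90 | 84.56 | 85.03 | 85.84
CSF | 73.85 | 75.81 | 77.96 | 80.53 | 80.97 | 81.68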
The standard scans in the SRI atlas contain neither brain lesions nor complex structural patterns, whereas abnormalities show different appearances and diagnostic information across the different contrasts. Hence we also test our proposed model on the multicontrast MRI dataset MRBrainS, where the scans were acquired from patients with varying degrees of white matter lesions (WML). We adopt 3 different 1D Cartesian masks for undersampling as used in Figure 7.
In Table I we observe that the proposed DISN5B still outperforms the compared state-of-the-art algorithms, followed by DFSN5B. We show the reconstructions on a representative multicontrast test MRI in Figure 10. From these experiments, we observe that the DISN model still works well in more complicated and diverse multicontrast MRI settings. The employed deep neural network is flexible enough to model the structural similarities while distinguishing the structural differences across multicontrast MRI. For example, as shown in Table I, the white matter lesion regions are better recovered by the proposed DISN model, providing more reliable diagnostic information.
The MRI data are annotated with segmentation labels, and we test the reconstructed MR images produced by the compared CSMRI models on a well-trained, state-of-the-art medical image segmentation model called UNet [4], with pixelwise cross-entropy as the loss function. The model is trained with fully-sampled MRI and label pairs. We use the widely used Dice Coefficient (DC) as the objective index to evaluate the segmentation performance. The averaged DC results are shown in Table II (the DC index is in percent, and a higher score means better segmentation). We also show subjective segmentation comparisons in Figure 11. We observe that the better reconstruction produced by the DISN model also leads to more accurate segmentation, which is near the upper bound given by segmenting the fully-sampled MR images with the UNet model. The proposed multicontrast MRI reconstruction model DISN can thus bring significant benefits to medical image analysis.
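The Dice coefficient used above compares a predicted label map with the ground truth for one tissue class. A minimal sketch is given below; the label encoding (integer class ids) and the convention of returning 1.0 when a class is absent from both maps are assumptions made here.

```python
import numpy as np

def dice_coefficient(pred, target, label):
    """Dice overlap for one class: 2 |P ∩ T| / (|P| + |T|)."""
    p = np.asarray(pred) == label
    t = np.asarray(target) == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0                                  # class absent in both maps
    return 2.0 * np.logical_and(p, t).sum() / denom

pred = np.array([1, 1, 0, 0])
target = np.array([1, 0, 0, 0])
dc_class1 = dice_coefficient(pred, target, label=1)   # one shared pixel out of 2 + 1
dc_percent = 100.0 * dc_class1                         # percent scale, as in the results table
```

Because the score depends only on overlap, reconstruction errors that shift tissue boundaries lower it even when the overall image error is small.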
IV-F The model comparisons on the NeoBrainS datasets
In addition to the MRBrainS benchmark acquired from patients (age 50), we also test the proposed multicontrast MRI reconstruction model on the neonatal brain MRI in the NeoBrainS benchmark. Neonatal brains grow rapidly and develop a wide range of cognitive and motor functions, and this development is critical in many neurodevelopmental and neuropsychiatric disorders, such as schizophrenia and autism. The DIRNB, DFSNB and DISNB are trained and tested on the NeoBrainS benchmark with Cartesian undersampling mask of the size . We show the reconstructed MRI images of DIRNB, DFSNB and DISNB and their corresponding error maps in Figure 12. We observe that DISN5B again achieves the best reconstruction quality. In Figure 13, we give the averaged PSNR and SSIM evaluation of the three compared deep models, which is consistent with the visual assessments.
The experimental results on the three different multicontrast MRI datasets indicate that the proposed DISN model generalizes well to standard brain MRI datasets, brain MRI datasets with pathological abnormalities, and neonatal brain MR datasets, under different undersampling patterns and undersampling ratios.
V Discussion
V-A Convergence Analysis
In Figure 14 below, we show the training loss curve on the SRI24 atlas datasets as a function of iteration with masks from Figure 7. We observe the convergence for these deep models is relatively fast, and DISN gives a network with better training loss.
V-B Network size
Next we discuss DISN model performance by adjusting the number of cascaded blocks from to and give these results in Figure 15 on the SRI24 datasets. We find that as the number of blocks increases, the network performance steadily increases with diminishing marginal improvement, while the DISNB model already achieves state-of-the-art performance in multicontrast CSMRI reconstruction.
V-C Testing running time
In Table III we compare the running time of the different models at testing time on the SRI24 datasets. For the optimization-based single-contrast and multicontrast MRI methods, additional optimization is required on each test image, making the processing of a new MRI more time-consuming. On the other hand, for DIRN5B, DFSN5B and DISN5B, the reconstruction is much faster because the model is feed-forward and no iterations are required.
Table III. Running time at testing time on the SRI24 datasets.
Method        BCS     PANO    GBRWT   FCSAMT  DIRN    DFSN    DISN
Running time  30min   19.6s   64.6s   5.7s    0.17s   0.11s   0.18s
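Per-image running times of this kind could be measured with a small harness along the following lines. This is only a sketch: `time_per_image` is a hypothetical helper, and `zero_filled` (a plain inverse FFT) is a stand-in for a reconstruction method, not any of the compared models.

```python
import time
import numpy as np

def time_per_image(reconstruct, inputs, warmup=1):
    """Average wall-clock seconds per call of `reconstruct` over `inputs`."""
    for x in inputs[:warmup]:
        reconstruct(x)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for x in inputs:
        reconstruct(x)
    return (time.perf_counter() - start) / len(inputs)

def zero_filled(kspace):
    """Stand-in reconstruction: magnitude of the inverse 2-D FFT."""
    return np.abs(np.fft.ifft2(kspace))

rng = np.random.default_rng(0)
batch = [rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
         for _ in range(8)]
seconds = time_per_image(zero_filled, batch)
```

A feedforward model is timed by a single call per image, whereas an optimization-based method would run its full iterative solver inside `reconstruct`, which is where the gap in the table comes from.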
VD Nonregistration environment
The multicontrast MRI datasets used in this work have already undergone registration, i.e., been made to overlap as well as possible. However, in real MRI scenarios such accurate registration is not always realistic. For traditional optimization-based multicontrast MRI methods such as FCSAMT, this registration must be strictly satisfied because of the rigid sparsity assumption in these models. In contrast, the trained DISN network is quite robust to the shifts that are normal in the real-world MRI scanning process.
In this experiment, we take the SRI24 datasets as an example and train the DISN5B model with MRI data pairs randomly shifted by up to 2 pixels in all directions. We then test the DISN model on the position-fixed PD and T1 data and the position-shifted T2 data in the test datasets, where the T2 data is likewise shifted by up to 2 pixels in all directions. (We use the undersampling masks shown in Figure 7.) Because FCSAMT is optimized in situ on each test image, it needs no retraining with shifted examples, whereas DISN does. Nevertheless, comparing the retrained DISN with FCSAMT, we observe that DISN is more robust to these pixel shifts. This is shown for the T2 reconstruction as a function of pixel shift in Figure 16.
We observe that FCSAMT with well-registered MRI pairs outperforms GBRWT, the state-of-the-art single-contrast CSMRI method, which operates only on the shifted T2 data; however, the performance of FCSAMT decreases dramatically as the shift becomes severe. The DISN model consistently outperforms FCSAMT and GBRWT regardless of the shift. These simulation experiments suggest that the proposed DISN model has application potential in clinical MRI scenarios.
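The random shifting used to simulate misregistration can be sketched as below. This is an illustrative assumption about how such shifts might be generated, not the procedure used in the experiments: the helper names `shift_image` and `random_shift` are hypothetical, shifts are whole pixels, and vacated pixels are filled with zeros (the boundary handling is not specified in the text).

```python
import numpy as np

def shift_image(image, dy, dx):
    """Translate a 2-D image by (dy, dx) whole pixels, zero-filling the border."""
    h, w = image.shape
    out = np.zeros_like(image)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def random_shift(image, max_shift=2, rng=None):
    """Shift an image by a random offset of at most `max_shift` pixels per axis."""
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift_image(image, dy, dx), (dy, dx)
```

In a setup like the one described, one contrast (for example T2) would be passed through `random_shift` while the other contrasts stay fixed, so that the network sees slightly misaligned pairs during training.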
VI Conclusion
We have proposed the first deep model for the multicontrast CSMRI inversion problem. The model consists of densely cascaded inference blocks, each containing a feature sharing unit and a data fidelity unit. The feature sharing strategy can significantly reduce the number of parameters while still obtaining excellent model performance by virtue of the structural similarity across the multiple contrasts. The dense connection helps to share information across the blocks in a computationally efficient way. The experiments on different multicontrast MRI datasets demonstrate that DISN achieves state-of-the-art performance in imaging quality and speed, bringing benefits to the later medical image analysis stage. Furthermore, its robustness to the nonregistration environment shows potential for real multicontrast MRI application.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants 61571382, 81671766, 61571005, 81671674, U1605252, and 61671309, in part by the Guangdong Natural Science Foundation under Grant 2015A030313007, in part by the Fundamental Research Funds for the Central Universities under Grants 20720160075 and 20720180059, and in part by the National Natural Science Foundation of Fujian Province, China under Grant 2017J01126.
References
 [1] B. Bilgic, V. K. Goyal, and E. Adalsteinsson, “Multicontrast reconstruction with bayesian compressed sensing,” Magnetic Resonance in Medicine, vol. 66, no. 6, pp. 1601–1615, 2011.
 [2] E. J. Candes, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on pure and applied mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
 [3] S. Cauley, Y. Xi, B. Bilgic, K. Setsompop, J. Xia, and E. Adalsteinsson, “Scalable and accurate variance estimation (save) for joint bayesian compressed sensing,” in th Annual Meeting of ISMRM, 2013, p. 2603.
 [4] H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo, “Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks,” in Annual Conference on Medical Image Understanding and Analysis. Springer, 2017, pp. 506–517.
 [5] D. L. Donoho, “Compressed Sensing,” IEEE Transactions on information theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [6] J. A. Fessler, “Medical image reconstruction: a brief overview of past milestones and future directions,” arXiv preprint arXiv:1707.05927, 2017.
 [7] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
 [8] J. Huang, C. Chen, and L. Axel, “Fast multicontrast MRI reconstruction,” in Proceedings of the 15th international conference on Medical Image Computing and Computer-Assisted Intervention, Volume Part I. Springer-Verlag, 2012, pp. 281–288.
 [9] ——, “Fast multicontrast MRI reconstruction,” Magnetic resonance imaging, vol. 32, no. 10, pp. 1344–1352, 2014.
 [10] J. Huang, S. Zhang, and D. Metaxas, “Efficient MR image reconstruction for compressed MR imaging,” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2010, pp. 135–142, 2010.
 [11] ——, “Efficient MR image reconstruction for compressed MR imaging,” Medical Image Analysis, vol. 15, no. 5, pp. 670–679, 2011.
 [12] Y. Huang, J. Paisley, Q. Lin, X. Ding, X. Fu, and X.P. Zhang, “Bayesian nonparametric dictionary learning for compressed sensing MRI,” IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5007–5019, 2014.
 [13] I. Isgum, M. J. N. L. Benders, B. B. Avants, M. J. Cardoso, S. J. Counsell, E. F. Gomez, L. Gui, P. S. Huppi, K. J. Kersbergen, A. Makropoulos et al., “Evaluation of automatic neonatal brain segmentation algorithms: the NeoBrainS12 challenge,” Medical Image Analysis, vol. 20, no. 1, pp. 135–151, 2015.
 [14] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, 2008.
 [15] Z. Lai, X. Qu, Y. Liu, D. Guo, J. Ye, Z. Zhan, and Z. Chen, “Image reconstruction of compressed sensing MRI using graph-based redundant wavelet transform,” Medical Image Analysis, vol. 27, p. 93, 2016.
 [16] D. Lee, J. Yoo, and J. C. Ye, “Deep residual learning for compressed sensing MRI,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), April 2017, pp. 15–18.
 [17] R. Li, Y. Li, R. Fang, S. Zhang, H. Pan, and J. Huang, “Fast preconditioning for accelerated multicontrast MRI reconstruction,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 700–707.
 [18] Z.P. Liang and P. C. Lauterbur, Principles of magnetic resonance imaging: a signal processing perspective. SPIE Optical Engineering Press, 2000.
 [19] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic resonance in medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
 [20] S. Ma, W. Yin, Y. Zhang, and A. Chakraborty, “An efficient algorithm for compressed MR imaging using total variation and wavelets,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
 [21] A. M. Mendrik, K. L. Vincken, H. J. Kuijf, M. Breeuwer, W. H. Bouvy, J. De Bresser, A. Alansary, M. De Bruijne, A. Carass, A. ElBaz et al., “MRBrainS challenge: online evaluation framework for brain image segmentation in 3T MRI scans,” Computational intelligence and neuroscience, vol. 2015, p. 1, 2015.
 [22] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai, and Z. Chen, “Undersampled MRI reconstruction with patch-based directional wavelets,” Magnetic resonance imaging, vol. 30, no. 7, pp. 964–977, 2012.
 [23] X. Qu, Y. Hou, L. Fan, D. Guo, J. Zhong, and Z. Chen, “Magnetic resonance image reconstruction from undersampled measurements using a patch-based nonlocal operator,” Medical Image Analysis, vol. 18, no. 6, p. 843, 2014.
 [24] S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE transactions on medical imaging, vol. 30, no. 5, pp. 1028–1041, 2011.
 [25] T. Rohlfing, N. M. Zahr, E. V. Sullivan, and A. Pfefferbaum, “The SRI24 multichannel atlas of normal adult human brain structure,” Human Brain Mapping, vol. 31, no. 5, p. 798, 2010.
 [26] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, vol. PP, no. 99, pp. 1–1, 2017.
 [27] ——, “A deep cascade of convolutional neural networks for MR image reconstruction,” in International Conference on Information Processing in Medical Imaging. Springer, 2017, pp. 647–658.
 [28] J. Sun, H. Li, Z. Xu et al., “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016, pp. 10–18.
 [29] Y. Tai, J. Yang, X. Liu, and C. Xu, “MemNet: A persistent memory network for image restoration,” in The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
 [30] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating magnetic resonance imaging via deep learning,” in 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), April 2016, pp. 514–517.
 [31] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11, p. 2861, 2010.
 [32] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image super-resolution as sparse representation of raw image patches,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
 [33] J. Yang, Y. Zhang, and W. Yin, “A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 288–297, 2010.
 [34] Z. Zhan, J.F. Cai, D. Guo, Y. Liu, Z. Chen, and X. Qu, “Fast multiclass dictionaries learning with geometrical directions in MRI reconstruction,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 9, pp. 1850–1861, 2016.