# Unsupervised Deep Learning for MR Angiography with Flexible Temporal Resolution

## Abstract

Time-resolved MR angiography (tMRA) has been widely used for dynamic contrast enhanced MRI (DCE-MRI) due to its highly accelerated acquisition. In tMRA, the periphery of the -space data are sparsely sampled so that neighbouring frames can be merged to construct one temporal frame. However, this view-sharing scheme fundamentally limits the temporal resolution, and it is not possible to change the view-sharing number to achieve different spatio-temporal resolution trade-off. Although many deep learning approaches have been recently proposed for MR reconstruction from sparse samples, the existing approaches usually require matched fully sampled -space reference data for supervised training, which is not suitable for tMRA. This is because high spatio-temporal resolution ground-truth images are not available for tMRA. To address this problem, here we propose a novel unsupervised deep learning using optimal transport driven cycle-consistent generative adversarial network (cycleGAN). In contrast to the conventional cycleGAN with two pairs of generator and discriminator, the new architecture requires just a single pair of generator and discriminator, which makes the training much simpler and improves the performance. Reconstruction results using in vivo tMRA data set confirm that the proposed method can immediately generate high quality reconstruction results at various choices of view-sharing numbers, allowing us to exploit better trade-off between spatial and temporal resolution in time-resolved MR angiography.

## I Introduction

DCE-MRI is one of the essential imaging methods for clinical diagnosis. DCE-MRI gives information on the physiological characteristics of tissues, such as blood vessel function, etc., so it is useful for the imaging of strokes or cancers [28, 33].

In DCE-MRI, after injecting the contrast agent to patient, a sequence of MR images is obtained. Since the motion during the acquisition of -space data can degrade the quality of MR images, multiple studies have been conducted to accelerate DCE-MRI with improved temporal resolution. Specifically, time-resolved MRA such as time-resolved angiography with interleaved stochastic trajectories (TWIST) [14] has been widely used as one of the solutions for enhancing the temporal resolution of DCE-MRI. Here, the periphery of -space data is very sparsely sampled at each time frame while the center of -space frequency data is fully acquired for the retention of image contrast. Then, the high frequency regions of -space data from several time frames are combined together to form a single -space data while the low frequency of -space data remains at intact. This view-sharing leads to a uniformly subsampled -space sampling pattern as shown in Fig. 1(a), after which the generalized autocalibrating partial parallel acquisition (GRAPPA) [4] can be applied to reconstruct images from the uniformly downsampled -space data.

Although this results in a highly accelerated acquisition with a noticeable enhancement in both temporal and spatial resolution, the view sharing from several temporal frames results in the blurring of the temporal resolution. In fact, the resulting inaccuracy in the temporal resolution is considered as the main huddle when using tMRA as a quantitative tool for perfusion study [18, 11]. Unfortunately, the current sampling scheme with fixed temporal window is not flexible for the reconstruction at reduced sliding window sizes to improve temporal resolution, since -space data of the reduced view sharing is a subset of uniformly sampled -space data, which results in the coherent aliasing artifacts that are difficult to overcome by parallel imaging or compressed sensing. Therefore, the current methods have limitation in investigating various spatio-temporal resolution trade-off to accurately quantify the perfusion dynamics.

To overcome this limitation, a -space interpolation algorithm [3] using annihilating filter-based low rank Hankel matrix approach (ALOHA) [9, 15, 17] was proposed to synergistically combine CS-MRI and parallel MRI (pMRI) as -space interpolation problems. Nonetheless, high computational complexity for matrix decomposition, which is essential in this algorithm, is an obstacle to practical applications.

Recently, deep learning approaches have been extensively studied for accelerated MRI [5, 6, 7, 13, 16, 26, 31]. In particular, in [5], deep network architecture using unfolded iterative compressed sensing (CS) algorithm was proposed. Domain adaptation [6], deep residual learning [16], and data consistency layers [26] were properly utilized to reconstruct CS-MRI. All of these methods have demonstrated dramatic performance improvement over the CS approaches [21, 10, 20, 9, 15], while significantly reducing computational complexity. Unfortunately, most of the existing deep learning approaches require matched ground-truth data for supervised training, which is not applicable to the tMRA since high spatial resolution with fast temporal resolution ground-truth images cannot be acquired in practice. Therefore, unsupervised neural network training without matched reference data is required.

To address this issue, here we propose a novel cycleGAN architecture inspired by the optimal transport (OT) theory [30, 24, 19, 27]. One of the important advantages of our cycleGAN is that the necessity of obtaining fast-temporal resolution data is no more required for training, as long as we can obtain unmatched reference data of high spatial resolution. Moreover, once the network is trained, subsampled -space data at various view sharing can be used as input to generate reconstruction results at various spatial and temporal resolutions. In terms of network architecture, our cycleGAN architecture is much simpler than the conventional one [35], since one of the generators in the standard cycleGAN can be replaced with a deterministic operation so that we only need a single pair of generator and discriminator. This makes training more stable and leads to better reconstruction quality than the original cycleGAN. Using experimental results, we verify that our cycleGAN can reconstruct high quality tMRA at various temporal resolutions.

## Ii Related Works

### Ii-a Optimal Transport Driven CycleGAN

Here we briefly review the OT-driven cycleGAN design in our companion paper [27], which is used for our unsupervised learning method.

Consider an inverse problem, where a noisy measurement from an unobserved image is modeled by

(1) |

where is the measurement noise and is the known measurement operator. Since inverse problems are ill-posed, the penalized least squares (PLS) approach is a classical strategy to mitigate the ill-posedness:

(2) |

where is a penalty function to impose the constraint to the reconstruction image.

In our companion paper[27], we proposed a new PLS cost function with a novel deep learning prior:

(3) |

where is a neural network with the network parameter and input . The new PLS cost in (3) was proposed so that the unknown is estimated under the constraint that there exists an inverse mapping from the measurement by a neural network . The new PLS cost function has many important advantages over existing PLS in (2). In particular, if the global minimizer is achieved, i.e. , then we have

Therefore, can be an inverse of the forward operator , which is the ultimate goal in the inverse problem.

Since we do not have any matched reference data, we further assume that in (3) are random vectors with the measure and , respectively, so that the estimation of the parameter should be done by considering all realizations of . Therefore, our goal of the unsupervised learning is to find the parameterized map such that average cost with respect to some joint distribution can be minimized. This is equal to finding the transportation mapping between two probability measures and [30, 24]. In particular, rather than choosing arbitrary joint distributions, the optimal transport theory [30, 24] informs us that the optimization problem should be formulated with respect to the optimal transportation cost

(4) |

where the minimum is taken over the joint distribution whose marginal distribution with respect to and is and , respectively. Another important discovery in our companion paper [27] is that the resulting primal problem can be equivalently represented by the Kantorovich dual formulation [30, 24]:

where

(5) |

Here, denotes some hyper-parameter, and and are simplified as

(6) |

(7) |

where is the Kantorovich 1-Lipschitz potential. Here, the term is the cycle-consistency loss, is the Wasserstein GAN (WGAN) loss [2], and is often called the discriminator. It is important to note that we just have one discriminator in (7), since is a known deterministic generator and we only require one single CNN generator [27].

Interestingly, the resulting dual formulation shows that the physics-driven data consistency term is used to stabilize the training of the neural network. This implies that in contrast to the inverse formulation using deep learning prior [34, 1] that incorporate the physics-driven information during the run-time reconstruction and network architectures, our cycleGAN formulation shows that the physics-driven constraint can be incorporated in training a feed-forward deep neural network so that it generates physically meaningful estimates [27].

## Iii Theory

### Iii-a Problem Formulation

To improve the temporal resolution of tMRI, consider the reduction of the view-sharing number as shown in Fig. 1(b). Specifically, for a given under-sampling pattern from the reduced view sharing, the forward measurement from -channel receiver coils is given by

(8) |

where

where is the -th channel unknown image, is the 2-D Fourier transform, is the projection to that denotes -space sampling indices, and is the corresponding downsampled -space. In this case, it is difficult to apply GRAPPA directly due to the irregular under-sampling pattern, so our goal is to obtain a deep learning approach to address this problem. However, the main technical difficulty is that it is not possible to obtain full -space data that can be used as reference for supervised training, since during the full -space acquisition, the image content in (8) changes due to the dynamic nature of contrast agent distribution.

### Iii-B PLS Transportation Cost for Optimal Transport

To address this problem, here we employ the unsupervised learning method as an extension of OT-driven cycleGAN [27]. For this, we first obtain the aliased reconstruction in the image domain:

(9) |

The reason we choose the aliased image domain as instead of -space measurement is that the construction of the discriminator in the image domain is well-established.

Using (8) and (9), we extend the PLS transportation cost in (3) to the following formulation:

(10) | ||||

(11) | ||||

(12) | ||||

(13) |

where are regularization parameters, and are distance metrics for the image and -space domain, respectively, whose definition will be discussed later. More specifically, (10) and (11) denote the data fidelity term and deep learning based prior term similar to those in (3). The last two terms (12) and (13) denote the identity loss for the spatial domain and frequency domain, respectively. More specifically, (12) ensures that the generator should not alter the fully sampled high resolution image, and (13) enforces that the acquired -space samples should be maintained.

Using (10)-(13), the primal form of the optimal transport problem becomes

(14) |

Theorem 1 then specifies the corresponding Kantorovich dual formulation:

###### Theorem 1.

For the given primal optimal transport problem in (14) with (10)-(13), suppose that there exist such that

(15) |

then the associated Kantorovich dual formulation is given by

Here,

(16) |

where denotes some hyper-parameter, and and are given by

(17) |

and

(18) |

Moreover, the image and -space identity loss are given by

(19) |

(20) |

###### Proof.

See Appendix. ∎

### Iii-C Choice of Metric

The metric in the image and the -space domain should be defined according to their inherent properties. For example, the -space data for each channel should be measured separately to keep the integrity of the Fourier data. So the difference are calculated for each channel and added back together to obtain the total error. This leads to the choice of , i.e. the Froebenius norm. On the other hand, to deal with the multi-channel images in which each channel image is sensitive to specific spatial locations, we define the metric as follows:

(21) |

where denotes the element-wise square-root of sum of squares (SSoS) operation such that is composed of elements

(22) |

where is the -th elements of the -th coil image. By measuring the image domain difference in terms of SSoS images, we found that the algorithm is less prone to each specific coil map. Furthermore, using the definition (21), we can easily see that

(23) |

if we choose as 1-Lipschitz function with respect to the SSoS image. The overall flowchart of the proposed cycleGAN architecture is illustrated in Fig. 2.

## Iv Method

### Iv-a Training dataset

Twelve sets of in vivo 3D DCE data were acquired with Siemens 3T Verio scanners using TWIST sequence in Gachon University Gil Medical center. The area of scans were fixed to visualize carotid vessels. Among twelve sets of given data, there were three different sets of scan parameters. The first eight sets have repetition time (TR) = 2.74 ms, echo time (TE) = 1.02 ms, 241640106 matrix, 1.0 mm slice thickness, 16 coils, and 16 temporal frames. The next two sets have TR = 2.5 ms, TE = 0.94 ms, 15964080 matrix, 1.2 mm slice thickness, 16 coils, and 30 temporal frames. Finally, for the last two sets, the acquisition parameters were the same as the two sets described beforehand, with the only difference in 2.5 mm slice thickness and 37 temporal frames. Specific sampling patterns are described in Fig. 1(a), where 2424 size ACS region were obtained for the autocalibration of 2D GRAPPA kernel. Furthermore, data were acquired using partial Fourier scheme such that only 63 of the data was acquired. For GRAPPA reconstruction, high frequency -space data of five time frames, () are combined with four center sampled frames (), to generate a single -space which is down-sampled by a factor 3 and 2 along and directions, respectively. Then, 2-D GRAPPA [4] is used to estimate missing -space data. Accordingly, the resulting sliding window size is 9 frames so that temporal blurring is unavoidable. On the other hand, images in the under-sampled domain were generated utilizing reduced view sharing as depicted in Fig. 1(b). Out of twelve patient data, four sets were used for training, which corresponds to 33,890 slices of training data. The remaining eight sets were used for testing, which corresponds to 44,850 slices of test data.

### Iv-B Network architecture

Our network consists of a single generator and a single discriminator. The generator adopts standard U-Net architecture [25], which has large receptive field and and is widely used in various tasks. More specifically, our network consists of blocks that are comprised of 33 convolution, instance normalization [29], and rectified linear unit (ReLU) [22]. Skip connections are also added with concatenation for easier gradient flow. On each stage, three blocks of convolution-instance norm-ReLU are placed except for the last stage which consists of a single convolution layer (green arrow in Fig. 3(a)). Detailed depiction of the network can be found in Fig. 3(a). As for the pooling, we adopted average pooling. The number of convolutional filters were set to 64 at the first stage, which increased 2-fold at each stage and reached 1024 at the final stage. To handle the inherent nature of MR images that are complex, the real and imaginary part of the complex data are stacked in the channel dimension. Accordingly, the number of input channels was set to 32 (16 coils 2 = 32).

For the discriminator illustrated in Fig. 3(b), we adopt PatchGAN discriminator from [8]. For stable and efficient training, our discriminator takes a single channel image data constructed by the SSoS operator from complex multi-coil image. To illustrate further, our discriminator contains three convolution layers followed by instance normalization and LeakyReLU. The first convolutional layer consists of 64 sets of convolution layer, and the number of convolution kernels in the second layer is 128. At the last layer, convolution layers is appended to compute the final feature map.

### Iv-C Network training

The hyper-parameters in (1) were set to , and . Aliased images from sub-sampled -space data with various view sharing factors (see Figs. 1(b)) were used for data in domain , whereas reconstructed multi-coil images using 2-D GRAPPA which utilizes all the view-sharing were used for since it generates best spatial resolution images. Note that the two domains do not match each other, since the temporal resolution are different. The down-sampling mask was selected randomly from one of the view sharing numbers (VS) = 2, 3, or 5. Accordingly, we could use all three types of sub-sampled data with VS=2, 3, and 5 as inputs so that we can compare various levels of spatial- and temporal- resolution to analyze its trade-off. For GRAPPA reconstruction, parameters were chosen such that optimal results could be obtained. Here, we use kernel size of 55 for interpolation.

For optimization, our network was trained with Adam optimizer [12] with momentum and . The network was trained for 50 epochs which consists of two phases. In the first phase, learning rate was specified to 0.001 for 10 epochs. Next, in the second phase learning rate was linearly degraded to 0 for the remaining 40 epochs. At each step, generator was updated 5 times per single update of the discriminator. Our batch size was set to 1. For pre-processing, each individual image was normalized with the standard deviation of the under-sampled image. The proposed network was implemented in Python using PyTorch library [23] and trained using an NVidia GeForce GTX 1080-Ti graphics processing unit. It took about 6 days for the network training.

### Iv-D Comparative Studies

As for CS reconstruction methods, ALOHA [9] and k-t SLR [20] were chosen for comparison. Reconstruction parameters for ALOHA were given as follows: annihilating filter size = 135 , 3 levels of pyramidal decomposition, decreasing LMaFit tolerance values () at each level, and ADMM parameter = . For k-t SLR, parameters were determined as: the value of p in Schatten p-norm = 0.1, regularization parameter for Schatten p-norm , and the number of outer iterations = 15.

To quantitatively compare the performance measure of the proposed algorithm as opposed to algorithms presented for comparative study, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index [32] were computed following the standards. Reconstructed outputs of our network is multi-coil complex-valued data whereas in clinical situations we need a single magnitude image. Therefore, we perform the SSoS to the multi-coil data to obtain a single image for comparison. We calculate the metrics between the images after SSoS operation. More specifically, PSNR is defined as follows:

(24) |

where stands for reconstructed sum-of-squares image and holds for noise-free sum-of-squares image (ground truth). is the maximum pixel intensity of the ground truth. The SSIM is defined to better capture the perceptual similarity between the original image and the distorted image, and is defined as follows:

(25) |

where , , and denote average and variance of , and covariance of and , respectively. For numeric stability, and are added where is the dynamic range of pixel values. We keep the default values and .

## V Results

### V-a In Vivo Results

Reconstruction results of the carotid vessel data sets from in vivo acquisition are demonstrated in Fig. 4. The temporal frames were chosen to illustrate the propagation of the contrast agent and to compare the temporal resolution. In the proposed method, a single neural network trained with various view sharing numbers is sufficient to provide VS-agnostic results. Thus, we provide reconstruction results with all different view sharing - 2, 3, and 5. Raw data shown in Fig. 4 were obtained by directly applying inverse Fast Fourier Transform (FFT) to the -space data without view sharing, which reflects the true temporal resolution despite its low visual quality at each time frame.

By inspection, we can see that the flow of contrast agent is rather abrupt in the GRAPPA reconstruction. For instance, there was a rapid propagation of contrast agent from the frame to the frame as shown in Fig. 4. This was due to the sliding window combination of several frames prior to the application of GRAPPA. Therefore, results acquired from GRAPPA reconstruction fail to follow the true temporal dynamics faithfully. Degradation of temporal dynamics gets even severer as number of view-sharing increases. On the other hand, in the reconstructed images using the proposed method, the flow of the contrast agent was captured to a fine temporal scale with VS=2. Minor temporal blurring can be seen when VS=3 as shown in Fig. 4. With VS=5, which is equal to the view sharing number used in GRAPPA, the spatio-temporal resolution of the proposed method were nearly identical to the results of GRAPPA. More specifically, inspecting the raw images carefully we can observe the location of contrast agent flow at each time frame. As the number of view sharing increases, which corresponds to severer temporal blending, flow appear at locations where it should not be seen. For instance, at we can clearly see the temporal resolution degradation with the increase in the number of view sharing. With GRAPPA reconstruction, the detail of the spread of the contrast agent was influenced by the high-frequency region in the -space so that the temporal dynamics of the future frame was erroneously incorporated in the current frame. On the other hand, our method provides high spatial resolution images for various view sharing numbers, although the spatial resolution improves with more view sharing at the cost of temporal blurring. Since we can reconstruct high quality results even from low view sharing factor, we could investigate various spatio-temporal resolution trade-off.

For quantitative analysis, the difference between the start point to peak intensity of the reconstructed images and that of the raw data was calculated using the eight patient data sets for detailed comparison of temporal dynamics. Distal internal carotid artery and sigmoid sinus were chosen for the box plots of the relative start point to peak intensity as shown in Fig. 5. As the number of view sharing frames increases, temporal dynamics is degraded. In addition, the temporal dynamics of reconstructed images with VS=5 using the proposed network had a similar or better tendency to those using GRAPPA, which implies the enhancement of temporal resolution in the reconstructed results using the proposed unsupervised learning.

### V-B Comparison with existing algorithms

We further validate the effectiveness of our unsupervised learning method in contrast to the state-of-the-art CS approaches - ALOHA and k-t SLR. We compared the performance between the CS approaches and the proposed method with VS=2. As shown in Fig. 6, reconstructed images with k-t SLR can be characterized with blurriness so that the details in reconstructed images with k-t SLR cannot be distinguished well. On the other hand, images reconstructed with ALOHA are sharper, and this sharpness is further improved using the proposed method, which is crucial in clinical situations.

Due to the lack of the ground-truth, exact quantification of the performance was difficult. Instead, we calculated the PSNR and SSIM index with respect to the GRAPPA reconstruction with VS=5 that has the most similar reconstruction. The main assumption is that after the saturation of the contrast agent, the differences in the image may be small. As shown in Fig. 6, in terms of quantitative metrics - PSNR and SSIM index, our unsupervised learning consistently outperforms ALOHA. Specifically, our method outperforms ALOHA by 0.030.46 dB in PSNR and exceeds k-t SLR with a large margin. The acceleration factor of the data to which the proposed method is applied is , which is much higher than used for GRAPPA reconstruction. However, the proposed method offers a spatial resolution comparable to that of the GRAPPA reconstruction. Furthermore, the computational time for slice using the proposed method was about 0.01 sec, while that the computational times for ALOHA, k-t SLR were 84.61, and 217.57 sec, respectively. This confirmed that the proposed method is computationally more efficient than ALOHA and k-t SLR.

## Vi Discussion

### Vi-a Ablation study

To analyze the role of individual losses in the proposed method, ablation studies were performed, by excluding the frequency loss and/or the identity loss under the same training conditions. The reconstruction results at acceleration are illustrated in Fig. 7. As we do not have ground-truth image, the most similar reconstruction images using GRAPPA with VS=5 are illustrated in in the last column.

The results provided by the network trained without and are demonstrated in the first column. In the second column and third column, the results excluding and , respectively, are illustrated. The reconstructed images using the proposed method are shown in the fourth column. The networks trained without and (Fig. 7(a)), and (Fig. 7(c)) provided more blurry results than the proposed method (Fig. 7 (d)). This confirmed that enforcing the frequency domain identity loss to maintain the acquired -space samples can help to reconstruct the details. By comparing Fig. 7(b) and Fig. 7(d), it was shown that the region with contrast dynamics can be reconstructed well without any deformation by the proposed method. In terms of PSNR and SSIM, the proposed network persistently outperformed the ablated networks. More specifically, our method is about dB better, which demonstrated the importance of and for our unsupervised learning.

### Vi-B Comparison with conventional cycleGAN

Fig. 8 shows the results by the conventional cycleGAN [8] and the proposed method using the data with VS=2. Again, due to the lack of the ground-truth data, we illustrate the GRAPPA reconstruction with VS=5 that shows the most similar results. Our optimal transport driven cycleGAN consists of a single pair of generator and discriminator, while the standard cycleGAN consists of two pairs of generator and discriminator. As shown in Fig. 8, the performance of the proposed cycleGAN was significantly better than that of the conventional cycleGAN. More specifically, the intensity of the vessels in the reconstruction using standard cycleGAN is much weaker compared to the proposed method and GRAPPA. In addition, the details of the vascular structures, which are crucial for an accurate diagnosis, are not well recognizable in the reconstructions with the standard cycleGAN, while are clearly visible in the reconstructions with the proposed cycleGAN. Since our dataset do not contain any ground-truth, PSNR and SSIM index were calculated with respect to the GRAPPA reconstruction with VS=5 for quantitative comparison between the conventional cycleGAN and the proposed cycleGAN. The proposed method is about 5.35.7 dB better compared to the conventional cycleGAN in terms of PSNR. This result is also consistent with SSIM index improvement.

This improvement is due to more stable training in our network architectures, since we only have a single pair of generator and discriminator, whereas in the standard cycleGAN two pairs are necessary which makes the training more difficult.

## Vii Conclusion

In this paper, we proposed a novel unsupervised learning method to improve the temporal resolution of tMRA imaging and generate reconstruction results at diverse numbers of view sharing. In particular, we proposed a novel cycleGAN architecture which requires only a single pair of generator and discriminator by exploiting the deterministic undersampling operation. We performed various simulation and in vivo experiments to verify the efficacy of the proposed method for TWIST imaging. Using the proposed unsupervised learning, it was demonstrated that the undersampled images with reduced view sharing can be properly reconstructed, resulting in the improvement of temporal resolution of TWIST imaging. Our network can provide various reconstruction results by easily altering the number of view sharing at the inference stage. Moreover, the proposed method can be used with the existing TWIST acquisition protocol without any modification of the pulse sequence, despite the significantly small computational complexity. Since the matching pairs of downsampled and fully sampled data were not required in our method, our method may provide an important new research direction that can significantly extend the clinical applications of tMRA.

## Proof of Theorem 1

The proof is a direct extension of the proof in [27], but we include the following for self-containment.

Using the cost given by Eqs. (10)-(13), the primal problem (14) becomes

(26) | ||||

(27) | ||||

(28) |

where and denote the optimal joint measure and marginal. Furthermore,

Here, the minimization with respect to the marginal is simple, and the technical difficulty lies in the minimization with respect to the joint measure . According to the Kantorovich dual formulation [30], we have

where the so-called c-transforms and are defined by [30]

Now, instead of finding the , we choose . Similarly, instead of finding the , we choose . This leads to an upper bound:

where

(29) | ||||

(30) |

Now, using Kantorovich potentials that satisfy (15), we have

This leads to the following lower-bound

By collecting the two bounds, we have

where is defined as

Therefore, the following primal problem of the optimal transport:

(31) |

can be equivalently represented by a constrained optimization problem:

(32) |

Using Lagrangian multiplier, the constrained optimization problem (32) can be converted to an unconstrained optimization problem:

where is a Lagrangian parameter and . By implementing the Kantorovich potential using CNNs with parameters and , i.e. and , we have the the following cycleGAN problem:

(33) |

where

where denotes the cycle-consistency loss in (29) and is the GAN loss given by:

(34) |

Note that the parameter does not affect the generator , since the forward operator is fixed. Therefore, our simplified min-max optimization problem becomes

(35) |

Since this is from the dual formulation of , the final step of the proof includes the remaining terms in (28). This leads to the following min-max problem:

(36) |

where and are defined by (19) and (20), respectively. This concludes the proof.

### References

- (2018) MoDL: model-based deep learning architecture for inverse problems. IEEE transactions on medical imaging 38 (2), pp. 394–405. Cited by: §II-A.
- (2017) Wasserstein gan. arXiv preprint arXiv:1701.07875. Cited by: §II-A.
- (2017) True Temporal Resolution TWIST Imaging using Annihilating Filter-based Low-rank wrap around Hankel Matrix. In The International Society for Magnetic Resonance in Medicine, Cited by: §I.
- (2002) Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med. 47 (6), pp. 1202–1210. Cited by: §I, §IV-A.
- (2018) Learning a variational network for reconstruction of accelerated MRI data. Magnetic resonance in medicine 79 (6), pp. 3055–3071. Cited by: §I.
- (2018) Deep learning with domain adaptation for accelerated projection reconstruction MR. Magnetic Resonance in Medicine, https://doi.org/10.1002/mrm.27106. Cited by: §I.
- (2019) k-space deep learning for accelerated MRI. IEEE transactions on medical imaging. Cited by: §I.
- (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: §IV-B, §VI-B.
- (2016) A general framework for compressed sensing and parallel MRI using annihilating filter based low-rank Hankel matrix. IEEE Transactions on Computational Imaging 2 (4), pp. 480–495. Cited by: §I, §I, §IV-D.
- (2009) k-t FOCUSS: A general compressed sensing framework for high resolution dynamic MRI. Magn. Reson. Med. 61 (1), pp. 103–116. Cited by: §I.
- (2019) Advanced methods in dynamic contrast enhanced arterial phase imaging of the liver. Investigative Magnetic Resonance Imaging 23 (1), pp. 1–16. Cited by: §I.
- (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §IV-C.
- (2017) A parallel MR imaging method using multilayer perceptron. Medical physics 44 (12), pp. 6209–6224. Cited by: §I.
- (2006) syngo TWIST for dynamic time-resolved MR angiography. Magnetom Flash 34 (3), pp. 92–95. Cited by: §I.
- (2016) Acceleration of mr parameter mapping using annihilating filter-based low rank hankel matrix (aloha). Magnetic resonance in medicine 76 (6), pp. 1848–1864. Cited by: §I, §I.
- (2018) Deep residual learning for accelerated MRI using magnitude and phase networks. IEEE Transactions on Biomedical Engineering. Cited by: §I.
- (2016) Reference-free single-pass EPI Nyquist ghost correction using annihilating filter-based low rank Hankel matrix (ALOHA). Magnetic resonance in medicine 76 (6), pp. 1775–1789. Cited by: §I.
- (2008) 3D time-resolved MR angiography (MRA) of the carotid arteries with time-resolved imaging with stochastic trajectories: comparison with 3D contrast-enhanced bolus-chase MRA and 3D time-of-flight MRA. American Journal of Neuroradiology 29 (10), pp. 1847–1854. Cited by: §I.
- (2019) CycleGAN with a Blur Kernel for Deconvolution Microscopy: Optimal Transport Geometry. arXiv preprint arXiv:1908.09414. Cited by: §I.
- (2011) Accelerated dynamic MRI exploiting sparsity and low-rank structure: kt SLR. IEEE Trans. Med. Imag. 30 (5), pp. 1042–1054. Cited by: §I, §IV-D.
- (2007) Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn. Reson. Med. 58 (6), pp. 1182–1195. Cited by: §I.
- (2010) Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814. Cited by: §IV-B.
- (2017) Automatic differentiation in pytorch. Cited by: §IV-C.
- (2019) Computational optimal transport. Foundations and Trends® in Machine Learning 11 (5-6), pp. 355–607. Cited by: §I, §II-A.
- (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §IV-B.
- (2018) A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE transactions on Medical Imaging 37 (2), pp. 491–503. Cited by: §I.
- (2019) Optimal transport, cyclegan, and penalized ls for unsupervised learning in inverse problems. arXiv preprint arXiv:1909.12116. Cited by: Proof of Theorem 1, §I, §II-A, §II-A, §II-A, §II-A, §III-B.
- (2009) Dynamic contrast-enhanced MRI in the diagnosis and management of breast cancer. NMR in Biomedicine 22 (1), pp. 28–39. Cited by: §I.
- (2016) Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §IV-B.
- (2008) Optimal transport: old and new. Vol. 338, Springer Science & Business Media. Cited by: Proof of Theorem 1, §I, §II-A.
- (2016) Accelerating magnetic resonance imaging via deep learning. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pp. 514–517. Cited by: §I.
- (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §IV-D.
- (2009) Dynamic contrast enhanced magnetic resonance imaging in oncology: theory, data acquisition, analysis, and examples. Current medical imaging reviews 3 (2), pp. 91. Cited by: §I.
- (2017) Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3929–3938. Cited by: §II-A.
- (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §I.