Coupled Dictionary Learning for Multi-contrast MRI Reconstruction
Abstract
Medical imaging tasks often involve multiple contrasts, such as T1- and T2-weighted magnetic resonance imaging (MRI) data. These contrasts capture information associated with the same underlying anatomy and thus exhibit similarities. In this paper, we propose a Coupled Dictionary Learning based multi-contrast MRI reconstruction (CDLMRI) approach to leverage an available guidance contrast to restore the target contrast. Our approach consists of three stages: coupled dictionary learning, coupled sparse denoising, and -space consistency enforcing. The first stage learns a group of dictionaries that capture correlations among multiple contrasts. By capitalizing on the learned adaptive dictionaries, the second stage performs joint sparse coding to denoise the corrupted target image with the aid of a guidance contrast. The third stage enforces consistency between the denoised image and the measurements in the -space domain. Numerical experiments on the retrospective under-sampling of clinical MR images demonstrate that incorporating additional guidance contrast via our design improves MRI reconstruction, compared to state-of-the-art approaches.
Pingfan Song Lior Weizman João F. C. Mota Yonina C. Eldar Miguel R. D. Rodrigues
^{†}^{†}thanks:
This work was supported by the Royal Society International Exchange Scheme IE160348, by the European Union’s Horizon 2020 grant ERC-BNYQ, by the Israel Science Foundation grant no. 335/14, by ICore: the Israeli Excellence Center ’Circle of Light’, by the Ministry of Science and Technology, Israel, by UCL Overseas Research Scholarship (UCL-ORS) and by China Scholarship Council (CSC).
\address
Department of Electronic and Electrical Engineering, University College London, UK
Department of Electrical Engineering, Technion – Israel Institute of Technology, Israel
School of Engineering and Physical Sciences, Heriot-Watt University, UK
\ninept
multi-contrast MRI, coupled dictionary learning, coupled sparse denoising, guidance information
1 Introduction
Magnetic Resonance Imaging (MRI) is a noninvasive and non-ionizing medical imaging technique that has been widely used for medical diagnosis, clinical analysis, and staging of disease. Owing to its versatility, different MRI pulse sequences produce images with different contrasts, such as Fluid-attenuated inversion recovery (FLAIR), T1-weighted, and T2-weighted. Each contrast images different physical properties of the tissues examined [1, 2]. The acquisition time of conventional brain multi-contrast MRI is at least 30 minutes, which can lead to discomfort in some patients and requires sedation for pediatric patients. Different image constrasts, however, are highly correlated, because they image the same underlying anatomy [3]. Such correlation can potentially be used to shorten acquisition time by partial acquisition of the target contrast, followed by reconstruction that takes into account other fully-sampled contrasts as guidance/reference.
MRI reconstruction from under-sampled measurements has been thoroughly investigated in the case of single contrast acquisition. The pioneering framework proposed by Lustig et al. [4], motivated by compressive sensing theory [5, 6, 7, 8], uses the fact that MR images are sampled in the spatial frequency domain (a.k.a. -space) and can be represented as a sparse combination of fixed, predefined bases, for example, wavelets. Building on this work, Ravishankar et al. [9] proposed a dictionary learning based MRI reconstruction approach, DLMRI, which is based on the fact that patches of MR images can be sparsely represented with respect to a set of adaptive learned bases [10, 11]. Those adaptive bases contribute to improved performance of DLMRI over SparseMRI [4]. Both SparseMRI and DLMRI consider a single contrast.
The reconstruction of an MRI target contrast based on the availability of other contrasts has also been recently investigated in some works [12, 13, 3, 14, 15, 16, 17, 18]. For example, for the case where one contrast (the "target contrast") is under-sampled and the other is fully sampled and serves as guidance/reference, Weizman et al. [12, 13] proposed reference-based MRI, that exploits gray level similarity between T2-weighted and FLAIR, to reconstruct the target contrast (FLAIR) given the guidance contrast (T2-weighted). Their approach is specific to contrasts with gray level similarity, not to structural similarity which is our focus here. Ehrhardt et al. [3, 14] propose an approach, STVMRI, that exploits structure-guided total variation to integrate the location and direction priors from a guidance contrast into the reconstruction of the target contrast.
Inspired by DLMRI [9], we propose a coupled dictionary learning approach for multi-contrast MRI reconstruction, referred to as CDLMRI, for contrasts that exhibit structural similarity (e.g. T1-weighted and T2-weighted), a scenario more general than the one addressed in [12]. Specifically, our approach cycles between three stages: coupled dictionary learning, coupled sparse denoising and -space consistency enforcing. The first stage learns a group of dictionaries that capture inherent structural similarity on textures, edges, boundaries, or other salient features across multiple contrasts. The second stage performs joint sparse coding using the learned adaptive dictionaries to denoise the corrupted target image with the aid of a guidance contrast. The third stage enforces consistency between the denoised image and the measurements in the -space domain. Our approach shows significant advantages over the competing methods, DLMRI [9] and STVMRI [3] both in visual quality, and in peak signal-to-noise ratio (PSNR).
2 Problem Formulation
We denote by the vectorized 2D MR imaging contrast of size to be reconstructed. The vector denotes the under-sampled -space measurements related to , and the matrix denotes the corresponding under-sampled Fourier transform matrix. In addition, we assume that a fully-sampled guidance MR imaging contrast is available. Our goal is to reconstruct from its -space samples under the aid of the guidance image .
2.1 Data Model for Multi-contrast MRI Data
As we would like to utilize the structural similarity between and , we first propose a data model that captures this similarity. Our data model works with image patches, instead of the entire image level, because a patch-based model is able to capture local image features effectively, as shown in other applications such as image denoising, super-resolution, inpainting, deblurring and demosaicing [19, 20, 21, 22, 23, 24, 25].
Let and denote the vector representations of image patch pairs of size extracted from the image and , respectively, where the tuple denotes the coordinates of the top-left corner of the patches within the images. Formally, we write (resp. ), where the matrix represents the operator that extracts patch (resp. ) from (resp. ). In order to capture both the similarity and discrepancy between two different contrasts, we assume that each patch pair can be represented by a sum of two sparse representations: a common sparse component that is shared by both contrasts, and a unique sparse component for each contrast. In particular, we express the patch pair as follows:
(1) | ||||
(2) |
where is the common sparse representation shared by both contrasts, is a sparse representation specific to contrast , and is a sparse representation specific to contrast ; in addition, and are a pair of dictionaries associated with the common sparse representation , whereas and are dictionaries associated with the sparse representations and , respectively. Note that, this model can be generalized for and to have different number of atoms.
2.2 Cdlmri
Our goal is to leverage the proposed data model (1)-(2) in order to recover the target MRI contrast given the guidance MRI contrast and the -space measurements . To this end, we propose Coupled Dictionary Learning for multi-contrast MRI reconstruction algorithm that attempts to solve the following optimization problem
(3) |
Note that the first two terms in the objective ensure that the image patches are consistent with their postulated model and the third term in the objective ensures that the target image is consistent with its -space measurements; the parameter balances between model and measurements fidelity. Moreover, the first set of constraints induce sparsity for the vectors and the second set of contraints normalizes the atoms of the dictionaries in order to remove the scaling ambiguity and avoid trivial solutions.
Problem (3) is highly nonconvex. Therefore, we attempt to solve this problem by alternating between three stages over a number of cycles: 1) Coupled dictionary learning, 2) Coupled sparse denoising, and 3) -space consistency enforcing, as shown in Algorithm 1.
Stage 1) Coupled Dictionary Learning. In the first cycle, is initialized as (i.e. is set to be equal to the inverse DFT of the zero-filled Fourier measurements). In the remaining cycles, in this stage, will be the output of Stage 3) from the previous cycle. In particular, for fixed , we attempt to solve (3) via alternating minimization where in a first step we update for fixed and in a second step we update for fixed . As shown in Algorithm 2, the sparse coding step is addressed using the orthogonal matching pursuit (OMP) algorithm [26, 21] and dictionary update step is adapted from the Block Coordinate Descent [27]. Note that, since a single image consists of large amount of patches, we only use a subset of the patches to constitute the training dataset and in Stage 1) to save training time.
Stage 2) Coupled Sparse Denoising. As the sparse representations computed in Stage 1) are associated only with a subset of the collection of image patches, it is necessary to perform one additional sparse denoising stage. In addition, we also introduce linearly decreasing error thresholds to fine tune the sparse representations, since this operation has been shown experimentally to improve the de-aliasing and denoising performance. In particular, we give priority to perform the following optimization in order to determine an approximation to the common sparse representations associated with the various image patches:
(4) |
We then perform the following optimization in order to determine an approximation to the unique sparse representations associated with the various target image patches:
(5) |
Here and denote the expected error thresholds which are used, together with and , in OMP as the stopping criteria. The above formulations imply that once the objective value for -th patch decreases below the expected error threshold, there is no need to find a better point and so we terminate the OMP loop. Otherwise, the OMP program keeps iterating until (resp. ) non-zero values are retrieved for (resp. ). In addition, as the quality of the target image improves along the entire cycles, we decrease the thresholds and linearly at each cycle. This strategy significantly accelerates the convergence speed, as well as allows us to dynamically control the real sparsity of each patch more effectively.
Given the sparse representations and , we obtain each denoised patch as , where and are learned dictionaries in the Stage 1).
Stage 3) -space Consistency Enforcing. Finally, in this stage, we enforce the consistency between the denoised image and its measurements in the -space domain, similar to DLMRI [9]. In particular, given the estimated patches from Stage 2), this step is formulated as a least squares problem:
(6) |
By assuming that patches wrap around at image boundaries (which implies that the number of overlapping patches occurring at each pixel is equal), we immediately obtain the solution:
(7) |
where denotes the conjugate of the Fourier transform matrix; denotes the estimated -space samples and can be expressed as follows:
(8) |
where , represents the denoised image, and denotes the number of overlapping patches at the corresponding pixel location in . We denote by the subset of -space that has been sampled and by the updated value at location () in the -space.
The overall process consisting of the three stages is repeated over a number of cycles, which has been summarized in Algorithm 1.
3 Experiments
In this section, we conduct some practical experiments to evaluate the performance of the proposed algorithm. Similar to previous approaches[28, 29, 30, 9, 3, 12], the data acquisition was simulated by retrospectively under-sampling the 2D discrete Fourier transform of clinical magnitude MR images. The sampling masks include Cartesian 1D and 2D random sampling. We compare the proposed approach with DLMRI [9] to show the benefits of integrating guidance information into the MRI reconstruction task. DLMRI [9] is also based on dictionary learning techniques, but it does not use a guidance contrast to aid the reconstruction of the target contrast. We also compare with SVTMRI [3] which is also based on the use of a guidance contrast to aid the reconstruction of the target one.
In the experiments, we set , , , , , , (meaning: is set to 0.1 in the beginning and linearly decreases to 0.005 along the cycles.), , , (for noise-free situation), and undersampling factor = 4 fold and 20 fold for the Cartesian 1D and 2D random sampling, respectively.
The visual performance is shown in Fig. 1. It can be seen that the reconstructed image and corresponding residual from DLMRI [9] introduce noticeable aliasing, noise and blurred areas. In comparison, the edges and outlines in the reconstructed image from STVMRI [3] are very sharp, thereby more visually appealing in some high-frequency areas. However, notice that some areas in the results of STVMRI [3] have been over-sharpened, thus introducing nonnegligible artifacts. In contrast, our approach substantially attenuates aliasing and noise and, at the same time, reliably restores fine details and suppresses artifacts, leading to a more comprehensive and interpretable reconstruction. The performance improvement is also demonstrated by the PSNR values.
Fig. 2 shows the learned coupled dictionaries from the T1- and T2- weighted MRI images. It can be seen that the atom pairs from common dictionaries capture associated edges, blobs and textures with the same direction and location. Most of them exhibit considerable resemblance to each other, but with opposite intensity. This phenomenon is consistent with MRI characteristics, such as Cerebrospinal fluid (CSF) being dark in T1-weighted contrast and bright in T2-weighted. This outcome indicates that the common dictionaries are able to capture the similarity between T1-weighted and T2-weighted contrasts. In comparison, the learned unique dictionaries represent the disparities of these modalities and therefore rarely exhibit similarity.
4 Conclusion
We presented an adaptive multi-contrast MRI reconstruction framework that capitalizes on both patch-based sparsity priors induced by coupled dictionaries and structure similarity priors from the guidance contrast. The coupled dictionaries are trained directly on the target images and thus are adaptive to the contrast of interest. In addition, they also capture the correlations between T1- and T2-weighted contrasts, thereby beneficial information can be extracted from the guidance contrast to aid the reconstruction of the target contrast. Practical experiments demonstrate the superior performance of our design and significant advantage over competing methods. In the future, we will adapt our algorithm for other practical sampling patterns, such as radial sampling and also explore the impact of noise.
References
- [1] Z.-P. Liang and P. C. Lauterbur, Principles of magnetic resonance imaging: a signal processing perspective. SPIE Optical Engineering Press, 2000.
- [2] D. W. McRobbie, E. A. Moore et al., MRI from Picture to Proton. Cambridge university press, 2007.
- [3] M. J. Ehrhardt and M. M. Betcke, “Multicontrast MRI reconstruction with structure-guided total variation,” SIAM Journal on Imaging Sciences, vol. 9, no. 3, pp. 1084–1106, 2016.
- [4] M. Lustig, D. Donoho et al., “Sparse MRI: The application of compressed sensing to rapid MR imaging,” Magn. Reson. Med., vol. 58, pp. 1182–1195, 2007.
- [5] D. Donoho, “Compressed sensing,” IEEE Trans. Info. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
- [6] E. Candès, J. Romberg et al., “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Info. Theory, vol. 52, no. 2, pp. 489–509, 2006.
- [7] Y. C. Eldar and G. Kutyniok, Compressed sensing: theory and applications. Cambridge University Press, 2012.
- [8] Y. C. Eldar, Sampling Theory: Beyond Bandlimited Systems. Cambridge University Press, 2015.
- [9] S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, 2011.
- [10] B. A. Olshausen et al., “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
- [11] M. Aharon, M. Elad et al., “K-svd: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, 2006.
- [12] L. Weizman, Y. C. Eldar et al., “Reference-based MRI,” Medical physics, vol. 43, no. 10, pp. 5357–5369, 2016.
- [13] L. Weizman, Y. C. Eldar et al., “Compressed sensing for longitudinal MRI: An adaptive-weighted approach,” Medical physics, vol. 42, no. 9, pp. 5195–5208, 2015.
- [14] M. Ehrhardt, “Joint reconstruction for multi-modality imaging with common structure,” Ph.D. dissertation, UCL (University College London), 2015.
- [15] J. Huang, C. Chen et al., “Fast multi-contrast mri reconstruction,” Magnetic resonance imaging, vol. 32, no. 10, pp. 1344–1352, 2014.
- [16] B. Bilgic, V. K. Goyal et al., “Multi-contrast reconstruction with bayesian compressed sensing,” Magn. Reson. Med., vol. 66, no. 6, pp. 1601–1615, 2011.
- [17] X. Qu, Y. Hou et al., “Magnetic resonance image reconstruction from undersampled measurements using a patch-based nonlocal operator,” Medical image analysis, vol. 18, no. 6, pp. 843–856, 2014.
- [18] I. Chatnuntawech, A. Martin et al., “Vectorial total generalized variation for accelerated multi-channel multi-contrast mri,” Magnetic resonance imaging, vol. 34, no. 8, pp. 1161–1170, 2016.
- [19] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, 2006.
- [20] P. Chatterjee and P. Milanfar, “Patch-based near-optimal image denoising,” IEEE Trans. Imag. Proc., vol. 21, no. 4, pp. 1635–1649, 2012.
- [21] J. Mairal, F. Bach et al., “Sparse modeling for image and vision processing,” Foundations and Trends® in Computer Graphics and Vision, vol. 8, no. 2-3, pp. 85–283, 2014.
- [22] J. Mairal, F. Bach et al., “Non-local sparse models for image restoration,” in Proc. IEEE Int. Conf. Comput. Vision. IEEE, 2009, pp. 2272–2279.
- [23] R. Timofte, V. De Smet et al., “Anchored neighborhood regression for fast example-based super-resolution,” in Proc. IEEE Int. Conf. Comput. Vision. IEEE, 2013, pp. 1920–1927.
- [24] R. Timofte, V. De Smet et al., “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Proc. Asian Conf. Comput. Vision. Springer, 2014, pp. 111–126.
- [25] J. Yang, J. Wright et al., “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, 2010.
- [26] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
- [27] J. Mairal, F. Bach et al., “Online learning for matrix factorization and sparse coding,” The Journal of Machine Learning Research, vol. 11, pp. 19–60, 2010.
- [28] S. Ma, W. Yin et al., “An efficient algorithm for compressed mr imaging using total variation and wavelets,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
- [29] J. Yang, Y. Zhang et al., “A fast alternating direction method for tvl1-l2 signal reconstruction from partial fourier data,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 288–297, 2010.
- [30] J. Trzasko and A. Manduca, “Highly undersampled magnetic resonance image reconstruction via homotopic -minimization,” IEEE Trans. Med. Imag., vol. 28, no. 1, pp. 106–121, 2009.
Appendix A: CDLMRI
We provide more details in relation to Section 2.2 CDLMRI in this appendix.
Stage 1) Coupled Dictionary Learning (details). The dictionary update step is to solve the optimization problem:
(9) |
Given a subset of the patches to constitute the training dataset and in Stage 1), the optimization problem (9) is equivalent to:
where, , , . Taking the dictionary update of and for example, we update the atom pairs one by one. For the -th atom pair and , we can immediately establish that
By expanding the Frobenius norm and removing the constant term, it turns out that the above problem is equivalent to the optimization problem
where denotes the -th row of .^{1}^{1}1Note that is a row vector resulting from the derivative w.r.t the -th atom pair, while is a column vector corresponding to the -th patch pair. We compute the derivative of the objective w.r.t. , leading to a norm equation:
Then, we apply the norm constraint.
The dictionary update of and is performed in a similar way. In order to accelerate the training, the proposed algorithm can be updated to online training version without difficulty.
Stage 3) -space Consistency Enforcing (details). In this stage, we aim to enforce consistency between the denoised image and its measurements in the -space domain. In particular, given the estimated patches from Stage 2), this step is formulated as a least square problem:
(10) |
which admits an analytical solution satisfying the normal equation
(11) |
where the superscript denotes the Hermitian transpose operation. The term is a diagonal matrix where each diagonal entry is the number of overlapping patches at the corresponding pixel location in . Assuming that patches wrap around at image boundaries, the number of overlapping patches at each pixel is the same, denoted by .^{2}^{2}2In particular, when the overlap stride , where the overlap stride is defined as the distance in pixels between corresponding pixel locations in adjacent image patches. Thus, the term represents the denoised image , where the intensity value of each pixel is the average of all the overlapping patches that cover this pixel. Multiplying by the normalized full Fourier transform matrix on the both sides of equation (11) leads to
(12) |
The matrix is a diagonal matrix consisting of ones (corresponding to sampling locations in -space) and zeros. Under the "wrap around" assumption, . Thus, the matrix pre-multiplying in (12) is diagonal and trivially invertible. The vector represents the zero-filled Fourier measurements. Dividing both sides of (12) by the constant to obtain
where , denotes the denoised image. We denote by the subset of -space that has been sampled and by the updated value at location () in the -space. Note that (8) uses the dictionaries that were learned in Stage 1) to interpolate the non-sampled Fourier frequencies, and update the sampled frequencies. Then, we immediately obtain the solution:
(13) |
where denotes the conjugate Fourier transform matrix. denotes the estimated -space samples as in (8). In other words, the estimation is obtained by inverse DFT of . Then the process returns to the Stage 1). The whole process is shown in Algorithm 1.
Appendix B: More Experiments
Tissue can be characterized by two different relaxation times – T1 (longitudinal relaxation time) and T2 (transverse relaxation time).^{3}^{3}3T1 (longitudinal relaxation time) is a measure of the time taken for excited spinning protons to realign with the external magnetic field and return to equilibrium. T2 (transverse relaxation time) is a measure of the time taken for excited spinning protons to lose phase coherence among the nuclei spinning perpendicular to the main field. T1-weighted and T2-weighted pair of MRI scans are two basic types of multi-contrast data, where the former is produced by using short TE and TR times and conversely the latter is produced by using longer TE (Time to Echo) and TR (Repetition Time) times. In general, T1-weighted MRI images results in highlighted/bright fat tissue, such as subcutaneous fat (SC fat) and bone marrow, and suppressed/dark water-based tissue, such as Cerebrospinal fluid (CSF). In contrast, T2-weighted MRI images highlight both fat tissue and water-based tissue. Therefore, the correlation of T1-weighted and T2-weighted is complex, instead of simple reverse mapping relationship.
In this experiment, we use under-sampled T1-weighted MRI as the target contrast and corresponding fully-sampled T2-weighted as the guidance contrast to replicate the same scenario as in [3]. Similar to previous approaches[28, 29, 30, 9, 3, 12], the data acquisition was simulated by retrospectively under-sampling the 2D discrete Fourier transform of clinical magnitude MR images.^{4}^{4}4After using the Fourier transform to transform measured k-space data into image space, the image data is of complex type, which is then manipulated for different clinical utility. In clinical practice, magnitude images are nearly exclusively used for diagnosis as it maximizes the signal-to-noise ratio (SNR). Phase-images are occasionally generated in clinical MRI for the depiction of flow and characterization of susceptibility-induced distortions. Therefore, from the perspective of diagnosis, we focus on the magnitude images. The sampling masks include Cartesian 1D and 2D random sampling. We compare the proposed approach with DLMRI [9] to show the benefits of integrating guidance information into the MRI reconstruction task. We also compare with SVTMRI [3] which uses the structure-guided total variation to integrate the guidance contrast to aid the reconstruction of the target one.Figure 3 and 4 show reconstruction results for the scenario where a variable density Cartesian mask is employed for under-sampling on the target T1-weighted contrast, with a fully sampled T2-weighted MRI for guidance contrast.