Deep MR Brain Image Super-Resolution Using Spatio-Structural Priors
Abstract
High-resolution Magnetic Resonance (MR) images are desired for accurate diagnostics. In practice, image resolution is restricted by factors such as hardware and processing constraints. Recently, deep learning methods have been shown to produce compelling state-of-the-art results for image enhancement and super-resolution. Paying particular attention to the desired high-resolution MR image structure, we propose a new regularized network that exploits image priors, namely a low-rank structure and a sharpness prior, to enhance deep MR image super-resolution (SR). Our contributions are the incorporation of these priors in an analytically tractable fashion and a novel prior-guided network architecture that accomplishes the super-resolution task. This is particularly challenging for the low-rank prior since the rank is not a differentiable function of the image matrix (and hence the network parameters), an issue we address by pursuing differentiable approximations of the rank. Sharpness is emphasized by the variance of the Laplacian, which we show can be implemented by a fixed feedback layer at the output of the network. As a key extension, we modify the fixed feedback (Laplacian) layer by learning a new set of training-data-driven filters that are optimized for enhanced sharpness. Experiments performed on publicly available MR brain image databases and comparisons against existing state-of-the-art methods show that the proposed prior-guided network offers significant practical gains in terms of improved SNR/image quality measures. Because our priors are on output images, the proposed method is versatile and can be combined with a wide variety of existing network architectures to further enhance their performance.
I Introduction
High-resolution (HR) MR images provide rich structural information about bodily organs, which is critical in analyzing any given medical condition. Often, the quality of these images is restricted by factors such as imaging hardware, sensor noise, budget, and time constraints. In such scenarios, the spatial resolution of these images can be enhanced by a well-designed mathematical algorithm. Simple and fast interpolation methods such as bilinear and bicubic interpolation [1] have been widely used for increasing the size of low-resolution (LR) medical images. In many cases, these methods are known to introduce blurring, blocking artifacts and ringing, and are thus unable to recover sharp details of an image. To alleviate this problem, an alternative approach known as super-resolution (SR) was introduced in [2]. Current literature on SR can be classified into two categories: multi-image SR and single-image SR.
In multi-image SR [2, 3], an HR image is generated by exploiting the information from multiple LR images that are acquired from the same scene with a slightly shifted field of view. However, these methods are likely to fail if an adequate number of LR images from the same scene is not available. As an alternative, single-image SR was introduced, wherein multiple LR images from the same scene are not required to obtain an HR image. In this approach, a mapping between LR and HR images is learned by constructing examples from a given database [4, 5, 6, 7, 8, 9].
Recently, deep learning methods have been shown to produce compelling state-of-the-art results [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] for single-image SR. Invariably though, the training requirement of deep networks, i.e. the number of example LR and HR image (or patch) pairs, is quite significant. In some medical diagnosis problems, obtaining ample LR and HR pairs is not a problem, but there are compelling real-world problems, such as enhancing 3T MR to 7T MR images [8, 24], where the paucity of training data has been recognized. There has been encouraging recent application of deep networks to MR image SR [25, 26, 27, 28, 29], but the methods remain training intensive. An outstanding open challenge for deep MR image super-resolution is the development of methods that exhibit a graceful degradation with respect to (w.r.t.) the number of training LR and HR image pairs.
Our approach to improving deep MR image super-resolution for all training regimes is via the exploitation of suitable structural prior information pertinent to MR images. In [30], a model-based SR approach is presented that uses low-rank (approximated by the nuclear norm) and total variation regularizers. Despite the promise shown by a low-rank prior, incorporating a low-rank constraint or even its nuclear norm relaxation in a deep network for SR presents a stiff analytical challenge, since neither is a differentiable function of the image matrix (and hence the network parameters). Our contribution includes incorporating a suitable approximation to the rank, which is smooth, differentiable and amenable to learning in a deep CNN framework. Additionally, recognizing the need for well-formed sharp edges in diagnosis, we propose a sharpness prior realized via a variance-of-the-Laplacian measure, which adds to the network structure at the output as a fixed feedback layer. As we bring prior information into a deep network, we call our method Deep Network with Spatio-Structural Priors (DNSP). As a key extension, we modify the fixed feedback (Laplacian) layer by learning a new set of training-data-driven filters that are optimized for enhanced sharpness.
Another class of approaches for single-image SR, called self-image SR [31, 32], has been developed recently. Self-image SR has been adapted to enhance the resolution of MR images [33, 34]. These approaches exploit the fact that MR images are inherently anisotropic to learn a regression between LR and HR images. Thus, they generate new additional images, each of which is LR along a certain direction but HR in the plane normal to it. Further, these approaches improve the resolution across the z axis, assuming the images in the axial plane are HR. In contrast, our approach focuses on improving the resolution of images in the axial plane by exploiting suitable structural information of MR images, which differentiates it from the aforementioned methods.
Contributions: While most of the existing deep learning methods for MR image SR focus on learning an end-to-end relationship between LR and HR images, our goal is to enrich this deep learning framework by bringing in suitable structural information via informative priors and incorporating them in an analytically tractable fashion. Specifically, our key contributions are as follows:

Novel Prior-Guided Network Structure: We propose a new network structure that consists of two components: 1) a regression network that maps LR to HR images, and 2) a prior information network component that guides the learning of the regression network during the training phase. Note that only the regression network is used in the test (inference) stage to map LR to HR images.

Incorporating Spatio-Structural Priors: We impose a low-rank constraint on the output of the deep network. Evidence for brain MR images being low-rank has been provided recently [30]. However, incorporating a low-rank constraint or even its nuclear norm relaxation into a deep learning framework is not straightforward, as neither function is differentiable. We provide a solution to integrate a low-rank constraint into a deep network by approximating the rank function with a smooth and differentiable function. We further incorporate a spatially based sharpness prior defined as the variance of the Laplacian computed on the network output (image). The Laplacian can be implemented via a linear convolution with a fixed filter; subsequently, the variance is computed to yield a regularization term.

Data Adaptive Filters to Enhance Sharpness: We further extend the aforementioned two contributions by learning a series of filters that are aimed at enhancing the sharpness of the output image. We develop new data adaptive regularizers which ensure that the learned sharpness filters are physically meaningful.

Novel Regularized Loss Function: Analytically, to integrate the proposed priors, we introduce three new regularization terms in the loss function along with the standard reconstruction loss term. The first regularization term poses a low rank prior, the second is a sharpness prior, while the third constrains the filters that replace the Laplacian and are aimed at enhancing sharpness. Further, backpropagation equations for optimizing network parameters w.r.t the regularized loss function are derived in a form that is implementation friendly.

Experimental validation and reproducibility: Experimental validation of our method is carried out on two publicly available databases: 1) Brainweb (BW) (http://brainweb.bic.mni.mcgill.ca/brainweb/) and 2) the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu/). We compare DNSP against several state-of-the-art methods that are used for MR image SR. We also provide the entire code of our experiments for the purpose of reproducibility at https://scholarsphere.psu.edu/concern/generic_works/9s4655g25h.
A preliminary version of this work was presented as a 4-page conference paper at the 2018 IEEE International Conference on Image Processing [35]. (Our extensions from conference (4 pages) to journal (13 pages) are consistent with the IEEE Signal Processing Society guidelines: https://signalprocessingsociety.org/publicationsresources/informationauthors.) The present draft involves substantial conceptual and experimental extensions, including: 1) we evolve the fixed Laplacian layer into a learnable one that computes an enhanced sharpness measure, 2) a more detailed analytical development is presented, including backpropagation derivations under the new regularizers, 3) we integrate the aforementioned priors with the most competitive deep network architectures that are used for image SR, thereby demonstrating the versatility of our method, and 4) we significantly expand experiments by comparing against many new state-of-the-art methods. Results are also presented for several variants of DNSP and in many new scenarios (training and test selection).
The rest of the paper is organized as follows. The proposed prior-guided deep network for MR image SR is explained in Section II. Extensions of DNSP to include learnable sharpness filters are subsequently presented in Section III. Detailed experimental validation against state-of-the-art methods is reported in Section IV. Finally, our work is summarized and concluded in Section V.
II Spatio-Structural Priors for Deep MR Image Super-Resolution
We first introduce the notation that is followed throughout the rest of the paper and then give a brief introduction to deep networks for image SR before describing our DNSP method.
II-A Notation
Let $X \in \mathbb{R}^{w \times h}$ represent the LR image, where $w$ and $h$ are the width and height of the image respectively. Let $Y \in \mathbb{R}^{sw \times sh}$ be the output HR image, where $s$ is the desired scale by which $X$ needs to be upscaled, and let $\hat{Y}$ be the ground-truth HR image for $X$. Let $W_l \in \mathbb{R}^{f_w \times f_h \times f_d}$ be the convolutional filter in layer $l$, where $f_w$, $f_h$ and $f_d$ represent the width, height and depth of the filter respectively. Similarly, let $b_l$ be the bias coefficient of layer $l$. The objective of the network is to learn $W_l$ and $b_l$ so that the output $Y$ of the network is a close representation of the ground truth $\hat{Y}$. So, let $\Theta = \{W_l, b_l\}$. To make the size of the input and output of the network the same, we first upscale $X$ by a factor of $s$ using bicubic interpolation and use this upscaled $X$ as input to the network. Finally, let the mapping function of the network be represented by $F(X, \Theta)$, where $Y = F(X, \Theta)$.
II-B Deep CNNs for SR
Deep learning methods are a class of machine learning methods inspired by biological neural networks. In general, a cascade of many nonlinear processing units is used to learn features that represent data effectively for a given task. In particular, a deep CNN for image SR usually consists of two or more convolutional layers (each layer is essentially a combination of filters followed by an activation function), which are used to learn an end-to-end mapping between sample LR and HR image pairs. For example, Fig. 1 illustrates the SRCNN network [11, 27] for super-resolution, which is known to be the most widely used deep SR network. Following in these footsteps, many new architectures have been designed for image SR, which have shown considerable gains in terms of performance [10, 12, 14, 15, 16, 17, 18, 21, 22, 23]. Each convolutional layer in the network consists of several learnable filters, which are convolved with the output from the previous layer. For a given layer, the outputs obtained by convolving with each filter are combined to form a data cube, which is passed through a nonlinear activation function and then forwarded as input to the next layer [36]. The most commonly used activation function in deep networks is the rectified linear unit (ReLU) [37]. The input to the first layer is the image obtained after bicubic interpolation, and the output of the last layer is the expected HR image. The filters are learned to minimize the loss function given by:
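$$\min_{\Theta} \; \| \hat{Y} - F(X, \Theta) \|_F^2 \tag{1}$$
where $\hat{Y}$ is the ground-truth HR image, $F(X, \Theta)$ is the network output, and $\| \cdot \|_F$ represents the Frobenius norm.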
II-C Deep Network with Spatio-Structural Priors (DNSP)
As discussed in Section I, we integrate two priors into the learning of the CNN. Note that both priors are applied to the network output $Y$, as it represents the desired output HR image. The two priors are as follows:
Low Rank Prior: It has been demonstrated recently [30, 38, 9, 39] that MR images are naturally rank deficient. For example, Figure 2 shows several low-rank versions of an MR image reconstructed from partial singular value decomposition (SVD) approximations. We can observe that the recovered image with a rank of 90, which is approximately half of the full rank (170) of the image matrix, still exhibits a Peak Signal to Noise Ratio (PSNR) of about 45 dB. Further, the reconstruction is visually indistinguishable from the original image. We wish to emphasize that an image being low-rank implies that the effective rank of the matrix is low. For example, it can be observed from Figure 2 that the change in PSNR over the rank range 110-120 is negligible relative to the change over the rank range 90-110. Hence the effective rank of this particular image can be argued to be between 115 and 120, which is much smaller than the full rank of 170. The rank of an image captures its global structure. An effective low rank implies that the image adheres to some structural properties, such as near symmetry, which can be observed in brain images. Hence, a low-rank constraint is effective in recovering the global structure of a given brain image.
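The partial-SVD experiment described for Figure 2 can be sketched in a few lines of NumPy. This is only an illustrative sketch: the 170 x 170 array below is a synthetic stand-in (a smooth pattern plus mild noise), not an MR slice, and the helper names `low_rank_approx` and `psnr` are hypothetical.

```python
import numpy as np

def low_rank_approx(img, r):
    """Best rank-r approximation of a 2-D array via truncated SVD."""
    u, s, vt = np.linalg.svd(img, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r, :]

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Synthetic stand-in for an image slice (assumption: not real MR data).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[-1:1:170j, -1:1:170j]
image = np.exp(-(xx ** 2 + 0.6 * yy ** 2) * 3.0) + 0.01 * rng.standard_normal((170, 170))

# PSNR of the reconstruction as more singular values are retained.
psnr_by_rank = {r: psnr(image, low_rank_approx(image, r)) for r in (10, 50, 90, 170)}
```

On real data one would plot `psnr_by_rank` against the rank and look for the point where the curve flattens, which is the notion of effective rank used above.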
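However, the rank of a matrix is a non-differentiable function of its input and therefore cannot be used directly as a regularizer in a CNN. Most optimization problems with a low-rank constraint are solved by minimizing the nuclear norm of the matrix, which is a convex relaxation of the low-rank constraint. However, this relaxation also cannot be used in a CNN, as the nuclear norm is also a non-differentiable function. To address this, we pursue smooth and differentiable approximations of the rank. In particular, in recent work [40], an estimate of the number of singular values of a matrix that are zero was proposed as:
$$F_\delta(Y) = \sum_{i=1}^{n} f_\delta(\sigma_i(Y)) \tag{2}$$
where $n = \min(sw, sh)$, $\sigma_i(Y)$ represents the $i$-th singular value of $Y$, and
$$f_\delta(x) = \exp\left( -\frac{x^2}{2\delta^2} \right) \tag{3}$$
where $\delta$ is a tunable parameter that affects the measure of approximation error in finding the rank. Intuitively, for small $\delta$, $F_\delta(Y)$ gives the number of singular values of $Y$ which are zero. Therefore, $\mathrm{rank}(Y) \approx n - F_\delta(Y)$. Let $R(Y) = n - F_\delta(Y)$, where $R(Y) \approx \mathrm{rank}(Y)$. Now, the function $R(Y)$ is differentiable and its gradient w.r.t. $Y$ is given by:
$$\frac{\partial R(Y)}{\partial Y} = U \, \mathrm{diag}\left( \frac{\sigma_i}{\delta^2} \exp\left( -\frac{\sigma_i^2}{2\delta^2} \right) \right) V^T \tag{4}$$
where the SVD of $Y$ is $Y = U \, \mathrm{diag}(\sigma_1, \dots, \sigma_n) \, V^T$.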
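A smoothed rank function of this kind can be checked numerically. The sketch below assumes the Gaussian kernel $\exp(-\sigma^2 / (2\delta^2))$, a common choice for smoothed rank surrogates; the function names and the test matrices are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def smooth_rank(Y, delta):
    """Surrogate n - sum_i exp(-sigma_i^2 / (2 delta^2)), with n = min(Y.shape)."""
    s = np.linalg.svd(Y, compute_uv=False)
    return len(s) - np.sum(np.exp(-s ** 2 / (2.0 * delta ** 2)))

def smooth_rank_grad(Y, delta):
    """Gradient of smooth_rank: U diag(sigma/delta^2 * exp(-sigma^2/(2 delta^2))) V^T."""
    u, s, vt = np.linalg.svd(Y, full_matrices=False)
    d = (s / delta ** 2) * np.exp(-s ** 2 / (2.0 * delta ** 2))
    return (u * d) @ vt

# Build a 6 x 5 matrix with known singular values (3, 2, 1, 0, 0), i.e. rank 3.
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.standard_normal((6, 5)))
Q2, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = (Q1 * np.array([3.0, 2.0, 1.0, 0.0, 0.0])) @ Q2.T

# With a small delta the surrogate is close to the true rank.
approx_rank = smooth_rank(A, delta=0.01)
```

A smaller `delta` gives a tighter rank estimate but a flatter, less informative gradient away from zero singular values, which is the trade-off behind treating it as a tunable parameter.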
Sharpness Prior: HR images look much sharper than LR images, which can mainly be attributed to the blurriness of the LR images. The pursuit of quantifying sharpness begins by computing the Laplacian ($L(Y)$) of the image [41]. The Laplacian of a smooth/blurred image is more uniform than the Laplacian of a sharp image. The variance of the Laplacian is hence an indicator of sharpness. In Figure 3, an MR brain image is degraded by a Gaussian filter with different blur parameters, and the variance of the Laplacian is plotted against the blur parameter. It can be observed that the variance of the Laplacian decreases as the blur parameter increases. Therefore, we propose to use the variance of the Laplacian, $V(L(Y))$, as a regularizer to encourage the CNN to yield sharper HR images. $V(L(Y))$ is a quadratic function in $Y$ and therefore a differentiable function which can be easily integrated into the CNN learning. Note that the Laplacian of an image can be implemented by well-known linear filters [41], which are also easily integrated into the CNN via a filtering layer at the output, as shown in Fig. 4.
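The trend described for Figure 3 can be reproduced with a minimal NumPy sketch. It assumes the common 4-neighbour Laplacian stencil and uses a synthetic binary image in place of an MR slice; the helper names are hypothetical.

```python
import numpy as np

# Assumption: the standard 4-neighbour Laplacian stencil.
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def filter2d(img, kernel):
    """'Same'-size 2-D correlation with edge replication.
    Equals convolution for the symmetric kernels used here."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def gaussian_kernel(sigma):
    """Normalized 2-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def variance_of_laplacian(img):
    return float(np.var(filter2d(img, LAPLACIAN)))

# Synthetic stand-in with many sharp edges (not MR data).
rng = np.random.default_rng(0)
image = (rng.random((64, 64)) > 0.5).astype(float)

# Sharpness measure after blurring with increasing blur parameter.
sharpness = [variance_of_laplacian(filter2d(image, gaussian_kernel(s)))
             for s in (0.5, 1.0, 2.0)]
```

The values in `sharpness` shrink as the blur parameter grows, which is why maximizing this quantity pushes the network output towards sharper images.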
Remark: The two priors are chosen carefully so that they complement each other. The low-rank constraint captures the global structure of the brain image, while the sharpness prior aids in recovering the finer local structure.
Network Structure: A key advantage of using priors on the output HR image is that they can be incorporated into any network architecture. In Figure 4, we show an example where the aforementioned two priors are incorporated into the basic SRCNN [11] framework. In Section IV, we demonstrate the versatility of our approach by incorporating the priors into more advanced networks. As observed from Figure 4, to obtain the variance of the Laplacian, we use a filter after the final layer to compute the Laplacian and subsequently find its variance. The loss function of DNSP to be minimized is given by:
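$$J(\Theta) = \| \hat{Y} - F(X, \Theta) \|_F^2 + \alpha R(Y) - \beta V(L(Y)) \tag{5}$$
where $R(Y)$ is the smooth rank surrogate, $V(L(Y))$ is the variance of the Laplacian of $Y$, and $\alpha$ and $\beta$ are positive regularization parameters; the negative sign before $\beta$ serves to increase the variance of the Laplacian. Note that the loss function in Eq. (1) is a special case of Eq. (5). We learn $\Theta$ by minimizing $J(\Theta)$ using a stochastic gradient descent method [42, 43]. In particular, weights are updated by the following equation:
$$\Theta^{t+1} = \Theta^{t} - \eta \, \nabla_\Theta J(\Theta^{t}) \tag{6}$$
where $t$ represents the iteration number, $\eta$ represents the learning rate, and $\Theta^{t}$ represents the values of the weights at the previous iteration. As $Y = F(X, \Theta)$, the following gradients are to be computed: $\partial R(Y) / \partial \theta$ and $\partial V(L(Y)) / \partial \theta$, where $\theta$ denotes an arbitrary scalar entry in filter $W_l$. For simplicity, let the output image be of dimension $p \times q$. The equation for computing the gradient of weight $\theta$ in layer $l$ is given by:
$$\frac{\partial J}{\partial \theta} = \left\langle -2(\hat{Y} - Y) + \alpha G_R - \beta G_V, \; \frac{\partial Y}{\partial \theta} \right\rangle \tag{7}$$
where $\langle A, B \rangle$ between two matrices $A$ and $B$ is defined as $\sum_{i,j} A_{ij} B_{ij}$, $G_V = \partial V(L(Y)) / \partial Y$ is the gradient of $V(L(Y))$ and $G_R = \partial R(Y) / \partial Y$ is the gradient for $R(Y)$. The complete expression for $G_V$ is given by:
$$G_V = \frac{2}{pq} \, L * \left( Z - \bar{z} \, \mathbf{1} \right)$$
where $\bar{z} = \frac{1}{pq} \sum_{i,j} Z_{ij}$, $\mathbf{1}$ is the all-ones matrix, and $Z$ is obtained by convolving $Y$ with a Laplacian operator $L$ (the operator is symmetric, so the same convolution appears in the gradient, up to boundary handling). The expression for $G_R$ is given by:
$$G_R = U \, \mathrm{diag}\left( \frac{\sigma_i}{\delta^2} \exp\left( -\frac{\sigma_i^2}{2\delta^2} \right) \right) V^T$$
Detailed derivations for the above equations are reported in the Appendix. Note that the gradients for the bias terms are computed in a similar fashion. The partial derivative $\partial Y / \partial \theta$ is obtained by a standard back propagation rule [42, 43].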
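The gradient of the sharpness term with respect to the output image can be verified numerically. The following is a sketch under stated assumptions: a 4-neighbour Laplacian stencil with zero padding (so that the filtering operator is self-adjoint), and hypothetical function names. It is a sanity check on the form of the gradient, not the authors' implementation.

```python
import numpy as np

def laplacian(img):
    """Zero-padded 'same' filtering with the symmetric 4-neighbour Laplacian stencil."""
    p = np.pad(img, 1)
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def sharpness(img):
    """Variance of the Laplacian response."""
    return np.var(laplacian(img))

def sharpness_grad(img):
    """Gradient of sharpness w.r.t. img.
    d var(Z) / dZ = 2 (Z - mean(Z)) / N; the zero-padded symmetric stencil is
    self-adjoint, so the chain rule applies the same filter once more."""
    z = laplacian(img)
    return laplacian(2.0 * (z - z.mean()) / z.size)

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 8))
G = sharpness_grad(Y)

# Central finite difference at one pixel for comparison.
E = np.zeros_like(Y)
E[3, 4] = 1e-6
finite_diff = (sharpness(Y + E) - sharpness(Y - E)) / 2e-6
```

In a training loop this gradient would enter the update with a negative weight, so that gradient descent on the loss increases the variance of the Laplacian.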
III DNSP with Data Adaptive Sharpness Enhancing Filters
A fixed Laplacian filter can be sensitive to noise, can enhance spurious components, or might not be the best choice for a given dataset. Further, the literature offers a variety of sharpness enhancing filters [44] for different applications. Recall from Fig. 4 that the Laplacian is computed through a fixed convolutional filter. To develop data-adaptive filters, we learn a set of sharpness enhancing filters jointly with the network parameters instead of using a fixed Laplacian filter. For this purpose, the fixed Laplacian layer in Figure 4 is replaced by a set of filters that are initialized with the standard Laplacian filter plus minor additive perturbations generated by a normal random variable with 0 mean and a small variance of . The output of the SR network is passed through these filters, followed by computing the average of the variances of the outputs of all these learnable sharpness filters. The extended architecture of our network that implements a learnable sharpness layer is shown in Fig. 6.
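We represent these filters by $S_k$, $k = 1, \dots, K$, where $K$ is the total number of learnable sharpness filters. The modified equation to compute the variance (sharpness prior) is:
$$V_K(Y) = \frac{1}{K} \sum_{k=1}^{K} V(S_k * Y) \tag{8}$$
where $V(S_k * Y)$ is the variance of the response of the network output $Y$ to filter $S_k$.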
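The learnable sharpness layer described above (filters initialized as a perturbed Laplacian, then an average of the response variances) can be sketched as follows. The perturbation standard deviation of 0.01 and all names are assumptions chosen for illustration only.

```python
import numpy as np

# Assumption: the standard 4-neighbour Laplacian stencil as the initial filter.
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def correlate_same(img, kernel):
    """Zero-padded 'same'-size 2-D correlation."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def init_sharpness_filters(num_filters, std, rng):
    """Laplacian stencil plus small zero-mean Gaussian perturbations.
    The value of std is hypothetical."""
    return LAPLACIAN + std * rng.standard_normal((num_filters, 3, 3))

def mean_filter_variance(img, filters):
    """Average, over the filters, of the variance of each filter response."""
    return float(np.mean([np.var(correlate_same(img, f)) for f in filters]))

rng = np.random.default_rng(0)
filters = init_sharpness_filters(num_filters=8, std=0.01, rng=rng)
output = rng.random((32, 32))  # stand-in for a network output
sharpness_prior = mean_filter_variance(output, filters)
```

With zero perturbation every filter equals the Laplacian and the measure reduces to the fixed variance-of-Laplacian prior, so the learnable layer strictly generalizes the fixed one.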
A first step towards learning dataadaptive filters is the selection of sharp and smooth training patches which we carry out as follows:

From each training image, we extract two patches of size of which one is the sharpest and the other is the smoothest.

To find the sharpest patch from the training image where , we pass all the patches of size through a standard Laplacian filter and select the patch that gives the maximum response in the sense of the Frobenius norm.

Similarly, to find the smoothest patch , we select the patch that gives the minimum response in the sense of the Frobenius norm.

A visual inspection of all the candidate smooth and sharp patches is performed to arrive at selected smooth and sharp patches.
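The automatic part of the patch selection steps above can be sketched as below; the final visual inspection is a manual step and is not modelled. The patch size, stride, test image and function names are hypothetical, and the 4-neighbour Laplacian stencil is an assumption.

```python
import numpy as np

def laplacian_response(patch):
    """Frobenius norm of the 4-neighbour Laplacian of a patch (edge-replicated borders)."""
    p = np.pad(patch, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * patch
    return np.linalg.norm(lap)

def sharpest_and_smoothest(img, size, stride):
    """Return the patches with the largest and the smallest Laplacian response."""
    patches = [img[i:i + size, j:j + size]
               for i in range(0, img.shape[0] - size + 1, stride)
               for j in range(0, img.shape[1] - size + 1, stride)]
    responses = [laplacian_response(p) for p in patches]
    return patches[int(np.argmax(responses))], patches[int(np.argmin(responses))]

# Synthetic stand-in: flat left half, checkerboard right half.
img = np.zeros((32, 32))
ii, jj = np.indices((32, 16))
img[:, 16:] = (ii + jj) % 2
sharp, smooth = sharpest_and_smoothest(img, size=8, stride=8)
```

Here the sharpest patch comes from the checkerboard half and the smoothest from the flat half, mirroring the roles the two patch types play in training the filters.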
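Note that a sharpness enhancing filter is expected to give the maximum response for the sharpest patches and the minimum response for the smoothest patches. This behavior is captured by formulating the following regularization term:
$$\Omega(S) = \sum_{k=1}^{K} \sum_{i} \left( \| S_k * P_i^{sm} \|_F^2 - \| S_k * P_i^{sh} \|_F^2 \right) \tag{9}$$
where $P_i^{sh}$ and $P_i^{sm}$ denote the selected sharp and smooth patches and $S_k$ the learnable sharpness filters. A negative sign before the response of sharp patches indicates that we intend to maximize it. Figure 5 shows three representative examples of sharp and smooth patches extracted via the procedure we discussed above. The new regularized loss function to learn the parameters of the SR network and the data-adaptive sharpness filters is given by:
$$J_K(\Theta, S) = \| \hat{Y} - F(X, \Theta) \|_F^2 + \alpha R(Y) - \beta V_K(Y) + \gamma \, \Omega(S) \tag{10}$$
where $V_K(Y)$ is the average variance of the filter responses and $\alpha$, $\beta$ and $\gamma$ are positive regularization parameters.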
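One plausible form of such a filter regularizer is sketched below. It assumes squared Frobenius norms of 'valid' filter responses, summed over filters and patches, with smooth-patch responses penalized and sharp-patch responses rewarded. These choices and all names are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def correlate_valid(img, kernel):
    """'Valid'-size 2-D correlation (no padding)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h, j:j + w]
    return out

def filter_regularizer(filters, sharp_patches, smooth_patches):
    """Sum over filters of (smooth-patch response energy - sharp-patch response energy)."""
    total = 0.0
    for f in filters:
        total += sum(np.sum(correlate_valid(p, f) ** 2) for p in smooth_patches)
        total -= sum(np.sum(correlate_valid(p, f) ** 2) for p in sharp_patches)
    return total

laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
flat_patch = np.ones((8, 8))                       # perfectly smooth patch
checker_patch = np.indices((8, 8)).sum(axis=0) % 2  # very sharp patch
omega = filter_regularizer([laplacian], [checker_patch], [flat_patch])
```

Minimizing this quantity drives the filters towards strong responses on sharp patches and weak responses on smooth ones, which is the behaviour the text asks of a sharpness enhancing filter.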
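Modified Back-Propagation Equations: Extending Eq. (7), we obtain:
$$\frac{\partial J_K}{\partial \theta} = \left\langle -2(\hat{Y} - Y) + \alpha G_R - \beta G_{V_K}, \; \frac{\partial Y}{\partial \theta} \right\rangle \tag{11}$$
where $J_K$ is the loss in Eq. (10) and $\theta$ is an arbitrary network parameter in layer $l$. Note that the filter regularizer $\Omega(S)$ of Eq. (9) does not depend on $\theta$, which is why it does not appear in this backpropagation equation. However, its influence is felt on the learnable sharpness filters. The term $G_R = \partial R(Y) / \partial Y$ remains the same as described in Section II-C. The complete expression for $G_{V_K} = \partial V_K(Y) / \partial Y$ is given by:
$$G_{V_K} = \frac{1}{K} \sum_{k=1}^{K} \frac{2}{pq} \, \tilde{S}_k * \left( Z_k - \bar{z}_k \mathbf{1} \right) \tag{12}$$
where $p \times q$ is the size of $Y$, $\bar{z}_k = \frac{1}{pq} \sum_{i,j} (Z_k)_{ij}$, and $Z_k$ is obtained by convolving $Y$ with the learnable sharpness filter $S_k$; $\tilde{S}_k$ denotes $S_k$ flipped in both directions, and boundary handling is ignored. The expression for $G_R$ is given by Eq. (4).
The backpropagation equations for the sharpness filter parameter are given by:
$$\frac{\partial J_K}{\partial s} = -\beta \, \frac{\partial V_K(Y)}{\partial s} + \gamma \, \frac{\partial \Omega(S)}{\partial s} \tag{13}$$
where $s$ is an arbitrary coefficient in learnable filter $S_k$ and $k \in \{1, \dots, K\}$. Note that the gradient w.r.t. $S_k$ does not depend on the first two terms of the loss function in Eq. (10). $\partial V_K(Y) / \partial s$ is given by: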
(14) 
where , is the coefficient of . can also defined in the similar fashion. is given by:
(15) 
where and is the same matrix as defined previously. From equations (11) and (13), we can observe that weights of the learnable sharpness filter influence the SR network parameters and viceversa.
IV Experimental Evaluation
IV-A Experimental Setup
Databases, Training and Test Set-Up: We evaluate the proposed DNSP on two publicly available MR brain image databases. The first database comprises 20 simulated T1 brain image stacks from Brainweb (BW) (http://brainweb.bic.mni.mcgill.ca/brainweb/). Axial slices of these 20 stacks are distributed evenly for training and evaluation purposes. From each stack, we extract 40 slices, making a total of 400 images for training and 400 images for evaluation. The second database is from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (http://adni.loni.usc.edu/). The same training and test configuration is employed as for the BW database.
LR image simulation: Consistent with [4, 30], we simulate training LR images by applying a gaussian blur and factor of downsampling. These LR images are then upscaled by bicubic interpolation. To speed up the training process, we further extract patches of size from these bicubic enlarged LR training images. Note that this is also a standard procedure used for training a typical deep SR network [11, 12, 14].
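The degradation pipeline described above (blur, downsample, upscale back to the HR grid, extract patches) can be sketched as follows. To keep the sketch dependency-free, nearest-neighbour replication stands in for bicubic interpolation; the blur parameter, patch size and stride are hypothetical values, and the function names are illustrative only.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge replication."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    p = np.pad(img, radius, mode="edge")
    rows = sum(g[k] * p[k:k + img.shape[0], :] for k in range(g.size))     # vertical pass
    return sum(g[k] * rows[:, k:k + img.shape[1]] for k in range(g.size))  # horizontal pass

def simulate_lr_input(hr, scale, sigma=1.0):
    """Blur, decimate by `scale`, then upscale back to the HR grid.
    Nearest-neighbour replication is a stand-in for bicubic interpolation."""
    lr = gaussian_blur(hr, sigma)[::scale, ::scale]
    up = np.kron(lr, np.ones((scale, scale)))
    return up[:hr.shape[0], :hr.shape[1]]

def extract_patches(img, size, stride):
    """Stack of size x size patches taken on a regular grid."""
    return np.stack([img[i:i + size, j:j + size]
                     for i in range(0, img.shape[0] - size + 1, stride)
                     for j in range(0, img.shape[1] - size + 1, stride)])

hr = np.random.default_rng(0).random((64, 64))  # stand-in for an HR slice
net_input = simulate_lr_input(hr, scale=2)
patches = extract_patches(net_input, size=16, stride=16)
```

Because the upscaled LR image has the same size as the HR image, patches extracted at the same grid positions from both form aligned training pairs.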
Parameter Choices: To obtain an accurate rank surrogate of , we chose based on guidelines mentioned in [40].
We determine regularization weights in Eq. (10) as , , and based on cross validation. More details can be found in the supplementary document. The number of learnable sharpness filters is chosen as 8 (it is observed that choosing did not offer any observable practical gains). The size of smooth and sharp patches is chosen as with the help of a domain expert. Batch size and number of epochs are chosen to be 64 and 50, respectively, for all the experiments. A total of around 6400 patches are extracted for the ADNI dataset and 8000 patches for the BW dataset. Therefore, 100 iterations are required to complete an epoch on the ADNI dataset and 125 iterations on the BW dataset. For optimization, an Adam optimizer [45] with a learning rate of is used. These values are consistent with other deep learning based SR methods [11, 25, 14].
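The iteration counts quoted above follow directly from the patch counts and the batch size, as the short calculation below shows. The helper name is hypothetical; the numbers are those stated in the text.

```python
import math

def iterations_per_epoch(num_patches, batch_size):
    """Number of mini-batch updates needed to see every patch once."""
    return math.ceil(num_patches / batch_size)

BATCH_SIZE, EPOCHS = 64, 50
adni_iters = iterations_per_epoch(6400, BATCH_SIZE)  # 6400 / 64 = 100
bw_iters = iterations_per_epoch(8000, BATCH_SIZE)    # 8000 / 64 = 125

# Total number of parameter updates over the full training run.
total_updates = {"ADNI": adni_iters * EPOCHS, "BW": bw_iters * EPOCHS}
```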
Methods and Metrics for Comparison: Two standard metrics, PSNR and the structural similarity index (SSIM) [46], are used for evaluation. We compare against the following six methods:

Bicubic interpolation (BC) [1]: a fast baseline method.

SRSW [4]: an example-based SR method via sparse weighting (SRSW) for medical image SR; it represents a state-of-the-art sparsity-based method, published in 2014.

LRTV [30]: amongst the most competitive model-based approaches, involving low-rank (nuclear norm) and total variation (LRTV) regularizers, published in 2015.

SRCNN [11]: the most widely used deep SR network, published in 2016.

DCSRN [25]: Densely Connected Super-Resolution Network (DCSRN), a state-of-the-art deep learning approach developed specifically for MR image super-resolution, published in 2018.

EDSR [10]: a residual deep SR network and winner of the 2017 NTIRE contest, published in 2017.
Network Architecture: As mentioned previously, the two proposed priors can be integrated into any deep SR network. The two deep SR networks we used are SRCNN and EDSR. The SRCNN architecture is illustrated in Fig. 4, and Figure 7 illustrates the architecture of EDSR. It is composed of total residual blocks wherein a convolutional layer of a residual block consists of filters. The first layer is composed of 256 filters and the last layer consists of one filter. A detailed description of the network architectures for both methods can be found in [11, 10]. Note that in the original EDSR architecture, the interpolation block is placed at the end of the residual network. In this work, to be consistent with SRCNN, we perform a bicubic interpolation prior to sending the input to the EDSR network, thereby removing the interpolation block after the residual network. Hence the size of the input sent to the EDSR network is the same as the size of the desired output. We did not observe a noticeable performance difference from shifting the interpolation block, hence we chose the configuration that is consistent with other deep learning frameworks. More details can be found in the supplementary document. Unless otherwise stated, our priors are integrated with the EDSR network. Remark: We chose SRCNN and EDSR as base DNSP networks because SRCNN is widely used and EDSR has recently been shown to be one of the best performing methods (winner of the 2017 NTIRE contest at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)). The goal in this work is to demonstrate the value of priors in enhancing performance and not to perform an exhaustive comparison of deep SR architectures [17, 20].
IV-B Significance of Priors: DNSP Variants
In this section, we report the results for different variants of the proposed DNSP to bring out the value added by each prior. We name the variants as follows: 1) DNSP-NP, the network with no priors, which is the same as EDSR, 2) DNSP-LR, the network with only the low-rank prior, 3) DNSP-FS, the network with only the sharpness prior with a fixed Laplacian layer, 4) DNSP-LS, the network with only the sharpness prior but with learnable sharpness filters, and finally 5) DNSP-AP, the network with both priors included along with learnable sharpness filters. Table I shows the PSNR and SSIM on both datasets. We can observe that priors improve the performance of the network. Among the individual priors, we observe the best performance for DNSP-LS, which is expected as the sharpness is enhanced via a data-adaptive procedure that exploits the available training data. Figure 8 shows a comparison of the response from the fixed Laplacian filter and the filters that are learned via the DNSP-LS method. We can observe that spurious (undesirable/noise-like) edges that are present in the fixed Laplacian response are minimally seen in the response of the 8 filters learned from data, which in turn lead to sharper images overall. It can be observed that the responses of the learned sharpness filters depart from that of the Laplacian (for example, in exhibiting some directional orientation), which is a result of adaptation to the training image data.
Further, to provide more insight into the performance of the different priors, validation curves of PSNR vs. epoch on the test sets are illustrated in Fig. 9. It can be observed that the network with priors always outperforms the one without priors. As expected, DNSP-AP is the best performing method. We also observe that DNSP-LS does better than DNSP-FS and DNSP-LR. For all the subsequent experiments, unless otherwise stated, we report results of DNSP-AP.
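Table I
Method | Database | PSNR | SSIM
DNSP-NP | BW | |
DNSP-NP | ADNI | |
DNSP-LR | BW | |
DNSP-LR | ADNI | |
DNSP-FS | BW | |
DNSP-FS | ADNI | |
DNSP-LS | BW | |
DNSP-LS | ADNI | |
DNSP-AP | BW | 33.9170 | .8902
DNSP-AP | ADNI | 32.1364 | .9534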
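Table II
Method | Database | PSNR (x2) | SSIM (x2) | PSNR (x3) | SSIM (x3) | PSNR (x4) | SSIM (x4)
BC | BW | | | | | |
BC | ADNI | | | | | |
SRSW [4] (TIP 2014) | BW | | | | | |
SRSW [4] (TIP 2014) | ADNI | | | | | |
LRTV [30] (TMI 2015) | BW | | | | | |
LRTV [30] (TMI 2015) | ADNI | | | | | |
SRCNN [11] (TPAMI 2016) | BW | | | | | |
SRCNN [11] (TPAMI 2016) | ADNI | | | | | |
DCSRN [25] (ISBI 2018) | BW | | | | | |
DCSRN [25] (ISBI 2018) | ADNI | | | | | |
EDSR [10] (CVPR 2017) | BW | | | | | |
EDSR [10] (CVPR 2017) | ADNI | | | | | |
DNSP-SRCNN-AP | BW | | | | | |
DNSP-SRCNN-AP | ADNI | | | | | |
DNSP-EDSR-AP | BW | 33.92 | .8902 | 30.50 | .8312 | 26.72 | .7742
DNSP-EDSR-AP | ADNI | 32.14 | .9534 | 28.02 | .8757 | 25.78 | .8253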
IV-C Comparisons Against State-of-the-Art Methods
Table II shows PSNR and SSIM values for all competing methods. Note that we used two different base networks for DNSP: 1) DNSP-SRCNN-AP, where the base network is SRCNN, and 2) DNSP-EDSR-AP, where the base network is EDSR. Three trends emerge from the results: 1) DNSP-EDSR-AP outperforms the competition, 2) DNSP-SRCNN-AP does better than all the methods except EDSR, and 3) overall, deep SR methods, i.e. SRCNN, EDSR, DCSRN and DNSP, perform better than the other alternatives. To confirm this statistically, we performed a 2-way Analysis of Variance (ANOVA) on PSNR values for all the methods across the two datasets, which is illustrated in Fig. 10. It may be inferred from Fig. 10 that deep learning methods are statistically well separated from the traditional methods and, further, that DNSP-EDSR-AP is well separated from all the competing methods, indicating the effectiveness of using prior information. Figures 11 and 12 illustrate the results of the top 4 methods w.r.t. PSNR on a sample image from the BW and ADNI databases respectively for a downsampling factor of 2, while Figures 13 and 14 show results for a downsampling factor of 4. DNSP-EDSR-AP particularly excels in recovering fine image detail (enlarged with zoom-in boxes), thanks to data-adaptive sharpness.
IV-D Performance in Varying Training Regimes
Figure 15 compares the performance of the learning-based methods for different percentages of training samples on both datasets. Twenty five, and percent of the 400 training images are employed. Two inferences can be made: 1) DNSP-EDSR-AP consistently outperforms EDSR, SRCNN, DCSRN and SRSW, and 2) the performance degradation of both DNSP-SRCNN-AP and DNSP-EDSR-AP is more graceful with a decrease in the number of training samples. For example, when the training set drops to 25 percent, PSNR values for EDSR, SRCNN, and SRSW drop by close to 1-1.5 dB, whereas for DNSP*, the drop is between 0.5 and 1 dB. Another interesting observation is that DNSP-SRCNN-AP does better than EDSR in the 25 percent training scenario. These results demonstrate the value of priors (capturing domain-specific signal structure) in enhancing performance when training imagery is limited. This is because priors aid in capturing the structure of the images, which in turn ensures that the deep network outputs images that are consistent with the structure of the original images. With a large training set, the network has a sufficient number of samples to discover the inherent structure of the images by itself. When the training data is limited, however, it does not. In such cases, priors guide the network to discover the appropriate structures of the underlying images, thereby enhancing the performance of the network. This is brought out in Figure 9, where validation curves of different variants of our proposed method are illustrated. There we can observe that the prior-guided networks start performing better than DNSP-NP (EDSR with no priors) right from epoch 1, which confirms that the network is being guided to discover appropriate structures.
Further, to confirm this statistically, Fig. 16 shows the 2way ANOVA analysis for the 25 percent training scenario for all the deep learning based methods. It can be observed that DNSPEDSRAP is well separated from the other methods. Note that LRTV and BC are excluded from this experiment since these methods are not learning based.
Method  PSNR  SSIM
LFMRI
DNSPEDSRAP
IV-E Enhancing Low Field MR Images
A key practical task is discovering the mapping from low to high field MRI. This is required in scenarios where expensive, high field MR captures are not available but low field MR images may be enhanced prior to diagnosis. This problem has received recent attention via learning based methods [8, 24]. The dataset in [8, 24] is, however, not publicly available. To circumvent this, we simulate low field MR images for the ADNI dataset using a recently developed technique in [47]. Certain assumptions are made by the authors of [47]: 1) the noise model is assumed to be thermal, and 2) a single global relaxation correction function is used to account for the signal change at different field strengths. We point to [47] for more details and the code that implements the degradation from high to low field MR. We use their code to simulate 1.5T images from the available 3T ADNI images.
We report results on the 25 percent training setup as described before. For this experiment, we compared against [24] since it is also a deep learning method developed specifically for low to high field MR enhancement. We refer to this method as LFMRI. Table III shows the results, and the benefits of DNSP are readily apparent. Visual comparisons of the images enhanced by both methods are shown in Fig. 17. A one-way ANOVA is further performed (using 400 test images) to confirm that the benefits of DNSP are indeed statistically pronounced; see Fig. 18.
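To make the one-way ANOVA step concrete, the sketch below computes the F statistic from per-image PSNR values grouped by method. It is a textbook formulation under stated assumptions, not the analysis script behind Fig. 18: the helper name `one_way_anova_f` and the PSNR values are hypothetical, and converting F to a p-value (which needs the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per method.

    F is the ratio of between-group to within-group mean squares. It is
    undefined if every sample within each group is identical.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-image PSNR values in dB; these are NOT results from the paper.
psnr_method_a = [30.0, 31.0, 32.0]
psnr_method_b = [33.0, 34.0, 35.0]
f_stat, df_b, df_w = one_way_anova_f([psnr_method_a, psnr_method_b])
```

A large F relative to the critical value for (df_between, df_within) degrees of freedom indicates that the method means are well separated compared with the image-to-image spread.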
IV-F Experiments on Real World Clinical Image Pairs
To further validate our framework in real world scenarios, we perform experiments on two new datasets obtained from the Human Connectome Project (HCP) [48] (https://db.humanconnectome.org/app/template/). A wide variety of datasets is available at this link, from which we selected two scenarios that are closely related to our work.

3T7TDW: This scenario consists of the Diffusion Weighted (DW) MRI images of the same patients acquired at 3T (Tesla) and 7T magnetic field strengths. We extracted the data of 15 patients and used images from 5 patients for training and the images from the remaining 10 patients for testing. The same selection strategy is repeated 5 times and the results are averaged to remove the selection bias. All the 3T scans are obtained from a customized Siemens 3T “Connectome Skyra”. A Spin Echo sequence with a repetition time (TR) of 5520 ms and an echo time (TE) of 89.5 ms is used for acquisition. The dimension of the images in 3T are in axial plane. The 7T scans are acquired by a Siemens Magnetom 7T MR scanner. A Spin Echo sequence with a TR of 7000 ms and a TE of 71.2 ms is used for acquisition. The dimension of the images in 7T are in axial plane. Before extracting the patches for training as described in Section IV-A, we register the 3T scans to a reference 7T scan using the Statistical Parametric Mapping (SPM) toolbox [49, 50] (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/). Note that bicubic interpolation is not required for this scenario, as the 3T images are already registered to the 7T images, and hence a mapping is learned from the registered 3T images to the 7T images. During inference, a new 3T scan is registered to the reference 7T scan and is sent through the learned network. This dataset addresses two issues: 1) it is a realistic image enhancement application in which a low quality MR image is enhanced to a high quality MR image, and 2) it has recently been argued that DW MRI images can be a substitute for Positron Emission Tomography (PET) images [51, 52, 53], thereby confirming the versatility of our proposal.

3T3TT1: This scenario consists of the T1 Weighted MRI images of the same patients acquired by two different 3T scanners at different resolutions. The training and test strategy is similar to that of 3T7TDW. All the high resolution 3T scans are obtained from a customized Siemens 3T “Connectome Skyra”. Scans are acquired with a repetition time (TR) of 2400 ms and an echo time (TE) of 2.14 ms. The dimension of the images in are in axial plane with a slice thickness of 0.7 mm. The low-resolution 3T scans are acquired by a Siemens Magnetom 3T scanner at a resolution of with a slice thickness of 1.6 mm. The registration follows the procedure described for the above scenario: the LR 3T images are registered to the reference 3T HR image via the SPM toolbox.
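The patient-level protocol used in both scenarios (5 of 15 patients for training, the remaining 10 for testing, repeated 5 times with averaged results) can be sketched as follows. This is an illustration under assumptions rather than the actual experimental script: the text does not state how patients are selected in each repetition, so uniform random sampling with a fixed seed is assumed, and the function names and patient IDs are hypothetical.

```python
import random


def repeated_patient_splits(patient_ids, n_train=5, n_repeats=5, seed=0):
    """Draw `n_repeats` train/test splits at the patient level.

    Splitting by patient (not by image or patch) keeps all images of one
    patient on the same side of the split. Random selection is an assumption.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        train = sorted(rng.sample(patient_ids, n_train))
        test = sorted(p for p in patient_ids if p not in train)
        splits.append((train, test))
    return splits


def average(values):
    """Mean of the per-split scores, e.g. one PSNR value per repetition."""
    return sum(values) / len(values)


patients = list(range(1, 16))  # 15 hypothetical patient IDs
splits = repeated_patient_splits(patients)
```

In use, a model would be trained and evaluated once per split and the resulting PSNR and SSIM values passed to `average`, which is the step that reduces the dependence on any single choice of training patients.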
Quantitative results are reported in Table IV. As can be observed, DNSPEDSRAP achieves superior performance over the state of the art EDSR network. Figures 19 and 20 show visual comparisons for example images from both datasets. It can be observed that the image enhanced by DNSPEDSRAP is closer to the ground truth image than the one produced by EDSR.
Method  Dataset  PSNR  SSIM
EDSR  3T7TDW
EDSR  3T3TT1
DNSPEDSRAP  3T7TDW  26.22  .8581
DNSPEDSRAP  3T3TT1  27.98  .8670
IV-G Ablation Study Against a Total-Variation (TV) Regularizer
Although a detailed ablation study is performed in Section IV-B, that study is centered around the variants of our own method. To demonstrate the benefits of the proposed priors comprehensively, we perform an experiment that incorporates both low-rank and total-variation (TV) regularizers into a deep learning framework. We chose TV and low-rank in particular because this combination has proven successful for brain images in [30]. The TV regularizer is used for recovering fine structures while suppressing noise. Our proposed sharpness enhancement measure does a similar job to a TV regularizer, but more effectively, because data-adaptive filters are learned for a given dataset, whereas a TV regularizer is generic and does not exploit any available training data. Table V reports the comparison of our method against the low-rank and TV priors incorporated with the EDSR network (called EDSRTVLR) for the two real-world clinical datasets. Table V reveals that DNSPEDSRAP performs the best. Further, to confirm this statistically, Fig. 21 shows the two-way ANOVA analysis for EDSRTVLR and DNSPEDSRAP. It can be observed that DNSPEDSRAP is well separated from EDSRTVLR for both datasets.
Method  Dataset  PSNR  SSIM
EDSRTVLR  3T7TDW
EDSRTVLR  3T3TT1
DNSPEDSRAP  3T7TDW  26.22  .8581
DNSPEDSRAP  3T3TT1  27.98  .8670
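To illustrate why a TV regularizer is described above as generic, the sketch below evaluates a total-variation penalty on small images. The exact TV form used in EDSRTVLR is not specified in this section, so the anisotropic (sum of absolute forward differences) variant is assumed, and the function name and toy images are hypothetical.

```python
def total_variation(image):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels.

    The functional has no learnable parameters, so it treats every dataset
    the same way.
    """
    rows, cols = len(image), len(image[0])
    vertical = sum(
        abs(image[i + 1][j] - image[i][j])
        for i in range(rows - 1)
        for j in range(cols)
    )
    horizontal = sum(
        abs(image[i][j + 1] - image[i][j])
        for i in range(rows)
        for j in range(cols - 1)
    )
    return vertical + horizontal


flat = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]   # constant region
edge = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]   # one clean vertical edge
noisy = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # pixel-level oscillation

tv_flat = total_variation(flat)    # 0.0
tv_edge = total_variation(edge)    # 2.0
tv_noisy = total_variation(noisy)  # 7.0
```

The oscillating image is penalized far more than the clean edge, which is how TV suppresses noise while tolerating sharp boundaries. The penalty is fixed in advance, in contrast to filters that are learned from training data.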
V Discussions and Conclusion
In this paper, we present a novel regularized deep network structure for MR image superresolution, which excels in varying training regimes and experimental setups. This is accomplished by using two spatiostructural priors on the expected output HR image: 1) a lowrank prior, and 2) a sharpness prior. Our contributions include the development of new regularization terms that are inspired by the priors on the output of the network as well as tractable algorithmic methods to incorporate them in a deep learning setup. We demonstrate the versatility of our method by experimental validation involving two widely used and highly competitive deep learning architectures for the SR problem. Because our priors are on the network output, the proposed DNSP method can be combined with many other deep SR networks as well.
Future work could develop and incorporate other meaningful priors such as those that are anatomically inspired [54]. The interaction of prior induced regularization with specific network architectures can also be explored for speeding up network training and inference.
Appendix A Backpropagation Derivations
First we derive the backpropagation equations for the loss function in Eq. (5) which is given by:
(16) 
where, , and are positive regularization parameters. We learn by minimizing using a stochastic gradient descent method [43]. The weights at each iteration are updated by the following rule
(17) 
where, represents the iteration number, represents the learning rate for the stochastic gradient descent method and represents the values of weights at previous iteration. As , following gradients are to be computed: , , where denotes an arbitrary scalar entry in filter . For simplicity, let output image be of dimension . The equation for computing the gradient of weight in any given layer is given by:
(18) 
where is the gradient of and is the gradient for . The complete expression for is given by:
(19) 
where , and is obtained by convolving with a Laplacian operator . The expression for is given by:
(20) 
Gradient for is derived in [40]. Deriving expression for is mentioned below:
To obtain the Laplacian of the output image , it is convolved with a filter . Let the Laplacian be represented by , where is given by Eq. (20). The variance of the Laplacian is given by . Therefore,
(21) 
Now, the gradient of w.r.t. is obtained by the following chain rule:
(22) 
Note that can influence only , , , , . Hence the chain rule is restricted only to these values as the partial derivative of all the ’s w.r.t is 0. It is straightforward to observe that
(23) 
Substituting these values in Eq. (22) gives:
(24) 
Now Eq. (19) directly follows by taking derivative of in Eq. (21) w.r.t to obtain .
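The chain-rule argument above can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the five-point Laplacian stencil (consistent with each pixel influencing only five Laplacian values), zero padding at the image border, and a population variance normalized by the number of pixels. All function names are hypothetical. It compares the analytic gradient of the variance of the Laplacian with central finite differences.

```python
# Assumed five-point Laplacian stencil; the paper's exact kernel may differ.
LAPLACIAN = [[0.0, 1.0, 0.0],
             [1.0, -4.0, 1.0],
             [0.0, 1.0, 0.0]]


def filter_same(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, zero padded, same output size."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in range(3):
                for q in range(3):
                    r, c = i + p - 1, j + q - 1
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[p][q] * image[r][c]
            out[i][j] = acc
    return out


def sharpness(image):
    """Variance of the Laplacian-filtered image."""
    g = [v for row in filter_same(image, LAPLACIAN) for v in row]
    mean = sum(g) / len(g)
    return sum((v - mean) ** 2 for v in g) / len(g)


def sharpness_gradient(image):
    """Analytic gradient of `sharpness` with respect to every pixel."""
    rows, cols = len(image), len(image[0])
    g = filter_same(image, LAPLACIAN)
    mean = sum(v for row in g for v in row) / (rows * cols)
    # Derivative of the variance w.r.t. each Laplacian value; the term coming
    # from the mean vanishes because the deviations sum to zero.
    d_g = [[2.0 * (v - mean) / (rows * cols) for v in row] for row in g]
    # Chain rule through the filter: each pixel reaches only the Laplacian
    # values under the stencil. The kernel is symmetric, so this is the same
    # filtering applied to d_g.
    return filter_same(d_g, LAPLACIAN)


def numerical_gradient(image, eps=1e-6):
    """Central finite differences, used only to validate the analytic form."""
    grad = [[0.0] * len(image[0]) for _ in image]
    for m in range(len(image)):
        for n in range(len(image[0])):
            plus = [row[:] for row in image]
            minus = [row[:] for row in image]
            plus[m][n] += eps
            minus[m][n] -= eps
            grad[m][n] = (sharpness(plus) - sharpness(minus)) / (2 * eps)
    return grad


# Hypothetical 4x4 test image.
image = [[0.1, 0.4, 0.2, 0.9],
         [0.7, 0.3, 0.8, 0.5],
         [0.2, 0.6, 0.1, 0.3],
         [0.9, 0.0, 0.5, 0.7]]
analytic = sharpness_gradient(image)
numeric = numerical_gradient(image)
max_gap = max(
    abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb)
)
```

Because the sharpness measure is quadratic in the pixel values, the two gradients should agree up to floating-point rounding. Implementing the backward pass as one more filtering operation is what makes a fixed feedback layer at the network output sufficient.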
BackPropagation Equations for Modified Loss Function:
The modified loss function is given by:
(25) 
Following the lines of above derivation, the gradient of modified loss function w.r.t network parameter in layer is given by:
(26) 
Note that network parameter does not depend on , hence not reflected in backpropagation equations. The expression for remains same as described above. However, the expression for differs from the fixed laplacian version. First, a set of filters are used instead of a single laplacian filter wherein each filter is defined by set of coefficients , . Now, the expression for is given by:
(27) 
where , and is obtained by convolving with the learnable sharpness filter . Expression for is given by:
The expression for directly follows from the derivation of except for the fact that constant values in are replaced by the learnable filter parameters and summed over all the filters.
The backpropagation equation for a given learnable sharpness filter parameter is given by:
(28) 
where is given by:
(29) 
where , is the coefficient of , is the number of training images. This expression is derived by applying the chain rule: the derivative of the Frobenius norm followed by the derivative of the convolution operation w.r.t. the filter parameter. is also defined similarly to . Following the same strategy, the expression for is given by:
(30) 
where and is the same matrix as defined by Eq. (27).
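The two-stage chain rule used for the learnable filter parameters (derivative of a Frobenius norm, followed by the derivative of the convolution w.r.t. a filter coefficient) can be sketched on a stand-in objective. The code below is a hedged illustration: it differentiates the squared Frobenius norm of a filtered image, which is not necessarily the exact term in Eq. (25). The 3x3 filter size, zero padding, function names, and numeric values are all assumptions.

```python
def filter_same(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel, zero padded, same output size."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for p in range(3):
                for q in range(3):
                    r, c = i + p - 1, j + q - 1
                    if 0 <= r < rows and 0 <= c < cols:
                        out[i][j] += kernel[p][q] * image[r][c]
    return out


def objective(image, kernel):
    """Stand-in objective: squared Frobenius norm of the filtered image."""
    return sum(v * v for row in filter_same(image, kernel) for v in row)


def filter_gradient(image, kernel):
    """Analytic gradient of `objective` w.r.t. each filter coefficient."""
    rows, cols = len(image), len(image[0])
    g = filter_same(image, kernel)
    grad = [[0.0] * 3 for _ in range(3)]
    for p in range(3):
        for q in range(3):
            for i in range(rows):
                for j in range(cols):
                    r, c = i + p - 1, j + q - 1
                    if 0 <= r < rows and 0 <= c < cols:
                        # d||G||_F^2 / dG_ij = 2 G_ij, and
                        # dG_ij / dh_pq = image[r][c].
                        grad[p][q] += 2.0 * g[i][j] * image[r][c]
    return grad


def numerical_filter_gradient(image, kernel, eps=1e-6):
    """Central finite differences over the nine filter coefficients."""
    grad = [[0.0] * 3 for _ in range(3)]
    for p in range(3):
        for q in range(3):
            plus = [row[:] for row in kernel]
            minus = [row[:] for row in kernel]
            plus[p][q] += eps
            minus[p][q] -= eps
            grad[p][q] = (objective(image, plus) - objective(image, minus)) / (2 * eps)
    return grad


# Hypothetical image and deliberately asymmetric filter.
image = [[0.2, 0.5, 0.1], [0.9, 0.4, 0.7], [0.3, 0.8, 0.6]]
kernel = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4], [0.2, 0.1, -0.3]]
analytic = filter_gradient(image, kernel)
numeric = numerical_filter_gradient(image, kernel)
max_gap = max(
    abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb)
)
```

An asymmetric filter is used on purpose, so that an indexing error between the correlation and its derivative would show up as a mismatch with the finite-difference values.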
References
 [1] T. M. Lehmann, C. Gonner, and K. Spitzer, “Survey: Interpolation methods in medical image processing,” IEEE Trans. on Medical Imaging, vol. 18, no. 11, pp. 1049–1075, 1999.
 [2] R. Tsai, “Multiframe image restoration and registration,” Adv. Comput. Vis. Image Process., vol. 1, no. 2, pp. 317–339, 1984.
 [3] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe super resolution,” IEEE Trans. on Image Processing, vol. 13, no. 10, pp. 1327–1344, 2004.
 [4] D.H. Trinh, M. Luong, F. Dibos, J.M. Rocchisani, C.D. Pham, and T. Q. Nguyen, “Novel examplebased method for superresolution and denoising of medical images,” IEEE Trans. on Image Processing, vol. 23, pp. 1882–1895, 2014.
 [5] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Examplebased superresolution,” Computer Graphics and Applications, vol. 22, no. 2, pp. 56–65, 2002.
 [6] H. Chang, D.Y. Yeung, and Y. Xiong, “Superresolution through neighbor embedding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1. IEEE, 2004, pp. I–I.
 [7] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image superresolution via sparse representation,” IEEE Trans. on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.
 [8] K. Bahrami, F. Shi, X. Zong, H. W. Shin, H. An, and D. Shen, “Reconstruction of 7Tlike images from 3T MRI,” IEEE Trans. on Medical Imaging, vol. 35, no. 9, pp. 2085–2097, 2016.
 [9] B. Wen, Y. Li, and Y. Bresler, “The power of complementary regularizers: Image recovery via transform learning and lowrank modeling,” arXiv preprint arXiv:1808.01316, 2018.
 [10] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image superresolution,” in The IEEE conference on computer vision and pattern recognition (CVPR) workshops, vol. 1, no. 2, 2017, p. 4.
 [11] C. Dong, C. C. Loy, K. He, and X. Tang, “Image superresolution using deep convolutional networks,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
 [12] J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image superresolution using very deep convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
 [13] D. Liu, Z. Wang, B. Wen, J. Yang, W. Han, and T. S. Huang, “Robust single image superresolution via deep networks with sparse prior,” IEEE Trans. on Image Processing, vol. 25, no. 7, pp. 3194–3207, 2016.
 [14] T. Guo, H. S. Mousavi, T. H. Vu, and V. Monga, “Deep wavelet prediction for image superresolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1100–1109.
 [15] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image superresolution with sparse prior,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 370–378.
 [16] C. Dong, C. C. Loy, and X. Tang, “Accelerating the superresolution convolutional neural network,” in European Conference on Computer Vision. Springer, 2016, pp. 391–407.
 [17] R. Timofte, E. Agustsson et al., “NTIRE 2017 challenge on single image superresolution: Methods and results,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1110–1121.
 [18] J. Kim, J. Kwon Lee, and K. Mu Lee, “Deeplyrecursive convolutional network for image superresolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637–1645.
 [19] T. Guo, H. S. Mousavi, and V. Monga, “Orthogonally regularized deep networks for image superresolution,” in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2018.
 [20] R. Timofte et al., “NTIRE 2018 challenge on single image superresolution: Methods and results,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
 [21] X. Zhao, Y. Zhang, T. Zhang, and X. Zou, “Channel splitting network for single MR image superresolution,” IEEE Transactions on Image Processing, vol. 28, no. 11, pp. 5649–5662, Nov 2019.
 [22] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image superresolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2472–2481.
 [23] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image superresolution using very deep residual channel attention networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286–301.
 [24] K. Bahrami, F. Shi, I. Rekik, and D. Shen, “Convolutional neural network for reconstruction of 7Tlike images from 3T MRI using appearance and anatomical features,” in Deep Learning and Data Labeling for Medical Applications. Springer, 2016, pp. 39–47.
 [25] Y. Chen, Y. Xie, Z. Zhou, F. Shi, A. G. Christodoulou, and D. Li, “Brain MRI super resolution using 3d deep densely connected neural networks,” in Biomedical Imaging (ISBI 2018), International Symposium on. IEEE, 2018, pp. 739–742.
 [26] J. Shi, Q. Liu, C. Wang, Q. Zhang, S. Ying, and H. Xu, “Superresolution reconstruction of MR image with a novel residual learning network algorithm,” Physics in Medicine & Biology, vol. 63, no. 8, p. 085011, 2018.
 [27] X. Yang, S. Zhant, C. Hu, Z. Liang, and D. Xie, “Superresolution of medical image using representation learning,” in Wireless Communications & Signal Processing (WCSP), 8th International Conference on. IEEE, 2016, pp. 1–6.
 [28] K. Srinivasan, A. Ankur, and A. Sharma, “Superresolution of magnetic resonance images using deep convolutional neural networks,” in Consumer ElectronicsTaiwan (ICCETW), International Conference on. IEEE, 2017, pp. 41–42.
 [29] C.H. Pham, A. Ducournau, R. Fablet, and F. Rousseau, “Brain MRI superresolution using deep 3d convolutional networks,” in Biomedical Imaging (ISBI), International Symposium on. IEEE, 2017, pp. 197–200.
 [30] F. Shi, J. Cheng, L. Wang, P.T. Yap, and D. Shen, “LRTV: MR image superresolution with lowrank and total variation regularizations,” IEEE Trans. on Medical Imaging, vol. 34, no. 12, pp. 2459–2466, 2015.
 [31] A. Shocher, N. Cohen, and M. Irani, ““Zeroshot” superresolution using deep internal learning,” in Conference on computer vision and pattern recognition (CVPR), 2018.
 [32] J.B. Huang, A. Singh, and N. Ahuja, “Single image superresolution from transformed selfexemplars,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5197–5206.
 [33] A. Jog, A. Carass, and J. L. Prince, “Self superresolution for magnetic resonance images,” in International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, 2016, pp. 553–560.
 [34] C. Zhao, A. Carass, B. E. Dewey, and J. L. Prince, “Self superresolution for magnetic resonance images using deep networks,” in Biomedical Imaging, International Symposium on. IEEE, 2018, pp. 365–368.
 [35] V. Cherukuri, T. Guo, S. J. Schiff, and V. Monga, “Deep MR image superresolution using structural priors,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 410–414.
 [36] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
 [37] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
 [38] Z. Tang, S. Ahmad, P. Yap, and D. Shen, “Multiatlas segmentation of mr tumor brain images using lowrank based image recovery,” IEEE Transactions on Medical Imaging, vol. 37, no. 10, pp. 2224–2235, Oct 2018.
 [39] Z. Tang, Y. Cui, and B. Jiang, “Groupwise registration of mr brain images containing tumors via spatially constrained lowrank based image recovery,” in International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, 2017, pp. 397–405.
 [40] M. MalekMohammadi, M. BabaieZadeh, A. Amini, and C. Jutten, “Recovery of lowrank matrices under affine constraints via a smoothed rank function,” IEEE Trans. on Signal Processing, vol. 62, no. 4, pp. 981–992, 2014.
 [41] D. Forsyth and J. Ponce, Computer vision: a modern approach. Upper Saddle River, NJ; London: Prentice Hall, 2011.
 [42] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradientbased learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
 [43] P. J. Werbos, The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. John Wiley & Sons, 1994, vol. 1.
 [44] M. Sonka, V. Hlavac, and R. Boyle, Image processing, analysis, and machine vision. Cengage Learning, 2014.
 [45] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [46] D. Brunet, S. S. Channappayya, Z. Wang, E. R. Vrscay, and A. C. Bovik, “Optimizing image quality,” in Handbook of Convex Optimization Methods in Imaging Science, Ed. V. Monga. Springer, 2017, pp. 15–41.
 [47] Z. Wu, W. Chen, and K. S. Nayak, “Minimum field strength simulator for proton density weighted MRI,” PloS one, vol. 11, no. 5, p. e0154711, 2016.
 [48] D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, WU-Minn HCP Consortium et al., “The WU-Minn human connectome project: an overview,” Neuroimage, vol. 80, pp. 62–79, 2013.
 [49] R. S. Frackowiak, Human brain function. Elsevier, 2004.
 [50] K. J. Friston, J. Ashburner, C. D. Frith, J.B. Poline, J. D. Heather, and R. S. Frackowiak, “Spatial registration and normalization of images,” Human brain mapping, vol. 3, no. 3, pp. 165–189, 1995.
 [51] Y. Ohba, H. Nomori, T. Mori, K. Ikeda, H. Shibata, H. Kobayashi, S. Shiraishi, and K. Katahira, “Is diffusionweighted magnetic resonance imaging superior to positron emission tomography with fludeoxyglucose f 18 in imaging non–small cell lung cancer?” The Journal of thoracic and cardiovascular surgery, vol. 138, no. 2, pp. 439–445, 2009.
 [52] W. Luboldt, R. Kufer, N. Blumstein, T. L. Toussaint, A. Kluge, M. D. Seemann, and H.J. Luboldt, “Prostate carcinoma: diffusionweighted imaging as potential alternative to conventional mr and 11 ccholine pet/ct for detection of bone metastases,” Radiology, vol. 249, no. 3, pp. 1017–1025, 2008.
 [53] F. Barchetti, A. Stagnitti, V. Megna, N. Al Ansari, A. Marini, D. Musio, M. Monti, G. Barchetti, V. Tombolini, C. Catalano et al., “Unenhanced wholebody mri versus petct for the detection of prostate cancer metastases after primary treatment,” Eur Rev Med Pharmacol Sci, vol. 20, no. 18, pp. 3770–3776, 2016.
 [54] O. Oktay, E. Ferrante, K. Kamnitsas, M. Heinrich, W. Bai, J. Caballero, S. A. Cook, A. de Marvao, T. Dawes, D. P. O'Regan et al., “Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation,” IEEE Trans. on Medical Imaging, vol. 37, no. 2, pp. 384–395, 2018.