Residual Dense Network for Image Super-Resolution
Very deep convolutional neural networks (CNNs) have recently achieved great success in image super-resolution (SR) and also offer hierarchical features. However, most deep CNN based SR models do not make full use of the hierarchical features from the original low-resolution (LR) images, thereby achieving relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose the residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory (CM) mechanism. Local feature fusion in RDB is then used to adaptively learn more effective features from preceding and current local features and to stabilize the training of wider networks. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. Extensive experiments on benchmark datasets with different degradation models show that our RDN achieves favorable performance against state-of-the-art methods.
1 Introduction
Single image super-resolution (SISR) aims to generate a visually pleasing high-resolution (HR) image from its degraded low-resolution (LR) measurement. SISR is used in various computer vision tasks, such as security and surveillance imaging, medical imaging, and image generation. However, image SR is an ill-posed inverse problem, since a multitude of solutions exist for any LR input. To tackle this inverse problem, plenty of image SR algorithms have been proposed, including interpolation-based, reconstruction-based, and learning-based methods [27, 28, 19, 2, 20, 8, 10, 30].
Among them, Dong et al. first introduced a three-layer convolutional neural network (CNN) into image SR and achieved significant improvement over conventional methods. Kim et al. increased the network depth in VDSR and DRCN by using gradient clipping, skip connections, or recursive supervision to ease the difficulty of training deep networks. By using effective building modules, the networks for image SR were further made deeper and wider with better performance. Lim et al. used residual blocks (Fig. 1(a)) to build a very wide network, EDSR, with residual scaling, and a very deep one, MDSR. Tai et al. proposed the memory block to build MemNet. As the network depth grows, the features in each convolutional layer become hierarchical with different receptive fields. However, these methods neglect to fully use the information of each convolutional layer. Although the gate unit in the memory block was proposed to control short-term memory, the local convolutional layers do not have direct access to the subsequent layers. So it is hard to say that the memory block makes full use of the information from all the layers within it.
Furthermore, objects in images have different scales, angles of view, and aspect ratios. Hierarchical features from a very deep network would give more clues for reconstruction. However, most deep learning (DL) based methods (e.g., VDSR, LapSRN, and EDSR) neglect to use hierarchical features for reconstruction. Although the memory block also takes information from preceding memory blocks as input, the multi-level features are not extracted from the original LR image. MemNet interpolates the original LR image to the desired size to form the input. This pre-processing step not only increases computational complexity quadratically, but also loses some details of the original LR image. Tong et al. introduced the dense block (Fig. 1(b)) for image SR with a relatively low growth rate (e.g., 16). According to our experiments (see Section 5.2), a higher growth rate can further improve the performance of the network. However, it would be hard to train a wider network with the dense blocks in Fig. 1(b).
To address these drawbacks, we propose the residual dense network (RDN) (Fig. 2) to make full use of all the hierarchical features from the original LR image with our proposed residual dense block (Fig. 1(c)). It is hard and impractical for a very deep network to directly extract the output of each convolutional layer in the LR space. We propose the residual dense block (RDB) as the building module for RDN. RDB consists of densely connected layers and local feature fusion (LFF) with local residual learning (LRL). Our RDB also supports contiguous memory among RDBs. The output of one RDB has direct access to each layer of the next RDB, resulting in a contiguous state pass. Each convolutional layer in RDB has access to all the subsequent layers and passes on information that needs to be preserved. Concatenating the states of the preceding RDB and all the preceding layers within the current RDB, LFF extracts local dense features by adaptively preserving the information. Moreover, LFF allows a very high growth rate by stabilizing the training of wider networks. After extracting multi-level local dense features, we further conduct global feature fusion (GFF) to adaptively preserve the hierarchical features in a global way. As depicted in Figs. 2 and 3, each layer has direct access to the original LR input, leading to an implicit deep supervision.
In summary, the main contributions of this work are three-fold:
- We propose a unified framework, the residual dense network (RDN), for high-quality image SR with different degradation models. The network makes full use of all the hierarchical features from the original LR image.
- We propose the residual dense block (RDB), which can not only read state from the preceding RDB via a contiguous memory (CM) mechanism, but also fully utilize all the layers within it via local dense connections. The accumulated features are then adaptively preserved by local feature fusion (LFF).
- We propose global feature fusion to adaptively fuse hierarchical features from all RDBs in the LR space. With global residual learning, we combine the shallow features and deep features together, resulting in global dense features from the original LR image.
2 Related Work
Recently, deep learning (DL)-based methods have achieved dramatic advantages over conventional methods in restoration. Due to the limited space, we only discuss some works on image SR. Dong et al. proposed SRCNN, establishing an end-to-end mapping between the interpolated LR images and their HR counterparts for the first time. This baseline was then further improved mainly by increasing network depth or sharing network weights. VDSR and IRCNN increased the network depth by stacking more convolutional layers with residual learning. DRCN first introduced recursive learning in a very deep network for parameter sharing. Tai et al. introduced recursive blocks in DRRN and the memory block in MemNet for deeper networks. All of these methods need to interpolate the original LR images to the desired size before feeding them into the networks. This pre-processing step not only increases computational complexity quadratically, but also over-smooths and blurs the original LR image, from which some details are lost. As a result, these methods extract features from the interpolated LR images, failing to establish an end-to-end mapping from the original LR to HR images.
To solve the problem above, Dong et al. directly took the original LR image as input and introduced a transposed convolution layer (also known as a deconvolution layer) for upsampling to the fine resolution. Shi et al. proposed ESPCN, where an efficient sub-pixel convolution layer was introduced to upscale the final LR feature maps into the HR output. The efficient sub-pixel convolution layer was then adopted in SRResNet and EDSR, which took advantage of residual learning. All of these methods extracted features in the LR space and upscaled the final LR features with a transposed or sub-pixel convolution layer. By doing so, these networks can either be capable of real-time SR (e.g., FSRCNN and ESPCN), or be built to be very deep/wide (e.g., SRResNet and EDSR). However, all of these methods stack building modules (e.g., Conv layers in FSRCNN, residual blocks in SRResNet and EDSR) in a chain way. They neglect to adequately utilize information from each Conv layer and only adopt CNN features from the last Conv layer in the LR space for upscaling.
Recently, Huang et al. proposed DenseNet, which allows direct connections between any two layers within the same dense block. With the local dense connections, each layer reads information from all the preceding layers within the same dense block. Dense connections were also introduced among memory blocks and among dense blocks. More differences between DenseNet/SRDenseNet/MemNet and our RDN are discussed in Section 4.
The aforementioned DL-based image SR methods have achieved significant improvement over conventional SR methods, but all of them lose some useful hierarchical features from the original LR image. Hierarchical features produced by a very deep network are useful for image restoration tasks (e.g., image SR). To address this, we propose the residual dense network (RDN) to extract and adaptively fuse features from all the layers in the LR space efficiently. We detail our RDN in the next section.
3 Residual Dense Network for Image SR
3.1 Network Structure
As shown in Fig. 2, our RDN mainly consists of four parts: shallow feature extraction net (SFENet), residual dense blocks (RDBs), dense feature fusion (DFF), and finally the up-sampling net (UPNet). Let us denote $I_{LR}$ and $I_{SR}$ as the input and output of RDN. Specifically, we use two Conv layers to extract shallow features. The first Conv layer extracts features $F_{-1}$ from the LR input:
$$F_{-1} = H_{SFE1}(I_{LR}),$$
where $H_{SFE1}(\cdot)$ denotes the convolution operation. $F_{-1}$ is then used for further shallow feature extraction and global residual learning. So we can further have
$$F_{0} = H_{SFE2}(F_{-1}),$$
where $H_{SFE2}(\cdot)$ denotes the convolution operation of the second shallow feature extraction layer and $F_{0}$ is used as input to the residual dense blocks. Supposing we have $D$ residual dense blocks, the output $F_{d}$ of the $d$-th RDB can be obtained by
$$F_{d} = H_{RDB,d}(F_{d-1}) = H_{RDB,d}(H_{RDB,d-1}(\cdots(H_{RDB,1}(F_{0}))\cdots)),$$
where $H_{RDB,d}$ denotes the operations of the $d$-th RDB. $H_{RDB,d}$ can be a composite function of operations, such as convolution and rectified linear units (ReLU). As $F_{d}$ is produced by the $d$-th RDB fully utilizing each convolutional layer within the block, we can view $F_{d}$ as a local feature. More details about RDB will be given in Section 3.2.
After extracting hierarchical features with a set of RDBs, we further conduct dense feature fusion (DFF), which includes global feature fusion (GFF) and global residual learning (GRL). DFF makes full use of features from all the preceding layers and can be represented as
$$F_{DF} = H_{DFF}(F_{-1}, F_{0}, F_{1}, \ldots, F_{D}),$$
where $F_{DF}$ is the output feature-maps of DFF obtained with a composite function $H_{DFF}$. More details about DFF will be shown in Section 3.3.
3.2 Residual Dense Block
Now we present details about our proposed residual dense block (RDB) in Fig. 3. Our RDB contains densely connected layers, local feature fusion (LFF), and local residual learning, leading to a contiguous memory (CM) mechanism.
Contiguous memory mechanism is realized by passing the state of the preceding RDB to each layer of the current RDB. Let $F_{d-1}$ and $F_{d}$ be the input and output of the $d$-th RDB respectively, both of which have $G_0$ feature-maps. The output of the $c$-th Conv layer of the $d$-th RDB can be formulated as
$$F_{d,c} = \sigma(W_{d,c}[F_{d-1}, F_{d,1}, \ldots, F_{d,c-1}]),$$
where $\sigma$ denotes the ReLU activation function. $W_{d,c}$ is the weights of the $c$-th Conv layer, where the bias term is omitted for simplicity. We assume $F_{d,c}$ consists of $G$ (also known as the growth rate) feature-maps. $[F_{d-1}, F_{d,1}, \ldots, F_{d,c-1}]$ refers to the concatenation of the feature-maps produced by the $(d-1)$-th RDB and by convolutional layers $1, \ldots, (c-1)$ in the $d$-th RDB, resulting in $G_0 + (c-1) \times G$ feature-maps. The outputs of the preceding RDB and of each layer have direct connections to all subsequent layers, which not only preserves the feed-forward nature, but also extracts local dense features.
Local feature fusion is then applied to adaptively fuse the states from the preceding RDB and all the Conv layers in the current RDB. As analyzed above, the feature-maps of the $(d-1)$-th RDB are introduced directly to the $d$-th RDB in a concatenation way, so it is essential to reduce the feature number. On the other hand, inspired by MemNet, we introduce a $1 \times 1$ convolutional layer to adaptively control the output information. We name this operation local feature fusion (LFF), formulated as
$$F_{d,LF} = H^{d}_{LFF}([F_{d-1}, F_{d,1}, \ldots, F_{d,c}, \ldots, F_{d,C}]),$$
where $H^{d}_{LFF}$ denotes the function of the $1 \times 1$ Conv layer in the $d$-th RDB. We also find that as the growth rate $G$ becomes larger, a very deep dense network without LFF would be hard to train.
Local residual learning is introduced in RDB to further improve the information flow, as there are several convolutional layers in one RDB. The final output of the $d$-th RDB can be obtained by
$$F_{d} = F_{d-1} + F_{d,LF}.$$
It should be noted that LRL can also further improve the network representation ability, resulting in better performance. We introduce more results about LRL in Section 5. Because of the dense connectivity and local residual learning, we refer to this block architecture as the residual dense block (RDB). More differences between RDB and the original dense block are summarized in Section 4.
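The channel bookkeeping inside one RDB can be checked with a few lines of Python. This is only a sketch of the feature-map counts implied by the description above, not the authors' implementation; the function name `rdb_channel_plan` and its argument names are hypothetical.

```python
def rdb_channel_plan(g0, g, c):
    """Return the (in, out) channel pairs of the c dense Conv layers in one
    RDB, plus the (in, out) pair of the 1x1 local feature fusion Conv."""
    convs = []
    for layer in range(1, c + 1):
        # Layer `layer` sees the block input (g0 maps) plus the g maps
        # produced by each of the (layer - 1) earlier layers in the block.
        convs.append((g0 + (layer - 1) * g, g))
    # LFF concatenates the block input with all c layer outputs and maps
    # them back to g0 channels, so the residual sum with the input is valid.
    lff = (g0 + c * g, g0)
    return convs, lff


convs, lff = rdb_channel_plan(g0=64, g=64, c=8)
print(convs[0], convs[-1], lff)  # (64, 64) (512, 64) (576, 64)
```

The LFF output width equals the block input width, which is what makes the element-wise addition of local residual learning possible.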
3.3 Dense Feature Fusion
After extracting local dense features with a set of RDBs, we further propose dense feature fusion (DFF) to exploit hierarchical features in a global way. Our DFF consists of global feature fusion (GFF) and global residual learning.
Global feature fusion is proposed to extract the global feature $F_{GF}$ by fusing features from all the RDBs:
$$F_{GF} = H_{GFF}([F_{1}, \ldots, F_{D}]),$$
where $[F_{1}, \ldots, F_{D}]$ refers to the concatenation of feature-maps produced by residual dense blocks $1, \ldots, D$. $H_{GFF}$ is a composite function of $1 \times 1$ and $3 \times 3$ convolution. The $1 \times 1$ convolutional layer is used to adaptively fuse a range of features with different levels. The following $3 \times 3$ convolutional layer is introduced to further extract features for global residual learning, which has been demonstrated to be effective in prior work.
Global residual learning is then utilized to obtain the feature-maps before conducting up-scaling by
$$F_{DF} = F_{-1} + F_{GF},$$
where $F_{-1}$ denotes the shallow feature-maps. All the other layers before global feature fusion are fully utilized with our proposed residual dense blocks (RDBs). RDBs produce multi-level local dense features, which are further adaptively fused to form $F_{GF}$. After global residual learning, we obtain the dense feature $F_{DF}$.
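To get a feel for how the pieces described so far add up, the following sketch counts convolution weights for the LR-space part of the network (two shallow Conv layers, D RDBs, and the two GFF Conv layers). It assumes 3x3 kernels in the shallow and dense layers, 1x1 kernels for fusion, an RGB input, and no bias terms; the up-sampling net is left out. All function names are hypothetical.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k Conv layer (bias omitted)."""
    return c_in * c_out * k * k


def rdn_feature_weights(d, c, g, g0=64, img_channels=3):
    """Weights of the shallow layers, D RDBs and GFF; UPNet is excluded."""
    total = conv_weights(img_channels, g0, 3)  # first shallow Conv
    total += conv_weights(g0, g0, 3)           # second shallow Conv
    # One RDB: c dense 3x3 layers with growing input width, then 1x1 LFF.
    rdb = sum(conv_weights(g0 + i * g, g, 3) for i in range(c))
    rdb += conv_weights(g0 + c * g, g0, 1)
    total += d * rdb
    total += conv_weights(d * g0, g0, 1)       # 1x1 Conv on D concatenated outputs
    total += conv_weights(g0, g0, 3)           # 3x3 Conv before the global residual
    return total


print(rdn_feature_weights(d=16, c=8, g=64))  # 21964480
```

Under these assumptions almost all weights sit in the dense 3x3 layers of the RDBs, while the fusion layers are comparatively cheap.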
It should be noted that Tai et al. utilized long-term dense connections in MemNet to recover more high-frequency information. However, in the memory block, the preceding layers do not have direct access to all the subsequent layers. The local feature information is not fully used, limiting the ability of long-term connections. In addition, MemNet extracts features in the HR space, increasing computational complexity. In contrast, inspired by [4, 21, 13, 16], we extract local and global features in the LR space. More differences between our residual dense network and MemNet are shown in Section 4. We also demonstrate the effectiveness of global feature fusion in Section 5.
3.4 Implementation Details
In our proposed RDN, we set $3 \times 3$ as the kernel size of all convolutional layers except those in local and global feature fusion, whose kernel size is $1 \times 1$. For convolutional layers with kernel size $3 \times 3$, we pad zeros to each side of the input to keep the size fixed. Shallow feature extraction layers and local and global feature fusion layers have $G_0 = 64$ filters. Other layers in each RDB have $G$ filters and are followed by ReLU. Following prior work, we use ESPCN to upscale the coarse-resolution features to fine ones for the UPNet. The final Conv layer has 3 output channels, as we output color HR images. However, the network can also process gray images.
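The sub-pixel up-scaling step rearranges channels into spatial positions. The pure-Python sketch below illustrates that rearrangement on nested lists; the channel ordering follows the convention used by common deep learning libraries, which is an assumption here rather than something stated in the text, and `pixel_shuffle` is a hypothetical helper name.

```python
def pixel_shuffle(feat, r):
    """Rearrange a (C*r*r, H, W) nested list into a (C, H*r, W*r) one."""
    n_in = len(feat)
    if n_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(feat[0]), len(feat[0][0])
    n_out = n_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(n_out)]
    for c in range(n_out):
        for i in range(r):
            for j in range(r):
                # Input channel c*r*r + i*r + j fills offset (i, j) of
                # every r x r cell of output channel c.
                src = feat[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 1x1 channels become one 2x2 channel for r = 2.
lr_feat = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
print(pixel_shuffle(lr_feat, 2))  # [[[1.0, 2.0], [3.0, 4.0]]]
```

Because all convolutions before this step run at LR resolution, the cost of the feature extraction does not grow with the scaling factor.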
4 Discussions
Difference to DenseNet. Inspired by DenseNet, we adopt local dense connections in our proposed residual dense block (RDB). In general, DenseNet is widely used in high-level computer vision tasks (e.g., object recognition), while RDN is designed for image SR. Moreover, we remove batch normalization (BN) layers, which consume the same amount of GPU memory as convolutional layers, increase computational complexity, and hinder the performance of the network. We also remove the pooling layers, which could discard some pixel-level information. Furthermore, transition layers are placed between two adjacent dense blocks in DenseNet. In RDN, by contrast, we combine densely connected layers with local feature fusion (LFF) by using local residual learning, which is demonstrated to be effective in Section 5. As a result, the output of the $(d-1)$-th RDB has direct connections to each layer in the $d$-th RDB and also contributes to the input of the $(d+1)$-th RDB. Last but not least, we adopt global feature fusion to fully use hierarchical features, which are neglected in DenseNet.
Difference to SRDenseNet. There are three main differences between SRDenseNet and our RDN. The first one is the design of the basic building block. SRDenseNet introduces the basic dense block from DenseNet. Our residual dense block (RDB) improves it in three ways: (1) We introduce the contiguous memory (CM) mechanism, which allows the state of the preceding RDB to have direct access to each layer of the current RDB. (2) Our RDB allows a larger growth rate by using local feature fusion (LFF), which stabilizes the training of wide networks. (3) Local residual learning (LRL) is utilized in RDB to further encourage the flow of information and gradient. The second one is that there are no dense connections among RDBs. Instead, we use global feature fusion (GFF) and global residual learning to extract global features, because our RDBs with contiguous memory have fully extracted features locally. As shown in Sections 5.2 and 5.3, all of these components increase the performance significantly. The third one is that SRDenseNet uses the $L_2$ loss function, whereas we utilize the $L_1$ loss function, which has been demonstrated to be more powerful for performance and convergence.
Difference to MemNet. In addition to the different choice of loss function ($L_2$ in MemNet), we mainly summarize another three differences between MemNet and our RDN. First, MemNet needs to upsample the original LR image to the desired size using bicubic interpolation. This procedure results in feature extraction and reconstruction in the HR space. In contrast, RDN extracts hierarchical features from the original LR image, reducing computational complexity significantly and improving the performance. Second, the memory block in MemNet contains recursive and gate units. Most layers within one recursive unit do not receive the information from their preceding layers or memory block. In our proposed RDN, by contrast, the output of an RDB has direct access to each layer of the next RDB. Also, the information of each convolutional layer flows into all the subsequent layers within one RDB. Furthermore, local residual learning in RDB improves the flow of information and gradients as well as the performance, which is demonstrated in Section 5. Third, as analyzed above, the current memory block does not fully make use of the information of the output of the preceding block and its layers. Even though MemNet adopts dense connections among memory blocks in the HR space, MemNet fails to fully extract hierarchical features from the original LR inputs. In contrast, after extracting local dense features with RDBs, our RDN further fuses the hierarchical features from all the preceding layers in a global way in the LR space.
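The two loss functions contrasted in this section differ in how strongly they penalize large per-pixel errors. A minimal sketch on flat lists of pixel values follows; averaging over pixels is an assumed normalization, and the function names are illustrative.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


pred = [0.0, 0.0, 0.0, 0.0]
target = [0.1, 0.1, 0.1, 1.0]  # one pixel is far off
print(l1_loss(pred, target), l2_loss(pred, target))
```

In this toy case the single outlier contributes about 77% of the mean absolute error but about 97% of the mean squared error, which shows how much more the squared loss is dominated by a few large residuals.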
5 Experimental Results
We first describe the experimental settings and analyze the effects of the basic parameters of RDN. The contributions of different components in our proposed RDN are then investigated in ablation experiments. Next, we compare our RDN and RDN+ (using self-ensemble) with state-of-the-art methods using three degradation models to simulate LR images. We further demonstrate the effectiveness of our RDN by super-resolving real-world images.
5.1 Settings
Datasets and Metrics. Recently, Timofte et al. have released a high-quality (2K resolution) dataset, DIV2K, for image restoration applications. DIV2K consists of 800 training images, 100 validation images, and 100 test images. We train all of our models with the 800 training images and use 5 validation images in the training process. For testing, we use five standard benchmark datasets: Set5, Set14, B100, Urban100, and Manga109. The SR results are evaluated with PSNR and SSIM on the Y channel (i.e., luminance) of the transformed YCbCr space.
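As an illustration of the PSNR-on-Y protocol, here is a small sketch for 8-bit RGB pixels. It assumes the ITU-R BT.601 luma coefficients with a 16..235 range, which is the common convention for this kind of evaluation but is not spelled out in the text; border cropping, which some evaluation scripts apply, is not included. The function names are hypothetical.

```python
import math


def rgb_to_y(r, g, b):
    """Luma of one 8-bit RGB pixel (assumed BT.601, range 16..235)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(sr, hr):
    """PSNR in dB between two images given as flat lists of (R, G, B)."""
    if len(sr) != len(hr):
        raise ValueError("images must have the same number of pixels")
    mse = sum((rgb_to_y(*p) - rgb_to_y(*q)) ** 2 for p, q in zip(sr, hr)) / len(sr)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(255.0 ** 2 / mse)


hr = [(120, 130, 140), (10, 20, 30)]
sr = [(121, 129, 140), (12, 20, 29)]
print(round(psnr_y(sr, hr), 2))
```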
Degradation Models. In order to fully demonstrate the effectiveness of our proposed RDN, we use three degradation models to simulate LR images. The first one is bicubic downsampling by adopting the Matlab function imresize with the option bicubic (denoted as BI for short). We use the BI model to simulate LR images with scaling factors $\times 2$, $\times 3$, and $\times 4$. As in prior work, the second one is to blur the HR image with a Gaussian kernel of size $7 \times 7$ with standard deviation 1.6. The blurred image is then downsampled with scaling factor $\times 3$ (denoted as BD for short). We further produce LR images in a more challenging way. We first bicubic downsample the HR image with scaling factor $\times 3$ and then add Gaussian noise with noise level 30 (denoted as DN for short).
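Two ingredients of the BD and DN models can be sketched without any image library: the normalized Gaussian blur kernel and the additive noise. The sketch assumes that "noise level 30" means a standard deviation of 30 on the 0..255 intensity scale and that noisy values are clipped to that range; both are assumptions. The bicubic resizing itself is not reproduced here.

```python
import math
import random


def gaussian_kernel(size=7, sigma=1.6):
    """Normalized size x size Gaussian kernel as a nested list."""
    half = size // 2
    k = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def add_gaussian_noise(img, level=30.0, seed=0):
    """Add zero-mean Gaussian noise to a 2-D nested list and clip to 0..255."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, level))) for v in row]
            for row in img]


kernel = gaussian_kernel()
noisy = add_gaussian_noise([[128.0] * 4 for _ in range(4)])
print(round(sum(sum(row) for row in kernel), 6), len(noisy), len(noisy[0]))
```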
Training Setting. Following the settings of prior work, in each training batch, we randomly extract 16 LR RGB patches of size $32 \times 32$ as inputs. We randomly augment the patches by flipping horizontally or vertically and rotating by 90°. 1,000 iterations of back-propagation constitute an epoch. We implement our RDN with the Torch7 framework and update it with the Adam optimizer. The learning rate is initialized to $10^{-4}$ for all layers and is halved every 200 epochs. Training an RDN roughly takes 1 day with a Titan Xp GPU for 200 epochs.
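The learning-rate schedule and the patch augmentation described above can be sketched in plain Python. The 0-based epoch index and the probability of 0.5 for each transform are assumptions, and the helper names are hypothetical; in a real pipeline the same transform would have to be applied to the LR patch and its HR counterpart.

```python
import random


def learning_rate(epoch, base_lr=1e-4, step=200):
    """Step decay: halve the rate after every `step` epochs (0-based epoch)."""
    return base_lr * 0.5 ** (epoch // step)


def hflip(patch):
    return [row[::-1] for row in patch]


def vflip(patch):
    return [list(row) for row in patch[::-1]]


def rot90(patch):
    """Rotate a 2-D nested list by 90 degrees clockwise."""
    return [list(col) for col in zip(*patch[::-1])]


def augment(patch, rng):
    """Apply each transform independently with probability 0.5 (assumed)."""
    if rng.random() < 0.5:
        patch = hflip(patch)
    if rng.random() < 0.5:
        patch = vflip(patch)
    if rng.random() < 0.5:
        patch = rot90(patch)
    return patch


rng = random.Random(0)
print(augment([[1, 2], [3, 4]], rng))
print([learning_rate(e) for e in (0, 199, 200, 400)])
```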
5.2 Study of D, C, and G
In this subsection, we investigate the basic network parameters: the number of RDBs (denoted as D for short), the number of Conv layers per RDB (denoted as C for short), and the growth rate (denoted as G for short). We use the performance of SRCNN as a reference. As shown in Figs. 4(a) and 4(b), larger D or C leads to higher performance. This is mainly because the network becomes deeper with larger D or C. As our proposed LFF allows larger G, we also observe that larger G (see Fig. 4(c)) contributes to better performance. On the other hand, RDN with smaller D, C, or G suffers some performance drop in training, but still outperforms SRCNN. More importantly, our RDN allows deeper and wider networks, from which more hierarchical features are extracted for higher performance.
Table 1: Different combinations of CM, LRL, and GFF.
5.3 Ablation Investigation
Table 1 shows the ablation investigation on the effects of contiguous memory (CM), local residual learning (LRL), and global feature fusion (GFF). The eight networks have the same RDB number (D = 20), Conv number (C = 6) per RDB, and growth rate (G = 32). We find that local feature fusion (LFF) is needed to train these networks properly, so LFF is not removed by default. The baseline (denoted as RDN_CM0LRL0GFF0) is obtained without CM, LRL, or GFF and performs very poorly (PSNR = 34.87 dB). This is caused by the difficulty of training and also demonstrates that stacking many basic dense blocks in a very deep network would not result in better performance.
We then add one of CM, LRL, or GFF to the baseline, resulting in RDN_CM1LRL0GFF0, RDN_CM0LRL1GFF0, and RDN_CM0LRL0GFF1 respectively (the 2nd to 4th combinations in Table 1). We can validate that each component can efficiently improve the performance of the baseline. This is mainly because each component contributes to the flow of information and gradient.
We further add two components to the baseline, resulting in RDN_CM1LRL1GFF0, RDN_CM1LRL0GFF1, and RDN_CM0LRL1GFF1 respectively (the 5th to 7th combinations in Table 1). It can be seen that two components perform better than only one component. A similar phenomenon can be seen when we use all three components simultaneously (denoted as RDN_CM1LRL1GFF1). RDN using three components achieves the best performance.
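The naming scheme of the eight ablation variants can be generated mechanically. The short sketch below enumerates all on/off combinations of CM, LRL, and GFF and orders them by the number of enabled components, which reproduces the order in which the variants are introduced above; the function name is hypothetical.

```python
from itertools import product


def ablation_variants():
    """Names of all CM/LRL/GFF on-off combinations, fewest components first."""
    combos = sorted(product((0, 1), repeat=3),
                    key=lambda t: (sum(t), [-v for v in t]))
    return ["RDN_CM{}LRL{}GFF{}".format(*t) for t in combos]


for name in ablation_variants():
    print(name)
```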
We also visualize the convergence process of these eight combinations in Fig. 5. The convergence curves are consistent with the analyses above and show that CM, LRL, and GFF can further stabilize the training process without obvious performance drop. These quantitative and visual analyses demonstrate the effectiveness and benefits of our proposed CM, LRL, and GFF.
5.4 Results with BI Degradation Model
Simulating LR images with the BI degradation model is widely used in image SR settings. For the BI degradation model, we compare our RDN with 6 state-of-the-art image SR methods: SRCNN, LapSRN, DRRN, SRDenseNet, MemNet, and MDSR. Similar to [29, 16], we also adopt the self-ensemble strategy to further improve our RDN and denote the self-ensembled RDN as RDN+. As analyzed above, a deeper and wider RDN would lead to better performance. On the other hand, as most methods for comparison only use about 64 filters per Conv layer, we report results of RDN by using D = 16, C = 8, and G = 64 for fair comparison. EDSR is skipped here, because it uses far more filters (i.e., 256) per Conv layer, leading to a very wide network with a high number of parameters. However, our RDN would also achieve comparable or even better results than those of EDSR.
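The text does not spell out the self-ensemble procedure. The sketch below follows the geometric variant commonly used in the SR literature, which is an assumption here: the input is flipped and rotated into eight versions, each is super-resolved, the outputs are mapped back, and the results are averaged. `model` stands for any callable that maps a 2-D nested list to a (possibly larger) 2-D nested list; all names are illustrative.

```python
def rot90(img):
    """Rotate a 2-D nested list by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def hflip(img):
    return [row[::-1] for row in img]


def self_ensemble(model, img):
    """Average the model outputs over 8 flip/rotation variants of the input."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            x = hflip(img) if flip else img
            for _ in range(k):
                x = rot90(x)
            y = model(x)
            # Undo the rotation first, then the flip, to realign the output.
            for _ in range((4 - k) % 4):
                y = rot90(y)
            if flip:
                y = hflip(y)
            outputs.append(y)
    h, w = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[r][c] for o in outputs) / len(outputs) for c in range(w)]
            for r in range(h)]


def nearest_x2(img):
    """Toy stand-in for an SR model: nearest-neighbour x2 upscaling."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


print(self_ensemble(nearest_x2, [[1.0, 2.0], [3.0, 4.0]]))
```

For a model that already commutes with flips and rotations, such as the toy upscaler here, the ensemble returns the plain model output; a learned network generally does not commute exactly, which is where the averaging can help.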
Table 2 shows quantitative comparisons for $\times 2$, $\times 3$, and $\times 4$ SR. Results of SRDenseNet are cited from their paper. When compared with persistent CNN models (SRDenseNet and MemNet), our RDN performs the best on all datasets with all scaling factors. This indicates the better effectiveness of our residual dense block (RDB) over the dense block in SRDenseNet and the memory block in MemNet. When compared with the remaining models, our RDN also achieves the best average results on most datasets. Specifically, for the scaling factor $\times 2$, our RDN performs the best on all datasets. When the scaling factor becomes larger (e.g., $\times 3$ and $\times 4$), RDN does not hold a similar advantage over MDSR. There are mainly three reasons for this. First, MDSR is deeper (160 vs. 128), having about 160 layers to extract features in the LR space. Second, MDSR utilizes multi-scale inputs as VDSR does. Third, MDSR uses a larger input patch size (65 vs. 32) for training. As most images in Urban100 contain self-similar structures, a larger input patch size for training allows a very deep network to grasp more information by making better use of its large receptive field. As we mainly focus on the effectiveness of our RDN and fair comparison, we do not use a deeper network, multi-scale information, or a larger input patch size. Moreover, our RDN+ can achieve further improvement with self-ensemble.
In Fig. 6, we show visual comparisons on scale $\times 4$. For image “119082”, we observe that most of the compared methods produce noticeable artifacts and blurred edges. In contrast, our RDN can recover sharper and clearer edges, more faithful to the ground truth. For the tiny line (pointed to by the red arrow) in image “img_043”, all the compared methods fail to recover it, whereas our RDN recovers it clearly. This is mainly because RDN uses hierarchical features through dense feature fusion.
5.5 Results with BD and DN Degradation Models
Following prior work, we also show the SR results with the BD degradation model and further introduce the DN degradation model. Our RDN is compared with SPMSR, SRCNN, FSRCNN, VDSR, IRCNN_G, and IRCNN_C. We re-train SRCNN, FSRCNN, and VDSR for each degradation model. Table 3 shows the average PSNR and SSIM results on Set5, Set14, B100, Urban100, and Manga109 with scaling factor $\times 3$. Our RDN and RDN+ perform the best on all the datasets with BD and DN degradation models. The performance gains over other state-of-the-art methods are consistent with the visual results in Figs. 7 and 8.
For BD degradation model (Fig. 7), the methods using interpolated LR image as input would produce noticeable artifacts and be unable to remove the blurring artifacts. In contrast, our RDN suppresses the blurring artifacts and recovers sharper edges. This comparison indicates that extracting hierarchical features from the original LR image would alleviate the blurring artifacts. It also demonstrates the strong ability of RDN for BD degradation model.
For the DN degradation model (Fig. 8), the LR image is corrupted by noise and loses some details. We observe that the noised details are hard to recover by other methods [3, 10, 35]. However, our RDN can not only handle the noise efficiently, but also recover more details. This comparison indicates that RDN is applicable to joint image denoising and SR. These results with BD and DN degradation models demonstrate the effectiveness and robustness of our RDN model.
5.6 Super-Resolving Real-World Images
We also conduct SR experiments on two representative real-world images, “chip” (with $244 \times 200$ pixels) and “hatc” (with $133 \times 174$ pixels). In this case, the original HR images are not available and the degradation model is also unknown. We compare our RDN with VDSR, LapSRN, and MemNet. As shown in Fig. 9, our RDN recovers sharper edges and finer details than other state-of-the-art methods. These results further indicate the benefits of learning dense features from the original input image. The hierarchical features perform robustly for different or unknown degradation models.
6 Conclusions
In this paper, we proposed a very deep residual dense network (RDN) for image SR, where the residual dense block (RDB) serves as the basic building module. In each RDB, the dense connections between layers allow full usage of local layers. The local feature fusion (LFF) not only stabilizes the training of wider networks, but also adaptively controls the preservation of information from current and preceding RDBs. RDB further allows direct connections between the preceding RDB and each layer of the current block, leading to a contiguous memory (CM) mechanism. The local residual learning (LRL) further improves the flow of information and gradient. Moreover, we propose global feature fusion (GFF) to extract hierarchical features in the LR space. By fully using local and global features, our RDN leads to a dense feature fusion and deep supervision. We use the same RDN structure to handle three degradation models and real-world data. Extensive benchmark evaluations demonstrate that our RDN achieves superiority over state-of-the-art methods.
This research is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484, and U.S. Army Research Office Award W911NF-17-1-0367.
References
- [1] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
- [2] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
- [3] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. TPAMI, 2016.
- [4] C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In ECCV, 2016.
- [5] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In AISTATS, 2011.
- [6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- [7] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. Densely connected convolutional networks. In CVPR, 2017.
- [8] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
- [9] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. Submitted to ICLR 2018, 2017.
- [10] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
- [11] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016.
- [12] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2014.
- [13] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, 2017.
- [14] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
- [15] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. In AISTATS, 2015.
- [16] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In CVPRW, 2017.
- [17] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
- [18] Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 2017.
- [19] T. Peleg and M. Elad. A statistical prediction model based on sparse representations for single image super-resolution. TIP, 2014.
- [20] S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. In CVPR, 2015.
- [21] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.
- [22] W. Shi, J. Caballero, C. Ledig, X. Zhuang, W. Bai, K. Bhatia, A. M. S. M. de Marvao, T. Dawes, D. O'Regan, and D. Rueckert. Cardiac image super-resolution with global correspondence using multi-atlas patchmatch. In MICCAI, 2013.
- [23] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, 2017.
- [24] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In CVPR, 2017.
- [25] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
- [26] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, et al. NTIRE 2017 challenge on single image super-resolution: Methods and results. In CVPRW, 2017.
- [27] R. Timofte, V. De Smet, and L. Van Gool. Anchored neighborhood regression for fast example-based super-resolution. In ICCV, 2013.
- [28] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In ACCV, 2014.
- [29] R. Timofte, R. Rothe, and L. Van Gool. Seven ways to improve example-based single image super resolution. In CVPR, 2016.
- [30] T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In ICCV, 2017.
- [31] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. TIP, 2004.
- [32] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Proc. 7th Int. Conf. Curves Surf., 2010.
- [33] H. Zhang, V. Sindagi, and V. M. Patel. Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957, 2017.
- [34] K. Zhang, X. Gao, D. Tao, and X. Li. Single image super-resolution with non-local means and steering kernel regression. TIP, 2012.
- [35] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, 2017.
- [36] L. Zhang and X. Wu. An edge-guided image interpolation algorithm via directional filtering and data fusion. TIP, 2006.
- [37] Y. Zhang, Y. Zhang, J. Zhang, D. Xu, Y. Fu, Y. Wang, X. Ji, and Q. Dai. Collaborative representation cascade for single image super-resolution. IEEE Trans. Syst., Man, Cybern., Syst., PP(99):1–11, 2017.
- [38] W. W. Zou and P. C. Yuen. Very low resolution face recognition problem. TIP, 2012.