Gated Multiple Feedback Network for Image Super-Resolution


Abstract

The rapid development of deep learning (DL) has driven single image super-resolution (SR) into a new era. However, in most existing DL based image SR networks, the information flows are solely feedforward, and the high-level features cannot be fully explored. In this paper, we propose the gated multiple feedback network (GMFN) for accurate image SR, in which the representation of low-level features is efficiently enriched by rerouting multiple high-level features. We cascade multiple residual dense blocks (RDBs) and recurrently unfold them across time. The multiple feedback connections between two adjacent time steps in the proposed GMFN exploit multiple high-level features captured under large receptive fields to refine the low-level features lacking enough contextual information. The elaborately designed gated feedback module (GFM) efficiently selects and further enhances useful information from the multiple rerouted high-level features, and then refines the low-level features with the enhanced high-level information. Extensive experiments demonstrate the superiority of our proposed GMFN over state-of-the-art SR methods in terms of both quantitative metrics and visual quality. Code is available at https://github.com/liqilei/GMFN.

Qilei Li*1 (qilei.li@outlook.com), Zhen Li*1 (zhenli1031@gmail.com), Lu Lu1 (lulu19900303@126.com), Gwanggil Jeon2,3 (ggjeon@gmail.com), Kai Liu4 (kailiu@scu.edu.cn), Xiaomin Yang1 (arielyang@scu.edu.cn)

1 College of Electronics and Information Engineering, Sichuan University, Chengdu, China
2 School of Electronic Engineering, Xidian University, Xi'an, China
3 Department of Embedded Systems Engineering, Incheon National University, Incheon, Korea
4 College of Electrical Engineering, Sichuan University, Chengdu, China

*Both authors contributed equally. †Corresponding author.

1 Introduction

Single image super-resolution (SR) aims to reconstruct a high-resolution (HR) image from its corrupted low-resolution (LR) measurement. It is an ill-posed problem, since one LR image may be degraded from multiple different HR images. In recent years, developments in deep learning (DL) based high-level vision (skip connections [He et al.(2016)He, Zhang, Ren, and Sun, Huang et al.(2017)Huang, Liu, Van Der Maaten, and Weinberger] and the attention mechanism [Hu et al.(2018)Hu, Shen, and Sun]) have allowed networks for image SR to become much deeper: from 3 layers in SRCNN [Dong et al.(2014)Dong, Loy, He, and Tang] to about 400 layers in RCAN [Zhang et al.(2018b)Zhang, Li, Li, Wang, Zhong, and Fu], and have made the results of image SR a true breakthrough [Kim et al.(2016a)Kim, Kwon Lee, and Mu Lee, Dong et al.(2016)Dong, Loy, and Tang, Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee, Tong et al.(2017)Tong, Li, Liu, and Gao, Haris et al.(2018)Haris, Shakhnarovich, and Ukita, Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu]. Nevertheless, as the networks deepen, the number of required parameters increases rapidly. To alleviate this problem, recurrent structures were exploited in [Kim et al.(2016b)Kim, Kwon Lee, and Mu Lee, Tai et al.(2017)Tai, Yang, and Liu, Han et al.(2018)Han, Chang, Liu, Yu, Witbrock, and Huang, Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu].

However, nearly all DL based image SR networks are wholly feedforward: features flow solely from shallower layers to deeper ones, and the high-level features extracted from the top layer are directly used to reconstruct an SR image. In these feedforward networks, since the receptive fields in shallower layers are smaller than those in deeper ones, shallower layers cannot take valuable contextual information into account. Such a shortcoming hinders the reconstruction ability to some extent.

Figure 1: Qualitative results for image SR on ‘img_092’ from Urban100 dataset. The proposed GMFN accurately recovers more image details compared with other state-of-the-art image SR methods.

The feedback mechanism in deep networks aims to refine the low-level features by propagating high-level features back to shallow layers. With the help of high-level information, low-level features become more representative and informative. It has been widely exploited in many high-level vision tasks [Carreira et al.(2016)Carreira, Agrawal, Fragkiadaki, and Malik, Zamir et al.(2017)Zamir, Wu, Sun, Shen, Shi, Malik, and Savarese, Jin et al.(2017)Jin, Chen, Jie, Feng, and Yan, Sam and Babu(2018), Zhang et al.(2018a)Zhang, Wang, Qi, Lu, and Wang, Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu] but has rarely been employed for image SR. Although SRFBN [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu] explored the feasibility of the feedback mechanism for image SR, its feedback connections only propagate the highest-level feature to a shallow layer; the other high-level information captured under different sizes of receptive fields is omitted. Hence, such a design neither fully exploits high-level features nor adequately refines the low-level features.

Based on the above considerations, we propose the gated multiple feedback network (GMFN) for image SR. Since the highest-level feature is not the only one effective in refining low-level features, we employ multiple feedback connections to transmit multiple high-level features to shallow layers. However, excessive high-level features may be highly redundant, and directly using them may conflict with the original low-level features. Consequently, we design gated feedback modules (GFMs) to adaptively select and enhance useful high-level information for refining low-level features. Thanks to the valuable contextual information from the high-level features, the low-level features become more representative, which intrinsically improves the reconstruction ability of the network. As shown in Fig. 1, our proposed GMFN shows better visual quality in comparison with other state-of-the-art image SR methods.

The contributions of our work are summarized as follows:

  • We propose the gated multiple feedback network (GMFN) for accurate image SR. Extensive experiments demonstrate the superiority of the proposed GMFN over other state-of-the-art SR methods. In particular, our final model, unfolded over two time steps with 7 residual dense blocks (RDBs) at each step, outperforms RDN [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu], which employs 16 RDBs.

  • We design the multiple feedback connections to propagate multiple hierarchical high-level features for refining the low-level features. Since high-level features are captured under large receptive fields, they possess more contextual information which is lacking in low-level features. With the help of valuable contextual information introduced by multiple feedback connections, low-level features become more representative, and then the reconstruction performance is intrinsically improved.

  • We design the simple yet efficient gated feedback module (GFM) to adaptively select and further enhance useful information from multiple rerouted high-level features for refining low-level features. Since only the useful information is permitted to pass, the redundant information among high-level features is efficiently eliminated. The selected and enhanced high-level information enables low-level features to be more informative.

2 Related Work

2.1 Feedback mechanism

The feedback mechanism in deep networks empowers low-level features to become more representative and informative by propagating high-level information extracted from deep layers back to shallow layers. It has been widely studied for various computer vision tasks (e.g., classification [Zamir et al.(2017)Zamir, Wu, Sun, Shen, Shi, Malik, and Savarese], pose estimation [Carreira et al.(2016)Carreira, Agrawal, Fragkiadaki, and Malik], and so on [Pinherio and Pedro(2014), Liang et al.(2015)Liang, Hu, and Zhang, Jin et al.(2017)Jin, Chen, Jie, Feng, and Yan, Sam and Babu(2018), Zhang et al.(2018a)Zhang, Wang, Qi, Lu, and Wang, Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu]). The majority of feedback connections in these networks are single-to-single, which means only the highest-level features are transmitted to the shallowest layer. Following a different direction, [Jin et al.(2017)Jin, Chen, Jie, Feng, and Yan, Zhang et al.(2018a)Zhang, Wang, Qi, Lu, and Wang] applied single-to-multiple feedback connections to scene parsing and salient object detection, in which the highest-level features are transmitted to multiple shallow layers. They claimed that delivering other high-level features back would introduce redundant information and might hurt performance on high-level vision tasks. Nevertheless, we argue that such single-to-single and single-to-multiple feedback connection designs are not suitable for the image SR task: since different levels of features are captured under different receptive fields, every piece of them is significant in reconstructing an SR image.

Taking the flaws of previous works into consideration, we introduce new types of feedback connections for accurate image SR, in which multiple hierarchical high-level features are transmitted to shallower layer(s). In other words, the proposed feedback connections are naturally multiple-to-single and multiple-to-multiple. Moreover, we design the gated feedback module to adaptively eliminate redundant information among the propagated high-level features and to refine low-level features using the selected high-level information. The valuable contextual knowledge from the high-level information enables the low-level features to be more informative and representative; hence the reconstruction performance is intrinsically improved. Experimental results demonstrate that our gated multiple feedback connections clearly outperform both single-to-single and single-to-multiple ones (see Sec. 4.2).

2.2 Deep learning based image SR

Recently, deep learning based image SR has developed rapidly from the pioneering work [Dong et al.(2014)Dong, Loy, He, and Tang]. The input of the network has changed from the interpolated LR image [Dong et al.(2014)Dong, Loy, He, and Tang, Kim et al.(2016a)Kim, Kwon Lee, and Mu Lee, Kim et al.(2016b)Kim, Kwon Lee, and Mu Lee] to the original LR image [Ledig et al.(2017)Ledig, Theis, Huszár, Caballero, Cunningham, Acosta, Aitken, Tejani, Totz, Wang, et al., Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee, Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu]. By doing so, the required computational cost is reduced quadratically, and the artifacts caused by the interpolation operation are efficiently alleviated. Furthermore, the application of various skip connections helped the networks go deeper and obtain better reconstruction performance. EDSR [Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee] and RCAN [Zhang et al.(2018b)Zhang, Li, Li, Wang, Zhong, and Fu] employed residual skip connections [He et al.(2016)He, Zhang, Ren, and Sun], SRDenseNet [Tong et al.(2017)Tong, Li, Liu, and Gao] applied dense skip connections [Huang et al.(2017)Huang, Liu, Van Der Maaten, and Weinberger], and RDN [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu] further integrated residual and dense skip connections. However, these networks require a huge number of parameters.

A recurrent structure can effectively reduce the parameters of a network and has been widely applied to image SR. Specifically, DRCN [Kim et al.(2016b)Kim, Kwon Lee, and Mu Lee] and DRRN [Tai et al.(2017)Tai, Yang, and Liu] can be explained as recurrent neural networks (RNNs) if we regard the input LR image as the initial hidden state and zero input as the input state [Han et al.(2018)Han, Chang, Liu, Yu, Witbrock, and Huang]. Based on this view, DSRN [Han et al.(2018)Han, Chang, Liu, Yu, Witbrock, and Huang] designed dual-state recurrence and NLRN [Liu et al.(2018)Liu, Wen, Fan, Loy, and Huang] introduced non-local operations for image SR. In these RNN-based methods, however, the information flows from the LR image to the HR image are solely feedforward. SRFBN [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu] explored the feasibility of the feedback mechanism for image SR by designing an RNN with single-to-single feedback connections, which deliver only the highest-level features to a shallow layer. We argue that SRFBN fails to fully use the other high-level features captured under large receptive fields, and thus cannot efficiently refine the low-level features.

In contrast, we propose GMFN, which makes use of multiple high-level features to enrich the representation of low-level features through recurrent feedback connections. Besides the multiple feedback information flows, the proposed GMFN has three other main differences from the aforementioned RNN-based methods. First, there are multiple recurrent connections, rather than one or two, between two adjacent time steps in the proposed GMFN. Second, in contrast to block-wise recurrent connections, the recurrent connections in the proposed GMFN can bypass multiple blocks, and thus are more flexible. Third, the features carried by the recurrent connections in the proposed GMFN are first sent to the gated feedback module (GFM) to select meaningful information rather than being sent directly to the recurrent block at the next time step.

3 Gated Multiple Feedback Network for Image SR

3.1 Network framework

As mentioned in [Zamir et al.(2017)Zamir, Wu, Sun, Shen, Shi, Malik, and Savarese, Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu], the core merit of a feedback system is propagating its output to its input in an iterative manner. Following this formulation, our proposed gated multiple feedback network (GMFN) is naturally designed as a convolutional recurrent neural network unrolled across time steps, and the sub-network at each time step can be regarded as an independent convolutional neural network aiming to reconstruct an SR image from the original LR image. As shown in Fig. 2, each sub-network mainly consists of four parts: an initial low-level feature extraction block, multiple residual dense blocks (RDBs), multiple gated feedback modules (GFMs), and a reconstruction block (RB). The parameters of these four parts are shared across time. The communication between the sub-networks at two adjacent time steps is achieved by multiple groups of feedback connections. The GFM before each bottom RDB receives one group of feedback connections and refines the low-level features using the selected high-level information.

Figure 2: The framework of our proposed gated multiple feedback network (GMFN).

Given $I_{LR}$ as the input image of GMFN at the $t$-th time step, we apply two convolutional layers to extract the initial low-level feature $F^t_{in}$. The first layer and the second layer hold $3\times3$ and $1\times1$ sized convolutional kernels, respectively. $F^t_{in}$ can be obtained by

$$F^t_{in} = f_{LFB}(I_{LR}), \tag{1}$$

where $f_{LFB}$ represents the function of the initial low-level feature extraction block. The extracted initial low-level feature $F^t_{in}$ is then fed to multiple RDBs to learn hierarchical features.
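The initial low-level feature extraction block of Eq. (1) can be sketched in PyTorch as follows. The $3\times3$/$1\times1$ kernel sizes match the description above, and the channel counts (128 for the first layer, 32 elsewhere) follow the settings stated in Sec. 4.2; the class name and exact layer arrangement are our assumptions, not the official implementation.

```python
import torch
import torch.nn as nn

class InitialFeatureBlock(nn.Module):
    """Sketch of f_LFB in Eq. (1): a 3x3 convolution followed by a
    1x1 convolution, each followed by PReLU as stated in Sec. 3.3."""
    def __init__(self, in_channels=3, mid_channels=128, out_channels=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, padding=1)
        self.conv2 = nn.Conv2d(mid_channels, out_channels, 1)
        self.act = nn.PReLU()

    def forward(self, lr_img):
        # extract the initial low-level feature F^t_in from the LR image
        return self.act(self.conv2(self.act(self.conv1(lr_img))))
```

For a $16\times16$ LR input, the block preserves spatial size and outputs 32 feature channels.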

Stacking more RDBs provides more varied sizes of receptive field in a sub-network, thus forming a better hierarchy of extracted features. Such abundant hierarchical features better assist us in refining low-level features. Each refinement process is accomplished by the GFM placed before one RDB with one group of feedback connections. The details of the GFM are discussed in Sec. 3.2. Supposing we cascade $N$ RDBs at each time step, the final high-level feature $F^t_N$ in the LR space can be obtained by

$$F^t_N = f_{RGs}(F^t_{in}), \tag{2}$$

where $f_{RGs}$ represents the composite function of the $N$ RDBs and the GFMs. Specifically, owing to the lack of high-level information from a previous time step, there is no GFM placed before any RDB at the first time step (see Fig. 2). Following RDN [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu], the number of convolutional layers per RDB is set to 8.
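The unfolding described above can be illustrated with the toy sketch below: the same stack of blocks is applied at every time step, and the per-block outputs of step $t-1$ are handed to gates at step $t$. It is purely illustrative; real RDBs and GFMs are replaced by single convolutions, and each gate here receives only its own block's previous-step output rather than a full group of deep features.

```python
import torch
import torch.nn as nn

class TinyGMFN(nn.Module):
    """Toy sketch of the recurrent unfolding in Eq. (2): weights are
    shared across time steps, and previous-step features are fused
    back in at the next step (heavily simplified stand-ins for
    RDBs/GFMs)."""
    def __init__(self, channels=8, num_blocks=3, steps=2):
        super().__init__()
        self.steps = steps
        self.blocks = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_blocks))
        self.gates = nn.ModuleList(
            nn.Conv2d(channels * 2, channels, 1) for _ in range(num_blocks))

    def forward(self, f0):
        prev, outs = None, []
        for _ in range(self.steps):
            states, x = [], f0
            for i, block in enumerate(self.blocks):
                if prev is not None:
                    # fuse the rerouted feature from the previous time step
                    x = self.gates[i](torch.cat([x, prev[i]], dim=1))
                x = block(x)
                states.append(x)
            prev = states
            outs.append(x)
        return outs  # one high-level feature per time step
```

Note that at the first time step no gating is applied, mirroring the absence of GFMs at $t = 1$ described above.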

In the reconstruction block, the extracted high-level feature $F^t_N$ is first upscaled by a deconvolutional layer. Then, a $3\times3$ sized convolutional layer recovers a residual image from the upscaled feature. Finally, the recovered residual image is combined with the interpolated LR image to reconstruct the SR image $I^t_{SR}$ at the $t$-th time step. The reconstruction block can be formulated as:

$$I^t_{SR} = f_{RB}(F^t_N) = f_{CONV}\big(f_{DECONV}(F^t_N)\big) + f_{UP}(I_{LR}), \tag{3}$$

where $f_{RB}$, $f_{DECONV}$, $f_{CONV}$, and $f_{UP}$ represent the functions of the reconstruction block, the deconvolutional layer, the convolutional layer, and the interpolation kernel, respectively.
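A minimal sketch of the reconstruction block in Eq. (3) is given below. The deconvolution hyper-parameters (kernel $2s$, stride $s$, padding $s/2$ for upscale factor $s$) are a common convention and an assumption here, since the paper defers the exact deconvolutional settings to SRFBN; the bilinear skip connection follows Sec. 3.3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionBlock(nn.Module):
    """Sketch of Eq. (3): deconvolution upscales the high-level
    feature, a 3x3 convolution recovers a residual image, and a
    bilinearly interpolated LR image provides the global skip."""
    def __init__(self, channels=32, scale=2):
        super().__init__()
        self.scale = scale
        self.deconv = nn.ConvTranspose2d(channels, channels,
                                         kernel_size=scale * 2,
                                         stride=scale, padding=scale // 2)
        self.act = nn.PReLU()
        self.conv = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, feat, lr_img):
        residual = self.conv(self.act(self.deconv(feat)))          # f_CONV(f_DECONV(.))
        upsampled = F.interpolate(lr_img, scale_factor=self.scale,  # f_UP(I_LR)
                                  mode='bilinear', align_corners=False)
        return residual + upsampled
```

For a $\times2$ model, an $8\times8$ feature map and LR image yield a $16\times16$ SR output.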

With $T$ time steps unfolded in the proposed GMFN, we obtain $T$ SR images in total. Correspondingly, there are $T$ HR images serving as the reconstruction targets of the sub-networks. We adopt the $L_1$ loss function to optimize our GMFN:

$$\mathcal{L}(\Theta) = \frac{1}{T}\sum_{t=1}^{T} \left\| I^t_{HR} - I^t_{SR} \right\|_1, \tag{4}$$

where $\Theta$ represents the parameter set of GMFN, and $I^t_{HR}$ denotes the target HR image at the $t$-th time step.
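Assuming, as in the closely related SRFBN, that the per-step reconstruction loss is the L1 norm, the objective in Eq. (4) can be sketched as:

```python
import torch

def gmfn_loss(sr_images, hr_image):
    """Sketch of Eq. (4): average the L1 reconstruction losses of the
    T SR images produced at the unfolded time steps; every time step
    is supervised by the same HR target."""
    T = len(sr_images)
    return sum(torch.mean(torch.abs(hr_image - sr)) for sr in sr_images) / T
```

With two time steps, a perfect first prediction and an all-zero second one, the loss is simply half the mean absolute intensity of the target.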

3.2 Gated feedback module and multiple feedback connections

The gated feedback module (GFM) utilizes multiple high-level features rerouted from the previous time step to refine the low-level features extracted in shallow layers. As shown in Fig. 2, one GFM is composed of a gate unit and a refinement unit. The gate unit adaptively selects and enhances useful high-level information from multiple high-level features. The refinement unit first refines the low-level features using the selected meaningful high-level information and then sends the refined low-level features to the following RDB. The placement of a GFM is determined by the level of features to be refined. According to the relative hierarchical relationship among the cascaded RDBs, we choose the inputs of several shallow RDBs as the low-level features to be refined, and the outputs of several deep RDBs as the high-level features to be rerouted. Since the deepest RDBs extract the most representative information in the LR space, which especially facilitates the refinement of the initial low-level features, we employ multiple groups of feedback connections to deliver multiple high-level features from the deepest RDBs to the shallowest ones. Each group of feedback connections is handled by one GFM. Let us denote $\mathcal{S}_L$ as the set of selected indexes of the shallowest RDBs whose inputs are regarded as low-level features, and $\mathcal{S}_H$ as the set of selected indexes of the deepest RDBs whose outputs are used to refine these low-level features. At the $t$-th time step, the output $F^t_i$ of the $i$-th RDB ($i \in \mathcal{S}_L$) can be obtained via

$$F^t_i = f^i_{RDB}\Big(f^i_{RU}\big(\big[F^t_{i-1}, \hat{F}^t_i\big]\big)\Big), \tag{5}$$

where $f^i_{RDB}$ and $f^i_{RU}$ represent the functions of the $i$-th RDB and the refinement unit in the $i$-th GFM, respectively, $[F^t_{i-1}, \hat{F}^t_i]$ refers to the concatenation of $F^t_{i-1}$ and $\hat{F}^t_i$, and $\hat{F}^t_i$ refers to the selected and enhanced high-level information from the multiple high-level features flowing into the $i$-th GFM. These high-level features are extracted from the deepest RDBs and are carried by one group of feedback connections. Therefore, the selected and enhanced high-level information can be given by

$$\hat{F}^t_i = f^i_{GU}\Big(\big[F^{t-1}_j\big]_{j \in \mathcal{S}_H,\, j \ge i}\Big), \tag{6}$$

where $f^i_{GU}$ represents the function of the gate unit in the $i$-th GFM. Based on the relative hierarchical relationship among the cascaded RDBs, Eq. 6 indicates that the $i$-th GFM only receives, from the previous time step, the outputs of RDBs whose indexes are equal to or larger than $i$. For parameter and computation efficiency, we employ two $1\times1$ sized convolutional layers as the gate unit and the refinement unit in each GFM, respectively.
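One GFM can be sketched as below: the gate unit compresses the concatenated rerouted high-level features, and the refinement unit fuses the gated result with the low-level feature before the following RDB. The use of $1\times1$ convolutions is our reading of the "parameter and computation efficiency" remark; the class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class GatedFeedbackModule(nn.Module):
    """Sketch of one GFM (Eqs. (5)-(6)): a 1x1 gate unit selects
    useful information from the concatenated feedback features, and
    a 1x1 refinement unit refines the low-level feature with it."""
    def __init__(self, channels=32, num_feedback=4):
        super().__init__()
        self.gate = nn.Conv2d(channels * num_feedback, channels, 1)
        self.refine = nn.Conv2d(channels * 2, channels, 1)
        self.act = nn.PReLU()

    def forward(self, low_feat, high_feats):
        # gate unit: select/enhance information from the feedback group
        gated = self.act(self.gate(torch.cat(high_feats, dim=1)))
        # refinement unit: fuse [low-level feature, gated high-level info]
        return self.act(self.refine(torch.cat([low_feat, gated], dim=1)))
```

The output keeps the channel width of the low-level feature, so it can be passed straight to the next RDB.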

According to Eq. 5 and Eq. 6, the number of GFMs at each time step (except the first time step) and the number of groups of feedback connections between two adjacent time steps are both equal to $|\mathcal{S}_L|$, and the number of feedback connections in each group is determined by $|\mathcal{S}_H|$. Thus, we can adjust $\mathcal{S}_L$ and $\mathcal{S}_H$ to control how many low-level features are refined and how many high-level features are rerouted, respectively. The feedback connections mentioned in Sec. 2.1 are special cases of our feedback formulation. In detail, we can easily set $\mathcal{S}_H = \{N\}$ to achieve single-to-single ($|\mathcal{S}_L| = 1$) or single-to-multiple ($|\mathcal{S}_L| > 1$) feedback connection(s), which only route the highest-level feature back to the shallowest RDB(s). However, since we argue that every piece of high-level information captured under different receptive fields is important for reconstructing an SR image, we set $|\mathcal{S}_H| > 1$ to achieve multiple-to-single ($|\mathcal{S}_L| = 1$) and multiple-to-multiple ($|\mathcal{S}_L| > 1$) feedback manners, which fully exploit high-level features to refine the low-level feature(s).
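The routing rule described above can be made concrete with a small helper that, given the two index sets (shallow RDBs whose inputs are refined, deep RDBs whose outputs are rerouted), lists which previous-step features each GFM receives. The function name and concrete index values are illustrative.

```python
def feedback_routing(shallow_set, deep_set):
    """Sketch of the routing rule: the GFM in front of shallow RDB i
    receives, from the previous time step, the outputs of all selected
    deep RDBs whose index is >= i. Returns {shallow index: [deep
    indexes routed to it]}."""
    return {i: [j for j in sorted(deep_set) if j >= i]
            for i in sorted(shallow_set)}
```

With 7 RDBs per step, `feedback_routing({1}, {7})` reproduces the single-to-single case, while `feedback_routing({1, 2}, {4, 5, 6, 7})` is a multiple-to-multiple configuration.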

3.3 Implementation details

We set the number of unfolded time steps to 2 (for more analysis about time steps, please refer to the supplementary material) and cascade 7 RDBs in the sub-network at each time step. Following the previous work [Wang et al.(2018)Wang, Yu, Wu, Gu, Liu, Dong, Qiao, and Change Loy], the residual scale factor for each RDB is set to 0.2. The numbers of convolutional kernels in the first layer and the last layer of the sub-network are set to $G_0$ and $c$, respectively. Because we mainly focus on the reconstruction of RGB images, $c$ naturally equals 3. The number of convolutional kernels in the other layers is set to $G$. In the proposed GMFN, all convolutional and deconvolutional layers are followed by a PReLU [He et al.(2015)He, Zhang, Ren, and Sun] activation function, except the last convolutional layer of each RDB and of the reconstruction block. In the reconstruction block, a bilinear kernel is used to interpolate the LR image. For different upscale factors, the settings for the deconvolutional layer are the same as in [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu].

4 Experimental Results

4.1 Settings

Datasets and evaluation metrics. We use 800 images from DIV2K for training and augment the training images with scaling, rotation, and flipping. For testing, we employ five standard benchmark datasets: Set5 [Bevilacqua et al.(2012)Bevilacqua, Roumy, Guillemot, and Alberi-Morel], Set14 [Zeyde et al.(2010)Zeyde, Elad, and Protter], B100 [Martin et al.(2001)Martin, Fowlkes, Tal, Malik, et al.], Urban100 [Huang et al.(2015)Huang, Singh, and Ahuja], and Manga109 [Matsui et al.(2017)Matsui, Ito, Aramaki, Fujimoto, Ogawa, Yamasaki, and Aizawa]. We generate LR images from HR images using the Matlab function imresize with the option bicubic. The SR results are evaluated with the PSNR and SSIM [Wang et al.(2004)Wang, Bovik, Sheikh, Simoncelli, et al.] metrics on the Y channel (i.e., luminance) of the transformed YCbCr space.
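The Y-channel evaluation protocol can be sketched as follows. The BT.601 luma coefficients below are the convention commonly used in the SR literature; assuming this paper follows the same convention, PSNR on the Y channel reads:

```python
import numpy as np

def rgb_to_y(img):
    """Convert an RGB image (float, range [0, 255]) to the Y channel
    of YCbCr using the ITU-R BT.601 coefficients."""
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr):
    """PSNR computed on the luminance channel only."""
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return float('inf') if mse == 0 else 20 * np.log10(255.0 / np.sqrt(mse))
```

Identical images give infinite PSNR; small perturbations give large finite values, matching the ranges reported in Tab. 1.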

Training settings. For each iteration, 16 RGB LR patches are fed to the network. The parameters are initialized using He's method [He et al.(2015)He, Zhang, Ren, and Sun]. Adam [Kingma and Ba(2014)] is employed to optimize the parameters, and the learning rate is halved at fixed intervals during training. The model is implemented under the PyTorch framework and trained on an NVIDIA 2080Ti GPU.

4.2 Study of multiple feedback connections and GFM

In the following experiments, the numbers of convolutional kernels $G_0$ (first layer) and $G$ (other layers) are set to 128 and 32, respectively. Each model is trained for the same number of iterations and evaluated on the Urban100 dataset with scale factor $\times4$.

Figure 3: Study of multiple feedback connections. Single-to-multiple (SM) feedback manner is provided for a better comparison. (a) Performance of various multiple-to-single feedback connections. (b) Performance of various multiple-to-multiple feedback connections. (c) Performance of various single-to-multiple anti-feedback connections.

Study of multiple-to-single feedback connections. Multiple-to-single feedback connections aim to transmit multiple high-level features to the first RDB. We compare seven cases of multiple-to-single feedback manners by setting $\mathcal{S}_L = \{1\}$ and varying the size of $\mathcal{S}_H$. Specifically, for $|\mathcal{S}_H| = 1$ among these cases, the feedback connection is single-to-single, as in SRFBN [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu]. For a better comparison, we employ the single-to-multiple feedback manner [Jin et al.(2017)Jin, Chen, Jie, Feng, and Yan, Zhang et al.(2018a)Zhang, Wang, Qi, Lu, and Wang] as the baseline. Fig. 3(a) illustrates that all multiple-to-single feedback manners perform better than the single-to-multiple and single-to-single ones. As the propagated high-level information increases, the performance of the network gradually improves. This demonstrates that multiple high-level features are beneficial for refining low-level features. However, excessively introduced high-level features may conflict with the original low-level features; thus, propagating more high-level features after the peak hurts the reconstruction performance of the network.

Study of multiple-to-multiple feedback connections. We fix $\mathcal{S}_H$ at its best setting from the previous study and enlarge $\mathcal{S}_L$ ($|\mathcal{S}_L| > 1$) to meet the requirements of the multiple-to-multiple manner. Fig. 3(b) shows that as more shallow RDBs receive high-level information, the performance gradually degrades. This is because the first RDB has already made full use of the information from the rerouted high-level features. If the high-level features are propagated to other RDBs as well, they may conflict with the newly refined low-level features and hinder the reconstruction ability of the network. Even so, the multiple-to-multiple feedback connections still perform better than the single-to-multiple one. This further illustrates that not only the highest-level feature but multiple high-level features help to refine low-level features.

Study of anti-feedback connections. We design anti-feedback connections to further illustrate the effectiveness of the proposed multiple feedback connections. In detail, we reverse the feedback connections to transmit low-level information extracted from the shallowest RDB(s) to the deepest RDB. Similar to the definitions of $\mathcal{S}_L$ and $\mathcal{S}_H$, we define two index sets to control how many low-level features are transmitted and how many high-level features are refined. As opposed to the multiple-to-single feedback connections, we route the outputs of varying numbers of the shallowest RDBs to the deepest RDB to form the anti-feedback counterparts. As can be seen in Fig. 3(c), the anti-feedback connections show a worse reconstruction effect than the proposed multiple feedback connections. This demonstrates that exploiting low-level information to enhance high-level features is less efficient than using abundant high-level information to refine low-level features.

Figure 4: Visualization of averaged feature maps.

Study of the gated feedback module. The refinement unit in a GFM receives the feedback connections and thus achieves the communication between two adjacent time steps. If we directly removed all GFMs, or all refinement units in the GFMs, the communication between the two time steps would be disconnected. Thus, we only investigate the necessity of the gate unit in the GFM. With $\mathcal{S}_L$ and $\mathcal{S}_H$ fixed, our model equipped with the gate unit achieves a PSNR value of 26.13 dB. After removing the gate unit, multiple high-level features are directly concatenated with low-level features at the refinement unit, and the PSNR value under this circumstance drops to 26.06 dB. The reason is that, without the gate unit, directly concatenating redundant high-level features with low-level features confuses the refinement unit and further hinders the reconstruction ability of the network. To better understand the gated feedback module, we visualize the averaged feature maps in Fig. 4. As can be seen, the gate unit adaptively selects the high-frequency components, such as edges and outlines, of the hierarchical feedback high-level features and generates more informative high-level information. With the help of the selected and enhanced high-level features, the input low-level feature effectively accesses high-level information; thus the refined low-level feature becomes more representative than the input one.

4.3 Comparison with the state-of-the-arts

In this sub-section, the proposed GMFN is equipped with the multiple feedback connections using the best-performing index sets $\mathcal{S}_L$ and $\mathcal{S}_H$ found in Sec. 4.2, and $G_0$ and $G$ are enlarged to 256 and 64, respectively. We demonstrate the effectiveness of GMFN by comparing it with eight state-of-the-art SR methods: SRCNN [Dong et al.(2014)Dong, Loy, He, and Tang], VDSR [Kim et al.(2016a)Kim, Kwon Lee, and Mu Lee], DRRN [Tai et al.(2017)Tai, Yang, and Liu], NLRN [Liu et al.(2018)Liu, Wen, Fan, Loy, and Huang], EDSR [Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee], D-DBPN [Haris et al.(2018)Haris, Shakhnarovich, and Ukita], RDN [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu], and SRFBN [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu]. We re-evaluate these comparison methods using the corresponding public implementations and report the quantitative and qualitative comparison results in Tab. 1 and Fig. 5, respectively (comparisons of running time and the number of parameters are available in the supplementary material). As can be seen, the proposed GMFN performs best in terms of both PSNR and SSIM on most public datasets. Especially, with only 14 RDBs in total (2 time steps, 7 RDBs each), our GMFN exhibits better reconstruction performance than RDN, which has 16 RDBs. The qualitative results in Fig. 5 indicate that our GMFN can reconstruct a faithful SR image with sharper and clearer edges, recovering more image details than other methods. The consistency between the quantitative and qualitative results convincingly proves the superiority of the proposed GMFN.

Dataset Scale Bicubic SRCNN VDSR DRRN NLRN EDSR D-DBPN RDN SRFBN GMFN (Ours)
[Dong et al.(2014)Dong, Loy, He, and Tang] [Kim et al.(2016a)Kim, Kwon Lee, and Mu Lee] [Tai et al.(2017)Tai, Yang, and Liu] [Liu et al.(2018)Liu, Wen, Fan, Loy, and Huang] [Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee] [Haris et al.(2018)Haris, Shakhnarovich, and Ukita] [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu] [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu]
Set5 ×2 33.66/0.9299 36.66/0.9542 37.53/0.9590 37.74/0.9591 38.08/0.9610 38.11/0.9602 38.09/0.9600 38.24/0.9614 38.11/0.9609 38.21/0.9612
Set5 ×3 30.39/0.8682 32.75/0.9090 33.67/0.9210 34.03/0.9244 34.30/0.9271 34.65/0.9280 -/- 34.71/0.9296 34.70/0.9292 34.73/0.9295
Set5 ×4 28.42/0.8104 30.48/0.8628 31.35/0.8830 31.68/0.8888 31.94/0.8920 32.46/0.8968 32.47/0.8980 32.47/0.8990 32.47/0.8983 32.55/0.8991
Set14 ×2 30.24/0.8688 32.45/0.9067 33.05/0.9130 33.23/0.9136 33.57/0.9167 33.92/0.9195 33.85/0.9190 34.01/0.9212 33.82/0.9196 34.05/0.9211
Set14 ×3 27.55/0.7742 29.30/0.8215 29.78/0.8320 29.96/0.8349 30.25/0.8386 30.52/0.8462 -/- 30.57/0.8468 30.51/0.8461 30.58/0.8473
Set14 ×4 26.00/0.7027 27.50/0.7513 28.02/0.7680 28.21/0.7721 28.44/0.7759 28.80/0.7876 28.82/0.7860 28.81/0.7871 28.81/0.7868 28.84/0.7888
B100 ×2 29.56/0.8431 31.36/0.8879 31.90/0.8960 32.05/0.8973 32.18/0.8991 32.32/0.9013 32.27/0.9000 32.34/0.9017 32.29/0.9010 32.34/0.9017
B100 ×3 27.21/0.7385 28.41/0.7863 28.83/0.7990 28.95/0.8004 29.05/0.8024 29.25/0.8093 -/- 29.26/0.8093 29.24/0.8084 29.27/0.8093
B100 ×4 25.96/0.6675 26.90/0.7101 27.29/0.7260 27.38/0.7284 27.48/0.7304 27.71/0.7420 27.72/0.7400 27.72/0.7419 27.72/0.7409 27.74/0.7421
Urban100 ×2 26.88/0.8403 29.50/0.8946 30.77/0.9140 31.23/0.9188 31.77/0.9243 32.93/0.9351 32.55/0.9324 32.89/0.9353 32.62/0.9328 32.96/0.9361
Urban100 ×3 24.46/0.7349 26.24/0.7989 27.14/0.8290 27.53/0.8378 27.90/0.8443 28.80/0.8653 -/- 28.80/0.8653 28.73/0.8641 28.87/0.8667
Urban100 ×4 23.14/0.6577 24.52/0.7221 25.18/0.7540 25.44/0.7638 25.78/0.7713 26.64/0.8033 26.38/0.7946 26.61/0.8028 26.60/0.8015 26.69/0.8048
Manga109 ×2 30.30/0.9339 35.60/0.9663 37.22/0.9750 37.60/0.9736 38.55/0.9768 39.10/0.9773 38.89/0.9775 39.18/0.9780 39.08/0.9779 39.13/0.9778
Manga109 ×3 26.95/0.8556 30.48/0.9117 32.01/0.9340 32.42/0.9359 33.24/0.9414 34.17/0.9476 -/- 34.13/0.9484 34.18/0.9481 34.24/0.9487
Manga109 ×4 24.89/0.7866 27.58/0.8555 28.83/0.8870 29.18/0.8914 29.82/0.8982 31.02/0.9148 30.91/0.9137 31.00/0.9151 31.15/0.9160 31.24/0.9174
Table 1: Quantitative evaluation (PSNR/SSIM) under scale factors ×2, ×3, and ×4. The best performance is shown in bold and the second best performance is underlined.
Figure 5: Qualitative comparison of our GMFN with other methods on ×4 image SR.

5 Conclusion

In this paper, we propose the gated multiple feedback network (GMFN) for accurate image SR. It successfully enriches the representation of low-level features by propagating multiple hierarchical high-level features to shallow layers. The elaborately designed gated feedback module (GFM) efficiently selects and enhances meaningful high-level information from multiple groups of feedback connections, and uses the selected and enhanced high-level information to refine the low-level features. Extensive experiments investigating and analyzing various feedback manners demonstrate the superiority of the proposed multiple feedback connections. With two time steps, each containing 7 RDBs, the proposed GMFN achieves better reconstruction performance than state-of-the-art image SR methods, including RDN [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu], which contains 16 RDBs.
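To make the gating idea concrete, the following is a minimal NumPy sketch of a GFM-like operation, not the authors' implementation: multiple rerouted high-level feature groups are stacked, reweighted channel-wise by a learned sigmoid gate (standing in for the module's selection step), and then fused with the low-level features through a 1×1-conv-style projection. All weight shapes and the function name `gated_feedback` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_feedback(low_feat, high_feats, w_gate, w_fuse):
    """Toy sketch of a gated feedback module.

    low_feat:   (C, H, W) low-level features lacking contextual information
    high_feats: list of k arrays, each (C, H, W): high-level feature groups
                rerouted from the previous time step
    w_gate:     (k*C, k*C) weights of a 1x1-conv-style gate
    w_fuse:     (C, (k+1)*C) weights compressing the fused features to C channels
    """
    stacked = np.concatenate(high_feats, axis=0)            # (kC, H, W)
    kc, h, w = stacked.shape
    flat = stacked.reshape(kc, -1)                          # (kC, H*W)
    gates = sigmoid(w_gate @ flat)                          # select useful channels
    gated = gates * flat                                    # enhanced high-level info
    merged = np.concatenate([low_feat.reshape(low_feat.shape[0], -1), gated], axis=0)
    refined = w_fuse @ merged                               # refine low-level features
    return refined.reshape(low_feat.shape)
```

In the real network the gate and fusion would be trainable convolutions; the sketch only shows the data flow of selecting, enhancing, and fusing the feedback features.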

Acknowledgement

This research is sponsored by the National Natural Science Foundation of China (Nos. 61701327 and 61711540303) and the Science Foundation of the Sichuan Science and Technology Department (No. 2018GZ0178).

References

  • [Bevilacqua et al.(2012)Bevilacqua, Roumy, Guillemot, and Alberi-Morel] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie Line Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
  • [Carreira et al.(2016)Carreira, Agrawal, Fragkiadaki, and Malik] Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, and Jitendra Malik. Human pose estimation with iterative error feedback. In CVPR, 2016.
  • [Dong et al.(2014)Dong, Loy, He, and Tang] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
  • [Dong et al.(2016)Dong, Loy, and Tang] Chao Dong, Chen Change Loy, and Xiaoou Tang. Accelerating the super-resolution convolutional neural network. In ECCV, 2016.
  • [Han et al.(2018)Han, Chang, Liu, Yu, Witbrock, and Huang] Wei Han, Shiyu Chang, Ding Liu, Mo Yu, Michael Witbrock, and Thomas S Huang. Image super-resolution via dual-state recurrent networks. In CVPR, 2018.
  • [Haris et al.(2018)Haris, Shakhnarovich, and Ukita] Muhammad Haris, Gregory Shakhnarovich, and Norimichi Ukita. Deep back-projection networks for super-resolution. In CVPR, 2018.
  • [He et al.(2015)He, Zhang, Ren, and Sun] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, 2015.
  • [He et al.(2016)He, Zhang, Ren, and Sun] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [Hu et al.(2018)Hu, Shen, and Sun] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, 2018.
  • [Huang et al.(2017)Huang, Liu, Van Der Maaten, and Weinberger] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, 2017.
  • [Huang et al.(2015)Huang, Singh, and Ahuja] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
  • [Jin et al.(2017)Jin, Chen, Jie, Feng, and Yan] Xiaojie Jin, Yunpeng Chen, Zequn Jie, Jiashi Feng, and Shuicheng Yan. Multi-path feedback recurrent neural networks for scene parsing. In AAAI, 2017.
  • [Kim et al.(2016a)Kim, Kwon Lee, and Mu Lee] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016a.
  • [Kim et al.(2016b)Kim, Kwon Lee, and Mu Lee] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016b.
  • [Kingma and Ba(2014)] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2014.
  • [Ledig et al.(2017)Ledig, Theis, Huszár, Caballero, Cunningham, Acosta, Aitken, Tejani, Totz, Wang, et al.] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
  • [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu] Zhen Li, Jinglei Yang, Zheng Liu, Xiaomin Yang, Gwanggil Jeon, and Wei Wu. Feedback network for image super-resolution. In CVPR, 2019.
  • [Liang et al.(2015)Liang, Hu, and Zhang] Ming Liang, Xiaolin Hu, and Bo Zhang. Convolutional neural networks with intra-layer recurrent connections for scene labeling. In NeurIPS, 2015.
  • [Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In CVPRW, 2017.
  • [Liu et al.(2018)Liu, Wen, Fan, Loy, and Huang] Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S Huang. Non-local recurrent network for image restoration. In NeurIPS, 2018.
  • [Martin et al.(2001)Martin, Fowlkes, Tal, Malik, et al.] David Martin, Charless Fowlkes, Doron Tal, Jitendra Malik, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
  • [Matsui et al.(2017)Matsui, Ito, Aramaki, Fujimoto, Ogawa, Yamasaki, and Aizawa] Yusuke Matsui, Kota Ito, Yuji Aramaki, Azuma Fujimoto, Toru Ogawa, Toshihiko Yamasaki, and Kiyoharu Aizawa. Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications, 2017.
  • [Pinheiro and Collobert(2014)] Pedro H. O. Pinheiro and Ronan Collobert. Recurrent convolutional neural networks for scene labeling. In ICML, 2014.
  • [Sam and Babu(2018)] Deepak Babu Sam and R Venkatesh Babu. Top-down feedback for crowd counting convolutional neural network. In AAAI, 2018.
  • [Tai et al.(2017)Tai, Yang, and Liu] Ying Tai, Jian Yang, and Xiaoming Liu. Image super-resolution via deep recursive residual network. In CVPR, 2017.
  • [Tong et al.(2017)Tong, Li, Liu, and Gao] Tong Tong, Gen Li, Xiejie Liu, and Qinquan Gao. Image super-resolution using dense skip connections. In ICCV, 2017.
  • [Wang et al.(2018)Wang, Yu, Wu, Gu, Liu, Dong, Qiao, and Change Loy] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In ECCV, 2018.
  • [Wang et al.(2004)Wang, Bovik, Sheikh, Simoncelli, et al.] Zhou Wang, Alan C Bovik, Hamid R Sheikh, Eero P Simoncelli, et al. Image quality assessment: from error visibility to structural similarity. TIP, 2004.
  • [Zamir et al.(2017)Zamir, Wu, Sun, Shen, Shi, Malik, and Savarese] Amir R Zamir, Te-Lin Wu, Lin Sun, William B Shen, Bertram E Shi, Jitendra Malik, and Silvio Savarese. Feedback networks. In CVPR, 2017.
  • [Zeyde et al.(2010)Zeyde, Elad, and Protter] Roman Zeyde, Michael Elad, and Matan Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, 2010.
  • [Zhang et al.(2018a)Zhang, Wang, Qi, Lu, and Wang] Xiaoning Zhang, Tiantian Wang, Jinqing Qi, Huchuan Lu, and Gang Wang. Progressive attention guided recurrent network for salient object detection. In CVPR, 2018a.
  • [Zhang et al.(2018b)Zhang, Li, Li, Wang, Zhong, and Fu] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018b.
  • [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In CVPR, 2018c.

Supplementary Material

The following items are contained in the supplementary material:

A. Feedback networks vs. feedforward networks.

B. Study of time step.

C. Model analysis.

D. More qualitative results.

Appendix A Feedback networks vs. feedforward networks

Figure 6: Feedback networks (FB) vs. feedforward networks (FF)
Figure 7: Study of time step (T)

We verify the superiority of our feedback networks over the corresponding feedforward networks. The numbers of convolutional kernels for the first layer and for the other layers are set to 128 and 32, respectively. All models are trained for the same number of iterations and evaluated on the Urban100 dataset. We construct a multiple-to-single (MS) feedback network and a multiple-to-multiple (MM) feedback network, and mark them with ‘FB’. Their feedforward counterparts (marked with ‘FF’) are implemented by disconnecting the loss from all time steps except the last one [Zamir et al.(2017)Zamir, Wu, Sun, Shen, Shi, Malik, and Savarese, Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu]. The experimental results shown in Figure 6 indicate that both the MM_FB and MS_FB feedback networks outperform the corresponding feedforward networks. This confirms that our gated multiple feedback network has obvious advantages over traditional feedforward networks.
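The FB/FF distinction above reduces to where the loss is attached on the unrolled predictions. Assuming a generic recurrent SR model that emits one reconstruction per time step, a minimal sketch of the two training objectives (with L1 loss, as is common in SR) could look like this; `unrolled_loss` and its arguments are illustrative names:

```python
import numpy as np

def unrolled_loss(preds, target, feedback=True):
    """L1 loss over the per-step reconstructions of an unrolled network.

    feedback=True  -> supervise every time step (feedback training)
    feedback=False -> supervise only the last step (feedforward counterpart,
                      i.e. the loss is disconnected from earlier steps)
    `preds` is a list of per-step SR outputs; `target` is the HR image.
    """
    if feedback:
        return sum(np.abs(p - target).mean() for p in preds) / len(preds)
    return np.abs(preds[-1] - target).mean()
```

With `feedback=True`, gradients flow into every time step, which is what forces the rerouted high-level features to be useful early on.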

(a)
(b)
Figure 8: (a) Accuracy vs. number of parameters trade-off. (b) Accuracy vs. average running time trade-off. Models are evaluated on Urban100 under scale factor ×4 on an NVIDIA 1080Ti GPU with an i7-7700K CPU.

Appendix B Study of time step

In this section, we investigate the influence of the time step T on the proposed GMFN. Under identical settings of the multiple feedback connections, we set T = 1, 2, 3, and 4, respectively. The performance evaluated on the Urban100 dataset is shown in Figure 7. It can be observed that, with the help of the multiple feedback connections, the reconstruction ability is significantly improved compared with the network without feedback connections (T = 1). However, we also observe that as T continues to increase, the reconstruction quality improves only slightly. Hence, we set T = 2 in our main paper to better balance the reconstruction performance and the computational cost.
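The role of T can be sketched abstractly: the same block (shared weights) is applied T times, with each step's output fed back as the next step's input, so T = 1 means a single pass with no feedback. This is a generic unrolling helper, not the GMFN code; `step` stands in for one time step of the network:

```python
def unfold(step, x0, T):
    """Unroll a recurrent block `step` for T time steps with shared weights.

    Each step's output is fed back as the next step's input; the list of
    per-step outputs is returned (one prediction per time step).
    """
    outs = []
    x = x0
    for _ in range(T):
        x = step(x)
        outs.append(x)
    return outs
```

For example, `unfold(step, x0, 1)` degenerates to a plain feedforward pass, matching the T = 1 baseline in Figure 7.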

Appendix C Model analysis

We compare the running time and the number of parameters of our final model with representative state-of-the-art methods on Urban100 with scale factor ×4. Figure 8 shows that the proposed GMFN well balances reconstruction accuracy, running time, and the number of parameters. In terms of running time, GMFN runs orders of magnitude faster than RDN [Zhang et al.(2018c)Zhang, Tian, Kong, Zhong, and Fu] and EDSR [Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee]. Compared with SRFBN [Li et al.(2019)Li, Yang, Liu, Yang, Jeon, and Wu] and D-DBPN [Haris et al.(2018)Haris, Shakhnarovich, and Ukita], which require similar running time, GMFN achieves better reconstruction performance. Additionally, GMFN requires 6% fewer parameters than D-DBPN, 56% fewer than RDN, and 77% fewer than EDSR, while obtaining a higher PSNR value. RCAN [Zhang et al.(2018b)Zhang, Li, Li, Wang, Zhong, and Fu] attains better reconstruction performance than all other comparison methods, but it holds considerably more parameters and a much deeper network design (about 400 layers). We will explore such designs in future work.
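Average running time in comparisons like Figure 8 is typically measured with warm-up passes excluded from the timing window. A minimal, framework-agnostic sketch of such a benchmark loop is shown below (the paper does not describe its exact protocol; `benchmark`, `warmup`, and `runs` are illustrative):

```python
import time

def benchmark(fn, x, warmup=2, runs=5):
    """Average wall-clock time of `fn(x)` over `runs` calls.

    `warmup` untimed calls are issued first, since the first invocations of a
    model (memory allocation, kernel compilation) are usually not representative.
    """
    for _ in range(warmup):
        fn(x)
    start = time.perf_counter()
    for _ in range(runs):
        fn(x)
    return (time.perf_counter() - start) / runs
```

On a GPU, one would additionally synchronize the device before reading the clock so that asynchronous kernels are fully accounted for.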

Appendix D More qualitative results

In Figs. 9-16, we provide more qualitative results to demonstrate the superiority of the proposed GMFN.

Figure 9: Qualitative results on ‘img_044’ with scale factor 4. The proposed GMFN better recovers grids on the ceiling.
Figure 10: Qualitative results on ‘img_062’ with scale factor 4. The proposed GMFN produces a faithful SR image and avoids the artifacts produced by other methods.
Figure 11: Qualitative results on ‘ParaisoRoad’ with scale factor 4. Only GMFN accurately restores the letter "M", while the other methods produce broken results.
Figure 12: Qualitative results on ‘ToutaMairimasu’ with scale factor 4. Only GMFN recovers both horizontal lines as in the HR image; the other comparison methods erroneously restore only one.
Figure 13: Qualitative results on ‘UchiNoNyansDiary’ with scale factor 4. Only GMFN faithfully recovers the details of the headwear, while the other comparison methods suffer from heavy blurring artifacts.
Figure 14: Qualitative results on ‘253027’ with scale factor 4. Only GMFN correctly reconstructs the direction of the zebra’s stripes.
Figure 15: Qualitative results on ‘210088’ with scale factor 4. GMFN reconstructs a more vivid fisheye compared with other methods.
Figure 16: Qualitative results on ‘butterfly’ with scale factor 4.