Image Enhancement by Recurrently-trained Super-resolution Network


Seoul National University, Seoul, Korea
LG Electronics, Korea
   Nojun Kwak
Seoul National University, Seoul, Korea

We introduce a new learning strategy for image enhancement by recurrently training the same simple super-resolution (SR) network multiple times. After initially training an SR network using pairs of a corrupted low-resolution (LR) image and an original image, the proposed method uses the trained SR network to generate new high-resolution (HR) images with doubled resolution from the original uncorrupted images. Then, the new HR images are downscaled to the original resolution and serve as target images for the SR network in the next stage. The HR images newly generated by the repeatedly trained SR network show better image quality, and this strategy of training LR to mimic the new HR leads to a more efficient SR network. Up to a certain point, repeating this process multiple times yields better and better images. This recurrent learning strategy for SR can be a good solution for downsizing convolution networks and making a more efficient SR network. To measure the enhanced image quality, for the first time in this area of super-resolution and image enhancement, we use the VIQET [18] MOS score, which reflects human visual quality more accurately than the conventional MSE measure.


1 Introduction


Nowadays, 2K (1920×1080) videos are widely used in digital broadband systems. From the viewpoint of image quality, there are two major sources of impairment in current video images. The first is the intrinsic limitation of image capture systems, such as lens optics, sensor resolution, focusing performance and ISO noise. These defects impose inherent limitations on image quality; therefore, almost no video can show ideal 2K image quality with maximum frequencies. The second source of corruption is image processing such as compression, resolution conversion and noise reduction. Most components of a broadband system have a fixed resolution, which makes image scaling unavoidable. In particular, in order to transmit lower-resolution sources made in the past via a broadband system, video upscaling is inevitable. Also, most content providers use heavy compression, and restoring the lost information is a difficult task that requires enhanced techniques.

Figure 1: Proposed concept. It would be better to learn through super-resolution-processed images. The figures show the original and the images obtained by our method. With the same network, (c) shows better quality than (b).

The early image enhancement techniques were mostly at the level of unsharp masking, which is just a simple amplification operation. With the advent of high-resolution panels, super-resolution techniques have been developed to improve the original resolution. In recent years, machine learning technologies have been widely used, and deep learning has achieved state-of-the-art super-resolution performance. However, as the resolution of the image and the complexity of the algorithm increase, implementation becomes harder and harder. Therefore, it is necessary to develop cheaper and more effective algorithms for image enhancement and super-resolution.


In this paper, as shown in Figure 1, we introduce the concept of iterative learning as an approach for image enhancement. In general, if the same algorithm is applied repeatedly, the result will be reinforced. Especially in deep learning, we can apply this idea without additional computational cost at inference time. What we need is additional training with the new HR images made in the previous stage. Through this recurrent learning, the image converges to a better quality, and it is possible to obtain an enhancement over the existing super-resolution with a smaller-sized network. By training a small network with 200K parameters for three repetitions, we could obtain images with better quality than a model with over 1M parameters.


The contributions of this paper are as follows:

1. By showing an effective way of repeatedly training a simple feed-forward network for super-resolution through recurrent learning, we improve the efficiency of super-resolution, achieving results similar to those of complex deep networks without much computational cost.

2. By shifting the evaluation criterion of image quality from a simple arithmetic measure like Mean-Squared-Error (MSE) to a user-centered measure, the VIQET [18] Mean-Opinion-Score (MOS), which can be computed in an automated way without human involvement, we can focus on subtle additional image quality improvements and set the number of repetitions appropriately.

2 Related work

2.1 Image enhancement

Along with the development of computer vision and the spread of digital broadcasting, image processing techniques have been widely used in image production and reproduction. These techniques can be roughly divided into noise reduction and image enhancement. The former includes technologies such as de-noising and de-interlacing that remove artifacts that should not be present in the original image. In the latter, sharpness and contrast operations amplify some components of the original image for better image quality. These algorithms typically perform a series of operations that widen the min-max range of the original image, such as Laplacian filtering. Laplacian filtering is inexpensive and effective, but it cannot perform more complex processing such as restoring missing texture components or broken edges.

There are several image enhancement algorithms that resolve these weaknesses. The first improves the adaptiveness of the processing region, such as Adaptive Unsharp Masking [13], which selects the processing region wisely. The second uses two or more weights in the function to enable more complex processing, which includes the Bilateral Filter [21] and the NLM Filter [2]. There is a huge amount of related work, but with the advent of super-resolution, research on conventional image enhancement has diminished. This is mainly because the improvement is due to the amplification of some components, while the original frequency content is maintained. Ultimately, changing the shape of the original requires an approach such as super-resolution. We present a direction for developing new super-resolution images of better quality.

2.2 Super-resolution

Before the boom of deep learning, super-resolution algorithms were developed as better upscalers. These techniques have mainly been applied to display systems such as televisions, proposed as a solution to compensate for the mismatch between the input image size and the panel resolution. There are two main categories of this technology: 1) utilizing machine learning algorithms and 2) weighted image blending using patches from the original image. The former is similar to current deep-learning-based super-resolution approaches, but with simple architectures consisting of only one layer. The latter is a self-image generation method that retrieves the most similar patches from the surrounding areas of images after upscaling [6]. Conventional super-resolution methods are well documented in [19]. They can perform upscaling while keeping the sharpness of an edge. However, there are technical limitations, such as problems at large upscaling ratios or the impossibility of restoring complex textures. These tasks require huge computations, but it was difficult to learn enough with a human-designed shallow network. Recently, deep neural networks have improved to enable more powerful super-resolution processing. This is why we use deep learning as a solution.

2.3 Super-resolution with deep learning


Recently, as a branch of deep learning research, ‘single image super-resolution’ has emerged. It makes down-sized low-resolution (LR) images and trains convolution networks to reproduce the original images from the LR images. Compared to the revolutionarily simple SRCNN [5], the architectures have become heavier and more complex. VDSR [7] introduced VGG and ResNet into the super-resolution field in 2015. ESPCN [14] and DRCN [8] presented various aspects of residual structure and of upscaling in 2016. SRGAN [10], SRResNET [9], DRRN [15], EDSR [11], DenseSR [17], MDSR [11] and MemNet [16] developed the ResNet architecture and adapted the GAN approach to super-resolution in 2017. In 2018, super-resolution is now more widely used in terms of utilization; a good example is [20]. There is also research utilizing a recursive method for super-resolution [4]. Different from ours, it tries to produce a face image from an 8×8 tiled mosaic through repeated application of super-resolution, which is similar to info-GAN [3] producing a facial image.


Regardless of the super-resolution algorithm, the general operating principle is similar. Basically, single image super-resolution is learned by back-propagation so as to reduce the difference between the original and the image generated by the SR network from an LR image.

There are some limitations in these works. First, for the low-resolution images used in other SR papers, the improvement in image quality from increasing the resolution is remarkable; however, for a high-resolution image, the improvement caused by an increase in resolution is not obvious. Second, the advances in super-resolution have increased performance at the cost of computational complexity. This high computational demand makes it hard to implement super-resolution in hardware. Therefore, it is necessary to develop a method that can achieve better performance with a simpler network. Finally, since current super-resolution algorithms learn only from the original images, they target the original, and we cannot expect image quality beyond the original. In the following section, we present a new approach to obtain images of better quality at a low cost through recurrent learning.

3 Proposed method

3.1 Recurrent Training Strategy (RTS)

The core of this paper is the recurrent training strategy for an SR network, which is achieved by reducing the output image of the SR network to its original size and retraining the SR network with this new target. The proposed recurrent training strategy is briefly shown in Figure 2.

Figure 2: Recurrent Training Strategy (RTS). LR is a low-resolution image obtained by down-scaling the original image. One stage is composed of the SR training phase (Phase A) and the image enhancement phase (Phase B). By successive application of these two phases, we can obtain a better SR network and enhanced images. Although training takes multiple passes, at inference time it is possible to obtain images of the finally learned quality at once by feed-forwarding with the learned weights.

As can be seen in the figure, a stage of RTS consists of two phases. In the first phase (Phase A), the SR network is trained. In this phase, the SR network is learned so that the target image is obtained from the degraded LR image, and the weights of the super-resolution network are updated. The LR image is a low-quality image corrupted through down-scaling, compression adjustment and so on. In Stage 1, the target is set to the original image, and the network is tuned such that the LR image is improved to the image quality of the original level.

The second phase, Phase B, is the process of creating a new target image using the network obtained in Phase A. This is done by inputting the original image to the trained network of Phase A, and we can expect that the output image has higher quality than the original image. By downscaling the output image, we can obtain a new image which acts as the target in the next stage. Generally, super-resolution increases the size of the image to increase the resolution, so the additional down-scaling process is required because the resolution must be the same as the original for retraining the network.

To summarize, we have to go through two steps to get a new HR image. The former is the process of obtaining the super-resolution function, and the latter is the process of obtaining the enhanced image by down-scaling the output image of the SR network back to the original size. Then, we perform this process repeatedly on a stage basis, so that the same super-resolution processing becomes more effective. The overall procedure of each stage can be summarized as follows:


Here, the two operators denote the super-resolution network and the down-scaling operation, respectively, and the LR input is an image of lower resolution than the original.
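The two-phase procedure above can be sketched in code. This is a minimal illustration with stand-in components: the "SR network" is a nearest-neighbour 2× up-scaler, the down-scaler is a 2×2 block average rather than the Lanczos filter used later, and the training step is a placeholder, not the paper's actual optimization.

```python
import numpy as np

def downscale(img):
    """2x2 block-average down-scaler (stand-in for the Lanczos scaler)."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upscale(img):
    """Nearest-neighbour 2x up-scaler standing in for the SR network."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def train_sr(pairs):
    """Phase A placeholder: fit the SR network on (LR, target) pairs.
    A real implementation would run gradient descent; here we simply
    return the stand-in up-scaler."""
    return upscale

def rts(original, lr, n_stages):
    """One RTS run: alternate Phase A (training) and Phase B (building
    the next target by applying SR to the original and down-scaling)."""
    target = original                     # Stage 1 target: the original image
    sr = None
    for _ in range(n_stages):
        sr = train_sr([(lr, target)])     # Phase A
        hr = sr(original)                 # Phase B: SR on the original
        target = downscale(hr)            # new target for the next stage
    return sr, target
```

Note that only the final trained network is needed at inference time; the loop exists purely to produce successively better targets.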

By applying super-resolution to the original image, a higher quality image is expected to be obtained. Even if the obtained image is down-scaled to the original size, components from the additionally generated resolution may remain; this is similar to a UHD (Ultra HD) channel looking better than existing channels on an FHD (Full HD) TV. If the new target has better image quality than the original, we can assume that the super-resolution network can learn from this better target, and after the learning, the resultant network will be better. Therefore, assuming that we go through this additional process in a direction that gradually improves the results, we will eventually reach the maximum improvement point of each image, and the network will perform better than the current super-resolution network trained to mimic the original image.

Figure 3: The network used for recurrent learning. This network was developed exclusively for recurrent learning systems. The biggest difference from existing SR networks is that it was designed to produce the best result at the blue layer, where the residual is added with only 3 channels. The model is also very light, and the number of layers is easy to control.

3.2 Reason for enhanced quality

Given the original image, the up-scaler, the down-scaler and the residual component, the super-resolution trained in the first stage can be expressed as follows:


Here, we can suppose the two are equivalent, and the rescaled image becomes a locally blurred version of the original:


Combining (2) and (4), we get


Because the goal of learning is to minimize the error:


if enough learning has been done, the network output becomes similar to the target:


In the first stage, after the super-resolution network is fully learned, we can input the original image instead of the LR image to the SR network; then it becomes


By down-scaling this, we get the target image for Stage 2, which can be expressed as


Assuming a linear down-scaler (for a nonlinear down-scaler such as nearest neighbor or bicubic interpolation, we can take a Taylor series expansion), it becomes


The distributions of the zero-mean residual components of different-sized images will be approximately equal, unless there are many high-frequency components existing only in the larger image and not in the smaller one. By down-scaling the residual, its distribution will be contracted toward zero, and we can approximate it with a contracted version of the original residual as


Finally, with (7), the new target can be approximated as


The above equation is highly similar to unsharp masking. In other words, there is a high probability that components with a large deviation from the mean are amplified, and it can be expected that the new target image has improved sharpness compared to the original.
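Since the original symbols were lost in extraction, the relation to unsharp masking can be restated in our own (assumed) notation: with original image $y$, up-scaler $U$, down-scaler $D$ and contraction factor $k$, the new target behaves like

```latex
% Assumed notation (not the paper's original symbols):
%   y : original image,  U : up-scaler,  D : down-scaler,  k : contraction factor
y_{\mathrm{new}} \;\approx\; y + k\,r(y),
\qquad r(y) \;=\; y - D\big(U(y)\big), \qquad 0 < k < 1.
```

This matches classical unsharp masking $y + \lambda\,(y - G * y)$, with the down-then-up-scaled (locally blurred) image $D(U(y))$ playing the role of the Gaussian-smoothed $G * y$.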

When the super-resolution network is trained with this new target, it becomes


This is obtained by replacing the target in (7) with the new one when computing the output. As the stages go on, it becomes


Since the contraction factor is smaller than 1 (experimental results show that its average for Stage 1 on the DIV2K validation set is 0.779, with a standard deviation of 0.098), higher-order terms will disappear, and convergence to a certain point not far from the original can be expected. In the SR network, the actual difference is learned by a nonlinear function, which makes it hard to guarantee convergence; however, this non-linearity performs much better than the simple USM algorithm. As the iteration goes on, the difference tends to converge, and the image quality becomes better than that of the original image.
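Under the same assumed notation ($y$ the original, $r$ the residual, $k$ the contraction factor, measured as about 0.779 on DIV2K), the claimed convergence can be sketched as a geometric series:

```latex
y_N \;\approx\; y + \Big(\sum_{n=1}^{N} k^{\,n}\Big)\, r(y)
\;\xrightarrow[\;N \to \infty\;]{}\; y + \frac{k}{1-k}\, r(y),
```

so the targets approach a fixed point not far from the original, rather than diverging as repeated unsharp masking does.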

3.3 Configuration of the super-resolution network

We set up a convolution network based on residual layers. The use of multiple residual layers has the advantage of automatically adapting image filters of various sizes. In order to exploit these advantages and to easily control the number of parameters, we constructed the network shown in Fig. 3.

This network is designed exclusively for recurrent learning, which uses a subsequent down-scaler after the super-resolution operation. The biggest difference from a regular super-resolution network is that it produces its best output at the same resolution as the original, not in an upscaled domain. Learning proceeds in the same way as existing super-resolution, but since the expected output is synthesized in 3 channels at the original resolution, the optimal result is produced at the corresponding step.

It is composed of an input module, an output module and three 64-channel residual blocks using 3×3 filters. The input module takes a 128×128 RGB (3-channel) input and produces a 64-channel output. After the input module, a series of residual layers, all of which have the same number of channels (64) with 3×3 convolutions, is applied. As shown in the figure, the base model uses 3 layers. In the output module, the final residual layer does not use ReLU because the difference between the input and the output can have both positive and negative values. Here, the output with the blue residual will be the next stage's HR, because the subsequent channel extension and pixel shuffle have no ReLU and can be canceled by the down-scaler. The residual layers are based on SRResNet [9] and VDSR [7]: the final residual without ReLU is from VDSR, and the inner residual layers share the concept of SRResNet.
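A rough PyTorch sketch of this architecture follows. The layer counts, the ReLU-free final residual and the 2× pixel-shuffle tail follow the description above, but details such as padding and the exact channel-extension convolution are our assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """SRResNet-style 64-channel residual block with 3x3 convolutions."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class RecurrentSR(nn.Module):
    """Sketch of the Fig. 3 network: input module (3ch -> 64ch), three
    residual blocks, and an output module whose final residual (no ReLU,
    VDSR-style) is added to the 3-channel input before a 2x pixel-shuffle
    upscale."""
    def __init__(self, n_res=3, ch=64):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_res)])
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)   # final residual, no ReLU
        self.up = nn.Sequential(
            nn.Conv2d(3, 12, 3, padding=1),          # channel extension
            nn.PixelShuffle(2),                      # 12ch -> 3ch, 2x size
        )

    def forward(self, x):
        base = x + self.tail(self.body(self.head(x)))  # blue layer: best 3-ch output
        return self.up(base)
```

The `base` tensor is what becomes the next stage's HR after down-scaling, since the pixel-shuffle tail contains no nonlinearity and is canceled by the down-scaler.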

Finally, we need a down-scaler to convert the up-scaled image to the same size as the original; we use the Lanczos scaler. The Lanczos filter has better performance than linear and bicubic filters, and it makes the contraction factor in (11) greater.
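For example, with Pillow the down-scaling step can be written as follows; the factor-2 setting matches the 2× super-resolution used here.

```python
from PIL import Image

def lanczos_downscale(img, factor=2):
    """Down-scale a PIL image back to its pre-SR size using the Lanczos
    resampling filter."""
    w, h = img.size
    return img.resize((w // factor, h // factor), Image.LANCZOS)
```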

Stage N   Difference ratio from the original (%)   Delta from (N-1)
HR1       0.95                                     0.95
HR2       1.35                                     0.40
HR3       1.47                                     0.12
HR4       1.48                                     0.01
HR5       1.51                                     0.03
HR6       1.55                                     0.04
HR7       1.62                                     0.07
HR8       1.76                                     0.14
Table 1: Image difference ratio on the validation set. In our model with the proposed recurrent training strategy, the change of the difference ratio from the previous stage is minimal at Stage 4.
Figure 4: Change of image difference vs. stage. It can be seen that the difference converges to almost zero at HR4 (yellow line). But with repeated application of unsharp masking, the delta increases linearly (green line).
(Original)                    (HR1)                       (HR2)                       (HR3)                       (HR4)
Figure 5: Comparison of output images at each stage. For better comparison, each row shows a part (64×64 pixels) of an image in the DIV2K image set. As described in the introduction, the originals are naturally blurred in the capturing and transmission process. The stage number increases from left to right. As the stage increases, the texture becomes clearly visible and the result becomes sharper. This is easily identifiable in the man's beard, in the texture of the bricks, and in the clarity of the letters.

3.4 Setting the number of stages

In our proposal, the number of stages can be set as the point where the quality of the new target becomes worse than that of the previous one, and the corresponding network can be used as the final super-resolution network. The important thing here is how to judge whether the output image gets better or not. As a result of our experiments, we can consider a couple of solutions. The easiest approach is to measure the quality of the actual validation set and check whether the score increases. This is intuitive, but it has the disadvantage of the time and effort required for measurement. The second method is to check whether the stage-to-stage change becomes larger or smaller as the repetition progresses. As the output of each stage approaches a certain ideal point, the difference will gradually decrease; when the output of a stage deviates from the ideal point, divergence starts and the difference will increase. This is explained in more detail in the following experimental results.
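The second criterion can be automated directly from the per-stage difference ratios: stop at the stage whose change from the previous stage is smallest. A small sketch, using the values from Table 1:

```python
def pick_final_stage(diff_ratios):
    """Given per-stage difference ratios from the original (%), return the
    1-based stage whose change from the previous stage is smallest, i.e.
    the convergence point where recurrent training should stop."""
    deltas = [abs(b - a) for a, b in zip([0.0] + diff_ratios, diff_ratios)]
    return min(range(len(deltas)), key=deltas.__getitem__) + 1

# Difference ratios for HR1..HR8 from Table 1:
ratios = [0.95, 1.35, 1.47, 1.48, 1.51, 1.55, 1.62, 1.76]
```

On these values the rule selects Stage 4, consistent with the minimum delta reported in Table 1.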

4 Experimental result

4.1 Training setup


In our experiment, we used the DIV2K image dataset [1], which contains 2K RGB images. This is to prove the performance of image enhancement on FHD-level content, which occupies the majority of current broadcasting content. The DIV2K dataset consists of 800 training images and 100 validation images. From the 800 DIV2K training images, we constructed our training database by cropping 100 random patches of size 256×256 from each image. After making the original patches, we applied JPEG compression and down-scaled the patches to half size to create the LR images. The learning was carried out on the 80,000 pairs made in this way.
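The corruption pipeline can be sketched with Pillow as below. The JPEG quality value is our assumption, since the paper does not state the exact compression setting.

```python
import io
from PIL import Image

def make_lr(patch, quality=40):
    """Corrupt an original 256x256 patch into an LR image: JPEG-compress
    it in memory (quality=40 is an assumed setting) and down-scale to
    half size."""
    buf = io.BytesIO()
    patch.save(buf, format="JPEG", quality=quality)
    jpg = Image.open(buf)
    w, h = jpg.size
    return jpg.resize((w // 2, h // 2), Image.LANCZOS)
```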


In all of our experiments, we used MSE as the loss function and Adam as the optimizer. A batch size of 8 was used with a learning rate of 0.001. We trained the network for 20 epochs.

4.2 Convergence of recurrent learning

In order to confirm whether the proposed recurrent learning converges to the maximum improvement point or simply diverges, as if a Laplacian filter were applied multiple times, we observed the difference between the output images of consecutive stages. The image difference is an objective indicator that shows the average difference of all the pixels in an image as a ratio. It is similar to MSE, but it is normalized by the image's brightness and resolution. It is a more interpretable measure in that it intuitively tells the percentage by which the image has changed. The average image difference of the 100 validation images at each stage is shown in Table 1.
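The paper does not give the exact formula for this indicator, but a brightness-normalized per-pixel difference in percent could look like the following sketch:

```python
import numpy as np

def image_difference_ratio(a, b):
    """Average per-pixel absolute difference between two images,
    normalized by the mean brightness of the first, in percent.
    One plausible definition; the paper's exact formula is not given."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 100.0 * np.abs(a - b).mean() / max(a.mean(), 1e-8)
```

Because the measure is normalized by both resolution (the mean over pixels) and brightness, it is comparable across differently sized and exposed images.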


Figure 4 plots Table 1 as a graph. It shows that the learning converges to a certain point, and it can be seen that the change of the difference starts to increase after a specific point, Stage 4. In order to confirm that recurrent learning is different from repetitive application of a simple filter, we repeatedly applied an unsharp-masking (USM) filter to the original image, shown by the green line. The difference increases linearly, and the image difference from Stage (N-1) never converges. Also, from the observation that the image difference increases linearly under repeated USM application, we can confirm that the image difference is properly measured.

Name  MOS    Resol-  Edge      SN     Texture   Satu-   Color   Illumi-  Dynamic  Over    Under
      Score  ution   Acutance  Index  Acutance  ration  Warmth  nation   Range    Expose  Expose
LR    3.645  2.8     52.6      311.9  96.3      116.8   107.7   156.8    101.0    0.1     0.4
HR0   3.825  2.8     56.6      314.8  108.7     117.3   107.8   163.6    101.0    0.1     0.4
HR1   3.962  2.8     82.7      322.3  150.7     117.7   107.8   182.7    101.3    0.1     0.4
HR2   4.028  2.8     92.5      325.3  167.8     118.4   108.0   190.0    101.2    0.0     0.4
HR3   4.057  2.8     94.3      326.2  171.8     118.0   107.8   191.4    101.2    0.0     0.4
HR4   4.050  2.8     93.1      326.7  171.4     117.1   107.2   191.1    101.3    0.1     0.4
HR5   4.048  2.8     92.1      326.7  170.5     117.7   107.6   190.7    101.3    0.1     0.4
HR6   4.038  2.8     89.5      327.6  168.1     116.0   107.8   188.2    101.1    0.0     0.4
HR7   4.030  2.8     87.9      329.0  166.9     114.6   107.3   186.8    101.1    0.1     0.4
HR8   4.030  2.8     87.3      331.1  166.1     111.7   106.1   186.2    101.2    0.1     0.4
Table 2: Results of VIQET image quality measurement. The final MOS score in this table is obtained based on the detailed scores of 10 items, excluding the Flat Region Index, which results in NaN for all cases. Except for the over- and under-exposed ratios, a higher score leads to a higher MOS, which means better quality.

4.3 The result images


Figure 5 shows some samples of the output images at each stage. It shows how the resultant images are improved at each stage. All images are down-scaled to the same resolution as the original, and we can visually confirm that the quality of the images is improved. The improvement level at each stage is similar to the slope of the graph in Fig. 4. As the stages go on, it can be observed that the texture becomes clearly visible and the result becomes sharper. It can be said that recurrent learning converges to a certain level of improvement, since there is little difference between the last two stages in all the samples.

Figure 6: The MOS results for each stage. The final score, MOS (red line), peaks at HR3. The increase in MOS is similar to the convergence of the difference (yellow line in Fig. 4). Also, the specific measures in other colors tend to increase until that point and then saturate or decrease slightly.

4.4 Verification using VIQET tool

In order to clearly demonstrate the effectiveness of image enhancement, we looked for ways to measure the image quality more objectively. Many no-reference quality measurement models exist, but most of them require pre-learning on references and check how close the measured image is to what was learned. This is not what we want, because we do not have any reference. Fortunately, the Video Quality Experts Group (VQEG) has developed a tool that measures image quality and reports the Mean Opinion Score (MOS) of an image on a scale from 0 (Poor) to 4.5 (Best). Using this tool, we can objectively evaluate the image quality improvement at each stage.


In the VQEG image quality Evaluation Tool (VIQET) [18], the average MOS score is derived for each image through a total of 11 measures: 1. Resolution, 2. Multi-scale Edge Acutance, 3. Noise Signature Index, 4. Flat Region Index, 5. Multi-scale Texture Acutance, 6. Saturation, 7. Color Warmth, 8. Illumination, 9. Dynamic Range, 10. Over Exposed Ratio, 11. Under Exposed Ratio. To use VIQET, we must assign each image one of 4 categories: Outdoor Day - Landscape, Indoor - Wall Hanging, Indoor - Arrangements and Outdoor Night - Landmark. Most of the DIV2K images are bright images taken outdoors, so we simply set all the images to the outdoor day category. The detailed average scores of the 100 DIV2K validation images for each stage are shown in Table 2.

The key point is that the overall score, the MOS value, increases with the stage. The result proves that “the image quality can be increased by recurrent learning”. The MOS results for each stage are shown in Fig. 6. Based on the MOS, it can be confirmed that the image quality was improved from the original's 3.825 up to 4.057. The increase is 0.232 points from the original, which is larger than the difference between LR and the original, 0.180. This proves that image enhancement through recurrent learning is efficient. There is a limit to improving the image quality with one-time learning based on the difference between the original and LR, and we have resolved this by providing a new target obtained through recurrently training the network. As can be seen in Fig. 6, the MOS increase is largely due to the improvement of four terms. The first is the multi-scale edge acutance value, meaning the edges became sharper. The second is the Noise Signature Index, which means that the distinction between image and noise is clearer. The third is multi-scale texture acutance, meaning an increase in the level of activity and detail in the scene. Lastly, illumination means the light has become more enriched, also indicating that the image has more detail.

Network       Param   HR0     HR1     HR2     HR3
Residual1     41K     2.384   2.639   2.701   2.712
Residual3     114K    2.384   2.637   2.719   2.720
Residual6     225K    2.384   2.644   2.725   2.744
SRCNN         57K     2.384   2.603   2.680   2.680
VDSR          666K    2.384   2.599   2.668   2.669
SRResNet      1.5M    2.384   2.617   2.720   2.703
EDSR (64,16)  1.7M    2.384   2.632   2.725   2.737

Table 3: MOS results for various networks. This table shows that there is an additional improvement in the index after HR1, meaning that the application of the Recurrent Training Strategy (RTS) is meaningful regardless of the network. We can also confirm that the proposed network is more suitable for recurrent learning. Images are available in the supplementary material.

4.5 RTS on various networks


To investigate the characteristics of the proposed RTS more clearly and to check whether it remains effective with other algorithms, we tested RTS on several other networks as well as the proposed network. RTS was applied to a total of six networks and the performances are shown in Table 3. In the table, the top three networks are designed based on our proposal shown in Fig. 3 by changing the number of residual layers in the middle to 1, 3 and 6, respectively. The proposed architecture in Fig. 3 has the advantage that the number of parameters in the network can easily be compared while taking the number of repetitions into account; for example, learning once with doubled parameters can be compared against learning twice with halved parameters. For the four bottom rows, we chose SRCNN, VDSR, SRResNet and EDSR to check whether our recurrent training strategy is meaningful even in a heavier super-resolution network. With this experiment, the characteristics of the proposed RTS on various networks can be checked, from the simplest network, SRCNN, to a recent heavy network, EDSR.
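The per-layer cost of these variants can be checked with back-of-envelope arithmetic. The sketch below assumes, as a hypothesis (the exact layer widths come from Fig. 3, which is not reproduced here), that each added residual layer amounts to one 64-channel 3x3 convolution; under that assumption, the roughly 37K-parameter step between Residual1, Residual3 and Residual6 in Table 3 is reproduced to rounding.

```python
def conv2d_params(c_in, c_out, k=3):
    """Parameter count of a k x k convolution with bias."""
    return c_in * c_out * k * k + c_out

# Assumption: one 64-channel 3x3 conv per added residual layer.
per_layer = conv2d_params(64, 64)      # 36,928 parameters

# Starting from Residual1 (~41K params), adding layers roughly reproduces
# the totals reported in Table 3 (Residual3 ~ 114K, Residual6 ~ 225K).
residual3 = 41_000 + 2 * per_layer     # ~114.9K
residual6 = 41_000 + 5 * per_layer     # ~225.6K
print(per_layer, residual3, residual6)
```
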

Unfortunately, the existing networks are so heavy that the GPU memory overflows, making it impossible to process a DIV2K image at its original size. We therefore divided each image into 4×2 tiles and processed them independently. The MOS result is the value averaged over the 800 tiles obtained from the 100 validation images. This segmentation reduces the content and resolution of each piece, which lowers the MOS score. However, since the segmentation is applied equally to all six networks, the tendencies remain comparable and the experiment is fair. Table 3 shows the resultant MOS scores as well as the number of parameters in each network.
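The tiling-and-averaging procedure can be sketched as follows. The 2x4 grid (8 tiles per image, matching 800 tiles from 100 images) is an assumption about the exact split, and `mos_fn` is a stand-in for a VIQET evaluation call.

```python
def split_tiles(img, rows=2, cols=4):
    """Split an image (given as a list of pixel rows) into rows x cols equal tiles."""
    h, w = len(img) // rows, len(img[0]) // cols
    return [[row[c * w:(c + 1) * w] for row in img[r * h:(r + 1) * h]]
            for r in range(rows) for c in range(cols)]

def mean_tile_mos(images, mos_fn, rows=2, cols=4):
    """Average a per-tile score function over every tile of every image."""
    scores = [mos_fn(t) for img in images for t in split_tiles(img, rows, cols)]
    return sum(scores) / len(scores)
```
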

In Table 3, the scores of HR2 and HR3 are higher than those of HR1 in all cases. This suggests that recurrent learning may lead to additional performance improvement in existing algorithms. In addition, the improvement due to the learning method is shown to be more meaningful than the improvement obtained by using a different network. That is, the quality of an image depends more on the number of repetitions in RTS than on the size of the network, and as a result, the proposed RTS can produce good results even with light networks. Although the amount of increase in MOS differs for each algorithm, the learning tendency is similar. From this, we can say that applying RTS makes it possible to use a less expensive network.

5 Conclusion

In this paper, we proposed a new image enhancement method based on a recurrently-trained super-resolution network. The proposed recurrent learning consists of two phases: a Network Training phase to learn HR from LR, and a Target Update phase that applies the trained network to the original images to produce the HR images for the next stage. We clarified the characteristics of recurrent learning by analyzing the process and results. Using image differences, we showed that the results converge to a specific improvement point without divergence. Also, using the VIQET mean opinion score, it has been numerically shown that the quality of an image clearly improves as we repeat the learning procedure. This is the first time these objective measures have been used in the image-enhancement area, which is meaningful in that the MOS score is shown to be useful not only in theory [12] but also in practice.
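The two-phase loop above can be sketched as follows; `train_sr`, `upscale` and `downscale` are hypothetical stand-ins for the actual training and scaling code, so only the control flow of the proposed recurrent learning is illustrated.

```python
def recurrent_training(lr_images, originals, n_stages, train_sr, upscale, downscale):
    """Alternate the Network Training and Target Update phases for n_stages rounds."""
    targets = originals                  # stage 0: the originals are the targets
    network, hr_history = None, [originals]
    for stage in range(n_stages):
        # Network Training phase: learn to map the LR inputs to the current targets.
        network = train_sr(lr_images, targets)
        # Target Update phase: apply the trained network to the originals
        # to generate new (doubled-resolution) HR images ...
        hr = [upscale(network, img) for img in originals]
        hr_history.append(hr)
        # ... and downscale them back to the original resolution; these
        # become the targets for the next stage.
        targets = [downscale(img) for img in hr]
    return network, hr_history
```
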


The DIV2K image dataset used in this paper has an image quality equivalent to that of actual digital broadcasting video, unlike the conventional low-quality datasets used in existing super-resolution papers. Nonetheless, with RTS, the MOS score increased by more than the difference between the original and its low-resolution version, which is impossible with conventional one-time learning. The increased MOS score proves that image enhancement through recurrent learning is efficient.

In order to clarify the effectiveness of the proposed RTS, we experimented with various networks. The results show that recurrent learning can make an additional improvement regardless of the network used, and they also point to a way to obtain a cheap but effective network. We expect that this will contribute to the community by dramatically reducing the cost of super-resolution and image enhancement.


  • [1] E. Agustsson and R. Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
  • [2] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 60–65 vol. 2, June 2005.
  • [3] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2172–2180. Curran Associates, Inc., 2016.
  • [4] R. Dahl, M. Norouzi, and J. Shlens. Pixel recursive super resolution. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [5] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision – ECCV 2014, pages 184–199, Cham, 2014. Springer International Publishing.
  • [6] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In 2009 IEEE 12th International Conference on Computer Vision (ICCV), pages 349–356, Los Alamitos, CA, USA, Oct 2009. IEEE Computer Society.
  • [7] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [8] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [9] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. CoRR, abs/1609.04802, 2016.
  • [10] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [11] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
  • [12] M. Mrak, S. Grgic, and M. Grgic. Picture quality measures in image compression systems. In The IEEE Region 8 EUROCON 2003. Computer as a Tool., volume 1, pages 233–236 vol.1, Sep. 2003.
  • [13] A. Polesel, G. Ramponi, and V. J. Mathews. Image enhancement via adaptive unsharp masking. IEEE Transactions on Image Processing, 9(3):505–510, March 2000.
  • [14] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  • [15] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [16] Y. Tai, J. Yang, X. Liu, and C. Xu. Memnet: A persistent memory network for image restoration. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [17] T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [18] Video Quality Experts Group (VQEG). VQEG Image Quality Evaluation Tool (VIQET), 2016.
  • [19] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, Nov 2010.
  • [20] J. Yoo, S.-h. Lee, and N. Kwak. Image restoration by estimating frequency distribution of local patches. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [21] B. Zhang and J. P. Allebach. Adaptive bilateral filter for sharpness enhancement and noise removal. IEEE Transactions on Image Processing, 17(5):664–678, May 2008.