How Does the Low-Rank Matrix Decomposition Help Internal and External Learnings for Super-Resolution
Abstract
Making good use of both internal and external learning methods is a new challenge in the super-resolution problem. To address it, we analyze the attributes of the two methodologies and make two observations about their recovered details: 1) they are complementary in both the feature space and the image plane, and 2) they are sparsely distributed in the spatial domain. These observations inspire a low-rank solution that effectively integrates the two learning methods and achieves a superior result. To fit this solution, the internal and external learning methods are tailored to produce multiple preliminary results. Our theoretical analysis and experiments show that the proposed low-rank solution does not require massive inputs to guarantee its performance, which simplifies the design of the two learning methods. Extensive experiments show that the proposed solution improves on each single learning method in both qualitative and quantitative assessments. Surprisingly, it shows an even greater advantage on noisy images and outperforms state-of-the-art methods.
I Introduction
Single-image super-resolution (SISR) aims at reconstructing a high-resolution (HR) image from a single low-resolution (LR) image of the same scene. The ultimate goal is a visually pleasing image with higher resolution, more details, and a higher signal-to-noise ratio (SNR). It spans a wide range of applications, from visible images (satellite and aerial images, medical images, biometric images, etc.) to invisible images (ultrasound and range images, etc.). Unfortunately, SISR is ill-posed: it is an underdetermined problem and thus may not have a unique solution. To constrain the solution, prior knowledge about the desired image has to be learned from the given data. To this end, SISR algorithms have advanced from interpolation to learning the mapping functions between the given LR image and the desired HR image. Recently, two trends have emerged for this purpose: external learning and internal learning.
External learning [1, 2, 3, 4, 5, 6, 7, 8, 9] attempts to encode the expected information between aligned LR and HR image/patch pairs. Afterwards, the underlying scene can be obtained by decoding. Routinely, such information is learned from massive and varied external datasets, because these methods assume that the input LR image provides insufficient information. Among sophisticated machine learning algorithms, the coupled dictionary learning approaches have reported better performance. They are designed to extract the abundant “meta-detail” shared among external images, which can be used to reconstruct details. This strategy has two advantages: 1) Because it is learned from external examples, the meta-detail contains information that does not exist in the input LR image. It is therefore possible to restore lost details that are commonly shared among natural images. 2) If the external datasets are large, the meta-details have a rich diversity, which is essential to ensure the quality of SR images. Unfortunately, external learning cannot always guarantee a good recovery for an arbitrary input LR image. In particular, when certain singular and unique details rarely occur in the given datasets, external learning is apt to introduce either unexpected details or over-smoothing. This is called weak relevancy.
To avoid this issue, another line of work obtains SR images by exploring “self-similarities” among patches in the image [10, 11, 12, 13]. These methods, called internal learning, rely on the fact that patches of a natural image recur within and across scales of the same image, and open a new direction for image super-resolution. Consequently, the encoding of internal learning is built in the multi-scale and geometrically invariant space between LR and HR patch pairs. Compared with external learning, internal learning may provide fewer and less diverse feasible pairs, but they are more relevant to the input LR image. Thus, it inherently outperforms external learning on images or regions that contain densely repeated patterns. As such, the core of internal learning is locating the self-similarities despite scale changes and geometric distortions. However, inappropriate matches of irregular patterns in the image lead to artifacts.
Clearly, neither external learning nor internal learning is perfect for SISR. We therefore ask whether integrating the two could enhance SISR performance. By analyzing their results in the feature space (estimation error distribution) and the image plane (preference map), we find that their attributes are quite different. Specifically, the two methods produce rather complementary details in the feature space and the image plane (see Section III-A for details). However, a straightforward combination, e.g. a weighted average, may not yield an improvement because the balance of priors and errors from the two learning methods is unknown. Wang et al. [14] proposed an alternative method that fuses one internal HR result and one external HR result according to a preference obtained by comparing their SNRs on each small patch. However, two problems remain. 1) Less diversity: the method fuses only two inputs, which makes the result rely heavily on the chosen methods and datasets. 2) Biased criterion: SNR is just one of many criteria in the super-resolution problem and may not always lead to a visually pleasing outcome.
To address the above problems, we first increase the diversity. Specifically, an external learning method is tailored by learning the mapping function from varied datasets, which yields multiple preliminary HR images and improves reliability. For internal learning, we tailor the patch matching criteria, which weight the similarity matching and also take the geometric transformation into account. It also produces multiple preliminary HR images. With dozens of HR images from both methods, we change the viewpoint, treat the data fusion as a dimension reduction problem, and arrange these preliminary HR images into a matrix. Our analyses reveal that the internal and external learning errors in the matrix are sparse and mostly full rank. This finding inspires us to employ low-rank matrix decomposition for the integration, which can efficiently recover a low-rank matrix corrupted by errors and noise, while preserving the widespread pixels in the reconstructed image. Our method differs essentially from other combinations. It neither requires specific parameters nor relies on a single criterion, but fuses images according to their inherent properties to achieve better qualitative and quantitative results. Our contribution is threefold:

We propose a parameter-free solution that treats the integration of internal and external learning for SISR as a low-rank matrix decomposition problem. We also tailor the external and internal learning methodologies to improve the diversity and to fit our solution.

We show theoretically and experimentally that only a few internal and external learning outputs suffice to produce the desired SR image. This simplifies the design of the internal and external learning methods for the low-rank solution.

Owing to its inherent anti-noise attribute, our low-rank solution is robust on a wide variety of data, including noiseless images, synthetic noisy images and real noisy images.
The experiments on noiseless images show that the proposed low-rank solution outperforms the state-of-the-art methods on the benchmark databases. For noisy images, our solution again demonstrates a superior ability to suppress noise, even when the noise variance is as high as 20.
In the remainder of the paper, we first briefly review the most relevant works and the state-of-the-art methods in Section II. Section III explores the attributes of internal and external learning, which are the foundation for their integration using low-rank matrix decomposition. Section IV presents the proposed low-rank solution in detail, provides a theoretical analysis of its performance against the number of inputs, and introduces how to tailor an internal learning method and an external learning method for the solution. In Section V, extensive experiments show the effectiveness of the proposed solution on a variety of data and settings. We conclude in Section VI.
II Related Works
In the SR literature, recent studies have focused more on external learning methods, which learn priors from a huge volume of LR-HR image pairs to predict the HR image. For example, in their seminal work, Freeman et al. [15] employed a Markov Random Field to select the appropriate HR image patches from candidates. This approach is capable of producing plausible fine details. However, a lack of relevant images in the database results in a fairly poor outcome. Yang et al. [2, 16] introduced a dictionary learning strategy for SR using sparse representation. They assumed that the sparse representation of HR image patches is similar to that of the LR ones. These methods learn two coupled dictionaries, namely, the overcomplete HR dictionary and the LR dictionary. Combined with the HR dictionary, the sparse representation allows HR image patches to be reconstructed well. Zeyde’s method [17] is based on Yang’s framework but reduces the computational cost. It represents the LR image patches by performing dimension reduction on features using PCA, and the HR dictionary is obtained directly by the pseudoinverse. Timofte et al. [18, 19] combined anchored neighborhood regression with sparse representation for fast single-image SR. Recently, deep networks (the Deep Network Cascade [4] and the Convolutional Neural Network (CNN) [5]) have been used to directly learn the end-to-end mapping between LR/HR image patch pairs. Furthermore, Dong et al. [8] proposed a compact hourglass-shaped CNN structure to further accelerate SR, and Kim et al. [9] designed a very deep CNN for SR that resulted in a significant quantitative improvement.
In contrast, internal learning methods achieve SR by exploring the local self-similarities in the image instead of using external data. Freedman et al. [11] super-resolved an LR image using priors from similar patches found in a neighboring region at the same scale. The quantity of matched patches, however, is not always sufficient for a desired SR result. Thus, Glasner et al. [20] introduced an approach that searches across varied image scales to increase the number of matches. Recent studies reported that the above “translated” matching is unlikely to handle textural appearance variation. Therefore, Zhu et al. [21] deformed the patches using optical flow to allow texture change. Huang et al. [13] extended the internal patch search space by applying affine transformations to account for geometric variations in image patches.
In practice, LR images usually suffer from noise corruption. However, until now, only a few studies have focused on noisy image SR. Singh et al. [22] tried to integrate the merits of image denoising and image SR, and proposed a convex combination of the denoised result and the SR result of a noisy input. Liu et al. [23] took advantage of task transfer in deep learning. They first trained an SR network on noiseless data and subsequently fine-tuned the network using noisy data. As a result, their model was adapted to both noiseless and noisy tasks.
So far, works that aim to combine the advantages of internal and external learning are still rare. The method in [14] fuses one internal HR result and one external HR result according to the internal and external learning preference, obtained by comparing their SNRs on each small patch. It reported an improved performance on ultra-high-definition images. Recently, low-rank decomposition has been applied to a wide variety of computer vision tasks, including image restoration [24, 25], person re-identification [26], face recognition [27, 28], image classification [29, 30, 31], etc. To the best of our knowledge, this is the first work in the literature to use low-rank decomposition for integrating internal and external learning methods in the image SR problem.
III Attributes of Internal Learning and External Learning
As mentioned above, internal learning and external learning contribute differently to an SR result. It is worth analyzing their attributes and exploring an appropriate methodology to integrate their advantages. To this end, we follow a general model of the SR problem but focus on the complementarity and sparsity of the two learning methods.
In general, SR problem can be formulated as
(1) 
where is the super resolved image, is the ground truth HR image, is the given LR image, is the SR function, , is the balance weight, is the regularization in SR function, and is the estimation error map with the same size of . Although the recovered detail is implicit in Eq. (1), it is correlated with because and are known. Thus, studying the attribute of a learning method is equal to studying .
III-A Complementarity Analysis
First, we explore the distributions of produced by the two learning methods in the feature space. We selected 14 representative images in SR problem and their corresponding LR images . The internal and external learning methods introduced in Sections IV-A and IV-B are applied on each by varying the configuration settings (e.g. different training datasets, image rotations and matching criteria), which results in 10 super-resolved images, of which five come from the external learning method and the other five from the internal learning method. In total, there are 140 different SR images. We then compute the estimation error map and slide an window over it to obtain a set of sub-images. From each error map, 50 of the resulting sub-images are randomly chosen. As a result, 7000 sub-images are used for analysis. To visualize the attribute of in the feature space, they are projected into 3D and 2D spaces using locality preserving projections (LPP) [32] and plotted in Fig. 2. The distribution illustrates that internal learning and external learning do produce different estimation errors (in other words, recovered details). In particular, the 2D distribution shows that external learning recovers more diverse details, whose distribution is sparser, whereas internal learning concentrates on recovering specific details, whose distribution is denser. The two methods overlap little in the 3D and 2D feature spaces. This observation suggests that they are complementary in the SR problem. Moreover, the , which come from the same learning method but with varied configurations, are located differently in the feature space, which demonstrates that SR images result from the interaction between the learning methods and the training data.
Second, we explore the distribution on the image plane produced by the two learning methods. To illustrate the difference, we produce a preference map (PM) that visualizes which method benefits each pixel in the HR image. Meanwhile, we also inspect the overlap between the two error groups by counting the errors they have in common. Specifically, given an LR image, we calculate
(2) 
where is the internal learning result, is the external learning result. denotes the element-wise absolute value of a matrix. is a function that counts the number of nonzero elements in a matrix. and denote the numbers of elements whose values are greater than a threshold in the absolute error maps respectively, denotes the number of such elements at the same locations in the two error maps. Here, setting is to emphasize the high-value components in the error map, which largely correspond to the high-frequency signals in the SR result, because recovering the high-frequency signals is the major concern of an SR algorithm.
Each pixel , which locates at in PM, is assigned either 0 (black) or 255 (white) by comparing its corresponding and ,
(3) 
Fig. 3 shows two samples, Person and Woman. The preference maps visually show that internal learning performs better on the repeated patterns in the input LR image owing to self-similarity, e.g. the hair nets and bush in Fig. 3(b), while external learning better recovers those details commonly shared in the training datasets, see Fig. 3(c). For a quantitative validation, Fig. 4(a) plots stacked bars of three groups calculated according to Eq. (2), where the red bar denotes the errors produced by both methods, the green bar denotes the errors that result only from internal learning, and the blue bar denotes those that result only from external learning. This statistic confirms the above fact again, and . Both qualitative and quantitative assessments lead to the conclusion that internal and external learning are complementary for the SR problem on the image plane as well.
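The counting behind Eq. (2) and the binary preference map of Eq. (3) can be sketched as follows. This is only an illustration under stated assumptions: images are plain nested lists, the threshold name `tau` is hypothetical, and the choice of white (255) for pixels where internal learning has the smaller error, with ties going to external learning, is assumed rather than taken from the text.

```python
from typing import List, Tuple

Image = List[List[float]]


def abs_error(sr: Image, gt: Image) -> Image:
    """Element-wise absolute estimation error between an SR result and the ground truth."""
    return [[abs(s - g) for s, g in zip(sr_row, gt_row)]
            for sr_row, gt_row in zip(sr, gt)]


def error_counts(err_in: Image, err_ex: Image, tau: float) -> Tuple[int, int, int]:
    """Count errors above tau: internal, external, and those at the same location in both."""
    n_in = n_ex = n_both = 0
    for row_in, row_ex in zip(err_in, err_ex):
        for a, b in zip(row_in, row_ex):
            big_in, big_ex = a > tau, b > tau
            n_in += big_in
            n_ex += big_ex
            n_both += big_in and big_ex
    return n_in, n_ex, n_both


def preference_map(err_in: Image, err_ex: Image) -> Image:
    """Assumed convention: 255 where internal learning has the smaller error, else 0."""
    return [[255 if a < b else 0 for a, b in zip(row_in, row_ex)]
            for row_in, row_ex in zip(err_in, err_ex)]
```

Subtracting the shared count from each per-method count gives the "internal only" and "external only" groups that a stacked bar such as the one in Fig. 4(a) would display.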
So far, we have seen that internal learning and external learning are complementary in both the feature space and the image plane, and that they could further improve the quality of the SR image if wisely integrated.
III-B Sparsity Analysis
Besides the complementarity, we also analyze the intensity of , which inspires a sensible way of integration. We randomly selected 100 LR images from the BSD300 database [33], then applied the internal and external learning methods to each as in the complementarity analysis. In total, 1000 varied HR images are obtained. Again, we computed each absolute estimation error map and then counted the number of nonzero elements. In this study, we only consider the pixels whose error is greater than a threshold . Finally, an overall statistical histogram, Fig. 4(b), plots the percentage of nonzero elements in among these 1000 samples. Most of the samples () are composed of very few elements with significant values. This reveals that the major estimation errors of both learning methods are sparse.
IV The Low-Rank Solution for Integration
From the above analyses, we have two observations: 1) Given an input, internal learning and external learning produce similar HR images with different but complementary recovered details. These high-dimensional results (HR images) share a common low-dimensional structure because they look similar. 2) The estimation errors of both methods are sparse. These observations inspire us to develop a low-rank matrix decomposition solution that integrates the advantages of the two methods. However, conventional SR methods usually produce one result each. Only two inputs, one from internal learning and another from external learning, cannot take advantage of the low-rank solution. Therefore, we tailor the configuration settings of the internal and external learning methods to obtain a bank of preliminary HR images, which are diverse and carry the essential and complementary priors needed for reconstructing the final SR image. Afterward, a parameter-free low-rank solution is proposed to combine the advantages of both internal and external learning methods according to their inherent properties. Fig. 1 illustrates the pipeline of the proposed framework.
IV-A A Tailored Internal Learning
Internal learning builds on the finding that small patches in natural images often recur within and across scale spaces [11]. Given an input LR image , enlarge it to with zooming factor, meanwhile, use a high-pass filter to decompose into the low-frequency band and the high-frequency band . Then, extract patches from the pre-enlarged image , which act as the low-frequency component of the recovered SR image patches . We search for the most similar low-frequency patch from with patch . Thus, the corresponding high-frequency patch is regarded as the high-frequency component of the recovered SR image patch . The SR patch is reconstructed by . We keep enlarging with zooming factor and repeat the above process until the desired size of the SR image is reached.
To alleviate the “Less diversity” problem, we produce multiple different HR images. By varying the number of similar patches to be matched (), the generated preliminary HR images vary enough to ensure diversity. However, experiments show that the quantity of similar patches in an image is limited. If is set to a large value, some selected patches differ considerably and lead to an unfaithful result. To expand the patch search space, we geometrically deform the “translated” patches by estimating an affine transformation matrix that maps the target patch to its nearest neighbors in the downsampled image. By doing this, a feasible range of value can be found to ensure both accuracy and diversity. With ready, we follow the work [34] to compute the construction weight of each matched patch by minimizing
(4) 
where is the normalizing constant and the parameter is a filter degree parameter. Second, our patch matching is based on the norm (Euclidean distance), which is sensitive to rotation. We find that matching under varied angles yields different results. A similar observation is also reported in [35]. Thus, we place the LR image at four angles () and search for similar patches in each rotated image. Finally, we obtain preliminary HR results from the internal learning method.
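As a rough illustration of how several matched patches could be weighted by similarity, the sketch below uses an exponential kernel on the squared Euclidean distance with a normalizing constant and a filter degree parameter. The exact objective of Eq. (4) and of [34] is not reproduced in the text above, so this kernel form, the function names and the parameter name `h` are all assumptions.

```python
import math
from typing import List, Sequence


def patch_weights(target: Sequence[float],
                  candidates: Sequence[Sequence[float]],
                  h: float) -> List[float]:
    """Similarity weights for matched patches (assumed exponential kernel, not Eq. (4) itself)."""
    # Squared Euclidean distance between the target patch and each candidate.
    dists = [sum((t - c) ** 2 for t, c in zip(target, cand)) for cand in candidates]
    raw = [math.exp(-d / (h * h)) for d in dists]
    z = sum(raw)  # normalizing constant so that the weights sum to one
    return [r / z for r in raw]


def blend_patches(patches: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted combination of the high-frequency patches paired with the matches."""
    length = len(patches[0])
    return [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(length)]
```

A larger `h` flattens the weights across candidates, while a smaller `h` lets the closest match dominate.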
IV-B A Tailored External Learning
External learning infers the missing high-frequency details in LR images by training a sparse representation on external datasets. It assumes that image patches can be well represented as a sparse linear combination of atoms from an appropriate overcomplete dictionary. By jointly training LR-HR dictionaries, we obtain the relationship between the sparse representations of the low- and high-resolution image patches. To obtain SR results for test images, the sparse coefficients of the LR image patches are first computed, and then the HR patch coefficients are estimated through the learned relationship between LR-HR patches. Finally, the HR patch pixels are obtained by multiplying the coefficients by the HR dictionary.
Similar to internal learning, we also expect multiple variant HR outputs to ensure the diversity of external learning. To this end, we build multiple LR-HR dictionaries from different external training datasets. However, the traditional coupled dictionary learning method is a bilevel optimization problem, which is good at learning the common characteristics of training databases but ignores the specific ones. Domke et al. [36] reported that bilevel optimization suffers from a high computational cost in training and has difficulty capturing the specific characteristics because of the massive number of iterations needed to minimize the internal function. Their solution truncates the internal optimization to a fixed number of iterations. By back-optimizing the internal parameters, a better result can be achieved that retains more specific characteristics of each training dataset. Inspired by [36], we truncate the sparse encoding procedure ISTA [37] to iterations in LR dictionary training. We also find that the truncated algorithm meets our criterion even when . The encoding function is then formulated as:
(5) 
where is the sparse representation, is the LR image patches, is a soft thresholding function, is a vector of thresholds, , are equivalent to , , respectively, is a prespecified LR dictionary in the ISTA algorithm, here is the learned LR dictionary. For more details on the parameter , please refer to [38].
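To make the idea of a truncated encoder concrete, the following sketch runs the textbook ISTA update for a fixed number of iterations with element-wise soft thresholding. It is not the learned encoder of Eq. (5): the dictionary is fixed rather than learned, and the names `step` and `num_iters` are hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(v: Sequence[float], theta: Sequence[float]) -> List[float]:
    """Shrink each entry toward zero by its threshold; entries below the threshold become 0."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x, t in zip(v, theta)]


def truncated_ista(D: Sequence[Sequence[float]], y: Sequence[float],
                   theta: Sequence[float], step: float, num_iters: int) -> List[float]:
    """Run a fixed number of ISTA iterations for y ~ D z (D given as m rows of n atoms' entries)."""
    m, n = len(D), len(D[0])
    z = [0.0] * n
    for _ in range(num_iters):
        # Residual of the current reconstruction.
        residual = [y[i] - sum(D[i][j] * z[j] for j in range(n)) for i in range(m)]
        # Gradient step on the data term, followed by shrinkage.
        moved = [z[j] + step * sum(D[i][j] * residual[i] for i in range(m))
                 for j in range(n)]
        z = soft_threshold(moved, theta)
    return z
```

With an identity dictionary and unit step, a single iteration already returns the soft-thresholded input, which is a convenient sanity check for the shrinkage step.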
Given varied training datasets, each dictionary pair is computed by
(6)  
subject to:  
where denotes HR image patches, denotes the Frobenius norm.
Because an image patch in dictionary learning is represented by a vector, the representation is sensitive to patch rotation, and the SR results change with the angle. As in the tailored internal learning, we place the LR input at four angles () and apply our external learning method to each. We then obtain preliminary HR results.
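The rotation strategy used by both tailored methods can be sketched as below: rotate the input, super-resolve, and rotate the result back so that all preliminary outputs are aligned. The four angles are assumed here to be multiples of 90 degrees, and `nearest_upscale` is only a stand-in for a real internal or external SR method.

```python
from typing import Callable, List

Image = List[List[float]]


def rot90(img: Image) -> Image:
    """Rotate a 2-D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotate(img: Image, quarter_turns: int) -> Image:
    """Apply the given number of counter-clockwise quarter turns (negative values undo them)."""
    for _ in range(quarter_turns % 4):
        img = rot90(img)
    return img


def nearest_upscale(img: Image, factor: int = 2) -> Image:
    """Placeholder SR method: nearest-neighbour enlargement."""
    return [[v for v in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def rotation_ensemble(lr: Image, upscale: Callable[[Image], Image]) -> List[Image]:
    """Super-resolve the input at four assumed angles and rotate each result back."""
    results = []
    for k in range(4):
        sr = upscale(rotate(lr, k))
        results.append(rotate(sr, -k))
    return results
```

With the placeholder upscaler all four outputs coincide, since nearest-neighbour enlargement commutes with quarter turns; a rotation-sensitive method would instead return four different preliminary HR images.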
IV-C The Low-Rank Solution
IV-C1 The Low-Rank Matrix Decomposition Model
From the tailored internal and external learning methods above, multiple preliminary HR images are reconstructed from the self-similarities and the learned meta-detail. They carry essential but different recovered details that can be considered as new priors. It is natural to use them together to further improve the result. However, besides the prior, each image also has its own estimation errors and noise. Thus, a straightforward integration (e.g. taking a weighted average) does not work. Owing to the significant correlation among these HR images, we vectorize them and stack them into a matrix . The correlated components in can be reduced to a low-dimensional subspace, while the uncorrelated components (estimation errors and noise) are left in the original space. In other words, can be approximated by a low-rank matrix. We therefore treat the integration as a dimension reduction problem and propose to decompose into a low-rank component and a sparse component. The reason is that low-rank decomposition was designed for extracting correlation and has proven robust and effective at separating the low-rank component from the sparse component in the data. In this problem, the low-rank component is the desired SR image, and the sparse component contains the estimation errors and noise.
To obtain the low-rank component, we let , where denotes the low-rank matrix, is the perturbation matrix. The smaller the is, the better the performance we achieve. Thus, the problem becomes how to find the best rank estimation of the matrix , and is formulated as:
(7) 
Instead of perturbation matrix , low-rank decomposition uses sparse representation . So, is further written as . and are computed by
(8) 
where is the nuclear norm of the matrix (the sum of singular values of ), is the norm of . Because of the heavy computation of the SVD of a large matrix and the problematic underdetermined objective (8), we employ an alternating projection algorithm similar to the GoDec method [39]. Specifically, we alternately compute one while fixing the other as follows:
(9) 
where denotes the number of nonzero elements in the matrix.
In this problem, the rank is expected to be 1. However, existing low-rank algorithms cannot guarantee a global solution with . Alternatively, the average of the low-rank components (images) can serve as the ultimate SR result. Because LR images often contain noise, which is random in real-world scenarios, our algorithm also calculates the noise matrix when computing . This provides another capability, anti-noise, which means our solution is able to super-resolve noisy images. Experiments in Section V-D demonstrate this additional advantage.
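The alternating scheme can be illustrated with a small pure-Python sketch: a rank-1 projection (computed here by power iteration rather than a full SVD) alternates with keeping only the largest residual entries as the sparse part, and the columns of the low-rank part are averaged into one fused image. The function names, the cardinality parameter `k`, the iteration counts and the use of power iteration are assumptions for illustration, not the implementation used in the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rank1_approx(A: Matrix, iters: int = 50) -> Matrix:
    """Best rank-1 approximation of A, estimated by power iteration."""
    m, n = len(A), len(A[0])
    u, v = [0.0] * m, [1.0] * n
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm for x in u]
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]


def keep_largest(R: Matrix, k: int) -> Matrix:
    """Hard thresholding: keep the k entries of R with the largest magnitude."""
    order = sorted(((abs(R[i][j]), i, j)
                    for i in range(len(R)) for j in range(len(R[0]))), reverse=True)
    S = [[0.0] * len(R[0]) for _ in R]
    for _, i, j in order[:k]:
        S[i][j] = R[i][j]
    return S


def godec_like(X: Matrix, k: int, outer_iters: int = 20) -> Tuple[Matrix, Matrix]:
    """Alternate a rank-1 projection of X - S with a sparse projection of X - L."""
    m, n = len(X), len(X[0])
    S = [[0.0] * n for _ in range(m)]
    L = X
    for _ in range(outer_iters):
        L = rank1_approx([[X[i][j] - S[i][j] for j in range(n)] for i in range(m)])
        S = keep_largest([[X[i][j] - L[i][j] for j in range(n)] for i in range(m)], k)
    return L, S


def fuse(L: Matrix) -> List[float]:
    """Average the columns (one per preliminary image) of the low-rank part."""
    return [sum(row) / len(row) for row in L]
```

Each column of `X` holds one vectorized preliminary HR image. In a toy case where all columns are identical except for one corrupted pixel, the corrupted value ends up in the sparse part and the fused image is close to the clean column.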
IV-C2 Input Quantity vs. Performance
In practice, we face the question of how many inputs our low-rank solution needs to produce the desired SR result. Fortunately, some progress has been made in the theory of low-rank modeling. Candès et al. [40, 41, 42] and Chandrasekaran et al. [43] have theoretically proven that the solution of (8) can recover the low-rank matrix and the sparse matrix with high probability if the following conditions are met: 1. The underlying satisfies the incoherence condition (the incoherence condition mathematically characterizes the difficulty of recovering the underlying from a few sampled entries; informally, it says that the singular vectors of should sufficiently “spread out” and be uncorrelated with the standard basis); 2. The nonzero entries of the underlying are sufficiently sparse with a random spatial distribution; 3. The observed entries are uniformly distributed in matrix , and the lower bound on number of them is on the order of , where is the image size (product of width and height of an image). First, our problem satisfies condition 1. Second, Section III-B shows that the estimation error is sparse, while the noise is random. Both meet condition 2. Finally, condition 3 indicates that a detail from either internal learning or external learning can be recovered if it appears in more than times. In other words, we need at least images to recover and . Commonly, approximates to 5 or 6 in the super-resolution problem. This means our low-rank solution is solvable with a small number of external and internal learning results as the input.
Unfortunately, the aforementioned studies do not provide an upper bound or prove whether more images will always lead to better results. To answer this question, we explore the influence of and on the convergence of our low-rank solution. We employ the theory established by Lewis & Malick [44] and give an analysis below by considering first and with the core mathematical background.
The algorithm solving (9) can be imagined as projecting onto a manifold and then onto another one alternately, where and are two manifolds around a point .
(10) 
where is a matrix, is a projection of a matrix to a set . We can see any point is a local result of Eq. (9) because
(11) 
Thus, updating in Eq. (9) is achieved by
(12) 
The theory [44] states that a smaller angle between these two manifolds produces a faster convergence and better result of the algorithm. Therefore, our task becomes to discuss how and influence the angle between two manifolds and in which is defined as
(13) 
where is the inner product, is the unit sphere in . The angle at point is defined as the angle between the corresponding tangent spaces and .
The normal spaces of manifolds and at point are given by
(14) 
where is the eigenvalue decomposition. Assume , we have
(15) 
where is the noise corresponding to , is the elementwise hard thresholding error of . Thus, the normal space of manifold of is
(16) 
Because the tangent space is the complement of the normal space, we have
(17) 
Therefore, the angle at point can be simplified as
(18) 
Then, we can see
(20) 
where the diagonal elements of and are eigenvalues of and , respectively. Moreover, implies that , so the above inequality can be deduced as
(21) 
We can thus conclude that the angle at a local solution is influenced by the . In other words, the convergence of will slow down and the result of the algorithm will degrade as is augmented.
For , the analogous analysis can be done as above, in which is the singular value hard thresholding error of .
The above analysis theoretically indicates that better performance of our low-rank solution can be achieved by reducing . As described in Sections IV-A and IV-B, we increase the diversity of internal and external learning to produce multiple preliminary HR results as the input of the proposed low-rank solution. The aim is to gather as many appropriate details from self-similarities and external samples as possible and to integrate them. On the other hand, too many preliminary HR results also increase of both and . The relation between the number of low-rank inputs and the performance is therefore not monotonically increasing. When the quantity is small, the details learned from new entries contribute more to the SR image than the associated errors do, so the performance improves as more preliminary HR images are added. Massive input entries, however, no longer bring significant new details into the low-rank solution but introduce more errors, which degrade the SR image quality and slow down the convergence. In this case, the quality of the resulting SR image gets worse as preliminary HR images continue to be added. Ideally, we expect to find the turning point at which the performance peaks against the input quantity. Our experiments show that only around 26 to 42 preliminary HR images achieve the best performance (see Fig. 6). This conclusion allows a much simpler tailoring of the internal and external learning to fit the low-rank solution.
V Experiments and Discussion
To demonstrate the effectiveness of the proposed method, qualitative and quantitative assessments are conducted on five aspects: 1) low-rank input quantity vs. performance, 2) comparison with recent state-of-the-art methods on noiseless images, 3) comparison on synthetic noisy images with varied noise levels, 4) comparison on real noisy images, and 5) comparison with denoising + super-resolution on noisy images.
The comparison baselines consist of six stateoftheart methods: the adjusted anchored neighbor neighbor regression (A+) [19], the transformed selfexample (TSE) [13] and the deep learning based SR methods: the Superresolution Convolutional Neural Network (SRCNN) [5], the Sparse Coding based Network (SCN) [23], Fast SuperResolution Convolutional Neural Network (FSRCNN) [8] and Very Deep Convolutional Networks for Image SuperResolution (VDSR) [9], where TSE is an internal learning method, others are external learning methods, particularly, SCN can super resolve noisy images. In addition, we also assess the lowrank solution on internal learning / external learning individually, named Internal / External. The quantitative assessments are carried out on two evaluation metrics, namely peaksignaltonoise ratio (PSNR) and structural similarity (SSIM). PSNR is the ratio between the reference signal and the distortion signal in an image. SSIM measures the structure changes between the reference and distorted image, which stresses the topology information.
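As a concrete reading of the PSNR metric described above, a minimal sketch in Python follows. It assumes 8-bit grayscale images stored as nested lists and a peak value of 255; the function name `psnr` and these conventions are illustrative assumptions, not the evaluation code used in the experiments.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Images are nested lists of pixel values (an assumption of this sketch).
    """
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    pixel_count = 0
    for ref_row, dis_row in zip(reference, distorted):
        if len(ref_row) != len(dis_row):
            raise ValueError("rows must have the same length")
        for ref_pixel, dis_pixel in zip(ref_row, dis_row):
            squared_error += (ref_pixel - dis_pixel) ** 2
            pixel_count += 1
    mse = squared_error / pixel_count
    if mse == 0.0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(peak ** 2 / mse)
```

A larger distortion gives a lower score, and two identical images give an infinite PSNR, which is why the metric is only reported for reconstructed images that differ from the ground truth.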
VA Experimental Configuration
To train the external learning method, 4.1 million image patch-pairs are randomly selected from the database of [2]. The LR images are upsampled to the same size as the corresponding HR ones, and then both are partitioned into fixed-size patches. Next, the LR-HR image patch-pairs are sorted in descending order of patch variance and sequentially divided into groups, with an overlap of 100,000 samples between neighboring groups. Each group contributes one dictionary to the external learning. To effectively extract the high-frequency signals in the LR patches, the first- and second-order derivatives of the image patches along the two image axes are used as the feature (preprocessing), with a dimension of 324. Since processing such high-dimensional features is time consuming, we then reduce the dimensionality to 30 using PCA, which preserves 99.9% of the average energy.
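The grouping step above can be sketched as follows. This is only an illustration under stated assumptions: patches are flat lists of pixel values, the variance is computed on a single patch per pair, and the function name `overlapping_variance_groups` together with the `num_groups` and `overlap` parameters are hypothetical, since the number of groups is not recoverable here.

```python
import math
from statistics import pvariance


def overlapping_variance_groups(patches, num_groups, overlap):
    """Sort patches by descending variance and cut them into overlapping groups.

    Neighboring groups share `overlap` patches. The last group may be shorter
    when the sizes do not divide evenly.
    """
    if num_groups < 1 or overlap < 0:
        raise ValueError("num_groups must be >= 1 and overlap must be >= 0")
    ordered = sorted(patches, key=pvariance, reverse=True)
    total = len(ordered)
    # Choose the group size so that num_groups groups cover every patch:
    # total <= num_groups * size - (num_groups - 1) * overlap
    size = math.ceil((total + (num_groups - 1) * overlap) / num_groups)
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap is too large for this many groups")
    return [ordered[k * step:k * step + size] for k in range(num_groups)]


# Toy example: ten two-pixel patches whose variance grows with k.
toy_patches = [[0, k] for k in range(10)]
toy_groups = overlapping_variance_groups(toy_patches, num_groups=3, overlap=2)
```

Each resulting group would then train one dictionary, so high-variance (textured) and low-variance (smooth) patches end up in different dictionaries while the overlap softens the group boundaries.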
For the internal learning, we select varying numbers of similar patches as self-similarity from four rotated images to reconstruct multiple HR images. Since the human visual system is more sensitive to the luminance channel than to the chromatic channels, the test images are transformed from the RGB to the YCbCr color space, and the tailored internal and external learning methods are applied only to the Y channel. The evaluations are carried out on the noiseless images from three standard test datasets, Set5 [45], Set14 [17] and BSD100 (a set of 100 images from BSD300) [33], on synthetic noisy images obtained by adding varied Gaussian noise to the noiseless images, and on real noisy images from the dataset of [46].
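The color-space step can be illustrated with a short sketch. It assumes the full-range ITU-R BT.601 (JFIF) conversion with 8-bit inputs; the exact YCbCr variant used in the experiments is not stated, so the coefficients and the function names `rgb_to_ycbcr` and `luminance_channel` are assumptions made for illustration.

```python
def rgb_to_ycbcr(red, green, blue):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr (assumed variant)."""
    y = 0.299 * red + 0.587 * green + 0.114 * blue
    cb = 128.0 - 0.168736 * red - 0.331264 * green + 0.5 * blue
    cr = 128.0 + 0.5 * red - 0.418688 * green - 0.081312 * blue
    return y, cb, cr


def luminance_channel(rgb_image):
    """Extract the Y channel from an image given as rows of (R, G, B) tuples."""
    return [[rgb_to_ycbcr(*pixel)[0] for pixel in row] for row in rgb_image]
```

Only the Y channel returned by `luminance_channel` would be passed to the learning methods; the chromatic channels are typically upscaled by plain interpolation in such pipelines, which is again an assumption rather than a detail given here.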
Table I: PSNR in dB, with SSIM in parentheses, on noiseless images. "External (original)" and "Internal (original)" are the tailored learning methods alone; "External (low-rank)" and "Internal (low-rank)" apply the low-rank solution to each of them individually.
Method | Set5 ×2 | Set5 ×3 | Set5 ×4 | Set14 ×2 | Set14 ×3 | Set14 ×4 | BSD100 ×2 | BSD100 ×3 | BSD100 ×4
A+ [19] | 36.55 (0.9544) | 32.59 (0.9088) | 30.29 (0.8603) | 32.28 (0.9056) | 29.13 (0.8188) | 27.33 (0.7491) | 31.21 (0.8863) | 28.29 (0.7835) | 26.82 (0.7087)
TSE [13] | 36.49 (0.9537) | 32.58 (0.9093) | 30.31 (0.8619) | 32.22 (0.9034) | 29.16 (0.8196) | 27.40 (0.7518) | 31.18 (0.8855) | 28.29 (0.7840) | 26.84 (0.7106)
SRCNN [5] | 36.66 (0.9542) | 32.75 (0.9090) | 30.49 (0.8628) | 32.45 (0.9067) | 29.30 (0.8215) | 27.50 (0.7513) | 31.36 (0.8879) | 28.41 (0.7863) | 26.90 (0.7101)
SCN [23] | 37.21 (0.9571) | 33.34 (0.9173) | 31.14 (0.8789) | 32.80 (0.9101) | 29.57 (0.8263) | 27.81 (0.7619) | 31.60 (0.8915) | 28.60 (0.7905) | 27.14 (0.7191)
FSRCNN [8] | 37.00 (0.9558) | 33.16 (0.9140) | 30.71 (0.8657) | 32.63 (0.9088) | 29.43 (0.8242) | 27.59 (0.7535) | 31.56 (0.8894) | 28.52 (0.7853) | 27.01 (0.7131)
VDSR [9] | 37.53 (0.9587) | 33.66 (0.9213) | 31.35 (0.8838) | 33.03 (0.9124) | 29.77 (0.8314) | 28.01 (0.7674) | 31.90 (0.8960) | 28.82 (0.7976) | 27.29 (0.7251)
External (original) | 36.46 (0.9540) | 32.63 (0.9052) | 30.31 (0.8614) | 32.25 (0.9052) | 29.16 (0.8202) | 27.26 (0.7487) | 31.17 (0.8864) | 28.29 (0.7836) | 26.90 (0.7067)
Internal (original) | 36.52 (0.9594) | 32.55 (0.9082) | 30.14 (0.8603) | 32.19 (0.9057) | 29.14 (0.8202) | 27.41 (0.7506) | 31.06 (0.8815) | 28.26 (0.7842) | 26.84 (0.7097)
External (low-rank) | 36.58 (0.9561) | 32.71 (0.9078) | 30.45 (0.8626) | 32.41 (0.9061) | 29.29 (0.8212) | 27.41 (0.7512) | 31.33 (0.8878) | 28.38 (0.7858) | 27.03 (0.7083)
Internal (low-rank) | 36.58 (0.9622) | 32.51 (0.9077) | 30.36 (0.8625) | 32.36 (0.9064) | 29.23 (0.8214) | 27.53 (0.7524) | 31.24 (0.8832) | 28.47 (0.7877) | 27.04 (0.7111)
Our (proposed) | 36.71 (0.9633) | 32.81 (0.9098) | 30.52 (0.8645) | 32.53 (0.9098) | 29.40 (0.8236) | 27.42 (0.7503) | 31.45 (0.8881) | 28.58 (0.7889) | 27.11 (0.7121)
VB The Experiment on Low-Rank Input Quantity vs. Its Performance
In Section IVC2, our theoretical analysis shows that the relation between the low-rank input quantity and its performance is not monotonically increasing. When the input (the matrix stacked from the internal and external SR results) contains very few images, adding more preliminary HR images improves the overall performance until it reaches its best, namely the turning point. Afterward, more inputs may make the performance decline. Unfortunately, it is nontrivial to determine theoretically how many input images let the algorithm reach the turning point, because the errors and noise of each preliminary HR image vary with the learning methods and the data. To obtain an empirical result, we produce multiple preliminary HR images by varying the number of similar patches for the internal learning, the number of dictionary pairs for the external learning, and the patch rotation angles. The average performance over 27 test images is then plotted against the number of input images in Fig. 6. Note that internal learning and external learning provide the same number of samples as the low-rank input. This result verifies our analysis in Section IVC2. One can see that the proposed solution improves the overall performance quickly, reaching the maximum as the number of inputs approaches 36. It indicates that around 36 preliminary HR images (18 external learning results and 18 internal ones) can yield the best output; this number may vary slightly when different internal and/or external learning methods are applied. This number ensures that the genuine details appear more than 5 or 6 times in the inputs. However, when more inputs continue to be added, the performance gradually decreases. Such a small required number of inputs makes it simple to tailor the internal and external learning methods without creating many preliminary HR images. We hereafter use 36 HR images as the input in the following experiments.
VC Comparison on Noiseless Images
To demonstrate the improved performance, we compare the proposed method with A+ [19], TSE [13], SRCNN [5], SCN [23], FSRCNN [8], VDSR [9], External, and Internal on noiseless images from the aforementioned three datasets. The quantitative comparison is listed in Table I, from which we can see that our proposed method boosts PSNR and SSIM and outperforms A+, SRCNN and TSE, but performs worse than the deep-learning-based methods SCN, FSRCNN and VDSR. Interestingly, applying low-rank on the external learning alone improves PSNR by 0.13 dB (SSIM by 0.0017) over the original method. Compared to the others, it performs a little worse than SRCNN but better than A+, by 0.10 dB on PSNR (0.0011 on SSIM). Similarly, applying low-rank on the internal learning improves PSNR by 0.14 dB (SSIM by 0.0016) over the original one, and performs better than the best internal learning method, TSE. Furthermore, we apply low-rank on the integration of external and internal learning and obtain better results in both PSNR and SSIM. However, we have to admit that our proposed method is indeed worse than the deep-learning-based methods, particularly SCN and VDSR. As concluded in Sections III and IV, external and internal learning methods produce different but complementary details. Our low-rank solution is able to wisely integrate the pros of both methods and further boosts the SR performance. However, the SR result relies on the selected external and internal learning methods, because low-rank itself does not produce new details. The major reason for the success of deep learning methods is that they cover a more complete solution space by using huge training data and computation cost. In contrast, the learning methods in our solution search a smaller space, which results in a worse quantitative assessment in Table I. Nevertheless, the more sophisticated the learning methods applied in our low-rank solution, the better the results one will have.
To give a better understanding, Fig. 7 shows a qualitative comparison; for more results, please refer to Fig. 1 to Fig. 2 in the supplementary material. One can see that A+ produces some blur and jaggy artifacts; in particular, noticeable artifacts can be found along the edges. SRCNN produces an HR image with relatively rich details compared with A+, but still creates evident artifacts along the edges. TSE applies an advanced patch matching strategy and thus achieves more visually pleasing results, where edges become sharper and artifacts mostly vanish. FSRCNN and VDSR generate more high-frequency details without noticeable jaggy artifacts. By applying low-rank on the external and internal learning methods individually, we successfully grasp the diversity of each preliminary HR image. Thus, External and Internal achieve competitive visual results. One can see that External recovers details better when the patches frequently appear in the training datasets, while Internal performs better on repeated patterns. Our proposed method, integrating the pros of the external and internal learning methods, generates a more visually pleasing image. Although our method is worse than SCN, FSRCNN and VDSR in the quantitative comparison, the qualitative results are rather comparable.
VD Comparisons on the Noisy Images
In reality, LR images are often corrupted by noise. The analysis in Section IVC2 shows that our low-rank solution has already taken the noise into account. The experiments below probe the robustness of the low-rank solution to a variety of noisy data in SR.
VD1 The synthetic noisy images
We take the 14 images from Set14 and add Gaussian noise with the variance ranging from 4 to 20 in steps of 4. The comparison is done among A+ [19], SRCNN [5], SCN [23], FSRCNN [8], VDSR [9], TSE [13], External, Internal, and our proposed method, where SCN claims robustness to noise. Fig. 9 plots the PSNR and SSIM scores against the noise levels and shows that the quality of all images decreases as the noise level increases. A significant difference, however, is that the quality of the proposed method decreases much more slowly than that of the six baselines. This result shows that the low-rank solution can effectively remove noise.
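The construction of the synthetic noisy inputs can be sketched as below. The sketch makes several assumptions that are not stated in the text: the noise level is passed as a standard deviation `sigma`, pixel values are clipped to the 8-bit range [0, 255], and the names `add_gaussian_noise` and `noise_levels` are hypothetical.

```python
import random

# Noise levels from 4 to 20 in steps of 4, as used in this experiment.
noise_levels = list(range(4, 21, 4))


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a grayscale image (nested lists).

    `sigma` is treated as the standard deviation (an assumption of this
    sketch) and the result is clipped to [0, 255].
    """
    return [
        [min(255.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


# Toy usage with a fixed seed so the corruption is reproducible.
rng = random.Random(0)
toy_image = [[0, 64, 128], [192, 255, 32]]
noisy_versions = {level: add_gaussian_noise(toy_image, level, rng) for level in noise_levels}
```

Each entry of `noisy_versions` would then be downsampled or super-resolved by every competing method, so that all methods see exactly the same corrupted input at a given noise level.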
The qualitative results are shown in Fig. 8 and Fig. 10, where the LR face and monarch images are corrupted by white Gaussian noise with a standard deviation of 20 and 12 respectively; for more results, please refer to Fig. 3 to Fig. 5 in the supplementary material. One can see that SRCNN, FSRCNN and VDSR are sensitive to severe noise. The A+ method successfully suppresses the noise, but more artifacts appear in smooth regions. The SCN method performs slightly better than the above methods, but the remaining noise is still obvious. The TSE method produces a relatively finer result thanks to its patch-matching strategy. Our proposed method suppresses the noise significantly and produces few noise-induced artifacts. This can be explained by the fact that none of the competing methods discriminates the noise; they process it as real detail during the SR procedure. On the contrary, Equation 8 shows that the low-rank decomposition is able to separate the sparse component (noise and error) from the inputs.
Furthermore, we compare with SRNI [22], which is devoted to super-resolving noisy images. For this comparison, we sampled 50 images from BSD300, excluding those in BSD100, which is one of the benchmark databases. For a fair comparison, we follow the exact same experimental setting as in [22] by downsampling those images, adding Gaussian noise with the variance from 10 up to 30, and then super-resolving the noisy LR images. Fig. 11 lists the PSNR and SSIM scores against the noise levels and shows that our proposed method consistently outperforms SRNI across all tested noise levels.
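To illustrate how a low-rank decomposition can separate a sparse component from stacked inputs, a toy sketch follows. It is a GoDec-style alternating scheme in the spirit of [39], not necessarily the solver used in this work: the target rank is fixed to 1, the sparse part is obtained by keeping the `sparse_count` largest residuals, and all function and parameter names are assumptions made for illustration.

```python
import math


def rank_one_approximation(matrix, power_iterations=100):
    """Best rank-1 approximation of `matrix`, computed by power iteration.

    The iteration starts from a constant vector, which is adequate for this
    toy example but could fail for inputs orthogonal to it.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    u = [0.0] * rows
    sigma = 0.0
    for _ in range(power_iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            break
        u = [x / norm_u for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        if sigma == 0.0:
            break
        v = [x / sigma for x in v]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]


def low_rank_plus_sparse(matrix, sparse_count, outer_iterations=30):
    """Alternately fit a rank-1 part and a sparse part with `sparse_count` entries."""
    rows, cols = len(matrix), len(matrix[0])
    sparse = [[0.0] * cols for _ in range(rows)]
    low_rank = [row[:] for row in matrix]
    for _ in range(outer_iterations):
        # Low-rank step: fit the data with the current sparse part removed.
        cleaned = [[matrix[i][j] - sparse[i][j] for j in range(cols)] for i in range(rows)]
        low_rank = rank_one_approximation(cleaned)
        # Sparse step: keep only the largest residuals (hard thresholding).
        residual = [[matrix[i][j] - low_rank[i][j] for j in range(cols)] for i in range(rows)]
        largest = sorted(
            ((abs(residual[i][j]), i, j) for i in range(rows) for j in range(cols)),
            reverse=True,
        )[:sparse_count]
        sparse = [[0.0] * cols for _ in range(rows)]
        for _, i, j in largest:
            sparse[i][j] = residual[i][j]
    return low_rank, sparse


# Toy input: five identical "preliminary results" (columns) of a six-pixel
# image, with one corrupted pixel in the fourth column.
clean_pixels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stacked = [[value] * 5 for value in clean_pixels]
stacked[2][3] += 10.0
low_rank, sparse = low_rank_plus_sparse(stacked, sparse_count=1)
```

In this toy case the corrupted pixel ends up in `sparse` while `low_rank` recovers the shared content of the columns, which mirrors the argument that noise treated as a sparse outlier is not propagated into the fused result.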
VD2 The real noisy images
Since the noise in real LR images may not always be Gaussian, we choose real noisy images from [46] for a further comparison. As there is no ground truth, we are not able to provide a quantitative comparison. The qualitative comparison in Fig. 12 is similar to the results on the synthetic noisy images. For more results, please refer to Fig. 6 to Fig. 8 in the supplementary material. One can see that the competing methods are still sensitive to severe noise in real noisy images. SRCNN produces some blur along the edges, while VDSR generates unrealistically sharp details. In contrast, our proposed method consistently suppresses the noise and recovers the edges better than the others.
VD3 The denoised noisy images
When encountering noisy LR images, a straightforward approach is to denoise them and then apply SR. To test this configuration, we compare our proposed method, External and Internal with denoising + SR. The SR methods and test data are the same as those in Section VD1. The denoising is done by the well-accepted BM3D algorithm [47]. Fig. 14 plots the PSNR and SSIM scores against the noise levels and illustrates that the denoising process does improve the competing methods on noisy LR images. This configuration even achieves performance comparable to External and Internal. However, when low-rank is applied on both the external and internal learning methods, our proposed method is again superior, owing to an effective integration of the complementary details recovered by the two methods. Fig. 13 shows the qualitative results on a test image with noise level 8; for more results, please refer to Fig. 9 to Fig. 11 in the supplementary material. Thanks to BM3D denoising, the majority of the noise has vanished from the results of all competing SR methods. However, the local magnifications show that some high-frequency signal along the edges is inevitably lost because of the denoising procedure, and that some residual noise and artifacts are magnified. In contrast, our proposed method has no denoising process. Thus, it not only suppresses the noise but also preserves the high-frequency details.
VI Conclusion
Instead of solving the SR problem by internal learning or external learning separately, we proposed a low-rank solution to integrate their pros. We showed that the attributes of internal and external learning are complementary in the feature space and the image plane, while their estimation errors are sparse. This is the basis upon which the proposed solution is founded. With theoretical analysis and a real-data experiment, we also showed that the low-rank solution does not require massive input to achieve a desired SR image. This result makes tailoring the internal and external learning methods for the integration easier. Compared to other state-of-the-art methods, our proposed solution is parameter free and does not need preprocessing for internal or external prior selection during the integration. The experiment on noiseless data has demonstrated a comparable performance of our proposed solution. In particular, on a variety of noisy data, our solution is superior in restraining noise and recovering sharp details. Furthermore, additional performance could be gained by generalizing our solution with more recent methods. So far, it remains uncertain whether the numbers of preliminary HR images from the internal and external learning methods must be the same. This is an issue we hope to explore in future studies in order to achieve better performance.
References
 [1] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image superresolution as sparse representation of raw image patches,” in CVPR. IEEE, 2008, pp. 1–8.
 [2] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image superresolution via sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.
 [3] S. Wang, D. Zhang, Y. Liang, and Q. Pan, “Semicoupled dictionary learning with applications to image superresolution and photosketch synthesis,” in CVPR. IEEE, 2012, pp. 2216–2223.
 [4] Z. Cui, H. Chang, S. Shan, B. Zhong, and X. Chen, “Deep network cascade for image superresolution,” in ECCV. Springer, 2014, pp. 49–64.
 [5] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image superresolution,” in ECCV. Springer, 2014, pp. 184–199.
 [6] K. Zhang, D. Tao, X. Gao, X. Li, and Z. Xiong, “Learning multiple linear mappings for efficient single image superresolution,” IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 846–861, 2015.
 [7] D. Dai, R. Timofte, and L. Van Gool, “Jointly optimized regressors for image superresolution,” in Eurographics, vol. 7, 2015, p. 8.
 [8] C. Dong, C. C. Loy, and X. Tang, “Accelerating the superresolution convolutional neural network,” in ECCV. Springer, 2016, pp. 391–407.
 [9] J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image superresolution using very deep convolutional networks,” in CVPR. IEEE, 2016.
 [10] M. Protter, M. Elad, H. Takeda, and P. Milanfar, “Generalizing the nonlocalmeans to superresolution reconstruction,” IEEE Transactions on Image Processing, vol. 18, no. 1, pp. 36–51, 2009.
 [11] G. Freedman and R. Fattal, “Image and video upscaling from local selfexamples,” ACM Trans.Graph., vol. 30, no. 2, p. 12, 2011.
 [12] A. Giachetti and N. Asuni, “Realtime artifactfree image upscaling,” IEEE Transactions on Image Processing, vol. 20, no. 10, pp. 2760–2768, 2011.
 [13] J.B. Huang, A. Singh, and N. Ahuja, “Single image superresolution from transformed selfexemplars,” in CVPR. IEEE, 2015, pp. 5197–5206.
 [14] Z. Wang, Y. Yang, Z. Wang, S. Chang, J. Yang, and T. S. Huang, “Learning superresolution jointly from external and internal examples,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4359–4371, 2015.
 [15] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Examplebased superresolution,” IEEE Computer Graphics and Applications, vol. 22, no. 2, pp. 56–65, 2002.
 [16] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image superresolution,” IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3467–3478, 2012.
 [17] R. Zeyde, M. Elad, and M. Protter, “On single image scaleup using sparserepresentations,” in Curves and Surfaces. Springer, 2012, pp. 711–730.
 [18] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast examplebased superresolution,” in ICCV. IEEE, 2013, pp. 1920–1927.
 [19] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast superresolution,” in ACCV. Springer, 2014, pp. 111–126.
 [20] D. Glasner, S. Bagon, and M. Irani, “Superresolution from a single image,” in ICCV. IEEE, 2009, pp. 349–356.
 [21] Y. Zhu, Y. Zhang, and A. L. Yuille, “Single image superresolution using deformable patches,” in CVPR. IEEE, 2014, pp. 2917–2924.
 [22] A. Singh, F. Porikli, and N. Ahuja, “Superresolving noisy images,” in CVPR. IEEE, 2014, pp. 2846–2853.
 [23] D. Liu, Z. Wang, B. Wen, J. Yang, W. Han, and T. S. Huang, “Robust single image superresolution via deep networks with sparse prior,” IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 3194–3207, 2016.
 [24] W. Dong, G. Shi, and X. Li, “Nonlocal image restoration with bilateral variance estimation: A lowrank approach,” IEEE transactions on image processing, vol. 22, no. 2, pp. 700–711, 2013.
 [25] H. Zhang, W. He, L. Zhang, H. Shen, and Q. Yuan, “Hyperspectral image restoration using lowrank matrix recovery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 8, pp. 4729–4743, 2014.
 [26] X.Y. Jing, X. Zhu, F. Wu, R. Hu, X. You, Y. Wang, H. Feng, and J.Y. Yang, “Superresolution person reidentification with semicoupled lowrank discriminant dictionary learning,” IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1363–1378, 2017.
 [27] C.F. Chen, C.P. Wei, and Y.C. F. Wang, “Lowrank matrix recovery with structural incoherence for robust face recognition,” in CVPR. IEEE, 2012, pp. 2618–2625.
 [28] X.Y. Jing, F. Wu, X. Zhu, X. Dong, F. Ma, and Z. Li, “Multispectral lowrank structured dictionary learning for face recognition,” Pattern Recognition, vol. 59, pp. 14–25, 2016.
 [29] C. Zhang, J. Liu, Q. Tian, C. Xu, H. Lu, and S. Ma, “Image classification by nonnegative sparse coding, lowrank and sparse decomposition,” in CVPR. IEEE, 2011, pp. 1673–1680.
 [30] Y. Zhang, Z. Jiang, and L. S. Davis, “Learning structured lowrank representations for image classification,” in CVPR, 2013, pp. 676–683.
 [31] F. Wu, X.Y. Jing, X. You, D. Yue, R. Hu, and J.Y. Yang, “Multiview lowrank dictionary learning for image classification,” Pattern Recognition, vol. 50, pp. 143–154, 2016.
 [32] X. He and P. Niyogi, “Locality preserving projections,” in NIPS, vol. 16. MIT, 2004, p. 153.
 [33] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in ICCV. IEEE, 2001, pp. 416–423.
 [34] A. Buades, B. Coll, and J.M. Morel, “A nonlocal algorithm for image denoising,” in CVPR, vol. 2. IEEE, 2005, pp. 60–65.
 [35] R. Timofte, R. Rothe, and L. Van Gool, “Seven ways to improve examplebased single image super resolution,” in CVPR, 2016, pp. 1865–1873.
 [36] J. Domke, “Generic methods for optimizationbased modeling,” in AISTATS, 2012, pp. 318–326.
 [37] A. Beck and M. Teboulle, “A fast iterative shrinkagethresholding algorithm for linear inverse problems,” SIAM journal on imaging sciences, vol. 2, no. 1, pp. 183–202, 2009.
 [38] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in ICML, 2010, pp. 399–406.
 [39] T. Zhou and D. Tao, “Godec: Randomized lowrank & sparse matrix decomposition in noisy case,” in ICML, 2011, pp. 33–40.
 [40] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, pp. 11:1–11:37, 2011.
 [41] E. J. Candès and Y. Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.
 [42] E. J. Candès and T. Tao, “The power of convex relaxation: Nearoptimal matrix completion,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
 [43] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, “Ranksparsity incoherence for matrix decomposition,” SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
 [44] A. S. Lewis and J. Malick, “Alternating projections on manifolds,” Mathematics of Operations Research, vol. 33, no. 1, pp. 216–234, 2008.
 [45] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. AlberiMorel, “Lowcomplexity singleimage superresolution based on nonnegative neighbor embedding,” in BMVC. BMVA Press, 2012.
 [46] http://ni.neatvideo.com/examples/#cowboy.
 [47] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3d transformdomain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.