Process of image Super-Resolution
In this paper we explain a process of super-resolution reconstruction allowing to increase the resolution of an image.The need for high-resolution digital images exists in diverse domains, for example the medical and spatial domains. The obtaining of high-resolution digital images can be made at the time of the shooting, but it is often synonymic of important costs because of the necessary material to avoid such costs, it is known how to use methods of super-resolution reconstruction, consisting from one or several low resolution images to obtain a high-resolution image. The american patent US 9 208 537 describes such an algorithm. A zone of one low-resolution image is isolated and categorized according to the information contained in pixels forming the borders of the zone. The category of it zone determines the type of interpolation used to add pixels in aforementioned zone, to increase the neatness of the images. It is also known how to reconstruct a low-resolution image there high-resolution image by using a model of super-resolution reconstruction whose learning is based on networks of neurons and on image or a picture library. The demand of chinese patent CN 107563965 and the scientist publication “Pixel Recursive Super Resolution”, R. Dahl, M. Norouzi, J. Shlens propose such methods.
Last February 2017, three researchers described before from Google Brain published their results on Pixel Recursive Super Resolution:
Their technology consists in approaching the final picture by combining an algorithm and database pictures of Google to obtain from an initial 8x8 definition picture, a 32x32 definition picture which is very similar as the actual picture. A French startup Lablanche & Company (http://www.lablanche-and-co.com) is said to have developed an innovation based on a powerful algorithm which could deliver similar results as Google, BUT, without using any database. If the achievements of this new algorithm are confirmed, it would represent a very important innovation for any potential end user which would want to operate independently of Google database, and faster. In this paper, we present the new technical of interpolation for image super-resolution based on conditional interpolations in three directions called successively to achieve a blurred high resolution image with a lot of more details to help deconvolution algorithm (the directional interpolation monitored by the orientation of the motion filter) to create truth details at the end (LABLANCHE process).
For the tests, we must realize two stages.
Stage 1 : pixelization or compression of images which consists to fill in each bloc of 4x4 pixels of R, G or B channel with the center intensity of the bloc.
Stage 2 : storage of intensities in a vector of unsigned char type for the channel (if monochrome image) or three channels (if RGB image). In the C++ language, the unsigned char type elements corresponds to bytes. This stage provides a smaller size file which contains center intensities.
The parameter named step corresponds to the size of same color blocs, in this paper step=4 and the center intensity is the color of the 4 by 4 pixels bloc to increase the general enhancement of the image. step controls the quality of the reconstruction and therefore the loss of quality generated by the compression. For step=4, the compression rate is equivalent of the JPEG size format. step and step/2 must be even numbers to realize the reconstruction.
We note q=step/2, the number of levels. Here we have two levels LEVEL 1 and LEVEL 2. step is an even number which enables to share the 4 by 4 pixels bloc in 4 blocs qxq.
In the case of 2 by 2 pixels blocs, we have one level (step=2).
Considering that images as vectors, the problem can be modeled with a low resolution observation named of the original image which is obtained in applying an operator of movement (i.e. a geometric transformation), a blur operator , and with a subsampling operator . An additive noise completes this inverse problem: =+.
The operator (subsampling + blurring + geometric transformation) owns more columns that rows, the system is underdetermined. The extension factor between the image and the image is equal to .
The image obtained by a super-resolution algorithm from must be verified the reconstruction constraint, i.e. must gives using the observation model.
3 Compressed Sensing
The Compressed Sensing technique makes it possible to find the most parsimonious solution from an undetermined linear system. It involves not only the means of finding such a solution but also the linear systems that are acceptable. The process is to reconstruct accurately a signal or an image when the number of measurements is less than the length of the signal.
The paradox can be explained by the fact that the signal admits a sparse representation in an appropriate basis. This sampling aims to replace the classic Shannon sampling. This theory has been widely developed in recent years and the amount of applications is continuing to rise in diverse fields such as tomography with MRI and computed tomography or radar and radar imagery, in particular in signal and image compression.
We consider a real valued signal x, which we will see in the form of a column vector of length N. In the case of images or higher dimension signals, we can arrange the data to form a long vector.
We indicate this signal in a base such as x= s. We wish to select the base whereby the majority of coefficients of s are equal to zero, ie x is parsimonious in the base. These bases are known and usually used to compress the starting signal. Compressed Sensing proposes to acquire directly the compressed version of the signal so that unnecessary samples do not need to be processed.
The linear measuring process that is used consists of making M scalar products, with M much smaller than N, between x and a collection of vectors going from 1 to M. We obtain therefore samples of = measurements. Considering that the measurement matrix has the () as column, we can thus write x as: y=x=s=s (with =).
The measurement matrix design must be able to find the most parsimonious signals. In order to do so, the matrix measurements must follow certain properties, one of which is called RIP (Restricted Isometry Property). The matrix satisfies the restricted isometry property (RIP) of order k if there exists a in (0,1) such that (1-) (1+) holds for all x .
Furthermore, must be inconsistent with , so the coherence of must be closest to 0. The coherence of a matrix is the largest absolute inner product between any two columns of .
It is evident that these properties are satisfied with a high probability simply by choosing at random, for example by using a Gaussian distribution. Although these RIP or inconsistency conditions are satisfactory for some sets of measurements, they are only necessary conditions and not sufficient. There are other properties, also insufficient, which require us to need even less samples in order to guarantee a parsimonious signal reconstruction.
The last step of the Compressed Sensing process is the reconstruction of the starting signal. For this, we know the M values for y, the matrix measurements used as well as the base. The reconstruction algorithm seeks to find the coefficients of s. Thanks to the knowledge of , it is then simple to find x, the starting signal.
There are infinite solutions for the equation s=y, however we are looking for the solution which minimises a certain norm. The L2 norm measures the signal energy which is why by using a L2 -minimization, we will hardly ever find a K-parsimonious result. The L0 norm measures the parsimony of the signal (we count the number of elements not equal to zero), the optimization =argmin such as s=y gives a good result. The disadvantage is that this problem can not be calculated digitally. It has been shown that the optimization based on the L1 norm makes it possible to find an exact K-parsimonious signal, the problem of the L1-minimization is convex.
Its resolution could be reduced to a linear program, more often known as basis pursuit.
This resolution is not the only one and there are other techniques which allow for even better results such as Orthogonal Matching Pursuit (OMP), Conjugate Gradient Pursuit (CGP) or Stagewise Weak Conjugate Gradient Pursuit (StWCGP).
The number of necessary measurements M to have an exact recovery of the K-parsimonious signal is in the same order than 5K.
We use a reconstruction of pixelated image with an algorithm of conditional and directional interpolation.
The algorithm owns as scientific base the fact when the pixels ratio is very small the decoherence used in compressed sensing (CS) plays an important role in the quality of reconstruction whereas the parsimony plays an important role when the ratio is more important (over 10). So the parsimony research is efficient for the problem with step=2 but is inefficient for step=4 and step=3.
The algorithm mixes the information in order to approach the reality because the decoherence is controled with following conditional interpolations in three possible directions which enables for the deconvolution (the final step called directional interpolation) to create the details to give an acceptable reconstruction.
4.1 Conditional interpolation
Conditional interpolation of LEVEL 1
We start with a 4x4 pixels bloc filled in with the center intensity (B bloc).
The B bloc is divided in four similar neighbors 2x2 blocs : B1, B2, B3 and B4.
The B1 bloc (at the top and on the left) always keeps its starting intensity. The B bloc (4x4) owns three 4 by 4 neighbors blocs:
-the C bloc (on the right)
-the D bloc (at the bottom)
-the E bloc (at the bottom and on the right)
fig.1 illustration of the image splitting in four neighbor blocs B,C,D,E
fig.2 illustration of LEVEL 1 interpolation
First dichotomy (conditional interpolation of LEVEL 1)
B1 keeps the color of B bloc (always)
B2 takes the average color if and keeps the color of B bloc otherwise
B3 takes the average color if and keeps the color of B bloc otherwise
B4 takes the average color if and keeps the color of B bloc otherwise
p2 is the parameter which controls the mean between the B bloc and the C bloc.
p3 is the parameter which controls the mean between the B bloc and the D bloc.
p4 is the parameter which controls the mean between the B bloc and the E bloc.
The setting of p2, p3 and p4 parameters is very important. We realize the interpolation between the two blocs only if they have enough closed colors. These parameters are integers between 0 and 255. If the value is equal to 0, there is no interpolation and if the value is 255, the interpolation is always realized. In other cases, the interpolation is realized if the two adjacent blocs have intensities with a smaller difference than the constraint. The constraints p2, p3 and p4 define a metric between two adjacent blocs in three directions.
The goal of this step is to separate different interest areas of the image and to generate contrast enhancement. We called the LEVEL 1 routine several times with different parameters p2, p3 and p4 to obtain the contrast enhancement and the last call is realized with non conditional interpolations e.g. with , , in order to have a presmoothing.
We use the center intensity to colorize the 4 by 4 pixels bloc on the figures 1 and 2. On this example, the B2 bloc and the B4 bloc are changed by interpolation and the B3 bloc keeps its color because of the constraints.
Conditional interpolation of LEVEL 2
We start with the B1 bloc.
The B1 bloc is divided in 4 neighbours blocs of size 1 (pixels) : B11, B12, B13 and B14.
The B11 pixel (at the top and on the left) always keeps the B1 bloc intensity.
fig.3 illustration of LEVEL 2 interpolation
Second dichotomy (conditional interpolation of LEVEL 2)
B11 keeps the color of B1 bloc (always)
B12 takes the average color if and keeps the B1 color otherwise
B13 takes the average color if and keeps the B1 color otherwise
B14 takes the average color if and keeps the B1 color otherwise
p2 is the parameter which controls the mean between the B1 bloc and the B2 bloc.
p3 is the parameter which controls the mean between the B1 bloc and the B3 bloc.
p4 is the parameter which controls the mean between the B1 bloc and the B4 bloc.
The goal of this step is to create a smoothing of the image.
In general, p2, p3 and p4 must be equal to 255 to obtain a precise smoothing.
In the case where step=2, this metric corresponds to the LEVEL 1 interpolation with 2x2 blocs instead of 4x4 blocs. In this case, this metric is a smoothing which controls the averaging of two adjacent blocs and the constraints p2, p3 and p4 have values between 0 and 255 because we work with original intensities. In the case step=4, this metric is the final smoothing of the image because we work with intensities computed in LEVEL 1 interpolations.
4.2 Directional interpolation
The deconvolution with the motion blur filter aims to create details of the image. The deconvolution can be formulated by the inverse problem where h is the motion blur filter, y the observed image and x the unknown image.
is the convolution operator. The idea is to increase the size of the filter until the details are created but it is possible only with a well-designed y image built using the appropriate conditional interpolations.
The motion filter owns two parameters : the distance and the direction (or the angle of deconvolution) . Before the deconvolution we must increase the size of the image with a magnification factor to avoid the creation of blurred blocs on the image after the deconvolution. In the practice for very small images with a 32x32 size (1024 pixels on each channel). The size becomes 60x60 (3600 pixels on each channel).
For larger sizes, we can take . The paramaters of conditional interpolations are known after a step of deep learning with a large database of HR and LR images and the better values are chosen to obtain the better interpolation, much less blurred than the other interpolation methods such as bicubic interpolations.
The range of values for the parameter begins from 0 to 19 and the range of values begins from 0 to 170 by step of 10 degrees (or 5 degrees) such as resume in the following table (see table 1: table of deconvolution values ).
For an image, there is a resonance angle (by analogy with the resonance frequency) which appears when the user makes the motion deconvolution algorithm because the details are created only with this value of .
We note this unique value. The parameter length depends of the size of the image, its value decreases when the size of the image is larger.
This phenomenon leads us to build a new mathematical theory where the structure of images will be described with an angle of movement instead of a wavelet.The information can be represented as a row vector
or a matrix with different calls of LEVEL 1 interpolations. In the practice, we need only three or four LEVEL 1 interpolations, the first two have the role of contrast enhancement and the last is the presmoothing. The smoothing has two components: the last call of LEVEL 1 interpolation just after the different areas are dissociated and the LEVEL 2 interpolation at the end of the interpolation process. We write the three parameters of the first LEVEL 1 occurrence , and . For the second occurrence , and , and for the third occurrence , and . So this representation is given in the next matrix:
Source: Image Source (Digital Video Camera)
Amount (to apply to the image): 100,125,150 or 175
Noise: Remove Noise (Yes/No or Dark only or Light only)
table 1: table of deconvolution values
In this table, the angles are indicated in degrees.
We indicate the blur motion filter h for three angles with L=11. h is a vector for horizontal and vertical direction and a matrix otherwise. The filter h plays the role of the matrix in the parsimonious research (see Compressed Sensing section).
h is a row vector of length 11 with a unique value, h=.
h is a column vector of length 11 with the same value, h=.
h is a 11x3 matrix. h=.
To conclude we have the formulation of the algorithm.
q LEVEL interpolations + motion deconvolution algorithm
2 by 2 blocs: LEVEL 2 interpolation + motion deconvolution algorithm
In the last case, the LEVEL 2 interpolations are conditional to create a smoothing with appropriate averaging.
4.3 Generalization to 3 by 3 pixels blocs
In the case where step and q are not even numbers, for example for step=3, we define q=E(step/2) where E is the integer part. The B1 bloc is a 2 by 2 pixels bloc but the sizes of B2, B3 and B4 are different. The B2 bloc is a 2 by 1 pixels bloc, the B3 bloc is the 1 by 2 pixels bloc and B4 bloc is a pixel.
fig.4 illustration of the case 3 by 3 pixels
The B2 bloc is stretched in the direction of the right adjacent bloc and the B3 bloc is stretched in the direction of the bottom adjacent bloc which enables to say that the geometry of this interpolation method is optimal for the case where step=3.
The algorithm described before is the same with these new B2, B3 and B4 blocs and the L parameter is smaller because we need a weaker deconvolution than the deconvolution used in the case of 4 by 4 pixels. Nevertheless, the angle of resonance is the same for a given image.
This case is important because it corresponds to 11 of known pixels and we have demonstrated for 32x32 images that the 11 larger coefficients approximation in the best wavelet base (in general Villasenor 1 or Villasenor 2) is very precise for a visual identification by the human eye. Therefore this algorithm enables to depixelize images in this case because we have an equivalence between the pixels ratio and the best wavelet larger coefficients keeped ratio in the quality of reconstruction and approximation. Villasenor 1 is the wavelet basis used in the JPEG 2000 standard. The best wavelets are Villasenor 1,2,3,4,5.
fig.5 Original 32x32 image
fig.6 Approximation with the 11 of larger coefficients in Villasenor 1 wavelet basis (Ingrid Daubechies wavelet)
fig.6-bis Daubechies wavelet family
5.1 First results
In this part, we work with three reference 32x32 images representing three human faces, two women and a man. These images of celebrities have been published in February 2017 by researchers of Google Brain team to illustrate neural network applications.
We begin to give the conditional interpolations parameters obtained after a lot of tests on a large dataset of images with different geometries. We have chosen one of the best combination of parameters which brings out the resonance angle of motion deconvolution .
fig.7 geometric model
We also imaginate to determine the parameters of the first three occurrences using statistical metrics as the MSE, the PSNR, the SSIM, the MS-SSIM but we prefer to use visual criterias because of the nature of this problem. The PSNR is the most used metric in image processing. where EQM is the quadratic mean error between two images and of size mxn defined by . The PSNR (dB) measures the proximity of compressed image with the original image but doesn’t consider the visual quality of the reconstruction.
There are 16 millions () of possible combinations for one occurrence but there are 274942 billions of combinations for two occurrences and 4.55 x for three occurrences. Therefore, it is possible to test all combinations of constraints with a personal laptop computer for one occurrence but for two occurrences we need a datacenter.
The user can find easily the angle of resonance for the image with the found optimal constraints. At first we fix the value of the L parameter (the length), for example L=10. In a second time we test the different angles until the better appears. In a third time, we ajust the L parameter with the resonance value until the details are created.
The faces are more difficult to reconstruct than other things because the high frequencies are much more important and the relief is very complex to reproduce.
The first occurrence aims to increase the contrast of the image, so we will build a criterion based on the local contrast of original images using 4x4 or 8x8 windows to find the optimal constraints for the first occurrence. The learning of constraints can be realized solving different minimization problems.
R is a regulator and is the compromise between the two terms of the equation. is the original image and is the solution obtained with optimal parameters. The choice of R depends of the nature of the problem. Here we define R as the inverse of the sum of 8x8 windows contrasts of the image because on the one hand the aim of these constraints is to enhance the contrast and on the other hand the contrast of pixels is low inside a 4x4 window but is very high in a 8x8 window (close to 1). For a 32x32 image, there are 16 windows of 8 by 8 pixels and R= where is the contrast of the bloc. = where and are the maximum intensity and the minimum intensity of the bloc.
The optimization problems (2) and (3) are unpraticable because of the memory of our computer.
We propose a cascade optimization model. The first parameters are obtained by the first optimization problem and are fixed to these optimal values, and the next three parameters are found by the second optimization problem and are fixed to these optimal values, and the last three parameters with the length L and the angle are found by the third optimization problem.
We obtain the next solution with (FACE 1), (FACE 2) and (FACE 3).
FACE 1= ;
LO: Light Only
FACE 2= ;
FACE 3= ;
DO: Dark Only
DVC: Digital Video Camera
We present here the results with pixelated images using the center intensities to fill in the different 4 by 4 blocs.
These three images are pixelated with the center intensities of blocs.
The users can do it on the following link:
The pixels ratio is 6.25 (step=4).
These reconstruction results (see fig.8) are much better than the 6.25 best wavelet larger coefficients keeped approximations (see fig 8.bis), therefore this reconstruction algorithm is much better than Compressed Sensing and bicubic interpolation. We will explore this new method on large dataset images.
5.2 Large Dataset results
We use the next matrix for all tests.
We observe that the optimal angle changes according to the image (Angular Phenomenon).
6 Future prospects
In the future, it would be interesting to find a better deconvolution algorithm and to find a trick to understand the links between conditional interpolations and the optimal angle of the deconvolution. The interpretation of these results is to see images as a combination of angles generated by conditional interpolations and the coherence of these angles is achieved with the optimal orientation of the motion deconvolution.
Charles Dossal help has been very precious in my work to compute biorthogonal wavelet bases for large scale problems, without his help i didn’t identify the transfer between the parsimonious research (l1 minimization and l0 minimization) and the decoherent research (conditional interpolations) when the information pixels rate decreases. https://www.math.u-bordeaux.fr/ cdossal/Enseignements/MHT923/TDOndelettes2D.pdf
(*) sebastien LABLANCHE obtained the master’s degree in statistics at Paul Sabatier University in 2012 and the bachelor’s degree in signal processing at Bordeaux 1 University in 2010.
- Yonina C.Eldar and Gitta Kutyniok, Compressed Sensing Theory and Applications, Cambridge
- Stephane Mallat, Une exploration des signaux en ondelettes, les editions de l’ecole polytechnique
- Charles Dossal, Estimation de fonctions geometriques et deconvolution, These pour l’obtention du titre de Docteur de L’Ecole Polytechnique, le 5 Decembre 2005
- Richard E.Blahut, Willard Miller, Jr, Calvin H.Wilcox, Radar and Sonar Part I, Springer-Verlag
- Sung Cheol Park, Min Kyn Park and Moon Gi Kang, Super-Resolution Image Reconstruction: A Technical Overview, IEEE SIGNAL PROCESSING MAGAZINE,1053-5888/03, May 2003.
- Pradeep Sen and Soheil Darabi, Compressive Image Super-Resolution, Advanced Graphics Lab, Department of Electrical and Computer Engineering, University of New Mexico, USA, New Mexico, 2009 IEEE,1235-1242
- Jianchao Yang, John Wright, Thomas S.Huang, Yi Ma, Image Super-Resolution via Sparse Representation, IEEE Transactions on image processing,vol.19, pp. 2861-2873, November 2010
- Michael Elad and Michal Aharon, Image Denoising via Sparse and Redundant Representations over Learned Dictionaries, IEEE Transactions on image processing, vol.15, pp. 3736-3745, December 2006
- Michal Irani, Shmuel Releg, Image Sequence Enhancement using multiple motions analysis, Institute of computer Science, The Hebrew University of Jerusalem, 0-8186-2855-3/92, 1992 IEEE
- John Wright, Allen Y. Yang, Arvind Ganesh, S.Shankar Sastry, Yi Ma, Robust Face Recognition via Sparse Representation, February 2009 IEEE Transactions on pattern Analysis and Machine Intelligence, vol.31, pp.210-227
- Hong Chang, Dit-Yan Yeung, Yiming Xiong, Super-Resolution Through Neighbor Embedding, Hong-Kong University of Science and Technology, IEEE 2004, Computer Society Conference on Computer Vision and Pattern Recognition, , vol.1, pp 275-282