Image Deconvolution with Deep Image and Kernel Priors
Abstract
Image deconvolution is the process of recovering convolutionally degraded images, which is a hard inverse problem because it is mathematically ill-posed. Building on the success of the recently proposed deep image prior (DIP), we build an image deconvolution model with deep image and kernel priors (DIKP). DIP is a learning-free representation which uses neural net structures to express image prior information, and it has shown great success in many energy-based models, e.g. denoising, super-resolution and inpainting. Our DIKP model instead uses such priors in image deconvolution to model not only images but also kernels, combining the ideas of traditional learning-free deconvolution methods with neural nets. In this paper, we show that DIKP improve the performance of learning-free image deconvolution, and we experimentally demonstrate this on a standard benchmark of six standard test images in terms of PSNR and visual effects.
1 Introduction
Image restoration is a long-studied and challenging problem that aims to restore a degraded image to its original form [1]. One way to model the process of image degradation is convolution with translational invariance [46]
$y = k * x + n, \qquad (1)$
where $x$ is the original image, $k$ is the convolution kernel, $n$ is the additive noise, $y$ is the degraded image, and $c$ denotes the number of channels in the images ($c = 1$ for greyscale images and $c = 3$ for color images). Image deconvolution is the process of recovering the original image $x$ from the observed degraded image $y$, i.e. the inverse process of convolutional image degradation. This work focuses on image deconvolution in two different settings: kernel-known and kernel-unknown (a.k.a. blind deconvolution).
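As a minimal sketch of the degradation model in Equation 1 (the image, kernel and noise level here are toy stand-ins, not the paper's data), SciPy's convolution with mirror-mode boundaries mimics the reflexive boundary condition used later in the experiments:

```python
import numpy as np
from scipy.ndimage import convolve

def degrade(x, k, noise_sigma=0.01, rng=None):
    """Apply the degradation model y = k * x + n channel-wise.

    x: image array of shape (C, H, W); k: 2-D kernel; n: i.i.d. Gaussian noise.
    mode="mirror" gives reflexive boundary handling, as in the paper's setup.
    """
    rng = np.random.default_rng() if rng is None else rng
    blurred = np.stack([convolve(ch, k, mode="mirror") for ch in x])
    return blurred + rng.normal(0.0, noise_sigma, size=x.shape)

# toy example: one-channel random image, 3x3 box kernel
x = np.random.default_rng(0).random((1, 32, 32))
k = np.full((3, 3), 1.0 / 9.0)
y = degrade(x, k, noise_sigma=0.01)
```

Note that all channels share the same kernel, matching the convention stated in the experiment setup.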
Kernel-known: The preliminary stage of image deconvolution mainly considers the case where the convolution kernel is given [37], i.e. recovering $x$ from $y$ with $k$ known in Equation 1. This problem is ill-posed, because simply applying the inverse of the convolution operation to the degraded image $y$ with kernel $k$ yields $x + k^{-1} * n$, where the inverted noise term $k^{-1} * n$ dominates the solution [16].
Blind deconvolution: In reality, we can hardly obtain detailed kernel information, and the deconvolution problem is instead formulated in a blind setting [25]. More concisely, blind deconvolution is to recover $x$ without knowing $k$. This task is much more challenging than in the non-blind setting, because less information is observed and the domains of the variables become larger [5].
In image deconvolution, prior information on unknown images and kernels (in blind settings) can significantly improve the deconvolved results. A traditional representation of such prior information is hand-crafted regularizers in image energy minimization [12], e.g. total variation (TV) regularization for image sharpness [5] and $\ell_1$ regularization for kernel sparsity [38, 43]. However, prior representations like the above-mentioned regularizers have limited expressiveness [27]. Therefore, this work aims to find better prior representations of images and kernels to improve deconvolution performance.
Deep neural architectures have a strong capability to accommodate and express information because of their intricate and flexible structure [40]. Compared to other image prior representations with limited structures (e.g. regularizers), neural nets with such powerful expressiveness seem more capable of capturing higher-level priors of natural images and degradation kernels. Deep image prior (DIP) [42] is a neural-based image prior representation which has achieved good performance in various image restoration problems. The main idea of DIP is to substitute the image variable in an energy function by the output of a deep convolutional neural net (ConvNet) with random noise input, so that the image prior is captured by the hyperparameters (i.e. the structure) of the ConvNet, and the output image is determined by the weights of the ConvNet. One point to emphasize is that the priors expressed by both hand-crafted regularizers and DIP are embodied in their own formulations or structures, which does not require large datasets for training. In the existing applications of DIP (incl. denoising, inpainting, etc.), the degradation processes are considered known. In this paper, we are the first to show that deep priors perform well in image deconvolution. Furthermore, we show that ConvNets can be utilized as a source of prior knowledge not only for natural images but also for degradation kernels (which we name the deep kernel prior, DKP), bridging the gap between traditional methods and deep neural nets. Through experiments we demonstrate that our deep image and kernel priors (DIKP) result in a significant improvement over traditional learning-free regularization-based priors in image deconvolution.^1

^1 We do not show any results from supervised deep network techniques, because our method is unsupervised and our objective is to show that our deep priors are better than hand-crafted priors in image deconvolution.
2 Related work
The earliest traditional methods of image deconvolution include the Richardson–Lucy (RL) method [32] and Wiener filtering [45]. Due to their simplicity and efficiency, these two methods are still widely used today, but they may be subject to ringing artifacts [30]. To solve this, many refinements based on hand-crafted regularization priors came out. [8] adopted a TV regularizer as the prior in kernel-known deconvolution. [48] proposed a progressive multi-scale optimization method based on the RL method, with edge-preserving regularization as the image prior. For degradation kernels, early methods [31] only dealt with simple parametric forms. Later, natural image statistics were used to estimate kernels [11, 26]. After that, [38, 43] adopted an $\ell_1$ regularizer as the kernel prior in blind deconvolution. However, the hand-crafted priors mentioned above have relatively simple structures, so their expressiveness is rather limited [27].
This work is inspired by traditional image deconvolution methods with hand-crafted priors [36, 43], but tries to use deep priors instead. It uses ConvNets to express the prior information of both natural images and degradation kernels, putting kernel-known and blind deconvolution under the same model. Besides, as discussed in [42], its ConvNet-based image prior representation links two sets of popular deconvolution methods: learning-based approaches using ConvNets [46, 49, 28] and learning-free approaches using hand-crafted priors [38].
3 Data set and evaluation metrics
As discussed in section 1, capturing an image prior by either regularization or deep neural net structures is learning-free. Therefore, the data set explored in this work is only used for testing. Experiments and performance evaluation are conducted on a data set of standard test images shown in Figure 2. These images, along with their preprocessing and evaluation described in the following, are in line with standard practice and widely used in denoising [7], TV deblurring [2], etc., which guarantees the reliability of our results.
3.1 Observed data generation and kernels
To preprocess the image data and obtain degraded observations, we use the degradation model formulated in Equation 1 to transfer each original standard test image $x$ to an observed image $y$, as illustrated by the diagram in Figure 1. The noise matrix $n$ is i.i.d. Gaussian with respect to each entry, and the noise strength (i.e. standard deviation) is held fixed to reduce experimental variables. To explore different kinds of degradation models, three common kernels for different kinds of degradation, the Gaussian kernel [17], defocus [16] and motion blur [47], are used to generate the data set.
Gaussian: The kernel for degradation caused by atmospheric turbulence can be described as a two-dimensional Gaussian function [19, 33], and the entries of the unscaled kernel are given by the formula [16]
$k_{ij} = \exp\left( -\frac{1}{2} \left( \frac{i - c_1}{\sigma} \right)^2 - \frac{1}{2} \left( \frac{j - c_2}{\sigma} \right)^2 \right),$
where $(c_1, c_2)$ is the center of $k$, and $\sigma$ determines the width of the kernel (i.e. the standard deviation of the Gaussian); both are held fixed in this work.
Defocus: Out-of-focus blur is another issue in optical imaging. Knowledge of the physical process that causes out-of-focus blur provides an explicit formulation of the kernel [16]
$k_{ij} = \begin{cases} 1 / (\pi r^2) & \text{if } (i - c_1)^2 + (j - c_2)^2 \le r^2, \\ 0 & \text{otherwise,} \end{cases}$
where $r$ denotes the radius of the kernel, which is held fixed in this work.
Motion blur: This happens when the scene being recorded changes during a single exposure, for example when fast-moving objects are photographed or the lens shakes. In the noiseless case, the motion blur kernel with amplitude $a$ and shifting angle $\theta$ places uniform weight along a line segment of length $a$ at angle $\theta$ [21], so the shape of the kernel is a line segment, as Figure 3 shows. In this work, the blur amplitude $a$ and shifting angle $\theta$ are held fixed.
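The three kernels can be sketched in a few lines of NumPy; the sizes, width, radius and angle below are illustrative defaults of this sketch, not the paper's fixed settings:

```python
import numpy as np

def gaussian_kernel(size=15, sigma=2.0):
    """2-D Gaussian kernel, normalized to sum to 1."""
    c = (size - 1) / 2.0
    i, j = np.mgrid[0:size, 0:size]
    k = np.exp(-0.5 * (((i - c) / sigma) ** 2 + ((j - c) / sigma) ** 2))
    return k / k.sum()

def defocus_kernel(size=15, radius=5.0):
    """Uniform disk kernel modelling out-of-focus blur."""
    c = (size - 1) / 2.0
    i, j = np.mgrid[0:size, 0:size]
    k = ((i - c) ** 2 + (j - c) ** 2 <= radius ** 2).astype(float)
    return k / k.sum()

def motion_kernel(size=15, amplitude=9, angle_deg=30.0):
    """Line-segment kernel: uniform weight along a segment of given length and angle."""
    c = (size - 1) / 2.0
    k = np.zeros((size, size))
    t = np.linspace(-(amplitude - 1) / 2.0, (amplitude - 1) / 2.0, amplitude)
    rows = np.clip(np.round(c - t * np.sin(np.deg2rad(angle_deg))).astype(int), 0, size - 1)
    cols = np.clip(np.round(c + t * np.cos(np.deg2rad(angle_deg))).astype(int), 0, size - 1)
    k[rows, cols] = 1.0
    return k / k.sum()
```

Each kernel is normalized so its entries sum to one, which keeps the overall brightness of the degraded image unchanged.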
3.2 Evaluation metrics
We use the Mean Square Error (MSE) between the degraded image variable $k * x$ and the observation $y$
$\mathrm{MSE}(k * x, y) = \frac{1}{N} \| k * x - y \|_2^2,$
where $N$ is the number of pixels, to measure the energy function [42] and to track parameter iterations in the first experiment (see subsection 5.2). Using this metric, minimizing the energy means finding the image that, when degraded, is the same as the observation $y$.
To measure image deconvolution quantitatively, we use the Peak Signal to Noise Ratio (PSNR) (in dB) [18] between the image variable $x$ and the standard test image $x_0$
$\mathrm{PSNR}(x, x_0) = 10 \log_{10} \frac{M^2}{\mathrm{MSE}(x, x_0)},$
where $M$ is the maximum possible pixel value of the image, e.g. $M = 1$ if the images are in double-precision floating-point data type, $M = 255$ if in 8-bit data type. In this work, we use the double-precision floating-point data type, i.e. $M = 1$.
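A direct implementation of this metric (with `max_val` standing in for the maximum pixel value $M$):

```python
import numpy as np

def psnr(x, x0, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between image x and reference x0."""
    mse = np.mean((np.asarray(x, float) - np.asarray(x0, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For the double-precision images used here `max_val` stays at 1; for 8-bit images it would be 255.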
In subsection 5.3, we compare the gradient distributions among output images and standard test images. To measure the similarity between a gradient frequency distribution $P$ and the one from standard test images $P_s$, we use the Kullback–Leibler (KL) divergence [24]
$D_{\mathrm{KL}}(P \,\|\, P_s) = \sum_{b \in B} P(b) \log \frac{P(b)}{P_s(b)},$
where $b$ denotes a bin corresponding to a range of gradient values, and $B$ is the whole bin set covering all possible gradient values. By this definition, the similarity between two distributions and their KL divergence are negatively correlated.
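A sketch of how this binned gradient KL divergence can be computed for two image sets; the bin edges and the `eps` smoothing of empty bins are assumptions of this sketch, not details taken from the paper:

```python
import numpy as np

def gradient_kl(img_set, ref_set, bins=np.linspace(-1, 1, 41), eps=1e-12):
    """KL divergence between binned gradient distributions of two image sets."""
    def grad_hist(images):
        # pool horizontal and vertical finite differences from every image
        g = np.concatenate([np.concatenate([np.diff(im, axis=0).ravel(),
                                            np.diff(im, axis=1).ravel()])
                            for im in images])
        h, _ = np.histogram(g, bins=bins)
        p = h.astype(float) / h.sum()
        return p + eps  # avoid log(0) in empty bins
    p, ps = grad_hist(img_set), grad_hist(ref_set)
    return float(np.sum(p * np.log(p / ps)))
```

A distribution compared against itself gives zero divergence, while a noisy image set compared against a flat one gives a clearly positive value.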
4 Methodology
According to section 1, both regularization-based priors and deep image priors are embedded in energy minimization models, which, in general, are formulated as [12]
$\min_x E(x; y) + R(x), \qquad (2)$
where $E(x; y)$ indicates the energy term associated with the data, and $R(x)$ is the prior term. A general explanation of the energy term is the numerical difference between the given image data and the image variable processed by the given degradation. For image deconvolution, the degradation operator is convolution, therefore the energy is designed as $E(x; y) = \| k * x - y \|_2^2$. The energy term can also be designed for other tasks in image restoration, such as inpainting [39], super-resolution [14] and image denoising [36]. Methods adopted in this work are all based on the deconvolution energy model and its variants.
4.1 Baseline models with regularization prior
The gradient magnitude of a two-dimensional function $f(u, v)$ is defined and formulated as follows [15]
$|\nabla f| = \sqrt{ \left( \frac{\partial f}{\partial u} \right)^2 + \left( \frac{\partial f}{\partial v} \right)^2 },$
the discrete formulation of which for an image $x$ is given by the matrix
$|\nabla x| = \sqrt{ (D_h x)^2 + (D_v x)^2 },$
where the square and square root calculations are entrywise, and $D_h$, $D_v$ are the discrete partial derivative operators in the horizontal and vertical directions (see [16, Chap. 7] and [3, Sec. 2] for their formal definition and their specified usage in this paper, respectively).
In image processing, discrete gradient magnitudes are proven to be a strong prior for natural images [38, 16]. The sum of such magnitudes over a single image is a regularization representation of the image prior, i.e. the total variation (TV) norm
$\| x \|_{TV} = \sum_{i, j} |\nabla x|_{ij}.$
The efficiency of the TV norm has been proven for recovering blocky images [10] and images with sharp edges [6].
It is also known that the $\ell_1$ norm is capable of expressing the sparsity of matrices [13], defined as
$\| k \|_1 = \sum_{i, j} |k_{ij}|.$
In most instances, degradation convolution kernels are sparse [38]. Thus $\ell_1$ sparsity regularization is a strong prior for convolution kernels in blind settings.
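Both priors are short NumPy computations; the forward-difference discretization with replicated edges below is one common choice for the operators $D_h$, $D_v$, not necessarily the one used in [3]:

```python
import numpy as np

def tv_norm(x):
    """Isotropic total variation: sum of entrywise gradient magnitudes."""
    dh = np.diff(x, axis=1, append=x[:, -1:])  # horizontal differences
    dv = np.diff(x, axis=0, append=x[-1:, :])  # vertical differences
    return float(np.sqrt(dh ** 2 + dv ** 2).sum())

def l1_norm(k):
    """Entrywise l1 norm, measuring kernel sparsity."""
    return float(np.abs(k).sum())
```

A constant image has zero TV, while each unit-height vertical edge contributes one unit of TV per row, matching the intuition that TV penalizes gradient magnitude.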
The baseline models in this work are energy minimization with TV and $\ell_1$ regularization priors, detailed for the two main settings as follows.
Kernel-known: The baseline model with known $k$ is formulated as the following energy minimization model with a TV regularization prior
$\min_x \| k * x - y \|_2^2 + \lambda \| x \|_{TV}, \qquad (3)$
where $\lambda$ is the TV regularization parameter. To solve the TV regularization system efficiently, we adopt a fast gradient-based algorithm named MFISTA [2], which has demonstrated remarkable time efficiency and convergence properties in TV regularization.
Blind deconvolution: The baseline system in the blind setting introduces a new $\ell_1$ sparsity prior compared to the non-blind baseline above, and is formulated as
$\min_{x, k} \| k * x - y \|_2^2 + \lambda \| x \|_{TV} + \mu \| k \|_1, \qquad (4)$
where $\mu$ is the $\ell_1$ regularization parameter. This TV-$\ell_1$ double-prior system can be solved using the TNIP-MFISTA algorithm proposed in [43]. To optimize both the image and the kernel, this algorithm alternates fix-and-update iterations between MFISTA and an $\ell_1$ regularization algorithm named the Truncated Newton Interior Point method (TNIP) [22].
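The doubled-prior objective of Equation 4 is easy to write down directly; the sketch below only evaluates the energy for a given image and kernel (the paper minimizes it with TNIP-MFISTA, which is beyond a few lines), and the regularization parameters are illustrative values of this sketch:

```python
import numpy as np
from scipy.ndimage import convolve

def blind_energy(x, k, y, lam=0.01, mu=0.1):
    """Energy of Eq. 4: data term + TV prior on the image + l1 prior on the kernel."""
    data = np.sum((convolve(x, k, mode="mirror") - y) ** 2)  # reflexive boundaries
    dh = np.diff(x, axis=1, append=x[:, -1:])
    dv = np.diff(x, axis=0, append=x[-1:, :])
    tv = np.sqrt(dh ** 2 + dv ** 2).sum()
    return float(data + lam * tv + mu * np.abs(k).sum())
```

At a noiseless ground truth with a constant image, the data and TV terms vanish and only the kernel sparsity term remains, which makes the role of each prior easy to inspect.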
4.2 Deconvolution with DIKP
DIKP aim to capture the priors of images/kernels through the structures of generative deep neural nets. Taking the image variable $x$ as an example, DIKP reparameterise the image as the neural net output $x = f_\theta(z)$, defined via the following surjection
$f : Z \times \Theta \to \mathcal{X},$
where $Z$ denotes the support^2 of the input noise probability density function $p(z)$, $\Theta$ denotes the weight space determined by the network structure, and $\mathcal{X}$ is the solution space of $x$, containing the prior information. The neural net maps the random noise network input $z \in Z$ and the network weights $\theta \in \Theta$ to the output $f_\theta(z)$. Ideally, by adjusting the network structure to its optimum, the solution space contains only images matching the desired prior information.

^2 $\{ z \in \Omega : p(z) > 0 \}$ [35], where $\Omega$ is the sample space of the noise vector $z$.
From a mechanistic perspective, the desired prior is expressed by the network structure, and the weights $\theta$ explore solutions within the prior. The random input $z$ is a high-dimensional Gaussian. The main reason for taking random noise as the network input is to increase robustness [29] and overcome degeneracy issues. On the other hand, high-dimensional Gaussian vectors are essentially concentrated uniformly in a sphere [20]. Therefore the input space can be approximated as a single point, and the surjection can be rewritten with the input space eliminated,
$f : \Theta \to \mathcal{X},$
which maps a selection of parameters of the network to an output image. In the rest of the paper, $f_\theta$ denotes the output image of the deep image prior with weights $\theta$.
4.2.1 Energy functions of DIKP deconvolution
Traditional energy minimization (formulated as Equation 2) for image deconvolution explores the whole image space as the domain. By reparameterising the image term $x$ into the neural net output $f_\theta$, the solution space contains the prior information expressed by the structure of $f$, instead of the prior term $R(x)$. Thereby, with the deep image prior, the general energy model of Equation 2 turns into
$\min_\theta E(f_\theta; y). \qquad (5)$
By optimizing the network weights $\theta$ on an ideal structure, an image is optimized conditioned on the desired prior.
Kernel-known: The image deconvolution objective with the deep image prior is derived directly from Equation 5 by applying the deconvolution energy function
$\min_\theta \| k * f_\theta - y \|_2^2, \qquad (6)$
where $k$ is the observed kernel. The minimizer $\hat\theta$ is obtained by the Adam optimizer [23] with random initialization.
Blind deconvolution: In blind settings, the convolution kernel is assumed to be unobservable. Thereby the kernel is parameterised by another deep neural net $g_\phi$, whose structure contains prior information regarding degradation kernels. After parameterising the kernel matrix in Equation 6, the blind deconvolution objective with deep priors is formulated as the following system
$\min_{\theta, \phi} \| g_\phi * f_\theta - y \|_2^2, \qquad (7)$
where $f$ and $g$ have different ConvNet structures, since the prior information of natural images and kernels is apparently different. To obtain the minimizers $\hat\theta$ and $\hat\phi$, we use Adam to update the two variables simultaneously.
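A miniature PyTorch sketch of the blind objective in Equation 7: both nets below are tiny stand-ins for the paper's hourglass architectures, and the layer sizes, kernel size, learning rate, iteration count and toy observation are illustrative assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# f_theta maps a fixed noise tensor to an image (Sigmoid output in [0, 1]);
# g_phi maps a noise vector to kernel logits (Softmax output sums to 1).
f_theta = torch.nn.Sequential(
    torch.nn.Conv2d(8, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1), torch.nn.Sigmoid(),
)
g_phi = torch.nn.Linear(8, 25)  # logits of a 5x5 kernel

z_img = torch.randn(1, 8, 16, 16)   # fixed noise input for the image net
z_ker = torch.randn(8)              # fixed noise input for the kernel net
y = torch.rand(1, 1, 16, 16)        # toy "observed" degraded image

opt = torch.optim.Adam(list(f_theta.parameters()) + list(g_phi.parameters()), lr=1e-2)

def energy():
    x = f_theta(z_img)                                        # image estimate
    k = torch.softmax(g_phi(z_ker), dim=0).view(1, 1, 5, 5)   # kernel estimate
    x_pad = F.pad(x, (2, 2, 2, 2), mode="reflect")            # reflexive boundaries
    return F.mse_loss(F.conv2d(x_pad, k), y)                  # data term of Eq. 7

e0 = energy().item()
for _ in range(50):          # joint Adam updates on theta and phi
    opt.zero_grad()
    loss = energy()
    loss.backward()
    opt.step()
e1 = energy().item()
```

Optimizing both sets of weights jointly drives the energy down; in the paper the same idea is applied with the full hourglass nets and real degraded images.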
5 Experiments
To explore to what extent deep priors can capture prior knowledge of natural images in deconvolution models, we (i) compare the energy convergence property during DIKP deconvolution optimization between natural images and noise images, and (ii) compare the gradient distributions among standard test images and images from both the baseline model and DIKP. This part of the experiments aims to evaluate DIKP's expressiveness on natural images, and is therefore only conducted in the kernel-known setting, i.e. with DKP deactivated. The second part of our experiments aims to find out whether our proposed DIKP deconvolution models improve the performance of image deconvolution in both kernel-known and blind settings, compared with the baselines. In our results, a PSNR comparison is conducted for quantitative analysis of deconvolution performance, and qualitative analysis is based on the presented images.
5.1 Experiment Setup
Convolution: Convolution processes in this paper, including data generation and energy calculations, are subject to reflexive boundary conditions [16]. Specifically, for color images, all channels share the same kernel [16].
Baseline: In the kernel-known setting, the TV regularization parameter $\lambda$ is set within a reasonable range for image deconvolution according to [2]. In the blind setting, the regularization parameters $\lambda$ and $\mu$ are set as in [43], where such settings achieved the best results.
ConvNet architecture as DIKP: As suggested for the super-resolution setting in [42], we use the hourglass architecture (shown in Figure 5) as the main body of DIKP, with the following hyperparameter choices.
For images: the hourglass hyperparameters follow the settings suggested in [42], with a Sigmoid applied to the output. For kernels (if blind): a configuration with a larger upsample stride, with a Softmax applied to the output.
We put Sigmoid and Softmax on the ConvNet outputs for images and kernels respectively, because image pixels range over $[0, 1]$ and kernel entries sum to $1$. The reason for enlarging the upsample stride for kernel generation is to prevent degeneration due to the kernels' small size. It is worth mentioning that we apply add-noise regularization to the neural network, i.e. we perturb the noise input with additive Gaussian noise at the beginning of each iteration. This technique aims to increase model robustness to perturbation [29]. Although this regularization has a negative impact on the optimization process, we find that the network can still converge with a sufficient number of iterations and improve deconvolution performance.
5.2 Bias in convergence
Even though the complex structure of the neural network in a DIKP model allows the solution space to have a variety of features regarding natural images, it is still possible for the DIKP model to express interference information other than natural images [40], e.g. noise. Therefore, we introduce noise into our experiments, applying our DIKP kernel-known model to natural images (incl. greyscale and color images) and noise respectively. By comparing the convergence of the energy functions on the two during optimization, we can determine whether our model blocks such interference information from its solution space.
In our control experiment, we use Gaussian white noise and uniform noise, generated from Gaussian and uniform distributions respectively. Figure 6 shows the optimization curves of energy values with respect to iterations in DIKP kernel-known deconvolution, where each plot corresponds to one degradation kernel. Except for the Gaussian kernel, energy value convergence shows obvious differences between natural images and noise in DIKP deconvolution with the defocus and motion blur kernels. More specifically, we observe that the curves for noise lie clearly above those for natural images, and sudden leaps take place in the energy values for noise in both plots. We speculate that the cause of this observation is that the ConvNet structures in DIKP are unstable to parameter fluctuations when generating noise, which also explains how DIKP deconvolution blocks noise information. For the Gaussian kernel, although in Figure 6 we cannot see a marked difference between noise and natural images, in Figure 7 we can still observe that the energy value for the uniform noise converges more slowly than those for natural images in early iterations, which also indicates that the DIKP model blocks uniform noise in Gaussian-degraded deconvolution.
The DIKP deconvolution in the control experiments with noise indeed shows a bias toward natural images from the perspective of energy function convergence, which means that in most cases DIKP are capable of blocking interference and irrelevant information in image deconvolution.
5.3 Image gradient distributions
Table 1: PSNR (dB) in the kernel-known setting (reg = regularization baseline; Ours = DIKP). The baseline's motion blur results (bottom reg row) are anomalously low.

                     C.man    house    Lena     boat     house.c  peppers  avg.
Gaussian      reg    24.108   29.541   29.663   26.353   27.842   28.550   27.676
              Ours   25.093   30.745   30.705   27.436   29.021   28.827   28.638
Defocus       reg    23.841   29.053   29.164   25.874   27.488   28.210   27.272
              Ours   25.688   30.473   30.355   27.480   29.594   29.089   28.780
Motion blur   reg     6.921    6.142    5.251    6.268    6.172    5.697    6.075
              Ours   27.089   31.566   31.801   28.435   30.007   29.661   29.760
Table 2: PSNR (dB) in the blind setting (reg = regularization baseline; Ours = DIKP).

                     C.man    house    Lena     boat     house.c  peppers  avg.
Gaussian      reg    19.553   14.214   29.798   26.323   14.662   24.790   21.557
              Ours   23.230   27.748   26.094   24.977   27.122   21.347   25.086
Defocus       reg    18.845   13.519   27.435   24.035   13.849   24.782   20.411
              Ours   23.021   23.094   26.286   25.154   24.462   28.229   25.041
Motion blur   reg    16.835   12.865   25.304   22.625   15.295   22.207   19.189
              Ours   23.935   24.382   26.156   25.039   22.862   26.152   24.754
Previous studies of image statistics [44, 34] have shown that natural image gradients follow heavy-tailed distributions, which provide a natural prior for natural images. Building on this, we evaluate the gradient distributions of our model-generated images against a "standard" distribution which can be assumed to be the natural prior.
With the notation of subsection 4.1, the gradients of an image $x$ can be defined as the matrices $D_h x$ (horizontal) and $D_v x$ (vertical) [9], where each element is a gradient value. In this experiment, we calculate the image gradient value distributions of three image sets: standard test images, images from the baseline model and images from the DIKP model. The probability distributions estimated from the frequencies of these sets are denoted by $P_s$, $P_{\mathrm{reg}}$ and $P_{\mathrm{DIKP}}$ respectively, where $P_s$ is assumed to be the "standard" distribution. Between the distributions of the two model-generated image sets, the one with greater similarity to the "standard" distribution is more in line with the natural prior.
Since image gradient values are continuous (due to their double-precision floating-point data type), we split the range of gradient values into disjoint bins and count the number of gradient values that fall in each bin as the frequency. Figure 8 plots the logarithmic probability distribution for each image set. Since the plot is in log scale, we can infer that all three distributions have the heavy-tailed property, and their log-probability curves are similar in shape to each other. The close-up of the peak of the distribution, where the gradient values lie around $0$, shows a decreasing order of baseline, DIKP, standard in terms of log-probability. This shows that the density of the baseline and DIKP models near gradient value $0$ is larger than that of the standard images, and furthermore, that the DIKP model performs closer to the standard than the baseline in this range. However, the close-up between the peak and the tail gives the order standard, baseline, DIKP, the exact opposite of the peak-range results. These results are expected, because the TV regularizer in the baseline tends to reduce image gradient values due to the property of the TV norm [4], and thereby gives high frequency where gradients are close to $0$ and low frequency outside the peak range; this also illustrates DIKP's better performance on larger gradient values.
Overall, the KL divergence between the gradient distribution of DIKP-generated images and that of the standard test images is smaller than the corresponding divergence for the baseline. This indicates that DIKP have a greater similarity to the "standard" than the baseline in terms of gradient distribution. The result is foreseeable: although the baseline performs closer to the standard than DIKP in the middle range, DIKP perform closer to the standard at the peak, which carries much higher frequency.
5.4 Performance on deconvolution
We run our baselines and DIKP models on 18 degraded images (3 degradation kernels applied to 6 standard test images) in both kernel-known and blind settings. Then we compute the PSNR between the generated results and the original standard test images, and visualize some of the results, for quantitative and qualitative comparison respectively.
Tables 1 and 2 show PSNR comparisons between the baseline and deep priors for kernel-known and blind deconvolution respectively. Overall, our DIKP deconvolution models always perform better than the baseline models in terms of average PSNR across the different degradation kernels. In the kernel-known setting, DIKP even give a larger PSNR value on every single degraded image. In particular, when the kernel type is motion blur, the baseline gives unexpectedly bad results, as shown by the anomalously low PSNR values in Table 1. We suspect this is because the TV regularizer overfits the gradient prior in motion deblurring, so that the non-edge regions of the image tend toward the same pixel value (see Figure 9). When the kernel is Gaussian or defocus, the performance is improved by around 1–1.5 dB in terms of PSNR, as we expect. In the blind setting, DIKP improve the average PSNR by around 3.5–5.6 dB, which is significantly beyond the performance of the baseline. However, the baseline gives higher PSNR values than the deep image prior for a few pictures and kernel types, such as Lena degraded by Gaussian or defocus. A possible reason is that the gradient values in Lena are relatively small, so that TV regularization gives better results on this specific image.
Figure 10 visualizes the comparison between images restored from Gaussian-degraded Lena and defocused house.c in the kernel-known setting. From the pictures and their close-ups, we see that DIKP perform better in detail recovery. For example, in the baseline result in Figure 10 the hair has only a clear outline, while in the DIKP result the details of the hair, as well as the trees, are more abundant. One possible explanation is that the TV regularizer over-optimizes the sharpness of images, resulting in good performance only on outlines and not in detail.
Beyond the two kernels above, DIKP achieve remarkable results especially in motion blur deconvolution. Figure 9 visualizes the comparison between images restored from motion-blurred C.man in both settings. As mentioned previously, the kernel-known baseline gives an unsatisfactory result (Figure 9), where only the basic outline of the cameraman can be observed and all other details inside the image are lost, while kernel-known DIKP restore the image almost perfectly, as shown in Figure 9. For blind motion deblurring on C.man, the result given by the baseline (Figure 9) still has motion blur, and the shape of its estimated kernel is completely different from motion blur, while DIKP remove motion blur efficiently and the shape of their estimated kernel is much closer to motion blur than the baseline's (see Figure 9), which also verifies ConvNets' expressiveness on degradation kernels.
6 Conclusions
We investigate deep ConvNets' expressiveness on the prior information of natural images and degradation kernels in DIKP image deconvolution, and present its performance in both kernel-known and blind settings. More importantly, we propose DIKP-based energy minimization pipelines for image deconvolution in the two settings, and achieve performance far beyond our baselines [2, 43]. Our motivation is to adopt DIKP, with their more complex structures, to express image prior information based on the ideas of traditional learning-free optimization methods, and at the same time to improve the image deconvolution performance of traditional learning-free methods. Through the first two experiments, we show that the ConvNet structures of DIKP capture strong prior information on natural images in terms of convergence bias and gradient distributions. In the final experiment, we show the significant improvement of DIKP models over the baselines in terms of both PSNR values and visual effects, especially for motion-blurred images. However, we verify DIKP's expressiveness on degradation kernels only with an adjusted hourglass structure, and it is hard to associate kernel features with deep neural structures intuitively. Therefore, future work on this topic should focus on the structures of DIKP for generating kernels, trying other hyperparameters for the hourglass, or other ConvNet structures, e.g. texture nets [41]. Besides, as applied in [38], the formulation of the energy functions may be adjusted with gradient terms to become more suitable for this task.
Acknowledgements. We thank Yusheng Tian for helpful changes and Prof. Steve Renals for organizing this project.
References
 [1] (2014) A study on the importance of image processing and its applications. IJRET: International Journal of Research in Engineering and Technology 3. Cited by: §1.
 [2] (2009) Fast gradientbased algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing 18 (11), pp. 2419–2434. Cited by: §3, §4.1, §5.1, §6.
 [3] (2004) An algorithm for total variation minimization and applications. Journal of Mathematical imaging and vision 20 (12), pp. 89–97. Cited by: §4.1.
 [4] (2005) Recent developments in total variation image restoration. Mathematical Models of Computer Vision 17 (2). Cited by: §5.3.
 [5] (1998) Total variation blind deconvolution. IEEE transactions on Image Processing 7 (3), pp. 370–375. Cited by: §1, §1.
 [6] (2000) Highorder total variationbased image restoration. SIAM Journal on Scientific Computing 22 (2), pp. 503–516. Cited by: §4.1.
 [7] (2007) Video denoising by sparse 3d transformdomain collaborative filtering. In 2007 15th European Signal Processing Conference, pp. 145–149. Cited by: §3.
 [8] (2006) Richardson–Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution. Microscopy Research and Technique 69 (4), pp. 260–266. Cited by: §2.
 [9] (1986) A note on the gradient of a multiimage. Computer vision, graphics, and image processing 33 (1), pp. 116–125. Cited by: §5.3.
 [10] (1996) Recovery of blocky images from noisy and blurred data. SIAM Journal on Applied Mathematics 56 (4), pp. 1181–1198. Cited by: §4.1.
 [11] (2006) Removing camera shake from a single photograph. In ACM transactions on graphics (TOG), Vol. 25, pp. 787–794. Cited by: §2.
 [12] (2010) A convergent overlapping domain decomposition method for total variation minimization. Numerische Mathematik 116 (4), pp. 645–685. Cited by: §1, §4.
 [13] (2001) The elements of statistical learning. Vol. 1, Springer series in statistics New York, NY, USA:. Cited by: §4.1.
 [14] (1974) Superresolution through error energy reduction. Optica Acta: International Journal of Optics 21 (9), pp. 709–720. Cited by: §4.
 [15] (1977) Digital image processing. Reading, Mass.: Addison-Wesley. Cited by: §4.1.
 [16] (2006) Deblurring images: matrices, spectra, and filtering. Vol. 3, Siam. Cited by: §1, §3.1, §3.1, §3.1, §3.1, §4.1, §4.1, §5.1.
 [17] (1987) Deblurring gaussian blur. Computer Vision, Graphics, and Image Processing 38 (1), pp. 66–80. Cited by: §3.1.
 [18] (2008) Scope of validity of psnr in image/video quality assessment. Electronics letters 44 (13), pp. 800–801. Cited by: §3.2.
 [19] (1989) Fundamentals of digital image processing. Englewood Cliffs, NJ: Prentice Hall,. Cited by: §3.1.
 [20] (2006) High dimensional statistical inference and random matrices. arXiv preprint math/0611589. Cited by: §4.2.
 [21] (2009) DCTbased local motion blur detection. In International Conference on Instrumentation, Communication, Information Technology, and Biomedical Engineering 2009, pp. 1–6. Cited by: §3.1.
 [22] (2007) An efficient method for compressed sensing. In Image Processing, 2007. ICIP 2007. IEEE International Conference on, Vol. 3, pp. III–117. Cited by: §4.1.
 [23] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.1.
 [24] (1997) Information theory and statistics. Courier Corporation. Cited by: §3.2.
 [25] (1996) Blind image deconvolution. IEEE signal processing magazine 13 (3), pp. 43–64. Cited by: §1.
 [26] (2007) Blind motion deblurring using image statistics. In Advances in Neural Information Processing Systems, pp. 841–848. Cited by: §2.
 [27] (2009) Classification via group sparsity promoting regularization. Cited by: §1, §2.
 [28] (2016) Image denoising using very deep fully convolutional encoderdecoder networks with symmetric skip connections. CoRR abs/1603.09056. External Links: Link, 1603.09056 Cited by: §2, Figure 5.
 [29] (2007) Adding noise to improve noise robustness in speech recognition. In Eighth Annual Conference of the International Speech Communication Association, Cited by: §4.2, §5.1.
 [30] (2001) Digital signal processing: principles algorithms and applications. Pearson Education India. Cited by: §2.
 [31] (1992) Blur identification by the method of generalized crossvalidation. IEEE Transactions on Image Processing 1 (3), pp. 301–311. Cited by: §2.
 [32] (1972) Bayesianbased iterative method of image restoration. JOSA 62 (1), pp. 55–59. Cited by: §2.
 [33] (2018) Imaging through turbulence. CRC press. Cited by: §3.1.
 [34] (2005) Fields of experts: a framework for learning image priors. In IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 2, pp. 860–867. Cited by: §5.3.
 [35] (1988) Real analysis. Vol. 32, Macmillan New York. Cited by: footnote 2.
 [36] (1992) Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60 (14), pp. 259–268. Cited by: §2, §4.
 [37] (1990) Survey of recent developments in digital image restoration. Optical Engineering 29 (5), pp. 393–405. Cited by: §1.
 [38] (2008) Highquality motion deblurring from a single image. In Acm transactions on graphics (tog), Vol. 27, pp. 73. Cited by: §1, §2, §2, §4.1, §4.1, §6.
 [39] (2003) Euler’s elastica and curvaturebased inpainting. SIAM journal on Applied Mathematics 63 (2), pp. 564–592. Cited by: §4.
 [40] (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §1, §5.2.
 [41] (2016) Texture networks: feedforward synthesis of textures and stylized images.. In ICML, Vol. 1, pp. 4. Cited by: §6.
 [42] (2018) Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454. Cited by: §1, §2, §3.2, §5.1.
 [43] (2017) An iterative method for image deblurring based on total variation and compressed sensing. Bachelor's Thesis, School of Mathematical Sciences, Fudan University, 220 Handan Rd., Yangpu District, Shanghai, China. Cited by: §1, §2, §2, §4.1, §5.1, §6.
 [44] (2007) What makes a good model of natural images?. In CVPR, Cited by: §5.3.
 [45] (1949) Extrapolation, interpolation and smoothing of stationary time series, with engineering applications. MIT Press. Cited by: §2.
 [46] (2014) Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems, pp. 1790–1798. Cited by: §1, §2.
 [47] (1997) Identification of blur parameters from motion blurred images. Graphical models and image processing 59 (5), pp. 310–320. Cited by: §3.1.
 [48] (2008) Progressive interscale and intrascale nonblind image deconvolution. In Acm Transactions on Graphics (TOG), Vol. 27, pp. 74. Cited by: §2.
 [49] (2017) Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3929–3938. Cited by: §2.