Model-blind Video Denoising Via Frame-to-frame Training
Abstract
Modeling the processing chain that has produced a video is a difficult reverse engineering task, even when the camera is available. This makes model-based video processing a still more complex task. In this paper we propose a fully blind video denoising method, with two versions: offline and online. This is achieved by fine-tuning a pre-trained AWGN denoising network to the video with a novel frame-to-frame training strategy. Our denoiser can be used without knowledge of the origin of the video or burst, or of the post-processing steps applied after the camera sensor. The online process only requires a couple of frames before achieving visually pleasing results for a wide range of perturbations. It nonetheless reaches state-of-the-art performance for standard Gaussian noise, and can be used offline with still better performance.
1 Introduction
Denoising is a fundamental image and video processing problem. While the performance of denoising methods and imaging sensors has steadily improved over decades of research, new challenges have also appeared. High-end cameras still acquire noisy images in low lighting conditions. High-speed video cameras use short exposure times, reducing the SNR of the captured frames. Cheaper, lower quality sensors are used extensively, for example in mobile phones or surveillance cameras, and require denoising even with good scene illumination.
A plethora of approaches have been proposed for image and video denoising: PDE and variational methods [36, 7], bilateral filters [41], domain transform methods [31, 33], non-local patch-based methods [3]. In the last decade, most research focused on modeling image patches [51, 45], [15] or groups of similar patches [13, 27, 22, 17], [5]. Recently the focus has shifted towards neural networks.
The first neural network with results competitive with patch-based methods was introduced in [5], and consisted of a fully connected network trained to denoise image patches. More recently, [47] proposed DnCNN, a deep CNN with 17 to 20 convolutional layers with $3 \times 3$ filters, and reported a significant improvement over the state of the art. The authors also trained a blind denoising network that can denoise an image with an unknown noise level, and a multi-task network that can blindly handle three types of noise. A lighter version of DnCNN, FFDNet, was proposed in [49]; it allows a spatially variant noise variance by adding the noise variance map as an additional input. The architectures of DnCNN and FFDNet keep the image size throughout the network. Other networks have been proposed [30, 37, 8] that use pooling and up-convolutional layers in a U-shaped architecture [35]. Other works proposed neural networks with an architecture obtained by unrolling optimization algorithms such as those used for MAP inference with MRF probabilistic models [2, 38, 11, 43]. For textures formed by repetitive patterns, non-local patch-based methods still perform better than "local" CNNs. To remedy this, some attempts have been made to include non-local patch similarity in a CNN framework [34, 11, 24, 44, 12].
The most widely adopted assumption in the literature is that of additive white Gaussian noise (AWGN). This is justified by the fact that the noise generated by the photon count process at the imaging sensor can be modeled as Poisson noise, which in turn can be approximated by AWGN after a variance stabilizing transform (VST) [1, 29, 28]. However, in many practical applications the data available is not the raw data straight from the sensor. The camera output is the result of a processing pipeline, which can include quantization, demosaicking, gamma correction, compression, etc. The noise at the end of the pipeline is spatially correlated and signal dependent, and it is difficult to model. Furthermore, the details of the processes undergone by an image or video are usually unknown. To make things even more difficult, a large number of images and videos are generated by mobile phone applications which apply their own processing to the data (for example compression, or filters and effects selected by the user). The specifics of this processing are unknown, and might change with different releases.
The literature addressing this case is much more limited. The works [23, 16] address the denoising of noisy compressed images. RF3D [26] handles correlated noise in infrared videos. Data-driven approaches provide an interesting alternative when modeling is challenging. CNNs have been applied successfully to denoise images with non-Gaussian noise [47, 9, 18]. In applications in which the noise type is unknown, one could use model-blind networks such as DnCNN-3 [48], trained to denoise several types of noise, or the blind denoiser of [18]. These however have two important limitations. First, the performance of such model-blind networks drops with respect to model-specific networks [48]. Second, training the network requires a dataset of images corrupted with each type of noise that we wish to remove (or the ability to generate it synthetically [18]). Generating ground truth data for real photographs is not straightforward [32, 9]. Furthermore, on many occasions we do not have access to the camera, and a single image or a video is all that we have.
In this work we show that, for certain kinds of noise, in the context of video denoising one video is enough: a network can be trained from a single noisy video by considering the video itself as a dataset. Our approach is inspired by two works: the one-shot object segmentation method [6] and the noise-to-noise training proposed in the context of denoising by [25].
The aim of one-shot learning is to train a classifier network to classify a new class with only a very limited amount of labeled examples. Recently Caelles et al. [6] suggested a one-shot framework for object segmentation in video, where an object is manually segmented on the first frame and the objective is to segment it in the rest of the frames. Their main contribution is the use of a pre-trained classification network, which is fine-tuned to a manual segmentation of the first frame. This fine-tuned network is then able to segment the object in the rest of the frames. This generalizes the one-shot principle from classification to other types of problems. Borrowing the concept from [6], our work can be interpreted as a one-shot blind video denoising method: a network can denoise an unseen noise type by fine-tuning it to a single video. In our case however, we do not require "labels" (i.e. the ground truth images without noise). Instead, we benefit from the noise-to-noise training proposed by [25]: a denoising network can be trained by penalizing the loss between the output predicted from a noisy image and a second noisy version of the same image, with an independent realization of the noise. We benefit from the temporal redundancy of videos and use the noise-to-noise training between adjacent frames to fine-tune a pre-trained denoising network. That is, the network is trained by minimizing the error between the predicted frame and the past (or future) frame. The noise used to pre-train the network can be very different from the type of noise in the video.
In Section 2 we present the tools necessary to derive our refined model, namely DnCNN [48], one of the state-of-the-art denoising networks, and a training principle for denoising called noise2noise [25]. We present our truly blind denoising principle in Section 3. We compare the quality of our blind denoiser to the state of the art in Section 4. Finally we conclude and open new perspectives for this type of denoising in Section 5.
2 Preliminaries
The proposed model-blind denoiser builds upon DnCNN and the noise-to-noise training. In this section we provide a brief review of these works, plus some other related work.
2.1 DnCNN
DnCNN [48] was the first neural network to report a significant improvement over patch-based methods such as BM3D [13] and WNNM [17]. It has a simple architecture inspired by the VGG network [39], consisting of 17 convolutional layers. The first layer consists of 64 $3 \times 3$ convolutions followed by ReLU activations and outputs 64 feature maps. The next 15 layers also compute 64 $3 \times 3$ convolutions, followed by batch normalization [19] and ReLU. The output layer is simply a $3 \times 3$ convolutional layer.
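To make the layer count concrete, the following minimal sketch lists the 17 layers and computes the receptive field of the stack. It assumes $3 \times 3$ kernels with stride 1 (as in the DnCNN paper) and a single-channel (grayscale) input; the helper names `dncnn_layers` and `receptive_field` are ours and purely illustrative.

```python
def dncnn_layers(depth=17, width=64, kernel=3, in_channels=1):
    """Describe the stack as (name, in_channels, out_channels, kernel_size) tuples."""
    layers = [("conv+relu", in_channels, width, kernel)]
    layers += [("conv+bn+relu", width, width, kernel)] * (depth - 2)
    layers.append(("conv", width, in_channels, kernel))
    return layers


def receptive_field(layers):
    # Each stride-1 convolution with a k x k kernel widens the field by k - 1.
    return 1 + sum(k - 1 for _, _, _, k in layers)


layers = dncnn_layers()
print(len(layers), receptive_field(layers))  # prints: 17 35
```

Under these assumptions each output pixel depends on a $35 \times 35$ neighborhood of the input, which is one way to see why such a network is "local" compared with non-local patch-based methods.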
To improve training, in addition to the batch normalization layers, DnCNN uses residual learning, which means that the network is trained to predict the noise in the input image instead of the clean image. The intuition behind this is that if the mapping from the noisy input $f$ to the clean target $u$ is close to the identity function, then it is easier for the network to learn the residual mapping, $f \mapsto f - u$.
DnCNN provides state-of-the-art image denoising for Gaussian noise with a rather simple architecture. For this reason we will use it for all our experiments.
2.2 Noise-to-noise training
The usual approach for training a neural network for denoising (or other image restoration problems) is to synthesize a degraded image $f_i$ from a clean one $u_i$ according to a noise model. Training is then achieved by minimizing the empirical risk, which penalizes the loss between the network prediction $\mathcal{F}_\theta(f_i)$ and the clean target $u_i$. This method cannot be applied in many practical cases where the noise model is not known. In these settings, noise cannot be synthetically added to a clean image. One can generate noisy data by acquiring it (for example by taking pictures with a camera), but the corresponding clean targets are unknown, or are hard to acquire [10, 32].
Lehtinen et al. [25] recently pointed out that for certain types of noise it is possible to train a denoising network from pairs of noisy images corresponding to the same clean underlying data and independent noise realizations, thus eliminating the need for clean data. This makes it possible to learn networks for noise that cannot be easily modeled (an appropriate choice of the loss is still necessary, though, so that the network converges to a good denoiser).
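Assume that the pairs $(f, u)$ of noisy and clean images are distributed according to $p(f, u) = p(u \mid f)\,p(f)$. For a dataset of infinite size, the empirical risk of an estimator $\mathcal{F}$ converges to the Bayesian risk, i.e. the expected loss: $\mathcal{R}(\mathcal{F}) = \mathbb{E}_{f,u}\{\ell(\mathcal{F}(f), u)\}$. The optimal estimator depends on the choice of the loss $\ell$. From Bayesian estimation theory [20] we know that:
$$\ell = L_2 \;\Rightarrow\; \mathcal{F}(f) = \mathbb{E}\{u \mid f\} \tag{1}$$
$$\ell = L_1 \;\Rightarrow\; \mathcal{F}(f) = \operatorname{median}\{u \mid f\} \tag{2}$$
$$\ell = L_0 \;\Rightarrow\; \mathcal{F}(f) \approx \operatorname{mode}\{u \mid f\} \tag{3}$$
(The median and mode are taken element-wise. For a continuous random variable the $L_0$ loss is defined as a limit. See [20] and [25].)
Here $\mathbb{E}\{u \mid f\}$ denotes the expectation of the posterior distribution $p(u \mid f)$ given the noisy observation $f$. During training with the $L_2$ loss, the network learns to approximate the mapping $f \mapsto \mathbb{E}\{u \mid f\}$.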
The key observation leading to noise-to-noise training is that the same optimal estimators apply when the loss is computed between $\mathcal{F}(f)$, the prediction from the noisy input $f$, and $g$, a second noisy version of the clean image $u$. In this case we obtain the mean, median and mode of the posterior $p(g \mid f)$. Then, for example, if the noise is such that $\mathbb{E}\{g \mid f\} = \mathbb{E}\{u \mid f\}$, the network can be trained by minimizing the MSE loss between $\mathcal{F}(f)$ and a second noisy observation $g$. If the median (resp. the mode) is preserved by the noise, then the $L_1$ (resp. the $L_0$) loss can be used.
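The following toy experiment illustrates this observation numerically. It is only a sketch under simplifying assumptions: the "network" is reduced to a constant estimator $c$, the clean signal is a single value, and the noise is zero-mean Gaussian (so both the mean and the median are preserved). The values 0.5, 0.1 and the grid of candidates are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
u = 0.5                                    # clean value, never shown to the estimator
g = u + 0.1 * rng.standard_normal(5001)    # noisy targets with zero-mean, symmetric noise

candidates = np.linspace(0.0, 1.0, 401)    # constant estimators c to try
residuals = candidates[:, None] - g[None, :]
l2_risk = (residuals ** 2).mean(axis=1)    # empirical L2 risk against noisy targets
l1_risk = np.abs(residuals).mean(axis=1)   # empirical L1 risk against noisy targets

c_l2 = candidates[np.argmin(l2_risk)]      # lands next to the sample mean of g
c_l1 = candidates[np.argmin(l1_risk)]      # lands next to the sample median of g
print(c_l2, g.mean())
print(c_l1, np.median(g))
```

Although only noisy targets are used, both minimizers end up close to the clean value, because this noise preserves the mean and the median.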
3 Model-blind video denoising
In this section we show how one can use a pre-trained denoising network learned for an arbitrary noise and fine-tune it to other target noise types using a single video sequence, attaining the same performance as a network trained specifically for the target noise. This fine-tuning can be done offline (using the whole video as a dataset) or online, i.e. frame-by-frame, depending on the application and the computational resources at hand.
Our approach is inspired by the one-shot video object segmentation approach of [6], where a classification network is fine-tuned using the manually segmented first frame, and then applied to the other frames. As opposed to the segmentation problem, we do not assume that we have a ground truth (clean frames). Instead, we adapt the noise-to-noise training to a single video.
We need pairs of independent noisy observations of the same underlying clean image. For that we take advantage of the temporal redundancy in videos: we consider consecutive frames as observations of the same underlying clean signal transformed by the motion in the scene. To account for the motion we need to estimate it and warp one frame to the other. We estimate the motion using an optical flow. We use the TV-L1 optical flow [46] with an implementation available in [40]. This method is reasonably fast and is quite robust to noise when the flow is computed at a coarser scale.
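Let us denote by $v_t$ the optical flow from frame $f_t$ to frame $f_{t-1}$. The warped $f_{t-1}$ is then $f^w_{t-1}(x) = f_{t-1}(x + v_t(x))$ (we use bicubic interpolation). Similarly, we define the warped clean frame $u^w_{t-1}$. We assume
(i) that the warped clean frame $u^w_{t-1}$ matches $u_t$, i.e. $u^w_{t-1}(x) \approx u_t(x)$, and
(ii) that the noise of consecutive frames is independent.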
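A minimal sketch of the warping step is given below. It samples the previous frame at the position displaced by the flow. To stay dependency-free it uses bilinear interpolation, which is a simplification with respect to the bicubic interpolation mentioned above. The function name `warp`, the `(dx, dy)` channel order of the flow and the border clamping are assumptions of this sketch.

```python
import numpy as np


def warp(frame, flow):
    """Sample `frame` at x + flow(x) with bilinear interpolation.

    frame: (H, W) array. flow: (H, W, 2) array holding (dx, dy) for every pixel.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * frame[y0, x0] + ax * frame[y0, x1]
    bottom = (1 - ax) * frame[y1, x0] + ax * frame[y1, x1]
    return (1 - ay) * top + ay * bottom


frame = np.arange(20, dtype=np.float64).reshape(4, 5)
shift = np.zeros((4, 5, 2))
shift[..., 0] = 1.0                       # every pixel looks one column to the right
warped = warp(frame, shift)
```

With a zero flow the frame is returned unchanged, and with the unit horizontal flow above each pixel takes the value of its right neighbor (the last column is clamped).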
Occluded pixels in the backward flow from $t$ to $t-1$ do not have a correspondence in frame $t-1$. Nevertheless, the optical flow assigns them a value. We use a simple occlusion detector to eliminate these false correspondences from our loss. A simple way to detect occlusions is to determine regions where the divergence of the optical flow is large [4]. We therefore define a binary occlusion mask as
$$\kappa_t(x) = \begin{cases} 0 & \text{if } |\operatorname{div} v_t(x)| > \tau, \\ 1 & \text{otherwise,} \end{cases} \tag{4}$$
where $v_t$ is the optical flow from frame $t$ to frame $t-1$ and $\tau$ is a threshold. Pixels with an optical flow that points out of the image domain are considered occluded. In practice, we compute a more conservative occlusion mask by dilating the result of Eq. (4).
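The occlusion detector can be sketched as follows. The threshold value `tau=0.5`, the finite-difference divergence computed with `np.gradient`, the $3 \times 3$ dilation window and the function name `occlusion_mask` are all assumptions made for illustration; the text above does not specify them.

```python
import numpy as np


def occlusion_mask(flow, tau=0.5, dilate=1):
    """Return 1.0 for pixels kept in the loss and 0.0 for pixels treated as occluded.

    flow: (H, W, 2) array holding (dx, dy) for every pixel.
    """
    h, w = flow.shape[:2]
    # Divergence of the flow field: d(vx)/dx + d(vy)/dy.
    div = np.gradient(flow[..., 0], axis=1) + np.gradient(flow[..., 1], axis=0)
    occluded = np.abs(div) > tau
    # Pixels whose flow points outside the image domain are also occluded.
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = xs + flow[..., 0], ys + flow[..., 1]
    occluded |= (x < 0) | (x > w - 1) | (y < 0) | (y > h - 1)
    # Conservative mask: dilate the occluded set with a 3x3 window, `dilate` times.
    for _ in range(dilate):
        padded = np.pad(occluded, 1, mode="constant", constant_values=False)
        occluded = np.zeros_like(occluded)
        for dy in range(3):
            for dx in range(3):
                occluded |= padded[dy:dy + h, dx:dx + w]
    return (~occluded).astype(np.float64)


flow = np.zeros((8, 8, 2))
flow[:, 4:, 0] = 3.0                      # motion discontinuity between columns 3 and 4
mask = occlusion_mask(flow)
```

In this example the columns around the discontinuity and those whose flow leaves the image are masked out, while the two leftmost columns are kept.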
We then compute the loss masking out occluded pixels. For example, for the $L_1$ loss between two images $f$ and $g$ with mask $\kappa$ we have:
$$\ell_1(f, g, \kappa) = \sum_x \kappa(x)\,|f(x) - g(x)| \tag{5}$$
Similarly one can define masked versions of other losses. For all the experiments shown we used the masked $L_1$ loss since it has better training properties than the $L_2$ loss (as has been demonstrated in [50]). In the noise-to-noise setting, the choice of the loss depends on the properties of the noise [25]. All the noise types considered in this work preserve the median of the posterior distribution, which justifies the use of an $L_1$ loss.
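A masked $L_1$ loss of this kind takes only a few lines. The sketch below sums absolute differences over the pixels where the mask is 1; the function name `masked_l1` is ours, and the unnormalized sum is an assumption (dividing by the number of valid pixels is an equally reasonable variant).

```python
import numpy as np


def masked_l1(pred, target, mask):
    """Sum of absolute differences over the pixels where mask == 1."""
    return float(np.sum(mask * np.abs(pred - target)))


pred = np.ones((2, 2))
target = np.zeros((2, 2))
mask = np.array([[1.0, 0.0], [1.0, 1.0]])   # the top-right pixel is occluded
print(masked_l1(pred, target, mask))        # prints: 3.0
```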
We now have pairs of images $(f_t, f^w_{t-1})$, i.e. a frame and the warped previous frame, and the corresponding occlusion masks $\kappa_t$, and we apply the noise-to-noise principle to fine-tune the network on this dataset. In order to increase the number of training samples, the symmetric warping can also be done, i.e. warping $f_{t+1}$ to $f_t$ using the forward optical flow from $f_t$ to $f_{t+1}$. This doubles the amount of data used for the fine-tuning. We consider two settings: offline and online training.
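Offline fine-tuning.
We denote the network as a parametrized function $\mathcal{F}_\theta$, where $\theta$ is the parameter vector. In the offline setting we fine-tune the network parameters $\theta$ by doing a fixed number $N$ of steps of the minimization of the masked $L_1$ loss over all $T$ frames in the video:
$$\theta^{\text{ft}} = \operatorname{optim}^N_\theta\Big[\sum_{t=2}^{T} \ell_1\big(\mathcal{F}_\theta(f_t), f^w_{t-1}, \kappa_t\big);\ \theta_0\Big] \tag{6}$$
where $f^w_{t-1}$ is the previous frame warped to frame $t$, $\kappa_t$ is the occlusion mask, and by $\operatorname{optim}^N_\theta[E; \theta_0]$ we denote an operator which does $N$ optimization steps of function $E$ starting from $\theta_0$ and following a given optimization algorithm (for instance gradient descent, Adam [21], etc.). The initial condition $\theta_0$ for the optimization is the parameter vector of the pre-trained network. The fine-tuned network is then applied to the rest of the video.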
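The structure of the offline procedure can be illustrated with a deliberately tiny stand-in for the network. In the sketch below the "denoiser" is a hypothetical two-parameter model, $\mathcal{F}_\theta(f) = \theta_0 f + \theta_1\,\mathrm{blur}(f)$, the optimizer is plain subgradient descent rather than Adam, and the scene is static so that warping is the identity. None of these choices come from the experiments; they only keep the example short and runnable.

```python
import numpy as np


def box_blur(f):
    """3x3 box filter with edge replication."""
    h, w = f.shape
    p = np.pad(f, 1, mode="edge")
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def features(f):
    # Toy "network": F_theta(f) = theta[0] * f + theta[1] * box_blur(f).
    return np.stack([f, box_blur(f)])


def loss_and_grad(theta, pairs):
    """Masked L1 loss summed over all (frame, warped previous frame, mask) triplets."""
    loss, grad = 0.0, np.zeros_like(theta)
    for f, g, k in pairs:
        phi = features(f)
        r = np.tensordot(theta, phi, axes=1) - g
        loss += np.sum(k * np.abs(r))
        grad += np.sum(k * np.sign(r) * phi, axis=(1, 2))
    return loss, grad


def optim(theta0, pairs, n_steps, lr):
    """Run n_steps of subgradient descent on the whole video, starting from theta0."""
    theta = theta0.copy()
    n_valid = sum(k.sum() for _, _, k in pairs)
    for _ in range(n_steps):
        _, grad = loss_and_grad(theta, pairs)
        theta -= lr * grad / n_valid
    return theta


rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
frames = [clean + 0.1 * rng.standard_normal(clean.shape) for _ in range(6)]
mask = np.ones_like(clean)
pairs = [(frames[t], frames[t - 1], mask) for t in range(1, len(frames))]

theta0 = np.array([1.0, 0.0])             # "pre-trained" parameters: identity mapping
theta_ft = optim(theta0, pairs, n_steps=20, lr=0.05)
```

The important point is that `pairs` contains only noisy frames: the clean image is never used by `optim`.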
Online fine-tuning.
In the online setting we train the network in a frame-by-frame fashion. As a consequence we denoise each frame with a different parameter vector $\theta_t$. At frame $t$ we compute $\theta_t$ by doing $N$ optimization steps corresponding to the minimization of the loss between frames $t$ and $t-1$:
$$\theta_t = \operatorname{optim}^N_\theta\Big[\ell_1\big(\mathcal{F}_\theta(f_t), f^w_{t-1}, \kappa_t\big);\ \theta_{t-1}\Big] \tag{7}$$
The initial condition for this iteration is given by the fine-tuned parameter vector at the previous frame, $\theta_{t-1}$. The first frame is denoised using the pre-trained network. The fine-tuning starts at the second frame. A reasonable concern is that the network overfits the given realization of the noise and the frame at each step. This is indeed the case if we use a large number of optimization iterations at a single frame. A similar behavior is reported in [42], which trains a network to minimize the loss on a single data point. We prevent this from happening by using a small number of iterations $N$. We have observed that the parameters fine-tuned at frame $t$ can be applied to denoise any other frame without any significant drop in performance.
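The online loop can be sketched in the same spirit. Here the "network" is reduced to a single hypothetical gain parameter, $\mathcal{F}_\theta(f) = \theta f$, updated with a few subgradient steps per frame; this is not a real denoiser and only shows how the parameters are carried from one frame to the next. The function name `online_finetune`, the static scene and the step size are assumptions of this sketch.

```python
import numpy as np


def online_finetune(frames, warped_prev, masks, theta0, n_steps, lr):
    """Denoise frame t with theta_t, obtained from theta_{t-1} by n_steps updates
    on the single pair (frame t, warped frame t-1)."""
    theta = theta0
    thetas = [theta0]
    outputs = [theta0 * frames[0]]         # first frame: pre-trained parameters
    for t in range(1, len(frames)):
        f, g, k = frames[t], warped_prev[t], masks[t]
        for _ in range(n_steps):
            r = theta * f - g
            # Subgradient of the masked L1 loss with respect to the gain.
            grad = np.sum(k * np.sign(r) * f) / max(k.sum(), 1.0)
            theta = theta - lr * grad
        thetas.append(theta)
        outputs.append(theta * f)
    return thetas, outputs


rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.2, 1.0, 16), (16, 1))
frames = [clean + 0.1 * rng.standard_normal(clean.shape) for _ in range(5)]
warped_prev = [None] + frames[:-1]         # static scene: warping is the identity
masks = [None] + [np.ones_like(clean)] * (len(frames) - 1)

thetas, outputs = online_finetune(frames, warped_prev, masks,
                                  theta0=1.0, n_steps=5, lr=0.01)
```

Keeping `n_steps` small mirrors the precaution described above against overfitting a single noisy frame.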
4 Experiments
In this section we demonstrate the flexibility of the proposed fine-tuning blind denoising approach with several experimental results. For all these experiments the starting point for the fine-tuning process is a DnCNN network trained for additive white Gaussian noise of standard deviation $\sigma = 25$. In all cases we use the same hyperparameters for the fine-tuning: a fixed learning rate and a fixed number $N$ of iterations of the Adam optimizer. For the offline case we use the entire video. The videos used in this section come from Derf's database (https://media.xiph.org/video/derf/). They have been converted to grayscale by averaging the three color channels and downscaled by a factor of two in each direction to ensure that they contain little to no noise. The code and data to reproduce the results presented in this section are available at https://github.com/tehret/blinddenoising.
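The preprocessing of the test videos can be sketched as follows. The grayscale conversion by channel averaging follows the description above; the downscaling filter is not specified, so averaging over $2 \times 2$ blocks is an assumption of this sketch, as is the function name `to_gray_half`.

```python
import numpy as np


def to_gray_half(rgb):
    """Average the three color channels, then average over 2x2 blocks."""
    gray = rgb.astype(np.float64).mean(axis=-1)
    h, w = gray.shape
    gray = gray[: h - h % 2, : w - w % 2]   # drop an odd last row or column
    return gray.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


rgb = np.zeros((5, 6, 3))
rgb[..., 0] = 3.0                           # red = 3, green = blue = 0 -> gray = 1
small = to_gray_half(rgb)
print(small.shape)                          # prints: (2, 3)
```

Averaging four pixels of independent noise divides its standard deviation by two, which is consistent with the stated goal of obtaining nearly noise-free reference frames.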
To the best of our knowledge there is no other blind video denoising method in the literature. We will compare with state-of-the-art methods on different types of noise. Most methods have been crafted (or trained) for a specific noise model and often a specific noise level. We will also compare with an image denoising method proposed by Lebrun et al. [23] which assumes a Gaussian noise model with variance depending on the intensity and the local frequency of the image. This model was proposed for the denoising of compressed noisy images. We cannot compare with some more recent blind denoising methods, such as [10], because there is no code available. We will compare with DnCNN [48] and VBM3D [14]. VBM3D is a video denoising algorithm. All the other methods are image denoising methods applied frame-by-frame (perspectives for videos are mentioned in Section 5).
The first experiment checks that our fine-tuning does not deteriorate a well-trained network (for example by overfitting). We applied the proposed learning process to a sequence contaminated with AWGN with standard deviation $\sigma = 25$, which is precisely the type of noise the network was trained on. The per-frame PSNR is presented in Figure 2. The offline fine-tuning performs on par with the pre-trained network. The PSNR of the online process has a higher variance, with significant drops for some frames.
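For reference, the per-frame PSNR used throughout this section can be computed as in the sketch below. It assumes images in the range [0, 1] (so the peak value is 1); the function name `psnr` is ours.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)             # constant error of 0.1 -> MSE = 0.01
value = psnr(reference, estimate)           # 10 * log10(1 / 0.01) = 20 dB
```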
In Figure 3 we show the results obtained still with Gaussian noise, but with $\sigma = 50$. The main point of this experiment is to be able to compare with a reference, namely a DnCNN network trained with $\sigma = 50$. We can see that both fine-tuned networks perform better than the network pre-trained for $\sigma = 25$; in fact their performance is as good as that of the DnCNN network trained specifically for $\sigma = 50$ (the offline-trained network actually performs slightly better than the reference network). Our process also outperforms the "noise clinic" of [23].
We have also tested the proposed fine-tuning on other types of noise. Figure 4 shows the results for multiplicative Gaussian noise:
$$f = u + r\,u,$$
where $u$ is the clean image and $r$ is a white Gaussian noise (the images are within the range [0,1]). With this model, the variance depends on the pixel intensity $u$. Results with correlated Gaussian noise (obtained by convolving an additive white Gaussian noise with a disk kernel) are shown in Figure 5. We also show results (Figure 6) with the salt and pepper uniform noise used in [25], obtained by replacing, with a fixed probability, the value of a pixel with a value sampled uniformly in the image range [0,1]. Finally we show in Figure 7 results for JPEG compressed Gaussian noise, obtained by compressing with JPEG an image corrupted by AWGN. The last one is particularly interesting because it is a realistic use case for which the noise model is hard to estimate. While in this case the noise can be generated synthetically for training a network over a dataset, this is not possible with other compression tools (for example for proprietary technologies). We can see the effectiveness of the fine-tuning in all examples. The offline training is more stable (smaller variance) and gives slightly better results, although the difference is small.
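Three of these noise types are easy to synthesize, as sketched below. The parameter values are left to the caller because this sketch does not rely on the specific levels used in the experiments. The multiplicative form `u + sigma * r * u`, the circular boundary handling, the normalization of the disk kernel, and the convention that `sigma` is the standard deviation of the white noise before filtering are all assumptions of this sketch. JPEG compression is omitted since it requires an image codec.

```python
import numpy as np


def multiplicative_gaussian(u, sigma, rng):
    """Signal-dependent noise: the standard deviation at a pixel is sigma * u."""
    return u + sigma * rng.standard_normal(u.shape) * u


def disk_offsets(radius):
    r = int(radius)
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]


def correlated_gaussian(u, sigma, radius, rng):
    """White Gaussian noise averaged over a disk (circular boundary), added to u."""
    white = sigma * rng.standard_normal(u.shape)
    offsets = disk_offsets(radius)
    smooth = sum(np.roll(white, (dy, dx), axis=(0, 1)) for dy, dx in offsets)
    return u + smooth / len(offsets)


def salt_pepper_uniform(u, p, rng):
    """With probability p, replace a pixel by a value drawn uniformly in [0, 1)."""
    replace = rng.random(u.shape) < p
    return np.where(replace, rng.random(u.shape), u)


rng = np.random.default_rng(0)
u = np.full((8, 8), 0.5)
noisy = [multiplicative_gaussian(u, 0.2, rng),
         correlated_gaussian(u, 0.2, 1, rng),
         salt_pepper_uniform(u, 0.25, rng)]
```

The salt and pepper uniform noise preserves the median of the pixel distribution as long as fewer than half of the pixels are replaced, whereas it shifts the mean toward 0.5.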
A visual comparison with other methods is shown in Figure 8 for JPEG compressed noise and in Figure 9 for AWGN. The result of the fine-tuned network has no visible artifacts and is visually pleasing even though the network has never seen this type of noise before the fine-tuning.
In Tables 1 and 2 we show the PSNR of the results obtained on 4 sequences for AWGN with $\sigma = 50$ (Table 1) and for JPEG compressed AWGN (Table 2). For the case of AWGN the fine-tuned networks attain the performance of the DnCNN trained for that specific noise. For JPEG compressed Gaussian noise, the fine-tuned networks are on average about 0.4 to 0.5 dB above the pre-trained network (33.06 and 33.15 dB against 32.62 dB).
Table 1: PSNR (in dB) for AWGN.
Method            pedestrian area  crowd run  touchdown pass  station  Average
DnCNN 25          28.06            28.07      28.05           28.04    28.06
DnCNN 50          32.81            30.51      33.23           32.07    32.16
Online fine-tuned 32.77            30.47      33.15           32.01    32.10
Batch fine-tuned  32.89            30.54      33.24           32.26    32.23
VBM3D             29.96            25.35      30.24           29.35    28.73
Noise Clinic      29.67            29.17      29.17           29.70    29.43

Table 2: PSNR (in dB) for JPEG compressed AWGN.
Method            pedestrian area  crowd run  touchdown pass  station  Average
DnCNN 25          33.60            30.76      33.46           32.65    32.62
Online fine-tuned 34.14            30.86      34.15           33.09    33.06
Batch fine-tuned  34.40            30.88      34.05           33.25    33.15
VBM3D             34.16            28.95      33.83           33.53    32.62
Noise Clinic      30.63            29.73      30.46           30.24    30.27
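As a quick check of the JPEG results, the average gains of the fine-tuned networks over the pre-trained DnCNN can be recomputed from the per-sequence values of the second table. The snippet below only redoes this arithmetic; the variable names are ours.

```python
# Per-sequence PSNR (dB): pedestrian area, crowd run, touchdown pass, station.
pretrained = [33.60, 30.76, 33.46, 32.65]   # DnCNN 25
online = [34.14, 30.86, 34.15, 33.09]       # online fine-tuned
batch = [34.40, 30.88, 34.05, 33.25]        # batch (offline) fine-tuned


def mean(values):
    return sum(values) / len(values)


gain_online = mean(online) - mean(pretrained)   # about 0.44 dB
gain_batch = mean(batch) - mean(pretrained)     # about 0.53 dB
print(f"{gain_online:.2f} dB, {gain_batch:.2f} dB")
```

The gain is not uniform across sequences: it is smallest on crowd run (about 0.1 dB) and largest on pedestrian area and touchdown pass.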
Figure 10 shows the impact of stopping the online fine-tuning at a frame $t_0$, and using the parameters $\theta_{t_0}$ to process the remaining frames. We can see that the more frames are used for the fine-tuning, the better the performance.
5 Discussion and perspectives
Denoising methods based on deep learning often require large datasets to achieve state-of-the-art performance. Lehtinen et al. [25] pointed out that in many cases the clean ground truth images are not necessary, thus simplifying the acquisition of the training datasets. With the framework presented in this paper we take a step further and show that a single video is often enough, removing the need for a dataset of images. By applying a simple frame-to-frame training on a generic pre-trained network (for example a DnCNN network trained for additive Gaussian noise with fixed standard deviation), we successfully denoise a wide range of different noise models even though the network has never seen the video nor the noise model before its fine-tuning. This opens the possibility to easily process data of any unknown origin.
We think that the current fine-tuning process can still be improved. First, given that the application is video denoising, it is expected that better results will be achieved by a video denoising network (the DnCNN network processes each frame independently of the others). Using the temporal information could improve the denoising quality, just as video denoising methods improve over frame-by-frame image denoising methods, and might also stabilize the variance of the result for the online fine-tuning.
References
 [1] F. J. Anscombe. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35(3/4):246–254, 1948.
 [2] A. Barbu. Training an active random field for real-time image denoising. IEEE Transactions on Image Processing, 18(11):2451–2462, Nov 2009.
 [3] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 60–65. IEEE, 2005.
 [4] A. Buades, J.-L. Lisani, and M. Miladinović. Patch-based video denoising with optical flow estimation. IEEE Transactions on Image Processing, 25(6):2573–2586, June 2016.
 [5] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399, June 2012.
 [6] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool. One-shot video object segmentation. In Computer Vision and Pattern Recognition (CVPR), 2017.
 [7] A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numerische Mathematik, 76(2):167–188, 1997.
 [8] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [9] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. arXiv preprint arXiv:1805.01934, 2018.
 [10] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denoising with generative adversarial network based noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3155–3164, 2018.
 [11] Y. Chen and T. Pock. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1256–1272, 6 2017.
 [12] C. Cruz, A. Foi, V. Katkovnik, and K. Egiazarian. Nonlocality-reinforced convolutional neural networks for image denoising. IEEE Signal Processing Letters, 25(8):1216–1220, Aug 2018.
 [13] K. Dabov and A. Foi. Image denoising with block-matching and 3D filtering. Electronic …, 6064:1–12, 2006.
 [14] K. Dabov, A. Foi, and K. Egiazarian. Video denoising by sparse 3D transformdomain collaborative filtering. In EUSIPCO, 2007.
 [15] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing, 15(12):3736–3745, 2006.
 [16] M. Gonzalez, J. Preciozzi, P. Muse, and A. Almansa. Joint denoising and decompression using CNN regularization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2018.
 [17] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
 [18] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. arXiv preprint arXiv:1807.04686, 2018.
 [19] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
 [20] S. Kay. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, 1993.
 [21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [22] M. Lebrun, A. Buades, and J.-M. Morel. A nonlocal Bayesian image denoising algorithm. SIAM Journal on Imaging Sciences, 2013.
 [23] M. Lebrun, M. Colom, and J.-M. Morel. The noise clinic: a blind image denoising algorithm. Image Processing On Line, 5:1–54, 2015.
 [24] S. Lefkimmiatis. Nonlocal color image denoising with convolutional neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5882–5891, July 2017.
 [25] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189, 2018.
 [26] M. Maggioni, E. Sánchez-Monge, and A. Foi. Joint removal of random and fixed-pattern noise through spatiotemporal video filtering. IEEE Transactions on Image Processing, 23(10):4282–4296, 2014.
 [27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In Computer Vision, 2009 IEEE 12th International Conference on, pages 2272–2279. IEEE, 2009.
 [28] M. Makitalo and A. Foi. A closed-form approximation of the exact unbiased inverse of the Anscombe variance-stabilizing transformation. IEEE Transactions on Image Processing, 20(9):2697–2698, 2011.
 [29] M. Makitalo and A. Foi. Optimal inversion of the Anscombe transformation in low-count Poisson image denoising. IEEE Transactions on Image Processing, 20(1):99–109, 2011.
 [30] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2802–2810. Curran Associates, Inc., 2016.
 [31] P. Moulin and J. Liu. Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. Information Theory, IEEE Transactions on, 45(3):909–919, Apr 1999.
 [32] T. Plotz and S. Roth. Benchmarking Denoising Algorithms with Real Photographs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2750–2759. IEEE, jul 2017.
 [33] J. Portilla, V. Strela, M. Wainwright, and E. Simoncelli. Image denoising using scale mixtures of gaussians in the wavelet domain. Image Processing, IEEE Transactions on, 12(11):1338–1351, Nov 2003.
 [34] P. Qiao, Y. Dou, W. Feng, R. Li, and Y. Chen. Learning nonlocal image diffusion for image denoising. In Proceedings of the 25th ACM International Conference on Multimedia, MM ’17, pages 1847–1855, New York, NY, USA, 2017. ACM.
 [35] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, pages 234–241, 2015.
 [36] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
 [37] V. Santhanam, V. I. Morariu, and L. S. Davis. Generalized deep image to image regression. CoRR, abs/1612.03268, 2016.
 [38] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 2774–2781, June 2014.
 [39] K. Simonyan and A. Zisserman. Very deep convolutional networks for largescale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 [40] J. Sánchez Pérez, E. Meinhardt-Llopis, and G. Facciolo. TV-L1 Optical Flow Estimation. Image Processing On Line, 2013.
 [41] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Computer Vision, 1998. Sixth International Conference on, pages 839–846. IEEE, 1998.
 [42] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 [43] R. Vemulapalli, O. Tuzel, and M. Liu. Deep Gaussian conditional random field network: A model-based deep network for discriminative denoising. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4801–4809, June 2016.
 [44] D. Yang and J. Sun. BM3D-Net: A convolutional neural network for transform-domain collaborative filtering. IEEE Signal Processing Letters, 25(1):55–59, Jan 2018.
 [45] G. Yu, G. Sapiro, and S. Mallat. Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity. Image Processing, IEEE Transactions on, 21(5):2481–2499, May 2012.
 [46] C. Zach, T. Pock, and H. Bischof. A duality based approach for real-time TV-L1 optical flow. In Joint Pattern Recognition Symposium. Springer, 2007.
 [47] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 7 2017.
 [48] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
 [49] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising. CoRR, abs/1710.0, 2017.
 [50] H. Zhao, O. Gallo, I. Frosio, and J. Kautz. Loss Functions for Image Restoration With Neural Networks. IEEE Transactions on Computational Imaging, 3(X):47–57, 3 2017.
 [51] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 479–486, Nov 2011.