Deep Class Aware Denoising

Tal Remez 1
   Or Litany 1
   Raja Giryes 1
   Alex M. Bronstein 2
1 School of Electrical Engineering, Tel-Aviv University, Israel
2 Computer Science Department, Technion - IIT, Israel

The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categorized into a small set of semantic classes. However simple, this observation has not been exploited in image denoising until now. In this paper, we demonstrate how the reconstruction quality improves when a denoiser is aware of the type of content in the image. To this end, we first propose a new fully convolutional deep neural network architecture which is simple yet powerful as it achieves state-of-the-art performance even without being class-aware. We further show that a significant boost in performance of up to dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.

Figure 1 panels (left to right): ground truth image, noisy image, denoised by TNRD [13], denoised by our method.
Figure 1: Perceptual comparison of class-aware and standard denoising. Our proposed face-specific denoiser produces a visually pleasant result and avoids artifacts commonly introduced by general-purpose denoisers. The reader is encouraged to zoom in for a better view of the artifacts.

1 Introduction

The ubiquitous use of mobile phone cameras in the recent decade has set a very high demand on the image quality these devices are expected to produce. On the other hand, the never-ending pursuit of more pixels at smaller form factors puts stringent constraints on the amount of light each pixel is exposed to and results in noisier images. This puts an increasing weight on computational post-processing techniques, in particular on image denoising.

Many image acquisition artifacts such as low-light noise and camera shake [16] can be compensated by image enhancement techniques. Denoising in the presence of additive white Gaussian noise is one of the key problems studied in this context. While realistic low-light imaging is largely dominated by Poisson-distributed shot noise, there exist various techniques that allow accurate treatment of non-Gaussian noise sources with a Gaussian denoiser [39, 40, 47, 57]. Moreover, it has been shown in [11, 15, 46, 57, 63] that a good Gaussian denoising algorithm makes it possible to efficiently solve many other image processing problems such as deblurring, inpainting, compression postprocessing and more, without compromising the reconstruction quality and without the need to design a new strategy for each new setting. In view of these results, it is evident that a good Gaussian denoiser sets the foundation for solving a variety of image reconstruction and enhancement problems.
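One widely used way of treating Poisson-distributed data with a Gaussian denoiser is variance stabilization, for instance the Anscombe transformation studied in [39]. The following is a minimal sketch of that idea and is not code from this paper: `gaussian_denoiser` is a placeholder for any Gaussian denoiser, and the simple algebraic inverse is used here as an assumption in place of the exact unbiased inverse discussed in [39].

```python
import numpy as np

def anscombe(x):
    """Map Poisson counts to data with approximately unit-variance Gaussian noise."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=np.float64) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse (biased at low counts; [39] derives an exact unbiased one)."""
    return (np.asarray(y, dtype=np.float64) / 2.0) ** 2 - 3.0 / 8.0

def denoise_poisson(counts, gaussian_denoiser):
    """Stabilize the variance, run a Gaussian denoiser (noise std ~ 1), map back."""
    return inverse_anscombe(gaussian_denoiser(anscombe(counts)))
```

After the forward transform the noise standard deviation is close to one regardless of the local intensity, which is what allows a denoiser trained for a single Gaussian noise level to be reused.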

Numerous methods have been proposed for removing Gaussian noise from images, including K-SVD [2], non-local means [9], BM3D [14], non-local K-SVD [38], field of experts (FoE) [52], Gaussian mixture models (GMM) [65], non-local Bayes [33], nonlocally centralized sparse representation (NCSR) [19] and simultaneous sparse coding combined with Gaussian scale mixture (SSC-GSM) [18]. These techniques have been designed based on some properties of natural images such as the recurrence of patches at different locations or their sparsity in a certain dictionary.

In the past few years, the state-of-the-art in image denoising has been achieved by techniques based on artificial neural networks [10, 13, 62]. Neural networks (NNs) are essentially concatenations of basic units (layers), each comprising a linear operation followed by a simple non-linearity, resulting in an intricate highly non-linear response. Currently, they are among the most popular and powerful tools in machine learning [6, 17, 23, 34, 51]. NN-based approaches have led to state-of-the-art results in numerous fields, including computer vision (e.g., image classification [25, 32], video classification [29], object detection [60], face recognition [53], and handwriting word recognition [44]), speech recognition [50], natural language processing [5, 26, 56, 59], artificial intelligence (e.g., playing video games [42] or beating the world Go champion [55], which is considered a very prominent milestone in the AI community), medical imaging [24], image processing (e.g., image deconvolution [54], inpainting [43] and super-resolution [7, 35, 30]), and more [34].

The first neural network to achieve state-of-the-art performance in image denoising was proposed in [10]. It is based on a fully connected architecture and therefore requires more training examples, as well as much more memory and arithmetic complexity at inference, than more recent solutions: the network of [62], which is based on a deep Gaussian Conditional Random Field (DGCRF) model, and the model-based Trainable Nonlinear Reaction Diffusion (TNRD) network introduced in [13].


One of the main elements we find to be missing in current denoising techniques (and image enhancement strategies in general) is the awareness of the class of images being processed. Such an approach is much needed as the objects typically photographed by phone camera users belong to a limited number of semantic classes. In this paper, we demonstrate that it is possible to do better image enhancement when the algorithm is class-aware.

We demonstrate this claim on the Gaussian denoising task, for which we propose a novel convolutional neural network (CNN)-based architecture that obtains performance higher than or comparable to the state-of-the-art. The advantage of our architecture is its simple design and the ease of adaptation to new data. We fine-tune a pretrained network on several popular image classes and demonstrate a further significant improvement in performance compared to the class-agnostic baseline.

In light of the high performance achieved by modern image classification schemes, the proposed technique may be used to improve the image quality in mobile phone cameras. To substantiate this claim, we show that state-of-the-art image classification networks are resilient to the presence of even large amounts of noise.

2 Class Aware Denoising

The current theory of patch-based image denoising sets a bound on the achievable performance [12, 36, 37]. In fact, since existing methods have practically converged to that bound, one may be tempted to deem futile the ongoing pursuit of better performance. As it turns out, two possibilities to break this barrier still exist. The first is to use larger patches. This proved useful in [10], where the use of patches made it possible to outperform BM3D [14], which had held the record for many years. A second “loophole” which allows a further improvement in denoising performance is to use a better image prior, such as narrowing down the space of images to a more specific class. These two possibilities are not mutually exclusive, and indeed we exploit both. First, as detailed in the sequel, our network has a receptive field of size , which is bigger than the existing practice, while the convolutional architecture keeps the network from becoming prohibitively large. Second, we fine-tune our denoiser to best fit a particular class. The class information can be provided manually by the user, for example when choosing face denoising for cleaning a personal photo collection, or automatically, by applying one of the many existing powerful classification algorithms.

The idea of combining classification with reconstruction was previously proposed in [4], whose authors dubbed it recogstruction. In their work, the authors set a bound on super-resolution performance and showed it can be broken when a face prior is used. Several other studies have shown that it is beneficial to design a strategy for a specific class. For example, in [8] it has been shown that the design of a compression algorithm dedicated to faces improves over generic techniques targeting general images. Specifically for the class of faces, several face hallucination methods have been developed [64], including face super-resolution and face sketch-photo synthesis techniques. In [28], the authors showed that given a collection of photos of the same person it is possible to obtain a more faithful reconstruction of the face from a blurry image. In [27, 66] class labeling at the pixel level is used for the colorization of gray-scale images. In [3], the subspaces attenuated by blur kernels for specific classes are learned, thus improving the deblurring performance.

Building on the success demonstrated in the aforementioned body of work, in this paper, we propose to use semantic classes as a prior and build class-aware denoisers. Different from previous methods, our model is made class-aware via training and not by design, hence it may be automatically extended to any type and number of classes. While in this paper we focus on Gaussian denoising, our methodology can be easily extended to much broader class-aware image enhancement, rendering it applicable to many low-level computer vision tasks.
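The class-aware pipeline described in this section can be sketched as a simple dispatch step. This is only an illustration under stated assumptions: `classifier`, `denoisers` and `fallback` are hypothetical callables standing in for an off-the-shelf image classifier, the class-specific fine-tuned networks, and the class-agnostic network, respectively.

```python
def class_aware_denoise(noisy, classifier, denoisers, fallback):
    """Route a noisy image to the denoiser fine-tuned for its predicted class.

    classifier: callable returning a class label for an image (hypothetical).
    denoisers:  dict mapping class label -> denoising callable (hypothetical).
    fallback:   class-agnostic denoiser used when no class-specific one exists.
    """
    label = classifier(noisy)
    return denoisers.get(label, fallback)(noisy)
```

Because class awareness enters only through the choice of fine-tuned weights, adding a new class amounts to adding one more entry to `denoisers`.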

3 DenoiseNet

Our network performs additive Gaussian image denoising in a fully convolutional manner. It receives a noisy grayscale image as the input and produces an estimate of the original clean image. The network architecture is shown in Figure 2. The layers at the top row of the diagram calculate features using convolutions of size , stride , and ReLU non-linearities. The layers at the bottom of the diagram can be viewed as negative noise components, since their sum cancels out the noise; they are calculated using a single-channel convolution of size with stride . In all experiments we used networks with layers implemented in TensorFlow [1] and trained them for mini-batches on a Titan-X GPU with a set of images from the PASCAL VOC dataset [20]. We used mini-batches of patches of size . Images were converted to YCbCr and the Y channel was used as the input grayscale image after being scaled and shifted to the range of . During training, image patches were randomly cropped and flipped about the vertical axis. To avoid convolution artifacts at the borders of the patches caused by the receptive field of pixels in the deepest layer, we used an loss on the central part, cropping the outer pixels during training time, and padded the image symmetrically during test time by . Training was done using the ADAM optimizer [31] with a learning rate of , , and . Code and pretrained models will be made available.
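The data handling described above (random cropping and flipping, a loss restricted to the central part of each patch, and symmetric padding at test time) can be sketched as follows. This is a NumPy illustration rather than the authors' TensorFlow code; the patch size and the border margin are left as parameters, and the squared-error loss is an assumption, since the text does not spell these out.

```python
import numpy as np

def random_patch(image, size, rng):
    """Randomly crop a size x size patch and flip it about the vertical axis half the time."""
    h, w = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = image[top:top + size, left:left + size]
    return patch[:, ::-1] if rng.random() < 0.5 else patch

def central_loss(pred, target, margin):
    """Squared error (assumed loss) on the central region, ignoring `margin` border pixels."""
    if margin > 0:
        pred = pred[margin:-margin, margin:-margin]
        target = target[margin:-margin, margin:-margin]
    return float(np.mean((pred - target) ** 2))

def denoise_padded(noisy, denoiser, margin):
    """Pad symmetrically at test time, denoise, then crop back to the original size."""
    padded = np.pad(noisy, margin, mode="symmetric")
    out = denoiser(padded)
    return out[margin:-margin, margin:-margin] if margin > 0 else out
```

Excluding the border from the loss keeps the network from being penalized on pixels whose receptive field extends beyond the patch, and symmetric padding plays the same role at inference.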

Figure 2: DenoiseNet fully convolutional architecture. All convolutions are of size and stride . The resulting feature sizes of each convolution are listed as . The bottom row of outputs can be viewed as negative noise components, as their sum cancels out the noise.
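The structure in Figure 2, where every layer contributes a single-channel "negative noise" estimate and the estimates are summed and added to the noisy input, can be sketched as a forward pass in NumPy. This is a simplified illustration, not the released implementation: the kernel shapes, the zero padding, and the use of separate kernels for the feature and noise branches of each layer are assumptions.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' correlation. x: (H, W, Cin), w: (k, k, Cin, Cout), k odd, zero padding."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def denoisenet_forward(noisy, feature_weights, noise_weights):
    """Return the denoised image and the list of per-layer noise estimates.

    feature_weights[l]: kernel producing the features of layer l (followed by ReLU).
    noise_weights[l]:   kernel mapping the input of layer l to a single-channel
                        negative noise estimate.
    """
    x = noisy[..., None]                          # (H, W, 1)
    estimates = []
    for wf, wn in zip(feature_weights, noise_weights):
        estimates.append(conv2d_same(x, wn)[..., 0])
        x = np.maximum(conv2d_same(x, wf), 0.0)   # ReLU features for the next layer
    return noisy + sum(estimates), estimates
```

Keeping the per-layer estimates around is what makes it possible to inspect how much of the noise each depth removes.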

3.1 Simplicity vs capacity

The choice of network architecture was motivated by the trade-off between simplicity and capacity. To best illustrate the concept that class awareness may improve image enhancement algorithms, it was important to incorporate the class via the data, instead of explicitly manipulating the network architecture. This requires an as-simple-as-possible design. A rather straightforward choice would have been the fully connected architecture proposed by Burger et al. [10]; however, the very large number of parameters this network uses renders it impractical for many applications. Alternatively, a very lightweight architecture was proposed by Chen and Pock [13]; however, their model was specifically tailored to their task and, thus, one should be extremely cautious about generalizing any concept demonstrated on it. These two somewhat conflicting paradigms led us to design a new architecture which is both relatively lightweight and extremely simple to understand and implement. In terms of capacity, we have two orders of magnitude fewer parameters than the NN proposed by Burger et al. [10], but only one order of magnitude more than that introduced by Chen and Pock [13]. Note that the reduction in the number of parameters does not decrease the receptive field as our model is much deeper.
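The remark that depth preserves the receptive field while keeping the parameter count low can be made concrete with two standard formulas. The helpers below are a sketch; any kernel sizes and channel counts passed to them are illustrative values chosen by the reader and are not the configuration reported in this paper.

```python
def receptive_field(num_layers, kernel_size):
    """Receptive field (pixels per side) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def conv_params(kernel_size, c_in, c_out):
    """Weights plus biases of one convolutional layer."""
    return kernel_size * kernel_size * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out
```

For stride-1 convolutions the receptive field grows linearly with depth, while the number of parameters of a convolutional layer is independent of the image or patch size. A fully connected layer, in contrast, scales with the product of its input and output sizes.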

4 Classification in the presence of noise

The tacit assumption of our class-aware approach is the ability to determine the class of the noisy input image. While the goal of this research is not to improve image classification, we argue that the performance of modern CNN-based classification algorithms such as Inception [60, 61] or ResNet [25] is relatively resilient to a moderate amount of noise. In addition, since we are interested in canonical semantic classes such as faces and pets, which are far coarser than the ImageNet classes [49], the task becomes even easier: confusing two breeds of cats is not considered an error.

The aforementioned networks can be further fine-tuned using noisy examples to increase their resilience to noise. Alternatively, one could simply run a class-agnostic denoiser on the image before plugging it into the classification network. To illustrate the noise resilience property we ran the pre-trained Inception-v3 [61] network on a few tens of images from the pets class. We then gradually added noise to these images and counted the number of images on which the classifier changed its most confident class to a different class, as visualized in Figure 3. Observe that the network classification remains stable even in the presence of a large amount of noise.

Figure 3: Noise resilience of image classification. The percentage of images on which a pre-trained Inception-v3 classifier remains stable exceeds even in the presence of a large amount of noise.
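The stability measurement behind Figure 3 can be sketched as follows. The function is a hedged illustration: `classify` is a placeholder for a pre-trained network such as Inception-v3 that returns the most confident class, and the noise levels are supplied by the caller.

```python
import numpy as np

def fraction_stable(images, classify, sigmas, rng):
    """For each noise level, the fraction of images whose top class is unchanged.

    classify: callable mapping an image to a class label (placeholder for a
              pre-trained classification network).
    """
    clean_labels = [classify(img) for img in images]
    result = []
    for sigma in sigmas:
        same = 0
        for img, label in zip(images, clean_labels):
            noisy = img + rng.normal(0.0, sigma, size=img.shape)
            same += int(classify(noisy) == label)
        result.append(same / len(images))
    return result
```

Comparing against the prediction on the clean image, rather than against a ground-truth label, isolates the effect of the noise from the classifier's baseline accuracy.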

5 Experiments

In all experiments in this section our network was trained on images from the PASCAL VOC [20] dataset and was compared to BM3D [14], multilayer perceptrons (MLP) [10] and TNRD [13] on the following three test sets: (i) images from PASCAL VOC [21]; (ii) a denoising dataset with quantized images from [62]; and (iii) test images chosen by [48] from the Berkeley segmentation dataset [41].

5.1 Class-agnostic denoising

PASCAL VOC.

In this experiment we tested the denoising algorithms on test images from the PASCAL VOC dataset [21]. We believe this large and diverse set of images is representative enough to draw conclusions about the denoising performance. Table 1 summarizes performance in terms of average PSNR for all test images contaminated by white Gaussian noise with and up to . It is evident that our method outperforms all other methods for all noise levels by over dB.

MLP [10]
TNRD [13]
DenoiseNet 34.87 32.79 30.36 28.88 27.32 26.30 25.74
Table 1: Performance on PASCAL VOC. Average PSNR values on a image test set. Our method outperforms all other methods for all noise levels.
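For reference, the evaluation protocol used throughout this section (contaminating a clean image with white Gaussian noise and measuring PSNR) can be sketched as below. The peak value of 255 is an assumption corresponding to 8-bit intensities; a different intensity range requires a different `peak`.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Contaminate an image with additive white Gaussian noise of standard deviation sigma."""
    return image + rng.normal(0.0, sigma, size=image.shape)

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximal pixel value."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

The PSNR of the noisy input itself, `psnr(clean, add_gaussian_noise(clean, sigma, rng))`, gives the baseline against which any denoiser's gain can be judged.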

To examine the statistical significance of the improvement our method achieves, in Figure 4 we compare the gain in performance with respect to BM3D achieved by our method, MLP and TNRD. Image indices are sorted in ascending order of performance gain. A smaller zero-crossing value affirms our method outperforms BM3D on a larger portion of the dataset than the competitors. The plot visualizes the large and consistent improvement in PSNR achieved by DenoiseNet. A summary of the number of images on which each algorithm performed the best is presented in Figure 5.

Figure 4: Comparison of performance profile relative to BM3D. Image indices are sorted in ascending order of performance gain relative to BM3D. The improvement of our method over two competing algorithms is demonstrated by (i) a noticeable decrease of the zero-crossing point, and (ii) consistently higher values of gain over BM3D. The distribution reveals the statistical significance of the reported improvement. The comparison was made on images from PASCAL VOC.
Figure 5: Top performance distribution on PASCAL VOC test set. Percentage of images on which a denoising algorithm performed the best. Our method wins on of the images, whereas MLP, BM3D, and TNRD win on and respectively.
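The statistics shown in Figures 4 and 5 can be computed from per-image PSNR values as sketched below. The function names and array layouts are assumptions made for this illustration.

```python
import numpy as np

def gain_profile(psnr_method, psnr_baseline):
    """Per-image PSNR gain over the baseline, sorted in ascending order, and the
    zero-crossing index (the number of images with non-positive gain)."""
    gain = np.sort(np.asarray(psnr_method) - np.asarray(psnr_baseline))
    zero_crossing = int(np.searchsorted(gain, 0.0, side="right"))
    return gain, zero_crossing

def win_percentages(psnr_table):
    """psnr_table: (num_images, num_methods). Percentage of images each method wins."""
    psnr_table = np.asarray(psnr_table)
    winners = np.argmax(psnr_table, axis=1)
    counts = np.bincount(winners, minlength=psnr_table.shape[1])
    return 100.0 * counts / psnr_table.shape[0]
```

A smaller zero-crossing index means the method beats the baseline on a larger share of the images, which is how the profiles in Figure 4 are read.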

Berkeley segmentation dataset.

In this experiment we tested the performance of our method, trained on PASCAL VOC, on a set of images selected by [48] from the Berkeley segmentation dataset [41]. Even though these test images belong to a different dataset, Table 2 shows that our method outperforms previous methods for all sigma values.

MLP [10]
TNRD [13]
DenoiseNet 33.58 31.44 29.04 27.56 26.06 25.12 24.61
Table 2: Performance on images from Berkeley segmentation dataset. Average PSNR values on a test set of images selected by [48]. Our method outperforms all others for all noise levels.

Quantized noise.

Even though our network has not been explicitly trained to treat quantized noisy images, we evaluated its performance on such images from [62]. Results are reported in Table 3. The set contains test images from the Berkeley segmentation dataset and additional images from the PASCAL VOC 2012 [22] dataset. All images have been quantized to bits in the range . For the noise level of our method outperforms previous methods but fails to do so for .

MLP [10] 25.20
TNRD [13]
DenoiseNet 28.71
Table 3: Performance on quantized test images from [62]. Images have been clipped to a range of and quantized to bits. PSNR values for two different noise levels are reported.
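The clipping and quantization applied to the noisy test images can be sketched as follows. The bit depth and intensity range are parameters of this illustration; the defaults shown are assumptions and not values taken from the text.

```python
import numpy as np

def clip_and_quantize(image, bits, lo=0.0, hi=1.0):
    """Clip to [lo, hi] and round to 2**bits uniformly spaced levels."""
    levels = 2 ** bits - 1
    clipped = np.clip(image, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo
```

Clipping and rounding make the effective noise signal-dependent and non-Gaussian, which is why a network trained only on unquantized Gaussian noise is not guaranteed to perform well in this setting.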

5.2 Class-aware denoising

This experiment evaluates the boost in performance resulting from fine-tuning a denoiser on a set of images belonging to a particular class. In order to do so we collected images from ImageNet [49] of the following six classes: face, pet, flower, beach, living room, and street. The images per class were split into train (), validation () and test () sets. We then trained a separate class-aware denoiser for each of the six classes. This was done by fine-tuning our class-agnostic model, which had been trained on PASCAL VOC, using the images from ImageNet. The performance of the class-aware denoisers was compared to that of their class-agnostic counterpart as well as to other denoising methods. Average PSNR values summarized in Figure 6 demonstrate that our class-aware models outperform our class-agnostic network, BM3D [14], multilayer perceptrons (MLP) [10] and TNRD [13]. Notice how class-awareness boosts performance by up to .

Figure 6: Class-aware denoising performance on ImageNet. Average PSNR values for different methods on images belonging to six different semantic classes. It is evident that the class-specific fine-tuned models outperform all other methods. In addition, being class-aware yields a gain of up to dB PSNR compared to our class-agnostic network.

5.3 Cross-class denoising

To further demonstrate the effect of refining a denoiser to a particular class, we tested each class-specific denoiser on images belonging to other classes. The outcome of this mismatch is evident both qualitatively and quantitatively. The top row of Figure 7 presents a comparison of class-aware denoisers fine-tuned to the street and face image classes applied to a noisy image of a face. The denoiser tuned to the street class produces noticeable artifacts around the eye, cheek and hair areas. Moreover, the edges appear too sharp and seem to favor horizontal and vertical orientations. This is not very surprising as street images contain mainly man-made rectangular structures. In the second row, strong artifacts appear on the hamster’s fur when the image is processed by the living room-specific denoiser. The pet-specific denoiser, on the other hand, produces a much more natural-looking result. Additional examples demonstrating artifacts caused by the mismatch on the canonical images House and Lena are presented in the bottom two rows. Notice how the street-specific denoiser reconstructs sharp boundaries of the building whereas the face-specific counterpart smears them.

To quantify the effect of mismatching we evaluated the percentage of wins of every fine-tuned denoiser on each type of image class. A win means that a particular denoiser produced the highest PSNR among all the others. A confusion matrix for all combinations of class-specific denoisers and image classes is presented in Figure 8. We conclude that applying a denoiser of the same class as the image results in the best performance.
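The win-percentage confusion matrix described above can be computed from per-image PSNR values as sketched below. The array layout and function name are assumptions of this illustration.

```python
import numpy as np

def win_matrix(psnr, image_class, num_classes):
    """psnr: (num_images, num_denoisers) PSNR of every class-specific denoiser on every image.
    image_class: (num_images,) integer class index of each image.
    Entry (i, j) is the fraction of class-i images on which denoiser j attains the highest PSNR."""
    psnr = np.asarray(psnr)
    image_class = np.asarray(image_class)
    winners = np.argmax(psnr, axis=1)
    matrix = np.zeros((num_classes, psnr.shape[1]))
    for i in range(num_classes):
        mask = image_class == i
        if mask.any():
            matrix[i] = np.bincount(winners[mask], minlength=psnr.shape[1]) / mask.sum()
    return matrix
```

A strongly diagonal matrix indicates that each denoiser performs best on the class it was fine-tuned for.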

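Figure 7 panels. Columns (left to right): ground truth, noisy image, correct denoiser, wrong denoiser. Rows (correct denoiser / wrong denoiser): face-specific / street-specific; pet-specific / living room-specific; street-specific / face-specific; face-specific / street-specific.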

Figure 7: Cross-class denoising. Representative outputs of DenoiseNet denoisers fine-tuned to the class of the input image (third column from left), and to a mismatched class (rightmost column). The reader is encouraged to zoom in for a better view of the artifacts.
Figure 8: Denoiser performance per semantic class. Each row represents a specific semantic class of images while class-aware denoisers are represented as columns. The (i, j)-th element in the confusion matrix shows the probability of the j-th class-aware denoiser to outperform all other denoisers on the i-th class of images.
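Figure 9 panels. Top row: noisy input followed by the intermediate results after layers 5, 10, 15 and 20 (output). Bottom row: ground truth followed by the noise estimates of layers 5, 10, 15 and 20.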

Figure 9: Gradual denoising process by flower-specific DenoiseNet. The top row presents the noisy image (left) and the intermediate result obtained by removing the noise estimated up to the respective layer depth. The second row presents the ground truth image (left) and the noise estimates produced by individual layers; the noise images have been scaled for display purposes. We encourage the reader to zoom-in onto the images to best view the fine details and noise.

5.4 Network noise estimation

This section presents a few examples that we believe give insights about the noise estimation of our class-aware networks. The overall noise estimation of the network is the sum of the estimates produced by all individual layers. These are presented in the bottom row of Figure 9. Interestingly, they differ significantly from one another. The shallow layer estimations appear to handle local noise while the deeper ones seem to focus on object contours. In the top row, we present the input image after it has been denoised by all layers up to a specific depth. To further examine what is happening "under the hood" of our class-aware denoisers, in Figure 10 we show the error after 5, 10 and 20 layers (rows ). Surprisingly, even though it has not been explicitly enforced at training, the error monotonically decreases with the layer depth (see plots in row ). This non-trivial behavior is consistently produced by the network on all test images. Lastly, to visualize which of the layers was the most dominant in the denoising process, we assign a different color to each layer and color each pixel according to the layer in which its value changed the most. The resulting image is shown in the bottom row of Figure 10. It can be observed that the first few layers govern the majority of the pixels while the following ones mainly focus on recovering and enhancing the edges and textures that might have been degraded by the first layers.
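The per-layer analysis described in this section can be sketched as follows, given the list of noise estimates produced by the individual layers. This is an illustration under the assumption that each layer's estimate is available as a separate image; the function name is hypothetical.

```python
import numpy as np

def gradual_denoising_stats(noisy, clean, layer_estimates):
    """layer_estimates: list of per-layer (negative) noise estimates, each of shape (H, W).

    Returns the intermediate images after each layer, the RMSE after each layer,
    and, per pixel, the index of the layer whose estimate changed it the most."""
    estimates = np.stack(layer_estimates)                    # (L, H, W)
    intermediate = noisy[None] + np.cumsum(estimates, axis=0)
    rmse = np.sqrt(np.mean((intermediate - clean[None]) ** 2, axis=(1, 2)))
    dominant_layer = np.argmax(np.abs(estimates), axis=0)    # (H, W)
    return intermediate, rmse, dominant_layer
```

Plotting `rmse` against the layer index shows whether the error decreases monotonically with depth, and `dominant_layer` can be rendered with one color per layer to obtain a map of the most influential layer at every pixel.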

6 Discussion

Given the state-of-the-art performance of our network, an important task is to interpret what it has learned and how the action of DenoiseNet relates to the principles governing the previous manually designed state-of-the-art denoising algorithms. One such principle that has been shown to improve denoising in recent years is gradual denoising, namely that iteratively removing small portions of the noise is preferable to removing it all at once [45, 58, 67]. Interestingly, as can be seen in Figure 9, our network exhibits such behavior despite the fact that it has not been trained explicitly to have a monotonically decreasing error throughout the layers. Each layer in the network removes part of the noise in the image: the flat regions are denoised mainly in the first layers, while the edges are denoised in the last ones. This may be explained by the fact that the deeper layers correspond to a larger receptive field and therefore may better recover global patterns such as edges, which may be indistinguishable from noise if viewed only in the context of a small patch.

In a certain sense, the present research demonstrates that in some cases the whole is smaller than the sum of its parts. That is, splitting the input images into several categories and then building a fine-tuned filter for each is preferable to a universal filter. That said, the decision to split according to a semantic class was made due to the immediate availability of off-the-shelf classifiers and their resilience to noise. Yet, this splitting scheme may very well be sub-optimal. Other choices for data partitioning could be made. In particular, a classifier could be learned automatically, e.g., by incorporating the splitting scheme into a network architecture and training it end-to-end. In such cases, the partitioning would lose its simple interpretation as semantic classes, and would instead yield some abstract classes. We defer this interesting direction to future research.

Figure 10 panels (top to bottom): ground truth; noisy input; denoised image; error after 5 layers; error after 10 layers; error after 20 layers (output); RMSE at different layers; layer contributing the most to each pixel.

Figure 10: Gradual denoising process. Images are best viewed electronically; the reader is encouraged to zoom in for a better view. Please refer to Section 5.4 for more details.


  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • [2] M. Aharon, M. Elad, and A. Bruckstein. K-svd: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process., 54(11):4311–4322, Nov 2006.
  • [3] S. Anwar, C. P. Huynh, and F. Porikli. Class-specific image deblurring. In IEEE International Conference on Computer Vision (ICCV), pages 495–503, Dec. 2015.
  • [4] S. Baker and T. Kanade. Limits on super-resolution and how to break them. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1167–1183, 2002.
  • [5] J. Bellegarda and C. Monz. State of the art in statistical methods for language and speech processing. Computer Speech and Language, 35:163–184, Jan. 2016.
  • [6] Y. Bengio. Learning deep architectures for ai. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
  • [7] J. Bruna, P. Sprechmann, and Y. LeCun. Super-resolution with deep convolutional sufficient statistics. In ICLR, 2016.
  • [8] O. Bryt and M. Elad. Compression of facial images using the K-SVD algorithm. Journal of Visual Communication and Image Representation, 19(4):270 – 282, 2008.
  • [9] A. Buades, B. Coll, and J. Morel. A non-local algorithm for image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
  • [10] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with bm3d? In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2392–2399. IEEE, 2012.
  • [11] S. Chan, X. Wang, and O. Elgendy. Plug-and-play admm for image restoration: Fixed point convergence and applications. ArXiv, abs/1605.01710, 2016.
  • [12] P. Chatterjee and P. Milanfar. Is denoising dead? IEEE Trans. Image Process., 19(4):895–911, 2010.
  • [13] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
  • [14] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Trans. Image Process., 16(8):2080–2095, 2007.
  • [15] Y. Dar, A. M. Bruckstein, M. Elad, and R. Giryes. Postprocessing of compressed images via sequential denoising. IEEE Trans. Imag. Proc., 25(7):3044–3058, 2016.
  • [16] M. Delbracio and G. Sapiro. Burst deblurring: Removing camera shake through fourier burst accumulation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [17] L. Deng and D. Yu. Deep learning: Methods and applications. Foundations and Trends in Signal Processing, 7(3-4):197–387, 2014.
  • [18] W. Dong, G. Shi, Y. Ma, and X. Li. Image restoration via simultaneous sparse coding: Where structured sparsity meets gaussian scale mixture. International Journal of Computer Vision (IJCV), 114(2):217–232, Sep.
  • [19] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process., 22(4):1620–1630, April 2013.
  • [20] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
  • [21] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results.
  • [22] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.
  • [23] I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. Book in preparation for MIT Press, 2016.
  • [24] H. Greenspan, B. van Ginneken, and R. M. Summers. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 35(5):1153–1159, May 2016.
  • [25] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [26] J. Hirschberg and C. D. Manning. Advances in natural language processing. Science, 349(6245):261–266, 2015.
  • [27] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Let there be color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. In SIGGRAPH, 2016.
  • [28] N. Joshi, W. Matusik, E. H. Adelson, and D. J. Kriegman. Personal photo enhancement using example images. ACM Trans. Graph., 29(2):12:1–12:15, Apr. 2010.
  • [29] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
  • [30] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
  • [31] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • [32] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.
  • [33] M. Lebrun, A. Buades, and J. M. Morel. A nonlocal bayesian image denoising algorithm. SIAM Journal on Imaging Sciences, 6(3):1665–1688, 2013.
  • [34] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, May 2015.
  • [35] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. arXiv abs/1609.04802, 2016.
  • [36] A. Levin and B. Nadler. Natural image denoising: Optimality and inherent bounds. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2833–2840. IEEE, 2011.
  • [37] A. Levin, B. Nadler, F. Durand, and W. Freeman. Patch complexity, finite pixel correlations and optimal denoising. In ECCV, 2012.
  • [38] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV, pages 2272–2279, 2009.
  • [39] M. Makitalo and A. Foi. Optimal inversion of the Anscombe transformation in low-count Poisson image denoising. IEEE Trans. on Image Proces., 20(1):99–109, Jan. 2011.
  • [40] M. Makitalo and A. Foi. Noise parameter mismatch in variance stabilization, with an application to poisson-gaussian noise estimation. IEEE Trans. on Image Proces., 23(12):5348–5359, Jan. 2014.
  • [41] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, volume 2, pages 416–423, July 2001.
  • [42] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529–533, Feb. 2015.
  • [43] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016.
  • [44] A. Poznanski and L. Wolf. Cnn-n-gram for handwriting word recognition. In CVPR, 2016.
  • [45] Y. Romano and M. Elad. Boosting of image denoising algorithms. SIAM Journal on Imaging Sciences, 8(2):1187–1219, 2015.
  • [46] Y. Romano, M. Elad, and P. Milanfar. The little engine that could: Regularization by denoising (RED). arXiv:1611.02862, 2016.
  • [47] A. Rond, R. Giryes, and M. Elad. Poisson inverse problems by the plug-and-play scheme. Journal of Visual Communication and Image Representation, 2016.
  • [48] S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
  • [49] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. Int. Journal of Computer Vision, 115(3):211–252, 2015.
  • [50] H. Sak, A. Senior, K. Rao, and F. Beaufays. Fast and accurate recurrent neural network acoustic models for speech recognition. In INTERSPEECH, 2015.
  • [51] J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
  • [52] U. Schmidt, Q. Gao, and S. Roth. A generative perspective on mrfs in low-level vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  • [53] F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In CVPR, 2015.
  • [54] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf. A machine learning approach for non-blind image deconvolution. In CVPR, 2013.
  • [55] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529:484–489, 2016.
  • [56] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 2013.
  • [57] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman. Plug-and-play priors for bright field electron tomography and sparse interpolation. IEEE Transactions on Computational Imaging, 2(4):408–423, Dec 2016.
  • [58] J. Sulam and M. Elad. Expected patch log likelihood with a sparse prior. In Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), Hong-Kong, 2015.
  • [59] I. Sutskever, O. Vinyals, and Q. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
  • [60] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  • [61] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. arXiv abs/1512.00567, 2015.
  • [62] R. Vemulapalli, O. Tuzel, and M.-Y. Liu. Deep gaussian conditional random field network: A model-based deep network for discriminative denoising. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • [63] S. Venkatakrishnan, C. Bouman, and B. Wohlberg. Plug-and-play priors for model based reconstruction. In GlobalSIP, 2013.
  • [64] N. Wang, D. Tao, X. Gao, X. Li, and J. Li. A comprehensive survey to face hallucination. International Journal of Computer Vision, 106(1):9–30, 2014.
  • [65] G. Yu, G. Sapiro, and S. Mallat. Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity. IEEE Trans. Image Process., 21(5):2481–2499, May 2012.
  • [66] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, 2016.
  • [67] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, 2011.

Denoising Examples

The following figures show two examples for each of the six semantic classes used in the paper. For each example, our class-aware denoiser is compared to BM3D [14], MLP [10], and TNRD [13], alongside the ground truth and noisy images. Although in most cases the difference between the methods is visible in full view, we encourage the reader to zoom in to fully appreciate fine image details.

  [Figures: twelve denoising examples, two for each class in the order pet, face, flower, street, living room, beach. Each figure has six panels: ground truth, noisy image, BM3D, MLP, TNRD, and the class-specific DenoiseNet (e.g., Pet-specific DenoiseNet).]
