Joint demosaicing and denoising by overfitting of bursts of raw images

Thibaud Ehret Axel Davy Pablo Arias Gabriele Facciolo
CMLA, ENS Cachan, CNRS
Université Paris-Saclay, 94235 Cachan, France
thibaud.ehret@ens-cachan.fr
Abstract

Demosaicking and denoising are the first steps of any camera image processing pipeline and are key to obtaining high quality RGB images. A promising current research trend aims at solving these two problems jointly using convolutional neural networks. Due to the unavailability of ground truth data, these networks cannot currently be trained using real RAW images and instead resort to simulated data. In this paper we present a method to learn demosaicking directly from mosaicked images, without requiring ground truth RGB data. We apply this to learn joint demosaicking and denoising only from RAW images, thus enabling the use of real data. In addition, we show that for this application overfitting a network to a specific burst improves the quality of restoration for both demosaicking and denoising.

Figure 1: Using a burst, our method (trained without any ground truth whatsoever) not only denoises well but also avoids artifacts such as zipper or moiré in the difficult regions.

1 Introduction

Most camera sensors capture a single color at each photoreceptor, determined by a color filter array (CFA) located on top of the sensor. The most commonly used CFA is the so-called Bayer pattern, consisting of a regular subsampling of each color channel. This means not only that each pixel of the resulting raw image contains one third of the necessary information, but also that the color channels are never sampled at the same positions. The problem of interpolating the missing colors is called demosaicking and is a challenging ill-posed inverse problem. To further complicate things, the captured data is contaminated with noise.
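To make the sampling pattern concrete, here is a minimal numpy sketch of the Bayer mosaicking operator, assuming an RGGB layout (the exact layout varies between cameras):

```python
import numpy as np

def bayer_mask(h, w):
    """Binary mask (h, w, 3) selecting, at each pixel, the single color
    measured by an RGGB Bayer color filter array."""
    mask = np.zeros((h, w, 3), dtype=np.float32)
    mask[0::2, 0::2, 0] = 1.0  # red at even rows, even columns
    mask[0::2, 1::2, 1] = 1.0  # green at even rows, odd columns
    mask[1::2, 0::2, 1] = 1.0  # green at odd rows, even columns
    mask[1::2, 1::2, 2] = 1.0  # blue at odd rows, odd columns
    return mask

def mosaic(rgb):
    """Mosaicking operator: zero out the two colors not observed at each pixel."""
    h, w, _ = rgb.shape
    return rgb * bayer_mask(h, w)
```

Note that two thirds of the entries of the resulting image are zero, which is exactly the information demosaicking must recover.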

For these reasons the first two steps of a camera processing pipeline are demosaicking and denoising. Traditionally, these problems have been treated separately, but this is suboptimal. Demosaicking a noisy RAW image first correlates the noise, making its subsequent denoising harder [31]. Alternatively, if denoising is applied on the mosaicked data, it becomes harder to exploit the cross-color correlations, which are useful for color image denoising [9, 10].

Until recently, state-of-the-art methods for joint denoising and demosaicking were based on carefully crafted heuristics, such as avoiding interpolation across image edges [20, 31, 2]. Other methods resort to variational principles where the heuristics are encoded as a prior model [7, 19].

Recent data-driven approaches have significantly outperformed traditional model-based methods [22, 14, 23, 24, 25, 33]. In [14], state-of-the-art results are reported with a network trained on a special dataset tailored to demosaicking in which hard cases are over-represented. In [24] an iterative neural network is proposed, later improved by [25], obtaining state-of-the-art performance on both real and synthetic datasets. These networks are relatively lightweight and do not need a lot of training data. The authors of [33] propose two networks for demosaicking; they train on several CFA patterns to compare performance and integrate the handling of denoising with a fine-tuning step. In [39] the authors find that the artefacts of challenging cases are better dealt with using the $\ell_1$ norm, or their proposed combination of the $\ell_1$ norm with MS-SSIM. Meanwhile, in [27] alternative metrics to PSNR are also considered.

The major difficulty in training data-driven demosaicking and denoising methods is obtaining realistic datasets of pairs of noisy RAW and ground truth RGB images. For this reason, demosaicking networks are trained with simulated data generated by mosaicking existing RGB images. However, simulated data follow statistics that can differ from real data. The RGB images used for training have already been processed by a full ISP (Image Signal Processor) pipeline, which includes demosaicking and denoising steps that leave their footprint on the output image. Additionally, the Poisson noise model is only an approximation of the real noise of a specific camera, and several factors can cause deviations. For example, the noise can have spatial variations due to temperature gradients in the sensor, or caused by vignetting or the electronic components in its surroundings.

The need for a specific treatment of realistic noise has been identified in the denoising literature. Indeed, most of the existing works target synthetic types of noise, e.g. Gaussian noise. Since the noise distribution is well defined, specific methods can be crafted [8, 28, 15] and data can be simulated with ground truth so as to train neural networks [37, 38]. However, it has been shown recently in [32] and [1] that networks trained on synthetic noise often fail to generalize to realistic types of noise. This has started a trend of studying "real noisy images". For example, [6, 16] acquire datasets where a low-noise reference image is created by using a longer exposure time. Creating this type of dataset is time-consuming and prone to bias: to avoid motion blur in the long exposure, the images need to be acquired with a tripod and the scene has to be static.

More recently, Lehtinen et al. [29] proposed a novel way of training a denoising network without ground truth, only from pairs of noisy images with independent noise realizations. This approach has been taken further by [26], which eliminates the need for the second noisy observation, albeit with a penalty in the quality of the obtained results. In the context of burst and video denoising, the frame-to-frame approach of [11] proposes to fine-tune a pre-trained Gaussian denoising network to other types of noise, requiring only a single video.

Contribution

In this paper we introduce a mosaic-to-mosaic (M2M) training strategy, analogous to the noise-to-noise [29] and frame-to-frame [11] frameworks, to be able to handle mosaicked RAW data. The trained network learns to interpolate two thirds of the image data without having ever seen a complete image. This allows us to train both demosaicking and joint denoising and demosaicking networks without requiring ground truth. The resulting networks attain state-of-the-art results, thus eliminating the need to simulate simplistic noise models or to capture time-consuming datasets with long-exposure reference frames. Although we show results only with a Bayer pattern, our method can equally be applied to other CFA patterns, such as the Fuji X-Trans. To the best of our knowledge, this is the first method that learns joint demosaicking and denoising without any ground truth whatsoever (the network has never seen an RGB image and has only seen noisy mosaicked images).

With the proposed framework, we can fine-tune a pre-trained network to a RAW burst. This allows leveraging the multi-frame burst data already available on many mobile camera phones. The fine-tuning not only adapts the network to the specificities of the camera noise, but it also overfits to the burst. We demonstrate that this overfitting, when controlled, can be beneficial. Additionally, when used with an $\ell_1$ loss, the fine-tuned network naturally handles noise clipping, a common but challenging problem [29, 40].

The rest of the paper is organized as follows. In Section 2 we present the proposed mosaic-to-mosaic training of a demosaicking network from a dataset of RAW mosaicked data without ground truth. In Section 3 we address the problem of joint denoising and demosaicking given a burst of RAW mosaicked noisy images. Results are shown in Section 4.

2 Learning demosaicking without ground truth

In this section, we propose a learning method to train demosaicking networks without any ground truth RGB images. Consider two different pictures $u_1$ and $u_2$ of the same scene. We shall use one image as partial ground truth to learn to demosaick the other (provided that there is a slight movement between the two, so that with high probability the mosaic patterns do not match).

Figure 2: Proposed pipeline to train for demosaicking without using any ground truth. The output after applying the network to the first image is warped using the transform $T$ and masked with $M$ so as to be compared to the second mosaicked image. The black corners seen at the last stage of the diagram indicate the undefined pixels after the transform, which are not considered by the loss.

Our method requires that the two pictures can be registered, which is possible when the viewpoints are not too different. This condition is typically met for bursts of images. Modern cameras systematically take bursts of images; these sequences make it possible to eliminate shutter lag, apply temporal noise reduction, and increase the dynamic range of the device. Nevertheless, the pair of pictures can also be acquired manually by taking two separate pictures of the same scene.

In the following, we suppose we have a set of pairs of images (for example extracted from bursts), where each pair $(u_i^1, u_i^2)$ consists of pictures of the same scene for which we have estimated a transformation $T_i$ that registers $u_i^1$ to $u_i^2$. In the case of bursts, estimating an affinity is often sufficient. Pairs without enough matching content can be discarded. The original mosaicked image can be obtained from its demosaicked version by masking pixels. Thus, if we apply a demosaicking network $f_\theta$ to $u_i^1$ and then apply the transformation $T_i$ followed by the mosaicking mask, we are supposed to recover $u_i^2$. We can therefore compute $M(T_i(f_\theta(u_i^1)))$, where $M$ represents the mosaicking operation (masking pixels), compute a distance to $u_i^2$, which acts as ground truth, and backpropagate the gradient to train $f_\theta$. In some sense, $u_i^2$ acts as a partial ground truth, as only one third of $T_i(f_\theta(u_i^1))$ gets compared to $u_i^2$. However, contrary to artificial RGB ground truths, we do not suffer from the bias introduced by the RGB processing pipeline, nor require complex setups to produce RGB ground truths. We implemented $T_i$ with bicubic interpolation, through which the gradient can be backpropagated easily. This results in the following loss:

$$\mathcal{L}(\theta) \;=\; \sum_i \big\| M\big(T_i(f_\theta(u_i^1))\big) - u_i^2 \big\|_p^p \qquad (1)$$

where $\theta$ denotes the parameters of the demosaicking network $f_\theta$. The norm is computed only at the pixels where both images are defined. In this section we use $p = 2$ (the $\ell_2$ norm). The method to train for demosaicking without ground truth data is depicted in Figure 2.
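As an illustration, below is a minimal PyTorch sketch of this loss. For simplicity it assumes a network mapping a 3-channel mosaicked image (zeros at unobserved positions) directly to an RGB image of the same size, and an affinity expressed in the normalized coordinates expected by affine_grid; the function name m2m_loss and these conventions are ours, not the paper's:

```python
import torch
import torch.nn.functional as F

def m2m_loss(net, u1, u2, mask2, theta, p=2):
    """Mosaic-to-mosaic loss of equation (1), simplified.
    u1, u2: (B, 3, H, W) mosaicked images, zeros at unobserved positions;
    mask2:  (B, 3, H, W) binary Bayer mask of u2 (the operator M);
    theta:  (B, 2, 3) affinity in the normalized coordinates of affine_grid."""
    out = net(u1)                                  # demosaicked estimate of u1
    grid = F.affine_grid(theta, out.size(), align_corners=False)
    warped = F.grid_sample(out, grid, mode='bicubic', align_corners=False)
    # pixels mapped from outside the image domain are undefined
    # (the black corners of Figure 2) and are excluded from the loss
    valid = F.grid_sample(torch.ones_like(out), grid, mode='nearest',
                          align_corners=False)
    diff = (mask2 * warped - u2) * valid
    return diff.abs().pow(p).sum() / valid.sum().clamp(min=1.0)
```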

Demosaicking network

To test the proposed training, we will use throughout the paper a network architecture heavily inspired by the one of Gharbi et al. [14], while incorporating improvements suggested in more recent work, namely batch normalization layers [21] and residual learning [18]. These techniques are known to speed up training and sometimes increase performance. The network starts with a four-channel Bayer image that goes through a series of Conv+BN+ReLU layers. A final Conv+BN+ReLU layer produces the features fed to an upsampling layer, which outputs an RGB image of twice the width and twice the height. Like Gharbi et al., we added a Conv+BN+ReLU layer before the layer producing the final output. Since our network is residual, a bilinearly interpolated RGB image is added to produce the final result. All convolution layers are padded to keep the resolution constant from beginning to end. The architecture of the network is depicted in Figure 3.
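The following PyTorch sketch shows one plausible instantiation of this architecture. The depth, width, and kernel sizes are assumptions (the exact values are not recoverable from this text), and the sub-pixel upsampling via PixelShuffle, which forces 12 features before the upsampling layer, is one possible choice of upsampling layer:

```python
import torch.nn as nn

class DemosaickNet(nn.Module):
    """Sketch of the residual demosaicking CNN described above.
    Hyperparameters (depth=15, width=64, 3x3 kernels) are assumptions."""

    def __init__(self, depth=15, width=64):
        super().__init__()
        layers, in_ch = [], 4
        for _ in range(depth):
            layers += [nn.Conv2d(in_ch, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
            in_ch = width
        # 12 features so that PixelShuffle(2) yields 3 channels at 2x size
        layers += [nn.Conv2d(width, 12, 3, padding=1),
                   nn.BatchNorm2d(12), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)
        self.up = nn.PixelShuffle(2)
        # extra Conv+BN+ReLU before the layer producing the final output
        self.tail = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, 3, padding=1))

    def forward(self, packed_bayer, bilinear_rgb):
        # packed_bayer: (B, 4, H/2, W/2); bilinear_rgb: (B, 3, H, W) baseline
        residual = self.tail(self.up(self.body(packed_bayer)))
        return bilinear_rgb + residual
```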

Figure 3: Architecture of the network used to compare the performance of learning on RGB ground truth or only with pairs of RAW images.

Comparing learning with ground truth RGB and our method

We verify that this method for training demosaicking without ground truth is competitive with regular training by training the same architecture with both methods and showing that the results are comparable. For this experiment we considered mosaicking with the Bayer pattern, which is the most frequent mosaicking pattern.

In order to be able to compare the results of training with and without ground truth, we decided to simulate the pairs on which the demosaicking is trained. For both trainings we use the dataset of [33], which consists of 500 images from Flickr. To generate pairs to learn with our method, we warped the same RGB image with a random affinity (thus simulating two viewpoints) and generated the mosaicked images from them. To speed up the training we chose the same transform for all patches of a same batch. We trained both networks using the Adam optimizer, reducing the learning rate in steps during training.
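The pair generation can be sketched as follows, reusing mosaic() from the snippet in the introduction; the motion ranges are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy import ndimage

def make_pair(rgb, max_shift=5.0, max_angle=2.0):
    """Simulate a training pair: warp an RGB image with a small random
    affinity (emulating two viewpoints of the same scene) and mosaick
    both views with mosaic() from the earlier sketch."""
    angle = np.deg2rad(np.random.uniform(-max_angle, max_angle))
    A = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    t = np.random.uniform(-max_shift, max_shift, size=2)
    # warp each channel; pixels mapped from outside the image become 0
    warped = np.stack([ndimage.affine_transform(rgb[..., c], A, offset=t, order=3)
                       for c in range(3)], axis=-1)
    return mosaic(rgb), mosaic(warped), (A, t)
```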

Figure 4 compares the evolution of the PSNR on the Kodak dataset (http://r0k.us/graphics/kodak/) when training our network with ground truth against training without ground truth. Training without ground truth behaves the same as training with ground truth: the convergence speed is equivalent, as is the final demosaicking quality.

Figure 4: Evolution of the average PSNR on the Kodak dataset when training with ground truth data and when training without RGB ground truth data available. Training without RGB ground truth behaves the same as training with an RGB ground truth.

Table 1 shows the quality of demosaicking using either ground truth or no ground truth versus the state of the art in image demosaicking. The model trained without ever having seen an RGB image achieves the same quality as the same network trained using the RGB ground truth, which indicates that a ground truth is not necessary to obtain state-of-the-art performance on this task. For comparison, we also show the results obtained with model-based methods [13, 19] that do not need training with ground truth (they do not need training at all).

| Method | With ground truth | Without ground truth |
| --- | --- | --- |
| Getreuer et al. [13] | - | 38.1 |
| Heide et al. [19] | - | 40.0 |
| Gharbi et al. [14] | 41.2 | - |
| Ours | 41.2 | 41.3 |
Table 1: PSNR results of different demosaicking methods on the Kodak dataset. Our method outperforms all methods that do not use ground truth, while achieving PSNR at the level of state-of-the-art methods trained with ground truth.

3 Joint demosaicking and denoising by fine-tuning on a burst

The results in the previous section demonstrate that with the proposed M2M training, we can train a demosaicking network without RGB ground truth. For practical applications such a network is of little use, as the noise in the real RAW mosaicked data will negatively affect its performance. In this section we go one step further by training a network for joint demosaicking and denoising. This could be done using a dataset consisting of many pairs of RAW mosaicked images from the same scene. Instead, based on the on-line learning framework of [11], we propose to use the previously presented training strategy to learn a joint demosaicking and denoising network from a single burst.

Joint demosaicking and denoising without ground truth

Figure 5: From left to right: PSNR over the whole image, PSNR of the non-saturated regions, PSNR of the saturated regions. After overfitting with the $\ell_1$ loss, the network (DnCNN) works better on both the saturated and the non-saturated regions. Moreover, it performs as well as a fine-tuning done on the same image where the noise wasn't clipped. The $\ell_2$ loss, however, is not able to deal with clipping.

Using the noise-to-noise (N2N) framework presented in [29], we aim to train a network $f_\theta$ with parameters $\theta$. Learning a joint demosaicking and denoising network in a supervised fashion corresponds to solving

$$\theta^\star \;=\; \arg\min_\theta \sum_i \mathcal{L}\big(f_\theta(u_i), v_i\big) \qquad (2)$$

where the $u_i$ are noisy mosaicked images, the $v_i$ are their ideal noise-free demosaicked images, and $\mathcal{L}$ is a loss such as $\ell_1$ or $\ell_2$. In the N2N framework, the equivalent problem (conditionally on the noise being mean-preserving for $\ell_2$, or median-preserving for $\ell_1$) is to solve

$$\theta^\star \;=\; \arg\min_\theta \sum_i \mathcal{L}\big(f_\theta(u_i^1), u_i^2\big) \qquad (3)$$

where $u_i^1$ and $u_i^2$ are noisy observations of $v_i$.

Combining this with equation (1), our proposal is to solve

$$\theta^\star \;=\; \arg\min_\theta \sum_i \big\| M\big(T_i(f_\theta(u_i^1))\big) - u_i^2 \big\|_1 \qquad (4)$$

where the $(u_i^1, u_i^2)$ are pairs of noisy mosaicked images of the same scene, and $T_i$ was introduced in Section 2. In this section we use $p = 1$ (the $\ell_1$ norm), which allows handling clipped noise (see the discussion on the choice of the loss below).

The loss requires the computation of a transform matching each pair of mosaicked images. For that we use the inverse compositional algorithm [34, 3] to estimate a parametric transform (in practice we estimate an affinity, which is the transform best suited for bursts). An implementation of this method is available in [4]. The advantage of this method is that it is robust to noise and can register two images very precisely (provided that they can be registered with an affinity). Since we only have access to Bayer images of size $W \times H$, the first step is to generate four-channel images of size $W/2 \times H/2$ corresponding to the four phases of the Bayer pattern. The transform is then estimated on these images before upscaling it to the correct size.
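The packing of the Bayer frame into its four phases, and the lifting of the estimated affinity back to full resolution, can be sketched as follows (a simplification that ignores the half-pixel phase offsets):

```python
import numpy as np

def pack_bayer(raw):
    """Split a (H, W) Bayer frame into a (H/2, W/2, 4) image whose channels
    are the four phases of the CFA, all defined at every pixel, so that
    registration can be run on them."""
    return np.stack([raw[0::2, 0::2], raw[0::2, 1::2],
                     raw[1::2, 0::2], raw[1::2, 1::2]], axis=-1)

def upscale_affinity(A, t):
    """Lift an affinity x -> A x + t estimated at half resolution to full
    resolution: the linear part is unchanged, the translation doubles."""
    return A, 2.0 * t
```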

Having the pairs with the associated transforms, one can finally apply the pipeline presented in Section 2 and in Figure 2. The only difference is that now the images are noisy. Similarly to [11], we initialize the network using a pretrained network. In particular, we use the network trained for demosaicking without ground truth presented in Section 2.
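Putting the pieces together, a hypothetical fine-tuning loop over a burst could look as follows. It reuses m2m_loss from the sketch in Section 2 with p=1; the pair ordering follows the lexicographical scheme described in Section 4, and the learning rate and iteration counts are illustrative assumptions:

```python
import torch

def finetune_on_burst(net, burst, transforms, masks, iters=20, lr=1e-5):
    """Fine-tune a pretrained joint demosaicking network on a single burst.
    burst[i]: (1, 3, H, W) noisy mosaicked frames (zeros at unobserved pixels);
    transforms[i][j]: affinity registering frame i to frame j;
    masks[j]: Bayer mask of frame j."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    n = len(burst)
    for i in range(n):          # input frame (order so the reference comes last)
        for j in range(n):      # target frame
            if i == j:
                continue
            for _ in range(iters):
                opt.zero_grad()
                loss = m2m_loss(net, burst[i], burst[j], masks[j],
                                transforms[i][j], p=1)  # l1 handles clipping
                loss.backward()
                opt.step()
    return net
```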

Choice of Loss

One particularly well-known problem in denoising is clipped noise: the underlying signal belongs to a fixed intensity range, but the noise can push values outside of it. Due to hardware clipping, the measured image stays inside the fixed range, and thus the noise statistics are biased. When minimizing the $\ell_1$ norm over the same image with several noise realizations, the best estimator is the median of the realizations [29], which is unaffected by the hardware clipping. Thus, by using the $\ell_1$ norm and fine-tuning on a burst, our method handles clipping without any pre- or post-processing. This phenomenon is illustrated in Figure 5.
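A quick numerical check of this property, as a self-contained numpy illustration (the signal value and noise level are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
true = 0.97                                 # signal close to the saturation point
noisy = true + 0.1 * rng.standard_normal(100_000)
clipped = np.clip(noisy, 0.0, 1.0)          # hardware clipping to [0, 1]

print(np.mean(clipped))    # biased below 0.97: the upper tail was cut off
print(np.median(clipped))  # ~0.97: the median is unaffected as long as
                           # fewer than half of the samples are clipped
```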

Figure 6: From left to right: reference image, noisy image, pretrained DnCNN, and DnCNN after overfitting. The details, such as the trees, are sharper and more distinguishable after overfitting. Figure best visualized zoomed-in on a computer.
Figure 7: From left to right: image of binary noise and image of stripes. Fine-tuning DnCNN on each gives a much bigger increase in quality on the very self-similar image of stripes than on the image of binary noise.

Overfitting to a single scene

By fine-tuning over a single burst the network ends up overfitting the data. Usually overfitting to the training data is avoided as it results in poor generalization capability. However, in our case the fine-tuned network will only be applied to that burst, and overfitting improves the result for that specific burst. There are other examples in the literature where a network is overfitted to a specific input (or a small dataset of inputs). For example, [5] turns an object classification network into a video segmentation network by overfitting it to the first frame (which is labeled); the network then learns to track the labeled objects in the following frames. Several image restoration problems are addressed in [35] by using a network as a prior, with the network parameters trained for each input image. In [11] a pre-trained denoising network is fine-tuned to an input video.

This overfitting is also reminiscent of traditional image processing methods that fit a model to the patches of the image. In [36] the image patches are modeled using a Gaussian mixture model (GMM), in [12] by representing them sparsely over a learned dictionary, and in [30] via sparse convolutions over a set of kernels. In all these cases the models were trained on the input image. The assumption underlying these methods is that images are self-similar and highly redundant, allowing for compact representations of their patches.

Figure 6 shows that fine-tuning a grayscale denoising network (DnCNN) on a burst of images can significantly improve the denoising results. The likely explanation is that the network is able to capture part of the image self-similarity, similarly to the model-based methods. Figure 7 illustrates the performance evolution when fine-tuning a denoising network on a set of noisy realizations of two synthetic images: one of stripes (thus very self-similar) and one of binary noise (thus not self-similar). The performance gap is explained by the self-similarity of the former image.
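For reference, the two synthetic test images can be generated along these lines (image size and stripe width are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# highly self-similar image: periodic vertical stripes
stripes = np.tile((np.arange(128) % 8 < 4).astype(np.float32), (128, 1))
# non self-similar image: i.i.d. binary noise
binary = rng.integers(0, 2, size=(128, 128)).astype(np.float32)
```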

4 Experimental results

To evaluate quantitatively the performance of the proposed training strategy, we first apply it on simulated data, since there are no real noisy RAW bursts with ground truth publicly available. We generate the burst from a single image by applying random affinities. In the cases where noise is considered, the added noise is white Gaussian. During training, the affinities are estimated from the noisy RAW data.

Denoising by overfitting amounts to temporal noise reduction

Figure 8: Overfitting a pre-trained denoising network (DnCNN) to a specific sequence increases the quality of the result. The visible drops correspond to each change of input image (pairs are considered in lexicographical order). It is important to finish with the reference image so as to maximise the performance. Nevertheless, it doesn't quite reach the average + DnCNN combination.

Overfitting our network to a sequence allows restoring the image beyond the performance of single-image denoising. In the experiment shown in Figure 8, a sequence of 10 frames without mosaicking pattern and without motion is considered. The plot shows the PSNR evolution as the fine-tuning processes all the pairs (90 in total).

We consider the pairs in lexicographical order, that is, every time a new input image is selected it is sequentially paired with all other images in the sequence. Note the characteristic shape traced by the PSNR curve: every time a new input image is selected, the performance first drops and then steadily improves, surpassing the previous peak. This shows that the network is not only adapting to denoising the current input image but also building upon previously seen images.

This fine-tuning can be linked to temporal noise reduction (TNR). For comparison, the plot includes the PSNR of the results obtained by averaging the frames (which amounts to a naive TNR), by denoising a single frame with DnCNN, and by denoising the result of the naive TNR with DnCNN. The latter amounts to the best possible TNR result in this ideal case. Note that the fine-tuning largely surpasses the performance of single-image denoising and of temporal averaging.

In practice, temporal averaging followed by denoising cannot be applied to mosaicked images, so this upper bound cannot be attained, which further justifies the relevance of the proposed method.

Improving demosaicking by overfitting

Similarly to denoising, overfitting improves demosaicking. The evolution of the improvement, shown in Figure 10, is quite similar to the one presented for denoising. Moreover, artifacts produced by the initial network, due to the low amount of training, are completely removed by the overfitting, see Figure 9. The results then look visually very similar to those of Gharbi et al., which was trained specifically to deal with these difficult cases.

Table 2 compares the PSNR obtained by different networks on the Kodak dataset. Our network was overfitted to the lighthouse image, which is singled out in the table. As expected, the overfitted network works well on the reference image but its performance decreases on the other images. The network trained the regular way performs better on the whole Kodak dataset than the network that was overfitted on a specific image. The increase in performance on the reference image after overfitting was of more than 2 dB.

Figure 9: From left to right: reference image, our pretrained network, our network after overfitting, and Gharbi et al. Because of the reduced size of the training set, our blind network still shows some moiré artefacts, but they completely disappear after overfitting on the data, achieving a result visually close to Gharbi et al. without having to train on a specific well-chosen dataset. Figure best visualized zoomed-in on a computer.
Figure 10: Overfitting a pre-trained demosaicking network (from Section 2) to a specific sequence increases the quality of the result. The visible drops correspond to each change of input image (pairs are considered in lexicographical order). It is important to finish with the reference image so as to maximise the performance.
| Method | Lighthouse image (1) | Kodak dataset |
| --- | --- | --- |
| Overfitted on (1) | 44.4 | 40.4 |
| Regular | 42.1 | 41.3 |
Table 2: PSNR results of the network overfitted on the lighthouse image of the Kodak dataset versus the non-overfitted one. While overfitting improves results on that specific image, the overall performance on the dataset decreases.

Joint demosaicking and denoising using overfitting starting from the network trained in Section 2

The final application of overfitting is to perform both previous applications at the same time. Table 3 compares the results with two other methods of joint denoising and demosaicking. The network was fine-tuned on each image individually. Overall this approach is quite competitive, even though everything was trained without ground truth and only the information from the burst was used to learn denoising. Moreover, when the quality is not sufficient, it can be improved by using more images for the fine-tuning. Nevertheless, on some images it performs noticeably worse than the other methods, while on others it works very well. The images on which it does not perform well seem to lack self-similarity, which could explain the poor performance.

Not only do we achieve competitive results in terms of PSNR, the results are also visually artefact-free. As shown in Figure 1, even regions that are particularly hard, such as the fence, show no zipper artifacts, in contrast with the method of Gharbi et al.

| Method | Gharbi et al. [14] | Kokkinos and Lefkimmiatis [25] | Ours |
| --- | --- | --- | --- |
| Kodak01 | 34.9 | 34.5 | 34.9 |
| Kodak13 | 32.9 | 32.3 | 33.6 |
| Kodak16 | 37.1 | 36.5 | 36.0 |
| Kodak19 | 36.1 | 35.5 | 36.3 |
| Kodak19 | 33.0 | 31.1 | 32.6 |
| Kodak19 (20 images burst) | 33.0 | 31.1 | 33.4 |
| Kodak | 36.2 | 33.8 | 35.8 |
Table 3: PSNR results of different methods for the task of joint denoising and demosaicking. Even though our method is completely blind, it is able to compete with the state of the art. Moreover, increasing the length of the burst improves the quality in cases where the method would otherwise perform worse. Our method used the network trained in Section 2 and was fine-tuned with generated noisy images except when mentioned otherwise.

The final experiment is on real data. We took a burst from the HDR+ dataset [17] and applied our process. We compare the result to the one provided with the dataset in Figure 11. While it is hard to compare the two results due to the different post-processing, our process appears to do a good job of recreating the details and removing the noise.

Figure 11: Experiment on a real burst. The result of our pipeline is on the left; the result of the HDR+ pipeline is on the right. Details are well reconstructed and the noise is well reduced. Contrast was enhanced for our method in order to see the dark region. Figure best visualized zoomed-in on a computer.

5 Conclusion

In this work, we have proposed a novel way of training demosaicking neural networks without any RGB ground truth, using instead other mosaicked data of the same scene (such as from a burst of images). Based on this and on recent advances in neural networks, we proposed a method to train joint demosaicking and denoising with bursts of noisy raw images. We showed that fine-tuning on a given burst boosts the reconstruction performance, and that clipped noise, a hard problem, is handled natively. This work also presents a case where overfitting a network to the training data is valuable: since we do not expect generalization, there are only benefits to this overfitting.

We hope our work can lead to new camera pipeline calibration procedures, and general improvement of the image quality when a burst is available.

References

  • [1] A. Abdelhamed, S. Lin, and M. S. Brown. A high-quality denoising dataset for smartphone cameras. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [2] H. Akiyama, M. Tanaka, and M. Okutomi. Pseudo four-channel image denoising for noisy cfa raw data. In 2015 IEEE International Conference on Image Processing (ICIP), pages 4778–4782, Sep. 2015.
  • [3] S. Baker and I. Matthews. Equivalence and efficiency of image alignment algorithms. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages I–1090. Citeseer, 2001.
  • [4] T. Briand, G. Facciolo, and J. Sánchez. Improvements of the Inverse Compositional Algorithm for Parametric Motion Estimation. Image Processing On Line, 8:435–464, 2018.
  • [5] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool. One-shot video object segmentation. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • [6] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [7] L. Condat and S. Mosaddegh. Joint demosaicking and denoising by total variation minimization. In 2012 19th IEEE International Conference on Image Processing, pages 2781–2784, Sep. 2012.
  • [8] K. Dabov and A. Foi. Image denoising with block-matching and 3D filtering. In Proceedings of SPIE Electronic Imaging, volume 6064, pages 1–12, 2006.
  • [9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3d collaborative filtering with grouping constraint in luminance-chrominance space. In 2007 IEEE International Conference on Image Processing, volume 1, pages I – 313–I – 316, Sep. 2007.
  • [10] A. Danielyan, M. Vehvilainen, A. Foi, V. Katkovnik, and K. Egiazarian. Cross-color bm3d filtering of noisy raw data. In 2009 International Workshop on Local and Non-Local Approximation in Image Processing, pages 125–129, Aug 2009.
  • [11] T. Ehret, A. Davy, J.-M. Morel, G. Facciolo, and P. Arias. Model-blind video denoising via frame-to-frame training. arXiv preprint arXiv:1811.12766, 2018.
  • [12] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
  • [13] P. Getreuer. Color demosaicing with contour stencils. In 2011 17th International Conference on Digital Signal Processing (DSP), pages 1–6. IEEE, 2011.
  • [14] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand. Deep joint demosaicking and denoising. ACM Transactions on Graphics (TOG), 35(6):191, 2016.
  • [15] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
  • [16] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. arXiv preprint arXiv:1807.04686, 2018.
  • [17] S. W. Hasinoff, D. Sharlet, R. Geiss, A. Adams, J. T. Barron, F. Kainz, J. Chen, and M. Levoy. Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM Transactions on Graphics, 35(6):1–12, nov 2016.
  • [18] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [19] F. Heide, K. Egiazarian, J. Kautz, K. Pulli, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pająk, D. Reddy, O. Gallo, J. Liu, and W. Heidrich. FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics, 33(6):1–13, Nov. 2014.
  • [20] K. Hirakawa and T. W. Parks. Joint demosaicing and denoising. IEEE Transactions on Image Processing, 15(8):2146–2157, Aug 2006.
  • [21] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 448–456, Lille, France, 07–09 Jul 2015. PMLR.
  • [22] D. Khashabi, S. Nowozin, J. Jancsary, and A. W. Fitzgibbon. Joint demosaicing and denoising via learned nonparametric random fields. IEEE Transactions on Image Processing, 23(12):4968–4981, Dec 2014.
  • [23] T. Klatzer, K. Hammernik, P. Knobelreiter, and T. Pock. Learning joint demosaicing and denoising based on sequential energy minimization. In 2016 IEEE International Conference on Computational Photography (ICCP), pages 1–11, May 2016.
  • [24] F. Kokkinos and S. Lefkimmiatis. Deep image demosaicking using a cascade of convolutional residual denoising networks. In The European Conference on Computer Vision (ECCV), September 2018.
  • [25] F. Kokkinos and S. Lefkimmiatis. Iterative residual network for deep joint image demosaicking and denoising. CoRR, abs/1807.06403, 2018.
  • [26] A. Krull, T. Buchholz, and F. Jug. Noise2void - learning denoising from single noisy images. CoRR, abs/1811.10980, 2018.
  • [27] C. Kwan, B. Chou, and J. Bell III. Comparison of deep learning and conventional demosaicing algorithms for mastcam images. Electronics, 8(3), March 2019.
  • [28] M. Lebrun, A. Buades, and J.-M. Morel. A Nonlocal Bayesian Image Denoising Algorithm. SIAM Journal on Imaging Sciences, 6(3):1665–1688, 2013.
  • [29] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189, 2018.
  • [30] M. Mørup, M. N. Schmidt, and L. K. Hansen. Shift invariant sparse coding of image and music data. Submitted to Journal of Machine Learning Research, 2008.
  • [31] S. H. Park, H. S. Kim, S. Lansel, M. Parmar, and B. A. Wandell. A case for denoising before demosaicking color filter array data. In 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, pages 860–864, Nov 2009.
  • [32] T. Plotz and S. Roth. Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1586–1595, 2017.
  • [33] N.-S. Syu, Y.-S. Chen, and Y.-Y. Chuang. Learning deep convolutional networks for demosaicing. arXiv preprint arXiv:1802.03769, 2018.
  • [34] P. Thevenaz, U. E. Ruttimann, and M. Unser. A pyramid approach to subpixel registration based on intensity. IEEE transactions on image processing, 7(1):27–41, 1998.
  • [35] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [36] G. Yu, G. Sapiro, and S. Mallat. Solving inverse problems with piecewise linear estimators: From gaussian mixture models to structured sparsity. Image Processing, IEEE Transactions on, 21(5):2481–2499, May 2012.
  • [37] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 7 2017.
  • [38] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a Fast and Flexible Solution for CNN-based Image Denoising. CoRR, abs/1710.04026, 2017.
  • [39] H. Zhao, O. Gallo, I. Frosio, and J. Kautz. Loss Functions for Image Restoration With Neural Networks. IEEE Transactions on Computational Imaging, 3(1):47–57, 3 2017.
  • [40] M. Zhussip, S. Soltanayev, and S. Y. Chun. Theoretical analysis on noise2noise using stein’s unbiased risk estimator for gaussian denoising: Towards unsupervised training with clipped noisy images. CoRR, abs/1902.02452, 2019.