Notations and problem setting
We assume the noisy image $\mathbf{Z}$ is generated by $\mathbf{Z}=\mathbf{X}+\mathbf{N}$, in which $\mathbf{X}$ denotes the underlying clean image and $\mathbf{N}$ denotes the zero-mean, additive noise that is independent of $\mathbf{X}$. For training the denoiser, we assume that neither the distribution nor the covariance of $\mathbf{N}$ is known. Moreover, we assume that only the noisy images of $n$ \emph{distinct} clean images, $\mathcal{D}=\{\mathbf{Z}^{(i)}\}_{i=1}^{n}$, are available for learning the denoiser; hence, straightforward N2N training is not possible. The CNN-based denoiser is denoted as $\hat{\mathbf{X}}_{\boldsymbol{\phi}}(\mathbf{Z})$, in which $\boldsymbol{\phi}$ is the model parameter, and we use the standard quality metrics, PSNR and SSIM, to evaluate the denoising quality. Furthermore, following convention, we normalize the pixel values of the images to lie in $[0,1]$.
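To make the setting concrete, the following minimal sketch simulates the additive model on a synthetic image and evaluates PSNR for pixel values normalized to $[0,1]$. The Gaussian noise and its level are assumptions made only for illustration (the method itself does not assume a known noise distribution), and the helper name `psnr` is hypothetical.

```python
import numpy as np

def psnr(clean, estimate, peak=1.0):
    """PSNR in dB for images whose pixel values lie in [0, peak]."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((clean - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(64, 64))          # stand-in for a clean image
N = rng.normal(0.0, 25.0 / 255.0, size=X.shape)   # illustrative zero-mean noise (assumed Gaussian)
Z = X + N                                         # observed noisy image

print("PSNR of the noisy image (dB):", psnr(X, Z))
```

In the blind setting described above, only `Z` would be available for training; `X` appears here solely so that a quality metric can be computed.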
Description of GAN2GAN
The primary motivation of our GAN2GAN method is simple: given a single noisy image $\mathbf{Z}$, we want to generate a pair of noisy images that share the underlying clean image of $\mathbf{Z}$ but carry independent realizations of the noise present in $\mathbf{Z}$. Such generation is challenging, since we have to blindly separate the noise from the clean image using only the noisy observations, learn the distribution of the noise, and replace only the noise part of $\mathbf{Z}$ with independent realizations of the noise. Despite the challenge, once successful, we can use those pairs to carry out N2N training of a denoiser. To achieve this goal, our GAN2GAN method proceeds in the following three steps.
Smooth noisy patch extraction
The first step is to extract the noisy image patches from $\mathcal{D}$ that correspond to smooth, homogeneous areas. Our extraction method is similar to that of the GCBD method proposed in [chen2018image], but we make a critical improvement. GCBD determines that a patch $\mathbf{p}$ is smooth if it satisfies the following rules for all of its smaller sub-patches, $\{\mathbf{q}_j\}$:
\[
|\mathbb{E}(\mathbf{q}_j)-\mathbb{E}(\mathbf{p})| \le \mu\,\mathbb{E}(\mathbf{p}), \qquad |\mathbb{V}(\mathbf{q}_j)-\mathbb{V}(\mathbf{p})| \le \gamma\,\mathbb{V}(\mathbf{p}),
\]
in which $\mathbb{E}(\cdot)$ and $\mathbb{V}(\cdot)$ are the empirical mean and variance of the pixel values, and $\mu,\gamma$ are hyperparameters. While the GCBD rule works for extracting smooth patches to some extent, we show in our experiments that it also ends up choosing patches with high-frequency repeating patterns, which are far from smooth. Thus, we instead use the 2D discrete wavelet transform (DWT) for a new extraction rule. That is, we determine that $\mathbf{p}$ is smooth if its four sub-band decompositions obtained by DWT, $\{W_k(\mathbf{p})\}_{k=1}^{4}$, satisfy the following:
\[
\big|\sigma(W_k(\mathbf{p}))-\bar{\sigma}(\mathbf{p})\big| \le \lambda\,\bar{\sigma}(\mathbf{p}), \qquad k=1,\ldots,4,
\]
in which $\sigma(\cdot)$ stands for the empirical standard deviation of the wavelet coefficients, $\bar{\sigma}(\mathbf{p})\triangleq\frac{1}{4}\sum_{k=1}^{4}\sigma(W_k(\mathbf{p}))$ is its average over the four sub-bands, and $\lambda$ is a hyperparameter. In words, if the empirical standard deviation of the coefficients of each sub-band is not far from their average, we determine that $\mathbf{p}$ is a smooth patch. This single rule is much simpler than the GCBD rule, which has to be evaluated for all the sub-patches $\{\mathbf{q}_j\}$. In our experiments, we show that this modification of the extraction rule plays a critical role in the training of our generative model and in the final denoising performance. Once the noisy patches are extracted from $\mathcal{D}$ using the DWT-based rule, we exploit the fact that the noise is zero-mean and additive: we subtract from each patch its mean pixel value and obtain a set of “noise” patches, $\mathcal{N}$. Such subtraction is valid since, in a smooth patch, all the pixel values of the underlying clean image should be close to their mean.
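The extraction step can be sketched as follows. This is only an illustration under stated assumptions: a one-level orthonormal Haar transform stands in for the DWT, the smoothness test is one reading of the rule described above (each sub-band standard deviation within a relative tolerance `lam` of the sub-band average), and the function names, patch size, stride, and `lam` value are all hypothetical rather than the authors' settings.

```python
import numpy as np

def haar_dwt2(patch):
    """One-level orthonormal 2D Haar DWT of an even-sized patch (four sub-bands)."""
    p = np.asarray(patch, dtype=np.float64)
    a, b = p[0::2, 0::2], p[0::2, 1::2]   # top-left, top-right pixel of each 2x2 block
    c, d = p[1::2, 0::2], p[1::2, 1::2]   # bottom-left, bottom-right pixel
    return ((a + b + c + d) / 2.0,        # approximation band
            (a + b - c - d) / 2.0,        # detail between rows
            (a - b + c - d) / 2.0,        # detail between columns
            (a - b - c + d) / 2.0)        # diagonal detail

def is_smooth(patch, lam=0.2):
    """Assumed rule: every sub-band std deviates from the average std by at most lam * average."""
    stds = np.array([band.std() for band in haar_dwt2(patch)])
    avg = stds.mean()
    return bool(np.all(np.abs(stds - avg) <= lam * avg))

def extract_noise_patches(image, size=64, stride=64, lam=0.2):
    """Collect smooth patches and subtract each patch's mean to obtain 'noise' patches."""
    patches = []
    height, width = image.shape
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = image[top:top + size, left:left + size]
            if is_smooth(patch, lam):
                patches.append(patch - patch.mean())
    return patches
```

With an orthonormal transform, white noise on a flat region yields roughly equal standard deviations in all four sub-bands, whereas edges or repeating patterns concentrate energy in some sub-bands and fail the test. Note that a pattern whose period coincides exactly with the 2x2 Haar blocks would not be caught by a single-level transform; this is a limitation of the sketch, not a statement about the original method.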
Training a W-GAN based generative model
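Equipped with the noisy images $\mathcal{D}$ and the extracted noise patches $\mathcal{N}$, we train a W-GAN based generative model, whose overall structure is depicted in the accompanying figure. Our model has three generators, $\{g_{\boldsymbol{\theta}_1}, g_{\boldsymbol{\theta}_2}, g_{\boldsymbol{\theta}_3}\}$, and two critics, $\{f_{\mathbf{w}_1}, f_{\mathbf{w}_2}\}$, in which the subscripts stand for the model parameters.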
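The loss functions associated with the components of our model are as follows:
\begin{align*}
\mathcal{L}_{\mathbf{n}}(\boldsymbol{\theta}_1,\mathbf{w}_1) &\triangleq \mathbb{E}_{\mathbf{n}}\big[f_{\mathbf{w}_1}(\mathbf{n})\big]-\mathbb{E}_{\mathbf{r}}\big[f_{\mathbf{w}_1}(g_{\boldsymbol{\theta}_1}(\mathbf{r}))\big], \\
\mathcal{L}_{\mathbf{Z}}(\boldsymbol{\theta}_1,\boldsymbol{\theta}_2,\mathbf{w}_2) &\triangleq \mathbb{E}_{\mathbf{Z}}\big[f_{\mathbf{w}_2}(\mathbf{Z})\big]-\mathbb{E}_{\mathbf{Z},\mathbf{r}}\big[f_{\mathbf{w}_2}(g_{\boldsymbol{\theta}_2}(\mathbf{Z})+g_{\boldsymbol{\theta}_1}(\mathbf{r}))\big], \\
\mathcal{L}_{\text{cyc}}(\boldsymbol{\theta}_2,\boldsymbol{\theta}_3) &\triangleq \mathbb{E}_{\mathbf{Z}}\big[\|\mathbf{Z}-g_{\boldsymbol{\theta}_3}(g_{\boldsymbol{\theta}_2}(\mathbf{Z}))\|_1\big].
\end{align*}
The loss $\mathcal{L}_{\mathbf{n}}$ is for training the first generator-critic pair, $(g_{\boldsymbol{\theta}_1}, f_{\mathbf{w}_1})$, of which $g_{\boldsymbol{\theta}_1}$ learns to generate independent realizations of the noise that mimic the patches in $\mathcal{N}$, taking the random vector $\mathbf{r}$ as input. The second loss, $\mathcal{L}_{\mathbf{Z}}$, links the two generators, $g_{\boldsymbol{\theta}_1}$ and $g_{\boldsymbol{\theta}_2}$, with the second critic, $f_{\mathbf{w}_2}$. The second generator $g_{\boldsymbol{\theta}_2}$ is intended to generate the underlying (unobserved) “clean” image for the input noisy image $\mathbf{Z}$, and the critic $f_{\mathbf{w}_2}$ determines how close the distribution of the generated noisy images, $g_{\boldsymbol{\theta}_2}(\mathbf{Z})+g_{\boldsymbol{\theta}_1}(\mathbf{r})$, is to the distribution of the input noisy images. Note that by adding the generated noise from the first generator to the “estimated” clean image, we aim to simulate noisy images whose noise is an independent realization of the noise in the original noisy image $\mathbf{Z}$. The given noisy images in $\mathcal{D}$ are used as input to $g_{\boldsymbol{\theta}_2}$ as well as to $f_{\mathbf{w}_2}$ in this loss term. The third loss, $\mathcal{L}_{\text{cyc}}$, is similar to the so-called cycle loss proposed in CycleGAN, and it works as a regularizer for the estimated clean image, $g_{\boldsymbol{\theta}_2}(\mathbf{Z})$. In CycleGAN, such a loss was devised to impose cycle consistency between images, so that only the intended characteristic of the input is changed while the basic structure is preserved. We apply this loss to change the noise realization, the “intended characteristic”, while preserving the underlying clean image, the “basic structure”. We show in our experiments that this third generator, $g_{\boldsymbol{\theta}_3}$, and the cycle loss play an important role in maintaining the quality of $g_{\boldsymbol{\theta}_2}$.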
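As a rough illustration of how the three loss terms could be evaluated on a batch, the sketch below treats the generators and critics as plain callables. It assumes the standard W-GAN critic form (critic score on real samples minus critic score on generated samples) for the first two terms, and it computes the cycle term as a per-image L1 norm averaged over the batch. The function names and the batch layout (first axis indexes samples) are hypothetical.

```python
import numpy as np

def loss_noise(f1, g1, noise_patches, r):
    """Assumed W-GAN form: critic gap between real noise patches and generated noise."""
    return float(np.mean(f1(noise_patches)) - np.mean(f1(g1(r))))

def loss_noisy(f2, g1, g2, Z, r):
    """Assumed W-GAN form: critic gap between real noisy images and simulated noisy images."""
    simulated = g2(Z) + g1(r)   # estimated clean image plus freshly generated noise
    return float(np.mean(f2(Z)) - np.mean(f2(simulated)))

def loss_cycle(g2, g3, Z):
    """L1 distance between each noisy image and its reconstruction g3(g2(Z)), batch-averaged."""
    residual = np.abs(Z - g3(g2(Z)))
    per_image = residual.reshape(residual.shape[0], -1).sum(axis=1)
    return float(per_image.mean())
```

In an actual implementation these callables would be neural networks in an automatic-differentiation framework; the NumPy version only makes the data flow between the generators and critics explicit.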
Once the losses are defined, the generators and critics are trained in an alternating manner, as in the training of W-GAN [arjovsky2017wasserstein], to approximately solve
\[
\min_{\boldsymbol{\theta}_1,\boldsymbol{\theta}_2,\boldsymbol{\theta}_3}\ \max_{\mathbf{w}_1,\mathbf{w}_2}\Big[\alpha\,\mathcal{L}_{\mathbf{n}}(\boldsymbol{\theta}_1,\mathbf{w}_1)+\beta\,\mathcal{L}_{\mathbf{Z}}(\boldsymbol{\theta}_1,\boldsymbol{\theta}_2,\mathbf{w}_2)+\gamma\,\mathcal{L}_{\text{cyc}}(\boldsymbol{\theta}_2,\boldsymbol{\theta}_3)\Big],
\]
in which $(\alpha,\beta,\gamma)$ are hyperparameters that control the trade-offs between the loss functions. There are a few subtle but important points for training with this overall objective. Firstly, we use one setting of $(\alpha,\beta,\gamma)$ for the inner maximization over the critics and a different setting for the outer minimization over the generators. The intuition for using different weights when training the generators is that we have different levels of confidence in the generator loss terms. Namely, we assign the largest weight to $\mathcal{L}_{\text{cyc}}$ since it is a deterministic loss and its value has a clear meaning. The generator loss from $\mathcal{L}_{\mathbf{n}}$, which is in the form of the standard W-GAN loss, gets the medium weight since the meaning of its value is less certain than that of $\mathcal{L}_{\text{cyc}}$. In contrast, the generator loss from $\mathcal{L}_{\mathbf{Z}}$, which involves two generators, can become somewhat unstable during training; hence, it gets the smallest weight. Secondly, the output layer of $g_{\boldsymbol{\theta}_2}$ must have the sigmoid activation function. Note that $g_{\boldsymbol{\theta}_2}$ itself can be thought of as another denoiser, but since we are not training it with any target, we need to ensure that its outputs lie in $[0,1]$ to prevent obvious errors such as negative or out-of-bound pixel values. Without the sigmoid activation, none of the generators could be trained properly. Finally, using the right architectures for the generators and critics, e.g., the number of layers and filters, was critical since the training procedure was very sensitive to architectural variations. The details of the model architectures and hyperparameters are in the Supplementary Materials.
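The two practical points above, separate weight settings for the critic and generator updates and a sigmoid on the output of the second generator, can be sketched as follows. The numeric weights are hypothetical placeholders chosen only to respect the ordering described in the text (cycle term largest, then the noise term, then the noisy-image term for the generators); they are not the authors' values.

```python
import numpy as np

# Hypothetical weight triples (alpha, beta, gamma); illustrative only.
GENERATOR_WEIGHTS = (5.0, 1.0, 10.0)   # respects gamma > alpha > beta
CRITIC_WEIGHTS = (1.0, 1.0, 0.0)       # the cycle term has no critic parameters

def weighted_objective(l_n, l_z, l_cyc, weights):
    """alpha * L_n + beta * L_Z + gamma * L_cyc for a given weight triple."""
    alpha, beta, gamma = weights
    return alpha * l_n + beta * l_z + gamma * l_cyc

def sigmoid(x):
    """Logistic function, used here to keep generated pixel values inside [0, 1]."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))

# In an alternating scheme, the critics would ascend the objective with CRITIC_WEIGHTS,
# and the generators would then descend the objective with GENERATOR_WEIGHTS.
pre_activation = np.array([-3.0, 0.0, 4.0])   # stand-in for the last-layer output of g2
clean_estimate = sigmoid(pre_activation)      # bounded "clean image" estimate
```

Because the cycle loss depends only on the generator parameters, its weight has no effect on the critic update, which is why setting it to zero for the critics is harmless in this sketch.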
Iterative GAN2GAN training of a denoiser
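Once the training of our generative model is done, for each $\mathbf{Z}^{(i)}$ in $\mathcal{D}$, we can generate the synthetic noisy pair
\[
\big(\hat{\mathbf{Z}}_{1}^{(i)},\hat{\mathbf{Z}}_{2}^{(i)}\big)=\big(g_{\boldsymbol{\theta}_2}(\mathbf{Z}^{(i)})+g_{\boldsymbol{\theta}_1}(\mathbf{r}_{1}^{(i)}),\ g_{\boldsymbol{\theta}_2}(\mathbf{Z}^{(i)})+g_{\boldsymbol{\theta}_1}(\mathbf{r}_{2}^{(i)})\big),
\]
in which $\mathbf{r}_{1}^{(i)},\mathbf{r}_{2}^{(i)}$ are two independent random vectors sampled from the input distribution of $g_{\boldsymbol{\theta}_1}$. We denote the set of such generated noisy image pairs as $\hat{\mathcal{D}}_1$. Then, using $\hat{\mathcal{D}}_1$ in the same way as in the usual N2N training, we can train a CNN-based denoiser by minimizing
\[
\mathcal{L}_{\text{G2G}}(\boldsymbol{\phi},\hat{\mathcal{D}}_1)\triangleq\frac{1}{n}\sum_{i=1}^{n}\big\|\hat{\mathbf{Z}}_{1}^{(i)}-\hat{\mathbf{X}}_{\boldsymbol{\phi}}(\hat{\mathbf{Z}}_{2}^{(i)})\big\|_2^2.
\]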
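Note that in this loss we only use the generated noisy images and do not use the actual observed $\mathbf{Z}$ in $\mathcal{D}$, justifying the name GAN2GAN. While we show in our experimental results that the denoiser obtained by minimizing this loss, denoted as $\hat{\mathbf{X}}_{\text{G2G}_1}$, is already a decent denoiser, we can devise an iterative GAN2GAN training to further improve it. Namely, by observing that $g_{\boldsymbol{\theta}_2}$ is designed to be a denoiser itself, we can replace it with a better-quality denoiser, $\hat{\mathbf{X}}_{\text{G2G}_1}$. Then, for each $\mathbf{Z}^{(i)}$ in $\mathcal{D}$, we again generate
\[
\big(\hat{\mathbf{Z}}_{1}^{(i)},\hat{\mathbf{Z}}_{2}^{(i)}\big)=\big(\hat{\mathbf{X}}_{\text{G2G}_1}(\mathbf{Z}^{(i)})+g_{\boldsymbol{\theta}_1}(\mathbf{r}_{1}^{(i)}),\ \hat{\mathbf{X}}_{\text{G2G}_1}(\mathbf{Z}^{(i)})+g_{\boldsymbol{\theta}_1}(\mathbf{r}_{2}^{(i)})\big)
\]
with new realizations of the random vectors $\mathbf{r}_{1}^{(i)},\mathbf{r}_{2}^{(i)}$ to obtain a new set of generated image pairs, $\hat{\mathcal{D}}_2$. We expect these pairs to be closer in distribution to true, independently realized noisy image pairs than the pairs generated with $g_{\boldsymbol{\theta}_2}$, since a better denoiser than $g_{\boldsymbol{\theta}_2}$ is used. Examples of such synthesized noisy image pairs are given in the Supplementary Materials. With $\hat{\mathcal{D}}_2$, the GAN2GAN training, warm-started from $\hat{\mathbf{X}}_{\text{G2G}_1}$, can be carried out again to further update the model. We show that this iterative GAN2GAN training, typically with just one or two iterations, is highly effective and gives a significant boost in denoising performance.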
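The pair generation and the iterative procedure described above can be sketched as follows. This is a schematic under several assumptions: the networks are plain callables, the random vectors are drawn from a standard Gaussian of hypothetical dimension `r_dim`, the N2N-style loss is taken to be mean squared error, and `train_denoiser(pairs, init)` is a hypothetical user-supplied routine that fits a denoiser to the pairs, optionally warm-started from `init`.

```python
import numpy as np

def make_pairs(clean_estimator, g1, Z, rng, r_dim=16):
    """Two synthetic noisy copies per image: same clean estimate, independent generated noise."""
    clean_hat = clean_estimator(Z)
    r1 = rng.standard_normal((Z.shape[0], r_dim))   # assumed Gaussian generator input
    r2 = rng.standard_normal((Z.shape[0], r_dim))
    return clean_hat + g1(r1), clean_hat + g1(r2)

def n2n_loss(denoiser, z1, z2):
    """Mean squared error between the first copy and the denoised second copy."""
    return float(np.mean((z1 - denoiser(z2)) ** 2))

def iterative_g2g(Z, g1, g2, train_denoiser, rng, num_iters=2):
    """First round uses g2 as the clean estimator; later rounds reuse the latest denoiser."""
    denoiser = train_denoiser(make_pairs(g2, g1, Z, rng), init=None)
    for _ in range(num_iters):
        pairs = make_pairs(denoiser, g1, Z, rng)         # regenerate with the better denoiser
        denoiser = train_denoiser(pairs, init=denoiser)  # warm start from the previous round
    return denoiser
```

Only synthetic images enter `n2n_loss`; the observed `Z` is used solely to produce the clean estimates, which mirrors the point made in the text about the name GAN2GAN.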