Poisson Noise Reduction with Higher-order
Natural Image Prior Model
Abstract
Poisson denoising is an essential task in various imaging applications, such as night vision, medical imaging and microscopy. State-of-the-art approaches in recent years are clearly dominated by patch-based non-local methods. In this paper, we aim to propose a local Poisson denoising model with both a simple structure and good performance. To this end, we consider variational modeling to integrate the so-called Fields of Experts (FoE) image prior, which has proven an effective higher-order Markov Random Field (MRF) model for many classic image restoration problems. We exploit several feasible variational variants for this task. We start with direct modeling in the original image domain, taking into account the Poisson noise statistics, which generally performs well in cases of high SNR. However, this strategy encounters problems in cases of low SNR. We then turn to an alternative modeling strategy based on the Anscombe transform and a data term derived from Gaussian statistics, and retrain the FoE prior model directly in the transform domain. With the newly trained FoE model, we end up with a local variational model that provides strongly competitive results against state-of-the-art non-local approaches, while retaining a simple structure. Furthermore, our proposed model comes with an additional advantage: inference is very efficient, as it is well-suited to parallel computation on GPUs. For the image sizes considered in our experiments, our GPU implementation takes less than 1 second to produce state-of-the-art Poisson denoising results.
keywords:
Poisson denoising, Fields of Experts, Anscombe root transformation, non-convex optimization. AMS subject classifications: 49J40, 49N45, 68U10, 68T05, 68T20, 90C26, 65K10
1 Introduction
The degradation of the acquired signal by Poisson noise is a common problem that arises in applications such as biomedical imaging, fluorescence microscopy and astronomical imaging, among many others Bertero2010 (); Rodrigues08 (); Berry05 (). Thereby, the removal of Poisson noise is of fundamental importance, especially for further processing such as image segmentation and recognition. However, Poisson noise is not additive and its strength depends on the image intensity, which makes Poisson denoising generally more difficult than the usual case of additive white Gaussian noise.
Up to now, plenty of Poisson denoising algorithms exist in the literature; see INAnscombe7 (), INAnscombe8 (), Figueiredo (), INAnscombe10 () and the references therein for a survey. Roughly speaking, existing Poisson denoising approaches can be divided into two main classes: (1) with a variance-stabilizing transformation (VST) and (2) without a VST.
Usually, the approaches in the first class take three steps for Poisson denoising. (Step 1) A nonlinear VST such as the Anscombe Anscombe1 (); Anscombe2 () or Fisz Fisz () transform is applied to the input data. This produces data with approximately additive Gaussian noise, which is signal-independent; the rationale of applying a VST to the input data is to remove the signal-dependency of the Poisson noise. (Step 2) As the transformed image can be approximately treated as a noisy image with Gaussian noise of unit variance, Gaussian denoising algorithms are applicable to estimate a clean version of the transformed image. (Step 3) The underlying unknown noise-free image is obtained by applying an inverse VST INAnscombe1 (); Lefkimmiatis (); INAnscombe3 (); INAnscombe5 () to the denoised data.
Since the problem of Gaussian noise removal is well studied, there are many candidates for the second step. A typical choice is the well-known BM3D algorithm BM3D (), with which the resulting Poisson denoising algorithm leads to state-of-the-art performance. In general, the VST-based approaches work very well for relatively large-count Poisson denoising problems. However, their performance decreases dramatically in low-count cases Salmon2 (), especially extremely low-count cases, e.g., images with very low peak values. This motivates the other class of Poisson denoising algorithms, which directly investigate the statistical properties of the Poisson noise rather than those of the transformed noise.
To avoid the VST operation, several authors have investigated reconstruction algorithms that rely directly on the Poisson statistics. In Willettpoisson (), a novel denoising algorithm accounting for the Poisson statistics is developed, which originates from the multiscale likelihood framework proposed by Kolaczyk and Nowak Kolaczyk (). In Salmon2 (), J. Salmon derived a Gaussian mixture model (GMM) Yu1 () based approach relying directly on the Poisson noise properties. This approach combines an adaptation of Principal Component Analysis (PCA) for Poisson noise PoissonPCA1 () with sparse Poisson intensity estimation methods SPIRALTAP (), whereby it achieves state-of-the-art results for particularly high noise levels. Two versions of this method exist: the non-local PCA (NLPCA) and the non-local sparse PCA (NLSPCA). The latter adds a sparsity-promoting regularization term to the minimized objective and achieves better recovery performance. A similar data-fidelity term derived from the Poisson noise statistics is also adopted in Figueiredo (), Le1 (), Giryes (). In particular, the work in Giryes () proposes a new scheme for Poisson denoising that achieves state-of-the-art results; it relies on the Poisson statistical model and uses a dictionary learning strategy with a sparse coding algorithm.
1.1 Our contribution
When we have a closer look at state-of-the-art Poisson denoising approaches, we find that they are clearly dominated by patch-based non-local methods, such as the PCA-based model Salmon2 () and the BM3D-based algorithm INAnscombe5 (). Furthermore, some of them (e.g., Salmon2 ()) achieve good denoising performance at the cost of considerable computational effort. A notable exception is the BM3D-based algorithm INAnscombe5 (), which meanwhile offers high efficiency, thanks to the highly engineered and well-refined BM3D algorithm.
The goal of this paper is to propose a local model with a simple structure that is simultaneously strongly competitive in denoising performance with those state-of-the-art non-local models. To this end, we investigate variational models incorporating a local image prior model called Fields of Experts (FoE) RothFOE (), which has proven highly effective for Gaussian denoising ChenGCPR (); chenTIP () and multiplicative noise reduction SpeckleFOE ().
We consider several possible ways to construct the corresponding variational models. We start from a straightforward variational model with a data term directly derived from the Poisson noise statistics. While this model generally performs well in cases of high SNR, it encounters problems in cases of low SNR due to the presence of too many zero-valued pixels. Motivated by the BM3D-based algorithm INAnscombe5 (), we then resort to an Anscombe transform based framework, together with a quadratic data term derived from Gaussian statistics.
With the Anscombe transform based variational model, we directly train the FoE image prior model in the Anscombe transform domain using a loss-specific training scheme. In experiments, we find that the quadratic data term does not perform very well in cases of low SNR, due to the imprecise stabilization and markedly skewed distribution of the transformed data. Therefore, we investigate an ad hoc data-fidelity term for these circumstances to improve performance. Eventually, we reach an FoE prior based Poisson denoising model that yields competitive performance against state-of-the-art algorithms while preserving structural simplicity.
In summary, our contribution comes from a successful adaptation of the FoE image prior model for the problem of Poisson denoising via a general variational model, which is capable of handling Poisson noise of any level.
1.2 Organization
The remainder of the paper is organized as follows. Section II briefly reviews the building blocks exploited in our work. In the subsequent Section III, we propose several variational models to incorporate FoE priors for the task of Poisson denoising. Section IV then describes comprehensive experimental results for the proposed model. Concluding remarks are drawn in the final Section V.
2 Preliminaries
To make the paper self-contained, in this section we provide a brief review of Poisson noise, the Anscombe transform and FoE image prior models. Due to the non-convexity of the FoE regularization, FoE-related variational models pose non-convex optimization problems. In order to solve these problems efficiently, we resort to a recently developed non-convex optimization algorithm, iPiano iPiano (). For the sake of readability, we also present the basic update rule of the iPiano algorithm in this section. Furthermore, as we repeatedly make use of the loss-specific learning scheme to train an FoE regularizer for the Poisson denoising task, in this section we also present the general training process, which is treated as a toolbox to train an FoE model.
2.1 Poisson noise
Let $u \in \mathbb{R}^N$ be the original true image of interest and $f$ be the observed Poisson noisy image, where the entries $f_i$ (given $u_i$) are Poisson distributed independent random variables with mean and variance $u_i$, i.e.,

(2.1) $P(f_i \mid u_i) = \dfrac{u_i^{f_i}\, e^{-u_i}}{f_i!} \quad \text{for } u_i > 0, \qquad P(f_i \mid u_i) = \delta_0(f_i) \quad \text{for } u_i = 0,$

where $u_i$ and $f_i$ are the $i$-th components of $u$ and $f$ respectively, and $\delta_0$ is the Kronecker delta function at zero. According to the properties of the Poisson distribution, the Poisson noise $n = f - u$ satisfies $\mathrm{E}(n_i) = 0$ and $\mathrm{Var}(n_i) = u_i$. This alludes to the fact that Poisson noise is signal dependent, as the variance (strength) of the noise is proportional to the signal intensity $u_i$. Moreover, as the standard deviation of the noise equals $\sqrt{u_i}$, the signal-to-noise ratio (SNR) is given as $u_i/\sqrt{u_i} = \sqrt{u_i}$. Hence the effect of the Poisson noise increases (i.e., the SNR decreases) as the intensity value decreases. Therefore, it is reasonable to define the noise level in the image by the maximal value (the peak value) of $u$, since lower intensity in the image yields stronger noise.
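These statistics can be checked numerically. The following sketch (Python with NumPy; the ramp image and peak value are arbitrary illustrations, not the paper's data) draws a Poisson realization and verifies that the noise is zero-mean with variance tracking the signal intensity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean image, scaled so that its maximum equals a chosen peak value.
peak = 4.0
u = np.linspace(0.5, 1.0, 256 * 256).reshape(256, 256)
u = u / u.max() * peak

# Poisson-corrupted observation: each pixel is an independent Poisson draw
# with mean (and variance) equal to the clean intensity at that pixel.
f = rng.poisson(u).astype(np.float64)

# Empirically, mean(f) ~ mean(u) and var(f - u) ~ mean(u): the noise strength
# grows with the signal, so the per-pixel SNR sqrt(u_i) shrinks as the
# intensity (and hence the peak value) decreases.
print(np.mean(f), np.mean(u))      # close to each other
print(np.var(f - u), np.mean(u))   # also close: signal-dependent noise
```

Lowering `peak` in this sketch makes the relative fluctuations visibly stronger, which is exactly why the peak value serves as the noise-level parameter.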
According to the Gibbs function, the likelihood (2.1) leads to the following energy term (also known as the data term in the variational framework) via the relation $D(u, f) = -\log P(f \mid u)$, up to an additive constant independent of $u$:

(2.2) $D(u, f) = \langle u - f \log u, \mathbf{1} \rangle,$

where $\langle \cdot, \cdot \rangle$ denotes the standard inner product and the logarithm acts component-wise. This data-fidelity term (2.2) is the so-called Csiszár I-divergence model csiszar1991least (), and has been widely investigated in previous Poisson denoising algorithms, e.g., Salmon2 (); SPIRALTAP (); Lingenfelter ().
2.2 Variance stabilization with the Anscombe transform
The Anscombe transform Anscombe1 (); Anscombe2 () is defined as

(2.3) $\mathcal{A}(f) = 2\sqrt{f + \tfrac{3}{8}},$

applied element-wise to the noisy data.
Applying this nonlinear transform to Poisson distributed data gives a signal with approximately additive Gaussian noise of unit variance. Concerning the inverse transform, INAnscombe5 () exploited three types of optimal inverse Anscombe transforms, which can significantly improve the recovery performance obtained by the direct algebraic inverse, particularly in low-count circumstances. In this paper we consider the following closed-form approximation of the exact unbiased inverse of the Anscombe transform makitalo2011closed ():

(2.4) $\mathcal{I}(d) = \frac{1}{4} d^2 + \frac{1}{4}\sqrt{\frac{3}{2}}\, d^{-1} - \frac{11}{8}\, d^{-2} + \frac{5}{8}\sqrt{\frac{3}{2}}\, d^{-3} - \frac{1}{8}.$
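A minimal sketch of the forward transform and the closed-form unbiased inverse (Makitalo and Foi's approximation), using the standard Anscombe constants; the constant test intensity is an arbitrary choice made only to check the variance stabilization:

```python
import numpy as np

def anscombe(f):
    # Forward Anscombe root transform: stabilizes Poisson variance to ~1.
    return 2.0 * np.sqrt(f + 3.0 / 8.0)

def inverse_anscombe(d):
    # Closed-form approximation of the exact unbiased inverse (Makitalo & Foi).
    return (d ** 2 / 4.0
            + np.sqrt(3.0 / 2.0) / 4.0 / d
            - 11.0 / 8.0 / d ** 2
            + 5.0 / 8.0 * np.sqrt(3.0 / 2.0) / d ** 3
            - 1.0 / 8.0)

rng = np.random.default_rng(0)
u = np.full(200_000, 10.0)             # constant intensity, for a quick check
f = rng.poisson(u).astype(np.float64)
d = anscombe(f)
print(np.var(d))                       # ~1: approximately unit-variance noise
print(np.mean(inverse_anscombe(d)))    # ~10: nearly unbiased intensity estimate
```

Note that the naive algebraic inverse $(d/2)^2 - 3/8$ would underestimate the intensity at low counts, which is what the unbiased inverse corrects.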
2.3 The FoE image prior model
The FoE image prior model RothFOE () is defined by a set of linear filters and a potential function. This image prior model has been widely investigated for many classic image restoration problems due to its effectiveness chenTIP (); SamuelMRF (); GaoCVPR2010 (). Given an image $u \in \mathbb{R}^N$, the FoE prior model is formulated as the following energy functional

(2.5) $R(u) = \sum_{i=1}^{N_f} \alpha_i \sum_{p=1}^{N} \rho\big( (k_i * u)_p \big),$

where $N_f$ is the number of filters, $\{ k_i \}_{i=1}^{N_f}$ is a set of learned linear filters with corresponding weights $\alpha_i > 0$, $N$ is the number of pixels in image $u$, $k_i * u$ denotes the convolution of the filter $k_i$ with the image $u$, and $\rho(\cdot)$ denotes the potential function. It is clear that the FoE prior model is defined upon locally supported filter responses, oriented towards local directional and textural features.
In previous works, the potential function is usually chosen as a non-convex function. In this paper, we follow the common choice of the potential function $\rho$, defined as

(2.6) $\rho(z) = \log\left( 1 + z^2 \right).$
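For concreteness, here is a small sketch of how such an FoE energy can be evaluated, assuming the $\log(1+z^2)$ potential above and two hypothetical zero-mean derivative filters (the model in this paper uses 48 learned filters, not these):

```python
import numpy as np
from scipy.signal import convolve2d

def foe_energy(u, filters, alphas):
    """FoE regularization energy: sum over filters i and pixels p of
    alpha_i * rho((k_i * u)_p), with the potential rho(z) = log(1 + z^2)."""
    energy = 0.0
    for k, a in zip(filters, alphas):
        z = convolve2d(u, k, mode='valid')     # filter responses k_i * u
        energy += a * np.sum(np.log1p(z ** 2))  # log(1 + z^2), summed over pixels
    return energy

# Two illustrative zero-mean 3x3 filters (finite differences).
dx = np.array([[0., 0., 0.], [-1., 1., 0.], [0., 0., 0.]])
dy = dx.T

u_flat = np.ones((16, 16))                  # constant image: all responses zero
u_edge = np.zeros((16, 16)); u_edge[:, 8:] = 1.0
print(foe_energy(u_flat, [dx, dy], [1.0, 1.0]))   # 0.0: no penalty on flat areas
print(foe_energy(u_edge, [dx, dy], [1.0, 1.0]))   # > 0: filter responses at edges
```

The heavy-tailed $\log(1+z^2)$ penalty grows slowly for large responses, which is what lets strong image edges survive while small noisy responses are penalized.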
2.4 The iPiano algorithm
The iPiano algorithm is designed for solving optimization problems composed of a smooth (possibly non-convex) function $F$ and a convex (possibly non-smooth) function $G$:

(2.7) $\min_{x} \; F(x) + G(x).$

It is based on a forward-backward splitting scheme with an inertial force term. Its basic update rule is given as

(2.8) $x^{n+1} = \mathrm{prox}_{\alpha G}\left( x^n - \alpha \nabla F(x^n) + \beta\left( x^n - x^{n-1} \right) \right),$

where $\alpha$ and $\beta$ are the step-size parameters, $x^n - \alpha \nabla F(x^n)$ is the forward gradient descent step, and $\beta ( x^n - x^{n-1} )$ is the inertial term. The term $\mathrm{prox}_{\alpha G}$ denotes the standard proximal mapping nesterov2004introductory (), which is the backward step. The proximal mapping is given as the following minimization problem

(2.9) $\mathrm{prox}_{\alpha G}(\hat{x}) = \arg\min_{x} \; \frac{\| x - \hat{x} \|_2^2}{2} + \alpha\, G(x).$
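The update rule can be sketched generically as below; the quadratic toy objective, step sizes and iteration count are illustrative assumptions only, not the setting used in this paper:

```python
import numpy as np

def ipiano(x0, grad_f, prox_g, alpha=0.1, beta=0.75, iters=200):
    """Generic iPiano iteration:
    x^{n+1} = prox_{alpha*G}( x^n - alpha*grad F(x^n) + beta*(x^n - x^{n-1}) ),
    i.e. a forward-backward step plus the inertial term beta*(x^n - x^{n-1})."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        x_new = prox_g(x - alpha * grad_f(x) + beta * (x - x_prev), alpha)
        x_prev, x = x, x_new
    return x

# Toy problem for illustration: F(x) = 0.5*||x - b||^2 (smooth) and
# G(x) = indicator of {x >= 0}, whose prox is the projection onto x >= 0.
b = np.array([1.0, -2.0, 3.0])
grad_f = lambda x: x - b
prox_g = lambda x, a: np.maximum(x, 0.0)
x = ipiano(np.zeros(3), grad_f, prox_g)
print(x)   # -> approximately [1, 0, 3]
```

In the denoising models below, $F$ is the non-convex FoE regularizer and $G$ the (convex) data term, so only the cheap point-wise proximal map of the data term is needed per iteration.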
2.5 Lossspecific learning scheme to train the FoE prior model
In our work we exploit a loss-specific training scheme to learn FoE prior models for the Poisson denoising problem. Here we first summarize the general training framework; in later sections we repeatedly use it as a toolbox to train FoE prior models.
2.5.1 Lossspecific training model for the FoE prior
Given a set of clean/noisy image pairs $\{ (g^s, f^s) \}_{s=1}^{S}$, where $g^s$ and $f^s$ are the clean image and the associated noisy image corrupted by Poisson noise, respectively, the loss-specific training scheme learns optimized parameters $\theta$ by minimizing a certain predefined loss function. In summary, the learning model is formulated as the following bi-level optimization problem

(2.10) $\min_{\theta} \; L\big( u^*(\theta), g \big) \quad \text{s.t.} \quad u^*(\theta) = \arg\min_{u} \; E(u, \theta) = D(u, f) + R(u, \theta),$

where the lower-level problem is defined as the FoE prior regularized variational model, and the upper-level problem is defined by a loss function measuring the difference between the optimal solution $u^*(\theta)$ of the lower-level problem and the ground truth $g$. At present, we do not specify the form of the data term $D(u, f)$ in the lower-level problem, as it takes different forms in our work.
In the training phase, the FoE model is parameterized by $\theta_\alpha$, which is related to the weights of the filters, and by $\theta_k$, which is related to the filter coefficients. We use $\alpha_i = e^{\theta_{\alpha,i}}$ instead of $\alpha_i$ directly to ensure that the learned weights are positive. The dependency of the filters $k_i$ on $\theta_k$ is defined as

(2.11) $k_i = \sum_{j} \theta_{k,ij}\, b_j,$

where the $b_j$ are usually chosen as a set of zero-mean basis filters, to keep consistent with the findings in Huang1999_Statistics () that meaningful filters should be zero-mean.
2.5.2 Optimization for the training phase
Previous works chenTIP (); SamuelMRF () have demonstrated that it is possible to find a meaningful stationary point of the learning problem (2.10) via gradient-based optimization algorithms. The gradient of the loss function with respect to the parameters can be computed via implicit differentiation SamuelMRF ()

(2.12) $\frac{\partial L(u^*, g)}{\partial \theta} = -\left( \frac{\partial\, \nabla_u E(u^*, \theta)}{\partial \theta} \right)^{\!\top} \left( H_E(u^*) \right)^{-1} \frac{\partial L(u^*, g)}{\partial u^*}.$

In (2.12), $H_E(u^*)$ denotes the Hessian matrix of $E(u, \theta)$ with respect to $u$, evaluated at $u^*$,

$H_E(u^*) = \nabla_u^2 D(u^*, f) + \sum_{i=1}^{N_f} \alpha_i K_i^\top D_i K_i,$

where $K_i$ is a highly sparse matrix, implemented as the 2D convolution of the image with the filter kernel $k_i$, i.e., $K_i u = k_i * u$, and the diagonal matrix $D_i$ is given as

$D_i = \mathrm{diag}\big( \rho''\big( (k_i * u^*)_p \big) \big)$ with $p = 1, \dots, N$.
Then we arrive at the formulations, denoted (2.13), to compute the gradients of the loss with respect to the individual parameters $\theta_\alpha$ and $\theta_k$; they are obtained by substituting the corresponding derivatives of $\nabla_u E(u^*, \theta)$ into (2.12).
Note that all the expressions in (2.12) and (2.13) are evaluated at $u^*$, and therefore we first have to solve the lower-level problem for a given parameter vector before computing the gradient of the loss. Equation (2.13) gives the derivatives for the case of one training sample; for multiple training samples, the derivatives of the overall loss function with respect to the parameters are simply the sums of (2.13) over the training dataset.
Now we can exploit a gradient-based algorithm for optimization. In this work, we employed a quasi-Newton method, L-BFGS BFGS (). Let us summarize our training algorithm in Algorithm 1. Note that step (2) in Algorithm 1 is completed using the iPiano algorithm; more details are described in Section 3. The L-BFGS training algorithm is terminated when the relative change of the loss falls below a tolerance, a maximum number of iterations is reached, or L-BFGS cannot find a feasible step that decreases the loss.
2.6 Details of the training setup for Poisson denoising
It is shown in chenTIP () that, although the loss-specific training scheme is a discriminative training approach, the trained FoE image prior model is not highly tailored to the noise level used for training; on the contrary, it generalizes well to different noise levels. Therefore, in the setting of Poisson denoising, we likewise train the FoE image prior model only once, at a certain noise level, and then apply it to different noise levels.
Taking into account the fact that generic natural image priors are effective mainly at medium noise levels levin2011natural (), we conduct our training experiments on images with a relatively high peak value.
As we focus on the problem of Poisson denoising for natural images in a general sense, FoE prior models should be trained on natural image datasets. We conducted our training experiments using the training images of the BSDS300 image segmentation database amfm2011 (). We used all 200 training images and randomly sampled one patch from each, resulting in a total of 200 training samples. The gray value range of each image was first scaled to the chosen peak value, and the result was then corrupted with Poisson noise using the Matlab function poissrnd.
Concerning the model capacity of the trained FoE models, following a common filter setup for the FoE model in ChenGCPR (), we learned 48 filters of size 7 x 7. As we focus on zero-mean filters, we made use of a modified DCT-7 basis (the atom with constant entries is excluded) to construct the learned filters, and we also initialized the filters with this modified DCT-7 basis. All filters start with the same norm and weight, 0.1 and 1 respectively. (As shown in previous work ChenGCPR (), the training process is not sensitive to the initialization.)
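The modified DCT basis can be constructed as follows; this is a sketch under the assumption of 7 x 7 atoms with the constant (DC) atom removed, which leaves exactly 48 zero-mean basis filters:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix of size n x n (rows are basis vectors).
    C = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            C[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
        C[k] *= np.sqrt((1.0 if k == 0 else 2.0) / n)
    return C

n = 7
C = dct_matrix(n)
# 2D DCT basis for 7x7 patches: 49 separable atoms, the first of which
# (outer product of the two constant rows) is the constant DC atom.
atoms = [np.outer(C[a], C[b]) for a in range(n) for b in range(n)]
basis = atoms[1:]                          # exclude the constant atom
print(len(basis))                          # 48 atoms remain
print(max(abs(m.sum()) for m in basis))    # ~0: every remaining atom is zero-mean
```

Since each learned filter is a linear combination of these atoms (cf. (2.11)), the zero-mean property of the basis carries over automatically to the learned filters.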
As shown in Algorithm 1, the main computational burden in the training phase is as follows: for each training sample, we need to solve the lower-level problem with iPiano and then calculate the gradients with respect to the parameters according to (2.13), which involves a matrix inverse operation. With our unoptimized Matlab implementation, the training time for the setup considered in this work (48 filters of size 7 x 7) was approximately 24 hours on an Intel X5675 CPU (3.07 GHz).
3 Proposed Variational Models for Poisson Noise Reduction
In this section, we propose several variational models to incorporate FoE priors for the task of Poisson denoising. As we will investigate two different FoE prior models, we distinguish them for clarity as follows:

- the FoE prior trained in the original image domain with the data term derived from the Poisson noise statistics;

- the FoE prior trained in the Anscombe transform domain with a quadratic data term.
3.1 A straightforward formulation
We start with a straightforward formulation to integrate an FoE prior for Poisson denoising. It is given as the following energy minimization problem

(3.1) $\min_{u > 0} \; E(u) = R(u) + \lambda\, \langle u - f \log u, \mathbf{1} \rangle,$

where $\lambda > 0$ is a trade-off parameter. This model is a direct combination of the FoE prior model (2.5) and the data term (2.2) derived from the Poisson noise statistics. Regarding the FoE prior model, we exploited the loss-specific training scheme described in Section 2.5, where the lower-level problem is defined as the above Poisson denoising model, i.e., we have the data term

$D(u, f) = \lambda\, \langle u - f \log u, \mathbf{1} \rangle,$

where $f$ is the noisy input. With respect to the upper-level problem, we considered the quadratic loss function

$L(u^*, g) = \frac{1}{2}\, \| u^* - g \|_2^2,$

with $g$ denoting the ground-truth (clean) image. It is then easy to check that

$\frac{\partial L}{\partial u^*} = u^* - g \quad \text{and} \quad \nabla_u^2 D(u^*, f) = \mathrm{diag}\!\left( \frac{\lambda f_i}{(u_i^*)^2} \right),$

which are required to compute the gradients in (2.13). Now we can run training experiments based on this Poisson denoising model using Algorithm 1, obtaining a specialized FoE prior trained directly on the Poisson denoising model (3.1) in the image domain. The learned filters are shown in Figure 3(a).
The corresponding minimization problem (3.1) with the trained prior can be solved by the iPiano algorithm. Casting (3.1) in the form of (2.7), we have $F(u) = R(u)$ and $G(u) = \lambda\, \langle u - f \log u, \mathbf{1} \rangle$, which is convex for $u > 0$. It is easy to check that

(3.2) $\nabla F(u) = \sum_{i=1}^{N_f} \alpha_i\, K_i^\top \rho'(K_i u),$

and the proximal mapping with respect to $G$ is given by the following point-wise operation

(3.3) $u_p = \frac{1}{2} \left( \hat{u}_p - \alpha\lambda + \sqrt{ \left( \hat{u}_p - \alpha\lambda \right)^2 + 4\, \alpha\lambda\, f_p } \right),$

where $\hat{u}$ denotes the argument of the proximal mapping in (2.8).
The calculation rule (3.3) for solving the proximal mapping guarantees that the positivity constraint on $u$ is fulfilled whenever the noisy input image $f$ is strictly positive.
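Per pixel, this proximal map amounts to taking the nonnegative root of a quadratic equation. A sketch (the inputs and the value of the product $\alpha\lambda$, here called `tau_lambda`, are arbitrary illustrations):

```python
import numpy as np

def prox_poisson(u_hat, f, tau_lambda):
    """Point-wise proximal map of the I-divergence data term
    tau_lambda * <u - f log u, 1>: per pixel, solve
        min_u (u - u_hat)^2 / 2 + tau_lambda * (u - f*log(u)),
    whose optimality condition u^2 + (tau_lambda - u_hat)*u - tau_lambda*f = 0
    has the nonnegative closed-form root below."""
    c = u_hat - tau_lambda
    return 0.5 * (c + np.sqrt(c ** 2 + 4.0 * tau_lambda * f))

u_hat = np.array([2.0, 0.5, 0.0])
f = np.array([3.0, 1.0, 0.0])
u = prox_poisson(u_hat, f, 0.1)
print(u)   # positive wherever f > 0; a pixel with f = 0 and u_hat = 0 stays at 0
# The first pixel satisfies the per-pixel optimality condition:
print(u[0] - u_hat[0] + 0.1 - 0.1 * f[0] / u[0])   # ~0
```

The last pixel of this toy example illustrates the zero-sticking behavior discussed next: with $f_p = 0$ the map can leave the pixel at 0.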
We show a Poisson denoising example using model (3.1) with the trained prior in Figure 1(c). We also present the result of the BM3D-based algorithm INAnscombe5 () for comparison. A quick glance at our result shows that the investigated model (3.1) achieves an overall good result.
3.1.1 Problem of model (3.1) for cases of low peak value
In (3.1), it is clear that for pixels with $f_i = 0$, the data term attains its minimum at $u_i = 0$. In other words, if these pixels cannot be filled in by the prior term, they will not be updated and remain fixed at 0 during the iterative process. In the example above, as only a few pixels have gray value 0, these points can be filled in by the prior term.
However, in cases of low peak value, large regions with many zeros frequently appear in the image. Meanwhile, noisy pixels with very high intensity also appear within these regions, so that strong edges form. As these regions are significantly larger than the filter size of the FoE prior, the prior term is not able to fill in those zeros. As a consequence, pixels with zero intensity cannot be updated and remain fixed at 0 during the iterations. Finally, these noisy strong edges are retained instead of being smoothed out. An example is shown in Figure 2, where large patches with many zeros appear in the dark parts of the image, e.g., the coat of the cameraman.
If we directly apply the model (3.1) with the image-domain prior to this problem, we obtain the result shown in Figure 2(d). As expected, there is a problem in the regions with many zeros. We can partly remedy this by setting the zero-valued pixels of the noisy image to a small positive constant. The result is given in Figure 2(e). This indeed improves performance in the dark regions, but it is still unsatisfactory. A larger constant can indeed remove the artifacts in these dark regions, but it introduces an overall luminance bias, as it changes the original input image too much.
It turns out that the straightforward model (3.1) is not suitable for cases of low peak value, which motivates us to consider other alternatives for Poisson denoising. Inspired by the BM3D-based algorithm INAnscombe5 (), which performs Poisson denoising based on the Anscombe variance-stabilizing transformation, we also attempt to exploit this strategy, i.e., performing noise reduction in the transform domain. Note that pixels with value 0 are mapped to the positive value $2\sqrt{3/8}$ by the Anscombe transform (2.3), so the aforementioned problem of model (3.1) does not arise in low-peak cases.
3.2 Poisson denoising in the Anscombe transform domain
Following the strategy of INAnscombe5 (), we consider the following three-step procedure to integrate the FoE prior model for Poisson denoising. First, the noisy data $f$ is transformed by applying the Anscombe transform (2.3), generating the data $\tilde{f} = \mathcal{A}(f)$. The second step is to denoise the transformed data via a variational model based on the trained FoE image prior. Finally, an estimate $\hat{u}$ of the underlying image is obtained by applying the inverse transform (2.4) to the denoised result $v^*$. Therefore, the overall denoising procedure is given by

(3.4) $\hat{u} = \mathcal{I}(v^*), \quad v^* = \arg\min_{v} \; R(v) + \frac{\lambda}{2}\, \big\| v - \mathcal{A}(f) \big\|_2^2.$
The second step is the key point of the proposed Poisson denoising algorithm, where we make use of a quadratic data term, since the noise in the transformed data can be approximately treated as additive Gaussian noise. The proximal mapping with respect to this new data term is given by the following point-wise operation

(3.5) $v_p = \frac{ \hat{v}_p + \alpha\lambda\, \tilde{f}_p }{ 1 + \alpha\lambda },$
which is required by the iPiano algorithm.
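This point-wise rule is a simple convex combination of the iterate and the transformed data; a sketch with made-up numbers:

```python
import numpy as np

def prox_quadratic(v_hat, f_tilde, tau_lambda):
    """Point-wise proximal map of the quadratic data term
    (lambda/2)*||v - f_tilde||^2 used in the Anscombe domain: per pixel,
        min_v (v - v_hat)^2 / 2 + tau_lambda * (v - f_tilde)^2 / 2
    => v = (v_hat + tau_lambda * f_tilde) / (1 + tau_lambda)."""
    return (v_hat + tau_lambda * f_tilde) / (1.0 + tau_lambda)

v_hat = np.array([4.0, 6.0])      # current (inertial) iterate
f_tilde = np.array([5.0, 5.0])    # Anscombe-transformed noisy data
v = prox_quadratic(v_hat, f_tilde, 1.0)
print(v)   # -> [4.5, 5.5]: a weighted average of the iterate and the data
```

Larger `tau_lambda` (i.e., larger $\alpha\lambda$) pulls the solution toward the data, smaller values toward the prior-driven iterate.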
Note that the FoE prior model in (3.4) operates in the Anscombe root transform domain, which changes the dynamic range of the image nonlinearly. Therefore, we need to retrain the FoE model directly on the variational model in (3.4). Comparing the two sets of filters learned in the original image domain and in the Anscombe transform domain, respectively, as shown in Figure 3, one can see that they have different appearances, especially for the filters with relatively high weights.
We still make use of the loss-specific training scheme described in Section 2. Now, the lower-level problem is given as

$v^*(\theta) = \arg\min_{v} \; R(v, \theta) + \frac{1}{2}\, \big\| v - \mathcal{A}(f) \big\|_2^2,$

where we omit the trade-off parameter $\lambda$, as it can be incorporated into the weights $\alpha_i$. The corresponding upper-level problem is defined by the quadratic loss function

$L(v^*, g) = \frac{1}{2}\, \big\| v^* - \mathcal{A}(g) \big\|_2^2,$

with $g$ denoting the ground-truth image. It is easy to check that $\frac{\partial L}{\partial v^*} = v^* - \mathcal{A}(g)$, and that the Hessian of the quadratic data term is the identity matrix, so the matrix $H_E$ in (2.12) is given as

$H_E(v^*) = I + \sum_{i=1}^{N_f} \alpha_i K_i^\top D_i K_i,$

with $D_i = \mathrm{diag}\big( \rho''\big( (k_i * v^*)_p \big) \big)$. Then we can exploit Algorithm 1 to train FoE prior models in the Anscombe transform domain. We refer to the resulting model as the FoE prior trained in the Anscombe transform domain.
3.3 An ad hoc data term for low-peak cases in the Anscombe transform domain
For the denoising method in the Anscombe transform domain, the corresponding data term is usually chosen as a quadratic function, derived from the Gaussian noise statistics. Once we have obtained a new FoE prior model for the Anscombe-transformed data, we can apply it to Poisson denoising at various noise levels by using the framework (3.4). (Note that, as the exploited FoE prior model is trained at a single peak value, we need to tune the trade-off parameter $\lambda$ for other specific noise levels such that the result is optimized. Tuning $\lambda$ is effortless, and a detailed guideline will be presented in the implementation after acceptance.) An example is shown in Figure 1(d). One can see that, in terms of both quantitative results and visual inspection, it achieves better results than the straightforward model, and moreover, it is strongly competitive with the BM3D-based algorithm INAnscombe5 ().
Even though the proposed model (3.4) works quite well in cases with relatively high peak values, we find in experiments that its performance for cases with low peak values is not satisfactory, as it generally leads to somewhat over-smoothed results. An example is shown in Figure 4(b). We attribute this to the data term: the noise statistics of the transformed data can only be approximately treated as additive Gaussian, and the approximation error increases dramatically in low-count cases. Therefore, it is better to exploit a different data term in these cases, rather than employing the same quadratic data term.
In practice, we find that a naive attempt to exploit the data term (2.2) derived directly from the Poisson noise statistics generally provides preferable results in these cases, i.e., we replace the second step in the framework (3.4) with the following variational model

(3.6) $v^* = \arg\min_{v} \; R(v) + \lambda\, \langle v - \mathcal{A}(f) \log v, \mathbf{1} \rangle.$

Note that the FoE prior model here is still the one trained in the Anscombe transform domain, and this model operates in the transform domain; we use it only for the low-peak cases.
The result of this modified version is shown in Figure 4(c). Comparing the quantitative indices and the visual quality, we can see that when the peak value is very low, the modified model (3.6) generates preferable results in terms of detail preservation, especially around the tassel of the hat.
In order to further investigate the influence of the data term, we also consider a Poisson denoising experiment for a case with relatively high peak value, using the two different data terms in (3.4) and (3.6). The corresponding results are shown in Figure 4(e) and (f). One can see that for this case, the conventional data term derived from Gaussian noise performs better. As a consequence, it is reasonable to adopt adaptive data terms in the proposed framework to accommodate different situations.
In order to better validate this adaptation, we repeated the above evaluation on the eight test images in Fig. 6 and summarize the average performance in terms of PSNR and MSSIM in Table 1. The numerical results strengthen the case for the adaptation described above.
Although there is no definitive interpretation of why replacing the quadratic data term with (3.6) is a good choice for low peak values, we can find some clues in Figure 5: the I-divergence data term (3.6) is quite similar to the quadratic data term to the right of its minimizer, while there is some difference to the left, i.e., this modification makes a difference, but not a huge one.
An additional Poisson denoising example for a low peak value using the model (3.6) is shown in Figure 2(c). One can see that the resulting performance is even better than that of the BM3D-based algorithm INAnscombe5 ().
Table 1: Average performance (PSNR/MSSIM) over the eight test images.

            Model (3.4)    Model (3.6)
peak = 2    22.93/0.61     23.34/0.64
peak = 7    25.80/0.72     25.57/0.70
3.4 The final selected FoE-based variational model for Poisson denoising
After a comprehensive investigation of several variants incorporating the FoE prior model for Poisson denoising, we arrive at the following conclusions:

- It is better to formulate the FoE-based Poisson denoising model in the Anscombe transform domain.

- An adaptive choice of the data term for different noise levels is helpful.
As a rule of thumb, peak = 5 is used as the threshold: as indicated in Table 1, for peak values above 5 we employ the model (3.4), and for peak values of 5 and below we use the model (3.6). Therefore, our best performing version is given as

(3.7) $\hat{u} = \mathcal{I}(v^*), \quad v^* = \begin{cases} \arg\min_{v} \; R(v) + \frac{\lambda}{2}\, \| v - \mathcal{A}(f) \|_2^2 & \text{if peak} > 5, \\[4pt] \arg\min_{v} \; R(v) + \lambda\, \langle v - \mathcal{A}(f) \log v, \mathbf{1} \rangle & \text{if peak} \le 5. \end{cases}$

Note that, in the binning case, the threshold is still peak = 5; however, the peak value should be determined from the binned image. In the following test procedure, we will only present the results obtained by this final version for the sake of brevity.
4 Comparison to the State of the Art
In this section, we report numerical experiments on Poisson noise reduction with the 8 test images in Fig. 6. We compare the proposed algorithm with NLSPCA Salmon2 () and BM3D with the exact unbiased inverse Anscombe transform INAnscombe5 () (both with and without binning), which usually provide state-of-the-art results. The corresponding codes were downloaded from the authors' homepages and used as is. For the binning technique, we closely follow Salmon2 (): a 3 x 3 ones kernel is used to increase the peak value by a factor of 9, and bilinear interpolation is used to upscale the low-resolution recovered image. Moreover, in our study we take one particular noise realization with the Matlab commands randn('seed',0); rand('seed',0), because we want to provide deterministic and easily reproducible results for a direct comparison between different algorithms. Two commonly used quality measures are taken to evaluate the Poisson denoising performance: PSNR and the mean structural similarity index (MSSIM) SSIM (). For brevity, the proposed model (3.7) is abbreviated as FoEPNRbin when the binning technique is adopted, and FoEPNR otherwise.
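The binning step can be sketched as a strided 3 x 3 block sum (the clean intensity below is a made-up constant; the division by 9 and the bilinear upscaling of the denoised result, described in the text, are omitted here):

```python
import numpy as np

def bin3x3(f):
    # Sum disjoint 3x3 blocks (a "ones" kernel with stride 3). Sums of
    # independent Poisson variables are again Poisson, so the binned image
    # has ~9x larger counts, i.e. a ~9x higher peak value.
    h, w = f.shape
    h3, w3 = h - h % 3, w - w % 3          # crop to a multiple of 3
    return f[:h3, :w3].reshape(h3 // 3, 3, w3 // 3, 3).sum(axis=(1, 3))

rng = np.random.default_rng(0)
u = np.full((12, 12), 0.5)                 # low-count clean intensity
f = rng.poisson(u).astype(np.float64)
fb = bin3x3(f)
print(f.shape, fb.shape)                   # (12, 12) -> (4, 4)
print(f.mean(), fb.mean())                 # binned counts are ~9x larger on average
# The binned image would then be denoised, and the estimate upscaled back
# to full resolution, e.g. with bilinear interpolation.
```

The trade-off is spatial resolution for count statistics, which is why binning helps most at the very lowest peak values.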
4.1 Experimental results
We evaluated the performance of the proposed variational model (3.7) (with and without binning) for various images in Fig. 6 with different peak values ranging from 0.1 to 40.
Examining the recovered images in Fig. 7, we see that, compared with the other methods, the proposed algorithm FoEPNRbin is more accurate at capturing strong edges and details, especially in the bright parts. It is worth noting that, on close observation, for low peak values the proposed method with the binning technique captures the structure of the image better, and therefore provides higher PSNR/MSSIM values.
Table 2. PSNR (dB) / MSSIM results for the 8 test images of Fig. 6 at peak values from 0.1 to 40.

Peak | Method     | Boat       | Cameraman  | Fluocells  | House      | Lena       | Bridge     | Pepper     | Man        | Average
0.1  | NLSPCA     | 18.27/0.41 | 16.56/0.40 | 21.42/0.49 | 17.44/0.44 | 18.30/0.47 | 16.60/0.22 | 15.62/0.39 | 18.15/0.41 | 17.80/0.40
     | NLSPCAbin  | 18.80/0.45 | 17.19/0.48 | 21.45/0.50 | 18.09/0.52 | 19.03/0.59 | 16.92/0.22 | 15.19/0.45 | 18.41/0.46 | 18.14/0.46
     | BM3D       | 16.34/0.33 | 15.65/0.34 | 20.68/0.42 | 15.92/0.38 | 16.78/0.42 | 15.96/0.21 | 15.22/0.42 | 16.71/0.36 | 16.66/0.36
     | BM3Dbin    | 18.89/0.46 | 16.99/0.51 | 21.45/0.51 | 17.84/0.57 | 18.78/0.59 | 17.03/0.22 | 15.44/0.49 | 18.31/0.47 | 18.09/0.48
     | FoEPNR     | 18.68/0.42 | 16.81/0.41 | 21.41/0.48 | 18.23/0.50 | 18.26/0.49 | 17.35/0.22 | 15.61/0.43 | 18.37/0.41 | 18.09/0.42
     | FoEPNRbin  | 19.28/0.47 | 17.66/0.51 | 21.70/0.52 | 18.47/0.61 | 19.07/0.59 | 17.25/0.23 | 16.07/0.51 | 18.68/0.47 | 18.52/0.49
0.2  | NLSPCA     | 19.11/0.41 | 17.65/0.37 | 22.45/0.53 | 18.44/0.42 | 19.54/0.50 | 17.45/0.24 | 16.94/0.44 | 19.18/0.43 | 18.85/0.42
     | NLSPCAbin  | 19.58/0.47 | 17.82/0.53 | 21.98/0.52 | 19.15/0.57 | 19.73/0.61 | 17.45/0.24 | 16.24/0.51 | 19.01/0.48 | 18.87/0.49
     | BM3D       | 18.38/0.40 | 17.06/0.42 | 22.23/0.51 | 17.89/0.48 | 18.96/0.50 | 17.09/0.25 | 16.67/0.50 | 18.68/0.44 | 18.40/0.46
     | BM3Dbin    | 19.99/0.47 | 18.08/0.54 | 22.46/0.55 | 19.03/0.58 | 20.29/0.61 | 17.82/0.25 | 16.73/0.52 | 19.66/0.68 | 19.26/0.53
     | FoEPNR     | 19.78/0.43 | 17.85/0.49 | 22.19/0.54 | 19.03/0.56 | 19.35/0.56 | 17.81/0.25 | 16.36/0.50 | 19.39/0.47 | 18.97/0.48
     | FoEPNRbin  | 20.03/0.49 | 18.52/0.58 | 22.58/0.57 | 18.54/0.63 | 20.53/0.64 | 17.53/0.26 | 17.07/0.54 | 19.76/0.50 | 19.32/0.53
0.5  | NLSPCA     | 20.42/0.45 | 19.06/0.51 | 23.73/0.59 | 20.29/0.54 | 21.26/0.57 | 18.46/0.28 | 18.34/0.55 | 20.60/0.49 | 20.27/0.50
     | NLSPCAbin  | 20.17/0.48 | 18.31/0.54 | 22.51/0.54 | 20.47/0.61 | 20.57/0.62 | 18.19/0.26 | 16.93/0.53 | 19.74/0.49 | 19.61/0.51
     | BM3D       | 20.07/0.47 | 18.64/0.52 | 23.81/0.58 | 19.71/0.58 | 21.13/0.59 | 18.15/0.28 | 18.18/0.56 | 20.40/0.49 | 20.07/0.57
     | BM3Dbin    | 21.06/0.50 | 19.53/0.58 | 23.85/0.60 | 21.25/0.64 | 22.37/0.64 | 18.53/0.28 | 18.55/0.59 | 21.09/0.52 | 20.78/0.54
     | FoEPNR     | 21.01/0.50 | 19.29/0.57 | 23.57/0.58 | 20.95/0.64 | 21.95/0.63 | 18.59/0.29 | 18.71/0.58 | 20.96/0.51 | 20.63/0.54
     | FoEPNRbin  | 21.11/0.51 | 19.85/0.62 | 24.03/0.63 | 21.18/0.66 | 22.21/0.66 | 18.49/0.29 | 18.66/0.60 | 21.01/0.52 | 20.82/0.56
1    | NLSPCA     | 21.17/0.49 | 20.26/0.61 | 24.61/0.63 | 21.54/0.61 | 22.53/0.64 | 18.98/0.29 | 19.32/0.61 | 21.41/0.52 | 21.23/0.55
     | NLSPCAbin  | 20.20/0.48 | 18.41/0.55 | 22.64/0.54 | 20.64/0.63 | 20.66/0.62 | 18.30/0.26 | 17.01/0.54 | 19.88/0.49 | 19.72/0.51
     | BM3D       | 21.40/0.51 | 20.40/0.60 | 24.65/0.62 | 21.80/0.65 | 22.65/0.64 | 19.38/0.33 | 20.00/0.62 | 21.64/0.53 | 21.49/0.56
     | BM3Dbin    | 21.90/0.52 | 20.49/0.62 | 24.86/0.64 | 22.68/0.69 | 23.54/0.68 | 19.53/0.32 | 20.14/0.64 | 22.00/0.54 | 21.89/0.58
     | FoEPNR     | 22.01/0.53 | 20.85/0.63 | 24.87/0.64 | 22.41/0.68 | 23.28/0.67 | 19.73/0.34 | 20.39/0.66 | 22.04/0.55 | 21.95/0.59
     | FoEPNRbin  | 21.88/0.52 | 20.73/0.64 | 25.09/0.65 | 22.18/0.68 | 23.35/0.69 | 19.32/0.33 | 19.92/0.64 | 21.84/0.54 | 21.79/0.59
2    | NLSPCA     | 21.76/0.52 | 20.69/0.63 | 25.34/0.65 | 23.23/0.70 | 23.88/0.68 | 19.49/0.31 | 20.25/0.65 | 22.33/0.55 | 22.12/0.59
     | NLSPCAbin  | 20.29/0.48 | 18.33/0.55 | 22.69/0.54 | 20.85/0.64 | 20.91/0.63 | 18.32/0.26 | 17.12/0.54 | 19.88/0.49 | 19.80/0.52
     | BM3D       | 22.70/0.55 | 22.07/0.64 | 26.06/0.68 | 23.38/0.66 | 24.16/0.66 | 20.25/0.39 | 21.73/0.67 | 23.01/0.58 | 22.92/0.60
     | BM3Dbin    | 22.78/0.56 | 21.34/0.66 | 26.00/0.67 | 23.90/0.71 | 24.69/0.71 | 20.15/0.35 | 21.35/0.68 | 22.98/0.58 | 22.90/0.62
     | FoEPNR     | 23.15/0.58 | 22.74/0.71 | 26.25/0.69 | 24.09/0.73 | 24.68/0.71 | 20.51/0.39 | 22.05/0.72 | 23.27/0.60 | 23.34/0.64
     | FoEPNRbin  | 22.76/0.55 | 21.51/0.64 | 26.07/0.67 | 23.55/0.70 | 24.59/0.72 | 20.04/0.38 | 21.43/0.68 | 22.90/0.56 | 22.86/0.61
4    | NLSPCA     | 22.43/0.54 | 21.04/0.65 | 26.33/0.68 | 24.28/0.72 | 24.48/0.70 | 20.19/0.35 | 20.95/0.67 | 22.91/0.57 | 22.83/0.61
     | NLSPCAbin  | 20.29/0.48 | 18.33/0.55 | 22.68/0.54 | 20.92/0.64 | 20.78/0.63 | 18.37/0.26 | 16.92/0.54 | 19.86/0.49 | 19.77/0.52
     | BM3D       | 24.24/0.61 | 23.96/0.71 | 27.63/0.73 | 25.73/0.72 | 26.05/0.71 | 21.54/0.47 | 23.62/0.74 | 24.37/0.63 | 24.64/0.66
     | BM3Dbin    | 23.66/0.59 | 21.95/0.68 | 26.96/0.70 | 25.08/0.74 | 25.77/0.74 | 20.80/0.39 | 22.40/0.72 | 23.84/0.61 | 23.81/0.65
     | FoEPNR     | 24.33/0.62 | 24.41/0.76 | 27.55/0.73 | 26.03/0.77 | 26.11/0.75 | 21.51/0.45 | 23.55/0.77 | 24.31/0.63 | 24.73/0.69
     | FoEPNRbin  | 23.61/0.58 | 22.08/0.66 | 27.07/0.68 | 25.08/0.74 | 25.66/0.74 | 20.92/0.42 | 22.57/0.72 | 23.87/0.60 | 23.86/0.64
40   | NLSPCA     | 22.94/0.56 | 21.16/0.65 | 26.99/0.69 | 24.49/0.73 | 25.10/0.72 | 20.36/0.35 | 20.92/0.67 | 23.32/0.58 | 23.16/0.62
     | NLSPCAbin  | 20.02/0.47 | 18.19/0.55 | 22.49/0.53 | 20.17/0.63 | 19.13/0.60 | 17.78/0.24 | 15.97/0.51 | 19.49/0.48 | 19.16/0.50
     | BM3D       | 29.39/0.79 | 29.17/0.85 | 32.62/0.88 | 32.06/0.86 | 31.68/0.86 | 25.58/0.73 | 29.64/0.88 | 28.95/0.80 | 29.89/0.83
     | BM3Dbin    | 25.56/0.68 | 23.07/0.73 | 29.93/0.81 | 27.36/0.79 | 28.44/0.81 | 22.63/0.54 | 24.87/0.81 | 26.13/0.71 | 26.00/0.74
     | FoEPNR     | 29.20/0.78 | 28.93/0.85 | 32.43/0.87 | 31.31/0.85 | 31.33/0.85 | 25.79/0.76 | 29.55/0.88 | 28.92/0.80 | 29.68/0.80
     | FoEPNRbin  | 25.63/0.69 | 22.92/0.74 | 30.08/0.82 | 27.43/0.79 | 28.45/0.81 | 22.73/0.56 | 24.95/0.82 | 26.25/0.72 | 26.06/0.74
In Fig. 8 to Fig. 10, the recovered results are reported for peak = 1, 2 and 4, respectively. It can be observed that FoEPNR and BM3D perform best at detail preservation and achieve clearly better results in terms of the PSNR/MSSIM indexes. Note that the non-local technique BM3D is affected by so-called ghost artifacts: structured, signal-like patches that appear in homogeneous areas of the image, originating from random noise and reinforced through the patch selection process. The essential idea of BM3D is that only a few pixels with the most similar context are selected for the estimation. As a consequence, the selection process is inevitably influenced by the noise itself, especially in flat areas of the image, which can be dangerously self-referential.
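Since the BM3D variants compared here operate on Anscombe-transformed data, the behavior discussed below is easy to verify numerically. The sketch that follows is our own illustration (not part of the compared implementations): it applies the forward transform 2*sqrt(x + 3/8) to Poisson samples and checks how close the stabilized variance comes to 1, which holds only once the counts are sufficiently large.

```python
import numpy as np

def anscombe(x):
    """Forward Anscombe variance-stabilizing transform for Poisson data."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe_asymptotic(y):
    """Simple algebraic inverse; biased at low counts, which is why the
    exact unbiased inverse of Makitalo & Foi is used in the BM3D pipeline."""
    return (y / 2.0) ** 2 - 3.0 / 8.0

# empirical check: the stabilized variance approaches 1 as the mean grows,
# but deviates strongly in the low-count regime (peak around 1 and below)
rng = np.random.default_rng(0)
for lam in (0.5, 2.0, 10.0, 40.0):
    var = anscombe(rng.poisson(lam, size=200_000)).var()
    print(f"lambda = {lam:4.1f}  stabilized variance = {var:.3f}")
```

This breakdown at low counts is exactly what motivates the specialized data terms of the proposed FoE-based methods for low peak values.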
The denoising results in terms of PSNR (in dB) and MSSIM are summarized in Table 2. Comparing the indexes in Table 2 with the denoising results shown in the figures, we can draw the following conclusions.

For low peak values, where the Anscombe transform is less effective, the best indexes are achieved by our proposed FoE-based methods. The proposed methods employ a specialized FoE model trained for the Poisson denoising problem together with adaptive data terms accommodating different Poisson noise levels, whereby they obtain better results in the low-peak regime. As the peak value increases, the accuracy of the Anscombe transform improves, leading to a performance improvement of the BM3D-based Poisson denoising algorithm. Note that, in terms of overall performance, our proposed variational model and the BM3D-based method behave similarly.

At rather low peak values, the introduction of the binning technique yields a significant performance enhancement. This is reasonable because, when the peak value is extremely low, aggregating the noisy Poisson pixels into small bins produces a smaller Poisson image with lower resolution but higher counts per pixel. Since denoising then operates on a smaller image, an obvious additional advantage of binning is a significantly reduced computational burden. However, as the peak value rises, binning becomes less effective: once the peak value is no longer very low, there is no need to suppress the noise via binning at the cost of a resolution loss that weakens or even eliminates many image details.

In terms of MSSIM, the best overall performance is provided by the proposed methods (FoEPNR and FoEPNRbin) and the BM3D-based method, with the former slightly better, indicating that our method is more powerful at preserving geometry, as can be visually perceived in Fig. 7.

Regarding visual quality, the typical structured artifacts encountered with the BM3D-based algorithm do not appear when the proposed methods are used. Meanwhile, we found that our method introduces block-type artifacts for low peak values, e.g., in Fig. 8(d). The main reason is that our method is a local model, which becomes less effective at inferring the underlying structure solely from local neighborhoods when the input image is too noisy.
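The binning-and-upscaling pipeline discussed above can be sketched as follows. This is a hedged illustration under the assumption of non-overlapping 3x3 summation (consistent with the 9x peak increase mentioned in the experimental setup), not the exact implementation of Salmon2 (); the function names are our own.

```python
import numpy as np

def bin_image(img, b=3):
    """Sum counts over non-overlapping b-by-b blocks (a 'ones' kernel with
    stride b). Sums of independent Poisson variables are Poisson, so the
    binned image has roughly b*b times higher counts at 1/b the resolution."""
    H, W = img.shape
    H, W = H - H % b, W - W % b               # crop to a multiple of b
    return img[:H, :W].reshape(H // b, b, W // b, b).sum(axis=(1, 3))

def bilinear_upscale(img, b=3):
    """Plain bilinear interpolation back to roughly the original grid."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * b)
    xs = np.linspace(0, w - 1, w * b)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

rng = np.random.default_rng(0)
noisy = rng.poisson(0.5, size=(60, 60)).astype(float)   # very low peak: severe noise
binned = bin_image(noisy)                               # ~9x higher counts, 20x20
restored = bilinear_upscale(binned / 9.0)               # denoise would go in between
print(noisy.shape, binned.shape, restored.shape)
```

In practice the denoiser is applied to the binned image before rescaling and upscaling; the resolution loss in this round trip is what limits binning once the peak is no longer very low.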
To better describe the typical artifacts introduced by the proposed local methods, we briefly analyze the denoising results within the red rectangles of Fig. 7(e) and Fig. 8(e). Observing the image structure within the red rectangle of Fig. 8(e), it is understandable that FoEPNR recovers the result in Fig. 8(d), which appears more similar to the underlying feature in the noisy image than Fig. 8(c). However, the result recovered by the BM3D-based method within the red rectangle is smoother and closer to the original clean image than that of FoEPNR. A similar phenomenon can be observed in Fig. 7, where FoEPNR produces more serious artifacts than the BM3D-based method does. As mentioned above, for very noisy input images the local method becomes less effective, so spurious features caused by the noise disturb the denoised result of FoEPNR. Even if such an incorrect feature is not very obvious, the trained FoE filters will still extract and amplify it. In this case, the advantage of non-locality comes through: by selecting the most similar contexts for estimating the underlying image intensity, the denoised result at a given location stays consistent with the information of its similar contexts. However, if the incorrect feature is apparent enough, both the non-local and local methods will extract it, as analyzed for Fig. 7(c) and (d).
It is worth noting that in the aforementioned experiments we employ one fixed noise realization, because we want to provide deterministic and easily reproducible results for a direct comparison between different algorithms. For a more comprehensive evaluation, we also run 100 trials to compare the performance of the proposed method and the BM3D-based method on one image under many noise realizations, with the peak value set to 1. The average PSNR/MSSIM values are presented in Table 3, from which we can see that the gap in the average PSNR/MSSIM over 100 different noise realizations is similar to the gap in PSNR/MSSIM obtained with the fixed pseudo-random noise realization adopted in our study. Moreover, for 100 repetitions of Table 2, the total time consumed by NLSPCA alone (cf. Table 4) would be about 1490 hours. Therefore, we have not presented results based on many noise realizations for all settings, in consideration of the computation time. Nevertheless, the results for one fixed noise realization already elucidate the differences between the tested algorithms.
Table 3. PSNR/MSSIM at peak = 1 for the fixed noise realization used above and for the average over 100 random realizations.

                              | BM3D       | FoEPNR
Fixed realization             | 20.40/0.60 | 20.85/0.63
Average over 100 realizations | 20.36/0.61 | 20.80/0.64
4.2 Run time
Table 4. Typical run time in seconds for the two image sizes used in this paper (single-threaded Matlab; the GPU run time of FoEPNR is given in parentheses).

BM3D | NLSPCA | FoEPNR
1.38 | 367.9  | 27.31 (0.20)
4.6  | 1122.1 | 52.75 (0.63)
It is worthwhile to note that our model is a local model with a simple structure, as it merely involves convolutions of linear filters with an image, and therefore differs markedly from non-local models (e.g., the BM3D-based Poisson denoising method). Our approach corresponds to a non-convex minimization problem, which is efficiently solved by an iterative algorithm, iPiano. Each iteration consists of convolution operations with the trained filters and a few additional pixel-wise calculations. The overall number of required iterations varies with the noise level: for relatively low peak values it generally takes 150 iterations, while for relatively high peak values 60 iterations are usually sufficient. In Table 4, we report the typical run time of our model for images of the two different dimensions exploited in this paper. For comparison, we also present the run time of two competing algorithms (all methods are run in Matlab with single-threaded computation for the CPU implementation; we only consider the versions without the binning technique).
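For reference, the iPiano update used for the inference can be sketched generically. The toy objective below (a quadratic data term with an l1 penalty, chosen only to keep the example self-contained; the actual model minimizes the trained FoE prior with an Anscombe-derived data term) illustrates the inertial proximal step and the step-size rule from Ochs et al.:

```python
import numpy as np

def ipiano(grad_f, prox_g, x0, L, beta=0.5, iters=150):
    """Generic iPiano iteration (Ochs et al., 2014):
        x_{k+1} = prox_{alpha g}(x_k - alpha*grad_f(x_k) + beta*(x_k - x_{k-1}))
    The inertial term beta*(x_k - x_{k-1}) helps the iterate escape poor
    stationary points when f is non-convex. Convergence requires
    alpha < 2*(1 - beta)/L, where L is the Lipschitz constant of grad_f."""
    alpha = 0.9 * 2.0 * (1.0 - beta) / L
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_new = prox_g(x - alpha * grad_f(x) + beta * (x - x_prev), alpha)
        x_prev, x = x, x_new
    return x

# hypothetical toy instance: quadratic data term + l1 penalty
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
x_true = np.sign(rng.standard_normal(20)) * (rng.random(20) < 0.3)
b = A @ x_true + 0.01 * rng.standard_normal(40)
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)              # gradient of 0.5*||Ax - b||^2
prox_g = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)  # soft-threshold
L = np.linalg.norm(A, 2) ** 2                     # spectral norm squared bounds L
x_hat = ipiano(grad_f, prox_g, np.zeros(20), L)
```

In our setting, evaluating grad_f amounts to the filter convolutions mentioned above, which is what makes each iteration cheap and GPU-friendly.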
Due to its structural simplicity, our model is well suited to parallel GPU computation, and we were able to implement the algorithm on a GPU with ease. The GPU implementation on an NVIDIA GeForce GTX 780Ti accelerates the inference procedure significantly, as shown in Table 4.
5 Conclusion
We exploited variational models to incorporate the FoE prior, a widely used image prior/regularization model, in the context of Poisson noise reduction. We went through a comprehensive study of several variants to evaluate their performance. Finally, we arrived at the best variant, based on (1) the Anscombe transformation, (2) an FoE image prior newly trained directly in the Anscombe transform domain, and (3) an ad hoc setting of the data fidelity term. This final version yields performance strongly competitive with state-of-the-art Poisson denoising methods; meanwhile, it offers reasonable computational efficiency thanks to the iPiano algorithm. In summary, we have demonstrated that a simple local model can also achieve state-of-the-art Poisson denoising performance, given appropriate modeling and training.
In the current work, we conducted our training experiments on images from the BSDS300 image segmentation database amfm2011 (), which targets natural images in a general sense. However, Poisson noise often arises in applications such as biomedical imaging, fluorescence microscopy and astronomical imaging. In these applications, training a specialized FoE prior model for the specific task has the potential to improve the current results; this is left for future work.
References
 [1] Francis J Anscombe. The transformation of Poisson, binomial and negative-binomial data. Biometrika, pages 246–254, 1948.
 [2] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 33(5):898–916, May 2011.
 [3] Richard Berry and James Burnell. Astronomical image processing. WillmanBell, Inc, 2000.
 [4] Mario Bertero, Patrizia Boccacci, Giorgio Talenti, Riccardo Zanella, and Luca Zanni. A discrepancy principle for Poisson data. Inverse Problems, 26(10):105004, 2010.
 [5] Yunjin Chen, Wensen Feng, Rene Ranftl, Hong Qiao, and Thomas Pock. A higherorder MRF based variational model for multiplicative noise reduction. IEEE Signal Process. Lett., 21(11):1370–1374, 2014.
 [6] Yunjin Chen, Thomas Pock, René Ranftl, and Horst Bischof. Revisiting lossspecific training of filterbased MRFs for image restoration. In GCPR, pages 271–281, 2013.
 [7] Yunjin Chen, René Ranftl, and Thomas Pock. Insights into analysis operator learning: From patchbased sparse models to higher order MRFs. IEEE Trans. Image Process., 23(3):1060–1072, 2014.
 [8] Imre Csiszár. Why least squares and maximum entropy? an axiomatic approach to inference for linear inverse problems. The Annals of Statistics, 19(4):2032–2066, 1991.
 [9] K. Dabov, A. Foi, V. Katkovnik, and K. O. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process., 16(8):2080–2095, 2007.
 [10] Aram Danielyan, Vladimir Katkovnik, and Karen Egiazarian. Deblurring of Poissonian images using BM3D frames. In SPIE Optical Engineering + Applications, pages 813812–813812, 2011.
 [11] David L Donoho. Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Proceedings of symposia in Applied Mathematics, volume 47, pages 173–205, 1993.
 [12] Mário AT Figueiredo and José M BioucasDias. Restoration of Poissonian images using alternating direction optimization. IEEE Trans. Image Process., 19(12):3133–3145, 2010.
 [13] Marek Fisz. The limiting distribution of a function of two independent random variables and its statistical application. In Colloquium Mathematicae, volume 3, pages 138–146, 1955.
 [14] Elena GilRodrigo, Javier Portilla, David Miraut, and Ricardo SuarezMesa. Efficient joint PoissonGauss restoration using multiframe relaxed analysisbased sparsity. In ICIP, pages 1385–1388, 2011.
 [15] Raja Giryes and Michael Elad. Sparsitybased Poisson denoising with dictionary learning. IEEE Transactions on Image Processing, 23(12):5057–5069, 2014.
 [16] Zachary T Harmany, Roummel F Marcia, and Rebecca M Willett. This is SPIRALTAP: sparse Poisson intensity reconstruction algorithms – theory and practice. IEEE Trans. Image Process., 21(3):1084–1096, 2012.
 [17] J. Huang and D. Mumford. Statistics of natural images and models. In CVPR, pages 541–547, Fort Collins, CO, USA, 1999.
 [18] Eric D Kolaczyk and Robert D Nowak. Multiscale likelihood analysis and complexity penalized estimation. Annals of Statistics, pages 500–527, 2004.
 [19] Triet Le, Rick Chartrand, and Thomas J Asaki. A variational approach to reconstructing images corrupted by Poisson noise. Journal of Mathematical Imaging and Vision, 27(3):257–263, 2007.
 [20] Stamatios Lefkimmiatis, Petros Maragos, and George Papandreou. Bayesian inference on multiscale models for Poisson intensity estimation: Applications to photonlimited image denoising. IEEE Trans. Image Process., 18(8):1724–1741, 2009.
 [21] Anat Levin and Boaz Nadler. Natural image denoising: Optimality and inherent bounds. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pages 2833–2840, 2011.
 [22] Daniel J Lingenfelter, Jeffrey A Fessler, and Zhong He. Sparsity regularization for image reconstruction with Poisson data. In IS&T/SPIE Electronic Imaging, pages 72460F–72460F, 2009.
 [23] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1):503–528, 1989.
 [24] Florian Luisier, Cédric Vonesch, Thierry Blu, and Michael Unser. Fast interscale Wavelet denoising of Poissoncorrupted images. Signal Processing, 90(2):415–427, 2010.
 [25] Markku Makitalo and Alessandro Foi. A closedform approximation of the exact unbiased inverse of the Anscombe variancestabilizing transformation. Image Processing, IEEE Transactions on, 20(9):2697–2698, 2011.
 [26] Markku Makitalo and Alessandro Foi. Optimal inversion of the Anscombe transformation in lowcount Poisson image denoising. IEEE Trans. Image Process., 20(1):99–109, 2011.
 [27] Collins Michael, Dasgupta Sanjoy, and R E Schapire. A generalization of principal components analysis to the exponential family. In NIPS, pages 617–624, 2002.
 [28] Yu. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87 of Applied Optimization. Kluwer Academic Publishers, 2004.
 [29] P. Ochs, Y. Chen, T. Brox, and T. Pock. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 7(2):1388–1419, 2014.
 [30] Isabel Rodrigues, Joao Sanches, and Jose BioucasDias. Denoising of medical images corrupted by Poisson noise. In Image Processing, 15th IEEE International Conference on, pages 1756–1759, 2008.
 [31] Stefan Roth and Michael J. Black. Fields of experts. IJCV, 82(2):205–229, 2009.
 [32] Joseph Salmon, Zachary Harmany, CharlesAlban Deledalle, and Rebecca Willett. Poisson noise reduction with nonlocal PCA. Journal of Mathematical Imaging and Vision, 48(2):279–294, 2014.
 [33] K. G. G. Samuel and M.F. Tappen. Learning optimized MAP estimates in continuouslyvalued MRF models. In CVPR, 2009.
 [34] Uwe Schmidt, Qi Gao, and Stefan Roth. A generative perspective on MRFs in lowlevel vision. In CVPR, pages 1751–1758, 2010.
 [35] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612, 2004.
 [36] R M Willett and Robert D Nowak. Multiscale Poisson intensity and density estimation. IEEE Trans. Inf. Theory, pages 3171–3187, 2007.
 [37] Guoshen Yu, Guillermo Sapiro, and Stéphane Mallat. Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity. IEEE Trans. Image Process., 21(5):2481–2499, 2012.
 [38] Bo Zhang, Jalal M Fadili, and JL Starck. Wavelets, ridgelets, and curvelets for Poisson noise removal. IEEE Trans. Image Process., 17(7):1093–1108, 2008.
 [39] Xiaoqun Zhang, Yujie Lu, and Tony Chan. A novel sparsity reconstruction method from Poisson data for 3d bioluminescence tomography. Journal of Scientific Computing, 50(3):519–535, 2012.