Speckle Reduction with Trained Nonlinear Diffusion Filtering
Abstract
Speckle reduction is a prerequisite for many image processing tasks in synthetic aperture radar (SAR) images, as well as all coherent images. In recent years, predominant stateoftheart approaches for despeckling are usually based on nonlocal methods which mainly concentrate on achieving utmost image restoration quality, with relatively low computational efficiency. Therefore, in this study we aim to propose an efficient despeckling model with both high computational efficiency and high recovery quality. To this end, we exploit a newlydeveloped trainable nonlinear reaction diffusion(TNRD) framework which has proven a simple and effective model for various image restoration problems. In the original TNRD applications, the diffusion network is usually derived based on the direct gradient descent scheme. However, this approach will encounter some problem for the task of multiplicative noise reduction exploited in this study. To solve this problem, we employed a new architecture derived from the proximal gradient descent method. Taking into account the speckle noise statistics, the diffusion process for the despeckling task is derived. We then retrain all the model parameters in the presence of speckle noise. Finally, optimized nonlinear diffusion filtering models are obtained, which are specialized for despeckling with various noise levels. Experimental results substantiate that the trained filtering models provide comparable or even better results than stateoftheart nonlocal approaches. Meanwhile, our proposed model merely contains convolution of linear filters with an image, which offers high level parallelism on GPUs. As a consequence, for images of size , our GPU implementation takes less than 0.1 seconds to produce stateoftheart despeckling performance.
keywords:
Despeckling, optimized nonlinear reaction diffusion model, convolutional neural networks, trainable activation function.1 Introduction
Synthetic aperture radar (SAR) images are inevitably corrupted by speckle noise, due to constructive and destructive electromagnetic wave interference during image acquisition. With fleets of satellites delivering a huge number of images, automatic analysis tools are essential for remote sensing major applications. Therefore, the quality of source images should be sufficient such that it is easy to extract information. However, the speckle noise visually degrades the appearance of images and therefore hinders automatic scene analysis and information extraction argenti2013tutorial (); di2014benchmarking (). For instance, speckle is the main obstacle towards the development of an effective opticalSAR fusion alparone2004landsat (). Hence, speckle reduction is a necessary preprocessing step in SAR image processing. Despeckling techniques have been extensively studied for almost 30 years lee1980digital (); kuan1987adaptive (), and new algorithms are continuously proposed to provide better and better performance. Up to now, the despeckling techniques fall broadly into four categories: filtering based methods in (1) spatial domain or (2) a transform domain, e.g., wavelet domain; (3) nonlocal filtering; and (4) variational methods. As a comprehensive review of the despeckling algorithms is beyond the scope of this paper, we only provide a brief introduction for these methods. For more details, we refer the reader to argenti2013tutorial ().
The multilook technique is a traditional spatial approach. It amounts to incoherently averaging independent observations of the same resolution cell, thus reducing the noise intensity. However, this simple averaging approach results in a clear loss in image resolution. To overcome this deficiency, a great deal of research has been conducted to develop suitable spatial filters which can reduce the noise, yet preserve details and edges Frostpami1982 () lopes1993structure (). Filters of this kind include Lee filter proposed in lee1980digital () lee1981refined () which was developed under the minimummeansquareerror (MMSE) criterion, and Kuan filter kuan1987adaptive () as well as the Map filter lopes1990maximum () which are based on the more sophisticated maximum a posteriori (MAP) approach.
Anisotropic diffusion perona1990scale () based method is also a type of widely exploited spatial filtering technology for despeckling. Anisotropic diffusion is a popular technique in the image processing community, that aims at reducing image noise without removing significant parts of the image content. A few related works that apply anisotropic diffusion filtering for the despeckling task include speckle reducing anisotropic diffusion (SRAD) yu2002speckle () and detail preserving anisotropic diffusion (DPAD) aja2006estimation (). SRAD exploits the instantaneous coefficient of variation and it leads to better performance than the conventional anisotropic diffusion method in terms of mean preservation, variance reduction and edge localization. DPAD modifies the SRAD filter to rely on the Kuan filter kuan1987adaptive () rather than the Lee filter. DPAD estimates the local statistics using a larger neighborhood, instead of the four direct neighbors used by SRAD. However, the despeckling methods based on anisotropic diffusion fell out of favor in recent years mainly because of limited performance. It is clear that there is a despeckling quality gap between the diffusion based approaches and stateoftheart nonlocal algorithms.
Image filtering in the domain of wavelet has also been widely exploited for despeckling. Most of the waveletbased despeckling techniques employ the statistical wavelet shrinkage technique with MAP Bayesian approach, e.g.RanjaniTGRS (); achim2006sar (); bianchi2008segmentation (); ranjani2011generalized (); argenti2012fast (). In general, the waveletbased methods guarantee a superior ability to preserve signal resolution in comparison with conventional spatial filters. However, they often suffer from isolated patterns in flat areas, or ringing effects near the edges of the images, leading to visually unappealing results.
Recently, incorporation with the modern denoising methods, e.g., nonlocal mean (NLM) buades2005review (), blockmatching 3D (BM3D) dabov2007image () and KSVD elad2006image (), several nonlocal despeckling approaches have been proposed deledalle2009iterative (), teuber2012nonlocal (), ParrilliPAV12 (), huang2012multiplicative (). Originated from the NLM algorithm, the probabilistic patchbased (PPB) filter deledalle2009iterative () provides promising results by developing an effective similarity measure well suited to SAR images. A drawback of the PPB filter is the suppression of thin and dark details in the regularized images. As an extension of BM3D, Parrilli ParrilliPAV12 () derived a SARoriented version of BM3D by taking into account the peculiar features of SAR images. It exhibits an objective performance comparable or superior to other techniques on simulated speckled images, and guarantees a very good subjective quality on real SAR images. Typical artifacts of the nonlocal methods are in the form of structured signallike patches in flat areas, originated from the randomness of speckle and reinforced through the patch selection process. Generally speaking, most of these techniques mainly concentrate on achieving utmost despeckling quality, with relatively low computational efficiency. An notable exception is BM3D with its improved version CozzolinoPSPV14 (). However, the BM3Dbased method involves a block matching process, which is challenging for parallel computation on GPUs, alluding to the fact that it is not straightforward to accelerate BM3D algorithm on parallel architectures.
The fourth category, i.e., variational methods SteidlT10 (); AubertA08 (); wensen14 (); chen2014higher (); durand2010multiplicative (); jin2011variational (); chen2016multiplicative (); escandevariational (); zengtie1 (); zengtie2 (), minimizes some appropriate energy functionals consisted of an image prior regularizer and a data fitting term. As a wellknown regularizer, total variation (TV) has been widely used for the despeckling task SteidlT10 (); AubertA08 (); zengtie1 (); zengtie2 (). For instance, in zengtie1 () a new variational model based on a hybrid data term and the widely used TV regularizer is proposed for restoring blurred images with multiplicative noise. Moreover, zengtie2 () proposes a twostep approach to solve the problem of restoring images degraded by multiplicative noise and blurring, where the multiplicative noise is first reduced by nonlocal filters and then a convex variational model is adopted to obtain the final restored images. Solutions of variational problems with TV regularization admit many desirable properties, most notably the appearance of sharp edges.
However, TVbased methods generate the socalled staircasing artifact. To remedy the staircasing artifact, wensen14 () incorporates the total generalized variation (TGV) BrediesKP10 () penalty into the existing data fidelity term for speckle removal, and develops two novel variation despeckling models. By involving and balancing higherorder derivatives of the image, the TGVbased despeckling method outperforms the traditional TV methods by reducing the staircasing artifact. Recently, different from handcrafted regularizers, such as TV and TGV models, chen2014higher () proposes a novel variational approach for speckle removal, which combines an image prior model named Fields of Experts (FoE) RothFOE2009 () and a recently proposed efficient nonconvex optimization algorithm  iPiano iPiano (). The proposed method in chen2014higher () can obtain strongly competitive despeckling performance w.r.t. the stateoftheart method  SARBM3D, meanwhile, preserve the property of computational efficiency.
1.1 Our Contribution
Traditional despeckling approaches based on anisotropic diffusion are handcrafted models which include elaborate selections of diffusivity coefficient, or optimal stopping time or proper reaction force term. In order to improve the capacity of the traditional diffusionbased despeckling models, we employ the newlydeveloped anisotropic diffusion based model TNRD () with trainable filters and influence functions. Instead of the first order gradient operator in previous diffusionbased despeckling models, we explore more filters of larger kernel size targeted for despeckling. On the other hand, different influence functions are considered and trained for different filters, rather than an unique function in the traditional diffusion model. Moreover, the parameters of each iteration can vary across diffusion steps.
As shown in TNRD (), the optimized nonlinear diffusion model has broad applicability to a variety of image restoration problems, and achieves recovery results of high quality surpassing recent stateofthearts. Furthermore, it only involves a small number of explicit filtering steps, and hence is highly computationally efficient, especially with parallel computation on GPUs.
In this paper, we intend to apply the TNRD framework TNRD () to the task of multiplicative noise reduction. However, a direct use of the original TNRD model is not feasible, as we have to make a few modifications oriented to this specific task:

We need to redesign the diffusion process by taking into consideration the peculiarity of multiplicative noise statistics.

Based on the new diffusion process specialized for multiplicative noise reduction, we need to recalculate the gradients required for the training phase.

In the original TNRD applications, the diffusion network is usually derived based on the direct gradient descent scheme. However, this approach will encounter some problem for the task of multiplicative noise reduction exploited in this work. The reason is explained in detail in Section 3.1 and the experimental part 4.1. To solve this problem, we employed a new architecture as shown in Fig. 2, which is derived from the proximal gradient descent method. Comparing Fig. 2 and Fig. 1, we can see that the structure of the proposed diffusion process Fig. 2 in our study is quite different from the original TNRD model Fig. 1.
Then, the model parameters in the diffusion process need to be trained by taking into account the Speckle noise statistics, including the linear filters and influence functions. Eventually, we reach a nonlinear reaction diffusion based approach for despeckling, which leads to stateoftheart performance, meanwhile gains high computationally efficiency. Experimental results show that the proposed despeckling approach with optimized nonlinear diffusion filtering leads to stateoftheart performance, meanwhile gains high computationally efficiency.
1.2 Organization
The remainder of the paper is organized as follows. Section II presents a general review of the speckle noise and the trainable nonlinear reaction diffusion process, which is required to derive the optimized diffusion process for despeckling. In the subsequent section III, we propose the optimized nonlinear diffusion process for despeckling. Subsequently, Section IV describes comprehensive experiment results for the proposed model. The concluding remarks are drawn in the final Section V.
2 Preliminaries
To make the paper selfcontained, in this section we provide a brief review of the statistics property of speckle noise and the trainable nonlinear diffusion process proposed in TNRD ().
2.1 Speckle Noise
Assume that (represented as a columnstacked vector) denotes the observed SAR image amplitude with number of looks , and denotes the underlying true image amplitude i.e., the square root of the reflectivity. According to goodman1975statistical () argenti2013tutorial (), the fullydeveloped speckle can be modeled as the multiplicative Goodman’s noise, i.e.,
(2.1) 
where is the socalled speckle noise. In the multiplicative, or fullydeveloped, speckle model, the conditional pdf of given follows a Nakagami distribution:
where is the classical Gamma function. Note that the distribution According to the Gibbs function, this likelihood leads to the following energy term via
(2.2) 
where denotes inner product. Unfortunately, this term is nonconvex w.r.t. , that is cumbersome to derive the diffusion process as described in Sec. 3.1. There are two ways to resolve the nonconvexity. First, logarithmic transformation (i.e., ) can be used. The resulting energy term is expressed as follows:
(2.3) 
Second, an alternative method to define a convex date term is using the classical Csiszár Idivergence model csiszar1991least (), which is typically derived for Poisson noise. The Idivergence based data term for the amplitude model is given by SteidlT10 (); wensen14 ()
(2.4) 
which is strictly convex w.r.t. .
Concerning the relation between (2.2) and (2.4), by following the argument stated in Steidl and Teuber’s work SteidlT10 (), we achieve similar equivalence by incorporating these two data terms into a Total Variation (TV) regularized variational model in the continuous form. The derivation reads as follows. Regarding the data term (2.2), let us consider the following convex variational functional by setting
(2.5) 
where is the TV seminorm defined as
(2.6) 
Regarding the data term (2.2), let us consider the following convex variational functional by setting
(2.7) 
As , we have for that if and only if . As a consequence, if we minimize (2.5) and (2.7) over smooth functions, the minimizer for (2.5) and the minimizer for (2.7) are respectively given by
and
for those points and . Since we have , it turns out that the minimizer and of the functional in the model (2.5) and model (2.7) respectively, coincide in the sense that . Therefore, for both , we will obtain the same result. Note that this conclusion only holds for the those norm induced regularizers, such as the TV regularization. For some other regularized variational models (e.g., TGVbased model wensen14 ()), although this data term (2.4) seems inappropriate, it performs very well for despeckling.
2.2 Trainable Nonlinear Reaction Diffusion (TNRD)
2.2.1 Highly parametrized nonlinear diffusion model
Recently, a simple but effective framework for image restoration called TNRD was proposed in TNRD (), which is based on the concept of nonlinear reaction diffusion. The TNRD framework is modeled by highly parametrized linear filters as well as highly parametrized influence functions. In contrast to those conventional nonlinear diffusion models which usually make use of handcrafted parameters, all the parameters in the TNRD model, including the filters and the influence functions, are learned from training data through a loss based approach.
The proposed framework is formulated as the following timedynamic nonlinear reactiondiffusion process with steps
(2.8) 
where is the initial status of the diffusion process, is the convolution operation, denotes the diffusion stages, denotes the number of filters, are time varying convolution kernels, is obtained by rotating the kernel 180 degrees, denotes 2D convolution of the image with the filter kernel , and are time varying influence functions (not restricted to be of a certain kind). Both the proximal mapping operation and are the reaction force. Usually, the reaction term is chosen as the derivative of a certain smooth date term , i.e., . Note that the proximal mapping operation nesterov2004introductory () related to the function is given as
As shown in TNRD (), the proposed model (2.8) can be interpreted as performing one gradient descent step at with respect to a dynamic energy functional given by
(2.9) 
where the functions are the socalled penalty functions and the regularization parameter is included in . Note that and the parameters {} vary across the stages i.e., changes at each iteration.
It is easy to apply the proposed TRND framework to different image restoration problems by incorporating specific reaction force, such as Gaussian denoising, image deblurring, image super resolution and image inpainting. This is realized by setting , while and , where is related to the strength of the reaction term, and denote the original true image and the input degraded image, respectively, and is the associated different linear operator. In the case of Gaussian denoising, is the identity matrix; for image super resolution, is related to the down sampling operation and for image deconvolution, should correspond to the linear blur kernel.
2.2.2 Overall training scheme
The proposed diffusion model in TNRD () is trained in a supervised manner. In other words, the input/output pairs for certain image processing task are firstly prepared, and then we exploit a loss minimization scheme to optimize the model parameters for each stage of the diffusion process. The training dataset consists of training samples , where is a ground truth image and is the corresponding degraded input. The model parameters in stage required to be trained include 1) the reaction force weight , (2) linear filters and (3) influence functions. All parameters are grouped as , i.e., . Then, the optimization problem for the training task is formulated as follows
(2.10) 
where . The training problem in (2.10) can be solved via gradient based algorithms, e.g., the LBFGS algorithm lbfgs (), where the gradients associated with are computed using the standard backpropagation technique lecun1998gradient ().
There are two training strategies to learn the diffusion processes: 1) the greedy training strategy to learn the diffusion process stagebystage; and 2) the joint training strategy to train a diffusion process by simultaneously tuning the parameters in all stages. Generally speaking, the joint training strategy performs better TNRD (), and the greedy training strategy is often used to provide a good initialization for the joint training. Concerning the joint training scheme, the gradients are computed as follows,
(2.11) 
For different image restoration problems, we mainly need to recompute the two components and , the main part of which are similar to the derivations in TNRD ().
3 Optimized Nonlinear Diffusion Filtering for Despeckling
3.1 Proposed Trainable Reaction Diffusion Filtering for Despeckling
The diffusion filtering process for despeckling can be derived from the energy functional (2.9) by incorporating data fidelity terms in (2.2)(2.4).
Considering the data term (2.3), we have the following energy functional
(3.1) 
with . Then, we derive the corresponding diffusion process by setting and , as the proximal mapping operation w.r.t is not easy to compute. The resulting diffusion process is given as
where denotes elementwise product. However, due to the formulation , the value of is prone to explosion in training phase and therefore makes the training fail.
If we consider the data term (2.2), we arrive at the following energy functional
(3.2) 
As the proximal mapping operator only works for a convex function , we have to set and in (2.9). Then we arrive at a direct gradient descent process with . However, this straightforward gradient descent approach is not applicable in practice, because (1) at the points with very close to zero the reaction term is enlarged so much that there will be an obvious problem of numerical instability; (2) this update rule may produce negative values of after one diffusion step, which will violate the constraint of the data term in (3.2).
In order to remedy the aforementioned problems, after each diffusion step, we additionally consider a simple projection operation
to shrink those “bad” pixels, which do not satisfy the constraint or are very close to zero. Therefore, we arrive at the following diffusion approach
(3.3) 
An illustrative example of the shrinkage function related to the projection operation is show in Fig. 3, where we set . Note that as the shrinkage function is differentiable, it is not problematic in the training phase.
As the projection operation will somehow manipulate the pixel values, a small is required to reduce its influence. However, on the other hand, a small will raise a problem of numerical stability due to the formulation , and then training will fail. In order to search an appropriate for our diffusion model, we progressively increased from with four settings . We observed that the former three settings led to failure in the training phase, and is sufficient for a stable training.
However, although the diffusion model (3.3) overcomes the shortcomings of the model (2.2), it encounters some other problems which will be explained in detail in the experimental part 4.1.
As a consequence, we resort to the data term (2.4). Although the data term (2.4) sounds inappropriate, it is incorporated in a flexible training framework with many free parameters. The training phase can somehow automatically adapt to this inappropriate data term.
Therefore, we exploit the data term (2.4), and reach the following energy functional
(3.4) 
As the data term in (3.4) is smooth and differentiable we have two possible ways to derive the diffusion process.

By setting and in (2.9), we arrive at a direct gradient descent process with . In this case, we will encounter the same problem mentioned above, thus intractable in practice.

By setting and in (2.9), the resulting diffusion process is able to overcome the above shortcomings. Details read as follows.
Recall that the proximal mapping with respect to is given as the following minimization problem
(3.5) 
The solution of (3.5) is given by the following pointwise operation
(3.6) 
Note that this update rule is able to guarantee in diffusion steps because is always positive if .
Finally, the diffusion process for despeckling using the proximal gradient method is formulated as
(3.7) 
where . The architecture of the proposed diffusion model for despeckling is as shown in Fig. 1.
3.2 Computing The Gradients for Training
In this section we derive the gradients of the loss function w.r.t the training parameters in joint training. In summary, we need to compute three parts of , i.e., , and .
First of all, according to (2.10), it is easy to check that the gradient is given as follows
where we omit the image index for brevity. Then, is computed as follows according to the chain rule,
(3.8) 
According to (3.7), it is easy to check that
(3.9) 
where is a diagonal matrix with denoting the first order derivative of function and . Here, denote the elements of which are represented as a columnstacked vector. Note that is related to the kernel , i.e., . As shown in TNRD (), and can be computed by the convolution operation with the kernel and , respectively with careful boundary handling. Moreover, can be easily derived according to (3.7), and is formulated as
(3.10) 
where denote the elements of
Now, the calculation of is obtained by incorporating (3.9) and (3.10).
Concerning the gradients , we should derive three parts of respectively. It is worthy noting that the gradients of w.r.t are only associated with , and are given as and
The detailed derivations of has been provided in our previous work TNRD (), and therefore we omit them in this paper. The formulation of is straightforward to compute according to (3.10).
According to (3.7), the gradient of w.r.t is formulated as
(3.11) 
where denote the elements of
(3.12) 
Note that is written as a row vector.
In practice, in order to ensure the value of positive during the training phase, we set . As a consequence, in implementation we employ the gradient instead of . The gradient is explicitly formulated as
(3.13) 
The architecture of the proposed diffusion model for despeckling is as shown in Fig. 2 which is quite different from the straightforward direct gradient descent (as shown in Fig. 1) used for the original TNRDbased denoisng task TNRD ().
4 Experiments
In this section, we abbreviate the proposed method as TRDMD (the fully Trained Reaction Diffusion Models for Despeckling). To evaluate the performance of the proposed method, three representative stateoftheart approaches are compared: the nonlocal methods PPBit deledalle2009iterative (), SARBM3D ParrilliPAV12 (), and our previous FoEbased despeckling work chen2014higher (). Moreover, considering that the proposed model is essentially a diffusion process, we also employ the traditional Map filter lopes1990maximum () and some related anisotropic diffusion approaches for comparison, e.g., speckle reducing anisotropic diffusion (SRAD) yu2002speckle () and detail preserving anisotropic diffusion (DPAD) aja2006estimation (). The corresponding codes are downloaded from the authors’ homepage, and we use them as is. Note that the two excellent variational methods zengtie1 () and zengtie2 () mainly focus on the restoration of images that are simultaneously blurred and corrupted by multiplicative noise. Even though these two papers also work for the problem of pure multiplicative noise reduction, the more advanced approaches such as patchbased method PPBit or SARBM3D for multiplicative noise removal could have better results. In zengtie1 (), a TV regularized variational model is exploited, which is a pairwise model and inevitably suffers from the wellknown staircase effect. In our model, we exploited a much more effective image regularization term with larger clique and adjustable potential functions. In zengtie2 (), the nonlocal filtering algorithm proposed in teuber2012new () is exploited to accomplish the first step for removing multiplicative noise. The nonlocal filtering algorithm teuber2012new () makes use of the same framework as PPBit, but investigates a different similarity measure in the presence of multiplicative noise. It is stated in deledalle2012compare () that the nonlocal filtering algorithms teuber2012new () and PPBit both have the same performance. As a consequence, we only provide comparison to those algorithms PPBit, SARBM3D and FoEbased methods, which generally perform better than zengtie1 () and zengtie2 ().
These methods are evaluated on several real SAR images and synthesized images with speckle noise using 68 optical images originally introduced by RothFOE2009 (), which have been widely exploited for Gaussian denoising. To provide a comprehensive comparison, the test number of looks are distributed between 1 to 8. For the quality assessment of despeckling methods, we closely follow the indexes employed in argenti2013tutorial (). Two classes of indexes are involved: fullreference quality indexes for simulated SAR images and noreference quality indexes for real SAR images. Three commonly used withreference indexes are taken to evaluation, i.e., PSNR, the mean structural similarity index (MSSIM) ssim () and the edge correlation (EC) sattar1997image () achim2003sar (). The MSSIM underlines the perceived changes in structural information after despeckling. MSSIM takes values over the interval [0,1], where 0 and 1 indicate no structural similarity and perfect similarity, respectively. The EC index is a measure of edge preservation between the highpass versions of the original and filtered images. Larger values correspond to a better edge retaining ability of the despeckling method.
For withoutreference indexes, we employ the mean and the variance of the ratio image oliver2004understanding (). The ideal values of and are one and respectively for an look amplitude SAR image oliver2004understanding (). We also employ the comparison between the coefficient of variation calculated on the despeckled image, namely , and its expected theoretical value on the noisefree image, touzi2002review (). The coefficient of variation is a widespread indicator of texture preservation. Theoretically speaking, the value of should be close to . If departs significantly from the texture is certainly altered.
In the following, the nonlinear diffusion process of stage with filters of size is expressed as whose number of filters is in each stage, if not specified.
To generate the training data for the simulation denoising experiments, we cropped a 256256 pixel region from each image of the Berkeley segmentation dataset MartinFTM01 (), resulting in a total of 400 training samples of size 256 256. We also employ different amounts of training samples to observe the denoising performance comparison, as shown in Fig. 5(a). Note that the PSNR values in the following three subsections are evaluated by averaging denoised results of 68 test images. For simplicity the value of is set as 8 in the following three contrastive analysises, without loss of generality.
4.1 The Experimental Comparison Between the Diffusion Model (3.3) and
In this subsection, we conducted experiments based on the diffusion model (3.3), and compared its performance with the proposed model in this study. For the sake of fairness, these methods are evaluated on synthesized images with speckle noise using 68 optical images, just as in the submitted paper. The test number of looks is set as 1. In order to perform a fast comparison, we employ the filter size and stage = 5 to setup the training models. Two cases as follows are considered for experimental analysis.
Case 1:
In this case, we take experiments directly on the 68 optical images whose minimum pixel value for each image is 1. The PSNR value obtained by the Model (3.3) is 24.30dB, just the same as , indicating that the Model (3.3) also works for images with pixel values bigger than 1.
Case 2:
In this case, we set the maximum value (peak value) of each of the 68 optical images to be 0.5255 by dividing the groundtruth image by a factor , i.e., . The obtained PSNR values along with the peak values are presented in Figure 4. From Figure 4 we can see that the gap between the two models decreases as the peak value increases. For lower peak values, the number of pixels that are smaller than 1 is bigger, leading to worse PSNR performance for Model (3.3). Especially, there is an inflection point around peak value 6. At this point, the number of pixels that are smaller than 1 becomes so large that the performance of Model (3.3) is degraded obviously.
In summary, in order to perform a stable training, we have to set . However, the hard projection operation will obviously manipulate some of the pixels, because it will set those pixels that should be 0 or very close to 0 to be 1. If the number of those pixels is relatively small, it is not a problem. However, if there is a large number of dark areas in the target image, e.g., for the case of real SAR images, the projection will significant affect the despeckled results. As a consequence, the despeckling performance will degrade dramatically. In an extreme case, if we carry out a toy test on a constant image whose size is and pixel value is 0.5. The PSNR value obtained by the Model (3.3) is 0.01dB. However, the PSNR value obtained by is 20.04dB, which is much higher than the Model (3.3).
Therefore, we mainly exploited the model in this study.
4.2 Influence of The Number of Training Samples
In this subsection, we evaluate the despeckling performance of trained models using different amounts of training samples for .
The results are summarized in Fig. 5(a), from which one can see that too small training set will result in overfitting which leads to inferior PSNR value. By observation, 400 images are typically enough to provide reliable performance.
4.3 Influence of filter size
In this subsection, we investigate the influence of the filter size on the despeckling performance in Fig. 5(b). The diffusion stages are set as 5, and 400 images are used for training.
One can see that increasing the filter size keeps bringing improvement. However, the rate of increase in PSNR becomes relatively slower for larger filter size. Balancing the training time and the performance improvement, we choose the model in our experiments.
4.4 Influence of Diffusion Stages
In this study, any number of diffusion stages can be exploited in our model. But in practice, the tradeoff between run time and accuracy should be considered. Therefore, we need to study the influence of the number of diffusion stages on the denoising performance. and 400 images are used for training.
As shown in Fig. 5(c), the performance improvement becomes quite insignificant when the diffusion stages . In order to save the training time, we choose in the following experiments as it provides the best tradeoff between performance and computation time.
4.5 Experimental Results
By analyzing the above three subsections, we decide to employ model and 400 images for training. Note that, the diffusion model needs to be trained respectively for different number of looks.
4.5.1 Comparison with Traditional Anisotropic Diffusion Based Approaches
=1  =3  =5  =8  
PSNR/MSSIM/EC  PSNR/MSSIM/EC  PSNR/MSSIM/EC  PSNR/MSSIM/EC  
Map  21.02/0.513/0.238  23.46/0.651/0.394  24.70/0.708/0.469  25.97/0.757/0.540 
SRAD  24.82/0.686/0.465  25.42/0.738/0.489  26.92/0.790/0.570  28.25/0.819/0.639 
DPAD  24.78/0.681/0.473  25.48/0.735/0.499  27.00/0.789/0.577  28.29/0.818/0.643 
24.95/0.678/0.479  27.28/0.775/0.630  28.53/0.819/0.700  29.70/0.852/0.754 
In this subsection, we compare the proposed despeckling model with the Map filter lopes1990maximum () and some related anisotropic diffusion approaches, e.g., speckle reducing anisotropic diffusion (SRAD) yu2002speckle () and detail preserving anisotropic diffusion (DPAD) aja2006estimation (). Moreover, to ensure that SRAD and DPAD achieve their best performance respectively, we set a time step and run 4003000 iterations for different noise levels.
By observation on despeckling performances in Fig. 6 and Fig. 7, we can see that in comparison with the Map filter, the other three methods show superior performance on edge localization and detail preservation. However, the unnatural smoothness introduced after iterative processing makes SRAD and DPAD produce cartoonlike images, i.e., made up by textureless geometric patches. Therefore, SRAD and DPAD may be unsuitable for practical application, because fine details and textures that may be useful for analysis are destroyed. On the contrary, the proposed method produces more natural results in visual quality and preserves more detailed information.
Quantitative evaluation results are shown in Table 1. Overall speaking, DPAD gives slightly better results than SRAD in terms of evaluation indexes. Meanwhile, Table 1 show that the best indexes are provided by the proposed model, indicating that the proposed model is more powerful in structural and edge preservation.
4.5.2 Comparison with Stateoftheart Despeckling Approaches
In this subsection, we compare the proposed despeckling model with several stateoftheart despeckling approaches, i.e., the nonlocal methods PPBit deledalle2009iterative (), SARBM3D ParrilliPAV12 (), and our previous FoEbased despeckling work chen2014higher ().
Examining the recovery images with in Fig. 8, we see that in comparison with PPBit and FoE, the proposed method and SARBM3D work better at capturing textures, details and small features. On the other hand, in comparison with SARBM3D, the proposed method recovers clearer edges and is relatively less disturbed by artifacts. As shown in Fig. 8(d)Fig. 11(d) the despeckled results of SARBM3D are affected by the structured signallike artifacts that appear in homogeneous areas of the image. This phenomenon is originated from the selection process in the BM3D denoising process. The selection process is easily influenced by the noise itself, especially in flat areas of the image, which can be dangerously selfreferential.
=1  =3  =5  =8  
PSNR/MSSIM/EC  PSNR/MSSIM/EC  PSNR/MSSIM/EC  PSNR/MSSIM/EC  
Noisy  12.95/0.223/0.148  17.42/0.378/0.255  19.59/0.460/0.322  21.61/0.535/0.393 
PPBit  23.78/0.590/0.346  25.95/0.713/0.487  26.95/0.763/0.549  27.86/0.803/0.602 
SARBM3D  24.85/0.676/0.455  27.12/0.776/0.605  28.27/0.818/0.669  29.35/0.852/0.723 
FoE  24.65/0.669/0.493  27.07/0.763/0.637  28.32/0.812/0.698  29.54/0.848/0.753 
24.95/0.678/0.479  27.28/0.775/0.630  28.53/0.819/0.700  29.70/0.852/0.754 
In Fig. 9Fig. 10 are reported the recovered results for . For better visual comparison, we also provide the magnified results within the red rectangles shown in Fig. 9. By observation, one can see that the performance of is better than FoE and SARBM3D on detail preservation. In the visual quality, the typical structured artifacts encountered with the BM3Dbased algorithm do not appear when the proposed method is used. Moreover, our method works better in geometrypreserving than FoE, which can be visually perceived by comparison on (e) and (f) in Fig. 10. Actually, one can see that functional (2.9) is exactly the fields of experts (FoE) image prior regularized variational model for image restoration. However, in the model, both the linear filters and influence functions are trained and optimized, which is the critical factor for the effectiveness of the proposed diffusion despeckling model. This critical factor of the optimized diffusion model is quite different from the FoE prior based variational model and traditional convolutional networks, where only linear filters are trained with fixed influence functions.
The recovery error in terms of PSNR (in dB) and MSSIM are summarized in Table 2. Comparing the indexes in Table 2, the overall performance of in terms of PSNR/MSSIM is better than the other methods. This indicates that for most images our method is powerful in the recover quality and geometry feature preservation. Overall speaking, the proposed model and SARBM3D perform better than the other test methods in terms of texture and detail preservation, as illustrated in the MSSIM index. In addition, according to the EC index, the proposed model preserves more clearer edges in comparison with SARBM3D.
We also present the despeckling results for real SAR amplitude images in Fig. 11Fig. 12. The test four SAR images are as follows: 1) singlelook Radarsat1 image of Vancouver (Canada) in Fig. 11, identified as ; 2) singlelook SAR images identified as and from SaintPolsurMer (France), sensed in 1996 by RAMSES of ONERA, as shown in Fig. 11; and 3) sixlook AirSAR cropland scene identified as , as shown in Fig. 12. By closely visual comparison, we can observe that recovers clearer texture and sharper edges. Especially for the tiny features, is able to catch but the other methods neglects them. Although these features are not quite obvious, the trained diffusion model still extracts them and exhibit these features apparently.
Table 3 lists the three withoutreference indexes of the different algorithms on four real SAR images. Comparing the indexes in Table 3, one can see that no algorithm is predominant in the despeckling performance. However, overall speaking, the proposed model provides most preferable results in terms of the evaluation indexes. Overall speaking, the proposed model provides comparable or even better results than the test stateoftheart methods, both in visual effects and the numerical indexes.
Image  Index  Ideal  PPBit  BM3D  FoE  
RIM  1  0.862  0.873  0.913  0.900  
RIV  0.273  0.198  0.179  0.195  0.179  
0.304  0.236  0.284  0.281  0.314  
RIM  1  0.861  0.850  1.020  0.859  
RIV  0.273  0.178  0.157  0.073  0.203  
0.770  0.744  0.736  0.838  0.750  
RIM  1  0.863  0.837  1.027  0.847  
RIV  0.273  0.165  0.147  0.060  0.203  
0.844  0.848  0.808  0.932  0.829  
RIM  1  0.974  0.969  0.965  0.973  
RIV  0.046  0.037  0.038  0.028  0.048  
0.337  0.412  0.415  0.426  0.423 
4.5.3 Run Time
The proposed method merely contains convolution of linear filters with an image, which offers high computation efficiency and high levels of parallelism making it well suited for GPU implementation. More precisely, the proposed model can not only run fast on CPU especially when the 2D convolution is efficiently realized by FFT, but also be implemented using GPU. Even without GPU, the proposed model is efficient enough.
In Table 4, we report the typical run time of our model for the images of two different dimensions for the case of . It is worthy noting that the FoEbased method needs more iterations to converge if the noise level is higher. Therefore, for or 5, the consuming time of FoEbased method is more than . We also present the run time of four competing algorithms for a comparison. Note that the method FANS CozzolinoPSPV14 () is the improved version of SARBM3D, at the expense of slight performance degradation. Meanwhile, the main body of SARBM3D, PPBit and FANS are all implemented in C language. All the methods are run in Matlab with singlethreaded computation for CPU implementation.
Due to the structural simplicity of our model, it is wellsuited to GPU parallel computation. We are able to implement our algorithm on GPU with ease. It turns out that the GPU implementation based on NVIDIA Geforce GTX 780Ti can accelerate the inference procedure significantly, as shown in Table 4. By comparison, we see that our model is generally faster than the other methods, especially with GPU implementation. This is reasonable because the PPBit and SARBM3D are based nonlocal approach with high computation complexity. For FoEbased method, the required maximum number of iterations is 120 for , 150 for , 300 for and 600 for . On the contrary, our model needs only 10 iteration steps with similar amount of computations at each step in comparison with FoE.
SARBM3D  FANS  PPBit  FoE (=8)  
42.4  3.71  13.2  14.86(0.12)  3.05 (0.03)  
169.1  14.40  48.9  59.43(0.37)  9.33 (0.09) 
5 Conclusion
In this study we proposed a simple but effective despeckling approach with both high computational efficiency and comparable or even better results than stateoftheart approaches. We achieve this goal by exploiting the newlydeveloped trainable nonlinear reaction diffusion model. The linear filters and influence functions in the model are simultaneously optimized by taking into account the speckle noise statistics.
The proposed model merely contains convolution of linear filters with an image, which offers high computation efficiency on CPU. Moreover, high levels of parallelism in the proposed model make it well suited for GPU implementation. Therefore, the proposed model is able to deal with massivescale and huge amount of data. In comparison with several related anisotropic diffusion despeckling approaches with handcrafted filters and unique influence function, e.g., SRAD and DPAD, the proposed model provides a much enhanced performance in both visual effects and evaluation indexes. In comparison with the nonlocalbased PPBit and SARBM3D methods, the proposed model bears the properties of simple structure and high efficiency with strongly competitive despeckling performance.
Note that, the training data is our study is natural images. Hence, the despeckling performance on real SAR images could be enhanced if the remote sensing dataset is employed for training.
References
 (1) A. Achim, E. E. Kuruoğlu, and J. Zerubia. SAR image filtering based on the heavytailed rayleigh model. Image Processing, IEEE Transactions on, 15(9):2686–2693, 2006.
 (2) A. Achim, P. Tsakalides, and A. Bezerianos. SAR image denoising via bayesian wavelet shrinkage based on heavytailed modeling. Geoscience and Remote Sensing, IEEE Transactions on, 41(8):1773–1784, 2003.
 (3) S. AjaFernández and C. AlberolaLópez. On the estimation of the coefficient of variation for anisotropic diffusion speckle filtering. Image Processing, IEEE Transactions on, 15(9):2694–2701, 2006.
 (4) L. Alparone, S. Baronti, A. Garzelli, and F. Nencini. Landsat ETM+ and SAR image fusion based on generalized intensity modulation. Geoscience and Remote Sensing, IEEE Transactions on, 42(12):2832–2839, 2004.
 (5) F. Argenti, T. Bianchi, A. Lapini, and L. Alparone. Fast map despeckling based on Laplacian–Gaussian modeling of wavelet coefficients. Geoscience and Remote Sensing Letters, IEEE, 9(1):13–17, 2012.
 (6) F. Argenti, A. Lapini, T. Bianchi, and L. Alparone. A tutorial on speckle reduction in synthetic aperture radar images. IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 1(3):6–35, 2013.
 (7) G. Aubert and J.F. Aujol. A variational approach to removing multiplicative noise. SIAM Journal of Applied Mathematics, 68(4):925–946, 2008.
 (8) T. Bianchi, F. Argenti, and L. Alparone. Segmentationbased map despeckling of SAR images in the undecimated wavelet domain. Geoscience and Remote Sensing, IEEE Transactions on, 46(9):2728–2742, 2008.
 (9) K. Bredies, K. Kunisch, and T. Pock. Total generalized variation. SIAM J. Imaging Sciences, 3(3):492–526, 2010.
 (10) A. Buades, B. Coll, and J.M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490–530, 2005.
 (11) L. Chen, X. Liu, X. Wang, and P. Zhu. Multiplicative noise removal via nonlocal similaritybased sparse representation. Journal of Mathematical Imaging and Vision, 54(2):199–215, 2016.
 (12) Y. Chen, W. Feng, R. Ranftl, H. Qiao, and T. Pock. A higherorder MRF based variational model for multiplicative noise reduction. Signal Processing Letters, IEEE, 21(11):1370–1374, 2014.
 (13) Y. Chen and T. Pock. Trainable Nonlinear Reaction Diffusion: A flexible framework for fast and effective image restoration. To appear in IEEE TPAMI, 2016.
 (14) Y. Chen, W. Yu, and T. Pock. On learning optimized reaction diffusion processes for effective image restoration. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
 (15) D. Cozzolino, S. Parrilli, G. Scarpa, G. Poggi, and L. Verdoliva. Fast adaptive nonlocal SAR despeckling. IEEE Geosci. Remote Sensing Lett., 11(2):524–528, 2014.
 (16) I. Csiszár. Why least squares and maximum entropy? an axiomatic approach to inference for linear inverse problems. The annals of statistics, 19(4):2032–2066, 1991.
 (17) K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3d transformdomain collaborative filtering. Image Processing, IEEE Transactions on, 16(8):2080–2095, 2007.
 (18) C.A. Deledalle, L. Denis, and F. Tupin. Iterative weighted maximum likelihood denoising with probabilistic patchbased weights. Image Processing, IEEE Transactions on, 18(12):2661–2672, 2009.
 (19) C.A. Deledalle, L. Denis, and F. Tupin. How to compare noisy patches? patch similarity beyond gaussian noise. International journal of computer vision, 99(1):86–102, 2012.
 (20) G. Di Martino, M. Poderico, G. Poggi, D. Riccio, and L. Verdoliva. Benchmarking framework for SAR despeckling. Geoscience and Remote Sensing, IEEE Transactions on, 52(3):1596–1615, 2014.
 (21) Y. Dong and T. Zeng. A convex variational model for restoring blurred images with multiplicative. SIAM J. Imaging Sciences, 6(3):1598–1625, 2013.
 (22) S. Durand, J. Fadili, and M. Nikolova. Multiplicative noise removal using l1 fidelity on frame coefficients. Journal of Mathematical Imaging and Vision, 36(3):201–226, 2010.
 (23) M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. Image Processing, IEEE Transactions on, 15(12):3736–3745, 2006.
 (24) P. Escande, P. Weiss, and W. Zhang. A variational model for multiplicative structured noise removal.
 (25) W. Feng, H. Lei, and Y. Gao. Speckle reduction via higher order total variation approach. Image Processing, IEEE Transactions on, 23(4):1831–1843, April 2014.
 (26) V. Frost, K. Shanmugan, and J. Holtzman. A model for radar images and its applications to adaptive digital filtering of multiplicative noise. IEEE Trans. Pattern Anal. Mach. Intell., 4(2):157–166, 1982.
 (27) J. W. Goodman. Statistical properties of laser speckle patterns. In Laser speckle and related phenomena, pages 9–75. Springer, 1975.
 (28) Y.M. Huang, D.Y. Lu, and T. Zeng. Twostep approach for the restoration of images corrupted by multiplicative noise. SIAM J. Scientific Computing, 35(6), 2013.
 (29) Y.M. Huang, L. Moisan, M. K. Ng, and T. Zeng. Multiplicative noise removal via a learned dictionary. Image Processing, IEEE Transactions on, 21(11):4534–4543, 2012.
 (30) Z. Jin and X. Yang. A variational model to remove the multiplicative noise in ultrasound images. Journal of Mathematical Imaging and Vision, 39(1):62–74, 2011.
 (31) D. T. Kuan, A. Sawchuk, T. C. Strand, and P. Chavel. Adaptive restoration of images with speckle. Acoustics, Speech and Signal Processing, IEEE Transactions on, 35(3):373–383, 1987.
 (32) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 (33) J.S. Lee. Digital image enhancement and noise filtering by use of local statistics. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (2):165–168, 1980.
 (34) J.S. Lee. Refined filtering of image noise using local statistics. Computer graphics and image processing, 15(4):380–389, 1981.
 (35) D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1):503–528, 1989.
 (36) A. Lopes, E. Nezry, R. Touzi, and H. Laur. Maximum a posteriori speckle filtering and first order texture models in SAR images. In Geoscience and Remote Sensing Symposium, 1990, pages 2409–2412, 1990.
 (37) A. Lopes, E. Nezry, R. Touzi, and H. Laur. Structure detection and statistical adaptive speckle filtering in SAR images. International Journal of Remote Sensing, 14(9):1735–1758, 1993.
 (38) D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, 2001.
 (39) Y. Nesterov. Introductory lectures on convex optimization, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004. A basic course.
 (40) P. Ochs, Y. Chen, T. Brox, and T. Pock. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM J. Imaging Sciences, 7(2):1388–1419, 2014.
 (41) C. Oliver and S. Quegan. Understanding synthetic aperture radar images. SciTech Publishing, 2004.
 (42) S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva. A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage. IEEE T. Geoscience and Remote Sensing, 50(2):606–616, 2012.
 (43) P. Perona and J. Malik. Scalespace and edge detection using anisotropic diffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(7):629–639, 1990.
 (44) J. Ranjani and S. J. Thiruvengadam. Dualtree complex wavelet transform based SAR despeckling using interscale dependence. IEEE T. Geoscience and Remote Sensing, 48(6):2723–2731, 2010.
 (45) J. J. Ranjani and S. Thiruvengadam. Generalized SAR despeckling based on dtcwt exploiting interscale and intrascale dependences. Geoscience and Remote Sensing Letters, IEEE, 8(3):552–556, 2011.
 (46) S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205–229, 2009.
 (47) F. Sattar, L. Floreby, G. Salomonsson, and B. Lovstrom. Image enhancement based on a nonlinear multiscale method. IEEE transactions on image processing, 6(6):888–895, 1997.
 (48) G. Steidl and T. Teuber. Removing multiplicative noise by douglasrachford splitting methods. Journal of Mathematical Imaging and Vision, 36(2):168–184, 2010.
 (49) T. Teuber and A. Lang. A new similarity measure for nonlocal filtering in the presence of multiplicative noise. Computational Statistics & Data Analysis, 56(12):3821–3842, 2012.
 (50) T. Teuber and A. Lang. Nonlocal filters for removing multiplicative noise. Scale Space and Variational Methods in Computer Vision, 6667:50–61, 2012.
 (51) R. Touzi. A review of speckle filtering in the context of estimation theory. Geoscience and Remote Sensing, IEEE Transactions on, 40(11):2392–2404, 2002.
 (52) Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612, 2004.
 (53) Y. Yu and S. T. Acton. Speckle reducing anisotropic diffusion. Image Processing, IEEE Transactions on, 11(11):1260–1270, 2002.