Fast Linearized Alternating Direction Minimization Algorithm with Adaptive Parameter Selection for Multiplicative Noise Removal

# Fast Linearized Alternating Direction Minimization Algorithm with Adaptive Parameter Selection for Multiplicative Noise Removal

Dai-Qiang Chen Department of Mathematics, Third Military Medical University, Chongqing 400038, Chongqing, People’s Republic of China Li-Zhi Cheng Department of Mathematics and System, School of Sciences, National University of Defense Technology, Changsha 410073, Hunan, People’s Republic of China
###### Abstract

Owing to the edge preserving ability and low computational cost of the total variation (TV), variational models with the TV regularization have been widely investigated in the field of multiplicative noise removal. The key points of the successful application of these models lie in: the optimal selection of the regularization parameter which balances the data-fidelity term with the TV regularizer; the efficient algorithm to compute the solution. In this paper, we propose two fast algorithms based on the linearized technique, which are able to estimate the regularization parameter and recover the image simultaneously. In the iteration step of the proposed algorithms, the regularization parameter is adjusted by a special discrepancy function defined for multiplicative noise. The convergence properties of the proposed algorithms are proved under certain conditions, and numerical experiments demonstrate that the proposed algorithms overall outperform some state-of-the-art methods in the PSNR values and computational time.

###### keywords:
Total variation; regularization parameter; discrepancy principle; linearized alternating direction minimization; multiplicative noise
journal: \newproof

pfProof

## 1 Introduction

Images contaminated by the additive Gaussian noise are widely investigated in the field of image restoration. However, for coherent imaging systems such as laser or ultrasound imaging, synthetic aperture radar (SAR) and optical coherence tomography, image acquisition processes are different from the usual optical imaging, and thus the multiplicative noise (speckle), rather than the additive noise, provides an appropriate description of these imaging systems. In this paper, we mainly consider the problem of the multiplicative noise removal.

Let be the observed image corrupted by the multiplicative noise. Our goal is to recover the original image from the observed image. The corresponding relationship can be expressed as follows:

 f=u⋅η, (1.1)

where is the multiplicative noise that follows the Gamma distribution, i.e.,

 Pη(x)=MMΓ(M)xM−1e−Mx1{x≥0}. (1.2)

In the last several years, various variational models and iterative algorithms [1, 2, 3, 4, 5, 6, 7] based on the TV regularization have been proposed for the multiplicative noise removal. Aubert and Aujol [1] used a maximum a posteriori (MAP) estimator and established the following TV regularized variational model:

 minuτ⟨logu+fu−1,1⟩+∥∇u∥1, (1.3)

where is a fixed regularization parameter, and is the (isotropic) total variation of , i.e.,

 ∥∇u∥1=m∑i=1n∑j=1√(∇1iju)2+(∇2iju)2.

Note that and denote the horizontal and vertical first order differences at pixel respectively, 1 denotes a matrix of ones with the same size as , the multiplication and division are performed in componentwise, and Neumann boundary conditions are used for the computation of the gradient operator and its adjoint (just the negative divergence operator) ; see [7, 19] for more details.

The objective function in the AA model (1.3) is nonconvex, and hence it is difficult to find a global optimal solution. Recently, in many existing literatures [2, 3], the log transformation was used to resolve the non-convexity. The variational model based on the log transformed image can be reformulated as follows:

 minuτ⟨u+fe−u,1⟩+∥∇u∥1. (1.4)

The above model overcomes the drawback of the AA model, and numerical experiments verify its efficiency [2]. In [4], the exponential of the solution of the exponential model (1.4) is proved to be equal to the solution of the classical I-divergence model

 minuτ⟨u−flogu,1⟩+∥∇u∥1. (1.5)

Very recently, new convex methods such as shifting technique [6] and m-th root transformation [8] were also investigated for this problem.

The augmented Lagrangian framework has recently been proposed to solve the exponential model (1.4) and the I-divergence model (1.5). Although this iterative technique is useful, inner iteration [3] or inverses involving the Laplacian operator [4] are required at each iteration. The computational cost of the inner iteration or the inverse operation is still expensive. Very recently, linearized techniques [9, 10, 11, 12] were widely used for accelerating the alternating minimization algorithm for solving the variational model in image processing. In [13], a fast proximal linearized alternating direction (PLAD) algorithm, which linearized both the fidelity term and the quadratic term of the augmented Lagrangian function, were proposed to solve the TV models (1.4) and (1.5). Numerical results there demonstrate that the PLAD algorithm overall outperforms the augmented Lagrangian method for multiplicative noise removal.

The regularization parameter in (1.4) controls the trade-off between the goodness-of-fit of and a smoothness requirement due to the TV regularization. The regularized solution highly depends on the selection of . Specifically, large leads to little smoothing, and thus, noise will remain almost unchanged in the denoised image or the regularized solution, whereas small leads to oversmoothing solution, so that fine details in the image are destroyed. Therefore, choosing a suitable is a key issue for the variational model (1.4). In the existing literatures, the main approaches for automatic parameter setting include the generalized cross validation [14], the L-curve [15, 16], the Stein unbiased risk estimator [17] and the discrepancy principle [18]. These methods are all based on the assumption of Gaussian noise and not suitable for the special denoised problem here. In [7], a new discrepancy principle based on the statistical characteristics of the multiplicative noise was proposed for selecting a proper value of . However, it needs to solve (1.4) or the corresponding I-divergence model several times for a sequence of ’s. Hence the computational cost is expensive.

In this paper, we use the special discrepancy principle for the multiplicative noise to compute . A linearized alternating direction minimization algorithm with auto-adjusting of the regularization parameter is proposed to solve (1.4). In each iteration of the proposed algorithm, the regularization parameter is updated in order to guarantee that the denoised image satisfies the discrepancy principle. Due to the use of the linearization technique, the solution of the subproblem in the iteration step has a closed-form expression, and therefore the zero point of the discrepancy function with respect to can be computed by the Newton method efficiently. Numerical experiments show that our algorithm is very effective in finding a proper . Moreover, the proposed algorithm can be extended to the case of the spatially adapted regularization parameter without extra computational burden almost.

We will give the convergence proof of the proposed algorithms with adaptive parameter estimation. With a fixed regularization parameter, our algorithms are reduced to the PLAD algorithm proposed in [13]. As we known, the convergence of the original PLAD method is unsolved at present. Therefore, one contribution of our work is to supply a complete convergence proof for the PLAD method.

The rest of this paper is organized as follows. In section 2 we briefly review the PLAD algorithms proposed in [13]. In section 3, we utilize the statistical characteristic of the Gamma noise to define a special discrepancy function, and then propose a new linearized alternating direction minimization algorithm, which automatically estimate the value of the regularization parameter by computing the zero point of the corresponding discrepancy function. Finally the convergence properties of the proposed iterative algorithm are proved. In section 4 we extend the proposed algorithm to the case of the spatially-adaptive regularization parameter. In section 5 the numerical results are reported to compare the proposed algorithms with some of the recent state-of-the-art methods.

## 2 The proximal linearized alternating direction method

In this section, we briefly review the PLAD algorithm proposed in [13], which will serve as the foundation for the algorithm presented in the next section. Consider the following TV regularized minimization problem:

 minuH(u)+∥∇u∥1, (2.1)

where is a real-valued, convex and smooth function. The corresponding constrained optimization problem can be reformulated as:

 minu,z{H(u)+∥z∥1|∇u=z}. (2.2)

The augmented Lagrangian function for (2.2) is defined as

 Lρ(u,z,b)=H(u)+∥z∥1+⟨b,z−∇u⟩+ρ2∥z−∇u∥22. (2.3)

Then the well-known ADMM (alternating direction method of multipliers) for solving (2.1) can be expressed as follows:

 ⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩uk+1=argminu{H(u)+⟨bk,zk−∇u⟩+ρ2∥zk−∇u∥22},zk+1=argminz{∥z∥1+⟨bk,z−∇uk+1⟩+ρ2∥z−∇uk+1∥22},bk+1=bk+ρ(zk+1−∇uk+1). (2.4)

The first subproblem in (2.4) is difficult to solve. Therefore, the authors in [13] used the second-order Taylor expansion of at , and meanwhile replaced the Hessian matrix with ( is a constant). Then the first subproblem in (2.4) can be simplified as follows:

 uk+1=argminu{⟨∇H(uk)+ρdiv(zk−∇uk),u−uk⟩+⟨bk,zk−∇u⟩+12δ∥u−uk∥22}. (2.5)

Replacing the first subproblem in (2.4) with (2.5) we obtain the PLAD algorithm for the minimization problem (2.1).

Now consider the variational models (1.4) and (1.5). They can be reformulated as

 minuH(u)+λ∥∇u∥1, (2.6)

where , and or . While the PLAD algorithm is applied to solve the minimization problem in (2.6), it can be expressed as the following simple form:

 ⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩uk+1=uk−δ(1−fφ−1(uk)+ρdiv(zk−∇uk)+divbk),zk+1=shrink(∇uk+1−bkρ,λρ),bk+1=bk+ρ(zk+1−∇uk+1), (2.7)

where for the exponential model and for the I-divergence model. The shrinkage operator is defined as:

 z=shrink(u,c)=max(∥u∥2−c,0)u∥u∥2.

If we further assume that , and , then the first formula in (2.7) should be replaced by

 uk+1=PU(uk−δ(1−fφ−1(uk)+ρdiv(zk−∇uk)+divbk)), (2.8)

where denotes the projection onto the set .

## 3 Linearized alternating minimization algorithm with adaptive parameter selection

In this section, we describe the proposed linearized alternating minimization algorithm with adaptive parameter selection. To this end, we first introduce the statistical characteristic of some random variable with respect to Gamma noise . It will play an important role in developing our algorithm.

###### Lemma 3.1

Let be a Gamma random variable (r.v) with mean 1 and standard deviation . Consider the following discrepancy function with respect to

 I(η)=η−logη. (3.1)

Then the following estimate of the expected value of holds true for large :

 E{I(η)}=1+12M+112M2−52M3+O(1M3). (3.2)

The above conclusion has appeared in the existing literature [7], and therefore we omit the proof here.

Next, we consider the PLAD algorithm for solving the exponential model (1.4). Choose . The solution in (2.5) has a closed expression as follows:

 uk+1=uk−δ(τ(1−fe−uk)+ρdiv(zk−∇uk)+divbk), (3.3)

which can be seen as a linear function with respect to . Since is an approximation of the log transformed original image, we have

 uk+1+fe−uk+1−logf=feuk+1−logfeuk+1≈η−logη. (3.4)

Denote

 A1=−δ(1−fe−uk),
 A2=uk−δ(ρdiv(zk−∇uk)+divbk).

Then . Let . Then according to Lemma 3.1 and the relation (3.4), for large we have

 1mnm∑i=1n∑j=1(A1τ+A2+fe−(A1τ+A2)−logf)i,j−C≈0. (3.5)

On the other hand, considering the function , it is monotonically increasing for and decreasing for . Besides, . The denoised image is a smoothed version of the noisy image , and its value will be far from under the over-smoothness condition (with small ). Therefore, we demand that obtained by (3.3) satisfies the following inequality in order to avoid over-smoothness:

 K(τ;wk)=1mnm∑i=1n∑j=1(A1τ+A2+fe−(A1τ+A2)−logf)i,j−¯C≤0, (3.6)

where and is a constant for each iteration. According to (3.5) we infer that is proper for large . In fact, we use this setting for most experiments in section 5. The function in (3.6) is called as the discrepancy function defined for Gamma noise.

The relation in (3.6) inspires us to update the value of in the iteration step as follows: if , then . Otherwise update just as the zero point of the function , i.e., . The following lemma guarantees the existence of the solution of .

###### Lemma 3.2

Assume that , and , then the equation has at least one solution.

Proof: Obviously, is continuous with respect to . If , then . Hence there exists such that for any . If , then there exists for some , and thus . This shows that there exists such that for any . Moreover, due to , there exists satisfies . Therefore, the equation has at least one solution.

Based on the above analysis, we obtain the linearized alternating direction minimization algorithm with adaptive parameter selection, which is summarized in Algorithm 1.

### 3.1 Convergence analysis of the proposed algorithm

Here we investigate the convergence properties of the sequence generated by Algorithm 1. Assume that , and is one solution of the variational model

 minu,z¯τ⟨u+fe−u,1⟩+∥z∥1   s.t.  ∇u=z. (3.7)

Choose in Algorithm 1. Denote , and as a Lipschitz constant that satisfies

 ∥∇D(u1)−∇D(u2)∥≤LD∥u1−u2∥

for any ( denotes unless otherwise specified in the following). Then we have the following convergence result.

###### Theorem 3.3

Let be the sequence generated by Algorithm 1 with . Then it converges to a point where the first-order optimality conditions for (3.7) are satisfied.

Proof: The first-order optimality conditions for the sequence generated by Algorithm 1 are

 ⎧⎪ ⎪⎨⎪ ⎪⎩0=τk+1∇D(uk)+1δ(uk+1−uk)+ρdiv(zk−∇uk)+divbk,0=tk+1+ρ(zk+1−∇uk+1+ρ−1bk),bk+1=bk+ρ(zk+1−∇uk+1), (3.8)

where . Since is one solution of (3.7), the first-order optimality conditions can be rewritten as

 0=¯t+¯b,   0=¯τ∇D(¯u)+div¯b,   0=∇¯u−¯z (3.9)

for some . Rearrange (3.9) to get

 ⎧⎪ ⎪⎨⎪ ⎪⎩0=¯τ∇D(¯u)+1δ(¯u−¯u)+ρdiv(¯z−∇¯u)+div¯b,0=¯t+ρ(¯z−∇¯u+ρ−1¯b),¯b=¯b+ρ(¯z−∇¯u). (3.10)

Denote the errors by , , and . Subtracting (3.10) from (3.8) we obtain

 ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩0=τk+1∇D(uk)−¯τ∇D(¯u)+1δ(uk+1e−uke)+ρdiv(zke−∇uke)+divbke,0=tk+1e+ρ(zk+1e−∇uk+1e+ρ−1bke),bk+1e=bke+ρ(zk+1e−∇uk+1e). (3.11)

Taking the inner product with , and on both sides of the three equations of (3.11) respectively, we obtain

 ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩0=⟨τk+1∇D(uk)−¯τ∇D(¯u),uk+1e⟩+1δ⟨uk+1e−uke,uk+1e⟩+⟨ρdiv(zke−∇uke)+% divbke,uk+1e⟩,0=⟨tk+1e,zk+1e⟩+ρ⟨zk+1e−∇uk+1e+ρ−1bke,zk+1e⟩,⟨bk+1e,bke⟩=⟨bke,bke⟩+ρ⟨zk+1e−∇uk+1e,bke⟩. (3.12)

Due to , according to (3.12) we get

 ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩⟨τk+1∇D(uk)−¯τ∇D(¯u),uk+1e⟩+12δ(∥uk+1e∥2+∥uk+1−uk∥2−∥uke∥2)=ρ⟨∇uk+1e,zke+ρ−1bke−∇uke⟩,⟨tk+1e,zk+1e⟩=⟨−bke,zk+1e⟩+ρ⟨∇uk+1e−zk+1e,zk+1e⟩,12ρ(∥bk+1e∥2−∥bke∥2)=ρ2∥∇uk+1e−zk+1e∥2+⟨bke,zk+1e−∇uk+1e⟩, (3.13)

The last equation in (3.13) uses the relation . Adding the equations in (3.13) we obtain

 12δ(∥uk+1e∥2+∥uk+1−uk∥2−∥uke∥2)+⟨τk+1∇D(uk)−¯τ∇D(¯u),uk+1e⟩+⟨tk+1e,zk+1e⟩+12ρ(∥bk+1e∥2−∥bke∥2)=ρ⟨∇uk+1e,zke−∇uke⟩+ρ⟨∇uk+1e−zk+1e,zk+1e⟩+ρ2∥∇uk+1e−zk+1e∥2=−ρ2∥∇uk+1e−zke∥2+ρ2(∥∇uk+1e∥2+∥∇(uk+1−uk)∥2−∥∇uke∥2)−ρ2(∥zk+1e∥2−∥zke∥2). (3.14)

Besides,

 ⟨τk+1∇D(uk)−¯τ∇D(¯u),uk+1e⟩ = τk+1⟨∇D(uk)−∇D(¯u),uk+1e⟩+(τk+1−¯τ)⟨∇D(¯u),uk+1e⟩ = τk+1⟨Quke,Quk+1e⟩+(τk+1−¯τ)⟨∇D(¯u),uk+1e⟩ = τk+12(∥Quke∥2+∥Quk+1e∥2−∥Q(uk+1−uk)∥2)+(τk+1−¯τ)⟨∇D(¯u),uk+1e⟩,

where and ( is a positive-definite matrix due to ).

Substituting it in (3.14) we obtain

 1δ∥uk+1e∥2−ρ∥∇uk+1e∥2+ρ∥zk+1e∥2+1ρ∥bk+1e∥2+τk+1(∥Quke∥2+∥Quk+1e∥2)+(1δ∥uk+1−uk∥2−ρ∥∇(uk+1−uk)∥2−τk+1∥Q(uk+1−uk)∥2)+2(τk+1−¯τ)⟨∇D(¯u),uk+1e⟩+2⟨tk+1e,zk+1e⟩+ρ∥∇uk+1−zk∥2=1δ∥uke∥2−ρ∥∇uke∥2+ρ∥zke∥2+1ρ∥bke∥2. (3.15)

Since , we have

 (1δ∥uk+1−uk∥2−ρ∥∇(uk+1−uk)∥2−τk+1∥Q(uk+1−uk)∥2)≥0.

Moreover, by the convexity of we have . By the update criterion of in Algorithm 1 we infer that , and therefore

 1mn⟨uk+1+fe−uk+1−logf,1⟩≤¯C,

which implies that by the definition of . Due to , we infer that . By the convexity of we also have .

After removing the nonnegative terms except the first four terms in the left side of the equation (3.15), we obtain

 1δ∥uk+1e∥2−ρ∥∇uk+1e∥2+ρ∥zk+1e∥2+1ρ∥bk+1e∥2≤1δ∥uke∥2−ρ∥∇uke∥2+ρ∥zke∥2+1ρ∥bke∥2. (3.16)

Since , we know that for any . According to (3.16) we conclude that is bounded and hence is also bounded.

By summing (3.15) from some to , we can derive

 +∞∑k=j0(1δ∥uk+1−uk∥2−ρ∥∇(uk+1−uk)∥2−τk+1∥Q(uk+1−uk)∥2)++∞∑k=j0(2(τk+1−¯τ)⟨∇D(¯u),uk+1e⟩+2⟨tk+1e,zk+1e⟩+ρ∥∇uk+1−zk∥2)=1δ∥uj0e∥2−ρ∥∇uj0e∥2+ρ∥zj0e∥2+1ρ∥bj0e∥2, (3.17)

and hence we obtain

 +∞∑k=j0(1δ−ρ∥△∥2−¯τLD)∥uk+1−uk∥2++∞∑k=j0ρ∥∇uk+1−zk∥2<+∞. (3.18)

This implies that

 limk→+∞∥uk+1−uk∥=0,   limk→+∞∥∇uk+1−zk∥=0. (3.19)

Therefore,

 ∥∇uk−zk∥≤∥∇(uk+1−uk)∥+∥∇uk+1−zk∥→0,  as  k→+∞. (3.20)

According to (3.19) and (3.20) we also have

 ∥zk+1−zk∥≤∥∇uk+1−zk+1∥+∥∇uk+1−zk∥→0,  as  k→+∞. (3.21)

Moreover, by the relation of and (3.20) we have

 limk→+∞∥bk+1−bk∥=0. (3.22)

In (3.16) we have shown that is bounded. Then there exists a convergent subsequence, also denoted by for convenience, converging to a limit . Due to , it follows that and hence there exists a subsequence of converging to a limit . Besides, as , there exists a subsequence of converging to .

For convenience, let be a subsequence converging to . Since is a closed proper convex function, according to Theorem 24.4 in [20] we have .

Let in (3.8) while utilizing the convergence results in (3.19)-(3.22) to obtain

 ⎧⎪⎨⎪⎩0=¯τ∇D(u∞)+ρdiv(z∞−∇u∞)+divb∞,0=t∞+ρ(z∞−∇u∞+ρ−1b∞),0=z∞−∇u∞, (3.23)

which is equivalent to

 0=¯τ∇D(u∞)+divb∞,   0=t∞+b∞,   0=∇u∞−z∞.

Hence the limit point satisfies the first-order optimality conditions for (3.7).

The proof of this theorem started with any saddle point . Here we consider the special case of which is the limit of a convergent subsequence