Second Order Differences of Cyclic Dataand Applications in Variational Denoising

# Second Order Differences of Cyclic Data and Applications in Variational Denoising

## Abstract

In many image and signal processing applications, as interferometric synthetic aperture radar (SAR) or color image restoration in HSV or LCh spaces the data has its range on the one-dimensional sphere . Although the minimization of total variation (TV) regularized functionals is among the most popular methods for edge-preserving image restoration such methods were only very recently applied to cyclic structures. However, as for Euclidean data, TV regularized variational methods suffer from the so called staircasing effect. This effect can be avoided by involving higher order derivatives into the functional.

This is the first paper which uses higher order differences of cyclic data in regularization terms of energy functionals for image restoration. We introduce absolute higher order differences for -valued data in a sound way which is independent of the chosen representation system on the circle. Our absolute cyclic first order difference is just the geodesic distance between points. Similar to the geodesic distances the absolute cyclic second order differences have only values in . We update the cyclic variational TV approach by our new cyclic second order differences. To minimize the corresponding functional we apply a cyclic proximal point method which was recently successfully proposed for Hadamard manifolds. Choosing appropriate cycles this algorithm can be implemented in an efficient way. The main steps require the evaluation of proximal mappings of our cyclic differences for which we provide analytical expressions. Under certain conditions we prove the convergence of our algorithm. Various numerical examples with artificial as well as real-world data demonstrate the advantageous performance of our algorithm.

\DeclareCaptionLabelSeparator

periodspace.  \captionsetupformat=hang,labelsep=periodspace,indention=-2em,labelfont=bf,width=.95skip=.5 \captionsetup[subfigure]aboveskip=.5ex,belowskip=1.5ex,indention=0cm,labelformat=simple,labelsep=space, labelfont=normal, hypcap=true,width=.975skip=.5 \setkomafontsectioning \setkomafonttitle \setlistleftmargin=0pt,labelwidth=*, ,align=left,itemsep=0pt

## 1 Introduction

A frequently used method for edge-preserving image denoising is the variational approach which minimizes the Rudin-Osher-Fatemi (ROF) functional [40]. In a discrete (penalized) form the ROF functional can be written as

 ∑i,j(fi,j−xi,j)2+λ∑i,j|∇xi,j|,λ>0,

where is the given corrupted image and denotes the discrete gradient operator which contains usually first order forward differences in vertical and horizontal directions. The regularizing term can be considered as discrete version of the total variation (TV) functional. Since the gradient does not penalize constant areas the minimizer of the ROF functional tends to have such regions, an effect known as staircasing. An approach to avoid this effect consists in the employment of higher order differences/derivatives. Since the pioneering work [10] which couples the TV term with higher order terms by infimal convolution various techniques with higher order differences/derivatives were proposed in the literature, among them [8, 11, 12, 15, 16, 27, 29, 31, 32, 41, 42, 43].

In various applications in image processing and computer vision the functions of interest take values on the circle or another manifold. Processing manifold-valued data has gained a lot of interest in recent years. Examples are wavelet-type multiscale transforms for manifold data  [25, 37, 49] and manifold-valued partial differential equations  [13, 24]. Finally we like to mention statistical issues on Riemannian manifolds [19, 20, 36] and in particular the statistics of circular data [18, 28]. The TV notation for functions with values on a manifold has been studied in [22, 23] using the theory of Cartesian currents. These papers were an extension of the previous work [21] were the authors focus on -valued functions and show in particular the existence of minimizers of certain energies in the space of functions with bounded total cyclic variation. The first work which applies a cyclic TV approach among other models for imaging tasks was recently published by Cremers and Strekalovskiy in [44, 45]. The authors unwrapped the function values to the real axis and proposed an algorithmic solution to account for the periodicity. An algorithm which solves TV regularized minimization problems on Riemannian manifolds was proposed by Lellmann et al. in [30]. They reformulate the problem as a multilabel optimization problem with an infinite number of labels and approximate the resulting hard optimization problem using convex relaxation techniques. The algorithm was applied for chromaticity-brightness denoising, denoising of rotation data and processing of normal fields for visualization. Another approach to TV minimization for manifold-valued data via cyclic and parallel proximal point algorithms was proposed by one of the authors and his colleagues in [50]. It does not require any labeling or relaxation techniques. The authors apply their algorithm in particular for diffusion tensor imaging and interferometric SAR imaging. For Cartan-Hadamard manifolds convergence of the algorithm was shown based on a recent result of Bačák [1]. Unfortunately, one of the simplest manifolds that is not of Cartan-Hadamard type is the circle .

In this paper we deal with the incorporation of higher order differences into the energy functionals to improve denoising results for -valued data. Note that the (second-order) total generalized variation was generalized for tensor fields in [46]. However, to the best of our knowledge this is the first paper which defines second order differences of cyclic data and uses them in regularization terms of energy functionals for image restoration. We focus on a discrete setting. First we provide a meaningful definition of higher order differences for cyclic data which we call absolute cyclic differences. In particular our absolute cyclic first order differences resemble the geodesic distance (arc length distance) on the circle. As the geodesics the absolute cyclic second order differences take only values in . This is not necessary the case for differences of order larger than two. Following the idea in [50] we suggest a cyclic proximal point algorithm to minimize the resulting functionals. This algorithm requires the evaluation of certain proximal mappings. We provide analytical expression for these mappings. Further, we suggest an appropriate choice of the cycles such that the whole algorithm becomes very efficient. We apply our algorithm to artificial data as well as to real-world interferometric SAR data.

The paper is organized as follows: in Section 2 we propose a definition of differences on . Then, in Section 3, we provide analytical expressions for the proximal mappings required in our cyclic proximal point algorithm. The approach is based on unwrapping the circle to  and considering the corresponding proximal mappings on the Euclidean space. The cyclic proximal point algorithm is presented in Section 4. In particular we describe a vectorization strategy which makes the Matlab implementation efficient and provides parallelizability, and prove its convergence under certain assumptions. Section 5 demonstrates the advantageous performance of our algorithm by numerical examples. Finally, conclusions and directions of future work are given in Section 6.

## 2 Differences of S1–valued data

Let be the unit circle in the plane

 Extra open brace or missing close brace

endowed with the geodesic distance (arc length distance)

 dS1(p,q)=arccos(⟨p,q⟩).

Given a base point , the exponential map  from the tangent space  of at onto is defined by

 expq(x)=Rxq,Rx:=(cosx−sinxsinxcosx).

This map is -periodic, i.e., for any , where denotes the unique point in such that , . Some useful properties of the mapping  (which can also be considered as mapping from onto ) are collected in the following remark.

###### Remark 2.1.

The following relations hold true:

1. for all .

2. If then for all .

While i) follows by straightforward computation relation ii) can be seen as follows: For there exists such that

 x−y=(x−y)2π+2πk=z+2πk.

Hence it follows and since further

 x=(x)2π=(z+y+2πk)2π=(z+y)2π.

To guarantee the injectivity of the exponential map, we restrict its domain of definition from to . Thus, for , there is now a unique satisfying . In particular we have . Given such representation system  of , centered at an arbitrary point  on  the geodesic distance becomes

 dS1(p1,p2)=d(x1,x2)=mink∈Z|x2−x1+2πk|=|(x2−x1)2π|. (1)

Actually we need only in the minimum. Clearly, this definition does not depend on the chosen center point .

We want to determine general finite differences of -valued data. Let with

 ⟨w,1d⟩=d∑j=1wj=0, (2)

where denotes the vector with components one. We define the finite difference operator by

 Δ(x;w):=⟨x,w⟩for allx∈Rd.

By (2), we see that vanishes for constant vectors and is therefore translation invariant, i.e.,

 Δ(x+α1d;w)=Δ(x;w)for allα∈R. (3)
###### Example 2.2.

For the binomial coefficients with alternating signs

 w=bn:=((−1)j+n−1(nj−1))n+1j=1

we obtain the (forward) differences of order :

 Δ(x;w)=Δn(x)=⟨x,bn⟩=n+1∑j=1(−1)j+n−1(nj−1)xj.

Note that does not only fulfill (2), but vanishes exactly for all ‘discrete polynomials of order ’, i.e., for all vectors from . Here we are interested in first and second order differences

 Δ1(x1,x2) =Δ(x;b1)=x2−x1, Δ2(x1,x2,x3) =Δ(x;b2)=x1−2x2+x3.

Moreover, we will apply the ‘mixed second order’ difference with and use the notation

 Δ1,1(x1,x2,x3,x4)=Δ(x;b1,1)=−x1+x2+x3−x4.

We want to define differences for points using their representation with respect to an arbitrary fixed center point. As the geodesic distance (1) these differences should be independent of the choice of the center point. This can be achieved if and only if the differences are shift invariant modulo . Let . We define the absolute cyclic difference of (resp. ) with respect to by

 Missing or unrecognized delimiter for \big (4)

where denotes the component-by-component application of if , and , . The definition allows that points having the same value are treated separately, cf. Figure 5. This ensures that is a continuous map. For example we have . Figures 4 and 5 illustrate definition (4). For the absolute cyclic differences related to the differences in Example 2.2 we will use the simpler notation

 dn(x):=d(x;bn)andd1,1(x):=d(x;b1,1). (5)

The following equivalent definition of absolute cyclic differences appears to be useful.

###### Lemma 2.3.

Let be sorted in ascending order as and set . Let denote the corresponding permutation matrix, i.e., and Consider the shifted versions of given by

 xk=x1+2πk−1∑j=1ejk=2,…,d,

where denotes the -th unit vector. Then it holds

 Missing or unrecognized delimiter for \Bigl (6)
###### Proof.

The first equality in (6) follows directly by definition (4). To see the second one, note that by linearity of the inner product we have

 ⟨PTxk,w⟩=⟨PTx1,w⟩+2π⟨k−1∑j=1ej,Pw⟩=⟨x,w⟩+2π⟨k−1∑j=1ej,Pw⟩. (7)

For the geodesic distance we obtain by (1) that . In general the relation

 d(x;w)=|(⟨x,w⟩)2π| for all x∈[−π,π)d (8)

does not hold true as the following example shows.

###### Example 2.4.

In general the -th order absolute cyclic difference cannot be written as Consider for example the absolute cyclic third order difference for given by (6) as

 d3(x1,x2,x3,x4)=mink=1,2,3,4Δ3(xk),Δ3(x)=−x1+3x2−3x3+x4.

We obtain

 Δ3(x1)=Δ3(x)=−46π16,Δ3(x2)=Δ3(x4)=−78π16,Δ3(x3)=18π16,

so that .

For relation  (8) holds true by the next lemma.

###### Proposition 2.5.

For the following relation holds true:

 Missing or unrecognized delimiter for \bigl (9)

Note that we need only the minimum over in Proposition 2.5 and more precisely

 d(x;w)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩|Δ(x;w)|if |Δ(x;w)|∈[0,π],|Δ(x;w)−2πσ|=2π−|Δ(x;w)|if|Δ(x;w)|∈(π,2π],|Δ(x;w)|−2πif |Δ(x;w)|∈(2π,3π],|Δ(x;w)−4πσ|=4π−|Δ(x;w)|if|Δ(x;w)|∈(3π,4π), (10)

where and

 sgn(x):=⎧⎪⎨⎪⎩1 ifx>0,0 ifx=0,−1 otherwise.
###### Proof.

Since for , we see that and .

First we consider . By Lemma 2.3 we obtain

 Missing or unrecognized delimiter for \bigl (11)

where we can assume by the cyclic shift invariance of that .

If , then the corresponding permutation matrix in Lemma 2.3 is the identity matrix. Further we obtain that and by (11) we get

 |Δ2(PTx2)|=|Δ2(x2)|=|Δ2(x)+2π|and|Δ2(x3)|=|Δ2(x)−2π|.

If , then and . In this case we get

 |Δ2(PTx2)|=|Δ2(x)+2π|and|Δ2(PTx3)|=|Δ2(x)+4π|.

This proves the first assertion.

For we can again assume that . Exploiting that

 Δ1,1(x1,x2,x3,x4)=Δ1,1(x1,x3,x2,x4)

we have to consider the following three cases:

If , then is the identity matrix, and

 |Δ1,1(x2)|=|Δ2(x)−2π|,|Δ1,1(x3)|=|Δ2(x)|,|Δ1,1(x4)|=|Δ2(x)+2π|.

If , then  and . By  (7) we have

 |Δ1,1(x2)|=|Δ2(x)−2π|,|Δ1,1(x3)|=|Δ2(x)|,|Δ1,1(x4)|=|Δ2(x)−2π|.

If , then and . Here we obtain

 |Δ1,1(x2)|=|Δ2(x)−2π|,|Δ1,1(x3)|=|Δ2(x)−4π|,|Δ1,1(x4)|=|Δ2(x)−2π|.

This finishes the proof. ∎

## 3 Proximal mapping of absolute cyclic differences

For a proper, closed, convex function and the proximal mapping is defined by

 Extra open brace or missing close brace (12)

see [34]. The above minimizer exits and is uniquely determined. Many algorithms which were recently used in variational image processing reduce to the iterative computation of values of proximal mappings. An overview of applications of proximal mappings is given in [35].

In this section, we are interested in proximal mappings of absolute cyclic differences , i.e., for . More precisely, we will determine for -valued vectors represented by the values

 proxλd(⋅;w)p(f):=argminx∈[−π,π)d12d∑j=1d(xj,fj)2+λd(x;w)p,λ>0

for and first and second order absolute cyclic differences , . Here means that we are looking for the representative of in . In particular, we will see that these proximal mapping are single-valued for with and have two values for .

We start by considering the proximal mappings of the appropriate differences in . Then we use the results to find the proximal functions of the absolute cyclic differences.

### 3.1 Proximity of differences on Rd

First we give analytical expressions for , where and , . Since we could not find a corresponding reference in the literature, the computation of the minimizer of

 E(x;f,a,w):=12∥f−x∥22+λ|⟨x,w⟩−a|p,λ>0 (13)

###### Lemma 3.1.

For given and , set

 s:=sgn(⟨f,w⟩−a)andμ:=⟨f,w⟩−a∥w∥22.

Then the minimizer of

 E(x;f,a,w):=12∥f−x∥22+λ|⟨x,w⟩−a|,λ>0 (14)

is given by

 ^x=f−smin{λ,|μ|}w (15)

and the minimum by

 E(^x;f,a,w)={∥w∥2212μ2 if |μ|≤λ,∥w∥22(12λ2+λ(|μ|−λ)) otherwise. (16)
###### Proof.

Since , there exists a component and we rewrite

 Missing or unrecognized delimiter for \big

Substituting , and , we see that , where  is the minimizer of

 F(y;g,v):=12∥g−y∥22+ν|⟨v,y⟩|.

The (Fenchel) dual problem of reads

 ^t:=argmint∈R{∥g−tw∥22subject to|t|≤ν} (17)

and the relation between the minimizers of the primal and dual problems is given by

 ^y=g−^tv. (18)

Rewriting (17) we see that is the minimizer of

 (t−~μ)2subject to|t|≤ν,

where . Hence we obtain

 ^t={~μ if |~μ|≤ν,sgn(~μ)ν otherwise.

and by (18) further

 ^y=g−sgn(~μ)min{ν,|~μ|}v.

Substituting back results in (15) and plugging into we get (16). ∎

###### Example 3.2.

Let , , and .

1. For and we get and so that the minimizer of follows by soft shrinkage of with threshold :

 ^x=(f1+smf2−sm),m:=min{λ,|f2−f1|2}. (19)
2. For and we obtain and . Consequently, the minimizer of is given by

 ^x=⎛⎜⎝f1−smf2+2smf3−sm⎞⎟⎠,m:=min{λ,|f1−2f2+f3|6}. (20)
3. For and we obtain and , so that the minimizer of is given by

 ^x=⎛⎜ ⎜ ⎜⎝f1+smf2−smf3−smf4+sm⎞⎟ ⎟ ⎟⎠,m:=min{λ,|f2−f1+f3−f4|4}. (21)

We will apply the following corollary.

###### Corollary 3.3.

Let . Further, let and be given such that . Then

 minx∈RdE(x;f,a,w)
###### Proof.

Set and . By assumption and according to (16) we have to consider three cases.

[label=0.]
1. Let . Then by assumption also  and we conclude by (16) that

 minx∈RdE(x;f,a,w)=12∥w∥22μ2<12∥w∥22~μ2=minx∈RdE(x;~f,~a,w).
2. Let and . By (16) this implies

 minx∈RdE(x;f,a,w) =12∥w∥22μ2, minx∈RdE(x;~f,~a,w) =12∥w∥22λ2+∥w∥22λ(|~μ|−λ).

Since and we obtain .

3. Let and . By (16) this implies

 minx∈RdE(x;f,a,w) =12∥w∥22λ2+∥w∥22λ(|μ|−λ) <12∥w∥22λ2+∥w∥22λ(|~μ|−λ)=minx∈RdE(x;~f,~a,w)

and we are done.∎

Next we consider the case .

###### Lemma 3.4.

Let .

1. Then, for and , the minimizer of

 E(x;f,a,w)=∥f−x∥22+λ(⟨x,w⟩−a)2,λ>0 (23)

is given by

 Missing or unrecognized delimiter for \right

and the minimum by

 E(^x;f,a,w)=λ1+λ∥w∥22(⟨f,w⟩−a)2. (24)
2. If for some and , then

 minx∈RdE(x;f,a,w)
###### Proof.
1. Setting the gradient of (23) to zero results in

 2(x−f)+2λ(⟨x,w⟩−a)w =0, (I+λwwT)x =f+λaw.

Using the Sherman-Morrison formula [7, p. 129] it follows

 ^x =(I−λ1+λ∥w∥22wwT)(f+λaw) Missing or unrecognized delimiter for \right =f−λ(⟨f,w⟩−a)1+λ∥w∥22w.

For the corresponding energy we obtain by straightforward computation

 E(^x;f,a,w) =∥f−^x∥22+λ(⟨x,w⟩−a)2 =λ2(⟨f,w⟩−a)2(1+λ∥w∥22)2∥w∥22+λ[⟨f,w⟩−λ(⟨f,w⟩−a)∥w∥221+λ∥w∥22−a]2 Missing or unrecognized delimiter for \bigl
2. follows directly from (24).∎

### 3.2 Proximity of absolute cyclic differences of first and second order

Now we turn to -valued data represented by . We are interested in the minimizers of

 E(x;f,w):=12d∑j=1d(fj,xj)2+λd(x;w)p,λ>0 (26)

###### Theorem 3.5.

For set . Let and , where is adapted to the respective length of .

1. If , then the unique minimizer of  is given by

 ^x=(f−smw)2π,m:=min{λ,|(⟨f,w⟩)2π|∥w∥22}. (27)
2. If , then has the two minimizers

 ^x=(f∓smw)2π,m:=min{λ,π∥w∥22}. (28)

Note that for case ii) appears exactly if and are antipodal points.

###### Proof.

By (1) and Lemma 2.5 we can rewrite in (26) as

 E(x;f,w) :=12d∑j=1minkj∈Z|fj−xj−2πkj|2+λminσ∈Z|⟨x,w⟩−2πσ| (29) Extra open brace or missing close brace (30)

where . Let

 Ek,σ(x):=12∥f−x−2πk∥22+λ|⟨x,w⟩−2πσ|.

We are looking for

 minx∈[−π,π)dE(x;f,w)=minx∈[−π,π)dmink∈Zdσ∈ZEk,σ(x)=mink∈Zdσ∈Zminx∈[−π,π]dEk,σ(x)