
# New explicit thresholding/shrinkage formulas for one class of regularization problems with overlapping group sparsity and their applications

The work of Gang Liu, Ting-Zhu Huang and Jun Liu is supported by 973 Program (2013CB329404), NSFC (61370147), Sichuan Province Sci. & Tech. Research Project (2012GZX0080). The work of Xiao-Guang Lv is supported by Postdoctoral Research Funds (2013M540454, 1301064B).

Gang Liu, Ting-Zhu Huang, Xiao-Guang Lv, Jun Liu

- Gang Liu: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, P. R. China (wd5577@163.com).
- Ting-Zhu Huang: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, P. R. China (tingzhuhuang@126.com).
- Xiao-Guang Lv: School of Mathematical Sciences, Nanjing Normal University, Nanjing, Jiangsu, 210097, P. R. China (xiaoguanglv@126.com).
- Jun Liu: School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, P. R. China (junliucd@163.com).

###### Abstract

Least-squares regression problems and inverse problems have been widely studied in many fields such as compressive sensing, signal processing, and image processing. To solve such ill-posed problems, a regularization term (i.e., a regularizer) is introduced under the assumption that the solutions have specific properties, such as sparsity or group sparsity. Widely used regularizers include the $\ell_1$ norm, the total variation (TV) semi-norm, and so on. Recently, a new regularization term with overlapping group sparsity has been considered. Majorization-minimization iteration methods or variable duplication methods are often applied to solve the resulting problems. However, there have been no direct methods for solving them because of the difficulty caused by overlapping. In this paper, we propose new explicit shrinkage formulas for one class of these problems, whose regularization terms have translation-invariant overlapping groups. Moreover, we apply our results to TV deblurring and denoising with overlapping group sparsity, using the alternating direction method of multipliers (ADMM) to solve the resulting problem iteratively. Numerical results verify the validity and effectiveness of our new explicit shrinkage formulas.

Key words: overlapping group sparsity; regularization; explicit shrinkage formula; total variation; ADMM; deblurring

## 1 Introduction

Least-squares regression problems and inverse problems have been widely studied in many fields such as compressive sensing, signal processing, image processing, statistics and machine learning. Regularization terms with sparse representations (for instance the $\ell_1$ norm regularizer) have recently developed into an important tool in these applications [8, 19, 7, 33]. These methods are based on the assumption that signals or images have a sparse representation, that is, one containing only a few nonzero entries. To further improve the solutions, more recent studies suggested going beyond sparsity and taking into account additional information about the underlying structure of the solutions [8, 21, 19]. In particular, a wide class of solutions with a specific "group sparsity" structure has been considered. In this case, a group sparse vector can be divided into groups of components satisfying a) only a few of the groups contain nonzero values and b) these groups need not be sparse. This property is sometimes called "joint sparsity", meaning a set of sparse vectors with the union of their supports being sparse [7]. Many works have considered these new sparse problems [8, 21, 19, 33, 7, 28]. Putting such group vectors into a matrix as its row vectors, this matrix will have only a few nonzero rows, and these rows may not be sparse. These problems are typically obtained by replacing the problem

$$
\min_{z} \; \|z\|_1 + \frac{\beta}{2}\|z - x\|_2^2, \tag{1}
$$

with

$$
\min_{z} \; \sum_{i=1}^{r} \|z[i]\|_2 + \frac{\beta}{2}\|z - x\|_2^2, \tag{2}
$$

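where $x \in \mathbb{R}^n$ is a given vector, $z \in \mathbb{R}^n$, $\|z\|_p = \left(\sum_{i=1}^{n} |z_i|^p\right)^{1/p}$ with $p = 1, 2$ represents the $\ell_p$ norm of the vector $z$, $|z_i|$ is the absolute value of $z_i$, and $z[i]$ is the $i$th group of $z$ with $z[i] \cap z[j] = \emptyset$ for $i \neq j$ and $\bigcup_{i=1}^{r} z[i] = z$. The first term of each of the former problems is called the regularization term, the second term is called the fidelity term, and $\beta > 0$ is the regularization parameter.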

Group sparse solutions have better representations and have been widely studied for both convex and nonconvex cases [8, 28, 11, 16, 26, 7]. More recently, overlapping group sparsity (OGS) has been considered [8, 7, 25, 24, 32, 12, 20, 22, 30, 2, 1]. These methods are based on the assumption that signals or images have a special sparse representation with OGS. The task is to solve the following problem

$$
\min_{z} \; \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2, \tag{3}
$$

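where $\|z\|_{2,1} = \sum_{i=1}^{n} \|(z_i)_g\|_2$ is the generalized $\ell_{2,1}$-norm. Here, each $(z_i)_g$ is a group vector containing $s$ (called the group size) elements surrounding the $i$th entry of $z$. For example, $(z_i)_g = (z_{i-1}, z_i, z_{i+1})$ with $s = 3$. In this case, $(z_i)_g$, $(z_{i+1})_g$ and $(z_{i+2})_g$ all contain the $(i+1)$th entry of $z$ ($z_{i+1}$), which means the groups overlap, in contrast to the form of group sparsity in (2). In particular, if $s = 1$, the generalized $\ell_{2,1}$-norm degenerates into the original $\ell_1$-norm, and the regularization problem (3) degenerates to (1).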
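As an illustration of this definition, the following minimal Python sketch evaluates the generalized $\ell_{2,1}$-norm. It assumes periodic boundary conditions (as adopted in Section 2.2) and a window of $\lfloor (s-1)/2 \rfloor$ entries to the left and $\lfloor s/2 \rfloor$ entries to the right of $z_i$; the function names `group_norms` and `ogs_norm` are hypothetical and introduced only for this example.

```python
import numpy as np

def group_norms(z, s):
    """Return the l2 norms of all groups (z_i)_g, assuming periodic boundaries."""
    z = np.asarray(z, dtype=float)
    m_l = (s - 1) // 2   # number of entries taken to the left of z_i (assumed convention)
    m_r = s // 2         # number of entries taken to the right of z_i
    sq = np.zeros_like(z)
    for k in range(-m_l, m_r + 1):
        # np.roll(z, -k)[i] equals z[(i + k) mod n]
        sq += np.roll(z, -k) ** 2
    return np.sqrt(sq)

def ogs_norm(z, s):
    """Generalized l_{2,1}-norm: the sum of the overlapping group norms."""
    return group_norms(z, s).sum()

z = np.array([3.0, 0.0, -4.0, 0.0, 0.0])
print(ogs_norm(z, 1))  # group size 1: coincides with the l1 norm
print(ogs_norm(z, 3))  # overlapping groups of size 3
```

For group size 1 the value coincides with the $\ell_1$ norm of `z`, in line with the degenerate case described above.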

To be more general, we consider the weighted generalized $\ell_{2,1}$-norm $\|z\|_{w,2,1}$ (we only consider the case in which each group has the same weight, which means translation invariance) instead of the former generalized $\ell_{2,1}$-norm. The task can then be extended to

$$
\min_{z} \; \|z\|_{w,2,1} + \frac{\beta}{2}\|z - x\|_2^2, \tag{4}
$$

where $\|z\|_{w,2,1} = \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2$, $w_g$ is a nonnegative real vector with the same size as $(z_i)_g$, and "$\circ$" is the point-wise product or Hadamard product. For instance, $w_g \circ (z_i)_g = ((w_g)_1 z_{i-1}, (w_g)_2 z_i, (w_g)_3 z_{i+1})$ with $s = 3$ as in the former example. In particular, the weighted generalized $\ell_{2,1}$-norm degenerates into the generalized $\ell_{2,1}$-norm if each entry of $w_g$ equals 1. The problems (3) and (4) have been considered in [12, 20, 22, 30, 2, 1, 8], where they are solved by variable duplication methods (variable splitting, latent/auxiliary variables, etc.). In particular, Deng et al. [8] introduced a diagonal matrix for this variable duplication method. This matrix is not easy to find and breaks the structure of the coefficient matrix, which makes the problem difficult to solve for high-dimensional vectors. Moreover, it is difficult to extend this method to the matrix case.

Considering the matrix case of the problem (4), we can get

$$
\min_{A} \; \|A\|_{W,2,1} + \frac{\beta}{2}\|A - X\|_F^2, \tag{5}
$$

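where $X, A \in \mathbb{R}^{m \times n}$ and $\|A\|_{W,2,1} = \sum_{i=1}^{m}\sum_{j=1}^{n} \|W_g \circ (A_{i,j})_g\|_F$. Here, each $(A_{i,j})_g$ is a group matrix containing $K_1 \times K_2$ (called the group size) elements surrounding the $(i,j)$th entry of $A$. For example,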

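$$
W_g \circ (A_{i,j})_g =
\begin{bmatrix}
(W_g)_{1,1} A_{i-l_1,\,j-l_2} & (W_g)_{1,2} A_{i-l_1,\,j-l_2+1} & \cdots & (W_g)_{1,K_2} A_{i-l_1,\,j+r_2} \\
(W_g)_{2,1} A_{i-l_1+1,\,j-l_2} & (W_g)_{2,2} A_{i-l_1+1,\,j-l_2+1} & \cdots & (W_g)_{2,K_2} A_{i-l_1+1,\,j+r_2} \\
\vdots & \vdots & \ddots & \vdots \\
(W_g)_{K_1,1} A_{i+r_1,\,j-l_2} & (W_g)_{K_1,2} A_{i+r_1,\,j-l_2+1} & \cdots & (W_g)_{K_1,K_2} A_{i+r_1,\,j+r_2}
\end{bmatrix}
\in \mathbb{R}^{K_1 \times K_2},
$$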

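where $l_1 = [\frac{K_1 - 1}{2}]$, $l_2 = [\frac{K_2 - 1}{2}]$, $r_1 = [\frac{K_1}{2}]$, $r_2 = [\frac{K_2}{2}]$ (with $l_1 + r_1 + 1 = K_1$ and $l_2 + r_2 + 1 = K_2$), and $[x]$ denotes the largest integer less than or equal to $x$. In particular, if $K_1 = 1$ or $K_2 = 1$, this problem degenerates to the former vector case (4). If $K_1 = K_2 = 1$, this problem degenerates to the original $\ell_1$ regularization problem (1) for the matrix case. If $(W_g)_{i,j} \equiv 1$ for $i = 1, \ldots, K_1$, $j = 1, \ldots, K_2$, this problem has been considered by Chen et al. [4]. However, they used an iterative algorithm based on the principle of majorization minimization (MM) to solve this problem.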

In this paper, we propose new explicit shrinkage formulas for all the former problems (3), (4) and (5), which can obtain accurate solutions without the iteration in [4], the variable duplication (variable splitting, latent/auxiliary variables, etc.) in [12, 20, 22, 30, 2, 1], or finding the diagonal matrix in [8]. Numerical results will also verify the validity and effectiveness of our new explicit shrinkage formulas. Moreover, this new method can be used as a subproblem in many other OGS problems such as compressive sensing with regularization and image restoration with total variation (TV) regularization. Within the framework of ADMM, the relevant convergence results for these OGS problems can be obtained easily, because their subproblems are solved accurately by our new explicit shrinkage formulas. For example, we apply our results to image restoration using TV with OGS in this work, and we obtain the corresponding convergence theorems. Numerical results will also verify the validity and effectiveness of the resulting methods.

The outline of the rest of this paper is as follows. In Section 2, we derive in detail our explicit shrinkage formulas for the OGS problems (3), (4) and (5). In Section 3, we propose some extensions of these shrinkage formulas. In Section 4, we apply our results to image deblurring and denoising problems with OGS TV. Numerical results are given in Section 5. Finally, we conclude this paper in Section 6.

## 2 OGS-shrinkage

### 2.1 Original shrinkage

For the original sparse representation problems, we often want to solve the following problems

$$
\min_{z} \; \|z\|_p + \frac{\beta}{2}\|z - x\|_2^2, \qquad p = 1, 2. \tag{6}
$$

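Definition 1. Define the shrinkage mappings $Sh_1$ and $Sh_2$ from $\mathbb{R}^n \times \mathbb{R}^+$ to $\mathbb{R}^n$ by

$$
Sh_1(x, \beta)_i = \operatorname{sgn}(x_i)\,\max\left\{|x_i| - \frac{1}{\beta},\, 0\right\}, \tag{7}
$$

$$
Sh_2(x, \beta) = \frac{x}{\|x\|_2}\,\max\left\{\|x\|_2 - \frac{1}{\beta},\, 0\right\}, \tag{8}
$$

where both expressions are taken to be zero when the second factor is zero, and "sgn" represents the signum function indicating the sign of a number, that is, $\operatorname{sgn}(x) = 0$ if $x = 0$, $\operatorname{sgn}(x) = -1$ if $x < 0$, and $\operatorname{sgn}(x) = 1$ if $x > 0$.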

The shrinkage (7) is known as soft thresholding and occurs in many algorithms related to sparsity, since it is the proximal mapping for the $\ell_1$ norm. The minimizer of (6) with $p = 1$ is then given by

$$
\arg\min_{z} \; \|z\|_1 + \frac{\beta}{2}\|z - x\|_2^2 = Sh_1(x, \beta). \tag{9}
$$

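Thanks to the additivity and separability of both the $\ell_1$ norm and the square of the $\ell_2$ norm, the shrinkage (7) can be deduced easily from the following formula:

$$
\min_{z} \; \|z\|_1 + \frac{\beta}{2}\|z - x\|_2^2 = \sum_{i=1}^{n} \min_{z_i} \left( |z_i| + \frac{\beta}{2}|z_i - x_i|^2 \right). \tag{10}
$$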

The minimizer of (6) with $p = 2$ is given by

$$
\arg\min_{z} \; \|z\|_2 + \frac{\beta}{2}\|z - x\|_2^2 = Sh_2(x, \beta). \tag{11}
$$

This formula is deduced from the Euler equation of (6) with $p = 2$. Clearly,

$$
\beta(z - x) + \frac{z}{\|z\|_2} \ni 0, \tag{12}
$$

$$
\left(1 + \frac{1}{\beta}\frac{1}{\|z\|_2}\right) z - x \ni 0. \tag{13}
$$

We can easily see that a necessary condition is that the vector $z$ is parallel to the vector $x$, that is, $\frac{z}{\|z\|_2} = \frac{x}{\|x\|_2}$. Substituting this into (13) yields the formula (8). For more details, please refer to [37, 38].
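The two shrinkage mappings of Definition 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `sh1` and `sh2` are hypothetical, and both follow the convention that the result is zero when the second factor vanishes.

```python
import numpy as np

def sh1(x, beta):
    """Soft thresholding (7): componentwise shrinkage with threshold 1/beta."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - 1.0 / beta, 0.0)

def sh2(x, beta):
    """Vector shrinkage (8): shrink the l2 norm of x by 1/beta, keep its direction."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)   # convention: zero when the second factor is zero
    return x * (max(norm - 1.0 / beta, 0.0) / norm)

x = np.array([3.0, 4.0])
z = sh2(x, 1.0)
# Residual of the optimality condition (12): beta*(z - x) + z/||z||_2
print(1.0 * (z - x) + z / np.linalg.norm(z))
```

For this input the norm of `x` is 5, so `sh2` rescales `x` by 4/5, and the printed residual of (12) vanishes up to rounding.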

Our new explicit OGS shrinkage formulas are based on these observations, especially the properties of additivity, separability and parallelism.

Remark 1. The problem (2) can also be solved easily by a simple shrinkage formula, which is not used in this work. For more details, refer to [8, 35, 34].

### 2.2 The OGS shrinkage formulas

We first focus on the problem (3). The difficulty of this problem lies in the overlapping. Therefore, we must use some special techniques to deal with the overlapping. That is the point of our new explicit OGS shrinkage formulas.

It is obvious that the first term of the problem (3) is additive and separable. So if we find rules under which the second term of the problem (3) has the same properties with respect to the same variables as the first term, the solution of (3) can easily be found in a way similar to (10).

Assuming that periodic boundary conditions are used, we observe that each entry $z_i$ of the vector $z$ appears exactly $s$ times in the first term. Therefore, to treat the vectors $z$ and $x$ uniformly, we need to multiply the second term by $s$. To leave the problem (3) unchanged, we also divide by $s$, and after some manipulations, we have

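$$
\begin{aligned}
f_m(z) &= \min_{z} \; \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \|(z_i)_g\|_2 + \frac{\beta}{2s}\, s\, \|z - x\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \|(z_i)_g\|_2 + \frac{\beta}{2s} \sum_{i=1}^{n} \|(z_i)_g - (x_i)_g\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \left( \|(z_i)_g\|_2 + \frac{\beta}{2s} \|(z_i)_g - (x_i)_g\|_2^2 \right),
\end{aligned} \tag{14}
$$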

where $(z_i)_g$ is the same as defined before, and $(x_i)_g$ is defined analogously.
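The step in (14) that replaces $s\|z - x\|_2^2$ by the sum of the group-wise squared distances relies on every entry appearing in exactly $s$ groups. A short numerical check, assuming periodic boundary conditions and the hypothetical helper `groups` (which stacks all groups as rows of an array), can be sketched as follows.

```python
import numpy as np

def groups(v, s):
    """Stack all periodic groups (v_i)_g as the rows of an n-by-s array."""
    v = np.asarray(v, dtype=float)
    m_l = (s - 1) // 2   # entries to the left of v_i (assumed window convention)
    m_r = s // 2         # entries to the right of v_i
    # Column for offset k holds v[(i + k) mod n] in row i.
    return np.stack([np.roll(v, -k) for k in range(-m_l, m_r + 1)], axis=1)

rng = np.random.default_rng(0)
n, s = 12, 4
z, x = rng.standard_normal(n), rng.standard_normal(n)

lhs = s * np.sum((z - x) ** 2)                      # s * ||z - x||_2^2
rhs = np.sum((groups(z, s) - groups(x, s)) ** 2)    # sum_i ||(z_i)_g - (x_i)_g||_2^2
print(np.isclose(lhs, rhs))  # True
```

Each column of the stacked array is a cyclic shift of the whole vector, which is why the two quantities agree.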

For example, we set $s = 4$ and define $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$. The generalized $\ell_{2,1}$-norm $\|z\|_{2,1}$ can be treated as a generalized $\ell_1$ norm of generalized points, each entry $(z_i)_g$ of which is itself a vector, with the absolute value of each entry replaced by the $\ell_2$ norm of $(z_i)_g$. See Figure 1(a) for an illustration, where the top line is the vector $z$, the rectangles with dashed lines are the original $(z_i)_g$, and the rectangles with solid lines are the generalized points. Because of the periodic boundary conditions, each line of Figure 1(a) is a translation of the others. We treat the vector $x$ in the same way as the vector $z$. Putting these generalized points (rectangles with solid lines in the figure) as the columns of a matrix, $s\|z - x\|_2^2$ can be regarded as the squared Frobenius norm of the difference of the corresponding matrices in Figure 1(a), with every line being a row of the matrix. This is why the second equality in (14) holds. Therefore, in general, for each $i$ of the last line of (14), from the equations (8) and (11), we can obtain

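$$
\arg\min_{(z_i)_g} \; \|(z_i)_g\|_2 + \frac{\beta}{2s}\|(z_i)_g - (x_i)_g\|_2^2 = Sh_2\!\left((x_i)_g, \frac{\beta}{s}\right),
$$

then

$$
(z_i)_g = \max\left\{\|(x_i)_g\|_2 - \frac{s}{\beta},\, 0\right\} \frac{(x_i)_g}{\|(x_i)_g\|_2}, \qquad (z_i)_g = \left([(z_i)_g]_1, [(z_i)_g]_2, \ldots, [(z_i)_g]_s\right). \tag{15}
$$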

As in Figure 1(a), for each $i$, the $i$th entry $z_i$ (or $x_i$) of the vector $z$ (or $x$) appears $s$ times, so we need to compute each $z_i$ $s$ times, in $s$ different groups.

However, the results from (15) are wrong, because the values of $z_i$ obtained from (15) in different groups are different. That means the results cannot be satisfied simultaneously in this way. Moreover, for each $i$ of the last line in (14), the result (15) follows from the fact that the vector $(z_i)_g$ is parallel to the vector $(x_i)_g$. Noticing this point and ignoring the cases $(z_i)_g = 0$ or $(x_i)_g = 0$, particularly for $s = 4$ and $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$, the vector $z$ can be split as follows,

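$$
\begin{aligned}
z &= (z_1, z_2, z_3, z_4, z_5, \cdots, z_{n-2}, z_{n-1}, z_n) \\
&= \tfrac{1}{4}(z_1, z_2, z_3, 0, 0, \cdots, 0, 0, z_n) + \tfrac{1}{4}(z_1, z_2, z_3, z_4, 0, \cdots, 0, 0, 0) + \tfrac{1}{4}(0, z_2, z_3, z_4, z_5, \cdots, 0, 0, 0) \\
&\quad + \cdots + \tfrac{1}{4}(z_1, 0, 0, 0, 0, \cdots, z_{n-2}, z_{n-1}, z_n) + \tfrac{1}{4}(z_1, z_2, 0, 0, 0, \cdots, 0, z_{n-1}, z_n).
\end{aligned} \tag{16}
$$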

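Let $(z_i)'_g$ be the expansion of $(z_i)_g$ to length $n$, with $(z_1)'_g = (z_1, z_2, z_3, 0, \cdots, 0, z_n)$, $(z_2)'_g = (z_1, z_2, z_3, z_4, 0, \cdots, 0)$, $\ldots$, $(z_n)'_g = (z_1, z_2, 0, \cdots, 0, z_{n-1}, z_n)$. Let $(x_i)'_g$ be the expansion of $(x_i)_g$, defined similarly to $(z_i)'_g$. Then, we have $z = \frac{1}{4}\sum_{i=1}^{n} (z_i)'_g$ and $x = \frac{1}{4}\sum_{i=1}^{n} (x_i)'_g$. Moreover, we can easily obtain that $\|(z_i)'_g\|_2 = \|(z_i)_g\|_2$ and $\|(x_i)'_g\|_2 = \|(x_i)_g\|_2$ for every $i$.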
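The splitting (16) and the expanded groups can be made concrete with a small Python sketch. It assumes periodic boundaries, $n \ge s$, and the window $(z_{i-1}, z_i, z_{i+1}, z_{i+2})$ for $s = 4$; the helper name `expanded_groups` is hypothetical.

```python
import numpy as np

def expanded_groups(z, s):
    """Row i holds the expansion of group i: z with all entries outside the group zeroed."""
    z = np.asarray(z, dtype=float)
    n = z.size
    m_l, m_r = (s - 1) // 2, s // 2
    out = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(i - m_l, i + m_r + 1) % n   # periodic indices of group i
        out[i, idx] = z[idx]
    return out

z = np.arange(1.0, 9.0)          # z = (1, 2, ..., 8), so n = 8
E = expanded_groups(z, 4)
print(E[0])                      # expansion of the first group: (z_1, z_2, z_3, 0, ..., 0, z_n)
print(np.allclose(E.sum(axis=0) / 4, z))  # True: z is the average of its expanded groups
```

Because zero padding does not change the $\ell_2$ norm, the row norms of `E` equal the group norms of `z`.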

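On the one hand, the Euler equation of $f_m(z)$ (with $s = 4$) is given by

$$
\beta(z - x) + \frac{(z_1)'_g}{\|(z_1)'_g\|_2} + \cdots + \frac{(z_n)'_g}{\|(z_n)'_g\|_2} \ni 0, \tag{17}
$$

$$
\frac{\beta}{4}\sum_{i=1}^{n}\left((z_i)'_g - (x_i)'_g\right) + \frac{(z_1)'_g}{\|(z_1)'_g\|_2} + \cdots + \frac{(z_i)'_g}{\|(z_i)'_g\|_2} + \cdots + \frac{(z_n)'_g}{\|(z_n)'_g\|_2} \ni 0. \tag{18}
$$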

From the deduction of the 2-dimensional shrinkage formula (8) in Section 2.1, we know that a necessary condition for minimizing the $i$th term of the last line in (14) is that $(z_i)_g$ is parallel to $(x_i)_g$. That is, $(z_i)'_g$ is parallel to $(x_i)'_g$ for every $i$. For example,

$$
(z_2)'_g = (z_1, z_2, z_3, z_4, 0, \cdots, 0, 0) \;//\; (x_2)'_g = (x_1, x_2, x_3, x_4, 0, \cdots, 0, 0). \tag{19}
$$

Then we obtain $\frac{(z_i)'_g}{\|(z_i)'_g\|_2} = \frac{(x_i)'_g}{\|(x_i)'_g\|_2}$. Therefore, (18) changes to

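$$
\frac{\beta}{4}\sum_{i=1}^{n}\left((z_i)'_g - (x_i)'_g\right) + \frac{(x_1)'_g}{\|(x_1)'_g\|_2} + \cdots + \frac{(x_i)'_g}{\|(x_i)'_g\|_2} + \cdots + \frac{(x_n)'_g}{\|(x_n)'_g\|_2} \ni 0, \tag{20}
$$

$$
\beta(z - x) + \frac{(x_1)'_g}{\|(x_1)'_g\|_2} + \cdots + \frac{(x_i)'_g}{\|(x_i)'_g\|_2} + \cdots + \frac{(x_n)'_g}{\|(x_n)'_g\|_2} \ni 0, \tag{21}
$$

$$
z \ni x - \frac{1}{\beta}\left(\frac{(x_1)'_g}{\|(x_1)'_g\|_2} + \cdots + \frac{(x_i)'_g}{\|(x_i)'_g\|_2} + \cdots + \frac{(x_n)'_g}{\|(x_n)'_g\|_2}\right), \tag{22}
$$

and, for each component, we obtain

$$
z_i \ni x_i - \frac{1}{\beta}\left(\frac{x_i}{\|(x_{i-2})'_g\|_2} + \frac{x_i}{\|(x_{i-1})'_g\|_2} + \frac{x_i}{\|(x_i)'_g\|_2} + \frac{x_i}{\|(x_{i+1})'_g\|_2}\right). \tag{23}
$$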

Therefore, when $(z_i)_g \neq 0$ and $(x_i)_g \neq 0$, we find a minimizer of (3) in the direction in which all the vectors $(z_i)_g$ are parallel to the vectors $(x_i)_g$. In addition, when , (18) holds, then , therefore, holds. Moreover, because of the strict convexity of $f_m(z)$, we know that the minimizer is unique. This minimizer is accurate.

On the other hand, when $(z_i)_g = 0$ or $(x_i)_g = 0$, our method may not obtain the accurate minimizer. When $(x_i)_g = 0$, we know that the minimizer of the $i$th subproblem is exactly $(z_i)_g = 0$. When $(x_i)_g \neq 0$ and the minimizer of the subproblem is $(z_i)_g = 0$ (this happens because the parameter $\beta/s$ is too small), our method is not able to obtain the accurate minimizer. For example, $(z_i)_g = 0$ while $(z_{i+1})_g \neq 0$ makes an element shared by the two groups take different values in different subproblems. However, we can obtain an approximate minimizer in this case, in which the element $z_i$ is a simple summation over the corresponding subproblems containing $z_i$. We will show in the experiments of Section 5 that the approximate minimizer is also good. Moreover, when we take this problem as a subproblem of an image processing problem, we can set the parameter $\beta$ to be large enough to make sure that the minimizer of the subproblem is accurate. Therefore, the convergence results can be obtained from this accuracy, which will be applied in Section 4.

In addition, from (23), we see that the element $z_i$ of the minimizer can be treated in $s$ subproblems independently, and the results can then be combined. After some manipulations, we obtain the following two general formulas.

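1).

$$
\arg\min_{z} \; \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2 = Sh_{OGS}(x, \beta), \tag{24}
$$

with

$$
Sh_{OGS}(x, \beta)_i = z_i = \max\left\{1 - \frac{1}{\beta}F(x_i),\, 0\right\} x_i. \tag{25}
$$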

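Here, for instance, when the group size is $s = 4$, with $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$ in $z$ and $(x_i)_g$ defined similarly to $(z_i)_g$, we have $F(x_i) = \frac{1}{\|(x_{i-2})_g\|_2} + \frac{1}{\|(x_{i-1})_g\|_2} + \frac{1}{\|(x_i)_g\|_2} + \frac{1}{\|(x_{i+1})_g\|_2}$. The term $\frac{1}{\|(x_j)_g\|_2}$ is contained in $F(x_i)$ if and only if $(x_j)_g$ has the component $x_i$, and we follow the convention $(1/0) = 1$ in $F(x_i)$, because $\|(x_j)_g\|_2 = 0$ implies $x_i = 0$ and the value of $F(x_i)$ is then insignificant in (25).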

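2).

$$
\arg\min_{z} \; \|z\|_{2,1} + \frac{\beta}{2}\|z - x\|_2^2 = Sh_{OGS}(x, \beta), \tag{26}
$$

with

$$
Sh_{OGS}(x, \beta)_i = z_i = G(x_i) \cdot x_i. \tag{27}
$$

Here, the symbols are the same as in 1), and, for instance, when $s = 4$,

$$
G(x_i) = \max\left\{\frac{1}{s} - \frac{1}{\beta\|(x_{i-2})_g\|_2},\, 0\right\} + \max\left\{\frac{1}{s} - \frac{1}{\beta\|(x_{i-1})_g\|_2},\, 0\right\} + \max\left\{\frac{1}{s} - \frac{1}{\beta\|(x_i)_g\|_2},\, 0\right\} + \max\left\{\frac{1}{s} - \frac{1}{\beta\|(x_{i+1})_g\|_2},\, 0\right\}.
$$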

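When $\beta$ is sufficiently large or sufficiently small, the former two formulas are the same and both are accurate. For the other values of $\beta$, we find from the experiments that 2) gives a better approximation than 1), so we choose the formula 2). We then obtain the following algorithm for finding the minimizer of (3).

Algorithm 1 Direct shrinkage algorithm for the minimization problem (3)

Input: Given vector $x$, group size $s$, parameter $\beta$.

Compute:

1). Define $(z_i)_g$, for example $(z_i)_g = (z_{i-m_l}, \ldots, z_i, \ldots, z_{i+m_r})$, with $m_l = [\frac{s-1}{2}]$ and $m_r = [\frac{s}{2}]$ ($m_l + m_r + 1 = s$). If $s$ is even, then $m_l = \frac{s}{2} - 1$ and $m_r = \frac{s}{2}$.

2). Set $K$ = ones(1,$s$) = $[1, 1, \ldots, 1]$, and let $\tilde{K}$ = fliplr($K$) be $K$ flipped over 180 degrees.

3). Compute the group norms $\|(x_i)_g\|_2$, $i = 1, \ldots, n$, by convolution of $K$ and the pointwise square of $x$, followed by a pointwise square root.

4). Compute $\max\left\{\frac{1}{s} - \frac{1}{\beta\|(x_i)_g\|_2},\, 0\right\}$ pointwise.

5). Compute $G(x_i)$ by correlation of $K$ and the result of step 4), or by convolution of $\tilde{K}$ and the result of step 4).

Output: $z$ with $z_i = G(x_i) \cdot x_i$, as in (27).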
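The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: periodic boundaries are handled with explicit cyclic shifts (`np.roll`) in place of a convolution routine, groups with zero norm are assigned a zero factor, and the per-group factor is taken to be $\max\{1/s - 1/(\beta\|(x_j)_g\|_2), 0\}$, summed over all groups containing $x_i$. The function name `ogs_shrink` is hypothetical.

```python
import numpy as np

def ogs_shrink(x, beta, s):
    """Direct OGS shrinkage sketch: z_i = G(x_i) * x_i with periodic boundaries."""
    x = np.asarray(x, dtype=float)
    m_l, m_r = (s - 1) // 2, s // 2
    # Step 1: group norms ||(x_j)_g||_2 with (x_j)_g = (x_{j-m_l}, ..., x_{j+m_r}).
    sq = sum(np.roll(x, -k) ** 2 for k in range(-m_l, m_r + 1))
    norms = np.sqrt(sq)
    # Step 2: per-group factor max(1/s - 1/(beta * norm), 0); zero-norm groups get 0
    # (assumption: those groups contain only zero entries, so the value is irrelevant).
    t = np.zeros_like(x)
    nz = norms > 0
    t[nz] = np.maximum(1.0 / s - 1.0 / (beta * norms[nz]), 0.0)
    # Step 3: G(x_i) sums the factors of all groups containing x_i, that is,
    # groups j = i - m_r, ..., i + m_l (the flipped window).
    G = sum(np.roll(t, -k) for k in range(-m_r, m_l + 1))
    return G * x

x = np.array([3.0, -0.2, 0.0, 4.0, -1.0])
print(ogs_shrink(x, 2.0, 1))   # group size 1: plain soft thresholding with threshold 1/beta
print(ogs_shrink(x, 2.0, 3))   # overlapping groups of size 3
```

With group size 1 the sketch reduces to the soft thresholding (7). For a constant input vector all groups have the same norm, so every entry is shrunk by the same factor.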

We can see that Algorithm 1 needs only two convolution computations, which is just the same time complexity as one iteration step of the MM method in [4]. Therefore, our method is much more efficient than the MM method or other variable duplication methods. In Section 5, we will give numerical experiments comparing our method with the MM method. Moreover, if $s = 1$, our Algorithm 1 degenerates to the classic soft thresholding, as our formula (27) degenerates to (7). Furthermore, when $\beta$ is sufficiently large or sufficiently small, our algorithm is accurate, while the MM method is still only approximate.

Remark 2. Our new explicit algorithm can be treated as an average estimation algorithm that solves all the overlapping group subproblems independently. In Section 5, our numerical experiments show that our new explicit algorithm is accurate when $\beta$ is sufficiently large or sufficiently small, and gives results close to those of other methods, for instance the MM method, for other values of $\beta$.

For the problem (4), similar to (14), we can get

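$$
\begin{aligned}
f_w(z) &= \min_{z} \; \|z\|_{w,2,1} + \frac{\beta}{2}\|z - x\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2}\, \frac{\sum_{k=1}^{s} (w_g)_k^2}{\sum_{k=1}^{s} (w_g)_k^2}\, \|z - x\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\|w_g\|_2^2} \sum_{i=1}^{n} \|w_g \circ ((z_i)_g - (x_i)_g)\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\|w_g\|_2^2} \sum_{i=1}^{n} \|w_g \circ (z_i)_g - w_g \circ (x_i)_g\|_2^2 \\
&= \min_{z} \; \sum_{i=1}^{n} \left( \|w_g \circ (z_i)_g\|_2 + \frac{\beta}{2\|w_g\|_2^2} \|w_g \circ (z_i)_g - w_g \circ (x_i)_g\|_2^2 \right).
\end{aligned} \tag{28}
$$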

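See Figure 1(b) for an illustration. All the symbols are the same as before. As before, a necessary condition for minimizing the $i$th term of the last line in (28) is that $w_g \circ (z_i)_g$ is parallel to $w_g \circ (x_i)_g$. That is, $\frac{w_g \circ (z_i)_g}{\|w_g \circ (z_i)_g\|_2} = \frac{w_g \circ (x_i)_g}{\|w_g \circ (x_i)_g\|_2}$ for every $i$. On the other hand, let $W = \operatorname{diag}(w_g)$ be the diagonal matrix whose diagonal is the vector $w_g$; then $w_g \circ (x_i)_g = W (x_i)_g$. If a vector $v$ (of the same size as $(x_i)_g$) is parallel to the vector $(x_i)_g$, we have $v = \alpha (x_i)_g$ for some scalar $\alpha$. Then, $Wv = \alpha W (x_i)_g$. We obtain that the vector $Wv$ is also parallel to the vector $W (x_i)_g$. Therefore, the two normalized vectors above coincide, and each of them is a unit vector. Then, multiplying both of them by $W$ preserves the equality. That is,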

$$
\frac{w_g \circ w_g \circ (z_i)_g}{\|w_g \circ (z_i)_g\|_2} = \frac{w_g \circ w_g \circ (x_i)_g}{\|w_g \circ (x_i)_g\|_2}.
$$
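The weighted counterpart of the identity used in (28), namely $\|w_g\|_2^2\,\|z - x\|_2^2 = \sum_{i=1}^{n} \|w_g \circ ((z_i)_g - (x_i)_g)\|_2^2$, can be checked numerically in the same spirit. The sketch below assumes periodic boundaries; the weight vector `w` is an arbitrary illustrative choice, and `groups` is a hypothetical helper that stacks all groups as rows.

```python
import numpy as np

def groups(v, s):
    """Stack all periodic groups (v_i)_g as the rows of an n-by-s array."""
    v = np.asarray(v, dtype=float)
    m_l, m_r = (s - 1) // 2, s // 2
    return np.stack([np.roll(v, -k) for k in range(-m_l, m_r + 1)], axis=1)

rng = np.random.default_rng(1)
n, s = 10, 4
z, x = rng.standard_normal(n), rng.standard_normal(n)
w = np.array([0.5, 1.0, 2.0, 1.5])       # illustrative nonnegative weights, one per group position

# Left-hand side: ||w_g||_2^2 * ||z - x||_2^2
lhs = np.sum(w ** 2) * np.sum((z - x) ** 2)
# Right-hand side: sum over i of ||w_g o ((z_i)_g - (x_i)_g)||_2^2 (w broadcasts over rows)
rhs = np.sum((w * (groups(z, s) - groups(x, s))) ** 2)
print(np.isclose(lhs, rhs))  # True
```

Each group position carries a fixed weight and sees every entry of the vector exactly once, which is the translation invariance the derivation relies on.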

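In particular, we first consider the case $s = 4$ and $(z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})$. We denote by $(w_g \circ w_g \circ (z_i)_g)'$ the expansion of $(w_g \circ w_g \circ (z_i)_g)$, defined similarly to $(z_i)'_g$, and we can get $\sum_{i=1}^{n} (w_g \circ w_g \circ (z_i)_g)' = \|w_g\|_2^2\, z$. Then, the Euler equation of $f_w(z)$ is given by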

 β(z−x)+(wg∘wg∘(z1)g)′∥(wg∘(z1)g)′∥2+⋯+(wg∘wg∘(zn)g)′∥(wg∘(zn)g)