A globally convergent and locally quadraticallyconvergent modified B-semismooth Newton method for \ell_{1}-penalized minimization

# A globally convergent and locally quadratically convergent modified B-semismooth Newton method for ℓ1-penalized minimization

Esther Hans Johannes Gutenberg University Mainz, Institute of mathematics, Staudingerweg 9, D-55099 Mainz, Germany. E-Mail: hanse@uni-mainz.de, raasch@uni-mainz.de    Thorsten Raasch 11footnotemark: 1
June 30, 2019
###### Abstract

We consider the efficient minimization of a nonlinear, strictly convex functional with -penalty term. Such minimization problems appear in a wide range of applications like Tikhonov regularization of (non)linear inverse problems with sparsity constraints. In (2015 Inverse Problems 31 025005), a globalized Bouligand-semismooth Newton method was presented for -Tikhonov regularization of linear inverse problems. Nevertheless, a technical assumption on the accumulation point of the sequence of iterates was necessary to prove global convergence. Here, we generalize this method to general nonlinear problems and present a modified semismooth Newton method for which global convergence is proven without any additional requirements. Moreover, under a technical assumption, full Newton steps are eventually accepted and locally quadratic convergence is achieved. Numerical examples from image deblurring and robust regression demonstrate the performance of the method.
Keywords: global convergence, semismooth Newton method, -Tikhonov regularization, inverse problems, sparsity constraints, quadratic convergence
Mathematics Subject Classification: 49M15, 49N45, 90C56

## 1 Introduction

We are concerned with the efficient minimization of

 minu∈ℓ2g(u)+∞∑k=1wk|uk|, (1)

where is a twice Lipschitz-continuously differentiable and strictly convex functional, and is a positive weight sequence with . Minimization problems of the form (1) appear in various applications from engineering and natural sciences. A well-known example is Tikhonov regularization for inverse problems with sparsity constraints, e.g. medical imaging, geophysics, nondestructive testing or compressed sensing, see e.g. [16, 20, 26, 55]. Here, one aims to solve a possibly nonlinear ill-posed operator equation , . In practice, one has to reconstruct from noisy measurement data . In the presence of perturbed data, regularization strategies are required for the stable computation of a numerical solution to an inverse problem [16, 49]. Applying Tikhonov regularization with sparsity constraints, one minimizes a functional consisting of a suitable discrepancy term and a sparsity promoting penalty term, see e.g. [13] and the references therein. Sparsity here means the a priori assumption that the unknown solution is sparse, i.e.  has only few nonzero entries. As an example, in the special case of a linear discrete ill-posed operator equation , linear, bounded and injective, , one may choose the discrepancy term [16]. For nonlinear inverse problems like parameter identification problems, convex discrepancy terms from energy functional approaches may be considered, see e.g. [31, 35, 38, 40]. Sparsity of the Tikhonov minimizer with respect to a given basis can be enforced by using the penalty term in (1), where the weights act as regularization parameters, see e.g. [25, 26, 34] and the references therein.

In current research, sparsity-promoting regularization techniques are widely used, see e.g. [6, 13, 24, 26, 33, 34, 38, 39, 40] and the references therein. Such recovery schemes usually outperform classical Tikhonov regularization with coefficient penalties in terms of reconstruction quality if the unknown solution is sparse w.r.t. some basis. This is the case in many parameter identification problems for partial differential equations with piecewise smooth solutions, like electrical impedance tomography [24, 33] or inverse heat conduction scenarios [7].

There exists a variety of approaches for the numerical minimization of (1) in the literature. In the special case of a quadratic functional , iterative soft-thresholding [13] as well as related approaches for general functionals are well-studied, see e.g. [6, 8, 48]. Accelerated methods and gradient-based methods introduced in [4, 20, 38, 41, 54] often gain from clever stepsize choices. Homotopy-type solvers [42] and alternating direction methods of multipliers [55] besides many others are also state-of-the-art.

Other popular approaches for the solution of (1) are semismooth Newton methods [9, 52]. A semismooth Newton method and a quasi-Newton method for the minimization of (1) were proposed by Muoi et al. in the infinite-dimensional setting [40], inspired by previous work of Herzog and Lorenz [26]. If is convex and smooth, it was shown e.g. in [11, 26, 39], that is a minimizer of (1) if and only if is a solution to the zero-finding problem ,

 F(u):=u−Sγw(u−γ∇g(u))=0 (2)

for any fixed , where denotes the componentwise soft thresholding of with respect to a positive weight sequence , denotes the gradient of and . In [40], from (2) was shown to be Newton differentiable, i.e. under a suitable assumption on there exists a family of slanting functions with

 limh→0∥F(u+h)−F(u)−G(u+h)h∥ℓ2∥h∥ℓ2=0, (3)

see also [9, 26, 52] for the definition of Newton derivatives. A local semismooth Newton method was defined in [40] by

 G(u(j))d(j) =−F(u(j)), (4) u(j+1) =u(j)+d(j),j=0,1,…. (5)

with a specially chosen , cf. [9, 52]. In [40], locally superlinear convergence was proven under suitable assumptions on the functional .

Nevertheless, the above mentioned semismooth Newton methods are only locally convergent in general. In [39], a semismooth Newton method with filter globalization was presented where semismooth Newton steps are combined with damped shrinkage steps. Another globalized semismooth Newton method was developed in [28]. In loc. cit., inspired by [27, 32, 43, 45], the method from [26] was globalized in a finite-dimensional setting for the special case of a quadratic discrepancy term

 minu∈Rn12∥Ku−f∥22+n∑k=1wk|uk|, (6)

where is injective and . In [28], was shown to be Lipschitz continuous and directionally differentiable, i.e. Bouligand differentiable [17, 43, 50]. For such nonlinearities a B(ouligand)-Newton method can be defined [43], replacing (4) by the generalized Newton equation

 F′(u(j),d(j))=−F(u(j)). (7)

In [28], the system (7) was shown to be equivalent to a uniquely solvable mixed linear complementarity problem [12]. By the choice (7), automatically is a descent direction with respect to the merit functional ,

 Θ(u):=∥F(u)∥22, (8)

cf. [43]. Additionally, this Bouligand-Newton method can be interpreted as a semismooth Newton method with a specially chosen slanting function and is therefore called a B-semismooth Newton method, cf. [46]. By introducing suitable damping parameters, the method can be shown to be globally convergent under a technical assumption on the in practice unknown accumulation point of the sequence of iterates, see also [27, 32, 43, 45]. Indeed, if the chosen Armijo stepsizes tend to zero, the merit functional has to fulfill the condition

 lim(u,v)→(u∗,u∗)Θ(u)−Θ(v)−Θ′(u∗,u−v)∥u−v∥2=0 (9)

at to ensure global convergence.

In this work, we present a modified, globally convergent semismooth Newton method for the minimization problem (1) in the finite-dimensional setting

 minu∈Rng(u)+n∑k=1wk|uk| (10)

for general (not necessarily quadratic) strictly convex functionals . Our work is inspired by Pang [44], where a globally and locally quadratically convergent modified Bouligand-Newton method was presented for the solution of variational inequalities, nonlinear complementarity problems and nonlinear programs. We take advantage of similarities of nonlinear complementarity problems and the zero-finding problem (2) to propose a modified method similar to [44]. Starting out from [28, 40], we develop a globalized B-semismooth Newton method for general possibly nonquadratic discrepancy functionals . In order to achieve global convergence without any requirements on the a priori unknown accumulation point of the iterates, inspired by [44], we propose a special modification of the Newton directions from (7), retaining the descent property w.r.t. . The resulting generalized Newton equation is again shown to be equivalent to a uniquely solvable mixed linear complementarity problem. Fortunately, in our proposed scheme, under a technical assumption, full Newton steps are accepted in the vicinity of the zero of . As a consequence, under an additional regularity assumption, locally quadratic convergence is achieved. Additionally, the resulting modified method can be interpreted as a generalized Newton method proposed by Han, Pang and Rangaraj [27]. In a neighborhood of the zero of , the modified method, under a technical assumption, coincides with the B-semismooth Newton method from [28] reformulated for nonquadratic . If is a quadratic functional, it was shown in [28] that in a neighborhood of the zero, the B-semismooth Newton method finds the exact zero of within finitely many iterations.

Alternatively, one may consider other globalization strategies as trust region methods or path-search methods instead of the considered line-search damping strategy, see e.g. [52, 18] and the references therein. The path-search globalization strategy proposed by the authors of [47, 14] could be a promising, albeit conceptually different, alternative. These approaches go beyond the scope of this paper and are part of future work.

For the rest of the paper, we require the following assumption on the smoothness of , similar to [40, Assumption 3.1, Example 3.4]. In Section 3, we will need a further assumption regarding the locally quadratic convergence of the method.

###### Assumption 1.
1. The function is twice Lipschitz-continuously differentiable and the Hessian is positive definite for all . Moreover, there exist constants with

uniformly for all .

2. The level sets of are compact.

The compactness of the level sets in the case of a quadratic functional , injective, , was shown in [28]. Note that the positive definiteness of the Hessian implies strict convexity of the functional and ensures unique solvability of (10).

The paper is organized as follows. Section 2 treats the proposed B-semismooth Newton method and its modification as well as their feasibility. Section 3 addresses the global convergence and the local convergence speed of the methods. Numerical examples demonstrate the performance of the proposed algorithms in Section 4.

## 2 A B-semismooth Newton method and its modification

In this section, we present the algorithm of the B-semismooth Newton method from [28] generalized to the minimization problem (10) as well as a modified version and discuss their feasibility. Additionally, we suggest a hybrid method. We start with the modified algorithm because the generalized B-semismooth Newton method can immediately be deduced from the modified method.

### 2.1 A modified B-semismooth Newton method and its feasibility

In the following, we introduce a modified B-semismooth Newton method for the solution of (10). We denote the active set by , where

 A+(u) :={k:γ(∇g(u))k+γwk

and the inactive set by , where

 I∘(u) :={k:γ(∇g(u))k−γwk

Below, we drop the argument if there is no risk of confusion.

For defined by (2), we then have

 Fk(u) =min{γ(∇g(u))k+γwk,uk}, k∈A+(u)∪I∘(u)∪I+(u), (16) Fk(u) =max{γ(∇g(u))k−γwk,uk}, k∈A−(u)∪I∘(u)∪I−(u). (17)

By Assumption 1, is Lipschitz continuous and directionally differentiable. The directional derivative of can be easily deduced.

###### Lemma 2.

The directional derivative of at in the direction is given elementwise by

 F′k(u,d)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩γ(∇2g(u)d)k,k∈A(u),dk,k∈I∘(u),min{γ(∇2g(u)d)k,dk},k∈I+(u),max{γ(∇2g(u)d)k,dk},k∈I−(u). (18)
###### Proof.

The claim is trivially true for and . For we have with (14) and (16)

 limt↘0min{γ(∇g(u+td))k+γwk,uk+tdk}−min{γ(∇g(u))k+γwk,uk}t = limt↘0min{γ(∇g(u+td))k−γ(∇g(u))k,tdk}t = min{limt↘0γ(∇g(u+td))k−γ(∇g(u))kt,dk} = min{γ(∇2g(u)d)k,dk}.

The claim for results analogously. ∎

The directional derivative of the merit functional from (8) at in the direction is given by , where denotes the Euclidean scalar product, see e.g. [28, Lemma 3.2].

To introduce the modified semismooth Newton method, we define the subsets

 A++(u) :={k:γ(∇g(u))k+γwk

Inspired by [44], we define the modified index sets

 ¯¯¯¯¯¯¯¯A+(u) (23) ¯¯¯¯¯¯¯¯A−(u) (24) ¯¯¯¯¯¯I∘(u) :=I∘(u)∖(I∘+(u)∪I∘−(u)), (25) ¯¯¯¯¯¯¯I+(u) :=I+(u)∪A++(u)∪I∘+(u), (26) ¯¯¯¯¯¯¯I−(u) :=I−(u)∪A−−(u)∪I∘−(u). (27)

We denote and respectively. The subsets (19)–(22) fulfill , , and if .

In the following lemma, we consider a linear complementarity problem which is important for all further discussions, cf. [28].

###### Lemma 3.

Let and . The linear complementarity problem

 x≥0,Nx+z≥0,⟨x,Nx+z⟩=0, (28)

with

 N= N(u) := γ⎛⎜⎝M¯¯¯¯¯¯¯I+,¯¯¯¯¯¯¯I+−M¯¯¯¯¯¯¯I+,¯¯¯¯AM−1¯¯¯¯A,¯¯¯¯AM¯¯¯¯A,¯¯¯¯¯¯¯I+M¯¯¯¯¯¯¯I+,¯¯¯¯AM−1¯¯¯¯A,¯¯¯¯AM¯¯¯¯A,¯¯¯¯¯¯¯I−−M¯¯¯¯¯¯¯I+,¯¯¯¯¯¯¯I−M¯¯¯¯¯¯¯I−,¯¯¯¯AM−1¯¯¯¯A,¯¯¯¯AM¯¯¯¯A,¯¯¯¯¯¯¯I+−M¯¯¯¯¯¯¯I−,¯¯¯¯¯¯¯I+M¯¯¯¯¯¯¯I−,¯I−−M¯¯¯¯¯¯¯I−,¯¯¯¯AM−1¯¯¯¯A,¯¯¯¯AM¯¯¯¯A,¯¯¯¯¯¯¯I−⎞⎟⎠ (29)

and

 (30)

has a unique solution.

###### Proof.

By Assumption 1, is symmetric and positive definite. Therefore, from (3) is symmetric and positive definite, see [28, Lemma 3.3]. Hence (28) is uniquely solvable, see [12, Theorem 3.3.7] and [28, Theorem 3.5]. ∎

Now we can define the generalized Newton equation for , cf. [28]. Let and

 B=B(u):=¯¯¯¯¯A(u)∪{k∈¯¯¯¯¯¯¯I+(u)∪¯¯¯¯¯¯¯I−(u):xk>0},C=C(u):=¯¯¯¯¯¯I∘(u)∪{k∈¯¯¯¯¯¯¯I+(u)∪¯¯¯¯¯¯¯I−(u):xk=0}, (31)

where is the unique solution to the linear complementarity problem (28). Then, by defining the generalized derivative blockwise

 (32)

the modified semismooth Newton method is given by

 G(u(j))d(j) =−F(u(j)), (33) u(j+1) =u(j)+tjd(j),j=0,1,… (34)

with suitably chosen damping parameters .

###### Remark 4.

In [40], Muoi et al. chose the slanting function

 (35)

blocked according to the active and inactive sets, to define the local semismooth Newton method (4),(5). The key difference of (32) compared to (35) is the modification of the index sets. Note that from (32) is not a slanting function in general because in regions where is smooth, does not coincide with the Fréchet-derivative of .

Let and . Then solves (33) if and only if

 γ(Md(j))¯¯¯¯A =−F(u(j))¯¯¯¯A, (36) d(j)¯¯¯¯¯¯I∘ =−u(j)¯¯¯¯¯¯I∘, (37)

and

 x :=⎛⎜⎝d(j)¯¯¯¯¯¯¯I++u(j)¯¯¯¯¯¯¯I+−d(j)¯¯¯¯¯¯¯I−−u(j)¯¯¯¯¯¯¯I−⎞⎟⎠ (38) y :=N(u(j))x+z(u(j))=⎛⎝γ(Md(j))¯¯¯¯¯¯¯I++(F(u(j)))¯¯¯¯¯¯¯I+−γ(Md(j))¯¯¯¯¯¯¯I−−(F(u(j)))¯¯¯¯¯¯¯I−⎞⎠,

where , solve the linear complementarity problem (28), cf. [28, Lemma 3.4].

We summarize the above observations in the following lemma, cf. [28, Theorem 3.5].

###### Lemma 5.

Let be the unique solution to (28) for an iterate and . Then, the Newton update from (33) is given by

 d(j)¯¯¯¯¯¯I∘ =−u(j)¯¯¯¯¯¯I∘, (39) d(j)¯¯¯¯¯¯¯I+ =x¯¯¯¯¯¯¯I+−u(j)¯¯¯¯¯¯¯I+, d(j)¯¯¯¯¯¯¯I− =−x¯¯¯¯¯¯¯I−−u(j)¯¯¯¯¯¯¯I−, d(j)¯¯¯¯A =1γM−1¯¯¯¯A,¯¯¯¯A(−γM¯¯¯¯A,¯¯¯Id(j)¯¯¯I−F(u(j))¯¯¯¯A).

Before proceeding, we prove some useful identities similar to [44, Lemma 2].

###### Lemma 6.

Let , the unique solution to (39) and . For , we have the following identities

 (γ(∇g(u))k+γwk)γ(Md)k =−(γ(∇g(u))k+γwk)2, k∈¯¯¯¯¯¯¯¯A+(u), (40) (γ(∇g(u))k−γwk)γ(Md)k =−(γ(∇g(u))k−γwk)2, k∈¯¯¯¯¯¯¯¯A−(u), (41) ukdk =−u2k, k∈¯¯¯¯¯¯I∘(u). (42)

 (γ(∇g(u))k+γwk)γ(Md)k ≤−(γ(∇g(u))k+γwk)2 (43)

holds, for we have

 (γ(∇g(u))k−γwk)γ(Md)k ≤−(γ(∇g(u))k−γwk)2 (44)

and for we have

 ukdk ≤−u2k. (45)
###### Proof.

Equations (40), (41) and (42) immediately follow from (36) and (37). For , we have by definition and with (28) and (38) we have implying (43). For , we have by definition and with (28) and (38) we have , implying (44).

For , we have and because of (28) and (38). If , we have and because of (28) and (38). In both cases (45) follows. ∎

Now we verify that from (39) is a descent direction of the merit functional from (8) at .

###### Lemma 7.

Let with and . Let be the solution to (39). Then, we have

 Θ′(u,d)≤−2Θ(u)<0, (46)

i.e.  is a true descent direction of at in the direction .

###### Proof.

The proof follows the idea of [44, Proof of Proposition 5]. We have

 Θ′(u,d)=2⟨F(u),F′(u,d)⟩=28∑i=1Ti,

where we have with Lemma 2

 T1:=∑k∈¯¯¯¯A(u)Fk(u)γ(Md)k,T3:=∑k∈I+(u)Fk(u)min{dk,γ(Md)k},T5:=∑k∈A++(u)Fk(u)γ(Md)k,T7:=∑k∈I∘+(u)ukdk,T2:=∑k∈¯¯¯¯¯¯I∘(u)ukdk,T4:=∑k∈I−(u)Fk(u)max{dk,γ(Md)k},T6:=∑k∈A−−(u)Fk(u)γ(Md)k,T8:=∑k∈I∘−(u)ukdk.

For , we have

because of (28) and (38). Similarly, for we have

With (40)–(45), we obtain

 Θ′(u,d)=28∑i=1Ti≤−2n∑k=1Fk(u)2=−2Θ(u),

finishing the proof. ∎

We choose the stepsizes in (34) by the well-known Armijo rule

 tj:=max{βl:Θ(u(j)+βld(j))≤(1−2σβl)Θ(u(j)),l=0,1,…},

where and , see also [27, 28, 32, 43, 44, 45]. These stepsizes can be computed in finitely many iterations. We cite the following lemma from [28, Proposition 4.1].

###### Lemma 8.

Let , . Let with and let be computed by (39). Then, there exists a finite index with

 Θ(u(j)+βld(j))≤(1−2σβl)Θ(u(j)). (47)
###### Proof.

According to Lemma 7, it holds The remainder of the proof follows [28, Proof of Proposition 4.1]. ∎

The algorithm of the modified B-semismooth Newton method, in the following denoted by modBSSN, is stated in Algorithm 1. The feasibility of Algorithm modBSSN is guaranteed because of the lemmata stated above.

###### Remark 9.

Pang [44] introduced a modified B-Newton method for a nonlinear complementarity problem. Han, Pang and Rangaraj [27] interpreted this iteration as a generalized Newton method

 F(u(j))+~G(u(j),d(j))=0,u(j+1)=u(j)+tjd(j),j=0,1,…,

where fulfills the assumption that is surjective for each fixed , and

 2⟨F(u),~G(u,d)⟩≥Θ′(u,d)

for all , , see [27, Section 2.3]. In the very same way, our Algorithm modBSSN can be interpreted as a generalized Newton method with and from (32), cf. Lemma 7.

### 2.2 A B-semismooth Newton method and its feasibility

The generalized formulation of the B-semismooth Newton method (5), (7) from [28] for the setting (10), in the following denoted by BSSN, is identical to Algorithm modBSSN replacing the modified index sets (23)–(27) by the original index sets (11)–(15) in (28)–(30) and (39), cf. Algorithm 1 and [28]. Analogously to the proofs in Section 2.1, the Newton directions can be shown to be uniquely determined and the Armijo stepsizes are well-defined because the Newton directions are descent directions w.r.t. the merit functional . Thus, the feasibility of the Algorithm BSSN is guaranteed.

###### Remark 10.

The modification of the index sets in Algorithm modBSSN is needed to prove global convergence without any additional requirements, see Section 3. Let be the unique zero of and let , i.e.  is smooth at . Then, there exists a neighborhood of where the index subsets (19)–(22) are empty for all , i.e. the modified index sets (23)–(27) match the original index sets (11)–(15). Therefore, Algorithm modBSSN locally coincides with in a neighborhood of the zero of if and hence is a semismooth Newton method there.

### 2.3 A globally convergent hybrid method

The B-semismooth Newton method (Algorithm BSSN) from Section 2.2 is efficient in practice because the index sets in step are usually empty so that the generalized Newton equation simplifies to a system of linear equations of the size . The size of the system of linear equations usually decreases in the course of the iteration. Nevertheless, the method may fail to converge, see Remark 10 and Theorem 15. However, the global convergence of Algorithm modBSSN from Section 2.1 is ensured by Theorem 12 but here a mixed linear complementarity problem has to be solved in each iteration, see (39). Additionally, in order to set up the matrix and the vector from (3) and (30), systems of linear equations of the size with the same matrix have to be solved if . Note that in (36) resp. (39) no additional system of linear equations has to be solved for the computation of . Nevertheless, Algorithm modBSSN is usually less efficient than Algorithm BSSN.

We suggest a hybrid method by starting with Algorithm BSSN and switching to Algorithm modBSSN when Algorithm BSSN begins to stagnate, by replacing the modified index sets (23)–(27) by the index sets (11)–(15) in (28)–(30) and (39). In our numerical experiments, we switch to Algorithm modBSSN if the number of Newton steps exceeds a limit and if the chosen stepsize is smaller than a threshold , i.e. if and . In the sequel, this hybrid method is called hybridBSSN. An overview of the proposed methods is given in Algorithm 1. Similar hybrid methods, combining the fast local convergence properties of a local semismooth Newton method with the globally convergent generalized Newton method from [27] were proposed by Qi [45] and Ito and Kunisch [32].

## 3 Global convergence and local convergence speed

In this section, we consider the convergence properties of the algorithms from Section 2.

### 3.1 Convergence of the modified B-semismooth Newton method

In the following, we address the global convergence of Algorithm modBSSN and its convergence speed in a neighborhood of the zero of . Concerning the boundedness of the sequence of Newton directions , we cite [28, Proposition 4.6].

###### Lemma 11.

Let and be the solution to (39). Then, there exists a constant independent of , with

 ∥d∥2≤C∥F(u)∥2. (48)
###### Proof.

The proof follows [28, Proof of Proposition 4.6] by substituting the index sets , and by the modified index sets , and respectively. An inspection of the proof of Lemma 3.3 from [28] and Assumption 1 shows that the Rayleigh quotient of from (3) is bounded from below. ∎

In the following theorem, we present our main result on the global convergence of Algorithm modBSSN.

###### Theorem 12.

Let be an accumulation point of the sequence of iterates produced by Algorithm . Then, we have .

###### Proof.

We proceed analogously to the proof of [44, Theorem 1] and we also use the proof of [44, Proposition 1]. We suppose for all , because otherwise the claim is proven. Because of the Armijo rule (47), the sequence