
GENERALIZED NEWTON’S METHOD

BASED ON GRAPHICAL DERIVATIVES

T. HOHEISEL, C. KANZOW, B. S. MORDUKHOVICH and H. PHAN

Abstract. This paper concerns developing a numerical method of the Newton type to solve systems of nonlinear equations described by nonsmooth continuous functions. We propose and justify a new generalized Newton algorithm based on graphical derivatives, which have never been used to derive a Newton-type method for solving nonsmooth equations. Based on advanced techniques of variational analysis and generalized differentiation, we establish the well-posedness of the algorithm, its local superlinear convergence, and its global convergence of the Kantorovich type. Our convergence results hold with no semismoothness assumption, which is illustrated by examples. The algorithm and main results obtained in the paper are compared with well-recognized semismooth and B-differentiable versions of Newton’s method for nonsmooth Lipschitzian equations.

Key words. nonsmooth equations, optimization and variational analysis, Newton’s method, graphical derivatives and coderivatives, local and global convergence

AMS subject classification. 49J53, 65K15, 90C30

1 Introduction

Newton’s method is one of the most powerful and useful methods in optimization and in the related area of solving systems of nonlinear equations

(1.1) $H(x) = 0$

defined by continuous vector-valued mappings $H\colon \mathbb{R}^n \to \mathbb{R}^n$. In the classical setting when $H$ is a continuously differentiable (smooth, $C^1$) mapping, Newton’s method builds the following iteration procedure:

(1.2) $x^{k+1} := x^k + d^k, \quad k = 0, 1, 2, \ldots,$

where $x^0 \in \mathbb{R}^n$ is a given starting point, and where $d^k$ is a solution to the linear system of equations (often called the “Newton equation”)

(1.3) $\nabla H(x^k)\, d = -H(x^k).$

A detailed analysis and numerous applications of the classical Newton’s method (1.2), (1.3) and its modifications can be found, e.g., in the books [7, 14, 26] and the references therein.
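To fix ideas computationally, the following minimal Python sketch implements the classical scheme (1.2), (1.3); the smooth test system and its Jacobian below are our own illustrative choices, not taken from the references above.

    import numpy as np

    def newton(H, JH, x0, tol=1e-10, max_iter=50):
        """Classical Newton method (1.2), (1.3) for smooth H: R^n -> R^n."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            Hx = H(x)
            if np.linalg.norm(Hx) <= tol:    # termination criterion
                break
            d = np.linalg.solve(JH(x), -Hx)  # Newton equation (1.3)
            x = x + d                        # iteration (1.2)
        return x

    # Illustrative smooth system H(x) = (x1^2 - 2, x1 + x2):
    H = lambda x: np.array([x[0]**2 - 2.0, x[0] + x[1]])
    JH = lambda x: np.array([[2.0 * x[0], 0.0], [1.0, 1.0]])
    print(newton(H, JH, [1.0, 0.0]))  # approximately (sqrt(2), -sqrt(2))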

However, in the vast majority of applications—including those to optimization, variational inequalities, complementarity and equilibrium problems, etc.—the underlying mapping $H$ in (1.1) is nonsmooth. Indeed, the aforementioned optimization-related models and their extensions can be written via Robinson’s formalism of “generalized equations,” which in turn can be reduced to standard equations of form (1.1) (using, e.g., the projection operator), albeit with intrinsically nonsmooth mappings $H$; see [8, 19, 33, 29] for more details, discussions, and references.

Robinson originally proposed (see [32] and also [34] based on his earlier preprint) a point-based approximation approach to solve nonsmooth equations (1.1), which then was developed by his student Josephy [11] to extend Newton’s method for solving variational inequalities and complementarity problems. Other approaches replace the classical derivative $\nabla H(x^k)$ in the Newton equation (1.3) by some generalized derivatives. In particular, the B-differentiable Newton method developed by Pang [27, 28] uses the iteration scheme (1.2) with $d^k$ being a solution to the subproblem

(1.4) $H'(x^k; d) = -H(x^k),$

where $H'(x; d)$ denotes the classical directional derivative of $H$ at $x$ in the direction $d$. Besides the existence of the classical directional derivative in (1.4), a number of strong assumptions are imposed in [27, 28] to establish appropriate convergence results; see Section 5 below for more discussions and comparisons.

In another approach developed by Kummer [16] and Qi and Sun [31], the direction $d^k$ in (1.2) is taken as a solution to the linear system of equations

(1.5) $V_k\, d = -H(x^k),$

where $V_k$ is an element of Clarke’s generalized Jacobian $\partial H(x^k)$ of a Lipschitz continuous mapping $H$. In [30], Qi suggested to replace $\partial H(x^k)$ in (1.5) by the choice of $V_k$ from the so-called B-subdifferential $\partial_B H(x^k)$ of $H$ at $x^k$, which is a proper subset of $\partial H(x^k)$; see Section 4 for more details. We also refer the reader to [8, 15, 34] and bibliographies therein for wide overviews, historical remarks, and other developments on Newton’s method for nonsmooth Lipschitz equations as in (1.1) and to [13] for some recent applications.

It is proved in [31] and [30] that the Newton-type methods based on implementing the generalized Jacobian and the B-subdifferential in (1.5), respectively, converge superlinearly to a solution of (1.1) for a class of semismooth mappings $H$; see Section 4 for the definition and discussions. This subclass of Lipschitz continuous and directionally differentiable mappings is rather broad and useful in applications to optimization-related problems. However, not every mapping arising in applications (from both theoretical and practical viewpoints) is either directionally differentiable or Lipschitz continuous. The reader can find valuable classes of functions and mappings of this type in [24, 35] and overwhelmingly in spectral function analysis, eigenvalue optimization, the study of roots of polynomials, stability of control systems, etc.; see, e.g., [4] and the references therein.
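For comparison purposes, here is a minimal one-dimensional Python sketch of the semismooth scheme (1.2), (1.5); the test equation and the particular choice of a B-subdifferential element are ours and serve only as an illustration.

    def semismooth_newton(H, V, x0, tol=1e-10, max_iter=50):
        """Semismooth Newton scheme (1.2), (1.5) in one dimension:
        V(x) returns an element of Clarke's generalized Jacobian at x."""
        x = float(x0)
        for _ in range(max_iter):
            if abs(H(x)) <= tol:
                break
            x += -H(x) / V(x)  # d solves V_k d = -H(x^k)
        return x

    # Illustrative semismooth equation H(x) = x^3 + |x| - 1 = 0:
    H = lambda x: x**3 + abs(x) - 1.0
    V = lambda x: 3.0 * x**2 + (1.0 if x >= 0 else -1.0)  # B-subdifferential element
    print(semismooth_newton(H, V, x0=2.0))  # approximately 0.68233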

The main goal and achievements of this paper are as follows. We propose a new Newton-type algorithm to solve nonsmooth equations (1.1) described by general continuous mappings $H$ that is based on graphical derivatives. It reduces to the classical Newton method (1.2), (1.3) when $H$ is smooth, being different from previously known versions of Newton’s method in the case of Lipschitz continuous mappings $H$. Based on advanced tools of variational analysis involving metric regularity and coderivatives, we justify well-posedness of the new algorithm and its superlinear local and global (of the Kantorovich type) convergence under verifiable assumptions that hold for semismooth mappings but are not restricted to them. Detailed comparisons of our algorithm and results with the semismooth and B-differentiable Newton methods are given and certain improvements of these methods are justified.

Note that metric regularity and related concepts of variational analysis have been employed in the analysis and justification of numerical algorithms starting with Robinson’s seminal contribution; see, e.g., [1, 18, 25] and the references therein for recent accounts. However, we are not familiar with any usage of graphical derivatives and coderivatives for these purposes.

The rest of the paper is organized as follows. In Section 2 we present basic definitions and preliminaries from variational analysis and generalized differentiation widely used for formulations and proofs of the main results.

Section 3 is devoted to the description of the new generalized Newton algorithm, justifying its well-posedness/solvability and establishing its superlinear local and global convergence under appropriate assumptions on the underlying mapping $H$.

In Section 4 we compare our algorithm with the scheme of (1.5). We also discuss in detail the major assumptions made in Section 3, deriving sufficient conditions for their fulfillment and comparing them with those in the semismooth Newton methods.

Section 5 contains applications of our algorithm to the B-differentiable Newton method (1.4) with largely relaxed assumptions in comparison with known ones. In Section 6 we give some concluding remarks and discussions on further research.

Our notation is basically standard in variational analysis and numerical optimization; cf. [8, 24, 35]. Recall that, given a set-valued mapping $F\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the expression

(1.6) $\mathop{\rm Lim\,sup}\limits_{x \to \bar x} F(x) := \big\{ y \in \mathbb{R}^m \ \big|\ \exists\, x_k \to \bar x,\ y_k \to y \text{ with } y_k \in F(x_k) \text{ for all } k \in \mathbb{N} \big\}$

defines the Painlevé-Kuratowski upper/outer limit of $F$ as $x \to \bar x$. Let us also mention that the symbols $\mathrm{cone}\,\Omega$ and $\mathrm{conv}\,\Omega$ stand, respectively, for the conic hull and convex hull of the set $\Omega$ in question, that $\mathrm{dist}(x; \Omega)$ denotes the Euclidean distance between a point $x$ and a set $\Omega$, and that the notation $A^T$ signifies the matrix transposition. As usual, $B_r(x)$ stands for the closed ball centered at $x$ with radius $r > 0$.

2 Tools of Variational Analysis

In this section we briefly review some constructions and results from variational analysis and generalized differentiation widely used in what follows. The reader may consult the texts [3, 24, 35, 36] for more details and additional material.

Given a nonempty set $\Omega \subset \mathbb{R}^n$ and a point $\bar x \in \Omega$, the (Bouligand-Severi) tangent/contingent cone to $\Omega$ at $\bar x$ is defined by

(2.1) $T(\bar x; \Omega) := \mathop{\rm Lim\,sup}\limits_{t \downarrow 0} \dfrac{\Omega - \bar x}{t}$

via the outer limit (1.6). This cone is often nonconvex, while its polar/dual cone

(2.2) $\hat N(\bar x; \Omega) := \big\{ v \in \mathbb{R}^n \ \big|\ \langle v, w \rangle \le 0 \text{ for all } w \in T(\bar x; \Omega) \big\}$

is always convex and can be intrinsically described by

$\hat N(\bar x; \Omega) = \Big\{ v \in \mathbb{R}^n \ \Big|\ \limsup_{x \overset{\Omega}{\to} \bar x} \dfrac{\langle v, x - \bar x \rangle}{\|x - \bar x\|} \le 0 \Big\},$

where the symbol $x \overset{\Omega}{\to} \bar x$ signifies that $x \to \bar x$ with $x \in \Omega$. The construction (2.2) is known as the prenormal cone or the Fréchet/regular normal cone to $\Omega$ at $\bar x$. For convenience we put $\hat N(\bar x; \Omega) := \emptyset$ if $\bar x \notin \Omega$. Observe that the prenormal cone (2.2) may not have natural properties of generalized normals in the case of nonconvex sets $\Omega$; e.g., it often happens that $\hat N(\bar x; \Omega) = \{0\}$ when $\bar x$ is a boundary point of $\Omega$, and the cone (2.2) does not possess required calculus rules. The situation is dramatically improved when we consider a robust regularization of (2.2) via the outer limit (1.6) and arrive at the construction

(2.3) $N(\bar x; \Omega) := \mathop{\rm Lim\,sup}\limits_{x \to \bar x} \hat N(x; \Omega),$

known as the (limiting, basic, Mordukhovich) normal cone to $\Omega$ at $\bar x$. If $\Omega$ is locally closed around $\bar x$, the basic normal cone (2.3) can be equivalently described as

$N(\bar x; \Omega) = \mathop{\rm Lim\,sup}\limits_{x \to \bar x} \big[ \mathrm{cone}\big( x - \Pi(x; \Omega) \big) \big]$

via the Euclidean projector $\Pi(\cdot\,; \Omega)$ on $\Omega$; this was in fact the original definition of the normal cone in [21]. Despite its nonconvexity, the normal cone (2.3) and the corresponding subdifferential and coderivative constructions for extended-real-valued functions and set-valued mappings enjoy comprehensive calculus rules, which are particularly based on variational/extremal principles of variational analysis.
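For instance (an illustration added here for orientation; it is not part of the original text), consider the nonconvex set $\Omega := \{(x, \alpha) \in \mathbb{R}^2 \mid \alpha \ge -|x|\}$, the epigraph of $-|\cdot|$, at $\bar x = (0, 0)$. Then $\hat N((0,0); \Omega) = \{(0,0)\}$, so the prenormal cone (2.2) carries no information at this boundary point, while the basic normal cone (2.3) is the nonconvex set

$N((0,0); \Omega) = \{\lambda (1, -1) \mid \lambda \ge 0\} \cup \{\lambda (-1, -1) \mid \lambda \ge 0\}.$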

Consider next a set-valued mapping $F\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ with the graph

$\mathrm{gph}\, F := \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^m \mid y \in F(x)\}$

and define the graphical derivative and coderivative constructions generated by the tangent and normal cones, respectively. Given $(\bar x, \bar y) \in \mathrm{gph}\, F$, the graphical/contingent derivative of $F$ at $(\bar x, \bar y)$ is introduced in [2] as a mapping $DF(\bar x, \bar y)\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ with the values

(2.4) $DF(\bar x, \bar y)(w) := \big\{ z \in \mathbb{R}^m \ \big|\ (w, z) \in T\big((\bar x, \bar y); \mathrm{gph}\, F\big) \big\}$

defined via the contingent cone (2.1) to the graph of $F$ at the point $(\bar x, \bar y)$; see [3, 35] for various properties, equivalent representations, and applications. The coderivative of $F$ at $(\bar x, \bar y)$ is introduced in [22] as a mapping $D^*F(\bar x, \bar y)\colon \mathbb{R}^m \rightrightarrows \mathbb{R}^n$ with the values

(2.5) $D^*F(\bar x, \bar y)(v) := \big\{ u \in \mathbb{R}^n \ \big|\ (u, -v) \in N\big((\bar x, \bar y); \mathrm{gph}\, F\big) \big\}$

defined via the normal cone (2.3) to the graph of $F$ at $(\bar x, \bar y)$; see [24, 35] for extended calculus and a variety of applications. We drop $\bar y$ in the graphical derivative and coderivative notation, writing $DF(\bar x)$ and $D^*F(\bar x)$, when the mapping $F$ in question is single-valued at $\bar x$. Note that the graphical derivative and coderivative constructions in (2.4) and (2.5) are not dual to each other, since the basic normal cone (2.3) is nonconvex and hence cannot be tangentially generated.

In this paper we employ, together with (2.4) and (2.5), the following modified derivative construction for mappings, which seems to be new in this generality, although constructions of this (radial, Dini-like) type have been widely used for extended-real-valued functions.

Definition 2.1

(restrictive graphical derivative of mappings). Let $F\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, and let $(\bar x, \bar y) \in \mathrm{gph}\, F$. Then a set-valued mapping $\widetilde D F(\bar x, \bar y)\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ given by

(2.6) $\widetilde D F(\bar x, \bar y)(w) := \mathop{\rm Lim\,sup}\limits_{t \downarrow 0} \dfrac{F(\bar x + t w) - \bar y}{t}$

is called the restrictive graphical derivative of $F$ at $(\bar x, \bar y)$.
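As a quick illustration of Definition 2.1 (our example, not from the original text), take the single-valued mapping $H(x) = |x|$ on $\mathbb{R}$ at $\bar x = 0$. Then

$\widetilde D H(0)(w) = \Big\{ \lim_{t \downarrow 0} \frac{|t w|}{t} \Big\} = \{|w|\} = \{H'(0; w)\},$

in accordance with assertion (d) of the proposition below; since this $H$ is Lipschitz continuous, assertion (c) then gives $D H(0)(w) = \{|w|\}$ as well.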

The next proposition collects some properties of the graphical derivative (2.4) and its restrictive counterpart (2.6) needed in what follows.

Proposition 2.2

(properties of graphical derivatives). Let $F\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, and let $(\bar x, \bar y) \in \mathrm{gph}\, F$. Then the following assertions hold:

(a) We have $\widetilde D F(\bar x, \bar y)(w) \subset D F(\bar x, \bar y)(w)$ for all $w \in \mathbb{R}^n$.

(b) There are the inverse derivative relationships

$D(F^{-1})(\bar y, \bar x) = D F(\bar x, \bar y)^{-1}$

together with the corresponding counterpart for the restrictive construction (2.6).

(c) If $F =: H$ is single-valued and locally Lipschitzian around $\bar x$, then

$D H(\bar x)(w) = \widetilde D H(\bar x)(w)$ for all $w \in \mathbb{R}^n$.

(d) If $H$ is single-valued and directionally differentiable at $\bar x$, then

$\widetilde D H(\bar x)(w) = \{H'(\bar x; w)\}$ for all $w \in \mathbb{R}^n$.

(e) If $H$ is single-valued and Gâteaux differentiable at $\bar x$ with the Gâteaux derivative $H'_G(\bar x)$, then we have

$\widetilde D H(\bar x)(w) = \{H'_G(\bar x)\, w\}$ for all $w \in \mathbb{R}^n$.

(f) If $H$ is single-valued and Fréchet differentiable at $\bar x$ with the derivative $\nabla H(\bar x)$, then

$D H(\bar x)(w) = \{\nabla H(\bar x)\, w\}$ for all $w \in \mathbb{R}^n$.
Proof. It is shown in [35, 8(14)] that the graphical derivative (2.4) admits the representation

(2.7) $D F(\bar x, \bar y)(w) = \mathop{\rm Lim\,sup}\limits_{t \downarrow 0,\ w' \to w} \dfrac{F(\bar x + t w') - \bar y}{t}.$

The inclusion in (a) is an immediate consequence of Definition 2.1 and representation (2.7).

The first equality in (b), observed from the very beginning [2], easily follows from definition (2.4). We can similarly check the second one in (b).

To justify the equality in (c), it remains to verify by (a) the opposite inclusion “$\supset$” when $F =: H$ is single-valued and locally Lipschitzian around $\bar x$. In this case fix $w \in \mathbb{R}^n$, pick any $z \in D H(\bar x)(w)$, and find by representation (2.7) sequences $t_k \downarrow 0$ and $w_k \to w$ such that

$\dfrac{H(\bar x + t_k w_k) - H(\bar x)}{t_k} \to z \quad \text{as } k \to \infty.$

The local Lipschitz continuity of $H$ around $\bar x$ with constant $\ell > 0$ implies that

$\Big\| \dfrac{H(\bar x + t_k w_k) - H(\bar x)}{t_k} - \dfrac{H(\bar x + t_k w) - H(\bar x)}{t_k} \Big\| \le \ell\, \|w_k - w\|$

for all $k$ sufficiently large, and hence we have the convergence

$\dfrac{H(\bar x + t_k w) - H(\bar x)}{t_k} \to z \quad \text{as } k \to \infty.$

Thus $z \in \widetilde D H(\bar x)(w)$, which justifies (c). Assertions (d) and (e) follow directly from the definitions. Finally, assertion (f) is implied by (e) in the locally Lipschitzian case (c), while it can be easily derived from the (Fréchet) differentiability of $H$ at $\bar x$ with no Lipschitz assumption; see, e.g., [35, Exercise 9.25(b)].

Proposition 2.2 reveals important differences between the graphical derivative (2.4) and the coderivative (2.5). Indeed, assertions (c) and (d) of this proposition show that the graphical derivative of locally Lipschitzian and directionally differentiable mappings is always single-valued. At the same time, the coderivative single-valuedness of locally Lipschitzian mappings is equivalent to the strict/strong Fréchet differentiability of $H$ at the point in question; see [24, Theorem 3.66]. It follows from the well-known formula

(2.8) $\partial H(\bar x)^T v = \mathrm{conv}\, D^* H(\bar x)(v), \quad v \in \mathbb{R}^n,$

that the latter strict differentiability condition characterizes also the single-valuedness of the generalized Jacobian $\partial H(\bar x)$ of $H$ at $\bar x$.

In fact, in the case of $H$ being locally Lipschitzian around $\bar x$ the coderivative (2.5) admits the subdifferential description

(2.9) $D^* H(\bar x)(v) = \partial \langle v, H \rangle(\bar x), \quad v \in \mathbb{R}^n,$

where the (basic, limiting, Mordukhovich) subdifferential $\partial \varphi(\bar x)$ of a general scalar function $\varphi\colon \mathbb{R}^n \to \overline{\mathbb{R}} := (-\infty, \infty]$ finite at $\bar x$ is defined geometrically by

(2.10) $\partial \varphi(\bar x) := \big\{ v \in \mathbb{R}^n \ \big|\ (v, -1) \in N\big((\bar x, \varphi(\bar x)); \mathrm{epi}\, \varphi\big) \big\}$

via the normal cone (2.3) to the epigraph $\mathrm{epi}\, \varphi := \{(x, \alpha) \in \mathbb{R}^{n+1} \mid \alpha \ge \varphi(x)\}$ and admits analytical descriptions in terms of the outer limit (1.6) of the Fréchet/regular and proximal subdifferentials at points nearby; see [24, 35] with the references therein. Note also that the basic subdifferential (2.10) of a continuous function $\varphi$ can be also described via the coderivative of $\varphi$ by $\partial \varphi(\bar x) = D^* \varphi(\bar x)(1)$; see [24, Theorem 1.80].
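A one-dimensional example (added here for illustration; it is not part of the original text) makes the contrast concrete. For $H(x) = |x|$ at $\bar x = 0$ we have the single-valued graphical derivative $D H(0)(w) = \{|w|\}$ by Proposition 2.2(c,d), while the scalarization formula (2.9) yields the multivalued coderivative

$D^* H(0)(v) = \partial (v |\cdot|)(0) = \begin{cases} [-v, v] & \text{if } v \ge 0, \\ \{v, -v\} & \text{if } v < 0, \end{cases}$

reflecting the failure of strict differentiability of $|x|$ at the origin.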

Finally in this section, we recall the notion of metric regularity and its coderivative characterization that play a significant role in the paper. A mapping $F\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is metrically regular around $(\bar x, \bar y) \in \mathrm{gph}\, F$ if there are neighborhoods $U$ of $\bar x$ and $V$ of $\bar y$ as well as a number $\mu > 0$ such that

(2.11) $\mathrm{dist}\big(x; F^{-1}(y)\big) \le \mu\, \mathrm{dist}\big(y; F(x)\big) \quad \text{for all } x \in U \text{ and } y \in V.$

Observe that it is sufficient to require the fulfillment of (2.11) just for those $(x, y) \in U \times V$ satisfying the estimate $\mathrm{dist}(y; F(x)) \le \gamma$ for some $\gamma > 0$; see [24, Proposition 1.48].
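A simple nonsmooth illustration of (2.11), added here for orientation (it is not part of the original text): the piecewise linear bijection $H(x) := x$ for $x \ge 0$ and $H(x) := 2x$ for $x < 0$ has the Lipschitzian inverse $H^{-1}(y) = y$ for $y \ge 0$ and $H^{-1}(y) = y/2$ for $y < 0$, so that

$\mathrm{dist}\big(x; H^{-1}(y)\big) = \big| H^{-1}(H(x)) - H^{-1}(y) \big| \le |H(x) - y| = \mathrm{dist}\big(y; H(x)\big),$

i.e., $H$ is metrically regular around $(0, 0)$ with modulus $\mu = 1$. In contrast, $H(x) = |x|$ is not metrically regular around $(0, 0)$, since $H^{-1}(y) = \emptyset$ for $y < 0$ makes the left-hand side of (2.11) infinite while the right-hand side stays finite.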

We will see below that metric regularity is crucial for justifying the well-posedness of our generalized Newton algorithm and establishing its local and global convergence. It is also worth mentioning that, in the opposite direction, a Newton-type method (known as the Lyusternik-Graves iterative process) leads to verifiable conditions for metric regularity of smooth mappings; see, e.g., the proof of [24, Theorem 1.57] and the commentaries therein. The latter procedure is replaced by variational/extremal principles of variational analysis in the case of nonsmooth and set-valued mappings under consideration; cf. [10, 24, 35].

In this paper we broadly use the following coderivative characterization of the metric regularity property for an arbitrary set-valued mapping $F\colon \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ with closed graph, known also as the Mordukhovich criterion (see [23, Theorem 3.6], [35, Theorem 9.45], and the references therein): $F$ is metrically regular around $(\bar x, \bar y) \in \mathrm{gph}\, F$ if and only if the condition

(2.12) $0 \in D^* F(\bar x, \bar y)(v) \Longrightarrow v = 0$

is satisfied, which amounts to the kernel condition $\ker D^* F(\bar x, \bar y) = \{0\}$.

3 The Generalized Newton Algorithm

This section presents the main contribution of the paper: a new generalized Newton method for nonsmooth equations, which is based on graphical derivatives. The section consists of three parts. In Subsection 3.1 we precisely describe the algorithm and justify its well-posedness/solvability. Subsection 3.2 contains a local superlinear convergence result under appropriate assumptions. Finally, in Subsection 3.3 we establish a global convergence result of the Kantorovich type for our generalized Newton algorithm.

3.1 Description and Justification of the Algorithm

Keeping in mind the classical scheme of the smooth Newton method in (1.2), (1.3) and taking into account the graphical derivative representation of Proposition 2.2(f), we propose the following extension of the Newton equation (1.3) to nonsmooth mappings $H$:

(3.1) $-H(x^k) \in D H(x^k)(d).$

This leads us to the following generalized Newton algorithm to solve (1.1):

Algorithm 3.1

(generalized Newton’s method).

Step 0: Choose a starting point $x^0 \in \mathbb{R}^n$ and set $k := 0$.

Step 1: Check a suitable termination criterion.

Step 2: Compute $d^k \in \mathbb{R}^n$ such that (3.1) holds.

Step 3: Set $x^{k+1} := x^k + d^k$, $k \leftarrow k + 1$, and go to Step 1.

The proposed Algorithm 3.1 does not require a priori any assumptions on the underlying mapping $H$ in (1.1) besides its continuity, which is the standing assumption in this paper. Other assumptions are imposed below to justify the well-posedness and (local and global) convergence of the algorithm. Observe that Proposition 2.2(c,d) ensures that Algorithm 3.1 reduces to scheme (1.4) of the B-differentiable Newton method provided that $H$ is directionally differentiable and locally Lipschitzian around the solution point in question. In Section 5 we consider in detail relationships with known results for the B-differentiable Newton method, while Section 4 compares Algorithm 3.1 and the assumptions made with the corresponding semismooth versions in the framework of (1.5).
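In the directionally differentiable Lipschitzian case just mentioned, where (3.1) reduces to $H'(x^k; d) = -H(x^k)$, the algorithm admits a very compact implementation for scalar equations. The following Python sketch is only our illustration (the test function is an arbitrary choice, and the one-sided slopes are assumed nonzero); it solves the piecewise linear Newton subproblem by checking the two one-sided derivatives.

    def dirderiv(H, x, d, t=1e-8):
        """Numerical directional derivative H'(x; d) of a scalar function."""
        return (H(x + t * d) - H(x)) / t

    def generalized_newton(H, x0, tol=1e-10, max_iter=50):
        """Algorithm 3.1 for scalar Lipschitzian, directionally differentiable H,
        where DH(x)(d) = {H'(x; d)} by Proposition 2.2(c,d)."""
        x = float(x0)
        for _ in range(max_iter):
            if abs(H(x)) <= tol:        # Step 1: termination criterion
                break
            r = -H(x)
            sp = dirderiv(H, x, 1.0)    # slope of d -> H'(x; d) for d >= 0
            sm = -dirderiv(H, x, -1.0)  # slope of d -> H'(x; d) for d < 0
            # Step 2: solve H'(x; d) = r piecewise (slopes assumed nonzero):
            d = r / sp if r / sp >= 0.0 else r / sm
            x += d                      # Step 3: update the iterate
        return x

    # Illustrative nonsmooth equation H(x) = x + |x|/2 - 1 = 0 with root 2/3:
    H = lambda x: x + 0.5 * abs(x) - 1.0
    print(generalized_newton(H, x0=-1.0))  # approximately 0.666666...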

To proceed further, we need to make sure that the generalized Newton equation (3.1) is solvable, which is a major part of the well-posedness of Algorithm 3.1. The next proposition shows that an appropriate assumption to ensure the solvability of (3.1) is metric regularity.

Proposition 3.2

(solvability of the generalized Newton equation). Assume that $H$ is metrically regular around $(\bar x, H(\bar x))$ with modulus $\mu > 0$ in (2.11). Then there is a constant $\varepsilon > 0$ such that for all $x \in B_\varepsilon(\bar x)$ the equation

(3.2) $-H(x) \in D H(x)(d)$

admits a solution $d \in \mathbb{R}^n$. Furthermore, the set $S(x)$ of solutions to (3.2) is computed by

(3.3) $S(x) = D(H^{-1})\big(H(x), x\big)\big(-H(x)\big).$

Proof. By the assumed metric regularity (2.11) of $H$ we find a number $\mu > 0$ and neighborhoods $U$ of $\bar x$ and $V$ of $H(\bar x)$ such that

$\mathrm{dist}\big(x; H^{-1}(y)\big) \le \mu\, \mathrm{dist}\big(y; H(x)\big) \quad \text{for all } x \in U \text{ and } y \in V.$

Pick now an arbitrary vector $x \in U$ and select sequences $t_k \downarrow 0$ and $y_k := (1 - t_k) H(x) \to H(x)$ as $k \to \infty$. Suppose with no loss of generality that $y_k \in V$ for all $k \in \mathbb{N}$. Then we have

$\mathrm{dist}\big(x; H^{-1}(y_k)\big) \le \mu\, \mathrm{dist}\big(y_k; H(x)\big) = \mu\, t_k\, \|H(x)\|,$

and hence there is a vector $x_k \in H^{-1}(y_k)$ such that $\|x_k - x\| \le \mu\, t_k\, \|H(x)\|$ for all $k \in \mathbb{N}$. This shows that the sequence of $d_k := (x_k - x)/t_k$ is bounded, and thus it contains a subsequence that converges to some element $d \in \mathbb{R}^n$. Passing to the limit as $k \to \infty$ and recalling the definitions of the outer limit (1.6) and of the tangent cone (2.1), we arrive at

$(d, -H(x)) \in T\big((x, H(x)); \mathrm{gph}\, H\big), \quad \text{i.e.,} \quad -H(x) \in D H(x)(d),$

which justifies the desired inclusion (3.2). The solution representation (3.3) follows from (2.7) and Proposition 2.2(b) in the case of single-valued mappings, since

$S(x) = \big\{ d \in \mathbb{R}^n \ \big|\ -H(x) \in D H(x)(d) \big\} = D(H^{-1})\big(H(x), x\big)\big(-H(x)\big)$

due to (3.2). This completes the proof of the proposition.

3.2 Local Convergence

In this subsection we first formulate major assumptions of our generalized Newton method and then show that they ensure the superlinear local convergence of Algorithm 3.1.

  • (H1) There exist a constant $c > 0$, a neighborhood $U$ of $\bar x$, and a neighborhood $V$ of the origin in $\mathbb{R}^n$ such that the following holds:
    For all $x \in U$, $x \ne \bar x$, and for any $d \in V$ with $-H(x) \in D H(x)(d)$ there is a vector $w \in \mathbb{R}^n$ such that

    $-H(x) \in \widetilde D H(x)(w) \quad \text{and} \quad \|w - d\| \le c\, \|d\|\, \|x - \bar x\|.$

  • (H2) There exists a neighborhood $\widetilde U$ of $\bar x$ such that for all $x \in \widetilde U$ and all $w \in \mathbb{R}^n$ with $-H(x) \in \widetilde D H(x)(w)$ we have

    $\|x + w - \bar x\| = o(\|x - \bar x\|) \quad \text{as } x \to \bar x.$
A detailed discussion of these two assumptions and sufficient conditions for their fulfillment are given in Section 4. Note that assumption (H2) means, in the terminology of [8, Definition 7.2.2] focused on locally Lipschitzian mappings $H$, that the family $\{\widetilde D H(x)\}_{x \in \widetilde U}$ provides a Newton approximation scheme for $H$ at $\bar x$.

Now we establish our principal local convergence result that makes use of the major assumptions (H1) and (H2) together with metric regularity.

Theorem 3.3

(superlinear local convergence of the generalized Newton method). Let $\bar x$ be a solution to (1.1) for which the underlying mapping $H$ is metrically regular around $(\bar x, 0)$ and assumptions (H1) and (H2) are satisfied. Then there is a number $\delta > 0$ such that for all starting points $x^0 \in B_\delta(\bar x)$ the following assertions hold:

(i) Algorithm 3.1 is well defined and generates a sequence $\{x^k\}$ converging to $\bar x$.

(ii) The rate of convergence is at least superlinear.

Proof. To justify (i), pick $\delta > 0$ such that assumptions (H1) and (H2) hold with $U := B_\delta(\bar x)$ and $V := B_\delta(0)$ and such that Proposition 3.2 can be applied. Then we choose a starting point $x^0 \in B_\delta(\bar x)$ and conclude by Proposition 3.2 that the subproblem

$-H(x^0) \in D H(x^0)(d)$

has a solution $d^0$. Thus the next iterate $x^1 := x^0 + d^0$ is well defined. Let further $x := x^0$, $d := d^0$, and get by the choice of the starting point $\|x - \bar x\| \le \delta$. By assumption (H1), find a vector $w \in \mathbb{R}^n$ such that

$-H(x) \in \widetilde D H(x)(w) \quad \text{and} \quad \|w - d\| \le c\, \|d\|\, \|x - \bar x\|.$

Taking this into account and employing assumption (H2), we get the relationships

$\|x + d - \bar x\| \le \|x + w - \bar x\| + \|w - d\| \le o(\|x - \bar x\|) + c\, \|d\|\, \|x - \bar x\|,$

which imply that $\|x^1 - \bar x\| \le \frac{1}{2} \|x^0 - \bar x\|$ whenever $\delta$ is sufficiently small. The latter yields, in particular, that $x^1 \in B_\delta(\bar x)$. Now standard induction arguments allow us to conclude that the iterative sequence $\{x^k\}$ generated by Algorithm 3.1 is well defined and converges to the solution $\bar x$ of (1.1) with at least a linear rate. This justifies assertion (i) of the theorem.

Next we prove assertion (ii) showing that the convergence is in fact superlinear under the validity of assumption (H2). To proceed, we basically follow the proof of assertion (i) and construct by induction sequences $\{d^k\}$ and $\{w^k\}$ satisfying

$-H(x^k) \in D H(x^k)(d^k) \quad \text{and} \quad -H(x^k) \in \widetilde D H(x^k)(w^k)$

with $x^{k+1} = x^k + d^k$, and with $w^k$ chosen via (H1) such that

$\|w^k - d^k\| \le c\, \|d^k\|\, \|x^k - \bar x\| = o(\|x^k - \bar x\|) \quad \text{as } k \to \infty.$

Applying then assumption (H2) gives us the relationships

$\|x^{k+1} - \bar x\| \le \|x^k + w^k - \bar x\| + \|w^k - d^k\| = o(\|x^k - \bar x\|),$

which ensure the superlinear convergence of the iterative sequence $\{x^k\}$ to the solution $\bar x$ of (1.1) and thus complete the proof of the theorem.

3.3 Global Convergence

Besides local convergence results for the classical Newton method, which are based on suitable assumptions imposed at the (unknown) solution of the underlying system of equations, there are global (or semilocal) convergence results of the Kantorovich type [12] for smooth systems of equations, which show that, under certain conditions at the starting point $x^0$ and a number of assumptions that hold in a suitable region around $x^0$, Newton's iterates are well defined and converge to a solution belonging to this region; see [7, 12] for more details and references. In the case of nonsmooth equations (1.1), results of the Kantorovich type were obtained in [31, 34] for the corresponding versions of Newton's method. Global convergence results of different types can be found in, e.g., [6, 8, 9, 28] and their references.

Here is a global convergence result for our generalized Newton method to solve (1.1).

Theorem 3.4

(global convergence of the generalized Newton method). Let $x^0 \in \mathbb{R}^n$ be a starting point of Algorithm 3.1, and let

(3.4) $\Omega := B_\rho(x^0)$

with some $\rho > 0$. Impose the following assumptions:

  • (a) The mapping $H$ in (1.1) is metrically regular on $\Omega$ with modulus $\mu > 0$, i.e., it is metrically regular around every point $(x, H(x))$ with $x \in \Omega$ and with the same modulus $\mu$.

  • (b) The set-valued map $(x, d) \mapsto D H(x)(d)$ converges uniformly on $\Omega$ to $0$ as $d \to 0$ in the sense that: for all $\varepsilon > 0$ there is $\delta > 0$ such that

    $D H(x)(d) \subset B_\varepsilon(0) \quad \text{whenever } x \in \Omega \text{ and } \|d\| \le \delta.$

  • (c) There is $q \in (0, 1)$ such that

    (3.5) $\mu\, \|H(x^0)\| \le (1 - q)\, \rho,$

    and for all $x \in \Omega$ and all $d$ with $-H(x) \in D H(x)(d)$ we have the estimate

    (3.6) $\|H(x + d)\| \le \dfrac{q}{\mu}\, \|d\|.$

Then Algorithm 3.1 is well defined, and the sequence of iterates $\{x^k\}$ remains in $\Omega$ and converges to a solution $\tilde x \in \Omega$ of (1.1). Moreover, we have the error estimate

(3.7) $\|x^k - \tilde x\| \le \dfrac{q^k}{1 - q}\, \|d^0\| \quad \text{for all } k \in \mathbb{N}.$
Proof. The metric regularity assumption (a) allows us to employ Proposition 3.2 and, for any $x \in \Omega$ and $d$ satisfying the inclusion $-H(x) \in D H(x)(d)$ constructed therein, to find sequences of $t_k \downarrow 0$ and $d_k \to d$ as $k \to \infty$ such that

$\|d_k\| \le \mu\, \|H(x)\| \ \text{for all } k \in \mathbb{N}, \quad \text{and hence} \quad \|d\| \le \mu\, \|H(x)\|.$

In view of assumption (3.5) in (c) and the iteration procedure of the algorithm, this implies

$\|x^1 - x^0\| = \|d^0\| \le \mu\, \|H(x^0)\| \le (1 - q)\, \rho \le \rho,$

which ensures that $x^1 \in \Omega$ due to the form of $\Omega$ in (3.4) and the choice of $q \in (0, 1)$. Proceeding further by induction, suppose that $x^0, \ldots, x^{k+1} \in \Omega$ and get the relationships

$\|d^{k+1}\| \le \mu\, \|H(x^{k+1})\| = \mu\, \|H(x^k + d^k)\| \le q\, \|d^k\| \le \cdots \le q^{k+1}\, \|d^0\|,$

which imply the estimates

$\|x^{k+2} - x^0\| \le \sum_{j=0}^{k+1} \|d^j\| \le \|d^0\| \sum_{j=0}^{\infty} q^j = \dfrac{\|d^0\|}{1 - q} \le \dfrac{\mu\, \|H(x^0)\|}{1 - q} \le \rho$

and hence justify that $x^{k+2} \in \Omega$. Thus all the iterates generated by Algorithm 3.1 remain in $\Omega$. Furthermore, for any natural numbers $k$ and $l$, we have

$\|x^{k+l} - x^k\| \le \sum_{j=k}^{k+l-1} \|d^j\| \le q^k\, \dfrac{1 - q^l}{1 - q}\, \|d^0\|,$

which shows that the generated sequence $\{x^k\}$ is a Cauchy sequence. Hence it converges to some point $\tilde x$ that obviously belongs to the underlying closed set (3.4).

To show next that $\tilde x$ is a solution to the original equation (1.1), we pass to the limit as $k \to \infty$ in the iterative inclusion

(3.8) $-H(x^k) \in D H(x^k)(d^k), \quad k \in \mathbb{N}.$

Since $\|d^k\| \le q^k \|d^0\| \to 0$, it follows from assumption (b) that $H(x^k) \to 0$ as $k \to \infty$. The continuity of $H$ then implies that $H(\tilde x) = 0$, i.e., $\tilde x$ is a solution to (1.1).

It remains to justify the error estimate (3.7). To this end, first observe from the above estimates that

$\|x^{k+l} - x^k\| \le q^k\, \dfrac{1 - q^l}{1 - q}\, \|d^0\|$

for all $k, l \in \mathbb{N}$. Passing now to the limit as $l \to \infty$, we arrive at (3.7), which thus completes the proof of the theorem.

4 Discussion of Major Assumptions and Comparison with Semismooth Newton Methods

In this section we pursue a twofold goal: to discuss the major assumptions made in Section 3 and to compare our generalized Newton method based on graphical derivatives with the semismooth versions of the generalized Newton method developed in [30, 31]. As we will see from the discussions below, these two aims are largely interrelated.

Let us begin with sufficient conditions for metric regularity in terms of the constructions used in the semismooth versions of the generalized Newton method. Given a locally Lipschitz continuous vector-valued mapping $H\colon \mathbb{R}^n \to \mathbb{R}^m$, we have by the classical Rademacher theorem that the set of points

(4.1) $S_H := \{ x \in \mathbb{R}^n \mid H \text{ is differentiable at } x \}$

is of full Lebesgue measure in $\mathbb{R}^n$. Thus for any mapping $H$ locally Lipschitzian around $\bar x$ the set

(4.2) $\partial_B H(\bar x) := \big\{ V \in \mathbb{R}^{m \times n} \ \big|\ \exists\, x_k \to \bar x \text{ with } x_k \in S_H \text{ and } \nabla H(x_k) \to V \big\}$

is nonempty and obviously compact in $\mathbb{R}^{m \times n}$. It was introduced in [38] for $m = 1$ as the set of “almost-gradients” and then was called in [30] the B-subdifferential of $H$ at $\bar x$. Clarke’s generalized Jacobian [5] of $H$ at $\bar x$ is defined by the convex hull

(4.3) $\partial H(\bar x) := \mathrm{conv}\, \partial_B H(\bar x).$

We also make use of the Thibault derivative/limit set [39] (sometimes called the “strict graphical derivative” [35]) of $H$ at $\bar x$ defined by

(4.4) $T H(\bar x)(w) := \mathop{\rm Lim\,sup}\limits_{x \to \bar x,\ t \downarrow 0} \dfrac{H(x + t w) - H(x)}{t}, \quad w \in \mathbb{R}^n.$

Observe the known relationships [15, 39] between the above derivative sets:

(4.5) $\partial_B H(\bar x)\, w \subset T H(\bar x)(w) \subset \partial H(\bar x)\, w \quad \text{for all } w \in \mathbb{R}^n.$
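To compare these sets on a simple example (ours, added for illustration; not in the original text), take $H(x) = |x|$ on $\mathbb{R}$ at $\bar x = 0$. Then

$\partial_B H(0) = \{-1, 1\}, \qquad \partial H(0) = [-1, 1], \qquad T H(0)(w) = [-|w|, |w|],$

so that $\partial_B H(0)\, w = \{-w, w\} \subset T H(0)(w) = \partial H(0)\, w = [-|w|, |w|]$, in agreement with (4.5); here the first inclusion is strict for $w \ne 0$.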

The next result gives a sufficient condition for metric regularity of Lipschitzian mappings in terms of the Thibault derivative (4.4). It can be derived from the coderivative characterization of metric regularity (2.12), while we give here a direct independent proof.

Proposition 4.1

(sufficient condition for metric regularity in terms of Thibault’s derivative). Let $H\colon \mathbb{R}^n \to \mathbb{R}^n$ be locally Lipschitzian around $\bar x$, and let

(4.6) $0 \in T H(\bar x)(w) \Longrightarrow w = 0.$

Then the mapping $H$ is metrically regular around $\bar x$.

Proof. Kummer’s inverse function theorem [17, Theorem 1.1] ensures that condition (4.6) implies (actually is equivalent to) the fact that there are neighborhoods $U$ of $\bar x$ and $V$ of $H(\bar x)$ such that the mapping $H\colon U \to V$ is one-to-one with a locally Lipschitzian inverse $H^{-1}\colon V \to U$. Let $\ell > 0$ be a Lipschitz constant of $H^{-1}$ on $V$. Then for all $x \in U$ and $y \in V$ we have the relationships

$\mathrm{dist}\big(x; H^{-1}(y)\big) = \big\| H^{-1}(H(x)) - H^{-1}(y) \big\| \le \ell\, \|H(x) - y\| = \ell\, \mathrm{dist}\big(y; H(x)\big),$

which thus justify the metric regularity of $H$ around $\bar x$.

To proceed further with sufficient conditions for the validity of our assumption (H1), we first introduce the notion of directional boundedness.

Definition 4.2

(directional boundedness). A mapping $H\colon \mathbb{R}^n \to \mathbb{R}^m$ is said to be directionally bounded around $\bar x$ if

(4.7) $\limsup_{t \downarrow 0} \dfrac{\|H(x + t d) - H(x)\|}{t} < \infty$

for all $x$ near $\bar x$ and for all $d \in \mathbb{R}^n$.

It is easy to see that if $H$ is either directionally differentiable around $\bar x$ or locally Lipschitzian around this point, then it is directionally bounded around $\bar x$. The following example shows that the converse does not hold in general.

Example 4.3

(directionally bounded mappings may not be directionally differentiable). Define a real function $\varphi\colon \mathbb{R} \to \mathbb{R}$ by

$\varphi(x) := \begin{cases} x \sin(\ln |x|) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$

It is easy to see that this function is not directionally differentiable at $\bar x = 0$, since the quotients $\varphi(t d)/t = d \sin(\ln(t |d|))$ oscillate as $t \downarrow 0$ whenever $d \ne 0$. However, it is directionally bounded around $\bar x = 0$. Indeed, for any $x \ne 0$ near $\bar x$ condition (4.7) holds because $\varphi$ is simply differentiable at $x$. For $x = 0$ we have

$\dfrac{|\varphi(t d) - \varphi(0)|}{t} = |d| \cdot |\sin(\ln(t |d|))| \le |d| < \infty \quad \text{for all } t > 0.$

The next proposition and its corollary present verifiable sufficient conditions for the fulfillment of assumption (H1).

Proposition 4.4

(assumption (H1) from metric regularity). Let $H\colon \mathbb{R}^n \to \mathbb{R}^n$, and let $\bar x$ be a solution to (1.1). Suppose that $H$ is metrically regular around $\bar x$, i.e., $\ker D^* H(\bar x) = \{0\}$, and that it is directionally bounded and one-to-one around this point. Then assumption (H1) is satisfied.

Proof. Recall that the metric regularity of $H$ around $\bar x$ is equivalent to the condition $\ker D^* H(\bar x) = \{0\}$ by the coderivative criterion (2.12). Let $U$ be a neighborhood of $\bar x$ such that $H$ is metrically regular and one-to-one on $U$. Choose further a neighborhood $V$ of $H(\bar x)$ from the definition of metric regularity of $H$ around $\bar x$. Then pick $x \in U$, $x \ne \bar x$, and an arbitrary direction $d$ satisfying $-H(x) \in D H(x)(d)$. Employing now Proposition 3.2, we get

$d \in D(H^{-1})\big(H(x), x\big)\big(-H(x)\big).$

By the local single-valuedness of $H^{-1}$ and the metric regularity of $H$ around $\bar x$ there exists a number $\mu > 0$ such that

$\big\| x + t d - H^{-1}\big((1 - t) H(x)\big) \big\| \le \mu\, \mathrm{dist}\big((1 - t) H(x); H(x + t d)\big)$

for all $t > 0$ sufficiently small. It follows that

$\sup_{0 < t \le \bar t} \Big\| \dfrac{H(x + t d) - H(x)}{t} \Big\| < \infty \ \text{ for some } \bar t > 0$

by the directional boundedness of $H$ around $\bar x$. The boundedness of the family

$\Big\{ \dfrac{H(x + t d) - H(x)}{t} \ \Big|\ 0 < t \le \bar t \Big\}$

allows us to select a sequence $t_k \downarrow 0$ such that $[H(x + t_k d) - H(x)]/t_k \to z$ for some $z \in \mathbb{R}^n$. By passing to the limit above as $k \to \infty$ and employing Definition 2.1 we get that

$-H(x) = z \in \widetilde D H(x)(d),$

which completes the proof of the proposition.

Corollary 4.5

(sufficient conditions for (H1) via Thibault’s derivative). Let $\bar x$ be a solution to (1.1), where $H$ is locally Lipschitzian around $\bar x$ and such that condition (4.6) holds, which is automatic when $\det V \ne 0$ for all $V \in \partial H(\bar x)$. Then (H1) is satisfied with $H$ being both metrically regular and one-to-one around $\bar x$.

Proof. Indeed, both metric regularity and bijectivity of $H$ around $\bar x$