# Stability and instability in saddle point dynamics Part II: The subgradient method

## Abstract

In part I we considered the problem of convergence to a saddle point of a concave-convex function via gradient dynamics, and gave an exact characterization of their asymptotic behaviour. In part II we consider a general class of subgradient dynamics that restrict the trajectories to an arbitrary convex domain. We show that despite the nonlinear and non-smooth character of these dynamics their ω-limit set is comprised of solutions to only linear ODEs. In particular, we show that the latter are solutions to subgradient dynamics on affine subspaces, which is a smooth class of dynamics whose asymptotic properties have been exactly characterized in part I. Various convergence criteria are formulated using these results, and several examples and applications are discussed throughout the manuscript.

## Index Terms

Nonlinear systems, subgradient dynamics, saddle points, non-smooth systems, networks, large-scale systems.

## 1 Introduction

In [18] we studied the asymptotic behaviour of the gradient method when this is applied on a general concave-convex function in an unconstrained domain, and provided an exact characterization to its limiting solutions. Nevertheless, in many applications, such as primal/dual algorithms in optimization problems, it becomes necessary to constrain the system states in a prescribed convex set, e.g. positivity constraints on Lagrange multipliers or constraints on physical quantities like data flow, and prices/commodities in economics [20], [24], [37], [12]. The subgradient method is used in such cases, which is a version of the gradient method with a projection term in the vector field additionally included, so as to ensure that the trajectories do not leave the desired set.

In discrete time, there is an extensive literature on the subgradient method, via its application in optimization problems (see e.g. [33]). However, in many applications, for example power networks [41, 10, 22, 7, 8, 23, 38, 28, 32] and classes of data network problems [24], [37], [12], [30] continuous time models are considered. It is thus important to have a good understanding of the subgradient dynamics in a continuous time setting, which could also facilitate analysis and design by establishing links with other more abstract results in dynamical systems theory.

A main complication in the study of the subgradient method arises from the fact that this is a non-smooth system, i.e. a nonlinear ODE with a discontinuous vector field due to the projections involved. This prohibits the direct application of classical Lyapunov or LaSalle theorems (e.g. [25]), which is reflected in the direct approach used by Arrow, Hurwicz and Uzawa in [1] that avoids the use of such tools. More recently, the work of Feijer and Paganini [12] unified the previously ad-hoc and application-focused analysis of primal-dual gradient dynamics in network optimisation, and proposed that the switching in the dynamics be interpreted in the framework of hybrid automata, where a LaSalle invariance principle was recently obtained in [31]. However, as recently pointed out in [4], there are cases where the assumptions required in [31] do not hold. In [4], the LaSalle invariance principle for discontinuous Carathéodory systems is applied to prove convergence of the subgradient method under positivity constraints and the assumption of strict concavity. Further results on the asymptotic properties of the subgradient method under positivity constraints were derived in [5], where global convergence was also shown under a condition of local strict concavity-convexity. In [35] the subgradient method is used to solve linear programs with inequality constraints. In general, proving convergence for the subgradient method, even in simple cases, is a non-trivial problem that requires the non-smooth character of the system to be explicitly addressed.

Our aim in this paper is to provide a framework of results that allow one to study the asymptotic behaviour of the subgradient method in a general setting, where the trajectories are constrained to an arbitrary convex domain, and the concave-convex function considered is not necessarily strictly concave-convex. One of our main results is to show that despite the nonlinear and non-smooth character of the subgradient dynamics, their limiting solutions satisfy explicit linear differential equations.

In particular, we show that these linear ODEs are limiting solutions of subgradient dynamics on an affine subspace, which is a class of dynamics that fits within the framework studied in Part I [18]. These dynamics can therefore be exactly characterized, which allows one to prove convergence to a saddle point for broad classes of problems.

The results in this paper are illustrated by means of examples that demonstrate also the complications in the dynamic behaviour of the subgradient method relative to the unconstrained gradient method. We also apply our results to modification schemes in network optimization, that provide convergence guarantees while maintaining a decentralized structure in the dynamics.

The methodology used for the derivations in the paper is also of independent technical interest. In particular, the notion of a face of a convex set is used to characterize the ODEs associated with the limiting behaviour of the subgradient dynamics. Furthermore, some more abstract results on corresponding semi-flows have been used to address the complications associated with the non-smooth character of subgradient dynamics.

The paper is structured as follows. Section 2 provides preliminaries from convex analysis and dynamical systems theory that will be used within the paper. The problem formulation is given in section 3 and the main results are presented in section 4, where various examples that illustrate those are also discussed. Applications to modification methods in network optimization are given in section 5. The proofs of the results are given in sections 6 and 7 and an application to the problem of multipath routing is discussed in Appendix .2.

## 2 Preliminaries

We use the same notation and definitions as in part I of this work [18] and we refer the reader to the preliminaries section therein. The notions below from convex analysis and analysis of dynamical systems will additionally be used throughout the paper.

### 2.1 Convex analysis

We recall first for convenience the following notions defined in part I [18] that will be frequently used in this manuscript. For a closed convex set $K$ and a point $z$, we denote the normal cone to $K$ through $z$ as $N_K(z)$. When $K$ is an affine space $N_K(z)$ is independent of $z$ and is denoted $N_K$. If $K$ is in addition non-empty, then we denote the projection of $z$ onto $K$ as $P_K(z)$. Also for vectors $x, y$, $d(x,y)$ denotes the Euclidean metric and $|x|$ the Euclidean norm.

#### Concave-convex functions and saddle points

For a function that is concave-convex on the (standard) notion of a saddle point was given in part I [18]. We now consider restricted to a non-empty closed convex set , in which case the notion of saddle point needs to be modified to incorporate the constraints.

###### Definition (Restricted saddle point).

Let be non-empty closed and convex. For a concave-convex function , we say that is a -restricted saddle point of if for all and with we have the inequality .

If in addition then is a -restricted saddle point if and only if the vector of partial derivatives lies in the normal cone .

Any -restricted saddle point in the interior of is also a saddle point. If is closed and convex and is a -restricted saddle point, then is also a -restricted saddle point.

However, it in general does not hold that if has a saddle point, and is closed convex and non-empty, then has a -restricted saddle point (an explicit example illustrating this is given later in 4.2(ii)). In this manuscript we will only consider cases where at least one -restricted saddle point exists, leaving the problem of showing existence to the specific application.

#### Concave programming

Concave programming (see e.g. [3]) is concerned with the study of optimization problems of the form

$$\max_{x \in C,\; g(x) \ge 0} U(x) \tag{1}$$

where , are concave functions and is non-empty closed and convex. Under some mild assumptions, the solutions to such problems are saddle points of the Lagrangian

$$\varphi(x,y) = U(x) + y^T g(x) \tag{2}$$

where are the Lagrange multipliers. This is stated in the Theorem below.

###### Theorem.

Let be concave and Slater’s condition hold, i.e.

$$\exists\, x' \in \operatorname{relint} C \text{ with } g(x') > 0. \tag{3}$$

Then is an optimum of (1) if and only if with a -restricted saddle point of (2).

The min-max optimization problem associated with finding a -restricted saddle point of (2) is the dual problem of (1).
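To make the link between (1), (2) and the saddle point concrete, the following is a minimal numerical sketch rather than anything taken from the paper. It assumes a toy problem chosen only for illustration, $U(x) = -(x-2)^2$ and $g(x) = 1 - x$ on $C = \mathbb{R}$, for which Slater's condition (3) holds (e.g. $x' = 0$). The function name `primal_dual_flow`, the forward-Euler discretisation, and the step sizes are all assumptions. The multiplier is kept non-negative by clamping after each step.

```python
import numpy as np


def primal_dual_flow(x0, y0, dt=1e-3, steps=30_000):
    """Projected forward-Euler sketch of ascent in x / descent in y on the
    Lagrangian phi(x, y) = U(x) + y * g(x), with the toy (assumed) choices
    U(x) = -(x - 2)**2 and g(x) = 1 - x, and the multiplier kept >= 0."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        dU = -2.0 * (x - 2.0)            # U'(x)
        dg = -1.0                        # g'(x)
        g = 1.0 - x                      # constraint value g(x)
        x_new = x + dt * (dU + y * dg)   # x' = phi_x
        y_new = y + dt * (-g)            # y' = -phi_y
        x, y = x_new, max(y_new, 0.0)    # clamp the multiplier at zero
    return x, y


x_star, y_star = primal_dual_flow(0.0, 0.0)
print(np.round([x_star, y_star], 4))
```

For this toy problem the constrained optimum is $x = 1$, and stationarity $U'(1) + y\,g'(1) = 0$ gives the multiplier $y = 2$, which is the point the iteration approaches.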

#### Faces of convex sets

Some of the main results of this manuscript refer to faces of a convex set. We refer the reader to [16, Chap. 1.8.] for further discussion of such topics.

###### Definition (Face of a convex set).

Given a non-empty closed convex set , a face of is a subset of that has both the following properties:

1. is convex.

2. For any line segment , if then .

For the reader's convenience we recall some standard properties of faces:

1. The intersection of two faces of is a face of .

2. The empty set and $K$ itself are both faces of $K$. If a face is neither $\emptyset$ nor $K$ it is called a proper face.

3. If is a face of and is a face of , then is a face of .

4. For a face of , the normal cone is independent of the choice of . In these cases we drop the dependence and write it as .

5. may be written as the disjoint union:

$$K = \bigcup \{\operatorname{relint} F : F \text{ is a face of } K\}. \tag{4}$$

Property (a) above leads to the following definition.

###### Definition (Minimal face containing a set).

For a convex set and a subset we define the minimal face containing as

$$\bigcap \{F : F \text{ is a face of } K \text{ and } A \subseteq F\}$$

which is a face by property (a) above.

### 2.2 Dynamical systems

###### Definition (Flows and semi-flows).

A triple is a flow (resp. semi-flow) if is a metric space, is a continuous map from (resp. ) to which satisfies the two properties

1. For all , .

2. For all , (resp. ),

$$\phi(t+s, x) = \phi(t, \phi(s, x)). \tag{5}$$

When there is no confusion over which (semi)-flow is meant, we shall denote as . For sets (resp. ) and we define .

###### Definition (ω-limit set).

Given a semi-flow we denote the set of ω-limit points of trajectories as

$$\Omega(\phi, X, \rho) = \bigcup_{x \in X} \bigcap_{t \ge 0} \overline{\phi([t,\infty), x)}. \tag{6}$$

where denotes the closure of in .

###### Definition (Invariant sets).

For a semi-flow we say that a set is positively invariant if . If is also a flow we say that is negatively invariant if . If for all then we say is invariant.

###### Definition (Sub-(semi)-flow).

For a flow (resp. semi-flow) and an invariant (resp. positively invariant) set we obtain the sub-flow (resp. sub-semi-flow) by restricting to act on and denote it as .

###### Definition (Global convergence).

We say that a (semi)-flow is globally convergent, if for all initial conditions , the trajectory converges to the set of equilibrium points of as , i.e.

$$\inf\{d(\phi(t,x), y) : y \text{ an equilibrium point}\} \to 0 \text{ as } t \to \infty.$$

In part I of this work much of the analysis relied on a specific form of stability, linked to incremental stability, which we reproduce below for the convenience of the reader.

###### Definition (Pathwise stability).

We say that a semi-flow is pathwise stable if for any two trajectories the distance is non-increasing in time.

As will be discussed in the paper, the ω-limit set of pathwise stable semi-flows is comprised of semi-flows of the class defined below.

###### Definition ((Semi)-Flow of isometries).

We say that a (semi)-flow is a (semi)-flow of isometries if for every (resp. ), the function is an isometry, i.e. for all it holds that .

Finally, we will need the notion of Carathéodory solutions of differential equations.

###### Definition (Carathéodory solution).

We say that a trajectory is a Carathéodory solution to a differential equation , if is an absolutely continuous function of , and for almost all times , the derivative exists and is equal to .

## 3 Problem formulation

The main object of study in this work is the subgradient method on an arbitrary concave-convex function in and an arbitrary convex domain . We first recall the definition of the gradient method, which is studied in part I of this work [18].

###### Definition (Gradient method).

Given a concave-convex function on , we define the gradient method as the flow on generated by the differential equation

$$\begin{aligned} \dot{x} &= \varphi_x \\ \dot{y} &= -\varphi_y. \end{aligned} \tag{7}$$

The subgradient method is obtained by restricting the gradient method to a convex set by the addition of a projection term to the differential equation (7).

###### Definition (Subgradient method).

Given a non-empty closed convex set and a function that is concave-convex on , we define the subgradient method on as a semi-flow on consisting of Carathéodory solutions of

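$$\begin{aligned} \dot{z} &= f(z) - P_{N_K(z)}(f(z)) \\ f(z) &= \begin{bmatrix} \varphi_x & -\varphi_y \end{bmatrix}^T. \end{aligned} \tag{8}$$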

by a transformation of coordinates.

###### Remark.

For (non-affine) convex sets the subgradient method (8) is a non-smooth system. The vector field is discontinuous due to the convex projection term, independently of the regularity of the function or of the boundary of . This is in contrast to the gradient method (7), which is a smooth system, as it inherits the regularity of the function .
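As a concrete illustration of the projection term in (8), the sketch below evaluates the right-hand side $f(z) - P_{N_K(z)}(f(z))$ in the special case where $K$ is the non-negative orthant. This choice of $K$ and the helper name `projected_field_orthant` are assumptions made for the example. For the orthant, the normal cone at $z$ consists of vectors that are non-positive in the coordinates where $z_i = 0$ and zero elsewhere, so subtracting the projection onto it simply removes any inward-violating (negative) component of $f$ on the active coordinates.

```python
import numpy as np


def projected_field_orthant(z, f_z):
    """Right-hand side f(z) - P_{N_K(z)}(f(z)) of (8), for the assumed
    special case K = non-negative orthant.

    On coordinates with z_i = 0 the negative part of f_i is removed;
    on coordinates with z_i > 0 the field is left unchanged."""
    z = np.asarray(z, dtype=float)
    f_z = np.asarray(f_z, dtype=float)
    active = z <= 0.0                       # coordinates on the boundary
    return np.where(active, np.maximum(f_z, 0.0), f_z)


# The same vector f gives different results on and just off the boundary,
# which is the discontinuity described in the remark.
print(projected_field_orthant([0.0, 1.0], [-1.0, -2.0]))    # on the boundary
print(projected_field_orthant([1e-9, 1.0], [-1.0, -2.0]))   # just inside
```

The two calls differ only by a tiny perturbation of the first coordinate of $z$, yet the first component of the resulting vector field jumps, regardless of how smooth $f$ is.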

The equilibrium points of the subgradient method on are exactly the -restricted saddle points.

We briefly summarise the contributions of this work in the bullet points below.

• We show that the subgradient dynamics, despite being nonlinear and non-smooth, have an ω-limit set that is comprised of solutions to only linear ODEs.

• These solutions are shown to belong to the limit set of the subgradient method on affine subspaces. This links with part I [18] of this two part work, where the limiting solutions of such systems have been exactly characterized. Based on this characterization of the limiting solutions, a convergence result for subgradient dynamics is also presented.

• Various applications of the results above are considered. In particular, we give a proof of the convergence of the subgradient method applied to any strictly concave-convex function for an arbitrary convex domain. Furthermore, we apply our results to modification methods in network optimization that provide convergence guarantees while maintaining a decentralized structure in the dynamics. An application to the problem of multi-path routing is also discussed.

## 4 Main Results

This section states the main results of the paper. The results are divided into three subsections. To facilitate the readability of section 4 we outline below the main Theorems that will be presented and the way these are related.

In 4.1 we consider pathwise stable semi-flows, an abstraction we use for the subgradient dynamics in order to develop tools for their analysis that are valid despite their non-smooth character. In particular, 4.1 gives an invariance principle for such semi-flows, which applies without any smoothness assumption on the dynamics. We then additionally incorporate projections that constrain the trajectories within a closed convex set. Our key result, 4.1, says that for these semi-flows the dynamics on the ω-limit set are smooth.

In 4.2 we apply these tools to the subgradient method (8). In 4.2 we show that the limiting solutions of the (non-smooth) subgradient method on a convex set are given by the dynamics of the (smooth) subgradient method on an affine subspace. This allows us to obtain 4.2, a criterion for global asymptotic stability of the subgradient method.

In 4.3 we combine 4.2 with the results of Part I of this work [18] (for convenience of the reader reproduced in Appendix .1) to obtain a general convergence criterion (4.3) for the subgradient method.

These results are illustrated with examples throughout. The proofs of the results are given in section 6.

### 4.1 Pathwise stability and convex projections

If one wishes to extend the results of Part I of this work [18] to the subgradient method on a non-empty closed convex set , then one runs into two problems, both coming from the discontinuity of the vector field in (8). The first is that the previously simple application of LaSalle’s theorem would become much more technical - needing tools from non-smooth analysis. The second, more fundamental, problem is that LaSalle’s theorem only gives convergence to a set of trajectories, and it remains to characterise this set. The trajectories in this set still satisfy an ODE with a discontinuous vector field, and we do not have uniqueness of the solution backwards in time - we still, though, have a semi-flow.

To solve these issues we reinterpret the prior results in terms of a simple property which is still present in the subgradient method.

The main tool used to prove the results in [18] was pathwise stability, (2.2), which says that the Euclidean distance between any two solutions is non-increasing with time (we will later prove such a result for the subgradient method). Intuitively, one would expect that the distance between any two of the limiting solutions would be constant. A more abstract way of saying this is that the sub-flow obtained by considering the gradient method with initial conditions in the -limit set is a flow of isometries. In fact, this can be proved for any pathwise stable semi-flow, as stated in Proposition 4.1 below.

###### Proposition.

Let be a pathwise stable semi-flow (see 2.2) with which has an equilibrium point . Let be its ω-limit set. Then the sub-semi-flow (see 2.2) defines a flow of isometries (see 2.2). Moreover, is a convex set.

Note here that is a flow rather than a semi-flow. This comes from the simple observation that an isometry is always invertible, so we can define, for , as .

###### Remark.

Care should be taken in interpreting the backwards flow given by 4.1. There could be multiple trajectories in that meet at a point in at time , but exactly one of these trajectories will lie in for all times .

We would like to note that we are not the first to make this observation. Indeed, we deduce this result from a more general result in [6] which was published in 1970.

We consider pathwise stable differential equations which are projected onto a convex set, and make the following set of assumptions.

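$$\begin{aligned} &(\phi, K, d) \text{ is the semi-flow of Carathéodory solutions of} \\ &\qquad \dot{z} = f(z) - P_{N_K(z)}(f(z)) \\ &\text{where } K \subseteq \mathbb{R}^n \text{ is non-empty, closed and convex,} \\ &C^1 \ni f : K \to \mathbb{R}^n \text{ satisfies, for all } z, w \in K, \\ &\qquad (f(z) - f(w))^T (z - w) \le 0. \end{aligned} \tag{9}$$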

It should be noted that the final inequality in (9) holds for the subgradient method (8), which is evident from the proof of the pathwise stability of the gradient method presented in [18, Appendix B].
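The monotonicity inequality in (9) can be spot-checked numerically. The sketch below is an illustration under assumptions, not part of the paper's argument: it takes the concave-convex function $\varphi(x_1,x_2,y) = -\tfrac{1}{2}|x_1|^2 + (x_1+x_2)y$, for which $f(z) = [\varphi_x\ \ {-\varphi_y}]^T$ is linear, $f(z) = Mz$, and samples random pairs $z, w$. The names `M`, `f` and `max_monotonicity_gap` are hypothetical.

```python
import numpy as np

# f(z) = [phi_x, -phi_y]^T for phi = -0.5*x1**2 + (x1 + x2)*y, with z = (x1, x2, y).
M = np.array([[-1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [-1.0, -1.0, 0.0]])


def f(z):
    """Vector field of the gradient method for the assumed example function."""
    return M @ z


def max_monotonicity_gap(num_samples=1000, seed=0):
    """Largest sampled value of (f(z) - f(w))^T (z - w); (9) requires it to be <= 0."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(num_samples):
        z = rng.normal(size=3)
        w = rng.normal(size=3)
        worst = max(worst, float((f(z) - f(w)) @ (z - w)))
    return worst


print(max_monotonicity_gap())
```

For this particular $f$ the quantity equals $-(z_1 - w_1)^2$, since the skew-symmetric part of $M$ contributes nothing, so every sampled value is non-positive up to rounding.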

A simple first result is that the projected dynamics are still pathwise stable.

###### Lemma.

Let (9) hold. Then is pathwise stable.

Our main result on such projected differential equations is that, even though the projection term gives a discontinuous vector field, when we restrict our attention to the ω-limit set, the vector field is . This allows us to replace non-smooth analysis with smooth analysis when studying the asymptotic behaviour of such systems.

###### Theorem.

Let (9) hold and assume that the semi-flow has an equilibrium point. Let be its ω-limit set. Then defines a flow of isometries given by solutions to the following differential equation, which has a vector field,

$$\dot{z} = f(z) - P_{N_V}(f(z)). \tag{10}$$

Here is the affine span of the (unique) minimal face (see 2.1.3) of that contains the set of equilibrium points of the semi-flow.

###### Remark.

The existence of a minimal face of that contains the set of equilibrium points is a simple consequence of the definition of a face (see 2.1.3 and the discussion that follows). The important part of 4.1 is that the dynamics on are given by (10), i.e. the projection operator in (8) becomes which does not depend on the position .

### 4.2 The subgradient method

We now apply these results to the subgradient method. Our first result reduces the study of the convergence on general convex domains, where the subgradient method is non-smooth, to the study of convergence of the subgradient method on affine spaces, which is a smooth dynamical system studied in [18]. We also show that when an internal saddle point exists then the limiting behaviour of the subgradient method is determined by that of the corresponding unconstrained gradient method.

As in part I of this work [18], given a concave-convex function we define the following

• is the set of saddle points of

• is the set of solutions to the gradient method (7) (i.e. no projections included) that lie a constant distance from any saddle point.

###### Theorem.

Let be non-empty, closed and convex. Let be , concave-convex on and have a -restricted saddle point. Let denote the subgradient method (8) on and be its ω-limit set. Then is convex, and defines a flow of isometries. Furthermore, the following hold:

1. The trajectories of solve the ODE:

$$\dot{z} = f(z) - P_{N_V}(f(z)), \tag{11}$$

where is the affine span of , with being the minimal face containing all -restricted saddle points.

2. If there exists a saddle point of in the interior of , then

$$\Omega = \{z(t) \in S : z(\mathbb{R}) \subseteq K\}. \tag{12}$$

where is as defined before the theorem statement.

###### Remark.

The ODE (11) is the subgradient method on the affine subspace . A main significance of 4.2 is the fact that the solutions of (11) in can be characterized using the results in part I [18]. In particular, it follows from Theorem .1 in Appendix .1 that these satisfy explicit linear ODEs. This therefore shows that even though the subgradient dynamics are nonlinear and non-smooth, their ω-limit set is comprised of solutions to only linear ODEs (stated in 4.3).

###### Remark.

Later, in 4.3 we use the results in [18] on the subgradient method on affine subspaces together with 4.2 to obtain a convergence criterion for the subgradient method. This is used subsequently to give proofs for the applications considered in 5.

###### Remark.

It will be discussed in the proof of 4.2 that 4.2(ii) is a special case of 4.2(i) where the projection term in (11) is equal to zero. In 4.2(ii) there is a simple characterization of the limiting solutions of the subgradient method, as just the limiting solutions of the corresponding gradient method that lie in . Note that the set in (12) was exactly characterized in [18].

###### Remark.

A simple consequence of (12) is the fact that if there exists a saddle point in the interior of then the subgradient method is globally convergent if the corresponding unconstrained gradient method is globally convergent.

We now present several examples to illustrate the application of 4.2 in some simple cases.

The first example corresponds to a case where the unconstrained gradient method (7) is globally convergent, but the subgradient method is not.

###### Example.

Define the concave-convex function

$$\varphi(x_1, x_2, y) = -\tfrac{1}{2}|x_1|^2 + (x_1 + x_2)\,y \tag{13}$$

where . This has a single saddle point at , and is the Lagrangian of the optimisation problem

$$\max_{x_1 + x_2 = 0} -\tfrac{1}{2}|x_1|^2 \tag{14}$$

where variable in function is the Lagrange multiplier associated with the constraint. On this function the gradient method is the linear system

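$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{y} \end{bmatrix} = \begin{bmatrix} -1 & 0 & 1 \\ 0 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ y \end{bmatrix}. \tag{15}$$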

It is easily verified that all the eigenvalues of this matrix lie in the left half plane, so that the gradient method is globally convergent. Now consider the family of convex sets defined by

$$K_a = \{(x_1, x_2, y) \in \mathbb{R}^3 : x_1 \ge a\} \tag{16}$$

for . The subgradient method on is given by the system

$$\begin{aligned} \dot{x}_1 &= [-x_1 + y]^{+}_{x_1 - a} \\ \dot{x}_2 &= y \\ \dot{y} &= -x_1 - x_2. \end{aligned} \tag{17}$$

The convergence of the subgradient method on depends crucially on the value of . There are three cases:

1. : In this case the saddle point lies in the interior of so that 4.2(ii) applies, and as the unconstrained gradient method is globally convergent, so is the subgradient method on .

2. : Here the unconstrained saddle point lies outside . A simple computation shows that the point is the only -restricted saddle point. 4.2(i) can be used here. The only proper face of is the set

$$F_a = \{(a, x_2, y) : x_2, y \in \mathbb{R}\}. \tag{18}$$

The subgradient method on is the system

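$$\begin{bmatrix} \dot{x}_2 \\ \dot{y} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} x_2 \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ -a \end{bmatrix} \tag{19}$$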

together with the equality . This matrix has imaginary eigenvalues , showing that the subgradient method on is not globally convergent. It is easy to verify that some of these oscillatory solutions are also solutions of the subgradient method on . Therefore the subgradient method on is not globally convergent when .

3. : In this case the saddle point lies on the boundary of . 4.2(i) applies, and the analysis of the subgradient method on is the same as in case (ii) above. However, when we check whether any oscillatory solutions of the subgradient method on are also solutions of the subgradient method on , we find that there are no such solutions. Indeed, for a trajectory to be a solution to both the subgradient method on and the subgradient method on we must have both and by (17). Then (17) implies that and then that . So the only such solution is the saddle point. Therefore the subgradient method on is globally convergent.

This shows that the subgradient method on undergoes a bifurcation at .
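The dependence on $a$ in this example can be explored numerically. The following is a rough sketch under stated assumptions: it discretises (17) with a projected forward-Euler step (take an Euler step of the unconstrained field (15), then clamp $x_1$ to $[a, \infty)$), which is only an approximation of the Carathéodory solutions considered in the paper. The function name `simulate_example_1`, the step size and the initial conditions are all hypothetical choices. The equilibrium $(a, -a, 0)$ used for $a > 0$ is obtained by setting the right-hand side of (17) to zero.

```python
import numpy as np


def simulate_example_1(a, z0, dt=1e-3, steps=50_000):
    """Projected forward-Euler sketch of the subgradient method (17) on K_a."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        x1, x2, y = z
        z = z + dt * np.array([-x1 + y, y, -x1 - x2])  # unconstrained field (15)
        z[0] = max(z[0], a)                            # project back onto K_a
    return z


# a < 0: the saddle point 0 is interior; the trajectory decays towards it.
z_interior = simulate_example_1(-1.0, [0.5, 0.5, 0.5])
print("a = -1, distance to saddle:", np.linalg.norm(z_interior))

# a > 0: start on the face x1 = a; the trajectory keeps circling the
# restricted saddle point (a, -a, 0) at a roughly constant distance.
z_face = simulate_example_1(1.0, [1.0, -0.5, 0.0])
print("a = +1, distance to restricted saddle:",
      np.linalg.norm(z_face - np.array([1.0, -1.0, 0.0])))
```

In the second run the initial distance to the restricted saddle point is $0.5$ and it stays close to that value, which is consistent with the imaginary eigenvalues of (19). Forward Euler slightly inflates the radius of such oscillations, so the step size should be kept small.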

The following example illustrates that the subgradient method can be globally convergent when the gradient method is not.

###### Example.

Define the concave-convex function

$$\varphi(x_1, x_2, y) = -\tfrac{1}{2}|x_2|^2 + x_1 y. \tag{20}$$

This has a single saddle point at and corresponds to the optimisation problem

$$\max_{x_1 = 0} -\tfrac{1}{2}|x_2|^2 \tag{21}$$

where the constraint is relaxed via the Lagrange multiplier . The gradient method applied to is the linear system

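$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{y} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ y \end{bmatrix} \tag{22}$$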

whose matrix has eigenvalues so the gradient method is not globally convergent. We again consider the subgradient method on the closed convex set defined by (16) for splitting into three cases:

1. : As in 4.2(i) the saddle point lies in the interior of . As the unconstrained gradient method is not globally convergent, 4.2(ii) implies that the subgradient method on is also not globally convergent.

2. : The subgradient method on is given by

$$\begin{aligned} \dot{x}_1 &= [y]^{+}_{x_1 - a} \\ \dot{x}_2 &= -x_2 \\ \dot{y} &= -x_1 \end{aligned} \tag{23}$$

The saddle point lies outside . For to be a -restricted saddle point, (23) implies that , but this is impossible in , so there are no -restricted saddle points. This can also be understood in terms of the optimisation problem (21) which has empty feasible set if we impose the further condition that . This means that none of our results apply, but a direct analysis of (23) shows that so that as , and the system is not globally convergent.

3. : Solving (23) for the -restricted saddle points yields the continuum . None of these lie in the interior of , so 4.2(ii) does not apply and 4.2(i) is used to analyze the asymptotic behaviour. The only proper face of is defined by (18). On , the subgradient method is the system

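$$\begin{bmatrix} \dot{x}_2 \\ \dot{y} \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_2 \\ y \end{bmatrix} \tag{24}$$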

together with the equality , which is clearly globally convergent, noting that the set of -restricted saddle points is . Therefore the subgradient method on is also globally convergent.

So in this case the subgradient method on starts non-convergent for , becomes globally convergent for and finally loses all its equilibrium points when .
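The contrast between the interior case and the boundary case in this example can also be seen in a short simulation. As before, this is only a sketch under assumptions: (23) is discretised by a projected forward-Euler step, and the function name `simulate_example_2`, the step size and the initial condition are hypothetical choices.

```python
import numpy as np


def simulate_example_2(a, z0, dt=1e-3, steps=20_000):
    """Projected forward-Euler sketch of the subgradient method (23) on K_a."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        x1, x2, y = z
        z = z + dt * np.array([y, -x2, -x1])  # unconstrained field (22)
        z[0] = max(z[0], a)                   # project back onto K_a
    return z


# Boundary case a = 0: the rotation in (x1, y) is stopped when x1 hits 0 with y < 0.
z_boundary = simulate_example_2(0.0, [1.0, 1.0, 0.0])
print("a = 0 :", z_boundary)

# A constraint that is never reached (a = -10): the (x1, y) rotation persists.
z_free = simulate_example_2(-10.0, [1.0, 1.0, 0.0])
print("a = -10, radius in (x1, y):", np.hypot(z_free[0], z_free[2]))
```

With $a = 0$ the trajectory started at $(1, 1, 0)$ follows the rotation of (22) until $x_1$ reaches zero, at which point $y \approx -1$, and then it remains at an equilibrium of the form $(0, 0, y)$ with $y \le 0$ while $x_2$ decays. With the constraint far away the radius in the $(x_1, y)$ plane stays near $1$.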

Although the minimal face in 4.2(i) is given as the intersection of all faces that contain -restricted saddle points, it can be useful to obtain convergence criteria that do not depend upon knowledge of all -restricted saddle points. We note that if the subgradient method is globally convergent on any affine span of a face of , then global convergence is implied.

###### Corollary.

Let be non-empty, closed and convex. Let be and concave-convex on . Let have a -restricted saddle point. Assume that, for any face of that contains a -restricted saddle point, the subgradient method on is globally convergent. Then the subgradient method on is globally convergent.

###### Example.

To illustrate this result, let us consider the case of positivity constraints, where are restricted to . Here the faces of are given by sets of the form

$$\{(x,y) \in \mathbb{R}^n_+ \times \mathbb{R}^m_+ : x_i = 0,\ y_j = 0 \text{ for } i \notin I,\ j \notin J\}$$

where and are sets of indices. The affine span of such a face is then given by

$$\{(x,y) \in \mathbb{R}^{n+m} : x_i = 0,\ y_j = 0 \text{ for } i \notin I,\ j \notin J\}. \tag{25}$$

Thus, by 4.2, checking convergence of the subgradient method in this case may be done by checking convergence of the gradient method with any arbitrary set of coordinates fixed as zero.
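For the non-negative orthant, the minimal face containing a given set of points can be read off from the supports of those points: a coordinate is free in the face exactly when some point has a strictly positive entry there. The sketch below illustrates this for a finite set of points. The function names `minimal_face_support` and `span_projector` are hypothetical, and stacking the $x$ and $y$ coordinates into one vector is an assumption made for brevity.

```python
import numpy as np


def minimal_face_support(points):
    """Indices of the coordinates left free by the minimal face of the
    non-negative orthant that contains all the given points."""
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    if np.any(pts < 0.0):
        raise ValueError("points must lie in the non-negative orthant")
    # A coordinate is free if at least one point is strictly positive there.
    return np.flatnonzero(np.any(pts > 0.0, axis=0))


def span_projector(support, dim):
    """Orthogonal projector onto the affine span (25) of the face,
    which for the orthant is a coordinate subspace."""
    proj = np.zeros((dim, dim))
    proj[support, support] = 1.0
    return proj


support = minimal_face_support([[0.0, 1.0, 0.0], [0.0, 2.0, 3.0]])
print(support)
print(span_projector(support, 3))
```

Here the first coordinate is zero for every point, so it is fixed at zero in the minimal face, while the other two remain free. The gradient method restricted to the resulting coordinate subspace is then the system whose convergence needs to be checked.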

In some cases the faces of the constraint set have an interpretation in terms of the specific problem.

###### Example.

Consider the optimisation problem

$$\max_{g_j(x) \ge 0,\; j \in \{1,\dots,m\}} U(x) \tag{26}$$

where are concave functions in . This is associated with the Lagrangian

$$\varphi(x,y) = U(x) + \sum_{j \in \{1,\dots,m\}} y_j g_j(x) \tag{27}$$

where is a vector of Lagrange multipliers. To ensure that the Lagrange multipliers are non-negative we define the constraint set . As in 4.2 the affine spans of the faces of are given by (25) for and any subset of . The subgradient method applied on such a face corresponds to the gradient method on the modified Lagrangian

$$\varphi'(x,y) = U(x) + \sum_{j \in J} y_j g_j(x) \tag{28}$$

which is associated with the modified optimisation problem

$$\max_{g_j(x) = 0,\; j \in J} U(x) \tag{29}$$

where, compared to (26), the inequality constraints are replaced by equality constraints, and some subset of the constraints are removed.

If is concave-convex on then 4.2 applies. We obtain that the subgradient method on applied to is globally convergent, if, for any , the gradient method applied to the Lagrangian corresponding to the modified optimisation problem (29) is globally convergent.

### 4.3 A general convergence criterion

By combining 4.2 with the results on the limiting solutions of the (smooth) subgradient method on affine subspaces given in [18] we obtain the following convergence criterion for the subgradient method on arbitrary convex sets and arbitrary concave-convex functions. This states that the subgradient method is globally convergent, if it has no trajectory satisfying an explicit linear ODE.

To state the theorem we recall from [18] the definition of the following matrices of partial derivatives of a concave-convex function

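$$\begin{aligned} A(z) &= \begin{bmatrix} 0 & \varphi_{xy}(z) \\ -\varphi_{yx}(z) & 0 \end{bmatrix} \\ B(z) &= \begin{bmatrix} \varphi_{xx}(z) & 0 \\ 0 & -\varphi_{yy}(z) \end{bmatrix}. \end{aligned} \tag{30}$$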

The theorem is stated under the assumption that is a -restricted saddle point. The general case is obtained by a translation of coordinates.

###### Theorem.

Let be non-empty, closed and convex in with . Let be concave-convex on and have as a -restricted saddle point. Let be the minimal face of that contains all -restricted saddle points and let be the affine span of . Let be the orthogonal projection matrix onto the orthogonal complement of . Let also and be the matrices defined in (30).

Then if the subgradient method (8) on applied to has no non-constant trajectory that satisfies both the following

1. the linear ODE

$$\dot{z}(t) = \Pi A(0) \Pi z(t) \tag{31}$$
2. for all and ,

$$z(t) \in \ker\big(\Pi B(r z(t)) \Pi\big) \cap \ker\big(\Pi (A(r z(t)) - A(0)) \Pi\big), \tag{32}$$

then the subgradient method is globally convergent.

###### Remark \theremark.

Although the condition (32) appears difficult to verify, in order to prove global convergence it is only necessary to show that it is not satisfied by any non-trivial trajectory. This turns out to be easy in many cases, for example in the proofs of the convergence of the modification methods discussed in section 5 (5.2.4).

###### Remark \theremark.

It should be noted that (31) and (32) are satisfied by all trajectories in the -limit set of the subgradient method. This follows from Theorem 4.2 and Theorem .1 and is stated in the corollary below.

###### Corollary \thecorollary.

Consider the subgradient method (8) and let be a -restricted saddle point. Then any trajectory in the -limit set satisfies (31) and (32), i.e. it is a solution of a linear ODE.
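As a hedged illustration of how conditions (31) and (32) can be checked (this example is ours and is not taken from the paper), consider the unconstrained scalar case, in which we assume that the projection matrix reduces to the identity. Both functions below have the origin as a saddle point.

$$
\varphi_1(x,y) = xy: \qquad A(z) = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \qquad B(z) = 0 .
$$

Here the gradient dynamics are exactly $\dot z = A(0) z$, a rotation, and (32) holds trivially because $B \equiv 0$ and $A(rz) - A(0) = 0$. Non-constant trajectories satisfying (31) and (32) therefore exist, so the criterion gives no conclusion, consistent with the oscillations of the gradient method for bilinear functions.

$$
\varphi_2(x,y) = -\tfrac{1}{2}x^2 + xy: \qquad A(z) = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \qquad B(z) = \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} .
$$

Now (32) requires $z(t) \in \ker B = \{x = 0\}$, so $x(t) \equiv 0$. The first component of (31) then gives $0 = \dot x = y$, so the only trajectory satisfying both conditions is the constant one $z \equiv 0$, and the criterion yields global convergence.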

## 5 Applications

In this section we apply the results of Section 4 to obtain global convergence in a number of cases. First we consider the subgradient method applied to a strictly concave-convex function on an arbitrary convex domain. Then we look at some examples of modification methods, relevant in network optimization, where the concave-convex function is modified to provide guarantees of convergence. The application of one such modification method to the problem of multi-path routing is also discussed in Appendix .2.

The proofs for this section are provided in 7.

### 5.1 Convergence for strictly concave-convex functions on arbitrary convex domains

The convergence of the subgradient method when applied to functions that are strictly concave-convex (i.e. at least one of the concavity or convexity is strict) was proved by Arrow, Hurwicz and Uzawa [1] under positivity constraints. More recently, [12] and [4] revisited this result, giving more modern proofs in the case where the concave-convex function has the form (2) with and strictly concave, with further extensions provided in [5] for concave-convex functions with positivity constraints in one of the variables. The case of the restriction of a general concave-convex function to an arbitrary convex set does not appear to have been treated in the literature (the theory for discrete-time subgradient methods is more complete, see e.g. [33]). Using the results established in the previous section we prove here that for a non-empty closed convex set the subgradient method on applied to a strictly concave-convex function is globally convergent.

###### Theorem \thetheorem.

Let be non-empty, closed and convex. Let be and strictly concave-convex on , and have a -restricted saddle point. Then the subgradient method (8) on is globally convergent.

###### Remark \theremark.

It follows from the proof of Theorem 5.1 that it is sufficient for the concave-convex function to be strictly concave-convex only in an open ball about the saddle point3 rather than the whole of the domain for global convergence to be guaranteed.
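To make the statement of Theorem 5.1 concrete, here is a small numerical sketch on a convex domain that is not an orthant. Everything in it is an assumption made for illustration: the domain is the closed unit disc, the function is phi(x, y) = -(x-2)^2/2 + xy + y^2/2 (strictly concave in x, strictly convex in y), and the continuous-time projected dynamics are approximated by a projected Euler iteration.

```python
import math


def field(x, y):
    """(d(phi)/dx, -d(phi)/dy) for the hypothetical phi = -(x-2)^2/2 + x*y + y^2/2."""
    return (-(x - 2.0) + y, -(x + y))


def project_disc(x, y):
    """Euclidean projection onto the closed unit disc."""
    r = math.hypot(x, y)
    return (x, y) if r <= 1.0 else (x / r, y / r)


def projected_euler(x, y, dt=0.01, steps=5000):
    """Projected Euler approximation of the subgradient method on the disc."""
    for _ in range(steps):
        fx, fy = field(x, y)
        x, y = project_disc(x + dt * fx, y + dt * fy)
    return x, y


if __name__ == "__main__":
    print(projected_euler(0.0, 0.0))
    print(projected_euler(-0.5, 0.5))
```

The unconstrained saddle point (1, -1) of this function lies outside the disc, so the limit is a restricted saddle point on the boundary, and trajectories from different initial conditions reach the same point.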

### 5.2 Modification methods for convergence

We will consider methods for modifying so that the (sub)gradient method converges to a saddle point. The methods that will be discussed are relevant in network optimisation (see e.g. [1], [12]), as they preserve the localised structure of the dynamics. It should be noted that these modifications do not necessarily render the function strictly concave-convex and hence convergence proofs are more involved. We show below that the results in 4 provide a systematic and unified way of proving convergence by making use of Theorem 4.3, while also allowing these methods to be considered in the more general setting of an arbitrary convex domain.

#### Auxiliary variables method

Given a concave-convex function defined on a convex domain , we define the modified concave-convex function as

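$$
\begin{aligned}
&\varphi'(x',x,y) = \varphi(x,y) + \psi(Mx - x') \\
&\psi : \mathbb{R}^{n'} \to \mathbb{R},\ \psi \in C^2, \text{ is strictly concave with } \psi(0) = 0,\ \psi(u) \le 0,
\end{aligned} \tag{33}
$$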

where is a vector of auxiliary variables, and is a constant matrix that satisfies for a -restricted saddle point of .

We define the augmented convex domain as . Note that the additional auxiliary variables are not restricted and are allowed to take values in the whole of . Also note that the identity matrix always satisfies the assumptions upon above.

###### Remark \theremark.

An important feature of this modification (and also the ones that will be considered below) is the fact that there is a correspondence between -restricted saddle points of and -restricted saddle points of , with the values of at the saddle points remaining unchanged. In particular, if is a -restricted saddle point of , then is a -restricted saddle point of . In the reverse direction, if is a -restricted saddle point of then and is a -restricted saddle point of .

###### Remark \theremark.

The significance of this method will become clearer in the multipath routing problem discussed in Appendix .2. In particular, this method allows convergence to be guaranteed in network optimization problems without introducing additional information transfer among nodes. Special cases of this method have also been used in [9], [19] in applications in economic and power networks.
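The effect of the auxiliary variable can be seen on a toy problem. The sketch below is illustrative only: it assumes the unconstrained bilinear function phi(x, y) = xy, the identity as the constant matrix, and the hypothetical choice psi(u) = -u^2/2, which is strictly concave with psi(0) = 0 and psi(u) ≤ 0. The dynamics are integrated with a plain Euler step.

```python
import math


def simulate_auxiliary(use_auxiliary, dt=0.01, steps=20000):
    """Euler sketch of the gradient method for phi(x, y) = x*y,
    optionally modified to phi'(x', x, y) = x*y + psi(x - x')
    with the hypothetical choice psi(u) = -u^2 / 2."""
    xa, x, y = 0.0, 1.0, 0.0  # auxiliary variable x', primal x, dual y
    for _ in range(steps):
        dpsi = -(x - xa) if use_auxiliary else 0.0  # psi'(x - x')
        dxa = -dpsi      # d(phi')/dx' (ascent in the auxiliary variable)
        dx = y + dpsi    # d(phi')/dx  (ascent in x)
        dy = -x          # -d(phi')/dy (descent in y)
        xa, x, y = xa + dt * dxa, x + dt * dx, y + dt * dy
    return xa, x, y


if __name__ == "__main__":
    print(simulate_auxiliary(False))  # keeps oscillating around the saddle point
    print(simulate_auxiliary(True))   # approaches the saddle point at the origin
```

Without the auxiliary variable the trajectory circles the saddle point indefinitely; with it, all three variables decay to zero, and the value of the saddle point of the original function is unchanged.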

#### Penalty function method

For this and the next method we will assume that the concave-convex function is a Lagrangian originating from a concave optimization problem (see 2.1.2). We will assume that the Lagrangian satisfies

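$$
\begin{aligned}
\varphi(x,y) &= U(x) + y^T g(x) \\
C^2 \ni U &: \mathbb{R}^n \to \mathbb{R} \text{ is concave} \\
C^2 \ni g &: \mathbb{R}^n \to \mathbb{R}^m \text{ is concave.}
\end{aligned} \tag{34}
$$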

We consider a so-called penalty method (see e.g. [14]). This method adds a penalising term to the Lagrangian based directly on the constraint functions. The new Lagrangian is defined by

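$$
\begin{aligned}
\varphi'(x,y) &= \varphi(x,y) + \psi(g(x)) \\
C^2 \ni \psi : \mathbb{R}^m &\to \mathbb{R} \text{ is strictly concave with } \psi_u > 0 \\
\psi(u) &= 0 \iff u \ge 0.
\end{aligned} \tag{35}
$$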

It is easy to see that the saddle points of and are the same.

###### Remark \theremark.

This modification method is also often applied to network optimization problems, i.e. problems where is of the form and each of the is a function of only a few of the components of . Similarly each component, , of the constraints depends on only a few of the components of . The subgradient method for such problems applied to (34) has a decentralized structure. When applied to the modified version (35) the dynamics will still have a decentralized structure, but will often also involve additional information exchange between neighboring nodes, e.g. when is linear, due to the nonlinearity of the function .

###### Remark \theremark.

This method has been considered previously by many authors, (see [12] and the references therein4), either without constraints, or with positivity constraints, i.e. . 5.2.4 below applies to all non-empty closed convex sets .

#### Constraint modification method

We next recall a method proposed by Arrow et al. [1] and later studied in [12]. Here we instead modify the constraints to enforce strict concavity. The Lagrangian (34) is modified to become:

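$$
\begin{aligned}
\varphi'(x,y) &= U(x) + y^T \psi(g(x)) \\
C^2 \ni U &: \mathbb{R}^n \to \mathbb{R} \text{ is concave} \\
C^2 \ni g &: \mathbb{R}^n \to \mathbb{R}^m \text{ is concave} \\
C^2 \ni \psi &= [\psi^1, \dots, \psi^m]^T : \mathbb{R}^m \to \mathbb{R}^m \\
\psi^j(0) &= 0,\ \psi^j_u \ge 0 \text{ and } \psi^j_{uu} < 0 \text{ for } j = 1, \dots, m.
\end{aligned} \tag{36}
$$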

It is clear that the value of at the saddle points of the modified and original Lagrangian will be the same. In analogy with Remark 5.2.2, this method also preserves the decentralized structure of the subgradient method for network optimization problems, but may require additional information transfer.

###### Remark \theremark.

Previous works [1],[12],[4] have proved convergence of this method with positivity constraints, i.e. . 5.2.4 below applies to any constraint set which is a product set with , both non-empty closed and convex.
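A small sketch may help to see what the constraint modification does. It is a hypothetical scalar instance, not an example from the paper: U(x) = x, g(x) = 1 - x, multiplier y ≥ 0, and psi(u) = 1 - exp(-u), which satisfies psi(0) = 0, psi_u > 0 and psi_uu < 0. Both the original Lagrangian (34) and the modified one (36) have the saddle point (1, 1). A projected Euler step is assumed as the discretisation.

```python
import math


def simulate_constraint_modification(modified, x=1.0, y=0.5, dt=0.005, steps=20000):
    """Projected Euler sketch on K = R x [0, inf) for the hypothetical problem
    U(x) = x, g(x) = 1 - x, with psi(u) = 1 - exp(-u) when `modified` is True."""
    for _ in range(steps):
        if modified:
            e = math.exp(x - 1.0)  # psi'(g(x)) = exp(-g(x))
            dx = 1.0 - y * e       # d/dx [U(x) + y * psi(g(x))]
            dy = e - 1.0           # -psi(g(x))
        else:
            dx = 1.0 - y           # d/dx [U(x) + y * g(x)]
            dy = x - 1.0           # -g(x)
        x, y = x + dt * dx, max(0.0, y + dt * dy)  # keep the multiplier non-negative
    return x, y


if __name__ == "__main__":
    print(simulate_constraint_modification(False))  # stays away from (1, 1)
    print(simulate_constraint_modification(True))   # approaches (1, 1)
```

Since U is linear here, the original Lagrangian is not strictly concave and its dynamics orbit the saddle point, whereas the modified dynamics settle at it.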

#### Convergence results

We now give a global convergence result for each of the methods described above on general convex domains.

###### Theorem \thetheorem (Convergence of modification methods).

Assume that , and satisfy one of the following:

1. Auxiliary variable method: Let be concave-convex on a non-empty closed convex set. Let and be defined by (33) and the text directly below it.

2. Penalty function method: Let have the form (34), be defined by (35) and be an arbitrary non-empty closed convex set.

3. Constraint modification method: Let have the form (34), be given by (36) and with , both non-empty closed and convex.

Also assume that has a -restricted saddle point. Then the subgradient method (8) applied to on domain in and domain in is globally convergent.

###### Remark \theremark.

Each of the convergence results in 5.2.4 is proved using 4.3. It should also be noted that none of the modification methods necessarily produces a strictly concave-convex function . Global convergence to a saddle point is nevertheless guaranteed by ensuring that no trajectory, other than a saddle point, satisfies conditions (31), (32) in 4.3.

## 6 Proofs of the main results

In this section we prove the main results of the paper, which are stated in 4.

### 6.1 Outline of the proofs

We first give a brief outline of the derivations of the results to improve their readability.

#### Pathwise stability and convex projections

In 6.2 we prove the results described in 4.1.

We revisit some of the literature on topological dynamical systems [6], quoting a more general result 6.2, from which 4.1 is deduced. Then 4.1 is proved using the convexity of the domain . The combination of these results allows us to prove the main result of the subsection, 4.1, using the fact that the convex projection term cannot break the isometry property of the flow on the -limit set.

#### Subgradient method

In subsections 6.3, 6.4 we prove the results in subsections 4.2, 4.3, respectively, using the results in 4.1.

### 6.2 Convergence to a flow of isometries

In this section we provide the proofs of 4.1, 4.1 and 4.1.

We begin by revisiting the literature on topological dynamical systems, in which a type of incremental stability is studied, and show how this leads to an invariance principle for pathwise stability.

###### Definition \thedefinition (Equicontinuous semi-flow).

We say that a flow (resp. semi-flow) is equicontinuous if for any and there is a such that if then

$$\rho(x(t), x'(t)) \le \varepsilon \ \text{ for all } t \in \mathbb{R} \ (\text{resp. } \mathbb{R}_+). \tag{37}$$

###### Remark \theremark.

In the control literature equicontinuity of a semi-flow would correspond to ‘semi-global non-asymptotic incremental stability’, but we shall keep the term equicontinuity for brevity and consistency with [6].

###### Definition \thedefinition (Uniformly almost periodic flow).

We say that a flow is uniformly almost periodic if for any there is a syndetic set , (i.e. for some compact set ), for which

$$\rho(\phi(t,x), x) \le \varepsilon \ \text{ for all } t \in A,\ x \in X. \tag{38}$$

For the reader's convenience we reproduce the results [6, Theorem 8] and [11, Proposition 4.4] that we will use.

###### Theorem \thetheorem (G. Della Riccia [6]).

Let be an equicontinuous semi-flow and let be either locally compact or complete. Let be its -limit set. Then is an equicontinuous semi-flow of homeomorphisms of onto . This generates an equicontinuous flow.

The backwards flow given by 6.2 is only unique on , (see 4.1 which also applies here).

###### Proposition \theproposition (R. Ellis [11]).

Let be a flow, with compact. Then the following are equivalent:

1. The flow is equicontinuous.

2. The flow is uniformly almost periodic.

In our case we study pathwise stability which is a particular form of equicontinuity. We prove stronger results in this special case.

###### Proof of 4.1.

By 6.2 is an equicontinuous flow with an equilibrium point . Let be arbitrary, and define

$$Y_R = \Big\{ z(0) \in \Omega : \sup_{t \in \mathbb{R}} d(z(t), \bar z) \le R \Big\}. \tag{39}$$

As the flow is equicontinuous, is a closed bounded subset of and hence compact, and moreover, the union of the sets over is . By 6.2 the flow is uniformly almost periodic. By pathwise stability, is non-increasing along the direct product flow, and is a continuous function on a compact set. Hence we have the inequality, for any two points ,

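$$
\begin{aligned}
\lim_{t \to -\infty} d(z(t), z'(t)) &= \sup_{t \in \mathbb{R}} d(z(t), z'(t)) \\
&\ge \inf_{t \in \mathbb{R}} d(z(t), z'(t)) = \lim_{t \to \infty} d(z(t), z'(t)).
\end{aligned} \tag{40}
$$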

We claim that the two limits are equal. Indeed, by uniform almost periodicity there are sequences and as for which

$$0 = \lim_{n \to \infty} d(z(t_n), z(0)) = \lim_{n \to \infty} d(z(t'_n), z(0)) \tag{41}$$

and the analogous limits hold for for the same sequences . Hence, by continuity of , we have

$$\lim_{t \to -\infty} d(z(t), z'(t)) = d(z(0), z'(0)) = \lim_{t \to \infty} d(z(t), z'(t)). \tag{42}$$

Hence is constant. By picking big enough, this holds for any , which completes the proof that the sub-semi-flow generates a flow of isometries.

It remains to show that is convex. To this end let be two trajectories of . Let that and define . By the same argument as used in the proof of [18, Proposition 28] we deduce that is a trajectory of the original semi-flow, but (as argued above) by uniform almost periodicity of we have a sequence of times for which as and the same limit for . Hence also, showing that is in the -limit set. ∎

We now work under the set of assumptions (9) and consider projected pathwise stable differential equations.

###### Proof of 4.1.

Let and be two arbitrary solutions to the projected ODE, and define . Then is absolutely continuous and for almost all times we have,

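$$
\begin{aligned}
\dot W(t) &= (z(t) - z'(t))^T (\dot z(t) - \dot z'(t)) \\
&= (z(t) - z'(t))^T (f(z(t)) - f(z'(t))) \\
&\quad - (z(t) - z'(t))^T P_{N_K(z(t))}(f(z(t))) \\
&\quad + (z(t) - z'(t))^T P_{N_K(z'(t))}(f(z'(t))).
\end{aligned} \tag{43}
$$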

The first term is non-positive due to the assumption that the ODE satisfies (9). The other two terms are non-positive due to the definition of the normal cone.
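The sign of the two projection terms can be spelled out as follows. This is a short sketch that assumes the standard definition of the normal cone; the paper's own definition is not reproduced in this section.

$$
N_K(z) = \{\, w : w^T (\tilde z - z) \le 0 \ \text{ for all } \tilde z \in K \,\}.
$$

Write $n = P_{N_K(z(t))}(f(z(t))) \in N_K(z(t))$ and $n' = P_{N_K(z'(t))}(f(z'(t))) \in N_K(z'(t))$. Since both $z(t)$ and $z'(t)$ lie in $K$, the definition gives

$$
n^T (z'(t) - z(t)) \le 0 \ \Longrightarrow\ -(z(t) - z'(t))^T n \le 0,
\qquad
n'^T (z(t) - z'(t)) \le 0 \ \Longrightarrow\ (z(t) - z'(t))^T n' \le 0,
$$

which are exactly the second and third terms in the expression for $\dot W(t)$.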

We now use the isometry property together with the geometry of the convex projection term to obtain the key result of this section, 4.1, which states that the limiting dynamics of a pathwise stable ODE restricted to a convex set have smooth vector field and lie inside one of the faces of .

To prove