Stability and instability
in saddle point dynamics - Part I
We consider the problem of convergence to a saddle point of a concave-convex function via gradient dynamics. Since they were first introduced by Arrow, Hurwicz and Uzawa, such dynamics have been used extensively in diverse areas. There are, however, features that render their analysis non-trivial. These include the lack of convergence guarantees when the function considered is not strictly concave-convex, and the non-smoothness of subgradient dynamics. Our aim in this two-part paper is to provide an explicit characterization of the asymptotic behaviour of general gradient and subgradient dynamics applied to a general concave-convex function. We show that, despite the nonlinearity and non-smoothness of these dynamics, their $\omega$-limit set is comprised of trajectories that solve explicit linear ODEs that are characterized within the paper.
More precisely, in Part I an exact characterization is provided of the asymptotic behaviour of unconstrained gradient dynamics. We also show that when convergence to a saddle point is not guaranteed the system behaviour can be problematic, with arbitrarily small noise leading to an unbounded variance. In Part II we consider a general class of subgradient dynamics that restrict trajectories to an arbitrary convex domain, and show that their limiting trajectories are solutions of subgradient dynamics restricted to affine subspaces. The latter is a smooth class of dynamics whose asymptotic behaviour is exactly characterized in Part I, as solutions to explicit linear ODEs. These results are used to formulate corresponding convergence criteria, and are demonstrated with several examples and applications presented in Part II.
Finding the saddle point of a concave-convex function is a problem that is relevant in many applications in engineering and economics and has been addressed by various communities. It includes, for example, optimization problems that are reduced to finding the saddle point of a Lagrangian. The gradient method, first introduced by Arrow, Hurwicz and Uzawa, has been widely used in this context as it leads to decentralized update rules for network optimization problems. It has therefore been used extensively in areas such as resource allocation in communication and economic networks, game theory, distributed optimization [12, 30, 28] and power networks [31, 9, 17, 7, 8, 29, 22, 25].
Nevertheless, in broad classes of problems there are features that render the analysis of the asymptotic behaviour of gradient dynamics non-trivial. In particular, even though convergence to a saddle point via gradient dynamics is ensured for a strictly concave-convex function, when this strictness is lacking convergence is not guaranteed and oscillatory solutions can occur. The existence of such oscillations has been reported in various applications; however, an exact characterization of their explicit form for a general concave-convex function, which also leads to a necessary and sufficient condition for their existence, has not been provided in the literature and is one of the aims of Part I of this work.
Furthermore, when subgradient methods are used to restrict the dynamics to a convex domain (needed, e.g., in optimization problems), the dynamics become non-smooth in continuous time. This significantly increases the complexity of the analysis, as classical Lyapunov and LaSalle type techniques cannot be applied. This is also reflected in the alternative approach taken for the convergence proof for subgradient dynamics applied to a strictly concave-convex Lagrangian with positivity constraints. Furthermore, an interesting recent study pointed out that the invariance principle for hybrid automata cannot be applied in this context, and gave an alternative proof, by means of Carathéodory's invariance principle, of the convergence result mentioned above. Convergence criteria have also been derived for unconstrained gradient dynamics and for dynamics under positivity constraints. In general, rigorously proving convergence for the subgradient method, even in what would naively appear to be simple cases, is a non-trivial problem, and requires much machinery from non-smooth analysis [6, 13].
Our aim in this two-part paper is to provide an explicit characterization of the asymptotic behaviour of continuous-time gradient and subgradient dynamics applied to a general concave-convex function. Our analysis is carried out in a general setting, where the function with respect to which these dynamics are applied is not necessarily strictly concave-convex. Furthermore, a general class of subgradient dynamics is considered, where trajectories are restricted to an arbitrary convex domain. One of our main results is to show that despite the nonlinear and non-smooth character of these dynamics their $\omega$-limit set is comprised of trajectories that solve explicit linear ODEs.
Our main contributions can be summarized as follows:
In Part I, we consider the gradient method applied to a general concave-convex function on an unconstrained domain, and provide an exact characterization of the limiting solutions, which can in general be oscillatory. In particular, we show that despite the nonlinearity of the dynamics, the trajectories converge to solutions of a linear ODE that is explicitly characterized. Furthermore, we show that when such oscillations occur the dynamic behaviour can be problematic, in the sense that arbitrarily small stochastic perturbations can lead to an unbounded variance.
In Part II, we consider the subgradient method applied to a general concave-convex function with the trajectories restricted to an arbitrary convex domain. We show that despite the non-smooth character of these dynamics, their limiting behaviour is given by the solutions of one of an explicit family of linear ODEs. In particular, the limiting trajectories are shown to be solutions of subgradient dynamics on affine subspaces, a class of dynamics whose asymptotic properties are exactly determined in Part I. These results are used to formulate corresponding convergence criteria, and various examples and applications are discussed.
It should be noted that there is a direct link between the results in Part I and Part II, as the dynamics that are proved to be associated with the asymptotic behaviour of the subgradient method are a class of dynamics that can be analysed with the framework introduced in Part I. Applications of the results in Part I will therefore be discussed in Part II, as in many cases (e.g. optimization problems with inequality constraints) a restricted domain for the concave-convex function needs to be considered.
Finally, we would also like to comment that the methodology used for the derivations in the two papers is of independent technical interest. In Part I the analysis is based on various geometric properties established for the saddle points of a concave-convex function. In Part II the non-smooth analysis is carried out by means of some more abstract results on corresponding semiflows that are applicable in this context, while also making use of the notion of a face of a convex set to characterize the asymptotic behaviour of the dynamics.
The Part I paper is structured as follows. In section II we introduce various definitions and preliminaries that will be used throughout the paper. In section III the problem formulation is given, and the main results, i.e. the characterization of the limiting behaviour of gradient dynamics, are presented in section IV. This section also includes an extension to a class of subgradient dynamics that restrict the trajectories to affine subspaces. This is a technical result that will be used in Part II to characterize the limiting behaviour of general subgradient dynamics. The proofs of the results are finally given in section VI.
Real numbers are denoted by $\mathbb{R}$ and non-negative real numbers by $\mathbb{R}_+$. For vectors $x, y \in \mathbb{R}^n$ the inequality $x \leq y$ denotes the corresponding elementwise inequality, $d(\cdot, \cdot)$ denotes the Euclidean metric and $|\cdot|$ denotes the Euclidean norm.
The space of $k$ times continuously differentiable functions is denoted by $C^k$. For a sufficiently differentiable function $\phi(x, y)$ we denote the vector of partial derivatives of $\phi$ with respect to $x$ as $\phi_x$, respectively $\phi_y$ with respect to $y$. The Hessian matrices with respect to $x$ and $y$ are denoted $\phi_{xx}$ and $\phi_{yy}$, while $\phi_{xy}$ denotes the matrix of mixed partial derivatives defined as $(\phi_{xy})_{ij} = \partial^2 \phi / \partial x_i \partial y_j$. For a vector valued function $g$ we let $g_x$ denote the matrix formed by partial derivatives of the elements of $g$, i.e. $(g_x)_{ij} = \partial g_i / \partial x_j$.
For a matrix $A$ we denote its kernel and transpose by $\ker(A)$ and $A^T$ respectively. If $A$ is in addition symmetric, we write $A < 0$ if $A$ is negative definite.
For a subspace $V$ we denote the orthogonal complement as $V^\perp$, and for a set of vectors $S$ we denote their span as $\operatorname{span}(S)$, their affine span as $\operatorname{aff}(S)$ and their convex hull as $\operatorname{conv}(S)$. The addition of a vector $x$ and a set $S$ is defined as $x + S = \{x + s : s \in S\}$.
For a set $S$, we denote the interior, relative interior, boundary and closure of $S$ as $\operatorname{int}(S)$, $\operatorname{relint}(S)$, $\partial S$ and $\bar{S}$ respectively, and we say that sets $S_1$ and $S_2$ are orthogonal, writing $S_1 \perp S_2$, if for any two pairs of points $x_1, y_1 \in S_1$ and $x_2, y_2 \in S_2$, we have $(x_1 - y_1) \cdot (x_2 - y_2) = 0$.
Given a set $S$ and a function $f : S \to S$, we say that $f$ is an isometry of $S$, or simply an isometry, if for all $x, y \in S$ we have $d(f(x), f(y)) = d(x, y)$.
For we define if and if .
II-A2 Convex geometry
For a closed convex set $S$ and $x \in S$, we define the maximal orthogonal linear manifold to $S$ through $x$ as
and the normal cone to $S$ through $x$ as
When $S$ is an affine space the normal cone is independent of $x$ and is denoted $N_S$. If $S$ is in addition non-empty, then we define the projection of a point $z$ onto $S$ as the nearest point of $S$ to $z$.
II-B Concave-convex functions and saddle points
Definition (Concave-convex function).
Let $Z \subseteq \mathbb{R}^{n+m}$ be non-empty, closed and convex. We say that a function $\phi(x, y)$ is concave-convex on $Z$ if for any $(\bar{x}, \bar{y}) \in Z$, $\phi(x, \bar{y})$ is a concave function of $x$ and $\phi(\bar{x}, y)$ is a convex function of $y$. If either the concavity or convexity is always strict, we say that $\phi$ is strictly concave-convex on $Z$.
Definition (Saddle point).
For a concave-convex function $\phi$ we say that $(\bar{x}, \bar{y})$ is a saddle point of $\phi$ if for all $x$ and $y$ we have the inequality $\phi(x, \bar{y}) \leq \phi(\bar{x}, \bar{y}) \leq \phi(\bar{x}, y)$.
If $\phi$ is in addition $C^1$ then $(\bar{x}, \bar{y})$ is a saddle point if and only if $\phi_x(\bar{x}, \bar{y}) = 0$ and $\phi_y(\bar{x}, \bar{y}) = 0$.
When we consider a concave-convex function we shall denote the pair $(x, y)$ in bold, and write $\mathbf{z} = (x, y)$. The full Hessian matrix will then be denoted $\phi_{\mathbf{z}\mathbf{z}}$. Vectors in $\mathbb{R}^{n+m}$ and matrices acting on them will be denoted in bold font (e.g. $\mathbf{A}$). Saddle points of $\phi$ will be denoted $\bar{\mathbf{z}}$.
II-C Dynamical systems
Definition (Flows and semi-flows).
A triple $(Z, d, \psi)$ is a flow (resp. semi-flow) if $(Z, d)$ is a metric space and $\psi$ is a continuous map from $Z \times \mathbb{R}$ (resp. $Z \times \mathbb{R}_+$) to $Z$ which satisfies the two properties:
For all $z \in Z$, $\psi(z, 0) = z$.
For all $z \in Z$ and $s, t \in \mathbb{R}$ (resp. $s, t \in \mathbb{R}_+$), $\psi(\psi(z, s), t) = \psi(z, s + t)$.
When there is no confusion over which (semi-)flow is meant, we shall denote $\psi(z, t)$ as $z(t)$. For sets $T \subseteq \mathbb{R}$ (resp. $T \subseteq \mathbb{R}_+$) and $A \subseteq Z$ we define $\psi(A, T) = \{\psi(z, t) : z \in A,\ t \in T\}$.
Definition (Global convergence).
We say that a (semi-)flow is globally convergent if, for all initial conditions $z(0)$, the trajectory $z(t)$ converges to the set of equilibrium points of the flow as $t \to \infty$, i.e.
A specific form of incremental stability, which we will refer to as pathwise stability, will be needed in the analysis that follows.
Definition (Pathwise stability).
We say that a semi-flow is pathwise stable if for any two trajectories $z_1(t)$, $z_2(t)$ the distance $d(z_1(t), z_2(t))$ is non-increasing in time.
As the subgradient method has a discontinuous vector field we need the notion of Carathéodory solutions of differential equations.
Definition (Carathéodory solution).
We say that a trajectory $z(t)$ is a Carathéodory solution to a differential equation $\dot{z} = f(z)$ if $z(t)$ is an absolutely continuous function of $t$ and, for almost all times $t$, the derivative $\dot{z}(t)$ exists and is equal to $f(z(t))$.
III Problem formulation
The main object of study in Part I is the gradient method on an arbitrary concave-convex function in $\mathbb{R}^{n+m}$.
Definition (Gradient method).
Given a concave-convex function $\phi$ on $\mathbb{R}^{n+m}$, we define the gradient method as the flow on $\mathbb{R}^{n+m}$ generated by the differential equation

$$\dot{x} = \phi_x(x, y), \qquad \dot{y} = -\phi_y(x, y). \tag{4}$$
It is clear that the saddle points of $\phi$ are exactly the equilibrium points of (4).
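As a minimal numerical illustration (a sketch of ours, not taken from the paper), consider the bilinear function $\phi(x, y) = xy$, which is concave-convex but not strictly so. Integrating the ascent-descent dynamics for this choice shows an oscillatory solution around the unique saddle point at the origin:

```python
import numpy as np

# Illustrative sketch: gradient dynamics for the bilinear function
# phi(x, y) = x*y (concave-convex, but not strictly so).  With the usual
# ascent-descent convention, xdot = phi_x = y and ydot = -phi_y = -x,
# a pure rotation about the unique saddle point (0, 0).
def euler_step(z, dt):
    x, y = z
    return np.array([x + dt * y, y - dt * x])

z = np.array([1.0, 0.0])
dt, n_steps = 1e-3, 10_000  # integrate to t = 10
for _ in range(n_steps):
    z = euler_step(z, dt)

# The distance to the saddle point is (numerically) constant: the
# trajectory oscillates and does not converge.
assert abs(np.hypot(z[0], z[1]) - 1.0) < 0.02
```

The persistent oscillation is consistent with the lack of strict concavity-convexity of $\phi$.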
In our companion paper (Part II) we study instead the subgradient method, where the flow (4) is restricted to a convex set by the addition of a projection term to the differential equation.
Definition (Subgradient method).
Given a non-empty closed convex set $Z$ and a function $\phi$ that is concave-convex on $Z$, we define the subgradient method on $Z$ as a semi-flow on $Z$ consisting of Carathéodory solutions of the projected version (5) of the dynamics (4).
Note that the gradient method is the subgradient method on $Z = \mathbb{R}^{n+m}$. In Appendix A-A we also consider the addition of constant gains to the gradient and subgradient methods.
We briefly summarise below the main contributions of this paper (Part I).
We provide an exact characterization of the limiting solutions of the gradient method (4) applied to an arbitrary concave-convex function, which is not assumed to be strictly concave-convex. Despite the non-linearity of the gradient dynamics, we show that these limiting solutions solve an explicit linear ODE given by derivatives of the concave-convex function at a saddle point.
We show that the lack of convergence in gradient dynamics can lead to problematic behaviour where arbitrarily small stochastic perturbations can lead to an unbounded variance.
We provide an exact classification of the limiting solutions of the subgradient method on affine subspaces by extending the result described in the first bullet point. This will be important for the analysis of the general subgradient dynamics considered in Part II. In particular, we show in Part II that the limiting behaviour of the subgradient method on arbitrary convex domains reduces to the limiting behaviour on affine subspaces.
IV Main Results
This section presents the main results of the paper. Before stating them we give some preliminary results.
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$. Then the gradient method (4) is pathwise stable.
Because saddle points are equilibrium points of the gradient method we obtain the well known result below.
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$. Then the distance of a solution of (4) to any saddle point is non-increasing in time.
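Pathwise stability can be checked numerically; the sketch below (our illustration, for a hypothetical strictly concave-convex quadratic) tracks the distance between two trajectories of the gradient dynamics:

```python
import numpy as np

# Illustrative choice: phi(x, y) = -x**2/2 + x*y + y**2/2 is strictly
# concave in x and strictly convex in y.  The gradient dynamics read
# xdot = phi_x = -x + y, ydot = -phi_y = -(x + y).
def step(z, dt=1e-3):
    x, y = z
    return np.array([x + dt * (-x + y), y - dt * (x + y)])

z1, z2 = np.array([2.0, 0.0]), np.array([-1.0, 3.0])
dists = []
for _ in range(5000):
    z1, z2 = step(z1), step(z2)
    dists.append(np.linalg.norm(z1 - z2))

# Pathwise stability: the inter-trajectory distance never increases.
assert all(d_next <= d + 1e-12 for d, d_next in zip(dists, dists[1:]))
```

Taking one of the two trajectories to start at a saddle point recovers the statement above as a special case.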
By an application of LaSalle’s theorem we obtain:
Thus classifying the limiting behaviour of the gradient method reduces to the problem of finding all solutions that lie a constant distance from every saddle point. In order to facilitate the presentation of the results, for a given concave-convex function $\phi$ we define the following sets:
the set of saddle points of $\phi$;
the set of solutions to (4) that lie a constant distance from every saddle point of $\phi$.
Note that if the latter set consists only of the equilibria at saddle points, then section IV gives the convergence of the gradient method to a saddle point.
Our first main result is that solutions of the gradient method converge to solutions that satisfy an explicit linear ODE.
To present our results we define the following matrices of partial derivatives of $\phi$:
For simplicity of notation we shall state the result for a saddle point at the origin; the general case may be obtained by a translation of coordinates.
The significance of this result is discussed in the remarks below.
It should be noted that, despite the non-linearity of the gradient dynamics (4), the limiting solutions solve a linear ODE with explicit coefficients depending only on the derivatives of $\phi$ at the saddle point.
An important consequence of this exact characterisation of the limiting behaviour is that the problem of proving global convergence to a saddle point is reduced to showing that there are no non-trivial limiting solutions.
Condition (8) appears to be very hard to check, as it requires knowledge of the trajectory for all times $t$. However, when the aim is to prove convergence to an equilibrium point, this form works in our favour: since (8) must hold along the entire trajectory it is a strong requirement, and it is correspondingly easier to show that non-trivial trajectories fail to satisfy it.
Remark (Localisation).
The conditions in the theorem use only local information about the concave-convex function $\phi$, in the sense that if $\phi$ is only concave-convex on a convex subset $Z$ which contains the saddle point, then any trajectory of the gradient method (4) that lies a constant distance from every saddle point in $Z$ and does not leave $Z$ at any time will obey the conditions of the theorem.
As a simple illustration of the use of this result, we show how to recover the well known result that the gradient method is globally convergent under the assumption that $\phi$ is strictly concave-convex.
Suppose $\phi$ is strictly concave (the strictly convex case is similar). Then $\phi_{xx}$ is of full rank except at isolated points, and condition (8) can only hold on a trivial trajectory. The ODE (7) then implies that the trajectory is constant, and hence is a saddle point. Thus the only limiting solutions of the gradient method are the saddle points, which establishes global convergence.
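A numerical sketch of this remark (our own, using a hypothetical strictly concave-convex quadratic) shows the trajectory spiralling into the saddle point:

```python
import numpy as np

# Illustration: phi(x, y) = -x**2/2 + x*y + y**2/2 is strictly
# concave-convex, so the gradient dynamics xdot = -x + y,
# ydot = -(x + y) should converge to the unique saddle point (0, 0).
dt = 1e-2
z = np.array([5.0, -3.0])
for _ in range(3000):  # integrate to t = 30
    x, y = z
    z = z + dt * np.array([-x + y, -(x + y)])

# The trajectory spirals into the saddle point at the origin.
assert np.linalg.norm(z) < 1e-3
```

The linearization here has eigenvalues $-1 \pm i$, so the spiralling decay seen numerically matches the global convergence asserted above.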
From section IV we deduce some further results that give a more easily understandable classification of the limiting solutions of the gradient method for simpler forms of $\phi$.
In particular, the ‘linear’ case occurs when $\phi$ is a quadratic function, as then the gradient method (4) is a linear system of ODEs. In this case the set of limiting solutions has a simple explicit form in terms of the Hessian matrix of $\phi$ at the saddle point, and in general this provides an inclusion, as described below, which can be used to prove global convergence of the gradient method using only local analysis at a saddle point.
Let $\phi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$, with a saddle point at the origin. Then define
where the matrices are as in (6). Then the stated inclusion holds, with equality if $\phi$ is a quadratic function.
Here we draw an analogy with a recent study of the discrete-time gradient method in the quadratic case, where the gradient method is proved to be semi-convergent if and only if an analogous spectral condition holds. Section IV includes a continuous time version of this statement.
We next consider the effect of noise when oscillatory solutions occur, and show that arbitrarily small stochastic perturbations can lead to an unbounded variance. In particular, we consider the addition of white noise to the dynamics (4). This leads to the stochastic differential equations

$$dx = \phi_x(x, y)\,dt + \Sigma_1\,dW_1, \qquad dy = -\phi_y(x, y)\,dt + \Sigma_2\,dW_2, \tag{10}$$

where $W_1$, $W_2$ are independent standard Brownian motions in $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, and $\Sigma_1$, $\Sigma_2$ are positive definite symmetric matrices in $\mathbb{R}^{n \times n}$ and $\mathbb{R}^{m \times m}$ respectively.
Let $\phi$ be concave-convex on $\mathbb{R}^{n+m}$ and let its set of saddle points contain a bi-infinite line. Consider the noisy dynamics (10). Then, for any initial condition, the variance of the solution tends to infinity as $t \to \infty$, in that
where $\mathbb{E}$ denotes the expectation operator.
The condition that the saddle set contains a bi-infinite line is satisfied, for example, if this set is not just a single point and $\phi$ is a quadratic function, and it can occur in applications, e.g. in the multi-path routing example given in our companion paper (Part II).
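The mechanism can be illustrated numerically (our sketch; the function and noise level are hypothetical choices). For $\phi(x_1, x_2, y) = x_1 y$ the saddle set $\{x_1 = 0,\ y = 0,\ x_2 \text{ arbitrary}\}$ contains a bi-infinite line, and along it the drift vanishes, so noise accumulates as a random walk:

```python
import numpy as np

# Euler-Maruyama simulation of the noisy gradient dynamics for
# phi(x1, x2, y) = x1*y: drift (y, 0, -x1) plus white noise.
# The x2 component (the direction of the saddle line) is driftless,
# so its variance grows without bound.
rng = np.random.default_rng(0)
dt, n_steps, n_paths, sigma = 1e-2, 400, 1000, 1.0  # integrate to t = 4

z = np.zeros((n_paths, 3))  # columns: x1, x2, y
var_t1 = None
for k in range(n_steps):
    x1, y = z[:, 0], z[:, 2]
    drift = np.stack([y, np.zeros(n_paths), -x1], axis=1)
    noise = sigma * np.sqrt(dt) * rng.standard_normal((n_paths, 3))
    z = z + dt * drift + noise
    if k == 99:  # t = 1
        var_t1 = z[:, 1].var()

var_t4 = z[:, 1].var()
# The variance along the saddle line keeps growing (roughly linearly in t).
assert var_t4 > 2.0 * var_t1
```

This matches the statement above: arbitrarily small noise intensity only slows, but does not prevent, the growth of the variance.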
One of the main applications of the gradient method is to the dual formulation of concave optimization problems, where some of the constraints are relaxed by Lagrange multipliers. When all the relaxed constraints are linear, the Lagrangian has the form
where $U(x)$ is a concave cost function, $y$ are the Lagrange multipliers, and $A$ and $b$ are a constant matrix and vector respectively associated with the equality constraints. Under the assumption that $U$ is analytic we obtain a simple exact characterisation of the limiting solutions. One specific case of this was studied by the authors previously, but without the analyticity condition.
Let $\phi$ be defined by (12) with $U$ analytic and $A$, $b$ constant. Assume that $\phi$ has a saddle point. Then the set of limiting solutions is given by
Furthermore, this set is an affine subspace.
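A sketch of this setting (our own, with hypothetical data; the sign convention for the Lagrangian is one common choice and may differ from the paper's):

```python
import numpy as np

# Primal-dual gradient dynamics for maximizing the concave utility
# U(x) = -|x|^2/2 subject to A x = b, via the Lagrangian
#   phi(x, y) = U(x) + y^T (b - A x)   (one common sign convention),
# i.e. xdot = grad U(x) - A^T y, ydot = A x - b.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = np.array([3.0, -2.0])
y = np.array([0.0])
dt = 1e-2
for _ in range(2000):  # integrate to t = 20
    x_new = x + dt * (-x - A.T @ y)
    y_new = y + dt * (A @ x - b)
    x, y = x_new, y_new

# With U strictly concave and A of full row rank the dynamics converge
# to the unique saddle point; here x* = (1/2, 1/2).
assert np.allclose(x, [0.5, 0.5], atol=1e-3)
```

With a strictly concave $U$ and full-row-rank $A$ there are no non-trivial limiting solutions, so the flow converges; degenerate choices of $A$ can instead produce the oscillations characterized above.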
IV-A The subgradient method on affine subspaces
We now extend the exact classification (section IV) to the subgradient method on affine subspaces. The significance of this result is that it allows us to provide a characterization of the limiting behaviour of the subgradient method on any convex domain. In particular, one of the main results that will be proved in Part II of this work is that the limiting trajectories of the subgradient method on a general convex domain are solutions to subgradient dynamics on affine subspaces.
In order to consider subgradient dynamics on an affine subspace, we let $Z$ be an affine subspace of $\mathbb{R}^{n+m}$ and let $\mathbf{P}$ be the orthogonal projection matrix onto the orthogonal complement of the normal cone $N_Z$. Then the subgradient method (5) on $Z$ is given by
where $\mathbf{f}$ denotes the vector field of the gradient method (4). We generalise section IV for this projected form of the gradient method. As with the statement of section IV, we state the result for an equilibrium point at the origin; the general case may be obtained by a translation of coordinates.
Let $\mathbf{P}$ be an orthogonal projection matrix, $\phi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$, and the origin be an equilibrium point of (14). Then the trajectories of (14) that lie a constant distance from every equilibrium point of (14) are exactly the solutions to the linear ODE:
that satisfy, for all times $t$, the condition
where the matrices of partial derivatives are defined by (6).
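As an illustration of such projected dynamics (our sketch, with a hypothetical subspace), take the line $x = y$ in $\mathbb{R}^2$ and project the gradient field onto it:

```python
import numpy as np

# Projected gradient dynamics zdot = P f(z), with P the orthogonal
# projector onto the line x = y (an illustrative affine subspace through
# the origin), and f the gradient-method field of
# phi(x, y) = -x**2/2 + x*y + y**2/2.
v = np.array([1.0, 1.0]) / np.sqrt(2.0)
P = np.outer(v, v)  # orthogonal projection onto span{v}

def f(z):  # (phi_x, -phi_y) = (-x + y, -(x + y))
    x, y = z
    return np.array([-x + y, -(x + y)])

z = 3.0 * v  # start on the subspace
dt = 1e-2
for _ in range(1000):  # integrate to t = 10
    z = z + dt * (P @ f(z))

# The trajectory never leaves the subspace, and within it the projected
# dynamics converge to the equilibrium at the origin.
assert abs(z[0] - z[1]) < 1e-9      # still on the line x = y
assert np.linalg.norm(z) < 1e-3     # converged along the line
```

The projection keeps the flow inside the affine subspace, which is exactly the class of dynamics the theorem above classifies.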
In many applications associated with saddle point problems, the variables need to be constrained to prescribed domains. These include, for example, positivity constraints on dual variables in optimization problems where some of the inequality constraints are relaxed with Lagrange multipliers, or more general convex constraints on primal variables. Corresponding applications will therefore be studied in Part II of this work, where subgradient dynamics will be analyzed.
It should be noted that, apart from their significance for saddle point problems without constraints (note that these also include dual versions of optimization problems with equality constraints), a main significance of the results in Part I is that they also lead to a characterization of the asymptotic behaviour of subgradient dynamics. In particular, as mentioned in section IV-A, it will be proved in Part II of this work that the asymptotic behaviour of subgradient dynamics on a general convex domain is given by solutions to subgradient dynamics on affine subspaces, a class of dynamics whose asymptotic behaviour can be exactly determined using the results in Part I.
VI Proofs of the main results
In this section we prove the main results of the paper which are stated in section IV.
Vi-a Outline of the proofs
We first give a brief outline of the derivations of the results to improve their readability. Before we give this summary we need to define some additional notation.
Given a saddle point, we denote the set of solutions to the gradient method (4) that are a constant distance from it (but not necessarily from other saddle points). It is later proved that this set coincides with the set of solutions at a constant distance from every saddle point, but until then the distinction is important.
First, in subsection VI-B we use the pathwise stability of the gradient method (section IV) and geometric arguments to establish convexity properties of these sets: the set of limiting solutions is convex and can only contain bi-infinite lines in degenerate cases. An orthogonality condition between the saddle set and the limiting set roughly says that the larger the one is, the smaller the other. These allow us to prove the key result of the section, which states that any convex combination of a limiting solution and a saddle point lies in the limiting set.
To prove the stochastic result of section IV we first prove a lemma in subsection VI-C (analogous to one in subsection VI-B) which tells us that the saddle set containing a bi-infinite line implies the presence of a quantity conserved by all solutions of the gradient dynamics (4). In the presence of noise, the variance of this quantity converges to infinity, which allows us to prove the result.
To prove the characterization of section IV in the Lagrangian case, we construct a quantity that is conserved by the limiting solutions. In the case considered, this has a natural interpretation in terms of the utility function and the constraints.
VI-B Geometry of the saddle set and the limiting solutions
In this section we use the gradient method to derive geometric properties of concave-convex functions. We start with some simple results, which are then used as a basis to derive the main result of this section. Along the way we illustrate how the gradient method can be used to prove results on the geometry of concave-convex functions.
Let $\phi$ be concave-convex on $\mathbb{R}^{n+m}$. Then the set of saddle points of $\phi$ is closed and convex.
Closure follows from continuity of the derivatives of $\phi$. For convexity, let two saddle points be given and let $\bar{\mathbf{z}}$ lie on the line segment between them. Consider the two closed balls about the saddle points that meet at the single point $\bar{\mathbf{z}}$, as in Figure 1. By section IV, $\bar{\mathbf{z}}$ is an equilibrium point, as the motion of the gradient method starting from $\bar{\mathbf{z}}$ is constrained to stay within both balls. It is hence a saddle point. ∎
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$, and let the set of saddle points of $\phi$ contain an infinite line. Then $\phi$ is translation invariant in the direction of that line.
We do this in two steps. First we prove that the motion of the gradient method is restricted to linear manifolds normal to the line. Consider the motion of the gradient method starting from a point $\mathbf{z}_0$. As illustrated in Figure 2, we pick two saddle points on the line; then by section IV the motion starting from $\mathbf{z}_0$ is constrained to lie in the (shaded) region, which is the intersection of the two closed balls about these saddle points which have $\mathbf{z}_0$ on their boundaries. The intersection of the regions generated by taking a sequence of pairs of saddle points off to infinity is contained in the linear manifold through $\mathbf{z}_0$ normal to the line.
Next we claim that the motion starting from a translate of $\mathbf{z}_0$ along the line is exactly the motion starting from $\mathbf{z}_0$ shifted by the same translation. As illustrated in Figure 3, by section IV the two motions must stay a constant distance from each other. This uniquely identifies the translated motion and proves the claim. Finally we deduce the full result by noting that the second claim implies that $\phi$ is defined up to an additive constant on each linear manifold, as the motion of the gradient method contains all the information about the derivatives of $\phi$. As $\phi$ is constant on the set of saddle points, the proof is complete. ∎
We now use these techniques to prove orthogonality results about the limiting solutions.
Let be concave-convex on , and be a trajectory in , then for all .
If or the claim is trivial. Otherwise we let be arbitrary, and consider the spheres about and that touch . By section IV, is constrained to lie on the intersection of these two spheres which lies inside where is the line segment between and . As and were arbitrary this proves the lemma. ∎
Let be and concave-convex on , and lie in for all . Then .
If the claim is trivial. Let be arbitrary. Then by subsection VI-B the line segment between and lies in . Let be the intersection of the extension of to infinity in both directions and . Then the definition of tells us that the extension of meets at a right angle. is constant and as , which implies that is also constant (as illustrated in Figure 4). Indeed, we have
and all the terms on the right hand side are constant. ∎
Using these orthogonality results we prove the key result of the section, a convexity result between the saddle set and the limiting set.
Let be and concave-convex on , and . Then for any , the convex combination lies in . If in addition , then .
Clearly is a constant distance from . We must show that is also a solution to (4). We argue in a similar way to Figure 3 but with spheres instead of planes. Let the solution to (4) starting at be denoted . We must show this is equal to . As it lies on a sphere about , say of radius , and by construction lies on a smaller sphere about of radius . By section IV, and are non-increasing, so that must be within of and within of . The only such point is which proves the claim. For the additional statement, we consider another saddle point and let be the line segment connecting and . By subsection VI-B, lies in , so by construction, , (as illustrated by Figure 5). Hence, by subsection VI-B, . ∎
Let be and concave-convex on . Let . Then is constant.
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$; then the set of limiting solutions is convex.
The proof is very similar to that of subsection VI-B. Let , and . Set . By subsection VI-B we know that is constant. Denote the solution of the gradient method starting from as . We must prove that and that . First we imagine two closed balls centered on and and of radii and respectively. By section IV, is constrained to lie within both of these balls. For each there is only one such point and it is exactly . Next we let be arbitrary, then is determined by and , (as illustrated by Figure 6). Indeed, we may assume by translation that , and then
which is constant for the same reason. ∎
VI-C Classification of the limiting solutions
We now proceed with a full classification of the limiting solutions and prove the theorems stated in section IV. For notational convenience we assume (without loss of generality) that the saddle point under consideration is at the origin. We then compute derivatives of $\phi$ from line integrals from the origin. Indeed, letting $\mathbf{v}$ be the corresponding unit vector, we have
Together with the definition of the matrices given by (6) we obtain
We are now ready to prove the first main result.
Proof of section IV.
As is skew symmetric and is symmetric we have , so that condition (8) is equivalent to
We will prove that , and . As the other inclusion is clear this will prove the theorem.
where . If , then , and by skew-symmetry of , is constant, which means that is a constant distance from . Furthermore, the assumption that for implies that the integrand in (24) vanishes, and is a solution of the gradient method.
Step 2: . Let be arbitrary. Consider the function . By expanding in the orthonormal basis of eigenvectors of we observe that this function is a linear combination of continuous periodic functions. As, by section IV, this function is also non-increasing, it must be constant.
Step 3: . Let and which is constant. For , define , so that and . Note that the corresponding unit vector does not depend on . The convexity result subsection VI-B implies that , and is a solution of the gradient method. We shall compute the time derivative of this in two ways. First, we use (4) and (24) to obtain,
Second, we use the explicit definition of in terms of to obtain,
Differentiating with respect to we have,
The right hand side of this is independent of , which implies that the left hand side is also independent of , and is thus equal to its value at , so that
Putting this back into our expression for we find that
but as is constant, skew symmetric, and symmetric, must vanish, which, together with (29) shows that . ∎
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$, and let there be a saddle point which is locally asymptotically stable. Then the gradient method is globally convergent.
By local asymptotic stability of , for some open ball about . Then by subsection VI-B, is convex, and we deduce that . ∎
The proof of subsection VI-B is now very simple.
Proof of subsection VI-B.
Using section IV we have that which has constant magnitude as is skew symmetric. ∎
To prove the stochastic result of section IV we require the following lemma, which shows the existence of a conserved quantity of the gradient dynamics.
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$. Suppose that the set of saddle points contains a bi-infinite line with direction $\mathbf{v}$. Then $\langle \mathbf{z}(t), \mathbf{v} \rangle$ is a conserved quantity for any solution $\mathbf{z}(t)$ of (4).
As is closed and convex (subsection VI-B) we may assume that the line passes through the origin and take . Let and note that is a solution to the gradient method (4) by section IV for any . We follow the strategy of the first part of the proof of subsection VI-B with replacing the saddle points. Indeed, let be any solution to (4) and let . Then for any , section IV implies that must satisfy
where by we mean that the equation holds for each of and . In the same way as in the proof of subsection VI-B, taking the intersection of these balls for a sequence we deduce that is contained in the linear manifold normal to the line through the origin and , and passing through . Indeed, by squaring (31) and expanding we obtain
By dividing through by and taking the limit we deduce that is equal to which implies that is conserved. ∎
Proof of section IV.
Consider the conserved quantity given by subsection VI-C. Applying Itō’s lemma and taking expectations, we have
where $\mathcal{L}$ is the total derivative along the deterministic flow (4) and $\operatorname{tr}$ is the trace operator. As the quantity is conserved along the deterministic flow the first term vanishes, and a simple computation shows that the second term is independent of $t$ and bounded below by a strictly positive constant. Therefore the second moment grows at least linearly in time. It remains to relate this to the variance, which completes the proof of the proposition. ∎
The convexity of the limiting set allows us to deduce that the average position of any limiting trajectory is a saddle point.
Let $\phi$ be $C^1$ and concave-convex on $\mathbb{R}^{n+m}$ and let $\mathbf{z}(t)$ be a limiting trajectory; then the average position of $\mathbf{z}$ defined by
exists and lies inside the set of saddle points.
That the limit exists follows from expanding into eigenmodes and noting that, as is skew symmetric, each individual limit exists.
To prove that the limit is in we consider, for , the function