On Gradient-Based Learning in Continuous Games
Abstract
We introduce a general framework for competitive gradient-based learning that encompasses a wide breadth of multi-agent learning algorithms, and analyze the limiting behavior of competitive gradient-based learning algorithms using dynamical systems theory. For both general-sum and potential games, we characterize a non-negligible subset of the local Nash equilibria that will be avoided if each agent employs a gradient-based learning algorithm. We also shed light on the issue of convergence to non-Nash strategies in general and zero-sum games, which may have no relevance to the underlying game and arise solely due to the choice of algorithm. The existence and frequency of such strategies may explain some of the difficulties encountered when using gradient descent in zero-sum games as, e.g., in the training of generative adversarial networks. To reinforce the theoretical contributions, we provide empirical results that highlight the frequency of linear quadratic dynamic games (a benchmark for multi-agent reinforcement learning) that admit global Nash equilibria that are almost surely avoided by policy gradient.
Mazumdar, Ratliff, and Sastry
Keywords: continuous games, gradient-based algorithms, multi-agent learning
1 Introduction
With machine learning algorithms increasingly being deployed in real-world settings, it is crucial that we understand how the algorithms can interact and the dynamics that can arise from their interactions. In recent years, there has been a resurgence in research efforts on multi-agent learning and learning in games. The recent interest in adversarial learning techniques also serves to show how game-theoretic tools can be used to robustify and improve the performance of machine learning algorithms. Despite this activity, however, machine learning algorithms are still being treated as black-box approaches and naïvely deployed in settings where other algorithms are actively changing the environment. In general, outside of highly structured settings, there exist no guarantees on the performance or limiting behaviors of learning algorithms in such settings.
Indeed, previous work on understanding the collective behavior of coupled learning algorithms, either in competitive or cooperative settings, has mainly looked at games where the global structure is well understood, such as bilinear games Singh et al. (2000); Hommes and Ochea (2012); Mertikopoulos et al. (2018); Leslie and Collins (2005), convex games Mertikopoulos and Zhou (2019); Rosen (1965), or potential games Monderer and Shapley (1996), among many others. Such games are more conducive to the statement of global convergence guarantees since the assumed global structure can be exploited.
In games with fewer assumptions on the players’ costs, however, there is still a lack of understanding of the dynamics and limiting behaviors of learning algorithms. Such settings are becoming increasingly prevalent as deep learning is increasingly being used in game theoretic settings Goodfellow et al. (2014); Foerster et al. (2018); Abdallah and Lesser (2008); Zhang and Lesser (2010).
Gradient-based learning algorithms are extremely popular in a variety of these multi-agent settings due to their versatility, ease of implementation, and dependence on local information. There are numerous recent papers in multi-agent reinforcement learning that employ gradient-based methods (see, e.g., Abdallah and Lesser (2008); Foerster et al. (2018); Zhang and Lesser (2010)), yet even within this well-studied class of learning algorithms, a thorough understanding of their convergence and limiting behaviors in general continuous games is still lacking.
Generally speaking, in both the game theory and the machine learning communities, two of the central questions when analyzing the dynamics of learning in games are the following:
Q1. Are all attractors of the learning algorithms employed by agents equilibria relevant to the underlying game?

Q2. Are all equilibria relevant to the game also attractors of the learning algorithms agents employ?
In this paper, we provide some answers to the above questions for the class of gradient-based learning algorithms by analyzing their limiting behavior in general continuous games. In particular, we leverage the continuous-time limit of these naturally discrete-time multi-agent learning algorithms. This allows us to draw on the extensive theory of dynamical systems and stochastic approximation to make statements about the limiting behaviors of these algorithms in both deterministic and stochastic settings. The latter is particularly relevant since it is common for stochastic gradient methods to be used in multi-agent machine learning contexts.
Analyzing gradient-based algorithms through the lens of dynamical systems theory has recently yielded new insights into their behavior in the classical optimization setting Wilson et al. (2016); Scieur et al. (); Lee et al. (2016). We show that a similar type of analysis can also help understand the limiting behaviors of gradient-based algorithms in games. We remark, however, that there is a fundamental difference between the dynamics analyzed in much of the single-agent, gradient-based learning and optimization literature and the ones we analyze in the competitive multi-agent case: the combined dynamics of gradient-based learning schemes in games do not necessarily correspond to a gradient flow. This may seem a subtle point, but it turns out to be extremely important.
Gradient flows admit desirable convergence guarantees, e.g., almost sure convergence to local minimizers, due to the fact that they preclude flows with the worst geometries Pemantle (2007). In particular, they do not exhibit non-equilibrium limiting behavior such as periodic orbits. Gradient-based learning in games, on the other hand, does not preclude such behavior. Moreover, as we show, asymmetry in the dynamics of gradient-play in games can lead to surprising behaviors, such as game-irrelevant points being attracting under the flow of the game dynamics and game-relevant points, such as a subset of the Nash equilibria, being almost surely avoided.
1.1 Related Work
The study of continuous games is quite extensive (see, e.g., Basar and Olsder (1998); Osborne (1994)), though in large part the focus has been on games admitting a fair amount of structure. The behavior of learning algorithms in games is also well-studied (see, e.g., Fudenberg and Levine (1998)). In this section, we comment on the most relevant prior work and defer a more comprehensive discussion of our results in the context of prior work to Section 6.
As we noted, previous work on learning in games, in both the game theory literature and more recently the machine learning community, has largely focused on addressing (Q1) whether all attractors of the learning dynamics are game-relevant equilibria, and (Q2) whether all game-relevant equilibria are also attractors of the learning dynamics. The primary type of game-relevant equilibrium considered in the investigation of these two questions is the Nash equilibrium.
The majority of the existing work has focused on Q1. In fact, a large body of prior work focuses on games with structures that preclude the existence of non-Nash equilibria. Consequently, answering Q1 reduces to analyzing the convergence of various learning algorithms (including gradient-play) to the unique Nash equilibrium or the set of Nash equilibria. This is often shown by exploiting the game structure. Examples of classes of games where gradient-play has been well-studied are potential games Monderer and Shapley (1996), concave or monotone games Rosen (1965); Bravo et al. (2018); Mertikopoulos and Zhou (2019), and gradient-play over the space of stochastic policies in two-player finite-action bilinear games Singh et al. (2000). In the latter setting, other gradient-like algorithms such as multiplicative weights have also been studied fairly extensively Hommes and Ochea (2012), and have been shown to converge to cycling behaviors.
Some works have also attempted to address Q1 in the context of gradient-play in two-player zero-sum games. Concurrently with this paper, it was shown that for a general class of "sufficiently smooth" two-player zero-sum games there exist stationary points of gradient-play that are non-Nash Daskalakis and Panageas (2018).
There is also related work in more general games on the analysis of when Nash equilibria are attracting for gradient-based approaches (i.e., Q2). Sufficient conditions for this to occur are the conditions for stable differential Nash equilibria introduced in Ratliff et al. (2013, 2014, 2016) and the condition for variational stability later analyzed in Mertikopoulos and Zhou (2019). We remark that these conditions are equivalent for the classes of games we consider. Neither of these works gives conditions under which Nash equilibria are avoided by gradient-play or comments on other attracting behaviors.
Expanding on this rich body of literature (only the most relevant of which is covered in our short review), in this paper we provide answers to Q1 without imposing structure on the game beyond regularity conditions on the cost functions, by exploiting the observation that gradient-based learning dynamics are not gradient flows. We also provide answers to Q2 by demonstrating that a nontrivial set of games admit Nash equilibria that are almost surely avoided by gradient-play, and we give explicit conditions for when this occurs. Using similar analysis tools, we also provide new insights into the behavior of gradient-based learning in structured classes of games such as zero-sum and potential games.
1.2 Contributions and Organization
We present a general framework for modeling competitive gradient-based learning that applies to a broad swath of learning algorithms. In Section 3, we draw connections between the limiting behavior of this class of algorithms and game-theoretic and dynamical systems notions of equilibria. In particular, we construct general-sum and zero-sum games that admit non-Nash attracting equilibria of the gradient dynamics. Such points are attracting under the learning dynamics, yet at least one player, and potentially all of them, has a direction in which they could unilaterally deviate to decrease their cost. Thus, these non-Nash equilibria are of questionable game-theoretic relevance and can be seen as artifacts of the players' algorithms.
In Section 4, we show that policy-gradient multi-agent reinforcement learning (MARL), generative adversarial networks (GANs), and gradient-based multi-agent multi-armed bandits, among several other common multi-agent learning settings, conform to this framework. The framework is amenable to tools for analysis from dynamical systems theory.
Also in Section 4, we show that a subset of the local Nash equilibria in general-sum games and potential games is avoided almost surely when each player employs a gradient-based algorithm. We show that this holds in two broad settings: the full-information setting, where each player has oracle access to their gradient but randomly initializes their first action, and a partial-information setting, where each player has access to an unbiased estimate of their gradient.
Thus, we provide a negative answer to both Q1 and Q2 for $n$-player general-sum games, and highlight the nuances present in zero-sum and potential games. We also show that the dynamics formed from the individual gradients of agents' costs are not gradient flows. This in turn implies that competitive gradient-based learning in general-sum games may converge to periodic orbits and other nontrivial limiting behaviors that arise in, e.g., chaotic systems.
To support the theoretical results, we present empirical results in Section 5 that show that policy gradient algorithms avoid global Nash equilibria in a large number of linear quadratic (LQ) dynamic games, a benchmark for MARL.
We conclude in Section 6 with a discussion of the implications of our results and some links with prior work as well as some comments on future directions.
2 Preliminaries
Consider $n$ agents indexed by $i \in \{1, \ldots, n\}$. Each agent $i$ has their own decision variable $x_i \in X_i$, where $X_i$ is their finite-dimensional strategy space of dimension $d_i$. Define $X = X_1 \times \cdots \times X_n$ to be the finite-dimensional joint strategy space with dimension $d = \sum_i d_i$. Each agent is endowed with a cost function $f_i : X \to \mathbb{R}$, and we use the notation $f_i(x_i, x_{-i})$ to make the dependence on the action $x_i$ of agent $i$, and the actions $x_{-i}$ of all agents excluding agent $i$, explicit. The agents seek to minimize their own cost, but only have control over their own decision variable $x_i$. In this setup, agents' costs are not necessarily aligned with one another, meaning they are competing.
Given the game $(f_1, \ldots, f_n)$, agents are assumed to update their strategies simultaneously according to a gradient-based learning algorithm of the form

(1) $\quad x_i^{k+1} = x_i^k - \gamma_{i,k}\, g_i(x_i^k, x_{-i}^k)$

where $\gamma_{i,k}$ is agent $i$'s step-size at iteration $k$.
We analyze the following two settings:

Agents have oracle access to the gradient of their cost with respect to their own choice variable, i.e., $g_i(x^k) = D_i f_i(x^k)$, where $D_i f_i$ denotes the derivative of $f_i$ with respect to $x_i$.

Agents have an unbiased estimator of their gradient, i.e., $g_i(x^k) = D_i f_i(x^k) + w_{i,k}$, where $\{w_{i,k}\}$ is a zero-mean, finite-variance stochastic process.
We refer to the former setting as deterministic gradient-based learning and the latter as stochastic gradient-based learning. Assuming that all agents employ such algorithms, we aim to analyze the limiting behavior of the agents' strategies. To do so, we leverage the following game-theoretic notion of a Nash equilibrium.
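As a concrete illustration, the update rule (1) can be sketched in a few lines of code. The quadratic two-player costs below are hypothetical stand-ins chosen by us (not an example from the paper), and the `noise` parameter switches between the oracle setting and the unbiased-estimator setting:

```python
import numpy as np

# Player 1 minimizes f1(x, y) = 0.5*x**2 + x*y over x; player 2 minimizes
# f2(x, y) = 0.5*y**2 + 0.5*x*y over y. These costs are illustrative only.
def D1_f1(x, y):  # derivative of f1 with respect to x
    return x + y

def D2_f2(x, y):  # derivative of f2 with respect to y
    return y + 0.5 * x

def gradient_play(x0, y0, steps=2000, gamma=0.05, noise=0.0, seed=0):
    """Simultaneous gradient play; noise > 0 gives the stochastic-oracle setting."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for _ in range(steps):
        g1 = D1_f1(x, y) + noise * rng.standard_normal()  # (unbiased) gradient estimate
        g2 = D2_f2(x, y) + noise * rng.standard_normal()
        x, y = x - gamma * g1, y - gamma * g2  # simultaneous updates, as in (1)
    return x, y

x, y = gradient_play(1.0, -1.0)
print(x, y)  # the joint strategy approaches the unique stationary point (0, 0)
```

For these particular costs the joint update is a contraction near the origin, so both the deterministic and the noisy variants hover around the same stationary point.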
A strategy $x^* \in X$ is a local Nash equilibrium for the game $(f_1, \ldots, f_n)$ if, for each $i$, there exists an open set $W_i \subset X_i$ with $x_i^* \in W_i$ such that $f_i(x_i^*, x_{-i}^*) \le f_i(x_i, x_{-i}^*)$ for all $x_i \in W_i$. If the above inequalities are strict, then we say $x^*$ is a strict local Nash equilibrium.
The focus on local Nash equilibria is due to our lack of assumptions on the agents' cost functions. If $W_i = X_i$ for each $i$, then a local Nash equilibrium is a global Nash equilibrium. This holds in, e.g., the bimatrix games and the linear quadratic games we analyze in Section 5. Depending on the agents' costs, a game may admit anywhere from one to a continuum of local or global Nash equilibria, or none at all.
3 Linking Games and Dynamical Systems
In this section, we draw links between the limiting behavior of dynamical systems and game-theoretic notions of equilibria in three broad classes of continuous games. For brevity, the proofs of the propositions in this section are supplied in Appendix A. A high-level summary of the links we draw is shown in Figure 1.
Define $\omega(x) = \big(D_1 f_1(x), \ldots, D_n f_n(x)\big)$ to be the vector of the players' derivatives of their own cost functions with respect to their own choice variables. When each player employs a gradient-based learning algorithm, the joint strategy of the players, $x(t)$, follows (in the limit as the agents' step-sizes go to zero) the differential equation $\dot{x} = -\omega(x)$.

A point $x^*$ is said to be an equilibrium, critical point, or stationary point of the dynamics if $\omega(x^*) = 0$. Stationary points of $\omega$ are joint strategies from which, under gradient-play, the agents do not move. We note that $\omega(x^*) = 0$ is a necessary condition for a point to be a local Nash equilibrium Ratliff et al. (2016). Hence, all local Nash equilibria are critical points of the joint dynamics $\dot{x} = -\omega(x)$.
Central to dynamical systems theory is the study of limiting behavior and its stability properties. A classical result in dynamical systems theory allows us to characterize the stability properties of an equilibrium $x^*$ by analyzing the Jacobian of the dynamics at $x^*$. The Jacobian of $\omega$, denoted $J(x)$, is the matrix of partial derivatives of $\omega$; its diagonal blocks are the individual Hessians $D_i^2 f_i(x)$ and its off-diagonal blocks are the cross-derivatives $D_{ij} f_i(x)$, $j \ne i$.

Since $J(x)$ is a matrix of second derivatives, it is sometimes referred to as the 'game Hessian'. Similar to the Hessian matrix of a gradient flow, $J(x)$ allows us to further characterize the critical points of $\omega$ by their properties under the flow of $\dot{x} = -\omega(x)$. Let $\lambda_j$, for $j \in \{1, \ldots, d\}$, denote the eigenvalues of $J$ at $x^*$, ordered so that $\mathrm{Re}(\lambda_1) \le \cdots \le \mathrm{Re}(\lambda_d)$; that is, $\lambda_1$ is the eigenvalue with the smallest real part. Of particular interest are asymptotically stable equilibria.
A point $x^*$ is a locally asymptotically stable equilibrium of the continuous-time dynamics $\dot{x} = -\omega(x)$ if $\omega(x^*) = 0$ and $\mathrm{Re}(\lambda_j) > 0$ for all $j \in \{1, \ldots, d\}$.
Locally asymptotically stable equilibria have two properties of interest. First, they are isolated, meaning that there exists a neighborhood around them in which no other equilibria exist. Second, they are exponentially attracting under the flow of $\dot{x} = -\omega(x)$, meaning that if agents initialize in a neighborhood of a locally asymptotically stable equilibrium $x^*$ and follow the dynamics, they will converge to $x^*$ exponentially fast Sastry (1999). This, in turn, implies that a discretized version of the dynamics, namely
(2) $\quad x^{k+1} = x^k - \gamma\, \omega(x^k)$

converges locally at a linear rate for an appropriately selected step-size $\gamma$. Such results motivate the study of the continuous-time dynamical system in order to understand convergence properties of gradient-based learning algorithms of the form (1).
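This local linear rate can be sketched numerically. The stable game below is a hypothetical example of our own choosing, with $\omega(x, y) = (x + y,\, y - x)$; the per-step error ratio of the discretization (2) is a constant strictly below one:

```python
import numpy as np

# Near a locally asymptotically stable equilibrium, the Euler discretization
# x_{k+1} = x_k - gamma*omega(x_k) contracts geometrically (a linear rate).
# Hypothetical game with omega(x, y) = (x + y, y - x); the origin is stable
# since the Jacobian [[1, 1], [-1, 1]] has eigenvalues 1 +/- 1j.
def omega(z):
    x, y = z
    return np.array([x + y, y - x])

gamma = 0.1
z = np.array([1.0, 1.0])
errors = []
for _ in range(50):
    errors.append(np.linalg.norm(z))
    z = z - gamma * omega(z)

ratios = [errors[k + 1] / errors[k] for k in range(49)]
print(round(min(ratios), 4), round(max(ratios), 4))  # 0.9055 0.9055
```

For this example the update matrix is a scaled rotation, so every step shrinks the distance to the equilibrium by exactly $\sqrt{0.82} \approx 0.9055$.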
Another important class of critical points of a dynamical system is the saddle point. A point $x^*$ is a saddle point of the dynamics $\dot{x} = -\omega(x)$ if $\omega(x^*) = 0$ and $J(x^*)$ has eigenvalues with negative real parts as well as eigenvalues with positive real parts. A saddle point such that $\mathrm{Re}(\lambda_j) < 0$ for $j \in \{1, \ldots, p\}$ and $\mathrm{Re}(\lambda_j) > 0$ for $j \in \{p+1, \ldots, d\}$, with $p \ge 1$, is a strict saddle point of the continuous-time dynamics.
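These definitions can be checked numerically from the eigenvalues of the game Jacobian. The sketch below uses hypothetical quadratic costs of our own (not an example from the paper) and classifies the origin:

```python
import numpy as np

# Classifying a stationary point of zdot = -omega(z) via the game Jacobian.
# Hypothetical costs: f1 = 0.5*x**2 + x*y and f2 = 0.5*y**2 - x*y, giving
# omega(x, y) = (x + y, y - x) with the constant Jacobian below.
J = np.array([[1.0, 1.0],
              [-1.0, 1.0]])  # note: not symmetric, unlike a gradient flow

eig = np.linalg.eigvals(J)  # eigenvalues 1 + 1j and 1 - 1j
is_lase = bool(np.all(eig.real > 0))  # locally asymptotically stable equilibrium
is_strict_saddle = bool(np.any(eig.real < 0) and np.all(eig.real != 0))
print(is_lase, is_strict_saddle)  # True False
```

Here all eigenvalues have positive real part, so the origin is locally asymptotically stable rather than a saddle, even though the eigenvalues are complex and trajectories spiral in.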
Strict saddle points are especially relevant to our analysis since their neighborhoods are characterized by stable and unstable manifolds Sastry (1999). When the agents evolve according to the dynamics solely on the stable manifold, they converge exponentially fast to the critical point. However, when they evolve solely on the unstable manifold, they diverge from the equilibrium exponentially fast. Agents whose strategies lie on the union of the two manifolds asymptotically avoid the equilibrium. We make use of this general fact in Section 4.1.
To better understand the links between the critical points of the gradient dynamics and the Nash equilibria of the game, we make use of an equivalent characterization of strict local Nash equilibria that leverages first- and second-order conditions on the players' cost functions. This makes them simpler objects to link to the various dynamical systems notions of equilibria than local Nash equilibria.
A point $x^*$ is a differential Nash equilibrium for the game defined by $(f_1, \ldots, f_n)$ if $D_i f_i(x^*) = 0$ and $D_i^2 f_i(x^*) > 0$ for each $i$ Ratliff et al. (2013, 2016).
In Ratliff et al. (2014), it was shown that local Nash equilibria are generically differential Nash equilibria with $\det J(x^*) \ne 0$ (i.e., $x^*$ is nondegenerate). Thus, in the space of games where the agents' costs are at least twice differentiable, the set of games that admit local Nash equilibria that are not nondegenerate differential Nash equilibria is of measure zero Ratliff et al. (2014). In Ratliff et al. (2014) it was also shown that nondegenerate Nash equilibria are structurally stable, meaning that small perturbations to the agents' cost functions will not change the fundamental nature of the equilibrium. This also implies that gradient-play with slightly biased estimators of the gradient will not have vastly different behaviors in neighborhoods of equilibria.
Given these different equilibrium notions of the learning dynamics and the underlying game, let us define the following sets, which will be useful in stating the results in the following sections. For a game $g = (f_1, \ldots, f_n)$, denote the sets of strict saddle points and locally asymptotically stable equilibria of the gradient dynamics $\dot{x} = -\omega(x)$ as $\mathrm{SS}(g)$ and $\mathrm{LASE}(g)$, respectively. Similarly, denote the set of local Nash equilibria, differential Nash equilibria, and nondegenerate differential Nash equilibria of $g$ as $\mathrm{LNE}(g)$, $\mathrm{DNE}(g)$, and $\mathrm{NDNE}(g)$, respectively. As previously mentioned, local Nash equilibria are generically nondegenerate differential Nash equilibria in almost all continuous games. The key takeaways of this section are summarized in Figure 1.
3.1 General-Sum Games
We first analyze the properties of local Nash equilibria under the joint gradient dynamics in $n$-player general-sum games.
A nondegenerate differential Nash equilibrium is either a locally asymptotically stable equilibrium or a strict saddle point of $\dot{x} = -\omega(x)$. Locally asymptotically stable differential Nash equilibria satisfy the notion of variational stability introduced in Mertikopoulos and Zhou (2019). In fact, a simple analysis shows that the definitions of variationally stable equilibria and locally asymptotically stable differential Nash equilibria Ratliff et al. (2013) are equivalent in the games we consider, i.e., games where each player's cost is at least twice continuously differentiable. We remark that, from the definition of asymptotic stability, the gradient dynamics converge at a local exponential rate in the neighborhood of such equilibria.
An important point to make is that not every locally asymptotically stable equilibrium of $\dot{x} = -\omega(x)$ is a nondegenerate differential Nash equilibrium. Indeed, the following proposition provides an entire class of games whose corresponding gradient dynamics admit locally asymptotically stable equilibria that are not local Nash equilibria.

Proposition: In the class of general-sum continuous games, there exists a continuum of games whose gradient dynamics admit locally asymptotically stable equilibria that are not differential Nash equilibria.

Proof: Consider a two-player game on $\mathbb{R}^2$ where the players' costs are quadratic,
for given constants. The Jacobian of $\omega$ is given by
(3) 
If the constants are chosen so that one player's individual Hessian is negative, then the unique stationary point is neither a differential Nash nor a local Nash equilibrium, since the second-order necessary conditions are violated. However, if the constants are additionally chosen so that the eigenvalues of the Jacobian have positive real parts, the stationary point is asymptotically stable. Further, this clearly holds for a continuum of games. Thus, the set of locally asymptotically stable equilibria that are not Nash equilibria may be arbitrarily large.
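The following sketch plays out a hypothetical instance of this phenomenon with constants of our own choosing (not the paper's example): gradient play converges to the origin even though player 1 could unilaterally deviate there to decrease their cost.

```python
import numpy as np

# Hypothetical general-sum instance: f1(x, y) = -0.5*x**2 + 2*x*y and
# f2(x, y) = 1.5*y**2 - 2*x*y. Then omega = (-x + 2y, 3y - 2x) and the
# Jacobian [[-1, 2], [-2, 3]] has a double eigenvalue at 1, so the origin
# attracts gradient play. Yet D1^2 f1 = -1 < 0: the origin is non-Nash.
def omega(z):
    x, y = z
    return np.array([-z[0] + 2 * z[1], 3 * z[1] - 2 * z[0]])

z = np.array([0.5, 0.5])
for _ in range(4000):
    z = z - 0.05 * omega(z)
print(np.round(z, 6))  # gradient play converges to the origin

f1 = lambda x, y: -0.5 * x**2 + 2 * x * y
print(f1(0.0, 0.0) > f1(1.0, 0.0))  # True: deviating to x = 1 lowers player 1's cost
```

The deviation check makes the game-theoretic defect concrete: at the attractor, player 1's cost is strictly decreased along their own coordinate, so the point would not survive a best-response check.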
The preceding proposition shows that there exist attracting critical points of the gradient dynamics in general-sum continuous games that are not Nash equilibria and may not even be relevant to the game. Thus, it provides a negative answer to Q1 (whether all attractors of the learning dynamics are game-relevant equilibria).
We note that, by definition, the non-Nash locally asymptotically stable equilibria (or simply non-Nash equilibria) do not satisfy the second-order conditions for Nash equilibria. Thus, at these joint strategies, at least one player, and maybe all of them, has a direction in which they would unilaterally deviate if they were not using gradient descent. As such, we view convergence to these points as undesirable.
3.2 Zero-Sum Games
Let us now restrict our attention to two-player zero-sum games, which often arise when training GANs, in adversarial learning, and in MARL Goodfellow et al. (2014); Omidshafiei et al. (2017); Chivukula and Liu (2017). In such games, one player can be seen as minimizing a cost $f$ with respect to their decision variable and the other as minimizing $-f$ with respect to theirs. The following proposition shows that all differential Nash equilibria in two-player zero-sum games are locally asymptotically stable equilibria under the flow of $\dot{x} = -\omega(x)$.
For an arbitrary two-player zero-sum game, if $x^*$ is a differential Nash equilibrium, then $x^*$ is both a nondegenerate differential Nash equilibrium and a locally asymptotically stable equilibrium of $\dot{x} = -\omega(x)$.
This result guarantees that the differential Nash equilibria of zero-sum games are isolated and exponentially attracting under the flow of $\dot{x} = -\omega(x)$. This in turn guarantees that simultaneous gradient-play has a local linear rate of convergence to all local Nash equilibria in all zero-sum continuous games. Thus, the answer to Q2 in the context of zero-sum games is "yes", since all Nash equilibria are attracting for the gradient dynamics.
The converse of the preceding proposition, however, is not true: not every locally asymptotically stable equilibrium in a two-player zero-sum game is a nondegenerate differential Nash equilibrium. Indeed, there may be many locally asymptotically stable equilibria in a zero-sum game that are not local Nash equilibria. The following proposition highlights this fact.

Proposition: In the class of zero-sum continuous games, there exists a continuum of games whose gradient dynamics admit locally asymptotically stable equilibria that are not differential Nash equilibria.

Proof: Consider a two-player zero-sum game on $\mathbb{R}^2$ with costs $f$ and $-f$. The Jacobian of $\omega$ is given by
If the parameters of the cost are chosen appropriately, then the Jacobian has eigenvalues with strictly positive real part, but the unique stationary point is not a differential Nash equilibrium, since the second-order condition fails for one of the players; in fact, it is not even a local Nash equilibrium. Thus, there exists a continuum of zero-sum games with a large set of locally asymptotically stable equilibria of the corresponding dynamics that are not differential Nash.
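A hypothetical zero-sum instance (our constants, not the paper's) can be simulated directly: gradient play converges to the origin even though the second-order condition fails for player 2.

```python
import numpy as np

# Hypothetical zero-sum game: f(x, y) = 1.5*x**2 + 2*x*y + 0.5*y**2, with
# player 1 minimizing f over x and player 2 minimizing -f over y. Then
# omega = (D1 f, D2(-f)) = (3x + 2y, -2x - y), and J = [[3, 2], [-2, -1]]
# has a double eigenvalue at 1, so the origin attracts gradient play. But
# D2^2(-f) = -1 < 0, so the second-order Nash condition fails for player 2.
def omega(z):
    x, y = z
    return np.array([3 * x + 2 * y, -2 * x - y])

z = np.array([1.0, -1.0])
for _ in range(4000):
    z = z - 0.05 * omega(z)
print(np.round(z, 6))  # converges to the origin

g2 = lambda x, y: -(1.5 * x**2 + 2 * x * y + 0.5 * y**2)  # player 2's cost -f
print(g2(0.0, 0.0) > g2(0.0, 1.0))  # True: y = 1 lowers player 2's cost at x = 0
```

As in the general-sum case, the attractor is an artifact of the algorithm: the maximizing player is pinned at a point from which any unilateral move would improve their payoff.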
The preceding proposition again shows that there exist non-Nash equilibria of the gradient dynamics in zero-sum continuous games. Thus, it also provides a negative answer to Q1 in the context of zero-sum games.
3.3 Potential Games
One last set of games with interesting connections between the Nash equilibria and the critical points of the gradient dynamics is the class known as potential games. This particularly nice class of games is one for which $\dot{x} = -\omega(x)$ corresponds to a gradient flow under a coordinate transformation; that is, there exists a function $\phi$ (commonly referred to as the potential function) such that $D_i \phi(x) = D_i f_i(x)$ for each $i$. We remark that, due to this equivalence, this class of games is sometimes referred to as the class of exact potential games. Note that a necessary and sufficient condition for a game to be a potential game is that $J(x)$ is symmetric Monderer and Shapley (1996); that is, the players' cross-derivatives satisfy $D_{ij} f_i(x) = \big(D_{ji} f_j(x)\big)^\top$. This gives potential games the desirable property that the only locally asymptotically stable equilibria of the gradient dynamics are local Nash equilibria.
For an arbitrary potential game, if $x^*$ is a locally asymptotically stable equilibrium of $\dot{x} = -\omega(x)$, then $x^*$ is a nondegenerate differential Nash equilibrium.
The full proof of Proposition 3.3 is supplied in Appendix A. The preceding proposition rules out non-Nash locally asymptotically stable equilibria of the gradient dynamics in potential games, and implies that every local minimum of a potential game must be a local Nash equilibrium. Thus, in potential games, unlike in general-sum and zero-sum games, the answer to Q1 is positive. However, the following proposition shows that the existence of a potential function is not enough to rule out local Nash equilibria that are saddle points of the dynamics.

Proposition: In the class of continuous games, there exists a continuum of potential games that admit Nash equilibria that are strict saddle points of the dynamics $\dot{x} = -\omega(x)$.

Proof: Consider a two-player game on $\mathbb{R}^2$ admitting a potential function. The Jacobian of $\omega$ is given by
If the cost parameters satisfy the second-order Nash conditions at the stationary point, then the stationary point is a local Nash equilibrium. However, for an appropriate range of the parameters, the Jacobian has one positive and one negative eigenvalue, and the point is a strict saddle point of the gradient dynamics. Thus, there exists a continuum of potential games in which a large set of differential Nash equilibria are strict saddle points of $\dot{x} = -\omega(x)$.
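The identical-interest game below is a hypothetical instance of this situation (our constants): the origin satisfies the differential Nash conditions, yet it is a strict saddle of the gradient dynamics, and a small generic perturbation escapes it.

```python
import numpy as np

# Hypothetical identical-interest potential game: both players share
# phi(x, y) = 0.5*x**2 + 0.5*y**2 - 2*x*y, so omega = (x - 2y, y - 2x) and
# J is the symmetric Hessian below. The diagonal entries are positive, so
# the origin is a differential Nash equilibrium, yet J has eigenvalues
# 3 and -1: the origin is a strict saddle of the gradient dynamics.
J = np.array([[1.0, -2.0],
              [-2.0, 1.0]])
eigs = np.sort(np.linalg.eigvals(J).real)
print(eigs)  # [-1.  3.]
print(J[0, 0] > 0 and J[1, 1] > 0)  # True: second-order Nash conditions hold

z = np.array([1e-3, 0.0])  # a generic small perturbation off the stable manifold
for _ in range(200):
    z = z - 0.05 * (J @ z)
print(np.linalg.norm(z) > 1.0)  # True: gradient play leaves the neighborhood
```

Intuitively, the shared potential has a saddle at the origin whose diagonal curvatures are both positive; each player individually sees a local minimum, but the joint dynamics slide off along the indefinite direction.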
Proposition 3.3 demonstrates a surprising fact about potential games. Even though all minimizers of the potential function must be local Nash equilibria, not all local Nash equilibria are minimizers of the potential function.
3.4 Main Takeaways
The main takeaways of this section are summarized in Figure 1. We note that for zero-sum games, Proposition 3.2 shows that the nondegenerate differential Nash equilibria are strictly contained in the locally asymptotically stable equilibria of the gradient dynamics. Since the inclusion is strict, the answer to Q1 in such games is "no". For general-sum games, Proposition 3.1 allows us to conclude that there do exist attracting non-Nash equilibria. Thus, the answer to Q1 is also "no". In potential games, since every locally asymptotically stable equilibrium is a nondegenerate differential Nash equilibrium, the answer to Q1 is "yes".
In the following sections, we provide answers to Q2 by showing that all local Nash equilibria that are strict saddle points of the gradient dynamics are avoided almost surely by gradient-based algorithms in both the deterministic and stochastic settings. In particular, since such equilibria exist in potential and general-sum games, one cannot give a positive answer to Q2 in either of these classes of games.
4 Convergence of Gradient-Based Learning
In this section, we provide convergence and non-convergence results for gradient-based algorithms. We also include a high-level overview of well-known algorithms that fit into the class of learning algorithms we consider; more detail can be found in Appendix C.
4.1 Deterministic Setting
We first address convergence to equilibria in the deterministic setting in which agents have oracle access to their gradients at each time step. This includes the case where agents know their own cost functions and observe their own actions as well as their competitors’ actions—and hence, can compute the gradient of their cost with respect to their own choice variable.
Since we have assumed that each agent has their own learning rate (i.e., step-sizes $\gamma_i$), the joint dynamics of all the players are given by

(4) $\quad x^{k+1} = x^k - \Lambda\, \omega(x^k)$

where $\Lambda$ collects the agents' step-sizes. By a slight abuse of notation, $\Lambda\, \omega(x)$ is defined to be element-wise multiplication in which $\gamma_1$ is multiplied by the first $d_1$ components of $\omega(x)$, $\gamma_2$ is multiplied by the next $d_2$ components, and so on.
We remark that this update rule immediately distinguishes gradient-based learning in games from gradient descent. By definition, the dynamics of gradient descent in single-agent settings always correspond to gradient flows, i.e., the state evolves according to an ordinary differential equation of the form $\dot{x} = -\nabla V(x)$ for some function $V$. Outside of the class of exact potential games we defined in Section 3, the dynamics of players' actions in games are not afforded this luxury; indeed, $J(x)$ is not in general symmetric (symmetry being a necessary condition for a gradient flow). This makes the potential limiting behaviors of $\dot{x} = -\omega(x)$ highly nontrivial to characterize in general-sum games.
The structure present in a gradient flow implies strong properties on the limiting behaviors of the dynamics. In particular, it precludes the existence of limit cycles or periodic orbits (limiting behaviors of dynamical systems in which the state cycles infinitely through a set of states with a finite period) and chaos (an attribute of nonlinear dynamical systems whose behavior can vary extremely due to slight changes in initial position) Sastry (1999). We note that both of these behaviors can occur in the dynamics of gradient-based learning algorithms in games.
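The classic bilinear zero-sum game with cost $f(x, y) = xy$ gives a concrete periodic orbit: the sketch below integrates the continuous dynamics with a small Euler step and (approximately) returns to its starting point after one period.

```python
import numpy as np

# For f(x, y) = x*y with player 1 minimizing f and player 2 minimizing -f,
# omega = (y, -x) and J = [[0, 1], [-1, 0]] has purely imaginary eigenvalues.
# The continuous dynamics zdot = -omega(z) rotate around the origin forever
# instead of converging: a periodic orbit, impossible for a gradient flow.
def omega(z):
    x, y = z
    return np.array([y, -x])

dt = 1e-4
z = np.array([1.0, 0.0])
for _ in range(int(round(2 * np.pi / dt))):  # integrate one full period
    z = z - dt * omega(z)

print(np.round(z, 2))  # back near the starting point (1, 0)
print(abs(np.linalg.norm(z) - 1.0) < 0.01)  # True: the radius is (nearly) conserved
```

With a larger step the explicit Euler scheme visibly spirals outward, which is one way simultaneous gradient descent-ascent can fail to converge in practice even when the continuous-time flow merely cycles.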
Despite the wide breadth of behaviors that gradient dynamics can exhibit in competitive settings, we can still make statements about convergence (and non-convergence) to certain types of equilibria. To do so, we first make the following standard assumptions on the smoothness of the cost functions and the magnitude of the agents' learning rates.
Assumption 1
For each $i$, the cost $f_i$ is at least twice continuously differentiable, $\omega$ is $L$-Lipschitz, and $\gamma_i < 1/L$, where $\|\cdot\|$ is the induced norm.
Given these assumptions, the following result rules out converging to strict saddle points.
Let the cost functions and step-sizes satisfy Assumption 1, and suppose that the joint strategy space is open and convex. Then the set of initial conditions from which competitive gradient-based learning converges to strict saddle points is of measure zero.
We remark that the above theorem holds, in particular, when the joint strategy space is all of Euclidean space, since openness and convexity hold trivially in this case. It is also important to note that, as we point out in Section 3, local Nash equilibria can be strict saddle points. Thus, all local Nash equilibria that are strict saddle points of $\dot{x} = -\omega(x)$ are avoided almost surely by gradient-play, even with oracle gradient access and random initializations. This holds even when players randomly initialize uniformly in an arbitrarily small ball around such Nash equilibria. In Section 5, we show that many linear quadratic dynamic games have a strict saddle point as their global Nash equilibrium. For brevity, we provide the proof of Theorem 4.1 in Appendix A, and give a proof sketch below.
Proof sketch of Theorem 4.1: The core of the proof is the celebrated stable manifold theorem from dynamical systems theory, presented in Theorem A.2. We construct the set of initial positions from which gradient-play will converge to strict saddle points and then use the stable manifold theorem to show that this set must have measure zero in the players' joint strategy space. Therefore, with a random initialization, players will never evolve solely on the stable manifold of a strict saddle and they will consequently diverge from such equilibria.
To be able to invoke the stable manifold theorem, we first show that the update mapping $x \mapsto x - \Lambda\, \omega(x)$ is a diffeomorphism, which is nontrivial due to the fact that we have allowed each agent to have their own learning rate and that $J(x)$ is not symmetric. We then iteratively construct the set of initializations that converge to strict saddle points under the game dynamics. By the stable manifold theorem, and the fact that the update mapping is a diffeomorphism, the stable manifold of a strict saddle point must be measure zero. Then, by induction, we show that the set of all initial points that converge to a strict saddle point must also be measure zero.
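The proof idea can be visualized with a hypothetical linear saddle of our own construction: the stable manifold is a measure-zero axis, so random initializations in a tiny ball around the saddle escape.

```python
import numpy as np

# Hypothetical linear dynamics with J = diag(2, -1): under the update
# z_{k+1} = z_k - gamma*J z_k, the x-axis is the (measure-zero) stable
# manifold and the y-direction expands by a factor 1.1 per step. Random
# initializations have a nonzero y-component almost surely, so they escape.
rng = np.random.default_rng(0)
J = np.diag([2.0, -1.0])
gamma = 0.1
trials, escaped = 100, 0
for _ in range(trials):
    z = rng.uniform(-1e-6, 1e-6, size=2)  # tiny random ball around the saddle
    for _ in range(500):
        z = z - gamma * (J @ z)
    if np.linalg.norm(z) > 1.0:
        escaped += 1
print(escaped, "/", trials)  # 100 / 100: every random initialization escapes
```

Initializing exactly on the x-axis would converge to the saddle, but that event has probability zero under any absolutely continuous initialization, mirroring the measure-zero statement of Theorem 4.1.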
In potential games we can strengthen the above nonconvergence result and give convergence guarantees.
Corollary 4.1 Consider a potential game on open, convex and where each for . Let be a prior measure with support which is absolutely continuous with respect to the Lebesgue measure, and assume exists. Then, under Assumption 1, competitive gradientbased learning converges to nondegenerate differential Nash equilibria almost surely. Moreover, the nondegenerate differential Nash equilibrium to which it converges is generically a local Nash equilibrium.
Corollary 4.1 guarantees that in potential games, gradientplay will converge to a differential Nash equilibrium. Combining this with Theorem 4.1 guarantees that the differential Nash equilibrium it converges to is a local minimizer of the potential function. A simple implication of this result is that gradientbased learning in potential games cannot exhibit limit cycles or chaos.
Of note is the fact that the agents do not need to be performing gradientbased learning on to converge to Nash almost surely. That is, they do not need to know the function ; they simply need to follow the derivative of their own cost with respect to their own choice variable, and they are guaranteed to converge to a local Nash equilibrium that is a local minimizer of the potential function.
We note that convergence to Nash equilibria is a known characteristic of gradientplay in potential games. However, our analysis also highlights that gradientplay will avoid a subset of the Nash equilibria of the game. This is surprising given the particularly strong structural properties of such games. The proof for Corollary 4.1 is provided in Appendix A and follows from Proposition 3.3, Theorem 4.1, and the fact that is symmetric in potential games.
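As a minimal illustration of this convergence behavior, consider a hypothetical exact potential game of our own construction: with f1(x, y) = x^2 + xy and f2(x, y) = y^2 + xy, the function phi(x, y) = x^2 + y^2 + xy satisfies D1 f1 = D1 phi and D2 f2 = D2 phi, yet neither player ever evaluates phi.

```python
import numpy as np

# Hypothetical exact potential game: f1(x, y) = x^2 + xy, f2(x, y) = y^2 + xy,
# with potential phi(x, y) = x^2 + y^2 + xy. Each player follows only the
# partial derivative of its own cost in its own variable.
gamma, x, y = 0.1, 4.0, -3.0
for _ in range(500):
    x, y = x - gamma * (2 * x + y), y - gamma * (2 * y + x)
print(abs(x) < 1e-6, abs(y) < 1e-6)  # converges to (0, 0), the minimizer of phi
```

The game Jacobian here is symmetric and positive definite, so gradientplay contracts to the unique local (and global) minimizer of the potential, consistent with the corollary.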
4.1.1 Implications and Interpretation of Convergence Analysis
Both Theorem 4.1 and Corollary 4.1 show that gradientplay in multiagent settings avoids strict saddles almost surely, even in the deterministic setting. Combined with the analysis in Section 3, which shows that (local) Nash equilibria can be strict saddles of the dynamics for generalsum games, this implies that a subset of the Nash equilibria are almost surely avoided by individual gradientplay, a potentially undesirable outcome in view of Q1 (whether all Nash equilibria are attracting for the learning dynamics). In Section 5, we show that the global Nash equilibrium is a saddle point of the gradient dynamics in a large number of randomly sampled LQ dynamic games. This suggests that policy gradient algorithms may fail to converge in such games, which is highly undesirable. This is in stark contrast to the single agent setting, where policy gradient has been shown to converge to the unique solution of LQR problems Fazel et al. (2018).
In Section 3, we also showed that local Nash equilibria of potential games can be strict saddle points of the potential function. Nonconvergence to such points in potential games is not necessarily a bad result, since this in turn implies convergence to a local minimizer of the potential function (as shown in Lee et al. (2016); Panageas and Piliouras (2016)), which is guaranteed to be a local Nash equilibrium of the game. However, these results do imply that one cannot answer “yes” to Q1 in potential games, since some of the Nash equilibria are not attracting under gradientplay.
In zerosum games, where local Nash equilibria cannot be strict saddle points of the gradient dynamics, our result suggests that eventually gradientbased learning algorithms will escape saddle points of the dynamics.
The almost sure avoidance of all equilibria that are saddle points of the dynamics further implies that if (3) converges to a critical point , then —i.e., is locally asymptotically stable for . This may not be a desired property, however, since we showed in Section 3 that zerosum and generalsum games both admit nonNash LASE.
Since gradientplay in games generally does not result in a gradient flow, other types of limiting behaviors such as limit cycles can occur in gradientbased learning dynamics. Theorem 4.1 says nothing about convergence to other limiting behaviors. In the following sections we prove that the results described in this section extend to the stochastic gradient setting. We also formally define periodic orbits in the context of dynamical systems and state stronger results on avoidance of some more complex limiting behaviors like linearly unstable limit cycles.
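The following toy computation (a standard bilinear zerosum example, not one defined in this paper) illustrates why the dynamics are not gradient flows: simultaneous gradient descentascent on f(x, y) = xy orbits the origin in continuous time, and its Euler discretization spirals outward rather than converging.

```python
import numpy as np

# Zero-sum bilinear game f(x, y) = x*y (a standard illustrative example):
# player 1 descends in x, player 2 ascends in y. The continuous-time flow
# is a pure rotation around the origin; the forward-Euler discretization
# multiplies the radius by sqrt(1 + gamma^2) > 1 at every step.
gamma, z = 0.1, np.array([1.0, 0.0])
radii = []
for _ in range(200):
    x, y = z
    z = np.array([x - gamma * y, y + gamma * x])  # simultaneous update
    radii.append(np.linalg.norm(z))
print(radii[-1] > radii[0])  # True: the trajectory spirals outward
```

The game Jacobian here is antisymmetric, so its eigenvalues are purely imaginary; no symmetric (gradient-flow) Jacobian can produce such rotation, which is precisely the structural difference exploited throughout this section.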
4.2 Stochastic Setting
We now analyze the stochastic case in which agents are assumed to have an unbiased estimator for their gradient. The results in this section allow us to extend the results from the deterministic setting to a setting where each agent builds an estimate of the gradient of their loss at the current set of strategies from potentially noisy observations of the environment. Thus, we are able to analyze the limiting behavior of a class of commonly used machine learning algorithms for competitive, multiagent settings. In particular, we show that agents will almost surely not converge to strict saddle points. In Appendix B.1, we show that the gradient dynamics will actually avoid more general limiting behaviors called linearly unstable cycles which we define formally.
To perform our analysis, we make use of tools and ideas from the literature on stochastic approximation (see, e.g., Borkar (2008)). We note that the convergence of stochastic gradient schemes in the singleagent setting has been extensively studied Robbins (1971); Pemantle (1990); Bottou (2010); Mertikopoulos and Staudigl (2018). We extend this analysis to the behavior of stochastic gradient algorithms in games.
We assume that each agent updates their strategy using the update rule
(5) 
for some zeromean, finitevariance stochastic process . Before presenting the results for the stochastic case, let us comment on the different learning algorithms that fit into this framework.
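A minimal numerical sketch of an update of the form (5): the twoplayer quadratic game, noise level, and constants below are hypothetical, and the step sizes gamma_n = 1/(n+1) are chosen to satisfy the usual summability conditions (sum gamma_n diverges, sum gamma_n^2 converges).

```python
import numpy as np

# Sketch of update (5) with a hypothetical quadratic game whose individual
# gradients are (2x + y) and (2y + x), perturbed by zero-mean, finite-variance
# noise, and with Robbins-Monro step sizes gamma_n = 1/(n + 1).
rng = np.random.default_rng(1)
x, y = 2.0, -2.0
for n in range(20000):
    gamma = 1.0 / (n + 1)
    gx = (2 * x + y) + 0.1 * rng.standard_normal()  # unbiased gradient estimate
    gy = (2 * y + x) + 0.1 * rng.standard_normal()
    x, y = x - gamma * gx, y - gamma * gy
print(abs(x) < 0.1 and abs(y) < 0.1)  # iterates settle near the equilibrium
```

Because the noise is a martingale difference sequence and the step sizes are square-summable, the stochastic iterates track the deterministic gradient dynamics asymptotically, which is the mechanism the results below exploit.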
Examples of Stochastic GradientBased Learning
Class  Gradient Learning Rule 

GradientPlay  
GANs  
MA Policy Gradient  
Individual Qlearning  
MA Gradient Bandits  , 
MA Experts  , 
The stochastic gradientbased learning setting we study is general enough to include a variety of commonly used multiagent learning algorithms. The classes of algorithms we include are hardly an exhaustive list; indeed, many extensions and altogether different algorithms exist that can be considered members of this class. In Table 1, we provide the gradientbased update rule for six example classes of learning problems: (i) gradientplay in noncooperative continuous games, (ii) GANs, (iii) multiagent policy gradient, (iv) individual Qlearning, (v) multiagent gradient bandits, and (vi) multiagent experts. We provide a detailed analysis of these different algorithms, including the derivation of the gradientbased update rules along with some interesting numerical examples, in Appendix C. In each of these cases, one can view an agent employing the given algorithm as building an unbiased estimate of their gradient from their observation of the environment.
For example, in multiagent policy gradient (see, e.g., (Sutton and Barto, 2017, Chapter 13)), agents’ costs are defined as functions of a parameter vector that parameterizes their policies . The parameters are agent ’s choice variable. By following the gradient of their loss function, they aim to tune the parameters in order to converge to an optimal policy . Perhaps surprisingly, it is not necessary for agent to have access to or even in order to construct an unbiased estimate of the gradient of their loss with respect to their own choice variable, as long as they observe the sequence of actions, say , generated by all other agents. These actions are implicitly determined by the other agents’ policies . Hence, in this case, if agent observes , where are the reward, action, and state of agent , then this is enough to construct an unbiased estimate of their gradient. We provide further details on multiagent policy gradient in Appendix C.
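The unbiased-estimate construction can be sketched with a score-function (REINFORCE-style) estimator. Everything below is a hypothetical stand-in: agent 1 has a one-dimensional Gaussian policy with mean theta, the joint cost is c(a1, a2) = (a1 - a2)^2, and the other agent's action a2 is merely observed.

```python
import numpy as np

# Score-function gradient estimator for a hypothetical one-dimensional
# problem. Agent 1's policy is Gaussian with mean theta and std sigma;
# a2 is the observed action of the other agent (its policy is unknown).
rng = np.random.default_rng(2)
theta, sigma, a2 = 0.0, 1.0, 3.0
a1 = theta + sigma * rng.standard_normal(200000)   # sampled own actions
cost = (a1 - a2) ** 2
score = (a1 - theta) / sigma**2                    # d/dtheta log pi(a1)
est = np.mean(cost * score)                        # unbiased for d/dtheta E[cost]
# True gradient: d/dtheta E[(a1 - a2)^2] = 2 * (theta - a2) = -6 here.
print(est)
```

No knowledge of the mechanism generating a2 is needed: observing the realized actions suffices, which mirrors the observation model described above.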
Stochastic Gradient Results
Returning to the analysis of (5), we make the following standard assumptions on the noise processes Robbins (1971); Robbins and Siegmund (1985).
Assumption 2
The stochastic process satisfies the assumptions , and a.s., for , where is an increasing family of σ-fields (i.e., a filtration, or the history generated by the sequence of random variables) given by .
We also make assumptions on the players’ stepsizes. These are standard in the stochastic approximation literature and are needed to ensure that the noise processes are asymptotically controlled.
Assumption 3
For each , with , is –Lipschitz with , the stepsizes satisfy for all and and , and a.s.
Let and let denote the inner product. The following theorem extends the results of Theorem 4.1 to the stochastic gradient dynamics in games.
Theorem 4.2.2 Consider a game on . Suppose each agent adopts a stochastic gradient algorithm that satisfies Assumptions 2 and 3. Further, suppose that for each , there exists a constant such that for every unit vector . Then, competitive stochastic gradientbased learning converges to strict saddle points of the game on a set of measure zero.
The proof follows directly from showing that (5) satisfies Theorem A.3, provided the assumptions of the theorem hold. The assumption that rules out degenerate cases where the noise forces the stochastic dynamics onto the stable manifold of strict saddle points.
Theorem 4.2.2 implies that the dynamics of stochastic gradientbased learning defined in (5) have the same limiting properties as the deterministic dynamics visàvis saddle points. Thus, the implications described in Section 4.1.1 extend to the stochastic gradient setting. In particular, stochastic gradientbased algorithms will avoid a nonnegligible subset of the Nash equilibria in generalsum and potential games. Further, in zerosum and generalsum games, if the players do converge to a critical point, that point may be a nonNash equilibrium.
Further Convergence Results for Stochastic GradientPlay in Games
As we demonstrated in Section 4.1, outside of potential games, the dynamics of gradientbased learning algorithms in games are not gradient flows. As such, the players’ actions can converge to more complex sets than simple equilibria. A particularly prominent class of limiting behaviors for dynamical systems are known as limit cycles (see, e.g., Sastry (1999)). Limit cycles (or periodic orbits) are sets of states such that each state is visited at periodic intervals ad infinitum under the dynamics. Thus, if the gradientbased algorithms converge to a limit cycle they will cycle infinitely through the same sequence of actions. Like equilibria, limit cycles can be stable or unstable under the dynamics , meaning that the dynamics can either converge to or diverge from them depending on their initializations.
We remark that the existence of oscillatory behaviors and limit cycles has been observed in the dynamics of gradientbased learning in various settings, such as the training of generative adversarial networks Daskalakis et al. (2017) and multiplicative weights in finite action games Mertikopoulos et al. (2018). We simply emphasize that the existence of such limiting behaviors is due to the fact that the dynamics are no longer gradient flows. This fact also allows for other complex limiting behaviors like chaos.
In Appendix B.1, we formalize the notion of a limit cycle and its stability in the stochastic setting. Using these concepts, we then provide an analogous theorem to Theorem 4.2.2 which states that competitive stochastic gradientbased learning converges to linearly unstable limit cycles—a parallel notion to strict saddle points but pertaining to more general limit sets—on a set of measure zero, provided that analogous assumptions to those in the statement of Theorem 4.2.2 hold. Providing such guarantees requires a bit more mathematical formalism, and as such we leave the details of these results to Appendix B.
In pursuit of a more general class of games with desirable convergence properties, in Appendix B.2 we also introduce a generalization of potential games, namely MorseSmale games, for which the combined gradient dynamics correspond to a MorseSmale vector field Hirsch (1976); Palis and Smale (1970). In such games, players are guaranteed to converge only to (linearly stable) cycles or equilibria. Even then, however, players may still converge to nonNash equilibria and avoid a subset of the Nash equilibria.
5 Saddle Point LNE in LQ Dynamic Games
In this section, we present empirical results showing that a nonnegligible subset of twoplayer LQ games have local Nash equilibria that are strict saddle points of the gradient dynamics. LQ games serve as good benchmarks for analyzing the limiting behavior of gradientplay in a nontrivial setting since they are known to admit global Nash equilibria that can be found by solving a coupled set of Riccati equations Basar and Olsder (1998). LQ games can also be cast as multiagent reinforcement learning problems where each agent has a policy that is a linear function of the state and a quadratic reward function. Gradientplay in LQ games can therefore be seen as a form of policy gradient.
The empirical results we now present imply that, even in the relatively straightforward case of linear dynamics, linear feedback policies, and quadratic costs, policy gradient multiagent reinforcement learning would be unable to find the local Nash equilibrium in a nonnegligible subset of problems.
LQ game setup For simplicity, we consider twoplayer LQ games in . Consider a discrete time dynamical system defined by
(6) 
where is the state at time , and are the control inputs of players and , respectively, and , , and are the system matrices. We assume that player searches for a linear feedback policy of the form that minimizes their loss which is given by
where and are the cost matrices on the state and input, respectively. We note that the two players are coupled through the dynamics since is constrained to obey the update equation (6). The vector of player derivatives is given by where
Note that there is a slight abuse of notation here as we are treating as a matrix and as the vectorization of a matrix. The matrices and can be found by solving the Riccati equations
for a given . As shown in Basar and Olsder (1998), global Nash equilibria of LQ games can be found by solving coupled Riccati equations. Under the following assumption, this can be done using a method analogous to the Lyapunov iterations outlined in Li and Gajic (1995) for continuous time LQ games.
Assumption 4
Either or is stabilizabledetectable.
Generating LQ games with strict saddle point Nash equilibria Without loss of generality, we assume is stabilizabledetectable. Given that we have a method of finding the global Nash equilibrium of the LQ game, we now present our experimental setup.
We fix , , , and and parametrize , and by and respectively. The shared dynamics matrix has entries that are sampled from the uniform distribution supported on . For each value of the parameters , , and , we randomly sample different matrices. Then, for each LQ game defined in terms of each of the sets of parameters, we find the optimal feedback matrices using the method of Lyapunov iterations, and we numerically approximate using autodifferentiation tools and check its eigenvalues.
The exact values of the matrices are defined as follows: with each of the entries sampled from the uniform distribution on ,
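The eigenvalue check in this procedure can be sketched as follows. The finite-difference Jacobian is generic; the function g standing in for an LQ game's stacked player derivatives at its Nash equilibrium is a hypothetical toy, since constructing the true derivatives requires solving the coupled Riccati equations.

```python
import numpy as np

# Sketch of the strict-saddle check: given a joint critical point z* and
# the stacked vector of player derivatives g(z), approximate the game
# Jacobian by central finite differences and inspect its eigenvalues.
def game_jacobian(g, z, eps=1e-6):
    n = z.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (g(z + e) - g(z - e)) / (2 * eps)
    return J

def is_strict_saddle(g, z):
    eigs = np.linalg.eigvals(game_jacobian(g, z))
    # Some eigenvalue with negative real part, none on the imaginary axis.
    return bool(np.any(eigs.real < 0) and np.all(eigs.real != 0))

# Hypothetical stand-in for an LQ game's stacked player derivatives:
g = lambda z: np.array([z[0] + 3 * z[1], z[1] + 3 * z[0]])
print(is_strict_saddle(g, np.zeros(2)))  # True: eigenvalues are 4 and -2
```

In the experiments described above, autodifferentiation plays the role of the finite-difference approximation, but the eigenvalue test is the same.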
The results for various combinations of the parameters and are shown in Figure 2. For all of the different parameter configurations considered, we found that in anywhere from of the randomly sampled LQ games, there was a global Nash equilibrium that was a strict saddle point of the gradient dynamics. Of particular interest is the fact that for all values of and we tested, at least of the LQ games had a global Nash equilibrium with the strict saddle property. In the worst case, around of the LQ games for the given values of and admitted such Nash equilibria.
These empirical observations imply that multiagent policy gradient, even in the relatively straightforward setting of linear dynamics, linear policies, and quadratic costs, has no guarantees of convergence to the global Nash equilibria in a nonnegligible number of games. Further investigation is warranted to validate this fact theoretically. This in turn supports the idea that for more complicated cost functions, policy classes, and dynamics, local Nash equilibria with the strict saddle property are likely to be very common.
6 Discussion and Future Directions
In this paper we provided answers to the following two questions for classes of gradientbased learning algorithms:
 Q1.

Are all attractors of the learning algorithms employed by agents equilibria relevant to the underlying game?
 Q2.

Are all equilibria relevant to the game also attractors of the learning algorithms agents employ?
We answered these questions in generalsum, zerosum, and potential games without imposing structure on the game outside of regularity conditions on the cost functions, by exploiting the observation that gradientbased learning dynamics are not gradient flows. Our analysis was shown in Appendix C to apply to a number of commonly used methods in multiagent learning.
6.1 Links with Prior Work
As we noted, previous work on learning in games in both the game theory literature, and more recently from the machine learning community, has largely focused on Q1, though some recent work has analyzed Q2 in the setting of zerosum games.
In the seminal work by Rosen Rosen (1965), –player concave or monotone games are shown to either admit a unique Nash equilibrium or a continuum of Nash equilibria, all of which are attracting under gradientplay. The structure present in these games rules out the existence of nonNash equilibria.
Twoplayer, finiteaction bilinear games have also been extensively studied. In Singh et al. (2000), the authors investigate the convergence of the gradient dynamics in such games. Additionally, the dynamics of other (non gradientbased) algorithms like multiplicative weights have been studied in Hommes and Ochea (2012), among many others. In such settings, the structure guarantees that there exists a unique global Nash equilibrium and no other critical points of the gradient dynamics. As such, nonNash equilibria cannot exist.
In the study of learning dynamics in the class of zerosum games, it has been shown that cycles can be attractors of the dynamics (see, e.g., Mertikopoulos et al. (2018); Wesson and Rand (2016); Hommes and Ochea (2012)). Concurrently with our results, Daskalakis and Panageas (2018) also showed the existence of nonNash attracting equilibria in this setting.
In more general settings, there has been some analysis of the limiting behavior of gradientplay, though the focus has been, for the most part, on giving sufficient conditions under which Nash equilibria are attracting under gradientplay. For example, Ratliff et al. (2013, 2014, 2016) introduced the notion of a differential Nash equilibrium, which is characterized by first and second order conditions on the players’ individual cost functions and of which we made extensive use. Following this body of work, Mertikopoulos and Zhou (2019) also investigated the local convergence of gradientplay in continuous games. They showed that if a Nash equilibrium satisfies a property known as variational stability, the equilibrium is attracting under gradientplay. In twice continuously differentiable games, this condition coincides exactly with the definition of stable differential Nash equilibria. Though these works analyze a general class of games, the focus of the analysis is solely on the local characterization and computation (via gradientplay) of local Nash equilibria. As such, the issues of nonconvergence that we show in this paper were not discussed.
6.2 Open Questions
Our results suggest that gradientplay in multiagent settings has fundamental problems. Depending on the players’ costs, in general games and even potential games, which have a particularly nice structure, a subset of the Nash equilibria will be almost surely avoided by gradientbased learning when the agents randomly initialize their first action. In zerosum and generalsum games, even if the algorithms do converge, they may have converged to a point that has no game theoretic relevance, namely a nonNash locally asymptotically stable equilibrium.
Lastly, these results show that limit cycles persist even under a stochastic update scheme. This explains the empirical observations of limit cycles in gradient dynamics presented in Daskalakis et al. (2017); Leslie and Collins (2005); Hommes and Ochea (2012). It also implies that gradientbased learning in multiagent reinforcement learning, multiarmed bandits, generative adversarial networks, and online optimization all admit limit cycles under certain loss functions. Our empirical results show that these problems are not merely of theoretical interest, but also have great relevance in practice.
Which classes of games have all Nash equilibria attracting under gradientplay, and which classes preclude the existence of nonNash equilibria, is an open and particularly interesting question. Further, the question of whether gradientbased algorithms can be constructed for which only gametheoretically relevant equilibria are attracting is of particular importance as gradientbased learning is increasingly implemented in game theoretic settings. Indeed, more generally, as learning algorithms are increasingly deployed in markets and other competitive environments, understanding and dealing with such theoretical issues will become increasingly important.
Appendix A Proofs of the Main Results
This appendix contains the full proofs of the results in the paper.
a.1 Proofs on Links Between Dynamical Systems and Games
We begin with a proof of Proposition 3.1 that all differential Nash equilibria are either strict saddle points or asymptotically stable equilibria of the gradient dynamics. This relies mainly on the definitions of strict saddle points, locally asymptotically stable equilibria, and nondegenerate differential Nash equilibria and simple linear algebra.
[Proof of Proposition 3.1] Suppose that is a nondegenerate differential Nash equilibrium. We claim that . Since is a differential Nash equilibrium, for each ; these are the diagonal blocks of . Further implies that . Since , . Thus, it is not possible for all the eigenvalues to have negative real part. Since is nondegenerate, so that none of the eigenvalues can have zero real part. Hence, at least one eigenvalue has strictly positive real part.
To complete the proof, we show that the conditions for nondegenerate differential Nash equilibrium are not sufficient to guarantee that is locally asymptotically stable for the gradient dynamics—that is, not all eigenvalues of have strictly positive real part. We do this by constructing a class of games with the strict saddle point property. Consider a class of two player games on defined as follows:
In this game, the Jacobian of the gradient dynamics is given by
(7) 
with . If is a nondegenerate differential Nash equilibria, and which implies that . Choosing such that will guarantee that one of the eigenvalues of is negative and the other is positive, making a strict saddle point. This shows that nondegenerate differential Nash equilibria can be strict saddle points of the combined gradient dynamics.
Hence, for any game , a nondegenerate differential Nash equilibrium is either a locally asymptotically stable equilibrium or a strict saddle point, but it is neither strictly unstable nor strictly marginally stable (i.e., having all eigenvalues on the imaginary axis).
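A quick numerical instance of this construction (with an arbitrary parameter value of our own choosing) confirms that a matrix with positive diagonal entries, as required at a differential Nash equilibrium, can still have a negative eigenvalue.

```python
import numpy as np

# Hypothetical instance of the game Jacobian constructed in the proof:
# both diagonal entries are positive (each player's individual second-order
# condition holds), yet the coupling makes one eigenvalue negative, so the
# equilibrium is a strict saddle of the combined gradient dynamics.
J = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigs = np.linalg.eigvalsh(J)  # J is symmetric here, so eigenvalues are real
print(np.diag(J).min() > 0, eigs.min() < 0)  # True True: eigenvalues 3 and -1
```

This is the two-dimensional essence of the proof: positive diagonal blocks constrain the trace, not the full spectrum.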
The proof of Proposition 3.2, which claims that all differential Nash equilibria in zerosum games are locally asymptotically stable, again just relies on basic linear algebra and the definition of a differential Nash equilibrium.
[Proof of Proposition 3.2] Consider a two player game on with . For such a game,
Note that . Suppose that is a differential Nash equilibrium and let with and . Then, since and for , a differential Nash equilibrium. Since is arbitrary, this implies that is positive definite and hence, clearly nondegenerate. Thus, for twoplayer zerosum games, all differential Nash equilibria are both nondegenerate differential Nash equilibria and locally asymptotically stable equilibria of
The proof that all locally asymptotically stable equilibria in potential games are differential Nash equilibria relies on the symmetry of in potential games.
[Proof of Proposition 3.3] The proof follows from the definition of a potential game. Since is a potential game, it admits a potential function such that for all . This, in turn, implies that at a locally asymptotically stable equilibrium of , , where is the Hessian matrix of the function . Further must have strictly positive eigenvalues for to be a locally asymptotically stable equilibrium of . Since the Hessian matrix of a function must be symmetric, , must be positive definite, which through Sylvester’s criterion ensures that each of the diagonal blocks of is positive definite. Thus, we have that the existence of a potential function guarantees that the only locally asymptotically stable equilibria of , are differential Nash equilibria.
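The Sylvester's-criterion step in this proof can be checked numerically on a hypothetical symmetric positive definite Hessian of a potential function: positive definiteness of the full matrix forces every diagonal block (each player's individual Hessian) to be positive definite as well.

```python
import numpy as np

# Hypothetical symmetric positive definite Hessian of a potential function
# for a two-player game with a 2-dimensional and a 1-dimensional strategy.
H = np.array([[2.0, 0.5, 0.2],
              [0.5, 3.0, 0.1],
              [0.2, 0.1, 1.0]])
assert np.all(np.linalg.eigvalsh(H) > 0)   # H is positive definite
block1, block2 = H[:2, :2], H[2:, 2:]      # the players' diagonal blocks
print(np.all(np.linalg.eigvalsh(block1) > 0),
      np.all(np.linalg.eigvalsh(block2) > 0))  # True True
```

Every principal submatrix of a positive definite matrix is positive definite, which is exactly the property the proof invokes.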
a.2 Proofs for Deterministic Setting
We now present the proof of Theorem 4.1 and its corollaries. The proof relies on the celebrated stable manifold theorem (Shub, 1978, Theorem III.7), Smale (1967). Given a map , we use the notation to denote the –times composition of .
Theorem A.2 (Center and Stable Manifolds; (Shub, 1978, Theorem III.7), Smale (1967)) Let be a fixed point for the local diffeomorphism where is an open neighborhood of in and . Let be the invariant splitting of into generalized eigenspaces of corresponding to eigenvalues of absolute value less than one, equal to one, and greater than one. To the invariant subspace there is an associated local –invariant embedded disc called the local stable center manifold of dimension and ball around such that , and if for all , then .
Some parts of the proof follow similar arguments to the proofs of results in Lee et al. (2016); Panageas and Piliouras (2016), which apply to (singleagent) gradientbased optimization. Due to the different learning rates employed by the agents and the introduction of the differential game form , the proof differs.
[Proof of Theorem 4.1] The proof is composed of two parts: (a) the map is a diffeomorphism, and (b) application of the stable manifold theorem to conclude that the set of initial conditions is measure zero.
(a) is a diffeomorphism. We claim the mapping is a diffeomorphism. If we can show that is invertible and a local diffeomorphism, then the claim follows. Consider and suppose so that . The assumption implies that satisfies the Lipschitz condition on . Hence, . Let where —that is, is an diagonal matrix with repeated on the diagonal times. Then, since .
Now, observe that . If is invertible, then the implicit function theorem (Lee, 2012, Theorem C.40) implies that is a local diffeomorphism. Hence, it suffices to show that does not have an eigenvalue of . Indeed, letting be the spectral radius of a matrix , we know in general that for any square matrix and induced operator norm , so that . Of course, the spectral radius is the maximum absolute value of the eigenvalues, so the above implies that all eigenvalues of have absolute value less than .
Since is injective by the preceding argument, its inverse is welldefined and since is a local diffeomorphism on , it follows that is smooth on . Thus, is a diffeomorphism.
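The spectral-radius argument in part (a) can be verified numerically; the matrix Dg standing in for the derivative of the game dynamics, and the Lipschitz bound, are hypothetical stand-ins.

```python
import numpy as np

# Illustration of the invertibility step: if gamma * L < 1, where L bounds
# the operator norm of Dg, then every eigenvalue of gamma * Dg has modulus
# below 1, so I - gamma * Dg cannot be singular.
Dg = np.array([[1.0, 3.0],
               [3.0, 1.0]])
L = np.linalg.norm(Dg, 2)          # operator 2-norm bounds the spectral radius
gamma = 0.9 / L                    # step size below the Lipschitz threshold
eigs = np.linalg.eigvals(gamma * Dg)
print(np.max(np.abs(eigs)) < 1.0)  # True: no eigenvalue equals 1
assert np.linalg.det(np.eye(2) - gamma * Dg) != 0
```

The same bound is what lets the implicit function theorem certify that the gradientplay map is a local diffeomorphism everywhere.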
(b) Application of the stable manifold theorem Consider all critical points of the game—i.e. . For each , let be the open ball derived from Theorem A.2 and let . Since , Lindelöf’s lemma Kelley (1955)—every open cover has a countable subcover—gives a countable subcover of . That is, for a countable set of critical points with , we have that .
Starting from some point , if gradientbased learning converges to a strict saddle point, then there exists a and index such that for all . Again, applying Theorem A.2 and using that —which we note is obviously true if —we get that .
Using the fact that