# Learning Sparse Polymatrix Games in Polynomial Time and Sample Complexity

Asish Ghoshal and Jean Honorio
Department of Computer Science
Purdue University
West Lafayette, IN 47906
{aghoshal, jhonorio}@purdue.edu
###### Abstract

We consider the problem of learning sparse polymatrix games from observations of strategic interactions. We show that a polynomial time method based on ℓ1,2-group regularized logistic regression recovers a game, whose Nash equilibria are the ε-Nash equilibria of the game from which the data was generated (true game), in O(m⁴d⁴ log(pd)) samples of strategy profiles — where m is the maximum number of pure strategies of a player, p is the number of players, and d is the maximum degree of the game graph. Under slightly more stringent separability conditions on the payoff matrices of the true game, we show that our method learns a game with the exact same Nash equilibria as the true game. We also show that Ω(d log(pm)) samples are necessary for any method to consistently recover a game, with the same Nash equilibria as the true game, from observations of strategic interactions. We verify our theoretical results through simulation experiments.

## 1 Introduction and Related Work

#### Motivation.

Many complex real-world data can be thought of as resulting from the behavior of a large number of self-interested agents trying to myopically or locally maximize some utility. Over the past several decades, non-cooperative game theory has emerged as a powerful mathematical framework for reasoning about such strategic interactions between self-interested agents. Traditionally, research in game theory has focused on computing the Nash equilibria (NE) (c.f. Blum et al. (2006) and Jiang and Leyton-Brown (2011)) — which characterize the stable outcome of the overall behavior of self-interested agents — correlated equilibria (c.f. (Kakade et al., 2003)), and other solution concepts given a description of the game. Computing the price of anarchy (PoA) for graphical games, which in a sense quantifies the inefficiency of equilibria, is also of tremendous interest (c.f. (Ben-Zwi and Ronen, 2011)). The aforementioned problems of computing the NE, correlated equilibria and PoA can be thought of as inference problems in graphical games, and require a description of the game, i.e., the payoffs of the players. In many real-world settings, however, only the behavior of the agents is observed, in which case inferring the latent payoffs of the players from observations of behavioral data becomes imperative. This problem of learning a game from observations of behavioral data, i.e., recovering the structure and parameters of the player payoffs such that the Nash equilibria of the learned game, in some sense, approximate the Nash equilibria of the true game, is the primary focus of this paper.

Recovering the underlying game from behavioral data is an important tool in exploratory research in political science and behavioral economics, and recent times have seen a surge of interest in such problems (c.f. (Irfan and Ortiz, 2014; Honorio and Ortiz, 2015; Ghoshal and Honorio, 2016; Garg and Jaakkola, 2016; Ghoshal and Honorio, 2017)). For instance, in political science, Irfan and Ortiz (2014) identified the most influential senators in the U.S. Congress — a small coalition of senators whose collective behavior forced every other senator to a unique choice of action — by learning a linear influence game from congressional voting records. Garg and Jaakkola (2016) showed that a tree-structured polymatrix game (Garg and Jaakkola (2016) call their game a potential game, even though its formulation is similar to ours) learned from U.S. Supreme Court data was able to recover the known ideologies of the justices. However, many open problems remain in this area of active research. One such problem is whether there exist efficient (polynomial time) methods for learning polymatrix games (Janovskaja, 1968) from noisy observations of strategic interactions. This is the focus of the current paper.

#### Related Work.

Various methods have been proposed for learning games from data. Honorio and Ortiz (2015) proposed a maximum-likelihood approach to learn “linear influence games” — a class of parametric graphical games with linear payoffs. However, in addition to being exponential time, the maximum-likelihood approach of Honorio and Ortiz (2015) also assumed a specific observation model for the strategy profiles. Ghoshal and Honorio (2016) proposed a polynomial time algorithm, based on ℓ1-regularized logistic regression, for learning linear influence games. They again assumed the specific observation model proposed by Honorio and Ortiz (2015) in which the strategy profiles (or joint actions) were drawn from a mixture of uniform distributions: one over the pure-strategy Nash equilibria (PSNE) set, and the other over the complement of the PSNE set. Ghoshal and Honorio (2017) obtained necessary and sufficient conditions for learning linear influence games under an arbitrary observation model. Finally, Garg and Jaakkola (2016) use a discriminative, max-margin based approach to learn tree-structured polymatrix games. However, their method is exponential time, and they show that learning polymatrix games is NP-hard under this max-margin setting, even when the class of graphs is restricted to trees. Furthermore, all the aforementioned works, with the exception of Garg and Jaakkola (2016), consider binary strategies only. In this paper, we propose a polynomial time algorithm for learning polymatrix games, which are non-parametric graphical games where the pairwise payoffs between players are characterized by matrices (or pairwise potential functions). In this setting, each player has a finite number of pure strategies.

#### Our Contributions.

We propose an ℓ1,2 group-regularized logistic regression method to learn polymatrix games — a class of games which has been considered by Garg and Jaakkola (2016) and is a generalization of the linear influence games considered by Ghoshal and Honorio (2017). We make no assumptions on the latent payoff functions and show that our polynomial time algorithm recovers an ε-Nash equilibrium of the true game (by the phrase “recovering the Nash equilibria” we mean that we learn a game with the same Nash equilibria as the true game; we use this phrase elsewhere in the paper for brevity), with high probability, if the number of samples is O(m⁴d⁴ log(pd)), where p is the number of players, d is the maximum degree of the game graph and m is the maximum number of pure strategies of a player. Under slightly more stringent separability conditions on the payoff functions of the underlying game, we show that our method recovers the Nash equilibria set exactly. We further generalize the observation model from Ghoshal and Honorio (2017) in the sense that we allow strategy profiles in the non-Nash-equilibria set to have zero measure. This should be compared with the results of Garg and Jaakkola (2016), who show that learning tree-structured polymatrix games is NP-hard under a max-margin setting. We also obtain necessary conditions on learning polymatrix games and show that Ω(d log(pm)) samples are required by any method for recovering the PSNE set of a polymatrix game from observations of strategy profiles.

Finally, we conclude this section by referring the reader to the work of Jalali et al. (2011), who analyze ℓ1,2-regularized logistic regression for learning undirected graphical models. However, our setting differs from that of learning discrete graphical models in many ways. First, unlike discrete graphical models, where the underlying distribution over the variables is described by a potential function that factorizes over the cliques of the graph, we make no assumptions whatsoever on the generative distribution of the data. Further, we are interested in recovering the PSNE set of a game, since the graph structure is generally unidentifiable from observational data, whereas Jalali et al. (2011) obtain guarantees on the graph structure of the discrete graphical model. As a result, our theoretical analysis and proofs differ significantly from those of Jalali et al. (2011).

## 2 Notation and Problem Formulation

In this section, we introduce our notation and formally define the problem of learning polymatrix games from behavioral data.

#### Polymatrix games.

A p-player polymatrix game is a graphical game where the nodes of the graph denote players and the edges correspond to two-player games. We will denote the graph by G = (V, E), where V = [p] is the vertex set and E is the set of directed edges. An edge (i, j) ∈ E denotes the directed edge i → j. Each player i has a set of pure-strategies or actions Ai, and the set of pure-strategy profiles or joint actions of all the players is denoted by A = A1 × … × Ap. We will denote mi = |Ai|. With each edge (i, j) ∈ E is associated a payoff matrix ui,j ∈ R^(mi×mj), such that ui,j(xi, xj) gives the finite payoff of the i-th player (with respect to the j-th player) when player i plays xi and player j plays xj. We assume that (i, j) ∈ E if and only if (j, i) ∈ E. Given a strategy profile x, the total payoff, or simply the payoff, of the i-th player is given by the following potential function:

 ui(xi,x−i)=ui,i(xi)+∑j∈N(i)ui,j(xi,xj), (1)

where N(i) is the set of neighbors of i in the graph G, and ui,i(xi) gives the (finite) individual payoff of player i for playing xi. We will denote the number of neighbors of player i by di = |N(i)|, and the maximum degree of the graph by d = maxi di. A polymatrix game G is then completely defined by a graph and a collection of potential functions {ui}i∈[p], where each of the payoff functions decomposes according to (1). Finally, we will also assume that the number of strategies of each player, mi, is non-zero and O(1) with respect to p and d, and we write m = maxi mi.
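The payoff decomposition in (1) is straightforward to compute directly from the individual and pairwise payoff tables. The following sketch illustrates this; the containers `u_self` (per-player individual payoffs), `u_pair` (per-edge payoff matrices) and `neighbors` (adjacency lists) are hypothetical names of ours, not notation from the paper:

```python
def player_payoff(i, x, u_self, u_pair, neighbors):
    """Total payoff of player i under joint action x, per Eq. (1):
    the individual payoff u_{i,i}(x_i) plus the pairwise payoffs
    u_{i,j}(x_i, x_j) summed over the neighbors j of player i."""
    total = u_self[i][x[i]]
    for j in neighbors[i]:
        total += u_pair[(i, j)][x[i]][x[j]]
    return total
```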

#### Nash equilibria of polymatrix games.

The pure-strategy Nash equilibria (PSNE) set for the game G is given by the set of strategy profiles where no player has any incentive to unilaterally deviate from its strategy given the strategies of its neighbors, and is defined as follows:

 NE(G)={x∈A ∣ xi∈argmaxa∈Ai ui(a,x−i), ∀i∈[p]}. (2)

The set of ε-Nash equilibria of the game G consists of those strategy profiles where each player can gain at most ε payoff by unilaterally deviating from its strategy, and is defined as follows:

 ε-NE(G)={x∈A ∣ ui(xi,x−i)≥ui(a,x−i)−ε, ∀a∈Ai and ∀i∈[p]}. (3)
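Definitions (2) and (3) can be checked by enumerating unilateral deviations. The sketch below is a brute-force illustration only — it is exponential in the number of players, so usable solely for tiny games; `payoff(i, x)` is a hypothetical callable returning ui(xi, x−i):

```python
from itertools import product

def is_epsilon_ne(x, payoff, strategies, eps=0.0):
    """ε-Nash condition (3): no player gains more than eps by a
    unilateral deviation. With eps=0 this is the PSNE condition (2)."""
    for i in range(len(x)):
        ui = payoff(i, x)
        for a in strategies[i]:
            if payoff(i, x[:i] + (a,) + x[i + 1:]) > ui + eps:
                return False
    return True

def psne_set(payoff, strategies, eps=0.0):
    """Enumerate the (ε-)PSNE set by brute force over all joint actions."""
    return [x for x in product(*strategies)
            if is_epsilon_ne(x, payoff, strategies, eps)]
```

Note that the learning method in this paper deliberately avoids any such enumeration, since even computing a single equilibrium is PPAD-complete.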

#### Observation model.

Without getting caught up in the dynamics of gameplay — something that is difficult to observe or reason about in real-world scenarios — we abstract the learning problem as follows. Assume that we are given a data set D = {x(1), …, x(n)} of n “noisy” observations of strategy profiles, or joint actions, drawn from a game G. The noise process models our uncertainty over the individual actions of the players due to observation noise, for instance, when we observe the actions through a noisy channel, or due to the unobserved dynamics of gameplay during which equilibrium is reached. By “observations drawn from a game” we simply mean that there exists a distribution P(·;G), from which the strategy profiles are drawn, satisfying the following condition:

 minx∈NE(G) P(x;G) > maxx∉NE(G) P(x;G).

The above condition ensures that the signal level is more than the noise level. This should be compared with the observation model of Ghoshal and Honorio (2017), who additionally assume that maxx∉NE(G) P(x;G) > 0. Our observation model thus encompasses specific observation models considered in prior literature (Honorio and Ortiz, 2015; Ghoshal and Honorio, 2016): the global and the local noise model. The global noise model is parameterized by a constant q such that the probability of observing a strategy profile x is given by a mixture of two uniform distributions:

 Pg(x;G)=q1[x∈NE(G)]|NE(G)|+(1−q)1[x∉NE(G)]|A|−|NE(G)|. (4)

In the local noise model, we observe strategy profiles from the PSNE set with each entry (strategy) corrupted independently. Therefore, in the local noise model we have the following distribution over strategy profiles:

 Pl(x;G)=1|NE(G)|×∑y∈NE(G)∏pi=1(qi)1[xi=yi]((1−qi)/(mi−1))1[xi≠yi], (5)

with qi > 1/mi for all i∈[p].
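For intuition, the global noise model (4) can be simulated directly: with probability q a sample is drawn uniformly from the PSNE set, and otherwise uniformly from its complement. A minimal sketch (the function name and argument layout are our assumptions, not the paper's):

```python
import random

def sample_global_noise(ne_set, joint_actions, q, n, rng=random.Random(0)):
    """Draw n strategy profiles from the global noise model (4):
    w.p. q a uniform element of the PSNE set, w.p. 1-q a uniform
    element of its complement."""
    ne = set(ne_set)
    complement = [x for x in joint_actions if x not in ne]
    return [rng.choice(ne_set) if rng.random() < q else rng.choice(complement)
            for _ in range(n)]
```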

In essence, we assume that we observe multiple “stable outcomes” of the game, which may or may not be in equilibrium. Treating the outcomes of the game as “samples” observed across multiple “plays” of the same game is a recurring theme in the literature for learning games (c.f. (Honorio and Ortiz, 2015), (Ghoshal and Honorio, 2016), (Ghoshal and Honorio, 2017), (Garg and Jaakkola, 2016)).

The learning problem then corresponds to recovering a game Ĝ from the data set D such that NE(Ĝ) = NE(G) with high probability. Given that computing a single Nash equilibrium is PPAD-complete (Daskalakis et al., 2009), any efficient learning algorithm must learn the game without explicitly computing or enumerating the Nash equilibria of the game. It has also been shown that even computing an ε-Nash equilibrium is hard under the exponential time hypothesis for PPAD (Rubinstein, 2016). We also emphasize that we do not observe any information about the latent player payoffs, and neither do we impose any restrictions on the payoffs for obtaining our ε-Nash equilibrium guarantees. Also, note that in our definition of the learning problem, we do not impose any restriction on the “closeness” of the recovered graph to the true graph. This is because multiple graphs can give rise to the same PSNE set under different payoff functions, and are thus unidentifiable from observations of joint actions alone (see Section 4.4.1 of Honorio and Ortiz (2015) for a counterexample).

## 3 Method

In this section, we describe our method for learning polymatrix games from observational data. The individual and pairwise payoffs can be equivalently written, in linear form, as follows:

 ui,i(xi) =(θi,0)Tfi,0(xi), ui,j(xi,xj) =(θi,j)Tfi,j(xi,xj),

where θi,0 ∈ Rmi is the vector of individual payoffs (ui,i(a))a∈Ai, fi,0(xi) ∈ {0,1}mi is the indicator vector of xi (a single “1” at the position of xi), θi,j ∈ Rmimj is the vectorized payoff matrix ui,j, and fi,j(xi,xj) ∈ {0,1}mimj is the indicator vector of the pair (xi,xj), i.e., fi,j(xi,xj) = fi,0(xi) ⊗ fj,0(xj). Note that each feature vector has exactly one nonzero entry. Let

 θi def=(θi,0,θi,1,…,θi,i−1,θi,i+1,…,θi,p), fi(xi,x−i) def=(fi,0(xi),fi,1(xi,x1),…,fi,i−1(xi,xi−1), fi,i+1(xi,xi+1),…,fi,p(xi,xp)), (6)

with θi,j = 0 for all j ∉ N(i). Thus the payoff for the i-th player can be written, in linear form, as:

 ui(xi,x−i)=(θi)Tfi(xi,x−i). (7)

The learning problem then corresponds to learning the parameters θi for each player i ∈ [p]. The group-sparsity pattern of θi identifies the neighbors of player i. The way this differs from the binary strategies considered by Ghoshal and Honorio (2017) is that the parameters have a group-sparsity structure, i.e., for all j ∉ N(i) the entire group of parameters θi,j is zero. In order to ensure that the payoffs are finite, we will assume that the parameters for the i-th player belong to a bounded set Θi.

Our approach for estimating the parameters is to perform one-versus-rest multinomial logistic regression with group-sparse regularization. In more detail, we obtain estimators ˆθi by solving the following optimization problem for each i ∈ [p]:

 ˆθi =argminθ∈Θi Li(D;θ)+λ∥θ∥1,2, (8) Li(D;θ) =(1/n)∑nl=1 ℓi(x(l);θ), (9) ℓi(x;θ) =−log(exp(θTfi(xi,x−i))/∑a∈Ai exp(θTfi(a,x−i))), (10)

where λ > 0 is a regularization parameter and ∥θ∥1,2 = ∑j ∥θj∥2, with θj being the j-th group of θ. When referring to a block of a matrix or vector we will use bold letters, e.g., θj denotes the j-th group or block of θ, while θj,k denotes the k-th element of the j-th group. In general, we define the group structured ℓa,b norm as follows: ∥θ∥a,b is the ℓa norm of the vector of groupwise ℓb norms (∥θ1∥b, ∥θ2∥b, …). Also, when using group structured norms, we will use the group structure as shown in (6), i.e., we will assume that, in the context of the i-th player, there are p groups of sizes mi, mim1, …, mimi−1, mimi+1, …, mimp. Finally, we will define the support set Si of θi as the set of all indices corresponding to the active groups, i.e., Si = {(j,k) ∣ ∥θi,j∥2 > 0}, where j can be thought of as indexing the groups, while k can be thought of as indexing the elements within the j-th group. Thus, the active groups of θi are precisely those indexed by {0} ∪ N(i).
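The group structured norms used above are simple to compute: take the ℓb norm within each group, then the ℓa norm across the groups. A small sketch (passing group boundaries as index arrays is our assumption):

```python
import numpy as np

def group_norm(theta, groups, a=1, b=2):
    """ℓ_{a,b} group norm: the ℓ_a norm of the vector of per-group ℓ_b
    norms. ∥·∥_{1,2} is the group-lasso penalty in (8), and ∥·∥_{∞,2}
    appears in the gradient bound of Section 4."""
    per_group = np.array([np.linalg.norm(theta[g], ord=b) for g in groups])
    return per_group.max() if a == np.inf else np.linalg.norm(per_group, ord=a)
```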

After estimating the parameters ˆθi for each i ∈ [p], the payoff functions are simply estimated to be ûi(xi,x−i) = (ˆθi)Tfi(xi,x−i). Finally, the graph Ĝ = (V, Ê) is given by the group-sparsity structure of the ˆθi's, i.e., Ê = {(i,j) ∣ ∥ˆθi,j∥2 > 0}.
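To make the estimator (8) concrete, the following sketch solves it for a single player by proximal gradient descent: a gradient step on the multinomial logistic loss (10), followed by groupwise soft-thresholding, which is the proximal operator of the ℓ1,2 penalty. The data layout (`features[l]` stacking fi(a, x−i(l)) for every action a, and `labels[l]` holding the observed action of player i in sample l) is our assumption, not a prescribed interface:

```python
import numpy as np

def soft_group_threshold(theta, groups, t):
    """Proximal operator of t*∥·∥_{1,2}: shrink each group toward zero,
    zeroing it out entirely when its ℓ2 norm falls below t."""
    out = theta.copy()
    for idx in groups:
        nrm = np.linalg.norm(theta[idx])
        out[idx] = 0.0 if nrm <= t else (1 - t / nrm) * theta[idx]
    return out

def fit_player(features, labels, groups, lam, step=0.1, iters=500):
    """Sketch of the one-vs-rest estimator (8) for a single player:
    minimize the loss (10) plus lam * group-lasso penalty."""
    dim = features[0][0].shape[0]
    theta = np.zeros(dim)
    n = len(labels)
    for _ in range(iters):
        grad = np.zeros(dim)
        for F, y in zip(features, labels):    # F: (num_actions, dim)
            scores = F @ theta
            prob = np.exp(scores - scores.max())
            prob /= prob.sum()
            grad += prob @ F - F[y]           # softmax gradient of (10)
        theta = soft_group_threshold(theta - step * grad / n, groups, step * lam)
    return theta
```

The groupwise thresholding is what produces exact zeros on entire blocks θi,j, and hence the estimated edge set Ê.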

## 4 Sufficient Conditions

First, we obtain sufficient conditions on the number of samples to ensure successful PSNE recovery. Since our theoretical results depend on certain properties of the Hessian of the loss function defined above, we introduce the Hessian matrix in this paragraph. Let Hi(x;θ) denote the Hessian of ℓi(x;θ). A little calculation shows that the (j,k)-th block of the Hessian matrix for the i-th player is given as:

 Hij,k(x;θ) =∑a∈Ai σi(a,x−i;θ)fi,j(a,xj)(fi,k(a,xk))T−(∑a∈Ai σi(a,x−i;θ)fi,j(a,xj))(∑a∈Ai σi(a,x−i;θ)fi,k(a,xk))T, (11) σi(x,x−i;θ) =exp(θTfi(x,x−i))/∑a∈Ai exp(θTfi(a,x−i)), (12)

where we have overloaded the notation fi,k to also include k = 0, i.e., we let fi,0(a,x0) = fi,0(a). We will denote the i-th expected Hessian matrix at any parameter θ as Hi(θ) = E[Hi(x;θ)], and the i-th Hessian matrix at the true parameter θi as Hi = Hi(θi). We will also drop the superscript i from the i-th Hessian matrix whenever clear from context. We will denote the finite sample version of Hi(θ) by ˆHi(θ), i.e., ˆHi(θ) = (1/n)∑nl=1 Hi(x(l);θ). Finally, we will denote the Hessian matrix restricted to the true support set Si by HSi(θ). In order to prove our main result, we will present a series of technical lemmas gradually building toward it. Detailed proofs of the lemmas are given in Appendix A.

The following lemma states that the i-th population Hessian is positive definite. Specifically, the i-th population Hessian evaluated at the true parameter θi is positive definite, with minimum eigenvalue λmin > 0. We prove the lemma by showing that the loss function given by (10), when restricted to an arbitrary line, is strongly convex as long as the payoffs are finite.

###### Lemma 1 (Minimum eigenvalue of population Hessian).

For all i ∈ [p], λmin(Hi(θi)) = λmin > 0.

Given that the population Hessian matrices are positive definite, we then show that the finite sample Hessian matrices, evaluated at any parameter θ ∈ Θi, are positive definite with high probability. We use tools from random matrix theory developed by Tropp (2012) to prove the following lemma.

###### Lemma 2 (Minimum eigenvalue of finite sample Hessian).

Let θ ∈ Θi be any arbitrary vector and let λmin = λmin(Hi(θ)). Then, if the number of samples satisfies the following condition:

 n≥(8(di+1)/λmin)log(mi(1+dim)/δ),

then λmin(ˆHi(θ)) ≥ λmin/2 with probability at least 1 − δ, for some δ ∈ (0,1).

Now that we have shown that the loss function given by (10) is strongly convex (Lemmas 1 and 2), we exploit strong convexity to control the difference between the true parameter θi and the estimator ˆθi. However, before proceeding further, we need to bound the ∥·∥∞,2 norm of the gradient, as done in the following lemma. We prove the lemma by using McDiarmid's inequality to show that, in each group, the finite sample gradient concentrates around the expected gradient, and then use a union bound over all the groups to control the ∥·∥∞,2 norm.

###### Lemma 3 (Gradient bound).

Let ν = ∥E[∇Li(D;θi)]∥∞,2; then we have that

 ∥∇Li(D;θi)∥∞,2≤ν+√((2/n)log(2(di+1)/δ)),

with probability at least 1 − δ.

Note that the expected gradient at the true parameter θi does not vanish, i.e., ν = ∥E[∇Li(D;θi)]∥∞,2 > 0 in general. This is because of the mismatch between the generating distribution and the softmax distribution used for learning the parameters, as in (10). Indeed, if the data were drawn from a Markov random field, which induces a softmax distribution on the conditional distribution of a node given the rest of the nodes, the parameter θi would satisfy E[∇Li(D;θi)] = 0. However, this is not the case for us. An unfortunate consequence of this is that, even with an infinite number of samples, our method will not be able to recover the parameters exactly. Thus, without additional assumptions on the payoffs, our method only recovers the ε-Nash equilibria of the game.

With the required technical results in place, we are now ready to bound ∥ˆθi − θi∥1,2. Our analysis has two steps. First, we bound the norm of the error in the true support set, i.e., ∥ˆθiSi − θiSi∥1,2. Then, we show that the norm of the difference between the true parameter and the estimator outside the support set is at most a constant factor (specifically 3) of the difference in the support set. For the first step we use a proof technique originally developed by Rothman et al. (2008) in a different context, while the second step follows from matrix algebra and optimality of the estimator ˆθi for the problem (8).

The following technical lemma, which will be used later on in our proof to bound ∥ˆθiSi − θiSi∥1,2, lower bounds the minimum eigenvalue of the i-th population Hessian at an arbitrary parameter θ in terms of the minimum eigenvalue of the i-th population Hessian at the true parameter θi.

###### Lemma 4 (Minimum population eigenvalue at arbitrary parameter).

Let θ ∈ Θi be any vector. Then the minimum eigenvalue of the i-th population Hessian matrix evaluated at θSi is lower bounded as follows:

 λmin(Hi(θSi)) ≥λmin(Hi(θiSi))−14(di+1)m2∥θSi−θiSi∥1,2.

Now, we are ready to bound the difference between the true parameter θi and its estimator ˆθi in the true support set Si.

###### Lemma 5 (Error of the i-th estimator on the support set).

If the regularization parameter and number of samples satisfy the following condition:

 λ ≥2(ν+√((2/n)log(2(di+1)/δ))), n >2N(m,di)log(2(di+1)/δ),

where N(m,di) is a quantity polynomial in m and di (defined in Appendix A), and Cmin denotes the minimum eigenvalue of the population Hessian restricted to the support set, λmin(HSi(θi)); then with probability at least 1 − δ, for some δ ∈ (0,1), we have:

 ∥ˆθiSi−θiSi∥1,2≤(6(di+1)/Cmin)λ. (13)

Next, we bound the overall difference between the true parameter θi and its estimator ˆθi.

###### Lemma 6 (Error of the i-th parameter estimator).

Under the same conditions on the regularization parameter and number of samples as in Lemma 5 we have, with probability at least 1 − δ for some δ ∈ (0,1),

 ∥ˆθi−θi∥1,2≤(24(di+1)/Cmin)λ.

Now that we have control over ∥ˆθi − θi∥1,2 for all i ∈ [p], we are ready to prove our main result concerning the sufficient number of samples needed by our method to guarantee PSNE recovery with high probability.

###### Theorem 1.

Let G, with PSNE set NE(G), be the true polymatrix game over p players and maximum degree d, from which the data set D of n strategy profiles is drawn. Let Ĝ, with PSNE set NE(Ĝ), be the game learned from the data set D by solving the optimization problem (8) for each i ∈ [p]. Then if the regularization parameter λ and the number of samples satisfy the condition:

 λ ≥2(ν+√((2/n)log(2p(d+1)/δ))), n >max{2N(m,d)log(2p(d+1)/δ),(8(d+1)/Cmin)log(m(1+dm)/δ)},

where N(m,d) and Cmin are as in Lemma 5, then we have that the following hold with probability at least 1 − δ, for some δ ∈ (0,1):

1. NE(Ĝ) ⊆ ε-NE(G) and NE(G) ⊆ ε-NE(Ĝ), with ε = (48(d+1)/Cmin)λ.

2. Additionally, if the true game satisfies the condition: ∀x ∈ NE(G), ∀i ∈ [p] and ∀x′i ∉ argmaxa∈Ai ui(a,x−i), it holds that ui(xi,x−i) > ui(x′i,x−i) + ε. Then, NE(Ĝ) = NE(G).

###### Proof.

Note that ∥fi(xi,x−i)∥∞,2 = 1 for any x ∈ A, since each binary vector fi,j(xi,xj) has a single “1” at exactly one location. Then, from the Cauchy-Schwartz inequality, Lemma 6, and a union bound over all players, we have that:

 (∀x∈A,∀i∈[p]) |ˆui(xi,x−i)−ui(xi,x−i)| =|(ˆθi−θi)Tfi(xi,x−i)| ≤∥ˆθi−θi∥1,2∥fi(xi,x−i)∥∞,2 =∥ˆθi−θi∥1,2≤(24(di+1)/Cmin)λ=ε/2, (14)

with probability at least 1 − δ. Now consider any x ∈ NE(Ĝ), any i ∈ [p], and any x′i ∈ Ai. Since xi ∈ argmaxa∈Ai ûi(a,x−i), we have from (14):

 ui(xi,x−i)+ε/2≥ˆui(xi,x−i)≥ˆui(x′i,x−i) ⟹ui(xi,x−i)≥ˆui(x′i,x−i)−ε/2 ⟹ui(xi,x−i)≥ui(x′i,x−i)−ε,

where the last line again follows from (14). This proves that NE(Ĝ) ⊆ ε-NE(G). Using exactly the same arguments as above, we can also show that for any x ∈ NE(G):

 ˆui(xi,x−i)≥ˆui(x′i,x−i)−ε (∀x′i∈Ai),

which proves that NE(G) ⊆ ε-NE(Ĝ). Thus we have that NE(Ĝ) ⊆ ε-NE(G) and NE(G) ⊆ ε-NE(Ĝ), i.e., the set of joint strategy profiles NE(Ĝ) forms an ε-Nash equilibrium set of the true game G. This proves our first claim. For our second claim, consider any x ∈ NE(G), i ∈ [p], and x′i ∉ argmaxa∈Ai ui(a,x−i). Then:

 ui(xi,x−i)>ui(x′i,x−i)+ε ⟹ˆui(xi,x−i)+ε/2>ˆui(x′i,x−i)−ε/2+ε ⟹ˆui(xi,x−i)>ˆui(x′i,x−i),

where the first line holds by assumption, and the second line again follows from (14). Thus we have that NE(G) ⊆ NE(Ĝ). By setting the probability of error to δ, for some δ ∈ (0,1), we prove our claim. The second part of the lower bound on the number of samples is due to Lemma 2. ∎

###### Remark 1.

The sufficient number of samples needed by our method to guarantee PSNE recovery, with probability at least 1 − δ, scales as O(m⁴d⁴ log(pd)). This should be compared with the results of Jalali et al. (2011) for learning undirected graphical models, who show that a number of samples polynomial in m and d, and logarithmic in p, is sufficient for learning m-ary discrete graphical models. However, their sample complexity hides a constant that is related to the maximum eigenvalue of the scatter matrix, which we have upper bounded in our case, leading to a slightly higher sample complexity.

###### Remark 2.

Note that as n → ∞, the regularization parameter λ → 2ν, where ν is the maximum ∥·∥∞,2 norm of the expected gradient at the true parameter across all i ∈ [p]. Thus, even with an infinite number of samples, our method recovers the ε-Nash equilibria set of the true game with ε → (96(d+1)/Cmin)ν as n → ∞.

## 5 Necessary Conditions

In this section, we obtain an information-theoretic lower bound on the number of samples needed to learn sparse polymatrix games. Let Gp,d,m be the set of polymatrix games over p players, with degree at most d, and maximum number of strategies per player being m. Our approach is to treat the inference procedure as a communication channel, where nature picks a game G* from the set Gp,d,m and then generates a data set D of n strategy profiles. A decoder ψ then maps D to a game Ĝ ∈ Gp,d,m. We wish to obtain lower bounds on the number of samples required by any decoder to recover the true game consistently. In this setting, we define the minimax estimation error as follows:

 perr=minψ supG∗∈Gp,d,m Pr{NE(ψ(D))≠NE(G∗)},

where the probability is computed over the data distribution. For obtaining necessary conditions on the sample complexity, we assume that the data distribution follows the global noise model described in (4). The following theorem prescribes the number of samples needed for learning sparse polymatrix games. Our proof of the theorem proceeds by constructing restricted ensembles of “hard-to-learn” polymatrix games, from which nature picks a game uniformly at random and generates data. We then use Fano's inequality to lower bound the minimax error. The use of restricted ensembles is customary for obtaining information-theoretic lower bounds, c.f. (Santhanam and Wainwright, 2012; Wang et al., 2010).

###### Theorem 2.

If the number of samples n ≤ log((mᵈ−m)(p choose d))/(2 log 2) − 1, then estimation fails with perr ≥ 1/2.

###### Proof.

Consider the following restricted ensemble G̃ ⊆ Gp,d,m of p-player polymatrix games with degree d, and the set of pure-strategies of each player being [m]. Each game in G̃ is characterized by a set I of influential players and a set Ic = [p]∖I of non-influential players, with |I| = d. The graph is a complete (directed) bipartite graph from the set I to Ic. After picking the graph structure, nature fixes the strategies of the influential players to some a ∈ [m]ᵈ. Finally, the payoff matrices are chosen as follows:

 ui,i(xi) =1[xi=ai] (∀i∈I) uj,j(xj) =1/(2xj) (∀j∈Ic) uj,i(xj,xi) =1[xj=xi] (∀i∈I∧j∈Ic).

Therefore, each game has exactly one Nash equilibrium, where the influential players play a (decided by nature) and the non-influential players play maj(a) — where maj(a) returns the majority strategy among a, and in case of a tie between two or more strategies it returns the numerically lowest strategy (recall that the pure-strategy set for each player is [m]). Thus we have that |NE(G)| = 1 for every G ∈ G̃. Nature picks a game uniformly at random from G̃ by randomly selecting a set of d players as “influential”, and then selecting a strategy profile uniformly at random for the influential players and setting the payoff matrices as described earlier. Nature then generates a data set D using the global noise model with parameter q. Then from Fano's inequality we have that:

 perr≥1−(I(D;G)+log2)/log|G̃|, (15)

where I(·;·) and H(·) denote mutual information and entropy respectively. The mutual information I(D;G) can be bounded, using a result by Yu (1997), as follows:

 I(D;G)≤(1/|G̃|²)∑G1∈G̃∑G2∈G̃ KL(PD|G=G1∥PD|G=G2), (16)

where PD|G=G1 (respectively PD|G=G2) denotes the data distribution under G1 (respectively G2). The KL divergence term in (16) can be bounded as follows:

 KL(PD|G=G1∥∥PD|G=G2) =n∑x∈A PD|G=G1 log(PD|G=G1/PD|G=G2) =n{∑x∈NE(G1) q log(q(mᵖ−1)/(1−q)) + ∑x∈NE(G2) ((1−q)/(mᵖ−1)) log((1−q)/(q(mᵖ−1)))} =n(q−(1−q)/(mᵖ−1)) log(q(mᵖ−1)/(1−q)) ≤nlog(q(mᵖ−1)/(1−q))≤nlog2, (17)

where the first line follows from the fact that the samples are i.i.d., the second line follows from the fact that the distributions PD|G=G1 and PD|G=G2 assign the same probability to all strategy profiles outside NE(G1) ∪ NE(G2), and the last line follows from the fact that q is chosen so that q(mᵖ−1)/(1−q) ≤ 2. Putting together (15), (16) and (17), we have that if

 n≤log((mᵈ−m)(p choose d))/(2log2)−1,

then perr ≥ 1/2. Since learning a game in Gp,d,m is at least as hard as learning a game in the restricted ensemble G̃ ⊆ Gp,d,m, our claim follows. ∎
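The tie-breaking majority rule maj(·) used in the construction above can be sketched as:

```python
from collections import Counter

def maj(a):
    """Majority strategy among the influential players' strategies a;
    ties are broken toward the numerically lowest strategy."""
    counts = Counter(a)
    top = max(counts.values())
    return min(s for s, c in counts.items() if c == top)
```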

###### Remark 3.

From the above theorem we have that the number of samples needed by any conceivable method, to recover the PSNE set consistently, is Ω(d log(pm)), assuming that d = O(p^(1−c)) for some constant c > 0. Therefore, the method based on ℓ1,2-regularized logistic regression is information-theoretically optimal in the number of players, for learning sparse polymatrix games.

## 6 Experiments

In order to validate our theoretical results, we performed various synthetic experiments by sampling a random polymatrix game, generating data from the sampled game, and then using our method to learn the game from the sampled data. We estimated the probability that our method learns the “correct” game, i.e., a game with the same PSNE set as the true game, across 40 randomly sampled games for each value of the number of samples n and the number of players p. The results are shown in Figure 1. We observe that the scaling of the sample complexity prescribed by Theorem 1 indeed holds in practice. The results show a phase transition behavior: if the number of samples is below a constant multiple of the sample complexity prescribed by Theorem 1, then PSNE recovery fails with high probability, while if the number of samples is above a (larger) constant multiple of that sample complexity, then PSNE recovery succeeds with high probability. More details about our synthetic experiments can be found in Appendix B.

We also evaluated our algorithm on real-world data sets containing (i) U.S. supreme court rulings, (ii) U.S. congressional voting records, and (iii) U.N. General Assembly roll-call votes.

Our algorithm recovers connected components corresponding to liberal and conservative blocs of justices within the Supreme Court of the U.S. The Nash equilibria consist of strategy profiles where all justices vote unanimously, as well as strategy profiles where the conservative and liberal blocs vote unanimously but in opposition to each other.

The game graph recovered from congressional voting records groups Democrats and Republicans into separate components. Moreover, we observed that the connected components group senators belonging to the same state or geographic region together. The recovered PSNE set sheds light on the voting patterns of senators — senators belonging to the same connected component vote (almost) identically on bills.

Finally, on the U.N. voting data set our method recovered connected components comprising Arab League countries and Southeast Asian countries respectively. As was the case with the aforementioned data sets, the PSNE set grouped countries that vote almost identically on U.N. resolutions.

We were also able to compute the price of anarchy (PoA) for each data set, which quantifies the degradation of performance caused by selfish behavior of non-cooperative agents. For the two Supreme Court voting data sets, the PoA was 1.9 and 1.6 respectively. For the congressional voting data set the PoA was 2.6, while for the United Nations voting data set the PoA was 3.0. More details and results from our real-world experiments can be found in Appendix C.

#### Concluding Remarks.

We conclude this exposition with a discussion of potential avenues for future work. In this paper we considered the problem of learning a very general, and widely used, class of graphical games called polymatrix games, involving players with a finite number of pure strategies each. One can also consider mixed strategies, which would entail learning distributions, instead of “sets”, under the framework of non-cooperative maximization of utility. Further, one can also consider other solution concepts, like correlated equilibria.

## References

• Ben-Zwi and Ronen (2011) Ben-Zwi, O. and Ronen, A. (2011). Local and global price of anarchy of graphical games. Theoretical Computer Science, 412(12-14):1196–1207.
• Blum et al. (2006) Blum, B., Shelton, C. R., and Koller, D. (2006). A continuation method for nash equilibria in structured games. Journal of Artificial Intelligence Research, 25:457–502.
• Boyd and Vandenberghe (2004) Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge university press.
• Daskalakis et al. (2009) Daskalakis, C., Goldberg, P. W., and Papadimitriou, C. H. (2009). The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259.
• Garg and Jaakkola (2016) Garg, V. and Jaakkola, T. (2016). Learning Tree Structured Potential Games. In Advances in Neural Information Processing Systems 29, pages 1552–1560.
• Ghoshal and Honorio (2016) Ghoshal, A. and Honorio, J. (2016). From behavior to sparse graphical games: Efficient recovery of equilibria. In 54th Annual Allerton Conference on Communication, Control, and Computing, pages 1220–1227.
• Ghoshal and Honorio (2017) Ghoshal, A. and Honorio, J. (2017). Learning graphical games from behavioral data: Sufficient and necessary conditions. In Artificial Intelligence and Statistics, pages 1532–1540.
• Honorio and Ortiz (2015) Honorio, J. and Ortiz, L. (2015). Learning the structure and parameters of large-population graphical games from behavioral data. Journal of Machine Learning Research, 16:1157–1210.
• Irfan and Ortiz (2014) Irfan, M. T. and Ortiz, L. E. (2014). On influence, stable behavior, and the most influential individuals in networks: A game-theoretic approach. Artificial Intelligence, 215:79–119.
• Jalali et al. (2011) Jalali, A., Ravikumar, P., Vasuki, V., and Sanghavi, S. (2011). On Learning Discrete Graphical Models using Group-Sparse Regularization. In AISTATS, pages 378–387.
• Janovskaja (1968) Janovskaja, E. (1968). Equilibrium situations in multi-matrix games. Litovskiĭ Matematicheskiĭ Sbornik, 8:381–384.
• Jiang and Leyton-Brown (2011) Jiang, A. X. and Leyton-Brown, K. (2011). Polynomial-time computation of exact correlated equilibrium in compact games. In Proceedings of the 12th ACM conference on Electronic commerce, pages 119–126. ACM.
• Johnson and Nylen (1991) Johnson, C. R. and Nylen, P. (1991). Monotonicity properties of norms. Linear Algebra and its Applications, 148:43–58.
• Kakade et al. (2003) Kakade, S., Kearns, M., Langford, J., and Ortiz, L. (2003). Correlated equilibria in graphical games. In Proceedings of the 4th ACM Conference on Electronic Commerce, pages 42–47. ACM.
• Rothman et al. (2008) Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515.
• Rubinstein (2016) Rubinstein, A. (2016). Settling the complexity of computing approximate two-player Nash equilibria. arXiv:1606.04550 [cs]. arXiv: 1606.04550.
• Santhanam and Wainwright (2012) Santhanam, N. P. and Wainwright, M. J. (2012). Information-theoretic limits of selecting binary graphical models in high dimensions. Information Theory, IEEE Transactions on, 58(7):4117–4134.
• Tropp (2012) Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of computational mathematics, 12(May):389–434.
• Wang et al. (2010) Wang, W., Wainwright, M. J., and Ramchandran, K. (2010). Information-theoretic bounds on model selection for Gaussian Markov random fields. In ISIT, pages 1373–1377. Citeseer.
• Yu (1997) Yu, B. (1997). Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, chapter Assouad, Fano, and Le Cam, pages 423–435. Springer New York, New York, NY.


## Appendix A Detailed Proofs

###### Proof of Lemma 1 (Minimum eigenvalue of population Hessian).

Fix any x ∈ A and any θ0, θ1 with ∥θ1∥2 = 1. For any t ∈ R, let F(t;a) = (θ0+tθ1)Tfi(a,x−i). Then, for the loss ℓ as defined in (10),

 ℓ(x;θ0+tθ1)=−F(t;xi)+log(∑a∈Aiexp(F(t;a))). (18)

A little calculation shows that the second derivative of ℓ(x;θ0+tθ1) with respect to t is as follows:

 ∂2ℓ(x;θ0+tθ1)∂