Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks

Gabriele Farina
Computer Science Department
Carnegie Mellon University
gfarina@cs.cmu.edu

Chun Kai Ling
Computer Science Department
Carnegie Mellon University
chunkail@cs.cmu.edu

Fei Fang
Institute for Software Research
Carnegie Mellon University
feif@cs.cmu.edu

Tuomas Sandholm
Computer Science Department, Carnegie Mellon University
Strategic Machine, Inc.
Strategy Robot, Inc.
Optimized Markets, Inc.
sandholm@cs.cmu.edu

Abstract

While Nash equilibrium in extensive-form games is well understood, very little is known about the properties of extensive-form correlated equilibrium (EFCE), both from a behavioral and from a computational point of view. In this setting, the strategic behavior of players is complemented by an external device that privately recommends moves to agents as the game progresses; players are free to deviate at any time, but will then not receive future recommendations. Our contributions are threefold. First, we show that an EFCE can be formulated as the solution to a bilinear saddle-point problem. To showcase how this novel formulation can inspire new algorithms to compute EFCEs, we propose a simple subgradient descent method which exploits this formulation and structural properties of EFCEs. Our method has better scalability than the prior approach based on linear programming. Second, we propose two benchmark games, which we hope will serve as the basis for future evaluation of EFCE solvers. These games were chosen so as to cover two natural application domains for EFCE: conflict resolution via a mediator, and bargaining and negotiation. Third, we document the qualitative behavior of EFCE in our proposed games. We show that the social-welfare-maximizing equilibria in these games are highly nontrivial and exhibit surprisingly subtle sequential behavior that so far has not received attention in the literature.

 


 Arxiv Preprint

May 21, 2019

Introduction

Nash equilibrium (NE) (Nash, 1950), the most seminal concept in non-cooperative game theory, captures a multi-agent setting where each agent is selfishly motivated to maximize their own payoff. The assumption underpinning NE is that the interaction is completely decentralized: the behavior of each agent is not regulated by any external orchestrator. Contrasted with the other—often utopian—extreme of a fully managed interaction, where an external dictator controls the behavior of each agent so that the whole system moves to a desired state, the social welfare that can be achieved by NE is generally lower, sometimes dramatically so (Koutsoupias & Papadimitriou, 1999; Roughgarden & Tardos, 2002). Yet, in many realistic interactions, some intermediate form of centralized control can be achieved. In particular, in his landmark paper, Aumann (1974) proposed the concept of correlated equilibrium (CE), where a mediator (the correlation device) can recommend behavior, but not enforce it. In a CE, the correlation device is constructed so that the agents—which are still modeled as fully rational and selfish just like in an NE—have no incentive to deviate from the private recommendation. Allowing correlation of actions while ensuring selfishness makes CE a good candidate solution concept in multi-agent and semi-competitive settings such as traffic control, load balancing (Ashlagi et al., 2008), and carbon abatement (Ray & Gupta, 2009), and it can lead to win-win outcomes.

In this paper, we study the natural extension of correlated equilibrium in extensive-form (i.e., sequential) games, known as extensive-form correlated equilibrium (EFCE)  (Von Stengel & Forges, 2008). Like CE, EFCE assumes that the strategic interaction is complemented by an external mediator; however, in an EFCE the mediator only privately reveals the recommended next move to each acting player, instead of revealing the whole plan of action throughout the game (i.e., recommended move at all decision points) for each player at the beginning of the game. Furthermore, while each agent is free to defect from the recommendation at any time, this comes at the cost of future recommendations.

While the properties of correlation in normal-form (i.e., non-sequential) games are well-studied, they do not automatically transfer to the richer world of sequential interactions. It is known in the study of NE that sequential interactions can pose different challenges, especially in settings where the agents retain private information. Conceptually, the players can strategically adjust to dynamic observations about the environment and their opponents as the game progresses. Despite tremendous interest and progress in recent years for computing NE in sequential interactions with private information, with significant milestones achieved in the game of Poker (Bowling et al., 2015; Brown & Sandholm, 2017; Moravčík et al., 2017) and other large, real-world domains, not much has been done to increase our understanding of (extensive-form) correlated equilibria in these settings.

Contributions. Our primary objective with this paper is to spark more interest in the community towards a deeper understanding of the behavioral and computational aspects of EFCE.

  • We show that an EFCE in a two-player general-sum game is the solution to a bilinear saddle-point problem (BSPP). This conceptual reformulation complements the EFCE construction by Von Stengel & Forges (2008), and allows for the development of new and efficient algorithms. As a proof of concept, we use our reformulation to devise a variant of projected subgradient descent that outperforms the linear-programming (LP) based algorithms proposed by Von Stengel & Forges (2008) on large game instances.

  • We propose two benchmark games; each game is parametric, so that it can scale in size as desired. The first game is a general-sum variant of the classic war game Battleship. The second game is a simplified version of the Sheriff of Nottingham board game. These games were chosen so as to cover two natural application domains for EFCE: conflict resolution via a mediator, and bargaining and negotiation. We will release the source code for our parametric game generators, so that the research community can benefit from our implementation work.

  • By analyzing EFCE in our proposed benchmark games, we show that even if the mediator cannot enforce behavior, it can induce significantly higher social welfare than NE and successfully deter players from deviating in at least two (often connected) ways: (1) using certain sequences of actions as ‘passcodes’ to verify that a player has not deviated: defecting leads to incomplete or wrong passcodes which indicate deviation, and (2) inducing opponents to play punitive actions against players that have deviated from the recommendation, if such a deviation is detected. Crucially, both deterrents are unique to sequential interactions and do not apply to non-sequential games. This corroborates the idea that the mediation of sequential interactions is a qualitatively different problem than that of non-sequential games and further justifies the study of EFCE as an interesting direction for the community. To our knowledge, these are the first experimental results and observations on EFCE in the literature.

Preliminaries

Extensive-Form Games. Extensive-form games (EFGs) are sequential games that are played over a rooted game tree. Each node in the tree belongs to a player and corresponds to a decision point for that player. Outgoing edges from a node correspond to actions that can be taken by the player to which the node belongs. Each terminal node in the game tree is associated with a tuple of payoffs that the players receive should the game end in that state. To capture imperfect information, the set of vertices of each player is partitioned into information sets. The vertices in the same information set are indistinguishable to the player that owns those vertices. For example, in a game of Poker, a player cannot distinguish between certain states that differ only in the opponent’s private hand. As a result, the strategy of the player (specifying which action to take) is defined on the information sets instead of the vertices. For the purpose of this paper, we only consider perfect-recall EFGs. This property means that no player forgets any of their previous actions, nor any private or public observation that they have made. The perfect-recall property can be formalized by requiring that for any two vertices in the same information set, the paths from those vertices to the root of the game tree contain the exact same sequence of actions for the acting player at the information set.
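The perfect-recall condition can be checked mechanically on a game tree. The following is a minimal sketch under stated assumptions: the `Node` class and the function names are hypothetical (they are not part of any released code), and each decision node is assumed to store its owner, its information-set label, and the edge that leads to it. The acting player's history is recorded as (information set, action) pairs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Decision node of a game tree (hypothetical representation)."""
    player: int                      # owner of the node
    infoset: str                     # information-set label
    parent: Optional["Node"] = None  # None at the root
    action: Optional[str] = None     # action taken at `parent` to reach this node


def own_history(node: Node) -> tuple:
    """(infoset, action) pairs of node.player on the path from the root to `node`."""
    history = []
    current = node
    while current.parent is not None:
        if current.parent.player == node.player:
            history.append((current.parent.infoset, current.action))
        current = current.parent
    return tuple(reversed(history))


def has_perfect_recall(nodes: list) -> bool:
    """True if all nodes sharing an information set have the same own-history."""
    seen = {}
    for node in nodes:
        key = (node.player, node.infoset)
        history = own_history(node)
        if seen.setdefault(key, history) != history:
            return False
    return True


root = Node(player=1, infoset="A")
# Player 2 cannot tell these two nodes apart and has made no earlier moves.
left = Node(player=2, infoset="B", parent=root, action="l")
right = Node(player=2, infoset="B", parent=root, action="r")
print(has_perfect_recall([root, left, right]))  # True
```

If the two children instead belonged to Player 1 and shared an information set, Player 1 would have forgotten their own move at the root and the check would fail.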

A pure normal-form strategy for a player defines a choice of action for every information set that belongs to that player. A player can play a mixed strategy, i.e., sample from a distribution over their pure normal-form strategies. However, this representation contains redundancies: some information sets for the player may become unreachable after the player makes certain decisions higher up in the tree. Omitting these redundancies leads to the notion of reduced-normal-form strategies, which are known to be strategically equivalent to normal-form strategies (see, e.g., Shoham & Leyton-Brown (2009) for more details). Both the normal-form and the reduced-normal-form representations are exponentially large in the size of the game tree.
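To make the redundancy concrete, the sketch below counts pure normal-form and pure reduced-normal-form strategies for a single player. The data layout is an assumption made for illustration: `tree[infoset][action]` lists the player's information sets that may follow that action, and the function names are hypothetical.

```python
from math import prod


def count_normal_form(tree: dict) -> int:
    """Pure normal-form strategies: pick one action at every information set."""
    return prod(len(actions) for actions in tree.values())


def count_reduced(tree: dict, infoset: str) -> int:
    """Pure reduced-normal-form strategies in the subtree rooted at `infoset`.

    After choosing an action, the player must still plan for every information
    set that may follow it, hence the product over the successor infosets.
    """
    return sum(
        prod(count_reduced(tree, nxt) for nxt in successors)
        for successors in tree[infoset].values()
    )


# Hypothetical player: "I1" can only be reached after playing "go" at "I0".
tree = {
    "I0": {"stop": [], "go": ["I1"]},
    "I1": {"x": [], "y": []},
}
print(count_normal_form(tree))    # 4
print(count_reduced(tree, "I0"))  # 3
```

The two normal-form strategies that play "stop" at "I0" differ only at the unreachable "I1", so the reduced form merges them. A player with k information sets of two actions each has 2^k pure normal-form strategies, which is where the exponential size comes from.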

Here, we fix some notations. Let be the set of terminal states (or equivalently, outcomes) in the game and be the utility obtained by player if the game terminates at . Let be the set of pure reduced-normal-form strategies for Player . We define , and to be the set of reduced-normal-form strategies that (a) can lead to information set , (b) can lead to and prescribes action at information set , and (c) can lead to the terminal state , respectively. We denote by the set of information set-action pairs (also referred to as sequences), where is an information set for Player and is an action at set . For a given terminal state let be the last pair belonging to Player encountered in the path from the root of the tree to .

Extensive-Form Correlated Equilibrium. Extensive-form correlated equilibrium (EFCE) is a solution concept for extensive-form games introduced by Von Stengel & Forges (2008). (Other CE-related solution concepts in sequential games include the agent-form correlated equilibrium (AFCE), where agents continue to receive recommendations even upon defection, and normal-form coarse CE (NFCCE). NFCCE does not allow for defections during the game: before the game starts, players must decide whether to commit to following all recommendations upfront, before receiving them, or elect to receive none.) Like in the traditional correlated equilibrium (CE), introduced by Aumann (1974), a correlation device selects private signals for the players before the game starts. These signals are sampled from a correlated distribution —a joint probability distribution over —and represent recommended player strategies. However, while in a CE the recommended moves for the whole game tree are privately revealed to the players when the game starts, in an EFCE the recommendations are revealed incrementally as the players progress in the game tree. In particular, a recommended move is only revealed when the player reaches the decision point in the game for which the recommendation is relevant. Moreover, if a player ever deviates from the recommended move, they will stop receiving recommendations. To concretely implement an EFCE, one places recommendations into ‘sealed envelopes’, each of which may only be opened at its respective information set. Sealed envelopes may be implemented using cryptographic techniques (see Dodis et al. (2000) for one such example).
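The incremental revelation of recommendations can be mimicked in a few lines. This is a toy sketch, not a cryptographic implementation: the `Mediator` class and its methods are hypothetical, a joint plan is assumed to be a tuple with one `{infoset: action}` dictionary per player, and the mediator is assumed to be told which action was actually played.

```python
import random


class Mediator:
    """Toy correlation device (hypothetical interface)."""

    def __init__(self, joint_plans, weights, rng):
        # Sample the joint plan once, before play starts.
        self._plan = rng.choices(joint_plans, weights=weights, k=1)[0]
        self._deviated = set()

    def recommend(self, player, infoset):
        """Open the 'sealed envelope' for `infoset`, unless `player` deviated earlier."""
        if player in self._deviated:
            return None
        return self._plan[player][infoset]

    def report_action(self, player, infoset, action):
        """Record the move actually played; a mismatch ends future recommendations."""
        if player not in self._deviated and action != self._plan[player][infoset]:
            self._deviated.add(player)


plans = [({"I": "a"}, {"J": "x"}), ({"I": "b"}, {"J": "y"})]
mediator = Mediator(plans, weights=[0.5, 0.5], rng=random.Random(0))
first = mediator.recommend(0, "I")             # revealed only when "I" is reached
mediator.report_action(0, "I", "something else")
print(mediator.recommend(0, "I"))              # None: no more recommendations
```

Player 1 (index 1 here) is unaffected by the other player's deviation and still receives the recommendation that is correlated with the originally sampled plan.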

In an EFCE, the players know less about the set of recommendations that were sampled by the correlation device. The benefits are twofold. First, the players can be more easily induced to play strategies that hurt them (but benefit the overall social welfare), as long as “on average” the players are indifferent as to whether or not to follow the recommendations: the set of EFCEs is a superset of that of CEs. Second, since the players observe less, the set of probability distributions for the correlation device for which no player has an incentive to deviate can be described succinctly in certain classes of games: Von Stengel & Forges (2008, Theorem 1.1) show that in two-player, perfect-recall extensive-form games with no chance moves, the set of EFCEs can be described by a system of linear equations and inequalities of polynomial size in the game description. On the other hand, the same result cannot hold in more general settings: Von Stengel & Forges (2008, Section 3.7) also show that in games with more than two players and/or chance moves, deciding the existence of an EFCE with social welfare greater than a given value is NP-hard. It is important to note that this last result only implies that the characterization of the set of all EFCEs cannot be of polynomial size in general (unless P = NP). However, the problem of finding one EFCE can be solved in polynomial time: Huang (2011) and Huang & von Stengel (2008) show how to adapt the Ellipsoid Against Hope algorithm (Papadimitriou & Roughgarden, 2008; Jiang & Leyton-Brown, 2015) to compute an EFCE in polynomial time in games with more than two players and/or with chance moves. Unfortunately, that algorithm is only theoretical, and known to not scale beyond extremely small instances (Leyton-Brown, 2019).

Saddle-Point Formulation

Our objective for this section is to cast the problem of finding an EFCE in a two-player game as a bilinear saddle-point problem, that is, a problem of the form $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} x^\top A y$, where $\mathcal{X}$ and $\mathcal{Y}$ are compact convex sets. In the case of EFCE, $\mathcal{X}$ and $\mathcal{Y}$ are convex polytopes that belong to a space whose dimension is polynomial in the game tree size. This reformulation is meaningful:

  • From a conceptual angle, it brings the problem of computing an EFCE closer to several other solution concepts in game theory that are known to be expressible as BSPPs. In particular, the BSPP formulation shows that an EFCE can be viewed as a NE in a two-player zero-sum game between a deviator, who is trying to decide how to best defect from recommendations, and a mediator, who is trying to come up with an incentive-compatible set of recommendations.

  • From a geometric point of view, the BSPP formulation better captures the combinatorial structure of the problem: $\mathcal{X}$ and $\mathcal{Y}$ have a well-defined meaning in terms of the input game tree. This has algorithmic implications: for example, because of the structure of $\mathcal{Y}$ (which will be detailed later), the inner maximization problem can be solved via a single bottom-up game-tree traversal.

  • From a computational standpoint, it opens the way to the plethora of optimization algorithms (both general-purpose and those specific to game theory) that have been developed to solve BSPPs.

Furthermore, it is easy to show that by dualizing the inner maximization problem in the BSPP formulation, one recovers the linear program introduced by Von Stengel & Forges (2008) (we show this in the appendix). In this sense, our formulation subsumes the existing one.
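The dualization step can be illustrated in generic terms for a BSPP of the form $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} x^\top A y$. As an assumption made only for this sketch (the symbols $F$, $f$, and $v$ are illustrative and are not the notation of the paper), suppose the maximizer's polytope is given in standard form $\mathcal{Y} = \{y \ge 0 : Fy = f\}$ and is nonempty and bounded. For a fixed $x$, the inner problem is a linear program in $y$, and LP duality replaces it with a minimization:

```latex
\max_{y \ge 0,\; F y = f} \; (A^\top x)^\top y
\;=\;
\min_{v \,:\, F^\top v \,\ge\, A^\top x} \; f^\top v ,
\qquad\text{so}\qquad
\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} x^\top A y
\;=\;
\min_{x \in \mathcal{X},\; v} \bigl\{\, f^\top v \;:\; F^\top v \ge A^\top x \,\bigr\}.
```

When $\mathcal{X}$ is itself a polytope, the right-hand side is a single linear program in $(x, v)$, which is the general shape of a reduction from a BSPP to an LP.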

Triggers and Deviations. One effective way to reason about extensive-form correlated equilibria is via the notion of trigger agents, which was introduced (albeit used in a different context) in Gordon et al. (2008) and Dudik & Gordon (2009):

Definition 1.

Let be a sequence for Player , and let be a distribution over . A -trigger agent for Player is a player that follows all recommendations given by the mediator unless they get recommended at ; in that case, the player ‘gets triggered’, stops following the recommendations and instead plays based on a pure strategy sampled from until the game ends.

A correlated distribution is an EFCE if and only if any trigger agent for Player can get utility at most equal to the utility that Player earns by following the recommendations of the mediator at all decision points. In order to express the utility of the trigger agent, it is necessary to compute the probability of the game ending in each of the terminal states. As we show in the appendix, this can be done concisely by partitioning the set of terminal nodes in the game tree into three different sets. In particular, let be the set of terminal nodes whose path from the root of the tree contains taking action at and let be the set of terminal nodes whose path from the root passes through and are not in . We have

Lemma 1.

Consider a -trigger agent for Player 1, where . The value of the trigger agent, defined as the expected difference between the utility of the trigger agent and the utility of an agent that always follows recommendations sampled from correlated distribution , is computed as where and .

(A symmetric result holds for Player 2, with symbols and .) It now seems natural to perform a change of variables, and pick distributions for the random variables and instead of and . Since there are only a polynomial number (in the game tree size) of combinations of arguments for these new random variables, this approach allows one to remove the redundancy of realization-equivalent normal-form plans and focus on a significantly smaller search space. In fact, the definition of also appears in Von Stengel & Forges (2008), where it is referred to as a (sequence-form) correlation plan. In the case of the and random variables, it is clear that the change of variables is possible via the sequence form (von Stengel, 2002); we let be the sequence-form polytope of feasible values for the vector . Hence, the only hurdle is characterizing the space spanned by and as varies across the probability simplex. In two-player perfect-recall games with no chance moves, this is exactly one of the merits of the landmark work by Von Stengel & Forges (2008). In particular, the authors prove that in those games the space of feasible can be captured by a polynomial number of linear constraints. In more general cases the same does not hold (see the hardness results discussed earlier), but we prove the following (see the appendix):

Lemma 2.

In a two-player game, as varies over the probability simplex, the joint vector of , variables spans a convex polytope in , where is at most quadratic in the game size.

Saddle-Point Reformulation. According to Lemma 1, for each Player and -trigger agent for them, the value of the trigger agent is a biaffine expression in the vectors and , and can be written as for a suitable matrix and vector , where the two terms in the difference correspond to the expected utility for deviating at according to the (sequence-form) strategy and the expected utility for not deviating at . Given the correlation plan , the maximum value of any deviation for any player can therefore be expressed as We can convert the maximization above into a continuous linear optimization problem by introducing the multipliers (one for each Player and trigger ), and write , where the maximization is subject to the linear constraints and for all . These linear constraints define a polytope .

A correlation plan is an EFCE if and only if for every trigger agent, i.e., . Therefore, to find an EFCE, we can solve the optimization problem , which is a bilinear saddle-point problem over the convex domains and , both of which are convex polytopes that belong to , where is at most quadratic in the input game size (Lemma 2). If an EFCE exists, the optimal value should be non-positive and the optimal solution is an EFCE (as it satisfies ). In fact, since EFCEs always exist (as the set of EFCEs is a superset of the set of CEs (Von Stengel & Forges, 2008)), and one can select triggers to be terminal sequences for Player , the optimal value of the BSPP is always 0. The BSPP can be interpreted as the NE of a zero-sum game between the mediator, who decides on a suitable correlation plan and a deviator who selects the ’s to maximize each . The value of this game is always 0. Finally, we can enforce a lower bound on the sum of the players’ utilities by introducing an additional variable and maximizing the new objective subject to and the modified constraint .

Computing an EFCE using Subgradient Descent. Von Stengel & Forges (2008) show that a (SW-maximizing) EFCE of a two-player game without chance may be expressed as the solution of an LP and solved using generic methods such as the simplex algorithm or interior-point methods. However, this does not scale to large games, as these methods require storing and inverting large matrices. Here, we showcase the benefits of exploiting the combinatorial structure of the BSPP formulation by proposing a simple algorithm based on subgradient descent; we show later in the paper that this method scales better than a commercial state-of-the-art LP solver on large games.

For brevity, we only provide a sketch of our algorithm, which computes a (not necessarily SW-maximizing) EFCE. Conceptually, since the function is convex, we may perform subgradient descent on . This is convenient, because the subgradients may be readily expressed as , where is a triplet which maximizes the objective ; this can be computed by traversing the tree. Unfortunately, maintaining feasibility (that is, ) is trickier, because projecting onto is challenging, even in games without chance, where can be expressed by a polynomial number of constraints (Von Stengel & Forges, 2008). To overcome this, we show that in games with no chance can be expressed as the intersection of convex polytopes and non-negative orthant. Projection on and individually can be efficiently done, in parallel, by precomputing a sparse Cholesky factor of the constraints that define and : we prove that a sparse (polynomial) factorization always exists, and we implemented a custom parallel algorithm that computes the factorization by exploiting the structure of the game tree. This allows for the use of a recent algorithm by Wang & Bertsekas (2013), where gradient steps are interlaced with projections onto , , and the non-negative orthant in a cyclical manner. See the appendix for details.
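The general idea of descending on the worst-case objective can be tried on a toy problem. The sketch below is not the algorithm described above: as a simplifying assumption, both domains are probability simplices (a matrix game), so feasibility is maintained with a plain Euclidean projection onto the simplex rather than with Cholesky-based projections interlaced as in Wang & Bertsekas (2013). All function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold of the largest valid k
    return [max(v_i - theta, 0.0) for v_i in v]


def worst_case(A, x):
    """Inner maximization over the simplex: best value and maximizing column."""
    cols = range(len(A[0]))
    payoffs = [sum(x[i] * A[i][j] for i in range(len(A))) for j in cols]
    j = max(cols, key=payoffs.__getitem__)
    return payoffs[j], j


def subgradient_descent(A, steps=2000):
    """Projected subgradient descent on f(x) = max_y x^T A y; returns the average iterate."""
    n = len(A)
    x = [1.0] + [0.0] * (n - 1)  # start from a vertex of the simplex
    average = [0.0] * n
    for t in range(1, steps + 1):
        average = [a + (x_i - a) / t for a, x_i in zip(average, x)]  # running mean
        _, j = worst_case(A, x)   # column j of A is a subgradient of f at x
        step = 1.0 / t ** 0.5     # diminishing step size
        x = project_to_simplex([x[i] - step * A[i][j] for i in range(n)])
    return average


A = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies; its saddle-point value is 0
x_bar = subgradient_descent(A)
print(worst_case(A, x_bar)[0])  # small and non-negative
```

The inner maximization plays the role of the tree traversal mentioned above: it returns both the worst-case value and the maximizer that defines the subgradient.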

Benchmark Games

In this section we introduce the first two benchmark games for EFCE. These games are naturally parametric, so that they can scale in size as desired and hence can be used to evaluate different EFCE solvers. In addition, we show that the EFCEs in these games are interesting behaviorally: the correlation plan in a social-welfare-maximizing EFCE is highly nontrivial and even seemingly counter-intuitive. We believe some of these induced behaviors may prove practical in real-world scenarios and hope our analysis can spark an interest in EFCEs and other equilibria in sequential settings.

Battleship

In this section we introduce our first proposed benchmark game to illustrate the power of correlation in extensive-form games. Our game is a general-sum variant of the classic game Battleship. Each player takes turns to secretly place a set of ships (of varying sizes and value) on separate grids of size . After placement, players take turns firing at their opponent—ships which have been hit at all the tiles they lie on are considered destroyed. The game continues until either one player has lost all of their ships, or each player has completed shots. At the end of the game, the payoff of each player is computed as the sum of the values of the opponent’s ships that were destroyed, minus times the value of ships which they lost, where is called the loss multiplier of the game. The social welfare (SW) of the game is the sum of utilities to all players.
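The payoff rule above fits in a few lines of code. This is a sketch with hypothetical function and argument names; the example values (one ship of value 1 per player and a loss multiplier of 2) are chosen for illustration only and are not taken from the text.

```python
def battleship_payoffs(values_p1, sunk_p1, values_p2, sunk_p2, loss_multiplier):
    """Payoffs (Player 1, Player 2) given ship values and which ships were sunk."""
    lost_p1 = sum(v for v, sunk in zip(values_p1, sunk_p1) if sunk)
    lost_p2 = sum(v for v, sunk in zip(values_p2, sunk_p2) if sunk)
    # Each player gains the value destroyed and pays a multiple of the value lost.
    u1 = lost_p2 - loss_multiplier * lost_p1
    u2 = lost_p1 - loss_multiplier * lost_p2
    return u1, u2


# Player 1 sinks Player 2's only ship.
print(battleship_payoffs([1], [False], [1], [True], loss_multiplier=2))   # (1, -2)
# Peaceful outcome: nobody loses a ship.
print(battleship_payoffs([1], [False], [1], [False], loss_multiplier=2))  # (0, 0)
```

Whenever the loss multiplier exceeds 1, every sunk ship lowers the sum of the two payoffs, so in this illustration the peaceful outcome has the highest social welfare.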

In order to illustrate a few interesting features of social-welfare-maximizing EFCE in this game, we will focus on the instance of the game with a board of size , in which each player commands just ship of value and length , there are rounds of shooting per player, and the loss multiplier is . In this game, the social-welfare-maximizing Nash equilibrium is such that each player places their ship and shoots uniformly at random. This way, the probability that Player 1 and 2 will end the game by destroying the opponent’s ship is and respectively (Player 1 has an advantage since they act first). The probability that both players will end the game with their ships unharmed is a meagre . Correspondingly, the maximum SW reached by any NE of the game is .

In the EFCE model, it is possible to induce the players to end the game with a peaceful outcome—that is, no damage to either ship—with probability , times of the probability in NE, resulting in a much-higher SW of . Before we continue with more details as to how the mediator (correlation device) is able to achieve this result in the case where , we remark that the benefit of EFCE is even higher when the loss multiplier increases: Figure 1 (left) shows, as a function of , the probability with which Player 1 and 2 terminate the game by sinking their opponent’s ship, if they play according to the SW-maximizing EFCE. For all values of , the SW-maximizing NE remains the same while with a mediator, the probability of reaching a peaceful outcome increases as increases, and asymptotically gets closer to and the gap between the expected utility of the two players vanishes. This is remarkable, considering Player 1’s advantage for acting first.

Figure 1: (Left) Probabilities of players sinking their opponent when the players play according to the SW-maximizing EFCE. For , the probability of the game ending with no sunken ship and the probability of Player 2 sinking Player 1 coincide. (Right) Example of a playthrough of Battleship assuming both players are recommended to place their ship in the same position . Edge labels represent the probability of an action being recommended. Squares and hexagons denote actions taken by Players 1 and 2 respectively. Blue and red nodes represent cases where Players 1 and 2 sink their opponent, respectively. The Shoot action is abbreviated ‘Sh.’.

We now resume our analysis of the SW-maximizing EFCE in the instance where . In a nutshell, the correlation plan is constructed in a way that players are recommended to deliberately miss, and deviations from this are punished by the mediator, who reveals to the opponent the ship location that was recommended to the deviating player. First, the mediator recommends the players a ship placement that is sampled uniformly at random and independently for each player. This results in possible scenarios (one per possible ship placement) in the game, each occurring with probability . Due to the symmetric nature of ship placements, only two scenarios are relevant: whether the two players are recommended to place their ship in the same spot, or in different spots. Figure 1 (right) shows the probability of each recommendation from the mediator in the former case, assuming that the players do not deviate. The latter case is symmetric (see the appendix for details). Now, we explain the first of the two methods in which the mediator compels non-violent behavior. We focus on the first shot made by Player 1 (i.e., the root in Figure 3). The mediator suggests that Player 1 shoot at Player 2’s ship with a low probability, and deliberately miss with high probability. One may wonder how it is possible for this behavior to be incentive-compatible (that is, what are the incentives that compel Player 1 into not defecting), since the player may choose to randomly fire in any of the 2 locations that were not recommended, and get almost chance of winning the game immediately. The key is that if Player 1 does so and does not hit the opponent’s ship, then the mediator can punish them by recommending that Player 2 shoot in the position where Player 1 was recommended to place their ship. Since players value their own ships more than destroying their opponent’s, the player is incentivized to avoid such a situation by accepting the recommendation to (most probably) miss. This is the first example of a deterrent used by the EFCE mediator: the mediator induces the opponent to play punitive actions against players that have deviated from the recommendation, whenever that deviation can be detected. A similar situation arises in the first move of Player 2, where Player 2 is recommended to deliberately miss, hitting each of the 2 empty spots with probability . A more detailed analysis is available in the appendix.

Sheriff

Our second proposed benchmark is a simplified version of the Sheriff of Nottingham board game. The game models the interaction of two players: the Smuggler, who is trying to smuggle illegal items in their cargo, and the Sheriff, who is trying to stop the Smuggler. At the beginning of the game, the Smuggler secretly loads their cargo with illegal items. At the end of the game, the Sheriff decides whether to inspect the cargo. If the Sheriff chooses to inspect the cargo and finds illegal goods, the Smuggler must pay a fine worth to the Sheriff. On the other hand, the Sheriff has to compensate the Smuggler with a utility if no illegal goods are found. Finally, if the Sheriff decides not to inspect the cargo, the Smuggler’s utility is whereas the Sheriff’s utility is . The game is made interesting by two additional elements (which are also present in the board game): bribery and bargaining. After the Smuggler has loaded the cargo and before the Sheriff chooses whether or not to inspect, they engage in rounds of bargaining. At each round , the Smuggler tries to tempt the Sheriff into not inspecting the cargo by proposing a bribe , and the Sheriff responds whether or not they would accept the proposed bribe. Only the proposal and response from round will be executed and have an impact on the final payoffs; that is, all but the -th round of bargaining are non-consequential and their purpose is for the two players to settle on a suitable bribe amount. If the Sheriff accepts bribe , then the Smuggler gets , while the Sheriff gets . See the appendix for a formal description of the game.

We now point out some interesting behavior of EFCE in this game. We refer to the game instance where as the baseline instance.

Effect of and . First, we show what happens in the baseline instance when the item value , item penalty , and Sheriff compensation (penalty) are varied in isolation over a continuous range of values. The results are shown in Figure 2. In terms of general trends, the effect of the parameters on the Smuggler is fairly consistent with intuition: the Smuggler benefits from a higher item value as well as from higher Sheriff penalties, and suffers when the penalty for smuggling is increased. However, the finer details are much more nuanced. For one, the effect of changing the parameters is not only non-monotonic but also discontinuous. To our knowledge, this behavior has not been documented before, and we find it rather counterintuitive. More counterintuitive observations can be found in the appendix.

Figure 2: Utility of players with varying and for the SW-maximizing EFCE. We verified that these plots are not the result of equilibrium selection issues.

Effect of , and . Here, we try to empirically understand the impact of and on the SW-maximizing equilibrium. As before, we set and vary and simultaneously while keeping constant. The results are shown in Table 1.

(3.00, 2.00) (3.00, 2.00) (3.00, 2.00)
(8.00, 2.00) (8.00, 2.00) (8.00, 2.00)
(2.28, 1.26) (8.00, 2.00) (8.00, 2.00)
(1.76, 0.93) (7.26, 1.82) (8.00, 2.00)
Table 1: Payoffs for (Smuggler, Sheriff) in the SW-maximizing EFCE.

The most striking observation is that increasing the capacity of the cargo may decrease social welfare. For example, consider the case when (shown in blue in Table 1, right) where the payoffs are . This achieves the maximum attainable social welfare by smuggling items and having the Sheriff accept a bribe of . When is increased to (red entry in the table), the payoffs to both players drop significantly, and even more so when increases further. While counterintuitive, this behavior is consistent in that the Smuggler may not benefit from loading items every time he is recommended to load ; the Sheriff reacts by inspecting more, leading to lower payoffs for both players. That behavior is avoided by increasing the number of rounds : by increasing to (entry shown in purple), the behavior disappears and we revert to achieving a social welfare of 10 just like in the instance with . With sufficient bargaining steps, the Smuggler, with the aid of the mediator, is able to convince the Sheriff that he has complied with the recommendation by the mediator. This is because the mediator spends the first bribes to give a ‘passcode’ to the Smuggler so that the Sheriff can verify compliance: if an ‘unexpected’ bribe is suggested, then the Smuggler must have deviated, and the Sheriff will inspect the cargo as punishment. With more rounds, it is less likely that the Smuggler will guess the correct passcode by chance. See also the appendix.
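The passcode intuition can be illustrated with a toy calculation. The sketch below is not the mediator's actual distribution: it assumes, purely for illustration, that each non-consequential bribe is drawn uniformly and independently from a fixed number of possible values. The function name and both parameters are hypothetical.

```python
from fractions import Fraction


def guess_probability(num_bribe_levels: int, num_passcode_rounds: int) -> Fraction:
    """Chance that a deviating Smuggler matches every expected bribe by luck.

    Assumption (not from the paper): each passcode bribe is uniform over
    `num_bribe_levels` values and independent across rounds.
    """
    if num_bribe_levels < 1 or num_passcode_rounds < 0:
        raise ValueError("need at least one bribe level and a non-negative round count")
    # Independent uniform guesses multiply: (1 / levels) ** rounds.
    return Fraction(1, num_bribe_levels) ** num_passcode_rounds
```

Under this assumption the chance of a lucky guess shrinks geometrically in the number of passcode rounds, which matches the qualitative claim that more bargaining rounds make compliance easier to verify.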


We show that even our proof-of-concept algorithm based on the bilinear saddle-point formulation and subgradient descent, introduced earlier in the paper, is able to beat LP-based approaches using the commercial solver Gurobi (Gurobi Optimization, 2018) in large games. This is consistent with known results about the scalability of methods for computing NE, where in recent years first-order methods have established themselves as the only algorithms that are able to handle large games.

We experimented on Battleship over a range of parameters while fixing . All experiments were run on a cluster with 64 cores and 500GB of memory. For our method, we tuned step sizes based on multiples of 10. In Table 2, we report execution times when all constraints (feasibility and deviation) are violated by no greater than , and . Our method outperforms the LP-based approach for larger games. However, while we outperform the LP-based approach for accuracies up to , Gurobi spends most of its time reordering variables and preprocessing, and its solution improves more rapidly at higher levels of precision; this is expected when comparing against a gradient-based method like ours. On very large games with more than 100 million variables, both our method and Gurobi fail: Gurobi ran out of memory, while each iteration of our method required nearly an hour, which was prohibitive. The main bottleneck in our method was the projection onto and . We also experimented on the Sheriff game and obtained similar findings (see the appendix).

| | Ship length | #Actions Pl 1 | #Actions Pl 2 | #Relevant seq. pairs | Time (LP) | Time (ours) |
|---|---|---|---|---|---|---|
| (2, 2) | 1 | 741 | 917 | 35241 | 2s, 2s, 2s | 1s, 2s, 3s |
| (3, 2) | 1 | 15k | 47k | 3.89M | 3m 6s, 3m 17s, 3m 24s | 8s, 34s, 52s |
| (3, 2) | 1 | 145k | 306k | 26.4M | 42m 39s, 42m 44s, 43m | 2m 48s, 14m 1s, 23m 24s |
| (3, 2) | 2 | 970k | 2.27M | 111M | out of memory | did not achieve |

Table 2: #Seq. pairs is the dimension of under the compact representation of (Von Stengel & Forges, 2008). Each time cell lists one entry per accuracy level. For LPs, we report the fastest of Barrier, Primal and Dual Simplex, and 3 different formulations (see the appendix). Gurobi went out of memory and was killed by the system after seconds during the variable ordering phase. Our method requires hour per iteration and did not achieve the required accuracy after hours.

In this paper, we have proposed two parameterized benchmark games in which EFCE exhibits interesting behaviors. We have analyzed those behaviors both qualitatively and quantitatively, and isolated two ways through which a mediator is able to compel the agents to follow the recommendations. We also provide an alternative saddle-point formulation of EFCE and demonstrate its merit with a simple subgradient method which outperforms standard LP-based methods. We hope that our analysis will bring attention to some of the computational and practical uses of EFCE, and that our benchmark games will be useful to evaluate future algorithms for computing EFCE in large games.


Acknowledgements This material is based on work supported by the National Science Foundation under grants IIS-1718457, IIS-1617590, and CCF-1733556, and the ARO under award W911NF-17-1-0082. Fei Fang and Chun Kai Ling are supported by a research grant from Lockheed Martin.

References

  • Ashlagi et al. (2008) Ashlagi, I., Monderer, D., and Tennenholtz, M. On the value of correlation. Journal of Artificial Intelligence Research, 33:575–613, 2008.
  • Aumann (1974) Aumann, R. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1:67–96, 1974.
  • Bowling et al. (2015) Bowling, M., Burch, N., Johanson, M., and Tammelin, O. Heads-up limit hold’em poker is solved. Science, 2015.
  • Brown & Sandholm (2017) Brown, N. and Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, Dec. 2017.
  • Crawford & Sobel (1982) Crawford, V. P. and Sobel, J. Strategic information transmission. Econometrica: Journal of the Econometric Society, pp. 1431–1451, 1982.
  • Dodis et al. (2000) Dodis, Y., Halevi, S., and Rabin, T. A cryptographic solution to a game theoretic problem. In Annual International Cryptology Conference, pp. 112–130. Springer, 2000.
  • Dudik & Gordon (2009) Dudik, M. and Gordon, G. J. A sampling-based approach to computing equilibria in succinct extensive-form games. In UAI, pp. 151–160. AUAI Press, 2009.
  • Gordon et al. (2008) Gordon, G. J., Greenwald, A., and Marks, C. No-regret learning in convex games. In Proceedings of the 25th international conference on Machine learning, pp. 360–367. ACM, 2008.
  • Gurobi Optimization (2018) Gurobi Optimization, L. Gurobi optimizer reference manual, 2018. URL http://www.gurobi.com.
  • Huang (2011) Huang, W. Equilibrium computation for extensive games. PhD thesis, London School of Economics and Political Science, January 2011.
  • Huang & von Stengel (2008) Huang, W. and von Stengel, B. Computing an extensive-form correlated equilibrium in polynomial time. In International Workshop On Internet And Network Economics (WINE), pp. 506–513. Springer, 2008.
  • Jiang & Leyton-Brown (2015) Jiang, A. X. and Leyton-Brown, K. Polynomial-time computation of exact correlated equilibrium in compact games. Games and Economic Behavior, 91:347–359, 2015.
  • Koutsoupias & Papadimitriou (1999) Koutsoupias, E. and Papadimitriou, C. Worst-case equilibria. In Symposium on Theoretical Aspects in Computer Science, 1999.
  • Leyton-Brown (2019) Leyton-Brown, K. Personal communication, 2019.
  • Moravčík et al. (2017) Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 2017.
  • Nash (1950) Nash, J. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.
  • Papadimitriou & Roughgarden (2008) Papadimitriou, C. H. and Roughgarden, T. Computing correlated equilibria in multi-player games. Journal of the ACM, 55(3):14, 2008.
  • Ray & Gupta (2009) Ray, I. and Gupta, S. S. Technical Report, 2009.
  • Roughgarden & Tardos (2002) Roughgarden, T. and Tardos, É. How bad is selfish routing? Journal of the ACM (JACM), 49(2):236–259, 2002.
  • Shoham & Leyton-Brown (2009) Shoham, Y. and Leyton-Brown, K. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2009.
  • von Stengel (1996) von Stengel, B. Efficient computation of behavior strategies. Games and Economic Behavior, 1996.
  • von Stengel (2002) von Stengel, B. Computing equilibria for two-person games. In Aumann, R. and Hart, S. (eds.), Handbook of game theory, volume 3. North Holland, Amsterdam, The Netherlands, 2002.
  • Von Stengel & Forges (2008) Von Stengel, B. and Forges, F. Extensive-form correlated equilibrium: Definition and computational complexity. Mathematics of Operations Research, 33(4):1002–1022, 2008.
  • Wang & Bertsekas (2013) Wang, M. and Bertsekas, D. P. Incremental constraint projection-proximal methods for nonsmooth convex optimization. SIAM J. Optim. (to appear), 2013.

Recall the continuous version of the primal version of the inner maximization problem which was obtained by adding the multipliers .

such that

where may be seen as the sequence form representation of a game rooted at a particular information set of player , and scaled by the factor . By expanding the sequence form constraints which define , we get

such that

where and are sequence form constraint matrices rooted at the information set containing , with the only difference that instead of having the ‘empty sequence’ be equal to , we require that all actions belonging to sum to . We are now in a position to take duals; the only non-zero elements on the right-hand side of the constraints are from the sum-to-one constraints over . This gives the following dual

such that

where and are free in sign. Combining this with the outer minimization over gives us the linear program of Von Stengel & Forges (2008), up to a change in variable names and conventions.


In order to express the utility of a trigger agent, it is necessary to compute the probability of the game ending in each of the terminal states. Before that, we will review the notation introduced in earlier sections in more detail.

  • be the set of terminal states (or equivalently, outcomes) in the game, and is some terminal state.

  • be the utility obtained by player if the game terminates at some terminal state .

  • be the set of pure reduced-normal-form strategies for Player . We also require notation for subsets of , namely,

    • , is the set of reduced-normal-form strategies that can lead to information set (which belongs to player ) assuming that the other player acts to do so as well. This is equivalent (assuming no zero-chance nodes or disconnected game trees) to saying that all reduced-normal-form strategies in have some action which belongs to information set .

    • is the set of reduced-normal-form strategies which will lead to information set and recommend the action in . This is equivalent to the set of reduced-normal-form strategies which contain as part of their recommendation (this set is typically a subset of ).

    • is the set of reduced-normal-form strategies which can lead to the terminal state (assuming the other player plays to do so). This is equivalent to the set of reduced-normal-form strategies which contain the pair where is the unique last information set-action pair which has to be encountered by player before the terminal state .

  • the set of information set-action pairs (also known as sequences), where is an information set for Player and is an action at set .

  • is the last pair belonging to Player encountered before some terminal state .

We are interested in characterizing the random variable that maps a triple of reduced-normal-form strategies to the terminal state of the game that is reached when Player 1 is a -trigger agent and Player 2 follows all recommendations. That is, we want to find the probability of terminating at each for a -trigger agent, given the mediator’s joint distribution over reduced-normal-form strategies and the trigger strategy for the deviating player, which we will assume to be Player without loss of generality. For each trigger , the terminal leaves may be partitioned into the following sets.

  • (or equivalently ) is the set of terminal nodes that are descendants of the trigger . In order for the game to end in one of these terminal nodes, it is necessary that the recommendation device recommended to Player 1 the trigger sequence , and therefore the agent must have deviated. Furthermore, Player 2 must have been recommended the terminal sequence corresponding to the terminal state, and finally must be compatible with . We can capture all these constraints concisely by saying that the sampled must be such that , and . Therefore the probability that a trigger agent terminates at some is given by,

    where the first term in the product is the probability that Player plays to and Player gets triggered, and the second term is the probability that the deviation strategy from Player upon getting triggered is one that reaches .

  • is the set of terminal states that are descendant of any sequence in , except . In order for the game to reach this terminal state, recommendations issued to Player 1 by the correlation device must have been such that Player 1 reached . There are two cases: either the correlation device recommended at , or it did not. In the former case, Player 1 started deviating (using the sampled reduced-normal-form plan ); hence, in this case it must be . In the latter case, Player 1 does not deviate from the recommendation, and therefore it must be . Either way, Player 2 must have been recommended the terminal sequence corresponding to the terminal state ; that is, . Collecting all these constraints, it must be

    Using the fact that the two cases as to whether or not Player 1 was recommended or not at are disjoint, we can write

    The first term in the summation may be understood as the probability that the agent was triggered and its deviation was to play something other than . The second term is the probability that the agent was not triggered and the game simply terminates at based on .

  • Finally, is the set of terminal nodes that are neither in nor in . If the game has ended in any terminal state that belongs to , Player 1 has not deviated from the recommended strategy, since they have never even reached the trigger information set, . Hence, in this case it must be . Hence,

With the above, we can finally express the constraint that no deviation strategy can lead to a higher utility for Player 1 than simply following each recommendation. Indeed, for all , the utility of the trigger agent is expressed as

where the correct expression for must be selected depending on whether , or . On the other hand, the utility of an agent that follows all recommendations is

Therefore, following all recommendations is a best response for the -trigger agent if and only if is chosen so that

(1)

The crucial observation is that all the probabilities defined above can be expressed via the following quantities:

For example, for all we can write

When deviations relative to Player 2 are brought into the picture, the following two sets of symmetric quantities also become relevant:

It would now seem natural to perform a change of variables, and pick (correlated) distributions for the random variables and instead of and . Since there are only a polynomial number (in the game tree size) of combinations of arguments for these new random variables, this approach would allow one to remove the redundancy of realization-equivalent normal-form plans and focus on a polynomially-small search space. In the case of the random variables and , it is clear that the change of variables is possible via the sequence form (von Stengel, 2002). Therefore, the only difficulty is in characterizing the space spanned by and as varies across the probability simplex. In two-player perfect-recall games with no chance moves, this is exactly the merit of the landmark work by Von Stengel & Forges (2008). In particular, the authors prove that in those games the space of feasible can be captured by a polynomial number of linear constraints.


The vectors of entries are obtained from via a linear mapping. Hence, the set of values that can be assumed by is the image of the probability simplex via a linear mapping. Since images of polytopes via linear functions are polytopes, the lemma holds.


First observe that the function

is convex since it is the maximum of linear functions of . This suggests that we may perform subgradient descent on . The subgradients are given by

(2)

where is a triplet which maximizes the objective function . Concretely, this may be obtained by computing, for each player and each , the best response with respect to the posterior strategy that Player would expect after having been triggered by . This computation is a straightforward bottom-up traversal of the sequence-form tree of Player , rooted at . We take the maximum over all of and select the trigger agent which results in the largest violation of incentive constraints. This gives us the subgradient expression in (2).
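The general principle used here, that a subgradient of a pointwise maximum of linear functions is the gradient of any maximizing piece, can be sketched on a generic problem. The code below is not the EFCE objective: the matrix `A`, the offsets `b`, the function names, and the step-size schedule are illustrative assumptions.

```python
import numpy as np


def max_linear_value_and_subgradient(A, b, x):
    """Evaluate f(x) = max_i (A[i] @ x + b[i]) and return one subgradient."""
    values = A @ x + b
    i_star = int(np.argmax(values))  # index of a maximizing linear piece
    # The slope of any maximizing piece is a valid subgradient of f at x.
    return float(values[i_star]), A[i_star].copy()


def subgradient_descent(A, b, x0, steps=200, step0=1.0):
    """Unconstrained subgradient descent with step sizes step0 / sqrt(t)."""
    x = np.asarray(x0, dtype=float).copy()
    best_x, best_val = x.copy(), np.inf
    for t in range(1, steps + 1):
        val, g = max_linear_value_and_subgradient(A, b, x)
        if val < best_val:  # subgradient steps are not monotone; keep the best
            best_x, best_val = x.copy(), val
        x = x - (step0 / np.sqrt(t)) * g
    val, _ = max_linear_value_and_subgradient(A, b, x)
    if val < best_val:
        best_x, best_val = x.copy(), val
    return best_x, best_val
```

In the setting of the paper, the argmax over rows would be replaced by the best-response computation over trigger agents, and the iterate would additionally have to be kept feasible.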

Unfortunately, maintaining feasibility (that is, ) is trickier, because projecting onto is challenging. This is so even in games without chance (where can be expressed by a polynomial number of constraints (Von Stengel & Forges, 2008)). We are not aware of any distance generating functions which allow for efficient projection onto this intersection polytope. One would ideally want to avoid projection involving iterative methods like Dykstra’s algorithm, since this would dramatically increase the cost of each iterate.

To overcome this hurdle, we show that in games with no chance moves can be expressed as the intersection of convex polytopes and the non-negative orthant. Specifically, we require the compact representation prescribed by Von Stengel & Forges (2008), which states that may be compactly represented as an embedding within a 2-dimensional matrix, with dimensions equal to the sequence-form (von Stengel, 1996) representation of each player. Here, we are only interested in entries of the matrix which correspond to relevant sequence pairs (see Von Stengel & Forges (2008) for details); this typically results in a much sparser matrix. The constraints for may be shown to be equivalent to the intersection of (a) , which are the sequence form constraints for all relevant-sequence pairs for each row, (b) , the sequence form constraints for all relevant-sequence pairs over each column, and (c) non-negativity constraints and the additional constraint that the empty-sequence pair is equal to .

In this instance, projection (based on L2 distance) on and individually can be decomposed into a series of disjoint projections (either on rows or columns) and thus computed in parallel. It turns out that projection of each individual row/column may also be done efficiently. Let and be matrices and vectors corresponding to the sequence form constraints (Von Stengel & Forges, 2008); that is, is a (typically sparse) matrix containing entries in and is a vector containing ’s or ’s. (Footnote 2: In our implementation, need not have this restriction, but it is included here to be more in line with the classic work of von Stengel (1996).) Given a vector , the projection onto the affine space given by is given by the optimization problem

s.t.

The closed form solution may be found using Lagrange multipliers, and is given by

Since F is sparse, the main difficulty in computing is settled if we can efficiently compute for any vector .

Lemma 3.

Let be the sequence form constraint matrix. Computing may be done efficiently.

Proof.

The key here is to exploit the structure of . Observe that is symmetric, positive-definite and has dimension equal to the number of information sets. Furthermore, may be expressed in closed form:

where above are information sets, and being the parent of means that there is some action in which can lead to information set (without any other information set from the same player in between), and being the sibling of means that the (unique) sequences leading to and are the same. Observe that is almost, but not quite, tree-structured. However, it is sparse and, more importantly, has no fill-in if we order variables in a bottom-up fashion in the player’s game tree. That is, if we treat as a graph with information sets as vertices, then repeatedly removing vertices (information sets) in a bottom-up fashion and forming cliques with all neighbours of the removed vertex does not introduce any new edges. In other words, performing Gaussian elimination on may be done without introducing additional non-zero entries; this means that computing may be done efficiently. Suppose the maximum number of actions that an information set may have is . Then, eliminating a variable will only require time linear in . ∎

Remark.   Lemma 3 and the fact that L2 projections onto sequence form constraints can be done efficiently may be of separate interest to researchers beyond the scope of EFCEs.

Hence, L2-projections onto sequence-form constraints may be done efficiently (assuming is small). In practice, we precompute a sparse Cholesky factor of . From the previous discussion, the Cholesky factors are guaranteed to be sparse and easily stored. With the Cholesky decomposition of , finding becomes straightforward. This precomputation is done once per trigger sequence , since the set of relevant sequence pairs for each trigger sequence (i.e., the location of non-zero entries in the matrix representing ) differs. This precomputation step is trivially parallel. In our experiments, computing the Cholesky factors was rarely the bottleneck (although we do include this timing when evaluating our method).
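As a minimal sketch of the precompute-then-solve pattern described above, the code below projects a vector onto an affine set using a Cholesky factor computed once. It relies on the standard closed form for a Euclidean projection onto {x : Fx = f}, namely v − Fᵀ(FFᵀ)⁻¹(Fv − f). The symbol names `f` and `v`, the function name, and the use of dense NumPy routines in place of a sparse Cholesky factorization are assumptions made for illustration; the constraint matrix is assumed to have full row rank.

```python
import numpy as np


def make_affine_projector(F, f):
    """Return a function projecting onto {x : F x = f} in the L2 sense.

    The Cholesky factor of F F^T is computed once and reused for every
    projection. Dense stand-in for a sparse factorization (assumption).
    """
    F = np.asarray(F, dtype=float)
    f = np.asarray(f, dtype=float)
    L = np.linalg.cholesky(F @ F.T)  # requires F to have full row rank

    def project(v):
        v = np.asarray(v, dtype=float)
        residual = F @ v - f
        # Solve (F F^T) lam = residual through the two triangular factors.
        y = np.linalg.solve(L, residual)
        lam = np.linalg.solve(L.T, y)
        return v - F.T @ lam

    return project
```

A sparse implementation would swap the dense factorization and solves for sparse triangular solves, which is where the no-fill-in property of Lemma 3 would pay off.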

With individual projections onto , easily done, the final step is to perform gradient descent itself. To this end, we make use of a recent algorithm proposed by Wang & Bertsekas (2013), where gradient steps are interlaced with projections onto , , and the non-negative orthant in a cyclical manner. This is similar to projected gradient descent, but instead of projecting onto the intersection of , and the non-negative orthant (which we believe to be difficult), we project onto just one of them (which is easy) at each step, in round-robin fashion. This simple method was shown to converge by Wang & Bertsekas (2013).
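The round-robin idea can be sketched on a three-dimensional toy problem. This is an illustration in the spirit of the scheme, not the algorithm of Wang & Bertsekas (2013) as specified in their paper: the three constraint sets, the quadratic objective, the step-size schedule, and all function names are assumptions chosen so that each projection has a one-line closed form.

```python
import numpy as np


def project_sum_to_one(x):
    """Project onto the hyperplane {x : sum(x) = 1}."""
    return x - (x.sum() - 1.0) / x.size


def project_first_two_equal(x):
    """Project onto the subspace {x : x[0] = x[1]}."""
    y = x.copy()
    y[0] = y[1] = 0.5 * (x[0] + x[1])
    return y


def project_nonnegative(x):
    """Project onto the non-negative orthant."""
    return np.maximum(x, 0.0)


def cyclic_projected_gradient(grad, projections, x0, steps=6000, step0=0.2):
    """Gradient step, then projection onto ONE set, chosen round-robin."""
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, steps + 1):
        x = x - (step0 / np.sqrt(t)) * grad(x)           # diminishing step
        x = projections[(t - 1) % len(projections)](x)   # one cheap projection
    return x


# Toy usage: minimize ||x - c||^2 over the intersection of the three sets.
c = np.array([1.0, 0.0, -1.0])
x_final = cyclic_projected_gradient(
    lambda x: 2.0 * (x - c),
    [project_sum_to_one, project_first_two_equal, project_nonnegative],
    np.zeros(3),
)
```

For this toy instance the constrained minimizer can be worked out by hand to be (0.5, 0.5, 0), and the final iterate should land close to it. Because only one set is enforced per step, the iterate is only approximately feasible at any finite iteration.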


A game of Battleship is parameterized by a tuple , where

  • the integers define the height and width of the playing field for each player;

  • is an ordered list containing ship descriptions for each player. Each description is a pair , where is the length of the -th ship and is its value;

  • is the number of rounds in the game;

  • is a loss multiplier that controls the relative value of losing versus destroying ships.

The game proceeds in two phases: ship placement and shooting. During the ship placement phase, the players (starting with Player 1) take turns placing their ships on their playing field. The players must place all their ships, in the same order in which they appear in , on the playing field. The ship placement phase ends when all ships have been placed. We remark that the players’ playing fields are separate: in other words, there are two playing fields of dimensions , one per player. The ships may be placed either horizontally or vertically on each player’s grid (playing field); all ships must lie entirely within the playing field and may not overlap with other ships the player has already placed. Finally, the locations of a player’s ships are private information for that player.

In the shooting phase, players take turns firing at each other; Player 1 goes first. This is done by selecting a pair of integer coordinates that identify a cell within the playing field. After taking a shot, the player is told if the shot was a hit, that is, the selected cell is occupied by a ship of the opponent, or if it was a miss, that is, the cell does not contain an opponent’s ship. If all cells covered by a ship have been shot at, the ship is destroyed and this fact is announced. Note that the identity of the ship which was hit or sunk is not revealed; players only know that some ship was hit or sunk. The game ends when shots have been made by each player, or if one player has lost all their ships, whichever comes first. At the end of the game, each player’s payoff is computed as follows: for each opponent’s ship that the player has destroyed, the player receives a payoff equal to the value of that ship; for each ship that the player has lost to the opponent, the player incurs a negative payoff equal to , that is, the value of the ship times the loss multiplier . Note that when the game is general sum.

Since , this asymmetric model describes situations where players are encouraged to destroy other ships, but are ultimately more protective of their own assets. The loss multiplier governs this gap; a higher value of makes it so that each player values their ships more than destroying others. Note that when , we obtain a zero-sum version of Battleship (with varying scores for each ship).
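The payoff rule described above can be written as a short function. This is a sketch of the rule as stated in the text; the function and argument names are hypothetical, and each argument is simply the list of values of the ships that the corresponding player destroyed.

```python
def battleship_payoffs(values_sunk_by_p1, values_sunk_by_p2, loss_multiplier):
    """Return (payoff of Player 1, payoff of Player 2).

    Each player gains the value of every opponent ship they destroyed and
    loses the value of every ship of their own that was destroyed, scaled
    by the loss multiplier.
    """
    gained_by_p1 = sum(values_sunk_by_p1)
    gained_by_p2 = sum(values_sunk_by_p2)
    payoff_p1 = gained_by_p1 - loss_multiplier * gained_by_p2
    payoff_p2 = gained_by_p2 - loss_multiplier * gained_by_p1
    return payoff_p1, payoff_p2
```

Adding the two payoffs gives (1 − multiplier) times the total value destroyed, so under this rule the payoffs cancel exactly when the multiplier equals 1, and social welfare is strictly negative whenever a ship is sunk and the multiplier exceeds 1.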

For the remainder of the discussion, we define the social welfare (SW) of any outcome to be the sum of payoffs of each player. We will demonstrate that with the aid of a mediator (the correlation device), the social welfare of the optimal correlated equilibria is dramatically higher than the social welfare of even the best Nash equilibrium. In other words, the mediator leads to significantly less destructive outcomes and to more frequent ties, where the players sometimes agree to deliberately miss their opponents, while still retaining incentive-compatibility and rationality in the standard game-theoretic sense.


We analyze one social-welfare-maximizing EFCE in the same small instance of Battleship as the previous section. The mediator in this EFCE recommends to the players a ship placement that is sampled uniformly at random and independently for each player. This results in possible scenarios (one per possible ship placement) in the game, each occurring with probability . Due to the symmetric nature of ship placements, only two scenarios are relevant: whether the two players are recommended to place their ship in the same spot, or in different spots. Figure 3 details the strategy of the mediator in each of these two scenarios, assuming that the players do not deviate. Note that the game trees in Figure 3 are parametric on the recommended ship placements and ; all possible ship placements can be recovered from Figure 3 by setting and to appropriate values in .

Figure 3: Example of a playthrough of Battleship assuming both players were recommended to place their ship in (left), or that Player and were recommended to place their ships in and respectively (right). For both pictures, the numbers along each edge denote probabilities of each action being recommended; no edge is shown for actions recommended with zero probability. Squares and hexagons denote actions taken by Players 1 and 2 respectively. Similarly, blue and red nodes represent cases where Players 1 and 2 sink their opponent’s ship, respectively. Green leaf nodes are where the game results in no ship loss. The Shoot action is abbreviated to ‘Sh.’

For both game trees, note that the correlation device suggests that Player 1 shoot at Player 2’s ship with a low probability, and deliberately miss with high probability. As hinted in earlier sections, this type of recommendation is key to understanding why the EFCE succeeds in promoting less destructive outcomes. One may wonder why this behavior is incentive-compatible (that is, what are the incentives that compel Player 1 into not defecting), since the player may choose to randomly fire at any of the 2 locations that were not recommended, and get almost chance of winning the game immediately. The key is that if Player 1 does so and does not hit the opponent’s ship, then the mediator can punish him by recommending that Player 2 shoot at the location of Player 1’s ship. Since players value their own ships more than destroying their opponent’s, the player is incentivized to avoid such a situation by accepting the recommendation to (most probably) miss.

A similar situation arises in the first move of Player 2. Here, Player 2 is recommended to deliberately miss, hitting each of the 2 empty spots with probability . If he deviates and attempts to destroy Player 1’s ship, then he risks the mediator revealing his location to his opponent if his shot misses; this risk is enough to keep Player 2 ‘in line’. The second move of Player 1 (third shot of the full game) follows a similar idea. Here, Player 1 is recommended to hit Player 2’s ship with probability . Similar to his first shot, Player 1 may deviate and fire at the remaining location and enjoy chance of winning the game outright. Yet, this behavior is discouraged, since if he misses the shot (i.e., the recommendation was, in fact, the correct location of Player 2’s ship), then his location would be revealed by the mediator and he would lose in the next round. Again, this threat from the mediator encourages peaceful behavior, even though the recommendation to Player 1 reveals a more accurate ‘posterior’ of Player 2’s ship location, as compared to the uniform distribution of . While making these recommendations, the mediator ensures that Player 2 has a uniform distribution over Player 1’s ship location, meaning that even though Player 2 has the final move, he cannot do better than guessing uniformly at random at this stage.


It is important to note that Figure 3 does not convey the full information of the correlated plans. Crucially, it does not show the consequences suffered if a player deviates from his recommended strategy: in this case, the deviating player stops receiving recommendations and risks having his ship’s location revealed to the opponent. These ‘counterfactual’ scenarios may be counterintuitive but are key to understanding how SW-maximizing EFCEs achieve their purpose.


The Sheriff game is described by the parameters . The parameters describe the value of each illegal item, the penalty that the Smuggler has to pay for each discovered illegal item, and the compensation that the Sheriff pays to the Smuggler in the case of a false alarm. At the beginning of the game, the Smuggler loads items into his cargo. The amount of goods loaded is unknown to the Sheriff. The game then proceeds for rounds of bargaining. Each round comprises two steps. First, the Smuggler offers a bribe to the Sheriff, where is the round of bargaining. After that, the Sheriff responds with ‘Yes’ or ‘No’.

All actions are public knowledge, except for the selection of cargo contents, which only the Smuggler knows. In the final step, we compute the payoffs to players. The outcome of the game is decided by the last step of bargaining. In particular, the first rounds of bargaining have no explicit bearing on the outcome of the game, except for purposes of coordination. The payoffs for each outcome are:

  1. Sheriff accepts the bribe. The Smuggler gets , and the Sheriff gets the bribe offered .

  2. Sheriff inspects and discovers illegal items. The Smuggler is fined and gets a payoff of while the Sheriff gets a payoff of .

  3. Sheriff chooses to inspect and does not find illegal items. The Smuggler receives a compensation of , while the Sheriff gets .

The objective of the mediator is to maximize social welfare in the space of EFCEs. Ideally, this will involve the Smuggler bringing in goods and the Sheriff accepting bribes; any other outcome would simply be zero-sum, since no goods will be successfully smuggled and money only changes hands between players. A qualitative description of the welfare-maximizing equilibrium is not obvious, since the game contains elements of both lying and bargaining.


The communication in the bargaining steps is similar to that in cheap talk (Crawford & Sobel, 1982), where costless and non-binding signals are transmitted between players. However, in our setting, the signals are transmitted in the middle of the game as opposed to just at the beginning. More importantly, the presence of the mediator during the bargaining phase gives the signals additional uses: in particular, the mediator may be able to take punitive measures against players who deviate from recommendations, since future recommendations will be withheld from them. The importance of this will be illustrated later.

(4.00, 1.00) (4.00, 1.00) (4.00, 1.00) (4.00, 1.00)
(1.24, 0.19) (4.00, 1.00) (4.00, 1.00) (4.00, 1.00)
(0.89, 0.11) (1.11, 1.00) (4.00, 1.00) (4.00, 1.00)
(0.82, 0.00) (0.84, 1.00) (3.62, 1.00) (4.00, 1.00)
Table 3: Payoffs for (Smuggler, Sheriff) when players play according to the SW-maximizing EFCE in the Sheriff game with (right).

We illustrate the effect of the non-consequential bribes with two small settings, where . Examples of SW-maximizing equilibria are shown in Figure 4 and Figure 5. (As with the analysis of Battleship, note that these figures only show interactions of players on the equilibrium path; that is, the graphs omit what would happen if some player deviated.)

Figure 4: Example of a playthrough of the Sheriff game with . Edge labels correspond to action probabilities, edges with probability are omitted. Squares and hexagons denote actions taken by Players 1 and 2 respectively, while green and red nodes denote the Sheriff choosing to pass or inspect.
Figure 5: Example of a playthrough of the Sheriff game with . Edge labels correspond to action probabilities, edges with probability are omitted. Squares and hexagons denote actions taken by Players 1 and 2 respectively, while green and red nodes denote the Sheriff choosing to pass or inspect.

The SW-maximizing EFCE yields payoffs of and for and respectively. We will first consider the case where (Figure 5). Here, what happens along the equilibrium path is straightforward. The Smuggler loads or items with equal probability. Next, he offers a (non-consequential) bribe of either , , or . Then, he receives some feedback of , and proceeds to offer a bribe of , which the Sheriff gladly accepts. The payoffs to the players are and depending on whether the Smuggler was recommended to load or items, leading to an average payoff of .

The underlying mechanism is in fact fairly straightforward and mirrors the idea in the modified signaling game of Von Stengel & Forges (2008). Assume that a random number is chosen uniformly from . This acts as a ‘passcode’ which the Sheriff expects from the Smuggler in the first round. This passcode forms part of the correlated plan, and will eventually be revealed to the Smuggler assuming he did not deviate when selecting the number of illegal items (recall that the sequential nature of the EFCE means that the recommended amount to bribe is not revealed until the Smuggler loads the cargo with the recommended number of items). In other words, the first (non-consequential) bribe may be used as a signal which hints to the Sheriff whether the Smuggler has deviated: if it is not equal to the passcode, the Smuggler must have deviated somewhere. On the other hand, a deviating Smuggler may successfully guess the passcode with probability no greater than ; if the number of signals is sufficiently large, then it is nearly impossible to guess the code. Using these tools, the mediator is able to engineer a ‘deviation detector’ which checks if the Smuggler ever deviated. Note, however, that unlike in the signaling game, the Sheriff is not able to glean exactly what was recommended (in this case, the number of items in the cargo); he is only able to deduce whether the player deviated from the recommendation (in this case, this would be to load either or items).
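
The passcode argument can be checked numerically with a minimal sketch. It assumes a simplified model that is not spelled out in this text: the passcode is drawn uniformly from `num_codes` possible signals, and a deviating Smuggler, who never sees the passcode, guesses uniformly at random. The function names and the parameter `num_codes` are hypothetical.

```python
import random


def simulated_detection_rate(num_codes, trials=100_000, seed=0):
    """Estimate how often a blind guess fails to match the passcode.

    ASSUMPTION: both the mediator's passcode and the deviator's guess
    are uniform over range(num_codes) and independent of each other.
    """
    rng = random.Random(seed)
    caught = 0
    for _ in range(trials):
        passcode = rng.randrange(num_codes)  # secret part of the plan
        guess = rng.randrange(num_codes)     # deviator's first bribe
        if guess != passcode:
            caught += 1                      # Sheriff sees a mismatch
    return caught / trials


def exact_detection_rate(num_codes):
    """Closed form under the same assumption: 1 - 1/num_codes."""
    return 1 - 1 / num_codes


if __name__ == "__main__":
    for k in (2, 4, 16):
        print(k, exact_detection_rate(k), simulated_detection_rate(k))
```

In this simplified model the detection rate approaches 1 as the number of available signals grows, which matches the remark that guessing the code becomes nearly impossible when many signals are available.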

Issuing threats to the Smuggler becomes straightforward with this deviation detector. If the Sheriff knows the Smuggler is lying, he employs a ‘grim trigger’ for the rest of the game: in this case, the Sheriff opts to inspect the Smuggler’s cargo, regardless of the bribe offered in the second round. The Smuggler could also be pretending to bring in illegal goods, i.e., by loading items and hoping that he would guess the incorrect passcode, resulting in the Sheriff making a false accusation. However, because the Smuggler’s payoff for deceiving the Sheriff in this manner is just , he remains incentivized to stick to the recommendations, which guarantees him a payoff of either of .

We now make the following hypotheses. First, the effect of additional bargaining rounds is that the chance of randomly guessing the passcode is reduced. If there are rounds, then there are different possible signals that the Smuggler could have sent to the Sheriff through the first rounds. When , this class of correlation plans fails since the bribe by the Smuggler serves both as the answer to the ‘secret question’ and as the actual bribe to be offered. This aliasing of roles is what leads to a lower payoff; the risk of sending an incorrect passcode is not sufficiently high to dissuade the Smuggler from deviating.


The introduction of the correlation device changes the information structure of the game: in addition to the public moves of nature and the opponent, each player now also observes the private recommendations issued by the correlation device. In this finer information partition, the player’s decision points are of the form , where is an information set for the player in the original (that is, without correlation device) game, and are the recommendations that the player has received so far; in particular, is the action that was recommended at information set . One crucial observation about EFCEs is that given and the last recommendation observed by the player, all other recommendations are implied. Indeed, the fact that is received at all implies that the player has always followed the recommendations up until at least information set in the original game tree. Hence, necessarily