
# On the commitment value and commitment optimal strategies in bimatrix games

## Abstract

Given a bimatrix game, the associated leadership or commitment games are defined as the games in which one player, the leader, commits to a (possibly mixed) strategy and the other player, the follower, chooses his strategy after having observed the irrevocable commitment of the leader. Based on a result by von Stengel and Zamir (2010), the notions of commitment value and commitment optimal strategies for each player are discussed as a possible solution concept. It is shown that in non-degenerate bimatrix games (a) pure commitment optimal strategies together with the follower’s best response constitute Nash equilibria, and (b) strategies that participate in a completely mixed Nash equilibrium are strictly worse than commitment optimal strategies, provided they are not matrix game optimal. For various classes of bimatrix games that generalize zero-sum games, the relationship between the maximin value of the leader’s payoff matrix, the Nash equilibrium payoff and the commitment optimal value is discussed. For the Traveler’s Dilemma, the commitment optimal strategy and commitment value for the leader are evaluated and seem more acceptable as a solution than the unique Nash equilibrium. Finally, the relationship between commitment optimal strategies and Nash equilibria in bimatrix games is thoroughly examined and, in addition, necessary and sufficient conditions are provided for the follower to be worse off at the equilibrium of the leadership game than at any Nash equilibrium of the simultaneous move game.

Keywords: Bimatrix Game, Nash Equilibrium, Subgame Perfect, Commitment Optimal, Commitment Value, Weakly Unilaterally Competitive Games, Pure Strategy Equilibrium.
JEL Classification: C72. AMS 2010 Subject Classification: Primary 91A05; Secondary 91A10, 91A40.

## 1 Introduction

In the 1920’s, when trying to formalize zero-sum games and propose a solution concept for them, both E. Borel and J. von Neumann approached two-person zero-sum games from an optimization point of view. As payoff functions depend on both players’ actions, direct optimization of a player’s payoff does not make sense, and both Borel and von Neumann reached the concept of the players’ security or safety level. It was defined as the best among worst possible outcomes for that player and was to be taken as the “value of the game” for each player. This formulation led to the famous minimax theorem, which, stating that the values of both players coincide, established the common value as the indisputable solution concept for these games.

To present this concept in their classic book, J. von Neumann and O. Morgenstern proposed two auxiliary games: the “minorant” game, in which player I chooses his mixed strategy first, and then II, in full knowledge of that strategy (but not of its realization), chooses his own mixed strategy, and the “majorant” game, in which the order of the players’ moves is reversed. This scheme was proposed so that the optimization of each player’s utility would make sense:

The introduction of these two games achieves this: It ought to be evident by common sense – and we shall also establish it by an exact discussion – that for these games, the “best way of playing” – i.e. the concept of rational behavior – has a clear meaning.

Hence, to be able to use individual rationality (i.e. the maximization of a player’s utility) in deriving a solution, a leader-follower scheme was utilized and the notion of a common safety (or security) level as the solution evolved naturally from optimality considerations.

However, generalizing this approach to two-person, non zero-sum, non-cooperative games came to a dead end. The reason is that implicit in the Borel-von Neumann approach are three different points of view which do not necessarily agree on (non zero-sum) bimatrix games. The first point of view is that of the players optimizing against the worst possible outcome. For zero-sum games, individual rationality of the opponent coincides with assuming that he is out there to destroy us, but this assumption seems unreasonable for non zero-sum games. The second point of view is that of the players optimizing against the mixed strategy of each other simultaneously, where both mixed strategies are considered to be known. Although this seemed difficult to accept from the optimality point of view, it was this approach that later led to the widely accepted Nash equilibrium concept, which addressed the problem from an equilibrium rather than an optimization perspective. The third point of view is that of the players optimizing in a leader-follower sequence, i.e. the follower optimizing against a known mixed strategy of the leader, thus obtaining a “value of the game” for the leader. Applied to bimatrix games, and assuming that irrevocable commitment to mixed strategies by the players is possible, the objections one may raise to the third point of view are, firstly, what factors determine the order of play for the players and, secondly, how is the leader going to optimize if the follower has a non-unique best response?

The problems that emerge in the leader-follower approach, as well as the comparison of simultaneous and sequential move versions of the same underlying game, the latter under various assumptions on the sense of commitment of the leader, have been a topic of continuous research in the game theory literature. What these comparisons actually do is, more or less, investigate the relationship between the three “points of view” discussed above, often for specific classes of games. Since this literature has a wide variety of themes, we refer only to papers related to the particular questions examined in the present paper. All such references are cited in detail and in the context of each particular question raised in the main text. However, even if it is not our topic, a small discussion of the rationale behind the leader-follower approach in bimatrix games and of the two objections mentioned above should be helpful in appreciating the rest of this paper.

Rosenthal (1991) examines normal form bimatrix games and related sequential versions, where one of the players commits to any of his (mixed) strategies and this commitment becomes known to the other player prior to his strategy selection. Rosenthal defines commitment-robust equilibria to be Nash equilibria of the simultaneous move game that are subgame perfect equilibria of both the related sequential move versions and argues that Nash equilibria that fail this property ought to be questionable, if there is sufficient flexibility in the rules of the game.

In an attempt to deal with the choice of the leader/follower, i.e. the first objection noted above, Hamilton and Slutsky (1993) propose a two stage generated model for bimatrix games that determines “endogenously” whether the game will be played sequentially and, in that case, the ordering of the two players. Using this model for non-degenerate bimatrix games, van Damme and Hurkens (1996) address a question related to that raised by Rosenthal (1991): when is an equilibrium of the original game an equilibrium of the generated game also? Such equilibria are termed “viable”. They prove that an equilibrium of the original game in mixed strategies is viable if and only if no player has an incentive to move first in the ordinary, sequential move, commitment game. It is clear that the Hamilton-Slutsky two stage generated model was chosen by van Damme and Hurkens as a tool to deal with the ordering of the players. However, although the imposition of a super-game over the original one, with its own rules and assumptions, leads to interesting results, it is questionable whether it actually resolves the issue and should be preferable to the original setup. It is our opinion that an answer that would settle this objection conclusively has yet to be proposed.

The second objection raised when one approaches bimatrix games via a leader-follower scheme, namely that of the leader’s choice when the follower has more than one best response to the leader’s strategy, has been settled by von Stengel and Zamir (2010). Prior to that publication, the prevailing approach was to consider either the case where the follower chooses his best response so as to accommodate the leader or (on the contrary) the case where he chooses his best response so as to harm the leader. It is questionable whether either of these approaches agrees with the principle of individual rationality of the follower and, in any case, they leave the question of real play for the leader open: how could his play depend on the assumption that the follower would play in this or that fashion? Certainly, the question pertains to degenerate bimatrix games, but still, ignoring degeneracy is not a satisfactory answer. von Stengel and Zamir (2010) show that when a bimatrix game is played sequentially, with the mixed strategy of the leader being observed by the follower before he makes his move, all subgame perfect equilibria payoffs for the leader form a closed interval of the form [αL, αH], where αL is a payoff that the leader can guarantee (i.e. he can induce the follower to give a best response that results in αL). This interval collapses to a point when the bimatrix game is non-degenerate.

An important implication of their result is that, for bimatrix games, αL is precisely the safety level of the leader, i.e. the “value” of the minorant game or majorant game (according to who plays first) in the von Neumann and Morgenstern (1953) sense. However, these two values will generally not be the same and, of course, a player’s payoff at the inducible subgame perfect equilibrium will differ according to whether he plays first or second. We shall call the safety level of the leader in the minorant game (resp. majorant game) the commitment value for player I (resp. player II), and any strategy of the leader that guarantees him his commitment value commitment optimal.

Based on these observations, there are three notions whose relationship needs to be studied further: matrix game values and corresponding maximin strategies in the individual matrix games, Nash equilibria and their payoffs in the simultaneous move bimatrix game and, finally, commitment values and commitment optimal strategies in the minorant/majorant games.

In the present paper, we investigate these relationships. Our results are far from complete; however, we obtain some interesting characterizations. Firstly, we derive two general properties of Nash equilibria and commitment optimal strategies. We show that when the leader has a pure commitment optimal strategy in a non-degenerate leader-follower game, then this strategy together with the follower’s best response forms a Nash equilibrium in pure strategies of the underlying bimatrix game. Hence, if he is to improve upon his Nash equilibria payoffs, the leader must use mixed strategies in the non-degenerate leader-follower game, which underlines the importance of mixed strategies per se. This result is not true in the case of degeneracy. Also, we show that in a non-degenerate bimatrix game a player can strictly improve his payoff at a completely mixed Nash equilibrium by commitment, provided his Nash equilibrium strategy is not matrix game optimal (i.e. maximin).

Secondly, taking player I to be the leader (without loss of generality) and letting vA denote the value of the leader’s matrix game, we discuss the validity of the equation vA = αL for various classes of bimatrix games that are viewed as generalizations of zero-sum games. Of course, this equation is true for zero-sum games. We show that the equation obtains for weakly unilaterally competitive games, but is false for other generalizations of zero-sum games, such as almost strictly competitive games, pre-tight games, games best response equivalent to zero-sum games, and strategically zero-sum games. Of particular interest is the case where the payoff at all Nash equilibria of a bimatrix game is equal to the value of the leader’s matrix game but less than his commitment value. The Nash equilibrium is a questionable solution concept for such games.

Thirdly, we discuss the game known as the Traveler’s Dilemma (TrD; see the pertinent section for references), which is a typical example of a bimatrix game whose unique Nash equilibrium is compelling under standard equilibrium arguments (e.g. domination of strategies) but unattractive under both optimality considerations and common sense, and also unsupported by experimental game theory. For the TrD, we derive the commitment value and the commitment optimal strategy of the players (the same for both due to symmetry), which are very close both to the Pareto optimal outcome and to the behavior of game participants in experiments. It is noteworthy that the follower’s payoff is then identical to that of the leader and thus, the symmetry of the game is preserved by the leader-follower solution. TrD is an example of a game where the payoff at all Nash equilibria equals the matrix game value of the leader, which is strictly less than his commitment value.
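
As a concrete, hedged illustration of the commitment effect in the TrD, the following sketch assumes the standard textbook formulation — claims in {2, …, 100} and reward/penalty 2; these parameters are assumptions and may differ from the formulation used later in the paper. It computes the follower’s pure best response to a pure commitment of the leader:

```python
# Traveler's Dilemma sketch. ASSUMED parameters: claims in {2,...,100},
# reward/penalty R = 2 (the paper's exact formulation may differ).
R = 2
CLAIMS = range(2, 101)

def payoff(mine, other):
    """Payoff of the player claiming `mine` against a claim `other`."""
    if mine < other:
        return mine + R   # lower claimant gets his claim plus the reward
    if mine > other:
        return other - R  # higher claimant gets the lower claim minus R
    return mine           # equal claims: both receive the common claim

def follower_best_response(leader_claim):
    """Pure best response of the follower to a pure committed claim."""
    return max(CLAIMS, key=lambda j: payoff(j, leader_claim))
```

Under these assumed parameters, committing to the claim 100 makes 99 the follower’s best response, giving the leader `payoff(100, 99) = 97` — far above the payoff 2 of the unique Nash equilibrium (2, 2). The actual commitment optimal strategy derived in section 4.5 is mixed, so this pure commitment is only a lower-bound illustration.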

Finally, for the case of 2 × 2 bimatrix games, we exhaustively examine the relationship between commitment optimal strategies of the leader, optimal responses of the follower and Nash equilibrium strategies of both. We also provide necessary and sufficient conditions for the follower to be worse off at the equilibrium of the leadership game than at any Nash equilibrium of the simultaneous move game.

### 1.1 Outline

The rest of the paper is structured as follows. In section 2, formal definitions and notation for bimatrix games and their associated leadership games are given. In addition, some results from von Stengel and Zamir (2010) that are used subsequently are presented. Section 3 derives properties of and compares matrix game values, Nash equilibria payoffs and commitment values. In section 4, special classes of bimatrix games that have been proposed as generalizations of zero-sum games are examined with respect to the relationship between maximin, commitment optimal and Nash equilibria strategies. In section 4.5, we discuss the Traveler’s Dilemma. Finally, section 5 is devoted to 2 × 2 bimatrix games, discussing the relationship between commitment optimal strategies and Nash equilibrium strategies, and also comparing the payoffs of the follower in these two cases.

## 2 Definitions

We consider the mixed extension of an m × n bimatrix game (A, B) played by players I and II. The sets of pure strategies of the players are I = {1, …, m} and J = {m + 1, …, m + n} respectively. A pure strategy of player I will be denoted by si or simply by i, for i = 1, …, m, when confusion may not arise. Similarly, a pure strategy of player II will be denoted by sj or simply by j, for j = m + 1, …, m + n. The sets of mixed strategies of player I (resp. player II) will be denoted by X (resp. Y), where X and Y are the (m − 1)-dimensional and (n − 1)-dimensional probability simplexes. For x ∈ X and y ∈ Y, the payoffs of player I and II are given by α(x, y) = x⊤Ay and β(x, y) = x⊤By. The value of the matrix game A of player I will be denoted by vA. It could be considered as I’s safety level, since I may guarantee the value of the matrix game no matter what strategy player II chooses; however we will avoid this interpretation in view of our previous discussion on commitment value. Of course, vA = max_{x ∈ X} min_{y ∈ Y} α(x, y). The corresponding quantity for player II is vB, the value of the matrix game B. Given a strategy x of player I, a strategy y of player II is a best response to x if β(x, y) ≥ β(x, y′) for all y′ ∈ Y. In symbols, y ∈ BRII(x). Note that BRII(x) always contains a pure strategy. A strategy j of player II is strongly dominated by a strategy y if β(x, j) < β(x, y) for all x ∈ X, and weakly dominated by y if β(x, j) ≤ β(x, y) for all x ∈ X, with strict inequality for at least one x.
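
To fix ideas, here is a minimal computational sketch of these objects for a small hypothetical 3 × 2 game (the matrices are illustrative, not from the paper); vA is approximated by a grid search over the simplex, which suffices for illustration:

```python
# Hypothetical 3x2 bimatrix game illustrating alpha, beta, BR_II and v_A.
A = [[3, 0],
     [0, 2],
     [1, 1]]
B = [[1, 0],
     [0, 2],
     [0, 0]]

def alpha(x, y):
    """Player I's expected payoff x^T A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(2))

def beta(x, y):
    """Player II's expected payoff x^T B y."""
    return sum(x[i] * B[i][j] * y[j] for i in range(3) for j in range(2))

def pure(k, n):
    """The k-th vertex of the (n-1)-dimensional simplex."""
    return [1.0 if i == k else 0.0 for i in range(n)]

def BR_II(x):
    """Indices of player II's pure best replies against x."""
    vals = [beta(x, pure(j, 2)) for j in range(2)]
    return [j for j in range(2) if vals[j] == max(vals)]

def v_A(grid=200):
    """Grid approximation of v_A = max_x min_y alpha(x, y); the inner min
    is attained at a pure strategy, so only columns are scanned."""
    best = -float("inf")
    for p in range(grid + 1):
        for q in range(grid + 1 - p):
            x = [p / grid, q / grid, (grid - p - q) / grid]
            best = max(best, min(alpha(x, pure(j, 2)) for j in range(2)))
    return best
```

Here BR_II of the pure strategy s3 contains both columns (row 3 of B is (0, 0)), and vA = 1.2, attained by mixing the first two rows with weights (0.4, 0.6).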

For any j ∈ J, the best reply region X(j) of j is the set of mixed strategies of player I against which j is a best reply, i.e. X(j) = {x ∈ X : j ∈ BRII(x)}. The sets Y(i), for i ∈ I, are defined similarly. The edges of X(j) are denoted by C(j). By standard convexity arguments, the sets X(j) are (possibly empty) polytopes whose union is X in any bimatrix game. We say that X(j) is full-dimensional if there is x̂ ∈ X(j) such that x ∈ X(j) for all x in a neighborhood of x̂, i.e. if the interior of X(j) is not empty. Let D = {j ∈ J : X(j) is full-dimensional}. von Stengel and Zamir (2004) show that j ∈ D if and only if j is not (weakly) dominated. For j ∈ J, let E(j) = {k ∈ J : β(x, k) = β(x, j) for all x ∈ X}, i.e. E(j) denotes the set of all pure strategies k of player II that are payoff equivalent to j.

Given a bimatrix game (A, B), we consider two associated leadership (leader-follower) or commitment games, denoted by ΓI and ΓII. We define ΓI to be the game in which player I is the leader, i.e. he moves first and commits to a strategy, possibly mixed, and player II is the follower, i.e. he moves second, after having observed the strategy choice of player I. Formally, the strategy set of player I in ΓI is X, as in (A, B), while the strategy set of player II is the set of measurable functions f : X → Y. The payoffs of the players in ΓI are determined by their payoff functions in (A, B), that is α(x, f(x)) for player I and β(x, f(x)) for player II. When considering best replies of player II, we will usually restrict attention to pure strategies, since there is no need for the follower to employ a mixed strategy. Similarly, we define ΓII as the game where player II moves first and player I moves second. In what follows, we use ΓI with player I as the leader in a generic fashion but, unless stated otherwise, results apply also to ΓII.

A bimatrix game is non-degenerate if no mixed strategy of any player has more pure best replies than the size of its support. When considering the game where player I is the leader, we will need this property to hold only for mixed strategies of the player that moves first, i.e. player I. This motivates the following definition.

###### Definition 2.1.

A bimatrix game is non-degenerate for player i, for i = I, II, if no mixed strategy of player i has more pure best replies among the strategies of the other player than the size of its support.

For the leadership games, we use the subgame perfect equilibrium as a solution concept, under which an optimal strategy of the follower prescribes a best reply to any strategy x of the leader. For the simultaneous move game (A, B), we will be interested in Nash equilibria and, in some cases, in correlated equilibria (Aumann (1974)) or in coarse correlated equilibria (Moulin and Vial (1978)).

A Nash equilibrium strategy profile will be denoted by (x*, y*), with x* ∈ X and y* ∈ Y. The set of all Nash equilibria strategy profiles of the bimatrix game will be denoted by NE(A, B) and the set of all Nash equilibria payoffs by NEP(A, B). The payoffs of player I and II at a Nash equilibrium (x*, y*) will be denoted by α(x*, y*) and β(x*, y*) respectively. Finally, we write XNE (resp. YNE) for the set of all strategies of player I (resp. II) that participate in some Nash equilibrium of the simultaneous move game.

### 2.1 Existing results: equilibrium payoffs of the leader

The present work builds upon results that appeared recently in the literature, some of which we present here. von Stengel and Zamir (2010) prove that in a degenerate bimatrix game, the subgame perfect equilibria payoffs of the leader form an interval [αL, αH]. The lowest leader equilibrium payoff αL is given by the expression

 αL = max_{j ∈ D} max_{x ∈ X(j)} min_{k ∈ E(j)} α(x, k)    (1)

and the highest leader equilibrium payoff is given by

 αH = max_{x ∈ X} max_{j ∈ BRII(x)} α(x, j) = max_{j ∈ J} max_{x ∈ X(j)} α(x, j)    (2)

If the game is non-degenerate (or non-degenerate for the leader), then the leader has a unique subgame perfect equilibrium payoff in the leadership game. In this case, the expressions in (1) and (2) coincide. In fact, less than non-degeneracy is required for the equality (hence uniqueness) to hold. If every best reply region is full-dimensional or empty and if there are no payoff equivalent strategies, i.e. if E(j) = {j} for any j ∈ J, then expression (1) yields the same value as expression (2), i.e. αL = αH. See also Example 3.5 in section 3.2 for such a case.

For the non-degenerate case, the intuition behind deriving the unique leader equilibrium payoff is the following. The leader commits to a strategy and the follower gives a best response, which the leader may force to be the best possible for him among player II’s best responses. This follows from an ε-argument. As a sketch, for any strategy that admits more than one pure best reply, player I may sacrifice an ε and move to a nearby strategy that admits a unique pure best reply. von Stengel and Zamir (2004) call such a strategy of the follower inducible, in the sense that by sacrificing ε the leader may induce the follower to use it. By the non-degeneracy property, this can be done for any pure best reply of the follower against x, for any x ∈ X. In equilibrium there is no sacrifice and the result obtains.
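
The ε-argument can be checked numerically on the game of Example 3.1 below (matrices reproduced in the snippet, as reconstructed there): at x = (1/2, 1/2, 0) the follower is indifferent between his first two pure strategies, and a small shift of weight makes the leader’s preferred reply unique at a cost of ε:

```python
# Epsilon-argument sketch for the game of Example 3.1.
A = [[4, 1, 0], [3, 2, 0], [0, 0, 3.5]]
B = [[1, 2, 0], [4, 3, 0], [0, 0, 1]]

def alpha_col(x, j):
    """Leader's payoff against the follower's pure strategy (column) j."""
    return sum(x[i] * A[i][j] for i in range(3))

def beta_col(x, j):
    """Follower's payoff from column j against the commitment x."""
    return sum(x[i] * B[i][j] for i in range(3))

def best_replies(x, tol=1e-12):
    """Pure best replies of the follower, up to a numerical tolerance."""
    vals = [beta_col(x, j) for j in range(3)]
    return [j for j in range(3) if vals[j] > max(vals) - tol]

x = [0.5, 0.5, 0.0]                   # columns 1 and 2 are both best replies
eps = 1e-3
x_eps = [0.5 - eps, 0.5 + eps, 0.0]   # shift weight toward row 2
# At x_eps column 1 (index 0) is the unique best reply, and the leader's
# payoff alpha_col(x_eps, 0) = 3.5 - eps tends to 3.5 as eps -> 0.
```

The snippet is only a numerical companion to the sketch in the text; in equilibrium the sacrifice ε vanishes and the leader obtains exactly 3.5.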

In non-degenerate bimatrix games, the unique leader equilibrium payoff is identified as the highest payoff for that player on an edge of his part of the Lemke-Howson diagram (meaning the part of the diagram on this player’s simplex).

In games that are degenerate, or at least degenerate for the leader (cf. definition 2.1), the reason for the possible difference between the lowest (αL) and highest (αH) leader equilibrium payoffs is that not all pure best replies of the follower may be inducible. This is so for weakly dominated strategies, whose best reply regions are not full-dimensional, and for payoff equivalent strategies, whose best reply regions fully coincide. Again, αH corresponds to the best possible subgame perfect equilibrium payoff for the leader. To obtain the lowest leader payoff αL, von Stengel and Zamir (2004) reduce the original game to a game that is non-degenerate for the leader, by ignoring his payoffs against weakly dominated strategies and solving for his safety level against payoff equivalent strategies of the follower. The unique leader equilibrium payoff in this reduced, non-degenerate (for the leader) game is now the lowest payoff that he may guarantee in the original leadership game.

If l denotes the lowest and h the highest Nash equilibrium payoff of player I in (A, B), von Stengel and Zamir (2010) show that l ≤ αL and h ≤ αH. Together with the trivial inequality vA ≤ l, their result establishes lower and upper bounds for the Nash equilibria payoffs of a player. So, in degenerate games

 vA ≤ l ≤ αL,    vA ≤ l ≤ h ≤ αH    (3)

while for non-degenerate games, since αL = αH, (3) simplifies to

 vA ≤ l ≤ h ≤ αL    (4)

We have already noted that αL should not be viewed merely as the lowest subgame perfect equilibrium payoff for the leader, but rather as his commitment value in the leadership game, i.e. a payoff that he may guarantee assuming the other player is rational (a utility maximizer).

### 2.2 Motivation: Safety levels and Nash equilibria payoffs

Hence, applied to bimatrix games, the optimization point of view leads to αL as a candidate for the generalization of the notion of safety level. As we saw, von Neumann and Morgenstern (1953) take the leader-follower approach to discuss the safety level in matrix games, for which of course vA = αL = αH. At first sight, the payoff αL refers to a different game than the payoffs with which we want to compare it. However, as Conitzer (2016) argues, the leadership equilibrium should be viewed as a distinct solution concept for the game (A, B) itself and not as an application of the Nash equilibrium concept on the associated leadership game. So, one wonders whether a theory can be developed for the solution of non zero-sum games which will not originate from the equilibrium point of view but will be based on the leader-follower approach, which is generated from the optimization point of view. In that case, αL will play a central role for the case of bimatrix games.

Based on our discussion thus far and on equations (3) and (4), one is motivated to raise certain questions. Firstly, is it possible to characterize all bimatrix games for which h ≤ αL? This inequality, which already holds for non-degenerate games, leads to discarding all Nash equilibria if the rules of the game permit commitment, as is often the case. Certainly, in that case a new problem appears: How will the leader be chosen among the two players? Secondly, for which classes of bimatrix games does equation (4) collapse to vA = αL, i.e. for which classes may the leader not guarantee more than the safety level of his payoff matrix?

## 3 Nash Equilibria, maximin strategies and commitment optimal strategies

Along with the leader payoffs at subgame perfect equilibria of the leadership game, we want to study his equilibrium strategies. In agreement with the notation αL and αH, we denote by xL and xH strategies that the leader uses to attain these payoffs. However, xL and xH may be non-unique.

###### Example 3.1.

In the bimatrix game with payoff matrices

 A = ⎛ 4    1    0  ⎞        B = ⎛ 1    2    0 ⎞
     ⎜ 3    2    0  ⎟            ⎜ 4    3    0 ⎟
     ⎝ 0    0   3.5 ⎠            ⎝ 0    0    1 ⎠

Here αL = αH = 3.5, but this payoff can be achieved by player I (the leader) in the leadership game with two different commitment strategies: either x = (1/2, 1/2, 0), which induces the follower to play his first pure strategy, or x = (0, 0, 1), which induces the follower to play his third pure strategy. This is despite the fact that the game is non-degenerate.

To proceed, we give a formal definition of commitment value and commitment optimal strategies.

###### Definition 3.2.

Let (A, B) be a bimatrix game and let ΓI and ΓII be the associated leadership (leader-follower) or commitment games, where player I is the leader in ΓI and player II is the leader in ΓII. Then, the leader’s inducible lowest subgame perfect equilibrium payoff in ΓI (resp. in ΓII) will be called the commitment value for player I (resp. player II). A strategy that guarantees a player his commitment value will be called commitment optimal.

As the example above shows, the set of commitment optimal strategies for a player need not be a singleton. We will take player I as the “default” leader and we will denote by XC the set of his commitment optimal strategies, with generic element xC. A pure strategy that the leader may induce the follower to use when playing commitment optimally with xC will be denoted by jF, and the corresponding payoff of the follower will be denoted by β(xC, jF) or simply βF, i.e. βF = β(xC, jF), where jF ∈ BRII(xC) is such that α(xC, jF) = αL. Notice that at different strategy pairs (xC, jF) the payoff of the leader is constant and equal to αL, but the follower may obtain different payoffs βF.

### 3.1 Monotonicity of the bounds: matrix game value, commitment value and equilibria payoffs

We start with the observation that the lower and upper bounds in relations (3) and (4), i.e. vA and αL (resp. αH), exhibit certain monotonicity relations to the sizes of the pure strategy spaces (i.e. the numbers of pure strategies) of the players. The proof is immediate from the definitions and thus omitted.

###### Lemma 3.3.

Let m and n be the numbers of pure strategies of players I and II respectively in a bimatrix game (A, B). Then

1. The value vA of the matrix game A of player I is a non-decreasing function of m and a non-increasing function of n.

2. The lowest and highest leader payoffs αL and αH of player I are non-decreasing functions of m, but not necessarily non-increasing in n.

3. The Nash equilibria payoffs of player I do not have a definite monotonicity relation to the number of the player’s own strategies or to the number of the other player’s strategies.

Although obvious, Lemma 3.3 highlights an undesired property of the Nash equilibrium. Given the strategies of his opponent, having more choices should offer a strategic advantage to a player. While this is indeed the case in terms of the leader’s commitment value αL, his highest subgame perfect equilibrium payoff αH, and his matrix game value vA, having more options may well be harmful in terms of his Nash equilibria payoffs in (A, B).

It is easy to construct such an example by referring to the TrD (see section 4.5). There, consider first the game having the same payoff functions but with the strategy space of player I reduced to the single claim 100, and start increasing his strategy space by adding the claim 99, then 98, etc.

The introduction/removal of strongly dominated strategies affects these bounds in a non-trivial way. For example, in evaluating the matrix game value vA, the min is taken against all pure strategies of player II, including strongly dominated ones. In view of Lemma 3.3, this means that vA may decrease by the addition of a strongly dominated strategy to player II’s strategy space.
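
A tiny numeric sketch of this effect (hypothetical matrices, not from the paper): appending to A a column that would be strongly dominated for player II in his own matrix B still lowers vA, because the min in vA runs over all columns of A:

```python
# Effect of a dominated column on v_A (hypothetical 2-row example).
def v_A(A, grid=1000):
    """Grid approximation of max_x min over columns of x^T A, for 2 rows."""
    best = -float("inf")
    for p in range(grid + 1):
        x1 = p / grid
        # zip(*A) iterates over the columns of A
        best = max(best, min(x1 * c0 + (1 - x1) * c1 for c0, c1 in zip(*A)))
    return best

A = [[1, 0],
     [0, 1]]           # v_A = 1/2, attained at x = (1/2, 1/2)

A_ext = [[1, 0, 0.2],
         [0, 1, 0.2]]  # extra column (assumed strongly dominated for II in B)
```

Here `v_A(A)` is 0.5 while `v_A(A_ext)` drops to 0.2: the new column could never appear in a Nash equilibrium (being strongly dominated for player II in the assumed B), yet it drags player I’s safety level down.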

On the other hand, Nash equilibria payoffs and leader payoffs remain unaffected if strongly dominated strategies of the other player are introduced/removed. A more interesting case occurs when I has strategies that are strongly dominated in (A, B) but not in the associated leadership game. Such strategies do not affect vA or his Nash equilibria payoffs, but may improve his leader payoff bounds αL and αH. This subject is addressed in detail later; see example 4.7 and section 4.5 for instances of this case.

### 3.2 Nash equilibria and pure commitment optimal strategies

We show that for non-degenerate bimatrix games any pure commitment optimal strategy of I, together with II’s best response to it, constitutes a Nash equilibrium of (A, B). In the proof we make use of an observation by von Stengel and Zamir (2010), namely that in a non-degenerate game any best reply region is either empty or full-dimensional, i.e. either X(j) = ∅ or X(j) has non-empty interior.

###### Proposition 3.4.

If the bimatrix game (A, B) is non-degenerate for player I, and if player I has a pure commitment optimal strategy i0, then the strategy profile (i0, jF) is a pure strategy Nash equilibrium of (A, B).

###### Proof.

Let i0 be a pure commitment optimal strategy of player I and let jF ∈ BRII(i0) with α(i0, jF) = αL. Since the game is non-degenerate for I, jF is unique, i.e. β(i0, j) < β(i0, jF) for any other pure strategy j of player II. Since X(jF) ≠ ∅, it must be full-dimensional, so that for any x ∈ X and for ε > 0 sufficiently small, the mixed strategy xε = (1 − ε)i0 + εx lies only in X(jF). Since α(xε, jF) ≤ αL (αL being the unique leader equilibrium payoff) and α is linear, we conclude that α(x, jF) ≤ α(i0, jF). But then, α(i0, jF) = max_{x ∈ X} α(x, jF), i.e. i0 is a best response to jF. Hence, i0 and jF are both best responses one to the other in (A, B). ∎
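
Proposition 3.4 can be checked directly on Example 3.1 (as reconstructed above), where x = (0, 0, 1) is a pure commitment optimal strategy: paired with the follower’s best reply (his third pure strategy) it is a pure Nash equilibrium, while the profile reached through the mixed commitment (1/2, 1/2, 0) is not one:

```python
# Pure-strategy Nash check for the game of Example 3.1.
A = [[4, 1, 0], [3, 2, 0], [0, 0, 3.5]]
B = [[1, 2, 0], [4, 3, 0], [0, 0, 1]]

def is_pure_nash(i, j):
    """True iff (row i, column j) is a pure Nash equilibrium of (A, B)."""
    return (A[i][j] == max(A[k][j] for k in range(3)) and   # I cannot deviate
            B[i][j] == max(B[i][l] for l in range(3)))      # II cannot deviate
```

Here `is_pure_nash(2, 2)` holds, with payoffs (3.5, 1), whereas `is_pure_nash(0, 0)` fails because against row 1 player II prefers his second column — which is why the payoff 3.5 is reached there only through the mixed commitment (1/2, 1/2, 0).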

The construction of the mixed strategies xε, for ε > 0, has been employed by von Stengel and Zamir (2004) and von Stengel and Zamir (2010) in various proofs. If the game is degenerate, then the pure strategy i0 may also belong to some other best reply region X(j), j ≠ jF, and the statement of Proposition 3.4 is not always true. This is highlighted by the following example.

###### Example 3.5.

The bimatrix game with pure strategy spaces {1, 2, 3} for player I, {4, 5} for player II, and payoff matrices

 A = ⎛ −3    2 ⎞        B = ⎛ 0    0 ⎞
     ⎜  1   −3 ⎟            ⎜ 1    2 ⎟
     ⎝  0    3 ⎠            ⎝ 3   −1 ⎠

has a unique Nash equilibrium ((0, 0.8, 0.2), (6/7, 1/7)) with payoffs (3/7, 7/5). The pure strategy s1 of I has two pure best replies, i.e. BRII(s1) = {4, 5}, hence the game is degenerate (for player I). Nevertheless, both best reply regions X(4) and X(5) have full dimension and therefore player I’s equilibrium payoff in the leadership game is unique (cf. section 2.1) and may be determined by relation (2). The edges of the best reply regions and the corresponding payoffs for player I are

 C(4) = {s1, (0, 0.8, 0.2), s3},   with α(x, jF = 4) for x ∈ C(4) given by {−3, 0.8, 0}
 C(5) = {s1, (0, 0.8, 0.2), s2},   with α(x, jF = 5) for x ∈ C(5) given by {2, −1.8, −3}

giving that αL = 2, attained at the pure strategy s1 of player I with the follower playing jF = 5; the profile (s1, 5), however, is not a pure strategy Nash equilibrium of (A, B).
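
The numbers of Example 3.5 can be verified mechanically; the snippet below scans the listed vertices of the best reply regions and confirms αL = 2, as well as the failure of (s1, 5) to be an equilibrium:

```python
# Verifying Example 3.5: columns indexed 0, 1 stand for j = 4, 5.
A = [[-3, 2], [1, -3], [0, 3]]
B = [[0, 0], [1, 2], [3, -1]]

def alpha(x, j):
    """Player I's payoff against the follower's pure strategy (column) j."""
    return sum(x[i] * A[i][j] for i in range(3))

C4 = [(1, 0, 0), (0, 0.8, 0.2), (0, 0, 1)]   # vertices of X(4)
C5 = [(1, 0, 0), (0, 0.8, 0.2), (0, 1, 0)]   # vertices of X(5)

alpha_L = max(max(alpha(x, 0) for x in C4),
              max(alpha(x, 1) for x in C5))  # attained at x = s1 with j = 5

# (s1, 5) is not a Nash equilibrium: against column 5 player I prefers s3,
# since A's second column is (2, -3, 3) and 3 > 2.
```

The vertex payoffs reproduce the lists in the display above, and the deviation of player I to s3 against column 5 is exactly what breaks the equilibrium property.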

In view of Proposition 3.4, a necessary condition for the commitment value to be strictly better than all Nash equilibria payoffs in a non-degenerate bimatrix game, i.e. for h < αL, is that all commitment optimal strategies of the leader are mixed. In other words, actual mixed strategies have to be used if the leader is to improve his payoff over all Nash equilibria. This is in line with the “concealment” interpretation of mixed strategies as expressed in von Neumann and Morgenstern (1953) and is to be expected, since the leader-follower game is precisely the sort of game that von Neumann and Morgenstern were considering when arguing about the need for concealment.

### 3.3 Completely mixed Nash equilibrium strategies vs commitment optimal strategies

The next property states that a player’s strategy at a Nash equilibrium yields strictly less than his commitment value, provided that (a) the bimatrix game is non-degenerate, (b) the Nash equilibrium under consideration is completely mixed, and (c) the equilibrium strategy under consideration is not matrix game optimal (i.e. maximin).

A Nash equilibrium (x*, y*) is completely mixed if all pure strategies of both players are played with positive probability, which implies that any pure strategy of player I is a best response against y* and, similarly, any pure strategy of player II is a best response against x*. Moreover, the non-degeneracy property implies that there is no other completely mixed equilibrium and that the supports of x* and y* have equal size, i.e. that m = n.

###### Proposition 3.6.

Let (A, B) be a non-degenerate bimatrix game with a completely mixed Nash equilibrium (x*, y*), such that x* is not a matrix game optimal (maximin) strategy of player I. Then α(x*, y*) < αL.

###### Proof.

By Lemma 1 of Pruzhansky (2011), x* cannot be a column equalizer in the payoff matrix A of player I, since otherwise x* would be a maximin strategy. Hence, there exists a pure strategy j of player II such that α(x*, j) > α(x*, y*). Since j is inducible by the non-degeneracy property, α(x*, j) constitutes a payoff that player I can guarantee for himself in the leadership game and hence αL ≥ α(x*, j) > α(x*, y*). ∎
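
A hedged 2 × 2 illustration of the argument (this game is hypothetical, not from the paper): at its completely mixed equilibrium, x* is not a column equalizer of A, so some column gives player I strictly more than his equilibrium payoff, exactly as in the proof:

```python
# Hypothetical 2x2 game illustrating Proposition 3.6.
from fractions import Fraction as F

A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]

# Completely mixed NE from the indifference conditions:
#   II indifferent between columns: x1*1 = (1 - x1)*2  ->  x1 = 2/3
#   I  indifferent between rows:    2*y1 = (1 - y1)*1  ->  y1 = 1/3
x_star = [F(2, 3), F(1, 3)]
y_star = [F(1, 3), F(2, 3)]

def alpha(x, y):
    """Player I's expected payoff x^T A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

nash_payoff = alpha(x_star, y_star)                           # 2/3
col_payoffs = [alpha(x_star, [1, 0]), alpha(x_star, [0, 1])]  # 4/3 and 1/3
# x_star is not a column equalizer (4/3 != 1/3); inducing the first column
# already guarantees 4/3 > 2/3, so alpha_L exceeds the Nash payoff.
```

In this game x* = (2/3, 1/3) is also not maximin (the maximin mixture of A is (1/3, 2/3)), and in fact the commitment value is 2: committing to row 1 induces column 1, and that pure profile is itself a Nash equilibrium, consistent with Proposition 3.4.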

Example 5.6 describes a game where Proposition 3.6 holds for both players. If is maximin then an improvement may not be possible, as in the matrix game of matching pennies. In case Proposition 3.6 applies, the strategy profile Pareto-dominates the completely mixed Nash equilibrium, since by construction, and .

## 4 Commitment value and Nash equilibria payoffs in generalizations of matrix games

Many classes of games that extend two-person zero-sum games have been studied in the literature. Among others, one is referred to Aumann (1961), Kats and Thisse (1992), Moulin and Vial (1978) and Beaud (2002), who generalize zero-sum games in different ways (see the review by Viossat (2006)). A natural question is whether all three solution concepts (i.e. matrix game value, Nash equilibria payoffs, and commitment value) coincide on such generalizations, as is the case for zero-sum games, i.e. whether

$$v_A = \alpha_L = \alpha_H. \tag{5}$$

Of interest is also the case where all Nash equilibria payoffs of the leader are equal to his matrix game value, but his commitment value is strictly higher, i.e.

$$v_A = h < \alpha_L. \tag{6}$$

In section 4.2, it is shown that (5) is true for the class of weakly unilaterally competitive (wuc) games, first defined by Kats and Thisse (1992). The wuc games strictly include the classes of zero-sum and strictly competitive games. Recently, interest in wuc games was reignited in view of the sufficient conditions for the existence of pure strategy equilibria in such games given by Iimura and Watanabe (2016).

Equation (5) is also valid in the class of a-cooperative games. However, for other generalizations of zero-sum games, namely pre-tight, best response equivalent to zero-sum, and almost strictly competitive games, (5) is not valid. The first equality of (6) holds in some of these classes and then (6) may be true in certain cases (for details see section 4.4).

Classes generalizing zero-sum games which satisfy (5), such as wuc or a-cooperative games, retain the flavor of pure antagonism that characterizes zero-sum games. All solution concepts on them coincide and no controversy about what constitutes optimal behavior for the players may arise. On the other hand, we expect Nash equilibrium to be a questionable solution concept for games in classes generalizing zero-sum games which satisfy (6). Such “bad behavior” cases may be found in the class of pre-tight games, a typical example being the TrD. For a detailed exposition see sections 4.4 and 4.5.

### 4.1 A sufficient condition

Here we examine conditions under which equation (5) is valid. Obviously, this obtains if

$$\max_{j \in BR_{II}(x)} \alpha(x,j) = \min_{j \in J} \alpha(x,j), \quad \forall x \in X \tag{7}$$

Also, if there exists some $x_0 \in X$ such that

$$\max_{j \in BR_{II}(x_0)} \alpha(x_0,j) = \min_{j \in J} \alpha(x_0,j) \ge \max_{j \in BR_{II}(x)} \alpha(x,j), \quad \forall x \in X, \tag{8}$$

then the left hand side of (8) is and since , we conclude that (8) is sufficient to get , i.e. (5).

###### Example 4.1.

The bimatrix game with payoff matrices

$$A = \begin{pmatrix} 1 & 0 \\ -2 & -10 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

satisfies condition (8), but not (7).
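Conditions (7) and (8) can be tested numerically for a $2\times2$ game by sweeping a grid of the leader's mixtures. The sketch below is only a rough check, exact up to the grid resolution and tolerance; the zero-sum matching pennies game (which satisfies (7)) and a coordination game (which does not) are used as hypothetical test cases.

```python
# Grid check of condition (7): for every leader mixture x, the leader's
# payoff against the follower's best replies must equal min_j alpha(x, j).
# A rough numerical sketch; exactness is limited by the grid and tolerance.

def holds_7(A, B, steps=1001, tol=1e-9):
    for k in range(steps):
        p = k / (steps - 1)  # probability of row 1
        a = [p * A[0][j] + (1 - p) * A[1][j] for j in range(2)]
        b = [p * B[0][j] + (1 - p) * B[1][j] for j in range(2)]
        br = [j for j in range(2) if b[j] >= max(b) - tol]  # follower's best replies
        if abs(max(a[j] for j in br) - min(a)) > tol:
            return False
    return True

# Matching pennies (zero-sum) satisfies (7); a coordination game does not.
print(holds_7([[1, -1], [-1, 1]], [[-1, 1], [1, -1]]))  # -> True
print(holds_7([[1, 0], [0, 1]], [[1, 0], [0, 1]]))      # -> False
```

In a zero-sum game the follower's best replies are exactly the minimizers of the leader's payoff, so (7) holds identically; the coordination game fails already at the pure commitments.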

### 4.2 Leader payoffs in wuc games

We now turn our attention to two person weakly unilaterally competitive (wuc) games.

###### Definition 4.2.

A bimatrix game is weakly unilaterally competitive if, for all $x_1, x_2 \in X$ and all $y \in Y$,

$$\alpha(x_1,y) > \alpha(x_2,y) \implies \beta(x_1,y) \le \beta(x_2,y)$$
$$\alpha(x_1,y) = \alpha(x_2,y) \implies \beta(x_1,y) = \beta(x_2,y)$$

and similarly if, for all $y_1, y_2 \in Y$ and all $x \in X$,

$$\beta(x,y_1) > \beta(x,y_2) \implies \alpha(x,y_1) \le \alpha(x,y_2)$$
$$\beta(x,y_1) = \beta(x,y_2) \implies \alpha(x,y_1) = \alpha(x,y_2)$$

Beaud (2002) observes that the classes of two-person zero-sum, strictly competitive (sc), unilaterally competitive (uc) and weakly unilaterally competitive (wuc) bimatrix games satisfy the inclusion relation zero-sum $\subset$ sc $\subset$ uc $\subset$ wuc. As an immediate consequence of the definition, wuc games satisfy the sufficient condition (7), and thus the leader’s equilibrium payoff is unique and both the commitment value and all Nash equilibria payoffs are equal to the matrix game value. Formally,

###### Proposition 4.3.

In a wuc game, the leader’s payoff at any subgame perfect equilibrium of the commitment game is equal to his commitment value, which in turn is equal to his matrix game value. Moreover, all Nash equilibria payoffs of the wuc game are equal to the matrix game values.

###### Proof.

By Definition 4.2, we have that for any $x \in X$, any $j_1, j_2 \in BR_{II}(x)$ and any $k' \notin BR_{II}(x)$

$$\beta(x,j_1) = \beta(x,j_2) \implies \alpha(x,j_1) = \alpha(x,j_2)$$
$$\beta(x,j_1) > \beta(x,k') \implies \alpha(x,j_1) \le \alpha(x,k')$$

which together imply that $\max_{j \in BR_{II}(x)} \alpha(x,j) = \min_{j \in J} \alpha(x,j)$, i.e. (7) is satisfied. Similarly, using the first part of Definition 4.2, the result follows for player II. The second part of the proposition, already known in the literature, follows trivially from the inclusion of Nash equilibria payoffs between the matrix game values and the highest subgame perfect equilibrium payoffs of the two commitment games.∎

van Damme and Hurkens (1996) derive a result similar to our Proposition 4.3 for the class of strictly competitive games, which, as we have seen, is a subclass of wuc games.
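Definition 4.2 can be checked mechanically over pure strategies; a minimal sketch follows. Note that this is only a necessary condition, since the definition quantifies over all mixed strategies as well.

```python
# Pure-strategy check of the wuc inequalities in Definition 4.2.
# This is only a necessary condition: the definition ranges over
# mixed strategies as well.

def is_wuc_pure(A, B):
    m, n = len(A), len(A[0])
    for j in range(n):           # fix a column, compare rows x1, x2
        for x1 in range(m):
            for x2 in range(m):
                if A[x1][j] > A[x2][j] and B[x1][j] > B[x2][j]:
                    return False
                if A[x1][j] == A[x2][j] and B[x1][j] != B[x2][j]:
                    return False
    for i in range(m):           # fix a row, compare columns y1, y2
        for y1 in range(n):
            for y2 in range(n):
                if B[i][y1] > B[i][y2] and A[i][y1] > A[i][y2]:
                    return False
                if B[i][y1] == B[i][y2] and A[i][y1] != A[i][y2]:
                    return False
    return True

# A zero-sum game passes; a coordination game fails.
print(is_wuc_pure([[1, -1], [-1, 1]], [[-1, 1], [1, -1]]))  # -> True
print(is_wuc_pure([[2, 0], [0, 1]], [[2, 0], [0, 1]]))      # -> False
```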

Note: Proposition 4.3 can be generalized for N-player wuc games, . Such games are defined similarly to Definition 4.2 (see Kats and Thisse (1992)). Also, for N-player games, the corresponding leadership game , is defined similarly: in the first stage, the leader (say, player 1) commits to a mixed strategy and in the second stage, the remaining players simultaneously choose their strategies knowing the mixed strategy of the leader. So, take to be a wuc N-player game and let be the sub-game where player 1 has fixed his mixed strategy . Then, is also a wuc game. Let be the set of mixed uncorrelated strategy profiles of the followers and let be the set of strategies of the followers that are at equilibrium in . By a result of de Wolf (1999) for N player wuc games, if denotes player 1’s payoff in , then, for any , i.e. for each , player 1’s payoff is constant over all equilibrium strategies of the followers in and it is the worse possible outcome for him over all mixed strategies of the followers. But then, which implies that .

### 4.3 Games of common interest

Opposite to bimatrix games resembling zero-sum games stand games where there is a strong motivation for the cooperation of the two players. In such games we may get the equality of and by a condition opposite to (7), namely

$$\max_{j \in BR_{II}(x)} \alpha(x,j) = \max_{j \in J} \alpha(x,j), \quad \forall x \in X \tag{9}$$

This condition guarantees that , since and therefore (9) is sufficient to get for non-degenerate games. For degenerate games though, this is not the case as the next example shows.

###### Example 4.4.

The bimatrix game with payoff matrices

$$A = \begin{pmatrix} -1 & 2 \\ -2 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

satisfies condition (9) for player I, but .

An alternative sufficient condition for to be equal to is the existence of an such that is acquired at and is constant on . Then, for all , and hence , which of course implies . This condition is not sufficient to guarantee , as the following example shows

###### Example 4.5.

The bimatrix game with payoff matrices

$$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 4 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & 0 & -1 \\ 0 & 2 & -1 \end{pmatrix}$$

satisfies the relaxed condition for player I, hence , but .

### 4.4 Other generalizations of zero-sum games and some counterexamples

The result of Proposition 4.3 does not apply to other generalizations of zero-sum games that have been studied in the literature. Such generalizations include the class of a-cooperative games, which is studied in a setting similar to ours by D’Aspremont and Gérard-Varet (1980), the class of almost strictly competitive games (asc) introduced by Aumann (1961), the class of pre-tight games, Viossat (2006), and the class of strategically zero-sum games, Moulin and Vial (1978). We provide the main definitions of these classes for bimatrix games, but for the general case one is referred to the relevant works.

Let be a bimatrix game with strategy spaces and payoff functions . A twisted equilibrium of is a Nash equilibrium of the game , which is played over the same strategy spaces with payoff functions and . We denote with and the sets of twisted equilibria and twisted equilibria payoffs of . A pair of strategies is called a saddle-point of , if

$$\alpha(x,y_s) \le \alpha(x_s,y_s) \le \alpha(x_s,y) \quad \text{and} \quad \beta(x_s,y) \le \beta(x_s,y_s) \le \beta(x,y_s)$$

for all . We denote with the set of saddle points of . For every bimatrix game, . A pair of strategies is Pareto-optimal if there is no other pair of strategies giving at least as much to every player and more to some player.
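Restricted to pure strategy pairs, the saddle-point inequalities above can be enumerated directly; a minimal sketch (the definition also admits mixed saddle points, which this search does not cover):

```python
# Sketch: enumerate the pure saddle points of a bimatrix game
# according to the displayed inequalities; mixed saddle points
# are outside the scope of this simple enumeration.

def pure_saddle_points(A, B):
    m, n = len(A), len(A[0])
    found = []
    for i in range(m):
        for j in range(n):
            if (all(A[x][j] <= A[i][j] for x in range(m)) and   # alpha(x, ys) <= alpha(xs, ys)
                all(A[i][j] <= A[i][y] for y in range(n)) and   # alpha(xs, ys) <= alpha(xs, y)
                all(B[i][y] <= B[i][j] for y in range(n)) and   # beta(xs, y) <= beta(xs, ys)
                all(B[i][j] <= B[x][j] for x in range(m))):     # beta(xs, ys) <= beta(x, ys)
                found.append((i, j))
    return found

# Zero-sum example with a pure saddle point at (row 0, column 0):
print(pure_saddle_points([[1, 2], [0, 3]], [[-1, -2], [0, -3]]))  # -> [(0, 0)]
```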

A bimatrix game is a-cooperative if it has at least one Pareto-optimal twisted equilibrium. A bimatrix game is almost strictly competitive (asc) if and .

Similarly to wuc games, see section 4.2, D’Aspremont and Gérard-Varet (1980) and Beaud (2002) observe that the classes of zero-sum, a-cooperative and asc bimatrix games satisfy the inclusion relation . Over asc games, Nash equilibria payoffs of the simultaneous move game are constant and equal to the players’ matrix game values, i.e. for all of the simultaneous move game.

In any two-person asc game the unique Nash equilibrium payoff is also the unique twisted equilibrium payoff. D’Aspremont and Gérard-Varet (1980) use this property to prove that in any a-cooperative game, both leaders in the associated leadership games will commit to Nash equilibrium strategies of the simultaneous move game and consequently they will receive their matrix game values, i.e. for a-cooperative games (5) is true.

However, this property does not extend to the class of asc games. As shown below (see section 4.5), the TrD is an asc game in which both players strictly improve their payoffs in the associated leadership games. In particular, (6) is true for TrD.

Viossat (2006) defines and studies the class of pre-tight games. A pure strategy $i$ (resp. $j$) of player I (resp. II) is called coherent if it is played in a correlated equilibrium of the game. A bimatrix game is called pre-tight if in any correlated equilibrium all the incentive constraints for not deviating to a coherent strategy are tight. Viossat (2006) notes that the class of two-player pre-tight games strictly contains two-player zero-sum games and games with a unique correlated equilibrium. TrD has a unique correlated equilibrium and thus provides an example that in a pre-tight game, contrary to zero-sum games, the leader and the follower may strictly improve their payoffs in the associated leadership games.

Moulin and Vial (1978) define strategically zero-sum bimatrix games, which strictly contain zero-sum and strictly competitive games. A bimatrix game with payoff matrices is strategically zero-sum if it is strategically equivalent to a zero-sum game, i.e. if there exists a matrix game with payoff functions on the same strategy spaces, such that for all and for all . Additionally, they define the concept of a trivial game for a player and show that every bimatrix game that is trivial for a player is strategically zero-sum. They show that any strategically zero-sum game is best response equivalent to a zero-sum game and provide sufficient conditions for the converse to be true. They also show that in strategically zero-sum games there exists a Nash equilibrium that dominates payoff-wise all other Nash equilibria of the simultaneous game.

Example 4.6 shows that in the class of strategically zero-sum games the commitment value may be strictly greater than the payoff of the leader at the dominant Nash equilibrium of the simultaneous game.

###### Example 4.6.

Let be the bimatrix game

$$A = \begin{pmatrix} 2 & -1 \\ 3 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 2 & 1 \\ -1 & 0 \end{pmatrix}$$

Then, strategy $s_1$ of player I is strictly dominated, thus the game is trivial for player I according to the triviality concept of Moulin and Vial (1978). The only Nash equilibrium of the simultaneous move game is $(s_2,t_4)$ with payoffs $(0,0)$. In the associated leadership game, player I's equilibrium strategy is the mixture $\left(\tfrac12,\tfrac12\right)$, which induces player II to play $t_3$. The resulting unique equilibrium yields payoffs $\left(\tfrac52,\tfrac12\right)$.
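For a $2\times2$ game the leader's commitment optimum can be computed exactly: his payoff is piecewise linear in the commitment probability, so it is attained either at a vertex of the simplex or at the follower's indifference point, with the follower breaking ties in the leader's favor as in a subgame perfect equilibrium. A sketch, assuming Example 4.6's matrices read A = [[2, -1], [3, 0]] and B = [[2, 1], [-1, 0]]:

```python
# Exact leader commitment optimum for a 2x2 bimatrix game.
# Candidates: the endpoints p = 0, p = 1 and the follower's
# indifference point; follower ties are broken in the leader's favor.

def leader_optimum(A, B):
    """A, B: 2x2 payoff matrices; the row player I is the leader."""
    cands = {0.0, 1.0}
    denom = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    if denom != 0:
        p = (B[1][1] - B[1][0]) / denom  # follower's indifference point
        if 0 <= p <= 1:
            cands.add(p)
    best = None
    for p in cands:
        col = [p * B[0][j] + (1 - p) * B[1][j] for j in range(2)]
        br = [j for j in range(2) if abs(col[j] - max(col)) < 1e-12]
        pay = max(p * A[0][j] + (1 - p) * A[1][j] for j in br)
        if best is None or pay > best[0]:
            best = (pay, p)
    return best  # (leader commitment value, optimal weight on row 1)

print(leader_optimum([[2, -1], [3, 0]], [[2, 1], [-1, 0]]))  # -> (2.5, 0.5)
```

The leader commits to the uniform mixture and obtains 5/2, strictly more than his payoff of 0 at the Nash equilibrium of the simultaneous move game.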

While best response equivalent games have the same set of Nash equilibria, the equilibria of their associated leadership games may differ. This is highlighted in the following example.

###### Example 4.7.

Let $\Gamma$ be the bimatrix game

$$A = \begin{pmatrix} 3 & 10 \\ 1 & 9 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & 1 \\ 8 & 9 \end{pmatrix}$$

and the bimatrix game $\Gamma'$, where

$$A' = \begin{pmatrix} 3 & 2 \\ 1 & 1 \end{pmatrix} \qquad B' = B$$

results from $\Gamma$ by the affine transformation that subtracts $8$ from the second column of $A$. $\Gamma$ and $\Gamma'$ are best response equivalent, since the best reply regions for $s_1, s_2$ and for $t_3, t_4$ are

$$Y(1) = Y'(1) = Y, \qquad Y(2) = Y'(2) = \emptyset$$
$$X(3) = X'(3) = \left[\tfrac{1}{3}, 1\right], \qquad X(4) = X'(4) = \left[0, \tfrac{1}{3}\right]$$

and hence they have the same set of Nash equilibria, which is the singleton $\{(s_1,t_3)\}$ with payoffs $(3,3)$. However, the unique leader equilibrium of $\Gamma_I$ is

$$(x_L, j_F) = \left( \left(\tfrac{1}{3}, \tfrac{2}{3}\right), t_4 \right), \qquad (\alpha_L, \beta_F) = \left( 9\tfrac{1}{3}, 6\tfrac{1}{3} \right)$$

while the unique leader equilibrium of $\Gamma'_I$ is

$$(x'_L, j'_F) = (s_1, t_3), \qquad (\alpha'_L, \beta'_F) = (3, 3)$$
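The difference can be verified numerically. Since both games share the matrix B, the follower's indifference point is p = 1/3, and each leader optimum lies at p ∈ {0, 1, 1/3}; a sketch, assuming the matrices read A = [[3, 10], [1, 9]], A′ = [[3, 2], [1, 1]] and B = [[3, 1], [8, 9]]:

```python
# Sketch: best response equivalent games with different leadership
# equilibria. The leader's optimum over (p, 1-p) is attained at a
# vertex or at the follower's indifference point p = 1/3 (from B),
# with follower ties broken in the leader's favor.

def commit_value(A, B):
    best = float("-inf")
    for p in (0.0, 1.0, 1 / 3):
        cols = [p * B[0][j] + (1 - p) * B[1][j] for j in range(2)]
        br = [j for j in range(2) if cols[j] >= max(cols) - 1e-12]
        best = max(best, max(p * A[0][j] + (1 - p) * A[1][j] for j in br))
    return best

A, B = [[3, 10], [1, 9]], [[3, 1], [8, 9]]
A2 = [[3, 2], [1, 1]]  # best response equivalent to A (same B)
print(round(commit_value(A, B), 3), commit_value(A2, B))  # -> 9.333 3.0
```

Although the two games have identical best reply regions, the leader of the first game earns 28/3 by committing to (1/3, 2/3), while the leader of the second can do no better than the Nash payoff 3.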

### 4.5 Traveler’s dilemma

Traveler’s dilemma (TrD) was first introduced by Basu (1994) and quickly attracted widespread attention as a game where rationality leads to a difficult-to-accept Nash equilibrium solution. In TrD, the equilibrium solutions of the simultaneous game and of the associated leadership games differ significantly. Notably, the associated leadership equilibrium strategies are much closer to the behavior of players observed in experiments than the unique Nash equilibrium of the simultaneous move game. This discrepancy generates a strong motivation to study leadership games.

TrD is a symmetric bimatrix game with strategy spaces $I = J = \{2, 3, \ldots, 100\}$ and payoffs

$$\alpha(i,j) = \beta(j,i) = \begin{cases} i+2, & i < j \\ i, & i = j \\ j-2, & i > j \end{cases}$$

The best reply correspondence of player I against a pure strategy $j$ of player II is

$$BR_I(j) = \begin{cases} j-1, & j > 2 \\ 2, & j = 2 \end{cases}$$

and similarly for player II. The game can be solved with the process of iterated elimination of strongly dominated strategies (iesds) and hence it has the fictitious play property (see Monderer and Shapley (1996)). In the first round of elimination, the pure strategy $100$ is strongly dominated by a suitable mixed strategy and hence eliminated. The process of iesds successively deletes all strategies except the pure strategy $2$, which results in the pure strategy profile $(2,2)$ being the unique Nash equilibrium. Hence, the equilibrium survives any Nash equilibrium refinement concept. It is also the unique correlated equilibrium and the unique rationalizable strategy profile, implying that neither correlation nor rationalizability may improve upon this Nash equilibrium. The TrD has attracted interest mostly due to the fact that this unique solution, although having a very strong theoretical argument in its favor since it is derived by iesds, is inefficient in terms of social welfare, counter-intuitive, and differs significantly from the observed behavior of players in experiments. In the penultimate round of elimination the game corresponds to a prisoner’s dilemma

$$A = \begin{pmatrix} 3 & 0 \\ 4 & 2 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & 4 \\ 0 & 2 \end{pmatrix}$$

Hence, TrD may be viewed as a generalization of the prisoner’s dilemma to more than two strategies. Contrary to the intuition that cooperation should be beneficial for both players, this is not the case, as TrD turns out to be an almost strictly competitive game, a fact that forces the players’ payoffs to their matrix game values. TrD resembles Bertrand duopoly, an observation made already in Basu (1994). Halpern and Pass (2012) apply their solution concept of iterated regret minimization to TrD and derive a satisfactory solution.
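The payoff function, the best reply structure and the uniqueness of the pure equilibrium $(2,2)$ can be verified by direct enumeration; a sketch, assuming the strategy space $\{2,\ldots,100\}$ and reward/penalty $2$:

```python
# Sketch of the Traveler's Dilemma on S = {2, ..., 100} with
# reward/penalty 2: verify the best reply structure and that
# (2, 2) is the only pure Nash equilibrium.

S = range(2, 101)

def alpha(i, j):  # player I's payoff; beta(i, j) = alpha(j, i) by symmetry
    if i < j:
        return i + 2
    if i == j:
        return i
    return j - 2

def br1(j):  # player I's pure best replies against j
    best = max(alpha(i, j) for i in S)
    return [i for i in S if alpha(i, j) == best]

assert all(br1(j) == [j - 1] for j in S if j > 2)  # undercut by one
assert br1(2) == [2]

pure_ne = [(i, j) for i in S for j in S
           if alpha(i, j) == max(alpha(k, j) for k in S)
           and alpha(j, i) == max(alpha(k, i) for k in S)]
print(pure_ne)  # -> [(2, 2)]
```

The undercutting best reply $BR_I(j) = j-1$ is exactly what drives the unraveling of iesds down to $(2,2)$.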

Although correlated equilibrium or rationalizability do not improve upon the Nash equilibrium outcome in TrD, coarse correlation and leadership do so significantly. It is straightforward to check that the distribution $z$ with

$$z_{ij} = \begin{cases} \tfrac{1}{2}, & i=j=100, \text{ and } i=j=98 \text{ (or } i=j=97\text{)}, \\ 0, & \text{else} \end{cases}$$

is a symmetric coarse correlated equilibrium, with payoffs equal to $99$ (or $98.5$) for each player. Note that starting from this distribution, one may see that there exists a great multitude of coarse correlated equilibria in TrD. However, the Pareto-optimal outcome $(100,100)$ is not a coarse correlated equilibrium.
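That this distribution is a coarse correlated equilibrium can be checked directly: its expected payoff is 99, and by symmetry it suffices that no constant pure deviation against the opponent's marginal does better; a sketch:

```python
# Sketch: verify the coarse correlated equilibrium of TrD that puts
# mass 1/2 on (100, 100) and 1/2 on (98, 98).

def alpha(i, j):
    return i + 2 if i < j else (i if i == j else j - 2)

z = {(100, 100): 0.5, (98, 98): 0.5}
value = sum(p * alpha(i, j) for (i, j), p in z.items())  # expected payoff

# By symmetry only player I's deviations need checking: against the
# opponent's marginal, no constant pure k beats following z.
best_dev = max(sum(p * alpha(k, j) for (i, j), p in z.items())
               for k in range(2, 101))
print(value, best_dev)  # -> 99.0 99.0
```

The best deviations (e.g. 97 or 98) only match, never exceed, the value 99, so the coarse correlated incentive constraint holds.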

The unique equilibrium of (commitment optimal strategy of the leader and optimal response of the follower) may be calculated by equation (2) after considerable simplifications in the strategy spaces and is given by

$$x_L = \tfrac{1}{3}(100) + \tfrac{1}{3}(99) + \tfrac{1}{3}(97), \qquad j_F = 99$$

with payoffs $\left(98\tfrac{1}{3}, 98\tfrac{1}{3}\right)$. The leader-follower approach to a solution concept works well for TrD since payoffs are the same for both players, irrespective of who moves first, and they are very close to the Pareto optimal outcome. In other words, if the rules of the game permit commitment, one expects the leader-follower equilibrium to prevail over the Nash equilibrium of the simultaneous move game.
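The stated solution can be verified numerically: against the uniform commitment over $\{100, 99, 97\}$, the pure reply $99$ is among the follower's best responses (the follower breaks the tie in the leader's favor) and gives each player $98\tfrac13$; a sketch:

```python
# Sketch: verify the leadership solution of TrD. The leader commits
# to the uniform mixture over {100, 99, 97}; the follower's best
# replies and the resulting payoffs are computed by enumeration.

def alpha(i, j):  # player I's payoff; beta(i, j) = alpha(j, i)
    return i + 2 if i < j else (i if i == j else j - 2)

x = {100: 1 / 3, 99: 1 / 3, 97: 1 / 3}

def follower_pay(j):
    return sum(p * alpha(j, i) for i, p in x.items())

best = max(follower_pay(j) for j in range(2, 101))
br = [j for j in range(2, 101) if abs(follower_pay(j) - best) < 1e-9]
leader_pay = sum(p * alpha(i, 99) for i, p in x.items())  # follower plays 99
print(99 in br, round(leader_pay, 2))  # -> True 98.33
```

Both players receive $295/3 \approx 98.33$, far above the Nash payoff of $2$ and close to the Pareto optimum.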

## 5 2×2 bimatrix games

In $2\times 2$ bimatrix games, where each player has exactly two pure strategies, we may derive some special properties for the associated leadership games. Let the payoff matrices be

$$A = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \qquad B = \begin{pmatrix} b_1 & b_2 \\ b_3 & b_4 \end{pmatrix}$$

In this case, when solving $\Gamma_I$, the possible edges of the best reply regions $X(3)$ and $X(4)$ are $0$, $d$ and $1$, where $d$ denotes the equalizing strategy of player I over player II's payoffs, with

$$d := \frac{b_1 - b_2}{b_1 - b_2 + b_4 - b_3}, \quad \text{if } b_1 - b_2 + b_4 - b_3 \neq 0 \text{ and } 0 \le d \le 1.$$

Similarly, in $\Gamma_{II}$ the possible edges of the best reply regions $Y(1)$ and $Y(2)$ are $0$, $c$ and $1$, where $c$ denotes the equalizing strategy of player II over player I's payoffs, i.e.

$$c := \frac{a_4 - a_3}{a_1 - a_2 + a_4 - a_3}, \quad \text{if } a_1 - a_2 + a_4 - a_3 \neq 0 \text{ and } 0 \le c \le 1.$$
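For concreteness, an equalizing strategy can be computed by solving the indifference condition directly; a minimal sketch (the helper name `equalizer_rows` is ours, and player II's equalizer over $A$ is obtained by applying it to the transpose):

```python
# Sketch: the mixture (p, 1-p) over the rows of M that equalizes the
# column player's two column payoffs, if such p exists in [0, 1].

def equalizer_rows(M):
    denom = (M[0][0] - M[0][1]) - (M[1][0] - M[1][1])
    if denom == 0:
        return None  # payoffs already equalized or never equalizable
    p = (M[1][1] - M[1][0]) / denom
    return p if 0 <= p <= 1 else None

# Player I equalizes B; player II equalizes A (apply to the transpose).
B = [[1, -1], [-1, 1]]  # matching pennies: the equalizer is uniform
print(equalizer_rows(B))              # -> 0.5
print(equalizer_rows(list(zip(*B))))  # -> 0.5
```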

For a bimatrix game that is degenerate for the leader, we first show that a stronger statement than that of Proposition 3.4 holds, namely that any commitment optimal strategy is a Nash equilibrium strategy of the simultaneous move game.

###### Lemma 5.1.

If a bimatrix game is degenerate for the leader, then .

###### Proof.

If the game is degenerate for player I, then player II has either a weakly dominated strategy or two payoff equivalent strategies.

In the first case, assume the weakly dominated strategy is $t_4$. This implies $X(3) = X$ and hence, by equation (1),

$$\alpha_L = \max_{j \in D} \left\{ \max_{x \in X(j)} \min_{k \in E(j)} \alpha(x,k) \right\} = \max_{x \in X(3)} \alpha(x, t_3) = \max_{x \in X} \alpha(x, t_3)$$

which implies that . Thus, any strategy that guarantees player I his payoff in is a best response against , which together with implies that is a Nash equilibrium in the simultaneous move game for all