Repeated Matching Pennies with Limited Randomness
This research was partially supported by NSF grants CCF-0829754 and DMS-0652521.
We consider a repeated Matching Pennies game in which players have limited access to randomness. Playing the (unique) Nash equilibrium in this $n$-stage game requires $n$ random bits. Can there be Nash equilibria (or $\varepsilon$-Nash equilibria) that use fewer than $n$ random coins?
Our main results are as follows:
We give a full characterization of approximate equilibria, showing that, for any , the game has a -Nash equilibrium if and only if both players have random coins.
When players are bound to run in polynomial time with bits of randomness, approximate Nash equilibria can exist if and only if one-way functions exist.
It is possible to trade off randomness for running time. In particular, under reasonable assumptions, if we give one player only random coins but allow him to run in arbitrary polynomial time with bits of randomness and we restrict his opponent to run in time , for some fixed , then we can sustain an -Nash equilibrium.
When the game is played for an infinite number of rounds with time-discounted utilities, under reasonable assumptions, we can reduce the amount of randomness required to achieve a -Nash equilibrium to , where is the number of random coins necessary to achieve an approximate Nash equilibrium in the general case.
In the classical setting of Game Theory, one of the core assumptions is that all participating agents are “fully rational”. This means not only that an agent must be able to make optimal decisions, given the other players’ actions, but also that he must understand how these actions will affect the behavior of all other participants. If this is the case, a Nash equilibrium can be viewed as a set of strategies in which each agent is simply computing his best response given his opponents’ actions. However, in real world strategic interactions, people often behave in ways that are not fully rational. There are many reasons behind non-rational behavior; we focus on two: limitations on computation and limitations on randomness.
Since the work of Herbert Simon, much research has focused on defining models that take computational issues into account. In recent years, the idea that the full rationality assumption is often unrealistic has been formalized using tools and ideas from computational complexity. It is in fact easy to come up with settings, as in Fortnow and Santhanam, in which simply computing a best response strategy involves solving a computationally hard problem. Furthermore, there is strong evidence that, in general, the problem of finding a Nash equilibrium is computationally difficult for matrix games (Daskalakis, Goldberg and Papadimitriou; Chen and Deng).
Traditionally bounded rationality has focused on two computational resources: time and space. In this paper we focus on another fundamental resource: randomness.
It is a basic fact that games in which agents are not allowed to randomize might have no Nash equilibrium. In this sense, randomness is essential in game theory. We focus on a simple two player zero-sum game that captures this: Matching Pennies (Figure 1).
Specifically, we consider the repeated version of Matching Pennies, played for $n$ rounds. In this game, the unique Nash equilibrium is the one in which, at every round, both players choose one of their two strategies uniformly at random. The algorithm that implements this strategy requires $n$ random coins, one for each round. The main question we address in this paper is: can there be Nash equilibria if the amount of randomness available to both players is less than $n$?
First we show that, in general, the game cannot have a Nash equilibrium in which both players have only a fraction of random coins. In particular, when we give both players random coins, we can only achieve a -Nash equilibrium. This turns out to be tight, in the sense that we can show that in any game with a -Nash equilibrium both players must have at least coins.
The proof of this fact, however, relies on the players’ ability to implement a strategy that runs in exponential time. We then consider games in which the players’ strategies are polynomially-bounded. Using ideas developed in cryptography and computational complexity we show that, in this setting, -Nash equilibria that use only coins exist if and only if one-way functions exist.
We also show that the amount of randomness can be “traded” for time. If we allow one of the players to run in arbitrary polynomial time, but use only bits, we can still achieve a -Nash equilibrium if we restrict his opponent to run in time for some fixed , while giving him random bits.
Finally we consider an infinitely repeated game with time discounted utilities. In this case, in general, for any discount factor and approximation , we can always achieve a -Nash equilibrium with only random coins, if is large enough. When we limit players’ strategies to polynomial size circuits, we can reduce the amount of randomness to , for any .
There are many recent approaches to bounded rationality using a computational complexity perspective. For instance, Halpern and Pass study games in which players’ strategies are Turing machines. The idea of considering randomness as a costly resource in game theory has received only limited attention. Kalyanaraman and Umans study zero-sum games, and give both an efficient deterministic algorithm for finding -Nash equilibria, as well as a weaker, but more general, result in the spirit of our Lemma 3.1, giving a randomness-efficient adaptive on-line algorithm for playing repeated zero-sum games. Hu also considers a similar setting, but he is concerned with computability rather than complexity. He considers infinitely repeated plays of 2 player zero-sum games that have no pure strategy Nash equilibrium, and in which players have a set of feasible actions, which represents both the strategies they can play and the strategies they can predict. In this setting Hu gives necessary and sufficient conditions for the existence of Nash equilibria. Finally, Gossner and Tomala give entropy bounds on Bayesian learning in a game theoretic setting, in a more general framework than this paper. Their results applied to Matching Pennies do not achieve the tight bounds we get in
The rest of the paper is organized as follows. In Section 2 we introduce the notation and known results used. Section 3 presents an information theoretic impossibility result. Section 4 considers players whose strategies are limited to polynomial sized Boolean circuit families, while Sections 5 and 6 give extensions of the main results to complexity pseudorandom number generators and infinitely repeated versions of the game.
2 Background and Definitions
2.1 Game Theory Notation
Throughout the paper we consider a repeated game of Matching Pennies. We focus on this game because it captures one of the fundamental aspects of randomness in game theory. Studying such a simple game also allows us to get tight bounds. However, variations of our results extend to other similar 2 person zero-sum repeated games.
The payoffs at each round are shown in Figure 1. Let , be the payoff to Player 1 (P1), and the payoff for P2. When we allow the players to randomize, we denote as a randomized strategy on for player . P1’s expected payoff in one round is
Let be P1’s cumulative payoff when the game is played for $n$ rounds. In the repeated game, mixed strategies can be viewed as distributions over sequences of length $n$ that depend on the opponent’s strategy. Given the adversary’s strategy, let denote a randomized strategy for player . P1’s expected cumulative payoff is
Finally we define the expected average payoff to P1 for the -round game as
and consequently P2’s expected payoff is . To denote player $i$’s payoff we will sometimes use the standard notation $u_i(\sigma_i, \sigma_{-i})$, where $\sigma_i$ is player $i$’s mixed strategy and $\sigma_{-i}$ is his opponent’s mixed strategy.
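The one-round expected payoff can be illustrated with a minimal sketch. It assumes the standard Matching Pennies payoffs (P1 receives +1 when the two actions match and -1 otherwise), which is what Figure 1 is taken to show; the function name and the parameters `p` and `q` (the probabilities with which P1 and P2 play H) are illustrative and not part of the paper's notation.

```python
from fractions import Fraction

def expected_payoff_p1(p, q):
    """P1's expected one-round payoff.

    p: probability that P1 plays H; q: probability that P2 plays H.
    Assumed payoffs: +1 to P1 on a match, -1 otherwise (zero-sum).
    """
    match = p * q + (1 - p) * (1 - q)   # probability the actions match
    return match * 1 + (1 - match) * (-1)

def expected_payoff_p2(p, q):
    """P2's expected payoff is the negative of P1's (zero-sum game)."""
    return -expected_payoff_p1(p, q)
```

Against a uniformly random opponent every strategy earns 0, for example `expected_payoff_p1(Fraction(1, 2), Fraction(3, 4))`, which is why uniform play at every round leaves no profitable deviation.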
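Definition 2.1 (Nash equilibrium).
A pair of mixed strategies $(\sigma_1, \sigma_2)$ is a Nash equilibrium for the $n$-stage Matching Pennies game if, for $i = 1, 2$ and every mixed strategy $\sigma'_i$ of player $i$:
$$u_i(\sigma_i, \sigma_{-i}) \geq u_i(\sigma'_i, \sigma_{-i}),$$
where $u_i$ denotes player $i$’s expected average payoff.
In some cases we will consider a relaxed notion of equilibrium, namely $\varepsilon$-Nash equilibrium.
Definition 2.2 ($\varepsilon$-Nash equilibrium).
A pair of mixed strategies $(\sigma_1, \sigma_2)$ is an $\varepsilon$-Nash equilibrium for the $n$-stage Matching Pennies game if, for $i = 1, 2$ and every mixed strategy $\sigma'_i$ of player $i$:
$$u_i(\sigma_i, \sigma_{-i}) \geq u_i(\sigma'_i, \sigma_{-i}) - \varepsilon.$$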
2.2 Complexity and Pseudorandomness
The model of computation used throughout most of the paper is based on Boolean Circuits. We consider circuits with AND, OR and NOT gates, and denote by a circuit with input nodes. A circuit family is an infinite collection of circuits, intuitively one for each input length.
The size of a circuit is the number of gates. A circuit family is polynomial sized if there is a such that, for all , . The class of languages recognizable by families of polynomial sized circuits is called P/Poly. Any language that can be decided in polynomial time by a deterministic or randomized Turing machine is also in P/Poly. Formally, $\mathrm{P} \subseteq \mathrm{BPP} \subseteq \mathrm{P/Poly}$.
A function is one-way if it is easy to compute and hard to invert.
Definition 2.3 (One-way Function).
A one-way function is a polynomial-time computable function such that, for all polynomial size circuits and (where is chosen uniformly at random on ), for all and sufficiently large .
Informally, two objects are indistinguishable if no polynomial sized circuit family can tell them apart with noticeable probability.
Definition 2.4 (Indistinguishability).
Let be two random variables on . We say that and are computationally indistinguishable if for every family of polynomial size circuits , every and for sufficiently large
A cryptographic pseudorandom number generator (PRNG) is a deterministic algorithm whose output is computationally indistinguishable from the uniform distribution, provided that its input is truly random. We will denote by a random variable uniformly distributed on .
Definition 2.5 (Cryptographic PRNG).
A cryptographic pseudorandom number generator is a deterministic polynomial time algorithm , where is a polynomial time computable function, such that and are computationally indistinguishable.
There are two basic properties about pseudorandom number generators that we will use. One relates the notion of pseudorandomness to the notion of predictability.
Definition 2.6 (Unpredictable).
Let be a polynomial time algorithm, and . We call unpredictable if for every family of polynomial size circuits , all and for sufficiently large
Intuitively, a pseudorandom number generator must be unpredictable, otherwise we could easily build a test for it by using the predictor circuits. In 1982 Yao proved the opposite implication, thus establishing the following theorem.
Theorem 2.7 (Yao’s Theorem).
A polynomial time algorithm is unpredictable if and only if is a pseudorandom number generator.
Håstad, Impagliazzo, Luby and Levin showed in 1999 how to construct pseudorandom number generators with polynomial expansion based on one-way functions.
Theorem 2.8 (PRNGs from one-way functions).
One-way functions exist if and only if for every there is a pseudorandom number generator with .
Cryptographic pseudorandom number generators’ main power lies in the ability to fool any polynomial sized adversary, while running in polynomial time. However, in other areas of complexity, such as derandomization, the crucial issue is having a smaller seed.
Definition 2.9 (Complexity PRNG).
A complexity pseudorandom number generator is a time computable function , such that for any circuit of size
The essential difference with cryptographic pseudorandom number generators is the order of quantifiers. A cryptographic pseudorandom number generator fools circuits of an arbitrary polynomial size. The complexity pseudorandom number generator fools circuits only of a fixed polynomial size but under the right assumptions requires far fewer random bits.
There exists and such that no circuit of size at most can compute if and only if there exists a complexity pseudorandom number generator with for some .
3 Information Theoretic Bounds
In this section we make no computational assumptions on the players, and show that there can be no Nash equilibrium if we limit the amount of randomness available to both players.
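Lemma 3.1.
For any $\delta \in (0, 1]$, if P2 has fewer than $(1-\delta)n$ random bits, then P1 has a deterministic strategy that achieves an expected average payoff of at least $\delta$.
We will give a strategy for P1 that achieves a high payoff against any strategy from P2.
Player 1 will enumerate all of P2’s possible coin flips, and will obtain a set of possible strategies, one of which is the one being used by P2. After each play by P2, P1 can eliminate all the strategies that do not play that action at that round. Let $S_t$ be the set of strategies that are consistent with P2’s plays up to round $t$. Initially the set $S_0$ contains at most $2^{(1-\delta)n}$ strategies, and, for all $t$, $|S_t| \geq 1$.
The strategy for P1 is straightforward: at round $t$, P1 will play based on the most likely event: he will consider all strategies in $S_{t-1}$ and play H if the majority of strategies in $S_{t-1}$ use H at round $t$ and play T otherwise. Let $p_t$ be the exact fraction of strategies that are the majority at round $t$,
$$p_t = \frac{\max\{\,|\{s \in S_{t-1} : s \text{ plays H at round } t\}|,\ |\{s \in S_{t-1} : s \text{ plays T at round } t\}|\,\}}{|S_{t-1}|} \qquad (3.1)$$
so that $1/2 \leq p_t \leq 1$. P1’s expected payoff at round $t$ is:
$$p_t \cdot 1 + (1 - p_t) \cdot (-1) = 2p_t - 1 \geq 0.$$
Thus P1’s average expected payoff is at least 0. To show that P1 can actually achieve an average expected payoff of $\delta$ we need to consider the amount of information P1 gains at each round. We define the following potential function $\Phi_t$:
$$\Phi_t = U_t - \log_2 |S_t|,$$
where $U_t$ is P1’s cumulative payoff after round $t$. The potential considers both the accumulated payoff for P1 and the log-size of the set of consistent strategies. At time $0$ there are at most $2^{(1-\delta)n}$ possible strategies for P2, so $\Phi_0 \geq -(1-\delta)n$. We will now lower bound the expected increase in $\Phi$ at each round. Conditioned on the history up to round $t-1$, we can express this as
$$E[\Phi_t - \Phi_{t-1}] = (2p_t - 1) - E\big[\log_2 |S_t| - \log_2 |S_{t-1}|\big].$$
Now consider $|S_t|$. When P1 loses he can eliminate a fraction $p_t$ of strategies, thus the new set will contain a fraction $1 - p_t$ of the strategies in $S_{t-1}$. On the other hand, when P1 wins, $|S_t| = p_t |S_{t-1}|$. To complete the analysis we have to consider two cases, since if $p_t = 1$ then $\log_2(1 - p_t)$ is not well defined.
First assume $p_t = 1$. This happens when all feasible strategies for P2 have the same action at round $t$. In this case P1 will win with probability 1, and the size of the set of feasible strategies will stay the same. So, overall, the increase in $\Phi$ will be 1.
Now assume $1/2 \leq p_t < 1$. Then, the expected log-size of the set $S_t$ is:
$$E[\log_2 |S_t|] = p_t \log_2\big(p_t |S_{t-1}|\big) + (1 - p_t) \log_2\big((1 - p_t) |S_{t-1}|\big).$$
The expected change in $\log_2 |S_t|$ does not depend on $|S_{t-1}|$, but only on $p_t$, since
$$E[\log_2 |S_t|] - \log_2 |S_{t-1}| = p_t \log_2 p_t + (1 - p_t) \log_2 (1 - p_t) = -H(p_t),$$
where $H$ is the binary entropy function.
So that the overall change in potential when $1/2 \leq p_t < 1$ is
$$E[\Phi_t - \Phi_{t-1}] = 2p_t - 1 + H(p_t),$$
which is always at least $1$ for $p_t \in [1/2, 1)$: by concavity of $H$, with $H(0) = 0$ and $H(1/2) = 1$, we have $H(p_t) = H(1 - p_t) \geq 2(1 - p_t)$.
So, for all $t$, at each step the potential function increases by at least $1$ in expectation. Thus, after $n$ rounds we have that
$$E[\Phi_n] \geq \Phi_0 + n \geq \delta n.$$
Since $|S_n| \geq 1$, P1’s expected cumulative payoff $E[U_n] = E[\Phi_n + \log_2 |S_n|]$ is at least $E[\Phi_n] \geq \delta n$, that is, an average of at least $\delta$ per round; this completes the proof. ∎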
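The majority strategy from this proof can be checked on small instances with the following sketch. It makes simplifying assumptions that are not in the paper: P2's strategies are oblivious (each seed fixes a whole action sequence in advance), seeds are equally likely, ties are broken in favour of H, and P1 earns +1 on a match and -1 otherwise. The function name `majority_payoff` is illustrative.

```python
from fractions import Fraction

def majority_payoff(sequences):
    """Exact expected average payoff of P1's majority strategy.

    `sequences` lists P2's action string (over 'H'/'T') for each equally
    likely seed. P1 knows the list but not which entry P2 is using.
    """
    n = len(sequences[0])
    total = Fraction(0)
    for true_seq in sequences:          # average over P2's hidden seed
        consistent = list(sequences)    # strategies still consistent with play
        payoff = 0
        for t in range(n):
            heads = sum(1 for s in consistent if s[t] == 'H')
            # play the majority action among consistent strategies (ties -> H)
            guess = 'H' if 2 * heads >= len(consistent) else 'T'
            payoff += 1 if guess == true_seq[t] else -1
            # discard strategies that disagree with the observed action
            consistent = [s for s in consistent if s[t] == true_seq[t]]
        total += Fraction(payoff, n)
    return total / len(sequences)
```

With $2^k$ sequences of length $n$, the potential argument predicts an average payoff of at least $1 - k/n$; for instance, four sequences of length 6 should give at least 2/3.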
This result immediately implies that, without any computational assumption, there can be no equilibrium with fewer than $n$ random coins.
For all such that , if P1 and P2 have, respectively, and random coins, then there can be no Nash equilibrium in the -stage Matching Pennies repeated game.
However, if we limit the amount of randomness available to both players, we are still able to achieve an -Nash equilibrium. Furthermore, if the game has a -Nash equilibrium, then both players must have at least random coins.
Let . The game has a -Nash equilibrium if and only if both players have random coins.
For simplicity we assume is the same for both players, however a similar result holds even in the case where the two players have a different amount of random coins.
To show the “only-if” implication, consider, by way of contradiction, a game that has a -Nash equilibrium but in which both players have less than random bits. Thus there must be a such that they have exactly random bits. Since is a -Nash equilibrium, it must be the case that, for any strategy for P1
By Lemma 3.1 we know that both players have a strategy that achieves a payoff of at least , so that the above implies . Applying the same argument to P2, we get , a contradiction.
The “if” part follows from Lemma 3.4 below. ∎
Let . If both players have random coins, then the game has a -Nash equilibrium.
For simplicity we assume is even.
Consider the following strategies: both players use their random coins to play uniformly at random for the first rounds. Thereafter P1 will always play , while P2 will alternate between and , playing . We claim that this is a -Nash equilibrium.
First notice that no player can improve his payoff in the first rounds, given his opponent’s strategy. Now consider the remaining rounds. P1 could improve his payoff by playing , however this only increases his payoff by . This holds also for P2, who could play , gaining only . ∎
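The counting in this proof can be sketched numerically. The concrete actions are assumptions chosen for illustration, since they are not legible in the text above: in the deterministic tail P1 always plays H and P2 alternates H, T, with +1 to P1 on a match and -1 otherwise. The function name and parameters are hypothetical.

```python
from fractions import Fraction

def tail_deviation_gain(n, random_rounds):
    """Best average gain each player can obtain by deviating in the tail.

    The first `random_rounds` rounds are played uniformly at random, where
    no deviation helps; the remaining rounds are deterministic.
    """
    tail = n - random_rounds
    if tail < 0 or tail % 2:
        raise ValueError("tail length must be a non-negative even number")
    p1 = ['H'] * tail                    # assumed: P1 always plays H
    p2 = ['H', 'T'] * (tail // 2)        # assumed: P2 alternates H, T
    prescribed = sum(1 if a == b else -1 for a, b in zip(p1, p2))
    best_p1 = tail   # P1 copies P2's known alternation and wins every round
    best_p2 = tail   # P2 always plays T against P1's constant H
    gain_p1 = Fraction(best_p1 - prescribed, n)
    gain_p2 = Fraction(best_p2 - (-prescribed), n)
    return gain_p1, gain_p2
```

For `n = 10` and `random_rounds = 6` both gains equal the fraction of deterministic rounds, 4/10, which matches the approximation parameter in the lemma under these assumptions.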
4 Computationally Efficient Players
The proof of Lemma 3.1 in the previous section relies heavily on the fact that we make no computational assumptions. In particular, implementing the majority strategy and computing the majority fraction in (3.1) requires solving #P-hard problems, by reduction from #SAT. If we restrict the players to run in time polynomial in $n$, this particular strategy likely becomes infeasible. In this setting, under reasonable complexity assumptions, it is possible to greatly reduce the amount of randomness and, at the same time, achieve a -Nash equilibrium.
We consider players whose actions are polynomial size Boolean circuits. A strategy is thus a circuit family , such that circuit takes as input random coins and outputs the actions to be played. Notice that this definition implies that each agent can simulate any of his opponent’s strategies.
We consider equilibria that use random coins for any . Theorem 4.1 shows that such -Nash equilibria exist if and only if one-way functions exist.
If players are bound to run in time polynomial in , then, for all and sufficiently large , -Nash equilibria that use only random coins exist, where for all and sufficiently large ’s, if and only if one-way functions exist.
As a preliminary result we show that, in our setting, the expected utility when at least one player uses a pseudorandom number generator can’t be too far from the expected utility when playing uniformly at random.
Assume one-way functions exist, and let be the strategy corresponding to the output of a pseudorandom number generator. For any strategy that runs in time polynomial in , for all and sufficiently large ,
We prove only the first inequality, the proof for the second one being symmetric.
Proof by contradiction. Assuming there is a such that for infinitely many ’s, we will construct a test for , and show that
for some and infinitely many ’s, thus contradicting the assumption that is a pseudorandom number generator.
First consider the random variables , for , where is simply P1’s payoff at round . Since ,
This implies that there must be an such that . Fix that .
The test takes as input an -bit sequence and generates a sequence of plays according to strategy . Now simulates an -stage repeated Matching Pennies game with strategies . If P1 wins the -th round then it will output 1, otherwise the output will be 0. In other words, outputs 1 if and only if . Notice that runs in time polynomial in .
When is drawn from the uniform distribution, P1 will win with probability , or .
Now notice, that since , . This implies that
which proves the lemma. ∎
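The reduction in this proof turns a profitable strategy into a statistical test. The sketch below shows that shape on a toy example; everything named here is hypothetical. `weak_generator` is deliberately insecure (it repeats its seed), `copy_strategy` is a predictor that exploits that weakness, and acceptance probabilities are computed exactly by enumeration rather than estimated.

```python
from fractions import Fraction
from itertools import product

def weak_generator(seed, n):
    """Toy, deliberately insecure generator: repeats the seed cyclically."""
    return tuple(seed[t % len(seed)] for t in range(n))

def copy_strategy(period):
    """Predict that the opponent repeats the bit played `period` rounds ago."""
    def play(history):
        return history[-period] if len(history) >= period else 0
    return play

def make_test(strategy, i):
    """Build a test that outputs 1 iff the strategy wins round i (0-indexed)."""
    def test(bits):
        # the strategy only sees the opponent's plays before round i
        return 1 if strategy(bits[:i]) == bits[i] else 0
    return test

def acceptance(test, samples):
    """Exact probability that the test outputs 1 on a uniform sample."""
    samples = list(samples)
    return Fraction(sum(test(x) for x in samples), len(samples))
```

On the outputs of `weak_generator` with 2-bit seeds the test built from `copy_strategy(2)` at round 4 accepts with probability 1, while on uniform 6-bit strings it accepts with probability 1/2. A cryptographic generator would admit no polynomial-size test with such a noticeable gap, which is the contradiction the proof relies on.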
If one-way functions exist and players are bound to run in time polynomial in $n$, then for every and for sufficiently large $n$, the $n$-stage game has an -Nash equilibrium in which each player uses at most random bits.
Assume, by contradiction, that one-way functions exist but there are values and such that the game has no -Nash equilibrium in which players use at most random coins.
Since we assume one-way functions exist, by Theorem 2.8 there exist pseudorandom number generators that use coins. Assume both players use the output of such pseudorandom number generators as their strategies (which we call, respectively, and ). Since we are assuming that this is not a -Nash equilibrium, one of the players, say Player 1, must have a strategy such that
By Lemma 4.2 we can choose a such that
Pick , so that for . This contradicts Lemma 4.2, proving the claim. ∎
We now prove the opposite direction, that is that the existence of Nash equilibria that use few random bits implies the existence of one-way functions.
If for every there is a Nash equilibrium in which each player uses random bits and runs in time polynomial in , then one-way functions exist.
Let be such a Nash equilibrium and assume, by contradiction, that one-way functions don’t exist. This implies that pseudorandom number generators can’t exist (Goldreich ), and so, and can’t be sequences that are computationally indistinguishable from uniform.
Thus, by Yao’s theorem (Theorem 2.7), we know that there are polynomial size circuit families and such that
for some , where and .
To get a contradiction it is sufficient to show that players are better off by using the predictor circuits and . Consider Player 1: using , at each round he can guess, given the previous history, the opponent’s next move with probability . Thus his expected payoff at any round is
where the expectation is over the internal coin tosses of the predictor circuit . The overall expected payoff is
Now, let be the value of the expected payoff when players play . Consider the following cases:
: this implies that Player 1 could gain by using strategy ,
: by definition Player 2’s expected payoff is , so Player 2 can achieve a higher payoff by using his predictor circuit ,
In both cases we see that can’t be a Nash equilibrium, a contradiction. ∎
5 Exchanging Time for Randomness
In this section we determine conditions under which a -Nash equilibrium can arise, given that one of the players has only a logarithmic amount of randomness and his opponent must run in time for some fixed . This shows how we can trade off randomness for time; the player with random bits runs in time polynomial in , while the player with more random bits runs in fixed polynomial time.
Assume there exists and such that no circuit of size at most can compute and that one-way functions exist. Let Player 1’s strategies be circuits of size at most that use at most random bits for some , where is a constant related to the implementation of a cryptographic pseudorandom number generator. Assume Player 2 has access to only random bits. As long as , where is the constant in Theorem 2.10, then for all and sufficiently large there is a -Nash equilibrium.
Let be the cryptographic pseudorandom number generator available to Player 1 and be the complexity pseudorandom number generator used by player 2. Furthermore let be the set of all possible strategies for P1 (for all circuits of size at most that use random bits), and the set of strategies available to P2 (polynomial size circuit families and random coins). We will show that for all and sufficiently large , is a -Nash equilibrium with the required properties.
First we argue that, for all and sufficiently large , . The proof of this fact is similar to the proof of Lemma 4.2, showing by way of contradiction, that if then we can build a test for the cryptographic pseudorandom number generator .
Now we show that, for the appropriate setting of parameters, fools and fools . For any , Player 2 can fool circuits of size by using random bits. So, for , fools . Notice also that since Player 2 runs in time , the cryptographic pseudorandom number generator fools . Let be the one-way permutation used by the pseudorandom number generator , and assume it is computable in time for some . Given , is defined as follows: let , and let be ’s seed (notice that ), then
where . There are applications of , so runs in time . So, for , fools .
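The running-time count above (one application of the permutation per output bit) can be illustrated with a sketch that assumes the generator has the standard iterate-and-extract shape, which the text does not spell out. Both building blocks are hypothetical stand-ins: `toy_permutation` is a bijection but certainly not one-way, and `toy_bit` is not a hard-core predicate. The sketch only shows the structure and has no security whatsoever.

```python
def toy_permutation(x, m):
    """Hypothetical stand-in for the one-way permutation.

    An affine map with an odd multiplier is a bijection on m-bit integers,
    but it is trivially invertible, so it is NOT one-way.
    """
    return (5 * x + 3) % (1 << m)

def toy_bit(x):
    """Hypothetical stand-in for a hard-core bit: the parity of x's bits."""
    return bin(x).count("1") % 2

def iterate_and_extract(seed, m, out_len):
    """Output one bit per application of the permutation to the state."""
    bits = []
    x = seed % (1 << m)
    for _ in range(out_len):
        bits.append(toy_bit(x))      # extract one bit from the current state
        x = toy_permutation(x, m)    # advance the state with one call to f
    return bits
```

Producing `out_len` bits costs `out_len` evaluations of the permutation, so the total running time is the output length times the cost of one evaluation, which is the quantity the proof compares against the opponent's time bound.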
At this point we’re almost done. As in Lemma 4.3 assume, by contradiction, that the assumptions in the theorem hold but is not a -Nash equilibrium for some . This implies that at least one of the two players can improve his expected payoff by more than by switching to some other strategy. First consider P2, and assume there is a strategy such that
As in Lemma 4.3 this implies that would be a test for the cryptographic pseudorandom number generator , contradicting the fact that fools . Similarly, assume P1 has a strategy such that
Again, can easily be made into a test for , contradicting the fact that fools . ∎
6 Infinite Play
We now consider an infinitely repeated game of Matching Pennies, and show that, if utilities are time discounted, we can always achieve a -Nash equilibrium using a large enough (but finite) number of random coins. First we determine the least amount of randomness required to achieve a -Nash equilibrium in the general (i.e., computationally unbounded) case.
For all discount factors and all , there is an -Nash equilibrium in which the players use only random bits, for .
Given consider the following strategies: both players play the Nash equilibrium strategy for the first rounds. After this P1 will always play , while P2 will play and alternately. The overall expected payoff is 0. However, after round , both players could switch to a strategy that always wins, achieving a total expected payoff of . To ensure that our strategies are indeed a -Nash equilibrium we just need to make sure that
Rearranging and taking logarithms we get . ∎
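The rearrangement can be made concrete under an assumed discounting convention, which may differ from the paper's by a constant shift: rounds are indexed from 0, round $t$ is weighted by $\beta^t$, and utilities are not normalized, so a player who wins every round from round $n$ onward collects $\beta^n / (1 - \beta)$. The names `beta`, `eps`, `tail_value` and `rounds_needed` are illustrative.

```python
import math

def tail_value(beta, n):
    """Discounted value of winning every round from round n onward."""
    return beta ** n / (1 - beta)

def rounds_needed(beta, eps):
    """Smallest n with beta**n / (1 - beta) <= eps, from the closed form.

    Solving for n gives n >= log(eps * (1 - beta)) / log(beta); the
    inequality flips because log(beta) is negative.
    """
    if not (0 < beta < 1) or eps <= 0:
        raise ValueError("need 0 < beta < 1 and eps > 0")
    bound = math.log(eps * (1 - beta)) / math.log(beta)
    return max(0, math.ceil(bound))
```

For `beta = 0.9` and `eps = 0.1` this gives 44 random rounds: the tail is worth about 0.097 after 44 rounds and about 0.108 after 43.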
Now we consider players whose strategies are families of polynomial size Boolean circuits (as in Section 4), and assume one-way functions exist. We first give a version of Lemma 4.2 for time discounted utilities on a finite number of rounds.
Assume one-way functions exist, and let be the strategy corresponding to the output of a cryptographic pseudorandom number generator. Let be any strategy. For all , and for sufficiently large
Again we give the proof only for the first inequality. Assume, by contradiction, that for some , and infinitely many ’s. Consider the random variables , defined as if P1 wins round when playing according to and 0 otherwise. Let , so that . This implies that there is a such that , which implies . Fix that . As in Lemma 4.2, consider the test that, given a sequence of plays , generates a play from and outputs 1 if P1 wins round and outputs 0 otherwise.
When is drawn uniformly at random, . On the other hand, when is ’s output,
for , contradicting the assumption that is a pseudorandom number generator. ∎
Using the above Lemma we can show that, for all discount factors, we can greatly reduce the amount of random coins needed to get an -Nash equilibrium.
For all discount factors , all and all , there is a -Nash equilibrium in which players use only random coins, for sufficiently large ’s.
As in the proof of Lemma 6.1 we consider the following strategy for both players: for the first rounds play the output of a cryptographic pseudorandom number generator , with seed length . Thereafter P1 will always play , while P2 will alternate between and . Pick any ; we now show that this is a -Nash equilibrium. By Lemma 6.2 we can pick such that the expected utility in the first rounds lies in the interval . To ensure that this is a -Nash equilibrium we just need to show that
Now, for any , if we set , then the left hand side of the above inequality is , so that it always holds for sufficiently large ’s. ∎
We have shown how, in a simple setting, reducing the amount of randomness available to players affects Nash equilibria. In particular, if we make no computational assumptions on the players, there is a direct tradeoff between the amount of randomness and the approximation to a Nash equilibrium we can achieve. If, instead, players are bound to run in polynomial time, we can get very close to a Nash equilibrium with only random coins, for any .
Some directions for future research include:
Is it possible to extend Lemma 3.1 to player games, for ? Notice that the strategy used in that proof does not generalize to this setting.
Under what circumstances is it possible to further reduce the amount of randomness available (say to for both players)?
Is it possible to extend these results to general zero-sum games or even non zero-sum games?
We wish to thank Tai-Wei Hu, Peter Bro Miltersen, Rahul Santhanam and Rakesh Vohra for fruitful discussions.
- S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
- X. Chen and X. Deng. Settling the Complexity of Two-player Nash Equilibrium. In 47th FOCS, 2006.
- C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou. The Complexity of Computing a Nash Equilibrium. SIAM J. Comput., 39(1):195–259, 2009.
- L. Fortnow and R. Santhanam. Bounding Rationality by Discounting Time. In 1st ICS, 2010.
- O. Goldreich. Foundations of Cryptography: Basic Applications. Cambridge University Press, 2004.
- O. Gossner and T. Tomala. Entropy Bounds on Bayesian Learning. Journal of Mathematical Economics, 44(1):24–32, 2008.
- J. Y. Halpern and R. Pass. Game Theory with Costly Computation: Formulation and Application to Protocol Security. In 1st ICS, pages 120–142, 2010.
- J. Håstad, R. Impagliazzo, L. Levin, and M. Luby. A Pseudorandom Generator from any One-way Function. SIAM Journal on Computing, 28(4):1364–1396, 1999.
- T.-W. Hu. Complexity and Mixed Strategy Equilibria. http://bit.ly/e4N8cN, 2010. Working Paper.
- R. Impagliazzo and A. Wigderson. Randomness vs Time: Derandomization under a Uniform Assumption. J. Comput. Syst. Sci., 63(4):672–688, 2001.
- S. Kalyanaraman and C. Umans. Algorithms for Playing Games with Limited Randomness. In 15th ESA, 2007.
- N. Nisan and A. Wigderson. Hardness vs Randomness. J. Comput. Syst. Sci., 49(2):149–167, 1994.
- H. Simon. A Behavioral Model of Rational Choice. The Quarterly Journal of Economics, 69(1):99–118, 1955.
- A. Yao. Theory and Application of Trapdoor Functions. In 23rd FOCS, 1982.