Computing Strong Game-Theoretic Strategies in Jotto
This material is based upon work supported by the National Science Foundation under grants IIS-0964579, IIS-0905390, and CCF-1101668.
We develop a new approach that computes approximate equilibrium strategies in Jotto, a popular word game. Jotto is an extremely large two-player game of imperfect information; its game tree has many orders of magnitude more states than games previously studied, including no-limit Texas hold ’em. To address the fact that the game is so large, we propose a novel strategy representation called oracular form, in which we do not explicitly represent a strategy, but rather appeal to an oracle that quickly outputs a sample move from the strategy’s distribution. Our overall approach is based on an extension of the fictitious play algorithm to this oracular setting. We demonstrate the superiority of our computed strategies over the strategies computed by a benchmark algorithm, both in terms of head-to-head and worst-case performance.
Developing strong strategies for agents in large games is an important problem in artificial intelligence. In particular, much work has been devoted in recent years to developing algorithms for computing game-theoretic solution concepts, specifically the Nash equilibrium. In two-player zero-sum games, Nash equilibrium strategies have a strong theoretical justification as they also correspond to minimax strategies; by following an equilibrium strategy, a player can guarantee at least the value of the game in expectation, regardless of the strategy followed by his opponent. Currently, the best algorithms for computing a Nash equilibrium in two-player zero-sum extensive-form games (with perfect recall) are able to solve games with states in their game tree .
Unfortunately, many interesting games actually have significantly more than states. Texas hold ’em poker is a prime example of such a game that has received significant attention in the AI literature in recent years; the game tree of two-player limit Texas hold ’em has about states, while that of two-player no-limit Texas hold ’em has about states. The standard approach of dealing with this is to apply an abstraction algorithm, which constructs a smaller game that is similar to the original game; then the smaller game is solved, and its solution is mapped to a strategy profile in the original game . Many abstraction algorithms work by coarsening the moves of chance, collapsing several information sets of the original game into single information sets of the abstracted game (called buckets).
In this paper we study a game with many orders of magnitude more states than even two-player no-limit Texas hold ’em. Jotto, a popular word game, contains approximately states in its game tree. Unfortunately, Jotto does not seem particularly amenable to abstraction in the same way that poker is; we discuss reasons for this in Section 2. Furthermore, even if we could apply an abstraction algorithm to Jotto, we would need to group game states into a single bucket on average, which would almost certainly lose a significant amount of information from the original game. Thus, the abstraction paradigm that has been successful on poker does not seem promising to games like Jotto; an entirely new approach is needed.
We provide such an approach. To deal with the fact that we cannot even represent a strategy for one of the players, we provide a novel strategy representation which we call oracular form. Rather than viewing a strategy as an explicit object that must be represented and stored, we instead represent it implicitly through an oracle; we can think of the oracle as an efficient algorithm. Each time we want to make a play from the strategy, we query the oracle, which quickly outputs a sample play from the strategy’s distribution. Thus, instead of representing the entire strategy in advance, we obtain it on an as-needed basis via real-time computation.
Our main algorithm for computing an approximate equilibrium in Jotto is an extension of the fictitious play algorithm  to our oracular setting. The algorithm outputs a full strategy for one player, and for the other player outputs data such that if another algorithm is run on it, a sample of the strategy’s play is obtained. Thus, we can play this strategy, even though it is never explicitly represented. We use our algorithm to compute approximate equilibrium strategies on 2, 3, 4, and 5-letter variants of Jotto. We demonstrate the superiority of our computed strategies over the strategies computed by a benchmark algorithm, both in terms of head-to-head and worst-case performance.
Jotto is a popular two-player word game. While there are many different variations of the game, we will describe the rules of one common variant. Each player picks a secret five-letter word, and the object of the game is to correctly guess the other player’s word first. Players take turns guessing the other player’s secret word and replying with the number of common letters between the guessed word and the secret word (the positions do not matter). For example, if the secret word is GIANT and a player guesses PECAN, the other player will give a reply of 2 (for the A and the N, even though they are in the wrong positions). Players often cross out letters that are eliminated and record other logical deductions on a sheet of paper. An official Jotto sheet is shown in Figure 1.
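The reply rule above can be sketched in a few lines. This is a minimal illustration rather than the paper's NumCommLetts routine; the function name `num_common_letters` is hypothetical, and the sketch assumes words with no repeated letters, as in the variant studied here.

```python
def num_common_letters(guess: str, secret: str) -> int:
    """Jotto reply: number of letters the two words share, ignoring position.

    Assumes neither word repeats a letter, so a set intersection is enough.
    """
    return len(set(guess.upper()) & set(secret.upper()))


# The example from the text: secret GIANT, guess PECAN (shared letters A and N).
print(num_common_letters("PECAN", "GIANT"))
```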
Instead of having both players simultaneously guessing the other player’s word, we could instead just have one player pick a secret word while the other player guesses it. Let us refer to these players as the hider and the guesser respectively. If the guesser correctly guesses the word on his $k$’th try, then the hider gets payoff $k$ while the guesser gets payoff $-k$. This is the variation that we consider in the remainder of the paper.
There are a few limits on the words that the players can select. All words must be chosen from a pre-arranged dictionary. No proper nouns are allowed, and the words must consist of all different letters (some variations do not impose this restriction). Furthermore, we do not allow players to select a word of which other permutations (aka anagrams) are also valid words (e.g., STARE and RATES). If this restriction were not imposed, then players might need to guess all possible permutations of a word even once all the letters are known.
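As a rough sketch of how the two mechanical restrictions (distinct letters, no anagram partners) could be applied to a raw word list, consider the following. The function name `filter_dictionary` is hypothetical, and the proper-noun rule is not handled because it cannot be checked from spelling alone.

```python
from collections import Counter


def filter_dictionary(words):
    """Keep words with all-distinct letters that have no anagram in the list."""
    upper = list(dict.fromkeys(w.upper() for w in words))  # dedupe, keep order
    distinct = [w for w in upper if len(set(w)) == len(w)]

    # Two words are anagrams exactly when their sorted letters coincide.
    def signature(word):
        return "".join(sorted(word))

    counts = Counter(signature(w) for w in distinct)
    return [w for w in distinct if counts[signature(w)] == 1]


# STARE/RATES are anagrams of each other; HELLO repeats a letter.
print(filter_dictionary(["STARE", "RATES", "GIANT", "PECAN", "HELLO"]))
```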
The official dictionary we will be using has 2833 valid 5-letter words. A naïve attempt at determining the size of the game tree is the following. First the hider selects his word, putting the game into one of 2833 states. At each state, the guesser must choose one of 2833 words; the hider gives him an answer from 0-5, and the guesser must now choose one of 2832 words, and so on. The total number of game states will be approximately
It turns out that we can represent the game much more concisely if we take advantage of the fact that many paths of play lead to the guesser knowing the exact same information. For example, if the guesser guesses GIANT and gets a reply of 2 followed by guessing PECAN with a reply of 3, he knows the exact same information as if he had guessed PECAN with a reply of 3 followed by GIANT with a reply of 2; both sequences should lead to the same game state. More generally, two sequences of guesses and answers lead to identical knowledge bases if and only if the set of words consistent with both sequences is identical. Thus, there is a one-to-one correspondence between a knowledge base of the guesser and a subset of the set of words in the dictionary corresponding to the set of words that are consistent with the guesses so far. Since there are $2^{2833}$ total subsets of words in the dictionary, the total number of game states where the guesser would need to make a decision is $2^{2833} \approx 10^{853}$. Since the best equilibrium-finding algorithms can only solve games with up to nodes, we have no hope of solving Jotto by simply applying an existing algorithm.
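The order-independence argument can be checked on a toy example. The five-word dictionary below is made up purely for illustration, and `consistent_words` is a hypothetical helper name; the last line estimates the size of $2^{2833}$ in powers of ten.

```python
import math


def num_common_letters(a, b):
    return len(set(a) & set(b))


def consistent_words(dictionary, history):
    """Words that would have produced every (guess, reply) pair in `history`."""
    return frozenset(
        w for w in dictionary
        if all(num_common_letters(g, w) == r for g, r in history)
    )


toy_dictionary = ["GIANT", "PECAN", "PANSY", "DOYEN", "AMPED"]  # illustrative only
order_a = consistent_words(toy_dictionary, [("GIANT", 2), ("PECAN", 3)])
order_b = consistent_words(toy_dictionary, [("PECAN", 3), ("GIANT", 2)])
print(order_a == order_b, sorted(order_a))

# Number of decimal digits in the subset count for a 2833-word dictionary.
print(2833 * math.log10(2))
```

Representing a state as a frozen set of consistent words makes the two histories compare equal without any bookkeeping about the order in which guesses were made.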
Furthermore, Jotto does not seem amenable to the same abstraction paradigm that has been successful on poker. In poker, abstraction works by grouping several states into the same bucket and forcing all states in the same bucket to follow the same strategy. In our compact representation of Jotto, the states correspond to subsets of the dictionary; abstraction would mean that several subsets are grouped together into single buckets. However, the action taken at each bucket will be a single word (i.e., the next guess). If many subsets are grouped together into the same bucket, then there will clearly be some words that have already been guessed in some states in the bucket, while not in other states. This will lead to certain actions being ill-defined, as well as possible infinite loops in the structure of the abstracted game tree. This will be exacerbated by the fact that many states will need to be grouped into the same bucket; on average, each bucket will contain game states.
In short, there are significant challenges that must be overcome to apply any sort of abstraction to Jotto; this may not even be feasible to do at all. Instead, we propose an entirely new approach.
3 A natural approach
A natural strategy for the hider would be to select each word uniformly at random, and one for the guesser would be to always guess the word that will eliminate the most words that are still consistent with the guesses and answers so far (in expectation against the uniform hider strategy). We refer to this strategy for the hider as HiderUniform, and to the strategy for the guesser as GuesserGBRUniform (for “Greedy Best Response”). GuesserGBRUniform is essentially 1-ply minimax search; further details and pseudocode are given in Section 7.1. We suspect that current Jotto programs follow algorithms similar to HiderUniform and GuesserGBRUniform, though we are not aware of any publicly available descriptions of existing algorithms. We will use these algorithms as a benchmark to measure the performance of our new approach.
While HiderUniform seems like a pretty strong (i.e., low-exploitability) strategy for the hider, it turns out that GuesserGBRUniform is actually highly exploitable. For example, in the five-letter variant using our rules and dictionary, GuesserGBRUniform will always select ‘doyen’ as its first guess. Clearly no intelligent hider would select doyen as his secret word against such an opponent. Furthermore, the hider can always guarantee that GuesserGBRUniform will require 9 guesses by selecting a word such as ‘amped’ (note that 9 is the maximal number of guesses that GuesserGBRUniform will take to guess any word in the 5-letter variant).
Searching additional levels down the tree will probably not help much with the worst-case exploitability of the guesser’s strategy. The main problem is that GuesserGBRUniform plays a deterministic strategy, and a worst-case opponent could exploit it by always selecting the word that requires the most guesses. We would like to compute a less exploitable strategy for the guesser, which will involve some amount of randomization. Our overall goal is to compute strategies for both players with worst-case exploitabilities as low as possible (i.e., we would like to compute an approximate Nash equilibrium, viewing Jotto as a game of imperfect information).
4 Game theory background
In this section, we review relevant definitions and prior results from game theory and game solving.
4.1 Strategic-form games
The most basic game representation, and the standard representation for simultaneous-move games, is the strategic form. A strategic-form game (aka matrix game) consists of a finite set of players $N$, a space of pure strategies $S_i$ for each player, and a utility function $u_i : \times_i S_i \to \mathbb{R}$ for each player. Here $\times_i S_i$ denotes the space of strategy profiles: vectors of pure strategies, one for each player.
The set of mixed strategies of player $i$ is the space of probability distributions over his pure strategy space $S_i$. We will denote this space by $\Sigma_i$. If the sum of the payoffs of all players equals zero at every strategy profile, then the game is called zero sum. In this paper, we will be primarily concerned with two-player zero-sum games. If the players are following strategy profile $\sigma$, we let $\sigma_{-i}$ denote the strategy taken by the opponent.
4.2 Extensive-form games
An extensive-form game is a general model of multiagent decision-making with potentially sequential and simultaneous actions and imperfect information. As with perfect-information games, extensive-form games consist primarily of a game tree; each non-terminal node has an associated player (possibly chance) that makes the decision at that node, and each terminal node has associated utilities for the players. Additionally, game states are partitioned into information sets, where the player whose turn it is to move cannot distinguish among the states in the same information set. Therefore, in any given information set, the player whose turn it is to move must choose actions with the same distribution at each state contained in the information set. If no player forgets information that he previously knew, we say that the game has perfect recall.
4.3 Mixed vs. behavioral strategies
There are multiple ways of representing strategies in extensive-form games. Define a pure strategy to be a vector that specifies one action at each information set. Clearly there will be an exponential number of pure strategies in the size of the game tree; if the tree has $I_i$ information sets for player $i$ and $A_i$ possible actions at each information set, then player $i$ has $A_i^{I_i}$ possible pure strategies. Define a mixed strategy to be a probability distribution over the space of pure strategies. We can represent a mixed strategy as a vector with $A_i^{I_i}$ components.
Alternatively, we could play a strategy that randomizes independently at each information set; we refer to such a strategy as a behavioral strategy. Since a behavioral strategy must specify a probability for playing each of $A_i$ actions at each information set, we can represent it as a vector with only $I_i \cdot A_i$ components. Thus, behavioral strategies can be represented exponentially more compactly than mixed strategies.
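To make the size gap concrete, the two counts can be computed directly. The function names are hypothetical, and the sketch assumes every information set has the same number of actions.

```python
def mixed_vector_size(num_info_sets: int, num_actions: int) -> int:
    """One probability per pure strategy: actions ** information sets."""
    return num_actions ** num_info_sets


def behavioral_vector_size(num_info_sets: int, num_actions: int) -> int:
    """One probability per (information set, action) pair."""
    return num_info_sets * num_actions


for info_sets in (5, 10, 20):
    print(info_sets, mixed_vector_size(info_sets, 2), behavioral_vector_size(info_sets, 2))
```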
Fortunately, it turns out that this gain in representation size does not come at the loss of expressiveness; any mixed strategy can also be represented as an equivalent behavioral strategy (and vice versa). Thus, current computational approaches to extensive-form games operate on behavioral strategies and avoid the unnecessary exponential blowup associated with using mixed strategies.
4.4 Nash equilibria
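Player $i$’s best response to $\sigma_{-i}$ is any strategy in $\arg\max_{\sigma_i'} u_i(\sigma_i', \sigma_{-i})$, where the maximum ranges over player $i$’s mixed strategies. A Nash equilibrium is a strategy profile $\sigma$ such that $\sigma_i$ is a best response to $\sigma_{-i}$ for all $i$. An $\epsilon$-equilibrium is a strategy profile in which each player achieves a payoff of within $\epsilon$ of his best response.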
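In two-player zero-sum games, we have the following result, which is known as the minimax theorem:
$$v^* = \max_{\sigma_1} \min_{\sigma_2} u_1(\sigma_1, \sigma_2) = \min_{\sigma_2} \max_{\sigma_1} u_1(\sigma_1, \sigma_2),$$
where $\sigma_1$ and $\sigma_2$ range over the mixed strategies of players 1 and 2 and $u_1$ is the utility of player 1.
We refer to $v^*$ as the value of the game to player 1. Any equilibrium strategy for a player guarantees an expected payoff of at least the value of the game to that player.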
All finite games have at least one Nash equilibrium. Currently, the best algorithms for computing a Nash equilibrium in two-player zero-sum extensive-form games with perfect recall are able to solve games with states in their game tree .
5 Smoothed fictitious play
In this section we will review the fictitious play (FP) algorithm [3]. Despite its conceptual simplicity, FP has recently been used to compute equilibria in many classes of games in the artificial intelligence literature (e.g., [4, 5]). The basic FP algorithm works as follows. At each iteration, each player plays a best response to the average strategy of his opponent so far (we assume the game has two players). Formally, in smoothed fictitious play each player $i$ applies the following update rule at each time step $t$:
$$s_i^t = \left(1 - \frac{1}{t}\right) s_i^{t-1} + \frac{1}{t}\, s_i^{\prime t},$$
where $s_i^{\prime t}$ is a best response of player $i$ to the strategy $s_{-i}^{t-1}$ of his opponent at time $t-1$. We allow strategies to be initialized arbitrarily at $t = 0$.
FP is guaranteed to converge to a Nash equilibrium in two-player zero-sum games; however, very little is known about how many iterations are needed to obtain convergence. Recent work shows that FP may require exponentially many iterations in the worst case [2]; however, it may perform far better in practice on specific games. In addition, the performance of FP is not monotonic; for example, it is possible that the strategy profile after 200 iterations is actually significantly closer to equilibrium than the profile after 300 iterations. So simply running FP for some number of iterations and using the final strategy profile is not necessarily the best approach.
We instead use the following improved algorithm. For each iteration $t$, we compute the amount each player $i$ could gain by deviating to a best response; denote it by $\epsilon_i^t$. Let $\epsilon^t = \max_i \epsilon_i^t$, and let $t^* = \arg\min_{0 \le t \le T} \epsilon^t$. After running FP for $T$ iterations, rather than output $s^T$, we will instead output $s^{t^*}$: the $\epsilon$-equilibrium for the smallest $\epsilon$ out of all the iterations of FP so far.
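The averaging update and the best-iterate bookkeeping can be illustrated on a small matrix game. The sketch below is not the Jotto algorithm: it assumes an explicit payoff matrix for the row player in a two-player zero-sum game, and the names `fictitious_play` and `exploitability` are hypothetical.

```python
def exploitability(payoff, s1, s2):
    """Largest gain either player gets by deviating to a best response."""
    rows, cols = len(payoff), len(payoff[0])
    value = sum(payoff[i][j] * s1[i] * s2[j] for i in range(rows) for j in range(cols))
    best_row = max(sum(payoff[i][j] * s2[j] for j in range(cols)) for i in range(rows))
    best_col = min(sum(payoff[i][j] * s1[i] for i in range(rows)) for j in range(cols))
    return max(best_row - value, value - best_col)


def fictitious_play(payoff, iterations):
    """Run fictitious play and return the least exploitable profile seen."""
    rows, cols = len(payoff), len(payoff[0])
    # Arbitrary initialization at t = 0: both players start on their first action.
    s1 = [1.0] + [0.0] * (rows - 1)
    s2 = [1.0] + [0.0] * (cols - 1)
    best_eps, best_profile = exploitability(payoff, s1, s2), (s1[:], s2[:])

    for t in range(1, iterations + 1):
        row_values = [sum(payoff[i][j] * s2[j] for j in range(cols)) for i in range(rows)]
        col_values = [sum(payoff[i][j] * s1[i] for i in range(rows)) for j in range(cols)]
        br1 = max(range(rows), key=row_values.__getitem__)
        br2 = min(range(cols), key=col_values.__getitem__)  # column player minimizes

        # s^t = (1 - 1/t) * s^{t-1} + (1/t) * best response
        s1 = [(1 - 1 / t) * p + (1 / t) * (i == br1) for i, p in enumerate(s1)]
        s2 = [(1 - 1 / t) * p + (1 / t) * (j == br2) for j, p in enumerate(s2)]

        eps = exploitability(payoff, s1, s2)
        if eps < best_eps:  # keep the best iterate, not simply the last one
            best_eps, best_profile = eps, (s1[:], s2[:])
    return best_profile, best_eps


matching_pennies = [[1, -1], [-1, 1]]
profile, eps = fictitious_play(matching_pennies, 2000)
print(profile, eps)
```

Returning the best iterate rather than the final one costs one exploitability evaluation per iteration, which is cheap here but is exactly the step that requires care in Jotto.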
6 Oracular strategies
Consider the following scenario. Suppose one is playing an extensive-form game with $2^{100}$ information sets and two actions per information set, and that he wishes to play an extremely simple strategy: always choose the first action at each information set (suppose actions are labeled as Action 1 and Action 2). To represent this pure strategy, technically we must list out a vector of size $2^{100}$ (with each entry being a 1 for this particular strategy). On the other hand, it is trivial to write an algorithm that takes as input the index of an information set and outputs the action taken by this strategy (i.e., output 1 on all inputs). Even though there are a large number of information sets, we only require 100 bits to represent the index of each one; thus, it is possible to play this simple strategy without ever explicitly representing it.
More generally, let $O_i$ be an efficient deterministic algorithm that takes as input the index of an information set $I$ and outputs an action from $A_I$, the set of actions available at $I$. We refer to $O_i$ as a pure oracular strategy for player $i$. It is easy to see that every pure oracular strategy is strategically equivalent to a pure strategy of the game; at information set $I$, that pure strategy simply plays whatever action $O_i$ outputs on input $I$.
We define oracular versions of randomized strategies analogously to the extensive-form case. Let $\mathcal{O}_i$ be a collection of pure oracular strategies for player $i$; then any probability distribution over elements of $\mathcal{O}_i$ is a mixed oracular strategy. If we let $O_i'$ be a randomized algorithm that outputs a probability distribution over actions at each information set, then we call $O_i'$ a behavioral oracular strategy. As was the case with pure strategies, each oracular strategy corresponds to a single extensive-form strategy of the same type.
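A minimal sketch of these definitions, assuming information sets are indexed by integers: a pure oracular strategy is just a function from index to action, and a mixed oracular strategy draws one such function at random. All names below are hypothetical.

```python
import random


def always_first_action(info_set_index: int) -> int:
    """Pure oracular strategy: play Action 1 everywhere, with no stored table."""
    return 1


def always_second_action(info_set_index: int) -> int:
    """Another pure oracular strategy: play Action 2 everywhere."""
    return 2


def sample_mixed_oracle(oracles, weights, rng):
    """Mixed oracular strategy: draw one pure oracle according to `weights`."""
    return rng.choices(oracles, weights=weights, k=1)[0]


last_index = 2**100 - 1  # largest index among 2**100 information sets
print(always_first_action(last_index))

rng = random.Random(0)
oracle = sample_mixed_oracle([always_first_action, always_second_action], [0.5, 0.5], rng)
print(oracle(last_index))
```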
In the next two sections, we will see how the oracular strategy representation can be useful in practice when computing approximate equilibrium strategies in Jotto. In particular, a strategy for the guesser is so large that we cannot represent it explicitly; however, we can encode it concisely as an oracular strategy which we efficiently query repeatedly throughout the algorithm.
7 Computing best responses in Jotto
In order to apply smoothed fictitious play to Jotto, we must figure out how to compute a best response for each player. This is challenging for several reasons. First, the guesser’s strategy space is so large that we cannot compute a full best response; we must settle for computing an approximate best response, which we call the guesser’s greedy best response. In addition, we represent the guesser’s strategy in oracular form, so the hider cannot operate on it explicitly and can only query it at certain game states. It turns out that we can actually compute an exact best response for the hider despite this limitation.
7.1 Computing the guesser’s greedy best response
Suppose we are given a strategy for the hider, and wish to compute a counterstrategy for the guesser. Let $x$ denote the strategy of the hider, where $x_i$ denotes the probability that the hider chooses $w_i$, the $i$’th word in the dictionary. Let $n$ denote the number of words in the dictionary, and let $S$ be a bit-vector of size $n$ where $S_i = 1$ means that $w_i$ is still consistent with the guesses so far. So $S$ encodes the current knowledge base of the guesser and represents the state of the game.
A reasonable heuristic to use for the guesser would be the following. For each word $w_i$ in the dictionary, compute the number of words that we will eliminate in expectation (over $x$) if we guess $w_i$. Then guess the word that expects to eliminate the most words. We refer to this algorithm as GuesserGBR (for “Greedy Best Response”); pseudocode is given in Algorithm 1.
GuesserGBR relies on a number of subroutines. ExpNumElims, given in Algorithm 2, gives the expected number of words eliminated if $w_i$ is guessed. AnswerProbs, given in Algorithm 4, gives a vector of the expected probability of receiving each answer from the hider when $w_i$ is guessed. NumElims, given in Algorithm 3, gives the number of words that can be eliminated when $w_i$ is guessed and an answer of $j$ is given. Finally, NumCommLetts, given in Algorithm 5, gives the number of common letters between two words. In the pseudocode, $L$ denotes the number of letters allowed per word. For efficiency, we precompute a table of size $n \times n$ storing all the numbers of common letters between pairs of words. The overall running time of GuesserGBR is
It is worth noting that the greedy best response is not an actual best response; it is akin to searching one level down the game tree and then using the evaluation function “expected number of words eliminated” to determine the next move. This is essentially 1-ply minimax search. While we would like to compute an exact best response by searching the entire game tree, this is not feasible since the tree has nodes. As with computer chess programs, we will need to settle for searching down the tree as far as we can, then applying a reasonable evaluation function.
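The greedy rule can be sketched as follows. This is an illustrative reading of the description above, not the paper's pseudocode: it represents the state as a non-empty list of consistent words instead of a bit-vector, renormalizes the hider's probabilities over that list, and breaks ties in favor of words that are still consistent. All three choices, and every function name, are assumptions.

```python
def num_common_letters(a, b):
    return len(set(a) & set(b))


def expected_eliminations(guess, state, hider_probs):
    """Expected number of words in `state` ruled out by guessing `guess`."""
    mass = sum(hider_probs.get(w, 0.0) for w in state)
    if mass > 0:
        weights = {w: hider_probs.get(w, 0.0) / mass for w in state}
    else:  # assumption: fall back to uniform when the hider puts no weight here
        weights = {w: 1.0 / len(state) for w in state}

    # Group the consistent words by the reply they would produce.
    by_reply = {}
    for w in state:
        by_reply.setdefault(num_common_letters(guess, w), []).append(w)

    total = 0.0
    for words in by_reply.values():
        reply_prob = sum(weights[w] for w in words)
        eliminated = len(state) - len(words)  # words inconsistent with this reply
        total += reply_prob * eliminated
    return total


def guesser_gbr(dictionary, state, hider_probs):
    """Guess the word expected to eliminate the most consistent words."""
    consistent = set(state)
    return max(
        dictionary,
        key=lambda g: (expected_eliminations(g, state, hider_probs), g in consistent),
    )


toy_dictionary = ["GIANT", "PECAN", "PANSY", "DOYEN", "AMPED"]  # illustrative only
uniform = {w: 1.0 / len(toy_dictionary) for w in toy_dictionary}
print(guesser_gbr(toy_dictionary, toy_dictionary, uniform))
```

This sketch recomputes letter overlaps on every call; caching them in a table over all word pairs, as the text describes, avoids that repeated work.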
7.2 Computing the hider’s best response
In order to compute the hider’s best response (in the context of fictitious play), we will find it useful to introduce two data structures. Let IterNumGuesses (ING) and AvgNumGuesses (ANG) be two arrays of size $n$, the number of words in the dictionary. The $i$’th component of ING will denote the number of guesses needed for the guesser’s greedy best response to guess $w_i$, the $i$’th word in the dictionary, at the current iteration of the algorithm. The $i$’th component of ANG will be the average over all iterations (of fictitious play) of the number of guesses needed for the guesser’s greedy best response to guess $w_i$. We update ANG by applying
$$\text{ANG}[i] = \left(1 - \frac{1}{t}\right) \text{ANG}[i] + \frac{1}{t}\, \text{ING}[i].$$
We update ING at each time step by applying
$$\text{ING} = \text{CompNumGuesses}(x^t),$$
where $x^t$ is the hider’s strategy at iteration $t$ of fictitious play, and pseudocode for CompNumGuesses is given below in Algorithm 6. CompNumGuesses computes the number of guesses needed for the guesser’s greedy best response to $x^t$ to correctly guess each word. It accomplishes this by repeatedly querying GuesserGBR at various game states. The subroutine UpdateState updates the game state in light of the answer received from GuesserGBR. It is in this way that the hider’s best response algorithm selectively queries the guesser’s strategy, which is represented implicitly in oracular form.
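The querying pattern described here can be sketched with the guesser passed in as an opaque callable. The stand-in guesser used in the example simply returns the first consistent word; it is not GuesserGBR. The function names, the list-based state, and the safety bound on the number of guesses are all assumptions of this sketch.

```python
def num_common_letters(a, b):
    return len(set(a) & set(b))


def update_state(state, guess, reply):
    """Keep only the words that would have produced `reply` to `guess`."""
    return [w for w in state if num_common_letters(guess, w) == reply]


def comp_num_guesses(dictionary, guesser):
    """For each secret word, count the guesses the oracle `guesser` needs."""
    counts = []
    for secret in dictionary:
        state, guesses = list(dictionary), 0
        while True:
            guess = guesser(state)  # the oracle is queried only at reached states
            guesses += 1
            if guess == secret:
                break
            if guesses > 2 * len(dictionary):  # safety guard, not from the paper
                raise RuntimeError("guesser failed to make progress")
            state = update_state(state, guess, num_common_letters(guess, secret))
        counts.append(guesses)
    return counts


def update_average(avg_num_guesses, iter_num_guesses, t):
    """Running average after iteration t (t starts at 1)."""
    return [(1 - 1 / t) * a + (1 / t) * g
            for a, g in zip(avg_num_guesses, iter_num_guesses)]


toy_dictionary = ["GIANT", "PECAN", "DOYEN", "AMPED"]  # illustrative only
ing = comp_num_guesses(toy_dictionary, lambda state: state[0])
print(ing, update_average([0.0] * 4, ing, 1))
```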
Finally, once ING and ANG have been updated as described above, we are ready to compute the full best response for the hider. Pseudocode for the algorithm HiderBR is given below in Algorithm 8. HiderBR takes ANG as input, and determines which word(s) required the most guesses on average. If there is a unique word requiring the maximal number of guesses, then that word is selected with probability 1. If there are multiple words requiring the maximal number of guesses, then these are each selected with equal probability. Note that selecting any distribution over these words would constitute a best response; we just choose one such distribution. The asymptotic running time of CompNumGuesses is while that of HiderBR is
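The hider's best response as described reduces to a uniform distribution over the maximizers of ANG. A minimal sketch follows; the function name and the floating-point tolerance used to detect ties are assumptions.

```python
def hider_br(avg_num_guesses, tol=1e-12):
    """Uniform distribution over the words that take the most guesses on average."""
    best = max(avg_num_guesses)
    winners = [i for i, v in enumerate(avg_num_guesses) if best - v <= tol]
    share = 1.0 / len(winners)
    winner_set = set(winners)
    return [share if i in winner_set else 0.0 for i in range(len(avg_num_guesses))]


print(hider_br([3.0, 5.0, 5.0, 2.0]))
```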
Note that, unlike GuesserGBR of Section 7.1 which is an approximate best response using 1-ply minimax search, HiderBR is a full best response. We are able to compute a full best response for the hider because his strategy space is much smaller than that of the guesser; the hider has only $n$ possible pure strategies, one per dictionary word (2833 in the case of 5-letter Jotto).
7.3 Parallelizing the best response calculation
The hider’s best response calculation can be sped up dramatically by parallelizing over several cores. In particular, we parallelize the CompNumGuesses subroutine as follows. For the first processor, we iterate over words $w_1$ to $w_{n/p}$, where $n$ is the number of words in the dictionary and $p$ is the number of processors, and so on for the other processors. Thus we can perform $p$ independent computations in parallel. The overall running time of the new algorithm is .
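One way to split the per-word work into contiguous chunks is sketched below. The helper names are hypothetical. The example uses threads only to keep the sketch short; for CPU-bound work in CPython one would likely substitute `ProcessPoolExecutor`, which is an implementation choice not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n, p):
    """Split indices 0..n-1 into p contiguous half-open ranges of near-equal size."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_map_words(work, n, p):
    """Apply `work(i)` to every word index, one chunk per worker."""
    def run_chunk(bounds):
        lo, hi = bounds
        return [work(i) for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(run_chunk, chunk_ranges(n, p)))
    return [x for part in parts for x in part]  # results stay in word order


print(chunk_ranges(10, 3))
print(parallel_map_words(lambda i: i * i, 10, 3))
```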
8 Computing an approximate equilibrium in Jotto
We would like to apply smoothed fictitious play to Jotto, using HiderBR for the hider’s best response and GuesserGBR for the guesser’s best response; however, this is tricky for several reasons. First, it is not clear how to compute the epsilons and determine the quality of our strategies. Second, it will be difficult to run the algorithm without explicitly representing the guesser’s strategy; furthermore, we cannot output it at the end of the algorithm.
We are now ready to present our full algorithm for computing an approximate equilibrium in Jotto; pseudocode is given in Algorithm 13. Note that we initialize the hider’s strategy to choose each word uniformly at random. In terms of the guesser’s strategy, it turns out that all the information needed to obtain it is already encoded in the hider’s strategy and that we do not actually need to represent it in the course of the algorithm.
To obtain the guesser’s final strategy, note that the strategies of the hider are output to a file at each iteration. It turns out that we can use this file to efficiently generate samples from the guesser’s strategy, even though we never explicitly output this strategy. We present pseudocode in Algorithm 14 for generating a sample of the guesser’s strategy at state $S$ from the file output in Algorithm 13. This algorithm works by randomly selecting an integer $t$ from 1 to $T$, where $T$ is the number of iterations that were run, then playing the guesser’s greedy best response to $x^t$, the hider’s strategy at iteration $t$. We can view this algorithm as representing the guesser’s strategy as a mixed oracular strategy; in particular, it is the uniform distribution over his greedy best responses in the first $T$ iterations of Algorithm 13. This is noteworthy since it is a rare case of the mixed strategy representation having a computational advantage over the behavioral strategy representation.
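The sampling step can be sketched as drawing one stored hider strategy uniformly at random and wrapping the greedy best response to it in a closure. The sketch assumes the per-iteration hider strategies are already loaded into a list and that a greedy-best-response function is supplied by the caller; it also fixes the sampled iteration for a whole game, which is one reading of "mixed" rather than something the text spells out. All names are hypothetical.

```python
import random


def sample_pure_oracle(hider_history, greedy_best_response, rng=random):
    """Draw an iteration uniformly and return the guesser oracle for that iteration.

    hider_history: list of hider strategies, one per fictitious-play iteration.
    greedy_best_response: callable (hider_strategy, state) -> guess.
    """
    t = rng.randrange(len(hider_history))  # uniform over stored iterations
    hider_strategy = hider_history[t]

    def oracle(state):
        return greedy_best_response(hider_strategy, state)

    return oracle


# Stand-in best response for illustration only: guess the most likely consistent word.
def most_likely_word(hider_strategy, state):
    return max(state, key=lambda w: hider_strategy.get(w, 0.0))


history = [{"GIANT": 1.0}, {"PECAN": 1.0}]
oracle = sample_pure_oracle(history, most_likely_word, random.Random(0))
print(oracle(["GIANT", "PECAN"]))
```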
We ran our algorithm SolveJotto on four different Jotto instances, allowing words to be 2, 3, 4, or 5 letters long. To speed up the computation, we used the parallel version of the bottleneck subroutine CompNumGuesses (described in Section 7.3) with 16 processors. As our dictionary, we use the Official Tournament and Club Word List [6], the official word list for tournament Scrabble in several countries. As discussed in Section 2, we omit words with duplicate letters and words for which there exists an anagram that is also a word. The dictionary sizes are given in Table 1. We note that our algorithms extend naturally to any number of words and dictionary sizes (and to other variants of Jotto as well).
One metric for evaluating our algorithm is to play the strategies it computes against a benchmark algorithm. The benchmark algorithm we chose selects his word uniformly at random as the hider, and plays the greedy best response to the uniform strategy as the guesser. This is the same strategy that we use to initialize our algorithm.
For each number of letters, we computed the payoff of our algorithm SolveJotto against the benchmark (recall that the payoff to the hider is the expected number of guesses needed for the guesser to correctly guess the hider’s word). The overall payoff is the average of the hider and guesser payoff.
|Number of letters||2||3||4||5|
|Our hider payoff vs. benchmark||7.652||7.912||7.507||7.221|
|Our guesser payoff vs. benchmark||-6.619||-7.635||-7.415||-7.216|
|Our overall payoff vs. benchmark||0.517||0.139||0.046||0.003|
|Benchmark self-play hider payoff||6.627||7.601||7.365||7.079|
|Our algorithm self-play hider payoff||7.438||7.658||7.390||7.162|
|Our final epsilon||0.038||0.334||0.336||0.335|
|Number of iterations||22212||10694||3568||3906|
|Avg time per iteration (minutes)||0.028||1.160||12.576|
Several observations from Table 1 are noteworthy. First, our algorithm beats the benchmark for all dictionary sizes. In the two-letter game, our expected payoff against the benchmark is 0.517; our strategy requires over a full guess less than the benchmark in expectation. Our profit against the benchmark decreases as more letters are used.
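As a quick arithmetic check, the overall payoff row can be recomputed from the two rows above it, using the stated definition (the average of the hider and guesser payoffs). The values below are copied from Table 1; small differences from the reported figures are consistent with the table entries being rounded to three decimals.

```python
letters = [2, 3, 4, 5]
hider_payoff = [7.652, 7.912, 7.507, 7.221]
guesser_payoff = [-6.619, -7.635, -7.415, -7.216]
reported_overall = [0.517, 0.139, 0.046, 0.003]

# Overall payoff is the average of the payoffs in the two roles.
overall = [(h + g) / 2 for h, g in zip(hider_payoff, guesser_payoff)]

for k, computed, reported in zip(letters, overall, reported_overall):
    print(k, round(computed, 4), reported)
```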
In addition to head-to-head performance against the benchmark, we also compared the algorithms in terms of worst-case performance. Recall that $\epsilon$ denotes the maximum payoff improvement one player could gain by deviating to a best response (full best response for the hider and greedy best response for the guesser). Note that in all cases, our $\epsilon$ is significantly lower than that of the benchmark. For example, in the two-letter game the benchmark obtains an $\epsilon$ of 5.373, while our algorithm obtains one of 0.038.
Interestingly, we also observe that the self-play payoff of our algorithm, which is an estimate of the value of the game, does not increase monotonically with the number of letters. That is, increasing the number of letters in the game does not necessarily make it more difficult for the guesser to guess the hider’s word.
We presented a new approach for computing approximate-equilibrium strategies in Jotto. Our algorithm produces strategies that significantly outperform a benchmark algorithm with respect to both head-to-head performance and worst-case exploitability. The algorithm extends fictitious play to a novel strategy representation called oracular form. We expect our algorithm and the oracular form representation to apply naturally to many other interesting games as well; in particular, games where the strategy space is very large for one player, but relatively small for the other player.
[1] Billings, D., Burch, N., Davidson, A., Holte, R., Schaeffer, J., Schauenberg, T., Szafron, D.: Approximating game-theoretic optimal strategies for full-scale poker. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI) (2003)
[2] Brandt, F., Fischer, F., Harrenstein, P.: On the rate of convergence of fictitious play. In: International Symposium on Algorithmic Game Theory (SAGT) (2010)
[3] Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activity Analysis of Production and Allocation, pp. 374–376. John Wiley & Sons (1951)
[4] Ganzfried, S., Sandholm, T.: Computing an approximate jam/fold equilibrium for 3-player no-limit Texas Hold’em tournaments. In: Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) (2008)
[5] Rabinovich, Z., Gerding, E., Polukarov, M., Jennings, N.R.: Generalised fictitious play for a continuum of anonymous players. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI) (2009)
[6] Official tournament and club word list, http://www.isc.ro/en/commands/lists.html
[7] Zinkevich, M., Bowling, M., Johanson, M., Piccione, C.: Regret minimization in games with incomplete information. In: Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS) (2007)