Evaluating Go Game Records for Prediction of Player Attributes
We propose a way of extracting and aggregating per-move evaluations from sets of Go game records. The evaluations capture different aspects of the games, such as the patterns played or statistics of sente/gote sequences. Using machine learning algorithms, the evaluations can be used to predict relevant target variables. We apply this methodology to predict the strength and playing style of a player (e.g. territoriality or aggressiveness) with good accuracy. We propose a number of possible applications, including aiding in Go study, seeding real-world ranks of internet players, and tuning of Go-playing programs.
978-1-4799-8622-4/15/$31 ©2015 IEEE
Computer Go, Machine Learning, Feature Extraction, Board Games, Skill Assessment
1 Introduction
The field of computer Go is primarily focused on the problem of creating a program that plays the game by finding the best move from a given board position . We instead focus on analyzing existing game records, with the aim of helping humans to play and understand the game better.
Go is a two-player full-information board game played on a square grid (usually 19×19 lines) with black and white stones; the goal of the game is to surround territory and capture enemy stones. In the following text we assume basic familiarity with the game rules; a small glossary can be found at the end of the paper.
Following up on our initial research , we present a method for extracting information from game records. We extract different pieces of domain-specific information from the records to create a complex evaluation of a game sample. The evaluation is a vector composed of independent features, each of which captures a different aspect of the sample – for example, a statistic of the most frequent local patterns played, or a statistic of high and low plays in different game stages.
Using machine learning methods, the evaluation of the sample can be used to predict relevant variables. In this work in particular, the data sample consists of games of a player, and it is used to predict the player’s strength and playing style.
This paper is organized as follows. Section 2 summarizes related work on machine learning applications in the game of Go. Section 3 presents the features comprising the evaluation. Section 4 details the machine learning method we used. In Section 5 we describe our datasets – for prediction of strength and style – and show how precise the predictions are. Section 6 discusses applications and future work.
2 Related Work
Game records have so far been utilized in two main directions. Firstly, in the field of computer Go, records have been used to rank patterns which serve as a heuristic to speed up tree search , or to generate databases of standard openings . They have also served as training data for various neural-network-based move predictors. Until very recently, these did not perform convincingly . Recent improvements [8, 9] based on deep convolutional neural networks appear to be changing the situation and promise big changes in the field.
Secondly, the records of professional games have traditionally served as a study material for human players. There exist software tools [10, 11] designed to enable the user to search the games. These tools also give statistics of next moves and appropriate win rate among professional games.
Our approach resides on the boundary between the two directions mentioned above, with possible applications in both computer Go and tools aiding human study. To our knowledge, the only work somewhat resembling ours is , where the authors claim to be able to classify a player's strength into 3 predefined classes (casual, intermediate, advanced player). In their work, the domain-specific features were extracted using GnuGo's  positional assessment and learned using random forests . It is hard to say how precise their method is, since neither precision nor recall was given. The only account given was the skill development over time of two players (picked in an unspecified manner).
One application of the proposed methodology is to use a player's predicted style to recommend relevant professional players to review. Playing style is traditionally of great importance to human players, but so far, the methodology for determining a player's style has been limited to expert judgement and hand-constructed questionnaires , . See the Discussion (Section 6) for details.
3 Feature Extraction
This section presents the methods for extracting the evaluation vector from a set of games. Because we aggregate data by player, each game in the set is accompanied by the color which specifies our player of interest. The sample is therefore regarded as a set of colored games.
The evaluation vector is composed by concatenating several sub-vectors of features – examples include the aforementioned local patterns or the statistics of sente and gote sequences. These are described in detail in the rest of this section; some details are omitted, see  for an extended description.
3.1 Raw Game Processing
The games are processed by the Pachi Go Engine , which exports a variety of analytical data about each move in the game. For each move, Pachi outputs a list of key-value pairs:
atari flag — whether the move put enemy stones in atari,
atari escape flag — whether the move saved own stones from atari,
capture — number of enemy stones the move captured,
contiguity to last move — the gridcular distance (cf. equation 1) from the last move,
board edge distance — the distance from the nearest edge of the board,
spatial pattern — configuration of stones around the played move.
We use this information to compute the higher-level features given below. The spatial pattern comprises the positions of stones around the played move up to a certain distance, given by the gridcular metric

d(x, y) = |δx| + |δy| + max(|δx|, |δy|).     (1)

This metric produces a circle-like structure on the Go board square grid . Spatial patterns of sizes 2 to 6 are taken into account.
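As a concrete illustration, the gridcular distance of equation 1 can be computed as follows (a minimal Python sketch; the function name is ours):

```python
def gridcular(a, b):
    """Gridcular distance between board points a = (x1, y1) and b = (x2, y2).

    d = |dx| + |dy| + max(|dx|, |dy|) grows in rough circles on the
    square grid, which makes it a natural radius for spatial patterns.
    """
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return dx + dy + max(dx, dy)
```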
3.2 Patterns
The pattern feature family is essentially a statistic of the most frequently occurring spatial patterns (together with both atari flags). The list of the most frequently played patterns is computed beforehand from the whole database of games. The patterns are normalized so that it is black's turn, and they are invariant under rotation and mirroring. A larger list was used for the domain of strength and a smaller one for the domain of style (which has a smaller dataset, see Section 5.2 for details).
Given a set of colored games, we then count how many times each of the patterns was played, obtaining a vector of counts. With simple occurrence counts, however, the counts grow proportionally to the number of games in the sample. To maintain invariance under the number of games in the sample, a normalization is needed; we do this by dividing the counts by the number of games, though other normalization schemes are possible, see .
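Assuming each game has been reduced to the list of normalized pattern identifiers played by our player (a representation we adopt here purely for illustration), the counting and normalization can be sketched as:

```python
from collections import Counter

def pattern_feature(games, top_patterns):
    """Per-pattern average occurrence over the sample.

    `games` is a list of games, each given as the list of normalized
    pattern identifiers played by the player of interest; `top_patterns`
    is the precomputed list of the most frequent patterns. Dividing by
    the number of games keeps the feature invariant under sample size."""
    counts = Counter()
    for game in games:
        counts.update(game)
    n = len(games)
    return [counts[p] / n for p in top_patterns]
```

For instance, a two-game sample in which pattern "p1" occurs twice and "p2" once yields the vector [1.0, 0.5, 0.0] for the pattern list ["p1", "p2", "p3"].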
3.3 ω-local Sente and Gote Sequences
The concept of sente and gote is traditionally very important for human players, which suggests it could bear interesting information. Based on this intuition, we devised a statistic which tries to capture the distribution of sente and gote plays in the games of the sample. In general, deciding which moves are sente or gote is hard; therefore, we restrict ourselves to what we call ω-local (sente and gote) sequences.
We say that a move is ω-local (with respect to the previous move) if its gridcular distance from the previous move is smaller than a fixed number ω; different values were used for the strength and style datasets. The simplifying assumption we make is that responses to sente moves are always local. Although this does not hold in general, the feature proves useful.
This assumption allows us to partition each game into disjoint ω-local sequences (that is, each move in the sequence is ω-local with respect to the directly preceding move) and observe whether the player who started the sequence differs from the player who ended it. If so, the ω-local sequence is said to be sente for the player who started it, because he gets to play somewhere else first (tenuki). Similarly, if the player who started the sequence also had to respond last, we say the sequence is gote for him. Based on this partitioning, we count the average numbers of sente and gote sequences per game in the sample; these two numbers form the feature.
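The partitioning can be sketched as follows, with the locality threshold written as omega and games represented as lists of (color, coordinate) moves (a representation assumed here for illustration; degenerate length-1 sequences are treated as gote for simplicity):

```python
def sente_gote_feature(games, player, omega=10):
    """Average counts of omega-local sente and gote sequences per game
    for `player`. Each game is a list of (color, (x, y)) moves; a move
    is omega-local if its gridcular distance from the previous move is
    below omega."""
    def gridcular(a, b):
        dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
        return dx + dy + max(dx, dy)

    sente = gote = 0
    for moves in games:
        start = 0
        for i in range(1, len(moves) + 1):
            # close the current sequence on a non-local move or at game end
            if i == len(moves) or gridcular(moves[i][1], moves[i - 1][1]) >= omega:
                starter, ender = moves[start][0], moves[i - 1][0]
                if starter == player:
                    if starter != ender:
                        sente += 1  # opponent answered last: starter keeps initiative
                    else:
                        gote += 1   # starter had to answer last
                start = i
    n = len(games)
    return sente / n, gote / n
```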
3.4 Border Distance
The border distance feature is a two-dimensional histogram counting the average number of moves in the sample played low or high in different game stages. The original inspiration was to help distinguish between territorial and influence-based moves in the opening, though the feature turns out to be useful in other phases of the game as well.
The first dimension is specified by the move's border distance, the second by the number of the current move from the beginning of the game. The granularity of each dimension is given by intervals dividing the domains. For the border distance dimension, we distinguish between the first 2 lines, the 3rd line of territory, the 4th line of influence, and higher plays. The move number division differs between the strength and style datasets; the motivation is to (very roughly) distinguish between the opening, early middle game, middle game and endgame. The differences in interval sizes were found empirically; our interpretation is that in the case of style, we want to put bigger stress on the opening and endgame (both of which can be said to follow standard patterns) at the expense of the middle game (where the situation is usually very complex).
Using the four border-distance intervals and four move-number intervals to divide the domains, we obtain a histogram of 4 × 4 = 16 fields. For each move in the games, we increase the count in the appropriate histogram field. Finally, the whole histogram is normalized to establish invariance under the number of games scanned, by dividing its elements by the number of games. The resulting 16 numbers form the border distance feature.
3.5 Captured Stones
Apart from the border distance feature, we also maintain a two-dimensional histogram counting the numbers of captured stones in different game stages. The motivation is simple – beginners especially tend to capture stones because “they could” rather than because it is the “best move”. Such a capture can be a grave mistake in the opening and would not be played by skilled players.
As before, one of the dimensions is given by intervals which specify the game stages (roughly: opening, middle game, endgame). The division into game stages is coarser than for the previous feature because captures occur relatively infrequently; finer graining would require more data.
The second dimension has a fixed size of three bins. Along with the number of captives of the player of interest (the first bin), we also count the number of his opponent's captives (the second bin) and the difference between the two numbers (the third bin). Together, we obtain a histogram of 3 × 3 = 9 elements.
Again, the colored games are processed move by move, increasing the counts of captured stones (possibly by 0) in the appropriate fields. The 9 numbers (again normalized by dividing by the number of games) together comprise the feature.
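The three-bin structure per game stage can be sketched as follows (the stage bounds in the example are hypothetical; games are assumed reduced to (move_number, our_captures, opponent_captures) triples):

```python
def captures_feature(games, stage_bounds):
    """3x3 histogram of captured stones, flattened and normalized by the
    number of games. `stage_bounds` are inclusive upper bounds of the
    three game stages; each game is a list of
    (move_number, captured_by_us, captured_by_opponent) entries."""
    hist = [[0.0, 0.0, 0.0] for _ in stage_bounds]
    for game in games:
        for num, ours, theirs in game:
            stage = next(i for i, hi in enumerate(stage_bounds) if num <= hi)
            hist[stage][0] += ours           # our captives
            hist[stage][1] += theirs         # opponent's captives
            hist[stage][2] += ours - theirs  # the difference
    n = len(games)
    return [v / n for row in hist for v in row]
```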
3.6 Win/Loss Statistic
The next feature is a statistic of wins and losses, and whether they were by points or by resignation. The motivation is that many weak players play out games that are already lost, either because their counting is not very good (they do not know there is no way to win), or because they hope the opponent will blunder. Professionals, on the other hand, do not hesitate to resign when nothing can be done; playing on in a lost game could even be considered rude.
We disregard forfeited, unfinished or tied games in this feature, because these events are so infrequent that utilizing them reliably would require a very large dataset.
In the colored games, we count how many times the player of interest:
won by counting,
won by resignation,
lost by counting,
and lost by resignation.
Again, we divide these four numbers by the number of games to maintain invariance under the sample size. Furthermore, for games won or lost by counting, we compute the average size of the win or loss in points. The six numbers form the feature.
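A minimal sketch of the six-number statistic, with each game result encoded as a (won, by_resignation, margin) tuple (our own representation; margin is the point difference of counted games, and forfeits, unfinished and tied games are assumed already filtered out):

```python
def winloss_feature(results):
    """Six-number win/loss statistic: the four normalized counts of
    win/loss by counting/resignation, followed by the average point
    margins of counted wins and counted losses."""
    n = len(results)
    win_cnt = [r for r in results if r[0] and not r[1]]
    win_res = [r for r in results if r[0] and r[1]]
    loss_cnt = [r for r in results if not r[0] and not r[1]]
    loss_res = [r for r in results if not r[0] and r[1]]

    def avg_margin(rs):
        return sum(r[2] for r in rs) / len(rs) if rs else 0.0

    return [len(win_cnt) / n, len(win_res) / n,
            len(loss_cnt) / n, len(loss_res) / n,
            avg_margin(win_cnt), avg_margin(loss_cnt)]
```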
So far, we have considered how to turn a set of colored games into an evaluation vector; now we show how to utilize the evaluation. To predict various player attributes, we start with a given input dataset consisting of pairs (GC_i, y_i), where GC_i is the set of colored games of the i-th player and y_i is the target attribute. The target might be fairly arbitrary, as long as it has some relation to the games. For example, y_i might be the strength of the i-th player.
Now, let us denote the evaluation process presented above as eval, and let ev_i = eval(GC_i) be the evaluation of the i-th player. We can then transform the dataset into pairs (ev_i, y_i), which form our training dataset.
As usual, the task of the subsequent machine learning algorithm is to generalize from the dataset to predict the correct y even for previously unseen evaluations. In the case of strength, we might therefore be able to predict the strength of an unknown player given a set GC of his games (from which we compute the evaluation ev = eval(GC)).
4.1 Prediction Model
Choosing the best-performing predictor is often a tedious task which depends on the nature of the dataset at hand, and requires expert judgement and repeated trial and error. In , we experimented with various methods, of which stacked ensembles  with different base learners turned out to have superior performance. Since this paper focuses on the evaluation rather than on finding the very best prediction model, we decided to use a bagged artificial neural network, because of its simplicity and the fact that it performs very well in practice.
The network is composed of simple computational units organized in a layered topology, as described e.g. in the monograph . We used a simple feedforward neural network with 20 hidden units in one hidden layer. The neurons have the standard sigmoidal activation function, and the network is trained using the RPROP algorithm  for at most 100 iterations (or until the error is smaller than 0.001). In both datasets, the domain of the particular target variable (strength, style) was linearly rescaled prior to learning, and the predicted outputs were rescaled back by the inverse mapping.
Bagging  is a method that combines an ensemble of models (trained on differently resampled data) to improve their performance and robustness. In this work, we used a bag of the neural networks specified above.
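The setup can be approximated with off-the-shelf components. The sketch below substitutes scikit-learn's MLPRegressor for the paper's FANN networks; RPROP is not available there, so the default solver stands in, and the factory function name is ours:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor

def make_bagged_nn(n_estimators=20, seed=0):
    """A bag of small sigmoidal networks: each base network has one
    hidden layer of 20 logistic units, and the bagging wrapper trains
    each of them on a bootstrap resample of the data."""
    base = MLPRegressor(hidden_layer_sizes=(20,), activation='logistic',
                        max_iter=500, random_state=seed)
    # first positional argument is the base estimator to be bagged
    return BaggingRegressor(base, n_estimators=n_estimators, random_state=seed)
```

Predictions of the ensemble are the average of the individual networks' predictions, which is what smooths out the variance of any single network.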
4.2 Reference Model and Performance Measures
In our experiments, mean regression was used as a reference model. Mean regression is a simple method which constantly predicts the average of the target attributes in the dataset, regardless of the particular evaluation. Although it is a trivial model, it gives useful insight into the distribution of the target variables: for instance, a low error of the mean regression model raises the suspicion that the target attribute is ill-defined, as discussed in the results of the style prediction (Section 5.2).
To assess the efficiency of our method and estimate its precision on unseen inputs, we measure the performance of the algorithm on a given dataset. A standard way to do this is to divide the dataset into training and testing parts and compute the error of the method on the testing part. We used standard 10-fold cross-validation , which randomly divides the dataset into 10 disjoint partitions of (almost) the same size; each partition is in turn taken as testing data, while the remaining partitions are used to train the model. Cross-validation is known to provide error estimates which are close to the true error of the given prediction model.
A commonly used performance measure is the mean square error (MSE), which estimates the variance of the error distribution. We use its square root (RMSE), an estimate of the standard deviation of the predictions,

RMSE = sqrt( 1/|Ts| · Σ_{(ev, y) ∈ Ts} (model(ev) − y)² ),

where model is the machine learning model trained on the training data and Ts denotes the testing data.
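The cross-validated RMSE estimate can be sketched with scikit-learn utilities (function name and the linear test model are ours, for illustration only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

def cv_rmse(model, evs, ys, folds=10, seed=0):
    """Estimate the RMSE of `model` by k-fold cross-validation: each
    fold is predicted by a model trained on the remaining folds, and
    the root mean square error is taken over all held-out predictions."""
    cv = KFold(n_splits=folds, shuffle=True, random_state=seed)
    preds = cross_val_predict(model, evs, ys, cv=cv)
    return float(np.sqrt(np.mean((preds - ys) ** 2)))
```

On noiseless linear data with a linear model, the estimate is (as expected) essentially zero; on real evaluation vectors it approximates the standard deviation of the prediction error.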
5 Experiments and Results
5.1 Strength
One of the two major domains we have tested our framework on is the prediction of player strength.
For each rank in the range of 6-dan to 20-kyu, we gathered a list of players of that rank. To avoid biases caused by different strategies, the sample only consists of games played on the 19 × 19 board between players of comparable strength (excluding games with handicap stones). The set of colored games for a player consists of the games the player played while holding the given rank. We only use a sample if it contains at least 10 games; if the sample is larger than 50 games, we randomly choose a subset of it (with the size of the subset chosen uniformly at random from a fixed interval). Note that by cutting the number of games to a fixed number (say 50) for large samples, we would create an artificial disproportion in sample sizes, which could introduce bias into the process. The distribution of sample sizes is shown in Figure 1.
For each of the 26 ranks, we gathered 120 such samples. The target variable y directly corresponds to the ranks: y = 20 for the rank of 20-kyu, y = 1 for 1-kyu, y = 0 for 1-dan, y = −5 for 6-dan, and similarly for the other ranks. (With increasing strength, y decreases.) Since the prediction model used (bagged neural network) rescales the data prior to learning, the direction of the ordering and its scale can be chosen fairly arbitrarily.
The performance of the prediction of strength is given in Table 1. The table compares performances of different features (predicted by the bagged neural network, Section 4.1) with the reference model of mean regression.
The results show the standard deviation (estimated by the RMSE) of the strength predictions in ranks. Comparing the features reveals that, for the prediction of strength, the pattern feature works by far the best, while the other features bring smaller, yet nontrivial, contributions.
5.2 Style
The second domain is the prediction of different aspects of playing style.
The collection of games in this dataset comes from the Games of Go on Disk database . This database contains more than 70 000 professional games, spanning from ancient times to the present.
We chose 25 popular professional players (mainly from the 20th century) and asked several experts (professional and strong amateur players) to evaluate these players using a questionnaire. The experts (Alexander Dinerchtein 3-pro, Motoki Noguchi 7-dan, Vladimír Daněk 5-dan, Lukáš Podpěra 5-dan and Vít Brunner 4-dan) were asked to assess the players on four scales, each ranging from 1 to 10.
The scales (cf. Table 2) try to reflect some of the traditionally perceived playing styles. For example, the first scale (territoriality) captures whether a player prefers safe, yet inherently smaller, territory (10 on the scale), or a roughly sketched large framework (moyo, 1 on the scale), which is however insecure. For a detailed analysis of playing styles, please refer to  or .
For each of the selected professionals, we took 192 of his games from the GoGoD database at random and divided them (again at random) into 12 colored sets of 16 games. The target variable (for each of the four styles) is given by the average of the experts' answers. Results of the questionnaire are published online in . Note that the style dataset has both a much smaller domain size and a much smaller data size (only 4800 games).
Table 3 compares the performances of different features (as predicted by the bagged neural network, Section 4.1) with the mean regression learner; results in the table are averaged over the different styles. The two features with the biggest contribution are the pattern feature and the border distance feature. The other features perform weakly, or even slightly worse than the mean regression learner.
The per-style prediction performance is shown in Table 4 (computed on the full feature set). Given that the style scales range from 1 to 10, we consider an average standard deviation from the correct answers of around 1.6 to be a good precision.
We should note that mean regression has a very small RMSE on the thickness scale. This stems from the fact that the experts' answers on this scale have very little variance themselves. Our conclusion is that the scale of thickness is not well defined. Refer to  for further discussion.
6 Discussion
In this paper, we have chosen the target variables to be the strength and four different aspects of style. This has several motivations. Firstly, strength is arguably the most important player attribute, and the online Go servers allow one to obtain reasonably precise data easily. The playing styles have been chosen for their strong intuitive appeal to players, and because they are fairly well understood in traditional Go theory. Unlike strength, however, data for the style target variables are hard to obtain, since these concepts have not traditionally been treated with numerical rigour. To overcome this obstacle, we used the questionnaire discussed in Section 5.2.
The choice of target variable can be quite arbitrary, as long as some dependencies between the target variable and the evaluations exist (and can be learned). Other possible choices might be the era of a player (e.g. standard opening patterns have evolved rapidly during the last 100 years), or player nationality.
The ability to predict player attributes demonstrated in this paper shows that the evaluations are a very useful representation. Both the predictive power and the representation itself have a number of possible applications.
So far, we have utilized some of the findings in an online web application .
Of course, our methods for style estimation are trained on very strong players, and thus might not fully generalize to ordinary players. Weak players might not have a consistent style, or the whole concept of style might not even be applicable to them. Estimating this effect is not easily possible, since we do not have data about weak players' styles. Our web application allows users to submit their own opinion about their style, so we should be able to account for this effect in future research.
It is also possible to directly study dependencies between individual elements of the evaluation vector and the target variable. By pinpointing e.g. the patterns most strongly correlated with low strength (players who play them are weak), we can warn users not to play the moves associated with those patterns. We have implemented this feature in the online web application . However, this method seems usable only for the few most strongly correlated attributes; weakly correlated attributes are prone to larger errors.
Other possible applications include helping ranking algorithms to converge faster. Usually, the ranking of a player is determined from his opponents' rankings by looking at the numbers of wins and losses (e.g. by computing an Elo rating ); our methods might improve this by including domain knowledge. Similarly, a computer Go program could quickly classify the level of its human opponent based on the evaluation of their previous games, and auto-adjust its difficulty settings to provide more even games for beginners. We will research these options in the future.
This paper presents a method for evaluating players based on a sample of their games. From the sample, we extract a number of domain-specific features, each capturing a different piece of information. The resulting summary evaluations turn out to be very useful for predicting player attributes (such as strength or playing style) with reasonable accuracy.
The ability to predict such player attributes has some very interesting applications in both computer Go and in development of teaching tools for human players, some of which we realized in an on-line web application. The paper also discusses other potential extensions and applications which we will be exploring in the future.
We believe that the applications of our findings can help to improve both human and computer understanding of the game of Go.
The machine learning models were implemented and evaluated using the Orange Datamining suite  and the Fast Artificial Neural Network library FANN . We used the Pachi Go engine  for the raw game processing.
This research has been partially supported by the Czech Science Foundation project no. P103-15-19877S. J. Moudřík has been supported by the Charles University Grant Agency project no. 364015 and by SVV project no. 260 224.
atari — a situation where a stone (or group of stones) can be captured by the next opponent move,
sente — a move that requires immediate enemy response, and thus keeps the initiative,
gote — a move that does not require immediate enemy response, and thus loses the initiative,
tenuki — a move gaining initiative by ignoring the last (gote) enemy move,
handicap — a situation where a weaker player gets some stones placed on predefined positions on the board as an advantage to start the game with (their number is set to compensate for the difference in skill).
- S. Gelly and D. Silver, “Achieving master level play in 9x9 computer go,” in AAAI’08: Proceedings of the 23rd national conference on Artificial intelligence. AAAI Press, 2008, pp. 1537–1540.
- P. Baudiš and J. Moudřík, “On move pattern trends in a large go games corpus,” Arxiv, CoRR, October 2012. [Online]. Available: http://arxiv.org/abs/1209.5251
- W. Shubert. (2013) KGS — kiseido go server. [Online]. Available: http://www.gokgs.com/
- T. M. Hall and J. Fairbairn. (winter 2011) Games of Go on Disk — GoGoD Encyclopaedia and Database. [Online]. Available: http://www.gogod.co.uk/
- R. Coulom, “Computing Elo Ratings of Move Patterns in the Game of Go,” in Computer Games Workshop, H. J. van den Herik, Mark Winands, Jos Uiterwijk, and Maarten Schadd, Eds., Amsterdam Pays-Bas, 2007. [Online]. Available: http://hal.inria.fr/inria-00149859/en/
- P. Audouard, G. Chaslot, J.-B. Hoock, J. Perez, A. Rimmel, and O. Teytaud, “Grid coevolution for adaptive simulations: Application to the building of opening books in the game of go,” in Applications of Evolutionary Computing. Springer, 2009, pp. 323–332.
- M. Enzenberger, “The integration of a priori knowledge into a go playing neural network,” 1996. [Online]. Available: http://www.markus-enzenberger.de/neurogo.html
- I. Sutskever and V. Nair, “Mimicking go experts with convolutional neural networks,” in Artificial Neural Networks-ICANN 2008. Springer, 2008, pp. 101–110.
- C. Clark and A. Storkey, “Teaching deep convolutional neural networks to play go,” arXiv preprint arXiv:1412.3409, 2014.
- U. Görtz. (2012) Kombilo — a Go database program (version 0.7). [Online]. Available: http://www.u-go.net/kombilo/
- F. de Groot. (2005) Moyo Go Studio. [Online]. Available: http://www.moyogo.com/
- A. Ghoneim, D. Essam, and H. Abbass, “Competency awareness in strategic decision making,” in Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), 2011 IEEE First International Multi-Disciplinary Conference on, feb. 2011, pp. 106 –109.
- D. Bump, G. Farneback, A. Bayer et al. (2009) GNU Go. [Online]. Available: http://www.gnu.org/software/gnugo/
- L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001.
- A. Dinerchtein. (2012) What is your playing style? [Online]. Available: http://style.baduk.com
- Sensei’s Library. (2013) Which pro do you most play like. [Online]. Available: http://senseis.xmp.net/?WhichProDoYouMostPlayLike
- J. Moudřík, “Meta-learning methods for analyzing go playing trends,” Master’s thesis, Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic, 2013. [Online]. Available: http://www.j2m.cz/~jm/master_thesis.pdf
- P. Baudiš et al. (2012) Pachi — Simple Go/Baduk/Weiqi Bot. [Online]. Available: http://repo.or.cz/w/pachi.git
- D. Stern, R. Herbrich, and T. Graepel, “Bayesian pattern ranking for move prediction in the game of go,” in ICML ’06: Proceedings of the 23rd international conference on Machine learning. New York, NY, USA: ACM, 2006, pp. 873–880.
- L. Breiman, “Stacked regressions,” Machine Learning, vol. 24, pp. 49–64, 1996. [Online]. Available: http://dx.doi.org/10.1007/BF00117832
- S. Haykin, Neural Networks: A Comprehensive Foundation (2nd Edition), 2nd ed. Prentice Hall, jul 1998. [Online]. Available: http://www.worldcat.org/isbn/0132733501
- M. Riedmiller and H. Braun, “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm,” in IEEE International Conference on Neural Networks, 1993, pp. 586–591.
- L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996. [Online]. Available: http://dx.doi.org/10.1023/A:1018054314350
- R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection.” Morgan Kaufmann, 1995, pp. 1137–1143.
- W. Shubert. (2013) KGS archives — kiseido go server. [Online]. Available: http://www.gokgs.com/archives.jsp
- A. Hollosi. (2006) SGF File Format. [Online]. Available: http://www.red-bean.com/sgf/
- J. Fairbairn. (winter 2011) Games of Go on Disk — GoGoD Encyclopaedia and Database, Go players’ styles. [Online]. Available: http://www.gogod.co.uk/
- Sensei’s Library. (2013) Professional players’ go styles. [Online]. Available: http://senseis.xmp.net/?ProfessionalPlayersGoStyles
- J. Moudřík and P. Baudiš, “Style consensus: Style of professional players, judged by strong players,” Tech. Rep., May 2013. [Online]. Available: http://gostyle.j2m.cz/FILES/style_consensus_27-05-2013.pdf
- J. Moudřík and P. Baudiš. (2013) GoStyle — Determine playing style in the game of Go. [Online]. Available: http://gostyle.j2m.cz/
- A. E. Elo, The rating of chessplayers, past and present. Arco, New York, 1978.
- Python Software Foundation. (2008, November) Python 2.7. [Online]. Available: http://www.python.org/dev/peps/pep-0373/
- J. Demšar et al., “Orange: Data mining toolbox in python,” Journal of Machine Learning Research, vol. 14, pp. 2349–2353, 2013. [Online]. Available: http://jmlr.org/papers/v14/demsar13a.html
- S. Nissen, “Implementation of a fast artificial neural network library (fann),” Department of Computer Science University of Copenhagen (DIKU), Tech. Rep., 2003, http://fann.sf.net.