Player Behavior and Optimal Team Composition in
Online Multiplayer Games
Abstract
We consider clustering player behavior and learning the optimal team composition for multiplayer online games. The goal is to determine a set of descriptive play style groupings and learn a predictor for win/loss outcomes. The predictor takes as input the play styles of the participants in each team; i.e., the various team compositions in a game. Our framework uses unsupervised learning to find behavior clusters, which are, in turn, used with classification algorithms to learn the outcome predictor. For our numerical experiments, we consider League of Legends, a popular team-based role-playing game developed by Riot Games. We observe that the learned clusters not only corroborate well with expert game knowledge, but also provide insights surprising to expert players. We also demonstrate that game outcomes can be predicted with fairly high accuracy given team composition-based features.
Keywords: team performance, team composition, player behavior, video games, multiplayer games, game prediction
I Introduction
Online virtual worlds are an increasingly significant venue for human interaction. By far the most active virtual worlds belong to a genre of video games called massively multiplayer online role-playing games (MMORPGs), where players interact with each other in a virtual world [TSF:13]. In an MMORPG, players assume the role of in-game characters and take control over most of their characters’ actions, often working in teams to accomplish a common objective, such as defeating opposing teams. Due to the shared, persistent nature of these virtual worlds, user behaviors and experiences are shaped by various social factors.
Besides profit-making, an understanding of these social dynamics would provide insight into human interactions in the real world and into the potential of virtual worlds for education, training, and scientific research [Bai:12, Dic:05]. Numerous prior studies in social sciences and management have investigated how team compositions can affect team performance [Spo:11, HA:08]. However, little is understood about player behavior in competitive MMORPGs, or about team performance and the factors contributing to it. To address this, we develop a machine learning framework that uses game histories to learn player behavior clusters and predict the outcome of games given prior knowledge about the game and its players.
The contributions of this paper are twofold. First, we present several approaches that group player behaviors in online games. Second, we develop predictors that determine how likely it is that a team of players will emerge victorious given the said team’s composition of players, all of whom may have different play styles. Specifically, we consider k-means and DP-means—an expectation-maximization algorithm [KJ:12]—for clustering play styles, and logistic regression (LR), Gaussian discriminant analysis (GDA), and support vector machines (SVMs) for determining win/loss outcomes. The rest of the paper is structured as follows. Section II describes the target game of our numerical experiments and our data collection method. Sections III and IV demonstrate several methods, and their effectiveness, for learning play style clusters and outcome predictors. Some concluding remarks are drawn and future work is mentioned in Section V.
II Target Game Description
We begin with a description of the MMORPG used for our numerical experiments and the data acquisition method.
II-A League of Legends
For this project we consider a popular MMORPG—the League of Legends (LoL). LoL is a multiplayer online battle arena video game developed and published by Riot Games, with 27 million daily players [Tas:14]. Furthermore, LoL is a representative title of its genre, with many similar counterparts such as Valve’s Dota 2 [Suz:09]—giving us a measure of generalizability to other games in its genre. In this game, a standard match consists of two opposing teams of five players. Each player assumes the role of one of over 120 different characters battling each other to destroy the opposing team’s “towers”—structures that fall after suffering enough attacks from characters. A game is won when all of either team’s towers are destroyed.
II-B Data set acquisition
The developer of LoL has made the game’s player statistics and match histories freely available through a web-based application programming interface (API) [Rio:14]. We randomly gathered over 100,000 instances of player statistics and over 10,000 instances of match histories from the 2013–2014 season. We then parsed and cleaned the raw game data to construct our training and testing sets, depending on the features we chose. Player statistics include performance indicators such as average damage dealt and number of wins. Match histories contain information such as participant ID numbers and character choices.
III Behavioral Clustering
The target game’s developers have grouped the 120 different in-game characters into six classes, such as assassin or support, which indicate the characters’ gameplay styles. While these classes reflect the developers’ design intent for the characters, they do not necessarily reveal the behavior of actual players in games. Using statistics from various players, we present our feature selection method and the gameplay styles learned by applying various clustering algorithms to our data set. We validate our results and the insights derived from them with expert analysis from ranked players.
III-A Feature selection
For our clustering algorithms, the features were 21 normalized player statistics, such as average damage dealt and money earned. The statistics were normalized over their ranges of values, preventing clusters from being formed merely due to order-of-magnitude differences between statistics. For instance, damage dealt values are often seven orders of magnitude greater than kill streaks, so small variations in damage dealt would erroneously be considered much more important than kill streaks if the statistics were taken directly as feature values.
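As an illustration, this per-statistic scaling can be sketched as follows (a minimal Python sketch; the paper's experiments used MATLAB, and the function name and toy values here are our own):

```python
# Min-max normalization of raw player statistics, a minimal sketch.
# Each row is one player; each column is one statistic (e.g., damage dealt).
def min_max_normalize(rows):
    """Scale every column to [0, 1] over its observed range."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # Guard against constant columns, which would otherwise divide by zero.
    span = [h - l if h > l else 1.0 for l, h in zip(lo, hi)]
    return [[(v - l) / s for v, l, s in zip(row, lo, span)] for row in rows]
```

After this step, a small change in damage dealt and a small change in kill streaks contribute comparably to the Euclidean distances used by the clustering algorithms.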
III-B Clustering models
III-B.1 k-means
Given a set of m observations, k-means clustering aims to partition them into k sets so as to minimize the within-cluster sum of squares; i.e., find the minimizer of the distortion function

J(c, \mu) = \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2    (1)

where x^{(i)} is an observation and \mu_{c^{(i)}} is the centroid of the cluster c^{(i)} to which it is assigned.
In general, this problem is computationally difficult (NP-hard). For our clustering, we employ Lloyd’s algorithm, a heuristic that randomly chooses k observations as the initial cluster centroids and then iteratively assigns observations to their closest centroids and updates each centroid with the mean of its cluster [Ng:14].
To select the number of clusters k, we run 10-fold cross validation over a range of k values to find a local optimizer. The scoring function for the cross validation is simply the average distortion given by (1) over the held-out sets.
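Lloyd's algorithm and the distortion score used for cross validation can be sketched as follows (an illustrative Python version, not the MATLAB implementation used in the experiments; all names are our own):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: random centroid initialization, then alternating
    assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: dist2(p, centroids[j]))
            clusters[j].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = [sum(c) / len(cl) for c in zip(*cl)]
    return centroids

def distortion(points, centroids):
    """Objective (1): sum of squared distances to the closest centroid."""
    return sum(min(dist2(p, c) for c in centroids) for p in points)
```

In the cross-validation loop, `distortion` evaluated on a held-out fold plays the role of the scoring function described above.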
III-B.2 DP-means
DP-means is a nonparametric expectation-maximization (EM) algorithm derived from a Dirichlet process (DP) mixture of Gaussians model; in other words, the user does not choose the number of clusters beforehand. As the technique is the topic of a series of papers, we provide only a brief description of the algorithm and refer the reader to [TKJ:13, KJ:12] for a thorough review of DP-means.
Recall that the standard mixture of Gaussians assumes that one chooses a cluster j with some probability \pi_j and then generates an observation from the Gaussian corresponding to that chosen cluster. In contrast, the DP mixture of Gaussians is a Bayesian extension of this model that arises by placing a Dirichlet prior on the mixing coefficients (i.e., the probabilities of choosing each cluster) for some initial set of coefficients (e.g., a uniform prior). As observations are made, the prior is updated and the mixture coefficients change to reflect this new knowledge.
The derivation of DP-means is inspired by the connection between k-means and EM with a finite mixture of Gaussians model. Namely, the k-means algorithm may be viewed as a limit of the EM algorithm when all of the covariance matrices corresponding to the clusters in a Gaussian mixture model are equal to \sigma I. As \sigma \to 0, the negative log-likelihood of the mixture of Gaussians model approaches the k-means clustering objective (1). Correspondingly, the EM steps approach the k-means steps in Lloyd’s algorithm.
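As a brief sketch of this limit (the responsibility notation \gamma_{ij} and mixing weights \pi_j are assumed here for illustration, not taken from [KJ:12]): with shared covariance \sigma I, the E-step responsibilities harden into nearest-centroid assignments as \sigma \to 0,

```latex
\gamma_{ij} \;=\;
\frac{\pi_j \exp\!\left(-\tfrac{1}{2\sigma}\lVert x^{(i)}-\mu_j\rVert^2\right)}
     {\sum_{l}\pi_l \exp\!\left(-\tfrac{1}{2\sigma}\lVert x^{(i)}-\mu_l\rVert^2\right)}
\;\longrightarrow\;
\begin{cases}
1, & j = \arg\min_{l}\,\lVert x^{(i)}-\mu_l\rVert^2,\\
0, & \text{otherwise,}
\end{cases}
\qquad \sigma \to 0,
```

so the soft E-step assignment becomes the hard nearest-centroid assignment of Lloyd's algorithm (assuming the minimizer is unique).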
In the case of DP-means, [KJ:12] shows how to perform a similar limiting argument. Specifically, suppose that the generative model for the EM algorithm is a DP mixture of Gaussians model with covariances equal to \sigma I. Letting \sigma \to 0 for the DP mixture model yields the objective function

\min_{k, \{\ell_j\}, \{\mu_j\}} \; \sum_{j=1}^{k} \sum_{x \in \ell_j} \lVert x - \mu_j \rVert^2 + \lambda k    (2)

where \{\ell_1, \dots, \ell_k\} is the set of clusters, x is an observation, and \mu_j is the centroid of cluster \ell_j. Note that, unlike in k-means, the number of clusters k is now a variable to be optimized over.
This leads to an algorithm with clustering assignments similar to the classical k-means algorithm and the same monotonic local convergence guarantees. (See Algorithm 1.) The difference is that a new cluster is formed whenever an observation is sufficiently far away from all existing cluster centroids, as measured against a user-defined threshold \lambda. Intuitively, \lambda acts as a penalty on the number of clusters, added on top of the original k-means distortion function.
We ran DP-means with 10-fold cross validation over a range of \lambda values, taking as the scoring function the average of the objective values from (2) over the held-out sets.
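A minimal sketch of the DP-means clustering loop (illustrative Python, not the MATLAB code used in the experiments; the squared-distance threshold test follows the description above, and all names are our own):

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dp_means(points, lam, iters=50):
    """DP-means: like Lloyd's algorithm, but a point whose squared distance
    to every existing centroid exceeds lam spawns a new cluster
    (penalty lam per cluster in objective (2))."""
    centroids = [list(points[0])]  # start with a single cluster
    for _ in range(iters):
        assign = []
        for p in points:
            d = [dist2(p, c) for c in centroids]
            j = min(range(len(centroids)), key=d.__getitem__)
            if d[j] > lam:                 # too far from all centroids:
                centroids.append(list(p))  # open a new cluster at p
                j = len(centroids) - 1
            assign.append(j)
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids
```

Larger values of `lam` tolerate more spread within a cluster and hence yield fewer clusters, which is exactly the trade-off tuned by the cross validation over \lambda.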
III-C Numerical results
Due to the random initializations, we ran 20 trials for each clustering algorithm in order to obtain the best locally optimal centroids. These optima correspond to 12 and 8 clusters for k-means and DP-means, respectively. All code was implemented in MATLAB and the computations were executed on a 2.7 GHz Intel Core i7 with 8 GB RAM. Figure 1 shows an example of the log of the distortion values attained over the range of k values for the k-means algorithm run with 10-fold cross validation. Table I summarizes the results for the clustering algorithms. The recorded computation times were averaged over the 20 trials and do not include preprocessing, transforming data into features, etc.
            cross val.   param. range   clusters   cpu time
k-means     10-fold                     12         154.1 s
DP-means    10-fold                     8          65.4 s
III-D Cluster interpretation
Surprisingly, our consultations with expert, highly ranked (top 0.2% worldwide) LoL players corroborated the correctness of the behavior clusters learned by our algorithms. By checking the centroid values corresponding to each feature and using information about the frequency of in-game characters used in each cluster, these expert players were able to map each cluster to a specific gameplay type that they had experienced in-game. This suggests that our clusters were intuitively correct. The mappings determined for the 12-cluster k-means result are as follows.

- Ranged physical attacker (Clusters 1, 7, and 9)
  - Players who maintain distance from fights while dealing high damage with long-range attacks
  - Players in each cluster differ in risk attitudes, such as whether they attack deeper in enemy territory
- Ambusher (Clusters 3, 8, 11, and 12)
  - Players who move stealthily around the battlefield and engage in quick, close-ranged combat
  - Some players prefer a team-oriented style, whereas others prefer a more “lone wolf” approach
  - Includes “hybrid” roles with other behavior clusters
- Team support (Cluster 5)
  - Players who typically assist ranged physical attackers (healing, cooperative attacks, etc.)
- Magic attacker (Clusters 6 and 10)
  - Players who rely on magic-based attacks, as opposed to the physical damage in the above clusters
  - Players differ in their preference for close or ranged combat
- Miscellaneous (Clusters 2 and 4)
  - No clear style preference
  - Players differ in skill: some are novices, while others prefer an all-around gameplay style
Interestingly, we notice from the expert analysis that there appears to be a hierarchy of clusters. For instance, clusters 6 and 10 fall under the broader “magic attacker” category. This suggests that we might consider clustering models other than k-means or DP-means, as these methods assign each observation to only one cluster. We address this further in Section V.
III-E Cluster visualization with PCA
Fig. 2 shows the result of applying principal component analysis (PCA) to reduce our feature dimension and visualize the data in three dimensions. Notice that the data is clearly clustered into 8 distinct groups, suggesting that in higher dimensions there are probably even more clusters. Overlaying our 12-cluster k-means result in color, we observe that the two are consistent: almost all points in any k-means cluster lie in the same PCA cluster.
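For illustration, the leading principal component can be computed by power iteration on the covariance matrix (a minimal Python sketch under that assumption; a full PCA as used for Fig. 2 would extract three components with standard routines):

```python
def pca_top_component(rows, iters=200):
    """Leading principal component via power iteration on the
    sample covariance matrix."""
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / len(rows)
            for j in range(d)] for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

Projecting the centered data onto the top three such components gives the low-dimensional view in which the groups of Fig. 2 appear.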
IV Game Outcome Prediction
We illustrate the accuracy of game outcome predictors that use team composition features based on the gameplay style clusters learned in the previous section. We present our feature selection, the classification models used to learn our predictors, and the resulting prediction accuracies.
IV-A Feature selection
For our classification algorithms, the features are the two teams’ player compositions. Associated with each team is a vector counting the players that fall into each play style category, where the categories are derived from one of the clustering algorithms. The feature vector is the concatenation of the count vectors of teams 1 and 2. The label for each sample is the win/loss indicator for the game, with 1 corresponding to a victory and 0 to a loss by team 1 against team 2. For instance, suppose there are 8 clusters, that teams 1 and 2 have count vectors c_1 and c_2, respectively, and that team 1 beats team 2. The feature vector–label pair would then be

(x, y) = ([c_1^T, c_2^T]^T, 1)

where x is the feature vector and y is the sample’s binary label.
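The feature construction just described can be sketched as follows (illustrative Python; the function names and the toy cluster indices are our own):

```python
def team_count_vector(play_styles, n_clusters):
    """Count how many of a team's players fall into each style cluster."""
    counts = [0] * n_clusters
    for s in play_styles:
        counts[s] += 1
    return counts

def make_sample(team1_styles, team2_styles, team1_won, n_clusters=8):
    """Feature vector = concatenated count vectors of teams 1 and 2;
    label = 1 iff team 1 won."""
    x = team_count_vector(team1_styles, n_clusters) + \
        team_count_vector(team2_styles, n_clusters)
    y = 1 if team1_won else 0
    return x, y
```

With 8 clusters and five players per team, every feature vector has length 16 and each half sums to 5.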
IV-B Classification models
To obtain the best win/loss outcome predictor, we consider different classification models. To determine the accuracy of the predictors learned using the various models, we held out 10% of our total sample set (over 130,000 samples in total) for testing and trained on the remaining samples.
IV-B.1 Logistic regression
For this model, we use the Bernoulli family of distributions to model the conditional distribution of winning or losing given the team composition features. That is, adhering to the notation introduced above, p(y = 1 \mid x; \theta) = h_\theta(x) = 1/(1 + e^{-\theta^T x}), where \theta is our model parameter and h_\theta is our hypothesis, which is derived by formulating the Bernoulli distribution as an exponential family distribution. To learn our model, we find a parameter \theta that maximizes the log-likelihood function

\ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta)    (3)
             = \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))    (4)

where m is the sample set size. We used stochastic gradient ascent to efficiently find the optimizer.
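A minimal sketch of learning this model by stochastic gradient ascent on the log-likelihood (illustrative Python; the learning rate and epoch count are arbitrary choices for the sketch, not the paper's settings):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_sgd(samples, dim, lr=0.1, epochs=100):
    """Stochastic gradient ascent on the log-likelihood (3)-(4):
    theta += lr * (y - h_theta(x)) * x, one sample at a time."""
    theta = [0.0] * (dim + 1)          # last entry is the intercept
    for _ in range(epochs):
        for x, y in samples:
            xb = list(x) + [1.0]       # append the intercept feature
            h = sigmoid(sum(t * v for t, v in zip(theta, xb)))
            theta = [t + lr * (y - h) * v for t, v in zip(theta, xb)]
    return theta

def predict(theta, x):
    """Estimated probability that team 1 wins."""
    xb = list(x) + [1.0]
    return sigmoid(sum(t * v for t, v in zip(theta, xb)))
```

The per-sample update `(y - h) * x` is exactly the gradient of one term of (4), which is what makes the stochastic version cheap on a large sample set.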
IV-B.2 Gaussian discriminant analysis
In this model, we assume that the input features are continuous-valued random variables and model p(x \mid y) using a multivariate normal distribution. In other words, we use a generative learning model. In our case,

y \sim \mathrm{Bernoulli}(\phi)    (5)
x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma)    (6)
x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)    (7)

where \mu_0, \mu_1, and \Sigma are the means and covariance of the Gaussian distributions. Here, we maximize the log-likelihood of the sample data

\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma)    (8)
The result of maximizing with respect to the model parameters is a set of exact analytic equations [Ng:14], which we compute directly. The derivation of these equations is simple, and we omit it for brevity.
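For illustration, these closed-form maximum-likelihood estimates can be computed directly; the sketch below handles a single feature with a shared variance (a one-dimensional simplification of the multivariate case, with names of our own choosing):

```python
def fit_gda_1d(samples):
    """Closed-form GDA estimates for one feature: class prior phi,
    class means mu0/mu1, and shared variance sigma2."""
    m = len(samples)
    phi = sum(y for _, y in samples) / m          # fraction of label-1 samples
    mu = {0: 0.0, 1: 0.0}
    n = {0: 0, 1: 0}
    for x, y in samples:
        mu[y] += x
        n[y] += 1
    mu0, mu1 = mu[0] / n[0], mu[1] / n[1]          # per-class sample means
    # Shared variance: average squared deviation from the class mean.
    sigma2 = sum((x - (mu1 if y else mu0)) ** 2 for x, y in samples) / m
    return phi, mu0, mu1, sigma2
```

In the multivariate case the same pattern holds, with \mu_0, \mu_1 as vector means and \Sigma as the pooled covariance matrix.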
IV-B.3 Support vector machine
Assuming that our data are separable with a large “gap,” a support vector machine model posits that the size of the geometric margin between an observation point and the decision boundary is proportional to the “confidence” that the observation is classified correctly. The result of this model is an optimization problem that seeks the maximum-margin separating hyperplane for our samples.
For our problem, we use regularization, since we are uncertain whether our data are linearly separable (e.g., due to outliers or erroneous data). The resulting problem is solved using the sequential minimal optimization (SMO) algorithm [SBS:98].
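As an illustrative stand-in for the SMO solver, a regularized linear SVM can also be trained by subgradient descent on the hinge loss (Python sketch; this is a simpler method than SMO, and the hyperparameters are arbitrary choices for the sketch):

```python
def train_linear_svm(samples, lam=0.01, lr=0.01, epochs=500):
    """Regularized hinge-loss minimization by subgradient descent.
    Labels must be +1 / -1."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # inside the margin: hinge subgradient step
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # outside the margin: only the regularizer acts
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b
```

The regularization weight `lam` plays the role of the slack trade-off: it lets the solution tolerate the outliers and erroneous data mentioned above instead of insisting on exact separation.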
IV-C Evaluation criteria
To evaluate the usefulness of game outcome predictor models with features based on our learned behavior clusters, we compare them against a baseline predictor with features based on the game developers’ official gameplay classes. As introduced in Section III, the game developers have grouped the in-game characters into six broad categories, such as assassin or support, which supposedly reflect the characters’ gameplay styles. We learn a logistic regression model with features constructed from these categories and use the 10% holdout method for cross validation.
IV-D Results and discussion
To ensure fairness, we ran 20 trials for each model, each with a different randomized split into training and test sets, to determine the predictor accuracies. As with our clustering algorithms, all code was implemented in MATLAB, the computations were executed on a 2.7 GHz Intel Core i7 with 8 GB RAM, and the computation times were averaged over the 20 random trials. Again, these times do not include preprocessing, transforming data into features, etc.
As we observe in Table II, the best predictor learned using our behavior cluster-based features uses an SVM model with features derived from our k-means clustering. This predictor did significantly better than the baseline algorithm, which achieved only 55.1% training and 54.4% testing accuracy, beating it by 16 percentage points on the test sets. The other predictors were also competitive, trailing the best by only a few percentage points.
        k-means                           DP-means
        train acc.  test acc.  cpu time   train acc.  test acc.  cpu time
LR      72.3%       68.8%      7.4 s      69.7%       67.1%      7.1 s
GDA     74.8%       70.1%      7.7 s      70.9%       68.4%      7.1 s
SVM     74.8%       70.4%      91.2 s     71.7%       69.2%      41.6 s
Beyond illustrating the relatively high accuracy of our team composition-based outcome prediction approach, this result also implies that the behavior clusters we learned have more descriptive power than the game developers’ official classes. This indirectly concurs with what we have shown with our clustering models: the official gameplay style categories used for the baseline algorithm do not necessarily correspond to the behaviors of actual players in games.
Overall, our results validate our framework of first clustering players by gameplay style and then using team composition features based on these learned styles to predict team performance. Since our target game is a representative title of its genre (i.e., team-based role-playing games), we expect this framework to be effective for, and generalizable to, other multiplayer games.
V Conclusion and Extensions
In this brief, we have presented an algorithmic framework for outcome prediction: by learning in-game player behavior categories through clustering and using them as features for game outcome predictors based on classification models, we are able to predict wins and losses with over 70% accuracy for our target game. This approach could be used to evaluate how team composition affects performance in games other than the one we have considered.
Future work will include adding time-dependent player statistics as features. Unlike the overall game statistics we used, these timed statistics might add a layer of descriptive power, allowing the model to differentiate between clusters based on how players behave early and late in a game. This might lead to better features and a more accurate team composition-based win/loss predictor. As another extension, we could consider different clustering models, such as one that captures the ostensibly hierarchical structure seen in the expert analysis of the k-means results. For instance, the BP-means model described in [TKJ:13] is designed to capture such hierarchical clustering relationships.
Acknowledgments
We thank Professor Andrew Ng and the course staff for motivating and giving feedback for our work. We are also grateful to the LoL expert players who helped with our cluster analysis.