An Elo-based rating system for TopCoder SRM
Abstract
We present an Elo-based rating system for programming contests. We justify a definition of performance using the logarithm of a player’s rank, and apply this definition to rating TopCoder SRM. We improve the accuracy, guided by experimental results, and compare the results with SRM ratings.
1 Introduction
SRM (single round matches) are regular programming contests organized by TopCoder [3] since 2001. TopCoder has developed its own rating system, which has been used throughout its 19-year history [2]. Various shortcomings of SRM ratings have been documented on TopCoder forums and elsewhere. Our purpose here is not to discuss these issues, but to provide a concrete proposal to remedy them. Players have sometimes asked: What would SRM ratings be if they were Elo-based? We would like to obtain a reasonable answer to this question.
There is no standard method for applying Elo ratings to rounds of more than two players. We could consider a ranking of players as the set of results between each pair of players. However, such results are not independent of each other: a ranked result is the product of a single performance by each player. Instead, we will consider the ranking as a tournament. From desirable properties, we will deduce a formula for performance in ranked games.
We then use this formula to specifically rate SRM. Our goal is to more accurately predict the players’ performances after each round.
2 Performance
2.1 Rank performance
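Let $\mathrm{perf}(r, n)$ be the performance of a player ranked $r$ in a round of $n$ players. We consider the ranking as an elimination tournament, and count the number of wins.
If the tournament has $2^k$ players, the winner must win $k$ one-on-one rounds, so we have:
(1) $\mathrm{perf}(1, 2^k) = k$
The player ranked $2r$ has one less win than rank $r$:
(2) $\mathrm{perf}(2r, n) = \mathrm{perf}(r, n) - 1$
Multiplying both $r$ and $n$ by any $m$ does not change the number of wins:
(3) $\mathrm{perf}(mr, mn) = \mathrm{perf}(r, n)$
With these constraints, we find the only solution is:
(4) $\mathrm{perf}(r, n) = \log_2(n / r)$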
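As a quick check, the sketch below assumes the solution is the base-2 logarithm of the number of players divided by the rank, which is what the three constraints stated in words imply. The function name `rank_performance` is ours, for illustration only.

```python
import math


def rank_performance(rank: float, players: float) -> float:
    """Number of wins credited to `rank` in an elimination tournament
    of `players` entrants (assumed form: log2(players / rank))."""
    return math.log2(players / rank)


# Constraint (1): the winner of a 2**k-player tournament wins k rounds.
assert rank_performance(1, 2 ** 5) == 5

# Constraint (2): doubling the rank costs exactly one win.
assert math.isclose(rank_performance(6, 100), rank_performance(3, 100) - 1)

# Constraint (3): scaling rank and field size together changes nothing.
assert math.isclose(rank_performance(3 * 7, 100 * 7), rank_performance(3, 100))
```

Fractional ranks are accepted on purpose, since tied and expected ranks are generally not integers.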
2.2 Expectations
We use the standard Elo formula for expectations [4]. A player with rating $R_a$ is expected to outperform a player with rating $R_b$ with probability:
(5) $E_{ab} = \dfrac{1}{1 + 10^{(R_b - R_a)/400}}$
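The standard Elo expectation can be sketched as follows. The function name `expected_score` is ours; the formula is the usual logistic curve with base 10 and a scale of 400 rating points.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that a player rated `rating_a` outperforms a player
    rated `rating_b`, using the standard Elo logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Two new players (both rated 1200) are expected to split evenly.
even = expected_score(1200, 1200)

# A 400-point advantage corresponds to 10:1 odds.
favorite = expected_score(1600, 1200)
```

The two expectations of any pair sum to 1, so the expectation of the weaker player is `1 - favorite`.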
As with SRM ratings, the rating of new players is initially 1200.
2.3 Ties
In programming contests like SRM, ties generally reflect a limitation of the problem set or scoring, rather than an unexpected performance of the tied players. We would like ties to not affect the ratings.
We experimented with several accounting methods. We find the most accurate is to split ties equally in both the actual and the expected ranks. Slightly less accurate is to compute expected ranks regardless of ties, then split the tied ranks as expected.
From here on, we split ties equally.
2.4 Relative performance
We have the results of a round, which may include multiple divisions and ties. We consider the results of each division separately. The result of a round is a list of scores $s_1, \dots, s_n$, where $s_i$ is the score obtained by player $i$.
We compute rank $r_i$ and expected rank $e_i$ for each player $i$:
(6) 
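The relative performance $p_i$ of player $i$ in the round, with rank $r_i$ and expected rank $e_i$ among $n$ players, is the difference between the actual and the expected rank performance:
(7) $p_i = \log_2(n / r_i) - \log_2(n / e_i)$
This can be written as:
(8) $p_i = \log_2(e_i / r_i)$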
The performance equals a number of wins above or below expectations in a tournament of appropriately matched players.
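The following is a minimal sketch of this section, not the exact method. It assumes higher scores are better, that tied players share the average of the ranks they occupy, and that the expected rank is 1 plus the sum of the Elo probabilities of being outperformed by each opponent. It does not reproduce the tie splitting in expected ranks described in section 2.3. All function names are ours.

```python
import math


def expected_score(rating_a, rating_b):
    """Standard Elo probability that A outperforms B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def actual_ranks(scores):
    """Rank 1 is best; players sharing a score split the tied ranks equally."""
    ranks = []
    for i, s in enumerate(scores):
        better = sum(1 for j, t in enumerate(scores) if j != i and t > s)
        tied = sum(1 for j, t in enumerate(scores) if j != i and t == s)
        ranks.append(1 + better + 0.5 * tied)
    return ranks


def expected_ranks(ratings):
    """1 plus the expected number of opponents finishing ahead
    (assumption: ties are ignored in the expectation)."""
    return [
        1 + sum(expected_score(rb, ra) for j, rb in enumerate(ratings) if j != i)
        for i, ra in enumerate(ratings)
    ]


def relative_performances(scores, ratings):
    """log2(expected rank / actual rank) for each player."""
    return [
        math.log2(e / r)
        for r, e in zip(actual_ranks(scores), expected_ranks(ratings))
    ]


# Four equally rated players; the two middle players are tied.
scores = [300.0, 250.0, 250.0, 0.0]
ratings = [1200.0, 1200.0, 1200.0, 1200.0]
perfs = relative_performances(scores, ratings)
```

In this example the actual ranks are 1, 2.5, 2.5 and 4, and every expected rank is 2.5, so the tied players have a relative performance of exactly zero, the winner a positive one and the last player a negative one.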
2.5 Properties
Rank performance is convex (Figure 1). The sum of performances of a set of ranks is maximal if the ranks are distinct, and minimal if the ranks are uncertain or tied. Having split ties equally in actual and expected ranks, the expected ranks are at least as tied as the actual ranks. This ensures that the sum of the relative performances $p_i$ of the players $i$ in a round is positive or zero. We have:
(9) $\sum_i p_i \ge 0$
The rank is expected in logarithm (Figure 2). A player’s performance may average in several ways:
(10) 
We will compute , preserving these properties.
2.6 Accuracy
We define the prediction error for expected and actual ranks :
(11) 
Our primary accuracy metric is the average error for all participants in all rated rounds.
3 Proposed rating system
We have the performances of each player in an SRM. We would like to compute rating changes which better predict future performances.
3.1 Initial factor
With a prior of , a performance is outperforming expectations by a factor . A rating difference is expecting a better performance by a factor . We can convert the performances in rating units:
(12) 
If we expected the same performances the next round, and had no other information, this would be a reasonable rating change $\Delta R$. Thus we consider $\Delta R$ as Elo-based, or ’Elo’.
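One reading of this conversion is sketched below. It assumes that a performance $p$ (a base-2 logarithm) corresponds to a factor $2^p$, and that a rating difference $\Delta R$ corresponds to the standard Elo odds factor $10^{\Delta R/400}$. Equating the two gives roughly 120.4 rating points per unit of performance. The names `RATING_PER_WIN` and `performance_to_rating` are ours.

```python
import math

# Assumption: 2**p == 10**(delta_r / 400), hence delta_r = 400 * log10(2) * p.
RATING_PER_WIN = 400.0 * math.log10(2.0)  # about 120.4


def performance_to_rating(performance: float) -> float:
    """Convert a performance (in wins, base-2 log units) to rating points."""
    return RATING_PER_WIN * performance


# One win above expectations, expressed in rating points.
one_win = performance_to_rating(1.0)

# Sanity check: that rating difference corresponds to 2:1 odds.
odds = 10.0 ** (one_win / 400.0)
```

Under these assumptions the conversion is linear, so half a win below expectations maps to about 60 rating points lost.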
3.2 Fixed K
Here we compute , with K minimizing the error. We find the most accurate choice is (Figure 3). As we add factors in , we automatically adjust to the most accurate.
3.3 Weight factor
A player’s prior rating should weigh in according to the player’s experience. Let be the weight, such that , and the round number for the player, starting with . We experimented with several possibilities, and find the best results with . Figure 4 shows choices of . Thus we choose .
(13) 
, the error is .7562.
3.4 Variance factor
A player’s performance has variance for various reasons, not necessarily predicting future performance. We compute the derivative of a player’s expected performance per change in rating:
(14) 
We write the expected rank 1 as the loss:win ratio relative to the player’s current rating:
(15) 
In units of performance, we have:
(16) 
Extrapolating linearly, we can solve with .

, so may be conservative.

when . No extrapolation is possible.

when . Here has bits of precision, hence more likely predicts future performance.
We compute in a direction accounting for and a multiplier :
(17) 
We find the most accurate (Figure 5). Thus we choose .
We define the variance factor . We now have:
(18) 
, the error is .7536.
3.5 Maximum factor
An unexpected performance predicts future performance less reliably than a consistent performance. The ratings gain accuracy if we limit the magnitude of to a maximum using a sigmoid. A sigmoid preserves symmetry, and exactly linearity of performance around . We define the adjusted performance . We find the best results with:
(19) 
We find the most accurate (Figure 6). Thus we choose . The rating change for each player is now:
(20) 
, the error is .7513.
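To illustrate the idea of limiting a performance with a sigmoid, the sketch below uses a scaled hyperbolic tangent. This is a stand-in chosen for illustration only: the sigmoid and the maximum actually used in this section are not reproduced here, and both `capped_performance` and the value `3.0` are hypothetical.

```python
import math


def capped_performance(performance: float, max_magnitude: float) -> float:
    """Squash `performance` so its magnitude never exceeds `max_magnitude`.

    Stand-in sigmoid: max_magnitude * tanh(performance / max_magnitude).
    It is odd (symmetric around zero) and has slope 1 at zero, so small
    performances pass through almost unchanged.
    """
    return max_magnitude * math.tanh(performance / max_magnitude)


# Hypothetical maximum of 3 wins above or below expectations.
small = capped_performance(0.1, 3.0)    # nearly unchanged
large = capped_performance(10.0, 3.0)   # close to, but not above, 3.0
```

Any odd, bounded function with unit slope at zero has the same two properties. The choice between such functions is an empirical one, guided by the prediction error.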
3.6 Natural inflation
We have computed which make the ratings more accurate after a round. Each player is expected a performance , thus has an expected . The ratings are stable in expectation.
Because performances have a positive sum, more rating is won by the outperforming players than is lost by the underperforming players. The ratings have net inflation. Because the players gain experience during a round, the players on average have better performances after the round. Thus inflation better predicts future performance than deflation. Because we minimized the prediction error, the average should approximately predict the next performances of participants relative to nonparticipants. We define this rate of inflation as natural inflation.
For comparison with SRM ratings, we consider natural inflation as stable. We refer to this Elo-based implementation as ’Elo’ in our results.
’Elo’:
3.7 Stability
We have , predicting the players’ relative performances. Now we would like to estimate the performances over time, such that players with stable performances have stable ratings. To maintain relative accuracy, we will not adjust our current parameters.
3.8 Performance bonus
As long as exactly, the expected rating change of any player is zero. However, the expected performance in a round is a better performance than not participating. Players having practiced already have better performances before the round. Thus in expectation predicts future performance less accurately than .
We adjust the expectation to expect inflation, as if the ratings increased. We choose a parameter , then add the difference in expected performance to each player’s performance:
(21) 
We find (Figure 7), and little accuracy can be gained from this parameter alone.
3.9 New players
So far we have a constant rating for new players. However, the performance of new players is not constant. As the performance of existing players improves, SRMs become more difficult. This raises the barrier to entry.
Before participating, potential players have opportunities to practice on recent rounds. Some may be experienced players coming from other platforms. Thus the performance of new players improves over time, and we adjust the initial rating for inflation.

We choose a parameter , the increase in per 100 rounds.

After each round, adjust by .

Simultaneously, adjust for accuracy.
We find most accurately (Figure 8).
Thus, our parameters estimating a stable performance are:
’Elo2’:
4 Results
We first compare our ’Elo’ implementation to SRM ratings. Table 1 shows the average computed ΔR, performances, and prediction error, using our definitions.

The first row is our primary metric.

The players by experience.

Existing players, in each division.

In each division, the top and bottom half ranks.
Players  ratings  ΔR Elo  ΔR SRM  perf Elo  perf SRM  err Elo  err SRM 
All  790127  19.5  20.8  0.224  0.298  0.7513  0.8301 
First round  77818  65.3  186.2  0.391  0.373  0.8019  1.0766 
2-7 rounds  202858  32.5  25.1  0.285  0.244  0.6649  0.7105 
8-24 rounds  217893  12.2  10.9  0.239  0.511  0.7454  0.8204 
25-74 rounds  195019  5.1  4.5  0.171  0.395  0.7671  0.8263 
75-199 rounds  84825  0.8  0.7  0.050  0.284  0.8701  0.9093 
200+ rounds  11714  0.5  2.0  0.043  0.241  0.8965  0.9347 
Existing  712309  14.5  2.7  0.206  0.372  0.7458  0.8032 
Division 1  388024  11.7  1.8  0.169  0.241  0.6930  0.7230 
Division 2  324285  17.8  3.7  0.251  0.528  0.8090  0.8992 
D1 H1  194004  34.1  52.1  0.636  0.796  0.9705  1.0406 
D1 H2  194020  10.7  55.8  0.298  0.314  0.4155  0.4054 
D2 H1  162141  58.7  55.3  0.881  1.348  1.1311  1.3831 
D2 H2  162144  23.2  62.7  0.379  0.291  0.4869  0.4153 
Because SRM ratings use a different definition of performance, we include results using independent metrics. Each round, we compute rank correlation statistics [5]:

Kendall’s $\tau$

Spearman’s $\rho$
For each metric, we compute the fraction of rounds where ’Elo’ better predicted the result than SRM ratings, splitting ties equally. Table 2 shows the percentages.
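The comparison can be sketched in plain Python as below. This is an illustration under stated assumptions, not the code used for the results: it uses Kendall's tau-a and the no-ties form of Spearman's rho, and the way tied ranks enter the statistics is not specified here. The function names are ours.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / pairs


def spearman_rho(predicted, actual):
    """Spearman's rho, using the closed form valid for rankings without ties."""
    n = len(predicted)
    d2 = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return 1 - 6 * d2 / (n * (n * n - 1))


def fraction_better(metric_a, metric_b):
    """Share of rounds where system A scored higher than system B,
    counting each tied round as half a win."""
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in zip(metric_a, metric_b)
    )
    return wins / len(metric_a)


# Per-round correlation values for two hypothetical rating systems.
share = fraction_better([0.9, 0.5, 0.2], [0.8, 0.5, 0.3])
```

For the prediction error, where lower is better, the same helper applies with the arguments swapped.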
Rounds  #  err  τ  ρ 
All  1950  87.4  87.1  89.2 
Division 1  1196  79.8  79.6  83.1 
Division 2  754  99.4  98.9  98.9 
2-16 players  151  48.3  48.7  54.0 
17-99 players  204  65.9  65.4  69.1 
100-199 players  337  86.5  85.9  91.5 
200-399 players  437  92.7  92.7  94.7 
400-599 players  310  95.5  95.5  96.1 
600-799 players  268  98.1  98.1  96.3 
800+ players  242  99.2  97.9  98.3 
Players  ratings  ΔR Elo2  ΔR Elo  perf Elo2  perf Elo  err Elo2  err Elo 
All  790127  25.7  19.5  0.217  0.224  0.7497  0.7513 
First round  77818  90.6  65.3  0.426  0.391  0.7997  0.8019 
2-7 rounds  202858  40.4  32.5  0.280  0.285  0.6649  0.6649 
8-24 rounds  217893  15.6  12.2  0.217  0.239  0.7442  0.7454 
25-74 rounds  195019  7.3  5.1  0.156  0.171  0.7650  0.7671 
75-199 rounds  84825  2.4  0.8  0.050  0.050  0.8662  0.8701 
200+ rounds  11714  0.6  0.5  0.033  0.043  0.8924  0.8965 
Existing  712309  18.6  14.5  0.194  0.206  0.7443  0.7458 
Division 1  388024  14.9  11.7  0.163  0.169  0.6915  0.6930 
Division 2  324285  22.9  17.8  0.232  0.251  0.8074  0.8090 
D1 H1  194004  37.0  34.1  0.622  0.636  0.9668  0.9705 
D1 H2  194020  7.2  10.7  0.297  0.298  0.4163  0.4155 
D2 H1  162141  63.2  58.7  0.852  0.881  1.1199  1.1311 
D2 H2  162144  17.3  23.2  0.388  0.379  0.4949  0.4869 
Rounds  #  err  τ  ρ 
All  1950  58.4  57.3  64.6 
Division 1  1196  59.5  58.3  63.8 
Division 2  754  56.6  55.7  65.8 
2-16 players  151  52.6  54.0  59.3 
17-99 players  204  56.6  54.4  60.3 
100-199 players  337  57.7  56.8  58.9 
200-399 players  437  62.2  61.1  65.7 
400-599 players  310  61.1  61.0  66.1 
600-799 players  268  59.0  57.8  69.8 
800+ players  242  53.3  50.4  69.8 
ΔR  R  

Rating  err  max  init  median  max  
SRM  0.8301  20.8  130  900  1200  1043  3923 
initial  0.7908  22.2  142  1202  1200  1301  4663 
K  0.7784  14.7  75  642  1200  1258  3955 
W  0.7562  18.5  102  1762  1200  1327  3734 
C  0.7536  19.6  103  2417  1200  1341  3684 
Elo  0.7513  19.5  102  1548  1200  1369  3709 
B  0.7511  21.5  104  1581  1200  1384  3801 
Elo2  0.7497  25.7  105  1591  1970  2095  4468 
5 Conclusion
Our ’Elo’ implementation generally better predicts the players’ relative performances than SRM ratings. The ranks are also better predicted, with predictions improving with the number of players. Our ’Elo2’ adjustments improve stability and slightly improve accuracy.
Our primary metric considers all the players’ performances in all SRM. The predictions are empirically accurate, on average, but not necessarily precise for any player or at any time.
We include source code and charts in the appendices. Other results are posted on our web site [1].
6 Acknowledgements
We would like to thank Ivan Kazmenko for reviewing this paper and for his helpful comments.
Appendix A
Appendix B
References
[1] http://tc.eloranked.com
[2] TopCoder: algorithm competition rating system. https://community.topcoder.com/tc?module=Static&d1=help&d2=ratings
[3] TopCoder. https://www.topcoder.com/about/
[4] Wikipedia: Elo rating system. https://en.wikipedia.org/wiki/Elo_rating_system
[5] Wikipedia: rank correlation. https://en.wikipedia.org/wiki/Rank_correlation