
# An Elo-based rating system for TopCoder SRM

## Abstract

We present an Elo-based rating system for programming contests. We justify a definition of performance using the logarithm of a player’s rank. We apply the definition to rating TopCoder SRM. We improve the accuracy, guided by experimental results. We compare results with SRM ratings.


## 1 Introduction

SRMs (Single Round Matches) are regular programming contests organized by TopCoder [3] since 2001. TopCoder has developed its own rating system, which has been used throughout its 19-year history [2]. Various shortcomings of SRM ratings have been documented on TopCoder forums and elsewhere. Our purpose here is not to discuss these issues, but to provide a concrete proposal to remedy them. Players have sometimes asked: what would SRM ratings be if they were Elo-based? We would like to obtain a reasonable answer to this question.

There is no standard method for applying Elo ratings to rounds of more than two players. We could consider a ranking of players as the set of results between each pair of players. However, such results are not independent of each other: a ranked result is the product of a single performance by each player. Instead, we will consider the ranking as a tournament. From desirable properties, we will deduce a formula for performance in ranked games.

We then use this formula to specifically rate SRM. Our goal is to more accurately predict the players’ performances after each round.

## 2 Performance

### 2.1 Rank performance

Let $RP(n, r)$ be the performance of a player ranked $r$ in a round of $n$ players. We consider the ranking as an elimination tournament, and count the number of wins.

If the tournament has $2^n$ players, the winner must win $n$ rounds, so we have:

$$RP(2^n, 1) = n \qquad (1)$$

The player ranked $2$ has one less win than rank $1$:

$$RP(n, 2) = RP(n, 1) - 1 \qquad (2)$$

Multiplying both $n$ and $r$ by any $k$ does not change the number of wins:

$$RP(n, r) = RP(kn, kr) \qquad (3)$$

With these constraints, we find the only solution is:

$$RP(n, r) = \log_2 n - \log_2 r \qquad (4)$$
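As a sanity check, the closed form of eq. (4) satisfies all three constraints. A minimal sketch (the helper name `RP` is ours, mirroring the paper's notation):

```cpp
#include <cassert>
#include <cmath>

// Rank performance (eq. 4): the number of wins of the player
// ranked r in an elimination tournament of n players.
double RP(double n, double r) {
    return std::log2(n) - std::log2(r);
}
```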

### 2.2 Expectations

We use the standard Elo formula for expectations [4]. A player with rating $R_i$ is expected to outperform a player with rating $R_j$ with probability:

$$WP(R_i, R_j) = \frac{1}{1 + 10^{(R_j - R_i)/400}} \qquad (5)$$

As with SRM ratings, the rating of new players is initially 1200.
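Eq. (5) can be sketched directly; this mirrors `winP` in Appendix A, written here with base-10 exponentiation (the helper name `WP` follows the paper's notation):

```cpp
#include <cassert>
#include <cmath>

// Expected probability that a player rated ri outperforms
// a player rated rj (eq. 5).
double WP(double ri, double rj) {
    return 1.0 / (1.0 + std::pow(10.0, (rj - ri) / 400.0));
}
```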

### 2.3 Ties

In programming contests like SRM, ties generally reflect a limitation of the problem set or scoring, rather than an unexpected performance of the tied players. We would like ties to not affect the ratings.

We experimented with several accounting methods. We find the most accurate is to split the ties equally between the tied players, in both the actual and expected ranks. Slightly less accurate is to compute expected ranks regardless of ties, then split the actual tied ranks as expected.

We now split the ties equally.

### 2.4 Relative performance

We have the results of a round, which may include multiple divisions and ties. We consider the results of each division separately. The result of a round is a list of scores $S = \{s_1, \dots, s_n\}$, where $s_i$ is the score obtained by player $i$.

We compute the rank $r$ and expected rank $\hat{r}$ for each player $i$:

$$r_{ties} = \tfrac{1}{2}\,|\{s_j \in S : j \neq i,\ s_j = s_i\}|$$
$$r = 1 + |\{s \in S : s > s_i\}| + r_{ties}$$
$$\hat{r} = 1 + \sum_{j\,:\,s_j \neq s_i} WP(R_j, R_i) + r_{ties} \qquad (6)$$

The relative performance of a player in the round is the difference of actual and expected rank performance:

$$P = RP(n, r) - RP(n, \hat{r}) \qquad (7)$$

This can be written as:

$$P(\hat{r}, r) = \log_2 \hat{r} - \log_2 r \qquad (8)$$

The performance equals a number of wins above or below expectations in a tournament of appropriately matched players.
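Eqs. (6)-(8) can be sketched together. A minimal example, assuming higher scores rank better and splitting ties equally as in §2.3 (the helper names `WP` and `performance` are ours):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Win probability (eq. 5).
double WP(double ri, double rj) {
    return 1.0 / (1.0 + std::pow(10.0, (rj - ri) / 400.0));
}

// Relative performance (eqs. 6-8) of player i, given the
// ratings R and scores S of all players in the division.
double performance(const std::vector<double>& R, const std::vector<double>& S, int i) {
    double erank = 1, arank = 1;       // expected and actual rank
    for (int j = 0; j < (int)R.size(); j++) {
        if (j == i) continue;
        if (S[j] == S[i]) {            // ties split equally in each
            erank += 0.5;
            arank += 0.5;
        } else {
            erank += WP(R[j], R[i]);   // probability j outperforms i
            arank += S[j] > S[i];
        }
    }
    return std::log2(erank) - std::log2(arank);  // eq. (8)
}
```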

### 2.5 Properties

Rank performance is convex (Figure 1). The sum of performances of a set of ranks is maximal if the ranks are distinct, and minimal if the ranks are uncertain or tied. Having split ties equally in actual and expected ranks, the expected ranks are at least as tied as the actual ranks. This ensures the sum of performances in a round is positive or zero. We have:

$$\sum P \geq 0 \qquad (9)$$

The rank is expected in logarithm (Figure 2). A player's performance may be averaged in several ways:

$$P(a, b) + P(b, c) + P(c, a) = 0$$
$$P(r, r x^k) = k \cdot P(r, r x)$$
$$P(\hat{r}, r) = P(k \hat{r}, k r) \qquad (10)$$

We will compute $\Delta R$, preserving these properties.

### 2.6 Accuracy

We define the prediction error for expected rank $\hat{r}$ and actual rank $r$:

$$E(\hat{r}, r) = |P| = |\log_2 \hat{r} - \log_2 r| \qquad (11)$$

Our primary accuracy metric is the average error for all participants in all rated rounds.

## 3 Proposed rating system

We have the performances of each player in an SRM. We would like to compute rating changes which better predict future performances.

### 3.1 Initial factor

With a prior of $\hat{r}$, a performance $P$ is outperforming expectations by a factor $2^P$. A rating difference $\Delta R$ is expecting a better performance by a factor $10^{\Delta R / 400}$. We can convert the performances into rating units:

$$\Delta R = \frac{400}{\log_2 10}\, P = K_0 \cdot P, \qquad K_0 \approx 120 \qquad (12)$$

If we expected the same performances the next round, and had no other information, this would be a reasonable $\Delta R$. Thus we consider $\Delta R = K_0 \cdot P$ as Elo-based, or ’Elo’.
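The constant $K_0$ of eq. (12) can be checked numerically; a minimal sketch (the helper name `K0` matches the appendix constant):

```cpp
#include <cassert>
#include <cmath>

// K0 converts a performance in wins (base 2) into Elo rating units
// (base 10, scale 400): 2^P = 10^(dR/400)  =>  dR = 400*log10(2)*P.
double K0() {
    return 400.0 * std::log10(2.0);
}
```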

### 3.2 Fixed K

Here we compute $\Delta R = K \cdot P$, with $K$ minimizing the error. We find the most accurate choice experimentally (Figure 3). As we add factors to $\Delta R$, we automatically re-adjust $K$ to its most accurate value.

### 3.3 Weight factor

A player's prior rating should be weighted according to the player's experience. Let $W$ be the weight, such that $\Delta R = K \cdot P / W$, and $n$ the round number for the player, starting with $n = 1$. We experimented with several possibilities; Figure 4 shows choices of $W$. We find the best results with $W = \sqrt{n}$.

$$\Delta R = \frac{K \cdot P}{W} \qquad (13)$$

The error is now .7562.
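The damped update of eq. (13) can be sketched as follows, assuming $W = \sqrt{n}$ (the helper name `deltaR` is ours):

```cpp
#include <cassert>
#include <cmath>

// Rating change damped by experience (eq. 13): n is the player's
// round number starting at 1, so new players move the fastest.
double deltaR(double K, double P, int n) {
    return K * P / std::sqrt((double)n);
}
```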

### 3.4 Variance factor

A player’s performance has variance for various reasons, not necessarily predicting future performance. We compute the derivative of a player’s expected performance per change in rating:

$$P = -\log_2 \hat{r}, \qquad \frac{dP}{dR} = -\frac{1}{\ln 2 \cdot \hat{r}} \cdot \frac{d\hat{r}}{dR}, \qquad \hat{r} = 1 + \sum_{j \neq i} w_j, \qquad w_j = WP(R_j, R) \qquad (14)$$

We write the leading $1$ of the expected rank as a loss:win ratio relative to the player's current rating:

$$\hat{r}(dR) = 10^{-dR/400} + \sum_j \frac{1}{1 + 10^{(R + dR - R_j)/400}}$$
$$\frac{d\hat{r}}{dR} = -\frac{\ln 10}{400}\left(1 + \sum_j w_j(1 - w_j)\right)$$
$$\frac{dP}{dR} = \frac{1}{K_0} \cdot \frac{1 + \sum_j w_j(1 - w_j)}{1 + \sum_j w_j} \qquad (15)$$

In units of performance, we have:

$$P' = \frac{1 + \sum_j w_j(1 - w_j)}{1 + \sum_j w_j} \qquad (16)$$

Extrapolating linearly, we can solve with $\Delta R = K_0 \cdot P / P'$.

• $P' \leq 1$, so $K_0$ may be conservative.

• $P' = 1$ when $\hat{r} = 1$. No extrapolation is possible.

• $P' \to \tfrac{1}{2}$ when all players are evenly matched. Here the rank carries the most bits of precision, hence more likely predicts future performance.
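The ratio $P'$ of eq. (16) can be computed directly; this mirrors the `var / mu` computation in Appendix A (the helper names are ours):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Win probability of a player rated ri against rj (eq. 5).
double WP(double ri, double rj) {
    return 1.0 / (1.0 + std::pow(10.0, (rj - ri) / 400.0));
}

// P' (eq. 16): ratio of the variance of the expected rank
// to the expected rank itself, for a player rated R.
double Pprime(double R, const std::vector<double>& opponents) {
    double mu = 1, var = 1;
    for (double Rj : opponents) {
        double w = WP(Rj, R);   // probability opponent outperforms
        mu += w;
        var += w * (1 - w);
    }
    return var / mu;
}
```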

We compute $\Delta R$ in this direction, accounting for $P'$ with a multiplier $C$:

$$\Delta R \propto \frac{P}{1 + C \cdot P'} \qquad (17)$$

We find the most accurate $C \approx 4$ (Figure 5). Thus we choose $C = 4$.

We define the variance factor $V = \frac{1}{1 + C \cdot P'}$. We now have:

$$\Delta R = \frac{K \cdot V \cdot P}{W} \qquad (18)$$

The error is now .7536.

### 3.5 Maximum factor

An unexpected performance predicts future performance less reliably than a consistent performance. The ratings gain accuracy if we limit the magnitude of $P$ to a maximum $M$ using a sigmoid. A sigmoid preserves symmetry, and linearity of performance around $0$. We define the adjusted performance $P_A$. We find the best results with:

$$P_A = \frac{P}{1 + |P|/M} \qquad (19)$$

We find the most accurate $M \approx 6.75$ (Figure 6). Thus we choose $M = 6.75$. The rating change for each player is now:

$$\Delta R = \frac{K \cdot V \cdot P_A}{W} \qquad (20)$$

The error is now .7513.
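The sigmoid of eq. (19) can be sketched as follows; this mirrors the `pa` computation in Appendix A (the helper name `adjusted` is ours):

```cpp
#include <cassert>
#include <cmath>

// Adjusted performance (eq. 19): a sigmoid that caps |P| at M
// while staying close to linear around P = 0.
double adjusted(double P, double M = 6.75) {
    return P / (1.0 + std::fabs(P) / M);
}
```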

### 3.6 Natural inflation

We have computed $\Delta R$ which makes the ratings more accurate after a round. Each player is expected a performance $P = 0$, thus has an expected $\Delta R = 0$. The ratings are stable in expectation.

Because performances have a positive sum, more rating is won by the outperforming players than is lost by the underperforming players. The ratings have net inflation. Because the players gain experience during a round, the players on average have better performances after the round. Thus inflation better predicts future performance than deflation. Because we minimized the prediction error, the average $\Delta R$ should approximately predict the next performances of participants relative to non-participants. We define this rate of inflation as natural inflation.

For comparison with SRM ratings, we consider natural inflation as stable. We refer to this Elo-based implementation as ’Elo’ in our results.

’Elo’: $K = 600$, $W = \sqrt{n}$, $C = 4$, $M = 6.75$.

### 3.7 Stability

We have $\Delta R$, predicting the players' relative performances. Now we would like to estimate the performances over time, such that players with stable performances have stable ratings. To maintain relative accuracy, we will not adjust our current parameters.

### 3.8 Performance bonus

As long as $\Delta R$ is computed exactly as above, the expected rating change of any player is zero. However, the expected performance in a round is a better performance than not participating. Players having practiced already have better performances before the round. Thus $\Delta R = 0$ in expectation predicts future performance less accurately than a small expected gain.

We adjust the expectation to expect inflation, as if the ratings increased. We choose a parameter $B$, then add the difference in expected performance to each player's performance:

$$\Delta P = \frac{B}{K_0} \cdot P' \qquad (21)$$

We find $B \approx 27$ (Figure 7), and little accuracy can be gained from this parameter alone.

### 3.9 New players

So far we have a constant rating for new players. However, the performance of new players is not constant. As the performance of existing players improves, SRMs become more difficult. This raises the barrier to entry.

Before participating, potential players have opportunities to practice on recent rounds. Some may be experienced players coming from other platforms. Thus the performance of new players improves over time, and we adjust the initial rating for inflation.

• We choose a parameter $N$, the increase in the initial rating $R_1$ per 100 rounds.

• After each round, adjust $R_1$ by $N/100$.

We find $N \approx 63$ most accurate (Figure 8).

Thus, our parameters estimating a stable performance are:

’Elo2’: $B = 27$, $N = 63$.

## 4 Results

We first compare our ’Elo’ implementation to SRM ratings. Table 1 shows the average computed $\Delta R$, performances, and prediction error, using our definitions.

• The first row is our primary metric.

• The players by experience.

• Existing players, in each division.

• In each division, the top and bottom half ranks.

Because SRM ratings use a different definition of performance, we include results using independent metrics. Each round, we compute rank correlation statistics [5]:

• Kendall's $\tau$

• Spearman's $\rho$

For each metric, we compute the fraction of rounds where ’Elo’ better predicted the result than SRM ratings, splitting ties equally. Table 2 shows the percentages.

Tables 3, 4 and 5 compare our ’Elo2’ and ’Elo’ implementations.

## 5 Conclusion

Our ’Elo’ implementation generally better predicts the players’ relative performances than SRM ratings. The ranks are also better predicted, with predictions improving with the number of players. Our ’Elo2’ adjustments improve stability and slightly improve accuracy.

Our primary metric considers all the players' performances in all SRMs. The predictions are empirically accurate, on average, but not necessarily precise for any player or at any time.

We include source code and charts in the appendix. Other results are posted on our web site [1].

## 6 Acknowledgements

We would like to thank Ivan Kazmenko for reviewing this paper and helpful comments.

## Appendix A

```cpp
//
// tc.eloranked.com/EloSRM.h
// (c) 2019-2020 Batty, Kamenetsky
//
#include <math.h>
#include <vector>
#include <algorithm>

namespace EloSRM
{
const double EloScale = M_LN10 / 400;
const double K0 = 400 * M_LN2 / M_LN10;
const double R0 = 1200;         // initial rating
double R1 = R0;                 // adjusted for inflation
const double N = .63;           // per round
const double K = 600;           // initial K
const double C = 4;             // coeff P'
const double M = 6.75;          // max perf
const double B = 27 / K0;       // perf bonus

struct Player {
    int numRatings = 0;
    double rating = R1;
};

struct Result {
    Player* player;
    int div;
    double score;
    double deltaR;
};

inline double winP(double ri, double rj) {
    double s = (rj - ri) * EloScale;
    return 1 / (1 + exp(s));
}

void rateDivision(Result* results, int n) {
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        Player* pi = results[i].player;
        double si = results[i].score;
        double ri = pi->rating;
        double erank = 1, arank = 1;
        double mu = 1, var = 1;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double sj = results[j].score;
            double rj = results[j].player->rating;
            double wp = winP(rj, ri);
            mu += wp;
            var += wp * (1 - wp);
            if (si == sj) {
                erank += .5;
                arank += .5;
            }
            else {
                erank += wp;
                arank += si < sj;
            }
        }
        double err = log(erank / arank) * M_LOG2E;
        double err1 = var / mu;
        double perf = err + B * err1;
        double pa = perf * M / (M + fabs(perf));  // fabs: abs would truncate to int
        double wf = sqrt(1. + pi->numRatings);
        double vf = 1 + C * err1;
        double w = wf * vf;
        results[i].deltaR = K * pa / w;
    }
}

void rateRound(std::vector<Result>& results) {
    int n = (int)results.size();
    std::sort(results.begin(), results.end(), [](auto& r0, auto& r1) { return r0.div < r1.div; });
    for (int i = 0, j = 1; j <= n; j++) {
        if (j == n || results[j].div != results[i].div) {
            rateDivision(&results[i], j - i);
            i = j;
        }
    }
    for (int i = 0; i < n; i++) {
        Player* pi = results[i].player;
        pi->numRatings++;
        pi->rating += results[i].deltaR;
    }
    R1 += N;
}
}
```

## Appendix B

### References

1. Results web site. http://tc.eloranked.com (cited in §5).
2. TopCoder: algorithm competition rating system. https://community.topcoder.com/tc?module=Static&d1=help&d2=ratings (cited in §1).
3. TopCoder. https://www.topcoder.com/about/ (cited in §1).
4. Wikipedia: Elo rating system. https://en.wikipedia.org/wiki/Elo_rating_system (cited in §2.2).
5. Wikipedia: rank correlation. https://en.wikipedia.org/wiki/Rank_correlation (cited in §4).