How the system works
Despite the immense wealth of detailed statistics collected and published by insight.gg, there is always an appetite for an easily digestible, quickly summarised Single Number Rating. These can take many forms, from the arcane and unknowable results of a complex regression to the straightforward and easily understood. Fortunately for this article we’re looking at a pretty simple rating system that should be accessible to pretty much anyone.
The idea is to use the round differences between teams’ map scores to build an overall rating of how well those teams performed, even when they don’t all play each other. The basic process is setting up a series of equations – one for each team in the tournament – which say that a team’s rating should be the average of its map round differences plus the average rating of all the other teams it played (counted once for every map, so potentially multiple times per team).
The wrinkle in this system is that as you run it for each team, the equations change as the teams’ ratings are updated, so you need to run through the series multiple times until you reach an equilibrium point where the ratings stop changing, or sometimes bounce between very small variations.
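The process above can be sketched in Python. This is an illustrative reimplementation from the description, not the RRS code itself; the tuple format, function names, and convergence tolerance are assumptions:

```python
def rate_teams(games, tolerance=1e-6, max_iters=1000):
    """games: list of (team_a, team_b, round_diff) tuples, where
    round_diff is team_a's rounds minus team_b's rounds on one map."""
    teams = {t for a, b, _ in games for t in (a, b)}
    ratings = {t: 0.0 for t in teams}
    for _ in range(max_iters):
        new = {}
        for team in teams:
            margins, opponents = [], []
            for a, b, diff in games:
                if a == team:
                    margins.append(diff); opponents.append(b)
                elif b == team:
                    margins.append(-diff); opponents.append(a)
            # rating = average margin + average opponent rating,
            # with opponents counted once per map played
            new[team] = (sum(margins) +
                         sum(ratings[o] for o in opponents)) / len(margins)
        # Ratings are only defined up to an additive constant, so
        # re-centre on zero to stop the iteration drifting.
        mean = sum(new.values()) / len(new)
        new = {t: r - mean for t, r in new.items()}
        if max(abs(new[t] - ratings[t]) for t in teams) < tolerance:
            return new
        ratings = new
    return ratings
```

Each pass recomputes every team’s rating from the previous pass’s values, and the loop stops once successive passes agree to within the tolerance – the equilibrium point described above. A rating of zero means an average team, and the unit stays in rounds per map.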
The main benefits of this type of system include:
- The numbers are easy to understand – the currency is in rounds, so you can easily compare how the teams stack up in terms of likely game result
- It has a predictive element that indicates which teams are stronger than others. The system fundamentally smooths out the noise in past results to make more general statements about a team’s quality. Once a tournament has been played the ratings should answer the question not just of who played the best, but who would win in another game tomorrow.
- It allows you to generate secondary information, such as the strength of the schedule a team had to play through to get its results. That in turn lets you draw conclusions about how the tournament was structured.
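For example, one natural schedule-strength measure is the average rating of the opponents a team actually faced. A minimal sketch, using hypothetical ratings and fixtures rather than real tournament data:

```python
def strength_of_schedule(team, games, ratings):
    """Average rating of the opponents a team faced, counted once per map.
    games: list of (team_a, team_b, round_diff) tuples."""
    opponents = [b if a == team else a
                 for a, b, _ in games if team in (a, b)]
    return sum(ratings[o] for o in opponents) / len(opponents)

# Hypothetical example: ratings already computed by the iterative process.
ratings = {"A": 4.0, "B": 0.0, "C": -4.0}
games = [("A", "B", 4), ("B", "C", 4), ("A", "C", 8)]
```

In this toy bracket the top team’s schedule strength comes out negative (it only faced weaker sides), while the bottom team’s comes out positive – exactly the kind of structural observation described above.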
And like any rating system it has some drawbacks:
- It’s easily impressed by massive victories. Because it works on round differences, running up a score against a weak team can give the algorithm a biased assessment of a team’s true strength, when another equally strong or even stronger team might have been less focused on producing a big win.
- All games are given the same weight: meaningless exhibitions are given the same importance as grand finals, and I think most teams and fans would prefer their side to win the latter. This is potentially a pretty big problem, as teams can frequently play below their best in group stages only to turn it on in the knockout rounds. The final results of a tournament usually resolve in the right way, but it can be a little misleading in the middle of a tournament. Has a team really made a big surge in form, or are they just going to revert to their usual level when the bigger dogs decide to turn up?
- All rounds are considered equal, not wins and losses. If a team wins every map 16-14 and takes the tournament, the algorithm is still likely to rank a team with a few blowout victories as clearly better. We know how important it is for CS players to perform in the clutch, and many games can come down to the wire.
- It has no contextual information: roster changes, VAC bans, Twitter vendettas – the algorithm has no knowledge of any of these factors.
If you have any interest in rating systems in other sports you’ve probably come across a rough approximation of the above description before. This kind of points ranking system is as old as the hills, and nobody really owns it. However there are a few well known tweaks that people use, and I have slightly modified the system to suit CS a little better.
As outlined above, blowout victories can heavily distort the algorithm, and when I ran the first version of it I frequently got results that were plainly suffering from this effect. Using the insight.gg win percentage model I found a cutoff point for wins which acts as a brake on blowout victories. The results were immediately better, and I’m confident they still allow us to talk about the RRS (round rating system) scores in terms of round difference. An alternative solution, such as applying a logarithm to the score differences to cut down the big margins, wouldn’t let us keep talking about that basic unit.
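A minimal sketch of that brake, assuming it amounts to clamping each map’s round difference; the actual cutoff derived from the insight.gg win percentage model isn’t specified in the text, so the value below is a hypothetical placeholder:

```python
# CAP is a hypothetical placeholder; the real cutoff comes from the
# insight.gg win percentage model and isn't given in the article.
CAP = 9

def capped_margin(round_diff, cap=CAP):
    """Clamp a map's round difference so blowouts can't dominate the ratings."""
    return max(-cap, min(cap, round_diff))
```

Under this scheme a 16-2 stomp counts the same as a more routine comfortable win, while ordinary margins pass through unchanged, so the ratings stay denominated in rounds.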
Another side effect of this is that the RRS becomes more descriptive of what has gone before, and less predictive of what will happen in the future. I feel like this is a reasonable tradeoff to maintain a clearer link with reality.
Just in the interests of full disclosure, the exact implementation I use to get to the final result is actually different to that outlined above, because computers provide different ways of solving the same problems, but the final results should be very much the same.
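One such alternative route to the same answer is to treat the rating equations as a single simultaneous linear system and solve it in one shot. The sketch below illustrates that idea under my own assumptions – it is not the author’s actual implementation: it builds the same per-team equations, pins the ratings to sum to zero (since they are only determined up to a constant), and solves by Gaussian elimination.

```python
def rate_teams_direct(games):
    """Solve the rating equations directly: for each team i,
    maps_i * R_i - sum(opponent ratings) = total round margin."""
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for ta, tb, diff in games:
        i, j = idx[ta], idx[tb]
        A[i][i] += 1.0; A[j][j] += 1.0   # one map played by each side
        A[i][j] -= 1.0; A[j][i] -= 1.0   # minus the opponent's rating
        b[i] += diff;   b[j] -= diff     # margin from each side's view
    # The equations are redundant (ratings are only fixed up to a
    # constant), so replace the last one with "all ratings sum to zero".
    A[-1] = [1.0] * n
    b[-1] = 0.0
    # Plain Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return dict(zip(teams, x))
```

Given a connected schedule, this produces the same equilibrium the iterative version settles on, just without the repeated passes.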