Making a team ranking and the Glicko system


Why make a new team ranking?

Counter-Strike already has team rankings, most popularly the HLTV ranking. That seems to do a reasonable job, so what reason could there be for creating something new?

The primary reason is that with the right rating system you can do other things, such as predict results and run simulations. Ratings that can be used in this way add to the richness of the overall scene and provide a basis for new tools. Bookmakers and gamblers will also usually set initial odds using a system like this, so it gives punters some insight into how results are predicted.

Finally, it gives me targets for expanding the number of competitions I cover. If I see a team that is rated well but rarely appears in the detailed stats, I can look at including more of the tournaments they play in.

Method of choice

The first thing that usually jumps to everybody’s mind when it comes to ranking is Elo, the system used by the chess governing body FIDE to rate its players. This is an old system, and its main feature is that it changes a player’s rating based on how likely they were to win, so individuals gain more rating points by beating someone rated well above them.
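To make the "expectation to win" idea concrete, here is a minimal sketch of the standard Elo formulas. The K-factor of 32 is only an illustrative assumption, and the function names are made up for this example.

```python
def elo_expected(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return new (rating_a, rating_b) after one game.

    score_a is 1 for an A win, 0 for a loss, 0.5 for a draw.
    k=32 is an assumed K-factor, not a value taken from the article.
    """
    change = k * (score_a - elo_expected(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + change, rating_b - change


# An underdog win moves far more points than a favourite win.
underdog_after, _ = elo_update(1400, 1600, 1)
favourite_after, _ = elo_update(1600, 1400, 1)
print(round(underdog_after - 1400, 1), round(favourite_after - 1600, 1))
```

With these numbers the 1400-rated side gains roughly three times as many points for beating the 1600-rated side as the other way round.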

However, because it’s old, it has been improved upon in recent years. The main problem with Elo is that it doesn’t respond very quickly when whatever is being rated changes in performance, and it doesn’t deal well with inconsistent performance.

More recent rating systems address this, and I’m using one of them. Glicko is a system that includes a variable for uncertainty. As a team’s rating gets refined by the results it gets against its opposition, the uncertainty about that rating drops. If a team makes a change that affects its performance, the uncertainty about its new level increases until it finds its new place.

At the time of writing, for example, Astralis are using a series of subs because device is on sick leave. Their results are suffering, and as well as their rating dropping, the uncertainty about their rating is increasing. That helps the algorithm adjust to their new level more quickly.

Glicko is grounded in Bayesian probability, and you can read more about it here.
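As a rough illustration of how the uncertainty enters the calculation, the sketch below implements the update step of the original Glicko system (Glicko-1) as published by Mark Glickman. The article does not say which version of Glicko is used, so treat this as an assumption. In Glicko-1 the rating deviation (RD) shrinks as results come in and is inflated by time passing between rating periods; the later Glicko-2 adds a volatility term that reacts to erratic results. The value of `c` in `inflate_rd` is a tuning constant and the default here is purely hypothetical.

```python
import math

Q = math.log(10) / 400.0  # Glicko scaling constant


def g(rd):
    """Dampening factor: the less certain the opponent's rating, the less a result counts."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score against an opponent with rating r_opp and deviation rd_opp."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def inflate_rd(rd, periods_idle, c=35.0, max_rd=350.0):
    """Grow RD for time passed since the last update (c=35.0 is an assumed constant)."""
    return min(math.sqrt(rd**2 + c**2 * periods_idle), max_rd)


def glicko_update(r, rd, results):
    """Apply one rating period.

    results is a list of (opponent_rating, opponent_rd, score) tuples,
    where score is 1 for a win, 0.5 for a draw and 0 for a loss.
    """
    if not results:
        return r, rd
    d2_inv = 0.0  # 1 / d^2 in Glickman's notation
    delta = 0.0   # sum of g_j * (s_j - E_j)
    for r_opp, rd_opp, score in results:
        g_j = g(rd_opp)
        e_j = expected(r, r_opp, rd_opp)
        d2_inv += Q**2 * g_j**2 * e_j * (1.0 - e_j)
        delta += g_j * (score - e_j)
    denom = 1.0 / rd**2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)


new_r, new_rd = glicko_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(round(new_r), round(new_rd, 1))
```

The inputs in the last lines are the worked example from Glickman's own description of the system, where the player ends the period at a rating of about 1464 with an RD of about 151.4. Note that the RD falls even though two of the three games were lost: every result adds information.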

Practicalities of implementation

Ideally it would be possible to constantly watch for global results and update the ratings as they come in. In reality results have to be processed in batches, so it’s likely to be a weekly process of result collection, collation and processing to produce rating changes.

Considering how far down the overall ladder of competition I’m trying to go, we’re talking about a lot of competitions and results, and inevitably that includes teams that are loosely thrown together and don’t have a long life.

There are steps that can be taken to filter these teams out. A newly created team has a very large uncertainty score, so waiting for that score to converge to a fairly low number is a good practical test of whether a team has enough valid history to be rated.
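The weekly batching could look something like the sketch below, which groups match records by ISO calendar week so that each group can be fed to the rating update as one rating period. The record layout (a dict with a `date` key) and the placeholder team names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def batch_by_week(matches):
    """Group matches into weekly rating periods, oldest week first.

    Each match is a dict with at least a "date" key holding a datetime.date
    (a hypothetical layout chosen for this sketch).
    """
    batches = defaultdict(list)
    for match in matches:
        iso = match["date"].isocalendar()
        # Key on (ISO year, ISO week) so weeks sort correctly across new year.
        batches[(iso[0], iso[1])].append(match)
    return [batches[key] for key in sorted(batches)]


matches = [
    {"date": date(2019, 6, 10), "winner": "Team C", "loser": "Team A"},
    {"date": date(2019, 6, 3), "winner": "Team A", "loser": "Team B"},
    {"date": date(2019, 6, 9), "winner": "Team B", "loser": "Team C"},
]
for period in batch_by_week(matches):
    print(len(period), "matches in this rating period")
```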
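A minimal sketch of that filter, assuming ratings are stored as a mapping from team name to a `(rating, rd)` pair. The cut-off of 100 is an arbitrary illustrative value, not the threshold used for the published rankings.

```python
def established_teams(ratings, max_rd=100.0):
    """Return (team, rating, rd) rows for teams whose uncertainty has converged.

    ratings maps team name -> (rating, rd). max_rd=100.0 is an assumed cut-off.
    Rows are sorted from highest to lowest rating, ready to publish as a ranking.
    """
    rows = [(team, r, rd) for team, (r, rd) in ratings.items() if rd <= max_rd]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ratings = {
    "Long-running team": (1720.0, 62.0),
    "Thrown-together mix": (1810.0, 290.0),  # high rating, but barely any history
    "Steady regional side": (1540.0, 85.0),
}
for team, rating, rd in established_teams(ratings):
    print(team, rating, rd)
```

The mix team is left out despite having the highest rating, because its rating is still mostly a guess.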

One final wrinkle is that I’ve only taken a year’s worth of results, and over such a compressed period it’s hard to pick a single default rating that will adjust quickly to a team’s place in the world. I can’t start all the elite teams at the same level I’d start a Chinese league team and expect the intermediate competitions to pass all the rating points along in that time.

To remedy this, when a team first appeared in the data I gave it a default rating based on the level of competition it was in. That preloaded the ratings and let them adjust naturally from within roughly the right range. Even so, there are some problems that are impossible to adjust for reliably.
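The preloading step might be sketched as below. The tier names and starting ratings are hypothetical placeholders, since the article does not list the actual defaults. The starting RD of 350 is the conventional value for an unrated player in Glicko.

```python
# Hypothetical tiers and starting ratings, for illustration only.
TIER_DEFAULTS = {
    "elite": 1800.0,
    "international": 1650.0,
    "regional": 1500.0,
    "domestic": 1350.0,
}
INITIAL_RD = 350.0  # conventional Glicko starting deviation for a new entrant


def seed_team(ratings, team, tier):
    """Give a newly seen team a tier-based default; leave known teams untouched."""
    if team not in ratings:
        ratings[team] = (TIER_DEFAULTS[tier], INITIAL_RD)
    return ratings[team]


ratings = {}
print(seed_team(ratings, "New elite side", "elite"))
print(seed_team(ratings, "New domestic side", "domestic"))
```

Because every seeded team still starts with maximum uncertainty, the first few results move it a long way, so a wrong tier guess gets corrected fairly quickly.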

Problems with big lists

One of the problems with looking at a lot of results is regional clumping. By this I mean the tendency for teams within a region to play each other a lot, but not to play outside their area often enough for the region’s overall rating to reflect its place in the wider world.

For instance, for a small domestic scene to adjust, its top teams must play internationally enough times to lose points to better teams, and then beat their local rivals to pull the whole local scene down with them. Tyloo have played a few matches in the West this year, but that small exposure won’t have leaked enough of their domestic rating points to adjust their entire domestic scene.

Given this effect, once you get outside the international circuit you tend to get bubbles, where team rankings may not entirely make sense when compared across continents. There’s very little that can be done about this without getting into more complex procedures, such as running a regression on the games between teams from different regions and setting up a regional rating adjustment based on that. That’s something that might happen later.
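One simple way to see how isolated a region is, short of the regression mentioned above, is to measure what share of its matches are played against teams from elsewhere. This is only a diagnostic sketch, not a fix, and the data layout and region labels are assumptions.

```python
from collections import Counter


def cross_region_share(matches, team_region):
    """Fraction of each region's matches that involve a team from another region.

    matches is a list of (team_a, team_b) pairs and team_region maps each
    team to a region label (both hypothetical layouts for this sketch).
    """
    totals = Counter()
    cross = Counter()
    for team_a, team_b in matches:
        region_a, region_b = team_region[team_a], team_region[team_b]
        # A set so that an all-local match is counted once for its region.
        for region in {region_a, region_b}:
            totals[region] += 1
            if region_a != region_b:
                cross[region] += 1
    return {region: cross[region] / totals[region] for region in totals}


team_region = {"A": "europe", "B": "europe", "C": "asia", "D": "asia"}
matches = [("A", "B"), ("A", "C"), ("C", "D"), ("C", "D")]
print(cross_region_share(matches, team_region))
```

A region with a very low share has few matches through which rating points can flow in or out, so its internal ordering may be sound while its overall level is still close to wherever it was seeded.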

Priority and big games

One of the other drawbacks of this kind of system is that there’s no prioritisation of big games over small ones, so dropping the odd meaningless map in a group stage, or playing just well enough to qualify for a league playoff, probably suppresses a team’s rating more than they deserve.

The team that can find a higher level when needed will always be the bane of ranking systems. At the time of writing Virtus Pro are languishing in midfield anonymity according to their ranking, but when a major tournament rolls around you can never count them out, which gives our ranking system a problem.

It’s a fundamental flaw in any ranking system that works in this way. Even with something like the uncertainty in Glicko, you never really capture a team’s ability to raise their level unexpectedly. It’s something you just need to bear in mind when looking at the rankings: past performance is no guarantee of future results.

Future exploitation

Despite these problems there are some interesting possibilities to explore. With settled ratings I can build predictive models, provide odds and create simulations.

Before tournaments I’ll be looking to run Monte Carlo simulations based on the ratings and produce odds from them. Team comparison and prediction tools are also possible, and these could potentially guide betting and skin gambling.
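A minimal sketch of what such a simulation could look like for a single-elimination bracket. The win probability uses Glickman's expected-outcome formula, which combines both teams' rating deviations. The bracket format, the ratings and the team names are all made up for the example, and the sketch assumes the number of teams is a power of two.

```python
import math
import random
from collections import Counter

Q = math.log(10) / 400.0


def win_probability(r_a, rd_a, r_b, rd_b):
    """Chance that A beats B, dampened by the combined uncertainty of both ratings."""
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * combined_rd**2 / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (r_a - r_b) / 400.0))


def simulate_bracket(teams, ratings, rng):
    """Play one single-elimination bracket; teams are paired in list order."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            p_a = win_probability(*ratings[a], *ratings[b])
            next_round.append(a if rng.random() < p_a else b)
        alive = next_round
    return alive[0]


def title_odds(teams, ratings, runs=10000, seed=0):
    """Estimate each team's chance of winning by repeating the bracket many times."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = Counter(simulate_bracket(teams, ratings, rng) for _ in range(runs))
    return {team: wins[team] / runs for team in teams}


# Made-up ratings as (rating, rd) pairs.
ratings = {"A": (1800.0, 50.0), "B": (1500.0, 50.0), "C": (1500.0, 50.0), "D": (1500.0, 50.0)}
for team, p in title_odds(["A", "B", "C", "D"], ratings).items():
    # Fair decimal odds are 1 / probability, before any bookmaker margin.
    print(team, round(p, 3), round(1.0 / p, 2) if p else None)
```

Real tournaments would need the actual format (groups, best-of-three series, seeding) modelled in place of the simple bracket loop, but the structure of repeating the event many times and counting outcomes stays the same.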