Round Rating System

How the system works

Despite the immense wealth of detailed statistics collected and published by insight.gg, there is always an appetite for the easily digestible and quickly summarised Single Number Rating. These can take many forms, from the arcane and unknowable results of a complex regression to the easily understandable and straightforward. Fortunately for this article we’re looking at a pretty simple rating system that should be easily understood by pretty much anyone.

The idea is to use the round differences in teams' map scores to build an overall rating of how well those teams performed, even when they don't all play each other. The basic process is to set up a series of equations – one for each team in the tournament – stating that a team's rating should be the average of its map score margins plus the average rating of all the opponents it played (counted once for every map, so potentially multiple times per team).

The wrinkle in this system is that as you run it for each team, the equations change as the teams' ratings are updated, so you need to run through the series multiple times until you reach an equilibrium point where the ratings stop changing, or sometimes bounce between very small variations.
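
To make the iteration concrete, here is a minimal sketch of that process in Python. The data layout, example team names, convergence tolerance and zero-centring step are my own assumptions for illustration, not the exact implementation used on insight.gg.

```python
# Minimal sketch of the iterative rating process described above.
# Each result is one map: (team_a, team_b, rounds_a, rounds_b).
# Team names, tolerance and the zero-centring step are illustrative assumptions.

def round_ratings(results, iterations=1000, tolerance=1e-6):
    teams = {t for a, b, _, _ in results for t in (a, b)}
    ratings = {t: 0.0 for t in teams}

    for _ in range(iterations):
        new_ratings = {}
        for team in teams:
            margins, opponents = [], []
            for a, b, ra, rb in results:
                if team == a:
                    margins.append(ra - rb)
                    opponents.append(b)
                elif team == b:
                    margins.append(rb - ra)
                    opponents.append(a)
            avg_margin = sum(margins) / len(margins)
            avg_opp_rating = sum(ratings[o] for o in opponents) / len(opponents)
            new_ratings[team] = avg_margin + avg_opp_rating

        # Re-centre around zero so the ratings stay anchored while iterating
        # (a common stabilising step, not something spelled out in the article).
        mean = sum(new_ratings.values()) / len(new_ratings)
        new_ratings = {t: r - mean for t, r in new_ratings.items()}

        # Stop once the ratings have settled, or only bounce by a tiny amount.
        if max(abs(new_ratings[t] - ratings[t]) for t in teams) < tolerance:
            return new_ratings
        ratings = new_ratings

    return ratings


maps = [
    ("TeamA", "TeamB", 16, 10),
    ("TeamB", "TeamC", 16, 14),
    ("TeamC", "TeamA", 7, 16),
]
print(round_ratings(maps))
```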

The main benefits of this type of system include:

  • The numbers are easy to understand – the currency is rounds, so you can easily compare how the teams stack up in terms of likely game results
  • It has a predictive element that indicates which teams are stronger than others. The system fundamentally accepts some error in describing the past in order to make more general statements about a team's quality. Once a tournament has been played the ratings should answer the question not just of who played the best, but who would win in another game tomorrow.
  • It allows you to generate secondary information, such as the strength of the schedule a team had to play through to get its results. Those in turn let you draw conclusions about how the tournament was structured.

And like any rating system it has some drawbacks:

  • It's easily impressed by massive victories. Because it works on round differences, running up the score against a weak team can give the algorithm a biased assessment of a team's true strength, when another equally strong or even stronger team might simply have been less focused on producing a big win.
  • All games are given the same weight: meaningless exhibitions count as much as grand finals, and I think most teams and fans would prefer their side to win the latter. This is potentially a pretty big problem, as teams frequently play below their best in group stages only to turn it on in the knockout rounds. The final results of a tournament usually resolve in the right way, but it can be a little misleading in the middle of a tournament. Has a team really made a big surge in form, or are they just going to revert to their usual level when the bigger dogs decide to turn up?
  • Rounds are what count, not wins and losses. If a team wins every map 16-14 and takes the tournament, the algorithm is still likely to rank a team with a few blowout victories as clearly better. We know how important it is for CS players to perform in the clutch, and many games can come down to the wire.
  • It has no contextual information – roster changes, VAC bans, Twitter vendettas; the algorithm has no knowledge of any of these factors.

Modifications

If you have any interest in rating systems in other sports you’ve probably come across a rough approximation of the above description before. This kind of points ranking system is as old as the hills, and nobody really owns it. However there are a few well known tweaks that people use, and I have slightly modified the system to suit CS a little better.

As outlined above, blowout victories can heavily distort the algorithm, and when I ran the first version of it I frequently got results that were plainly suffering from this effect. Using the insight.gg win percentage model I found a cutoff point for wins which acts as a brake on blowout victories. The results were immediately good, and I'm confident they still allow us to talk about RRS (Round Rating System) scores in terms of round difference. An alternative solution, such as applying a logarithm to the score differences to dampen the big margins, wouldn't let us keep talking about that basic unit.
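
As a rough illustration of the idea only – the actual cutoff comes from the insight.gg win percentage model and isn't reproduced here – capping a margin might look something like this, with the cap value being a placeholder:

```python
# Illustrative only: clamp map margins before they feed into the rating
# equations. MARGIN_CAP is a placeholder value, not the cutoff actually
# derived from the insight.gg win percentage model.
MARGIN_CAP = 10

def capped_margin(rounds_for, rounds_against, cap=MARGIN_CAP):
    margin = rounds_for - rounds_against
    return max(-cap, min(cap, margin))

print(capped_margin(16, 2))   # a 14 round blowout is counted as 10
print(capped_margin(14, 16))  # -2 stays -2
```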

Another side effect of this is that the RRS becomes more descriptive of what has gone before, and less predictive of what will happen in the future. I feel like this is a reasonable tradeoff to maintain a clearer link with reality.

Just in the interests of full disclosure, the exact implementation I use to get to the final result is actually different to that outlined above, because computers provide different ways of solving the same problems, but the final results should be very much the same.

Strength Of Schedule

The strength of schedule scores displayed in an event team summary table are the average of the Round Rating System scores of each opponent played, calculated on a per-map basis. So 3 maps against a very tough opponent count 3 times in the Strength of Schedule average, not just once.
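
In code, the per-map averaging might look like the sketch below; the data layout mirrors the hypothetical one used in the rating example above.

```python
# Strength of Schedule: the average RRS score of every opponent faced,
# counted once per map. The results format matches the earlier sketch:
# (team_a, team_b, rounds_a, rounds_b) per map.

def strength_of_schedule(team, results, ratings):
    opponents = []
    for a, b, _, _ in results:
        if team == a:
            opponents.append(b)
        elif team == b:
            opponents.append(a)
    # Playing the same opponent over three maps counts them three times.
    return sum(ratings[o] for o in opponents) / len(opponents)
```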

Strength of schedule gives some indication of how hard a team's path was to the stage they reached in the tournament. In CS, some groups and playoff draws are frequently harder or softer than others. It's not intended as a modifier to a team's RRS score – that already contains the information of how their opponents stacked up against everybody else in the tournament.

This is most useful in identifying teams that may have made a deep tournament run based on a fairly soft schedule, or strong teams that have been dumped out early because they had hard games. In combination with 2nd order wins this can be helpful in identifying unjustly seeded teams.

Often in a CS tournament one of the whipping boys in a group will have a very high SoS score. The reason is that CS tournament groups tend to be small, so a weak team will face only the other, stronger teams, while those strong teams will have their SoS score diluted by the weak team. On top of that the weak team will never make it out of their group to face one of the teams from another group. When groups are unbalanced, which is unfortunately quite common in CS, the weak team in a strong group will usually have the highest SoS in the tournament.

Performance Above Eco Expectation

One of the frustrations of Counter-Strike player statistics is that players are frequently under-equipped compared to their opponents and teammates, but their raw stats don't reflect this. Also, sometimes a strong player is on a team that struggles and as a result their team plays a lot of eco rounds. That player is at a disadvantage statistically through no fault of their own.

Some players hog the budget or have the luxury of playing a role that allows them to survive rounds more frequently, carrying their kit over to the next. In a one-sided match players frequently sit on a budget advantage. I wanted to look at developing a statistic that accounted for the value of the equipment players carried into rounds and how that affected their performance.

To measure the performance though I needed a single performance statistic that would make sense and be easily understandable, and that had a clear relationship to a player’s share of eco expenditure.

My candidate stats (sketched in code after this list) were:

  • Net Kills per Round: (kills – deaths)/rounds. This is a fairly straightforward stat built around the single individually measurable stat that correlates best with round wins – kills. It also incorporates the damage getting killed does to your team's chances. It rewards play taken to a maximum point of aggression while also taking care of a player's own life.
  • Raw Net Kills: largely used as a reference point to check that putting net kills on a per-round basis improved the accuracy of the study.
  • Average Damage per Round: often cited as a particularly fair stat, as it allows players who have a dangerous role in the team to still acquire positive numbers. In my studies, though, I haven't found it to correlate as strongly with winning as kill statistics.
  • Kill/Death ratio: anyone who has spent time with game statistics knows that these ratios can be very misleading, as they interact very badly with time, essentially rewarding camping play and baiting your own team. However, that also makes them an excellent part of a study, because if a ratio turns out to be the best match, something has potentially gone wrong.
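
As a minimal sketch, and assuming simple per-map totals as inputs, the candidates can be computed like this (the field names are my own):

```python
# Candidate stats from one player's per-map totals. Field names are illustrative.
def candidate_stats(kills, deaths, damage, rounds):
    return {
        "net_kills_per_round": (kills - deaths) / rounds,
        "net_kills": kills - deaths,
        "adr": damage / rounds,
        "kd_ratio": kills / deaths if deaths else float("inf"),
    }

print(candidate_stats(kills=22, deaths=17, damage=2210, rounds=28))
```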

The study set is a very broad base of games across a wide variety of competitions and levels, creating thousands and thousands of performances. For each game the total expenditure for both teams was calculated and compared to the potential metrics identified above to find relationships between eco share and individual stats.

Just to be completely clear, when I talk about Eco Share I'm talking about all the players in the server, not just the players on one team. If you're hogging your team's budget then you're expected to do well, but if the opposition is stacked then we have to adjust performance expectations as well.

Net Kills per Round (NKR) had the best correlation with Eco Share, at 0.723, where ADR only correlated at 0.569. NKR was significantly better than raw Net Kills, which was expected but had to be confirmed. KD also correlated well with Eco Share (better than ADR) but not as well as NKR.

So having established that a player's NKR has a significant relationship with their Eco Share, the next step was to create a model. Something to remember is that a model is essentially predictive in nature, so it will never fit as tightly as correlations measured purely on the historical data.

What immediately became obvious from visualising the data was that a simple linear relationship wouldn't cover the variations, so a third-degree polynomial regression gave me a pretty solid match – an r² of 0.5325 against the model.

Translated into English, this means it's not perfect, but it clearly has meaning. It produces an S-shaped curve, which indicates that the initial budget differences are of key importance and that there are diminishing returns for bigger shares of the budget. So going from 10% to 12% of the overall budget should produce a bigger jump in performance than going from 14% to 16%.
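
For reference, a third-degree polynomial fit of this kind takes only a few lines of numpy. The arrays below are placeholder values standing in for the real study data, and treating "performance above expectation" as actual NKR minus the modelled NKR is my reading of the stat rather than a published formula.

```python
import numpy as np

# Placeholder data standing in for the study set: each pair is a player's
# eco share (of all ten players in the server) and their net kills per round.
eco_share = np.array([0.06, 0.08, 0.10, 0.11, 0.12, 0.14, 0.16])
nkr       = np.array([-0.20, -0.10, 0.00, 0.04, 0.08, 0.12, 0.14])

# Fit a third-degree polynomial: expected NKR as a function of eco share.
coeffs = np.polyfit(eco_share, nkr, deg=3)
model = np.poly1d(coeffs)

# Performance Above Eco Expectation, read as actual NKR minus the modelled
# expectation for that player's share.
def pae(actual_nkr, share):
    return actual_nkr - model(share)

print(pae(0.10, 0.11))  # outperforming the expectation for an 11% eco share
```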

The resulting curve shows the trend as well as the general shape of the prediction made. Deviations from the trend line are things I'm putting down to skill, luck, good or bad form, and being on the team with less overall money. There are a few problems with this, and as with everything I've had to make some trade-offs when it comes to presenting the data.

Firstly, it doesn't deal that well with pistol rounds. It's rare for players to have much disparity in their Eco Share in those rounds, and there is a slight bias in the model towards a positive score. There also aren't many pistol rounds in comparison to most other types of round, so in a single map or even match there often isn't enough variation to produce a refined number.

It also seems that AWP players at a top level are more deadly than my general model predicts, so successful snipers appear to get rewarded more strongly than perhaps is justified.

I wanted to keep this simple, so even if you don't understand what is going on mathematically you can visualise that, as a player spends more in comparison to everyone else in the game, their performance is expected to go up. I am aware of some more refined possibilities, but they start to become hard to understand both mathematically and conceptually.

For the future there are still many possibilities for improvement. Having used a very large study set to create this baseline statistic, there are some weaknesses in how it applies to elite-level players, and there are other complications to do with how the eco considerations have two sides – the value of the equipment and the potential to earn more money – which this study doesn't take into account.

Overall, the problem of the perfect model of player performance has only had its surface scratched by this study, but there are plenty more ideas in the works.

Win Percentage

Winning percentages crop up in a couple of places, in the game graph and in the wpar (win percentage added per round) player stat. These stats were born out of the realisation that not all kills are created equal. The 4th kill by a fully stacked team cleaning up the opposition during their eco round is a lot less important than the first kill between two teams full buying towards the end of the game and facing economic ruin if they lose.

To establish exactly to what extent I returned to the study dataset and looked at various scenarios, using both large scale data like round scores and comparative economies and the smaller details of round kills and relative equipment values.

This took some time to unpick, and eventually broke down into two broad groups – round scores, which have a linear relationship, and the in-round kill stats and economic stats, which were a lot more subtle.

To start with the round stats, there is a straightforward linear relationship between having a round lead and the chances of victory, with each round of the lead worth the same amount as the others. How strong that relationship is varies by how far through the game the lead is established. A 3 round lead after 17 rounds predicts the result a lot more strongly than a 3 round lead after only 5 rounds. One of the interesting results of this reasoning is that the 2nd pistol round is more important than the first, not just on eco grounds but in terms of the value of the round itself, as the swing in round score it produces is more predictive of the final result. The exception is if one team already holds a massive round lead, making dropping a handful of 2nd half rounds less important.
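
The shape of that relationship can be sketched as below. The coefficients are placeholders purely to show the structure – a linear term in the lead, scaled up the further into the map it is established – not values fitted by the study.

```python
# Illustrative structure only: win probability from a round lead, where each
# round of lead is worth the same amount, but the whole relationship is scaled
# up the further through the map the lead is established. The coefficients are
# placeholders, not values fitted by the study.
def lead_win_probability(round_lead, rounds_played, total_rounds=30,
                         per_round_value=0.05):
    progress = rounds_played / total_rounds        # how far through the map we are
    swing = per_round_value * round_lead * (0.5 + progress)
    return min(1.0, max(0.0, 0.5 + swing))

print(lead_win_probability(3, 5))    # an early 3 round lead: a modest favourite
print(lead_win_probability(3, 17))   # the same lead after 17 rounds: much stronger
```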

When examining in-round win chances, kills start to become more complex. This is down to how there is a diminishing return in terms of the winning chance a numbers advantage gives a team. There are only so many ways the team with the extra players can maneuver to improve their chances of winning, particularly given the limitations of the timer.

It boils down to the first 2 kills giving a very large advantage, after which further kills become a lot less important to the outcome of the round. An interpretation of this information could be that rounds are often decided when multiple players are killed, and I'm just seeing that it's very common to win a round by a margin of two kills. However, checking specific scenarios – 5 vs 3, 4 vs 2, etc. – seems to bear out the general conclusion. That said, if the players left on the smaller team can bring things back to an even contest, they regain a lot of win percentage.

As with the economic considerations in the PAE stats, the economy affects win chances in a more complex way. If we start with the game-level economy that goes hand in hand with round scores, I've found that this affects win chances in two ways. Firstly, an advantage in economic resources (as in the PAE stats) needs to be explained by a polynomial, which essentially means that the initial advantages are much more important. The first 10 or 20 thousand of eco advantage is much more important to winning chances than anything acquired on top of that.

The second part is that, like a round score advantage, the economic advantage has more impact over time – a $20,000 budget lead with only a few rounds left in the game can be worth many times the same lead established at the start of the match. And like round advantages, it shows that the 2nd pistol round is more important, because the eco advantage it secures (or at least gives the opportunity to secure) will probably take effect during more important rounds. The exception to this is if a match is so one-sided that one team already holds a massive round advantage – in that case it's almost certain that at some point the economy will swing around and give the leading team the chance to close out the map.

The economy at the round level isn't quite as important, but it does provide us with some interesting insights. Once again a polynomial is required to explain it, so once again the initial advantages are more important in terms of who secures the win – the first $20,000 of advantage is by far the most influential in determining the round win. But we know that teams will stack their loadout way above this when facing an eco. Why?

The answer is that the larger the equipment value advantage, the larger the likely margin of victory in terms of players surviving, which cements the long-term economic advantage because they don't lose equipment and they don't give eco to the opposition. My study shows that for roughly every $7,000 of equipment value advantage a team is expected to win by an extra surviving player, so a $35,000 advantage should produce a 5-0 wipe. Anything short of that indicates a sub-optimal use of economic investment, but given the headshot potential of weapons in CS it's always a possibility.
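
Taking that rule of thumb at face value, the arithmetic is as simple as this sketch:

```python
# Rule of thumb from the study: roughly every $7,000 of equipment value
# advantage is worth one extra surviving player in the round win margin.
def expected_survivor_margin(equipment_advantage, value_per_player=7000):
    return min(5.0, equipment_advantage / value_per_player)

print(expected_survivor_margin(35000))  # 5.0 -> a 5-0 wipe is expected
print(expected_survivor_margin(14000))  # 2.0 -> roughly a two-player margin
```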

Win Percentage in action

Taking Envyus vs Faze in January's Eleague major on de_nuke as an example, we can see how the game-winning chances change based on the circumstances.

The left axis measures win percentage from Envyus's perspective (spoiler: they lost the map) and is shown by the green lines on the graph; the right axis measures the relative economic advantage, shown by the red line, once again from Envyus's perspective, so negative numbers show an advantage for Faze.

  1. Faze get out to a solid 3 round lead and establish a solid eco advantage as well, but as it's early in the game the relative influence of that is fairly small.
  2. Envyus come back with a run of rounds and flip things in their favour, establishing a small eco advantage.
  3. Faze go on a long run, establishing a big 12-6 lead. Something to note, though, is that they don't secure a very large eco advantage to go with it (partly due to the half-time reset); Envyus manage to stay competitive enough to stop that long-term, persistent advantage getting too far away from them. This is what good eco management by the team behind looks like when the team ahead is going heavily for big anti-eco buys. Even so, Faze are very heavily favoured here.
  4. Envyus mount a comeback to get to 11-12, and although they never take the round lead, they establish a clear eco lead, which at this stage gives them an advantage going into the key final quarter of the game. Faze are one round loss away from having their economy broken at a critical stage.
  5. Faze come through in the clutch and never lose another round; two rounds after having their backs against the wall they flip the eco situation and ride it home to victory. Envyus's economy ends in tatters as they desperately try to survive and fail.

Ultimately in this game Faze’s mid round run gave them enough slack that they could absorb the Envyus comeback, just avoid losing the lead and finish the map strongly. Did they get complacent and nearly let the map slip? They were certainly in mortal danger from a position of nearly assured victory.

Win percentage for players: WPAR

For players, win percentage is measured as Win Percentage Added per Round, or WPAR. It's split over rounds because games that go on longer would otherwise produce larger values for players, giving a misleading comparison between maps. The values for all players usually add up to around zero, as you might expect, but there are sometimes macro considerations that don't interact neatly with the value, so it can be off a little.
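
A minimal sketch of how a per-round stat like this could be assembled, assuming each player is credited with the change in their team's win percentage at the moments they contribute – the event format and player names are my own assumptions, not the insight.gg implementation:

```python
# Minimal sketch of a WPAR-style calculation. Each event credits a player with
# the change in their team's win percentage at that moment (positive for
# contributions, negative for costly deaths). The event format is an
# illustrative assumption.
def wpar(events, rounds_played):
    """events: iterable of (player, win_percentage_change) tuples for one map."""
    added = {}
    for player, delta in events:
        added[player] = added.get(player, 0.0) + delta
    return {player: total / rounds_played for player, total in added.items()}

events = [("playerA", 0.04), ("playerB", -0.02), ("playerA", 0.07)]
print(wpar(events, rounds_played=28))
```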

Sometimes players that have a good PAE will also have a good WPAR, but often they won't. The reason for this is that there's an element of clutchness to WPAR – not just how many kills you got, but when you got them and how much impact they had.

A player with high PAE and low WPAR is probably getting a lot of cleanup frags in already-won situations, or is giving up their life in the least favourable way possible – usually being the kill that breaks the team's equilibrium or loses them a lot of eco investment.

A player with both high PAE and WPAR is not only out-performing their equipment value, they’re doing it at key times as well, getting key kills and advancing their team’s objectives at a bargain price.

A player with both low PAE and low WPAR is having a really bad day. An example of a player this might happen to is a team Awper who is eating up a lot of the team’s budget and is expected to get kills, but is being dominated by the opposition snipers instead.

Descriptive Game Statistics

There are two map specific descriptive stats featured on match pages that need some explanation, Contest Score and Comeback Index.

Contest Score summarises how close the map was by reflecting how many rounds there were between teams throughout the duration of it. The figure is the average difference between the two team scores at any given time in the match. The lower the score, the closer the match.

The range isn’t specific, but any game where the average is below 2 is a fairly competitive game and it can be less than 1. A theoretical maximum is the average difference in rounds in a 16-0 slaughter, which would be roughly 8.
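
As a sketch, the calculation only needs the running scoreline; averaging the gap after every round is my reading of "at any given time":

```python
# Contest Score: the average absolute gap between the two scores after each
# round of the map. Lower means closer.
def contest_score(round_winners):
    """round_winners is a sequence of 'A' or 'B', one entry per round played."""
    a = b = 0
    gaps = []
    for winner in round_winners:
        if winner == "A":
            a += 1
        else:
            b += 1
        gaps.append(abs(a - b))
    return sum(gaps) / len(gaps)

print(contest_score("ABAB" * 7 + "AA"))  # a back-and-forth 16-14 map, well below 1
print(contest_score("A" * 16))           # a 16-0 slaughter averages 8.5 here
```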

Comeback Index measures how large the swings between the two teams were during the match. It's scored between 0 and 1, with 1 reflecting a game where one team got out to a 15-0 lead and the other came back to erase that advantage.

Note that comeback index doesn’t necessarily indicate that the comeback was successful or that any given advantage was completely erased, just that there was a big score discrepancy at one point that was made smaller, and if the comeback index score is closer to 1 then a lot smaller.
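
One plausible reading of that definition is sketched below – the biggest deficit either team later clawed back, normalised so that fully erasing a 15-0 lead scores 1. This is my interpretation for illustration, not necessarily the exact formula used.

```python
# Comeback Index, under one plausible reading: the largest deficit either team
# later recovered, divided by 15 so that erasing a 15-0 lead scores 1.
def comeback_index(round_winners):
    a = b = 0
    margins = []                         # running margin from team A's perspective
    for winner in round_winners:
        if winner == "A":
            a += 1
        else:
            b += 1
        margins.append(a - b)

    best_recovery = 0
    for perspective in (1, -1):          # consider comebacks by either team
        peak_deficit = 0
        for m in margins:
            deficit = -perspective * m   # positive while this team is behind
            peak_deficit = max(peak_deficit, deficit)
            best_recovery = max(best_recovery, peak_deficit - deficit)
    return best_recovery / 15

print(comeback_index("A" * 15 + "B" * 15))  # a 15-0 lead fully erased -> 1.0
print(comeback_index("A" * 16))             # one-way traffic -> 0.0
```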

2nd Order Wins

2nd order wins is inspired by the baseball metric created by Bill James. It relies on a concept called Pythagorean Expectation: that there is a mathematical relationship between an in-game objective and the outcome of the overall game, and that looking at this in-game objective allows us to determine how many games a team should have won, compared to how many they actually won. In baseball it's runs; in Counter-Strike I've looked at rounds. The underlying stat provides many more samples, so it helps filter out the effect of small events or single decision points that might skew an entire result because of their timing – essentially luck.

You can read more about the original at Wikipedia. The technique has been refined over the years, and I'm using the method nicknamed Pythagenpat. The main variation between methods has been in arriving at the exponent part of the equation: in Bill James's original he settled on an exponent of 2; in analysis by others this was refined to a more accurate number; and it has since been shown that the exponent can vary based on the number of rounds and games there are in the sample.
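
Translated into round terms, the Pythagenpat form looks like the sketch below. Applying the exponent formula directly to rounds per map is my own adaptation, and the 0.287 constant is the commonly quoted baseball value, not one fitted to CS data.

```python
# Pythagenpat sketch adapted to rounds. The 0.287 constant is the commonly
# quoted baseball value, not an exponent fitted to CS data.
def pythagenpat_win_pct(rounds_won, rounds_lost, maps_played, k=0.287):
    rounds_per_map = (rounds_won + rounds_lost) / maps_played
    exponent = rounds_per_map ** k
    return rounds_won ** exponent / (rounds_won ** exponent + rounds_lost ** exponent)

def second_order_wins(rounds_won, rounds_lost, maps_played):
    return pythagenpat_win_pct(rounds_won, rounds_lost, maps_played) * maps_played

# A team that won 160 of 300 rounds over 10 maps "should" have won about 5.9
# of those maps, regardless of how the close ones actually fell.
print(second_order_wins(160, 140, 10))
```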

One of the more controversial elements of this is that the size of the exponent gives an indication of how much luck is involved in the type of game being analysed. This method has been applied to many sports and basketball is the one with the largest exponent I know of (and therefore the least luck). Given how much scoring there is in a basketball game you can see why. In CS my original study set included games from a variety of competitive levels, but I’ve recently re-run the calculations and the top level has much less of this “luck” factor than it originally appeared. It’s far lower than professional soccer for example, and given that you are guaranteed 30 round scores in a CS game rather than 2 or 3 goals in a soccer match that probably makes sense.

The value of the stat increases the more games are played; the more samples there are, the more you'd expect the scores to converge. A team that is over- or under-performing their pythagorean expectation despite having many samples may be clutching very effectively, crumbling under the pressure, or, yes, experiencing an extreme example of luck.

This potentially throws an interesting light on teams that may appear to be in crisis, but have in fact just suffered the effects of variance.

Luck vs Skill: how much is clutch?

The idea that this variance is down to luck comes from the assumption that players are taking on shots and duels that only have a certain % chance of success, or that it's possible to be caught out by a perfect counter strategy with no warning, and that the distribution of whether these work out in a particular player or team's favour is down to luck. A lot of people are going to argue that this is really down to players performing in the clutch, and there's definitely something to that. Any team that performs better than its pythagorean expectation over a lot of samples is probably raising their game at critical times, and any team underperforming when there are many samples is probably choking.

Pressure is real and performing (whether it be making shots or making the right calls) in tight situations where games are won and lost is a critical skill, but in any small sample size it’s impossible to statistically say whether a team is clutching or is just benefitting from hitting their shots at the right time. Comparing their longer term stats to a specific smaller sample (like a specific competition) might give an indication why a team is above or below their pythagorean expectation.

Kill Graphs

The kill graph can seem a bit confusing when you first look at it. Clicking on a player’s node should isolate their relationships to opposing players, but exactly what you’re seeing still needs some explanation.

Example image:

Here I've isolated twist's data. The player node is blue, so blue arrows represent kills he has gotten, and red arrows represent times he has been killed. Each player-to-player relationship comes as a pair of arrows: the dominant arrow, which is straight and usually thicker, and the secondary arrow, which is curved and usually thinner.

The dominant arrow shows that a player killed that particular opponent more often than they were killed by them. The curved arrow shows the opposite. The thickness of the arrows shows how large the ratio between the kills was; it doesn't represent the exact numbers. If you want the exact numbers, they're in the kill matrix to the left of the graph.

Sometimes both arrows are curved; that happens when the players have killed one another an equal number of times.

In the example we can see that twist had kill superiority over dupreeh, Xyp9x and gla1ve, but suffered kill inferiority to device and Kjaerbye.