Thursday, March 11, 2010

Using Individual Stats

One of the more common requests I received after I posted my Team Win Probability Calculator was to provide some way to use individual stats in the calculations. This topic presents many difficulties, which I have been trying to solve for the past few weeks. The most obvious problem is that while individual stats are easy to collect, they are hard to use effectively to predict the outcome of a match. The reason is the extreme team nature of the game. It's easy to build up stellar individual stats just by surrounding oneself with great players. If a game is lopsided, everyone on the winning team will have great stats, even the players who contributed the least. Likewise, a great player on a terrible team can top frag or even outscore the medic, but still won't be able to get anything going; in the end, his kill-death ratio will still likely be below that of everyone on the dominant team.

The inspiration for much of my statistical work, Advanced NFL Stats, has a game prediction model that uses team stats to predict game outcomes. Football has the same kind of problem as Team Fortress: teams that face relatively easy schedules early in the year can rack up some very good numbers. One solution is to adjust a team's stats to reflect the strength of the opponents they faced while earning those stats. Teams that faced weak opponents have their numbers docked slightly for their easy schedule, and those that faced much stronger opponents have theirs boosted.

The problem is exacerbated by the way most 6v6 match-ups are being played. Pick-up games and lobbies have players of greatly differing skill levels coming together randomly for one game, and then those particular teams never face each other again. Since the strength of your teammates matters just as much as the strength of your opponents, we must also adjust each individual's stats by accounting for their teammates.

I'm currently working on applying an Advanced NFL Stats-style model to Team Fortress 2 games. I'll be using the technique of adjusting for opponent strength, and also extending the technique by including an adjustment for the strength of your own teammates.
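As a rough sketch only, one way such an adjustment could look is shown here. The additive form, the function name adjusted_stat, the idea of rating everyone on the same scale as the raw stat, and the sample numbers are all assumptions made for illustration; none of them come from the finished model.

```python
from statistics import mean


def adjusted_stat(raw, teammates, opponents, league_average):
    """Boost a raw stat for strong opponents and dock it for strong teammates.

    Hypothetical additive adjustment: every rating is assumed to be on the
    same scale as the raw stat (for example, kill-death ratio).
    """
    # Positive when the opponents were stronger than the league average.
    opponent_strength = mean(opponents) - league_average
    # Positive when the teammates were stronger than the league average.
    teammate_strength = mean(teammates) - league_average
    return raw + opponent_strength - teammate_strength


# Made-up 6v6 game: a 1.2 kill-death ratio earned alongside five strong
# teammates (1.4 each) against six weak opponents (0.9 each).
carried = adjusted_stat(1.2, [1.4] * 5, [0.9] * 6, league_average=1.0)

# The same 1.2 earned alongside weak teammates against strong opponents.
carrying = adjusted_stat(1.2, [0.9] * 5, [1.4] * 6, league_average=1.0)

print(f"carried: {carried:.2f}, carrying: {carrying:.2f}")
```

In this toy case the two players post the same raw number, but the one who earned it against stronger opposition with weaker help comes out well ahead after adjustment, which is the behavior the technique is after.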

Sunday, February 21, 2010

A Team Fortress 2 Win Probability Calculator

I believe I have the major errors worked out on my Team Fortress 2 Win Probability Calculator. It is now up as a live beta. Give it a shot. It should work in all modern browsers.

As I mentioned in my introductory post, win probability is calculated by comparing the current game situation to past games where the same situation occurred. For example, if you punch in Blue losing a scout and Red losing a demoman at the mid-fight, the calculator reports that the probability of Blue capping mid is 55% and the probability of Blue winning the entire round is 61%.

It arrives at those numbers by looking at the historical data. In all the matches in the dataset, it finds all the mid-fights where one team was down a scout and the other team was down a demoman. That particular situation occurred in 109 mid-fights. The team trading their scout for the enemy demoman ended up winning 60 of those mid-fights (55%), and 67 of the rounds those mid-fights started (61%). It is interesting to observe that the impact of the scout-demo trade on the mid-fight itself is only 5 percentage points over an even 50%, while its impact on the round win probability is 11 points. This would seem to say that the demoman is more important after the mid-fight than during it.
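The arithmetic behind those percentages is short enough to check by hand. The counts below are the ones quoted above; the 50% baseline for an even fight is an assumption that follows from the calculator treating both colors alike, and the variable names are only illustrative.

```python
# Counts quoted in the post for the scout-for-demoman trade.
mid_fights = 109   # mid-fights where the trade occurred
mid_wins = 60      # mid-fights won by the team that lost its scout
round_wins = 67    # rounds won by that same team

# Assumed baseline: an even fight is a coin flip when colors are ignored.
even = 0.5

p_mid = mid_wins / mid_fights
p_round = round_wins / mid_fights

print(f"cap mid:   {p_mid:.0%} (edge {p_mid - even:+.0%})")
print(f"win round: {p_round:.0%} (edge {p_round - even:+.0%})")
```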

What if Red wipes at mid? The calculator reports 100% chance of Blue capping and 100% chance of Blue winning the round. This certainly agrees with our intuition about how the game plays out. But 100%? Doesn't that say that it's impossible for Red to come back? The problem is a small dataset. In the dataset that I have, every time Red wiped, Blue won. That's why I also added a margin of error calculation.

Similar to the margin of error on a pollster's poll, the margin of error term is an indication of how confident we can be in the results, given the number of data points we had to work with. In the last example, the margin of error is 31% at the 90% confidence level. 90% confidence means that there is a 90% chance that the true odds of one team coming back from a wipe at mid are less than 31%. If a 31% margin of error seems huge, it is. Most polls of the kind you would see in a newspaper have a 2-5% margin of error. A 31% margin of error means we only had a handful of cases in the dataset where one team completely wiped at mid. Compare that to the full-strength numbers. Since every round starts out as a full-strength battle, we have much more data to work with, and the margin of error is just 2%. But even with a 31% margin of error, the result agrees with our experience in this case: wiping on mid is bad, and you're likely to lose the round. Coming back from a wipe at mid is a rare event, and there's not enough data right now to get a better estimate.
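For a feel of how sample size drives the margin of error, here is a minimal sketch using the conservative formula pollsters commonly quote, which assumes the worst case of a 50/50 split. Whether the calculator uses exactly this formula is an assumption; the function name margin_of_error is hypothetical, and the only sample size taken from this post is the 109 mid-fights from the scout-demo example.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, confidence=0.90, p=0.5):
    """Normal-approximation margin of error for a proportion.

    p=0.5 is the worst case and gives the widest margin. Plugging in an
    observed rate of 100% would give a margin of zero, which is why the
    conservative value is assumed here.
    """
    # Two-sided critical value, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


# The scout-for-demoman situation occurred in 109 mid-fights.
print(f"n=109: +/- {margin_of_error(109):.1%}")

# Quadrupling the data only halves the margin.
print(f"n=436: +/- {margin_of_error(436):.1%}")
```

Because the margin shrinks with the square root of the sample size, rare situations like a full wipe at mid need far more logged matches before their estimates tighten up.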

The calculator is colorblind. Every calculation looks at both sides of the map: if Red is pushing last and Blue is down a scout, the numbers automatically include the times when Blue is pushing last and Red is down a scout. This is probably the most useful behavior for comparing situations and possible outcomes.
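One way to get this colorblind behavior is to record every situation twice, once from each team's point of view, so that mirror-image games land on the same key. The sketch below is an illustration under assumed names and data layout (the perspectives function and its tuple format are hypothetical), not the calculator's actual code.

```python
def perspectives(pushing, blue_dead, red_dead, winner):
    """Yield one situation from each team's point of view.

    Each view is (key, won), where key is
    (this team is pushing, this team's dead classes, enemy's dead classes).
    Team colors never appear in the key.
    """
    blue = tuple(sorted(blue_dead))
    red = tuple(sorted(red_dead))
    yield (pushing == "blue", blue, red), winner == "blue"
    yield (pushing == "red", red, blue), winner == "red"


# Red is pushing last and Blue is down a scout; Red goes on to win.
game_a = dict(perspectives("red", ["scout"], [], winner="red"))

# Mirror image: Blue is pushing last and Red is down a scout; Blue wins.
game_b = dict(perspectives("blue", [], ["scout"], winner="blue"))

# Both games produce the same two keys, so their results pool together.
print(game_a == game_b)
```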

The only issue is that it doesn't show Blue's slight edge. In the current dataset, Blue wins 53% of all mid-fights. Blue has the edge at mid on Badlands, Granary, and Freight. Only Follower bucks this trend, with Red winning 56% there. 53% is the same kind of edge we've been seeing ever since Valve published their stats. Your guess is as good as mine why Blue has an edge, but it certainly seems to be real.

At the time of this writing the dataset is 187 matches from #tf2.pug.na. Special thanks to Cinq for sending me the log files. I'm always looking for more data. Margins of error will naturally decrease with more data.

Wednesday, February 17, 2010

A Statistical Approach to Team Fortress 2

Welcome to my site on Team Fortress 2 statistical analysis.

My inspiration for this work is the site Advanced NFL Stats. Brian Burke, the author of Advanced NFL Stats, applies statistical analysis to NFL football, and has found that a lot of the common wisdom of the sport is not borne out by the data.

Advanced NFL Stats has produced two very interesting systems which I hope to be able to port to Team Fortress 2, specifically the 6-man competitive format of Team Fortress 2. The first of these is a win probability system. The win probability system for football gives an estimate of a team's chances of winning the game, given information about the score, time remaining, who has the ball, and the current down and distance. By collecting all the times in the historical data that the state of the game matched the state of the current game, and whether the historical team went on to win, we can take the percentage of the time that the historical team won their game as the probability that the current team will win theirs.

For Team Fortress 2, a similar system should be possible. Instead of the score and field position used for football, the relevant variables for Team Fortress 2 are possession of control points, which players are dead and alive, and whether the medics have ubercharge. Win probability can inform us about the value of different advantages. Is ubercharge-advantage worth more than the scout sent to force the pop? What about the demoman versus the medic at the mid-fight? Win probability can help answer these and other questions.
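A minimal sketch of the lookup, assuming the game state is reduced to the variables named above. The GameState fields, the win_probability function, and the tiny history list are all hypothetical and exist only to show the mechanics; the history rows are made up, not real match data.

```python
from typing import NamedTuple


class GameState(NamedTuple):
    """Hypothetical snapshot of a round from one team's point of view."""
    points_held: int    # control points this team holds
    own_dead: tuple     # classes currently dead on this team
    enemy_dead: tuple   # classes currently dead on the other team
    own_uber: bool      # this team's medic has ubercharge
    enemy_uber: bool    # the enemy medic has ubercharge


def win_probability(history, state):
    """Fraction of matching historical situations that ended in a win.

    history is a list of (GameState, won) pairs. Returns None when the
    situation has never been seen, rather than guessing.
    """
    outcomes = [won for past_state, won in history if past_state == state]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Made-up history, for illustration only.
uber_advantage = GameState(3, (), (), True, False)
down_a_demo = GameState(2, ("demoman",), (), False, False)
history = [
    (uber_advantage, True),
    (uber_advantage, True),
    (uber_advantage, False),
    (down_a_demo, False),
]

print(win_probability(history, uber_advantage))
print(win_probability(history, GameState(5, (), (), True, True)))
```

Exact matching keeps the estimate honest but splits the data thinly, so the more variables the state tracks, the fewer past situations each lookup has to draw on.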

The second system that Advanced NFL Stats has produced is a statistics-driven game prediction system. Rather than just examining the likelihood of winning based on current position and players, the prediction system takes the statistics that are most highly correlated with teams winning and builds a logistic model that weights each statistic by how important it is. I expect some kind of statistical prediction system to become more relevant as tf2lobby.com grows in popularity, and more players are introduced to 6-man competitive Team Fortress 2.
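To show the shape of a logistic model, here is a small sketch. The stat names, the weights, and the predict_win function are invented for illustration; in a real model the weights would be fitted to historical match results rather than written by hand.

```python
from math import exp


def logistic(x):
    """Squash any real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-x))


def predict_win(stat_diffs, weights, intercept=0.0):
    """Probability that team A beats team B.

    stat_diffs holds team A's stat minus team B's stat for each statistic;
    weights says how much each statistic matters.
    """
    score = intercept + sum(weights[name] * stat_diffs[name] for name in weights)
    return logistic(score)


# Hypothetical weights; real ones would come from fitting the model to data.
weights = {"kd_ratio": 1.5, "damage_per_minute": 0.01}

# Team A has a slightly better kill-death ratio and deals more damage.
diffs = {"kd_ratio": 0.2, "damage_per_minute": 30.0}
mirror = {name: -value for name, value in diffs.items()}

p_a = predict_win(diffs, weights)
p_b = predict_win(mirror, weights)
print(f"team A: {p_a:.0%}, team B: {p_b:.0%}")
```

With no intercept the model is symmetric: swapping the two teams flips the sign of every difference, and the two probabilities add up to one.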

I have data compiled for the in-game win probability system, but I am still working on a nice interface for browsing it. I'll build the spiffy interface as quickly as I can, and post with an update and a link to the result.