Quantifying Variance: Streaks & Tilt in Heads-Up SNGs Pt. 3
Quantifying Variance is a biweekly column in which we’ll take a look at some of the math underlying poker, with the goal of understanding just how probable or improbable various occurrences actually are, and how to tell the difference between what is random and what is not.
Last time, we found that hunting for tilt in the wild was a tricky proposition. It had been my working hypothesis that we would only need to look at players’ longest streaks and, if we found that most people tended to have longer winning and losing streaks than predicted by statistics, that would be hard evidence for the existence of tilt.
It turned out that this was not the case, so I decided to look at shorter streaks. Unfortunately, being an inexperienced Sharkscope user, I didn’t realize at the time that it was possible to separate out players’ heads-up games from other tournaments, which tied my hands a bit. Fortunately for us, it turns out that Sharkscope is much more powerful than I’d given it credit for, and has a number of advanced filters available to subscribing members, so we’ll have a lot more data to look at in coming weeks. In the meantime, though, I had promised to start with some case studies of individual players, to see if we can tell how streaks affect their play on a personal level.
What are we looking at?
Having selected a few high-volume heads-up players (whom I’ll call Mr. A, B and C to preserve their privacy) and pulled up their heads-up stats on Sharkscope, I was left with the question of what to do with that data.
What Sharkscope gives us for each of these players is their overall wins and losses, and the frequency with which they’ve had winning and losing streaks of various lengths. It’s easy enough to write a computer simulation to figure out what the average streak frequencies should be for a given player’s sample size and win rate, but let’s not forget what we’re ultimately looking for.
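As a rough sketch of the kind of simulation meant here (not the code actually used for this column), the Python below treats every game as an independent coin flip at the player’s overall win rate and averages the resulting streak-length counts over many simulated careers. The helper names `streak_counts` and `simulate_streaks` are hypothetical. Streaks are counted as maximal runs of exactly a given length, which is an assumption about how the results should be lined up against Sharkscope’s streak tables.

```python
import random
from collections import Counter
from itertools import groupby


def streak_counts(results):
    """Count maximal winning and losing streaks by exact length.

    `results` is a sequence of booleans (True = win). Returns two Counters
    mapping streak length -> number of streaks of exactly that length.
    """
    wins, losses = Counter(), Counter()
    for won, run in groupby(results):  # groupby splits the record into runs
        length = sum(1 for _ in run)
        (wins if won else losses)[length] += 1
    return wins, losses


def simulate_streaks(n_games, win_rate, trials=200, seed=0):
    """Average streak frequencies for a hypothetical tilt-free player.

    Assumes every game is independent with probability `win_rate` of a win.
    Returns (avg winning-streak counts, avg losing-streak counts,
    average length of the longest losing streak).
    """
    rng = random.Random(seed)
    win_totals, loss_totals = Counter(), Counter()
    longest_losses = []
    for _ in range(trials):
        games = [rng.random() < win_rate for _ in range(n_games)]
        w, l = streak_counts(games)
        win_totals.update(w)   # Counter.update adds counts together
        loss_totals.update(l)
        longest_losses.append(max(l) if l else 0)
    avg_w = {k: v / trials for k, v in sorted(win_totals.items())}
    avg_l = {k: v / trials for k, v in sorted(loss_totals.items())}
    return avg_w, avg_l, sum(longest_losses) / trials
```

For a profile like Mr. A’s, `simulate_streaks(10_000, 0.556)` gives the streak frequencies a player with no tilt or confidence effects would be expected to show, along with the average longest losing streak, which is the kind of baseline the longest-streak comparison from last time relied on.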
We want to find out whether they exhibit long-term agential variance; in other words, whether their probability of winning any given game depends in any predictable way on the players themselves, rather than just the luck of the cards. In particular, we’re looking for the effects of tilt (or confidence, perhaps). We want to know whether a winning or losing streak tends to affect the player’s subsequent results due to emotional effects.
Since win rate is what we’re interested in, it makes sense to look at that directly if we can. In fact, it’s not too hard to extract win rate from streak data, because the player’s win rate after a streak of length N is, by definition, the same as the percentage of the time that the streak is extended to length N+1. If we take, for instance, the number of times the player had 9 or more consecutive wins and divide it by the number of times they had 8 or more, we get their effective win rate after a winning streak of 8 games. For losing streaks, it’s simply the opposite – the player’s win rate is the same as their odds of breaking, rather than extending, the streak.
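The conversion described above can be sketched as follows, assuming the input is a table of how many streaks ended at exactly each length. The function name `win_rate_after_streaks` is hypothetical. It first accumulates “reached length N or more” counts from the longest streak downward, then divides consecutive counts; for losing streaks it takes the complement, since breaking the streak is the win.

```python
def win_rate_after_streaks(exact_counts, max_len, losing=False):
    """Win rate in the game following a streak of length N, for N = 1..max_len.

    `exact_counts` maps streak length -> number of streaks of exactly that
    length. Stops early if no streak ever reached a given length.
    """
    longest = max(exact_counts, default=0)
    # at_least[k] = number of streaks that reached length k or more
    at_least = {}
    running = 0
    for k in range(longest, 0, -1):
        running += exact_counts.get(k, 0)
        at_least[k] = running

    rates = {}
    for n in range(1, max_len + 1):
        reached = at_least.get(n, 0)
        if reached == 0:
            break  # no data at this length or beyond
        extended = at_least.get(n + 1, 0) / reached
        # Extending a winning streak is a win; breaking a losing streak is a win.
        rates[n] = 1 - extended if losing else extended
    return rates
```

One caveat: a streak still in progress at the end of a player’s record never got its next game, so in a small sample it is slightly cleaner to drop that final streak before computing the ratios.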
Naively, we would expect to see players’ win rates go up over the course of a winning streak and down over the course of a losing streak, due to the effects of confidence and tilt respectively, but I do say “naively” for a reason. As it turns out, we do see statistically meaningful effects correlated with the streaks, but with each player showing a somewhat different pattern.
Reading the graphs
For each of the players, we’ll look at a graph showing their win rate after winning and losing streaks of length 1 to 8. The reason we’re stopping there is that for the volume of data we have available, losing streaks beyond 8 games are rare, so the statistics become pretty unreliable.
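To see why long losing streaks are so thin on the ground, assume (as the no-tilt hypothesis does) that every game is independent with win rate p and loss rate q = 1 - p. A run of k or more losses can begin at game 1 with probability q^k, or at any of the n - k later positions that immediately follow a win, each with probability p·q^k, so the expected number of such runs in n games is q^k(1 + (n - k)p). The sketch below simply evaluates that expression; the function name is hypothetical.

```python
def expected_runs_at_least(n_games, win_rate, k, losing=True):
    """Expected number of streaks of length >= k in n independent games.

    A qualifying streak either starts at game 1 (k straight results) or at
    some game i in 2..n-k+1 (opposite result at i-1, then k straight).
    """
    if k > n_games:
        return 0.0
    q = (1 - win_rate) if losing else win_rate  # prob. of the streak's result
    p = 1 - q                                   # prob. of the opposite result
    return q ** k * (1 + (n_games - k) * p)
```

For a Mr. A-like profile, `expected_runs_at_least(10_000, 0.556, 8)` comes out at about 8.4, so even a player with a five-figure sample should expect only a handful of losing streaks that reach eight games.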
Each graph has three lines. The blue line represents winning streaks, the red line represents losing streaks, and the horizontal orange line is the player’s overall win rate, included as a reference. When the red or blue line is above the orange line, the player is playing better than usual; when it falls below, they are playing worse.
One thing you’ll notice is that the win rates start to vary wildly when we get into longer streaks. Although this is what we’re looking for, we have to be careful to take it with a grain of salt; since long streaks are rare, these win rates are based on ever-decreasing sample sizes, so we expect a lot more random variation. For this reason, in pointing out the trends in each player’s graph, I will include the relevant sample size and the odds (calculated by way of another computer simulation) that the results could have arisen by chance.
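The odds in this column came from a simulation. As an alternative way to get this kind of figure, the sketch below computes the exact binomial tail, with a small Monte Carlo version alongside for comparison. Both assume that the games following a given streak length are independent trials at the player’s overall win rate, which is exactly the “no tilt” hypothesis being tested. The function names are hypothetical, and the log-gamma form is used only so that large samples do not overflow a float.

```python
import random
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n games, computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))


def binomial_tail(wins, trials, win_rate, lower=True):
    """Chance of a result at least this extreme for a fixed win rate.

    lower=True:  P(wins or fewer), i.e. doing at least this badly.
    lower=False: P(wins or more), i.e. doing at least this well.
    """
    ks = range(0, wins + 1) if lower else range(wins, trials + 1)
    return sum(binom_pmf(k, trials, win_rate) for k in ks)


def simulated_tail(wins, trials, win_rate, lower=True, sims=10_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        w = sum(rng.random() < win_rate for _ in range(trials))
        extreme = w <= wins if lower else w >= wins
        hits += extreme
    return hits / sims
```

As an illustration, Mr. B’s 47% over 537 post-streak games is roughly 252 wins, and Mr. A’s “5% better than average” over 411 games is roughly 249 wins (both counts reconstructed from rounded percentages, so approximate). `binomial_tail(252, 537, 0.5376)` and `binomial_tail(249, 411, 0.556, lower=False)` land close to the “lower than 0.1%” and “a little over 2%” figures quoted for those players.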
Mr. A is a high-stakes player with just over 10,000 heads-up Sit-and-Gos under his belt. His most common stakes are $60 and $100, which together make up half his sample. He plays turbos almost exclusively (over 99%), and has an overall win rate of 55.6% – very solid for those stakes.
The most striking thing we see in his graph is that his win rate holds pretty firm in the face of losing streaks up until he’s lost 7 in a row, after which it falls off a cliff. After 7 or 8 losses, he’s only winning the next game 1/3 of the time. Of course, we’re dealing with a small sample size, as he doesn’t go on losing streaks of this length very often: he’s only done so a total of 12 times, in fact. However, the odds of a player with a 55.6% overall win rate going 4 for 12 by chance are only about 3%, so although bad luck may be contributing to his abysmal performance on these streaks, it’s likely that tilt is a factor for this player.
The next most obvious deviation is his sudden spike in performance after four straight wins; he is, at that point, over 60% likely to win a fifth. He continues to perform better than average for the next few wins as well, although not as dramatically. Again, one might assume this to be an anomaly, but the sample size is actually quite large for these mid-length winning streaks: 411 games, to be precise. The odds of a player performing 5% better than average over that sample are a little over 2%, so again, it seems like this is in fact a meaningful effect.
Lastly, we note that there seems to be a bit of reverse tilt happening after two consecutive losses, with his win rate climbing 2.5% above average. This may seem like a small effect, but now we have a sample size of 1127, so again, the odds of this being a statistical fluke are only 4%.
Overall, this graph seems to paint a picture of a player who grows in confidence and skill when on a minor heater and who tends to clutch after losing a couple of games in a row, but has an unfortunate tendency to lose his mind on the rare occasions that he is on a particularly bad losing streak. This sort of behaviour is intuitively more or less what we’d expect… but let’s see what happens with the other players.
Mr. B is a mid-stakes player. He was a losing player in his early days, but turned himself around and has been winning consistently since July 2012. I’m therefore using only data since then, to avoid contaminating his losing streak statistics with streaks from his fishier days. He’s had nearly 26,000 games in this winning period, with an overall win rate of 53.76%. Three quarters of his games as a winning player have been played at $15 and $30 stakes, and three quarters of his games overall are turbos.
Like Mr. A, he shows a tendency to tilt during his longer losing streaks, to a point. Because he has a larger sample and a lower overall win rate than Mr. A, he has considerably more data for these longer streaks. On the other hand, his performance isn’t quite as poor as Mr. A’s, only dropping to 45% at his lowest point, rather than 33%. He’s been in that position 65 times, making his odds of performing this badly by chance around 9%.
Interestingly, he rebounds after 8 losses and succeeds in breaking the streak at that point over 58% of the time. Now we’re talking about a rather small sample, of course, so we can’t be sure this isn’t just random noise. However, it’s worth noting that none of the three players we’re looking at has a longest losing streak in excess of statistical prediction, which is what we’d expect to see if they continued to tilt indefinitely. In all likelihood, all of these players clutch at some point, perhaps deciding to take a break when on a streak that stands to become their worst ever.
He also seems to show clutch behaviour after 4- and 5-game losing streaks, but here he is only winning a percent or two more games than his overall average, and the sample size is not large enough for this to be statistically meaningful.
The most remarkable aspect of his graph, however, is what happens after five wins. In stark contrast to Mr. A, who plays better when on a heater, Mr. B appears to get overconfident. Almost all the data points on his win-streak graph are below his average, meaning that winning several games in a row makes him less, rather than more, likely to win the next. He bottoms out after five wins, extending the streak to six only 47% of the time. The sample size here is 537, so the odds of this being a fluke are lower than 0.1%, making this by far our most compelling evidence for emotional effects.
Overall, in terms of tilt, Mr. B’s performance seems fairly consistent with Mr. A’s. He tilts a little bit earlier, but not as heavily, and perhaps recovers faster. On the other hand, his performance while on a winning streak is the reverse: where confidence seems to work in Mr. A’s favor, it tends to get the better of Mr. B and cause him to play carelessly.
Mr. C is an even higher-stakes player than Mr. A, with $200 turbos being his most common game. Prior to July 2012, he was playing lower stakes games at a considerably different win rate, so like Mr. B, I have used only his data since then. This data set includes just under 24,000 games, over which he has a 53.53% win rate. Like Mr. B, about three-quarters of his games are turbos, with the remainder being played at regular speed.
Like Mr. A, Mr. C seems to grow in strength over the course of a moderate win streak, although the tail end of his graph suggests he may eventually succumb to overconfidence like Mr. B. His peak win rate while on a winning streak is after six consecutive wins, at which point he’s about 5% more likely than normal to win his next one. He’s had 270 streaks of at least six games, so this is a decent sample size, and it’s only about 4% likely that he’d do this well by chance.
What’s remarkable about Mr. C, however, is that rather than tilting, he has a strong tendency to clutch after six or seven losses, with his win rate skyrocketing to nearly 64%. This is on a sample of 135, so only 1% likely to be a fluke.
Generally speaking, then, it seems like Mr. C plays better when on a long streak, regardless of whether it’s a winning or a losing streak. Shorter streaks don’t seem to affect him much, with the one exception being that his performance drops by about 1.6% after two consecutive wins. Although this is a small effect, the sample size is 3193, making it less than 4% likely to have happened by chance. It’s hard to say what this means, but given the stakes he plays at, it could have to do with his opponents adjusting to him after he beats them a couple of times.
We don’t want to fall victim to the jelly bean fallacy (mistaking the one “significant” result among many comparisons for a real effect), so let’s look at the deviations we’re seeing and compare them to expectation.
Given that we’re looking at 16 data points for each player and are considering deviations both above and below the player’s average, we’d expect to see one or two statistical outliers of probability around 1-2%. Instead, we see a total of four places where something at least this improbable occurs: Mr. A’s tilting and confidence, Mr. B’s overconfidence, and Mr. C’s clutch behaviour on a losing streak. Furthermore, most of these are not complete blips: Mr. A’s jump in win rate after four straight wins may be, but the other effects are corroborated by at least one adjacent data point showing a significant and similar deviation from the average.
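The “one or two” figure can be checked with a little arithmetic: three players with 16 data points each gives 48 points, and since each point can deviate in either direction, a one-sided threshold of 1-2% becomes a 2-4% chance per point. The sketch below (hypothetical function names) does that calculation and also gives the binomial probability of seeing at least a given number of such outliers. It treats the 48 points as independent, which is a simplification.

```python
from math import comb


def expected_outliers(points, one_sided_p):
    """Expected number of points crossing the threshold in either direction."""
    return points * 2 * one_sided_p


def prob_at_least(m, points, one_sided_p):
    """Binomial probability that m or more points cross the threshold.

    Assumes (as a simplification) that the points are independent.
    """
    q = 2 * one_sided_p  # two-sided chance for a single point
    return 1 - sum(comb(points, k) * q ** k * (1 - q) ** (points - k)
                   for k in range(m))
```

`expected_outliers(48, 0.01)` and `expected_outliers(48, 0.02)` work out to roughly 0.96 and 1.92, matching the expectation above; `prob_at_least(4, 48, 0.02)` then asks how often luck alone would produce four or more.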
Thus, it seems likely that players’ win rates are not entirely consistent, and do correlate in some way with streaks. On the other hand, there is no consistency between the players: Mr. A and Mr. B both tend to tilt, but a long losing streak actually helps Mr. C. Meanwhile, Mr. A and Mr. C both tend to play better on a moderate win streak, but Mr. B self-destructs. There’s likewise little consistency in how they respond to shorter streaks, though some of them show a statistically significant tendency one way or the other there as well.
That leaves us with the question of whether there’s any overall trend in players as a whole. Are there more people who tilt than clutch, or vice versa? What about confidence: does it help more people than it hurts? Or does it really depend entirely on the player? Those are the questions we’ll address in our next instalment.
Alex Weldon (@benefactumgames) is a freelance writer, game designer and semipro poker player from Montreal, Quebec, Canada.