As of this writing, the latest Brains vs. AI Challenge is about a quarter of the way to completion, with the humans trailing the bot, but catching up. Dong Kim is the only one of the four human players in the lead at the moment, having snatched $33,297 from the bot, Libratus, much of it on a single huge bluff-catch. Teammates Jason Les and Jimmy Chou are narrowly in the red, with Daniel McAulay’s $54,809 deficit accounting for the entirety of Libratus’s lead.
Curious about the challenge facing these players, I decided last week to put in a few thousand hands against Slumbot, an AI built by my cousin Eric Jackson. Slumbot can be assumed to be weaker than Libratus, having lost narrowly to Baby Tartanian 8, an earlier bot by the same team from Carnegie Mellon University. Nonetheless, all of these bots use a similar approach, variations on a technique called Counterfactual Regret Minimization, and Slumbot is still very strong compared to most poker AIs you would find elsewhere.
In no particular order, here are some of the observations I’ve made about the differences between playing against a bot versus playing against human opponents.
Easy to get ahead, hard to stay ahead
I started off quite well against Slumbot, with double-digit profits per 100 hands for the first 2000 hands or so. I was hoping to sustain that performance into a larger sample, but have since dropped into the red due to a couple of coolers, a lost preflop coin flip for stacks, and some foolish calls in large pots. If you look at Slumbot’s leaderboards, this experience is typical; it seems rather easy to beat it for large sums over 1,000 or so hands if the cards go one’s way, but very few people have managed to stay in the black for 5,000 or more.
In other words, if you aren’t yourself a high-volume player, it takes a much larger sample than you may think to tell who is the better of two players, and a bot – given that it is both consistent in its play and consistently available to rematch – is a perfect way to convince yourself of this.
Exploiting the unexploitable
Slumbot and its competitors all share the feature that they don’t attempt to adjust their play in real-time. They seek to develop the least exploitable mixed strategy possible before the match, and then simply execute it with no attention paid to how their opponent has played in previous pots. They won’t necessarily do the same thing in the same spot every time, but will choose from their available options with the same relative probabilities.
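That kind of fixed-strategy execution can be sketched in a few lines. The decision point and its probabilities below are invented for illustration, not taken from any actual bot:

```python
import random

# Hypothetical probabilities for one decision point; a CFR-trained bot
# stores one such distribution for every situation in its abstraction.
strategy = {"fold": 0.10, "call": 0.55, "raise": 0.35}

def choose_action(strategy):
    """Sample an action from fixed probabilities. Nothing here ever
    updates based on how the opponent has played in previous pots."""
    r = random.random()
    cumulative = 0.0
    for action, prob in strategy.items():
        cumulative += prob
        if r < cumulative:
            return action
    return action  # guard against floating-point rounding

action = choose_action(strategy)
```

Run the sampler a few thousand times and the observed frequencies settle near 10/55/35: the bot is unpredictable hand to hand, but its long-run tendencies never move, no matter what you do.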
Ironically, the fact that the human player knows the bot is not adapting means that there’s no need to play a mixed strategy oneself. If you think you’ve found a type of spot where your bluffs are very profitable, for instance, you can fire off your bet every single time that situation arises without worrying that the bot will increase its calling frequency in response. In other words, it may be hard to find ways to beat the bot, but when you do find them, there’s no danger of being lured into a trap.
The one big hole
There’s one serious weakness in the approach used to build these bots, which is that handling complete freedom in bet sizes would render the task computationally intractable. Therefore, the algorithms must discretize bet sizes to some extent, rounding, for instance, a 29% pot bet off to 30% and treating it no differently than a 31% pot bet.
I experienced the effects of this first-hand early in my play against Slumbot; holding Ace High in a situation where I expected it to have many worse Aces or Kings, I made a teeny-tiny value bet and was surprised to find that it called me with Eight High. My cousin explained that I had bet so small that it had been rounded down to 0% and the bot was treating it as equivalent to a check.
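Here’s a minimal sketch of that kind of bet-size abstraction. The bucket list is invented, and real bots use larger, more carefully chosen sets, but the rounding behaviour is the same in spirit:

```python
# Hypothetical pot-fraction buckets the bot "knows about"; 0.0 means check.
SIZE_BUCKETS = [0.0, 0.3, 0.5, 0.75, 1.0, 2.0]

def nearest_bucket(bet, pot):
    """Round an arbitrary bet to the closest abstracted size.
    A 29% and a 31% pot bet both land in the 0.3 bucket; a tiny
    bet lands in 0.0 and gets treated exactly like a check."""
    fraction = bet / pot
    return min(SIZE_BUCKETS, key=lambda b: abs(b - fraction))

nearest_bucket(29, 100)  # -> 0.3, same as a 31% pot bet
nearest_bucket(2, 100)   # -> 0.0, i.e. interpreted as a check
```

Once my teeny-tiny bet rounded to the 0.0 bucket, the bot’s “call” with Eight High was, from its own point of view, just checking behind.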
Now, Slumbot is primarily designed to take on other bots, so little effort has been made to disguise or correct for this weakness. It’s probably safe to assume Libratus won’t have behaviours as exploitable as calling sufficiently small river bets with everything down to the nut low. But it is likely that even the strongest bots exhibit some discontinuities in their responses to slightly different bet sizes, and that this is one angle of attack that a very astute human could seek to employ.
Maybe you really are just a lucky fish
It’s a poker cliché that the more thoroughly you dominate an opponent, the more of your victory he’ll chalk up to luck. We’ve all had the experience of being called a lucky fish by someone who doesn’t understand why we played a hand the way we did, or why it worked out for us.
One humbling aspect of Slumbot is that it provides you not just with your actual profits, but what it calls your “baseline” profits. This stat is the difference between your actual profits and those the bot would have made, playing itself, with exactly the same hole cards and board runouts you experienced.
Perhaps you’ve beaten Slumbot for 30 BB/100 over some sample, but your notion that you’ve played well may be shattered when you discover that your baseline profits are -40 BB/100; that is, if you were actually playing as well as it, you should have profited by 70 BB/100 with the luck you’ve had, and once variance evens out, it expects you’ll be losing to it by a significant margin.
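The arithmetic there is simple, but worth making explicit (the numbers are the illustrative ones from the example above):

```python
def bot_result_with_your_cards(actual_bb100, baseline_bb100):
    """Baseline = your actual win rate minus what the bot would have won
    playing itself on identical hole cards and runouts; so a negative
    baseline means the bot would have done better with your luck."""
    return actual_bb100 - baseline_bb100

# You beat Slumbot for 30 BB/100, but your baseline is -40 BB/100:
bot_result_with_your_cards(30, -40)  # -> 70, the bot's result with your cards
```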
It makes you wonder how many of your past opponents were actually correct in calling you an idiot luckbox donkeyfish who’d be busto if the deck didn’t keep hitting you in the face.
When your best teacher is your opponent
One important thing to realize about these algorithms is that the way they build their mixed strategies is by playing millions upon millions of hands against themselves. What this means is that ultimately, the strategy the bot is playing is one which it has not itself found a way to exploit. That, in turn, means that when you’re not sure what to do, you can try to ask yourself what the bot would do; emulating its strategies may not give you an edge, but if you do it successfully, you should at least not be losing.
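As a toy illustration of the self-play idea, here is regret matching — the core update rule inside CFR — applied to rock-paper-scissors rather than poker. The game is small enough that no game-tree machinery is needed; everything beyond the rules of RPS is my own sketch:

```python
import random

ACTIONS = ["rock", "paper", "scissors"]

def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1 if (mine, theirs) in wins else -1

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

random.seed(7)
regrets = [[0.0] * 3, [0.0] * 3]       # one regret table per player
strategy_sum = [[0.0] * 3, [0.0] * 3]  # running sum -> average strategy

for _ in range(50000):
    strats = [strategy_from_regrets(r) for r in regrets]
    picks = [random.choices(range(3), weights=s)[0] for s in strats]
    for me, opp in ((0, 1), (1, 0)):
        got = payoff(ACTIONS[picks[me]], ACTIONS[picks[opp]])
        for a in range(3):
            # Regret = how much better action a would have done than what we got.
            regrets[me][a] += payoff(ACTIONS[a], ACTIONS[picks[opp]]) - got
            strategy_sum[me][a] += strats[me][a]

average = [s / sum(strategy_sum[0]) for s in strategy_sum[0]]
# The average strategy approaches the unexploitable mix [1/3, 1/3, 1/3].
```

Real poker bots do essentially this at the scale of an abstracted game tree with billions of decision points; the point is only that the strategy that falls out is one that self-play could not find a way to exploit.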
This is quite different from playing against humans, whose strategies are not typically balanced against themselves; it’s entirely possible for a human to be bluffing a given river too much, yet also not calling enough when the situation is reversed. Thus, doing what you think your opponent would do may turn out to be either a great idea or a terrible one.
Ego is a huge factor in human play
One of the first things you’ll notice playing Slumbot is that it bluffs less often than you’d expect. Playing against it a bit more, you’ll realize this isn’t quite true. It actually bluffs a fair bit, but it bluffs early streets considerably more than most humans, later streets considerably less, and quite often fires only a single barrel.
As an extreme example, something Slumbot will sometimes do, which you’d be unlikely to see from a human player, is to make a 4-bet bluff preflop and, when called, proceed to just give up on a lot of textures without making even a single postflop attempt to buy the pot. Even in a sample of just a few thousand hands, there have been several instances where a 3- or 4-bet pot has been checked down postflop and my Ace High or small pocket pair has been good, and many more where it has simply folded the turn to a modest-sized bet after checking the flop.
It makes perfect sense, if its play is correct (or close to correct), that we humans would err on the side of being persistent in our bluffs. Getting a bluff called is embarrassing, especially if we end up having to show it down, so it’s a natural impulse when we get called on an early street to keep blasting away until we either push our opponent off his hand, or pick up a hand we won’t be ashamed to turn over at showdown.
Similarly, we feel like wimps if we have no showdown equity, yet fail to take at least one shot at the pot, but we’d probably save ourselves a lot of chips just check-mucking our Eight High some of the time when our opponent’s range is loaded with bluff catchers, something Slumbot has no issue doing. Overall, I get the sense that Slumbot is willing to give up a lot of small pots, but plays large pots well and is good at choosing when to do so… certainly an area of the game in which most of us could stand to improve.
Tilt is even less rational than you think
On a related note, you’d think that playing against a silent, emotionless opponent, with no actual money on the line, would be a less tilting experience than poker in its normal context. For some people that may be the case, but not for me. Part of the reason I’m currently in a hole against Slumbot is that frustration got the better of me once luck started to go its way: I called off in spots where I hadn’t seen it bluff much, just because I’d convinced myself it couldn’t have the goods every time. This is terrible logic against a human opponent, and even worse against a computer.
Your opponent’s emotional state may be an illusion
Meanwhile, even though I know it’s impossible, there were moments where I could swear the bot was tilting. For instance, there were a couple of times that I’d float with a weak draw, hit, win a big pot, and see Slumbot open for 5x the next hand. Against a human opponent, the natural assumption is that the large opening size is because he’s upset about losing the last pot and wants to bet big so that you don’t “suck out” again.
Slumbot, of course, is opening to 5x because the strategy it has computed says it should open to 5x with that hand some percentage of the time, and its random number generator has decided that this will be one of those times. The perception that it is reacting to things which happened in previous hands is pure pareidolia. As with the “lucky fish” phenomenon, it makes me wonder how often I’ve been right in making assumptions about my opponents’ feelings, and how often I’ve been mistaken in forming connections between previous hands and an opponent simply mixing up their strategy or getting an unusual run of cards.
Humans are too conservative with bet sizing
Beginners are often advised to be fairly consistent with their bet sizing to avoid giving away too much about their hands. More advanced players begin adjusting their sizing to considerations other than their holdings, such as to the board texture or to leave an appropriately-sized river all-in.
The reality is that you can have multiple bet sizings for a given situation which are dependent on your holdings, so long as each range is balanced independently. That is, you can bet big with your big holdings and smaller with your marginal hands, as long as your bluffing frequency in each case is such that your opponent has a hard time choosing between calling and folding, and that you’re not folding too much to a raise in either case either.
This is hard for humans, because desire creeps in; we bet a lot because we want a fold, or out of greed. We bet a little because we want a call, or don’t want to risk too much if we’re bluffing. If we are splitting our range into multiple sizes, however, we want to do it such that our opponent will call more liberally against the smaller sizing but be wrong more often, and more conservatively against the larger sizing but catch us bluffing a bit more often when he does.
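The standard indifference calculation behind that trade-off — plain textbook game theory, not anything specific to Slumbot — says that against a polarized betting range, the fraction of bluffs that makes a bluff-catcher indifferent between calling and folding equals the pot odds the bet lays:

```python
def indifferent_bluff_fraction(bet, pot):
    """Fraction of a polarized betting range that can be bluffs while
    keeping a pure bluff-catcher indifferent: the caller risks `bet`
    to win a pot that will total `pot + 2*bet`, so needs to be good
    bet/(pot + 2*bet) of the time -- which must therefore be the
    bettor's bluffing frequency at equilibrium."""
    return bet / (pot + 2 * bet)

indifferent_bluff_fraction(30, 100)   # small bet: ~19% bluffs
indifferent_bluff_fraction(100, 100)  # pot-sized bet: ~33% bluffs
indifferent_bluff_fraction(200, 100)  # overbet: 40% bluffs
```

The larger sizing carries proportionally more bluffs, which is exactly why it should be called more conservatively yet catch a bluff more often when it is called.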
This is difficult for humans, because it requires a great deal of understanding of how the various portions of our range stack up against various portions of our opponent’s. It’s much easier for an AI like Slumbot, which takes a more brute-force approach. Slumbot may not be “thinking” about ranges in any true sense, just using lines which have empirically proven themselves profitable; even so, observing what it does and asking why such variable bet sizing could be a good idea is a great opportunity to add some sophistication to one’s own game.
Alex Weldon (@benefactumgames) is a freelance writer, game designer and semipro poker player from Montreal, Quebec, Canada.