What Does AlphaGo Mean for Poker?
A few weeks ago, Google announced that its Go-playing artificial intelligence (AI) AlphaGo had defeated professional player and reigning, three-time European Champion Fan Hui, and by a margin of 5-0 no less. Go has long been held up as an example of a game which is “hard for computers,” so seeing a very strong human player fall to an algorithm is a cause for celebration, consternation, or both, depending on who you ask. In particular, poker players worried about the future of the game in face of AI encroachment might well wonder what AlphaGo means for poker.
What is Go?
Go is the English name for one of the world’s oldest board games, originating in China where it is known as weiqi, and equally popular in Japan (as igo) and Korea (as baduk). Players take turns playing stones onto a grid, one player taking Black and the other White. The objective is to completely surround sections of the board with one’s own stones; enemy stones completely enclosed are captured, and empty areas at the end of the game score points for the surrounding player.
The rules are extremely simple – far more so than other abstract strategy games like chess – yet their implications are complex and difficult to wrap one’s head around. For computers, the difficulty stems from the size of the game space. Whereas chess has a fixed initial setup, a Go board begins empty; chess cannot therefore be played on anything other than an 8-by-8 board without changing the fundamental nature of the game, whereas a Go board can be scaled arbitrarily. In modern times, Go is typically played on a 19-by-19 board, but 17-by-17 was once the default, and the idea has been floated of moving to 21-by-21 at some point in the future if the game ever starts to feel played out at the pro level.
The large board means that there are many more possible moves to be made at most junctures than in something like chess. This, in turn, has exponential implications on the complexity of solving the game by brute force. With two possible moves to be made, brute force examination to four moves depth means looking at 16 positions; with ten moves available at each juncture, you’d have 10,000 positions to look at. Meanwhile, most Go positions allow for hundreds of possible moves. It’s this unfeasibility of brute force solutions that makes Go an interesting challenge for computers.
No-Limit Hold’em, another hard game
There’s a parallel to be made with poker here. The University of Alberta was able to “essentially weakly solve” Limit Heads-Up Hold’em with its Cepheus AI, while Carnegie-Melon’s Claudico, although a strong player, can still be defeated by humans at No-Limit Hold’em, and is still extremely far from theoretical unexploitability.
To understand why that is, you have to look at the number of moves available at any juncture. In Limit games, this is never more than three: check or bet when the action is unopened; fold, call or raise when facing a bet. In No-Limit games, however, the bet and raise moves are subdivided into different bet amounts, from the minimum up to an all-in. Humans simplify the game for themselves by having a few standard bet sizes (either in terms of big blinds or percentage of the pot), but a truly perfect AI would need to consider every legal bet size as a separate possible move.
Furthermore, because poker is a game of imperfect information, the previous streets’ action is as much a part of the game state as the pot size and community cards. When arbitrary bet sizes are allowed, then, the game space explodes, making No-Limit Hold’em much “bigger” a game than Go, and much, much bigger than something like chess.
Perfect vs. imperfect information: differing approaches
The comparison isn’t completely accurate, however, because there’s a fundamental difference between Go and poker, in that Go features neither chance nor hidden information. Poker, on the other hand, contains elements of both, in the form of a random deal and uncertainty about the opponent’s hole cards.
This makes a huge difference in terms of how the two games are approached as problems. In games of perfect information, like Go, the perfect strategies are “pure,” meaning that there is one correct move for every possible situation; your opponent has access to the same facts as you to begin with, so there’s no loss in playing predictably. In games of imperfect information, like poker, perfect strategies are typically “mixed,” meaning that the player will often be choosing probabilistically between several possible moves; for instance, the ideal strategy in a given situation might be to fold 30% of the time and raise all-in the remaining 70%. A certain amount of unpredictability is necessary in order to avoid giving the opponent an informational advantage.
In terms of analysis by humans, imperfect information games are usually approached with traditional game theory, which originated as a branch of economics. Perfect information games, on the other hand, are more easily attacked with combinatorial game theory, which is a rather different beast. It’s a branch of mathematics involving something called “surreal numbers,” and is as confusing to us non-mathematicians as it sounds. It’s powerful, but applies only to fully deterministic games with no randomness or uncertainty.
Likewise, the field of AI research for games has always been split into those more interested in attacking Go-like games, and those more interesting in poker-like games. Each of those camps has typically used its own techniques, such as various kinds of tree search for perfect information games, and minimax or regret minimization approaches for games with elements of chance or hidden information. If you’re not an AI researcher yourself, it’s not necessary to understand what these terms mean, only that they’re quite different, and what works for one category of game doesn’t typically apply to the other.
Neural networks and machine learning
Meanwhile, there’s another, more general approach to AI, which was postulated in the early days of computing and which has drifted in and out of fashion ever since: the artificial neural network. The idea here is to take the “intelligence” part of “artificial intelligence” more literally; rather than developing specialized algorithms to tackle specific problems, research into neural networks has focused on trying to emulate the low-level operation of the human brain, in the hopes that it will one day be possible to train such a program to perform any given task.
Once again, the details aren’t particularly important unless it’s your field of interest, but there are couple of critical things to understand. First, that the algorithm doesn’t “know” how to do anything initially, but can modify itself in certain limited ways. It takes input in some convenient format, and initially produces random output. It’s then fed huge volumes of input data (for instance, Go game records or poker hand histories) along with examples of the desired output (what the pro player did, who won the game, etc.). It then compares its own output to the target output and tweaks its internal parameters to try to bring the two closer together. Over many, many iterations, its output begins to match the desired solutions more and more closely, much as a child gradually figures things out through trial and error, supported by feedback from his parents and teachers.
The second, somewhat worrisome fact about these learning algorithms is that once they’re trained, not even their creators really know how they work. They understand the learning process itself, but the ultimate decision-making involves the whole network in a holistic way. Examining the low-level code tells you as little about a neural network’s “logic” as a single neuron tells you about the human whose brain it belongs to. This is one of several things that has hindered the progress of neural networks until recently; when they don’t perform as expected, it’s nigh impossible to tell what went wrong.
The combined approach
Aside from the difficulties involved in debugging them, the big weakness of neural networks is the general principle that breadth tends to come at the expense of depth and vice versa. A versatile solution is rarely the most powerful one, so although neural networks can be applied to essentially any challenge, most specific problems are better tackled by an algorithm handcrafted for that specific purpose.
While a specialized algorithm should perform better than a neural network for any given problem, writing such an algorithm requires the programmers knowing, in theoretical terms, how to solve the problem; the computer can provide the muscle to crunch the numbers, but must be told which numbers to crunch. When it comes to matters of human intuition, however, we’re mostly in the dark about the functioning of our own minds: How does a professional Go player evaluate a board position to tell who’s “winning” when she can’t read the game out to the end? General principles can be provided, but no one knows the entirety of what goes on in the professional’s mind, beyond saying that it’s a matter of “experience.”
What makes AlphaGo so incredibly strong is that it takes a hybrid approach. At its core is a type of tree search algorithm which brute forces its way through possible sequences of moves, in the manner of a classical AI for perfect-information games. But whereas previous AIs have either spent equal time considering every possible move, or relied on explicit, human-coded heuristics to tell them where to look, AlphaGo has two neural networks to help it out. One of these suggests likely moves based on general policies it has learned, and the other guesses at who’s more likely to win the game from a given position by “eyeballing” the board in the same way an experienced human player would. In combination, these guide it through the game tree and ensure it spends more processor power reading more deeply into the most promising branches.
The threat to poker
If neural networks can be applied to any problem, and AlphaGo has demonstrated that they can be combined effectively with more specialized algorithms, then there’s no reason not to expect that we’ll see “neural hybrid” poker AIs in future. The prospect of a bot which combines Claudico’s approximations to game theoretically optimal (GTO) play with a more human-like situational intuition should worry anyone banking on the future of online poker.
For one thing, most efforts at poker AI have so far focused on unexploitable play, treating each hand independently and not attempting to adjust to opponent tendencies. A neural hybrid poker AI could be trained on whole matches rather than individual hand histories, in order to produce a computer player which attempts to trick, adjust to and out-level its weaker opponents while reverting to more balanced strategies against stronger ones, much like a top human player.
Aside from the obvious fact that such a bot would almost certainly be more profitable at most online stakes than a conventional GTO-style bot, there’s an additional danger in that it would be extremely difficult to detect. One big weakness of most bots, from a detection standpoint, is that they never adjust, never get tired or distracted, and never tilt; sites can therefore attempt to detect bots by performing statistical analysis on player tendencies and look for anyone who seems just a little too consistent. A bot which adjusts to its opponents would be all that much harder to spot.
Even now, the signs are there that sites are struggling to detect bots. Last year, a Russian Pot-Limit Omaha bot ring went undetected by PokerStars until it was pointed out by a player who decided to do some statistical sleuthing of his own. Now, the site has reportedly been asking certain players to record video of themselves playing in order to prove that they’re not using computer assistance, suggesting that even when there is suspicion, the security teams are finding it hard to be sure.
I point this out not to pick on PokerStars, but just to say that as the world’s biggest site, one would expect them to have the best security personnel; if they’re struggling, one can only imagine how tough the situation must be for other sites. If, in future, everyone also has to try to hit a moving target in the form of a gear-changing neural hybrid, then the hunt seems nearly hopeless.
AlphaGo vs. Lee Sedol
The one ray of sunshine in all of this is that AlphaGo’s first professional opponent, Fan Hui, is not quite as strong as a non-Go player might expect. “Three-time European champion” sounds impressive, but Go is not widely played outside of Asia, and all the top players stay within the big three countries: Japan, China and South Korea. Defeating Fan Hui, then, is equivalent to beating, say, the Finnish national basketball team. It’s an impressive feat to be sure, but it doesn’t necessarily mean that the team in question is going to be competitive in the NBA.
Next month, AlphaGo will face its true test, another five game series, this time with the world’s reigning champion, the Korean player Lee Sedol. With a $1 million prize going to the winner, you can be sure that Sedol will be motivated. As you’d expect, the computing world is optimistic about AlphaGo’s chances, but Go players are inclined to think Sedol can hold out for another year or two at least.
Unfortunately, it’s a little bit hard to assess AlphaGo’s true strength, because it doesn’t try to crush its opponents, but rather maximize its odds of winning. At times, against Fan Hui, it appeared to be too conservative, yet it swept the series; it’s likely, then, that Fan Hui did not play well enough to bring out its best. That’s enough to give me pause, but I’m taking Sedol for now nonetheless. One thing’s for sure, either way: I’ll be watching the results with interest and concern, and if you’re among those worrying about the future of poker, so should you.
Alex Weldon (@benefactumgames) is a freelance writer, game designer and semipro poker player from Montreal, Quebec, Canada.