Upcoming Brains vs. AI Rematch More Than Just a Game for AI Developers
In April 2015, a four-man team of the world’s top poker professionals, Doug Polk, Bjorn Li, Dong Kim, and Jason Les, took on Claudico, then the most powerful Heads-Up No-Limit Hold’em bot, in the inaugural Brains vs. AI competition. The highly publicized 80,000-hand match was held at Rivers Casino in Pittsburgh, PA, and was open to the public. With play-money stakes of $50/100 and just over $170 million wagered over the course of the competition, the human players eventually bested the machine by $732,130, an aggregate winrate of 9.15bb/100.
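The winrate figure follows directly from the numbers above. As a quick check, the short Python sketch below converts the humans’ $732,130 profit into big blinds at the $100 big blind and scales the result to 100 hands; `bb_per_100` is just an illustrative helper name, not anything used by the organizers.

```python
def bb_per_100(winnings_usd: float, big_blind_usd: float, hands: int) -> float:
    """Convert total winnings into big blinds won per 100 hands."""
    big_blinds_won = winnings_usd / big_blind_usd  # profit expressed in big blinds
    return big_blinds_won / hands * 100            # per-hand rate scaled to 100 hands


# Figures from the 2015 match: $732,130 won at $50/$100 over 80,000 hands.
rate = bb_per_100(732_130, 100, 80_000)
print(f"{rate:.2f} bb/100")
```

That works out to about 7,321 big blinds over 80,000 hands, or roughly 9.15 big blinds per 100 hands.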
While the poker community may have seen a decisive victory for humanity in a HU4ROLLZ pissing contest against Skynet Beta, the results did not hold up to scrutiny as well as they appeared to. Although the human players won collectively in dollar terms, the final result was declared inconclusive because it fell short of the threshold scientists require for statistical significance.
Variance weighs heavily in determining a true winner, and what looked like a large winrate over an equally large sample size proved inadequate. Approximately 5% of the time, a win by that margin could have occurred by chance alone. PTP’s own Alex Weldon provided a technical breakdown of the 2015 results that can be viewed here for a deeper understanding.
Starting Jan. 11th, a $200k rematch, this time called “Brains vs. Artificial Intelligence: Upping the Ante,” will take place at Rivers Casino. Once again, four human challengers will go head-to-head against the world’s most powerful poker AI, but this time around there will be a few fresh faces. On the human side, Jason Les and Dong Kim will be returning, while Polk and Li will be replaced by Daniel McAulay and Jimmy Chou. Claudico will step down in favor of Libratus, the latest effort by Carnegie Mellon University (CMU) computer science professor Tuomas Sandholm and PhD student Noam Brown, the programmers who spearheaded the development of its predecessor.
The setup of round two will mimic that of the first bout, with the players splitting into pairs and grinding it out individually against the computer over the course of 20 days. One player in each pair will broadcast his play on the casino floor, while his partner plays in isolation with the mirror image of the same randomly dealt cards: he is dealt the hands the AI held against his partner, and vice versa. This format, known as “duplicate poker,” was chosen to reduce some of the luck that inherently affects the overall results. Stacks will reset to 200bb after each hand, and an additional 40,000 hands will be played this time around, for a total of 120,000.
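To see why mirroring the cards helps, consider a deliberately simplified toy model (an assumption for illustration, not how the organizers actually score the match): each hand’s result in big blinds is the player’s skill edge, plus a card-luck term that flips sign when the cards are swapped, plus some independent noise because the two rooms may play the same cards differently. The function name and all parameter values below are hypothetical.

```python
import random
import statistics


def simulate_deals(num_deals: int, skill_edge: float, luck_sd: float,
                   divergence_sd: float, seed: int = 0):
    """Hypothetical toy model comparing single-room and duplicate scoring.

    Each hand's result (in big blinds) = skill edge + card luck + divergence noise.
    In the mirrored room the human holds the AI's cards, so the card-luck term
    flips sign; the divergence noise (different decisions in the two rooms)
    is drawn independently and does not cancel.
    """
    rng = random.Random(seed)
    single, duplicate = [], []
    for _ in range(num_deals):
        luck = rng.gauss(0.0, luck_sd)
        room_a = skill_edge + luck + rng.gauss(0.0, divergence_sd)
        room_b = skill_edge - luck + rng.gauss(0.0, divergence_sd)
        single.append(room_a)                    # one room scored on its own
        duplicate.append((room_a + room_b) / 2)  # mirrored pair, averaged
    return single, duplicate


single, duplicate = simulate_deals(10_000, skill_edge=0.1, luck_sd=10.0,
                                   divergence_sd=3.0)
print("single-room spread:", round(statistics.pstdev(single), 2))
print("duplicate spread:  ", round(statistics.pstdev(duplicate), 2))
```

Under these assumptions, the paired results show a much smaller spread than single-room results, because the card-luck term cancels and only the divergence noise remains. In real play the cancellation is only partial, which is why duplicate poker reduces the luck factor rather than eliminating it.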
Is It Accurate to Call It a Rematch?
Calling this year’s follow-up competition a rematch may be a bit misleading. While this year’s flesh-and-blood contestants are presumably pitting themselves against Libratus in an attempt to book a win for a share of the prize pool, the AI developers aren’t in it for the gamble. NLH is something of a gold standard in AI development, and a win for the developers would be more akin to a major achievement for all of mankind than to a traditional prop bet.
Perhaps even more alien to poker fans is the idea that beating the world’s best heads-up NLH players has nothing to do with bragging rights, or at least not the kind of bragging rights the average poker enthusiast might recognize. As Brown describes it:
“I am very interested in the theory of imperfect-information game solving, and see this competition as a way to test some of the latest techniques that have emerged from our research. None of our methods are specific to poker, and can be applied to any strategic interaction where there is hidden information. Some examples are auctions, negotiations, and cyber-security. I also think beating humans at no-limit Texas Hold’em would be a major milestone in AI. After AlphaGo’s victory last year, no-limit Texas Hold’em is one of the few ‘benchmark’ games in which computers haven’t been able to beat humans. It would certainly be nice to claim that prize.”
Poker Isn’t Always Gambling
Some onlookers seem to believe the competition is an actual money wager between mankind’s finest and the world’s most advanced chip-slinging poker bot. None of the human players has to put up his own money to play, and each is getting a cool $10,000 just for showing up. Beyond that, how well they perform relative to one another will determine how the remaining $160,000 of the purse is divided, and the participants collect the entire bounty whether or not they get the best of Libratus during play.
While the formula for this split is somewhat complicated, in order for, say, Dong Kim to earn more than his appearance fee, he has to not only wrestle chips away from the AI but also outscore his organic brethren. In the event that all of the contestants lose chips to the AI, the player who loses the least takes home the additional cash.
Having the players compete among themselves ensures that each brings his A-game at all times, and it denies the pros the option of “colluding” against Libratus while sitting on a massive lead going into the home stretch. Such a scenario would be possible if the human players entered the final day of play with a large collective lead and agreed to soft-play the program by playing only strong hands, or simply folding their way to victory. Likewise, they won’t be able to go for broke by forcing large pots in an attempt to hit a target winrate if they find themselves falling behind.
Implications of the Match Go Well Beyond Poker
Poker players may have a hard time wrapping their heads around the idea, but to the programmers, racking up points in a game traditionally played for real money is not the end goal. As with past efforts in checkers, chess, Jeopardy!, LHE, and, most recently, the board game Go, a decisive win against the best players in a game as complex as NLH, one that involves hidden information, would be a win not just for the CMU research team but for the entire field of artificial intelligence. Says Sandholm:
“Going against the top humans in this particular game is certainly a pinnacle. Heads-Up No-Limit Texas Hold’em is the only game on which there has been decades of AI research but where the best AI has not surpassed the best humans — unlike in checkers, chess, Heads-Up Limit Texas Hold’em, and Go, where AI now reigns supreme. So, Heads-Up No-Limit Texas Hold’em is, in this sense, the final frontier within the foreseeable horizon.”
Both Sandholm and Brown actually strike me as rather ambivalent about the prospect of their bot being beaten at the conclusion of the rematch. The concept of “winning” barely seems to register with them, yet their enthusiasm for the underlying research clearly shone through in our correspondence. In that regard, it may help readers to think of the participating players less as young men in an elite competition and more as the ultimate human test subjects for state-of-the-art algorithms.
A New Level of Competition
Even if one considers the four pros from the first competition the victors, this year’s top brass won’t be facing Claudico 2.0 with the benefit of hindsight. Libratus is an entirely new program built with new methods, so its strategy is completely distinct from its predecessor’s, and tactics that worked 20 months ago may be obsolete, if not completely useless.
When I asked whether specific areas such as 3-bet frequencies or bet sizing had been tweaked, Brown told me that no manual improvements were made: Libratus was given only the rules of the game. The AI reached its own conclusions about which tactics are appropriate by playing trillions of hands against itself and gradually improving over time. Its skill has risen to the point where Brown himself no longer feels able to question it: “Libratus is so much better than us at this point that I trust its decisions in poker much more than my own anyways.”
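The article doesn’t detail Libratus’s internals, so the sketch below is only a stand-in for the general idea of a program improving by playing against itself, not Libratus’s actual algorithm. It uses regret matching, a simple learning rule from the regret-minimization family of techniques widely used in poker AI research, applied to rock-paper-scissors rather than poker. The starting bias and iteration count are arbitrary, hypothetical choices.

```python
# Rock-paper-scissors payoffs for the acting player: PAYOFF[mine][theirs].
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]
N_ACTIONS = 3


def regret_matching(regrets: list[float]) -> list[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N_ACTIONS] * N_ACTIONS  # no positive regret: play uniformly


def self_play(iterations: int) -> list[list[float]]:
    """Two regret-matching players repeatedly play each other.

    Returns each player's average strategy over all iterations.
    """
    # Hypothetical starting bias so the dynamics are not trivially uniform:
    # player 0 begins by favoring rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for p in range(2):
            opponent = strategies[1 - p]
            # Expected payoff of each pure action against the opponent's current mix.
            values = [sum(PAYOFF[a][b] * opponent[b] for b in range(N_ACTIONS))
                      for a in range(N_ACTIONS)]
            current = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(N_ACTIONS):
                # Regret: how much better action a would have done than the current mix.
                regrets[p][a] += values[a] - current
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in row] for row in strategy_sums]


average = self_play(100_000)
for player, strategy in enumerate(average):
    print(f"player {player}:", [round(x, 3) for x in strategy])
```

Neither player is told what the “right” strategy is. Each simply shifts toward whatever it regrets not having played, and over many iterations their average strategies settle near one-third rock, one-third paper, and one-third scissors, the game’s equilibrium. Scaling this kind of idea to no-limit hold’em is vastly harder than this toy suggests.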
Weighing the Odds of Winning
When I asked if he could estimate the odds of a decisive win this time around, Sandholm declined to speculate, though perhaps ominously.
“It’s very hard to say. Tartanian 8 is much stronger than Claudico. Libratus is much stronger than Tartanian 8.” He then humbly added, “Also, the top humans are stronger today than they were a year and eight months ago.”
Despite Sandholm’s refusal to make a prediction, it is difficult to imagine professional players running on biological wetware improving at a rate that matches the technological wizardry coming out of the CMU research lab. The Tartanian 8 he mentions was Claudico’s immediate successor, and while it never faced human competition, it bested the field in the 2016 Annual Computer Poker Competition.
The rules of that competition capped the available computing power, so Tartanian 8 had to play as a scaled-down version of itself called “Baby Tartanian 8.” With the more powerful Libratus presumably operating without any artificially enforced restraint, a victory is not only conceivable; a rout may even be possible.
Is This the Beginning of the End for Libratus?
The AI developers currently have no plans to make this a regular event. While future iterations of Libratus could demand further rematches in the event of a loss or yet another tie, if the AI clearly emerges victorious, future generations of NL’s top crop shouldn’t expect any more freerolls.
“In that scenario, I doubt that I would put in the effort to raise this kind of sponsorship and pull together this kind of event again. It takes a huge amount of effort and time,” Sandholm says.
Regardless of the outcome, Sandholm and Brown plan to keep improving the algorithms that underpin Libratus, even if that means Libratus itself faces an early retirement from poker. Brown will run the aforementioned Annual Computer Poker Competition for the next two years but will shift his focus from engineering to developing the theory and algorithms beneath the AI, while Sandholm plans to concentrate on applying the research that came out of the project.