Applied Deep Learning Researcher
(Read my followup story for an update on the match. This is a good place to start, and my predictions are looking pretty good, 2/3 of the way through.)
I got to Pittsburgh late on Wednesday to watch the first day of the Brains vs AI poker challenge at the Pittsburgh Rivers Casino. The games started late, and I’d assumed the players would play past the 7pm schedule. The scoreboard tells us why they ended on time instead:
There will be 120,000 hands played over two weeks — to avoid last year’s controversy where humans claimed victory, while the CMU poker group noted the small sample and thus a statistical tie. 120,000 hands is a lot of poker.
Four professional heads-up poker specialists will play duplicate poker against the AI (two pairs each get the same cards, with human and computer switching roles, and no communication between the players other than on breaks). After 700 hands by each player, and 2,800 overall, the computer was up a whopping 70,000 chips. This amounts to $25 per hand, or a fourth of a big blind at 50/100 blinds and 20,000 stacks. This would be a very solid victory for the AI, if maintained over the whole match.
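For anyone who wants to check the pace figure, here is the arithmetic as a quick sketch. The inputs are the numbers quoted above; the variable names are mine.

```python
# Figures quoted in the post for the first day of play.
chips_won_by_ai = 70_000
hands_per_player = 700
num_players = 4
big_blind = 100  # blinds are 50/100

total_hands = hands_per_player * num_players
chips_per_hand = chips_won_by_ai / total_hands
big_blinds_per_hand = chips_per_hand / big_blind

print(f"{total_hands} hands, {chips_per_hand:.0f} chips/hand, "
      f"{big_blinds_per_hand:.2f} big blinds/hand")
```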
It’s not hard to see why the players would opt to end on time, get some rest, and discuss their strategy for the days ahead. The match will take place every day over the next two weeks, at Pittsburgh’s downtown Rivers Casino.
Meanwhile online, CMU poker research group graduate Sam Ganzfried was impressed by the computer’s play on the first day.
I could not see a replay of yesterday’s games, so it’s hard to get specifics. But Sam seems to notice that the AI fixed two of last year’s major weaknesses:
Let me explain what that means. Carnegie Mellon’s Libratus poker AI solves the heads-up poker game for an unexploitable equilibrium solution (as do previous CMU poker AIs, Tartanian and Claudico).
The entire poker game state is way too big, unlike a smaller game such as checkers, or even heads-up Limit Holdem. Therefore the AI solves a simpler game, in which some game situations are grouped together into single states. Specifically, it has a “card abstraction” — similar card states are treated as exactly the same situation. And it has some “bet abstractions” — bets are rounded to the nearest bet size within a limited choice. The AI considers betting “pot” or “2x pot” but not 1.75x pot, for example. The 1.75x pot bet would likely be treated as a bigger bet after rounding, so the odds might not be modeled quite correctly.
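To make the rounding problem concrete, here is a minimal sketch of a bet abstraction. The three pot fractions and the nearest-size rule are my own assumptions for illustration; the actual abstraction and mapping used by the CMU bots are not described here.

```python
def nearest_abstract_bet(bet, pot, pot_fractions=(0.5, 1.0, 2.0)):
    """Round a real bet to the closest size in a small, hypothetical abstraction."""
    ratio = bet / pot
    closest = min(pot_fractions, key=lambda f: abs(f - ratio))
    return closest * pot


def call_equity_needed(bet, pot):
    """Break-even equity for calling `bet` into a pot of `pot` (before the bet)."""
    return bet / (pot + 2 * bet)


pot = 1000
real_bet = 1750  # the "1.75x pot" bet from the text
modeled_bet = nearest_abstract_bet(real_bet, pot)

print(modeled_bet)                        # the bet as the abstraction sees it
print(call_equity_needed(real_bet, pot))  # odds the caller actually faces
print(call_equity_needed(modeled_bet, pot))  # odds the abstraction assumes
```

With these assumed sizes, the 1.75x pot bet is treated as a 2x pot bet, so the break-even calling equity is modeled as 40% when it is really just under 39%. That is a small gap on one decision, but it compounds as the pot grows over several streets.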
The card abstractions are generally good, but they could start to fail in important but rare cases, like a player bluffing based on “blockers” in his hand: cards that his opponent cannot hold, which makes the opponent less likely to have a straight or a flush, and more likely to have a strong but foldable hand like one pair. Sam explains this in a detailed analysis of last year’s Brains vs AI match, linked in the tweet above.
The bet abstraction could be an even bigger problem. Once a human starts betting in unusual amounts (for the computer), the whole betting tree becomes difficult to map onto states from the games that the AI played against itself.
The CMU team of graduate student Noam Brown and professor Tuomas Sandholm solved both of these problems with big league amounts of online computation. Apart from common early hand (preflop) situations for which the equilibrium solution is already very solid, later hand decisions take ~15 seconds for the computer, as you could see on the live Twitch Stream.
During those 15 seconds, the AI is dispatching an “endgame solver” job to the Pittsburgh Supercomputer Center, as reported by The Verge and MIT Technology Review. CMU won’t disclose details about the solver until after the match, but this appears to be the big difference between last year’s Claudico and this year’s Libratus player.
An endgame solver can take time to simulate rollouts with the exact cards of the computer, and the exact bet amounts. This is computationally expensive, since the solver has to simulate not only from the current decision to the end of the hand, but also from the beginning… to build up enough situations where an opponent would credibly get to the existing game state. There is no point seeing how your opponent would play all of his hands, if you cannot first determine what hands he might have that arrive at the current state given the previous bets.
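The idea of narrowing down what hands an opponent "gets here with" is just Bayes' rule. Below is a toy sketch; the hand categories and every probability in it are made up for illustration, and this is not a description of what Libratus does (CMU has not disclosed the solver's details).

```python
def update_range(prior, action_likelihood):
    """Bayes' rule: P(hand | action) is proportional to P(hand) * P(action | hand)."""
    unnormalized = {hand: prior[hand] * action_likelihood[hand] for hand in prior}
    total = sum(unnormalized.values())
    return {hand: weight / total for hand, weight in unnormalized.items()}


# Hypothetical starting range, and how often each hand type would raise the flop.
prior = {"premium pair": 0.05, "flush draw": 0.15, "one pair": 0.40, "air": 0.40}
raise_flop = {"premium pair": 0.9, "flush draw": 0.6, "one pair": 0.2, "air": 0.1}

posterior = update_range(prior, raise_flop)
for hand, prob in posterior.items():
    print(f"{hand:>12}: {prior[hand]:.2f} -> {prob:.2f}")
```

After the raise, the strong hands and draws take up a much larger share of the range than they did before it. Any rollout from the current state only makes sense if it is weighted by a range like this one.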
But once these paths are created, it is possible to correctly account for the exact hands that an opponent is “supposed to have” with well-balanced equilibrium play. This includes card replacement effects, since if you hold a King of hearts, your opponent is much less likely to have two kings, and somewhat less likely to hold a hearts flush draw on a flush draw board.
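The card replacement effect is easy to verify by counting combinations. This is a small sketch; the board (7h 2h Qs) is a hypothetical example I picked because it has two hearts and no king.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def count_combos(predicate, dead_cards=()):
    """Count two-card hands matching `predicate` among cards not in `dead_cards`."""
    live = [card for card in DECK if card not in dead_cards]
    return sum(1 for hand in combinations(live, 2) if predicate(hand))


def is_pocket_kings(hand):
    return all(card[0] == "K" for card in hand)


def is_two_hearts(hand):
    return all(card[1] == "h" for card in hand)


board = ("7h", "2h", "Qs")   # hypothetical flop with a heart flush draw
with_king = board + ("Kh",)  # same board, but we hold the king of hearts

print(count_combos(is_pocket_kings, board), count_combos(is_pocket_kings, with_king))
print(count_combos(is_two_hearts, board), count_combos(is_two_hearts, with_king))
```

Holding the king of hearts cuts the opponent's pocket kings from 6 combinations to 3, and the heart flush draws from 55 to 45. That matches the "much less likely" and "somewhat less likely" in the paragraph above.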
Betting-wise, the AI still bets in a few fixed amounts, but these include more bet sizes, to the point that the AI actually varies its bet sizing more than most humans. Faster processors, better algorithms, and a super-computer cluster — advantages that a human can never leverage in his or her lifetime. Man I wish I could upgrade, or at least buy some extra memory and unload the difficult computation in life to an AWS Lambda.
Wrapping up this morning’s 700 hand session (two screens of 350 hands for each player), poker pro Dong Kim was up 30,000 chips. I’d watched him play the latter half of that session, and observed him moving steadily up, despite getting stacked early twice (losing an entire 20,000 chip buy-in in a single hand).
Asked what the big hands were, he told me there was an all-in for stacks that he lost with AK against the AI’s pocket Aces. In another spot, he had Aces himself, but the AI ran him down and made a flush with 42-suited. Besides those hands, Dong thought he ran pretty good, and any good player would be ahead with the cards he was dealt.
Jason Les played the “other side” of those same cards upstairs. Dong was concerned that Jason might lose as much as he’d won, so it was nice to hear that Jason only lost about 15,000 while Dong had won 30,000 chips. Advantage 👨.
We laughed when Jason said he folded the 42-suited preflop, and did not get a chance to chase down the AI’s Aces. Would have been nice.
In the other duplicate match, both players lost. Therefore the players are down more overall, and not doing much better than the -$25/hand pace set on the first day.
With this lead and given the AI’s strong play, beating it this match is probably futile. Humans still have some edges to be sure, but we also make mistakes. A player moves within seconds, and he is not really trying to solve for an optimal strategy. Thus unless the AI has consistent bugs in important weird corner cases, it is hard to see how people could both play solid enough not to lose, and well enough to exploit the AI in the cases where it is possible.
This reminds me of the adage, that once a computer is as good as a human, it is already better.
Meanwhile researchers at the University of Alberta — another of North America’s great artificial intelligence programs, and CMU’s rival in the poker research space — released a draft paper about their DeepStack poker AI, also claiming “expert level artificial intelligence in no limit Hold’em”. The authors won’t discuss their creation while it’s under peer review, but the structure described uses online simulation to solve individual game situations. It also includes a 7-layer deep neural network as the “brain” of the algorithm, connecting hand input states to the simulator.
The DeepStack system (a great name, I’m a bit sore for not taking it) claims great results against professional-level players. It actually outperformed Libratus’s current pace over 40,000 online hands. Even if you notice that the players that DeepStack competed against were not highly vetted or well-compensated for their time, one has every reason to believe that this is also a very strong system.
The neural network + online simulation approach is exactly the same as I would have applied, or any other deep learning researcher would have tried, to create something like AlphaGo for poker. Which is probably why Alberta kept their work secret, and won’t comment on it now until it is officially journal-reviewed and published. The DeepStack paper claims that the AI spends ~3 seconds per move, much less than Libratus, so I wonder if they also did something clever to speed up the online simulation part of the algorithm. Batching something on GPU, perhaps? Or maybe just considering fewer simulation states. It’s hard to know.
It would be great to see the two flagship AIs battle each other sometime this year. That seems unlikely, as the setup for Libratus is pretty involved and rather expensive, and I would expect that DeepStack is similarly difficult to run. Someone would have to put up a substantial prize pool to get both artificial Hellmuths in the same room. Any takers?
Meanwhile both networks can benchmark against professional players as they have been doing, and against previous years’ strong poker AIs. We’ll have some points of reference soon. Will one be significantly further ahead of the other?
What’s next for poker? Surely, playing as well as the best humans heads-up is not where this ends.
It might be, if a new generation of graduate students doesn’t pick up poker. Speaking with the developers of the current poker systems, many are looking forward to other AI systems, and beyond poker for their future work. Meanwhile, with the race to solve heads up Limit Hold’em, and the race to beat humans at heads up No Limit, not much work has gone into building good AIs for multi-handed play.
It would be great to see that change. In conversations with others involved in the Annual Computer Poker Competition, there has been interest in a multi-player match. Would we get enough entries for a full 6-player table? It would be tough, but you gotta start somewhere. Amarillo Slim won the third World Series of Poker by besting just seven other players.
Surely with MIT (and other colleges) hosting poker-AI classes and competitions, we could get multiple entries simply by promoting the tournament and doing more to help people submit their entries to the competition. Some coders submit poker AIs to win. Others simply want to see how well they’d do against the best computer players, and enjoy getting the logs for future analysis, in case they ever want to go all-in on building a badass poker AI.
Equilibrium solving does not lend itself easily to 3+ player hidden-information games. But surely it could make some progress, perhaps going back to smaller card abstractions. Even a hack on top of a heads-up AI might get pretty far, and at least provide a decent starting point to building multi-handed approaches. With a neural network, once you have a decent opponent (equilibrium, neural net, or player hand logs), it might be possible to build something decent, and have it improve with self-play. This is how DeepMind’s AlphaGo got started.
In AI for video games, you’re seeing projects like DeepDrive, which learns to drive a car in the Grand Theft Auto video game (San Andreas edition) directly from the screen pixels, by imitating the computer’s self-driving features, which have access to more information. A similar type of “transfer learning” would be an interesting application for multi-player Hold’em.
Meanwhile for heads-up no limit Hold’em, I’d love to see someone try an adversarial neural network approach. With access to a decent equilibrium opponent, it should be possible to try different strategies, and start picking apart which ones do the best, perhaps in 100,000 hand matches. These matches take weeks against poker pros, but could be run overnight between computers online.
A simplified version might start with a hyper-parameter search. How does an AI fare if it starts bluffing 10% more? What if it starts folding more often in big pots? What if it starts betting in those weird “1.75x pot” amounts? Against an equilibrium approach that is not adjusting to opponent’s patterns, it could be possible to stumble upon a strategy that takes advantage of the holes in the AI, if any exist. Perhaps there are not many weaknesses left to find.
It might be more effective to train an adversarial network to recognize if you are playing against a human, a weak AI, or one of these equilibrium monsters. Professional players seem more than a little bit concerned about being trolled with strong AIs posing as humans, as DeepMind had done on a Chinese Go server recently.
I hung out with the CMU poker group and the pro players on the Twitch stream the rest of the day. The players were impressed by how the computer continues to play. They finished the Thursday afternoon session booking a mini loss, significantly reducing the previous -$21 per hand pace.
A blowout win for the humans seems unlikely.
Only one of the players (Jimmy Chou) came out of the session well ahead, and even he commented on how hard the AI was to read (poker speak for putting a player on a specific range of hands). Last year’s glaring mistakes by the AI appear to be fixed, possibly by the super-computer doing its simulations on the turn and river. The early-hand strategy is also more nuanced than last year, in part creating more turn and river situations, which drove the players a bit nuts once the games started turning into 20 second plus waits consistently on both of their screens. It looked to me like a great number of hands go to showdown, or are folded on the river, thus taking a minute or more of computer thought time.
To a strong but non-professional player, it looked like very good poker. I also had the privilege of watching great heads-up players working their best, under good conditions, and giving away some of their thoughts about the hands live on Twitch. I learned a lot about heads-up strategy, and it’s really easy to see how someone could go deep on the heads up version of this game.
At some point, Dan McAulay, buried in a streak of bad luck hands (he lost and Jimmy won, playing the same cards from the other side), pointed out how “human” the AI’s strategy appeared. Libratus does bluff with some cards that a human would not consider: good bluff spots, but unusual hands to bluff with. But other than that, it did not win by blasting hard at all of the pots to overpower its opponent. Instead, it seemed to play expert level small-ball poker, with an occasional thought-provoking overbet, always putting the players in tough spots.
Given enough practice, I wonder if the AI could learn to seek out those tough spots, and push opponents even harder. For now, it is remarkable how close the Libratus strategy, which was trained in a lab on a supercomputer without any human advice, comes to playing like a solid, well-balanced human poker player.
Maybe this shows how close the humans online were to an equilibrium strategy, in the first place.
I predict that with two days of practice and some time to compare notes, the four pros will hold the AI close to break-even going forward. I also think that their play will start to slip a bit later in the match because of fatigue. I expect the final results to end up somewhere below break-even for the players, but better than the -$21/hand pace that the AI set over the first two sessions of play.
I’ll guess at -$15/hand to finish. This sounds harsh toward the humans, as I think at their best they are better than this. Past AI challenges, most famously Kasparov vs Deep Blue, have shown that human performance suffers over the course of a long match, while the computer never gets tired.
On the Twitch stream, Prof. Sandholm told us that if the players can’t win this week, it is unlikely that they could win under any other conditions. It is true that the casino is a nice place to play, and whatever edge the players could get through study, live heads-up displays or practice, these would probably be minimal at this point. It’s also great to be reminded how much the players want to win, while the CMU poker group is cheering for the coming 🤖 domination!
I do think that a future poker AI could improve upon Libratus, whether it would be through more online solving, or with an adversarial network that finds equilibrium solvers’ remaining weaknesses and exploits them more directly. An AI could certainly do more to exploit the patterns of human play.
It would be hard to come up with a game as deep yet simple as no limit Texas Hold’em. My brain is fried after watching the games and chatting a bit with the players, asking them to explain some of the difficult spots. The big all-in hands are few and far between, but in heads-up poker the strategy in just about every hand can be fascinating.
The match progressed much closer to what I predicted than I could have imagined. With 100k of 120k hands in after two weeks, the players are down nearly $14/hand, very close to what I predicted above.
It would be great to see the graph of player performance, so if anyone saved the daily updates, please tweet at me or post below — I’ll add it to the article. You can see the graph below, through the first 50k hands:
As I predicted, the players cut the AI’s initial $25/hand lead down as they adjusted — these guys are the best — but then they got tired. That is what I predicted, and that appears to be exactly what happened.
A win by this margin would be statistically significant. Doing a back of the envelope calculation, each day of play swings about $100k in chips, over about 5.5k hands. With those assumptions, one standard deviation is $18/hand per day. Treating 18 days as independent samples, two standard deviations for the match work out to ~$8.3/hand. The official numbers are more complicated, and will be released by CMU when the match is over. But I’m probably overshooting the error bars, if anything.
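Here is that back of the envelope calculation spelled out as a rough sketch. It uses the same simplifying assumptions as the paragraph above (a daily swing of about 100k chips treated as one standard deviation, and 18 independent days); the variable names are mine and this is not CMU's official methodology.

```python
import math

# Rough inputs from the paragraph above.
daily_swing_chips = 100_000
hands_per_day = 5_500
days = 18
observed_margin_per_hand = 14  # the players are down nearly $14/hand

# One standard deviation of the per-hand result over a single day.
sd_per_hand_one_day = daily_swing_chips / hands_per_day

# Averaging over independent days shrinks the error by sqrt(days).
sd_of_match_average = sd_per_hand_one_day / math.sqrt(days)
two_sd = 2 * sd_of_match_average

print(f"one SD per day:  ${sd_per_hand_one_day:.1f}/hand")
print(f"two SD, match:   ${two_sd:.1f}/hand")
print(f"margin in SDs:   {observed_margin_per_hand / sd_of_match_average:.1f}")
```

With unrounded inputs the two standard deviation band comes out a little wider than the ~$8.3/hand quoted above, but in the same ballpark, and the $14/hand margin is still comfortably outside it.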
I won’t comment on how the AI has adjusted to the players, or whether it’s learning from previous days’ sessions. I have some ideas how it could have done so, but the CMU team has promised to explain what they did after the match. They deserve the chance to explain their methods when they are ready, having put on a great event, as well as a historically significant performance.
Meanwhile as it’s become more clear that the AI is #winning, the match has gotten increased attention — both in the mainstream media, and especially amongst poker fans and AI fans on Twitter.
Libratus is becoming a Twitter celebrity, personified by an anonymous parody account. In true American style, humble it is not.
Doug Polk had a great interview with the players on YouTube. A common theme was that Libratus showed the players that it’s possible to win in 200 big blind deep poker, using non-standard bet sizing. In particular, the bot does a lot of overbetting — betting more than the size of the pot.
This makes sense to me. For the past year, I’ve been playing against Slumbot — a less sophisticated but also very good equilibrium solving heads-up poker AI. Slumbot overbets the pot all the time, and I’ve learned to gain an edge (I’m up $1/hand after 10k+ hands of play) by overbetting the pot all the time.
It’s not about me, so I will save examples for a future blog post, but expect more overbets in no limit hold’em poker as a result of this match. There’s no rule in poker requiring the standard bet to be about 2/3 the size of the pot.