Facebook AI claims Pluribus is the first AI to consistently beat more than two human players in a benchmark game. How the AI was constructed is detailed in a paper published in the journal Science. The researchers say it surpasses top human performance within 20 hours of training.
Pluribus achieves its goal with just $150 worth of cloud computing for training.
Like the AIs that preceded it in games such as Go, Dota 2, and StarCraft II, Pluribus achieves its results by training in matches against itself. More than 20 hours of such training produces a player stronger than top human professionals, the researchers said.
“The core of Pluribus’s strategy was computed via self play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input. The AI starts from scratch by playing randomly, and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy,” the Science paper reads.
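The quoted loop can be sketched in a few lines of Python. This toy version is purely illustrative and is not the algorithm from the paper: the game (rock-paper-scissors), the weight update, and all function names are assumptions. It shows the shape of the idea the quote describes: a strategy is a probability distribution over actions, it starts uniform ("playing randomly"), trains against a periodically frozen earlier copy of itself, and shifts weight toward actions that fare better.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]

def payoff(a, b):
    """+1 if action a beats action b, -1 if it loses, 0 on a tie."""
    wins = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
    if (a, b) in wins:
        return 1
    if (b, a) in wins:
        return -1
    return 0

def normalize(weights):
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def self_play(iterations=2000, seed=0):
    rng = random.Random(seed)
    weights = {a: 1.0 for a in ACTIONS}   # start from scratch: uniform random
    strategy = normalize(weights)
    opponent = dict(strategy)             # frozen earlier version of itself
    for t in range(1, iterations + 1):
        # Sample the earlier version's action from its distribution.
        b = rng.choices(ACTIONS, weights=[opponent[a] for a in ACTIONS])[0]
        # Credit each action by how it would have fared against that sample.
        for a in ACTIONS:
            weights[a] += max(payoff(a, b), 0)
        strategy = normalize(weights)
        if t % 100 == 0:                  # periodically refresh the frozen copy
            opponent = dict(strategy)
    return strategy

print(self_play())
```

Because the opponent is always an earlier snapshot of the same strategy, any exploitable bias is eventually punished, which is the intuition behind learning without human data.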
The AI is a collaboration between Carnegie Mellon University's Computer Science Department and Facebook AI Research, along with the companies Strategic Machine, Strategy Robot, and Optimized Markets.
In the researchers' tests, Pluribus won in both formats: five humans against one copy of the AI, and five copies of the AI against one human. If each chip had been worth $1, Pluribus would have made approximately $5 per hand and earned roughly $1,000 an hour playing against five humans, Facebook AI said.
“The exact number of bets it considers varies between one and 14 depending on the situation. Although Pluribus can limit itself to only betting one of a few different sizes between $100 and $10,000, when actually playing no-limit poker, the opponents are not constrained to those few options,” the Science paper reads.
Pluribus builds on Libratus, an AI poker player made by Carnegie Mellon in 2017, but it comes with some additional features, like a search algorithm to evaluate outcomes a few moves ahead.
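A depth-limited search of this kind can be illustrated with a toy sketch. This is not Pluribus's actual search (which reasons about opponents' strategies); the game, the value estimate, and all names here are hypothetical. It shows only the core mechanic: expand the tree a few moves ahead, then score the frontier with an estimate instead of playing to the end.

```python
def lookahead_value(state, depth, actions, step, estimate):
    """Best achievable value from `state`, searching `depth` moves ahead."""
    if depth == 0 or not actions(state):
        return estimate(state)  # frontier: fall back to the value estimate
    return max(
        lookahead_value(step(state, a), depth - 1, actions, step, estimate)
        for a in actions(state)
    )

# Tiny made-up game: the state is a running total, each move adds 1-3,
# the game ends at 10 or more, and the value estimate is the total itself.
actions = lambda s: [1, 2, 3] if s < 10 else []
step = lambda s, a: s + a
estimate = lambda s: s

best = max(
    actions(0),
    key=lambda a: lookahead_value(step(0, a), 2, actions, step, estimate),
)
print(best)  # adding 3 maximizes the total reachable within the lookahead
```

The point of the depth limit is tractability: the tree grows exponentially with depth, so evaluating "a few moves ahead" plus an estimate is far cheaper than solving to the end of the game.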
Abstraction is also used to reason about betting in future rounds and to group strategically similar hands. The AI also relies on counterfactual regret minimization, an iterative self-play algorithm that improves a strategy by minimizing its regret against earlier versions of itself.
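The update at the heart of counterfactual regret minimization is regret matching, which can be shown on a toy game. The sketch below uses rock-paper-scissors rather than poker, and its names and structure are illustrative assumptions, not drawn from the Pluribus paper: each round, every action's regret (how much better it would have done than the action actually played) is accumulated, and the next strategy plays actions in proportion to their positive regret.

```python
import random

ACTIONS = range(3)  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b]: utility for playing a against b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]

def regret_matching(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for a in ACTIONS:
            strategy_sum[a] += strategy[a]
        # Self-play: both players sample from the same current strategy.
        my = rng.choices(list(ACTIONS), weights=strategy)[0]
        opp = rng.choices(list(ACTIONS), weights=strategy)[0]
        # Regret: how much better each alternative would have done.
        for a in ACTIONS:
            regrets[a] += PAYOFF[a][opp] - PAYOFF[my][opp]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

avg = regret_matching()
print(avg)  # the average strategy drifts toward the uniform equilibrium
```

A key property of this family of algorithms is that the *average* strategy over all iterations, not the final one, is what converges toward an equilibrium; in rock-paper-scissors that equilibrium is uniform play.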