Check out articles at https://medium.com/@reasonets
Machines That Play series has been broken into 7 parts: This is Part 6 of the series.
This series covers the history of Artificial Intelligence and games (until Deep Blue) and focuses on machines that played chess, checkers, and backgammon. The following topics are covered: how to build chess machines, Shannon’s work on chess, Turing’s work on chess, The Turk, El Ajedrecista, MANIAC, Bernstein chess program, Samuel’s checkers, Mac Hack VI, Cray Blitz, BKG, HiTech, Chinook, Deep Thought, TD-Gammon, and Deep Blue.
Part 1: Machines That Play (Overview) — this one
Part 4: Machines That Play (Deep Blue)
If you want a summary of the first 5 parts, focusing on the human elements, go here.
Part 6: Machines That Play (Checkers) — this one
Part 7: Machines That Play (Backgammon)
This part will cover Samuel’s checkers (1950s) and Jonathan Schaeffer’s Chinook (1990s). Samuel’s checkers was the first program to learn (before AI was coined). Chinook was the first computer program to win the world champion title against humans. Schaeffer also solved Checkers in 2007 — this is coming soon.
Before we begin, let’s look at some ways to measure game complexity.
The state-space complexity of a game is the number of legal game positions reachable from the initial position of the game.
The game tree size is the total number of possible games that can be played: the number of leaf nodes in the game tree rooted at the game’s initial position.
The branching factor is the number of children at each node. For example, chess, suppose that a “node” is considered to be a legal position, then the average branching factor is estimated to be about 35. This means that, on average, a player has about 35 legal moves available at each turn. By comparison, the average branching factor for the game Go is 250!
Optimal status: it is not possible to perform better (some of these entries were solved by humans)
Super-human: performs better than all humans
Minimax can be traced to a 1913 paper by Ernst Zermelo, the father of modern set theory. The paper contained several errors and did not describe minimax correctly. He did, however, propose (but did not prove) what became known as Zermelo’s Theorem: in any finite deterministic (i.e. in which chance does not affect the decision making process) perfect information two-player game (such as chess) there are three possibilities: either the first-player can force a win, or the second-player can force a win, or both players can force a draw. When applied to chess, Zermelo’s Theorem states “either White can force a win, or Black can force a win, or both sides can force at least a draw”. We don’t (yet) know which it is for chess. (Checkers, on the other hand, has been solved: either player can force a draw.)
Von Neumann improved Zermelo’s minimax theorem to include games involving imperfect information and games with more than two players. In 1944, John Von Neumann and Oskar Morgenstern published Theory of Games and Economic Behavior. This is the groundbreaking book that created the research field of game theory. Originally, game theory addressed zero-sum games, but since then it has been applied to a wide range of behavioral relations, and has become a framework for modeling scenarios in which conflicts of interest exist among the participants.
What does it mean to say a game is perfect? Each Player has complete knowledge of the game state. For two players games, they take alternate turns. Examples: Chess, Checkers, Connect-Four, Go, Othello
What does it mean to say a game is imperfect? Some of the game state is hidden. Examples: Poker, Bridge
What does it mean to say a game is not deterministic? Game involves an element of chance. Example: Backgammon (involves dice rolls)
Now, let’s talk about the machines.
In 1949, Arthur Samuel joined IBM’s Poughkeepsie lab and in 1952 he programmed IBM’s first major computer — IBM 701 — to play checkers. It was the first game-playing program to run on a computer — it was the first checkers program
On February 24, 1956, Arthur Samuel demonstrated the program and played a game of checkers on television.Before the demonstration, Thomas J. Watson, Sr., the founder and president of IBM, predicted that the demonstration would make a strong impression — and it would raise the price of IBM stock by 15 points. It did!
[Side note. See videos titled The Computer Pioneers: The Development of the IBM 701: “The Computer Pioneers is a video oral history project produced by Richard Jay Solomon. This segment of the series discusses the development of the IBM 701 model computer, also known as the Defense Calculator, in the early 1950s. These interviews were conducted on July 12th, 1983 and feature several members of the IBM 701’s development team including Jerrier Haddad, Clarence Frizzell, Nathan Rochester, and Richard Whalen.”]
By 1955, Samuel had done something groundbreaking; he had created a program that could learn — something that no one had done before — and this was demonstrated on television in 1956.
Samuel had been thinking about machine learning since he joined IBM and wanted to focus on building programs that could learn to play the game of checkers. In 1959 he coined the term machine learning as the “field of study that gives computers the ability to learn without being explicitly programmed”. He later said, “I became so intrigued with this general problem of writing a program that would appear to exhibit intelligence that it was to occupy my thoughts during almost every free moment for the entire duration of my employment by IBM and indeed for some years beyond.”
He published a seminal paper in 1959, titled Some Studies in Machine Learning Using the Game of Checkers, where he talked about how a machine could look ahead “by evaluating the resulting board positions much as a human player might do”. The computer started out losing to Samuel and eventually beat Samuel.
He had programmed the “digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning.”
In fact, the program had beaten its creator and it had done that by learning in a relatively short amount of time — “a computer can be programmed so that it will learn to play a better game of checkers that can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time).”
8AM: Your computer program isn’t any good at checkers. You teach it how to play.
9AM: You leave it alone and let it play by itself the entire day.
It plays against itself and learns from its mistakes.
6PM: You come back and play it. It beats you.
And that was in the 1950s — with slow computers of that time. In some sense, this is the earliest glimpse of what lies ahead in games: no matter what the rate of improvement is for humans, once machines learn to learn, it will be hard to keep up with machines, whose progress will end up being measured exponentially. And ours won’t.
No matter what the rate of improvement is for humans, once machines learn to learn, it will be hard to keep up with machines, whose progress will end up being measured exponentially. And ours won’t.
We know (from from Shannon’s work described in Machines That Play (Chess) that the main driver of the machine that played games was a search tree of the set of all possible board positions reachable from the current state.
Checkers is extremely complex — it has roughly 500 billion billion possible positions (5 x 10²⁰). About 10²⁰ possible board positions compared to Chess (10⁴⁷) or Go (10²⁵⁰)). Even though checkers is simpler than those games, it is still complex enough that a brute force only approach is impractical.
In his paper, Samuel said that “at our present stage of knowledge, the only practical approach, even with the help of the digital computer, will be through the development of heuristics which tend to ape human behavior.“
Samuel’s program was based on Shannon’s minimax strategy to find the best move from a given current position.
Recall that in two-player games, the minimax algorithm determines the best move — minimax is a recursive algorithm and it chooses an optimal move based on the assumption that both players play optimally. Simply put, it’s trying to model what we do when we play: we think “If I make this move, then my opponent can only make these moves, then I will make this move,…”
The problem with a full minimax search algorithm is that it explores all parts of the tree, including the parts of the tree it doesn’t need to. In other words, given a board position, human experts tend to “know” that some moves are irrelevant and some moves are good. They don’t explore all moves, they prune out the irrelevant or bad ones up front.
The rules of thumb that human experts use to prune out bad moves are called “heuristics”. In games, heuristics are implemented to cut down on computation time. An (easy) example of a heuristic in chess is that if a move leads to the player’s king being in checkmate, then the algorithm should not look further down that path because a player will never want to make that move. There is no point in exploring that branch — it is a waste of precious computational resources, but simple minimax search will still explore that path. Alpha-beta pruning will not explore that path.
To attack the complexity problem, Samuel implemented alpha-beta pruning — instead of searching each path to the game’s end, Samuel developed a scoring function based on the position of the board at any given time.
Alpha-beta pruning is an optimization of minimax. At a given node n, if a player has a better choice a (at a node further up), then n will not be reached. By looking at some successors of n, we can know enough about n to prune it out. The idea is that the algorithm will maintain two values, alpha and beta, where alpha represents the minimum score that the maximizing player is guaranteed and beta represents the maximum score that the minimizing player is guaranteed. Both players start with their worst possible score: alpha is negative infinity and beta is positive infinity. So when to prune? Well, each time beta (the maximum score that the minimizing player is guaranteed) becomes less than or equal to alpha (the minimum score that the maximizing player is guaranteed) (or when alpha becomes greater than or equal to beta), the maximizing player can prune those nodes. The chart below explains this:
The alpha-beta algorithm produces the same result as the minimax algorithm, but is more efficient because entire branches can be eliminated from the search process. This allows the program to evaluate the minimax search tree much deeper, while using the same resources. This combination of the minimax algorithm and alpha-beta pruning significantly reduced the total number of branches of the tree, which made it feasible to play games on a computer. Here’s the (commonly referenced) basic idea:
“If you have an idea that is surely bad, don’t take the time to see how truly awful it is.” — Pat Winston
In addition to minimax and alpha-beta pruning, Samuel used a fundamentally new and important component: learning. He used two main learning methods to create the first competent AI program: a) rote learning and b) learning by generalization.
Rote learning meant the program would save all of the board positions encountered during play, along with their scores. Then when those moves need to be evaluated further down the tree, they could simply be referenced instead of evaluated, thus saving some computing time. Even though this wasn’t an advanced form of learning, the idea is for the program to use the saved time to compute further in depth. And rote learning produced slow but continuous improvements, which were most effective for openings and endgames.
Learning by generalization meant modifying the evaluation function based on previous games played by the program. This was groundbreaking because it showed that a a program could learn by playing against previous versions of itself (which would be a key aspect of AlphaGo). This learning method came closest to the reinforcement learning technique later dubbed temporal-difference learning (TD-Lambda) — that the value of a state should equal the value of likely following states. (Side note: Samuel’s method was conceptually the same as that used much later by Tesauro in TD-Gammon.) The version using generalized learning was able to develop a good middle game but remained weak in opening and endgame play. Later in the series, we will see how some of the most successful programs would attack these issues.
Samuel did his work despite poor computation power in that period, while working essentially alone and doing his own programming. The principles of machine learning verified by the program’s “experiments are, of course, applicable to many other situations.” Today there is no field that is untouched by machine learning.
Successive enhancements made by Samuel allowed the computer to improve and achieve good skill — it eventually beat Samuel, but it still could not beat the experts consistently.
But it did defeat an expert player Robert Nealy, however, there is a controversy about whether Nealy lost the game when a draw was available (i.e. he played poorly) or whether Samuel’s program won (i.e. it played well).
The next year a six match rematch was won by Nealy 5–1.
Samuel’s checkers demonstrated that a computer could be programmed to learn to play a game better than its creator. It could improve its performance doing something that humans could not — by playing thousands of games against itself to practice.
This was a monumental achievement for his day. But Samuel himself had no illusions about the strength of his AI program. Even though it is a milestone in AI, it was blown out of proportion by the media. Some said, “This work led to the false impression that checkers was a “solved” game. As a result, researchers moved on to chess, and largely ignored checkers for over 25 years.”
In 1990, Schaeffer reached out to Samuel to tell him about his program’s massive improvements, but Samuel had just died. It was the end of one of the founders, one who gave us the very first glimpse of machine intelligence.
Is he really human or is he superhuman?
The year was 1991. Marion Tinsley had agreed to a friendly checkers match against Chinook. Tinsley had been the world’s best checkers player for 40 years. Regarding Tinsley it was said that he was “to checkers what Leonardo da Vinci was to science, what Michelangelo was to art and what Beethoven was to music.”
After Samuel’s work on checkers, there was a false impression that checkers was a “solved” game. As a result, researchers moved on to chess and mostly ignored checkers until Jonathan Schaeffer began working on Chinook in 1989. Schaeffer’s goal was to develop a program capable of defeating the best checkers player.
Back to 1991, Chinook was playing the best player ever, Tinsley. The first nine games they played were all draws. In the tenth game, Chinook made a move that it thought was slightly advantageous. Seeing this Tinsley remarked, “You’re going to regret that.” In an interview Schaeffer said. “And at the time, I was thinking, what the heck does he know, what could possibly go wrong?” It turned out that Tinsley began to pull ahead and Chinook resigned. Schaeffer said (about Tinsley), “In his notes to the game, he later wrote that he had seen all the way to the end of the game and he knew he was going to win.”
After the game when Schaeffer looked back into the database, he discovered that from that move to the end of the game, if both sides played perfectly, he would lose every time. But to see that, a computer or a human would have to look 64 moves ahead. Tinsley seemed to have picked the only strategy that could have defeated Chinook from that point — Tinsley was able to see the win 64 moves ahead.
“I was absolutely stunned,” Schaeffer said.
“How do you compete with somebody whose understanding of the game is so deep that he immediately knows through experience or knowledge or doing some amazing search that he was gonna win that position?”
It’s only a matter of time now
Chinook and Marion Tinsley met again in 1992 at the Man-Machine World Championship. It was the first time in history a human was playing a computer for a world championship title.
Because of Tinsley’s greatness, most players would play cautiously against Tinsley, hoping for a draw. Chinook, however, played a very different game. Regarding how Chinook played, Schaeffer said, “It played brash, aggressive moves — it walked on the edge of a precipice…It would do things people looked at and said, ‘Man, is that program crazy?’”
Even though Chinook lost against Tinsley, it came close to defeating Tinsley. It played a stunning eighth game and won; it was Tinsley’s sixth loss in 40 years.
According to The Atlantic, Schaffer felt sad about Tinsley’s loss and later wrote in this book:
“We’re still members of the human race and Chinook defeating Tinsley in a single game means that it will only be a matter of time before computers will be supreme in checkers, and eventually in other games like chess.”
But it wasn’t just one single game. Chinook won again in game 16. No living player had defeated Tinsley more than once. Tinsley had lost only seven games in his entire 45 year career, two of them to Chinook.
Finally, a machine was becoming more perfect.
But Chinook had some kind of an error, which forced Schaffer to resign the game. Schaeffer was devastated.
Losing his divine backing
In 1994, Chinook, after having beaten another player, was ready to face Tinsley again. Scientific American reported, “The night before the match,
…Tinsley dreamt that God spoke to him and said, “I like Jonathan, too,” which had led him to believe that he might have lost exclusive divine backing.”
Chinook hadn’t lost a game in its last 125 games and since 1992, Schaeffer’s team had spent thousands of hours improving Chinook.
Leading up to the match, Tinsley’s stomach had been hurting. It hurt again the day of the match. After six games, all of which were draws, he needed to see a doctor so Schaeffer took him to the hospital. The next day Tinsley was told there was a lump on his pancreas — he had pancreatic cancer. He withdrew.
Don Lafferty, rated the number two player in the world at the time, replaced Tinsley and played Chinook. Chinook won and became the first computer program in history to win a human world championship. Seven months later, Tinsley died.
Schaeffer’s program never got to beat the best checkers player ever — and this was why he started it all in 1989. The best human player never truly lost a match to Chinook. By the late 1980s checkers programs had become more advanced. Chinook played its last tournament in 1996, where Chinook finished “miles ahead of the strongest human players in the U.S. Championship.”
But that wasn’t enough for Schaeffer. He needed to make sure his program could beat the best player ever.
From 1997 to 2001, Schaeffer suspended Chinook and began working on solving the game of checkers, which meant his program would always know the right move. It would be perfect.
In an interview with Scientific American, Schaeffer said, “From the end of the Tinsley saga in ’94–’95 until 2007, I worked obsessively on building a perfect checkers program. The reason was simple: I wanted to get rid of the ghost of Marion Tinsley. People said to me, ‘You could never have beaten Tinsley because he was perfect.’
“Well, yes, we would have beaten Tinsley because he was only almost perfect. But my computer program is perfect.”
“Eighteen years later, I’m finally done.” Schaeffer said.
In 2007, the makers of Chinook published a paper in the journal Science announcing that Chinook had completely solved Checkers: the program could no longer be defeated by any, human or otherwise. Jonathan Schaeffer and his team had been working to solve the checkers problem since 1989. The article stated, “…checkers is now solved: Perfect play by both sides leads to a draw. This is the most challenging popular game to be solved to date, roughly one million times as complex as Connect Four.” Checkers is the largest game that has been solved to date, with a search space of 5×10²⁰. “The number of calculations involved was 10¹⁴, which were done over a period of 18 years. The process involved from 200 desktop computers at its peak down to around 50”.
Create your free account to unlock your custom reading experience.