10 Years of AlphaGo: The Move That Changed AI's Direction
Guests: Thore Graepel (Distinguished Research Scientist, Google DeepMind; core architect of AlphaGo), Pushmeet Kohli (VP Research, Google DeepMind Science)
Host: Professor Hannah Fry
Source: Google DeepMind Podcast · 54 minutes
In March 2016, inside a hotel suite in Seoul, 18-time Go world champion Lee Sedol sat across from a neural network. Seven days later, the score was 4-1. A decade on, the core insight from that match — that machines can surpass human cognition through self-play — has extended into protein folding, mathematical proofs, and algorithm discovery.
This podcast episode features two people who were there: Thore Graepel, a core architect of AlphaGo who was present in Seoul, and Pushmeet Kohli, who leads DeepMind’s science work and drove follow-up projects including AlphaFold, AlphaTensor, and AlphaEvolve. They tell the complete story, from a Go board to a transformation in how science is done.
Go: The Perfect AI Challenge
Go’s rules can be learned in five minutes, but the number of possible game states is 10^170 — far exceeding the number of atoms in the observable universe. IBM’s Deep Blue beat the chess world champion through brute-force search in 1997, but Go’s search space dwarfs chess by orders of magnitude. Brute force wouldn’t work.
Thore Graepel: “The game has such simple rules, yet it leads to such complex gameplay with tactics and strategies. Once chess had been solved, Go was the open challenge. Nobody was expecting it to be solved anytime soon.”
On his first day at DeepMind, David Silver dragged Thore to a table to test a prototype that didn’t even have a name yet — just an internship project trained on a few hundred thousand games from the internet. Thore played conservatively — “just don’t make a mistake.” But conservative, conventional play was exactly what this version, trained on human games, had learned to exploit.
“My position became worse and worse and I ended up losing by a small margin. But I took the crown of the first person who officially lost to AlphaGo. It was a wonderful way of introducing myself. A humbling way.”
Thinking Fast and Slow: AlphaGo’s Brain
AlphaGo’s core design mirrored how humans play Go.
Pushmeet Kohli explained that Go’s difficulty lies not just in the breadth of available moves (200-300 per turn, versus 20-30 in chess), but in the depth — games last far longer, requiring much deeper reasoning chains.
AlphaGo’s solution combined two types of thinking: the policy network handled “fast thinking,” scanning the board and immediately identifying promising moves — like a human player’s intuition. The value network plus Monte Carlo tree search handled “slow thinking,” explicitly reasoning through sequences of moves and counter-moves.
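The blend of the two kinds of thinking can be made concrete. The sketch below is a toy illustration of the PUCT-style selection rule used in AlphaGo-family search, where the policy network's prior ("intuition") steers exploration and accumulated search statistics ("deliberation") gradually take over; the node structure, constant, and numbers are illustrative, not DeepMind's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float            # policy network's probability for this move ("fast thinking")
    visits: int = 0
    value_sum: float = 0.0  # sum of value estimates backed up by search

    def q(self) -> float:
        # mean value found by search ("slow thinking"); 0 if never visited
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(child: Child, parent_visits: int, c_puct: float = 1.5) -> float:
    # Exploitation (Q) plus an exploration bonus proportional to the prior,
    # which decays as the move accumulates visits.
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q() + u

def select(children: list[Child], parent_visits: int) -> int:
    # One step of tree descent: pick the move maximizing Q + U.
    scores = [puct_score(c, parent_visits) for c in children]
    return max(range(len(children)), key=scores.__getitem__)
```

Early in the search, unvisited moves with high priors dominate, so intuition narrows the tree to a few candidates; as visits accumulate, the measured value Q decides between them.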
Thore noted this closely matches human cognition — players never consider all possible moves. They lock onto a few that “just feel right,” guided by intuition, then calculate deeply.
“Deep learning was just ripe at the time to learn these approximate functions. The timing was perfect.”
The team needed a real professional to test the system. They found Fan Hui, the European Go champion, and flew him from Bordeaux to London for 10 test games. Thore bet David Silver that AlphaGo would lose at least one game. David predicted a 10-0 sweep. The loser had to come to the office dressed as an ancient Japanese Go master for a full day.
The result was 10-0. Thore wore the costume.
Move 37: A Machine Surpasses Human Intuition
In March 2016, the AlphaGo team flew to Seoul to challenge Lee Sedol — often compared to Roger Federer for his dominance and creative brilliance. He had studied AlphaGo’s games against Fan Hui, judged himself the stronger player, and was confident of victory. What he didn’t know was that AlphaGo had kept improving through further training and algorithmic refinements.
“In England, Go is a niche activity,” Thore said. “But in South Korea, people were so excited. The best Go players are celebrities. We came there and there were hordes of photographers. Typical computer geeks, suddenly in the limelight of the world.”
AlphaGo won game one. An American professional sitting next to Thore initially dismissed AlphaGo’s play: “I always tell my students not to play that stupid move.” After the game, the same man said: “This is the most phenomenal thing I’ve ever experienced. I’m so grateful to witness a machine playing Go at this level.”
Game two produced the moment that changed history.
Commentator Michael Redmond placed the stone for move 37 on his demo board, stepped back, and said, “This must be wrong.” He removed it, checked the screen again, and put it back.
Move 37 was a shoulder hit on the fifth line. Go wisdom holds that in edge battles, the third and fourth lines represent a fair trade between territory and influence. Playing on the fifth line means conceding too much territory. AlphaGo calculated that a human player would make this move with a probability of 1 in 10,000.
Dozens of moves later, it proved to be the winning move. It redefined the trade-off between territory and influence — showing that in certain positions, even the fifth line is profitable.
Pushmeet Kohli on the significance:
“People initially thought it was a hallucination or a mistake. Only as the game progressed did its implications become clear. This moment showed us that there will be times when these systems produce insights we can’t even immediately judge — but they will fundamentally change how we see entire fields of study.”
Move 78: Humanity’s Last Divine Move
After three straight losses, Lee Sedol played move 78 in game four — an unconventional wedge.
Thore: “From that point on, AlphaGo’s moves didn’t make sense — not in the Move 37 way of ‘temporarily confusing but possibly brilliant,’ but in a way that seemed wrong even to amateurs.”
The team couldn’t fully relax. Even though AlphaGo had clinched the series, if Lee Sedol won the last two games, the narrative would shift entirely: “He’s found the weakness. Human triumph.”
Lee Sedol said at the press conference that he was proud to have found, “maybe for the last time, on behalf of humanity, a way to overcome the machine.”
Some called move 78 “the divine move.” Given the pressure of that moment and Lee Sedol’s transcendent performance, Thore considered it a fitting name.
The final series score was 4-1. The Go community’s reaction was surprising — interest in Go actually increased. More people started learning the game, and professionals embraced AI tools built on the same techniques for analysis and teaching.
AlphaZero: What Happens Without Human Knowledge
If AlphaGo vs. Lee Sedol was an engineering feat, AlphaZero was a scientific breakthrough.
AlphaZero trained with zero human game data. It knew only the rules, starting from entirely random play, accumulating experience through self-play, gradually learning what constitutes good and bad moves.
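That loop — play against yourself, then learn position values from the outcomes — can be sketched in miniature. The game below is a toy (a pile of stones, take one or two, last stone wins), not Go; a lookup table stands in for the value network, and only the first rung of the ladder (value estimation from random self-play) is shown, with policy improvement and iteration omitted. All names and numbers are illustrative.

```python
import random
from fractions import Fraction

N = 9  # pile size; a move takes 1 or 2 stones, taking the last stone wins

def self_play(rng):
    """One game of purely random self-play. Returns the (pile, player)
    states visited and which player took the last stone."""
    pile, player, history = N, 0, []
    while True:
        history.append((pile, player))
        take = rng.choice([1, 2]) if pile >= 2 else 1
        pile -= take
        if pile == 0:
            return history, player  # this player won
        player = 1 - player

def estimate_values(episodes=30000, seed=0):
    """Tabular stand-in for the value network: for each pile size, the
    estimated probability that the player to move goes on to win,
    learned only from self-play outcomes."""
    rng = random.Random(seed)
    wins = [0] * (N + 1)
    visits = [0] * (N + 1)
    for _ in range(episodes):
        history, winner = self_play(rng)
        for pile, player in history:
            visits[pile] += 1
            wins[pile] += (player == winner)
    return [wins[p] / visits[p] if visits[p] else 0.0 for p in range(N + 1)]

def exact_values():
    """Ground truth under random play, for checking the estimates."""
    v = [Fraction(0)] * (N + 1)
    for p in range(1, N + 1):
        moves = [t for t in (1, 2) if t <= p]
        v[p] = sum(Fraction(1) if t == p else 1 - v[p - t] for t in moves) / len(moves)
    return v
```

Starting from pure randomness, the table converges on which positions are good and bad for the player to move — the seed from which a full AlphaZero-style loop alternates improved play and improved estimates.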
Thore described a striking pattern: it first rediscovered classical human Go patterns — openings refined over centuries. “We thought, wow, it’s finding the same openings.” Then it stopped using them.
“It had found refutations. It rediscovered human knowledge, then discarded it, because it had found something better.”
By the end, AlphaZero’s Go looked alien. “This wasn’t the kind of Go I learned from my teacher. The moves seemed random, but 30 moves later everything fell into place — as if it had foresight. Which it did.”
From a scientific perspective, Thore considers AlphaZero even more significant than the original AlphaGo: proof that superhuman capability can emerge without any human prior knowledge.
From the Board to Proteins: Search Transforms Science
During the Seoul match, the documentary crew’s cameras had stopped rolling, but the microphones were still on. They captured a private conversation between DeepMind founders Demis Hassabis and David Silver:
“It’s just amazing seeing how quickly a problem that is seen as being impossible can change to being done. We can do protein folding. I mean, this is huge. I’m sure we can do that.”
“I thought we could do that before.”
“But now we definitely can do it.”
This became the prelude to AlphaFold.
Pushmeet Kohli explained that the door AlphaGo opened — if we can navigate a search space of 10^170, we can handle other vast combinatorial spaces — was systematically applied to science.
AlphaTensor turned matrix multiplication — the computational workhorse of all neural networks and LLMs — into a “game.” The objective: compute a matrix product correctly using the fewest scalar multiplications. For 4×4 matrices, the record set by Strassen’s 1969 algorithm had stood for more than 50 years. AlphaTensor broke it.
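For concreteness, here is Strassen's trick at its smallest scale: multiplying two 2×2 matrices with 7 scalar multiplications instead of the naive 8. Applied recursively to matrix blocks, this yields the sub-cubic algorithms whose space AlphaTensor's game searches. A minimal sketch:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with only 7 scalar multiplications
    (Strassen, 1969). Recursing on blocks gives an O(n^2.81) algorithm."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    # The textbook method: 8 multiplications.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

AlphaTensor's headline result was the 4×4 analogue: a scheme with 47 multiplications (in mod-2 arithmetic) instead of the 49 obtained by applying the 2×2 scheme above recursively.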
AlphaEvolve went further, searching the space of all possible programs for optimal algorithms, applied to data center scheduling, network packet routing, and other real-world problems.
Hannah Fry pressed on a key point: if the search space is no longer a Go board but “all possible algorithms in the world,” how do you build intuition to narrow it down?
Pushmeet acknowledged that these agents sometimes discover algorithms that are counterintuitive to humans. They find symmetries that mathematicians and computer scientists hadn’t noticed, then exploit them for dramatic efficiency gains. “Sometimes we just don’t understand why it’s faster. But it is faster.”
Move 37 or Hallucination? The Interpretability Dilemma
This raises a core question: when AI produces an original result, how can you be sure it’s a Move 37-type breakthrough rather than a hallucination?
Pushmeet pointed to verifiers. He invoked Karl Popper’s “conjecture and refutation” framework: AI’s generative capability serves as “conjecture” — including potential hallucinations; the verifier serves as “refutation” — filtering out what’s wrong.
This explains why AI advances fastest in certain domains: code can be compiled and tested, mathematical proofs can be formally verified. These fields have built-in “refutation” mechanisms.
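The structure of that loop fits in a few lines. In this cartoon, the "conjecture" side is brute-force enumeration rather than a neural network, and the domain (fitting a small formula to examples) is invented for illustration; the point is the division of labor between a generator that may be wrong and a verifier that mechanically refutes.

```python
from itertools import product

def conjecture():
    """'Conjecture': blindly propose candidate formulas f(x) = a*x^2 + b*x + c.
    Stands in for a model's generative step; most proposals are wrong."""
    for a, b, c in product(range(-3, 4), repeat=3):
        yield (a, b, c)

def refute(candidate, examples):
    """'Refutation': reject any conjecture that contradicts the evidence.
    Cheap, mechanical checking is what makes the whole loop trustworthy."""
    a, b, c = candidate
    return all(a * x * x + b * x + c == y for x, y in examples)

def discover(examples):
    # Popper's loop: conjecture freely, refute mechanically, keep survivors.
    return [cand for cand in conjecture() if refute(cand, examples)]
```

Everything hinges on `refute` being cheap and reliable — which is exactly why code and formal proofs, with their built-in checkers, are where this pattern advances fastest.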
DeepMind’s math agent AlphaProof generates mathematical proofs that are verifiable — you can confirm they’re correct even if you don’t fully understand them. But Pushmeet raised two new challenges: first, how to accurately specify problems for AI (formalization itself is hard); second, how to translate AI solutions back into human-comprehensible form.
Hannah Fry asked, half-jokingly: what if AI proves the Riemann hypothesis, but the proof exceeds any human’s ability to understand?
Thore’s response was striking:
“An explanation needs to account not just for the phenomenon, but for the intellectual level of the recipient. Future AI systems may come up with explanations that seem simplistic to them — but are just about right for us to keep up.”
As for mathematicians, Pushmeet argued they haven’t been replaced — their role has shifted. From problem solvers to problem posers. “These agents can solve incredible problems. But what problems need solving? How do you specify them? That’s where mathematicians and scientists come in.”
LLMs: A Shortcut, Not the Destination
The final major topic was the relationship between LLMs and the AlphaGo lineage.
DeepMind’s founding philosophy was to grow intelligence by placing agents in environments and letting them learn through trial and error. Then LLMs arrived — what Thore calls a “shortcut.”
“There’s this huge amount of crystallized intelligence stored in the form of data on the internet — text, images, videos. The shortcut is to first mine all of that data.”
This shortcut has been remarkably productive, but it has a ceiling: LLMs are bounded by their training distribution. Going beyond what humans already know is difficult.
Thore argued this is precisely why the community has spent recent years returning to methods DeepMind pioneered early on — reinforcement learning, self-play in environments, post-training in verifiable domains like coding.
“We are now entering a period where we’re going, again, beyond human knowledge.”
Editorial Analysis
Speaker Position
Both guests are Google DeepMind employees, speaking on an official DeepMind podcast. They naturally frame AlphaGo as the “origin point” and “turning point” of the entire AI revolution. This narrative is genuine but selective.
Selectivity in the Argument
The interview emphasizes a linear progression from AlphaGo → AlphaZero → AlphaFold → AlphaTensor → AlphaEvolve, while describing the LLM path in simplified terms — calling it a “shortcut” and implying that human-data-based approaches are inherently limited. In reality, LLMs and reinforcement learning are deeply integrated in modern systems (RLHF, reasoning models like o1), making it difficult to treat them as separate paths.
Frameworks Worth Questioning
- The “Go → science” narrative omits DeepMind’s extensive exploratory work between Go and AlphaFold, creating an impression of linear progress
- Characterizing LLMs as “unable to exceed training distribution” isn’t fully accurate — chain-of-thought reasoning, tool use, and other mechanisms have demonstrated beyond-distribution capabilities
- The “conjecture and refutation” framework, while elegant, sidesteps a hard problem: most scientific domains (climate, biology, social science) lack the clean verifiers that Go and code enjoy
Unasked Questions
- What computational resources did the AlphaGo team deploy in Seoul? What does that scale mean then versus now?
- Lee Sedol’s post-match career and wellbeing (he retired in 2019, partly due to AI’s impact)
- Is the Go community’s embrace of AI-assisted training universal, or are there dissenters?
Key Takeaways
- AlphaGo’s breakthrough wasn’t brute force but the combination of intuition and reasoning — the policy network provided human-level instinct while search provided deep reasoning, making Go-scale problems tractable
- Move 37 proved AI can produce genuinely original insights — not just doing what humans do faster, but finding what humans hadn’t considered
- AlphaZero was the bigger scientific breakthrough — removing human data improved performance, proving that human prior knowledge can be a constraint rather than an asset
- “Turn the problem into a game” is a surprisingly universal paradigm — from Go to protein folding to matrix multiplication to algorithm search, the same approach keeps working
- Verifiability determines AI’s applicability — domains with clear verification mechanisms (code, mathematical proofs) advance fastest; open-ended scientific problems remain challenging