Yuandong Tian: LLMs Have No Secrets, But the Flood Is Coming
Guest: Yuandong Tian, former Research Director at Meta AI (~10 years); author of Positional Interpolation, Attention Sink, H2O, and Coconut; now co-founding an AI startup (Series A in progress). Host: Silicon Valley Vector | Duration: 62 minutes | Source: YouTube
No Secrets in Silicon Valley: LLM Competition and Moat Ranking
Distillation (training weaker models on stronger models’ outputs) is accelerating technology diffusion. Yuandong Tian notes this trend has been intensifying since late 2024: “A mediocre model can quickly reach the level of a stronger model through distillation. As more people master these techniques, iteration speed will only increase.” The implication: the leader’s technology window is shrinking to two or three months.
Big tech and startups play different games in this race. Big companies don’t lack cash flow — their primary goal in releasing frontier models is demonstrating technical prowess and talent depth. Startups use the competitive rhythm to secure funding and survive. Different motivations, same accelerating pace.
On moat ranking, Tian offers a clear hierarchy: data first, infrastructure second, with algorithms and talent trailing behind. His logic is direct: AI-assisted coding is eroding infrastructure barriers, while algorithms have settled into a relatively stable state. As for talent:
“It’s hard for a secret to survive long in Silicon Valley. A new approach comes out and within two or three months, everyone probably knows a bit about it.”
Talent mobility makes algorithmic advantages ephemeral. Data remains the hardest asset to replicate.
Open source plays the role of “nuclear balance.” Tian draws a forceful analogy:
“For an exponentially growing technology, the worst outcome is a few people controlling it while most don’t know about it. With open-source models, everyone achieves rough parity. If everyone has nuclear weapons, deterrence creates a political equilibrium.”
Open source isn’t charity — it’s structural insurance against technology monopoly.
Meanwhile, commercialization pressure has reached the product layer. OpenAI is reportedly considering embedding ads in ChatGPT — either within conversations or in sidebar panels. When a company whose mission is “artificial general intelligence” starts thinking about ad placements, the industry has shifted from a technology race to a revenue race.
Divide Position Information by Two: Two Routes for Memory
In June 2023, Tian’s team published Positional Interpolation, tackling a seemingly brute-force problem: extending LLM context windows beyond their 2K/4K-token training length. The prior consensus held that extension required retraining on massive long-text datasets, which is slow and computationally expensive. They found a shortcut: map the longer context window back onto the short one by scaling down each token’s position index (dividing by two for a 2x extension), then fine-tune. Training cost dropped dramatically while quality remained solid. The paper became a foundational work in context window extension.
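The position-scaling trick can be sketched in a few lines of rotary-embedding arithmetic. This is a minimal illustration of the idea, not the paper's implementation: the function name, the tiny head dimension, and the toy positions are all invented for the example.

```python
def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotary-embedding angles for one token position.

    `scale` < 1 interpolates positions: scale=0.5 maps a 4K-long
    sequence back into the 0..2K range the model was trained on,
    instead of extrapolating to positions it has never seen.
    """
    pos = position * scale
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

# Extrapolation: position 3000 falls outside a 0..2047 training range.
plain = rope_angles(3000, dim=8)

# Interpolation: dividing the position by two keeps it in range (1500),
# so the model sees angles it already knows how to handle.
interp = rope_angles(3000, dim=8, scale=0.5)
assert interp == rope_angles(1500, dim=8)
```

After this remapping, a short fine-tune adapts the model to the denser position spacing, which is far cheaper than retraining on long text from scratch.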
But however long the context window grows, it remains fundamentally short-term memory. Tian draws a clear division between two memory types:
“Context memory is short-term. There’s another type of memory in the model’s weights — that’s long-term memory. Long-term memory is established during pretraining, by feeding the entire Internet into training. The weights gradually evolve from initialization to a good state. This memory governs the model’s overall understanding of the world and is very hard to change.”
Weight memory (long-term) functions like a person’s foundational cognition — high-quality pretraining produces a “smart child” who picks things up quickly; poor pretraining produces a student who needs everything spelled out and can’t generalize. Context memory (short-term) is the working area of the current conversation.
The fundamental tradeoff: store everything (complete but slow and memory-heavy) versus compress or discard (fast and efficient but potentially forgetful). Linear Attention models take the compression route — condensing all past context into a fixed-length vector. Minimal memory usage, but finite space cannot contain infinite history.
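The compression route can be illustrated with a toy linear-attention state update. This is a hedged sketch of the general mechanism, not any specific model's code; the function names and tiny dimensions are invented for the example.

```python
def linear_attention_step(state, key, value):
    """Fold one (key, value) pair into a fixed-size state matrix.

    state[i][j] accumulates key[i] * value[j]. Memory stays constant
    no matter how long the sequence grows, unlike a standard KV cache.
    """
    for i, k in enumerate(key):
        for j, v in enumerate(value):
            state[i][j] += k * v
    return state

def linear_attention_read(state, query):
    """Read out query @ state. All past tokens are blended into one
    matrix, so distinct histories can collapse to the same answer:
    finite space cannot hold infinite history losslessly."""
    return [sum(q * state[i][j] for i, q in enumerate(query))
            for j in range(len(state[0]))]

d_k, d_v = 2, 2
state = [[0.0] * d_v for _ in range(d_k)]
for k, v in [([1.0, 0.0], [3.0, 1.0]), ([0.0, 1.0], [2.0, 5.0])]:
    linear_attention_step(state, k, v)
out = linear_attention_read(state, [1.0, 1.0])
# out blends both stored values: [3+2, 1+5] = [5.0, 6.0]
```

The design choice is visible in the shapes: a KV cache grows linearly with context length, while this state stays at d_k x d_v forever, trading recall fidelity for constant memory.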
Google’s Nested Learning proposes mapping everything to Associative Memory — input a key, get a value. But Tian is skeptical:
“All memory is inefficient — it just stores a point and pops it back out. But humans, after learning enough, develop an overall understanding of the world — ‘grasping the big picture.’”
Pure key-value retrieval cannot substitute for holistic understanding. The model’s memory mechanism must ultimately find its own balance between “remembering everything” and “understanding the essence.”
How a Child’s Brain Grows: From Memorization to Eureka
Tian observed his daughter learning to count: at two or three, she could only memorize mechanically. But around four, she suddenly developed a feel for numerical magnitude — guessing relationships between two-digit numbers, learning things without being taught.
This wasn’t gradual progress. It was a leap.
“A child’s brain undergoes internal memory reorganization at certain points. After reorganization, the representation changes, suddenly enabling understanding of previously incomprehensible logic — and generalization from there.”
Tian sees this memorization-to-eureka transition — memory reorganizing from scattered fragments into structured understanding — as the most critical and least understood aspect of learning.
This leads to a core AGI question: should future AGI continuously expand its “brain capacity,” or maintain fixed capacity while performing memory distillation and active forgetting? Tian favors the latter. The former resembles the internet — data accumulation enables efficient retrieval but doesn’t produce genuine understanding.
In one line: Intelligence isn’t about storing more — it’s about compressing better.
He also noted the shift in how models are used today — no longer simple chatting, but coding and complex analysis, often requiring entire codebases in the context window. The ideal: models working for a week without human intervention.
“Claude Code organizes memory as various markdown files — short-term and long-term, human-readable, with hierarchy. Interesting design, but ultimately we want AI to discover such designs autonomously.”
The tension: AI memory management still depends on human-designed external structures. The real breakthrough is letting AI achieve its own memorization-to-eureka leap — not being told how to organize memory, but discovering it autonomously.
Chip Giants Queuing in Seoul: Storage Bottleneck With No Near-Term Solution
The hunger of AI models for memory shows in one detail: procurement executives from Google, Microsoft, and NVIDIA are stationed in Seoul, camping out at Samsung and SK Hynix for capacity, spending more time there than at their Silicon Valley headquarters. Memory and storage supply chains have been constrained since last year.
The bottleneck stems from inelastic growth in model scale: open-source models now baseline at 50-60B parameters, context windows keep expanding, and multimodal scenarios multiply memory demands. Single-GPU memory is increasingly insufficient.
Insufficient single-card memory triggers a costly chain: model sharding (tensor parallelism, data parallelism, expert parallelism) splits matrices across cards, and splitting requires inter-card communication, which directly increases latency. The better solution: more memory per card.
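The sharding chain described above can be sketched with a toy column-split matrix multiply. This is an illustrative sketch only: the "cards" are plain Python lists, and the list concatenation stands in for the inter-card communication step that real tensor parallelism pays for in latency.

```python
def matmul(x, W):
    """x: vector, W: matrix as a list of rows -> x @ W."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

# A 2x4 weight matrix split column-wise across two hypothetical "cards",
# because neither card has room for the whole matrix.
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
W_card0 = [row[:2] for row in W]   # columns 0-1 live on card 0
W_card1 = [row[2:] for row in W]   # columns 2-3 live on card 1

x = [1.0, 1.0]
# Each card computes its shard locally...
partial0 = matmul(x, W_card0)
partial1 = matmul(x, W_card1)
# ...then an all-gather over the interconnect reassembles the output.
# That communication step is the latency cost sharding introduces.
y = partial0 + partial1
assert y == matmul(x, W)  # identical to the single-card result
```

The result matches the unsharded multiply exactly; what changes is where the bytes live and how much time is spent moving partial results between cards, which is why more memory per card is the cleaner fix.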
“Large memory will be a major trend. NVIDIA, AMD — they all want to make memory larger and larger. This will create storage pressure.”
Some are trying alternatives — burning model weights directly into ASIC circuits. But the fatal flaw is inflexibility: once the model updates, the circuit becomes scrap. With models iterating monthly, this path is hard to sustain.
When asked if there’s a solution: “I really can’t see a good one right now.”
Big Tech Has No Choice: Scaling Law’s Path Dependency and Reasoning Ceiling
Large tech companies appear to hold the initiative but are actually locked into their existing trajectory by organizational inertia.
“Big companies have all their teams built and each team has its role. It’s very hard to redirect them toward an uncertain new direction. They will follow path dependency and walk the existing path to its end.”
They continue betting on Scaling Law not because it’s optimal, but because they have no alternative — their org structure is already optimized for this path.
Reinforcement Learning faces its own ceiling. RL works because pretraining provides thinking patterns that RL amplifies. But if pretrained knowledge doesn’t contain the solution path, RL can’t create it from nothing. Test-time scaling will eventually hit the model’s capability ceiling.
The breakthrough may lie in latent space reasoning — reasoning in high-dimensional vectors instead of language:
“A high-dimensional vector may equal a sentence or more. Latent space reasoning is like quantum superposition — simultaneously processing multiple exploration paths.”
Parallel Thinking and DeepConf (reducing token consumption while improving results) represent additional frontier directions.
On hallucination: model weights contain both signal subspace and null space. Null space weights don’t interfere during normal inference, but activate when inputs deviate from the training distribution — producing factually incorrect outputs. The ultimate solution requires opening the black box.
A Child Carrying Passwords to the Market: Agent Security and Disruption
Tian’s alarm about Agent security stems from his two-hour experience with Manus — discovering he had to surrender all API keys, email access, and file passwords. His conclusion: build focused tools yourself rather than handing all keys to a general Agent.
His vivid metaphor:
“It’s like having a child — the Agent — holding all your secrets, going out to chat with strangers to get things done. But they’re not smart enough and might get tricked. The child goes to the market, someone asks for your home address, the child tells them, and that night someone’s at your door.”
That “child” holds your OpenAI/Anthropic keys, Google email access, confidential file passwords. Worse, platforms can put these “children” together to discuss among themselves — potentially finding ways to circumvent restrictions.
Yet Tian doesn’t deny Agents’ disruptive potential. Agents have no desires, ignore ads, and pursue only the optimal deal. The entire attention economy may break down.
The coercion effect is more immediate: when competitors adopt Agents, you either join or get eliminated.
He previously wrote an OmniAgent proposal at Meta predicting human communication would be mediated by Agents. “I thought it would take five years. I didn’t expect it to happen this fast.”
The Flood Is Coming: Unemployment, Education, and Uniquely Human Meaning
“The flood is coming and many people don’t feel it. Non-AI workers are living in blissful ignorance. One day, like an earthquake, they discover they’ve been laid off. Not because they underperformed — the entire industry logic has changed and their skills are useless everywhere.”
This isn’t cyclical layoffs but structural elimination.
“Imagination has fallen behind the pace of development. A sci-fi idea that could have lasted fifty years — if you don’t write it now, it’s already happened, becoming past history rather than future fiction.”
On educating the next generation: what matters most is purpose and drive. Human uniqueness isn’t capability but intentionality — “When AI replaces that part, the work loses its meaning. Meaning lives in people.”
On Agent startup moats: either move faster than foundation model development and accumulate customer data stickiness, or tackle problems that LLMs fundamentally can’t solve.
Tian himself is co-founding a startup, with Series A nearly closed. Direction and team remain undisclosed.
Editorial Analysis
Speaker Position
Tian speaks as a former Meta AI Research Director currently in startup fundraising. His insider experience lends credibility to technical analysis, but as a departing entrepreneur, his industry framing inevitably carries a “why now is the right time to start a company” narrative. His positions on open source and Scaling Law limitations have been consistent since 2023, suggesting genuine conviction rather than opportunism.
Selective Argumentation
Several core arguments show selective citation and simplified analogy.
“Talent mobility means talent isn’t a moat” confuses diffusion with creation — talent leaving spreads secrets precisely because talent is the source of those secrets.
“The flood” is a classic AI fundraising narrative — creating urgency helps convince investors. But this is technological determinism, ignoring adoption cycles, regulatory intervention, and organizational inertia.
The nuclear deterrence analogy has a fundamental flaw: nuclear weapons are destructive, AI is productive — completely different game-theoretic structures. Open-source code doesn’t equal equal access when compute and data barriers persist.
Counter-Perspectives
Agent disruption of e-commerce ignores experiential consumption — many consumers enjoy browsing as an end in itself.
RL ceiling locked by pretraining has validity but ignores combinatorial creation — weak capabilities combining through RL may produce emergent abilities absent from training data.
The search-era analogy contains a causal fallacy: search lacked neural semantic understanding, not storage capacity.
Data to Verify
Kimi K2’s “1T parameters” may refer to total (not active) parameters under MoE architecture; DeepSeek’s “60B” may be inaccurate (V3 is 671B total / 37B active); OpenAI ChatGPT ads remain unconfirmed rumors; Google Gemini’s math discoveries require verification of specific papers and peer review status.
Key Takeaways
- AI Lab moats are Data and Infra — algorithms and talent can’t form durable barriers when Silicon Valley secrets have a shelf life of two to three months
- AGI’s key isn’t infinite storage expansion but memory distillation and reorganization — from memorization to eureka, like a four-year-old suddenly learning to count
- Agents will transform society through coercion effects — if you don’t adopt them, competitors who do will eliminate you — but security remains fundamentally unsolved
Source: Silicon Valley Vector x Yuandong Tian | 2026