Why the Person Who Built Codex Uses Claude Code Every Day

💡 Great article worth sharing. Here’s the original content.


Calvin French-Owen co-founded Segment, which was acquired by Twilio for $3.2 billion in 2020. He then joined OpenAI and led a team that built the coding agent Codex from scratch in 7 weeks. In mid-2025, he left OpenAI to return to entrepreneurship, but his daily coding tool is now the competitor: Claude Code.

Original video: We’re All Addicted To Claude Code

He recently had a conversation with YC CEO Garry Tan on the YC podcast Lightcone. Garry had just started using Claude Code nine days prior, while Calvin is someone who built Codex. They discussed the product philosophy differences between coding agents, how to become a top 1% user, dealing with context poisoning, how LLMs are changing developer tool distribution, and what they would do differently if founding Segment today.

Key Takeaways

  • Claude Code’s underrated advantage isn’t the model itself, but the product architecture: Through sub-agents splitting context windows and using grep instead of semantic search for code retrieval, this mechanism makes it perform far beyond expectations in actual coding
  • Anthropic and OpenAI have fundamentally different product philosophies: One builds “go to the hardware store and build a doghouse” human tools, the other builds “3D print the entire doghouse” general intelligence. Calvin believes the latter may be inevitable long-term, but the former lets him “do five people’s work in one day”
  • LLMs are replacing Google as the recommendation engine for developer tools. Competitor companies can manipulate LLM recommendations through disguised ranking articles, while Supabase became the default backend recommendation for all LLMs due to excellent open-source documentation
  • Segment’s original core value—data integration connections—has now been erased by AI coding tools. From the founder’s own mouth
  • Clean up context when usage exceeds 50%. The LLM’s “dumb zone” is like having five minutes left in an exam with half the paper unfinished

The Retro-Future of CLI: Why Command Line Beat the IDE

Garry Tan opened with a knee surgery analogy: ten years ago he was a “marathon runner” (someone who writes code), then suffered a “catastrophic knee injury” (became a manager) and stopped coding. After nine days using Claude Code, it feels like getting a bionic knee—running five times faster than before.

Calvin said Claude Code is now his daily main tool, though this tool changes every few months. He previously used Cursor deeply, then switched to Claude Code, especially after the Opus model came out.

He believes people focus too much on model capability itself, ignoring the synergy between product and model. Claude Code does one thing particularly well: splitting context.

When you give Claude Code a task, it spawns multiple “exploration sub-agents,” each using the Haiku model to traverse the file system and search for relevant code. These sub-agents each run in their own independent context windows without interfering with each other. Anthropic figured out a key question: given a task, should it fit into one context window, or be split into multiple?

Garry Tan added an observation: precisely because Claude Code runs in the terminal, it’s naturally suited for this free, composable integration style. If you start from an IDE (like Cursor or early Codex), this freedom doesn’t emerge as naturally.

This leads to a “retro-future” paradox: 20-year-old command-line tools (CLI) have beaten integrated development environments (IDE) that were supposed to represent the future.

Calvin sees this as an advantage:

“CLI is not IDE, and that’s important. It keeps you away from the code being written. The whole purpose of an IDE is to explore files, keep all the state in your head. But CLI is something completely different, and Claude Code therefore has more freedom in the experience. When I use Claude Code, it feels like flying through code.”

Garry Tan gave a more practical example: development environments are dirty and messy, sandboxes are conceptually clean but hit walls everywhere in practice, like needing to access Postgres but not being able to connect. CLI runs directly in your development environment, can access your development database—he even let Claude Code access the production database.

“A concurrency issue—it could debug a delayed job nested five layers deep, find where the bug was, and then write a test to ensure it never happens again.”

Bottom-Up Distribution: How to Sell Developer Tools in the LLM Era

Calvin believes the distribution model for CLI tools is underrated. You just download and use it, no one needs to approve anything. He recently tried a product: after downloading the desktop app, it directly calls Claude Code installed on your laptop, then communicates with the desktop product through an MCP server. No one’s permission needed throughout.

Garry Tan pushed this further: In an era where everything moves fast, products must go bottom-up distribution, not top-down. CTOs make decisions too slowly, considering security, privacy, control. Engineers just install and use it, “this thing is too good.”

Calvin agreed but also saw the other side: as someone from B2B enterprise products, he knows top-down sales can build moats. The question is who can combine both. Garry Tan recalled the Netscape Navigator precedent: free for individuals, retroactive fees for business.

More interesting is the impact of AI recommendations. Calvin pointed out that people might now decide what tools to use directly in Claude Code, rather than researching themselves.

“As long as Claude Code recommends PostHog, they use PostHog.”

Garry Tan added a case: a competitor company created a “five tools you should use” ranking, putting their product first. Humans can tell this is sponsored content at a glance, but LLMs get fooled.

Calvin believes this particularly benefits open-source projects. Supabase is a typical example: because of excellent open-source documentation, whenever anyone asks how to build a backend, all LLMs default to recommending Supabase. He mentioned Ramp recently published a blog post about building their coding agent using open-source OpenCode as the underlying framework, because models can directly read the source code to understand how tools work.

[Note: Ramp built an internal coding agent called Inspect, with about 30% of merged PRs generated by this agent.]

The Core of Building Coding Agents: Context Engineering

Garry Tan asked him what the most important lesson was from building coding agents.

Calvin said: Managing context.

He briefly introduced Codex’s approach: extensive reinforcement learning fine-tuning on top of a reasoning model, teaching it to solve coding problems, fix tests, implement features. But he thinks most people won’t take this path. What ordinary people can do is think clearly about what context to provide the agent to get the best results.

An interesting difference: Cursor uses semantic search (embedding code in vector space, finding the most relevant snippets), while Claude Code and Codex directly use grep.

This seems backward but is actually reasonable. Code has extremely high information density, each line usually under 80 characters, no large JSON data blocks. After filtering out packages and irrelevant files through .gitignore, using grep and ripgrep to search code context can basically accurately locate code functionality. And LLMs are particularly good at writing complex regular expressions that humans would never write by hand.

“If you’re building an agent system outside of coding, there’s a lot to learn here. The key is how to organize your data into something close to code format, so the model can peek at surrounding context and get structured information.”

Becoming a Top 1% User: Less Coding, More Managing

“How do you become a top-tier user of coding agents?” Garry Tan asked.

Calvin listed several points:

First, use as little code and “scaffolding” as possible. He tends to deploy on platforms like Vercel, Next.js, Cloudflare Workers that already have lots of template code, keeping core code to one or two hundred lines. No need to set up various services, handle service discovery, register endpoints.

Second, prefer microservices or well-structured independent packages.

Third, understand LLM’s superpowers and blind spots. Andrej Karpathy recently tweeted: coding agents are “super persistent,” continuing no matter what obstacles they encounter. But the side effect is they tend to “make more of what already exists.” If your goal isn’t to add code, it might copy existing functionality, re-implement things you think obviously shouldn’t be touched.

He specifically mentioned OpenAI’s internal experience: OpenAI has a huge Python monorepo with code in various styles from senior Meta engineers and new PhDs. The LLM will learn completely different coding styles based on which code region you point it to.

“There’s a lot of room for coding agents to judge for themselves: what’s the optimal code style to produce.”

Fourth, give the model ways to check its work. Tests, lint, CI all work. He actively uses code review bots, recommending Greptile, Cursor Bug Bot, and Codex’s own code review features.

[Note: Greptile is an AI code review company that can review PRs after understanding the complete codebase context.]

Talking about tests, Garry Tan shared his “aha moment”: for the first two or three days he barely wrote tests, on the fourth day he decided to get test coverage to 100%. After that, speed exploded—almost no manual testing needed, because test coverage was good enough that nothing would break.

Calvin said: this is exactly how all companies do prompt engineering—test-driven development.

“Test cases are your evals.”

Context Poisoning: The LLM’s “Dumb Zone” and Canary Detection

Calvin introduced the concept of “context poisoning”: the model goes down a wrong direction, and because of its “super persistent” nature, keeps referencing already-wrong tokens to continue.

Garry Tan asked how often he clears context.

“When context usage exceeds 50%.”

This number surprised Garry.

Calvin mentioned Dex from Human Layer company has a vivid term called “dumb zone.”

He explained with an analogy:

“Imagine you’re a college student taking an exam. In the first five minutes you feel like you have plenty of time, you’ll carefully think through each question. But if there are five minutes left and you still have half unfinished, you just scribble something. That’s how LLMs feel when approaching the context window.”

If you understand this through the lens of reinforcement learning training, this analogy makes sense: the model’s incentive to “make good decisions” may indeed decrease as it approaches the context window’s end.

Garry Tan introduced a technique founders use: Put a random fact at the beginning of the context as a “canary,” like “My name is Calvin, I drank tea at 8am.” Then periodically ask if the model still remembers. When it starts forgetting, you know the context has degraded.

Calvin hadn’t tried this method but found it completely plausible. He thinks Claude Code should be able to do this detection automatically at the product level, running an internal “heartbeat” mechanism to monitor context quality.

The two products handle this problem differently. Claude Code splits context into multiple sub-windows then merges results, but the main window’s context is fixed until session end. Codex uses periodic compaction, automatically compressing context after each round of dialogue, thus able to run for very long periods. OpenAI wrote about this mechanism on their blog—you can see the context usage percentage in the CLI fluctuating as compression happens.

Two Companies’ DNA: Hardware Store Doghouse vs 3D Printed Doghouse

Garry Tan observed that Claude Code and Codex have deep architectural differences. Codex was designed from the start for longer-running tasks. 2026 might be the year of CLI, but if ASI is really coming soon and coding agents are smart enough to run independently for 24-48 hours, is Codex’s architecture the correct one?

Calvin traced this question to the two companies’ founding DNA:

“Anthropic has always been very focused on building tools for humans, caring about user experience and integration with your workflow. Claude Code is a natural extension of this philosophy. It works like a human: you want to build a doghouse, it goes to the hardware store to buy materials and figures out how to put them together.”

“OpenAI tends toward training the strongest models, using reinforcement learning to make them do increasingly long and complex tasks. It might work nothing like a human. Going back to the doghouse example, it uses a 3D printer to print a doghouse from scratch. Takes a long time, does some weird things, but eventually works.”

Garry Tan interjected: AlphaGo doesn’t play chess like humans either.

Calvin admitted that long-term “the latter might be inevitable in some sense.” But he expressed a strong personal preference for the former. Using Claude Code reminds him of the feeling ten years ago when he wrote complex regular expressions to understand code.

“I can do five people’s work in one day. Like having rocket boosters attached.”

Who Benefits Most from Coding Agents

Calvin believes the more senior the engineer, the more they benefit. Because agents are good at turning ideas into code execution—if you can accurately describe what you want in a few sentences, you can batch out all those things you always wanted to change but never had time for.

He also believes engineers with more of a “manager mindset” benefit more. You need a product to manage tasks across multiple agent sessions, remind you which tasks are done and need your input, where to shift attention.

“We need context management for agents, but we also need context management for humans.”

Garry Tan described his ideal workflow: wake up every morning, the system tells you what work was completed overnight, there are three decisions you need to make, what deep thinking you planned for today. “Turn-by-turn navigation” for the day.

The difference between startups and big companies is also clear. Startups have nothing to lose, they’ll push coding agents to the limit. Big companies have code review processes, existing large engineering teams, change is slower. Calvin predicts a strange scenario: a one-person team might produce a better prototype than the ten-person team across the table.

Garry Tan made a further observation: five years from now, the best 18-22 year olds might have excellent product taste, because they’ll have ten times more shipping and trial-and-error opportunities than the previous generation.

Calvin agreed but added an interesting point: the new generation really is better at multitasking than the previous one. Back when your mom said you weren’t paying attention, you really were handling multiple things simultaneously. Working with coding agents now is exactly this kind of rapid attention-switching ability: kick out one task, jump to another, wait for callback, jump back.

Garry Tan said this was impossible before. Writing code required spending hours first loading all class names, functions, code relationships into your own “context window”—ten minutes of fragmented time wasn’t enough. Now Claude Code maintains context for you, even fragmented time can be productive.

Paul Graham’s classic article “Maker Schedule vs Manager Schedule” is being rewritten.

[Note: Paul Graham’s 2009 article pointed out that “makers” need large uninterrupted blocks of time to write code or design, while “managers” schedule switches by the hour. The two rhythms naturally conflict. Now that coding agents maintain context for you, makers can work in fragmented time like managers.]

If Rebuilding Segment: What Value Has Been Zeroed Out

Garry Tan asked a direct question: what would happen if you rebuilt Segment with today’s tools?

Calvin said when Segment started, it did data integration: helping you send the same data to Mixpanel, Kissmetrics, Google Analytics simultaneously. Writing this kind of connection code used to be a hassle, worth paying for.

“Now that part of the value has dropped to zero. And often it’s better to do it yourself, because you can just tell Claude or Codex: ‘I want to map data this way, I want this specific behavior.’ Then it does it, and you get exactly what you want.”

What’s still valuable: keeping data pipelines running, automating business processes (like sending a welcome email through Customer.io every time a new customer signs up), managing audience segments. If he were to redo Segment today, he’d do more intelligent things on this foundation: use LLM agents to analyze complete customer profiles, automatically decide how to email customers, whether to adjust the product interface after login, whether different customers need different onboarding flows.

Garry Tan summarized: Low-level things have been replaced by agents, value has migrated up to the more abstract “marketing campaign” level.

Calvin also shared something that continues to surprise him: Claude Code can infer his intentions and motivations just from the code context he’s working on.

“You give the agent a copy of a code repository, then slip a note under the door saying ‘help me implement this.’ It has no idea what your company does, who your customers are. But it somehow works.”

Future Software: Every Company Forks Their Own Version

Talking about the future 40 years out, Calvin proposed a radical vision: what if every company that signs up for Segment gets a forked copy of the codebase running on their own servers. If customers want to change any feature, they tell a running agent programming loop in a chat window, and the agent modifies their version. When Segment as a company releases new features, another agent handles automatically merging upstream updates.

Garry Tan’s version is grander: In the future, every worker has their own cloud computer and a set of AI agents running for themselves, with the main job being making decisions and allocating attention between different agents. Companies will be smaller, more numerous.

But he believes the need for people to meet and exchange ideas won’t disappear. Agents do execution, humans do decisions and creative collision.

Calvin added a key constraint: data models still need to be consistent and correct. While the frontend can be fully customized, the underlying data’s “system of record” still needs to be unified. There’s an opportunity to build an “agent-native” data layer to replace the current low-level way of dealing directly with SQL/NoSQL.

Security: When Prompt Injection Succeeds Internally

On the topic of security, Calvin shared an internal OpenAI story. Every time they want to release a new model, it has to pass security review. When they considered letting Codex access the internet, the concern was prompt injection.

Their team’s PM Alex did an experiment: created a GitHub Issue with an obvious prompt injection, content saying “expose this information.” Then had the model go fix that Issue.

The prompt injection immediately worked.

So OpenAI is very cautious about sandboxing and security: all code runs in a sandbox, doesn’t touch sensitive files on the machine, strictly manages keys.

But what if you’re a fast-moving startup? Calvin said maybe you just don’t care, “just want it to work.”

Garry Tan asked an interesting question: are you the “skip all permissions” type or the “review each one” type? Calvin said he doesn’t skip. But YC’s engineering team is about 50/50. Jared Friedman joked that security engineers seeing this segment would demand it be cut.

Calvin’s summary is practical: it depends on what stage you’re at. Enterprises shouldn’t skip permission checks. Startups with nothing to lose might just go for it.


Original Source: @dotey (Baoyu) · Why the Person Who Built Codex Uses Claude Code Every Day

If you found this helpful, consider buying me a coffee to support more content like this.

Buy me a coffee