While GPT-5 faced mixed initial reception, something interesting happened over the following weeks:
Token Usage: GPT-5 climbed OpenRouter’s popularity metrics while developers reported usage shifts
Reliability: Users noticed that Claude confidently introduced bugs, while GPT-5 solved problems Claude couldn’t.
The difference? GPT-5 researches before it acts. Claude acts before it researches.
Let me show you why this research-first approach is reshaping how experienced developers think about AI reliability.
Why GPT-5’s High Initial Token Usage Actually Signals Superior Intelligence
Most developers abandoned GPT-5 after the first conversation. When it consumes 40% of the context window upfront (as reported in Cursor conversations, for example), they think “this is broken” and return to Claude’s immediate confidence.

The first gpt-5-high message in Cursor allocates a lot of context but makes the model work a lot better!
Here’s what they miss: GPT-5’s apparent “weakness” is actually sophisticated adaptive intelligence.
Think of two medical scenarios. The confident resident walks in, gives an immediate diagnosis in 2 minutes, starts treatment. The expert surgeon says “I need to understand what’s happening first,” spends 20 minutes researching, then acts with precision.
Which would you trust with your life? GPT-5 is the expert surgeon of AI models.
The breakthrough insight: OpenAI’s adaptive reasoning system adjusts token consumption based on task complexity. Simple tasks use minimal reasoning tokens for speed. Complex tasks automatically scale up thinking effort for accuracy.

GPT-5-Codex uses significantly fewer tokens for simple tasks, and more tokens for complex tasks. Source.
This explains the 40% initial usage pattern. GPT-5 detects complex coding contexts and allocates appropriate reasoning resources. The system literally thinks harder when tasks require it — and thinks less when they don’t.
Experienced developers who persisted through GPT-5’s methodical start discovered something crucial: follow-up prompts only add 1–2% to context usage. The model speeds up dramatically as it builds confidence through initial research.
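This usage pattern can be sketched numerically. A minimal sketch, assuming the rough figures reported above (40% upfront, roughly 1–2% per follow-up); these are illustrative constants taken from the article, not measured API values:

```python
def context_usage(n_followups: int, upfront: float = 0.40, per_followup: float = 0.015) -> float:
    """Cumulative fraction of the context window used after an upfront
    research phase plus n follow-up prompts.

    The defaults mirror the pattern described above (40% initial research,
    ~1.5% per follow-up) and are illustrative, not measured constants.
    """
    return upfront + n_followups * per_followup

# Even after 20 follow-ups, the heavy initial investment still dominates:
# roughly 0.40 + 20 * 0.015 = 0.70 of the window.
print(f"{context_usage(20):.2f}")  # → 0.70
```

The point of the sketch: the cost curve is front-loaded, so judging the model by its first message alone overstates its ongoing consumption.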
The architecture difference: GPT-5 starts with zero confidence, assumes nothing about task complexity, then allocates reasoning tokens appropriately. Claude starts confident regardless of complexity and maintains steady token usage.
The result: Development teams consistently report significantly fewer bugs with GPT-5 than with Claude, which tends to miss repeated instances of the same pattern across a codebase.
How Claude’s Confidence-First Architecture Creates Production Failures
The industry made a dangerous assumption: confidence equals competence. Claude’s immediate, authoritative responses feel superior, so developers choose it. But confidence without deep understanding creates systematic blind spots.
The evidence against confidence-first approaches mounted throughout late August. Anthropic’s own postmortem details infrastructure bugs affecting Claude from August 5 through September 4, 2025. The problems included routing errors, output corruption inserting random Thai characters, and compiler bugs causing incorrect token selection.

Critical timing: The impact increased significantly from August 29 to September 4 — exactly the window in which Anthropic, while attempting to fix the earlier bugs, introduced worse ones.
But here’s what’s most revealing: Anthropic’s language choices minimize the impact. They say “users may have seen lower intelligence” and reference “some Claude responses” — strategic wording that avoids specific percentages.
Industry observers note: When companies use vague language instead of specific metrics, it typically indicates the impact was more significant than they want to disclose.
Meanwhile, developers consistently report Claude fixes one bug instance but misses the same pattern in three other codebase locations. This happens because Claude’s architecture optimizes for fast, confident responses without investing in deep system understanding.
How Senior Developers Use GPT-5’s Reasoning Effort
Here’s how experienced developers optimize GPT-5’s adaptive intelligence:
Think specialist team versus overconfident generalist. Complex projects need the right reasoning level for each phase, not a one-size-fits-all approach.
GPT-5’s Adaptive Reasoning Levels:
Architecture/Planning/Refactoring: High reasoning effort → Deep reasoning time for complex decisions → High tokens justified for quality outcomes
Daily Development: Medium reasoning effort → Balanced speed and thoughtfulness → Moderate tokens for steady progress
Simple Tasks: Low reasoning effort → Execution-focused approach → Minimal tokens for quick delivery
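The phase-to-effort mapping above can be sketched in code. A hedged sketch: the phase names are illustrative choices, not an official taxonomy, and the `reasoning_effort` request parameter mentioned in the comment should be verified against OpenAI’s current API documentation before relying on it:

```python
# Map a development phase to a reasoning-effort level, following the
# three tiers described above. Phase names are illustrative, not an
# official taxonomy.
EFFORT_BY_PHASE = {
    "architecture": "high",         # deep reasoning for complex decisions
    "planning": "high",
    "refactoring": "high",
    "daily_development": "medium",  # balanced speed and thoughtfulness
    "simple_task": "low",           # execution-focused, minimal tokens
}

def effort_for(phase: str) -> str:
    """Return the reasoning effort for a phase, defaulting to medium."""
    return EFFORT_BY_PHASE.get(phase, "medium")

# In an actual request, this value would be passed along the lines of:
#   client.chat.completions.create(
#       model="gpt-5",
#       reasoning_effort=effort_for("refactoring"),
#       ...,
#   )
print(effort_for("refactoring"))  # → high
print(effort_for("simple_task"))  # → low
```

Defaulting unknown phases to medium matches the “daily development” tier, which is the sensible middle ground when you can’t classify the task.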
Development teams also discovered GPT-5’s superior context retention. While Claude forgets early instructions by the time 30% of the context window is used, GPT-5 still remembers cursor rules and configuration at 50–60% context depth.
The transferable principle: Match computational thinking depth to task complexity requirements rather than applying constant confidence levels.
The principle extends beyond AI models: In any complex domain, adaptive systems that match effort to complexity outperform static approaches optimized for average cases.
Your next complex coding project is an opportunity to test this approach. Start with high reasoning effort to understand the system deeply, then scale down as implementation becomes clear. Let the AI think harder when complexity demands it — and conserve resources when it doesn’t.
The future belongs to developers who understand this distinction.
Stay curious,
Luke