Anthropic's Claude 4 Models Can Write Complex Code for You

(Credit: NurPhoto / Contributor / NurPhoto via Getty Images)

Anthropic released two new Claude models today with a focus on coding and software development.

Claude Opus 4 and Claude Sonnet 4 aim to set "new standards for coding, advanced reasoning, and AI agents," Anthropic says. The new models can "deliver superior coding" and respond more precisely to user instructions. They can "think" through complex problems more deeply, and search the web along the way.

Opus 4, in particular, is "the world's best coding model," Anthropic says, and can operate without human intervention. When shopping app Rakuten tested Opus 4, it ran independently for seven hours. Many companies are rapidly adopting AI models for this purpose. Microsoft says 30% of its code is already written by AI, and Meta aims for 50% by 2026.

"These models are a large step toward the virtual collaborator—maintaining full context, sustaining focus on longer projects, and driving transformational impact," says Anthropic.

Anthropic did not increase the price for developers who access the models through its API. Opus 4 costs $15/$75 per million tokens (input/output), and Sonnet 4 costs $3/$15. OpenAI's o3 model, which also promises "leading performance on coding," sits between the two at $10/$40.
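To make those per-token rates concrete, here is a rough Python sketch that estimates what a single job would cost on each model. The prices are the ones quoted above; the workload (200,000 input tokens and 50,000 output tokens) and the `request_cost` helper are illustrative assumptions, not figures or code from Anthropic or OpenAI.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "OpenAI o3": (10.00, 40.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one job from its token counts (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 200,000 tokens in, 50,000 tokens out.
    for model in PRICES:
        cost = request_cost(model, 200_000, 50_000)
        print(f"{model}: ${cost:.2f}")
```

Under that assumed workload, Opus 4 comes to $6.75, o3 to $4.00, and Sonnet 4 to $1.35. Real bills depend on how many tokens a task actually consumes, which varies widely.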

Claude Code is also now available to everyone with this release. It integrates the AI model into developers' existing tools and helps them get their work done. Once it is installed, Claude's proposed edits appear inline.

It seems like every AI company these days is offering its "biggest and smartest model yet." Anthropic backs up its claims by noting Claude 4 is the best on two benchmarks, SWE-bench (72.5%) and Terminal-bench (43.2%). In the chart below, OpenAI models and Google Gemini 2.5 Pro trail in performance.

Claude 4 model scores on key software engineering benchmarks.

Since AI benchmarks are notoriously difficult for the layperson to understand, Anthropic has resorted to portraying its progress through video games. It built a way for its models to play Pokémon Red autonomously, livestreamed via Twitch. The Sonnet 3.7 model progressed further in the game than Sonnet 3.5, and now Anthropic says the Claude 4 models are playing the best yet, thanks to a new ability to store "memory files" of key information.

"This unlocks better long-term task awareness, coherence, and performance on agent tasks—like Opus 4 creating a 'Navigation Guide' while playing Pokémon," Anthropic says.

Claude Opus 4 records key information to help improve its game play.

About Our Expert

Emily Forlini

Senior Reporter
