
Go, Claude! Twitch Fans Cheer on an AI Playing Pokémon Red Surprisingly Well

Thousands tune in to a livestream of Claude 3.7 Sonnet navigating the game on its own, and doing so better than its predecessor. Anthropic tells PCMag it's a more effective way of measuring AI progress.

By Emily Forlini, Senior Reporter

(Credit: NurPhoto / Contributor / NurPhoto via Getty Images)

A livestream of an AI model playing Pokémon Red on Twitch is captivating audiences this week.

The model is Anthropic's latest release, Claude 3.7 Sonnet, which is navigating the classic Game Boy game with no prior training.

"HE'S DOING IT," says one onlooker in the live chat. "Let's see what happens now," another adds. "GO, CLAUDE, GO!"

Claude 3.7 Sonnet plays Pokémon Red live
(Credit: Twitch)

Although the livestream page claims the experiment is "a passion project made by a person who loves Claude and loves Pokémon," it was actually set up by Claude's creator, Anthropic.

The idea to unleash Claude on Pokémon Red began internally at Anthropic in 2024, with an earlier model called Claude 3.5 Sonnet. The project "gained a cult following within the company," David Hershey, a member of Anthropic's technical staff, tells PCMag. "The livestream on Twitch was a natural extension of that internal enthusiasm... Our team quickly created the ongoing livestream so anyone could watch Claude attempt to catch 'em all."

Claude 3.7 Sonnet is getting further in the game than its predecessor, Claude 3.5 Sonnet. While Claude 3.5 Sonnet could catch Pokémon and leave the starting area of Pallet Town, the "real breakthrough" with Claude 3.7 Sonnet is that it can complete challenges, collecting three badges from Pokémon gym leaders, Hershey says.

Video game progress is a lot easier to understand than the typical AI improvement metrics that OpenAI, xAI, Google, and other AI companies release with each new model.

Claude 3.7 Sonnet specs
(Credit: Anthropic)

That's why Anthropic included its new model's gaming chops in the 3.7 Sonnet announcement. "We're slowly moving away from traditional benchmarks in favor of more 'accessible' tests that can be understood by a larger group of people," says Dianne Penn, lead product manager of research at Anthropic. "We're at a point where standard evaluations don't tell the full story of how much more capable each version of these models are."

Measuring the nuances of AI model improvement is a difficult task. This week, OpenAI admitted it struggled to measure the improvements of its latest model, GPT-4.5, and had to develop its own testing scale for "vibes," or humanlike behavior.

Diagram of how Claude plays the game
(Credit: Twitch)

When playing Pokémon Red, Claude can perform actions with the main game buttons (A, B, Up, Down, Left, Right, Start, Select) and navigate to specific coordinates on the screen. It takes screenshots and processes the images to understand its surroundings. As it plays, it updates its knowledge base with new information and keeps building upon it.
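
For readers curious what that loop might look like in code, here is a minimal Python sketch. It is not Anthropic's implementation: the `StubEmulator` class, the `scripted_policy` function (which stands in for the model), the dictionary-shaped actions, and the obstacle-free grid are all assumptions made for illustration. Only the eight buttons, the screenshot step, the coordinate navigation, and the growing knowledge base come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# The eight buttons named in the article.
BUTTONS = ("A", "B", "Up", "Down", "Left", "Right", "Start", "Select")

# How each direction button moves the player on the stub grid.
MOVES = {"Left": (-1, 0), "Right": (1, 0), "Up": (0, -1), "Down": (0, 1)}


@dataclass
class StubEmulator:
    """Hypothetical stand-in for a Game Boy emulator (a bare grid, no obstacles)."""
    x: int = 0
    y: int = 0
    presses: List[str] = field(default_factory=list)

    def press(self, button: str) -> None:
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        self.presses.append(button)
        dx, dy = MOVES.get(button, (0, 0))
        self.x += dx
        self.y += dy

    def screenshot(self) -> Dict[str, int]:
        # A real setup would return pixels; here the "image" is just the position.
        return {"x": self.x, "y": self.y}


def path_to(start: Tuple[int, int], target: Tuple[int, int]) -> List[str]:
    """Expand a 'navigate to coordinates' request into individual button presses."""
    (sx, sy), (tx, ty) = start, target
    horizontal = ["Right"] * (tx - sx) if tx >= sx else ["Left"] * (sx - tx)
    vertical = ["Down"] * (ty - sy) if ty >= sy else ["Up"] * (sy - ty)
    return horizontal + vertical


def run_agent(emulator: StubEmulator,
              choose_action: Callable[[Dict[str, int], List[str]], Optional[dict]],
              max_steps: int = 10) -> List[str]:
    """Observe, decide, act, and record a note; repeat until the policy stops."""
    knowledge_base: List[str] = []
    for _ in range(max_steps):
        observation = emulator.screenshot()
        action = choose_action(observation, knowledge_base)
        if action is None:
            break
        position = (observation["x"], observation["y"])
        if action["type"] == "press":
            buttons = [action["button"]]
        elif action["type"] == "navigate":
            buttons = path_to(position, action["target"])
        else:
            raise ValueError(f"unknown action type: {action['type']}")
        for button in buttons:
            emulator.press(button)
        knowledge_base.append(f"at {position}: {action['type']}")
    return knowledge_base


def scripted_policy(observation: Dict[str, int],
                    knowledge_base: List[str]) -> Optional[dict]:
    """Placeholder for the model call: walk to (2, 1), press A once, then stop."""
    if (observation["x"], observation["y"]) != (2, 1):
        return {"type": "navigate", "target": (2, 1)}
    if not any(note.endswith("press") for note in knowledge_base):
        return {"type": "press", "button": "A"}
    return None
```

Calling `run_agent(StubEmulator(), scripted_policy)` walks the stub player to the target, presses A, and returns the notes collected along the way. In a real system, the scripted policy would be replaced by a call to the model, and the screenshot would be an actual image of the game screen.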

Claude isn't perfect: it sometimes struggles with navigation and loses track of where it is. It doesn't always succeed, either, but human onlookers are finding its solutions to challenges creative. In that sense, it's offering a fresh perspective on how to beat the game, one that humans may not have thought of, along with some good internet fun.

About Our Expert

Emily Forlini

Senior Reporter

My Experience

As a news and features writer at PCMag, I cover the biggest tech trends that shape the way we live and work. I specialize in on-the-ground reporting, uncovering stories from the people who are at the center of change—whether that’s the CEO of a high-valued startup or an everyday person taking on Big Tech. I also cover daily tech news and breaking stories, contextualizing them so you get the full picture.

I came to journalism from a previous career working in Big Tech on the West Coast. That experience gave me an up-close view of how software works and how business strategies shift over time. Now that I have my master's in journalism from Northwestern University, I couple my insider knowledge and reporting chops to help answer the big question: Where is this all going?

My Expertise

I'm the expert at PCMag for on-the-ground feature reporting and trending tech news, with a particular focus on electric vehicles and AI. I've published hundreds of articles and am also a podcast host, a bi-weekly tech correspondent for CBS News, a panel speaker and moderator, and a frequent contributor to a range of news and radio channels around the country.

The Technology I Use

All the latest from Apple and Microsoft, but I'll never give up my wired headphones! 