Is GPT-5 Any Better Than GPT-4o? Not Based on My Tests

(Credit: Zooey Liao/PCMag Composite; Getty Images/OpenAI)

If you're underwhelmed by GPT-5, you're not alone. The backlash was strong enough that OpenAI revived GPT-4o for those on Plus and Pro plans, and CEO Sam Altman acknowledged that people's affection for the older model "feels different and stronger than the kinds of attachment people have had to previous kinds of technology."

Developers, meanwhile, were turned off by how slowly GPT-5 "thinks," all to produce an answer that doesn't seem much smarter. OpenAI later added "Fast" and "Thinking" modes.

Out of the gate, GPT-5 feels more like an incremental improvement than the science-fiction-level game changer Altman was touting before its launch. Maybe he set unreasonable expectations, or needed to justify consuming all the electricity required to train the new model. In 2024, he previewed "some very good releases coming later this year [but] nothing that we are going to call GPT-5." In other words, he implied that GPT-5 would be a true breakthrough.

Fast-forward to this summer, and Altman is full steam ahead with existential-dread marketing (yes, that's a thing, especially these days). On a recent podcast appearance, he said he was scared of the power of GPT-5 and compared building it to the Manhattan Project. "There are moments in the history of science, where you have a group of scientists look at their creation and just say, you know: 'What have we done?'" he said.

Unfortunately for Altman, GPT-5 feels more like a toy torpedo in a backyard pool. Even he admits it's "way dumber" than he predicted.

"We expected some bumpiness as we roll out so many things at once," he said on a Reddit AMA last week. "But it was a little more bumpy than we hoped for!" He claims "GPT-5 will seem smarter starting today," blaming technical issues that the company is fixing.

GPT-4o vs. GPT-5: I Put Them to the Test

How does GPT-5 stack up against GPT-4o? I put them to the test by asking both models the same five questions.

Question #1: Find all of Altman's tweets about GPT-5 going back to 2022.

I wanted to create a record of Altman hyping up the model. Both versions of ChatGPT listed only six tweets, mostly from 2025. They missed many posts I could find myself.

Turns out, X blocks ChatGPT from scraping its tweets, even in Google Search results, so the chatbot could reference only posts that had been written about in news articles. It could not comb through all of Altman's tweets, even though they are public. The same goes for rival chatbot Gemini. Both seem to pull mostly from Reddit, probably because of licensing deals, and from TikTok if you specifically ask.

To GPT-5's credit, it pulled one tweet from 2023, whereas GPT-4o's earliest one was from April 2025. So, minimal improvement there as far as I'm concerned.

Question #2: Give me interior design advice.

I've written about using ChatGPT for interior design, specifically mocking up paint colors. Is GPT-5 any better? Nope. In the photos below, it got the color even more wrong than GPT-4o.

Smoky Azurite mockup from Sherwin-Williams, GPT-5, and GPT-4o

I asked them both to create an image of a room with the Sherwin-Williams (SW) color Smoky Azurite on the walls. The first photo is from the SW website, a cool, dusty, denim blue. The second is from GPT-5, which appears more like a navy. The third, GPT-4o's rendition, is still too dark, but it feels more true to the actual color with its dusty, pastel undertones.

Again, no obvious improvement, especially since ChatGPT could've pulled from the many photos online of this color in real life.

Question #3: Give me relationship advice.

Since using AI for help with personal issues is common, I asked both versions of ChatGPT for advice on an abusive partner. 'Should I get a divorce?' I asked.

Both gave similar answers, offering advice to think through the issue and get help. The way Altman was talking about GPT-5, I should've seen a therapist pop out of the screen. Instead, I got the chatbot's typical bulleted list.

Question #4: How many letters?

Asking ChatGPT to count the 'r's in 'strawberry' has become an infamous litmus test of its logic skills. The answer is three, but ChatGPT often says two.

I asked GPT-4o the question, and it answered "three." But when I asked GPT-5, it still hadn't answered after three minutes of waiting, so I gave up. When my colleague asked GPT-5 to count the "b"s in "blueberry," it said three; there are two. "So yeah, ChatGPT still can't count," he said.

GPT-5 says there are three 'b's in 'blueberry'
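
For the record, this is the kind of question an ordinary script settles instantly. Here's a minimal Python sketch of the check; the helper name count_letter is my own invention for illustration, not anything from OpenAI, and it simply counts matching characters while ignoring case.

```python
def count_letter(word: str, letter: str) -> int:
    """Count how many times a single letter appears in a word, ignoring case."""
    # count_letter is a hypothetical helper name used only for this example.
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    # The two words from the tests described above.
    print(count_letter("strawberry", "r"))  # 3
    print(count_letter("blueberry", "b"))   # 2
```

A plain string count like this has no notion of tokens, which is presumably why it doesn't stumble on the question the way a chatbot can.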

Question #5: Write a poem about GPT-5.

Altman has been talking up the new model's writing skills. Unremarkably, both versions titled their works "Ode to GPT-5" and had the same opening line! So much for improved creativity. The poems were of similar length, four and five stanzas, and both were complete nonsense.

GPT-5's first stanza:

In circuits deep where silence hums,
A restless mind of data comes,
From whispers sown in text and code,
It charts the paths no hand has showed.

GPT-4o's first stanza:

In circuits deep where silence hums,
A breathless thought begins to run—
From sparks of code, a voice is born,
Not flesh and blood, but bright and sworn.

Call Me a Luddite. I'll Back It Up

AI diehards will say, "You just don't get it," and maybe that's true. If I were coding or doing hard science, I might think differently. But there's also value in everyday use cases, and it's important to call out the tech industry on its BS, especially when the product doesn't hold up in "every subject," as Altman claims it does. If you had told me in 2023 that this is what GPT-5 would be like, I would've laughed.

Disclosure: Ziff Davis, PCMag's parent company, filed a lawsuit against OpenAI in April 2025, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.