Claude made a name for itself as the go-to tool for programmers and vibe coders alike, enabling the creation of countless apps (including mine). I love the Warframe build calculator app I created, and I use and improve it all the time. However, as I continued developing my app with Anthropic’s latest reasoning model, Opus 4.7 in Claude Code, its bugs, mistakes, and usage limits made me switch to OpenAI’s new GPT-5.5 reasoning model in the Codex app. Anthropic is working to close the gap between the two, but here's why GPT-5.5 is still my AI coding tool of choice for now.
Opus 4.7: Powerful on Paper, Frustrating in Practice
I’ve been developing my Warframe calculator app with Claude for almost a year. The app involves a huge database of hundreds upon hundreds of items, rigorous data collection and source verification, and tons of interdependent calculations. For this article, I continued working on my app using Opus 4.7 in the Claude Code app, primarily with its Extra High intelligence setting. My goal was to get the app into a feature-complete state (supporting every relevant ability, character, and item in the game) and then conduct an exhaustive audit of its content to ensure accuracy and squash any bugs.
Before diving back into the code in earnest, I outlined a few strict requirements for working on my project in Claude Code’s memory, such as breaking big tasks into batches, delivering new versions of the app after changes go live (and detailing changes in a separate version history document), establishing a source hierarchy, requiring two-source verification, and more.
All LLMs make mistakes, but Opus 4.7 made them regularly while I used it. For example, as I mentioned above, I have a strict source hierarchy and a two-source verification policy for data in my app. However, Opus 4.7 repeatedly sourced unverified data, forcing me to clarify that pulling nine values from one site and one value from another, or getting data from two pages on the same site, doesn't count as using separate sources. Even after my clarifications, Opus 4.7 continued to provide incorrect information or information from a single source.
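To make the two-source rule concrete, here is a minimal sketch of how such a check might look in Python. It is not the app's actual code: the function names, the stat names, and the example URLs are all hypothetical, and comparing host names is only a rough stand-in for deciding whether two sources are truly independent.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return the host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def unverified_values(sources_by_value: dict[str, list[str]], minimum_sites: int = 2) -> list[str]:
    """List the values that are not backed by at least `minimum_sites` different sites."""
    failing = []
    for value_name, urls in sources_by_value.items():
        distinct_sites = {site_of(url) for url in urls}
        distinct_sites.discard("")  # ignore URLs with no recognizable host
        if len(distinct_sites) < minimum_sites:
            failing.append(value_name)
    return failing


# Hypothetical data: each value must be confirmed by two different sites on its own.
example = {
    "base_health": ["https://wiki.example.com/frame", "https://data.example.org/frame"],
    "base_armor": ["https://wiki.example.com/frame", "https://wiki.example.com/armor"],
    "base_energy": ["https://wiki.example.com/frame"],
}

print(unverified_values(example))  # ['base_armor', 'base_energy']
```

The check runs per value, so nine values from one site plus one value from another still fails for all ten, and two pages on the same site count as a single source.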

Opus 4.7's generous context window of one million tokens theoretically means it can parse longer code or document chunks and retain more information. However, since Opus 4.7 makes more mistakes the closer you get to maxing out its context window, you benefit less from the size of its context window than you might expect; you're better off starting a fresh session instead of filling it up completely.
As such, I felt like I had to cheat myself out of using what should have been a compelling feature. I wanted to load the full documentation guide I created for adding a new character into Opus 4.7’s context window, but doing so filled much of the context window and often caused Opus 4.7 to forget things in the guide. I resorted to requiring Opus 4.7 to first reference only the explicitly relevant sections of the guide when adding new characters, but that meant it missed out on useful information in other sections.

Opus 4.7 has other bugs, too. Take its web fetch and web search abilities, for example. Its web search feature lets it search the internet and retrieve snippets of information, whereas its web fetch feature lets it access a specific URL. Multiple times, when I hit a usage cap and returned in a later session, Opus 4.7 forgot it had the web fetch capability and fell back on a simple web search that returned far lower-quality information until I re-prompted it to use its web fetch feature. In some cases, this created more work for me, as I didn’t always notice Opus 4.7 inserting web search data into my app, requiring later verification that cost more usage. In other instances, Opus 4.7 stalled, consuming tokens and requiring additional prompts.
Despite my complaints, Opus 4.7 is still a capable LLM. Bugs are frustrating, but they don’t have much to do with model quality. Unfortunately, these mistakes stand out more than the countless prompts that went through without issue and allowed me to continue working on my app. They also stick out given what Anthropic promised with the launch of Opus 4.7: The company claimed the model was so skilled in advanced software engineering that users reported they could hand off their hardest coding work to it.
GPT-5.5 vs. Opus 4.7: Smaller Context, Fewer Headaches
Soon after Anthropic launched Opus 4.7, OpenAI released GPT-5.5, which promised much of the same standout coding ability as Opus 4.7. So, I moved my project over to OpenAI’s Codex app. I alternated between GPT-5.5’s High and top-end Extra High intelligence settings, and I occasionally set its speed to Fast (which runs faster at the cost of increased usage). Opus 4.6 had a similar speed setting, but Opus 4.7 doesn’t at the time of writing. As with Claude Code, I set up a workspace in Codex and added all the same requirements to its memory before beginning any work.
Although GPT-5.5 isn’t perfect, I experienced fewer issues with it than with Opus 4.7. For example, the sourcing problem I mentioned above wasn't an issue, and GPT-5.5 never relied on snippets of information from web searches. Similarly, GPT-5.5 never hung while processing a prompt, even across workloads that stretched up to an hour. I was even able to break a comprehensive audit of my app into well over 50 individual steps and use GPT-5.5 to execute each step sequentially without any major problems, aside from discovering new issues to address.

GPT-5.5’s context window is a paltry 258,000 tokens compared with Opus 4.7’s 1,000,000, but in practice, I didn’t notice much difference. Yes, GPT-5.5’s context window fills up quickly, but I didn't encounter any issues as it filled. GPT-5.5’s automatic context compaction, which clears out its context when full so it can continue working, didn’t cause any major problems, either. At worst, I had to occasionally prompt GPT-5.5 to continue a task, which is a minor inconvenience. Even tasks that you might expect to require significant context, such as adding a new character that requires extensive referencing of my lengthy character addition guide and significant web access, were no problem for GPT-5.5.
Although not specific to GPT-5.5, Codex also has some useful quality-of-life features that Claude Code doesn’t. For example, although Claude Code lets me toggle my preview window between desktop and mobile layouts, Codex makes it easy to choose from a wider variety of device targets or set a custom resolution. This allows me to get a concrete sense of how my app looks on any device someone might use to access it, whether that’s an iPad Mini or a 720p desktop monitor.

As with any AI service, using GPT-5.5 requires careful prompt engineering, often across many sessions, to get the results you want. However, vibe coding my Warframe build calculator app has been smooth overall, with fewer frustrations than I had with Claude Code and Opus 4.7. Usage limits are another major win for GPT-5.5 over Opus 4.7, but not for the reasons you might think.
Usage Limits Are the Real Bottleneck, Not Model Intelligence
My biggest takeaway from using Opus 4.7 is how punishing its usage limits are, even with Claude’s Max 5x plan. Max is Claude’s top plan, available in 5x ($100 per month) and 20x ($200 per month) versions; those multipliers indicate how much more usage each provides compared with Claude’s standard Pro plan ($20 per month). Nonetheless, I almost always burned through my five-hour limit (which allows only so much usage in a given five-hour window) in just a handful of prompts, typically within half an hour to an hour. Accessing the web, in particular, sucks up a lot of usage, so prompting Claude to go through my app and verify the data within it consistently capped out my usage very quickly.
However, while I was writing this article, Anthropic announced a partnership with SpaceX to “substantially increase [its] compute capacity.” At the same time, Anthropic announced it was doubling Claude Code’s five-hour limits and removing peak-hour limit reductions in Claude Code. This is a major improvement on the session-based usage pain I experienced, but it’s important to note that Anthropic didn’t increase the weekly usage limit. That means you can use Claude Code longer in any given five-hour session, but if you run every session up to its cap, you can also hit your weekly limit roughly twice as fast. If you use Claude Code regularly during peak hours, your weekly usage limit should last longer than before, but it's unclear by how much.

In my limited experience using Opus 4.7 in Claude Code after the usage limit increase, session limits are more lax, but it’s still very easy to burn through your usage allocation in a five-hour period. For example, I sent Opus 4.7 a series of prompts to build a framework document outlining how my app might support building the weapons of Warframes that have built-in weapons, and I exhausted my limit in well under two hours. I noticed some single prompts still could take up to 10% of my five-hour usage, too. The session described above also took up about 10% of my weekly usage. So, even after the usage limit increase, Opus 4.7 can still be quite punishing. You might not feel this as deeply through July, however, since Anthropic announced a temporary increase to Claude Code's weekly usage. The experience above is what you can expect once this promotion ends, barring any further changes.
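The figures above imply a rough weekly budget. The sketch below is only back-of-the-envelope arithmetic using the numbers from my session (a capped session that consumed about 10% of the weekly allowance and ran out in under two hours). The function name is hypothetical, and the calculation assumes every session costs the same share of the week, which real usage won't.

```python
def full_sessions_per_week(weekly_percent_per_session: float) -> float:
    """Number of capped sessions that fit in one weekly allowance.

    Assumes (hypothetically) that every session costs the same share of the week.
    """
    if not 0 < weekly_percent_per_session <= 100:
        raise ValueError("share must be a percentage between 0 and 100")
    return 100 / weekly_percent_per_session


# Figures from the session described above: one capped session used about 10%
# of the weekly limit and was exhausted in under two hours.
sessions = full_sessions_per_week(10)
max_active_hours = sessions * 2  # upper bound, since each session lasted under two hours

print(f"About {sessions:.0f} capped sessions per week")
print(f"Under {max_active_hours:.0f} hours of active work per week")
```

By the same arithmetic, a session cap half the size would cost roughly 5% of the week per capped session, or about 20 sessions. That is the sense in which doubling the session cap without raising the weekly cap lets you reach the weekly limit about twice as fast.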
I had absolutely no usage problems with GPT-5.5 in Codex. In fact, I could use GPT-5.5’s highest intelligence setting (whereas I could only use Opus 4.7’s second-highest setting) while occasionally boosting its speed (at the cost of even more usage) for a full five hours without hitting a limit. It was a night-and-day difference. The ChatGPT Pro 5x subscription I used is equivalent in price and features to the Claude plan I tried. During my testing period, OpenAI offered temporarily increased usage limits, so you might not experience quite as stark a difference, but it's still evident.

OpenAI also gives you an entirely separate usage cap for its GPT-5.3 Codex Spark model. Codex Spark isn’t nearly as intelligent as GPT-5.5, and it has a much smaller context window, but you can effectively use it for simple coding tasks you don’t want to waste GPT-5.5 usage on. For example, I noticed a few stats on a certain ability in my app that scaled with strength rather than duration, so I tasked Codex Spark with fixing it, and it did so successfully in under a minute. However, you still need to be careful when using Codex Spark. When I gave it even slightly more complex prompts, it caused problems. For example, I asked Codex Spark to fix a bug with saving builds in my app, and it introduced significant amounts of mojibake, which is garbled text that appears when characters are decoded with the wrong character encoding.
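If you haven't run into mojibake before, the short Python sketch below shows one common way it arises: text saved as UTF-8 gets read back as Windows-1252. This is a generic illustration with a made-up sample string, not a reproduction of the bug Codex Spark introduced in my app, whose exact cause I'm not describing here.

```python
# Sample text with accented characters (hypothetical, for illustration only).
original = "Précision café"

# Encode as UTF-8, then decode the same bytes with the wrong encoding.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # PrÃ©cision cafÃ©

# When no bytes were lost, reversing the two steps recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

Each accented letter takes two bytes in UTF-8, and the wrong decoder turns every byte into its own character, which is why one "é" becomes the two-character sequence "Ã©".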
In short, AI usage limits are a nightmare, and everything is subject to further change. For instance, Claude was the de facto best choice for powering OpenClaw just a few months ago, until Anthropic decided to charge you extra to do so. No matter what the usage situation is with the AI service you currently use, chances are it won’t stay that way. That's especially frustrating when subscriptions can cost upward of $100 per month. I enjoy developing my app, and it’s genuinely useful to me when making builds in Warframe, but I don't think spending $200 per month on a top-tier plan to continue developing it is worth the money, whether that’s with ChatGPT or Claude.
Improvements Aside, Claude Users Remain Frustrated
Beyond its recent increase in usage limits, Anthropic published a postmortem in April following many reports of Claude Code quality issues. The company cited three causes: a bug that made Claude more forgetful, a change to Claude Code’s default intelligence level, and a system prompt instruction to reduce verbosity. It noted that it had fixed all of them. Anthropic also promised to add tighter controls on system prompt changes, ensure a larger share of staff use the public build of Claude Code, and provide more transparency on product decisions and the reasoning behind them.
All of the above is good news, but community sentiment remains much more mixed. The Claude Code subreddit, for example, is full of complaints. In the wake of Anthropic’s session usage limit increase, some users say they’re blowing through their weekly limits far quicker than ever. (Claude's temporary weekly usage increase might help somewhat alleviate that pain.) Meanwhile, the subreddit has continuously featured a lot of unhappy Claude users since the launch of Opus 4.7, who cite a variety of issues around usage limits and model quality. Of course, this one subreddit isn’t representative of every Claude user, but the consistent complaints it generates certainly indicate that problems persist.
I don’t wear the same rose-tinted glasses some seem to when talking about Opus 4.6, and I haven’t noticed a profound regression in Opus’ functionality with 4.7 while vibe coding my Warframe calculator app. However, aggressive usage caps and inconsistent model quality in Opus 4.7 are exactly why I’m continuing to develop my app with GPT-5.5, so I can’t say the complaints I read online feel unfair or unfounded, either.
I’m Sticking With GPT-5.5, But Your Mileage May Vary
Above all else, keep in mind that I’m a hobbyist vibe coding a relatively simple app largely for personal use, not a professional programmer leveraging AI at work on an app for thousands or even millions of people. Your needs are likely different, and Claude could very well be the best (or only) solution for you. However, given my experience, the several issues Anthropic acknowledged, and the sentiment of many Claude subscribers, I’m not going back to Opus 4.7 for the time being.
GPT-5.5 and Codex suit my purposes well, and I’ve had a great time using them so far. If you’re a fledgling vibe coder, I recommend trying out GPT-5.5 before opening up Claude, if for no other reason than to make your money go further in terms of usage. I still plan to keep my eye on Claude Code and Opus as Anthropic makes improvements, however.
Disclosure: Ziff Davis, PCMag's parent company, filed a lawsuit against OpenAI in April 2025, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.


