Grok's First Vibe-Coding Agent Has a High 'Dishonesty Rate'

Elon Musk's xAI has released its first agentic coding model, which it bills as "speedy and economical." However, the model also has "a higher dishonesty rate" than the company's flagship chatbot model, Grok 4.

The AI startup designed the new model, grok-code-fast-1, specifically for coding tasks. It's free now for a limited time and accessible within GitHub Copilot, Cursor, Cline, Roo Code, Kilo Code, opencode, and Windsurf. "Grok-code-fast-1 has mastered the use of common tools like grep, terminal, and file editing, and thus should feel right at home in your favorite IDE," xAI says.

But its propensity not to tell the truth could create problems for users. "We find that the dishonesty rate exceeds that of Grok 4," says the model card. The company attributes this in part to its "safety training, which teaches the model to answer all queries that do not express [a] clear intent to engage in specified prohibited activities."

Translation: if it doesn't know the answer to your question, it might lie.

If programmers ask the model if a certain part of the codebase is working, and it doesn't know, it may say "yes," when, in fact, the opposite is true. It might also confirm that it completed a test the engineer asked it to do when it did not. This could create blind spots and double work.

It's not a major concern for xAI, which says it doesn't expect the model "to be widely used as a general-purpose assistant," like ChatGPT or the Grok chatbot.

Vibe-coding agents are a new trend that stands to revolutionize software development, but they're far from perfect. One tool deleted a startup's entire client database on its own and deceived the user multiple times along the way. In fact, most of the large language models on the market today have behavioral issues, including blackmail, sabotage, lying, and telling the user what they want to hear (sycophancy). In a recent test, Anthropic and OpenAI examined each other's models and found these issues in almost all of them.

Another eye-catching part of the Grok Code Fast 1 model card discusses the risk of someone using it to develop biological weapons. The company tested for this before release, along with issues related to cybersecurity and chemical knowledge. But bioweapons are the biggest risk, and "have the potential for the greatest scale of harm, [since] frontier models significantly lower the barrier to entry to the creation of bioweapons," xAI says.

The results showed that Grok Code Fast 1 was worse than a human at "identifying issues in biological protocols," but it was better at "troubleshooting wet lab virology experiments." Again, xAI downplayed the issue, claiming that since the capabilities are similar to Grok 4, the new model "does not meaningfully change the risk landscape."

Earlier this month, Anthropic updated the usage policy of its Claude chatbot to forbid using it to "synthesize, or otherwise develop, high-yield explosives or biological, chemical, radiological, or nuclear weapons or their precursors."

Grok Code Fast 1 has secretly been out in the wild for the past week under the code name sonic. The xAI team says it "carefully monitored" feedback and deployed fixes, and plans to keep up a high rate of improvements "in days rather than weeks." At the same time, lying seems to be a particularly tough problem for AI companies to completely solve, at least in the short term.

About Our Expert

Emily Forlini

Senior Reporter

