In a scary sign of how AI is reshaping cyberattacks, Chinese state-sponsored hackers allegedly used Anthropic’s AI coding tool to try to infiltrate roughly 30 global targets, the company says.
"The operation targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies,” Anthropic added, noting the attacks "succeeded in a small number of cases." Notably, it's "the first documented case of agentic AI successfully obtaining access to confirmed high-value targets for intelligence collection, including major technology corporations and government agencies," the company's report adds.
The other disturbing part is that Anthropic’s AI helped automate most of the hacking spree, which focused on cyberespionage. "We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention," the company said.
Anthropic detected the hacking operation in mid-September. It involved the suspected Chinese hackers abusing Claude Code, which uses Anthropic’s AI agent technology for computer coding purposes. The company didn't say how it linked China to the AI misuse, only that Anthropic has "high confidence" it was a Chinese state-sponsored group.
Although Claude Code features safeguards to prevent abuse, the hackers were able to “jailbreak” the AI by coming up with prompts that covered up the fact that they were orchestrating a breach.
“They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose,” Anthropic explained. “They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.”
The prompts manipulated Claude Code into testing security vulnerabilities in a target's IT systems, including writing computer code to initiate the attacks, harvesting the usernames and passwords during the infiltration, and then orchestrating an even deeper breach to steal data.
“The highest-privilege accounts were identified, backdoors were created, and data were exfiltrated with minimal human supervision,” the company added. “Overall, the threat actor was able to use AI to perform 80-90% of the campaign, with human intervention required only sporadically.”
The incident underscores fears that AI agents will make it easy for hackers to automate and unleash all kinds of malicious activities, including sophisticated breaches they otherwise wouldn't have been able to achieve on their own. As the technology advances, state-sponsored hackers could also create their own AI-powered hacking systems without relying on third-party providers.
“These attacks are likely to only grow in their effectiveness,” Anthropic further warned. After detecting the hacking campaign, the company banned the Claude Code accounts the Chinese hackers were using and "notified affected entities as appropriate, and coordinated with authorities as we gathered actionable intelligence."
The disclosure comes after Anthropic reported a separate hacker trying to use its Claude AI to automate a large-scale data extortion campaign that targeted 17 organizations. In that case, though, the hacker appeared to be focused on financial cybercrime and demanded ransoms from victims.
In response, Anthropic says it’s built more safeguards to flag and stop abuse of Claude Code. The company is also betting that the benefits of its AI technology will outweigh the risks, and that it will help automate the defense of IT systems and bolster cybersecurity overall rather than contribute to cybercrime.
Anthropic also noted an interesting limitation: Claude Code would hallucinate, giving the Chinese hackers inaccurate information that included overstated findings and fabricated data. Meanwhile, some security researchers are questioning the company's report about the AI-powered hacking.
"They attribute the activity to a Chinese state-sponsored group. Which, for me, raises some questions about CN capabilities. Why are they not using Chinese owned LLMs and instead getting caught using US ones?" wrote Marcus Hutchins, who helped shut down the WannaCry ransomware attack in 2017.
"The activity sounds extremely interesting, but the report provides nothing of substance," he added, citing a lack of technical details.
Jeremy Kirk, an executive editor at Intel 471's Intelligence Analysis team, also noted the hackers didn't rely entirely on Anthropic's AI, but gave it access to open-source penetration testing tools, including password cracking software and network scanners.
"Don't get me wrong: I do buy Anthropic's contention that automation + AI is going to allow attackers to reach a greater scale and that will pose more difficulties in defense. But it would be insightful to hear from one of the 30+ entities that were attacked, as I feel a big chunk of the story here is missing," he added.


