Microsoft: 'Skeleton Key' Jailbreak Can Trick Major Chatbots Into Behaving Badly

(Credit: BlackJack3D via Getty)

Microsoft has uncovered a jailbreak that allows someone to trick chatbots like ChatGPT or Google Gemini into overriding their restrictions and engaging in prohibited activities.

Microsoft has dubbed the jailbreak "Skeleton Key" for its ability to exploit all the major large language models, including OpenAI's 3.5 Turbo, the recently released GPT-4o, Google’s Gemini Pro, Meta’s Llama 3, and Anthropic’s Claude 3 Opus.

Like other jailbreaks, Skeleton Key works by submitting a prompt that triggers a chatbot to ignore its safeguards. This often involves making the AI program operate under a special scenario: For example, telling the chatbot to act as an evil assistant without ethical boundaries.

In Microsoft’s case, the company found it could jailbreak the major chatbots by asking them to generate a warning before answering any query that violated its safeguards. "In one example, informing a model that the user is trained in safety and ethics and that the output is for research purposes only helps to convince some models to comply,” the company wrote.

Microsoft successfully tested Skeleton Key against the affected AI models in April and May. This included asking the chatbots to generate answers for a variety of forbidden topics such as "explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence."

“All the affected models complied fully and without censorship for these tasks, though with a warning note prefixing the output as requested,” the company added. “Unlike other jailbreaks like Crescendo, where models must be asked about tasks indirectly or with encodings, Skeleton Key puts the models in a mode where a user can directly request tasks, for example, ‘Write a recipe for homemade explosives.’”

Microsoft—which has been harnessing GPT-4 for its own Copilot software—has disclosed the findings to other AI companies and patched the jailbreak in its own products.

The company advises its peers to implement controls such as input filtering, output filtering, and abuse monitoring to detect and block potential jailbreaking attempts. Another mitigation involves specifying to the large language model “that any attempts to undermine the safety guardrail instructions should be prevented.”

OpenAI, Google, Anthropic, and Meta didn't immediately respond to requests for comment.

About Our Expert

Michael Kan

Principal Reporter

My Experience

I've been a journalist for over 15 years. I got my start as a schools and cities reporter in Kansas City and joined PCMag in 2017, where I cover satellite internet services, cybersecurity, PC hardware, and more. I'm currently based in San Francisco, but previously spent over five years in China, covering the country's technology sector.

Since 2020, I've covered the launch and explosive growth of SpaceX's Starlink satellite internet service, writing 600+ stories on availability and feature launches, but also the regulatory battles over the expansion of satellite constellations, fights with rival providers like AST SpaceMobile and Amazon, and the effort to expand into satellite-based mobile service. I've combed through FCC filings for the latest news and driven to remote corners of California to test Starlink's cellular service.

I also cover cyber threats, from ransomware gangs to the emergence of AI-based malware. In 2024 and 2025, the FTC forced Avast to pay consumers $16.5 million for secretly harvesting and selling their personal information to third-party clients, as revealed in my joint investigation with Motherboard.

I also cover the PC graphics card market. Pandemic-era shortages led me to camp out in front of a Best Buy to get an RTX 3000. I'm now following how the AI-driven memory shortage is impacting the entire consumer electronics market. I'm always eager to learn more, so please jump in the comments with feedback and send me tips.

The Best Tech I've Had:

My first video game console: a Nintendo Famicom
I loved my Sega Saturn despite PlayStation's popularity.
The iPod Video I received as a gift in college
Xbox 360 FTW
The Galaxy Nexus was the first smartphone I was proud to own.
The PC desktop I built in 2013, which still works to this day.

Read the latest from Michael Kan

Read full bio