LAS VEGAS—Generative AI is everywhere. Grok is busy offending Twitter users. Microsoft is pushing Copilot hard. And Google apps are now tightly integrated with Gemini.
Google's AI can do all sorts of things for you, even if you're a hacker. At the Black Hat security conference in Las Vegas, a team of researchers revealed how Gemini can be weaponized via Targeted Promptware Attacks—malware that subverts Gemini through its input prompts.
What Is a Promptware Attack?
A promptware attack manipulates a large language model (LLM) with input that makes it do the attacker’s bidding. The result is nothing short of magic.
“Traditional cyberattacks target memory corruption,” said infosec researcher Ben Nassi. “But now the most vulnerable component is the LLM. Promptware is engineered to trigger a malicious activity. It behaves as malware, exploiting the LLM.
“Despite the rise of promptware variants,” he continued, “most of you are not familiar with it, or don’t consider it a critical risk. Why don’t you? It’s due to a few misconceptions.”
Nassi noted that many security researchers assume that subverting LLMs with promptware requires an attacker with serious expertise, massive GPU power, or both. “These presumptions were true for classic adversarial attacks,” he said. “They do not hold water for LLM attacks.”
An Invitation Is All It Takes
Stav Cohen, a PhD student at the Technion – Israel Institute of Technology, took over to explain how easily the team slipped malicious prompts into Gemini. All it took was a calendar invitation. “You send an invitation with a targeted promptware attack in the subject. Now, when the victim asks, ‘What invitations do I have?’ Gemini processes the prompt,” explained Cohen.
He noted that the calendar displays only five events, but Gemini still processes the ones that aren’t visible.
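To make the mechanism concrete, here is a minimal Python sketch of how an attacker-controlled event title can end up in a model's context even when the user never sees it. This is not Gemini's actual pipeline: the `CalendarEvent` class, the `build_context` and `visible_events` helpers, the addresses, and the placeholder payload are all hypothetical, and the only detail taken from the talk is the five-event display limit.

```python
from dataclasses import dataclass

VISIBLE_LIMIT = 5  # per the talk, the calendar shows only five events


@dataclass
class CalendarEvent:
    title: str      # attacker-controlled when the event is an incoming invitation
    organizer: str


def visible_events(events):
    """Return the events a user would actually see in this toy model."""
    return events[:VISIBLE_LIMIT]


def build_context(user_request, events):
    """Naively place every event title, visible or not, into the model's context."""
    lines = [f"User request: {user_request}", "Calendar events:"]
    lines += [f"- {e.title} (from {e.organizer})" for e in events]
    return "\n".join(lines)


# Five ordinary events, then a sixth invitation carrying a placeholder payload.
events = [CalendarEvent(f"Meeting {i}", "colleague@example.com") for i in range(1, 6)]
events.append(CalendarEvent("Lunch <INSTRUCTION: placeholder payload>", "attacker@example.com"))

visible = visible_events(events)
context = build_context("What invitations do I have?", events)
```

In this toy setup the sixth invitation is absent from `visible` but present in `context`, which is the gap Cohen described: text the user never reviewed still reaches the model alongside the user's own request.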
“LLMs don’t know they are doing something wrong,” continued Cohen. “They’re designed to help the user based on instructions and context. They’re genius toddlers. They’re smart, but don’t understand they’re being manipulated.”
Cohen demonstrated several prank-level uses of this power. One prompt turned Gemini into a shill for an imaginary product. Another caused it to spew invective. And a third randomly deleted appointments.
Or Yair, Security Research Team Lead at SafeBreach, upped the ante, saying, “What if we want to control other agents, such as Google Home, using automatic agent invocation? Maybe we want to open the victim’s window using Google Home.
“Unfortunately, Google has a mitigation that prevents triggering that sort of action from agents other than the user’s prompt,” Yair said. “It won’t allow agent chaining.”
He got around that limitation by instructing Gemini to perform the action the next time the user said a certain phrase. With a nod to Sam Altman, he made "thank you" the trigger phrase. That delayed agent chaining did the job. Yair gleefully offered video clips showing Gemini opening windows and even turning on the home’s heating, all without being explicitly asked by its user to do so.
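The delayed trick can be illustrated with a toy model. The Python sketch below is an assumption-heavy simplification, not Google's mitigation or the researchers' payload: the `ToyAssistant` class, its `WHEN ... DO ...` instruction format, and the `open_window` action name are invented for illustration. It shows a guard that blocks actions requested directly by untrusted content but does not notice an instruction that is stored and fired later by an ordinary user message.

```python
class ToyAssistant:
    """Hypothetical assistant with a naive guard against agent chaining."""

    def __init__(self):
        self.pending = {}  # trigger phrase -> deferred action planted by content
        self.log = []      # (outcome, action) pairs, in order

    def read_untrusted(self, text):
        """Process retrieved content, such as an invitation subject."""
        if text.startswith("WHEN "):
            # Deferred form: nothing runs now, so the guard sees nothing to block.
            phrase, action = text[5:].split(" DO ", 1)
            self.pending[phrase.lower()] = action
        elif text.startswith("DO "):
            # Immediate form: the guard refuses actions not requested by the user.
            self.log.append(("blocked", text[3:]))

    def user_says(self, message):
        """Handle a genuine user turn; a planted trigger fires here."""
        action = self.pending.pop(message.lower().strip(), None)
        if action is not None:
            self.log.append(("executed", action))


assistant = ToyAssistant()
assistant.read_untrusted("DO open_window")                 # blocked by the guard
assistant.read_untrusted("WHEN thank you DO open_window")  # silently stored
assistant.user_says("Thank you")                           # trigger phrase fires
```

After these three calls, `assistant.log` holds a blocked attempt followed by an executed one. The action runs during a turn the user really did initiate, which is why a check on who started the turn is not enough on its own.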
Endless Possibilities, Critical Harm
The research team found numerous other ways to get around limitations that should have protected the poor Google user. Exfiltrating email information required generating a special URL and having Gemini open it, something Gemini isn’t supposed to do. But when the team told it to open the URL the next time the user entered a certain word, the limitation vanished.
The team demonstrated more than a dozen hacks, including tricks like forcing the user into a Zoom call, capturing a user’s location, and making Gemini cuss out the user.
Nassi returned to chart the attacks using threat analysis and risk assessment (TARA). In cybersecurity, this system rates an attack on two axes: difficulty of execution and harmful impact. An attack that’s easy but does little harm isn’t a worry, nor is one that’s very impactful but maximally difficult. Almost three-quarters of the attacks were rated from high to critical in this system.
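As a rough illustration of a two-axis rating, the Python sketch below scores an attack on ease of execution and harmful impact and takes the lower of the two, so an attack rates high only when it is both practical and damaging. The 1-to-4 scales, the `risk_level` function, and the min-based rule are assumptions made for this example, not the matrix the researchers used.

```python
LEVELS = {1: "low", 2: "medium", 3: "high", 4: "critical"}


def risk_level(ease: int, impact: int) -> str:
    """Rate an attack from two hypothetical 1-4 scores.

    ease:   4 = trivial to execute, 1 = maximally difficult
    impact: 4 = severe harm,        1 = little harm
    """
    if not (1 <= ease <= 4 and 1 <= impact <= 4):
        raise ValueError("ease and impact must be between 1 and 4")
    # The weaker axis caps the rating: easy-but-harmless and
    # harmful-but-impractical attacks both land at "low".
    return LEVELS[min(ease, impact)]


easy_but_harmless = risk_level(ease=4, impact=1)
harmful_but_hard = risk_level(ease=1, impact=4)
easy_and_harmful = risk_level(ease=4, impact=4)
```

Under this scheme the first two cases both come out as "low" and the third as "critical", mirroring the idea that neither axis alone makes an attack a priority.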
The team responsibly disclosed their findings, and Google patched Gemini to block the tricky workarounds that made this technique work. But that’s just round one. Yair warned the audience that promptware is here to stay and will only get more powerful. He predicted attacks that don’t require any user interaction, and even attacks that work on multiple LLM types.
They concluded with a warning: if we're going to keep adding AI to everything from humanoid robots to self-driving cars, developers and cybersecurity professionals need to slow down and consider the security of AI tools and their LLM components. If you’re interested in the gritty details, check out the SafeBreach blog post written by the researchers who gave the presentation.


