We’re In the Windows 95 Era of AI Agent Security
Headed for widespread usage, and insecure by design
Windows 95 ushered in a golden age for viruses. Appealing to a broad audience, it offered a tempting target for hackers. Designed to support general-purpose software that could do pretty much anything, it was difficult for Microsoft to prevent the resulting malware from doing bad things. AI agents, another fundamentally general-purpose technology with potentially broad appeal, could unleash another Wild West era for computer security. How bad could things get? That remains to be seen.
Windows 95: Insecure By Design
The internal storage on your computer contains a little bit (or a lot) of everything – your personal documents, apps you’ve downloaded, your browsing history, the system software and settings files that determine how the machine operates, and more.
Windows 95 needed to remain compatible with design decisions that go all the way back to the early 1980s and MS-DOS. That basically meant that any program could do anything to any data in the system, and developers routinely relied on this when designing applications. Closing off access would have broken thousands of programs used by millions of people. So instead, Microsoft and a long list of antivirus vendors played a cat-and-mouse game with malware authors. They’d block a specific technique used by viruses, and three new “exploits” would pop up in its place.
For instance, in the early days, viruses would attach themselves to legitimate applications like Microsoft Word by appending their own code at the end. That meant that one signature of an infection was program files suddenly getting longer. Antivirus programs began checking for this; in response, the “Chernobyl” virus was designed to embed its code into small bits of unused space in the middle of the file. Thus bypassing length-based detection, it was estimated to have infected sixty million computers, permanently damaging some and wiping data from many more.
How did Microsoft fix the security issues in Windows 95? They couldn’t. Instead, they designed an entirely new operating system (Windows NT), and spent years migrating applications and users to that lineage, a process that dragged on until Windows XP went mainstream in the 2000s. On top of this, they poured decades of effort into addressing other security holes. (In 2002, Bill Gates wrote a famous memo, Trustworthy Computing, instructing everyone at Microsoft to prioritize security over new features.)
Early computers allowed developers to be enormously creative, because any program could do anything to any file in the system. From a security perspective, Windows 95’s flexibility was also its downfall. AI agents are fundamentally insecure for precisely the same reason.
AI Agents: Too Good for This World
No, little bot! It’s a trick!
In a demonstration at the recent Black Hat cybersecurity conference, researchers presented 14 attacks against assistants powered by Google’s Gemini model1. These attacks tricked the assistant into acting on malicious instructions by embedding them in a shared document, an email message, or even a simple calendar invite. Actions included stealing information from the victim’s inbox, spying on them by automatically connecting their computer to a video call, or even controlling their home automation system (for instance, turning on the furnace).
Another recent report documents vulnerabilities in Perplexity’s “AI-powered browser”, Comet. To quote from the report:
…when users ask it to “Summarize this webpage,” Comet feeds a part of the webpage directly to its LLM without distinguishing between the user’s instructions and untrusted content from the webpage. This allows attackers to embed indirect prompt injection payloads that the AI will execute as commands.
… The injected commands instruct the AI to use its browser tools maliciously, for example navigating to the user’s banking site, extracting saved passwords, or exfiltrating sensitive information to an attacker-controlled server.
To demonstrate the problem, the authors posted the following as a comment on Reddit:
To avoid detection, the text was written in white on a white background. The AI doesn’t pay attention to text colors, and so the malicious instructions were perfectly visible to it. When the report authors asked Comet to summarize the Reddit page, the agent was tricked into generating a temporary password for the (simulated) victim’s Perplexity account and sharing it with the attacker. (This sort of vulnerability is yet another reason why, as I explained last week, AI agents aren’t ready for open-ended interactions with the real world.)
I explained the principles behind this kind of attack in a series of posts: What is “Prompt Injection”, And Why Does It Fool Chatbots?, Anatomy of a Hack, and Why Are LLMs So Gullible?. It’s been almost 2 years since I wrote those posts, and not much has changed. What are the prospects for securing AI agents?
Irresistible Market Pressures Meet an Immovable Design Flaw
Tech companies are racing to place AI agents in the hands of users. Already, applications such as OpenAI’s “Operator” agent, Perplexity’s Comet browser, and the Claude Code software engineering agent, can take actions on a user’s behalf in a browser or even take control of their computer. These are just a few examples of the agents already on the market, and we can expect a flood of imitators and innovators to follow. For now, all of them are fundamentally vulnerable to being subverted by malicious instructions.
(As I was writing this, Anthropic released an experimental new data analysis feature for their Claude chatbot and included the following note: “Claude can be tricked into sending information from its context (e.g., prompts, projects, data via MCP, Google integrations) to malicious third parties. To mitigate these risks, we recommend you monitor Claude while using the feature and stop it if you see it using or accessing data unexpectedly.”2)
And yet, so far as I’m aware, so far there are no known instances of such attacks being carried out in practice (aside from controlled tests by “white hat” security researchers). At least, not unless you count occasional clumsy examples of research papers containing hidden text telling AI tools to give the paper a positive review.
I’m not particularly surprised that we haven’t seen attacks yet. Personal computers first hit the market in the late 1970s, but viruses didn’t become a serious problem until 10 to 15 years later. AI agents are quite new, and aren’t yet in widespread use; there’s not much profit opportunity in trying to attack them. And as I recently wrote, bad guys aren’t always quick to act on new opportunities.
This attack-free grace period won’t last forever; things move a lot faster than they did in the 70s. In the meantime, application providers are playing whack-a-mole, adding filters to block specific attack techniques as they’re discovered. Model developers are undoubtedly training their AIs to be harder to fool. Compared to the Windows 95 days, developers – hardened by decades of cat-and-mouse with virus authors, spammers, and web content farms – are far better at detecting patterns of malicious activity and responding quickly.
On the other hand, attackers have also upped their game, quickly generating new attack variants. Machine learning techniques can be used to automatically modify a prompt to evade defenses. And there are no known techniques for making modern AI systems secure by design. Back in 1995, secure operating system architectures were already a mature science; the challenge was simply to migrate the Windows ecosystem onto a secure design. We don’t know how to do that for AI, even in principle3.
I’ll end with this paragraph from a Wired article reporting on a conversation with a senior director of security product management:
Google’s Wen, like other security experts, acknowledges that tackling prompt injections is a hard problem since the ways people “trick” LLMs is continually evolving and the attack surface is simultaneously getting more complex. However, Wen says the number of prompt-injection attacks in the real world are currently “exceedingly rare” and believes they can be tackled in a number of ways by “multilayered” systems. “It’s going to be with us for a while, but we’re hopeful that we can get to a point where the everyday user doesn’t really worry about it that much,” Wen says.
“Hopeful… the everyday user doesn’t really worry about it that much” isn’t the loftiest goal. Time will tell whether it’s good enough.
Thanks to Abi and Taren for feedback and images.
For more information, see the technical paper, Invitation Is All You Need! TARA for Targeted Promptware Attack Against Gemini-Powered Assistants.
More discussion of this issue in Simon Willison’s blog.
There has been some nice work published on constructing secure LLM-based agents; see these two posts from Simon Willison’s blog. But so far, all techniques either place serious restrictions on how agents can be used, or force attackers to jump through some additional hoops but don’t make attacks impossible in principle.