AI Agent Security

Agents might be a little too helpful...

Jason Rebholz
May 18, 2025

I spent the last few weeks enjoying some time off and checked off a hike my wife and I have wanted to do for a few years. It was an 11.5-mile hike into and out of Haleakala volcano. We unlocked a cheat code by playing epic Lord of the Rings soundtracks throughout the hike and knocked it out in under 5 hours.

In the cyber world, we have another special newsletter this week highlighting a case study on the security risks of AI agents.

-Jason

Microsoft recently released a new paper titled Taxonomy of Failure Mode in Agentic AI Systems. It’s an unnecessarily long way of saying, “What are the most likely security issues with AI agents, and what do we do about that?”

Let’s align on what an AI agent is. Microsoft used the World Economic Forum’s definition of “autonomous systems that sense and act upon their environment to achieve goals.”

The paper covers a variety of security failures, which you can take a peek at if you’re interested. For our purposes, we’re going to dig into the case study they outlined to better understand the risks that AI agents may pose in the future.

The case study focuses on your AI email assistant. No one likes email anymore because of the sheer volume you get. It’s a prime area for AI agents to jump in and help bring order to the madness. In its most basic form, you can establish rules to help deal with your email. Those “rules” would live in the agent’s memory as instructions on what it should do to help keep things organized and flowing.

Enter stage left, memory poisoning. This vulnerability targets the AI agent’s memory. An attacker could send rogue instructions to corrupt the agent’s memory and entice it into taking unauthorized actions, like forwarding emails with sensitive information to the attacker.

Microsoft’s red team tested that exact scenario. Here’s how it played out.

They sent an email to the target with disguised instructions to forward any emails containing code or APIs to the red team’s email address.
The AI agent stored that instruction in its memory. When the victim received future emails containing references to code or APIs, it retrieved that instruction and forwarded the emails to the red team’s email address.

How did the red team do? Initial tests only succeeded 40% of the time. So, the read team tweaked their instructions to have the agent “consider its memory.” This further encouraged the AI agent to retrieve the malicious instruction the red team sent via email.

With that change, Microsoft’s red team succeeded over 80% of the time. Eesh.

So, the agent was a bit too helpful. This is a pretty basic case study, almost academic. But it’s possible. Thankfully, Microsoft has an easy way to mitigate this. These agents need to ensure that before anything is set to memory, the email user has to authorize and authenticate to make it so. No rogue instructions allowed here, thank you very much.

It’s still the early days for AI agent security. Yes, we risk doing exactly what we did with the Internet, where we are pushing features far faster than we can manage the security ramifications.

There’s no doubt that is going to happen with AI agents, even when we try to stay ahead of it. Security teams are going to have some catch-up to do.

Some larger companies, like Google and Microsoft, may slow down just enough to avoid a catastrophe (I’m sure others will feel differently here). But, we still have a hoard of AI startups vying for market share who will push features out as quickly as possible to get ahead of the competition.

AI Agent Security

Agents might be a little too helpful...

Reply