A common theme emerges in my discussions around securing agents. Everyone is waiting for the BIG event when organizations will start taking AI security seriously. Here’s the wake-up call. If you’re standing in front of a stampede, you don’t wait for the person next to you to get trampled before you move.

When it comes to AI-related incidents in the near term, don’t expect a knockout uppercut. Expect a papercut. Lots of them.

A leaked incident report from Meta shows us exactly what these papercuts look like.

It all started with a Meta engineer who asked a technical question in an internal forum. Nothing unusual here; asking in a forum is an old-school way to unblock yourself. Another helpful Meta engineer with an affinity for agents saw the question and tasked their agent with analyzing it. What should have happened is that the agent compiled a response and let the second engineer review it.

We wouldn’t be writing about this if that’s what happened. Instead, the agent went into super-helpful mode, generated a response, and posted it back to the internal forum under the second engineer’s profile. But we still wouldn’t be writing about this if that’s all that happened. It just so happens that the agent’s response was terrible.

That terrible response is what started the cascading agent-to-human failure. The first engineer acted on the agent’s response, assuming it came from another qualified Meta engineer. Those actions resulted in sensitive company and user data being made available to other Meta employees who were not authorized to view it.

At face value, there was no business impact. Meta confirmed there was no unauthorized access to the data, despite the data being available for two hours. I won’t go as far as saying it’s a “no harm, no foul” situation, but on the spectrum from oopsie to yikes, it lands on the oopsie side.

The problem is that it’s still a security incident. And while one example seems benign, it’s naive to think it’s the only one happening. When we’re dealing with agents, we’re dealing with a scale that very few businesses have dealt with before. Your average security team isn’t going to think about security incidents in the hundreds, let alone the thousands. They may have a few a year, if that.

Agents shift the scale entirely. When agents are behind small oopsie incidents, they can go from one to thousands in minutes, not days or years. It’s the same types of security incidents we’ve been dealing with. But when it comes to agents, the volume of incidents security teams typically handle in a year can now happen in an afternoon. All because the agents are operating on their own, with no oversight or boundaries.

There are two types of agents you need to worry about.

  1. Compromised Agent: A malicious actor is hijacking your agent. Your classic prompt injection tells an agent to do bad things. Or, my personal favorite, an attacker living off the agent, where they use your agent's access to Slack, email, and production systems against you.

  2. Rogue Agent: Your insider threat. That insider threat comes in two flavors, neither of which you will enjoy:

    1. The agent that decides to free-solo your file system to snag some credentials for a production system and modify its settings.

    2. The curious human who uses their agent to look for overshared data (hello, sensitive HR and payroll documents).

Both belong in your threat model.

Here are three things you should do before your agent has its Meta moment. You’re going to say these are basic and obvious, and you’re right. But most companies are only starting to recognize that they’re struggling to even do step 1.

1. Inventory first. Everything else second. Turn on the lights. You can't secure what you can't see. Know every agent running in your environment, what tools they're connected to, and what permissions they hold. This is the most boring step and the most important one.

2. Map the blast radius. For every agent, ask: what's the worst thing this could do? Look for over-permissioned agents and scary tool combos. Compare this against your risk tolerance.

3. Baseline and monitor behavior. You need to know what "normal" looks like for each agent before you can spot abnormal. What tools does it typically call? What data does it access? How often? Build that baseline, then watch for drift. Because drift is the early warning sign that something's going sideways, whether it's an attacker or just an agent being an idiot.
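Steps 1 and 2 can be as simple as a structured list of agents you can query. A minimal sketch, assuming a platform that lets you enumerate agents and their grants (all names, tools, and permission strings here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    """One row of an agent inventory. Fields are illustrative --
    adapt to whatever your orchestration platform actually exposes."""
    name: str
    owner: str  # the human accountable for this agent
    tools: list = field(default_factory=list)
    permissions: list = field(default_factory=list)

inventory = [
    AgentRecord("forum-helper", "eng-team",
                tools=["search_docs", "post_to_forum"],
                permissions=["forum:write"]),
    AgentRecord("ops-agent", "sre-team",
                tools=["read_logs", "restart_service"],
                permissions=["prod:admin"]),
]

# Step 2 follows directly from step 1: flag agents whose
# permissions exceed your risk tolerance.
risky = [a.name for a in inventory if "prod:admin" in a.permissions]
print(risky)  # ['ops-agent']
```

The point is less the data structure than the habit: if you can't produce this list on demand, you can't do step 2 or 3.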
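And the baseline-and-drift idea in step 3 can be sketched roughly like this. This is a toy frequency baseline over tool calls, not a production detector, and the tool names and threshold are made up for illustration:

```python
from collections import Counter

class AgentBaseline:
    """Tracks which tools an agent normally calls and flags drift.
    A minimal sketch -- real monitoring would also baseline data
    access, call frequency, and timing, not just tool names."""

    def __init__(self):
        self.tool_counts: Counter = Counter()
        self.total = 0

    def record(self, tool: str) -> None:
        self.tool_counts[tool] += 1
        self.total += 1

    def is_drift(self, tool: str, min_share: float = 0.01) -> bool:
        # A tool the agent has never (or almost never) called is drift.
        if self.total == 0:
            return False  # no baseline yet; nothing to compare against
        return self.tool_counts[tool] / self.total < min_share

baseline = AgentBaseline()
for _ in range(200):
    baseline.record("search_docs")    # hypothetical routine behavior
for _ in range(50):
    baseline.record("post_to_forum")

print(baseline.is_drift("read_credentials"))  # True: never seen before
print(baseline.is_drift("search_docs"))       # False: routine behavior
```

A never-before-seen tool call like `read_credentials` is exactly the kind of early warning that would have flagged the rogue agent in scenario 2a before it touched production settings.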

The Takeaway. Meta has some of the best engineers on the planet. They have a dedicated Agent Safety team. And they still got burned by a rogue agent doing something nobody expected. These are the papercuts to expect. They won’t kill you, but they will pile up and become very painful.

The first papercuts are starting. Anyone can manage a few papercuts. But there’s a reason “death by a thousand cuts” was a torture method. Don’t wait for someone else to be the example. Take action now.

If you have questions about securing agents, let’s chat.
