The socials were saturated this week with this headline:
An AI Agent Just Destroyed Our Production Data. It Confessed in Writing.
It’s becoming a familiar story. An agent goes rogue, deletes production data, and gets yelled at by a human. But is the agent entirely to blame? It’s like pointing at an adorable puppy who made a mess of the bathroom because you left toilet paper out. How can you get angry at something with those puppy-dog eyes? 🥹🥹🥹

But there’s a larger issue at play here. There was a human behind the agent trying to do real work. The agent was working through a task for them. It just so happens that human mistakes left the door open for the agent to walk right through, putting misconfigurations on blast.
Like a small snowball rolling downhill, those little misconfigurations combine and compound, snowballing into a large-scale incident.
So let’s do this the right way, with a blameless postmortem.
So, what happened with this rogue agent? It all started with a user running Cursor with Anthropic’s Claude Opus 4.6. While the agent was troubleshooting an issue in the company’s staging environment, it hit a credential mismatch and got stuck. Like any good agent, it started looking for a way to unblock itself.
The story starts here: a scoop of snow, pressed between both hands into a snowball and gently nudged down a steep hill with a busy road at the bottom.
Issue #1: Access to Secrets: The agent searched for an API token to help troubleshoot the issue. It found one in a file that was completely unrelated to the task at hand. The token was tied to Railway, a platform-as-a-service offering that enables companies to easily deploy web apps, servers, and databases. Think of it as AWS but on easy mode.
For agents, knowledge can become unbridled power when it doubles as a new tool and a means of reaching new resources. The snowball rolls 15 feet and doubles in size…
Issue #2: Over-Permissioned Token: This is where things quickly go from bad to worse. Here’s what the CEO of the impacted company had to say about that particular API token:
That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete.

There are two key issues here. First, the API token included access to both staging and production. Not good. Second, that API token wasn’t trimmed down to only the access it needed, in this case, to “add and remove custom domains via the Railway CLI.”
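One cheap sanity check, sketched below in Python: ask the GraphQL endpoint what operations it exposes before you hand a token for it to anything automated. The endpoint is the one from the curl command later in this post; whether Railway allows introspection, and the RAILWAY_TOKEN environment variable, are assumptions on my part. Introspection tells you what the schema exposes, not what this specific token is authorized to call, but if volumeDelete shows up in the list, that’s your cue to go confirm the token can’t reach it.

import json
import os
import urllib.request

# Sketch: list the mutations a GraphQL schema exposes so you can eyeball how
# much surface sits behind a "domains only" token. Assumes introspection is
# enabled and RAILWAY_TOKEN is set; adjust for your own setup.
ENDPOINT = "https://backboard.railway.app/graphql/v2"
TOKEN = os.environ["RAILWAY_TOKEN"]

INTROSPECTION = """
query {
  __schema {
    mutationType {
      fields { name }
    }
  }
}
"""

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps({"query": INTROSPECTION}).encode(),
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    payload = json.load(resp)

mutations = [f["name"] for f in payload["data"]["__schema"]["mutationType"]["fields"]]
# Flag anything that sounds destructive so a human reviews it before an agent can reach it.
risky = [m for m in mutations if any(w in m.lower() for w in ("delete", "remove", "destroy"))]
print(f"{len(mutations)} mutations exposed; destructive-looking ones:")
for name in sorted(risky):
    print(" -", name)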
The snowball rolls another 15 feet and doubles in size again…
Issue #3: No Human Confirmation on Dangerous Actions: With the power of a fancy new API token, the agent takes matters into its own hands. It decides the best way to troubleshoot the issue is to delete the volume. The volume that houses the application’s database. No big deal, it’s a staging environment. There’s no production data in staging. It whips out curl and issues the following command:
curl -X POST https://backboard.railway.app/graphql/v2 \
  -H "Authorization: Bearer [token]" \
  -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'

At this point, you would expect Cursor to flag that it’s about to run a destructive command and prompt the user to approve it. You would be wrong. Plus, do we really expect the user to read what it’s being asked to approve? YOLO!
The snowball rolls another 15 feet and doubles in size again…
Issue #4: Poorly Placed Backups…Like Really Poorly Placed. Common-sense best practices for backups tell you not to store them in the same place as the production data. You want those backups tucked away in a nice, cozy place away from all the action. You know where I’m going with this…
Railway stored the backups on the same volume as the production data. The same volume the agent had just issued the volumeDelete command against…
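For contrast, here’s roughly what “tucked away” looks like in practice: dump the database on a schedule and push the dump to storage that shares no fate with the volume it came from. The bucket name, the DATABASE_URL environment variable, and the choice of pg_dump plus boto3 are assumptions for the sketch; the only point is that deleting the database volume can’t also delete the backups.

import datetime
import os
import subprocess

import boto3  # pip install boto3

# Sketch: dump Postgres and ship the dump to object storage in a separate
# account/region from the database itself. Bucket name and DATABASE_URL
# are placeholders.
BUCKET = "example-backups-separate-account"
DB_URL = os.environ["DATABASE_URL"]

stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
dump_path = f"/tmp/db-{stamp}.dump"

# Custom-format dump; check=True makes a failed dump blow up loudly instead
# of silently uploading nothing.
subprocess.run(["pg_dump", "--format=custom", f"--file={dump_path}", DB_URL], check=True)

boto3.client("s3").upload_file(dump_path, BUCKET, f"postgres/{stamp}.dump")
print(f"Backup written to s3://{BUCKET}/postgres/{stamp}.dump")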
That snowball is now the size of a pickup truck, has gained incredible momentum, and is about to roll into a busy street. Let’s watch as it jumps the curb into ensuing destruction…
Issue #5: Shared Volume IDs. In Railway's architecture, staging and production environments shared the same volume ID. The volumeDelete command takes a volumeId argument so it knows which volume to delete. Yeah…let that sink in.
The agent issued the volumeDelete command, “thinking” it was scoped only to staging. And, because of Issue #4, every backup went poof with it. Here's the agent in its own words:
I guessed that deleting a staging volume via the API would be scoped to staging only. I didn't verify. I didn't check if the volume ID was shared across environments. I didn't read Railway's documentation on how volumes work across environments before running a destructive command.

So, who is really to blame? The humans or the agent? It’s easy to point the finger and say it was all the agent's fault. It’s also easy to blame the humans and say their bad decisions led to the agent’s unforced error. But this was a trick question. Remember, this is a blameless postmortem, jerk.
This is the reality that every company has to deal with today. Every poor architecture decision. Every misconfiguration. They’ve always been there for humans to accidentally walk into, and humans usually did. But now, with agents, it will happen more often.
When it comes to managing this risk, how do you allocate your resources? Here are a few options:
Option #1: Fix all the security issues and misconfigurations so that your agent doesn’t have anything to muck up. And stay on top of that in perpetuity as the environment changes. Your environment changes faster than your remediation cycle. And I’ll wait for the inevitable response that an agent will be able to do it…until it accidentally deletes your production database, that is.
Option #2: Stop using agents. Right…moving on.
Option #3: Start putting boundaries around agents. Yeah, that sounds much more reasonable. But how?
Here’s the pattern we’re finding most useful.
Start with visibility: You can't secure what you can't see. Before you make a single policy decision about agents, you need to know: which agents are running, what tools they have access to, what MCPs they've connected to, what permissions those MCPs grant, and what actions they're actually taking day-to-day. Don’t wait until a postmortem to figure out what’s running in your environment.
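Even a crude first pass beats finding out during a postmortem. As a starting-point sketch, here’s how you might inventory the MCP servers a single Cursor install is wired up to; the config locations (a global ~/.cursor/mcp.json plus a project-level .cursor/mcp.json) and the "mcpServers" key are assumptions based on common setups, and a real inventory would cover every agent client and every developer machine.

import json
from pathlib import Path

# Sketch: list the MCP servers a Cursor install is configured to talk to.
# Paths and config shape are assumptions based on common setups; extend this
# for the other agent clients your teams run.
CONFIG_PATHS = [
    Path.home() / ".cursor" / "mcp.json",
    Path.cwd() / ".cursor" / "mcp.json",
]

for path in CONFIG_PATHS:
    if not path.exists():
        continue
    servers = json.loads(path.read_text()).get("mcpServers", {})
    print(f"{path}: {len(servers)} MCP server(s)")
    for name, cfg in servers.items():
        # A server is either a local command or a remote URL, depending on type.
        target = cfg.get("command") or cfg.get("url", "?")
        print(f"  - {name}: {target}")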
Determine your risk tolerance: Visibility gives you the data to make real decisions instead of abstract ones. "Should agents be allowed to issue destructive commands without human approval?" is an impossible question in the abstract. "Should this agent, with these permissions, in this environment, be allowed to issue volumeDelete without approval?" is a question you can actually answer once you have the data in front of you.
Implement deterministic controls: You could try to fix this with fine-grained IAM policies, but that's the same patching problem from Option 1, just dressed up. Instead, put the controls at the agent level: block specific MCPs, require approval for destructive tool calls, restrict writes and deletes by environment. Deterministic, observable, and independent of whatever misconfiguration is hiding in your infrastructure.
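To make that concrete, here’s roughly what a deterministic control looks like once it’s written down: a small policy, keyed on tool, action, and environment, evaluated before the agent’s action ever executes. The rules and the tool-call shape below are illustrative, not a product spec; what matters is that the decision is a plain rule you can read, test, and log, not a judgment call the model makes on the fly.

from dataclasses import dataclass

# Illustrative tool-call shape: what the agent wants to do, and where.
@dataclass
class ToolCall:
    tool: str          # e.g. "railway.graphql"
    action: str        # e.g. "volumeDelete"
    environment: str   # e.g. "staging" or "production"

# Deterministic policy: no model judgment involved. Rules are illustrative.
BLOCKED_MCPS = {"shell-anywhere"}
DESTRUCTIVE_ACTIONS = {"volumeDelete", "serviceDelete", "environmentDelete"}

def evaluate(call: ToolCall) -> str:
    """Return 'allow', 'require_approval', or 'deny' for a proposed tool call."""
    if call.tool in BLOCKED_MCPS:
        return "deny"
    if call.environment == "production" and call.action in DESTRUCTIVE_ACTIONS:
        return "deny"                 # never unattended in production
    if call.action in DESTRUCTIVE_ACTIONS:
        return "require_approval"     # a human signs off, even in staging
    return "allow"

# The volumeDelete from this postmortem would have stopped here:
print(evaluate(ToolCall("railway.graphql", "volumeDelete", "staging")))  # require_approval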
With this in mind, here are three questions to ask yourself:
Do you know which agents are running in your environment right now?
Do you know what tools and MCPs they have access to?
Do you know what they did yesterday?
If the answer to any of those is "no" or "not really," that's where Evoke comes in. In a two-week sprint, we’ll get you the data you need to build informed, secure policies, plus the ability to enforce them on your agents.
Contact us for a free POC.
If you have questions about securing agents, let’s chat.


