AI Agent Security for CISOs: A Visibility-First Playbook for Detecting Rogue Agents

I responded to too many ransomware engagements in my time doing incident response, and one thing always stood out to me: the sudden shock the victim felt. Everything is fine and dandy, and then BAM! They get sucker-punched out of nowhere and are left picking up the pieces, trying to figure out what happened.

It’s an unbearably uncomfortable and vulnerable time for any business. They just didn’t see it coming. The signals were there, but they were hidden just outside their line of sight.

That's exactly where we are with agents right now. Meta didn’t see the punch coming when its AI support agent willingly helped attackers take over Instagram accounts, simply because the attackers asked nicely.

The good news is we're early to this fight. Attackers are just starting to figure out how to target agents, just as they spent years figuring out how to attack operating systems and web applications. The playbook isn't getting a full refresh; it’s just being adapted to a new technology.

And if there's one thing the last twenty years of security has taught us, it's this: you don't need to wait to get punched in the face to see that the punch is coming. You just have to be watching.

That’s why it’s not surprising that Anthropic lists visibility as one of the key steps in their eBook, Zero Trust for AI Agents. There are a ton of great tips on how to secure autonomous agents in your environment, but for the sake of brevity, we’re going to focus on the two that serve as your security backstop for when things go wrong, because you know they will. They’re the ones that tip you off when something sketchy is going on:

Observability and auditing: The foundation your entire detection and response strategy is built on. Without the right visibility, you’re playing a one-sided game of Battleship. You can call out guesses all day, but no one ever tells you whether you hit or missed.

Anthropic calls out two specific things to monitor:

Action logging: What agents do, when they do it, and under what authority. There are three levels to this, from Foundation (the baseline) to Advanced (a highly mature system).

Anthropic’s Action Logging Maturity
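What does an action log entry actually need to capture? Here’s a minimal sketch in Python, assuming you control a logging layer around your agents. The `AgentAction` record and its field names are hypothetical and are not Anthropic’s OpenTelemetry schema; the point is that every entry answers what happened, when, and under whose authority.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentAction:
    """One audit-log entry: what an agent did, when, and under whose authority.

    Field names are hypothetical; map them onto whatever your pipeline uses.
    """

    agent_id: str      # which agent acted
    action: str        # what it did, e.g. the tool it invoked
    target: str        # what the action touched: a file, table, URL, ...
    on_behalf_of: str  # the human or service principal whose authority was used
    credential: str    # which credential or scope the agent presented
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )  # when it happened, in UTC

    def to_json(self) -> str:
        """Serialize as a single JSON line, easy to ship to a log pipeline."""
        return json.dumps(asdict(self), sort_keys=True)


entry = AgentAction(
    agent_id="support-bot-01",
    action="db.query",
    target="customers",
    on_behalf_of="alice@example.com",
    credential="readonly-db-token",
)
print(entry.to_json())
```

Separating `on_behalf_of` from `credential` matters: an agent acting for Alice with an admin token is exactly the kind of authority mismatch you want on the record.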

Traceability: Actions are individual events; traceability is the storyline that connects them. It’s the plot that runs from the opening prompt, through every tool call, to the final response.

Anthropic’s Traceability Maturity
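To turn individual events into a storyline, every event needs a shared identifier for the task it belongs to. This sketch assumes a hypothetical `trace_id` and ordering field `seq` on each event (OpenTelemetry spans carry their own trace IDs; these names are illustrative) and stitches events back into prompt, tool calls, and response. A story that never reaches a response is worth a second look.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    trace_id: str  # shared by every event in one agent task (hypothetical field)
    seq: int       # order within the trace; a timestamp works just as well
    kind: str      # "prompt", "tool_call", or "response"
    detail: str


def build_storylines(events: list[Event]) -> dict[str, list[str]]:
    """Group individual events by trace and put each trace in order."""
    traces: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        traces[event.trace_id].append(event)
    return {
        trace_id: [f"{ev.kind}: {ev.detail}" for ev in sorted(evs, key=lambda e: e.seq)]
        for trace_id, evs in traces.items()
    }


def is_complete(storyline: list[str]) -> bool:
    """A complete story opens with a prompt and closes with a response."""
    return (
        bool(storyline)
        and storyline[0].startswith("prompt:")
        and storyline[-1].startswith("response:")
    )


events = [
    Event("t1", 2, "tool_call", "read_file(config.yaml)"),
    Event("t2", 1, "prompt", "summarize the latest ticket"),
    Event("t1", 1, "prompt", "fix the failing build"),
    Event("t1", 3, "response", "build fixed"),
]
for trace_id, story in build_storylines(events).items():
    status = "complete" if is_complete(story) else "INCOMPLETE"
    print(trace_id, status, story)
```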

Here’s the rub: Anthropic provides observability and auditing through OpenTelemetry data that you can configure in their platform. But those logs don’t tell you when something bad is happening. That’s where the next piece comes in.

Behavioral monitoring and response: It’s not enough to just log what agents do. That’s like a bank installing an elaborate CCTV system and then never watching for the moment a crew of masked robbers waltzes in. This step is about identifying known bad behaviors and baselining normal activity so you can detect deviations from it.

Baseline establishment: This is the perfectly normal practice of stalking an agent, monitoring everything it does until you know its skincare regimen, favorite color, and how it takes its coffee. Creepy stalker vibes aside, knowing what is normal for an agent is essential for spotting deviations from that norm. It’s like identifying game trails, the common paths animals take to their watering holes and feeding grounds. At Evoke, we call these agent trails.
In Anthropic’s maturity model, you’re not even in the game until you reach the Enterprise level.

Anthropic’s Baseline Establishment Maturity
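An agent trail can start as something this simple: how often each agent takes each action, computed from its own history rather than the fleet’s. A minimal sketch, assuming your action logs can be reduced to hypothetical `(agent_id, action)` pairs; a real baseline would also look at targets, timing, and sequences.

```python
from collections import Counter, defaultdict


def build_baselines(history: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Per-agent baseline: each action's share of that agent's own history.

    `history` is a list of (agent_id, action) pairs pulled from action logs.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for agent_id, action in history:
        counts[agent_id][action] += 1

    baselines: dict[str, dict[str, float]] = {}
    for agent_id, counter in counts.items():
        total = sum(counter.values())
        baselines[agent_id] = {action: n / total for action, n in counter.items()}
    return baselines


history = (
    [("support-bot", "search_kb")] * 90
    + [("support-bot", "reply_ticket")] * 10
    + [("build-bot", "run_tests")] * 50
    + [("build-bot", "git_push")] * 50
)
baselines = build_baselines(history)
print(baselines["support-bot"])  # {'search_kb': 0.9, 'reply_ticket': 0.1}
```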

Anomaly detection: With a baseline in place, you can build the capability to monitor for deviations. This is where you can lean on battle-tested machine learning techniques. The main objective is to identify when an agent steps off its normal agent trail. Emphasis on “its.” It doesn’t do anyone any good to baseline the average of all agents in the environment. You need to make that agent feel special: show it a little 1:1 attention, buy it something it really likes, like more tokens. Deviations are noise unless they’re tuned to the specific agent.
In Anthropic’s maturity model, you can get some basic behavioral detection from their API usage reports and spend limits. But that only covers things like resource exhaustion. To truly find compromised or rogue agents, you have to monitor for deviations from the typical actions a specific agent or task takes, and alert when they occur.

Anthropic’s Anomaly Detection Maturity
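Here’s why “its” matters. This sketch uses a simple frequency threshold as a stand-in for the machine learning models mentioned above; it isn’t ML, just the smallest thing that shows the idea. The baselines, action names, and `min_share` threshold are all hypothetical. A support bot that suddenly pushes code is wildly off its own trail, but averaged across the fleet, pushing code looks routine.

```python
def off_trail(baseline: dict[str, float], action: str, min_share: float = 0.01) -> bool:
    """True if `action` is rare or unseen in the given baseline.

    `min_share` is an arbitrary starting threshold, not a recommended value.
    """
    return baseline.get(action, 0.0) < min_share


def fleet_baseline(baselines: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average all agents together: the approach the text warns against."""
    actions = {action for shares in baselines.values() for action in shares}
    return {
        action: sum(shares.get(action, 0.0) for shares in baselines.values()) / len(baselines)
        for action in actions
    }


# Per-agent baselines: each action's share of that agent's own history.
baselines = {
    "support-bot": {"search_kb": 0.9, "reply_ticket": 0.1},
    "build-bot": {"run_tests": 0.5, "git_push": 0.5},
}

# The support bot suddenly pushes code.
agent_id, action = "support-bot", "git_push"
print("per-agent:", off_trail(baselines[agent_id], action))         # True
print("fleet-wide:", off_trail(fleet_baseline(baselines), action))  # False
```

The fleet-wide view dilutes the signal: `git_push` makes up a quarter of the "average agent’s" activity, so the support bot’s first-ever push sails right through.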

Known bads: This one wasn’t on Anthropic’s list, so I have to add it. Behavioral analysis is important, but it’s a fourth-inning problem, and most companies are just taking the field in the first. There’s some basic blocking and tackling you can get done with the right visibility, like detecting and blocking destructive commands. You don’t need anomaly detection to flag and block an agent that’s trying to delete your production database.

No maturity matrix on this one…it’s just common sense.
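Blocking a known-bad command doesn’t require anything fancy. Below is a minimal sketch of a pre-execution check, assuming the agent runtime hands you each pending shell command before it runs. Claude Code’s PreToolUse hooks work this way: the pending tool call arrives as JSON on stdin, and exit code 2 blocks it. The deny-list patterns and function names are hypothetical starting points, so check the hook payload against current documentation before relying on it.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical starter deny-list of clearly destructive commands. Tune it to
# your environment; a deny-list catches known bads, not novel attacks.
DESTRUCTIVE_PATTERNS = [
    # rm with both -r and -f aimed at /, ~, or a bare wildcard
    re.compile(r"\brm\s+-(?=[a-z]*r)(?=[a-z]*f)[a-z]+\s+(/|~|\*)", re.IGNORECASE),
    re.compile(r"\bdrop\s+(database|schema|table)\b", re.IGNORECASE),
    re.compile(r"\btruncate\s+table\b", re.IGNORECASE),
    re.compile(r"\bterraform\s+destroy\b", re.IGNORECASE),
]


def blocked_reason(tool_input: dict) -> Optional[str]:
    """Return why a pending shell command should be blocked, or None to allow it."""
    command = str(tool_input.get("command", ""))
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return f"Blocked known-bad command (matched {pattern.pattern!r})"
    return None


def hook_exit_code(raw_event: str) -> int:
    """Decide on one pending tool call delivered as a JSON string.

    Assumes the payload carries a "tool_input" object with a "command" field,
    as Claude Code's PreToolUse hooks do for the Bash tool, and that exit
    code 2 means "block".
    """
    event = json.loads(raw_event)
    reason = blocked_reason(event.get("tool_input") or {})
    if reason:
        print(reason, file=sys.stderr)  # Claude Code feeds stderr back to the agent on exit 2
        return 2
    return 0
```

To run it as a hook, the script would end with `sys.exit(hook_exit_code(sys.stdin.read()))` and be registered as a PreToolUse hook for the Bash tool. A deny-list is deliberately blunt: it catches the fist right in front of your face, not a novel attack.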

The starting point isn't fancy. It's four steps.

1. Get visibility: You want both configuration and runtime visibility. This is where you stop waiting for the punch and start watching your opponent. You can pull (or manage) configurations via your MDM. Easier said than done, but it’s doable. For runtime, Anthropic can send OpenTelemetry data for Claude Code and Cowork to your logging server, and its Compliance API gives Enterprise customers additional visibility into Claude AI activity. If you stop here, you’re just wasting your time.
2. Map the blast radius: Threat modeling agent configurations tells you what an agent could do, which is its blast radius. Start by assessing every agent your users are running locally, along with every permission, tool, and data source attached to it. Think of it as mapping all the strings that reach out from the agent and clip into resources across your organization. This is where over-permissioned agents and risky tool combinations stop hiding and start showing up on a list.
3. Monitor for malicious activity: Watching what your agents do at runtime isn’t enough on its own. You have to build the detections that spot the incoming punch before it lands. Once you can see it coming, you’ve taken the surprise out of the attack, and just like with ransomware, surprise is the attacker’s best weapon. Now that you see it coming, you can take the next step.
4. Stop the punch before it lands: Give yourself enough buffer to catch the wind-up. The easiest place to start is spotting the fist right in front of your face: the moment a risky tool is about to execute, aka the known bads. Something dangerous is about to run, so you block it. Punch averted. Then graduate to behavioral analysis, which watches for the attacker winding up and gives you the most time to block the punch and avoid the pain.
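The blast-radius mapping step above can be prototyped as a set problem: translate each agent’s tools into capabilities, then look for agents whose combined capabilities form a risky pattern. This is a minimal sketch; the inventory, tool names, capability labels, and risk rules are all hypothetical and would come from your own configuration collection. One pattern worth flagging is an agent that reads untrusted input, touches sensitive data, and can send data out, because a single prompt injection can chain all three.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    name: str
    owner: str
    tools: set[str] = field(default_factory=set)


# Hypothetical mapping from tool name to the capabilities it hands the agent.
TOOL_CAPABILITIES = {
    "web_fetch": {"untrusted_input"},
    "read_email": {"untrusted_input", "sensitive_data"},
    "crm_read": {"sensitive_data"},
    "send_email": {"external_egress"},
    "http_post": {"external_egress"},
    "prod_db_write": {"prod_write"},
}

# Capability combinations worth a closer look (illustrative, not exhaustive).
RISKY_COMBOS = {
    "exfiltration path": {"untrusted_input", "sensitive_data", "external_egress"},
    "production write": {"prod_write"},
    "unmapped tool": {"unknown_tool"},
}


def blast_radius(agent: AgentConfig) -> set[str]:
    """Union of everything the agent's tools let it do; unmapped tools are flagged."""
    capabilities: set[str] = set()
    for tool in agent.tools:
        capabilities |= TOOL_CAPABILITIES.get(tool, {"unknown_tool"})
    return capabilities


def findings(agents: list[AgentConfig]) -> list[tuple[str, str, str]]:
    """(agent, owner, risk) for every risky combination an agent fully covers."""
    report = []
    for agent in agents:
        capabilities = blast_radius(agent)
        for label, combo in RISKY_COMBOS.items():
            if combo <= capabilities:
                report.append((agent.name, agent.owner, label))
    return sorted(report)


inventory = [
    AgentConfig("inbox-helper", "alice", {"read_email", "send_email"}),
    AgentConfig("report-bot", "bob", {"crm_read"}),
    AgentConfig("ops-agent", "carol", {"prod_db_write", "mystery_mcp_tool"}),
]
for finding in findings(inventory):
    print(finding)
```

Unmapped tools are reported instead of silently ignored, which is how over-permissioned agents stop hiding and start showing up on a list.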

Hooray! We have guidance. But, like most security reports and blogs, Anthropic’s eBook and this blog leave you with great recommendations and no practical way to implement them on your own. Very few companies can get past step #1 above.

** Vendor punch incoming ** That’s why we built Evoke. I’ve had to defend networks on a budget, without an endless supply of headcount or tokens to solve problems like this.

Evoke is built to put the guidance Anthropic recommends (which comes with no clear implementation path) into practice. Let us help you see the punches before they land.

Trying to get eyes on the agents running in your environment? That's exactly the problem we're solving at Evoke. If you want to see what visibility actually looks like, let’s chat.

The Weekend Byte
