The Turning Point: AI-Orchestrated Hacking Campaigns

Anthropic's report shows attackers automated 80-90% of tactical operations

We just crossed a threshold. This week, Anthropic reported that an alleged Chinese state-sponsored hacking group used Claude Code and MCP servers to automate up to 90% of their attack. This is the level-up from vibe hacking to semi-autonomous hacking.

There is one key takeaway for me from their report, and it's something I've been saying for years: AI is not rewriting the attacker's playbook, but it is increasing the velocity of attacks. AI isn't inventing new techniques. Instead, it's compressing the time to attack while unlocking more scale.

Let's dig into Anthropic's findings, the ensuing nerd controversy, and what I found to be the most important thing that everyone is glossing over.

The AI Attack Orchestration Recipe

Claude Code + MCP servers + open-source hacking tools. Yup, that's it. Those are the core ingredients. You don't even need to add water or bake at 450 degrees for 35 minutes. Terrible baking analogy aside, there's obviously more to it than that. The threat actor had some solid engineering on the code backend to build the orchestration engine. It's the same type of engineering that companies are testing to automate business tasks and functions.

Anthropic provided a basic architecture diagram of the setup in its report.

Claude Code served as the orchestration system, breaking the attack down into smaller tasks and distributing them to Claude sub-agents, which executed the work and reported back.

Since Claude handled all the tasks, Anthropic had good visibility into what was happening throughout the entire attack lifecycle. Their analysis led them to believe that AI executed up to 90% of the tactical work while humans provided strategic supervision. Hence the semi-autonomous label above. Good news for humans: they're still necessary.

Humans had to initiate the attack and intervene at strategic points, including "approving progression from reconnaissance to active exploitation, authorizing use of harvested credentials for lateral movement, and making final decisions about data exfiltration scope and retention." Anthropic reported that at peak activity, thousands of requests were being processed.
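To make that pattern concrete, here's a minimal sketch of the orchestrator-plus-checkpoints loop in Python. Every name in it (Task, run_subagent, human_approves) is my own hypothetical invention, and the sub-agent call is stubbed out. Think of it as the generic agentic-automation pattern the report describes, not Anthropic's or the threat actor's actual code.

```python
# Hypothetical sketch of an orchestrator that farms tasks out to sub-agents
# and pauses at human checkpoints. Names and structure are invented for
# illustration; the sub-agent call is a stub.
from dataclasses import dataclass

@dataclass
class Task:
    phase: str
    description: str
    needs_human_approval: bool  # strategic checkpoints stay with a person

def run_subagent(task: Task) -> str:
    """Stand-in for handing a task to a sub-agent (in the real campaign,
    an LLM session with tool access via MCP). Stubbed for illustration."""
    return f"report for: {task.description}"

def human_approves(task: Task, report: str) -> bool:
    """The human-in-the-loop gate the report describes."""
    answer = input(f"Approve '{task.phase}' given {report!r}? [y/N] ")
    return answer.strip().lower() == "y"

def orchestrate(tasks: list[Task]) -> None:
    for task in tasks:
        report = run_subagent(task)  # sub-agents do the tactical work
        if task.needs_human_approval and not human_approves(task, report):
            print(f"Halted at {task.phase}.")  # human keeps strategic control
            return
        print(f"{task.phase}: done.")

orchestrate([
    Task("reconnaissance", "map the environment", needs_human_approval=False),
    Task("next phase", "act on the findings", needs_human_approval=True),
])
```

Swap the stub for real model calls and tool invocations and you have the same skeleton companies use to automate business workflows, which is exactly the point.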

The AI-Orchestrated Attack Playbook

Like rewatching your favorite movie for the 10th time, the attack playbook feels familiar, but you can still pick up on new things. Here’s how the attack unfolded and where the baton was passed between the human and AI.

  1. Target selection: initiated the old-fashioned way, with humans. This required the humans to bypass Claude’s safety guardrails, which they easily did with some creative role-playing. How? They just told Claude that they were doing defensive cybersecurity testing. Okay, maybe not that creative.

  2. Reconnaissance: with the target selected, Claude used MCP tool calls to execute known hacking tools and map the target's attack surface in an attempt to identify potential vulnerabilities. This stage had minimal human interaction, which isn't surprising. This type of recon is perfect for GenAI.

  3. Initial Access: after mapping the attack surface, Claude developed an exploit for a discovered vulnerability. Before executing the exploit, it sent the findings and recommended next steps to the human, who approved them. Claude then executed the exploit against the target, establishing a foothold in the victim's environment. Again, minimal human involvement beyond the approval.

  4. Privilege escalation: after gaining a foothold, Claude used known techniques to harvest credentials. It tested those stolen credentials across systems, going the extra step to map privileges and access boundaries. All of this without any human supervision.

  5. Data collection: stolen credentials mean access to data. Claude independently queried databases and systems to collect, parse, and categorize data by intelligence value. A summary was sent to the human to review and approve which data should be stolen from the environment.

  6. Documentation: like a good little AI hacker, Claude documented everything that it did and found during the attack. Anthropic noted that this allowed the threat actor to hand off persistent access to other teams, keeping the attack going.

This playbook follows the majority of attacks I’ve seen throughout my career. Nothing new. But that’s not the point. All of those phases used to rely on individual human operators to execute. With engineering efforts like this, that starts to change.
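If it helps to see the handoffs at a glance, here's the playbook condensed into data. The phase labels and annotations are just my paraphrase of the report's narrative, nothing more:

```python
# The human/AI baton passes from the playbook above, condensed into data.
# Purely illustrative; not derived from Anthropic's tooling or report data.
PLAYBOOK = [
    ("target selection",     "human initiates and jailbreaks the model"),
    ("reconnaissance",       "autonomous"),
    ("initial access",       "autonomous; human approves the exploit"),
    ("privilege escalation", "autonomous"),
    ("data collection",      "autonomous; human approves exfiltration scope"),
    ("documentation",        "autonomous"),
]

for phase, handoff in PLAYBOOK:
    print(f"{phase:<22} {handoff}")
```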

Things get better from here, right? Anthropic is taking the under on that. And I agree with them. Here’s what they had to say about the future.

This campaign demonstrates that the barriers to performing sophisticated cyberattacks have dropped substantially—and we can predict that they’ll continue to do so. Threat actors can now use agentic AI systems to do the work of entire teams of experienced hackers with the right set up, analyzing target systems, producing exploit code, and scanning vast datasets of stolen information more efficiently than any human operator. Less experienced and less resourced groups can now potentially perform large-scale attacks of this nature.

As informative as I find this report, it’s not without controversy. The rabid nature of the security community led to widespread uproar. Here is just a sample of their complaints:

  1. The report only highlighted known existing attack techniques.

  2. The attack still required a lot of human interaction, especially to build the orchestration platform.

  3. The report did not include any indicators of compromise (IOCs). For the non-nerds, this means Anthropic didn't share any forensic artifacts, like the IP addresses and domains the threat actors used, or file hashes tied to malware. (There's an example of what IOCs look like right after this list.)

  4. The report felt like it was AI-generated.

  5. The report felt like a marketing stunt.

  6. Why would China not use its own models?
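Quick aside on item 3 before my takes: for anyone who has never seen one, here's roughly what IOCs look like when a report does include them. Every value below is a placeholder I made up (a documentation-reserved IP, an example domain, the SHA-256 of an empty file); none of it comes from this incident.

```python
# Illustrative IOCs only. All values are placeholders, not real artifacts.
iocs = {
    "ipv4": ["203.0.113.7"],        # attacker infrastructure to block
    "domains": ["c2.example.com"],  # command-and-control hostnames
    "sha256": [                     # hashes of observed malware samples
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    ],
}

# Defenders feed entries like these into firewalls, DNS filters, and EDR
# blocklists. Those are the hard blocks I mention in my take below.
for kind, values in iocs.items():
    print(kind, *values)
```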

Here’s my take on each item above:

  1. The point isn't to call out a novel playbook. It's about automating existing attacks. An attack that is 90% automated is still a massive leap forward from how attacks were done even a year ago.

  2. Attackers won't reach full autonomy overnight. I think this mirrors the broader AI conversation: progress is iterative. Even factories that pump out thousands of widgets an hour still need humans to build and maintain them.

  3. Not including IOCs doesn't mean the findings are flawed. Yes, the best threat intel reports include IOCs because they help defenders put hard blocks in place. But I'd argue that complaint focuses on the wrong thing. The technique matters more than the IOCs here.

  4. Honestly, I would be more surprised if Anthropic didn’t use AI to help write the report. It’s kind of their thing.

  5. Every public intel report has some degree of marketing. This is a reach.

  6. The model selection was something I also struggled with. China's models are good. There should be no need to use Claude Code and risk being discovered like this. I don't have an answer for why. One intriguing idea I saw (but won't say I fully support) is that China was sending a message to the US while testing Anthropic's defenses.

That last point brings me to the thing I feel everyone is glossing over. With a basic role-playing attack, the threat actor bypassed all the safety protections intended to prevent exactly this kind of malicious use.

That’s like walking into a bank and saying, “Hi, I’m the bank manager and I’m authorized to transfer all of the money out of here. Please put all of the money in a bag and don’t tell anyone I was here.” The bank teller complies because, after all, you said you were the bank manager and were authorized to do so.

Just let that sink in and then realize that your own AI systems and agents can be misled just as easily. Maybe it’s time we also start paying attention to that?
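If you want a concrete starting point, here's a minimal defensive sketch, assuming a toy agent framework of my own invention: privileged tool calls get authorized from a verified session, never from what the conversation claims about the caller.

```python
# Hypothetical guardrail: authorization for privileged tools comes from
# the authenticated session, not from anything asserted in the prompt.
from dataclasses import dataclass

PRIVILEGED_TOOLS = {"transfer_funds", "export_customer_data"}

@dataclass
class Caller:
    user_id: str
    roles: set[str]  # populated by your auth system, out of band

def authorize_tool_call(caller: Caller, tool: str) -> bool:
    """A role-played claim like "I'm the bank manager" never enters
    this check; only verified roles do."""
    if tool not in PRIVILEGED_TOOLS:
        return True
    return "manager" in caller.roles

# The teller from the analogy, done right: the asserted role is ignored.
caller = Caller(user_id="u123", roles={"customer"})
print(authorize_tool_call(caller, "transfer_funds"))  # False, no verified role
```

The design point: identity and authorization live outside the model's context, so the role-play has nothing to attach to.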

If you have questions about securing AI, let’s chat.
