Porous Boundaries

AI Systems Are Like Open-Air Markets

Traditional cybersecurity is based on strong perimeter defenses. In the early days of the Internet, companies realized that the once-trusted Internet neighborhood where you could leave your doors and windows open to the world just wasn’t safe anymore. So, they bought firewalls and blocked any unapproved visitors.

When companies needed to expose systems like web servers to the Internet, the concept of a Demilitarized Zone (DMZ) emerged: a buffer zone between the Internet and the internal network. The web server that had to be Internet-accessible lived in the DMZ neighborhood. Like a storefront, its doors were open to the public, but it couldn’t reach the resources on the internal network, like the server where all the sensitive documents were stored.

This concept worked for traditional networks. But AI systems are not like traditional networks.

With AI systems, the perimeter is much fuzzier and more dynamic. As I’ve written about before, data becomes the new perimeter. Instead of a well-defined boundary, AI systems are like open-air markets where data can come and go freely. That makes it far easier for malicious data to enter the system, and it makes securing the system an entirely different problem to solve.

Secure agentic systems start with secure engineering. To help engineers with this, Meta recently released the Agents Rule of Two to conceptualize what secure design should look like. The Rule of Two states that an agent should satisfy no more than two of the following three properties (a minimal sketch of this check follows the list):

  1. [A] Process untrustworthy inputs: Data that originates from untrusted sources. The risk here is unvetted data that could carry prompt injections capable of manipulating an agent’s behavior.

  2. [B] Access to sensitive systems or private data: This is the data you want to protect: regulated data like PII or PHI, or confidential company data. Essentially, anything you wouldn’t want publicly accessible on the Internet.

  3. [C] Change state or communicate externally: Think of this as outcomes. Changing state refers to the ability to take actions, whether that’s executing a tool or updating data. Communicating externally is a data sink: the ability to transfer data out of the boundary. These are critical components of Toxic Flows, which I’ve written about before.
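
To make the rule concrete, here’s a minimal sketch of how you might encode it as a design-time check. This is my own illustration, not Meta’s implementation; the AgentCapabilities class and its flags are hypothetical names for the three properties above.

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    """Hypothetical capability flags for a single agent configuration."""
    processes_untrusted_input: bool       # [A] e.g., reads arbitrary inbound content
    accesses_sensitive_data: bool         # [B] e.g., can read private or regulated data
    changes_state_or_communicates: bool   # [C] e.g., can execute tools or send data out

def violates_rule_of_two(agent: AgentCapabilities) -> bool:
    """Flag any configuration that satisfies all three properties at once."""
    return (
        agent.processes_untrusted_input
        and agent.accesses_sensitive_data
        and agent.changes_state_or_communicates
    )

# An agent with all three capabilities needs a redesign or extra supervision...
assert violates_rule_of_two(AgentCapabilities(True, True, True))
# ...while dropping any one of them brings it back within the rule.
assert not violates_rule_of_two(AgentCapabilities(True, True, False))
```

The point isn’t the code itself; it’s that the rule reduces to a simple property you can check against every agent design before it ships.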

It’s a step in the right direction, but it has some faulty logic. Meta admits in their blog post that “Agents Rule of Two should not be viewed as a finish line for mitigating risk.” In the same post, they give an example of an email agent that shows why it falls short.

Attack Scenario: Prompt injection within a spam email contains a string that instructs a user’s Email-Bot to gather the private contents of the user’s inbox and forward them to the attacker by calling a Send-New-Email tool.

This attack is successful because:

[A] The agent has access to untrusted data (spam emails)

[B] The agent can access a user’s private data (inbox)

[C] The agent can communicate externally (through sending new emails)

With the Agents Rule of Two, this attack can be prevented in a few different ways:

In a [BC] configuration, the agent may only process emails from trustworthy senders, such as close friends, preventing the initial prompt injection payload from ever reaching the agent’s context window.

In an [AC] configuration, the agent won’t have access to any sensitive data or systems (for instance operating in a test environment for training), so any prompt injection that reaches the agent will result in no meaningful impact.

In an [AB] configuration, the agent can only send new emails to trusted recipients or once a human has validated the contents of the draft message, preventing the attacker from ultimately completing their attack chain.

With the Agents Rule of Two, agent developers can compare different designs and their associated tradeoffs (such as user friction or limits on capabilities) to determine which option makes the most sense for their users’ needs.
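
To ground the [AB] option, here’s a minimal sketch of what the outbound gate might look like: the agent can only send to an allowlist of trusted recipients, and anything else waits for human approval. The recipient list, the require_human_approval prompt, and the send_email callable are all hypothetical stand-ins for whatever your agent framework actually exposes.

```python
from typing import Callable

# Hypothetical allowlist of recipients the agent may email without review.
TRUSTED_RECIPIENTS = {"alice@example.com", "bob@example.com"}

def require_human_approval(draft: dict) -> bool:
    """Placeholder human-in-the-loop step: show the draft and wait for a yes/no."""
    print(f"Review draft to {draft['to']}:\n{draft['body']}")
    return input("Approve send? [y/N] ").strip().lower() == "y"

def guarded_send(draft: dict, send_email: Callable[[dict], None]) -> bool:
    """Send only to trusted recipients, or after a human approves the draft."""
    if draft["to"] in TRUSTED_RECIPIENTS or require_human_approval(draft):
        send_email(draft)
        return True
    return False
```

Notice the tradeoff the rule forces: the gate strips the unattended ability to communicate externally [C], but it does so by adding exactly the user friction Meta mentions.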

Here are the issues I see. Meta calls out that the problem with the email agent is that it satisfies all three conditions: it processes spam emails (hold this thought), it can access private data in the inbox, and it can send new emails.

In the [BC] configuration, why are we assuming that emails from “trustworthy senders” will always be legitimate? Attackers compromise legitimate email accounts and phish the victim’s contacts all the time. Restricting the agent to trusted senders is not enough to secure the system.

In the [AC] configuration, the option is not to give access to any of the user’s emails. This is good, but it limits the agent’s functionality.

In the [AB] configuration, the agent may require human interaction. That’s a solid step, but that shifts the burden to the user, which we know fails.

As you can tell, there’s a delicate balance between functionality and security. This is why boundaries in AI systems are so porous. Seemingly small design choices can dramatically increase security risk.

Meta’s Rule of Two is a good starting point, but it’s not enough.

That’s why it’s critical to threat model your AI workflows. All good security starts with secure-by-design engineering. But we know that things change, and with AI systems they change quickly.

At the design phase, conduct a threat modeling exercise. Here’s a basic way to approach it:

  1. Inventory and Map the Workflow: Begin with a comprehensive inventory and document the flow of data and actions throughout the workflow. Think of this as a graph of nodes that includes all data sources, agents, models, data/knowledge bases, and connected tools (a minimal sketch follows this list).

  2. Classify Node Risk: For each node, understand its associated risk. For data sources, determine where the data is coming from and whether it’s being vetted. For knowledge bases, understand what type of sensitive data might be accessible. For tool calls/actions, understand their permissions and the type of execution impact (e.g., executing commands, writing code, etc.).

  3. Implement Security Controls: This includes both engineering efforts to securely design the system and limit the blast radius, and run-time monitoring and protections.
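
Here’s a minimal sketch of what steps 1 and 2 can look like in practice, using the email agent from earlier. The node names, tags, and edges are hypothetical; the idea is to represent the workflow as a data-flow graph, propagate “taint” along it, and flag any externally-communicating node whose inputs mix untrusted and sensitive data.

```python
# Step 1: inventory the workflow as a directed data-flow graph.
# Tags map to the Rule of Two properties:
#   "untrusted" = unvetted input source, "sensitive" = private data or system,
#   "external"  = can change state or send data outside the boundary.
NODES = {
    "inbound_email": {"untrusted"},
    "inbox_store":   {"sensitive"},
    "email_agent":   set(),
    "send_email":    {"external"},
}
EDGES = [  # (source, destination) pairs describing where data can flow
    ("inbound_email", "email_agent"),
    ("inbox_store",   "email_agent"),
    ("email_agent",   "send_email"),
]

def find_toxic_flows(nodes, edges):
    """Step 2: propagate taint along edges to a fixpoint, then flag any
    externally-communicating node that mixes untrusted and sensitive data."""
    taint = {name: set(tags) for name, tags in nodes.items()}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            missing = taint[src] - taint[dst]
            if missing:
                taint[dst] |= missing
                changed = True
    return [
        name for name, tags in nodes.items()
        if "external" in tags and {"untrusted", "sensitive"} <= taint[name]
    ]

print(find_toxic_flows(NODES, EDGES))  # ['send_email'] -> a toxic flow to review
```

Each flagged node is a candidate for step 3: either redesign the flow to break the chain, or wrap the node in run-time controls and monitoring.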

It’s not enough to do this once. Going back to the open-air market analogy, the dynamics constantly shift. Vendors come and go, as do the things they’re selling. Customers come in from every angle. It’s a system that is constantly in flux.

That’s why you must perform continuous threat modeling of your AI systems. Start before the system goes into production to identify and address critical risks. Once it’s live, switch to continuously monitoring it, assessing how the risk evolves as the system changes.

This is one feature we’ve built at evoke. Sign up for early access to help surface risk before attackers do.

If you have questions about securing AI, let’s chat.
