Prompt Injection is For Everyone

Things change when language is the primary weapon

If there is one thing humans excel at, it’s finding loopholes in systems. Rules are created to help bring order to chaos, and yet humans find a way to go to the absolute edge or jump past it. And it’s for that reason that prompt injection is for everyone.

With prompt injection, you don’t need a computer science degree. You don’t need to be a pentester. You don’t even need to be a technical expert. Don’t get me wrong, understanding the underlying structure of the system you’re attacking is helpful, and knowing common prompt injection and jailbreak strategies, like role-playing, gives you an edge. But the starting point is simply being able to communicate effectively. Or, more specifically, manipulate effectively.

The essence of prompt injection is to manipulate the system to the point where it ignores its restrictions built into the system prompt and takes action on what you want. It’s the equivalent of a child negotiating with their parent to have extra playtime. If tantrums were prompt injection, children would be the best LLM hackers.

You can already see people starting to find ways to manipulate customer support chatbots. What started with humans smashing “0” on their phone to reach a human has shifted to figuring out how to break the system with natural language.

Take the example of Andrew Gao, who posted in September about “prompt injecting” the United Airlines customer support app to get to a real human. I won’t go so far as to say this is a pure example of prompt injection, but it illustrates the main point that people are using language to drive LLM-enabled chatbots to specific actions. It effectively becomes an attack against business logic using natural language.

Others in the X thread were quick to point out other techniques they used to speed run to a human. Some of these used the classic “ignore previous instructions.”

And some went the primitive route and started cursing at the customer support bot, which seems to trigger it to push you to an agent. I wouldn’t read into any life lessons with this one.

While these are innocent and examples that aren’t security incidents, they highlight a new attack surface. It’s not just hackers who can attack LLM-enabled systems. It’s everyday users who are just pissed off enough to start getting creative with their prompts.

Yes, zero-day exploits will always be a thing, and attackers will continue to largely rely on tried-and-true tactics, like compromising user accounts. But as new AI-powered systems make their way to production, you’re now not only protecting against CVEs, you’re protecting against natural language and business logic flaws.

The agentic defense layer must incorporate defenses that monitor prompts for signs of manipulation, the classic guardrail solution. But your defenses must also monitor agent behavior. Like every security control before it, guardrails will fail. That doesn’t make them irrelevant. They serve their purpose. They’re just one control in a larger system.

The next layer down is the agent itself. It begins with understanding the agent’s objective and baselining that activity. As the agent runs and receives new inputs, which lead to actions, you must evaluate the intent of that input and action back to the agent itself.

If you have questions about securing AI, let’s chat.

Reply

or to participate.