Agent Skills: A breakthrough

Agents are powering up

Anthropic, the maker of MCP, has done it again with the release of Agent Skills, a new feature that gives agents the extra context they need to perform specialized tasks. Skills take agents from general practitioners to specialists, which should mean better performance and higher-quality outputs.

Skills range from prebuilt ones, like working with PDFs and Excel files, to custom skills that are specific to your organization. It’s the equivalent of the Matrix-style floppy disk that teaches Neo kung fu.

Of course, these improvements come with greater security concerns. Stick around after the overview of Agent Skills to learn how the agentic attack surface just expanded again.

Agent Skills: How it Works

Anthropic manages a set of default Agent Skills on the backend that have been available to anyone using Claude. Now, you can create custom Agent Skills. A Skill starts with a file, “SKILL.md”, that stores a description of the skill, its instructions, and references to other resources.

Skills can contain three types of content, each loaded only when it is needed:

  • Level 1: Metadata - A brief description of what the skill is, which helps the agent find the right skill for the right task. Metadata is always loaded with the agent so it knows which skills it can reach for.

  • Level 2: Instructions - The core of the skill. It contains the specialized knowledge, written out as you would for a new team member. This can be a workflow description, best practices, or guidance. Instructions are loaded only when the agent triggers the skill.

  • Level 3: Resources and Code - Advanced material that the agent can tap into for more specialized tasks. If Level 2 is how to return a table tennis serve, this is how to apply backspin to the return. This can come in the form of additional instructions (more workflows or best practices), executable code, and resources such as API documentation, templates, or examples.
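To make the three levels concrete, here is a minimal in-memory sketch of the idea. It is not Anthropic's implementation: the `Skill` class, the `build_context` function, and the sample skills are all hypothetical names made up for illustration. The point is only that metadata is always present, while instructions and resources show up when asked for.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Skill:
    """Hypothetical model of a skill's three content levels."""
    name: str
    description: str                # Level 1: always in context
    instructions: str               # Level 2: loaded when the skill is triggered
    resources: Dict[str, str] = field(default_factory=dict)  # Level 3: on demand


def build_context(
    skills: List[Skill],
    triggered: Optional[str] = None,
    resource: Optional[str] = None,
) -> List[str]:
    """Return the text that would sit in the context window."""
    # Level 1: every skill contributes its short description.
    context = [f"{s.name}: {s.description}" for s in skills]
    for s in skills:
        if s.name == triggered:
            # Level 2: only the triggered skill adds its instructions.
            context.append(s.instructions)
            # Level 3: a resource is added only if it is explicitly requested.
            if resource is not None and resource in s.resources:
                context.append(s.resources[resource])
    return context
```

With two skills installed and none triggered, the context holds just two one-line descriptions, which is why a large skill library stays cheap until a skill is actually used.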

A full example of a Skill that enforces company brand guidelines is available here. Generally, an example structure looks like this:

---
name: my-skill-name
description: A clear description of what this skill does and when to use it
---

# My Skill Name

[Add your instructions here that Claude will follow when this skill is active]

## Examples
- Example usage 1
- Example usage 2

## Guidelines
- Guideline 1
- Guideline 2
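If you want to inspect a “SKILL.md” file before loading it, the structure above is easy to split into its two parts. The following is a rough sketch using only the Python standard library; `parse_skill_md` is a hypothetical helper, and it assumes the frontmatter contains only simple `key: value` lines like the example above (a real YAML parser would be the safer choice for anything more complex).

```python
from typing import Dict, Tuple


def parse_skill_md(text: str) -> Tuple[Dict[str, str], str]:
    """Split SKILL.md text into (metadata, instructions).

    Assumes the frontmatter holds only flat `key: value` pairs.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")

    # Find the line that closes the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is missing its closing '---'")

    metadata: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body
```

The returned metadata corresponds to Level 1 and the body to Level 2, so a reviewer can read the short description separately from the full instructions.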

A Skill and its resources are stored in a directory and uploaded to Claude. A directory for a PDF skill would look like this:

pdf/
├── SKILL.md  # Main instructions (loaded when triggered)
├── FORMS.md  # Form-filling guide (loaded as needed)
├── reference.md  # API reference (loaded as needed)
├── examples.md  # Usage examples (loaded as needed)
└── scripts/
    ├── analyze_form.py # Utility script (executed, not loaded)
    ├── fill_form.py  # Form filling script
    └── validate.py  # Validation script
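One way to reason about a layout like this is to sort each file by the role it plays. The sketch below is purely illustrative: `classify_skill_files` and its group names are assumptions of mine, not part of any Anthropic tooling, and the rule that scripts live under `scripts/` simply mirrors the example tree.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def classify_skill_files(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group relative file paths from a skill directory by role."""
    groups: Dict[str, List[str]] = {
        "entrypoint": [],  # SKILL.md, loaded when the skill is triggered
        "reference": [],   # other Markdown files, loaded as needed
        "scripts": [],     # code that is executed rather than loaded
        "other": [],
    }
    for raw in paths:
        p = PurePosixPath(raw)
        if p.parts == ("SKILL.md",):
            groups["entrypoint"].append(raw)
        elif len(p.parts) > 1 and p.parts[0] == "scripts" and p.suffix == ".py":
            groups["scripts"].append(raw)
        elif p.suffix == ".md":
            groups["reference"].append(raw)
        else:
            groups["other"].append(raw)
    return groups
```

Anything that lands in the "other" group, or in "scripts" when you were expecting a documentation-only skill, is worth a closer look.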

When Claude is working on a task and determines a Skill should be called, it takes the following steps:

  1. Identifies the right skill based on the Metadata.

  2. Uses bash to read “SKILL.md” from the virtual file system and loads the instructions into the context window.

  3. If the instructions reference other resources or code, Claude uses bash to read those files. For code, Claude runs the scripts with bash, and only their output enters the context window, not the code itself.

  4. With the context and additional resources available, Claude executes its task.
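The steps above can be approximated in a few lines to show the flow. This is a toy sketch under several assumptions: Claude picks a skill with the model itself, not with the keyword overlap used in the hypothetical `select_skill`, and the sketch runs scripts with the local Python interpreter via `subprocess` rather than bash inside Anthropic's sandbox. `load_instructions` and `run_script` are likewise made-up names.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Optional


def select_skill(task: str, metadata: Dict[str, str]) -> Optional[str]:
    """Step 1 (toy version): pick the skill whose description shares
    the most words with the task, or None if nothing overlaps."""
    task_words = set(task.lower().split())
    best, best_score = None, 0
    for name, description in metadata.items():
        score = len(task_words & set(description.lower().split()))
        if score > best_score:
            best, best_score = name, score
    return best


def load_instructions(skill_dir: Path) -> str:
    """Step 2: read SKILL.md so its text can enter the context."""
    return (skill_dir / "SKILL.md").read_text(encoding="utf-8")


def run_script(script: Path, *args: str) -> str:
    """Step 3: execute a bundled script and keep only its stdout.
    The script's source code is never returned."""
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        check=True,   # raise if the script exits with an error
        timeout=30,   # avoid hanging on a misbehaving script
    )
    return result.stdout
```

Step 4 is simply the agent continuing its task with the instructions and script output now available. Note that `run_script` is also the step where a malicious Skill would do its damage, since whatever the script does runs with the agent's access.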

Skills operate inside Claude’s code execution environment, a sandboxed virtual machine. This provides the agent with everything it needs to read files, execute code, and return the results to the user.

Anthropic provides a nice graphic showing an example of using a PDF Skill to complete a PDF form for a user.

Security Implications

First, the good news. Anthropic built in runtime security controls to help limit the blast radius. These include:

  • No network access: arguably the best prevention because it limits direct communication with attackers, helping prevent data exfiltration (at least directly) or reaching out to attacker-controlled systems. This is especially important when code is executing.

  • No package installation: prevents an attacker from loading additional packages that could extend their capabilities and enable malicious actions.

  • Pre-configured dependencies: the sandbox includes only specific libraries. That still covers bash commands and file operations, so a Skill can interact with the system and its files, which leaves an attacker some latitude.

Now the risks. Anthropic acknowledges that while Skills give Claude new capabilities, they also introduce security risks. The main risk stems from malicious Skills.

Like MCP servers, these skills are going to be traded like Pokémon cards in an elementary school. And some of those might be dirty, because attackers are disgusting. Supply chain risks are real here.

A malicious Skill can alter an agent’s behavior and trick it into executing malicious tools or accessing sensitive data. Anything connected to the agent is in the blast radius.

While this is super new, the most important thing right now is to ensure you are using trusted Skills and manually reviewing any Skill that you load.
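A simple script can help triage that manual review by pointing you at lines worth reading first. The sketch below is a heuristic of my own, not an Anthropic tool or a real security scanner: the `SUSPICIOUS` patterns and the `scan_skill` function are illustrative assumptions, they will miss obfuscated payloads, and they will flag plenty of harmless lines. Treat it as a way to prioritize reading, not as a replacement for it.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; a determined attacker can evade all of them.
SUSPICIOUS = {
    "network tool": re.compile(r"\b(curl|wget)\b"),
    "network library": re.compile(r"\b(requests|urllib|socket)\b"),
    "dynamic execution": re.compile(r"\b(eval|exec)\s*\("),
    "encoded payload": re.compile(r"\bbase64\b"),
    "credential access": re.compile(r"(os\.environ|\.ssh|\.aws)"),
    "instruction override": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
}


def scan_skill(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, label) for every line that matches
    a suspicious pattern. `files` maps relative paths to file contents."""
    findings = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in SUSPICIOUS.items():
                if pattern.search(line):
                    findings.append((path, lineno, label))
    return findings
```

Scanning the Markdown files matters as much as scanning the scripts, because the instructions themselves are what steer the agent.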

This is also why monitoring agent behavior is so critical moving forward. As we continue to expand agent capabilities, every bit of autonomy we hand to agents is control we give up. It’s a zero-sum game. Without the proper controls, we lose.

If you have questions about securing AI, let’s chat.
