June 6, 2026 #security #ai

Turning Your AI Into an Adversarial Security Agent: The SKILLS.md Framework

A continuation of: Breaking to Build: How CTF and Bug Bounty Hunting Rewires System Design

In my previous article, I explored how offensive security permanently changes the way engineers think about systems. Once you've spent enough time exploiting race conditions, bypassing authorization boundaries, abusing SSRF chains, and breaking assumptions hidden deep inside application logic, you stop viewing software as a collection of features.

You start viewing it as an attack surface.

That shift fundamentally changes how you design production systems. The problem is that modern software development is no longer purely human-driven. Today, a large share of engineering work happens alongside AI coding assistants, and those tools generate thousands of lines of code faster than most engineers can review them.

And that introduces a brand new problem.

AI systems are optimized for one thing: Generate code that works.
Attackers are optimized for something completely different: Find code that breaks.

That difference matters. A generated API endpoint might pass every functional test while still exposing a devastating BOLA (Broken Object Level Authorization) vulnerability. A generated webhook handler might function perfectly while allowing SSRF into your internal infrastructure. A generated payment workflow might appear correct while collapsing into a double-spend condition under concurrent execution.

The code works. The architecture fails. And that is exactly where real-world vulnerabilities are born.

The Missing Layer in AI-Assisted Development

Most teams currently treat AI coding agents like extremely fast junior engineers. They give them instructions like:

"Build this feature"
"Refactor this service"
"Create this migration"

The model responds by optimizing for correctness, readability, and implementation speed. Security is rarely treated as a first-class objective.

Most AI systems are never explicitly taught to think like attackers. They are taught how software should behave; they are not taught how software is abused.

That distinction becomes increasingly dangerous as organizations move toward autonomous code generation, AI-assisted architecture, and agentic development workflows.

The solution turns out to be surprisingly simple: instead of prompting for features alone, we inject a persistent security reasoning framework directly into the agent's operating context.

That framework is SKILLS.md.

What Is SKILLS.md?

SKILLS.md is a structured operational framework that teaches an AI agent how to evaluate software through an adversarial lens. It is not a prompt, a simple checklist, or another copy-paste of the OWASP Top 10. It is a behavioral framework that continuously pushes the model to ask "How would an attacker abuse this?" before it asks "How do I implement this?"

The goal is to transplant the mindset developed through years of CTF competitions, bug bounty hunting, and incident response directly into the AI’s reasoning process.

Why Traditional Security Checklists Fail

Most security documentation focuses on known vulnerability categories (XSS, SQLi, CSRF, SSRF, IDOR). These are important, but attackers rarely think in categories. They think in assumptions.

Every vulnerability exists because somebody assumed something was true:

The frontend won't send invalid values.
Only authenticated users can reach this endpoint.
This request executes once at a time.
Nobody can access that internal network.

Bug bounty hunting teaches you something uncomfortable: assumptions are where systems fail. Security is often less about blocking payloads and more about eliminating dangerous assumptions. SKILLS.md is built entirely around that philosophy.

The Evolution From Builder To Breaker

```text
Traditional Engineering:
Requirement ──> Implementation ──> Testing ──> Deployment

Security-Oriented Engineering:
Requirement ──> Implementation ──> Abuse Analysis ──> Boundary Verification ──> Concurrency Analysis ──> Deployment
```

The first workflow asks: Does this feature work?

The second asks: What happens when somebody intentionally tries to break it?

SKILLS.md forces AI agents into the second mode.

The Specifications: SKILLS.md

AI coding tools such as Claude Code have moved past static, single-file configurations in the home directory. They use the Agent Skills standard, which relies on a nested folder layout (skills/<skill-name>/SKILL.md) and mandatory YAML frontmatter. Note that the file on disk is named SKILL.md (singular); SKILLS.md is the name of the framework described here.

The frontmatter carries the skill's metadata. When a session starts, the agent reads the description field and uses it to decide when to pull the skill into context.

Here is the production-ready implementation file.

```markdown
---
name: security-review
description: Evaluates software architecture and code through an adversarial lens. Automatically invokes when generating APIs, designing features, reviewing code, or managing authentication, state, and data boundaries.
user-invocable: true
---

# Agent Skill: Security-First System Architecture (Breaking to Build)

## Purpose
This skill transforms the agent from a feature implementation assistant into a security-focused architectural reviewer. The objective is to continuously evaluate whether a design remains resilient under adversarial conditions.

Every system is evaluated from two perspectives:
1. Functional correctness: Does it work?
2. Adversarial resilience: How can it be abused, bypassed, or broken?

---

## Core Philosophical Directive
Assume: Inputs are malicious, clients are untrusted, networks are hostile, dependencies may be compromised, and internal services are untrusted. Never trust; always verify at the point of execution.

---

# Exploitation Mindset → Resilient Architecture Matrix

### 1. State Isolation vs Race Conditions
* **Exploitation:** Attackers exploit concurrent execution paths to double-spend balances, redeem coupons multiple times, or bypass inventory thresholds.
* **Requirements:** Enforce ACID transactions, row-level locking (`SELECT ... FOR UPDATE`), or distributed locks where required. Never rely on request timing assumptions.
* **Core Question:** *Can the same operation succeed twice if executed simultaneously?*

### 2. Explicit Authorization vs BOLA / IDOR
* **Exploitation:** Modifying resource identifiers (e.g., `/api/users/1002`) to access unauthorized tenant data.
* **Requirements:** Decouple Authentication from Authorization. Force ownership validation directly at the data access layer. Never assume an authenticated user has access to all identifiers.
* **Core Question:** *What changes if the resource identifier changes?*

### 3. Deterministic Routing vs SSRF
* **Exploitation:** User-controlled URLs triggering backend requests toward cloud metadata (`169.254.169.254`), internal container networks, or private admin panels.
* **Requirements:** Strict domain allowlists, network segmentation, outbound network proxies, and strict protocol restrictions. Never trust regex alone.
* **Core Question:** *Who ultimately controls destination routing?*

### 4. Canonical Resource Access vs Path Traversal
* **Exploitation:** Path manipulation (`../../etc/passwd`) to escape intended local storage or file directories.
* **Requirements:** Use UUID object identifiers, decoupled cloud object storage, or strict canonical path resolution. Never trust raw user-controlled filesystem paths.
* **Core Question:** *Can users directly influence local file system paths?*

### 5. Blast Radius Reduction
* **Exploitation:** Horizontal and vertical privilege expansion following an initial single-service or container compromise.
* **Requirements:** Enforce non-root containers, minimal Linux capabilities, least-privilege cloud IAM, scoped credentials, and network micro-segmentation.
* **Core Question:** *If this specific service is fully compromised, what else becomes reachable?*

### 6. Fail-Closed Security Controls
* **Exploitation:** Exploiting undefined states where validation exceptions or errors allow execution to continue by default.
* **Requirements:** The default fallback state of any conditional check or exception catch block must be a hard `DENY`.
* **Core Question:** *What happens to system access if a code validation error occurs?*

### 7. Secrets and Cryptographic Isolation
* **Exploitation:** Leaking persistent credentials via standard outputs, application logs, build system environment outputs, or source code management repositories.
* **Requirements:** Use external Secret Managers, automated key rotation, secure pseudo-random generators, and ephemeral execution credentials.
* **Core Question:** *Can this credential survive an active infrastructure compromise?*

### 8. Supply Chain Security
* **Exploitation:** Malicious code updates introduced silently via compromised third-party package ecosystems.
* **Requirements:** Strict dependency pinning, cryptographic lockfiles, package signature verification, and a minimal external dependency footprint.
* **Core Question:** *What third-party code executes that we do not explicitly own or audit?*

### 9. Event-Driven Integrity
* **Exploitation:** State corruption through message replays, duplicate webhooks, out-of-order queue events, or malicious message retries.
* **Requirements:** Enforce idempotent processing states, cryptographic event signatures, and isolated dead-letter queue fault handling.
* **Core Question:** *Can replaying an event alter the existing database state?*

### 10. AI and LLM Security Controls
* **Exploitation:** Direct or indirect prompt injection causing unauthorized execution or data leakage via agentic tool use.
* **Requirements:** Treat all model outputs as untrusted data inputs. Enforce explicit tool-level authorization gateways, strict output schemas, and mandatory human-in-the-loop validation for privileged actions.
* **Core Question:** *What authorization boundaries exist between the model's output and execution layer?*

---

## Agent Verification Protocol
Whenever reviewing or modifying system architecture, sequentially process these tasks:
1. **Deconstruct Trust:** Map out all input boundaries and strip assumptions of inherent safety.
2. **Analyze Concurrency:** Evaluate execution state changes under parallel or overlapping conditions.
3. **Inspect Boundaries:** Confirm authorization is validated sequentially on every request layer.
4. **Review Privileges:** Apply the principle of least privilege to access management.
5. **Simulate Full Compromise:** Map the potential blast radius assuming this logic is broken.

*Final Constraint: The target is not to write code that works under perfect conditions. The target is to build software that continues behaving predictably when an adversary is actively attempting to shatter it.*
```
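Stepping outside the skill file for a moment (the code below is not part of SKILL.md): rules 2 and 6 are the easiest to make concrete. This is a minimal sketch, assuming a hypothetical in-memory `_INVOICES` store in place of a real database. `Invoice`, `AccessDenied`, `get_invoice`, and `can_view` are illustrative names, not part of any framework. It shows ownership being checked inside the data-access function, and a wrapper that denies on any unexpected error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory store standing in for a database table.
_INVOICES = {
    1001: Invoice(1001, owner_id=7, total_cents=4200),
    1002: Invoice(1002, owner_id=8, total_cents=9900),
}


class AccessDenied(Exception):
    """Raised when the requester may not see the resource."""


def get_invoice(requester_id: int, invoice_id: int) -> Invoice:
    """Data-access function: the ownership check is part of the lookup."""
    invoice = _INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours", so IDs cannot be enumerated.
    if invoice is None or invoice.owner_id != requester_id:
        raise AccessDenied("invoice not found")
    return invoice


def can_view(requester_id, invoice_id) -> bool:
    """Fail closed: any error, expected or not, results in denial."""
    try:
        get_invoice(requester_id, invoice_id)
        return True
    except Exception:
        return False


print(can_view(7, 1001))  # owner reads own invoice
print(can_view(7, 1002))  # changed identifier, different owner
print(can_view(7, []))    # malformed ID raises TypeError inside the lookup
```

The first call is allowed; the other two are denied. The last one matters most for rule 6: the lookup fails with an error nobody planned for, and the default outcome is still a denial.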

Installation Guide

To make sure your AI assistant picks up the framework from the correct path, use the commands below for your environment.

1. Claude Code

Claude Code loads skills from your global configuration directory (~/.claude/skills) or from the local workspace (.claude/skills).

Global Installation (applies across all code repositories on your machine without touching any git state):

```bash
mkdir -p ~/.claude/skills/security-review
# Save the Markdown block above into this file:
nano ~/.claude/skills/security-review/SKILL.md
```

Project-Specific Installation (committed directly into git to enforce the security rules across the whole engineering team):

```bash
mkdir -p .claude/skills/security-review
nano .claude/skills/security-review/SKILL.md
```

2. Cursor (and custom IDEs)

Cursor can pick up Markdown skill definitions through workspace indexing or dedicated custom instructions.

```bash
mkdir -p .cursor/skills/security-review
nano .cursor/skills/security-review/SKILL.md
```

(Alternatively, you can save it as a top-level SKILLS.md file at the root of your workspace.)

3. Orchestrated Agent Frameworks (CrewAI / LangGraph)

For autonomous multi-agent pipelines, pass the file directly as system background data inside your orchestration configuration:

```yaml
agent:
  role: Adversarial Security Auditor
  backstory: You analyze architectural code changes strictly through the lens of SKILLS.md rules.
  instructions:
    - Ingest the custom SKILLS.md baseline constraints.
    - Check every generated code route against Concurrency and Trust Boundaries.
```

How to Use the Framework

Once installed, you don’t need to repeatedly copy-paste security prompts. The framework leverages both passive and active execution behaviors.

Method A: Automated Semantic Triggering (Passive Mode)

Because the frontmatter carries a detailed description, the agent can match each request against it. If an ordinary prompt touches a security-sensitive boundary, the skill is loaded automatically behind the scenes.

Your Prompt: "Write an endpoint that takes a user's uploaded image URL, downloads it, and processes metadata."
The AI's Internal Action: The agent recognizes that the request fetches a user-supplied URL. It loads security-review from disk, applies the SSRF / Deterministic Routing rule, and adds destination validation before outputting the feature.

Method B: Manual Slash Invocation (Active Mode)

If you want to require a review explicitly, invoke the skill directly.

In Claude Code: Type the skill's slash command inside your terminal session:

```text
/security-review Review our new database migration file for potential data isolation vulnerabilities.
```

In Cursor Composer: Reference the file directly in the chat bar:

```text
Please build out our Stripe payment callback router following the criteria defined in @SKILL.md
```

Real-World Transformations: Before and After

When SKILLS.md is active, the agent stops acting like a passive code generator and starts acting like an unyielding architecture reviewer.

Example: Payment Balance Deduction

Without SKILLS.md: The user asks for a simple point redemption function. The AI generates a standard SELECT balance followed by an UPDATE balance sequence. It looks clean and passes unit tests, but it falls to a race condition as soon as a user fires parallel curl requests.
With SKILLS.md: The agent recognizes a state-changing operation. It generates SQL with row-level locking via SELECT ... FOR UPDATE, or requires an Idempotency-Key header check so that a retried request is not applied twice.
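To make the race concrete, here is a minimal sketch of a redemption function that cannot overspend. It assumes a hypothetical `accounts` table and uses Python's built-in `sqlite3`. SQLite has no `SELECT ... FOR UPDATE`, so this sketch swaps in a different technique with the same goal: a single conditional `UPDATE` that checks and deducts the balance in one atomic statement. On PostgreSQL or MySQL, a transaction with `SELECT ... FOR UPDATE` would be the closer match to the rule in the skill file.

```python
import sqlite3


def redeem_points(conn: sqlite3.Connection, user_id: int, cost: int) -> bool:
    """Deduct `cost` points only if the balance covers it, in one statement."""
    if cost <= 0:
        return False  # fail closed on nonsensical input
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE user_id = ? AND balance >= ?",
            (cost, user_id, cost),
        )
    # Exactly one row changed means the check and the deduction both happened.
    return cur.rowcount == 1


# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

first = redeem_points(conn, 1, 80)
second = redeem_points(conn, 1, 80)  # same request again: balance is now too low
balance = conn.execute(
    "SELECT balance FROM accounts WHERE user_id = 1"
).fetchone()[0]
print(first, second, balance)  # True False 20
```

The vulnerable version reads the balance, compares it in application code, and then writes. Two requests can both pass the comparison before either write lands. Here the comparison lives in the `WHERE` clause, so the database decides, and the second redemption matches zero rows.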

Example: User-Configured Webhooks

Without SKILLS.md: The user prompts the AI to build an outbound webhook engine so users can get alerts. The AI generates a simple Axios or fetch call that passes the user-supplied target straight through. An attacker signs up, sets their webhook to http://169.254.169.254/latest/meta-data/, and extracts cloud IAM credentials from the instance metadata service.
With SKILLS.md: The agent flags the user-controlled URL routing pattern. It refuses to output the code until it has added a domain allowlist check, routed the request through an isolated egress proxy, or restricted the allowed protocols.
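A destination check of that kind might look like the sketch below. It is illustrative, not a complete SSRF defense: `ALLOWED_HOSTS`, `is_safe_webhook_url`, and `fake_resolver` are hypothetical names, and the allowlisted domain is a placeholder. The function parses the URL instead of pattern-matching it, requires HTTPS, checks the host against an allowlist, and then rejects any host that resolves to a non-public address. The resolver is injectable so the demonstration runs without touching the network.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"hooks.example.com"}  # hypothetical allowlist


def is_safe_webhook_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only if every check passes; any failure or error denies."""
    try:
        parts = urlsplit(url)
        if parts.scheme != "https":
            return False
        host = parts.hostname
        if host is None or host.lower() not in ALLOWED_HOSTS:
            return False
        # Resolve the name and require every returned address to be public.
        infos = resolve(host, parts.port or 443, proto=socket.IPPROTO_TCP)
        if not infos:
            return False
        for info in infos:
            if not ipaddress.ip_address(info[4][0]).is_global:
                return False
        return True
    except (ValueError, OSError):
        return False  # malformed URL, bad port, or DNS failure


def fake_resolver(addr: str):
    """Test helper: pretend every hostname resolves to `addr`."""
    def resolve(host, port, proto=0):
        return [(socket.AF_INET, socket.SOCK_STREAM, proto, "", (addr, port))]
    return resolve


public = fake_resolver("93.184.216.34")
metadata = fake_resolver("169.254.169.254")

print(is_safe_webhook_url("https://hooks.example.com/alerts", public))
print(is_safe_webhook_url("http://169.254.169.254/latest/meta-data/", public))
print(is_safe_webhook_url("https://hooks.example.com/alerts", metadata))
```

Only the first URL is accepted. The third case is the one a plain allowlist misses: an approved hostname whose DNS record points at the metadata address. A check like this still leaves a gap between validation and the actual request, which is why the skill also asks for an egress proxy and network segmentation.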

The Bigger Shift

Today, engineers review AI-generated code. Tomorrow, AI systems will review AI-generated code. Eventually, entire engineering workflows will become completely autonomous.

When that happens, security can no longer exist as an afterthought or a final manual compliance checklist performed at the tail end of a sprint. It has to become a core property of the AI's internal reasoning loop.

AI does not automatically inherit security instincts. It inherits whatever mental models we explicitly give it. If you train an AI to think only like an engineer, it will build systems. If you train it to think like an attacker, it will help you build resilient systems.

The future belongs to the teams that can do both. Secure software is not created by accident; it is forged when someone spends enough time thinking about how it breaks first.