Superpowers: The Anatomy of an Agent Skill
AI coding agents will skip your guardrails the moment they feel inconvenient. Superpowers - the 200k-star skills framework by Jesse Vincent - is a software development methodology disguised as markdown. Explore how a skill bootstraps itself into every session, why descriptions should never summarize, and what makes one skill stick where another gets quietly ignored.
Anyone who has spent time pair-programming with an LLM has felt the same small betrayal: you write a clear instruction in your CLAUDE.md, watch the agent acknowledge it, and then watch it cheerfully ignore the instruction twenty minutes later. The system prompt is not a contract. It is a suggestion the model weighs against everything else in its context.
Superpowers, created by
Jesse Vincent in October 2025, was not the first structured attempt to
make agents behave - Cursor's rules files and CLAUDE.md
conventions came earlier - but it is one of the most widely adopted and
thoroughly worked-out. It is an installable plugin - the author calls it
"an agentic skills framework and software development methodology" - that
bundles fourteen "skills" (small markdown files encoding a development
methodology) with a bootstrap mechanism that forces the agent to consult
them. The skills are the visible part; the bootstrap is what turns inert
files into a framework. In seven months it has crossed 200,000 stars on
GitHub.
The interesting part is not that Superpowers exists. It is why it works. Each design choice - the frontmatter shape, the bootstrap hook, the sentence that opens every description, the bright-red "IRON LAW" banners - is a specific response to a specific failure mode the author observed in agents. Read together, the codebase is a textbook on how to write instructions a model will actually follow.
This explainer takes that textbook apart. We'll look at the four pieces that make Superpowers' skills effective - the bootstrap, the anatomy, the description rule, and the loophole-closing pattern - and end with a scorecard you can apply to any skill, in any framework.
The Bootstrap Problem
Before we talk about what a great skill looks like, there is a more basic question: how does the agent know to use one at all? A skill the agent never reaches for is worth exactly as much as no skill, and in practice agents reach for things lazily. A skill is just a markdown file sitting in a directory - and a file the agent never opens is no more useful than the CLAUDE.md it already ignores.
One Meta-Skill Pre-Loaded into Every Session
Superpowers solves the bootstrap with a single trick: a
SessionStart hook. A hook is a plugin-level
capability, not a skill-level one - a SKILL.md file can't
register anything, so this is declared by the Superpowers plugin
(in its hooks/hooks.json) and wired up when you install and
enable the plugin. From then on it fires on every session
startup, clear, or compact. Each
time it fires, a small script reads one file -
using-superpowers/SKILL.md - and injects its full contents
into the session as additional context, wrapped in
<EXTREMELY_IMPORTANT> tags.
That meta-skill is the only one injected in full automatically. Its
job is to teach the agent that the Skill tool exists and
must be invoked aggressively. Every other skill is listed by name and
trigger description, but its body stays dormant on disk until the agent
reaches for it.
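The split between an always-visible index and dormant bodies is easy to model. A minimal sketch, not harness code: SkillIndex and parse_frontmatter are hypothetical names, a dict stands in for the skills/ directory, and the parser handles only the flat two-field frontmatter discussed below.

```python
from dataclasses import dataclass, field


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body).

    Handles only flat `key: value` lines between two `---` delimiters.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- delimiter")
    end = lines.index("---", 1)  # ValueError if the closing delimiter is missing
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:]).strip()


@dataclass
class SkillIndex:
    files: dict[str, str]  # skill name -> raw SKILL.md text (stand-in for disk)
    loaded: set[str] = field(default_factory=set)

    def listing(self) -> list[tuple[str, str]]:
        """The always-visible part: (name, description) pairs, never bodies."""
        entries = []
        for text in self.files.values():
            meta, _body = parse_frontmatter(text)
            entries.append((meta["name"], meta["description"]))
        return entries

    def invoke(self, name: str) -> str:
        """Return a skill's body only when asked, as a Skill tool call would."""
        _meta, body = parse_frontmatter(self.files[name])
        self.loaded.add(name)
        return body


index = SkillIndex(files={
    # Hypothetical skill, for illustration only.
    "debugging-flaky-tests": "---\n"
    "name: debugging-flaky-tests\n"
    "description: Use when a test passes and fails without code changes\n"
    "---\n"
    "Body: the actual discipline lives here.",
})
print(index.listing())  # names and triggers only
print(index.loaded)     # set(): no body has been read yet
```

The design point is the asymmetry: listing() is cheap and always available, while invoke() is the only path to the body, so a skill costs almost nothing until the agent decides it applies.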
The hook registration is a few lines of JSON in hooks/hooks.json, shipped by the plugin rather than by any skill. The command runs run-hook.cmd, a cross-platform wrapper that execs the session-start script:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|clear|compact",
        "hooks": [
          {
            "type": "command",
            "command": "\"${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd\" session-start",
            "async": false
          }
        ]
      }
    ]
  }
}
```
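What happens on each firing can be sketched in two halves. hook_fires models the harness side, assuming the matcher is applied as an anchored regular expression to the session source (Claude Code documents startup, resume, clear, and compact as SessionStart sources). build_session_context and main model the script side; the hookSpecificOutput.additionalContext output shape follows Claude Code's documented hook JSON, but the function names and the skill path are hypothetical, and the plugin's real script is not reproduced here.

```python
import json
import re
import sys
from pathlib import Path

MATCHER = "startup|clear|compact"                       # from hooks.json
SKILL_PATH = Path("skills/using-superpowers/SKILL.md")  # hypothetical location


def hook_fires(matcher: str, source: str) -> bool:
    """Harness side (assumed): anchored regex match of the matcher against the source.

    For these four sources an unanchored re.search gives the same answers.
    """
    return re.fullmatch(matcher, source) is not None


def build_session_context(skill_text: str) -> dict:
    """Script side: wrap the meta-skill in <EXTREMELY_IMPORTANT> tags as hook output."""
    wrapped = f"<EXTREMELY_IMPORTANT>\n{skill_text.strip()}\n</EXTREMELY_IMPORTANT>"
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": wrapped,
        }
    }


def main(path: Path = SKILL_PATH) -> None:
    # Inject nothing rather than break session startup if the file is missing.
    if path.exists():
        skill_text = path.read_text(encoding="utf-8")
        json.dump(build_session_context(skill_text), sys.stdout)


if __name__ == "__main__":
    main()
```

Under the anchored-match assumption, the only source the matcher skips is resume.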
Whether or not the hook is enabled, the harness lists the available skills: their names and trigger descriptions are visible to the agent through the Skill tool. What the bootstrap changes is propensity. With the hook off, nothing pushes the agent to act on that list, so it tends to improvise unless a skill is glaringly relevant. With the hook on, the meta-skill arrives before the first user token and tells the agent to reach for a skill on even a 1% chance it applies.
This pattern - auto-load a meta-skill that makes the agent reliably reach for everything else - is the single most important design decision in the framework. It separates "skills the agent could use" from "skills the agent will use." The skills are discoverable without it; the bootstrap is what makes them reliably used rather than quietly ignored.
Anatomy of a SKILL.md
Each skill lives in skills/<name>/SKILL.md. The format is deliberately spartan: a YAML frontmatter block with exactly two fields, then a markdown body that follows a small set of conventions. The brevity is intentional - frequently loaded skills are kept under 200 words, because every token a skill consumes is a token the agent can't spend on your problem.
Here is the frontmatter from using-superpowers itself:
```yaml
---
name: using-superpowers
description: Use when starting any conversation - establishes how to find and use skills, requiring Skill tool invocation before ANY response including clarifying questions
---
```
Two fields, both with strict rules. name is verb-first,
kebab-case, max 64 characters: creating-skills, not
skill-creation. description is third-person, max
1024 characters, and - the part that violates most people's instincts -
it describes only the triggering conditions, never the workflow.
We'll return to why in the next section; it's the most surprising design
finding in the whole project.
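These rules are mechanical enough to lint. A minimal sketch, assuming the limits stated here: validate_skill is a hypothetical helper, not an official validator; the 200-word budget is the figure given above for frequently loaded skills; and verb-first naming is left unchecked because it needs judgment, not a regex. Whether a description summarizes the workflow needs judgment too.

```python
import re

# Kebab-case: lowercase letters/digits in words joined by single hyphens.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_skill(name: str, description: str, body: str, word_budget: int = 200) -> list[str]:
    """Return the list of rule violations; an empty list means the skill passes."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("name must be kebab-case")
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    if not description.startswith("Use when"):
        problems.append('description should start with "Use when"')
    words = len(body.split())
    if words > word_budget:
        problems.append(f"body is {words} words; budget is {word_budget}")
    return problems


print(validate_skill("creating-skills", "Use when writing a new skill", "Short body."))
print(validate_skill("Skill_Creation", "Creates skills step by step", "Short body."))
```

Against the using-superpowers frontmatter quoted above, the name and description checks pass.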
The body has recurring elements that show up across nearly every skill; test-driven-development, one of the framework's most battle-tested skills, is a good specimen. The shape is shared across SKILL.md files; only the rules change.
Three pieces deserve special attention because they don't appear in most prompt frameworks:
First, the Iron Law: a single sentence in a code-block
banner that states the one rule the skill exists to enforce. TDD's Iron
Law is NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST. verification-before-completion has
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
systematic-debugging has NO FIXES WITHOUT ROOT CAUSE
INVESTIGATION FIRST. The Iron Law is a load-bearing rhetorical
device - it lets the agent compress the entire skill to a single rule
it can hold in working memory.
Second, the Red Flags table: a two-column list mapping
the internal-monologue phrases that mean the agent is rationalizing to
the reality that should override them. From using-superpowers:
| Thought | Reality |
|---|---|
| "This is just a simple question" | Questions are tasks. Check for skills. |
| "I need more context first" | Skill check comes BEFORE clarifying questions. |
| "I remember this skill" | Skills evolve. Read the current version. |
| "The skill is overkill" | Simple things become complex. Use it. |
| "I'll just do this one thing first" | Check BEFORE doing anything. |
Third, and most distinctively, the Common Rationalizations table: an "Excuse / Reality" list of plausible-sounding reasons the agent might generate for skipping the discipline, each paired with a refutation. From the real TDD skill: "Too simple to test" -> "Simple code breaks. Test takes 30 seconds." and "Deleting X hours is wasteful" -> "Sunk cost fallacy. Keeping unverified code is technical debt." These come from observation, not imagination - we'll see how in the loopholes section.
Description as Trigger, Not Summary
Here is the design choice that violates everyone's instinct on first contact: the description must never summarize what the skill does. It should describe only when the skill applies, in third person, starting with the words "Use when." It must not name the steps the skill contains.
This sounds like pedantry until you read the bug report that produced it.
From the writing-skills SKILL.md:
"A description saying 'code review between tasks' caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality). When the description was changed to just 'Use when executing implementation plans with independent tasks' (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process."
The mechanism is simple. If the description tells the agent what the skill does, the agent is liable to treat the description as the instructions and skip reading the body. If the description tells it only when to apply the skill, the agent has nothing to act on but the trigger, so it opens the file to find the steps.
The framework's own writing-skills guide illustrates the failure with a bad and a good description for the same skill, its executing-plans example. The bad one summarizes the workflow, down to the code review between tasks; the good one states only the trigger: "Use when executing implementation plans with independent tasks."

The rule that falls out is concrete, and it applies to almost any skill framework, not just Superpowers: the description says when, the body says how. The trigger-only form contains zero instructions - it only loads the skill when the described situation arises. Combine that with a body that contains the real discipline, and you get a skill the agent can't shortcut by skimming the index.
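A crude lint can flag the most obvious workflow summaries before a human rereads the description. This is a hypothetical heuristic, not part of Superpowers: it warns when a description does not open with "Use when", contains a sequencing word, or has a dash or arrow clause. The word list is an assumption and will both miss summaries and flag innocent triggers.

```python
import re

# Hypothetical markers of a step list; tune for your own skills.
SEQUENCE_WORDS = ("then", "first", "next", "between", "followed by")


def summary_warnings(description: str) -> list[str]:
    """Heuristic warnings that a description summarizes the workflow instead of the trigger."""
    warnings = []
    text = description.lower()
    if not text.startswith("use when"):
        warnings.append('does not start with "Use when"')
    for word in SEQUENCE_WORDS:
        if re.search(rf"\b{re.escape(word)}\b", text):
            warnings.append(f"sequencing word: {word!r}")
    if " - " in description or "->" in description:
        warnings.append("dash or arrow clause may describe steps")
    return warnings


good = "Use when executing implementation plans with independent tasks"
# Paraphrase built around the phrase quoted from writing-skills, not the verbatim original.
bad = "Use when executing plans - code review between tasks"
print(summary_warnings(good))  # []
print(summary_warnings(bad))
```

The using-superpowers description quoted earlier would trip the dash check too, so treat a warning as a prompt to reread the description, not as a verdict.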
Closing the Loopholes
Every skill in Superpowers reads like it was written by someone who has
watched an LLM weasel out of the rule before. That is because every
skill in Superpowers was. Jesse Vincent applies the TDD cycle
to skill-writing itself - the SKILL.md plays the role of
production code, and a pressure scenario where the agent rationalizes
around the rule is the failing test:
| TDD concept | Skill creation |
|---|---|
| Write the test first | Run a pressure scenario with a subagent before writing the skill |
| Watch it fail (RED) | Document the exact rationalizations the agent produces, verbatim |
| Minimal code to pass | Write a skill that addresses those specific rationalizations |
| Refactor | Close remaining loopholes while keeping compliance green |
The "pressure scenario" is the load-bearing piece. You give a subagent an artificial constraint - a $5,000-per-minute production outage, sunk cost from earlier work, an authority figure telling them to ship - and watch how they justify skipping the rule. You don't have to imagine excuses. The model produces them. You write them into the skill verbatim, with refutations, and the next subagent has no fresh excuses left.
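The loop in the table can be written as a small driver. A hypothetical sketch, not Superpowers tooling: run_scenario stands in for dispatching a real subagent under a pressure scenario and returns the rationalizations it produced, and an excuse counts as closed once the skill text names it. In practice, compliance is judged by reading the transcripts rather than by string matching, and each refutation is written by hand.

```python
from typing import Callable

# Hypothetical: run one pressure scenario against the current skill text and
# return the rationalizations the subagent used to skip the rule.
Scenario = Callable[[str], list[str]]


def harden_skill(skill: str, run_scenario: Scenario, max_rounds: int = 5) -> tuple[str, list[str]]:
    """Repeat RED -> fix until a scenario yields no uncovered excuses or rounds run out.

    Returns the updated skill text and the excuses added, in order.
    """
    added: list[str] = []
    for _ in range(max_rounds):
        # RED: keep only excuses the skill does not already name (deduplicated).
        new = [e for e in dict.fromkeys(run_scenario(skill)) if e not in skill]
        if not new:
            return skill, added  # GREEN: nothing left to close
        for excuse in new:
            # Record the excuse verbatim; the refutation is written by hand.
            skill += f'\n| "{excuse}" | TODO: refutation |'
            added.append(excuse)
    return skill, added


# Stand-in subagent that offers any stock excuse the skill doesn't name.
# The two excuses are the TDD examples quoted earlier.
EXCUSES = ["Too simple to test", "Deleting X hours is wasteful"]


def fake_subagent(skill: str) -> list[str]:
    return [e for e in EXCUSES if e not in skill]


hardened, added = harden_skill("| Excuse | Reality |\n|---|---|", fake_subagent)
print(added)
```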
This produces a specific texture in the writing. A fragile rule states the prohibition and stops: code written before its test must be deleted. A Superpowers rule also anticipates the rationalizations - "I'll keep it as reference," "I'll adapt it," "I'll just glance at it" - and forecloses each one explicitly. Add the Superpowers stock phrase "Violating the letter of the rules is violating the spirit of the rules" and you've also pre-empted the meta-rationalization where the agent claims it's following the spirit while breaking the letter.
The Persuasion Layer
The most distinctive design choice in Superpowers is that its rhetoric is
explicitly grounded in persuasion research. The
writing-skills/persuasion-principles.md document cites
Cialdini's Influence and a 2025 study by Meincke et al. that
found persuasion techniques more than doubled LLM compliance with
objectionable requests, from 33% to 72% across 28,000 conversations.
Each Cialdini principle maps onto a writing technique you can spot in any Superpowers skill:
What the Skill Rhetoric Is Actually Doing
- Authority - "YOU MUST", "Never", "No exceptions". Heavy in TDD and verification skills.
- Commitment - "Announce skill usage", required TodoWrite checklists, explicit choice statements.
- Scarcity - "Before proceeding", "IMMEDIATELY after X". Time-bounded action windows.
- Social proof - "Every time", "X without Y = failure". Universal-pattern framing.
- Unity - "we're colleagues", "our codebase". Aligns the agent with the user's interest.
- Reciprocity & liking - used sparingly; they can feel manipulative or conflict with honest feedback.
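To see the rhetorical profile of a skill you are reviewing, a rough phrase counter is enough. A minimal sketch: rhetoric_profile is a hypothetical helper, the marker phrases are the examples from the list above (reciprocity and liking are omitted because the list gives no phrases for them), and a count says nothing about whether the rhetoric is well placed.

```python
import re

# Marker phrases per principle, taken from the list above; matched case-insensitively.
MARKERS = {
    "authority": ["you must", "never", "no exceptions"],
    "commitment": ["announce skill usage", "todowrite"],
    "scarcity": ["before proceeding", "immediately after"],
    "social proof": ["every time"],
    "unity": ["we're colleagues", "our codebase"],
}


def rhetoric_profile(skill_text: str) -> dict[str, int]:
    """Count whole-phrase occurrences of each principle's markers."""
    text = skill_text.lower()
    return {
        principle: sum(len(re.findall(rf"\b{re.escape(p)}\b", text)) for p in phrases)
        for principle, phrases in MARKERS.items()
    }


sample = "YOU MUST write the test first. Never skip it. No exceptions. Announce skill usage before proceeding."
print(rhetoric_profile(sample))
```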
Whether you find this manipulative or merely effective depends on your prior. Either way, it works. The capitalized "YOU MUST" and "NO EXCEPTIONS" phrases that look out of place in technical documentation are doing actual mechanical work on the model's compliance probability.
A Scorecard for Skills
Pulling everything together: here is the rubric Superpowers implicitly teaches for evaluating any agent skill you write, in any framework. Treat the must / should weighting as empirical - it reflects how late-2025 models and harnesses behaved - rather than as fixed law; the note below the table covers what has already shifted.
Scorecard table (Must / Should criteria; only fragments survive): verb-first names (creating-skills); real error strings in the description; no @-syntax, since @-loads burn context for skills not yet needed.

A note on the date. This rubric encodes how models and harnesses behaved when Superpowers shipped in October 2025, and some of the Musts are already softening. Harnesses now surface skills natively - Anthropic's Agent Skills became a cross-vendor open standard in December 2025 - so a plugin-level auto-bootstrap is less load-bearing than it was. Stronger instruction-following makes a model less likely to skip a skill's body just because the description summarized the steps. And much larger context windows make the under-200-words token economy far less binding. The tactics relax as models improve; the principle behind them does not.
And that principle is what ties the rubric together: every choice optimizes for the agent under pressure - an agent that is bored, certain, or in a hurry. The cheerful path is easy. The hard part is the agent at the moment it would otherwise rationalize, and every Superpowers convention is a countermeasure for that exact moment - however much the moment recedes as models get better.
Why It Matters
Two things make Superpowers worth studying beyond the framework itself.
First, it is a working demonstration that you can encode software engineering discipline - TDD, root-cause debugging, code review, verification-before-completion - in a form that models will follow under pressure. The agent doesn't internalize the discipline; it consults it. That distinction is the gap between a methodology that works in articles and one that survives production use.
Second, the design choices are general. The bootstrap mechanism works for any host that supports SessionStart hooks. The "description as trigger" rule applies to any retrieval-augmented prompt system. The Iron Law and the loophole-closing pattern translate directly to system prompts, agent instructions, and tool documentation. Skill-building is becoming its own discipline, and Superpowers is the most thoroughly worked-out example we have.
The 200,000 stars are not really for the fourteen skills it ships. They are for the methodology of writing them - the demonstration that an agent skill can be small, brutal, persuasive, and reliably triggered. The skills are just the existence proof.
One Skill at a Time
If you want to write your own: pick one workflow you already do repeatedly. Write the frontmatter with a trigger-only description. Pick the one rule you wish the agent would never break. Pressure-test it on a subagent. Capture the rationalizations. Close them one at a time. That is the whole loop - it's the same loop Superpowers used to get here.
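The first step of that loop can be scripted. A minimal sketch that renders a starter SKILL.md from the pieces this article describes; render_skill and the section headings are assumptions modeled on those conventions, not a template shipped by Superpowers, and the example skill is hypothetical.

```python
FENCE = "`" * 3  # a literal triple backtick, built indirectly to keep this snippet readable in markdown


def render_skill(name: str, trigger: str, iron_law: str) -> str:
    """Render a starter SKILL.md: two-field frontmatter, Iron Law banner, empty tables."""
    if not trigger.startswith("Use when"):
        raise ValueError('description must describe the trigger, starting with "Use when"')
    return "\n".join([
        "---",
        f"name: {name}",
        f"description: {trigger}",
        "---",
        "",
        "## The Iron Law",
        "",
        FENCE,
        iron_law.upper(),
        FENCE,
        "",
        "## Red Flags",
        "",
        "| Thought | Reality |",
        "|---|---|",
        "",
        "## Common Rationalizations",
        "",
        "| Excuse | Reality |",
        "|---|---|",
        "",
    ])


# Hypothetical skill; fill both tables from pressure-test transcripts.
print(render_skill(
    "writing-migrations",
    "Use when changing a database schema",
    "No schema change without a rollback script",
))
```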