Interactive Explainer

Superpowers:
The Anatomy of an Agent Skill

AI coding agents will skip your guardrails the moment they feel inconvenient. Superpowers - the 200k-star skills framework by Jesse Vincent - is a software development methodology disguised as markdown. Explore how a skill bootstraps itself into every session, why descriptions should never summarize, and what makes one skill stick where another gets quietly ignored.

May 2026 · 13 min read · 4 figures

Anyone who has spent time pair-programming with an LLM has felt the same small betrayal: you write a clear instruction in your CLAUDE.md, watch the agent acknowledge it, and then watch it cheerfully ignore the instruction twenty minutes later. The system prompt is not a contract. It is a suggestion the model weighs against everything else in its context.

Superpowers, created by Jesse Vincent in October 2025, was not the first structured attempt to make agents behave - Cursor's rules files and CLAUDE.md conventions came earlier - but it is one of the most widely-adopted and thoroughly worked-out. It is an installable plugin - the author calls it "an agentic skills framework and software development methodology" - that bundles fourteen "skills" (small markdown files encoding a development methodology) with a bootstrap mechanism that forces the agent to consult them. The skills are the visible part; the bootstrap is what turns inert files into a framework. In seven months it has crossed 200,000 stars on GitHub.

200k+
GitHub stars
14
shipped skills
7
months old
6+
agent platforms

The interesting part is not that Superpowers exists. It is why it works. Each design choice - the frontmatter shape, the bootstrap hook, the sentence that opens every description, the bright-red "IRON LAW" banners - is a specific response to a specific failure mode the author observed in agents. Read together, the codebase is a textbook on how to write instructions a model will actually follow.

This explainer takes that textbook apart. We'll look at the four pieces that make Superpowers' skills effective - the bootstrap, the anatomy, the description rule, and the loophole-closing pattern - and end with a scorecard you can apply to any skill, in any framework.

The Bootstrap Problem

Before we talk about what a great skill looks like, there is a more basic question: how does the agent know to use one at all? An agent that doesn't reach for a skill is exactly as useful as no skill at all, and in practice agents reach for things lazily. A skill is just a markdown file sitting in a directory - and a file the agent never opens is no more useful than the CLAUDE.md it already ignores.

Design Choice

One Meta-Skill Pre-Loaded into Every Session

Superpowers solves the bootstrap with a single trick: a SessionStart hook. A hook is a plugin-level capability, not a skill-level one - a SKILL.md file can't register anything, so this is declared by the Superpowers plugin (in its hooks/hooks.json) and wired up when you install and enable the plugin. From then on it fires on every session startup, clear, or compact. Each time it fires, a small script reads one file - using-superpowers/SKILL.md - and injects its full contents into the session as additional context, wrapped in <EXTREMELY_IMPORTANT> tags.

That meta-skill is the only one injected in full automatically. Its job is to teach the agent that the Skill tool exists and must be invoked aggressively. Every other skill is listed by name and trigger description, but its body stays dormant on disk until the agent reaches for it.

The hook registration is a few lines of JSON:

// hooks/hooks.json - shipped by the plugin, not by any skill
{
  "hooks": {
    "SessionStart": [{
      "matcher": "startup|clear|compact",
      "hooks": [{
        "type": "command",
        // run-hook.cmd is a cross-platform wrapper that execs session-start
        "command": "\"${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd\" session-start",
        "async": false
      }]
    }]
  }
}

The animation below shows what this actually does inside a session. Either way the harness lists the available skills - their names and trigger descriptions are visible to the agent through the Skill tool. What the bootstrap changes is propensity: with the hook off, nothing pushes the agent to act on that list, so it tends to improvise unless a skill is glaringly relevant. With the hook on, the meta-skill arrives before the first user token and tells the agent to reach for a skill on even a 1% chance it applies.

Figure 1 - Session Bootstrap
Hook:
The SessionStart hook runs before the user's first message arrives. The skills are listed in the Skill tool either way - the hook doesn't make them exist, it makes the agent reliably reach for them. With the hook on, using-superpowers pre-loads the aggressive "invoke on a 1% chance" rule. With it off, the agent can still invoke a skill but frequently skips it and improvises.

This pattern - auto-load a meta-skill that makes the agent reliably reach for everything else - is the single most important design decision in the framework. It separates "skills the agent could use" from "skills the agent will use." The skills are discoverable without it; the bootstrap is what makes them reliably used rather than quietly ignored.

Anatomy of a SKILL.md

Each skill lives in skills/<name>/SKILL.md. The format is deliberately spartan: a YAML frontmatter with exactly two fields, then a markdown body that follows a small set of conventions. The brevity is intentional - frequently-loaded skills are kept under 200 words because every token a skill consumes is a token the agent can't spend on your problem.

Here is the frontmatter from using-superpowers itself:

---
name: using-superpowers
description: Use when starting any conversation - establishes how to find
  and use skills, requiring Skill tool invocation before ANY response including
  clarifying questions
---

Two fields, both with strict rules. name is verb-first, kebab-case, max 64 characters: creating-skills, not skill-creation. description is third-person, max 1024 characters, and - the part that violates most people's instincts - it describes only the triggering conditions, never the workflow. We'll return to why in the next section; it's the most surprising design finding in the whole project.

The body has recurring elements that show up across nearly every skill. Click any block in the inspector below to see what it does and why it's there. The skill displayed is a compressed view of test-driven-development, one of the framework's most battle-tested.

Figure 2 - SKILL.md Inspector
Skill:
Hover any block to see its role. Every Superpowers skill shares the same backbone - a strict two-field frontmatter, the letter-and-spirit clause, and an Iron Law naming the single non-negotiable - then adds skill-specific guardrails: explicit loophole-closing lists, gate checklists, or phase gates. All text here is taken from the real SKILL.md files. The shape is shared; only the rules change.

Three pieces deserve special attention because they don't appear in most prompt frameworks:

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

First, the Iron Law: a single sentence in a code-block banner that states the one rule the skill exists to enforce. TDD's Iron Law is above. verification-before-completion has NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE. systematic-debugging has NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. The Iron Law is a load-bearing rhetorical device - it lets the agent compress the entire skill to a single rule it can hold in working memory.

Second, the Red Flags table: a two-column list mapping the internal-monologue phrases that mean the agent is rationalizing to the reality that should override them. From using-superpowers:

ThoughtReality
"This is just a simple question"Questions are tasks. Check for skills.
"I need more context first"Skill check comes BEFORE clarifying questions.
"I remember this skill"Skills evolve. Read the current version.
"The skill is overkill"Simple things become complex. Use it.
"I'll just do this one thing first"Check BEFORE doing anything.

Third, and most distinctively, the Common Rationalizations table: an "Excuse / Reality" list of plausible-sounding reasons the agent might generate for skipping the discipline, each paired with a refutation. From the real TDD skill: "Too simple to test" -> "Simple code breaks. Test takes 30 seconds." and "Deleting X hours is wasteful" -> "Sunk cost fallacy. Keeping unverified code is technical debt." These come from observation, not imagination - we'll see how in the loopholes section.

Description as Trigger, Not Summary

Here is the design choice that violates everyone's instinct on first contact: the description must never summarize what the skill does. It should describe only when the skill applies, in third person, starting with the words "Use when." It must not name the steps the skill contains.

This sounds like pedantry until you read the bug report that produced it. From the writing-skills SKILL.md:

"A description saying 'code review between tasks' caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality). When the description was changed to just 'Use when executing implementation plans with independent tasks' (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process."

The mechanism is simple. If the description tells the agent what the skill does, the agent is liable to treat the description as the instructions and skip reading the body. If the description tells it only when to apply the skill, the agent has nothing to act on but the trigger, so it opens the file to find the steps.

The figure below makes the failure visible. Toggle between a workflow-summary description and a trigger-only description for the same skill - the framework's real executing-plans example - and watch what reaches the agent's context.

Figure 3 - Workflow vs Trigger Descriptions
Description style:
When the description summarizes the workflow, the agent reads it, believes it has enough, and skips opening the file. When the description only describes the trigger condition, the agent has to load the body to learn the steps. The skill body contains the full discipline; the description should be just enough to get the agent to fetch it.

The rule that falls out is concrete, and it applies to almost any skill framework, not just Superpowers:

Don't write "Use for TDD - write test first, watch it fail, write minimal code, refactor."
Do write "Use when implementing any feature or bugfix, before writing implementation code."

Both describe the same skill; these are the actual bad and good examples from the framework's own writing-skills guide. The second form contains zero instructions - it only loads when the described situation arises. Combine that with a body that contains the real discipline, and you get a skill the agent can't shortcut by skimming the index.

Closing the Loopholes

Every skill in Superpowers reads like it was written by someone who has watched an LLM weasel out of the rule before. That is because every skill in Superpowers was. Jesse Vincent applies the TDD cycle to skill-writing itself - the SKILL.md plays the role of production code, and a pressure scenario where the agent rationalizes around the rule is the failing test:

TDD conceptSkill creation
Write the test firstRun a pressure scenario with a subagent before writing the skill
Watch it fail (RED)Document the exact rationalizations the agent produces, verbatim
Minimal code to passWrite a skill that addresses those specific rationalizations
RefactorClose remaining loopholes while keeping compliance green

The "pressure scenario" is the load-bearing piece. You give a subagent an artificial constraint - a $5,000-per-minute production outage, sunk cost from earlier work, an authority figure telling them to ship - and watch how they justify skipping the rule. You don't have to imagine excuses. The model produces them. You write them into the skill verbatim, with refutations, and the next subagent has no fresh excuses left.

Figure 4 - The Pressure-Test Loop
One iteration of skill TDD. Pressure scenario fires, subagent rationalizes its way around the rule, the rationalization gets logged verbatim, the skill is updated to refute it, the next subagent hits the same scenario and complies. The loop repeats until the skill survives the worst case you can construct.

This produces a specific texture in the writing. Compare a fragile rule to a Superpowers rule:

Fragile "Delete code written before tests."
Loophole-closed "Delete it. Start over. Don't keep it as 'reference'. Don't 'adapt' it while writing tests. Don't look at it. Delete means delete."

The second form anticipates the rationalizations - "I'll keep it as reference," "I'll adapt it," "I'll just glance at it" - and forecloses each one explicitly. Add the Superpowers stock phrase "Violating the letter of the rules is violating the spirit of the rules" and you've also pre-empted the meta-rationalization where the agent claims it's following the spirit while breaking the letter.

The Persuasion Layer

The most distinctive design choice in Superpowers is that its rhetoric is explicitly grounded in persuasion research. The writing-skills/persuasion-principles.md document cites Cialdini's Influence and a 2025 study by Meincke et al. that found persuasion techniques roughly doubled LLM compliance with hard requests, from 33% to 72% across 28,000 conversations.

Each Cialdini principle maps onto a writing technique you can spot in any Superpowers skill:

Cialdini Mapping

What the Skill Rhetoric Is Actually Doing

  • Authority - "YOU MUST", "Never", "No exceptions". Heavy in TDD and verification skills.
  • Commitment - "Announce skill usage", required TodoWrite checklists, explicit choice statements.
  • Scarcity - "Before proceeding", "IMMEDIATELY after X". Time-bounded action windows.
  • Social proof - "Every time", "X without Y = failure". Universal-pattern framing.
  • Unity - "we're colleagues", "our codebase". Aligns the agent with the user's interest.
  • Reciprocity & liking - used sparingly; they can feel manipulative or conflict with honest feedback.

Whether you find this manipulative or merely effective depends on your prior. Either way, it works. The capitalized "YOU MUST" and "NO EXCEPTIONS" phrases that look out of place in technical documentation are doing actual mechanical work on the model's compliance probability.

A Scorecard for Skills

Pulling everything together: here is the rubric Superpowers implicitly teaches for evaluating any agent skill you write, in any framework. Treat the must / should weighting as empirical - it reflects how late-2025 models and harnesses behaved - rather than as fixed law; the note below the table covers what has already shifted.

Auto-bootstrap
Must
A skill the agent has to remember to load is a skill the agent will skip. Use SessionStart hooks or always-loaded meta-skills.
Trigger-only description
Must
Describe when, never how. If the description summarizes the steps, the agent skips the body.
One Iron Law
Must
Compress the whole skill to one rule the agent can hold while working. Multiple "important" rules dilute each other.
Pressure-tested rationalizations
Must
Find the excuses by running a subagent, not by guessing. Refute each one explicitly in the skill body.
Letter-and-spirit clause
Should
Add the line "Violating the letter is violating the spirit." Pre-empts the meta-rationalization loophole.
Token economy
Should
Keep frequently-loaded skills under 200 words. Move reference material to sibling files, link by name.
Rigid vs flexible label
Should
Tell the agent whether to follow exactly (discipline) or adapt (patterns). Mislabeled flexibility kills discipline skills.
Searchable naming
Should
Verb-first names, gerunds for processes (creating-skills), real error strings in the description.
Cross-reference by name
Optional
Link to other skills with plain names, never @-syntax. @-loads burn context for skills not yet needed.

A note on the date. This rubric encodes how models and harnesses behaved when Superpowers shipped in October 2025, and some of the Musts are already softening. Harnesses now surface skills natively - Anthropic's Agent Skills became a cross-vendor open standard in December 2025 - so a plugin-level auto-bootstrap is less load-bearing than it was. Stronger instruction-following makes a model less likely to skip a skill's body just because the description summarized the steps. And much larger context windows make the under-200-words token economy far less binding. The tactics relax as models improve; the principle behind them does not.

And that principle is what ties the rubric together: every choice optimizes for the agent under pressure. An agent that is bored, certain, or in a hurry. The cheerful path is easy. The hard part is the agent at the moment it would otherwise rationalize, and every Superpowers convention is a counter-measure for that exact moment - however much the moment recedes as models get better.

Why It Matters

Two things make Superpowers worth studying beyond the framework itself.

First, it is a working demonstration that you can encode software engineering discipline - TDD, root-cause debugging, code review, verification-before-completion - in a form that models will follow under pressure. The agent doesn't internalize the discipline; it consults it. That distinction is the gap between a methodology that works in articles and one that survives production use.

Second, the design choices are general. The bootstrap mechanism works for any host that supports SessionStart hooks. The "description as trigger" rule applies to any retrieval-augmented prompt system. The Iron Law and the loophole-closing pattern translate directly to system prompts, agent instructions, and tool documentation. Skill-building is becoming its own discipline, and Superpowers is the most thoroughly worked-out example we have.

The 200,000 stars are not really for the fourteen skills it ships. They are for the methodology of writing them - the demonstration that an agent skill can be small, brutal, persuasive, and reliably triggered. The skills are just the existence proof.

Try It

One Skill at a Time

If you want to write your own: pick one workflow you already do repeatedly. Write the frontmatter with a trigger-only description. Pick the one rule you wish the agent would never break. Pressure-test it on a subagent. Capture the rationalizations. Close them one at a time. That is the whole loop - it's the same loop Superpowers used to get here.