olekhondera ea21818f76 feat: add meta-skill create-skill for creating and improving skills
Two modes: Create (gather requirements, generate SKILL.md) and Improve
(diagnose existing skill against best practices, propose changes).
Includes bundled references for frontmatter spec and writing guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:24:55 +02:00


Skill Writing Guide

Best practices for writing effective Claude Code skills.

Two Categories of Skills

  1. Capability uplift — teaches the agent something it couldn't do before (scaffold component, run audit, deploy)
  2. Encoded preference — captures your specific way of doing something the agent could already do (commit style, review checklist, naming conventions)

Know which you're building — it changes how much detail to include.

Description Optimization

The description is the most important line. It determines when the skill gets triggered.

  • List trigger contexts explicitly: "Use when the user wants to X, Y, or Z"
  • Think about should-trigger / should-not-trigger scenarios
  • A slightly "pushy" description is better than a vague one
  • Test: would this description make the model select this skill for the right prompts?
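As an illustrative sketch (the skill name and wording here are invented, not from this repo), a description that lists its trigger contexts explicitly — including a should-not-trigger boundary — might look like:

```yaml
---
name: component-scaffold
description: >
  Use when the user wants to create a new UI component, scaffold a
  component directory, or add a component with tests and stories.
  Do not use for editing or refactoring existing components.
---
```

Note the description names concrete user intents ("create", "scaffold", "add") rather than describing what the skill does internally.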

Writing Instructions

Explain WHY, not just rules

  • Bad: "MUST use semantic HTML"
  • Good: "Use semantic HTML elements (nav, main, aside) because screen readers depend on landmarks for navigation"

Avoid heavy-handed MUSTs

  • Reserve MUST/NEVER for genuine constraints (security, data loss)
  • For preferences, explain the reasoning and let the agent make good decisions

Progressive disclosure

Three levels of instruction loading:

  1. Frontmatter — always loaded (name, description). Keep minimal.
  2. Body — loaded when skill is invoked. Core instructions here.
  3. Bundled resources — loaded on demand via Read. Put reference tables, specs, examples here.

Use bundled resources (references/, scripts/, assets/) for content that would bloat the main SKILL.md.
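A hypothetical skill annotated with the three levels (the skill and file names are examples, not prescribed):

```markdown
---
# Level 1: frontmatter — always loaded, keep minimal
name: api-audit
description: Use when the user asks to audit an API for consistency.
---

# Audit an API

<!-- Level 2: body — loaded when the skill is invoked -->

1. Read ${CLAUDE_SKILL_DIR}/references/rest-checklist.md
   <!-- Level 3: bundled resource — loaded on demand via Read -->
2. Check each endpoint against the checklist.
```

The checklist itself lives in references/ so it costs nothing until step 1 actually runs.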

Every sentence should change behavior

  • Delete filler: "It is important to...", "Make sure to...", "Please note that..."
  • Delete obvious instructions the agent would do anyway
  • Test: if you removed this sentence, would the output change? No → delete it.
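A before/after illustration (both sentences are invented) of the same instruction with the filler deleted:

```markdown
<!-- Before -->
It is important to make sure to run the linter before committing.

<!-- After -->
Run the linter before committing.
```

Both versions change behavior identically; only the second earns its length.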

Structure Conventions

Project conventions (this repo)

  • Always set disable-model-invocation: true
  • Use H1 for the skill title (short action phrase)
  • Reference $ARGUMENTS early in the body
  • Use the !`command` backtick syntax for live data injection (git diff, file listings)
  • Numbered steps, imperative voice
  • Output format in a fenced markdown block if structured
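Putting these conventions together, a minimal skeleton might look like the following (the skill itself is hypothetical; the output format uses an indented block here only to avoid nesting fences):

```markdown
---
name: review-diff
description: Use when the user asks to review staged changes.
disable-model-invocation: true
---

# Review Staged Changes

Focus the review on: $ARGUMENTS

Current diff: !`git diff --staged`

1. Summarize each changed file in one line.
2. Flag issues in order of severity.
3. Output in the format below.

Output format:

    ## Review
    - file.ts: finding (severity)
```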

Bundled resources pattern

.claude/skills/my-skill/
  SKILL.md              # Main instructions
  references/           # Specs, guides, schemas
  scripts/              # Shell scripts, templates
  assets/               # Static files

Reference from SKILL.md: Read ${CLAUDE_SKILL_DIR}/references/spec.md

Length Guidelines

  • Simple skills (encoded preference): 30-50 lines
  • Standard skills (capability uplift): 50-100 lines
  • Complex skills (multi-mode, research): 100-200 lines
  • Maximum: 500 lines (if exceeding, split into bundled resources)

Common Mistakes

  1. Overfitting to test cases — write general instructions, not scripts for specific inputs
  2. Too many rules — the agent ignores rules after ~20 constraints. Prioritize.
  3. No examples — for complex output formats, show one complete example
  4. Ignoring conversation context — skills without fork can use prior conversation. Leverage it.
  5. Forgetting edge cases — what happens with empty input? Invalid arguments? Missing files?
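One way to encode such edge cases is a short fallback clause in the skill body (wording is illustrative):

```markdown
If $ARGUMENTS is empty, ask the user what to scaffold instead of
guessing. If a referenced file does not exist, stop and report the
missing path rather than creating it silently.
```

Two or three lines like this are usually enough; an exhaustive error matrix falls into mistake #2.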

Improvement Workflow

  1. Draft the skill
  2. Test with 3-5 realistic prompts
  3. Review output — does every instruction change behavior?
  4. Remove filler, tighten descriptions
  5. Add edge case handling for failures observed in testing
  6. Re-test after changes

Evaluation Criteria

When reviewing a skill, score against:

  • Trigger accuracy — does the description match the right prompts?
  • Instruction clarity — can the agent follow without ambiguity?
  • Output quality — does the skill produce useful, consistent results?
  • Conciseness — is every line earning its place?
  • Robustness — does it handle edge cases and errors?