Two modes: Create (gather requirements, generate SKILL.md) and Improve (diagnose existing skill against best practices, propose changes). Includes bundled references for frontmatter spec and writing guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Skill Writing Guide

Best practices for writing effective Claude Code skills.

## Two Categories of Skills

1. **Capability uplift** — teaches the agent something it couldn't do before (scaffold a component, run an audit, deploy)
2. **Encoded preference** — captures your specific way of doing something the agent could already do (commit style, review checklist, naming conventions)

Know which one you're building — it changes how much detail to include.

## Description Optimization

The description is the most important line: it determines when the skill gets triggered.

- List trigger contexts explicitly: "Use when the user wants to X, Y, or Z"
- Think through should-trigger and should-not-trigger scenarios
- A slightly "pushy" description is better than a vague one
- Test: would this description make the model select this skill for the right prompts?

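As a sketch, here is a description for a hypothetical `commit-message` skill (the name and triggers are invented for illustration) that names both its triggers and its non-triggers:

```yaml
---
name: commit-message
description: Draft a commit message for the staged changes. Use when the user asks to commit, wants a commit message written, or says "commit this". Not for reviewing diffs or writing changelog entries.
---
```

The explicit "Not for..." clause narrows the should-not-trigger space without making the description vague.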
## Writing Instructions

### Explain WHY, not just rules

- Bad: "MUST use semantic HTML"
- Good: "Use semantic HTML elements (nav, main, aside) because screen readers depend on landmarks for navigation"

### Avoid heavy-handed MUSTs

- Reserve MUST/NEVER for genuine constraints (security, data loss)
- For preferences, explain the reasoning and let the agent make good decisions

### Progressive disclosure

Three levels of instruction loading:

1. **Frontmatter** — always loaded (name, description). Keep minimal.
2. **Body** — loaded when the skill is invoked. Core instructions go here.
3. **Bundled resources** — loaded on demand via `Read`. Put reference tables, specs, and examples here.

Use bundled resources (`references/`, `scripts/`, `assets/`) for content that would bloat the main SKILL.md.

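The three levels are visible in a minimal sketch (the skill name and file paths are invented for illustration):

```markdown
---
name: api-audit    # level 1: frontmatter, always loaded
description: Audit REST endpoints for auth and validation gaps. Use when the user asks for an API audit.
---

# Audit API endpoints

<!-- level 2: body, loaded on invocation -->
Audit the endpoints in $ARGUMENTS for missing auth checks.

<!-- level 3: bundled resource, loaded on demand -->
For the full rule list: Read ${CLAUDE_SKILL_DIR}/references/rules.md
```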
### Every sentence should change behavior

- Delete filler: "It is important to...", "Make sure to...", "Please note that..."
- Delete obvious instructions the agent would follow anyway
- Test: if you removed this sentence, would the output change? If not, delete it.

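As an illustration, the same instruction before and after applying that test:

```markdown
<!-- Before: the filler carries no behavior -->
Please note that it is important to make sure the final report is valid JSON.

<!-- After: same constraint, fewer tokens -->
Output the final report as valid JSON.
```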
## Structure Conventions

### Project conventions (this repo)

- Always set `disable-model-invocation: true`
- Use H1 for the skill title (short action phrase)
- Reference `$ARGUMENTS` early in the body
- Use the `` !`command` `` backtick syntax for live data injection (git diff, file listings)
- Numbered steps, imperative voice
- Put the output format in a fenced markdown block if it is structured

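A hypothetical skill following all of these conventions at once (the skill name, steps, and output format are invented for illustration):

````markdown
---
name: review-staged
description: Review staged changes against the team checklist. Use when the user asks for a pre-commit review.
disable-model-invocation: true
---

# Review staged changes

Review the staged diff with attention to $ARGUMENTS.

Staged diff:

!`git diff --cached`

1. Check each changed file against the naming conventions.
2. Flag any TODO or debug statement left in the diff.
3. Summarize findings in the format below.

```
VERDICT: pass | fail
ISSUES:
- <file>: <issue>
```
````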
### Bundled resources pattern

```
.claude/skills/my-skill/
  SKILL.md        # Main instructions
  references/     # Specs, guides, schemas
  scripts/        # Shell scripts, templates
  assets/         # Static files
```

Reference from SKILL.md: `Read ${CLAUDE_SKILL_DIR}/references/spec.md`

## Length Guidelines

- Simple skills (encoded preference): 30-50 lines
- Standard skills (capability uplift): 50-100 lines
- Complex skills (multi-mode, research): 100-200 lines
- Maximum: 500 lines (beyond that, split content into bundled resources)

## Common Mistakes

1. **Overfitting to test cases** — write general instructions, not scripts for specific inputs
2. **Too many rules** — the agent starts ignoring rules after ~20 constraints. Prioritize.
3. **No examples** — for complex output formats, show one complete example
4. **Ignoring conversation context** — skills that don't fork can read the prior conversation. Leverage it.
5. **Forgetting edge cases** — what happens with empty input? Invalid arguments? Missing files?

## Improvement Workflow

1. Draft the skill
2. Test with 3-5 realistic prompts
3. Review the output — does every instruction change behavior?
4. Remove filler, tighten descriptions
5. Add edge-case handling for failures observed in testing
6. Re-test after changes

## Evaluation Criteria

When reviewing a skill, score it against:

- **Trigger accuracy** — does the description match the right prompts?
- **Instruction clarity** — can the agent follow it without ambiguity?
- **Output quality** — does the skill produce useful, consistent results?
- **Conciseness** — is every line earning its place?
- **Robustness** — does it handle edge cases and errors?
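One possible shape for a review write-up using these criteria (skill name, scores, and findings are invented for illustration):

```markdown
## Skill review: commit-message

- Trigger accuracy: 4/5 — misses the "save my work" phrasing
- Instruction clarity: 5/5
- Output quality: 3/5 — subject line sometimes exceeds 72 chars
- Conciseness: 4/5 — two filler sentences in the preamble
- Robustness: 2/5 — no handling for an empty staged diff
```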