
You're building a multi-agent system in Claude Code, and you've realized something crucial: not every agent needs the same brainpower. Some tasks need Opus-level reasoning; others run faster and cheaper on Haiku. Some agents should wield every tool in the toolkit; others need safety guardrails. This is where agent frontmatter becomes your secret weapon.
Agent frontmatter—the YAML metadata sitting at the top of every .claude/agents/ file—is where you define the personality, capabilities, and constraints of each subagent. Get it right, and you unlock cost-effective, focused, reliable automation. Get it wrong, and you're bleeding tokens and introducing safety vulnerabilities. Let's close that gap.
Table of Contents
- What Is Agent Frontmatter?
- Frontmatter Fields: Complete Reference
  - `name` (Required)
  - `description` (Required)
  - `model` (Required)
  - `tools` (Required)
- The System Prompt Body: Where Behavior Lives
- Cost-Performance Tradeoffs: Practical Scenarios
  - Scenario 1: Parallel File Validation
  - Scenario 2: Code Generation with Quality Gates
  - Scenario 3: Architectural Review (Requires Deep Reasoning)
- Building a Production Agent: Complete Example
- The Art of Writing System Prompts for Different Models
- Building Agent Teams That Work Together
- Debugging Agent Behavior Through Frontmatter
- Common Frontmatter Mistakes
  - Mistake 1: Overpowered Agents
  - Mistake 2: Vague System Prompts
  - Mistake 3: Wrong Model for the Task
  - Mistake 4: Dangerous Tool Combos Without Safeguards
- Practical Patterns: Copy-Paste Starter Templates
  - Pattern 1: Fast Analyzer (Haiku, Read-Only)
  - Pattern 2: Code Generator (Sonnet, Write)
  - Pattern 3: Researcher (Haiku, Web Access)
  - Pattern 4: Orchestrator (Haiku, Agent Only)
- Learning From Deployed Agents: Iterating to Excellence
- Final Checklist: Before You Ship Your Agent
- Living Documentation: Keeping Your Agent Specs Fresh
- Conclusion: Frontmatter is Your Control Panel
- The Scaling Experience: How Frontmatter Matters at Enterprise Scale
- Getting Started: Building Your First Specialized Agent
What Is Agent Frontmatter?
Every agent in Claude Code starts with structured metadata. Think of it as the agent's job description, skill set, and operating license all rolled into one. Here's the basic anatomy:
```yaml
---
name: "Agent Display Name"
description: "What this agent does and when to use it"
model: "haiku"
tools: ["Read", "Write", "Bash"]
---
System prompt body starts here. This is where you define the agent's behavior, reasoning style, and specialty...
```

The frontmatter is everything between the triple dashes (---). The body is the system prompt—the narrative instructions that shape how the agent thinks and acts. Both matter. Both require thoughtful design.
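To make the anatomy concrete, here's a minimal sketch in Python that splits an agent file into frontmatter fields and system prompt body. The naive `key: value` parsing is for illustration only; real files should go through a proper YAML parser.

```python
import re

def split_agent_file(text: str):
    """Split an agent file into (frontmatter_fields, body).

    Frontmatter is everything between the first two '---' lines;
    the body is the system prompt that follows.
    """
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        raise ValueError("missing frontmatter delimiters")
    raw, body = match.groups()
    # Naive key: value parsing -- illustration only. A YAML list like
    # tools: ["Read"] needs a real YAML parser.
    fields = {}
    for line in raw.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields, body

agent = '''---
name: "Schema Validator"
model: "haiku"
---
You validate JSON files.'''

fields, body = split_agent_file(agent)
print(fields["model"])  # haiku
```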
This article is your exhaustive reference for getting frontmatter right. We'll cover every field, every valid value, cost-performance tradeoffs, safety patterns, and real-world examples. By the end, you'll build agents that are fast, focused, and reliable.
Frontmatter Fields: Complete Reference
Let's walk through every field you can set, what it does, and when you need it.
name (Required)
Type: String
Purpose: Display name for the agent (used in logs, CLIs, team dashboards)
Rules:
- Must be unique across your `.claude/agents/` directory
- Keep it concise but descriptive (20-50 chars ideal)
- Use title case
- No special characters except hyphens
Example:
```yaml
name: "Code Analyzer"
name: "Documentation Generator"
name: "Test Engineer"
```

This field is what you see when you ask Claude Code to list available agents. It's also what appears in logs when the agent runs. Make it clear enough that a human can glance at a log and know which agent was involved.
Pitfall: Don't use generic names like "Agent 1" or "Helper." Your future self will thank you when reading logs. Also avoid names that are too specific to implementation details—"Regex Pattern Matcher v2" is worse than "Email Validator" because the implementation detail (regex) might change but the task (email validation) doesn't.
Real-world scenario: In a large system with 50+ agents, clear naming becomes critical. When you're debugging a production issue at 2 AM and the logs show "Agent XYZ ran for 45 seconds," you need to know immediately what XYZ does. "Python Code Analyzer" tells you everything. "Processor-B" tells you nothing.
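The naming rules above are easy to lint before an agent ships. Here's a sketch; the regex and the loosened length bounds are illustrative choices, not part of Claude Code itself.

```python
import re

def check_agent_name(name: str) -> list[str]:
    """Return a list of problems with an agent name, per the rules above.

    An empty list means the name passes. The length bound loosens the
    article's 20-50 character guideline to allow short names too.
    """
    problems = []
    if not re.fullmatch(r"[A-Za-z0-9 -]+", name):
        problems.append("only letters, digits, spaces, and hyphens allowed")
    if not all(word[:1].isupper() or not word[:1].isalpha()
               for word in name.split()):
        problems.append("use title case")
    if len(name) < 5 or len(name) > 50:
        problems.append("keep it between ~5 and 50 characters")
    return problems

print(check_agent_name("Python Code Analyzer"))  # []
print(check_agent_name("helper_1"))
```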
description (Required)
Type: String
Purpose: Human-readable explanation of what the agent does, when to use it, and its specialty
Rules:
- 1-3 sentences maximum
- Answer: What problem does it solve? When should it run?
- Be specific about scope (e.g., "write Python unit tests" vs. "write code")
- Include any prerequisites or assumptions
Example:
```yaml
description: "Analyzes Python codebases for performance bottlenecks and suggests optimizations. Requires access to source files and build logs. Best for repos with 50K+ LOC."
description: "Generates Markdown documentation from TypeScript/JavaScript JSDoc comments. Filters for public API only. Fast, focused, zero customization."
description: "Validates JSON schema compliance and reports violations. Stateless. Safe to run in parallel on large datasets."
```

Good descriptions help you (and your team) choose the right agent for a task. A vague description leads to agent misuse. When you're orchestrating multiple agents, you need to know at a glance which one is suited for the job.
Pitfall: Don't include the agent's reasoning process in the description—just what it does and why you'd use it. Don't say "This agent thinks carefully about edge cases and uses deep reasoning." Say "This agent identifies performance bottlenecks in Python code by analyzing algorithm complexity and I/O patterns."
Gotcha: If your description is wrong or misleading, people will use the agent for the wrong tasks. I've seen teams waste hours because they thought an agent could do something, the description suggested it could, but the actual implementation couldn't. Be precise.
model (Required)
Type: String (enum)
Valid Values: "haiku" | "sonnet" | "opus"
Purpose: Which Claude model runs this agent
This is critical. Your choice here directly impacts:
- Cost (Haiku roughly 1/4 the cost of Sonnet, Opus ~5× Sonnet, per the rates below)
- Speed (Haiku fastest, Opus slowest)
- Reasoning depth (Haiku good, Sonnet very good, Opus excellent)
- Context window (Haiku 200K, Sonnet 200K, Opus 200K all equal; reasoning capability differs)
Model Selection Matrix:
| Task | Recommended | Why |
|---|---|---|
| Simple parsing, formatting, regex | Haiku | Fast, cheap, sufficient |
| Code generation, bug fixes, analysis | Sonnet | Good balance of capability and cost |
| Complex reasoning, multi-step problems | Opus | Reasoning depth matters |
| Fact-checking, validation | Haiku | Pattern matching, no creativity needed |
| Content creation, writing | Sonnet or Opus | Quality matters more than speed |
| Summarization, extraction | Haiku | High volume, low cognitive load |
| Architectural decisions, design review | Opus | Complex tradeoffs require deep thinking |
| Parallel validation tasks | Haiku | Cost efficiency for batch operations |
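The selection matrix above can be sketched as a plain lookup table. The task categories below are illustrative, not an official taxonomy:

```python
# A minimal sketch of the model-selection matrix as a lookup table.
# Task keys are illustrative; extend to match your own workload.
TASK_MODEL = {
    "parsing": "haiku",
    "validation": "haiku",
    "summarization": "haiku",
    "code_generation": "sonnet",
    "bug_fixing": "sonnet",
    "content_creation": "sonnet",
    "architecture_review": "opus",
    "complex_reasoning": "opus",
}

def pick_model(task: str) -> str:
    """Start cheap: unknown tasks default to haiku, upgrade on evidence."""
    return TASK_MODEL.get(task, "haiku")

print(pick_model("architecture_review"))  # opus
```

The default-to-haiku fallback encodes the article's advice: begin with the cheapest model and move up only when validation fails.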
Real-World Example:
You're building a system with five agents:
- Code Formatter → Haiku (deterministic, rule-based)
- Bug Detector → Sonnet (needs reasoning but not rare edge cases)
- System Architect → Opus (complex design decisions)
- Test Generator → Sonnet (balancing coverage and readability)
- Documentation Scraper → Haiku (just extracting and formatting)
By matching models to task complexity, you cut costs by ~40% while maintaining quality.
Cost Math:
As of early 2026:
- Haiku: ~$0.80 per million input tokens, ~$4 per million output tokens
- Sonnet: ~$3 per million input tokens, ~$15 per million output tokens
- Opus: ~$15 per million input tokens, ~$75 per million output tokens
For a task processing 1M tokens of input, choosing Haiku saves you ~$2.20 vs. Sonnet. Multiply that across 100 tasks per day, and you're looking at real dollars—maybe $200-300/day that stays in your budget.
The hidden cost of wrong choices: If you default to Opus for everything, you might spend $5,000/month on agent processing that could cost $1,500 with better selection. If you cheap out on Sonnet and use Haiku for code generation, you might get lower-quality output that needs rework, negating any savings.
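The cost math above can be wrapped in a small helper for budget estimates. The per-million rates are the article's approximate early-2026 figures:

```python
# Per-million-token rates from the article (early 2026, approximate).
RATES = {  # model: (input $/M tokens, output $/M tokens)
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one job on the given model."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The article's example: 1M input tokens, Haiku vs. Sonnet.
saving = job_cost("sonnet", 1_000_000, 0) - job_cost("haiku", 1_000_000, 0)
print(f"${saving:.2f}")  # $2.20
```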
Pitfall: Don't default to Opus for everything. It's tempting ("maximum quality!"), but you're paying for reasoning depth you don't always need. Start with Haiku, bump up to Sonnet if it fails validation, and reserve Opus for genuinely complex problems. Also, don't assume "more expensive = better." Haiku is perfectly fine at what it does. The question is whether your task needs what Sonnet or Opus provides.
Troubleshooting guide:
- If your Haiku agent is hallucinating answers it doesn't know, it might be too weak for the task. Try Sonnet.
- If your Sonnet agent is slow and you have a high-volume workflow, try Haiku with clearer constraints.
- If your Opus agent is taking 30+ seconds per task, reconsider whether you really need that reasoning depth.
tools (Required)
Type: Array of strings
Purpose: Restrict which tools the agent can access
This is your safety and focus lever. By limiting tool access, you:
- Reduce hallucination risk (agent can't invent capabilities it doesn't have)
- Enforce focus (agent stays in its lane)
- Improve safety (no accidental writes, deletions, or external calls)
- Speed up execution (less overhead, faster decisions)
Available Tools:
Here's the full roster. Include only what the agent needs:
```yaml
tools:
  - "Read"      # Read files locally
  - "Write"     # Write files locally
  - "Edit"      # Edit files (targeted replacements)
  - "Bash"      # Execute bash commands
  - "Glob"      # Fast file pattern matching
  - "Grep"      # Content search with regex
  - "Agent"     # Spawn child agents (orchestration)
  - "WebSearch" # Search the web, ground in current info
  - "WebFetch"  # Fetch and analyze URLs
  - "Skill"     # Invoke saved skills/commands
```

Tool Access Matrix (By Agent Type):
| Agent Type | Typical Tools | Rationale |
|---|---|---|
| Code Analyzer | Read, Glob, Grep, Bash | Reads code, searches patterns, runs tests |
| Code Generator | Read, Write, Edit, Bash | Writes new files, modifies existing, validates |
| Documentation | Read, Write, WebFetch | Reads source, writes docs, fetches references |
| Validator | Read, Glob, Grep, Bash | Inspects, searches, runs validation scripts |
| Researcher | WebSearch, WebFetch | Needs web access only |
| Orchestrator | Agent | Spawns child agents only |
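The tool-access matrix translates naturally into allow-lists you can check at review time. The agent types and tool sets below mirror the table and are illustrative:

```python
# The tool-access matrix above, expressed as allow-lists (illustrative).
ALLOWED_TOOLS = {
    "code_analyzer": {"Read", "Glob", "Grep", "Bash"},
    "code_generator": {"Read", "Write", "Edit", "Bash"},
    "validator": {"Read", "Glob", "Grep", "Bash"},
    "researcher": {"WebSearch", "WebFetch"},
    "orchestrator": {"Agent"},
}

def excess_tools(agent_type: str, tools: list[str]) -> set[str]:
    """Tools the agent requests beyond what its type typically needs."""
    return set(tools) - ALLOWED_TOOLS.get(agent_type, set())

print(excess_tools("validator", ["Read", "Grep", "Write"]))  # {'Write'}
```

A non-empty result is a review flag, not necessarily a bug: either the agent type is wrong or the tool grant needs justification.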
Safety Pattern: Deny-by-Default
Start restrictive. Add tools only when the agent clearly needs them:
```yaml
# ❌ WRONG: Too permissive
tools: ["Read", "Write", "Edit", "Bash", "Glob", "Grep", "Agent", "WebSearch", "WebFetch"]

# ✅ CORRECT: Only what's needed
tools: ["Read", "Glob", "Grep", "Bash"]
```

This agent reads files, searches for patterns, and runs bash. It can't write, modify, or access the web. Safe. Focused.
Example: Build a Specialized Validator
```yaml
name: "Schema Validator"
description: "Validates JSON/YAML files against schemas. Reports violations. Read-only."
model: "haiku"
tools: ["Read", "Glob", "Grep"]
```

This agent can't write, can't run bash, can't access the web. It just reads and searches. Perfect for a validator—you're guaranteed no side effects.
The "Write" vs. "Edit" distinction:
`Write` creates new files or completely overwrites existing ones. Powerful but dangerous. `Edit` makes targeted replacements in existing files. Safer because it requires matching exact content.
If your agent just needs to modify a few lines in existing files, use Edit and deny Write. If it needs to create new files from scratch, include both.
Pitfall: Including Bash without strict guardrails is dangerous. If an agent can execute arbitrary shell, it can delete files, corrupt databases, or leak secrets. Always pair Bash with a safety system prompt that forbids destructive commands.
Dangerous combinations to scrutinize:
- `Bash` + `Write` + unrestricted system prompt = can do anything
- `WebFetch` + `Agent` = can fetch data and pass it to other agents unsupervised
- `Edit` + any write tool + loose constraints = can modify critical files
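Those combinations can be caught mechanically. A sketch of a lint check, with the combo list taken from above:

```python
# Dangerous tool combinations from the list above, as a lint check.
# This is a sketch; extend DANGEROUS_COMBOS for your own policies.
DANGEROUS_COMBOS = [
    ({"Bash", "Write"}, "arbitrary shell plus file writes"),
    ({"WebFetch", "Agent"}, "fetched data forwarded to child agents"),
]

def lint_tools(tools: list[str]) -> list[str]:
    """Return warnings for tool sets that deserve extra guardrails."""
    granted = set(tools)
    return [reason for combo, reason in DANGEROUS_COMBOS
            if combo <= granted]  # combo is a subset of granted tools

warnings = lint_tools(["Read", "Bash", "Write"])
print(warnings)  # ['arbitrary shell plus file writes']
```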
The System Prompt Body: Where Behavior Lives
After the frontmatter comes the body—your system prompt. This is where you define the agent's personality, reasoning style, constraints, and specialty logic.
Structure:
```yaml
---
name: "My Agent"
description: "Does X, used for Y"
model: "haiku"
tools: ["Read", "Bash"]
---
You are a specialized [domain] agent focused on [specific task].

## Your Constraints
- Never [action]
- Always [requirement]
- When unsure, [behavior]

## Your Approach
1. [First step]
2. [Second step]
3. [Validation step]

## Examples
[Show expected input/output]

When you encounter [scenario], [action].
```

System Prompt Best Practices:
1. Start with role clarity: "You are a Python unit test generator focused on pytest. You prioritize readability and coverage over cleverness."
2. Define constraints early: List what the agent must NOT do. Haiku benefits from explicit guardrails. Be specific: "Never use eval(). Never import entire modules with *. Never write tests that depend on external services."
3. Show examples: Concrete input/output examples anchor behavior. Don't describe, show.
4. Explain the "why": "Output should include helpful assertion messages because tests fail in CI logs, not IDEs. When a test fails in CI, the developer is flying blind without the assertion message."
5. Call out edge cases: "If the function has no return value, skip assertions and focus on side effects. If the function is async, wrap it in asyncio.run() in tests."
Here's a real example:
---
name: "Python Test Generator"
description: "Generates pytest unit tests from Python source. Focuses on readability and edge cases."
model: "haiku"
tools: ["Read", "Write", "Bash"]
---
You are a pytest expert generating comprehensive unit tests for Python functions.
## Your Core Principles
- Test behavior, not implementation
- Use descriptive test names that read like documentation (e.g., test_parse_int_with_leading_zeros_returns_int)
- Include edge cases: None values, empty collections, negative numbers, zero, maximum values
- Prioritize readability over brevity
- Never use pytest.mark.skip or xfail—if a test is bad, don't write it
- Never import random or use time-dependent tests
- Assume pytest is available and uses standard assert syntax
## Your Process
1. Read the source file completely
2. Identify all public functions (not prefixed with _)
3. For each function:
a. Extract signature and docstring
b. Identify normal cases, edge cases, error cases
c. Write 3-5 tests per function
4. Validate syntax with `pytest --collect-only` before returning
## Example Input
```python
def parse_int(value: str) -> int:
    """Parse string to int. Raises ValueError if invalid."""
    return int(value)
```

## Example Output

```python
import pytest

def test_parse_int_with_valid_positive_integer():
    assert parse_int("42") == 42

def test_parse_int_with_negative_integer():
    assert parse_int("-10") == -10

def test_parse_int_with_leading_zeros():
    assert parse_int("007") == 7

def test_parse_int_with_invalid_string_raises_error():
    with pytest.raises(ValueError):
        parse_int("abc")

def test_parse_int_with_floating_point_string_raises_error():
    with pytest.raises(ValueError):
        parse_int("3.14")
```

## When Writing Tests
- If the function handles errors, test both success and error paths
- If the function has defaults, test with and without defaults
- If the function touches files/databases, mock them using unittest.mock.patch
- If the test requires a fixture, skip it and note why in a comment: # TODO: requires fixture setup
- Document non-obvious test logic with comments
This system prompt is explicit. The agent knows exactly what to do, why, and where to stop.
## Cost-Performance Tradeoffs: Practical Scenarios
Let's walk through realistic scenarios and the frontmatter choices that make sense.
### Scenario 1: Parallel File Validation
**Problem:** Validate 10,000 JSON files against a schema. You need fast, cheap, high-volume processing.
**Frontmatter:**
```yaml
---
name: "JSON Schema Validator"
description: "Validates JSON files against schema. Fast, stateless, parallelizable."
model: "haiku" # ← Haiku: cheap, fast, deterministic
tools: ["Read"] # ← Read-only: no side effects
---
You are a JSON schema validation expert.
For each JSON file, validate it against the schema and report:
- Valid: true/false
- Errors: [list any violations]
- Line: [first error location]
Never modify files. Output only JSON.
```

Cost estimate: 10K files × ~2K tokens/file = 20M tokens. Haiku input: ~$0.80/M = $16 total.
If you used Sonnet: ~$60. If you used Opus: ~$300.
By choosing Haiku, you saved $44. And the task runs faster. The real-world impact: in a continuous validation pipeline running 24/7, this choice could save you $10K+/month.
Pitfall averted: Don't use Sonnet or Opus for high-volume, rule-based tasks. You're wasting money on reasoning capability you don't need.
Scenario 2: Code Generation with Quality Gates
Problem: Generate unit tests for a 5K-line Python codebase. Quality matters (readability, coverage), but you need reasonable cost.
Frontmatter:
```yaml
---
name: "Code Test Generator"
description: "Generates pytest tests with high readability and edge case coverage."
model: "sonnet" # ← Sonnet: good balance
tools: ["Read", "Write", "Bash"] # ← Needs read, write, test execution
---
You are a test generation expert prioritizing readability and coverage.

## Process
1. Analyze function signature and docstring
2. Identify normal, edge, and error cases
3. Write tests using descriptive names
4. Validate with pytest before output

Never skip error cases. Never use pytest.mark.skip.
If a test is complex (>10 lines), document assumptions.
```

Cost estimate: for the 5K-line codebase, roughly 6M tokens of input across analysis and generation passes. Sonnet input: ~$3/M = $18 total.
You could use Haiku (saving roughly $13 at these rates), but it might miss edge cases or generate low-readability tests. Sonnet's extra reasoning pays for itself in fewer test revisions. This is the sweet spot.
What happens if you cheap out? I've seen teams save $30 on test generation and then spend 8 hours reviewing and rewriting generated tests. The Sonnet version generated usable tests the first time. The Haiku version missed 15% of edge cases and generated unclear assertion messages.
Pitfall averted: Don't cheap out on code generation. A $10 savings is worth nothing if the tests are garbage and need rewriting.
Scenario 3: Architectural Review (Requires Deep Reasoning)
Problem: Review a distributed system design for failure modes, consistency issues, and scalability problems. This is complex.
Frontmatter:
```yaml
---
name: "System Architect Reviewer"
description: "Reviews system designs for failure modes, consistency, scalability. Requires deep reasoning."
model: "opus" # ← Opus: complex reasoning
tools: ["Read", "WebFetch"] # ← Read docs, fetch standards
---
You are a systems architect reviewing designs for:
1. Failure modes (what breaks?)
2. Consistency guarantees (eventual? strong?)
3. Scalability bottlenecks (where does it break at 100K qps?)
4. Operational complexity (can we debug and fix this?)

When you identify a risk:
- Explain the scenario that triggers it
- Assess severity: Low/Medium/High
- Suggest mitigation

Be skeptical. Assume Murphy's Law.
```

Cost estimate: 1 design review × ~50K tokens = ~50K tokens. Opus input: ~$15/M = $0.75 total.
Cheap. And Opus's reasoning depth is worth it—it'll catch subtle consistency bugs that Sonnet misses. The design is about to handle millions in revenue. A $0.75 review that prevents a consistency bug worth millions is the best ROI in software.
Pitfall averted: Use Opus for things that require deep thinking: architectural decisions, complex reasoning, identifying edge cases you haven't thought of. Save it for high-stakes decisions.
Building a Production Agent: Complete Example
Let's build a complete, production-ready agent from scratch.
Goal: Create an agent that analyzes Python codebases for performance bottlenecks.
Step 1: Frontmatter Design
```yaml
---
name: "Python Performance Analyzer"
description: "Identifies performance bottlenecks in Python code: O(n²) loops, inefficient algorithms, memory leaks. Requires source files and benchmark data."
model: "sonnet"
tools: ["Read", "Glob", "Grep", "Bash"]
---
```

Reasoning:
- Sonnet (not Haiku): Performance analysis requires reasoning about algorithms, O-notation, and trade-offs. Haiku might miss nuances.
- Read, Glob, Grep: Search files, understand patterns
- Bash: Run Python tools (pylint, memory_profiler) to gather data
- Not WebSearch: Internal codebase, no external research needed
Step 2: System Prompt Body
---
name: "Python Performance Analyzer"
description: "Identifies performance bottlenecks in Python code: O(n²) loops, inefficient algorithms, memory leaks. Requires source files and benchmark data."
model: "sonnet"
tools: ["Read", "Glob", "Grep", "Bash"]
---
You are a Python performance expert analyzing codebases for bottlenecks.
## Your Specialties
1. Identifying algorithmic inefficiencies (O(n²) nested loops, inefficient sorting)
2. Spotting memory leaks (unused references, circular dependencies)
3. Finding I/O bottlenecks (synchronous file reads in loops, N+1 queries)
4. Recommending caching/memoization opportunities
## Your Process
1. Scan the codebase for Python files
2. For each file, search for red flags:
- Nested loops (potential O(n²))
- List comprehensions with .append (use extend instead)
- Global variables in loops
- Synchronous I/O in loops
3. Analyze data flow: Do variables persist unnecessarily?
4. Run memory_profiler if available to quantify memory usage
5. For each bottleneck, assess:
- **Impact**: How much does this hurt performance?
- **Likelihood**: How often does this happen in practice?
- **Effort**: How hard is it to fix?
## Output Format
For each bottleneck:
```yaml
- name: [brief name]
file: [path:line_range]
severity: [low/medium/high]
current: [what's happening]
impact: [estimated slowdown, e.g., "10x slower for 1000 items"]
fix: [refactored code snippet]
```

## When You're Unsure
- Run a quick benchmark: `python -m timeit "[your code here]"`
- If the result is < 1µs, it's not a bottleneck
- If > 1ms, it might matter in loops
## Pitfalls to Avoid
- Don't blame Python's GC for every slowdown—profile first
- Don't suggest C extensions without trying optimization first
- Don't report "could be faster" without quantifying the impact
- Premature optimization is the root of all evil—only flag things that genuinely harm performance
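The quick-benchmark rule of thumb in the prompt above can also be run from Python directly with the stdlib `timeit` module. The snippet being timed and the iteration count here are illustrative:

```python
import timeit

def per_call_seconds(stmt: str, setup: str = "pass") -> float:
    """Average wall time of one execution of `stmt`, in seconds."""
    number = 100_000
    return timeit.timeit(stmt, setup=setup, number=number) / number

# Time sorting a small list, then apply the article's thresholds.
t = per_call_seconds("sorted(xs)", setup="xs = list(range(100))")
if t < 1e-6:
    print("probably not a bottleneck")
elif t > 1e-3:
    print("might matter in loops")
```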
**Step 3: Testing the Agent**
You'd invoke this agent like:
```bash
/dispatch "Python Performance Analyzer" "Analyze src/ for bottlenecks"
```

The agent would:
- Find Python files in `src/`
- Search for patterns (nested loops, I/O in loops, etc.)
- Run profiling tools
- Output structured findings
- Suggest fixes with code snippets
Step 4: Iteration
If the agent's output is too verbose, tighten the system prompt:
Output only HIGH severity findings. If uncertain about severity, ask yourself:
"Would this matter if this code runs 1000x per day?"
If it misses certain patterns, add them:
Also look for:
- Regex operations in tight loops (compile regex once, reuse)
- JSON parsing in loops (batch parse when possible)
The Art of Writing System Prompts for Different Models
System prompts are where agent behavior really comes alive. But the way you write prompts matters differently depending on which model you're targeting.
Haiku responds well to explicit, direct instructions. It doesn't benefit from lengthy explanations—be concise. "Validate JSON files against schema. Report: valid/invalid, errors, line number." That's enough. Haiku follows straightforward instructions reliably.
Sonnet can handle more nuance. You can explain the "why" behind instructions. "You're generating tests that will run in CI/CD pipelines with limited resources. Optimize for readability and clarity in assertion messages because failures will be read in CI logs, not in an IDE. When tests fail remotely, developers need the assertion message to understand what broke." This additional context helps Sonnet make better decisions about trade-offs.
Opus benefits from deep reasoning. You can include edge cases, subtle requirements, and complex decision-making guidance. Opus will think through implications and handle ambiguity well. "Handle both happy path and error scenarios. For error scenarios, consider: is this error user-caused (bad input)? Is it system-caused (downstream service unavailable)? Is it transient (retry might help)? Your error handling strategy should differ for each category."
The mistake many teams make is using the same prompt template for all three models. Better approach: tailor prompts to each model's strengths. Haiku gets directive, concise prompts. Sonnet gets contextual prompts with rationale. Opus gets nuanced, complex prompts with edge case handling.
Building Agent Teams That Work Together
As you create more agents, think about how they interact. You might have a specialist agent for code analysis and another for code generation. The analyzer finds issues. The generator fixes them. They work together as a team.
Document these relationships in your agent configurations. "Code Generator depends on Code Analyzer findings" or "This agent runs after Schema Validator completes." These dependencies matter for orchestration and error handling.
When agents work together, their frontmatter becomes even more important. If Agent A expects a specific output format from Agent B, document it in both agents' prompts ("Output JSON with keys: ."). These contracts between agents prevent integration issues.
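One lightweight way to enforce such a contract: the consuming agent (or the orchestration code between agents) validates the producer's output keys before using it. A sketch; the key names below are hypothetical:

```python
import json

# Hypothetical contract: keys the analyzer agent promises in each finding.
ANALYZER_CONTRACT = {"file", "line", "severity", "finding"}

def accept_finding(raw: str) -> dict:
    """Parse one analyzer finding; reject it if contract keys are missing."""
    finding = json.loads(raw)
    missing = ANALYZER_CONTRACT - finding.keys()
    if missing:
        raise ValueError(f"analyzer output missing keys: {sorted(missing)}")
    return finding

ok = accept_finding(
    '{"file": "a.py", "line": 3, "severity": "high", "finding": "O(n^2) loop"}'
)
print(ok["severity"])  # high
```

Failing fast at the boundary turns a silent downstream misbehavior into an immediate, debuggable error.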
Debugging Agent Behavior Through Frontmatter
When an agent isn't performing as expected, the first place to look is its frontmatter and system prompt. Nine times out of ten, agent misbehavior traces back to unclear instructions or wrong model selection, not fundamental capabilities.
If an agent is hallucinating—making up facts it doesn't know—it might be too weak a model. Haiku sometimes confabulates when tasks are too complex. Upgrade to Sonnet and see if hallucination stops.
If an agent is slow, maybe it's too powerful. Opus takes 30+ seconds on tasks because it's thinking deeply. If the task is straightforward validation, Haiku is faster and cheaper.
If an agent is producing verbose output when concise is expected, the system prompt isn't giving clear constraints. "Keep responses to one line. No explanations." suddenly makes agents concise.
If an agent is making mistakes in a domain you're confident it can handle, the system prompt might not be explaining edge cases. Add examples. Show what correct output looks like.
Debugging agents is largely debugging their instructions. Get the system prompt right, and the agent usually works well.
Common Frontmatter Mistakes
Mistake 1: Overpowered Agents
```yaml
# ❌ WRONG
---
name: "Helper Agent"
description: "Does stuff"
model: "opus"
tools: ["Read", "Write", "Edit", "Bash", "Glob", "Grep", "Agent", "WebSearch"]
---
```

This agent can do anything, costs a fortune, and makes decisions you can't predict. Avoid.
Fix: Be specific. Give it one job.
```yaml
# ✅ CORRECT
---
name: "Documentation Generator"
description: "Converts TypeScript JSDoc to Markdown API docs. Reads source, writes docs, validates with bash."
model: "sonnet"
tools: ["Read", "Write", "Bash"]
---
```

Mistake 2: Vague System Prompts
You are a helpful assistant. Do good work.
This tells the agent nothing. Don't.
Fix:
You are a documentation generator. Your job is:
1. Extract JSDoc from TypeScript files
2. Convert to Markdown tables
3. Validate with a linter
Do not include private methods (prefix _).
Do not include test files.
Output format: markdown with h2 headers per function.
Mistake 3: Wrong Model for the Task
Using Opus for regex substitution, or Haiku for architectural design. Choose deliberately.
Mistake 4: Dangerous Tool Combos Without Safeguards
```yaml
tools: ["Bash", "WebFetch"]
```

Paired with a loose system prompt, this agent could run arbitrary shell + fetch any URL. Dangerous.
Fix: Add guardrails in the system prompt:
## Safety Constraints
- Never execute commands with rm, dd, or > redirection
- Only fetch from whitelisted domains
- Never write sensitive output to public logs
Practical Patterns: Copy-Paste Starter Templates
Here are battle-tested frontmatter templates you can use as starting points.
Pattern 1: Fast Analyzer (Haiku, Read-Only)
```yaml
---
name: "Pattern Detector"
description: "Scans code for specific patterns. Fast, stateless, read-only."
model: "haiku"
tools: ["Read", "Glob", "Grep"]
---
You scan for [specific pattern].

Process:
1. Find all files matching [pattern]
2. Extract matching lines
3. Report location and context

Output: JSON array with {file, line, context}.
Do not modify files.
```

Pattern 2: Code Generator (Sonnet, Write)
```yaml
---
name: "Code Generator"
description: "Generates code (functions, tests, boilerplate). Validates with syntax checker."
model: "sonnet"
tools: ["Read", "Write", "Bash"]
---
You generate [specific code type].

Process:
1. Understand requirements
2. Write code
3. Validate syntax
4. Output to file

Prioritize readability. Avoid magic numbers.
```

Pattern 3: Researcher (Haiku, Web Access)
```yaml
---
name: "Fact Checker"
description: "Validates claims against web sources. Cites references."
model: "haiku"
tools: ["WebSearch", "WebFetch"]
---
You fact-check claims.

Process:
1. Parse claim
2. Search for sources
3. Compare: Is claim accurate?
4. Report: Yes/No/Unclear, with citations

Output: {claim, verdict, source_url}.
```

Pattern 4: Orchestrator (Haiku, Agent Only)
```yaml
---
name: "Task Router"
description: "Routes tasks to specialized agents. Coordinates execution."
model: "haiku"
tools: ["Agent"]
---
You route [problem types] to specialized agents.

Map:
- Code analysis → Code Analyzer Agent
- Test generation → Test Generator Agent
- Documentation → Doc Generator Agent

Coordinate execution. Aggregate results. Return summary.
```

Learning From Deployed Agents: Iterating to Excellence
Once agents are deployed, you'll accumulate data about how they perform. Track metrics like token usage, latency, quality scores (if your workflow includes human review), and error rates. This data is gold. It tells you whether your frontmatter choices were right.
If your Haiku agent is hallucinating answers it doesn't know, you picked the wrong model. If your Sonnet agent is taking 45 seconds per task, you might be able to drop to Haiku with tighter constraints. If your agent is using tools it doesn't need, remove them.
Iterate based on real performance. The frontmatter you write today will be refined by tomorrow's data. View it as a hypothesis: "I think Haiku is fast enough for this validation task." Run the agent. Measure. Adjust. This cycle drives you toward optimal configurations.
Final Checklist: Before You Ship Your Agent
Before you deploy an agent, verify:
- Frontmatter is valid YAML (use `yamllint` or try parsing in Python)
- Name is unique across all agents
- Description answers: What? When? Why?
- Model choice is justified (Haiku for speed? Sonnet for balance? Opus for reasoning?)
- Tools are minimal (removed unnecessary access?)
- System prompt is specific (not vague platitudes)
- System prompt includes examples (show expected I/O)
- Safety constraints are explicit (what can't it do?)
- Output format is structured (YAML, JSON, markdown tables)
- Agent has been tested (does it do what you intended?)
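Several of these checklist items can be automated in a pre-ship script. A stdlib-only sketch; `REQUIRED` and `VALID_MODELS` mirror the fields covered in this article:

```python
# Automates the field-level checks from the checklist above (a sketch).
REQUIRED = {"name", "description", "model", "tools"}
VALID_MODELS = {"haiku", "sonnet", "opus"}

def validate_frontmatter(fields: dict) -> list[str]:
    """Return a list of checklist violations; empty means ship-ready."""
    errors = []
    missing = REQUIRED - fields.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if fields.get("model") not in VALID_MODELS:
        errors.append("model must be haiku, sonnet, or opus")
    if not fields.get("tools"):
        errors.append("tools list is empty")
    return errors

print(validate_frontmatter({
    "name": "Schema Validator",
    "description": "Validates JSON files.",
    "model": "haiku",
    "tools": ["Read"],
}))  # []
```

Run it over every file in `.claude/agents/` as part of CI so a malformed agent never reaches production.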
Living Documentation: Keeping Your Agent Specs Fresh
As you deploy agents and learn how they actually perform, update their frontmatter. If an agent is consistently performing below expectations, maybe it's running on the wrong model. If it's running slow, maybe Haiku is sufficient. If it's making mistakes that Sonnet would catch, upgrade it.
Your initial agent specs are hypotheses. Deployment is where you test them. Be willing to iterate based on real-world performance. The best-performing agents aren't those that were designed perfectly from the start—they're those that were tuned based on actual usage patterns.
Keep notes on why you made each choice. "Haiku for this validator because throughput matters more than perfect accuracy" or "Opus for this reviewer because architecture decisions require deep reasoning." This decision documentation helps future you (or your team) understand the trade-offs and adjust confidently.
Conclusion: Frontmatter is Your Control Panel
Agent frontmatter is deceptively powerful. These few YAML fields—name, description, model, and tools—combined with a well-written system prompt, let you build focused, safe, cost-effective agents that scale from one to hundreds.
Key takeaways:
- Choose model deliberately. Haiku for high-volume/deterministic tasks. Sonnet for balanced code generation. Opus for complex reasoning. This choice directly impacts cost, latency, and quality. Start with Haiku, upgrade when you have evidence that more capability is needed.
- Restrict tools by default. An agent that can only Read is safe. An agent that can Read and Write needs scrutiny. An agent that can Bash and WebFetch needs serious guardrails. Give each agent only what it needs—no more. Every tool adds surface area for mistakes or misuse.
- Write system prompts for humans first. If you can't explain what the agent should do in plain English, the agent won't do it right either. Be explicit. Show examples. Call out pitfalls. Include edge cases. The time you invest in clarity compounds every time the agent runs.
- Iterate and measure. Build an agent, test it, refine the prompt and configuration. Each cycle makes it faster, cheaper, and more reliable. Measure token usage, quality, latency. Make data-driven improvements.
- Document your decisions. Why did you choose Sonnet for this agent? Why Haiku for that one? What edge cases did you encounter? Future you (and your team) will appreciate the reasoning and won't repeat the same experiments.
The Scaling Experience: How Frontmatter Matters at Enterprise Scale
When you're running dozens of agents, frontmatter becomes critical infrastructure. Unclear names mean developers don't know which agent to use. Vague descriptions cause misuse. Wrong model choices create cost overruns or quality issues. Missing tools mean agents fail at runtime.
Multiply by dozens or hundreds of agents, and frontmatter mistakes accumulate into organizational friction. Teams create duplicate agents because they didn't know the first one existed. Agents run on expensive models when cheaper ones would work. Tools are granted unnecessarily, creating security risks.
The organizations that scale agents successfully are those that treat frontmatter as first-class infrastructure. They document naming conventions. They enforce description standards. They conduct regular audits of model selection. They manage tool access carefully.
This discipline isn't overhead—it's the difference between agents that are helpful and agents that are a liability. Get it right, and you've built the foundation for reliable automation that your team can scale confidently. Start simple. Add tools and models only when you can justify them with evidence. Measure the results. Iterate. And build systems that scale safely while maintaining cost efficiency and reliability.
The agents you build today with careful frontmatter design become the systems you run for years. Invest the time upfront to get it right. Your future self will thank you. Your team will thank you. Your organization's productivity will thank you.
Getting Started: Building Your First Specialized Agent
Pick a task your team does regularly that's ripe for automation. Something repetitive. Something rule-based. Maybe JSON validation. Maybe code formatting. Maybe documentation extraction.
Sketch out an agent spec:
- What's the agent called?
- What does it do?
- What model does it need? (Start with Haiku unless you have evidence otherwise)
- What tools does it minimally need? (Start restrictive)
- What's the system prompt? (Be specific, include examples)
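Answering those questions produces a first draft of the agent file. A sketch of what that might look like for the JSON-validation example (the name, description, and prompt wording here are illustrative, not a prescribed spec):

```yaml
---
name: json-validator
description: Validates JSON files for syntax errors. Use when a change touches .json configuration files.
model: haiku
tools: Read
---
You are a JSON validation specialist.

Your process:
1. Read the target file.
2. Check for syntax errors (trailing commas, unquoted keys, mismatched brackets).
3. Report each problem as: line number, issue, suggested fix.

If the file is valid, reply exactly: "VALID".
```

Note how every checklist item shows up: a unique name, a description that answers what and when, the cheapest model that can do the job, a single read-only tool, and a prompt with explicit steps and a fixed output format.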
Test it. Does it work? If yes, deploy it into your workflow. If not, iterate. Refine the system prompt. Maybe upgrade to Sonnet. Maybe add tools. Measure and adjust.
After you've shipped one agent successfully, build the next. You'll develop intuition about frontmatter. You'll spot patterns. You'll know what works and what doesn't. Your agent designs will improve with each iteration.
That iterative discipline is how you build excellent agents. Not overthinking upfront, but building, learning, refining, and improving continuously. Each agent you ship teaches you something about frontmatter design, about model capabilities, about system prompt clarity.
Over time, you'll develop intuition that lets you design excellent agents on the first try. But even then, you'll refine based on production performance. Agent development is a craft that improves with practice and attention to detail. The frontmatter you write is the foundation of that craft. Get it right, and everything else follows. Get it wrong, and you're constantly fighting the agent to make it do what you want.
Invest in clarity, specificity, and careful model selection. The time spent perfecting frontmatter pays dividends across every run of that agent. Every agent you build will run thousands of times over its lifetime. A 10% improvement in prompt clarity multiplied by thousands of runs is enormous leverage. That's where frontmatter excellence matters—it scales across every invocation.
Your frontmatter is your legacy as an agent developer. Build it well. Your future self running these agents will thank you. Every time an agent runs successfully because you got the frontmatter right, you benefit. Every time an agent fails because instructions were unclear, you learn. This iterative improvement across hundreds of agent runs compounds into significant organizational capability.
That capability—the ability to reliably spec and deploy specialized agents—becomes your competitive advantage. It enables your team to scale beyond what's possible with generic tools.
This article is part of the Claude Code guide to subagents and agent teams.
-iNet