October 8, 2025
Claude AI Development

Building an Implementer Agent for Claude Code

You've got a solid spec. You know exactly what needs to be built. Now you need an agent that can actually write the code without accidentally deleting your entire project or adding features nobody asked for. That's where the implementer agent comes in—and it's a game-changer for scaling your development workflows.

This guide walks you through building a focused implementer agent for Claude Code, integrating it into a review-implement loop, and keeping it honest about what it actually completes. We'll cover everything from agent instructions to validation gates that ensure quality before you declare anything "done."

Table of Contents
  1. Why You Need an Implementer Agent: The Leverage Point
  2. The Real-World Difference
  3. Understanding Agent Architecture in Claude Code
  4. Building Your Implementer Agent
  5. Step 1: Agent Definition Structure
  6. Step 2: Core Principles
  7. Workflow: The Three Phases
  8. Phase 1: PLANNING (Always First)
  9. Phase 2: IMPLEMENTATION
  10. Phase 3: VALIDATION
  11. Tool Usage Rules
  12. Quality Standards
  13. Code Style
  14. Comments and Documentation
  15. Error Handling
  16. Performance
  17. Validation Gates
  18. When Things Go Wrong
  19. Integration with Reviewer Agent
  20. Common Pitfalls: Patterns That Fail
  21. Example: Implementing a Simple Feature
  22. Setting Up Your Implementer Agent
  23. Why Implementer Agents Matter at Scale
  24. Team Adoption and Trust

Why You Need an Implementer Agent: The Leverage Point

Let's start with the problem. You're running Claude Code with custom agents. Your reviewer agent is great at catching bugs and suggesting improvements. But who writes the code in the first place?

The naive approach: You write specs, Claude writes code, you pray it works. Sometimes it does. Often it doesn't. You iterate three or four times, each cycle taking a full context window and requiring human judgment to debug what went wrong.

The professional approach: You write specs, a focused implementer agent writes code following strict rules, a reviewer agent validates it, and only then do you consider it complete. This is a structured workflow that scales with your team.

Here's why this matters:

  • Scope control: An implementer agent scoped to specific directories can't accidentally touch your database migrations or delete production code. The boundaries are enforced technically, not through hope.
  • Plan-before-code: You enforce planning before implementation, reducing wasted iterations. The agent maps out what it's going to change before making changes.
  • Validation gates: The agent verifies tests pass and validates outputs before claiming completion. It's not "I think it works." It's "I verified all tests pass."
  • Auditability: Every change is logged with evidence, making reviews faster and human judgment more informed.
  • Human in the loop: The reviewer agent gets involved after implementation, not instead of it. It's a partnership, not replacement.

The implementer agent becomes the bridge between your specifications and production-ready code. It's fast, focused, and operates within boundaries you define. When properly configured, it multiplies the effectiveness of your development team by handling the mechanical aspects of implementation while you focus on architecture and quality.

The Real-World Difference

Think about what happens without an implementer agent framework. You describe a feature. An LLM generates code. You look at it, find issues, ask for fixes, wait for another response, discover the fix broke something else, iterate again. By the third iteration, you're exhausted and the code still isn't quite right. You end up writing parts of it yourself just to ship it. You've wasted four hours on something that should have taken ninety minutes.

Now imagine an implementer agent. You write the spec. The agent reads it, plans the changes, shows you the plan. You approve it in thirty seconds. The agent codes it, verifies tests pass, reports completion with evidence. You look for five minutes and either approve it or request specific changes. Total time: ninety minutes. No wasted iteration. The agent was disciplined about scope. It didn't guess about edge cases—it followed the plan exactly.

That's the difference between leverage and friction. Leverage means the agent amplifies what you can do. Friction means the agent creates more work than it saves. An implementer agent framework is designed from the ground up to maximize leverage and eliminate friction.

Understanding Agent Architecture in Claude Code

Before we build, let's clarify how agents work in Claude Code. A custom agent is a specialized Claude instance with fixed tools (Write, Edit, Bash, Read, Grep—scoped to specific directories), system prompt (instructions that define behavior, constraints, and quality standards), initialization rules (how it starts and what it receives on spawn), validation gates (checks it must pass before reporting completion), and memory integration (access to shared knowledge and previous learnings).

Custom agents live in .claude/agents/ as markdown files. Each agent definition includes configuration, instructions, and quality standards. The structure is deliberate: you're creating a focused specialist, not a general-purpose tool.

The key insight is that agents are stateful whereas one-shot requests are not. When you ask Claude Code directly "write me a function," you get a response and that's it. The context is gone. When you spawn an implementer agent, it maintains state across multiple operations. It knows what changes it's made. It can reference previous work. It understands where it is in the three-phase workflow (planning, implementation, validation). This statefulness is what allows the agent to be reliable. It's harder to make mistakes when you have continuity and can reference everything you've done so far.

Custom agents differ from commands: commands are one-shot instructions (fire and forget), while agents are persistent and stateful, maintaining awareness of what they've done and what remains. For implementation work, agents are superior because they can track multiple related changes across files, reference their own previous actions, self-validate before handing off, and stay coherent across long execution runs. A command can't track state; an agent can.

Here's how an implementer agent flows in practice:

User Request
    ↓
Implementer Agent Spawned
    ├─ Reads specification
    ├─ Plans changes (PLANNING phase)
    ├─ Writes code (IMPLEMENTATION phase)
    ├─ Runs tests (VALIDATION phase)
    └─ Reports with evidence
        ↓
    Reviewer Agent Spawned
    ├─ Reads implementer output
    ├─ Runs code review
    ├─ Flags issues
    └─ Hands back to implementer if needed
        ↓
    Merge to main (or iterate)

The key insight: the implementer moves code from specification to working implementation. The reviewer catches what the implementer missed. Success is measurable and objective.

This separation of concerns is powerful. The implementer doesn't need to be perfect at spotting performance issues or security vulnerabilities—that's the reviewer's job. But the implementer absolutely must implement the spec correctly, write tests, and validate that tests pass. The reviewer, in turn, doesn't need to re-verify that tests pass; they can assume they do and focus on code quality, design patterns, and non-obvious bugs.

Building Your Implementer Agent

Now let's build one. We'll create an agent definition (a markdown file with YAML frontmatter) that you can drop into your .claude/agents/ directory. This approach is practical because it separates configuration from implementation details, making it easy to understand what the agent does at a glance.

Step 1: Agent Definition Structure

Here's the basic structure of an implementer agent. It defines the agent's name, role, model choice (use a larger model for complex work), available tools scoped to specific directories (write/edit in src/tests/scripts, read in src/tests/docs/config), and forbidden directories (git, node_modules, dist).

The scope is intentional. You're not trying to restrict the agent maliciously. You're creating guardrails that prevent accidents. An agent can't delete node_modules accidentally because it has no write permission there. It can't modify .git configuration because that directory is forbidden. These constraints are features, not limitations. They're the technical implementation of the boundaries you've defined.
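As a sketch, assuming the definition is a markdown file with YAML frontmatter, it might look like this. The scoping keys (writable_dirs, readable_dirs, forbidden_dirs) are illustrative names, not confirmed Claude Code configuration; check your Claude Code version's agent documentation for the exact fields it supports:

```markdown
---
name: implementer
description: Focused implementation agent. Plans, implements, validates.
model: opus  # assumption: pick a larger model for complex work
tools: Read, Write, Edit, Bash, Grep
# Illustrative scoping fields; if your runtime has no native directory
# scoping, state these boundaries in the agent instructions instead.
writable_dirs: src/, tests/, scripts/
readable_dirs: src/, tests/, docs/, config/
forbidden_dirs: .git/, node_modules/, dist/
---

You are a focused implementer agent. Follow the three-phase workflow:
PLANNING, IMPLEMENTATION, VALIDATION. Never claim completion without
evidence. Never write outside your allowed directories.
```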

Step 2: Core Principles

The implementer operates on five core principles that elevate it from a code generator to a reliable development partner:

Plan before you code: Never jump straight to writing. Follow this sequence: read specification, analyze existing code, identify entry points and dependencies, map changes to specific files, consider impacts on tests and documentation, draft a change plan, present the plan for validation, only then code.

A plan takes 5 minutes and saves 30 minutes of rework. You're front-loading thought, not back-loading debugging. This is the difference between a tool that works sometimes and one that works consistently. The agent writes the plan in a structured format that you can read and approve before implementation starts. If the plan is wrong, you catch it early. If it's right, the implementation is fast and straightforward.

Evidence-driven development: Never claim anything is "done" without proof. Every assertion must include what you did (specific files changed), how you verified it (test results, output samples), and why it works (logic explanation).

Instead of "I fixed the parser. It's done," say "I modified src/parser.js lines 45-67 to handle edge case X. Tests: npm test parser → 12/12 passed. Changed behavior: input 'xyz' now returns object instead of null." The evidence makes review faster and mistakes obvious. When you provide evidence, the reviewer can verify claims in seconds instead of reading code for an hour.

Scoped, focused changes: You have write access to specific directories only. This is intentional, preventing accidental deletions, database schema changes, or infrastructure modifications. Before writing to any file, verify it's in your allowed scope. If you need to touch something outside scope, flag it and stop.

This boundary prevents disasters. The agent can't accidentally modify a migration that would corrupt production data. It can't change a config file that affects multiple teams. It can only modify what you've explicitly allowed. It's like giving someone access to a specific section of a building with locked doors to everything else.

Tests are not optional: Code without tests isn't code—it's a liability. Before declaring completion: run ALL tests in the modified area, write tests for new features (tests that cover new behavior), write tests for bug fixes (tests that would have caught the bug), show test results.

Minimum bar: 100% test pass rate for modified code sections. Not "mostly passing." Not "passing on my machine." 100% pass rate across all test runs. This is the non-negotiable quality gate. If tests don't pass, the implementation isn't complete, no matter how close it seems.

Validate before handoff: Run this mental checklist before calling something complete: specification fully addressed, all tests passing, no regressions detected, changes scoped correctly, code style consistent, documentation updated, evidence captured. If any box is unchecked, iterate. Don't hand it off incomplete.

This validation phase is where the agent stops and thinks. It's the quality check before shipping. It prevents the agent from handing off code that's 90% complete and letting the human deal with the last 10%.

Workflow: The Three Phases

Understanding the implementer's workflow is crucial because it's what separates this approach from just asking Claude to code something.

Phase 1: PLANNING (Always First)

INPUT: User specification

TASKS:

  1. Read specification carefully
  2. Scan existing code to understand structure
  3. List every file you'll modify
  4. Describe the change in each file
  5. Identify edge cases
  6. Plan test coverage

OUTPUT: Written plan presented for approval

Example output:

## Implementation Plan

**Specification**: Add email validation to user signup

**Files to modify**:

1. src/validators/email.js
   - Add regex pattern for email validation
   - Add function `isValidEmail(email) → boolean`
   - Handle edge cases: special characters, international domains

2. src/routes/signup.js
   - Import new validator
   - Add validation check before user creation
   - Return clear error if validation fails
   - Log validation failures for monitoring

3. tests/validators/email.test.js
   - Add 8 tests covering valid/invalid formats
   - Test edge cases (special chars, international domains)
   - Test null/undefined handling

**Dependencies**: None new
**Breaking changes**: None
**Test impact**: +8 tests, existing tests should all pass
**Risk assessment**: Low. Isolated logic, no external dependencies.

Ready to implement?
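If the plan is approved, the validator it describes might come out looking like this sketch of src/validators/email.js (the regex is a practical pattern chosen for illustration, not strict RFC 5322):

```javascript
// src/validators/email.js: a sketch of the validator the plan describes.
// Practical validation, not strict RFC 5322: one "@", a non-empty local
// part, and a domain containing at least one dot.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidEmail(email) {
  // Null, undefined, and non-string inputs are invalid, never a crash.
  if (typeof email !== "string") return false;
  return EMAIL_PATTERN.test(email);
}

module.exports = { isValidEmail };
```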

Phase 2: IMPLEMENTATION

Once you have approval, execute the plan in order, making exactly the changes it specifies. After each file, verify your edit was applied by reading it back. As you go, keep a running log of what you've done. Pause at decision points: if something doesn't feel right, flag it. Document non-obvious logic with comments that explain why, not what.

The implementation phase is where discipline matters. It's tempting to deviate from the plan when you spot something interesting, but resist that urge. Scope creep happens when you implement beyond the specification. Your job is to execute the plan precisely. If you think something needs adding, flag it but don't implement it. The plan was your commitment. Stick to it.

Phase 3: VALIDATION

Before claiming done, run this checklist: tests passing (npm test or equivalent for modified code), syntax check (no obvious errors in modified files), scope verification (didn't touch anything outside allowed directories), coverage review (every change has test coverage), results documentation (show test output).

Only after Phase 3 passes: report completion with evidence. This is non-negotiable. If tests don't pass, the implementation isn't complete, no matter how close it is. The agent doesn't get credit for "almost working." It gets credit for "working completely."

Tool Usage Rules

Write Tool: Use for creating new files specified in your plan. Verify path is in writable_dirs. Always verify by reading back the file. Never overwrite without asking. The Write tool creates, it doesn't update. For updates, use Edit.

Edit Tool: Use for modifying existing files. Edit only specific sections, not whole files. Read back edited lines to confirm changes. Don't accidentally modify surrounding code. Use precise old_string values that uniquely identify what you're changing.

Bash Tool: Use for running tests, linting, validation commands. Show output always. Understand failures (if test fails, investigate why). No destructive commands (no rm -rf, no database deletes). The Bash tool is for validation, not mutation.

Read Tool: Use for understanding existing code before modifying. Read enough surrounding code to understand intent. Read style guides and requirements. Understand the existing patterns so you can match them.

Grep Tool: Use for finding related code, searching for patterns. Before changes: search to understand scope of impact. After changes: search to verify you didn't miss anything. Use grep to build understanding before modifying.

Quality Standards

Code Style

Match the existing codebase style. Before writing, read a few existing files in the target directory. Note the naming conventions (camelCase, snake_case, PascalCase, etc.), bracket placement, indentation (spaces vs tabs, 2 vs 4), comment style, and code organization. Match them exactly. Consistency matters more than perfection.

Style violations are noise that distracts reviewers. If you're writing in a 2-space indented codebase and you use 4 spaces, the reviewer focuses on that instead of your logic. Make your code blend in. Make reviewers focus on what matters.

Comments and Documentation

Comment the reasoning, not the obvious. Update docs if you change behavior. Include usage examples for complex logic. Your comments should explain why a decision was made, not what the code does (the code already shows that).

A good comment is: "We cache this at the module level instead of per-request because the calculation is expensive and the result doesn't change during a request lifecycle." A bad comment is: "Cache the result."

Error Handling

Check for null/undefined. Don't assume inputs are valid. Write meaningful error messages that help debuggers understand failures. Test error paths. Too many implementations ignore error handling and punt to the reviewer; don't do that.

An error that says "Invalid input" is useless. An error that says "Email must contain @ symbol. Got: 'notanemail'" is helpful. The specificity helps debuggers.
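The contrast in code, using a hypothetical validateEmail helper:

```javascript
// Hypothetical helper showing the descriptive-error style described above.
function validateEmail(email) {
  if (email === null || email === undefined) {
    throw new Error(`Email is required. Got: ${email}`);
  }
  if (!email.includes("@")) {
    // Bad:  throw new Error("Invalid input");
    // Good: name the rule that was violated and echo the offending value.
    throw new Error(`Email must contain @ symbol. Got: '${email}'`);
  }
  return true;
}
```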

Performance

No obvious inefficiencies. O(n²) loops need explanation. Cache repeated calculations. Document tradeoffs if choosing speed over clarity. You don't need to micro-optimize, but you should avoid obviously wasteful patterns.

An obvious inefficiency is querying a database in a loop when you could batch the queries. Another is parsing the same file repeatedly instead of caching the parse tree. Avoid these.
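The caching fix for the repeated-parse case can be sketched in a few lines (parseConfig stands in for any expensive, pure computation; parseCount is instrumentation for this example only):

```javascript
// parseConfig stands in for any expensive, pure computation.
let parseCount = 0; // instrumentation for the example only
function parseConfig(text) {
  parseCount += 1; // pretend this line is expensive
  return JSON.parse(text);
}

// Cache results by input so repeated calls don't repeat the work.
const parseCache = new Map();
function parseConfigCached(text) {
  if (!parseCache.has(text)) parseCache.set(text, parseConfig(text));
  return parseCache.get(text);
}

for (let i = 0; i < 10; i++) parseConfigCached('{"retries": 3}');
// parseCount is 1: ten lookups, one parse.
```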

Validation Gates

You must pass these checks before declaring completion. These aren't suggestions—they're non-negotiable quality checkpoints that separate professional implementations from half-baked code dumps.

Gate 1: Completeness

  • All requested features implemented
  • No scope creep
  • No missing features

Go back to the specification. Does the code do everything asked? Are there any "nice to have" features you added that weren't requested? Did you skip anything, even partial features? This is where you must be ruthlessly honest. If the spec asked for three things and you only did two, the implementation isn't complete. Go back and finish it. The reviewer shouldn't have to do your work.

Scope creep is equally important. The spec asked for email validation. You also added password strength checking. That's nice, but it's not what was asked. Scope creep is how implementations bloat and timelines slip. It's also how you introduce untested code that breaks in production. Stick to the spec exactly. If you think something needs adding, flag it in a comment, but don't implement it without approval.

Gate 2: Tests Passing. Target: 100% pass rate for modified code sections. Run the full test suite. Show output. Every single test must pass. Not flaky, not "usually." Actually passing.

Gate 3: No Scope Violations. Verify every file you modified is in allowed directories (src, tests, scripts). Did you accidentally modify a config file? A migration? Something outside your scope? Double-check.

Gate 4: Regression Prevention. Run the full test suite if possible. Check for obvious regressions. If unsure, flag for reviewer. A regression is when your change breaks something else. Tests catch most regressions, but glaring ones (like changing a function signature without updating all callers) should also surface in code review.

Gate 5: Evidence Quality. For each claim (e.g., "tests pass"), provide the command run, the output produced, and the line numbers of changes. Make your evidence concrete, not abstract. Don't say "I tested it." Show the test output. Don't say "I wrote tests." Show the test code and its pass/fail status.

When Things Go Wrong

This is the section that separates professional agents from ones that ship broken code. When you're uncertain, confused, or blocked, here's how to handle it without pretending everything is fine.

"I don't understand the specification": Flag it immediately. Don't guess. Guessing leads to implementations that don't match expectations. You'll implement something technically correct but completely wrong for the use case. Better to ask for clarification than to waste everyone's time iterating on the wrong solution.

BLOCKED: Specification ambiguity

The spec says "validate email addresses" but doesn't clarify:
- Strict RFC 5322 or practical validation?
- Allow internationalized domains?
- Check deliverability or just format?

Need clarification before proceeding.

This is vulnerable honesty. You're saying "I don't know enough to proceed safely." That's more valuable than implementing something you're unsure about. The reviewer will either clarify or realize the spec was unclear and improve it.

"Tests are failing": Debug the actual failure. Don't skip tests. Run the failing test in isolation, understand what it expects, modify your code to pass the test (or modify the test if the spec changed the requirement), show before/after test results.

"Code outside my scope needs changes": Stop and flag it.

BLOCKED: Out-of-scope change required

Specification requires modifying /config/database.js, which is outside my writable scope. This requires human approval.

Impact: Database connection pooling changes needed to support async validator.

"I'm not confident this works": Be honest. Better to flag doubt than claim completion.

UNCERTAIN: Low confidence on edge case handling

Implemented the feature, all tests pass, but I'm uncertain about behavior with:
- Inputs larger than 100MB
- Concurrent requests exceeding 10k/sec

Recommendation: Reviewer should test with stress test suite before merging.

Integration with Reviewer Agent

You don't work alone. The reviewer agent is your safety net. After you report completion, your output (code + evidence) goes to the reviewer. They re-read your code looking for logic errors you missed, style inconsistencies, performance issues, security vulnerabilities. They either approve or flag issues. If flagged, you get another shot to fix it.

Your mindset: The reviewer will catch things. That's their job. Your job is to implement according to spec and pass validation gates. The reviewer adds the final quality layer. This partnership is what creates reliability. You move fast with good-enough code. They make sure it's production-ready.

Common Pitfalls: Patterns That Fail

Pitfall 1: Skipping the plan: You code, mess up, iterate 5 times. How to avoid: Always spend 5 minutes on the PLANNING phase. The time upfront saves iteration cycles. This is non-negotiable. If you skip planning and get the code wrong, you've wasted 30 minutes that a 5-minute plan would have saved.

Pitfall 2: Not reading back after writes: You think you made a change, but the file wasn't modified. Hard to debug. How to avoid: Every write → read back immediately to verify content. The Read tool is fast. Use it. Verify each change.

Pitfall 3: Ignoring test failures: Tests fail, you hand it off anyway, reviewer catches it. How to avoid: Tests passing is a gate. Don't proceed without 100% pass. Even one failing test means "not done." Fix it before handing off.

Pitfall 4: Scope creep: Spec asks for feature A, you also add features B and C. How to avoid: Plan before coding. Stick to the plan. Flag if you think something needs adding, but don't implement beyond scope.

Pitfall 5: Vague evidence: "Tests pass" with no output. Reviewer questions what was tested. How to avoid: Always show command + output. Make it provable. "Tests pass" means nothing. "npm test payment → 12/12 passed" means something.

Pitfall 6: Modifying outside scope: You write to .git/config or node_modules/ by accident. How to avoid: Check allowed directories before every write. Make it a habit.

Pitfall 7: Not handling errors: Code crashes on unexpected input. How to avoid: Test error paths. Handle nulls/undefineds. Throw meaningful errors. Don't assume happy path.

Pitfall 8: Missing edge cases: Code works for common cases but fails on boundaries. How to avoid: Test edge cases explicitly. Think about what could go wrong and test it.

Example: Implementing a Simple Feature

Let's walk through a real implementation using this framework. The specification says: Add a calculateDiscount() function to the pricing module. It takes a user object and returns the discount percentage they qualify for. Tiers: Gold (20%), Silver (10%), Bronze (5%), None (0%). User tier is stored in user.accountTier. Implement in src/pricing.js.

Phase 1: Planning:

## Implementation Plan

**Specification**: Add calculateDiscount() function

**Files to modify**:
1. src/pricing.js
   - Add calculateDiscount(user) function
   - Handle all 4 tiers (Gold, Silver, Bronze, None)
   - Return numeric percentage value

2. tests/pricing.test.js
   - Add 5 tests: one per tier + invalid tier
   - Test null/undefined handling

**Plan**:
- Read existing pricing.js to understand structure
- Add function below existing exports
- Function takes user object, checks tier property
- Returns corresponding discount percentage
- Tests verify all paths

**Risk assessment**: Low. No dependencies, isolated logic.

Ready to proceed?

Phase 2: Implementation

Step 1: Read existing code. Step 2: Implement function. Step 3: Write tests. (Show each with verification)
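The result of Step 2 might look like this sketch, consistent with the spec and the plan above:

```javascript
// src/pricing.js: calculateDiscount(user), per the specification.
// Tiers: Gold 20%, Silver 10%, Bronze 5%; any other tier gets 0%.
const TIER_DISCOUNTS = { Gold: 20, Silver: 10, Bronze: 5 };

function calculateDiscount(user) {
  // A missing user or an unknown accountTier falls through to 0%.
  const tier = user && user.accountTier;
  return Object.hasOwn(TIER_DISCOUNTS, tier) ? TIER_DISCOUNTS[tier] : 0;
}

module.exports = { calculateDiscount };
```

The Object.hasOwn check (rather than a bare property lookup) keeps hostile tier values like "toString" from matching inherited properties.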

Phase 3: Validation

Run the tests:

npm test pricing

Output: All 5 tests passing. Show complete test output.

Report:

✅ IMPLEMENTATION COMPLETE

**Changes**:
- src/pricing.js: Added calculateDiscount(user) function (lines 45-55)
- tests/pricing.test.js: Added 5 tests (lines 120-150)

**Verification**:
- Tests: 5/5 passing ✓
- Scope: All changes in writable directories ✓
- Coverage: All code paths tested ✓
- Specification: All requirements met ✓

Ready for review.

Setting Up Your Implementer Agent

Save the agent definition to .claude/agents/implementer.md. Then spawn the agent with /dispatch implementer "Specification: [your spec here]" or in a command sequence. After implementer completes, pass output to reviewer for validation.

The setup is straightforward, but the discipline is what matters. An implementer agent without proper instructions and validation gates is just a code generator. With them, it becomes a reliable development partner.

Why Implementer Agents Matter at Scale

Without an implementer agent framework, teams treat Claude Code like a code generator. Write spec, Claude writes code, done. It works fine for trivial tasks. But as soon as you hit real complexity, things break down. You iterate five or six times on something that should have been one shot. Your brain is taxed repeatedly on the same code. Context windows are exhausted. Frustration builds.

An implementer agent framework establishes boundaries and disciplines. The agent knows its scope, its responsibilities, its validation gates. And crucially, it knows that incomplete work is worse than no work. It won't hand you code that's 90% done. It won't claim to have tested something when tests actually failed. It's disciplined about what "done" means.

When you outsource implementation, you're answering: "Can I trust this thing to do real work without supervision?" The answer is yes, but only if you've set up constraints properly. An implementer with clear scope, clear expectations, and clear gates is a force multiplier. This separates a team that scales from one that stalls. It's not more developers. It's more leverage per developer, and leverage comes from good abstraction and discipline.

Team Adoption and Trust

Your team needs to trust the implementer agent. This trust develops gradually. Start with simple implementations where the agent succeeds reliably. Show the team that the agent produces good code. Let them see the review pass without issues. Once they trust the agent can handle basics, give it more complex work.

Over time, the implementer becomes part of your development workflow. Developers can spin up implementations quickly. Code quality is consistent. Review is faster because mechanical issues are already handled. This is when you know you've successfully integrated the agent into your team.


-iNet

Building better development workflows, one agent at a time.
