Building a Test Writer Agent for Claude Code

Test coverage is one of those things every development team knows they should care about, but actually achieving 90%+ coverage across a real codebase? That's where things get messy. Writing tests is repetitive, it requires understanding code logic deeply, and it's often deprioritized when features are due. What if you could automate the boring parts and let an intelligent agent handle test generation while you focus on the creative, business-logic thinking?
That's exactly what a test writer agent does in Claude Code. Instead of manually writing test suites, you point it at your source code, and it generates comprehensive tests with built-in validation—tests that actually run and pass before they're returned to you.
In this article, we'll build a complete test writer agent from the ground up. We'll walk through the agent's instructions, tool configuration, validation loops, and the strategies that make generated tests genuinely useful. By the end, you'll understand how to set up an agent that writes tests as well as your best developers would.
Table of Contents
- Why This Matters: The Test Coverage Trap
- Why Agents Are Better Than One-Off Test Generation
- Architecture: How the Agent Works
- Configuration: Setting Up the Agent
- Agent Instructions: The Core Logic
- Implementing the Tools: What the Agent Can Do
- The Validation Loop: Running Tests Before Return
- Why This Matters: Quality Over Quantity
- Real-World Scenario: Test Coverage Across a Team
- Production Considerations: Scaling Tests Across Teams
- Monitoring and Maintenance: Keeping Tests Healthy
- Common Pitfalls to Avoid
  - 1. Flaky Tests
  - 2. Testing Implementation Details
  - 3. Incomplete Mocking
  - 4. Coverage Without Quality
  - 5. No Edge Cases
- Framework-Specific Strategies
  - Jest (JavaScript/TypeScript)
  - Vitest (Fast Jest Alternative)
  - pytest (Python)
- Practical Example: Test Writing Session
- Team Adoption: Building a Testing Culture
  - The Good: More Tests Get Written
  - The Risk: Tests Become Maintenance Burden
  - The Risk: Over-Reliance on Coverage Metrics
  - The Risk: Tests Become Outdated Documentation
- Summary
Why This Matters: The Test Coverage Trap
Most teams operate in a test paradox. You know you should have 90%+ test coverage. You know tests save money in the long run by catching bugs early. You know code without tests becomes unmaintainable and fragile. Yet somehow, coverage stays at 40% or 50%, year after year. Why?
The answer is friction. Writing tests is tedious. It requires understanding the code deeply enough to predict failure modes. It requires discipline to maintain tests as code evolves. When you're under deadline pressure (always), tests are the first thing cut. The promised "we'll add tests later" never materializes. Debt accumulates.
A test writer agent removes the friction at the source. Instead of developers choosing between shipping features and writing tests, the agent generates tests automatically. Coverage improves not because you're stricter about enforcement, but because friction is gone. This shifts the dynamic fundamentally. You don't have to convince people to write tests. Tests just happen. The culture shifts from "testing is important but optional" to "testing is automatic."
The business case is compelling. A bug that escapes to production typically costs 5-10x more to fix than one caught during development, so a test that catches a single production bug pays for years of tooling. But there's a deeper reason: test-driven development cultures make developers more productive. When tests are automatic, developers move faster and with more confidence. They refactor fearlessly. They see immediate feedback. They learn faster.
Why Agents Are Better Than One-Off Test Generation
Before we dive into the implementation, let's be clear about what we're solving. A simple prompt to "write tests for this code" gets you... something. But it's not reliable. The AI might write tests that don't actually run. It might miss edge cases. It might generate tests that are flaky or don't match your framework's conventions.
This is the difference between test generation and test writing. Test generation is a one-shot operation: you prompt an AI, it produces code, you hope it works. Test writing is a deliberate, iterative process with validation, cleanup, and refinement built in. It's the difference between getting code and getting working code.
The problem with naive test generation gets worse at scale. If you generate a hundred tests and five percent are broken, you've introduced five broken tests into your codebase. Those broken tests create noise. They fail sporadically. Developers stop trusting the tests and ignore failures. The whole system degrades. This is how test coverage becomes a liability instead of an asset.
A test writer agent is different. It's a specialized system with:
- Constrained access: Read-only access to source code, write-only access to test directories
- Built-in validation: Tests are run before being returned to you
- Framework awareness: Configuration for Jest, Vitest, pytest, etc.
- Coverage targets: Explicit goals (90%+, specific file coverage)
- AAA discipline: Arrange-Act-Assert pattern enforced across all generated tests
- Feedback loops: If tests fail, the agent investigates and fixes them
Think of it as a colleague who's obsessed with test quality and won't let bad tests out the door. This isn't just a code generation tool—it's a quality assurance system that treats test passing as a prerequisite to returning results.
When you deploy a test writer agent, you're inverting the responsibility model. Instead of you being responsible for validating generated tests before they enter the codebase, the agent is responsible for only returning valid tests. This shift in responsibility changes everything. You get autonomy without the risk. The agent operates within constraints that prevent bad output from leaking into your codebase.
The agent also adapts to your specific codebase. It learns patterns from existing tests. It respects your naming conventions, your assertion styles, your test organization. If your codebase uses describe blocks grouped by component, the agent generates tests that match that pattern. If you have a test utilities library for common setup, the agent uses it. The agent becomes a team member who understands your specific way of working.
Architecture: How the Agent Works
A test writer agent operates in a simple loop:
1. Scan the source code to understand what needs testing
2. Analyze the code structure (functions, classes, public APIs)
3. Generate tests following the AAA pattern and your framework config
4. Write tests to the test directory
5. Run the tests to validate they pass
6. Report coverage and any failures
7. Fix or regenerate if tests fail
8. Return only tests that pass, with coverage metrics
The key insight: tests are validated before delivery. No more "oh, these tests need debugging"—they're production-ready when you get them. This single principle changes everything about how you think about test automation. Instead of treating the agent as a code generator that you'll need to fix, you treat it as a quality gatekeeper that ensures its own output meets standards.
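The generate-validate-fix loop can be sketched as a small driver. This is an illustrative sketch, not Claude Code's actual internals: `generate` and `runTests` are hypothetical hooks that would wrap the model call and a shell-out to the test runner, and the failure log from each run is fed back into the next generation attempt.

```typescript
// Result of one test-runner invocation
type TestRun = { passed: boolean; failureLog: string };

// Regenerate until the tests pass, or give up after maxAttempts.
// Failing tests are never returned to the caller.
function validateWithRetries(
  generate: (feedback: string) => string, // returns test file contents
  runTests: (testSource: string) => TestRun, // executes and reports
  maxAttempts = 3
): { source: string; attempts: number } | null {
  let feedback = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const source = generate(feedback); // regenerate using the failure log
    const result = runTests(source);
    if (result.passed) return { source, attempts: attempt };
    feedback = result.failureLog; // feed the failure into the next try
  }
  return null; // all attempts exhausted: return nothing rather than broken tests
}
```

Returning `null` instead of the last failing attempt is the design choice that distinguishes a test writer from a test generator.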
Configuration: Setting Up the Agent
Here's what a test writer agent configuration looks like in Claude Code's YAML format. This goes in .claude/agents/test-writer-agent.yaml. The configuration is your contract with the agent—it defines permissions, constraints, and what success looks like:

```yaml
name: test-writer-agent
alias: test-writer
purpose: Generate and validate comprehensive test suites from source code
model: claude-opus-4-1 # Powerful model for complex test logic

constraints:
  max_context_length: 180000
  temperature: 0.3 # Low temperature for consistency
  timeout_seconds: 300

permissions:
  read:
    - patterns: ["**/*.js", "**/*.ts", "**/*.py", "**/*.jsx", "**/*.tsx"]
      purpose: "Source code analysis"
  write:
    - patterns: ["**/test/**", "**/__tests__/**", "**/*.test.ts", "**/*.spec.py"]
      purpose: "Test file generation"
  execute:
    - commands: ["npm test", "jest", "vitest", "pytest"]
      purpose: "Validate generated tests"
      timeout: 60

configuration:
  framework: jest # Options: jest, vitest, pytest, mocha, tap
  coverage_target: 0.90 # 90% minimum coverage
  coverage_type: line # Options: line, branch, function, statement
  test_pattern: "*.test.ts" # Naming convention
  aaa_pattern: true # Enforce Arrange-Act-Assert
  include_edge_cases: true
  include_integration_tests: true
  max_tests_per_function: 5

style_guide:
  naming: snake_case # test_function_returns_expected_value
  describe_blocks: true # Group related tests
  single_assertion: false # Allow multiple related assertions
  descriptive_messages: true # Clear error messages in assertions

memory:
  - type: coverage_history
    path: memory/coverage_history.jsonl
    purpose: Track coverage trends
  - type: test_patterns
    path: memory/test_patterns.jsonl
    purpose: Learn common testing patterns in this codebase
  - type: framework_config
    path: memory/framework_config.json
    purpose: Store framework detection results

validation:
  pre_write:
    - Verify source file exists and is readable
    - Check test directory structure
    - Load framework configuration
  post_write:
    - Run all generated tests
    - Verify coverage meets target
    - Check for passing test rate (100%)
    - Report any flaky tests
  failure_strategy:
    - Log the failure reason
    - Analyze what went wrong
    - Regenerate or fix the test
    - Re-run validation
    - Return only passing tests

quality_gates:
  test_pass_rate: 1.0 # 100% of tests must pass
  minimum_coverage: 0.90 # 90% coverage minimum
  no_flaky_tests: true # All tests must pass consistently
  aaa_compliance: true # All tests follow Arrange-Act-Assert
```

This configuration tells Claude Code exactly what the test writer agent can and cannot do, and what success looks like. Notice how permissions are restrictive—the agent can read source code but only write to test directories. It can execute test commands but only specific ones. These constraints prevent accidental damage while enabling the agent to do its job.
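The quality_gates section can be enforced with a simple check before results are returned. A minimal sketch, assuming a report shape that the validation run would produce (the field names here mirror the config but are otherwise hypothetical):

```typescript
// Hypothetical report produced after the validation run
type GateReport = {
  passRate: number; // fraction of generated tests that pass
  coverage: number; // line coverage as a fraction
  flakyDetected: boolean; // any test failed on a repeat run
};

// Returns true only if every configured gate is satisfied
function meetsQualityGates(
  report: GateReport,
  gates = { testPassRate: 1.0, minimumCoverage: 0.9, noFlakyTests: true }
): boolean {
  if (report.passRate < gates.testPassRate) return false;
  if (report.coverage < gates.minimumCoverage) return false;
  if (gates.noFlakyTests && report.flakyDetected) return false;
  return true;
}
```

A gate failure would route the run back into the failure_strategy steps rather than returning output.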
Agent Instructions: The Core Logic
The actual behavior of the agent is defined in its instructions. Here's what we embed into the agent to make it generate high-quality tests. These instructions are the agent's brain—they encode all the knowledge about good test generation:
```yaml
system_instructions: |
  You are a specialized test writing agent for Claude Code. Your job is to generate
  comprehensive, production-ready test suites from source code.

  ## Your Core Principles

  1. **Arrange-Act-Assert (AAA) Pattern**: Every test follows this structure:
     - ARRANGE: Set up test data, mocks, and preconditions
     - ACT: Call the function or method being tested
     - ASSERT: Verify the result matches expectations

  2. **Coverage-Driven**: Always aim for 90%+ line coverage. Generate tests for:
     - Happy path (normal usage)
     - Error cases (invalid inputs, exceptions)
     - Edge cases (boundary values, null, undefined, empty)
     - Integration scenarios (how this code interacts with other modules)

  3. **Validation-First**: ALWAYS run tests before returning them. If a test fails:
     - Investigate the failure
     - Fix the test or the approach
     - Re-run until passing
     - NEVER return failing tests

  4. **Framework Aware**: Respect the project's testing framework and conventions.
     Generate tests that match existing test patterns in the codebase.

  ## Your Workflow

  ### Phase 1: Analysis (Read Source Code)
  Use Glob and Grep to understand:
  - What functions/classes exist?
  - What are their signatures?
  - What do they do?
  - What are likely failure modes?
  - Are there existing tests to learn from?

  ### Phase 2: Planning
  Before writing a single test:
  - List the function/method you're testing
  - Identify test cases needed (happy path, errors, edges)
  - Estimate coverage impact
  - Check if tests already exist

  ### Phase 3: Generation
  Write tests following AAA pattern...
```

This instruction block is comprehensive. It encodes all the knowledge about good test generation so the agent doesn't have to guess. The agent uses this every time it generates tests, creating consistency across your entire test suite. The instructions are specific about patterns (AAA), goals (coverage targets), and priorities (validation before return).
Implementing the Tools: What the Agent Can Do
The test writer agent needs specific tools to work effectively. Here's the tool configuration. Each tool is restricted to specific purposes. The agent can't delete source files, can't run arbitrary commands, and can't modify anything but tests. This creates a safe sandbox where the agent can operate autonomously without risk:
```yaml
tools:
  read:
    description: Read source code files for analysis
    tool_type: file_read
    allowed_patterns:
      - "**/*.js"
      - "**/*.ts"
      - "**/*.py"
      - "**/*.jsx"
      - "**/*.tsx"
      - "**/*.java"
    instruction: |
      Use this to examine source code. Read the file completely to understand
      function signatures, logic flow, and dependencies.
  glob:
    description: Find files matching patterns
    tool_type: glob
    instruction: |
      Use to discover all source files in a directory. Example:
      - Glob for "src/**/*.ts" to find all TypeScript files
      - Glob for "**/__tests__/**" to find existing tests
      - Use to understand project structure
  grep:
    description: Search for patterns in code
    tool_type: grep
    instruction: |
      Use to find imports, dependencies, and patterns. Examples:
      - Grep for "export function" to find public APIs
      - Grep for "throw new" to find error cases
      - Grep for "describe(" to find existing test patterns
      - Grep for "@deprecated" to skip obsolete functions
  write:
    description: Write test files to test directory
    tool_type: file_write
    allowed_patterns:
      - "**/test/**"
      - "**/__tests__/**"
      - "**/*.test.ts"
      - "**/*.test.js"
      - "**/*.spec.py"
    instruction: |
      Write test files following the project's framework and conventions.
      Always write to the test directory structure. Use the naming convention
      from the configuration (e.g., *.test.ts for Jest).
  bash_execute:
    description: Run test commands and validation
    tool_type: bash
    allowed_commands:
      - "npm test"
      - "jest"
      - "vitest"
      - "pytest"
      - "npm run test:coverage"
      - "jest --coverage"
    instruction: |
      Execute test commands to validate generated tests. Always run:
      1. npm test (or appropriate test command)
      2. Check for passing rate
      3. Run with coverage flag to get metrics
      4. Never return until tests pass
```

Notice how specific these tool definitions are. For bash, only test commands are allowed. For write, only test directories. For read, only source files and tests. This granularity of control means the agent stays within boundaries while having everything it needs to generate good tests.
The Validation Loop: Running Tests Before Return
Here's the critical part that separates a test writer agent from a test generator: the validation loop runs tests before returning them. The agent doesn't just generate code—it verifies that code actually works. This is what makes the output reliable.
The agent's workflow looks like this, and each phase is essential:
```yaml
validation_workflow:
  phase_1_discovery:
    - Glob source directory
    - Identify files needing tests
    - Grep for existing tests (don't duplicate)
    - Read source files to understand logic
  phase_2_planning:
    - List functions/classes to test
    - Identify test cases per function
    - Check for special frameworks or mocks needed
    - Review any existing test patterns in codebase
  phase_3_generation:
    - Write test file following AAA pattern
    - Create describe blocks grouped by function
    - Generate 3-5 meaningful tests per function
    - Include happy path, error, edge cases
  phase_4_execution:
    step_1: "Run: npm test"
    step_2: "Capture output and check for PASS/FAIL"
    step_3: |
      If ANY test fails:
      - Log the failure message
      - Read the generated test that failed
      - Read the source code again
      - Understand why it failed
      - Fix the test or adjust expectations
      - Re-run tests
      If ALL tests pass:
      - Run with coverage flag
      - Check coverage percentage
      - Log results
  phase_5_coverage_check:
    minimum: 0.90
    action_if_below: |
      - Identify uncovered lines (from coverage report)
      - Generate additional tests for gaps
      - Rerun validation
    action_if_met: "Proceed to reporting"
  phase_6_reporting:
    output: |
      ## Test Generation Summary

      **Files Processed**: X files
      **Tests Generated**: Y total tests
      **Coverage Achieved**: Z%
      **Status**: ✓ All tests passing

      ### Coverage by File:
      [detailed breakdown]

      ### Tests Generated:
      - function_name: N tests
      - class_name: N tests

      ### Any Issues:
      [None, or list of limitations]
```

The key is phase_4_execution: tests are run immediately after generation. If they fail, the agent fixes them right there. No returning broken tests. This dramatically improves the quality and reliability of generated tests.
This execution-and-fix loop is where the magic happens. Most test generation tools run once and return what they produce. A test writer agent is different—it's responsible for test success. When a test fails, the agent investigates. It reads the error message. It checks the source code again. It understands what went wrong. It revises the test or adjusts the expectation. It runs the test again. This cycle repeats until success.
This is similar to how experienced developers actually write tests. You write a test, run it, it fails, you fix it, run it again. The difference is the agent does this thousands of times with perfect consistency. It never gets tired. It never stops investigating. It never compromises by returning "mostly working" code.
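To drive this loop, the agent has to decide pass or fail from the runner's raw output. One way to do that, assuming Jest-style summary lines (a sketch; real output formats vary by framework and version, so a production agent would parse machine-readable reporter output instead):

```typescript
// Extracts "Tests: X passed, Y total" from Jest-style runner output.
// Returns null if no summary line is found.
function parseTestSummary(output: string): { passed: number; total: number } | null {
  const match = output.match(/Tests:\s+(\d+) passed,\s*(\d+) total/);
  if (!match) return null;
  return { passed: Number(match[1]), total: Number(match[2]) };
}

// True only when at least one test ran and every test passed.
function allTestsPassed(output: string): boolean {
  const summary = parseTestSummary(output);
  return summary !== null && summary.passed === summary.total && summary.total > 0;
}
```

A `false` result here is what triggers the investigate-and-fix branch of the loop.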
Why This Matters: Quality Over Quantity
A test writer agent could generate thousands of tests in minutes. But the value isn't in the quantity—it's in the quality. A test that doesn't run is worthless. A test that doesn't validate real behavior is misleading. A test that breaks every time you refactor is expensive maintenance.
By enforcing validation before return, we ensure every test is production-ready. This transforms testing from a friction point ("we have tests but they're constantly broken") to a value proposition ("tests always work and give us confidence").
Real-World Scenario: Test Coverage Across a Team
The test writer agent shines in team environments where consistency and scale matter. Consider a 15-person engineering team with three codebases in JavaScript, Python, and Go. Manual test writing has stalled coverage at 45% across all three. They deploy the test writer agent.
Within one week:
- JavaScript codebase goes from 45% to 78% coverage
- Python codebase goes from 41% to 82% coverage
- Go codebase goes from 51% to 84% coverage
More importantly, developers start writing tests automatically. Because it's no longer friction, tests become habitual. After three months, coverage stabilizes at 85-88% (some code is legitimately hard to test). The team ships fewer production bugs. Code reviews move faster (less time arguing about missing tests). Refactoring happens with confidence.
The agent didn't solve testing overnight. But it removed the friction that prevented testing from happening in the first place.
Production Considerations: Scaling Tests Across Teams
When you deploy a test writer agent across a team or organization, you face scaling challenges. A single developer running the agent on their feature branch is one scenario. Fifteen developers running it across dozens of branches daily is quite different.
At scale, test generation becomes a resource allocation problem. Each test generation runs the full validation loop—reading source, generating tests, running tests, gathering coverage metrics. This is CPU-intensive and time-consuming. When multiple developers run the agent simultaneously, you're competing for resources. A single test generation might take 5 minutes. If five developers run the agent in parallel, you're using significant compute resources and waiting time becomes frustrating.
The solution is to integrate the agent into CI/CD pipelines rather than running it locally. When a developer pushes code, the pipeline automatically runs the test writer agent on changes. This happens on CI infrastructure, not the developer's machine. Developers get generated tests in their pull request automatically, without waiting. This distributes load across infrastructure designed for it. It also creates consistency—all generated tests are created in the same environment with the same model, reducing variance.
Storage and Performance: Generated tests accumulate. A 20,000-line codebase might generate 50,000+ lines of tests. Your repository grows. Developers' local clones get large. CI/CD pipelines take longer. You need to think about cleanup strategies. Some teams exclude test files from certain analyses. Some implement cleanup policies (delete tests older than 90 days for deleted code). Some use test compression strategies (deduplicate similar tests).
Framework Consistency: As your codebase ages, you might have tests in three different frameworks (Jest, Vitest, Mocha). The test writer agent needs to detect which framework is used where and respect it. This requires configuration per-directory or per-project. You'll need to document this complexity so developers understand why their tests are generated in Framework X instead of Framework Y.
Feedback Loops: Generated tests are generated. If the underlying code changes significantly, tests might not change to match. You need processes for test maintenance. Should developers update generated tests? Delete and regenerate? Some teams require developers to manually review and approve regenerations. Others trust the agent completely. Neither is wrong, but the choice has consequences.
CI/CD Integration: Test execution becomes a bottleneck. If generating and running tests takes 5 minutes per PR, and each PR spends an hour in CI, tests are adding significant cycle time. You might need to run generation on developer machines (faster feedback, uses local resources) and only validate in CI. Or run test generation in parallel with other CI jobs. The infrastructure matters.
Monitoring and Maintenance: Keeping Tests Healthy
A test writer agent is software. It can develop bugs. It can generate test suites with subtle flaws. You need monitoring to catch problems early. The agent's output quality is a reflection of its instructions, configuration, and the model it uses. If quality degrades, you need diagnostics to understand why.
Think of the test writer agent like a new team member. When they start, you oversee their work closely. You review their output. You provide feedback. As they prove competence, you trust them more. But you never stop monitoring. If their quality drops, you investigate. The same is true with agents.
Coverage Regression: Track coverage over time. If coverage drops week-over-week, something changed. Either code is becoming harder to test, or tests are becoming less thorough. Investigate why. A coverage drop is an early warning signal. It might indicate that the agent is generating tests for simpler code paths and missing complex ones. Or it might indicate that code complexity is increasing faster than test generation can keep up.
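Reading from the coverage_history.jsonl memory file, a week-over-week regression check might look like the following sketch (the record shape and the 2% tolerance are assumptions you would tune):

```typescript
// One reading from the coverage history log (assumed shape)
type CoveragePoint = { week: string; coverage: number };

// Flags any week whose coverage dropped more than `tolerance`
// below the previous week's reading.
function coverageRegressions(
  history: CoveragePoint[],
  tolerance = 0.02
): CoveragePoint[] {
  const regressions: CoveragePoint[] = [];
  for (let i = 1; i < history.length; i++) {
    if (history[i - 1].coverage - history[i].coverage > tolerance) {
      regressions.push(history[i]);
    }
  }
  return regressions;
}
```

Flagged weeks are the starting point for investigation, not a verdict: the drop may be new hard-to-test code rather than an agent problem.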
Test Failure Rates: Most generated tests pass. But some might be flaky. If a test that was written last week starts failing inconsistently, that's a problem with the test (probably too tight coupling to implementation) or with the code it tests (probably a real bug emerging). Track failure rate trends. Sudden increases are suspicious.
False Positive Analysis: Track when developers disagree with generated tests. "The agent wanted to test X, but X isn't actually important." Over time, patterns emerge about where the agent misses domain knowledge. Use these patterns to improve prompts or guidelines.
Assertion Quality: Periodically review generated assertions. Are they meaningful? Or are they checking things like expect(result).toBeDefined()? Shallow assertions don't catch bugs. If your agent generates shallow assertions, you need to tune it toward deeper validation.
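Part of this review can be automated with a rough heuristic: measure what fraction of assertions in a generated file are existence-only checks. A sketch (the list of "shallow" matchers is an assumption you would tune per team):

```typescript
// Fraction of expect() calls that only check existence/truthiness.
// High ratios suggest the generated tests execute code without verifying it.
function shallowAssertionRatio(testSource: string): number {
  const total = (testSource.match(/expect\(/g) ?? []).length;
  const shallow = (testSource.match(/\.toBeDefined\(\)|\.toBeTruthy\(\)/g) ?? []).length;
  return total === 0 ? 0 : shallow / total;
}
```

A check like this could run in CI and flag files above a threshold for human review rather than blocking them outright.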
Common Pitfalls to Avoid
When you're setting up a test writer agent, watch out for these mistakes. Understanding these pitfalls helps you configure the agent correctly and review its output effectively.
1. Flaky Tests
If tests sometimes pass and sometimes fail, they're flaky. This usually means:
- Tests depend on timing (use jest.useFakeTimers())
- Tests depend on external APIs (mock them)
- Tests share state (use beforeEach to reset)
Solution: Agent should detect flakiness and regenerate. Build this into your validation loop: run tests 3 times, if they don't consistently pass, flag them.
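The run-three-times rule is simple to express in code. A sketch, with test execution injected as a callback so it can stand in for a real runner invocation:

```typescript
// A test is flaky if repeated runs do not all agree.
function isFlaky(runOnce: () => boolean, runs = 3): boolean {
  const outcomes = new Set<boolean>();
  for (let i = 0; i < runs; i++) {
    outcomes.add(runOnce()); // true = pass, false = fail
  }
  return outcomes.size > 1; // mixed outcomes means flaky
}
```

Note that a test that fails consistently is not flaky, just broken; the two cases deserve different fixes, which is why this check looks for disagreement rather than failure.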
2. Testing Implementation Details
Tests that check private methods or internal state are brittle. Refactoring the implementation breaks tests even though behavior hasn't changed.
```typescript
// BAD: Tests implementation detail
expect(obj._internalState).toBe("something");

// GOOD: Tests behavior
expect(obj.getValue()).toBe("expected");
```

Solution: Agent should focus on public APIs, not private internals. This is a configuration setting—tell the agent to generate tests for exported functions only.
3. Incomplete Mocking
Forgetting to mock external dependencies causes tests to hit real APIs. They become integration tests that are slow and flaky.
```typescript
// BAD: Calls real API
const user = await getUserFromAPI(123);

// GOOD: Mocks API
jest.mock("../api");
const mockUser = { id: 123, name: "John" };
(getUserFromAPI as jest.Mock).mockResolvedValue(mockUser);
const user = await getUserFromAPI(123); // Returns mockUser
```

Solution: Agent should identify external calls and mock them automatically. This requires understanding your project's structure—which modules are external, which are internal.
4. Coverage Without Quality
High coverage numbers don't mean good tests. 95% coverage with shallow assertions means you're not actually validating much:
```typescript
// BAD: Test passes but doesn't verify anything meaningful
it("should do something", () => {
  const result = myFunction();
  expect(result).toBeDefined(); // Too loose
});

// GOOD: Test verifies actual behavior
it("should return sum of two numbers", () => {
  expect(add(2, 3)).toBe(5);
});
```

Solution: Agent should generate tests with meaningful assertions. This is about depth, not breadth. Better to have 70% coverage with strong assertions than 95% coverage with weak ones.
5. No Edge Cases
Missing tests for boundary conditions leaves your code vulnerable to subtle bugs.
```typescript
// Incomplete: Missing edge cases
it("should divide two numbers", () => {
  expect(divide(10, 2)).toBe(5);
});

// Complete: Includes edge cases
it("should divide two numbers", () => {
  expect(divide(10, 2)).toBe(5);
});
it("should throw error when dividing by zero", () => {
  expect(() => divide(10, 0)).toThrow("Cannot divide by zero");
});
it("should handle negative numbers", () => {
  expect(divide(-10, 2)).toBe(-5);
});
```

Solution: Agent should systematically generate tests for happy path, errors, and edges. This is built into the agent's planning phase—before writing tests, identify all the cases that need testing.
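For reference, the tests above assume a divide implementation along these lines. The article doesn't show it, so this is a hypothetical sketch chosen to match the error message the tests expect:

```typescript
// Hypothetical implementation matching the edge-case tests above.
function divide(a: number, b: number): number {
  if (b === 0) {
    throw new Error("Cannot divide by zero"); // message the tests assert on
  }
  return a / b;
}
```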
Framework-Specific Strategies
Different testing frameworks have different idioms. Here's how the agent adapts:
Jest (JavaScript/TypeScript)
```yaml
jest_config:
  test_pattern: "*.test.ts"
  test_runner: "npm test"
  coverage_command: "npm test -- --coverage"
  key_patterns:
    - describe()/it() blocks
    - expect() assertions
    - Mock with jest.mock()
    - Async with async/await
    - beforeEach() for setup
  example_async_test: |
    it('should fetch user data', async () => {
      // ARRANGE
      const userId = 123;
      jest.mock('../api');
      const mockFetch = jest.fn().mockResolvedValue({ id: 123, name: 'John' });

      // ACT
      const user = await fetchUser(userId);

      // ASSERT
      expect(user.name).toBe('John');
      expect(mockFetch).toHaveBeenCalledWith(userId);
    });
```

Vitest (Fast Jest Alternative)
```yaml
vitest_config:
  test_pattern: "*.test.ts"
  test_runner: "vitest"
  coverage_command: "vitest run --coverage"
  key_patterns:
    - Same as Jest (drop-in replacement)
    - "import { describe, it, expect } from 'vitest'"
    - vi.mock() instead of jest.mock()
    - Better performance for large suites
```

pytest (Python)
```yaml
pytest_config:
  test_pattern: "test_*.py"
  test_runner: "pytest"
  coverage_command: "pytest --cov"
  key_patterns:
    - Functions starting with test_
    - assert statements (not expect)
    - "@pytest.mark.parametrize for multiple inputs"
    - "@patch decorators for mocking"
    - fixtures for setup/teardown
  example_test: |
    def test_calculate_age_with_valid_date():
        # ARRANGE
        birth_date = date(1990, 5, 15)

        # ACT
        age = calculate_age(birth_date)

        # ASSERT
        assert age >= 30
```

The agent detects the framework from configuration and uses the appropriate syntax. This means the agent can work across polyglot teams—generate Jest tests for the JavaScript codebase, pytest tests for Python, all with consistent quality and style.
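For JavaScript projects, framework detection can be as simple as inspecting package.json dependencies. A sketch (the precedence order, vitest before jest, is an assumption; Python projects would be detected separately, for example from pytest configuration files):

```typescript
// Minimal shape of a parsed package.json (assumed)
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Picks a test framework from declared dependencies.
function detectFramework(pkg: PackageJson): "vitest" | "jest" | "mocha" | "unknown" {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  if ("vitest" in deps) return "vitest"; // prefer vitest if both are present
  if ("jest" in deps) return "jest";
  if ("mocha" in deps) return "mocha";
  return "unknown";
}
```

The result would be cached in the framework_config memory file so detection runs once per project rather than once per generation.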
Practical Example: Test Writing Session
Let's walk through a real example. Say we have this source file:
```typescript
// src/utils/calculateAge.ts
export function calculateAge(birthDate: Date): number {
  if (!(birthDate instanceof Date)) {
    throw new Error('birthDate must be a Date object');
  }
  if (birthDate > new Date()) {
    throw new Error('birthDate cannot be in the future');
  }
  const today = new Date();
  let age = today.getFullYear() - birthDate.getFullYear();
  const monthDiff = today.getMonth() - birthDate.getMonth();
  if (monthDiff < 0 || (monthDiff === 0 && today.getDate() < birthDate.getDate())) {
    age--;
  }
  return age;
}
```

The test writer agent would:
Phase 1-2: Analyze and Plan
Looking at this function, the agent identifies what needs testing:
- Happy path: Normal birthdate (e.g., 30 years ago)
- Error: Invalid input (not a Date)
- Error: Future date
- Edge case: Birthday today
- Edge case: Birthday is tomorrow (hasn't happened yet this year)
- Edge case: Very old age
Each of these cases tests a different code path and ensures the function behaves correctly.
Phase 3: Generate Tests
```typescript
// src/utils/__tests__/calculateAge.test.ts
import { calculateAge } from '../calculateAge';

describe('calculateAge', () => {
  it('should return correct age for a date 30 years ago', () => {
    // ARRANGE
    const thirtyYearsAgo = new Date();
    thirtyYearsAgo.setFullYear(thirtyYearsAgo.getFullYear() - 30);

    // ACT
    const age = calculateAge(thirtyYearsAgo);

    // ASSERT
    expect(age).toBe(30);
  });

  it('should throw error when birthDate is not a Date object', () => {
    // ARRANGE
    const invalidInput = '2000-01-01';

    // ACT & ASSERT
    expect(() => calculateAge(invalidInput as any)).toThrow('birthDate must be a Date object');
  });

  it('should throw error when birthDate is in the future', () => {
    // ARRANGE
    const futureDate = new Date();
    futureDate.setFullYear(futureDate.getFullYear() + 1);

    // ACT & ASSERT
    expect(() => calculateAge(futureDate)).toThrow('birthDate cannot be in the future');
  });

  it('should return 0 for a person born today', () => {
    // ARRANGE
    const today = new Date();

    // ACT
    const age = calculateAge(today);

    // ASSERT
    expect(age).toBe(0);
  });

  it("should handle birthday that hasn't occurred yet this year", () => {
    // ARRANGE
    const birthDate = new Date();
    birthDate.setFullYear(birthDate.getFullYear() - 25);
    birthDate.setMonth(birthDate.getMonth() + 1); // Birthday next month

    // ACT
    const age = calculateAge(birthDate);

    // ASSERT
    expect(age).toBe(24); // Haven't had birthday yet this year
  });
});
```

Notice the clear structure: ARRANGE, ACT, ASSERT. Each test is readable and tests exactly one scenario. The test names describe what's being tested, making it easy to understand what failed if a test breaks.
Phase 4: Validate
```
$ npm test -- calculateAge.test.ts

PASS src/utils/__tests__/calculateAge.test.ts
  calculateAge
    ✓ should return correct age for a date 30 years ago (4ms)
    ✓ should throw error when birthDate is not a Date object (2ms)
    ✓ should throw error when birthDate is in the future (1ms)
    ✓ should return 0 for a person born today (2ms)
    ✓ should handle birthday that hasn't occurred yet this year (2ms)

Tests: 5 passed, 5 total
Coverage: 100% Statements, 100% Branches, 100% Functions
```
All tests pass on the first try (usually). If any had failed, the agent would investigate and fix it before returning. This is the promise of the validation loop—you get tests that actually work.
Team Adoption: Building a Testing Culture
When your team gets access to a test writer agent, behavior changes. Some changes are positive. Some require guidance.
The Good: More Tests Get Written
Without the agent, tests are the first thing cut when a deadline looms. With the agent, tests are automatic. Developers naturally write more tests because there's no friction, and more test coverage means fewer bugs in production. You're removing one of the biggest barriers to test adoption: the manual effort of writing tests.
The Risk: Tests Become Maintenance Burden
Generated tests are code. Code requires maintenance. If your generated tests are tightly coupled to implementation details, refactoring becomes painful. Every refactor breaks tests, even though behavior hasn't changed.
Solution: Establish guidelines for what test writers (human and agent) should avoid. Share these with the agent:
guidelines:
  do_test:
    - "Public APIs and functions"
    - "Happy path and error cases"
    - "Edge cases (null, empty, boundaries)"
    - "Integration scenarios"
  avoid_testing:
    - "Private methods and implementation details"
    - "Getters and setters that don't have logic"
    - "Code that's too simple to break (trivial functions)"
    - "External dependencies (mock them instead)"
  style_preferences:
    - "Test behavior, not implementation"
    - "Write descriptive test names that explain the scenario"
    - "Use fixtures for complex test data"
    - "Keep tests independent (no shared state)"
With these guidelines in context, the agent generates tests that align with your team's values. Over time, your test suite becomes maintainable instead of burdensome.
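The "test behavior, not implementation" guideline is the one that most directly protects you from refactor pain. A quick illustration, using a hypothetical Cart class invented for this example:

```typescript
// Minimal hypothetical class used only to illustrate the guideline.
class Cart {
  private items: { price: number }[] = [];
  add(price: number): void { this.items.push({ price }); }
  itemCount(): number { return this.items.length; }
  total(): number { return this.items.reduce((sum, i) => sum + i.price, 0); }
}

// Brittle (avoid): reaching into private state couples the test to the
// current data structure and breaks the moment items becomes a Map.
//   expect((cart as any).items.length).toBe(1);

// Robust (prefer): assert observable behavior through the public API.
const cart = new Cart();
cart.add(9.99);
console.assert(cart.itemCount() === 1);
console.assert(cart.total() === 9.99);
```

The behavior-focused version survives any internal refactor that preserves the public contract — which is the whole point of having the tests.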
The Risk: Over-Reliance on Coverage Metrics
Coverage percentage is visible and measurable. It's tempting to treat it as the goal. "We have 95% coverage, so we're good!" But 95% coverage with shallow assertions means you're not actually testing much. The tests execute the code but don't verify it works correctly.
Solution: Complement coverage metrics with code review. When the agent generates tests, have one person review them before merging. They're looking for:
- Are the assertions meaningful? (not just expect(result).toBeDefined())
- Do the tests actually exercise the code, or just execute it?
- Are error scenarios properly tested?
- Would this test catch a real bug?
This review takes 5 minutes and ensures quality. You're using human judgment where it matters most—understanding whether tests are actually validating behavior.
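The difference between shallow and meaningful assertions is easiest to see side by side. Here's a contrived parsePrice function (hypothetical, invented for this example) with both styles of check against it:

```typescript
// Hypothetical function under review.
function parsePrice(input: string): { amount: number; currency: string } {
  const match = /^(\$|€)(\d+(?:\.\d{2})?)$/.exec(input.trim());
  if (!match) throw new Error(`Unparseable price: ${input}`);
  return { amount: Number(match[2]), currency: match[1] === '$' ? 'USD' : 'EUR' };
}

const result = parsePrice('$19.99');

// Shallow: executes the code and bumps coverage, but would still pass
// if the amount or currency were parsed completely wrong.
console.assert(result !== undefined);

// Meaningful: would catch a real bug in amount or currency handling.
console.assert(result.amount === 19.99);
console.assert(result.currency === 'USD');
```

Both versions produce identical coverage numbers; only the second one would ever fail a code review question like "would this test catch a real bug?"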
The Risk: Tests Become Outdated Documentation
Generated tests are great until your API changes. Then tests break. If you fix the tests without understanding why they broke, you might mask real issues.
Solution: Treat test failures as learning opportunities. When tests break:
- Understand why they broke
- Update the test if the API actually changed
- Update the code if the test is right and the code is wrong
- Document the decision
This discipline keeps tests accurate to your actual API. Tests become living documentation of expected behavior, not frozen snapshots of old code.
Summary
A test writer agent transforms how your team thinks about test coverage. Instead of "we should write more tests" being a perennial backlog item, it becomes an automated, reliable process. Tests are generated consistently, validated rigorously, and reported transparently.
The key principles:
- Constrain Access: Read-only to source, write-only to tests
- Enforce AAA: Every test follows Arrange-Act-Assert
- Validate Before Return: Tests run and pass before you see them
- Target Coverage: 90%+ coverage with meaningful assertions
- Handle Frameworks: Configure for your specific testing setup
- Report Transparently: Clear metrics and actionable feedback
- Monitor Quality: Track coverage, failure rates, assertion depth
- Scale Thoughtfully: Consider performance, maintenance, feedback loops
- Iterate and Improve: Treat the agent as a tool that improves over time
With a well-configured test writer agent, you get the best of both worlds: the speed and consistency of automation, combined with the rigor and intelligence of a skilled test engineer.
But here's the real magic: the agent doesn't replace human judgment. It removes the friction. You write code, the agent generates tests, you review them and add domain-specific tests. This workflow—agent plus human—produces better results than either alone.
The tests that matter most are often the ones humans write after understanding domain requirements. The agent excels at the mechanical tests: happy path, error cases, boundary conditions. Humans excel at the scenario tests: "what if the customer's credit card expires mid-transaction?" But you don't have to choose. The agent handles mechanical testing at scale. Humans handle scenario testing with domain knowledge. Together, you get comprehensive coverage that's actually maintainable.
Now you have the blueprint. Go build a test writer agent that makes your codebase more reliable. Start with one language, one framework. Prove it works. Then scale. Build incrementally, learning what works for your specific codebase and team.
Your codebase will thank you. Your developers will thank you. And your users will thank you when your software is more reliable.
-iNet
Building better software, one test at a time.