March 6, 2025
Claude AI Performance Development

Agent Cost Optimization Model Selection

When you're orchestrating teams of agents in Claude Code, every API call adds up. And when you're running dozens of agents across hundreds of tasks, those costs can balloon fast. But here's the thing: not every task needs your most expensive model. In fact, using Opus for simple linting checks is like hiring a surgeon to change your oil.

This is where smart model selection becomes your superpower. Claude gives you three main tiers—Haiku, Sonnet, and Opus—each optimized for different complexity levels. Get the model-to-task matching right, and you'll slash costs while maintaining quality. Get it wrong, and you'll watch your bill grow for no good reason.

In this article, we'll walk through a practical framework for choosing the right model for each agent, show you exactly where Haiku shines and where you need Opus, and give you the tools to measure and optimize your agent spending. Let's dig in.

Table of Contents
  1. Understanding the Claude Model Tiers
  2. The Cost-Quality-Speed Triangle
  3. Matching Tasks to Models: A Decision Framework
  4. Start with Complexity Assessment
  5. The Haiku Playlist: Where Speed and Savings Collide
  6. The Sonnet Sweet Spot: The Workhorse
  7. The Opus Lane: When Nothing Else Will Do
  8. Implementing Model Selection in Claude Code
  9. Measuring and Tracking Agent Spending
  10. The Hidden Costs of Wrong Model Selection
  11. Real-World Optimization Strategies
  12. Strategy 1: Tiered Execution
  13. Strategy 2: Hybrid Decomposition
  14. Strategy 3: Batch Processing with Haiku
  15. Strategy 4: Caching and Reuse
  16. Advanced: Dynamic Model Selection and Load-Based Routing
  17. Pitfalls to Avoid
  18. Case Study: Real Cost Optimization in Action
  19. Building Your Model Selection Framework
  20. Expected Output and Summary

Understanding the Claude Model Tiers

Before we talk strategy, let's get clear on what we're working with. Anthropic offers three models in the Claude family, each with different tradeoffs:

Haiku is the speed demon. It's your fastest model, designed for high-volume, low-complexity tasks. Think linting, simple code generation, formatting, basic pattern matching. Haiku costs roughly 1/20th the price of Opus while running 2-3x faster. The catch? It's less capable at complex reasoning, nuanced analysis, or multi-step logical problems.

Sonnet sits in the middle. It's the balanced choice—faster and cheaper than Opus, but more capable than Haiku. Sonnet handles code review, moderate analysis, writing that requires some craft, and tasks where you need decent reasoning without overkill. It's your workhorse for most general-purpose work.

Opus is the heavy hitter. It's your most capable model, excelling at complex reasoning, architectural decisions, deep debugging, and nuanced analysis. It's also the slowest and most expensive. You want Opus when the stakes are high, when the task genuinely requires sophisticated thinking, or when the complexity demands it.

Here's a rough cost comparison (as of early 2025):

Model  | Input Cost     | Output Cost  | Speed   | Best For
Haiku  | $0.80/M tokens | $4/M tokens  | Fastest | High-volume simple tasks
Sonnet | $3/M tokens    | $15/M tokens | Medium  | Balanced work, code review
Opus   | $15/M tokens   | $75/M tokens | Slowest | Complex reasoning, architecture

Note: Token counts matter here. At these rates, the same 10,000 output tokens cost nearly 19x more on Opus than on Haiku. But if Haiku produces lower-quality output that requires rework, the true cost calculation gets more complex.

The Cost-Quality-Speed Triangle

Here's the mental model we need to build: every task sits somewhere in a triangle defined by cost, quality, and speed. You can optimize for any two, but the third takes the hit.

If you need low cost and high speed, you sacrifice quality. That's where Haiku thrives—for tasks where the output is fungible, where you can quickly validate and move on, or where the consequence of being slightly wrong is minimal.

If you need high quality and low cost, you sacrifice speed. You might run Haiku with human review cycles, or structure your pipeline so time-sensitive tasks use Sonnet while less critical work uses Haiku.

If you need high quality and high speed, you accept the cost. That's the Opus sweet spot—when you need it fast and you need it right, price takes a backseat.

The optimization game is figuring out which pair each task actually needs. And here's what most teams get wrong: they assume every task needs all three at once (fast, cheap, and high-quality). No task gets all three. Some tasks are perfectly fine cheap and fast at modest quality; others genuinely justify paying for speed and quality together.

Matching Tasks to Models: A Decision Framework

Let's build a practical framework for deciding which model to use for a given agent or task.

Start with Complexity Assessment

Ask yourself three questions about the task:

  1. Does it require reasoning across multiple steps? If a task needs the model to hold state, consider dependencies, or reason about causality, it's more complex.

  2. Does it involve subjective judgment? Tasks requiring aesthetic choices, contextual understanding, or nuanced decision-making need more capability.

  3. What's the cost of getting it wrong? If a wrong answer triggers a cascade of problems, you need higher confidence.

Score each as low (1), medium (2), or high (3). Add them up.

  • Score 3-4: Haiku territory. Simple, straightforward, low judgment.
  • Score 5-6: Sonnet zone. Moderate complexity, some judgment needed.
  • Score 7-9: Opus required. Complex reasoning, high stakes, or judgment-heavy.

This framework isn't perfect—it's a heuristic. But it'll get you in the ballpark fast.
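If you want to wire this heuristic into an orchestrator, it reduces to a few lines. This is a sketch of the scoring buckets above, not a built-in Claude Code API; the function name and arguments are illustrative.

```python
# A minimal sketch of the three-question complexity heuristic.
# Each answer is scored 1 (low), 2 (medium), or 3 (high); the sum picks a tier.

def pick_model(multi_step: int, judgment: int, failure_cost: int) -> str:
    """Each argument is 1 (low), 2 (medium), or 3 (high)."""
    for score in (multi_step, judgment, failure_cost):
        if score not in (1, 2, 3):
            raise ValueError("scores must be 1, 2, or 3")
    total = multi_step + judgment + failure_cost
    if total <= 4:
        return "haiku"    # simple, low judgment
    if total <= 6:
        return "sonnet"   # moderate complexity
    return "opus"         # complex reasoning or high stakes

# Linting: single-step, objective, low blast radius.
print(pick_model(1, 1, 1))  # haiku
# Architecture decisions: multi-step, judgment-heavy, expensive to get wrong.
print(pick_model(3, 3, 3))  # opus
```

The hard boundaries are deliberately crude; treat borderline scores (4 and 6) as candidates for the tiered-execution strategies later in this article.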

The Haiku Playlist: Where Speed and Savings Collide

Haiku is your cost-killer if you use it right. Here are the tasks where Haiku shines:

Test writing and execution. Haiku can generate unit tests, run them, and validate output. It's fast enough to iterate, cheap enough to run in bulk. Even if 10% of generated tests need human tweaking, the cost savings are massive.

```yaml
test-engineer:
  model: haiku
  role: Generate and run unit tests for new features
  description: |
    Creates comprehensive test suites. Validates against existing tests.
    Flags coverage gaps. Runs full test suite for pass/fail assessment.
  instructions: |
    You are a test engineering agent optimized for volume.
    Generate tests that thoroughly cover the function signature.
    Always validate that generated tests actually pass.
    Flag any coverage gaps but prioritize pass rate.
```

Code linting and formatting. Haiku can enforce style guides, check for common mistakes, and apply transformations. These tasks have objective success criteria—code either passes the linter or it doesn't.

```yaml
style-enforcer:
  model: haiku
  role: Enforce code style and detect basic quality issues
  description: |
    Checks code against style guide. Detects simple bugs.
    Identifies unused variables, imports. Catches formatting issues.
  instructions: |
    You are a linting agent. Your job is pattern matching against
    established rules. Apply transformations mechanically. Flag violations
    with line numbers and suggested fixes.
```

Simple code generation. Scaffolding boilerplate, generating getters/setters, transforming data structures. These are pattern-matching tasks with clear inputs and outputs.

```yaml
boilerplate-generator:
  model: haiku
  role: Generate project scaffolding and boilerplate
  description: |
    Creates new files, directories, package structures.
    Generates common patterns like CRUD operations.
  instructions: |
    You are a code generator for common patterns.
    Given a specification, generate minimal, correct boilerplate.
    Follow the established project conventions exactly.
```

Data validation and transformation. Checking that JSON matches a schema, transforming CSV to JSON, normalizing data formats. Clear rules, objective validation.

API response parsing. When you need to extract structured data from API responses, Haiku can handle it. Parse JSON, validate against schema, transform to canonical format.

Document summarization (short). Quick summaries of small documents, extracting key facts, basic chunking for vector databases. Not deep analysis, just extraction.

The pattern: Haiku excels when the task has objective success criteria, clear rules, and low judgment required.

The Sonnet Sweet Spot: The Workhorse

Sonnet is your default for most real work. It's where you'll spend most of your budget, and it's worth it because Sonnet handles the nuanced stuff Haiku struggles with.

Code review. Reviewing code requires understanding intent, architectural patterns, potential bugs in context. Sonnet can read a PR, understand the change, assess quality, suggest improvements. It's capable enough for real insights without the Opus overhead.

```yaml
code-reviewer:
  model: sonnet
  role: Review code for quality, correctness, and style
  description: |
    Analyzes pull requests against quality standards.
    Suggests improvements, flags potential bugs.
    Assesses architectural fit.
  instructions: |
    You are a code reviewer. Review this code critically.
    Look for: logic errors, edge cases, performance issues,
    architectural misalignment, readability problems.
    Be specific with line numbers and concrete suggestions.
```

Documentation writing. Good docs require clarity, organization, understanding of the audience. Sonnet can write docs, tutorials, API references. Haiku would produce thinner, less useful documentation.

Moderate debugging. When a test fails or a function breaks, Sonnet can analyze the error, trace the logic, suggest fixes. It's not deep architectural debugging (that's Opus), but solid troubleshooting.

Writing tasks with structure requirements. Blog posts, emails, reports—anything where the output needs voice, structure, and judgment about what matters most.

Analysis with some nuance. Reviewing research papers, analyzing performance data, identifying patterns in logs. More than just extraction, but not deep reasoning.

Orchestration and delegation. When an agent needs to decide which subagent to call, parse ambiguous instructions, or coordinate between tasks, Sonnet handles that delegation work well.

The pattern: Sonnet is your pick for tasks that need moderate judgment, some reasoning, or output quality that readers will actually care about.

The Opus Lane: When Nothing Else Will Do

Reserve Opus for the genuinely hard problems. These are rare, and using Opus liberally is a budget killer. But when you need it, you need it.

Architectural decisions. When an agent is choosing between multiple design approaches, evaluating tradeoffs, or making decisions that ripple through your codebase, Opus brings the sophisticated reasoning you need.

```yaml
architect:
  model: opus
  role: Make architectural decisions and design complex systems
  description: |
    Evaluates design tradeoffs. Recommends architecture patterns.
    Assesses impact of design choices across system.
  instructions: |
    You are a systems architect. Given a problem statement,
    evaluate multiple approaches. For each: pros, cons, scalability,
    maintainability, risk. Recommend one with clear justification.
```

Deep debugging of complex issues. When a bug is subtle, when it involves multiple systems, when the error is non-obvious, Opus can trace through the logic, understand the architecture, identify root causes that simpler models would miss.

Complex refactoring decisions. When you're restructuring significant portions of code, Opus understands the implications, helps preserve invariants, suggests approaches that maintain correctness across the system.

Novel problem-solving. When you hit a problem you haven't solved before, where the solution isn't a pattern match, Opus brings creativity and deep reasoning.

Multi-step reasoning with high stakes. Complex business logic decisions, security-related analysis, anything where the cost of being wrong is very high.

The pattern: Opus is for problems that genuinely require sophisticated reasoning, novel approaches, or where the cost of failure is steep.

Implementing Model Selection in Claude Code

In practice, you'll configure model selection in your agent frontmatter. Here's how:

```yaml
---
name: test-engineer
model: haiku
role: Test engineering agent
description: Generates and executes unit tests
---
You are a test engineering agent. Your primary role is generating
comprehensive unit tests...
```

The model field tells Claude Code which model to use for this agent. If you don't specify a model, it defaults to Sonnet—a reasonable middle ground.

You can also override at execution time. In your orchestrator or command script, when you dispatch tasks to an agent, you can specify the model:

```yaml
dispatch:
  - task: "Write unit tests for auth module"
    agent: test-engineer
    model: haiku
    priority: normal

  - task: "Review security architecture"
    agent: architect
    model: opus
    priority: high
```

Here's where it gets interesting: you can build logic into your orchestrator to dynamically select models based on task characteristics.

```yaml
task-router:
  description: Routes tasks to agents with model selection
  logic:
    - if: task.complexity == "low"
      then:
        agent: appropriate-agent
        model: haiku

    - if: task.complexity == "medium"
      then:
        agent: appropriate-agent
        model: sonnet

    - if: task.complexity == "high"
      then:
        agent: appropriate-agent
        model: opus
```

This isn't built-in behavior—you'd implement this in your orchestrator logic. But it's a pattern that works: assess task complexity, dispatch to the appropriate model tier.
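If your orchestrator is a script rather than YAML, the same pattern is a small function. This is a minimal Python sketch; the `Task` shape and `route` helper are illustrative names, not part of Claude Code.

```python
# Sketch of orchestrator-side model routing, mirroring the YAML pattern.
# Claude Code does not ship this router; you wire it into your own
# orchestration layer.
from dataclasses import dataclass

MODEL_BY_COMPLEXITY = {"low": "haiku", "medium": "sonnet", "high": "opus"}

@dataclass
class Task:
    description: str
    agent: str
    complexity: str  # "low" | "medium" | "high"

def route(task: Task) -> dict:
    # Fall back to Sonnet, the reasonable middle ground, for unknown labels.
    model = MODEL_BY_COMPLEXITY.get(task.complexity, "sonnet")
    return {"task": task.description, "agent": task.agent, "model": model}

print(route(Task("Lint auth module", "style-enforcer", "low")))
# {'task': 'Lint auth module', 'agent': 'style-enforcer', 'model': 'haiku'}
```

The useful property is the single lookup table: when pricing or your task mix changes, you retune one dict instead of hunting through agent definitions.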

Measuring and Tracking Agent Spending

Here's what most teams miss: you can't optimize what you don't measure. You need visibility into how much each agent is costing, which models are being used, where your budget is going.

Claude Code doesn't ship with automatic cost tracking, but you can build it. The approach:

Log every agent execution. Capture:

  • Agent name
  • Model used
  • Task description
  • Input token count
  • Output token count
  • Timestamp
  • Success/failure
```yaml
cost-tracker:
  format: |
    timestamp: 2025-03-16T14:23:45Z
    agent: test-engineer
    model: haiku
    task: Generate tests for user auth module
    input_tokens: 1250
    output_tokens: 3400
    cost: 0.0146 (calculated)
    status: success
```

You can log this to a file, a database, or a monitoring system. The point is capturing the data.

Calculate actual costs. Using current token prices:

  • Haiku: $0.80 per million input tokens, $4 per million output tokens
  • Sonnet: $3 per million input tokens, $15 per million output tokens
  • Opus: $15 per million input tokens, $75 per million output tokens
Cost = (input_tokens / 1_000_000 * input_rate) + (output_tokens / 1_000_000 * output_rate)

For the test-engineer example above with Haiku:

Input cost:  1250 / 1_000_000 * 0.80 = $0.001
Output cost: 3400 / 1_000_000 * 4.00 = $0.0136
Total:       $0.0146

Aggregate by agent and time period. Weekly, monthly, by project, by agent type. Answer questions like:

  • Which agents are most expensive?
  • How much did test writing cost this month?
  • What's our Haiku vs. Sonnet vs. Opus split?

Set budgets and alerts. Once you understand your spending pattern, set targets. If you budgeted $500/month for agent work and you're on track for $800, you need to reoptimize.

Here's a simple tracking template you can implement:

```yaml
cost-analysis:
  period: "2025-03-01 to 2025-03-16"
  agents:
    test-engineer:
      model: haiku
      executions: 245
      total_input_tokens: 306250
      total_output_tokens: 833000
      total_cost: "$3.58"
      average_cost_per_execution: "$0.0146"

    code-reviewer:
      model: sonnet
      executions: 87
      total_input_tokens: 267450
      total_output_tokens: 412300
      total_cost: "$6.99"
      average_cost_per_execution: "$0.0803"

    architect:
      model: opus
      executions: 12
      total_input_tokens: 156700
      total_output_tokens: 89250
      total_cost: "$9.04"
      average_cost_per_execution: "$0.75"

  summary:
    total_executions: 344
    total_cost: "$19.61"
    biggest_cost_driver: "architect (opus) at 46% of budget"
    optimization_opportunity: "Consider running architect on Sonnet for lower-complexity decisions"
```

This is the data you need to make smart decisions. Without it, you're flying blind.

The Hidden Costs of Wrong Model Selection

Before we jump into optimization tactics, let's talk about something that catches most teams off guard: the hidden costs of picking the wrong model.

The Haiku False Economy. You pick Haiku thinking it's cheap. It is, per token. But then you get back output that's borderline. The model made an assumption you wouldn't have. It missed an edge case. Now you re-run with Sonnet. You've paid for the Haiku attempt, the Sonnet re-run, and the time it took to notice the problem, so the job cost more than just using Sonnet upfront. Meanwhile, you've delayed your pipeline.

This happens a lot in code generation. Haiku can scaffold basic boilerplate fine. But for anything with business logic nuance, you often get back code that's almost right—the kind of "almost right" that costs hours to debug because the logic is close enough that you don't spot the problem immediately.

The Opus Overkill Tax. On the flip side, teams using Opus for everything pay an unnecessary premium. You're running your linter on Opus. Your formatter on Opus. Your test scaffolding on Opus. Each of these could run on Haiku for a small fraction of the cost with identical quality. If you have a team of five engineers, and each runs 100 simple tasks per day on Opus instead of Haiku, you're looking at thousands of dollars in unnecessary costs monthly.

The lesson: defaulting to a single model, in either direction, costs more than thoughtful model selection. Spend the time to match models to tasks. It pays back fast.

Latency as a Hidden Cost Factor. Here's something that doesn't show up in token calculations: user experience. Opus is slower. Sometimes significantly slower. If your agent is on a critical path—code review that blocks merges, validation that gates deployments—latency becomes an actual cost. Slower merges mean slower feature delivery. Slower feature delivery means lost revenue.

If your test validator runs on Opus and takes 45 seconds per test suite vs. 15 seconds on Haiku, and each developer runs 100 test suites daily, that's 50 minutes of wait time daily per developer. On a five-person team, that's 4+ hours of developer idle time every day. Assuming a $50/hour fully loaded cost, that's $200/day. Over a year, that's roughly $50,000 in productivity loss to save maybe $15/day on token costs. That math doesn't work.

The Cascading Rework Problem. One more hidden cost: when agents make mistakes, rework cascades. A test generator using Haiku produces bad tests. They pass locally but fail in CI. The engineering team investigates. Thirty minutes lost on debugging. One agent picking the wrong model cascades into team-wide slowdown.

You can't just look at the cost of the agent. You have to look at the cost of its failure mode. If Haiku has a 5% failure rate on your test generation but Sonnet has a 0.5% failure rate, and each failure costs an hour of investigation, Sonnet is cheaper even at 10x the token cost.

This is why measurement matters. You need to track not just direct costs but true costs including rework, latency impact, and failure cascades.
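To make "true cost" concrete, here's a tiny expected-value sketch. It reuses the failure rates above and assumes a $50/hour investigation cost; the token costs are illustrative, not measured.

```python
# Expected true cost per execution: direct token cost plus the expected
# cost of a failure (investigation time). All dollar figures here are
# illustrative assumptions, not benchmarks.

def true_cost(token_cost: float, failure_rate: float,
              failure_cost_usd: float) -> float:
    return token_cost + failure_rate * failure_cost_usd

HOUR = 50.0  # assumed fully loaded cost of an hour of investigation

haiku = true_cost(token_cost=0.015, failure_rate=0.05, failure_cost_usd=HOUR)
sonnet = true_cost(token_cost=0.15, failure_rate=0.005, failure_cost_usd=HOUR)

print(f"haiku:  ${haiku:.3f} per execution")   # haiku:  $2.515 per execution
print(f"sonnet: ${sonnet:.3f} per execution")  # sonnet: $0.400 per execution
# Sonnet wins despite a 10x token cost, because failures dominate.
```

The point of the exercise: once failure costs enter the equation, the per-token price is often the smallest term.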

Real-World Optimization Strategies

Now let's talk actual tactics. Here are patterns that work:

Strategy 1: Tiered Execution

For tasks where quality matters but cost matters too, implement a tiered execution strategy:

  1. Run with Haiku first. Generate initial output, keep it cheap.
  2. Validate the output. Check against success criteria.
  3. Escalate on failure. If output doesn't meet standards, re-run with Sonnet.

Example: test generation.

```yaml
test-pipeline:
  stage-1-generate:
    agent: test-engineer
    model: haiku
    task: Generate comprehensive test suite

  stage-2-validate:
    agent: test-validator
    model: haiku
    task: Check that generated tests actually pass
    condition: "if stage-1 output fails validation, proceed to stage-3"

  stage-3-escalate:
    condition: "if stage-1 fails validation"
    agent: test-engineer
    model: sonnet
    task: Rewrite failing tests and cover the missed edge cases
```

This approach keeps costs low when Haiku works, but escalates when it doesn't. You're paying Sonnet prices only for the cases that need them.
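In code, the tiered pattern is just a validate-then-escalate branch. `run_agent` and `validate` below are placeholders for your own dispatch and test-runner hooks; only the escalation logic is the point.

```python
# Tiered execution: try Haiku first, escalate to Sonnet only on
# validation failure. run_agent/validate are hypothetical hooks.
from typing import Callable

def tiered_generate(task: str,
                    run_agent: Callable[[str, str], str],
                    validate: Callable[[str], bool]) -> tuple[str, str]:
    """Return (model_used, output)."""
    output = run_agent("haiku", task)
    if validate(output):
        return "haiku", output
    # Escalation path: pay Sonnet prices only for the failures.
    output = run_agent("sonnet", task)
    return "sonnet", output

# Stub demo: pretend Haiku's output fails validation, forcing escalation.
model, _ = tiered_generate(
    "Generate tests for auth module",
    run_agent=lambda model, task: f"[{model} output]",
    validate=lambda out: "sonnet" in out,  # contrived rule for the demo
)
print(model)  # sonnet
```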

Strategy 2: Hybrid Decomposition

For complex tasks, decompose into subtasks and assign different models:

```yaml
feature-implementation:
  subtask-1-design:
    agent: architect
    model: opus
    cost: Higher, but designs architecture once

  subtask-2-implementation:
    agent: code-generator
    model: sonnet
    description: "Implement design, follow architect's blueprint"

  subtask-3-testing:
    agent: test-engineer
    model: haiku
    description: "Generate tests for implementation"

  subtask-4-review:
    agent: code-reviewer
    model: sonnet
    description: "Review implementation against design"
```

You pay for Opus thinking once (the architecture), then Sonnet for substantive work, and Haiku for volume. This is cheaper than running everything on Opus.

Strategy 3: Batch Processing with Haiku

For tasks that can tolerate some failures, batch them with Haiku:

```yaml
batch-optimization:
  task: "Lint and format 500 files in codebase"
  agent: style-enforcer
  model: haiku
  parallelization: 10 concurrent
  expected_cost: "$2.50"
  alternative_with_sonnet: "$25.00"

  validation:
    check_all_files_processed: true
    check_no_files_corrupted: true
    if_validation_fails: "re-run failures with sonnet"
```

Haiku blasts through linting 500 files cheaply. If a few fail validation, Sonnet re-runs those exceptions. Net cost is dramatically lower than running everything on Sonnet.
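Here's a sketch of that batch-then-escalate flow, with `lint_file` and `validate` as hypothetical stand-ins for the real API call and validation check.

```python
# Batch pattern: run everything on Haiku concurrently, collect the
# validation failures, and re-run only those on Sonnet.
from concurrent.futures import ThreadPoolExecutor

def lint_file(model: str, path: str) -> str:
    return f"{path}: linted by {model}"     # stand-in for an API call

def validate(path: str, result: str) -> bool:
    return not path.endswith(".tricky")     # stand-in validation rule

def batch_lint(paths: list[str], workers: int = 10) -> dict[str, str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(zip(paths, pool.map(lambda p: lint_file("haiku", p), paths)))
    failures = [p for p, r in results.items() if not validate(p, r)]
    for path in failures:                   # escalate only the failures
        results[path] = lint_file("sonnet", path)
    return results

out = batch_lint(["a.py", "b.py", "c.tricky"])
print(out["c.tricky"])  # c.tricky: linted by sonnet
```

Because escalation happens per file, the Sonnet bill scales with the failure rate, not with the batch size.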

Strategy 4: Caching and Reuse

For tasks that repeat, cache results:

```yaml
documentation-generation:
  task: Generate API documentation for 50 endpoints

  approach-1-generate-all-with-sonnet: "$18.00"

  approach-2-smart-caching:
    step-1-generate-with-haiku: "Generate initial docs"
    step-2-identify-unique-patterns: "Group by similarity (haiku)"
    step-3-generate-exemplars-with-sonnet: "Create best-in-class examples"
    step-4-apply-templates: "Use exemplars as templates for rest (haiku)"
    total_cost: "$2.80"
```

You're leveraging Sonnet's better output for a few examples, then Haiku templates the rest. Much cheaper.

Advanced: Dynamic Model Selection and Load-Based Routing

Once you've got the basics down, you can get sophisticated. Some teams implement dynamic model selection based on runtime conditions.

Confidence-based escalation works like this: Haiku generates output and includes a confidence score. If confidence is high (over 90%), the output goes directly to production. If confidence is medium (60-90%), a Sonnet agent reviews it and either approves it or requests revisions. If confidence is low (under 60%), Sonnet regenerates the output from scratch.

```yaml
confidence-escalation:
  stage-1-haiku-generation:
    agent: code-generator
    model: haiku
    task: Generate code for feature X
    output_includes: confidence_score

  stage-2-dynamic-routing:
    - if: confidence_score > 0.9
      then: "Pass to production"
    - elif: confidence_score > 0.6
      then: "Sonnet review"
    - else: "Sonnet regenerate"
```

This approach uses Haiku as the default path but provides escape hatches for when Haiku isn't confident. Over time, you build a profile of when Haiku's confidence correlates with actual quality.
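The routing table reduces to a small function. The 0.9 and 0.6 thresholds are the ones from this section; in practice you'd tune them against your own quality data.

```python
# Confidence-based routing for Haiku output. Thresholds are the
# article's illustrative values, not universal constants.

def route_by_confidence(confidence: float) -> str:
    if confidence > 0.9:
        return "ship"              # high confidence: straight to production
    if confidence > 0.6:
        return "sonnet-review"     # medium: Sonnet approves or revises
    return "sonnet-regenerate"     # low: redo the work on Sonnet

print(route_by_confidence(0.95))  # ship
print(route_by_confidence(0.75))  # sonnet-review
print(route_by_confidence(0.40))  # sonnet-regenerate
```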

Load-based model selection is another tactic. If your agent queue is overloaded and latency is spiking, shift non-critical tasks down to Haiku to keep things moving. If your queue is clear and you have capacity, run tasks on Sonnet to improve quality. This keeps your pipeline flowing while spending on quality whenever you can afford the time.

```yaml
load-aware-routing:
  queue_depth: get_queue_length()

  rules:
    - if: queue_depth > 100
      then:
        - "Use Haiku for low-priority tasks"
        - "Reserve Sonnet for high-priority"

    - if: queue_depth < 20
      then:
        - "Use Sonnet across the board"
        - "Prioritize quality over cost"

    - else:
        - "Use standard model assignment"
```

This is more advanced: it requires monitoring queue depth and making real-time routing decisions. But it's powerful. Your system automatically trades down to cheaper, faster models when it's busy and trades back up on quality when capacity frees up.

Temperature and token limit adjustments are another lever. For a given model, you can adjust the temperature parameter (controls randomness) and max tokens (controls output length). Lower temperature makes Haiku produce more deterministic, higher-quality output at the cost of less creativity. Limiting tokens forces more concise answers, which reduces cost.

Some teams have "tight mode" (lower temperature, token limits) and "creative mode" (higher temperature, more tokens) variants of the same agent running on different models. The tight-Haiku variant handles straightforward tasks cheaply. The creative-Opus variant handles novel problems where you need the full power.

These advanced techniques aren't necessary to start optimizing. But as your agent infrastructure matures, they're worth exploring.

Pitfalls to Avoid

Let me call out the common mistakes:

Pitfall 1: Using Opus by default. Many teams do this because it "just works"—they don't measure costs and don't realize they're overspending. Set Sonnet as your default, then move up or down based on need.

Pitfall 2: Assuming Haiku is always cheaper. It is on a per-token basis, but if Haiku produces lower-quality output that needs human review, fixes, or re-runs, the total cost balloons. Always measure true cost including rework.

Pitfall 3: Not measuring at all. You can't optimize what you don't measure. Even rough cost tracking beats guessing. Spend a few hours setting up logging and you'll recoup the time in savings within weeks.

Pitfall 4: Ignoring latency in cost calculations. Token prices aren't the whole bill. If a slow model sits on a critical path, like customer-facing features or merge-blocking review, the waiting time costs real money, and it never shows up in per-token math. Factor in the full picture.

Pitfall 5: Static model assignments. Don't hardcode "this agent always uses Haiku." Build logic to escalate when needed. A test generator that escalates to Sonnet on failure is smarter than one that pushes through with poor tests.

Pitfall 6: Forgetting about context carryover. Some agents maintain context across multiple invocations. Switching models mid-context can cause coherence issues. Be intentional about when you switch.

Case Study: Real Cost Optimization in Action

Let's walk through a real example. Imagine you have a team that recently deployed an agent infrastructure for code review and test generation. The initial setup has everything on Sonnet (the safe default). They're spending about $1,300/month in agent costs.

Here's the breakdown:

  • Code linting (50 executions/day): 150 tokens input, 400 tokens output. Sonnet cost: $0.225/execution. Monthly: $337.50
  • Test generation (100 executions/day): 800 tokens input, 3000 tokens output. Sonnet cost: $0.27/execution. Monthly: $810
  • Code review (20 executions/day): 2000 tokens input, 1500 tokens output. Sonnet cost: $0.10/execution. Monthly: $60 ($2/day)
  • Bug analysis (10 executions/day): 3000 tokens input, 2500 tokens output. Sonnet cost: $0.405/execution. Monthly: $121.50

Total: ~$1,329/month.

Now they apply model optimization:

  1. Move linting to Haiku. Code linting has objective success criteria (does the code pass the linter or not?). They move it to Haiku. Cost per execution drops from $0.225 to $0.0182. Monthly savings: about $310.

  2. Move test generation to tiered execution. They run test generation on Haiku first. The test validator (also Haiku) checks whether the tests pass. About 90% do. For the 10% that fail, they escalate to Sonnet. New cost: roughly $0.04/execution (90% Haiku-only runs plus the 10% that also pay for a Sonnet pass). Monthly: $120 (down from $810). Savings: $690.

  3. Keep code review on Sonnet. Code review requires judgment about code quality, architectural fit, maintainability. That's not Haiku's strength. They keep this on Sonnet but add feedback loops so reviewers can suggest cheaper approaches (higher-level reviews on Haiku if the change is straightforward).

  4. Move bug analysis to hybrid. Their first attempt put Haiku triage in front of a Sonnet deep-dive on every ticket, which came out at $0.80/execution, more than the baseline. So they refined the split: basic log analysis stays on Haiku, and Sonnet digs deeper only when the first attempt fails. New cost: $0.35/execution. Monthly: $105. Savings: $16.50.

New total: ~$310/month.

Total savings: roughly $1,000/month, or about 75%.

What they didn't lose:

  • Quality on linting: identical, it's objective
  • Quality on tests: slightly higher due to escalation to Sonnet for edge cases
  • Quality on code review: identical, still on Sonnet
  • Quality on bug analysis: slightly higher due to Sonnet handling complex cases

What they gained:

  • Visibility into costs (tracking every execution)
  • Ability to adjust as task distribution changes
  • Confidence that they're not overpaying for any task
  • Speed improvements (Haiku is faster, so linting and tests generate faster)
  • Room in budget to add new agents without increasing spend

The whole optimization took them 3-4 hours of work once. The payoff: thousands of dollars a year in savings with no quality drop, and actual quality improvements in some areas.

This is what's possible when you apply systematic model selection. The numbers will vary based on your task mix, but the pattern holds: most teams can cut agent costs 40-60% while improving or maintaining quality.

Building Your Model Selection Framework

If you're starting from scratch, here's a template process:

Week 1: Baseline and Audit

  • Deploy cost logging to all agents
  • Collect one week of data
  • Categorize agents by role (test, review, lint, etc.)
  • Calculate cost per execution for each agent
  • Identify your top 5 cost drivers

Week 2: Reassess and Classify

  • Using your complexity assessment framework, classify each agent
  • Mark as Haiku-candidate, Sonnet-appropriate, or Opus-required
  • For Haiku-candidates, identify objective success criteria
  • For Opus-required agents, document why they can't move down

Week 3: Pilot Optimization

  • Move your single biggest cost driver (usually test generation) to optimized model
  • Deploy to a staging environment first
  • Measure: cost change, quality change, latency change
  • Document results

Week 4: Scale and Monitor

  • Roll out changes to production
  • Monitor quality metrics and cost metrics daily for the first week
  • Adjust if issues arise
  • Apply learnings to next wave of agents

Then iterate. Each cycle through your agents should take 2-3 weeks as you refine your understanding of which models work for which tasks.

Expected Output and Summary

When you've optimized your agent model selection, here's what you should see:

  • Cost per task drops 30-60% compared to running everything on Sonnet or Opus
  • Execution speed improves for high-volume tasks (Haiku is faster)
  • Quality stays consistent or improves (because you're matching model to task, not overspending on overkill tasks and underspending on hard ones)
  • Clear visibility into spending by agent and agent type
  • Actionable optimization opportunities identified in your cost tracking

Your agent team becomes lean and efficient. You're not paying for thinking power you don't need, but you're bringing in Opus when the problem demands it.

Start with a simple framework:

  1. Classify your tasks by complexity
  2. Assign base models (Haiku for simple, Sonnet for moderate, Opus for complex)
  3. Build a cost tracker to see what you're actually spending
  4. Identify top cost drivers and optimize them first
  5. Iterate based on real data

The teams doing this well report 40-50% cost reductions within the first month while maintaining or improving output quality. You can get there too.

Model selection isn't just about being cheap—it's about being smart. Use the fastest, cheapest tool that solves the problem. Scale with Haiku where it works. Bring in Sonnet for nuance. Reserve Opus for genius. That's the agent cost optimization game, and when you play it right, everyone wins.


-iNet
