
When you're orchestrating teams of agents in Claude Code, every API call adds up. And when you're running dozens of agents across hundreds of tasks, those costs can balloon fast. But here's the thing: not every task needs your most expensive model. In fact, using Opus for simple linting checks is like hiring a surgeon to change your oil.
This is where smart model selection becomes your superpower. Claude gives you three main tiers—Haiku, Sonnet, and Opus—each optimized for different complexity levels. Get the model-to-task matching right, and you'll slash costs while maintaining quality. Get it wrong, and you'll watch your bill grow for no good reason.
In this article, we'll walk through a practical framework for choosing the right model for each agent, show you exactly where Haiku shines and where you need Opus, and give you the tools to measure and optimize your agent spending. Let's dig in.
Table of Contents
- Understanding the Claude Model Tiers
- The Cost-Quality-Speed Triangle
- Matching Tasks to Models: A Decision Framework
  - Start with Complexity Assessment
  - The Haiku Playlist: Where Speed and Savings Collide
  - The Sonnet Sweet Spot: The Workhorse
  - The Opus Lane: When Nothing Else Will Do
- Implementing Model Selection in Claude Code
- Measuring and Tracking Agent Spending
- The Hidden Costs of Wrong Model Selection
- Real-World Optimization Strategies
  - Strategy 1: Tiered Execution
  - Strategy 2: Hybrid Decomposition
  - Strategy 3: Batch Processing with Haiku
  - Strategy 4: Caching and Reuse
- Advanced: Dynamic Model Selection and Load-Based Routing
- Pitfalls to Avoid
- Case Study: Real Cost Optimization in Action
- Building Your Model Selection Framework
- Expected Output and Summary
Understanding the Claude Model Tiers
Before we talk strategy, let's get clear on what we're working with. Anthropic offers three models in the Claude family, each with different tradeoffs:
Haiku is the speed demon. It's your fastest model, designed for high-volume, low-complexity tasks. Think linting, simple code generation, formatting, basic pattern matching. Haiku costs roughly one-twentieth the price of Opus while running 2-3x faster. The catch? It's less capable at complex reasoning, nuanced analysis, or multi-step logical problems.
Sonnet sits in the middle. It's the balanced choice—faster and cheaper than Opus, but more capable than Haiku. Sonnet handles code review, moderate analysis, writing that requires some craft, and tasks where you need decent reasoning without overkill. It's your workhorse for most general-purpose work.
Opus is the heavy hitter. It's your most capable model, excelling at complex reasoning, architectural decisions, deep debugging, and nuanced analysis. It's also the slowest and most expensive. You want Opus when the stakes are high, when the task genuinely requires sophisticated thinking, or when the complexity demands it.
Here's a rough cost comparison (as of 2026):
| Model | Input Cost | Output Cost | Speed | Best For |
|---|---|---|---|---|
| Haiku | $0.80/M tokens | $4/M tokens | Fastest | High-volume simple tasks |
| Sonnet | $3/M tokens | $15/M tokens | Medium | Balanced work, code review |
| Opus | $15/M tokens | $75/M tokens | Slowest | Complex reasoning, architecture |
Note: Token counts matter here. At these rates, a task that generates 10,000 output tokens costs nearly 19x more on Opus than on Haiku. But if Haiku produces lower-quality output that requires rework, the true cost calculation gets more complex.
The Cost-Quality-Speed Triangle
Here's the mental model we need to build: every task sits somewhere in a triangle defined by cost, quality, and speed. You can optimize for any two, but the third takes the hit.
If you need low cost and high speed, you sacrifice quality. That's where Haiku thrives—for tasks where the output is fungible, where you can quickly validate and move on, or where the consequence of being slightly wrong is minimal.
If you need high quality and low cost, you sacrifice speed. You might run Haiku with human review cycles, or structure your pipeline so time-sensitive tasks use Sonnet while less critical work uses Haiku.
If you need high quality and high speed, you accept the cost. That's the Opus sweet spot—when you need it fast and you need it right, price takes a backseat.
The optimization game is figuring out which corner of the triangle each task actually needs. And here's what most teams get wrong: they assume every task needs all three at once (fast, cheap, and high-quality), which the triangle says you can't have. Some tasks are fine being cheap and fast at modest quality. Others genuinely justify paying for quality at speed.
Matching Tasks to Models: A Decision Framework
Let's build a practical framework for deciding which model to use for a given agent or task.
Start with Complexity Assessment
Ask yourself three questions about the task:
1. Does it require reasoning across multiple steps? If a task needs the model to hold state, consider dependencies, or reason about causality, it's more complex.
2. Does it involve subjective judgment? Tasks requiring aesthetic choices, contextual understanding, or nuanced decision-making need more capability.
3. What's the cost of getting it wrong? If a wrong answer triggers a cascade of problems, you need higher confidence.
Score each as low (1), medium (2), or high (3). Add them up.
- Score 3-4: Haiku territory. Simple, straightforward, low judgment.
- Score 5-6: Sonnet zone. Moderate complexity, some judgment needed.
- Score 7-9: Opus required. Complex reasoning, high stakes, or judgment-heavy.
This framework isn't perfect—it's a heuristic. But it'll get you in the ballpark fast.
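The scoring rubric above can be expressed as a small helper. This is a sketch of our own, not a Claude Code feature; the function name and signature are assumptions:

```python
def pick_model(multi_step: int, judgment: int, failure_cost: int) -> str:
    """Map the three complexity answers (1=low, 2=medium, 3=high)
    to a model tier using the score bands described above."""
    for answer in (multi_step, judgment, failure_cost):
        if answer not in (1, 2, 3):
            raise ValueError("each answer must be 1 (low), 2 (medium), or 3 (high)")
    total = multi_step + judgment + failure_cost
    if total <= 4:        # 3-4: simple, straightforward, low judgment
        return "haiku"
    if total <= 6:        # 5-6: moderate complexity, some judgment
        return "sonnet"
    return "opus"         # 7-9: complex reasoning, high stakes
```

For example, a linting task (1, 1, 1) lands on Haiku, while a multi-step architectural decision with a steep failure cost (3, 2, 3) lands on Opus.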
The Haiku Playlist: Where Speed and Savings Collide
Haiku is your cost-killer if you use it right. Here are the tasks where Haiku shines:
Test writing and execution. Haiku can generate unit tests, run them, and validate output. It's fast enough to iterate, cheap enough to run in bulk. Even if 10% of generated tests need human tweaking, the cost savings are massive.
```yaml
test-engineer:
  model: haiku
  role: Generate and run unit tests for new features
  description: |
    Creates comprehensive test suites. Validates against existing tests.
    Flags coverage gaps. Runs full test suite for pass/fail assessment.
  instructions: |
    You are a test engineering agent optimized for volume.
    Generate tests that thoroughly cover the function signature.
    Always validate that generated tests actually pass.
    Flag any coverage gaps but prioritize pass rate.
```

Code linting and formatting. Haiku can enforce style guides, check for common mistakes, and apply transformations. These tasks have objective success criteria—code either passes the linter or it doesn't.
```yaml
style-enforcer:
  model: haiku
  role: Enforce code style and detect basic quality issues
  description: |
    Checks code against style guide. Detects simple bugs.
    Identifies unused variables, imports. Catches formatting issues.
  instructions: |
    You are a linting agent. Your job is pattern matching against
    established rules. Apply transformations mechanically. Flag violations
    with line numbers and suggested fixes.
```

Simple code generation. Scaffolding boilerplate, generating getters/setters, transforming data structures. These are pattern-matching tasks with clear inputs and outputs.
```yaml
boilerplate-generator:
  model: haiku
  role: Generate project scaffolding and boilerplate
  description: |
    Creates new files, directories, package structures.
    Generates common patterns like CRUD operations.
  instructions: |
    You are a code generator for common patterns.
    Given a specification, generate minimal, correct boilerplate.
    Follow the established project conventions exactly.
```

Data validation and transformation. Checking that JSON matches a schema, transforming CSV to JSON, normalizing data formats. Clear rules, objective validation.
API response parsing. When you need to extract structured data from API responses, Haiku can handle it. Parse JSON, validate against schema, transform to canonical format.
Document summarization (short). Quick summaries of small documents, extracting key facts, basic chunking for vector databases. Not deep analysis, just extraction.
The pattern: Haiku excels when the task has objective success criteria, clear rules, and low judgment required.
The Sonnet Sweet Spot: The Workhorse
Sonnet is your default for most real work. It's where you'll spend most of your budget, and it's worth it because Sonnet handles the nuanced stuff Haiku struggles with.
Code review. Reviewing code requires understanding intent, architectural patterns, potential bugs in context. Sonnet can read a PR, understand the change, assess quality, suggest improvements. It's capable enough for real insights without the Opus overhead.
```yaml
code-reviewer:
  model: sonnet
  role: Review code for quality, correctness, and style
  description: |
    Analyzes pull requests against quality standards.
    Suggests improvements, flags potential bugs.
    Assesses architectural fit.
  instructions: |
    You are a code reviewer. Review this code critically.
    Look for: logic errors, edge cases, performance issues,
    architectural misalignment, readability problems.
    Be specific with line numbers and concrete suggestions.
```

Documentation writing. Good docs require clarity, organization, understanding of the audience. Sonnet can write docs, tutorials, API references. Haiku would produce thinner, less useful documentation.
Moderate debugging. When a test fails or a function breaks, Sonnet can analyze the error, trace the logic, suggest fixes. It's not deep architectural debugging (that's Opus), but solid troubleshooting.
Writing tasks with structure requirements. Blog posts, emails, reports—anything where the output needs voice, structure, and judgment about what matters most.
Analysis with some nuance. Reviewing research papers, analyzing performance data, identifying patterns in logs. More than just extraction, but not deep reasoning.
Orchestration and delegation. When an agent needs to decide which subagent to call, parse ambiguous instructions, or coordinate between tasks, Sonnet handles that delegation work well.
The pattern: Sonnet is your pick for tasks that need moderate judgment, some reasoning, or output quality that readers will actually care about.
The Opus Lane: When Nothing Else Will Do
Reserve Opus for the genuinely hard problems. These are rare, and using Opus liberally is a budget killer. But when you need it, you need it.
Architectural decisions. When an agent is choosing between multiple design approaches, evaluating tradeoffs, or making decisions that ripple through your codebase, Opus brings the sophisticated reasoning you need.
```yaml
architect:
  model: opus
  role: Make architectural decisions and design complex systems
  description: |
    Evaluates design tradeoffs. Recommends architecture patterns.
    Assesses impact of design choices across system.
  instructions: |
    You are a systems architect. Given a problem statement,
    evaluate multiple approaches. For each: pros, cons, scalability,
    maintainability, risk. Recommend one with clear justification.
```

Deep debugging of complex issues. When a bug is subtle, when it involves multiple systems, when the error is non-obvious, Opus can trace through the logic, understand the architecture, identify root causes that simpler models would miss.
Complex refactoring decisions. When you're restructuring significant portions of code, Opus understands the implications, helps preserve invariants, suggests approaches that maintain correctness across the system.
Novel problem-solving. When you hit a problem you haven't solved before, where the solution isn't a pattern match, Opus brings creativity and deep reasoning.
Multi-step reasoning with high stakes. Complex business logic decisions, security-related analysis, anything where the cost of being wrong is very high.
The pattern: Opus is for problems that genuinely require sophisticated reasoning, novel approaches, or where the cost of failure is steep.
Implementing Model Selection in Claude Code
In practice, you'll configure model selection in your agent frontmatter. Here's how:
```markdown
---
name: test-engineer
model: haiku
role: Test engineering agent
description: Generates and executes unit tests
---

You are a test engineering agent. Your primary role is generating
comprehensive unit tests...
```

The model field tells Claude Code which model to use for this agent. If you don't specify a model, it defaults to Sonnet—a reasonable middle ground.
You can also override at execution time. In your orchestrator or command script, when you dispatch tasks to an agent, you can specify the model:
```yaml
dispatch:
  - task: "Write unit tests for auth module"
    agent: test-engineer
    model: haiku
    priority: normal
  - task: "Review security architecture"
    agent: architect
    model: opus
    priority: high
```

Here's where it gets interesting: you can build logic into your orchestrator to dynamically select models based on task characteristics.
```yaml
task-router:
  description: Routes tasks to agents with model selection
  logic:
    - if: task.complexity == "low"
      then:
        agent: appropriate-agent
        model: haiku
    - if: task.complexity == "medium"
      then:
        agent: appropriate-agent
        model: sonnet
    - if: task.complexity == "high"
      then:
        agent: appropriate-agent
        model: opus
```

This isn't built-in behavior—you'd implement this in your orchestrator logic. But it's a pattern that works: assess task complexity, dispatch to the appropriate model tier.
Measuring and Tracking Agent Spending
Here's what most teams miss: you can't optimize what you don't measure. You need visibility into how much each agent is costing, which models are being used, where your budget is going.
Claude Code doesn't ship with automatic cost tracking, but you can build it. The approach:
Log every agent execution. Capture:
- Agent name
- Model used
- Task description
- Input token count
- Output token count
- Timestamp
- Success/failure
```yaml
cost-tracker:
  format: |
    timestamp: 2026-03-16T14:23:45Z
    agent: test-engineer
    model: haiku
    task: Generate tests for user auth module
    input_tokens: 1250
    output_tokens: 3400
    cost: 0.0146 (calculated)
    status: success
```

You can log this to a file, a database, or a monitoring system. The point is capturing the data.
Calculate actual costs. Using current token prices:
- Haiku: $0.80 per million input tokens, $4 per million output tokens
- Sonnet: $3 per million input tokens, $15 per million output tokens
- Opus: $15 per million input tokens, $75 per million output tokens
Cost = (input_tokens / 1_000_000 * input_rate) + (output_tokens / 1_000_000 * output_rate)
For the test-engineer example above with Haiku:
Input cost: 1250 / 1_000_000 * 0.80 = $0.001
Output cost: 3400 / 1_000_000 * 4.00 = $0.0136
Total: $0.0146
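The formula drops straight into code. A minimal sketch: the rate table mirrors the pricing listed earlier, and the function name is ours:

```python
# Prices per million tokens, matching the tiers discussed above.
PRICES = {
    "haiku":  {"input": 0.80,  "output": 4.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
    "opus":   {"input": 15.00, "output": 75.00},
}

def execution_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agent execution."""
    rates = PRICES[model]
    return (input_tokens / 1_000_000 * rates["input"]
            + output_tokens / 1_000_000 * rates["output"])

# The test-engineer example: 1,250 input and 3,400 output tokens on Haiku
# comes to about $0.0146, as calculated above.
```

Feed this function from your execution logs and summing per agent or per week becomes a one-liner.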
Aggregate by agent and time period. Weekly, monthly, by project, by agent type. Answer questions like:
- Which agents are most expensive?
- How much did test writing cost this month?
- What's our Haiku vs. Sonnet vs. Opus split?
Set budgets and alerts. Once you understand your spending pattern, set targets. If you budgeted $500/month for agent work and you're on track for $800, you need to reoptimize.
Here's a simple tracking template you can implement:
```yaml
cost-analysis:
  period: "2026-03-01 to 2026-03-16"
  agents:
    test-engineer:
      model: haiku
      executions: 245
      total_input_tokens: 306250
      total_output_tokens: 833000
      total_cost: "$3.58"
      average_cost_per_execution: "$0.0146"
    code-reviewer:
      model: sonnet
      executions: 87
      total_input_tokens: 267450
      total_output_tokens: 412300
      total_cost: "$6.99"
      average_cost_per_execution: "$0.0803"
    architect:
      model: opus
      executions: 12
      total_input_tokens: 156700
      total_output_tokens: 89250
      total_cost: "$9.04"
      average_cost_per_execution: "$0.75"
  summary:
    total_executions: 344
    total_cost: "$19.61"
    biggest_cost_driver: "architect (opus) at 46% of spend"
    optimization_opportunity: "Consider running architect on Sonnet for lower-complexity decisions"
```

This is the data you need to make smart decisions. Without it, you're flying blind.
The Hidden Costs of Wrong Model Selection
Before we jump into optimization tactics, let's talk about something that catches most teams off guard: the hidden costs of picking the wrong model.
The Haiku False Economy. You pick Haiku thinking it's cheap. It is, per token. But then you get back output that's borderline. The model made an assumption you wouldn't have. It missed an edge case. Now you need to re-run with Sonnet. You've paid for the Haiku attempt plus the Sonnet re-run, more than if you'd used Sonnet upfront. Meanwhile, you've delayed your pipeline.
This happens a lot in code generation. Haiku can scaffold basic boilerplate fine. But for anything with business logic nuance, you often get back code that's almost right—the kind of "almost right" that costs hours to debug because the logic is close enough that you don't spot the problem immediately.
The Opus Overkill Tax. On the flip side, teams using Opus for everything pay an unnecessary premium. You're running your linter on Opus. Your formatter on Opus. Your test scaffolding on Opus. Each of these could run on Haiku for a small fraction of the cost (well under a tenth, per the pricing above) with identical quality. If you have a team of five engineers, and each runs 100 simple tasks per day on Opus instead of Haiku, you're looking at thousands of dollars in unnecessary costs monthly.
The lesson: careless model defaults cost more than thoughtful model selection. Spend the time to match models to tasks. It pays back fast.
Latency as a Hidden Cost Factor. Here's something that doesn't show up in token calculations: user experience. Opus is slower. Sometimes significantly slower. If your agent is on a critical path—code review that blocks merges, validation that gates deployments—latency becomes an actual cost. Slower merges mean slower feature delivery. Slower feature delivery means lost revenue.
If your test validator runs on Opus and takes 45 seconds per test suite vs. 15 seconds on Haiku, and each developer runs 100 test suites daily, that's 50 minutes of wait time daily per developer. On a five-person team, that's 4+ hours daily of developer idle time. Assuming $50/hour fully loaded cost, that's $200/day. Over a year, that's $50,000 in productivity loss to save maybe $15/day on token costs. That math doesn't work.
The Cascading Rework Problem. One more hidden cost: when agents make mistakes, rework cascades. A test generator using Haiku produces bad tests. They pass locally but fail in CI. The engineering team investigates. Thirty minutes lost on debugging. One agent picking the wrong model cascades into team-wide slowdown.
You can't just look at the cost of the agent. You have to look at the cost of its failure mode. If Haiku has a 5% failure rate on your test generation but Sonnet has a 0.5% failure rate, and each failure costs an hour of investigation, Sonnet is cheaper even at 10x the token cost.
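The failure-rate arithmetic above can be made concrete. This sketch uses the hypothetical numbers from the paragraph (5% vs. 0.5% failure rates, an hour of investigation per failure); the hourly rate and token costs are assumptions for illustration:

```python
def true_cost_per_task(token_cost: float, failure_rate: float,
                       rework_cost: float) -> float:
    """Expected cost of one task: direct token spend plus the
    expected cost of investigating its failures."""
    return token_cost + failure_rate * rework_cost

# Assumed: an hour of investigation at $50 fully loaded, and
# illustrative token costs with Sonnet at 10x Haiku.
haiku_true = true_cost_per_task(token_cost=0.015, failure_rate=0.05, rework_cost=50.0)
sonnet_true = true_cost_per_task(token_cost=0.15, failure_rate=0.005, rework_cost=50.0)
# haiku_true ≈ $2.52 per task; sonnet_true ≈ $0.40 per task
```

Even at ten times the token cost, Sonnet comes out roughly six times cheaper once rework is counted, which is exactly the point of measuring true cost.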
This is why measurement matters. You need to track not just direct costs but true costs including rework, latency impact, and failure cascades.
Real-World Optimization Strategies
Now let's talk actual tactics. Here are patterns that work:
Strategy 1: Tiered Execution
For tasks where quality matters but cost matters too, implement a tiered execution strategy:
- Run with Haiku first. Generate initial output, keep it cheap.
- Validate the output. Check against success criteria.
- Escalate on failure. If output doesn't meet standards, re-run with Sonnet.
Example: test generation.
```yaml
test-pipeline:
  stage-1-generate:
    agent: test-engineer
    model: haiku
    task: Generate comprehensive test suite
  stage-2-validate:
    agent: test-validator
    model: haiku
    task: Check that generated tests actually pass
    condition: "if validation passes, done; otherwise trigger stage-3"
  stage-3-escalate:
    condition: "if stage-1 output fails validation"
    agent: test-engineer
    model: sonnet
    task: Regenerate tests for the failing cases
```

This approach keeps costs low when Haiku works, but escalates when it doesn't. You're paying Sonnet prices only for the cases that need them.
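The escalation loop itself is simple. In this sketch, `run_agent` and `validate` are stand-ins for however you dispatch to your agents and check their output; both names are ours:

```python
from typing import Callable

def tiered_generate(task: str,
                    run_agent: Callable[[str, str], str],
                    validate: Callable[[str], bool]) -> tuple[str, str]:
    """Try the cheap tier first; escalate only when validation fails.

    run_agent(model, task) dispatches to your agent runner; validate()
    checks output against your success criteria. Returns the model
    that produced an acceptable result, plus the output."""
    for model in ("haiku", "sonnet"):
        output = run_agent(model, task)
        if validate(output):
            return model, output
    # Both tiers failed validation: surface for human review
    # rather than silently shipping a bad result.
    raise RuntimeError(f"validation failed on all tiers for task: {task!r}")
```

If 90% of tasks pass on the first try, your expected cost per task is roughly 0.9 × Haiku plus 0.1 × (Haiku + Sonnet), far below paying Sonnet for everything.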
Strategy 2: Hybrid Decomposition
For complex tasks, decompose into subtasks and assign different models:
```yaml
feature-implementation:
  subtask-1-design:
    agent: architect
    model: opus
    cost: Higher, but designs architecture once
  subtask-2-implementation:
    agent: code-generator
    model: sonnet
    description: "Implement design, follow architect's blueprint"
  subtask-3-testing:
    agent: test-engineer
    model: haiku
    description: "Generate tests for implementation"
  subtask-4-review:
    agent: code-reviewer
    model: sonnet
    description: "Review implementation against design"
```

You pay for Opus thinking once (the architecture), then Sonnet for substantive work, and Haiku for volume. This is cheaper than running everything on Opus.
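Here's a rough cost comparison of that pipeline against an all-Opus run, using the per-million-token rates from the pricing table. The per-stage token volumes are assumed for illustration, not measured:

```python
# Per-million-token rates from the pricing table: (input, output).
RATES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}

# Assumed token volumes per subtask: (model, input_tokens, output_tokens).
PIPELINE = [
    ("opus",   4_000,  2_000),   # design: one expensive architecture pass
    ("sonnet", 6_000,  8_000),   # implementation against the blueprint
    ("haiku",  5_000, 10_000),   # test generation: high volume, cheap
    ("sonnet", 8_000,  3_000),   # review against the design
]

def stage_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

hybrid_cost = sum(stage_cost(*stage) for stage in PIPELINE)
all_opus_cost = sum(stage_cost("opus", i, o) for _, i, o in PIPELINE)
# hybrid ≈ $0.46 per feature vs. ≈ $2.07 all-Opus: roughly 4.5x cheaper
```

The exact ratio depends on your token mix, but the shape holds: one Opus stage plus cheaper tiers beats Opus everywhere.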
Strategy 3: Batch Processing with Haiku
For tasks that can tolerate some failures, batch them with Haiku:
```yaml
batch-optimization:
  task: "Lint and format 500 files in codebase"
  agent: style-enforcer
  model: haiku
  parallelization: 10 concurrent
  expected_cost: "$2.50"
  alternative_with_sonnet: "$9.40"
  validation:
    check_all_files_processed: true
    check_no_files_corrupted: true
  if_validation_fails: "re-run failures with sonnet"
```

Haiku blasts through linting 500 files cheaply. If a few fail validation, Sonnet re-runs those exceptions. Net cost is dramatically lower than running everything on Sonnet.
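A batch-then-retry loop might look like the following sketch. `lint_with` and `validate` are placeholders for your own agent dispatch and validation logic:

```python
from concurrent.futures import ThreadPoolExecutor

def batch_lint(files, lint_with, validate, max_workers=10):
    """Run the cheap model over the whole batch in parallel, then
    re-run only the validation failures on the stronger model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(zip(files, pool.map(lambda f: lint_with("haiku", f), files)))
    failures = [f for f, result in results.items() if not validate(result)]
    for f in failures:  # the expensive model touches only the exceptions
        results[f] = lint_with("sonnet", f)
    return results
```

Because the retry set is small, the Sonnet cost stays marginal even when a handful of files need it.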
Strategy 4: Caching and Reuse
For tasks that repeat, cache results:
```yaml
documentation-generation:
  task: Generate API documentation for 50 endpoints
  approach-1-generate-all-with-sonnet: "$18.00"
  approach-2-smart-caching:
    step-1-generate-with-haiku: "Generate initial docs"
    step-2-identify-unique-patterns: "Group by similarity (haiku)"
    step-3-generate-exemplars-with-sonnet: "Create best-in-class examples"
    step-4-apply-templates: "Use exemplars as templates for rest (haiku)"
    total_cost: "$2.80"
```

You're leveraging Sonnet's better output for a few examples, then Haiku templates the rest. Much cheaper.
Advanced: Dynamic Model Selection and Load-Based Routing
Once you've got the basics down, you can get sophisticated. Some teams implement dynamic model selection based on runtime conditions.
Confidence-based escalation works like this: Haiku generates output and includes a confidence score. If confidence is high (over 90%), the output goes directly to production. If confidence is medium (60-90%), it gets reviewed by a Sonnet agent who either rubber-stamps it or requests revisions. If confidence is low (under 60%), it escalates directly to Sonnet from the start.
```yaml
confidence-escalation:
  stage-1-haiku-generation:
    agent: code-generator
    model: haiku
    task: Generate code for feature X
    output_includes: confidence_score
  stage-2-dynamic-routing:
    if: confidence_score > 0.9
    then: "Pass to production"
    else_if: confidence_score > 0.6
    then: "Sonnet review"
    else: "Sonnet regenerate"
```

This approach uses Haiku as the default path but provides escape hatches for when Haiku isn't confident. Over time, you build a profile of when Haiku's confidence correlates with actual quality.
Load-based model selection is another tactic. If your agent queue is overloaded and latency is spiking, shift non-critical tasks down to Haiku. If your queue is clear and you have capacity, use Sonnet to improve quality. This keeps your pipeline flowing while being budget-conscious when possible.
```yaml
load-aware-routing:
  queue_depth: get_queue_length()
  if: queue_depth > 100
  then:
    - "Use Haiku for low-priority tasks"
    - "Reserve Sonnet for high-priority"
  else_if: queue_depth < 20
  then:
    - "Use Sonnet across the board"
    - "Prioritize quality over cost"
  else:
    - "Use standard model assignment"
```

This is more advanced—it requires monitoring queue depth and making real-time routing decisions. But it's powerful: the system automatically leans toward cost savings when it's busy and toward quality when capacity is available.
Temperature and token limit adjustments are another lever. For a given model, you can adjust the temperature parameter (which controls randomness) and max tokens (which caps output length). Lower temperature makes Haiku produce more deterministic output at the cost of variety. Limiting tokens forces more concise answers, which reduces cost.
Some teams have "tight mode" (lower temperature, token limits) and "creative mode" (higher temperature, more tokens) variants of the same agent running on different models. The tight-Haiku variant handles straightforward tasks cheaply. The creative-Opus variant handles novel problems where you need the full power.
These advanced techniques aren't necessary to start optimizing. But as your agent infrastructure matures, they're worth exploring.
Pitfalls to Avoid
Let me call out the common mistakes:
Pitfall 1: Using Opus by default. Many teams do this because it "just works"—they don't measure costs and don't realize they're overspending. Set Sonnet as your default, then move up or down based on need.
Pitfall 2: Assuming Haiku is always cheaper. It is on a per-token basis, but if Haiku produces lower-quality output that needs human review, fixes, or re-runs, the total cost balloons. Always measure true cost including rework.
Pitfall 3: Not measuring at all. You can't optimize what you don't measure. Even rough cost tracking beats guessing. Spend a few hours setting up logging and you'll recoup the time in savings within weeks.
Pitfall 4: Ignoring latency in cost calculations. Haiku is 2-3x faster than Opus, and that speed is worth real money. If a slow model sits on a critical path—customer-facing features, merge-blocking checks—its token savings can be eaten by the cost of waiting. Factor in the full picture.
Pitfall 5: Static model assignments. Don't hardcode "this agent always uses Haiku." Build logic to escalate when needed. A test generator that escalates to Sonnet on failure is smarter than one that pushes through with poor tests.
Pitfall 6: Forgetting about context carryover. Some agents maintain context across multiple invocations. Switching models mid-context can cause coherence issues. Be intentional about when you switch.
Case Study: Real Cost Optimization in Action
Let's walk through a real example. Imagine a team that recently deployed an agent infrastructure for code review and test generation. The initial setup has everything on Sonnet (the safe default). They're spending about $1,300/month in agent costs.
Here's the breakdown (assuming a 30-day month):
- Code linting (50 executions/day): ~$0.225/execution on Sonnet. Monthly: $337.50
- Test generation (100 executions/day): ~$0.27/execution on Sonnet. Monthly: $810
- Code review (20 executions/day): ~$0.10/execution on Sonnet. Monthly: $60
- Bug analysis (10 executions/day): ~$0.405/execution on Sonnet. Monthly: $121.50
Total: ~$1,329/month.
Now they apply model optimization:
1. Move linting to Haiku. Code linting has objective success criteria (the code passes the linter or it doesn't). They move it to Haiku. Cost per execution drops from $0.225 to about $0.06, since Haiku's tokens cost roughly a quarter of Sonnet's. Monthly: $90. Savings: ~$247.
2. Move test generation to tiered execution. They run test generation on Haiku first. The test validator (also Haiku) checks if tests pass. About 90% do. For the 10% that fail, they escalate to Sonnet. New cost: roughly $0.10/execution (90% of runs pay only Haiku's ~$0.07; the rest pay Haiku plus a Sonnet retry). Monthly: ~$300 (down from $810). Savings: ~$510.
3. Keep code review on Sonnet. Code review requires judgment about code quality, architectural fit, maintainability. That's not Haiku's strength. They keep this on Sonnet but add feedback loops so reviewers can suggest cheaper approaches (higher-level reviews on Haiku if the change is straightforward).
4. Move bug analysis to hybrid. A naive hybrid (Haiku does an initial pass and Sonnet always digs deeper) would cost about $0.51/execution, or $154/month: more than the original $121.50. So they refine the split: basic log analysis stays on Haiku, and Sonnet digs deeper only when the first attempt fails. New cost: ~$0.35/execution. Monthly: $105. Savings: $16.50.
New total: ~$555/month.
Total savings: ~$774/month, or about 58%.
What they didn't lose:
- Quality on linting: identical, it's objective
- Quality on tests: slightly higher due to escalation to Sonnet for edge cases
- Quality on code review: identical, still on Sonnet
- Quality on bug analysis: slightly higher due to Sonnet handling complex cases
What they gained:
- Visibility into costs (tracking every execution)
- Ability to adjust as task distribution changes
- Confidence that they're not overpaying for any task
- Speed improvements (Haiku is faster, so linting and tests generate faster)
- Room in budget to add new agents without increasing spend
The whole optimization took them 3-4 hours of work once. The payoff: $9,000/year savings with no quality drop and actual quality improvements in some areas.
This is what's possible when you apply systematic model selection. The numbers will vary based on your task mix, but the pattern holds: most teams can cut agent costs 40-60% while improving or maintaining quality.
Building Your Model Selection Framework
If you're starting from scratch, here's a template process:
Week 1: Baseline and Audit
- Deploy cost logging to all agents
- Collect one week of data
- Categorize agents by role (test, review, lint, etc.)
- Calculate cost per execution for each agent
- Identify your top 5 cost drivers
Week 2: Reassess and Classify
- Using your complexity assessment framework, classify each agent
- Mark as Haiku-candidate, Sonnet-appropriate, or Opus-required
- For Haiku-candidates, identify objective success criteria
- For Opus-required agents, document why they can't move down
Week 3: Pilot Optimization
- Move your single biggest cost driver (usually test generation) to optimized model
- Deploy to a staging environment first
- Measure: cost change, quality change, latency change
- Document results
Week 4: Scale and Monitor
- Roll out changes to production
- Monitor quality metrics and cost metrics daily for the first week
- Adjust if issues arise
- Apply learnings to next wave of agents
Then iterate. Each cycle through your agents should take 2-3 weeks as you refine your understanding of which models work for which tasks.
Expected Output and Summary
When you've optimized your agent model selection, here's what you should see:
- Cost per task drops 30-60% compared to running everything on Sonnet or Opus
- Execution speed improves for high-volume tasks (Haiku is faster)
- Quality stays consistent or improves (because you're matching model to task, not overspending on overkill tasks and underspending on hard ones)
- Clear visibility into spending by agent and agent type
- Actionable optimization opportunities identified in your cost tracking
Your agent team becomes lean and efficient. You're not paying for thinking power you don't need, but you're bringing in Opus when the problem demands it.
Start with a simple framework:
- Classify your tasks by complexity
- Assign base models (Haiku for simple, Sonnet for moderate, Opus for complex)
- Build a cost tracker to see what you're actually spending
- Identify top cost drivers and optimize them first
- Iterate based on real data
The teams doing this well report 40-50% cost reductions within the first month while maintaining or improving output quality. You can get there too.
Model selection isn't just about being cheap—it's about being smart. Use the fastest, cheapest tool that solves the problem. Scale with Haiku where it works. Bring in Sonnet for nuance. Reserve Opus for genius. That's the agent cost optimization game, and when you play it right, everyone wins.