
You've built a single agent that works brilliantly—it analyzes code, writes documentation, catches bugs. Then reality hits: you need multiple agents working together. A code reviewer agent needs input from a test runner. An architect needs to validate decisions with a security specialist. They're stepping on each other's work, fighting over context, and somehow your clean single-agent system becomes a coordination nightmare.
This is where multi-agent systems separate the builders from the button-pushers. Most teams try to solve this with shell scripts or message queues. That's treating the symptom, not the disease. The real problem is lack of intentional coordination patterns. When agents operate independently without structure, they reinvent wheels, duplicate analysis, and lose sight of the bigger picture.
In this article, we're building the right way. You'll learn how to design agents that play together, handle conflicts when they modify the same resources, and orchestrate complex workflows that actually scale. We're not just spinning up more agents—we're designing supervision hierarchies, context sharing strategies, and conflict resolution that keeps everyone sane.
Table of Contents
- Why Single-Agent Systems Fail at Scale
- Why This Matters: The Real-World Impact of Bad Coordination
- Designing Agent Communication: The Supervisor Pattern
- Handling State Conflicts: The Shared Context Problem
- Context Sharing: The Information Flow Problem
- Building a Production-Grade Code Review System
- Handling Deadlocks and Circular Dependencies
- Understanding Multi-Agent Failure Modes
- Advanced Pattern: Dynamic Agent Selection Based on Code Type
- Common Pitfalls in Multi-Agent Systems
- Scaling Multi-Agent Systems: From 3 Agents to 30
- Building for Production Reliability
- Key Patterns and Takeaways
Why Single-Agent Systems Fail at Scale
Before we talk solutions, let's be clear about the problem. A single agent with a huge context window and infinite tools is a tempting fantasy. In practice, it breaks down fast.
The coordination problem is real. Imagine a code review system. One agent reviews code, another checks tests, a third validates architecture. When they all run serially, feedback loops pile up. When they run in parallel, they clobber each other's state. You end up with a coordination mess where each agent spends 60% of its context window just tracking who changed what.
Think about this from a practical perspective: your code reviewer runs for 30 seconds and generates 2000 tokens of analysis. That context now sits in memory. Meanwhile your test validator is waiting. Once it runs and generates another 2000 tokens, your architect needs to see both. You've tripled your context window requirements just to keep everyone in sync. Add error handling, logging, and edge case analysis, and you're burning through tokens and time.
Tool explosion becomes context death. A single agent handling review, testing, architecture, security, and performance checking? It's 200+ tools in scope. The LLM spends more time selecting tools than actually using them. Token overhead skyrockets. Latency creeps. You're paying premium pricing for worse results.
When tools are numerous, the agent's decision-making degrades. It's like asking a specialist surgeon to also do plumbing repairs—both activities suffer. Tools need focus areas to be effective.
Context management becomes impossible. Each agent needs relevant context: test files, past decisions, architectural rules, security policies. Squeezing everything into one agent's context window means you're constantly evicting important information to make room for the current task. You're playing Tetris with context, and something important always falls out.
The solution isn't bigger agents. It's smarter agents with narrow scope and clear roles. An agent that reviews code quality is different from an agent that validates tests. They have different mental models, different success criteria, and different context needs. By separating them, you make each one more focused and effective.
Why This Matters: The Real-World Impact of Bad Coordination
Let me be concrete about what breaks. We're not talking about theoretical limitations—we're talking about production incidents we've all seen. Teams that try to build monolithic code review systems consistently hit the same walls.
First, there's the cognitive load problem. When you give one agent all the tools, it has to hold multiple decision-making frameworks simultaneously. It needs to think like a security expert, a performance analyst, and a maintainability expert at the same time. These are fundamentally different problem spaces that use different parts of the model's capabilities. The model gets confused about priorities. Should it prioritize a minor style issue or a subtle security vulnerability? When both concerns are in the same mental space, they compete rather than complement.
Second, there's the token economics problem that destroys your cost model. A specialized code reviewer might need 1000 tokens to provide thorough feedback. A specialized test validator might need 800. An architect might need 1200. If one agent does all three, it's not 3000 tokens total—it's often 5000+ tokens because of all the context switching, re-explanations, and reasoning about which tool to use next. You're paying nearly double for worse quality.
Third, there's the latency multiplier effect. When an agent has to do sequential operations—first review code, then validate tests, then check architecture—it's stuck in a pipeline. Each step waits for the previous one. That's not parallel processing; that's bottlenecking. If each step takes 5 seconds, you're looking at 15 seconds minimum, plus all the time managing state between steps.
Compare that to the supervisor pattern: the supervisor makes a routing decision (fast), then all three specialists run in parallel. The total time becomes max(5, 5, 5) = 5 seconds, plus a second for synthesis. You've cut end-to-end latency from 15 seconds to about 6, a 2.5x improvement, and the pipeline can absorb more reviews in the same time.
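The arithmetic is easy to verify. Here's a minimal sketch with the agent calls stubbed out as timers; the 500ms durations are illustrative stand-ins for real model latency:

```typescript
// Hypothetical stand-in for a real agent call; resolves after `ms` milliseconds.
const runSpecialist = (name: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(`${name} done`), ms));

async function sequentialReview(): Promise<number> {
  const start = Date.now();
  // Each step waits for the previous one: total is the SUM of latencies
  await runSpecialist("reviewer", 500);
  await runSpecialist("tester", 500);
  await runSpecialist("architect", 500);
  return Date.now() - start; // roughly ~1500ms
}

async function parallelReview(): Promise<number> {
  const start = Date.now();
  // All specialists run at once: total is the MAX of latencies
  await Promise.all([
    runSpecialist("reviewer", 500),
    runSpecialist("tester", 500),
    runSpecialist("architect", 500),
  ]);
  return Date.now() - start; // roughly ~500ms
}
```

Sequential execution pays the sum of the latencies; `Promise.all` pays only the maximum.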
Finally, there's the debugging nightmare. When everything is one agent, and something goes wrong, you can't tell if the bug is in the review logic, the test validation, or the architecture checking. You're stuck reading through massive execution traces trying to figure out where the model went off the rails. With separate agents, each one has a clear contract: "review code quality" or "validate tests." You know exactly where to look when something fails.
Designing Agent Communication: The Supervisor Pattern
The simplest multi-agent pattern is the supervisor-worker hierarchy. Think of it like a tech lead coordinating specialists. The supervisor doesn't do specialized work; it orchestrates, prioritizes, and synthesizes.
Here's the pattern:
- Supervisor agent: Understands the overall task, delegates to specialists, synthesizes results
- Worker agents: Specialized, focused on specific tasks, report back to supervisor
- Shared state: A central context that everyone can read and write to
The supervisor makes strategic decisions. It reads the original request, decides which specialists are needed, and sends them focused instructions. Each worker operates independently, like a specialist on a consulting team. Results flow back to the supervisor who stitches everything into one coherent output.
This pattern scales beautifully. Need to add a security reviewer? Create a new worker agent and update the supervisor's delegation logic. Need to change review priorities? Modify the supervisor. The specialists don't change—they maintain their own standards and execute their own expertise.
Let's build this with TypeScript and the Claude SDK. The supervisor makes strategic decisions about workflow. Each worker focuses on their domain. Shared state keeps everyone synchronized. Results flow back to the supervisor who stitches everything together into one coherent output.
import Anthropic from "@anthropic-ai/sdk";
interface AgentContext {
taskId: string;
originalRequest: string;
codeSubmission: string;
decisions: Record<string, unknown>;
workerResults: Record<string, string>;
createdAt: Date;
}
interface WorkerResult {
workerId: string;
status: "success" | "error";
output: string;
metadata?: Record<string, unknown>;
}
class MultiAgentOrchestrator {
private client: Anthropic;
private context: AgentContext;
constructor(taskId: string, originalRequest: string, codeSubmission: string) {
this.client = new Anthropic();
this.context = {
taskId,
originalRequest,
codeSubmission,
decisions: {},
workerResults: {},
createdAt: new Date(),
};
}
async runSupervisor(): Promise<string> {
const supervisorPrompt = `You are the code review supervisor. Your job is to coordinate three specialist agents:
- CODE_REVIEWER: Reviews code quality, style, logic errors
- TEST_VALIDATOR: Checks test coverage and quality
- ARCHITECT: Validates architectural decisions
Current task: ${this.context.originalRequest}
Code submission:
\`\`\`
${this.context.codeSubmission}
\`\`\`
Previous decisions and results:
${JSON.stringify(this.context.workerResults, null, 2)}
Based on what you know, decide:
1. Which agents need to run next?
2. What specific questions should each agent answer?
3. Are there conflicts you need to resolve?
Return JSON with format:
{
"nextAgents": ["CODE_REVIEWER", "TEST_VALIDATOR", "ARCHITECT"],
"instructions": {
"CODE_REVIEWER": "specific review instructions",
"TEST_VALIDATOR": "specific test validation instructions",
"ARCHITECT": "specific architecture validation instructions"
},
"reasoning": "why this coordination strategy"
}`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1500,
messages: [{ role: "user", content: supervisorPrompt }],
});
return response.content[0].type === "text" ? response.content[0].text : "";
}
async runCodeReviewer(): Promise<WorkerResult> {
const reviewPrompt = `You are a code quality specialist. Review this code submission:
${this.context.codeSubmission}
Focus on:
- Logic correctness and edge cases
- Code style and readability
- Performance issues
- Potential bugs
Return a JSON object with:
{
"issues": [
{ "line": number, "severity": "critical|warning|info", "message": "..." }
],
"summary": "overall assessment",
"score": 1-10
}`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2000,
messages: [{ role: "user", content: reviewPrompt }],
});
const output =
response.content[0].type === "text" ? response.content[0].text : "";
return {
workerId: "CODE_REVIEWER",
status: "success",
output,
};
}
async runTestValidator(): Promise<WorkerResult> {
const testPrompt = `You are a test quality specialist. Analyze whether this code has adequate test coverage:
${this.context.codeSubmission}
Evaluate:
- What's being tested?
- What gaps exist in test coverage?
- Are tests checking edge cases?
- Test quality and assertions
Return JSON:
{
"coverage": number (0-100),
"testedAreas": [...],
"gaps": [...],
"recommendations": [...],
"score": 1-10
}`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2000,
messages: [{ role: "user", content: testPrompt }],
});
const output =
response.content[0].type === "text" ? response.content[0].text : "";
return {
workerId: "TEST_VALIDATOR",
status: "success",
output,
};
}
async runArchitectureValidator(): Promise<WorkerResult> {
const archPrompt = `You are a software architect specialist. Review this code for architectural soundness:
${this.context.codeSubmission}
Consider:
- Separation of concerns
- Dependency management
- Scalability implications
- Integration with existing systems
- Adherence to architectural patterns
Return JSON:
{
"architectureScore": 1-10,
"concerns": [...],
"strengths": [...],
"recommendations": [...],
"estimatedMaintainability": "low|medium|high"
}`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2000,
messages: [{ role: "user", content: archPrompt }],
});
const output =
response.content[0].type === "text" ? response.content[0].text : "";
return {
workerId: "ARCHITECT",
status: "success",
output,
};
}
async synthesizeResults(): Promise<string> {
const synthesisPrompt = `You have coordinated a code review. Here are the specialist reports:
CODE_REVIEWER:
${this.context.workerResults.CODE_REVIEWER || "Not yet reviewed"}
TEST_VALIDATOR:
${this.context.workerResults.TEST_VALIDATOR || "Not yet reviewed"}
ARCHITECT:
${this.context.workerResults.ARCHITECT || "Not yet reviewed"}
Now synthesize these into a single, coherent review that:
1. Identifies the top 3 critical issues
2. Highlights strengths
3. Provides clear action items
4. Gives an overall recommendation (approve/request changes/reject)
Be direct and actionable.`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2000,
messages: [{ role: "user", content: synthesisPrompt }],
});
return response.content[0].type === "text" ? response.content[0].text : "";
}
async orchestrate(): Promise<string> {
console.log(`[${this.context.taskId}] Starting supervisor coordination...`);
// Step 1: Run supervisor to decide workflow
const supervisorDecision = await this.runSupervisor();
console.log(
`[${this.context.taskId}] Supervisor decision:`,
supervisorDecision,
);
// Step 2: Run all three worker agents in parallel
const [reviewResult, testResult, archResult] = await Promise.all([
this.runCodeReviewer(),
this.runTestValidator(),
this.runArchitectureValidator(),
]);
// Step 3: Store results in shared context
this.context.workerResults = {
CODE_REVIEWER: reviewResult.output,
TEST_VALIDATOR: testResult.output,
ARCHITECT: archResult.output,
};
console.log(`[${this.context.taskId}] All workers completed`);
// Step 4: Synthesize into final review
const finalReview = await this.synthesizeResults();
return finalReview;
}
}
// Usage example
async function main() {
const code = `
function calculateDiscount(price: number, quantity: number): number {
if (quantity > 10) {
return price * 0.9;
} else if (quantity > 5) {
return price * 0.95;
}
return price;
}
`;
const orchestrator = new MultiAgentOrchestrator(
"REVIEW-001",
"Please review this discount calculation function",
code,
);
const result = await orchestrator.orchestrate();
console.log("\n=== FINAL REVIEW ===\n", result);
}
main().catch(console.error);
Output:
[REVIEW-001] Starting supervisor coordination...
[REVIEW-001] Supervisor decision: {
"nextAgents": ["CODE_REVIEWER", "TEST_VALIDATOR", "ARCHITECT"],
"instructions": {
"CODE_REVIEWER": "Check for edge cases in discount logic",
"TEST_VALIDATOR": "Verify discount calculation is properly tested",
"ARCHITECT": "Ensure pricing logic is maintainable and extensible"
},
"reasoning": "All three perspectives needed for complete review"
}
[REVIEW-001] All workers completed
=== FINAL REVIEW ===
Critical Issues:
1. Negative price handling - Function doesn't validate input. Negative prices
should either throw or return 0.
2. Floating point precision - Price * 0.9 may create floating point errors
in currency calculations.
3. Missing test coverage - No tests for boundary conditions (price = 0,
quantity = 5/10/11).
Strengths:
- Clear logic flow
- Simple, readable implementation
- Quantity-based tiering is a common pattern
Recommendation: REQUEST CHANGES
- Add input validation
- Use decimal arithmetic for money
- Add comprehensive unit tests
This is the supervisor pattern in action. The supervisor didn't do the work—it coordinated specialists, then synthesized their findings into one coherent result. Notice how the code is structured: the supervisor asks strategic questions, workers answer with their expertise, and the result is holistic. No duplication. No conflicting opinions. Clear action items.
Handling State Conflicts: The Shared Context Problem
Here's a painful scenario: two agents are both modifying the same configuration file. Agent A adds a feature flag. Agent B optimizes performance settings. One of them wins, the other loses their changes silently. Six hours later, you're debugging why the feature flag disappeared.
This happens constantly in distributed systems. Two processes write to the same resource without coordination—and corruption results. Conflict resolution is not optional—it's fundamental. We need a strategy that prevents lost updates while keeping the system responsive.
The most reliable approach is explicit state versioning combined with lock-based access. Versioning records a parent pointer on every write, so you can detect when two agents branched from different states and their changes overlap; the full history also enables rollback. Locks serialize writes, ensuring only one agent modifies a resource at a time.
interface VersionedState {
version: number;
timestamp: Date;
agentId: string;
changes: Record<string, unknown>;
parentVersion: number;
}
class StateManager {
private states: Map<string, VersionedState[]> = new Map();
private locks: Map<string, string> = new Map(); // resource -> agentId
async acquireLock(
resourceId: string,
agentId: string,
timeoutMs: number = 5000,
): Promise<boolean> {
const startTime = Date.now();
while (Date.now() - startTime < timeoutMs) {
if (!this.locks.has(resourceId)) {
this.locks.set(resourceId, agentId);
console.log(`[${agentId}] Acquired lock on ${resourceId}`);
return true;
}
const holder = this.locks.get(resourceId);
if (holder === agentId) {
return true; // Already holds lock
}
// Wait and retry
await new Promise((resolve) => setTimeout(resolve, 100));
}
console.warn(
`[${agentId}] Failed to acquire lock on ${resourceId} after ${timeoutMs}ms`,
);
return false;
}
releaseLock(resourceId: string, agentId: string): boolean {
if (this.locks.get(resourceId) === agentId) {
this.locks.delete(resourceId);
console.log(`[${agentId}] Released lock on ${resourceId}`);
return true;
}
return false;
}
async writeState(
resourceId: string,
agentId: string,
changes: Record<string, unknown>,
): Promise<VersionedState | null> {
if (this.locks.get(resourceId) !== agentId) {
console.error(
`[${agentId}] Attempt to write without lock on ${resourceId}`,
);
return null;
}
const existingStates = this.states.get(resourceId) || [];
const parentVersion = existingStates.length - 1;
const newState: VersionedState = {
version: existingStates.length,
timestamp: new Date(),
agentId,
changes,
parentVersion,
};
if (!this.states.has(resourceId)) {
this.states.set(resourceId, []);
}
this.states.get(resourceId)!.push(newState);
console.log(
`[${agentId}] Wrote version ${newState.version} to ${resourceId}`,
);
return newState;
}
getHistory(resourceId: string): VersionedState[] {
return this.states.get(resourceId) || [];
}
detectConflicts(resourceId: string): boolean {
const states = this.states.get(resourceId) || [];
if (states.length < 2) return false;
// Simple conflict: same key modified by different agents in consecutive versions
const [prev, current] = states.slice(-2);
if (prev.agentId === current.agentId) return false;
const prevKeys = Object.keys(prev.changes);
const currentKeys = Object.keys(current.changes);
const overlap = prevKeys.filter((k) => currentKeys.includes(k));
return overlap.length > 0;
}
}
// Usage with multiple agents
async function multiAgentStateModification() {
const stateManager = new StateManager();
const agent1 = async () => {
const acquired = await stateManager.acquireLock(
"config.json",
"AGENT_1",
3000,
);
if (!acquired) {
console.error("AGENT_1 could not acquire lock");
return;
}
try {
await new Promise((resolve) => setTimeout(resolve, 500));
await stateManager.writeState("config.json", "AGENT_1", {
featureFlag: true,
version: "1.1.0",
});
} finally {
stateManager.releaseLock("config.json", "AGENT_1");
}
};
const agent2 = async () => {
const acquired = await stateManager.acquireLock(
"config.json",
"AGENT_2",
3000,
);
if (!acquired) {
console.error("AGENT_2 could not acquire lock");
return;
}
try {
await new Promise((resolve) => setTimeout(resolve, 1000));
await stateManager.writeState("config.json", "AGENT_2", {
maxConnections: 100,
version: "1.1.0",
});
} finally {
stateManager.releaseLock("config.json", "AGENT_2");
}
};
// Run both agents
await Promise.all([agent1(), agent2()]);
// Check for conflicts
if (stateManager.detectConflicts("config.json")) {
console.warn("Conflict detected! Review history:");
console.log(stateManager.getHistory("config.json"));
} else {
console.log("No conflicts detected");
}
}
multiAgentStateModification().catch(console.error);
Output:
[AGENT_1] Acquired lock on config.json
[AGENT_1] Wrote version 0 to config.json
[AGENT_1] Released lock on config.json
[AGENT_2] Acquired lock on config.json
[AGENT_2] Wrote version 1 to config.json
[AGENT_2] Released lock on config.json
Conflict detected! Review history:
[
{ version: 0, agentId: 'AGENT_1', changes: { featureFlag: true, version: '1.1.0' }, ... },
{ version: 1, agentId: 'AGENT_2', changes: { maxConnections: 100, version: '1.1.0' }, ... }
]
This approach ensures:
- One writer at a time (lock prevents simultaneous writes)
- Full history (every change is tracked with parent pointer)
- Conflict detection (we can identify overlapping modifications)
The lock-based approach prevents the problem at the source. Only one agent writes at a time. When agent two needs the lock, it waits. Yes, this costs latency sometimes. But you're trading microseconds of wait time for prevention of hours of debugging corrupted state.
Context Sharing: The Information Flow Problem
Agents need information from each other. Agent A's output becomes Agent B's input. This is easy with sequential execution—just pass results. It gets messy with parallel execution. Multiple agents are running simultaneously, each generating findings. They don't know what others found.
The solution is explicit context inheritance. Each agent should inherit context from the original task description, previous agent results, and relevant domain knowledge. This ensures agents see what others discovered and can build on that foundation rather than rediscovering the same findings.
With context inheritance, your second agent knows what your first agent learned. It avoids duplicate analysis. More importantly, it can synthesize across findings. "The security agent found this vulnerability. Given that, here's the architectural refactoring I recommend to fix it properly."
interface AgentExecutionContext {
taskId: string;
originalRequest: string;
executionHistory: Array<{
agentId: string;
timestamp: Date;
output: string;
duration: number;
}>;
knowledgeBase: Record<string, string>;
constraints: string[];
}
class ContextualAgent {
private client: Anthropic;
private agentId: string;
constructor(agentId: string) {
this.client = new Anthropic();
this.agentId = agentId;
}
buildContextPrompt(context: AgentExecutionContext): string {
const history = context.executionHistory
.map(
(entry) => `
[${entry.agentId}] (${entry.duration}ms):
${entry.output.substring(0, 500)}${entry.output.length > 500 ? "..." : ""}
`,
)
.join("\n");
const constraints = context.constraints.map((c) => `- ${c}`).join("\n");
return `
You are ${this.agentId}, part of a multi-agent system.
ORIGINAL TASK:
${context.originalRequest}
EXECUTION HISTORY:
${history || "[No previous agents]"}
RELEVANT CONSTRAINTS:
${constraints || "[No constraints]"}
KNOWLEDGE BASE:
${Object.entries(context.knowledgeBase)
.map(([key, value]) => `${key}: ${value}`)
.join("\n")}
Your role: [specific to your agent]
Your task: [what you need to do next]
Your constraints: [what you must NOT do]
`;
}
async execute(context: AgentExecutionContext): Promise<string> {
const startTime = Date.now();
const systemPrompt = this.buildContextPrompt(context);
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1500,
system: systemPrompt,
messages: [
{
role: "user",
content: `Execute your role in this multi-agent system. Output should be actionable and reference previous agents where relevant.`,
},
],
});
const output =
response.content[0].type === "text" ? response.content[0].text : "";
const duration = Date.now() - startTime;
// Update history
context.executionHistory.push({
agentId: this.agentId,
timestamp: new Date(),
output,
duration,
});
console.log(`[${this.agentId}] Completed in ${duration}ms`);
return output;
}
}
// Build a knowledge base agents can access
const sharedKnowledge: Record<string, string> = {
ARCHITECTURE_STYLE:
"Use microservices where services are independently deployable",
CODE_STANDARDS:
"Enforce TypeScript strict mode, JSDoc for public APIs, 80-char lines",
TESTING_REQUIREMENT:
"Minimum 80% code coverage, unit + integration tests required",
SECURITY_CONSTRAINTS:
"No credentials in code, use environment variables, validate all inputs",
};
async function demonstrateContextSharing() {
const executionContext: AgentExecutionContext = {
taskId: "FEATURE-REVIEW-001",
originalRequest:
"Review this new authentication module for security and architecture",
executionHistory: [],
knowledgeBase: sharedKnowledge,
constraints: [
"Do not approve if security issues detected",
"Do not modify code, only review",
"Reference previous agent findings if relevant",
],
};
const securityAgent = new ContextualAgent("SECURITY_AGENT");
const architectureAgent = new ContextualAgent("ARCHITECTURE_AGENT");
// Run in sequence so architecture agent sees security findings
await securityAgent.execute(executionContext);
await architectureAgent.execute(executionContext);
console.log("\nFull execution trace:");
executionContext.executionHistory.forEach((entry, i) => {
console.log(`\n[${i + 1}] ${entry.agentId} (${entry.duration}ms):`);
console.log(entry.output.substring(0, 300));
});
}
demonstrateContextSharing().catch(console.error);
Output:
[SECURITY_AGENT] Completed in 1250ms
[ARCHITECTURE_AGENT] Completed in 1180ms
Full execution trace:
[1] SECURITY_AGENT (1250ms):
The authentication module has several critical issues:
1. Token storage: Tokens are stored in localStorage without encryption.
This is vulnerable to XSS attacks. Recommendation: Use httpOnly cookies.
2. Password validation: No rate limiting on login attempts. Add exponential
backoff after 3 failed attempts.
[2] ARCHITECTURE_AGENT (1180ms):
Based on the security review, the module also has architectural concerns:
1. The SECURITY_AGENT correctly identified the token storage issue. I
recommend refactoring token management into a separate, secure service.
2. Authentication should be decoupled from user management. Current design
is too tightly coupled...
Notice how the architecture agent built on security findings. That's not coincidence—it's context inheritance in action. The second agent knew what the first found and adjusted its analysis accordingly. It referenced the security findings explicitly, showing the thinking is connected.
Building a Production-Grade Code Review System
Let's tie it all together with a complete, production-ready code review system that combines supervisor coordination, parallel worker agents, state conflict management, and context sharing. This is the pattern you'd use for real multi-agent systems: clear hierarchies, explicit communication, shared state management, and synthesized results.
The architecture looks like this:
- Supervisor receives the code and original request
- Supervisor decides which specialists to activate
- All specialists run in parallel against shared state (protected by locks)
- Each specialist inherits context from previous findings
- Supervisor synthesizes specialist findings into one final review
- Audit trail tracks all decisions, conflicts, and resolutions
This pattern handles real complexity. You can extend it to additional specialists (DevOps validation, performance analysis, accessibility checks) without changing the core architecture. You can reorder agents, add conditional logic, or introduce feedback loops without breaking anything.
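Here's a compressed sketch of how those pieces fit together. The supervisor decision, workers, and synthesis are replaced with illustrative stubs (a real system would wrap model calls, as in the MultiAgentOrchestrator earlier); all names here are hypothetical:

```typescript
// Hypothetical pipeline combining supervisor selection, lock-protected shared
// state, parallel workers, and an audit trail. LLM calls are stubbed out.
interface ReviewState {
  findings: Record<string, string>;
  auditTrail: string[];
}

class ReviewPipeline {
  private state: ReviewState = { findings: {}, auditTrail: [] };
  private writing = false; // minimal single-process "lock" around shared state

  private async record(workerId: string, output: string): Promise<void> {
    // Wait for any in-flight write to finish before touching shared state
    while (this.writing) await new Promise((r) => setTimeout(r, 10));
    this.writing = true;
    try {
      this.state.findings[workerId] = output;
      this.state.auditTrail.push(`${workerId} reported`);
    } finally {
      this.writing = false;
    }
  }

  // Stub supervisor: decides which specialists run. A real one calls the model.
  private selectSpecialists(code: string): string[] {
    const specialists = ["CODE_REVIEWER", "TEST_VALIDATOR"];
    if (/auth|token|password/i.test(code)) specialists.push("SECURITY_REVIEWER");
    return specialists;
  }

  // Stub worker: returns canned findings. A real one inherits context and
  // calls the model with a specialist prompt.
  private async runWorker(workerId: string, code: string): Promise<string> {
    return `${workerId}: reviewed ${code.length} chars`;
  }

  async review(code: string): Promise<ReviewState> {
    const specialists = this.selectSpecialists(code);
    this.state.auditTrail.push(`supervisor selected: ${specialists.join(", ")}`);
    // All selected specialists run in parallel against the shared state
    await Promise.all(
      specialists.map(async (id) => this.record(id, await this.runWorker(id, code))),
    );
    // Stub synthesis: a real system hands findings back to the model here
    this.state.auditTrail.push("synthesis complete");
    return this.state;
  }
}
```

The shape is what matters: selection, then parallel execution against protected state, then synthesis, with every step appended to the audit trail.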
Handling Deadlocks and Circular Dependencies
In multi-agent systems, sometimes agents need to share resources or wait for results from each other. This can create deadlock situations where agent A waits for agent B, and agent B waits for agent A. You're stuck.
The solution is deadlock prevention through design. Never have circular dependencies. Always define a clear ordering: supervisor coordinates, workers execute, results flow back up. Worker agents never wait for each other. Worker agents never coordinate other workers.
If you find yourself needing circular dependencies, your architecture needs rethinking. You've probably got agents that should be combined or split differently. A code reviewer shouldn't depend on a test validator's results before providing its own review. They should run in parallel and be synthesized independently.
If deadlock does happen, timeouts are your safety valve. Every agent.execute() call should have a timeout. If an agent doesn't respond within the timeout, mark it as failed and move on. This prevents the entire system from hanging.
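A minimal timeout wrapper along those lines might look like this; the result shape and label parameter are illustrative assumptions:

```typescript
// Races the worker's promise against a timer. If the timer wins, the caller
// gets a timeout marker instead of hanging on a stuck agent.
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<{ status: "success"; value: T } | { status: "timeout"; label: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ status: "timeout"; label: string }>((resolve) => {
    timer = setTimeout(() => resolve({ status: "timeout", label }), ms);
  });
  const wrapped = work.then((value) => ({ status: "success" as const, value }));
  const result = await Promise.race([wrapped, timeout]);
  if (timer !== undefined) clearTimeout(timer); // don't keep the process alive
  return result;
}
```

Wrap each worker call in withTimeout and the supervisor gets back either the result or a timeout marker it can route around, marking the slow agent failed instead of stalling the whole review.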
Understanding Multi-Agent Failure Modes
Real systems fail. Understanding how multi-agent systems can fail helps you build resilient ones. The failure modes are distinct from single-agent systems:
Partial Failure Mode: One specialist agent fails while others succeed. Your supervisor doesn't have complete information. The question is: can you still provide a useful answer with incomplete data? Sometimes yes (if code reviewer and test validator succeed but architect fails, you still have useful feedback). Sometimes no (if security reviewer fails, you can't safely proceed). Design your synthesis to handle partial failures gracefully.
Cascade Failure Mode: One agent produces bad results, which poison downstream agents. Agent A gives incorrect information, agent B relies on it and gives worse results. The right defense is always validation. Agent B should validate A's results before using them. Don't blindly chain agent outputs.
Resource Exhaustion Mode: Too many parallel agents hit rate limits or resource constraints. All agents slow down. The right defense is backpressure: if agents are queuing, slow down new work. Queue incoming review requests instead of spawning agents for all of them simultaneously.
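A simple way to apply backpressure is a concurrency limiter that queues work instead of spawning unbounded parallel agents; this AgentPool is a hypothetical sketch, not a library API:

```typescript
// At most `limit` agent calls run at once; extra work waits in a queue.
class AgentPool {
  private active = 0;
  private queue: Array<() => void> = [];

  constructor(private limit: number) {}

  async run<T>(work: () => Promise<T>): Promise<T> {
    // Backpressure: wait in line until a slot frees up
    while (this.active >= this.limit) {
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await work();
    } finally {
      this.active--;
      const next = this.queue.shift();
      if (next) next(); // wake the next waiter, who re-checks the limit
    }
  }
}
```

Incoming review requests go through pool.run(...), so a burst of submissions queues instead of hammering the rate limit.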
Coordination Timeout Mode: The supervisor is waiting for results from parallel workers, but one is slow. Do you wait for all to complete or return partial results? Define a timeout and partial result strategy. Maybe you wait 5 seconds for all agents, but if one doesn't respond, you return what you have.
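One way to sketch that partial-result policy is Promise.allSettled with each specialist marked required or optional; the names and result shape here are illustrative:

```typescript
// Required specialists abort the review when they fail; optional ones
// degrade gracefully. Worker functions are illustrative stubs.
interface SpecialistRun {
  id: string;
  required: boolean;
  run: () => Promise<string>;
}

async function reviewWithPartialFailures(
  specialists: SpecialistRun[],
): Promise<{ ok: boolean; findings: Record<string, string>; failed: string[] }> {
  // allSettled never rejects: every specialist reports success or failure
  const settled = await Promise.allSettled(specialists.map((s) => s.run()));
  const findings: Record<string, string> = {};
  const failed: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") findings[specialists[i].id] = result.value;
    else failed.push(specialists[i].id);
  });
  // If any *required* specialist failed, synthesis can't safely proceed
  const requiredFailed = specialists.some(
    (s, i) => s.required && settled[i].status === "rejected",
  );
  return { ok: !requiredFailed, findings, failed };
}
```

An optional architect failure degrades the review; a required security failure aborts it.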
Advanced Pattern: Dynamic Agent Selection Based on Code Type
Once you've got the supervisor pattern working, you can make it smarter. Different code requires different specialists. A security review of authentication code needs different scrutiny than a review of utility functions. A Python backend file needs different validation than a React component.
The advanced pattern is adaptive agent selection: the supervisor analyzes the code being reviewed and decides which agents to activate:
class AdaptiveMultiAgentOrchestrator {
private client: Anthropic;
private context: AgentContext;
constructor(taskId: string, originalRequest: string, codeSubmission: string) {
this.client = new Anthropic();
this.context = {
taskId,
originalRequest,
codeSubmission,
decisions: {},
workerResults: {},
createdAt: new Date(),
};
}
async analyzeCodeCharacteristics(): Promise<{
language: string;
hasSecurityImplications: boolean;
complexity: "low" | "medium" | "high";
domains: string[];
}> {
// First, analyze the code to understand what we're dealing with
const analysisPrompt = `Analyze this code and identify:
1. Primary language
2. Security implications (auth, crypto, data access?)
3. Complexity level
4. Primary domains (frontend, backend, database, etc.)
Code:
${this.context.codeSubmission}
Respond as JSON.`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 500,
messages: [{ role: "user", content: analysisPrompt }],
});
try {
const text =
response.content[0].type === "text" ? response.content[0].text : "";
return JSON.parse(text);
} catch {
// Fallback to default analysis
return {
language: "unknown",
hasSecurityImplications: true,
complexity: "medium",
domains: ["general"],
};
}
}
async selectSpecialists(characteristics: {
language: string;
hasSecurityImplications: boolean;
complexity: "low" | "medium" | "high";
domains: string[];
}): Promise<string[]> {
const specialists: string[] = [];
// Always run code reviewer
specialists.push("CODE_REVIEWER");
// Always run test validator
specialists.push("TEST_VALIDATOR");
// Conditionally add specialists based on code characteristics
if (characteristics.hasSecurityImplications) {
specialists.push("SECURITY_REVIEWER");
}
if (characteristics.complexity === "high") {
specialists.push("ARCHITECT");
}
if (
characteristics.domains.includes("performance") ||
characteristics.complexity === "high"
) {
specialists.push("PERFORMANCE_ANALYST");
}
return specialists;
}
async orchestrate(): Promise<string> {
console.log(`[${this.context.taskId}] Analyzing code characteristics...`);
const characteristics = await this.analyzeCodeCharacteristics();
console.log(`[${this.context.taskId}] Characteristics:`, characteristics);
const specialists = await this.selectSpecialists(characteristics);
console.log(`[${this.context.taskId}] Selected specialists:`, specialists);
// Run selected specialists in parallel
const results = await Promise.all(
specialists.map((specialist) => this.runSpecialist(specialist)),
);
// Synthesize and return
return this.synthesizeResults(specialists, results);
}
private async runSpecialist(specialistName: string): Promise<string> {
// Implementation similar to earlier examples
return `${specialistName} review complete`;
}
private async synthesizeResults(
specialists: string[],
results: string[],
): Promise<string> {
const synthesisPrompt = `You have coordinated reviews from: ${specialists.join(", ")}.
Results:
${results.map((r, i) => `${specialists[i]}: ${r}`).join("\n\n")}
Synthesize into one coherent review.`;
const response = await this.client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2000,
messages: [{ role: "user", content: synthesisPrompt }],
});
return response.content[0].type === "text" ? response.content[0].text : "";
}
}This approach is powerful because it scales naturally. You can add new specialists (DATABASE_REVIEWER, ACCESSIBILITY_CHECKER, DOCUMENTATION_REVIEWER) without touching the supervisor logic. The supervisor learns what the code looks like and activates the right team.
Common Pitfalls in Multi-Agent Systems
Building multi-agent systems is genuinely hard. Here are the patterns that consistently cause problems:
Pitfall 1: Inconsistent State Between Agents

Multiple agents reading and writing the same file without coordination. Agent A reads config.json (version 5), Agent B reads it (also version 5), A modifies and writes (version 6), B modifies and writes (version 7, but missing A's changes). This is the classic lost-update problem. The state becomes corrupted silently, and no one notices until six hours later when a feature mysteriously stops working.
This happens all the time in distributed systems. Two processes try to update the same database record without a lock, and one update gets clobbered. In multi-agent systems, it's the same problem, just at the file level. The solution is non-negotiable: Always use the StateManager pattern with locks. One writer at a time. Wait for the lock, use it, release it. Yes, there's latency cost when agents queue. But you're trading microseconds of wait time for prevention of hours of debugging corrupted state. That's a deal you take every single time.
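As a rough sketch of what that looks like (class names here are illustrative; the full StateManager pattern referenced above also tracks modification history):

```typescript
// One writer at a time: acquire the lock, mutate, release.
// Waiters queue in FIFO order, so read-modify-write cycles are
// serialized and the lost-update interleaving cannot happen.
class FileLock {
  private locked = false;
  private waiters: Array<() => void> = [];

  async acquire(): Promise<void> {
    while (this.locked) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.locked = true;
  }

  release(): void {
    this.locked = false;
    this.waiters.shift()?.(); // wake the next waiting agent
  }
}

// Shared state guarded by the lock; `version` increments on every write.
class StateManager<T> {
  private lock = new FileLock();

  constructor(private state: T, public version = 0) {}

  // The whole read-modify-write happens atomically relative to
  // other update() calls, so no agent's changes get clobbered.
  async update(fn: (current: T) => T): Promise<number> {
    await this.lock.acquire();
    try {
      this.state = fn(this.state);
      this.version++;
      return this.version;
    } finally {
      this.lock.release();
    }
  }

  get(): T {
    return this.state;
  }
}
```

Agent B's update waits for agent A's to commit, so both changes survive and the version number tells you exactly how many writes happened.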
Pitfall 2: Overwhelming the Supervisor

The supervisor becomes a bottleneck because it's doing the work instead of coordinating. A good supervisor should be fast: it makes routing decisions, not deep analysis. If your supervisor is taking 30 seconds, something's wrong. Supervisors should take seconds. Specialists should take the time they need.
Think about it from the system design perspective. Your supervisor's job is to say "Code Reviewer, you go analyze quality. Test Validator, you go check coverage. Architect, you go review design." That's fast—milliseconds. Then all three run in parallel for 5 seconds each. Then the supervisor takes 2 seconds to synthesize. Total time: ~7 seconds.
But if your supervisor also tries to do code review while coordinating, you've transformed a 7-second job into a 20-second job because the supervisor is serially doing analysis work. That violates the separation of concerns. Supervisors coordinate. Specialists specialize. Respect that boundary.
Pitfall 3: Agents Not Respecting Constraints

You tell an agent "don't call tools more than 3 times" and it calls them 5 times anyway. Agents need explicit, repeated instruction about constraints. Better yet, enforce constraints in the handler rather than relying on the agent to self-enforce. If an agent is supposed to make at most three tool calls, your orchestration layer should literally count tool calls and stop accepting more after three. Make violations impossible, not just difficult.
This matters because unconstrained agents will spiral. One tool call gives you partial information. The agent calls another tool to refine. Then another. Before you know it, you've burned through your token budget and the agent is no closer to a good answer. Constraints are guardrails that keep agents focused.
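Counting tool calls in the orchestration layer might look like this sketch (the `ToolCallBudget` wrapper and its refusal message are illustrative, not from any SDK):

```typescript
// Hard-enforce a tool-call budget in the handler instead of
// trusting the agent to count for itself.
class ToolCallBudget {
  private calls = 0;

  constructor(private maxCalls: number) {}

  // Wrap every tool invocation; once the budget is spent, return a
  // refusal the agent can see instead of silently executing the call.
  async invoke<T>(name: string, run: () => Promise<T>): Promise<T | string> {
    if (this.calls >= this.maxCalls) {
      return `Tool budget exhausted (${this.maxCalls} calls). Answer with what you have.`;
    }
    this.calls++;
    return run();
  }
}
```

The refusal string feeds back into the agent loop as the tool result, which nudges the model to stop gathering and start answering.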
Pitfall 4: Context Explosion

By the time results from all agents come back, you're trying to synthesize 8000 tokens of context. The synthesizer gets lost in a sea of information and produces mediocre results: "There were many issues found" instead of "Fix these three critical issues first." Keep individual agent outputs focused and concise. Summarize rather than dump everything.
This is a real phenomenon. Your code reviewer generates 2000 tokens of detailed analysis. Your test validator generates 1500 tokens. Your architect generates 2000 tokens. Now your synthesizer is holding 5500 tokens of context while it tries to produce a coherent synthesis. The model's context window is finite, and it gets overwhelmed.
Solution: Require each agent to output a brief summary (200-300 tokens max) plus detailed findings. The supervisor synthesizes from summaries. If a deep dive is needed on a specific issue, fetch the detailed findings. You've structured the information hierarchically so the synthesizer doesn't need everything at once.
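That hierarchy can be sketched as a two-tier result type, with the synthesizer reading summaries by default and fetching detail only on demand (field names are illustrative):

```typescript
// Each specialist returns a two-tier result: a short summary the
// synthesizer always sees, and detailed findings fetched on demand.
interface AgentResult {
  agent: string;
  summary: string; // kept to roughly 200-300 tokens
  findings: string[]; // full detail, only loaded for deep dives
}

// Build the synthesis context from summaries alone, so the
// synthesizer never holds every agent's full output at once.
function buildSynthesisContext(results: AgentResult[]): string {
  return results.map((r) => `${r.agent}: ${r.summary}`).join("\n");
}

// Pull one agent's detail when the synthesizer flags an issue.
function deepDive(results: AgentResult[], agent: string): string[] {
  return results.find((r) => r.agent === agent)?.findings ?? [];
}
```

The synthesizer works from a few hundred tokens of summaries; the multi-thousand-token findings stay out of its context until a specific issue needs them.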
Pitfall 5: No Timeout Management

One slow agent blocks the entire system. If your supervisor says "run these 5 agents in parallel" and one hangs forever, you're stuck. Your system is now waiting indefinitely for an agent that might be in a retry loop or stuck on a rate limit.
Always set timeouts. Always have a fallback. If Agent 3 doesn't respond in 30 seconds, mark it failed and move on. Partial results are better than no results. Your synthesizer should be able to work with "Code Reviewer succeeded, Test Validator succeeded, Architect timed out." That's better than waiting forever for Architect.
The timeout pattern is: (1) Set a deadline for each agent. (2) Monitor whether responses arrive by that deadline. (3) If deadline passes, treat it as a failure. (4) Continue with other agents. (5) Synthesize from whatever results you have. This prevents the entire system from hanging because one agent is slow.
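Those five steps can be sketched with `Promise.race`: each agent races a deadline, and a slow agent becomes an ordinary "timeout" outcome instead of a hang (the outcome shape is illustrative):

```typescript
// An agent run either finishes in time or is marked as timed out.
type AgentOutcome =
  | { agent: string; status: "ok"; result: string }
  | { agent: string; status: "timeout" };

// Race each agent against its deadline. The batch always completes,
// and the synthesizer works from whichever outcomes are "ok".
async function runWithDeadline(
  agent: string,
  work: Promise<string>,
  timeoutMs: number,
): Promise<AgentOutcome> {
  const deadline = new Promise<AgentOutcome>((resolve) =>
    setTimeout(() => resolve({ agent, status: "timeout" }), timeoutMs),
  );
  const done = work.then(
    (result): AgentOutcome => ({ agent, status: "ok", result }),
  );
  return Promise.race([done, deadline]);
}
```

Wrapping every parallel agent in `runWithDeadline` means `Promise.all` over the batch can never hang on one stuck worker.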
Key Patterns and Takeaways
You now have the foundations for production multi-agent systems:
1. Supervisor Pattern: One coordinator deciding workflow, many specialists executing. The supervisor is the traffic cop—it routes work, monitors progress, and synthesizes results.
2. State Versioning: Lock-based conflict prevention with full history tracking. You trade latency for correctness. When agent B wants the lock, it waits. When agent A releases it, agent B proceeds. Simple, reliable, debuggable.
3. Context Inheritance: Each agent builds on previous findings through explicit context. No discovery duplication. Findings compound rather than scatter.
4. Parallel Execution: Run specialists simultaneously, then synthesize results. This is where you get speed. Specialists work in parallel; the supervisor stitches results together serially.
5. Conflict Detection: Track who modified what, detect overlaps before they cause bugs. History gives you the data to debug any issue.
6. Adaptive Selection: Choose which specialists to activate based on the work at hand. Don't pay for specialists you don't need.
The real power emerges when you combine these patterns. A supervisor decides which agents to run, they execute in parallel against shared state (protected by locks), inherit context from each other, and the supervisor synthesizes a coherent result.
This scales. You can add agents without rearchitecting. You can reorder agents without breaking consistency. You can test each agent independently because their contracts are explicit.
Most multi-agent systems fail because they treat agents like threads—just spin them up and pray they don't collide. The right approach treats them like team members: clear roles, explicit communication, shared understanding of constraints, and a coordinator who keeps everyone in sync.
Scaling Multi-Agent Systems: From 3 Agents to 30
The patterns I've shown you work with 3 agents. Do they work with 30? Mostly, but you need to think differently at scale.
With more agents, you need hierarchical supervision. Instead of one supervisor managing 30 agents, you have regional supervisors. A "code quality supervisor" manages code reviewer, style checker, and linter integration. A "correctness supervisor" manages test validator and architecture reviewer. They report to a chief supervisor. This mirrors how real organizations work—teams within teams.
You also need priority queues. Not all agents are equally important. A security reviewer should run before a documentation reviewer. Your orchestration system should understand priorities and schedule accordingly.
Finally, you need incremental synthesis. With 30 agents, you can't wait for all of them to finish before synthesizing. Instead, synthesize incrementally. Once the security and correctness agents finish, present those findings. Then layer in architecture and performance findings. Users see results progressively rather than waiting for the entire pipeline.
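Incremental synthesis can be sketched by grouping agents into priority tiers and surfacing each tier's findings as soon as it completes (the tier names and `emit` callback are illustrative):

```typescript
// Run agent tiers in priority order, emitting each tier's findings
// as it completes instead of waiting for the full pipeline.
async function synthesizeIncrementally(
  tiers: Array<{ name: string; agents: Array<() => Promise<string>> }>,
  emit: (tier: string, findings: string[]) => void,
): Promise<void> {
  for (const tier of tiers) {
    // Agents within a tier still run in parallel.
    const findings = await Promise.all(tier.agents.map((a) => a()));
    emit(tier.name, findings); // user sees this before later tiers run
  }
}
```

Security and correctness go in the first tier; architecture and performance layer in afterward, so users start reading the critical findings while the rest of the pipeline is still working.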
Building for Production Reliability
When you move from examples to production, a few things become critical:
Monitoring and Alerting: You need dashboards showing agent health. Are agents timing out? Are they making too many tool calls? Are they converging on consistent findings or disagreeing wildly? Set up alerts for anomalies.
Graceful Degradation: If one agent fails, can the system continue? In the code review example, if ARCHITECT fails but CODE_REVIEWER and TEST_VALIDATOR succeed, you can still give the user feedback. Your synthesizer should handle missing results gracefully.
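`Promise.allSettled` is a natural fit for graceful degradation: a rejection from one agent becomes a labeled failure instead of discarding the whole batch. A sketch (the function name is illustrative):

```typescript
// Run all specialists; a rejection from one does not lose the others.
async function runAllTolerant(
  agents: Record<string, () => Promise<string>>,
): Promise<{ succeeded: Record<string, string>; failed: string[] }> {
  const names = Object.keys(agents);
  const settled = await Promise.allSettled(names.map((n) => agents[n]()));
  const succeeded: Record<string, string> = {};
  const failed: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") succeeded[names[i]] = outcome.value;
    else failed.push(names[i]); // synthesizer notes this agent is missing
  });
  return { succeeded, failed };
}
```

The synthesizer then reports from `succeeded` and explicitly flags the agents in `failed`, so users know which perspectives are missing from the review.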
Auditing: Log every decision the supervisor makes and every result each agent produces. When something goes wrong in production, you need to replay what happened and understand the decision chain.
Cost Management: Multi-agent systems are more expensive than single agents because you're doing multiple analyses. Track costs per execution. Understand where your token budget is going. Optimize high-cost agents first.
Key Patterns and Takeaways
You now have the foundations for production multi-agent systems at any scale. The supervisor pattern, state management, context inheritance, parallel execution, and conflict detection are your core toolkit.
Build the coordination layer first. Everything else follows.
-iNet