Agent SDK Architecture: How It Works Under the Hood

You're using the Claude Code Agent SDK. You're spawning sessions, sending messages, handling tool calls. It all feels smooth—almost magical. But if you're building production systems, deploying at scale, or debugging subtle issues, you need to understand what's actually happening under the hood. How does a message travel from your code to Claude and back? How are tools registered, validated, and dispatched? What's really happening inside a session lifecycle? What happens when you have dozens of concurrent sessions battling for resources?
This article pulls back the curtain on the Agent SDK's internals. We're talking architecture, state machines, message flow, and extension points. By the end, you'll understand not just how to use the SDK, but why it works the way it does. That knowledge transforms you from someone using a tool to someone who can debug, optimize, and extend it. You'll know where bottlenecks hide, how errors propagate, and what safety rails protect you.
Table of Contents
- The High-Level Mental Model
- Core Components and Their Responsibilities
- 1. The Session Manager
- 2. The Message Processor
- 3. The Tool Registry and Dispatcher
- 4. The Built-in Tool Executors
- 5. The Message Flow Controller
- The Session Lifecycle and State Machine
- 1. CREATE: Constructor Runs
- 2. CONFIGURE: Optional Customization
- 3. READY: Session Awaits Input
- 4. MESSAGE_PROCESSING: Message Arrives
- 5. TOOL_EXECUTION: If Needed
- 6. RESPONSE: Final Output
- 7. IDLE: Ready for Next Message
- Message Format and the Tool Calling Protocol
- What Claude Receives (System Context)
- How Claude Calls Tools (Tool Use Format)
- Extension Points and the Plugin Architecture
- 1. Custom Tools
- 2. Custom System Prompts
- 3. Message Interceptors
- 4. Custom Validators
- Error Handling and Recovery
- 1. Input Validation Errors
- 2. Tool Validation Errors
- 3. Tool Execution Errors
- 4. API Errors
- Performance Characteristics and Optimization
- 1. Reduce Tool Calls
- 2. Batch Operations
- 3. Cache Frequently Accessed Data
- 4. Stream Responses
- Debugging and Introspection
- Concurrency and Parallelism: Handling Multiple Sessions
- Session Isolation
- Queuing and Rate Limiting
- Memory Management and Context Windows
- 1. Summarization
- 2. Rolling Windows
- 3. Selective Retention
- Testing and Mocking
- Real-World Deployment Patterns
- Single-Machine Development Setup
- Server-Based Deployment with Session Pooling
- Multi-Tenant Deployment
- Performance Profiling and Optimization
- Backward Compatibility and Versioning
- Deployment Patterns and Real-World Configurations
- Advanced Memory Management Strategies
- Observability and Debugging in Production
- The Evolution of Patterns: From Prototyping to Scale
- The Reliability Mindset
- Conclusion: The Design Principles
The High-Level Mental Model
Before we dive into internals, let's establish a mental model. The Agent SDK is built on three pillars:
- Session Management: A session holds state (context, tools, working directory, conversation history). It's stateful and persistent across multiple turns. Think of it as your personal Claude instance with memory.
- Message Processing Pipeline: Every message you send goes through a validation→dispatch→execution→response flow. This is where the magic happens. Each step is deliberate.
- Tool Abstraction Layer: Tools are pluggable. You define them, the SDK registers them, validates their inputs, executes them, and feeds results back to Claude. This separation is what makes the whole system robust.
Think of the SDK as a layered onion. At the center is Claude's language model. Around it are tool execution engines (Bash, Read, Write, browser APIs). Around that is the message routing layer. And around everything is the session container that holds it all together.
Core Components and Their Responsibilities
Let's break down the architecture into functional components. Each has a single responsibility, and they communicate via well-defined interfaces.
1. The Session Manager
The SessionManager is the entry point. When you create a ClaudeCodeSession, you're instantiating this component. It owns:
- Configuration state: API keys, model selection, working directory, enabled tools
- Conversation history: Every message and response, kept for multi-turn context
- Tool registry: Mapping of tool names to tool implementations
- Connection state: Whether the session is active, idle, or closed
Here's a simplified conceptual model:
interface SessionConfig {
apiKey: string;
model: string;
workingDirectory: string;
tools: Tool[];
systemPrompt?: string;
maxTokens?: number;
temperature?: number;
}
class SessionManager {
private config: SessionConfig;
private history: Message[] = [];
private toolRegistry: Map<string, Tool> = new Map();
private state: "idle" | "processing" | "error" = "idle";
constructor(config: SessionConfig) {
this.config = config;
this.initializeToolRegistry();
}
async message(userInput: string): Promise<Response> {
// Simplified flow: validate → dispatch → execute → respond
this.state = "processing";
const response = await this.dispatchMessage(userInput);
this.history.push({ role: "user", content: userInput });
this.history.push({ role: "assistant", content: response });
this.state = "idle";
return response;
}
private initializeToolRegistry(): void {
// Register built-in tools (Bash, Read, Write, etc.)
// Register custom tools from config
}
private async dispatchMessage(input: string): Promise<Response> {
// We'll dive deeper into this next
}
}
The SessionManager is an orchestrator. It doesn't execute tools itself—it coordinates the flow between the message processor, tool executor, and response handler. This separation means you can swap implementations without breaking the contract.
2. The Message Processor
The MessageProcessor handles the parsing and validation of incoming messages. It doesn't care about Claude or tools; it just ensures input is well-formed and safe.
interface ProcessedMessage {
text: string;
metadata: {
timestamp: number;
turnNumber: number;
sourceContext?: string;
};
validationErrors: string[];
}
class MessageProcessor {
process(input: string): ProcessedMessage {
const processed: ProcessedMessage = {
text: input.trim(),
metadata: {
timestamp: Date.now(),
turnNumber: this.getTurnNumber(),
},
validationErrors: [],
};
// Validate length
if (processed.text.length === 0) {
processed.validationErrors.push("Message cannot be empty");
}
if (processed.text.length > 10000) {
processed.validationErrors.push("Message exceeds 10,000 character limit");
}
// Validate for injection attacks (basic example)
if (processed.text.includes("DELETE DATABASE")) {
processed.validationErrors.push("Potentially dangerous command detected");
}
return processed;
}
}
This layer is a guard rail. It catches obvious problems before they reach Claude. In production systems, this is where you'd add rate limiting, content filtering, or audit logging. This is also where you'd implement tenant isolation if building multi-tenant systems.
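As an example of the kind of check that belongs at this layer, here is a sketch of a sliding-window rate limiter. The class name and the limits are illustrative, not SDK defaults:

```typescript
// Sketch: a sliding-window rate limiter that could sit in the
// MessageProcessor, before any API call is made.
class RateLimiter {
  private timestamps: Map<string, number[]> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number,
  ) {}

  // Returns true if the client is within its budget for the window.
  allow(clientId: string, now: number = Date.now()): boolean {
    const recent = (this.timestamps.get(clientId) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.maxRequests) {
      this.timestamps.set(clientId, recent);
      return false;
    }
    recent.push(now);
    this.timestamps.set(clientId, recent);
    return true;
  }
}

const limiter = new RateLimiter(3, 60_000); // 3 messages per minute
const verdicts = [1, 2, 3, 4].map(() => limiter.allow("tenant-a", 1_000));
// First three calls allowed, fourth rejected
```

Because rejection happens before dispatch, a throttled message never costs an API call.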
3. The Tool Registry and Dispatcher
The ToolRegistry is the directory of all tools available to Claude. When Claude says "call the fetch_customer tool," the dispatcher looks it up here.
interface Tool {
name: string;
description: string;
input_schema: JSONSchema;
handler: (input: any) => Promise<any>;
validate?: (input: any) => ValidationResult;
}
class ToolRegistry {
private tools: Map<string, Tool> = new Map();
register(tool: Tool): void {
// Validate tool definition
if (!tool.name || !tool.description || !tool.input_schema) {
throw new Error(`Invalid tool definition: missing required fields`);
}
// Prevent duplicate registration
if (this.tools.has(tool.name)) {
throw new Error(`Tool "${tool.name}" already registered`);
}
this.tools.set(tool.name, tool);
}
get(name: string): Tool | undefined {
return this.tools.get(name);
}
list(): Tool[] {
return Array.from(this.tools.values());
}
}
class ToolDispatcher {
private registry: ToolRegistry;
async dispatch(toolName: string, input: any): Promise<ToolResult> {
const tool = this.registry.get(toolName);
if (!tool) {
return {
success: false,
error: `Tool not found: ${toolName}`,
};
}
// Validate input against the tool's schema
const validation = this.validateInput(input, tool.input_schema);
if (!validation.valid) {
return {
success: false,
error: `Validation failed: ${validation.errors.join(", ")}`,
};
}
// Execute the tool
try {
const result = await tool.handler(input);
return {
success: true,
data: result,
};
} catch (error) {
return {
success: false,
error: `Tool execution failed: ${error.message}`,
};
}
}
private validateInput(input: any, schema: JSONSchema): ValidationResult {
// Use ajv (JSON Schema validator) or similar
// Returns { valid: boolean; errors: string[] }
}
}
Why this architecture? Separating registry from dispatcher lets you:
- Inspect available tools without executing them
- Validate inputs before execution (fail fast)
- Audit tool calls by intercepting at the dispatch layer
- Mock tools for testing by swapping the registry
This is the open/closed principle in action. The dispatcher is closed for modification (it always does the same thing), but open for extension (you add tools by registering them).
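Because every tool call funnels through dispatch, auditing is a thin wrapper around that one choke point. The following is a self-contained sketch; the `Tool`, `ToolResult`, and `AuditingDispatcher` names mirror the conceptual interfaces above, not a public SDK API:

```typescript
// Sketch: an auditing dispatcher that records every call at the choke point.
interface Tool {
  name: string;
  handler: (input: unknown) => unknown;
}

interface ToolResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();
  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }
  get(name: string): Tool | undefined {
    return this.tools.get(name);
  }
}

class AuditingDispatcher {
  public auditLog: Array<{ tool: string; ok: boolean }> = [];
  constructor(private registry: ToolRegistry) {}

  dispatch(toolName: string, input: unknown): ToolResult {
    const tool = this.registry.get(toolName);
    if (!tool) {
      this.auditLog.push({ tool: toolName, ok: false });
      return { success: false, error: `Tool not found: ${toolName}` };
    }
    try {
      const data = tool.handler(input);
      this.auditLog.push({ tool: toolName, ok: true });
      return { success: true, data };
    } catch (e) {
      this.auditLog.push({ tool: toolName, ok: false });
      return { success: false, error: String(e) };
    }
  }
}

const registry = new ToolRegistry();
registry.register({ name: "echo", handler: (input) => input });
const dispatcher = new AuditingDispatcher(registry);

const ok = dispatcher.dispatch("echo", { msg: "hi" });
const missing = dispatcher.dispatch("nope", {});
```

The same wrapper pattern works for policy enforcement: reject the call before invoking the handler.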
4. The Built-in Tool Executors
The SDK ships with built-in tool implementations: Bash, Read, Write, Browser, and Network. Each is a separate executor with its own error handling.
// Example: Bash executor
class BashExecutor implements Tool {
name = "bash";
description = "Execute shell commands in the working directory";
input_schema = {
type: "object",
properties: {
command: {
type: "string",
description: "The shell command to execute",
},
timeout: {
type: "number",
description: "Timeout in milliseconds (default: 30000)",
},
},
required: ["command"],
};
async handler(input: { command: string; timeout?: number }): Promise<any> {
const timeout = input.timeout || 30000;
// Spawn a child process in the working directory
const process = spawn("bash", ["-c", input.command], {
cwd: this.workingDirectory,
timeout,
});
return new Promise((resolve, reject) => {
let stdout = "";
let stderr = "";
process.stdout.on("data", (data) => {
stdout += data.toString();
});
process.stderr.on("data", (data) => {
stderr += data.toString();
});
process.on("close", (code) => {
resolve({
exitCode: code,
stdout,
stderr,
success: code === 0,
});
});
process.on("error", (error) => {
reject(error);
});
});
}
}
Each executor:
- Defines its own input schema (what parameters it accepts)
- Implements the handler interface (async function)
- Handles its own error cases (command not found, timeout, permission denied)
- Returns structured results that Claude can reason about
This is why the SDK feels so smooth. Each executor is battle-tested, handles edge cases, and provides consistent output. The Bash executor, for example, carefully captures both stdout and stderr, distinguishes between success and failure, and respects timeouts to prevent hanging.
5. The Message Flow Controller
This is the orchestrator that ties everything together. It implements the state machine that governs how messages flow through the system.
type MessageState =
| "idle"
| "awaiting_claude"
| "executing_tool"
| "processing_response"
| "error";
class MessageFlowController {
private state: MessageState = "idle";
private currentToolCall?: ToolCall;
async processMessage(
input: string,
sessionManager: SessionManager,
dispatcher: ToolDispatcher,
): Promise<Response> {
// 1. VALIDATE: Ensure input is well-formed
this.state = "idle";
const processed = sessionManager.validateInput(input);
if (!processed.valid) {
throw new Error(`Validation failed: ${processed.errors}`);
}
// 2. SEND_TO_CLAUDE: Add message to history and call the model
this.state = "awaiting_claude";
const claudeResponse = await this.callClaude(input, sessionManager);
// 3. CHECK_FOR_TOOL_CALLS: Does Claude want to use a tool?
if (claudeResponse.hasToolCall) {
this.currentToolCall = claudeResponse.toolCall;
return this.handleToolCall(
claudeResponse.toolCall,
dispatcher,
sessionManager,
);
}
// 4. RETURN_FINAL_RESPONSE: No more tool calls, Claude is done
this.state = "idle";
return {
text: claudeResponse.text,
toolCalls: [],
finalResponse: true,
};
}
private async handleToolCall(
toolCall: ToolCall,
dispatcher: ToolDispatcher,
sessionManager: SessionManager,
): Promise<Response> {
this.state = "executing_tool";
// Dispatch the tool
const toolResult = await dispatcher.dispatch(toolCall.name, toolCall.input);
// Add the tool result to conversation history
sessionManager.addMessage({
role: "user",
type: "tool_result",
toolName: toolCall.name,
result: toolResult,
});
// Claude may want to use another tool or generate final response
// Loop back to "awaiting_claude"
this.state = "processing_response";
return this.processMessage("", sessionManager, dispatcher);
}
private async callClaude(
input: string,
sessionManager: SessionManager,
): Promise<ClaudeResponse> {
const client = new Anthropic({
apiKey: sessionManager.config.apiKey,
});
const response = await client.messages.create({
model: sessionManager.config.model,
max_tokens: sessionManager.config.maxTokens || 4096,
system: sessionManager.getSystemPrompt(),
tools: sessionManager.getToolDefinitions(),
messages: sessionManager.getHistory(),
});
return {
text: response.content[0].type === "text" ? response.content[0].text : "",
hasToolCall: response.stop_reason === "tool_use",
toolCall: this.extractToolCall(response),
};
}
private extractToolCall(response: Message): ToolCall | undefined {
// Find the tool_use block in the response
const toolUseBlock = response.content.find(
(block) => block.type === "tool_use",
);
if (!toolUseBlock || toolUseBlock.type !== "tool_use") {
return undefined;
}
return {
id: toolUseBlock.id,
name: toolUseBlock.name,
input: toolUseBlock.input,
};
}
}
This controller implements a loop: Claude produces output → we check for tool calls → we execute tools → we feed results back to Claude → repeat until done.
Why is this a loop and not a single request? Because Claude doesn't execute tools; we execute tools on Claude's behalf. When Claude says "call fetch_customer with id='123'," we:
- Execute fetch_customer in our environment
- Get the result
- Tell Claude what happened
- Let Claude decide what to do next (use another tool, generate final response, ask for clarification)
This is the agentic loop. It's the reason Claude Code can work with real systems. It's also why debugging tool calls is easier—you can intercept at each step.
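The loop can be sketched end to end with a scripted stand-in for the model. Everything here (`makeFakeModel`, `runAgentLoop`) is a hypothetical stub to show the shape of the loop, not real SDK code:

```typescript
// Sketch of the agentic loop: ask the model, run any requested tool,
// feed the result back, repeat until the model produces plain text.
type Turn =
  | { kind: "tool_use"; name: string; input: { command: string } }
  | { kind: "text"; text: string };

function makeFakeModel(): (history: string[]) => Turn {
  let calls = 0;
  return (history) => {
    calls++;
    if (calls === 1) {
      // First turn: the "model" requests a tool call.
      return { kind: "tool_use", name: "bash", input: { command: "ls" } };
    }
    // Second turn: it answers using the tool result it was fed.
    return { kind: "text", text: `Done. Last result: ${history[history.length - 1]}` };
  };
}

function runAgentLoop(
  model: (history: string[]) => Turn,
  tools: Record<string, (input: { command: string }) => string>,
): string {
  const history: string[] = [];
  for (let i = 0; i < 10; i++) { // hard cap to avoid infinite loops
    const turn = model(history);
    if (turn.kind === "text") return turn.text;
    const result = tools[turn.name](turn.input);
    history.push(`tool_result(${turn.name}): ${result}`);
  }
  throw new Error("Loop cap exceeded");
}

const answer = runAgentLoop(makeFakeModel(), {
  bash: (input) => `ran "${input.command}"`,
});
```

Note the hard iteration cap: production loops need one, because a model can keep requesting tools indefinitely.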
The Session Lifecycle and State Machine
Understanding the session lifecycle is critical for debugging and optimization. A session goes through predictable states:
CREATE → CONFIGURE → READY → MESSAGE_PROCESSING → (TOOL_EXECUTION)* → RESPONSE → IDLE → MESSAGE_PROCESSING → ...
Let's map this out:
1. CREATE: Constructor Runs
const session = new ClaudeCodeSession({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-3-5-sonnet-20241022",
workingDirectory: "/tmp/project",
});
// At this point:
// - API key is validated
// - Model is set
// - Working directory is created (or verified)
// - Tool registry is initialized with built-in tools
// - Conversation history is empty
// - State: CREATED
2. CONFIGURE: Optional Customization
session.registerTool({
name: "fetch_user_data",
description: "Retrieve user by ID",
input_schema: {
type: "object",
properties: {
userId: { type: "string" },
},
required: ["userId"],
},
handler: async (input) => {
// Your implementation
return { name: "John", email: "john@example.com" };
},
});
// At this point:
// - Custom tools are registered
// - System prompt can be customized
// - State: CONFIGURED
3. READY: Session Awaits Input
The session is configured and waiting. No API calls have happened yet. This is idle time, and it's important for understanding costs. You only pay for API usage when you call message(), not for sessions sitting idle.
4. MESSAGE_PROCESSING: Message Arrives
const response = await session.message(
"What files are in the working directory?",
);
// Timeline:
// T0: message() called
// T1: Input validated (MessageProcessor)
// T2: API call to Anthropic (awaiting Claude)
// T3: Response from Claude received
// T4: Check for tool calls
// T5: If tool calls: execute them (ToolDispatcher)
// T6: If more tool calls needed: loop back to T2
// T7: Final response returned to user
This timeline is critical for understanding latency. The latency you observe includes network round-trips to Anthropic's API plus tool execution time. If Claude is calling 5 different tools, you're waiting for all 5 to execute sequentially.
5. TOOL_EXECUTION: If Needed
This only happens if Claude requested a tool call. The dispatcher:
- Validates the input against the tool's schema
- Executes the tool
- Captures stdout, stderr, exit codes
- Adds the result to the conversation history
- Signals the flow controller to loop back to Claude
The key insight: Tool execution failures are not session failures. If a tool crashes, the error is captured and reported to Claude. Claude can then decide to retry, use a different tool, or explain the issue to the user.
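Concretely, a failed call gets packaged as a tool_result content block with an error flag (the Messages API supports `is_error` on tool_result blocks), so Claude sees the failure as data rather than the session dying. A minimal sketch, reusing the conceptual `ToolResult` shape from earlier:

```typescript
// Sketch: turning a failed ToolResult into the tool_result block Claude
// receives. is_error tells Claude the call failed without ending the turn.
interface ToolResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

function toToolResultBlock(toolUseId: string, result: ToolResult) {
  return {
    type: "tool_result" as const,
    tool_use_id: toolUseId,
    is_error: !result.success,
    content: result.success
      ? JSON.stringify(result.data)
      : result.error ?? "unknown error",
  };
}

const block = toToolResultBlock("tool_use_xyz123", {
  success: false,
  error: "Command failed: ENOENT",
});
```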
6. RESPONSE: Final Output
Claude has made a final statement (no more tool calls), and we return that text to the user.
7. IDLE: Ready for Next Message
The session is back to state IDLE. Conversation history persists. The next message() call will include all previous context. This is where sessions shine—multi-turn conversations feel natural because Claude remembers everything.
Critical insight: Each message() call is not independent. Claude sees the entire history. This is why multi-turn conversations work so well. Claude knows what happened in previous turns and can reference them naturally.
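The accumulation is easy to see in a toy stub (hypothetical, not the real SDK): every call appends to the same history array that the next call would send.

```typescript
// Toy stub showing why turn N sees turns 1..N-1.
interface ChatMsg {
  role: "user" | "assistant";
  content: string;
}

class ToySession {
  history: ChatMsg[] = [];

  message(input: string): string {
    this.history.push({ role: "user", content: input });
    // A real session would send this.history to the API here.
    const reply = `seen ${this.history.length} message(s) so far`;
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }
}

const toy = new ToySession();
toy.message("Create notes.txt");
const second = toy.message("Now add a heading to it"); // "it" resolves via history
```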
Message Format and the Tool Calling Protocol
Let's zoom in on exactly what Claude sees and how it communicates back.
What Claude Receives (System Context)
When you call session.message(), the SDK builds a context and sends it to Claude:
// Example of what gets sent to Claude's API:
{
"model": "claude-3-5-sonnet-20241022",
"max_tokens": 4096,
"system": "You are Claude Code, an AI assistant that helps with...",
"tools": [
{
"name": "bash",
"description": "Execute shell commands...",
"input_schema": { /* JSON Schema */ }
},
{
"name": "read",
"description": "Read file contents...",
"input_schema": { /* JSON Schema */ }
},
// ... more tools
],
"messages": [
{
"role": "user",
"content": "What files are in the working directory?"
}
]
}
Claude sees:
- System prompt: Instructions on how to behave
- Available tools: Complete list with descriptions and schemas
- Message history: Everything that's happened so far
Understanding this is crucial. The system prompt drives behavior. The tool list determines what Claude can do. The history determines context.
How Claude Calls Tools (Tool Use Format)
Claude doesn't execute tools. Instead, it says "I want to use tool X with these parameters." The response looks like:
{
"content": [
{
"type": "tool_use",
"id": "tool_use_xyz123",
"name": "bash",
"input": {
"command": "ls -la"
}
}
],
"stop_reason": "tool_use"
}
The SDK extracts this and passes it to the ToolDispatcher. The dispatcher:
- Validates the input against the Bash tool's schema
- Executes the command
- Captures the output
Then the SDK adds the result back to the message history:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tool_use_xyz123",
"content": "drwxr-xr-x 5 user staff 160 Mar 17 10:45 .\ndrwxr-xr-x 10 user staff 320 Mar 17 10:40 .."
}
]
}
And Claude continues. It might use another tool, or it might have enough information to respond to the user.
Why this design? It enforces a clear separation: Claude doesn't execute code; it describes code execution. We execute on its behalf and report results. This is safer, more auditable, and lets us sandbox Claude's access. If you need to revoke Claude's access to a tool, you just unregister it. No code changes needed.
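Revocation really is just a registry operation. A sketch (the `unregister` method is a hypothetical mirror of `register`, not a documented SDK call):

```typescript
// Sketch: revoking Claude's access to a tool by removing its registry entry.
class ToolRegistry {
  private tools = new Map<string, () => string>();

  register(name: string, handler: () => string): void {
    this.tools.set(name, handler);
  }

  unregister(name: string): boolean {
    return this.tools.delete(name);
  }

  list(): string[] {
    return [...this.tools.keys()];
  }
}

const registry = new ToolRegistry();
registry.register("bash", () => "ok");
registry.register("read", () => "ok");
registry.unregister("bash"); // Claude can no longer see or call bash
```

Since the tool list sent to the API is built from the registry on each turn, the next request simply omits the revoked tool.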
Extension Points and the Plugin Architecture
The SDK is designed for extension. Here's where you can hook in:
1. Custom Tools
Register tools that fit your domain:
session.registerTool({
name: "query_database",
description: "Execute SQL queries",
input_schema: {
/* ... */
},
handler: async (input) => {
// Your database logic
},
});2. Custom System Prompts
Change how Claude behaves:
session.setSystemPrompt(`
You are an expert backend engineer reviewing pull requests.
Focus on: security, performance, and maintainability.
Always check for SQL injection vulnerabilities.
`);
3. Message Interceptors
Hook into the message flow to log, filter, or modify messages:
session.onBeforeToolCall((toolCall) => {
console.log(`Tool called: ${toolCall.name}`);
// Could reject certain tools here
return toolCall;
});
session.onAfterToolExecution((toolCall, result) => {
console.log(`Tool completed: ${toolCall.name}`);
// Could modify the result here
return result;
});
4. Custom Validators
Add validation logic before tools execute:
session.registerValidator("bash", (input) => {
if (input.command.includes("rm -rf")) {
return { valid: false, reason: "Dangerous command blocked" };
}
return { valid: true };
});
These extension points let you:
- Enforce security policies (no dangerous commands)
- Audit usage (log who did what)
- Customize behavior (rate limit, sandboxing)
- Integrate with external systems (send tool results to analytics)
In production systems, you'll use all of these. Security validators prevent accidents. Interceptors enable audit trails. Custom tools connect Claude to your business systems.
Error Handling and Recovery
The architecture handles errors at multiple levels:
1. Input Validation Errors
Caught immediately, before API calls:
try {
const response = await session.message("x".repeat(10001)); // Too long
} catch (error) {
// Error: Message exceeds 10,000 character limit
// State: IDLE (no API call made)
}
2. Tool Validation Errors
Caught before execution:
// Claude tries to call fetch_user with missing userId parameter
try {
await dispatcher.dispatch("fetch_user", {});
} catch (error) {
// Error: Validation failed
// The tool's handler never runs
// State: READY_FOR_RETRY (Claude can try again with correct params)
}
3. Tool Execution Errors
Caught during execution:
// Tool handler throws an error
const result = await dispatcher.dispatch("bash", {
command: "/nonexistent/file",
});
// Result: { success: false, error: "Command failed: ENOENT" }
// The error is returned to Claude as context
// Claude can decide what to do next
4. API Errors
Network or API issues:
try {
const response = await session.message("Hello");
// Network error, timeout, rate limit, invalid key, etc.
} catch (error) {
// Error: Network/API error
// Message history is preserved
// Session can retry from this point
}
Key insight: Errors don't necessarily break the conversation. Many errors are recoverable. The SDK feeds them back to Claude, which can retry, use a different tool, or explain the problem to the user. This resilience is baked into the architecture.
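Because history is preserved across a failed API call, retrying is safe. A retry wrapper can be sketched like this (real code would back off between attempts; the delay is omitted to keep the sketch synchronous, and the flaky function just simulates transient network errors):

```typescript
// Sketch: retry a transient failure; conversation history is untouched,
// so re-invoking the same call is safe.
function withRetry<T>(fn: () => T, maxAttempts: number): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn();
    } catch (e) {
      lastError = e; // swallow and retry
    }
  }
  throw lastError;
}

let attempts = 0;
const result = withRetry(() => {
  attempts++;
  if (attempts < 3) throw new Error("ECONNRESET"); // simulated network error
  return "ok";
}, 5);
```

In practice you would retry only errors worth retrying (timeouts, rate limits, 5xx) and fail fast on things like an invalid API key.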
Performance Characteristics and Optimization
Understanding the architecture helps you optimize for production:
1. Reduce Tool Calls
Each tool call adds latency (API round-trip + execution). Design tools to minimize this:
Bad: 10 separate tool calls to fetch individual customer fields
Good: One tool call that fetches the entire customer object
// Bad pattern:
await bash("get-customer-name", { id: "123" });
await bash("get-customer-email", { id: "123" });
await bash("get-customer-orders", { id: "123" });
// Good pattern:
await bash("get-customer", { id: "123", include: ["name", "email", "orders"] });
2. Batch Operations
If Claude needs to process multiple items, batch them:
// Tool definition
{
name: "batch_process",
handler: async (input: { items: string[] }) => {
// Process all items at once
// Faster than 100 individual calls
}
}
3. Cache Frequently Accessed Data
If Claude repeatedly asks for the same data, cache it:
session.registerTool({
name: "get_config",
handler: async (input) => {
// First call: expensive
// Subsequent calls: cached
return cache.get("config") || fetchConfig();
},
});
4. Stream Responses
For long-running operations, stream progress:
session.onMessage(async (input) => {
return {
stream: true, // Indicate streaming
streamHandler: async (chunk) => {
// Each chunk arrives as it's generated
},
};
});
Debugging and Introspection
The architecture exposes debugging capabilities:
// Inspect session state
console.log(session.getState()); // "idle", "processing", etc.
// View conversation history
console.log(session.getHistory());
// [
// { role: "user", content: "list files" },
// { role: "assistant", content: "..." },
// ]
// List registered tools
console.log(session.getTools());
// View the last API request/response
console.log(session.getLastRequest());
console.log(session.getLastResponse());
// Enable debug logging
session.setDebug(true);
// Outputs: [API] Request sent, [TOOL] bash called, [RESULT] ...
With this introspection, you can diagnose issues, understand what Claude is thinking, and optimize your setup. You can see exactly what gets sent to Claude's API, which is invaluable for debugging strange behavior.
Concurrency and Parallelism: Handling Multiple Sessions
In production, you'll often need multiple concurrent sessions. The architecture handles this gracefully.
Session Isolation
Each session is fully isolated:
// Create two independent sessions
const session1 = new ClaudeCodeSession({
apiKey: process.env.ANTHROPIC_API_KEY,
workingDirectory: "/tmp/project-a",
});
const session2 = new ClaudeCodeSession({
apiKey: process.env.ANTHROPIC_API_KEY,
workingDirectory: "/tmp/project-b",
});
// Run concurrently
const [result1, result2] = await Promise.all([
session1.message("List files"),
session2.message("List files"),
]);
// Each sees its own directory
console.log(result1); // Files from /tmp/project-a
console.log(result2); // Files from /tmp/project-b
Sessions don't share state. Each maintains its own:
- Conversation history
- Tool registry
- Working directory
- Configuration
This means you can spawn dozens of sessions in parallel without interference. Perfect for batch processing, parallel code reviews, or multi-tenant systems.
Queuing and Rate Limiting
If you need to throttle requests, the architecture supports it:
class SessionPool {
private queue: Array<() => Promise<any>> = [];
private activeCount = 0;
private maxConcurrent = 5;
async enqueue<T>(fn: () => Promise<T>): Promise<T> {
return new Promise((resolve, reject) => {
this.queue.push(async () => {
try {
const result = await fn();
resolve(result);
} catch (error) {
reject(error);
}
});
this.process();
});
}
private async process(): Promise<void> {
if (this.activeCount >= this.maxConcurrent || this.queue.length === 0) {
return;
}
this.activeCount++;
const fn = this.queue.shift()!;
await fn();
this.activeCount--;
this.process();
}
}
// Usage
const pool = new SessionPool();
const results = await Promise.all(
items.map((item) => pool.enqueue(() => session.message(`Process: ${item}`))),
);
The architecture supports this because sessions don't depend on global state. You can queue them, throttle them, load-balance them across servers.
Memory Management and Context Windows
One subtle but critical aspect: how much history does Claude see?
When you call session.message(), the SDK builds a context array with:
- System prompt (usually ~1000 tokens)
- Conversation history (grows with each turn)
- Tool definitions (fixed, but can be large if you have many tools)
// In your MessageFlowController
const context = {
system: this.getSystemPrompt(), // Fixed size
messages: this.getHistory(), // Grows over time
tools: this.getToolDefinitions(), // Fixed size
};
// By the 100th turn, history might look like:
{
system: "You are Claude Code...", // 1000 tokens
messages: [
{ role: "user", content: "First question" }, // T1
{ role: "assistant", content: "..." }, // T50
{ role: "user", content: "Second question" }, // T1
{ role: "assistant", content: "..." }, // T100
// ... 98 more turns ...
],
tools: [ /* 20 tools, ~2000 tokens */ ] // 2000 tokens
}
The problem: As the history grows, the total context grows. Eventually, you approach the model's context limit (usually 100k-200k tokens for Claude). This impacts:
- Latency: Larger context = slower API requests
- Cost: You pay per token, including context
- Quality: Older context becomes less relevant
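Before picking a mitigation, you need to know when you're near the limit. A rough sketch using the common chars/4 heuristic (an approximation, not an exact tokenizer; the budget value is illustrative):

```typescript
// Sketch: estimate context size to decide when to trim history.
interface HistoryMsg {
  role: string;
  content: string;
}

function estimateTokens(messages: HistoryMsg[], systemPrompt: string): number {
  const chars =
    systemPrompt.length +
    messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4); // ~4 chars per token heuristic
}

function shouldTrim(
  messages: HistoryMsg[],
  systemPrompt: string,
  budget: number,
): boolean {
  return estimateTokens(messages, systemPrompt) > budget;
}

const history: HistoryMsg[] = Array.from({ length: 100 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(400), // ~100 tokens per message
}));
const trim = shouldTrim(history, "You are Claude Code...", 5_000);
```

For billing-grade accuracy you'd use a real token counter, but a heuristic like this is usually enough to trigger summarization or trimming.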
Solutions the architecture provides:
1. Summarization
Summarize old history:
class SessionManager {
private history: Message[] = [];
private maxHistoryMessages = 50;
async addMessage(message: Message): Promise<void> {
this.history.push(message);
// If history grows too large, summarize it
if (this.history.length > this.maxHistoryMessages) {
await this.summarizeOldMessages();
}
}
private async summarizeOldMessages(): Promise<void> {
const oldMessages = this.history.slice(0, -20); // Keep last 20
const summaryPrompt = `Summarize this conversation for context:\n${JSON.stringify(oldMessages)}`;
const summary = await this.callClaude(summaryPrompt);
// Replace old messages with summary
this.history = [
{
role: "system",
content: `Summary of earlier conversation: ${summary}`,
},
...this.history.slice(-20),
];
}
}
2. Rolling Windows
Keep only recent history:
class SessionManager {
async message(input: string): Promise<Response> {
// Only send the last 10 messages
const recentHistory = this.history.slice(-10);
const response = await this.callClaudeWithHistory(input, recentHistory);
this.history.push({ role: "user", content: input });
this.history.push({ role: "assistant", content: response });
return response;
}
}
3. Selective Retention
Keep important messages, discard trivial ones:
private isImportant(message: Message): boolean {
// Tool calls and results are important
if (message.hasToolCall || message.type === "tool_result") {
return true;
}
// Long user messages are important
if (message.role === "user" && message.content.length > 100) {
return true;
}
// Short acknowledgments are not
return false;
}
Understanding memory management is crucial for production use. A 500-turn conversation will eventually become unwieldy unless you manage context carefully.
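A retention predicate like the one above becomes useful as a pruning pass over the history. A sketch, combining selective retention with a small rolling window (the predicate is a simplified version of isImportant):

```typescript
// Sketch: keep important messages plus the most recent few.
interface Msg {
  role: "user" | "assistant";
  content: string;
  type?: "tool_result";
}

function isImportant(message: Msg): boolean {
  if (message.type === "tool_result") return true; // tool results matter
  if (message.role === "user" && message.content.length > 100) return true;
  return false; // short acknowledgments can be dropped
}

function prune(history: Msg[], keepRecent: number): Msg[] {
  const recent = new Set(history.slice(-keepRecent));
  return history.filter((m) => isImportant(m) || recent.has(m));
}

const history: Msg[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "hello" },
  { role: "user", content: "q".repeat(200) }, // long: retained
  { role: "assistant", content: "ok", type: "tool_result" }, // retained
  { role: "user", content: "thanks" }, // retained only as recent
  { role: "assistant", content: "np" }, // retained only as recent
];
const pruned = prune(history, 2);
```

One caveat: if you prune a tool_use message, prune its paired tool_result too, or the API will see an orphaned reference.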
Testing and Mocking
The architecture's modularity makes testing straightforward:
describe("MessageFlowController", () => {
it("should call a tool when Claude requests it", async () => {
// Mock the tool
const mockTool = {
name: "test_tool",
handler: jest.fn().mockResolvedValue({ success: true }),
};
// Create a session with the mock tool
const session = new ClaudeCodeSession({
apiKey: "fake-key",
tools: [mockTool],
});
// Mock Claude's response to request a tool call
jest.spyOn(session, "callClaude").mockResolvedValue({
toolCall: {
name: "test_tool",
input: { param: "value" },
},
});
// Execute
await session.message("Use test_tool");
// Verify the tool was called with correct input
expect(mockTool.handler).toHaveBeenCalledWith({ param: "value" });
});
});
You can test:
- Tool dispatch logic
- Validation rules
- Error handling
- State transitions
Without any API calls. Pure, fast unit tests.
Real-World Deployment Patterns
Understanding the architecture helps you deploy confidently. Different deployment contexts need different configurations.
Single-Machine Development Setup
For local development, simplicity wins:
const session = new ClaudeCodeSession({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-3-5-sonnet-20241022",
workingDirectory: process.cwd(),
tools: ["bash", "read", "write"],
});
// Session runs entirely locally
// No cross-machine concerns
// Maximum flexibility
This setup is great for experimentation, prototyping, and learning. The downside: no persistence, no scalability, no audit trail.
Server-Based Deployment with Session Pooling
For production, you typically run multiple sessions in a pool:
class SessionServer {
  private pool: SessionPool;
  private database: AuditLog;

  constructor() {
    this.pool = new SessionPool({
      maxSessions: 50,
      sessionTimeout: 3600000, // 1 hour
      idleCleanup: 300000, // 5 minutes
    });
  }

  async handleRequest(request: SessionRequest): Promise<Response> {
    const session = await this.pool.acquire();
    try {
      const result = await session.message(request.prompt);
      await this.database.log({
        clientId: request.clientId,
        sessionId: session.id,
        prompt: request.prompt,
        result: result,
        timestamp: Date.now(),
      });
      return result;
    } finally {
      this.pool.release(session);
    }
  }
}
This pattern:
- Reuses sessions (faster than creating new ones)
- Limits concurrent sessions (resource control)
- Logs everything (audit trail)
- Cleans up idle sessions (prevents leaks)
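The article references a SessionPool without showing its internals. As a rough illustration of the acquire/release mechanics (the class, option names, and factory callback here are my own sketch, not the SDK's actual API), a minimal generic pool might look like:

```typescript
// Minimal generic pool sketch: reuse idle sessions, cap concurrency,
// and queue callers when the pool is exhausted.
type PoolOptions<T> = {
  maxSessions: number;
  create: () => T; // factory for new sessions (placeholder)
};

class SimplePool<T> {
  private idle: T[] = [];
  private inUse = new Set<T>();
  private waiters: ((s: T) => void)[] = [];

  constructor(private opts: PoolOptions<T>) {}

  async acquire(): Promise<T> {
    // Reuse an idle session if one exists -- faster than creating new ones
    const existing = this.idle.pop();
    if (existing) {
      this.inUse.add(existing);
      return existing;
    }
    // Create a new session only while under the concurrency cap
    if (this.inUse.size < this.opts.maxSessions) {
      const created = this.opts.create();
      this.inUse.add(created);
      return created;
    }
    // Pool exhausted: wait until someone releases a session
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(session: T): void {
    this.inUse.delete(session);
    const waiter = this.waiters.shift();
    if (waiter) {
      // Hand the session directly to the next queued caller
      this.inUse.add(session);
      waiter(session);
    } else {
      this.idle.push(session);
    }
  }
}
```

A production pool would add idle timeouts and health checks on reuse, but the core invariant is the same: the number of live sessions never exceeds the cap, and released sessions are recycled before new ones are created.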
Multi-Tenant Deployment
In multi-tenant systems, isolation is critical:
class TenantManager {
  private sessionsByTenant: Map<string, SessionPool> = new Map();

  async getSessionForTenant(tenantId: string): Promise<ClaudeCodeSession> {
    if (!this.sessionsByTenant.has(tenantId)) {
      // Create a dedicated pool for this tenant
      this.sessionsByTenant.set(
        tenantId,
        new SessionPool({
          maxSessions: 10, // Per-tenant limit
          workingDirectory: `/data/tenants/${tenantId}`,
          tools: await this.getAuthorizedTools(tenantId),
        }),
      );
    }
    // Non-null assertion is safe: the pool was just created if it was missing
    return this.sessionsByTenant.get(tenantId)!.acquire();
  }

  private async getAuthorizedTools(tenantId: string): Promise<Tool[]> {
    // Different tenants can access different tools
    // based on their license tier or permissions
    const tier = await this.getTenantTier(tenantId);
    return toolsByTier[tier];
  }
}
This ensures:
- Tenants can't interfere with each other
- Resource limits prevent one tenant from starving others
- Permissions are enforced at the session level
- Working directories are isolated
Performance Profiling and Optimization
Once you understand the architecture, you can profile and optimize. The SDK exposes hooks for instrumentation:
// Measure message latency
class PerformanceMonitor {
  async measureMessage(
    session: ClaudeCodeSession,
    prompt: string,
  ): Promise<PerformanceMetrics> {
    const startTime = performance.now();

    const validationStart = performance.now();
    const processed = session.validateInput(prompt);
    const validationTime = performance.now() - validationStart;

    const claudeStart = performance.now();
    const response = await session.callClaude(prompt);
    const claudeTime = performance.now() - claudeStart;

    let executionTime = 0;
    if (response.hasToolCall) {
      const execStart = performance.now();
      // Tool execution happens here
      executionTime = performance.now() - execStart;
    }

    const totalTime = performance.now() - startTime;
    return {
      total: totalTime,
      validation: validationTime,
      apiCall: claudeTime,
      execution: executionTime,
      overhead: totalTime - claudeTime, // Everything except API call
    };
  }
}
With this instrumentation, you can identify where time is actually spent:
- If apiCall is 90%+ of the time: you're I/O bound. Parallelize across multiple sessions.
- If overhead is high: the work outside the API call (validation, bookkeeping, tool handling) is expensive. Optimize those paths.
- If execution is high: tool execution is the bottleneck. Run tools in parallel or use faster implementations.
This empirical approach prevents premature optimization. Measure first, optimize where it matters.
Backward Compatibility and Versioning
The SDK needs to evolve, but existing code shouldn't break. The architecture supports this through careful versioning:
// SDK version tracking
const SDK_VERSION = "1.2.0";
class ClaudeCodeSession {
  private apiVersion = "2024-03-17"; // API version session was created with
  private sdkVersion = SDK_VERSION;

  async message(input: string): Promise<Response> {
    // Use the API version this session was created with,
    // even if the SDK is updated to a newer version
    const response = await this.callClaudeWithVersion(input, this.apiVersion);
    return response;
  }
}
// Breaking changes happen in major versions
// Deprecations happen gradually:
// 1.0.0: Feature exists
// 1.1.0: Feature marked deprecated, new API provided
// 2.0.0: Feature removed
class SessionManager {
  registerTool(tool: Tool | LegacyTool): void {
    if (isLegacyTool(tool)) {
      console.warn("Tool format deprecated in v2.0. Please update.");
      // Still works, but with a warning
      tool = adaptLegacyTool(tool);
    }
    this.toolRegistry.register(tool);
  }
}
This approach lets teams upgrade gradually without emergency migrations.
Deployment Patterns and Real-World Configurations
Understanding the architecture is one thing. Applying it effectively in production is another. Let's walk through how different organizations deploy the Agent SDK based on their specific constraints and opportunities.
Startups and small teams benefit from the simplicity of single-session deployments. Minimal operational overhead, no need for session pooling or distributed tracing. A single developer spins up a session, sends messages, and gets results. The SDK handles all the complexity behind the scenes. This lets small teams punch above their weight—you get sophisticated AI-assisted workflows without needing to manage the infrastructure that makes them possible.
Mid-size organizations typically adopt session pools with basic load balancing. Run multiple sessions, distribute requests across them, set resource limits per session. At this scale, you're starting to think about availability: if one session crashes, others keep working. You're starting to think about fairness: one long-running operation shouldn't starve others. The architecture supports this naturally—sessions are isolated, so you can implement straightforward queue-and-worker patterns.
Large enterprises implement multi-region deployments with sophisticated logging, monitoring, and compliance requirements. Sessions might be pinned to specific regions for data residency. Requests might be logged to audit systems. Different user tiers might get different tool access. The architecture supports all of this through its extension points. You're not modifying the SDK—you're plugging your infrastructure into it.
The key insight is that the architecture doesn't force a particular deployment pattern. It provides the building blocks that support patterns ranging from toy projects to enterprise-scale systems.
Advanced Memory Management Strategies
Earlier we discussed summarization and rolling windows. Let's go deeper into memory optimization because it's critical for long-running systems.
The naive approach is to keep the entire history forever. This works fine for short conversations (10-50 turns) but breaks for long conversations (hundreds of turns). The context window fills up, cost increases linearly with history length, and older information becomes noise.
A sophisticated approach uses semantic compression. Instead of summarizing old messages as text, extract their semantic meaning. "User asked about authentication issues, we diagnosed it as a JWT expiration problem, implemented token refresh" becomes a structured fact: {topic: "auth", issue: "jwt_expiration", resolution: "token_refresh"}. Store these facts in a separate structured format that Claude can reference without them taking up tokens in the message history.
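The structured-fact idea can be sketched concretely. The `SemanticFact` shape and `FactStore` class below are illustrative names of my own, mirroring the auth example above; a real system would extract the facts with a model call rather than by hand:

```typescript
// Semantic compression sketch: store resolved exchanges as compact
// structured facts instead of keeping the raw message text in history.
interface SemanticFact {
  topic: string;
  issue: string;
  resolution: string;
}

class FactStore {
  private facts: SemanticFact[] = [];

  add(fact: SemanticFact): void {
    this.facts.push(fact);
  }

  // Render all facts as a compact context block -- far fewer tokens
  // than the raw conversation turns they summarize.
  toContextBlock(): string {
    return this.facts
      .map((f) => `[${f.topic}] ${f.issue} -> ${f.resolution}`)
      .join("\n");
  }
}
```

For example, `store.add({ topic: "auth", issue: "jwt_expiration", resolution: "token_refresh" })` reduces a whole diagnostic exchange to one line that can be injected into context when the topic resurfaces.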
Implement hierarchical memory. Recent messages stay in the conversation history (full fidelity). Older messages are compressed into summaries (lower fidelity). Older summaries are further compressed into just facts (minimal tokens). When Claude needs to reference something old, you can fetch the full history if necessary. This creates a memory system that's efficient for the common case (recent context) but can still access older information if needed.
Use importance scoring to decide what to keep. Assign scores to messages based on how much information they contain. "User provided error message" is low importance. "User explained the entire system architecture" is high importance. When memory is constrained, discard low-importance messages first. This biases toward keeping information that matters.
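As a minimal sketch of importance-based pruning (the length-style scoring is left to the caller here; real systems might score with embeddings or model judgments, and these names are mine, not the SDK's):

```typescript
// Importance-scored pruning sketch: when memory is constrained, keep only
// the highest-scoring messages, preserving their original order.
interface ScoredMessage {
  text: string;
  score: number;
}

function pruneByImportance(
  messages: ScoredMessage[],
  maxMessages: number,
): ScoredMessage[] {
  if (messages.length <= maxMessages) return messages;
  // Pick the top-scoring messages (sort a copy so input order is untouched)
  const keep = new Set(
    [...messages].sort((a, b) => b.score - a.score).slice(0, maxMessages),
  );
  // Filter in original order so the surviving conversation still reads coherently
  return messages.filter((m) => keep.has(m));
}
```

The key design choice is the final `filter`: selection happens by score, but the retained messages stay in chronological order, which matters for conversation coherence.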
Implement topic-based organization. Group messages by topic (authentication, payments, logging, etc.). Within each topic, apply different retention strategies. Topics that are actively being discussed stay in full context. Topics that are inactive get summarized. Topics that are very old get compressed to facts. This matches how humans remember—recent and relevant stuff stays sharp, old stuff becomes fuzzy.
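A topic-bucketed memory with per-topic demotion could be sketched like this (the `TopicMemory` class and its state shape are hypothetical, and the summarizer is injected so any summarization strategy can plug in):

```typescript
// Topic-bucketed retention sketch: active topics keep full messages;
// inactive topics are demoted to a single summary string.
type TopicState =
  | { kind: "active"; messages: string[] }
  | { kind: "summarized"; summary: string };

class TopicMemory {
  private topics = new Map<string, TopicState>();

  record(topic: string, message: string): void {
    const state = this.topics.get(topic);
    if (state && state.kind === "active") {
      state.messages.push(message);
    } else {
      // New topic, or a summarized topic becoming active again
      this.topics.set(topic, { kind: "active", messages: [message] });
    }
  }

  // Demote an inactive topic: replace its messages with one summary line
  summarize(topic: string, summarizer: (msgs: string[]) => string): void {
    const state = this.topics.get(topic);
    if (state?.kind === "active") {
      this.topics.set(topic, {
        kind: "summarized",
        summary: summarizer(state.messages),
      });
    }
  }

  get(topic: string): TopicState | undefined {
    return this.topics.get(topic);
  }
}
```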
Observability and Debugging in Production
When your sessions are running in production, handling thousands of concurrent requests, you need visibility. The architecture provides hooks for instrumentation, but you need to know which ones to use.
Instrument the message flow controller to track state transitions. "Session moved from IDLE to AWAITING_CLAUDE at T=0s. Received response at T=2.3s. Started executing tool at T=2.4s." This timeline tells you where latency is coming from. If most latency is in AWAITING_CLAUDE, it's Claude's API. If most is in tool execution, your tools are slow.
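A timeline recorder like the one described could be sketched as follows. The `SessionTimeline` class is my own illustration (the SDK's actual instrumentation hooks may differ); state names follow the article's state machine, and the timestamp parameter is injectable so the derivation is testable:

```typescript
// Timeline instrumentation sketch: record state transitions with
// timestamps, then derive how long the session spent in each state.
class SessionTimeline {
  private start = Date.now();
  private transitions: { state: string; at: number }[] = [];

  // `at` is ms since the timeline started; defaults to "now"
  enter(state: string, at: number = Date.now() - this.start): void {
    this.transitions.push({ state, at });
  }

  // Total time per state, derived from consecutive transitions.
  // The final (still-open) state is not counted until it is exited.
  durations(): Record<string, number> {
    const out: Record<string, number> = {};
    for (let i = 0; i < this.transitions.length - 1; i++) {
      const { state, at } = this.transitions[i];
      out[state] = (out[state] ?? 0) + (this.transitions[i + 1].at - at);
    }
    return out;
  }
}
```

Feeding the example timeline from the text into this recorder makes the latency attribution mechanical: the state with the largest accumulated duration is where to look first.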
Instrument tool dispatchers to track which tools are called, with what input, how long they take, what they return. Build a dashboard showing tool usage patterns. "The bash tool is called 10,000 times/day, averaging 200ms. The read tool is called 50,000 times/day, averaging 50ms." This data reveals where to optimize—maybe bash commands are slow because they're spawning many processes, and you could batch them.
Instrument error handling to distinguish between different error types. A tool returning an error is different from a tool crashing. A validation failure is different from a timeout. An API error from Claude is different from a network error. Categorizing errors lets you route them differently—some might retry automatically, some might alert on-call, some might just log.
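A simple router for that categorization might look like the sketch below. The category names, the message patterns, and the `ToolCrashError` name are illustrative assumptions, not SDK-defined types:

```typescript
// Error-routing sketch: classify failures so each category can be handled
// differently (automatic retry, on-call alert, or plain logging).
type ErrorCategory = "retry" | "alert" | "log";

function categorizeError(err: Error): ErrorCategory {
  // Transient network/timeout failures: usually safe to retry automatically
  if (/timeout|ECONNRESET|ETIMEDOUT/i.test(err.message)) return "retry";
  // A tool crashing (as opposed to returning an error) should page someone
  if (err.name === "ToolCrashError") return "alert";
  // Validation failures and expected tool errors: record and move on
  return "log";
}
```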
Use structured logging. Instead of plain text logs, emit JSON with consistent fields: timestamp, session_id, user_id, message_type, duration, error_code. This lets you query your logs: "Show me all sessions with errors in the tool dispatcher." "How many message processing timeouts occurred in the last hour?" Structured logs are indexable and analyzable in ways plain text never will be.
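A minimal structured-log emitter, using the field names suggested above (the `emitLog` helper and `LogEntry` shape are my own sketch; adapt the fields to whatever your log pipeline indexes):

```typescript
// Structured-logging sketch: emit one JSON object per line with consistent
// fields, instead of free-form text.
interface LogEntry {
  timestamp: string;
  session_id: string;
  user_id?: string;
  message_type: string;
  duration_ms?: number;
  error_code?: string;
}

function emitLog(entry: Omit<LogEntry, "timestamp">): string {
  const line = JSON.stringify({
    timestamp: new Date().toISOString(),
    ...entry,
  });
  console.log(line); // one JSON object per line: indexable and queryable
  return line;
}
```

With logs in this shape, queries like "all sessions with errors in the tool dispatcher" become a field filter rather than a regex over prose.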
Build synthetic tests that exercise the SDK in ways production doesn't. A test session that goes through 500 turns to check memory management. A test that spawns 1000 concurrent sessions to check for resource leaks. A test that generates pathological inputs (huge strings, deeply nested objects, circular references) to check error handling. These tests catch issues before production sees them.
The Evolution of Patterns: From Prototyping to Scale
The SDK supports a maturity curve. You start simple and grow sophisticated without rewriting fundamentally.
Phase 1: Prototyping. Single session, simple tools, no optimization. You're learning how Claude handles your domain, what patterns work, what instructions are effective. Minimal code. Minimal infrastructure.
Phase 2: Production MVP. Session pool, basic monitoring, structured logging. You're handling real requests, so you need availability and observability. Still straightforward—no distributed tracing or multi-region complexity.
Phase 3: Scale. Load balancing, regional deployment, advanced memory management, comprehensive monitoring. You're handling significant volume and need to think about efficiency and reliability at scale.
Phase 4: Sophistication. Custom session management, specialized tool dispatch, contextual memory, advanced optimization. You've integrated the SDK deeply into your infrastructure and are extracting maximum value.
Each phase builds on the previous one. You're not replacing the SDK—you're using it in more sophisticated ways. The SDK's architecture supports this progression because it's designed with extension points and clear separation of concerns.
The Reliability Mindset
The Agent SDK is built on a reliability-first mindset. Every component is designed with failure modes in mind. Tools can fail—that's expected, not an error. Claude can ask for things that aren't available—the SDK handles it gracefully. Network requests can time out—the SDK has strategies for recovery.
This mindset shapes how you deploy and operate the SDK. You don't assume sessions will run forever without issues. You design for restarts, for failures, for graceful degradation. You monitor proactively. You have runbooks for common failure modes.
The SDK gives you tools to build reliable systems. It's up to you to use them effectively. Understand the architecture, instrument it well, monitor it closely, test it thoroughly. Do those things, and you can confidently deploy AI-assisted workflows that the SDK orchestrates reliably.
Conclusion: The Design Principles
The Agent SDK's architecture reflects several core design principles:
- Separation of concerns: Each component has a single responsibility
- Clear interfaces: Tools, validators, and executors have well-defined contracts
- Layered security: Validation happens at multiple levels
- Auditability: Every step can be logged and inspected
- Extensibility: Custom tools, validators, and interceptors hook into defined extension points
- Robustness: Errors are caught, contextualized, and fed back to Claude
- Concurrency-friendly: Sessions are isolated, supporting parallel execution
- Production-ready: Built-in support for streaming, memory management, and testing
- Observable: Instrumentation hooks for performance monitoring
- Evolvable: Versioning and backward compatibility built in
Understanding this architecture means you can:
- Debug effectively: Know where problems occur in the pipeline
- Optimize for your use case: Adjust tools, prompts, and flow
- Scale confidently: Handle hundreds of concurrent sessions
- Extend fearlessly: Add tools, validators, and behaviors knowing the design patterns
- Build safer systems: Understand where security boundaries are
- Deploy strategically: Choose the right deployment pattern for your use case
- Monitor comprehensively: Instrument and measure what matters
- Upgrade smoothly: Understand versioning and backward compatibility
This isn't a simple library. It's a foundation for building intelligent systems that interact with your infrastructure. And now you understand how it works, from message entry to response exit, with all the safety rails in place. You can deploy it with confidence, knowing that the design is solid and that you can reason about its behavior at every level.
The architects and engineers who built this SDK made intentional decisions at every layer. They prioritized clarity over cleverness, robustness over performance, flexibility over prescriptiveness. Those decisions create a system that can grow with your needs—from a simple toy project to a mission-critical component of your infrastructure.