
You've probably felt the friction: your team needs Claude Code's power, but an interactive terminal session isn't enough. Maybe you're building an internal developer tool and want to offer Claude Code's capabilities inside your app. Or you're running CI/CD pipelines and need to spawn automated code review sessions programmatically. That's where the Claude Code Agent SDK comes in.
The Agent SDK (@anthropic-ai/claude-code) lets you embed Claude Code directly into Node.js and TypeScript applications. Instead of your developers jumping between tabs and UIs, they work within your custom tools. Instead of manually triggering code reviews, your pipeline automates them. This isn't just convenience—it's a fundamental shift in how you can architect developer tooling.
In this article, we'll explore what the Agent SDK is, why you'd want to use it, how to install and initialize it, how to spawn and manage sessions, real-world use cases that show its power, and production patterns for error handling and monitoring. By the end, you'll understand how to embed Claude Code into your own applications and unlock capabilities your team didn't know were possible.
Table of Contents
- What Is the Agent SDK, and Why Does It Exist?
- Installing and Initializing the Agent SDK
- Configuring Sessions: Permissions, Tools, and Context
  - Model Selection
  - Working Directory and Sandbox Boundaries
  - Enabling and Disabling Tools
  - System Prompt Customization
  - Timeouts and Rate Limiting
- Spawning and Managing Sessions Programmatically
  - Basic Message Flow
  - Handling Tool Invocations
  - Streaming Responses
- Concurrency Patterns and Resource Management
  - Managing Multiple Sessions
- Real-World Use Cases
  - Use Case 1: Internal Code Review Portal
  - Use Case 2: Automated Testing and Quality Gates
  - Use Case 3: Developer IDE Integration
- Real-World Integration Patterns
  - Pattern 1: The Code Review Bot
  - Pattern 2: The Compliance Checker
  - Pattern 3: The AI-Powered Refactoring Pipeline
  - Pattern 4: The Documentation Generator
- Key Pitfalls to Avoid
- Cost Optimization and Scaling Considerations
  - The Cost Structure
  - Cost Optimization Strategies
  - Scaling to High Concurrency
- Error Handling and Monitoring
- Expected Output and Configuration Patterns
- Summary
What Is the Agent SDK, and Why Does It Exist?
The Claude Code Agent SDK is a programmatic interface to Claude Code. Instead of driving it interactively, you call it from your code. You create a session, send messages, handle tool invocations (like file reads, bash commands, and browser automation), and stream responses back to your application logic.
Think of it this way: Claude Code's interactive CLI is a complete, batteries-included experience. The Agent SDK strips that down to the essentials and gives you the building blocks. You decide what tools to expose, what system prompt to use, what working directory to operate in, and how to integrate the results into your own workflow.
Why would you want this?
Better developer ergonomics: Your team doesn't need to context-switch into a separate Claude Code session. They invoke it from within their IDE, deployment dashboard, or internal portal. Imagine a developer in VS Code right-clicking a file and selecting "Analyze with Claude Code"—no tab-switching, no copy-pasting, seamless integration.
Automation at scale: You can spawn dozens of Claude Code sessions programmatically, letting it handle code review, refactoring, or testing tasks without human interaction. Picture your CI/CD pipeline automatically analyzing every pull request with Claude Code, generating detailed reviews while humans sleep. That's the scale we're talking about.
Custom integration: You control which tools Claude has access to. Maybe you want to sandbox file operations to a specific directory. Maybe you want to log all executed commands to a central audit system. Maybe you want to pipe results into your custom dashboard or Slack channel. You build it.
Compliance and security: You own the session—you control the system prompt, the working directory, the model version, and the API key. Perfect for enterprises that need audit trails, fine-grained permissions, and the ability to verify every action Claude takes. You're not delegating security to a third party; you're implementing it yourself.
Cost optimization: Run multiple sessions in parallel for cheaper models (Haiku, Sonnet), use expensive models (Opus) only when needed, and optimize based on task complexity. You have fine-grained control over cost allocation per use case.
In short: the Agent SDK is Claude Code as a library, not a service. You're no longer consuming it through its own interface; you're embedding it as a capability within your own tools.
Installing and Initializing the Agent SDK
Getting started is refreshingly simple. You need Node.js 18+ and an Anthropic API key.
Installation is a single npm command:
```bash
npm install @anthropic-ai/claude-code
```

Or if you're using yarn:

```bash
yarn add @anthropic-ai/claude-code
```

Or pnpm:

```bash
pnpm add @anthropic-ai/claude-code
```

Once installed, you can import the SDK and create your first session. Here's the minimal setup:
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/tmp/my-project",
  model: "claude-3-5-sonnet-20241022",
});

// Send a message
const response = await session.message("What files are in this directory?");
console.log(response.text);
```

That's it. You've created a Claude Code session and asked it a question. The SDK handles authentication, model selection, and communication with Anthropic's API. But the real power emerges when you start configuring the session to your needs.
Configuring Sessions: Permissions, Tools, and Context
The SDK gives you fine-grained control over what Claude Code can do. When you create a session, you pass an options object that shapes its behavior.
Here's the full landscape of configuration:
Model Selection
You specify which Claude model powers the session:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  model: "claude-3-5-sonnet-20241022", // Latest fast model
});
```

Why this matters: Different models have different capabilities and costs. Sonnet is fast and cost-effective for routine analysis. Opus is most powerful but slower and more expensive—use it for security reviews or complex refactoring. Haiku is ultralight for simple tasks like counting lines or finding patterns. Choose based on your use case and budget.
Real-world consideration: A code quality gate running on every PR? Use Sonnet. An annual security audit? Opus. Searching for deprecated APIs across your codebase? Haiku. Don't use one model for everything.
Working Directory and Sandbox Boundaries
This is crucial. You specify where Claude Code operates:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/home/user/projects/my-app",
});
```

Claude will treat this directory as its "current working directory." File operations are scoped to this location and its subdirectories. This is your security boundary. If you don't want Claude modifying system files, don't give it write access outside your sandbox.
Real example: You're running an automated code review. You set workingDirectory to the PR branch checked out to a temporary location like /tmp/pr-review-12345. Claude can read and analyze code, but it can't touch production systems, other developers' branches, or anything else outside that directory. It's a hard wall.
Pro tip: Always use absolute paths, never relative paths. ./my-app changes meaning depending on where your process runs. /home/user/projects/my-app is unambiguous.
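To make that boundary concrete, here's a small helper for your own tool-handling layer (a sketch, not part of the SDK) that checks whether a requested path resolves to somewhere inside the sandbox:

```typescript
import * as path from "node:path";

// Returns true only if `candidate` resolves to a location inside `sandboxRoot`.
// Absolute paths and "../" escapes both resolve outside the root and are rejected.
function isInsideSandbox(sandboxRoot: string, candidate: string): boolean {
  const root = path.resolve(sandboxRoot);
  const target = path.resolve(root, candidate);
  const rel = path.relative(root, target);
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}

console.log(isInsideSandbox("/tmp/pr-review-12345", "src/index.ts")); // true
console.log(isInsideSandbox("/tmp/pr-review-12345", "../other/file.ts")); // false
```

Run a check like this before executing any file tool request, and deny anything that falls outside the root.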
Enabling and Disabling Tools
You control which tools Claude has access to:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/tmp/sandbox",
  tools: {
    bash: true, // Can execute shell commands
    file_read: true, // Can read files
    file_write: true, // Can write files
    file_search: true, // Can search directory trees
    browser: false, // Cannot automate browsers
  },
});
```

This is powerful for compliance. Running an automated test suite? Disable file_write and browser so Claude can't accidentally modify test configuration or launch unintended browser sessions. Building a code analyzer? Enable file_read and bash, but disable file_write to prevent modifications.
Tool availability matrix:
| Tool | Use Case | Example | Risk |
|---|---|---|---|
| file_read | Analyzing code | Code review | Low—read-only |
| file_write | Generating fixes | Refactoring | High—modifications |
| bash | Running tests | Test analysis | Very high—arbitrary execution |
| file_search | Finding patterns | Migration analysis | Low—read-only |
| browser | Testing UIs | E2E test automation | High—external side effects |
Real consideration: A single runaway bash command could delete your entire production database. Always ask: "Does Claude really need this tool for this task?" Fewer tools = smaller attack surface.
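One way to keep that discipline is to encode least-privilege presets per task type. This is an illustrative sketch of your own configuration layer, not an SDK feature; the preset names and the refactorer's tool mix are assumptions:

```typescript
// Tool flags mirror the options object shown above.
type Tools = {
  bash: boolean;
  file_read: boolean;
  file_write: boolean;
  file_search: boolean;
  browser: boolean;
};

// Least-privilege presets: each task type gets only what it needs.
const TOOL_PRESETS: Record<"code_analyzer" | "test_runner" | "refactorer", Tools> = {
  code_analyzer: { file_read: true, file_search: true, bash: true,  file_write: false, browser: false },
  test_runner:   { file_read: true, file_search: true, bash: true,  file_write: false, browser: false },
  refactorer:    { file_read: true, file_search: true, bash: false, file_write: true,  browser: false },
};
```

Passing `tools: TOOL_PRESETS.code_analyzer` at session creation keeps the decision auditable in one place instead of scattered across call sites.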
System Prompt Customization
You can inject a custom system prompt to shape Claude's behavior:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/project",
  systemPrompt: `You are a code reviewer for our TypeScript monorepo.
Your job is to:
1. Check for type safety issues
2. Verify error handling
3. Ensure consistent style with our linter rules
4. Flag performance concerns
5. Identify security vulnerabilities
Do NOT modify code without explicit approval.
Do NOT run tests automatically.
Do NOT suggest frameworks we don't use.
Our tech stack: TypeScript, React, Express, PostgreSQL.`,
});
```

Now every message Claude processes includes this context. It shapes its responses and behavior toward your specific need. This is how you convert a general-purpose AI into a specialized agent for your workflow.
Pro strategy: Make your system prompt specific enough to guide behavior, but general enough to adapt. Too specific and Claude gets stuck; too general and it loses focus.
Timeouts and Rate Limiting
For production systems, you'll want to control resource consumption:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/tmp/analysis",
  timeout: 30000, // 30-second timeout per message
  maxToolCalls: 50, // Limit tool invocations
  retryPolicy: {
    maxRetries: 2,
    backoffMs: 1000,
  },
});
```

Why this matters: A runaway session could consume API quota or hang indefinitely. Timeouts and rate limits keep your costs predictable and your system responsive. A single forgotten timeout in production could cost you thousands.
Real numbers: A message that takes 3 minutes to complete across a 100-PR batch = 5 hours of compute. A message that takes 30 seconds = 50 minutes. The difference is configuration.
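If you're orchestrating sessions from your own code, you may also want an application-level guard that's independent of the SDK's `timeout` option. A minimal sketch using `Promise.race`:

```typescript
// Rejects if the wrapped promise doesn't settle within `ms` milliseconds.
// A sketch for wrapping any SDK call; prefer the SDK's own timeout option
// where it applies.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = "operation",
): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Clear the timer either way so the process can exit cleanly.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer!));
}
```

Usage would look like `await withTimeout(session.message("…"), 30_000, "review")`, turning a hung call into a catchable error instead of an indefinite wait.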
Spawning and Managing Sessions Programmatically
Now we get to the fun part: actually using the SDK in your application.
A typical workflow looks like: create a session, send messages, handle tool invocations, and stream results back to your app.
Basic Message Flow
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

async function analyzeCode(codeDirectory: string) {
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: codeDirectory,
    model: "claude-3-5-sonnet-20241022",
    tools: {
      file_read: true,
      file_search: true,
      bash: false, // Don't execute anything
    },
  });

  // Send a request
  const response = await session.message(
    "Find all TODO comments in this codebase and summarize them by file.",
  );
  console.log(response.text);

  // Clean up
  await session.close();
}

// Usage
analyzeCode("/home/user/my-repo");
```

This is straightforward: you create a session, send a message, get back a response, and close the session. But Claude Code is event-driven. When it needs to execute a tool (like searching for files), it emits a tool-use event. Your app needs to handle that.
Handling Tool Invocations
Claude doesn't directly execute tools—it requests them. Your application decides whether to grant each request:
```typescript
import { ClaudeCodeSession, ToolUseEvent } from "@anthropic-ai/claude-code";

async function codeReview(prDirectory: string) {
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: prDirectory,
  });

  // Listen for tool invocation requests
  session.on("toolUse", async (event: ToolUseEvent) => {
    console.log(`Claude requested: ${event.toolName}`);
    console.log(`Input: ${JSON.stringify(event.input, null, 2)}`);

    // You decide whether to allow it (isNotAllowed is your own policy check)
    if (event.toolName === "file_write" && isNotAllowed(event.input.path)) {
      session.submitToolResult(event.toolId, {
        success: false,
        error: "Cannot write to that directory.",
      });
      return;
    }

    // If allowed, execute and return the result
    const result = await executeToolLocally(event.toolName, event.input);
    session.submitToolResult(event.toolId, result);
  });

  // Now ask Claude to do work
  const response = await session.message(
    "Review the code changes in this PR for security issues.",
  );
  console.log(response.text);

  await session.close();
}

async function executeToolLocally(
  toolName: string,
  input: Record<string, unknown>,
) {
  // Your custom tool execution logic
  // Could shell out to actual commands, call your own APIs, etc.
  if (toolName === "bash") {
    // Execute with a controlled environment
    // Log the command, verify it doesn't touch sensitive paths
    // Return the result
  }
  // ... handle other tools
}
```

This pattern gives you complete control. You can:
- Log every tool invocation for audit trails and compliance
- Reject unsafe operations (e.g., don't let Claude delete production databases)
- Replace real tools with mocks (for testing or sandboxing)
- Inject custom tools that don't exist in Claude Code's standard set
- Rate-limit operations (fail after N file operations, etc.)
- Add telemetry to track what Claude is doing
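The rate-limiting idea above can be as simple as a per-session budget. This is a sketch layered on top of your own tool handler, not an SDK feature:

```typescript
// Per-session cap on tool invocations: deny everything past the limit.
class ToolBudget {
  private used = 0;

  constructor(private readonly max: number) {}

  // Returns true and spends one unit if budget remains; false once exhausted.
  tryConsume(): boolean {
    if (this.used >= this.max) return false;
    this.used += 1;
    return true;
  }

  get remaining(): number {
    return this.max - this.used;
  }
}
```

Call `tryConsume()` at the top of your toolUse handler and submit an error result once it returns false, so a looping session fails fast instead of burning quota.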
Streaming Responses
For long-running tasks, you want to stream results back to the user as they arrive, not wait for everything to finish:
```typescript
async function streamCodeAnalysis(repoPath: string) {
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: repoPath,
  });

  // Stream text as it arrives
  const stream = session.messageStream(
    "Analyze code complexity and suggest refactoring opportunities.",
  );

  for await (const chunk of stream) {
    if (chunk.type === "text") {
      process.stdout.write(chunk.text);
    } else if (chunk.type === "toolUse") {
      console.log(`\n[Tool: ${chunk.toolName}]`);
      // Handle tool use
    }
  }

  await session.close();
}
```

Streaming is essential for user experience. Nobody wants to wait 30 seconds for a response to appear all at once. With streaming, Claude's analysis starts flowing to the user immediately, creating a sense of progress.
Concurrency Patterns and Resource Management
Before we dive into managing multiple sessions, it's worth understanding the resource implications. Each session:
- Maintains a connection to Anthropic's API
- Holds state in memory (conversation history, file handles)
- May consume API quota
- Uses bandwidth and compute resources
Running 100 concurrent sessions is possible but expensive. Running 1,000 could exhaust your API quota in minutes. You need concurrency patterns that balance throughput with cost and reliability.
Sequential processing (one session at a time) is safe but slow. For a batch of 100 PRs, it could take hours.
Bounded concurrency (5-10 sessions in parallel) is the sweet spot. You get good throughput without overwhelming the API.
Unbounded concurrency (spawn as many as you want) will blow through your budget and hit API rate limits.
Here's a helper function that implements bounded concurrency:
```typescript
async function processBatch<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  concurrency: number = 5,
): Promise<(R | null)[]> {
  const results: (R | null)[] = [];
  const executing = new Set<Promise<void>>();

  for (const item of items) {
    const promise = processor(item)
      .then((result) => {
        results.push(result);
      })
      .catch((error) => {
        console.error(`Processing failed for item: ${error}`);
        results.push(null); // Or handle the error differently
      })
      .finally(() => {
        // Remove this promise once it settles, so the pool only
        // tracks in-flight work
        executing.delete(promise);
      });
    executing.add(promise);

    if (executing.size >= concurrency) {
      // Wait until at least one in-flight promise settles
      await Promise.race(executing);
    }
  }

  // Wait for the remaining promises
  await Promise.all(executing);
  return results;
}

// Usage: process 100 PRs with max 5 concurrent sessions
const results = await processBatch(
  prBranches,
  (branch) => reviewPullRequest(branch),
  5, // Concurrency limit
);
```

This pattern ensures you never exceed your concurrency limit while processing large batches efficiently.
Tuning guide:
- Start with 5 concurrent sessions
- Monitor API response times and error rates
- If response times stay under 2 seconds, increase to 7-8
- If you see rate limit errors, decrease back down
- Different tasks have different sweet spots
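Those heuristics can be sketched as a tiny tuning function. The thresholds here are the illustrative numbers from the list above, not recommendations from the SDK:

```typescript
// Adjust the concurrency limit between batches: grow while latency is
// healthy, back off sharply on rate-limit errors.
function nextConcurrency(
  current: number,
  avgLatencyMs: number,
  sawRateLimit: boolean,
): number {
  if (sawRateLimit) return Math.max(1, current - 2); // Back off, never below 1
  if (avgLatencyMs < 2000) return Math.min(10, current + 1); // Grow, capped at 10
  return current; // Hold steady otherwise
}
```

Feed it the measurements from your last batch and pass the result as the `concurrency` argument to the next `processBatch` call.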
Managing Multiple Sessions
Here's where the SDK truly shines: you can spawn many sessions in parallel, each handling a different task:
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

async function reviewPullRequests(prBranches: string[]) {
  // Create a session for each PR
  const sessionPromises = prBranches.map(async (branch) => {
    const session = new ClaudeCodeSession({
      apiKey: process.env.ANTHROPIC_API_KEY,
      workingDirectory: `/tmp/pr-${branch}`,
      model: "claude-3-5-sonnet-20241022",
      tools: {
        file_read: true,
        file_search: true,
        bash: true,
        file_write: false, // Read-only for safety
      },
      systemPrompt: `You are a code reviewer. Focus on:
- Type safety and correctness
- Performance implications
- Test coverage
- Documentation updates`,
    });

    try {
      const response = await session.message(
        `Review the code changes in branch ${branch}.
Identify:
1. Bugs or logical errors
2. Missing error handling
3. Incomplete tests
4. Documentation gaps`,
      );

      return {
        branch,
        review: response.text,
        status: "success",
      };
    } catch (error) {
      return {
        branch,
        error: (error as Error).message,
        status: "failed",
      };
    } finally {
      await session.close();
    }
  });

  // Wait for all reviews to complete
  const reviews = await Promise.all(sessionPromises);

  // Generate a report
  for (const review of reviews) {
    if (review.status === "success") {
      console.log(`\n=== ${review.branch} ===`);
      console.log(review.review);
    } else {
      console.error(`Review failed for ${review.branch}: ${review.error}`);
    }
  }
}

// Usage: review 10 PRs in parallel
reviewPullRequests([
  "feature/auth-v2",
  "bugfix/socket-leak",
  "chore/deps-update",
  // ... 7 more
]);
```

Now instead of a human spending hours reviewing code, Claude reviews multiple PRs in parallel. Each session is isolated, configured identically, and reports results back to your system. This is the power of programmatic spawning.
Real-World Use Cases
Let's ground this in concrete scenarios where the Agent SDK transforms how teams work.
Use Case 1: Internal Code Review Portal
Your company has a custom code review portal. Developers open PRs, and your portal displays diffs. You want to augment it with Claude Code's analysis.
```typescript
// In your API endpoint that handles PR review requests
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";
import express from "express";
import { execSync } from "child_process";

const app = express();

app.post("/api/pr/:prId/analyze", async (req, res) => {
  const { prId } = req.params;

  // Check out the PR branch
  const tempDir = `/tmp/pr-analysis-${prId}`;
  execSync(`git clone --branch pr-${prId} . ${tempDir}`);

  // Create a Claude Code session for this specific PR
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: tempDir,
    model: "claude-3-5-sonnet-20241022",
    tools: {
      file_read: true,
      file_search: true,
      bash: true,
      file_write: false,
    },
    systemPrompt: `You are a code reviewer for our company.
Focus on security, performance, maintainability, and test coverage.
Be concise but thorough.`,
  });

  try {
    // Stream the analysis back to the client
    res.setHeader("Content-Type", "text/event-stream");
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("Connection", "keep-alive");

    const stream = session.messageStream(
      `Perform a thorough code review of the changes in this PR.
Identify issues in these areas:
1. Security vulnerabilities
2. Performance concerns
3. Code style violations
4. Missing error handling
5. Test coverage gaps
6. Documentation issues`,
    );

    for await (const chunk of stream) {
      if (chunk.type === "text") {
        res.write(`data: ${JSON.stringify({ text: chunk.text })}\n\n`);
      }
    }

    res.write("data: [DONE]\n\n");
    res.end();
  } catch (error) {
    // Headers may already be sent once streaming starts; guard accordingly
    if (!res.headersSent) {
      res.status(500).json({ error: (error as Error).message });
    } else {
      res.end();
    }
  } finally {
    await session.close();
    execSync(`rm -rf ${tempDir}`);
  }
});

app.listen(3000);
```

Now developers open a PR in your portal, click "Analyze with Claude Code," and get a detailed review streamed in real time. No context-switching, no manual copy-pasting. The review is integrated directly into your workflow.
Use Case 2: Automated Testing and Quality Gates
Your CI/CD pipeline runs tests, but you want to add static analysis and code quality checks powered by Claude Code.
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

async function codeQualityGate(commitHash: string) {
  console.log(`Running code quality gate for ${commitHash}...`);

  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: process.cwd(),
    model: "claude-3-5-sonnet-20241022",
    tools: {
      file_read: true,
      file_search: true,
      bash: true,
      file_write: false,
    },
    timeout: 120000, // 2 minutes max
  });

  try {
    const response = await session.message(
      `Analyze the code changes in commit ${commitHash}.
Run the test suite and report:
1. Test results (pass/fail)
2. Code coverage changes
3. Any new warnings or errors
4. Potential issues or improvements
Format your response as JSON with fields: tests_passed, coverage_change, warnings, recommendations.`,
    );

    // Parse the response (assumes the model returned bare JSON)
    const result = JSON.parse(response.text);

    if (!result.tests_passed) {
      console.error("Tests failed. Blocking merge.");
      process.exit(1);
    }

    if (result.coverage_change < -5) {
      console.error("Coverage dropped >5%. Blocking merge.");
      process.exit(1);
    }

    console.log("Quality gate passed.");
    console.log(`Coverage change: ${result.coverage_change}%`);
    console.log(`Recommendations: ${result.recommendations}`);
    process.exit(0);
  } catch (error) {
    console.error(`Quality gate failed: ${(error as Error).message}`);
    process.exit(1);
  } finally {
    await session.close();
  }
}

// Called from the CI pipeline
codeQualityGate(process.env.COMMIT_HASH || "HEAD");
```

Now every commit automatically triggers a Claude Code analysis. If tests fail or coverage drops, the merge is blocked. This is a quality gate that actually understands code, not just a mechanical linter.
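One fragile spot in a gate like this is calling JSON.parse on raw model output: the model may wrap the JSON in prose or a markdown fence. A hedged sketch that pulls out the first balanced object (naive on purpose; it doesn't handle braces inside string values):

```typescript
// Extracts and parses the first balanced {...} object found in `text`.
// Throws if no object is present or the braces never balance.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  if (start === -1) throw new Error("No JSON object found in response");

  let depth = 0;
  for (let i = start; i < text.length; i++) {
    if (text[i] === "{") depth++;
    else if (text[i] === "}") {
      depth--;
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  throw new Error("Unbalanced JSON object in response");
}
```

Swapping `JSON.parse(response.text)` for `extractJson(response.text)` makes the gate tolerant of a chatty preamble while still failing loudly on genuinely malformed output.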
Use Case 3: Developer IDE Integration
You're building an IDE plugin. Developers select a file and request "Claude Code analysis." Your plugin spawns a session and displays results in a sidebar.
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";
import * as vscode from "vscode";

export async function activateClaudeCodePlugin(
  context: vscode.ExtensionContext,
) {
  // Register command: "Analyze this file with Claude Code"
  const analyzeCommand = vscode.commands.registerCommand(
    "claude-code-plugin.analyzeFile",
    async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor) {
        vscode.window.showErrorMessage("No active editor.");
        return;
      }

      const filePath = editor.document.fileName;
      const workspaceRoot = vscode.workspace.workspaceFolders?.[0].uri.fsPath;
      if (!workspaceRoot) {
        vscode.window.showErrorMessage("Not in a workspace.");
        return;
      }

      // Show progress
      await vscode.window.withProgress(
        {
          location: vscode.ProgressLocation.Notification,
          title: "Analyzing...",
        },
        async (progress) => {
          const session = new ClaudeCodeSession({
            apiKey: process.env.ANTHROPIC_API_KEY,
            workingDirectory: workspaceRoot,
            model: "claude-3-5-sonnet-20241022",
            tools: {
              file_read: true,
              file_search: true,
              bash: false, // Don't execute anything
            },
          });

          try {
            progress.report({ increment: 30 });

            const response = await session.message(
              `Analyze this file: ${filePath}
Provide:
1. Summary of what this code does
2. Any potential bugs or issues
3. Suggestions for improvement
4. Type safety checks (if applicable)`,
            );

            progress.report({ increment: 70 });

            // Display results in a panel
            const panel = vscode.window.createWebviewPanel(
              "claudeCodeAnalysis",
              "Claude Code Analysis",
              vscode.ViewColumn.Two,
              {},
            );

            // Note: escape response.text before rendering in production;
            // injecting raw model output into a webview is an XSS risk.
            panel.webview.html = `
              <!DOCTYPE html>
              <html>
                <head>
                  <style>
                    body { font-family: -apple-system, BlinkMacSystemFont, sans-serif; padding: 20px; }
                    h2 { color: #333; }
                    p { line-height: 1.6; color: #666; }
                  </style>
                </head>
                <body>
                  <h2>Claude Code Analysis</h2>
                  <div>${response.text.replace(/\n/g, "<br/>")}</div>
                </body>
              </html>
            `;

            progress.report({ increment: 100 });
          } finally {
            await session.close();
          }
        },
      );
    },
  );

  context.subscriptions.push(analyzeCommand);
}
```

Now developers can analyze code without leaving VS Code. They get Claude's insights inline, integrated seamlessly into their development workflow.
Real-World Integration Patterns
Before diving into pitfalls, let's ground this in how teams actually use the Agent SDK. Understanding these patterns helps you design your own integration correctly.
Pattern 1: The Code Review Bot
Many teams deploy Claude Code as a GitHub bot that automatically reviews every PR. Here's how the pattern works in practice:
When a PR is created, a GitHub webhook triggers a Lambda function. That function:
- Checks out the PR branch
- Creates a Claude Code session with that branch as the working directory
- Sends a comprehensive code review request
- Posts the analysis back to the PR as a comment
- Updates a check status (passing with warnings, or blocking if critical issues found)
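The flow above can be sketched as glue code with injected dependencies, which also makes it testable. Every dependency name here is hypothetical; the real implementations (git checkout, the SDK session, the GitHub API) are assumptions:

```typescript
// Injected dependencies so the orchestration logic stays testable.
interface ReviewDeps {
  checkout: (prId: string) => Promise<string>; // returns the working directory
  review: (dir: string) => Promise<string>; // runs the Claude Code session
  postComment: (prId: string, body: string) => Promise<void>;
  setStatus: (prId: string, state: "success" | "failure") => Promise<void>;
}

async function handlePullRequest(prId: string, deps: ReviewDeps): Promise<void> {
  const dir = await deps.checkout(prId);
  const analysis = await deps.review(dir);
  await deps.postComment(prId, analysis);
  // Naive severity check for illustration: block only on "CRITICAL" markers
  // emitted by the review prompt.
  const blocking = analysis.includes("CRITICAL");
  await deps.setStatus(prId, blocking ? "failure" : "success");
}
```

In a real deployment the webhook handler (Lambda or otherwise) would build the `deps` object from real implementations; in tests you pass fakes.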
The beauty of this pattern is that it provides instant feedback to developers. Instead of waiting for a human reviewer, developers get Claude's analysis while the code is fresh in their mind. The analysis includes specific file paths, line numbers, and actionable suggestions. Over time, developers internalize these patterns and write better code upfront.
One team using this pattern reported a 30% reduction in code review comments related to "style" and "common mistakes" within the first month. Reviewers could focus on architecture and design instead of mechanical issues.
The cost? Running Claude on every PR (let's say 50-100 PRs per week with a team of 20) costs roughly $2-5 per PR with efficient configuration. That's $100-500/week, or $5,000-25,000 per year. Compare that to the opportunity cost of a human spending 30 minutes per PR: that's 25-50 hours per week. Even one senior engineer spending a quarter of their time on code review justifies the automation cost.
Pattern 2: The Compliance Checker
Regulated industries (healthcare, finance, legal) need to ensure code meets specific compliance standards. A team in healthcare built a Claude Code agent that runs on every PR and checks for:
- Patient data exposure in logs
- Hardcoded credentials (PII, tokens)
- Weak cryptography
- Missing audit logging
- Insufficient input validation for PHI (Protected Health Information)
The agent has access to a custom system prompt that includes their industry compliance requirements. Each PR generates a compliance report that gets attached to the GitHub PR as well as pushed to a compliance dashboard. If critical violations are found, the PR fails its compliance check and can't be merged.
This team moved from annual compliance audits (expensive, find issues months after they're written) to continuous compliance checking (cheap, catch issues immediately). The compliance overhead for developers is near-zero—the system runs in the background and reports violations when found.
Pattern 3: The AI-Powered Refactoring Pipeline
Another common pattern: teams use Claude Code to refactor codebases at scale. For example, migrating from one framework to another.
When a team decided to migrate from Redux to Redux Toolkit, they:
- Created a branch with the full main codebase
- Spawned a Claude Code session
- Sent a message: "Automatically refactor all Redux patterns to Redux Toolkit. Generate a PR with all changes."
- Claude generated 200+ modified files with systematic refactoring
- Team reviewed the generated changes, made adjustments, and merged
This would have taken a contractor weeks. Claude Code did the bulk work in hours. Humans validated and adjusted. The pattern shifts from "write code manually" to "generate code, then validate."
Pattern 4: The Documentation Generator
Some teams use Claude Code to generate API documentation, changelog entries, or architecture diagrams from code. The process:
- Create a session pointed at a codebase version
- Request: "Generate comprehensive API documentation for all public methods"
- Claude reads the code, generates markdown documentation
- Team commits documentation alongside code
- Next release cycle, run again to keep docs fresh
This addresses the age-old problem: code gets updated, documentation becomes stale. With Claude generating docs from code, you have documentation that's automatically in sync with the actual code.
Key Pitfalls to Avoid
As you integrate the Agent SDK, watch out for these common gotchas:
Not setting a working directory boundary: If you don't specify workingDirectory, Claude operates on your entire filesystem. Always sandbox to a specific path. This is both a security issue and a practical one—unbounded file operations are slow and dangerous. Imagine Claude searching for .js files and recursively traversing your entire home directory. It's wasted API calls, wasted time, and a potential security vulnerability. Scope everything aggressively.
Forgetting to close sessions: Each session consumes resources. Always call await session.close() in a finally block or use resource management patterns. Leaking sessions will drain your API quota. In production, a single forgotten session can cost hundreds of dollars per day if it's repeatedly spawned. Use try/finally or async resource managers to guarantee cleanup.
Enabling all tools by default: Just because Claude can execute bash commands doesn't mean it should in your use case. Explicitly enable only the tools you need. This is a safety boundary and a clarity signal about what Claude is supposed to do. If you're building a code analyzer, you don't need file_write or browser. If you're running security checks, you don't need file_write. Less is more.
Not handling tool invocation errors: When Claude requests a tool, your code might fail to execute it (permission denied, file not found, etc.). Always return an error result so Claude knows what happened and can adapt. If a file doesn't exist, tell Claude that instead of silently returning null. If a command fails, include the error message. Claude learns from these signals and can take corrective action.
Timeout configuration too loose: If you don't set a timeout, a runaway session could consume resources indefinitely. Set reasonable timeouts (30 seconds for interactive tasks, maybe 5 minutes for deep analysis) and respect them. A single runaway session could consume your entire API monthly quota in hours. Timeouts are not overhead—they're essential protection.
Mixing models without considering trade-offs: Sonnet is faster and cheaper. Opus is more capable but slower and more expensive. Haiku is ultralight for simple tasks but less capable. Don't just pick one and never revisit. Different tasks have different needs. A simple "find TODO comments" task? Use Haiku. A complex security review? Use Opus. A routine code quality check? Sonnet is your sweet spot.
Assuming tool output is always correct: Claude is powerful, but it's not infallible. If you're using Claude to generate SQL or shell commands, validate them before executing. Log all invocations for audit trails. A single misgenerated rm -rf command could be catastrophic. Always review, validate, or sandbox destructive operations. This is non-negotiable.
Not logging tool invocations: For compliance, debugging, and incident response, you need a record of everything Claude did. Every file Claude read, every bash command it requested, every API call it made. Store these logs with timestamps, session IDs, and outcomes. When something goes wrong, logs are your lifeline.
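The command-validation advice above can be sketched as a small denylist check. The patterns and function name here are hypothetical and deliberately not exhaustive; a production setup should prefer an allowlist of known-safe commands over a denylist:

```typescript
// Hypothetical denylist check for bash commands Claude asks to run.
// Illustrative only: a real deployment should allowlist, not denylist.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s+-[a-z]*r[a-z]*f/i,   // rm -rf and flag variants
  /\bdd\s+if=/i,               // raw disk reads/writes
  /\bmkfs\b/i,                 // filesystem formatting
  /\bgit\s+push\s+.*--force/i, // force pushes
];

function isDestructiveCommand(command: string): boolean {
  return DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(command));
}
```

A tool-invocation handler would call this before executing any `bash` request and return an error result to Claude when it matches, rather than silently dropping the command.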
Cost Optimization and Scaling Considerations
The Agent SDK's flexibility comes with responsibility for cost management. A naïve implementation could cost thousands per month. A thoughtful implementation costs hundreds. Understanding the cost levers helps you design for scale.
The Cost Structure
Each Claude Code session has several cost components:
- API calls: Every message you send to Claude costs tokens. Input tokens are cheap; output tokens cost more.
- Tool execution overhead: When Claude requests tools (file reads, bash execution), there's a latency cost. Slow tool execution means longer session durations, more API quota consumed, and higher costs.
- Session overhead: Maintaining session state costs memory and compute. High concurrency (100+ sessions) costs more than low concurrency.
- Model selection: Haiku costs ~$0.80 per 1M input tokens. Sonnet costs ~$3 per 1M. Opus costs ~$15 per 1M. Using the right model for the right task is essential.
A typical code review session might:
- Send 1,500 input tokens (your code + prompt)
- Claude responds with 2,500 output tokens (analysis)
- Make 20 tool calls (file reads, grep searches)
- Complete in 30 seconds total
At Sonnet prices:
- Input: 1,500 * $3 / 1,000,000 = $0.0045
- Output: 2,500 * $15 / 1,000,000 = $0.0375 (output tokens are billed at a higher rate than input)
- Total: ~$0.042 per PR review
With 100 PRs per week: $4.20/week, about $218/year. That's noise.
But if you're inefficient:
- Sending the entire codebase as context each time (10,000 input tokens)
- Making 200 tool calls instead of 20 (slow, redundant searches)
- Running Opus instead of Sonnet (5x cost)
- Session timeouts causing retries (API quota wasted)
Suddenly you're looking at $2-5 per PR. Same task, roughly two orders of magnitude higher cost.
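The arithmetic above generalizes into a small estimator. This is a sketch; the rates are passed in explicitly because per-token pricing changes over time (the example uses $3/MTok input and $15/MTok output, Sonnet's published rates at the time of writing):

```typescript
// Back-of-envelope session cost estimator.
// Rates are dollars per million tokens, supplied by the caller.
function estimateSessionCost(
  inputTokens: number,
  outputTokens: number,
  inputRatePerMTok: number,
  outputRatePerMTok: number,
): number {
  const inputCost = (inputTokens * inputRatePerMTok) / 1_000_000;
  const outputCost = (outputTokens * outputRatePerMTok) / 1_000_000;
  return inputCost + outputCost;
}

// The PR-review example: 1,500 in / 2,500 out at Sonnet rates
const perReview = estimateSessionCost(1_500, 2_500, 3, 15);
const perYear = perReview * 100 * 52; // 100 PRs per week
```

Running the same numbers through the estimator before and after an optimization (smaller context, fewer tool calls, cheaper model) makes the cost delta concrete instead of anecdotal.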
Cost Optimization Strategies
Strategy 1: Model Selection by Task
Different tasks benefit from different models:
- Haiku ($0.80/1M tokens): Simple pattern matching, script generation, data transformation, log parsing. Speed matters more than comprehension.
- Sonnet ($3/1M tokens): Code review, analysis, refactoring, medium-complexity reasoning. The sweet spot for most tasks.
- Opus ($15/1M tokens): Complex security analysis, architectural decisions, novel problems, high-stakes reviews.
A real example: One team had been running all code reviews on Opus. Switching to Sonnet for standard reviews and Opus for security-critical reviews reduced costs by 70% with better results (Opus was over-thinking routine issues).
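A routing function makes the tiering explicit. This is a sketch; the model IDs are illustrative examples and should be checked against the current model list before use:

```typescript
// Illustrative task-to-model router mirroring the tiers above.
// Model IDs are examples; verify against the current model list.
type TaskTier = "simple" | "standard" | "critical";

function pickModel(tier: TaskTier): string {
  switch (tier) {
    case "simple":
      return "claude-3-5-haiku-20241022"; // pattern matching, log parsing
    case "standard":
      return "claude-3-5-sonnet-20241022"; // code review, refactoring
    case "critical":
      return "claude-3-opus-20240229"; // security, architecture
  }
}
```

Centralizing the choice in one function means a pricing or capability change is a one-line edit rather than a hunt through every session constructor.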
Strategy 2: Caching and Reuse
If you're analyzing the same codebase repeatedly (weekly reviews, continuous scanning), cache the codebase analysis:
// First run: full analysis
const session = new ClaudeCodeSession({
workingDirectory: "/repo",
});
const firstAnalysis = await session.message(
"Analyze this codebase for code quality issues",
);
// Save the analysis to a database
// Second run: differential analysis
const changes = await getChangedFiles(); // Only changed since last scan
const secondAnalysis = await session.message(
`Last analysis: [cached analysis from database]
Changes since then: [list of modified files]
Analyze only the changed files for regressions or new issues`,
);This approach reuses the cached analysis and only analyzes what's new. Claude understands the previous state and can make incremental observations.
Strategy 3: Batch Processing
If you have many PRs to review, batch them efficiently:
// INEFFICIENT: Create new session for each PR
for (const pr of prs) {
const session = new ClaudeCodeSession({ ... });
const review = await session.message(`Review ${pr}`);
await session.close();
}
// EFFICIENT: Single session analyzing multiple PRs
const session = new ClaudeCodeSession({ ... });
for (const pr of prs) {
// Reuse same session, avoid overhead of creating/closing
const review = await session.message(
`Now analyze: ${pr.name}
Path: ${pr.path}`
);
}
await session.close();
Session creation and teardown have overhead. When analyzing multiple things, reuse sessions when possible.
Strategy 4: Limiting Tool Calls
The most expensive part of a session is often tool execution, not model time. If Claude makes 100 file read requests, that's 100 file I/O operations. Optimize by:
// Set limits
const session = new ClaudeCodeSession({
maxToolCalls: 50, // Fail if exceeding 50 tool invocations
timeout: 60000, // Fail if taking >60 seconds
});
// Give Claude smart starting points
const commonPatterns = await findCodePatterns();
const relevantFiles = await identifyRelevantFiles();
const response = await session.message(
`You have access to these files for context:
${commonPatterns}
Focus your analysis on these files which are most likely to have issues:
${relevantFiles}
Find security vulnerabilities in the codebase.`,
);
By giving Claude good starting points, you reduce exploratory tool calls and focus the session on actual analysis.
Scaling to High Concurrency
When you move beyond dozens of sessions to hundreds, new considerations emerge.
Concurrent Session Limits: The Anthropic API has rate limits. High concurrency increases the chance of hitting them. Monitor your usage:
interface UsageMetrics {
sessionsActive: number;
successRate: number;
avgDurationMs: number;
apiErrorRate: number;
}
async function getUsageMetrics(): Promise<UsageMetrics> {
// Track these in your monitoring system
return {
sessionsActive: sessions.length,
successRate: successCount / totalCount,
avgDurationMs: totalDuration / successCount,
apiErrorRate: apiErrors / totalCount,
};
}
// Adjust concurrency dynamically
const metrics = await getUsageMetrics();
if (metrics.apiErrorRate > 0.05) {
// 5% error rate: reduce concurrency
maxConcurrency = Math.max(1, maxConcurrency - 2);
} else if (metrics.apiErrorRate === 0 && metrics.avgDurationMs < 30000) {
// Low errors and fast: can handle more
maxConcurrency = Math.min(50, maxConcurrency + 1);
}
This adaptive approach helps you find the right concurrency level for your usage pattern.
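The adaptive loop above adjusts maxConcurrency, but you also need a mechanism that enforces the cap. A minimal counting semaphore, independent of the SDK, is enough for that; this is a sketch:

```typescript
// Minimal counting semaphore to enforce a session concurrency cap.
// acquire() resolves immediately while permits remain; otherwise the
// caller is queued until release() hands a permit over.
class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to a waiter
    else this.permits++;
  }
}

// Usage sketch:
// const sem = new Semaphore(maxConcurrency);
// await sem.acquire();
// try { /* run one session */ } finally { sem.release(); }
```

Wrapping every session spawn in acquire/release guarantees the cap holds even when callers arrive in bursts; the adaptive metrics loop then only has to adjust the permit count.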
Connection Pooling: If you're spawning sessions continuously, implement connection pooling to reuse sessions:
import { EventEmitter } from "node:events";
// Extends EventEmitter so acquire() can wait on the 'available' event
class SessionPool extends EventEmitter {
private available: ClaudeCodeSession[] = [];
private inUse = new Set<ClaudeCodeSession>();
private maxSize = 10;
async acquire(): Promise<ClaudeCodeSession> {
if (this.available.length > 0) {
const session = this.available.pop()!;
this.inUse.add(session);
return session;
}
if (this.inUse.size < this.maxSize) {
const session = new ClaudeCodeSession({ ... });
this.inUse.add(session);
return session;
}
// Wait for session to be released
await new Promise(resolve => this.once('available', resolve));
return this.acquire();
}
async release(session: ClaudeCodeSession): Promise<void> {
this.inUse.delete(session);
this.available.push(session);
this.emit('available');
}
}
This pool reuses sessions across requests, reducing creation/teardown overhead and API quota waste.
Error Handling and Monitoring
In production, things will go wrong. Networks fail. Models hallucinate. Tool execution times out. Your integration needs to handle failure gracefully.
Here's a robust error handling pattern:
import { ClaudeCodeSession, SessionError } from "@anthropic-ai/claude-code";
import pino from "pino"; // Use your preferred logging library
const log = pino();
async function robustAnalysis(repoPath: string, analysisType: string) {
const session = new ClaudeCodeSession({
apiKey: process.env.ANTHROPIC_API_KEY,
workingDirectory: repoPath,
model: "claude-3-5-sonnet-20241022",
timeout: 120000,
});
const sessionId = Math.random().toString(36).slice(2, 11);
try {
log.info({ sessionId, analysisType, repoPath }, "Starting analysis");
session.on("toolUse", async (event) => {
log.debug(
{ sessionId, tool: event.toolName, input: event.input },
"Tool requested",
);
try {
// Execute with safety checks
if (event.toolName === "bash") {
// Never allow destructive operations in production
if (
event.input.command?.includes("rm -rf") ||
event.input.command?.includes("dd if=")
) {
log.warn(
{ sessionId, command: event.input.command },
"Blocking destructive command",
);
session.submitToolResult(event.toolId, {
success: false,
error: "Destructive commands are not allowed.",
});
return;
}
}
// Execute the tool
const result = await executeToolWithTimeout(
event.toolName,
event.input,
30000,
);
log.debug({ sessionId, tool: event.toolName }, "Tool succeeded");
session.submitToolResult(event.toolId, result);
} catch (error) {
log.error(
{ sessionId, tool: event.toolName, error: (error as Error).message },
"Tool execution failed",
);
session.submitToolResult(event.toolId, {
success: false,
error: `Tool execution failed: ${(error as Error).message}`,
});
}
});
const response = await session.message(
`Analyze this repository for ${analysisType} issues.
Provide a structured report.`,
);
log.info({ sessionId }, "Analysis complete");
return { success: true, result: response.text };
} catch (error) {
if (error instanceof SessionError) {
log.error(
{
sessionId,
code: error.code,
message: error.message,
},
"Session error",
);
if (error.code === "TIMEOUT") {
return {
success: false,
error: "Analysis timed out. The repository may be too large.",
};
}
if (error.code === "RATE_LIMIT") {
return {
success: false,
error: "API rate limit exceeded. Try again later.",
};
}
}
log.error(
{ sessionId, error: (error as Error).message },
"Unexpected error",
);
return {
success: false,
error: "An unexpected error occurred during analysis.",
};
} finally {
try {
await session.close();
log.info({ sessionId }, "Session closed");
} catch (cleanupError) {
log.warn(
{ sessionId, error: (cleanupError as Error).message },
"Error closing session",
);
}
}
}
async function executeToolWithTimeout(
toolName: string,
input: Record<string, unknown>,
timeout: number,
): Promise<Record<string, unknown>> {
return Promise.race([
executeToolActually(toolName, input),
new Promise<Record<string, unknown>>((_, reject) =>
setTimeout(
() => reject(new Error(`Tool execution timeout after ${timeout}ms`)),
timeout,
),
),
]);
}
async function executeToolActually(
toolName: string,
input: Record<string, unknown>,
): Promise<Record<string, unknown>> {
// Your implementation here
// Log, validate, execute, return result
return { success: true };
}
This pattern provides:
- Structured logging with session IDs for request tracing
- Command validation to prevent dangerous operations
- Timeout protection at both the session and tool levels
- Specific error handling for different failure modes
- Guaranteed cleanup even when things go wrong
- Audit trail of every tool invocation
Monitoring should also track:
- Session success/failure rates
- Average session duration
- Tool execution success rates
- API quota consumption
- Error categories and frequencies
- Model response times by task type
These metrics let you detect problems early and optimize your usage patterns.
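The RATE_LIMIT branch above gives up immediately; a common complement is to retry with exponential backoff before surfacing the error to callers. Here is a sketch, with `isRetryable` as an assumed predicate supplied by the caller (e.g. checking for the RATE_LIMIT code):

```typescript
// Exponential backoff schedule: doubles per attempt, capped at maxDelayMs.
// Production code usually adds random jitter to avoid synchronized retries.
function backoffDelayMs(
  attempt: number, // 0-based retry attempt
  baseDelayMs = 1_000,
  maxDelayMs = 30_000,
): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Retry wrapper around any session-spawning function.
async function withRetries<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts - 1 || !isRetryable(err)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, baseDelayMs)));
    }
  }
}
```

Only retry errors that are actually transient: retrying a TIMEOUT caused by an oversized repository just burns quota, while retrying a RATE_LIMIT after backing off usually succeeds.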
Expected Output and Configuration Patterns
Here's a production-ready session configuration that balances power, safety, and cost:
import { ClaudeCodeSession, SessionOptions } from "@anthropic-ai/claude-code";
interface ReviewConfig {
repoPath: string;
reviewType: "pr" | "release" | "security";
customPrompt?: string;
}
async function createConfiguredSession(config: ReviewConfig) {
const basePrompt = `You are a code reviewer specialized in ${config.reviewType} reviews.
Be thorough but concise. Identify concrete issues with evidence.`;
const options: SessionOptions = {
apiKey: process.env.ANTHROPIC_API_KEY,
workingDirectory: config.repoPath,
model:
config.reviewType === "security"
? "claude-3-opus-20240229" // Most capable for security
: "claude-3-5-sonnet-20241022", // Fast & cost-effective for standard reviews
tools: {
file_read: true,
file_search: true,
bash: config.reviewType === "release", // Only allow bash for release checks
file_write: false, // Never allow modifications
browser: false, // Not needed for code review
},
systemPrompt: config.customPrompt || basePrompt,
timeout: config.reviewType === "security" ? 300000 : 120000,
maxToolCalls: 100,
};
return new ClaudeCodeSession(options);
}
// Usage
const session = await createConfiguredSession({
repoPath: "/tmp/pr-feature",
reviewType: "security",
});
const analysis = await session.message(
"Check for XSS vulnerabilities, SQL injection risks, and authentication issues.",
);
console.log(analysis.text);
await session.close();
This pattern gives you:
- Model selection based on task type (security → Opus, standard → Sonnet)
- Tool restrictions appropriate to the use case
- Configurable timeouts scaled to task complexity
- Consistent system prompts that guide behavior
- Type safety through TypeScript
- Reusability through a configurable factory
Summary
The Claude Code Agent SDK transforms Claude from a web-based tool into a programmable capability you can embed in your own applications. You control the security boundaries, the tools available, the system prompt, and the model. You manage sessions programmatically, handle tool invocations, stream results, and spawn parallel workers.
Whether you're building an internal code review portal, adding quality gates to your CI/CD pipeline, or integrating Claude Code directly into your IDE, the SDK gives you the leverage to embed AI-powered code assistance wherever your developers work.
The key is thoughtful configuration: always sandbox your working directory, enable only necessary tools, set reasonable timeouts, and handle tool invocation events with care. Done right, you're not just using Claude Code—you're embedding it as a first-class capability in your developer workflow.
Start small: create a session, send a message, handle the response. Then expand from there. Your developers (and your code quality) will thank you.