
You've probably felt the friction: your team needs Claude Code's power, but an interactive terminal session isn't enough. Maybe you're building an internal developer tool and want to offer Claude Code's capabilities inside your app. Or you're running CI/CD pipelines and need to spawn automated code review sessions programmatically. That's where the Claude Code Agent SDK comes in.
The Agent SDK (@anthropic-ai/claude-code) lets you embed Claude Code directly into Node.js and TypeScript applications. Instead of your developers jumping between tabs and UIs, they work within your custom tools. Instead of manually triggering code reviews, your pipeline automates them. This isn't just convenience—it's a fundamental shift in how you can architect developer tooling.
In this article, we'll explore what the Agent SDK is, why you'd want to use it, how to install and initialize it, how to spawn and manage sessions, real-world use cases that show its power, and production patterns for error handling and monitoring. By the end, you'll understand how to embed Claude Code into your own applications and unlock capabilities your team didn't know were possible.
Table of Contents
- What Is the Agent SDK, and Why Does It Exist?
- Installing and Initializing the Agent SDK
- Configuring Sessions: Permissions, Tools, and Context
  - Model Selection
  - Working Directory and Sandbox Boundaries
  - Enabling and Disabling Tools
  - System Prompt Customization
  - Timeouts and Rate Limiting
- Spawning and Managing Sessions Programmatically
  - Basic Message Flow
  - Handling Tool Invocations
  - Streaming Responses
- Concurrency Patterns and Resource Management
  - Managing Multiple Sessions
- Real-World Use Cases
  - Use Case 1: Internal Code Review Portal
  - Use Case 2: Automated Testing and Quality Gates
  - Use Case 3: Developer IDE Integration
- Real-World Integration Patterns
  - Pattern 1: The Code Review Bot
  - Pattern 2: The Compliance Checker
  - Pattern 3: The AI-Powered Refactoring Pipeline
  - Pattern 4: The Documentation Generator
- Key Pitfalls to Avoid
- Cost Optimization and Scaling Considerations
  - The Cost Structure
  - Cost Optimization Strategies
  - Scaling to High Concurrency
- Error Handling and Monitoring
- Expected Output and Configuration Patterns
- Summary
What Is the Agent SDK, and Why Does It Exist?
The Claude Code Agent SDK is a programmatic interface to Claude Code. Instead of driving it interactively, you call it from your code. You create a session, send messages, handle tool invocations (like file reads, bash commands, and browser automation), and stream responses back to your application logic.
Think of it this way: Claude Code's interactive CLI is a complete, batteries-included experience. The Agent SDK strips that down to the essentials and gives you the building blocks. You decide what tools to expose, what system prompt to use, what working directory to operate in, and how to integrate the results into your own workflow.
Why would you want this?
Better developer ergonomics: Your team doesn't need to context-switch into a separate Claude Code session. They invoke it from within their IDE, deployment dashboard, or internal portal. Imagine a developer in VS Code right-clicking a file and selecting "Analyze with Claude Code"—no tab-switching, no copy-pasting, seamless integration.
Automation at scale: You can spawn dozens of Claude Code sessions programmatically, letting it handle code review, refactoring, or testing tasks without human interaction. Picture your CI/CD pipeline automatically analyzing every pull request with Claude Code, generating detailed reviews while humans sleep. That's the scale we're talking about.
Custom integration: You control which tools Claude has access to. Maybe you want to sandbox file operations to a specific directory. Maybe you want to log all executed commands to a central audit system. Maybe you want to pipe results into your custom dashboard or Slack channel. You build it.
Compliance and security: You own the session—you control the system prompt, the working directory, the model version, and the API key. Perfect for enterprises that need audit trails, fine-grained permissions, and the ability to verify every action Claude takes. You're not delegating security to a third party; you're implementing it yourself.
Cost optimization: Run multiple sessions in parallel for cheaper models (Haiku, Sonnet), use expensive models (Opus) only when needed, and optimize based on task complexity. You have fine-grained control over cost allocation per use case.
In short: the Agent SDK is Claude Code as a library, not a service. You're no longer consuming it through its own interface; you're embedding it as a capability within your own tools.
Installing and Initializing the Agent SDK
Getting started is refreshingly simple. You need Node.js 18+ and an Anthropic API key.
Installation is a single npm command:
```bash
npm install @anthropic-ai/claude-code
```

Or if you're using yarn:

```bash
yarn add @anthropic-ai/claude-code
```

Or pnpm:

```bash
pnpm add @anthropic-ai/claude-code
```

Once installed, you can import the SDK and create your first session. Here's the minimal setup:
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/tmp/my-project",
  model: "claude-3-5-sonnet-20241022",
});

// Send a message
const response = await session.message("What files are in this directory?");
console.log(response.text);
```

That's it. You've created a Claude Code session and asked it a question. The SDK handles authentication, model selection, and communication with Anthropic's API. But the real power emerges when you start configuring the session to your needs.
Configuring Sessions: Permissions, Tools, and Context
The SDK gives you fine-grained control over what Claude Code can do. When you create a session, you pass an options object that shapes its behavior.
Here's the full landscape of configuration:
Model Selection
You specify which Claude model powers the session:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  model: "claude-3-5-sonnet-20241022", // Latest fast model
});
```

Why this matters: Different models have different capabilities and costs. Sonnet is fast and cost-effective for routine analysis. Opus is most powerful but slower and more expensive—use it for security reviews or complex refactoring. Haiku is ultralight for simple tasks like counting lines or finding patterns. Choose based on your use case and budget.
Real-world consideration: A code quality gate running on every PR? Use Sonnet. An annual security audit? Opus. Searching for deprecated APIs across your codebase? Haiku. Don't use one model for everything.
Working Directory and Sandbox Boundaries
This is crucial. You specify where Claude Code operates:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/home/user/projects/my-app",
});
```

Claude will treat this directory as its "current working directory." File operations are scoped to this location and its subdirectories. This is your security boundary. If you don't want Claude modifying system files, don't give it write access outside your sandbox.
Real example: You're running an automated code review. You set workingDirectory to the PR branch checked out to a temporary location like /tmp/pr-review-12345. Claude can read and analyze code, but it can't touch production systems, other developers' branches, or anything else outside that directory. It's a hard wall.
Pro tip: Always use absolute paths, never relative paths. ./my-app changes meaning depending on where your process runs. /home/user/projects/my-app is unambiguous.
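To make that boundary concrete, here's a small helper for your own tool-handling layer (a sketch, not part of the SDK) that checks whether a requested path resolves to somewhere inside the sandbox:

```typescript
import * as path from "node:path";

// Returns true only if `candidate` resolves to a location inside `sandboxRoot`.
// Absolute paths and "../" escapes both resolve outside the root and are rejected.
function isInsideSandbox(sandboxRoot: string, candidate: string): boolean {
  const root = path.resolve(sandboxRoot);
  const target = path.resolve(root, candidate);
  const rel = path.relative(root, target);
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}

console.log(isInsideSandbox("/tmp/pr-review-12345", "src/index.ts")); // true
console.log(isInsideSandbox("/tmp/pr-review-12345", "../other/file.ts")); // false
```

Run a check like this before executing any file tool request, and deny anything that falls outside the root.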
Enabling and Disabling Tools
You control which tools Claude has access to:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/tmp/sandbox",
  tools: {
    bash: true, // Can execute shell commands
    file_read: true, // Can read files
    file_write: true, // Can write files
    file_search: true, // Can search directory trees
    browser: false, // Cannot automate browsers
  },
});
```

This is powerful for compliance. Running an automated test suite? Disable file_write and browser so Claude can't accidentally modify test configuration or launch unintended browser sessions. Building a code analyzer? Enable file_read and bash, but disable file_write to prevent modifications.
Tool availability matrix:
| Tool | Use Case | Example | Risk |
|---|---|---|---|
| file_read | Analyzing code | Code review | Low—read-only |
| file_write | Generating fixes | Refactoring | High—modifications |
| bash | Running tests | Test analysis | Very high—arbitrary execution |
| file_search | Finding patterns | Migration analysis | Low—read-only |
| browser | Testing UIs | E2E test automation | High—external side effects |
Real consideration: A single runaway bash command could delete your entire production database. Always ask: "Does Claude really need this tool for this task?" Fewer tools = smaller attack surface.
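One way to keep that discipline is to encode least-privilege presets per task type. This is an illustrative sketch of your own configuration layer, not an SDK feature; the preset names and the refactorer's tool mix are assumptions:

```typescript
// Tool flags mirror the options object shown above.
type Tools = {
  bash: boolean;
  file_read: boolean;
  file_write: boolean;
  file_search: boolean;
  browser: boolean;
};

// Least-privilege presets: each task type gets only what it needs.
const TOOL_PRESETS: Record<"code_analyzer" | "test_runner" | "refactorer", Tools> = {
  code_analyzer: { file_read: true, file_search: true, bash: true,  file_write: false, browser: false },
  test_runner:   { file_read: true, file_search: true, bash: true,  file_write: false, browser: false },
  refactorer:    { file_read: true, file_search: true, bash: false, file_write: true,  browser: false },
};
```

Passing `tools: TOOL_PRESETS.code_analyzer` at session creation keeps the decision auditable in one place instead of scattered across call sites.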
System Prompt Customization
You can inject a custom system prompt to shape Claude's behavior:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/project",
  systemPrompt: `You are a code reviewer for our TypeScript monorepo.
Your job is to:
1. Check for type safety issues
2. Verify error handling
3. Ensure consistent style with our linter rules
4. Flag performance concerns
5. Identify security vulnerabilities
Do NOT modify code without explicit approval.
Do NOT run tests automatically.
Do NOT suggest frameworks we don't use.
Our tech stack: TypeScript, React, Express, PostgreSQL.`,
});
```

Now every message Claude processes includes this context. It shapes its responses and behavior toward your specific need. This is how you convert a general-purpose AI into a specialized agent for your workflow.
Pro strategy: Make your system prompt specific enough to guide behavior, but general enough to adapt. Too specific and Claude gets stuck; too general and it loses focus.
Timeouts and Rate Limiting
For production systems, you'll want to control resource consumption:
```typescript
const session = new ClaudeCodeSession({
  apiKey: process.env.ANTHROPIC_API_KEY,
  workingDirectory: "/tmp/analysis",
  timeout: 30000, // 30-second timeout per message
  maxToolCalls: 50, // Limit tool invocations
  retryPolicy: {
    maxRetries: 2,
    backoffMs: 1000,
  },
});
```

Why this matters: A runaway session could consume API quota or hang indefinitely. Timeouts and rate limits keep your costs predictable and your system responsive. A single forgotten timeout in production could cost you thousands.
Real numbers: A message that takes 3 minutes to complete across a 100-PR batch = 5 hours of compute. A message that takes 30 seconds = 50 minutes. The difference is configuration.
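If you're orchestrating sessions from your own code, you may also want an application-level guard that's independent of the SDK's `timeout` option. A minimal sketch using `Promise.race`:

```typescript
// Rejects if the wrapped promise doesn't settle within `ms` milliseconds.
// A sketch for wrapping any SDK call; prefer the SDK's own timeout option
// where it applies.
function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = "operation",
): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Clear the timer either way so the process can exit cleanly.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer!));
}
```

Usage would look like `await withTimeout(session.message("…"), 30_000, "review")`, turning a hung call into a catchable error instead of an indefinite wait.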
Spawning and Managing Sessions Programmatically
Now we get to the fun part: actually using the SDK in your application.
A typical workflow looks like: create a session, send messages, handle tool invocations, and stream results back to your app.
Basic Message Flow
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

async function analyzeCode(codeDirectory: string) {
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: codeDirectory,
    model: "claude-3-5-sonnet-20241022",
    tools: {
      file_read: true,
      file_search: true,
      bash: false, // Don't execute anything
    },
  });

  // Send a request
  const response = await session.message(
    "Find all TODO comments in this codebase and summarize them by file.",
  );
  console.log(response.text);

  // Clean up
  await session.close();
}

// Usage
analyzeCode("/home/user/my-repo");
```

This is straightforward: you create a session, send a message, get back a response, and close the session. But Claude Code is event-driven. When it needs to execute a tool (like searching for files), it emits a tool-use event. Your app needs to handle that.
Handling Tool Invocations
Claude doesn't directly execute tools—it requests them. Your application decides whether to grant each request:
```typescript
import { ClaudeCodeSession, ToolUseEvent } from "@anthropic-ai/claude-code";

async function codeReview(prDirectory: string) {
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: prDirectory,
  });

  // Listen for tool invocation requests
  session.on("toolUse", async (event: ToolUseEvent) => {
    console.log(`Claude requested: ${event.toolName}`);
    console.log(`Input: ${JSON.stringify(event.input, null, 2)}`);

    // You decide whether to allow it (isNotAllowed is your own policy check)
    if (event.toolName === "file_write" && isNotAllowed(event.input.path)) {
      session.submitToolResult(event.toolId, {
        success: false,
        error: "Cannot write to that directory.",
      });
      return;
    }

    // If allowed, execute and return the result
    const result = await executeToolLocally(event.toolName, event.input);
    session.submitToolResult(event.toolId, result);
  });

  // Now ask Claude to do work
  const response = await session.message(
    "Review the code changes in this PR for security issues.",
  );
  console.log(response.text);

  await session.close();
}

async function executeToolLocally(
  toolName: string,
  input: Record<string, unknown>,
) {
  // Your custom tool execution logic
  // Could shell out to actual commands, call your own APIs, etc.
  if (toolName === "bash") {
    // Execute with a controlled environment
    // Log the command, verify it doesn't touch sensitive paths
    // Return the result
  }
  // ... handle other tools
}
```

This pattern gives you complete control. You can:
- Log every tool invocation for audit trails and compliance
- Reject unsafe operations (e.g., don't let Claude delete production databases)
- Replace real tools with mocks (for testing or sandboxing)
- Inject custom tools that don't exist in Claude Code's standard set
- Rate-limit operations (fail after N file operations, etc.)
- Add telemetry to track what Claude is doing
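The rate-limiting idea above can be as simple as a per-session budget. This is a sketch layered on top of your own tool handler, not an SDK feature:

```typescript
// Per-session cap on tool invocations: deny everything past the limit.
class ToolBudget {
  private used = 0;

  constructor(private readonly max: number) {}

  // Returns true and spends one unit if budget remains; false once exhausted.
  tryConsume(): boolean {
    if (this.used >= this.max) return false;
    this.used += 1;
    return true;
  }

  get remaining(): number {
    return this.max - this.used;
  }
}
```

Call `tryConsume()` at the top of your toolUse handler and submit an error result once it returns false, so a looping session fails fast instead of burning quota.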
Streaming Responses
For long-running tasks, you want to stream results back to the user as they arrive, not wait for everything to finish:
```typescript
async function streamCodeAnalysis(repoPath: string) {
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: repoPath,
  });

  // Stream text as it arrives
  const stream = session.messageStream(
    "Analyze code complexity and suggest refactoring opportunities.",
  );

  for await (const chunk of stream) {
    if (chunk.type === "text") {
      process.stdout.write(chunk.text);
    } else if (chunk.type === "toolUse") {
      console.log(`\n[Tool: ${chunk.toolName}]`);
      // Handle tool use
    }
  }

  await session.close();
}
```

Streaming is essential for user experience. Nobody wants to wait 30 seconds for a response to appear all at once. With streaming, Claude's analysis starts flowing to the user immediately, creating a sense of progress.
Concurrency Patterns and Resource Management
Before we dive into managing multiple sessions, it's worth understanding the resource implications. Each session:
- Maintains a connection to Anthropic's API
- Holds state in memory (conversation history, file handles)
- May consume API quota
- Uses bandwidth and compute resources
Running 100 concurrent sessions is possible but expensive. Running 1,000 could exhaust your API quota in minutes. You need concurrency patterns that balance throughput with cost and reliability.
Sequential processing (one session at a time) is safe but slow. For a batch of 100 PRs, it could take hours.
Bounded concurrency (5-10 sessions in parallel) is the sweet spot. You get good throughput without overwhelming the API.
Unbounded concurrency (spawn as many as you want) will blow through your budget and hit API rate limits.
Here's a helper function that implements bounded concurrency:
```typescript
async function processBatch<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  concurrency: number = 5,
): Promise<(R | null)[]> {
  const results: (R | null)[] = [];
  const executing = new Set<Promise<void>>();

  for (const item of items) {
    const promise = processor(item)
      .then((result) => {
        results.push(result);
      })
      .catch((error) => {
        console.error(`Processing failed for item: ${error}`);
        results.push(null); // Or handle the error differently
      })
      .finally(() => {
        // Remove this promise once it settles, so the pool only
        // tracks in-flight work
        executing.delete(promise);
      });
    executing.add(promise);

    if (executing.size >= concurrency) {
      // Wait until at least one in-flight promise settles
      await Promise.race(executing);
    }
  }

  // Wait for the remaining promises
  await Promise.all(executing);
  return results;
}

// Usage: process 100 PRs with max 5 concurrent sessions
const results = await processBatch(
  prBranches,
  (branch) => reviewPullRequest(branch),
  5, // Concurrency limit
);
```

This pattern ensures you never exceed your concurrency limit while processing large batches efficiently.
Tuning guide:
- Start with 5 concurrent sessions
- Monitor API response times and error rates
- If response times stay under 2 seconds, increase to 7-8
- If you see rate limit errors, decrease back down
- Different tasks have different sweet spots
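Those heuristics can be sketched as a tiny tuning function. The thresholds here are the illustrative numbers from the list above, not recommendations from the SDK:

```typescript
// Adjust the concurrency limit between batches: grow while latency is
// healthy, back off sharply on rate-limit errors.
function nextConcurrency(
  current: number,
  avgLatencyMs: number,
  sawRateLimit: boolean,
): number {
  if (sawRateLimit) return Math.max(1, current - 2); // Back off, never below 1
  if (avgLatencyMs < 2000) return Math.min(10, current + 1); // Grow, capped at 10
  return current; // Hold steady otherwise
}
```

Feed it the measurements from your last batch and pass the result as the `concurrency` argument to the next `processBatch` call.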
Managing Multiple Sessions
Here's where the SDK truly shines: you can spawn many sessions in parallel, each handling a different task:
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

async function reviewPullRequests(prBranches: string[]) {
  // Create a session for each PR
  const sessionPromises = prBranches.map(async (branch) => {
    const session = new ClaudeCodeSession({
      apiKey: process.env.ANTHROPIC_API_KEY,
      workingDirectory: `/tmp/pr-${branch}`,
      model: "claude-3-5-sonnet-20241022",
      tools: {
        file_read: true,
        file_search: true,
        bash: true,
        file_write: false, // Read-only for safety
      },
      systemPrompt: `You are a code reviewer. Focus on:
- Type safety and correctness
- Performance implications
- Test coverage
- Documentation updates`,
    });

    try {
      const response = await session.message(
        `Review the code changes in branch ${branch}.
Identify:
1. Bugs or logical errors
2. Missing error handling
3. Incomplete tests
4. Documentation gaps`,
      );

      return {
        branch,
        review: response.text,
        status: "success",
      };
    } catch (error) {
      return {
        branch,
        error: (error as Error).message,
        status: "failed",
      };
    } finally {
      await session.close();
    }
  });

  // Wait for all reviews to complete
  const reviews = await Promise.all(sessionPromises);

  // Generate a report
  for (const review of reviews) {
    if (review.status === "success") {
      console.log(`\n=== ${review.branch} ===`);
      console.log(review.review);
    } else {
      console.error(`Review failed for ${review.branch}: ${review.error}`);
    }
  }
}

// Usage: review 10 PRs in parallel
reviewPullRequests([
  "feature/auth-v2",
  "bugfix/socket-leak",
  "chore/deps-update",
  // ... 7 more
]);
```

Now instead of a human spending hours reviewing code, Claude reviews multiple PRs in parallel. Each session is isolated, configured identically, and reports results back to your system. This is the power of programmatic spawning.
Real-World Use Cases
Let's ground this in concrete scenarios where the Agent SDK transforms how teams work.
Use Case 1: Internal Code Review Portal
Your company has a custom code review portal. Developers open PRs, and your portal displays diffs. You want to augment it with Claude Code's analysis.
```typescript
// In your API endpoint that handles PR review requests
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";
import express from "express";
import { execSync } from "child_process";

const app = express();

app.post("/api/pr/:prId/analyze", async (req, res) => {
  const { prId } = req.params;

  // Check out the PR branch
  const tempDir = `/tmp/pr-analysis-${prId}`;
  execSync(`git clone --branch pr-${prId} . ${tempDir}`);

  // Create a Claude Code session for this specific PR
  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: tempDir,
    model: "claude-3-5-sonnet-20241022",
    tools: {
      file_read: true,
      file_search: true,
      bash: true,
      file_write: false,
    },
    systemPrompt: `You are a code reviewer for our company.
Focus on security, performance, maintainability, and test coverage.
Be concise but thorough.`,
  });

  try {
    // Stream the analysis back to the client
    res.setHeader("Content-Type", "text/event-stream");
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("Connection", "keep-alive");

    const stream = session.messageStream(
      `Perform a thorough code review of the changes in this PR.
Identify issues in these areas:
1. Security vulnerabilities
2. Performance concerns
3. Code style violations
4. Missing error handling
5. Test coverage gaps
6. Documentation issues`,
    );

    for await (const chunk of stream) {
      if (chunk.type === "text") {
        res.write(`data: ${JSON.stringify({ text: chunk.text })}\n\n`);
      }
    }

    res.write("data: [DONE]\n\n");
    res.end();
  } catch (error) {
    // Headers may already be sent once streaming starts; guard accordingly
    if (!res.headersSent) {
      res.status(500).json({ error: (error as Error).message });
    } else {
      res.end();
    }
  } finally {
    await session.close();
    execSync(`rm -rf ${tempDir}`);
  }
});

app.listen(3000);
```

Now developers open a PR in your portal, click "Analyze with Claude Code," and get a detailed review streamed in real time. No context-switching, no manual copy-pasting. The review is integrated directly into your workflow.
Use Case 2: Automated Testing and Quality Gates
Your CI/CD pipeline runs tests, but you want to add static analysis and code quality checks powered by Claude Code.
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";

async function codeQualityGate(commitHash: string) {
  console.log(`Running code quality gate for ${commitHash}...`);

  const session = new ClaudeCodeSession({
    apiKey: process.env.ANTHROPIC_API_KEY,
    workingDirectory: process.cwd(),
    model: "claude-3-5-sonnet-20241022",
    tools: {
      file_read: true,
      file_search: true,
      bash: true,
      file_write: false,
    },
    timeout: 120000, // 2 minutes max
  });

  try {
    const response = await session.message(
      `Analyze the code changes in commit ${commitHash}.
Run the test suite and report:
1. Test results (pass/fail)
2. Code coverage changes
3. Any new warnings or errors
4. Potential issues or improvements
Format your response as JSON with fields: tests_passed, coverage_change, warnings, recommendations.`,
    );

    // Parse the response (assumes the model returned bare JSON)
    const result = JSON.parse(response.text);

    if (!result.tests_passed) {
      console.error("Tests failed. Blocking merge.");
      process.exit(1);
    }

    if (result.coverage_change < -5) {
      console.error("Coverage dropped >5%. Blocking merge.");
      process.exit(1);
    }

    console.log("Quality gate passed.");
    console.log(`Coverage change: ${result.coverage_change}%`);
    console.log(`Recommendations: ${result.recommendations}`);
    process.exit(0);
  } catch (error) {
    console.error(`Quality gate failed: ${(error as Error).message}`);
    process.exit(1);
  } finally {
    await session.close();
  }
}

// Called from the CI pipeline
codeQualityGate(process.env.COMMIT_HASH || "HEAD");
```

Now every commit automatically triggers a Claude Code analysis. If tests fail or coverage drops, the merge is blocked. This is a quality gate that actually understands code, not just a mechanical linter.
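One fragile spot in a gate like this is calling JSON.parse on raw model output: the model may wrap the JSON in prose or a markdown fence. A hedged sketch that pulls out the first balanced object (naive on purpose; it doesn't handle braces inside string values):

```typescript
// Extracts and parses the first balanced {...} object found in `text`.
// Throws if no object is present or the braces never balance.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  if (start === -1) throw new Error("No JSON object found in response");

  let depth = 0;
  for (let i = start; i < text.length; i++) {
    if (text[i] === "{") depth++;
    else if (text[i] === "}") {
      depth--;
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  throw new Error("Unbalanced JSON object in response");
}
```

Swapping `JSON.parse(response.text)` for `extractJson(response.text)` makes the gate tolerant of a chatty preamble while still failing loudly on genuinely malformed output.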
Use Case 3: Developer IDE Integration
You're building an IDE plugin. Developers select a file and request "Claude Code analysis." Your plugin spawns a session and displays results in a sidebar.
```typescript
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";
import * as vscode from "vscode";

export async function activateClaudeCodePlugin(
  context: vscode.ExtensionContext,
) {
  // Register command: "Analyze this file with Claude Code"
  const analyzeCommand = vscode.commands.registerCommand(
    "claude-code-plugin.analyzeFile",
    async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor) {
        vscode.window.showErrorMessage("No active editor.");
        return;
      }

      const filePath = editor.document.fileName;
      const workspaceRoot = vscode.workspace.workspaceFolders?.[0].uri.fsPath;
      if (!workspaceRoot) {
        vscode.window.showErrorMessage("Not in a workspace.");
        return;
      }

      // Show progress
      await vscode.window.withProgress(
        {
          location: vscode.ProgressLocation.Notification,
          title: "Analyzing...",
        },
        async (progress) => {
          const session = new ClaudeCodeSession({
            apiKey: process.env.ANTHROPIC_API_KEY,
            workingDirectory: workspaceRoot,
            model: "claude-3-5-sonnet-20241022",
            tools: {
              file_read: true,
              file_search: true,
              bash: false, // Don't execute anything
            },
          });

          try {
            progress.report({ increment: 30 });

            const response = await session.message(
              `Analyze this file: ${filePath}
Provide:
1. Summary of what this code does
2. Any potential bugs or issues
3. Suggestions for improvement
4. Type safety checks (if applicable)`,
            );

            progress.report({ increment: 70 });

            // Display results in a panel
            const panel = vscode.window.createWebviewPanel(
              "claudeCodeAnalysis",
              "Claude Code Analysis",
              vscode.ViewColumn.Two,
              {},
            );

            // Note: escape response.text before rendering in production;
            // injecting raw model output into a webview is an XSS risk.
            panel.webview.html = `
              <!DOCTYPE html>
              <html>
                <head>
                  <style>
                    body { font-family: -apple-system, BlinkMacSystemFont, sans-serif; padding: 20px; }
                    h2 { color: #333; }
                    p { line-height: 1.6; color: #666; }
                  </style>
                </head>
                <body>
                  <h2>Claude Code Analysis</h2>
                  <div>${response.text.replace(/\n/g, "<br/>")}</div>
                </body>
              </html>
            `;

            progress.report({ increment: 100 });
          } finally {
            await session.close();
          }
        },
      );
    },
  );

  context.subscriptions.push(analyzeCommand);
}
```

Now developers can analyze code without leaving VS Code. They get Claude's insights inline, integrated seamlessly into their development workflow.
Real-World Integration Patterns
Before diving into pitfalls, let's ground this in how teams actually use the Agent SDK. Understanding these patterns helps you design your own integration correctly.
Pattern 1: The Code Review Bot
Many teams deploy Claude Code as a GitHub bot that automatically reviews every PR. Here's how the pattern works in practice:
When a PR is created, a GitHub webhook triggers a Lambda function. That function:
- Checks out the PR branch
- Creates a Claude Code session with that branch as the working directory
- Sends a comprehensive code review request
- Posts the analysis back to the PR as a comment
- Updates a check status (passing with warnings, or blocking if critical issues found)
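The flow above can be sketched as glue code with injected dependencies, which also makes it testable. Every dependency name here is hypothetical; the real implementations (git checkout, the SDK session, the GitHub API) are assumptions:

```typescript
// Injected dependencies so the orchestration logic stays testable.
interface ReviewDeps {
  checkout: (prId: string) => Promise<string>; // returns the working directory
  review: (dir: string) => Promise<string>; // runs the Claude Code session
  postComment: (prId: string, body: string) => Promise<void>;
  setStatus: (prId: string, state: "success" | "failure") => Promise<void>;
}

async function handlePullRequest(prId: string, deps: ReviewDeps): Promise<void> {
  const dir = await deps.checkout(prId);
  const analysis = await deps.review(dir);
  await deps.postComment(prId, analysis);
  // Naive severity check for illustration: block only on "CRITICAL" markers
  // emitted by the review prompt.
  const blocking = analysis.includes("CRITICAL");
  await deps.setStatus(prId, blocking ? "failure" : "success");
}
```

In a real deployment the webhook handler (Lambda or otherwise) would build the `deps` object from real implementations; in tests you pass fakes.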
The beauty of this pattern is that it provides instant feedback to developers. Instead of waiting for a human reviewer, developers get Claude's analysis while the code is fresh in their mind. The analysis includes specific file paths, line numbers, and actionable suggestions. Over time, developers internalize these patterns and write better code upfront.
One team using this pattern reported a 30% reduction in code review comments related to "style" and "common mistakes" within the first month. Reviewers could focus on architecture and design instead of mechanical issues.
The cost? Running Claude on every PR (let's say 50-100 PRs per week with a team of 20) costs roughly $2-5 per PR with efficient configuration. That's $100-500/week, or $5,000-25,000 per year. Compare that to the opportunity cost of a human spending 30 minutes per PR: that's 25-50 hours per week. Even one senior engineer spending a quarter of their time on code review justifies the automation cost.
Pattern 2: The Compliance Checker
Regulated industries (healthcare, finance, legal) need to ensure code meets specific compliance standards. A team in healthcare built a Claude Code agent that runs on every PR and checks for:
- Patient data exposure in logs
- Hardcoded credentials (PII, tokens)
- Weak cryptography
- Missing audit logging
- Insufficient input validation for PHI (Protected Health Information)
The agent has access to a custom system prompt that includes their industry compliance requirements. Each PR generates a compliance report that gets attached to the GitHub PR as well as pushed to a compliance dashboard. If critical violations are found, the PR fails its compliance check and can't be merged.
This team moved from annual compliance audits (expensive, find issues months after they're written) to continuous compliance checking (cheap, catch issues immediately). The compliance overhead for developers is near-zero—the system runs in the background and reports violations when found.
Pattern 3: The AI-Powered Refactoring Pipeline
Another common pattern: teams use Claude Code to refactor codebases at scale. For example, migrating from one framework to another.
When a team decided to migrate from Redux to Redux Toolkit, they:
- Created a branch with the full main codebase
- Spawned a Claude Code session
- Sent a message: "Automatically refactor all Redux patterns to Redux Toolkit. Generate a PR with all changes."
- Claude generated 200+ modified files with systematic refactoring
- Team reviewed the generated changes, made adjustments, and merged
This would have taken a contractor weeks. Claude Code did the bulk work in hours. Humans validated and adjusted. The pattern shifts from "write code manually" to "generate code, then validate."
Pattern 4: The Documentation Generator
Some teams use Claude Code to generate API documentation, changelog entries, or architecture diagrams from code. The process:
- Create a session pointed at a codebase version
- Request: "Generate comprehensive API documentation for all public methods"
- Claude reads the code, generates markdown documentation
- Team commits documentation alongside code
- Next release cycle, run again to keep docs fresh
This addresses the age-old problem: code gets updated, documentation becomes stale. With Claude generating docs from code, you have documentation that's automatically in sync with the actual code.
Key Pitfalls to Avoid
As you integrate the Agent SDK, watch out for these common gotchas:
Not setting a working directory boundary: If you don't specify workingDirectory, Claude operates on your entire filesystem. Always sandbox to a specific path. This is both a security issue and a practical one—unbounded file operations are slow and dangerous. Imagine Claude searching for .js files and recursively traversing your entire home directory. It's wasted API calls, wasted time, and a potential security vulnerability. Scope everything aggressively.
Forgetting to close sessions: Each session consumes resources. Always call await session.close() in a finally block or use resource management patterns. Leaking sessions will drain your API quota. In production, a single forgotten session can cost hundreds of dollars per day if it's repeatedly spawned. Use try/finally or async resource managers to guarantee cleanup.
Enabling all tools by default: Just because Claude can execute bash commands doesn't mean it should in your use case. Explicitly enable only the tools you need. This is a safety boundary and a clarity signal about what Claude is supposed to do. If you're building a code analyzer, you don't need file_write or browser. If you're running security checks, you don't need file_write. Less is more.
Not handling tool invocation errors: When Claude requests a tool, your code might fail to execute it (permission denied, file not found, etc.). Always return an error result so Claude knows what happened and can adapt. If a file doesn't exist, tell Claude that instead of silently returning null. If a command fails, include the error message. Claude learns from these signals and can take corrective action.
Timeout configuration too loose: If you don't set a timeout, a runaway session could consume resources indefinitely. Set reasonable timeouts (30 seconds for interactive tasks, maybe 5 minutes for deep analysis) and respect them. A single runaway session could consume your entire API monthly quota in hours. Timeouts are not overhead—they're essential protection.
Mixing models without considering trade-offs: Sonnet is faster and cheaper. Opus is more capable but slower and more expensive. Haiku is ultralight for simple tasks but less capable. Don't just pick one and never revisit. Different tasks have different needs. A simple "find TODO comments" task? Use Haiku. A complex security review? Use Opus. A routine code quality check? Sonnet is your sweet spot.
Assuming tool output is always correct: Claude is powerful, but it's not infallible. If you're using Claude to generate SQL or shell commands, validate them before executing. Log all invocations for audit trails. A single misgenerated rm -rf command could be catastrophic. Always review, validate, or sandbox destructive operations. This is non-negotiable.
Not logging tool invocations: For compliance, debugging, and incident response, you need a record of everything Claude did. Every file Claude read, every bash command it requested, every API call it made. Store these logs with timestamps, session IDs, and outcomes. When something goes wrong, logs are your lifeline.
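The command-validation advice above can be sketched as a small denylist check. The patterns and function name here are hypothetical and deliberately not exhaustive; a production setup should prefer an allowlist of known-safe commands over a denylist:

```typescript
// Hypothetical denylist check for bash commands Claude asks to run.
// Illustrative only: a real deployment should allowlist, not denylist.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s+-[a-z]*r[a-z]*f/i,   // rm -rf and flag variants
  /\bdd\s+if=/i,               // raw disk reads/writes
  /\bmkfs\b/i,                 // filesystem formatting
  /\bgit\s+push\s+.*--force/i, // force pushes
];

function isDestructiveCommand(command: string): boolean {
  return DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(command));
}
```

A tool-invocation handler would call this before executing any `bash` request and return an error result to Claude when it matches, rather than silently dropping the command.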
Cost Optimization and Scaling Considerations
The Agent SDK's flexibility comes with responsibility for cost management. A naïve implementation could cost thousands per month. A thoughtful implementation costs hundreds. Understanding the cost levers helps you design for scale.
The Cost Structure
Each Claude Code session has several cost components:
- API calls: Every message you send to Claude costs tokens. Input tokens are cheap; output tokens cost more.
- Tool execution overhead: When Claude requests tools (file reads, bash execution), there's a latency cost. Slow tool execution means longer session durations, more API quota consumed, and higher costs.
- Session overhead: Maintaining session state costs memory and compute. High concurrency (100+ sessions) costs more than low concurrency.
- Model selection: Haiku costs ~$0.80 per 1M input tokens. Sonnet costs ~$3 per 1M. Opus costs ~$15 per 1M. Using the right model for the right task is essential.
A typical code review session might:
- Send 1,500 input tokens (your code + prompt)
- Claude responds with 2,500 output tokens (analysis)
- Make 20 tool calls (file reads, grep searches)
- Complete in 30 seconds total
At Sonnet prices:
- Input: 1,500 * $3 / 1,000,000 = $0.0045
- Output: 2,500 * $15 / 1,000,000 = $0.0375 (output tokens are billed at a higher rate than input)
- Total: ~$0.042 per PR review
With 100 PRs per week: $4.20/week, about $218/year. That's noise.
But if you're inefficient:
- Sending the entire codebase as context each time (10,000 input tokens)
- Making 200 tool calls instead of 20 (slow, redundant searches)
- Running Opus instead of Sonnet (5x cost)
- Session timeouts causing retries (API quota wasted)
Suddenly you're looking at $2-5 per PR. Same task, roughly two orders of magnitude higher cost.
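The arithmetic above generalizes into a small estimator. This is a sketch; the rates are passed in explicitly because per-token pricing changes over time (the example uses $3/MTok input and $15/MTok output, Sonnet's published rates at the time of writing):

```typescript
// Back-of-envelope session cost estimator.
// Rates are dollars per million tokens, supplied by the caller.
function estimateSessionCost(
  inputTokens: number,
  outputTokens: number,
  inputRatePerMTok: number,
  outputRatePerMTok: number,
): number {
  const inputCost = (inputTokens * inputRatePerMTok) / 1_000_000;
  const outputCost = (outputTokens * outputRatePerMTok) / 1_000_000;
  return inputCost + outputCost;
}

// The PR-review example: 1,500 in / 2,500 out at Sonnet rates
const perReview = estimateSessionCost(1_500, 2_500, 3, 15);
const perYear = perReview * 100 * 52; // 100 PRs per week
```

Running the same numbers through the estimator before and after an optimization (smaller context, fewer tool calls, cheaper model) makes the cost delta concrete instead of anecdotal.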
Cost Optimization Strategies
Strategy 1: Model Selection by Task
Different tasks benefit from different models:
- Haiku ($0.80/1M tokens): Simple pattern matching, script generation, data transformation, log parsing. Speed matters more than comprehension.
- Sonnet ($3/1M tokens): Code review, analysis, refactoring, medium-complexity reasoning. The sweet spot for most tasks.
- Opus ($15/1M tokens): Complex security analysis, architectural decisions, novel problems, high-stakes reviews.
A real example: One team had been running all code reviews on Opus. Switching to Sonnet for standard reviews and Opus for security-critical reviews reduced costs by 70% with better results (Opus was over-thinking routine issues).
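A routing function makes the tiering explicit. This is a sketch; the model IDs are illustrative examples and should be checked against the current model list before use:

```typescript
// Illustrative task-to-model router mirroring the tiers above.
// Model IDs are examples; verify against the current model list.
type TaskTier = "simple" | "standard" | "critical";

function pickModel(tier: TaskTier): string {
  switch (tier) {
    case "simple":
      return "claude-3-5-haiku-20241022"; // pattern matching, log parsing
    case "standard":
      return "claude-3-5-sonnet-20241022"; // code review, refactoring
    case "critical":
      return "claude-3-opus-20240229"; // security, architecture
  }
}
```

Centralizing the choice in one function means a pricing or capability change is a one-line edit rather than a hunt through every session constructor.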
Strategy 2: Caching and Reuse
If you're analyzing the same codebase repeatedly (weekly reviews, continuous scanning), cache the codebase analysis:
// First run: full analysis
const session = new ClaudeCodeSession({
workingDirectory: "/repo",
});
const firstAnalysis = await session.message(
"Analyze this codebase for code quality issues",
);
// Save the analysis to a database
// Second run: differential analysis
const changes = await getChangedFiles(); // Only changed since last scan
const secondAnalysis = await session.message(
`Last analysis: [cached analysis from database]
Changes since then: [list of modified files]
Analyze only the changed files for regressions or new issues`,
);This approach reuses the cached analysis and only analyzes what's new. Claude understands the previous state and can make incremental observations.
Strategy 3: Batch Processing
If you have many PRs to review, batch them efficiently:
// INEFFICIENT: Create new session for each PR
for (const pr of prs) {
const session = new ClaudeCodeSession({ ... });
const review = await session.message(`Review ${pr}`);
await session.close();
}
// EFFICIENT: Single session analyzing multiple PRs
const session = new ClaudeCodeSession({ ... });
for (const pr of prs) {
// Reuse same session, avoid overhead of creating/closing
const review = await session.message(
`Now analyze: ${pr.name}
Path: ${pr.path}`
);
}
await session.close();
Session creation and teardown have overhead. When analyzing multiple things, reuse sessions when possible.
Strategy 4: Limiting Tool Calls
The most expensive part of a session is often tool execution, not model time. If Claude makes 100 file read requests, that's 100 file I/O operations. Optimize by:
// Set limits
const session = new ClaudeCodeSession({
maxToolCalls: 50, // Fail if exceeding 50 tool invocations
timeout: 60000, // Fail if taking >60 seconds
});
// Give Claude smart starting points
const commonPatterns = await findCodePatterns();
const relevantFiles = await identifyRelevantFiles();
const response = await session.message(
`You have access to these files for context:
${commonPatterns}
Focus your analysis on these files which are most likely to have issues:
${relevantFiles}
Find security vulnerabilities in the codebase.`,
);
By giving Claude good starting points, you reduce exploratory tool calls and focus the session on actual analysis.
Scaling to High Concurrency
When you move beyond dozens of sessions to hundreds, new considerations emerge.
Concurrent Session Limits: The Anthropic API has rate limits. High concurrency increases the chance of hitting them. Monitor your usage:
interface UsageMetrics {
sessionsActive: number;
successRate: number;
avgDurationMs: number;
apiErrorRate: number;
}
async function getUsageMetrics(): Promise<UsageMetrics> {
// Track these in your monitoring system
return {
sessionsActive: sessions.length,
successRate: successCount / totalCount,
avgDurationMs: totalDuration / successCount,
apiErrorRate: apiErrors / totalCount,
};
}
// Adjust concurrency dynamically
const metrics = await getUsageMetrics();
if (metrics.apiErrorRate > 0.05) {
// 5% error rate: reduce concurrency
maxConcurrency = Math.max(1, maxConcurrency - 2);
} else if (metrics.apiErrorRate === 0 && metrics.avgDurationMs < 30000) {
// Low errors and fast: can handle more
maxConcurrency = Math.min(50, maxConcurrency + 1);
}
This adaptive approach helps you find the right concurrency level for your usage pattern.
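The adaptive loop above adjusts maxConcurrency, but you also need a mechanism that enforces the cap. A minimal counting semaphore, independent of the SDK, is enough for that; this is a sketch:

```typescript
// Minimal counting semaphore to enforce a session concurrency cap.
// acquire() resolves immediately while permits remain; otherwise the
// caller is queued until release() hands a permit over.
class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to a waiter
    else this.permits++;
  }
}

// Usage sketch:
// const sem = new Semaphore(maxConcurrency);
// await sem.acquire();
// try { /* run one session */ } finally { sem.release(); }
```

Wrapping every session spawn in acquire/release guarantees the cap holds even when callers arrive in bursts; the adaptive metrics loop then only has to adjust the permit count.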
Connection Pooling: If you're spawning sessions continuously, implement connection pooling to reuse sessions:
import { EventEmitter } from "node:events";
// Extends EventEmitter so acquire() can wait on the 'available' event
class SessionPool extends EventEmitter {
private available: ClaudeCodeSession[] = [];
private inUse = new Set<ClaudeCodeSession>();
private maxSize = 10;
async acquire(): Promise<ClaudeCodeSession> {
if (this.available.length > 0) {
const session = this.available.pop()!;
this.inUse.add(session);
return session;
}
if (this.inUse.size < this.maxSize) {
const session = new ClaudeCodeSession({ ... });
this.inUse.add(session);
return session;
}
// Wait for session to be released
await new Promise(resolve => this.once('available', resolve));
return this.acquire();
}
async release(session: ClaudeCodeSession): Promise<void> {
this.inUse.delete(session);
this.available.push(session);
this.emit('available');
}
}
This pool reuses sessions across requests, reducing creation/teardown overhead and API quota waste.
Error Handling and Monitoring
In production, things will go wrong. Networks fail. Models hallucinate. Tool execution times out. Your integration needs to handle failure gracefully.
Here's a robust error handling pattern:
import { ClaudeCodeSession, SessionError } from "@anthropic-ai/claude-code";
import pino from "pino"; // Use your preferred logging library
const log = pino();
async function robustAnalysis(repoPath: string, analysisType: string) {
const session = new ClaudeCodeSession({
apiKey: process.env.ANTHROPIC_API_KEY,
workingDirectory: repoPath,
model: "claude-3-5-sonnet-20241022",
timeout: 120000,
});
const sessionId = Math.random().toString(36).slice(2, 11);
try {
log.info({ sessionId, analysisType, repoPath }, "Starting analysis");
session.on("toolUse", async (event) => {
log.debug(
{ sessionId, tool: event.toolName, input: event.input },
"Tool requested",
);
try {
// Execute with safety checks
if (event.toolName === "bash") {
// Never allow destructive operations in production
if (
event.input.command?.includes("rm -rf") ||
event.input.command?.includes("dd if=")
) {
log.warn(
{ sessionId, command: event.input.command },
"Blocking destructive command",
);
session.submitToolResult(event.toolId, {
success: false,
error: "Destructive commands are not allowed.",
});
return;
}
}
// Execute the tool
const result = await executeToolWithTimeout(
event.toolName,
event.input,
30000,
);
log.debug({ sessionId, tool: event.toolName }, "Tool succeeded");
session.submitToolResult(event.toolId, result);
} catch (error) {
log.error(
{ sessionId, tool: event.toolName, error: (error as Error).message },
"Tool execution failed",
);
session.submitToolResult(event.toolId, {
success: false,
error: `Tool execution failed: ${(error as Error).message}`,
});
}
});
const response = await session.message(
`Analyze this repository for ${analysisType} issues.
Provide a structured report.`,
);
log.info({ sessionId }, "Analysis complete");
return { success: true, result: response.text };
} catch (error) {
if (error instanceof SessionError) {
log.error(
{
sessionId,
code: error.code,
message: error.message,
},
"Session error",
);
if (error.code === "TIMEOUT") {
return {
success: false,
error: "Analysis timed out. The repository may be too large.",
};
}
if (error.code === "RATE_LIMIT") {
return {
success: false,
error: "API rate limit exceeded. Try again later.",
};
}
}
log.error(
{ sessionId, error: (error as Error).message },
"Unexpected error",
);
return {
success: false,
error: "An unexpected error occurred during analysis.",
};
} finally {
try {
await session.close();
log.info({ sessionId }, "Session closed");
} catch (cleanupError) {
log.warn(
{ sessionId, error: (cleanupError as Error).message },
"Error closing session",
);
}
}
}
async function executeToolWithTimeout(
toolName: string,
input: Record<string, unknown>,
timeout: number,
): Promise<Record<string, unknown>> {
return Promise.race([
executeToolActually(toolName, input),
new Promise<Record<string, unknown>>((_, reject) =>
setTimeout(
() => reject(new Error(`Tool execution timeout after ${timeout}ms`)),
timeout,
),
),
]);
}
async function executeToolActually(
toolName: string,
input: Record<string, unknown>,
): Promise<Record<string, unknown>> {
// Your implementation here
// Log, validate, execute, return result
return { success: true };
}
This pattern provides:
- Structured logging with session IDs for request tracing
- Command validation to prevent dangerous operations
- Timeout protection at both the session and tool levels
- Specific error handling for different failure modes
- Guaranteed cleanup even when things go wrong
- Audit trail of every tool invocation
Monitoring should also track:
- Session success/failure rates
- Average session duration
- Tool execution success rates
- API quota consumption
- Error categories and frequencies
- Model response times by task type
These metrics let you detect problems early and optimize your usage patterns.
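The RATE_LIMIT branch above gives up immediately; a common complement is to retry with exponential backoff before surfacing the error to callers. Here is a sketch, with `isRetryable` as an assumed predicate supplied by the caller (e.g. checking for the RATE_LIMIT code):

```typescript
// Exponential backoff schedule: doubles per attempt, capped at maxDelayMs.
// Production code usually adds random jitter to avoid synchronized retries.
function backoffDelayMs(
  attempt: number, // 0-based retry attempt
  baseDelayMs = 1_000,
  maxDelayMs = 30_000,
): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Retry wrapper around any session-spawning function.
async function withRetries<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts - 1 || !isRetryable(err)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, baseDelayMs)));
    }
  }
}
```

Only retry errors that are actually transient: retrying a TIMEOUT caused by an oversized repository just burns quota, while retrying a RATE_LIMIT after backing off usually succeeds.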
Expected Output and Configuration Patterns
Here's a production-ready session configuration that balances power, safety, and cost:
import { ClaudeCodeSession, SessionOptions } from "@anthropic-ai/claude-code";
interface ReviewConfig {
repoPath: string;
reviewType: "pr" | "release" | "security";
customPrompt?: string;
}
async function createConfiguredSession(config: ReviewConfig) {
const basePrompt = `You are a code reviewer specialized in ${config.reviewType} reviews.
Be thorough but concise. Identify concrete issues with evidence.`;
const options: SessionOptions = {
apiKey: process.env.ANTHROPIC_API_KEY,
workingDirectory: config.repoPath,
model:
config.reviewType === "security"
? "claude-3-opus-20240229" // Most capable for security
: "claude-3-5-sonnet-20241022", // Fast & cost-effective for standard reviews
tools: {
file_read: true,
file_search: true,
bash: config.reviewType === "release", // Only allow bash for release checks
file_write: false, // Never allow modifications
browser: false, // Not needed for code review
},
systemPrompt: config.customPrompt || basePrompt,
timeout: config.reviewType === "security" ? 300000 : 120000,
maxToolCalls: 100,
};
return new ClaudeCodeSession(options);
}
// Usage
const session = await createConfiguredSession({
repoPath: "/tmp/pr-feature",
reviewType: "security",
});
const analysis = await session.message(
"Check for XSS vulnerabilities, SQL injection risks, and authentication issues.",
);
console.log(analysis.text);
await session.close();
This pattern gives you:
- Model selection based on task type (security → Opus, standard → Sonnet)
- Tool restrictions appropriate to the use case
- Configurable timeouts scaled to task complexity
- Consistent system prompts that guide behavior
- Type safety through TypeScript
- Reusability through a configurable factory
Summary
The Claude Code Agent SDK transforms Claude from a web-based tool into a programmable capability you can embed in your own applications. You control the security boundaries, the tools available, the system prompt, and the model. You manage sessions programmatically, handle tool invocations, stream results, and spawn parallel workers.
Whether you're building an internal code review portal, adding quality gates to your CI/CD pipeline, or integrating Claude Code directly into your IDE, the SDK gives you the leverage to embed AI-powered code assistance wherever your developers work.
The key is thoughtful configuration: always sandbox your working directory, enable only necessary tools, set reasonable timeouts, and handle tool invocation events with care. Done right, you're not just using Claude Code—you're embedding it as a first-class capability in your developer workflow.
Start small: create a session, send a message, handle the response. Then expand from there. Your developers (and your code quality) will thank you.