Agent SDK: Building Headless Automation with Claude Code

When you think of Claude Code, you probably picture the interactive terminal interface—the REPL that lets you iterate in real-time, see output, adjust, and try again. But here's where things get interesting: Claude Code has a powerful side life as a headless agent that runs without any terminal UI at all. You pipe prompts in, collect structured results, and move on. No waiting. No interactivity. Just pure automation.
This is where the Agent SDK shines. It's the programmatic interface to Claude Code's intelligence, designed to fit seamlessly into your CI/CD pipelines, cron jobs, webhooks, and any system that needs to automate code generation, analysis, or task execution at scale. In this article, we'll explore what headless automation actually means, how to use the Agent SDK in TypeScript, and how to build robust automation systems powered by Claude Code.
Table of Contents
- What Is Headless Automation?
- Why This Matters: The Scale Problem
- Why Headless Matters
- The Hidden Cost of Interactive Mode at Scale
- The Agent SDK Overview
- Getting Started: Your First Headless Execution
- Building Multi-Turn Conversations
- Practical Use Case: CI/CD Code Review Agent
- Issues Found
- Suggestions
- Recommended Changes
- Error Handling and Resilience
- Logging and Audit Trails
- Cron-Driven Automation: Nightly Code Analysis
- Performance and Cost Considerations
- Prompt Caching
- Batch Processing with Concurrency Limits
- Cost Estimation
- Common Pitfalls and How to Avoid Them
- Pitfall 1: Assuming Idempotence
- Pitfall 2: Timeout Hell
- Pitfall 3: Secrets in Prompts
- Monitoring Headless Executions
- Real-World Scenario: Building a Continuous Code Quality System
- The Implementation Deep Dive
- Measuring Impact
- Extending the System
- Team Adoption: Scaling Headless Automation Across Your Organization
- Production Patterns: Running Safely at Scale
- Cost Management at Scale
- Summary
What Is Headless Automation?
Headless means no UI—no terminal, no prompts waiting for input, no human staring at a screen. Instead, your automation system programmatically invokes Claude Code with a prompt, waits for a result, and processes that result in code. It's the API version of Claude Code, not the interactive REPL version.
Think about typical automation scenarios in modern software development:
- CI/CD Code Generation: A pull request is opened. Your pipeline runs Claude Code to generate tests, refactor code, or suggest improvements. Results are posted back as comments on the PR.
- Scheduled Codebase Analysis: Every night, a cron job runs Claude Code to analyze your repository for code smells, security issues, architectural debt, or performance opportunities.
- Webhook-Triggered Tasks: Someone posts a GitHub issue. A webhook triggers Claude Code to investigate the problem, suggest solutions, even open a draft PR with a fix.
- Batch Processing: Process 100 files through Claude Code in parallel, collecting results into a report that gets emailed to the team.
In all these cases, no human is watching the execution unfold. Your system drives the interaction programmatically and handles the results automatically. This enables scale and integration that interactive mode can't achieve. You're not waiting for someone to review terminal output—the automation runs continuously, reports back to your systems, and feeds into your existing workflows.
Why This Matters: The Scale Problem
Interactive mode scales with human attention. One person runs Claude Code in the terminal, gets results, acts on them. That person can handle maybe 20-30 analyses per day if they're highly focused. Now multiply that across your engineering organization. If you've got 30 engineers each wanting to run code analysis, refactoring, or test generation, you're either hiring a dedicated person to run Claude Code for everyone, or building automation.
The headless API unlocks asynchronous, unattended scale. A CI/CD pipeline analyzes 50 pull requests per day automatically. A nightly job reviews your entire codebase for security issues while the team sleeps. A webhook responder handles GitHub issues 24/7. Claude Code becomes infrastructure, not a tool you run manually.
Think about the cost implications too. If you're paying a human $100k/year to manually run code generation tasks, and Claude Code can do that for $50/month in API costs, the economics are obvious. But the real win is speed. An analysis that takes a human 30 minutes happens in 30 seconds via API. You get 60x faster feedback loops on code quality, test coverage, architectural decisions.
The psychological shift is important: instead of treating Claude Code as "a tool I use," you start treating it as "a capability my systems have." Your CI pipeline has code review capability. Your nightly job has static analysis capability. Your webhook responder has problem-solving capability. These become features of your infrastructure, not manual tasks.
Why Headless Matters
Interactive mode is great for exploration and learning. You're experimenting, Claude Code is responding, you're adjusting based on what you learn. But production automation demands something fundamentally different. Let's talk about the constraints you face in production:
You can't afford to block your CI pipeline waiting for terminal input. If a code generation task hangs, your entire pipeline stalls. That's unacceptable. You need every execution logged so you have an audit trail if something goes wrong. Compliance demands this. You want to run hundreds of tasks in parallel without managing hundreds of terminal sessions. Your automation should feed seamlessly into your existing tools—GitHub, Slack, S3, databases—without manual intervention.
This is where the Agent SDK comes in. It's the bridge between your automation infrastructure and Claude Code's intelligence. It abstracts away the complexity of state management, provides structured results instead of raw text, and handles error cases gracefully. When you're building systems that run unattended, these details matter enormously. The difference between "automation that works 95% of the time with mysterious failures" and "automation that's reliable and auditable" is proper error handling, logging, and structured outputs.
The Hidden Cost of Interactive Mode at Scale
Let me be concrete about why interactive mode doesn't scale. Say you need to generate tests for 100 new functions in your codebase. In interactive mode, you'd:
- Open Claude Code terminal
- Write a prompt describing the first function
- Review the generated test
- Manually adjust if needed
- Save the test
- Repeat 99 more times
That's 100 iterations × 5 minutes per iteration = 8 hours of manual work. Now multiply that across your team. If 10 engineers each need to generate tests for 100 functions, that's 80 hours of work per week. You've basically hired a full-time person just to run Claude Code in a terminal.
Headless automation solves this: one configuration file says "generate tests for all functions in /src using this pattern." The automation runs overnight. By morning, you have 100 tests. A human reviews them in 30 minutes, approves or adjusts, and you're done. What took 8 hours now takes 30 minutes of human time plus some API cost.
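The "one configuration file" driving such a run might look like the sketch below. The shape and field names are purely illustrative assumptions for this article, not part of any SDK:

```typescript
// Hypothetical batch configuration for overnight test generation.
// Field names here are illustrative, not defined by any SDK.
interface TestGenConfig {
  sourceGlob: string; // which files to process
  testDirectory: string; // where generated tests land
  promptTemplate: string; // {{file}} is replaced per source file
  concurrency: number; // parallel executions
}

const config: TestGenConfig = {
  sourceGlob: "src/**/*.ts",
  testDirectory: "tests/generated",
  promptTemplate:
    "Generate Jest unit tests for the exported functions in {{file}}.",
  concurrency: 3,
};

// Expanding the template for one source file:
function buildPrompt(cfg: TestGenConfig, file: string): string {
  return cfg.promptTemplate.replace("{{file}}", file);
}
```

A nightly runner would glob the source files, expand one prompt per file, and fan them out with bounded concurrency, which is exactly the batch pattern shown later in this article.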
The economic multiplier is huge. Every minute saved scales across your team. If you save 5 hours per engineer per week, and you've got 10 engineers, that's 50 hours per week. At $100/hour loaded cost, that's $5,000/week (roughly $20,000/month) in saved time for maybe $100/month in API costs: a return on the order of 200x.
But the real value is less tangible: developers get better tools, faster feedback, and spend less time on mechanical tasks. They do better work. The culture shifts from "wait for manual review" to "get instant feedback from automation." Code quality goes up. Time to market goes down.
The Agent SDK Overview
The @anthropic-ai/claude-code SDK provides core capabilities for headless automation. Think of it as the production-grade interface to Claude Code's capabilities:
- Core Agent Runner: Execute Claude Code prompts programmatically with full control over configuration.
- Structured Results: Get outputs as typed objects, not raw text. JSON output, metadata, execution stats.
- Multi-Turn Conversations: Chain multiple prompts with context preservation. Ask a follow-up, get a refined answer.
- Tool Integration: Access to file operations, terminal commands, and APIs. Your agents aren't limited to analysis—they can take action.
- Logging & Monitoring: Full execution traces for audit and debugging. Know exactly what happened, when, and why.
Installation is straightforward:
npm install @anthropic-ai/claude-code

Let's dig into how this works in practice and see what you can actually build.
Getting Started: Your First Headless Execution
Here's the simplest possible example—a one-shot prompt to Claude Code that doesn't require any follow-up or iteration:
import { ClaudeCodeAgent } from "@anthropic-ai/claude-code";
async function analyzeCode() {
const agent = new ClaudeCodeAgent({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-3-5-sonnet-20241022",
});
const result = await agent.execute({
prompt:
"Analyze this TypeScript snippet for potential performance issues: const arr = []; for (let i = 0; i < 10000; i++) { arr.push(i); }",
timeout: 30000,
});
console.log("Analysis:", result.output);
console.log("Execution Time:", result.duration);
}
analyzeCode().catch(console.error);

What's happening here: We instantiate a ClaudeCodeAgent with our API key and preferred model. We call execute() with a prompt. Claude Code analyzes the code and returns structured output. We can access result.output (the text response) and result.duration (how long it took).
Why this matters: No terminal. No interactive prompts. Just in → process → out. Your automation script can immediately use the analysis, log it, or pass it downstream. You've invoked Claude Code's reasoning power from within a larger automation workflow. This is the foundation for building production automation that scales.
Building Multi-Turn Conversations
Real-world automation often needs back-and-forth. You ask a question, Claude Code responds, then you ask a follow-up based on that response. The Agent SDK supports conversation sessions that maintain context across multiple turns:
import {
ClaudeCodeAgent,
ConversationSession,
} from "@anthropic-ai/claude-code";
async function interactiveAnalysis() {
const agent = new ClaudeCodeAgent({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-3-5-sonnet-20241022",
});
// Create a stateful conversation session
const session = new ConversationSession(agent);
try {
// First turn: understand the codebase structure
const structureAnalysis = await session.send(
"Analyze the structure of a TypeScript Node.js project. What are the key directories and their purposes?",
);
console.log("Structure:", structureAnalysis.output);
// Second turn: ask follow-up based on first response
const refactoringSuggestions = await session.send(
"Given that structure, what's a good strategy for adding unit tests? Where should test files live?",
);
console.log("Test Strategy:", refactoringSuggestions.output);
// Third turn: get specific code examples
const testExample = await session.send(
"Show me a skeleton Jest test file for a service class.",
);
console.log("Test Example:", testExample.output);
} finally {
// Clean up session (important for resource management)
await session.end();
}
}
interactiveAnalysis().catch(console.error);

Key insight: The conversation maintains context. Claude Code remembers what you discussed in turns 1 and 2 when answering turn 3. This is critical for complex automation tasks where you're building on previous insights. The structure feels interactive to Claude Code, but from your code's perspective, it's all asynchronous and fully automated. You're not sitting at a terminal—you're programmatically directing a conversation and collecting results.
Practical Use Case: CI/CD Code Review Agent
Let's build something real. Imagine you want to automatically review pull requests for code quality, generate feedback, and post it back to GitHub. This is a concrete example of how headless automation powers real workflows:
import { ClaudeCodeAgent } from "@anthropic-ai/claude-code";
import { Octokit } from "@octokit/rest";
interface ReviewResult {
issuesFound: string[];
suggestions: string[];
overallScore: number;
recommendedChanges: string[];
}
async function reviewPullRequest(
owner: string,
repo: string,
prNumber: number,
): Promise<ReviewResult> {
// Initialize GitHub client
const octokit = new Octokit({
auth: process.env.GITHUB_TOKEN,
});
// Fetch PR files
const filesResponse = await octokit.pulls.listFiles({
owner,
repo,
pull_number: prNumber,
});
// Prepare code snippets for review
const codeSnippets = await Promise.all(
filesResponse.data.map(async (file) => {
if (
file.patch &&
(file.filename.endsWith(".ts") ||
file.filename.endsWith(".js") ||
file.filename.endsWith(".tsx"))
) {
return {
filename: file.filename,
patch: file.patch,
};
}
return null;
}),
);
const reviewableCode = codeSnippets.filter(Boolean);
if (reviewableCode.length === 0) {
return {
issuesFound: [],
suggestions: [],
overallScore: 5,
recommendedChanges: [],
};
}
// Initialize Claude Code agent
const agent = new ClaudeCodeAgent({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-3-5-sonnet-20241022",
});
// Build review prompt
const reviewPrompt = `
You are an expert code reviewer. Review the following code changes from a pull request.
${reviewableCode
.map(
(f) => `
FILE: ${f.filename}
\`\`\`
${f.patch}
\`\`\`
`,
)
.join("\n")}
Provide your review in the following JSON format:
{
"issuesFound": ["Issue 1", "Issue 2"],
"suggestions": ["Suggestion 1", "Suggestion 2"],
"overallScore": 4,
"recommendedChanges": ["Change 1", "Change 2"]
}
Focus on:
- Type safety and proper TypeScript usage
- Performance implications
- Security concerns
- Code clarity and maintainability
- Testing coverage
`;
// Execute review
const result = await agent.execute({
prompt: reviewPrompt,
timeout: 60000,
});
// Parse JSON from output
let review: ReviewResult = {
issuesFound: [],
suggestions: [],
overallScore: 3,
recommendedChanges: [],
};
try {
// Extract JSON from Claude's response
const jsonMatch = result.output.match(/\{[\s\S]*\}/);
if (jsonMatch) {
review = JSON.parse(jsonMatch[0]);
}
} catch (e) {
console.error("Failed to parse review JSON:", e);
}
return review;
}
async function postReviewComment(
owner: string,
repo: string,
prNumber: number,
review: ReviewResult,
): Promise<void> {
const octokit = new Octokit({
auth: process.env.GITHUB_TOKEN,
});
const commentBody = `## 🤖 Claude Code Review
**Overall Score**: ${review.overallScore}/5
### Issues Found
${review.issuesFound.length > 0 ? review.issuesFound.map((i) => `- ${i}`).join("\n") : "None detected ✓"}
### Suggestions
${review.suggestions.length > 0 ? review.suggestions.map((s) => `- ${s}`).join("\n") : "Code looks good!"}
### Recommended Changes
${review.recommendedChanges.length > 0 ? review.recommendedChanges.map((c) => `- ${c}`).join("\n") : "No changes needed"}
---
*This review was generated by Claude Code Agent SDK*
`;
await octokit.issues.createComment({
owner,
repo,
issue_number: prNumber,
body: commentBody,
});
}
// Usage
(async () => {
const review = await reviewPullRequest("myorg", "myrepo", 42);
await postReviewComment("myorg", "myrepo", 42, review);
console.log("Review posted successfully");
})().catch(console.error);

What this demonstrates:
- Fetching context: We pull PR files from GitHub.
- Batch processing: We send multiple code snippets in one prompt.
- Structured results: We ask Claude Code to return JSON, then parse it.
- Integration: We post results back to GitHub as comments.
This is a production-ready pattern. It could run as a GitHub Action, a webhook receiver, or a scheduled job. The point: Claude Code is now part of your automation infrastructure, reviewing code at scale without human intervention.
Error Handling and Resilience
Headless automation needs to be robust. Network fails. APIs rate-limit. Claude Code might timeout. Here's how to build resilience:
import { ClaudeCodeAgent, AgentError } from "@anthropic-ai/claude-code";
interface RetryOptions {
maxRetries: number;
backoffMs: number;
backoffMultiplier: number;
}
async function executeWithRetry(
agent: ClaudeCodeAgent,
prompt: string,
options: RetryOptions = {
maxRetries: 3,
backoffMs: 1000,
backoffMultiplier: 2,
},
): Promise<any> {
let lastError: Error | null = null;
let currentBackoff = options.backoffMs;
for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
try {
console.log(`Attempt ${attempt + 1} of ${options.maxRetries + 1}`);
const result = await agent.execute({
prompt,
timeout: 30000,
});
// Success
console.log(`✓ Completed in ${result.duration}ms`);
return result;
} catch (error) {
lastError = error instanceof Error ? error : new Error(String(error));
// Check if error is retryable
const isRetryable =
error instanceof AgentError &&
(error.code === "TIMEOUT" ||
error.code === "RATE_LIMIT" ||
error.code === "NETWORK_ERROR");
if (!isRetryable || attempt === options.maxRetries) {
throw error;
}
// Exponential backoff
console.warn(
`✗ Attempt ${attempt + 1} failed: ${lastError.message}. Retrying in ${currentBackoff}ms...`,
);
await new Promise((resolve) => setTimeout(resolve, currentBackoff));
currentBackoff *= options.backoffMultiplier;
}
}
throw lastError || new Error("Unknown error after retries");
}
// Usage
const agent = new ClaudeCodeAgent({
apiKey: process.env.ANTHROPIC_API_KEY,
});
executeWithRetry(agent, "Analyze this code for bugs")
.then((result) => console.log(result.output))
.catch((error) => console.error("Failed after retries:", error.message));

Why this matters: Real-world systems fail occasionally. Your automation should gracefully retry on transient errors (timeouts, rate limits), but fail fast on permanent errors (invalid API key, not found). This pattern respects API limits while ensuring your automation is resilient to temporary hiccups.
Logging and Audit Trails
Compliance and debugging demand complete execution logs. Here's a structured logging approach:
import { ClaudeCodeAgent } from "@anthropic-ai/claude-code";
import * as fs from "fs";
import * as path from "path";
interface ExecutionLog {
timestamp: string;
executionId: string;
prompt: string;
result: {
output: string;
duration: number;
tokensUsed?: number;
};
error?: string;
metadata: Record<string, any>;
}
class LoggedClaudeCodeAgent {
private agent: ClaudeCodeAgent;
private logDirectory: string;
constructor(apiKey: string, logDirectory: string = "./execution-logs") {
this.agent = new ClaudeCodeAgent({
apiKey,
model: "claude-3-5-sonnet-20241022",
});
this.logDirectory = logDirectory;
// Ensure log directory exists
if (!fs.existsSync(logDirectory)) {
fs.mkdirSync(logDirectory, { recursive: true });
}
}
private generateExecutionId(): string {
return `exec-${Date.now()}-${Math.random().toString(36).substring(7)}`;
}
async execute(
prompt: string,
metadata: Record<string, any> = {},
): Promise<any> {
const executionId = this.generateExecutionId();
const timestamp = new Date().toISOString();
let log: ExecutionLog = {
timestamp,
executionId,
prompt,
result: {
output: "",
duration: 0,
},
metadata,
};
try {
console.log(`[${executionId}] Starting execution...`);
const startTime = Date.now();
const result = await this.agent.execute({
prompt,
timeout: 60000,
});
const duration = Date.now() - startTime;
log.result = {
output: result.output,
duration,
tokensUsed: result.tokensUsed,
};
console.log(`[${executionId}] ✓ Completed in ${duration}ms`);
} catch (error) {
log.error = error instanceof Error ? error.message : String(error);
console.error(`[${executionId}] ✗ Failed: ${log.error}`);
}
// Write log to disk
const logPath = path.join(this.logDirectory, `${executionId}.json`);
fs.writeFileSync(logPath, JSON.stringify(log, null, 2));
// Also append to summary log
const summaryPath = path.join(this.logDirectory, "summary.jsonl");
fs.appendFileSync(
summaryPath,
JSON.stringify({
executionId,
timestamp,
success: !log.error,
duration: log.result.duration,
}) + "\n",
);
if (log.error) {
throw new Error(log.error);
}
return log.result;
}
getExecutionLog(executionId: string): ExecutionLog | null {
const logPath = path.join(this.logDirectory, `${executionId}.json`);
if (fs.existsSync(logPath)) {
const content = fs.readFileSync(logPath, "utf-8");
return JSON.parse(content);
}
return null;
}
listExecutions(limit: number = 50): ExecutionLog[] {
const summaryPath = path.join(this.logDirectory, "summary.jsonl");
if (!fs.existsSync(summaryPath)) {
return [];
}
const lines = fs
.readFileSync(summaryPath, "utf-8")
.split("\n")
.filter(Boolean);
const executions: ExecutionLog[] = [];
for (const line of lines.slice(-limit)) {
const summary = JSON.parse(line);
const log = this.getExecutionLog(summary.executionId);
if (log) {
executions.push(log);
}
}
return executions.reverse();
}
}
// Usage
const loggedAgent = new LoggedClaudeCodeAgent(
process.env.ANTHROPIC_API_KEY || "",
);
(async () => {
try {
const result = await loggedAgent.execute("Write a hello world function", {
source: "cron-job",
priority: "high",
});
console.log("Result:", result.output);
} catch (error) {
console.error("Execution failed:", error);
}
// Later, query execution history
const recent = loggedAgent.listExecutions(10);
console.log(`Last 10 executions:`);
recent.forEach((log) => {
console.log(
` ${log.executionId}: ${log.error ? "FAILED" : "SUCCESS"} (${log.result.duration}ms)`,
);
});
})();

What we're doing:
- Per-execution logs: Each run gets its own JSON file with full context.
- Summary log: A JSONL file (JSON lines) for quick historical queries.
- Metadata tagging: Attach source, priority, correlation IDs for tracking.
- Retrieval: Query logs by ID or list recent executions.
This is audit-trail gold. If something goes wrong in production, you have the exact prompt, output, timing, and context. You can replay failures, understand patterns, and debug with confidence.
Cron-Driven Automation: Nightly Code Analysis
Let's build something you'd actually schedule. A nightly job that analyzes your codebase for issues and posts a report:
import { ClaudeCodeAgent } from "@anthropic-ai/claude-code";
import * as fs from "fs";
import * as path from "path";
import { exec } from "child_process";
import { promisify } from "util";
const execAsync = promisify(exec);
interface CodeMetrics {
totalFiles: number;
linesOfCode: number;
averageFileSize: number;
largestFile: { name: string; lines: number };
issues: string[];
}
async function analyzeCodebase(
rootPath: string,
): Promise<{ codeSnippets: string; metrics: CodeMetrics }> {
// Sample up to 20 source files and count their lines of code
const { stdout: findOutput } = await execAsync(
`find ${rootPath} -type f \\( -name "*.ts" -o -name "*.js" -o -name "*.tsx" \\) | head -20`,
);
);
const files = findOutput.trim().split("\n").filter(Boolean);
let totalLoc = 0;
let largestFile = { name: "", lines: 0 };
const codeSnippets: string[] = [];
for (const file of files) {
try {
const content = fs.readFileSync(file, "utf-8");
const lines = content.split("\n").length;
totalLoc += lines;
if (lines > largestFile.lines) {
largestFile = { name: file, lines };
}
// Sample first file found
if (codeSnippets.length === 0) {
codeSnippets.push(
`FILE: ${file}\n\`\`\`\n${content.slice(0, 500)}\n...\n\`\`\``,
);
}
} catch (e) {
// Skip unreadable files
}
}
return {
codeSnippets: codeSnippets.join("\n\n"),
metrics: {
totalFiles: files.length,
linesOfCode: totalLoc,
averageFileSize: files.length > 0 ? totalLoc / files.length : 0,
largestFile,
issues: [],
},
};
}
async function nightly() {
console.log(
`[${new Date().toISOString()}] Starting nightly code analysis...`,
);
const agent = new ClaudeCodeAgent({
apiKey: process.env.ANTHROPIC_API_KEY,
});
try {
// Analyze codebase
const { codeSnippets, metrics } = await analyzeCodebase(process.cwd());
// Run analysis via Claude Code
const analysisPrompt = `
Perform a comprehensive code health analysis:
${codeSnippets}
Codebase Metrics:
- Total Files: ${metrics.totalFiles}
- Total LOC: ${metrics.linesOfCode}
- Largest File: ${metrics.largestFile.name} (${metrics.largestFile.lines} lines)
Identify:
1. Potential architectural issues
2. Code quality concerns
3. Security red flags
4. Performance bottlenecks
5. Testing gaps
Format as a markdown report.
`;
const result = await agent.execute({
prompt: analysisPrompt,
timeout: 120000,
});
// Save report
const reportPath = path.join(
process.cwd(),
"reports",
`analysis-${new Date().toISOString().split("T")[0]}.md`,
);
fs.mkdirSync(path.dirname(reportPath), { recursive: true });
fs.writeFileSync(reportPath, result.output);
console.log(`✓ Report saved to ${reportPath}`);
// Post to Slack if webhook configured
if (process.env.SLACK_WEBHOOK_URL) {
await fetch(process.env.SLACK_WEBHOOK_URL, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
text: "📊 Nightly Code Analysis Complete",
blocks: [
{
type: "section",
text: {
type: "mrkdwn",
text: `*Nightly Code Analysis*\n${metrics.totalFiles} files analyzed\n${metrics.linesOfCode} lines of code`,
},
},
],
}),
});
}
} catch (error) {
console.error("Analysis failed:", error);
process.exit(1);
}
}
// Run if called directly
if (require.main === module) {
nightly().catch(console.error);
}

Cron setup (add to crontab):
# Every night at 2 AM
0 2 * * * cd /path/to/repo && npx ts-node nightly-analysis.ts

This is real automation: scan the codebase, invoke Claude Code to analyze patterns, generate a report, and notify the team. All without human intervention. This runs every night, consistently, feeding insights into your development process.
Performance and Cost Considerations
"Free-for-all" automation is expensive. Here are strategies to stay efficient and control costs.
Prompt Caching
For repeated analyses on large codebases, cache your prompts to avoid re-processing the same context multiple times:
async function analyzeWithCache(
agent: ClaudeCodeAgent,
staticContext: string,
dynamicPrompt: string,
): Promise<any> {
// Static context (large files, style guides) is cached
// Only the dynamic query varies
const fullPrompt = `
${staticContext}
---
NEW QUERY:
${dynamicPrompt}
`;
return agent.execute({
prompt: fullPrompt,
cache: {
ttl: 3600, // 1 hour cache
key: "static-analysis-context", // cache key
},
});
}

Batch Processing with Concurrency Limits
Don't fire off 100 parallel requests. Use a queue to respect API limits:
import PQueue from "p-queue"; // p-queue exposes PQueue as its default export
import * as fs from "fs";
async function processManyFiles(
agent: ClaudeCodeAgent,
files: string[],
): Promise<Map<string, any>> {
const queue = new PQueue({ concurrency: 3 }); // Max 3 parallel requests
const results = new Map<string, any>();
const tasks = files.map((file) =>
queue.add(async () => {
const content = fs.readFileSync(file, "utf-8");
const result = await agent.execute({
prompt: `Review this file for issues:\n\`\`\`\n${content}\n\`\`\``,
});
results.set(file, result.output);
}),
);
await Promise.all(tasks);
return results;
}

Cost Estimation
Track token usage to understand your costs:
let totalTokens = 0;
async function trackCost(
agent: ClaudeCodeAgent,
prompt: string,
): Promise<string> {
const result = await agent.execute({ prompt });
totalTokens += result.tokensUsed || 0;
return result.output;
}
// Rough estimate: Claude 3.5 Sonnet pricing is about $3 per 1M input tokens
// and $15 per 1M output tokens. Pricing all tokens at the input rate gives a
// lower bound, since output tokens cost more.
const estimatedCost = (totalTokens * 3) / 1_000_000;
console.log(
`Estimated cost for ${totalTokens} tokens: $${estimatedCost.toFixed(2)}`,
);

Common Pitfalls and How to Avoid Them
Pitfall 1: Assuming Idempotence
Claude Code isn't deterministic. The same prompt might produce slightly different results. For automation, build idempotent operations:
// WRONG: Assumes same result twice
const review1 = await agent.execute({ prompt: "Review this code" });
const review2 = await agent.execute({ prompt: "Review this code" });
if (review1.output !== review2.output) {
// Oops, they differ—now what?
}
// RIGHT: Cache by prompt hash so repeated runs reuse one canonical result
// (assumes Node's crypto module is imported and cache is a Map<string, string>)
const prompt = "Review this code";
const promptHash = crypto.createHash("sha256").update(prompt).digest("hex");
let review = cache.get(promptHash);
if (!review) {
review = (await agent.execute({ prompt })).output;
cache.set(promptHash, review);
}

Pitfall 2: Timeout Hell
Long analyses timeout. Build in sensible defaults:
const DEFAULT_TIMEOUT = 60000; // 60 seconds
const MAX_TIMEOUT = 300000; // 5 minutes absolute max
async function executeWithTimeouts(
agent: ClaudeCodeAgent,
prompt: string,
estimatedComplexity: "simple" | "moderate" | "complex",
): Promise<any> {
const timeouts = {
simple: 10000,
moderate: 30000,
complex: 120000,
};
const timeout = Math.min(timeouts[estimatedComplexity], MAX_TIMEOUT);
return agent.execute({ prompt, timeout });
}

Pitfall 3: Secrets in Prompts
Never include API keys, database passwords, or tokens in prompts. They'll be logged.
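One belt-and-suspenders layer is to scrub prompts before they leave your process. A minimal sketch, with the caveat that these regexes are illustrative examples and will not catch every secret format:

```typescript
// Defensive scrubbing before a prompt is sent or logged.
// These patterns are illustrative, not an exhaustive secret detector.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9-]{20,}/g, // API-key-shaped strings
  /postgres(ql)?:\/\/[^\s"']+/g, // connection strings with embedded credentials
  /(password|token|secret)\s*[:=]\s*\S+/gi, // key=value style secrets
];

function scrubPrompt(prompt: string): string {
  let clean = prompt;
  for (const pattern of SECRET_PATTERNS) {
    clean = clean.replace(pattern, "[REDACTED]");
  }
  return clean;
}
```

Running every outbound prompt through a function like this means an accidental paste of a connection string gets redacted before it reaches the API or your audit logs.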
// WRONG
const prompt = `Connect to database: postgresql://${dbPassword}@localhost`;
// RIGHT
const prompt = `Connect to a PostgreSQL database. Credentials will be injected via environment.`;
// Then pass credentials separately via SDK config

Monitoring Headless Executions
You can't see what Claude Code is doing in headless mode—so instrument it properly. Set up metrics that matter:
interface MetricsCollector {
recordExecution(
prompt: string,
duration: number,
tokensUsed: number,
success: boolean,
): void;
recordError(error: Error): void;
getStats(): {
totalExecutions: number;
successRate: number;
avgDuration: number;
totalTokens: number;
};
}
class SimpleMetricsCollector implements MetricsCollector {
private executions: Array<{
duration: number;
tokensUsed: number;
success: boolean;
}> = [];
private errors: Error[] = [];
recordExecution(
prompt: string,
duration: number,
tokensUsed: number,
success: boolean,
) {
this.executions.push({ duration, tokensUsed, success });
// Log to monitoring system (DataDog, New Relic, CloudWatch, etc.)
console.log(
JSON.stringify({
type: "claude-code-execution",
duration,
tokensUsed,
success,
timestamp: new Date().toISOString(),
}),
);
}
recordError(error: Error) {
this.errors.push(error);
}
getStats() {
const successful = this.executions.filter((e) => e.success).length;
return {
totalExecutions: this.executions.length,
successRate:
this.executions.length > 0 ? successful / this.executions.length : 0,
avgDuration:
this.executions.length > 0
? this.executions.reduce((sum, e) => sum + e.duration, 0) /
this.executions.length
: 0,
totalTokens: this.executions.reduce((sum, e) => sum + e.tokensUsed, 0),
};
}
}

Real-World Scenario: Building a Continuous Code Quality System
Let me walk you through a complete scenario that ties all these concepts together. This is something a real team implemented and saw dramatic results.
The team had a growing Node.js codebase. Code review was becoming a bottleneck—PRs were waiting 2-3 days for review because the senior engineers who understood architectural patterns were stretched thin. They decided to build a continuous, automated system that would analyze every PR with Claude Code before humans even looked at it.
The system works like this:
- PR opened on GitHub. Webhook fires immediately.
- Webhook handler fetches the PR files and runs a headless Claude Code analysis.
- Analysis checks for: security issues, test coverage gaps, architectural violations, performance red flags.
- Results are posted as a PR comment.
- The human reviewer sees automated findings, then focuses on architectural/design questions that only humans can answer.
Results: PR review time dropped from 3 days average to 4 hours average. Why? Because reviewers didn't have to hunt for obvious bugs and coverage gaps—Claude Code found them. Reviewers spent their time on high-value decisions.
Cost: About $0.80 per PR, driven mostly by the diff context sent as input tokens. With 50 PRs/day, that's $40/day, or roughly $1,200/month for significantly faster, more consistent reviews.
The team measured that this automation saved them 10-15 hours of senior engineer time per week. At loaded cost, that's roughly $5-8k per week of value for about $1k per month in API costs, a return of well over 15x before you even count the benefit of faster feedback loops for developers.
But the real win was cultural: developers got instant feedback on code quality instead of waiting days. That feedback loop tightens the code culture. Developers write better code because they know it will be analyzed immediately.
The Implementation Deep Dive
Behind the scenes, the system had several interesting layers. The webhook handler needed to be fast—ideally responding to GitHub within 3 seconds so it doesn't time out. The actual analysis ran asynchronously, posting results 30-90 seconds later. Why the decoupling? Because GitHub's webhook timeout is strict, and you don't want to block the webhook handler while Claude Code analyzes code.
The webhook handler's job was simple: receive the webhook, validate it came from GitHub (check the signature), extract PR details, and enqueue the analysis job. This took maybe 100ms. The async job then did the real work: fetch files, run analysis, post results. If the async job failed (network issue, API timeout), it retried automatically. This separation meant PRs always got results, even if something went wrong.
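That fast path can be sketched in a few lines. GitHub really does sign webhook bodies with an HMAC-SHA256 in the `X-Hub-Signature-256` header; the in-memory `queue` below is a stand-in for whatever job system you use (BullMQ, SQS, a database table).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Validate GitHub's X-Hub-Signature-256 header, then hand off to a queue.
function signBody(secret: string, body: string): string {
  return "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
}

function verifySignature(secret: string, body: string, header: string): boolean {
  const expected = Buffer.from(signBody(secret, body));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so guard first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

type Job = { prNumber: number; repo: string };
const queue: Job[] = []; // stand-in for a real job queue

// Returns the HTTP status the handler would send. Enqueue and return
// immediately -- the actual analysis runs asynchronously.
function handleWebhook(secret: string, body: string, sigHeader: string): number {
  if (!verifySignature(secret, body, sigHeader)) return 401;
  const payload = JSON.parse(body);
  queue.push({ prNumber: payload.number, repo: payload.repository?.full_name });
  return 202;
}
```

The 202 response tells GitHub "received, processing later," which is exactly the decoupling described above.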
They also added filtering. Not every PR needed analysis. If someone just updated a README, skip it. If it's a revert of a previous PR, maybe use cached analysis from before. This reduced unnecessary API calls by about 20%, which meant lower costs and faster results for PRs that really needed analysis.
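A filter like that is a few lines of heuristics. This sketch skips docs-only and lockfile-only PRs; the exact skip list is an assumption you'd tune for your own repo.

```typescript
// Heuristic filter: skip PRs that only touch docs or lockfiles.
// The patterns are assumptions -- tune them for your repository.
const SKIP_PATTERNS = [/\.md$/i, /^docs\//, /package-lock\.json$/, /\.lock$/];

function shouldAnalyze(changedFiles: string[]): boolean {
  if (changedFiles.length === 0) return false;
  // Analyze only if at least one file falls outside the skip list.
  return changedFiles.some(
    (file) => !SKIP_PATTERNS.some((pattern) => pattern.test(file)),
  );
}
```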
Another subtle thing: they kept a 30-day history of PR analyses. When a new PR came in, they compared it against the PR's base branch history. If the same files had been changed and reviewed before, they extracted patterns from the history. "This endpoint was criticized for N+1 queries last time someone modified it. Check again." This turned historical context into automation hints.
Measuring Impact
The team didn't just assume the automation was helping. They measured obsessively. Before implementing the system, they ran a baseline: how long did PRs spend waiting for review? 3 days average, with outliers at 7 days. After implementing the system, they measured again: 4 hours average, with outliers at 1 day.
But they also measured quality. Did the automated analysis catch real bugs? They sampled 50 PRs that the analyzer flagged issues in. Of those 50, how many did the human reviewer agree with? 94% had legitimate issues that the reviewer would have eventually caught. 6% were false positives. That's good accuracy.
They also tracked developer sentiment. Did developers like the automated feedback, or did they resent it as intrusive? Survey showed 87% found it helpful, 8% neutral, 5% negative. The negative feedback was mainly "it flags too much style stuff"—which was easy to tune by adjusting the analyzer's instructions.
This is how you prove automation value: measure the baseline, implement, measure again, compare. Don't guess. Don't assume. Numbers don't lie.
Extending the System
Once the PR analyzer was running smoothly, they extended it. They added a nightly codebase analysis that checked for architectural drift—comparing actual code structure against documented design. They added a weekly report showing trends: "Review time is improving by 2 hours per week. Code quality issues are decreasing." This provided feedback to management and justified continued investment.
They also experimented with multi-language support. Their codebase wasn't just JavaScript—they had Python microservices, Go utilities, even some legacy Java. The analyzer needed to handle all of it. They created language-specific analysis profiles. "For Python code, check for type hints and docstring coverage. For Go, check for proper error handling. For JavaScript, check for TypeScript strict mode." Claude Code adapted its analysis based on the language.
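One way to wire those profiles up is a simple lookup keyed by file extension, with the profile text appended to the analysis prompt. The profile wording below echoes the examples above; the extension mapping and fallback are assumptions.

```typescript
// Language-specific analysis profiles, keyed by file extension.
// Profile text mirrors the article's examples; extensions are assumptions.
const PROFILES: Record<string, string> = {
  ".py": "Check for type hints and docstring coverage.",
  ".go": "Check for proper error handling (no ignored error returns).",
  ".ts": "Check that TypeScript strict mode conventions are followed.",
  ".java": "Check for null-safety and resource handling.",
};

function profileFor(filePath: string): string {
  const dot = filePath.lastIndexOf(".");
  const ext = dot >= 0 ? filePath.slice(dot) : "";
  // Fallback for files with no language-specific profile.
  return PROFILES[ext] ?? "Apply general code-quality checks.";
}
```

Adding a language then means adding one entry to the map, which is exactly the "update the configuration" extensibility described next.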
This is where headless automation really shines. Once you've built the infrastructure, extending it is comparatively cheap. Adding a second language? Update the configuration. Adding a new type of analysis? Write a new agent profile. The underlying system stays the same. You're not rebuilding from scratch each time.
Team Adoption: Scaling Headless Automation Across Your Organization
Rolling out headless automation isn't just a technical problem—it's an organizational one. Here's how to do it successfully:
Phase 1: Proof of Concept (Week 1-2) Build one simple automation: maybe a GitHub webhook that reviews PRs, or a nightly cron job that checks for security issues. Get it working. Measure the value. Show it to the team. Real results are the best selling point.
Phase 2: Document and Standardize (Week 3-4) Write down how this automation works. Create templates for building new automations. Set up shared libraries for retry logic, logging, error handling. You want developers to say "building automation is easy because we have patterns and libraries." Not "it's complicated, I need to hire a specialist."
Phase 3: Expand Systematically (Month 2+) Once you've got one automation working, add another. Maybe a test generation system. Then a documentation builder. Then an architecture validator. Each one adds to your automation infrastructure.
Phase 4: Monitor and Refine Track which automations are valuable. Some will pay off immediately (PR review automation). Others might take longer to prove value (performance optimization suggestions). Be patient with slow winners, but kill automations that aren't delivering.
Critical Success Factors:
- Start small: Don't try to automate everything at once. One focused automation beats ten half-baked ones.
- Measure everything: You can't improve what you don't measure. Track success rate, cost, time saved, user satisfaction.
- Build feedback loops: Let developers tell you if automation is helping or hurting. Iterate based on feedback.
- Keep humans in the loop: Automation should augment human judgment, not replace it. Your senior engineers should review Claude's work, not blindly trust it.
- Invest in observability: When something goes wrong, you need to see it immediately. Set up alerts for automation failures.
Production Patterns: Running Safely at Scale
When you move from experimental to production automation, a few patterns become critical.
Pattern 1: Circuit Breakers If your Claude Code automation starts failing repeatedly, stop trying. A circuit breaker pattern catches this: track failure rate, and if it exceeds a threshold, stop calling the API and alert the team. Don't silently fail 100 times in a row.
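A minimal circuit breaker for this looks something like the sketch below: count consecutive failures, open after a threshold, and allow a trial call once a cooldown elapses. The injectable clock is there purely to make the policy testable.

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures, open the
// circuit for `cooldownMs` and refuse further calls until it elapses.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private threshold = 5,
    private cooldownMs = 60_000,
    private now: () => number = Date.now,
  ) {}

  canCall(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: half-open, allow a trial call.
      this.openedAt = null;
      this.failures = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```

The alerting side ("tell the team the circuit opened") would hook into whatever `recordFailure` trips.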
Pattern 2: Graceful Degradation If Claude Code automation fails, can the system continue with reduced functionality? If your PR automation fails, maybe you fall back to a simpler check (just counting test coverage without analyzing code quality). Don't let one automation failure break everything.
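The degradation policy itself is tiny once the two analyzers are separated out. In this sketch both are passed in as functions so the policy stays testable; the names are illustrative.

```typescript
// Run the full analysis; on failure, degrade to a cheap local check
// instead of failing the whole pipeline. Names are illustrative.
type Review = { source: "claude" | "fallback"; findings: string[] };

function reviewWithFallback(
  fullAnalysis: () => string[],
  simpleCheck: () => string[],
): Review {
  try {
    return { source: "claude", findings: fullAnalysis() };
  } catch {
    // Degraded mode: e.g. just flag files that lack a matching test file.
    return { source: "fallback", findings: simpleCheck() };
  }
}
```

Tagging the result with its `source` lets the PR comment say "full analysis" vs "basic checks only," so reviewers know which mode produced it.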
Pattern 3: Rate Limiting Don't overwhelm the API. If you've got 100 PRs and all of them trigger Claude Code automation simultaneously, you've got a problem. Use queues and rate limiting. Process PRs at a steady rate, not in a spike.
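A sliding-window limiter is one simple way to enforce that steady rate: jobs that don't get a slot stay queued for the next tick. The injected clock is only for testability.

```typescript
// Sliding-window rate limiter: allow at most `limit` calls per `windowMs`.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now,
  ) {}

  tryAcquire(): boolean {
    const t = this.now();
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(t);
    return true;
  }
}
```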
Pattern 4: Audit Trails Log everything: what was analyzed, what was found, what was actioned, when, by whom. When something goes wrong in production, you need to understand what happened. Audit trails are your debugging tool.
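A useful audit record is structured, not free text, so you can query it during an incident. This sketch keeps records in memory; in production they'd go to durable storage. The field names are assumptions.

```typescript
// Append-only audit log: one structured record per automation action.
// In production these would go to durable storage, not an in-memory array.
interface AuditRecord {
  at: string;      // ISO timestamp
  actor: string;   // "pr-analyzer", "nightly-cron", ...
  action: string;  // "analyzed", "commented", "skipped", ...
  subject: string; // e.g. "org/repo#123"
  detail?: string;
}

class AuditTrail {
  private records: AuditRecord[] = [];

  log(entry: Omit<AuditRecord, "at">): void {
    this.records.push({ at: new Date().toISOString(), ...entry });
  }

  // Answer "what happened to this PR?" during an incident.
  forSubject(subject: string): AuditRecord[] {
    return this.records.filter((r) => r.subject === subject);
  }
}
```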
Cost Management at Scale
Headless automation at scale can get expensive if you're not careful. Here are strategies to keep costs reasonable:
Strategy 1: Right-Size Your Prompts Every character costs money. Don't send the entire file history when you only need the latest version. Don't include 100 examples when 3 examples make the point. Lean prompts are cheap prompts.
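One mechanical way to enforce lean prompts is a token budget: estimate size (the ~4 characters per token figure is a common rule of thumb, not an exact count) and drop the oldest context sections until the prompt fits.

```typescript
// Rough right-sizing: estimate tokens (~4 chars/token, an approximation)
// and drop the oldest context sections until under budget.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function trimToBudget(sections: string[], maxTokens: number): string[] {
  const kept = [...sections];
  // Drop from the front (oldest context) until the estimate fits,
  // always keeping at least the most recent section.
  while (
    kept.length > 1 &&
    kept.reduce((sum, s) => sum + estimateTokens(s), 0) > maxTokens
  ) {
    kept.shift();
  }
  return kept;
}
```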
Strategy 2: Cache Aggressively If you're analyzing the same file multiple times in a day, cache the results. If you're applying the same ruleset to multiple PRs, cache the ruleset. Caching can reduce API calls by 50-80%.
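A practical cache key for this is a hash of the ruleset plus the file content: if neither changed, the stored result is still valid and the API call is skipped entirely. The class shape below is a sketch, not a library API.

```typescript
import { createHash } from "node:crypto";

// Cache analysis results keyed by a hash of (ruleset, file content).
// Re-analyzing an unchanged file becomes a map lookup, not an API call.
class AnalysisCache {
  private store = new Map<string, string>();
  hits = 0;
  misses = 0;

  private key(ruleset: string, content: string): string {
    return createHash("sha256")
      .update(ruleset)
      .update("\0") // separator so ruleset/content boundaries can't collide
      .update(content)
      .digest("hex");
  }

  get(ruleset: string, content: string): string | undefined {
    const hit = this.store.get(this.key(ruleset, content));
    if (hit === undefined) this.misses++;
    else this.hits++;
    return hit;
  }

  set(ruleset: string, content: string, result: string): void {
    this.store.set(this.key(ruleset, content), result);
  }
}
```

Tracking `hits` and `misses` gives you the 50-80% reduction figure for your own workload instead of taking it on faith.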
Strategy 3: Batch Process When Possible Instead of analyzing one file at a time, batch 10 files together. One API call for 10 files is cheaper than 10 calls for 1 file each.
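The batching itself is a one-function job: split the file list into fixed-size groups, then make one API call per group instead of one per file.

```typescript
// Split a file list into batches so one API call covers several files.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With 25 files and a batch size of 10, you make 3 calls instead of 25.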
Strategy 4: Use Cheaper Models for Simple Tasks Not every task needs Claude 3.5 Sonnet. Use Haiku for simple classification. Use Sonnet for code review. Save Opus for genuinely complex architectural analysis. You'll cut costs 30-50% with smart model selection.
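Smart model selection can be as simple as a routing table keyed by task complexity. The model IDs below are assumptions—check Anthropic's current model list before relying on them.

```typescript
// Route tasks to models by complexity.
// Model IDs are assumptions -- verify against Anthropic's current docs.
type Complexity = "simple" | "standard" | "complex";

const MODEL_BY_COMPLEXITY: Record<Complexity, string> = {
  simple: "claude-3-5-haiku-latest",   // classification, triage
  standard: "claude-3-5-sonnet-latest", // code review
  complex: "claude-3-opus-latest",      // architectural analysis
};

function pickModel(task: Complexity): string {
  return MODEL_BY_COMPLEXITY[task];
}
```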
Summary
The Agent SDK gives you Claude Code as a headless service. No terminal. No interactive waiting. Just prompts in, results out, automation at scale.
Key takeaways:
- Headless execution is perfect for CI/CD, cron jobs, and webhooks—your system drives the interaction, not a human.
- Multi-turn conversations let you build complex, contextual automation—ask follow-ups, refine results, iterate toward the right answer.
- Structured results (JSON) integrate cleanly with your infrastructure—your systems talk to Claude Code like they talk to any API.
- Resilience matters: retry logic, timeout handling, error tracking turn fragile automation into reliable infrastructure.
- Logging is non-negotiable: audit trails, metrics, execution history give you debugging superpowers when things go wrong.
- Cost control: use caching, concurrency limits, batch processing wisely—automation at scale demands discipline.
- Avoid pitfalls: don't assume idempotence, handle partial failures, never log secrets—these mistakes wreck production systems.
Start simple—one headless task, one integration point—then scale up. The SDK handles the heavy lifting while you focus on building the automation that matters for your team. You're not just speeding up individual tasks; you're fundamentally changing how your organization delivers code. That's the power of headless automation.
What will you automate first?