April 23, 2025
Claude AI Development

Agent SDK Streaming Responses and Progress

You've probably experienced that frustrating moment when you're waiting for an AI system to respond, and there's just... nothing. No feedback. No indication of progress. Just a spinning wheel and the creeping anxiety that maybe something broke. With the Claude Code Agent SDK, you don't have to live in that uncertainty anymore.

The SDK gives you real-time visibility into exactly what Claude is doing—as it's happening. You can watch text generate character by character, see which tools Claude is calling and why, track when those tools return results, and know the precise moment a response is complete. More importantly, you can build this transparency directly into your applications so your users never feel left in the dark.

In this article, we're diving deep into how to work with streaming responses and progress indicators in the Agent SDK. By the end, you'll understand how to subscribe to events, build real-time progress bars, integrate streaming into web UIs, handle edge cases like cancellation and timeouts, and deploy complete streaming architectures in production. We'll explore the philosophy behind streaming, then progressively build more sophisticated implementations.

Table of Contents
  1. Why Streaming Matters More Than You Think
  2. The Hidden Cost of Non-Streaming Systems
  3. Understanding the Event Model
  4. Subscribing to Events with Event Handlers
  5. Building a Real-Time Progress Indicator
  6. Streaming to Web UIs with WebSockets
  7. Using Async Iterators for Streaming
  8. Cancellation and Timeout Patterns
  9. Handling Tool Calls and Results in Streaming
  10. Common Pitfalls and How to Avoid Them
  11. Integrating Streaming Into Your Architecture
  12. Advanced Streaming Patterns
  13. The Bigger Picture
  14. Summary
  15. Production Deployment Patterns for Streaming
  16. Streaming Large Responses: Memory and Performance
  17. Backpressure Handling in Real Systems
  18. Caching Streamed Responses
  19. Performance Optimization: Streaming for Efficiency at Scale
  20. Error Handling and Resilience in Streaming
  21. The Streaming Mindset

Why Streaming Matters More Than You Think

Here's the thing: streaming isn't just a nice-to-have feature. It fundamentally changes how users perceive your application and how you can build more responsive, trustworthy systems.

When you wait for a complete response before showing anything, a 5-second response feels agonizingly slow. It's an eternity in user time. But when you see text streaming in character by character, those same 5 seconds feel snappy and responsive. The difference isn't in the actual speed—it's in the feedback loop. Users feel in control when they can see progress. This is a fundamental UX principle that applies whether you're building AI assistants, file uploads, or any long-running operation.

Beyond perception, streaming enables some genuinely powerful capabilities:

Real-time debugging: Watch Claude's thoughts and tool calls unfold so you can catch issues immediately rather than discovering them at the end. If Claude is heading in the wrong direction, you can see it happening and potentially stop it before wasting computation.

Incremental value: Start displaying results to users before the entire response is ready. A streaming text response, even partially complete, provides value while waiting for tool results. You're not blocking on the final frame—you're showing value as soon as it's available.

Cancellation: If you can see what Claude is doing in real time, you can also cancel mid-stream if the direction feels wrong. This prevents wasted computation and gives users control. Waiting 30 seconds for a bad answer is frustrating. Stopping it after 5 seconds when you realize it's wrong is empowering.

Resource efficiency: Stream results to a database or file as they arrive rather than holding everything in memory. For long responses, this matters tremendously. A 100-page document doesn't need to live in RAM—stream it to disk as it generates.

Tool call visibility: Watch the exact moment Claude decides to use a tool, see the parameters it's sending, and get feedback when results come back—all without waiting for the complete agentic loop to finish. This transparency builds confidence in the system.

The Claude Code Agent SDK is built with streaming as a first-class citizen. Let's see how to actually use it.

The Hidden Cost of Non-Streaming Systems

Most people think streaming is just about speed perception. Show the user text faster, they feel like the system is faster. That's true, but it's only half the story. The deeper issue is cognitive load during uncertainty.

When a user sends a request to an AI system and then waits for a response, they enter a state of uncertainty. Did the request go through? Is the system working? Is it stuck? Is something broken? Humans have poor intuition for how long computation should take, so we assume the worst. Every second of silence feels like an eternity. Decades of UX research show that unexplained waits feel far longer than they actually are; a user staring at a blank screen perceives the delay much more acutely than one watching visible progress.

This has real business implications. In customer support scenarios, a user asking an AI assistant for help feels increasingly frustrated with each silent second. They'll refresh the page. They'll abandon the support interaction. They'll leave a negative review. A system that shows progress—text streaming in, indicators moving, visible work—keeps users engaged and patient. The same system without streaming drives them away.

For teams building internal tools, non-streaming responses create different problems. An engineering team waiting for Claude to analyze a codebase for security issues doesn't get progress feedback. They don't know if it's examining configuration files or running static analysis or analyzing dependencies. They can't tell if it's on track or stuck. This creates anxiety and reduces trust.

With streaming, the team sees the work happening. "Scanning TypeScript files..." "Analyzing dependencies..." "Checking authentication patterns..." The visibility builds confidence. The team understands what Claude is doing and can even interrupt if they realize it's going in the wrong direction. That control is psychologically powerful.

Understanding the Event Model

The Agent SDK doesn't force you into a streaming paradigm. You can still wait for complete responses if that's what you need. But under the hood, the SDK is emitting events as Claude works. Your job is to listen to those events and react to them.

Think of it like this: instead of a single "give me the full response when you're done" call, the SDK gives you a subscription model. You can subscribe to different types of events, and the SDK will notify you as each one occurs. This is the reactive programming pattern applied to AI generation.

Here are the event types the SDK emits:

text_delta: A chunk of text has been generated. This happens continuously as Claude writes. In typical usage, you'll get dozens or hundreds of these events as a response generates.

tool_use: Claude has decided to use a tool and is providing the parameters. This is where you learn what external resource Claude wants to access. This event fires before the tool actually executes—it's Claude saying "I'm about to do this."

tool_result: A tool has returned a result, and Claude is about to process it to continue the response. This is your signal that the tool execution completed and Claude can resume thinking.

message_complete: The entire message is finished. All text has been generated, all tool calls have been made and processed, and Claude is done. This is your cue to finalize the response and clean up resources.

These events come through in order, and they happen in real time. This is the hook you need to build progress indicators, real-time UIs, and responsive feedback loops. Understanding the event model is foundational—everything else builds on it.
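The four event types can be sketched as a TypeScript discriminated union. The exact field names (text, toolName, parameters, result) are assumptions inferred from the handler examples later in this article, not the SDK's published types:

```typescript
// A sketch of the SDK's event stream as a discriminated union.
// Field names are assumptions based on this article's handler examples.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; toolName: string; parameters: unknown }
  | { type: "tool_result"; toolName: string; result: unknown }
  | { type: "message_complete"; text: string };

// Fold an ordered stream of events into the final text,
// the way a consumer would as deltas arrive.
function accumulateText(events: StreamEvent[]): string {
  let text = "";
  for (const event of events) {
    if (event.type === "text_delta") {
      text += event.text;
    }
  }
  return text;
}

const sample: StreamEvent[] = [
  { type: "text_delta", text: "The weather " },
  { type: "tool_use", toolName: "weather", parameters: { city: "SF" } },
  { type: "tool_result", toolName: "weather", result: { condition: "sunny" } },
  { type: "text_delta", text: "is sunny." },
  { type: "message_complete", text: "The weather is sunny." },
];

console.log(accumulateText(sample)); // "The weather is sunny."
```

Modeling events as a union like this lets the compiler verify that every switch over event.type handles each case.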

Subscribing to Events with Event Handlers

The simplest way to consume streaming events is through event handlers. When you create an Agent SDK session and ask it to process a request, you can attach listeners to handle each event type as it arrives.

Here's the basic pattern:

typescript
import { AgentSDK } from "@anthropic-ai/agent-sdk";
 
const sdk = new AgentSDK({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
 
// Create a session
const session = await sdk.createSession();
 
// Set up event handlers
session.on("text_delta", (event) => {
  console.log("Text chunk:", event.text);
});
 
session.on("tool_use", (event) => {
  console.log("Tool called:", event.toolName);
  console.log("Parameters:", event.parameters);
});
 
session.on("tool_result", (event) => {
  console.log("Tool completed:", event.toolName);
  console.log("Result:", event.result);
});
 
session.on("message_complete", (event) => {
  console.log("Done! Full message:", event.text);
});
 
// Trigger a response
await session.process("What's the weather in San Francisco?");

In this setup, every time Claude generates a piece of text, the text_delta handler fires. When Claude calls a tool, tool_use fires with the tool name and parameters. When that tool returns a result, tool_result fires. And finally, message_complete tells you the entire response is done.

This event-driven model is powerful because it decouples the response generation from the consumption. Claude's work happens independently; you subscribe to the events you care about. You could have multiple subscribers listening to the same stream. You could log to one place, render to the UI elsewhere, and write to a database simultaneously—all from the same event stream.
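Because the stream is just events, fanning out to multiple consumers is trivial. Here's a minimal sketch using Node's built-in EventEmitter to stand in for a session (in real code the session itself is the emitter), with one subscriber rendering incrementally and another accumulating text for storage:

```typescript
import { EventEmitter } from "node:events";

// Stand-in for an SDK session; any emitter-compatible source works the same way.
const stream = new EventEmitter();

// Subscriber 1: render each chunk as it arrives.
const rendered: string[] = [];
stream.on("text_delta", (event: { text: string }) => {
  rendered.push(event.text);
});

// Subscriber 2: accumulate the full text for a database write at the end.
let fullText = "";
stream.on("text_delta", (event: { text: string }) => {
  fullText += event.text;
});
stream.on("message_complete", () => {
  // e.g. await db.saveResponse(fullText) in real code
  console.log(`Persisting ${fullText.length} chars`);
});

// Simulate the SDK emitting events in order.
stream.emit("text_delta", { text: "Hello, " });
stream.emit("text_delta", { text: "world." });
stream.emit("message_complete", {});
```

Each subscriber keeps its own state; neither knows the other exists, which is exactly the decoupling the event model buys you.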

Let's look at a more practical example. Imagine you're building a search-powered chatbot. When the user asks a question, Claude might decide to search for information, process the results, and then write an answer. With streaming events, you can see exactly when each step happens:

typescript
let currentToolName = "";
let fullText = "";
 
session.on("text_delta", (event) => {
  fullText += event.text;
  process.stdout.write(event.text); // Real-time output
});
 
session.on("tool_use", (event) => {
  currentToolName = event.toolName;
  console.log(`\n🔧 Calling ${event.toolName}...`);
});
 
session.on("tool_result", (event) => {
  console.log(`✓ ${currentToolName} returned results`);
});
 
session.on("message_complete", (event) => {
  console.log(`\n✨ Complete! Total response: ${fullText.length} characters`);
});
 
await session.process(
  "Find me information about the latest AI research trends",
);

Notice how we're building state as events arrive. We track the current tool name, accumulate text, and print status messages. This gives the user real-time feedback without forcing them to wait for the entire response.

Building a Real-Time Progress Indicator

Now let's build something users can actually see: a progress indicator that shows which tool is executing and how much text has been generated so far. Progress indicators are crucial for longer operations where users would otherwise feel abandoned.

If Claude needs to call multiple tools in sequence, the user should see exactly where you are in that pipeline. Here's a practical implementation:

typescript
interface ProgressState {
  textLength: number;
  toolsExecuted: string[];
  currentTool: string | null;
  startTime: number;
  estimatedCompletion: number | null;
}
 
class ProgressTracker {
  private state: ProgressState = {
    textLength: 0,
    toolsExecuted: [],
    currentTool: null,
    startTime: Date.now(),
    estimatedCompletion: null,
  };
 
  // Track text generation
  onTextDelta(text: string): void {
    this.state.textLength += text.length;
    this.printProgress();
  }
 
  // Track tool execution start
  onToolUse(toolName: string): void {
    this.state.currentTool = toolName;
    console.log(`\n📡 Starting: ${toolName}`);
  }
 
  // Track tool execution complete
  onToolResult(toolName: string): void {
    this.state.toolsExecuted.push(toolName);
    this.state.currentTool = null;
    console.log(`✓ Completed: ${toolName}`);
  }
 
  // Track completion
  onComplete(): void {
    const elapsed = Date.now() - this.state.startTime;
    console.log(`\n🎯 Done in ${elapsed}ms`);
    console.log(`   Generated ${this.state.textLength} characters`);
    console.log(`   Executed ${this.state.toolsExecuted.length} tools`);
    console.log(`   Tools: ${this.state.toolsExecuted.join(", ")}`);
  }
 
  private printProgress(): void {
    const elapsed = Date.now() - this.state.startTime;
    const rate = this.state.textLength / (elapsed / 1000);
 
    let status = `\r📝 ${this.state.textLength} chars`;
 
    if (this.state.currentTool) {
      status += ` | 🔧 ${this.state.currentTool}`;
    }
 
    if (rate > 0) {
      status += ` | ${rate.toFixed(0)} chars/sec`;
    }
 
    process.stdout.write(status);
  }
 
  getState(): ProgressState {
    return { ...this.state };
  }
}
 
// Usage
const tracker = new ProgressTracker();
 
session.on("text_delta", (event) => {
  tracker.onTextDelta(event.text);
});
 
session.on("tool_use", (event) => {
  tracker.onToolUse(event.toolName);
});
 
session.on("tool_result", (event) => {
  tracker.onToolResult(event.toolName);
});
 
session.on("message_complete", () => {
  tracker.onComplete();
});
 
await session.process("Analyze the market trends for tech stocks");

This ProgressTracker gives you a real-time view into what's happening. It shows:

  • How much text has been generated so far
  • Which tool is currently executing
  • Generation speed in characters per second
  • Total execution time and summary at the end

When you run this, you'll see output like:

📝 254 chars | 🔧 search_market_data | 128 chars/sec
✓ Completed: search_market_data
📝 512 chars | 🔧 analyze_data | 256 chars/sec
✓ Completed: analyze_data
📝 1024 chars | 96 chars/sec

🎯 Done in 3421ms
   Generated 1024 characters
   Executed 2 tools
   Tools: search_market_data, analyze_data

This is infinitely better than a blank screen. You've transformed a black-box operation into a transparent, observable process.

Streaming to Web UIs with WebSockets

One of the most powerful uses of streaming is pushing real-time updates to a web UI via WebSocket. Instead of your server waiting for the entire response and then sending it, you stream updates as they arrive, and the browser displays them live. This is where streaming truly shines for user experience.

Here's how you'd set this up:

typescript
import { WebSocketServer, WebSocket } from "ws";
import { AgentSDK } from "@anthropic-ai/agent-sdk";
 
const wss = new WebSocketServer({ port: 8080 });
const sdk = new AgentSDK({
  apiKey: process.env.ANTHROPIC_API_KEY,
});
 
wss.on("connection", async (ws: WebSocket) => {
  console.log("Client connected");
 
  ws.on("message", async (message: string) => {
    try {
      const userQuery = JSON.parse(message).query;
 
      // Create session for this user
      const session = await sdk.createSession();
 
      // Stream events to the connected client
      session.on("text_delta", (event) => {
        ws.send(
          JSON.stringify({
            type: "text_delta",
            text: event.text,
          }),
        );
      });
 
      session.on("tool_use", (event) => {
        ws.send(
          JSON.stringify({
            type: "tool_use",
            tool: event.toolName,
            parameters: event.parameters,
          }),
        );
      });
 
      session.on("tool_result", (event) => {
        ws.send(
          JSON.stringify({
            type: "tool_result",
            tool: event.toolName,
            result: event.result,
          }),
        );
      });
 
      session.on("message_complete", (event) => {
        ws.send(
          JSON.stringify({
            type: "message_complete",
            finalText: event.text,
          }),
        );
      });
 
      // Process the user's query
      await session.process(userQuery);
    } catch (error) {
      ws.send(
        JSON.stringify({
          type: "error",
          message: error.message,
        }),
      );
    }
  });
 
  ws.on("close", () => {
    console.log("Client disconnected");
  });
});

On the browser side, you'd consume these events and update the UI in real time:

typescript
// Client-side TypeScript
class ChatUI {
  private ws: WebSocket;
  private textElement: HTMLElement;
  private statusElement: HTMLElement;
 
  constructor() {
    this.ws = new WebSocket("ws://localhost:8080");
    this.textElement = document.getElementById("response")!;
    this.statusElement = document.getElementById("status")!;
 
    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
 
      switch (message.type) {
        case "text_delta":
          // Stream text in real time
          this.textElement.textContent += message.text;
          break;
 
        case "tool_use":
          // Show which tool is running
          this.statusElement.textContent = `🔧 Running ${message.tool}...`;
          break;
 
        case "tool_result":
          // Confirm tool completed
          this.statusElement.textContent = `✓ ${message.tool} done`;
          break;
 
        case "message_complete":
          // Mark as done
          this.statusElement.textContent = "✨ Complete";
          break;
 
        case "error":
          this.statusElement.textContent = `❌ Error: ${message.message}`;
          break;
      }
    };
  }
 
  sendQuery(query: string): void {
    this.textElement.textContent = "";
    this.statusElement.textContent = "Thinking...";
    this.ws.send(JSON.stringify({ query }));
  }
}
 
// Initialize when DOM is ready
document.addEventListener("DOMContentLoaded", () => {
  const ui = new ChatUI();
  document.getElementById("send-btn")!.addEventListener("click", () => {
    const input = document.getElementById("query") as HTMLInputElement;
    ui.sendQuery(input.value);
    input.value = "";
  });
});

The result: users see text appearing character by character, watch tool execution in real time, and get immediate feedback when tools complete. No spinners, no waiting—just pure streaming feedback. This is the difference between "I'm not sure if this is working" and "I can see exactly what's happening."

Using Async Iterators for Streaming

If you prefer a more functional approach, the Agent SDK also supports async iterators. Instead of event handlers, you can iterate through events as they arrive:

typescript
const session = await sdk.createSession();
 
// Get an async iterator for all events
for await (const event of session.stream("Solve this problem")) {
  switch (event.type) {
    case "text_delta":
      process.stdout.write(event.text);
      break;
 
    case "tool_use":
      console.log(`\nCalling ${event.toolName}...`);
      break;
 
    case "tool_result":
      console.log(`Got result from ${event.toolName}`);
      break;
 
    case "message_complete":
      console.log("\nDone!");
      break;
  }
}

This approach is cleaner if you're building sequential logic that depends on event order. The iterator guarantees you get events in the exact order they occurred, so you can reliably compose operations. It's also more idiomatic JavaScript/TypeScript.

A key advantage: async iterators work naturally with async/await, so error handling is more intuitive:

typescript
try {
  for await (const event of session.stream(userQuery)) {
    // Process event
  }
} catch (error) {
  console.error("Stream interrupted:", error);
}

You can combine this with cancellation (which we'll cover next) for graceful error handling. Async iterators are particularly useful when you're chaining multiple async operations or need to integrate with modern async frameworks.

Cancellation and Timeout Patterns

Here's a scenario you'll definitely encounter: the user wants to stop waiting for a response mid-stream. Maybe Claude is taking too long, or the user realizes they asked the wrong question. How do you handle this gracefully?

The Agent SDK supports cancellation via AbortController, a standard browser and Node.js API for signaling cancellation:

typescript
const controller = new AbortController();
 
// Set a timeout—cancel after 30 seconds
const timeoutId = setTimeout(() => {
  controller.abort();
}, 30000);
 
try {
  const session = await sdk.createSession();
 
  for await (const event of session.stream(userQuery, {
    signal: controller.signal,
  })) {
    switch (event.type) {
      case "text_delta":
        process.stdout.write(event.text);
        break;
      // ... other cases
    }
  }
} catch (error) {
  if (error.name === "AbortError") {
    console.log("Stream was cancelled");
  } else {
    throw error;
  }
} finally {
  clearTimeout(timeoutId); // Always clear the timer, even if the stream errored
}

But timeouts are just one cancellation pattern. You might also want to let users manually stop a response:

typescript
class InterruptibleStream {
  private controller: AbortController = new AbortController();
  private session: any;
 
  async start(query: string, sdk: AgentSDK): Promise<void> {
    this.session = await sdk.createSession();
 
    try {
      for await (const event of this.session.stream(query, {
        signal: this.controller.signal,
      })) {
        // Process event
      }
    } catch (error) {
      if (error.name === "AbortError") {
        console.log("User cancelled");
      }
    }
  }
 
  // Allow external code to stop the stream
  cancel(): void {
    this.controller.abort();
  }
}
 
// Usage: wire up a "stop" button
const stream = new InterruptibleStream();
 
startButton.addEventListener("click", () => {
  stream.start(userQuery, sdk);
});
 
stopButton.addEventListener("click", () => {
  stream.cancel();
});

In a web context, you could emit a cancellation signal back to your server:

typescript
// Server endpoint for cancellation
app.post("/api/cancel-stream/:sessionId", (req, res) => {
  const { sessionId } = req.params;
  // Look up the session's AbortController and call abort()
  sessions.get(sessionId)?.controller.abort();
  res.json({ cancelled: true });
});
 
// Client-side cancel button
cancelButton.addEventListener("click", async () => {
  await fetch(`/api/cancel-stream/${sessionId}`, { method: "POST" });
});

One pitfall to watch: don't ignore AbortError exceptions. They're not bugs—they're the expected way cancellation works. If you ignore them, you might not properly clean up resources:

typescript
// ❌ DON'T: Swallow abort errors
try {
  for await (const event of session.stream(query, { signal })) {
    // ...
  }
} catch (error) {
  // Ignoring all errors, including cancellation
}
 
// ✅ DO: Handle abort errors specifically
try {
  for await (const event of session.stream(query, { signal })) {
    // ...
  }
} catch (error) {
  if (error.name === "AbortError") {
    console.log("Stream cancelled by user");
    // Clean up resources
  } else {
    // Unexpected error
    throw error;
  }
}

Handling Tool Calls and Results in Streaming

Tools are where things get interesting. When Claude wants to use a tool, you get a tool_use event. But here's the critical part: you have to execute the tool call yourself and send the result back with provideToolResult, which is what produces the tool_result event and lets Claude continue.

This is different from the non-streaming path where the SDK might handle some tools automatically. In streaming, you're responsible for the entire loop. This gives you fine-grained control over tool execution.

Here's what that looks like:

typescript
interface Tool {
  name: string;
  description: string;
  execute: (parameters: any) => Promise<any>;
}
 
// Register tools
const tools: Map<string, Tool> = new Map([
  [
    "search",
    {
      name: "search",
      description: "Search the web",
      execute: async (params) => {
        // Simulate web search
        return { results: ["Result 1", "Result 2"] };
      },
    },
  ],
  [
    "calculate",
    {
      name: "calculate",
      description: "Perform calculations",
      execute: async (params) => {
        // Demo only: eval on untrusted input is a code-injection risk.
        // Use a proper expression parser in production.
        return { result: eval(params.expression) };
      },
    },
  ],
]);
 
// Process with streaming
const session = await sdk.createSession();
 
session.on("tool_use", async (event) => {
  const tool = tools.get(event.toolName);
 
  if (!tool) {
    console.error(`Tool not found: ${event.toolName}`);
    return;
  }
 
  try {
    console.log(`Executing ${event.toolName}...`);
    const result = await tool.execute(event.parameters);
 
    // Send result back to Claude
    await session.provideToolResult({
      toolName: event.toolName,
      result: result,
    });
  } catch (error) {
    // Send error back to Claude
    await session.provideToolResult({
      toolName: event.toolName,
      error: error.message,
    });
  }
});
 
session.on("text_delta", (event) => {
  process.stdout.write(event.text);
});
 
session.on("message_complete", () => {
  console.log("\nDone!");
});
 
await session.process("Search for information about quantum computing");

The key insight: tool execution happens inline with streaming. Claude generates some text, decides to use a tool, you execute it and send results back, then Claude continues generating. All of this happens in real time. There's no waiting for everything to buffer up.

This enables some powerful patterns. For example, you can show tool results to the user immediately:

typescript
let pendingTools: Map<string, string> = new Map();
 
session.on("tool_use", async (event) => {
  // Show user we're calling a tool
  const toolId = `tool-${Date.now()}`;
  pendingTools.set(toolId, event.toolName);
 
  ws.send(
    JSON.stringify({
      type: "tool_start",
      id: toolId,
      name: event.toolName,
      parameters: event.parameters,
    }),
  );
 
  try {
    const result = await tools.get(event.toolName)?.execute(event.parameters);
 
    ws.send(
      JSON.stringify({
        type: "tool_result",
        id: toolId,
        result: result,
      }),
    );
 
    await session.provideToolResult({
      toolName: event.toolName,
      result: result,
    });
  } catch (error) {
    ws.send(
      JSON.stringify({
        type: "tool_error",
        id: toolId,
        error: error.message,
      }),
    );
 
    await session.provideToolResult({
      toolName: event.toolName,
      error: error.message,
    });
  }
 
  pendingTools.delete(toolId);
});

On the client side, you can display tool execution in a timeline:

typescript
// Client-side
let toolTimeline: Array<any> = [];
 
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
 
  if (message.type === "tool_start") {
    toolTimeline.push({
      id: message.id,
      name: message.name,
      status: "running",
      startTime: Date.now(),
    });
    renderToolTimeline();
  }
 
  if (message.type === "tool_result") {
    const tool = toolTimeline.find((t) => t.id === message.id);
    if (tool) {
      tool.status = "completed";
      tool.duration = Date.now() - tool.startTime;
      tool.result = message.result;
      renderToolTimeline();
    }
  }
 
  if (message.type === "tool_error") {
    const tool = toolTimeline.find((t) => t.id === message.id);
    if (tool) {
      tool.status = "error";
      tool.error = message.error;
      renderToolTimeline();
    }
  }
};
 
function renderToolTimeline() {
  const html = toolTimeline
    .map((tool) => {
      let icon = "⏳";
      if (tool.status === "completed") icon = "✓";
      if (tool.status === "error") icon = "✗";
 
      return `
        <div class="tool-item">
          <span class="icon">${icon}</span>
          <span class="name">${tool.name}</span>
          ${tool.duration ? `<span class="duration">${tool.duration}ms</span>` : ""}
          ${tool.error ? `<span class="error">${tool.error}</span>` : ""}
        </div>
      `;
    })
    .join("");
 
  document.getElementById("tool-timeline").innerHTML = html;
}

This pattern—showing tool execution in real time—creates a powerful debugging experience. Users can see exactly what Claude is doing and why. If a tool call fails, they see it immediately. If a tool returns unexpected results, they see that too. This transparency is invaluable for building trust in AI systems.

Common Pitfalls and How to Avoid Them

You're going to hit some snags as you work with streaming. Let me save you some debugging time.

Pitfall #1: Forgetting to handle tool results

When you get a tool_use event, you MUST call provideToolResult before Claude can continue. If you don't, the session hangs forever.

typescript
// ❌ Wrong: Get tool event but don't send result
session.on("tool_use", (event) => {
  console.log("Tool:", event.toolName);
  // Forgot to call provideToolResult!
});
 
// ✅ Right: Always send result (even if it's an error)
session.on("tool_use", async (event) => {
  try {
    const result = await executeTool(event);
    await session.provideToolResult({ toolName: event.toolName, result });
  } catch (error) {
    await session.provideToolResult({
      toolName: event.toolName,
      error: error.message,
    });
  }
});

Pitfall #2: Buffering entire responses when streaming is available

If you're building a progress indicator, don't wait for message_complete to show anything. Show text as it arrives:

typescript
// ❌ Wrong: Only show text at the end
let fullText = "";
session.on("text_delta", (event) => {
  fullText += event.text;
});
session.on("message_complete", () => {
  displayText(fullText); // All at once
});
 
// ✅ Right: Show as it arrives
session.on("text_delta", (event) => {
  appendTextToUI(event.text); // Incremental display
});

Pitfall #3: Not handling the AbortError exception

When you cancel a stream, it throws an AbortError. Left uncaught, that exception will crash your process; caught too generically, a deliberate cancellation gets misreported as a failure:

typescript
// ❌ Wrong: Generic catch loses the abort signal
try {
  for await (const event of session.stream(query, { signal })) {
    // Process
  }
} catch (error) {
  console.log("Error:", error.message); // Treats cancel like an error
}
 
// ✅ Right: Handle abort specifically
try {
  for await (const event of session.stream(query, { signal })) {
    // Process
  }
} catch (error) {
  if (error.name !== "AbortError") {
    throw error; // Re-throw real errors
  }
}

Pitfall #4: Race conditions with multiple concurrent streams

If you're running multiple sessions at once, be careful with shared state:

typescript
// ❌ Wrong: Shared state gets mixed up
let currentToolName = ""; // Shared across sessions!
 
for (const query of queries) {
  const session = await sdk.createSession();
  session.on("tool_use", (event) => {
    currentToolName = event.toolName; // Race condition!
  });
  // ...
}
 
// ✅ Right: Isolate state per session
for (const query of queries) {
  const session = await sdk.createSession();
  const state = { toolName: "" }; // Per-session state
 
  session.on("tool_use", (event) => {
    state.toolName = event.toolName;
  });
  // ...
}

Pitfall #5: Not disposing of resources on cancellation

When a stream is cancelled, you need to clean up resources that might have been allocated:

typescript
// ✅ RIGHT: Clean up on cancellation
try {
  for await (const event of session.stream(query, { signal })) {
    // Process event
  }
} catch (error) {
  if (error.name === "AbortError") {
    // Clean up resources
    await db.close();
    await tempFile.delete();
    console.log("Resources cleaned up after cancellation");
  } else {
    throw error;
  }
}

Integrating Streaming Into Your Architecture

So far we've looked at individual pieces. Let's put it together into a real architecture that you could actually deploy to production.

Here's a complete example: a streaming assistant service with progress tracking, tool execution, and WebSocket integration:

typescript
import { AgentSDK } from "@anthropic-ai/agent-sdk";
import { WebSocketServer } from "ws";
import express from "express";
 
// Tool implementations
const tools = {
  search: async (params: { query: string }) => {
    // Simulate web search
    return { results: ["Result 1", "Result 2"] };
  },
  weather: async (params: { location: string }) => {
    // Simulate weather API
    return { temp: 72, condition: "sunny" };
  },
};
 
// Session manager
class SessionManager {
  private sessions = new Map<string, any>();
 
  create(id: string, ws: any) {
    this.sessions.set(id, { ws, startTime: Date.now() });
  }
 
  get(id: string) {
    return this.sessions.get(id);
  }
 
  delete(id: string) {
    this.sessions.delete(id);
  }
}
 
const sessionManager = new SessionManager();
const sdk = new AgentSDK({ apiKey: process.env.ANTHROPIC_API_KEY });
 
// WebSocket server
const wss = new WebSocketServer({ port: 8080 });
 
wss.on("connection", async (ws) => {
  const sessionId = `session-${Date.now()}`;
  sessionManager.create(sessionId, ws);
 
  ws.on("message", async (message: string) => {
    try {
      const { query } = JSON.parse(message);
      const session = await sdk.createSession();
 
      // Track progress
      let textLength = 0;
      const toolsExecuted: string[] = [];
 
      session.on("text_delta", (event) => {
        textLength += event.text.length;
        ws.send(
          JSON.stringify({
            type: "text_delta",
            text: event.text,
            progress: {
              textLength,
              toolsExecuted,
            },
          }),
        );
      });
 
      session.on("tool_use", async (event) => {
        ws.send(
          JSON.stringify({
            type: "tool_start",
            tool: event.toolName,
            parameters: event.parameters,
          }),
        );
 
        // Execute tool
        const toolFn = tools[event.toolName as keyof typeof tools];
        if (!toolFn) {
          throw new Error(`Unknown tool: ${event.toolName}`);
        }
 
        try {
          const result = await toolFn(event.parameters);
          toolsExecuted.push(event.toolName);
 
          await session.provideToolResult({
            toolName: event.toolName,
            result,
          });
 
          ws.send(
            JSON.stringify({
              type: "tool_result",
              tool: event.toolName,
              result,
            }),
          );
        } catch (error) {
          await session.provideToolResult({
            toolName: event.toolName,
            error: (error as Error).message,
          });
 
          ws.send(
            JSON.stringify({
              type: "tool_error",
              tool: event.toolName,
              error: (error as Error).message,
            }),
          );
        }
      });
 
      session.on("message_complete", () => {
        ws.send(
          JSON.stringify({
            type: "complete",
            summary: {
              textLength,
              toolsExecuted,
              duration: Date.now() - sessionManager.get(sessionId)!.startTime,
            },
          }),
        );
      });
 
      await session.process(query);
    } catch (error) {
      ws.send(
        JSON.stringify({
          type: "error",
          message: (error as Error).message,
        }),
      );
    }
  });
 
  ws.on("close", () => {
    sessionManager.delete(sessionId);
  });
});
 
// Express server for static files
const app = express();
app.use(express.static("public"));
app.listen(3000, () => {
  console.log("Server on port 3000");
  console.log("WebSocket on port 8080");
});

And on the client, you'd consume this with something like:

typescript
class StreamingChat {
  private ws: WebSocket;
 
  constructor() {
    this.ws = new WebSocket("ws://localhost:8080");
    this.ws.onmessage = (event) => this.handleMessage(JSON.parse(event.data));
  }
 
  private handleMessage(msg: any) {
    switch (msg.type) {
      case "text_delta":
        document.getElementById("response")!.textContent += msg.text;
        this.updateProgress(msg.progress);
        break;
 
      case "tool_start":
        this.showTool(msg.tool, "running");
        break;
 
      case "tool_result":
        this.showTool(msg.tool, "completed");
        break;
 
      case "complete":
        this.showSummary(msg.summary);
        break;
    }
  }
 
  private updateProgress(progress: any) {
    const el = document.getElementById("stats")!;
    el.textContent = `${progress.textLength} chars | ${progress.toolsExecuted.length} tools`;
  }
 
  private showTool(name: string, status: string) {
    const el = document.getElementById("tools")!;
    const toolEl = document.createElement("div");
    toolEl.className = `tool tool-${status}`;
    toolEl.textContent = `${status === "running" ? "⏳" : "✓"} ${name}`;
    el.appendChild(toolEl);
  }
 
  private showSummary(summary: any) {
    console.log("Complete:", summary);
  }
 
  send(query: string) {
    this.ws.send(JSON.stringify({ query }));
  }
}

This architecture covers the core pieces end to end: it handles multiple concurrent users, tracks progress, executes tools, and streams everything to the client in real time. It's a solid foundation, though production deployment adds concerns we'll get to shortly.

Advanced Streaming Patterns

As you get more sophisticated, you'll want additional patterns. One useful pattern is adaptive backpressure—slowing down streaming if the client can't keep up:

typescript
session.on("text_delta", (event) => {
  // Only send if the WebSocket buffer isn't full
  if (ws.bufferedAmount < 65536) {
    // 64KB threshold
    ws.send(
      JSON.stringify({
        type: "text_delta",
        text: event.text,
      }),
    );
  } else {
    // Buffer full: a real implementation would queue this event and
    // flush it once the buffer drains, rather than dropping it
    console.log("WebSocket buffer full, applying backpressure");
  }
});

Another pattern is persistence—saving streamed responses as they arrive:

typescript
const fileStream = fs.createWriteStream("response.txt");
 
session.on("text_delta", (event) => {
  fileStream.write(event.text);
  ws.send(
    JSON.stringify({
      type: "text_delta",
      text: event.text,
    }),
  );
});
 
session.on("message_complete", () => {
  fileStream.end();
});

The Bigger Picture

Streaming responses and progress indicators aren't just nice UI features. They're fundamental to building AI systems that users trust. When you can see what Claude is doing—when you watch it call tools, get results, and continue thinking—you get confidence. You feel in control.

The Agent SDK gives you all the primitives you need: event handlers, async iterators, tool execution hooks, and cancellation support. The patterns we've covered—progress tracking, WebSocket integration, tool execution, cancellation—these are the building blocks you'll use again and again.

Start simple: add text_delta listeners to see streaming in action. Build from there. Maybe add a progress bar. Then integrate with WebSockets. Then add tool tracking. Each step builds on the previous one, and each one adds real value to your users.
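If the progress bar is your next step, a tiny pure helper keeps the rendering logic testable and separate from the event wiring. The function below is ours, not part of the SDK; it just turns the running counters from the earlier examples into a status line:

```typescript
// Formats a one-line status from streaming counters; pure and UI-agnostic,
// so the same string can go to a terminal or a DOM element.
function formatProgress(textLength: number, toolsExecuted: string[]): string {
  const tools = toolsExecuted.length > 0 ? toolsExecuted.join(", ") : "none";
  return `${textLength} chars | tools: ${tools}`;
}
```

Call it from your text_delta handler and write the result wherever your UI shows status.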

The key insight: streaming is a feature, but it's really about transparency. When users see what's happening, they become partners in the process rather than passive consumers waiting for a result. That changes everything.

Summary

You now understand how to work with streaming responses and progress in the Agent SDK. We covered:

  • The event model: How the SDK emits text_delta, tool_use, tool_result, and message_complete events
  • Event handlers and iterators: Two ways to subscribe to events
  • Progress tracking: Building real-time indicators that show generation speed and tool execution
  • WebSocket integration: Streaming events to a browser UI for live updates
  • Tool execution: Handling tool calls and providing results back to Claude
  • Cancellation patterns: Using AbortController for timeouts and user-initiated stops
  • Real-world architecture: A complete implementation with all pieces working together
  • Common pitfalls: The mistakes you'll make and how to avoid them
  • Advanced patterns: Backpressure management, persistence, and more

The Agent SDK makes streaming a first-class experience, not an afterthought. Use that power to build systems that give your users real-time visibility into what Claude is doing. The transparency pays off in user satisfaction, trust, and ultimately, better outcomes.

Ready to start streaming? Pick one pattern from this article, implement it, and watch your users experience the difference immediate feedback makes.

Production Deployment Patterns for Streaming

Moving streaming from a local prototype to a production system requires thinking about scaling, reliability, and operational monitoring. The simple examples in this article work great for prototypes, but production adds complexity.

Connection pooling becomes essential. If you're running a streaming service that handles hundreds of concurrent users, each creating a WebSocket connection and spawning multiple Claude Code sessions, you need to manage resources carefully. Implement connection pooling to reuse sessions across requests:

typescript
class SessionPool {
  private pool: Map<string, AgentSDK> = new Map();
  private maxPoolSize: number = 50;
 
  async getSession(userId: string): Promise<AgentSDK> {
    // Reuse this user's pooled instance if one exists
    const existing = this.pool.get(userId);
    if (existing) {
      return existing;
    }
 
    const sdk = new AgentSDK({ apiKey: process.env.ANTHROPIC_API_KEY });
    // Pool the instance only while there's capacity; once the pool is
    // full, callers get an unpooled, throwaway instance instead
    if (this.pool.size < this.maxPoolSize) {
      this.pool.set(userId, sdk);
    }
    return sdk;
  }
 
  async cleanup(userId: string): Promise<void> {
    this.pool.delete(userId);
  }
}

This prevents resource exhaustion from creating unlimited SDK instances.
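One thing the pool above leaves out is eviction: once it's full, new users always fall through to unpooled throwaway instances. A minimal LRU policy can keep the pool bounded while still reusing recent entries. This is a hypothetical sketch of ours, relying on the fact that a JavaScript Map preserves insertion order:

```typescript
// Hypothetical LRU pool sketch: Map preserves insertion order, so the first
// key is the least recently used; re-inserting on access refreshes recency.
class LruPool<T> {
  private entries = new Map<string, T>();

  constructor(
    private maxSize: number,
    private create: () => T,
  ) {}

  get(key: string): T {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.entries.delete(key); // refresh recency
      this.entries.set(key, existing);
      return existing;
    }
    if (this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest); // evict the least recently used entry
    }
    const fresh = this.create();
    this.entries.set(key, fresh);
    return fresh;
  }
}
```

You'd wrap the SDK constructor in the `create` callback; the same shape works for any expensive per-user resource.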

Error boundaries matter significantly. When you're streaming to many concurrent clients, errors in one stream shouldn't crash the entire server. Isolate error handling at the WebSocket level:

typescript
wss.on("connection", async (ws: WebSocket) => {
  const sessionId = `session-${Date.now()}`;
 
  ws.on("message", async (message: string) => {
    try {
      const { query } = JSON.parse(message);
      const session = await sdk.createSession();
 
      try {
        for await (const event of session.stream(query)) {
          ws.send(JSON.stringify(event));
        }
      } catch (streamError) {
        if ((streamError as Error).name !== "AbortError") {
          ws.send(
            JSON.stringify({
              type: "stream_error",
              message: (streamError as Error).message,
            }),
          );
        }
      }
    } catch (parseError) {
      ws.send(
        JSON.stringify({
          type: "protocol_error",
          message: "Invalid message format",
        }),
      );
    }
  });
});

This layered error handling ensures that malformed messages don't break the connection, and stream errors get properly communicated without crashing the server.

Monitoring and alerting look different for streaming. Traditional metrics like response time and error rate don't map cleanly onto long-lived streams. Instead, you care about:

  • Stream startup latency — Time from request to first text_delta event
  • Time to first tool call — How long before Claude decides to use a tool
  • Average characters per second — Generation speed
  • Premature disconnections — Clients who bail early (indicates slow performance)
  • Tool execution latency — Time from tool_use to tool_result
typescript
// Assumes a metrics client with gauge()/increment() and a sessionStartTime
// map recorded when each request first arrived
class StreamMetrics {
  recordStreamStart(sessionId: string): void {
    metrics.gauge(
      "stream.startup_time",
      Date.now() - sessionStartTime[sessionId],
    );
  }
 
  recordFirstToolCall(sessionId: string, toolName: string): void {
    metrics.increment("tool.first_call", { tool: toolName });
  }
 
  recordGenerationSpeed(
    sessionId: string,
    textLength: number,
    elapsedMs: number,
  ): void {
    const charsPerSecond = (textLength / elapsedMs) * 1000;
    metrics.gauge("stream.generation_speed", charsPerSecond);
  }
 
  recordDisconnection(sessionId: string, reason: string): void {
    metrics.increment("stream.disconnection", { reason });
  }
}

These metrics tell you whether your streaming implementation is healthy. Slow startup time means clients wait too long before seeing feedback. High disconnection rate means your streams are timing out. Generation speed tells you if Claude is actually streaming fast enough to feel responsive.

Streaming Large Responses: Memory and Performance

One hidden advantage of streaming is handling truly massive responses. Imagine generating a 100,000-word document or analyzing a massive dataset. Without streaming, you'd hold the entire response in memory before sending it to the client. With streaming, you can start sending characters immediately and never hold more than a few KB in memory at a time.

This matters more than you might think in practice. Many production systems work fine until they hit unexpected scale. A feature that handles 95th percentile usage gracefully will sometimes encounter the 99th percentile—the user asking for analysis of a million-row dataset, or a document generation request for a 500-page report. Non-streaming systems will crash on these edge cases. Streaming systems will just take longer without consuming more memory.

The architectural benefit compounds. Instead of designing your system for the maximum expected response size, you design it for the maximum reasonable streaming rate. That's almost always smaller, because data is consumed incrementally. A system designed to stream at 1MB/second can handle 100MB responses without memory pressure—it just takes 100 seconds.

But this requires thinking differently about how you consume and forward the streamed data. Instead of buffering:

typescript
// ❌ DON'T: This buffers the entire response
let fullResponse = "";
for await (const event of session.stream(query)) {
  if (event.type === "text_delta") {
    fullResponse += event.text; // Memory grows unbounded
  }
}
ws.send(JSON.stringify({ type: "complete", text: fullResponse }));

Stream directly:

typescript
// ✅ DO: This streams without buffering
for await (const event of session.stream(query)) {
  if (event.type === "text_delta") {
    ws.send(
      JSON.stringify({
        type: "text_delta",
        text: event.text,
      }),
    ); // Send immediately, forget about it
  }
}

The second approach handles 100-megabyte responses without memory issues. The first approach would allocate 100 megabytes just to hold the response string.

For file-based streaming (like generating documents), write directly to disk as data arrives:

typescript
import fs from "fs";
import { Readable } from "stream";
import { pipeline } from "stream/promises";
 
const writeStream = fs.createWriteStream("output.txt");
 
// Readable.from takes an (async) iterable, so invoke the generator function
const readable = Readable.from(
  (async function* () {
    for await (const event of session.stream(query)) {
      if (event.type === "text_delta") {
        yield event.text;
      }
    }
  })(),
);
 
await pipeline(readable, writeStream);

This pattern generates a 1GB document without ever holding more than a few KB in memory. That's the power of streaming done right.

Backpressure Handling in Real Systems

Here's a scenario: your client is slow. Maybe it's running in a browser with a slow connection, or the user's device is processing each text event heavily. But Claude Code is generating text at 100 characters per second. What happens?

Without backpressure handling, you queue up events on the server. The WebSocket sends buffer fills up. Memory grows. Eventually the server crashes. With backpressure handling, you slow down the stream to match the client's consumption rate:

typescript
class BackpressuredStream {
  // Pause reading once the socket's send buffer exceeds this threshold
  private maxBuffered: number = 65536; // 64KB
 
  async handleStream(
    session: any,
    query: string,
    ws: WebSocket,
  ): Promise<void> {
    for await (const event of session.stream(query)) {
      // Backpressure: stall the read loop until the client drains its
      // buffer. Because we stop pulling events, nothing is dropped and
      // no unbounded queue builds up on the server
      while (ws.bufferedAmount > this.maxBuffered) {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
      ws.send(JSON.stringify(event));
    }
  }
}

This doesn't drop events or ignore backpressure. It actually slows down the stream to match what the client can handle. The server never runs out of memory, and the client never feels overwhelmed.

Caching Streamed Responses

Sometimes you'll want to stream a response once, then cache it for subsequent requests. This is tricky with streaming because you need to capture the stream while also forwarding it to the client in real time:

typescript
class CachedStream {
  private cache: Map<string, string> = new Map();
 
  async streamWithCaching(
    session: any,
    query: string,
    ws: WebSocket,
  ): Promise<void> {
    const cacheKey = `query:${hash(query)}`; // hash: any stable digest, e.g. SHA-256
 
    // Check cache first
    if (this.cache.has(cacheKey)) {
      const cached = this.cache.get(cacheKey)!;
      ws.send(
        JSON.stringify({
          type: "cached",
          text: cached,
        }),
      );
      return;
    }
 
    // Not cached, stream and capture
    let fullText = "";
    for await (const event of session.stream(query)) {
      if (event.type === "text_delta") {
        fullText += event.text;
      }
      // Forward to client
      ws.send(JSON.stringify(event));
    }
 
    // Cache for next time
    this.cache.set(cacheKey, fullText);
  }
}

Now the first client that asks for a particular query pays the latency cost of generation. Subsequent clients get cached results delivered instantly. This pattern is particularly valuable for common queries that multiple users ask.

Performance Optimization: Streaming for Efficiency at Scale

When you deploy streaming systems in production, you're not just building for user experience—you're building for infrastructure efficiency. Streaming enables optimizations that aren't possible with batch processing.

Consider a scenario where you're generating a long document that takes 60 seconds. With batch processing, you hold everything in memory for the full 60 seconds, then deliver it all at once. With streaming, you write chunks to disk as they arrive, keeping memory usage constant regardless of response size. For a team generating hundreds of documents daily, this difference compounds into real cost savings.

Another efficiency win: streaming lets you implement adaptive quality. If a user closes the connection after getting 80% of the answer, you don't waste resources computing the remaining 20%. The system gracefully degrades. With batch processing, you'd have already computed everything and wasted the computation.
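A minimal sketch of that graceful degradation, under the assumption (ours) that the stream honors AbortSignal as in the cancellation patterns earlier: tie an AbortController to the socket's close event, so generation stops the moment the reader is gone.

```typescript
// Abort generation the moment the client disconnects, so no further
// tokens are produced for a response nobody will read. The ws parameter
// is any object with a ws-style on("close", ...) method.
function abortOnDisconnect(
  ws: { on(event: "close", cb: () => void): void },
  controller: AbortController,
): void {
  ws.on("close", () => controller.abort());
}
```

Pass `controller.signal` into the stream call and the for-await loop exits with an AbortError as soon as the socket closes.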

Error Handling and Resilience in Streaming

Streaming introduces complexity around error handling that batch processing avoids. When you've already sent half the response and an error occurs, you can't just retry from scratch. The user has already seen partial results.

A robust streaming system needs graceful degradation. If a tool call fails mid-stream, the system should continue with what it has rather than crashing. If network connectivity drops, the client should know it received a partial result, not a complete one. This requires explicit error boundaries around streaming operations.

Build this into your event handlers from the start. Track which events have arrived, and be explicit about what state the response is in. "Message is 75% complete with 2 tool calls pending" is better than "Message in progress—status unknown."
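One way to make that state explicit is a small tracker fed from the same event handlers we've used throughout. The class and method names here are ours, not the SDK's:

```typescript
// Tracks which events have arrived so a partial response can be described
// precisely on disconnect or error, instead of "status unknown".
type StreamPhase = "pending" | "streaming" | "complete" | "aborted";

class ResponseState {
  phase: StreamPhase = "pending";
  textLength = 0;
  pendingTools = new Set<string>();
  completedTools: string[] = [];

  onTextDelta(text: string): void {
    this.phase = "streaming";
    this.textLength += text.length;
  }
  onToolUse(toolName: string): void {
    this.pendingTools.add(toolName);
  }
  onToolResult(toolName: string): void {
    this.pendingTools.delete(toolName);
    this.completedTools.push(toolName);
  }
  onComplete(): void {
    this.phase = "complete";
  }
  onAbort(): void {
    this.phase = "aborted";
  }

  // Human-readable status the client can display for a partial result
  describe(): string {
    return (
      `${this.phase}: ${this.textLength} chars, ` +
      `${this.completedTools.length} tools done, ${this.pendingTools.size} pending`
    );
  }
}
```

On disconnect or error, send the `describe()` string to the client so a partial result is labeled as partial, not silently presented as complete.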

The Streaming Mindset

Streaming isn't just a technical feature—it's a different way of thinking about your system. Instead of "wait for work to complete, then show the result," it's "show progress as work happens." That shift affects architecture, UX, and user satisfaction.

The best streaming systems make the work visible. You don't just see text appearing—you see Claude's thinking, tool calls, results coming back. That transparency builds confidence. Users trust systems where they can see what's happening. They distrust black boxes. When your team deploys streaming, you're not just speeding things up—you're building institutional trust around AI systems. People understand what the system is doing. They can see failures happening in real time and can take action rather than discovering problems retroactively.

As you build streaming implementations, always ask: What can I show the user right now, before the work is complete? The answer to that question drives everything else.

