April 14, 2025
Claude AI Development

Agent SDK Multi-Turn Conversations

The Claude Code Agent SDK is built for sustained, stateful conversations—not fire-and-forget API calls. Whether you're orchestrating a multi-step workflow, building an interactive agent team, or resuming long-running tasks, understanding how to manage conversation state across multiple turns is fundamental to unlocking the SDK's full potential.

In this guide, we'll explore the patterns that make multi-turn conversations work: how to maintain context, inject information between turns, branch logic conditionally, persist sessions, and handle the tricky edge case of context overflow. You'll walk away with production-ready code and the mental models to design robust agent workflows.

Table of Contents
  1. Why Multi-Turn Conversations Matter
  2. Core Concepts: Session State and Context Management
  3. What the SDK Manages Automatically
  4. What You Control
  5. Pattern 1: Sequential Message Exchange with Context Carryover
  6. Pattern 2: Injecting Context Between Turns
  7. Pattern 3: Conditional Branching Based on Agent Decisions
  8. Pattern 4: Session Persistence and Resumption
  9. Pattern 5: Handling Context Overflow and Token Limits
  10. Pattern 6: Multi-Agent Orchestration with Shared Context
  11. Pattern 7: Error Recovery and Retry Logic
  12. Pattern 8: Structured Outputs and Deterministic Responses
  13. Common Pitfalls and How to Avoid Them
  14. Pitfall 1: Assuming Conversations Are Deterministic
  15. Pitfall 2: Losing Context in Serialization
  16. Pitfall 3: Injecting Misleading Context
  17. Pitfall 4: Token Limits Creeping Up Silently
  18. Pitfall 5: Branching Without Exiting Paths
  19. Best Practices Summary
  20. Advanced State Management for Complex Workflows
  21. Handling Context Overflow at Scale
  22. Real-Time Feedback and Interaction Loops
  23. Debugging Multi-Turn Conversations
  24. Performance Optimization for Production
  25. Testing Multi-Turn Conversations
  26. Conclusion
  27. The Deeper Pattern: Why Multi-Turn Conversations Mirror Human Thinking
  28. Advanced Pattern: Context Compression and Summarization
  29. Real-World Example: Multi-Turn Code Generation Workflow
  30. The Debugging Advantage in Multi-Turn Systems

Why Multi-Turn Conversations Matter

Here's the reality: meaningful work rarely happens in a single API call.

A typical agent workflow looks like this:

  1. User kicks off a task ("Build me a test suite for this controller")
  2. Agent asks clarifying questions ("What testing framework? What's the scope?")
  3. User responds ("Jest, happy path + edge cases")
  4. Agent generates code (pulls context from conversation history)
  5. User reviews ("Can you add mutation testing?")
  6. Agent iterates (adjusts based on feedback)

That's six distinct message exchanges, and real workflows often stretch to ten or more. If you treat each as an isolated request—resending the full conversation history, losing context about what the user approved, forgetting what hypotheses you tested—you burn tokens, introduce inconsistency, and create a frustrating user experience.

The Agent SDK is designed to handle this elegantly. Session objects maintain conversation history automatically. You control what context to inject between turns. You decide when to branch or loop back. And crucially, you can pause, persist, and resume without missing a beat.

Core Concepts: Session State and Context Management

Before we dive into patterns, let's clarify what "session" means in the SDK.

A session is a stateful conversation container. When you initialize a session:

typescript
const session = await client.agents.sessions.create({
  agentId: "my-agent-id",
  systemPrompt: "You are a code review expert...",
});

The SDK creates a persistent conversation thread. Every message you send appends to the history. The model sees all previous turns. Your code has access to the full transcript. This is fundamentally different from stateless REST APIs where you manually craft the messages array each time. Here, the session handles bookkeeping. You focus on logic.

What the SDK Manages Automatically

  • Message history: Every turn is recorded with metadata
  • Turn metadata: Timestamps, role (user/assistant), token usage estimates
  • Session state: Unique ID, creation time, last activity, expiration
  • Context window: Automatic summarization if you hit token limits (more on this later)
  • Error recovery: Built-in retry logic for transient failures

What You Control

  • User input: What you send in each turn
  • Injected context: System messages, reference documents, facts to remember
  • Conditional logic: When to ask follow-up questions vs. execute
  • Persistence: When to save/resume sessions
  • Exit criteria: When the conversation is "done"
  • Branching: Different paths based on agent responses
  • Token budgeting: Proactive management of context window

The mental model: You're the director, the session is the stage, and context is the script.

Pattern 1: Sequential Message Exchange with Context Carryover

Let's start simple: a basic back-and-forth where we rely on the SDK to maintain context.

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
async function codeReviewSession() {
  // Initialize a new session
  const session = await client.agents.sessions.create({
    agentId: "code-reviewer",
    systemPrompt: `You are an expert code reviewer.
    For each code sample, provide feedback on:
    1. Correctness
    2. Performance
    3. Readability
    4. Security
 
    Format feedback as a bulleted list.`,
  });
 
  console.log(`Session created: ${session.id}`);
 
  // Turn 1: User submits code
  const codeSnippet = `
    function findUser(users, id) {
      for (let i = 0; i < users.length; i++) {
        if (users[i].id === id) return users[i];
      }
      return null;
    }
  `;
 
  const turn1 = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: `Please review this code:\n\`\`\`javascript\n${codeSnippet}\n\`\`\``,
  });
 
  console.log("Agent feedback (Turn 1):");
  console.log(turn1.content[0].text);
 
  // Turn 2: User asks a follow-up (agent remembers the code)
  const turn2 = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: "What about using .find() instead? Would that be better?",
  });
 
  console.log("\nAgent response (Turn 2):");
  console.log(turn2.content[0].text);
  // ^ No need to re-send the code. The agent has context.
 
  // Turn 3: Iterate based on feedback
  const improvedCode = `
    function findUser(users, id) {
      return users.find(user => user.id === id) || null;
    }
  `;
 
  const turn3 = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: `Here's my revision:\n\`\`\`javascript\n${improvedCode}\n\`\`\``,
  });
 
  console.log("\nAgent feedback on revision (Turn 3):");
  console.log(turn3.content[0].text);
}
 
codeReviewSession().catch(console.error);

Why this matters: Notice that Turn 2 never re-mentions the original code. The agent still gives relevant feedback because the session transcript includes the full conversation context. This is why multi-turn conversations are efficient—you're not bloating every request with redundant context.

What happens behind the scenes:

  1. You send "What about using .find()?"
  2. SDK appends this to the session history
  3. Model receives: [system prompt] + [full conversation history up to this message]
  4. Model responds with context awareness
  5. SDK stores the response in the session
  6. If token count approaches limit, SDK can summarize old turns

This pattern scales beautifully. As long as you're within token limits, the conversation feels natural and incremental.
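The bookkeeping described above can be sketched as plain data manipulation. This is an illustrative model, not the SDK's actual internals: `buildRequest` and `recordTurn` are hypothetical names showing how history accumulates and gets replayed each turn.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Each new turn, the model receives the system prompt plus the full history
// with the new user message appended (steps 2-3 above).
function buildRequest(
  systemPrompt: string,
  history: Turn[],
  newUserMessage: string,
): { system: string; messages: Turn[] } {
  return {
    system: systemPrompt,
    messages: [...history, { role: "user", content: newUserMessage }],
  };
}

// After the model responds, both sides of the exchange are recorded (step 5),
// so the next buildRequest call sees them automatically.
function recordTurn(
  history: Turn[],
  userMsg: string,
  assistantMsg: string,
): Turn[] {
  return [
    ...history,
    { role: "user", content: userMsg },
    { role: "assistant", content: assistantMsg },
  ];
}
```

This is why Turn 2 in the example above never needs to re-send the code: the stored history carries it forward on every request.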

Pattern 2: Injecting Context Between Turns

Real workflows often require pulling in external data mid-conversation. Think: loading a knowledge base, injecting test results, or supplying fresh facts.

The key insight: Injected context goes into the system prompt or as a special message, not mixed with user input. This keeps your intent clear and prevents the model from getting confused about what's user-facing vs. what's reference material.

typescript
async function agentWithInjectedContext() {
  const session = await client.agents.sessions.create({
    agentId: "doc-assistant",
  });
 
  // Turn 1: User asks about API
  await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: "How do I authenticate with your API?",
  });
 
  // Turn 2: We fetch fresh documentation and inject it
  const apiDocs = await fetchLatestAPIDocs(); // Your function
 
  // Inject as an assistant message (context, not a real response yet)
  await client.agents.sessions.messages.create(session.id, {
    role: "assistant",
    content: `I have the latest API documentation:\n\n${apiDocs}\n\nLet me answer your question...`,
  });
 
  // Turn 3: Now ask the agent to answer based on injected context
  const response = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content:
      "Based on the documentation just provided, give me a concise auth example.",
  });
 
  console.log(response.content[0].text);
}

The injection pattern:

  1. Recognize you need external data
  2. Fetch it (API, file, database)
  3. Insert it as an assistant context message before asking the actual question
  4. User sees this as part of the conversation flow
  5. Model has the data in context for the next response

This is particularly powerful for fact-checking workflows:

typescript
async function factCheckingLoop() {
  const session = await client.agents.sessions.create({
    agentId: "fact-checker",
  });
 
  const claim = "The Great Wall of China is 13,000 miles long.";
 
  // Turn 1: Agent evaluates claim
  await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: `Fact-check this claim:\n\n"${claim}"`,
  });
 
  // Turn 2: We search for real data
  const searchResults = await client.web.search({
    query: "Great Wall of China actual length miles",
  });
 
  // Inject search results
  await client.agents.sessions.messages.create(session.id, {
    role: "assistant",
    content: `Here are the search results:\n\n${JSON.stringify(searchResults, null, 2)}`,
  });
 
  // Turn 3: Agent revises assessment with real data
  const finalVerdict = await client.agents.sessions.messages.create(
    session.id,
    {
      role: "user",
      content: "Based on the search results, give me your final verdict.",
    },
  );
 
  console.log(finalVerdict.content[0].text);
}

Why inject as messages instead of system context?

  • Clarity: The conversation is transparent. You can see exactly when facts entered.
  • Ordering: Messages maintain sequence. Later context overrides earlier.
  • Explainability: Anyone reading the transcript understands the information flow.
  • Flexibility: You can inject mid-conversation, not just at initialization.
  • Token efficiency: System context persists; message context can be pruned.

Pattern 3: Conditional Branching Based on Agent Decisions

Multi-turn conversations aren't always linear. Sometimes the agent's response determines what happens next. This is where the SDK shines—you can dynamically route based on agent outputs.

typescript
enum DecisionType {
  NEEDS_RESEARCH = "needs_research",
  READY_TO_CODE = "ready_to_code",
  CLARIFICATION_NEEDED = "clarification_needed",
  ERROR = "error",
}
 
interface AgentDecision {
  decision: DecisionType;
  reasoning: string;
  nextAction?: string;
}
 
async function conditionalWorkflow() {
  const session = await client.agents.sessions.create({
    agentId: "task-router",
  });
 
  // Turn 1: Submit task
  const initialResponse = await client.agents.sessions.messages.create(
    session.id,
    {
      role: "user",
      content: `I need to build a real-time chat system. Here's my constraint:
      must support 10,000 concurrent users with <100ms latency.`,
    },
  );
 
  // Parse agent's decision
  const decision = parseAgentDecision(initialResponse.content[0].text);
 
  if (decision.decision === DecisionType.NEEDS_RESEARCH) {
    console.log("Agent thinks we need research. Fetching benchmarks...");
 
    const benchmarks = await fetchTechBenchmarks();
    await client.agents.sessions.messages.create(session.id, {
      role: "assistant",
      content: `Here are relevant performance benchmarks:\n\n${benchmarks}`,
    });
 
    // Turn 2a: Resume after research
    const postResearchResponse = await client.agents.sessions.messages.create(
      session.id,
      {
        role: "user",
        content:
          "Given these benchmarks, what's your recommended architecture?",
      },
    );
 
    console.log(postResearchResponse.content[0].text);
  } else if (decision.decision === DecisionType.READY_TO_CODE) {
    console.log("Agent is ready. Proceeding to code generation...");
 
    // Turn 2b: Skip research, go straight to generation
    const codeResponse = await client.agents.sessions.messages.create(
      session.id,
      {
        role: "user",
        content: "Please generate the core connection pool implementation.",
      },
    );
 
    console.log(codeResponse.content[0].text);
  } else if (decision.decision === DecisionType.CLARIFICATION_NEEDED) {
    console.log("Agent needs more info. Asking user...");
    console.log(`Clarification needed: ${decision.nextAction}`);
 
    // In a real app, you'd prompt the user and resume
    // Turn 2c: User provides clarification
    const clarification = "We're using Node.js with TypeScript.";
    await client.agents.sessions.messages.create(session.id, {
      role: "user",
      content: clarification,
    });
  }
}
 
function parseAgentDecision(text: string): AgentDecision {
  // This is simplified. In production, use structured outputs or JSON parsing.
  if (text.includes("research") || text.includes("benchmark")) {
    return {
      decision: DecisionType.NEEDS_RESEARCH,
      reasoning: text,
    };
  } else if (text.includes("architecture") || text.includes("design")) {
    return {
      decision: DecisionType.READY_TO_CODE,
      reasoning: text,
    };
  } else {
    return {
      decision: DecisionType.CLARIFICATION_NEEDED,
      reasoning: text,
      nextAction: text,
    };
  }
}

The branching pattern in action:

  1. Send a task/prompt
  2. Parse the response for a decision signal
  3. Based on that signal, branch to different workflows
  4. Continue the same session down the chosen path

This is how you build intelligent agents that adapt. The conversation isn't predetermined; it flows based on what the model determines is needed.

A real-world example: a code generator that says "I need to see the existing database schema before writing migrations." Instead of ignoring that, you fetch the schema and inject it. The agent's next response is far better because it has what it asked for.
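The keyword matching in `parseAgentDecision` is brittle. A sturdier sketch, assuming you prompt the agent to end its reply with a JSON object like `{"decision": "...", "nextAction": "..."}` (the schema and function name here are hypothetical):

```typescript
type Decision =
  | "needs_research"
  | "ready_to_code"
  | "clarification_needed"
  | "error";

function parseDecisionJson(text: string): { decision: Decision; nextAction?: string } {
  const valid: Decision[] = ["needs_research", "ready_to_code", "clarification_needed"];
  // Grab the JSON span, ignoring any prose the model wrote around it.
  const match = text.match(/\{[\s\S]*\}/);
  if (match) {
    try {
      const parsed = JSON.parse(match[0]);
      if (valid.includes(parsed.decision)) {
        return { decision: parsed.decision, nextAction: parsed.nextAction };
      }
    } catch {
      // Malformed JSON falls through to the error branch below.
    }
  }
  return { decision: "error" };
}
```

Because the decision is an explicit enum value, an unrecognized or missing signal routes to the error branch instead of silently matching the wrong keyword.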

Pattern 4: Session Persistence and Resumption

Sometimes conversations are long or expensive. You want to save progress and resume later. This is critical for production workflows that might span hours or days.

typescript
import fs from "fs";
import path from "path";
 
interface SessionTranscript {
  sessionId: string;
  createdAt: string;
  messages: Array<{
    role: "user" | "assistant";
    content: string;
    timestamp: string;
  }>;
  metadata: Record<string, any>;
}
 
async function saveSessionTranscript(
  sessionId: string,
  transcript: SessionTranscript,
): Promise<void> {
  const filename = `session-${sessionId}-${Date.now()}.json`;
  const filepath = path.join("./transcripts", filename);
 
  // Ensure directory exists
  if (!fs.existsSync("./transcripts")) {
    fs.mkdirSync("./transcripts", { recursive: true });
  }
 
  fs.writeFileSync(filepath, JSON.stringify(transcript, null, 2));
  console.log(`Session saved to ${filepath}`);
}
 
async function resumeSession(sessionId: string) {
  // Fetch the saved transcript
  const files = fs.readdirSync("./transcripts");
  const transcriptFile = files.find((f) => f.includes(sessionId));
 
  if (!transcriptFile) {
    console.error(`No transcript found for session ${sessionId}`);
    return;
  }
 
  const transcript: SessionTranscript = JSON.parse(
    fs.readFileSync(path.join("./transcripts", transcriptFile), "utf-8"),
  );
 
  console.log(
    `Resuming session ${sessionId} with ${transcript.messages.length} prior messages`,
  );
 
  // Recreate the session context by replaying messages
  const session = await client.agents.sessions.create({
    agentId: "code-generator",
    systemPrompt: `You are resuming a prior conversation. Here's what we've discussed so far:\n\n${formatTranscript(transcript.messages)}`,
  });
 
  // Continue from where we left off
  const nextMessage = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content:
      "I'm back. Here's what I'd like to do next: [your continuation task]",
  });
 
  console.log(nextMessage.content[0].text);
}
 
function formatTranscript(messages: SessionTranscript["messages"]): string {
  return messages.map((m) => `**${m.role}**: ${m.content}`).join("\n\n");
}
 
async function longRunningWorkflow() {
  // Phase 1: Create session and do initial work
  const session = await client.agents.sessions.create({
    agentId: "architect",
  });
 
  const response1 = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: "Design a microservices architecture for an e-commerce platform.",
  });
 
  console.log("Architecture design:\n", response1.content[0].text);
 
  // Save checkpoint
  const transcript: SessionTranscript = {
    sessionId: session.id,
    createdAt: new Date().toISOString(),
    messages: [
      {
        role: "user",
        content:
          "Design a microservices architecture for an e-commerce platform.",
        timestamp: new Date().toISOString(),
      },
      {
        role: "assistant",
        content: response1.content[0].text,
        timestamp: new Date().toISOString(),
      },
    ],
    metadata: { phase: "architecture_design", status: "paused" },
  };
 
  await saveSessionTranscript(session.id, transcript);
 
  // Later... hours, days, or weeks
  // await resumeSession(session.id);
}

Why persistence matters:

  1. Avoid re-running expensive work: If the agent spent 10 minutes analyzing your codebase, you don't want to redo that.
  2. Audit trail: You have a complete record of every decision and input.
  3. Resumption without re-context: Save the transcript, resume with a fresh session that has the history in its system prompt.
  4. Workflow checkpointing: Break long tasks into phases, save between phases, resume asynchronously.

Pitfall to avoid: Don't try to reuse the exact same session ID if the SDK has already closed the session. Instead, save the transcript and recreate context in a new session by injecting the history as system context.

Pattern 5: Handling Context Overflow and Token Limits

This is where reality bites. Context windows are finite.

With Claude models offering 200k token context, you have breathing room. But conversations grow fast:

  • A 20-turn conversation with verbose responses = ~20k tokens
  • Add a 50k-token code repository = 70k tokens
  • Add test results, logs, and debugging info = 120k+ tokens
  • Add multiple code files and documentation = 180k+ tokens

You hit the ceiling faster than you'd expect. The key is being proactive rather than reactive.

typescript
interface ContextState {
  totalTokens: number;
  maxTokens: number;
  warningThreshold: number;
  criticalThreshold: number;
}
 
async function manageContextOverflow(session: any) {
  const contextState: ContextState = {
    totalTokens: 0,
    maxTokens: 200000, // Claude's 200k-token context window
    warningThreshold: 0.75, // 150k tokens
    criticalThreshold: 0.9, // 180k tokens
  };
 
  let messageCount = 0;
 
  while (true) {
    // Simulate incoming user message
    const userInput = "Add pagination support to the user repository.";
 
    // Estimate token count (rough approximation)
    const estimatedTokens = estimateTokenCount(userInput);
    contextState.totalTokens += estimatedTokens;
    messageCount++;
 
    // Check context health
    const utilization = contextState.totalTokens / contextState.maxTokens;
 
    if (utilization > contextState.criticalThreshold) {
      console.warn(
        `CRITICAL: Context usage at ${(utilization * 100).toFixed(1)}%. Compacting...`,
      );
      await compactSession(session);
      contextState.totalTokens = estimatedTokens * 10; // Reset after compaction
    } else if (utilization > contextState.warningThreshold) {
      console.warn(
        `WARNING: Context usage at ${(utilization * 100).toFixed(1)}%. Monitor closely.`,
      );
    }
 
    // Process the message
    const response = await client.agents.sessions.messages.create(session.id, {
      role: "user",
      content: userInput,
    });
 
    contextState.totalTokens += estimateTokenCount(response.content[0].text);
 
    console.log(
      `Turn ${messageCount}: Context usage ${((contextState.totalTokens / contextState.maxTokens) * 100).toFixed(1)}%`,
    );
 
    if (messageCount >= 5) break; // Exit after 5 iterations for demo
  }
}
 
function estimateTokenCount(text: string): number {
  // Rough heuristic: 1 token ≈ 4 characters (varies by language/content)
  return Math.ceil(text.length / 4);
}
 
async function compactSession(session: any): Promise<void> {
  // This is a simulation. In production, you'd use the SDK's compact feature.
  console.log("Compacting session history...");
 
  // Fetch full session transcript
  // Summarize early turns into a condensed summary
  // Replace early turns with the summary
 
  const summary = `
[Turns 1-10 Summary]
- User requested a feature to add pagination to repositories
- We designed a pagination strategy using cursor-based pagination
- Agent generated initial implementation
- User approved and provided feedback
[End Summary]
`;
 
  // In real code, you'd call a compaction endpoint or recreate the session
  // with the summary injected at the top.
 
  console.log("Session compacted. Freed ~40k tokens.");
}
 
async function proactiveCompaction() {
  // Better pattern: Compact BEFORE you hit the limit
 
  const session = await client.agents.sessions.create({
    agentId: "code-gen",
  });
 
  const turnLimit = 20; // Compact every N turns
  let turnCount = 0;
 
  for (let i = 0; i < 50; i++) {
    turnCount++;
 
    const response = await client.agents.sessions.messages.create(session.id, {
      role: "user",
      content: `Task ${i}: Generate a function...`,
    });
 
    console.log(`Turn ${turnCount}: Processed`);
 
    // Proactive compaction
    if (turnCount >= turnLimit) {
      console.log("Proactive compaction triggered...");
      // Summarize turns 1-20, keep turns 21-current
      turnCount = 0;
      // In production: call compaction API
    }
  }
}

Token management strategies (in order of preference):

  1. Proactive compaction: Every N turns (e.g., 20), summarize old turns. Frees space without hitting the limit.
  2. Session checkpointing: Save the session state, start a fresh session with an injected summary, continue.
  3. Aggressive pruning: Drop non-critical messages (e.g., keep summaries but discard verbose drafts).
  4. Query-specific focus: Inject only the context relevant to the current turn, not the entire history.
  5. Message filtering: Remove verbose intermediate states, keep only final results and decisions.

The key insight: Don't wait until you're at 95% utilization. Compact early, compress aggressively, and monitor continuously.
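Strategy 2 can be sketched as a pure transformation over the message history. Everything here is illustrative: `checkpointHistory` and the `summarize` callback are hypothetical names, and in practice you would ask the model itself to produce the summary before seeding the new session.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Keep the last `keep` messages verbatim and collapse everything older into a
// single summary message that seeds a fresh session.
function checkpointHistory(
  history: Msg[],
  keep: number,
  summarize: (old: Msg[]) => string,
): Msg[] {
  if (history.length <= keep) return history; // nothing worth compacting yet
  const old = history.slice(0, history.length - keep);
  const recent = history.slice(history.length - keep);
  return [
    { role: "assistant", content: `[Summary of earlier conversation]\n${summarize(old)}` },
    ...recent,
  ];
}
```

The checkpoint output drops straight into a new session's system prompt or opening messages, which is exactly the recreate-from-transcript move Pattern 4 uses.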

Pattern 6: Multi-Agent Orchestration with Shared Context

Here's where things get powerful: multiple agents working on the same session, passing context between them. This is the foundation for agent teams.

typescript
interface Agent {
  id: string;
  role: string;
  expertise: string;
}
 
const agents = {
  architect: {
    id: "arch-agent",
    role: "System Architect",
    expertise: "Design, scalability, trade-offs",
  },
  implementer: {
    id: "code-agent",
    role: "Implementation Engineer",
    expertise: "Writing production code",
  },
  reviewer: {
    id: "review-agent",
    role: "Code Reviewer",
    expertise: "Security, performance, testing",
  },
};
 
async function multiAgentWorkflow() {
  // Create a shared session
  const sharedSession = await client.agents.sessions.create({
    agentId: "orchestrator",
  });
 
  console.log("=== PHASE 1: Architecture ===");
  const archResponse = await client.agents.sessions.messages.create(
    sharedSession.id,
    {
      role: "user",
      content: `You are the ${agents.architect.role}.
      Design a payment processing system that handles 1000s of transactions/sec.
      Consider: payment providers, idempotency, error recovery.`,
    },
  );
 
  const architecture = archResponse.content[0].text;
  console.log(architecture);
 
  // Inject architecture as context for next agent
  await client.agents.sessions.messages.create(sharedSession.id, {
    role: "assistant",
    content: `Architecture design complete. Summary:\n${architecture.substring(0, 500)}...`,
  });
 
  console.log("\n=== PHASE 2: Implementation ===");
  const codeResponse = await client.agents.sessions.messages.create(
    sharedSession.id,
    {
      role: "user",
      content: `You are the ${agents.implementer.role}.
      Based on the architecture above, write TypeScript code for the PaymentProcessor class.
      Include: transaction logging, retry logic, provider fallback.`,
    },
  );
 
  const code = codeResponse.content[0].text;
  console.log(code);
 
  // Inject code for review
  await client.agents.sessions.messages.create(sharedSession.id, {
    role: "assistant",
    content: `Implementation complete:\n\`\`\`typescript\n${code}\n\`\`\``,
  });
 
  console.log("\n=== PHASE 3: Review ===");
  const reviewResponse = await client.agents.sessions.messages.create(
    sharedSession.id,
    {
      role: "user",
      content: `You are the ${agents.reviewer.role}.
      Review the code above for:
      1. Security vulnerabilities
      2. Performance bottlenecks
      3. Missing test coverage
      Provide specific, actionable feedback.`,
    },
  );
 
  const review = reviewResponse.content[0].text;
  console.log(review);
 
  console.log("\n=== Final Output ===");
  console.log(
    "Session transcript saved. Full context available for next phase.",
  );
}

Why shared sessions rock:

  • Continuity: Each agent sees the full conversation history.
  • Accountability: Every decision is traced back to which agent made it.
  • Iterative refinement: Later agents can critique and improve earlier work.
  • Context efficiency: You're not duplicating information across separate API calls.
  • Quality gates: Each phase can validate the previous phase's output.

This pattern scales to 5, 10, or 20+ agents coordinating on complex tasks. The secret is keeping the shared session as the single source of truth.
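The three hardcoded phases above generalize to a loop. A sketch, with `send` standing in for the session message call so the pipeline stays independent of any particular SDK surface (both `Phase` and `send` are assumed names):

```typescript
interface Phase {
  role: string;
  instructions: string;
}

// Drive any number of role-prompted phases through one shared session.
// Each phase sees the full prior conversation because `send` targets the
// same session every time.
async function runPipeline(
  phases: Phase[],
  send: (prompt: string) => Promise<string>,
): Promise<string[]> {
  const outputs: string[] = [];
  for (const phase of phases) {
    // Name the role explicitly, as in the three phases above.
    const prompt = `You are the ${phase.role}.\n${phase.instructions}`;
    outputs.push(await send(prompt));
  }
  return outputs;
}
```

Adding a fourth or fourteenth agent is then a data change, not a code change: append another entry to the phase list.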

Pattern 7: Error Recovery and Retry Logic

Multi-turn conversations can fail at any point. Networks drop. Token limits get hit. Agents hallucinate. You need resilient retry patterns.

typescript
interface RetryConfig {
  maxRetries: number;
  initialDelayMs: number;
  backoffMultiplier: number;
  maxDelayMs: number;
}
 
async function robustSessionCall(
  sessionId: string,
  message: string,
  config: RetryConfig = {
    maxRetries: 3,
    initialDelayMs: 1000,
    backoffMultiplier: 2,
    maxDelayMs: 10000,
  },
): Promise<any> {
  let lastError: Error | null = null;
  let delay = config.initialDelayMs;
 
  for (let attempt = 1; attempt <= config.maxRetries; attempt++) {
    try {
      console.log(
        `Attempt ${attempt}: Sending message to session ${sessionId}`,
      );
 
      const response = await client.agents.sessions.messages.create(sessionId, {
        role: "user",
        content: message,
      });
 
      console.log("✓ Success");
      return response;
    } catch (error) {
      lastError = error as Error;
      console.error(`✗ Attempt ${attempt} failed: ${lastError.message}`);
 
      if (attempt < config.maxRetries) {
        console.log(`Waiting ${delay}ms before retry...`);
        await new Promise((resolve) => setTimeout(resolve, delay));
        delay = Math.min(delay * config.backoffMultiplier, config.maxDelayMs);
      }
    }
  }
 
  throw new Error(
    `Failed after ${config.maxRetries} attempts: ${lastError?.message}`,
  );
}
 
async function sessionWithErrorRecovery() {
  const session = await client.agents.sessions.create({
    agentId: "data-processor",
  });
 
  try {
    // Turn 1: Initial request with retry
    const result = await robustSessionCall(
      session.id,
      "Process this CSV and generate summary statistics.",
    );
 
    console.log(result.content[0].text);
 
    // Turn 2: Follow-up (relies on session state from Turn 1)
    const followUp = await robustSessionCall(
      session.id,
      "Now identify outliers and anomalies.",
    );
 
    console.log(followUp.content[0].text);
  } catch (error) {
    console.error("Workflow failed after retries:", error);
 
    // Fallback: save session state and alert human
    const backup = {
      sessionId: session.id,
      timestamp: new Date().toISOString(),
      lastError: (error as Error).message,
    };
 
    console.log("Session backup saved for manual review:", backup);
  }
}

Retry strategies for multi-turn conversations:

  1. Exponential backoff: Start with 1s delay, double each retry (1s → 2s → 4s).
  2. Jitter: Add random noise to prevent thundering herd (if 100 sessions retry simultaneously).
  3. Circuit breaker: If 5 consecutive calls fail, stop and escalate (don't hammer the API).
  4. Graceful degradation: Save state, alert the user, offer manual recovery.
  5. Idempotency: Reuse the same message ID if retrying (prevents duplicate processing).

What NOT to do: Don't silently swallow errors. Log them. Track patterns. If a certain turn consistently fails, there's a systemic issue (bad prompt, token overflow, etc.).
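Strategies 2 and 3 aren't shown in the retry code above. A minimal sketch of both, with illustrative names:

```typescript
// Strategy 2: add random noise to a backoff delay so simultaneous retries
// spread out instead of arriving in lockstep.
function jitteredDelay(baseMs: number, jitterRatio = 0.25): number {
  // e.g. baseMs=2000, jitterRatio=0.25 yields a value in [1500, 2500)
  const spread = baseMs * jitterRatio;
  return baseMs - spread + Math.random() * 2 * spread;
}

// Strategy 3: trip after N consecutive failures so callers stop hammering
// the API and escalate instead.
class CircuitBreaker {
  private failures = 0;
  constructor(private threshold: number) {}
  recordSuccess(): void {
    this.failures = 0;
  }
  recordFailure(): void {
    this.failures += 1;
  }
  get open(): boolean {
    return this.failures >= this.threshold;
  }
}
```

In `robustSessionCall`, you would replace the fixed `delay` with `jitteredDelay(delay)` and check `breaker.open` before each attempt.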

Pattern 8: Structured Outputs and Deterministic Responses

By default, agent responses are free-form text. For workflows, you often need structured data. This is where you move from conversational to programmatic.

typescript
interface CodeReviewOutput {
  issues: Array<{
    severity: "critical" | "warning" | "info";
    category: string;
    description: string;
    lineNumber?: number;
    suggestedFix: string;
  }>;
  overallScore: number;
  recommendations: string[];
}
 
async function structuredCodeReview(code: string): Promise<CodeReviewOutput> {
  const session = await client.agents.sessions.create({
    agentId: "code-reviewer",
  });
 
  const response = await client.agents.sessions.messages.create(session.id, {
    role: "user",
    content: `Review this code and respond with ONLY valid JSON matching this schema:
    {
      "issues": [
        { "severity": "critical|warning|info", "category": "string", "description": "string", "lineNumber": number, "suggestedFix": "string" }
      ],
      "overallScore": number (0-10),
      "recommendations": ["string"]
    }
 
    Code to review:
    \`\`\`javascript
    ${code}
    \`\`\``,
  });
 
  // Parse the JSON response
  const reviewText = response.content[0].text;
  const jsonMatch = reviewText.match(/\{[\s\S]*\}/);
 
  if (!jsonMatch) {
    throw new Error(`No JSON found in response: ${reviewText}`);
  }
 
  const parsed: CodeReviewOutput = JSON.parse(jsonMatch[0]);
  return parsed;
}
 
async function structuredWorkflow() {
  const codeToReview = `
    function calculateTotal(items) {
      let total = 0;
      for (let i = 0; i < items.length; i++) {
        total = total + items[i].price;
      }
      return total;
    }
  `;
 
  const review = await structuredCodeReview(codeToReview);
 
  console.log("Review Results:");
  console.log(`Overall Score: ${review.overallScore}/10`);
 
  review.issues.forEach((issue) => {
    console.log(`\n[${issue.severity.toUpperCase()}] ${issue.category}`);
    console.log(`  ${issue.description}`);
    if (issue.lineNumber) console.log(`  Line: ${issue.lineNumber}`);
    console.log(`  Fix: ${issue.suggestedFix}`);
  });
 
  console.log("\nRecommendations:");
  review.recommendations.forEach((rec) => console.log(`- ${rec}`));
}

Why structure matters:

  • Downstream processing: You can feed the JSON into automated tools.
  • Validation: You can check that required fields exist.
  • Consistency: Every response has the same shape, no surprises.
  • Composability: One agent's output becomes another agent's input seamlessly.

Pitfall: Agents sometimes wrap JSON in markdown code fences or surround it with explanatory text. Always validate and sanitize before parsing, and wrap JSON.parse() in a try-catch.
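The sanitization step above can be factored into a small helper. This is a sketch, not an SDK API: `extractJson` is a hypothetical name, and the fence-stripping regex is one reasonable approach among several.

```typescript
// Hypothetical helper: strip markdown fences the model may add, then
// parse the first JSON object found, failing loudly rather than silently.
function extractJson<T>(raw: string): T {
  // Remove ```json ... ``` fence markers if the model wrapped its output
  const unfenced = raw.replace(/```(?:json)?/g, "");
  const match = unfenced.match(/\{[\s\S]*\}/);
  if (!match) {
    throw new Error(`No JSON object found in response: ${raw.slice(0, 200)}`);
  }
  try {
    return JSON.parse(match[0]) as T;
  } catch (err) {
    throw new Error(`Response contained malformed JSON: ${(err as Error).message}`);
  }
}
```

Calling `extractJson<CodeReviewOutput>(reviewText)` replaces the bare regex-and-parse sequence with one validated step.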

Common Pitfalls and How to Avoid Them

Pitfall 1: Assuming Conversations Are Deterministic

The problem: You assume that if you send the same message to an agent in Turn 5, you'll get the same response.

Why it fails: The prior conversation context affects every response. Same message, different context = different output.

The fix: Treat multi-turn conversations as inherently variable. Build robustness via retries, validation, and fallback paths, not via assumptions about output.

Pitfall 2: Losing Context in Serialization

The problem: You save a session ID, but weeks later, the session is gone or the context was pruned by the backend.

Why it fails: Sessions aren't guaranteed to persist forever. Backend maintenance, token cleanup, etc.

The fix: Always save the transcript, not just the session ID. Use the transcript to recreate context in a new session.
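The fix above can be sketched as a serialize/restore pair. The transcript shape and the context-priming message format here are assumptions for illustration, not SDK structures.

```typescript
// Sketch of "save the transcript, not just the session ID".
interface TranscriptEntry {
  role: "user" | "assistant";
  content: string;
}

function serializeTranscript(entries: TranscriptEntry[]): string {
  return JSON.stringify({
    version: 1,
    savedAt: new Date().toISOString(),
    entries,
  });
}

// Rebuild a context-priming message for a brand-new session when the
// original session has been pruned by the backend.
function rebuildContext(serialized: string): string {
  const { entries } = JSON.parse(serialized) as { entries: TranscriptEntry[] };
  const history = entries.map((e) => `${e.role}: ${e.content}`).join("\n");
  return `Previous conversation, restored from a saved transcript:\n${history}`;
}
```

The restored text goes into the first message of the new session, so the agent starts with the old context even though the old session is gone.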

Pitfall 3: Injecting Misleading Context

The problem: You inject "Here's the test output" but then ask the agent to pass tests without giving it the actual test code.

Why it fails: The agent is confused. It has results but no source. It can't debug or learn.

The fix: Be explicit. If you're injecting partial context, say so: "Here's the failing test output. The test file is file.test.ts. Please debug."

Pitfall 4: Token Limits Creeping Up Silently

The problem: You don't monitor context usage. One day, messages start failing or getting truncated.

Why it fails: No visibility into utilization. You're flying blind.

The fix: Log token estimates after every turn. Set a warning threshold (75% utilization). Compact proactively.
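A minimal utilization monitor might look like the sketch below. The chars-divided-by-four heuristic and the 200k context limit are assumptions; in production you would use the token counts the API reports.

```typescript
// Assumed model context window and the 75% warning threshold from the text.
const CONTEXT_LIMIT = 200_000;
const WARN_THRESHOLD = 0.75;

// Crude token estimate (roughly 4 characters per token); not exact.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function checkUtilization(usedTokens: number): { ratio: number; shouldCompact: boolean } {
  const ratio = usedTokens / CONTEXT_LIMIT;
  return { ratio, shouldCompact: ratio >= WARN_THRESHOLD };
}
```

Log the ratio after every turn; when `shouldCompact` flips to true, trigger the summarization patterns described later.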

Pitfall 5: Branching Without Exiting Paths

The problem: You branch to three possible workflows but never merge back or close the session.

Why it fails: Sessions remain open. Tokens leak. You forget what the agent was working on.

The fix: Always have a terminal state for each branch. Either "done, save transcript" or "error, log and escalate."

Best Practices Summary

  1. Save often: Checkpoint transcripts at key decision points.
  2. Inject strategically: External data goes in as context messages, not user messages.
  3. Monitor tokens: Log utilization after every turn. Compact at 75%.
  4. Validate outputs: Don't assume structured responses. Parse and validate.
  5. Retry wisely: Exponential backoff, not immediate retry.
  6. Branch explicitly: Every path must have a clear exit condition.
  7. Log everything: Transcripts are gold. Store them durably.
  8. Test with real conversations: Unit tests catch syntax errors; real sessions catch context bugs.
  9. Use compression: Summarize old turns to free tokens proactively.
  10. Version your prompts: Track changes to system prompts and decision logic.

Advanced State Management for Complex Workflows

As conversations grow more complex, managing state explicitly becomes critical. The SDK provides tools, but you need patterns.

Checkpoints for Irreversibility: Before making changes that can't be undone, create checkpoints. In code refactoring, before restructuring modules, ask the agent to save its plan. If the execution fails, you can backtrack to the plan and try a different approach.

State Machines: Model conversations as state transitions. "Awaiting initial request" → "Gathering context" → "Planning" → "Executing" → "Reviewing" → "Done". Each state has specific allowed transitions. This prevents invalid sequences like executing before planning.
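The state machine above can be enforced with a plain transition table. The state names and allowed transitions here follow the example in the text and are illustrative, not prescribed by the SDK.

```typescript
type ConvState =
  | "awaiting_request"
  | "gathering_context"
  | "planning"
  | "executing"
  | "reviewing"
  | "done";

// Each state lists the states it may legally move to.
const ALLOWED: Record<ConvState, ConvState[]> = {
  awaiting_request: ["gathering_context"],
  gathering_context: ["planning"],
  planning: ["executing"],
  executing: ["reviewing"],
  reviewing: ["executing", "done"], // review can loop back to execution
  done: [],
};

function transition(current: ConvState, next: ConvState): ConvState {
  if (!ALLOWED[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```

Wrapping every state change in `transition` turns "executing before planning" from a silent logic bug into an immediate, debuggable error.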

Invariant Checking: After each turn, verify critical invariants. "Does the generated code parse?" "Are all required fields present in the JSON?" Catch bugs early before they compound downstream.
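One way to make invariant checking mechanical is to register named checks and run them all after every turn. The shapes below are a sketch; real checks would parse generated code, validate JSON schemas, and so on.

```typescript
// A named predicate over whatever state your workflow tracks.
interface Invariant {
  name: string;
  check: (state: unknown) => boolean;
}

// Run every check and fail fast, reporting all violated invariants at once.
function verifyInvariants(state: unknown, invariants: Invariant[]): void {
  const violated = invariants
    .filter((inv) => !inv.check(state))
    .map((inv) => inv.name);
  if (violated.length > 0) {
    throw new Error(`Invariants violated: ${violated.join(", ")}`);
  }
}
```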

Side Effect Tracking: Some operations have external side effects (commit to git, call an API, write to database). Track these separately. If you need to rollback, you can undo side effects. Without tracking, rollbacks fail silently.
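Side-effect tracking can be as simple as a ledger that records each external action together with an undo function. This is a sketch of the idea, not an SDK feature; the undo callbacks here stand in for real git reverts or compensating API calls.

```typescript
interface SideEffect {
  description: string;
  undo: () => void;
}

class EffectLedger {
  private effects: SideEffect[] = [];

  record(description: string, undo: () => void): void {
    this.effects.push({ description, undo });
  }

  // Undo in reverse order, like unwinding a transaction; returns the
  // descriptions of what was rolled back for logging.
  rollback(): string[] {
    const undone: string[] = [];
    while (this.effects.length > 0) {
      const effect = this.effects.pop()!;
      effect.undo();
      undone.push(effect.description);
    }
    return undone;
  }
}
```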

Handling Context Overflow at Scale

Large conversations will eventually hit token limits. Gracefully handling overflow prevents hard failures.

Proactive Summarization: Don't wait until you hit the limit. At 70% utilization, trigger summarization. The agent writes a brief summary of progress. This summary replaces the full history in future turns. You've freed tokens before hitting the wall.

Hierarchical Summarization: Don't flatten history to one summary. Create summaries of summaries. Early turns summarize to "Phase 1 completed with these outcomes." Later summaries add more detail about Phase 2. This allows expanding context as needed while keeping base history compressed.

Selective Retention: Not all history matters equally. The last five exchanges matter most. Exchanges from five turns ago matter less. Implement weighted retention—keep full detail for recent turns, compress older ones.
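Weighted retention reduces to a simple rule: keep full detail for the most recent turns, substitute precomputed summaries for the rest. The turn shape and the five-turn cutoff below are assumptions matching the text.

```typescript
interface Turn {
  index: number;
  content: string; // full message text
  summary: string; // precomputed one-line summary of this turn
}

// Keep the last `keepFullDetail` turns verbatim; older turns fall back
// to their summaries.
function retain(history: Turn[], keepFullDetail = 5): string[] {
  const cutoff = history.length - keepFullDetail;
  return history.map((turn, i) => (i >= cutoff ? turn.content : turn.summary));
}
```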

Branching Cleanup: If you explored multiple paths in one session, retain only the chosen path in the main transcript. Archive discarded paths separately for audit, but don't carry them forward. This prevents your history from accumulating every alternative you considered.

Real-Time Feedback and Interaction Loops

Some applications need agents responding to live user feedback, not just batch requests.

Streaming Responses: Instead of waiting for complete responses, stream tokens as they arrive. Users see thinking in real-time, not as a bulk drop. The SDK supports streaming; use it for interactive feel.

Interruption Handling: Users might interrupt an agent mid-response. Handle this gracefully. Don't discard the partial response—save it for potential resumption. "You asked me to write a function. Here's my incomplete draft. Want me to continue from where I left off?"

Feedback Injection: After partial responses, collect feedback. "Is this the right approach?" "Should I add error handling here?" Inject the feedback and let the agent improve incrementally. This feels collaborative, not like the agent is talking at you.

Rate Limiting Conversations: If users can submit messages rapidly, you might create too many turns too quickly. Implement minimum time between turns (e.g., 100ms), rate limits per user, and queue management. This keeps the system stable even under load.
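The minimum-gap rule above can be sketched as a per-user limiter. The 100ms gap follows the text; the injectable clock is a testing convenience, not an SDK requirement.

```typescript
// Minimal per-user turn rate limiter enforcing a minimum gap between turns.
class TurnRateLimiter {
  private lastTurn = new Map<string, number>();

  constructor(
    private minGapMs = 100,
    private now: () => number = Date.now, // injectable for tests
  ) {}

  // Returns true if the turn is allowed, false if it arrived too soon.
  tryTurn(userId: string): boolean {
    const t = this.now();
    const last = this.lastTurn.get(userId);
    if (last !== undefined && t - last < this.minGapMs) return false;
    this.lastTurn.set(userId, t);
    return true;
  }
}
```

Rejected turns can be queued rather than dropped, which keeps the conversation ordered while still protecting the backend.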

Debugging Multi-Turn Conversations

When things go wrong, debugging is harder because state is distributed across turns.

Turn-Level Logging: Log every turn with full details. What was sent? What came back? How many tokens? How long did it take? JSON format works well:

json
{
  "turn": 5,
  "timestamp": "2026-03-17T14:32:15Z",
  "input": "Review this code for security issues",
  "model_response": "...truncated...",
  "tokens_used": 1247,
  "tokens_remaining": 3753,
  "duration_ms": 2341,
  "errors": []
}

Session Replays: Keep all transcripts durably. You can later replay conversations in a test environment to reproduce issues. "Conversation XYZ failed at turn 12. Let me replay it and see where it diverges from expected behavior."

Context Snapshots: Periodically save context snapshots: the agent's understanding of the current state at that moment. These let you inspect what the agent "believed" at each point, which is helpful for understanding divergence from expected behavior.

Comparative Analysis: When a conversation produces unexpected output, compare against a similar successful conversation. Where did they diverge? Was it an injection, a branching decision, or accumulated context drift?

Performance Optimization for Production

High-traffic agent systems need optimization.

Caching Injected Context: If you frequently inject the same documents, cache them. Don't re-upload the Python standard library documentation with every turn. Upload once, reference by ID.

Async Processing: Not all turns need synchronous responses. Offload expensive operations (database queries, API calls) to async tasks. Respond to the user while processing completes.

Connection Pooling: Reuse connections to the Claude API. Don't create new connections per session. Connection overhead adds up across millions of turns.

Batch Operations: If you manage thousands of sessions, batch status checks and updates. Don't poll each session individually; batch ten at a time.

Testing Multi-Turn Conversations

Unit tests alone won't catch context bugs. You need conversation-level testing.

Transcript Fixtures: Capture real conversation transcripts as test data. Replay them with fresh sessions. Verify output consistency. If updates to prompts change output, catch it before production.

Invariant-Based Testing: Define what must be true after each turn. Generate varied inputs, run them through a conversation, verify invariants hold. This is property-based testing applied to conversations.

Chaos Testing: Randomly truncate context, fail injections, delay responses. See how your conversation handles adversity. This finds edge cases unit tests miss.

Load Testing: Simulate thousands of concurrent conversations. Verify they don't interfere with each other. Check that session isolation works.

Conclusion

Multi-turn conversations are where the Agent SDK truly shines. Instead of treating each API call as isolated, you're building stateful, context-aware workflows that feel like collaborations between humans and AI systems.

The patterns we've covered—sequential context carryover, injected facts, conditional branching, persistence, token management, and multi-agent orchestration—form the foundation of production-grade agent systems.

The real power emerges when you combine these patterns:

  • Architecture phase with Agent A → Save transcript
  • Inject results for Agent B's review → Parse structured JSON
  • Branch on quality score → High score = ship, Low score = loop back to Agent A
  • Compact context every 20 turns → Ensure we never hit token limits
  • Persist final transcript → Audit trail for compliance

That's a robust, scalable system. That's how you build agents that actually ship.

The best approach is to start simple. Begin with Pattern 1 (sequential exchanges). Once you're comfortable, layer in Pattern 2 (injection). Then Pattern 3 (branching). Then multi-agent orchestration. Complexity grows gradually, not all at once.

Multi-turn conversations are a craft. Practice them, refine them, and watch your agent workflows transform from brittle scripts into resilient, adaptive systems that handle real-world complexity.

The Deeper Pattern: Why Multi-Turn Conversations Mirror Human Thinking

There's something subtle happening when you work with multi-turn conversations that goes beyond technical mechanics. When humans solve complex problems, we don't do it in a single thinking burst. We ask questions, gather information, form hypotheses, test them, refine our understanding, and iterate. Multi-turn conversations encode this exact pattern into your agent workflows.

This is important because it means your agents aren't just faster versions of single-turn APIs—they're fundamentally different in capability. A single-turn API call is like asking someone a question and expecting a complete answer without any follow-up. A multi-turn conversation is like working with a colleague who can ask clarifying questions, adjust based on feedback, and build context over time.

The implication for your systems is profound. When you architect multi-turn workflows, you're not optimizing for latency or token efficiency in isolation. You're optimizing for correctness, adaptability, and the ability to handle uncertainty. An agent that can ask "Do you want me to preserve backwards compatibility?" is more valuable than an agent that assumes the answer. An agent that can say "I need to see the database schema before writing migrations" and then incorporate that schema is more reliable than one that makes assumptions.

This is the deeper lesson: multi-turn conversations are how you build agents that collaborate with humans rather than just execute tasks. The difference feels small in theory but compounds massively in practice. Your system becomes more intelligent not because the model is smarter, but because the workflow is designed for back-and-forth refinement instead of one-shot execution.

Advanced Pattern: Context Compression and Summarization

As conversations grow longer, you eventually hit the context limit. But there's a sophisticated pattern for managing this that goes beyond just truncating history. Context compression is an art—you want to preserve what matters while discarding what doesn't.

The key insight is that not all parts of your conversation history are equally important. Recent exchanges matter most. Exchanges from 20 turns ago matter less. The final decision on a critical question matters more than the initial brainstorming. Learning to weight these factors is what separates systems that work at scale from systems that break under load.

One advanced pattern is building multiple summary levels. Instead of summarizing everything into one big summary, create hierarchical summaries. The last five exchanges stay full-detail. Exchanges 6-15 get summarized into a focused summary. Exchanges 16-40 get compressed further. Ancient history (50+ turns) becomes a single line. This approach preserves recent context fully while still compressing overall history.
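The tiering rule described above can be expressed as a pure function from turn age to detail level. The band boundaries (5, 15, and 40 turns) follow the text and are tunable assumptions.

```typescript
type Detail = "full" | "focused_summary" | "compressed" | "one_line";

// Map how many turns ago an exchange happened to the level of detail
// it should retain in the working context.
function detailFor(turnsAgo: number): Detail {
  if (turnsAgo <= 5) return "full";
  if (turnsAgo <= 15) return "focused_summary";
  if (turnsAgo <= 40) return "compressed";
  return "one_line";
}
```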

Another pattern is semantic summarization rather than linear. Don't just summarize turns in order. Summarize by topic. All exchanges about the database schema get one summary. All exchanges about API design get another. When the agent needs to think about the schema again, it gets the schema summary. When it thinks about API design, it gets that summary. This topical approach often preserves more relevant context than chronological summarization.
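Topical summarization starts with bucketing turns by topic so each bucket can be summarized independently. In this sketch the topic tag is a plain string field; in practice it would come from the agent itself or a lightweight classifier.

```typescript
interface TaggedTurn {
  topic: string;   // e.g. "database-schema", "api-design" (assumed tags)
  content: string;
}

// Group turn contents by topic so each topic can be summarized on its own.
function bucketByTopic(turns: TaggedTurn[]): Map<string, string[]> {
  const buckets = new Map<string, string[]>();
  for (const turn of turns) {
    const list = buckets.get(turn.topic) ?? [];
    list.push(turn.content);
    buckets.set(turn.topic, list);
  }
  return buckets;
}
```

When the agent next works on the schema, you inject only the summary built from the "database-schema" bucket rather than the full chronological history.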

The practical benefit: your conversations can run for 100+ turns without hitting the wall. You maintain coherent context across what feels like an entire project lifecycle. The agent can reference decisions made 50 turns ago because they're in the summary it consulted, not because they're in the literal message history.

Real-World Example: Multi-Turn Code Generation Workflow

Let's trace through what a realistic multi-turn code generation workflow looks like from start to finish. A developer starts a session and asks Claude to build a user authentication system. This isn't a single-turn "write me an auth system." It's a conversation.

Turn 1: Developer describes high-level requirements—JWT authentication, social login support, role-based access control. Claude asks clarifying questions about the tech stack, deployment target, and security requirements. Developer responds. Claude now has context.

Turn 2: Developer asks Claude to propose the architecture. Claude generates a multi-layer design, proposes database schema, suggests libraries, and asks about preferences. Developer approves the overall structure but requests changes to one aspect.

Turn 3: Claude revises the architecture based on feedback, then asks about error handling strategy and logging approach. Developer provides guidance. Claude stores this guidance in conversation context.

Turn 4: Claude proposes the core authentication service code. The proposal includes database operations, JWT generation, and session management. Developer reviews, suggests a refactor to one function.

Turn 5: Claude refactors that function, then generates the API endpoints for authentication. Developer tests the generated code against their test suite and discovers an edge case—what happens when a social login token expires mid-session?

Turn 6: Claude asks for clarification about the expected behavior, then updates the code to handle that edge case. Claude also proactively suggests adding logging for audit trails, which developer approves.

Turn 7: Claude generates test suites for the implementation, including edge case tests. Developer runs them, all pass. Claude generates documentation.

Turn 8: Developer asks Claude to refactor for performance. Claude proposes caching strategies, database optimizations, and suggests moving certain operations to background jobs. Developer asks for just the caching strategy first.

Turn 9: Claude implements caching, benchmarks it, and shows performance improvements. Developer is satisfied.

That entire workflow—a complete, production-ready authentication system—happened in nine turns. Each turn built on previous context. Claude never lost track of the architecture, the decisions made, the constraints imposed, or the requirements specified. That's not possible with fire-and-forget API calls.

The Debugging Advantage in Multi-Turn Systems

Here's something that matters in production: when things go wrong, multi-turn conversations make debugging easier. When an agent fails, you don't just have the failure—you have the entire conversation context leading up to it. You can see exactly what assumptions led to the failure, what information the agent had, what paths it considered.

This is invaluable for iterating toward correctness. An agent generates code that fails a test. Instead of starting over, you see the test failure, inject it back into the conversation ("This test failed, here's why"), and the agent adjusts. The agent remembers everything it tried before, so it doesn't repeat failed approaches. It builds on what it learned.

Compare this to single-turn requests where you lose all context. You'd have to re-describe the problem, the original code, the constraints, everything. In multi-turn, you just say "The test failed because of X" and the agent understands what you mean in the context of everything it already knows.

The practical benefit: you fix complex problems faster because the system maintains institutional memory about what it's been trying and why.
