
Here's a situation you've probably faced: you spawn an agent to handle a multi-stage task. It reads files, makes decisions, performs work. Then—boom—halfway through, you lose the conversation history. Your agent starts asking the same questions again. It forgets critical facts you established five minutes ago. It's like watching someone suffer from severe amnesia in real time.
The problem is context windows are finite. Every AI model, including Claude, has a maximum context length. Once you hit that limit, the model can't see earlier messages. For long-running agents orchestrating complex workflows, this isn't just inconvenient—it's catastrophic. Tasks fail. State gets lost. You end up manually recreating context from scratch.
Claude Code solves this with a sophisticated memory and context management system that works across three layers: persistent disk storage, in-session memory tracking, and intelligent context compression. In this article, we're diving deep into how this system works, why it matters for multi-stage agent workflows, and how you configure it to keep your agents sharp, informed, and context-aware—even when running for hours or days. You'll understand what happens behind the scenes and how to design agents that don't forget what they're doing midway through.
Table of Contents
- The Context Window Problem
- How Claude Code's Memory System Works
- Layer 1: Persistent Disk Memory
- Layer 2: Session Checkpoints
- Layer 3: In-Session Context Compression
- Injecting Persistent Context with CLAUDE.md
- The CLAUDE.md Structure
- Why This Matters
- Memory Directory Patterns for Cross-Session Knowledge
- Pattern 1: Write Findings, Index Them
- Pattern 2: Decision Records as Context
- Pattern 3: Task State Tracking
- Context Compression Strategies
- Strategy 1: Minimal Compression (Detailed Work)
- Strategy 2: Aggressive Compression (Bulk Processing)
- Strategy 3: Checkpoint-Driven (Multi-Phase Tasks)
- Context Overflow: What Happens and How to Handle It
- The Overflow Moment
- Avoiding Context Overflow: Best Practices
- Practice 1: Scope Work into Focused Phases
- Practice 2: Write Findings to Disk Early
- Practice 3: Use Focused Instructions
- Practice 4: Set Context Budgets
- Advanced Pattern: Focused Agent Instructions
- Real-World Example: Multi-Chapter Validation Pass
- Phase 1: Setup and Discovery (Agent: style-enforcer)
- Phase 2: Scale-Out Validation (Multiple agents, parallel)
- Phase 3: Aggregation (Agent: manuscript-assembler)
- Memory Index Files and Discovery
- Building Your Index
- Context Window vs. Token Budget Trade-offs
- High-Context Agents (180K-200K available)
- Low-Context Agents (60K-80K available)
- Medium-Context Agents (120K-150K available)
- Choosing Your Strategy
- Monitoring Context Usage in Real-Time
- Context Management Checklist
- Memory Consolidation: When Sessions End
- Step 1: Collect Session Data
- Step 2: Extract Patterns and Decisions
- Step 3: Write to Long-Term Memory
- Step 4: Archive Old Sessions
- Why Consolidation Matters
- Advanced Technique: Cross-Agent Context Sharing
- Pitfall: Memory Bloat and Stale Knowledge
- Key Takeaways
- Real-World Impact: When Context Management Saves the Day
- Designing Agents for Memory-Aware Execution
- Memory System Best Practices
- Monitoring Agent Health Through Memory
- The Philosophy: External Memory Over Internal Context
- Conclusion: Memory is the Multiplier
The Context Window Problem
Let's be concrete about what we're up against. A context window is your agent's working memory—the part of the conversation it can actually "see" and reason about.
When you create a subagent in Claude Code, the agent starts with a fresh context window (roughly 200K tokens on current Claude models). This window includes:
- System prompt (~2K tokens): Instructions about how to behave, your project's CLAUDE.md, validation rules
- Agent-specific instructions (~1-3K tokens): Goals, tools, constraints
- Conversation history (~190K tokens available): Previous messages, file contents you've shown, outputs from tool calls
- Current request (~variable): What you're asking the agent to do
As the agent works, it consumes context with every action:
- Reading a large file? That's 5K-50K tokens gone.
- Tool call output from running tests? 10K-100K tokens.
- File diffs showing changes? 2K-10K tokens per file.
- Multiple back-and-forths debugging an issue? 50K+ tokens easily.
- Processing test output with stack traces? 20K-80K tokens.
On a 200K context window, you might get 10-15 meaningful interactions before you're approaching the limit. For multi-hour or multi-day tasks, that's nowhere near enough. It's like having a person who forgets everything you told them more than a few conversations ago.
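To build intuition for how fast a window fills, you can estimate token counts with the common rule of thumb of roughly four characters per token for English text. This is a planning heuristic only, not Claude's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Real tokenizers vary by model; use this only for budgeting.
    return max(1, len(text) // 4)

# A 50,000-character file read consumes roughly 12,500 tokens,
# leaving about 187,500 of a 200K window for everything else.
file_text = "word " * 10_000
used = estimate_tokens(file_text)
print(used)            # 12500
print(200_000 - used)  # 187500
```

A handful of reads at this scale, plus tool outputs, and the window is gone.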
What happens when you run out of context?
The agent doesn't fail gracefully. It can't. Once the context window fills up, the model can no longer see earlier messages. Your agent loses:
- Task history (why are we doing this again?)
- Earlier findings (didn't we already debug this file?)
- Established facts (what's the project's architecture?)
- Decisions made (we agreed on this approach, remember?)
The agent either stops working or resets to square one. Either way, you're paying the efficiency cost. It's the equivalent of having a teammate who's competent but has anterograde amnesia. Each conversation feels like the first time they're hearing about the project.
How Claude Code's Memory System Works
Claude Code tackles this with a three-layer memory architecture. Each layer solves a different part of the problem.
Layer 1: Persistent Disk Memory
The memory/ directory is your agent's long-term storage. Unlike the context window (which vanishes when a conversation ends), disk memory persists across sessions. This is where agents write down everything they learn.
The directory structure looks like this:
memory/
├── context/ # Active task state
│ ├── active-tasks.md
│ ├── recent-changes.jsonl
│ └── codebase-map.md
├── knowledge/ # Indexed facts and patterns
│ ├── best-practices.md
│ ├── detected-patterns.jsonl
│ └── api-signatures.md
├── decisions/ # Architecture decisions (ADRs)
│ ├── ADR-001-memory-structure.md
│ ├── ADR-002-implementation-agents.md
│ └── INDEX.md
├── research/ # Investigation results
│ ├── findings.md
│ └── competitive-analysis.md
├── sessions/ # Session checkpoints
│ ├── [uuid].json # Conversation snapshot
│ └── index.md
├── validation/ # Test results, quality gates
│ ├── test-results.jsonl
│ └── coverage-report.md
└── hooks-log.jsonl # Audit trail of all operations
When an agent completes a phase of work, it writes findings to disk. Those findings survive the conversation. Here's what that looks like:
# memory/knowledge/detected-patterns.jsonl
{
"timestamp": "2026-03-16T14:22:00Z",
"agent": "prose-generator",
"pattern": "dialogue-hooks",
"description": "Character dialogue uses present-tense verbs to convey urgency",
"examples": ["Chapter 3, line 45", "Chapter 5, line 122"],
"confidence": 0.92,
}This persists even if the agent's context window fills. The next session, the agent (or a different agent) can read this file and instantly absorb the pattern without re-discovering it. It's like having a notebook that survives between conversations.
Layer 2: Session Checkpoints
When context approaches the limit, Claude Code creates a session checkpoint—a JSON snapshot of the current conversation state. This is like taking a save-game screenshot before the system runs out of memory.
# memory/sessions/{uuid}.json
{
"session_id": "23135046-9cbd-4204-a01f-2a6cf845389b",
"timestamp": "2026-03-16T14:45:00Z",
"agent": "style-enforcer",
"task": "Validate prose against fiction style guide",
"progress": {
"files_checked": 12,
"violations_found": 3,
"current_file": "chapters/chapter-05.md"
},
"state": {
"violations": [
{
"file": "chapters/chapter-03.md",
"line": 45,
"rule": "filter-words",
"violation": "just",
"suggestion": "Remove 'just' for directness"
}
],
"next_action": "Continue with chapter-06.md",
"context_used": 187432,
"context_remaining": 12568
},
"resumable": true
}
The agent can then load this checkpoint and resume where it left off:
# Agent resumes by reading the checkpoint
CKPT=memory/sessions/23135046-9cbd-4204-a01f-2a6cf845389b.json
jq -r '.progress.current_file' "$CKPT"   # chapters/chapter-05.md
jq -r '.state.next_action' "$CKPT"       # Continue with chapter-06.md
jq '.state.violations' "$CKPT"           # violations already found
# Continue from chapter-06 instead of restarting
No lost work. No repeated analysis. The agent picks up mid-thought. It's like handing a colleague a note: "Here's where I left off. Keep going from here."
Layer 3: In-Session Context Compression
While an agent is working, Claude Code monitors context usage. When you approach 80-90% of the window, the /compact command kicks in automatically (or you can invoke it manually):
/compact strategy=summary max_tokens=50000
This command:
- Identifies key information from the full conversation
- Removes redundant details (did we really need to see that full test output twice?)
- Creates a concise summary of what's happened so far
- Rewrites the conversation history to a compressed form
Here's what compression looks like in practice. The dramatic difference shows why this matters:
Before compression (180K tokens of full history):
## Full Conversation History
[User]: Read the authentication module
[Agent]: I've read src/auth.js. Here's the full file: <4,000 words>
[User]: Now check if there are security issues
[Agent]: I found 3 issues... <detailed analysis>
...
[100+ more messages, including repeated file reads, debug outputs, etc.]
After compression (35K tokens):
## Compressed Summary
**Task**: Audit authentication module for security issues
**Key Findings**:
- File: src/auth.js
- Issues Found: 3 critical, 2 medium
1. Passwords stored in plaintext (line 45)
2. Missing CSRF token validation (line 78)
3. Session timeout not enforced (line 120)
**Current Status**: Ready to propose fixes
**Context Preserved**: All critical facts, findings, and next steps
---
## Key Decision Points
- Approached context limit at 187K tokens
- Compressed to preserve task continuity
- Ready to resume work without information loss
The agent continues working with 140K-150K of context available again, instead of limping along with only 12K remaining. It's not a perfect process—you lose some detailed history—but it's better than losing everything.
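The exact compaction algorithm is internal to Claude Code, but the general shape is easy to sketch. This illustrative Python function (names and thresholds are invented for the example) keeps the most recent turns verbatim and collapses older tool outputs into one-line placeholders:

```python
def compact(messages, keep_last=5):
    # Keep the most recent turns verbatim; collapse older tool
    # outputs into short placeholders so key facts stay readable.
    compacted = []
    cutoff = len(messages) - keep_last
    for i, msg in enumerate(messages):
        if msg["role"] == "tool" and i < cutoff:
            compacted.append({
                "role": "tool",
                "content": f"[tool output elided: {len(msg['content'])} chars]",
            })
        else:
            compacted.append(msg)
    return compacted
```

A real implementation would also summarize prose turns and preserve decision points, but even this crude version recovers most of the window, since tool outputs dominate token usage.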
Injecting Persistent Context with CLAUDE.md
Here's where it gets powerful. When Claude Code creates a subagent, it automatically loads CLAUDE.md into the agent's context. This file contains your project's configuration, conventions, and knowledge—everything the agent needs to understand your codebase before it even starts working.
Here's how it works:
The CLAUDE.md Structure
Your CLAUDE.md file sits at the repository root and acts as a context injection manifest. It's like handing a new employee the employee handbook before their first day:
# CLAUDE.md - Project Configuration
## System Overview
You are a story authoring system that combines creative writing expertise with narrative development.
## Project Structure
/ # Repository root
├── .claude/ # Configuration
├── projects/ # Project workspaces
├── memory/ # System memory
└── standards/ # Style guides
## Story Development Modes
- **Discovery Mode**: Exploratory writing
- **Structured Mode**: Following outlines
- **Revision Mode**: Editing existing content
## Quality Checks
Before delivering content:
1. Consistency with established facts
2. Character voice authenticity
3. Pacing and narrative flow
4. Conflict and tension
5. Scene purpose and progression
When a subagent starts, this file is automatically loaded at the beginning of its context, before any conversation history. The agent immediately understands:
- Your project's purpose and structure
- Development patterns and conventions
- Quality standards it should follow
- Available tools and commands
- Validation rules and gates
- Technology stack and best practices
Why This Matters
Without CLAUDE.md, every agent would need to ask: "What's this project about? How are files organized? What's the style guide?"
With it, agents start context-aware. They understand the landscape instantly. They can make better decisions because they have the blueprint. It's the difference between a new hire who's been briefed versus one who's walking in cold.
For a large project, CLAUDE.md might be 5K-10K tokens, but it saves 50K+ tokens across multiple agents because nobody's asking basic questions or re-discovering patterns. It's an upfront investment that pays compound dividends.
Memory Directory Patterns for Cross-Session Knowledge
The memory/ directory isn't just a file dump. It's an indexed knowledge base. Here's how agents use it effectively:
Pattern 1: Write Findings, Index Them
When an agent discovers something useful, it writes to disk and creates an index entry:
# Agent discovers a pattern
echo '{
"timestamp": "2026-03-16T14:22:00Z",
"pattern_id": "p-001",
"pattern": "dialogue-urgency-markers",
"description": "Character dialogue uses present-tense verbs and short sentences when urgent",
"evidence": ["Chapter 3, L45", "Chapter 5, L122"],
"confidence": 0.92
}' >> memory/knowledge/detected-patterns.jsonl
# Agent updates index
echo "- p-001: dialogue-urgency-markers (confidence: 0.92)" >> memory/knowledge/INDEX.mdLater, any agent can query the index efficiently:
grep "dialogue" memory/knowledge/INDEX.md
# Output: p-001: dialogue-urgency-markers (confidence: 0.92)
# Then read the full pattern
jq 'select(.pattern_id == "p-001")' memory/knowledge/detected-patterns.jsonl
This two-level system (index + detail files) keeps lookups fast. You don't scan megabytes of data to find what you need.
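In Python terms, the two-level lookup looks like this (a sketch with invented data shapes, not a Claude Code API): scan the small index for a keyword, then load only the detail records whose IDs matched:

```python
def lookup(index_lines, detail_records, keyword):
    # Level 1: scan the small index for lines mentioning the keyword
    ids = {line.split(":")[0].strip("- ").strip()
           for line in index_lines if keyword in line}
    # Level 2: load only the detail records those IDs point to
    return [r for r in detail_records if r["pattern_id"] in ids]

index = ["- p-001: dialogue-urgency-markers (confidence: 0.92)",
         "- p-002: scene-transition-rhythm (confidence: 0.87)"]
details = [{"pattern_id": "p-001", "pattern": "dialogue-urgency-markers"},
           {"pattern_id": "p-002", "pattern": "scene-transition-rhythm"}]
print(lookup(index, details, "dialogue"))
```

The index stays small enough to scan on every query; the detail files can grow without slowing lookups down.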
Pattern 2: Decision Records as Context
When agents make architectural decisions, they document them as ADRs (Architecture Decision Records):
# memory/decisions/ADR-005-context-compression-strategy.md
# ADR-005: Context Compression Strategy for Long-Running Agents
## Context
Long-running agents exceed context limits during multi-hour tasks. We need a strategy to maintain continuity.
## Decision
Use three-phase compression:
1. Remove redundant outputs (repeated file reads)
2. Summarize findings into bullets
3. Preserve decision points and state
## Consequences
- Agents can run indefinitely without manual intervention
- Some detailed history is lost (acceptable tradeoff)
- Faster resume times
- Reduced token usage by ~40%
## Related Decisions
- ADR-003: Session Checkpoints
Other agents read this and understand why a decision was made, not just what was decided. This prevents agents from undoing each other's work. It's a decision log that travels across time.
Pattern 3: Task State Tracking
Active tasks are tracked in a living document:
# memory/context/active-tasks.md
## Active Task: Prose Validation Pass
- **Agent**: style-enforcer
- **Status**: In Progress (75% complete)
- **Progress**: 12/16 chapters validated
- **Next Step**: Validate chapters 13-16 for filter words
- **Blockers**: None
- **Last Updated**: 2026-03-16 14:30:00
## Completed Task: World-Building Review
- **Agent**: world-builder
- **Status**: Complete
- **Output**: memory/knowledge/worldbuilding-patterns.md
- **Findings**: 7 inconsistencies in magic system, all documented
- **Last Updated**: 2026-03-16 12:15:00
When you spawn a new agent, it reads this file and understands what's already been done. No duplicate work. No redundant analysis. Multiple agents coordinate through shared state files.
Context Compression Strategies
As you design agents, you control how aggressively context is compressed. Different tasks call for different strategies.
Strategy 1: Minimal Compression (Detailed Work)
For tasks where nuance matters (editing dialogue, analyzing character arcs), use minimal compression:
/compact strategy=minimal max_tokens=30000
This keeps 70K+ tokens of conversation history intact. The agent remembers detailed discussions about subtle choices. It's good for:
- Detailed editing work
- Quality assurance passes
- Complex decision-making
- Writing where tone and nuance are critical
- Iterative problem-solving
Tradeoff: Less context available for new work. You're trading future capacity for historical detail.
Strategy 2: Aggressive Compression (Bulk Processing)
For tasks where you just need the results (running tests across 100 files, validating syntax), compress aggressively:
/compact strategy=aggressive max_tokens=10000
This compresses the history to just the essentials: what we're doing, key findings, next steps. Most detailed debugging notes get dropped.
Tradeoff: You lose detailed debugging history, but you get 100K+ tokens for new work.
Good for:
- Bulk file operations
- Test suites (you care about pass/fail, not the full log)
- Validation passes
- Processing large file sets
- High-volume, low-complexity tasks
Strategy 3: Checkpoint-Driven (Multi-Phase Tasks)
For tasks with natural phase breaks, use checkpoints instead of compression:
# Phase 1: Analyze structure
/checkpoint save analysis-phase
# Phase 2: Implement changes
/checkpoint restore analysis-phase
This creates explicit save points. You can pause, resume, or even branch to explore alternative approaches. Each phase starts fresh but has the previous phase's findings available.
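Under the hood, a checkpoint is just a JSON snapshot on disk, so the mechanism is easy to approximate yourself. A minimal sketch, assuming a file layout that mirrors the memory/sessions/ examples above:

```python
import json
from pathlib import Path

def save_checkpoint(directory, name, state):
    # Persist a resumable snapshot, in the spirit of memory/sessions/*.json
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps(state, indent=2))
    return path

def load_checkpoint(directory, name):
    # Restore exactly what was saved; the caller resumes
    # from state["next_action"] instead of re-analyzing.
    return json.loads((Path(directory) / f"{name}.json").read_text())
```

A phase-2 agent would call load_checkpoint first, read next_action, and pick up where phase 1 stopped.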
Context Overflow: What Happens and How to Handle It
Despite your best efforts, context sometimes fills up completely. Here's what happens and how to recover.
The Overflow Moment
Your agent is working. It makes a tool call. The output is massive (e.g., running a test suite with 500 tests).
Before: 140K tokens used, 60K available
Tool Output: 85K tokens
After: Would need 225K tokens (exceeds 200K limit)
Claude Code detects this and pauses the agent. It writes a checkpoint and returns an error:
ERROR: Context overflow at step 47
- Context used: 225,000 tokens
- Context limit: 200,000 tokens
- Deficit: 25,000 tokens
Checkpoint saved: memory/sessions/{uuid}.json
Next action: Compress context and resume
Run: /compact strategy=aggressive
Then: /resume
You now have options for recovery:
Option 1: Aggressive Compression and Resume
/compact strategy=aggressive max_tokens=15000
/resume
The agent discards detailed history, keeps the essentials, and resumes with 140K+ context available. Fastest recovery.
Option 2: Break into Subphases
# Save current progress
/checkpoint save phase-1-complete
# Spawn a fresh agent for phase 2
/dispatch prose-generator phase-2-rewrite --resume-from phase-1-complete
The fresh agent reads the checkpoint and continues where the previous one left off. Good for long tasks that need multiple agents.
Option 3: Archive and Start Fresh
# If the task is stalled and not worth recovering
/archive current-session reason="Exploratory dead-end"
/story-craft restart
Rarely needed, but sometimes it's faster to start fresh than to debug a context-constrained agent.
Avoiding Context Overflow: Best Practices
Here's how experienced Claude Code operators prevent context overflow before it happens.
Practice 1: Scope Work into Focused Phases
Instead of one mega-task, break it into phases:
# GOOD: Scoped phases with clear boundaries
Phase 1: Analyze current state (2K token output)
Phase 2: Propose changes (5K token output)
Phase 3: Implement changes (3K token output)
Phase 4: Validate changes (2K token output)
# BAD: One open-ended task
"Here's a 400-page manuscript, analyze everything and suggest revisions"Scoped phases naturally limit context consumption. Each phase is small enough to fit in the window with room to spare.
Practice 2: Write Findings to Disk Early
Don't wait until context fills. Write findings as you go:
# As agent discovers something
echo "Found 3 filter-word violations in chapter 3" >> memory/context/findings.md
# As agent makes a decision
echo "Decided to use present-tense dialogue in action scenes" >> memory/decisions/current.md
# As agent progresses
echo "Validated chapters 1-5, moving to chapter 6" >> memory/context/active-tasks.mdThis way, even if context fills, your work is safe on disk. It's like saving frequently in a video game.
Practice 3: Use Focused Instructions
Instead of generic instructions, give agents specific, bounded tasks:
# GOOD: Focused task with clear exit criteria
Task: Validate prose in chapters/chapter-03.md
- Check for filter words (found in standards/fiction_style_guide.md)
- Check for passive voice in action scenes
- Exit when file is validated or 3 issues found
# BAD: Open-ended, context-hungry task
Task: Validate all prose in the manuscript
- Check everything against the style guide
- Suggest improvements
- ... (no natural exit point)Focused tasks consume less context because they have clear boundaries. The agent knows when to stop.
Practice 4: Set Context Budgets
Before spawning an agent, decide: "How much context can I afford to use for this task?"
/dispatch style-enforcer validate-prose \
--task-description "Validate chapter-03.md for filter words" \
--context-budget 30000 \
--compression-strategy minimal
If the agent exceeds the budget, it automatically triggers compression or a checkpoint. It's like giving the agent a token allowance.
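You can reason about budgets with a small guard like this. It's an illustrative sketch, not a real Claude Code API; the 80% trigger mirrors the compression threshold described earlier:

```python
class ContextBudget:
    # Tracks token spend against a per-task budget and signals
    # when it's time to compress or checkpoint.
    def __init__(self, budget_tokens, trigger=0.8):
        self.budget = budget_tokens
        self.trigger = trigger
        self.used = 0

    def charge(self, tokens):
        # Record usage; return True once spend crosses the trigger line.
        self.used += tokens
        return self.used >= self.trigger * self.budget
```

With a 30K budget, compression fires once roughly 24K tokens have been consumed, leaving headroom for the compression step itself.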
Advanced Pattern: Focused Agent Instructions
The most powerful context management technique is focused instructions. Instead of loading all your project knowledge into every agent, you load only what that specific agent needs.
Here's an example. Instead of a 10K CLAUDE.md, the agent gets a 3K specialization:
# Agent: dialogue-crafter (3K token specialization)
You are the dialogue-crafter agent. Your role is to write character conversations.
## Your Expertise
- Character voice authenticity
- Subtext and unspoken conflict
- Dialogue tags and beats
- Conversation pacing
## What You Ignore
- Prose description (other agents handle this)
- Plot structure (handled by story-architect)
- Grammar/mechanics (handled by style-enforcer)
## Your Tool: Write Dialogue
Input:
- Character names and current emotional states
- Conversation goal
- Setting/context
Output:
- Dialogue draft with subtext notes
- Stage directions
- Estimated scene duration
## Success Criteria
- Dialogue reflects each character's unique voice
- Subtext is clear to readers
- Goal of conversation is achieved
- Pacing feels natural
## Your Limits
- Never write narrative prose
- Never change established character traits
- Never introduce new plot elements
- When in doubt, ask the story-architect agent
This agent has a 3K instruction set, plus 5K of relevant context it loads on demand. Instead of loading the entire CLAUDE.md project file (10K tokens), it loads only its specialty. Result: 50% more available tokens for actual work.
Real-World Example: Multi-Chapter Validation Pass
Let's walk through a real scenario: validating 31 chapters of a novel for style consistency. This is exactly the kind of task where context management makes the difference between success and failure.
Phase 1: Setup and Discovery (Agent: style-enforcer)
Agent starts with:
- CLAUDE.md loaded (8K tokens)
- Focused agent instructions (3K tokens)
- memory/knowledge/fiction_style_guide.md loaded (5K tokens)
- Context available: ~184K tokens
Task: Validate chapters 1-5, create violation registry
Progress:
- Reads chapter-01.md (2K tokens consumed)
- Analyzes for filter words, passive voice (3K tokens for analysis)
- Writes findings: memory/knowledge/violations.jsonl (persisted to disk)
- Moves to chapter-02.md (2K tokens consumed)
- ... continues for chapters 3-5
After validating chapters 1-5:
- Context used: 45K tokens
- Findings on disk: 12 violations found
- Ready for next phase
Phase 2: Scale-Out Validation (Multiple agents, parallel)
Agent A (chapters 6-11)
Agent B (chapters 12-17)
Agent C (chapters 18-24)
Agent D (chapters 25-31)
Each agent:
- Reads its own CLAUDE.md (8K tokens)
- Reads shared violations registry from disk (1K tokens)
- Knows what issues Agent 1 found (from memory/knowledge/)
- Validates its chapters
- Writes violations back to shared registry
Because they're isolated in worktrees, they don't interfere.
Parallel execution: 4 agents validate all 31 chapters simultaneously.
Phase 3: Aggregation (Agent: manuscript-assembler)
Agent starts with:
- All violations from memory/knowledge/violations.jsonl (2K tokens)
- Summary of findings from each agent (3K tokens from checkpoint files)
- Task: Create violation report and summary
Output:
- violations_summary.md: Organized by chapter, by violation type
- recommendations.md: Suggested fixes
- next_steps.md: What to do next
Total context used: 50K tokens (fresh agent can afford this)
Task complete.
Total tokens consumed: ~200K (across all agents, across 3 phases)
Total time: ~30 minutes (parallel execution)
Key insight: By writing findings to disk immediately, agents don't duplicate work. Each agent validates only its chapters, once.
This is impossible without the memory system. With a simple sequential approach, the second agent would re-discover what the first found. With proper memory management, they coordinate through shared state.
Memory Index Files and Discovery
One challenge with disk-based memory is discoverability. You write findings to memory/knowledge/, but how does an agent know what files exist? How does it efficiently find the right information without scanning every file?
Claude Code uses index files to solve this. Think of them like a card catalog in a library:
# memory/knowledge/INDEX.md
## Knowledge Base Index
### Patterns Detected
- p-001: dialogue-urgency-markers (confidence: 0.92)
File: detected-patterns.jsonl
Updated: 2026-03-16 14:22:00
- p-002: scene-transition-rhythm (confidence: 0.87)
File: detected-patterns.jsonl
Updated: 2026-03-16 15:44:00
### Best Practices
- bp-001: show-dont-tell in action scenes
- bp-002: character-voice-consistency
- bp-003: pacing-in-dialogue
### Technical Decisions
- ADR-001: Memory structure design
- ADR-002: Implementation patterns
- ADR-003: Session checkpoint format
### Research Completed
- competitive-analysis (2026-01-15)
- market-trends (2026-02-10)
- best-practices-research (2026-03-01)
When an agent needs to find something, it reads the index first:
# Agent needs to validate dialogue style
grep "dialogue" memory/knowledge/INDEX.md
# Output: p-001: dialogue-urgency-markers
# Agent loads just that pattern file
jq 'select(.pattern_id == "p-001")' memory/knowledge/detected-patterns.jsonl
This is vastly more efficient than scanning all files. The index acts as a catalog, making memory discoverable and efficient. A simple index file prevents expensive full-directory scans.
Building Your Index
Indexes are maintained as you work, by you or by your agents. Here's how:
# As your style-enforcer agent discovers a pattern
echo "- p-003: passive-voice-in-action (confidence: 0.95)" >> memory/knowledge/INDEX.md
# Create the actual pattern file
echo '{
"pattern_id": "p-003",
"pattern": "passive-voice-in-action",
"rule": "Use active voice in action scenes for urgency",
"examples": ["Chap 3, Line 12", "Chap 5, Line 89"],
"confidence": 0.95
}' >> memory/knowledge/detected-patterns.jsonl
Well-maintained indexes mean agents can find critical context in milliseconds instead of scanning megabytes of files. It's a small investment that pays big dividends when you have many agents and lots of history.
Context Window vs. Token Budget Trade-offs
When configuring agents, you face a fundamental trade-off: context window size vs. work scope. Understanding this trade-off helps you design efficient agents.
Larger context windows mean more room for history and examples, but they also mean more overhead. Smaller, focused windows mean you can fit fewer details, but agents stay sharp and focused.
Here's how to think about it:
High-Context Agents (180K-200K available)
Good for: Complex decisions, detailed editing, quality assurance
Agent: prose-reviewer
Context Strategy: High (minimize compression)
Task: Deep edit of chapter 5
Context allocation:
- CLAUDE.md: 8K
- Agent instructions: 5K
- Style guide: 8K
- Chapter 5 full text: 12K
- Previous editing history: 30K
- Conversation so far: 25K
- Available for work: 112K
This agent has room for detailed conversations about tone, nuance, and character voice. It can go back and forth discussing subtle edits without running out of space.
Low-Context Agents (60K-80K available)
Good for: Bulk processing, validation passes, high-volume tasks
Agent: violation-scanner
Context Strategy: Low (aggressive compression)
Task: Scan all 31 chapters for specific violations
Context allocation:
- CLAUDE.md: 3K (minimal)
- Agent instructions: 2K (highly focused)
- Style guide rules (just the relevant section): 2K
- Violations found so far: 3K
- Available for work: 70K
This agent is laser-focused. It scans files fast, records violations, and doesn't maintain long conversation history. High throughput, low overhead.
Medium-Context Agents (120K-150K available)
Good for: Most general tasks, mixed work, iterative problem-solving
Agent: test-engineer
Context Strategy: Medium (selective compression)
Task: Run test suite, debug failures, propose fixes
Context allocation:
- CLAUDE.md: 5K
- Agent instructions: 3K
- Test framework reference: 4K
- Current test output: 20K
- Previous debug attempts: 15K
- Conversation: 20K
- Available for work: 83K
This agent has room for iteration but isn't swimming in history. It can explore options and debug problems without getting bogged down.
Choosing Your Strategy
Ask yourself when designing an agent:
- Is this a one-off task or recurring work? (Recurring → lower context, save to memory)
- Do I need detailed conversation? (Yes → higher context, minimal compression)
- Am I processing high volume? (Yes → lower context, aggressive compression)
- Is this decision-heavy or action-heavy? (Decision → higher context, Action → lower context)
The best agents are right-sized for their task. An agent solving one focused problem with a 60K context window outperforms an agent trying to solve everything with a bloated 200K window.
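The four questions above can be collapsed into a small decision helper (illustrative only; the strategy names follow the /compact examples in this article, and the context ranges follow the agent profiles above):

```python
def choose_strategy(needs_detail, high_volume, recurring):
    # Detail-sensitive work keeps history; bulk or recurring work
    # compresses hard; everything else takes the middle road.
    if needs_detail:
        return ("minimal", "180K-200K context")
    if high_volume or recurring:
        return ("aggressive", "60K-80K context")
    return ("summary", "120K-150K context")
```

For example, a deep edit of one chapter maps to ("minimal", …), while a 31-chapter violation scan maps to ("aggressive", …).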
Monitoring Context Usage in Real-Time
Claude Code provides tools to monitor context consumption as agents work. Visibility is key to preventing overflow:
# Check current context status
/context status
# Output:
# Context Used: 145,000 / 200,000 (72.5%)
# Context Remaining: 55,000
# Compression Trigger: 160,000 (80%)
# Estimated steps remaining: 8-12
# View context consumption by component
/context breakdown
# Output:
# System Prompt: 8,000 (4%)
# Agent Instructions: 3,000 (1.5%)
# CLAUDE.md: 5,000 (2.5%)
# Conversation History: 85,000 (42.5%)
# Tool Outputs: 44,000 (22%)
# Working Space: 55,000 (27.5%)
This transparency lets you make informed decisions about compression, checkpointing, or phase breaks before you hit limits. You can see what's eating context and adjust accordingly.
Context Management Checklist
Before you spawn an agent for a long-running task, ask:
- Is CLAUDE.md loaded? (It should be, automatically)
- Did I write a focused agent instructions file? (3K instead of 10K is better)
- Did I break the task into phases? (Each phase should fit in 60K-100K context)
- Are findings written to disk? (Not just in conversation history)
- Is there a clear exit criterion? (When is this agent done?)
- Can I run parallel agents? (Worktrees for isolation?)
- What's my context budget? (How much context can this task afford?)
- Do I have a compression strategy? (What happens if we approach the limit?)
- Did I set up indexes for memory discovery? (Agents can find what they need)
- Am I monitoring context in real-time? (Using /context commands)
Answer these questions before spawning, and you'll rarely hit context overflow.
Memory Consolidation: When Sessions End
At the end of a day, a week, or a project phase, Claude Code runs memory consolidation. This is an automated process that takes the day's work and distills it into permanent knowledge.
Consolidation:
- Reviews all session checkpoints from the past day
- Extracts key insights (patterns, decisions, blockers)
- Consolidates into long-term memory (memory/knowledge, memory/decisions)
- Archives old sessions (memory/sessions/archive)
- Creates a session summary for your reference
Here's what consolidation does, step by step:
Step 1: Collect Session Data
# System finds all sessions from the last 24 hours
ls memory/sessions/*.json | grep -E "2026-03-1[5-6]"
# For each session, read the state
cat memory/sessions/23135046-9cbd-4204-a01f-2a6cf845389b.json | jq '.state'

Step 2: Extract Patterns and Decisions
# From session 23135046: style-enforcer agent
Findings:
- Filter words appear 3x per chapter on average
- Best fix: Remove in editing phase
- Tool words: "just", "really", "very"
Decision:
- Implement automated filter-word detection
- Run as part of pre-publication validation
Pattern:
- Dialogue urgency correlates with sentence length
- Short sentences = high tension
- Long sentences = calm reflection

Step 3: Write to Long-Term Memory
# Pattern gets written to memory/knowledge/
echo '{
"source_sessions": ["23135046", "9cbd4204", ...],
"consolidated_date": "2026-03-16",
"pattern": "filter-words-prevalence",
"finding": "Average 3 filter-word violations per chapter",
"recommendation": "Implement automated scanning before publication"
}' >> memory/knowledge/consolidated-findings.jsonl
# Decision gets written to memory/decisions/
echo '# ADR-006: Automated Filter-Word Detection
## Context
Sessions 23135046, 9cbd4204, etc. discovered filter words ("just", "really", "very")
appear ~3x per chapter, causing narrative weakness.
## Decision
Implement automated filter-word detection as pre-publication gate.
## Implementation
See memory/knowledge/consolidated-findings.jsonl (2026-03-16)
' > memory/decisions/ADR-006-filter-word-automation.md

Step 4: Archive Old Sessions
Sessions older than 30 days are archived:
# Old session gets moved to archive
mv memory/sessions/old-uuid.json memory/sessions/archive/2026-01-{date}-old-uuid.json
# Index is updated
echo "2026-01-15: session-uuid-a (prose-generator, complete)" >> memory/sessions/archive/INDEX.md

The next time you need details from that old session, it's in the archive. But your active memory stays lean.
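The four consolidation steps can be sketched end to end. This is an illustrative reimplementation, assuming the article's `memory/sessions` and `memory/knowledge` layout and a `findings` field in each session file; the actual pattern extraction is model-driven, so it is stubbed here:

```python
import json
import time
from pathlib import Path

def consolidate(memory="memory", day_seconds=86_400, archive_after_days=30):
    """Sketch of the consolidation pass: collect, extract, persist, archive."""
    mem = Path(memory)
    sessions = mem / "sessions"
    archive = sessions / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    (mem / "knowledge").mkdir(parents=True, exist_ok=True)
    now = time.time()

    recent, findings = [], []
    for f in sessions.glob("*.json"):
        age = now - f.stat().st_mtime
        if age <= day_seconds:                          # Step 1: last 24 hours
            state = json.loads(f.read_text())
            recent.append(f.stem)
            findings.extend(state.get("findings", []))  # Step 2 (extraction stubbed)
        elif age > archive_after_days * day_seconds:    # Step 4: archive old sessions
            f.rename(archive / f.name)

    if findings:                                        # Step 3: write long-term memory
        record = {"source_sessions": recent, "findings": findings}
        with open(mem / "knowledge" / "consolidated-findings.jsonl", "a") as out:
            out.write(json.dumps(record) + "\n")
    return recent
```

Running it daily keeps `memory/sessions` lean while `memory/knowledge` accumulates.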
Why Consolidation Matters
Without consolidation, your memory/ directory would grow indefinitely. With it:
- Active memory stays efficient (current sessions only)
- Knowledge accumulates (patterns and decisions from many sessions)
- History is preserved (archived, but available)
- Insights compound (patterns from 10 sessions get consolidated into 1 actionable decision)
Consolidation is usually automatic (runs on /improve-loop or at end-of-day), but you can trigger it manually:
/consolidate period=24h target=knowledge
# Consolidates all sessions from the last 24 hours into long-term knowledge

Advanced Technique: Cross-Agent Context Sharing
Here's a power move: multiple agents sharing context asynchronously. They work in parallel but learn from each other through shared disk memory.
Imagine you have:
- Agent A: Currently writing chapter 3
- Agent B: Waiting to edit chapter 2
- Agent C: Validating chapters 1 and 2
Without explicit sharing, each agent reinvents the wheel. With context sharing:
# Agent A (prose-generator) discovers a pattern
echo '{
"pattern_id": "p-004",
"pattern": "scene-opening-rhythm",
"description": "Scene openings feel strongest with concrete sensory detail",
"discovered_by": "prose-generator",
"timestamp": "2026-03-16T14:22:00Z",
"broadcast": true
}' >> memory/knowledge/shared-patterns.jsonl
# Agent B (prose-editor) reads the shared patterns on startup
cat memory/knowledge/shared-patterns.jsonl | jq 'select(.broadcast == true)'
# Agent B incorporates this knowledge into its editing pass
# No re-discovery. No duplicate work. Pure leverage.
# Agent C (style-enforcer) also reads and applies the pattern

The key is the "broadcast": true flag. Patterns marked for broadcast are automatically loaded by any agent that starts after that pattern was discovered.
This creates a knowledge sharing network where agents teach each other asynchronously through disk memory. Agent A's discoveries become Agent B's starting assumptions.
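The broadcast mechanic is simple enough to sketch. These helper names are hypothetical, assuming the shared JSONL file from the example above:

```python
import json
from pathlib import Path

SHARED = Path("memory/knowledge/shared-patterns.jsonl")

def broadcast_pattern(pattern, path=SHARED):
    """Agent A appends a discovery to the shared file, flagged for broadcast."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a") as f:
        f.write(json.dumps({**pattern, "broadcast": True}) + "\n")

def load_broadcast_patterns(path=SHARED):
    """On startup, Agents B and C load every pattern marked for broadcast."""
    if not path.exists():
        return []
    return [rec for line in path.read_text().splitlines()
            if (rec := json.loads(line)).get("broadcast")]
```

Append-only JSONL is a deliberate choice here: concurrent agents can each append a line without coordinating locks, and readers always see complete records.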
Pitfall: Memory Bloat and Stale Knowledge
One risk with extensive disk memory: your memory directory becomes a junkyard. After many sessions and months of work, you might have:
- 50 old session checkpoints that aren't relevant anymore
- 20 patterns that were disproven
- 10 decisions that were reversed
- 100s of findings that are outdated
Without maintenance, memory becomes noise instead of signal. It stops helping.
How to avoid this:
- Archive regularly (consolidation does this, but monitor it)
- Review findings quarterly (update confidence scores, retire stale patterns)
- Consolidate across phases (when moving to a new novel chapter, consolidate the previous chapter's findings)
- Tag findings with confidence (high-confidence findings stay, low-confidence patterns get reviewed)
# In your findings files, include:
{
"pattern": "dialogue-urgency",
"confidence": 0.92, # High = keep
"last_validated": "2026-03-10",
"validation_count": 15, # Tested in 15 sessions
"status": "active" # vs "archived" or "stale"
}

Agents can then filter: "show me high-confidence, recently validated patterns." This keeps the signal-to-noise ratio high.
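That filter is a one-pass scan over the findings file. A sketch, assuming the JSONL field names from the example above:

```python
import json
from datetime import date, timedelta

def active_patterns(jsonl_lines, min_confidence=0.8, max_age_days=30, today=None):
    """Keep only high-confidence, recently validated, active findings."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    keep = []
    for line in jsonl_lines:
        p = json.loads(line)
        if (p.get("status") == "active"
                and p.get("confidence", 0) >= min_confidence
                and date.fromisoformat(p["last_validated"]) >= cutoff):
            keep.append(p)
    return keep
```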
Key Takeaways
Claude Code's memory and context management system is built on three core ideas:
- Persistent disk memory survives the conversation. Agents write findings to memory/, and those findings outlast any single agent session. Knowledge compounds across time.
- CLAUDE.md and focused instructions mean agents start smart. Instead of asking basic questions, they understand your project instantly and only load the knowledge they need.
- Context compression and checkpoints keep agents productive even in long-running tasks. When context fills, agents compress what matters and resume without losing work.
Together, these three layers mean you can spawn agents that work for hours or days, across multiple phases, in parallel—without losing critical context or repeating work.
That's how Claude Code scales individual agents into agent teams. And that's how you build reliable, self-aware automation that doesn't forget what it's doing halfway through.
Real-World Impact: When Context Management Saves the Day
To understand why this matters, let's look at a real scenario. A literary agency wants to analyze 500 manuscript submissions automatically. Each manuscript is 40,000-100,000 words. The task: extract key data about each one (genre, themes, character development quality, market viability assessment).
Without memory management, this is impossible. A single agent analyzing manuscripts would exhaust its context window after 2-3 manuscripts and start forgetting what it had learned. With proper memory management:
Agent A analyzes manuscripts 1-50, writes findings to disk. Mid-way through manuscript 25, its context approaches the limit. It triggers a checkpoint, summarizing its progress: "Analyzed 24 manuscripts, found 3 with exceptional character development, average word count 62,000 words."
Agent B reads that checkpoint and continues from manuscript 25. No restart. No re-reading the same manuscripts. No repeating the analysis.
By manuscript 75, both agents notice a pattern in their findings: manuscripts from authors in tech (obvious by the language and references) have faster-paced dialogue but weaker emotional development. They write this pattern to disk.
Agent C, starting at manuscript 100, reads this pattern immediately. It now analyzes manuscripts with awareness of that pattern. Instead of spending cycles discovering the same insight, it can test whether the pattern continues and refine it.
By the end, all 500 manuscripts are analyzed in about a quarter of the time it would have taken sequentially. Multiple agents work in parallel. Knowledge compounds across agents. The analysis actually gets better as more data comes in because patterns from early manuscripts inform later analysis.
This is the power of memory management done right. It's not just efficiency—it's quality. Your later work benefits from knowledge discovered in earlier work.
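The checkpoint handoff in that scenario boils down to a tiny read/write protocol. These function names and the checkpoint schema are illustrative, not a Claude Code API:

```python
import json
from pathlib import Path

def write_checkpoint(path, next_index, summary):
    """Agent A records where to resume and what it learned so far."""
    Path(path).write_text(json.dumps({
        "next_index": next_index,   # first unprocessed manuscript
        "summary": summary,         # compressed progress, not full history
    }))

def resume_from_checkpoint(path):
    """Agent B picks up exactly where Agent A stopped, with context intact."""
    state = json.loads(Path(path).read_text())
    return state["next_index"], state["summary"]
```

The point is that the summary, not the full transcript, crosses the agent boundary: Agent B starts with a few hundred tokens of distilled state instead of re-reading 24 manuscripts.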
Designing Agents for Memory-Aware Execution
When you design a new agent, think about its memory footprint. Will it need to retain detailed history, or can it summarize? Will it work in parallel with other agents, or sequentially?
A prose editor that makes iterative tweaks should use high-context strategy (keep conversation history, allow detailed back-and-forth about subtle choices). A script that validates 1,000 files against a style guide should use low-context strategy (aggressive compression, save findings to disk, move fast).
The key question: Will this agent need to refer back to previous work, or can it just record results and move on?
If the agent needs to refer back (iterative work, complex decision-making), keep context high. If it's just recording results (bulk validation, processing), use aggressive compression.
This isn't complicated, but it matters. Agents designed for their context strategy work smoothly. Agents forced into the wrong strategy fight against their constraints.
Memory System Best Practices
As you use Claude Code's memory system in production, these practices will help you avoid pitfalls:
1. Archive Old Sessions Regularly: Memory bloat kills efficiency. Run /consolidate weekly to move old session data to archives. Keep active memory lean.
2. Index Your Memory: Don't rely on grep and hope. Maintain an INDEX.md that catalogs what you know and where. When an agent needs to find something, it reads the index first, then loads just what it needs.
3. Version Your Decisions: Architecture Decision Records (ADRs) aren't just documentation. They're explanations that survive across time. When agents read ADRs, they understand not just what was decided, but why. This prevents them from undoing good decisions.
4. Use Timestamps Everywhere: When did you discover this pattern? When was the last time a finding was validated? Timestamps let agents filter: "show me patterns validated in the last week" vs. "show me patterns from months ago (probably stale)."
5. Validate Findings Before Persisting: Not every observation should become permanent knowledge. A pattern found once might be a fluke. A pattern found in 10 sessions is real. Only persist findings that have high confidence or multiple validations.
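Practices 4 and 5 combine into a small bookkeeping step whenever a pattern holds up in a new session. A sketch with hypothetical field semantics; the promotion threshold of 10 validations is an assumption, echoing the "found in 10 sessions is real" rule above:

```python
from datetime import date

def revalidate(pattern, promote_at=10, today=None):
    """Bump validation bookkeeping when a pattern holds in another session."""
    today = today or date.today()
    updated = dict(pattern)
    updated["validation_count"] = pattern.get("validation_count", 0) + 1
    updated["last_validated"] = today.isoformat()   # timestamps everywhere
    # Only persist well-tested patterns as permanent, active knowledge.
    updated["status"] = "active" if updated["validation_count"] >= promote_at else "candidate"
    return updated
```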
Monitoring Agent Health Through Memory
Your memory system is a window into agent health. If you notice:
- Agents repeating the same analysis: They're not finding previous findings in memory. Your memory index is incomplete or agents aren't checking it.
- Memory directory growing rapidly: Sessions aren't being consolidated. Run consolidation, or reduce session checkpointing frequency.
- Stale patterns being used: You're not pruning low-confidence findings. Set automatic archival for findings older than 30 days with low validation.
- Agents working sequentially when they could parallel: You're not using worktrees effectively. Agents should be able to work in parallel while sharing findings through disk memory.
These are all signals. Pay attention to them and adjust your memory strategy accordingly.
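The stale-pattern signal in particular has a mechanical fix. A sketch of the automatic archival rule mentioned above (findings older than 30 days with low validation), using the same illustrative field names as earlier examples:

```python
from datetime import date, timedelta

def archive_stale(findings, max_age_days=30, min_validations=5, today=None):
    """Mark old, under-validated findings as archived so agents skip them."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    out = []
    for f in findings:
        f = dict(f)  # don't mutate the caller's records
        too_old = date.fromisoformat(f["last_validated"]) < cutoff
        if too_old and f.get("validation_count", 0) < min_validations:
            f["status"] = "archived"
        out.append(f)
    return out
```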
The Philosophy: External Memory Over Internal Context
The deepest insight behind Claude Code's memory system is this: external memory scales better than internal context.
An agent can only hold so much in its context window. But it can write unlimited information to disk. The bottleneck isn't "how much can we remember?" but "how efficiently can we organize and discover what we've written?"
This shifts your optimization from "how can we fit more in the context window?" to "how can we organize memory so agents find what they need instantly?"
It's like the difference between a person trying to remember everything in their head (limited, stressful) versus a person with good notes and a solid filing system (can scale indefinitely).
Claude Code gives you the filing system. Your job is to organize it well.
Conclusion: Memory is the Multiplier
Context management alone doesn't make Claude Code powerful. But combined with the ability to spawn multiple agents that share knowledge through persistent memory, it becomes a multiplier.
One agent validates a few chapters. It writes patterns to disk. A second agent reads those patterns and validates more chapters more intelligently. A third agent aggregates findings from both. The result is more than the sum of the parts because each agent learns from the others.
That's not just efficiency. That's intelligence scaling. That's how you turn a tool that can analyze one manuscript into a system that can analyze 500 and get smarter as it goes.
-iNet