OpenClaw Bootstrap Files: AGENTS.md, IDENTITY.md, TOOLS.md, and HEARTBEAT.md Explained

You know that moment when you're spinning up a new system and realize you're making a hundred tiny decisions about "how should this thing behave?" That's exactly what OpenClaw's bootstrap files solve. Instead of scattering configuration across a dozen places, these four files become your workspace's source of truth—the actual constitution that governs how your agents think, act, and interact.
Here's the thing: most systems treat configuration as an afterthought. OpenClaw does the opposite. The bootstrap files are where it all starts. Before any execution happens, before any memory gets written, before any tools get called—these files define what's possible. They're not just settings. They're the operating system of your workspace. And they exist for a reason: because most teams discover too late that scattered configuration is a nightmare.
Think about it. You've seen systems where agent behavior lives in environment variables, other behavior in code comments, retry logic buried in source files, permissions scattered across multiple config servers. Then something breaks in production and nobody can tell you why without reading ten different files and hoping they're all in sync. OpenClaw's answer? Centralize. Version. Make it auditable. Make it readable. Make it the truth.
Let me walk you through what each one does and, more importantly, why it needs to exist as a separate concern.
Table of Contents
- The Bootstrap Files: A Quick Map
- AGENTS.md: Defining Agent Behavior and Defaults
  - The Structure
  - Key Fields Explained
  - Why This Matters: The Domain Knowledge Layer
- IDENTITY.md: The Personality and Presence Layer
  - What Lives Here
  - The Voice Layer
  - Personalization at Scale
- TOOLS.md: The Capability Registry and Permissions Layer
  - Structure and Permissions
  - The Permission Model
  - Rate Limiting and Quotas
- HEARTBEAT.md: The Execution Engine and Scheduling Configuration
  - Execution Windows and Scheduling
  - What Heartbeat Controls
  - Retry Logic and Resilience
- How They Work Together: The System as a Whole
- Best Practices: Working With Bootstrap Files
- The Philosophy: Why This Matters at Scale
- Conclusion: The Constitution of Your Workspace
- Related Reading
The Bootstrap Files: A Quick Map
Before we dive into the details, here's what you're working with:
- AGENTS.md — The central switchboard. This is where you define agent types, their capabilities, and their default behaviors.
- IDENTITY.md — The personality layer. This tells agents who they are, how they should present themselves, and what their relationship to users looks like.
- TOOLS.md — The capability registry. Which tools can do what, who's allowed to use them, and what the guardrails are.
- HEARTBEAT.md — The execution engine. Scheduling, retry logic, execution windows, and how the system keeps itself alive.
They work together as an integrated system. Change one, and you're potentially affecting the others. That's by design.
AGENTS.md: Defining Agent Behavior and Defaults
This is where the magic starts. AGENTS.md exists because you need a single source of truth for agent behavior. Without it, you'd define agent behavior scattered across code, environment variables, and command-line flags. You'd have no visibility into what you actually built. AGENTS.md changes that. It's your agent configuration layer—where you define what kind of agents exist in your system, what resources they need, how long they can run, and which model backs their reasoning.
The Structure
AGENTS.md typically follows a hierarchical pattern:
```yaml
agents:
  default:
    timeout: 300
    retries: 3
    model: claude-haiku-4-5
    context_window: 100000
  story-agents:
    parent: default
    timeout: 600
    model: claude-opus-4-6
    temperature: 0.8
    allowed_tools: [research, websearch, file_write]
  engineering-agents:
    parent: default
    timeout: 300
    model: claude-haiku-4-5
    temperature: 0.2
    allowed_tools: [bash, git, code_analysis]
  validation-agents:
    parent: default
    timeout: 180
    model: claude-haiku-4-5
    temperature: 0
    allowed_tools: [test_runner, bash]
```

See what's happening here? You've got a parent default configuration, and then each agent type inherits from it and overrides specific values. The story agents get a longer timeout (they're creative, they need thinking time). The engineering agents are fast and deterministic. Validation agents have zero temperature because they need to be... well, validly critical.
This inheritance pattern is intentional design. Instead of defining every agent from scratch, you establish a baseline (default) and then specialize. Want to create a new agent type? You don't duplicate the whole configuration. You reference default and override what's different. This scales beautifully—you can have dozens of agent types without configuration exploding into chaos.
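Here's a sketch of how that parent/child resolution could work. This isn't OpenClaw's actual loader, just the merge pattern: walk up the parent chain, then let child values win field by field.

```python
# Sketch: resolving an agent type against its parent chain.
# (Illustrative only; OpenClaw's real loader may differ.)

def resolve_agent(agents: dict, name: str) -> dict:
    """Merge an agent type over its parent chain, child values winning."""
    spec = dict(agents[name])          # copy so we don't mutate the config
    parent = spec.pop("parent", None)
    base = resolve_agent(agents, parent) if parent else {}
    return {**base, **spec}            # child overrides parent, field by field

agents = {
    "default": {"timeout": 300, "retries": 3, "model": "claude-haiku-4-5"},
    "story-agents": {"parent": "default", "timeout": 600,
                     "model": "claude-opus-4-6", "temperature": 0.8},
}

effective = resolve_agent(agents, "story-agents")
# Inherits retries from default; overrides timeout and model.
```

The nice property: adding a field to `default` instantly reaches every child that doesn't override it.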
Here's a real example of what that looks like in a moderately complex workspace:
```yaml
agents:
  default:
    timeout: 300
    retries: 3
    model: claude-haiku-4-5
    context_window: 100000
    log_level: info
    error_recovery: exponential_backoff
  story-agents:
    parent: default
    timeout: 600
    model: claude-opus-4-6
    temperature: 0.8
    allowed_tools: [research, websearch, file_write, image_gen]
    context_window: 200000
    streaming_enabled: true
  engineering-agents:
    parent: default
    timeout: 300
    temperature: 0.2
    allowed_tools: [bash, git, code_analysis, docker]
    context_window: 100000
    sandbox: enabled
  research-agents:
    parent: default
    timeout: 900
    model: claude-opus-4-6
    temperature: 0.5
    allowed_tools: [websearch, pdf_read, data_analysis]
    context_window: 150000
  security-agents:
    parent: default
    timeout: 180
    temperature: 0
    allowed_tools: [audit_logs, vulnerability_scanner, threat_analysis]
    context_window: 50000
    approval_required: true
```

Each agent type exists for a reason. You're not creating arbitrary categories. You're saying: "These tasks cluster naturally, and they need similar configuration."
Key Fields Explained
timeout — How long (in seconds) an agent gets to complete its task before OpenClaw kills it. Story agents: 600s. Code agents: 300s. This isn't arbitrary. It reflects how long these tasks naturally take.
retries — How many times the system tries to execute before giving up. Default is usually 3. For critical validation tasks, you might bump this to 5. For exploratory tasks, you might drop it to 1.
model — Which LLM model the agent uses. This is huge. You're trading off cost, speed, and capability. Haiku is fast and cheap. Opus is slower but more capable. Your choice here directly impacts your system's economics.
temperature — The "creativity knob" (0-1). Set to 0 for deterministic tasks (validation, code generation). Set higher (0.7-0.9) for creative tasks (writing, brainstorming). This isn't optional—it's mission-critical for agent behavior.
allowed_tools — The capability whitelist. Each agent type only gets access to tools that make sense for it. A validation agent shouldn't be able to write story files. A story agent shouldn't be able to run arbitrary bash commands. This is your safety layer.
Why This Matters: The Domain Knowledge Layer
Here's the real insight: AGENTS.md is where you encode domain knowledge about how different tasks should be handled. You're not just listing agents. You're building a taxonomy of problem types, and each category gets deliberate resource allocation.
Why does this matter? Because every LLM call costs money. Every timeout is a choice between responsiveness and cost. Every model selection is a trade-off between capability and speed. Without AGENTS.md, these choices are invisible. They hide in code. They get made inconsistently. One developer sets timeout to 100 seconds for a story agent. Another sets it to 1000. Nobody knows. You can't audit it. You can't improve it.
With AGENTS.md, every choice is visible and intentional. When you add a new agent type, you're making a conscious decision: "These tasks have enough in common that they should share configuration, model choice, and tool access." That decision ripples through your entire system. Each field—model choice, timeout, temperature, allowed tools—is a high-stakes decision that affects correctness, cost, and safety simultaneously. A single mistake in AGENTS.md configuration can cascade through dozens of agents, potentially causing expensive over-provisioning, security vulnerabilities, or degraded output quality.
Let me give you a concrete example of why this matters. Suppose you have a story-agent that was assigned a 600-second timeout because creative writing is slow. But you accidentally grant it bash tool access (maybe for file operations). Now, if a prompt injection happens and the LLM tries to exfiltrate data via bash, you've got 10 minutes for it to do damage. The long timeout that made sense for story writing now becomes a liability for security. This is why agent type definition is foundational—every field you set cascades into security, performance, and cost implications.
Similarly, temperature is non-obvious to newcomers. You might think "why not let every agent be creative?" But consider a data validation agent with temperature 0.8. It's supposed to check if a CSV matches a schema. High temperature means it might occasionally "hallucinate" that malformed rows are valid. That's a disaster for data integrity. Temperature isn't flavor. It's correctness policy.
The model choice is the same story. You might see Haiku and Opus and think they're just different price points. But Haiku was trained differently. It's faster at structured tasks, weaker at reasoning. Opus excels at multi-step reasoning, struggles with time pressure. Pick the wrong one, and your agent either becomes expensive (using Opus for simple tasks) or wrong (using Haiku for complex analysis). AGENTS.md is where you encode: "For this type of problem, we've tested, and Opus is worth the extra cost."
IDENTITY.md: The Personality and Presence Layer
If AGENTS.md is about what agents can do, IDENTITY.md is about who they are and how they present themselves. IDENTITY.md exists because you need consistent voice across your workspace without embedding voice instructions in code or duplicating prompts across dozens of agents.
Without it, you'd have a nightmare: one agent uses formal technical language, another uses casual tone, another is cold and professional. Users wouldn't know what to expect. Your workspace would feel incoherent. Worse, you'd be managing voice through prompt engineering scattered across different code files, making global changes impossible.
IDENTITY.md solves this by centralizing personality. This file is where your workspace gains personality. It's not just metadata. Every response your agents generate flows through their identity configuration. When a user interacts with your system, they're not interacting with a generic LLM. They're interacting with something that has a voice, expertise areas, and a consistent presence. That consistency emerges from IDENTITY.md.
What Lives Here
IDENTITY.md contains:
```yaml
workspace:
  name: "OpenClaw Instance"
  version: "1.0.0"

agent_identities:
  default_personality: |
    You are a helpful, clear-thinking AI agent designed to solve problems.
    You work in teams. You value evidence and iteration.
  story-writer:
    name: "Story Architect"
    role: "Creative narrative development specialist"
    voice: "Warm, encouraging, collaborative"
    expertise:
      - "Plot structure and character development"
      - "Prose quality and pacing"
      - "Thematic coherence"
    signing_off: "-iNet"
    communication_style: "narrative_focused"
    response_format: "prose_with_dialogue"
  code-reviewer:
    name: "Code Auditor"
    role: "Quality assurance and architecture validation"
    voice: "Direct, precise, educational"
    expertise:
      - "Code correctness and performance"
      - "Architecture patterns"
      - "Testing and documentation"
    signing_off: "—Code Auditor"
    communication_style: "technical_detailed"
    response_format: "structured_with_examples"
  data-analyst:
    name: "Data Interpreter"
    role: "Statistical analysis and insight generation"
    voice: "Analytical, cautious about claims, evidence-driven"
    expertise:
      - "Statistical methodology"
      - "Data visualization principles"
      - "Bias detection and mitigation"
    signing_off: "—Data Interpreter"
    communication_style: "findings_first"
    response_format: "summary_with_caveats"
```

Notice something? The agents aren't generic. They have names, roles, expertise areas, and communication styles. When a story agent responds, it's not just "here's your chapter edit." It's warm and encouraging because that's baked into its identity. When a code reviewer responds, it's not rambling—it's structured and precise because that's its identity.
The Voice Layer
This is critical: IDENTITY.md defines how agents communicate. The voice field isn't fluff. It's instruction. When you set voice: "Direct, precise, educational", that agent's every response is shaped by that.
This is why -iNet adopts a particular voice in responses. That voice—conversational, authoritative, casual interjections, problem-hook openings—comes from IDENTITY.md. It's not something the agent decides. It's part of the workspace constitution.
Personalization at Scale
Here's where it gets powerful: you can define multiple identities for the same agent type. In a large workspace, you might have:
- story-writer-academic — Formal, research-backed, technical precision
- story-writer-casual — Conversational, approachable, storytelling-first
- story-writer-technical — Precise, structured, systems-focused
Each one inherits the core expertise but adapts the voice. Users select which identity they want for a given task. OpenClaw keeps them consistent because the choice is tracked in IDENTITY.md.
Here's a practical scenario: you're working on documentation for a developer audience. You don't want warm and fuzzy. You want the story-writer-technical identity. But next week, you're writing content for a marketing team that needs narrative flow and accessibility. You switch to story-writer-casual. Same underlying agent type, same tools, same training—but the personality adapts to the audience. This scales because IDENTITY.md acts as a registry. When an agent spawns, the system looks up which identity it should use. The agent's system prompt gets injected with that identity's voice, expertise framing, and communication style. All from a single IDENTITY.md file.
The hidden benefit? Consistency across your entire system. If you define a voice once ("Warm, encouraging, collaborative"), every agent using that identity will embody it. You don't need to manually prompt-engineer every single agent. The identity is the prompt engineering—centralized, versioned, auditable.
This also makes it trivial to update how your agents present themselves. Need to sound more professional? Edit IDENTITY.md once. Every agent gets the change on next execution. Need to add a new expertise area? Add it to the identity definition. No code changes. No agent retraining. Just configuration.
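The lookup-and-inject step could be sketched like this. The registry shape mirrors the IDENTITY.md excerpt above, but the prompt template and function names are illustrative assumptions, not OpenClaw internals:

```python
# Sketch: building an agent's system prompt from its identity definition.
# The template below is an assumption; OpenClaw's real injection may differ.

IDENTITIES = {
    "code-reviewer": {
        "name": "Code Auditor",
        "role": "Quality assurance and architecture validation",
        "voice": "Direct, precise, educational",
        "expertise": ["Code correctness and performance",
                      "Architecture patterns"],
    },
}

def build_system_prompt(identity_key: str) -> str:
    ident = IDENTITIES[identity_key]   # registry lookup at spawn time
    expertise = "\n".join(f"- {e}" for e in ident["expertise"])
    return (
        f"You are {ident['name']}, a {ident['role']}.\n"
        f"Voice: {ident['voice']}.\n"
        f"Areas of expertise:\n{expertise}"
    )

prompt = build_system_prompt("code-reviewer")
```

Edit the registry entry once and every agent spawned with that identity picks up the change; that's the whole point of centralizing it.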
TOOLS.md: The Capability Registry and Permissions Layer
You've got agents with models and personalities. Now, what can they actually do?
TOOLS.md is your permission matrix. It's where you answer one of the most critical questions in system design: "What is each agent type allowed to do?" And more importantly, what are they forbidden from doing?
This file exists because without it, permissions get buried in agent code or runtime checks that are easy to miss. A story agent shouldn't be able to run arbitrary bash. A validation agent shouldn't be able to send emails. A debugging agent shouldn't have access to production secrets. Without centralized permissions, these guardrails get fragile. One developer forgets to check permissions. Now a story agent can bash. Another dev adds "temporary" access to a tool "just for testing." Months later, that temporary permission is still active in production.
TOOLS.md prevents this through explicit, centralized, auditable permission definition. Every tool is listed. Every restriction is visible. Every agent type's access is declared. You don't have to hunt through code to understand what's actually allowed. You read TOOLS.md and you know.
Structure and Permissions
```yaml
tools:
  bash:
    description: "Execute shell commands"
    risk_level: high
    allowed_for:
      - engineering-agents
      - validation-agents
    forbidden_for:
      - story-agents
    rate_limit: 10 per minute
    timeout: 60
    audit_log: true
  websearch:
    description: "Search the internet"
    risk_level: low
    allowed_for:
      - all
    rate_limit: 20 per hour
    cost_per_call: 0.05
    requires_citation: true
  file_write:
    description: "Write files to disk"
    risk_level: medium
    allowed_for:
      - story-agents
      - engineering-agents
    forbidden_for:
      - validation-agents
    forbidden_paths:
      - "/etc/*"
      - "/.git/*"
      - "/config/*"
      - "/secrets/*"
    allowed_patterns:
      - "projects/**/*.md"
      - "drafts/**/*"
    size_limit: 50MB
    audit_log: true
  git_push:
    description: "Push to remote repositories"
    risk_level: high
    allowed_for:
      - engineering-agents
    requires_approval: true
    approval_method: "human_review"
    restricted_branches:
      - "main"
      - "production"
    audit_log: comprehensive
  email_send:
    description: "Send emails on behalf of workspace"
    risk_level: high
    allowed_for:
      - admin-agents
    forbidden_for:
      - all_others
    requires_approval: true
    recipient_whitelist: true
    audit_log: comprehensive
```

See the architecture? Each tool has:
- risk_level — How dangerous is this? Bash is high. Websearch is low.
- allowed_for — Which agent types can use it?
- forbidden_for — Explicit blacklist (belt and suspenders).
- rate_limit — How often can it be called?
- forbidden_paths / allowed_patterns — Fine-grained file access control.
- requires_approval — Does a human need to sign off?
- audit_log — Should calls be logged?
- timeout — How long before we kill the execution?
- cost_per_call — Track spending per tool.
This level of detail isn't paranoia. It's necessary specification. Each field encodes a decision about safety, cost, and auditability.
The Permission Model
This is important: permissions flow from agent type to tool. An individual agent doesn't get special privileges. It operates within the boundaries set by its type.
Why? Consistency. Predictability. Auditability. When a story agent tries to run bash, it fails—not because we forgot to check, but because TOOLS.md says story agents don't have bash access. The denial is deterministic, logged, and foreseeable.
This design prevents a class of subtle bugs. Imagine you spin up a temporary debugging agent, forget to restrict its permissions, and it accidentally modifies production data. With TOOLS.md, that's harder. The default is deny. You have to explicitly grant access. And once granted, the permission is documented, reviewable, and auditable.
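A default-deny check along these lines is what the model implies. This is a sketch, not OpenClaw's implementation; the tool table mirrors the TOOLS.md excerpt above:

```python
# Sketch of a default-deny permission check: access must be explicitly
# granted, unknown tools are denied, and an explicit forbid always wins.

TOOLS = {
    "bash": {
        "allowed_for": ["engineering-agents", "validation-agents"],
        "forbidden_for": ["story-agents"],
    },
    "websearch": {"allowed_for": ["all"]},
}

def can_use(agent_type: str, tool: str) -> bool:
    spec = TOOLS.get(tool)
    if spec is None:
        return False                       # unknown tool: deny
    if agent_type in spec.get("forbidden_for", []):
        return False                       # explicit blacklist wins
    allowed = spec.get("allowed_for", [])
    return "all" in allowed or agent_type in allowed  # default deny
```

When a story agent calls bash, the denial falls out of the table, deterministically, not out of a check someone remembered to write.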
Rate Limiting and Quotas
OpenClaw uses TOOLS.md to enforce quotas. Each tool has a rate limit. When an agent hits the limit, it queues. The system respects those boundaries.
This is crucial for cost management (websearch costs money—$0.05 per call in the example), system health (you don't want one runaway agent hammering the file system at 1000 requests/second), and safety (preventing infinite loops where an agent keeps calling the same tool).
Think about this practically: if a story agent gets stuck in a loop calling file_write 100 times per second, and you didn't have a rate limit, you'd fill your disk and crash the system. With TOOLS.md rate limits in place, after 10 calls per minute (example), subsequent calls queue. The agent has to wait. You notice the problem. You kill the agent. System survives.
Rate limiting also creates a cost ceiling. If websearch is $0.05 per call and you limit to 20 per hour per agent, you know that agent can't cost more than $1/hour on websearch alone. Multiply by the number of agents, and you can predict your total spend. That's infrastructure as code, not "hope we don't go over budget."
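The queue-on-limit behavior could be enforced with a sliding-window counter like this (a sketch under stated assumptions; how OpenClaw actually tracks calls isn't specified here):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding window: allow at most `limit` calls per `window_s` seconds."""
    def __init__(self, limit: int, window_s: float):
        self.limit, self.window_s = limit, window_s
        self.calls = deque()               # timestamps of recent calls

    def try_acquire(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()           # drop calls outside the window
        if len(self.calls) >= self.limit:
            return False                   # over the limit: caller must queue
        self.calls.append(now)
        return True

# Cost ceiling from the websearch example: 20 calls/hour at $0.05/call,
# so one agent can't spend more than about $1/hour on websearch.
ceiling_per_agent_hour = 20 * 0.05
```

Multiply that ceiling by the number of agents and you get the predictable spend the section describes.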
HEARTBEAT.md: The Execution Engine and Scheduling Configuration
The first three files define what the system is. HEARTBEAT.md defines when and how often it runs. It's the circulatory system of OpenClaw—the rhythm that keeps everything alive and synchronized.
This file exists because execution timing matters as much as execution logic. Without centralized scheduling, you'd have cron jobs scattered across servers, scheduled tasks buried in code, retry logic embedded in agent implementations. You'd never know when something actually runs. You'd miss dependencies. You'd deploy code at the same time as scheduled tasks start, causing conflicts. Backups would interfere with real-time processing. Memory flushes would cause latency spikes.
HEARTBEAT.md centralizes all timing decisions. When does each task run? What's the concurrency limit? What happens on failure? All visible in one place. You can read it and understand exactly when your system is busy, when it's idle, and when it needs human intervention.
Execution Windows and Scheduling
```yaml
heartbeat:
  interval: 60                     # seconds between checks for new work
  max_concurrent_agents: 10
  startup_wait: 30                 # seconds before first execution
  graceful_shutdown_timeout: 300   # 5 minutes

  execution_windows:
    office_hours:
      start: "08:00"
      end: "18:00"
      timezone: "UTC"
      max_agents: 15
      priority: high
      cost_budget: 1000            # dollars per window
    off_hours:
      start: "18:00"
      end: "08:00"
      timezone: "UTC"
      max_agents: 5
      priority: low
      cost_budget: 200
      allowed_tasks:
        - maintenance
        - non_urgent_analysis

  blackout_windows:
    - name: "deployment_window"
      start: "22:00"
      end: "23:00"
      days: [Saturday]
      action: "pause_all"

  scheduled_tasks:
    memory_flush:
      interval: 3600               # 1 hour
      agent: validation-agents
      critical: true
    daily_summary:
      cron: "0 9 * * *"            # 9 AM daily
      agent: story-agents
      retries: 3
    backup:
      cron: "0 */6 * * *"          # every 6 hours
      agent: engineering-agents
      priority: high
      timeout: 1800
    credential_rotation:
      cron: "0 3 1 * *"            # 3 AM on the first of the month
      agent: security-agents
      requires_approval: false
      critical: true
```

What Heartbeat Controls
interval — How often OpenClaw wakes up to check for new work. 60 seconds is typical. Shorter intervals = more responsiveness but higher overhead. If you set it to 10 seconds, you're polling constantly; if you set it to 300, you might have 5-minute latency before new tasks start.
max_concurrent_agents — How many agents can run simultaneously. This is your concurrency ceiling. With 10, you can run up to 10 agents at the same time. This matters for resource management. If each agent uses 512 MB RAM, 10 concurrent agents = 5 GB of memory. Set it too high, you OOM. Set it too low, you underutilize.
execution_windows — Different rules for different times. During office hours, you allow more agents (15) and set a higher cost budget ($1000). After hours, you constrain (5 agents, $200 budget). This isn't arbitrary. It reflects business reality: during business hours, you're actively using the system and have more budget flexibility. After hours, you want only maintenance and lightweight analysis.
scheduled_tasks — Recurring jobs that need to happen. Memory flushes, daily summaries, backups—these are defined here, not scattered across code. Each task specifies: which agent type runs it, what the schedule is (interval or cron), and whether it's critical (if critical fails, alert the operator).
blackout_windows — Times when the system should pause. Deploying new code? Set a blackout window. You don't want agents running while you're mid-deployment. The action can be "pause_all" (wait for current tasks, don't start new ones) or "kill_all" (aggressive shutdown).
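One subtlety worth calling out: the off_hours window above wraps midnight, so a naive "start before now, now before end" test misclassifies it. A window-membership check might look like this (a sketch; the real scheduler's time handling isn't specified here):

```python
from datetime import time

def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside the window; handles midnight wrap-around."""
    if start <= end:
        return start <= now < end          # e.g. office_hours 08:00-18:00
    return now >= start or now < end       # e.g. off_hours 18:00-08:00

office = (time(8), time(18))
off = (time(18), time(8))
```

The same check decides blackout membership; pair it with the window's timezone before comparing.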
Retry Logic and Resilience
HEARTBEAT.md also defines how the system recovers from failure:
```yaml
retry_strategy:
  exponential_backoff:
    base_delay: 1                  # seconds
    max_delay: 3600                # 1 hour
    multiplier: 2
    jitter: true                   # add randomness to avoid thundering herd
  max_retries: 5
  retry_only_on: [timeout, transient_error, rate_limit]

circuit_breaker:
  failure_threshold: 5
  success_threshold: 2
  timeout: 300

failure_modes:
  timeout: "exponential_backoff"
  permission_denied: "immediate_fail"
  service_unavailable: "exponential_backoff"

degradation_policy:
  on_failure: "drop_lowest_priority_tasks"
  on_timeout: "alert_and_continue"
```

Exponential backoff is key: first retry at 1 second, then 2, then 4, then 8. If something's broken, you don't want to hammer it every second. You back off exponentially, giving the system time to recover. The jitter flag adds randomness—so if 100 agents all fail at the same time, they don't all retry at second 1, then all retry at second 2. That's the "thundering herd" problem. Jitter spreads the retries across the interval.
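In code, the backoff schedule is a one-liner plus a cap. The "full jitter" variant below (pick a random point up to the computed delay) is one common scheme; the config doesn't say which jitter scheme OpenClaw actually uses:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, multiplier: float = 2.0,
                  max_delay: float = 3600.0, jitter: bool = True) -> float:
    """Delay before retry number `attempt` (0-based), per the config above."""
    delay = min(base * multiplier ** attempt, max_delay)
    if jitter:
        # Full jitter: retries scatter across [0, delay) instead of
        # stampeding the failing service in lockstep.
        delay = random.uniform(0, delay)
    return delay

# Without jitter the schedule is deterministic: 1, 2, 4, 8, 16 seconds,
# capped at max_delay (1 hour) for very late attempts.
deterministic = [backoff_delay(a, jitter=False) for a in range(5)]
```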
Circuit breakers prevent cascading failures: if a tool fails 5 times in a row, we stop trying and alert the system. But here's the nuance: not all failures are equal. A permission_denied error should fail immediately (no retry, no circuit breaker—it'll never work). A transient_error might recover. The circuit breaker logic needs to distinguish.
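The breaker itself is a small state machine: closed (normal), open (stop calling, alert), and half-open (let a probe through after the cool-down). A sketch with the thresholds from the config above; the method names are assumptions:

```python
class CircuitBreaker:
    """Closed -> open after N consecutive failures; half-open after a
    cool-down; closed again after M consecutive successes."""
    def __init__(self, failure_threshold: int = 5, success_threshold: int = 2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = self.successes = 0
        self.state = "closed"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_failure(self) -> None:
        self.failures += 1
        self.successes = 0
        if self.failures >= self.failure_threshold:
            self.state = "open"            # stop trying; alert the operator

    def record_success(self) -> None:
        self.successes += 1
        self.failures = 0
        if self.state == "half_open" and self.successes >= self.success_threshold:
            self.state = "closed"

    def cooldown_elapsed(self) -> None:
        self.state = "half_open"           # let a probe request through
        self.successes = 0

cb = CircuitBreaker()
for _ in range(5):
    cb.record_failure()
# After 5 consecutive failures the breaker is open: further calls are blocked.
```

Note what this sketch deliberately doesn't do: a permission_denied error should never feed the failure counter at all, because per failure_modes it fails immediately with no retry.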
The degradation_policy is interesting. When the system is under stress, what happens? Do you queue everything? Do you drop lowest-priority tasks? You decide here. It's a choice about fairness and performance under load.
How They Work Together: The System as a Whole
Here's where it gets interesting—and here's where most teams get it wrong. These four files aren't just independent configuration files you manage separately. They're interdependent. They form a single system. Change one, and you're affecting the others. Understanding these dependencies is what separates teams that have coherent agent systems from teams that have configuration chaos.
AGENTS.md → IDENTITY.md: When you spawn an agent-type (defined in AGENTS.md), the system looks up the corresponding identity in IDENTITY.md. The identity's voice, expertise, and communication style get injected into the agent's system prompt. Change the agent's model in AGENTS.md, and the identity still applies—it just uses a different LLM backbone.
AGENTS.md → TOOLS.md: When AGENTS.md specifies allowed_tools: [bash, git], the system checks TOOLS.md to see what restrictions apply. Is bash high-risk? Does it require approval? The tool definitions from TOOLS.md gate what the agent can actually do.
TOOLS.md → HEARTBEAT.md: When a tool is marked requires_approval: true, HEARTBEAT.md's retry logic knows not to auto-retry failures. It waits for human review. If a tool has rate_limit: 10 per minute, HEARTBEAT's scheduling system respects that limit across all agents.
HEARTBEAT.md → AGENTS.md: When a task is scheduled in HEARTBEAT.md to run an agent type, it checks AGENTS.md for the agent's timeout and retry configuration. If the scheduled task is critical, HEARTBEAT.md might increase the max_retries or extend the execution window.
Change AGENTS.md's model choice from Haiku to Opus, and you're affecting cost (which feeds into HEARTBEAT.md's cost_budget). Make a tool high-risk in TOOLS.md, and suddenly HEARTBEAT.md might trigger approval workflows, which affects scheduling. Add a new execution window in HEARTBEAT.md, and you need to verify that all agent types defined in AGENTS.md can operate in that window with their allocated time budgets.
This is design intent. The bootstrap files are a system. You can't fully understand one without understanding the others. That's why you version them together, review them together, and test them together.
Best Practices: Working With Bootstrap Files
Here's what I've learned from watching teams succeed (and fail) with these files. This isn't theoretical. These are patterns from teams actually running production OpenClaw systems:
1. Version your bootstrap files. They're configuration, but they're also infrastructure-as-code. Treat them like source. Commit them. Review changes. Understand the diff before merging.
Put them in your git repo. Tag releases so you can correlate system behavior to specific configurations. If agent behavior changes unexpectedly, check the bootstrap files in that commit. Did someone increase temperature? Reduce timeout? Grant a new tool? The answer is in the git history.
2. Separate concerns. Don't put execution logic in AGENTS.md. Don't define personalities in TOOLS.md. Each file has a job. Do that job well.
AGENTS.md = capability and resource allocation. IDENTITY.md = communication and personality. TOOLS.md = permissions and safeguards. HEARTBEAT.md = execution rhythm. Cross-cutting concerns (logging, monitoring, auth) belong in dedicated files or a central config.
3. Inherit and override, don't duplicate. Use the parent-child pattern in AGENTS.md. Don't copy-paste configuration. When you need variation, override the specific field.
If you have 10 agent types and they all share the default timeout, don't repeat timeout: 300 in each one. Define it once in default, and each child inherits it. If an agent type needs a different timeout, override it. This scales. When you need to update the default timeout (say, from 300 to 400), you change it once, not in 10 places.
4. Document constraints. When you forbid a tool for an agent type, leave a comment explaining why. Future-you will thank you.
```yaml
story-agents:
  allowed_tools: [research, websearch, file_write]
  # Bash not allowed: story agents should not execute arbitrary commands.
  # Use case: prose generation, not system administration.
```

This is obvious when you write it. Six months later, when someone asks "why can't story agents run bash?", you'll have an answer ready.
5. Monitor HEARTBEAT.md impact. Scheduling changes affect system behavior at scale. If you increase max_concurrent_agents from 10 to 20, measure the impact. Maybe you're hitting database connection limits. Maybe your filesystem can't handle the I/O. Maybe your LLM API rate limits you.
Set up monitoring dashboards: agent spawn rate, concurrent agents running, task queue depth, retry rates. When you make a HEARTBEAT.md change, watch these dashboards for the next hour. Did throughput improve? Did error rates spike? Document the results.
6. Test identity changes. Before you change how agents present themselves in IDENTITY.md, test it. Does the new voice still convey the right expertise? Is it still professional?
Spin up a test agent with the new identity and run it against a known task. Have humans read the output. Does it sound right? If you're changing from "Direct, precise, educational" to "Casual, conversational, humorous," does the humor land or does it feel unprofessional for this use case?
7. Plan for identity variants. As your workspace grows, you'll want different identities for different contexts. A customer-facing agent needs a different voice than an internal research agent. Design for this early.
Define a hierarchy: default identity, then specialized variants. Centralize voice definitions so you don't have inconsistent tones across the system.
8. Audit tool usage. Periodically review TOOLS.md. Which tools are actually being used? Which agents use them? Are there tools you defined but never use? Tools that should be restricted but aren't?
This catches permission creep. Over time, agents accumulate access to tools they don't need. Regular audits bring it back under control.
9. Test the bootstrap pipeline. Write a test that loads all four bootstrap files, validates their syntax, checks for invalid references (e.g., an agent type that doesn't exist trying to use a tool), and verifies consistency.
```bash
./validate-bootstrap-files.sh
# Validates AGENTS.md against IDENTITY.md
# Validates TOOLS.md against AGENTS.md
# Validates HEARTBEAT.md tasks against AGENTS.md
# Returns 0 if all checks pass, 1 if any fail
```

Before you deploy a bootstrap change, run this test. Catch configuration errors before they hit production.
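One of those checks, catching an agent type that references a tool TOOLS.md never defines, could be sketched like this. The dicts stand in for the parsed files; a real validator would load the YAML first:

```python
# Sketch: cross-file consistency check between AGENTS.md and TOOLS.md.
# Parsed files are represented as plain dicts for illustration.

def check_tool_references(agents: dict, tools: dict) -> list:
    """Return one error string per tool referenced but never defined."""
    errors = []
    for agent_type, spec in agents.items():
        for tool in spec.get("allowed_tools", []):
            if tool not in tools:
                errors.append(f"{agent_type}: unknown tool '{tool}'")
    return errors

agents = {
    "story-agents": {"allowed_tools": ["research", "websearch"]},
    "validation-agents": {"allowed_tools": ["test_runner"]},
}
tools = {"research": {}, "websearch": {}}   # test_runner missing

errors = check_tool_references(agents, tools)
```

The same shape works for the other checks: scheduled tasks in HEARTBEAT.md naming agent types that exist in AGENTS.md, identities in IDENTITY.md matching agent types, and so on.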
10. Maintain a runbook. Document common operations:
- How to add a new agent type
- How to grant an agent a new tool
- How to adjust execution windows
- How to rotate credentials referenced in TOOLS.md
- How to troubleshoot "agent not starting" (check AGENTS.md timeout, check TOOLS.md permissions, check HEARTBEAT.md scheduling)
This becomes invaluable when you're on-call at 3 AM.
The Philosophy: Why This Matters at Scale
Here's the deeper insight: OpenClaw treats configuration as the source of truth. Not as nice-to-have documentation. Not as deployment-time parameters. The bootstrap files define the system. This is the hidden layer that separates OpenClaw from other agent orchestration systems.
What does this actually mean? It means you can:
- Audit what agents can do by reading TOOLS.md. No ambiguity. No "maybe they have access to this, maybe not."
- Understand when tasks run by reading HEARTBEAT.md. Every scheduled task is declared. Every execution window is visible.
- Know how agents behave by reading AGENTS.md. Model choice, timeout, retry logic—it's all there.
- Predict how users interact with the system by reading IDENTITY.md. The voice, the expertise areas, the communication style.
You don't have to dig through code. The constitution is right there. This clarity—where configuration lives in human-readable files that version with your code—is where OpenClaw's power comes from.
Here's what this enables: when something goes wrong, you don't spend eight hours in the debugger. You read the bootstrap files. Was the agent supposed to have bash access? Check TOOLS.md. Did the task run at the right time? Check HEARTBEAT.md. Why did it timeout? Check AGENTS.md. The answers are there, auditable and historical in git.
When you need to change behavior, you edit them, commit the change, and the system evolves. The files become a living documentation of how your workspace actually works. And crucially, they're version-controlled. You can see the exact commit where behavior changed. You can revert changes. You can correlate system behavior to specific configuration choices.
This is radically different from systems where configuration is scattered: some in environment variables, some in databases, some in command-line flags, some in code comments, some in service mesh config files you forgot existed. Those systems are nightmares to audit. You can never be sure you've found all the configuration. You can never be certain what's actually running. You've got configuration spread across systems you don't even remember setting up.
With OpenClaw's bootstrap approach, you have certainty. The files are the law; everything else is just implementing what the files say. That certainty pays for itself when you're trying to run reliable systems at scale.
Conclusion: The Constitution of Your Workspace
The four bootstrap files—AGENTS.md, IDENTITY.md, TOOLS.md, HEARTBEAT.md—form the operating system of OpenClaw. They encode decisions about capability, personality, permission, and timing. They're where workspace design happens.
But here's what I really want you to understand: these files exist because OpenClaw recognizes something fundamental that most systems miss. Configuration matters. Not as an afterthought. Not as deployment trivia. Configuration is where power lives. When you centralize it, version it, and treat it as seriously as code, you gain clarity that most engineering teams never achieve.
This is the hidden layer: OpenClaw doesn't hide configuration in environment variables, environment-specific config servers, or CLI flags scattered across different CI/CD runners. Everything is declarative, versioned, and auditable. You can read AGENTS.md and know, with certainty, what every agent can do. You can read IDENTITY.md and understand how users will perceive your system. You can audit TOOLS.md and verify that your security posture is what you think it is. You can check HEARTBEAT.md and see the exact timing of every scheduled task.
This transparency isn't nice to have. It's foundational to building reliable systems. Because when something goes wrong—and it will—you can trace it to a configuration choice, a permission mismatch, a timeout that was too short, or a scheduling conflict. You don't guess. You read the files and you know.
Most teams learn this lesson the hard way. They build a system, scatter configuration across a dozen places, and months later when behavior breaks, no one can explain why. The developers who wrote the code have moved on. The deployment scripts are outdated. The environment variables aren't documented. It becomes archaeology.
OpenClaw's bootstrap approach prevents this entirely. Your future self—and your team—will thank you for taking the time to structure these files correctly.
Related Reading
- OpenClaw Commands Reference — How to invoke agents defined in AGENTS.md
- Scheduling Complex Workflows — Advanced HEARTBEAT.md patterns for multi-agent orchestration
- Security in Agent Systems — Deep dive into TOOLS.md permission models and safety design
- Identity and Voice Customization — Patterns for managing multiple agent identities at scale
- OpenClaw Architecture Overview — How bootstrap files connect to the broader system