
Tools are how your agents interact with the outside world. Whether you're querying a database, calling an API, or running a system command, custom tools are the bridge between Claude's reasoning and your application's capabilities. But here's the thing: defining tools badly means your agent will struggle, hallucinate tool calls, or make dangerous assumptions about what's actually possible.
In this guide, we're going to build that bridge properly. We'll cover how to define tool schemas that Claude understands, implement handlers that are bulletproof, validate inputs like your life depends on it, organize tools into libraries you can reuse across projects, and handle edge cases that will absolutely bite you in production.
Table of Contents
- Understanding the Tool Contract
- The Hidden Layer: Why Tool Design Matters
- The Psychology of Tool Reliability
- Defining Tool Schemas with JSON Schema
- Implementing Handlers with Error Handling
- Input Validation and Type Safety
- Async Handlers and Long-Running Operations
- Building Reusable Tool Libraries
- Putting It All Together: A Complete Example
- Best Practices and Common Pitfalls
- Advanced Tool Patterns: Handling Complex Scenarios
- Streaming and Progressive Results
- Pagination and Large Result Sets
- Timeouts and Rate Limiting
- Caching and Deduplication
- Authorization and Access Control
- Tool Testing and Validation
- Tool Versioning and Evolution
- Wrapping Up
Understanding the Tool Contract
Before we write a single line of code, let's talk about the why. When you define a custom tool, you're essentially signing a contract with Claude:
- You promise: Here's exactly what this tool does, what inputs it takes, and what it returns.
- Claude promises: I'll call this tool when it makes sense, pass the inputs you specified, and expect the output format you declared.
Break that contract, and things fall apart. Claude might call a tool with the wrong argument types. Your handler might crash. The agent might not understand what actually happened and make bad decisions downstream.
The key insight is that Claude doesn't run your code directly. It doesn't see your TypeScript. It sees a JSON Schema description of your tool — that's it. Your job is to make that schema crystal clear and make your handler bulletproof enough to handle whatever Claude throws at it.
This is why tool design is less about writing clever code and more about clear communication. You're not writing for a machine that understands your intent. You're writing for a language model that can only read what you write and must make decisions based on that description alone.
The Hidden Layer: Why Tool Design Matters
Custom tools are where the rubber meets the road for agent reliability. A well-designed tool that's bulletproof and predictable means Claude can confidently use it hundreds of times. A poorly designed tool that crashes occasionally or behaves unpredictably causes Claude to doubt itself. After a few failures, Claude becomes conservative. It stops using tools that failed before, even when they're the right choice. You end up with an agent that's more cautious and less capable than it should be.
The opposite is true for well-designed tools. Claude learns to trust them. It calls them confidently. It interprets results correctly. It chains operations together without hesitation. A mature agent system with a hundred perfect tools is more capable and more reliable than one with ten tools, three of which are flaky.
Real-world deployments spend disproportionate time on tool quality. The tool definitions might take an hour. The handler implementation might take two hours. But the testing, validation, and refinement take a week. Teams that do this well—that build tools slowly and carefully—end up with systems that just work. Teams that rush tools end up debugging mysterious agent behavior for months.
The Psychology of Tool Reliability
There's also a human element. When you're responsible for tools that agents depend on, you approach the work differently. You test more. You think through edge cases. You document assumptions. You add defensive code. This isn't overhead—this is professionalism. The difference between a tool that's "probably fine" and one that's "definitely reliable" might be a few hours of extra work, but it prevents months of debugging later.
Additionally, tool design teaches you how to think like Claude. You get into the habit of asking: "If I were a language model reading this description, would I understand what this tool does? Would I know when to call it? Would I be confident about the inputs I need?" This kind of thinking—seeing through Claude's eyes—makes you better at everything. You write clearer code. You document better. You think about edge cases proactively.
Defining Tool Schemas with JSON Schema
Every tool starts with a schema. This is a JSON Schema that describes:
- What the tool does (name and description)
- What inputs it accepts (parameters with types)
- What each parameter means and constraints
Let's build a practical example. Say you're building an agent that manages infrastructure — it needs to query AWS for running instances. Here's how you'd define that tool:
const describeInstancesTool = {
name: "describe_aws_instances",
description:
"Fetch details about EC2 instances running in AWS. Returns instance ID, state, instance type, and launch time.",
input_schema: {
type: "object",
properties: {
region: {
type: "string",
description:
"AWS region (e.g., 'us-east-1', 'eu-west-1'). Defaults to us-east-1.",
enum: ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1"],
},
filters: {
type: "object",
description:
"Optional filters to narrow results. Use instance-state-name: 'running' or 'stopped'.",
additionalProperties: {
type: "string",
},
},
max_results: {
type: "integer",
description:
"Maximum number of instances to return. Between 1 and 100. Defaults to 20.",
minimum: 1,
maximum: 100,
default: 20,
},
},
required: ["region"],
additionalProperties: false,
},
};

Let's unpack this, because every field matters:
name: Keep it snake_case and descriptive. Claude uses this to decide when to call the tool. A name like describe_aws_instances is way better than query because it tells Claude exactly what it does. The name should be a verb-noun pair that's specific enough to disambiguate from other tools you might have.
description: This is where you earn your paycheck. Be specific. "Fetch details about EC2 instances" is better than "Get AWS data." Mention what fields are returned. If the tool has side effects or limitations, say so: "This call is read-only and does not modify instances." If it's slow, mention it: "May take 5-10 seconds if querying many regions." A good description answers three questions: What does it do? What does it return? When would you use it?
input_schema: This is pure JSON Schema. The type: "object" means your inputs are a JSON object. Each property in properties is a parameter Claude can pass. This is the contract that Claude will respect.
properties: Here we have three parameters:
- region: A string, constrained to valid AWS regions via enum. Always use enum when you have a fixed set of valid values — this prevents Claude from inventing regions that don't exist. The enum is your safeguard against hallucination.
- filters: An optional object for filtering. We're allowing any string keys (like instance-state-name) and string values. This gives flexibility but should be documented carefully in the description. When you use additionalProperties, you're opening a door — make sure Claude understands what kind of keys and values are valid.
- max_results: An integer with constraints. The minimum, maximum, and default fields are crucial. minimum: 1 prevents Claude from asking for zero results. maximum: 100 prevents it from requesting a million rows. default: 20 means if Claude doesn't specify this parameter, we use 20. These constraints are defensive programming at the schema level.
required: Only region is required. Everything else is optional, which is why they're not listed here. If something's always required, put it in required. Be conservative — mark as required only what's truly necessary.
additionalProperties: Set to false to prevent Claude from passing random extra parameters your handler doesn't expect. This is a security boundary. If you accidentally leave this off and someone has a prompt injection, they could add parameters you didn't anticipate.
This schema is the entire contract. Claude can only pass parameters listed here. Your handler can only return what it declares. Structure it right, and both sides win.
Implementing Handlers with Error Handling
Now for the handler. A handler is a function that:
- Receives parameters from Claude
- Executes the actual logic
- Returns a result or throws an error
- Never crashes the agent
Here's the handler for our AWS tool:
interface DescribeInstancesInput {
region: string;
filters?: Record<string, string>;
max_results?: number;
}
interface EC2Instance {
instanceId: string;
state: string;
instanceType: string;
launchTime: string;
}
async function handleDescribeInstances(
input: DescribeInstancesInput,
): Promise<{ instances: EC2Instance[]; count: number; error?: string }> {
try {
// Validate input (even though schema validates, defense in depth)
if (!input.region) {
throw new Error("region parameter is required");
}
if (
input.max_results &&
(input.max_results < 1 || input.max_results > 100)
) {
throw new Error("max_results must be between 1 and 100");
}
const maxResults = input.max_results ?? 20;
// Build the actual AWS API call
// (In real code, you'd use AWS SDK v3)
const params: any = {
Region: input.region,
MaxResults: maxResults,
};
if (input.filters) {
params.Filters = Object.entries(input.filters).map(([key, value]) => ({
Name: key,
Values: [value],
}));
}
// Mock API response for demonstration
const mockInstances: EC2Instance[] = [
{
instanceId: "i-0123456789abcdef0",
state: "running",
instanceType: "t3.medium",
launchTime: "2026-03-10T14:30:00Z",
},
{
instanceId: "i-0123456789abcdef1",
state: "running",
instanceType: "t3.large",
launchTime: "2026-03-12T08:15:00Z",
},
];
return {
instances: mockInstances.slice(0, maxResults),
count: mockInstances.length,
error: undefined,
};
} catch (err) {
// Catch and format errors safely
const errorMessage = err instanceof Error ? err.message : String(err);
return {
instances: [],
count: 0,
error: `Failed to describe instances: ${errorMessage}`,
};
}
}

Expected Output:
{
"instances": [
{
"instanceId": "i-0123456789abcdef0",
"state": "running",
"instanceType": "t3.medium",
"launchTime": "2026-03-10T14:30:00Z"
},
{
"instanceId": "i-0123456789abcdef1",
"state": "running",
"instanceType": "t3.large",
"launchTime": "2026-03-12T08:15:00Z"
}
],
"count": 2,
"error": undefined
}Let's talk about what makes this handler solid:
Type Safety: The DescribeInstancesInput interface matches our schema. TypeScript will yell at us if we try to pass parameters the schema doesn't allow. The return type is explicit too — we return success or failure in a structured way. This is defensive: even if the schema allowed something, TypeScript catches it before it reaches the handler.
Defense in Depth: We validate the input again inside the handler, even though the schema should prevent invalid input. Why? Because schemas can be bypassed — in edge cases, during partial failures, or when someone calls the handler directly in testing. If you're calling AWS with invalid parameters, you want to catch it fast, before you bill the customer for bad API calls.
Error Handling: We wrap the entire logic in try/catch. When something goes wrong, we:
- Catch the error (any type)
- Extract a readable message
- Return it in the same structure as success, but with the error field set
- Never throw an exception that crashes the agent
Graceful Degradation: Instead of throwing, we return a response object with an error field. Claude can see what went wrong and decide whether to retry, ask the user, or try something else. This is way better than crashing and losing the agent's context.
Clear Return Structure: We always return the same shape: instances, count, and optionally error. Claude can rely on this structure. Consistency is key — if sometimes you return { data: [] } and sometimes { results: [] }, Claude gets confused.
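Claude never sees your TypeScript types, but your handlers can still share one. Here's a sketch of a generic result envelope that keeps every tool returning the same shape — the ToolResult name and its fields are illustrative, not part of any SDK:

```typescript
// Sketch of a shared result envelope. The ToolResult name and its
// data/count/error fields are illustrative, not part of any SDK.
interface ToolResult<T> {
  data: T; // tool-specific payload (instances, users, rows, ...)
  count: number; // how many items data contains
  error?: string; // set only on failure; data is then an empty value
}

function success<T>(data: T, count: number): ToolResult<T> {
  return { data, count };
}

function failure<T>(empty: T, message: string): ToolResult<T> {
  return { data: empty, count: 0, error: message };
}

// Both branches of every handler now return the exact same shape:
const ok = success(["i-0abc"], 1);
const bad = failure<string[]>([], "Region not reachable");
```

With an envelope like this, the compiler enforces the consistency this section argues for: a handler that returns results instead of data, or forgets the count field, won't build.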
Input Validation and Type Safety
The schema validates syntax, but your handler validates semantics. There's a difference:
- Syntax validation (schema): Is this a string? Is this an integer between 1 and 100?
- Semantic validation (handler): Does this value make sense in the real world? Is this instance ID actually in our database?
Here's a tool that searches a database. The schema says the query is a string, but the handler needs to validate it actually works:
const searchUsersTool = {
name: "search_users",
description:
"Search for users by name, email, or user ID. Returns user profiles.",
input_schema: {
type: "object",
properties: {
query: {
type: "string",
description:
"Search term: a user name, email, or numeric ID. Minimum 1 character.",
minLength: 1,
maxLength: 255,
},
limit: {
type: "integer",
description: "Max results to return. Default 10, max 100.",
minimum: 1,
maximum: 100,
default: 10,
},
},
required: ["query"],
additionalProperties: false,
},
};
interface SearchUsersInput {
query: string;
limit?: number;
}
interface UserProfile {
id: string;
name: string;
email: string;
created_at: string;
}
async function handleSearchUsers(
input: SearchUsersInput,
): Promise<{ users: UserProfile[]; total_found: number; message: string }> {
const query = input.query.trim();
const limit = input.limit ?? 10;
// Semantic validation: query is too short
if (query.length < 1) {
return {
users: [],
total_found: 0,
message: "Search query must be at least 1 character long",
};
}
// Semantic validation: prevent SQL injection patterns
if (query.includes(";") || query.includes("--") || query.includes("/*")) {
return {
users: [],
total_found: 0,
message: "Invalid query format detected",
};
}
try {
// In production, this would hit your database
// For now, we'll simulate some results
const mockUsers: UserProfile[] = [
{
id: "user_001",
name: "Alice Johnson",
email: "alice@example.com",
created_at: "2025-01-15T10:30:00Z",
},
{
id: "user_002",
name: "Bob Smith",
email: "bob@example.com",
created_at: "2025-02-20T14:45:00Z",
},
];
// Filter based on query
const filtered = mockUsers.filter(
(user) =>
user.name.toLowerCase().includes(query.toLowerCase()) ||
user.email.toLowerCase().includes(query.toLowerCase()) ||
user.id.includes(query),
);
const results = filtered.slice(0, limit);
return {
users: results,
total_found: filtered.length,
message: `Found ${filtered.length} user(s), returning ${results.length}`,
};
} catch (err) {
const errorMessage = err instanceof Error ? err.message : String(err);
return {
users: [],
total_found: 0,
message: `Database error: ${errorMessage}`,
};
}
}Expected Output (for query = "alice"):
{
"users": [
{
"id": "user_001",
"name": "Alice Johnson",
"email": "alice@example.com",
"created_at": "2025-01-15T10:30:00Z"
}
],
"total_found": 1,
"message": "Found 1 user(s), returning 1"
}

Key validation patterns:
- Trim whitespace: query.trim() removes leading/trailing spaces that might come from how Claude formatted its request. You'd be surprised how often this matters.
- Security checks: We look for SQL injection patterns. You should do this for any tool that touches a database. Even with parameterized queries, it's good to reject obviously malicious input early.
- Semantic boundaries: If Claude searches for what looks like a numeric ID, you can check that the query is actually numeric before hitting the database. This prevents wasting database resources on queries that will never match.
- Clear error messages: When validation fails, we tell Claude specifically why, so it can adjust. "Invalid query format detected" is better than just returning empty results, because Claude knows something went wrong.
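That numeric-ID boundary check is easy to sketch. The helpers below (looksLikeNumericId, routeQuery) are hypothetical, invented for illustration — the search handler above doesn't implement this:

```typescript
// Hypothetical helper: decide whether a query should be treated as a
// numeric ID lookup before spending a database round-trip on it.
function looksLikeNumericId(query: string): boolean {
  return /^\d+$/.test(query.trim());
}

// Sketch of how a handler might route on that semantic boundary:
function routeQuery(query: string): "id_lookup" | "text_search" {
  return looksLikeNumericId(query) ? "id_lookup" : "text_search";
}
```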
The hidden why here is that you're protecting Claude from making bad decisions. If you silently return empty results instead of telling it validation failed, Claude might think there are no matching users. If you tell it the query format is invalid, it knows to fix the query.
Async Handlers and Long-Running Operations
Many tools take time. Database queries, API calls, file operations — they all need async handlers. But there's a gotcha: Claude waits for your handler to complete before making its next decision. If your handler takes 30 seconds, Claude waits 30 seconds.
For long operations, you have two patterns:
Pattern 1: Polling — Start the operation, return a job ID, then Claude polls for status.
Pattern 2: Async Completion — Start the operation and return immediately with a status, and let Claude handle retries.
Here's Pattern 1 (polling):
const deployApplicationTool = {
name: "deploy_application",
description:
"Deploy an application to a server. Returns deployment ID. Use check_deployment_status to monitor progress.",
input_schema: {
type: "object",
properties: {
app_name: {
type: "string",
description: "Name of the application to deploy",
},
version: {
type: "string",
description: "Version tag to deploy (e.g., '1.2.3' or 'latest')",
},
environment: {
type: "string",
enum: ["staging", "production"],
description: "Deployment target environment",
},
},
required: ["app_name", "version", "environment"],
additionalProperties: false,
},
};
interface DeployInput {
app_name: string;
version: string;
environment: "staging" | "production";
}
interface DeploymentStatus {
deployment_id: string;
status: "queued" | "in_progress" | "completed" | "failed";
progress_percent?: number;
error_message?: string;
}
// In-memory store for deployments (replace with database in production)
const deploymentStore: Map<string, DeploymentStatus> = new Map();
async function handleDeployApplication(
input: DeployInput,
): Promise<DeploymentStatus> {
const deploymentId = `deploy_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
// Validate environment-specific rules
if (
input.environment === "production" &&
!input.version.match(/^\d+\.\d+\.\d+$/)
) {
return {
deployment_id: deploymentId,
status: "failed",
error_message:
"Production deployments require semantic versioning (e.g., 1.2.3), not 'latest'",
};
}
try {
// Create the deployment record
const deployment: DeploymentStatus = {
deployment_id: deploymentId,
status: "queued",
progress_percent: 0,
};
deploymentStore.set(deploymentId, deployment);
// Simulate async deployment (in production, this would actually deploy)
setTimeout(async () => {
const stored = deploymentStore.get(deploymentId);
if (stored) {
stored.status = "in_progress";
stored.progress_percent = 50;
}
// Simulate work completion
setTimeout(() => {
const final = deploymentStore.get(deploymentId);
if (final) {
final.status = "completed";
final.progress_percent = 100;
}
}, 2000);
}, 1000);
return deployment;
} catch (err) {
const errorMessage = err instanceof Error ? err.message : String(err);
return {
deployment_id: deploymentId,
status: "failed",
error_message: `Deployment failed: ${errorMessage}`,
};
}
}
const checkDeploymentStatusTool = {
name: "check_deployment_status",
description:
"Check the status of a running deployment. Use the deployment_id from deploy_application.",
input_schema: {
type: "object",
properties: {
deployment_id: {
type: "string",
description: "The deployment ID from deploy_application",
},
},
required: ["deployment_id"],
additionalProperties: false,
},
};
interface CheckStatusInput {
deployment_id: string;
}
async function handleCheckDeploymentStatus(
input: CheckStatusInput,
): Promise<DeploymentStatus> {
const deployment = deploymentStore.get(input.deployment_id);
if (!deployment) {
return {
deployment_id: input.deployment_id,
status: "failed",
error_message: `Deployment ID not found: ${input.deployment_id}`,
};
}
return deployment;
}

Expected Output (initial call):
{
"deployment_id": "deploy_1710694800000_a1b2c3d4e",
"status": "queued",
"progress_percent": 0
}

Expected Output (after checking status):
{
"deployment_id": "deploy_1710694800000_a1b2c3d4e",
"status": "in_progress",
"progress_percent": 50
}This pattern works because:
- Immediate response: The deploy tool returns immediately with a deployment ID. Claude gets the response and knows it worked.
- Polling mechanism: Claude can call check_deployment_status periodically to monitor progress.
- Error handling: If the deployment fails, we return that in the status, not as an exception.
The hidden why here: Claude operates synchronously. Your handlers block everything until they return. For long operations, you must return quickly and let Claude poll. This pattern scales because Claude can handle multiple polls without overloading your system. The key is making the polling tool lightweight — it should just return current state, not do any work.
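Pattern 2 never got its own listing, so here's a minimal sketch: validate, kick the work off without awaiting it, and return a status immediately. The names here (handleStartWork, doSlowWork) are invented for illustration, not part of the deployment example above:

```typescript
interface StartResult {
  status: "started" | "rejected";
  message: string;
}

// Illustrative stand-in for slow real work (an API call, a build, etc.).
async function doSlowWork(target: string): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 5));
}

function handleStartWork(input: { target?: string }): StartResult {
  if (!input.target) {
    return { status: "rejected", message: "target is required" };
  }
  // Fire and forget: failures are logged, never thrown, because the
  // handler has already returned by the time the work finishes.
  doSlowWork(input.target).catch((err) =>
    console.error(`background work failed: ${err}`),
  );
  return {
    status: "started",
    message: `Work on ${input.target} started; retry the call later if needed.`,
  };
}
```

The trade-off versus polling: Claude gets no progress updates, so this fits operations whose outcome it can verify some other way (for example, by listing servers again).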
Building Reusable Tool Libraries
Once you have a few tools, organize them into a library. This makes tools composable, testable, and shareable across projects.
Here's how to structure it:
// tools/types.ts
export interface ToolDefinition {
name: string;
description: string;
input_schema: {
type: string;
properties: Record<string, any>;
required: string[];
additionalProperties: boolean;
};
}
export interface ToolHandler {
(input: any): Promise<any>;
}
export interface RegisteredTool {
definition: ToolDefinition;
handler: ToolHandler;
}
// tools/registry.ts
export class ToolRegistry {
private tools: Map<string, RegisteredTool> = new Map();
register(
name: string,
definition: ToolDefinition,
handler: ToolHandler,
): void {
if (this.tools.has(name)) {
throw new Error(`Tool '${name}' is already registered`);
}
this.tools.set(name, { definition, handler });
}
get(name: string): RegisteredTool | undefined {
return this.tools.get(name);
}
getAll(): RegisteredTool[] {
return Array.from(this.tools.values());
}
getDefinitions(): ToolDefinition[] {
return this.getAll().map((tool) => tool.definition);
}
async execute(name: string, input: any): Promise<any> {
const tool = this.get(name);
if (!tool) {
throw new Error(`Tool '${name}' not found in registry`);
}
return tool.handler(input);
}
}
// tools/aws/ec2.ts
import { ToolRegistry, ToolDefinition } from "../types";
export function registerEC2Tools(registry: ToolRegistry): void {
const describeInstancesDefinition: ToolDefinition = {
name: "describe_aws_instances",
description: "Fetch details about EC2 instances running in AWS.",
input_schema: {
type: "object",
properties: {
region: {
type: "string",
enum: ["us-east-1", "us-west-2", "eu-west-1", "ap-southeast-1"],
},
max_results: {
type: "integer",
minimum: 1,
maximum: 100,
default: 20,
},
},
required: ["region"],
additionalProperties: false,
},
};
registry.register(
"describe_aws_instances",
describeInstancesDefinition,
handleDescribeInstances,
);
}
async function handleDescribeInstances(input: any): Promise<any> {
// Handler implementation
return { instances: [], count: 0 };
}
// tools/database/index.ts
import { ToolRegistry, ToolDefinition } from "../types";
export function registerDatabaseTools(registry: ToolRegistry): void {
const queryDatabaseDefinition: ToolDefinition = {
name: "query_database",
description: "Execute a SELECT query against the database.",
input_schema: {
type: "object",
properties: {
sql: {
type: "string",
description: "SELECT query. Parameters must use $1, $2 syntax.",
},
params: {
type: "array",
description: "Query parameters for safe substitution",
items: {
oneOf: [
{ type: "string" },
{ type: "number" },
{ type: "boolean" },
{ type: "null" },
],
},
},
},
required: ["sql"],
additionalProperties: false,
},
};
registry.register(
"query_database",
queryDatabaseDefinition,
handleQueryDatabase,
);
}
async function handleQueryDatabase(input: any): Promise<any> {
// Handler implementation
return { rows: [], count: 0 };
}
// main.ts - Usage
const registry = new ToolRegistry();
// Register tool families
registerEC2Tools(registry);
registerDatabaseTools(registry);
// Get all definitions to pass to Claude
const toolDefinitions = registry.getDefinitions();
// When Claude calls a tool
async function onToolCall(toolName: string, toolInput: any): Promise<any> {
return registry.execute(toolName, toolInput);
}
console.log(`Registered ${toolDefinitions.length} tools`);
toolDefinitions.forEach((tool) => {
console.log(` - ${tool.name}: ${tool.description}`);
});

Expected Output:
Registered 2 tools
- describe_aws_instances: Fetch details about EC2 instances running in AWS.
- query_database: Execute a SELECT query against the database.
This registry approach gives you:
- Modularity: Each tool family lives in its own file. Easy to find, easy to test.
- Discoverability: The registry lists all available tools. You can log them, validate them, or send them to the API.
- Consistency: Every tool goes through the same register method. If you need to validate schemas, you add it once.
- Testability: You can unit test handlers in isolation, then test the registry's execute logic separately.
- Reusability: The ToolRegistry class works the same way across projects. Copy it, use it everywhere.
Putting It All Together: A Complete Example
Let's build a mini system that ties everything together. An agent that manages infrastructure:
import Anthropic from "@anthropic-ai/sdk";
interface ToolDefinition {
name: string;
description: string;
input_schema: any;
}
interface RegisteredTool {
definition: ToolDefinition;
handler: (input: any) => Promise<any>;
}
class ToolRegistry {
private tools: Map<string, RegisteredTool> = new Map();
register(
name: string,
definition: ToolDefinition,
handler: (input: any) => Promise<any>,
): void {
this.tools.set(name, { definition, handler });
}
get(name: string): RegisteredTool | undefined {
return this.tools.get(name);
}
getDefinitions(): ToolDefinition[] {
return Array.from(this.tools.values()).map((tool) => tool.definition);
}
async execute(name: string, input: any): Promise<any> {
const tool = this.get(name);
if (!tool) throw new Error(`Tool not found: ${name}`);
return tool.handler(input);
}
}
// Set up tools
const registry = new ToolRegistry();
registry.register(
"list_servers",
{
name: "list_servers",
description: "List all managed servers and their current status",
input_schema: {
type: "object",
properties: {
environment: {
type: "string",
enum: ["staging", "production"],
description: "Which environment to query",
},
},
required: ["environment"],
additionalProperties: false,
},
},
async (input) => {
const servers =
input.environment === "production"
? [
{ id: "prod-01", status: "running", uptime_hours: 720 },
{ id: "prod-02", status: "running", uptime_hours: 342 },
]
: [
{ id: "stage-01", status: "running", uptime_hours: 48 },
{ id: "stage-02", status: "stopped", uptime_hours: 0 },
];
return { servers, count: servers.length };
},
);
registry.register(
"restart_server",
{
name: "restart_server",
description: "Restart a specific server. Requires 5-10 minutes.",
input_schema: {
type: "object",
properties: {
server_id: {
type: "string",
description: "The server ID to restart",
},
force: {
type: "boolean",
description: "Force restart without graceful shutdown. Dangerous.",
default: false,
},
},
required: ["server_id"],
additionalProperties: false,
},
},
async (input) => {
if (input.force) {
return {
status: "warning",
message: `Force restart initiated on ${input.server_id}. Connections will be dropped immediately.`,
restart_id: `restart_${Date.now()}`,
};
}
return {
status: "queued",
message: `Graceful restart queued for ${input.server_id}. Active connections have 2 minutes to close.`,
restart_id: `restart_${Date.now()}`,
};
},
);
// Agent loop
async function runInfrastructureAgent() {
const client = new Anthropic();
const messages: Anthropic.MessageParam[] = [];
console.log("Infrastructure Agent starting...\n");
// Initial instruction
const userMessage =
"Check the status of our production servers and restart prod-02 if it's not responding. Be careful with restarts.";
messages.push({
role: "user",
content: userMessage,
});
console.log(`User: ${userMessage}\n`);
// Agentic loop
let continueLoop = true;
while (continueLoop) {
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
tools: registry.getDefinitions().map((def) => ({
name: def.name,
description: def.description,
input_schema: def.input_schema,
})),
messages,
});
// Add assistant's response to conversation
messages.push({
role: "assistant",
content: response.content,
});
// Check if we're done
if (response.stop_reason === "end_turn") {
continueLoop = false;
// Extract and print the final text response
const textBlock = response.content.find((block) => block.type === "text");
if (textBlock && textBlock.type === "text") {
console.log(`Agent: ${textBlock.text}\n`);
}
} else if (response.stop_reason === "tool_use") {
// Process tool calls
const toolUseBlocks = response.content.filter(
(block) => block.type === "tool_use",
);
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const toolUse of toolUseBlocks) {
if (toolUse.type === "tool_use") {
console.log(`→ Calling tool: ${toolUse.name}`);
console.log(` Input: ${JSON.stringify(toolUse.input, null, 2)}`);
try {
const result = await registry.execute(toolUse.name, toolUse.input);
console.log(` Result: ${JSON.stringify(result, null, 2)}\n`);
toolResults.push({
type: "tool_result",
tool_use_id: toolUse.id,
content: JSON.stringify(result),
});
} catch (err) {
const error = err instanceof Error ? err.message : String(err);
console.log(` Error: ${error}\n`);
toolResults.push({
type: "tool_result",
tool_use_id: toolUse.id,
content: `Error: ${error}`,
is_error: true,
});
}
}
}
// Add tool results to conversation
messages.push({
role: "user",
content: toolResults,
});
} else {
continueLoop = false;
}
}
}
// Run the agent
runInfrastructureAgent().catch(console.error);

Expected Output:
Infrastructure Agent starting...
User: Check the status of our production servers and restart prod-02 if it's not responding. Be careful with restarts.
→ Calling tool: list_servers
Input: {
"environment": "production"
}
Result: {
"servers": [
{
"id": "prod-01",
"status": "running",
"uptime_hours": 720
},
{
"id": "prod-02",
"status": "running",
"uptime_hours": 342
}
],
"count": 2
}
→ Calling tool: restart_server
Input: {
"server_id": "prod-02",
"force": false
}
Result: {
"status": "queued",
"message": "Graceful restart queued for prod-02. Active connections have 2 minutes to close.",
"restart_id": "restart_1710694800000"
}
Agent: I've checked the production servers and both prod-01 and prod-02 are currently running with healthy uptime. Since prod-02 is responding normally, I initiated a graceful restart as requested. The restart will begin shortly, giving active connections 2 minutes to close properly.
Best Practices and Common Pitfalls
Do This:
- Be specific in descriptions. "Fetch user data from database" beats "Get data." Specificity helps Claude choose the right tool.
- Use enums for fixed values. Never let Claude invent possibilities. The enum constraint is your best friend.
- Validate semantically. The schema validates syntax; your handler validates whether the request makes real-world sense. Do both.
- Return errors gracefully. Use error fields, not exceptions. Claude can understand error fields; crashed handlers lose the agent's context.
- Document side effects. If a tool modifies data, say so explicitly. "This tool is read-only" or "This tool creates a database record" are critical details.
- Test handlers in isolation. Before wiring them into an agent, run them directly with various inputs. Edge cases hide in the details.
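Testing a handler in isolation really is this simple — no agent loop, no API key. The handleEcho handler below is a throwaway example built just to show the pattern: feed it good and bad input, and assert that failures come back as error fields rather than thrown exceptions:

```typescript
// Minimal self-contained handler, used only to demonstrate the pattern.
async function handleEcho(input: { text?: string }): Promise<{
  result: string;
  error?: string;
}> {
  try {
    if (!input.text) throw new Error("text is required");
    return { result: input.text.toUpperCase() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { result: "", error: message };
  }
}

// Exercise it directly, before any agent is involved:
async function testHandleEcho(): Promise<void> {
  const good = await handleEcho({ text: "ok" });
  if (good.result !== "OK") throw new Error("expected uppercased echo");
  const bad = await handleEcho({});
  if (!bad.error) throw new Error("expected an error field, not a thrown exception");
}
```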
Don't Do This:
- Don't make handlers synchronous. Always use async, even if the operation is fast. It keeps your code composable and future-proof.
- Don't expose internal errors. Sanitize them. Don't tell Claude "PostgreSQL connection timeout" if that's an implementation detail. Say "Database temporarily unavailable, retrying."
- Don't allow infinite loops. If a tool can fail, have your handler return immediately with an error. Don't retry inside the handler — let Claude decide whether to retry.
- Don't make parameters vague. "Filter" is vague. "Filter by instance state (e.g., 'running', 'stopped')" is clear. "Sort order" is vague. "Sort order: 'ascending' or 'descending'" is clear.
- Don't skip the required array. If a parameter is truly required, put it in
required. Don't rely on Claude to figure it out.
Advanced Tool Patterns: Handling Complex Scenarios
As you build more sophisticated tools, you'll encounter scenarios that demand deeper consideration. These patterns emerge from real-world deployments where tools must handle complexity gracefully.
Streaming and Progressive Results
Some operations produce results gradually. Maybe you're querying a large database and want to show results as they arrive rather than waiting for completion. Your tool can support this by implementing a streaming interface. Instead of returning all results at once, you return them in chunks with progress indicators. Claude can then handle intermediate results appropriately, deciding whether to consume the stream or wait for completion.
The key insight is that your tool schema can describe this behavior. Declare that your tool returns data progressively, and Claude will adapt its behavior. You might include a stream_enabled parameter that allows Claude to request streaming versus batched results. Your handler checks this parameter and behaves accordingly.
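One way to sketch the `stream_enabled` pattern: the same handler returns either a single batch or a generator of chunks with progress metadata. The dataset, chunk size, and field names are assumptions, and a production handler would be async:

```python
# Hypothetical handler supporting a stream_enabled parameter: when True,
# it yields chunks with a progress indicator; when False, one batch.

ROWS = [{"id": i} for i in range(10)]  # stand-in for a large query result

def query_rows(stream_enabled: bool = False, chunk_size: int = 4):
    if not stream_enabled:
        return {"rows": ROWS, "complete": True}

    def chunks():
        for start in range(0, len(ROWS), chunk_size):
            batch = ROWS[start:start + chunk_size]
            yield {"rows": batch,
                   "progress": (start + len(batch)) / len(ROWS)}
    return chunks()
```

The caller decides up front whether it wants intermediate results; the handler's behavior is fully determined by the declared parameter.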
Pagination and Large Result Sets
Real databases return thousands of records. Returning all of them would overflow Claude's context. The solution is pagination. Your schema should include parameters like page and page_size, and your response should include metadata like total_count, has_more, and a next_page_token.
But here's the subtle part: don't make Claude figure out pagination manually. Instead, document the pagination pattern clearly so Claude knows how to iterate:
Your description might say: "Results are paginated. Use page parameter to iterate. Response includes has_more flag. When has_more is true, increment page and call again."
This explicit guidance prevents Claude from making the same request repeatedly or missing results by not iterating fully.
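A minimal sketch of that pagination contract, with the dataset and page size as illustrative assumptions:

```python
# Illustrative paginated handler: page/page_size in, has_more/total_count out.

RECORDS = [f"record_{i}" for i in range(95)]  # stand-in for a real table

def list_records(page: int = 1, page_size: int = 25) -> dict:
    start = (page - 1) * page_size
    items = RECORDS[start:start + page_size]
    return {
        "items": items,
        "page": page,
        "total_count": len(RECORDS),
        "has_more": start + len(items) < len(RECORDS),
    }
```

With has_more in every response, Claude's iteration rule is mechanical: while has_more is true, increment page and call again.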
Timeouts and Rate Limiting
External APIs often have rate limits and timeouts. Your tool needs to handle these gracefully. When Claude hits a rate limit, it shouldn't retry immediately. When an operation times out, the handler should return a clear message about what happened.
Document this in your description: "This tool respects API rate limits. If you receive a 429 error, wait 30 seconds before retrying. Timeouts typically resolve themselves; retry once after 5 seconds."
In your handler, catch these errors explicitly:
Your handler detects rate limit errors and returns them with retry guidance. It catches timeout errors and suggests whether immediate retry makes sense or whether waiting is needed. Claude reads these messages and adjusts its behavior accordingly.
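That catch-and-translate pattern might look like this. `RateLimitError` and the wrapped callable are placeholders; a real handler would map whatever exceptions its HTTP client actually raises:

```python
# Sketch: turn rate-limit and timeout exceptions into structured error
# fields with explicit retry guidance, instead of letting them propagate.

class RateLimitError(Exception):
    """Placeholder for an HTTP 429 from an external API."""

def call_with_guidance(api_call) -> dict:
    try:
        return {"result": api_call(), "error": None}
    except RateLimitError:
        return {"error": "rate_limited",
                "message": "Rate limit hit. Wait 30 seconds before retrying."}
    except TimeoutError:
        return {"error": "timeout",
                "message": "Request timed out. Retry once after 5 seconds."}
```

Note that the handler itself never retries; it returns immediately with guidance and leaves the retry decision to Claude, matching the "don't allow infinite loops" rule above.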
Caching and Deduplication
If the same query runs twice in a session, you could cache results. This saves API calls and improves speed. Your handler could maintain a simple in-memory cache, keyed by the input parameters. Before executing a query, you check if results already exist in cache.
The trick is making this transparent to Claude. It doesn't need to know about caching. From its perspective, it makes a query and gets a result. That result might be fresh or cached—it doesn't matter. Your handler makes that decision.
However, you might want to flag when cached results are returned, especially if they're stale. Include a cached flag in your response: { instances: [...], cached: true, cached_at: "2026-03-17T10:30:00Z" }. Claude can then decide if it needs fresh data.
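A simple in-memory version of that cache, keyed by a stable serialization of the input parameters. The fetch callback and field names are assumptions for the sketch:

```python
# Minimal in-memory cache keyed by input parameters, with a cached flag
# and timestamp so Claude can decide whether it needs fresh data.
import json
import time

_cache: dict = {}

def cached_query(params: dict, fetch) -> dict:
    key = json.dumps(params, sort_keys=True)  # stable key from the inputs
    if key in _cache:
        entry = _cache[key]
        return {**entry["result"], "cached": True, "cached_at": entry["at"]}
    result = fetch(params)
    _cache[key] = {
        "result": result,
        "at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    return {**result, "cached": False}
```

The caching is transparent on the request side, but the response honestly flags when it came from cache.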
Authorization and Access Control
Some tools can only be used by certain users or with certain credentials. Your handler can enforce this. Accept an optional auth_token or api_key parameter, validate it against your authorization system, and return an error if unauthorized.
This keeps access control logic in one place—your handler. Claude doesn't need to worry about who's allowed to call what. Your handler decides.
For sensitive operations, you might want to log every invocation. Include metadata in your response: { result: ..., executed_by: "user_id_123", execution_time_ms: 45, audit_logged: true }. This creates an audit trail.
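Both ideas, token validation and audit metadata, fit naturally in one handler. The token table, user IDs, and field names here are stand-ins for a real authorization system:

```python
# Hypothetical authorization check inside a handler: refuse unauthorized
# calls, and attach audit metadata to successful responses.
import time

VALID_TOKENS = {"tok_admin": "user_id_123"}  # stand-in for real auth

def delete_record(record_id: str, auth_token: str = "") -> dict:
    user = VALID_TOKENS.get(auth_token)
    if user is None:
        return {"error": "unauthorized",
                "message": "A valid auth_token is required for this tool."}
    start = time.monotonic()
    # ... perform the actual deletion here ...
    elapsed_ms = int((time.monotonic() - start) * 1000)
    return {"result": {"deleted": record_id}, "executed_by": user,
            "execution_time_ms": elapsed_ms, "audit_logged": True}
```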
Tool Testing and Validation
Before deploying a tool to production agents, validate it thoroughly. Create a test harness:
Write tests for normal cases, edge cases, and error cases. Test what happens when parameters are missing, when external services are slow, when rate limits are hit, when data is malformed.
Run these tests before every deployment. Make tool testing part of your CI/CD pipeline. Don't let a poorly tested tool reach production.
Test your tool both in isolation and integrated with agents. In isolation, you verify the handler works correctly. Integrated with agents, you verify Claude uses the tool correctly and interprets results appropriately.
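A tiny harness of the kind described above might look like this; `get_user` is a stand-in handler, and the case format is one possible design, not a prescribed one:

```python
# Sketch of a test harness: run a handler directly with normal, edge,
# and error inputs before wiring it into an agent.

def get_user(user_id: str) -> dict:
    """Stand-in handler under test."""
    if not user_id:
        return {"error": "missing_parameter", "message": "user_id is required."}
    if user_id == "u_1":
        return {"user": {"id": "u_1"}, "error": None}
    return {"error": "not_found", "message": f"No user {user_id}."}

def run_harness(handler, cases):
    """Each case is (name, kwargs, check_fn). Returns [(name, passed)]."""
    results = []
    for name, kwargs, check in cases:
        try:
            results.append((name, bool(check(handler(**kwargs)))))
        except Exception:
            results.append((name, False))  # handlers should never raise
    return results

CASES = [
    ("normal", {"user_id": "u_1"}, lambda r: r["error"] is None),
    ("missing", {"user_id": ""}, lambda r: r["error"] == "missing_parameter"),
    ("unknown", {"user_id": "u_9"}, lambda r: r["error"] == "not_found"),
]
```

A harness like this drops straight into a CI step: run it on every deployment and fail the build if any case fails.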
Tool Versioning and Evolution
Tools evolve. You add parameters, change return structures, improve descriptions. How do you do this without breaking existing agents?
Backward Compatibility: When changing a tool, maintain backward compatibility. If you add a new parameter, make it optional. If you change the response structure, keep old fields and add new ones separately. Old agents will ignore new fields. New agents will use them.
Versioning Strategy: Alternatively, version your tools explicitly. Name them describe_aws_instances_v1, describe_aws_instances_v2. Agents explicitly request the version they expect. This prevents breaking changes but requires agents to migrate explicitly.
Most real-world deployments use the backward-compatibility approach for minor changes, and versioning only for major breaking changes.
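Both strategies can be sketched side by side. The tool names follow the article's `describe_aws_instances_v1`/`_v2` example; the fields and registry shape are illustrative:

```python
# Sketch: backward compatibility (new optional parameter, old fields kept)
# plus explicit versioned registration.

def describe_instances_v1(region: str, state: str = "running") -> dict:
    # `state` was added later as *optional*, so old callers keep working.
    return {"instances": [], "region": region, "state": state}

def describe_instances_v2(region: str, state: str = "running") -> dict:
    v1 = describe_instances_v1(region, state)
    # Keep every old field; add new ones alongside them.
    return {**v1, "schema_version": 2, "next_page_token": None}

REGISTRY = {
    "describe_aws_instances_v1": describe_instances_v1,
    "describe_aws_instances_v2": describe_instances_v2,
}
```

Old agents keep calling the v1 name and ignore fields they don't know; new agents opt into v2 explicitly.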
Wrapping Up
Custom tools are how agents interact with the world. Define them with crystal-clear schemas, implement handlers that are bulletproof, validate inputs at both the schema and semantic level, and organize them into registries you can reuse.
The pattern is: declare what you promise, keep that promise, handle failures gracefully, and never crash the agent. Do that, and you're building systems that actually work. Your agents will be smarter, more reliable, and less likely to hallucinate about what's possible. The contract matters because Claude will respect it if you define it clearly.
As your tools grow more sophisticated, remember: the best tools are the ones that get out of Claude's way. Clear schemas, predictable behavior, helpful error messages, and consistent patterns. Those are the foundations of reliable agent systems.
Your tools are the vocabulary of your agent system. Make that vocabulary rich, precise, and trustworthy. Everything else flows from there.
The Hidden Complexity of Tool Design
Before you start building tools, understand what you're really doing. A tool definition is more than an API schema. It's a promise to Claude about what's possible, what's safe, and what will happen. When Claude looks at your tool definition, it's asking: "Do I understand this well enough to use it responsibly?"
This responsibility cuts both ways. You're responsible for defining the tool accurately. Claude is responsible for interpreting your definition correctly. When both sides honor that responsibility, tools work seamlessly. When one side fails, the whole system breaks.
Consider a tool that deletes data. Your schema might say "Delete a database record by ID." But what Claude needs to know is deeper: "What happens if the record doesn't exist? What if there are foreign key constraints? What if the user who issued the command isn't authorized?" These aren't edge cases. They're the normal operating environment. The best tool definitions acknowledge this complexity explicitly. They explain constraints. They give Claude the full picture so it can make good decisions.
Understanding Tool Input Validation
Input validation is where the rubber meets the road. The JSON schema provides structural validation (is this a string, a number, an array?). But semantic validation (does this make sense?) is your job.
Here's a concrete example: your tool accepts a database_name parameter. The schema validates that it's a string. Good. But your handler should also validate that the database actually exists, that the user has permission to access it, and that the name doesn't contain injection attacks. This layered approach catches mistakes at multiple levels. A malformed request fails at the schema stage. A syntactically valid but semantically nonsensical request fails in the handler. An unauthorized request fails at the authorization stage.
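The layered approach reads naturally as a sequence of checks, each failing with its own error. The database names, the permission table, and the allowed-name pattern here are assumptions for the sketch:

```python
# Sketch of layered validation: structural, then name safety, then
# existence, then authorization. Each layer returns a distinct error.
import re

EXISTING_DBS = {"analytics", "orders"}
PERMISSIONS = {"user_a": {"analytics"}}  # stand-in for a real ACL

def open_database(database_name, user: str) -> dict:
    # Layer 1: structural (normally the JSON schema's job).
    if not isinstance(database_name, str):
        return {"error": "invalid_type", "message": "database_name must be a string."}
    # Layer 2: reject injection-style names before touching anything.
    if not re.fullmatch(r"[A-Za-z0-9_]+", database_name):
        return {"error": "invalid_name", "message": "Only letters, digits, underscores."}
    # Layer 3: semantic. Does the database actually exist?
    if database_name not in EXISTING_DBS:
        return {"error": "not_found", "message": f"No database '{database_name}'."}
    # Layer 4: authorization.
    if database_name not in PERMISSIONS.get(user, set()):
        return {"error": "unauthorized", "message": "No access to this database."}
    return {"error": None, "database": database_name}
```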
Tool Documentation as a Form of API Design
How you document a tool shapes how Claude uses it. A vague description leads to vague usage. A precise description leads to smart, targeted usage. Good tool documentation includes what it does, when to use it, when NOT to use it, what it requires, what it returns, side effects, performance characteristics, and pagination patterns. This might seem verbose, but it's an investment that pays dividends every time Claude uses the tool.
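What that checklist looks like in practice: a tool definition whose description covers purpose, when not to use it, side effects, pagination, and performance. The tool name, fields, and wording below are illustrative, not a real schema:

```python
# Illustrative tool definition showing documentation-as-design: the
# description answers what, when, when NOT, side effects, and pagination.

LIST_ORDERS_TOOL = {
    "name": "list_orders",
    "description": (
        "List customer orders from the orders database. "
        "Use this to answer questions about order history; do NOT use it "
        "to modify orders (this tool is read-only, no side effects). "
        "Results are paginated: pass `page`, check `has_more` in the "
        "response, and increment `page` while it is true. "
        "Typical latency is under 200ms."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string",
                            "description": "Customer ID like 'cust_123'."},
            "page": {"type": "integer", "minimum": 1,
                     "description": "1-based page number."},
        },
        "required": ["customer_id"],
    },
}
```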
The Cost of Poor Tool Integration
When tools are poorly designed, the cost compounds throughout your system. A single unreliable tool doesn't just fail occasionally. It makes your agent hesitant to use any tools. Your agent starts reverting to purely reasoning-based responses instead of taking actions. It hedges its responses because it doesn't trust the outputs. The whole system slows down and becomes less capable. Contrast that with well-designed tools: Claude uses them confidently, chains them together for complex operations, and tries increasingly sophisticated things because the tools are reliable. This emergence of capability is what great tooling enables.