August 4, 2025
Claude AI Development

Building a CLI Tool with the Claude Code Agent SDK

Let's say you've got a database. You run queries manually. You write migrations by hand. You debug production issues while staring at terminal output. It's tedious. What if instead you could say: "Claude, check the user table for records where created_at is null" and Claude actually does it? Not just talks about it—actually executes the query, gets the results, and helps you interpret them.

That's what the Claude Code Agent SDK makes possible. You can embed Claude directly into your CLI tools. Instead of a traditional command-line interface with fixed subcommands, you get an intelligent interface that understands your domain, accesses your systems, and reasons about the results. Your users type natural language questions. Claude figures out which tools to invoke, interprets results, and explains what they mean.

In this article, we'll build a real-world example: a database management CLI that lets users interact with their database through Claude. You'll learn how to scaffold a CLI project, define domain-specific tools, handle terminal I/O, and package everything into a distributable tool. By the end, you'll have a working CLI that you can extend for your own use cases.

Table of Contents
  1. Why an Agent-Based CLI Changes Everything
  2. The Architecture: CLI + Agent SDK
  3. Scaffolding the Project
  4. Building the Entry Point
  5. Defining Domain-Specific Tools
  6. Setting Up the Session
  7. Terminal I/O and Formatting
  8. Configuration Management
  9. Packaging and Distribution
  10. Using the CLI: Real Examples
  11. Example 1: List Tables
  12. Example 2: Analyze a Table
  13. Example 3: Complex Query
  14. Example 4: Schema Investigation
  15. Multi-Turn Conversations and Memory
  16. Deployment Strategies
  17. Error Handling and Resilience in AI-Powered CLIs
  18. Extending the CLI: Adding New Tools
  19. Testing Your CLI Tool
  20. Why This Pattern Scales: The Underlying Principles
  21. Understanding the Power of Declarative Tool Design
  22. Advanced Session Management: Conversation Continuity
  23. Production Patterns: Observability and Monitoring
  24. Handling Ambiguity and Natural Language Edge Cases
  25. Extending the CLI: Adding Plugins and Custom Tools
  26. The Critical Importance of Error Handling in Agent-Based Tools
  27. Designing Tools for Claude's Decision-Making
  28. Real-World Deployment Strategies
  29. Authentication and Credential Management in Agent-Based CLIs
  30. Measuring Success: Metrics for AI-Powered CLIs
  31. Common Pitfalls and How to Avoid Them
  32. Real-World Extensions: Beyond Databases
  33. The Future of CLIs

Why an Agent-Based CLI Changes Everything

Before we build, understand what makes this different from traditional tools. A traditional CLI tool is a rigid pipeline: input → logic → output. You invoke it with specific arguments in a specific format, and it executes a pre-determined sequence of operations. If you need something slightly different, you have to invoke the tool again with different arguments, or you have to manually combine multiple tools.

An agent-based CLI is fundamentally different. It takes natural language input and decides what to do. You don't have to memorize command syntax. You don't have to know exactly which tool to invoke. You describe what you want, and Claude figures out the path to get there. This is powerful because it matches how humans think about problems. "Find me all users who haven't logged in recently" is a natural question. Translating that into SQL syntax, invoking the right CLI tool, and interpreting the results is tedious. An agent does all of that translation automatically.

The practical impact is profound. Domain experts (your database administrators, your system operators) can now interact with systems using plain language instead of specialized syntax. This democratizes access. It reduces errors because Claude can ask clarifying questions. It's faster because you're not context-switching between tool invocation and interpretation. And it's more powerful because Claude can chain multiple operations together, combining results from different tools to answer complex questions that no single tool could answer alone.

This is why agent-based CLIs are emerging as a new category of tools. They're not replacing traditional CLIs—you still need the underlying tools like psql or aws for scripts and automation. But for interactive use, for exploration, for one-off operations, agent-based CLIs are fundamentally better. They meet users where they are instead of forcing users to learn specialized syntax.

It also changes the mental model. Instead of memorizing SELECT COUNT(*) FROM users WHERE created_at > NOW() - interval '7 days', you just ask: "How many users signed up in the last week?" Claude translates intent to SQL, executes it, and explains the result. This is faster for experts and vastly more accessible for novices.

The Architecture: CLI + Agent SDK

Here's the structure we're building:

dbcli/
├── src/
│   ├── index.ts              # Entry point
│   ├── session.ts            # Session setup and tool registration
│   ├── tools/
│   │   ├── database.ts       # Database tools (query, migrate, etc.)
│   │   ├── introspection.ts  # Schema inspection tools
│   │   └── analysis.ts       # Data analysis tools
│   ├── io/
│   │   ├── terminal.ts       # Terminal I/O helpers
│   │   └── formatting.ts     # Result formatting
│   └── config.ts             # Configuration management
├── package.json
├── tsconfig.json
└── README.md

The CLI acts as a thin wrapper around the Agent SDK. You set up tools specific to database operations, and Claude uses them to accomplish tasks. The secret is clear separation: the CLI handles I/O, configuration, and user interaction. The Agent SDK handles reasoning, tool coordination, and conversation management.

Think of it like a restaurant kitchen: the CLI is the cashier taking orders and serving plates. The Agent SDK is the chef coordinating the kitchen, deciding what needs to cook and in what order.

Scaffolding the Project

Start with a basic Node.js project:

bash
mkdir dbcli
cd dbcli
npm init -y
npm install @anthropic-ai/claude-code dotenv
npm install -D typescript @types/node ts-node

Create tsconfig.json to configure TypeScript compilation and type checking. This file tells the TypeScript compiler how to transpile your code and where to output the compiled JavaScript:

json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "lib": ["ES2020"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": true,
    "declarationMap": true,
    "sourceMap": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}

The strict: true flag enables TypeScript's strictest checks, such as no implicit any and strict null checking. declaration: true generates type definition files so other projects can use your CLI as a library. sourceMap: true helps with debugging by mapping compiled JavaScript back to the original TypeScript.

Create a .env file for your API key and database connection. Keep this in .gitignore to avoid accidentally committing credentials:

bash
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgres://user:password@localhost/dbname

Now the fun part: let's build the entry point that creates the interactive REPL.

Building the Entry Point

Create src/index.ts. This is where the CLI boots up and enters the interactive loop:

typescript
#!/usr/bin/env node
 
import readline from "readline";
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";
import { setupSession } from "./session";
import { formatResponse, clearScreen } from "./io/formatting";
 
async function main() {
  // Load configuration from environment
  const apiKey = process.env.ANTHROPIC_API_KEY;
  const dbUrl = process.env.DATABASE_URL;
 
  if (!apiKey) {
    console.error("Error: ANTHROPIC_API_KEY not set");
    process.exit(1);
  }
 
  if (!dbUrl) {
    console.error("Error: DATABASE_URL not set");
    process.exit(1);
  }
 
  // Initialize session with tools
  const session = await setupSession({ apiKey, dbUrl });
 
  // Show welcome message
  console.log("\n🗄️  Database CLI powered by Claude Code");
  console.log("Type 'help' for commands, 'exit' to quit.\n");
 
  // Create interactive readline interface
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
 
  const askQuestion = (): void => {
    rl.question("dbcli> ", async (input: string) => {
      const trimmed = input.trim();
 
      // Handle special commands
      if (trimmed === "exit" || trimmed === "quit") {
        console.log("Goodbye!");
        rl.close();
        process.exit(0);
      }
 
      if (trimmed === "help") {
        showHelp();
        askQuestion();
        return;
      }
 
      if (trimmed === "clear") {
        clearScreen();
        askQuestion();
        return;
      }
 
      if (!trimmed) {
        askQuestion();
        return;
      }
 
      // Process user input through Claude
      try {
        console.log("\n⏳ Processing...\n");
        const response = await session.message(trimmed);
        formatResponse(response);
      } catch (error: any) {
        console.error(`\n❌ Error: ${error.message}\n`);
      }
 
      askQuestion();
    });
  };
 
  askQuestion();
}
 
function showHelp(): void {
  console.log(`
Available commands:
  help           - Show this help message
  exit/quit      - Exit the CLI
  clear          - Clear the screen
 
Natural language queries (examples):
  "List all users"
  "How many records are in the orders table?"
  "Show me the schema for the products table"
  "Create a migration for adding an email column"
  "Run the latest migration"
`);
}
 
main().catch((error) => {
  console.error("Fatal error:", error);
  process.exit(1);
});

This is your REPL (read-eval-print loop). It:

  1. Loads configuration from environment variables—API key and database URL come from .env
  2. Initializes the session with database-specific tools we'll define next
  3. Reads user input from the terminal using readline for nice interactive prompts
  4. Sends it to Claude for processing—Claude reads the input, decides which tools to use, and executes them
  5. Formats and displays the response using helpers we'll build
  6. Loops until the user exits

The key insight: the CLI is just I/O plumbing. All the intelligence lives in the session and its tools. Claude decides what to do. The CLI just facilitates the conversation.

Defining Domain-Specific Tools

Now let's build the tools that Claude will actually use. These are specific to database management. Each tool is a JavaScript function wrapped with metadata that Claude can understand and invoke.

Create src/tools/database.ts. These are the core database operations:

typescript
import { Pool, QueryResult } from "pg"; // Using PostgreSQL as example
 
export function createDatabaseTools(pool: Pool) {
  return [
    {
      name: "execute_query",
      description:
        "Execute a SQL query against the database. Use for SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, etc.",
      input_schema: {
        type: "object" as const,
        properties: {
          query: {
            type: "string",
            description: "The SQL query to execute",
          },
          params: {
            type: "array",
            items: { type: ["string", "number", "boolean", "null"] },
            description:
              "Optional parameterized query values (for safety, use this instead of string interpolation)",
          },
        },
        required: ["query"],
      },
      handler: async (input: { query: string; params?: any[] }) => {
        try {
          const result = await pool.query(input.query, input.params);
          return {
            success: true,
            rows: result.rows,
            rowCount: result.rowCount,
            fields: result.fields.map((f) => ({
              name: f.name,
              type: f.dataTypeID,
            })),
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
 
    {
      name: "list_tables",
      description: "List all tables in the current database",
      input_schema: {
        type: "object" as const,
        properties: {},
        required: [],
      },
      handler: async () => {
        try {
          const result = await pool.query(
            `SELECT table_name FROM information_schema.tables
             WHERE table_schema = 'public'
             ORDER BY table_name`,
          );
          return {
            success: true,
            tables: result.rows.map((row) => row.table_name),
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
 
    {
      name: "describe_table",
      description: "Get the schema of a specific table",
      input_schema: {
        type: "object" as const,
        properties: {
          table_name: {
            type: "string",
            description: "The name of the table to describe",
          },
        },
        required: ["table_name"],
      },
      handler: async (input: { table_name: string }) => {
        try {
          const result = await pool.query(
            `SELECT column_name, data_type, is_nullable, column_default
             FROM information_schema.columns
             WHERE table_name = $1
             ORDER BY ordinal_position`,
            [input.table_name],
          );
 
          if (result.rows.length === 0) {
            return {
              success: false,
              error: `Table "${input.table_name}" not found`,
            };
          }
 
          return {
            success: true,
            table: input.table_name,
            columns: result.rows.map((row) => ({
              name: row.column_name,
              type: row.data_type,
              nullable: row.is_nullable === "YES",
              default: row.column_default,
            })),
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
 
    {
      name: "get_table_count",
      description: "Count the number of rows in a table",
      input_schema: {
        type: "object" as const,
        properties: {
          table_name: {
            type: "string",
            description: "The table name",
          },
        },
        required: ["table_name"],
      },
      handler: async (input: { table_name: string }) => {
        try {
          // Table names can't be passed as query parameters, so validate
          // before interpolating to avoid SQL injection
          if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(input.table_name)) {
            return {
              success: false,
              error: `Invalid table name: "${input.table_name}"`,
            };
          }
          const result = await pool.query(
            `SELECT COUNT(*) as count FROM "${input.table_name}"`,
          );
          return {
            success: true,
            table: input.table_name,
            count: parseInt(result.rows[0].count, 10),
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
  ];
}

Key design decisions here:

  1. Parameterized queries: The params field encourages safe query construction. Claude will use this instead of string interpolation, preventing SQL injection vulnerabilities.

  2. Structured responses: Each tool returns { success, data/error }. Claude can parse this structure reliably and reason about failures.

  3. Restricted scope: These tools only do database work. No arbitrary command execution. No file system access. Claude is sandboxed to just SQL operations, limiting damage from mistakes.

  4. Error handling: Every tool catches exceptions and returns error information. Claude can reason about what went wrong and suggest fixes.

Let's add schema introspection tools. Create src/tools/introspection.ts:

typescript
import { Pool } from "pg";
 
export function createIntrospectionTools(pool: Pool) {
  return [
    {
      name: "get_foreign_keys",
      description: "Get all foreign key relationships in the database",
      input_schema: {
        type: "object" as const,
        properties: {
          table_name: {
            type: "string",
            description: "Optional: filter to a specific table",
          },
        },
        required: [],
      },
      handler: async (input: { table_name?: string }) => {
        try {
          let query = `
            SELECT
              tc.constraint_name,
              tc.table_name,
              kcu.column_name,
              ccu.table_name AS foreign_table_name,
              ccu.column_name AS foreign_column_name
            FROM information_schema.table_constraints tc
            JOIN information_schema.key_column_usage kcu
              ON tc.constraint_name = kcu.constraint_name
             AND tc.table_schema = kcu.table_schema
            JOIN information_schema.constraint_column_usage ccu
              ON tc.constraint_name = ccu.constraint_name
             AND tc.table_schema = ccu.table_schema
            WHERE tc.constraint_type = 'FOREIGN KEY'
          `;
 
          const params: string[] = [];
          if (input.table_name) {
            query += ` AND tc.table_name = $1`;
            params.push(input.table_name);
          }
 
          const result = await pool.query(query, params);
          return {
            success: true,
            relationships: result.rows,
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
 
    {
      name: "get_indexes",
      description: "List all indexes in a table",
      input_schema: {
        type: "object" as const,
        properties: {
          table_name: {
            type: "string",
            description: "The table name",
          },
        },
        required: ["table_name"],
      },
      handler: async (input: { table_name: string }) => {
        try {
          const result = await pool.query(
            `SELECT indexname, indexdef FROM pg_indexes WHERE tablename = $1`,
            [input.table_name],
          );
          return {
            success: true,
            table: input.table_name,
            indexes: result.rows,
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
  ];
}

Now create src/tools/analysis.ts for data analysis capabilities:

typescript
import { Pool } from "pg";
 
export function createAnalysisTools(pool: Pool) {
  return [
    {
      name: "analyze_table_stats",
      description: "Get statistics about a table (row count, size, etc.)",
      input_schema: {
        type: "object" as const,
        properties: {
          table_name: {
            type: "string",
            description: "The table to analyze",
          },
        },
        required: ["table_name"],
      },
      handler: async (input: { table_name: string }) => {
        try {
          const result = await pool.query(
            `SELECT
              schemaname, relname AS table_name,
              pg_size_pretty(pg_total_relation_size(relid)) AS size,
              n_live_tup as row_count,
              n_dead_tup as dead_rows,
              last_vacuum, last_autovacuum
            FROM pg_stat_user_tables
            WHERE relname = $1`,
            [input.table_name],
          );
 
          if (result.rows.length === 0) {
            return {
              success: false,
              error: `Table "${input.table_name}" not found`,
            };
          }
 
          return {
            success: true,
            stats: result.rows[0],
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
 
    {
      name: "find_null_values",
      description:
        "Find columns with NULL values and their frequency in a table",
      input_schema: {
        type: "object" as const,
        properties: {
          table_name: {
            type: "string",
            description: "The table to scan",
          },
        },
        required: ["table_name"],
      },
      handler: async (input: { table_name: string }) => {
        try {
          // Get all columns first
          const columnsResult = await pool.query(
            `SELECT column_name FROM information_schema.columns
             WHERE table_name = $1`,
            [input.table_name],
          );
 
          const columns = columnsResult.rows.map((row) => row.column_name);
 
          if (columns.length === 0) {
            return {
              success: false,
              error: `Table "${input.table_name}" not found`,
            };
          }
 
          // Build a query that checks null counts for each column
          const nullChecks = columns
            .map(
              (col) =>
                `COUNT(CASE WHEN "${col}" IS NULL THEN 1 END) as "${col}_nulls"`,
            )
            .join(", ");
 
          // pg returns COUNT(*) as a string, so parse before doing math
          const countQuery = `SELECT COUNT(*) as total, ${nullChecks} FROM "${input.table_name}"`;
          const result = await pool.query(countQuery);
          const row = result.rows[0];
          const total = parseInt(row.total, 10);
 
          const nullSummary = columns.map((col) => ({
            column: col,
            null_count: parseInt(row[`${col}_nulls`], 10),
            null_percentage: (
              (parseInt(row[`${col}_nulls`], 10) / total) *
              100
            ).toFixed(2),
          }));
 
          return {
            success: true,
            table: input.table_name,
            total_rows: total,
            null_summary: nullSummary.filter((s) => s.null_count > 0),
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
  ];
}

These tools give Claude real database capabilities. It can:

  • Execute queries and see results immediately
  • Understand the schema through introspection
  • Analyze data quality issues
  • Check relationships between tables

Notice how each tool is focused and composable. Claude can chain them together: "Describe the users table, then count the rows, then find NULL values in the email column." Each tool does one thing well, and Claude orchestrates them.
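A side benefit of this design: because each handler only depends on pool.query, you can exercise a tool without a live database by passing a stub with the same shape. A sketch (the stub and the inlined handler mirror the pattern above; neither is an SDK API):

```typescript
// A stub standing in for pg's Pool: same .query shape, canned rows.
const fakePool = {
  async query(_sql: string, _params?: any[]) {
    return { rows: [{ table_name: "users" }, { table_name: "orders" }] };
  },
};

// A minimal list_tables-style handler, mirroring the pattern above
async function listTables(pool: typeof fakePool) {
  const result = await pool.query(
    `SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'`,
  );
  return { success: true, tables: result.rows.map((r) => r.table_name) };
}

listTables(fakePool).then((out) => console.log(out.tables)); // [ 'users', 'orders' ]
```

The same stub technique works for every tool in this article, which keeps the test suite fast and hermetic.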

Setting Up the Session

Create src/session.ts to tie everything together. This is where tools get registered with the Claude Code session:

typescript
import { Pool } from "pg";
import { ClaudeCodeSession } from "@anthropic-ai/claude-code";
import { createDatabaseTools } from "./tools/database";
import { createIntrospectionTools } from "./tools/introspection";
import { createAnalysisTools } from "./tools/analysis";
 
export async function setupSession(config: {
  apiKey: string;
  dbUrl: string;
}): Promise<ClaudeCodeSession> {
  // Initialize database pool
  const pool = new Pool({
    connectionString: config.dbUrl,
  });
 
  // Test connection
  try {
    await pool.query("SELECT 1");
  } catch (error: any) {
    throw new Error(`Failed to connect to database: ${error.message}`);
  }
 
  // Create all tools
  const allTools = [
    ...createDatabaseTools(pool),
    ...createIntrospectionTools(pool),
    ...createAnalysisTools(pool),
  ];
 
  // Initialize Claude Code session
  const session = new ClaudeCodeSession({
    apiKey: config.apiKey,
    model: "claude-3-5-sonnet-20241022",
    workingDirectory: process.cwd(),
    tools: allTools,
  });
 
  // Set a custom system prompt for database operations
  session.setSystemPrompt(`
You are an expert database administrator and analyst. You help users:
1. Query and explore databases
2. Understand schemas and relationships
3. Analyze data quality and performance
4. Write migrations and manage schemas
5. Debug data issues
 
Guidelines:
- Always use parameterized queries (the params field) to prevent SQL injection
- Explain what you're doing and why
- Show the results in a clear, readable format
- Suggest optimizations when you notice issues
- Ask for clarification if ambiguous
  `);
 
  // Add logging for tool calls (optional, helps with debugging)
  session.onBeforeToolCall((toolCall) => {
    console.log(`[TOOL] Calling: ${toolCall.name}`);
  });
 
  return session;
}

The session setup is where the magic happens. We create a database connection pool (which handles connection reuse for efficiency), bundle all our tools together, and initialize Claude with a system prompt that teaches it how to be a database expert.
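If the defaults don't fit, the Pool constructor accepts tuning options alongside connectionString. max, idleTimeoutMillis, and connectionTimeoutMillis are standard node-postgres options; the values below are illustrative, not recommendations:

```typescript
// Pool tuning knobs (values are illustrative). These are standard
// node-postgres Pool options passed alongside connectionString.
const poolConfig = {
  connectionString: process.env.DATABASE_URL,
  max: 10, // cap on concurrent connections
  idleTimeoutMillis: 30_000, // close idle clients after 30s
  connectionTimeoutMillis: 5_000, // fail fast when the pool is exhausted
};

console.log(poolConfig.max); // 10
```

A bounded max matters here: Claude can fire several tool calls in quick succession, and an unbounded pool can exhaust the database's connection limit.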

Terminal I/O and Formatting

Create src/io/formatting.ts to format responses nicely for terminal output:

typescript
export function formatResponse(response: any): void {
  if (!response) {
    console.log("No response");
    return;
  }
 
  // If it's a text response, print it
  if (typeof response === "string") {
    console.log(response);
    return;
  }
 
  // If it has a .text field, print that
  if (response.text) {
    console.log(response.text);
  }
 
  // If there are tool calls in the response, show them
  if (response.toolCalls && response.toolCalls.length > 0) {
    console.log("\n📋 Tool calls made:");
    response.toolCalls.forEach((call: any) => {
      console.log(`  - ${call.name}(${JSON.stringify(call.input)})`);
    });
  }
 
  console.log(""); // Blank line for readability
}
 
export function formatTable(
  columns: string[],
  rows: Array<Record<string, any>>,
): string {
  if (rows.length === 0) {
    return "No results";
  }
 
  // Calculate column widths
  const widths = columns.map((col) =>
    Math.max(col.length, ...rows.map((row) => String(row[col] ?? "").length)),
  );
 
  // Build header
  const header = columns.map((col, i) => col.padEnd(widths[i])).join(" | ");
 
  // Build separator
  const separator = widths.map((w) => "-".repeat(w)).join("-+-");
 
  // Build rows
  const formattedRows = rows
    .map((row) =>
      columns
        .map((col, i) => String(row[col] ?? "").padEnd(widths[i]))
        .join(" | "),
    )
    .join("\n");
 
  return [header, separator, formattedRows].join("\n");
}
 
export function clearScreen(): void {
  console.clear();
}

These helpers make output readable in the terminal. Tables are aligned, headers are clear, tool invocations are visible.
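The alignment trick is worth seeing on its own. Here's the same padEnd approach, re-declared inline so the snippet runs standalone (the helper name is ours):

```typescript
// The padEnd alignment used by formatTable, stripped down and inlined
// so this snippet runs on its own.
function renderRow(values: string[], widths: number[]): string {
  return values.map((v, i) => v.padEnd(widths[i])).join(" | ");
}

const cols = ["name", "rows"];
const data: Array<Record<string, string>> = [
  { name: "users", rows: "1204" },
  { name: "orders", rows: "87" },
];

// Width of each column = max of header length and all cell lengths
const widths = cols.map((col) =>
  Math.max(col.length, ...data.map((row) => row[col].length)),
);

console.log(renderRow(cols, widths)); // "name   | rows"
for (const row of data) {
  console.log(renderRow(cols.map((c) => row[c]), widths));
}
```

Computing widths from both headers and cells is what keeps the pipe separators vertically aligned no matter what the query returns.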

Configuration Management

Create src/config.ts to manage configuration loading and validation:

typescript
import dotenv from "dotenv";
 
dotenv.config();
 
export const config = {
  apiKey: process.env.ANTHROPIC_API_KEY || "",
  databaseUrl: process.env.DATABASE_URL || "",
  debug: process.env.DEBUG === "true",
  model: process.env.MODEL || "claude-3-5-sonnet-20241022",
};
 
export function validateConfig(): string[] {
  const errors: string[] = [];
 
  if (!config.apiKey) {
    errors.push("ANTHROPIC_API_KEY not set");
  }
 
  if (!config.databaseUrl) {
    errors.push("DATABASE_URL not set");
  }
 
  return errors;
}

Packaging and Distribution

Update package.json to make this executable and installable:

json
{
  "name": "dbcli",
  "version": "1.0.0",
  "description": "AI-powered database management CLI",
  "main": "dist/index.js",
  "bin": {
    "dbcli": "dist/index.js"
  },
  "scripts": {
    "build": "tsc",
    "start": "ts-node src/index.ts",
    "dev": "ts-node src/index.ts",
    "prepare": "npm run build"
  },
  "dependencies": {
    "@anthropic-ai/claude-code": "^1.0.0",
    "dotenv": "^16.0.3",
    "pg": "^8.10.0"
  },
  "devDependencies": {
    "@types/node": "^20.0.0",
    "@types/pg": "^8.10.0",
    "typescript": "^5.0.0",
    "ts-node": "^10.9.0"
  }
}

The bin field tells npm this is a CLI tool. The prepare script runs before publishing to npm, ensuring the code is compiled. The #!/usr/bin/env node shebang in index.ts makes the file executable.

Build the CLI:

bash
npm run build

Now you can run it:

bash
# Development
npm run dev
 
# Or after building
node dist/index.js

To install it globally for use anywhere:

bash
npm install -g .
dbcli

Using the CLI: Real Examples

Once running, try these interactions:

Example 1: List Tables

dbcli> Show me all the tables in this database

Claude will call list_tables to get the table names and present them nicely formatted.

Example 2: Analyze a Table

dbcli> Analyze the users table for data quality issues

Claude will:

  1. Call describe_table("users") to get the schema
  2. Call analyze_table_stats("users") to get size/row count
  3. Call find_null_values("users") to find problems
  4. Synthesize the information and present actionable insights

Example 3: Complex Query

dbcli> Find all orders from the last 7 days that haven't been shipped yet

Claude will figure out the schema, write the appropriate SQL with parameterized values, execute it, and show results.

Example 4: Schema Investigation

dbcli> What does the user_permissions table look like, and what other tables does it reference?

Claude will call describe_table and get_foreign_keys to understand relationships.

Multi-Turn Conversations and Memory

The CLI maintains conversation context. This is powerful because Claude remembers what you've asked and can build on previous results. You're having a conversation, not running isolated commands.

typescript
// The session automatically keeps history
const session = await setupSession(config);
 
// Turn 1
await session.message("List all tables");
// history = [{ user: "List all tables" }, { assistant: "..." }]
 
// Turn 2
await session.message("Describe the orders table");
// history = [{ user: "..." }, { assistant: "..." }, { user: "..." }, { assistant: "..." }]
// Claude sees both the first and second interaction
 
// Turn 3
await session.message("Which one has more rows?");
// Claude knows you're comparing the two tables you just discussed

This conversational context is why AI-powered CLIs feel natural. You're having a discussion with an expert, not issuing commands to a dumb program.

Deployment Strategies

Once your CLI is built, distribute it:

NPM package: Publish to npm for easy installation. Users run npm install -g dbcli and it's available globally. Perfect for Node.js developers.

Docker container: Bundle the CLI in a Docker image. Users run docker run dbcli with environment variables. Perfect for CI/CD pipelines where Node.js might not be installed.

Standalone binary: Use pkg to create a single executable. No Node.js dependency needed. Users download one file and run it. Best for maximum accessibility.

Error Handling and Resilience in AI-Powered CLIs

When you're building tools that interact with systems via Claude, error handling becomes more complex. Claude might misunderstand a request, call the wrong tool, or pass invalid parameters. You need graceful degradation and clear error messages.

Here's how to build resilient error handling:

typescript
// Wrapper around tool execution that handles errors gracefully
async function executeToolSafely(
  toolCall: any,
  session: ClaudeCodeSession,
): Promise<any> {
  try {
    // Execute the tool with timeout
    const timeoutPromise = new Promise((_, reject) =>
      setTimeout(
        () => reject(new Error("Tool execution timeout (30s)")),
        30000,
      ),
    );
 
    const result = await Promise.race([
      session.executeTool(toolCall),
      timeoutPromise,
    ]);
 
    // Validate result structure
    if (!result || typeof result !== "object") {
      throw new Error("Tool returned invalid result structure");
    }
 
    return result;
  } catch (error: any) {
    // Log error for debugging
    console.error(`Tool error: ${toolCall.name}`, error.message);
 
    // Return structured error response
    return {
      success: false,
      error: error.message,
      toolName: toolCall.name,
    };
  }
}

When tools fail, Claude sees the error and can retry with different parameters or escalate to the user. "The database connection failed. Let me check your connection settings... I think you might need to update your DATABASE_URL environment variable."
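When a failure looks transient (a dropped connection, a lock timeout), you can also retry automatically before surfacing the error. A generic sketch of our own, not an SDK feature:

```typescript
// Retry-with-exponential-backoff wrapper. Our own sketch, not an SDK API.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Backoff 200ms, 400ms, 800ms, ... but don't sleep after the last try
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Example: a flaky operation that succeeds on the third attempt
let calls = 0;
withRetry(async () => {
  calls++;
  if (calls < 3) throw new Error("transient failure");
  return "ok";
}).then((value) => console.log(value, calls)); // ok 3
```

Only retry operations that are safe to repeat: a SELECT is fine; an INSERT only if it's idempotent.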

Extending the CLI: Adding New Tools

One of the biggest advantages of this architecture is extensibility. Adding new tools is straightforward. Let's add a migration management tool:

typescript
// src/tools/migrations.ts
import { Pool } from "pg";
import * as fs from "fs";
import * as path from "path";
 
export function createMigrationTools(pool: Pool) {
  return [
    {
      name: "list_migrations",
      description: "List all database migrations and their status",
      input_schema: {
        type: "object" as const,
        properties: {},
        required: [],
      },
      handler: async () => {
        try {
          // Check for migrations table
          await pool.query(`
            CREATE TABLE IF NOT EXISTS migrations (
              id SERIAL PRIMARY KEY,
              name VARCHAR(255) UNIQUE,
              applied_at TIMESTAMP DEFAULT NOW()
            )
          `);
 
          const result = await pool.query(
            "SELECT name, applied_at FROM migrations ORDER BY applied_at DESC",
          );
 
          return {
            success: true,
            migrations: result.rows,
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
 
    {
      name: "run_migration",
      description: "Run a migration file against the database",
      input_schema: {
        type: "object" as const,
        properties: {
          migration_file: {
            type: "string",
            description:
              "Path to migration file (e.g., migrations/001_create_users.sql)",
          },
        },
        required: ["migration_file"],
      },
      handler: async (input: { migration_file: string }) => {
        try {
          const fullPath = path.join(process.cwd(), input.migration_file);
 
          // Security: prevent directory traversal. Note the trailing
          // path separator -- without it, a sibling directory like
          // "migrations-evil" would also pass the prefix check.
          const normalized = path.normalize(fullPath);
          const baseDir = path.join(process.cwd(), "migrations");
          if (!normalized.startsWith(baseDir + path.sep)) {
            return {
              success: false,
              error:
                "Access denied: migrations must be in migrations/ directory",
            };
          }

          const sql = fs.readFileSync(normalized, "utf-8");
 
          // Execute migration
          await pool.query(sql);
 
          // Record migration
          const migrationName = path.basename(input.migration_file);
          await pool.query(
            "INSERT INTO migrations (name) VALUES ($1) ON CONFLICT (name) DO NOTHING",
            [migrationName],
          );
 
          return {
            success: true,
            message: `Migration ${migrationName} executed successfully`,
          };
        } catch (error: any) {
          return {
            success: false,
            error: error.message,
          };
        }
      },
    },
  ];
}

Now register it in your session setup:

typescript
// In setupSession() function
const allTools = [
  ...createDatabaseTools(pool),
  ...createIntrospectionTools(pool),
  ...createAnalysisTools(pool),
  ...createMigrationTools(pool), // Add this line
];

Restart the CLI, and Claude immediately knows about migration tools. Ask: "What migrations have been applied?" Claude calls list_migrations. Ask "Run the migration files in the migrations directory" and Claude iterates through files and applies them safely.

This extensibility means you can build a tool that grows with your needs. Start with basic query tools. Add schema introspection. Add migration management. Add backup/restore tools. Add performance analysis. Build it piece by piece.

Testing Your CLI Tool

Before deploying a CLI to users, test it thoroughly:

typescript
// tests/dbcli.test.ts
import { setupSession } from "../src/session";
 
describe("Database CLI", () => {
  let session: any;
 
  beforeAll(async () => {
    session = await setupSession({
      apiKey: process.env.ANTHROPIC_API_KEY || "",
      dbUrl: process.env.DATABASE_URL || "",
    });
  });
 
  test("should list tables", async () => {
    const response = await session.message("List all tables");
    expect(response.text).toContain("table");
  });
 
  test("should describe table schema", async () => {
    const response = await session.message("Describe the users table");
    expect(response.text).toContain("column");
  });
 
  test("should handle errors gracefully", async () => {
    const response = await session.message(
      "Describe the nonexistent_table_xyz",
    );
    expect(response.text.toLowerCase()).toContain("not found");
  });
 
  test("should count rows", async () => {
    const response = await session.message("How many users exist?");
    expect(response.text).toMatch(/\d+/);
  });
});

Run npm test to validate the CLI works as expected. This becomes your regression test suite. Every time you change prompts or add tools, these tests verify nothing breaks.
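End-to-end tests like these depend on live model output, which varies between runs, so they can be flaky. It helps to pair them with deterministic unit tests that call tool handlers directly, with no Claude in the loop. A minimal sketch, using a hypothetical `count_rows` handler and a faked pool (swap in your real tool factory and a test database for integration coverage):

```typescript
// Deterministic unit test for a tool handler, bypassing Claude entirely.
// `fakePool` and `count_rows` are illustrative stand-ins.
const fakePool = {
  query: async (_sql: string, _params?: any[]) => ({
    rows: [{ count: "42" }],
  }),
};

// A representative handler, shaped like the ones in src/tools/
const countRowsTool = {
  name: "count_rows",
  handler: async (input: { table: string }) => {
    try {
      const result = await fakePool.query(
        "SELECT COUNT(*) AS count FROM " + input.table,
      );
      return { success: true, count: Number(result.rows[0].count) };
    } catch (error: any) {
      return { success: false, error: error.message };
    }
  },
};
```

Handler tests run in milliseconds, need no API key, and pin down exact behavior; the Claude-driven tests above then only need to verify orchestration.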

Why This Pattern Scales: The Underlying Principles

The AI-powered CLI pattern scales because it applies separation of concerns at the right level:

  1. I/O layer (CLI) handles user interaction, formatting, history
  2. Tool layer defines what can be done (bounded, safe operations)
  3. Intelligence layer (Claude) decides what to do given a goal

Each layer can scale independently:

  • The tool layer scales by adding more domain-specific tools
  • The intelligence layer scales by using better models or better prompts
  • The I/O layer scales by adding plugins (Slack bot, web UI, GraphQL API)

The same tools work in a Slack bot as in the CLI. The same tools work with Claude as with a different LLM. This loose coupling is the mark of a mature architecture.

Understanding the Power of Declarative Tool Design

What makes the tool layer so powerful is that you're declaring what's possible rather than implementing workflows. Instead of writing hard-coded logic that says "if the user asks about tables, do X; if they ask about columns, do Y," you're saying "here are all the things that are possible." Claude figures out the combinations.

This is liberating because new use cases emerge without code changes. A user asks "compare schema between two databases" and Claude chains together describe_table, get_foreign_keys, and get_indexes tools to accomplish it—a combination you never explicitly coded.

This declarative approach compounds over time. After you've built 10 tools, new ones are faster because you understand the patterns. After 20, you can add tools in minutes. Your system becomes more capable not through more code, but through smarter composition of existing capabilities.

Advanced Session Management: Conversation Continuity

The session object in our example is stateful in conversation terms even though each underlying API call is stateless: the session replays prior turns as context with every new message. Claude sees the previous turns and uses them to interpret the current one. This conversational continuity is why the CLI feels natural.

Let's explore how to leverage this more deeply:

typescript
// Multi-turn conversation example
async function runInteractiveSession(session: ClaudeCodeSession) {
  console.log("Starting multi-turn session...\n");
 
  // Turn 1: User investigates the schema
  const turn1 = await session.message(
    "List all tables and give me a brief description",
  );
  console.log("Claude:", turn1.text);
 
  // Turn 2: Claude remembers the tables from turn 1
  const turn2 = await session.message("Which one has the most rows?");
  console.log("Claude:", turn2.text);
 
  // Turn 3: Claude uses context from both previous turns
  const turn3 = await session.message("Analyze data quality for that table");
  console.log("Claude:", turn3.text);
 
  // The session history automatically accumulated. Claude has the full context.
  // No need to repeat information.
}

This multi-turn pattern is why AI-powered CLIs are so much more efficient than traditional ones. You don't have to specify complete instructions every time. You build context naturally through conversation.

Production Patterns: Observability and Monitoring

When you deploy an AI-powered CLI to production, you need visibility into what's happening. Claude is making decisions about which tools to use. Those decisions should be logged and monitored:

typescript
// Add detailed logging for production usage
session.onBeforeToolCall((toolCall) => {
  console.log(`[AUDIT] Tool: ${toolCall.name}`);
  console.log(`[AUDIT] Input:`, JSON.stringify(toolCall.input, null, 2));
});
 
session.onAfterToolCall((toolCall, result) => {
  console.log(`[AUDIT] Result:`, JSON.stringify(result, null, 2));
 
  // Track metrics
  metrics.incrementCounter("tool_calls", { tool: toolCall.name });
  metrics.recordDuration("tool_execution", toolCall.duration);
});
 
session.onError((error) => {
  console.error(`[ERROR] ${error.message}`);
  metrics.incrementCounter("errors", { type: error.name });
});

In production, these logs go to a centralized logging system. You can set up alerts: "if a user runs more than 100 tool calls in one session, investigate." Or "if the same error happens 5 times, page the on-call engineer."

This observability is what transforms an experimental CLI into a production tool that you can trust and reason about.

Handling Ambiguity and Natural Language Edge Cases

Claude is smart but not telepathic. Users will ask ambiguous questions. A good CLI handles this gracefully. For the database CLI, you might have queries that match multiple possible interpretations.

typescript
// Handle ambiguous requests
session.setSystemPrompt(`
You are a database assistant. When the user's request is ambiguous:
1. Ask clarifying questions BEFORE executing tools
2. If you can infer intent from context, do so but confirm
3. Never make destructive assumptions (DELETE vs SELECT)
4. For potentially expensive operations (full table scans), warn the user first
 
Examples of ambiguous requests and how to handle:
- "Show me the orders" → Ask: How many? Sorted by what?
- "Delete old records" → Ask: Define "old". How many will this affect?
- "Find duplicates" → Ask: Duplicates by which column(s)?
`);

This system prompt teaches Claude to be helpful but cautious. It prevents "oops, I dropped the table" moments by building in safety checks.

Extending the CLI: Adding Plugins and Custom Tools

As your CLI evolves, you'll want users to add custom tools. Here's how to support that:

typescript
// plugin-loader.ts
import * as fs from "fs";
import * as path from "path";
 
export async function loadCustomTools(pluginDir: string): Promise<any[]> {
  const tools: any[] = [];
 
  if (!fs.existsSync(pluginDir)) {
    return tools;
  }
 
  const files = fs.readdirSync(pluginDir).filter((f) => f.endsWith(".js"));
 
  for (const file of files) {
    try {
      const modulePath = path.join(pluginDir, file);
      // Note: on Windows, dynamic import of an absolute path requires a
      // file:// URL (see url.pathToFileURL)
      const module = await import(modulePath);
 
      if (module.createTools && typeof module.createTools === "function") {
        const customTools = module.createTools();
        tools.push(...customTools);
        console.log(`✓ Loaded plugin: ${file}`);
      }
    } catch (error: any) {
      console.warn(`✗ Failed to load plugin ${file}: ${error.message}`);
    }
  }
 
  return tools;
}
 
// Usage in main setup
const allTools = [
  ...createDatabaseTools(pool),
  ...createIntrospectionTools(pool),
  ...createAnalysisTools(pool),
  ...(await loadCustomTools("./.claude/plugins")),
];

Users can now drop .js files in a plugins directory, and they're automatically loaded and available to Claude. This transforms your CLI from a fixed tool into a platform.

The Critical Importance of Error Handling in Agent-Based Tools

When Claude is operating tools on your behalf, error handling becomes paramount. In traditional CLIs, when a command fails, you're there to see it and respond. In agent-based CLIs, Claude has to interpret the error and decide what to do next. This means your error messages need to be exceptionally clear and actionable. A cryptic database error that a human DBA could interpret in seconds might confound Claude and lead to incorrect recovery attempts.

Consider this scenario: Claude runs a query and gets back a PostgreSQL error like "relation "users" does not exist". A human immediately understands: the table doesn't exist, maybe they're on the wrong schema. Claude, without proper context, might interpret this as a transient error and retry, or it might give up entirely. Better practice is to enhance the error with context. Return something like: "Error: Table 'users' not found in schema 'public'. Available tables: [list]. Did you mean 'user_accounts'?" This transforms the error from a puzzle into actionable information.
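A sketch of that enrichment, assuming a pg-style pool (anything with an async `query()` method; the `information_schema` queries are standard PostgreSQL, and the function name is illustrative):

```typescript
// Enrich a "table not found" failure with recoverable context so Claude
// can self-correct instead of retrying blindly.
async function describeTableWithContext(
  pool: { query: (sql: string, params?: any[]) => Promise<{ rows: any[] }> },
  table: string,
) {
  const result = await pool.query(
    `SELECT column_name, data_type
       FROM information_schema.columns
      WHERE table_schema = 'public' AND table_name = $1`,
    [table],
  );
  if (result.rows.length > 0) {
    return { success: true, columns: result.rows };
  }
  // Table missing: tell Claude what IS available
  const tables = await pool.query(
    `SELECT table_name FROM information_schema.tables
      WHERE table_schema = 'public'`,
  );
  return {
    success: false,
    error: `Table '${table}' not found in schema 'public'.`,
    availableTables: tables.rows.map((r: any) => r.table_name),
  };
}
```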

Similarly, when tools have side effects (modifications), you need to be extremely careful. If Claude runs a DELETE query and something goes wrong, you want to know exactly what happened and whether the operation succeeded or failed. Always return detailed information about the operation: rows affected, what was actually deleted, confirmation of the operation. This transparency builds confidence in the tool.

The principle extends to timeout handling. Long-running operations can cause problems. If Claude runs a query that takes 30 seconds, should it wait? What if there's a network hiccup at second 25? Design your tools with reasonable timeouts and make those timeouts explicit to Claude. If a tool times out, it should return a clear timeout error, not a cryptic connection reset message.

Pagination is another critical consideration. If a query returns 100,000 rows, returning all of them in one response wastes tokens and time. Design your tools to return paginated results. Let Claude ask for the next page if it needs more data. This makes interactions faster and cheaper.
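One way to sketch such a paginated tool, again assuming a pg-style pool (the 100-row cap and the subquery wrapper are illustrative, and the inner SQL must be a bare SELECT with no trailing semicolon for the wrapper to be valid):

```typescript
// Paginated query tool sketch: never return more than one page of rows.
function createPaginatedQueryTool(pool: {
  query: (sql: string) => Promise<{ rows: any[] }>;
}) {
  return {
    name: "execute_query_paginated",
    description:
      "Execute a SELECT query, returning at most 100 rows per page. " +
      "Pass the returned nextOffset back as `offset` for the next page.",
    input_schema: {
      type: "object" as const,
      properties: {
        sql: { type: "string", description: "SELECT statement, no semicolon" },
        limit: { type: "number", description: "Rows per page (1-100)" },
        offset: { type: "number", description: "Rows to skip (default 0)" },
      },
      required: ["sql"],
    },
    handler: async (input: { sql: string; limit?: number; offset?: number }) => {
      const limit = Math.min(Math.max(input.limit ?? 100, 1), 100);
      const offset = Math.max(input.offset ?? 0, 0);
      const paged = `SELECT * FROM (${input.sql}) AS page LIMIT ${limit} OFFSET ${offset}`;
      const result = await pool.query(paged);
      return {
        success: true,
        rows: result.rows,
        rowCount: result.rows.length,
        // null signals the final page; otherwise Claude can ask for more
        nextOffset: result.rows.length === limit ? offset + limit : null,
      };
    },
  };
}
```

The `nextOffset` field is the key design choice: it makes "is there more data?" explicit in the result, so Claude never has to guess.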

Designing Tools for Claude's Decision-Making

Understanding how Claude uses tools changes how you should design them. Claude needs clear semantics. When you define a tool, the description and parameters are Claude's only guide to whether and how to use it. Vague descriptions lead Claude astray. A tool described as "execute SQL" is ambiguous. Does it support DDL? DML? Subqueries? Better: "Execute SELECT queries only. Returns result set with row count and field names. Does not support INSERT, UPDATE, DELETE, or DDL operations."

Parameters should be as specific as possible. If a parameter can only take certain values, use enums. If a number has bounds, specify them. These constraints aren't just documentation—they're guardrails that help Claude use the tool correctly. When a parameter is an enum with five options, Claude picks the right one. When a parameter is an open string, Claude might try values you didn't intend.
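As a sketch, a tightly specified schema for a hypothetical `analyze_table` tool might look like this (the mode names and numeric bounds are illustrative):

```typescript
// Enums and bounds act as guardrails in the tool schema itself.
const analyzeTableTool = {
  name: "analyze_table",
  description:
    "Analyze one table. Use mode 'row_count' for a fast count, " +
    "'null_check' to find columns containing NULLs, or 'full' for both.",
  input_schema: {
    type: "object" as const,
    properties: {
      table: {
        type: "string",
        description: "Name of an existing table in the current schema",
      },
      mode: {
        type: "string",
        enum: ["row_count", "null_check", "full"], // closed set, not an open string
        description: "Which analysis to run",
      },
      sample_size: {
        type: "number",
        minimum: 100,
        maximum: 10000, // bounded so a request can never force a huge scan
        description: "Rows to sample for null_check (default 1000)",
      },
    },
    required: ["table", "mode"],
  },
};
```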

Tool naming matters too. Names should be verbs: execute_query, list_tables, describe_table, analyze_table. Not "query", not "tables", not "table_info". The full name tells Claude what the tool does. Related tools should have consistent naming: list_tables, list_indexes, list_columns. Consistency helps Claude group related operations.

Documentation in tool descriptions is where you guide Claude's behavior. Use the description field to explain not just what the tool does, but when to use it. "Get schema information for a table. Use this BEFORE executing queries on that table to understand structure." This helps Claude sequence operations correctly.

Real-World Deployment Strategies

Authentication and Credential Management in Agent-Based CLIs

One of the most critical aspects of building agent-based CLIs is credential management. Your CLI will have access to databases, APIs, and other sensitive systems. Claude Code will be invoking operations on these systems. This creates significant security responsibilities. You absolutely cannot let credentials leak into logs, into Claude's response, or into git history.

The principle is simple: credentials should be environment-based, never hardcoded, and stripped from all output. When Claude makes a request using database credentials, those credentials stay internal. They never appear in the response Claude gives to the user. If Claude tries to print credentials (which it might do while debugging), your tools intercept and redact that output. This is non-negotiable.

For local development, use environment variables and .env files that are gitignored. For production, use secret management services (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault). For CI/CD systems, use provider-native secret management. Everywhere, follow these rules: (1) Read credentials from the environment, never from config files, (2) Redact credentials from logs and responses, (3) Never pass credentials as arguments to tools (use environment context instead), (4) Use parameterized queries to prevent SQL injection, (5) Implement rate limiting and approval gates on dangerous operations.
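A minimal redaction sketch, assuming credentials live in environment variables (the variable names here are illustrative):

```typescript
// Strip secret values from any text before it reaches logs or
// Claude's context. Values come from the environment, never from code.
const SECRET_ENV_VARS = ["DATABASE_URL", "ANTHROPIC_API_KEY", "DB_PASSWORD"];

function redactSecrets(text: string): string {
  let redacted = text;
  for (const name of SECRET_ENV_VARS) {
    const value = process.env[name];
    if (value) {
      // Replace every literal occurrence of the secret value
      redacted = redacted.split(value).join(`[REDACTED:${name}]`);
    }
  }
  return redacted;
}
```

Routing every tool result and log line through a filter like this is cheap insurance: it catches the accidental leak paths you did not anticipate.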

The challenge with agent-based CLIs is that Claude is making autonomous decisions about what to do. You're trusting Claude with your tools. This requires not just confidence in Claude's safety but also your own guardrails. Implement approval workflows for destructive operations. Rate limit rapid deletions or modifications. Log everything so you have an audit trail of what Claude did and when. If something goes wrong, you can understand exactly what happened.

Different deployment scenarios require different approaches:

Docker for teams: Package the CLI in a Docker image with all dependencies baked in. Teams run docker run dbcli with environment variables. Portable, reproducible, no Node.js dependency on the host.

NPM for developers: Publish to NPM for maximum reach. Developers run npm install -g dbcli. Easiest for adoption, but requires Node.js on the host.

Binary executable: Use pkg or similar to create a standalone executable. No runtime dependency. Single file. Best distribution model for non-technical users.

Web UI wrapper: The same tools work behind a web frontend. The browser supplies the UI while the backend runs the same session against the same tools. Multi-platform without extra code.

The architecture you've built today supports all of these deployment models without changes. Different UIs, same tools, same intelligence. That's architectural power.

Measuring Success: Metrics for AI-Powered CLIs

Track these metrics to understand how your CLI is actually being used:

  • Tool usage distribution: Which tools do users invoke most? Which are never used?
  • Multi-turn conversation depth: Are users asking follow-up questions or one-shot queries?
  • Error recovery: When Claude hits an error, does it recover? Does the user help it?
  • Time to task completion: How long does a typical interaction take?
  • Claude efficiency: How many tool calls to solve a problem? Can we reduce it?

These metrics reveal gaps. If a tool is never used, maybe it's named poorly or documented poorly. If conversation depth is shallow, maybe users don't understand they can ask follow-ups. Use metrics to improve the product over time.
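A minimal sketch of tracking the first of these, tool usage distribution, with an in-memory counter (in production the counts would feed a real metrics backend):

```typescript
// Count tool invocations and report them most-used first. Tools with
// zero calls never appear, which itself answers "which are never used?"
class ToolUsageTracker {
  private counts = new Map<string, number>();

  record(toolName: string): void {
    this.counts.set(toolName, (this.counts.get(toolName) ?? 0) + 1);
  }

  distribution(): Array<{ tool: string; calls: number }> {
    return [...this.counts.entries()]
      .map(([tool, calls]) => ({ tool, calls }))
      .sort((a, b) => b.calls - a.calls);
  }
}
```

Wiring `tracker.record(toolCall.name)` into the before-tool-call hook shown earlier gives you this distribution for free.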

Common Pitfalls and How to Avoid Them

Building agent-based CLIs is rewarding but fraught with subtle pitfalls. Understanding these patterns helps you avoid them.

Pitfall 1: Ambiguous Tool Descriptions

You define a tool with vague description: "Query the database." Claude sees this and assumes it can do anything—subqueries, complex joins, modifications. Then you're shocked that Claude tried a DELETE query when you only meant to allow SELECT. The fix is obvious but often overlooked: be explicit. Describe exactly what the tool does, what it accepts, what it returns. Better description: "Execute SELECT queries only. Returns result set with field names and row count. Supports WHERE clauses, JOINs, and subqueries. Timeout is 30 seconds."

Pitfall 2: Silent Failures

Your tool runs a query, gets an error, and returns a generic error message. Claude can't interpret it, so it tries the same query again. It tries variations. After five attempts, it gives up and tells the user "I couldn't access the database." The user has no idea what went wrong. Better practice: return detailed error context. "Error executing query: syntax error at line 3, column 17. Expected column name, found 'xyz'. Available columns in table 'users': id, name, email, created_at, updated_at."

Pitfall 3: Tools That Are Too Powerful

You give Claude a tool that can execute arbitrary SQL. Claude is smart, but it makes mistakes. It runs a query that takes 60 seconds because it didn't realize there's a missing index. It locks the production database. It performs an expensive full table scan. The fix: constrain tools to specific, safe operations. Instead of "execute arbitrary SQL," provide "analyze table for missing indexes" or "get record count for table." High-level operations that hide dangerous details.
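A naive guard along those lines might look like this. Note that string matching is easy to bypass, so a production version should validate with a real SQL parser or, better, run against a database role that only has SELECT privileges:

```typescript
// Reject anything that is not a single read-only statement -- a sketch.
function assertReadOnly(sql: string): void {
  const trimmed = sql.trim().toLowerCase();
  if (!trimmed.startsWith("select") && !trimmed.startsWith("with")) {
    throw new Error(
      "Rejected: only SELECT (or WITH ... SELECT) statements are allowed",
    );
  }
  // Block multi-statement payloads like "SELECT 1; DROP TABLE users"
  const withoutTrailing = trimmed.replace(/;\s*$/, "");
  if (withoutTrailing.includes(";")) {
    throw new Error("Rejected: multiple statements are not allowed");
  }
}
```

Defense in depth applies here: combine an application-level check like this with parameterized queries and restrictive database credentials.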

Pitfall 4: Forgetting About Token Cost

Imagine Claude runs a query that returns 50,000 rows of detailed data. It sends all of that back in the response, and you're amazed at the token bill. Even worse, the large response makes Claude's thinking slower and more expensive. The fix: paginate aggressively. Return the first 100 rows, not all rows. Let Claude ask for more if needed. Return only the columns Claude requested, not all columns.

Pitfall 5: Tool Interdependencies

You design tools in isolation. Tool A returns IDs. Tool B needs those IDs as input. But you didn't think about the workflow. Claude has to figure out the sequence. It might get it wrong, wasting attempts. Better practice: design tools as a coherent system. Understanding the expected workflows means you can provide tool configurations that guide Claude in the right direction.

Pitfall 6: Insufficient Error Handling in Credentials

Your tool accidentally logs the database password when debugging. Now it's in the logs. It might be in Claude's response. Definitely a breach. The fix: every tool that touches credentials needs explicit redaction. Strip secrets from error messages. Test that credentials never leak. Make credential handling paranoid—assume every tool will be debugged and logged.

Pitfall 7: Deployment Without Telemetry

You deploy your CLI and have no idea how it's being used. Is it working? Are there errors? Are users confused? The fix: instrument everything. Log tool invocations, results, errors, and timing. Collect metrics on usage patterns. These measurements tell you what's working and what needs improvement.

Real-World Extensions: Beyond Databases

The pattern you've learned—wrapping domain tools with Claude—works for any system:

Kubernetes CLI: Tools for managing deployments, viewing logs, checking resources. "Deploy the new version of payment service to staging."

AWS CLI: Tools for creating resources, managing IAM, checking costs. "Show me all databases that aren't encrypted and create a migration plan to encrypt them."

Infrastructure CLI: Tools for provisioning, monitoring, debugging. "What's causing the latency in the payment service and how do I fix it?"

Git CLI: Tools for branch management, commit history, conflict resolution. "Rebase my feature branch and resolve any conflicts automatically."

Each follows the same pattern: define tools, wrap with Claude, add natural language interface. The tools encapsulate domain knowledge. Claude orchestrates them intelligently.

The Future of CLIs

Traditional CLIs are based on fixed commands: ls, grep, sort. You invoke the right command with the right flags. It's powerful for experts but intimidating for newcomers.

AI-powered CLIs are different. You describe what you want in natural language. Claude figures out which tools to use, how to combine them, and how to present the results. It's more accessible for beginners while remaining powerful for experts.

As Claude and similar models improve, expect this pattern to become standard. Intelligent CLIs for system administration, cloud management, data engineering—all follow this same foundation.

