August 11, 2025
Claude AI DevOps Development

Building Custom GitHub Actions with Claude Code SDK

What if your GitHub Actions workflow could be intelligent? Not just run commands and check exit codes, but actually think through problems, suggest improvements, and adapt to your specific codebase without hardcoding every scenario?

That's the promise of combining the Claude Code SDK with GitHub Actions. In this guide, you'll learn how to package Claude-powered logic into reusable, publishable GitHub Actions that can run in anyone's workflow. We'll move beyond simple shell scripts and build actions that leverage AI reasoning to solve real problems in your CI/CD pipeline.

Whether you're automating code review suggestions, generating intelligent test cases, or building context-aware deployment actions, the techniques here will help you create actions that feel genuinely smart—not just scripted.

Table of Contents
  1. Why Custom GitHub Actions with AI?
  2. Understanding the GitHub Actions Ecosystem
  3. The High-Level Architecture
  4. Building a Code Review Action
  5. Step 1: Create the Action Metadata
  6. Step 2: Implement the Action Logic
  7. Step 3: Build and Package
  8. Common Pitfalls When Building AI-Powered Actions
  9. Understanding Token Economics
  10. Advanced Patterns
  11. Pattern 1: Progressive Enhancement
  12. Pattern 2: Multi-Step Analysis
  13. Pattern 3: Iterative Refinement
  14. Real-World Usage Example
  15. Handling Cost and Rate Limits
  16. Publishing Your Action
  17. Testing Your Action
  18. Future Enhancements
  19. How It Works Under the Hood: The Real Process
  20. Handling Failure Gracefully
  21. Real-World Scenario: Preventing a Security Bug
  22. Comparison with Alternatives
  23. Production Considerations
  24. Extending Your Action
  25. Troubleshooting Guide
  26. Team Adoption Strategy
  27. Advanced Techniques for Scale: Batching and Caching
  28. Common Mistakes When Deploying to Production
  29. Understanding Success: What Good Deployment Looks Like
  30. Conclusion

Why Custom GitHub Actions with AI?

Before we dive into the how, let's talk about the why. GitHub Actions are powerful, but they're traditionally limited by their logic: if this file changed, run that command. If tests fail, send a notification. You're working with conditions, not reasoning.

Claude Code SDK changes that equation. You can now:

  • Analyze real code context instead of relying on file patterns. Your action understands what the code does, not just what files match a regex.
  • Generate personalized suggestions that account for your specific codebase, conventions, and goals.
  • Detect subtle issues that rule-based checkers miss—security vulnerabilities, performance problems, API misuse.
  • Adapt to your workflow instead of forcing you to adapt to the action. The action learns your style and reinforces consistency.
  • Make intelligent tradeoff decisions where multiple approaches exist and picking the right one requires reasoning about your context.

Consider what a traditional GitHub Action does. You write a script that checks for specific patterns. The script runs the same way for every repository. It can't understand your project's specific architecture, conventions, or goals. You end up writing lots of custom logic on top of generic actions, duplicating work across projects.

Claude Code SDK flips this. Your action understands context. It reads your code, your configuration, your history. It asks itself questions: "What's this project trying to do? What are the main risks here? What would improve quality without slowing developers down?" It's not a tool that does what you tell it—it's an assistant that understands your intent and helps you achieve it more effectively.

Understanding the GitHub Actions Ecosystem

Before building, let's understand how GitHub Actions work and where the SDK fits in. GitHub Actions have three main components:

Workflows are YAML files in .github/workflows/ that define when your action runs and what it does. They're the orchestration layer—they say "on every pull request, run code review" or "when tests fail, post a summary."

Actions are reusable units of work. They can be Docker-based, JavaScript-based, or composite actions that orchestrate other actions. Your action gets inputs (like which files to analyze) and produces outputs (like suggestions or reports). Actions are the building blocks of workflows.

Runners are the machines that execute your workflow. GitHub provides managed runners (fast, ephemeral), or you can self-host for custom hardware or security requirements.

Most GitHub Actions you'll find are relatively simple: they run a few commands, parse output, and post results. They work great for deterministic tasks like running linters or deploying infrastructure. But they struggle with nuanced analysis that requires understanding context and making judgment calls. That's where intelligent actions powered by Claude Code come in.

This is where the Claude Code SDK shines. It gives your GitHub Action access to real AI reasoning. Your action can become truly intelligent. Instead of checking if a file contains a hardcoded password by regex, it can understand the code's intent and ask: "This looks like it's trying to use authentication. Is the secret being handled securely?" That's reasoning, not pattern matching.

The High-Level Architecture

Here's how we'll structure an AI-powered GitHub Action:

  1. Trigger: The workflow fires (pull request opened, code pushed, etc.)
  2. Gather context: Your action reads files from the repository, examines the PR, checks the commit history
  3. Send to Claude: Use the SDK to send code and context to Claude API
  4. Receive analysis: Claude returns structured analysis (issues found, suggestions, recommendations)
  5. Post results: Your action posts results back to the PR, commits suggestions, or triggers downstream actions

The key insight: your action becomes a bridge between the repository and Claude's reasoning capability. The repository provides context (code, history, configuration), Claude provides analysis, and the action makes it actionable in your workflow.

Building a Code Review Action

Let's build a concrete example: an intelligent code review action that analyzes pull requests and posts suggestions. This will be your foundation for more complex actions.

Step 1: Create the Action Metadata

First, create action.yml with your action's interface:

yaml
name: "AI Code Review"
description: "Intelligent code review using Claude"
inputs:
  files-to-review:
    description: "Glob pattern for files to review"
    required: false
    default: "src/**/*.ts"
  anthropic-api-key:
    description: "API key for Claude"
    required: true
  github-token:
    description: "Token used to post the review comment"
    required: false
    default: ${{ github.token }}
outputs:
  review-summary:
    description: "Summary of review findings"
runs:
  using: "node20"
  main: "dist/index.js"

This metadata tells GitHub how to call your action, what inputs it expects, and what outputs it provides. The runs.main field points to your compiled JavaScript.

Step 2: Implement the Action Logic

Create src/index.ts with the actual review implementation:

typescript
import * as core from "@actions/core";
import * as github from "@actions/github";
import Anthropic from "@anthropic-ai/sdk";
import * as fs from "fs";
import * as path from "path";
import { glob } from "glob";
 
async function main() {
  try {
    // Get inputs
    const filePattern = core.getInput("files-to-review");
    const apiKey = core.getInput("anthropic-api-key");
 
    // Find files to review
    const files = await glob(filePattern, {
      cwd: process.env.GITHUB_WORKSPACE || ".",
    });
 
    if (files.length === 0) {
      core.info("No files matching pattern found");
      return;
    }
 
    // Read file contents
    const fileContents: Record<string, string> = {};
    for (const file of files.slice(0, 10)) {
      // Limit to 10 files for demo
      const fullPath = path.join(process.env.GITHUB_WORKSPACE || ".", file);
      fileContents[file] = fs.readFileSync(fullPath, "utf-8");
    }
 
    // Get PR context if available
    const prNumber = github.context.payload.pull_request?.number;
    const prTitle = github.context.payload.pull_request?.title;
    const prBody = github.context.payload.pull_request?.body;
 
    // Initialize Claude client
    const client = new Anthropic({ apiKey });
 
    // Build review prompt
    const prompt = `You are reviewing code changes in a pull request.
 
PR Title: ${prTitle}
PR Description: ${prBody}
 
Files being reviewed:
${Object.entries(fileContents)
  .map(
    ([file, content]) => `\n## ${file}\n\`\`\`typescript\n${content}\n\`\`\``,
  )
  .join("\n")}
 
Provide a structured code review covering:
1. **Bugs and Issues**: Actual problems that would cause incorrect behavior
2. **Performance**: Operations that could be slow or resource-intensive
3. **Best Practices**: Patterns that don't follow TypeScript/JavaScript conventions
4. **Security**: Potential vulnerabilities
5. **Suggestions for Improvement**: Ways to make the code better
 
Format your response as JSON with this structure:
{
  "summary": "Brief overview of the review",
  "issues": [
    {
      "severity": "critical|high|medium|low",
      "type": "bug|performance|best-practice|security|improvement",
      "file": "path/to/file.ts",
      "line": 42,
      "message": "Description of the issue",
      "suggestion": "How to fix it"
    }
  ],
  "overallScore": 85,
  "recommendation": "What the team should do next"
}`;
 
    // Call Claude API
    const message = await client.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 2048,
      messages: [
        {
          role: "user",
          content: prompt,
        },
      ],
    });
 
    // Parse response
    const responseText =
      message.content[0].type === "text" ? message.content[0].text : "";
 
    // Extract JSON from response
    const jsonMatch = responseText.match(/\{[\s\S]*\}/);
    if (!jsonMatch) {
      throw new Error("Could not parse Claude response");
    }
 
    const review = JSON.parse(jsonMatch[0]);
 
    // Post review as PR comment
    if (prNumber && github.context.repo.owner && github.context.repo.repo) {
      const octokit = github.getOctokit(
        core.getInput("github-token") || process.env.GITHUB_TOKEN || "",
      );
 
      const comment = `## AI Code Review
 
${review.summary}
 
### Issues Found: ${review.issues.length}
 
${review.issues
  .map(
    (issue: any) =>
      `- **[${issue.severity.toUpperCase()}]** ${issue.message}\n  - File: \`${issue.file}:${issue.line}\`\n  - Suggestion: ${issue.suggestion}`,
  )
  .join("\n\n")}
 
**Overall Score:** ${review.overallScore}/100
 
**Next Steps:** ${review.recommendation}`;
 
      await octokit.rest.issues.createComment({
        owner: github.context.repo.owner,
        repo: github.context.repo.repo,
        issue_number: prNumber,
        body: comment,
      });
 
      core.info(`Posted review comment with ${review.issues.length} issues`);
    }
 
    // Set output
    core.setOutput("review-summary", review.summary);
  } catch (error) {
    core.setFailed(
      `Action failed: ${error instanceof Error ? error.message : String(error)}`,
    );
  }
}
 
main();

This is doing real work: it's reading actual code files, sending them to Claude with context about the PR, and posting structured feedback back to the PR. Developers get intelligent feedback right where they're working.

Step 3: Build and Package

Create a GitHub Actions workflow to build your action:

yaml
# .github/workflows/build-action.yml
name: Build Action
on:
  push:
    branches: [main]
    paths: ["src/**", "package.json"]
 
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: "20"
 
      - name: Install dependencies
        run: npm install
 
      - name: Build
        run: npm run build
 
      - name: Package
        run: npm run package
 
      - name: Commit dist changes
        run: |
          git config user.name "GitHub Action"
          git config user.email "action@github.com"
          git add dist/
          git commit -m "Build action" || true
          git push

Common Pitfalls When Building AI-Powered Actions

Before we move to advanced patterns, let's talk about the mistakes teams make when building their first intelligent GitHub Actions. Understanding these pitfalls will save you weeks of debugging.

Pitfall 1: Sending Too Much Context

Your first instinct is to send everything: the entire codebase, all git history, all PR comments. This seems smart—more information should mean better analysis, right? Wrong. You'll hit token limits, slow down your action, and burn through API budget. Claude doesn't need to read your entire codebase to review a PR. It needs the files being changed, the PR description, and maybe recent relevant history.

The fix: Be selective. Implement file size limits. Skip generated files, dependencies, and large assets. If a file is over 10KB, summarize it instead of sending the whole thing. Your action becomes faster and cheaper without sacrificing quality.
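That selection logic can be sketched as a pure helper. Everything here—the `selectFiles` name, the skip patterns, the 10KB cap—is illustrative, not part of any SDK:

```typescript
// Hypothetical context-selection helper; names, patterns, and the 10KB
// cap are illustrative, not part of the Claude Code SDK.
interface CandidateFile {
  path: string;
  bytes: number;
}

const SKIP_PATTERNS = [/node_modules\//, /\bdist\//, /\.min\.js$/, /\.lock$/];
const MAX_BYTES = 10 * 1024;

function selectFiles(files: CandidateFile[]): {
  send: string[]; // small enough to include verbatim
  summarize: string[]; // too large: send a summary instead
} {
  const send: string[] = [];
  const summarize: string[] = [];
  for (const f of files) {
    // Skip generated and vendored files entirely
    if (SKIP_PATTERNS.some((p) => p.test(f.path))) continue;
    (f.bytes > MAX_BYTES ? summarize : send).push(f.path);
  }
  return { send, summarize };
}
```

Files in the `summarize` bucket still contribute context—a one-paragraph summary costs a fraction of the tokens of the full source.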

Pitfall 2: Ignoring Action Timeouts

GitHub Actions have execution limits. Your job can run for at most six hours. If your code review action hangs waiting for the Claude API, it wastes runners and blocks other CI checks. Also, API calls sometimes fail. Networks are unreliable. Your action needs to handle timeouts gracefully.

The fix: Set explicit timeouts on API calls. Implement retry logic with exponential backoff. If the review can't complete in time, post what you have and let it fail gracefully rather than timing out completely. Better to provide partial feedback than no feedback.
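A minimal sketch of that retry logic. `backoffMs` and `withRetry` are hypothetical helpers; the base delay, cap, and attempt count are illustrative defaults:

```typescript
// Exponential backoff with a cap: 1s, 2s, 4s, ... up to 2 minutes.
function backoffMs(attempt: number, baseMs = 1000, capMs = 120_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Retry an async operation, backing off between attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
): Promise<T> {
  let lastError: unknown = undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Wait before the next attempt
        await new Promise((r) => setTimeout(r, backoffMs(attempt)));
      }
    }
  }
  throw lastError;
}
```

Wrap the Claude API call in `withRetry`, and pair it with an overall deadline so the action posts partial results instead of hitting the job timeout.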

Pitfall 3: Forgetting to Handle Rate Limits

Claude API has rate limits. If your team runs twenty PRs concurrently, all trying to call Claude, some will get rate-limited. You need to handle this.

The fix: Implement queue-based execution. If you hit a rate limit, queue the review for later instead of failing immediately. Add backoff headers to respect rate limit information from the API. Consider batching reviews if many PRs are pending.

Pitfall 4: Parsing JSON Fragily

Claude returns natural language. You ask it to return JSON, and sometimes it does. Sometimes it wraps the JSON in markdown code blocks. Sometimes it returns JSON-like-but-not-quite text. Your parsing needs to be defensive.

The fix: Look for JSON patterns robustly. Strip markdown code blocks. Validate the parsed object against a schema. If parsing fails, ask Claude to try again with stricter formatting instructions.
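One way to sketch defensive parsing—`extractJson` is an invented helper that strips markdown fences first, then falls back to the outermost braces:

```typescript
// Defensively extract a JSON object from a model response.
function extractJson(text: string): Record<string, unknown> {
  // Prefer the contents of a fenced ```json ... ``` block if present
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : text;
  // Fall back to the outermost brace pair
  const braces = candidate.match(/\{[\s\S]*\}/);
  if (!braces) throw new Error("No JSON object found in response");
  const parsed = JSON.parse(braces[0]);
  // Minimal shape check; a real action would validate against a schema
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Parsed value is not an object");
  }
  return parsed as Record<string, unknown>;
}
```

If this still throws, re-prompt Claude with stricter formatting instructions rather than failing the run.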

Pitfall 5: Not Validating Against Actual Code

Your action generates suggestions. But what if the line numbers are wrong? What if the file has changed since you started analyzing? What if the code sample Claude generated doesn't actually compile?

The fix: Before posting suggestions, validate them. Run tests for suggested code changes. Check that line numbers still exist. Cross-reference against the actual codebase. You'll catch a lot of Claude hallucinations this way.
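A sketch of line-number validation against the files you actually sent. The `ReviewIssue` shape mirrors the JSON format requested earlier; `dropStaleIssues` is an invented name:

```typescript
// Drop suggestions whose line numbers don't exist in the referenced file.
interface ReviewIssue {
  file: string;
  line: number;
  message: string;
}

function dropStaleIssues(
  issues: ReviewIssue[],
  fileContents: Record<string, string>,
): ReviewIssue[] {
  return issues.filter((issue) => {
    const content = fileContents[issue.file];
    if (content === undefined) return false; // file isn't in this PR
    const lineCount = content.split("\n").length;
    return issue.line >= 1 && issue.line <= lineCount;
  });
}
```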

Understanding Token Economics

Every API call to Claude costs money in tokens. You need to understand your token budget and optimize within it. Let's talk about the economics of running code review at scale.

A typical PR with 5 changed files of roughly 50 lines each comes to about 2,500 input tokens, and Claude's response is maybe 1,000 tokens. At current pricing, that's around $0.02 per review—so 100 PRs a week costs only a couple of dollars.

Sounds cheap, and at first it is. But reviews re-run on every push, real PRs are often far larger, and the numbers multiply across teams and projects—annual spend can climb into the thousands of dollars. This is why selective context matters. If you can cut input tokens in half through better selection, you cut costs in half.
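The arithmetic can be written down explicitly. The per-million-token rates below are assumptions based on published Claude 3.5 Sonnet pricing; check current pricing before budgeting:

```typescript
// Back-of-envelope cost model (rates are assumed, not authoritative).
const INPUT_PER_MTOK = 3.0; // USD per million input tokens
const OUTPUT_PER_MTOK = 15.0; // USD per million output tokens

function reviewCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_PER_MTOK
  );
}

// A typical small PR: ~2,500 input tokens, ~1,000 output tokens
const perReview = reviewCostUSD(2500, 1000); // ≈ $0.0225
```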

But here's the thing: even a few thousand dollars a year to prevent one production incident is a steal. Production incidents cost thousands of dollars per hour. If your code review action prevents a single critical bug from reaching production, it pays for itself many times over.

The key insight: don't optimize for cost at the expense of quality. Optimize for maximum value per token spent. Which means smart context selection, not minimal context.

Advanced Patterns

The basic code review action demonstrates the pattern, but real-world scenarios get more complex. Let's explore some advanced techniques that make your actions smarter and more efficient.

Pattern 1: Progressive Enhancement

Your action should gracefully handle varying levels of detail in code context. With more context, Claude gives better analysis. With less context, it still works.

typescript
async function gatherContext() {
  const context = {
    files: [] as string[],
    gitHistory: "",
    dependencies: {} as Record<string, string>,
    recentCommits: [] as string[],
  };
 
  // Always gather files
  context.files = await findChangedFiles();
 
  // Optionally gather git history if available
  try {
    context.gitHistory = await exec("git log --oneline -20");
  } catch {
    core.debug("Could not get git history");
  }
 
  // Optionally read package.json
  try {
    const pkg = JSON.parse(fs.readFileSync("package.json", "utf-8"));
    context.dependencies = pkg.dependencies || {};
  } catch {
    core.debug("Could not read package.json");
  }
 
  return context;
}

This approach lets your action work in any environment but take advantage of richer context when available. The action degrades gracefully—it still provides value even in constrained environments.

Pattern 2: Multi-Step Analysis

For complex reviews, break the analysis into steps. Claude reasons better with focused questions than with one massive prompt.

typescript
async function reviewCode(code: string) {
  // Step 1: Identify the code's purpose
  const purposeAnalysis = await claude.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: `What does this code do? Summarize its purpose in one sentence.\n\n${code}`,
      },
    ],
  });
 
  const purpose =
    purposeAnalysis.content[0].type === "text"
      ? purposeAnalysis.content[0].text
      : "";
 
  // Step 2: Check for issues with that purpose in mind
  const issueAnalysis = await claude.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: `This code is supposed to: ${purpose}\n\nList any bugs, security issues, or performance problems:\n\n${code}`,
      },
    ],
  });
 
  // Step 3: Suggest improvements
  const improvements = await claude.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: `How could this code be improved while maintaining its purpose (${purpose})?\n\n${code}`,
      },
    ],
  });
 
  return {
    purpose,
    issues:
      issueAnalysis.content[0].type === "text"
        ? issueAnalysis.content[0].text
        : "",
    improvements:
      improvements.content[0].type === "text"
        ? improvements.content[0].text
        : "",
  };
}

Multi-step analysis is slower but more accurate. It's better when quality matters more than speed. The trade-off is clear: you're spending more API calls for better analysis. That's often worth it.

Pattern 3: Iterative Refinement

Your action can refine its own analysis by asking Claude follow-up questions.

typescript
async function askFollowUp(
  initialAnalysis: string,
  question: string,
  code: string,
) {
  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: `Initial analysis: ${initialAnalysis}\n\nCode:\n${code}\n\nFollow-up question: ${question}`,
      },
    ],
  });
 
  return response.content[0].type === "text" ? response.content[0].text : "";
}

Use this when you need to clarify something from the initial analysis or dive deeper into a specific issue. This pattern is powerful for complex problems where surface-level analysis isn't enough.

Real-World Usage Example

Here's how someone would use your action in their workflow:

yaml
name: Review PR
on: [pull_request]
 
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
 
      - name: AI Code Review
        uses: your-org/ai-code-review@v1
        with:
          files-to-review: "src/**/*.ts"
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

The action runs, analyzes the code, and posts a review comment. Developers see AI-generated suggestions right in their PR, just like a code review from a senior engineer.

Handling Cost and Rate Limits

Claude API calls cost money. Your action will make API calls for every PR. Consider these cost optimization strategies:

Caching: Don't re-analyze unchanged code.

typescript
import crypto from "crypto";

// Assumes a module-level cache, e.g. const cache = new Map<string, ReviewResult>()
const cacheKey = crypto.createHash("md5").update(code).digest("hex");
if (cache.has(cacheKey)) {
  return cache.get(cacheKey);
}

Sampling: Review only a subset of files for PRs that touch many files.

typescript
const files = await glob(pattern);
const sampled = files.slice(0, 10); // Review first 10

Conditional triggers: Only run on certain PR labels or conditions.

yaml
if: |
  contains(github.event.pull_request.labels.*.name, 'needs-review') ||
  github.event.pull_request.draft == false

Cost estimation: Log API usage so you can track spending.

typescript
const usage = message.usage;
const estimatedCost =
  (usage.input_tokens * 0.003 + usage.output_tokens * 0.015) / 1000;
core.info(`Estimated API cost: $${estimatedCost.toFixed(4)}`);

Publishing Your Action

Once you've built and tested your action, publish it to the GitHub Actions Marketplace. The process is simple but important:

  1. Create an action.yml in the root (we did this)
  2. Build everything to the dist/ directory
  3. Tag a release: git tag -a v1.0.0 -m "Initial release"
  4. Push the tag: git push origin v1.0.0

GitHub automatically publishes actions from public repositories with proper metadata.

Testing Your Action

Write tests to verify your action works:

typescript
// __tests__/review.test.ts
import { reviewCode } from "../src/review";
 
describe("Code Review", () => {
  it("should identify bugs", async () => {
    const buggyCode = `
      function add(a, b) {
        return a + b;
        return a - b;  // Unreachable, likely a bug
      }
    `;
 
    const result = await reviewCode(buggyCode);
    expect(result.issues).toContainEqual(
      expect.objectContaining({
        type: "bug",
      }),
    );
  });
 
  it("should suggest performance improvements", async () => {
    const slowCode = `
      const results = data.map(d => d.value).filter(v => v > 0).map(v => v * 2);
    `;
 
    const result = await reviewCode(slowCode);
    expect(result.improvements).toBeDefined();
  });
});

Run tests with npm test in your CI pipeline.

Future Enhancements

Consider these extensions:

Automatic fixes: Have Claude suggest not just changes, but generate code diffs automatically.

Configuration: Let teams customize what Claude looks for via a config file in their repo.

Metrics tracking: Record metrics over time about code quality trends.

Integration with external tools: Combine Claude analysis with linting results or test coverage data.

Multiple language support: Build adapters for Python, Go, Rust, etc.

The foundation you've built here is extensible. Each enhancement builds on the core pattern: gather context, ask Claude, act on the response.

How It Works Under the Hood: The Real Process

Let's talk about what's actually happening when your GitHub Action runs, because understanding the flow helps you debug issues and optimize performance.

When a PR is opened, GitHub fires the pull_request event. Your workflow triggers. The action runs on a GitHub runner (a machine managed by GitHub or self-hosted). The action checks out your code using actions/checkout, which clones the repo and checks out the PR's branch.

Then your code reads the file pattern, finds matching files, reads their contents into memory, and constructs a prompt. This prompt—which might be 50KB of text—goes over HTTPS to Anthropic's API. The API processes your request (which might take 5-30 seconds depending on model load), and returns a response containing the review.

Your action parses the JSON response, formats it as a comment, and posts it back to GitHub using the octokit library. All of this happens within the GitHub Actions runner's execution context, which means you're subject to runner limits: a six-hour job timeout, ephemeral machines that disappear when the job finishes, and limited disk space.

This is important for a few reasons. First, you can't rely on persistent state. Each action run is a fresh machine. If you need to cache something, use GitHub's cache action or external storage.
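Persisting a review cache with the official actions/cache action might look like this (the path and key here are illustrative):

```yaml
# Restore and save a review-result cache across runs
- uses: actions/cache@v4
  with:
    path: .ai-review-cache
    key: ai-review-${{ hashFiles('src/**') }}
    restore-keys: |
      ai-review-
```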

Second, the execution environment is Linux (unless you choose otherwise), so your scripts should assume Unix tools. Third, secrets are available as environment variables and are automatically redacted from logs. Use them for API keys, but don't rely on redaction—avoid logging them in the first place.

Fourth, network requests from the runner have some limitations. GitHub publishes its runner IP ranges, so if you're calling APIs behind an IP allowlist, you'll need to add those published ranges to it.

Understanding this execution model prevents nasty surprises. You can't write to the runner's filesystem and expect it to persist. You can't rely on job isolation being perfect. But you can leverage GitHub's infrastructure to run globally distributed analysis on every PR your team creates.

Handling Failure Gracefully

Real systems fail. Networks drop. APIs time out. Your action needs to be resilient.

The key principle: fail safely. If your code review can't complete for any reason, don't fail the entire CI pipeline unless it's critical. Instead, post a comment saying "Review couldn't complete, please try again" and exit with status 0 (success). This way the PR can still be reviewed manually if needed.

Implement circuit breakers. If the API is returning errors for multiple consecutive requests, stop trying and fail fast. Implement exponential backoff: wait 1 second, then 2, then 4, then 8, capping at 2 minutes. This prevents overwhelming a struggling service.
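A minimal circuit breaker might look like this; the failure threshold is illustrative:

```typescript
// Trips after N consecutive failures; resets on the first success.
class CircuitBreaker {
  private consecutiveFailures = 0;

  constructor(private readonly threshold = 3) {}

  get isOpen(): boolean {
    return this.consecutiveFailures >= this.threshold;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures++;
  }
}
```

Before each API call, check `isOpen` and fail fast if the breaker has tripped; record successes and failures as responses come back.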

Log everything that might help debugging. When you call Claude, log the request size, response size, latency, and any errors. Store these metrics somewhere accessible. GitHub Actions has a built-in logging facility; use it.

Finally, add manual override capability. If the automatic review is broken, let developers skip it with a label or comment. Governance should facilitate flow, not block it entirely.

Real-World Scenario: Preventing a Security Bug

Here's a concrete example where this kind of system saved a team real money. A team built an action that reviews code for security issues. One PR added authentication logic. The developer used bcryptjs but left the cost factor at its default of 10—below the team's standard for password hashing.

The security review action spotted it immediately: "Password hashing uses bcryptjs with default salt rounds (10). Recommendation: increase to 12 or use argon2 instead." The developer fixed it before merge. The action prevented a subtle security weakness from reaching production.

This is a pattern. Intelligent code review catches things that generic linting misses. Not every issue, but the weird edge cases where reasoning matters more than rules.

Comparison with Alternatives

Why build an AI-powered action instead of using traditional static analysis tools? Let's be honest about tradeoffs.

Static Analysis (ESLint, Pylint, etc.): Fast, cheap, no API calls. But they catch pattern-based issues only. They can't understand intent. They have false positives. They miss subtle bugs because bugs are often conceptual rather than syntactic.

Human Code Review: Catches subtle issues, understands intent, can guide architecture. But it's slow, requires skilled reviewers, and doesn't scale. A PR might sit for days waiting for review.

AI Code Review (What We Built): Fast, catches both pattern-based and concept-based issues, scales with your API budget rather than reviewer headcount, and provides reasoning. But it costs money, can hallucinate, and sometimes misses obvious issues. It also requires sending code to an external API—a security consideration in itself.

The right answer: layered approach. Use static analysis for fast feedback (catches obvious issues). Use AI review for intelligent feedback (catches conceptual issues). Use human review for critical decisions (architecture, security design). Each layer adds value without overlapping.

Production Considerations

When you're running this in production across a team, a few things matter:

Cost Budgeting: Set a monthly budget for Claude API calls. Monitor actual spending. If you're over budget, you have options: reduce analysis scope, sample only high-risk PRs, or increase the budget.

Team Communication: Tell your team the action exists and what it does. Some developers will love it. Others will be skeptical. Show examples of issues it caught. Make it part of your culture.

Feedback Loop: Track when developers follow the action's suggestions versus ignore them. If they're ignoring a particular type of suggestion, maybe that suggestion is noise—tune the rules.

Continuous Improvement: Every month, review the action's performance. Is it catching real issues? Is it creating false positives? Are developers finding it valuable? Use their feedback to improve.

Extending Your Action

Once you have the basic pattern working, you can extend it in many directions:

Language Support: Add analysis for Python, Go, Rust, etc. Each language gets an adapter that extracts language-specific information.

Custom Rules: Let teams define their own analysis rules. "Flag any hardcoded API keys," "Ensure all public functions have examples," etc. Store these in a config file.
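One possible shape for such a config file—the filename and every field here are invented for illustration:

```yaml
# .github/ai-review.yml (hypothetical config consumed by the action)
rules:
  - "Flag any hardcoded API keys or secrets"
  - "Ensure all exported functions have doc comments"
severity-threshold: medium # ignore findings below this level
max-files: 10
```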

Integration with Metrics: Pull in code coverage data, performance metrics, security scan results. Ask Claude to synthesize: "This change adds 2% test coverage, fixes 3 security issues, and degrades performance by 5%. Overall: approve."

Scheduled Analysis: Run reviews on a schedule for all recent PRs, not just new ones. Catch old issues in code that was merged before the action existed.
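In workflow terms, a scheduled run is just a different trigger (the cron expression is an example):

```yaml
# Run the review weekly over recent PRs instead of per-event
on:
  schedule:
    - cron: "0 6 * * 1" # every Monday at 06:00 UTC
```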

Team Customization: Different teams have different standards. Let each team customize which checks they care about. Some teams want strict performance analysis; others don't care.

Troubleshooting Guide

Action Times Out

Cause: Claude API is slow or your context is too large. Fix: Reduce context size. Skip large files. Implement timeout handling.

API Key Errors

Cause: Secret is not configured correctly or has expired. Fix: Check that ANTHROPIC_API_KEY secret is set in GitHub Actions settings. Verify the key is still valid in your Anthropic console.

JSON Parsing Errors

Cause: Claude's response doesn't match expected format. Fix: Check the prompt in your logs. Make sure you're requesting JSON output explicitly. Add error handling for when Claude returns text instead of JSON.

Rate Limit Errors

Cause: Too many API requests hitting rate limits. Fix: Implement queue-based execution and backoff. Consider using a lower-cost model for high-volume reviews.

Comment Not Posting

Cause: Octokit authentication failed or insufficient permissions. Fix: Verify github-token secret is set. Check that the token has write permissions on pull requests.

Memory Issues

Cause: Reading very large files into memory. Fix: Limit number of files reviewed. Skip huge files. Stream large files instead of reading fully.

Team Adoption Strategy

You've built this amazing action. Now you need your team to actually use it. Here's how to make that happen:

Phase 1: Opt-In (Weeks 1-2) Add the action to one repository as optional. Post in Slack: "Try the new AI code review on your next PR—feedback and questions welcome." Let people kick the tires.

Phase 2: Soft Default (Week 3-4) Add it to all repos but don't block merges on it. Developers see the review comments but can ignore them. Track usage. Most teams will start adopting.

Phase 3: Mentoring (Week 5-8) When developers ask questions about suggestions, explain the reasoning. "The action suggested this because..." Make it collaborative, not authoritative.

Phase 4: Integration (Week 9+) Make it part of your standard workflow. Update your contribution guide. New developers learn the action is part of the process. It becomes invisible because it's everywhere.

Advanced Techniques for Scale: Batching and Caching

When your team runs dozens of PRs daily, every optimization matters. Let me show you some advanced techniques that make your action efficient at scale.

Request Batching: Instead of making one API call per PR, you can batch analysis. Collect reviews in a queue, then analyze multiple PRs in a single request.

typescript
// Assumes `chunk` splits an array into fixed-size groups (e.g. lodash.chunk),
// `claude` is your Anthropic SDK client, and parseReviews/postReview are the
// action's own helpers.
async function batchAnalyzeReviews(prQueue: PullRequest[]) {
  if (prQueue.length === 0) return;

  // Group up to 5 PRs per batch
  const batches = chunk(prQueue, 5);

  for (const batch of batches) {
    const prompt = `Analyze these PRs and provide reviews for each:

${batch
  .map(
    (pr) => `
PR #${pr.number}: ${pr.title}
Files: ${pr.files.join(", ")}
Description: ${pr.description}
`,
  )
  .join("\n")}

Return a JSON object mapping PR numbers to review results.`;

    // One API call for multiple PRs
    const result = await claude.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 4096,
      messages: [{ role: "user", content: prompt }],
    });

    // Parse and post reviews (Object.entries yields string keys)
    const reviews = parseReviews(result);
    for (const [prNumber, review] of Object.entries(reviews)) {
      await postReview(Number(prNumber), review);
    }
  }
}

With batches of five, this cuts the number of API calls by 80% while maintaining review quality. Token costs don't shrink proportionally, since you still send every diff, but you pay the per-request overhead once for five reviews instead of five times, and you stay well clear of rate limits.

Response Caching: If you've reviewed similar code recently, cache the analysis.

typescript
import crypto from "node:crypto";

const reviewCache = new Map<string, ReviewResult>();

function getCacheKey(files: string[], code: string): string {
  return crypto.createHash("sha256").update(files.join(",") + code).digest("hex");
}
 
async function reviewWithCache(files: string[], code: string): Promise<ReviewResult> {
  const key = getCacheKey(files, code);
 
  if (reviewCache.has(key)) {
    console.log("Cache hit for review");
    return reviewCache.get(key)!;
  }
 
  const result = await reviewCode(files, code);
  reviewCache.set(key, result);
  return result;
}

Caching can reduce API calls by 30-40% on real teams, where developers often iterate on similar code. One caveat: an in-memory Map only lives for a single workflow run. To get cache hits across runs, persist the map to disk and restore it with actions/cache, or use external storage.

Progressive Feedback: Post partial results immediately, then update with full analysis.

typescript
async function reviewWithProgressiveFeedback(prNumber: number, files: string[]) {
  // Post initial comment immediately
  const initialComment = `🤖 AI Code Review in progress...`;
  const commentId = await postComment(prNumber, initialComment);
 
  // Start analysis in background
  const analysisPromise = analyzeFiles(files);
 
  // Meanwhile, post quick feedback we can generate without Claude
  const quickChecks = runLocalAnalysis(files);
  await updateComment(
    commentId,
    `## Quick Analysis\n${formatChecks(quickChecks)}\n\n⏳ Running deeper analysis...`,
  );
 
  // Wait for full analysis
  const fullAnalysis = await analysisPromise;
  await updateComment(
    commentId,
    formatFullReview(quickChecks, fullAnalysis),
  );
}

Progressive feedback gives developers immediate signals while you're running expensive analysis. They see quick wins immediately and detailed feedback a few seconds later.

Common Mistakes When Deploying to Production

You've built the action. You've tested locally. You're ready to deploy to your team. Here's where things often go wrong:

Mistake 1: No Rate Limit Handling

Your action works fine in testing. Then 10 PRs open simultaneously. All try to hit Claude. Half get rate-limited. Users are confused why some reviews work and some don't.

Fix this before deploying: implement exponential backoff and queue-based retries. If you hit a rate limit, queue the review and retry automatically.
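A minimal backoff sketch, assuming the thrown error carries an HTTP `status` field the way the Anthropic SDK's errors do (the helper name and parameters here are ours):

```typescript
// Retry rate-limited calls with exponentially growing delays (1s, 2s, 4s, ...).
// Non-429 errors and exhausted retries are rethrown to the caller.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 4,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status !== 429 || attempt >= maxRetries) throw error;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrap each review call as `withBackoff(() => reviewPR(pr))` so simultaneous PRs queue up behind the limit instead of half-failing.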

Mistake 2: Hardcoded Configuration

You hardcode the model name, context limits, and analysis rules. When you want to change them, you have to update code and redeploy.

Instead, store configuration in a file in your repo:

yaml
# .github/claude-review.yml
model: claude-3-5-sonnet-20241022
max_tokens: 2048
file_limit: 10
max_file_size_kb: 50
analysis_rules:
  - security
  - performance
  - best_practices
cost_warning_threshold: 0.05

Then load it in your action:

typescript
import fs from "node:fs";
import yaml from "js-yaml";

const config = yaml.load(fs.readFileSync(".github/claude-review.yml", "utf-8"));

Now teams can customize behavior per repository without touching code.
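To make that robust against missing keys, merge the repo's config over a defaults object so partial files fall back safely (a sketch; `defaultConfig` and `resolveConfig` are hypothetical names mirroring the YAML above):

```typescript
// Defaults applied when a repo's config file omits a key; repo values win.
const defaultConfig = {
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 2048,
  file_limit: 10,
  max_file_size_kb: 50,
};

type ReviewConfig = typeof defaultConfig;

function resolveConfig(repoConfig: Partial<ReviewConfig>): ReviewConfig {
  return { ...defaultConfig, ...repoConfig };
}
```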

Mistake 3: No Fallback for API Failures

Claude API goes down. Your action fails. The whole CI pipeline blocks.

Always have a fallback:

typescript
async function reviewCode(code: string): Promise<ReviewResult> {
  try {
    return await analyzeWithClaude(code);
  } catch (error: any) {
    if (error.status === 429) {
      // Rate limited: queue for retry
      queueForLaterRetry(code);
      return { status: "queued", message: "Review queued. Will retry in 5 minutes." };
    }
 
    if (error.status === 503) {
      // Service unavailable: fail soft
      return { status: "unavailable", message: "Claude service unavailable. Review skipped." };
    }
 
    // Unknown error: fail hard but informatively
    throw new Error(`Review failed: ${error.message}`);
  }
}

Mistake 4: Trying to Review Everything

You set a file pattern that matches 500 files in your large monorepo. The action times out trying to read them all.

Be selective. Skip generated code, dependencies, assets:

typescript
import fs from "node:fs";
import { minimatch } from "minimatch";

const skipPatterns = [
  "**/node_modules/**",
  "**/dist/**",
  "**/build/**",
  "**/*.min.js",
  "**/*.css",
  "**/*.svg",
  "**/package-lock.json",
];

function shouldReviewFile(path: string): boolean {
  // Check cheap glob patterns first so we never stat files we'd skip anyway
  if (skipPatterns.some((pattern) => minimatch(path, pattern))) return false;

  return fs.statSync(path).size <= 50000; // Skip huge files
}
 
const reviewableFiles = files.filter(shouldReviewFile);
if (reviewableFiles.length === 0) {
  core.info("No reviewable files found");
  return;
}

Understanding Success: What Good Deployment Looks Like

When your action is working well in production, you'll see:

  • Every PR gets a review comment within 1-2 minutes
  • Developers are finding the suggestions valuable (not ignoring them)
  • Most reviews cost less than $0.01 in API usage
  • You're catching real bugs and security issues
  • Teams are asking you to add more checks

These are signs your action has achieved escape velocity. It's self-sustaining and valuable.

Conclusion

You've now built an intelligent GitHub Action that leverages Claude's reasoning capabilities. This action can become as sophisticated as you want it to be. Start simple (basic code review), then add complexity (multi-step analysis, iterative refinement, cost optimization) as your needs grow.

The key insight: GitHub Actions are perfect delivery vehicles for AI-powered analysis. They run in your workflows, integrate with your process, and provide a natural way for teams to consume intelligent suggestions. You're not just automating code review—you're elevating it, making it consistent, and making it available to every PR your team creates.

This is the future of CI/CD: not just testing that code works, but reasoning about whether code is good. And that's something only AI reasoning can do at scale.


-iNet
