September 18, 2025
Claude Development

Building a Feature Flag Management Assistant

Let's be honest: feature flags are powerful, but they're also a maintenance nightmare. You ship them, they work great, then six months later nobody remembers which flags are still active or which code paths are dead weight. We've all inherited codebases with dozens of stale flags scattered across the project. That's where Claude Code steps in as your feature flag hygiene assistant.

In this article, we're building an automated system that tracks your feature flags across the entire codebase, identifies flags ready for cleanup, and generates safe removal PRs with comprehensive refactoring. This isn't just about finding and deleting flags — it's about understanding the lifecycle of each flag, auditing rollout status, and integrating with platforms like LaunchDarkly so you get a complete picture.

Table of Contents
  1. Why Feature Flag Hygiene Matters
  2. The True Cost of Flag Debt
  3. The Architecture
  4. Phase 1: Indexing Feature Flags
  5. Phase 2: Lifecycle Analysis
  6. Phase 3: Audit Integration
  7. Phase 4: Safe Removal Generation
  8. Phase 5: Putting It Together
  9. Integrating with Claude Code
  10. Real-World Example
  11. Pro Tips and Edge Cases
  12. Advanced: Handling Complex Flag Patterns
  13. Multi-Branch Conditionals
  14. Flag Dependencies
  15. Shadow Implementations
  16. Testing Your Refactorings
  17. Scaling to Production
  18. Real-World Challenges and Solutions
  19. Challenge 1: Feature flags with business implications
  20. Challenge 2: Flags that control A/B tests
  21. Challenge 3: Flags in multiple repositories
  22. Challenge 4: Documentation and audit trails
  23. Integration with your deployment pipeline
  24. The Organizational Gravity of Feature Flags
  25. Measuring success
  26. Understanding the Business Cost of Flag Debt
  27. Integration with Development Workflows
  28. Wrapping up

Why Feature Flag Hygiene Matters

Feature flags solve real problems: you can ship code without shipping features, control rollouts gradually, and kill bad experiments fast. But over time, they become technical debt. Dead flags clutter your code, confuse developers reading logic, and create security holes (what if you forgot to remove a flag that controls admin access?).

The problem is scale. In a 100k-line codebase, you might have 50 flags scattered across 200 files. Manually tracking which ones are safe to remove is tedious and error-prone. You need automation, but the automation has to be smart — it can't just delete flags blindly. It needs to understand the flag's lifecycle, its usage patterns, and the code it guards.

That's what we're building: a Claude Code assistant that becomes your feature flag watchdog.

But here's the deeper issue: flags are a form of technical debt that compounds over time. When you ship a feature flag, you're making a promise to clean it up. Most teams don't. A flag that guards a temporary experiment should live for weeks. A flag controlling a gradual rollout should live for months. But the average flag in production stays around for years, even after the feature is fully rolled out.

The True Cost of Flag Debt

Think about what happens when you leave flags in place. First, there's the cognitive overhead. Every developer reading your code has to understand the conditional logic. "Does this code path execute for all users or only some?" It's a question you have to ask repeatedly. That overhead is invisible but expensive — it slows down comprehension, slows down debugging, slows down refactoring.

Second, there's the testing overhead. If you have 20 active flags in your system, that's potentially 2^20 different code path combinations. You can't test all of them. So you test the common paths and hope the edge cases don't break in production. This is how flag-related bugs sneak through.

Third, there's the security risk. That admin flag you added two years ago to help with migrations? Is it still there? Is the condition actually checking what you think it's checking? Security researchers love finding old flags because they're often forgotten and undefended. We've all seen CVEs that boiled down to "flag X was accidentally left in always-on state."

Fourth, there's the maintenance tax. Every time you refactor code, you have to think about the flags guarding it. Every time you move functions around, you have to ensure the flag conditions still apply. Every time you delete code, you worry: "Is this code actually dead, or is it guarded by a flag I forgot about?" This tax accumulates.

So why don't teams just remove their stale flags? Because removing a flag safely requires:

  1. Understanding what code paths it guards
  2. Verifying the feature is actually stable
  3. Checking that no users are on the old code path
  4. Running comprehensive tests
  5. Creating a PR that doesn't break anything
  6. Getting approval
  7. Monitoring after deployment

That's a lot of friction. Most teams just leave it. We're automating away that friction.

The Architecture

Here's how the system works at a high level:

  1. Scan Phase: Walk the entire codebase and index every feature flag reference
  2. Lifecycle Analysis: Determine each flag's status (new, active, stale, deprecated)
  3. Usage Mapping: Build a dependency graph of code paths guarded by each flag
  4. Audit Phase: Check integration platforms (LaunchDarkly, etc.) for rollout metadata
  5. Cleanup Generation: Generate PRs that safely remove stale flags and collapse conditional logic
  6. Safety Validation: Verify that removal doesn't break the build or introduce logic errors
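
The six phases can be sketched as a typed pipeline that threads a shared context through each stage. This is an illustrative shape, not a real SDK — the names are ours:

```typescript
// Illustrative phase names for the pipeline described above.
type Phase =
  | 'scan'
  | 'lifecycle'
  | 'usage-mapping'
  | 'audit'
  | 'cleanup'
  | 'validation';

// Each phase enriches a shared context; later phases read what earlier ones wrote.
interface AuditContext {
  completed: Phase[];
  notes: string[];
}

type PhaseFn = (ctx: AuditContext) => AuditContext;

// Run phases strictly in order, threading the context through each one.
function runPhases(
  phases: Array<[Phase, PhaseFn]>,
  initial: AuditContext
): AuditContext {
  return phases.reduce((ctx, [name, fn]) => {
    const next = fn(ctx);
    return { ...next, completed: [...next.completed, name] };
  }, initial);
}
```

The payoff of this shape is that each phase stays independently testable, and a failed phase can halt the run before anything destructive happens.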

Let's start with the scanning phase — the foundation everything else builds on.

Phase 1: Indexing Feature Flags

The first step is understanding what flags exist and where they live. We're building a flagIndexer that walks your codebase and creates a comprehensive registry.

typescript
import * as fs from 'fs';
import * as path from 'path';
 
interface FlagReference {
  flagName: string;
  filePath: string;
  lineNumber: number;
  context: string; // The line of code containing the flag
  type: 'check' | 'definition' | 'removal'; // How the flag is used
}
 
interface FlagRegistry {
  flags: Map<string, FlagReference[]>;
  lastScanned: Date;
  totalReferences: number;
}
 
async function indexFeatureFlags(rootDir: string): Promise<FlagRegistry> {
  const registry = new Map<string, FlagReference[]>();
  const patterns = [
    /flagClient\.variation\(['"]([^'"]+)['"]/g,
    /feature\.([a-zA-Z_][a-zA-Z0-9_]*)\s*===?\s*(true|false)/g,
    /if\s*\(\s*flags\.([a-zA-Z_][a-zA-Z0-9_]*)/g,
    /LaunchDarkly\.check\(['"]([^'"]+)['"]/g,
  ];
 
  const walkDir = async (dir: string, relativePath = '') => {
    const entries = await fs.promises.readdir(dir, { withFileTypes: true });
 
    for (const entry of entries) {
      if (entry.name.startsWith('.')) continue;
      if (entry.name === 'node_modules') continue;
 
      const fullPath = path.join(dir, entry.name);
      const relPath = path.join(relativePath, entry.name);
 
      if (entry.isDirectory()) {
        await walkDir(fullPath, relPath);
      } else if (
        entry.name.endsWith('.ts') ||
        entry.name.endsWith('.tsx') ||
        entry.name.endsWith('.js')
      ) {
        await indexFile(fullPath, relPath, patterns, registry);
      }
    }
  };
 
  await walkDir(rootDir);
 
  return {
    flags: registry,
    lastScanned: new Date(),
    totalReferences: Array.from(registry.values()).reduce(
      (sum, refs) => sum + refs.length,
      0
    ),
  };
}
 
async function indexFile(
  filePath: string,
  relPath: string,
  patterns: RegExp[],
  registry: Map<string, FlagReference[]>
): Promise<void> {
  const content = await fs.promises.readFile(filePath, 'utf8');
  const lines = content.split('\n');
 
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
 
    for (const pattern of patterns) {
      let match;
      pattern.lastIndex = 0;
 
      while ((match = pattern.exec(line)) !== null) {
        const flagName = match[1];
 
        if (!registry.has(flagName)) {
          registry.set(flagName, []);
        }
 
        registry.get(flagName)!.push({
          flagName,
          filePath: relPath,
          lineNumber: i + 1,
          context: line.trim(),
          type: determineFlagType(line),
        });
      }
    }
  }
}
 
function determineFlagType(line: string): 'check' | 'definition' | 'removal' {
  if (line.includes('flagClient.variation')) return 'check';
  if (line.includes('LaunchDarkly')) return 'check';
  if (line.includes('define') || line.includes('export')) return 'definition';
  return 'check';
}

This is the backbone. We're using regex patterns to catch common feature flag patterns in TypeScript/JavaScript. The patterns cover LaunchDarkly's API, generic flag checks, and conditional logic based on flags.

Why these patterns? Because they're the most common ways we see flags in the wild. LaunchDarkly's flagClient.variation() is explicit. Generic flags.myFeature checks are common in homegrown systems. The conditional if (flags.featureName) pattern is everywhere. We're not trying to catch every possible variation — we're catching the 90% case with high precision.

The key insight here: we're building a map from flag names to all their usages. This is our foundation for everything downstream. When we later ask "is this flag safe to remove?", we can instantly see every place it's referenced.

One thing to note: this scanner intentionally skips node_modules and dotfiles. You might find flags in dependencies, but those aren't your flags to remove. Similarly, we're only looking at JavaScript/TypeScript because that's where your runtime flags live. Flags in tests might need different handling (you might want to delete test coverage for removed flags too, but that's optional).
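Before pointing the indexer at a real repository, it's worth sanity-checking the patterns against representative lines. A quick self-contained harness using the same regexes as above:

```typescript
// The same patterns the indexer uses, kept /g so exec() can walk a line.
const flagPatterns = [
  /flagClient\.variation\(['"]([^'"]+)['"]/g,
  /feature\.([a-zA-Z_][a-zA-Z0-9_]*)\s*===?\s*(true|false)/g,
  /if\s*\(\s*flags\.([a-zA-Z_][a-zA-Z0-9_]*)/g,
  /LaunchDarkly\.check\(['"]([^'"]+)['"]/g,
];

// Collect every flag name referenced on one line of code.
function flagNamesInLine(line: string): string[] {
  const names: string[] = [];
  for (const pattern of flagPatterns) {
    pattern.lastIndex = 0; // /g regexes are stateful; reset between lines
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      names.push(match[1]);
    }
  }
  return names;
}
```

Feed it lines like `if (flags.betaUI) {` or a `flagClient.variation('...')` call and confirm you get the flag names you expect, and nothing from flag-free lines.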

Phase 2: Lifecycle Analysis

Now we know where flags are, but we need to understand their status. A flag that was deployed three months ago and affects 50 files is very different from a flag that was just added and guards one new component.

typescript
interface FlagLifecycle {
  flagName: string;
  status: 'new' | 'active' | 'stale' | 'deprecated';
  firstSeen: Date;
  lastSeen: Date;
  fileCount: number;
  referenceCount: number;
  killDate?: Date; // When it should be removed
  confidence: number; // 0-100, how sure we are it's safe to remove
}
 
async function analyzeFlagLifecycles(
  registry: FlagRegistry,
  gitDir: string
): Promise<Map<string, FlagLifecycle>> {
  const lifecycles = new Map<string, FlagLifecycle>();
 
  for (const [flagName, references] of registry.flags) {
    const gitHistory = await getGitHistory(gitDir, flagName);
    const fileCount = new Set(references.map((r) => r.filePath)).size;
 
    const lifecycle: FlagLifecycle = {
      flagName,
      status: determineStatus(gitHistory, references),
      firstSeen: gitHistory.oldestCommit || new Date(),
      lastSeen: gitHistory.newestCommit || new Date(),
      fileCount,
      referenceCount: references.length,
      killDate: calculateKillDate(gitHistory),
      confidence: calculateConfidence(references, gitHistory),
    };
 
    lifecycles.set(flagName, lifecycle);
  }
 
  return lifecycles;
}
 
async function getGitHistory(
  gitDir: string,
  flagName: string
): Promise<{ oldestCommit: Date; newestCommit: Date }> {
  // In a real implementation you'd shell out to git via a promisified
  // child_process.exec, or use a library like nodegit.
  // `git log -S` (the "pickaxe") finds commits that added or removed the string.
  const result = await exec(
    `git --git-dir="${gitDir}" log -S "${flagName}" --pretty=format:"%aI"`
  );
  );
 
  const dates = result.stdout
    .split('\n')
    .filter((d) => d.trim())
    .map((d) => new Date(d));
 
  return {
    oldestCommit: dates[dates.length - 1],
    newestCommit: dates[0],
  };
}
 
function determineStatus(
  gitHistory: { oldestCommit: Date; newestCommit: Date },
  references: FlagReference[]
): 'new' | 'active' | 'stale' | 'deprecated' {
  const now = new Date();
  const daysSinceAdded =
    (now.getTime() - gitHistory.oldestCommit.getTime()) / 86400000;
  const daysSinceLastTouched =
    (now.getTime() - gitHistory.newestCommit.getTime()) / 86400000;
 
  // New if first added in the last 2 weeks
  if (daysSinceAdded < 14) return 'new';
 
  // If it hasn't been touched in git for 3+ months, probably stale
  if (daysSinceLastTouched > 90) return 'stale';
 
  // Otherwise it's still being worked on
  return 'active';
}
 
function calculateKillDate(gitHistory: {
  oldestCommit: Date;
  newestCommit: Date;
}): Date | undefined {
  // If a flag is stale (untouched in 3 months),
  // we recommend killing it 1 month from now
  const daysSinceLastSeen =
    (new Date().getTime() - gitHistory.newestCommit.getTime()) / 86400000;
 
  if (daysSinceLastSeen > 90) {
    const killDate = new Date();
    killDate.setDate(killDate.getDate() + 30);
    return killDate;
  }
 
  return undefined;
}
 
function calculateConfidence(
  references: FlagReference[],
  gitHistory: { oldestCommit: Date; newestCommit: Date }
): number {
  let confidence = 100;
 
  // If used in critical files, lower confidence
  const criticalFiles = [
    'auth',
    'payment',
    'security',
    'kernel',
    'core',
  ];
  if (
    references.some((r) =>
      criticalFiles.some((cf) => r.filePath.includes(cf))
    )
  ) {
    confidence -= 20;
  }
 
  // If used in many files, it's riskier
  if (references.length > 20) confidence -= 10;
 
  // Very old flags are riskier (changed patterns might have evolved)
  const daysSinceCreated =
    (new Date().getTime() - gitHistory.oldestCommit.getTime()) / 86400000;
  if (daysSinceCreated > 365) confidence -= 5;
 
  return Math.max(0, confidence);
}

This phase connects your git history to your flag references. We're asking: when was this flag first added? When was it last mentioned in code? Has the pattern changed? These answers tell us whether a flag is truly dead or still in use.

The confidence calculation is the real trick here. A flag that guards authentication code and is used in 30 places should be removed with extreme caution. A flag guarding an experiment that's used in one test file? That's low-risk cleanup. We're deliberately conservative because the cost of removing a flag wrong is much higher than the cost of keeping one around too long.

Notice how we use git history, not just code presence. A flag might still be in the code, but if nobody has touched it in three months, it's effectively dead. Git is our source of truth for "what was actually worked on recently." This prevents false positives from dead code that just never got cleaned up.
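With lifecycles computed, surfacing a prioritized cleanup queue is a simple filter-and-sort. A sketch against just the subset of the `FlagLifecycle` fields it needs:

```typescript
// Just the fields this helper needs from the full FlagLifecycle above.
interface LifecycleSummary {
  flagName: string;
  status: 'new' | 'active' | 'stale' | 'deprecated';
  confidence: number;
}

// Stale flags we're confident about, most confident first — the cleanup queue.
function cleanupQueue(
  lifecycles: LifecycleSummary[],
  minConfidence = 80
): LifecycleSummary[] {
  return lifecycles
    .filter((l) => l.status === 'stale' && l.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}
```

The `minConfidence` threshold of 80 is a starting point, not a tuned value — ratchet it up if early removals cause churn.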

Phase 3: Audit Integration

Here's where we get serious. We connect to your feature flag platform (LaunchDarkly, Split.io, Unleash, etc.) and pull in real rollout data. Code analysis alone isn't enough — you need to know if the flag is actually enabled in production.

typescript
interface LaunchDarklyAudit {
  flagKey: string;
  name: string;
  description: string;
  creationDate: Date;
  lastModified: Date;
  enabled: boolean;
  rolloutPercentage: number; // 0-100
  targetedUsers: string[];
  archived: boolean;
}
 
async function auditLaunchDarkly(
  apiKey: string,
  projectKey: string
): Promise<Map<string, LaunchDarklyAudit>> {
  const baseUrl = 'https://app.launchdarkly.com/api/v2';
  const audits = new Map<string, LaunchDarklyAudit>();
 
  // Fetch all flags for this project
  const response = await fetch(`${baseUrl}/projects/${projectKey}/flags`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
 
  const data = await response.json();
 
  for (const flag of data.items) {
    audits.set(flag.key, {
      flagKey: flag.key,
      name: flag.name,
      description: flag.description || '',
      creationDate: new Date(flag.creationDate),
      lastModified: new Date(flag.lastModified),
      enabled: flag.environments.production?.on || false,
      rolloutPercentage: flag.environments.production?.rollout?.variations[0]
        ?.percentage || 0,
      targetedUsers: flag.environments.production?.targets || [],
      archived: flag.archived,
    });
  }
 
  return audits;
}
 
interface AuditedFlag {
  flagName: string;
  status: string;
  confidence: number;
  inProduction: boolean;
  rolloutPercentage: number;
  archived: boolean;
  recommendation: 'remove_now' | 'remove_soon' | 'keep_monitoring' | 'keep';
  reasoning: string;
}
 
function auditedRemovalRecommendation(
  lifecycle: FlagLifecycle,
  ldAudit: LaunchDarklyAudit | undefined
): AuditedFlag {
  const inProduction = ldAudit?.enabled || false;
  const archived = ldAudit?.archived || false;
  const rolloutPercentage = ldAudit?.rolloutPercentage || 0;
 
  let recommendation: AuditedFlag['recommendation'] = 'keep';
  let reasoning = '';
 
  // Rule 1: If archived in LaunchDarkly and stale in code, remove now
  if (archived && lifecycle.status === 'stale') {
    recommendation = 'remove_now';
    reasoning = `Flag is archived in LaunchDarkly and hasn't been touched in ${Math.round(
      (new Date().getTime() - lifecycle.lastSeen.getTime()) / 86400000
    )} days.`;
  }
 
  // Rule 2: If rolled out to 0% and stale, safe to remove soon
  if (
    rolloutPercentage === 0 &&
    lifecycle.status === 'stale' &&
    lifecycle.confidence > 80
  ) {
    recommendation = 'remove_soon';
    reasoning = `Flag is disabled (0% rollout) and stale in codebase. Safe to remove within 30 days.`;
  }
 
  // Rule 3: If still in production, keep for now
  if (inProduction && lifecycle.status === 'active') {
    recommendation = 'keep';
    reasoning = `Flag is active in production (${rolloutPercentage}% rollout).`;
  }
 
  // Rule 4: If stale but in production, monitor closely
  if (inProduction && lifecycle.status === 'stale') {
    recommendation = 'keep_monitoring';
    reasoning = `Flag is enabled in production but hasn't been modified in code. Possible ghost flag — needs investigation.`;
  }
 
  return {
    flagName: lifecycle.flagName,
    status: lifecycle.status,
    confidence: lifecycle.confidence,
    inProduction,
    rolloutPercentage,
    archived,
    recommendation,
    reasoning,
  };
}

This is critical. We're pulling truth from LaunchDarkly — the source of record for what's actually running in production. A flag might look stale in code, but if it's still rolling out to 50% of users, we can't just delete it. The audit integration is what turns analysis into actionable intelligence.

Think about this: your codebase might have been updated to remove flag checks, but the flag still exists in LaunchDarkly. Or vice versa — the flag is archived but your code still references it (dead code). We're detecting both scenarios and reporting them.
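One practical wrinkle: list endpoints like this typically paginate, so a single fetch can silently miss flags. The page-walking logic is worth keeping separate from the HTTP call — the `next` field below stands in for whatever link format your platform returns (e.g. LaunchDarkly's HAL-style `_links.next.href`; verify against the API docs):

```typescript
// One page of results plus an optional pointer to the next page.
interface Page<T> {
  items: T[];
  next?: string;
}

// Follow "next" pointers until exhausted. fetchPage abstracts the HTTP call,
// so this logic is testable without a network and platform-agnostic.
async function collectAllPages<T>(
  firstUrl: string,
  fetchPage: (url: string) => Promise<Page<T>>
): Promise<T[]> {
  const items: T[] = [];
  let url: string | undefined = firstUrl;
  while (url) {
    const page = await fetchPage(url);
    items.push(...page.items);
    url = page.next; // undefined on the last page ends the loop
  }
  return items;
}
```

In `auditLaunchDarkly`, you'd pass a `fetchPage` that wraps `fetch` with the Authorization header and parses the next-page link out of the response body.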

Phase 4: Safe Removal Generation

Now we've got all the data. We know where flags are, how old they are, what their lifecycle is, and what the feature platform says. Time to generate removal PRs that are actually safe.

typescript
interface RemovalPR {
  flagName: string;
  title: string;
  description: string;
  changes: FileChange[];
  tests: string[];
  rolloutStrategy: string;
}
 
interface FileChange {
  filePath: string;
  originalCode: string;
  refactoredCode: string;
  explanation: string;
}
 
async function generateRemovalPR(
  flagName: string,
  references: FlagReference[],
  lifecycle: FlagLifecycle,
  auditedRecommendation: AuditedFlag
): Promise<RemovalPR> {
  if (auditedRecommendation.recommendation === 'keep') {
    throw new Error(`Cannot remove flag ${flagName}: still in production.`);
  }
 
  const changes = await generateChanges(flagName, references);
  const tests = await generateTests(flagName, references);
 
  const rolloutStrategy =
    auditedRecommendation.recommendation === 'remove_now'
      ? 'Immediate removal'
      : 'Staged removal over 2 sprints';
 
  return {
    flagName,
    title: `refactor: remove feature flag ${flagName}`,
    description: generatePRDescription(
      flagName,
      lifecycle,
      auditedRecommendation
    ),
    changes,
    tests,
    rolloutStrategy,
  };
}
 
async function generateChanges(
  flagName: string,
  references: FlagReference[]
): Promise<FileChange[]> {
  const changesByFile = new Map<string, FlagReference[]>();
 
  // Group references by file
  for (const ref of references) {
    if (!changesByFile.has(ref.filePath)) {
      changesByFile.set(ref.filePath, []);
    }
    changesByFile.get(ref.filePath)!.push(ref);
  }
 
  const changes: FileChange[] = [];
 
  for (const [filePath, fileRefs] of changesByFile) {
    const originalCode = await fs.promises.readFile(filePath, 'utf8');
    let refactoredCode = originalCode;
    let explanation = '';
 
    // Strategy 1: Simple flag check that returns early
    // Before: if (flags.betaUI) { return <NewUI />; }
    // After: return <NewUI />;
    // Note: [^}]+ only handles flat blocks; nested braces need real parsing
    // (see the multi-branch section below).
    const simpleCheckPattern = new RegExp(
      `if\\s*\\(\\s*flags\\.${flagName}\\s*\\)\\s*{([^}]+)}`,
      'g'
    );
    if (simpleCheckPattern.test(originalCode)) {
      refactoredCode = refactoredCode.replace(
        simpleCheckPattern,
        (match, body) => body
      );
      explanation =
        'Removed conditional wrapper — keeping the enabled code path.';
    }
 
    // Strategy 2: Ternary expression
    // Before: const UI = flags.betaUI ? <NewUI /> : <OldUI />;
    // After: const UI = <NewUI />;
    const ternaryPattern = new RegExp(
      `flags\\.${flagName}\\s*\\?\\s*([^:]+)\\s*:\\s*([^;)]+)`,
      'g'
    );
    if (ternaryPattern.test(originalCode)) {
      ternaryPattern.lastIndex = 0; // /g regexes keep state after test()
      refactoredCode = refactoredCode.replace(
        ternaryPattern,
        // Keep the first capture group — the enabled ("true") branch
        (_match, trueBranch: string) => trueBranch.trim()
      );
      explanation =
        'Unwrapped ternary — using the enabled code path exclusively.';
    }
    }
 
    // Strategy 3: LaunchDarkly variation calls
    // Before: const feature = await client.variation('feature-x', user, false);
    // After: const feature = true; (or false, depending on which was enabled)
    const ldPattern = new RegExp(
      `flagClient\\.variation\\(['"]${flagName}['"],\\s*([^,]+),\\s*(true|false)\\)`,
      'g'
    );
    if (ldPattern.test(originalCode)) {
      refactoredCode = refactoredCode.replace(ldPattern, 'true');
      explanation = 'Replaced LaunchDarkly variation call with hardcoded value.';
    }
 
    changes.push({
      filePath,
      originalCode,
      refactoredCode,
      explanation,
    });
  }
 
  return changes;
}
 
async function generateTests(
  flagName: string,
  references: FlagReference[]
): Promise<string[]> {
  return [
    `Test: ${flagName} removed from imports`,
    `Test: All conditional branches evaluated correctly`,
    `Test: No dead code paths remain`,
    `Test: Runtime behavior unchanged`,
  ];
}
 
function generatePRDescription(
  flagName: string,
  lifecycle: FlagLifecycle,
  auditedRecommendation: AuditedFlag
): string {
  return `## Removing feature flag: ${flagName}
 
**Status**: ${lifecycle.status}
**Files affected**: ${lifecycle.fileCount}
**Total references**: ${lifecycle.referenceCount}
**Confidence**: ${lifecycle.confidence}%
 
**Recommendation**: ${auditedRecommendation.recommendation}
**Reasoning**: ${auditedRecommendation.reasoning}
 
**Changes**:
- Removed all conditional logic guarded by this flag
- Kept the enabled code path (the feature is now always-on)
- No behavior changes — this flag was fully rolled out
 
**Testing**:
- All existing tests pass
- New tests added to verify flag removal
- E2E tests verify no behavioral regression
 
**Rollback**: If issues arise, revert this PR and re-enable the flag in LaunchDarkly.
 
**Closes**: [ticket number]
`;
}

This is where the magic happens. We're not just deleting code — we're understanding the patterns and generating smart refactorings. If a flag guards a simple conditional, we unwrap it. If it's a ternary, we take the enabled branch. If it's a LaunchDarkly call, we replace it with the constant value.

The key insight: different flag patterns require different removal strategies. A simple if-check can be unwrapped. A ternary needs different logic. A LaunchDarkly variation call needs to be replaced with its resolved value. We're encoding these patterns as transformation rules.
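Here's the ternary strategy isolated as a small function you can dry-run on a string — same regex shape as Strategy 2 above, using the capture group directly:

```typescript
// Collapse `flags.<name> ? A : B` down to A, the enabled branch.
function unwrapTernary(code: string, flagName: string): string {
  const pattern = new RegExp(
    `flags\\.${flagName}\\s*\\?\\s*([^:]+)\\s*:\\s*([^;)]+)`,
    'g'
  );
  // The first capture group is the true branch; keep only that.
  return code.replace(pattern, (_match, trueBranch: string) => trueBranch.trim());
}
```

Running it over `const UI = flags.betaUI ? newUI : oldUI;` with flag name `betaUI` yields `const UI = newUI;`. Keep in mind this is regex-level surgery: nested ternaries or object literals in a branch will defeat it, which is exactly why the confidence scoring exists.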

Phase 5: Putting It Together

Here's how you'd wire this all together and run the full pipeline:

typescript
async function runFlagHygieneAudit(config: {
  rootDir: string;
  gitDir: string;
  launchDarklyKey?: string;
  launchDarklyProject?: string;
}) {
  console.log('🚀 Starting feature flag hygiene audit...\n');
 
  // Phase 1: Scan and index
  console.log('📍 Phase 1: Indexing flags...');
  const registry = await indexFeatureFlags(config.rootDir);
  console.log(
    `✓ Found ${registry.flags.size} unique flags across ${registry.totalReferences} references\n`
  );
 
  // Phase 2: Analyze lifecycles
  console.log('📊 Phase 2: Analyzing lifecycles...');
  const lifecycles = await analyzeFlagLifecycles(registry, config.gitDir);
  console.log(`✓ Analyzed ${lifecycles.size} flag lifecycles\n`);
 
  // Phase 3: Audit platforms
  console.log('🔍 Phase 3: Auditing LaunchDarkly...');
  const ldAudits = config.launchDarklyKey
    ? await auditLaunchDarkly(config.launchDarklyKey, config.launchDarklyProject!)
    : new Map<string, LaunchDarklyAudit>();
  console.log(`✓ Retrieved ${ldAudits.size} flags from LaunchDarkly\n`);
 
  // Phase 4: Generate recommendations
  console.log('💡 Phase 4: Generating recommendations...');
  const recommendations = new Map<string, AuditedFlag>();
 
  for (const [flagName, lifecycle] of lifecycles) {
    const ldAudit = ldAudits.get(flagName);
    const recommendation = auditedRemovalRecommendation(lifecycle, ldAudit);
    recommendations.set(flagName, recommendation);
  }
 
  const removeNow = Array.from(recommendations.values()).filter(
    (r) => r.recommendation === 'remove_now'
  );
  const removeSoon = Array.from(recommendations.values()).filter(
    (r) => r.recommendation === 'remove_soon'
  );
 
  console.log(`✓ ${removeNow.length} flags ready for immediate removal`);
  console.log(`✓ ${removeSoon.length} flags safe to remove in 30 days\n`);
 
  // Phase 5: Generate PRs for removal_now flags
  console.log('🔧 Phase 5: Generating removal PRs...');
 
  for (const auditedFlag of removeNow) {
    const references = registry.flags.get(auditedFlag.flagName) || [];
    const lifecycle = lifecycles.get(auditedFlag.flagName)!;
 
    try {
      const pr = await generateRemovalPR(
        auditedFlag.flagName,
        references,
        lifecycle,
        auditedFlag
      );
      console.log(`✓ Generated PR: ${pr.title}`);
      // In real implementation, you'd actually create the PR on GitHub
    } catch (e) {
      console.log(`✗ Skipped ${auditedFlag.flagName}: ${(e as Error).message}`);
    }
  }
 
  console.log('\n✨ Audit complete!');
}
 
// Usage
runFlagHygieneAudit({
  rootDir: '/path/to/project',
  gitDir: '/path/to/project/.git',
  launchDarklyKey: process.env.LD_API_KEY,
  launchDarklyProject: 'your-project-key',
});

Run this, and you get a complete picture of your flag landscape. Which flags are actively being used? Which are forgotten artifacts? Which ones are safe to remove? When should you remove them?

Integrating with Claude Code

Now, here's the real power. Instead of running this once manually, you integrate it with Claude Code's agent system. Create a command like /cleanup-flags that runs this pipeline automatically, analyzes the results, and generates removal PRs without human intervention.

typescript
// In .claude/commands/cleanup-flags.ts
 
import * as path from "path";
import { createCommand } from "@claude-code/sdk";
 
createCommand({
  name: "cleanup-flags",
  description: "Audit and clean up stale feature flags",
  async execute(args: string[]) {
    const config = {
      rootDir: process.cwd(),
      gitDir: path.join(process.cwd(), ".git"),
      launchDarklyKey: process.env.LD_API_KEY,
      launchDarklyProject: process.env.LD_PROJECT,
    };
 
    await runFlagHygieneAudit(config);
  },
});

Add this to your CI/CD pipeline as a scheduled job. Every week, it runs silently. It finds flags ready for removal. It generates PRs. Your team reviews them. Boom — technical debt cleaned automatically.

Real-World Example

Let's say you have this flag check in three different places:

typescript
// payment/checkout.ts
if (flags.newPaymentFlow) {
  return <NewCheckoutFlow />;
}
 
// payment/cart.ts
const displayNewCart = flags.newPaymentFlow ? true : false;
 
// admin/billing.ts
const useNewFlow = await flagClient.variation('newPaymentFlow', user, false);

The audit finds all three. It hits LaunchDarkly and discovers the flag is archived and rolled out to 0%. It generates a PR that:

  1. Removes the conditional in checkout.ts — keeps the NewCheckoutFlow
  2. Unwraps the ternary in cart.ts
  3. Replaces the LaunchDarkly call in admin/billing.ts with a hardcoded true

One PR. One review. Flag deleted everywhere. Dependencies gone. Codebase simpler.

Pro Tips and Edge Cases

Batch removals: Don't try to remove 20 flags at once. Do 3-4 per sprint. This spreads the risk and makes reviews easier.

Always run tests: The generated PR description mentions tests, but you actually need to run them. Our code generation doesn't guarantee correctness — it needs human validation. This is where your test suite becomes your safety net.

Watch for flag interdependencies: Some flags might depend on other flags. Before removing, check if any flag checks reference another flag. Our regex patterns should catch this, but it's worth a manual scan. A flag that depends on "featureX && featureY" can't be removed until both are removed.

Communicate with the team: If a flag has been around for years, someone might have a good reason to keep it. Create a Slack alert before auto-removing. Give people time to object. Flag removal is a team activity, not an automation decision.

Version your removal PRs: Use the git commit hash in the PR description so you can trace back if something goes wrong. Document which version of the audit tool generated it.

Advanced: Handling Complex Flag Patterns

Not all flags are simple. Some guard multi-branch logic. Others depend on other flags. Some have shadow implementations. Let's handle these edge cases.

Multi-Branch Conditionals

Sometimes a flag controls multiple code paths:

typescript
// Before: complex branching
if (flags.enhancedMetrics) {
  analytics.trackEventV2(event);
  await database.logMetrics(event);
  webhooks.notify(event);
} else {
  analytics.trackEventV1(event);
}

The removal code needs to understand this is a compound statement, not just unwrap the condition:

typescript
async function handleComplexConditional(
  flagName: string,
  conditionalBlock: string,
): Promise<string> {
  // Parse the conditional to extract true and false branches
  const match = conditionalBlock.match(
    /if\s*\([^)]+\)\s*{([^]*?)}\s*(?:else\s*{([^]*?)})?/,
  );
 
  if (!match) return conditionalBlock;
 
  const [_, trueBranch, falseBranch] = match;
 
  // The flag was enabled, so we keep the true branch
  // But we need to verify it's actually the logic we want
  return trueBranch.trim();
}

The real insight: compound statements need more confidence. Our confidence scoring should drop when a flag guards 20+ lines of code versus a simple return statement. We're not uncertain about the refactoring; we're uncertain about whether unwrapping is the right call.
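One way to encode that intuition is a penalty that scales with the size of the guarded block. The thresholds below are illustrative starting points, not tuned values:

```typescript
// Larger guarded blocks carry more behavioral surface area, so removing the
// flag around them deserves less automatic confidence.
function blockSizePenalty(guardedLineCount: number): number {
  if (guardedLineCount <= 3) return 0;   // trivial guard (early return, one call)
  if (guardedLineCount <= 20) return 10; // moderate block: flag for careful review
  return 25;                             // large block: treat as a manual refactor
}

// Applied on top of the base score from calculateConfidence.
function adjustedConfidence(base: number, guardedLineCount: number): number {
  return Math.max(0, base - blockSizePenalty(guardedLineCount));
}
```

Wire this into the lifecycle analysis by counting lines between the conditional's braces during indexing, then feeding that count through `adjustedConfidence`.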

Flag Dependencies

What if one flag depends on another? if (flags.newPaymentFlow && flags.stripeIntegration)

Your removal logic needs to detect this:

typescript
function findFlagDependencies(
  flagName: string,
  references: FlagReference[]
): Set<string> {
  const dependencies = new Set<string>();
  const dependencyPattern = /&&\s*flags\.([a-zA-Z_][a-zA-Z0-9_]*)/g;
 
  for (const ref of references) {
    let match;
    dependencyPattern.lastIndex = 0;
 
    while ((match = dependencyPattern.exec(ref.context)) !== null) {
      dependencies.add(match[1]);
    }
  }
 
  return dependencies;
}
 
// In your removal check:
const deps = findFlagDependencies(flagName, references);
 
if (deps.size > 0) {
  // Can't remove this flag safely until its dependencies are removed first
  recommendation = 'keep_until_dependencies_resolved';
  reasoning = `This flag depends on: ${Array.from(deps).join(', ')}. Remove those first.`;
}

This creates an ordering constraint. You can only remove flags in dependency order. Your automation should schedule flag removals respecting these dependencies.
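One way to compute that schedule is a topological sort over the dependency graph (Kahn's algorithm, sketched below). The `Map<string, Set<string>>` input shape is an assumption based on the `findFlagDependencies` helper above:

```typescript
// Sketch: order flags so each flag's dependencies are removed before it.
// Throws if the flags form a dependency cycle.
function removalOrder(deps: Map<string, Set<string>>): string[] {
  const order: string[] = [];
  // Copy the map so we can mutate it as flags become removable
  const remaining = new Map([...deps].map(([flag, d]) => [flag, new Set(d)]));

  while (remaining.size > 0) {
    // Flags whose dependencies are all already removed (or untracked)
    const ready = [...remaining.keys()].filter((f) =>
      [...remaining.get(f)!].every((d) => !remaining.has(d)),
    );
    if (ready.length === 0) throw new Error("Dependency cycle between flags");
    for (const f of ready) {
      order.push(f);
      remaining.delete(f);
    }
  }
  return order;
}
```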

Shadow Implementations

Sometimes a flag has a "shadow implementation" — code that runs in parallel to test performance:

typescript
// Before
const result = newImplementation(data);
if (flags.compareResults) {
  const oldResult = oldImplementation(data);
  assert(result === oldResult); // shadow comparison, not an A/B test
}

Removing the flag is tempting but risky. What if the new implementation has a bug that only appears in production under certain load? The shadow comparison is your safety net.

For these cases, increase the removal confidence threshold:

typescript
function assessShadowImplementation(
  flagName: string,
  references: FlagReference[],
): number {
  // No `g` flag here: a global regex keeps `lastIndex` state between
  // `.test()` calls and would silently skip matches across references
  const shadowPatterns = [
    /oldImplementation|legacyVersion|comparison|assert/i,
    /shadow|fallback|verification/i,
  ];
 
  const shadowCount = references.filter((ref) =>
    shadowPatterns.some((p) => p.test(ref.context)),
  ).length;
 
  if (shadowCount > 0) {
    // Shadow implementations need higher confidence before removal
    return -15; // Reduce confidence score
  }
 
  return 0;
}

Testing Your Refactorings

Here's the thing: code generation is only as good as your validation. Before you merge a flag removal PR, you absolutely must verify it's safe.

typescript
async function validateRemovalPR(
  changes: FileChange[],
  testCommand: string
): Promise<ValidationResult> {
  const results = {
    passed: 0,
    failed: 0,
    details: [] as string[],
  };
 
  // Step 1: Verify syntax
  for (const change of changes) {
    try {
      parseTypeScript(change.refactoredCode);
      results.passed++;
      results.details.push(`✓ ${change.filePath}: Syntax valid`);
    } catch (e) {
      results.failed++;
      results.details.push(`✗ ${change.filePath}: ${e.message}`);
    }
  }
 
  // Step 2: Run test suite
  try {
    const testOutput = await exec(testCommand);
    if (testOutput.code === 0) {
      results.passed++;
      results.details.push('✓ All tests pass');
    } else {
      results.failed++;
      results.details.push('✗ Test suite failed');
    }
  } catch (e) {
    results.failed++;
    results.details.push(`✗ Test execution error: ${e.message}`);
  }
 
  return results;
}

The validation is your safety gate. If any test fails, the PR doesn't get created. The automation respects your test suite as the source of truth. This is critical: we're not trying to be perfect. We're trying to be safe. When in doubt, flag it for human review.
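That gate can be sketched as a small decision function. The `ValidationResult` fields mirror `validateRemovalPR` above; the 85% threshold is an assumption echoing the confidence levels used elsewhere in this article:

```typescript
// Sketch: turn validation results plus a confidence score into a decision.
// The 85% threshold is illustrative, not a recommendation.
type GateDecision = "create_pr" | "human_review" | "abort";

function gateRemoval(
  validation: { passed: number; failed: number },
  confidence: number,
): GateDecision {
  if (validation.failed > 0) return "abort"; // tests are the source of truth
  if (confidence < 85) return "human_review"; // safe but uncertain: escalate
  return "create_pr";
}
```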

Scaling to Production

Once you've built confidence in small flag removals, scale the system:

  1. Run weekly audits automatically instead of on demand
  2. Auto-generate PRs for "remove_now" flags
  3. Auto-merge safe PRs (confidence > 95%, all tests pass)
  4. Slack notifications for removals above a certain risk level
  5. Monthly reports of flag removals and lines of code saved
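The auto-merge rule in step 3 reduces to a simple predicate. The `RemovalPR` field names here are assumptions, not a real API:

```typescript
// Sketch of the auto-merge rule from step 3: merge only when the audit's
// confidence exceeds 95% and the full test suite passed.
interface RemovalPR {
  confidence: number; // 0-100, from the audit
  testsPassed: boolean; // result of validateRemovalPR
}

function canAutoMerge(pr: RemovalPR): boolean {
  return pr.confidence > 95 && pr.testsPassed;
}
```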

Here's a Slack notification template:

typescript
async function notifyFlagRemovalProgress(
  removedFlags: string[],
  codelinesSaved: number,
) {
  const message = {
    text: `🧹 Feature Flag Hygiene Update`,
    blocks: [
      {
        type: "section",
        text: {
          type: "mrkdwn",
          text: `${removedFlags.length} stale flags removed this week
${codelinesSaved} lines of code simplified
Confidence threshold maintained at >85%`,
        },
      },
    ],
  };
 
  await slack.webhook(message);
}

Real-World Challenges and Solutions

In practice, feature flag cleanup faces real obstacles. Here's how to handle them:

Challenge 1: Feature flags with business implications

Some flags aren't purely technical—they control access to critical business logic. Removing them requires stakeholder coordination, not just code analysis.

Solution: Add a business metadata layer to your audit:

typescript
interface FlagMetadata {
  flagName: string;
  businessOwner: string;
  hasCostImplications: boolean;
  affectsPayment?: boolean;
  affectsAuth?: boolean;
  requiresStakeholderApproval: boolean;
}

Before the audit recommends removal, check metadata. If the flag affects payment or auth, escalate to business owners. Build approval workflows into your pipeline.
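A minimal sketch of that escalation check, using an inline type with the same field names as the metadata interface above (spelled `requiresStakeholderApproval`):

```typescript
// Sketch: decide whether a removal needs business sign-off before the
// audit is allowed to recommend it.
function requiresEscalation(meta: {
  affectsPayment?: boolean;
  affectsAuth?: boolean;
  requiresStakeholderApproval: boolean;
}): boolean {
  return Boolean(
    meta.affectsPayment || meta.affectsAuth || meta.requiresStakeholderApproval,
  );
}
```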

Challenge 2: Flags that control A/B tests

A/B testing flags are different from feature flags. You can't just remove them—you need to finalize the winning variant and document the results.

Solution: Distinguish experiment flags from feature flags:

typescript
type FlagType = "feature" | "experiment" | "rollout" | "config";
 
function analyzeByFlagType(lifecycle: FlagLifecycle): Recommendation {
  if (lifecycle.type === "experiment") {
    // Requires statistical analysis, not just code analysis
    return {
      type: "archive_experiment",
      reasoning: "This A/B test has ended. Document winner.",
      requiredSteps: ["document_results", "implement_winner", "archive_loser"],
    };
  }
  // ... other types
}

Challenge 3: Flags in multiple repositories

Some flags span monorepos or multiple codebases. Your audit tool needs to coordinate across repos.

Solution: Create a flag registry service that aggregates flags across repositories:

typescript
interface GlobalFlagRegistry {
  flagName: string;
  repositories: { repo: string; usageCount: number }[];
  totalUsageCount: number;
  safeToRemove: boolean; // true only if safe in ALL repos
}

Query this registry before recommending removal. If a flag is used in 3 repos, removing it requires PRs in all 3.
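A sketch of the aggregation behind that registry, assuming each repo's audit reports its own usage count and safety verdict (the `RepoAudit` shape is an assumption):

```typescript
// Sketch: fold per-repo audit results into the GlobalFlagRegistry shape.
// A flag is safe to remove only when every repo reports it safe.
interface RepoAudit {
  repo: string;
  usageCount: number;
  safeToRemove: boolean;
}

function aggregateFlag(flagName: string, audits: RepoAudit[]) {
  return {
    flagName,
    repositories: audits.map(({ repo, usageCount }) => ({ repo, usageCount })),
    totalUsageCount: audits.reduce((sum, a) => sum + a.usageCount, 0),
    safeToRemove: audits.length > 0 && audits.every((a) => a.safeToRemove),
  };
}
```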

Challenge 4: Documentation and audit trails

When you remove a flag, future developers might wonder why something worked the way it did. Document the removal properly.

Solution: Create a "decommissioned flags" log:

typescript
interface DecommissionedFlag {
  flagName: string;
  removedDate: string;
  removalCommit: string;
  featureDescription: string; // what the flag controlled
  whyRemoved: string;
  equivalentCodePath: string;
  relatedIssues: string[];
}

Keep this in a file like DECOMMISSIONED_FLAGS.md. When someone searches git history for a removed flag, this file explains what happened and why.
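Here's a sketch of rendering one log entry as a markdown section to append to that file. The heading and bullet formatting are assumptions, not a fixed schema:

```typescript
// Sketch: render a decommissioned-flag record as a markdown section
// suitable for appending to DECOMMISSIONED_FLAGS.md.
function renderDecommissionEntry(flag: {
  flagName: string;
  removedDate: string;
  removalCommit: string;
  whyRemoved: string;
}): string {
  return [
    `## ${flag.flagName}`,
    `- Removed: ${flag.removedDate} (commit ${flag.removalCommit})`,
    `- Why: ${flag.whyRemoved}`,
    ``,
  ].join("\n");
}
```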

Integration with your deployment pipeline

Feature flag cleanup should be part of your regular deployment workflow, not a separate project:

yaml
# .github/workflows/flag-cleanup.yml
name: Automated Feature Flag Cleanup
 
on:
  schedule:
    - cron: "0 9 * * 1" # Every Monday at 9am
  workflow_dispatch:
 
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
 
      - name: Run flag hygiene audit
        run: |
          npx ts-node scripts/audit-flags.ts \
            --output reports/flag-audit.json \
            --create-prs-for-safe-removals
 
      - name: Create cleanup PRs
        if: success()
        run: npx ts-node scripts/create-flag-removal-prs.ts reports/flag-audit.json
 
      - name: Notify team
        if: always()
        run: |
          curl -X POST $SLACK_WEBHOOK \
            -H 'Content-type: application/json' \
            -d '{
              "text": "Feature flag audit complete. Check created PRs."
            }'

This transforms flag cleanup from manual effort into automated, predictable work.

The Organizational Gravity of Feature Flags

Feature flags have a curious property: they become harder to remove the longer they exist. Not technically—removing code is straightforward. But socially and organizationally, they develop gravity. A flag that shipped six months ago has become institutional knowledge. People built processes around it. Someone's on-call playbook references it. Maybe you're using it for A/B testing and the product team is attached to the results. Removing it now isn't just a code change—it's a coordination problem across multiple teams.

This organizational gravity is why many flags stay in the codebase for years, long after they should be removed. The technical cost of removal becomes invisible against the organizational complexity. Nobody wants to be responsible for "hey, let's remove this flag we've been using for production testing." What if product needs it again? What if something breaks? What if a customer is unknowingly dependent on this flag?

The organizations that successfully manage flags are ones that have systematized the social coordination around removal. They have decision procedures. "Any flag inactive for 3 months gets discussed in the next tech review." They have ownership models. "The team that created the flag is responsible for removing it or documenting why it stays." They have communication practices. "Removing this flag? Announce it in #tech-decisions so nobody's surprised."

What's particularly clever about using Claude Code for this: it removes the friction from the removal process itself, which changes the equation. If removal takes two minutes (Claude generates the PR, you review and merge), then the social coordination overhead becomes the dominant cost. That overhead might still exist, but at least the technical part doesn't add to it. You're negotiating about whether to remove the flag, not negotiating about whether the removal is safe. That's progress.

Another aspect: flags exist for reasons. Sometimes those reasons are still valid. A flag guarding an experimental database schema might need to stay until you've fully migrated all data. A flag managing a gradual rollout might need to stay until you've hit 100% adoption. Removing flags without understanding their purpose is dangerous. The audit system in this article helps with that—by tracking why each flag exists and what it guards, you're preserving the institutional knowledge about why it's there.

Measuring success

Track these metrics to understand the impact of flag cleanup:

  • Dead code removed: Lines of conditional logic simplified
  • Test coverage: Did we add more tests or remove unnecessary ones?
  • Deployment frequency: Can you deploy faster without flag checks?
  • Code review time: Shorter diffs mean faster reviews
  • Production bugs: Do fewer bugs slip through with clearer logic paths?

Document these metrics in your team wiki so everyone understands the value.
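If you want those metrics in a wiki-friendly form, a tiny formatter like this works; the column choices are illustrative assumptions:

```typescript
// Sketch: render one week of flag-cleanup metrics as a markdown table row.
interface WeeklyFlagMetrics {
  flagsRemoved: number;
  linesRemoved: number;
  avgReviewMinutes: number;
}

function formatMetricsRow(week: string, m: WeeklyFlagMetrics): string {
  return `| ${week} | ${m.flagsRemoved} | ${m.linesRemoved} | ${m.avgReviewMinutes} |`;
}
```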

Understanding the Business Cost of Flag Debt

The cleanup story often starts with a quiet realization during a code review. Someone notices a conditional for a feature that shipped 18 months ago. It's still being checked on every relevant code path. Every request that passes through authentication checks whether FEATURE_NEW_AUTH_SYSTEM is enabled. It's been 100% enabled since month 6, but the code path remains. That's not just a line of code—that's a recurring business cost.

Think about this at scale. If you have 50 flags in a production system, and each one adds even 2-3 extra conditionals per request, that's 100-150 checks per request; at 10,000 requests per second, you're doing 1-1.5 million unnecessary conditional checks every single second. Not all flags pay the same cost—some guard entire system components, others guard a single UI element—but cumulatively, they're not free. They tax your CPU, they tax your memory for branch prediction, they tax your developer focus.

The story gets worse when you consider the compliance and security angle. A security audit comes in and spots a flag that was supposed to gate admin access while you rolled out new authorization logic. It's been hardcoded to true for three months, but nobody actually removed the false branch. The false branch contains old, deprecated authentication code. An auditor sees it and flags a compliance issue: "You have dead code paths that could be used for privilege escalation." That's not theoretical. That's a finding you have to remediate. That's a meeting with your security team. That's explaining why you have deprecated access control code still in the repository.

And here's the sneaky cost: every time you want to refactor something, those flags become anchors. You're planning to upgrade your authentication library. You need to update it in the main code path. But that flag-wrapped old auth code also references the library. So you update both. Or you decide not to upgrade because it's too much work. Either way, the dead flag is creating friction for unrelated work.

This is why automated flag cleanup isn't a nice-to-have feature optimization. It's a business productivity tool. It's about clearing obstacles so your team can move faster, be more secure, and reduce cognitive load.

Integration with Development Workflows

The real power of this system emerges when you integrate it into your regular development rhythm. Most teams run flag audits quarterly or annually, which means they accumulate 6-12 months of debt before doing cleanup. But what if audits ran weekly? What if every Monday morning, your team saw a list of flags that are safe to remove?

Here's what a mature flag cleanup workflow looks like in practice. Monday morning, the audit runs automatically as part of your CI pipeline. It generates a report. The report is posted to Slack with a summary: "13 flags ready for removal, estimated 4,200 lines of code to clean, average confidence 92%." Your tech lead glances at it, picks the 3 safest ones, and labels issues with "flag-removal." Your junior developer picks up one of those issues, the automation generates a PR, they review it for 5 minutes, it merges. One flag removed. One PR. Wednesday they take another one. By month's end, you've cleaned 8-10 flags without it ever feeling like a major project.

Compare that to the status quo: someone spends a day manually identifying flags, reviewing them, making changes, writing tests. It's a whole sprint story. So it gets deprioritized. It never happens. Flags accumulate forever. That's the trap most teams are in.

The system we've described transforms flag cleanup from "big project that keeps getting delayed" to "small, repeatable, automated maintenance task that happens continuously." That's a fundamentally different relationship to technical debt.

Wrapping up

Feature flag cleanup is one of those tasks that feels optional until you're drowning in stale conditionals. With Claude Code automating the analysis, PR generation, and safety validation, you can finally close that gap between "flag shipped" and "flag removed."

The system we've built gives you complete visibility into your flag landscape, automated safety checks before removal, integration with your platform (LaunchDarkly, Split, etc.), smart refactoring that understands different patterns, and batch processing for cleanup at scale. You've learned how to categorize flags by lifecycle stage, understand their business implications, identify dependencies between flags, and remove them confidently with comprehensive testing.

But the real insight is this: flag management isn't about the code. It's about creating a system where technical debt doesn't compound. Where cleanup is a regular, predictable process instead of an overwhelming project. Where a junior developer can grab a flag removal task and complete it in an hour because the automation handles the dangerous parts.

Start with a manual audit of your largest 5 flags. Build confidence. Run the audit pipeline locally and see what it finds. Get comfortable with the removal process. Then deploy the full pipeline and let Claude Code handle the cleanup automatically. Integrate it into your weekly metrics. Make it part of your team's definition of done: "Feature shipped, and if the flag stays >6 months, there's a quarterly audit and removal plan."

Your future self will thank you when you're not drowning in stale conditional logic. Your security team will thank you when code audits reveal fewer deprecated code paths. Your deployment pipeline will thank you for the simpler logic to trace. And your developers will thank you for giving them a codebase that's continuously optimized instead of perpetually cluttered.

-iNet
