Why Domain Glossaries Matter (And Why Most Teams Skip Them)

A domain glossary is basically a business-to-code translation dictionary. Here's why it's underrated:

Onboarding gets faster. New team members can actually understand what you mean by "tenant," "order," or "settlement." Without this, they're reading code and trying to reverse-engineer what the business logic even is.

Technical and non-technical teams speak the same language. Your product manager no longer says "workspace" while your engineer says "org context" for the same thing. You're aligned.

Refactoring gets easier. When you rename something, you know everywhere it should be renamed. You catch duplicate concepts hiding under different names.

Documentation stays useful. Your glossary becomes the spine of your architecture docs. Every diagram, every spec, every PR description references the same terms.

Debugging becomes faster. Engineers don't spend thirty minutes figuring out that the "account" in the log is actually the "organization" in the code. The glossary tells them immediately.

Most teams skip this because manually curating a glossary from a codebase of 100k+ lines is tedious. You'd be grepping for patterns, cross-referencing class definitions, tracking comments, and keeping it all in sync. Claude Code changes that equation. It can read your code intelligently, understand relationships, and flag conflicts without you writing a single extraction script.

The business case is real, too. Every hour spent onboarding is multiplied across your engineering team. If 20 engineers each spend an average of 40 hours getting up to speed, that's 800 engineer-hours. A domain glossary that cuts onboarding time by 30% recovers roughly 240 of those hours, which is why it can pay for itself within weeks. And that's not counting the debugging time saved, the refactoring mistakes avoided, or the architectural clarity that comes from shared vocabulary. When you quantify it, domain glossaries aren't nice-to-haves; they're strategic investments in team velocity.

The Three Parts of This Workflow

We're breaking this into three phases:

Extraction: Pull all candidate terms from code comments, class names, function names, and documentation.
Resolution: Find synonyms, merge duplicate concepts, surface conflicts.
Compilation: Build a structured glossary with cross-references to code locations.

Each phase is designed to be run iteratively. Your glossary grows as you discover new patterns. Think of extraction as casting a wide net—you're looking for signal without worrying much about noise. Resolution is where you collaborate with your team to decide what's actually meaningful. Compilation is where you turn raw data into a useful artifact that the whole team can consume.

The iterative approach matters because it means you never need a big multi-week project to "build the glossary." You can start small, extract 20 terms, resolve them, compile, get feedback, and iterate again the next sprint. This way, you get value immediately while continuously improving.

Phase 1: Extracting Terms from Code and Comments

Your codebase is already full of domain terminology. It's hiding in:

Class and type names: PaymentProcessor, CustomerOnboardingService, BillingCycle
Comments and docstrings: The intent behind code
Variable names: Reveal what data flows where
API endpoints and field names: Schema definitions
Test file names: Often describe behavior in domain language
Configuration keys: Settings that map to business concepts

The trick is extracting these signals without drowning in noise. Let's build a Claude Code query that finds substantive terms.

Setting Up Your Extraction Query

Here's how you'd start:

yaml

extraction_queries:
  - name: "class_names"
    pattern: "class\\s+([A-Z][a-zA-Z0-9]+)"
    file_types: [".java", ".ts", ".py", ".cs"]
    context_lines: 3
    description: "Extract class/type definitions"
 
  - name: "function_names"
    pattern: "def\\s+([A-Za-z_]\\w*)\\s*\\(|function\\s+([A-Za-z_$][\\w$]*)|func\\s+(?:\\([^)]*\\)\\s*)?([A-Za-z_]\\w*)\\s*\\(|\\s+(?!(?:if|for|while|switch|catch)\\b)([A-Za-z_]\\w*)\\s*\\([^)]*\\)\\s*\\{"
    file_types: [".py", ".js", ".ts", ".go"]
    context_lines: 2
    description: "Extract function definitions"
 
  - name: "docstring_terms"
    pattern: "(?:\"\"\"|\\'\\'\\'|//|#|/\\*)\\s*((?:[A-Z][a-z]+\\s+){1,3}[A-Z][a-z]+|(?:manages|handles|processes|tracks|validates)\\s+([a-zA-Z\\s]+))"
    file_types: [".py", ".js", ".ts", ".java"]
    context_lines: 5
    description: "Extract documented concepts"
 
  - name: "enum_values"
    pattern: "enum\\s+([A-Z][a-zA-Z0-9]+)|case\\s+([A-Z_]+)|\\\"([A-Z_]+)\\\"\\s*:"
    file_types: [".ts", ".java", ".py"]
    context_lines: 1
    description: "Extract enumerated states and types"
 
  - name: "config_keys"
    pattern: "^\\s*\"?([A-Za-z_][A-Za-z0-9_]*)\"?\\s*[:=]"
    file_types: [".yaml", ".yml", ".json", ".env", ".toml"]
    context_lines: 2
    description: "Extract configuration keys"

What's happening here? We're defining patterns that catch domain language in different contexts. Each pattern targets a specific signal. Class names are usually capitalized nouns. Docstrings often describe what something does. Config keys use snake_case and reveal business concepts.
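
To make the behavior of one pattern concrete, here is a minimal Python sketch that applies the class_names expression to an in-memory source string and records each match with its line number. The function name `extract_class_names` and the sample source are illustrative assumptions, not part of any Claude Code API.

```python
import re

# Same expression as the class_names pattern above
# (single backslashes, because this is a Python raw string rather than YAML).
CLASS_PATTERN = re.compile(r"class\s+([A-Z][a-zA-Z0-9]+)")


def extract_class_names(source: str, filename: str) -> list[dict]:
    """Return one record per class definition found in `source`."""
    records = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = CLASS_PATTERN.search(line)
        if match:
            records.append({
                "term": match.group(1),
                "file": filename,
                "line": line_number,
                "context": line.strip(),
            })
    return records


# Hypothetical sample input
sample = """\
public class PaymentProcessor extends BasePayment {
    // handles retries
}
class BillingCycle {
}
"""

for record in extract_class_names(sample, "src/payment/PaymentProcessor.java"):
    print(record["line"], record["term"])
```

Each record keeps the file, line, and surrounding context, which is the information the later phases rely on.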

Running the Extraction with Claude Code

You'd use Claude Code's search capability to run these patterns across your codebase:

bash

# Find all class definitions with surrounding context
# (allows leading modifiers such as "public" or "export")
cd /path/to/your/codebase
grep -rE "^\s*((public|export|abstract|final)\s+)*class\s" --include="*.java" --include="*.ts" --include="*.py" -B2 -A5 | head -500
 
# Extract docstrings with domain language
grep -r "def\s" --include="*.py" -A3 | grep -E "(def|\"\"\")" | head -300
 
# Find enum definitions
grep -r "enum\s" --include="*.ts" -A10 | head -200

But here's the power move: Claude Code can actually understand what these terms mean. It doesn't just list PaymentProcessor and CustomerOnboardingService—it reads the code, understands the relationships, and starts building a mental model of your domain.

Let's say your extraction finds these class names in a payments service:

PaymentProcessor
PaymentGateway
BillingCycle
Invoice
Receipt
Refund
PendingTransaction
CompletedTransaction
FailedPaymentRetry

Claude Code can look at the code inside each class and understand: "Oh, a PaymentProcessor actually uses a PaymentGateway. And BillingCycle generates Invoice objects. And when a PendingTransaction fails, it becomes a FailedPaymentRetry."

This is where the real signal-to-noise filtering happens. You're not just collecting names; you're understanding relationships. The semantic understanding that Claude brings means you get structured insights rather than flat lists. It can recognize that sendPayment(), processPayment(), and executePayment() likely refer to the same concept, even though they have different names. Pure regex extraction can't make that call. Claude Code's analysis can propose these merges for you, and you confirm them during the resolution phase.
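
For comparison, here is a rough lexical baseline for the same grouping, sketched in Python. It only splits identifiers into words and drops a hand-picked verb list (`ACTION_VERBS` is a hypothetical starting set), so treat its output as candidates for review rather than a substitute for reading the code.

```python
import re
from collections import defaultdict

# Hypothetical list of generic action verbs to ignore; tune it for your codebase.
ACTION_VERBS = {"send", "process", "execute", "handle", "create", "get"}


def split_identifier(name: str) -> list[str]:
    """Split camelCase, PascalCase, or snake_case into lowercase words."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [word.lower() for word in words]


def group_by_concept(names: list[str]) -> dict[str, list[str]]:
    """Group identifiers by the words left after removing action verbs."""
    groups = defaultdict(list)
    for name in names:
        nouns = [w for w in split_identifier(name) if w not in ACTION_VERBS]
        groups[" ".join(nouns)].append(name)
    return dict(groups)


names = ["sendPayment", "processPayment", "executePayment", "createInvoice"]
print(group_by_concept(names))
# {'payment': ['sendPayment', 'processPayment', 'executePayment'], 'invoice': ['createInvoice']}
```

A heuristic like this cannot tell whether three similarly named functions really do the same thing, which is where reading the implementations matters.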

Building Your Extraction Output

The extraction phase should produce a structured output like this:

yaml

# extracted_terms.yaml
candidates:
  domain_entities:
    - term: "Payment"
      sources:
        - file: "src/payment/PaymentProcessor.java"
          line: 12
          context: "class PaymentProcessor extends BasePayment"
        - file: "src/domain/Payment.ts"
          line: 1
          context: "interface Payment {"
        - file: "docs/architecture.md"
          line: 45
          context: "Payments are processed through the gateway"
      frequency: 8
      variations: ["payment", "Payment", "PAYMENT"]
 
    - term: "Invoice"
      sources:
        - file: "src/billing/Invoice.java"
          line: 23
          context: "public class Invoice implements Serializable"
        - file: "src/billing/BillingService.ts"
          line: 104
          context: "const invoice = await generateInvoice(...)"
      frequency: 12
      variations: ["invoice", "Invoice", "invoices"]
 
  services:
    - term: "BillingService"
      sources:
        - file: "src/services/BillingService.java"
          line: 1
          context: "class BillingService {"
      frequency: 3
      depends_on: ["Invoice", "Payment"]
 
  enums:
    - term: "PaymentStatus"
      values:
        - "PENDING"
        - "COMPLETED"
        - "FAILED"
        - "REFUNDED"
      sources:
        - file: "src/domain/PaymentStatus.java"
          line: 15
          context: "enum PaymentStatus"
 
  configuration:
    - term: "billing_cycle_length_days"
      sources:
        - file: "config/production.yaml"
          line: 42
          context: "billing_cycle_length_days: 30"

This output captures what was found, where it was found, and how often. The frequency and variations help you spot duplicates in the next phase. The source information becomes invaluable when you need to understand how a term is actually used in context, which you'll often need to do during the resolution phase.
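
One way to derive the frequency and variations fields is to merge raw matches that differ only by case. The sketch below assumes each raw match is a dict with `term`, `file`, and `line` keys; that shape and the function name `aggregate_candidates` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def aggregate_candidates(matches: list[dict]) -> list[dict]:
    """Merge raw matches whose terms differ only by case into one candidate."""
    grouped = defaultdict(list)
    for match in matches:
        grouped[match["term"].lower()].append(match)

    candidates = []
    for group in grouped.values():
        surface_forms = Counter(m["term"] for m in group)
        candidates.append({
            # Use the most common spelling as the display term
            "term": surface_forms.most_common(1)[0][0],
            "frequency": len(group),
            "variations": sorted(surface_forms),
            "sources": [{"file": m["file"], "line": m["line"]} for m in group],
        })
    # Most frequent terms first, so likely core entities surface at the top
    return sorted(candidates, key=lambda c: c["frequency"], reverse=True)


# Hypothetical raw matches
raw_matches = [
    {"term": "Payment", "file": "src/domain/Payment.ts", "line": 1},
    {"term": "payment", "file": "docs/architecture.md", "line": 45},
    {"term": "Payment", "file": "src/payment/PaymentProcessor.java", "line": 12},
    {"term": "Invoice", "file": "src/billing/Invoice.java", "line": 23},
]

for candidate in aggregate_candidates(raw_matches):
    print(candidate["term"], candidate["frequency"], candidate["variations"])
```

Case folding alone will not merge plurals such as "invoices"; those still need a stemming step or a human pass.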

Real-World Extraction Example

Here's what this can look like on a realistic codebase. Suppose you have a SaaS billing system. Claude Code extracts:

From class definitions:

Organization, OrganizationAccount, Org, Account, Workspace, Tenant

From database schema:

organizations table, accounts table, org_context table

From configuration files:

org_id, organization_id, account_id, tenant_id

From comments:

"The org represents a single customer's account"
"Organizations can have multiple billing accounts"
"Tenants are isolated workspaces"

Now you see the problem clearly: at least five different terms for essentially the same concept, used inconsistently across layers. That's the kind of entropy a glossary resolves. And notice how different layers prefer different names. This isn't malice or incompetence; it's natural divergence. The database team thinks in relational terms (organizations table). The frontend team thinks in user experience terms (workspaces). The configuration team uses abbreviations (org_id). A good glossary acknowledges all these perspectives while establishing a shared vocabulary.

Phase 2: Resolving Conflicts and Merging Synonyms

Here's where your codebase probably has entropy. Different teams or different eras of the code use different names for the same thing. This is completely normal and expected. It happens in every mature codebase. The question is whether you manage this entropy consciously or let it grow unchecked.

Let's say your extraction found:

Workspace (in frontend code)
Organization (in backend code)
Tenant (in database schema)
Account (in billing module)
Org (in configuration)

They all refer to the same underlying thing: a logical grouping of users and resources under a single administrative boundary. But each team emphasizes a different aspect of it. The frontend team emphasizes the user's experience (a workspace is where they work). The backend team emphasizes the organizational structure (an organization is the administrative unit). The database team emphasizes multi-tenancy (a tenant is an isolated instance). The billing team emphasizes the billing unit (an account is what gets billed).

Finding Synonyms and Aliases

Claude Code can help you spot these relationships by:

Looking at data flow: Which classes pass Workspace objects to which classes? If Workspace flows into code that expects Organization, they're probably the same.
Reading tests: Test names often describe what concepts interact. test_workspace_can_invite_members and test_organization_billing_setup both reference the same entity from different angles.
Checking database schema: Your ORM models or database migrations reveal the true schema. If Workspace maps to the organizations table, they're the same.
Analyzing comments: Comments often clarify "this is the billing term for workspace" or "called tenant internally."
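
The schema check in particular is easy to mechanize. As a small sketch, assume you have already collected a mapping from model names to the tables they persist to (the `MODEL_TABLES` contents below are hypothetical); any table backed by more than one model name is a synonym candidate.

```python
from collections import defaultdict

# Hypothetical mapping from model/type name to its backing table,
# for example gathered from ORM annotations or migrations.
MODEL_TABLES = {
    "Workspace": "organizations",
    "Organization": "organizations",
    "Tenant": "organizations",
    "Invoice": "invoices",
}


def synonym_candidates(model_tables: dict[str, str]) -> dict[str, list[str]]:
    """Return tables that are backed by more than one model name."""
    by_table = defaultdict(list)
    for model, table in model_tables.items():
        by_table[table].append(model)
    return {
        table: sorted(models)
        for table, models in by_table.items()
        if len(models) > 1
    }


print(synonym_candidates(MODEL_TABLES))
# {'organizations': ['Organization', 'Tenant', 'Workspace']}
```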

Here's what a conflict resolution document looks like:

yaml

# resolved_terms.yaml
canonical_terms:
  - id: "workspace_org"
    canonical_name: "Organization"
    business_term: "Workspace"
    technical_name: "Organization"
    database_name: "organizations"
    aliases:
      - "Workspace" # used in frontend
      - "Tenant" # used in multi-tenant contexts
      - "Account" # used in billing
      - "Org" # used in configs
    definition: |
      A logical grouping of users and resources under single administrative
      control. Core identity in the system. Responsible for billing,
      permissions, and resource allocation.
    locations:
      frontend: "src/types/Workspace.ts"
      backend: "src/domain/Organization.java"
      database: "schema/organizations.sql"
      config: "config.yaml"
    relationships:
      - type: "contains"
        target: "User"
        cardinality: "1:N"
      - type: "has"
        target: "BillingProfile"
        cardinality: "1:1"
      - type: "generates"
        target: "Invoice"
        cardinality: "1:N"
    tags: ["core-entity", "multi-tenant", "billing-unit"]
 
  - id: "payment_transaction"
    canonical_name: "PaymentTransaction"
    business_term: "Payment"
    technical_name: "PaymentTransaction"
    database_name: "payment_transactions"
    aliases:
      - "Payment"
      - "PendingPayment" # when status == PENDING
      - "FailedPayment" # when status == FAILED
    definition: |
      A record of money movement. Can be pending, completed, failed, or refunded.
      Every invoice generates payment transactions when charged.
    locations:
      backend: "src/domain/PaymentTransaction.java"
      database: "schema/payments.sql"
      api: "src/routes/payments.ts"
    enums:
      status:
        - "PENDING"
        - "COMPLETED"
        - "FAILED"
        - "REFUNDED"
    tags: ["financial-entity", "time-sensitive", "audit-critical"]
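
Once terms are resolved, a lookup from any alias to its canonical name becomes useful for tooling. The sketch below assumes the resolved entries have been loaded into Python dicts with `canonical_name` and `aliases` keys, as in the document above; `build_alias_index` is an illustrative helper name. It also reports aliases claimed by two different canonical terms, which usually means a resolution decision is still open.

```python
def build_alias_index(canonical_terms: list[dict]) -> tuple[dict[str, str], list[str]]:
    """Map every lowercase name or alias to its canonical name; report collisions."""
    index: dict[str, str] = {}
    collisions: list[str] = []
    for entry in canonical_terms:
        canonical = entry["canonical_name"]
        for name in [canonical, *entry.get("aliases", [])]:
            owner = index.setdefault(name.lower(), canonical)
            if owner != canonical:
                collisions.append(f"{name!r} claimed by {owner} and {canonical}")
    return index, collisions


terms = [
    {"canonical_name": "Organization",
     "aliases": ["Workspace", "Tenant", "Account", "Org"]},
    {"canonical_name": "PaymentTransaction",
     "aliases": ["Payment", "PendingPayment", "FailedPayment"]},
]

index, collisions = build_alias_index(terms)
print(index["tenant"])  # Organization
print(collisions)       # []
```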

Handling Legitimate Duplicates

Not every duplicate is a mistake. Sometimes you want different names. This is crucial to understand, because the goal isn't perfect uniformity—it's conscious consistency.

For example:

Customer in the business sense (what your marketing says)
User in the technical sense (what the auth system says)
Account in the billing sense

These aren't synonyms—they're different facets of the same person. Each term emphasizes a different aspect. A customer is a business relationship. A user is an authentication identity. An account is a billing entity. Conflating them would actually make your system more confusing, not less. The glossary should capture this:

yaml

- id: "business_user_distinction"
  term: "Customer vs. User vs. Account"
  explanation: |
    While all three refer to the same human in the system, they're used
    in different contexts:
 
    - Customer: How we refer to them in marketing/business logic
    - User: How the auth system refers to them
    - Account: How the billing system refers to them
 
    When implementing features, use the term appropriate to your module:
    - Backend API: use "User"
    - Billing reports: use "Account"
    - Customer-facing docs: use "Customer"
  related_terms:
    - "User"
    - "Customer"
    - "Account"
  code_locations:
    - file: "src/auth/User.ts"
      type: "interface"
    - file: "src/billing/Account.java"
      type: "class"
    - file: "docs/glossary.md"
      type: "documentation"

This prevents false merging while still making the relationships explicit. Your team understands that yes, these are different terms for related concepts, and yes, that's intentional. The distinction is documented, which is infinitely better than hidden entropy.

Building a Conflict Resolution Workflow

In practice, you'll want a team discussion before finalizing resolutions. Create a file that surfaces conflicts in a format designed for discussion:

markdown

# Glossary Conflicts Identified
 
## Conflict Group 1: Organization/Workspace/Tenant/Account/Org
 
**Terms found**: 5 variations
 
- Organization (50 occurrences in backend)
- Workspace (32 occurrences in frontend)
- Tenant (28 occurrences in schema)
- Account (15 occurrences in billing)
- Org (8 occurrences in config)
 
**Recommended canonical name**: Organization
 
**Reasoning**:
 
- Most frequent in backend code
- Database table is "organizations"
- Most technically precise (captures the administrative grouping concept)
 
**Question for team**: Should we deprecate Workspace in the frontend, or keep it as an alias?
 
---
 
## Conflict Group 2: Payment/Transaction/PaymentTransaction
 
**Terms found**: 3 variations
 
- Payment (40 occurrences)
- PaymentTransaction (22 occurrences)
- Transaction (15 occurrences)
 
**Recommended canonical name**: PaymentTransaction
 
**Reasoning**:
 
- Most precise (distinguishes from other types of transactions)
- Database table is "payment_transactions"
- Clearer intent in code reviews
 
---

Team discussions around this file help resolve ambiguity faster than trying to decide in code reviews. And notice how this document presents data, not ultimatums. The frequency information gives context. The reasoning explains the recommendation but invites questions. This format encourages discussion instead of defensiveness.
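
The data portion of each conflict group can be generated from occurrence counts. This is a minimal sketch with a hypothetical `render_conflict_group` helper; it defaults the recommendation to the most frequent term, which is only a starting point for the discussion.

```python
def render_conflict_group(number: int, counts: dict[str, int]) -> str:
    """Render one conflict group; the most frequent term is the default recommendation."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    recommended = ranked[0][0]
    title = "/".join(term for term, _ in ranked)
    lines = [
        f"## Conflict Group {number}: {title}",
        "",
        f"**Terms found**: {len(ranked)} variations",
        "",
    ]
    lines += [f"- {term} ({count} occurrences)" for term, count in ranked]
    lines += ["", f"**Recommended canonical name**: {recommended}"]
    return "\n".join(lines)


print(render_conflict_group(1, {"Workspace": 32, "Organization": 50, "Org": 8}))
```

Frequency is not always the right tiebreaker: in Conflict Group 2 above, PaymentTransaction is recommended over the more frequent Payment because it is more precise. The reasoning section is still best written by a person.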

Phase 3: Compiling Your Glossary

Now we take all the extracted and resolved terms and build a structured glossary that's actually useful. This should be more than just a list—it should be navigable, linkable, and reference the actual code.

The Master Glossary Format

Here's a comprehensive structure:

yaml

# glossary.yaml
version: "1.0"
last_updated: "2026-03-16"
maintainers:
  - name: "Platform Team"
    slack: "#platform-architecture"
  - name: "Data Team"
    slack: "#data-engineering"
 
glossary:
  # Core Entities
  organizations:
    - term: "Organization"
      id: "org_001"
      category: "core-entity"
      definition: |
        A logical grouping of users and resources. The unit of multi-tenancy
        in the system. Responsible for billing, permissions, and subscriptions.
 
      business_term: "Workspace"
      technical_name: "Organization"
      database_table: "organizations"
      api_resource: "/api/v1/organizations"
 
      aliases:
        - term: "Workspace"
          context: "used in frontend and customer conversations"
          deprecated: false
 
        - term: "Tenant"
          context: "used when discussing multi-tenancy architecture"
          deprecated: false
 
        - term: "Account"
          context: "used in billing context, legacy, avoid in new code"
          deprecated: true
 
      attributes:
        - name: "id"
          type: "UUID"
          description: "Unique identifier"
 
        - name: "name"
          type: "string"
          description: "Display name, user-facing"
 
        - name: "billing_email"
          type: "string"
          description: "Where invoices are sent"
 
        - name: "status"
          type: "enum"
          values: ["ACTIVE", "SUSPENDED", "DELETED"]
 
      relationships:
        - target: "User"
          type: "contains"
          cardinality: "1:N"
          description: "An org has many users with different roles"
          code_reference: "src/domain/Organization.java:45"
 
        - target: "Subscription"
          type: "has"
          cardinality: "1:N"
          description: "An org can have multiple active subscriptions"
          code_reference: "src/billing/SubscriptionService.java:78"
 
        - target: "Invoice"
          type: "generates"
          cardinality: "1:N"
          description: "Billing cycles generate invoices"
          code_reference: "src/billing/InvoiceGenerator.java:22"
 
      code_locations:
        - file: "src/domain/Organization.java"
          type: "class"
          line: 12
 
        - file: "src/types/Organization.ts"
          type: "interface"
          line: 5
 
        - file: "schema/migrations/001_organizations.sql"
          type: "table"
          line: 1
 
        - file: "docs/architecture.md"
          type: "documentation"
          section: "Core Concepts"
 
      related_terms:
        - "User"
        - "Subscription"
        - "Invoice"
        - "BillingProfile"
 
      tags:
        - "core-entity"
        - "multi-tenant"
        - "billing-unit"
        - "audit-critical"
 
      examples:
        - context: "Creating an organization"
          code: |
            const org = await createOrganization({
              name: "Acme Corp",
              billing_email: "billing@acme.com"
            });
 
        - context: "Querying users in organization"
          code: |
            const users = await getUsersByOrganization(org.id);
 
      notes: |
        - The id field is immutable
        - Organizations can be soft-deleted but not hard-deleted
        - Changing org name does not affect user permissions
        - Billing email is separate from member emails
 
  # Financial Entities
  financial:
    - term: "Invoice"
      id: "inv_001"
      category: "financial-entity"
      definition: |
        A billing document sent to the organization. Generated at the end
        of each billing cycle. Contains line items for each subscription
        during that period.
 
      business_term: "Invoice"
      technical_name: "Invoice"
      database_table: "invoices"
      api_resource: "/api/v1/invoices"
 
      attributes:
        - name: "id"
          type: "UUID"
          description: "Unique invoice ID"
 
        - name: "organization_id"
          type: "UUID"
          description: "Which org this invoice is for"
 
        - name: "invoice_number"
          type: "string"
          description: "Human-readable number (e.g., INV-2026-001)"
 
        - name: "billing_period_start"
          type: "date"
 
        - name: "billing_period_end"
          type: "date"
 
        - name: "total_amount"
          type: "decimal(12,2)"
 
        - name: "status"
          type: "enum"
          values: ["DRAFT", "SENT", "PAID", "OVERDUE", "CANCELLED"]
 
        - name: "due_date"
          type: "date"
 
      relationships:
        - target: "Organization"
          type: "belongs_to"
          cardinality: "N:1"
 
        - target: "LineItem"
          type: "contains"
          cardinality: "1:N"
 
        - target: "Payment"
          type: "has"
          cardinality: "1:N"
 
      code_locations:
        - file: "src/billing/Invoice.java"
          type: "class"
          line: 25
 
        - file: "src/billing/InvoiceGenerator.ts"
          type: "service"
          line: 45
 
        - file: "schema/invoices.sql"
          type: "table"
 
      tags:
        - "financial-entity"
        - "audit-critical"
        - "customer-facing"
 
      notes: |
        - Invoices are immutable once sent
        - Total is calculated from line items
        - Status transitions: DRAFT -> SENT -> [PAID | OVERDUE] -> [CANCELLED]
 
  # Enums and Value Objects
  enums:
    - term: "PaymentStatus"
      id: "enum_payment_status"
      category: "value-object"
      definition: |
        Represents the current state of a payment attempt. Payments can be
        pending (awaiting processing), completed (funds received), failed
        (unsuccessful attempt), or refunded (money returned).
 
      code_location: "src/domain/PaymentStatus.java:1"
 
      values:
        - value: "PENDING"
          description: "Payment submitted but not yet processed"
 
        - value: "COMPLETED"
          description: "Payment successfully processed and funds received"
 
        - value: "FAILED"
          description: "Payment attempt failed (insufficient funds, invalid card, etc.)"
 
        - value: "REFUNDED"
          description: "Payment was refunded (partially or fully)"
 
      state_transitions:
        - from: "PENDING"
          to: ["COMPLETED", "FAILED"]
 
        - from: "COMPLETED"
          to: ["REFUNDED"]
 
        - from: "FAILED"
          to: ["PENDING"] # retry
 
        - from: "REFUNDED"
          to: [] # terminal state
 
      tags: ["financial-state", "time-sensitive"]
 
# Configuration reference
configuration:
  renewal: "Every sprint"
  process: |
    1. Run extraction queries against codebase
    2. Compare against this glossary
    3. Add new terms to candidates section
    4. Resolve conflicts with platform team
    5. Update canonical terms
    6. Version bump and commit
 
  validation:
    - "Every term must have at least one code location"
    - "Aliases should reference deprecation status"
    - "Relationships must be bidirectional (if A->B, then B's related_terms includes A)"
    - "No orphaned terms (every term should relate to at least one other)"

This structure is dense, yes—but it's complete. Each term has:

Canonical naming (what we call it)
Aliases (what others call it)
Definition (what it means in business terms)
Technical details (database table, API endpoint, enum values)
Code locations (where it lives)
Relationships (what it connects to)
Examples (how it's actually used)

The machine-readable format (YAML) makes it possible to validate, query, and cross-reference programmatically. The human-readable structure makes it easy to understand without tools.
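
As an example of that programmatic validation, the sketch below checks three of the rules from the glossary's validation list: at least one code location, no orphaned terms, and bidirectional relationships. It assumes the glossary has been flattened into a dict keyed by term name, which is a simplification of the full YAML layout, and `validate_glossary` is an illustrative name.

```python
def validate_glossary(terms: dict[str, dict]) -> list[str]:
    """Return human-readable problems found in a flattened glossary."""
    problems = []
    for name, entry in terms.items():
        if not entry.get("code_locations"):
            problems.append(f"{name}: no code location")
        related = entry.get("related_terms", [])
        if not related:
            problems.append(f"{name}: orphaned (no related terms)")
        for other in related:
            if other not in terms:
                problems.append(f"{name}: related term {other} is not defined")
            elif name not in terms[other].get("related_terms", []):
                problems.append(f"{name} -> {other}: relationship is not bidirectional")
    return problems


# Hypothetical flattened glossary with one deliberate gap
glossary = {
    "Organization": {
        "code_locations": ["src/domain/Organization.java"],
        "related_terms": ["Invoice"],
    },
    "Invoice": {
        "code_locations": ["src/billing/Invoice.java"],
        "related_terms": [],
    },
}

for problem in validate_glossary(glossary):
    print(problem)
```

Running a check like this in CI keeps the rules from becoming aspirational.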

Making Your Glossary Navigable

A YAML file is great for machine reading, but humans need a better interface. Here's how to build a readable glossary that people actually use:

markdown

# Domain Glossary: Financial Domain
 
## Core Concepts
 
### Invoice
 
**ID:** inv_001
**Status:** Active
**Last Updated:** 2026-03-16
 
**Definition:**
A billing document sent to the organization. Generated at the end of each
billing cycle. Contains line items for each subscription during that period.
 
**Also Known As:**
 
- Invoice (primary term)
- Billing Statement (external communication)
 
**Location in Code:**
 
- Class: `src/billing/Invoice.java:25`
- Database: `schema/invoices.sql`
- API: `GET /api/v1/organizations/{org_id}/invoices`
 
**Related Terms:**
 
- Organization (invoices belong to organizations)
- LineItem (invoices contain line items)
- Payment (payments settle invoices)
 
**Attributes:**
 
- `id` (UUID): Unique invoice identifier
- `organization_id` (UUID): Billing organization
- `invoice_number` (String): Human-readable number (INV-2026-001)
- `total_amount` (Decimal): Total due
- `status` (Enum): DRAFT, SENT, PAID, OVERDUE, CANCELLED
- `due_date` (Date): Payment deadline
 
**Status Flow:**

DRAFT → SENT → PAID / OVERDUE → CANCELLED (optional)


**Example:**
```typescript
// Query all unpaid invoices for an organization
const unpaidInvoices = await getInvoices(orgId, {
  status: ["SENT", "OVERDUE"]
});

// Pay the first unpaid invoice
const invoice = unpaidInvoices[0];
await recordPayment(invoice.id, {
  amount: invoice.totalAmount,
  method: "credit_card"
});
```

**Notes:**

- Invoices are immutable once sent (prevents billing disputes)
- Automatically generated from billing cycles
- Retrigger invoice generation if subscription changes mid-cycle


You can generate this markdown from your YAML glossary programmatically. Then commit both—the machine-readable YAML for validation and the markdown for humans. This gives you the best of both worlds: programmatic power and human usability.
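
A generator for this can be quite small. The sketch below renders a single entry, already loaded into a Python dict with the field names used in the YAML glossary, into a markdown section. `render_term` and the trimmed sample entry are illustrative assumptions; loading the YAML itself is left to whichever parser you use.

```python
def render_term(entry: dict) -> str:
    """Render one glossary entry as a markdown section."""
    lines = [
        f"### {entry['term']}",
        "",
        f"**ID:** {entry['id']}",
        "",
        "**Definition:**",
        entry["definition"].strip(),
        "",
    ]
    if entry.get("attributes"):
        lines += ["**Attributes:**", ""]
        for attr in entry["attributes"]:
            description = attr.get("description", "")
            suffix = f": {description}" if description else ""
            lines.append(f"- `{attr['name']}` ({attr['type']}){suffix}")
        lines.append("")
    if entry.get("related_terms"):
        lines += ["**Related Terms:**", ""]
        lines += [f"- {term}" for term in entry["related_terms"]]
    return "\n".join(lines).rstrip() + "\n"


# Trimmed-down sample entry
invoice_entry = {
    "term": "Invoice",
    "id": "inv_001",
    "definition": "A billing document sent to the organization.\n",
    "attributes": [
        {"name": "id", "type": "UUID", "description": "Unique invoice ID"},
        {"name": "due_date", "type": "date"},
    ],
    "related_terms": ["Organization", "LineItem"],
}

print(render_term(invoice_entry))
```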

## Keeping Your Glossary Alive

A glossary that doesn't evolve is dead weight. Here's how to keep it current:

### Monthly Refresh Cycle

```yaml
# scripts/refresh_glossary.yaml
monthly_process:
  step_1_extract:
    description: "Run extraction patterns against codebase"
    command: |
      grep -rE "^\s*((public|export|abstract|final)\s+)*class\s" --include="*.java" --include="*.ts" -A5 \
        | grep -E "(class|/\*)" > /tmp/extracted_classes.txt
    output: "extracted_candidates.yaml"

  step_2_compare:
    description: "Find new terms and conflicts"
    command: "diff extracted_candidates.yaml glossary.yaml"
    output: "candidates_for_review.md"

  step_3_review:
    description: "Team review and conflict resolution"
    participants: ["Platform Team", "Backend Team"]
    approval_required: true

  step_4_update:
    description: "Merge approved changes into glossary"
    command: "merge_glossary_updates.sh"
    output: "glossary.yaml (new version)"

  step_5_publish:
    description: "Generate markdown and commit"
    command: "generate_glossary_markdown.sh && git commit"
    output: "docs/glossary.md, glossary.yaml"
```

Linking from Code

Make the glossary part of your workflow:

java

import java.math.BigDecimal;
import java.util.UUID;

/**
 * Represents a monetary transaction.
 *
 * A Payment represents a completed or pending money movement.
 * See the domain glossary for detailed relationships.
 *
 * @see glossary#inv_001 (Invoice)
 * @see glossary#enum_payment_status (PaymentStatus enum)
 */
public class Payment {
  private UUID id;
  private PaymentStatus status;
  private BigDecimal amount;

  // ...
}

In your README or architecture docs:

markdown

## Domain Model
 
Before diving into the code, read the [Domain Glossary](./docs/glossary.md).
This explains core entities like Organization, Invoice, and Subscription
in both business and technical terms.
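
Links like these only help if they stay valid. As a small sketch, the Python below scans source text for the `@see glossary#<id>` convention shown above and reports ids the glossary does not define. The helper name `find_broken_links`, the sample comment, and the id `pay_999` are made up for illustration.

```python
import re

SEE_PATTERN = re.compile(r"@see\s+glossary#([A-Za-z0-9_]+)")


def find_broken_links(source: str, known_ids: set[str]) -> list[str]:
    """Return glossary ids referenced in `source` that the glossary does not define."""
    referenced = SEE_PATTERN.findall(source)
    return sorted({ref for ref in referenced if ref not in known_ids})


java_source = """
/**
 * @see glossary#inv_001 (Invoice)
 * @see glossary#enum_payment_status (PaymentStatus enum)
 * @see glossary#pay_999 (hypothetical stale reference)
 */
"""

known = {"inv_001", "enum_payment_status", "org_001"}
print(find_broken_links(java_source, known))
# ['pay_999']
```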

Automating Detection of Drift

You can write a validation script that checks if your glossary is falling out of sync:

bash

#!/bin/bash
# validate_glossary.sh
# Run from the repository root so relative paths in the glossary resolve.

GLOSSARY="docs/glossary.yaml"
CODEBASE="src/"

# Check 1: Do all files referenced in the glossary still exist?
echo "Checking for deleted classes..."
grep -oE 'file: "[^"]+"' "$GLOSSARY" | cut -d'"' -f2 | sort -u | while read -r class_path; do
  if [ ! -f "$class_path" ]; then
    echo "WARNING: $class_path referenced in glossary but not found in code"
  fi
done

# Check 2: Are there new major classes not in glossary?
echo "Checking for undocumented classes..."
find "$CODEBASE" \( -name "*.java" -o -name "*.ts" \) | while read -r file; do
  # First declaration in the file, allowing modifiers such as "public" or "export"
  class_name=$(grep -oE '^\s*((public|export|abstract|final)\s+)*(class|interface|type)\s+[A-Za-z_][A-Za-z0-9_]*' "$file" | head -1 | awk '{print $NF}')
  # Skip files that declare no class, interface, or type
  [ -z "$class_name" ] && continue
  if ! grep -qF -- "$class_name" "$GLOSSARY"; then
    echo "CANDIDATE: $class_name in $file (not in glossary)"
  fi
done

Run this monthly. It surfaces candidates for addition and warnings about deleted classes. The automation takes the drudgery out of maintenance.
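
A plain text search cannot tell a canonical term from an alias. If that distinction matters for your review, the comparison can be done on structured data instead. The sketch below assumes you have the extracted class names as a set and the glossary reduced to a mapping of canonical name to aliases; both shapes and the name `find_new_candidates` are illustrative.

```python
def find_new_candidates(extracted: set[str], glossary: dict[str, list[str]]) -> list[str]:
    """Return extracted names that match neither a canonical term nor an alias."""
    known = set()
    for canonical, aliases in glossary.items():
        known.add(canonical.lower())
        known.update(alias.lower() for alias in aliases)
    return sorted(name for name in extracted if name.lower() not in known)


glossary_terms = {
    "Organization": ["Workspace", "Tenant", "Account", "Org"],
    "Invoice": [],
}
extracted_names = {"Organization", "Workspace", "Invoice", "RefundHandler"}

print(find_new_candidates(extracted_names, glossary_terms))
# ['RefundHandler']
```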

Real-World Example: Building a Payment Glossary

Let's walk through this with a concrete example. Imagine you're building the glossary for a payments system. Here's what you'd extract:

From class names:

PaymentProcessor
PaymentGateway
RefundHandler
FailedPaymentRetry

From database schema:

payments table
payment_intents table
refunds table

From configuration:

payment_gateway_timeout_seconds
max_retry_attempts
refund_processing_days

From tests:

test_pending_payment_becomes_completed
test_failed_payment_triggers_retry
test_refund_reverses_payment

Now you start asking: What's the relationship between PaymentProcessor and PaymentGateway? You read the code. PaymentProcessor calls PaymentGateway. So PaymentGateway is a dependency, a tool that PaymentProcessor uses.

What about FailedPaymentRetry? Looking at the code, when a payment fails, a retry is scheduled. That retry is itself a payment attempt. So FailedPaymentRetry is actually a state—a Payment with retry logic.

Your glossary captures this:

yaml

payment_entities:
  - term: "Payment"
    definition: "A record of attempted money movement"
    variants:
      - "PendingPayment" # when status == PENDING
      - "FailedPayment" # when status == FAILED, triggers retry
      - "CompletedPayment" # when status == COMPLETED
 
  - term: "PaymentGateway"
    definition: "External service that processes payment authorization"
    examples:
      - "Stripe"
      - "Square"
      - "Adyen"
    relationships:
      - "used by PaymentProcessor"

This is vastly more useful than just a list of names. It captures intent, relationships, and context.
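
To show how the "variant is really a state" idea can carry into code, here is a minimal Python sketch. It models the status values and allowed transitions listed in the PaymentStatus glossary entry and derives the informal variant name from the status. The helper names `can_transition` and `variant_name` are assumptions for illustration, not code from the payments system.

```python
from enum import Enum


class PaymentStatus(Enum):
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    REFUNDED = "REFUNDED"


# Mirrors the state_transitions table in the PaymentStatus glossary entry
ALLOWED_TRANSITIONS = {
    PaymentStatus.PENDING: {PaymentStatus.COMPLETED, PaymentStatus.FAILED},
    PaymentStatus.COMPLETED: {PaymentStatus.REFUNDED},
    PaymentStatus.FAILED: {PaymentStatus.PENDING},  # retry
    PaymentStatus.REFUNDED: set(),                  # terminal state
}


def can_transition(current: PaymentStatus, target: PaymentStatus) -> bool:
    """Check a status change against the documented transitions."""
    return target in ALLOWED_TRANSITIONS[current]


def variant_name(status: PaymentStatus) -> str:
    """Informal glossary name for a Payment in this state, e.g. 'PendingPayment'."""
    return f"{status.value.capitalize()}Payment"


print(can_transition(PaymentStatus.PENDING, PaymentStatus.COMPLETED))  # True
print(can_transition(PaymentStatus.REFUNDED, PaymentStatus.PENDING))   # False
print(variant_name(PaymentStatus.FAILED))                              # FailedPayment
```

Keeping one Payment type with a status field, instead of one class per variant, matches the glossary's definition of the variants as states.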

Tools and Automation for Glossary Building

You don't have to do this manually. Here are tools that help:

Natural Language Processing:

Claude Code can read code and extract semantic meaning (not just regex patterns)
Identify terms that appear frequently together (likely related)
Flag inconsistencies in naming

Visualization:

Generate entity diagrams showing relationships
Highlight synonyms visually
Show code locations on a heat map

Validation:

Automated checks that all documented terms exist
Warnings when new major classes appear
Detection of orphaned terms (concepts with no relationships)

Integration:

Publish the glossary to your internal wiki
Add glossary links to IDE hover-text (via comment parsing)
Index the glossary for team search

Common Pitfalls and How to Avoid Them

Pitfall 1: Over-comprehensive

Including every variable name and function in your glossary bloats it. Only include business domain terms. Leave implementation details to the code.

Pitfall 2: Not updating it

A stale glossary is worse than no glossary. Schedule monthly reviews. Make updating the glossary a PR requirement when concepts change.

Pitfall 3: Forgetting to link it

Your glossary is useless if nobody knows it exists. Link from:

Architecture docs
README
Code comments (via @see glossary#term_id)
Slack channels
Onboarding docs
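
For the code-comment links, a small scanner can collect every `@see glossary#term_id` reference so you can check that each one points at a real entry. This is a sketch under assumptions: the regular expression treats a term ID as letters, digits, and underscores, and the IDs in the sample are made up for illustration.

```python
import re

# Assumed reference format: "@see glossary#<term_id>"
GLOSSARY_REF = re.compile(r"@see\s+glossary#(\w+)")


def find_glossary_refs(source: str) -> list[tuple[int, str]]:
    """Return (line number, term_id) for every glossary reference in `source`."""
    refs = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in GLOSSARY_REF.finditer(line):
            refs.append((lineno, match.group(1)))
    return refs


sample = '''\
class Organization:
    # @see glossary#org_001
    pass

def settle(payment):  # @see glossary#settlement_004
    ...
'''

print(find_glossary_refs(sample))
# [(2, 'org_001'), (5, 'settlement_004')]
```

Comparing the collected IDs against the IDs in your glossary file tells you which links are broken.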

Pitfall 4: Forcing false consensus

If your frontend calls something different from your backend and it's actually different conceptually, don't force a merge. Document the distinction instead.

Pitfall 5: Making it too formal

Your glossary is not a legal document. Make definitions clear, not verbose. Use examples. Use human language. Avoid jargon.

Pitfall 6: Building it once and ignoring it

This is how glossaries die, and it usually happens because nobody is responsible for them. Assign an owner, schedule reviews, and let the glossary evolve with your codebase as a living document.

Putting It All Together: Your First Month

Here's how to start:

Week 1: Extraction

bash

# Use Claude Code to scan your codebase
# Generate extracted_candidates.yaml with patterns
# Should find 50-100 candidate terms

Week 2: Manual review

# Read your extracted terms
# Group obvious synonyms
# Mark questions for the team
# Store in candidates_for_review.md

Week 3: Resolution

# Team discussion (async Slack thread or synchronous meeting)
# Decide on canonical names
# Document relationships
# Update resolved_terms.yaml

Week 4: Compilation and publication

# Build final glossary.yaml
# Generate glossary.md
# Commit both to repository
# Announce to team, get feedback

Ongoing: Maintenance

# Monthly extraction and comparison
# Review new candidates in PR
# Update on every domain change
# Link from docs and code

Measuring Glossary Impact

Once you've built your glossary, how do you know it's working?

Onboarding time: Track time-to-productivity for new engineers. A good glossary should reduce the time spent asking "what does this term mean?"

Documentation quality: Count references to glossary terms in your architecture docs. More references = better alignment.

Naming consistency: Run your validation script monthly. Fewer "undocumented classes" means better team discipline.

Developer satisfaction: In retros, ask if the glossary helps. "Does the glossary help you understand the codebase?" Measure over time.

Conclusion

A domain glossary seems like busywork until it's not. The moment a new engineer can read your architecture docs, understand what a "workspace" is and how it differs from an "organization," and jump into the codebase with fewer questions—that's when you realize its worth.

Claude Code makes building and maintaining this glossary feasible. It can extract terms from your codebase intelligently, identify relationships, and flag conflicts. It handles the grunt work so you can focus on the conversations that matter.

Start small. Extract fifty terms. Resolve the obvious conflicts. Build a first draft. Then iterate. Your glossary will grow as your understanding of your own domain deepens.

Your future team members will thank you. More importantly, your current team will work faster, ship better code, and hit fewer naming-related bugs. The glossary becomes a shared reference that everyone relies on, reducing friction and speeding up decisions across your engineering organization.

Advanced Domain Glossary Techniques

Once you've built a basic glossary, you can expand it in powerful ways. Consider building specialized glossaries for specific domains:

Financial domain: Terms like "Payment," "Invoice," "Settlement," "Refund," "Charge," "Credit," and "Debit," each with its own state machine and relationships.

User management domain: "Organization," "User," "Role," "Permission," "Session," "Tenant." Understanding how these relate prevents permission bugs.

Data domain: "Dataset," "Pipeline," "Transformation," "Schema," "Partition," "Index." Technical terms but critical for data engineering.

You can have a master glossary that links to domain-specific glossaries. This scales better than trying to put everything in one document. The hierarchical approach also makes it easier to onboard people new to specific domains—they can read the domain glossary rather than needing to understand your entire system.

Publishing and Sharing Your Glossary

A glossary sitting in a file that nobody looks at is useless. Make it visible:

HTML version: Generate a searchable HTML version that's easy to browse
API documentation: Link glossary terms from your API docs
IDE plugins: Show glossary definitions in IDE hover tooltips
Slack bot: /glossary invoice shows the definition of Invoice
Wiki integration: Embed the glossary in your internal wiki
Code comments: Reference glossary terms in comments: @glossary org_001
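
The lookup behind a `/glossary invoice` command is small. The sketch below covers only the lookup: it takes the text after the command and returns a reply string. Wiring it into Slack is left out, and `handle_glossary_command` and the sample definition are hypothetical.

```python
def handle_glossary_command(text: str, glossary: dict[str, str]) -> str:
    """Answer a `/glossary <term>` request with a case-insensitive lookup."""
    query = text.strip().lower()
    if not query:
        return "Usage: /glossary <term>"
    # Index by lowercase term so "invoice" finds "Invoice"
    by_lower = {term.lower(): (term, definition) for term, definition in glossary.items()}
    if query in by_lower:
        term, definition = by_lower[query]
        return f"{term}: {definition}"
    return f"No glossary entry for '{text.strip()}'."


# Illustrative entry; the definition text is an assumption
glossary = {"Invoice": "A request for payment issued to an Organization"}

print(handle_glossary_command("invoice", glossary))
print(handle_glossary_command("statement", glossary))
```

Returning a plain string keeps the function easy to reuse from a wiki macro or a CLI as well as a chat bot.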

The more accessible your glossary is, the more people will use it. And the more people use it, the more valuable it becomes, because every reader and contributor helps keep the shared vocabulary consistent for everyone else.

Building Business Alignment

One of the secret benefits of domain glossaries is that they create alignment between technical and business teams. When your product team and engineering team are using the same terms, communication becomes clearer. Ambiguity decreases. Decisions get made faster.

Product managers might say "the workspace is the primary unit of billing." Engineers might say "yes, we call that Organization in the code, but Workspace in the frontend." The glossary makes this explicit. It's no longer a source of confusion—it's documented and understood.

This alignment also helps when you hire new people. Onboarding becomes "read the glossary" instead of "figure out what everyone means by these terms." The glossary becomes the mechanism that passes your shared vocabulary on to each new hire.

Maintaining Long-Term Accuracy

The biggest challenge with domain glossaries is keeping them accurate as the codebase evolves. Here's how to maintain long-term accuracy:

Make it part of the code review process: When someone introduces a new domain term or changes how a term is used, that should be a code review comment: "This introduces a new term. Should we update the glossary?" This converts the glossary from a passive reference into an active part of your development workflow. Every code change that touches domain concepts flows through the glossary maintenance process.

Assign a glossary owner: Someone (ideally a tech lead or architect) is responsible for keeping it current. Make it part of their job, not something they do in spare time. This person is the glossary expert, the person who understands all the nuances. When someone has a question about terminology, they know who to ask. When ambiguity arises, this person helps resolve it.

Generate warnings for orphaned terms: Run a script that finds glossary terms that aren't used anywhere in the code. Either they're outdated and should be removed, or they're used in places the script didn't look. This automation surfaces drift. If a glossary term hasn't appeared in the codebase for six months, that's a signal that either the feature was removed (and the term should be deprecated) or the term is being used in ways the script doesn't detect.
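
A minimal version of that script might look like the sketch below. It assumes the source files have already been read into a dict of path to text, and it matches each term in both CamelCase and snake_case form. The matching is deliberately loose (plain substring search), so it errs toward treating a term as used; `to_snake` and `find_unused_terms` are hypothetical helpers.

```python
import re


def to_snake(name: str) -> str:
    """Convert CamelCase to snake_case, e.g. PaymentGateway -> payment_gateway."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def find_unused_terms(terms: list[str], sources: dict[str, str]) -> list[str]:
    """Return glossary terms that appear in none of the source texts."""
    unused = []
    for term in terms:
        pattern = re.compile(f"{re.escape(term)}|{re.escape(to_snake(term))}")
        if not any(pattern.search(text) for text in sources.values()):
            unused.append(term)
    return unused


# Paths and contents are illustrative
sources = {
    "billing/gateway.py": "class PaymentGateway:\n    pass\n",
    "config.yaml": "refund_processing_days: 5\n",
}

print(find_unused_terms(["PaymentGateway", "Refund", "Settlement"], sources))
# ['Settlement']
```

Anything the script reports still needs a human look, since a term can live in dashboards, SQL, or docs that were never scanned.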

Version your glossary: Use semantic versioning. Major version for big changes (renamed terms), minor for additions, patch for clarifications. This makes it easy to track what changed between versions. Your changelog explains why changes happened. Over time, the version history becomes a record of how your domain model evolved.
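
The versioning rule above is easy to mechanize. Here is a sketch, assuming you can count how many terms were renamed, added, or clarified between two glossary versions; `next_version` is a hypothetical helper, and it expects a plain `MAJOR.MINOR.PATCH` string.

```python
def next_version(current: str, renamed: int, added: int, clarified: int) -> str:
    """Pick the next glossary version from the kinds of changes made.

    Renames bump major, additions bump minor, clarifications bump patch.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if renamed:
        return f"{major + 1}.0.0"
    if added:
        return f"{major}.{minor + 1}.0"
    if clarified:
        return f"{major}.{minor}.{patch + 1}"
    return current


print(next_version("1.4.2", renamed=1, added=3, clarified=0))  # 2.0.0
print(next_version("1.4.2", renamed=0, added=2, clarified=5))  # 1.5.0
print(next_version("1.4.2", renamed=0, added=0, clarified=1))  # 1.4.3
```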

Celebrate deprecations: When you retire a term and replace it with a new one, document why. "We renamed Account to Organization because Account was ambiguous (could mean user account, billing account, or email account). Organization is more precise." These deprecation notes are valuable for anyone trying to understand your codebase. They see that a term exists in old code but understand why it was retired.

Measuring Glossary Impact

How do you know your glossary is working? Track metrics over time:

Onboarding time: New engineers should get up to speed faster with the glossary. Track how long it takes new hires to understand domain concepts. When the glossary improves, this metric should decrease. If new hire onboarding time goes from 2 weeks to 1 week because they can read the glossary and understand your domain in a day, that's measurable impact.

Code review cycles: The glossary should reduce back-and-forth in code reviews about terminology. "What do you mean by tenant? Do you mean workspace?" These conversations should decrease. When reviews focus on logic and design instead of terminology confusion, you know the glossary is working.

Documentation completeness: Count references to glossary terms in your architecture docs. More references means better integration. When your architecture docs link to glossary definitions, developers have one source of truth.

Support ticket volume: Track support questions about terminology, such as "What's the difference between an invoice and a statement?" Fewer of these tickets means the glossary is doing its job. When you can point customers to your glossary, you've converted documentation debt into documentation value.

Team satisfaction: Periodically ask your team: "Does the glossary help you understand the codebase?" As the glossary improves, satisfaction should increase. It's a simple indicator that the glossary adds value.

The glossary that lives and evolves with your codebase is infinitely more valuable than the one you built once and never updated again. Make it a first-class artifact that receives the same care as your code itself. When your team views the glossary as a core part of your architecture, you know you've succeeded. The glossary becomes not just a reference but a tool that shapes how your team thinks about the domain.

Advanced Application: Glossaries as Refactoring Tools

Once you've built a domain glossary, you unlock a superpower: systematic refactoring. Armed with knowledge of which terms are synonyms, you can safely rename code elements across your codebase. Not with fear of breaking things—with confidence.

Suppose your glossary says "Workspace and Organization are synonyms, with Organization being canonical." Now you can confidently find every reference to Workspace in your backend code and rename it to Organization. Your IDE's refactoring tools will help. More importantly, anyone reviewing the refactoring can read the glossary and understand why the change is happening. The glossary becomes justification for the refactoring.
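
Before you rename anything, it helps to see how widespread the non-canonical term is. The sketch below produces a report only; it does not rewrite code. It assumes the synonym map comes from your glossary, and `find_synonym_usages` and the sample source are hypothetical.

```python
import re


def find_synonym_usages(
    source: str, synonyms: dict[str, str]
) -> list[tuple[int, str, str]]:
    """Return (line number, identifier, canonical term) for each identifier
    that contains a non-canonical synonym, ignoring case."""
    canonical_by_lower = {syn.lower(): canon for syn, canon in synonyms.items()}
    alternation = "|".join(re.escape(syn) for syn in canonical_by_lower)
    pattern = re.compile(rf"\w*({alternation})\w*", re.IGNORECASE)
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in pattern.finditer(line):
            canonical = canonical_by_lower[match.group(1).lower()]
            hits.append((lineno, match.group(0), canonical))
    return hits


sample = '''\
class WorkspaceService:
    def get(self, workspace_id):
        return Organization.find(workspace_id)
'''

for lineno, identifier, canonical in find_synonym_usages(sample, {"Workspace": "Organization"}):
    print(f"line {lineno}: {identifier} -> consider {canonical}")
```

The actual rename is still best done with your IDE's refactoring tools, which understand scope and imports in a way a regex does not.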

This doesn't just save time—it prevents the alternative where entropy grows unchecked. Without a glossary, different teams keep using their preferred terms. After five years, you have so many synonyms for the same concept that refactoring becomes impossible. The glossary allows you to actively manage naming consistency as a part of ongoing development, not something you attempt once every decade when things get too bad.

The refactoring becomes a form of continuous improvement. Every sprint, your team picks one group of synonyms and harmonizes them. Over time, your codebase becomes steadily more consistent. Developers read code more easily. New features are easier to implement because you're not fighting inconsistent naming. That compounds into significant velocity gains.

Integrating Glossaries with Code Generation and Documentation

Modern development increasingly uses code generation—from OpenAPI specs to gRPC definitions to database ORMs. A domain glossary should be the source of truth that feeds into all these systems. Instead of having your glossary be one artifact and your code be another, make them reference each other.

For example, your OpenAPI schema could reference glossary terms:

yaml

paths:
  /organizations:
    get:
      summary: List organizations
      description: |
        Retrieve organizations for the authenticated user.
        See glossary: [Organization]
      responses:
        "200":
          description: A list of organizations
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Organization"
 
components:
  schemas:
    Organization:
      type: object
      description: |
        An Organization is a logical grouping of users and resources
        under single administrative control. See glossary for full definition.
      properties:
        id:
          type: string
          description: Unique identifier
        name:
          type: string
          description: Display name for the organization

This approach keeps documentation synchronized with code. If you generate the schema descriptions from the glossary, API documentation picks up glossary changes the next time the spec is built. When the API schema changes, you check the glossary to understand if this requires a terminology change. They're no longer separate concerns; they're integrated.
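
One way to get that synchronization is to overwrite schema descriptions from the glossary as a build step. The sketch below works on a spec that has already been parsed into a dict (loading the YAML file would need a parser, which is left out here); `apply_glossary` is a hypothetical helper.

```python
import copy


def apply_glossary(spec: dict, glossary: dict[str, str]) -> dict:
    """Return a copy of the spec whose schema descriptions come from the glossary.

    Only schemas whose name matches a glossary term are touched.
    """
    updated = copy.deepcopy(spec)
    schemas = updated.get("components", {}).get("schemas", {})
    for name, schema in schemas.items():
        if name in glossary:
            schema["description"] = glossary[name]
    return updated


spec = {
    "components": {
        "schemas": {
            "Organization": {"type": "object", "description": "TODO"},
        }
    }
}
glossary = {
    "Organization": (
        "A logical grouping of users and resources "
        "under single administrative control."
    )
}

result = apply_glossary(spec, glossary)
print(result["components"]["schemas"]["Organization"]["description"])
```

Working on a copy means the original spec stays untouched, so you can diff the two and review what the glossary changed.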

The Glossary as Onboarding Curriculum

Beyond just a reference, a well-structured domain glossary can actually serve as an onboarding curriculum. New engineers should read the glossary in order, learning foundational terms first, then increasingly specific terms. The glossary becomes a narrative about your domain, not just an alphabetical list.

Structure your glossary with layers:

Layer 1: Core concepts (Organization, User, Account)
Layer 2: Business processes (Billing, Authentication, Deployment)
Layer 3: Technical implementation (PaymentGateway, OrganizationService)

A new engineer reads Layer 1 and understands the business model. They read Layer 2 and understand what the system does. They read Layer 3 and understand how it's implemented. By the end, they have a coherent mental model of your system. This is far more effective than having them grep the codebase for a week and trying to infer the domain model.
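
If each entry carries a layer number, producing the reading order is a one-line sort. This is a small sketch that assumes a `layer` field you would add to your own glossary format; `reading_order` is a hypothetical helper.

```python
def reading_order(terms: list[dict]) -> list[str]:
    """Order terms by layer first, then alphabetically within a layer."""
    return [t["term"] for t in sorted(terms, key=lambda t: (t["layer"], t["term"]))]


terms = [
    {"term": "PaymentGateway", "layer": 3},
    {"term": "Billing", "layer": 2},
    {"term": "User", "layer": 1},
    {"term": "Organization", "layer": 1},
]

print(reading_order(terms))
# ['Organization', 'User', 'Billing', 'PaymentGateway']
```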

Some teams even add "why" explanations to glossary terms. "Why did we create two separate concepts for User (identity) and Profile (metadata)?" These historical notes help new engineers understand the architecture decisions. They learn not just what exists but why it exists, which makes them better at extending and maintaining the system.

Scaling Glossaries Across Multiple Teams

In larger organizations with multiple engineering teams, a domain glossary becomes a coordination mechanism. Different teams need to understand how their domains interact. A shared glossary makes those interactions explicit.

The challenge is avoiding a glossary so large it becomes unwieldy. The solution is hierarchical glossaries. Each team maintains a team glossary for its domain-specific terms. The organization maintains a core glossary of cross-team terms that appear in multiple domains. A developer who needs to understand their team's domain reads the team glossary. When they need to understand how that domain connects to others, they read the core glossary.

With this structure, each glossary stays manageable. The core glossary usually has 50-100 terms—the critical concepts every engineer needs to understand. Team glossaries can be 100-500 terms, deep enough to be useful without becoming unwieldy. When multiple teams define similar terms, the core glossary specifies the official definition and how team variations relate to it.
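
A lookup across the two levels could work like the sketch below: return the core definition first, then the team's variation if one exists. The function name, the glossary shapes, and the sample definitions are all assumptions made for illustration.

```python
def resolve_term(
    term: str, team_glossary: dict[str, str], core_glossary: dict[str, str]
) -> list[tuple[str, str]]:
    """Return (source, definition) pairs: the official core definition first,
    followed by the team-specific variation when the team defines one."""
    results = []
    if term in core_glossary:
        results.append(("core", core_glossary[term]))
    if term in team_glossary:
        results.append(("team", team_glossary[term]))
    return results


# Illustrative definitions
core = {"Organization": "A logical grouping of users and resources"}
billing_team = {
    "Organization": "The Organization as the unit that receives invoices",
    "Settlement": "Transfer of captured funds to the merchant",
}

print(resolve_term("Organization", billing_team, core))
print(resolve_term("Settlement", billing_team, core))
```

Showing both definitions side by side makes it obvious when a team's usage has drifted from the official one.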

Glossaries and Technical Debt

A domain glossary is a form of documentation that directly addresses technical debt. Much technical debt has a naming component: code gets refactored several times, accrues naming inconsistencies along the way, and ends up with three names for the same thing that new developers can't untangle.

By building a glossary, you're explicitly addressing this debt. The glossary creates clarity where there was confusion. This is debt reduction, not elimination—the inconsistent code still exists—but at least new people understand it. Over time, as you systematically refactor to use canonical names, you actually eliminate the debt.

This turns the glossary from a "nice-to-have documentation" into a "strategic debt reduction tool." When you can quantify that the glossary saves 30% of onboarding time, it's not a documentation project—it's an engineering efficiency project. Frame it that way and you get engineering budget for it.

Maintaining Glossaries Over Time

A glossary isn't built once and forgotten. It needs maintenance. As your code evolves, your glossary evolves. New concepts emerge. Old concepts become obsolete. You discover that what you thought were synonyms are actually subtly different. Maintaining accuracy is important because a glossary that's wrong is worse than no glossary at all.

Establish a process for glossary updates. When someone adds significant new functionality, they update the glossary to document new terms. When a refactoring consolidates two concepts into one, the glossary reflects that. When architectural changes introduce new domain concepts, the glossary captures them. This doesn't need to be burdensome—a simple rule like "glossary updates are part of pull request review" keeps it current.

Claude Code agents can help with maintenance. A glossary agent can run periodically, scanning for new class definitions, new API endpoints, new configuration keys that don't yet appear in the glossary. It flags potential new terms for human review. This automated detection prevents the glossary from drifting out of sync with reality. The agent can't resolve conflicts or make judgment calls, but it can catch what humans might miss when busy shipping features.
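
The class-definition part of that scan does not need an agent at all. For a Python codebase, the standard-library `ast` module can list class names, and a set difference flags the ones the glossary doesn't cover. This is a sketch under that assumption; `undocumented_classes` and `ChargebackDispute` are made up for the example.

```python
import ast


def undocumented_classes(source: str, glossary_terms: set[str]) -> list[str]:
    """Return class names defined in `source` that the glossary does not list."""
    tree = ast.parse(source)
    class_names = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)
    }
    return sorted(class_names - glossary_terms)


SAMPLE_SOURCE = '''
class PaymentProcessor:
    pass


class RefundHandler:
    pass


class ChargebackDispute:
    pass
'''

print(undocumented_classes(SAMPLE_SOURCE, {"PaymentProcessor", "RefundHandler"}))
# ['ChargebackDispute']
```

The output is a list of candidates for human review, which matches the division of labor described above: automation catches what's new, people decide what it means.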

Conclusion: Building Organizational Memory

At its deepest level, a domain glossary is how your organization encodes its understanding into a persistent, learnable form. Every concept your engineers have discovered, every naming decision you've made, every relationship between entities—it all gets preserved in the glossary.

This is organizational memory. It's what allows your organization to grow from 5 engineers to 50 without losing coherence. It's what lets you hire aggressively without sacrificing code quality. It's what prevents the situation where only one person understands the billing system.

The investment in building and maintaining a domain glossary is an investment in organizational intelligence. It compounds. Each engineer who reads the glossary learns faster and makes better decisions. Each refactoring guided by the glossary makes the codebase more consistent. Each new architecture built with glossary terms in mind is better aligned with existing systems.

This is why mature organizations treat domain glossaries as essential infrastructure. They're not a luxury—they're how you scale human intelligence across a growing organization.

-iNet

