December 29, 2025
Claude Development Automation Tutorial

Building Reusable AI Workflows with Claude Code Skills

Ever found yourself giving Claude the same instructions over and over? "Okay, run this code quality check, then generate the report, then format it like we did last time..." You know the workflow cold. Claude knows it too, until the next conversation where you start from scratch.

That's where skills come in.

Skills are your way of teaching Claude permanent procedures. They're not model training or fine-tuning. They're more like onboarding documents that Claude automatically finds and applies when needed. Write a skill once, and every future Claude instance knows how to execute that workflow without you having to repeat yourself.

Let me show you how to build them.

Table of Contents
  1. What Skills Actually Are
  2. The Key Insight: Progressive Disclosure
  3. Creating Your First Skill: A Practical Example
  4. Step 1: Name and Metadata
  5. Step 2: Directory Structure
  6. Step 3: Write the Skill Body
  7. Step 4: Add Supporting Resources (Optional)
  8. Understanding Metadata Fields
  9. Writing Effective Descriptions: The Make-or-Break Section
  10. Allowed Tools: Safety Boundaries
  11. Model Targeting: When to Upgrade
  12. Context Budget: Preventing Bloat
  13. String Substitutions: Making Skills Dynamic
  14. Testing Your Skill
  15. Distributing Skills: Sharing Across Projects
  16. Building Skill Discovery Into Your Team Workflow
  17. Naming Skills for Discoverability
  18. Versioning Skills: When to Update
  19. Advanced Pattern: Multi-File Skills with Progressive Disclosure
  20. Creating Skill Families: Organizing Related Skills
  21. How Skills Compound: The Long Game
  22. Common Pitfalls
  23. Summary: Why Skills Matter

What Skills Actually Are

A skill is a markdown file that lives in .claude/skills/ and contains procedural knowledge: workflows, tool integrations, domain expertise, or reusable patterns. When you ask Claude to do something, Claude scans available skills and automatically loads the ones that match what you're asking.

Think of it this way:

  • Without skill: You describe the entire workflow manually each time
  • With skill: Claude finds the skill, reads it, and executes automatically

Skills have three components:

  1. Metadata (YAML frontmatter): tells Claude when to use this skill
  2. Instructions (markdown body): the actual procedural knowledge
  3. Resources (optional): scripts, references, templates that support the skill

That's it. No model training. No API configuration. Just organized documentation that Claude learns to find and apply.

The Key Insight: Progressive Disclosure

Here's what makes skills efficient: they load in layers.

Layer 1: Metadata (name + description): Always in context
Layer 2: SKILL.md body: Loaded when skill triggers
Layer 3: Bundled resources (scripts, references, assets): Loaded only when needed

This means your metadata has to be really good at describing when the skill applies. Claude reads your description and decides "yep, this matches what the user is asking for." If your description is vague, Claude won't find your skill. If it's too broad, Claude loads it when it's irrelevant.

We'll come back to this. It matters.

Creating Your First Skill: A Practical Example

Let's build a real skill from scratch: one that teaches Claude how to review pull requests in your codebase.

Step 1: Name and Metadata

Start with the YAML frontmatter. This is what Claude reads first.

yaml
---
name: pr-quality-checker
description: Use when reviewing pull requests or analyzing code changes - validates code quality, identifies issues, and generates actionable feedback using automated checks and best practices
---

Notice the format: "Use when [specific triggering condition] - [what the skill does]"

This tells Claude:

  • Triggering condition: "reviewing pull requests or analyzing code changes"
  • What it does: "validates code quality, identifies issues, and generates actionable feedback"

The triggering condition is critical. Don't write "Use when you need to review code" (too vague). Write "Use when reviewing pull requests" (specific) or even better "Use when analyzing code changes for quality issues, maintainability, or security vulnerabilities" (concrete symptoms).

Step 2: Directory Structure

Create a folder for your skill:

.claude/skills/pr-quality-checker/
├── SKILL.md
├── scripts/
│   └── quality_check.py
└── references/
    └── review_checklist.md

The SKILL.md is required. Scripts and references are optional. Only add them if you actually need reusable code or reference material. Many skills are just SKILL.md and nothing else.
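If you scaffold skills often, the layout is easy to script. Here's a minimal Python sketch (the starter frontmatter text is a placeholder you'd fill in):

```python
from pathlib import Path

STARTER = """---
name: {name}
description: Use when <trigger condition> - <what the skill does>
---

# {title}
"""

def scaffold_skill(name: str, root: str = ".claude/skills") -> Path:
    """Create the standard skill layout: SKILL.md plus optional subfolders."""
    skill_dir = Path(root) / name
    (skill_dir / "scripts").mkdir(parents=True, exist_ok=True)
    (skill_dir / "references").mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():  # never clobber an existing skill
        title = name.replace("-", " ").title()
        skill_md.write_text(STARTER.format(name=name, title=title))
    return skill_dir
```

Delete the scripts/ and references/ folders afterward if you don't need them; an empty folder adds nothing.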

Step 3: Write the Skill Body

Open SKILL.md and write the procedural knowledge:

markdown
---
name: pr-quality-checker
description: Use when reviewing pull requests or analyzing code changes - validates code quality, identifies issues, and generates actionable feedback using automated checks and best practices
---
 
# PR Quality Checker
 
## Overview
 
This skill provides a systematic approach to code review that catches common issues early, provides constructive feedback, and improves code quality across the team.
 
## When to Use
 
Use this skill when:
- Reviewing pull requests before merge
- Analyzing code changes for quality issues
- Generating feedback on code structure or performance
- Checking adherence to team standards
 
Don't use this skill for:
- Quick design discussions (no code yet)
- Reviewing non-code changes (docs, configs)
- Emergency hotfixes (use abbreviated checklist)
 
## Core Review Checklist
 
When reviewing, check in this order:
 
1. **Does it work?** Run tests, spot obvious bugs
2. **Is it maintainable?** Clear naming, reasonable complexity, documented edge cases
3. **Is it secure?** Input validation, no hardcoded secrets, proper auth checks
4. **Is it efficient?** Unnecessary loops, N+1 queries, memory leaks
5. **Does it fit our standards?** Follows team conventions, patterns, style guide
 
## Quick Reference
 
| Check | Look For | Red Flag |
|-------|----------|----------|
| **Logic** | Correctness | Loop conditions, off-by-one errors |
| **Variables** | Clear names | Single letters (except i, j, k), unclear scope |
| **Functions** | Single purpose | Doing 3+ things, hard to name |
| **Tests** | Coverage | No tests for happy path, edge cases untested |
| **Security** | No vulnerabilities | String concatenation in queries, unvalidated input |
 
## Common Issues to Flag
 
**Issue:** Long functions (>50 lines)
**Suggestion:** Extract to smaller functions with clear names
 
**Issue:** No error handling
**Suggestion:** Add try-catch with meaningful error messages
 
**Issue:** Magic numbers
**Suggestion:** Extract to named constants with comments
 
**Issue:** Nested callbacks (3+ levels)
**Suggestion:** Use promises or async/await for clarity

That's a complete skill. Notice what it includes:

  • Clear triggering conditions in the description
  • Specific checklist (not "be thorough")
  • Common issues section
  • Searchable keywords (tests, security, logic, maintainability)

Notice what it doesn't include:

  • Generic philosophy about code quality
  • Multi-language examples (stick to one)
  • Everything under the sun (focused scope)

Step 4: Add Supporting Resources (Optional)

If you have code that repeats or reference material that's 100+ lines, move it to supporting files.

scripts/quality_check.py:

python
#!/usr/bin/env python3
"""
Automated code quality checks.
Used by pr-quality-checker skill when reviewing changes.
"""
 
import sys

def check_function_length(code, max_lines=50):
    """Flag functions longer than max_lines lines"""
    lines = code.split('\n')
    issues = []
    func_name = None
    count = 0

    def flush():
        if func_name and count > max_lines:
            issues.append(f"Function '{func_name}' is {count} lines (max {max_lines})")

    for line in lines:
        if line.startswith('def ') or line.startswith('function '):
            flush()  # report the previous function before starting a new one
            func_name = line.split('(')[0].split()[-1]
            count = 0
        elif func_name:
            if line and not line[0].isspace():
                flush()  # a non-indented line ends the current function
                func_name = None
            else:
                count += 1

    flush()  # catch a function that runs to the end of the file
    return issues
 
def check_variable_names(code):
    """Flag unclear variable names"""
    bad_patterns = [
        'var x =', 'var y =', 'var z =',  # Single letters
        'let a =', 'let b =',
        'const temp =', 'const tmp ='  # Unclear names
    ]
 
    issues = []
    for i, line in enumerate(code.split('\n'), 1):
        for pattern in bad_patterns:
            if pattern in line:
                issues.append(f"Line {i}: Unclear variable name: {line.strip()}")
 
    return issues
 
if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage: quality_check.py <file_path>")
        sys.exit(1)
 
    filepath = sys.argv[1]
    with open(filepath) as f:
        code = f.read()
 
    all_issues = []
    all_issues.extend(check_function_length(code))
    all_issues.extend(check_variable_names(code))
 
    for issue in all_issues:
        print(f"[WARNING] {issue}")
 
    if all_issues:
        sys.exit(1)

In SKILL.md, reference this script:

markdown
## Automated Checking
 
For faster reviews, run the automated checker:
 
    python scripts/quality_check.py <file_path>
 
This flags common issues:
- Functions longer than 50 lines
- Unclear variable names (single letters, "temp", "tmp")

Now Claude knows: "If the user is reviewing code, I should mention that this script exists and they can use it."

Understanding Metadata Fields

The YAML frontmatter supports more than just name and description. Let me show you the optional fields:

yaml
---
name: pr-quality-checker
description: Use when reviewing pull requests - validates code quality, identifies issues, generates actionable feedback
allowed-tools: bash, python, file-reader
model: haiku
context-budget: 10000
---

What each field does:

  • name (required): Skill identifier. Letters, numbers, hyphens only. No special characters or parentheses.
  • description (required): 500 characters max. Determines when Claude loads this skill. Must start with "Use when..."
  • allowed-tools (optional): Comma-separated list of tools this skill can invoke. If omitted, all tools available. Used for safety (restricts what this skill can do).
  • model (optional): Forces this skill to use a specific model. Values: "haiku", "sonnet", "opus". Useful if a skill works best on a larger model.
  • context-budget (optional): Maximum tokens this skill can consume. Prevents long skills from bloating context.

Most skills don't need these. Set them if you have specific constraints.
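The field rules above (name charset, description length, the "Use when..." opener) are mechanical enough to lint before you commit a skill. A minimal sketch, using a hand-rolled frontmatter parse rather than a full YAML library:

```python
import re

def validate_frontmatter(skill_md_text: str) -> list:
    """Check SKILL.md frontmatter against the field rules described above."""
    match = re.match(r"^---\n(.*?)\n---", skill_md_text, re.DOTALL)
    if not match:
        return ["Missing YAML frontmatter block"]

    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()

    issues = []
    name = fields.get("name", "")
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        issues.append("name: letters, numbers, and hyphens only")

    desc = fields.get("description", "")
    if len(desc) > 500:
        issues.append("description: exceeds 500 characters")
    if not desc.startswith("Use when"):
        issues.append('description: should start with "Use when..."')
    return issues
```

Run it over every SKILL.md in .claude/skills/ as a pre-commit hook and badly formed skills never reach teammates.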

Writing Effective Descriptions: The Make-or-Break Section

The description field determines discoverability. Claude reads descriptions to decide which skills to load, so you need to think like a search engine for Claude's brain.

Bad descriptions are vague or generic:

  • "For code quality"
  • "Helps with reviewing"
  • "General programming skill"

These fail because Claude can't distinguish them from similar skills. When you ask Claude to "review code," does Claude load the PR skill or the security skill or the style guide skill? If all descriptions are vague, Claude has to guess.

Good descriptions start with a concrete trigger and include specific symptoms:

  • "Use when reviewing pull requests or analyzing code changes for quality issues"
  • "Use when you need to find security vulnerabilities, validate input handling, check authentication flows"
  • "Use when enforcing team coding standards, checking naming conventions, verifying structure patterns"

The difference: specificity. Each description triggers on different conditions. Claude learns to load the right skill at the right time.

Include searchable keywords throughout. Don't hide the important terms in section headers. If your skill is about testing, use words like "test," "coverage," "unit," "integration," "mock," "fixture" in the description and body. Claude's skill discovery uses keyword matching, so dense keyword coverage matters.
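Claude's actual skill matching is more sophisticated than bag-of-words, but a naive keyword-overlap score is enough to see why specific descriptions win (illustrative only, not Claude's real mechanism):

```python
import re

def words(text: str) -> set:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_overlap(request: str, description: str) -> int:
    """Count distinct request words that also appear in the description."""
    return len(words(request) & words(description))

request = "review this pull request for code quality issues"
vague = "Helps with reviewing"
specific = ("Use when reviewing pull requests or analyzing code changes - "
            "validates code quality, identifies issues, generates feedback")
```

The vague description shares zero words with the request; the specific one matches on "pull", "code", "quality", and "issues". More shared vocabulary means more reliable triggering.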

Allowed Tools: Safety Boundaries

The allowed-tools field restricts what a skill can do. This is security-focused: you can prevent certain skills from using certain tools.

Example:

yaml
allowed-tools: bash, python

This skill can invoke bash and python but not file-reader, network tools, etc. Useful when you're distributing a skill to others and want to guarantee it won't steal credentials or access sensitive systems.

If you don't specify allowed-tools, the skill inherits whatever tools Claude normally has access to. For most team skills, you don't need this restriction.

Model Targeting: When to Upgrade

The model field forces a skill to run on a specific Claude model. You might set this if:

  • Skill requires advanced reasoning (force Opus)
  • Skill is frequently used and cost matters (force Haiku)
  • Skill needs specific capabilities (some models are better at code, others at reasoning)

Example:

yaml
model: opus

This skill always runs on Claude Opus, regardless of what model you normally use. Useful for complex analysis tasks that benefit from extra capability.

Don't overuse this. Default (whatever Claude you normally use) is fine for most skills. Only set model if you have specific performance requirements.

Context Budget: Preventing Bloat

The context-budget field caps how many tokens a skill can consume. This prevents long skills from exhausting your context window.

Example:

yaml
context-budget: 8000

If the skill body exceeds 8000 tokens, Claude clips it or skips loading. Useful for large reference skills where you want to ensure Claude doesn't accidentally load everything at once.

Again: most skills don't need this. The skill system already loads progressively (metadata -> body -> resources), so context bloat is rare. Set context-budget if your skill is massive and you want to enforce a hard limit.
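If you want a rough pre-flight check of your own, the common ~4-characters-per-token heuristic for English prose gets you close enough to spot a budget problem (an approximation, not an exact tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def within_budget(skill_body: str, context_budget: int) -> bool:
    """Cheap check that a skill body fits under its context-budget."""
    return estimate_tokens(skill_body) <= context_budget
```

A 40,000-character SKILL.md estimates to roughly 10,000 tokens, so it would blow through a context-budget of 8000.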

String Substitutions: Making Skills Dynamic

Skills can include placeholders that Claude fills in based on what you're asking.

The syntax is $VARIABLE_NAME. Here's a skill that uses it:

markdown
---
name: code-migration-helper
description: Use when migrating code between frameworks or languages - provides step-by-step guidance for converting code to $TARGET_LANGUAGE
---
 
# Code Migration Helper
 
When migrating from $SOURCE_LANGUAGE to $TARGET_LANGUAGE:
 
1. Understand the idioms of $TARGET_LANGUAGE
2. Map $SOURCE_LANGUAGE patterns to $TARGET_LANGUAGE equivalents
3. Handle ecosystem differences (package managers, testing tools)
4. Test thoroughly in $TARGET_LANGUAGE environment

When you invoke this skill, Claude substitutes the actual source and target languages. You ask "help me migrate this from Python to Rust" and Claude replaces:

  • $SOURCE_LANGUAGE -> Python
  • $TARGET_LANGUAGE -> Rust
  • $CODE -> your actual code

This makes one skill work for dozens of migration scenarios without duplicating the skill.
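The $VARIABLE_NAME syntax happens to match Python's string.Template, which makes it easy to preview how a substituted skill would read. This is a preview sketch, not how Claude performs the substitution internally:

```python
from string import Template

skill_body = (
    "When migrating from $SOURCE_LANGUAGE to $TARGET_LANGUAGE:\n"
    "1. Understand the idioms of $TARGET_LANGUAGE\n"
    "2. Map $SOURCE_LANGUAGE patterns to $TARGET_LANGUAGE equivalents\n"
)

# safe_substitute leaves any unfilled placeholders intact instead of raising
filled = Template(skill_body).safe_substitute(
    SOURCE_LANGUAGE="Python",
    TARGET_LANGUAGE="Rust",
)
print(filled)
```

Previewing a few substitutions like this is a quick way to catch placeholders that read awkwardly once real values are dropped in.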

Testing Your Skill

Before deploying, test that Claude actually uses your skill and uses it correctly.

Create a test scenario:

User: "Review this pull request for code quality issues"
[paste code]

Without the skill:

  • Claude gives generic feedback ("this code could be better")
  • No specific checklist
  • Misses your team's specific standards

With the skill loaded:

  • Claude mentions the checklist you wrote
  • Catches off-by-one errors
  • Flags unclear variable names
  • References your team standards

If Claude doesn't use your skill, the description probably doesn't match what you're asking. Adjust:

BAD: "Use when reviewing code" (too vague, matches everything)
GOOD: "Use when reviewing pull requests or analyzing code changes for quality" (specific)

Distributing Skills: Sharing Across Projects

Once a skill is tested, you can share it with teammates or use it across projects.

Option 1: Git repository

Add .claude/skills/your-skill/ to version control. Anyone who clones the repo gets the skill.

Option 2: Standalone distribution

Package your skill as a zip file:

my-skill/
├── SKILL.md
├── scripts/
│   └── checker.py
└── references/
    └── checklist.md

Share the zip. Others extract it into their .claude/skills/ directory.
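Zipping the folder is a one-liner with the standard library. A sketch (paths are placeholders for your own skill directory):

```python
import shutil
from pathlib import Path

def package_skill(skill_dir: str, out_dir: str = ".") -> str:
    """Zip a skill folder for standalone distribution; returns the archive path."""
    skill_path = Path(skill_dir)
    archive_base = Path(out_dir) / skill_path.name
    # base_dir keeps the top-level folder name inside the zip, so extracting
    # into .claude/skills/ recreates my-skill/ intact
    return shutil.make_archive(str(archive_base), "zip",
                               root_dir=skill_path.parent,
                               base_dir=skill_path.name)
```

Recipients extract the zip into .claude/skills/ and the skill is live in their next session.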

Option 3: Claude Plugins (Enterprise)

For organizations, skills can be packaged as Claude plugins. This requires official distribution channels, but enables instant skill access across your workspace.

Building Skill Discovery Into Your Team Workflow

Okay, you've written a skill. How do teammates find it? How does Claude know to use it?

The answer: Claude reads the skill whenever it matches the description. This happens automatically, no configuration needed. But you need to make sure the matching actually works.

Test discovery by asking Claude to do the task without mentioning the skill. Ask Claude to "review this code" not "use the pr-quality-checker skill." Claude should load the skill based on description matching alone.

If Claude doesn't load your skill, debug by:

  1. Check the description trigger: Is it specific enough? Does it mention the exact scenario?
  2. Check keywords: Does the description include words Claude would search for? If it's about "pull requests," include "PR," "review," "code changes," "quality," "issues."
  3. Check scope: If your description matches everything (too broad), Claude might load it incorrectly or not at all.

A team workflow that works:

  1. Person A writes a skill for "testing TypeScript with Vitest"
  2. Person B asks Claude: "I need to write unit tests for this TypeScript function"
  3. Claude loads the skill because the description matches
  4. Person B gets guidance without asking for it explicitly
  5. Person B improves the skill based on feedback

This compounds. Every skill you write gets better as teammates use it and suggest improvements.

Naming Skills for Discoverability

Skill names should describe what the skill teaches, not what problem it solves.

Bad names:

  • fix-failing-tests (sounds like a one-off fix, not reusable)
  • testing (too generic)
  • vitest-setup-guide (too specific to tool, not teachable)

Good names:

  • unit-testing-typescript (specific language, clear purpose)
  • async-test-patterns (describes what it teaches)
  • test-isolation-strategies (describes the concept)

Use hyphens (not underscores) and lowercase. Keep it under 50 characters. The name should be searchable if someone knows they want this skill.
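Those naming rules (lowercase, hyphens, under 50 characters) are easy to encode as a quick check you can run alongside any frontmatter linting:

```python
import re

def is_valid_skill_name(name: str) -> bool:
    """Lowercase letters, digits, and hyphens; at most 50 characters."""
    return bool(re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name)) and len(name) <= 50
```

The pattern also rejects leading, trailing, or doubled hyphens, which read as typos in a skill list.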

Versioning Skills: When to Update

Skills evolve. Your team's testing practices improve. New vulnerabilities are discovered. Security best practices change.

When your skill needs updates:

Minor updates (clarifications, examples, reordering):

  • Update SKILL.md directly
  • No version bump needed
  • Push to git, teammates automatically get the updated version

Major updates (new patterns, removed recommendations, tool changes):

  • Add a "CHANGELOG" section at the bottom of SKILL.md
  • Document what changed and why
  • Version it: v1.0 -> v2.0 (breaking change) or v1.0 -> v1.1 (additive update)
  • Push to git with a commit message like "Update security-review skill: add OWASP top 10 checks"

Teams that invest in keeping skills current get compounding benefits. Outdated skills are worse than no skill. They teach wrong practices.

Advanced Pattern: Multi-File Skills with Progressive Disclosure

Here's where the power comes in. Large workflows can reference supporting files without bloating the main SKILL.md.

Structure for a comprehensive testing skill:

testing-framework/
├── SKILL.md                              (200 words)
├── references/
│   ├── unit-testing-patterns.md          (2000 words)
│   ├── integration-testing-guide.md      (2000 words)
│   └── mock-strategies.md                (1500 words)
└── scripts/
    └── test_runner.py                    (300 lines)

SKILL.md mentions the references but doesn't include all the details:

markdown
# Testing Framework
 
## Quick Overview
 
This skill teaches the testing patterns we use across projects.
 
## Key Resources
 
- **Unit Testing:** See references/unit-testing-patterns.md for patterns
- **Integration Testing:** See references/integration-testing-guide.md for setup
- **Mocking:** See references/mock-strategies.md for common scenarios
 
## Quick Checklist
 
- [ ] Unit tests for all functions
- [ ] Integration tests for API endpoints
- [ ] Mock external services
- [ ] Test error paths, not just happy path
- [ ] Target 80%+ coverage

When Claude needs details, Claude reads the references. When Claude just needs the overview, Claude stops after SKILL.md. You get the best of both worlds: context-efficient, but complete when needed.

Creating Skill Families: Organizing Related Skills

As you write more skills, they start forming families. You might have a testing family (unit-testing, integration-testing, e2e-testing), a security family (security-review, vulnerability-scanning, auth-patterns), and so on.

Organize related skills with a shared prefix:

testing-unit-typescript/
testing-integration-react/
testing-e2e-playwright/

security-review-checklist/
security-vulnerability-scanning/
security-auth-patterns/

docs-api-references/
docs-architecture-guides/
docs-decision-records/

This naming makes it easier to discover related skills. If Claude loads one testing skill, the naming convention helps Claude think of other testing skills.
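Because families share a prefix, grouping your skill directory into families is a few lines of Python, handy for generating a team index:

```python
from collections import defaultdict

def group_by_family(skill_names):
    """Group skill names by their first hyphen-separated segment."""
    families = defaultdict(list)
    for name in skill_names:
        families[name.split("-", 1)[0]].append(name)
    return dict(families)

skills = ["testing-unit-typescript", "testing-e2e-playwright",
          "security-review-checklist", "docs-api-references"]
```

Point it at the folder names under .claude/skills/ and you get a family map for free.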

When creating a skill family, consider one shared "hub" skill that references the others:

markdown
---
name: testing-guide
description: Use when setting up testing infrastructure for your project - guides on unit tests, integration tests, and e2e tests with examples for TypeScript, React, and Playwright
---
 
# Testing Guide
 
This skill family covers testing across multiple levels:
 
- **Unit Testing:** See testing-unit-typescript for isolated function tests
- **Integration Testing:** See testing-integration-react for component and API tests
- **E2E Testing:** See testing-e2e-playwright for full workflow automation
 
Choose the skill that matches your current task.

The hub skill acts as a navigation point. It guides teams toward the specific skill they need without duplicating content across the family.

How Skills Compound: The Long Game

Here's what happens when you build skills systematically:

Month 1: You write 2 skills (testing, code review). Takes 2 hours each.

Month 2: New team member asks the same questions. You point to the skills. Saves 2 hours of explanation.

Month 3: You write 3 more skills (security, API design, documentation). Continue 2-hour pattern.

Month 6: You have 10 skills covering most common tasks. New team members onboard 50% faster because they read skills instead of asking questions.

Year 1: You have 20 skills. When new best practices emerge, you update one skill (30 minutes) and everyone learns. When a vulnerability is discovered, you update the security skill and everyone knows how to prevent it.

Year 2: Your team's institutional knowledge is embedded in skills. Knowledge doesn't walk out the door when someone leaves. New team members inherit the entire playbook.

The ROI is massive. You spend hours writing skills, but save thousands of hours not repeating explanations.

Common Pitfalls

Pitfall 1: Description too vague
BAD: "General programming advice"
GOOD: "Use when writing TypeScript and need to avoid type errors. Covers common patterns and edge cases"

Pitfall 2: Skill does too much
BAD: One skill for "all code review" (100+ sections)
GOOD: Separate skills for "reviewing performance", "reviewing security", "reviewing style"

Pitfall 3: Assuming too much knowledge
The reader is Claude, not your teammates. Spell things out.
BAD: "Use our standard patterns"
GOOD: "Use when implementing API routes. Follow the factory pattern shown below"

Pitfall 4: Including generic advice
BAD: "Write clean code and follow best practices"
GOOD: "Extract functions longer than 50 lines. Each function should do one thing"

Pitfall 5: Forgetting to test
The best skill in the world doesn't help if Claude doesn't find it. Test your description. Ask Claude to do the task and see if it loads your skill.

Summary: Why Skills Matter

Without skills, you're teaching Claude the same lessons repeatedly. With skills, you're building a knowledge base that compounds over time.

Each skill you write:

  • Reduces typing later
  • Ensures consistency across projects
  • Makes processes reproducible
  • Becomes better with each iteration

The compound effect is significant. A team with 20 well-written skills works dramatically faster than a team that explains everything from scratch every time.

Start small. Write one skill for your most-repeated workflow. Test it. Share it. Then write the next one. Before long, you have a library that makes Claude vastly more effective at work you care about.

Need help implementing this?

We build automation systems like this for clients every day.

Discuss Your Project