November 17, 2025

Advanced Prompt Techniques: System Prompts, Few-Shot, and Chain-of-Thought

You've probably hit the wall with basic prompting. You ask Claude a straightforward question, get an okay response, then think: there's got to be a better way. There is. Once you understand how to layer different prompting techniques together, your results jump from "good enough" to "I can actually ship this."

This guide covers the advanced techniques that separate amateur prompts from ones that work in production: system prompts that establish contracts, few-shot examples that train without training, chain-of-thought reasoning that shows the work, and extended thinking for the truly hard problems. You'll also learn when to use each technique and how to combine them.

Table of Contents
  1. Understanding System Prompts: Your Contract with Claude
  2. Few-Shot Prompting: Teaching Through Examples
  3. Zero-Shot vs Few-Shot vs Chain-of-Thought: When to Use Each
  4. Chain-of-Thought: Making Claude Show Its Work
  5. Extended Thinking: When Claude Should Reason Deeply
  6. Prefilling Claude's Response: Guiding the Format
  7. Prompt Chaining: Breaking Complex Tasks into Sequences
  8. XML Tags: Structuring Complex Prompts
  9. The Decision Framework: Putting It All Together
  10. Common Pitfalls and How to Avoid Them
  11. Advanced Technique: Negative Examples and Contrasts
  12. Combining All Techniques: The Prompt Hierarchy
  13. Summary: Advanced Prompting Is About Layers

Understanding System Prompts: Your Contract with Claude

Let's start with the foundation. A system prompt isn't just instructions. It's a contract. You're establishing Claude's role, constraints, and behavior for the entire conversation. Everything that follows operates within this context.

Here's a basic system prompt:

You are a technical writer specializing in API documentation.
Your job is to make complex concepts accessible to junior developers.
Always use clear language, avoid jargon, and explain terms before using them.
If you don't know something, say "I don't have enough information about that."

This does three things: defines the role, establishes the audience level, and sets the boundary condition (what to do when uncertain). That third part matters. It's a safety valve that prevents hallucination.

Now, here's a more sophisticated system prompt using what I call the "contract format":

# System Contract: Code Review Agent

## Role
You are a senior code reviewer with 15+ years of experience.
Your reviews are thorough but constructive. You point out real problems,
not style preferences.

## Scope of Review
- Security vulnerabilities (HIGH priority)
- Performance bottlenecks (MEDIUM priority)
- Readability issues (LOW priority)
- Missing error handling (HIGH priority)

## Output Format
For each issue found:
1. Location (file:line)
2. Category (security/performance/readability/error-handling)
3. Current code snippet
4. Explanation of the problem
5. Suggested fix with brief rationale

## Constraints
- Only flag issues you're confident about
- Explain why each issue matters
- Suggest fixes that match the existing code style
- Don't review comments or documentation (unless they're misleading)

## When Uncertain
State your concern and ask a clarifying question rather than guessing.

This contract format does something powerful: it creates boundaries. Claude knows what's in scope, what's out, what priority looks like, and how to handle edge cases. You're not hoping Claude does the right thing; you're telling Claude what the right thing is.

The key insight here is specificity breeds compliance. The more detailed your system prompt about what you want and why, the better your results. Claude responds to explicit contracts better than vague instructions.
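
To make the contract concrete at the API level: the system prompt travels separately from the conversation itself. Here's a minimal sketch in Python, assuming the Anthropic Messages API request shape (the model name is a placeholder; swap in a real client call such as `anthropic.Anthropic().messages.create(**request)` to actually send it):

```python
# Sketch: the contract-style system prompt rides in `system`, the task in
# `messages`. Payload shape mirrors the Anthropic Messages API.
SYSTEM_CONTRACT = """\
# System Contract: Code Review Agent

## Role
You are a senior code reviewer with 15+ years of experience.

## When Uncertain
State your concern and ask a clarifying question rather than guessing."""

def build_review_request(code: str) -> dict:
    """Assemble a request dict; the contract applies to every turn."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        "system": SYSTEM_CONTRACT,     # separate from the user messages
        "messages": [
            {"role": "user", "content": f"Review this code:\n\n{code}"},
        ],
    }

request = build_review_request("def add(a, b): return a + b")
```

Because the contract lives in `system`, every follow-up message in the conversation operates inside it without repeating the instructions.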

Few-Shot Prompting: Teaching Through Examples

Few-shot prompting is deceptively simple: you show Claude a few examples of input and output, then it generalizes to new inputs. The "few" typically means 2-5 examples. More than that and you're just padding context; fewer and the pattern might not stick.

Here's a simple few-shot prompt:

Classify the following customer support tickets as "bug" or "feature request."

Example 1:
Ticket: "The login page won't load on iOS devices."
Category: bug

Example 2:
Ticket: "Can we add dark mode to the dashboard?"
Category: feature request

Example 3:
Ticket: "When I upload a CSV, it sometimes freezes for 10 seconds."
Category: bug

Now classify this new ticket:
Ticket: "Would love to export reports as PDF instead of just CSV."
Category:

See what happened? You didn't write a detailed rule like "bugs are technical problems and features are new functionality." You just showed Claude the pattern through examples. This works because Claude learns from patterns in context.

The hidden layer here is why few-shot works so well: Claude doesn't memorize your examples. Instead, it infers the underlying rule from the pattern. "Bug" examples are about things breaking; "feature request" examples are about new capabilities. Three or four examples are usually enough for Claude to lock in the pattern.

Here's a more sophisticated few-shot prompt for a practical task, extracting structured data:

Extract the main selling point from each product description.
Return only the selling point, nothing else.

Example 1:
Input: "Our coffee maker brews in 30 seconds using patented flash-heat technology.
No more waiting. Eco-friendly too. It uses 40% less water than competitors."
Selling Point: Brews in 30 seconds

Example 2:
Input: "Ergonomic design. Fits in your pocket. Comes in 12 colors.
Works with both iOS and Android. Battery lasts 48 hours."
Selling Point: Fits in your pocket with 48-hour battery

Example 3:
Input: "Premium noise-cancellation headphones with 40-hour battery life.
Active noise cancellation blocks up to 40dB. Comfortable for all-day wear."
Selling Point: 40-hour battery with active noise cancellation

Now extract from:
Input: "The Bolt Router uses mesh networking to cover 5,000 sq ft with zero dead zones.
Three devices included. Setup takes 2 minutes."
Selling Point:

Notice how the examples show Claude what level of detail you want. The selling points aren't the entire description; they're the core value prop. By showing this pattern, you guide Claude's extraction without explicit rules.

One gotcha: order matters in few-shot prompting. Lead with your most representative examples so Claude locks in the core pattern first, then use the remaining examples to cover tricky edge cases.
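
Assembling a few-shot prompt is mechanical enough to script. Here's a sketch (the helper name is hypothetical, not from any library) that builds the ticket-classification prompt from labeled pairs:

```python
# Sketch: build a few-shot classification prompt from labeled examples.
EXAMPLES = [
    ("The login page won't load on iOS devices.", "bug"),
    ("Can we add dark mode to the dashboard?", "feature request"),
    ("When I upload a CSV, it sometimes freezes for 10 seconds.", "bug"),
]

def build_few_shot_prompt(new_ticket: str) -> str:
    lines = ['Classify the following customer support tickets as "bug" or "feature request."', ""]
    for i, (ticket, label) in enumerate(EXAMPLES, start=1):
        lines += [f"Example {i}:", f'Ticket: "{ticket}"', f"Category: {label}", ""]
    # End on the unanswered case so the model's natural continuation is the label.
    lines += ["Now classify this new ticket:", f'Ticket: "{new_ticket}"', "Category:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt("Would love to export reports as PDF instead of just CSV.")
```

Keeping examples in a list also makes it easy to swap one out when you discover a better edge case.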

Zero-Shot vs Few-Shot vs Chain-of-Thought: When to Use Each

Let me cut through the confusion with a decision matrix:

Zero-Shot: You ask the question directly with no examples.

  • Best for: General knowledge questions, straightforward tasks
  • Worst for: Classification, specific output formats, nuanced judgments
  • Example: "What's the capital of France?"

Few-Shot: You provide examples of input/output pairs.

  • Best for: Classification, data extraction, style matching
  • Worst for: Novel creative tasks, complex reasoning
  • Example: "Here are 3 tickets marked 'bug'; classify this one."

Chain-of-Thought (CoT): You ask Claude to show its reasoning step-by-step.

  • Best for: Math, logic puzzles, complex decision-making
  • Worst for: Simple factual questions (wastes tokens)
  • Example: "Let's think step by step. First, what information do we need?"

Here's the decision flowchart in text:

  1. Is this a straightforward factual question? → Zero-shot (fastest)
  2. Does the task have a specific input/output pattern? → Few-shot (you'll get consistent format)
  3. Does this require showing reasoning or complex logic? → Chain-of-Thought (Claude explains steps)
  4. Does this involve math, coding, or analysis? → Consider Extended Thinking (Claude reasons deeply)

The practical strategy: Start zero-shot. If results are inconsistent, add few-shot examples. If results are wrong in ways that suggest confused reasoning, add chain-of-thought.
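
That strategy can be sketched as a simple lookup; the task-type labels here are illustrative, not a formal taxonomy:

```python
# Sketch: the decision flowchart as a lookup table.
TECHNIQUE_BY_TASK = {
    "factual": "zero-shot",
    "classification": "few-shot",
    "extraction": "few-shot",
    "reasoning": "chain-of-thought",
    "hard-analysis": "extended thinking",
}

def pick_technique(task_type: str) -> str:
    # Default to zero-shot and escalate only when the task demands it.
    return TECHNIQUE_BY_TASK.get(task_type, "zero-shot")
```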

Chain-of-Thought: Making Claude Show Its Work

Chain-of-thought prompting asks Claude to explain its reasoning before giving the answer. The magic phrase is: "Let's think step by step."

Here's a simple example:

Question: If a train leaves Station A at 2 PM traveling at 60 mph,
and another train leaves Station B at 3 PM traveling at 80 mph,
and the stations are 200 miles apart, when do the trains meet?

Think step by step before answering.

Claude's response would be something like:

Let me work through this:

Step 1: Define the problem
- Train A: starts at 2 PM, speed 60 mph from Station A
- Train B: starts at 3 PM, speed 80 mph from Station B
- Distance between stations: 200 miles

Step 2: Figure out when Train B starts
- Train A travels for 1 hour before Train B leaves
- In that hour, Train A covers 60 miles
- Remaining distance: 200 - 60 = 140 miles

Step 3: Calculate closing speed
- Both trains move toward each other
- Combined closing speed: 60 + 80 = 140 mph

Step 4: Calculate time to meet
- Time = 140 miles / 140 mph = 1 hour
- This is 1 hour after Train B starts (3 PM)
- Answer: 4 PM

Notice how this is objectively better than just the answer? You can see where Claude's logic is sound or where it might have gone wrong. For problems where accuracy matters, chain-of-thought often increases correctness because Claude catches its own errors in the reasoning.
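
The arithmetic in that reasoning chain is easy to check directly:

```python
# Sanity check of the train arithmetic from the worked example above.
head_start = 60 * 1           # Train A alone for 1 hour at 60 mph -> 60 miles
remaining = 200 - head_start  # 140 miles left when Train B departs at 3 PM
closing_speed = 60 + 80       # trains approach each other -> 140 mph
hours_to_meet = remaining / closing_speed  # 1 hour after 3 PM -> 4 PM
```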

But here's the hidden layer: chain-of-thought uses more tokens and takes longer. Use it when accuracy matters more than speed. For simple questions ("What's 5 + 3?"), it's wasteful.

Here's how to combine chain-of-thought with few-shot:

You're a decision-maker evaluating whether to approve a business proposal.
Evaluate based on: market size, competition, team experience, and financial viability.

Example 1:
Proposal: App for dog walking
Market size: Large (millions of pet owners)
Competition: Fierce (Rover, Wag, Bark)
Team: Two Stanford grads, 1 VC experience
Financials: $500K runway

Analysis: Market is large but saturated. Strong team might differentiate,
but competitive advantage unclear. Financial runway is thin for market share
fight. DECISION: Reject

Example 2:
Proposal: AI code review for legacy systems
Market size: Medium-large ($2B enterprise software)
Competition: Low (mostly internal tools)
Team: Former Google/Meta engineers
Financials: $2M runway, clear unit economics

Analysis: Addresses real pain point. Differentiated team. Low competition.
Strong financials. Clear path to MVP. DECISION: Approve

Now evaluate this proposal. Think through each factor before deciding.

Proposal: Marketplace for freelance veterinarians
Market size: ?
Competition: ?
Team: ?
Financials: ?

Analysis: [Work through each factor]
DECISION:

This combines few-shot (examples) with chain-of-thought (the "work through each factor" instruction). You're showing Claude the pattern AND asking it to show its work.

Extended Thinking: When Claude Should Reason Deeply

Extended thinking is a newer capability. It tells Claude: "Spend more computational effort thinking about this before answering." Claude reasons internally before responding, which helps with genuinely hard problems.

When to use extended thinking:

  • Math and logic problems: Multi-step algebra, proofs, complex reasoning
  • Code analysis: Debugging subtle bugs, understanding complex codebases
  • Research and synthesis: Pulling together disparate information into coherent conclusions
  • Strategic planning: Thinking through second-order consequences

Here's how you invoke it (conceptually, the exact API varies):

I need help debugging this race condition in my async code.
[paste complex code]

Please use extended thinking to work through this carefully.
Show me your reasoning, then explain what's wrong and how to fix it.

The practical impact: extended thinking takes longer and uses more tokens, but for genuinely hard problems, the answer quality jumps significantly. For a tricky algorithmic bug or a math proof, it's worth it. For "what rhymes with orange," it's overkill.

The hidden insight: extended thinking works because it lets Claude follow longer chains of reasoning without trying to finalize the answer prematurely. Most mistakes happen when Claude rushes to an answer. Extended thinking lets it think through the whole problem first.
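
At the API level, extended thinking is typically switched on with an explicit token budget. This sketch assumes the `thinking` parameter shape from the Anthropic Messages API at the time of writing; verify the exact field names against current docs before relying on them:

```python
# Sketch: request payload with extended thinking enabled. Treat the
# `thinking` field names as an assumption to verify against current docs.
def build_thinking_request(problem: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 16000,           # must exceed the thinking budget
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000,    # cap on internal reasoning tokens
        },
        "messages": [{"role": "user", "content": problem}],
    }

request = build_thinking_request("Debug this race condition in my async code: ...")
```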

Prefilling Claude's Response: Guiding the Format

Prefilling is a subtle technique that works surprisingly well. You start Claude's response for it, and Claude completes the thought.

Simple example:

Summarize this article in one sentence:
[article text]

Summary: The article argues that

Notice the "Summary: The article argues that" at the end? You've started the response. Claude will complete it, and this primes it to provide a summary rather than the full article or metadata.

Here's a more practical example:

Extract the key metrics from this financial report as JSON.

{
  "revenue":

By starting the JSON structure, you've told Claude exactly what format you want. It'll complete the JSON correctly because you've primed the format.

Prefilling is particularly useful when you want consistent structure. Here's a real-world example:

Evaluate this startup pitch and provide a structured analysis.

Startup: [name]
Market Opportunity: Large/Medium/Small
Market Size:
Competitive Landscape:
Team Strength:
Technology Risk:
Financial Health:
Overall Assessment: Invest/Pass
Reasoning:

You've prefilled the structure. Claude will fill in each section. This is faster and more consistent than asking Claude to create its own structure.
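
In the Messages API, prefilling means ending the conversation with a partial assistant turn; Claude's reply continues from exactly where that text stops. A minimal sketch (the model name is a placeholder; note that the API rejects prefills ending in trailing whitespace):

```python
# Sketch: prefilled assistant turn. Ending `messages` with a partial
# assistant message makes the model continue from that exact text, so the
# reply begins mid-JSON rather than with preamble.
def build_prefilled_request(report: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 512,
        "messages": [
            {"role": "user",
             "content": f"Extract the key metrics from this financial report as JSON.\n\n{report}"},
            # The prefill: Claude's completion is appended to this string.
            {"role": "assistant", "content": '{\n  "revenue":'},
        ],
    }

request = build_prefilled_request("Q3 revenue grew year over year...")
```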

Prompt Chaining: Breaking Complex Tasks into Sequences

Prompt chaining is the orchestration layer. You break a complex task into sequential prompts, each one feeding into the next.

Example: You want to take customer feedback, extract themes, prioritize them, and suggest features.

Chain Step 1: Extract themes

Analyze these 50 customer support tickets and extract recurring themes.
List each theme with a count of how many tickets mentioned it.

[tickets]

Claude returns:

Onboarding confusion: 12 tickets
Export functionality: 8 tickets
Mobile app crashes: 6 tickets
...

Chain Step 2: Analyze impact

Here are recurring support themes:
[themes from step 1]

For each theme, estimate the business impact if we fixed it.
Use: High/Medium/Low impact and explain why.

Claude returns impact analysis.

Chain Step 3: Suggest features

Here are support themes and their impact:
[themes + impact from step 2]

Suggest one feature for each High-impact theme that would reduce support tickets.

Claude returns feature suggestions.

Chain Step 4: Prioritize

Here are suggested features:
[features from step 3]

Rank these by implementation effort (low/medium/high) and expected ticket reduction.
Create a prioritized roadmap.

Claude returns a roadmap.

The power of chaining: each step is simple, focused, and builds on the previous step. Claude is better at focused tasks than megaproblems. You're also creating a chain of evidence. Each step's output shows the reasoning.

The gotcha: chaining uses more tokens because you're making multiple API calls and passing previous results forward. Use it when quality matters more than cost, or when the multi-step process helps clarity.
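
The whole chain fits in a few lines of orchestration. In this sketch, `call_model` is a stub standing in for a real API call, so the control flow is runnable on its own:

```python
# Sketch: the four-step chain with a validation checkpoint between steps.
def call_model(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"  # stub for a real API call

def validate(output: str) -> str:
    # Checkpoint: fail fast instead of feeding garbage into the next step.
    if not output.strip():
        raise ValueError("empty intermediate result")
    return output

def run_chain(tickets: str) -> str:
    themes = validate(call_model(f"Extract recurring themes from:\n{tickets}"))
    impact = validate(call_model(f"Estimate business impact for each theme:\n{themes}"))
    features = validate(call_model(f"Suggest a feature per High-impact theme:\n{impact}"))
    return call_model(f"Rank by effort and ticket reduction:\n{features}")

roadmap = run_chain("Ticket 1: login fails on iOS\nTicket 2: export button missing")
```

Each step's prompt embeds only the previous step's output, which keeps every call focused and makes failures traceable to a specific link in the chain.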

XML Tags: Structuring Complex Prompts

XML tags help you organize complex prompts without confusion. They're especially useful when you're combining multiple techniques.

Here's an example combining system prompt, context, examples, and instructions:

<system_prompt>
You are a security auditor reviewing code for vulnerabilities.
Your job is to find real security issues, not style problems.
When you find an issue, explain the vulnerability and suggest a fix.
</system_prompt>
 
<context>
This code is part of a web application handling user authentication.
Developers are junior to mid-level. Assume good faith, not malice.
</context>
 
<examples>
EXAMPLE 1 - Hardcoded credentials (SECURITY ISSUE):
```python
api_key = "sk-1234567890"
requests.get("https://api.example.com", headers={"auth": api_key})
```
Issue: API key hardcoded in code. Anyone with repo access can use it.
Fix: Use environment variables or secrets manager.
 
EXAMPLE 2 - SQL injection vulnerability (SECURITY ISSUE):
```python
query = f"SELECT * FROM users WHERE id = {user_input}"
db.execute(query)
```
Issue: Unescaped user input. Attacker can inject SQL.
Fix: Use parameterized queries: `db.execute("SELECT * FROM users WHERE id = ?", (user_input,))`
</examples>
 
<task>
Review the following code for security issues.
For each issue: state the problem, explain the risk, and suggest a fix.
Ignore style or lint issues.
 
[CODE TO REVIEW]
</task>

The XML tags here serve as organizational landmarks. Claude knows what's the system prompt, what's context, what's examples, and what's the actual task. This structure prevents confusion, especially in complex prompts.

The Decision Framework: Putting It All Together

Now, how do you decide which techniques to use?

For a classification task: Start with few-shot. Show 3-5 examples. If accuracy is still low, add a system prompt that explains the category definitions more clearly.

For a reasoning task: Start with chain-of-thought. If you need consistent formats, add few-shot examples of good reasoning. If the problem is really hard, add extended thinking.

For a data extraction task: Use few-shot with XML tags. Start Claude's response with the opening of the structure you want (prefilling). System prompt should define the exact format.

For a complex multi-step task: Use prompt chaining. Each step is a focused prompt. System prompt applies to the whole chain.

When quality is critical: Add extended thinking. Combine it with chain-of-thought so Claude shows its reasoning and thinks deeply.

The meta-framework:

  1. Understand your task type (classification, reasoning, extraction, generation, analysis)
  2. Start simple (zero-shot or few-shot depending on consistency needs)
  3. Add structure when needed (system prompt, XML tags, prefilling)
  4. Add reasoning when stuck (chain-of-thought, extended thinking)
  5. Orchestrate if complex (prompt chaining)

Common Pitfalls and How to Avoid Them

Pitfall 1: Over-detailed system prompts

Writing a 500-word system prompt doesn't make Claude smarter; it makes it confused. Keep system prompts focused on role, scope, and constraints. Details belong in the specific prompt, not the system prompt.

A good system prompt tells Claude what to do and why. A bad one drowns Claude in every possible detail, contradictory instructions, and edge cases. When you overload a system prompt, Claude gets confused about priorities. What matters most? What's negotiable? What's a hard boundary?

The fix is ruthless prioritization. If your system prompt is more than 200 words, trim it. Move nuanced instructions into the actual task prompt. Reserve the system prompt for things that truly apply to every request: role, core constraints, and escalation procedures.

Pitfall 2: Too many few-shot examples

Seven examples isn't better than three. More examples just dilute the pattern and waste tokens. Stick to 2-5 well-chosen examples.

Every example you add makes Claude slower and uses more tokens. More importantly, too many examples can confuse the pattern. Claude might notice surface-level similarities across all seven examples and lock onto the wrong pattern.

Here's the sweet spot: use 3 examples. If your task is simple (binary classification), use 2. If your task is genuinely complex with multiple edge cases, go to 5. Beyond that, you're not helping.

Pitfall 3: Chain-of-thought on trivial questions

"Let me think step by step: 2 + 2 = 4" is token waste. Use chain-of-thought for reasoning that actually benefits from showing work.

Chain-of-thought is a power tool. Using it indiscriminately is like using a blowtorch to toast bread. For questions where the answer is well-known and straightforward, chain-of-thought adds nothing.

The right use: when the problem requires multi-step reasoning, when Claude might take a wrong turn, or when you need to understand the logic (to audit it).

Pitfall 4: Prefilling without structure

Don't just start Claude's response randomly. Prefilling works best when you're starting a clear structure (JSON, list, etc.) that Claude will naturally complete.

The best prefilling goes like this: you start a structure that has an obvious continuation. JSON objects with opening braces. Lists with dashes. Outlines with headers. These have natural shapes that Claude completes.

Pitfall 5: Chaining with no feedback mechanism

If you chain 5 prompts together and the last one is wrong, you can't debug the chain easily. Build checkpoints where you validate intermediate results.

Prompt chaining is powerful but also dangerous. When you chain prompts, errors compound. If step 2's output is slightly off, step 3 works with corrupted input. By step 5, you might be far from the truth and not even know where it went wrong.

The fix: build validation checkpoints. After extraction, verify the output looks right before proceeding. This costs a few tokens but saves debugging nightmares later.
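
One cheap checkpoint when a step is supposed to return structured data: parse it before the next step runs. A sketch assuming JSON intermediates (the helper name is illustrative):

```python
import json

# Checkpoint sketch: parse a step's JSON output before the next step runs,
# so a malformed intermediate fails loudly instead of corrupting the chain.
def checkpoint_json(raw: str, required_keys: set) -> dict:
    data = json.loads(raw)  # raises on malformed output -> stop the chain here
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"step output missing keys: {sorted(missing)}")
    return data

themes = checkpoint_json('{"themes": ["onboarding", "export"]}', {"themes"})
```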

Advanced Technique: Negative Examples and Contrasts

You've seen positive examples (what Claude should do). Negative examples are equally powerful.

Show Claude what NOT to do:

DO NOT confuse these:
- Feature request: User wants new functionality that doesn't exist
- Bug report: Existing feature is broken or behaving unexpectedly

Bad example of confusing them:
Ticket: "The search is slow."
WRONG: Feature request (it's not, search exists, it's just slow)
CORRECT: Bug report (search is broken/slow)

Bad example of confusing them:
Ticket: "Can you add voice commands?"
WRONG: Bug report (it's not broken, it's not implemented)
CORRECT: Feature request (new capability)

This takes a few extra words, but the contrast sticks with Claude. Negative examples work because they show exactly where the boundary is.

Why does this work? Claude learns patterns from what you show it. By showing both positive and negative examples (and explicitly labeling the contrast), you're teaching Claude the decision boundary, not just one side of it.

Combining All Techniques: The Prompt Hierarchy

There's an art to layering these techniques without creating a confused mess. Think of it as a hierarchy:

  1. System Prompt (applies to everything)

    • Role, constraints, escalation
    • Stays the same across the conversation
  2. Task Prompt (specific to this request)

    • XML tags if complex
    • Few-shot examples (2-5)
    • Negative examples if boundaries are unclear
    • Chain-of-thought instruction if reasoning needed
  3. Prefilling (guides format)

    • Start the structure you want
    • Must be natural, not forced
  4. Extended Thinking (for hard problems)

    • Optional, only when accuracy > speed
    • Can combine with chain-of-thought
  5. Chaining (for multi-step tasks)

    • Each step follows the hierarchy above
    • Add validation between steps

This hierarchy prevents conflicts. Your system prompt doesn't override task-specific instructions. Your prefilling doesn't fight your few-shot examples. Each layer has a clear role.

Here's the key insight: Don't use all of these for every prompt. Start with the first two layers (system prompt + task prompt). Add layers only when results are wrong or inconsistent.

Summary: Advanced Prompting Is About Layers

You started this article probably thinking one technique was the answer. The truth is more nuanced. Advanced prompting is about layering techniques:

  • System prompts establish the contract and prevent scope creep
  • Few-shot examples train Claude on patterns without explicit rules
  • Chain-of-thought makes reasoning visible and often more accurate
  • Extended thinking lets Claude reason deeply on hard problems
  • Prefilling guides format and structure
  • Prompt chaining breaks complex problems into focused steps
  • XML tags organize complex prompts clearly

The decision framework is simple: match the technique to the problem. Classification? Few-shot. Reasoning? Chain-of-thought. Hard problem? Extended thinking. Multi-step task? Prompt chaining. All of these together? Powerful.

Start with the simplest technique that works, then layer in complexity only when needed. You'll find that even small adjustments (adding one few-shot example, one prefilled line, one layer of XML structure) often double your results.

Now go build something better.
