May 17, 2026 · Claude · 6 min read

Claude's extended thinking mode solves bugs I had given up on

I had a race condition that only appeared under load, with a stack trace that pointed to three different possible causes. I had spent two days on it. I tried Claude without extended thinking and got a surface-level answer. I tried again with thinking enabled, gave it a budget of 8,000 tokens to reason, and got a precise diagnosis with a working fix. The difference was not the model. It was giving the model space to actually reason.

What extended thinking actually does

Extended thinking gives Claude a "scratchpad" to reason through a problem before giving you an answer. In an API response, the reasoning arrives as separate thinking content blocks ahead of the final text block, so you can log it or ignore it (some models return a summarized version of the reasoning rather than the raw tokens). The effect is similar to asking someone to think out loud before answering: they catch errors in their own reasoning before committing to a response.

For debugging, this means Claude can weigh multiple hypotheses, discard the ones that contradict the evidence, and commit to a diagnosis only after that elimination.

Enabling extended thinking

python

import anthropic

client = anthropic.Anthropic()

def debug_with_thinking(
    code: str,
    error: str,
    context: str = "",
    thinking_budget: int = 8000,
) -> tuple[str, str]:
    """
    Debug a bug with extended thinking.
    Returns (thinking_content, answer).
    """
    response = client.messages.create(
        model="claude-opus-4-5",  # Not Opus-only: any model with extended thinking support works
        max_tokens=thinking_budget + 8000,  # Must exceed thinking_budget; the rest is room for the answer
        thinking={
            "type": "enabled",
            "budget_tokens": thinking_budget,
        },
        system="""You are a senior engineer debugging a complex issue.
Think through multiple possible causes before giving your diagnosis.
Consider: race conditions, state mutations, async issues, type coercions,
and edge cases in the specific code paths involved.""",
        messages=[{
            "role": "user",
            "content": f"""Debug this issue:

{f"Context: {context}" if context else ""}

Code:
{code}

Error / Unexpected behavior:
{error}

Please:
1. List all possible causes
2. Eliminate each based on the evidence  
3. Give your most likely diagnosis
4. Provide a fix with explanation"""
        }]
    )
    
    thinking_content = ""
    answer = ""
    
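    # A response can contain more than one thinking or text block, so append
    # instead of overwriting
    for block in response.content:
        if block.type == "thinking":
            thinking_content += block.thinking
        elif block.type == "text":
            answer += block.text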
    
    return thinking_content, answer
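
One detail in the call above is easy to trip over: max_tokens has to leave room for both the reasoning and the final answer, so it must be larger than the thinking budget. The helper below is a minimal sketch of that bookkeeping. The function name, the answer_tokens parameter, and its default are my own assumptions; the 1,024-token floor is the minimum budget as I understand the API documentation, so check it against the docs for the model you use.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum budget; verify against current API docs


def thinking_request_params(thinking_budget: int, answer_tokens: int = 4000) -> dict:
    """Build the max_tokens and thinking arguments for a messages.create call.

    Hypothetical helper: it only does arithmetic and validation, no API calls.
    """
    if thinking_budget < MIN_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be at least {MIN_THINKING_BUDGET}, got {thinking_budget}"
        )
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")

    return {
        # The reasoning and the answer share the same output limit
        "max_tokens": thinking_budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
    }
```

The result can be splatted into the request, for example client.messages.create(model=..., messages=..., **thinking_request_params(10000)), which keeps the two numbers from drifting apart when you raise the budget.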

A real debugging session

python

code = """
async def process_payments(payment_ids: list[str]) -> dict:
    results = {}
    
    async def process_one(payment_id: str):
        payment = await db.get_payment(payment_id)
        if payment.status == 'pending':
            result = await stripe.charge(payment.amount, payment.card_token)
            payment.status = 'completed'
            payment.stripe_id = result.id
            await db.save_payment(payment)
            results[payment_id] = 'success'
        else:
            results[payment_id] = 'skipped'
    
    await asyncio.gather(*[process_one(pid) for pid in payment_ids])
    return results
"""

error = """
Under load, some payments are being charged twice on Stripe even though
the DB record shows status='completed'. It happens maybe 1 in 500 times,
only when the same payment_id appears in two concurrent batch jobs.
"""

thinking, answer = debug_with_thinking(code, error, thinking_budget=10000)
print("=== CLAUDE'S REASONING ===")
print(thinking[:2000])  # First 2000 chars of thinking
print("\n=== DIAGNOSIS AND FIX ===")
print(answer)

Claude's diagnosis (abbreviated): "The race condition occurs when two concurrent calls to process_payments both read payment.status as 'pending' before either has written 'completed'. The fix requires an atomic check-and-set — use database-level optimistic locking or a SELECT FOR UPDATE to prevent the concurrent read."

python

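async def process_one(payment_id: str):
    # Atomic check-and-update: only one caller can flip 'pending' to 'processing'
    updated = await db.execute(
        "UPDATE payments SET status = 'processing' "
        "WHERE id = $1 AND status = 'pending' "
        "RETURNING id",
        payment_id
    )

    if not updated:
        # Another job already claimed (or finished) this payment
        results[payment_id] = 'skipped'
        return

    # Load the row we just claimed; the charge needs its amount and card token
    payment = await db.get_payment(payment_id)

    try:
        result = await stripe.charge(payment.amount, payment.card_token)
    except Exception:
        # Release the claim only when the charge itself failed
        await db.execute(
            "UPDATE payments SET status = 'pending' WHERE id = $1",
            payment_id
        )
        raise

    # Kept outside the try block: if this write fails, the row must not go back
    # to 'pending', or the next batch would charge the card a second time
    await db.execute(
        "UPDATE payments SET status = 'completed', stripe_id = $2 WHERE id = $1",
        payment_id, result.id
    )
    results[payment_id] = 'success'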
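
To see why claiming the row before charging closes the race, here is a minimal in-memory simulation. Everything in it is hypothetical and made up for this sketch (FakeDB, charge, naive, guarded, count_charges); nothing talks to a real database or to Stripe. Each asyncio.sleep(0) stands in for an I/O await, which is the point where the other batch job gets to run.

```python
import asyncio


class FakeDB:
    """In-memory stand-in for the payments table (hypothetical)."""

    def __init__(self) -> None:
        self.status = {"pay_1": "pending"}

    async def get_status(self, payment_id: str) -> str:
        await asyncio.sleep(0)  # simulated I/O: lets the other task run
        return self.status[payment_id]

    async def set_status(self, payment_id: str, value: str) -> None:
        await asyncio.sleep(0)
        self.status[payment_id] = value

    async def claim(self, payment_id: str) -> bool:
        """Mimic UPDATE ... WHERE status = 'pending': check and write in one step."""
        await asyncio.sleep(0)
        # No await between the check and the write, so no other task can
        # interleave here. A real database gets this from the single UPDATE.
        if self.status[payment_id] == "pending":
            self.status[payment_id] = "processing"
            return True
        return False


async def charge(charges: list) -> None:
    await asyncio.sleep(0)  # simulated network call to the payment provider
    charges.append(1)


async def naive(db: FakeDB, payment_id: str, charges: list) -> None:
    # Check-then-act: both tasks can read 'pending' before either one writes
    if await db.get_status(payment_id) == "pending":
        await charge(charges)
        await db.set_status(payment_id, "completed")


async def guarded(db: FakeDB, payment_id: str, charges: list) -> None:
    # Only the task that wins the claim goes on to charge
    if await db.claim(payment_id):
        await charge(charges)
        await db.set_status(payment_id, "completed")


async def _run_two_jobs(worker) -> int:
    db, charges = FakeDB(), []
    # Two concurrent "batch jobs" that both contain the same payment id
    await asyncio.gather(
        worker(db, "pay_1", charges),
        worker(db, "pay_1", charges),
    )
    return len(charges)


def count_charges(worker) -> int:
    """Return how many times the card was charged for one payment."""
    return asyncio.run(_run_two_jobs(worker))
```

Calling count_charges(naive) returns 2, the double charge from the bug report, while count_charges(guarded) returns 1. The simulation is deterministic, unlike the 1-in-500 production failure, because the interleaving is forced at every await.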

When to use extended thinking vs standard mode

Use extended thinking for:

Bugs that only appear intermittently or under specific conditions
Race conditions and concurrent access issues
Performance problems with multiple interacting causes
Bugs in complex state machines or event flows
When you have tried the obvious fixes and they did not work

Standard mode is fine for:

Syntax errors and obvious typos
Missing null checks
Simple logic errors with clear stack traces
API usage questions

Extended thinking costs more: thinking tokens are billed as output tokens, and the extra reasoning adds latency. Reserve it for bugs that deserve the investment, which in practice means bugs you have already spent more than 30 minutes on.
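
To put a number on that, a rough upper bound is simple arithmetic: add the thinking budget to the expected answer length and multiply by the output-token price. The sketch below assumes thinking tokens are priced like any other output token. The function name is my own, and the price is a parameter you must fill in from the current pricing page; the figure in the usage note is a placeholder, not a real rate.

```python
def max_thinking_cost_usd(
    thinking_budget: int,
    answer_tokens: int,
    usd_per_million_output: float,
) -> float:
    """Upper-bound output cost of one request, assuming the full budget is used.

    Hypothetical helper; input-token cost is not included.
    """
    total_output_tokens = thinking_budget + answer_tokens
    return total_output_tokens / 1_000_000 * usd_per_million_output
```

With a placeholder price of 25.0 dollars per million output tokens, max_thinking_cost_usd(10_000, 2_000, 25.0) comes to about 0.30 dollars per attempt. The model may stop reasoning well before the budget runs out, so the actual bill is often lower.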