Claude's extended thinking mode solves bugs I had given up on
I had a race condition that only appeared under load, with a stack trace that pointed to three different possible causes. I had spent two days on it. I tried Claude without extended thinking — got a surface-level answer. I tried again with thinking enabled, gave it a budget of 8,000 tokens to reason, and got a precise diagnosis with a working fix. The difference was not the model — it was giving the model space to actually reason.
What extended thinking actually does
Extended thinking gives Claude a "scratchpad" to reason through a problem before giving you an answer. The thinking tokens are not shown in the output by default (you can request them). The effect is similar to asking someone to think out loud before answering — they catch errors in their own reasoning before committing to a response.
For debugging, this means Claude considers multiple hypotheses, eliminates inconsistent ones, and arrives at a more confident diagnosis.
Enabling extended thinking
import anthropic
client = anthropic.Anthropic()
def debug_with_thinking(
code: str,
error: str,
context: str = "",
thinking_budget: int = 8000,
) -> tuple[str, str]:
"""
Debug a bug with extended thinking.
Returns (thinking_content, answer).
"""
response = client.messages.create(
model="claude-opus-4-5", # Extended thinking requires Opus
max_tokens=16000, # Must be > thinking_budget
thinking={
"type": "enabled",
"budget_tokens": thinking_budget,
},
system="""You are a senior engineer debugging a complex issue.
Think through multiple possible causes before giving your diagnosis.
Consider: race conditions, state mutations, async issues, type coercions,
and edge cases in the specific code paths involved.""",
messages=[{
"role": "user",
"content": f"""Debug this issue:
{f"Context: {context}" if context else ""}
Code:
{code}
Error / Unexpected behavior:
{error}
Please:
1. List all possible causes
2. Eliminate each based on the evidence
3. Give your most likely diagnosis
4. Provide a fix with explanation"""
}]
)
thinking_content = ""
answer = ""
for block in response.content:
if block.type == "thinking":
thinking_content = block.thinking
elif block.type == "text":
answer = block.text
return thinking_content, answer
A real debugging session
code = """
async def process_payments(payment_ids: list[str]) -> dict:
results = {}
async def process_one(payment_id: str):
payment = await db.get_payment(payment_id)
if payment.status == 'pending':
result = await stripe.charge(payment.amount, payment.card_token)
payment.status = 'completed'
payment.stripe_id = result.id
await db.save_payment(payment)
results[payment_id] = 'success'
else:
results[payment_id] = 'skipped'
await asyncio.gather(*[process_one(pid) for pid in payment_ids])
return results
"""
error = """
Under load, some payments are being charged twice on Stripe even though
the DB record shows status='completed'. It happens maybe 1 in 500 times,
only when the same payment_id appears in two concurrent batch jobs.
"""
thinking, answer = debug_with_thinking(code, error, thinking_budget=10000)
print("=== CLAUDE'S REASONING ===")
print(thinking[:2000]) # First 2000 chars of thinking
print("
=== DIAGNOSIS AND FIX ===")
print(answer)
Claude's diagnosis (abbreviated): "The race condition occurs when two concurrent calls to process_payments both read payment.status as 'pending' before either has written 'completed'. The fix requires an atomic check-and-set — use database-level optimistic locking or a SELECT FOR UPDATE to prevent the concurrent read."
async def process_one(payment_id: str):
# Atomic check-and-update: only proceeds if status was 'pending'
updated = await db.execute(
"UPDATE payments SET status = 'processing' "
"WHERE id = $1 AND status = 'pending' "
"RETURNING id",
payment_id
)
if not updated:
results[payment_id] = 'skipped'
return
try:
result = await stripe.charge(payment.amount, payment.card_token)
await db.execute(
"UPDATE payments SET status = 'completed', stripe_id = $2 WHERE id = $1",
payment_id, result.id
)
results[payment_id] = 'success'
except Exception as e:
# Rollback to pending if charge fails
await db.execute(
"UPDATE payments SET status = 'pending' WHERE id = $1",
payment_id
)
raise
When to use extended thinking vs standard mode
Use extended thinking for:
- Bugs that only appear intermittently or under specific conditions
- Race conditions and concurrent access issues
- Performance problems with multiple interacting causes
- Bugs in complex state machines or event flows
- When you have tried the obvious fixes and they did not work
Standard mode is fine for:
- Syntax errors and obvious typos
- Missing null checks
- Simple logic errors with clear stack traces
- API usage questions
Extended thinking costs more (those thinking tokens are billed). Reserve it for bugs that actually deserve the investment — which means bugs you have already spent more than 30 minutes on.