The Claude extended thinking mode that changes how I debug hard problems
I was three hours into debugging an async race condition in our Node.js order processing service. The bug was intermittent — happening maybe once every few hundred requests — and every reproduction attempt felt like fishing in fog. I'd already asked Claude for help twice, got reasonable answers, tried them, and the bug persisted. Then, almost by accident, I enabled a setting I'd been ignoring: extended thinking. What came back wasn't just an answer. It was 2,000 tokens of Claude reasoning through the problem like a senior engineer talking through their mental model — and halfway through reading it, I spotted the bug myself.
What Most Engineers Do With Claude
The typical Claude debugging workflow goes like this: paste the broken code, describe the symptom, hit send, and read the response. For most problems, this works great. Claude is fast, the answer is usually in the first paragraph, and you move on.
But for genuinely hard bugs — race conditions, complex state interactions, subtle async ordering issues, performance problems that only appear under load — this pattern hits a ceiling. Claude gives you an answer, but it's often treating symptoms rather than causes. You try the suggestion, the bug is still there. You try again. Same result.
The problem isn't Claude's knowledge. It's that for complex problems, the reasoning process is the thing. A senior engineer doesn't just output an answer — they think out loud, challenge their own assumptions, consider edge cases, explore multiple hypotheses. Standard Claude responses skip all of that. Extended thinking doesn't.
What I Found: Extended Thinking Mode
Extended thinking is a mode where Claude pauses before answering and works through the problem in an explicit reasoning chain that's returned to you alongside the answer. It's been in the Anthropic API for a while, but it's easy to miss because it requires an explicit opt-in on every request, and many developers never flip it on.
Here's how you enable it:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000, // how much Claude can "think" before answering
  },
  messages: [
    {
      role: "user",
      content: `Here's a Node.js async race condition I've been debugging for 3 hours.
The bug is intermittent (~1 in 300 requests), happens only under concurrent load,
and my existing mutex didn't fix it.
${yourCode}
The symptom: orders occasionally get double-charged. Here's the relevant log sequence...`,
    },
  ],
});

// Response contains two blocks: thinking + text
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Claude's reasoning:\n", block.thinking);
  } else if (block.type === "text") {
    console.log("Claude's answer:\n", block.text);
  }
}
The thinking block is the part most people don't know about. It's Claude's working reasoning — the chain of analysis it produced before assembling the final answer, rather than the polished response itself. This is where the real value lives.
Why It's Different — Really Different
To understand why this matters, here's a condensed version of what the thinking block looked like for my race condition:
"The user describes a double-charge bug happening ~1 in 300 requests under
concurrent load. Their mutex didn't fix it. Let me think about why a mutex
might fail here...
If the mutex is instantiated per-request rather than shared across requests,
it's not providing any mutual exclusion at all — each request has its own
lock and they never contend. This is a common mistake.
But wait, looking at the code, the mutex IS module-level. So why would it fail?
Let me look at the async flow more carefully...
The issue might be in how they're awaiting. The charge function does:
await stripe.charge() then await db.markCharged()
If two requests both pass the 'already charged?' check before either one
completes the stripe.charge(), they'll both proceed. The check isn't inside
the mutex — only the DB write is. That's the gap.
Let me verify: yes, the 'isAlreadyCharged' check on line 23 is OUTSIDE the
mutex acquire. The mutex only covers lines 28-35 (the DB write). So two
concurrent requests can both read isAlreadyCharged=false, both call
stripe.charge(), and only then fight over the mutex for the DB write.
The fix isn't a bigger mutex. They need to move the read-check-write into a
single atomic DB transaction, or move the entire operation inside the lock..."
Reading this, I spotted my mistake by the second paragraph. The reasoning chain had reconstructed my bug from first principles and diagnosed the actual root cause — not the symptom I'd been chasing.
Compare that to the standard Claude response, which had correctly suggested "use a mutex" — technically reasonable, but a misdiagnosis, because it couldn't see that I'd already placed the mutex in the wrong spot.
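The fix the reasoning chain pointed at is easy to sketch. Here's a minimal, self-contained TypeScript model of it — `withLock`, `chargeOnce`, and the in-memory "database" are hypothetical stand-ins, not the real service — where the already-charged check, the charge, and the write all happen inside one critical section, so two concurrent requests can no longer interleave between check and write:

```typescript
// Minimal promise-chain mutex: each caller queues behind the previous holder.
let tail: Promise<void> = Promise.resolve();
function withLock<T>(fn: () => Promise<T>): Promise<T> {
  const result = tail.then(fn);
  // Keep the chain alive even if fn rejects.
  tail = result.then(() => undefined, () => undefined);
  return result;
}

// Hypothetical in-memory stand-ins for the DB and the payment provider.
const charged = new Set<string>();
let chargeCalls = 0;
async function stripeCharge(orderId: string) {
  chargeCalls++;
  await new Promise((r) => setTimeout(r, 10)); // simulate network latency
}

// The fix: check, charge, and mark-charged form ONE critical section.
async function chargeOnce(orderId: string) {
  return withLock(async () => {
    if (charged.has(orderId)) return "already-charged";
    await stripeCharge(orderId);
    charged.add(orderId);
    return "charged";
  });
}

// Two concurrent requests for the same order: only one charge goes through.
async function demoFixed() {
  await Promise.all([chargeOnce("order-42"), chargeOnce("order-42")]);
  return chargeCalls; // 1, not 2
}
```

In a real multi-process service you'd reach for a database transaction with a unique constraint or `SELECT ... FOR UPDATE` rather than an in-process lock, but the shape is the same: the read and the write have to be atomic together.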
STANDARD CLAUDE
  Read prompt
  → Generate answer
  Good for: 80% of questions. Fast, accurate, moves you forward.

EXTENDED THINKING
  Read prompt
  → Think: What do I know?
  → Think: What assumptions am I making?
  → Think: What might the user be missing?
  → Think: What edge cases matter here?
  → Think: Let me trace the async flow...
  → Think: Wait — the check is outside the lock
  → Generate answer (with reasoning as context)
  Good for: the other 20%. Slower, but catches what standard misses.
Real Use Cases Where This Changes the Game
1. Race conditions and async bugs
These are the hardest category of bugs because the failure mode depends on timing, not just logic. Extended thinking lets Claude trace through the execution order, consider interleaving scenarios, and think through what the code does under concurrent load — not just what it does in isolation. The thinking block often surfaces assumptions you didn't know you were making.
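Once you know its shape, this failure mode is even reproducible deterministically. A minimal sketch of the check-then-act race — all names here are illustrative, not from the real service — where both requests read the flag before either one writes it:

```typescript
// A check-then-act sequence with an await in the gap — the classic async race.
let alreadyCharged = false;
let chargeCalls = 0;

async function buggyCharge() {
  if (alreadyCharged) return;                 // check...
  await new Promise((r) => setTimeout(r, 5)); // ...await in the gap (e.g. the charge call)
  chargeCalls++;
  alreadyCharged = true;                      // ...act
}

// Two concurrent requests: both pass the check before either sets the flag.
async function demoRace() {
  await Promise.all([buggyCharge(), buggyCharge()]);
  return chargeCalls; // 2 — the double charge
}
```

Both calls run their synchronous prefix up to the `await` before either resumes, so the check is useless under concurrency — exactly the interleaving a thinking block walks through.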
2. Architecture decisions with non-obvious tradeoffs
I've started using extended thinking for architecture reviews, not just debugging. When I paste a proposed design and ask "what am I missing?", the thinking block surfaces second and third-order consequences that a standard response glosses over. Things like: "if service A starts calling service B synchronously, and B scales independently, what happens during a B deployment?" It reasons through operational concerns, not just structural ones.
// Architecture review — extended thinking finds what you're not asking about
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 8000,
  },
  messages: [
    {
      role: "user",
      content: `I'm designing a notification system for ~50k daily active users.
Here's my proposed architecture: ${architectureDiagram}
What tradeoffs am I accepting, and what should I validate before building?`,
    },
  ],
});
3. Performance problems that appear only under load
When you show Claude a query plan or a flame graph and ask "why is this slow?", a standard response often gives you the textbook answer. Extended thinking reasons through the data: "at their described load of 500 RPS, and assuming row count of X, this index scan will... wait, they said the query is fast in dev. Dev probably has 1000 rows, production has 50 million. The index isn't being used because the planner is choosing a full scan — that's a statistics staleness problem, not an index design problem." That kind of conditional reasoning is what makes extended thinking genuinely useful for non-obvious performance bugs.
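The dev-versus-production divergence in that example is just arithmetic, and you can sketch the planner's choice directly. Here's a toy cost model — the constants are invented for illustration, and real planners like Postgres use far richer statistics — showing how stale row estimates make the planner keep a plan that's catastrophic at production scale:

```typescript
// Toy cost model: the planner picks the cheaper of index scan vs sequential
// scan using its *estimated* row count — which may be stale.
function choosePlan(estimatedRows: number, matchingFraction: number): string {
  const seqScanCost = estimatedRows;                                  // read every row once
  const indexScanCost = 2_000 + estimatedRows * matchingFraction * 4; // setup + per-match lookups
  return indexScanCost < seqScanCost ? "index-scan" : "seq-scan";
}

// Dev table (1k rows): seq scan is genuinely cheapest, and it's instant anyway.
const devPlan = choosePlan(1_000, 0.001); // "seq-scan"

// Production table has 50M rows, but STALE statistics still claim ~1k:
const stalePlan = choosePlan(1_000, 0.001); // still "seq-scan" — now over 50M rows

// After refreshing statistics (e.g. ANALYZE), the estimate is honest:
const freshPlan = choosePlan(50_000_000, 0.001); // "index-scan"
```

The bug isn't the index design; it's that the planner is costing against the wrong table size — the "statistics staleness" diagnosis from the thinking block.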
4. Code review for subtle correctness issues
I paste PRs into Claude for a second look before merging. With extended thinking enabled, the review catches things like: "this function handles the error case, but if the caller retries on error, and this function has a side effect before the failure point, the retry will cause a double side-effect." That's the kind of reasoning that takes a careful human reviewer 30 minutes to surface. Extended thinking surfaces it in seconds.
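That retry-plus-side-effect hazard is worth seeing concretely. A minimal sketch with hypothetical names — the email counter stands in for any non-idempotent side effect — where the effect fires before the failure point, so a reasonable-looking retry doubles it:

```typescript
let emailsSent = 0;

// The pattern the review flagged: side effect BEFORE the failure point.
async function provisionUser(state: { failedOnce: boolean }) {
  emailsSent++; // side effect (e.g. send welcome email)
  if (!state.failedOnce) {
    state.failedOnce = true;
    throw new Error("transient DB error"); // failure AFTER the side effect
  }
  return "provisioned";
}

// A caller that retries once on error — sensible in isolation.
async function withRetry<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch {
    return await fn(); // retry
  }
}

async function demoDoubleSideEffect() {
  const state = { failedOnce: false };
  await withRetry(() => provisionUser(state));
  return emailsSent; // 2 — one user, two welcome emails
}
```

The usual fixes are to move the side effect after the commit point or to guard it with an idempotency key, but the point here is the reasoning: neither function is wrong alone, only their composition is.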
The Tradeoffs to Know
Extended thinking isn't free. It's slower — you're waiting for Claude to complete a reasoning chain before the answer starts streaming. And budget_tokens counts against your token usage, so a 10,000-token thinking budget on every request adds up quickly.
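The cost impact is easy to estimate on the back of an envelope. Thinking tokens are billed as output tokens, so a fully-used 10k budget adds 10k output tokens per request. A small helper — the per-token prices below are placeholders, not current Anthropic pricing, so plug in the real rates for your model:

```typescript
// Rough cost estimate: thinking tokens bill as output tokens.
// Prices are PLACEHOLDERS — check the current price list for your model.
function estimateCostUSD(opts: {
  requests: number;
  inputTokens: number;        // per request
  answerTokens: number;       // per request, the visible answer
  thinkingTokens: number;     // per request; 0 when thinking is off
  inputPricePerMTok: number;  // dollars per million input tokens
  outputPricePerMTok: number; // dollars per million output tokens
}): number {
  const input = opts.requests * opts.inputTokens * (opts.inputPricePerMTok / 1_000_000);
  const output =
    opts.requests * (opts.answerTokens + opts.thinkingTokens) * (opts.outputPricePerMTok / 1_000_000);
  return input + output;
}

// 100 debugging sessions/day, fully-used 10k thinking budget vs none:
const withThinking = estimateCostUSD({
  requests: 100, inputTokens: 2_000, answerTokens: 1_000,
  thinkingTokens: 10_000, inputPricePerMTok: 3, outputPricePerMTok: 15,
});
const withoutThinking = estimateCostUSD({
  requests: 100, inputTokens: 2_000, answerTokens: 1_000,
  thinkingTokens: 0, inputPricePerMTok: 3, outputPricePerMTok: 15,
});
```

At these placeholder rates the thinking budget dominates the bill, which is exactly why a default-off wrapper with an explicit flag (below) pays for itself.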
My current practice: I use a simple wrapper that defaults to standard mode, with a --think flag for hard problems:
async function askClaude(
  prompt: string,
  options: { think?: boolean; budgetTokens?: number } = {}
) {
  const { think = false, budgetTokens = 8000 } = options;

  const params: Anthropic.MessageCreateParams = {
    model: "claude-sonnet-4-5",
    max_tokens: think ? 16000 : 4096,
    messages: [{ role: "user", content: prompt }],
  };

  if (think) {
    params.thinking = {
      type: "enabled",
      budget_tokens: budgetTokens,
    };
  }

  const response = await client.messages.create(params);

  // Only thinking requests produce a thinking block; find() is undefined otherwise.
  const thinkingBlock = response.content.find((b) => b.type === "thinking");
  const textBlock = response.content.find((b) => b.type === "text");

  return {
    reasoning: thinkingBlock?.type === "thinking" ? thinkingBlock.thinking : null,
    answer: textBlock?.type === "text" ? textBlock.text : "",
  };
}

// Fast path — standard answers for standard questions
const { answer } = await askClaude("Explain LATERAL joins in Postgres");

// Deep path — reasoning chain for hard problems
const { reasoning, answer: fix } = await askClaude(
  `Intermittent race condition, 3 hours in, mutex didn't fix it...`,
  { think: true, budgetTokens: 10000 }
);
Try It on Your Next Hard Bug
The pattern I've settled on: whenever I'm about to open a second browser tab, pull in a colleague for a rubber-duck session, or start adding console.log statements to narrow down a problem I've already spent 30+ minutes on — that's when I reach for extended thinking instead.
It won't replace the rubber-duck. But it will often get you to the right question faster, and for genuinely subtle bugs, reading Claude's reasoning chain is like having a senior engineer explain their mental model in real time.
The budget_tokens parameter is the knob you'll tune most. I've found 8,000–12,000 tokens covers most debugging sessions without becoming expensive. For architecture decisions where I want Claude to really stress-test a design, I'll go up to 16,000.
Start with the bug you've been stuck on longest. Enable extended thinking. Read the reasoning block before the answer. That's where the insight usually is.
# Quick test — paste your hard bug and see the reasoning
npx ts-node -e "
const Anthropic = require('@anthropic-ai/sdk');
const client = new Anthropic.default();
client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 12000,
  thinking: { type: 'enabled', budget_tokens: 8000 },
  messages: [{ role: 'user', content: process.argv[1] }]
}).then(r => {
  r.content.forEach(b => {
    if (b.type === 'thinking') console.log('=== REASONING ===\n', b.thinking);
    if (b.type === 'text') console.log('=== ANSWER ===\n', b.text);
  });
});
" -- 'Your bug description here'