AI & LLMs Intermediate

Reasoning in LLMs — Chain of Thought, Extended Thinking, and When to Use Them

Reasoning models think before answering. Here's how chain-of-thought prompting works, what Anthropic's extended thinking does differently, and when the extra cost is worth it.

DjangoZen Team May 09, 2026 9 min read

The problem reasoning solves

Standard LLMs predict tokens one at a time, left-to-right, with no chance to backtrack or double-check. For simple language tasks (rewriting an email, classifying a ticket) that's fine. For multi-step problems — a math word problem, a logic puzzle, debugging a piece of code — it often fails.

The reason: complex problems require planning, evaluation, and revision. A pure next-token predictor doesn't naturally do any of those, so researchers found ways to coax the model into doing them.

Chain of thought (CoT) — the simplest trick

The original 2022 finding: if you ask an LLM to "think step by step" before giving its answer, accuracy on reasoning tasks jumps dramatically.

# Standard prompt — often wrong on multi-step problems
"What's 23 * 47?"

# Chain-of-thought prompt — much more reliable
"What's 23 * 47? Let's think step by step before giving the final answer."

The model writes out its reasoning ("23 * 47 = 23 * 50 - 23 * 3 = 1150 - 69 = 1081") before giving the answer. Even though the model is still just predicting tokens, the intermediate tokens act as a scratchpad — each next prediction has more relevant context.

This is prompted CoT, and it works on any LLM. Costs you a few hundred extra output tokens.
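Prompted CoT is easy to wrap in a couple of helpers. A minimal sketch, assuming one convention of our own (not an API feature): we instruct the model to put its final answer on a last line starting with "Answer:", so the trace and the answer stay separable.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question with a step-by-step instruction (prompted CoT)."""
    return (
        f"{question}\n"
        "Let's think step by step, then give the final answer "
        "on a last line starting with 'Answer:'."
    )


def extract_final_answer(completion: str) -> str:
    """Pull the final answer out of a CoT completion.

    Falls back to the whole completion if the model ignored the convention.
    """
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return completion.strip()
```

Keeping the reasoning and the answer separable like this lets you log the trace for debugging without showing it to users.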

Few-shot CoT — even better

Add a worked example to your prompt:

prompt = '''
Q: A shop sold 14 apples on Monday and 22 on Tuesday. How many in total?
A: 14 + 22 = 36 apples.

Q: A shop sold 9 books in the morning and 17 in the afternoon. How many in total?
A:
'''

The model sees the pattern of "show your work" and follows it. This is the workhorse of production prompting before reasoning models existed.
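If you build these prompts in code, a small assembler keeps the worked examples in one place. A sketch, using the article's own apple/book examples as plain (question, answer) tuples:

```python
def few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Build a few-shot CoT prompt from worked (question, answer) pairs."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")  # trailing "A:" cues the model to answer
    return "\n\n".join(parts)


examples = [
    ("A shop sold 14 apples on Monday and 22 on Tuesday. How many in total?",
     "14 + 22 = 36 apples."),
]
prompt = few_shot_prompt(
    examples,
    "A shop sold 9 books in the morning and 17 in the afternoon. How many in total?",
)
```

The trailing `A:` is doing real work: it tells the model the pattern continues and it should fill in the next worked answer.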

Reasoning models — when CoT becomes built-in

In 2024–2025 a new generation of models was trained specifically to reason internally before answering. Examples:

  • OpenAI o1, o3 — generate hidden "reasoning tokens" before the visible answer
  • Anthropic Claude with Extended Thinking — same idea, with the thinking visible to the developer (controllable)
  • Google Gemini Thinking

These models are trained on reasoning traces — long, structured "thinking out loud" examples — so chain-of-thought is no longer a prompt trick, it's a built-in behavior.

In practical terms:

  • You don't add "think step by step" to your prompt
  • The model spends extra tokens (and time) reasoning before responding
  • Accuracy on hard problems is significantly higher
  • Cost and latency are also significantly higher

Anthropic's Extended Thinking, specifically

With Claude, you can opt into extended thinking on a per-call basis:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{
        "role": "user",
        "content": "Find the bug in this Django view: ..."
    }]
)

# response.content includes both thinking blocks and the final answer
for block in response.content:
    if block.type == "thinking":
        print("Reasoning:", block.thinking)
    elif block.type == "text":
        print("Answer:", block.text)

budget_tokens caps how many tokens the model may spend thinking, and it must be smaller than max_tokens so the visible answer still fits. A higher budget means a potentially better answer, but more cost and latency.
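Since the API rejects requests where the budget doesn't fit inside max_tokens, it can be worth making that invariant explicit in one place. A small guard, with names of our own choosing:

```python
def thinking_params(budget_tokens: int, max_tokens: int) -> dict:
    """Return kwargs for a Claude call with extended thinking enabled.

    Anthropic requires budget_tokens < max_tokens: the visible answer
    has to fit in whatever the thinking phase leaves over.
    """
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than "
            f"max_tokens ({max_tokens})"
        )
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
```

Then the call site becomes `client.messages.create(model=..., messages=..., **thinking_params(10_000, 16_000))`, and a misconfigured budget fails loudly before you pay for the request.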

When reasoning is worth the cost

Worth it:

  • Debugging gnarly code — reasoning models genuinely catch bugs that base models miss
  • Planning multi-step actions — "I need to: 1. fetch X, 2. transform Y, 3. validate Z..."
  • Math, logic, and constraint problems — where one wrong intermediate step poisons the answer
  • Code review and architecture decisions — when the answer requires weighing tradeoffs
  • Research-style questions — "what's the right approach to building X?"

Not worth it:

  • Single-step tasks — classify this email; rewrite this paragraph; extract this field
  • Latency-critical paths — reasoning models are 3–10x slower than base models
  • High-volume bulk processing — costs add up fast at scale
  • When the input is short and the answer is short — you're paying for thinking that doesn't help

A practical Django pattern

Use a fast non-reasoning model for the hot path, a reasoning model for the slow lane:

def handle_user_question(question):
    # Fast classifier — Haiku, no reasoning, < 1 second
    category = classify_with_haiku(question)

    if category == "complex_analysis":
        # Slow path: Opus with extended thinking
        return answer_with_reasoning(question)

    # Fast path (default, also covers anything the classifier
    # didn't recognize): Sonnet, no reasoning
    return answer_with_sonnet(question)

This gives you reasoning quality where it matters and snappy responses where it doesn't.

Cost reality check, 2026 prices

  • Claude Sonnet 4.6 base: ~$3 / $15 per million tokens (in/out)
  • Claude Opus 4.7 base: ~$15 / $75
  • Claude Opus 4.7 with extended thinking: same per-token rates, but you'll burn 5,000–50,000 tokens on the thinking phase alone

So a single complex reasoning call can cost $0.50–$5. Multiply by users. Plan accordingly.
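A back-of-envelope estimator makes those numbers concrete. This sketch hard-codes the article's 2026 per-million-token figures and assumes thinking tokens bill at the output rate:

```python
# Per-million-token rates from the table above: (input, output), in USD.
RATES = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (15.00, 75.00),
}


def estimate_cost(model: str, input_tokens: int,
                  thinking_tokens: int, answer_tokens: int) -> float:
    """Rough USD cost of one call; thinking tokens bill at the output rate."""
    in_rate, out_rate = RATES[model]
    return (
        input_tokens * in_rate
        + (thinking_tokens + answer_tokens) * out_rate
    ) / 1_000_000
```

For example, an Opus call with 2,000 input tokens, 20,000 thinking tokens, and a 1,000-token answer works out to about $1.61, squarely in the $0.50–$5 band above.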

Caveats and pitfalls

  1. Reasoning models are not infallible. They sometimes produce wrong answers that merely look more thoughtful. Always validate.
  2. Extended thinking is not visible to users by default. Don't expose the reasoning trace in your UI unless you want to (it's often messy, exploratory, with dead ends).
  3. Caching works differently — reasoning tokens don't fit the same cache patterns as input tokens. Read the model docs carefully if you're trying to optimize.

Summary

  • Chain-of-thought: prompt trick, works on any LLM, cheap, modest gains
  • Few-shot CoT: same trick with examples, more reliable
  • Reasoning models: built-in thinking, big quality gains on hard problems, big cost increases
  • Use selectively — fast model for fast tasks, reasoning for the genuinely hard ones

Next tutorial: how to actually wire all this into a Django view — the Claude API, streaming, caching, and error handling.