AI & LLMs Intermediate

Calling the Claude API from Django — Setup, Streaming, Error Handling

From pip install to a streaming chat view in production. Authentication, error handling, prompt caching, and cost-aware patterns for Django + Claude.

DjangoZen Team May 09, 2026 12 min read

What you'll build

A Django view that takes a user's prompt, sends it to Claude, streams the response back to the user as it generates, handles errors gracefully, and uses prompt caching to keep costs down.

Setup

1. Install the SDK

pip install anthropic

Add it to requirements.txt at the latest stable version, with no upper bound.
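For example (the minimum version shown is illustrative; use whatever pip installed):

# requirements.txt
anthropic>=0.40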

2. Get an API key

Sign up at console.anthropic.com, generate a key, and set it as an environment variable:

# .env
ANTHROPIC_API_KEY=sk-ant-...

Never commit the key. Use python-decouple (already in your Django stack) to read it.

3. Configure in settings.py

from decouple import config

ANTHROPIC_API_KEY = config("ANTHROPIC_API_KEY", default="")
ANTHROPIC_MODEL = config("ANTHROPIC_MODEL", default="claude-sonnet-4-6")

Externalising the model name means you can swap models without redeploying — useful for cost tuning.

Basic chat view

# myapp/views.py
import anthropic
from django.conf import settings
from django.http import JsonResponse
from django.views.decorators.http import require_POST
from django.contrib.auth.decorators import login_required

client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)


@login_required
@require_POST
def ask_claude(request):
    user_prompt = request.POST.get("prompt", "").strip()
    if not user_prompt:
        return JsonResponse({"error": "Empty prompt"}, status=400)

    try:
        response = client.messages.create(
            model=settings.ANTHROPIC_MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": user_prompt}],
        )
    except anthropic.APIError as e:
        return JsonResponse({"error": str(e)}, status=502)

    answer = response.content[0].text
    return JsonResponse({"answer": answer})
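
To call the view you also need a route in your URLconf; a minimal sketch (the path and name are our choice, not from the project):

# myapp/urls.py
from django.urls import path

from . import views

urlpatterns = [
    path("ask/", views.ask_claude, name="ask_claude"),
]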

This works. It's also slow — the user sees a spinner for 5–30 seconds while the whole response generates. Streaming fixes that.

Streaming with Server-Sent Events

Streaming sends tokens to the user as Claude produces them. Critical for any UI showing AI output longer than a sentence or two.

# myapp/views.py (streaming version)
import json
import anthropic
from django.conf import settings
from django.http import StreamingHttpResponse
from django.views.decorators.http import require_POST
from django.contrib.auth.decorators import login_required

client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)


def claude_stream(prompt):
    """Generator that yields SSE-formatted text deltas from Claude."""
    try:
        with client.messages.stream(
            model=settings.ANTHROPIC_MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                # SSE format: each event is "data: <json>\n\n"
                yield f"data: {json.dumps({'delta': text})}\n\n"
            yield f"data: {json.dumps({'done': True})}\n\n"
    except anthropic.APIError as e:
        yield f"data: {json.dumps({'error': str(e)})}\n\n"


@login_required
@require_POST
def ask_claude_stream(request):
    prompt = request.POST.get("prompt", "").strip()
    if not prompt:
        return StreamingHttpResponse(
            f"data: {json.dumps({'error': 'Empty prompt'})}\n\n",
            content_type="text/event-stream",
        )

    response = StreamingHttpResponse(
        claude_stream(prompt),
        content_type="text/event-stream",
    )
    response["Cache-Control"] = "no-cache"
    response["X-Accel-Buffering"] = "no"  # Disable nginx buffering
    return response

Two important details:

  1. X-Accel-Buffering: no — without this, nginx buffers the entire response and your "streaming" arrives all at once at the end. Always set it for SSE through nginx.
  2. Cache-Control: no-cache — don't let any proxy cache the stream.

Tutorial 8 covers SSE in detail with the JavaScript client side.
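
To sanity-check the stream locally, consume the endpoint from a short script; a sketch using requests (the URL is illustrative, and it assumes you have handled login and CSRF for the test, e.g. with a session cookie or a temporary csrf_exempt):

# check_stream.py -- quick local smoke test, not production code
import requests

with requests.post(
    "http://localhost:8000/ask-stream/",
    data={"prompt": "Explain Django signals in one paragraph"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data: "):
            print(line[len("data: "):], flush=True)

If the deltas print one by one rather than all at once, buffering is configured correctly end to end.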

Error handling that doesn't lie to your users

The Anthropic SDK raises specific exceptions you should handle distinctly:

try:
    response = client.messages.create(...)
except anthropic.AuthenticationError:
    # Your API key is wrong or revoked. Page yourself.
    logger.error("Anthropic auth failed — check API key")
    return JsonResponse({"error": "Service misconfigured"}, status=500)
except anthropic.RateLimitError as e:
    # You're sending too many requests. Back off.
    return JsonResponse(
        {"error": "Too many requests, please try again in a moment"},
        status=429
    )
except anthropic.APIStatusError as e:
    # Generic API error (4xx or 5xx from Anthropic)
    logger.warning(f"Anthropic API error {e.status_code}: {e}")
    return JsonResponse({"error": "AI service unavailable"}, status=502)
except anthropic.APIError as e:
    # Catch-all
    logger.exception("Unexpected Anthropic error")
    return JsonResponse({"error": "Something went wrong"}, status=500)

Three rules:

  1. Don't show raw API errors to users. They might leak prompts or internal details.
  2. Log everything with logger.exception so you can debug. Sentry catches these.
  3. Return appropriate HTTP status codes so frontend code can react sensibly (retry on 429, escalate on 5xx).
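
If several views call Claude, you can centralise that mapping instead of repeating the try/except; one possible helper (the module and function names are ours):

# myapp/ai_errors.py (hypothetical module)
import logging

import anthropic
from django.http import JsonResponse

logger = logging.getLogger(__name__)


def claude_error_response(exc: anthropic.APIError) -> JsonResponse:
    """Translate SDK exceptions into user-safe JSON responses.

    Call this from inside an except block so logger.exception
    still captures the active traceback.
    """
    if isinstance(exc, anthropic.AuthenticationError):
        logger.error("Anthropic auth failed: check API key")
        return JsonResponse({"error": "Service misconfigured"}, status=500)
    if isinstance(exc, anthropic.RateLimitError):
        # Subclass of APIStatusError, so check it first
        return JsonResponse(
            {"error": "Too many requests, please try again in a moment"},
            status=429,
        )
    if isinstance(exc, anthropic.APIStatusError):
        logger.warning("Anthropic API error %s: %s", exc.status_code, exc)
        return JsonResponse({"error": "AI service unavailable"}, status=502)
    logger.exception("Unexpected Anthropic error")
    return JsonResponse({"error": "Something went wrong"}, status=500)

A view then shrinks to a single except anthropic.APIError as e: return claude_error_response(e).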

Prompt caching — the big cost win

If you have a long system prompt that's the same for every request (like instructions, examples, a knowledge base), enable prompt caching. Anthropic charges roughly 10% of the normal input price for cache reads; writing to the cache costs a little more than a normal input token, but you pay that only on a miss.

response = client.messages.create(
    model=settings.ANTHROPIC_MODEL,
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,  # Thousands of tokens of instructions/examples
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": user_prompt}],
)

The cache has a TTL of about five minutes, refreshed each time the prefix is reused. If you serve a popular feature, your effective cost on the system prompt drops by roughly 90%. Worth thousands of euros at scale.

Cache the largest stable parts of your prompt: system messages, tool definitions, document context, examples. Don't bother caching short or volatile content (the per-call user message).
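
A quick back-of-envelope calculation makes the saving concrete. Prices below are placeholders (check the current pricing page), and the math ignores the small premium charged when the cache is first written:

# Illustrative cost math for a cached 5,000-token system prompt
system_tokens = 5_000
requests_per_day = 10_000
base_price = 3.00                  # $ per million input tokens (placeholder)
cached_price = base_price * 0.10   # cache reads cost ~10% of base

daily_uncached = system_tokens * requests_per_day / 1e6 * base_price
daily_cached = system_tokens * requests_per_day / 1e6 * cached_price
print(f"${daily_uncached:.2f}/day vs ${daily_cached:.2f}/day")
# -> $150.00/day vs $15.00/day on the system prompt alone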

Async and concurrency

Django 5.2 fully supports async views. Use the async Anthropic client when you have multiple AI calls in flight or when you're using async views anyway:

from anthropic import AsyncAnthropic
from django.conf import settings
from django.http import JsonResponse

aclient = AsyncAnthropic(api_key=settings.ANTHROPIC_API_KEY)


async def ask_claude_async(request):
    prompt = request.POST.get("prompt", "")
    response = await aclient.messages.create(
        model=settings.ANTHROPIC_MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return JsonResponse({"answer": response.content[0].text})

Use this when you're fanning out parallel calls (e.g., classifying ten items at once with asyncio.gather).
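
A sketch of that fan-out pattern, reusing aclient from above (the classification prompt is illustrative):

import asyncio


async def classify_all(items: list[str]) -> list[str]:
    """Run one classification request per item, concurrently."""

    async def classify(text: str) -> str:
        response = await aclient.messages.create(
            model=settings.ANTHROPIC_MODEL,
            max_tokens=16,
            messages=[{
                "role": "user",
                "content": f"Reply with exactly one word, spam or ham: {text}",
            }],
        )
        return response.content[0].text.strip()

    return await asyncio.gather(*(classify(item) for item in items))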

Retries and idempotency

The SDK retries automatically on transient errors with exponential backoff. You can tune it:

client = anthropic.Anthropic(
    api_key=settings.ANTHROPIC_API_KEY,
    max_retries=3,  # default is 2
    timeout=60.0,   # in seconds
)

For your own retries (e.g., when the response fails validation), use idempotency keys to ensure the same call doesn't get charged twice if you retry:

response = client.messages.create(
    ...,
    extra_headers={"anthropic-idempotency-key": f"order-{order.id}-summary-v1"},
)

Stripe-style discipline: same key = same response, even on retry.

Putting it together — a production-ready pattern

# Reusable client wrapper
from django.conf import settings
import anthropic
import logging

logger = logging.getLogger(__name__)

_client = None

def get_client():
    global _client
    if _client is None:
        _client = anthropic.Anthropic(
            api_key=settings.ANTHROPIC_API_KEY,
            max_retries=3,
            timeout=60.0,
        )
    return _client


def ask(user_prompt: str, system_prompt: str | None = None) -> str:
    client = get_client()
    kwargs = {
        "model": settings.ANTHROPIC_MODEL,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_prompt}],
    }
    if system_prompt:
        kwargs["system"] = [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }]
    try:
        response = client.messages.create(**kwargs)
    except anthropic.APIError:
        logger.exception("Anthropic API call failed")
        raise
    return response.content[0].text
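
Usage from any view or Celery task is then one call (the prompt and names are placeholders from a hypothetical app):

summary = ask(
    f"Summarise this support ticket:\n\n{ticket.body}",
    system_prompt=SUPPORT_SYSTEM_PROMPT,
)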

That's enough to ship a feature today.

Next steps

  • Tutorial 5: building RAG so the model can answer questions about your data
  • Tutorial 7: prompt patterns that actually work in production
  • Tutorial 8: deep dive on streaming and the JavaScript client
  • Tutorial 9: cost optimization at scale