From pip install to a streaming chat view in production. Authentication, error handling, prompt caching, and cost-aware patterns for Django + Claude.
A Django view that takes a user's prompt, sends it to Claude, streams the response back to the user as it generates, handles errors gracefully, and uses prompt caching to keep costs down.
pip install anthropic
Add to requirements.txt with the latest stable version, no upper bound.
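The entry looks like this (the version shown is illustrative; pin to whatever `pip freeze` reports at install time):

```
# requirements.txt
anthropic>=0.40
```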
Sign up at console.anthropic.com, generate a key, and set it as an environment variable:
# .env
ANTHROPIC_API_KEY=sk-ant-...
Never commit the key. Use python-decouple (already in your Django stack) to read it.
# settings.py
from decouple import config
ANTHROPIC_API_KEY = config("ANTHROPIC_API_KEY", default="")
ANTHROPIC_MODEL = config("ANTHROPIC_MODEL", default="claude-sonnet-4-6")
Externalising the model name means you can swap models without redeploying — useful for cost tuning.
# myapp/views.py
import anthropic
from django.conf import settings
from django.http import JsonResponse
from django.views.decorators.http import require_POST
from django.contrib.auth.decorators import login_required

client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)

@login_required
@require_POST
def ask_claude(request):
    user_prompt = request.POST.get("prompt", "").strip()
    if not user_prompt:
        return JsonResponse({"error": "Empty prompt"}, status=400)

    try:
        response = client.messages.create(
            model=settings.ANTHROPIC_MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": user_prompt}],
        )
    except anthropic.APIError as e:
        return JsonResponse({"error": str(e)}, status=502)

    answer = response.content[0].text
    return JsonResponse({"answer": answer})
This works. It's also slow — the user sees a spinner for 5–30 seconds while the whole response generates. Streaming fixes that.
Streaming sends tokens to the user as Claude produces them. Critical for any UI showing AI output longer than a sentence or two.
# myapp/views.py (streaming version)
import json

import anthropic
from django.conf import settings
from django.http import StreamingHttpResponse
from django.views.decorators.http import require_POST
from django.contrib.auth.decorators import login_required

client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)

def claude_stream(prompt):
    """Generator that yields SSE-formatted text deltas from Claude."""
    try:
        with client.messages.stream(
            model=settings.ANTHROPIC_MODEL,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                # SSE format: each event is "data: <json>\n\n"
                yield f"data: {json.dumps({'delta': text})}\n\n"
        yield f"data: {json.dumps({'done': True})}\n\n"
    except anthropic.APIError as e:
        yield f"data: {json.dumps({'error': str(e)})}\n\n"

@login_required
@require_POST
def ask_claude_stream(request):
    prompt = request.POST.get("prompt", "").strip()
    if not prompt:
        return StreamingHttpResponse(
            # Wrap in a list: a bare str would be iterated char by char
            [f"data: {json.dumps({'error': 'Empty prompt'})}\n\n"],
            content_type="text/event-stream",
        )
    response = StreamingHttpResponse(
        claude_stream(prompt),
        content_type="text/event-stream",
    )
    response["Cache-Control"] = "no-cache"
    response["X-Accel-Buffering"] = "no"  # disable nginx response buffering
    return response
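Wiring both views into the URLconf is a one-liner each (a sketch; the URL paths, names, and module path are assumptions for your project layout):

```
# myapp/urls.py
from django.urls import path
from . import views

urlpatterns = [
    path("ask/", views.ask_claude, name="ask_claude"),
    path("ask/stream/", views.ask_claude_stream, name="ask_claude_stream"),
]
```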
Two important details:
- X-Accel-Buffering: no. Without this, nginx buffers the entire response and your "streaming" arrives all at once at the end. Always set it for SSE served through nginx.
- Cache-Control: no-cache. Don't let any proxy cache the stream.

Tutorial 8 covers SSE in detail, including the JavaScript client side.
The Anthropic SDK raises specific exceptions you should handle distinctly:
import logging

import anthropic
from django.http import JsonResponse

logger = logging.getLogger(__name__)

try:
    response = client.messages.create(...)
except anthropic.AuthenticationError:
    # Your API key is wrong or revoked. Page yourself.
    logger.error("Anthropic auth failed; check API key")
    return JsonResponse({"error": "Service misconfigured"}, status=500)
except anthropic.RateLimitError:
    # You're sending too many requests. Back off.
    return JsonResponse(
        {"error": "Too many requests, please try again in a moment"},
        status=429,
    )
except anthropic.APIStatusError as e:
    # Generic API error (4xx or 5xx from Anthropic)
    logger.warning(f"Anthropic API error {e.status_code}: {e}")
    return JsonResponse({"error": "AI service unavailable"}, status=502)
except anthropic.APIError:
    # Catch-all
    logger.exception("Unexpected Anthropic error")
    return JsonResponse({"error": "Something went wrong"}, status=500)
Above all, log failures with logger.exception so you can debug them later; Sentry picks these up automatically.

If you have a long system prompt that is identical on every request (instructions, examples, a knowledge base), enable prompt caching. Anthropic charges 10% of the normal input-token price for cached tokens.
response = client.messages.create(
    model=settings.ANTHROPIC_MODEL,
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,  # thousands of tokens of instructions/examples
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": user_prompt}],
)
The cache lasts ~5 minutes. If you serve a popular feature, your effective cost on the system prompt drops by 90%. Worth thousands of euros at scale.
Cache the largest stable parts of your prompt: system messages, tool definitions, document context, examples. Don't bother caching short or volatile content (the per-call user message).
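To see what that 90% means in money, a back-of-envelope calculation (the per-token prices here are illustrative assumptions; check Anthropic's pricing page for current rates, and note that cache *writes* cost slightly more than uncached input):

```python
# Rough cost model for a repeatedly-sent system prompt.
# Assumed illustrative rates: $3.00 per million input tokens,
# cache reads billed at 10% of that.
INPUT_PRICE_PER_MTOK = 3.00
CACHE_READ_MULTIPLIER = 0.10

def system_prompt_cost(tokens: int, requests: int, cached: bool) -> float:
    """Dollar cost of sending a system prompt of `tokens` tokens `requests` times."""
    rate = INPUT_PRICE_PER_MTOK * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return tokens * requests * rate / 1_000_000

# A 5,000-token system prompt hit 100,000 times a month:
uncached = system_prompt_cost(5_000, 100_000, cached=False)  # 1500.0 dollars
cached = system_prompt_cost(5_000, 100_000, cached=True)     # ~150 dollars
```

The model ignores the one-time cache-write premium, which is negligible next to 100,000 reads.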
Django 5.2 fully supports async views. Use the async Anthropic client when you have multiple AI calls in flight or when you're using async views anyway:
from anthropic import AsyncAnthropic
from django.conf import settings
from django.http import JsonResponse

aclient = AsyncAnthropic(api_key=settings.ANTHROPIC_API_KEY)

async def ask_claude_async(request):
    prompt = request.POST.get("prompt", "").strip()
    response = await aclient.messages.create(
        model=settings.ANTHROPIC_MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return JsonResponse({"answer": response.content[0].text})
Use this when you're fanning out parallel calls (e.g., classifying ten items at once with asyncio.gather).
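The fan-out pattern looks like this. Sketched with a stand-in coroutine so it runs without an API key; in real code, replace the body of `classify` with the `await aclient.messages.create(...)` call:

```python
import asyncio

async def classify(item: str) -> str:
    # Stand-in for: resp = await aclient.messages.create(...)
    #               return resp.content[0].text
    await asyncio.sleep(0)  # simulated network latency
    return f"label:{item}"

async def classify_many(items: list[str]) -> list[str]:
    # All calls are in flight concurrently; total wall time is roughly
    # the slowest single call, not the sum of all calls.
    # gather() preserves input order in its results.
    return await asyncio.gather(*(classify(item) for item in items))

labels = asyncio.run(classify_many(["spam", "ham"]))
```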
The SDK retries automatically on transient errors with exponential backoff. You can tune it:
client = anthropic.Anthropic(
    api_key=settings.ANTHROPIC_API_KEY,
    max_retries=3,  # default is 2
    timeout=60.0,   # in seconds
)
For your own retries (e.g., when a response fails validation), an idempotency key can ensure the same logical call isn't processed and charged twice. Verify against the current Anthropic API reference before relying on this; the header name below is illustrative, not a documented API guarantee:

response = client.messages.create(
    ...,
    # Header name is illustrative; confirm idempotency support in the API docs
    extra_headers={"anthropic-idempotency-key": f"order-{order.id}-summary-v1"},
)

Stripe-style discipline: same key, same response, even on retry.
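A minimal retry-with-validation loop to pair with this (a generic sketch; `call` and `validate` are placeholders you supply, e.g. a closure around `client.messages.create` that reuses the same idempotency key on every attempt):

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")

def call_with_validation(
    call: Callable[[], T],
    validate: Callable[[T], bool],
    max_attempts: int = 3,
) -> T:
    """Retry `call` until `validate` accepts the result, up to `max_attempts` times."""
    last = None
    for attempt in range(1, max_attempts + 1):
        result = call()
        if validate(result):
            return result
        logger.warning("Attempt %d failed validation", attempt)
        last = result
    raise ValueError(f"No valid response after {max_attempts} attempts: {last!r}")
```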
# Reusable client wrapper
import logging

import anthropic
from django.conf import settings

logger = logging.getLogger(__name__)

_client = None

def get_client():
    global _client
    if _client is None:
        _client = anthropic.Anthropic(
            api_key=settings.ANTHROPIC_API_KEY,
            max_retries=3,
            timeout=60.0,
        )
    return _client

def ask(user_prompt: str, system_prompt: str | None = None) -> str:
    client = get_client()
    kwargs = {
        "model": settings.ANTHROPIC_MODEL,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_prompt}],
    }
    if system_prompt:
        # Stable system prompts benefit from caching; see above
        kwargs["system"] = [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }]
    try:
        response = client.messages.create(**kwargs)
    except anthropic.APIError:
        logger.exception("Anthropic API call failed")
        raise
    return response.content[0].text
That's enough to ship a feature today.