
Event-Driven Django: Redis Streams, Kafka, and the Transactional Outbox Pattern

Decouple your Django services with events instead of synchronous calls. Choose between Redis Streams and Kafka, guarantee delivery with the transactional outbox pattern, and build idempotent consumers that survive retries.

DjangoZen Team, Jun 06, 2026

Synchronous calls between services are a tax: every hop adds latency and a new way to fail, and a chain of services calling services means one slow link stalls the whole request. Event-driven architecture flips the model — services publish facts ("OrderPaid") and others react on their own schedule. The hard part is not the messaging; it is never losing an event and never processing one twice. This tutorial builds event-driven Django correctly, from broker choice through the outbox pattern to idempotent consumers.

Why events instead of calls

In a request-response design, when an order is paid your checkout view calls the email service, the inventory service, the analytics service, and the loyalty service in turn. Each call couples checkout to a service that might be slow or down, the latencies add up into the user's wait, and adding a fifth reaction means editing checkout again. With events, checkout publishes a single order.paid fact and returns; the four reactions happen independently, in parallel, and a fifth consumer can subscribe later without checkout knowing it exists. You trade immediate consistency for decoupling, resilience, and the ability to evolve each side separately — usually a good trade for anything that does not need a synchronous answer.

The cost is a new class of problem. Events are asynchronous, so the caller does not learn whether a reaction succeeded; delivery is "at least once," so duplicates happen; and ordering is not free. The rest of this tutorial is about handling those problems, because an event system that loses or double-applies facts is worse than the synchronous calls it replaced.

Redis Streams vs Kafka

|            | Redis Streams                         | Kafka                                       |
|------------|---------------------------------------|---------------------------------------------|
| Setup      | Trivial; you likely already run Redis | Heavy: brokers, partitions, KRaft/ZooKeeper |
| Retention  | Capped, memory-bound                  | Durable, long retention, replayable         |
| Throughput | High                                  | Very high, horizontally partitioned         |
| Ordering   | Per stream                            | Per partition                               |
| Best for   | Most apps, moderate scale             | High volume, replay, many consumers         |

Default to Redis Streams unless you genuinely need Kafka's durability, long retention, or replay. Redis Streams give you consumer groups, acknowledgements, and at-least-once delivery with almost no operational overhead — they are the right starting point for the vast majority of Django apps. Kafka is a serious commitment: a cluster to run, partitions to balance, consumer offsets to reason about. Adopt it when you have a real need for a durable, replayable log consumed by many independent teams — not because it appears on every "modern architecture" diagram.
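To make consumer groups and acknowledgements concrete, here is a minimal sketch of one read-handle-acknowledge cycle. It assumes a redis-py client using the default (RESP2) response shape; the function names ensure_group and consume_batch, and the handler signature, are illustrative rather than part of any library.

```python
def ensure_group(client, stream, group):
    """Create the consumer group, tolerating the 'already exists' error."""
    try:
        # id="0" starts the group at the beginning of the stream;
        # mkstream=True creates the stream if it does not exist yet.
        client.xgroup_create(stream, group, id="0", mkstream=True)
    except Exception as exc:
        # Redis answers BUSYGROUP when the group is already there.
        if "BUSYGROUP" not in str(exc):
            raise


def consume_batch(client, stream, group, consumer, handler, count=10, block_ms=5000):
    """Read new entries for this consumer, handle each, then acknowledge it."""
    handled = 0
    # ">" asks for entries not yet delivered to any consumer in the group.
    response = client.xreadgroup(group, consumer, {stream: ">"},
                                 count=count, block=block_ms)
    for _stream_name, entries in response or []:
        for entry_id, fields in entries:
            handler(entry_id, fields)
            # Acknowledge only after the handler succeeds. A crash before
            # this line leaves the entry pending instead of losing it.
            client.xack(stream, group, entry_id)
            handled += 1
    return handled
```

Typical use would be ensure_group(client, "order.paid", "email") once at startup, then consume_batch(...) in a loop. Entries left pending by a crashed worker are not re-read with ">"; they have to be reclaimed separately, for example with XAUTOCLAIM.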

The dual-write problem

The naive implementation looks obvious and is quietly broken: save the order to the database, then publish the event to the broker. Consider what happens when the process crashes between those two steps — you have committed the order but never told anyone, so the email never sends and inventory never decrements. Reverse the order and it is just as bad: you publish, consumers react, and then the database transaction rolls back, leaving everyone acting on an order that does not exist. You cannot wrap a database commit and a broker publish in a single atomic operation with a try/except, because they are two different systems with two different transaction boundaries. This is the dual-write problem, and it is the central correctness challenge of event-driven systems.
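The failure window is easy to reproduce without any real broker. The sketch below is a simulation, not Django code: it uses the standard library's sqlite3 for the database, a plain list as the "broker", and a hypothetical Crash exception to stand in for the process dying between the two steps.

```python
import sqlite3


class Crash(Exception):
    """Stands in for the process dying at an arbitrary point."""


def naive_pay(db, broker, order_id, crash_before_publish=False):
    # Step 1: commit the business change.
    db.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
    db.commit()
    # A crash here leaves a paid order that nobody was told about.
    if crash_before_publish:
        raise Crash("process died between commit and publish")
    # Step 2: publish to the broker (a plain list in this sketch).
    broker.append({"topic": "order.paid", "order_id": order_id})


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO orders VALUES (1, 'pending')")
db.commit()
broker = []

try:
    naive_pay(db, broker, 1, crash_before_publish=True)
except Crash:
    pass

status = db.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
print(status, broker)  # paid []
```

The order is durably paid, yet the broker never saw the event, and nothing in the system records that a publish is still owed.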

The transactional outbox

The fix is elegant: instead of publishing to the broker directly, write the event into an outbox table in the same database transaction as your business data. One commit, fully atomic — either both the order and its event are persisted, or neither is. A separate relay process then reads unpublished rows from the outbox and ships them to the broker.

from django.db import models, transaction

class OutboxEvent(models.Model):
    topic = models.CharField(max_length=100)
    payload = models.JSONField()
    created_at = models.DateTimeField(auto_now_add=True)
    # Stays NULL until the relay has shipped the event to the broker.
    published_at = models.DateTimeField(null=True, blank=True)

@transaction.atomic
def pay_order(order):
    order.status = "paid"
    order.save()
    # Written in the same transaction as the order update:
    # both rows commit together or both roll back.
    OutboxEvent.objects.create(
        topic="order.paid",
        payload={"order_id": str(order.id), "amount": str(order.total)},
    )

Because the event lives in the same database, the same transaction either commits both or rolls back both. There is no window where the order exists but the event does not. This single pattern eliminates lost events, and it is the foundation everything else builds on.
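The all-or-nothing behaviour can be checked outside Django with the standard library's sqlite3. This is a sketch under assumed table names (orders, outbox), with a fail flag that simulates an error at the end of the transaction.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")
db.execute("INSERT INTO orders VALUES (1, 'pending'), (2, 'pending')")
db.commit()


def pay_order(db, order_id, fail=False):
    # "with db" commits if the block succeeds and rolls back if it raises.
    with db:
        db.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.paid", json.dumps({"order_id": order_id})),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")


pay_order(db, 1)
try:
    pay_order(db, 2, fail=True)
except RuntimeError:
    pass

statuses = dict(db.execute("SELECT id, status FROM orders"))
outbox_count = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(statuses, outbox_count)  # {1: 'paid', 2: 'pending'} 1
```

Order 1 has both its status change and its outbox row; order 2 has neither. There is no state in which only one of the two writes survived.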

The relay process

A relay reads unpublished outbox rows and publishes them to the broker, marking each as sent. It can be a Celery beat task running every few seconds, or a small dedicated daemon for lower latency:

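import json
import redis
from celery import shared_task
from django.utils import timezone

# Assumes Redis on localhost; point this at your own deployment.
redis_client = redis.Redis()

@shared_task
def relay_outbox():
    rows = (OutboxEvent.objects
            .filter(published_at__isnull=True)
            .order_by("created_at", "id")[:500])
    for ev in rows:
        # The outbox row's primary key doubles as the stable event ID.
        redis_client.xadd(ev.topic, {"event_id": str(ev.pk), "data": json.dumps(ev.payload)})
        ev.published_at = timezone.now()
        ev.save(update_fields=["published_at"])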

The relay itself must tolerate failure: if it crashes after publishing but before marking the row sent, it will republish on restart — which is fine, because consumers are idempotent (next section). That is the deliberate design choice: the relay guarantees at-least-once publication, and the consumer side absorbs the resulting duplicates. Order the relay by created_at to preserve rough ordering, and batch the reads so it scales.
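The crash-then-republish behaviour described above can be simulated in a few lines. This is an in-memory sketch, not the Celery task: the outbox is a list of dicts with an assumed shape, the stream is a list, and Crash is a hypothetical exception standing in for the relay dying.

```python
class Crash(Exception):
    """Stands in for the relay process dying mid-batch."""


def relay(outbox, stream, crash_after_publish_of=None):
    # outbox rows are dicts with "id", "payload" and "published" keys.
    for row in outbox:
        if row["published"]:
            continue
        stream.append({"event_id": row["id"], "data": row["payload"]})
        if row["id"] == crash_after_publish_of:
            raise Crash("died after publish, before marking the row")
        row["published"] = True


outbox = [
    {"id": 1, "payload": "order 1 paid", "published": False},
    {"id": 2, "payload": "order 2 paid", "published": False},
]
stream = []

try:
    relay(outbox, stream, crash_after_publish_of=2)
except Crash:
    pass
relay(outbox, stream)  # restart: row 2 is still unmarked, so it is sent again

published_ids = [message["event_id"] for message in stream]
print(published_ids)  # [1, 2, 2]
```

Event 2 reaches the stream twice and nothing is lost, which is exactly what at-least-once publication means.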

Idempotent consumers

At-least-once delivery means every consumer will see duplicates — from relay restarts, broker redelivery, or consumer crashes after work but before acknowledgement. A consumer that is not idempotent will send two emails, charge twice, or decrement inventory twice. Make every handler idempotent by giving each event a stable ID and recording which IDs you have processed, so reprocessing is a harmless no-op:

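from django.db import transaction

# ProcessedEvent is assumed to be a model with a unique event_id field.
@transaction.atomic
def handle(event_id, payload):
    _, created = ProcessedEvent.objects.get_or_create(event_id=event_id)
    if not created:
        return                      # already handled: skip silently
    do_the_work(payload)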

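Backed by a unique constraint on event_id, the get_or_create cannot create a row for two concurrent deliveries of the same event: the database rejects the second insert, and that caller sees created as False. Wrap the dedupe check and the work in one transaction, so if the work fails the dedupe record rolls back too and a retry can succeed. That rollback only covers writes to the same database; side effects outside it, such as sending an email or calling a payment API, are not undone, so give those calls their own idempotency key where the remote service supports one. Idempotency is not optional polish; it is the other half of the correctness guarantee that the outbox starts.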
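The same dedupe-plus-work transaction can be exercised with the standard library's sqlite3. This is a sketch with assumed table names (processed_events, inventory) and a fail flag that simulates a handler error; it is not the Django handler itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
db.execute("INSERT INTO inventory VALUES ('widget', 10)")
db.commit()


def handle(db, event_id, sku, fail=False):
    """Return True if the event was applied, False if it was a duplicate."""
    with db:  # dedupe record and work commit or roll back together
        try:
            db.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
        except sqlite3.IntegrityError:
            return False  # primary key already present: duplicate delivery
        if fail:
            raise RuntimeError("simulated handler failure")
        db.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = ?", (sku,))
        return True


first = handle(db, "evt-1", "widget")    # applied
second = handle(db, "evt-1", "widget")   # duplicate, skipped
try:
    handle(db, "evt-2", "widget", fail=True)
except RuntimeError:
    pass                                  # dedupe row for evt-2 rolled back
retry = handle(db, "evt-2", "widget")    # retry is allowed to succeed

qty = db.execute("SELECT qty FROM inventory WHERE sku = 'widget'").fetchone()[0]
print(first, second, retry, qty)  # True False True 8
```

Four deliveries produce two decrements: the duplicate is ignored, and the failed attempt leaves no trace that would block its retry.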

Event schemas and versioning

An event is a contract between the publisher and every consumer, and consumers deploy on their own schedules, so you cannot change an event's shape freely. Treat event payloads like a public API: include a schema version, make additive changes (new optional fields) the default, and never repurpose or remove a field without a new version that old consumers can ignore. Keep payloads focused on the fact — "order 123 was paid for 49.99" — rather than dumping your entire order object, so consumers depend on as little as possible. A thin, versioned, well-named event ages gracefully; a fat one coupled to your internal model breaks consumers every time you refactor.
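One way to apply these rules is a small versioned envelope with a tolerant parser. The sketch below is illustrative: the envelope keys (schema_version, data), the OrderPaid class, and the currency field added in a hypothetical version 2 are all assumptions, not a format the tutorial prescribes.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderPaid:
    order_id: str
    amount: str
    # Hypothetical optional field added in version 2; None when absent.
    currency: Optional[str] = None


def parse_order_paid(raw: str) -> OrderPaid:
    envelope = json.loads(raw)
    version = envelope.get("schema_version", 1)
    if version > 2:
        # A version this consumer was never written for: fail loudly.
        raise ValueError(f"unsupported schema_version {version}")
    data = envelope["data"]
    # Keep only the fields this consumer knows; unknown extras are ignored.
    known = {k: data[k] for k in ("order_id", "amount", "currency") if k in data}
    return OrderPaid(**known)


v1 = json.dumps({"schema_version": 1,
                 "data": {"order_id": "123", "amount": "49.99"}})
v2 = json.dumps({"schema_version": 2,
                 "data": {"order_id": "123", "amount": "49.99",
                          "currency": "EUR", "note": "ignored by this consumer"}})

old_event = parse_order_paid(v1)
new_event = parse_order_paid(v2)
```

Both payloads parse with the same code: the older one simply lacks the optional field, and the newer one carries an extra field that this consumer never looks at.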

Ordering and consumer groups

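Sometimes order matters: order.created must be processed before order.shipped. Redis Streams preserve order within a stream and Kafka within a partition, so the trick is to route related events to the same stream or partition, typically by keying on the entity ID (all events for order 123 go to the same partition and stay ordered). Consumer groups let multiple workers share the load while each message goes to exactly one worker in the group, which is how you scale a consumer horizontally without double-processing. The two brokers differ in one important way: Kafka assigns each partition to a single consumer in the group, so per-key order survives scaling out, whereas a Redis consumer group hands entries from one stream to whichever worker asks next, so two events for the same order can be processed concurrently by different workers. With Redis, keep strict ordering by sharding into several streams by key and giving each stream a single worker. Design your keys so that things that must be ordered share a key, and things that are independent spread across partitions for parallelism.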
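Keying by entity ID comes down to a stable hash of the key. The sketch below shows one way to pick a shard; the function names, the "topic:shard" naming scheme, and the default of four shards are assumptions for illustration. Kafka clients perform the equivalent step themselves when you supply a message key.

```python
import zlib


def shard_for(key: str, shards: int) -> int:
    # crc32 gives the same result in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % shards


def stream_name(topic: str, entity_id: str, shards: int = 4) -> str:
    # All events for one entity land on the same stream and stay ordered.
    return f"{topic}:{shard_for(entity_id, shards)}"


created_stream = stream_name("order", "123")
shipped_stream = stream_name("order", "123")
```

Every event for order 123 resolves to the same stream name, while different order IDs spread across the shards. Changing the shard count remaps keys, so it needs a planned migration rather than a config tweak.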

Handling poison messages

Some events will never process successfully — malformed payloads, references to deleted data, bugs in the handler. Without a strategy, a single poison message blocks the stream or loops forever, retrying and failing. Give consumers a bounded retry with backoff, and after N failures move the event to a dead-letter queue — a separate stream of events that need human attention — so the main flow keeps moving. Monitor the dead-letter queue; a growing one is a signal something is wrong upstream. This is the operational safety valve that keeps one bad event from stalling everything behind it.
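A bounded retry with exponential backoff and a dead-letter fallback might look like the sketch below. The function name, the dead_letters list (standing in for a separate stream), and the default limits are assumptions; the sleep function is injectable so the logic can be tested without waiting.

```python
import time


def process_with_retry(event, handler, dead_letters, max_attempts=3,
                       base_delay=0.5, sleep=time.sleep):
    """Try handler(event) up to max_attempts times, then dead-letter it."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event)
            return True
        except Exception as exc:  # a real consumer might catch only retryable errors
            last_error = repr(exc)
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    # Out of attempts: park the event for a human and let the stream move on.
    dead_letters.append({"event": event, "error": last_error,
                         "attempts": max_attempts})
    return False
```

A handler that always fails is attempted three times with waits of 0.5 and 1.0 seconds in between, and the event then lands in dead_letters along with the last error, which gives whoever investigates a starting point.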

Observability and tracing

Asynchronous systems are harder to debug because a single user action fans out across many services and processes. Propagate a correlation ID from the original request into every event payload, and log it in every consumer, so you can reconstruct the full causal chain — "this email, this inventory change, and this analytics event all came from order 123's payment." Track per-topic lag (how far behind consumers are), processing rates, and dead-letter counts. Without this, an event-driven system becomes a set of black boxes that are impossible to reason about when something goes wrong at 2am.
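Correlation ID propagation can be sketched with the standard library's contextvars and logging modules. The names here (correlation_id, make_event, consume, CorrelationFilter) and the event dictionary shape are illustrative assumptions.

```python
import contextvars
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default=None)


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get() or "-"
        return True


def make_event(topic, data):
    # Reuse the ID of whatever is being handled; mint one at the edge.
    cid = correlation_id.get() or str(uuid.uuid4())
    return {"topic": topic, "correlation_id": cid, "data": data}


def consume(event, handler):
    # Restore the ID so logs and follow-up events carry it too.
    token = correlation_id.set(event["correlation_id"])
    try:
        return handler(event)
    finally:
        correlation_id.reset(token)


first = make_event("order.paid", {"order_id": "123"})
follow_up = consume(first, lambda e: make_event("stock.reserved", {"order_id": "123"}))
```

The follow-up event carries the same correlation_id as the one that triggered it. Adding CorrelationFilter to a logging handler and including %(correlation_id)s in its format string puts the ID on every log line a consumer emits.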

When not to go event-driven

Events are not free, and not every interaction should be one. If the caller needs an immediate answer — "is this coupon valid?" — a synchronous call is correct; forcing it through events adds latency and complexity for nothing. If you have a small monolith with no real decoupling problem, in-process Django signals or direct function calls are simpler and perfectly adequate. Reach for a broker, an outbox, and idempotent consumers when you have genuine asynchronous workflows, multiple independent reactions to the same fact, or services that must evolve and scale separately. Adopt the machinery for the problems it solves, not as a default architecture.

Choreography versus orchestration

There are two ways to coordinate a multi-step process across services, and the choice shapes your whole system. In choreography, each service reacts to events and emits its own, with no central controller — order-paid triggers inventory, which emits stock-reserved, which triggers shipping. It is loosely coupled and resilient, but the overall flow is implicit, spread across many services, and hard to see in one place. In orchestration, a central coordinator explicitly drives the steps, calling or signalling each service in turn. The flow is visible and easy to reason about, but the orchestrator becomes a point of coupling and a place complexity accumulates. Most systems mix both: choreography for simple fan-out reactions, orchestration for complex business processes that need a clear, auditable sequence.

Sagas: transactions across services

A single database transaction cannot span multiple services, so when a business process touches several of them you need a saga — a sequence of local transactions where each step has a compensating action that undoes it. If booking a trip reserves a flight, a hotel, and a car, and the car step fails, the saga runs compensations to cancel the hotel and the flight, returning the system to a consistent state. Sagas trade the simplicity of an all-or-nothing transaction for the reality that distributed steps cannot be atomic. They are more work to design — every step needs an undo — but they are the honest answer to consistency across services, and the outbox pattern is what makes each step's event reliable.
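The trip-booking example can be sketched as a list of steps, each paired with its compensation. This is a minimal in-process illustration with a hypothetical run_saga helper; a real saga would drive each step through events or service calls and persist its progress.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation). Returns (ok, log)."""
    done, log = [], []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"did {name}")
            done.append((name, compensate))
        except Exception:
            log.append(f"failed {name}")
            # Undo the completed steps in reverse order.
            for prev_name, prev_compensate in reversed(done):
                prev_compensate()
                log.append(f"undid {prev_name}")
            return False, log
    return True, log


bookings = set()


def fail_car():
    raise RuntimeError("no cars available")


ok, log = run_saga([
    ("flight", lambda: bookings.add("flight"), lambda: bookings.discard("flight")),
    ("hotel", lambda: bookings.add("hotel"), lambda: bookings.discard("hotel")),
    ("car", fail_car, lambda: bookings.discard("car")),
])
print(ok, log, bookings)
# False ['did flight', 'did hotel', 'failed car', 'undid hotel', 'undid flight'] set()
```

Compensations run newest-first, so the hotel is cancelled before the flight, and the system ends with no bookings at all rather than a half-booked trip.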

Why exactly-once delivery is a myth

Teams new to event-driven systems often ask for exactly-once delivery, and it is worth being clear: in a distributed system with failures, exactly-once delivery is impossible. A message can always be delivered, the acknowledgement lost, and the message redelivered. What you can achieve is exactly-once processing, by combining at-least-once delivery (the broker keeps trying) with idempotent consumers (duplicates are harmless). This is why idempotency is non-negotiable rather than a nice-to-have — it is the only thing standing between "the broker did its job and redelivered" and "the customer was charged twice." Stop chasing exactly-once delivery and build for at-least-once plus idempotency instead.

Event-driven versus event sourcing

These two terms are often confused but are different ideas. Event-driven, the subject of this tutorial, means services communicate by publishing and reacting to events while keeping their own current-state databases. Event sourcing goes further: it makes the event log itself the source of truth, storing every change as an event and deriving current state by replaying them. Event sourcing gives you a perfect audit trail and the ability to reconstruct any past state, but it is a much larger commitment that reshapes how you store and query everything. You can be fully event-driven without event sourcing, and most teams should — adopt event sourcing only when the audit and time-travel benefits clearly justify its complexity.
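The core move of event sourcing, deriving state by replaying the log, fits in a few lines. This sketch uses hypothetical event dictionaries and an apply function for a single order; it is meant to show the idea, not a storage design.

```python
from functools import reduce


def apply(state, event):
    """Pure function: current state plus one event gives the next state."""
    kind = event["type"]
    if kind == "order.created":
        return {"status": "created", "total": event["total"]}
    if kind == "order.paid":
        return {**state, "status": "paid"}
    if kind == "order.shipped":
        return {**state, "status": "shipped"}
    return state  # unknown event types leave the state unchanged


def replay(events):
    return reduce(apply, events, {})


log = [
    {"type": "order.created", "total": "49.99"},
    {"type": "order.paid"},
    {"type": "order.shipped"},
]

current = replay(log)            # {'status': 'shipped', 'total': '49.99'}
as_of_payment = replay(log[:2])  # {'status': 'paid', 'total': '49.99'}
```

Replaying a prefix of the log reconstructs a past state, which is the time-travel benefit. In a plain event-driven system the order row would simply hold "shipped" and the earlier states would be gone.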

Testing event-driven systems

Asynchronous systems need a deliberate testing strategy because the pieces are decoupled. Test each consumer in isolation by feeding it events and asserting on the resulting state and emitted events, which is fast and where most of your coverage should live. Test idempotency explicitly by delivering the same event twice and asserting the effect happened once — this is the test that catches the most dangerous class of bug. Add a few integration tests that exercise a full flow end to end through a real broker to catch wiring mistakes. The decoupling that makes event-driven systems resilient in production also makes them straightforward to unit test, one consumer at a time.
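An isolated consumer test, including the explicit duplicate-delivery case, could look like this. LoyaltyConsumer is a hypothetical in-memory consumer invented for the example; in a Django project the same tests would use django.test.TestCase against your real handler and models.

```python
import unittest


class LoyaltyConsumer:
    """Hypothetical consumer: awards points once per event ID."""

    def __init__(self):
        self.seen = set()
        self.points = {}

    def handle(self, event):
        if event["event_id"] in self.seen:
            return  # duplicate delivery: no effect
        self.seen.add(event["event_id"])
        user = event["user_id"]
        self.points[user] = self.points.get(user, 0) + event["points"]


class LoyaltyConsumerTests(unittest.TestCase):
    def setUp(self):
        self.consumer = LoyaltyConsumer()
        self.event = {"event_id": "evt-1", "user_id": "u1", "points": 50}

    def test_applies_event(self):
        self.consumer.handle(self.event)
        self.assertEqual(self.consumer.points, {"u1": 50})

    def test_duplicate_delivery_is_a_no_op(self):
        self.consumer.handle(self.event)
        self.consumer.handle(self.event)  # same event, delivered again
        self.assertEqual(self.consumer.points, {"u1": 50})
```

Run it with python -m unittest. The second test is the one worth writing for every handler: it fails immediately if someone removes the dedupe check.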

Summary

Event-driven Django comes down to two guarantees. Do not lose events: write each event into an outbox table in the same transaction as your business data, and let a relay publish it at-least-once — this closes the dual-write hole that silently corrupts naive implementations. Do not double-apply events: make every consumer idempotent, keyed on a stable event ID, so duplicates are harmless. Around that core, choose Redis Streams unless you truly need Kafka, version your event schemas like public APIs, key related events together for ordering, dead-letter the poison messages, and propagate correlation IDs so the whole asynchronous chain stays debuggable. Get the outbox and idempotency right and your services decouple cleanly without ever losing or duplicating a fact.