Event-driven architecture patterns that actually work
Event-driven architecture promises loose coupling and independent scalability. In practice, it introduces distributed systems problems — out-of-order delivery, duplicate events, lost events, and debugging that requires reconstructing a timeline across five services. These patterns have been refined by teams who learned those lessons. Here is what actually works and when to use it.
What an event is (and is not)
An event is a fact about something that happened in the past. "Order was placed." "Payment was processed." "User signed up." Events are immutable — they describe history, not intent. This is different from a command (which says "do this") and a query (which says "tell me this").
Good event names are past tense and domain-meaningful: order.placed, payment.failed, user.email_verified. Bad event names: update_order (imperative, describes intent rather than a fact) and send_notification (a command dressed up as an event).
Pattern 1: Domain events within a service
Start here before reaching for Kafka. Domain events within a single service give you decoupled modules without distributed systems complexity:
# Simple in-process event bus
from dataclasses import dataclass, field
from typing import Callable
from datetime import datetime

@dataclass
class DomainEvent:
    event_type: str
    aggregate_id: str
    payload: dict
    occurred_at: datetime = field(default_factory=datetime.utcnow)
class EventBus:
    def __init__(self):
        self._handlers: dict[str, list[Callable]] = {}

    def subscribe(self, event_type: str, handler: Callable) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: DomainEvent) -> None:
        for handler in self._handlers.get(event.event_type, []):
            handler(event)
# Usage
bus = EventBus()

# Notification module subscribes to order events
def send_confirmation_email(event: DomainEvent):
    order_id = event.aggregate_id
    email = event.payload["customer_email"]
    send_email(email, f"Order {order_id} confirmed")

bus.subscribe("order.placed", send_confirmation_email)
# Order service publishes
def place_order(cart: Cart) -> Order:
    order = Order.create(cart)
    order.save()
    bus.publish(DomainEvent(
        event_type="order.placed",
        aggregate_id=order.id,
        payload={"customer_email": cart.user.email, "total": order.total},
    ))
    return order
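Note that `subscribe` takes the handler as a positional argument, so it cannot serve directly as a decorator (it returns `None`). If you prefer `@bus.subscribe("order.placed")` syntax, a small variant (a sketch, separate from the bus above) returns a registering decorator instead:

```python
from typing import Callable

class DecoratorEventBus:
    """Variant of EventBus whose subscribe() works as a decorator."""

    def __init__(self):
        self._handlers: dict[str, list[Callable]] = {}

    def subscribe(self, event_type: str) -> Callable:
        # Return a decorator that registers the handler and hands it back
        # unchanged, so @bus.subscribe("order.placed") behaves as expected.
        def decorator(handler: Callable) -> Callable:
            self._handlers.setdefault(event_type, []).append(handler)
            return handler
        return decorator

    def publish(self, event) -> None:
        for handler in self._handlers.get(event.event_type, []):
            handler(event)
```

Returning the handler from the decorator keeps the function usable under its own name, so other modules can still call it directly in tests.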
Pattern 2: The outbox pattern (guaranteed delivery)
The hardest problem in event-driven systems: how do you atomically save a database record AND publish an event? If you do them separately, you can save the record but fail to publish (lost event) or publish but fail to save (inconsistent state).
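To make the failure mode concrete, here is the naive dual-write, sketched with hypothetical `db` and `publish` stand-ins (not the real session or broker client) — either step can succeed while the other fails:

```python
# The naive dual-write: two separate side effects, no shared transaction.
def place_order_naive(order, db, publish):
    db.add(order)
    db.commit()  # step 1: the order is now durable
    publish("order.placed", {"order_id": order.id})  # step 2: tell the world
    # If the process crashes (or the broker is down) between step 1 and
    # step 2, the order exists but the event is lost. Reversing the two
    # steps risks the opposite: an event for an order that was never saved.
```

No amount of retry logic around step 2 fully closes this gap, because the process can die before the retry loop even starts — which is why the fix moves the event into the same transaction as the data.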
The outbox pattern solves this by writing the event to a database table (the "outbox") in the same transaction as your business data. A separate process reads the outbox and publishes to Kafka/SQS:
-- Outbox table
CREATE TABLE outbox_events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(100) NOT NULL,
    aggregate_id UUID NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    published_at TIMESTAMPTZ,
    retry_count INT DEFAULT 0
);
# Write to DB and outbox in the same transaction
def place_order(cart: Cart, db: Session) -> Order:
    order = Order.create(cart)
    db.add(order)

    # Write event to outbox in same transaction
    outbox_event = OutboxEvent(
        aggregate_type="Order",
        aggregate_id=order.id,
        event_type="order.placed",
        payload={"customer_email": cart.user.email, "total": float(order.total)},
    )
    db.add(outbox_event)

    db.commit()  # atomic: both order and event saved or neither
    return order
# Outbox poller (runs separately)
def poll_and_publish():
    with get_db() as db:
        unpublished = db.query(OutboxEvent).filter(
            OutboxEvent.published_at.is_(None),
            OutboxEvent.retry_count < 5,
        ).order_by(OutboxEvent.created_at).limit(100).all()

        for event in unpublished:
            try:
                kafka_producer.send(
                    topic=event.event_type,
                    key=str(event.aggregate_id).encode(),
                    value=event.payload,
                )
                event.published_at = datetime.utcnow()
            except Exception:
                logger.exception("Failed to publish event %s", event.id)
                event.retry_count += 1
            db.commit()  # persist published_at / retry_count after each attempt
Pattern 3: Idempotent consumers
At-least-once delivery means the same event may arrive more than once. Your consumers must handle duplicates:
# Track which events have been processed
def handle_order_placed(event: dict):
    event_id = event["event_id"]

    # Check if already processed
    if ProcessedEvent.exists(event_id):
        logger.info("Skipping duplicate event %s", event_id)
        return

    # Process the event
    send_confirmation_email(
        to=event["customer_email"],
        order_id=event["aggregate_id"],
    )

    # Mark as processed
    ProcessedEvent.create(event_id=event_id, processed_at=datetime.utcnow())
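The check-then-create sequence above has a narrow race: two consumers can both pass the existence check before either records the event. The usual fix is to let the database arbitrate with a unique constraint — insert the marker first and treat a constraint violation as "already handled". A sketch using stdlib `sqlite3` (the table name and helper are illustrative, not part of the code above):

```python
import sqlite3

def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Atomically claim an event; returns False if it was already claimed.

    The PRIMARY KEY on event_id makes the database the arbiter, so two
    concurrent consumers can never both claim the same event.
    """
    try:
        conn.execute(
            "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery — another consumer got here first

def handle_order_placed(conn: sqlite3.Connection, event: dict) -> None:
    if not claim_event(conn, event["event_id"]):
        return
    ...  # process the event
```

Claiming before processing means a crash mid-processing loses the event for good; claiming after (as in the original) means a crash can reprocess it. Pick the side whose failure your handler tolerates better.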
Pattern 4: Event sourcing (use sparingly)
Instead of storing current state, store the full history of events. The current state is derived by replaying events. Powerful for audit trails and temporal queries; expensive to implement correctly.
# Events as the source of truth
events = [
    {"type": "order.created", "data": {"items": [...]}},
    {"type": "item.added", "data": {"item_id": "a", "qty": 2}},
    {"type": "item.removed", "data": {"item_id": "b"}},
    {"type": "coupon.applied", "data": {"code": "SAVE10"}},
]

def rebuild_order(events: list) -> Order:
    """Derive current state from event history."""
    order = Order()
    for event in events:
        if event["type"] == "order.created":
            order.items = event["data"]["items"]
        elif event["type"] == "item.added":
            order.add_item(event["data"]["item_id"], event["data"]["qty"])
        elif event["type"] == "item.removed":
            order.remove_item(event["data"]["item_id"])
        elif event["type"] == "coupon.applied":
            order.apply_coupon(event["data"]["code"])
    return order
Only use event sourcing if you genuinely need the full history. Most systems need current state 99% of the time. Outbox + Kafka gives you the benefits of event-driven architecture without the complexity of full event sourcing.
When event-driven architecture is the wrong choice
- Simple CRUD apps: Synchronous calls are fine. Events add complexity without benefit.
- When you need immediate consistency: A payment that needs to be confirmed before showing a success screen cannot be asynchronous.
- Small teams early in development: Start with synchronous services, extract to async when you have a clear scaling bottleneck.
Event-driven architecture is not better than synchronous architecture — it makes different tradeoffs. The patterns above are what you need when you have made that tradeoff deliberately and need the system to be reliable.