March 13, 2026 · API · 8 min read

How a Missing Idempotency Key Charged 12,000 Users Twice in 4 Minutes

At 11:47 PM on a Friday, our on-call Slack channel lit up with a single message: "Users are getting charged twice." Within 4 minutes, 12,000 customers had been billed a second time. $47,000 in duplicate charges. Support tickets were flooding in before we could even open our dashboards.

Production Failure

We ran a subscription-based SaaS product with a mobile app (iOS and Android) handling in-app purchases and plan upgrades. The payment flow hit our FastAPI backend, which created a Stripe payment intent and confirmed the charge.

That Friday night, a brief AWS us-east-1 latency spike hit around 11:45 PM. Nothing catastrophic — just 8 seconds of elevated response times on our payment service. But those 8 seconds were enough to trigger our mobile client's retry logic, and what followed was a textbook API design failure we'd been one bad latency spike away from for two years.

12,000 duplicate charges

$47k in 4 minutes

4 min blast radius

~22s payment p99 under load

Refunds were issued within 2 hours. The financial damage was recoverable. The trust damage took longer.

False Assumptions

The first theory was Stripe webhook replay. Our webhook processor handles payment_intent.succeeded events — maybe a duplicate event slipped through? We pulled the Stripe dashboard and ruled it out immediately: each affected user had two distinct payment intent IDs created seconds apart. Stripe wasn't replaying anything. We were creating the intents twice.

The second theory was a frontend double-submit bug. Maybe the mobile client was submitting the payment form twice on a rapid button tap? We checked the mobile release deployed that week — no changes to the payment flow, no button debouncing regressions. And the request timestamps didn't match a double-tap pattern. They were 15–30 seconds apart, every single time.

That gap — 15 to 30 seconds — was the clue we missed for 40 minutes.

Profiling the Request Trail

We pulled API gateway logs for POST /api/v1/payments/charge between 11:45 and 11:52 PM. The pattern was unmistakable:

API Gateway Log — POST /api/v1/payments/charge (11:46–11:49 PM)

user_id=83741   11:46:03.211   200 OK    (22.4s response time)
user_id=83741   11:46:18.744   200 OK    ( 8.1s response time)  ← RETRY

user_id=91052   11:46:04.889   200 OK    (19.8s response time)
user_id=91052   11:46:19.901   200 OK    ( 7.2s response time)  ← RETRY

user_id=77130   11:46:05.441   200 OK    (21.1s response time)
user_id=77130   11:46:20.503   200 OK    ( 6.4s response time)  ← RETRY

Pattern: first request exceeded client timeout (15s threshold)
         server still processed and charged — response never arrived
         second request succeeded — creating a SECOND Stripe charge

The mobile client had a hardcoded 15-second network timeout. Under normal conditions our payment endpoint responded in 3–6 seconds. During the AWS latency spike, response times ballooned to 18–24 seconds — just enough to cross the timeout threshold for thousands of concurrent users.
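As a quick check on the pattern in the log excerpt, the sketch below parses those six lines and measures the gap between each user's first request and the retry. The regular expression, the function name, and the timeout constant are illustrative choices, not necessarily the tooling used during the incident.

```python
import re
from collections import defaultdict
from datetime import datetime

# The six lines from the gateway log excerpt, without the RETRY annotations.
LOG_LINES = """\
user_id=83741   11:46:03.211   200 OK    (22.4s response time)
user_id=83741   11:46:18.744   200 OK    ( 8.1s response time)
user_id=91052   11:46:04.889   200 OK    (19.8s response time)
user_id=91052   11:46:19.901   200 OK    ( 7.2s response time)
user_id=77130   11:46:05.441   200 OK    (21.1s response time)
user_id=77130   11:46:20.503   200 OK    ( 6.4s response time)
"""

LINE_RE = re.compile(r"user_id=(\d+)\s+(\d{2}:\d{2}:\d{2}\.\d{3})")
CLIENT_TIMEOUT_S = 15.0  # the hardcoded mobile timeout


def retry_gaps(log_text):
    """Return {user_id: seconds between the first and second request}."""
    arrivals = defaultdict(list)
    for user_id, stamp in LINE_RE.findall(log_text):
        arrivals[user_id].append(datetime.strptime(stamp, "%H:%M:%S.%f"))
    return {
        user: (times[1] - times[0]).total_seconds()
        for user, times in arrivals.items()
        if len(times) >= 2
    }


gaps = retry_gaps(LOG_LINES)
for user, gap in gaps.items():
    print(f"user {user}: retry arrived {gap:.3f}s after the first request")
```

For these three users every gap falls between 15.0 and 15.6 seconds, which is what a 15-second timeout followed by one immediate retry would produce.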

The mobile HTTP library had automatic retry-on-timeout enabled by default, with a single retry attempt. The first request actually succeeded on the server (Stripe processed the charge), but the response never reached the client before it abandoned the connection. The client retried. The server processed a second, independent charge with no knowledge of the first. Both requests succeeded, and the same user was billed twice.

Without Idempotency Key — The Double-Charge Flow

Mobile Client               API Server             Stripe
     |                           |                    |
     |--- POST /charge --------->|                    |
     |                           |--- CreateIntent -->|
     |   [15s timeout fires]     |    [processing...] |
     |   [client abandons] ✗     |                    |
     |                           |<-- Intent OK ------|
     |                           |   [response lost]  |
     |                           |                    |
     |--- POST /charge (retry) ->|                    |
     |                           |--- CreateIntent -->|  ← 2nd charge!
     |                           |<-- Intent OK ------|
     |<-- 200 OK ----------------|                    |
     |                           |                    |
  User sees 1 charge.    Server processed 2.     Stripe billed twice.
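The flow above can be reproduced in a few lines. This is a toy model under stated assumptions: `FakeGateway` stands in for the payment provider, latencies are passed in as numbers instead of being waited on, and the amount is made up. It is not our production handler.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeGateway:
    """Stand-in for the payment provider: records every intent it creates."""
    intents: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def create_intent(self, user_id, amount_cents):
        intent_id = f"pi_{next(self._ids)}"
        self.intents.append((intent_id, user_id, amount_cents))
        return intent_id


def charge(gateway, user_id, amount_cents):
    """Non-idempotent handler: every call creates a fresh intent."""
    return gateway.create_intent(user_id, amount_cents)


def client_pay(gateway, user_id, amount_cents, latencies_s,
               timeout_s=15.0, max_retries=1):
    """Send the request and retry when the simulated latency exceeds the timeout.

    The server completes the charge even on a timed-out attempt; only the
    response is lost.
    """
    for attempt in range(max_retries + 1):
        intent_id = charge(gateway, user_id, amount_cents)  # side effect always happens
        if latencies_s[attempt] <= timeout_s:
            return intent_id  # this response reached the client
    return None  # every attempt timed out from the client's point of view


gateway = FakeGateway()
seen_by_client = client_pay(gateway, user_id=83741, amount_cents=999,
                            latencies_s=[22.4, 8.1])
print(seen_by_client, len(gateway.intents))
```

The client only ever sees the second intent, while the gateway has recorded two.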

Root Cause

The root cause was a non-idempotent POST endpoint on a state-mutating financial operation. Our /payments/charge handler had no mechanism to detect or reject duplicate requests. Every call created a fresh Stripe payment intent regardless of whether an identical request had been processed moments earlier.

Three compounding factors aligned that night:

Timeout miscalibration: The mobile client timeout (15s) was shorter than our p99 payment latency under load (22s). This wasn't a known gap — we'd never measured p99 under concurrent load before.
Silent retry: The HTTP library retried on timeout by default. There was no documentation in our codebase noting this behavior. A new engineer had integrated it 8 months earlier without flagging it.
No deduplication layer: The FastAPI endpoint had no request fingerprinting, no idempotency key check, not even a basic client-generated request ID field in the schema. Every POST was treated as a novel intent.

We had been lucky for two years. Normal payment latency (3–6s) kept us well below the 15-second cliff. The AWS spike was the first time production conditions exposed the gap at scale.
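The timeout miscalibration is easy to test for once latency samples exist. The sketch below uses a nearest-rank percentile and made-up sample data; the function names and the sample distribution are assumptions for illustration only.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def timeout_is_safe(latencies_s, client_timeout_s, pct=99):
    """True when the chosen percentile of observed latency stays under the client timeout."""
    return percentile(latencies_s, pct) < client_timeout_s


# Made-up samples in seconds: 97 normal requests plus a slow tail.
samples = [4.0] * 97 + [19.8, 21.1, 22.4]

print(percentile(samples, 50))                         # median looks healthy
print(percentile(samples, 99))                         # the tail tells a different story
print(timeout_is_safe(samples, client_timeout_s=15.0))
```

With these samples the median is 4.0 seconds but the p99 is 21.1 seconds, so a 15-second timeout fails the check even though the typical request looks fast.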

Architecture Fix

The fix required coordinated changes at three layers: client generation, server enforcement, and infrastructure observability.

Layer 1 — Client: UUID per payment attempt. Before initiating any payment request, the mobile app now generates an X-Idempotency-Key UUID and stores it in local state. On retry, the same key is sent. The key is cleared only after a confirmed server success or explicit user cancellation, never on timeout alone.
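The key lifecycle is the part that is easy to get wrong, so here is a minimal sketch of it. It is written in Python for consistency with the server code, not in the mobile apps' own languages, and the class and method names are hypothetical.

```python
import uuid


class PaymentAttempt:
    """Holds one idempotency key for the lifetime of a single payment attempt."""

    def __init__(self):
        self._key = None

    def key(self):
        # Generate lazily, then reuse the same value for every retry.
        if self._key is None:
            self._key = str(uuid.uuid4())
        return self._key

    def headers(self):
        return {"X-Idempotency-Key": self.key()}

    def on_timeout(self):
        # Deliberately a no-op: the server may already have processed the request.
        pass

    def on_confirmed_success(self):
        self._key = None

    def on_user_cancelled(self):
        self._key = None


attempt = PaymentAttempt()
first = attempt.headers()["X-Idempotency-Key"]
attempt.on_timeout()
retry = attempt.headers()["X-Idempotency-Key"]   # same key as the first send
attempt.on_confirmed_success()
next_purchase = attempt.headers()["X-Idempotency-Key"]  # fresh key
```

A timeout leaves the key untouched, so the retry carries the same value; only a confirmed success or a cancellation starts a new attempt with a new key.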

Layer 2 — Server: Redis deduplication before Stripe. The FastAPI endpoint checks Redis for the idempotency key before touching Stripe. If the key exists, it returns the cached response immediately — same payload, zero additional charges. If the key is new, it processes normally, stores the result with a 24-hour TTL, then responds.

payments/routes.py

import json
import uuid

from fastapi import APIRouter, Depends, Header, HTTPException
from redis.asyncio import Redis

# ChargeRequest (request schema), get_redis (Redis dependency), and
# stripe_client (async Stripe wrapper) live elsewhere in the project.

router = APIRouter()


def is_valid_uuid(value: str) -> bool:
    """Return True if the value parses as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


@router.post("/api/v1/payments/charge")
async def charge_payment(
    payload: ChargeRequest,
    x_idempotency_key: str = Header(...),
    redis: Redis = Depends(get_redis),
):
    if not is_valid_uuid(x_idempotency_key):
        raise HTTPException(status_code=400, detail="Invalid idempotency key format")

    cache_key = f"idem:{x_idempotency_key}"

    # Return cached result for duplicate requests
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Forward the same idempotency key to Stripe. This prevents a double
    # charge if our Redis write fails after Stripe succeeds, or if a retry
    # arrives before the first request has written its result to Redis.
    result = await stripe_client.create_and_confirm_intent(
        amount=payload.amount_cents,
        currency=payload.currency,
        customer_id=payload.stripe_customer_id,
        idempotency_key=x_idempotency_key,  # critical
    )

    response = {
        "payment_intent_id": result.id,
        "status": result.status,
        "amount": result.amount,
    }

    # Cache with 24-hour TTL, which covers all realistic retry windows
    await redis.setex(cache_key, 86400, json.dumps(response))

    return response

The Redis lookup adds roughly 1–2ms of overhead on the happy path — completely negligible against a payment flow that takes 3–6 seconds under normal conditions. We also forward the same idempotency key to Stripe directly, adding a second deduplication layer at the gateway level in case our Redis write ever fails after Stripe succeeds.
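One gap worth noting: a lookup followed by a later write is not atomic, so a retry that arrives while the first request is still waiting on Stripe finds nothing cached. During the incident the retries arrived about 15 seconds into requests that took around 22 seconds, so this overlap is the realistic case. Forwarding the key to Stripe covers it, but the API layer can also claim the key atomically. The sketch below assumes redis-py's `set(key, value, nx=True, ex=ttl)` semantics and uses an in-memory stand-in so it runs without a Redis server; `charge_once`, `InMemoryStore`, the marker format, and the `pi_demo` ID are all hypothetical.

```python
import asyncio
import json

IN_FLIGHT = json.dumps({"state": "in_flight"})


class InMemoryStore:
    """Stand-in for the few Redis calls used below (the TTL is accepted but ignored)."""

    def __init__(self):
        self._data = {}

    async def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None  # mirrors redis-py: None when NX blocks the write
        self._data[key] = value
        return True

    async def get(self, key):
        return self._data.get(key)

    async def delete(self, key):
        self._data.pop(key, None)


async def charge_once(store, idem_key, do_charge, ttl_s=86400):
    """Run do_charge at most once per key while the key is held."""
    cache_key = f"idem:{idem_key}"
    # Atomic claim: only the first caller for this key gets a truthy result.
    claimed = await store.set(cache_key, IN_FLIGHT, nx=True, ex=ttl_s)
    if not claimed:
        raw = await store.get(cache_key)
        record = json.loads(raw) if raw else {"state": "in_flight"}
        if record["state"] == "in_flight":
            return {"status": "processing"}  # a handler could map this to HTTP 409
        return record["response"]
    try:
        response = await do_charge()
    except Exception:
        await store.delete(cache_key)  # release the claim so a retry can proceed
        raise
    await store.set(cache_key, json.dumps({"state": "done", "response": response}), ex=ttl_s)
    return response


async def demo():
    store = InMemoryStore()
    calls = []

    async def do_charge():
        calls.append("charged")
        await asyncio.sleep(0.01)  # pretend the provider is slow
        return {"payment_intent_id": "pi_demo", "status": "succeeded"}

    # The original request and its retry overlap.
    first, retry = await asyncio.gather(
        charge_once(store, "abc123", do_charge),
        charge_once(store, "abc123", do_charge),
    )
    late_retry = await charge_once(store, "abc123", do_charge)
    return calls, first, retry, late_retry


calls, first, retry, late_retry = asyncio.run(demo())
```

The overlapping retry gets a "processing" answer instead of triggering a second charge, and a retry that arrives after completion receives the stored response.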

Layer 3 — Observability: Alert before clients time out. We added a p99 latency alarm on the payment service that fires at 20 seconds, which is 25 seconds below the new 45-second client timeout (raised from the original 15 seconds). The goal is to catch degradation before retry conditions can occur, not just after the damage is done.
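The relationship between the alarm and the client timeout can be kept honest with a small configuration check. This is a sketch with hypothetical names; the two numbers are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyPolicy:
    client_timeout_s: float
    alarm_threshold_s: float

    def validate(self):
        # An alarm at or above the timeout fires only after retries have begun.
        if self.alarm_threshold_s >= self.client_timeout_s:
            raise ValueError("alarm must fire below the client timeout")

    def headroom_s(self):
        """Seconds between the alarm firing and clients starting to time out."""
        return self.client_timeout_s - self.alarm_threshold_s

    def should_alert(self, observed_p99_s):
        return observed_p99_s >= self.alarm_threshold_s


policy = LatencyPolicy(client_timeout_s=45.0, alarm_threshold_s=20.0)
policy.validate()
print(policy.headroom_s())        # 25.0
print(policy.should_alert(22.0))  # True
```

Running `validate()` in CI or at service startup would reject a configuration like the pre-incident one, where any 20-second alarm would have sat above the 15-second client timeout.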

With Idempotency Key — Safe Retry Flow

Mobile Client               API Server                  Redis           Stripe
     |                           |                        |                |
     | X-Idempotency-Key: abc123 |                        |                |
     |--- POST /charge --------->|                        |                |
     |                           |--- GET idem:abc123 --->|                |
     |                           |<-- (nil) --------------|                |
     |                           |--- CreateIntent (key=abc123) ---------->|
     |   [timeout, retry]        |                        |  [processing]  |
     |                           |<-- OK ----------------------------------|
     |                           |--- SET idem:abc123 --->|                |
     |                           |                        |                |
     | X-Idempotency-Key: abc123 |                        |                |
     |--- POST /charge (retry) ->|                        |                |
     |                           |--- GET idem:abc123 --->|                |
     |                           |<-- (cached) -----------|                |
     |<-- 200 (cached response) -|                        |                |
     |                           |                        |                |
  One charge. One intent.   Stripe billed once. Client satisfied.

Lessons Learned

Any endpoint that moves money, sends a notification, or mutates critical state must be structurally idempotent: enforced in code, not assumed from client behavior.

Measure p99 under load, not p50 at rest. Our 15-second timeout looked safe against a 3-second average. It was not safe against a 22-second p99 during a 2,000 req/s payment spike. We now benchmark p95 and p99 for every external-facing endpoint and document them alongside timeout configuration.
Retry logic and idempotency are a package deal. Any HTTP client that retries on failure must send idempotency keys. We added a CI lint rule in our mobile codebase: any POST to a payment, notification, or order endpoint without an X-Idempotency-Key header is a build error.
Forward idempotency keys to downstream services. Our Redis cache deduplicates at the API layer, but Stripe also supports idempotency keys natively. Forwarding the same key to Stripe adds a second safety net for the race condition where our Redis write fails after a successful Stripe charge.
Alert below your timeout thresholds. A p99 latency alarm at 20 seconds (below the 45-second client timeout) gives us a 25-second window to intervene before retry storms become possible. Previously we had no payment latency alarm at all.
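The lint rule from the second lesson can be approximated as a check over declared requests. The sketch below is an assumption about how such a rule might look, written in Python and not in the mobile build tooling; only the payments prefix comes from this post, and the other paths are hypothetical.

```python
PROTECTED_PREFIXES = (
    "/api/v1/payments",       # endpoint family from this post
    "/api/v1/notifications",  # hypothetical path
    "/api/v1/orders",         # hypothetical path
)
REQUIRED_HEADER = "x-idempotency-key"


def missing_idempotency_key(method, path, headers):
    """True when a POST to a protected endpoint lacks a non-empty idempotency header."""
    if method.upper() != "POST" or not path.startswith(PROTECTED_PREFIXES):
        return False
    present = {name.lower() for name, value in headers.items() if value}
    return REQUIRED_HEADER not in present


def check_requests(requests):
    """Return the (method, path) pairs that should fail the build."""
    return [(m, p) for m, p, h in requests if missing_idempotency_key(m, p, h)]


declared = [
    ("POST", "/api/v1/payments/charge", {"X-Idempotency-Key": "abc123"}),
    ("POST", "/api/v1/payments/charge", {}),
    ("GET", "/api/v1/payments/history", {}),
]
violations = check_requests(declared)
print(violations)
```

Only the POST without the header is reported; reads and correctly keyed writes pass.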

The refunds landed in 2 hours. The architectural fix shipped 48 hours later. The real lesson wasn't about idempotency keys — it was that two years of fast payment responses had masked a structural flaw in our API contract. One AWS hiccup was all it took.

— Built from a real production incident. Dollar figures and user counts are approximate but structurally accurate.