How Rotating a JWT Secret Logged Out 34,000 Users and Exposed a Session Design Flaw
March 13, 2026 · Security · 10 min read

At 10:14 AM on a Tuesday, I rotated our JWT signing secret as part of a scheduled security review. By 10:16 AM, our support inbox had 47 new tickets. By 10:30 AM, we had confirmed that all 34,000 active users had been forcibly logged out at the same moment. By noon, we had identified a session architecture flaw that had been silently waiting to cause exactly this failure for three years.

Production Failure

The rotation itself took seconds. Update the JWT_SECRET environment variable, restart the API pods, done. Every security checklist says to rotate secrets regularly. What no checklist warned me about: when your entire authentication system uses a single symmetric HMAC secret for both signing and verification, rotating that secret instantly invalidates 100% of active tokens. There is no grace period. There is no overlap. Every user currently logged in becomes unauthenticated the moment the new pods come up.

34,000 users logged out instantly
2 min from rotation to support flood
847 support tickets in 90 minutes
0 seconds of warning to users
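
To make the "no grace period" point concrete, here is a minimal sketch of HS256 signing and verification written directly against Node's crypto module instead of the jsonwebtoken library the service uses. The secrets and the helper names (b64url, hs256, sign, isValid) are invented for illustration, and the sketch checks only the signature, not exp.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Base64url-encode a JSON value, the way JWT headers and payloads are encoded.
function b64url(value: object): string {
  return Buffer.from(JSON.stringify(value)).toString('base64url');
}

// HS256 signature over "header.payload" with the given secret.
function hs256(signingInput: string, secret: string): string {
  return createHmac('sha256', secret).update(signingInput).digest('base64url');
}

function sign(payload: object, secret: string): string {
  const input = `${b64url({ alg: 'HS256', typ: 'JWT' })}.${b64url(payload)}`;
  return `${input}.${hs256(input, secret)}`;
}

function isValid(token: string, secret: string): boolean {
  const [header, payload, sig] = token.split('.');
  const expected = Buffer.from(hs256(`${header}.${payload}`, secret));
  const actual = Buffer.from(sig ?? '');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Hypothetical secrets, for illustration only.
const token = sign({ sub: 'user_1041' }, 'old-secret');
const validBeforeRotation = isValid(token, 'old-secret'); // true
const validAfterRotation = isValid(token, 'new-secret');  // false
```

The token itself never changes. Only the secret the server recomputes the HMAC with does, which is why every outstanding token fails at the same instant.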

False Assumptions: "It's Just a Config Change"

The team's mental model was: rotate secret → old tokens invalid → users re-authenticate → business as usual. On paper, that's correct. In practice, it ignored three things:

  • Scale of simultaneous impact. 34,000 active sessions at 10 AM on a Tuesday is not a slow trickle of re-logins. It's a synchronised stampede. Every user hitting any authenticated endpoint got a 401. Our rate limiter — which was tuned for attack traffic — tripped on the login surge and started blocking legitimate re-authentication attempts.
  • State lost in long sessions. Several enterprise users were mid-way through multi-step workflows (bulk uploads, report generation) that had no session persistence. Their work was gone.
  • Mobile apps don't re-auth gracefully. Our mobile apps had no interceptor to handle 401s and redirect to login. They showed a blank screen or a cryptic error, not a login prompt.

"Why is everyone suddenly logged out? Did we get hacked?" (first support ticket, 10:16 AM)

The irony: the rotation was preventing a potential hack. But users can't tell the difference between a security measure and a security breach if both look identical from the outside.

Investigation: Finding the Actual Architecture Flaw

The immediate fix was obvious — rotate back to the old secret, restore sessions, then plan the rotation properly. But while doing the post-mortem we found something worse than the incident itself: the JWT architecture had no revocation mechanism. At all.

Tokens were signed with HS256 (symmetric HMAC). The secret was a single value. There was no token version field, no jti (JWT ID) claim, no token family tracking, and no server-side session store. Once a token was issued, the only way to invalidate it was to rotate the secret — which, as we'd just learned, was a nuclear option.

  ORIGINAL ARCHITECTURE (symmetric, single secret)
  ──────────────────────────────────────────────────────────────────

  Login                     API Request                  Rotation
  ──────                    ───────────                  ─────────
  User logs in              Client sends JWT             Secret changes
       │                         │                            │
       ▼                         ▼                            ▼
  Server signs JWT          Server verifies:            ALL tokens
  with JWT_SECRET           HMAC(header.payload,        immediately
       │                    JWT_SECRET) == sig?          invalid
       ▼                         │                            │
  Token issued               ✅ YES → serve            34,000 users
  (no expiry tracking)       ❌ NO  → 401              logged out
  (no revocation list)
  (no token version)


  TOKEN STRUCTURE (missing critical fields)
  ──────────────────────────────────────────────────────────────────

  {
    "sub": "user_1041",
    "iat": 1773190000,
    "exp": 1773276400,
    ← no "jti" (token ID)
    ← no "ver" (secret version)
    ← no "fam" (token family for rotation)
  }

The missing jti claim meant we couldn't revoke individual tokens (no way to track "this specific token has been invalidated"). The missing version field meant we couldn't support multiple active secrets simultaneously during a rotation window.
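
As a rough illustration of what a jti claim makes possible, the sketch below keeps a denylist of revoked token IDs. It is an assumption about how such a list could look, not something the original system had: RevocationList and its methods are hypothetical names, and a real deployment would more likely keep this state in a shared store than in process memory.

```typescript
// In-memory denylist keyed by jti. The Map only illustrates the bookkeeping;
// it would not survive a restart or be shared across API pods.
class RevocationList {
  private readonly revoked = new Map<string, number>(); // jti -> exp (Unix seconds)

  // Remember the jti until the token's own expiry. After that point the
  // token is rejected by the normal exp check, so the entry is not needed.
  revoke(jti: string, expSeconds: number): void {
    this.revoked.set(jti, expSeconds);
  }

  // Callers are expected to check exp separately; this only answers
  // "was this specific, still-live token revoked?"
  isRevoked(jti: string, nowSeconds: number): boolean {
    const exp = this.revoked.get(jti);
    if (exp === undefined) return false;
    if (exp <= nowSeconds) {
      this.revoked.delete(jti); // expired on its own, drop the entry
      return false;
    }
    return true;
  }

  size(): number {
    return this.revoked.size;
  }
}

// Example with a made-up jti and the exp value from the token above.
const denylist = new RevocationList();
denylist.revoke('jti-example-1', 1773276400);
const revokedNow = denylist.isRevoked('jti-example-1', 1773190000); // true
```

Because entries only need to live as long as the token they block, the list stays bounded by the number of revocations within one token lifetime.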

Root Cause: Symmetric HMAC + No Rotation Strategy = A Ticking Clock

HS256 with a single secret is not inherently wrong for small systems. The flaw was that we had scaled to 34,000 active users without ever designing a rotation strategy. The secret had never been rotated in three years of production — which meant if it had ever been compromised, an attacker could have been forging tokens for up to three years without us knowing. The rotation was correct. The architecture that made it painful was the problem.

auth/jwt.ts — before and after
// BEFORE — single secret, no version, no rotation support
import jwt from 'jsonwebtoken';

export function signToken(userId: string) {
  return jwt.sign(
    { sub: userId },
    process.env.JWT_SECRET!,
    { expiresIn: '24h' }
  );
}

export function verifyToken(token: string) {
  // Only works with current secret — instant global logout on rotation
  return jwt.verify(token, process.env.JWT_SECRET!);
}


// AFTER — versioned secrets, overlap window, graceful rotation
import crypto from 'node:crypto'; // explicit import for crypto.randomUUID()

const secrets: Record<string, string> = {
  [process.env.JWT_SECRET_VERSION!]: process.env.JWT_SECRET!,
  // During rotation: keep previous version active for overlap window
  ...(process.env.JWT_SECRET_PREV_VERSION && process.env.JWT_SECRET_PREV
    ? { [process.env.JWT_SECRET_PREV_VERSION]: process.env.JWT_SECRET_PREV }
    : {}),
};

export function signToken(userId: string) {
  const version = process.env.JWT_SECRET_VERSION!;
  return jwt.sign(
    {
      sub: userId,
      ver: version,           // which secret version signed this
      jti: crypto.randomUUID(), // unique token ID for future revocation
    },
    secrets[version],
    { expiresIn: '24h' }
  );
}

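export function verifyToken(token: string) {
  // Decode without verification only to read the version claim. Nothing in
  // the decoded payload is trusted until jwt.verify succeeds below.
  const decoded = jwt.decode(token) as { ver?: string } | null;
  // Tokens issued before versioning carry no ver claim; treat them as current.
  const version = decoded?.ver ?? process.env.JWT_SECRET_VERSION!;
  const secret = secrets[version];

  if (!secret) {
    throw new Error('Token signed with unknown secret version');
  }

  // Verify with the matching versioned secret, pinned to the HS256 algorithm
  return jwt.verify(token, secret, { algorithms: ['HS256'] });
}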

Architecture Fix: Versioned Secrets + Overlap Window + Client-Side 401 Handling

We made four changes. The first, versioned secrets with an overlap window, is the one shown in the code above. We chose it over switching to RS256 (asymmetric) because it required zero infrastructure changes: no key management service, no JWKS endpoint, no public key distribution to clients. The versioned HMAC approach gave us graceful rotation with a small set of environment variable changes and a 24-hour overlap window, which matches the 24-hour token lifetime.

  NEW ROTATION PROCESS (versioned secrets, 24h overlap)
  ──────────────────────────────────────────────────────────────────

  Day 0: Normal operation
  ─────────────────────────────────────────────
  JWT_SECRET_VERSION = "v3"
  JWT_SECRET         = "secret-v3"
  (no PREV vars set)

  All tokens signed with v3. Verified with v3. ✓


  Day 1: Rotation starts
  ─────────────────────────────────────────────
  JWT_SECRET_VERSION      = "v4"       ← new version
  JWT_SECRET              = "secret-v4" ← new secret
  JWT_SECRET_PREV_VERSION = "v3"       ← keep old version
  JWT_SECRET_PREV         = "secret-v3" ← keep old secret

  New tokens: signed with v4
  Old tokens (ver=v3): still verified with v3 ✓
  Users stay logged in through the overlap ✓


  Day 2: Rotation complete (remove PREV vars)
  ─────────────────────────────────────────────
  JWT_SECRET_VERSION = "v4"
  JWT_SECRET         = "secret-v4"
  (PREV vars removed; v3 tokens have expired naturally)

  Zero forced logouts. Zero support tickets. ✓
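
The Day 2 step is only safe once every token signed with the previous secret has expired. A small check along the following lines could back that up in a runbook. The function names are hypothetical, and it assumes rotatedAt is the moment the last pod stopped signing with the old secret, which during a rolling restart is later than the moment the variables were changed.

```typescript
// Token lifetime from the article: tokens are issued with expiresIn '24h'.
const TOKEN_TTL_SECONDS = 24 * 60 * 60;

// Earliest Unix time (seconds) at which no token signed with the previous
// secret can still be unexpired, given that signing switched at rotatedAt.
function previousSecretRetirableAt(
  rotatedAt: number,
  ttl: number = TOKEN_TTL_SECONDS,
): number {
  return rotatedAt + ttl;
}

// True once the PREV variables can be removed without logging anyone out.
function canRemovePrevious(
  rotatedAt: number,
  now: number,
  ttl: number = TOKEN_TTL_SECONDS,
): boolean {
  return now >= previousSecretRetirableAt(rotatedAt, ttl);
}
```

For example, a rotation at 1773190000 gives a retirement time of 1773276400, exactly 86,400 seconds later. Adding a margin for clock skew between pods would be a reasonable extension.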

The other three changes:

  • Mobile 401 interceptor. Both iOS and Android clients now have an Axios/URLSession interceptor that catches 401 responses, clears the stored token, and navigates to the login screen with a human-readable message rather than a blank screen.
  • Rate limiter carve-out for /auth/login. The login endpoint is now exempt from the general rate limiter and has its own dedicated limit with exponential back-off per IP — so legitimate re-auth surges don't get blocked alongside actual brute-force attempts.
  • Rotation runbook. A documented procedure in the team wiki: what to set, what to monitor, how long to keep the overlap window, and how to verify the rotation completed cleanly before removing the previous secret.
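
The dedicated login limit with per-IP exponential back-off could be sketched roughly as follows. This is an assumption about the shape of such a limiter, not the team's implementation: LoginBackoff, the delay constants, and the in-memory Map are all illustrative.

```typescript
// Illustrative values, not taken from the article.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 15 * 60 * 1000;

interface BackoffEntry {
  failures: number;
  blockedUntil: number; // epoch milliseconds
}

// Per-IP back-off for /auth/login. Each failed attempt doubles the wait
// before the next attempt is accepted; a successful login resets it.
class LoginBackoff {
  private readonly entries = new Map<string, BackoffEntry>();

  // True if this IP may attempt a login at time nowMs.
  allow(ip: string, nowMs: number): boolean {
    const entry = this.entries.get(ip);
    return entry === undefined || nowMs >= entry.blockedUntil;
  }

  // Record a failed login and return the delay now imposed on this IP.
  recordFailure(ip: string, nowMs: number): number {
    const failures = (this.entries.get(ip)?.failures ?? 0) + 1;
    const delay = Math.min(BASE_DELAY_MS * 2 ** (failures - 1), MAX_DELAY_MS);
    this.entries.set(ip, { failures, blockedUntil: nowMs + delay });
    return delay;
  }

  recordSuccess(ip: string): void {
    this.entries.delete(ip);
  }
}
```

The design point is that the penalty is driven by failures, not by request volume. A user who logs back in successfully after a forced logout never accumulates a delay, while repeated wrong guesses from one address slow down quickly.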

Lessons Learned

  • A secret you've never rotated is a secret you can't safely rotate. If your first rotation is also your first time discovering whether the rotation is safe, you will have an incident. Rotation drills belong in staging, not as a surprise in production.
  • Symmetric JWT + single secret has no graceful degradation. It works fine until you need to rotate, revoke, or audit. Adding a version claim costs nothing and buys everything.
  • Rate limiters need to know about planned surges. A login surge caused by a forced logout looks identical to a credential-stuffing attack. Plan for both patterns differently.
  • Mobile clients need a 401 strategy, not just a 401 response. A bare 401 is a complete session failure on mobile. The client must know how to recover, not just fail.
  • Security work can feel like an outage to users. Communicate. Even a brief status page update saying "we rotated security credentials, please log in again" would likely have cut the support ticket volume substantially.
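
The 401 strategy from the list above comes down to a small piece of client logic. The sketch below shows only that logic, in TypeScript for consistency with the rest of the post. SessionHooks, handleAuthFailure, and the message text are hypothetical, and a native iOS or Android client would express the same steps in its own language and HTTP stack.

```typescript
// Hooks the app would supply; all names here are hypothetical.
interface SessionHooks {
  clearToken(): void;               // wipe the stored JWT
  showLogin(message: string): void; // navigate to login with a notice
}

const SESSION_EXPIRED_MESSAGE = 'Your session has expired. Please log in again.';

// Intended to be called from the HTTP client's response hook, for example
// an Axios response interceptor or a URLSession completion handler.
// Returns true when the response was handled as an expired session.
function handleAuthFailure(status: number, session: SessionHooks): boolean {
  if (status !== 401) return false;
  session.clearToken();
  session.showLogin(SESSION_EXPIRED_MESSAGE);
  return true;
}
```

Keeping the decision in one function means every request path recovers the same way, instead of each screen deciding what a 401 means.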

The rotation was right. The architecture that made it painful was three years in the making.