01 Goal
Replace the current Rails-style session cookie with short-lived JWTs (15-min access + 7-day refresh) so we can scale the auth service horizontally without sticky sessions.
Non-goal: introducing OAuth providers. Out of scope for this iteration.
02 Architecture
flowchart LR
  classDef client fill:#161b22,stroke:#58a6ff,color:#e6edf3
  classDef edge fill:#1f1a0a,stroke:#d29922,color:#ffa657
  classDef app fill:#0a1f12,stroke:#3fb950,color:#7ee787
  classDef store fill:#1c2128,stroke:#8b949e,color:#c9d1d9
  user["Browser"]:::client
  cdn["Edge (CDN)"]:::edge
  api["API gateway<br/>verifies JWT"]:::app
  auth["auth-service<br/>issues + refreshes"]:::app
  cache["KV cache<br/>refresh tokens"]:::store
  user --> cdn --> api
  api -. on 401 .-> auth
  auth <--> cache
  auth -- "access + refresh" --> user
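The refresh path in the diagram can be sketched roughly as below. This is a minimal illustration, not the planned implementation: `AuthService`, `gateway_check`, and `request_with_refresh` are hypothetical names, a plain dict stands in for the KV cache, signing is omitted, and the sketch assumes the caller retries once after a 401 (the diagram does not say which component drives the retry).

```python
import secrets
import time

ACCESS_TTL_S = 15 * 60            # 15-minute access token (from the Goal section)
REFRESH_TTL_S = 7 * 24 * 60 * 60  # 7-day refresh token (from the Goal section)


class AuthService:
    """Hypothetical stand-in for auth-service; a dict plays the KV cache."""

    def __init__(self):
        self.kv = {}  # refresh token -> (user id, expiry timestamp)

    def login(self, user_id, now=None):
        now = time.time() if now is None else now
        refresh = secrets.token_urlsafe(32)
        self.kv[refresh] = (user_id, now + REFRESH_TTL_S)
        return self._access(user_id, now), refresh

    def refresh(self, refresh_token, now=None):
        now = time.time() if now is None else now
        entry = self.kv.get(refresh_token)
        if entry is None or entry[1] <= now:
            return None  # unknown or expired: the user must log in again
        return self._access(entry[0], now)

    @staticmethod
    def _access(user_id, now):
        # Claims only; a real token would be signed (ES256 in this plan).
        return {"sub": user_id, "iat": int(now), "exp": int(now) + ACCESS_TTL_S}


def gateway_check(access, now):
    """Return 200 if the access token is present and unexpired, else 401."""
    return 200 if access is not None and access["exp"] > now else 401


def request_with_refresh(auth, access, refresh_token, now):
    """Try the request once; on 401, refresh and retry a single time."""
    status = gateway_check(access, now)
    if status == 401:
        access = auth.refresh(refresh_token, now)
        status = gateway_check(access, now)
    return status, access
```

Because the gateway only checks the token itself, it needs no shared session state; only the refresh call touches the cache.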
03 Phases
| Phase | Owner | Effort | Outcome |
|---|---|---|---|
| 1 — Issue JWT alongside cookies | alice | 3 days | Dual-write. No user-visible change. |
| 2 — API gateway accepts both | bob | 2 days | Either auth source works. Metrics on use. |
| 3 — Frontend switches to JWT | alice | 4 days | JWT preferred; cookies still emitted for fallback. |
| 4 — Shadow 14 days | — | 14 days | Watch for token-refresh edge cases at scale. |
| 5 — Stop emitting cookies | bob | 1 day | Cleanup. PR removes the dual-write code. |
04 Token format
{
  "iss": "auth.example.com",
  "sub": "u_a1b2c3d4",
  "iat": 1715900000,
  "exp": 1715900900,
  "roles": ["user", "billing"],
  "tenant": "acme-corp"
}
Signed with ES256 (P-256). The signing key is rotated quarterly; verifiers accept both the current and the previous public key during rotation.
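As a rough sketch of how the payload and the rotation rule fit together, the snippet below builds the claims with a 15-minute lifetime and tracks which keys a verifier would accept. `build_claims` and `KeyRing` are hypothetical names, and identifying keys by a `kid` value is an assumption (the plan does not say how keys are identified). Actual ES256 signing and verification are left out, since they need a crypto library.

```python
import time

ACCESS_TTL_S = 15 * 60  # access-token lifetime: exp - iat = 900 seconds


def build_claims(user_id, roles, tenant, now=None):
    """Build the access-token payload in the format shown above."""
    now = int(time.time() if now is None else now)
    return {
        "iss": "auth.example.com",
        "sub": user_id,
        "iat": now,
        "exp": now + ACCESS_TTL_S,
        "roles": list(roles),
        "tenant": tenant,
    }


class KeyRing:
    """Tracks which key IDs verifiers accept (key material omitted).

    Assumption: keys are identified by a `kid` string.
    """

    def __init__(self, current_kid):
        self.current = current_kid
        self.previous = None

    def rotate(self, new_kid):
        # The outgoing key stays acceptable until the next rotation.
        self.previous, self.current = self.current, new_kid

    def accepts(self, kid):
        return kid is not None and kid in (self.current, self.previous)
```

Keeping exactly one previous key means a token signed just before a rotation stays verifiable for its remaining lifetime, while anything signed two rotations ago is rejected.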
05 Rollback
The dual-write phase makes rollback trivial: at any point through phase 4, a single config flag makes the API gateway prefer cookies again. Phase 5 (cleanup) removes the dual-write code, so it runs only after the shadow period passes without incident.
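One way the gateway's preference flag could work is sketched below. `Credentials`, `resolve_identity`, and `prefer_cookies` are hypothetical names; the sketch assumes both credentials have already been validated upstream and only decides which source wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credentials:
    jwt_subject: Optional[str] = None     # subject of an already-verified JWT
    cookie_subject: Optional[str] = None  # user from an already-validated session cookie


def resolve_identity(creds, prefer_cookies=False):
    """Return (subject, source), or (None, None) if no credential is present.

    prefer_cookies=False is the phase 3-4 default (JWT first);
    flipping it to True is the rollback switch.
    """
    order = [("cookie", creds.cookie_subject), ("jwt", creds.jwt_subject)]
    if not prefer_cookies:
        order.reverse()
    for source, subject in order:
        if subject is not None:
            return subject, source
    return None, None
```

Returning the source alongside the subject also gives the gateway something to count for the phase 2 usage metrics.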
06 Risks
- Refresh-token theft: mitigated by IP+UA fingerprint mismatch detection (rotate on mismatch, invalidate old).
- Clock skew: accept tokens with `nbf` up to 30s in the future to tolerate small drifts.
- Logout sync: cookies invalidate server-side; JWT requires a deny-list in cache for 15 min after explicit logout.
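The clock-skew and logout mitigations can be sketched as follows. This is an illustration under assumptions: `time_valid` and `DenyList` are hypothetical names, a dict stands in for the cache, and the deny-list is keyed by a hypothetical `token_id`, since the sample token carries no explicit token identifier. The sample token also has no `nbf` claim, so the check is skipped when the claim is absent.

```python
NBF_LEEWAY_S = 30     # tolerated clock skew for nbf
DENY_TTL_S = 15 * 60  # matches the access-token lifetime


def time_valid(claims, now):
    """Accept a token whose nbf is at most 30s ahead and whose exp has not passed."""
    nbf = claims.get("nbf")
    if nbf is not None and nbf > now + NBF_LEEWAY_S:
        return False
    return now < claims["exp"]


class DenyList:
    """In-memory stand-in for the cache-backed deny-list."""

    def __init__(self):
        self._until = {}  # token_id -> time after which the entry can be dropped

    def revoke(self, token_id, now):
        # Called on explicit logout.
        self._until[token_id] = now + DENY_TTL_S

    def is_revoked(self, token_id, now):
        until = self._until.get(token_id)
        if until is None:
            return False
        if now >= until:
            # The token has expired on its own by now, so the entry is no longer needed.
            del self._until[token_id]
            return False
        return True
```

An entry never needs to outlive 15 minutes because any access token issued before the logout will have expired by then, which keeps the deny-list small.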