Security [HIGH]: MFA TOTP replay-race + missing backup codes #43

Closed
opened 2026-04-16 22:05:09 +02:00 by Hartmut · 1 comment
Owner

Problem

(1) TOTP validation in auth.ts is SELECT → validate → UPDATE without atomic CAS. Two parallel requests with the same valid code both see lastTotpAt=null and both succeed → MFA replay-within-window. (2) No backup/recovery codes exist — lost device forces admin to disable MFA, a security regression.

Evidence

  • apps/web/src/server/auth.ts:151,134,174 — non-atomic read/validate/write sequence
  • packages/api/src/router/user-self-service-procedure-support.ts:214-224,276-287 — same pattern
  • Grep backupCode|recoveryCode → 0 hits

Impact

(1) Stolen TOTP code (shoulder-surf, phishing proxy) usable twice within 30 s window — MFA design promise violated. (2) Lost-device MFA recovery requires admin to disable MFA → temporary 1FA state is routine.

Proposed Fix

(1) Atomic CAS: prisma.user.updateMany({ where: { id, OR: [{lastTotpAt: null}, {lastTotpAt: {lt: windowStart}}] }, data: { lastTotpAt: now }}) — rows-affected=0 → replay. (2) Add 8-10 backup codes (argon2-hashed, single-use) generated at MFA enablement, displayed once, confirmed consumed in DB.

Acceptance Criteria

  • Parallel TOTP requests with same code: exactly one succeeds (race test)
  • Backup codes: generated on enable, shown once, consumed on use, regeneratable
  • docs/security-architecture.md updated

Parent Epic: #1
Source: Full-Codebase Security Audit 2026-04-16 (A-7, A-8)

## Problem (1) TOTP validation in auth.ts is SELECT → validate → UPDATE without atomic CAS. Two parallel requests with the same valid code both see `lastTotpAt=null` and both succeed → MFA replay-within-window. (2) No backup/recovery codes exist — lost device forces admin to disable MFA, a security regression. ## Evidence - `apps/web/src/server/auth.ts:151,134,174 — non-atomic read/validate/write sequence` - `packages/api/src/router/user-self-service-procedure-support.ts:214-224,276-287 — same pattern` - `Grep `backupCode|recoveryCode` → 0 hits` ## Impact (1) Stolen TOTP code (shoulder-surf, phishing proxy) usable twice within 30 s window — MFA design promise violated. (2) Lost-device MFA recovery requires admin to disable MFA → temporary 1FA state is routine. ## Proposed Fix (1) Atomic CAS: `prisma.user.updateMany({ where: { id, OR: [{lastTotpAt: null}, {lastTotpAt: {lt: windowStart}}] }, data: { lastTotpAt: now }})` — rows-affected=0 → replay. (2) Add 8-10 backup codes (argon2-hashed, single-use) generated at MFA enablement, displayed once, confirmed consumed in DB. ## Acceptance Criteria - [ ] Parallel TOTP requests with same code: exactly one succeeds (race test) - [ ] Backup codes: generated on enable, shown once, consumed on use, regeneratable - [ ] docs/security-architecture.md updated --- Parent Epic: #1 Source: Full-Codebase Security Audit 2026-04-16 (A-7, A-8)
Hartmut added the security label 2026-04-16 22:05:09 +02:00
Author
Owner

Part 1 — TOTP replay race — resolved in commit 3222bec (security: atomic compare-and-swap for TOTP replay window).

  • New helper packages/api/src/lib/totp-consume.ts::consumeTotpWindow() runs a single user.updateMany({ where: { id, OR: [{ lastTotpAt: null }, { lastTotpAt: { lt: windowStart } }] }, data: { lastTotpAt: now } }). Postgres row-locks that one statement, so two concurrent verifies serialize and exactly one receives { count: 1 }.
  • Three call-sites migrated: apps/web/src/server/auth.ts (login path), packages/api/src/router/user-self-service-procedure-support.ts (verifyAndEnableTotp + verifyTotp).
  • Coverage: packages/api/src/lib/__tests__/totp-consume.test.ts including a simulated race.

Part 2 — Backup codes — deferred. Tracked as a follow-up; will need a Prisma migration (totpBackupCodes String[] + hashed storage), recovery-code UI, and backend consume-on-use logic. Not blocking deploy.

Closing Part 1; part 2 will open as a new ticket.

**Part 1 — TOTP replay race — resolved** in commit 3222bec (`security: atomic compare-and-swap for TOTP replay window`). - New helper `packages/api/src/lib/totp-consume.ts::consumeTotpWindow()` runs a single `user.updateMany({ where: { id, OR: [{ lastTotpAt: null }, { lastTotpAt: { lt: windowStart } }] }, data: { lastTotpAt: now } })`. Postgres row-locks that one statement, so two concurrent verifies serialize and exactly one receives `{ count: 1 }`. - Three call-sites migrated: `apps/web/src/server/auth.ts` (login path), `packages/api/src/router/user-self-service-procedure-support.ts` (verifyAndEnableTotp + verifyTotp). - Coverage: `packages/api/src/lib/__tests__/totp-consume.test.ts` including a simulated race. **Part 2 — Backup codes — deferred.** Tracked as a follow-up; will need a Prisma migration (`totpBackupCodes String[]` + hashed storage), recovery-code UI, and backend consume-on-use logic. Not blocking deploy. Closing Part 1; part 2 will open as a new ticket.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Hartmut/CapaKraken#43