CapaKraken/docs/security-architecture.md
Hartmut 01c45d0344 security: align client password policy with server, enforce AUTH_SECRET length + entropy (#56)
Client-side validators (reset-password, invite-accept, first-admin setup,
user-create modal) previously checked password.length < 8 while every
server-side Zod schema required .min(12). External API consumers (or a
confused browser UI) could get past the client check but fail at the tRPC
boundary — or worse, quietly under-enforce policy compared to what
admins expect.

Fix: introduce PASSWORD_MIN_LENGTH (12) and PASSWORD_MAX_LENGTH (128) in
@capakraken/shared and import them from every pre-submit client validator
and every server Zod schema. Single source of truth; drift becomes a
compile error rather than a security finding.

Also hardens the AUTH_SECRET runtime check: in addition to the existing
placeholder-blacklist, production startup now rejects secrets shorter
than 32 chars OR with Shannon entropy below 3.5 bits/char. That covers
low-entropy-but-long values like "aaaa..." (38 chars, entropy 0) which
would have passed the previous checks.

Documented the rotation process for AUTH_SECRET + POSTGRES_PASSWORD in
docs/security-architecture.md §3.

Verified:
- pnpm test:unit — 396 files / 1922 tests passed
- pnpm --filter @capakraken/web exec tsc --noEmit — clean
- pnpm --filter @capakraken/api exec tsc --noEmit — clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-17 14:56:43 +02:00


Security Architecture — CapaKraken

Version: 1.0 | Date: 2026-03-27


1. Authentication

  • Auth.js v5 (NextAuth) with Credentials provider
  • Password hashing: Argon2id via @node-rs/argon2 (memory cost 65536, time cost 3)
  • Multi-Factor Authentication: TOTP (RFC 6238) via otpauth library
    • Configurable per user (enable/disable via admin or self-service)
    • 30-second window, SHA-1, 6-digit codes with 1-step tolerance
  • Rate limiting: 5 login attempts per 15 minutes per email address (in-memory sliding window)
  • Session strategy: JWT with server-side validation
    • Absolute timeout: 8 hours (configurable via sessionMaxAge)
    • Idle timeout: 30 minutes (configurable via sessionIdleTimeout)
  • Concurrent session limit: configurable maxConcurrentSessions (default 3), kick-oldest strategy
  • Login/logout audit: all authentication events (success, failure, rate-limit, invalid TOTP, logout) are recorded in the audit log
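The login rate limit above (5 attempts per 15 minutes per email address, sliding window) could be sketched as follows. This is a minimal illustration, not the project's implementation: `createLoginLimiter` and `tryLogin` are hypothetical names, and the choice to leave rejected attempts out of the window is an assumption.

```typescript
// Hypothetical in-memory sliding-window limiter (names are illustrative only).
function createLoginLimiter(maxAttempts = 5, windowMs = 15 * 60 * 1000) {
  // Maps a normalised email address to the timestamps of its recent attempts.
  const attempts = new Map<string, number[]>();

  return function tryLogin(email: string, now: number = Date.now()): boolean {
    const key = email.trim().toLowerCase();
    const cutoff = now - windowMs;
    // Keep only attempts that are still inside the window.
    const recent = (attempts.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < maxAttempts;
    // Assumption: only allowed attempts are counted toward the limit.
    if (allowed) recent.push(now);
    attempts.set(key, recent);
    return allowed;
  };
}
```

Because the state lives in process memory, a limiter like this resets on restart and is not shared between application instances.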

2. Authorization

Role-Based Access Control (RBAC)

Five-level role hierarchy:

| Role | Level | Capabilities |
|------|-------|--------------|
| ADMIN | 5 | Full system access, user management, system settings |
| MANAGER | 4 | Project management, resource allocation, vacation approval |
| CONTROLLER | 3 | Financial views, budget management, reporting |
| USER | 2 | Self-service (own vacations, own resource profile) |
| VIEWER | 1 | Read-only access to permitted areas |

Per-User Permission Overrides

  • permissionOverrides JSONB field on User model
  • resolvePermissions(role, overrides) computes effective permissions
  • requirePermission(ctx, key) enforced on every tRPC procedure
  • Granular PermissionKey enum covering all domain actions
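One way a role default set could combine with per-user overrides is sketched below. The override shape (`grant` / `revoke` arrays), the permission key names, the role defaults, and the `assertPermission` helper are all assumptions made for illustration; the document does not specify how the real `resolvePermissions` or `requirePermission(ctx, key)` work internally.

```typescript
type Role = "ADMIN" | "MANAGER" | "CONTROLLER" | "USER" | "VIEWER";

// Hypothetical permission keys; the real PermissionKey enum is not shown in this document.
type PermissionKey = "project.manage" | "budget.view" | "report.view" | "vacation.requestOwn";

// Assumed shape of the permissionOverrides JSONB field.
interface PermissionOverrides {
  grant?: PermissionKey[];
  revoke?: PermissionKey[];
}

// Illustrative defaults only.
const ROLE_DEFAULTS: Record<Role, PermissionKey[]> = {
  ADMIN: ["project.manage", "budget.view", "report.view", "vacation.requestOwn"],
  MANAGER: ["project.manage", "report.view", "vacation.requestOwn"],
  CONTROLLER: ["budget.view", "report.view", "vacation.requestOwn"],
  USER: ["vacation.requestOwn"],
  VIEWER: [],
};

function resolvePermissions(
  role: Role,
  overrides: PermissionOverrides | null = null,
): Set<PermissionKey> {
  const effective = new Set<PermissionKey>(ROLE_DEFAULTS[role]);
  for (const key of overrides?.grant ?? []) effective.add(key);
  // Revocations are applied last so that a revoke always wins over a grant.
  for (const key of overrides?.revoke ?? []) effective.delete(key);
  return effective;
}

// Simplified stand-in for a per-procedure permission check.
function assertPermission(effective: Set<PermissionKey>, key: PermissionKey): void {
  if (!effective.has(key)) throw new Error(`FORBIDDEN: missing permission ${key}`);
}
```

Applying revocations after grants is a deliberate fail-closed choice in this sketch: a conflicting override removes access rather than adding it.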

tRPC Middleware Stack

publicProcedure
  -> protectedProcedure (requires authenticated session)
    -> controllerProcedure (ADMIN + MANAGER + CONTROLLER)
      -> managerProcedure (ADMIN + MANAGER)
        -> adminProcedure (ADMIN only)
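The nesting above is consistent with a minimum-level comparison against the role table. The sketch below assumes that model; `requireMinimumRole`, the `Session` shape, and `PROCEDURE_MINIMUM_ROLE` are hypothetical, and the real procedures are tRPC middlewares rather than plain functions.

```typescript
// Levels taken from the role hierarchy table.
const ROLE_LEVEL = { VIEWER: 1, USER: 2, CONTROLLER: 3, MANAGER: 4, ADMIN: 5 } as const;
type Role = keyof typeof ROLE_LEVEL;

// Hypothetical session shape.
interface Session {
  userId: string;
  role: Role;
}

// Assumed mapping from each procedure tier to the lowest role it admits.
const PROCEDURE_MINIMUM_ROLE: Record<string, Role> = {
  protectedProcedure: "VIEWER", // any authenticated session
  controllerProcedure: "CONTROLLER", // ADMIN + MANAGER + CONTROLLER
  managerProcedure: "MANAGER", // ADMIN + MANAGER
  adminProcedure: "ADMIN", // ADMIN only
};

function requireMinimumRole(session: Session | null, minimum: Role): Session {
  if (!session) throw new Error("UNAUTHORIZED");
  if (ROLE_LEVEL[session.role] < ROLE_LEVEL[minimum]) throw new Error("FORBIDDEN");
  return session;
}
```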

3. Data Protection

Database Security

  • PostgreSQL with TLS in production
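  • Prisma ORM: parameterized queries by default, which removes SQL injection risk for standard client calls (raw-query escape hatches such as $queryRawUnsafe must never receive user input)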
  • Database not exposed to the internet (Docker internal network only)
  • All monetary values stored as integer cents (no floating-point precision issues)

Data at Rest

  • Passwords: Argon2id hash (never stored in plaintext)
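  • TOTP secrets: stored in DB (encrypted at rest only where PostgreSQL TDE is available; community PostgreSQL does not ship TDE, so this depends on the distribution in use or on disk-level encryption)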
  • Runtime secrets now resolve env-first for AI, Gemini, SMTP, and anonymization seed values. Database-backed SystemSettings values remain transitional compatibility storage, not the preferred production source of truth.
  • Recommended runtime overrides: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, AZURE_DALLE_API_KEY, GEMINI_API_KEY, SMTP_PASSWORD, ANONYMIZATION_SEED
  • Admin settings reads expose only presence flags (hasApiKey, hasSmtpPassword, hasGeminiApiKey) instead of returning secret values to the browser, and those flags also reflect environment-backed runtime overrides
  • The admin settings mutation no longer persists new secret values into SystemSettings; secret inputs must be provisioned through environment or a deployment-time secret manager, and legacy database copies can be cleared explicitly
  • The admin UI now exposes runtime secret source/status plus an explicit "clear legacy DB secrets" cleanup path so operators can complete the migration without direct database writes
  • Production startup now validates Auth.js runtime configuration and refuses to boot if AUTH_SECRET/NEXTAUTH_SECRET is missing, left on a known development placeholder, paired with a non-HTTPS public auth URL, shorter than 32 characters, or below the Shannon-entropy threshold of 3.5 bits/char
  • User passwords: minimum 12 characters, maximum 128 characters. The PASSWORD_MIN_LENGTH and PASSWORD_MAX_LENGTH constants (@capakraken/shared/constants) are imported by every client-side pre-submit validator and every server-side Zod schema, which prevents client/server policy drift
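The AUTH_SECRET length and entropy rules can be illustrated with a per-character Shannon entropy calculation. This is a sketch under assumptions: the function names, the placeholder list, and the returned problem labels are hypothetical, and only the 32-character and 3.5 bits/char thresholds come from this document.

```typescript
const AUTH_SECRET_MIN_LENGTH = 32;
const AUTH_SECRET_MIN_ENTROPY = 3.5; // bits per character

// Hypothetical placeholder list; the real blacklist is not shown in this document.
const PLACEHOLDER_SECRETS = new Set(["changeme", "dev-secret"]);

// Shannon entropy H = -sum(p_i * log2(p_i)) over the character frequencies.
function shannonEntropyBitsPerChar(value: string): number {
  const chars = Array.from(value);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Returns a list of problems; an empty list means the secret passes all checks.
function findAuthSecretProblems(secret: string | undefined): string[] {
  if (!secret) return ["missing"];
  const problems: string[] = [];
  if (PLACEHOLDER_SECRETS.has(secret)) problems.push("placeholder");
  if (secret.length < AUTH_SECRET_MIN_LENGTH) problems.push("too-short");
  if (shannonEntropyBitsPerChar(secret) < AUTH_SECRET_MIN_ENTROPY) problems.push("low-entropy");
  return problems;
}
```

A 38-character run of a single repeated character clears the length check but has an entropy of 0 bits/char, so only the entropy rule rejects it.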

Secret rotation

  • AUTH_SECRET / NEXTAUTH_SECRET is the signing key for all JWT session cookies. Rotation forces every user to re-authenticate on their next request.
  • Generate replacement: openssl rand -base64 32
  • Deploy path:
    1. Update the secret in the deployment secret store (not in repo).
    2. Roll all application containers — existing JWTs signed under the old key fail verification and the user is redirected to sign-in.
    3. There is no multi-key transition window: this is a hard cut on purpose, because a compromised signing key must be retired immediately.
  • Recommended cadence: quarterly, or immediately on suspected compromise.
  • POSTGRES_PASSWORD rotation is coordinated across postgres container init, the app container's DATABASE_URL, and any external replication consumers — follow the deployment runbook.

Anonymization

  • Configurable global anonymization for VIEWER role
  • Resource names, emails replaced with deterministic pseudonyms (seeded hash)
  • Anonymization domain and mode configurable in SystemSettings
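A deterministic, seeded pseudonym can be derived with a keyed hash, so the same input always maps to the same output without exposing the original value. The sketch below assumes HMAC-SHA-256 as the seeded hash and invents the `pseudonymFor` name, the output format, and the default domain; the document does not state which hash or format the real anonymizer uses.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derives a stable pseudonym from a seed and a real value.
function pseudonymFor(
  seed: string,
  kind: "name" | "email",
  realValue: string,
  domain: string = "example.invalid", // assumed stand-in for the configurable anonymization domain
): string {
  // Normalise so that case and surrounding whitespace do not change the pseudonym.
  const normalised = realValue.trim().toLowerCase();
  const digest = createHmac("sha256", seed).update(`${kind}:${normalised}`).digest("hex");
  const tag = digest.slice(0, 10);
  return kind === "email" ? `user-${tag}@${domain}` : `Resource ${tag}`;
}
```

Keying the hash with the seed means pseudonyms cannot be recomputed from a list of known names unless the seed is also known, which is one reason to provision ANONYMIZATION_SEED as a secret.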

4. Session Management

  • Server-side JWT with SameSite=Strict cookies
  • httpOnly cookies keep the session token unreadable to client-side scripts, which blocks XSS-based cookie theft
  • secure flag enforced in production (HTTPS only)
  • CSRF protection via Auth.js built-in CSRF token
  • Configurable session timeouts (absolute + idle) via SystemSettings
  • Active session registry with concurrent session limit enforcement
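The absolute and idle timeouts are independent limits that can both be evaluated on every request. A minimal sketch, assuming the defaults from section 1 (8 hours absolute, 30 minutes idle); the `SessionTimes` shape and `checkSession` name are hypothetical.

```typescript
const DEFAULT_MAX_AGE_MS = 8 * 60 * 60 * 1000; // absolute timeout: 8 hours
const DEFAULT_IDLE_TIMEOUT_MS = 30 * 60 * 1000; // idle timeout: 30 minutes

// Hypothetical record of when a session was issued and last used.
interface SessionTimes {
  issuedAtMs: number;
  lastActivityMs: number;
}

type SessionState = "active" | "expired-absolute" | "expired-idle";

function checkSession(
  session: SessionTimes,
  nowMs: number,
  maxAgeMs: number = DEFAULT_MAX_AGE_MS,
  idleTimeoutMs: number = DEFAULT_IDLE_TIMEOUT_MS,
): SessionState {
  // The absolute limit applies no matter how recently the session was used.
  if (nowMs - session.issuedAtMs >= maxAgeMs) return "expired-absolute";
  // The idle limit applies when no activity was seen within the idle window.
  if (nowMs - session.lastActivityMs >= idleTimeoutMs) return "expired-idle";
  return "active";
}
```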

5. Input Validation

  • Zod schemas on every tRPC procedure input
  • Strict TypeScript (strict: true, exactOptionalPropertyTypes: true)
  • Blueprint dynamic fields validated at runtime against stored Zod schema definitions
  • File uploads validated by:
    • MIME type whitelist (image/png, image/jpeg, image/webp, image/tiff, image/bmp)
    • Size limit (10 MB client-side, 4 MB server-side after compression)
    • Magic byte verification (actual file content matched against declared MIME)
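Magic byte verification compares the first bytes of the upload against the well-known signature of the declared MIME type. The sketch below covers the five whitelisted image types; the signatures are the standard file-format markers, while `matchesDeclaredMime` and the table layout are illustrative assumptions rather than the project's code.

```typescript
// A signature part is a byte sequence expected at a fixed offset.
type SignaturePart = { offset: number; bytes: number[] };

// Each MIME type maps to one or more alternatives; every part of an alternative must match.
const SIGNATURES = new Map<string, SignaturePart[][]>([
  ["image/png", [[{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }]]],
  ["image/jpeg", [[{ offset: 0, bytes: [0xff, 0xd8, 0xff] }]]],
  ["image/webp", [[
    { offset: 0, bytes: [0x52, 0x49, 0x46, 0x46] }, // "RIFF"
    { offset: 8, bytes: [0x57, 0x45, 0x42, 0x50] }, // "WEBP"
  ]]],
  ["image/tiff", [
    [{ offset: 0, bytes: [0x49, 0x49, 0x2a, 0x00] }], // little-endian "II*\0"
    [{ offset: 0, bytes: [0x4d, 0x4d, 0x00, 0x2a] }], // big-endian "MM\0*"
  ]],
  ["image/bmp", [[{ offset: 0, bytes: [0x42, 0x4d] }]]], // "BM"
]);

function partMatches(data: Uint8Array, part: SignaturePart): boolean {
  if (data.length < part.offset + part.bytes.length) return false;
  return part.bytes.every((b, i) => data[part.offset + i] === b);
}

// Returns true only if the MIME type is whitelisted and the content carries its signature.
function matchesDeclaredMime(data: Uint8Array, declaredMime: string): boolean {
  const alternatives = SIGNATURES.get(declaredMime);
  if (!alternatives) return false;
  return alternatives.some((parts) => parts.every((part) => partMatches(data, part)));
}
```

A signature match only shows that the file starts like the declared format; it does not prove the rest of the file is well-formed.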

Prompt-Injection Guard (defense-in-depth only)

packages/api/src/lib/prompt-guard.ts runs a short regex list against every free-text user prompt sent to an AI tool (assistant chat + project-cover DALL-E prompt). Input is normalised before the regex runs:

  1. Unicode NFKD decomposition (collapses fullwidth / compatibility forms and splits diacritics from their base letter).
  2. Strip zero-width / directional / combining code points that attackers use to break contiguous substring matches.
  3. Fold a small set of Cyrillic / Greek homoglyphs to their Latin equivalents.

This guard is defense-in-depth, not an authorisation boundary. The actual security boundary for AI-initiated actions is the per-tool requirePermission(ctx, PermissionKey.*) check inside every assistant tool — an LLM that has been successfully jailbroken still cannot perform an action its caller's role does not allow. Motivated adversaries will find prompts that defeat the regex layer; its purpose is to raise the cost of casual injection attempts and to surface them as audit-log entries.
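The three normalisation steps can be sketched as below. The code point ranges, the homoglyph table, and the example pattern are assumptions chosen for illustration; the actual contents of prompt-guard.ts are not reproduced in this document.

```typescript
// Step 2: combining marks, zero-width characters, and directional controls (assumed ranges).
const INVISIBLE_OR_COMBINING = /[\u0300-\u036F\u200B-\u200F\u202A-\u202E\u2060-\u2069\uFEFF]/g;

// Step 3: a small, illustrative Cyrillic / Greek to Latin fold table.
const HOMOGLYPHS = new Map<string, string>([
  ["\u0430", "a"], // Cyrillic a
  ["\u0435", "e"], // Cyrillic ie
  ["\u043E", "o"], // Cyrillic o
  ["\u0440", "p"], // Cyrillic er
  ["\u0441", "c"], // Cyrillic es
  ["\u0445", "x"], // Cyrillic ha
  ["\u0456", "i"], // Cyrillic Byelorussian-Ukrainian i
  ["\u03B1", "a"], // Greek alpha
  ["\u03BF", "o"], // Greek omicron
  ["\u03B9", "i"], // Greek iota
]);

function normalizePrompt(input: string): string {
  // Step 1: NFKD collapses compatibility forms and splits diacritics from base letters.
  const decomposed = input.normalize("NFKD");
  const stripped = decomposed.replace(INVISIBLE_OR_COMBINING, "");
  return Array.from(stripped, (ch) => HOMOGLYPHS.get(ch) ?? ch).join("");
}

// Hypothetical example pattern; the real regex list is not shown here.
const EXAMPLE_PATTERN = /ignore\s+(all\s+)?previous\s+instructions/i;

function looksLikeInjection(prompt: string): boolean {
  return EXAMPLE_PATTERN.test(normalizePrompt(prompt));
}
```

Running the patterns on the normalised string rather than the raw input is what lets a short regex list catch variants that insert a zero-width space or swap in a look-alike letter.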

6. Audit Logging

Activity History System

  • Centralized createAuditEntry() function (fire-and-forget, never blocks)
  • Covers 29+ of 36 tRPC routers
  • Logged fields: entityType, entityId, action, userId, changes (JSONB with before/after/diff), source, summary
  • Authentication events: login success/failure, logout, rate limiting, MFA failures

External API Call Logging

  • All OpenAI/Azure/Gemini API calls logged via loggedAiCall() wrapper
  • Structured Pino logs: { provider, model, promptLength, responseTimeMs }
  • Failed calls logged at warn level with sanitized diagnostics only, with URL and secret-like tokens redacted before they reach structured logs
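Redaction before logging can be approximated with a few ordered replacements. The patterns and the `sanitizeDiagnostic` name below are assumptions for illustration only; the project's actual redaction rules are not described in this document, and regex-based redaction of this kind is best-effort.

```typescript
// Illustrative patterns; real secret formats vary by provider.
const URL_PATTERN = /\bhttps?:\/\/[^\s"'<>]+/gi;
const BEARER_PATTERN = /\bBearer\s+[A-Za-z0-9._~+\/=-]+/gi;
const LONG_TOKEN_PATTERN = /\b[A-Za-z0-9_-]{32,}\b/g;

// Redacts URLs first, then bearer tokens, then any remaining long opaque token.
function sanitizeDiagnostic(message: string): string {
  return message
    .replace(URL_PATTERN, "[redacted-url]")
    .replace(BEARER_PATTERN, "Bearer [redacted-secret]")
    .replace(LONG_TOKEN_PATTERN, "[redacted-secret]");
}
```

URLs are replaced first so that query-string credentials disappear together with the host, before the token patterns run.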

tRPC Request Logging

  • Every tRPC call logged with request ID, user ID, path, duration
  • Slow calls (>500ms) logged at warn level

7. HTTP Security Headers

Static headers are configured in next.config.ts. The Content-Security-Policy is emitted per-request by apps/web/src/middleware.ts so it can carry a per-request nonce.

| Header | Value |
|--------|-------|
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload |
| Content-Security-Policy | Restrictive CSP with nonce-based script-src |
| X-Frame-Options | DENY |
| X-Content-Type-Options | nosniff |
| X-XSS-Protection | 1; mode=block |
| Referrer-Policy | strict-origin-when-cross-origin |
| Permissions-Policy | Camera, microphone, geolocation disabled |

Content-Security-Policy directives (production)

| Directive | Value | Rationale |
|-----------|-------|-----------|
| default-src | 'self' | Baseline deny-all-cross-origin. |
| script-src | 'self' 'nonce-<random>' | No unsafe-inline / unsafe-eval in prod. |
| style-src | 'self' 'unsafe-inline' | Accepted residual risk (see note below). |
| img-src | 'self' data: blob: | Allow base64 previews and generated blobs only. |
| font-src | 'self' data: | Data URLs for inline-embedded fonts. |
| connect-src | 'self' | All AI / third-party calls are server-side. |
| frame-ancestors | 'none' | Clickjacking defence. |
| frame-src | 'none' | No third-party iframes. |
| object-src | 'none' | Blocks legacy <object> / Flash / applet vectors. |
| media-src | 'self' | No cross-origin video / audio. |
| worker-src | 'self' blob: | Next.js runtime uses blob-URL workers. |
| base-uri | 'self' | Blocks <base> hijacks. |
| form-action | 'self' | Blocks form-exfiltration to third parties. |

Residual risk (style-src 'unsafe-inline'): React emits inline style attributes and @react-pdf/renderer emits inline <style> blocks that cannot carry a nonce. Tightening style-src, or its style-src-attr / style-src-elem split, would break one or both: style-src-attr governs the style attributes and style-src-elem governs the <style> blocks. The risk is bounded because script-src is nonce-based, so a pure CSS-injection attack cannot escalate to JS execution in this application.
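Assembling the production policy from the directive table with a per-request nonce might look like the sketch below. The directive values are copied from the table; the `buildProductionCsp` name and the nonce character check are assumptions, and nonce generation and the middleware wiring are left out.

```typescript
// Builds the Content-Security-Policy header value for one request.
function buildProductionCsp(nonce: string): string {
  // Assumption: nonces are base64 or base64url strings; anything else is rejected
  // so that a malformed nonce cannot inject extra directives.
  if (!/^[A-Za-z0-9+\/=_-]+$/.test(nonce)) throw new Error("invalid nonce");

  const directives: Array<[string, string]> = [
    ["default-src", "'self'"],
    ["script-src", `'self' 'nonce-${nonce}'`],
    ["style-src", "'self' 'unsafe-inline'"],
    ["img-src", "'self' data: blob:"],
    ["font-src", "'self' data:"],
    ["connect-src", "'self'"],
    ["frame-ancestors", "'none'"],
    ["frame-src", "'none'"],
    ["object-src", "'none'"],
    ["media-src", "'self'"],
    ["worker-src", "'self' blob:"],
    ["base-uri", "'self'"],
    ["form-action", "'self'"],
  ];
  return directives.map(([name, value]) => `${name} ${value}`).join("; ");
}
```

The nonce has to be freshly generated from a cryptographically secure source for every response and attached to each script tag the page is allowed to run.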

8. Rate Limiting

  • Per-IP rate limiting: via middleware on all API routes
  • Per-user rate limiting: configurable per-procedure
  • Shared rate-limit backend: Redis-backed counters when REDIS_URL is configured; in-memory fallback remains available for local development and degraded operation
  • Auth-specific rate limiting: 5 attempts / 15 min per email
  • AI API call rate limits: upstream provider limits surfaced as user-friendly errors

9. Error Handling

  • Sentry integration for production error tracking
  • Pino structured logging (JSON in production, pretty-print in development)
  • tRPC errors mapped to appropriate HTTP status codes
  • AI API errors translated to human-readable messages via parseAiError() / parseGeminiError()
  • Admin connection tests for AI/SMTP return sanitized, user-facing diagnostics only; raw upstream details stay in server logs with redaction for URLs, hosts, emails, and secret-like tokens
  • Internal errors never leak stack traces to the client

10. Dependency Security

  • Dependabot configured for automated dependency updates
  • pnpm audit runs in the scheduled nightly-security.yml workflow, and high-signal architecture guardrails run on every PR in ci.yml
  • Lockfile integrity verified on install
  • Transitive audit hotspots such as flatted and picomatch are pinned through root pnpm.overrides to keep dev-tooling CVEs from drifting back in through nested dependencies
  • Runtime workbook parsing and export generation now use exceljs boundaries instead of direct xlsx usage in application, engine, and web paths
  • pnpm audit --audit-level=high is clean as of 2026-03-30; the remaining dependency findings are low/moderate only

11. Network Architecture

Browser -> Next.js (port 3100) -> tRPC -> Prisma -> PostgreSQL (port 5433)
                                       -> Redis (port 6380, SSE pub/sub)
                                       -> Azure OpenAI / Gemini (external HTTPS)
                                       -> SMTP (email notifications)
  • PostgreSQL and Redis accessible only within Docker network
  • External API calls (AI, SMTP) over TLS
  • No direct database access from the internet

12. Database Security

Authentication and Access

  • PostgreSQL uses password-based authentication (capakraken user with strong password)
  • Connection restricted to the Docker internal network (port 5433 on host, 5432 inside container)
  • No direct internet access to the database — all queries routed through Prisma ORM via the application layer
  • Application uses a single database user; no shared or anonymous access

Query Safety

  • Prisma ORM enforces parameterized queries by default — no raw SQL concatenation
  • All user inputs validated by Zod schemas before reaching the data layer
  • JSONB fields (blueprints, skill matrices, permission overrides) are type-checked at the application boundary

Active Hardening Measures

  • PostgreSQL audit logging enabled via docker-compose.yml command flags:
    • log_connections=on / log_disconnections=on — all connection lifecycle events
    • log_statement=ddl — all DDL statements (CREATE, ALTER, DROP)
    • log_min_duration_statement=1000 — slow queries (>1s) logged for performance review
    • log_line_prefix='%t [%p] %u@%d ' — timestamp, PID, user, and database in every log line
  • SUPERUSER removed from the application database user (capakraken); hardening script at scripts/harden-postgres.sh
  • Minimal privilege grants: application user has only SELECT, INSERT, UPDATE, DELETE on tables and USAGE/SELECT on sequences — no CREATE, DROP, or SUPERUSER capabilities

Recommendations for Further Production Hardening

  1. Enable PostgreSQL SSL/TLS: Add sslmode=require to the Prisma connection string and configure postgresql.conf with ssl = on, ssl_cert_file, ssl_key_file
  2. Restrict connections by IP: Configure pg_hba.conf to accept connections only from the application container's subnet (e.g., 172.18.0.0/16)
  3. Use separate database roles: Create a read-only role for reporting queries and a migration-only role for schema changes, limiting the default application role to DML operations
  4. Enable connection pooling: Use PgBouncer in production to limit maximum connections and prevent resource exhaustion attacks
  5. Backup encryption: Ensure pg_dump backups are encrypted at rest (GPG or filesystem-level encryption)

Redis Security

  • Redis instance runs without authentication in development (Docker-internal only)
  • Production recommendation: Enable requirepass in Redis configuration and set REDIS_URL to include the password (redis://:password@host:port)
  • Redis is used for SSE pub/sub and, when REDIS_URL is configured, for rate-limit counters (no sensitive data persisted)

13. Proactive Monitoring

Health Check Cron (/api/cron/health-check)

  • Verifies PostgreSQL and Redis connectivity on each invocation
  • On failure: creates CRITICAL in-app notifications for all ADMIN users
  • Designed to be triggered by external cron (e.g., curl every 5 minutes)
  • Protected by CRON_SECRET Bearer token
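A Bearer-token check for the cron endpoint could be sketched as follows. The constant-time comparison is a design choice of this sketch, not something the document states about the real handler, and `isAuthorizedCronRequest` is a hypothetical name.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Returns true only when the Authorization header carries the expected CRON_SECRET.
function isAuthorizedCronRequest(
  authorizationHeader: string | null | undefined,
  cronSecret: string | undefined,
): boolean {
  // Fail closed when the secret is not configured or the header is absent.
  if (!cronSecret || !authorizationHeader) return false;
  const prefix = "Bearer ";
  if (!authorizationHeader.startsWith(prefix)) return false;
  const token = authorizationHeader.slice(prefix.length);
  if (token.length === 0) return false;

  // Hash both sides so the buffers have equal length, which timingSafeEqual requires.
  const presented = createHash("sha256").update(token).digest();
  const expected = createHash("sha256").update(cronSecret).digest();
  return timingSafeEqual(presented, expected);
}
```

An external scheduler would then call the endpoint with an `Authorization: Bearer <CRON_SECRET>` header, and any request without it is rejected before the health checks run.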

Security Audit Cron (/api/cron/security-audit)

  • Scans installed dependency versions against known minimum safe versions
  • Alerts ADMIN users when high-severity outdated packages are detected
  • Complements Dependabot with an in-app awareness layer

nginx Hardening

  • Reference configuration: docs/nginx-hardening.conf
  • Covers: server token removal, rate limiting (auth: 1r/s, API: 10r/s), SSL hardening (TLS 1.2+), OCSP stapling
  • Security headers applied at nginx level as a defense-in-depth backup to Next.js headers