# Security Architecture: CapaKraken

> Version: 1.0 | Date: 2026-03-27

---

## 1. Authentication

- **Auth.js v5** (NextAuth) with Credentials provider
- **Password hashing**: Argon2id via `@node-rs/argon2` (memory cost 65536, time cost 3)
- **Multi-Factor Authentication**: TOTP (RFC 6238) via `otpauth` library
  - Configurable per user (enable/disable via admin or self-service)
  - 30-second time step, SHA-1, 6-digit codes with 1-step tolerance
- **Rate limiting**: 5 login attempts per 15 minutes per email address (in-memory sliding window)
- **Session strategy**: JWT with server-side validation
  - Absolute timeout: 8 hours (configurable via `sessionMaxAge`)
  - Idle timeout: 30 minutes (configurable via `sessionIdleTimeout`)
- **Concurrent session limit**: configurable `maxConcurrentSessions` (default 3), kick-oldest strategy
- **Login/logout audit**: all authentication events (success, failure, rate-limit, invalid TOTP, logout) are recorded in the audit log

## 2. Authorization

### Role-Based Access Control (RBAC)

Five-level role hierarchy:

| Role | Level | Capabilities |
|------|-------|--------------|
| ADMIN | 5 | Full system access, user management, system settings |
| MANAGER | 4 | Project management, resource allocation, vacation approval |
| CONTROLLER | 3 | Financial views, budget management, reporting |
| USER | 2 | Self-service (own vacations, own resource profile) |
| VIEWER | 1 | Read-only access to permitted areas |

### Per-User Permission Overrides

- `permissionOverrides` JSONB field on User model
- `resolvePermissions(role, overrides)` computes effective permissions
- `requirePermission(ctx, key)` enforced on every tRPC procedure
- Granular `PermissionKey` enum covering all domain actions

### tRPC Middleware Stack

```
publicProcedure
  -> protectedProcedure (requires authenticated session)
    -> controllerProcedure (ADMIN + MANAGER + CONTROLLER)
      -> managerProcedure (ADMIN + MANAGER)
        -> adminProcedure (ADMIN only)
```

## 3. Data Protection

### Database Security

- **PostgreSQL** with TLS in production
- **Prisma ORM**: parameterized queries by default, with no raw SQL concatenation, which guards against SQL injection
- Database not exposed to the internet (Docker internal network only)
- All monetary values stored as integer cents (no floating-point precision issues)

### Data at Rest

- Passwords: Argon2id hash (never stored in plaintext)
- TOTP secrets: stored in DB (encrypted at rest via PostgreSQL TDE when available)
- Runtime secrets are resolved from environment variables first for AI, Gemini, SMTP, and anonymization seed values. Database-backed `SystemSettings` values remain transitional compatibility storage, not the preferred production source of truth.
- Recommended runtime overrides: `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, `AZURE_DALLE_API_KEY`, `GEMINI_API_KEY`, `SMTP_PASSWORD`, `ANONYMIZATION_SEED`
- Admin settings reads expose only presence flags (`hasApiKey`, `hasSmtpPassword`, `hasGeminiApiKey`) instead of returning secret values to the browser. These flags also reflect environment-backed runtime overrides.
- The admin settings mutation no longer persists new secret values into `SystemSettings`. Secret inputs must be provisioned through the environment or a deployment-time secret manager, and legacy database copies can be cleared explicitly.
- The admin UI exposes runtime secret source/status plus an explicit "clear legacy DB secrets" cleanup path, so operators can complete the migration without direct database writes
- Production startup validates the Auth.js runtime configuration and refuses to boot if `AUTH_SECRET`/`NEXTAUTH_SECRET` is missing, left on a known development placeholder, or paired with a non-HTTPS public auth URL

### Anonymization

- Configurable global anonymization for VIEWER role
- Resource names and emails replaced with deterministic pseudonyms (seeded hash)
- Anonymization domain and mode configurable in SystemSettings

## 4. Session Management

- **JWT sessions with server-side validation**, delivered in `SameSite=Strict` cookies
- `httpOnly` cookies keep session tokens out of reach of client-side scripts, which limits XSS-based session theft
- `secure` flag enforced in production (HTTPS only)
- CSRF protection via Auth.js built-in CSRF token
- Configurable session timeouts (absolute + idle) via SystemSettings
- Active session registry with concurrent session limit enforcement

## 5. Input Validation

- **Zod schemas** on every tRPC procedure input
- Strict TypeScript (`strict: true`, `exactOptionalPropertyTypes: true`)
- Blueprint dynamic fields validated at runtime against stored Zod schema definitions
- File uploads validated by:
  - MIME type whitelist (`image/png`, `image/jpeg`, `image/webp`, `image/tiff`, `image/bmp`)
  - Size limit (10 MB client-side, 4 MB server-side after compression)
  - Magic byte verification (actual file content matched against declared MIME)

## 6. Audit Logging

### Activity History System

- Centralized `createAuditEntry()` function (fire-and-forget, never blocks the request)
- Covers at least 29 of 36 tRPC routers
- Logged fields: `entityType`, `entityId`, `action`, `userId`, `changes` (JSONB with before/after/diff), `source`, `summary`
- Authentication events: login success/failure, logout, rate limiting, MFA failures

### External API Call Logging

- All OpenAI/Azure/Gemini API calls logged via `loggedAiCall()` wrapper
- Structured Pino logs: `{ provider, model, promptLength, responseTimeMs }`
- Failed calls logged at `warn` level with sanitized diagnostics only; URLs and secret-like tokens are redacted before they reach structured logs

### tRPC Request Logging

- Every tRPC call logged with request ID, user ID, path, duration
- Slow calls (>500ms) logged at `warn` level

## 7. HTTP Security Headers

Configured in `next.config.ts`:

| Header | Value |
|--------|-------|
| Strict-Transport-Security | `max-age=63072000; includeSubDomains; preload` |
| Content-Security-Policy | Restrictive CSP with nonce-based script-src |
| X-Frame-Options | `DENY` |
| X-Content-Type-Options | `nosniff` |
| X-XSS-Protection | `1; mode=block` |
| Referrer-Policy | `strict-origin-when-cross-origin` |
| Permissions-Policy | Camera, microphone, geolocation disabled |

## 8. Rate Limiting

- **Per-IP rate limiting**: via middleware on all API routes
- **Per-user rate limiting**: configurable per-procedure
- **Shared rate-limit backend**: Redis-backed counters when `REDIS_URL` is configured; in-memory fallback remains available for local development and degraded operation
- **Auth-specific rate limiting**: 5 attempts / 15 min per email
- **AI API call rate limits**: upstream provider limits surfaced as user-friendly errors

## 9. Error Handling

- **Sentry** integration for production error tracking
- **Pino** structured logging (JSON in production, pretty-print in development)
- tRPC errors mapped to appropriate HTTP status codes
- AI API errors translated to human-readable messages via `parseAiError()` / `parseGeminiError()`
- Admin connection tests for AI/SMTP return sanitized, user-facing diagnostics only; raw upstream details stay in server logs with redaction for URLs, hosts, emails, and secret-like tokens
- Internal errors never leak stack traces to the client

## 10. Dependency Security

- **Dependabot** configured for automated dependency updates
- `pnpm audit` runs in the scheduled [nightly-security.yml](/.github/workflows/nightly-security.yml) workflow, and high-signal architecture guardrails run on every PR in [ci.yml](/.github/workflows/ci.yml)
- Lockfile integrity verified on install
- Transitive audit hotspots such as `flatted` and `picomatch` are pinned through root `pnpm.overrides` to keep dev-tooling CVEs from drifting back in through nested dependencies
- Runtime workbook parsing and export generation use `exceljs` boundaries instead of direct `xlsx` usage in application, engine, and web paths
- `pnpm audit --audit-level=high` is clean as of 2026-03-30; the remaining dependency findings are low/moderate only

## 11. Network Architecture

```
Browser -> Next.js (port 3100) -> tRPC -> Prisma -> PostgreSQL (port 5433)
                                       -> Redis (port 6380, SSE pub/sub)
                                       -> Azure OpenAI / Gemini (external HTTPS)
                                       -> SMTP (email notifications)
```

- PostgreSQL and Redis accessible only within Docker network
- External API calls (AI, SMTP) over TLS
- No direct database access from the internet

## 12. Database Security

### Authentication and Access

- PostgreSQL uses password-based authentication (`capakraken` user with strong password)
- Connection restricted to the Docker internal network (port 5433 on host, 5432 inside container)
- No direct internet access to the database; all queries are routed through Prisma ORM via the application layer
- Application uses a single database user; no shared or anonymous access

### Query Safety

- **Prisma ORM** enforces parameterized queries by default, with no raw SQL concatenation
- All user inputs validated by Zod schemas before reaching the data layer
- JSONB fields (blueprints, skill matrices, permission overrides) are type-checked at the application boundary

### Active Hardening Measures

- **PostgreSQL audit logging** enabled via `docker-compose.yml` command flags:
  - `log_connections=on` / `log_disconnections=on`: all connection lifecycle events
  - `log_statement=ddl`: all DDL statements (CREATE, ALTER, DROP)
  - `log_min_duration_statement=1000`: slow queries (>1s) logged for performance review
  - `log_line_prefix='%t [%p] %u@%d '`: timestamp, PID, user, and database in every log line
- **SUPERUSER removed** from the application database user (`capakraken`); hardening script at `scripts/harden-postgres.sh`
- **Minimal privilege grants**: application user has only SELECT, INSERT, UPDATE, DELETE on tables and USAGE/SELECT on sequences, with no CREATE, DROP, or SUPERUSER capabilities

### Recommendations for Further Production Hardening

1. **Enable PostgreSQL SSL/TLS**: Set `sslmode=require` in the Prisma connection string and configure `postgresql.conf` with `ssl = on`, `ssl_cert_file`, `ssl_key_file`
2. **Restrict connections by IP**: Configure `pg_hba.conf` to accept connections only from the application container's subnet (e.g., `172.18.0.0/16`)
3. **Use separate database roles**: Create a read-only role for reporting queries and a migration-only role for schema changes, limiting the default application role to DML operations
4. **Enable connection pooling**: Use PgBouncer in production to limit maximum connections and prevent resource exhaustion attacks
5. **Backup encryption**: Ensure `pg_dump` backups are encrypted at rest (GPG or filesystem-level encryption)

### Redis Security

- Redis instance runs without authentication in development (Docker-internal only)
- **Production recommendation**: Enable `requirepass` in Redis configuration and set `REDIS_URL` to include the password (`redis://:password@host:port`)
- Redis is used for SSE pub/sub and shared rate-limit counters (no sensitive data persisted)

## 13. Proactive Monitoring

### Health Check Cron (`/api/cron/health-check`)

- Verifies PostgreSQL and Redis connectivity on each invocation
- On failure: creates CRITICAL in-app notifications for all ADMIN users
- Designed to be triggered by external cron (e.g., `curl` every 5 minutes)
- Protected by `CRON_SECRET` Bearer token

### Security Audit Cron (`/api/cron/security-audit`)

- Scans installed dependency versions against known minimum safe versions
- Alerts ADMIN users when high-severity outdated packages are detected
- Complements Dependabot with an in-app awareness layer

### nginx Hardening

- Reference configuration: `docs/nginx-hardening.conf`
- Covers: server token removal, rate limiting (auth: 1r/s, API: 10r/s), SSL hardening (TLS 1.2+), OCSP stapling
- Security headers applied at nginx level as a defense-in-depth backup to Next.js headers
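
The `CRON_SECRET` Bearer-token protection on the health-check cron endpoint (`/api/cron/health-check`) could be checked along the following lines. This is a minimal sketch, not CapaKraken's actual handler: the helper name `isAuthorizedCronRequest` and the choice to hash both values before a constant-time comparison are assumptions made for illustration.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not from the CapaKraken codebase): validates an
// `Authorization: Bearer <token>` header against the configured CRON_SECRET.
function isAuthorizedCronRequest(
  authorizationHeader: string | null | undefined,
  cronSecret: string | undefined,
): boolean {
  // Fail closed: an unset or empty secret must never authorize a request.
  if (!cronSecret || !authorizationHeader) return false;

  // Accept only the Bearer scheme (scheme names are case-insensitive).
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader);
  const token = match?.[1];
  if (!token) return false;

  // timingSafeEqual requires equal-length buffers, so compare SHA-256
  // digests of both values instead of the raw strings.
  const presented = createHash("sha256").update(token).digest();
  const expected = createHash("sha256").update(cronSecret).digest();
  return timingSafeEqual(presented, expected);
}
```

A route handler would typically call this with the incoming `Authorization` header and `process.env.CRON_SECRET`, returning HTTP 401 before running any connectivity checks when it yields `false`. The constant-time comparison avoids leaking how many leading characters of a guessed token were correct.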