Client-side validators (reset-password, invite-accept, first-admin setup, user-create modal) previously checked password.length < 8 while every server-side Zod schema required .min(12). External API consumers (or a confused browser UI) could get past the client check but fail at the tRPC boundary, or worse, quietly under-enforce policy compared to what admins expect.
Fix: introduce PASSWORD_MIN_LENGTH (12) and PASSWORD_MAX_LENGTH (128) in @capakraken/shared and import them from every pre-submit client validator and every server Zod schema. Single source of truth; drift becomes a compile error rather than a security finding.
Also hardens the AUTH_SECRET runtime check: in addition to the existing placeholder-blacklist, production startup now rejects secrets shorter than 32 chars OR with Shannon entropy below 3.5 bits/char. That covers low-entropy-but-long values like "aaaa..." (38 chars, entropy 0) which would have passed the previous checks.
Documented the rotation process for AUTH_SECRET + POSTGRES_PASSWORD in docs/security-architecture.md §3.
Verified:
- pnpm test:unit (396 files / 1922 tests passed)
- pnpm --filter @capakraken/web exec tsc --noEmit (clean)
- pnpm --filter @capakraken/api exec tsc --noEmit (clean)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Security Architecture — CapaKraken
Version: 1.0 | Date: 2026-03-27
1. Authentication
- Auth.js v5 (NextAuth) with Credentials provider
- Password hashing: Argon2id via `@node-rs/argon2` (memory cost 65536, time cost 3)
- Multi-Factor Authentication: TOTP (RFC 6238) via `otpauth` library
  - Configurable per user (enable/disable via admin or self-service)
  - 30-second window, SHA-1, 6-digit codes with 1-step tolerance
- Rate limiting: 5 login attempts per 15 minutes per email address (in-memory sliding window)
- Session strategy: JWT with server-side validation
  - Absolute timeout: 8 hours (configurable via `sessionMaxAge`)
  - Idle timeout: 30 minutes (configurable via `sessionIdleTimeout`)
- Concurrent session limit: configurable `maxConcurrentSessions` (default 3), kick-oldest strategy
- Login/logout audit: all authentication events (success, failure, rate-limit, invalid TOTP, logout) are recorded in the audit log
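The login rate limit above (5 attempts per 15 minutes per email, in-memory sliding window) can be illustrated with a minimal sketch. The class and method names are hypothetical and this is not the project's actual limiter; it only shows the sliding-window technique under the stated limits.

```typescript
// Hypothetical in-memory sliding-window limiter (illustrative names only).
const MAX_ATTEMPTS = 5;
const WINDOW_MS = 15 * 60 * 1000;

class LoginRateLimiter {
  // email -> timestamps (ms) of attempts that were allowed, oldest first
  private readonly attempts = new Map<string, number[]>();

  /** Records an attempt and returns true if it is allowed. */
  tryAttempt(email: string, now: number = Date.now()): boolean {
    const key = email.trim().toLowerCase();
    const cutoff = now - WINDOW_MS;
    // Keep only attempts that still fall inside the window.
    const recent = (this.attempts.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= MAX_ATTEMPTS) {
      this.attempts.set(key, recent);
      return false; // blocked attempts are not recorded in this sketch
    }
    recent.push(now);
    this.attempts.set(key, recent);
    return true;
  }
}
```

Because the window slides, a blocked client regains one attempt as soon as its oldest recorded attempt ages past 15 minutes, rather than waiting for a fixed reset boundary.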
2. Authorization
Role-Based Access Control (RBAC)
Five-level role hierarchy:
| Role | Level | Capabilities |
|---|---|---|
| ADMIN | 5 | Full system access, user management, system settings |
| MANAGER | 4 | Project management, resource allocation, vacation approval |
| CONTROLLER | 3 | Financial views, budget management, reporting |
| USER | 2 | Self-service (own vacations, own resource profile) |
| VIEWER | 1 | Read-only access to permitted areas |
Per-User Permission Overrides
- `permissionOverrides` JSONB field on User model
- `resolvePermissions(role, overrides)` computes effective permissions
- `requirePermission(ctx, key)` enforced on every tRPC procedure
- Granular `PermissionKey` enum covering all domain actions
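As a rough sketch of how role defaults and per-user overrides could combine, assume the JSONB field holds `grant` and `revoke` lists. That shape, the three permission keys, and the role defaults below are all hypothetical; only the function names `resolvePermissions` and `requirePermission` come from this document, and their real signatures may differ.

```typescript
type Role = "ADMIN" | "MANAGER" | "CONTROLLER" | "USER" | "VIEWER";
// Hypothetical keys; the real PermissionKey enum is not shown in this document.
type PermissionKey = "project.edit" | "budget.view" | "vacation.approve";

// Assumed shape of the permissionOverrides JSONB value.
interface PermissionOverrides {
  grant?: PermissionKey[];
  revoke?: PermissionKey[];
}

// Hypothetical role defaults, for illustration only.
const ROLE_DEFAULTS: Record<Role, PermissionKey[]> = {
  ADMIN: ["project.edit", "budget.view", "vacation.approve"],
  MANAGER: ["project.edit", "vacation.approve"],
  CONTROLLER: ["budget.view"],
  USER: [],
  VIEWER: [],
};

function resolvePermissions(role: Role, overrides: PermissionOverrides | null): Set<PermissionKey> {
  const effective = new Set<PermissionKey>(ROLE_DEFAULTS[role]);
  for (const key of overrides?.grant ?? []) effective.add(key);
  // Revocations are applied last so an explicit revoke always wins.
  for (const key of overrides?.revoke ?? []) effective.delete(key);
  return effective;
}

function requirePermission(
  ctx: { role: Role; overrides: PermissionOverrides | null },
  key: PermissionKey,
): void {
  if (!resolvePermissions(ctx.role, ctx.overrides).has(key)) {
    // A tRPC procedure would typically raise a FORBIDDEN error here.
    throw new Error(`FORBIDDEN: missing permission ${key}`);
  }
}
```

Applying revocations after grants is a design choice in this sketch: it keeps the restrictive override authoritative when both lists mention the same key.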
tRPC Middleware Stack
publicProcedure
-> protectedProcedure (requires authenticated session)
-> controllerProcedure (ADMIN + MANAGER + CONTROLLER)
-> managerProcedure (ADMIN + MANAGER)
-> adminProcedure (ADMIN only)
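One way to read the stack above is as a minimum-level check against the role table. The sketch below is framework-free and hypothetical: the `MINIMUM_ROLE` mapping and `mayCall` helper are illustrative names, and the assumption that each tier admits every role at or above a threshold is inferred from the role lists shown.

```typescript
type Role = "ADMIN" | "MANAGER" | "CONTROLLER" | "USER" | "VIEWER";

// Levels as listed in the role hierarchy table.
const ROLE_LEVEL: Record<Role, number> = {
  ADMIN: 5,
  MANAGER: 4,
  CONTROLLER: 3,
  USER: 2,
  VIEWER: 1,
};

// Hypothetical mapping from procedure tier to the lowest role it admits.
const MINIMUM_ROLE = {
  protectedProcedure: "VIEWER", // any authenticated session
  controllerProcedure: "CONTROLLER",
  managerProcedure: "MANAGER",
  adminProcedure: "ADMIN",
} as const;

type ProcedureTier = keyof typeof MINIMUM_ROLE;

function mayCall(tier: ProcedureTier, role: Role | null): boolean {
  if (role === null) return false; // no authenticated session
  return ROLE_LEVEL[role] >= ROLE_LEVEL[MINIMUM_ROLE[tier]];
}
```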
3. Data Protection
Database Security
- PostgreSQL with TLS in production
- Prisma ORM: parameterized queries by default, which closes the usual string-concatenation SQL injection vector
- Database not exposed to the internet (Docker internal network only)
- All monetary values stored as integer cents (no floating-point precision issues)
Data at Rest
- Passwords: Argon2id hash (never stored in plaintext)
- TOTP secrets: stored in DB (encrypted at-rest via PostgreSQL TDE when available)
- Runtime secrets now resolve env-first for AI, Gemini, SMTP, and anonymization seed values. Database-backed `SystemSettings` values remain transitional compatibility storage, not the preferred production source of truth.
- Recommended runtime overrides: `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, `AZURE_DALLE_API_KEY`, `GEMINI_API_KEY`, `SMTP_PASSWORD`, `ANONYMIZATION_SEED`
- Admin settings reads expose only presence flags (`hasApiKey`, `hasSmtpPassword`, `hasGeminiApiKey`) instead of returning secret values to the browser, and those flags also reflect environment-backed runtime overrides
- The admin settings mutation no longer persists new secret values into `SystemSettings`; secret inputs must be provisioned through environment or a deployment-time secret manager, and legacy database copies can be cleared explicitly
- The admin UI now exposes runtime secret source/status plus an explicit "clear legacy DB secrets" cleanup path so operators can complete the migration without direct database writes
- Production startup now validates Auth.js runtime configuration and refuses to boot if `AUTH_SECRET`/`NEXTAUTH_SECRET` is missing, left on a known development placeholder, paired with a non-HTTPS public auth URL, shorter than 32 characters, or below the Shannon-entropy threshold (minimum 3.5 bits/char)
- User passwords: minimum 12 characters, maximum 128 characters; a single pair of constants, `PASSWORD_MIN_LENGTH`/`PASSWORD_MAX_LENGTH` (`@capakraken/shared/constants`), is imported by every client-side pre-submit validator and server-side Zod schema, which prevents client/server policy drift
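The length and entropy parts of the `AUTH_SECRET` startup check can be sketched as follows. This is a minimal illustration, assuming entropy is computed over the character frequency distribution of the secret; the function names are hypothetical, and the placeholder and HTTPS checks are omitted.

```typescript
const MIN_SECRET_LENGTH = 32;
const MIN_ENTROPY_BITS_PER_CHAR = 3.5;

/** Shannon entropy of the character distribution, in bits per character. */
function shannonEntropy(value: string): number {
  const chars = Array.from(value);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

/** Returns a list of problems; an empty list means the secret passes. */
function checkAuthSecret(secret: string | undefined): string[] {
  if (!secret) return ["AUTH_SECRET is missing"];
  const problems: string[] = [];
  if (secret.length < MIN_SECRET_LENGTH) {
    problems.push(`AUTH_SECRET is shorter than ${MIN_SECRET_LENGTH} characters`);
  }
  if (shannonEntropy(secret) < MIN_ENTROPY_BITS_PER_CHAR) {
    problems.push(`AUTH_SECRET entropy is below ${MIN_ENTROPY_BITS_PER_CHAR} bits/char`);
  }
  return problems;
}
```

A 38-character string of a single repeated character passes the length check but has entropy 0, so it is rejected by the second check alone.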
Secret rotation
- `AUTH_SECRET`/`NEXTAUTH_SECRET` is the signing key for all JWT session cookies. Rotation forces every user to re-authenticate on their next request.
  - Generate replacement: `openssl rand -base64 32`
  - Deploy path:
    - Update the secret in the deployment secret store (not in repo).
    - Roll all application containers. Existing JWTs signed under the old key fail verification and the user is redirected to sign-in.
    - There is no multi-key transition window: this is a hard cut on purpose, because a compromised signing key must be retired immediately.
  - Recommended cadence: quarterly, or immediately on suspected compromise.
- `POSTGRES_PASSWORD` rotation is coordinated across postgres container init, the app container's `DATABASE_URL`, and any external replication consumers; follow the deployment runbook.
Anonymization
- Configurable global anonymization for VIEWER role
- Resource names, emails replaced with deterministic pseudonyms (seeded hash)
- Anonymization domain and mode configurable in SystemSettings
4. Session Management
- Server-side JWT with `SameSite=Strict` cookies
- `httpOnly` cookies prevent XSS-based session theft
- `secure` flag enforced in production (HTTPS only)
- CSRF protection via Auth.js built-in CSRF token
- Configurable session timeouts (absolute + idle) via SystemSettings
- Active session registry with concurrent session limit enforcement
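The interaction of the absolute and idle timeouts can be sketched as a pure function. The setting names `sessionMaxAge` and `sessionIdleTimeout` come from this document; treating them as seconds, and the `SessionTimestamps` shape and `evaluateSession` helper, are assumptions made for illustration.

```typescript
// Assumed shape: both timestamps are epoch milliseconds.
interface SessionTimestamps {
  issuedAt: number;
  lastActivityAt: number;
}

// Assumed unit: seconds (the document does not state the unit).
interface SessionPolicy {
  sessionMaxAge: number;
  sessionIdleTimeout: number;
}

type SessionState = "active" | "expired-absolute" | "expired-idle";

function evaluateSession(
  session: SessionTimestamps,
  policy: SessionPolicy,
  now: number = Date.now(),
): SessionState {
  // The absolute limit is checked first: activity never extends it.
  if (now - session.issuedAt >= policy.sessionMaxAge * 1000) return "expired-absolute";
  if (now - session.lastActivityAt >= policy.sessionIdleTimeout * 1000) return "expired-idle";
  return "active";
}
```

With the defaults above (8 hours absolute, 30 minutes idle), a continuously active user is still signed out 8 hours after the session was issued.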
5. Input Validation
- Zod schemas on every tRPC procedure input
- Strict TypeScript (`strict: true`, `exactOptionalPropertyTypes: true`)
- Blueprint dynamic fields validated at runtime against stored Zod schema definitions
- File uploads validated by:
  - MIME type whitelist (`image/png`, `image/jpeg`, `image/webp`, `image/tiff`, `image/bmp`)
  - Size limit (10 MB client-side, 4 MB server-side after compression)
  - Magic byte verification (actual file content matched against declared MIME)
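Magic byte verification for the whitelisted image types can be sketched as a signature lookup. The table and function name below are hypothetical; the byte signatures themselves are the standard file-format headers for PNG, JPEG, WebP (RIFF container), TIFF (both byte orders), and BMP.

```typescript
interface SignaturePart {
  offset: number;
  bytes: number[];
}

// Each MIME type lists alternative signatures; every part of one alternative must match.
const SIGNATURES: Record<string, SignaturePart[][]> = {
  "image/png": [[{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }]],
  "image/jpeg": [[{ offset: 0, bytes: [0xff, 0xd8, 0xff] }]],
  "image/webp": [
    [
      { offset: 0, bytes: [0x52, 0x49, 0x46, 0x46] }, // "RIFF"
      { offset: 8, bytes: [0x57, 0x45, 0x42, 0x50] }, // "WEBP"
    ],
  ],
  "image/tiff": [
    [{ offset: 0, bytes: [0x49, 0x49, 0x2a, 0x00] }], // little-endian
    [{ offset: 0, bytes: [0x4d, 0x4d, 0x00, 0x2a] }], // big-endian
  ],
  "image/bmp": [[{ offset: 0, bytes: [0x42, 0x4d] }]], // "BM"
};

function matchesDeclaredMime(data: Uint8Array, declaredMime: string): boolean {
  const alternatives = SIGNATURES[declaredMime];
  if (!alternatives) return false; // MIME type is not on the whitelist
  return alternatives.some((parts) =>
    parts.every((part) => part.bytes.every((b, i) => data[part.offset + i] === b)),
  );
}
```

A file that is too short for a signature fails the check, because reading past the end of the array yields `undefined`, which never equals a signature byte.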
Prompt-Injection Guard (defense-in-depth only)
packages/api/src/lib/prompt-guard.ts runs a short regex list against every
free-text user prompt sent to an AI tool (assistant chat + project-cover
DALL-E prompt). Input is normalised before the regex runs:
- Unicode NFKD decomposition (collapses fullwidth / compatibility forms and splits diacritics from their base letter).
- Strip zero-width / directional / combining code points that attackers use to break contiguous substring matches.
- Fold a small set of Cyrillic / Greek homoglyphs to their Latin equivalents.
This guard is defense-in-depth, not an authorisation boundary. The actual
security boundary for AI-initiated actions is the per-tool
requirePermission(ctx, PermissionKey.*) check inside every assistant tool —
an LLM that has been successfully jailbroken still cannot perform an action
its caller's role does not allow. Motivated adversaries will find prompts
that defeat the regex layer; its purpose is to raise the cost of casual
injection attempts and to surface them as audit-log entries.
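The three normalisation steps can be sketched with built-in string operations. The homoglyph table, the code-point ranges, and the single example pattern below are illustrative assumptions; the actual contents of `prompt-guard.ts` are not shown in this document.

```typescript
// Hypothetical homoglyph table (a small Cyrillic / Greek subset).
const HOMOGLYPHS: Record<string, string> = {
  "\u0430": "a", // Cyrillic a
  "\u0435": "e", // Cyrillic e
  "\u043e": "o", // Cyrillic o
  "\u0440": "p", // Cyrillic er
  "\u0441": "c", // Cyrillic es
  "\u0456": "i", // Cyrillic i (Ukrainian)
  "\u03bf": "o", // Greek omicron
};

// Combining diacritics, zero-width characters, directional marks, and BOM.
const INVISIBLE = /[\u0300-\u036F\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]/g;

function normalisePrompt(input: string): string {
  // 1. NFKD collapses compatibility forms and splits diacritics from base letters.
  // 2. Strip invisible and combining code points.
  const stripped = input.normalize("NFKD").replace(INVISIBLE, "");
  // 3. Fold known homoglyphs to their Latin equivalents.
  return Array.from(stripped)
    .map((ch) => HOMOGLYPHS[ch] ?? ch)
    .join("");
}

// Hypothetical example pattern; the real regex list is not shown here.
const EXAMPLE_PATTERN = /ignore (all )?previous instructions/i;

function looksLikeInjection(input: string): boolean {
  return EXAMPLE_PATTERN.test(normalisePrompt(input));
}
```

Running the regex list only on the normalised form is what lets a short pattern list cover fullwidth, zero-width-split, and homoglyph variants of the same phrase.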
6. Audit Logging
Activity History System
- Centralized `createAuditEntry()` function (fire-and-forget, never blocks)
- Covers 29+ of 36 tRPC routers
- Logged fields: `entityType`, `entityId`, `action`, `userId`, `changes` (JSONB with before/after/diff), `source`, `summary`
- Authentication events: login success/failure, logout, rate limiting, MFA failures
External API Call Logging
- All OpenAI/Azure/Gemini API calls logged via `loggedAiCall()` wrapper
- Structured Pino logs: `{ provider, model, promptLength, responseTimeMs }`
- Failed calls logged at `warn` level with sanitized diagnostics only, with URL and secret-like tokens redacted before they reach structured logs
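Redacting URLs and secret-like tokens before a failure message reaches the log could look roughly like this. The function name and all three patterns are hypothetical; the project's actual redaction rules are not described in this document and are likely broader.

```typescript
// Hypothetical redaction helper; patterns are illustrative, not exhaustive.
function redactDiagnostics(message: string): string {
  return message
    // Full URLs, including any query string that may carry keys.
    .replace(/https?:\/\/[^\s"'<>]+/gi, "[redacted-url]")
    // Prefixed API-key-like tokens such as "sk-..." or "key_...".
    .replace(/\b(?:sk|key|token)[-_][A-Za-z0-9_-]{8,}/gi, "[redacted-token]")
    // Bearer credentials copied from request headers.
    .replace(/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, "Bearer [redacted-token]");
}
```

The URL pattern runs first so that keys embedded in query strings are removed together with the URL instead of being matched piecemeal.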
tRPC Request Logging
- Every tRPC call logged with request ID, user ID, path, duration
- Slow calls (>500ms) logged at `warn` level
7. HTTP Security Headers
Static headers are configured in next.config.ts. The Content-Security-Policy
is emitted per-request by apps/web/src/middleware.ts so it can carry a
per-request nonce.
| Header | Value |
|---|---|
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload |
| Content-Security-Policy | Restrictive CSP with nonce-based script-src |
| X-Frame-Options | DENY |
| X-Content-Type-Options | nosniff |
| X-XSS-Protection | 1; mode=block |
| Referrer-Policy | strict-origin-when-cross-origin |
| Permissions-Policy | Camera, microphone, geolocation disabled |
Content-Security-Policy directives (production)
| Directive | Value | Rationale |
|---|---|---|
| `default-src` | `'self'` | Baseline deny-all-cross-origin. |
| `script-src` | `'self' 'nonce-<random>'` | No `unsafe-inline` / `unsafe-eval` in prod. |
| `style-src` | `'self' 'unsafe-inline'` | Accepted residual risk (see note below). |
| `img-src` | `'self' data: blob:` | Allow base64 previews and generated blobs only. |
| `font-src` | `'self' data:` | Data URLs for inline-embedded fonts. |
| `connect-src` | `'self'` | All AI / third-party calls are server-side. |
| `frame-ancestors` | `'none'` | Clickjacking defence. |
| `frame-src` | `'none'` | No third-party iframes. |
| `object-src` | `'none'` | Blocks legacy `<object>` / Flash / applet vectors. |
| `media-src` | `'self'` | No cross-origin video / audio. |
| `worker-src` | `'self' blob:` | Next.js runtime uses blob-URL workers. |
| `base-uri` | `'self'` | Blocks `<base>` hijacks. |
| `form-action` | `'self'` | Blocks form-exfiltration to third parties. |
Residual risk — style-src 'unsafe-inline': React inlines component-scoped
style attributes and @react-pdf/renderer emits inline <style> blocks that
cannot carry a nonce. A strict style-src-elem would break both. The risk is
bounded because script-src is nonce-based — a pure CSS-injection attack
cannot escalate to JS execution in this application.
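Assembling the production policy from the directive table, with the per-request nonce substituted into `script-src`, can be sketched as below. The `buildCsp` helper is a hypothetical name and this is not the actual middleware code; nonce generation is left to the caller.

```typescript
// Hypothetical helper that renders the directive table into a header value.
function buildCsp(nonce: string): string {
  const directives: Array<[string, string[]]> = [
    ["default-src", ["'self'"]],
    ["script-src", ["'self'", `'nonce-${nonce}'`]],
    ["style-src", ["'self'", "'unsafe-inline'"]],
    ["img-src", ["'self'", "data:", "blob:"]],
    ["font-src", ["'self'", "data:"]],
    ["connect-src", ["'self'"]],
    ["frame-ancestors", ["'none'"]],
    ["frame-src", ["'none'"]],
    ["object-src", ["'none'"]],
    ["media-src", ["'self'"]],
    ["worker-src", ["'self'", "blob:"]],
    ["base-uri", ["'self'"]],
    ["form-action", ["'self'"]],
  ];
  return directives.map(([name, values]) => `${name} ${values.join(" ")}`).join("; ");
}
```

The nonce passed in should be freshly generated from a cryptographically secure random source for every request, and the same value must be attached to each script tag the page emits.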
8. Rate Limiting
- Per-IP rate limiting: via middleware on all API routes
- Per-user rate limiting: configurable per-procedure
- Shared rate-limit backend: Redis-backed counters when `REDIS_URL` is configured; in-memory fallback remains available for local development and degraded operation
- Auth-specific rate limiting: 5 attempts / 15 min per email
- AI API call rate limits: upstream provider limits surfaced as user-friendly errors
9. Error Handling
- Sentry integration for production error tracking
- Pino structured logging (JSON in production, pretty-print in development)
- tRPC errors mapped to appropriate HTTP status codes
- AI API errors translated to human-readable messages via `parseAiError()`/`parseGeminiError()`
- Admin connection tests for AI/SMTP return sanitized, user-facing diagnostics only; raw upstream details stay in server logs with redaction for URLs, hosts, emails, and secret-like tokens
- Internal errors never leak stack traces to the client
10. Dependency Security
- Dependabot configured for automated dependency updates
- `pnpm audit` runs in the scheduled nightly-security.yml workflow, and high-signal architecture guardrails run on every PR in ci.yml
- Lockfile integrity verified on install
- Transitive audit hotspots such as `flatted` and `picomatch` are pinned through root `pnpm.overrides` to keep dev-tooling CVEs from drifting back in through nested dependencies
- Runtime workbook parsing and export generation now use `exceljs` boundaries instead of direct `xlsx` usage in application, engine, and web paths
- `pnpm audit --audit-level=high` is clean as of 2026-03-30; the remaining dependency findings are low/moderate only
11. Network Architecture
Browser -> Next.js (port 3100) -> tRPC -> Prisma -> PostgreSQL (port 5433)
-> Redis (port 6380, SSE pub/sub)
-> Azure OpenAI / Gemini (external HTTPS)
-> SMTP (email notifications)
- PostgreSQL and Redis accessible only within Docker network
- External API calls (AI, SMTP) over TLS
- No direct database access from the internet
12. Database Security
Authentication and Access
- PostgreSQL uses password-based authentication (`capakraken` user with strong password)
- Connection restricted to the Docker internal network (port 5433 on host, 5432 inside container)
- No direct internet access to the database — all queries routed through Prisma ORM via the application layer
- Application uses a single database user; no shared or anonymous access
Query Safety
- Prisma ORM enforces parameterized queries by default — no raw SQL concatenation
- All user inputs validated by Zod schemas before reaching the data layer
- JSONB fields (blueprints, skill matrices, permission overrides) are type-checked at the application boundary
Active Hardening Measures
- PostgreSQL audit logging enabled via `docker-compose.yml` command flags:
  - `log_connections=on` / `log_disconnections=on`: all connection lifecycle events
  - `log_statement=ddl`: all DDL statements (CREATE, ALTER, DROP)
  - `log_min_duration_statement=1000`: slow queries (>1s) logged for performance review
  - `log_line_prefix='%t [%p] %u@%d '`: timestamp, PID, user, and database in every log line
- SUPERUSER removed from the application database user (`capakraken`); hardening script at `scripts/harden-postgres.sh`
- Minimal privilege grants: application user has only SELECT, INSERT, UPDATE, DELETE on tables and USAGE/SELECT on sequences, with no CREATE, DROP, or SUPERUSER capabilities
Recommendations for Further Production Hardening
- Enable PostgreSQL SSL/TLS: Set `sslmode=require` in the Prisma connection string and configure `postgresql.conf` with `ssl = on`, `ssl_cert_file`, `ssl_key_file`
- Restrict connections by IP: Configure `pg_hba.conf` to accept connections only from the application container's subnet (e.g., `172.18.0.0/16`)
- Use separate database roles: Create a read-only role for reporting queries and a migration-only role for schema changes, limiting the default application role to DML operations
- Enable connection pooling: Use PgBouncer in production to limit maximum connections and prevent resource exhaustion attacks
- Backup encryption: Ensure `pg_dump` backups are encrypted at rest (GPG or filesystem-level encryption)
Redis Security
- Redis instance runs without authentication in development (Docker-internal only)
- Production recommendation: Enable `requirepass` in Redis configuration and set `REDIS_URL` to include the password (`redis://:password@host:port`)
- Redis is used only for SSE pub/sub (no sensitive data persisted)
13. Proactive Monitoring
Health Check Cron (/api/cron/health-check)
- Verifies PostgreSQL and Redis connectivity on each invocation
- On failure: creates CRITICAL in-app notifications for all ADMIN users
- Designed to be triggered by external cron (e.g., `curl` every 5 minutes)
- Protected by `CRON_SECRET` Bearer token
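A Bearer-token guard for the cron endpoint might look like the following sketch. The helper names are hypothetical and the route handler itself is not shown in this document; the comparison loop is a dependency-free stand-in for Node's `crypto.timingSafeEqual`, which would be the usual choice in a Node runtime.

```typescript
/** Compares two strings without exiting early on the first differing character. */
function constantTimeEquals(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

/** Hypothetical guard: true only for "Authorization: Bearer <CRON_SECRET>". */
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  if (!cronSecret) return false; // fail closed when the secret is not configured
  if (!authorizationHeader) return false;
  const prefix = "Bearer ";
  if (authorizationHeader.slice(0, prefix.length) !== prefix) return false;
  return constantTimeEquals(authorizationHeader.slice(prefix.length), cronSecret);
}
```

Failing closed on a missing `CRON_SECRET` matters here: without that check, an unset secret and an empty token would compare equal.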
Security Audit Cron (/api/cron/security-audit)
- Scans installed dependency versions against known minimum safe versions
- Alerts ADMIN users when high-severity outdated packages are detected
- Complements Dependabot with an in-app awareness layer
nginx Hardening
- Reference configuration: `docs/nginx-hardening.conf`
- Covers: server token removal, rate limiting (auth: 1r/s, API: 10r/s), SSL hardening (TLS 1.2+), OCSP stapling
- Security headers applied at nginx level as a defense-in-depth backup to Next.js headers