Files
CapaKraken/docs/security-audit-2026-03-15.md

507 lines
18 KiB
Markdown

# Security Audit - 2026-03-15
## Scope
Static security review of the current CapaKraken codebase, focused on:
- authentication and authorization boundaries
- sensitive read/write API routes
- browser-exposed endpoints and SSE
- spreadsheet import surfaces
- secret handling and operational scripts
- dependency advisories from `pnpm audit --json`
This review was done by parallel audit slices across API routes, auth/session code, frontend/browser entry points, import code, and dependency/config hygiene. It is a code audit, not a full penetration test.
## Executive Summary
The main security problem is not one isolated bug. It is that CapaKraken currently treats "authenticated" as broadly equivalent to "allowed to see most planning data". That shows up in four places:
1. any signed-in user can currently create a vacation request for any `resourceId`
2. many sensitive read routes are only protected by `protectedProcedure`
3. the SSE timeline feed broadcasts global planning events to every signed-in user
4. untrusted spreadsheet parsing still relies on vulnerable `xlsx@0.18.5`
## Findings
### High: Any authenticated user can create vacation requests for arbitrary resources
**Evidence**
- `packages/api/src/router/vacation.ts:140-207`
- `create` is only `protectedProcedure`
- input accepts any `resourceId`
- the handler verifies the caller exists as a user, but does not verify ownership of the target resource
**Impact**
Any authenticated user can submit `PENDING` vacation requests on behalf of other employees. That is an authorization flaw with direct workflow impact, notification side effects, and possible HR/process abuse.
**Recommended fix**
- For `USER` role, require `resource.userId === ctx.dbUser.id`
- Allow cross-resource creation only for `MANAGER`/`ADMIN`
- Add an audit log entry containing actor and target resource
- Add tests for:
- user can create own vacation
- user cannot create vacation for another resource
- manager can create for another resource
### High: Broad org and PII exposure behind `protectedProcedure`
**Evidence**
- `packages/api/src/trpc.ts:57-70`
- `protectedProcedure` only checks `ctx.session?.user`
- it does not require `ctx.dbUser`
Representative routes exposed to any authenticated session:
- `packages/api/src/router/user.ts:18-29` returns all users with `name`, `email`, `systemRole`
- `packages/api/src/router/resource.ts:50-292` supports listing and searching resources by `displayName`, `eid`, `email`
- `packages/api/src/router/project.ts:16-78` lists all projects
- `packages/api/src/router/project.ts:80-101` returns project plus allocations/demands/assignments
- `packages/api/src/router/allocation.ts:165-299` exposes allocations, demands, and assignments
- `packages/api/src/router/vacation.ts:81-132` exposes vacation records, requester, approver, and resource links
- `packages/api/src/router/dashboard.ts:13-61` exposes overview, peak times, demand analytics to any signed-in user
**Impact**
- least-privilege is not enforced for planning, staffing, and people data
- stale sessions can still pass `protectedProcedure` even if the backing DB user record is missing
- authenticated users can enumerate internal users, resource identities, staffing status, vacations, and project demand
**Recommended fix**
- Introduce a stricter base procedure that requires both `session.user` and `dbUser`
- Reclassify read routes by data sensitivity:
- `adminProcedure` for user administration
- `managerProcedure` or `controllerProcedure` for staffing/financial/planning views
- narrow self-service procedures for ordinary users
- Add field-level redaction for lower-privilege roles
- Define and document a route-by-route access matrix
### High: Global SSE fan-out leaks planning events across users
**Evidence**
- `apps/web/src/app/api/sse/timeline/route.ts:8-60`
- any authenticated session can subscribe
- `packages/api/src/sse/event-bus.ts:117-145`
- `EventBus.subscribe()` adds subscribers to a global set with no user, role, chapter, or project scoping
- `packages/api/src/sse/event-bus.ts:176-210`
- allocation, vacation, role, budget, and notification events are broadcast globally
**Impact**
Any signed-in user connected to the timeline SSE endpoint can receive metadata about organization-wide planning changes. Even when payloads are small, `projectId`, `resourceId`, `notificationId`, vacation status changes, and timing metadata are still sensitive internal signals.
**Recommended fix**
- Scope subscriptions per user and per permission set
- Filter event delivery before enqueueing into the stream
- Split channels by event type and audience, for example:
- personal notifications
- chapter-scoped planning updates
- manager/controller-only global planning updates
- Add contract tests to ensure a standard user does not receive unrelated events
### High: Vulnerable spreadsheet parser used on untrusted input
**Evidence**
- `pnpm audit --json` reports:
- `xlsx` prototype pollution advisory `GHSA-4r6h-8v6p-xvw6`
- `xlsx` ReDoS advisory `GHSA-5pgg-2g8v-p4x9`
- `packages/application/src/use-cases/dispo-import/read-workbook.ts:22-41`
- `apps/web/src/lib/excel.ts:7-35`
- `apps/web/src/lib/skillMatrixParser.ts:85-124`
**Impact**
CapaKraken parses spreadsheet data from files, including browser-side and import-related flows, with a library version that has known high-severity issues when reading crafted workbooks. Export-only flows are lower risk; read/parse flows are the real problem.
**Recommended fix**
- Replace `xlsx` for untrusted parsing, or isolate parsing into a hardened service/container
- Enforce strict upload limits:
- max file size
- max sheet count
- max row/column count
- parse timeout / worker isolation
- Reject unsupported formats early
- Treat import parsing as untrusted-content processing, not as a normal request path
### Medium: Secrets are stored in application settings and raw diagnostics are returned
**Evidence**
- `packages/api/src/router/settings.ts:50-140` writes `azureOpenAiApiKey` and `smtpPassword` directly into `systemSettings`
- `packages/api/src/router/settings.ts:142-225` returns `raw` AI connection errors
- `packages/api/src/lib/email.ts:64-80` returns raw SMTP verification errors
**Impact**
- secrets appear to be stored plaintext at rest unless database-level encryption exists outside the app
- raw provider/network errors can leak infrastructure details, hostnames, or provider responses into the admin UI
**Recommended fix**
- Move secrets to a proper secret manager or add application-level encryption at rest
- Return sanitized admin-facing error categories and log detailed diagnostics server-side only
- Restrict who can trigger connection tests and audit those actions
### Medium: Missing rate limiting on authentication and admin connectivity tests
**Evidence**
- `apps/web/src/server/auth.ts:20-37` performs credential verification with no visible throttle, lockout, or rate limiting
- `packages/api/src/router/settings.ts:142-229` exposes outbound AI and SMTP test actions to admins with no visible throttling
- repository search found no application-level rate limiter for auth or admin test endpoints
**Impact**
- increased brute-force and credential-stuffing risk on login
- increased abuse and outbound scanning risk on AI/SMTP test functions
**Recommended fix**
- Add IP and account-based rate limiting to sign-in
- Add short cooldowns and audit logging to admin test endpoints
- Consider temporary lockout or progressive backoff for failed login attempts
### Medium: Self-service skill matrix import has weak abuse controls
**Evidence**
- `packages/api/src/router/resource.ts:573-610`
- endpoint is only `protectedProcedure`
- client can submit arbitrary-length `skills` arrays and employee metadata
**Impact**
The caller is limited to their linked resource, which is good, but the endpoint still permits large user-controlled JSON payloads without explicit payload size caps, skill count caps, or normalization safeguards. That creates storage bloat and performance/availability risk, and can amplify downstream rendering problems.
**Recommended fix**
- Add maximum skill count and per-field length caps
- Reject duplicate or oversized payloads
- Normalize and sanitize string fields before persistence
- Add request body size limits on the transport layer
### Medium: Reset script uses unsafe raw SQL and weak default bootstrap credentials
**Evidence**
- `packages/db/src/reset-dispo-import.ts:24-31` defaults to `admin@capakraken.dev` / `admin123`
- `packages/db/src/reset-dispo-import.ts:107-115` uses `prisma.$executeRawUnsafe(...)`
**Impact**
This is operational tooling, not a normal app route, so the exposure is lower. But if used in the wrong environment, it can bootstrap a predictable admin credential and relies on unsafe SQL execution patterns.
**Recommended fix**
- Remove default admin password and require explicit credentials
- Keep the script restricted to local/admin-only environments
- Replace `$executeRawUnsafe` with a safer pattern if practical, or add a hard guard that blocks execution outside approved environments
### Low: Inline script in layout is static today, but CSP hardening is still missing
**Evidence**
- `apps/web/src/app/layout.tsx:27-33` uses `dangerouslySetInnerHTML` for theme bootstrapping from `localStorage`
**Impact**
The current snippet is static and does not interpolate user input, so this is not a major vulnerability by itself. The issue is that it keeps the app dependent on inline-script allowance unless a nonce/hash-based CSP is introduced.
**Recommended fix**
- Add a Content Security Policy
- Move to nonce or hash-based allowance for this bootstrap script
- Keep inline script content static and minimal
### Low: Dependency advisories also exist in tooling/dev chain
**Evidence**
`pnpm audit --json` additionally reports:
- `esbuild` moderate advisory in the Vite/Vitest chain
- `flatted` high advisory through the ESLint toolchain
**Impact**
These are materially lower priority than the `xlsx` parser issue because they appear in development/tooling paths rather than the primary production import surface. They still deserve cleanup to keep local developer environments safer and reduce future drift.
**Recommended fix**
- upgrade `esbuild` through Vite/Vitest where possible
- upgrade `flatted` via the ESLint dependency chain
- keep dependency audit in CI so new advisories are caught early
## Additional Findings (Independent Audit — 2026-03-15)
A second independent audit identified the following issues not covered above.
### Critical: Vacation cancel endpoint has no ownership check (IDOR)
**Evidence**
- `packages/api/src/router/vacation.ts:346-364`
- `cancel` uses `protectedProcedure`
- no check that the caller owns the vacation or the associated resource
**Impact**
Any authenticated user can cancel APPROVED or PENDING vacations for any employee by supplying the vacation ID. This is distinct from the vacation-create IDOR above and has equal or greater impact since it can undo already-approved time-off.
**Recommended fix**
- For `USER` role, verify `existing.requestedById === ctx.dbUser.id` or `resource.userId === ctx.dbUser.id`
- Allow cross-resource cancellation only for `MANAGER`/`ADMIN`
### High: No security headers (X-Frame-Options, HSTS, Referrer-Policy)
**Evidence**
- `apps/web/next.config.ts` contains no `headers()` configuration
- no `helmet` or equivalent middleware found
**Impact**
Clickjacking (missing X-Frame-Options), protocol downgrade (missing HSTS), information leakage (missing Referrer-Policy), MIME sniffing (missing X-Content-Type-Options).
**Recommended fix**
Add a `headers()` function to `next.config.ts` returning `X-Frame-Options: DENY`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`, and `Strict-Transport-Security` for production.
### High: Weak password policy
**Evidence**
- `packages/api/src/router/user.ts:56` — password validation is `z.string().min(8)`
- `apps/web/src/server/auth.ts:9` — login accepts `z.string().min(1)`
- seed uses `admin123`, `manager123`, `viewer123`
**Impact**
Combined with the lack of rate limiting, weak passwords are easily brute-forced.
**Recommended fix**
Strengthen password schema to require minimum 12 characters with mixed case, digits, and special characters. Add a password change flow.
### High: Audit log entries missing user attribution
**Evidence**
- `packages/api/src/router/allocation.ts`, `project.ts`, `role.ts``auditLog.create` calls do not set `userId`
- the `AuditLog` Prisma model has a `userId` field, but router code does not populate it
**Impact**
Audit trails cannot attribute actions to specific users. Insider threats are untraceable.
**Recommended fix**
Always include `userId: ctx.dbUser?.id` in every `auditLog.create` call. Consider middleware-based automatic attribution.
### High: `user.list` exposes all users to any authenticated user
**Evidence**
- `packages/api/src/router/user.ts:18-29``protectedProcedure`
- returns `name`, `email`, `systemRole`, `createdAt` for all users
**Impact**
Full user directory enumeration by any authenticated account including VIEWER role. Facilitates social engineering.
**Recommended fix**
Restrict to `adminProcedure` or `managerProcedure`.
### Medium: No Next.js middleware for route protection
**Evidence**
- no `middleware.ts` exists in `apps/web/src/`
**Impact**
Authentication is only enforced at the API/tRPC layer. Page shells and layouts render for unauthenticated requests before client-side auth kicks in.
**Recommended fix**
Add `middleware.ts` that redirects unauthenticated users to `/auth/signin` for all routes under `/(app)/`.
### Medium: Blueprint `updateRolePresets` accepts unvalidated JSON
**Evidence**
- `packages/api/src/router/blueprint.ts:72``rolePresets: z.array(z.unknown())`
**Impact**
Arbitrary JSON stored in JSONB field. Could cause rendering errors or storage bloat.
**Recommended fix**
Define at minimum a basic shape schema. Add serialized size check.
### Medium: Docker Compose hardcodes NEXTAUTH_SECRET
**Evidence**
- `docker-compose.yml:46-49``NEXTAUTH_SECRET: dev-secret-change-in-production`
**Impact**
If used in non-dev environments, sessions can be forged with the known secret.
**Recommended fix**
Reference environment variable: `NEXTAUTH_SECRET: ${NEXTAUTH_SECRET}`. Add startup check.
### Medium: Entitlement balance accessible without ownership check
**Evidence**
- `packages/api/src/router/entitlement.ts:103-145``protectedProcedure`, accepts any `resourceId`
**Impact**
Any authenticated user can query vacation balances for any employee.
**Recommended fix**
For USER-role callers, restrict to their own linked resource.
### Medium: Email content not sanitized for HTML injection
**Evidence**
- `packages/api/src/lib/email.ts``html` parameter available in `EmailPayload`
- `rejectionReason` field included in email body without sanitization
**Impact**
If HTML emails are ever sent, user-supplied rejection reasons could contain executable HTML.
**Recommended fix**
Apply HTML escaping to all user-supplied text in emails.
### Low: JWT session strategy without explicit maxAge or rotation
**Evidence**
- `apps/web/src/server/auth.ts:60-62``strategy: "jwt"` with no `maxAge`
**Impact**
Stolen JWTs remain valid for 30 days (Auth.js default). No session revocation mechanism.
**Recommended fix**
Set shorter `maxAge` (e.g., 24 hours). Implement token rotation.
### Low: tRPC error formatter exposes Zod validation details
**Evidence**
- `packages/api/src/trpc.ts:34-44``zodError` flattened in error responses
**Impact**
Internal field names and validation rules leak to clients.
**Recommended fix**
Conditionally include `zodError` only when `NODE_ENV=development`.
### Low: Redis connection without authentication
**Evidence**
- `packages/api/src/sse/event-bus.ts:80``redis://localhost:6380` with no password
- Docker Compose Redis has no `requirepass`
**Impact**
In production, unauthenticated Redis can be accessed by network neighbors.
**Recommended fix**
Configure Redis with `requirepass` in production.
### Low: `user.setPermissions` accepts arbitrary string arrays
**Evidence**
- `packages/api/src/router/user.ts:116-135``granted: z.array(z.string())`
**Impact**
Invalid permission keys can be stored in the database.
**Recommended fix**
Use `z.array(z.enum([...PermissionKey values]))` to validate known keys.
### Positive findings
- Argon2 password hashing via `@node-rs/argon2`
- All write operations gated behind `managerProcedure` or `adminProcedure`; no `publicProcedure` in routers
- Prisma ORM prevents SQL injection (no raw SQL in app routes)
- Comprehensive anonymization system consistently applied across endpoints
- Zod input validation on all tRPC endpoints
- `.env` files properly gitignored
- SSE endpoint requires authentication (lacks filtering)
- Report endpoint applies anonymization
## Recommended Remediation Order
### Immediate
1. Fix vacation ownership checks
2. Lock down the SSE timeline feed by audience/permission
3. Require `dbUser` for all protected access
4. Tighten the most sensitive read routes (`user`, `vacation`, `allocation`, project detail)
### Next
1. Replace or isolate `xlsx` for all untrusted parsing
2. Add auth/admin rate limiting
3. Sanitize admin diagnostics and improve secret storage
4. Add limits to skill matrix import payloads
### After That
1. Harden the reset script
2. Add CSP and browser hardening headers
3. Clean up dev/tooling dependency advisories
## Concrete Improvement Backlog
- Build a permission matrix for every tRPC procedure
- Introduce `authenticatedDbUserProcedure` as the new default
- Separate self-service APIs from planning/admin APIs
- Add structured audit logging for sensitive reads and admin test actions
- Add request size limits for upload/import endpoints
- Add CI checks for `pnpm audit`, secret scanning, and security regression tests
## Suggested Verification After Fixes
- authorization tests for every role against high-risk routes
- SSE tests proving event isolation by role/user
- import fuzz tests with malformed and oversized spreadsheets
- login throttling tests
- regression tests for vacation ownership rules
## Bottom Line
The codebase is not in a catastrophic state, but it is currently too trusting of any authenticated session. The most important improvements are authorization narrowing, event-stream scoping, and safer spreadsheet import handling.