Import Hardening

Date: 2026-03-30
Purpose: Define the safe parser boundary for untrusted spreadsheet imports.

Decision

  • Untrusted workbook imports no longer accept legacy .xls.
  • Server-side dispo imports accept only .xlsx files.
  • Browser-side ad hoc imports accept .xlsx and .csv.
  • Workbook import and export generation now use exceljs instead of calling the xlsx package directly at runtime.

Server Boundary

The dispo-import reader in read-workbook.ts now enforces:

  • normalized filesystem paths before reading
  • regular-file checks
  • non-empty file checks
  • a hard size limit of 15 MiB
  • a worksheet row limit of 10,000
  • a worksheet column limit of 256
  • .xlsx-only parsing through exceljs
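
The checks listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of read-workbook.ts: the helper names (`assertFileFacts`, `assertReadableWorkbook`, `assertWorksheetWithinLimits`) and the error messages are hypothetical, and only the limits themselves come from this document.

```typescript
import { statSync } from "node:fs";
import { extname, resolve } from "node:path";

// Limits taken from the list above.
const MAX_WORKBOOK_BYTES = 15 * 1024 * 1024; // 15 MiB
const MAX_WORKSHEET_ROWS = 10_000;
const MAX_WORKSHEET_COLUMNS = 256;

// Hypothetical helper: pure checks on what the filesystem reported.
function assertFileFacts(isRegularFile: boolean, sizeInBytes: number): void {
  if (!isRegularFile) {
    throw new Error("Workbook path is not a regular file");
  }
  if (sizeInBytes === 0) {
    throw new Error("Workbook file is empty");
  }
  if (sizeInBytes > MAX_WORKBOOK_BYTES) {
    throw new Error("Workbook exceeds the 15 MiB size limit");
  }
}

// Hypothetical helper: runs every check that does not need a parser and
// returns the normalized path that a later parsing step would read.
function assertReadableWorkbook(inputPath: string): string {
  // resolve() yields an absolute path with "." and ".." segments collapsed.
  const normalized = resolve(inputPath);
  if (extname(normalized).toLowerCase() !== ".xlsx") {
    throw new Error("Only .xlsx workbooks are accepted");
  }
  // statSync throws if nothing exists at the path. It follows symlinks;
  // lstatSync would be the stricter choice if symlinks should be refused.
  const stats = statSync(normalized);
  assertFileFacts(stats.isFile(), stats.size);
  return normalized;
}

// Hypothetical helper: checks the dimensions a parser reports for one sheet.
function assertWorksheetWithinLimits(rowCount: number, columnCount: number): void {
  if (rowCount > MAX_WORKSHEET_ROWS) {
    throw new Error(`Worksheet has more than ${MAX_WORKSHEET_ROWS} rows`);
  }
  if (columnCount > MAX_WORKSHEET_COLUMNS) {
    throw new Error(`Worksheet has more than ${MAX_WORKSHEET_COLUMNS} columns`);
  }
}
```

In this sketch the file checks run before any parser is invoked, and the dimension check runs once per worksheet on whatever counts the parser reports. Keeping the size and type checks in a pure function makes them easy to unit-test without touching the filesystem.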

The API entry points in dispo.ts reject non-.xlsx workbook paths before staging or validation begins.
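
The ordering described here, rejecting before any staging work starts, might look like the sketch below. The request and result shapes, the `handleDispoImport` name, and the injected `stage` callback are all hypothetical stand-ins for whatever dispo.ts actually uses. Treating the extension case-insensitively is also an assumption.

```typescript
// Hypothetical request and result shapes, for illustration only.
type DispoImportRequest = { workbookPath: string };
type DispoImportResult =
  | { ok: true; stagedPath: string }
  | { ok: false; error: string };

// Assumption: the extension is compared case-insensitively.
function hasXlsxExtension(workbookPath: string): boolean {
  return workbookPath.toLowerCase().endsWith(".xlsx");
}

// The staging step is injected so the sketch stays self-contained.
function handleDispoImport(
  request: DispoImportRequest,
  stage: (workbookPath: string) => string,
): DispoImportResult {
  // Reject first: staging is never reached for a non-.xlsx path.
  if (!hasXlsxExtension(request.workbookPath)) {
    return { ok: false, error: "Only .xlsx workbooks can be imported" };
  }
  return { ok: true, stagedPath: stage(request.workbookPath) };
}
```

Because the guard returns before `stage` is called, a legacy `.xls` path never causes any file to be copied, opened, or parsed.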

Browser Boundary

The browser import helpers in excel.ts and skillMatrixParser.ts now enforce:

  • a hard client-side file size limit of 10 MiB
  • explicit rejection of legacy .xls
  • a tabular row limit of 5,000 data rows plus the header row
  • a tabular column limit of 200
  • header validation that rejects blank and duplicate column names
  • .xlsx parsing through exceljs
  • .csv parsing through a local parser for simple tabular imports
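
A minimal sketch of the browser-side checks listed above, assuming the file has already been parsed into rows of strings. The function names, the `UploadLike` type, and the error messages are hypothetical and not taken from excel.ts or skillMatrixParser.ts. Treating column names as duplicates regardless of case is an assumption as well.

```typescript
// Limits taken from the list above.
const MAX_BROWSER_FILE_BYTES = 10 * 1024 * 1024; // 10 MiB
const MAX_DATA_ROWS = 5_000; // the header row is not counted
const MAX_TABLE_COLUMNS = 200;

// Structural subset of the browser File type, so the sketch also runs outside a browser.
type UploadLike = { name: string; size: number };

// Decides which parser a file may go to, or throws.
function classifyUpload(file: UploadLike): "xlsx" | "csv" {
  const lowerName = file.name.toLowerCase();
  if (lowerName.endsWith(".xls")) {
    throw new Error("Legacy .xls files are not supported; save as .xlsx or .csv");
  }
  if (file.size > MAX_BROWSER_FILE_BYTES) {
    throw new Error("File exceeds the 10 MiB size limit");
  }
  if (lowerName.endsWith(".xlsx")) return "xlsx";
  if (lowerName.endsWith(".csv")) return "csv";
  throw new Error("Unsupported file type");
}

// Validates a parsed table whose first row is the header row.
function validateTable(rows: string[][]): { headers: string[]; data: string[][] } {
  const headerRow = rows[0];
  if (headerRow === undefined) {
    throw new Error("Table has no header row");
  }
  const data = rows.slice(1);
  if (data.length > MAX_DATA_ROWS) {
    throw new Error(`Table has more than ${MAX_DATA_ROWS} data rows`);
  }
  if (rows.some((row) => row.length > MAX_TABLE_COLUMNS)) {
    throw new Error(`Table has more than ${MAX_TABLE_COLUMNS} columns`);
  }
  const headers = headerRow.map((name) => name.trim());
  const seen = new Set<string>();
  for (const name of headers) {
    if (name === "") {
      throw new Error("Blank column name in header row");
    }
    const key = name.toLowerCase(); // assumption: "Name" and "name" collide
    if (seen.has(key)) {
      throw new Error(`Duplicate column name: ${name}`);
    }
    seen.add(key);
  }
  return { headers, data };
}
```

In a browser, a `File` from an `<input type="file">` element satisfies `UploadLike` directly, so `classifyUpload` can run before any bytes are read.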

Affected upload flows:

  • resource CSV/XLSX import
  • estimate scope spreadsheet import
  • single skill-matrix import
  • batch skill-matrix import

Rationale

  • .xls support keeps the old binary workbook format in the untrusted path without enough payoff.
  • The server path keeps compatibility-first .xlsx parsing for the current dispo workbooks, but only behind explicit file validation, size limits, and exceljs.
  • The browser path moves away from blanket spreadsheet parsing to a narrower parser boundary.
  • Export generation follows the same maintained workbook stack as import parsing.
  • CSV remains useful for lightweight business imports and is small enough to parse with a narrow local parser.
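
To illustrate how small a narrow CSV parser can be, here is a sketch of one that handles comma-separated fields, double-quoted fields, doubled quotes as escapes, and LF or CRLF line endings. It is not the project's local parser; `parseSimpleCsv` is a hypothetical name, and features such as alternative encodings or comment lines are deliberately left out.

```typescript
// Parses CSV text into rows of string fields.
function parseSimpleCsv(text: string, delimiter = ","): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let i = 0;

  while (i < text.length) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // a doubled quote is an escaped quote
          i += 2;
          continue;
        }
        inQuotes = false; // closing quote
        i += 1;
        continue;
      }
      field += ch; // delimiters and newlines are literal inside quotes
      i += 1;
      continue;
    }
    if (ch === '"' && field === "") {
      inQuotes = true; // a quote only opens at the start of a field
      i += 1;
      continue;
    }
    if (ch === delimiter) {
      row.push(field);
      field = "";
      i += 1;
      continue;
    }
    if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") {
        i += 1; // treat CRLF as a single line break
      }
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      i += 1;
      continue;
    }
    field += ch;
    i += 1;
  }

  if (inQuotes) {
    throw new Error("Unterminated quoted field");
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

The returned rows have the shape a header and row-limit validation step would expect: the first row holds the column names and the rest hold data. An empty line in the input produces a row with a single empty field, which a caller may want to filter out.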