feat(platform): harden access scoping and delivery baseline

2026-03-30 00:27:31 +02:00
parent 00b936fa1f
commit 819345acfa
109 changed files with 26142 additions and 8081 deletions
@@ -7,6 +7,8 @@

 | Topic | File | Use |
 |---|---|---|
+| AI excellence due diligence | [ai-excellence-due-diligence-roadmap.md](/home/hartmut/Documents/Copilot/capakraken/docs/ai-excellence-due-diligence-roadmap.md) | Frank quality assessment and cleanup roadmap toward a showcase AI-built project |
+| Target CI/CD architecture | [cicd-target-architecture.md](/home/hartmut/Documents/Copilot/capakraken/docs/cicd-target-architecture.md) | Proposed image-based build, deploy, and rollback flow |
 | Active roadmap and open gaps | [product-roadmap.md](/home/hartmut/Documents/Copilot/capakraken/docs/product-roadmap.md) | Primary backlog and current delivery order |
 | Estimating system design | [estimating-extension-design.md](/home/hartmut/Documents/Copilot/capakraken/docs/estimating-extension-design.md) | Workbook analysis, field mapping, and implementation plan |
 | Dispo import implementation | [dispo-import-implementation.md](/home/hartmut/Documents/Copilot/capakraken/docs/dispo-import-implementation.md) | Clean-slate Dispo v2 import design, mapping rules, staging flow, and commit policy |
@@ -0,0 +1,239 @@
+# AI Excellence Due Diligence And Roadmap
+
+**Date:** 2026-03-29
+**Purpose:** Frank assessment of the current codebase plus a pragmatic roadmap to turn CapaKraken into a reference project for disciplined AI-assisted software engineering.
+
+## Executive Summary
+
+CapaKraken is already well beyond a prototype. The repository shows a real domain model, a non-trivial bounded-context split, a meaningful automated test baseline, and active delivery discipline.
+
+At the same time, the codebase still carries several risks that are typical of fast-moving AI-assisted development:
+
+1. some critical cross-cutting concerns are only partially productized
+2. several files and routers have grown beyond comfortable ownership size
+3. broad read access and global real-time fan-out still leak too much internal state
+4. spreadsheet import parsing remains a security and reliability weak point
+5. the current operational model is improving, but not yet fully standardized
+
+The project is strong enough to build on, but without another cleanup and hardening pass it is not yet a showcase of "how AI-built software should look".
+
+## Current Strengths
+
+- Clear monorepo and package split across `api`, `application`, `db`, `engine`, `shared`, `staffing`, `ui`, and `web`, with shared tooling through `turbo` and `pnpm`.
+- Product scope is substantial and business-oriented rather than CRUD-only: estimating, planning, demand/assignment, chargeability, import/export, dashboards, report building, and admin surfaces.
+- CI already enforces typecheck, lint, unit tests, build, and E2E with PostgreSQL and Redis in the loop.
+- Application-layer use cases exist and are not just thin router wrappers.
+- Documentation coverage is materially better than average for a fast-moving product.
+
+## Due Diligence Findings
+
+### Critical
+
+1. Real-time SSE delivery is still global instead of audience-scoped.
+   Evidence: [route.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/app/api/sse/timeline/route.ts) subscribes any authenticated user to the same bus, and [event-bus.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/sse/event-bus.ts) maintains one global subscriber set and broadcasts events without per-user or per-role filtering.
+   Risk: internal planning, vacation, budget, task, and notification metadata can be over-shared to authenticated users who should not see global changes.
+
+2. Untrusted spreadsheet parsing still depends on `xlsx@0.18.5`.
+   Evidence: import parsing remains in [read-workbook.ts](/home/hartmut/Documents/Copilot/capakraken/packages/application/src/use-cases/dispo-import/read-workbook.ts), browser-side parsing remains in [excel.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/lib/excel.ts) and [skillMatrixParser.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/lib/skillMatrixParser.ts), and the package is still declared in [apps/web/package.json](/home/hartmut/Documents/Copilot/capakraken/apps/web/package.json) and [packages/application/package.json](/home/hartmut/Documents/Copilot/capakraken/packages/application/package.json).
+   Risk: file-import attack surface remains higher than acceptable for a flagship reference implementation.
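
A minimal sketch of what audience-scoped delivery could look like, assuming every event carries an explicit audience. The names `ScopedEventBus`, `Audience`, and `isInAudience`, as well as the `USER` role value, are hypothetical and not taken from the actual `event-bus.ts`:

```typescript
// Hypothetical sketch: not the real event-bus.ts implementation.
type Role = "USER" | "CONTROLLER" | "MANAGER" | "ADMIN";

interface Audience {
  // If set, subscribers holding one of these roles receive the event.
  roles?: Role[];
  // If set, these users receive the event regardless of role.
  userIds?: string[];
}

interface BusEvent {
  type: string;
  payload: unknown;
  audience: Audience;
}

interface Subscriber {
  userId: string;
  role: Role;
  send: (event: BusEvent) => void;
}

function isInAudience(sub: Subscriber, audience: Audience): boolean {
  const byUser = audience.userIds?.includes(sub.userId) ?? false;
  const byRole = audience.roles?.includes(sub.role) ?? false;
  // Deny by default: an event with an empty audience reaches nobody.
  return byUser || byRole;
}

class ScopedEventBus {
  private subscribers = new Set<Subscriber>();

  // Registers a subscriber and returns an unsubscribe function.
  subscribe(sub: Subscriber): () => void {
    this.subscribers.add(sub);
    return () => {
      this.subscribers.delete(sub);
    };
  }

  // Delivers the event only to matching subscribers; returns the delivery count.
  publish(event: BusEvent): number {
    let delivered = 0;
    for (const sub of this.subscribers) {
      if (isInAudience(sub, event.audience)) {
        sub.send(event);
        delivered++;
      }
    }
    return delivered;
  }
}
```

The deny-by-default rule is a design choice in this sketch: a publisher that forgets to set an audience leaks nothing, at the cost of silently dropping the event.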
+
+### High
+
+1. Several high-sensitivity read paths are still too broad for least-privilege.
+   Evidence: multiple planning, resource, project, dashboard, allocation, and timeline reads still use `protectedProcedure` rather than narrower role-specific gates in [dashboard.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/dashboard.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), and [project.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/project.ts).
+   Risk: authorization intent remains hard to reason about and easy to regress.
+
+2. Router and UI module size is now an operational risk.
+   Evidence: [assistant-tools.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/assistant-tools.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [vacation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/vacation.ts), and large frontend files such as [SystemSettingsClient.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/admin/SystemSettingsClient.tsx) and [TimelineProjectPanel.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/timeline/TimelineProjectPanel.tsx) are each well past the size where safe ownership stays easy.
+   Risk: AI-generated changes become harder to review, humans lose local reasoning context, and regressions become more likely.
+
+3. Secret handling is still application-database centric.
+   Evidence: system settings mutate and persist API keys and SMTP credentials in [settings.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/settings.ts).
+   Risk: operational secrets remain too coupled to the main app data plane for a gold-standard project.
+
+### Medium
+
+1. Rate limiting is process-local and not deployment-grade.
+   Evidence: [rate-limit.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/middleware/rate-limit.ts) uses an in-memory `Map` and explicitly notes that multi-instance deployments need Redis-backed replacement.
+   Risk: protections weaken as soon as the app scales horizontally.
+
+2. Performance hotspots are well understood but not yet structurally solved.
+   Evidence: the current performance review identifies repeated in-memory filtering, broad invalidation, and heavyweight timeline/report derivations in [performance-optimization-review-2026-03-18.md](/home/hartmut/Documents/Copilot/capakraken/docs/performance-optimization-review-2026-03-18.md).
+   Risk: user experience and infrastructure cost will degrade as data volume grows.
+
+3. Production delivery is still in transition.
+   Evidence: the current repo now has a target CI/CD path, but the old manual production path still coexists with the new image-based deploy model in [cicd-target-architecture.md](/home/hartmut/Documents/Copilot/capakraken/docs/cicd-target-architecture.md).
+   Risk: the operational source of truth is not yet singular.
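
For the rate-limiting finding, one possible direction is to hide the counter behind a small store interface so the in-memory `Map` can later be swapped for a Redis-backed store without touching callers. This is a fixed-window sketch under that assumption; `CounterStore`, `InMemoryCounterStore`, and `isAllowed` are hypothetical names, not the current `rate-limit.ts` API:

```typescript
// Hypothetical sketch: names are illustrative, not the real rate-limit.ts API.
interface CounterStore {
  // Increments the counter for `key` and returns the new value.
  // A counter expires `ttlMs` after its first increment.
  increment(key: string, ttlMs: number, nowMs: number): Promise<number>;
}

// Process-local store, suitable for a single instance or for tests only.
class InMemoryCounterStore implements CounterStore {
  private counters = new Map<string, { count: number; expiresAt: number }>();

  async increment(key: string, ttlMs: number, nowMs: number): Promise<number> {
    const entry = this.counters.get(key);
    if (!entry || entry.expiresAt <= nowMs) {
      // Start a new window.
      this.counters.set(key, { count: 1, expiresAt: nowMs + ttlMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Fixed-window check: at most `limit` calls per `windowMs` for a given key.
async function isAllowed(
  store: CounterStore,
  key: string,
  limit: number,
  windowMs: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const count = await store.increment(key, windowMs, nowMs);
  return count <= limit;
}
```

A shared store would implement the same `increment` contract atomically on the server side; the interface is asynchronous for that reason.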
+
+## Overall Rating
+
+### Product Engineering Quality
+
+`8/10`
+
+This is materially better than a typical startup CRUD app and already has the bones of a serious internal platform or vertical SaaS.
+
+### Security Posture
+
+`6/10`
+
+There are good foundations, but the remaining SSE, spreadsheet, and least-privilege gaps are not yet acceptable for a showcase project.
+
+### Maintainability
+
+`6.5/10`
+
+The architecture is promising, but file size, router density, and leftover compatibility code will eventually slow everyone down unless addressed deliberately.
+
+### Operational Maturity
+
+`6.5/10`
+
+Good CI and improving deploy discipline are in place, but production standardization still needs one more step.
+
+### AI-Excellence Readiness
+
+`6/10`
+
+The project already proves that AI can help build serious software fast. It does not yet prove that AI-assisted development can stay consistently clean, minimal, and audit-friendly at scale.
+
+## What A Showcase AI Project Should Demonstrate
+
+To be a true showcase for AI-assisted development, this repository should visibly demonstrate:
+
+- small, composable files with clear ownership boundaries
+- explicit security and permission models at every boundary
+- deterministic build and deploy flow
+- measurable quality gates beyond "tests pass"
+- strong documentation that explains not only what exists, but why the structure is this way
+- low-friction reviewability, so humans can still govern AI speed
+
+## Roadmap
+
+### Phase 1: Close the Dangerous Gaps
+
+Target window: 1 to 2 weeks
+
+Goals:
+
+- Replace global SSE fan-out with audience-aware channels.
+- Remove `xlsx` from untrusted import paths or isolate it behind a hardened parser boundary.
+- Create a route access matrix and reclassify broad `protectedProcedure` read endpoints.
+- Move production secrets out of regular application settings, or add an interim encrypted-secrets layer with clear migration path.
+
+Definition of done:
+
+- standard users cannot subscribe to unrelated real-time planning events
+- file import paths have documented limits and safer parsing
+- every sensitive router is explicitly classified by audience
+- secret storage policy is documented and enforced
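
As an illustration of "documented limits" on the import path, an upload could be rejected before any parser sees it. The sketch below assumes `.xlsx` uploads only and relies on the fact that an `.xlsx` file is a ZIP container starting with the bytes `PK\x03\x04`. The function name `checkXlsxUpload` and the limit value used in the usage note are hypothetical:

```typescript
// Hypothetical pre-parse guard; it does not replace a hardened parser.
interface UploadLimits {
  maxBytes: number;
}

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Local file header signature of a ZIP archive ("PK\x03\x04").
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];

function checkXlsxUpload(
  fileName: string,
  bytes: Uint8Array,
  limits: UploadLimits,
): UploadCheck {
  if (!fileName.toLowerCase().endsWith(".xlsx")) {
    return { ok: false, reason: "unsupported extension" };
  }
  if (bytes.length > limits.maxBytes) {
    return { ok: false, reason: "file too large" };
  }
  const hasZipMagic = ZIP_MAGIC.every((byte, i) => bytes[i] === byte);
  if (!hasZipMagic) {
    return { ok: false, reason: "not a ZIP-based workbook" };
  }
  return { ok: true };
}
```

A caller might use something like `checkXlsxUpload(name, data, { maxBytes: 10 * 1024 * 1024 })`; the 10 MiB figure is only an example and would need to be set from real workbook sizes.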
+
+### Phase 2: Cut Down Complexity
+
+Target window: 2 to 4 weeks
+
+Goals:
+
+- Split oversized routers into bounded router modules by feature slice.
+- Split oversized React components into container, state, and presentational layers.
+- Introduce file-size and complexity guardrails for new code.
+- Create "AI review rules" for generated patches: max file growth, required tests, required docs for cross-cutting changes.
+
+Priority candidates:
+
+- `packages/api/src/router/assistant-tools.ts`
+- `packages/api/src/router/resource.ts`
+- `packages/api/src/router/allocation.ts`
+- `packages/api/src/router/timeline.ts`
+- `apps/web/src/components/admin/SystemSettingsClient.tsx`
+- `apps/web/src/components/timeline/*`
+
+Definition of done:
+
+- no new source file over 500 lines without an explicit exception
+- top 10 largest business-critical source files are materially reduced
+- patch reviews become narrower and easier to reason about
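
The 500-line rule could be enforced by a small check in CI. The following is a sketch of the core logic only, kept free of file-system access so it stays testable; reading the changed files and wiring the result into a failing CI step is left out, and `findOversizedFiles` is a hypothetical name:

```typescript
// Hypothetical guardrail logic; the caller supplies the files to check.
interface SourceFile {
  path: string;
  content: string;
}

interface Violation {
  path: string;
  lines: number;
}

// Counts lines; a trailing newline does not start an extra line.
function countLines(content: string): number {
  if (content.length === 0) return 0;
  const parts = content.split("\n");
  return content.endsWith("\n") ? parts.length - 1 : parts.length;
}

// Returns every file above `maxLines` that is not listed as an explicit exception.
function findOversizedFiles(
  files: SourceFile[],
  maxLines: number = 500,
  exceptions: ReadonlySet<string> = new Set<string>(),
): Violation[] {
  return files
    .filter((file) => !exceptions.has(file.path))
    .map((file) => ({ path: file.path, lines: countLines(file.content) }))
    .filter((violation) => violation.lines > maxLines);
}
```

Passing only newly added files matches the "no new source file" wording; passing all files with a shrinking exception list would track the cleanup of existing oversized modules.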
+
+### Phase 3: Make Quality Measurable
+
+Target window: 2 to 3 weeks
+
+Goals:
+
+- Add architecture fitness checks, not just lint/tests.
+- Add API authorization tests for all sensitive routers.
+- Add bundle-size and route-size monitoring for the web app.
+- Add mutation-path audit coverage checks where business-critical state changes occur.
+- Add a dependency and unsafe-library policy.
+
+Suggested checks:
+
+- role/permission regression tests
+- SSE audience contract tests
+- import abuse tests with oversized files
+- max file size / max router size lint or CI checks
+- coverage thresholds for critical packages
+
+Definition of done:
+
+- the repo can fail CI for architectural regressions, not only syntax or unit failures
+- critical security assumptions are test-backed
+
+### Phase 4: Standardize Operations
+
+Target window: 1 to 2 weeks
+
+Goals:
+
+- complete the move to image-based deploys as the canonical path
+- document staging and production bootstrap as code, not tribal knowledge
+- replace in-memory rate limits with Redis-backed limits where appropriate
+- define rollback drills and incident response playbooks
+
+Definition of done:
+
+- one production deployment path
+- one rollback path
+- one source of truth for runtime configuration
+
+### Phase 5: Turn It Into A Reference Project
+
+Target window: ongoing
+
+Goals:
+
+- add a concise engineering doctrine for AI-assisted development in this repo
+- publish coding heuristics for humans and AI: file size limits, change budgets, ownership boundaries, review expectations
+- maintain a "why this is structured this way" architecture guide
+- log selected before/after refactors to demonstrate how AI was used responsibly
+
+Artifacts to add:
+
+- `docs/engineering-doctrine.md`
+- `docs/architecture-decision-records/`
+- `docs/ai-collaboration-standards.md`
+- a small set of "reference slices" that show exemplary patterns end to end
+
+## Suggested Order Of Execution
+
+1. SSE scoping
+2. spreadsheet import hardening
+3. access-matrix and authorization tightening
+4. secrets policy
+5. router/component decomposition
+6. architecture fitness checks in CI
+7. full operational standardization
+
+## Success Criteria For The Next 60 Days
+
+- no critical or high-severity known security gap remains open without an owner and due date
+- no core router continues to grow unchecked
+- at least one major domain slice is refactored into a clear "reference implementation" pattern
+- production deployment uses the same artifact that passed CI
+- the repo gains explicit AI-development rules that improve reviewability instead of just increasing output
+
+## Bottom Line
+
+CapaKraken is already good enough to justify further investment. It is not a cleanup disaster.
+
+The opportunity is not to rebuild it. The opportunity is to harden the weak edges, reduce oversized ownership surfaces, and make the engineering standards visible enough that the repository becomes evidence that AI can accelerate serious software without normalizing architectural debt.
@@ -14,11 +14,10 @@ Der Assistant ist bereits relativ breit aufgestellt:

 Nevertheless, parity with the actual app/API has not yet been reached. The biggest gaps are not cases of "nothing exists at all", but rather:

- missing admin and configuration capabilities,
- missing deep domain read models,
- inconsistent permission gating,
- missing server-side safeguards for write-capable AI actions,
- and some object-level visibility bugs.
+- still-missing deep domain read models and specialized workflows,
+- router and object-scope parity that is not yet complete,
+- missing approval/governance UX outside the chat,
+- and some remaining object-level visibility bugs.

 ## Assistant Architecture

@@ -76,8 +75,38 @@ Es gibt aktuell vier Permission-/Scope-Ebenen:
  - `import_csv_data`
  - `list_dispo_import_batches`
  - `get_dispo_import_batch`
-  - CSV export, CSV import, and the batch overview of Dispo imports are thus now available through real router paths
- Estimates: search, detail, and create only, but no full lifecycle
+  - `stage_dispo_import_batch`
+  - `validate_dispo_import_batch`
+  - `cancel_dispo_import_batch`
+  - `list_dispo_staged_resources`
+  - `list_dispo_staged_projects`
+  - `list_dispo_staged_assignments`
+  - `list_dispo_staged_vacations`
+  - `list_dispo_staged_unresolved_records`
+  - `resolve_dispo_staged_record`
+  - `commit_dispo_import_batch`
+  - CSV export, CSV import, and the operational Dispo import workflows are thus now available through real router paths
+- Admin / system control:
+  - `get_system_settings`
+  - `update_system_settings`
+  - `test_ai_connection`
+  - `test_smtp_connection`
+  - `test_gemini_connection`
+  - `get_ai_configured`
+  - `list_system_role_configs`
+  - `update_system_role_config`
+  - `list_webhooks`
+  - `get_webhook`
+  - `create_webhook`
+  - `update_webhook`
+  - `delete_webhook`
+  - `test_webhook`
+  - Settings/webhooks run through the real routers; secrets are masked in assistant responses
+- Estimates:
+  - Search
+  - Detail / Weekly Phasing / Commercial Terms at controller/manager/admin level
+  - core lifecycle mutations incl. revision, export, and planning handoff
+  - Remaining gap: deeper domain sub-objects and specialized workflows beyond the router operations already wired up
 - Reports: `run_report` is flexible but does not cover the specialized report/analysis read models
 - Chargeability / transparency:
  - `get_chargeability_report`
@@ -88,24 +117,24 @@ Es gibt aktuell vier Permission-/Scope-Ebenen:
  - simplified history queries
  - real audit API for list, detail, entity history, timeline, and activity summary
  - governance workbench outside the chat remains open
- Notification/Tasking: core cases present, but no full reminder/task/notification parity
+- Notification/Tasking:
+  - self-service reads and reminder management present
+  - manager/admin lifecycle for notification, task, and broadcast workflows present
+  - Remaining gap: further special cases beyond the router operations already exposed
 - Country/location master data: read-only, and even there only shallow
 - Insights: summary level present, drilldowns missing
+- Role/client/org-unit master data:
+  - core mutations for roles, clients, and org units now run through the real router paths instead of assistant-specific logic
+  - Remaining gap: further read models and lifecycle cases beyond the router operations already exposed

 ### Completely missing or functionally insufficient

- Webhook administration
- System settings / AI / SMTP / image-provider administration
- System role config administration
- Import/export flows
- User self-service and preferences
- Country and metro-city administration
- Full estimate lifecycle
- Dispo/import-specific flows
+- Country and metro-city administration beyond the core mutations already present
+- Governance/approval workspace outside the chat

 ## Critical Inconsistencies and Risks

-As of 2026-03-29: The earlier P0s around notification scoping, `list_users`, mutation audit, and text-only permission statements are resolved. Holiday calendar read access as well as admin mutations for calendars and entries are now present in the assistant. The following points remain relevant.
+As of 2026-03-29: The earlier P0s around notification scoping, `list_users`, mutation audit, and text-only permission statements are resolved. In addition, user self-service/admin parity in the tool-visibility layer, the manager notification lifecycle, and the most important estimate lifecycle/read-model gates have now been brought closer to the router roles. Holiday calendar read access as well as admin mutations for calendars and entries are likewise present in the assistant. The following points remain relevant.

 ### P0: Human-in-the-loop is persisted server-side but not yet built out as a full approval workspace

@@ -174,6 +203,7 @@ Aktuell im Assistant vorhanden:
 Remaining gap:

 - Country/metro-city master data and deeper location rules are still not covered with the same maintenance breadth as the actual admin UI
+- Further admin master-data areas with direct assistant queries, above all resource/project/vacation special cases, still need router parity

 ### Timeline and Disposition

@@ -261,9 +291,17 @@ Fehlend:
 - Webhooks:
  - List, detail, create, update, delete, test

-Consequence:
+Currently available in the assistant:

- An admin can do considerably more system control in the UI than the assistant can.
+- Read/update system settings
+- AI/SMTP/Gemini connection tests
+- Read AI configuration status
+- Read/update system role configs
+- Read/create/update/delete/test webhooks
+
+Remaining gap:
+
+- standalone admin UIs and multi-step governance workflows outside the chat

 ### User Self-Service

@@ -276,7 +314,7 @@ Konsequenz:

 Consequence:

- The assistant knows the user context only superficially, but not the user's personal settings and self-service options.
+- The most important self-service building blocks are now present; what remains open is mainly deeper personal settings and specialized flows beyond the router procedures already exposed.

 ### Master Data for Countries and Locations

@@ -296,50 +334,42 @@ Restluecke:

 ### Estimate lifecycle and domain objects below the estimate

- full estimate list/detail parity
- versions, scope items, demand lines, locking, approvals, further mutations
-
 Currently available in the assistant:

 - Search
- Baseline detail
- Create
+- Detail / Weekly Phasing / Commercial Terms
+- Create, clone, draft update, submit, approve, revision, export, planning handoff

 Missing:

- the actual working process at estimate level
+- deeper sub-object and specialized workflows beyond the router procedures already wired up

 ### Notifications, Tasks, and Reminders

 Present:

- Lists, task detail, status changes, create reminder, create task for user, send broadcast
+- Lists, unread count, task detail, task counts, status changes
+- Create, list, update, delete reminders
+- generic notification creation, create task for user, assign task, send/list broadcasts

 Missing:

- Reminder list
- Reminder update/delete
- Unread count
- Task counts
- generic notification creation with the same depth as `notificationRouter`
+- further special cases beyond the notification router procedures already exposed

 ## Capability Gaps by Router

 ### Completely missing router parity

- `settings`
- `systemRoleConfig`
- `webhook`
+- currently none in the previously prioritized admin/audit/import areas

 ### Significantly incomplete router parity

- `importExport`
- `dispo`
 - `timeline` (core read models and most important write parity present, specialized workflows missing)
 - `vacation`
+- `user`
 - `estimate`
 - `notification`
- `user`
 - `country`
 - `insights`
 - `scenario`
@@ -354,6 +384,11 @@ Fehlend:
 - `staffing`
 - `report`
 - `dashboard`
+- `settings`
+- `systemRoleConfig`
+- `webhook`
+- `importExport`
+- `dispo`

 ## System Prompt: obvious exaggerations / misleading statements

@@ -407,6 +442,7 @@ Die Human-in-the-Loop-Regel ist inzwischen serverseitig erzwungen. Der Prompt so
 - `update_holiday_entry`
 - `delete_holiday_entry`
 - `preview_resolved_holidays`
+- Status: core read/write path and preview are implemented; only more advanced editor/governance flows remain open.

 2. Build the timeline assistant track
 - Read:
@@ -422,6 +458,7 @@ Die Human-in-the-Loop-Regel ist inzwischen serverseitig erzwungen. Der Prompt so
 - `get_chargeability_report`
 - `get_resource_computation_graph`
 - `get_project_computation_graph`
+- Status: the central read models are implemented; what remains open is mainly broader reuse depth in further specialized views.

 ### P2: Admin and master-data parity

@@ -0,0 +1,193 @@
+# CI/CD Target Architecture
+
+## Goal
+
+This document captures the intended delivery model for CapaKraken without replacing the currently working manual production setup immediately.
+
+The target state is:
+
+1. CI validates every PR.
+2. GitHub Actions builds immutable Docker images.
+3. Staging and production pull those exact images from a registry.
+4. Database migrations run as an explicit deploy step.
+5. Traffic is considered safe only after the app answers `GET /api/ready`.
+
+## Core Idea
+
+The production host should stop building application code from a Git checkout. Instead, it should only:
+
+- pull a versioned `app` image
+- pull a matching `migrator` image
+- run Prisma deploy migrations
+- start the application container
+- wait for readiness
+
+That removes "works on the server but not in CI" drift and makes rollbacks much simpler.
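
The "wait for readiness" step amounts to a bounded polling loop. The sketch below shows that logic in isolation, assuming the probe wraps a request to `GET /api/ready`; `waitForReady` and its options are hypothetical, and the real `deploy-compose.sh` may implement the wait differently:

```typescript
// Hypothetical readiness loop; the probe and timing values are supplied by the caller.
type Probe = () => Promise<boolean>;

interface WaitOptions {
  attempts: number;
  delayMs: number;
  // Injectable for tests; defaults to a timer-based sleep.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });

// Resolves to true as soon as the probe succeeds, false after the last attempt.
async function waitForReady(probe: Probe, opts: WaitOptions): Promise<boolean> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      if (await probe()) return true;
    } catch {
      // A refused connection during startup counts as "not ready yet".
    }
    if (attempt < opts.attempts) await sleep(opts.delayMs);
  }
  return false;
}
```

In a Node 18+ script the probe could be as small as `async () => (await fetch(url)).ok`, where `url` points at the host's `/api/ready` endpoint; the host and port are deployment-specific and not stated here.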
+
+## Delivery Flow
+
+### 1. Pull Request Validation
+
+The existing `CI` workflow continues to validate:
+
+- typecheck
+- lint
+- unit tests
+- build
+- E2E
+
+This remains the quality gate before merge.
+
+### 2. Image Build
+
+The new manual workflow [release-image.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/release-image.yml) builds two images from [Dockerfile.prod](/home/hartmut/Documents/Copilot/capakraken/Dockerfile.prod):
+
+- `runner` target as the production app image
+- `migrator` target as the Prisma migration image
+
+Recommended tag format:
+
+- `sha-<git-commit>`
+
+Example:
+
+```text
+ghcr.io/<owner>/capakraken-app:sha-abc123
+ghcr.io/<owner>/capakraken-migrator:sha-abc123
+```
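
Because staging and production must reference exactly the same pair of images, it can help to derive both references from one commit SHA in a single place. A small sketch, assuming the naming shown above; `imageRefs` is a hypothetical helper and the accepted SHA length is an assumption:

```typescript
// Hypothetical helper; the repository names follow the example above.
interface ImageRefs {
  app: string;
  migrator: string;
}

function imageRefs(owner: string, commitSha: string): ImageRefs {
  // Assumption: abbreviated or full lowercase hex SHAs (6 to 40 characters).
  if (!/^[0-9a-f]{6,40}$/.test(commitSha)) {
    throw new Error(`not a commit SHA: ${commitSha}`);
  }
  const tag = `sha-${commitSha}`;
  return {
    app: `ghcr.io/${owner}/capakraken-app:${tag}`,
    migrator: `ghcr.io/${owner}/capakraken-migrator:${tag}`,
  };
}
```

Rejecting anything that is not a SHA keeps mutable tags such as `latest` out of the deploy path.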
+
+### 3. Staging Deploy
+
+The staging workflow [deploy-staging.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/deploy-staging.yml) is intended to:
+
+1. connect to the staging host over SSH
+2. copy the deploy assets
+3. export `APP_IMAGE` and `MIGRATOR_IMAGE`
+4. run [deploy-compose.sh](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/deploy-compose.sh)
+
+The compose file used for this target flow is [docker-compose.cicd.yml](/home/hartmut/Documents/Copilot/capakraken/docker-compose.cicd.yml).
+
+### 4. Production Promotion
+
+The production workflow [deploy-prod.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/deploy-prod.yml) follows the same logic as staging, but the image tag is promoted manually.
+
+That means production runs an image that was already built and, ideally, already exercised in staging.
+
+## Required Infrastructure
+
+### Minimum
+
+- GitHub repository with Actions enabled
+- GHCR or another container registry
+- 1 Linux host with Docker and Docker Compose
+- PostgreSQL
+- Redis
+- reverse proxy such as nginx
+- SSH access from GitHub Actions to the host
+
+### Recommended
+
+- separate staging and production hosts
+- GitHub Environments for `staging` and `production`
+- required reviewer approval for `production`
+- backup strategy for PostgreSQL volumes
+- uptime monitoring and error tracking
+
+## Secrets
+
+### GitHub Environment Secrets
+
+For `staging`:
+
+- `STAGING_SSH_HOST`
+- `STAGING_SSH_PORT`
+- `STAGING_SSH_USER`
+- `STAGING_SSH_KEY`
+- `STAGING_DEPLOY_PATH`
+- `STAGING_APP_HOST_PORT`
+- `STAGING_GHCR_USERNAME`
+- `STAGING_GHCR_TOKEN`
+
+For `production`:
+
+- `PROD_SSH_HOST`
+- `PROD_SSH_PORT`
+- `PROD_SSH_USER`
+- `PROD_SSH_KEY`
+- `PROD_DEPLOY_PATH`
+- `PROD_APP_HOST_PORT`
+- `PROD_GHCR_USERNAME`
+- `PROD_GHCR_TOKEN`
+
+### Host-side Files
+
+Each target host should already have:
+
+- `.env.production`
+- Docker installed
+- network access to the container registry
+
+The repository now also contains a small host example at [tooling/deploy/.env.production.example](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/.env.production.example) and an operator note at [tooling/deploy/README.md](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/README.md).
+
+### Minimum Host Bootstrap
+
+For each target host, create a dedicated deploy directory such as `/opt/capakraken` and place these files there:
+
+```text
+docker-compose.cicd.yml
+.env.production
+tooling/deploy/deploy-compose.sh
+```
+
+`.env.production` should hold the long-lived runtime settings, including:
+
+```env
+POSTGRES_PASSWORD=<long-random-password>
+NEXTAUTH_URL=https://capakraken.example.com
+NEXTAUTH_SECRET=<long-random-secret>
+```
+
+GitHub Actions only injects the short-lived image references through `deploy.env`. The deploy script then loads both files before calling Docker Compose, so compose interpolation and container runtime env use the same source of truth.
+
+## Database Policy
+
+For release environments, use:
+
+```bash
+pnpm --filter @capakraken/db db:migrate:deploy
+```
+
+Do not use `db:push` as the main production deployment mechanism. `db:push` is convenient for local development, but it does not give the release traceability that a migration-based deploy requires.
+
+## Rollback Model
+
+Rollback should be image-based:
+
+1. choose the previous good `sha-...` tag
+2. run the production deploy workflow again with that tag
+3. confirm readiness
+
+This is only safe when schema changes follow backwards-compatible expand and contract rules.
+
+## How A Production Update Works
+
+The intended production update path is:
+
+1. merge to `main` after the existing CI workflow is green
+2. run [release-image.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/release-image.yml) to build immutable `app` and `migrator` images tagged as `sha-<commit>`
+3. run [deploy-staging.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/deploy-staging.yml) with that exact image tag
+4. GitHub Actions uploads the deploy bundle to the staging host and writes a temporary `deploy.env`
+5. [deploy-compose.sh](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/deploy-compose.sh) pulls images, starts PostgreSQL and Redis, runs Prisma deploy migrations, starts the new app container, and waits for `GET /api/ready`
+6. after staging is accepted, run [deploy-prod.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/deploy-prod.yml) with the same tag
+7. production repeats the same image-based flow, so the running artifact matches staging
+
+That means the production host no longer builds from Git. It only receives a versioned image and starts it after migrations complete.
+
+## Current Status
+
+The repository now contains the CI/CD scaffolding, but the existing manual production setup remains untouched:
+
+- current manual compose flow: [docker-compose.prod.yml](/home/hartmut/Documents/Copilot/capakraken/docker-compose.prod.yml)
+- current manual runbook: [ci-cd-manual.md](/home/hartmut/Documents/Copilot/capakraken/docs/ci-cd-manual.md)
+
+This allows the team to introduce the new path gradually instead of switching production in one step.
@@ -230,7 +230,7 @@ Estimated effort: medium

 Proposal:

- Avoid loading large supporting datasets such as `resource.list({ limit: 500 })` or “all projects” queries when only label resolution for selected IDs is required.
+- Avoid loading large supporting datasets such as `resource.listStaff({ limit: 500 })` or “all projects” queries when only label resolution for selected IDs is required; prefer `resource.directory` for lightweight label resolution.
 - Add lightweight lookup endpoints like `getByIds()` or `resolveLabels()`.
 - On the resources page, fetch chargeability/stat enrichments only for the visible page or current filtered result slice.

@@ -0,0 +1,122 @@
+# Route Access Matrix
+
+**Date:** 2026-03-29
+**Purpose:** Explicit interim audience model for sensitive API read routes while the broader least-privilege refactor is still in progress.
+
+## Decision Rules
+
+- `protectedProcedure`: only for clearly personal or low-sensitivity reads.
+- `controllerProcedure`: planning, financial, staffing, or portfolio-wide analytics that should only be visible to `CONTROLLER`, `MANAGER`, or `ADMIN`.
+- Ownership checks: self-service routes stay user-accessible, but only for the caller's own linked resource unless elevated staff access applies.
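
The first two rules can be read as a small lookup from classification to allowed roles. This sketch only models the decision, not the actual procedure builders; the `USER` role and the names `Classification` and `canRead` are assumptions:

```typescript
// Hypothetical model of the decision rules; not the real procedure middleware.
type Role = "USER" | "CONTROLLER" | "MANAGER" | "ADMIN";
type Classification = "protected" | "controller";

// Roles named in the rule for `controllerProcedure`.
const CONTROLLER_ROLES: ReadonlySet<Role> = new Set<Role>([
  "CONTROLLER",
  "MANAGER",
  "ADMIN",
]);

function canRead(role: Role, classification: Classification): boolean {
  if (classification === "controller") {
    return CONTROLLER_ROLES.has(role);
  }
  // "protected": any authenticated caller may read.
  return true;
}
```

Keeping the mapping in one table makes it straightforward to assert in tests that a route's declared classification still matches this matrix.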
+
+## Applied In This Pass
+
+### Dashboard
+
+All routes in [dashboard.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/dashboard.ts) are treated as portfolio analytics and now require `controllerProcedure`.
+
+| Route | Classification | Reason |
+| --- | --- | --- |
+| `getOverview` | `controllerProcedure` | exposes global resource/project counts and budget context |
+| `getPeakTimes` | `controllerProcedure` | exposes portfolio-wide demand/utilization peaks |
+| `getTopValueResources` | `controllerProcedure` | exposes ranked value/cost-related resource data |
+| `getDemand` | `controllerProcedure` | exposes staffing demand by project/person/chapter |
+| `getDetail` | `controllerProcedure` | aggregates the above into assistant-facing strategic detail |
+| `getChargeabilityOverview` | `controllerProcedure` | was already correctly scoped before this pass |
+| `getBudgetForecast` | `controllerProcedure` | exposes budget burn and exhaustion projections |
+| `getSkillGaps` | `controllerProcedure` | exposes org-wide capability shortfalls |
+| `getSkillGapSummary` | `controllerProcedure` | summary variant of strategic skill analytics |
+| `getProjectHealth` | `controllerProcedure` | exposes portfolio-level delivery risk indicators |
+
+### Vacation
+
+Routes in [vacation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/vacation.ts) now distinguish between self-service and staff oversight.
+
+| Route | Classification | Reason |
+| --- | --- | --- |
+| `previewRequest` | self-service | personal validation before request creation |
+| `create` | self-service with ownership check | users may request only for their own resource |
+| `cancel` | self-service with ownership check | users may cancel only their own requests |
+| `list` | self-service scoped to own resource, or full staff view for manager/admin | broad vacation visibility is sensitive absence data |
+| `getById` | self-service scoped to own vacation, or full staff view for manager/admin | direct object lookup should not bypass ownership |
+| `getForResource` | self-service scoped to own resource, or full staff view for manager/admin | calculator support should not expose foreign absence data |
+| `getTeamOverlap` | self-service scoped to own resource, or full staff view for manager/admin | overlap warnings are valid, but only for the caller's team context |
+| `getPendingApprovals` | manager/admin | approval queue is supervisory data |
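As a rough illustration of the "self-service scoped to own resource, or full staff view for manager/admin" classification, the following sketch derives the effective filter for a vacation list query. `scopeVacationList`, `VacationCaller`, and `ScopedFilter` are hypothetical names; the actual logic in `vacation.ts` is not shown here.

```typescript
type Role = "USER" | "VIEWER" | "CONTROLLER" | "MANAGER" | "ADMIN";

interface VacationCaller {
  role: Role;
  linkedResourceId: string | null;
}

interface VacationListInput {
  resourceId?: string; // filter requested by the client, if any
}

// The filter the query should actually run with after scoping.
type ScopedFilter =
  | { kind: "all" }
  | { kind: "resource"; resourceId: string }
  | { kind: "none" };

function scopeVacationList(
  caller: VacationCaller,
  input: VacationListInput,
): ScopedFilter {
  // The table above names manager/admin for the full staff view.
  const isStaff = caller.role === "MANAGER" || caller.role === "ADMIN";
  if (isStaff) {
    return input.resourceId !== undefined
      ? { kind: "resource", resourceId: input.resourceId }
      : { kind: "all" };
  }
  // Regular callers: any requested filter is replaced by their own resource.
  if (caller.linkedResourceId === null) {
    return { kind: "none" };
  }
  return { kind: "resource", resourceId: caller.linkedResourceId };
}
```

In this sketch a request for a foreign resource is narrowed silently rather than rejected; whether the real routes narrow or raise an error is not stated in this document.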
+
+### Resource
+
+Access scoping for routes in [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts) remains partially deferred, but the clearest sensitive reads are now explicitly scoped.
+
+| Route | Classification | Reason |
+| --- | --- | --- |
+| `directory` | authenticated safe directory | dedicated low-sensitivity directory for generic pickers, filters, calendars, and lookups; returns only `id`, `eid`, `displayName`, `chapter`, and `isActive`, limits search to name/EID, and preserves anonymization behavior |
+| `getMyResource` | self-service | explicit route for the caller's linked resource |
+| `getChargeabilitySummary` | self-service scoped to own resource, or staff with `VIEW_ALL_RESOURCES` | exposes detailed capacity, holiday, assignment, and target data for an individual resource |
+| `getValueScores` | explicit permission gate `VIEW_SCORES` | ranked score output should not depend on ad hoc session-role strings |
+| `getById` | self-service scoped to own resource, or staff with `VIEW_ALL_RESOURCES` | full resource detail page includes person-level operational and cost context |
+| `getByEid` | self-service scoped to own resource, or staff with `VIEW_ALL_RESOURCES` | direct identifier lookup should not bypass ownership |
+| `getHoverCard` | self-service scoped to own resource, or staff with `VIEW_ALL_RESOURCES` | hover card exposes rates, role, skills, and staffing targets |
+| `getByIdentifier` | exact self lookup for regular users; broad lookup for staff with `VIEW_ALL_RESOURCES` | lightweight identifier read returns only identity-safe fields (`id`, `eid`, `displayName`, `chapter`, `isActive`) |
+| `getByIdentifierDetail` | exact self lookup for regular users; broad lookup for staff with `VIEW_ALL_RESOURCES` | explicit detail route for assistant and UI flows that truly need rates, targets, org placement, and skill/count context |
+| `resolveByIdentifier` | exact self lookup for regular users; broad lookup for staff with `VIEW_ALL_RESOURCES` | minimal identity resolver used by tool chains to convert free-form names/EIDs into canonical IDs without leaking cost or location detail |
+| `listSummaries` | staff with `VIEW_ALL_RESOURCES` | staff-only org search that returns non-financial summary cards for discovery and candidate selection |
+| `listSummariesDetail` | staff with `VIEW_ALL_RESOURCES` | explicit richer search variant for assistant/staff workflows that need FTE, LCR, and chargeability context |
+| `listStaff` | staff with `VIEW_ALL_RESOURCES` | explicit staff-only list for cost-aware, role-aware, and estimate/planning workflows; supports email search, rates, roles, and dynamic field filters |
+
+### Resource Directory Split
+
+This pass introduces an explicit audience split in [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts):
+
+- `resource.directory` is the default route for generic UI selectors and org-directory style lookups.
+- `resource.listStaff` is the explicit staff-only route for estimate, staffing, and scenario-planning screens that need cost-sensitive resource data.
+
+The following web consumers now use `resource.directory`:
+
+- generic resource comboboxes and assignment pickers
+- vacation filters and team calendar selectors
+- timeline quick filters, toolbar lookup, and project panel add-member search
+- project responsible-person picker
+- computation graph resource selector
+- batch skill import resource matching
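The directory split can be pictured as a projection from a richer staff record down to the five identity-safe fields named above, with search restricted to name and EID. This is a sketch under assumptions: `StaffResource`, its `email` and `lcrCents` fields, and `searchDirectory` are hypothetical, and the anonymization behavior mentioned in the table is not modeled.

```typescript
// Hypothetical staff-level record; field names beyond the five directory
// fields are assumptions for illustration.
interface StaffResource {
  id: string;
  eid: string;
  displayName: string;
  chapter: string | null;
  isActive: boolean;
  email: string;
  lcrCents: number;
}

// The only fields the document says `resource.directory` returns.
interface DirectoryEntry {
  id: string;
  eid: string;
  displayName: string;
  chapter: string | null;
  isActive: boolean;
}

// Copy the safe fields explicitly so new sensitive fields never leak by default.
function toDirectoryEntry(resource: StaffResource): DirectoryEntry {
  return {
    id: resource.id,
    eid: resource.eid,
    displayName: resource.displayName,
    chapter: resource.chapter,
    isActive: resource.isActive,
  };
}

// Case-insensitive search limited to name and EID, never email or rates.
function searchDirectory(
  resources: StaffResource[],
  query: string,
): DirectoryEntry[] {
  const needle = query.toLowerCase();
  return resources
    .filter(
      (r) =>
        r.displayName.toLowerCase().indexOf(needle) !== -1 ||
        r.eid.toLowerCase().indexOf(needle) !== -1,
    )
    .map(toDirectoryEntry);
}
```

An explicit allow-list projection is used here instead of deleting sensitive keys, so that a field added to the staff record later stays out of the directory unless someone opts it in.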
+
+### Project
+
+Routes in [project.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/project.ts) now distinguish between lightweight project discovery and planning/commercial detail.
+
+| Route | Classification | Reason |
+| --- | --- | --- |
+| `resolveByIdentifier` | authenticated safe resolver | minimal project identity lookup for names/codes/IDs without commercial detail |
+| `searchSummaries` | authenticated safe summary search | lightweight project discovery returns only code, name, status, dates, and client |
+| `searchSummariesDetail` | `controllerProcedure` | exposes budget, win probability, and staffing/estimate counts |
+| `getByIdentifier` | authenticated safe identifier read | exact/fuzzy lookup returns only identity-safe project fields |
+| `getByIdentifierDetail` | `controllerProcedure` | exposes commercial and staffing detail, including budget, responsible person, category, and top allocations |
+| `list` | `controllerProcedure` | broad project listing can expose commercial/custom-field planning context |
+| `getById` | `controllerProcedure` | full project read model includes allocations, demands, and assignments |
+| `getShoringRatio` | `controllerProcedure` | derived staffing geography analytics should not be generally user-visible |
+
+### Timeline
+
+Routes in [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts) now split personal self-service reads from broad planning views.
+
+| Route | Classification | Reason |
+| --- | --- | --- |
+| `getEntries` | `controllerProcedure` | returns broad staffing allocations across projects/resources for a time window |
+| `getEntriesView` | `controllerProcedure` | exposes the full timeline read model, including demands and assignments |
+| `getHolidayOverlays` | `controllerProcedure` | org-wide absence overlays reveal staffing availability context |
+| `getMyEntriesView` | self-service scoped to own linked resource | personal timeline view for `USER`/`VIEWER`; ignores any requested scoping to other resources and never broadens beyond the caller's linked resource |
+| `getMyHolidayOverlays` | self-service scoped to own linked resource | personal holiday overlays for the caller's own timeline window without org-wide absence visibility |
+| `getEntriesDetail` | `controllerProcedure` | assistant-facing planning detail aggregates allocations, demands, assignments, and holiday summaries |
+| `getHolidayOverlayDetail` | `controllerProcedure` | detailed overlay summaries are planning-sensitive absence context |
+| `getProjectContext` | `controllerProcedure` | project-side planning context includes all allocations and cross-resource context |
+| `getProjectContextDetail` | `controllerProcedure` | detailed project timeline context exposes conflict and overlap analysis |
+| `previewShift` | `controllerProcedure` | shift preview computes operational and budget impacts before mutation |
+| `getShiftPreviewDetail` | `controllerProcedure` | detail variant includes project metadata plus cost/conflict preview |
+| `getBudgetStatus` | `controllerProcedure` | budget burn/remaining exposure is commercial data |
+
+## Review Standard
+
+- Any new sensitive read route must document one of:
+  - personal self-service ownership
+  - explicit role gate
+  - explicit permission gate
+- Any route returning portfolio-wide financial, staffing, scheduling, or HR absence data should not default to plain `protectedProcedure`.
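One way to make the review standard checkable is a small lint over a route inventory. The sketch below assumes a hypothetical `RouteDoc` record maintained alongside this matrix; no such inventory or check is described in the source, and the route `example.newReport` is invented purely for the demonstration.

```typescript
// The three acceptable justifications listed in the review standard.
type Gate = "ownership" | "role" | "permission";

// Hypothetical inventory entry for a read route.
interface RouteDoc {
  name: string;
  sensitive: boolean;
  gate: Gate | null; // null means no gate has been documented yet
}

// Return the names of sensitive routes that document none of the three gates.
function findUndocumentedSensitiveRoutes(routes: RouteDoc[]): string[] {
  const missing: string[] = [];
  for (const route of routes) {
    if (route.sensitive && route.gate === null) {
      missing.push(route.name);
    }
  }
  return missing;
}

// Example inventory: three routes from the matrix plus one invented route.
const inventory: RouteDoc[] = [
  { name: "timeline.getBudgetStatus", sensitive: true, gate: "role" },
  { name: "vacation.cancel", sensitive: true, gate: "ownership" },
  { name: "resource.getValueScores", sensitive: true, gate: "permission" },
  { name: "example.newReport", sensitive: true, gate: null },
];

const undocumented = findUndocumentedSensitiveRoutes(inventory);
```

With this inventory, `undocumented` contains only `example.newReport`, which is the kind of route a reviewer would send back for classification.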