docs(architecture): refresh hardening status

2026-03-30 18:56:53 +02:00
parent dd71e8f80b
commit 4f5d410b94
3 changed files with 67 additions and 42 deletions
@@ -1,6 +1,6 @@
 # AI Excellence Due Diligence And Roadmap
-**Date:** 2026-03-29
+**Date:** 2026-03-30
 **Purpose:** Frank assessment of the current codebase plus a pragmatic roadmap to turn CapaKraken into a reference project for disciplined AI-assisted software engineering.
 ## Executive Summary
@@ -11,9 +11,9 @@ At the same time, the codebase still carries several risks that are typical of f
 1. some critical cross-cutting concerns are only partially productized
 2. several files and routers have grown beyond comfortable ownership size
-3. broad read access and global real-time fan-out still leak too much internal state
+3. runtime configuration and secret handling are still too application-database centric
-4. spreadsheet import parsing remains a security and reliability weak point
+4. the current operational model is improving, but not yet fully standardized
-5. the current operational model is improving, but not yet fully standardized
+5. production-grade multi-instance safeguards are not complete yet
 The project feels strong enough to build on, but it is not yet a showcase of "how AI-built software should look" without another cleanup and hardening pass.
@@ -25,32 +25,36 @@ The project feels strong enough to build on, but it is not yet a showcase of "ho
 - Application-layer use cases exist and are not just thin router wrappers.
 - Documentation coverage is materially better than average for a fast-moving product.
 ## Status Update Since Initial Review
 The highest-risk quick wins from the original review are now closed:
 - SSE delivery is now audience-scoped with architecture guardrails in CI
 - browser-side spreadsheet parsing now has focused regression coverage in `apps/web`
 - the route access matrix is in place and the ready-now audience-hardening slices were completed
 - comment visibility is now entity-scoped across API policy, assistant metadata, web consumers, and mention autocomplete
 ## Due Diligence Findings
 ### Critical
-1. Real-time SSE delivery is still global instead of audience-scoped.
+No currently open item in this review remains in the earlier "critical quick fix" class.
-   Evidence: [route.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/app/api/sse/timeline/route.ts) subscribes any authenticated user to the same bus, and [event-bus.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/sse/event-bus.ts) maintains one global subscriber set and broadcasts events without per-user or per-role filtering.
+The previously critical SSE and browser parser coverage issues were addressed during the hardening batch.
   Risk: internal planning, vacation, budget, task, and notification metadata can be over-shared to authenticated users who should not see global changes.
 2. Untrusted spreadsheet parsing still depends on `xlsx@0.18.5`.
   Evidence: import parsing remains in [read-workbook.ts](/home/hartmut/Documents/Copilot/capakraken/packages/application/src/use-cases/dispo-import/read-workbook.ts), browser-side parsing remains in [excel.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/lib/excel.ts) and [skillMatrixParser.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/lib/skillMatrixParser.ts), and the package is still declared in [apps/web/package.json](/home/hartmut/Documents/Copilot/capakraken/apps/web/package.json) and [packages/application/package.json](/home/hartmut/Documents/Copilot/capakraken/packages/application/package.json).
   Risk: file-import attack surface remains higher than acceptable for a flagship reference implementation.
 ### High
-1. Several high-sensitivity read paths are still too broad for least-privilege.
+1. Router and UI module size is now an operational risk.
   Evidence: multiple planning, resource, project, dashboard, allocation, and timeline reads still use `protectedProcedure` rather than narrower role-specific gates in [dashboard.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/dashboard.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), and [project.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/project.ts).
   Risk: authorization intent remains hard to reason about and easy to regress.
 2. Router and UI module size is now an operational risk.
   Evidence: [assistant-tools.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/assistant-tools.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [vacation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/vacation.ts), and large frontend files such as [SystemSettingsClient.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/admin/SystemSettingsClient.tsx) and [TimelineProjectPanel.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/timeline/TimelineProjectPanel.tsx) are each well past the size where safe ownership stays easy.
   Risk: AI-generated changes become harder to review, humans lose local reasoning context, and regressions become more likely.
-3. Secret handling is still application-database centric.
+2. Secret handling is still application-database centric.
   Evidence: system settings mutate and persist API keys and SMTP credentials in [settings.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/settings.ts).
   Risk: operational secrets remain too coupled to the main app data plane for a gold-standard project.
 3. Least-privilege is materially better documented now, but it still needs long-lived enforcement rather than relying mainly on one hardening batch.
   Evidence: the route audience model is now explicit in [route-access-matrix.md](/home/hartmut/Documents/Copilot/capakraken/docs/route-access-matrix.md) and backed by multiple focused auth tests, but the remaining guarantee still depends on continuing test coverage and architecture guardrails as new routes evolve.
   Risk: future feature work can slowly widen access again if the matrix and tests are not treated as an enforced contract.
 ### Medium
 1. Rate limiting is process-local and not deployment-grade.
@@ -75,9 +79,9 @@ This is materially better than a typical startup CRUD app and already has the bo
 ### Security Posture
-`6/10`
+`7/10`
-There are good foundations, but the remaining SSE, spreadsheet, and least-privilege gaps are not acceptable for a "parade example" yet.
+There are good foundations, and the most obvious real-time and comment-visibility gaps were closed, but secrets policy and long-lived least-privilege enforcement still need structural work.
 ### Maintainability
@@ -87,13 +91,13 @@ The architecture is promising, but file size, router density, and compatibility
 ### Operational Maturity
-`6.5/10`
+`7/10`
 Good CI and improving deploy discipline are in place, but production standardization still needs one more step.
 ### AI-Excellence Readiness
-`6/10`
+`7/10`
 The project already proves that AI can help build serious software fast. It does not yet prove that AI-assisted development can stay consistently clean, minimal, and audit-friendly at scale.
@@ -112,20 +116,20 @@ To be a true showcase for AI-assisted development, this repository should visibl
 ### Phase 1: Close the Dangerous Gaps
-Target window: 1 to 2 weeks
+Status: substantially completed
 Goals:
-- Replace global SSE fan-out with audience-aware channels.
+- Keep SSE audience scoping under test and CI guardrails.
-- Remove `xlsx` from untrusted import paths or isolate it behind a hardened parser boundary.
+- Keep hardened spreadsheet parser boundaries under regression coverage.
-- Create a route access matrix and reclassify broad `protectedProcedure` read endpoints.
+- Treat the route access matrix and narrowed auth slices as maintained architecture contracts.
 - Move production secrets out of regular application settings, or add an interim encrypted-secrets layer with clear migration path.
 Definition of done:
 - standard users cannot subscribe to unrelated real-time planning events
-- file import paths have documented limits and safer parsing
+- file import paths stay covered by focused regression tests
-- every sensitive router is explicitly classified by audience
+- every sensitive router remains explicitly classified by audience
 - secret storage policy is documented and enforced
 ### Phase 2: Cut Down Complexity
@@ -216,13 +220,12 @@ Artifacts to add:
 ## Suggested Order Of Execution
-1. SSE scoping
+1. secrets policy
-2. spreadsheet import hardening
+2. router/component decomposition
-3. access-matrix and authorization tightening
+3. architecture fitness checks in CI
-4. secrets policy
+4. full operational standardization
-5. router/component decomposition
+5. production-grade rate limiting
-6. architecture fitness checks in CI
+6. performance hotspot reduction
-7. full operational standardization
 ## Success Criteria For The Next 60 Days
@@ -26,6 +26,17 @@
 No queued hardening slice is currently pinned in this document.
 Reassess after the current batch so the next item reflects the then-real highest-risk gap instead of stale cleanup residue.
+## Remaining Major Themes
+The small hardening slices are effectively exhausted.
+The remaining work is now structural rather than another quick batch:
+1. secrets and runtime configuration policy
+2. oversized router and UI decomposition
+3. production-grade rate limiting
+4. canonical image-based production delivery
+5. performance hotspot reduction
 ## Working Rule
 For the next batches, prefer work in this order:
@@ -1,7 +1,7 @@
 # Audience Scoping Backlog
 **Date:** 2026-03-30
-**Purpose:** Collect the remaining audience-scoping work into a single batch backlog so the small auth-hardening slices can be finished before broader architecture work starts.
+**Purpose:** Historical record of the audience-scoping hardening batch and its exit state before larger architecture work begins.
 ## Status Snapshot
@@ -19,7 +19,10 @@
 - `project.isImageGenConfigured`, `project.isDalleConfigured`: covered as authenticated low-risk configuration checks
 - `notification` self-service and manager boundaries: auth-covered across list, unread counts, reminders, deletes, broadcasts, task creation, and assignment boundaries
 - `assistant-tools` parity metadata: descriptions and parity assertions now match narrowed router audiences for resource overview, controller-only, self-service, and manager broadcast/task tools
-- `comment` architecture phase 1: generic free-form entity comments replaced by an explicit supported-entity registry, currently limited to `estimate` with controller/manager/admin access plus entity-aware checks on list/count/create/resolve/delete
+- `comment` entity support now uses an explicit supported-entity registry with:
+  - `estimate` visibility for controller, manager, and admin
+  - `resource` visibility aligned to resource detail ownership and staff-access rules
+  - entity-scoped mention candidate lookup instead of the narrower assignment user directory
 ### Dirty Files To Avoid Mixing Into This Batch
@@ -30,7 +33,7 @@
 These files already have unrelated local edits. Audience parity work that would normally touch them should be deferred or handled through adjacent files and dedicated follow-up tests.
-## Remaining Categories
+## Final Batch Outcome
 ### Completed In This Batch
@@ -41,14 +44,17 @@ These files already have unrelated local edits. Audience parity work that would
 - `packages/api/src/router/resource.ts` -> `importSkillMatrix`
 - `packages/api/src/router/project.ts` -> `isImageGenConfigured`, `isDalleConfigured`
-### No Further Small Slices Currently Ready
+### No Further Small Slices Remain In This Batch
-- the previously identified small hardening and tests/docs candidates have been completed, including the notification auth follow-up and assistant tool parity metadata cleanup
+- the previously identified small hardening and tests/docs candidates were completed, including the notification auth follow-up and assistant tool parity metadata cleanup
-- the remaining audience work is now architectural (`comment.ts`) or depends on broader policy decisions rather than another ready-made auth slice
+- the formerly architectural `comment` follow-up is also completed through explicit entity onboarding and mention-audience alignment
 - no additional audience-scoping slice remains that is both small and isolated enough to justify another batch before larger architecture work
-## Recommended Next Order
+## Next Major Themes
-1. extend the comment entity registry only when a second real consumer exists and its backing audience is explicitly documented
+1. convert the still-open runtime secret model away from application-database centric storage
 2. add broader authorization regression coverage and long-lived guardrails around the narrowed route audiences
 3. reduce oversized routers and UI ownership surfaces so audience rules stay reviewable
 ## Slice Definition
@@ -67,3 +73,8 @@ Each “ready now” slice should follow the same template:
 - every formerly `ready now` route now has router-level authorization coverage or explicit low-risk documentation
 - the access matrix documents all low-risk exceptions explicitly
 - larger architecture work starts only after this batch is either completed or intentionally deferred
+Status:
+- this batch is complete
+- keep this file as a historical artifact, not as an active backlog