docs(architecture): refresh hardening status

2026-03-30 18:56:53 +02:00
parent dd71e8f80b
commit 4f5d410b94
3 changed files with 67 additions and 42 deletions
@@ -1,6 +1,6 @@
 # AI Excellence Due Diligence And Roadmap

-**Date:** 2026-03-29
+**Date:** 2026-03-30
 **Purpose:** Frank assessment of the current codebase plus a pragmatic roadmap to turn CapaKraken into a reference project for disciplined AI-assisted software engineering.

 ## Executive Summary
@@ -11,9 +11,9 @@ At the same time, the codebase still carries several risks that are typical of f

 1. some critical cross-cutting concerns are only partially productized
 2. several files and routers have grown beyond comfortable ownership size
-3. broad read access and global real-time fan-out still leak too much internal state
-4. spreadsheet import parsing remains a security and reliability weak point
-5. the current operational model is improving, but not yet fully standardized
+3. runtime configuration and secret handling are still too application-database-centric
+4. the current operational model is improving, but not yet fully standardized
+5. production-grade multi-instance safeguards are not complete yet

 The project feels strong enough to build on, but it is not yet a showcase of "how AI-built software should look" without another cleanup and hardening pass.

@@ -25,32 +25,36 @@ The project feels strong enough to build on, but it is not yet a showcase of "ho
 - Application-layer use cases exist and are not just thin router wrappers.
 - Documentation coverage is materially better than average for a fast-moving product.

+## Status Update Since Initial Review
+
+The highest-risk quick wins from the original review are now closed:
+
+- SSE delivery is now audience-scoped with architecture guardrails in CI
+- browser-side spreadsheet parsing now has focused regression coverage in `apps/web`
+- the route access matrix is in place and the ready-now audience-hardening slices were completed
+- comment visibility is now entity-scoped across API policy, assistant metadata, web consumers, and mention autocomplete
+
 ## Due Diligence Findings

 ### Critical

-1. Real-time SSE delivery is still global instead of audience-scoped.
-   Evidence: [route.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/app/api/sse/timeline/route.ts) subscribes any authenticated user to the same bus, and [event-bus.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/sse/event-bus.ts) maintains one global subscriber set and broadcasts events without per-user or per-role filtering.
-   Risk: internal planning, vacation, budget, task, and notification metadata can be over-shared to authenticated users who should not see global changes.
-
-2. Untrusted spreadsheet parsing still depends on `xlsx@0.18.5`.
-   Evidence: import parsing remains in [read-workbook.ts](/home/hartmut/Documents/Copilot/capakraken/packages/application/src/use-cases/dispo-import/read-workbook.ts), browser-side parsing remains in [excel.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/lib/excel.ts) and [skillMatrixParser.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/lib/skillMatrixParser.ts), and the package is still declared in [apps/web/package.json](/home/hartmut/Documents/Copilot/capakraken/apps/web/package.json) and [packages/application/package.json](/home/hartmut/Documents/Copilot/capakraken/packages/application/package.json).
-   Risk: file-import attack surface remains higher than acceptable for a flagship reference implementation.
+No item still open in this review belongs to the earlier "critical quick fix" class.
+The previously critical SSE and browser parser coverage issues were addressed during the hardening batch.

 ### High

-1. Several high-sensitivity read paths are still too broad for least-privilege.
-   Evidence: multiple planning, resource, project, dashboard, allocation, and timeline reads still use `protectedProcedure` rather than narrower role-specific gates in [dashboard.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/dashboard.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), and [project.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/project.ts).
-   Risk: authorization intent remains hard to reason about and easy to regress.
-
-2. Router and UI module size is now an operational risk.
+1. Router and UI module size is now an operational risk.
   Evidence: [assistant-tools.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/assistant-tools.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [vacation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/vacation.ts), and large frontend files such as [SystemSettingsClient.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/admin/SystemSettingsClient.tsx) and [TimelineProjectPanel.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/timeline/TimelineProjectPanel.tsx) are each well past the size where safe ownership stays easy.
   Risk: AI-generated changes become harder to review, humans lose local reasoning context, and regressions become more likely.

-3. Secret handling is still application-database centric.
+2. Secret handling is still application-database-centric.
   Evidence: system settings mutate and persist API keys and SMTP credentials in [settings.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/settings.ts).
   Risk: operational secrets remain too coupled to the main app data plane for a gold-standard project.

+3. Least-privilege access is materially better documented now, but it still needs long-lived enforcement rather than reliance on a single hardening batch.
+   Evidence: the route audience model is now explicit in [route-access-matrix.md](/home/hartmut/Documents/Copilot/capakraken/docs/route-access-matrix.md) and backed by multiple focused auth tests, but the remaining guarantee still depends on continuing test coverage and architecture guardrails as new routes evolve.
+   Risk: future feature work can slowly widen access again if the matrix and tests are not treated as an enforced contract.
+
 ### Medium

 1. Rate limiting is process-local and not deployment-grade.
@@ -75,9 +79,9 @@ This is materially better than a typical startup CRUD app and already has the bo

 ### Security Posture

-`6/10`
+`7/10`

-There are good foundations, but the remaining SSE, spreadsheet, and least-privilege gaps are not acceptable for a "parade example" yet.
+There are good foundations, and the most obvious real-time and comment-visibility gaps were closed, but secrets policy and long-lived least-privilege enforcement still need structural work.

 ### Maintainability

@@ -87,13 +91,13 @@ The architecture is promising, but file size, router density, and compatibility

 ### Operational Maturity

-`6.5/10`
+`7/10`

 Good CI and improving deploy discipline are in place, but production standardization still needs one more step.

 ### AI-Excellence Readiness

-`6/10`
+`7/10`

 The project already proves that AI can help build serious software fast. It does not yet prove that AI-assisted development can stay consistently clean, minimal, and audit-friendly at scale.

@@ -112,20 +116,20 @@ To be a true showcase for AI-assisted development, this repository should visibl

 ### Phase 1: Close the Dangerous Gaps

-Target window: 1 to 2 weeks
+Status: substantially completed

 Goals:

- Replace global SSE fan-out with audience-aware channels.
- Remove `xlsx` from untrusted import paths or isolate it behind a hardened parser boundary.
- Create a route access matrix and reclassify broad `protectedProcedure` read endpoints.
+- Keep SSE audience scoping under test and CI guardrails.
+- Keep hardened spreadsheet parser boundaries under regression coverage.
+- Treat the route access matrix and narrowed auth slices as maintained architecture contracts.
 - Move production secrets out of regular application settings, or add an interim encrypted-secrets layer with clear migration path.

 Definition of done:

 - standard users cannot subscribe to unrelated real-time planning events
- file import paths have documented limits and safer parsing
- every sensitive router is explicitly classified by audience
+- file import paths stay covered by focused regression tests
+- every sensitive router remains explicitly classified by audience
 - secret storage policy is documented and enforced

 ### Phase 2: Cut Down Complexity
@@ -216,13 +220,12 @@ Artifacts to add:

 ## Suggested Order Of Execution

-1. SSE scoping
-2. spreadsheet import hardening
-3. access-matrix and authorization tightening
-4. secrets policy
-5. router/component decomposition
-6. architecture fitness checks in CI
-7. full operational standardization
+1. secrets policy
+2. router/component decomposition
+3. architecture fitness checks in CI
+4. full operational standardization
+5. production-grade rate limiting
+6. performance hotspot reduction

 ## Success Criteria For The Next 60 Days

@@ -26,6 +26,17 @@
 No queued hardening slice is currently pinned in this document.
 Reassess after the current batch so the next item reflects the then-real highest-risk gap instead of stale cleanup residue.

+## Remaining Major Themes
+
+The small hardening slices are effectively exhausted.
+The remaining work is now structural rather than another quick batch:
+
+1. secrets and runtime configuration policy
+2. oversized router and UI decomposition
+3. production-grade rate limiting
+4. canonical image-based production delivery
+5. performance hotspot reduction
+
 ## Working Rule

 For the next batches, prefer work in this order:
@@ -1,7 +1,7 @@
 # Audience Scoping Backlog

 **Date:** 2026-03-30
-**Purpose:** Collect the remaining audience-scoping work into a single batch backlog so the small auth-hardening slices can be finished before broader architecture work starts.
+**Purpose:** Historical record of the audience-scoping hardening batch and its exit state before larger architecture work begins.

 ## Status Snapshot

@@ -19,7 +19,10 @@
 - `project.isImageGenConfigured`, `project.isDalleConfigured`: covered as authenticated low-risk configuration checks
 - `notification` self-service and manager boundaries: auth-covered across list, unread counts, reminders, deletes, broadcasts, task creation, and assignment boundaries
 - `assistant-tools` parity metadata: descriptions and parity assertions now match narrowed router audiences for resource overview, controller-only, self-service, and manager broadcast/task tools
- `comment` architecture phase 1: generic free-form entity comments replaced by an explicit supported-entity registry, currently limited to `estimate` with controller/manager/admin access plus entity-aware checks on list/count/create/resolve/delete
+- `comment` entity support now uses an explicit supported-entity registry with:
+  - `estimate` visibility for controller, manager, and admin
+  - `resource` visibility aligned to resource detail ownership and staff-access rules
+  - entity-scoped mention candidate lookup instead of the narrower assignment user directory

 ### Dirty Files To Avoid Mixing Into This Batch

@@ -30,7 +33,7 @@

 These files already have unrelated local edits. Audience parity work that would normally touch them should be deferred or handled through adjacent files and dedicated follow-up tests.

-## Remaining Categories
+## Final Batch Outcome

 ### Completed In This Batch

@@ -41,14 +44,17 @@ These files already have unrelated local edits. Audience parity work that would
 - `packages/api/src/router/resource.ts` -> `importSkillMatrix`
 - `packages/api/src/router/project.ts` -> `isImageGenConfigured`, `isDalleConfigured`

-### No Further Small Slices Currently Ready
+### No Further Small Slices Remain In This Batch

- the previously identified small hardening and tests/docs candidates have been completed, including the notification auth follow-up and assistant tool parity metadata cleanup
- the remaining audience work is now architectural (`comment.ts`) or depends on broader policy decisions rather than another ready-made auth slice
+- the previously identified small hardening and tests/docs candidates were completed, including the notification auth follow-up and assistant tool parity metadata cleanup
+- the formerly architectural `comment` follow-up is also completed through explicit entity onboarding and mention-audience alignment
+- no additional audience-scoping slice remains that is both small and isolated enough to justify another batch before larger architecture work

-## Recommended Next Order
+## Next Major Themes

-1. extend the comment entity registry only when a second real consumer exists and its backing audience is explicitly documented
+1. move the still-open runtime secret model away from application-database-centric storage
+2. add broader authorization regression coverage and long-lived guardrails around the narrowed route audiences
+3. reduce oversized routers and UI ownership surfaces so audience rules stay reviewable

 ## Slice Definition

@@ -67,3 +73,8 @@ Each “ready now” slice should follow the same template:
 - every formerly `ready now` route now has router-level authorization coverage or explicit low-risk documentation
 - the access matrix documents all low-risk exceptions explicitly
 - larger architecture work starts only after this batch is either completed or intentionally deferred
+
+Status:
+
+- this batch is complete
+- keep this file as a historical artifact, not as an active backlog
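
For readers unfamiliar with the term, the "audience-scoped" SSE delivery that this commit records as closed can be illustrated with a minimal sketch. It is not the code in `event-bus.ts`; every name below (`AudienceScopedBus`, `Audience`, `canReceive`, the role names) is hypothetical and chosen only to show the idea of filtering per subscriber at publish time instead of broadcasting to one global subscriber set.

```typescript
// Hypothetical sketch: none of these names are taken from the CapaKraken codebase.
type Role = "user" | "controller" | "manager" | "admin";

interface Audience {
  roles?: Role[];     // roles allowed to receive the event
  userIds?: string[]; // explicit recipients, e.g. the owner of a record
}

interface BusEvent {
  type: string;
  payload: unknown;
  audience: Audience;
}

interface Subscriber {
  userId: string;
  role: Role;
  send: (event: BusEvent) => void;
}

// Deny by default: an event with no declared audience reaches nobody.
function canReceive(sub: Subscriber, audience: Audience): boolean {
  const byRole = audience.roles?.includes(sub.role) ?? false;
  const byUser = audience.userIds?.includes(sub.userId) ?? false;
  return byRole || byUser;
}

class AudienceScopedBus {
  private subscribers = new Set<Subscriber>();

  // Registers a subscriber and returns an unsubscribe function.
  subscribe(sub: Subscriber): () => void {
    this.subscribers.add(sub);
    return () => {
      this.subscribers.delete(sub);
    };
  }

  // Delivers the event only to matching subscribers; returns the delivery count.
  publish(event: BusEvent): number {
    let delivered = 0;
    for (const sub of this.subscribers) {
      if (canReceive(sub, event.audience)) {
        sub.send(event);
        delivered++;
      }
    }
    return delivered;
  }
}
```

The deny-by-default check is the design choice worth noting: a publisher that forgets to declare an audience leaks nothing, which is the kind of property a CI guardrail can assert cheaply.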