chore(repo): initialize planarchy workspace

2026-03-14 14:31:09 +01:00
commit dd55d0e78b
769 changed files with 166461 additions and 0 deletions
@@ -0,0 +1,533 @@
+# Planarchy V2 Architecture Proposal
+
+**Date:** 2026-03-11  
+**Scope:** Codebase review, v2 direction, architecture rethink, parallel agent strategy
+
+## Executive Summary
+
+Planarchy already has a good base:
+- monorepo boundaries are mostly clean
+- `engine` and `staffing` contain useful pure domain logic
+- Next.js + tRPC + Prisma keeps product iteration fast
+- Redis-backed SSE is already a reasonable realtime baseline
+
+The main issue is not the stack. The issue is that domain logic is split across:
+- large client components
+- large tRPC routers
+- JSONB-heavy persistence models
+- ad-hoc calculations in handlers
+
+My recommendation for **v2** is:
+
+1. **Do not jump to microservices yet.**
+2. **Do move to a modular monolith with a real application layer and async workers.**
+3. **Split “planning demand” from “actual assignments” at the data model level.**
+4. **Keep JSONB only for extensibility, not for core planning workflows.**
+5. **Introduce event/outbox-driven parallel agents for matching, conflicts, budget risk, notifications, and AI work.**
+
+This gives you a v2 that is safer, easier to change, and still realistic for a small team.
+
+---
+
+## What The Codebase Does Well
+
+- Domain packages are separated from the web app.
+- Shared types and schemas reduce transport mismatch.
+- Money is stored in integer cents.
+- The app stays operationally simple: one app, one DB, one Redis.
+- The timeline already has virtualization and SSE hooks, which shows the product is past the prototype stage.
+
+---
+
+## Current Pain Points
+
+## 1. Critical correctness and security issues exist today
+
+### Auth hashing is inconsistent
+- Login verifies Argon2 hashes in [`apps/web/src/server/auth.ts#L20`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/server/auth.ts#L20).
+- Admin-created users are still stored with SHA-256 in [`packages/api/src/router/user.ts#L41`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/user.ts#L41).
+- Impact: users created from the admin flow are likely unable to log in.
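+
+The mismatch goes away if one shared password module serves both the login path and admin user creation. Below is a minimal sketch. It assumes the `argon2` npm package's `hash(plain)` / `verify(hash, plain)` shape, injected as an interface so the example runs without that package. It also assumes legacy hashes are unsalted hex SHA-256 digests. `PasswordHasher` and `verifyPassword` are hypothetical names. When a legacy hash matches, the function returns an Argon2 replacement hash, so the account can be upgraded on its next login.
+
+```typescript
+import { createHash, timingSafeEqual } from "node:crypto";
+
+// Injected Argon2 adapter (hypothetical interface mirroring the argon2 package's hash/verify).
+export interface PasswordHasher {
+  hash(plain: string): Promise<string>;
+  verify(storedHash: string, plain: string): Promise<boolean>;
+}
+
+export interface VerifyResult {
+  ok: boolean;
+  // Present when a legacy hash matched and should be replaced in the database.
+  upgradedHash?: string;
+}
+
+// Assumption: legacy admin-created users were stored as unsalted hex SHA-256 digests.
+const LEGACY_SHA256 = /^[0-9a-f]{64}$/i;
+
+export async function verifyPassword(
+  plain: string,
+  storedHash: string,
+  argon: PasswordHasher,
+): Promise<VerifyResult> {
+  if (storedHash.startsWith("$argon2")) {
+    return { ok: await argon.verify(storedHash, plain) };
+  }
+  if (LEGACY_SHA256.test(storedHash)) {
+    const candidate = createHash("sha256").update(plain).digest();
+    const stored = Buffer.from(storedHash, "hex");
+    // Both buffers are 32 bytes, so timingSafeEqual cannot throw on a length mismatch.
+    if (!timingSafeEqual(candidate, stored)) return { ok: false };
+    return { ok: true, upgradedHash: await argon.hash(plain) };
+  }
+  // Unknown hash format: fail closed.
+  return { ok: false };
+}
+```
+
+Admin user creation would call `argon.hash` through the same module, so only one algorithm is written from then on.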
+
+### Notification creation is open to any authenticated user
+- `notification.create` is only `protectedProcedure` in [`packages/api/src/router/notification.ts#L66`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/notification.ts#L66).
+- Impact: any logged-in user can create notifications for arbitrary users.
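+
+A small policy check that runs before the handler would close the gap. The sketch assumes hypothetical `ADMIN` and `SYSTEM` roles and a local `ForbiddenError`. In tRPC the same check would usually live in a dedicated procedure middleware that throws a `TRPCError` with code `FORBIDDEN`.
+
+```typescript
+// Hypothetical role names; the real project may use a different enum.
+export type Role = "ADMIN" | "MANAGER" | "USER" | "SYSTEM";
+
+export interface Actor {
+  userId: string;
+  role: Role;
+}
+
+export class ForbiddenError extends Error {}
+
+// Only admins and internal system workflows may create notifications for other users.
+const NOTIFICATION_WRITERS: ReadonlySet<Role> = new Set<Role>(["ADMIN", "SYSTEM"]);
+
+export function assertCanCreateNotification(actor: Actor, recipientId: string): void {
+  if (NOTIFICATION_WRITERS.has(actor.role)) return;
+  throw new ForbiddenError(`user ${actor.userId} may not create notifications for ${recipientId}`);
+}
+```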
+
+### AI connection testing is Azure-shaped even when provider is OpenAI
+- `testAiConnection` always constructs an Azure deployment URL in [`packages/api/src/router/settings.ts#L122`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/settings.ts#L122).
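+- Impact: an OpenAI configuration is tested against an Azure-style URL, so the connection test cannot be trusted for OpenAI and the provider abstraction is not actually reliable.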
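+
+A provider-aware request builder would keep the connection test honest. This sketch assumes a discriminated `AiProviderConfig` with hypothetical field names. It uses the public endpoint shapes for each provider:
+- Azure OpenAI: deployment-scoped URLs with an `api-version` query parameter and an `api-key` header.
+- OpenAI: the `/v1/models` endpoint with a bearer token.
+
+```typescript
+export type AiProviderConfig =
+  | { provider: "openai"; apiKey: string; baseUrl?: string }
+  | {
+      provider: "azure";
+      apiKey: string;
+      endpoint: string; // e.g. https://<resource>.openai.azure.com
+      deployment: string;
+      apiVersion: string; // read from settings, not hard-coded
+    };
+
+export interface TestRequest {
+  method: "GET" | "POST";
+  url: string;
+  headers: Record<string, string>;
+  body?: string;
+}
+
+const trimSlash = (s: string): string => s.replace(/\/+$/, "");
+
+export function buildConnectionTest(config: AiProviderConfig): TestRequest {
+  switch (config.provider) {
+    case "openai":
+      // Listing models is a cheap authenticated call on the OpenAI API.
+      return {
+        method: "GET",
+        url: `${trimSlash(config.baseUrl ?? "https://api.openai.com/v1")}/models`,
+        headers: { Authorization: `Bearer ${config.apiKey}` },
+      };
+    case "azure":
+      // Azure OpenAI routes by deployment name and requires an api-version query parameter.
+      return {
+        method: "POST",
+        url:
+          `${trimSlash(config.endpoint)}/openai/deployments/${encodeURIComponent(config.deployment)}` +
+          `/chat/completions?api-version=${encodeURIComponent(config.apiVersion)}`,
+        headers: { "api-key": config.apiKey, "Content-Type": "application/json" },
+        body: JSON.stringify({ messages: [{ role: "user", content: "ping" }], max_tokens: 1 }),
+      };
+    default: {
+      // Compile-time exhaustiveness check for future providers.
+      const unreachable: never = config;
+      throw new Error(`unsupported provider: ${JSON.stringify(unreachable)}`);
+    }
+  }
+}
+```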
+
+### Repo health checks are currently failing
+- `pnpm test:unit` fails because `@planarchy/shared` has a Vitest script but no tests in [`packages/shared/package.json`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/package.json).
+- `pnpm typecheck` fails because `crypto.randomUUID()` is used without a visible import/global typing in [`packages/shared/src/schemas/project.schema.ts#L5`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/src/schemas/project.schema.ts#L5).
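+
+The typecheck failure needs only a typed symbol for the compiler. The sketch below assumes `project.schema.ts` runs only where Node's `node:crypto` module is available. If the schema is also bundled for the browser, a better fix is to add the `DOM` lib or `@types/node` to the package's `tsconfig` so that the global `crypto` is typed. `newProjectId` is a hypothetical helper name. The unit-test failure can be fixed separately, for example with Vitest's `--passWithNoTests` flag or by adding a first test to `@planarchy/shared`.
+
+```typescript
+// Explicit import instead of relying on an untyped global `crypto`.
+import { randomUUID } from "node:crypto";
+
+// Hypothetical helper used wherever the schema needs a default id.
+export const newProjectId = (): string => randomUUID();
+```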
+
+These are not “v2 someday” items. They should be fixed before deeper refactoring.
+
+## 2. Large surfaces are carrying too much responsibility
+
+The biggest modules are already a warning sign:
+- [`apps/web/src/components/timeline/TimelineView.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/timeline/TimelineView.tsx) is 1720 lines.
+- [`apps/web/src/components/projects/ProjectWizard.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/projects/ProjectWizard.tsx) is 1171 lines.
+- [`packages/api/src/router/resource.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/resource.ts) is 908 lines.
+- [`packages/api/src/router/timeline.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts) is 631 lines.
+
+That usually means:
+- transport, orchestration, validation, business rules, and data access are mixed
+- testing becomes expensive
+- one change touches too many concerns
+
+## 3. The core planning model is overloaded
+
+The Prisma schema uses JSONB heavily in core workflows:
+- blueprints and role presets in [`packages/db/prisma/schema.prisma#L147`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L147)
+- resource availability, skills, and dynamic fields in [`packages/db/prisma/schema.prisma#L208`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L208)
+- project staffing requirements and dynamic fields in [`packages/db/prisma/schema.prisma#L267`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L267)
+- allocation metadata in [`packages/db/prisma/schema.prisma#L301`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L301)
+
+The bigger modeling problem is that **`Allocation` currently represents both demand and assignment**:
+- placeholder demand is modeled with `resourceId = null`
+- headcount is stored on the same entity
+- legacy `role` text and `roleId` coexist
+
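+This is the wrong aggregate for v2: one entity carries two lifecycles (open demand and staffed work), so every query and mutation has to branch on which one it holds.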
+
+## 4. Staffing logic is not yet trustworthy enough to become a differentiator
+
+`staffing.getSuggestions` currently:
+- loads all active resources with overlapping allocations
+- computes utilization in the router
+- uses only Monday availability as the denominator in [`packages/api/src/router/staffing.ts#L45`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/staffing.ts#L45)
+
+That means the suggestion layer is:
+- hard to scale
+- not consistent with calendar-aware engine logic
+- not a strong base for “AI-assisted staffing”
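+
+The Monday-only denominator can be replaced by summing each day's weekday capacity across the requested range. This is a minimal sketch under three assumptions:
+- Availability is a weekday-to-hours map, which is a guess at the current JSONB shape.
+- Dates are ISO `YYYY-MM-DD` strings walked in UTC.
+- Vacation overlays are left out.
+
+`capacityHours` and `utilization` are hypothetical names.
+
+```typescript
+export type Weekday = "sun" | "mon" | "tue" | "wed" | "thu" | "fri" | "sat";
+export type WeeklyAvailability = Record<Weekday, number>; // hours per weekday
+
+// Index matches Date.prototype.getUTCDay(): 0 = Sunday.
+const WEEKDAYS: readonly Weekday[] = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"];
+const DAY_MS = 24 * 60 * 60 * 1000;
+
+// Sum available hours for every day in [start, end], inclusive, using each day's own weekday.
+export function capacityHours(avail: WeeklyAvailability, start: string, end: string): number {
+  const from = Date.parse(`${start}T00:00:00Z`);
+  const to = Date.parse(`${end}T00:00:00Z`);
+  let total = 0;
+  for (let t = from; t <= to; t += DAY_MS) {
+    total += avail[WEEKDAYS[new Date(t).getUTCDay()]!];
+  }
+  return total;
+}
+
+export function utilization(bookedHours: number, avail: WeeklyAvailability, start: string, end: string): number {
+  const capacity = capacityHours(avail, start, end);
+  // Booked work with zero capacity is an overallocation, not 0% utilization.
+  if (capacity === 0) return bookedHours > 0 ? Number.POSITIVE_INFINITY : 0;
+  return bookedHours / capacity;
+}
+```
+
+Summing per day handles part-time and non-standard weeks automatically. Any formula that extrapolates from a single weekday misreports those schedules.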
+
+## 5. Routers are doing application-service work
+
+Representative examples:
+- timeline queries and update workflows live directly in [`packages/api/src/router/timeline.ts#L12`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts#L12)
+- allocation creation, placeholder fill, validation, vacation handling, cost calc, audit log, and event emission all live in [`packages/api/src/router/allocation.ts#L8`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/allocation.ts#L8)
+
+The pure `engine` package exists, but the application layer that should orchestrate it does not.
+
+---
+
+## Recommended V2 Architecture
+
+## Core Decision
+
+**V2 should be a modular monolith plus worker processes, not a microservice split.**
+
+Why:
+- the product is still changing fast
+- most failures are domain modeling and module-boundary problems, not network topology problems
+- a microservice split would increase operational cost before domain seams are stable
+
+### Target shape
+
+```text
+apps/web
+  -> UI + route handlers only
+
+packages/api
+  -> transport adapters only (tRPC procedures, auth boundary, DTO mapping)
+
+packages/application
+  -> use cases / command handlers / query handlers
+
+packages/domain-people
+packages/domain-projects
+packages/domain-demand
+packages/domain-scheduling
+packages/domain-calendar
+packages/domain-notifications
+packages/domain-ai
+  -> pure domain logic and policies
+
+packages/infrastructure
+  -> Prisma repos, Redis pub/sub, job queue, mail, AI clients
+
+workers/agents
+  -> async processors consuming outbox events and jobs
+```
+
+The key change is: **routers stop containing business workflows**. They become thin.
+
+---
+
+## Data Model Changes For V2
+
+## 1. Split demand from assignment
+
+Replace the current overloaded `Allocation` concept with:
+
+- `DemandRequirement`
+  - projectId
+  - roleId
+  - requiredSkills
+  - date range
+  - hoursPerDay
+  - headcount
+  - priority
+  - status
+
+- `Assignment`
+  - demandRequirementId (nullable during migration)
+  - resourceId
+  - projectId
+  - date range
+  - hoursPerDay
+  - cost snapshot
+  - status
+
+- `AssignmentChange` or `AssignmentRevision`
+  - audit-friendly timeline history
+  - supports undo/redo and reasoning
+
+This removes:
+- a nullable `resourceId` that encodes two different business states
+- headcount logic from real assignments
+- placeholder branching across the whole codebase
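+
+The split can be expressed directly in types. This sketch uses the field names listed above; the concrete types, status values, and integer-cent cost field are assumptions. It shows one consequence: open headcount becomes a value derived from the demand rather than a placeholder row. `openHeadcount` is a hypothetical helper.
+
+```typescript
+export type DemandStatus = "OPEN" | "PARTIALLY_FILLED" | "FILLED" | "CANCELLED";
+export type AssignmentStatus = "PROPOSED" | "CONFIRMED" | "CANCELLED";
+
+export interface DemandRequirement {
+  id: string;
+  projectId: string;
+  roleId: string;
+  requiredSkills: string[];
+  startDate: string; // ISO date
+  endDate: string;
+  hoursPerDay: number;
+  headcount: number;
+  priority: number;
+  status: DemandStatus;
+}
+
+export interface Assignment {
+  id: string;
+  demandRequirementId: string | null; // nullable only during migration
+  resourceId: string; // never null: an assignment always names a person
+  projectId: string;
+  startDate: string;
+  endDate: string;
+  hoursPerDay: number;
+  costSnapshotCents: number; // integer cents, like the rest of the money model
+  status: AssignmentStatus;
+}
+
+// Headcount still to be staffed; cancelled assignments do not count as filled.
+export function openHeadcount(demand: DemandRequirement, assignments: readonly Assignment[]): number {
+  const filled = assignments.filter(
+    (a) => a.demandRequirementId === demand.id && a.status !== "CANCELLED",
+  ).length;
+  return Math.max(0, demand.headcount - filled);
+}
+```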
+
+## 2. Normalize the skill model
+
+Today `Resource.skills` is JSONB. For v2, use:
+- `Skill`
+- `ResourceSkill`
+- optional `RoleSkillProfile`
+
+Keep JSONB only for imported raw skill matrix payloads if needed.
+
+Benefits:
+- real filtering
+- better analytics
+- reusable recommendation features
+- explainable ranking
+
+## 3. Normalize calendar capacity
+
+Today availability is template-like JSON plus vacation overlays. For v2:
+- `AvailabilityTemplate`
+- `ResourceAvailabilityOverride`
+- `CalendarException`
+- `PublicHolidayCalendar`
+
+This lets the engine answer:
+- “what is capacity on this exact date?”
+- “why is this person unavailable?”
+- “what changed after a vacation approval?”
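+
+A layered lookup answers the first two questions together. It starts from the template, lets overrides, public holidays, and calendar exceptions win in that order, and records the reason for the result. The precedence order, the map-based shapes, and `capacityOn` are assumptions made for illustration.
+
+```typescript
+export interface CapacityAnswer {
+  hours: number;
+  reason: string;
+}
+
+export interface CalendarSources {
+  templateHours: readonly number[]; // AvailabilityTemplate: hours per UTC weekday, index 0 = Sunday
+  overrides: ReadonlyMap<string, number>; // ResourceAvailabilityOverride: hours by ISO date
+  holidays: ReadonlyMap<string, string>; // PublicHolidayCalendar: holiday name by ISO date
+  exceptions: ReadonlyMap<string, { hours: number; label: string }>; // CalendarException, e.g. vacation
+}
+
+// Later layers win: template < override < public holiday < calendar exception.
+export function capacityOn(date: string, src: CalendarSources): CapacityAnswer {
+  const weekday = new Date(`${date}T00:00:00Z`).getUTCDay();
+  let answer: CapacityAnswer = { hours: src.templateHours[weekday] ?? 0, reason: "template" };
+
+  const override = src.overrides.get(date);
+  if (override !== undefined) answer = { hours: override, reason: "availability override" };
+
+  const holiday = src.holidays.get(date);
+  if (holiday !== undefined) answer = { hours: 0, reason: `public holiday: ${holiday}` };
+
+  const exception = src.exceptions.get(date);
+  if (exception !== undefined) answer = { hours: exception.hours, reason: exception.label };
+
+  return answer;
+}
+```
+
+The third question follows from the same function: compare `capacityOn` before and after the vacation exception is added.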
+
+## 4. Keep blueprints, but narrow their role
+
+Blueprints should remain for:
+- custom fields
+- UI configuration
+- optional default demand templates
+
+Blueprints should **not** remain a JSONB store for core planning state. Anything the planner queries, validates, or schedules against belongs in normalized tables.
+
+## 5. Add an outbox
+
+Introduce:
+- `DomainEventOutbox`
+- `Job`
+
+Every important mutation writes all of the following in a single transaction:
+- domain row changes
+- an audit row
+- an outbox event
+
+That is the foundation for safe parallel agents.
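+
+The consumer side of the outbox can be a small relay loop. It reads unprocessed events in order, passes each to its subscribers, and marks the event processed only after every handler succeeds. This gives at-least-once delivery, so handlers must be idempotent. The in-memory store below stands in for the `DomainEventOutbox` table; its row shape and the `relayOnce` name are assumptions.
+
+```typescript
+export interface OutboxRow {
+  id: number; // monotonically increasing insertion order
+  type: string;
+  payload: unknown;
+  processedAt: Date | null;
+  attempts: number;
+}
+
+export type Handler = (row: OutboxRow) => Promise<void>;
+
+export interface OutboxStore {
+  fetchUnprocessed(limit: number): Promise<OutboxRow[]>;
+  markProcessed(id: number, at: Date): Promise<void>;
+  recordFailure(id: number): Promise<void>;
+}
+
+// One relay pass; returns how many events were fully processed.
+export async function relayOnce(
+  store: OutboxStore,
+  handlers: ReadonlyMap<string, readonly Handler[]>,
+  batchSize = 50,
+): Promise<number> {
+  let processed = 0;
+  for (const row of await store.fetchUnprocessed(batchSize)) {
+    try {
+      for (const handle of handlers.get(row.type) ?? []) await handle(row);
+      await store.markProcessed(row.id, new Date());
+      processed += 1;
+    } catch {
+      // Leave the row unprocessed so a later pass retries it.
+      await store.recordFailure(row.id);
+    }
+  }
+  return processed;
+}
+
+// In-memory stand-in for the outbox table, for tests and local runs.
+export class MemoryOutbox implements OutboxStore {
+  rows: OutboxRow[];
+  constructor(rows: OutboxRow[] = []) {
+    this.rows = rows;
+  }
+  async fetchUnprocessed(limit: number): Promise<OutboxRow[]> {
+    return this.rows.filter((r) => r.processedAt === null).sort((a, b) => a.id - b.id).slice(0, limit);
+  }
+  async markProcessed(id: number, at: Date): Promise<void> {
+    const row = this.rows.find((r) => r.id === id);
+    if (row) row.processedAt = at;
+  }
+  async recordFailure(id: number): Promise<void> {
+    const row = this.rows.find((r) => r.id === id);
+    if (row) row.attempts += 1;
+  }
+}
+```
+
+With several worker processes, the fetch would typically claim rows with PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED`, so that two agents never process the same event concurrently.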
+
+---
+
+## Application Layer Design
+
+Every important user action should map to a use case, for example:
+
+- `CreateProject`
+- `DefineDemand`
+- `AssignResource`
+- `MoveAssignment`
+- `ApproveVacation`
+- `ImportSkillMatrix`
+- `RecomputeValueScore`
+- `GenerateAiSummary`
+
+Each use case should:
+- load aggregates via repositories
+- call pure domain policies
+- persist through a transaction
+- publish outbox events
+
+Routers then become simple wrappers:
+- validate input
+- call use case
+- map result to DTO
+
+This is the main architectural upgrade missing today.
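+
+The sketch below shows the shape of one use case and its thin handler. It makes these assumptions:
+- A `UnitOfWork` abstraction wraps a Prisma interactive transaction (`prisma.$transaction(async (tx) => ...)`).
+- The repository ports are hypothetical.
+- `assignResource` and `assignResourceHandler` are illustrative names; the event name `AssignmentCreated` comes from the Conflict Agent inputs.
+- Input validation is hand-rolled to keep the example dependency-free. In the project it would use the existing shared schemas.
+
+```typescript
+export interface Demand { id: string; projectId: string; headcount: number; hoursPerDay: number }
+export interface NewAssignment { demandId: string; resourceId: string; projectId: string; hoursPerDay: number }
+
+// Everything a use case may touch inside one transaction (hypothetical ports).
+export interface TxContext {
+  demands: { findById(id: string): Promise<Demand | null> };
+  assignments: {
+    countActiveForDemand(demandId: string): Promise<number>;
+    insert(a: NewAssignment): Promise<string>; // returns the new id
+  };
+  audit: { record(action: string, entityId: string, actorId: string): Promise<void> };
+  outbox: { publish(type: string, payload: unknown): Promise<void> };
+}
+
+export interface UnitOfWork {
+  run<T>(work: (tx: TxContext) => Promise<T>): Promise<T>;
+}
+
+export class DomainError extends Error {}
+
+// Pure domain policy: no I/O, trivially unit-testable.
+export function checkCanAssign(demand: Demand, activeCount: number): void {
+  if (activeCount >= demand.headcount) throw new DomainError(`demand ${demand.id} is fully staffed`);
+}
+
+export interface AssignResourceInput { demandId: string; resourceId: string; actorId: string }
+
+// Use case: load, apply policy, persist, audit, and publish inside one transaction.
+export function assignResource(uow: UnitOfWork, input: AssignResourceInput): Promise<{ assignmentId: string }> {
+  return uow.run(async (tx) => {
+    const demand = await tx.demands.findById(input.demandId);
+    if (!demand) throw new DomainError(`demand ${input.demandId} not found`);
+    checkCanAssign(demand, await tx.assignments.countActiveForDemand(demand.id));
+    const assignmentId = await tx.assignments.insert({
+      demandId: demand.id,
+      resourceId: input.resourceId,
+      projectId: demand.projectId,
+      hoursPerDay: demand.hoursPerDay,
+    });
+    await tx.audit.record("assignment.create", assignmentId, input.actorId);
+    await tx.outbox.publish("AssignmentCreated", { assignmentId, demandId: demand.id });
+    return { assignmentId };
+  });
+}
+
+// Thin transport adapter: validate input, call the use case, map the result to a DTO.
+export async function assignResourceHandler(uow: UnitOfWork, raw: unknown, actorId: string): Promise<{ id: string }> {
+  const body = (raw ?? {}) as Partial<AssignResourceInput>;
+  if (typeof body.demandId !== "string" || typeof body.resourceId !== "string") {
+    throw new Error("invalid input");
+  }
+  const result = await assignResource(uow, { demandId: body.demandId, resourceId: body.resourceId, actorId });
+  return { id: result.assignmentId };
+}
+```
+
+Because the policy is a plain function, most business rules can be tested without a database. The use case itself only needs a fake `UnitOfWork`.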
+
+---
+
+## Query Side Design
+
+V2 should use a **CQRS-lite** pattern:
+
+- commands go through application services
+- heavy timeline/dashboard/staffing reads use query services or read models
+
+Examples:
+- `timeline_read_model`
+- `resource_capacity_snapshot`
+- `project_budget_snapshot`
+- `staffing_candidate_snapshot`
+
+These can start as SQL views/materialized views or dedicated query handlers. No need for a separate read database yet.
+
+This is especially important because the timeline and dashboards are read-heavy and aggregate-heavy.
+
+---
+
+## Parallel Runtime Agents
+
+These are the v2 agents I would actually build. They should run as worker processes consuming outbox events and job records.
+
+## 1. Match Agent
+
+Input:
+- `DemandRequirementCreated`
+- `DemandRequirementChanged`
+- `ResourceSkillChanged`
+- `CalendarChanged`
+
+Output:
+- ranked candidate snapshots
+- recommendation explanations
+
+Responsibility:
+- candidate filtering
+- deterministic scoring
+- optional AI explanation layer after deterministic ranking
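+
+Deterministic scoring could be as small as a weighted sum of skill coverage and free capacity, plus a stable tie-break. With that, identical inputs always produce identical rankings before any AI explanation is attached. The weights, field names, and `rankCandidates` are assumptions. The tie-break uses plain string comparison instead of `localeCompare`, so ordering does not depend on the machine's locale.
+
+```typescript
+export interface Candidate {
+  resourceId: string;
+  skills: ReadonlySet<string>;
+  freeHours: number; // free capacity in the demand's date range
+}
+
+export interface MatchDemand {
+  requiredSkills: readonly string[];
+  requiredHours: number; // hoursPerDay x working days in range
+}
+
+export interface RankedCandidate {
+  resourceId: string;
+  score: number;
+  explanation: string[];
+}
+
+// Hypothetical weights; these would be tuned against real staffing outcomes.
+const SKILL_WEIGHT = 0.7;
+const CAPACITY_WEIGHT = 0.3;
+
+const byId = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);
+
+export function rankCandidates(demand: MatchDemand, candidates: readonly Candidate[]): RankedCandidate[] {
+  return candidates
+    .map((c) => {
+      const matched = demand.requiredSkills.filter((s) => c.skills.has(s));
+      const skillScore = demand.requiredSkills.length === 0 ? 1 : matched.length / demand.requiredSkills.length;
+      const capacityScore = demand.requiredHours === 0 ? 1 : Math.min(1, c.freeHours / demand.requiredHours);
+      return {
+        resourceId: c.resourceId,
+        score: SKILL_WEIGHT * skillScore + CAPACITY_WEIGHT * capacityScore,
+        explanation: [
+          `skills ${matched.length}/${demand.requiredSkills.length}`,
+          `capacity ${Math.round(capacityScore * 100)}%`,
+        ],
+      };
+    })
+    .sort((a, b) => b.score - a.score || byId(a.resourceId, b.resourceId));
+}
+```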
+
+## 2. Conflict Agent
+
+Input:
+- `AssignmentCreated`
+- `AssignmentChanged`
+- `VacationApproved`
+- `CalendarExceptionChanged`
+
+Output:
+- overallocation/conflict records
+- blocked-demand warnings
+
+Responsibility:
+- recompute exact day-level conflicts
+- explain why a conflict exists
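+
+Day-level conflict detection reduces to summing booked hours per resource per day and comparing the total with that day's capacity. Keeping the contributing assignment ids makes each conflict explainable. The shapes and `findConflicts` are hypothetical, and dates are ISO strings walked in UTC. Capacity is passed in as a function so it can come from the calendar model. The sketch also assumes `hoursPerDay` applies to every day in an assignment's range; skipping non-working template days is left out.
+
+```typescript
+export interface BookedAssignment {
+  id: string;
+  resourceId: string;
+  startDate: string; // ISO date, inclusive
+  endDate: string; // ISO date, inclusive
+  hoursPerDay: number;
+}
+
+export interface Conflict {
+  resourceId: string;
+  date: string;
+  bookedHours: number;
+  capacityHours: number;
+  assignmentIds: string[]; // every assignment contributing on that day
+}
+
+const DAY_MS = 86_400_000;
+
+function* daysBetween(start: string, end: string): Generator<string> {
+  const to = Date.parse(`${end}T00:00:00Z`);
+  for (let t = Date.parse(`${start}T00:00:00Z`); t <= to; t += DAY_MS) {
+    yield new Date(t).toISOString().slice(0, 10);
+  }
+}
+
+export function findConflicts(
+  assignments: readonly BookedAssignment[],
+  capacityOn: (resourceId: string, date: string) => number,
+): Conflict[] {
+  // resourceId -> date -> accumulated load
+  const load = new Map<string, Map<string, { hours: number; ids: string[] }>>();
+  for (const a of assignments) {
+    const byDate = load.get(a.resourceId) ?? new Map<string, { hours: number; ids: string[] }>();
+    load.set(a.resourceId, byDate);
+    for (const date of daysBetween(a.startDate, a.endDate)) {
+      const cell = byDate.get(date) ?? { hours: 0, ids: [] };
+      cell.hours += a.hoursPerDay;
+      cell.ids.push(a.id);
+      byDate.set(date, cell);
+    }
+  }
+  const conflicts: Conflict[] = [];
+  for (const [resourceId, byDate] of load) {
+    for (const [date, cell] of byDate) {
+      const capacity = capacityOn(resourceId, date);
+      if (cell.hours > capacity) {
+        conflicts.push({ resourceId, date, bookedHours: cell.hours, capacityHours: capacity, assignmentIds: cell.ids });
+      }
+    }
+  }
+  return conflicts;
+}
+```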
+
+## 3. Budget Risk Agent
+
+Input:
+- assignment changes
+- project budget changes
+- project date changes
+
+Output:
+- burn snapshots
+- over-budget warnings
+- forecast deltas
+
+Responsibility:
+- keep financial forecasting off the request/response path so it adds no latency to user actions
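+
+Money is already stored in integer cents, so a forecast can stay in integer arithmetic end to end. This minimal sketch assumes each assignment carries an hourly rate in cents and a count of remaining working days; both are hypothetical inputs the agent would derive from the calendar and rate data. It also uses a hypothetical 90% warning threshold.
+
+```typescript
+export interface CostedAssignment {
+  hoursPerDay: number;
+  remainingWorkingDays: number;
+  hourlyRateCents: number; // integer cents
+}
+
+export interface BudgetForecast {
+  spentCents: number;
+  forecastCents: number; // spent + remaining planned cost
+  budgetCents: number;
+  varianceCents: number; // forecast - budget; positive means over budget
+  status: "OK" | "AT_RISK" | "OVER_BUDGET";
+}
+
+// Hypothetical threshold: warn once the forecast reaches 90% of the budget.
+const AT_RISK_RATIO = 0.9;
+
+export function forecastBudget(
+  budgetCents: number,
+  spentCents: number,
+  assignments: readonly CostedAssignment[],
+): BudgetForecast {
+  const remainingCents = assignments.reduce(
+    // Round per assignment so fractional hours never leave sub-cent amounts.
+    (sum, a) => sum + Math.round(a.hoursPerDay * a.remainingWorkingDays * a.hourlyRateCents),
+    0,
+  );
+  const forecastCents = spentCents + remainingCents;
+  const status =
+    forecastCents > budgetCents ? "OVER_BUDGET" : forecastCents >= AT_RISK_RATIO * budgetCents ? "AT_RISK" : "OK";
+  return { spentCents, forecastCents, budgetCents, varianceCents: forecastCents - budgetCents, status };
+}
+```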
+
+## 4. Notification Agent
+
+Input:
+- all user-visible domain events
+
+Output:
+- in-app notifications
+- email sends
+- digest batches
+
+Responsibility:
+- centralize fan-out
+- remove notification logic from feature routers
+
+## 5. Import Agent
+
+Input:
+- uploaded Excel/CSV/HRIS files
+
+Output:
+- staged import rows
+- validation results
+- normalized upserts
+
+Responsibility:
+- make imports resumable and auditable
+
+## 6. AI Agent
+
+Input:
+- explicit AI jobs only
+
+Output:
+- summaries
+- staffing rationale
+- project risk narratives
+
+Responsibility:
+- all model interaction happens asynchronously
+- stores prompt/result metadata for traceability
+
+Important rule: **AI never becomes the system of record.** It annotates deterministic outputs.
+
+---
+
+## Parallel Build Workstreams
+
+If you want to execute v2 with parallel coding agents, use these lanes to avoid file collisions.
+
+## Agent A: Core Model Refactor
+
+Owns:
+- `packages/db`
+- `packages/shared`
+- new domain packages
+
+Tasks:
+- introduce `DemandRequirement`
+- introduce normalized skill/calendar models
+- add outbox and job tables
+- define new shared DTOs/events
+
+## Agent B: Application Service Extraction
+
+Owns:
+- the new `packages/application` package
+- router-to-service extraction in `packages/api`
+
+Tasks:
+- move create/update/fill/approve workflows out of routers
+- standardize transaction boundaries
+- standardize audit + outbox emission
+
+## Agent C: Timeline V2
+
+Owns:
+- `apps/web/src/components/timeline/*`
+- timeline read models and UI contracts
+
+Tasks:
+- break `TimelineView` into screen shell + view model + row renderers
+- move timeline state machine into dedicated hooks/store
+- consume new query DTOs instead of raw Prisma-shaped payloads
+
+## Agent D: Project Creation And Staffing UX
+
+Owns:
+- `apps/web/src/components/projects/*`
+- staffing query DTO consumers
+
+Tasks:
+- split `ProjectWizard`
+- convert wizard from local mega-state to step reducers / use cases
+- integrate recommendation snapshots from Match Agent
+
+## Agent E: Security, Platform, And Notifications
+
+Owns:
+- auth
+- user management
+- settings
+- notification workflows
+
+Tasks:
+- unify password hashing
+- close permission gaps
+- move secret handling behind infrastructure services
+- wire Notification Agent
+
+This split keeps most workstreams independent.
+
+---
+
+## Migration Plan
+
+## Phase 0: Stabilize The Current System
+
+Do this before any architecture refactor:
+
+1. Fix user creation to use Argon2.
+2. Restrict `notification.create` to admin/system workflows.
+3. Fix `testAiConnection` to truly support both providers.
+4. Make `pnpm test:unit` and `pnpm typecheck` green again.
+5. Remove remaining legacy `role`/`roleId` ambiguity where possible.
+
+## Phase 1: Extract The Application Layer
+
+Without changing the UI yet:
+- add use-case services
+- move router logic into them
+- introduce outbox writes
+- standardize domain events
+
+This phase creates the seam for the rest of v2.
+
+## Phase 2: Introduce New Core Tables With Dual Write
+
+- create `DemandRequirement`, normalized skills, normalized calendar tables
+- dual-write from old flows
+- build migration scripts and backfills
+- add compatibility query adapters
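+
+The backfill and the dual-write path both need a single mapping from a legacy `Allocation` to the new entities. This sketch assumes the legacy fields named earlier: a nullable `resourceId` for placeholders, `headcount`, and legacy `role` text alongside `roleId`. It adds hypothetical date and hours fields. `mapLegacyAllocation` and the role-lookup callback are illustrations, not the project's code. Placeholders whose role cannot be resolved are flagged for review instead of guessed.
+
+```typescript
+export interface LegacyAllocation {
+  id: string;
+  projectId: string;
+  resourceId: string | null; // null = placeholder demand
+  roleId: string | null;
+  role: string | null; // legacy free-text role
+  startDate: string;
+  endDate: string;
+  hoursPerDay: number;
+  headcount: number;
+}
+
+interface Span { legacyId: string; projectId: string; startDate: string; endDate: string; hoursPerDay: number }
+
+export type Migrated =
+  | ({ kind: "demand"; roleId: string; headcount: number } & Span)
+  | ({ kind: "assignment"; resourceId: string; demandRequirementId: null } & Span)
+  | { kind: "needsReview"; legacyId: string; reason: string };
+
+export function mapLegacyAllocation(
+  a: LegacyAllocation,
+  roleIdByName: (name: string) => string | undefined,
+): Migrated {
+  const roleId = a.roleId ?? (a.role ? roleIdByName(a.role) : undefined);
+  const span: Span = { legacyId: a.id, projectId: a.projectId, startDate: a.startDate, endDate: a.endDate, hoursPerDay: a.hoursPerDay };
+  if (a.resourceId === null) {
+    // Placeholder rows become demand; unresolved roles are flagged rather than guessed.
+    if (!roleId) return { kind: "needsReview", legacyId: a.id, reason: `unknown role "${a.role ?? ""}"` };
+    return { kind: "demand", ...span, roleId, headcount: a.headcount };
+  }
+  // Real assignments drop headcount; the demand link is filled in by a later backfill pass.
+  return { kind: "assignment", ...span, resourceId: a.resourceId, demandRequirementId: null };
+}
+```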
+
+## Phase 3: Rebuild The Timeline And Wizard Against New Read Models
+
+- timeline consumes query DTOs
+- wizard consumes demand/assignment APIs
+- staffing suggestions come from snapshots, not direct all-resource scans
+
+## Phase 4: Turn On Parallel Agents
+
+- Match Agent
+- Conflict Agent
+- Budget Risk Agent
+- Notification Agent
+- Import Agent
+- AI Agent
+
+## Phase 5: Optional Service Extraction
+
+Only after the domain seams hold:
+- extract workers into separate deployables if load justifies it
+- keep the transactional core close to the DB
+
+---
+
+## Recommended Immediate Improvement Backlog
+
+If I had to choose the highest-leverage next moves:
+
+1. Fix auth, notification permissions, AI test path, and broken repo checks.
+2. Create `packages/application` and move allocation/timeline/project workflows into it.
+3. Introduce `DemandRequirement` and stop using placeholder allocations as a dual-purpose model.
+4. Rebuild staffing suggestions around normalized skills + calendar-aware capacity.
+5. Split timeline and project wizard around view-model boundaries, not just JSX extraction.
+
+---
+
+## Bottom Line
+
+**V2 should not be “more features on the current shape.”**  
+It should be:
+
+- a cleaner domain model
+- a thinner API layer
+- async agents for expensive side effects
+- read models for planning screens
+- normalized planning entities with JSONB reserved for extension points
+
+That will make Planarchy better at the thing it claims to be: a planning system, not just a CRUD app with a timeline.