# Planarchy V2 Architecture Proposal

**Date:** 2026-03-11
**Scope:** Codebase review, v2 direction, architecture rethink, parallel agent strategy

## Executive Summary

Planarchy already has a good base:
- monorepo boundaries are mostly clean
- `engine` and `staffing` contain useful pure domain logic
- Next.js + tRPC + Prisma keeps product iteration fast
- Redis-backed SSE is already a reasonable realtime baseline

The main issue is not the stack. The issue is that domain logic is split across:
- large client components
- large tRPC routers
- JSONB-heavy persistence models
- ad-hoc calculations in handlers

My recommendation for **v2** is:

1. **Do not jump to microservices yet.**
2. **Do move to a modular monolith with a real application layer and async workers.**
3. **Split “planning demand” from “actual assignments” at the data model level.**
4. **Keep JSONB only for extensibility, not for core planning workflows.**
5. **Introduce event/outbox-driven parallel agents for matching, conflicts, budget risk, notifications, and AI work.**

This gives you a v2 that is safer, easier to change, and still realistic for a small team.

---

## What The Codebase Does Well

- Domain packages are separated from the web app.
- Shared types and schemas reduce transport mismatch.
- Money is stored in integer cents.
- The app stays operationally simple: one app, one DB, one Redis.
- The timeline already has virtualization and SSE hooks, which means the product is past prototype stage.

---

## Current Pain Points

## 1. Critical correctness and security issues exist today

### Auth hashing is inconsistent
- Login verifies Argon2 hashes in [`apps/web/src/server/auth.ts#L20`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/server/auth.ts#L20).
- Admin-created users are still stored with SHA-256 in [`packages/api/src/router/user.ts#L41`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/user.ts#L41).
- Impact: users created from the admin flow are likely unable to log in, because Argon2 verification will not accept a SHA-256 digest.

### Notification creation is open to any authenticated user
- `notification.create` is only `protectedProcedure` in [`packages/api/src/router/notification.ts#L66`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/notification.ts#L66).
- Impact: any logged-in user can create notifications for arbitrary users.

### AI connection testing is Azure-shaped even when provider is OpenAI
- `testAiConnection` always constructs an Azure deployment URL in [`packages/api/src/router/settings.ts#L122`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/settings.ts#L122).
- Impact: provider abstraction is not actually reliable.
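
A minimal sketch of a provider-aware request target for the connection test, assuming a discriminated config shape. `AiProviderConfig` and `buildChatCompletionsTarget` are hypothetical names, not code from the repo; the URL and header shapes follow the public OpenAI and Azure OpenAI chat-completions conventions.

```typescript
// Hypothetical config shape; the real settings model in the repo may differ.
type AiProviderConfig =
  | { provider: "openai"; apiKey: string; baseUrl?: string }
  | {
      provider: "azure";
      apiKey: string;
      endpoint: string; // e.g. https://<resource>.openai.azure.com
      deployment: string;
      apiVersion: string;
    };

interface AiRequestTarget {
  url: string;
  headers: Record<string, string>;
}

// Strip trailing slashes so path joining stays predictable.
function trimTrailingSlashes(value: string): string {
  return value.replace(/\/+$/, "");
}

// Build the URL and auth header per provider instead of always assuming Azure.
function buildChatCompletionsTarget(config: AiProviderConfig): AiRequestTarget {
  if (config.provider === "openai") {
    const base = trimTrailingSlashes(config.baseUrl ?? "https://api.openai.com/v1");
    return {
      url: `${base}/chat/completions`,
      headers: { Authorization: `Bearer ${config.apiKey}` },
    };
  }
  // Narrowed to the Azure variant here.
  const base = trimTrailingSlashes(config.endpoint);
  const deployment = encodeURIComponent(config.deployment);
  const version = encodeURIComponent(config.apiVersion);
  return {
    url: `${base}/openai/deployments/${deployment}/chat/completions?api-version=${version}`,
    headers: { "api-key": config.apiKey },
  };
}
```

Keeping this as a pure function means the provider branch can be unit-tested without network access.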

### Repo health checks are currently failing
- `pnpm test:unit` fails because `@planarchy/shared` has a Vitest script but no tests in [`packages/shared/package.json`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/package.json).
- `pnpm typecheck` fails because `crypto.randomUUID()` is used without a visible import/global typing in [`packages/shared/src/schemas/project.schema.ts#L5`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/src/schemas/project.schema.ts#L5).

These are not “v2 someday” items. They should be fixed before deeper refactoring.

## 2. Large surfaces are carrying too much responsibility

The biggest modules are already a warning sign:
- [`apps/web/src/components/timeline/TimelineView.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/timeline/TimelineView.tsx) is 1720 lines.
- [`apps/web/src/components/projects/ProjectWizard.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/projects/ProjectWizard.tsx) is 1171 lines.
- [`packages/api/src/router/resource.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/resource.ts) is 908 lines.
- [`packages/api/src/router/timeline.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts) is 631 lines.

That usually means:
- transport, orchestration, validation, business rules, and data access are mixed
- testing becomes expensive
- one change touches too many concerns

## 3. The core planning model is overloaded

The Prisma schema uses JSONB heavily in core workflows:
- blueprints and role presets in [`packages/db/prisma/schema.prisma#L147`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L147)
- resource availability, skills, and dynamic fields in [`packages/db/prisma/schema.prisma#L208`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L208)
- project staffing requirements and dynamic fields in [`packages/db/prisma/schema.prisma#L267`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L267)
- allocation metadata in [`packages/db/prisma/schema.prisma#L301`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L301)

The bigger modeling problem is that **`Allocation` currently represents both demand and assignment**:
- placeholder demand is modeled with `resourceId = null`
- headcount is stored on the same entity
- legacy `role` text and `roleId` coexist

This is the wrong aggregate for v2.

## 4. Staffing logic is not yet trustworthy enough to become a differentiator

`staffing.getSuggestions` currently:
- loads all active resources with overlapping allocations
- computes utilization in the router
- uses only Monday availability as the denominator in [`packages/api/src/router/staffing.ts#L45`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/staffing.ts#L45)

That means the suggestion layer is:
- hard to scale
- not consistent with calendar-aware engine logic
- not a strong base for “AI-assisted staffing”
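
To make the denominator problem concrete, here is a small sketch of calendar-aware utilization next to one way a Monday-only denominator could mislead. The `WeekHours` shape and both function names are assumptions for illustration; the actual availability JSON and router code may differ.

```typescript
// Hypothetical shape: hours available per weekday, indexed by Date.getUTCDay()
// (0 = Sunday ... 6 = Saturday). The real availability JSON may look different.
type WeekHours = readonly number[];

const MS_PER_DAY = 86400000; // 24 * 60 * 60 * 1000

// Sum the template hours for every calendar day in [startIso, endIso], inclusive.
function availableHours(week: WeekHours, startIso: string, endIso: string): number {
  const end = Date.parse(`${endIso}T00:00:00Z`);
  let total = 0;
  for (let t = Date.parse(`${startIso}T00:00:00Z`); t <= end; t += MS_PER_DAY) {
    total += week[new Date(t).getUTCDay()] ?? 0;
  }
  return total;
}

// Utilization as assigned / available; null when there is no capacity at all.
function utilization(
  assignedHours: number,
  week: WeekHours,
  startIso: string,
  endIso: string,
): number | null {
  const capacity = availableHours(week, startIso, endIso);
  return capacity === 0 ? null : assignedHours / capacity;
}

// Part-time example: Mon 8h, Tue 8h, Wed 4h, nothing on the other days.
const partTimeWeek: WeekHours = [0, 8, 8, 4, 0, 0, 0];

// Calendar-aware: 20h assigned over Mon 2026-03-09 .. Fri 2026-03-13 uses all 20h of capacity.
const calendarAware = utilization(20, partTimeWeek, "2026-03-09", "2026-03-13");

// A Monday-only denominator (8h x 5 days = 40h) would report the same person as half free.
const mondayOnly = 20 / ((partTimeWeek[1] ?? 0) * 5);
```

In this example the calendar-aware result is full utilization, while the Monday-based estimate would still suggest the person for more work.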

## 5. Routers are doing application-service work

Representative examples:
- timeline queries and update workflows live directly in [`packages/api/src/router/timeline.ts#L12`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts#L12)
- allocation creation, placeholder fill, validation, vacation handling, cost calc, audit log, and event emission all live in [`packages/api/src/router/allocation.ts#L8`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/allocation.ts#L8)

The pure `engine` package exists, but the application layer that should orchestrate it does not.

---

## Recommended V2 Architecture

## Core Decision

**V2 should be a modular monolith plus worker processes, not a microservice split.**

Why:
- the product is still changing fast
- most failures are domain modeling and module-boundary problems, not network topology problems
- a microservice split would increase operational cost before domain seams are stable

### Target shape

```text
apps/web
  -> UI + route handlers only

packages/api
  -> transport adapters only (tRPC procedures, auth boundary, DTO mapping)

packages/application
  -> use cases / command handlers / query handlers

packages/domain-people
packages/domain-projects
packages/domain-demand
packages/domain-scheduling
packages/domain-calendar
packages/domain-notifications
packages/domain-ai
  -> pure domain logic and policies

packages/infrastructure
  -> Prisma repos, Redis pub/sub, job queue, mail, AI clients

workers/agents
  -> async processors consuming outbox events and jobs
```

The key change is: **routers stop containing business workflows**. They become thin.

---

## Data Model Changes For V2

## 1. Split demand from assignment

Replace the current overloaded `Allocation` concept with:

- `DemandRequirement`
  - projectId
  - roleId
  - requiredSkills
  - date range
  - hoursPerDay
  - headcount
  - priority
  - status

- `Assignment`
  - demandRequirementId nullable during migration
  - resourceId
  - projectId
  - date range
  - hoursPerDay
  - cost snapshot
  - status

- `AssignmentChange` or `AssignmentRevision`
  - audit-friendly timeline history
  - supports undo/redo and reasoning

This removes:
- nullable resource meaning two different business states
- headcount logic from real assignments
- placeholder branching across the whole codebase
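
As a rough illustration of the split, the two entities could be typed as below, with open headcount derived from the data instead of stored as placeholder rows. The field names follow the lists above; the status values and the `openHeadcount` helper are assumptions.

```typescript
// Demand: what the project needs. Status values are hypothetical.
interface DemandRequirement {
  id: string;
  projectId: string;
  roleId: string;
  requiredSkills: string[];
  startDate: string; // ISO date, YYYY-MM-DD
  endDate: string;
  hoursPerDay: number;
  headcount: number;
  priority: number;
  status: "open" | "filled" | "cancelled";
}

// Assignment: a real person on a real project. resourceId is never null here.
interface Assignment {
  id: string;
  demandRequirementId: string | null; // nullable during migration
  resourceId: string;
  projectId: string;
  startDate: string;
  endDate: string;
  hoursPerDay: number;
  costSnapshotCents: number;
  status: "proposed" | "confirmed" | "cancelled";
}

// Remaining headcount = requested headcount minus distinct people actively assigned.
function openHeadcount(demand: DemandRequirement, assignments: readonly Assignment[]): number {
  const assignedPeople = new Set(
    assignments
      .filter((a) => a.demandRequirementId === demand.id && a.status !== "cancelled")
      .map((a) => a.resourceId),
  );
  return Math.max(0, demand.headcount - assignedPeople.size);
}
```

With this shape, "unfilled demand" is a query over two tables, not a special state of an `Allocation` row.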

## 2. Normalize the skill model

Today `Resource.skills` is JSONB. For v2, use:
- `Skill`
- `ResourceSkill`
- optional `RoleSkillProfile`

Keep JSONB only for imported raw skill matrix payloads if needed.

Benefits:
- real filtering
- better analytics
- reusable recommendation features
- explainable ranking

## 3. Normalize calendar capacity

Today availability is template-like JSON plus vacation overlays. For v2:
- `AvailabilityTemplate`
- `ResourceAvailabilityOverride`
- `CalendarException`
- `PublicHolidayCalendar`

This lets the engine answer:
- “what is capacity on this exact date?”
- “why is this person unavailable?”
- “what changed after a vacation approval?”
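
One possible way to answer the first two questions is a resolver that returns both the hours and the rule that produced them. The precedence used here (public holiday, then exception, then override, then template) is an assumption for the sketch, as are the input shapes and the `capacityOn` name.

```typescript
// Hypothetical in-memory view of the four normalized calendar sources.
interface CapacityInputs {
  templateHoursByWeekday: readonly number[]; // index = Date.getUTCDay(), 0 = Sunday
  overrides: readonly { from: string; to: string; hoursPerDay: number; note: string }[];
  exceptions: readonly { from: string; to: string; reason: string }[];
  publicHolidays: ReadonlySet<string>; // ISO dates, YYYY-MM-DD
}

interface CapacityAnswer {
  date: string;
  hours: number;
  source: "public_holiday" | "exception" | "override" | "template";
  explanation: string;
}

// ISO dates in YYYY-MM-DD form compare correctly as plain strings.
function inRange(date: string, from: string, to: string): boolean {
  return from <= date && date <= to;
}

// Resolve capacity for one exact date and say which rule decided it.
function capacityOn(date: string, inputs: CapacityInputs): CapacityAnswer {
  if (inputs.publicHolidays.has(date)) {
    return { date, hours: 0, source: "public_holiday", explanation: "public holiday" };
  }
  const exception = inputs.exceptions.find((e) => inRange(date, e.from, e.to));
  if (exception) {
    return { date, hours: 0, source: "exception", explanation: exception.reason };
  }
  const override = inputs.overrides.find((o) => inRange(date, o.from, o.to));
  if (override) {
    return { date, hours: override.hoursPerDay, source: "override", explanation: override.note };
  }
  const weekday = new Date(`${date}T00:00:00Z`).getUTCDay();
  return {
    date,
    hours: inputs.templateHoursByWeekday[weekday] ?? 0,
    source: "template",
    explanation: "weekly availability template",
  };
}
```

Returning the `source` alongside the hours is what makes "why is this person unavailable?" answerable without re-deriving the rules in the UI.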

## 4. Keep blueprints, but narrow their role

Blueprints should remain for:
- custom fields
- UI configuration
- optional default demand templates

Blueprints should **not** continue to carry core planning state in JSONB.

## 5. Add an outbox

Introduce:
- `DomainEventOutbox`
- `Job`

Every important mutation writes:
- domain row changes
- audit row
- outbox event

in one transaction.

That is the foundation for safe parallel agents.
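
The all-or-nothing property can be sketched with an in-memory store that commits a draft only if the whole unit of work succeeds. This is a toy stand-in for a database transaction (with Prisma it would presumably map to an interactive `$transaction` callback); `InMemoryStore` and `createAssignment` are hypothetical names.

```typescript
interface OutboxEvent {
  id: number;
  type: string;
  payload: { assignmentId: string };
  processedAt: string | null;
}
interface AuditRow { action: string; entityId: string }
interface StoreState {
  assignments: Map<string, { resourceId: string }>;
  audit: AuditRow[];
  outbox: OutboxEvent[];
}

// Toy transactional store: work runs against a copy, which replaces the
// committed state only if the callback returns without throwing.
class InMemoryStore {
  private state: StoreState = { assignments: new Map(), audit: [], outbox: [] };

  transaction<T>(work: (draft: StoreState) => T): T {
    const draft: StoreState = {
      assignments: new Map(this.state.assignments),
      audit: [...this.state.audit],
      outbox: [...this.state.outbox],
    };
    const result = work(draft); // a throw here discards the draft
    this.state = draft;
    return result;
  }

  counts(): { assignments: number; audit: number; outbox: number } {
    return {
      assignments: this.state.assignments.size,
      audit: this.state.audit.length,
      outbox: this.state.outbox.length,
    };
  }
}

// Domain row, audit row, and outbox event are written together or not at all.
function createAssignment(store: InMemoryStore, assignmentId: string, resourceId: string): void {
  store.transaction((draft) => {
    draft.assignments.set(assignmentId, { resourceId });
    draft.audit.push({ action: "assignment.created", entityId: assignmentId });
    draft.outbox.push({
      id: draft.outbox.length + 1,
      type: "AssignmentCreated",
      payload: { assignmentId },
      processedAt: null,
    });
  });
}
```

Workers then only need to read unprocessed outbox rows, so an agent never observes an event whose domain change was rolled back.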

---

## Application Layer Design

Every important user action should map to a use case, for example:

- `CreateProject`
- `DefineDemand`
- `AssignResource`
- `MoveAssignment`
- `ApproveVacation`
- `ImportSkillMatrix`
- `RecomputeValueScore`
- `GenerateAiSummary`

Each use case should:
- load aggregates via repositories
- call pure domain policies
- persist through a transaction
- publish outbox events

Routers then become simple wrappers:
- validate input
- call use case
- map result to DTO

This is the main architectural upgrade missing today.
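
A compact sketch of that layering for `AssignResource`, assuming hypothetical port names and a plain function in place of a tRPC procedure. Persistence and publishing are shown as two calls for brevity; in a real implementation they would share one transaction.

```typescript
// Pure domain policy: no I/O, trivially unit-testable.
interface CapacityWindow { resourceId: string; freeHoursPerDay: number }
type Verdict = { ok: true } | { ok: false; reason: string };

function checkAssignable(capacity: CapacityWindow, hoursPerDay: number): Verdict {
  if (hoursPerDay <= 0) return { ok: false, reason: "hoursPerDay must be positive" };
  if (hoursPerDay > capacity.freeHoursPerDay) {
    return { ok: false, reason: `only ${capacity.freeHoursPerDay}h/day free` };
  }
  return { ok: true };
}

// Ports the use case depends on; infrastructure would implement these.
interface AssignResourceCommand { resourceId: string; projectId: string; hoursPerDay: number }
interface AssignResourcePorts {
  loadCapacity(resourceId: string): CapacityWindow | undefined;
  saveAssignment(command: AssignResourceCommand): string; // returns the new assignment id
  publish(event: { type: "AssignmentCreated"; assignmentId: string }): void;
}
type AssignResourceResult = { ok: true; assignmentId: string } | { ok: false; reason: string };

// Use case: load -> policy -> persist -> publish.
function assignResource(
  ports: AssignResourcePorts,
  command: AssignResourceCommand,
): AssignResourceResult {
  const capacity = ports.loadCapacity(command.resourceId);
  if (!capacity) return { ok: false, reason: "unknown resource" };
  const verdict = checkAssignable(capacity, command.hoursPerDay);
  if (!verdict.ok) return verdict;
  const assignmentId = ports.saveAssignment(command);
  ports.publish({ type: "AssignmentCreated", assignmentId });
  return { ok: true, assignmentId };
}

// Thin transport adapter: validate input, call the use case, map the result to a DTO.
function handleAssignResource(
  ports: AssignResourcePorts,
  raw: unknown,
): { status: number; body: unknown } {
  if (typeof raw !== "object" || raw === null) {
    return { status: 400, body: { error: "invalid input" } };
  }
  const { resourceId, projectId, hoursPerDay } = raw as Record<string, unknown>;
  if (
    typeof resourceId !== "string" ||
    typeof projectId !== "string" ||
    typeof hoursPerDay !== "number"
  ) {
    return { status: 400, body: { error: "invalid input" } };
  }
  const result = assignResource(ports, { resourceId, projectId, hoursPerDay });
  return result.ok
    ? { status: 200, body: { assignmentId: result.assignmentId } }
    : { status: 409, body: { error: result.reason } };
}
```

The adapter contains no business rule; swapping tRPC for another transport would only touch `handleAssignResource`.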

---

## Query Side Design

V2 should use a **CQRS-lite** pattern:

- commands go through application services
- heavy timeline/dashboard/staffing reads use query services or read models

Examples:
- `timeline_read_model`
- `resource_capacity_snapshot`
- `project_budget_snapshot`
- `staffing_candidate_snapshot`

These can start as SQL views/materialized views or dedicated query handlers. No need for a separate read database yet.

This is especially important because the timeline and dashboards are read-heavy and aggregate-heavy.

---

## Parallel Runtime Agents

These are the v2 agents I would actually build. They should run as worker processes consuming outbox events and job records.

## 1. Match Agent

Input:
- `DemandRequirementCreated`
- `DemandRequirementChanged`
- `ResourceSkillChanged`
- `CalendarChanged`

Output:
- ranked candidate snapshots
- recommendation explanations

Responsibility:
- candidate filtering
- deterministic scoring
- optional AI explanation layer after deterministic ranking
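
A sketch of what "deterministic scoring with explanations" could look like. The 70/30 weighting, the input shapes, and `rankCandidates` are illustrative assumptions, not a proposed final formula.

```typescript
interface MatchCandidate {
  resourceId: string;
  skills: ReadonlySet<string>;
  freeHoursPerDay: number;
}
interface DemandSpec { requiredSkills: readonly string[]; hoursPerDay: number }
interface RankedCandidate { resourceId: string; score: number; reasons: string[] }

// Hypothetical weights: skill coverage dominates, capacity fit breaks near-ties.
const SKILL_WEIGHT = 0.7;
const CAPACITY_WEIGHT = 0.3;

function rankCandidates(
  demand: DemandSpec,
  candidates: readonly MatchCandidate[],
): RankedCandidate[] {
  return candidates
    .filter((c) => c.freeHoursPerDay > 0) // filtering: drop people with no capacity
    .map((c) => {
      const matched = demand.requiredSkills.filter((s) => c.skills.has(s));
      const skillCoverage =
        demand.requiredSkills.length === 0 ? 1 : matched.length / demand.requiredSkills.length;
      const capacityFit =
        demand.hoursPerDay <= 0 ? 1 : Math.min(1, c.freeHoursPerDay / demand.hoursPerDay);
      const score = Math.round((SKILL_WEIGHT * skillCoverage + CAPACITY_WEIGHT * capacityFit) * 100);
      return {
        resourceId: c.resourceId,
        score,
        reasons: [
          `matches ${matched.length}/${demand.requiredSkills.length} required skills`,
          `${c.freeHoursPerDay}h/day free of ${demand.hoursPerDay}h/day requested`,
        ],
      };
    })
    // Stable, locale-independent tie-break so identical inputs give identical order.
    .sort((a, b) => b.score - a.score || (a.resourceId < b.resourceId ? -1 : 1));
}
```

Because the score and reasons are computed without a model call, an AI layer can rephrase the `reasons` but cannot change the ranking.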

## 2. Conflict Agent

Input:
- `AssignmentCreated`
- `AssignmentChanged`
- `VacationApproved`
- `CalendarExceptionChanged`

Output:
- overallocation/conflict records
- blocked-demand warnings

Responsibility:
- recompute exact day-level conflicts
- explain why a conflict exists
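
Day-level recomputation can be sketched as a scan over the affected date window, comparing summed assignment hours with that day's capacity. The shapes and `findDayConflicts` are hypothetical; `capacityOn` stands in for whatever calendar lookup the engine provides.

```typescript
interface DayAssignment {
  assignmentId: string;
  resourceId: string;
  from: string; // ISO date, YYYY-MM-DD, inclusive
  to: string;
  hoursPerDay: number;
}
interface DayConflict {
  resourceId: string;
  date: string;
  assignedHours: number;
  capacityHours: number;
  assignmentIds: string[];
  explanation: string;
}

// Enumerate ISO dates from `from` to `to`, inclusive, stepping in UTC.
function isoDays(from: string, to: string): string[] {
  const days: string[] = [];
  const end = Date.parse(`${to}T00:00:00Z`);
  for (let t = Date.parse(`${from}T00:00:00Z`); t <= end; t += 86400000) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

// Report every day in the window where assigned hours exceed capacity.
function findDayConflicts(
  resourceId: string,
  assignments: readonly DayAssignment[],
  capacityOn: (date: string) => number,
  from: string,
  to: string,
): DayConflict[] {
  const own = assignments.filter((a) => a.resourceId === resourceId);
  const conflicts: DayConflict[] = [];
  for (const date of isoDays(from, to)) {
    const active = own.filter((a) => a.from <= date && date <= a.to);
    const assignedHours = active.reduce((sum, a) => sum + a.hoursPerDay, 0);
    const capacityHours = capacityOn(date);
    if (assignedHours > capacityHours) {
      conflicts.push({
        resourceId,
        date,
        assignedHours,
        capacityHours,
        assignmentIds: active.map((a) => a.assignmentId),
        explanation: `${assignedHours}h assigned vs ${capacityHours}h capacity`,
      });
    }
  }
  return conflicts;
}
```

Limiting the scan to the date range touched by the triggering event keeps each recomputation small.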

## 3. Budget Risk Agent

Input:
- assignment changes
- project budget changes
- project date changes

Output:
- burn snapshots
- over-budget warnings
- forecast deltas

Responsibility:
- separate financial forecasting from request/response latency
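
A minimal forecast sketch that stays in integer cents, consistent with how money is already stored. `CostedAssignment`, `forecastCents`, and `assessBudget` are hypothetical names, and `workingDays` is assumed to be precomputed by the calendar logic.

```typescript
interface CostedAssignment {
  hoursPerDay: number;
  workingDays: number; // assumed to come from calendar-aware logic
  rateCentsPerHour: number;
}
interface BudgetRisk { forecastCents: number; deltaCents: number; overBudget: boolean }

// Round per assignment so the total remains an integer number of cents.
function forecastCents(assignments: readonly CostedAssignment[]): number {
  return assignments.reduce(
    (sum, a) => sum + Math.round(a.hoursPerDay * a.workingDays * a.rateCentsPerHour),
    0,
  );
}

// Positive delta means the forecast exceeds the budget.
function assessBudget(budgetCents: number, assignments: readonly CostedAssignment[]): BudgetRisk {
  const forecast = forecastCents(assignments);
  const deltaCents = forecast - budgetCents;
  return { forecastCents: forecast, deltaCents, overBudget: deltaCents > 0 };
}
```

Running this in a worker after each assignment or budget event lets the UI read a stored snapshot instead of recomputing on every request.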

## 4. Notification Agent

Input:
- all user-visible domain events

Output:
- in-app notifications
- email sends
- digest batches

Responsibility:
- centralize fan-out
- remove notification logic from feature routers

## 5. Import Agent

Input:
- uploaded Excel/CSV/HRIS files

Output:
- staged import rows
- validation results
- normalized upserts

Responsibility:
- make imports resumable and auditable

## 6. AI Agent

Input:
- explicit AI jobs only

Output:
- summaries
- staffing rationale
- project risk narratives

Responsibility:
- all model interaction happens asynchronously
- stores prompt/result metadata for traceability

Important rule: **AI never becomes the system of record.** It annotates deterministic outputs.

---

## Parallel Build Workstreams

If you want to execute v2 with parallel coding agents, use these lanes to avoid file collisions.

## Agent A: Core Model Refactor

Owns:
- `packages/db`
- `packages/shared`
- new domain packages

Tasks:
- introduce `DemandRequirement`
- introduce normalized skill/calendar models
- add outbox and job tables
- define new shared DTOs/events

## Agent B: Application Service Extraction

Owns:
- `packages/application` new package
- router-to-service extraction in `packages/api`

Tasks:
- move create/update/fill/approve workflows out of routers
- standardize transaction boundaries
- standardize audit + outbox emission

## Agent C: Timeline V2

Owns:
- `apps/web/src/components/timeline/*`
- timeline read models and UI contracts

Tasks:
- break `TimelineView` into screen shell + view model + row renderers
- move timeline state machine into dedicated hooks/store
- consume new query DTOs instead of raw Prisma-shaped payloads

## Agent D: Project Creation And Staffing UX

Owns:
- `apps/web/src/components/projects/*`
- staffing query DTO consumers

Tasks:
- split `ProjectWizard`
- convert wizard from local mega-state to step reducers / use cases
- integrate recommendation snapshots from Match Agent

## Agent E: Security, Platform, And Notifications

Owns:
- auth
- user management
- settings
- notification workflows

Tasks:
- unify password hashing
- close permission gaps
- move secret handling behind infrastructure services
- wire Notification Agent

This split keeps most workstreams independent.

---

## Migration Plan

## Phase 0: Stabilize The Current System

Do this before any architecture refactor:

1. Fix user creation to use Argon2.
2. Restrict `notification.create` to admin/system workflows.
3. Fix `testAiConnection` to truly support both providers.
4. Make `pnpm test:unit` and `pnpm typecheck` green again.
5. Remove remaining legacy `role`/`roleId` ambiguity where possible.

## Phase 1: Extract The Application Layer

Without changing the UI yet:
- add use-case services
- move router logic into them
- introduce outbox writes
- standardize domain events

This phase creates the seam for the rest of v2.

## Phase 2: Introduce New Core Tables With Dual Write

- create `DemandRequirement`, normalized skills, normalized calendar tables
- dual-write from old flows
- build migration scripts and backfills
- add compatibility query adapters

## Phase 3: Rebuild The Timeline And Wizard Against New Read Models

- timeline consumes query DTOs
- wizard consumes demand/assignment APIs
- staffing suggestions come from snapshots, not direct all-resource scans

## Phase 4: Turn On Parallel Agents

- Match Agent
- Conflict Agent
- Budget Risk Agent
- Notification Agent
- Import Agent
- AI Agent

## Phase 5: Optional Service Extraction

Only after the domain seams hold:
- extract workers into separate deployables if load justifies it
- keep the transactional core close to the DB

---

## Recommended Immediate Improvement Backlog

If I had to choose the highest-leverage next moves:

1. Fix auth, notification permissions, AI test path, and broken repo checks.
2. Create `packages/application` and move allocation/timeline/project workflows into it.
3. Introduce `DemandRequirement` and stop using placeholder allocations as a dual-purpose model.
4. Rebuild staffing suggestions around normalized skills + calendar-aware capacity.
5. Split timeline and project wizard around view-model boundaries, not just JSX extraction.

---

## Bottom Line

**V2 should not be “more features on the current shape.”**
It should be:

- a cleaner domain model
- a thinner API layer
- async agents for expensive side effects
- read models for planning screens
- normalized planning entities with JSONB reserved for extension points

That will make Planarchy better at the thing it claims to be: a planning system, not just a CRUD app with a timeline.