chore(repo): initialize planarchy workspace
@@ -0,0 +1,533 @@
# Planarchy V2 Architecture Proposal

**Date:** 2026-03-11
**Scope:** Codebase review, v2 direction, architecture rethink, parallel agent strategy

## Executive Summary

Planarchy already has a good base:
- monorepo boundaries are mostly clean
- `engine` and `staffing` contain useful pure domain logic
- Next.js + tRPC + Prisma keeps product iteration fast
- Redis-backed SSE is already a reasonable realtime baseline

The main issue is not the stack. The issue is that domain logic is split across:
- large client components
- large tRPC routers
- JSONB-heavy persistence models
- ad-hoc calculations in handlers

My recommendation for **v2** is:

1. **Do not jump to microservices yet.**
2. **Do move to a modular monolith with a real application layer and async workers.**
3. **Split “planning demand” from “actual assignments” at the data model level.**
4. **Keep JSONB only for extensibility, not for core planning workflows.**
5. **Introduce event/outbox-driven parallel agents for matching, conflicts, budget risk, notifications, and AI work.**

This gives you a v2 that is safer, easier to change, and still realistic for a small team.

---

## What The Codebase Does Well

- Domain packages are separated from the web app.
- Shared types and schemas reduce transport mismatch.
- Money is stored in integer cents.
- The app stays operationally simple: one app, one DB, one Redis.
- The timeline already has virtualization and SSE hooks, which means the product is past prototype stage.

---

## Current Pain Points

## 1. Critical correctness and security issues exist today

### Auth hashing is inconsistent
- Login verifies Argon2 hashes in [`apps/web/src/server/auth.ts#L20`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/server/auth.ts#L20).
- Admin-created users are still stored with SHA-256 in [`packages/api/src/router/user.ts#L41`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/user.ts#L41).
- Impact: users created from the admin flow are likely unable to log in.

### Notification creation is open to any authenticated user
- `notification.create` is only `protectedProcedure` in [`packages/api/src/router/notification.ts#L66`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/notification.ts#L66).
- Impact: any logged-in user can create notifications for arbitrary users.

### AI connection testing is Azure-shaped even when provider is OpenAI
- `testAiConnection` always constructs an Azure deployment URL in [`packages/api/src/router/settings.ts#L122`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/settings.ts#L122).
- Impact: provider abstraction is not actually reliable.
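
One way to make the connection test provider-aware is to build the URL from a discriminated settings type. This is a minimal sketch, not the existing `testAiConnection` code: the `AiSettings` shape and the `buildTestUrl` name are assumptions for illustration, and the URL layouts follow the publicly documented OpenAI and Azure OpenAI chat-completions endpoints as I understand them.

```typescript
// Hypothetical settings shape; field names are assumptions, not the real schema.
type AiSettings =
  | { provider: "openai"; baseUrl?: string }
  | { provider: "azure"; endpoint: string; deployment: string; apiVersion: string };

// Builds the chat-completions URL a connection test would call, per provider.
function buildTestUrl(settings: AiSettings): string {
  switch (settings.provider) {
    case "openai": {
      // Strip trailing slashes so the path is joined exactly once.
      const base = (settings.baseUrl ?? "https://api.openai.com/v1").replace(/\/+$/, "");
      return `${base}/chat/completions`;
    }
    case "azure": {
      const endpoint = settings.endpoint.replace(/\/+$/, "");
      const deployment = encodeURIComponent(settings.deployment);
      const version = encodeURIComponent(settings.apiVersion);
      return `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${version}`;
    }
  }
}
```

Because the union is discriminated on `provider`, adding a third provider later produces a compile error in `buildTestUrl` until its branch is written, which is the property the current Azure-only path lacks.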

### Repo health checks are currently failing
- `pnpm test:unit` fails because `@planarchy/shared` has a Vitest script but no tests in [`packages/shared/package.json`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/package.json).
- `pnpm typecheck` fails because `crypto.randomUUID()` is used without a visible import/global typing in [`packages/shared/src/schemas/project.schema.ts#L5`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/src/schemas/project.schema.ts#L5).

These are not “v2 someday” items. They should be fixed before deeper refactoring.

## 2. Large surfaces are carrying too much responsibility

The biggest modules are already a warning sign:
- [`apps/web/src/components/timeline/TimelineView.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/timeline/TimelineView.tsx) is 1720 lines.
- [`apps/web/src/components/projects/ProjectWizard.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/projects/ProjectWizard.tsx) is 1171 lines.
- [`packages/api/src/router/resource.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/resource.ts) is 908 lines.
- [`packages/api/src/router/timeline.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts) is 631 lines.

That usually means:
- transport, orchestration, validation, business rules, and data access are mixed
- testing becomes expensive
- one change touches too many concerns

## 3. The core planning model is overloaded

The Prisma schema uses JSONB heavily in core workflows:
- blueprints and role presets in [`packages/db/prisma/schema.prisma#L147`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L147)
- resource availability, skills, and dynamic fields in [`packages/db/prisma/schema.prisma#L208`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L208)
- project staffing requirements and dynamic fields in [`packages/db/prisma/schema.prisma#L267`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L267)
- allocation metadata in [`packages/db/prisma/schema.prisma#L301`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L301)

The bigger modeling problem is that **`Allocation` currently represents both demand and assignment**:
- placeholder demand is modeled with `resourceId = null`
- headcount is stored on the same entity
- legacy `role` text and `roleId` coexist

This is the wrong aggregate for v2.

## 4. Staffing logic is not yet trustworthy enough to become a differentiator

`staffing.getSuggestions` currently:
- loads all active resources with overlapping allocations
- computes utilization in the router
- uses only Monday availability as the denominator in [`packages/api/src/router/staffing.ts#L45`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/staffing.ts#L45)

That means the suggestion layer is:
- hard to scale
- not consistent with calendar-aware engine logic
- not a strong base for “AI-assisted staffing”
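
To illustrate why a Monday-only denominator is misleading, here is a small sketch of a calendar-aware capacity calculation. It assumes a hypothetical weekly template of hours per weekday (index 0 = Sunday); the names `capacityHours` and `utilization` are illustrative and do not refer to existing `engine` functions.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Sums available hours for every calendar day in [start, end], both inclusive.
// `weeklyHours` is an assumed template: index 0 = Sunday ... 6 = Saturday.
function capacityHours(weeklyHours: readonly number[], start: Date, end: Date): number {
  let total = 0;
  for (let t = start.getTime(); t <= end.getTime(); t += DAY_MS) {
    total += weeklyHours[new Date(t).getUTCDay()] ?? 0;
  }
  return total;
}

// Utilization as allocated hours over real capacity for the same range.
function utilization(
  allocatedHours: number,
  weeklyHours: readonly number[],
  start: Date,
  end: Date,
): number {
  const capacity = capacityHours(weeklyHours, start, end);
  return capacity === 0 ? 0 : allocatedHours / capacity;
}

// A person working 8h Monday to Thursday and 4h on Friday.
const partTimeFriday = [0, 8, 8, 8, 8, 4, 0];
const weekStart = new Date(Date.UTC(2026, 2, 9)); // Monday 2026-03-09
const weekEnd = new Date(Date.UTC(2026, 2, 15)); // Sunday 2026-03-15
const weekCapacity = capacityHours(partTimeFriday, weekStart, weekEnd); // 36
```

Using Monday's 8 hours for all five working days would assume 40 hours of capacity for this person, so 36 allocated hours would look like 90% utilization when the person is actually fully booked.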

## 5. Routers are doing application-service work

Representative examples:
- timeline queries and update workflows live directly in [`packages/api/src/router/timeline.ts#L12`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts#L12)
- allocation creation, placeholder fill, validation, vacation handling, cost calc, audit log, and event emission all live in [`packages/api/src/router/allocation.ts#L8`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/allocation.ts#L8)

The pure `engine` package exists, but the application layer that should orchestrate it does not.

---

## Recommended V2 Architecture

## Core Decision

**V2 should be a modular monolith plus worker processes, not a microservice split.**

Why:
- the product is still changing fast
- most failures are domain modeling and module-boundary problems, not network topology problems
- a microservice split would increase operational cost before domain seams are stable

### Target shape

```text
apps/web
  -> UI + route handlers only

packages/api
  -> transport adapters only (tRPC procedures, auth boundary, DTO mapping)

packages/application
  -> use cases / command handlers / query handlers

packages/domain-people
packages/domain-projects
packages/domain-demand
packages/domain-scheduling
packages/domain-calendar
packages/domain-notifications
packages/domain-ai
  -> pure domain logic and policies

packages/infrastructure
  -> Prisma repos, Redis pub/sub, job queue, mail, AI clients

workers/agents
  -> async processors consuming outbox events and jobs
```

The key change is: **routers stop containing business workflows**. They become thin.

---

## Data Model Changes For V2

## 1. Split demand from assignment

Replace the current overloaded `Allocation` concept with:

- `DemandRequirement`
  - projectId
  - roleId
  - requiredSkills
  - date range
  - hoursPerDay
  - headcount
  - priority
  - status

- `Assignment`
  - demandRequirementId (nullable during migration)
  - resourceId
  - projectId
  - date range
  - hoursPerDay
  - cost snapshot
  - status

- `AssignmentChange` or `AssignmentRevision`
  - audit-friendly timeline history
  - supports undo/redo and reasoning

This removes:
- nullable resource meaning two different business states
- headcount logic from real assignments
- placeholder branching across the whole codebase
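
As a rough illustration of the split, the two entities could be typed as below. The field names follow the lists above; the status values, the ISO-string date fields, and the `openHeadcount` helper are assumptions for this sketch rather than a proposed final schema.

```typescript
// Demand: what a project needs, independent of who fills it.
interface DemandRequirement {
  id: string;
  projectId: string;
  roleId: string;
  requiredSkills: string[];
  startDate: string; // ISO date, assumed representation of "date range"
  endDate: string;
  hoursPerDay: number;
  headcount: number;
  priority: number;
  status: "OPEN" | "FILLED" | "CANCELLED"; // assumed values
}

// Assignment: a real person on a real project. resourceId is never null here.
interface Assignment {
  id: string;
  demandRequirementId: string | null; // nullable during migration
  resourceId: string;
  projectId: string;
  startDate: string;
  endDate: string;
  hoursPerDay: number;
  costSnapshotCents: number;
  status: "PROPOSED" | "CONFIRMED" | "CANCELLED"; // assumed values
}

// Open headcount is derived from the two entities instead of being stored on one row.
function openHeadcount(demand: DemandRequirement, assignments: Assignment[]): number {
  const filledBy = new Set(
    assignments
      .filter((a) => a.demandRequirementId === demand.id && a.status !== "CANCELLED")
      .map((a) => a.resourceId),
  );
  return Math.max(0, demand.headcount - filledBy.size);
}
```

With this shape, "unfilled demand" is a query over two tables rather than a special meaning of `resourceId = null`.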

## 2. Normalize the skill model

Today `Resource.skills` is JSONB. For v2, use:
- `Skill`
- `ResourceSkill`
- optional `RoleSkillProfile`

Keep JSONB only for imported raw skill matrix payloads if needed.

Benefits:
- real filtering
- better analytics
- reusable recommendation features
- explainable ranking
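
A small sketch of what "real filtering" could look like once skills are rows instead of a JSONB blob. The `level` column (a 1 to 5 proficiency) and the `resourcesMatching` helper are assumptions for illustration; in practice the same filter would more likely be a SQL join on `ResourceSkill`.

```typescript
// Hypothetical normalized rows, one per (resource, skill) pair.
interface ResourceSkill {
  resourceId: string;
  skillId: string;
  level: number; // assumed 1-5 proficiency scale
}

interface SkillRequirement {
  skillId: string;
  minLevel: number;
}

// Returns ids of resources that meet every requirement, in stable sorted order.
function resourcesMatching(rows: ResourceSkill[], required: SkillRequirement[]): string[] {
  // Group rows into resourceId -> (skillId -> level).
  const byResource = new Map<string, Map<string, number>>();
  for (const row of rows) {
    const skills = byResource.get(row.resourceId) ?? new Map<string, number>();
    skills.set(row.skillId, row.level);
    byResource.set(row.resourceId, skills);
  }
  const matches: string[] = [];
  byResource.forEach((skills, resourceId) => {
    const meetsAll = required.every((r) => (skills.get(r.skillId) ?? 0) >= r.minLevel);
    if (meetsAll) matches.push(resourceId);
  });
  return matches.sort();
}
```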

## 3. Normalize calendar capacity

Today availability is template-like JSON plus vacation overlays. For v2:
- `AvailabilityTemplate`
- `ResourceAvailabilityOverride`
- `CalendarException`
- `PublicHolidayCalendar`

This lets the engine answer:
- “what is capacity on this exact date?”
- “why is this person unavailable?”
- “what changed after a vacation approval?”
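
The questions above can be answered by a single resolution function once the calendar is normalized. The following is a sketch under assumptions: the field names, the ISO date strings, and the precedence order (exception, then public holiday, then override, then weekly template) are illustrative choices, not decisions taken in this proposal.

```typescript
interface AvailabilityTemplate { weeklyHours: readonly number[] } // index 0 = Sunday
interface ResourceAvailabilityOverride { date: string; hours: number; reason: string }
interface CalendarException { date: string; reason: string } // e.g. approved vacation
interface CapacityAnswer {
  date: string;
  hours: number;
  source: "exception" | "holiday" | "override" | "template";
  reason: string;
}

// Resolves capacity for one ISO date and records why that value applies.
function capacityOn(
  date: string,
  template: AvailabilityTemplate,
  overrides: ResourceAvailabilityOverride[],
  exceptions: CalendarException[],
  holidays: string[],
): CapacityAnswer {
  const exception = exceptions.find((e) => e.date === date);
  if (exception) return { date, hours: 0, source: "exception", reason: exception.reason };
  if (holidays.includes(date)) return { date, hours: 0, source: "holiday", reason: "public holiday" };
  const override = overrides.find((o) => o.date === date);
  if (override) return { date, hours: override.hours, source: "override", reason: override.reason };
  const weekday = new Date(`${date}T00:00:00Z`).getUTCDay();
  const hours = template.weeklyHours[weekday] ?? 0;
  return { date, hours, source: "template", reason: "weekly template" };
}
```

Returning the `source` and `reason` alongside the hours is what lets the UI explain an unavailable day instead of only showing a zero.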

## 4. Keep blueprints, but narrow their role

Blueprints should remain for:
- custom fields
- UI configuration
- optional default demand templates

Blueprints should **not** continue to carry core planning state in JSONB.

## 5. Add an outbox

Introduce:
- `DomainEventOutbox`
- `Job`

Every important mutation writes:
- domain row changes
- audit row
- outbox event

in one transaction.

That is the foundation for safe parallel agents.
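
The all-or-nothing property can be shown with an in-memory stand-in for a database transaction. This is a sketch only: `Db`, `inTransaction`, and `createAssignment` are hypothetical names, and in the real system the same role would presumably be played by a Prisma transaction rather than array copies.

```typescript
interface AssignmentRow { id: string; resourceId: string }
interface AuditRow { entityId: string; action: string }
interface OutboxRow { type: string; payload: string; processedAt: Date | null }
interface Db { assignments: AssignmentRow[]; audit: AuditRow[]; outbox: OutboxRow[] }

// Runs `work` against copies and commits only if it returns without throwing.
function inTransaction<T>(db: Db, work: (tx: Db) => T): T {
  const tx: Db = {
    assignments: [...db.assignments],
    audit: [...db.audit],
    outbox: [...db.outbox],
  };
  const result = work(tx); // a throw here leaves `db` untouched
  db.assignments = tx.assignments;
  db.audit = tx.audit;
  db.outbox = tx.outbox;
  return result;
}

// One mutation writes the domain row, the audit row, and the outbox event together.
function createAssignment(db: Db, row: AssignmentRow): void {
  inTransaction(db, (tx) => {
    tx.assignments.push(row);
    tx.audit.push({ entityId: row.id, action: "ASSIGNMENT_CREATED" });
    tx.outbox.push({ type: "AssignmentCreated", payload: JSON.stringify(row), processedAt: null });
  });
}
```

Workers then poll rows where `processedAt` is null, so an event can never exist without its domain change, and a domain change can never be committed without its event.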

---

## Application Layer Design

Every important user action should map to a use case, for example:

- `CreateProject`
- `DefineDemand`
- `AssignResource`
- `MoveAssignment`
- `ApproveVacation`
- `ImportSkillMatrix`
- `RecomputeValueScore`
- `GenerateAiSummary`

Each use case should:
- load aggregates via repositories
- call pure domain policies
- persist through a transaction
- publish outbox events

Routers then become simple wrappers:
- validate input
- call use case
- map result to DTO

This is the main architectural upgrade missing today.
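
A minimal sketch of the `AssignResource` use case in this style. The port interfaces, the `exceedsDailyCapacity` policy, and the result type are hypothetical; the ports are synchronous here only to keep the example short, whereas real repositories would be async and wrapped in a transaction.

```typescript
interface AssignmentRecord {
  id: string;
  resourceId: string;
  startDate: string; // ISO dates compare correctly as strings
  endDate: string;
  hoursPerDay: number;
}

// Ports the use case depends on; infrastructure provides the implementations.
interface AssignmentRepository {
  findByResource(resourceId: string): AssignmentRecord[];
  save(record: AssignmentRecord): void;
}
interface OutboxPort { publish(type: string, payload: unknown): void }
interface AssignResourceDeps {
  repo: AssignmentRepository;
  outbox: OutboxPort;
  dailyCapacityHours: number;
}

// Pure domain policy with no I/O. Coarse check: sums every assignment that
// overlaps the candidate range, which can overestimate the busiest day.
function exceedsDailyCapacity(
  existing: AssignmentRecord[],
  candidate: AssignmentRecord,
  capacity: number,
): boolean {
  const booked = existing
    .filter((a) => a.startDate <= candidate.endDate && candidate.startDate <= a.endDate)
    .reduce((sum, a) => sum + a.hoursPerDay, 0);
  return booked + candidate.hoursPerDay > capacity;
}

type AssignResult = { ok: true; assignmentId: string } | { ok: false; reason: string };

// Use case: load, apply policy, persist, publish.
function assignResource(deps: AssignResourceDeps, input: AssignmentRecord): AssignResult {
  const existing = deps.repo.findByResource(input.resourceId);
  if (exceedsDailyCapacity(existing, input, deps.dailyCapacityHours)) {
    return { ok: false, reason: "OVERALLOCATED" };
  }
  deps.repo.save(input);
  deps.outbox.publish("AssignmentCreated", { assignmentId: input.id });
  return { ok: true, assignmentId: input.id };
}
```

A tRPC procedure built on this would only parse the input, call `assignResource`, and map the `AssignResult` to a response, which keeps the policy testable without a database or an HTTP layer.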

---

## Query Side Design

V2 should use a **CQRS-lite** pattern:

- commands go through application services
- heavy timeline/dashboard/staffing reads use query services or read models

Examples:
- `timeline_read_model`
- `resource_capacity_snapshot`
- `project_budget_snapshot`
- `staffing_candidate_snapshot`

These can start as SQL views/materialized views or dedicated query handlers. No need for a separate read database yet.

This is especially important because the timeline and dashboards are read-heavy and aggregate-heavy.

---

## Parallel Runtime Agents

These are the v2 agents I would actually build. They should run as worker processes consuming outbox events and job records.

## 1. Match Agent

Input:
- `DemandRequirementCreated`
- `DemandRequirementChanged`
- `ResourceSkillChanged`
- `CalendarChanged`

Output:
- ranked candidate snapshots
- recommendation explanations

Responsibility:
- candidate filtering
- deterministic scoring
- optional AI explanation layer after deterministic ranking
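
A sketch of what deterministic scoring with built-in explanations might look like. The 70/30 weighting between skill coverage and availability, the input shapes, and the `rankCandidates` name are all assumptions chosen for illustration; the point is that the same inputs always produce the same order and the same reasons.

```typescript
interface Candidate { resourceId: string; skillIds: string[]; freeHoursPerDay: number }
interface DemandInput { requiredSkillIds: string[]; hoursPerDay: number }
interface RankedCandidate { resourceId: string; score: number; reasons: string[] }

// Filters out unavailable candidates, scores the rest on a 0-100 scale,
// and sorts by score with resourceId as a stable tie-breaker.
function rankCandidates(demand: DemandInput, candidates: Candidate[]): RankedCandidate[] {
  const required = demand.requiredSkillIds;
  return candidates
    .filter((c) => c.freeHoursPerDay > 0)
    .map((c) => {
      const matched = required.filter((s) => c.skillIds.includes(s));
      const coverage = required.length === 0 ? 1 : matched.length / required.length;
      const availability =
        demand.hoursPerDay <= 0 ? 1 : Math.min(1, c.freeHoursPerDay / demand.hoursPerDay);
      // Assumed weights: 70% skill coverage, 30% availability.
      const score = Math.round(coverage * 70 + availability * 30);
      return {
        resourceId: c.resourceId,
        score,
        reasons: [
          `matches ${matched.length} of ${required.length} required skills`,
          `has ${c.freeHoursPerDay}h free of ${demand.hoursPerDay}h needed per day`,
        ],
      };
    })
    .sort((a, b) => b.score - a.score || a.resourceId.localeCompare(b.resourceId));
}
```

An AI explanation layer would then consume `reasons` as input and rephrase them, without being able to change `score` or the ordering.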

## 2. Conflict Agent

Input:
- `AssignmentCreated`
- `AssignmentChanged`
- `VacationApproved`
- `CalendarExceptionChanged`

Output:
- overallocation/conflict records
- blocked-demand warnings

Responsibility:
- recompute exact day-level conflicts
- explain why a conflict exists
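
Day-level conflict detection can be sketched as expanding each assignment into days and comparing booked hours with capacity per day. The record shapes and the `capacityFor` callback are assumptions; in the real agent the callback would come from the calendar domain rather than a constant.

```typescript
interface DayAssignment { resourceId: string; startDate: string; endDate: string; hoursPerDay: number }
interface Conflict { resourceId: string; date: string; bookedHours: number; capacityHours: number }

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Lists every ISO date in [start, end], both inclusive, using UTC to avoid DST drift.
function daysBetween(start: string, end: string): string[] {
  const days: string[] = [];
  const endMs = Date.parse(`${end}T00:00:00Z`);
  for (let t = Date.parse(`${start}T00:00:00Z`); t <= endMs; t += MS_PER_DAY) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

// Returns one conflict record per (resource, day) where bookings exceed capacity.
function findConflicts(
  assignments: DayAssignment[],
  capacityFor: (resourceId: string, date: string) => number,
): Conflict[] {
  const booked = new Map<string, { resourceId: string; date: string; hours: number }>();
  for (const a of assignments) {
    for (const date of daysBetween(a.startDate, a.endDate)) {
      const key = `${a.resourceId}|${date}`;
      const entry = booked.get(key) ?? { resourceId: a.resourceId, date, hours: 0 };
      entry.hours += a.hoursPerDay;
      booked.set(key, entry);
    }
  }
  const conflicts: Conflict[] = [];
  booked.forEach((entry) => {
    const capacityHours = capacityFor(entry.resourceId, entry.date);
    if (entry.hours > capacityHours) {
      conflicts.push({ resourceId: entry.resourceId, date: entry.date, bookedHours: entry.hours, capacityHours });
    }
  });
  return conflicts.sort((x, y) => (x.resourceId + x.date).localeCompare(y.resourceId + y.date));
}
```

Each record carries both the booked and the available hours, so the "why" of a conflict can be shown without recomputing anything in the UI.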

## 3. Budget Risk Agent

Input:
- assignment changes
- project budget changes
- project date changes

Output:
- burn snapshots
- over-budget warnings
- forecast deltas

Responsibility:
- separate financial forecasting from request/response latency
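
A sketch of the kind of calculation this agent would run off the request path, staying in integer cents as the codebase already does for money. The `CostedAssignment` shape (with a precomputed `workingDays` count) and the function names are assumptions for illustration.

```typescript
interface CostedAssignment {
  workingDays: number; // assumed to be precomputed from the calendar
  hoursPerDay: number;
  rateCentsPerHour: number;
}

interface BudgetSnapshot {
  burnCents: number;
  remainingCents: number;
  overBudget: boolean;
}

// Forecast burn in integer cents; each assignment is rounded once so the
// total stays an integer even with fractional hours per day.
function forecastBurnCents(assignments: CostedAssignment[]): number {
  return assignments.reduce(
    (sum, a) => sum + Math.round(a.workingDays * a.hoursPerDay * a.rateCentsPerHour),
    0,
  );
}

// Compares forecast burn with the project budget.
function budgetSnapshot(budgetCents: number, assignments: CostedAssignment[]): BudgetSnapshot {
  const burnCents = forecastBurnCents(assignments);
  const remainingCents = budgetCents - burnCents;
  return { burnCents, remainingCents, overBudget: remainingCents < 0 };
}
```

For example, a budget of 1,000,000 cents against assignments of 10 days at 8h and 10,000 cents/h plus 5 days at 4h and 15,000 cents/h gives a burn of 1,100,000 cents, so the snapshot reports the project as 100,000 cents over budget.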

## 4. Notification Agent

Input:
- all user-visible domain events

Output:
- in-app notifications
- email sends
- digest batches

Responsibility:
- centralize fan-out
- remove notification logic from feature routers

## 5. Import Agent

Input:
- uploaded Excel/CSV/HRIS files

Output:
- staged import rows
- validation results
- normalized upserts

Responsibility:
- make imports resumable and auditable

## 6. AI Agent

Input:
- explicit AI jobs only

Output:
- summaries
- staffing rationale
- project risk narratives

Responsibility:
- all model interaction happens asynchronously
- stores prompt/result metadata for traceability

Important rule: **AI never becomes the system of record.** It annotates deterministic outputs.

---

## Parallel Build Workstreams

If you want to execute v2 with parallel coding agents, use these lanes to avoid file collisions.

## Agent A: Core Model Refactor

Owns:
- `packages/db`
- `packages/shared`
- new domain packages

Tasks:
- introduce `DemandRequirement`
- introduce normalized skill/calendar models
- add outbox and job tables
- define new shared DTOs/events

## Agent B: Application Service Extraction

Owns:
- `packages/application` new package
- router-to-service extraction in `packages/api`

Tasks:
- move create/update/fill/approve workflows out of routers
- standardize transaction boundaries
- standardize audit + outbox emission

## Agent C: Timeline V2

Owns:
- `apps/web/src/components/timeline/*`
- timeline read models and UI contracts

Tasks:
- break `TimelineView` into screen shell + view model + row renderers
- move timeline state machine into dedicated hooks/store
- consume new query DTOs instead of raw Prisma-shaped payloads

## Agent D: Project Creation And Staffing UX

Owns:
- `apps/web/src/components/projects/*`
- staffing query DTO consumers

Tasks:
- split `ProjectWizard`
- convert wizard from local mega-state to step reducers / use cases
- integrate recommendation snapshots from Match Agent

## Agent E: Security, Platform, And Notifications

Owns:
- auth
- user management
- settings
- notification workflows

Tasks:
- unify password hashing
- close permission gaps
- move secret handling behind infrastructure services
- wire Notification Agent

This split keeps most workstreams independent.

---

## Migration Plan

## Phase 0: Stabilize The Current System

Do this before any architecture refactor:

1. Fix user creation to use Argon2.
2. Restrict `notification.create` to admin/system workflows.
3. Fix `testAiConnection` to truly support both providers.
4. Make `pnpm test:unit` and `pnpm typecheck` green again.
5. Remove remaining legacy `role`/`roleId` ambiguity where possible.

## Phase 1: Extract The Application Layer

Without changing the UI yet:
- add use-case services
- move router logic into them
- introduce outbox writes
- standardize domain events

This phase creates the seam for the rest of v2.

## Phase 2: Introduce New Core Tables With Dual Write

- create `DemandRequirement`, normalized skills, normalized calendar tables
- dual-write from old flows
- build migration scripts and backfills
- add compatibility query adapters

## Phase 3: Rebuild The Timeline And Wizard Against New Read Models

- timeline consumes query DTOs
- wizard consumes demand/assignment APIs
- staffing suggestions come from snapshots, not direct all-resource scans

## Phase 4: Turn On Parallel Agents

- Match Agent
- Conflict Agent
- Budget Risk Agent
- Notification Agent
- Import Agent
- AI Agent

## Phase 5: Optional Service Extraction

Only after the domain seams hold:
- extract workers into separate deployables if load justifies it
- keep the transactional core close to the DB

---

## Recommended Immediate Improvement Backlog

If I had to choose the highest-leverage next moves:

1. Fix auth, notification permissions, AI test path, and broken repo checks.
2. Create `packages/application` and move allocation/timeline/project workflows into it.
3. Introduce `DemandRequirement` and stop using placeholder allocations as a dual-purpose model.
4. Rebuild staffing suggestions around normalized skills + calendar-aware capacity.
5. Split timeline and project wizard around view-model boundaries, not just JSX extraction.

---

## Bottom Line

**V2 should not be “more features on the current shape.”**
It should be:

- a cleaner domain model
- a thinner API layer
- async agents for expensive side effects
- read models for planning screens
- normalized planning entities with JSONB reserved for extension points

That will make Planarchy better at the thing it claims to be: a planning system, not just a CRUD app with a timeline.