CapaKraken/research/v2-architecture-proposal-2026-03-11.md
Hartmut cd78f72f33 chore: full technical rename planarchy → capakraken
Complete rename of all technical identifiers across the codebase:

Package names (11 packages):
- @planarchy/* → @capakraken/* in all package.json, tsconfig, imports

Import statements: 277 files, 548 occurrences replaced

Database & Docker:
- PostgreSQL user/db: planarchy → capakraken
- Docker volumes: planarchy_pgdata → capakraken_pgdata
- Connection strings updated in docker-compose, .env, CI

CI/CD:
- GitHub Actions workflow: all filter commands updated
- Test database credentials updated

Infrastructure:
- Redis channel: planarchy:sse → capakraken:sse
- Logger service name: planarchy-api → capakraken-api
- Anonymization seed updated
- Start/stop/restart scripts updated

Test data:
- Seed emails: @planarchy.dev → @capakraken.dev
- E2E test credentials: all 11 spec files updated
- Email defaults: @planarchy.app → @capakraken.app
- localStorage keys: planarchy_* → capakraken_*

Documentation: 30+ .md files updated

Verification:
- pnpm install: workspace resolution works
- TypeScript: only pre-existing TS2589 (no new errors)
- Engine: 310/310 tests pass
- Staffing: 37/37 tests pass

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-27 13:18:09 +01:00

# Planarchy V2 Architecture Proposal
**Date:** 2026-03-11
**Scope:** Codebase review, v2 direction, architecture rethink, parallel agent strategy
## Executive Summary
Planarchy already has a good base:
- monorepo boundaries are mostly clean
- `engine` and `staffing` contain useful pure domain logic
- Next.js + tRPC + Prisma keeps product iteration fast
- Redis-backed SSE is already a reasonable realtime baseline
The main issue is not the stack. The issue is that domain logic is split across:
- large client components
- large tRPC routers
- JSONB-heavy persistence models
- ad-hoc calculations in handlers
My recommendation for **v2** is:
1. **Do not jump to microservices yet.**
2. **Do move to a modular monolith with a real application layer and async workers.**
3. **Split “planning demand” from “actual assignments” at the data model level.**
4. **Keep JSONB only for extensibility, not for core planning workflows.**
5. **Introduce event/outbox-driven parallel agents for matching, conflicts, budget risk, notifications, and AI work.**
This gives you a v2 that is safer, easier to change, and still realistic for a small team.
---
## What The Codebase Does Well
- Domain packages are separated from the web app.
- Shared types and schemas reduce transport mismatch.
- Money is stored in integer cents.
- The app stays operationally simple: one app, one DB, one Redis.
- The timeline already has virtualization and SSE hooks, which means the product is past prototype stage.
---
## Current Pain Points
### 1. Critical correctness and security issues exist today
#### Auth hashing is inconsistent
- Login verifies Argon2 hashes in [`apps/web/src/server/auth.ts#L20`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/server/auth.ts#L20).
- Admin-created users are still stored with SHA-256 in [`packages/api/src/router/user.ts#L41`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/user.ts#L41).
- Impact: users created from the admin flow are likely unable to log in.
#### Notification creation is open to any authenticated user
- `notification.create` is only `protectedProcedure` in [`packages/api/src/router/notification.ts#L66`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/notification.ts#L66).
- Impact: any logged-in user can create notifications for arbitrary users.
#### AI connection testing is Azure-shaped even when provider is OpenAI
- `testAiConnection` always constructs an Azure deployment URL in [`packages/api/src/router/settings.ts#L122`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/settings.ts#L122).
- Impact: provider abstraction is not actually reliable.
#### Repo health checks are currently failing
- `pnpm test:unit` fails because `@capakraken/shared` has a Vitest script but no tests in [`packages/shared/package.json`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/package.json).
- `pnpm typecheck` fails because `crypto.randomUUID()` is used without a visible import/global typing in [`packages/shared/src/schemas/project.schema.ts#L5`](/home/hartmut/Documents/Copilot/planarchy/packages/shared/src/schemas/project.schema.ts#L5).
These are not “v2 someday” items. They should be fixed before deeper refactoring.
### 2. Large surfaces are carrying too much responsibility
The biggest modules are already a warning sign:
- [`apps/web/src/components/timeline/TimelineView.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/timeline/TimelineView.tsx) is 1720 lines.
- [`apps/web/src/components/projects/ProjectWizard.tsx`](/home/hartmut/Documents/Copilot/planarchy/apps/web/src/components/projects/ProjectWizard.tsx) is 1171 lines.
- [`packages/api/src/router/resource.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/resource.ts) is 908 lines.
- [`packages/api/src/router/timeline.ts`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts) is 631 lines.
That usually means:
- transport, orchestration, validation, business rules, and data access are mixed
- testing becomes expensive
- one change touches too many concerns
### 3. The core planning model is overloaded
The Prisma schema uses JSONB heavily in core workflows:
- blueprints and role presets in [`packages/db/prisma/schema.prisma#L147`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L147)
- resource availability, skills, and dynamic fields in [`packages/db/prisma/schema.prisma#L208`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L208)
- project staffing requirements and dynamic fields in [`packages/db/prisma/schema.prisma#L267`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L267)
- allocation metadata in [`packages/db/prisma/schema.prisma#L301`](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L301)
The bigger modeling problem is that **`Allocation` currently represents both demand and assignment**:
- placeholder demand is modeled with `resourceId = null`
- headcount is stored on the same entity
- legacy `role` text and `roleId` coexist
This is the wrong aggregate for v2.
### 4. Staffing logic is not yet trustworthy enough to become a differentiator
`staffing.getSuggestions` currently:
- loads all active resources with overlapping allocations
- computes utilization in the router
- uses only Monday availability as the denominator in [`packages/api/src/router/staffing.ts#L45`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/staffing.ts#L45)
That means the suggestion layer is:
- hard to scale
- not consistent with calendar-aware engine logic
- not a strong base for “AI-assisted staffing”
### 5. Routers are doing application-service work
Representative examples:
- timeline queries and update workflows live directly in [`packages/api/src/router/timeline.ts#L12`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/timeline.ts#L12)
- allocation creation, placeholder fill, validation, vacation handling, cost calc, audit log, and event emission all live in [`packages/api/src/router/allocation.ts#L8`](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/allocation.ts#L8)
The pure `engine` package exists, but the application layer that should orchestrate it does not.
---
## Recommended V2 Architecture
### Core Decision
**V2 should be a modular monolith plus worker processes, not a microservice split.**
Why:
- the product is still changing fast
- most failures are domain modeling and module-boundary problems, not network topology problems
- a microservice split would increase operational cost before domain seams are stable
#### Target shape
```text
apps/web
-> UI + route handlers only
packages/api
-> transport adapters only (tRPC procedures, auth boundary, DTO mapping)
packages/application
-> use cases / command handlers / query handlers
packages/domain-people
packages/domain-projects
packages/domain-demand
packages/domain-scheduling
packages/domain-calendar
packages/domain-notifications
packages/domain-ai
-> pure domain logic and policies
packages/infrastructure
-> Prisma repos, Redis pub/sub, job queue, mail, AI clients
workers/agents
-> async processors consuming outbox events and jobs
```
The key change is: **routers stop containing business workflows**. They become thin.
---
## Data Model Changes For V2
### 1. Split demand from assignment
Replace the current overloaded `Allocation` concept with:
- `DemandRequirement`
  - projectId
  - roleId
  - requiredSkills
  - date range
  - hoursPerDay
  - headcount
  - priority
  - status
- `Assignment`
  - demandRequirementId (nullable during migration)
  - resourceId
  - projectId
  - date range
  - hoursPerDay
  - cost snapshot
  - status
- `AssignmentChange` or `AssignmentRevision`
  - audit-friendly timeline history
  - supports undo/redo and reasoning
This removes:
- nullable resource meaning two different business states
- headcount logic from real assignments
- placeholder branching across the whole codebase
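A minimal TypeScript sketch of the split (only the entity names come from this proposal; the field names and the `remainingHeadcount` helper are illustrative):

```typescript
type DateRange = { start: string; end: string }; // ISO dates

interface DemandRequirement {
  id: string;
  projectId: string;
  roleId: string;
  requiredSkills: string[];
  range: DateRange;
  hoursPerDay: number;
  headcount: number; // how many people are still wanted
  priority: number;
  status: "open" | "partially_filled" | "filled" | "cancelled";
}

interface Assignment {
  id: string;
  demandRequirementId: string | null; // nullable only during migration
  resourceId: string;                 // never null: an assignment always has a person
  projectId: string;
  range: DateRange;
  hoursPerDay: number;
  costCentsSnapshot: number;          // money stays in integer cents
  status: "planned" | "confirmed" | "cancelled";
}

// The invariant the old Allocation model could not express cleanly:
// demand carries headcount, an assignment carries exactly one resource.
function remainingHeadcount(demand: DemandRequirement, assignments: Assignment[]): number {
  const filled = assignments.filter(
    (a) => a.demandRequirementId === demand.id && a.status !== "cancelled"
  ).length;
  return Math.max(0, demand.headcount - filled);
}
```

With this shape, "placeholder demand" is just a `DemandRequirement` with `remainingHeadcount > 0`, not a null `resourceId`.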
### 2. Normalize the skill model
Today `Resource.skills` is JSONB. For v2, use:
- `Skill`
- `ResourceSkill`
- optional `RoleSkillProfile`
Keep JSONB only for imported raw skill matrix payloads if needed.
Benefits:
- real filtering
- better analytics
- reusable recommendation features
- explainable ranking
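To make the filtering benefit concrete, here is an illustrative TypeScript sketch of the query that is awkward against JSONB but trivial against normalized rows (`Skill`/`ResourceSkill` come from the list above; field names are assumptions):

```typescript
interface Skill { id: string; name: string }
interface ResourceSkill { resourceId: string; skillId: string; level: 1 | 2 | 3 | 4 | 5 }

// "Which resources have every required skill at or above a minimum level?"
// Against normalized rows this is a join + group; sketched in memory here.
function matchingResources(
  rows: ResourceSkill[],
  requiredSkillIds: string[],
  minLevel: number
): string[] {
  const byResource = new Map<string, Map<string, number>>();
  for (const r of rows) {
    const skills = byResource.get(r.resourceId) ?? new Map<string, number>();
    skills.set(r.skillId, r.level);
    byResource.set(r.resourceId, skills);
  }
  return [...byResource.entries()]
    .filter(([, skills]) => requiredSkillIds.every((id) => (skills.get(id) ?? 0) >= minLevel))
    .map(([resourceId]) => resourceId);
}
```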
### 3. Normalize calendar capacity
Today availability is template-like JSON plus vacation overlays. For v2:
- `AvailabilityTemplate`
- `ResourceAvailabilityOverride`
- `CalendarException`
- `PublicHolidayCalendar`
This lets the engine answer:
- “what is capacity on this exact date?”
- “why is this person unavailable?”
- “what changed after a vacation approval?”
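A sketch of how the engine could answer the first two questions by layering the proposed tables, template first, then overrides, then exceptions (table names are from the list above; field names and precedence order are assumptions):

```typescript
interface AvailabilityTemplate { hoursByWeekday: number[] } // index 0 = Sunday
interface ResourceAvailabilityOverride { date: string; hours: number }
interface CalendarException { date: string; reason: "vacation" | "holiday" | "sick" }

// Resolve capacity for one resource on one exact date. Returning the
// winning layer also answers "why is this person unavailable?".
function capacityOn(
  date: string,
  template: AvailabilityTemplate,
  overrides: ResourceAvailabilityOverride[],
  exceptions: CalendarException[]
): { hours: number; reason: string } {
  const exception = exceptions.find((e) => e.date === date);
  if (exception) return { hours: 0, reason: exception.reason };
  const override = overrides.find((o) => o.date === date);
  if (override) return { hours: override.hours, reason: "override" };
  const weekday = new Date(date + "T00:00:00Z").getUTCDay();
  return { hours: template.hoursByWeekday[weekday], reason: "template" };
}
```

Note how this directly replaces the "only Monday availability" denominator bug called out earlier: every date gets its own weekday slot.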
### 4. Keep blueprints, but narrow their role
Blueprints should remain for:
- custom fields
- UI configuration
- optional default demand templates
Blueprints should **not** continue to carry core planning state in JSONB.
### 5. Add an outbox
Introduce:
- `DomainEventOutbox`
- `Job`
Every important mutation writes:
- domain row changes
- audit row
- outbox event
in one transaction.
That is the foundation for safe parallel agents.
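A minimal in-memory sketch of the "three writes, one transaction" rule. In the real system this would be a single `prisma.$transaction`; the unit-of-work stand-in here only illustrates the invariant, either all three rows land or none do (all names are illustrative):

```typescript
interface OutboxEvent { type: string; payload: unknown; publishedAt: Date | null }

class UnitOfWork {
  rows: unknown[] = [];
  audit: string[] = [];
  outbox: OutboxEvent[] = [];

  // Run a mutation atomically: on any throw, restore the pre-mutation state.
  commit(mutation: (uow: UnitOfWork) => void): boolean {
    const snapshot = { rows: [...this.rows], audit: [...this.audit], outbox: [...this.outbox] };
    try {
      mutation(this);
      return true;
    } catch {
      this.rows = snapshot.rows;
      this.audit = snapshot.audit;
      this.outbox = snapshot.outbox;
      return false;
    }
  }
}

// An example mutation: domain row + audit row + outbox event, together.
function assignResource(uow: UnitOfWork, resourceId: string, demandId: string): void {
  uow.rows.push({ resourceId, demandId });
  uow.audit.push(`assigned ${resourceId} to ${demandId}`);
  uow.outbox.push({ type: "AssignmentCreated", payload: { resourceId, demandId }, publishedAt: null });
}
```

A worker can then poll unpublished outbox rows and mark `publishedAt` after delivery, which is what makes the agents below safe to run asynchronously.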
---
## Application Layer Design
Every important user action should map to a use case, for example:
- `CreateProject`
- `DefineDemand`
- `AssignResource`
- `MoveAssignment`
- `ApproveVacation`
- `ImportSkillMatrix`
- `RecomputeValueScore`
- `GenerateAiSummary`
Each use case should:
- load aggregates via repositories
- call pure domain policies
- persist through a transaction
- publish outbox events
Routers then become simple wrappers:
- validate input
- call use case
- map result to DTO
This is the main architectural upgrade missing today.
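The layering can be sketched in a few lines of TypeScript. The "procedure" here is a stand-in for a tRPC procedure, and the use-case, repository, and event names are illustrative, not the current codebase:

```typescript
interface AssignResourceInput { demandId: string; resourceId: string }
interface AssignmentDto { id: string; demandId: string; resourceId: string }

// Dependencies the use case needs: persistence and event publication.
interface Deps {
  save(a: AssignmentDto): void;
  publish(event: { type: string; payload: unknown }): void;
}

// Application layer: owns the workflow, knows nothing about transport.
function assignResourceUseCase(deps: Deps, input: AssignResourceInput): AssignmentDto {
  const assignment: AssignmentDto = {
    id: `asg_${input.demandId}_${input.resourceId}`, // illustrative id scheme
    demandId: input.demandId,
    resourceId: input.resourceId,
  };
  deps.save(assignment);
  deps.publish({ type: "AssignmentCreated", payload: assignment });
  return assignment;
}

// Transport layer: validate input, call the use case, map the result.
function assignResourceProcedure(deps: Deps, raw: unknown): AssignmentDto {
  const input = raw as AssignResourceInput;
  if (!input?.demandId || !input?.resourceId) throw new Error("invalid input");
  return assignResourceUseCase(deps, input);
}
```

The use case is now testable with in-memory `Deps`, without tRPC, Prisma, or HTTP in the test.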
---
## Query Side Design
V2 should use a **CQRS-lite** pattern:
- commands go through application services
- heavy timeline/dashboard/staffing reads use query services or read models
Examples:
- `timeline_read_model`
- `resource_capacity_snapshot`
- `project_budget_snapshot`
- `staffing_candidate_snapshot`
These can start as SQL views/materialized views or dedicated query handlers. No need for a separate read database yet.
This is especially important because the timeline and dashboards are read-heavy and aggregate-heavy.
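As an illustration, a query handler over the `resource_capacity_snapshot` idea might reduce a utilization read to a filter and two sums over precomputed rows (snapshot shape and field names are assumptions):

```typescript
// One precomputed row per resource per day, written by a worker,
// read by dashboards and the timeline without per-request aggregation.
interface CapacitySnapshot {
  resourceId: string;
  date: string;          // ISO date
  capacityHours: number; // calendar-aware, from the engine
  assignedHours: number; // sum of assignments on that day
}

function utilizationFor(
  snapshots: CapacitySnapshot[],
  resourceId: string,
  from: string,
  to: string
): number {
  const rows = snapshots.filter((s) => s.resourceId === resourceId && s.date >= from && s.date <= to);
  const capacity = rows.reduce((sum, r) => sum + r.capacityHours, 0);
  const assigned = rows.reduce((sum, r) => sum + r.assignedHours, 0);
  return capacity === 0 ? 0 : assigned / capacity;
}
```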
---
## Parallel Runtime Agents
These are the v2 agents I would actually build. They should run as worker processes consuming outbox events and job records.
### 1. Match Agent
Input:
- `DemandRequirementCreated`
- `DemandRequirementChanged`
- `ResourceSkillChanged`
- `CalendarChanged`
Output:
- ranked candidate snapshots
- recommendation explanations
Responsibility:
- candidate filtering
- deterministic scoring
- optional AI explanation layer after deterministic ranking
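A sketch of what "deterministic scoring with explanations" could look like: a weighted sum whose per-factor breakdown is kept, so the recommendation explanation falls out for free (weights and factor names are illustrative assumptions):

```typescript
interface Candidate {
  resourceId: string;
  skillCoverage: number;     // 0..1, share of required skills met
  freeCapacityRatio: number; // 0..1, free hours / required hours
  rateFit: number;           // 0..1, how well the rate fits the budget
}
interface Scored { resourceId: string; score: number; explanation: Record<string, number> }

const WEIGHTS = { skillCoverage: 0.5, freeCapacityRatio: 0.3, rateFit: 0.2 };

function rankCandidates(candidates: Candidate[]): Scored[] {
  return candidates
    .map((c) => {
      const explanation = {
        skillCoverage: WEIGHTS.skillCoverage * c.skillCoverage,
        freeCapacityRatio: WEIGHTS.freeCapacityRatio * c.freeCapacityRatio,
        rateFit: WEIGHTS.rateFit * c.rateFit,
      };
      const score = explanation.skillCoverage + explanation.freeCapacityRatio + explanation.rateFit;
      return { resourceId: c.resourceId, score, explanation };
    })
    .sort((a, b) => b.score - a.score);
}
```

An AI layer can then narrate `explanation` instead of inventing the ranking, which keeps the ranking reproducible.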
### 2. Conflict Agent
Input:
- `AssignmentCreated`
- `AssignmentChanged`
- `VacationApproved`
- `CalendarExceptionChanged`
Output:
- overallocation/conflict records
- blocked-demand warnings
Responsibility:
- recompute exact day-level conflicts
- explain why a conflict exists
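The day-level recomputation can be sketched as follows: expand assignments to per-day hours, sum against capacity, and keep the contributing assignment ids as the explanation (types and field names are assumptions):

```typescript
interface DayAssignment { assignmentId: string; date: string; hours: number }
interface Conflict {
  date: string;
  assignedHours: number;
  capacityHours: number;
  assignmentIds: string[]; // the "why": which assignments overbook this day
}

function findConflicts(
  assignments: DayAssignment[],
  capacityByDate: Record<string, number>
): Conflict[] {
  // Group assignment-days by date, then compare each day's total to capacity.
  const byDate = new Map<string, DayAssignment[]>();
  for (const a of assignments) {
    byDate.set(a.date, [...(byDate.get(a.date) ?? []), a]);
  }
  const conflicts: Conflict[] = [];
  for (const [date, rows] of byDate) {
    const assignedHours = rows.reduce((sum, r) => sum + r.hours, 0);
    const capacityHours = capacityByDate[date] ?? 0;
    if (assignedHours > capacityHours) {
      conflicts.push({ date, assignedHours, capacityHours, assignmentIds: rows.map((r) => r.assignmentId) });
    }
  }
  return conflicts;
}
```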
### 3. Budget Risk Agent
Input:
- assignment changes
- project budget changes
- project date changes
Output:
- burn snapshots
- over-budget warnings
- forecast deltas
Responsibility:
- separate financial forecasting from request/response latency
### 4. Notification Agent
Input:
- all user-visible domain events
Output:
- in-app notifications
- email sends
- digest batches
Responsibility:
- centralize fan-out
- remove notification logic from feature routers
### 5. Import Agent
Input:
- uploaded Excel/CSV/HRIS files
Output:
- staged import rows
- validation results
- normalized upserts
Responsibility:
- make imports resumable and auditable
### 6. AI Agent
Input:
- explicit AI jobs only
Output:
- summaries
- staffing rationale
- project risk narratives
Responsibility:
- all model interaction happens asynchronously
- stores prompt/result metadata for traceability
Important rule: **AI never becomes the system of record.** It annotates deterministic outputs.
---
## Parallel Build Workstreams
If you want to execute v2 with parallel coding agents, use these lanes to avoid file collisions.
### Agent A: Core Model Refactor
Owns:
- `packages/db`
- `packages/shared`
- new domain packages
Tasks:
- introduce `DemandRequirement`
- introduce normalized skill/calendar models
- add outbox and job tables
- define new shared DTOs/events
### Agent B: Application Service Extraction
Owns:
- `packages/application` (new package)
- router-to-service extraction in `packages/api`
Tasks:
- move create/update/fill/approve workflows out of routers
- standardize transaction boundaries
- standardize audit + outbox emission
### Agent C: Timeline V2
Owns:
- `apps/web/src/components/timeline/*`
- timeline read models and UI contracts
Tasks:
- break `TimelineView` into screen shell + view model + row renderers
- move timeline state machine into dedicated hooks/store
- consume new query DTOs instead of raw Prisma-shaped payloads
### Agent D: Project Creation And Staffing UX
Owns:
- `apps/web/src/components/projects/*`
- staffing query DTO consumers
Tasks:
- split `ProjectWizard`
- convert wizard from local mega-state to step reducers / use cases
- integrate recommendation snapshots from Match Agent
### Agent E: Security, Platform, And Notifications
Owns:
- auth
- user management
- settings
- notification workflows
Tasks:
- unify password hashing
- close permission gaps
- move secret handling behind infrastructure services
- wire Notification Agent
This split keeps most workstreams independent.
---
## Migration Plan
### Phase 0: Stabilize The Current System
Do this before any architecture refactor:
1. Fix user creation to use Argon2.
2. Restrict `notification.create` to admin/system workflows.
3. Fix `testAiConnection` to truly support both providers.
4. Make `pnpm test:unit` and `pnpm typecheck` green again.
5. Remove remaining legacy `role`/`roleId` ambiguity where possible.
### Phase 1: Extract The Application Layer
Without changing the UI yet:
- add use-case services
- move router logic into them
- introduce outbox writes
- standardize domain events
This phase creates the seam for the rest of v2.
### Phase 2: Introduce New Core Tables With Dual Write
- create `DemandRequirement`, normalized skills, normalized calendar tables
- dual-write from old flows
- build migration scripts and backfills
- add compatibility query adapters
### Phase 3: Rebuild The Timeline And Wizard Against New Read Models
- timeline consumes query DTOs
- wizard consumes demand/assignment APIs
- staffing suggestions come from snapshots, not direct all-resource scans
### Phase 4: Turn On Parallel Agents
- Match Agent
- Conflict Agent
- Budget Risk Agent
- Notification Agent
- Import Agent
- AI Agent
### Phase 5: Optional Service Extraction
Only after the domain seams hold:
- extract workers into separate deployables if load justifies it
- keep the transactional core close to the DB
---
## Recommended Immediate Improvement Backlog
If I had to choose the highest-leverage next moves:
1. Fix auth, notification permissions, AI test path, and broken repo checks.
2. Create `packages/application` and move allocation/timeline/project workflows into it.
3. Introduce `DemandRequirement` and stop using placeholder allocations as a dual-purpose model.
4. Rebuild staffing suggestions around normalized skills + calendar-aware capacity.
5. Split timeline and project wizard around view-model boundaries, not just JSX extraction.
---
## Bottom Line
**V2 should not be “more features on the current shape.”**
It should be:
- a cleaner domain model
- a thinner API layer
- async agents for expensive side effects
- read models for planning screens
- normalized planning entities with JSONB reserved for extension points
That will make Planarchy better at the thing it claims to be: a planning system, not just a CRUD app with a timeline.