CapaKraken V2 Architecture Proposal
Date: 2026-03-11
Scope: Codebase review, v2 direction, architecture rethink, parallel agent strategy
Executive Summary
CapaKraken already has a good base:
- monorepo boundaries are mostly clean
- `engine` and `staffing` contain useful pure domain logic
- Next.js + tRPC + Prisma keeps product iteration fast
- Redis-backed SSE is already a reasonable realtime baseline
The main issue is not the stack. The issue is that domain logic is split across:
- large client components
- large tRPC routers
- JSONB-heavy persistence models
- ad-hoc calculations in handlers
My recommendation for v2 is:
- Do not jump to microservices yet.
- Do move to a modular monolith with a real application layer and async workers.
- Split “planning demand” from “actual assignments” at the data model level.
- Keep JSONB only for extensibility, not for core planning workflows.
- Introduce event/outbox-driven parallel agents for matching, conflicts, budget risk, notifications, and AI work.
This gives you a v2 that is safer, easier to change, and still realistic for a small team.
What The Codebase Does Well
- Domain packages are separated from the web app.
- Shared types and schemas reduce transport mismatch.
- Money is stored in integer cents.
- The app stays operationally simple: one app, one DB, one Redis.
- The timeline already has virtualization and SSE hooks, which means the product is past prototype stage.
Current Pain Points
1. Critical correctness and security issues exist today
Auth hashing is inconsistent
- Login verifies Argon2 hashes in `apps/web/src/server/auth.ts#L20`.
- Admin-created users are still stored with SHA-256 in `packages/api/src/router/user.ts#L41`.
- Impact: users created from the admin flow are likely unable to log in.
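The fix is a single shared hashing module that both the login path and the admin user-creation path import. A minimal sketch of that shape, using Node's built-in scrypt as a dependency-free stand-in for Argon2 (the module path and function names here are illustrative):

```typescript
// Hypothetical shared module, e.g. packages/auth/src/hash.ts.
// Sketch only: scrypt stands in for Argon2 so the example is self-contained;
// the real fix is that login and admin creation call the same argon2 helper.
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

export function hashPassword(password: string): string {
  const salt = randomBytes(16).toString("hex");
  const hash = scryptSync(password, salt, 32).toString("hex");
  return `${salt}:${hash}`; // store salt alongside the hash
}

export function verifyPassword(password: string, stored: string): boolean {
  const [salt, hash] = stored.split(":");
  const candidate = scryptSync(password, salt, 32).toString("hex");
  // constant-time comparison to avoid timing leaks
  return timingSafeEqual(Buffer.from(hash, "hex"), Buffer.from(candidate, "hex"));
}
```

Whatever algorithm is chosen, the point is that exactly one module owns it, so the two flows cannot drift apart again.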
Notification creation is open to any authenticated user
- `notification.create` is only `protectedProcedure` in `packages/api/src/router/notification.ts#L66`.
- Impact: any logged-in user can create notifications for arbitrary users.
AI connection testing is Azure-shaped even when provider is OpenAI
- `testAiConnection` always constructs an Azure deployment URL in `packages/api/src/router/settings.ts#L122`.
- Impact: provider abstraction is not actually reliable.
Repo health checks are currently failing
- `pnpm test:unit` fails because `@capakraken/shared` has a Vitest script but no tests in `packages/shared/package.json`.
- `pnpm typecheck` fails because `crypto.randomUUID()` is used without a visible import/global typing in `packages/shared/src/schemas/project.schema.ts#L5`.
These are not “v2 someday” items. They should be fixed before deeper refactoring.
2. Large surfaces are carrying too much responsibility
The biggest modules are already a warning sign:
- `apps/web/src/components/timeline/TimelineView.tsx` is 1720 lines.
- `apps/web/src/components/projects/ProjectWizard.tsx` is 1171 lines.
- `packages/api/src/router/resource.ts` is 908 lines.
- `packages/api/src/router/timeline.ts` is 631 lines.
That usually means:
- transport, orchestration, validation, business rules, and data access are mixed
- testing becomes expensive
- one change touches too many concerns
3. The core planning model is overloaded
The Prisma schema uses JSONB heavily in core workflows:
- blueprints and role presets in `packages/db/prisma/schema.prisma#L147`
- resource availability, skills, and dynamic fields in `packages/db/prisma/schema.prisma#L208`
- project staffing requirements and dynamic fields in `packages/db/prisma/schema.prisma#L267`
- allocation metadata in `packages/db/prisma/schema.prisma#L301`
The bigger modeling problem is that Allocation currently represents both demand and assignment:
- placeholder demand is modeled with `resourceId = null`
- headcount is stored on the same entity
- legacy `role` text and `roleId` coexist
This is the wrong aggregate for v2.
4. Staffing logic is not yet trustworthy enough to become a differentiator
`staffing.getSuggestions` currently:
- loads all active resources with overlapping allocations
- computes utilization in the router
- uses only Monday availability as the denominator in `packages/api/src/router/staffing.ts#L45`
That means the suggestion layer is:
- hard to scale
- not consistent with calendar-aware engine logic
- not a strong base for “AI-assisted staffing”
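The denominator bug is easy to see side by side. A sketch, assuming availability is keyed by weekday (the names `mondayOnlyCapacity` and `calendarAwareCapacity` are illustrative, not the actual engine API):

```typescript
// Contrast a Monday-only denominator with a calendar-aware one.
// `weekly` maps weekday index (0 = Sunday) to available hours.
type WeeklyHours = Record<number, number>;

// Buggy shape: assumes every workday looks like Monday.
export function mondayOnlyCapacity(weekly: WeeklyHours, workdays: Date[]): number {
  return workdays.length * (weekly[1] ?? 0);
}

// Calendar-aware shape: sums the actual availability of each date.
export function calendarAwareCapacity(weekly: WeeklyHours, workdays: Date[]): number {
  return workdays.reduce((sum, d) => sum + (weekly[d.getDay()] ?? 0), 0);
}
```

For a resource with half-day Fridays, the Monday-only version overstates weekly capacity and therefore understates utilization, which silently skews every suggestion ranking built on top of it.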
5. Routers are doing application-service work
Representative examples:
- timeline queries and update workflows live directly in `packages/api/src/router/timeline.ts#L12`
- allocation creation, placeholder fill, validation, vacation handling, cost calc, audit log, and event emission all live in `packages/api/src/router/allocation.ts#L8`
The pure engine package exists, but the application layer that should orchestrate it does not.
Recommended V2 Architecture
Core Decision
V2 should be a modular monolith plus worker processes, not a microservice split.
Why:
- the product is still changing fast
- most failures are domain modeling and module-boundary problems, not network topology problems
- a microservice split would increase operational cost before domain seams are stable
Target shape
apps/web
-> UI + route handlers only
packages/api
-> transport adapters only (tRPC procedures, auth boundary, DTO mapping)
packages/application
-> use cases / command handlers / query handlers
packages/domain-people
packages/domain-projects
packages/domain-demand
packages/domain-scheduling
packages/domain-calendar
packages/domain-notifications
packages/domain-ai
-> pure domain logic and policies
packages/infrastructure
-> Prisma repos, Redis pub/sub, job queue, mail, AI clients
workers/agents
-> async processors consuming outbox events and jobs
The key change is: routers stop containing business workflows. They become thin.
Data Model Changes For V2
1. Split demand from assignment
Replace the current overloaded Allocation concept with:
- `DemandRequirement`
  - projectId
  - roleId
  - requiredSkills
  - date range
  - hoursPerDay
  - headcount
  - priority
  - status
- `Assignment`
  - demandRequirementId (nullable during migration)
  - resourceId
  - projectId
  - date range
  - hoursPerDay
  - cost snapshot
  - status
- `AssignmentChange` or `AssignmentRevision`
  - audit-friendly timeline history
  - supports undo/redo and reasoning
This removes:
- nullable resource meaning two different business states
- headcount logic from real assignments
- placeholder branching across the whole codebase
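At the type level, the split eliminates the nullable-resource branching entirely. A hedged sketch of the two shapes (field names follow the lists above; the concrete types and enum values are illustrative):

```typescript
// Illustrative shapes for the demand/assignment split. A demand never has a
// resource; an assignment always does - no `resourceId: string | null` left.
type DateRange = { start: string; end: string };

export type DemandRequirement = {
  kind: "demand";
  projectId: string;
  roleId: string;
  requiredSkills: string[];
  range: DateRange;
  hoursPerDay: number;
  headcount: number; // headcount lives only on demand
  priority: "low" | "medium" | "high";
  status: "open" | "partially_filled" | "filled";
};

export type Assignment = {
  kind: "assignment";
  demandRequirementId: string | null; // nullable only during migration
  resourceId: string;                 // always present
  projectId: string;
  range: DateRange;
  hoursPerDay: number;
  costCentsSnapshot: number;          // money stays in integer cents
  status: "planned" | "confirmed" | "cancelled";
};

export function isDemand(x: DemandRequirement | Assignment): x is DemandRequirement {
  return x.kind === "demand";
}
```

Code that today checks `resourceId === null` to mean "placeholder" becomes an ordinary discriminated-union narrow, which the compiler can verify.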
2. Normalize the skill model
Today Resource.skills is JSONB. For v2, use:
- `Skill`
- `ResourceSkill`
- optional `RoleSkillProfile`
Keep JSONB only for imported raw skill matrix payloads if needed.
Benefits:
- real filtering
- better analytics
- reusable recommendation features
- explainable ranking
3. Normalize calendar capacity
Today availability is template-like JSON plus vacation overlays. For v2:
- `AvailabilityTemplate`
- `ResourceAvailabilityOverride`
- `CalendarException`
- `PublicHolidayCalendar`
This lets the engine answer:
- “what is capacity on this exact date?”
- “why is this person unavailable?”
- “what changed after a vacation approval?”
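Once those tables exist, date-exact capacity becomes a pure resolution function. A sketch, where the precedence order (exception over personal override over holiday over weekly template) is my assumption, not a decided rule:

```typescript
// Illustrative date-exact capacity resolution over the normalized tables.
// Assumed precedence: calendar exception > personal override > public holiday
// > weekly template. Returning the reason answers "why is this person
// unavailable?" for free.
type Capacity = { hours: number; reason: string };

export function capacityOn(
  dateIso: string,
  weekdayTemplateHours: Record<number, number>, // AvailabilityTemplate (0 = Sunday)
  overrides: Record<string, number>,            // ResourceAvailabilityOverride, by date
  exceptions: Record<string, number>,           // CalendarException, by date
  holidays: Set<string>,                        // PublicHolidayCalendar
): Capacity {
  if (dateIso in exceptions) return { hours: exceptions[dateIso], reason: "calendar exception" };
  if (dateIso in overrides) return { hours: overrides[dateIso], reason: "personal override" };
  if (holidays.has(dateIso)) return { hours: 0, reason: "public holiday" };
  const weekday = new Date(`${dateIso}T00:00:00Z`).getUTCDay();
  return { hours: weekdayTemplateHours[weekday] ?? 0, reason: "weekly template" };
}
```

"What changed after a vacation approval?" then reduces to diffing this function's output for the affected date range before and after the override lands.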
4. Keep blueprints, but narrow their role
Blueprints should remain for:
- custom fields
- UI configuration
- optional default demand templates
Blueprints should stop carrying core planning state in JSONB.
5. Add an outbox
Introduce:
- `DomainEventOutbox`
- `Job`
Every important mutation writes:
- domain row changes
- audit row
- outbox event
in one transaction.
That is the foundation for safe parallel agents.
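The invariant can be sketched in miniature. This is an in-memory stand-in for illustration only; the real implementation would wrap the same three writes in `prisma.$transaction`, and all names here are hypothetical:

```typescript
// In-memory sketch of the "one transaction" rule: domain row, audit row, and
// outbox event either all commit or none do.
type OutboxEvent = { type: string; payload: unknown; publishedAt: Date | null };

export class InMemoryDb {
  assignments: { id: string; resourceId: string }[] = [];
  auditLog: { action: string; entityId: string }[] = [];
  outbox: OutboxEvent[] = [];

  // Apply all writes together; roll everything back if any step throws.
  transaction(fn: (tx: InMemoryDb) => void): void {
    const snapshot = JSON.stringify([this.assignments, this.auditLog, this.outbox]);
    try {
      fn(this);
    } catch (err) {
      [this.assignments, this.auditLog, this.outbox] = JSON.parse(snapshot);
      throw err;
    }
  }
}

export function createAssignment(db: InMemoryDb, id: string, resourceId: string): void {
  db.transaction((tx) => {
    tx.assignments.push({ id, resourceId });                                           // domain row
    tx.auditLog.push({ action: "assignment.created", entityId: id });                  // audit row
    tx.outbox.push({ type: "AssignmentCreated", payload: { id }, publishedAt: null }); // outbox event
  });
}
```

Because the outbox row commits with the domain change, a worker that polls unpublished rows can never see an event for a mutation that did not happen, and never misses a mutation that did.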
Application Layer Design
Every important user action should map to a use case, for example:
- `CreateProject`
- `DefineDemand`
- `AssignResource`
- `MoveAssignment`
- `ApproveVacation`
- `ImportSkillMatrix`
- `RecomputeValueScore`
- `GenerateAiSummary`
Each use case should:
- load aggregates via repositories
- call pure domain policies
- persist through a transaction
- publish outbox events
Routers then become simple wrappers:
- validate input
- call use case
- map result to DTO
This is the main architectural upgrade missing today.
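The wrapper pattern is small enough to show end to end. A sketch, assuming a hypothetical `AssignResource` use case (every name here is illustrative; the real transport adapter would be a tRPC procedure with a zod schema):

```typescript
// Sketch of the thin-router pattern: the use case owns the workflow, the
// transport layer only validates, delegates, and maps to a DTO.
type AssignResourceInput = { demandId: string; resourceId: string };
type Assignment = { id: string; demandId: string; resourceId: string; status: "active" };

// Application layer.
export async function assignResource(input: AssignResourceInput): Promise<Assignment> {
  // Real version: load aggregates via repositories, run domain policies,
  // persist in one transaction, write audit + outbox rows.
  return { id: `a-${input.demandId}`, demandId: input.demandId, resourceId: input.resourceId, status: "active" };
}

// Transport layer: validate input, call use case, map result to DTO.
export async function assignResourceProcedure(raw: unknown) {
  const input = raw as AssignResourceInput; // real version: zod parse at the boundary
  if (!input?.demandId || !input?.resourceId) throw new Error("invalid input");
  const assignment = await assignResource(input);
  return { assignmentId: assignment.id, status: assignment.status };
}
```

Note what is absent from the procedure: no Prisma calls, no cost math, no event emission. Those all belong to the use case, which is what makes them testable without a transport stack.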
Query Side Design
V2 should use a CQRS-lite pattern:
- commands go through application services
- heavy timeline/dashboard/staffing reads use query services or read models
Examples:
- `timeline_read_model`
- `resource_capacity_snapshot`
- `project_budget_snapshot`
- `staffing_candidate_snapshot`
These can start as SQL views/materialized views or dedicated query handlers. No need for a separate read database yet.
This is especially important because the timeline and dashboards are read-heavy and aggregate-heavy.
Parallel Runtime Agents
These are the v2 agents I would actually build. They should run as worker processes consuming outbox events and job records.
1. Match Agent
Input:
- `DemandRequirementCreated`
- `DemandRequirementChanged`
- `ResourceSkillChanged`
- `CalendarChanged`
Output:
- ranked candidate snapshots
- recommendation explanations
Responsibility:
- candidate filtering
- deterministic scoring
- optional AI explanation layer after deterministic ranking
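"Deterministic scoring" means the ranking is a pure function of its inputs: hard-filter first, then score. A sketch with made-up weights and field names (the real scoring model is still to be designed):

```typescript
// Illustrative deterministic scoring for the Match Agent: filter on hard
// skill requirements, then rank on capacity fit with skill count as a
// tiebreaker. Weights here are arbitrary assumptions.
type Candidate = { id: string; skills: string[]; freeHours: number };

export function rankCandidates(
  requiredSkills: string[],
  neededHours: number,
  candidates: Candidate[],
): { id: string; score: number }[] {
  return candidates
    .filter((c) => requiredSkills.every((s) => c.skills.includes(s))) // hard filter
    .map((c) => ({
      id: c.id,
      // capacity fit dominates; breadth of skills breaks ties
      score: Math.min(c.freeHours / neededHours, 1) * 100 + c.skills.length,
    }))
    .sort((a, b) => b.score - a.score);
}
```

Because the score is reproducible, the optional AI layer can be limited to explaining an already-fixed ranking rather than producing one, which keeps recommendations auditable.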
2. Conflict Agent
Input:
- `AssignmentCreated`
- `AssignmentChanged`
- `VacationApproved`
- `CalendarExceptionChanged`
Output:
- overallocation/conflict records
- blocked-demand warnings
Responsibility:
- recompute exact day-level conflicts
- explain why a conflict exists
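The core of the recompute is a per-resource, per-day aggregation. A minimal sketch, assuming assignments have already been expanded to day granularity and capacity is a flat number (the real version would pull per-date capacity from the calendar model):

```typescript
// Illustrative day-level conflict check for the Conflict Agent: sum assigned
// hours per (resource, date) and flag days above capacity.
type DayAssignment = { resourceId: string; date: string; hours: number };

export function findOverallocations(
  assignments: DayAssignment[],
  capacityHours: number,
): { resourceId: string; date: string; assigned: number }[] {
  const totals = new Map<string, number>();
  for (const a of assignments) {
    const key = `${a.resourceId}|${a.date}`;
    totals.set(key, (totals.get(key) ?? 0) + a.hours);
  }
  return [...totals.entries()]
    .filter(([, assigned]) => assigned > capacityHours)
    .map(([key, assigned]) => {
      const [resourceId, date] = key.split("|");
      return { resourceId, date, assigned };
    });
}
```

Keeping the contributing assignments per conflict record (omitted here for brevity) is what makes "explain why a conflict exists" cheap to answer.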
3. Budget Risk Agent
Input:
- assignment changes
- project budget changes
- project date changes
Output:
- burn snapshots
- over-budget warnings
- forecast deltas
Responsibility:
- separate financial forecasting from request/response latency
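The burn snapshot itself is simple arithmetic over cost snapshots, which is exactly why it belongs in a worker rather than a request handler. A sketch in integer cents, matching how the codebase already stores money (the field and function names are illustrative):

```typescript
// Illustrative burn snapshot for the Budget Risk Agent, computed from
// assignment cost snapshots in integer cents.
type CostedAssignment = { costCentsSnapshot: number };

export function burnSnapshot(
  budgetCents: number,
  assignments: CostedAssignment[],
): { committedCents: number; remainingCents: number; overBudget: boolean } {
  const committedCents = assignments.reduce((s, a) => s + a.costCentsSnapshot, 0);
  return {
    committedCents,
    remainingCents: budgetCents - committedCents,
    overBudget: committedCents > budgetCents,
  };
}
```

Recomputing this asynchronously on every assignment or budget event means dashboards read a precomputed snapshot instead of aggregating allocations per request.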
4. Notification Agent
Input:
- all user-visible domain events
Output:
- in-app notifications
- email sends
- digest batches
Responsibility:
- centralize fan-out
- remove notification logic from feature routers
5. Import Agent
Input:
- uploaded Excel/CSV/HRIS files
Output:
- staged import rows
- validation results
- normalized upserts
Responsibility:
- make imports resumable and auditable
6. AI Agent
Input:
- explicit AI jobs only
Output:
- summaries
- staffing rationale
- project risk narratives
Responsibility:
- all model interaction happens asynchronously
- stores prompt/result metadata for traceability
Important rule: AI never becomes the system of record. It annotates deterministic outputs.
Parallel Build Workstreams
If you want to execute v2 with parallel coding agents, use these lanes to avoid file collisions.
Agent A: Core Model Refactor
Owns:
- `packages/db`
- `packages/shared`
- new domain packages
Tasks:
- introduce `DemandRequirement`
- introduce normalized skill/calendar models
- add outbox and job tables
- define new shared DTOs/events
Agent B: Application Service Extraction
Owns:
- `packages/application` (new package)
- router-to-service extraction in `packages/api`
Tasks:
- move create/update/fill/approve workflows out of routers
- standardize transaction boundaries
- standardize audit + outbox emission
Agent C: Timeline V2
Owns:
- `apps/web/src/components/timeline/*`
- timeline read models and UI contracts
Tasks:
- break `TimelineView` into screen shell + view model + row renderers
- move timeline state machine into dedicated hooks/store
- consume new query DTOs instead of raw Prisma-shaped payloads
Agent D: Project Creation And Staffing UX
Owns:
- `apps/web/src/components/projects/*`
- staffing query DTO consumers
Tasks:
- split `ProjectWizard`
- convert wizard from local mega-state to step reducers / use cases
- integrate recommendation snapshots from Match Agent
Agent E: Security, Platform, And Notifications
Owns:
- auth
- user management
- settings
- notification workflows
Tasks:
- unify password hashing
- close permission gaps
- move secret handling behind infrastructure services
- wire Notification Agent
This split keeps most workstreams independent.
Migration Plan
Phase 0: Stabilize The Current System
Do this before any architecture refactor:
- Fix user creation to use Argon2.
- Restrict `notification.create` to admin/system workflows.
- Fix `testAiConnection` to truly support both providers.
- Make `pnpm test:unit` and `pnpm typecheck` green again.
- Remove remaining legacy `role`/`roleId` ambiguity where possible.
Phase 1: Extract The Application Layer
Without changing the UI yet:
- add use-case services
- move router logic into them
- introduce outbox writes
- standardize domain events
This phase creates the seam for the rest of v2.
Phase 2: Introduce New Core Tables With Dual Write
- create `DemandRequirement`, normalized skill, and normalized calendar tables
- dual-write from old flows
- build migration scripts and backfills
- add compatibility query adapters
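The backfill hinges on one mapping: splitting each legacy Allocation row into the new shapes. A sketch, where the legacy fields mirror the pain points described earlier (nullable `resourceId` means placeholder demand) and the output types are deliberately simplified:

```typescript
// Illustrative backfill mapper for Phase 2. A placeholder allocation becomes
// pure demand; a real allocation becomes an assignment plus a filled demand
// of headcount 1. Field names on both sides are assumptions.
type LegacyAllocation = {
  id: string;
  projectId: string;
  resourceId: string | null; // null = placeholder demand
  roleId: string;
  headcount: number;
};

type MigratedDemand = { legacyId: string; projectId: string; roleId: string; headcount: number };
type MigratedAssignment = { legacyId: string; projectId: string; resourceId: string };

export function splitLegacyAllocation(
  a: LegacyAllocation,
): { demand?: MigratedDemand; assignment?: MigratedAssignment } {
  if (a.resourceId === null) {
    // placeholder -> pure demand, headcount carried over
    return { demand: { legacyId: a.id, projectId: a.projectId, roleId: a.roleId, headcount: a.headcount } };
  }
  // real assignment -> assignment row plus a filled demand of headcount 1
  return {
    demand: { legacyId: a.id, projectId: a.projectId, roleId: a.roleId, headcount: 1 },
    assignment: { legacyId: a.id, projectId: a.projectId, resourceId: a.resourceId },
  };
}
```

Keeping `legacyId` on both outputs makes the dual-write phase verifiable: any legacy row can be joined back to its migrated counterparts to check the backfill.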
Phase 3: Rebuild The Timeline And Wizard Against New Read Models
- timeline consumes query DTOs
- wizard consumes demand/assignment APIs
- staffing suggestions come from snapshots, not direct all-resource scans
Phase 4: Turn On Parallel Agents
- Match Agent
- Conflict Agent
- Budget Risk Agent
- Notification Agent
- Import Agent
- AI Agent
Phase 5: Optional Service Extraction
Only after the domain seams hold:
- extract workers into separate deployables if load justifies it
- keep the transactional core close to the DB
Recommended Immediate Improvement Backlog
If I had to choose the highest-leverage next moves:
- Fix auth, notification permissions, AI test path, and broken repo checks.
- Create `packages/application` and move allocation/timeline/project workflows into it.
- Introduce `DemandRequirement` and stop using placeholder allocations as a dual-purpose model.
- Rebuild staffing suggestions around normalized skills + calendar-aware capacity.
- Split timeline and project wizard around view-model boundaries, not just JSX extraction.
Bottom Line
V2 should not be “more features on the current shape.”
It should be:
- a cleaner domain model
- a thinner API layer
- async agents for expensive side effects
- read models for planning screens
- normalized planning entities with JSONB reserved for extension points
That will make CapaKraken better at the thing it claims to be: a planning system, not just a CRUD app with a timeline.