CapaKraken V2 Architecture Proposal

Date: 2026-03-11
Scope: Codebase review, v2 direction, architecture rethink, parallel agent strategy

Executive Summary

CapaKraken already has a good base:

  • monorepo boundaries are mostly clean
  • engine and staffing contain useful pure domain logic
  • Next.js + tRPC + Prisma keeps product iteration fast
  • Redis-backed SSE is already a reasonable realtime baseline

The main issue is not the stack. The issue is that domain logic is split across:

  • large client components
  • large tRPC routers
  • JSONB-heavy persistence models
  • ad-hoc calculations in handlers

My recommendation for v2 is:

  1. Do not jump to microservices yet.
  2. Do move to a modular monolith with a real application layer and async workers.
  3. Split “planning demand” from “actual assignments” at the data model level.
  4. Keep JSONB only for extensibility, not for core planning workflows.
  5. Introduce event/outbox-driven parallel agents for matching, conflicts, budget risk, notifications, and AI work.

This gives you a v2 that is safer, easier to change, and still realistic for a small team.


What The Codebase Does Well

  • Domain packages are separated from the web app.
  • Shared types and schemas reduce transport mismatch.
  • Money is stored in integer cents.
  • The app stays operationally simple: one app, one DB, one Redis.
  • The timeline already has virtualization and SSE hooks, which means the product is past prototype stage.

Current Pain Points

1. Critical correctness and security issues exist today

  • Auth hashing is inconsistent
  • Notification creation is open to any authenticated user
  • AI connection testing is Azure-shaped even when the provider is OpenAI
  • Repo health checks are currently failing

These are not “v2 someday” items. They should be fixed before deeper refactoring.

2. Large surfaces are carrying too much responsibility

The biggest modules are already a warning sign.

That usually means:

  • transport, orchestration, validation, business rules, and data access are mixed
  • testing becomes expensive
  • one change touches too many concerns

3. The core planning model is overloaded

The Prisma schema uses JSONB heavily in core workflows.

The bigger modeling problem is that Allocation currently represents both demand and assignment:

  • placeholder demand is modeled with resourceId = null
  • headcount is stored on the same entity
  • legacy role text and roleId coexist

This is the wrong aggregate for v2.

4. Staffing logic is not yet trustworthy enough to become a differentiator

staffing.getSuggestions currently relies on direct all-resource scans and ad-hoc scoring in the request path.

That means the suggestion layer is:

  • hard to scale
  • not consistent with calendar-aware engine logic
  • not a strong base for “AI-assisted staffing”

5. Routers are doing application-service work

Representative examples appear throughout the larger routers.

The pure engine package exists, but the application layer that should orchestrate it does not.


Core Decision

V2 should be a modular monolith plus worker processes, not a microservice split.

Why:

  • the product is still changing fast
  • most failures are domain modeling and module-boundary problems, not network topology problems
  • a microservice split would increase operational cost before domain seams are stable

Target shape

apps/web
  -> UI + route handlers only

packages/api
  -> transport adapters only (tRPC procedures, auth boundary, DTO mapping)

packages/application
  -> use cases / command handlers / query handlers

packages/domain-people
packages/domain-projects
packages/domain-demand
packages/domain-scheduling
packages/domain-calendar
packages/domain-notifications
packages/domain-ai
  -> pure domain logic and policies

packages/infrastructure
  -> Prisma repos, Redis pub/sub, job queue, mail, AI clients

workers/agents
  -> async processors consuming outbox events and jobs

The key change is: routers stop containing business workflows. They become thin.


Data Model Changes For V2

1. Split demand from assignment

Replace the current overloaded Allocation concept with:

  • DemandRequirement

    • projectId
    • roleId
    • requiredSkills
    • date range
    • hoursPerDay
    • headcount
    • priority
    • status
  • Assignment

    • demandRequirementId nullable during migration
    • resourceId
    • projectId
    • date range
    • hoursPerDay
    • cost snapshot
    • status
  • AssignmentChange or AssignmentRevision

    • audit-friendly timeline history
    • supports undo/redo and reasoning

This removes:

  • nullable resource meaning two different business states
  • headcount logic from real assignments
  • placeholder branching across the whole codebase
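
The split above can be sketched as TypeScript types. Field names follow the proposal; the exact shapes (id types, status enums) and the `remainingHeadcount` helper are illustrative assumptions, not the final schema:

```typescript
type DateRange = { start: string; end: string }; // ISO dates

type DemandStatus = "open" | "partially_filled" | "filled" | "cancelled";

interface DemandRequirement {
  id: string;
  projectId: string;
  roleId: string;
  requiredSkills: string[]; // skill ids once the skill model is normalized
  range: DateRange;
  hoursPerDay: number;
  headcount: number; // headcount lives on demand only
  priority: number;
  status: DemandStatus;
}

interface Assignment {
  id: string;
  demandRequirementId: string | null; // nullable only during migration
  resourceId: string; // never null: no placeholder state on assignments
  projectId: string;
  range: DateRange;
  hoursPerDay: number;
  costPerHourCents: number; // cost snapshot, integer cents as today
  status: "planned" | "confirmed" | "cancelled";
}

// With headcount on the demand side, "how many seats are still open?"
// becomes a pure function instead of placeholder-allocation branching.
function remainingHeadcount(
  demand: DemandRequirement,
  assignments: Assignment[]
): number {
  const filled = assignments.filter(
    (a) => a.demandRequirementId === demand.id && a.status !== "cancelled"
  ).length;
  return Math.max(0, demand.headcount - filled);
}
```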

2. Normalize the skill model

Today Resource.skills is JSONB. For v2, use:

  • Skill
  • ResourceSkill
  • optional RoleSkillProfile

Keep JSONB only for imported raw skill matrix payloads if needed.

Benefits:

  • real filtering
  • better analytics
  • reusable recommendation features
  • explainable ranking
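
A minimal sketch of what "real filtering" buys, assuming the three entities above; the `level`/`minLevel` fields and the matcher are assumptions for illustration:

```typescript
interface Skill { id: string; name: string; }
interface ResourceSkill { resourceId: string; skillId: string; level: number; } // e.g. 1..5
interface RoleSkillProfile { roleId: string; skillId: string; minLevel: number; }

// With normalized rows instead of JSONB blobs, "who meets every required
// skill at the minimum level?" is a straightforward relational query.
function matchingResources(
  required: RoleSkillProfile[],
  resourceSkills: ResourceSkill[]
): string[] {
  const byResource = new Map<string, Map<string, number>>();
  for (const rs of resourceSkills) {
    const skills = byResource.get(rs.resourceId) ?? new Map<string, number>();
    skills.set(rs.skillId, rs.level);
    byResource.set(rs.resourceId, skills);
  }
  return [...byResource.entries()]
    .filter(([, skills]) =>
      required.every((r) => (skills.get(r.skillId) ?? 0) >= r.minLevel)
    )
    .map(([resourceId]) => resourceId);
}
```

In production this would be a SQL join on ResourceSkill rather than an in-memory scan, but the shape of the question stays the same.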

3. Normalize calendar capacity

Today availability is template-like JSON plus vacation overlays. For v2:

  • AvailabilityTemplate
  • ResourceAvailabilityOverride
  • CalendarException
  • PublicHolidayCalendar

This lets the engine answer:

  • “what is capacity on this exact date?”
  • “why is this person unavailable?”
  • “what changed after a vacation approval?”
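
The resolution order the engine needs can be sketched as a pure function, assuming exceptions beat overrides, which beat the template; the entity shapes are assumptions:

```typescript
interface AvailabilityTemplate { hoursPerWeekday: number[]; } // index 0 = Sunday
interface ResourceAvailabilityOverride { date: string; hours: number; reason: string; }
interface CalendarException { date: string; reason: string; } // holiday/vacation: 0 hours

// Answers both "what is capacity on this exact date?" and
// "why is this person unavailable?" in one call.
function capacityOn(
  date: string,
  template: AvailabilityTemplate,
  overrides: ResourceAvailabilityOverride[],
  exceptions: CalendarException[]
): { hours: number; reason: string } {
  const exception = exceptions.find((e) => e.date === date);
  if (exception) return { hours: 0, reason: exception.reason };
  const override = overrides.find((o) => o.date === date);
  if (override) return { hours: override.hours, reason: override.reason };
  const weekday = new Date(date + "T00:00:00Z").getUTCDay();
  return { hours: template.hoursPerWeekday[weekday], reason: "template" };
}
```

Because the inputs are plain rows, "what changed after a vacation approval?" is a diff of `capacityOn` results before and after the new CalendarException.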

4. Keep blueprints, but narrow their role

Blueprints should remain for:

  • custom fields
  • UI configuration
  • optional default demand templates

Blueprints should stop carrying core planning state in JSONB.

5. Add an outbox

Introduce:

  • DomainEventOutbox
  • Job

Every important mutation writes:

  • domain row changes
  • audit row
  • outbox event

in one transaction.

That is the foundation for safe parallel agents.
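
The "one transaction" rule can be sketched against a hypothetical transaction port; in production this port would wrap prisma.$transaction, and the table names here follow the proposal, not an existing schema:

```typescript
interface DomainEvent { type: string; payload: unknown; occurredAt: string; }

// Hypothetical transaction port; every insert below commits or rolls back together.
interface Tx {
  insert(
    table: "assignment" | "audit_log" | "domain_event_outbox",
    row: unknown
  ): void;
}

function assignResourceTx(tx: Tx, assignment: { id: string; resourceId: string }): void {
  tx.insert("assignment", assignment); // domain row change
  tx.insert("audit_log", { action: "AssignResource", entityId: assignment.id }); // audit row
  const event: DomainEvent = {
    type: "AssignmentCreated",
    payload: assignment,
    occurredAt: new Date().toISOString(),
  };
  tx.insert("domain_event_outbox", event); // outbox event, same transaction
}
```

Workers then poll DomainEventOutbox and mark rows processed, so agents never observe a mutation that did not commit.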


Application Layer Design

Every important user action should map to a use case, for example:

  • CreateProject
  • DefineDemand
  • AssignResource
  • MoveAssignment
  • ApproveVacation
  • ImportSkillMatrix
  • RecomputeValueScore
  • GenerateAiSummary

Each use case should:

  • load aggregates via repositories
  • call pure domain policies
  • persist through a transaction
  • publish outbox events

Routers then become simple wrappers:

  • validate input
  • call use case
  • map result to DTO

This is the main architectural upgrade missing today.
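
The use-case shape can be sketched for AssignResource. The ports, DTOs, and the id scheme are illustrative assumptions; the point is the division of labor, with the tRPC procedure reduced to validate input, call `execute`, and map the result:

```typescript
interface AssignResourceCommand { demandId: string; resourceId: string; }

// Ports the use case depends on; implemented in packages/infrastructure.
interface AssignmentRepo {
  isAvailable(resourceId: string, demandId: string): boolean;
  save(assignment: AssignResourceCommand & { id: string }): void;
}
interface EventPublisher {
  publish(type: string, payload: unknown): void;
}

class AssignResourceUseCase {
  constructor(
    private repo: AssignmentRepo,
    private events: EventPublisher
  ) {}

  execute(cmd: AssignResourceCommand): { id: string } {
    // Pure domain policy check before any persistence.
    if (!this.repo.isAvailable(cmd.resourceId, cmd.demandId)) {
      throw new Error("resource not available for this demand");
    }
    const id = `asg_${cmd.demandId}_${cmd.resourceId}`; // illustrative id scheme
    this.repo.save({ ...cmd, id }); // persist via repository, inside a transaction
    this.events.publish("AssignmentCreated", { id }); // outbox event
    return { id };
  }
}
```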


Query Side Design

V2 should use a CQRS-lite pattern:

  • commands go through application services
  • heavy timeline/dashboard/staffing reads use query services or read models

Examples:

  • timeline_read_model
  • resource_capacity_snapshot
  • project_budget_snapshot
  • staffing_candidate_snapshot

These can start as SQL views/materialized views or dedicated query handlers. No need for a separate read database yet.

This is especially important because the timeline and dashboards are read-heavy and aggregate-heavy.
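
As a sketch of one read model, a resource_capacity_snapshot row can be derived from assignment rows off the request path; the row shapes are assumptions, and the same aggregation could equally live in a SQL materialized view:

```typescript
interface AssignmentRow { resourceId: string; date: string; hours: number; }
interface CapacitySnapshotRow { resourceId: string; date: string; assignedHours: number; }

// Aggregate assigned hours per resource per day, so timeline and
// dashboard reads never re-sum raw assignments at request time.
function buildCapacitySnapshot(assignments: AssignmentRow[]): CapacitySnapshotRow[] {
  const acc = new Map<string, CapacitySnapshotRow>();
  for (const a of assignments) {
    const key = `${a.resourceId}|${a.date}`;
    const row = acc.get(key) ?? { resourceId: a.resourceId, date: a.date, assignedHours: 0 };
    row.assignedHours += a.hours;
    acc.set(key, row);
  }
  return [...acc.values()];
}
```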


Parallel Runtime Agents

These are the v2 agents I would actually build. They should run as worker processes consuming outbox events and job records.

1. Match Agent

Input:

  • DemandRequirementCreated
  • DemandRequirementChanged
  • ResourceSkillChanged
  • CalendarChanged

Output:

  • ranked candidate snapshots
  • recommendation explanations

Responsibility:

  • candidate filtering
  • deterministic scoring
  • optional AI explanation layer after deterministic ranking
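
Deterministic scoring with a built-in explanation might look like the sketch below; the two inputs and the 70/30 weighting are illustrative assumptions, and any AI narrative would be layered on top of this ranking, never replace it:

```typescript
interface Candidate {
  resourceId: string;
  skillMatch: number; // 0..1, from the normalized skill model
  freeHours: number;  // from calendar-aware capacity
}
interface RankedCandidate { resourceId: string; score: number; explanation: string; }

function rankCandidates(candidates: Candidate[], hoursNeeded: number): RankedCandidate[] {
  return candidates
    .map((c) => {
      const availability = Math.min(1, c.freeHours / hoursNeeded);
      // Deterministic, reproducible score; weights are illustrative.
      const score = 0.7 * c.skillMatch + 0.3 * availability;
      return {
        resourceId: c.resourceId,
        score,
        explanation:
          `skills ${(c.skillMatch * 100).toFixed(0)}%, ` +
          `availability ${(availability * 100).toFixed(0)}%`,
      };
    })
    .sort((a, b) => b.score - a.score);
}
```

The explanation string is what makes the snapshot useful in the UI: the ranking can be defended without invoking a model at all.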

2. Conflict Agent

Input:

  • AssignmentCreated
  • AssignmentChanged
  • VacationApproved
  • CalendarExceptionChanged

Output:

  • overallocation/conflict records
  • blocked-demand warnings

Responsibility:

  • recompute exact day-level conflicts
  • explain why a conflict exists
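
The day-level recompute can be sketched as a pure function; the shapes are assumptions, and capacity per day would come from the calendar read model described earlier:

```typescript
interface DayLoad { date: string; hours: number; } // one assignment's load on one day
interface Conflict {
  date: string;
  assignedHours: number;
  capacityHours: number;
  reason: string;
}

// Sum load per day and flag exact dates where assignments exceed capacity,
// with a reason string so the UI can explain why the conflict exists.
function findConflicts(loads: DayLoad[], capacity: Map<string, number>): Conflict[] {
  const perDay = new Map<string, number>();
  for (const l of loads) perDay.set(l.date, (perDay.get(l.date) ?? 0) + l.hours);

  const conflicts: Conflict[] = [];
  for (const [date, assignedHours] of perDay) {
    const capacityHours = capacity.get(date) ?? 0;
    if (assignedHours > capacityHours) {
      conflicts.push({
        date,
        assignedHours,
        capacityHours,
        reason: `assigned ${assignedHours}h exceeds capacity ${capacityHours}h`,
      });
    }
  }
  return conflicts;
}
```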

3. Budget Risk Agent

Input:

  • assignment changes
  • project budget changes
  • project date changes

Output:

  • burn snapshots
  • over-budget warnings
  • forecast deltas

Responsibility:

  • separate financial forecasting from request/response latency

4. Notification Agent

Input:

  • all user-visible domain events

Output:

  • in-app notifications
  • email sends
  • digest batches

Responsibility:

  • centralize fan-out
  • remove notification logic from feature routers

5. Import Agent

Input:

  • uploaded Excel/CSV/HRIS files

Output:

  • staged import rows
  • validation results
  • normalized upserts

Responsibility:

  • make imports resumable and auditable

6. AI Agent

Input:

  • explicit AI jobs only

Output:

  • summaries
  • staffing rationale
  • project risk narratives

Responsibility:

  • all model interaction happens asynchronously
  • stores prompt/result metadata for traceability

Important rule: AI never becomes the system of record. It annotates deterministic outputs.


Parallel Build Workstreams

If you want to execute v2 with parallel coding agents, use these lanes to avoid file collisions.

Agent A: Core Model Refactor

Owns:

  • packages/db
  • packages/shared
  • new domain packages

Tasks:

  • introduce DemandRequirement
  • introduce normalized skill/calendar models
  • add outbox and job tables
  • define new shared DTOs/events

Agent B: Application Service Extraction

Owns:

  • packages/application new package
  • router-to-service extraction in packages/api

Tasks:

  • move create/update/fill/approve workflows out of routers
  • standardize transaction boundaries
  • standardize audit + outbox emission

Agent C: Timeline V2

Owns:

  • apps/web/src/components/timeline/*
  • timeline read models and UI contracts

Tasks:

  • break TimelineView into screen shell + view model + row renderers
  • move timeline state machine into dedicated hooks/store
  • consume new query DTOs instead of raw Prisma-shaped payloads

Agent D: Project Creation And Staffing UX

Owns:

  • apps/web/src/components/projects/*
  • staffing query DTO consumers

Tasks:

  • split ProjectWizard
  • convert wizard from local mega-state to step reducers / use cases
  • integrate recommendation snapshots from Match Agent

Agent E: Security, Platform, And Notifications

Owns:

  • auth
  • user management
  • settings
  • notification workflows

Tasks:

  • unify password hashing
  • close permission gaps
  • move secret handling behind infrastructure services
  • wire Notification Agent

This split keeps most workstreams independent.


Migration Plan

Phase 0: Stabilize The Current System

Do this before any architecture refactor:

  1. Fix user creation to use Argon2.
  2. Restrict notification.create to admin/system workflows.
  3. Fix testAiConnection to truly support both providers.
  4. Make pnpm test:unit and pnpm typecheck green again.
  5. Remove remaining legacy role/roleId ambiguity where possible.

Phase 1: Extract The Application Layer

Without changing the UI yet:

  • add use-case services
  • move router logic into them
  • introduce outbox writes
  • standardize domain events

This phase creates the seam for the rest of v2.

Phase 2: Introduce New Core Tables With Dual Write

  • create DemandRequirement, normalized skills, normalized calendar tables
  • dual-write from old flows
  • build migration scripts and backfills
  • add compatibility query adapters

Phase 3: Rebuild The Timeline And Wizard Against New Read Models

  • timeline consumes query DTOs
  • wizard consumes demand/assignment APIs
  • staffing suggestions come from snapshots, not direct all-resource scans

Phase 4: Turn On Parallel Agents

  • Match Agent
  • Conflict Agent
  • Budget Risk Agent
  • Notification Agent
  • Import Agent
  • AI Agent

Phase 5: Optional Service Extraction

Only after the domain seams hold:

  • extract workers into separate deployables if load justifies it
  • keep the transactional core close to the DB

If I had to choose the highest-leverage next moves:

  1. Fix auth, notification permissions, AI test path, and broken repo checks.
  2. Create packages/application and move allocation/timeline/project workflows into it.
  3. Introduce DemandRequirement and stop using placeholder allocations as a dual-purpose model.
  4. Rebuild staffing suggestions around normalized skills + calendar-aware capacity.
  5. Split timeline and project wizard around view-model boundaries, not just JSX extraction.

Bottom Line

V2 should not be “more features on the current shape.”
It should be:

  • a cleaner domain model
  • a thinner API layer
  • async agents for expensive side effects
  • read models for planning screens
  • normalized planning entities with JSONB reserved for extension points

That will make CapaKraken better at the thing it claims to be: a planning system, not just a CRUD app with a timeline.