CapaKraken/plan.md
Hartmut 0d78fe1770 feat: Sprint 0 — CI/CD pipeline, production Docker, health checks
CI Pipeline (.github/workflows/ci.yml):
- 5 jobs: typecheck, lint, test, build, e2e (parallel where possible)
- PostgreSQL 16 + Redis 7 service containers for test/e2e
- pnpm store, Turborepo, Playwright browser caching
- Concurrency groups cancel in-progress runs

Production Docker:
- Dockerfile.prod: 3-stage build (deps → build → runtime ~150MB)
- docker-compose.prod.yml: postgres + redis + app with health checks
- .dockerignore for fast builds
- next.config.ts: output: "standalone" for minimal runtime

Health Check Endpoints:
- GET /api/health — liveness probe (200 OK, no deps)
- GET /api/ready — readiness probe (postgres + redis connectivity)

Documentation:
- docs/ci-cd-manual.md — full pipeline manual with troubleshooting
- plan.md — Product Owner strategic plan (bottlenecks, growth, automation)

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-19 20:33:18 +01:00


Planarchy — Product Owner Strategic Plan

Consolidated analysis from 4 expert agents: Roadmap, API Surface, Frontend UX, and Test Infrastructure. Date: 2026-03-19


Executive Summary

Planarchy has reached Phase 9 with a mature core: timeline planning, allocation management, estimating, vacation pro, skill matrix, RBAC, and chargeability reporting. The product covers 34 routes, 47 DB models, ~200 tRPC procedures, and 109+ domain components.

However, the product has critical gaps preventing production readiness and growth:

| Dimension | Score | Verdict |
| --- | --- | --- |
| Feature completeness | 85% | Strong core, thin edges (staffing, reporting) |
| Code quality | 90% | Zero TODOs, clean architecture, typed end-to-end |
| Test coverage | 55% | Engine excellent, API routers ~5%, no integration tests |
| CI/CD & DevOps | 10% | No pipeline, no prod Docker, no monitoring |
| UX polish | 75% | Deep timeline/estimates, but gaps in staffing workflow |
| Growth readiness | 40% | No scenario planning, no integrations, no mobile |

Part 1: Bottlenecks

1.1 Production Readiness Blockers (Critical)

| # | Bottleneck | Impact | Severity |
| --- | --- | --- | --- |
| B1 | No CI/CD pipeline: tests, lint, tsc not automated on PR | Regressions ship undetected | CRITICAL |
| B2 | No production Docker image: only dev Dockerfile exists | Cannot deploy containerized | CRITICAL |
| B3 | No monitoring/logging: no Sentry, no Pino, no APM | Blind in production, cannot debug | CRITICAL |
| B4 | No health check endpoints: /health, /ready missing | Cannot detect/recover from failures | HIGH |
| B5 | API router test coverage ~5%: 28 routers, almost no unit tests | Mutations untested at API boundary | HIGH |

1.2 UX Bottlenecks

| # | Bottleneck | Impact | Severity |
| --- | --- | --- | --- |
| B6 | Staffing -> Allocation gap: match results don't link to allocation creation | Users must manually recreate allocations after finding matches | HIGH |
| B7 | Reporting is thin: only 2 report types (chargeability, PDF allocations) | Finance/PMs can't self-serve custom reports | MEDIUM |
| B8 | No bulk operations in list views: no multi-select outside timeline | Slow to manage 10+ resources/projects at once | MEDIUM |
| B9 | Dashboard metrics computed live: no caching/pre-computation | Slow dashboard load with growing data | MEDIUM |
| B10 | Timeline 3.3K LOC ecosystem: ResourcePanel 1035, ProjectPanel 1315 LOC | Hard to maintain, risky to modify | LOW |

1.3 Architecture Bottlenecks

| # | Bottleneck | Impact | Severity |
| --- | --- | --- | --- |
| B11 | Prisma client cache invalidation: dev server restart required after schema changes | Developer friction, CI complexity | MEDIUM |
| B12 | No webhook/event outbound: SSE event bus exists but no external subscriptions | Cannot notify external systems (Slack, Jira) | MEDIUM |
| B13 | No soft-delete strategy: mixed approach (isActive, status, hard delete) | Data loss risk, no audit trail on deletions | LOW |
| B14 | Rate card lookup manual in estimates: no auto-lookup by resource chapter/level | Estimate creation slower than needed | LOW |
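
To make B14 concrete, the following is a minimal sketch of what an automatic rate card lookup could look like. The `RateCard` shape, the `findRate` name, and the rule "prefer a client-specific card, otherwise fall back to a default card" are all assumptions for illustration; the plan only states that the lookup should key on resource chapter and level (and, in A11, client).

```typescript
// Hypothetical sketch: the RateCard shape and the fallback rule are assumptions,
// not Planarchy's actual schema.
interface RateCard {
  chapter: string;
  level: string;
  clientId: string | null; // null marks a default card valid for any client
  lcr: number;
  ucr: number;
}

// Finds the card matching chapter + level, preferring a client-specific card
// and falling back to the default (clientId === null) card if one exists.
function findRate(
  cards: RateCard[],
  chapter: string,
  level: string,
  clientId: string,
): RateCard | undefined {
  const candidates = cards.filter((c) => c.chapter === chapter && c.level === level);
  return (
    candidates.find((c) => c.clientId === clientId) ??
    candidates.find((c) => c.clientId === null)
  );
}
```

Returning `undefined` when nothing matches keeps the manual lookup available as a fallback in the estimate form instead of silently filling in a wrong rate.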

Part 2: Growth Potential

2.1 High-Value Feature Opportunities

Tier 1 — Quick Wins (1-3 days each)

| # | Feature | Value | Effort |
| --- | --- | --- | --- |
| G1 | Staffing "Assign" button: pre-populate allocation modal from match result | Closes biggest UX gap, saves 5+ clicks per staffing decision | 1-2 days |
| G2 | Dashboard caching: pre-compute metrics, invalidate on SSE events | 3-5x dashboard load speed improvement | 1-2 days |
| G3 | Bulk list operations: multi-select + context menu on resources/projects | Enables batch edit, export, status change | 2-3 days |
| G4 | Health check endpoints: /api/health (liveness), /api/ready (DB + Redis) | Production deployment prerequisite | 0.5 day |

Tier 2 — Strategic Features (1-2 weeks each)

| # | Feature | Value | Effort |
| --- | --- | --- | --- |
| G5 | Scenario/What-If Planning: alternate staffing mixes, cost simulations | Differentiation for PMs and finance; leverages existing engine | 1-2 weeks |
| G6 | Skill Marketplace: searchable skill inventory, gap heat map, hiring priorities | High leverage from existing skill matrix; enables org-wide planning | 1 week |
| G7 | Custom Report Builder: drag columns, pivot, grouping, scheduled exports | Unlocks self-service analytics for finance and executives | 1-2 weeks |
| G8 | Collaboration Layer: inline comments on estimates, @mention, approval feedback | Enables cross-functional workflows (finance, PM, staffing) | 1-2 weeks |

Tier 3 — Market Differentiators (2-4 weeks each)

| # | Feature | Value | Effort |
| --- | --- | --- | --- |
| G9 | AI-Powered Insights: auto-suggest staffing, anomaly detection, narrative reports | Leverages existing Azure OpenAI integration; executive decision support | 2-3 weeks |
| G10 | External Integrations: Jira/Linear sync, Slack notifications, Google Calendar | Stickiness; connects Planarchy into existing workflows | 2-4 weeks |
| G11 | Mobile Companion: PWA with quick-view (status, gaps, approvals, push notifications) | Engagement for field PMs and remote staff | 3-4 weeks |
| G12 | Dispo V2 Clean-Slate Import: design doc + tickets exist, ready for implementation | Unblocks migration from legacy system; critical for customer onboarding | 1-2 weeks |

2.2 Missing Dashboard Widgets

| Widget | Purpose | Effort |
| --- | --- | --- |
| Budget spend forecast | Forward-looking actuals vs budget trend line | 2 days |
| Team utilization heatmap | Resource x week grid with color intensity | 2 days |
| Skill gap analysis | Required vs available skills across open demands | 3 days |
| Project health scorecard | On-time, on-budget, quality composite score | 2 days |
| Hiring pipeline | Forecast unfilled demand 3-6 months out | 3 days |

Part 3: Automation Potential

3.1 Development Workflow Automation

| # | Automation | Current State | Target | Effort |
| --- | --- | --- | --- | --- |
| A1 | CI/CD Pipeline | None | GitHub Actions: test + lint + tsc on PR, build + deploy on merge | 1-2 days |
| A2 | Dependency scanning | None | Dependabot + npm audit in CI | 0.5 day |
| A3 | E2E test suite expansion | 4 specs (auth, timeline, projects, resources) | 20+ specs covering key user flows | 1 week |
| A4 | API integration tests | ~5% router coverage | 80% coverage with mock DB layer | 1-2 weeks |
| A5 | Coverage gates | Engine 95%, staffing 90%, others none | All packages minimum 80% | 2 days config |

3.2 Business Process Automation

| # | Automation | Current Manual Process | Automated Process | Effort |
| --- | --- | --- | --- | --- |
| A6 | Auto-staffing suggestions | PM manually searches for resources per demand | System proposes top-3 matches when demand is created | 3 days |
| A7 | Vacation conflict alerts | Manager manually checks team calendar before approving | Auto-detect overlap > threshold, flag in approval flow | 2 days |
| A8 | Budget overrun notifications | Finance checks dashboards manually | SSE-triggered notification when project hits 80%/100% budget | 1 day |
| A9 | Estimate approval reminders | Verbal follow-up | Scheduled notification after N days in SUBMITTED status | 1 day |
| A10 | Chargeability alerts | Monthly manual review | Weekly auto-email when resource chargeability drops below target | 2 days |
| A11 | Rate card auto-apply | Manual rate lookup when creating estimate demand lines | Auto-fill LCR/UCR from rate card by resource chapter + level + client | 2 days |
| A12 | Public holiday auto-import | Admin manually batch-creates per year | Auto-generate on year rollover based on country/state config | 1 day |
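
As an illustration of A8, the sketch below shows one way to decide when a budget notification should fire. It assumes the notifier sees spend before and after each change so that an alert fires once, when spend crosses a threshold, and not on every later update. The `BudgetSnapshot` shape and the `crossedThresholds` name are hypothetical; only the 80% and 100% thresholds come from the plan.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not Planarchy's API.
const BUDGET_THRESHOLDS = [0.8, 1.0] as const;

interface BudgetSnapshot {
  budgetCents: number;
  previousSpendCents: number; // spend before the triggering change
  currentSpendCents: number; // spend after the triggering change
}

// Returns the thresholds crossed by this change: spend was below the
// threshold before and is at or above it now.
function crossedThresholds(s: BudgetSnapshot): number[] {
  if (s.budgetCents <= 0) return []; // no budget set, nothing to compare against
  const before = s.previousSpendCents / s.budgetCents;
  const after = s.currentSpendCents / s.budgetCents;
  return BUDGET_THRESHOLDS.filter((t) => before < t && after >= t);
}
```

A change that moves spend from 70% to 85% of budget yields `[0.8]`; a later change from 85% to 90% yields an empty list, so no duplicate notification is sent.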

3.3 Monitoring & Observability Automation

| # | Automation | Target | Effort |
| --- | --- | --- | --- |
| A13 | Structured logging (Pino) | All API requests logged with correlation ID | 2 days |
| A14 | Error tracking (Sentry) | Unhandled exceptions captured with context | 1 day |
| A15 | Performance monitoring | Slow query detection, API response time tracking | 2 days |
| A16 | Uptime monitoring | External health check probe, alerting | 0.5 day |
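
The idea behind A13 can be sketched without any dependency: every log line for a request carries the same correlation ID, so lines from one request can be grouped later. The code below is a plain-TypeScript stand-in, not Pino's API; `createRequestLogger` and the field names are assumptions for illustration.

```typescript
// Hypothetical sketch: a dependency-free stand-in for a per-request logger.
// This is NOT Pino's API; it only illustrates the correlation-ID idea.
type LogFields = Record<string, unknown>;

interface RequestLogger {
  info(msg: string, fields?: LogFields): string;
}

// Creates a logger that stamps every line with the same correlation ID.
// `write` is injectable so the output can be captured in tests.
function createRequestLogger(
  correlationId: string,
  write: (line: string) => void = (line) => console.log(line),
): RequestLogger {
  return {
    info(msg, fields = {}) {
      // Caller fields go first so they cannot overwrite the reserved keys.
      const line = JSON.stringify({ ...fields, level: "info", correlationId, msg });
      write(line);
      return line;
    },
  };
}
```

In an API layer, one such logger would typically be created per incoming request (for example in middleware) and passed down through the request context.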

Part 4: Prioritized Roadmap

Sprint 0: Production Foundation (Week 1)

Goal: Unblock production deployment.

  • A1 — GitHub Actions CI pipeline (test + lint + tsc + build)
  • G4 — Health check endpoints (/api/health, /api/ready)
  • A14 — Sentry error tracking integration
  • A13 — Pino structured logging in API layer
  • Production Dockerfile (multi-stage, distroless base)
  • docker-compose.prod.yml with env-based config
  • Database backup strategy (pg_dump cron + S3)

Acceptance: main branch has green CI, production image builds, errors are captured.
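
The difference between the two probes in G4 can be sketched as follows: liveness returns 200 without touching dependencies, while readiness runs one check per dependency and reports 503 if any of them fails. The `readiness` function, the `Check` type, and the response shape are assumptions for illustration, not the code in the repository; the checks are injected so the sketch runs without a database.

```typescript
// Hypothetical sketch: check names and the response shape are assumptions.
type Check = () => Promise<void>; // resolves when the dependency is reachable

interface ReadinessResult {
  status: 200 | 503;
  body: { ready: boolean; checks: Record<string, "ok" | "fail"> };
}

// Runs all dependency checks in parallel; a single failure marks the app not ready.
async function readiness(checks: Record<string, Check>): Promise<ReadinessResult> {
  const results: Record<string, "ok" | "fail"> = {};
  await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        await check();
        results[name] = "ok";
      } catch {
        results[name] = "fail";
      }
    }),
  );
  const ready = Object.values(results).every((r) => r === "ok");
  return { status: ready ? 200 : 503, body: { ready, checks: results } };
}
```

A route handler for /api/ready would pass in real checks (for example a trivial query against postgres and a ping against redis) and return `status` and `body` as the HTTP response. A production version would likely also put a timeout around each check so a hanging dependency cannot stall the probe.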

Sprint 1: Quick Wins (Week 2)

Goal: Close the biggest UX gaps and improve daily workflows.

  • G1 — Staffing "Assign" button (match -> allocation in 1 click)
  • G2 — Dashboard metric caching (Redis-backed, SSE-invalidated)
  • G3 — Bulk operations on resource/project lists
  • A8 — Budget overrun notifications (80% + 100% thresholds)
  • A9 — Estimate approval reminders (auto-notify after 3 days)

Acceptance: Staffing-to-allocation is 1 click, dashboard loads <500ms, bulk select works.
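
The caching pattern behind G2 is compute-on-miss plus explicit invalidation when an event signals that the underlying data changed. The sketch below uses an in-memory `Map` as a stand-in for Redis; the `MetricCache` class, its method names, and the TTL safety net are assumptions for illustration.

```typescript
// Hypothetical sketch: an in-memory stand-in for the Redis-backed cache.
class MetricCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or computes and stores it on a miss or after expiry.
  getOrCompute(key: string, compute: () => T): T {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = compute();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Intended to be called from an event-bus subscriber when data changes.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

A subscriber on the existing SSE event bus would call `invalidate` with the affected metric key when, for example, an allocation changes; the TTL only bounds staleness if an event is missed.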

Sprint 2: Test Coverage & Stability (Week 3)

Goal: Harden the codebase for confident iteration.

  • A4 — API router integration tests (target 15 most-used routers)
  • A5 — Coverage gates: api + application packages at 80%
  • A3 — E2E expansion: 10 new specs (estimate lifecycle, vacation flow, bulk ops, filters)
  • A2 — Dependabot + npm audit in CI

Acceptance: pnpm test:unit covers all routers, E2E suite runs in CI, zero high-severity vulnerabilities.

Sprint 3: Automation & Intelligence (Week 4-5)

Goal: Automate repetitive decisions, surface insights proactively.

  • A6 — Auto-staffing suggestions on demand creation
  • A7 — Vacation conflict detection in approval flow
  • A10 — Weekly chargeability alerts
  • A11 — Rate card auto-apply in estimate demand lines
  • A12 — Public holiday auto-import on year rollover
  • G6 — Skill marketplace MVP (searchable inventory + gap heat map)

Acceptance: Demands auto-suggest resources, vacation conflicts auto-flagged, rate cards auto-filled.
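
For A7, the core of conflict detection is an interval overlap test. The sketch below flags a vacation request when more than `maxConcurrent` teammates already have approved vacation overlapping any part of the requested period. The `Vacation` shape, the function names, and this reading of "overlap > threshold" are assumptions for illustration.

```typescript
// Hypothetical sketch: field names and the threshold semantics are assumptions.
interface Vacation {
  resourceId: string;
  start: string; // inclusive ISO date, e.g. "2026-07-06"
  end: string; // inclusive ISO date
}

// Two inclusive date ranges overlap when each starts on or before the other ends.
// ISO dates in YYYY-MM-DD form compare correctly as plain strings.
function overlaps(a: Vacation, b: Vacation): boolean {
  return a.start <= b.end && b.start <= a.end;
}

// Flags a request when the number of distinct teammates with approved vacation
// overlapping any part of the requested period exceeds `maxConcurrent`.
function hasConflict(request: Vacation, approved: Vacation[], maxConcurrent: number): boolean {
  const away = new Set(
    approved
      .filter((v) => v.resourceId !== request.resourceId && overlaps(v, request))
      .map((v) => v.resourceId),
  );
  return away.size > maxConcurrent;
}
```

This counts teammates away at any point during the request, which is stricter than counting the peak number away on a single day; a per-day count would be the refinement if the simple rule produces too many flags.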

Sprint 4: Strategic Features (Week 6-8)

Goal: Build differentiation features that create competitive moat.

  • G5 — Scenario/what-if planning (staffing mix simulator)
  • G7 — Custom report builder MVP (column picker, filters, export)
  • G8 — Collaboration layer (comments on estimates, @mention)
  • G12 — Dispo V2 clean-slate import (leverage existing design docs + tickets)
  • Dashboard new widgets: budget forecast, skill gap, project health scorecard

Acceptance: PMs can simulate staffing scenarios, finance can build custom reports, Dispo import onboards first customer.

Sprint 5: Market Expansion (Week 9-12)

Goal: Expand the platform beyond core planning.

  • G9 — AI insights: auto-staffing, anomaly detection, narrative summaries
  • G10 — Jira/Linear integration + Slack notifications
  • G11 — Mobile PWA companion
  • A15 — Performance monitoring + load testing baseline
  • Advanced: multi-tenant architecture planning

Acceptance: AI suggestions active, Jira sync live, mobile app installable.


Part 5: Risk Register

| # | Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- | --- |
| R1 | Deploying to production without CI lets regressions ship undetected | HIGH | CRITICAL | Sprint 0 is mandatory before any feature work |
| R2 | Timeline 3.3K LOC becomes unmaintainable | MEDIUM | HIGH | Decompose into sub-hook modules when next touching timeline |
| R3 | Dashboard performance degrades with data growth | MEDIUM | MEDIUM | G2 (caching) in Sprint 1; monitor query times |
| R4 | Prisma schema changes break dev workflow | HIGH | LOW | Automate restart in dev scripts (already documented) |
| R5 | Skill matrix AI costs grow with usage | LOW | MEDIUM | Add token budget tracking in SystemSettings |
| R6 | No data backup strategy | MEDIUM | CRITICAL | Add pg_dump cron + S3 upload in Sprint 0 |
| R7 | Single-point-of-failure (1 dev, 1 server) | HIGH | CRITICAL | Document architecture, automate deployment, enable team onboarding |

Part 6: Key Metrics to Track

Product Metrics

  • Time-to-staff: Minutes from demand creation to resource assignment
  • Estimate turnaround: Days from estimate creation to approval
  • Vacation approval latency: Hours from request to decision
  • Dashboard load time: P95 response time for dashboard page
  • Chargeability accuracy: Forecast vs actual deviation %

Engineering Metrics

  • Test coverage: % by package (target: all >=80%)
  • CI green rate: % of PRs passing all gates
  • Build time: Minutes for full next build
  • Error rate: Sentry exceptions per hour
  • API latency: P95 tRPC procedure response time
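
Several of the metrics above are P95 values. As a reference for how such a figure can be computed from raw samples, here is a minimal sketch using the nearest-rank method; other percentile definitions (for example with interpolation) give slightly different numbers, so the choice of method is an assumption.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, input untouched
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}
```

With ten latency samples, `percentile(samples, 95)` returns the largest one, which shows why P95 is only meaningful once enough samples have been collected per time window.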

Appendix: Current State Snapshot

| Dimension | Count |
| --- | --- |
| Database models | 47 |
| tRPC routers | 28 |
| tRPC procedures | ~200 (120Q + 80M) |
| Frontend routes | 34 |
| Domain components | 109+ |
| Shared UI components | 20+ |
| Unit test files | 62 |
| E2E test specs | 4 |
| Engine test coverage | 95% (gated) |
| Staffing test coverage | 90% (gated) |
| API router test coverage | ~5% (not gated) |
| CI/CD pipeline | None |
| Production Docker | None |
| Monitoring/APM | None |
| Completed phases | 9 |
| Known pain points | 24 (documented in LEARNINGS.md) |