Hartmut 0d78fe1770 feat: Sprint 0 — CI/CD pipeline, production Docker, health checks

CI Pipeline (.github/workflows/ci.yml):
- 5 jobs: typecheck, lint, test, build, e2e (parallel where possible)
- PostgreSQL 16 + Redis 7 service containers for test/e2e
- pnpm store, Turborepo, Playwright browser caching
- Concurrency groups cancel in-progress runs

Production Docker:
- Dockerfile.prod: 3-stage build (deps → build → runtime ~150MB)
- docker-compose.prod.yml: postgres + redis + app with health checks
- .dockerignore for fast builds
- next.config.ts: output: "standalone" for minimal runtime

Health Check Endpoints:
- GET /api/health — liveness probe (200 OK, no deps)
- GET /api/ready — readiness probe (postgres + redis connectivity)

Documentation:
- docs/ci-cd-manual.md — full pipeline manual with troubleshooting
- plan.md — Product Owner strategic plan (bottlenecks, growth, automation)

Co-Authored-By: claude-flow <ruv@ruv.net>

2026-03-19 20:33:18 +01:00

Planarchy — Product Owner Strategic Plan

Consolidated analysis from 4 expert agents: Roadmap, API Surface, Frontend UX, and Test Infrastructure.
Date: 2026-03-19

Executive Summary

Planarchy has reached Phase 9 with a mature core: timeline planning, allocation management, estimating, vacation pro, skill matrix, RBAC, and chargeability reporting. The product covers 34 routes, 47 DB models, ~200 tRPC procedures, and 109+ domain components.

However, the product has critical gaps preventing production readiness and growth:

Dimension	Score	Verdict
Feature completeness	85%	Strong core, thin edges (staffing, reporting)
Code quality	90%	Zero TODOs, clean architecture, typed end-to-end
Test coverage	55%	Engine excellent, API routers ~5%, no integration tests
CI/CD & DevOps	10%	No pipeline, no prod Docker, no monitoring
UX polish	75%	Deep timeline/estimates, but gaps in staffing workflow
Growth readiness	40%	No scenario planning, no integrations, no mobile

Part 1: Bottlenecks

1.1 Production Readiness Blockers (Critical)

#	Bottleneck	Impact	Severity
B1	No CI/CD pipeline — tests, lint, tsc not automated on PR	Regressions ship undetected	CRITICAL
B2	No production Docker image — only dev Dockerfile exists	Cannot deploy containerized	CRITICAL
B3	No monitoring/logging — no Sentry, no Pino, no APM	Blind in production, cannot debug	CRITICAL
B4	No health check endpoints — `/health`, `/ready` missing	Cannot detect/recover from failures	HIGH
B5	API router test coverage ~5% — 28 routers, almost no unit tests	Mutations untested at API boundary	HIGH

1.2 UX Bottlenecks

#	Bottleneck	Impact	Severity
B6	Staffing -> Allocation gap — match results don't link to allocation creation	Users must manually recreate allocations after finding matches	HIGH
B7	Reporting is thin — only 2 report types (chargeability, PDF allocations)	Finance/PMs can't self-serve custom reports	MEDIUM
B8	No bulk operations in list views — no multi-select outside timeline	Slow to manage 10+ resources/projects at once	MEDIUM
B9	Dashboard metrics computed live — no caching/pre-computation	Slow dashboard load with growing data	MEDIUM
B10	Timeline 3.3K LOC ecosystem — ResourcePanel 1035, ProjectPanel 1315 LOC	Hard to maintain, risky to modify	LOW

1.3 Architecture Bottlenecks

#	Bottleneck	Impact	Severity
B11	Prisma client cache invalidation — dev server restart required after schema changes	Developer friction, CI complexity	MEDIUM
B12	No webhook/event outbound — SSE event bus exists but no external subscriptions	Cannot notify external systems (Slack, Jira)	MEDIUM
B13	No soft-delete strategy — mixed approach (isActive, status, hard delete)	Data loss risk, no audit trail on deletions	LOW
B14	Rate card lookup manual in estimates — no auto-lookup by resource chapter/level	Estimate creation slower than needed	LOW
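
The auto-lookup that B14 asks for can be sketched as a pure function. This is a minimal illustration, not the project's code: the `RateCardEntry` shape, the `findRateCard` name, and the policy of preferring a client-specific rate over a default one are all assumptions.

```typescript
// Hypothetical shape: field names are illustrative, not the real DB models.
interface RateCardEntry {
  chapter: string;
  level: string;
  clientId?: string; // absent = default rate that applies to any client
  lcr: number;
  ucr: number;
}

// Look up a rate by chapter + level, optionally narrowed by client.
function findRateCard(
  entries: readonly RateCardEntry[],
  chapter: string,
  level: string,
  clientId?: string,
): RateCardEntry | undefined {
  const candidates = entries.filter(
    (entry) => entry.chapter === chapter && entry.level === level,
  );
  // Assumed policy: a client-specific entry wins over the default entry.
  const specific =
    clientId === undefined
      ? undefined
      : candidates.find((entry) => entry.clientId === clientId);
  return specific ?? candidates.find((entry) => entry.clientId === undefined);
}
```

Returning `undefined` when nothing matches leaves the caller free to fall back to the current manual entry.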

Part 2: Growth Potential

2.1 High-Value Feature Opportunities

Tier 1 — Quick Wins (1-3 days each)

#	Feature	Value	Effort
G1	Staffing "Assign" button — pre-populate allocation modal from match result	Closes biggest UX gap, saves 5+ clicks per staffing decision	1-2 days
G2	Dashboard caching — pre-compute metrics, invalidate on SSE events	3-5x dashboard load speed improvement	1-2 days
G3	Bulk list operations — multi-select + context menu on resources/projects	Enables batch edit, export, status change	2-3 days
G4	Health check endpoints — `/api/health` (liveness), `/api/ready` (DB + Redis)	Production deployment prerequisite	0.5 day
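
G2's "pre-compute, invalidate on SSE events" idea is sketched below under stated assumptions: the event names, the metric keys, and the `MetricCache` class are hypothetical, and the store is an in-memory `Map` rather than a real cache backend.

```typescript
// Assumed event names; the actual SSE event bus may use different ones.
type DomainEvent = "allocation.changed" | "project.changed" | "vacation.changed";

class MetricCache {
  private readonly values = new Map<string, unknown>();
  // Which cached metrics each event makes stale (illustrative mapping).
  private readonly staleOn: Record<DomainEvent, readonly string[]>;

  constructor(staleOn: Record<DomainEvent, readonly string[]>) {
    this.staleOn = staleOn;
  }

  // Return the cached value, computing and storing it on a miss.
  getOrCompute<T>(metric: string, compute: () => T): T {
    if (this.values.has(metric)) {
      return this.values.get(metric) as T;
    }
    const value = compute();
    this.values.set(metric, value);
    return value;
  }

  // Drop only the metrics that the given event invalidates.
  onEvent(event: DomainEvent): void {
    for (const metric of this.staleOn[event]) {
      this.values.delete(metric);
    }
  }
}
```

Because `compute` may return a `Promise`, the same class can cache in-flight database queries; a shared store would be needed once more than one app instance runs.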

Tier 2 — Strategic Features (1-2 weeks each)

#	Feature	Value	Effort
G5	Scenario/What-If Planning — alternate staffing mixes, cost simulations	Differentiation for PMs and finance; leverages existing engine	1-2 weeks
G6	Skill Marketplace — searchable skill inventory, gap heat map, hiring priorities	High leverage from existing skill matrix; enables org-wide planning	1 week
G7	Custom Report Builder — drag columns, pivot, grouping, scheduled exports	Unlocks self-service analytics for finance and executives	1-2 weeks
G8	Collaboration Layer — inline comments on estimates, @mention, approval feedback	Enables cross-functional workflows (finance, PM, staffing)	1-2 weeks

Tier 3 — Market Differentiators (2-4 weeks each)

#	Feature	Value	Effort
G9	AI-Powered Insights — auto-suggest staffing, anomaly detection, narrative reports	Leverages existing Azure OpenAI integration; executive decision support	2-3 weeks
G10	External Integrations — Jira/Linear sync, Slack notifications, Google Calendar	Stickiness; connects Planarchy into existing workflows	2-4 weeks
G11	Mobile Companion — PWA with quick-view (status, gaps, approvals, push notifications)	Engagement for field PMs and remote staff	3-4 weeks
G12	Dispo V2 Clean-Slate Import — design doc + tickets exist, ready for implementation	Unblocks migration from legacy system; critical for customer onboarding	1-2 weeks

2.2 Missing Dashboard Widgets

Widget	Purpose	Effort
Budget spend forecast	Forward-looking actuals vs budget trend line	2 days
Team utilization heatmap	Resource x week grid with color intensity	2 days
Skill gap analysis	Required vs available skills across open demands	3 days
Project health scorecard	On-time, on-budget, quality composite score	2 days
Hiring pipeline	Forecast unfilled demand 3-6 months out	3 days

Part 3: Automation Potential

3.1 Development Workflow Automation

#	Automation	Current State	Target	Effort
A1	CI/CD Pipeline	None	GitHub Actions: test + lint + tsc on PR, build + deploy on merge	1-2 days
A2	Dependency scanning	None	Dependabot + npm audit in CI	0.5 day
A3	E2E test suite expansion	4 specs (auth, timeline, projects, resources)	20+ specs covering key user flows	1 week
A4	API integration tests	~5% router coverage	80% coverage with mock DB layer	1-2 weeks
A5	Coverage gates	Engine 95%, staffing 90%, others none	All packages minimum 80%	2 days config

3.2 Business Process Automation

#	Automation	Current Manual Process	Automated Process	Effort
A6	Auto-staffing suggestions	PM manually searches for resources per demand	System proposes top-3 matches when demand is created	3 days
A7	Vacation conflict alerts	Manager manually checks team calendar before approving	Auto-detect overlap > threshold, flag in approval flow	2 days
A8	Budget overrun notifications	Finance checks dashboards manually	SSE-triggered notification when project hits 80%/100% budget	1 day
A9	Estimate approval reminders	Verbal follow-up	Scheduled notification after N days in SUBMITTED status	1 day
A10	Chargeability alerts	Monthly manual review	Weekly auto-email when resource chargeability drops below target	2 days
A11	Rate card auto-apply	Manual rate lookup when creating estimate demand lines	Auto-fill LCR/UCR from rate card by resource chapter + level + client	2 days
A12	Public holiday auto-import	Admin manually batch-creates per year	Auto-generate on year rollover based on country/state config	1 day
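
For A8, the key detail is notifying once when a threshold is crossed rather than on every update above it. A minimal sketch, assuming spend is tracked as a running total; the `crossedThresholds` name and its parameters are hypothetical.

```typescript
// Returns the thresholds (as fractions of budget) that were crossed
// between the previous and the current spend, so each fires only once.
function crossedThresholds(
  previousSpend: number,
  currentSpend: number,
  budget: number,
  thresholds: readonly number[] = [0.8, 1.0],
): number[] {
  if (budget <= 0) {
    return []; // no meaningful ratio without a positive budget
  }
  const before = previousSpend / budget;
  const after = currentSpend / budget;
  return thresholds.filter((t) => before < t && after >= t);
}
```

A project that jumps from 70% to 120% of budget in one update yields both thresholds, so the caller can decide whether to send one combined notification or two.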

3.3 Monitoring & Observability Automation

#	Automation	Target	Effort
A13	Structured logging (Pino)	All API requests logged with correlation ID	2 days
A14	Error tracking (Sentry)	Unhandled exceptions captured with context	1 day
A15	Performance monitoring	Slow query detection, API response time tracking	2 days
A16	Uptime monitoring	External health check probe, alerting	0.5 day
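
A13's target, every API request logged with a correlation ID, can be illustrated without any library. The sketch below is dependency-free and is not Pino's API; `createRequestLogger`, `Sink`, and the field names are assumptions made for the example.

```typescript
type LogFields = Record<string, unknown>;
type Sink = (line: string) => void; // e.g. a function that writes to stdout

interface RequestLogger {
  info(msg: string, fields?: LogFields): void;
  error(msg: string, fields?: LogFields): void;
}

// Creates a logger bound to one request; every line carries its correlation ID.
function createRequestLogger(correlationId: string, sink: Sink): RequestLogger {
  const write = (level: "info" | "error", msg: string, fields: LogFields = {}): void => {
    // Fixed keys come last so caller-supplied fields cannot overwrite them.
    sink(
      JSON.stringify({
        ...fields,
        level,
        time: new Date().toISOString(),
        correlationId,
        msg,
      }),
    );
  };
  return {
    info: (msg, fields) => write("info", msg, fields),
    error: (msg, fields) => write("error", msg, fields),
  };
}
```

With Pino, a child logger created through `logger.child({ correlationId })` would likely play the same role; the sketch only shows the shape of the output.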

Part 4: Prioritized Roadmap

Sprint 0: Production Foundation (Week 1)

Goal: Unblock production deployment.

A1 — GitHub Actions CI pipeline (test + lint + tsc + build)
G4 — Health check endpoints (/api/health, /api/ready)
A14 — Sentry error tracking integration
A13 — Pino structured logging in API layer
Production Dockerfile (multi-stage, distroless base)
docker-compose.prod.yml with env-based config
Database backup strategy (pg_dump cron + S3)

Acceptance: main branch has green CI, production image builds, errors are captured.
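
The difference between the two health endpoints is that liveness touches no dependencies, while readiness probes PostgreSQL and Redis. A minimal sketch of the readiness side follows; the `Probe` type, the function names, and the use of HTTP 503 for "not ready" are assumptions, and the real probes would wrap the actual database and Redis clients.

```typescript
// A probe resolves when the dependency answers and rejects otherwise.
type Probe = () => Promise<void>;

interface ReadinessReport {
  status: 200 | 503;
  checks: Record<string, "ok" | "fail">;
}

// Pure mapping from per-dependency results to an HTTP-style report.
function toReadinessReport(results: Record<string, boolean>): ReadinessReport {
  const checks: Record<string, "ok" | "fail"> = {};
  let allOk = true;
  for (const [name, ok] of Object.entries(results)) {
    checks[name] = ok ? "ok" : "fail";
    if (!ok) {
      allOk = false;
    }
  }
  return { status: allOk ? 200 : 503, checks };
}

// Runs all probes concurrently; a failing probe never rejects the whole check.
async function checkReadiness(probes: Record<string, Probe>): Promise<ReadinessReport> {
  const pairs = await Promise.all(
    Object.entries(probes).map(async ([name, probe]): Promise<[string, boolean]> => {
      try {
        await probe();
        return [name, true];
      } catch {
        return [name, false];
      }
    }),
  );
  const results: Record<string, boolean> = {};
  for (const [name, ok] of pairs) {
    results[name] = ok;
  }
  return toReadinessReport(results);
}
```

A route handler for `/api/ready` could call `checkReadiness` with one probe per dependency and return the report as JSON with the computed status code.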

Sprint 1: Quick Wins (Week 2)

Goal: Close the biggest UX gaps and improve daily workflows.

G1 — Staffing "Assign" button (match -> allocation in 1 click)
G2 — Dashboard metric caching (Redis-backed, SSE-invalidated)
G3 — Bulk operations on resource/project lists
A8 — Budget overrun notifications (80% + 100% thresholds)
A9 — Estimate approval reminders (auto-notify after 3 days)

Acceptance: Staffing-to-allocation is 1 click, dashboard loads <500ms, bulk select works.
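
The reminder rule in A9 (notify after 3 days in SUBMITTED status) reduces to a filter that a scheduled job could run. A minimal sketch: the `EstimateSummary` shape and the `dueForReminder` name are hypothetical, and only the `SUBMITTED` status comes from the plan itself.

```typescript
// Hypothetical projection of an estimate; the real model has more fields.
interface EstimateSummary {
  id: string;
  status: string; // the plan names "SUBMITTED"; other values are assumed
  submittedAt?: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns estimates that have waited in SUBMITTED for at least `afterDays`.
function dueForReminder(
  estimates: readonly EstimateSummary[],
  now: Date,
  afterDays = 3,
): EstimateSummary[] {
  return estimates.filter(
    (estimate) =>
      estimate.status === "SUBMITTED" &&
      estimate.submittedAt !== undefined &&
      now.getTime() - estimate.submittedAt.getTime() >= afterDays * DAY_MS,
  );
}
```

Passing `now` in explicitly keeps the function deterministic, which makes the rule easy to unit test.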

Sprint 2: Test Coverage & Stability (Week 3)

Goal: Harden the codebase for confident iteration.

A4 — API router integration tests (target 15 most-used routers)
A5 — Coverage gates: api + application packages at 80%
A3 — E2E expansion: 10 new specs (estimate lifecycle, vacation flow, bulk ops, filters)
A2 — Dependabot + npm audit in CI

Acceptance: `pnpm test:unit` covers all routers, the E2E suite runs in CI, and no high-severity vulnerabilities remain.

Sprint 3: Automation & Intelligence (Week 4-5)

Goal: Automate repetitive decisions, surface insights proactively.

A6 — Auto-staffing suggestions on demand creation
A7 — Vacation conflict detection in approval flow
A10 — Weekly chargeability alerts
A11 — Rate card auto-apply in estimate demand lines
A12 — Public holiday auto-import on year rollover
G6 — Skill marketplace MVP (searchable inventory + gap heat map)

Acceptance: Demands auto-suggest resources, vacation conflicts auto-flagged, rate cards auto-filled.
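
One possible reading of A7's "overlap > threshold" is: flag each requested day on which more than a given number of team members would be absent. The sketch below assumes that reading; the `Absence` shape, the `conflictingDays` name, and the inclusive date ranges are all hypothetical.

```typescript
// Inclusive date range in ISO format (YYYY-MM-DD).
interface Absence {
  resourceId: string;
  start: string;
  end: string;
}

// Returns the requested days on which total absences would exceed maxAbsent.
function conflictingDays(
  request: Absence,
  approved: readonly Absence[],
  maxAbsent: number,
): string[] {
  const dayMs = 24 * 60 * 60 * 1000;
  const conflicts: string[] = [];
  // Date-only ISO strings parse as UTC midnight, so stepping by 24h is safe.
  for (let t = Date.parse(request.start); t <= Date.parse(request.end); t += dayMs) {
    const day = new Date(t).toISOString().slice(0, 10);
    // ISO dates compare correctly as plain strings.
    const othersAbsent = approved.filter(
      (a) => a.resourceId !== request.resourceId && a.start <= day && day <= a.end,
    ).length;
    if (othersAbsent + 1 > maxAbsent) {
      conflicts.push(day);
    }
  }
  return conflicts;
}
```

An empty result means the request can be approved without a warning; a non-empty one gives the approver the exact days to review.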

Sprint 4: Strategic Features (Week 6-8)

Goal: Build differentiation features that create competitive moat.

G5 — Scenario/what-if planning (staffing mix simulator)
G7 — Custom report builder MVP (column picker, filters, export)
G8 — Collaboration layer (comments on estimates, @mention)
G12 — Dispo V2 clean-slate import (leverage existing design docs + tickets)
Dashboard new widgets: budget forecast, skill gap, project health scorecard

Acceptance: PMs can simulate staffing scenarios, finance can build custom reports, Dispo import onboards first customer.

Sprint 5: Market Expansion (Week 9-12)

Goal: Expand the platform beyond core planning.

G9 — AI insights: auto-staffing, anomaly detection, narrative summaries
G10 — Jira/Linear integration + Slack notifications
G11 — Mobile PWA companion
A15 — Performance monitoring + load testing baseline
Advanced: multi-tenant architecture planning

Acceptance: AI suggestions active, Jira sync live, mobile app installable.

Part 5: Risk Register

#	Risk	Probability	Impact	Mitigation
R1	Production deployment without CI lets regressions ship undetected	HIGH	CRITICAL	Sprint 0 is mandatory before any feature work
R2	Timeline 3.3K LOC becomes unmaintainable	MEDIUM	HIGH	Decompose into sub-hook modules when next touching timeline
R3	Dashboard performance degrades with data growth	MEDIUM	MEDIUM	G2 (caching) in Sprint 1; monitor query times
R4	Prisma schema changes break dev workflow	HIGH	LOW	Automate restart in dev scripts (already documented)
R5	Skill matrix AI costs grow with usage	LOW	MEDIUM	Add token budget tracking in SystemSettings
R6	No data backup strategy	MEDIUM	CRITICAL	Add pg_dump cron + S3 upload in Sprint 0
R7	Single-point-of-failure (1 dev, 1 server)	HIGH	CRITICAL	Document architecture, automate deployment, enable team onboarding

Part 6: Key Metrics to Track

Product Metrics

Time-to-staff: Minutes from demand creation to resource assignment
Estimate turnaround: Days from estimate creation to approval
Vacation approval latency: Hours from request to decision
Dashboard load time: P95 response time for dashboard page
Chargeability accuracy: Forecast vs actual deviation %

Engineering Metrics

Test coverage: % by package (target: all >=80%)
CI green rate: % of PRs passing all gates
Build time: Minutes for a full `next build`
Error rate: Sentry exceptions per hour
API latency: P95 tRPC procedure response time
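
Both latency metrics rely on a P95 value. A minimal sketch of one common definition, the nearest-rank percentile, is shown here; the `percentile` function is illustrative, and monitoring tools may use interpolation and report slightly different numbers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank free of rounding drift.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[rank - 1] as number;
}
```

With only a handful of samples the P95 collapses to the maximum, so the metric becomes meaningful only once enough requests have been recorded.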

Appendix: Current State Snapshot

Dimension	Count
Database models	47
tRPC routers	28
tRPC procedures	~200 (about 120 queries + 80 mutations)
Frontend routes	34
Domain components	109+
Shared UI components	20+
Unit test files	62
E2E test specs	4
Engine test coverage	95% (gated)
Staffing test coverage	90% (gated)
API router test coverage	~5% (not gated)
CI/CD pipeline	None
Production Docker	None
Monitoring/APM	None
Completed phases	9
Known pain points	24 (documented in LEARNINGS.md)
