CI Pipeline (.github/workflows/ci.yml):
- 5 jobs: typecheck, lint, test, build, e2e (parallel where possible)
- PostgreSQL 16 + Redis 7 service containers for test/e2e
- pnpm store, Turborepo, Playwright browser caching
- Concurrency groups cancel in-progress runs

Production Docker:
- Dockerfile.prod: 3-stage build (deps → build → runtime ~150MB)
- docker-compose.prod.yml: postgres + redis + app with health checks
- .dockerignore for fast builds
- next.config.ts: output: "standalone" for minimal runtime

Health Check Endpoints:
- GET /api/health: liveness probe (200 OK, no deps)
- GET /api/ready: readiness probe (postgres + redis connectivity)

Documentation:
- docs/ci-cd-manual.md: full pipeline manual with troubleshooting
- plan.md: Product Owner strategic plan (bottlenecks, growth, automation)

Co-Authored-By: claude-flow <ruv@ruv.net>
Planarchy — Product Owner Strategic Plan
Consolidated analysis from 4 expert agents: Roadmap, API Surface, Frontend UX, and Test Infrastructure. Date: 2026-03-19
Executive Summary
Planarchy has reached Phase 9 with a mature core: timeline planning, allocation management, estimating, vacation pro, skill matrix, RBAC, and chargeability reporting. The product covers 34 routes, 47 DB models, ~200 tRPC procedures, and 109+ domain components.
However, the product has critical gaps preventing production readiness and growth:
| Dimension | Score | Verdict |
|---|---|---|
| Feature completeness | 85% | Strong core, thin edges (staffing, reporting) |
| Code quality | 90% | Zero TODOs, clean architecture, typed end-to-end |
| Test coverage | 55% | Engine excellent, API routers ~5%, no integration tests |
| CI/CD & DevOps | 10% | No pipeline, no prod Docker, no monitoring |
| UX polish | 75% | Deep timeline/estimates, but gaps in staffing workflow |
| Growth readiness | 40% | No scenario planning, no integrations, no mobile |
Part 1: Bottlenecks
1.1 Production Readiness Blockers (Critical)
| # | Bottleneck | Impact | Severity |
|---|---|---|---|
| B1 | No CI/CD pipeline — tests, lint, tsc not automated on PR | Regressions ship undetected | CRITICAL |
| B2 | No production Docker image — only dev Dockerfile exists | Cannot deploy containerized | CRITICAL |
| B3 | No monitoring/logging — no Sentry, no Pino, no APM | Blind in production, cannot debug | CRITICAL |
| B4 | No health check endpoints — /health, /ready missing | Cannot detect/recover from failures | HIGH |
| B5 | API router test coverage ~5% — 28 routers, almost no unit tests | Mutations untested at API boundary | HIGH |
1.2 UX Bottlenecks
| # | Bottleneck | Impact | Severity |
|---|---|---|---|
| B6 | Staffing -> Allocation gap — match results don't link to allocation creation | Users must manually recreate allocations after finding matches | HIGH |
| B7 | Reporting is thin — only 2 report types (chargeability, PDF allocations) | Finance/PMs can't self-serve custom reports | MEDIUM |
| B8 | No bulk operations in list views — no multi-select outside timeline | Slow to manage 10+ resources/projects at once | MEDIUM |
| B9 | Dashboard metrics computed live — no caching/pre-computation | Slow dashboard load with growing data | MEDIUM |
| B10 | Timeline 3.3K LOC ecosystem — ResourcePanel 1035, ProjectPanel 1315 LOC | Hard to maintain, risky to modify | LOW |
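
B9's fix is G2: pre-compute dashboard metrics and invalidate them on SSE events, with Redis as the store per Sprint 1. The sketch below shows that cache-aside pattern under stated assumptions: an in-memory `Map` stands in for Redis, and the event names, metric keys and `DashboardCache` class are hypothetical rather than existing Planarchy code.

```typescript
// Hypothetical SSE event types; the real event bus may use different names.
type DomainEvent = "allocation.changed" | "project.changed" | "vacation.changed";

// Hypothetical mapping from event type to the cached metric keys it makes stale.
const INVALIDATES: Record<DomainEvent, readonly string[]> = {
  "allocation.changed": ["utilization", "chargeability"],
  "project.changed": ["budget", "utilization"],
  "vacation.changed": ["utilization"],
};

interface Entry {
  value: unknown;
  expiresAt: number; // epoch ms
}

// Cache-aside store for pre-computed dashboard metrics; a Map stands in for Redis.
class DashboardCache {
  private readonly store = new Map<string, Entry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Serve a fresh cached value, or compute it, store it and return it.
  async get<T>(key: string, compute: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value as T;
    const value = await compute();
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Wire this to the SSE event bus: drop every metric the event affects.
  onEvent(event: DomainEvent): void {
    for (const key of INVALIDATES[event]) this.store.delete(key);
  }
}
```

The TTL is a safety net: if an SSE event is dropped, a stale metric survives at most `ttlMs`. A Redis-backed version would replace the `Map` calls with SET (with an expiry), GET and DEL.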
1.3 Architecture Bottlenecks
| # | Bottleneck | Impact | Severity |
|---|---|---|---|
| B11 | Prisma client cache invalidation — dev server restart required after schema changes | Developer friction, CI complexity | MEDIUM |
| B12 | No outbound webhooks/events — SSE event bus exists but has no external subscribers | Cannot notify external systems (Slack, Jira) | MEDIUM |
| B13 | No soft-delete strategy — mixed approach (isActive, status, hard delete) | Data loss risk, no audit trail on deletions | LOW |
| B14 | Manual rate card lookup in estimates — no auto-lookup by resource chapter/level | Estimate creation slower than needed | LOW |
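
For B14, and for its fix A11, one plausible resolution rule is "most specific match wins". This rule is assumed here, not taken from the estimating code: use a rate card row for the exact client if one exists, otherwise fall back to the client-agnostic row for the same chapter and level. `RateCardEntry` and `resolveRate` are hypothetical names. LCR/UCR are carried as plain numbers because this plan does not define them further.

```typescript
// Hypothetical rate card row; the real Prisma model may differ.
interface RateCardEntry {
  chapter: string;
  level: string;
  clientId: string | null; // null = default rate for any client
  lcr: number;
  ucr: number;
}

interface RateQuery {
  chapter: string;
  level: string;
  clientId: string;
}

// Most specific match wins: client-specific row first, then the client-agnostic default.
function resolveRate(
  card: readonly RateCardEntry[],
  q: RateQuery,
): { lcr: number; ucr: number } | undefined {
  const sameRole = card.filter((e) => e.chapter === q.chapter && e.level === q.level);
  const match =
    sameRole.find((e) => e.clientId === q.clientId) ??
    sameRole.find((e) => e.clientId === null);
  return match ? { lcr: match.lcr, ucr: match.ucr } : undefined;
}
```

Returning `undefined` rather than a zero rate leaves the demand line blank. A missing rate card entry therefore stays visible instead of silently pricing the work at 0.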
Part 2: Growth Potential
2.1 High-Value Feature Opportunities
Tier 1 — Quick Wins (1-3 days each)
| # | Feature | Value | Effort |
|---|---|---|---|
| G1 | Staffing "Assign" button — pre-populate allocation modal from match result | Closes biggest UX gap, saves 5+ clicks per staffing decision | 1-2 days |
| G2 | Dashboard caching — pre-compute metrics, invalidate on SSE events | Estimated 3-5x faster dashboard loads | 1-2 days |
| G3 | Bulk list operations — multi-select + context menu on resources/projects | Enables batch edit, export, status change | 2-3 days |
| G4 | Health check endpoints — /api/health (liveness), /api/ready (DB + Redis) | Production deployment prerequisite | 0.5 day |
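
G4 separates liveness (the process answers at all) from readiness (its dependencies answer). Below is a dependency-free sketch of the readiness logic. Each probe is an async function, for example a `SELECT 1` through Prisma's `$queryRaw` and a Redis `PING`. Those probes and the 2-second default timeout are assumptions. The function returns a status code and per-check results, which a Next.js route handler would wrap in a `Response`.

```typescript
// Each probe resolves if its dependency is reachable and rejects otherwise.
type Probe = () => Promise<unknown>;
type CheckStatus = "ok" | "fail";

interface ProbeResult {
  status: 200 | 503;
  checks: Record<string, CheckStatus>;
}

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Liveness (/api/health): no dependency checks; answering at all is the signal.
function liveness(): ProbeResult {
  return { status: 200, checks: {} };
}

// Readiness (/api/ready): run all probes in parallel; any failure or timeout means 503.
async function readiness(probes: Record<string, Probe>, timeoutMs = 2000): Promise<ProbeResult> {
  const settled = await Promise.all(
    Object.entries(probes).map(async ([name, probe]): Promise<[string, CheckStatus]> => {
      try {
        // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
        await withTimeout(Promise.resolve().then(() => probe()), timeoutMs);
        return [name, "ok"];
      } catch {
        return [name, "fail"];
      }
    }),
  );
  const checks: Record<string, CheckStatus> = {};
  for (const [name, status] of settled) checks[name] = status;
  const ready = settled.every(([, status]) => status === "ok");
  return { status: ready ? 200 : 503, checks };
}
```

Keeping `/api/health` free of dependency checks matters when an orchestrator restarts containers on failed liveness probes (Kubernetes does). If liveness also checked Postgres, a brief database outage would restart every otherwise healthy app container.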
Tier 2 — Strategic Features (1-2 weeks each)
| # | Feature | Value | Effort |
|---|---|---|---|
| G5 | Scenario/What-If Planning — alternate staffing mixes, cost simulations | Differentiation for PMs and finance; leverages existing engine | 1-2 weeks |
| G6 | Skill Marketplace — searchable skill inventory, gap heat map, hiring priorities | High leverage from existing skill matrix; enables org-wide planning | 1 week |
| G7 | Custom Report Builder — drag columns, pivot, grouping, scheduled exports | Unlocks self-service analytics for finance and executives | 1-2 weeks |
| G8 | Collaboration Layer — inline comments on estimates, @mention, approval feedback | Enables cross-functional workflows (finance, PM, staffing) | 1-2 weeks |
Tier 3 — Market Differentiators (2-4 weeks each)
| # | Feature | Value | Effort |
|---|---|---|---|
| G9 | AI-Powered Insights — auto-suggest staffing, anomaly detection, narrative reports | Leverages existing Azure OpenAI integration; executive decision support | 2-3 weeks |
| G10 | External Integrations — Jira/Linear sync, Slack notifications, Google Calendar | Stickiness; connects Planarchy into existing workflows | 2-4 weeks |
| G11 | Mobile Companion — PWA with quick-view (status, gaps, approvals, push notifications) | Engagement for field PMs and remote staff | 3-4 weeks |
| G12 | Dispo V2 Clean-Slate Import — design doc + tickets exist, ready for implementation | Unblocks migration from legacy system; critical for customer onboarding | 1-2 weeks |
2.2 Missing Dashboard Widgets
| Widget | Purpose | Effort |
|---|---|---|
| Budget spend forecast | Forward-looking actuals vs budget trend line | 2 days |
| Team utilization heatmap | Resource x week grid with color intensity | 2 days |
| Skill gap analysis | Required vs available skills across open demands | 3 days |
| Project health scorecard | On-time, on-budget, quality composite score | 2 days |
| Hiring pipeline | Forecast unfilled demand 3-6 months out | 3 days |
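
The budget spend forecast widget needs a projection rule. The simplest, assumed here for illustration, is a linear run rate: take the average daily spend so far and extend it to the planned end date. `BurnInput` and `forecastBurn` are hypothetical. A least-squares trend over weekly actuals would smooth out lumpy bookings better.

```typescript
interface BurnInput {
  budget: number; // total approved budget
  spentToDate: number; // actuals booked so far
  daysElapsed: number; // days since project start, must be > 0
  totalDays: number; // planned duration in days
}

interface BurnForecast {
  dailyBurn: number;
  projectedAtEnd: number; // spend at planned end if the run rate holds
  projectedVariance: number; // projectedAtEnd - budget; positive = overrun
  dayBudgetExhausted: number | null; // project day the budget runs out, null if not within plan
}

// Linear run-rate forecast: extend the average daily spend to the planned end date.
function forecastBurn(input: BurnInput): BurnForecast {
  if (input.daysElapsed <= 0) throw new RangeError("daysElapsed must be positive");
  const dailyBurn = input.spentToDate / input.daysElapsed;
  const projectedAtEnd = dailyBurn * input.totalDays;
  const exhaustDay = dailyBurn > 0 ? input.budget / dailyBurn : Infinity;
  return {
    dailyBurn,
    projectedAtEnd,
    projectedVariance: projectedAtEnd - input.budget,
    dayBudgetExhausted: exhaustDay <= input.totalDays ? exhaustDay : null,
  };
}
```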
Part 3: Automation Potential
3.1 Development Workflow Automation
| # | Automation | Current State | Target | Effort |
|---|---|---|---|---|
| A1 | CI/CD Pipeline | None | GitHub Actions: test + lint + tsc on PR, build + deploy on merge | 1-2 days |
| A2 | Dependency scanning | None | Dependabot + pnpm audit in CI | 0.5 day |
| A3 | E2E test suite expansion | 4 specs (auth, timeline, projects, resources) | 20+ specs covering key user flows | 1 week |
| A4 | API integration tests | ~5% router coverage | 80% coverage with mock DB layer | 1-2 weeks |
| A5 | Coverage gates | Engine 95%, staffing 90%, others none | All packages minimum 80% | 2 days (config only) |
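
A5's gate can be enforced from an Istanbul-style `json-summary` report (`coverage-summary.json`), which both Vitest and Jest coverage can produce. The sketch assumes that report format, since this plan does not name the test runner. It checks the report's `total` entry against A5's 80% target. Reading the file and failing the CI step are left to the caller.

```typescript
// Subset of one metric in the Istanbul json-summary report.
interface MetricSummary {
  total: number;
  covered: number;
  pct: number;
}

type Metric = "lines" | "statements" | "functions" | "branches";

// Only the "total" entry is used; per-file entries are ignored.
interface CoverageSummary {
  total: Record<Metric, MetricSummary>;
}

const METRICS: readonly Metric[] = ["lines", "statements", "functions", "branches"];

// One message per metric below the threshold; an empty array means the gate passes.
function coverageGateFailures(summary: CoverageSummary, threshold = 80): string[] {
  return METRICS.filter((metric) => summary.total[metric].pct < threshold).map(
    (metric) => `${metric}: ${summary.total[metric].pct}% < ${threshold}%`,
  );
}
```

Both runners can also enforce thresholds natively (Vitest's `coverage.thresholds`, Jest's `coverageThreshold`). A standalone check like this mainly helps when one rule must apply across packages that use different runners.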
3.2 Business Process Automation
| # | Automation | Current Manual Process | Automated Process | Effort |
|---|---|---|---|---|
| A6 | Auto-staffing suggestions | PM manually searches for resources per demand | System proposes top-3 matches when demand is created | 3 days |
| A7 | Vacation conflict alerts | Manager manually checks team calendar before approving | Auto-detect overlap > threshold, flag in approval flow | 2 days |
| A8 | Budget overrun notifications | Finance checks dashboards manually | SSE-triggered notification when project hits 80%/100% budget | 1 day |
| A9 | Estimate approval reminders | Verbal follow-up | Scheduled notification after N days in SUBMITTED status | 1 day |
| A10 | Chargeability alerts | Monthly manual review | Weekly auto-email when resource chargeability drops below target | 2 days |
| A11 | Rate card auto-apply | Manual rate lookup when creating estimate demand lines | Auto-fill LCR/UCR from rate card by resource chapter + level + client | 2 days |
| A12 | Public holiday auto-import | Admin manually batch-creates per year | Auto-generate on year rollover based on country/state config | 1 day |
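
A7 leaves the threshold undefined. One reading, assumed here, is a cap on the share of a team that may be absent on the same day. The sketch walks each day of the request and reports the days on which approving it would exceed the cap. `Absence`, `conflictDays` and the inclusive ISO-date ranges are hypothetical.

```typescript
interface Absence {
  resourceId: string;
  start: string; // inclusive ISO date, e.g. "2026-04-06"
  end: string; // inclusive ISO date
}

const DAY_MS = 86_400_000;
// ISO date-only strings parse as UTC midnight, so this yields a whole day number.
const dayNumber = (iso: string): number => Date.parse(iso) / DAY_MS;
const isoDay = (n: number): string => new Date(n * DAY_MS).toISOString().slice(0, 10);

// Days on which approving `request` would push the absent share of the team above maxShare.
function conflictDays(
  request: Absence,
  approved: readonly Absence[],
  teamSize: number,
  maxShare: number, // e.g. 0.3 = at most 30% of the team away on any day
): string[] {
  const conflicts: string[] = [];
  for (let d = dayNumber(request.start); d <= dayNumber(request.end); d++) {
    // Distinct other team members already away on day d.
    const alreadyAway = new Set(
      approved
        .filter((a) => a.resourceId !== request.resourceId)
        .filter((a) => dayNumber(a.start) <= d && d <= dayNumber(a.end))
        .map((a) => a.resourceId),
    );
    if ((alreadyAway.size + 1) / teamSize > maxShare) conflicts.push(isoDay(d));
  }
  return conflicts;
}
```

Weekends and public holidays are not excluded here. A real check would likely skip non-working days using the same calendar data as A12.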
3.3 Monitoring & Observability Automation
| # | Automation | Target | Effort |
|---|---|---|---|
| A13 | Structured logging (Pino) | All API requests logged with correlation ID | 2 days |
| A14 | Error tracking (Sentry) | Unhandled exceptions captured with context | 1 day |
| A15 | Performance monitoring | Slow query detection, API response time tracking | 2 days |
| A16 | Uptime monitoring | External health check probe, alerting | 0.5 day |
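
A13 asks for every API request to be logged with a correlation ID. The dependency-free sketch below mimics Pino's child-logger pattern, where `logger.child(bindings)` returns a logger that stamps those fields on every line, so the ID is bound once per request. The `x-request-id` header name and the injected ID generator are assumptions. Pino's own output differs in detail, for example it uses numeric levels.

```typescript
type LogFields = Record<string, unknown>;
type Sink = (line: string) => void;

interface Logger {
  info(msg: string, fields?: LogFields): void;
  error(msg: string, fields?: LogFields): void;
  // Returns a logger that adds `bindings` to every line, like Pino's logger.child().
  child(bindings: LogFields): Logger;
}

// One JSON object per line; per-call fields override bound fields on key clashes.
function createLogger(sink: Sink, bindings: LogFields = {}): Logger {
  const write = (level: string, msg: string, fields: LogFields = {}): void =>
    sink(JSON.stringify({ level, time: Date.now(), ...bindings, ...fields, msg }));
  return {
    info: (msg, fields) => write("info", msg, fields),
    error: (msg, fields) => write("error", msg, fields),
    child: (extra) => createLogger(sink, { ...bindings, ...extra }),
  };
}

// Reuse an upstream ID (assumed header: x-request-id) if present; otherwise mint a new one.
function correlationIdFrom(
  headers: Record<string, string | undefined>,
  generate: () => string,
): string {
  const incoming = headers["x-request-id"];
  return incoming !== undefined && incoming.length > 0 ? incoming : generate();
}
```

Reusing an incoming ID lets one trace span the reverse proxy, the tRPC handler and any Sentry event (A14) tagged with the same value.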
Part 4: Prioritized Roadmap
Sprint 0: Production Foundation (Week 1)
Goal: Unblock production deployment.
- A1 — GitHub Actions CI pipeline (test + lint + tsc + build)
- G4 — Health check endpoints (/api/health, /api/ready)
- A14 — Sentry error tracking integration
- A13 — Pino structured logging in API layer
- Production Dockerfile (multi-stage, distroless base)
- docker-compose.prod.yml with env-based config
- Database backup strategy (pg_dump cron + S3)
Acceptance: the main branch has green CI, the production image builds, and errors are captured.
Sprint 1: Quick Wins (Week 2)
Goal: Close the biggest UX gaps and improve daily workflows.
- G1 — Staffing "Assign" button (match -> allocation in 1 click)
- G2 — Dashboard metric caching (Redis-backed, SSE-invalidated)
- G3 — Bulk operations on resource/project lists
- A8 — Budget overrun notifications (80% + 100% thresholds)
- A9 — Estimate approval reminders (auto-notify after 3 days)
Acceptance: Staffing-to-allocation is 1 click, the dashboard loads in under 500 ms, and bulk select works.
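
For A8, the key detail is notifying once per threshold, not on every SSE event after a project passes 80%. The minimal sketch below assumes the handler knows the spend before and after the booking that triggered the event. It returns only the thresholds crossed by that change. `crossedThresholds` is a hypothetical name.

```typescript
// 80% and 100% of budget, per A8.
const THRESHOLDS = [0.8, 1.0] as const;

// Thresholds newly crossed by moving from previousSpend to currentSpend.
function crossedThresholds(budget: number, previousSpend: number, currentSpend: number): number[] {
  if (budget <= 0) return [];
  const before = previousSpend / budget;
  const after = currentSpend / budget;
  return THRESHOLDS.filter((t) => before < t && after >= t);
}
```

Because it compares before and after, a correction that drops spend below 80% followed by a booking that pushes it back over will notify again. Whether that is desired is a product decision.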
Sprint 2: Test Coverage & Stability (Week 3)
Goal: Harden the codebase for confident iteration.
- A4 — API router integration tests (target 15 most-used routers)
- A5 — Coverage gates: api + application packages at 80%
- A3 — E2E expansion: 10 new specs (estimate lifecycle, vacation flow, bulk ops, filters)
- A2 — Dependabot + pnpm audit in CI
Acceptance: pnpm test:unit covers all routers, E2E suite runs in CI, zero high-severity vulnerabilities.
Sprint 3: Automation & Intelligence (Weeks 4-5)
Goal: Automate repetitive decisions, surface insights proactively.
- A6 — Auto-staffing suggestions on demand creation
- A7 — Vacation conflict detection in approval flow
- A10 — Weekly chargeability alerts
- A11 — Rate card auto-apply in estimate demand lines
- A12 — Public holiday auto-import on year rollover
- G6 — Skill marketplace MVP (searchable inventory + gap heat map)
Acceptance: Demands auto-suggest resources, vacation conflicts auto-flagged, rate cards auto-filled.
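
A10 needs a concrete chargeability formula. Assuming the common definition (chargeable hours divided by available hours for the week), the alert job reduces to a filter. `WeeklyHours`, `chargeabilityAlerts` and the per-resource `targetFor` lookup are hypothetical. The existing chargeability report may define the ratio differently.

```typescript
interface WeeklyHours {
  resourceId: string;
  chargeableHours: number;
  availableHours: number; // capacity after vacation and public holidays
}

interface ChargeabilityAlert {
  resourceId: string;
  chargeability: number; // 0..1
  target: number;
}

// Resources whose chargeability for the week is below their target.
function chargeabilityAlerts(
  week: readonly WeeklyHours[],
  targetFor: (resourceId: string) => number,
): ChargeabilityAlert[] {
  const alerts: ChargeabilityAlert[] = [];
  for (const w of week) {
    if (w.availableHours <= 0) continue; // e.g. a full week of vacation: nothing to measure
    const chargeability = w.chargeableHours / w.availableHours;
    const target = targetFor(w.resourceId);
    if (chargeability < target) alerts.push({ resourceId: w.resourceId, chargeability, target });
  }
  return alerts;
}
```

Weeks with zero available hours are skipped rather than reported as 0% chargeability, so a resource on vacation does not trigger an alert.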
Sprint 4: Strategic Features (Weeks 6-8)
Goal: Build differentiation features that create a competitive moat.
- G5 — Scenario/what-if planning (staffing mix simulator)
- G7 — Custom report builder MVP (column picker, filters, export)
- G8 — Collaboration layer (comments on estimates, @mention)
- G12 — Dispo V2 clean-slate import (leverage existing design docs + tickets)
- New dashboard widgets: budget forecast, skill gap, project health scorecard
Acceptance: PMs can simulate staffing scenarios, finance can build custom reports, Dispo import onboards first customer.
Sprint 5: Market Expansion (Weeks 9-12)
Goal: Expand the platform beyond core planning.
- G9 — AI insights: auto-staffing, anomaly detection, narrative summaries
- G10 — Jira/Linear integration + Slack notifications
- G11 — Mobile PWA companion
- A15 — Performance monitoring + load testing baseline
- Advanced: multi-tenant architecture planning
Acceptance: AI suggestions active, Jira sync live, mobile app installable.
Part 5: Risk Register
| # | Risk | Probability | Impact | Mitigation |
|---|---|---|---|---|
| R1 | Regressions reach production because no CI pipeline catches them | HIGH | CRITICAL | Sprint 0 is mandatory before any feature work |
| R2 | Timeline 3.3K LOC becomes unmaintainable | MEDIUM | HIGH | Decompose into sub-hook modules when next touching timeline |
| R3 | Dashboard performance degrades with data growth | MEDIUM | MEDIUM | G2 (caching) in Sprint 1; monitor query times |
| R4 | Prisma schema changes break dev workflow | HIGH | LOW | Automate restart in dev scripts (already documented) |
| R5 | Skill matrix AI costs grow with usage | LOW | MEDIUM | Add token budget tracking in SystemSettings |
| R6 | No data backup strategy | MEDIUM | CRITICAL | Add pg_dump cron + S3 upload in Sprint 0 |
| R7 | Single point of failure (1 dev, 1 server) | HIGH | CRITICAL | Document architecture, automate deployment, enable team onboarding |
Part 6: Key Metrics to Track
Product Metrics
- Time-to-staff: Minutes from demand creation to resource assignment
- Estimate turnaround: Days from estimate creation to approval
- Vacation approval latency: Hours from request to decision
- Dashboard load time: P95 response time for dashboard page
- Chargeability accuracy: Forecast vs actual deviation %
Engineering Metrics
- Test coverage: % by package (target: all >=80%)
- CI green rate: % of PRs passing all gates
- Build time: Minutes for a full next build
- Error rate: Sentry exceptions per hour
- API latency: P95 tRPC procedure response time
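
Both P95 metrics above (dashboard load time and tRPC latency) depend on a percentile definition. Below is a small sketch of the nearest-rank method, which is one of several common definitions. APM tools may interpolate between samples, so their P95 can differ slightly from this on small samples.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("percentile of an empty sample");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  // Copy before sorting; numeric comparator because the default sort compares as strings.
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs such as p = 95, n = 100 give an exact rank.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[rank - 1];
}
```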
Appendix: Current State Snapshot
| Dimension | Count |
|---|---|
| Database models | 47 |
| tRPC routers | 28 |
| tRPC procedures | ~200 (120 queries + 80 mutations) |
| Frontend routes | 34 |
| Domain components | 109+ |
| Shared UI components | 20+ |
| Unit test files | 62 |
| E2E test specs | 4 |
| Engine test coverage | 95% (gated) |
| Staffing test coverage | 90% (gated) |
| API router test coverage | ~5% (not gated) |
| CI/CD pipeline | None |
| Production Docker | None |
| Monitoring/APM | None |
| Completed phases | 9 |
| Known pain points | 24 (documented in LEARNINGS.md) |