CapaKraken/plan.md
Hartmut 0d78fe1770 feat: Sprint 0 — CI/CD pipeline, production Docker, health checks
CI Pipeline (.github/workflows/ci.yml):
- 5 jobs: typecheck, lint, test, build, e2e (parallel where possible)
- PostgreSQL 16 + Redis 7 service containers for test/e2e
- pnpm store, Turborepo, Playwright browser caching
- Concurrency groups cancel in-progress runs

Production Docker:
- Dockerfile.prod: 3-stage build (deps → build → runtime ~150MB)
- docker-compose.prod.yml: postgres + redis + app with health checks
- .dockerignore for fast builds
- next.config.ts: output: "standalone" for minimal runtime

Health Check Endpoints:
- GET /api/health — liveness probe (200 OK, no deps)
- GET /api/ready — readiness probe (postgres + redis connectivity)

Documentation:
- docs/ci-cd-manual.md — full pipeline manual with troubleshooting
- plan.md — Product Owner strategic plan (bottlenecks, growth, automation)

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-19 20:33:18 +01:00


# Planarchy — Product Owner Strategic Plan
> Consolidated analysis from 4 expert agents: Roadmap, API Surface, Frontend UX, and Test Infrastructure.
> Date: 2026-03-19
---
## Executive Summary
Planarchy has reached **Phase 9** with a mature core: timeline planning, allocation management, estimating, vacation pro, skill matrix, RBAC, and chargeability reporting. The product covers 34 routes, 47 DB models, ~200 tRPC procedures, and 109+ domain components.
**However, the product has critical gaps preventing production readiness and growth:**
| Dimension | Score | Verdict |
|-----------|-------|---------|
| Feature completeness | 85% | Strong core, thin edges (staffing, reporting) |
| Code quality | 90% | Zero TODOs, clean architecture, typed end-to-end |
| Test coverage | 55% | Engine excellent, API routers ~5%, no integration tests |
| CI/CD & DevOps | 10% | No pipeline, no prod Docker, no monitoring |
| UX polish | 75% | Deep timeline/estimates, but gaps in staffing workflow |
| Growth readiness | 40% | No scenario planning, no integrations, no mobile |
---
## Part 1: Bottlenecks
### 1.1 Production Readiness Blockers (Critical)
| # | Bottleneck | Impact | Severity |
|---|-----------|--------|----------|
| B1 | **No CI/CD pipeline** — tests, lint, tsc not automated on PR | Regressions ship undetected | CRITICAL |
| B2 | **No production Docker image** — only dev Dockerfile exists | Cannot deploy containerized | CRITICAL |
| B3 | **No monitoring/logging** — no Sentry, no Pino, no APM | Blind in production, cannot debug | CRITICAL |
| B4 | **No health check endpoints** — `/health`, `/ready` missing | Cannot detect/recover from failures | HIGH |
| B5 | **API router test coverage ~5%** — 28 routers, almost no unit tests | Mutations untested at API boundary | HIGH |
### 1.2 UX Bottlenecks
| # | Bottleneck | Impact | Severity |
|---|-----------|--------|----------|
| B6 | **Staffing -> Allocation gap** — match results don't link to allocation creation | Users must manually recreate allocations after finding matches | HIGH |
| B7 | **Reporting is thin** — only 2 report types (chargeability, PDF allocations) | Finance/PMs can't self-serve custom reports | MEDIUM |
| B8 | **No bulk operations in list views** — no multi-select outside timeline | Slow to manage 10+ resources/projects at once | MEDIUM |
| B9 | **Dashboard metrics computed live** — no caching/pre-computation | Slow dashboard load with growing data | MEDIUM |
| B10 | **Timeline 3.3K LOC ecosystem** — ResourcePanel 1035, ProjectPanel 1315 LOC | Hard to maintain, risky to modify | LOW |
### 1.3 Architecture Bottlenecks
| # | Bottleneck | Impact | Severity |
|---|-----------|--------|----------|
| B11 | **Prisma client cache invalidation** — dev server restart required after schema changes | Developer friction, CI complexity | MEDIUM |
| B12 | **No webhook/event outbound** — SSE event bus exists but no external subscriptions | Cannot notify external systems (Slack, Jira) | MEDIUM |
| B13 | **No soft-delete strategy** — mixed approach (isActive, status, hard delete) | Data loss risk, no audit trail on deletions | LOW |
| B14 | **Rate card lookup manual in estimates** — no auto-lookup by resource chapter/level | Estimate creation slower than needed | LOW |
---
## Part 2: Growth Potential
### 2.1 High-Value Feature Opportunities
#### Tier 1 — Quick Wins (1-3 days each)
| # | Feature | Value | Effort |
|---|---------|-------|--------|
| G1 | **Staffing "Assign" button** — pre-populate allocation modal from match result | Closes biggest UX gap, saves 5+ clicks per staffing decision | 1-2 days |
| G2 | **Dashboard caching** — pre-compute metrics, invalidate on SSE events | 3-5x dashboard load speed improvement | 1-2 days |
| G3 | **Bulk list operations** — multi-select + context menu on resources/projects | Enables batch edit, export, status change | 2-3 days |
| G4 | **Health check endpoints** — `/api/health` (liveness), `/api/ready` (DB + Redis) | Production deployment prerequisite | 0.5 day |
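The liveness/readiness split behind G4 can be sketched as follows. This is a minimal illustration, not the actual implementation: the helper names are hypothetical, and in the real app the checks would wrap the Prisma and Redis clients (e.g. a `SELECT 1` and a `PING`) inside Next.js route handlers.

```typescript
type Check = () => Promise<void>;

// Liveness: the process is up; no dependencies are asserted.
function liveness(): { status: number; body: { status: string } } {
  return { status: 200, body: { status: "ok" } };
}

// Readiness: run every dependency check and report per-dependency results.
// Any failure yields 503 so the orchestrator stops routing traffic here.
async function readiness(
  checks: Record<string, Check>,
): Promise<{ status: number; body: Record<string, string> }> {
  const body: Record<string, string> = {};
  let ready = true;
  for (const [name, check] of Object.entries(checks)) {
    try {
      await check();
      body[name] = "ok";
    } catch {
      body[name] = "unreachable";
      ready = false;
    }
  }
  return { status: ready ? 200 : 503, body };
}
```

Keeping liveness dependency-free matters: if `/api/health` probed Postgres, a database outage would make the orchestrator restart healthy app containers instead of just pulling them out of rotation.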
#### Tier 2 — Strategic Features (1-2 weeks each)
| # | Feature | Value | Effort |
|---|---------|-------|--------|
| G5 | **Scenario/What-If Planning** — alternate staffing mixes, cost simulations | Differentiation for PMs and finance; leverages existing engine | 1-2 weeks |
| G6 | **Skill Marketplace** — searchable skill inventory, gap heat map, hiring priorities | High leverage from existing skill matrix; enables org-wide planning | 1 week |
| G7 | **Custom Report Builder** — drag columns, pivot, grouping, scheduled exports | Unlocks self-service analytics for finance and executives | 1-2 weeks |
| G8 | **Collaboration Layer** — inline comments on estimates, @mention, approval feedback | Enables cross-functional workflows (finance, PM, staffing) | 1-2 weeks |
#### Tier 3 — Market Differentiators (2-4 weeks each)
| # | Feature | Value | Effort |
|---|---------|-------|--------|
| G9 | **AI-Powered Insights** — auto-suggest staffing, anomaly detection, narrative reports | Leverages existing Azure OpenAI integration; executive decision support | 2-3 weeks |
| G10 | **External Integrations** — Jira/Linear sync, Slack notifications, Google Calendar | Stickiness; connects Planarchy into existing workflows | 2-4 weeks |
| G11 | **Mobile Companion** — PWA with quick-view (status, gaps, approvals, push notifications) | Engagement for field PMs and remote staff | 3-4 weeks |
| G12 | **Dispo V2 Clean-Slate Import** — design doc + tickets exist, ready for implementation | Unblocks migration from legacy system; critical for customer onboarding | 1-2 weeks |
### 2.2 Missing Dashboard Widgets
| Widget | Purpose | Effort |
|--------|---------|--------|
| Budget spend forecast | Forward-looking actuals vs budget trend line | 2 days |
| Team utilization heatmap | Resource x week grid with color intensity | 2 days |
| Skill gap analysis | Required vs available skills across open demands | 3 days |
| Project health scorecard | On-time, on-budget, quality composite score | 2 days |
| Hiring pipeline | Forecast unfilled demand 3-6 months out | 3 days |
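The utilization heatmap widget reduces to a resource × week aggregation. A sketch, assuming a simplified allocation shape (the real model lives in Prisma and differs in detail):

```typescript
// Illustrative shape; the real allocation record carries more fields.
type Allocation = { resourceId: string; week: string; percent: number };

// Sum allocation percentages into a resource -> week -> total grid,
// which the widget then maps to color intensity.
function utilizationGrid(
  allocs: Allocation[],
): Map<string, Map<string, number>> {
  const grid = new Map<string, Map<string, number>>();
  for (const a of allocs) {
    const row = grid.get(a.resourceId) ?? new Map<string, number>();
    row.set(a.week, (row.get(a.week) ?? 0) + a.percent);
    grid.set(a.resourceId, row);
  }
  return grid;
}
```

Cells above 100% are exactly the over-allocations the heatmap should make visually obvious.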
---
## Part 3: Automation Potential
### 3.1 Development Workflow Automation
| # | Automation | Current State | Target | Effort |
|---|-----------|--------------|--------|--------|
| A1 | **CI/CD Pipeline** | None | GitHub Actions: test + lint + tsc on PR, build + deploy on merge | 1-2 days |
| A2 | **Dependency scanning** | None | Dependabot + npm audit in CI | 0.5 day |
| A3 | **E2E test suite expansion** | 4 specs (auth, timeline, projects, resources) | 20+ specs covering key user flows | 1 week |
| A4 | **API integration tests** | ~5% router coverage | 80% coverage with mock DB layer | 1-2 weeks |
| A5 | **Coverage gates** | Engine 95%, staffing 90%, others none | All packages minimum 80% | 2 days config |
### 3.2 Business Process Automation
| # | Automation | Current Manual Process | Automated Process | Effort |
|---|-----------|----------------------|-------------------|--------|
| A6 | **Auto-staffing suggestions** | PM manually searches for resources per demand | System proposes top-3 matches when demand is created | 3 days |
| A7 | **Vacation conflict alerts** | Manager manually checks team calendar before approving | Auto-detect overlap > threshold, flag in approval flow | 2 days |
| A8 | **Budget overrun notifications** | Finance checks dashboards manually | SSE-triggered notification when project hits 80%/100% budget | 1 day |
| A9 | **Estimate approval reminders** | Verbal follow-up | Scheduled notification after N days in SUBMITTED status | 1 day |
| A10 | **Chargeability alerts** | Monthly manual review | Weekly auto-email when resource chargeability drops below target | 2 days |
| A11 | **Rate card auto-apply** | Manual rate lookup when creating estimate demand lines | Auto-fill LCR/UCR from rate card by resource chapter + level + client | 2 days |
| A12 | **Public holiday auto-import** | Admin manually batch-creates per year | Auto-generate on year rollover based on country/state config | 1 day |
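The core of A8 (budget overrun notifications) is detecting when spend crosses a threshold for the first time, so each project gets exactly one alert per threshold. A sketch with illustrative names — the real trigger would sit in the SSE event handler that observes spend updates:

```typescript
// Thresholds from A8: notify at 80% and again at 100% of budget.
const THRESHOLDS = [0.8, 1.0];

// Return the thresholds newly crossed by this spend update. Comparing
// against the previous spend prevents duplicate alerts on later updates.
function crossedThresholds(
  prevSpent: number,
  spent: number,
  budget: number,
): number[] {
  if (budget <= 0) return [];
  return THRESHOLDS.filter(
    (t) => prevSpent / budget < t && spent / budget >= t,
  );
}
```

A single large update that jumps from 70% to 105% returns both thresholds, so neither alert is skipped.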
### 3.3 Monitoring & Observability Automation
| # | Automation | Target | Effort |
|---|-----------|--------|--------|
| A13 | **Structured logging** (Pino) | All API requests logged with correlation ID | 2 days |
| A14 | **Error tracking** (Sentry) | Unhandled exceptions captured with context | 1 day |
| A15 | **Performance monitoring** | Slow query detection, API response time tracking | 2 days |
| A16 | **Uptime monitoring** | External health check probe, alerting | 0.5 day |
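The correlation-ID requirement in A13 can be sketched without Pino itself: reuse an incoming request ID header when present, mint one otherwise, and stamp it on every log line so one request's entries can be grouped. The helper name and JSON shape here are illustrative; Pino's child loggers would replace the hand-rolled serializer.

```typescript
import { randomUUID } from "node:crypto";

// Build a per-request logger. If the caller (gateway, retrying client)
// already sent a correlation ID, keep it so traces span services.
function makeRequestLogger(headerId?: string) {
  const correlationId = headerId ?? randomUUID();
  return {
    correlationId,
    info(msg: string, extra: Record<string, unknown> = {}): string {
      // One JSON object per line: the shape structured-log pipelines expect.
      return JSON.stringify({ level: "info", correlationId, msg, ...extra });
    },
  };
}
```

In the API layer this would be created once per request in middleware and passed down through the tRPC context.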
---
## Part 4: Prioritized Roadmap
### Sprint 0: Production Foundation (Week 1)
**Goal:** Unblock production deployment.
- [ ] **A1** — GitHub Actions CI pipeline (test + lint + tsc + build)
- [ ] **G4** — Health check endpoints (`/api/health`, `/api/ready`)
- [ ] **A14** — Sentry error tracking integration
- [ ] **A13** — Pino structured logging in API layer
- [ ] Production Dockerfile (multi-stage, distroless base)
- [ ] docker-compose.prod.yml with env-based config
- [ ] Database backup strategy (pg_dump cron + S3)
**Acceptance:** `main` branch has green CI, production image builds, errors are captured.
### Sprint 1: Quick Wins (Week 2)
**Goal:** Close the biggest UX gaps and improve daily workflows.
- [ ] **G1** — Staffing "Assign" button (match -> allocation in 1 click)
- [ ] **G2** — Dashboard metric caching (Redis-backed, SSE-invalidated)
- [ ] **G3** — Bulk operations on resource/project lists
- [ ] **A8** — Budget overrun notifications (80% + 100% thresholds)
- [ ] **A9** — Estimate approval reminders (auto-notify after 3 days)
**Acceptance:** Staffing-to-allocation is 1 click, dashboard loads <500ms, bulk select works.
### Sprint 2: Test Coverage & Stability (Week 3)
**Goal:** Harden the codebase for confident iteration.
- [ ] **A4** — API router integration tests (target 15 most-used routers)
- [ ] **A5** — Coverage gates: api + application packages at 80%
- [ ] **A3** — E2E expansion: 10 new specs (estimate lifecycle, vacation flow, bulk ops, filters)
- [ ] **A2** — Dependabot + npm audit in CI
**Acceptance:** `pnpm test:unit` covers all routers, E2E suite runs in CI, zero high-severity vulnerabilities.
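The "mock DB layer" approach from A4 boils down to injecting a stub where the Prisma client would be, so router logic is tested at the API boundary without a database. A sketch with hypothetical names (`listActive`, the `Db` shape) — the real routers are tRPC procedures with a richer context:

```typescript
// Narrow interface covering only what this router needs from the DB.
type Db = {
  resource: {
    findMany: (args: { where: { isActive: boolean } }) => Promise<{ id: string }[]>;
  };
};

// Router factory takes the DB as a dependency, making it swappable in tests.
function makeResourceRouter(db: Db) {
  return {
    listActive: () => db.resource.findMany({ where: { isActive: true } }),
  };
}

// In a test, an in-memory stub replaces the Prisma client.
const stubDb: Db = {
  resource: {
    findMany: async ({ where }) => (where.isActive ? [{ id: "r1" }] : []),
  },
};
```

The stub also lets the test assert *which* filter the router passed, catching bugs like a forgotten `isActive` clause that a live DB with friendly seed data would hide.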
### Sprint 3: Automation & Intelligence (Week 4-5)
**Goal:** Automate repetitive decisions, surface insights proactively.
- [ ] **A6** — Auto-staffing suggestions on demand creation
- [ ] **A7** — Vacation conflict detection in approval flow
- [ ] **A10** — Weekly chargeability alerts
- [ ] **A11** — Rate card auto-apply in estimate demand lines
- [ ] **A12** — Public holiday auto-import on year rollover
- [ ] **G6** — Skill marketplace MVP (searchable inventory + gap heat map)
**Acceptance:** Demands auto-suggest resources, vacation conflicts auto-flagged, rate cards auto-filled.
### Sprint 4: Strategic Features (Week 6-8)
**Goal:** Build differentiation features that create competitive moat.
- [ ] **G5** — Scenario/what-if planning (staffing mix simulator)
- [ ] **G7** — Custom report builder MVP (column picker, filters, export)
- [ ] **G8** — Collaboration layer (comments on estimates, @mention)
- [ ] **G12** — Dispo V2 clean-slate import (leverage existing design docs + tickets)
- [ ] Dashboard new widgets: budget forecast, skill gap, project health scorecard
**Acceptance:** PMs can simulate staffing scenarios, finance can build custom reports, Dispo import onboards first customer.
### Sprint 5: Market Expansion (Week 9-12)
**Goal:** Expand the platform beyond core planning.
- [ ] **G9** — AI insights: auto-staffing, anomaly detection, narrative summaries
- [ ] **G10** — Jira/Linear integration + Slack notifications
- [ ] **G11** — Mobile PWA companion
- [ ] **A15** — Performance monitoring + load testing baseline
- [ ] Advanced: multi-tenant architecture planning
**Acceptance:** AI suggestions active, Jira sync live, mobile app installable.
---
## Part 5: Risk Register
| # | Risk | Probability | Impact | Mitigation |
|---|------|-------------|--------|------------|
| R1 | Regressions ship to production because no CI gate exists | HIGH | CRITICAL | Sprint 0 is mandatory before any feature work |
| R2 | Timeline 3.3K LOC becomes unmaintainable | MEDIUM | HIGH | Decompose into sub-hook modules when next touching timeline |
| R3 | Dashboard performance degrades with data growth | MEDIUM | MEDIUM | G2 (caching) in Sprint 1; monitor query times |
| R4 | Prisma schema changes break dev workflow | HIGH | LOW | Automate restart in dev scripts (already documented) |
| R5 | Skill matrix AI costs grow with usage | LOW | MEDIUM | Add token budget tracking in SystemSettings |
| R6 | No data backup strategy | MEDIUM | CRITICAL | Add pg_dump cron + S3 upload in Sprint 0 |
| R7 | Single-point-of-failure (1 dev, 1 server) | HIGH | CRITICAL | Document architecture, automate deployment, enable team onboarding |
---
## Part 6: Key Metrics to Track
### Product Metrics
- **Time-to-staff**: Minutes from demand creation to resource assignment
- **Estimate turnaround**: Days from estimate creation to approval
- **Vacation approval latency**: Hours from request to decision
- **Dashboard load time**: P95 response time for dashboard page
- **Chargeability accuracy**: Forecast vs actual deviation %
### Engineering Metrics
- **Test coverage**: % by package (target: all >=80%)
- **CI green rate**: % of PRs passing all gates
- **Build time**: Minutes for full `next build`
- **Error rate**: Sentry exceptions per hour
- **API latency**: P95 tRPC procedure response time
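Both P95 metrics above use the same computation: the value below which 95% of observations fall. A nearest-rank sketch, for clarity on what the dashboards would report:

```typescript
// Nearest-rank P95: sort ascending, take the sample at ceil(0.95 * n).
function p95(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length) - 1; // 0-indexed
  return sorted[rank];
}
```

P95 is preferred over the mean here because a handful of slow outliers (cold caches, GC pauses) barely move an average but are exactly what users experience as "the dashboard is slow sometimes".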
---
## Appendix: Current State Snapshot
| Dimension | Count |
|-----------|-------|
| Database models | 47 |
| tRPC routers | 28 |
| tRPC procedures | ~200 (120Q + 80M) |
| Frontend routes | 34 |
| Domain components | 109+ |
| Shared UI components | 20+ |
| Unit test files | 62 |
| E2E test specs | 4 |
| Engine test coverage | 95% (gated) |
| Staffing test coverage | 90% (gated) |
| API router test coverage | ~5% (not gated) |
| CI/CD pipeline | None |
| Production Docker | None |
| Monitoring/APM | None |
| Completed phases | 9 |
| Known pain points | 24 (documented in LEARNINGS.md) |