feat: Sprint 0 — CI/CD pipeline, production Docker, health checks

CI Pipeline (.github/workflows/ci.yml):
- 5 jobs: typecheck, lint, test, build, e2e (parallel where possible)
- PostgreSQL 16 + Redis 7 service containers for test/e2e
- pnpm store, Turborepo, Playwright browser caching
- Concurrency groups cancel in-progress runs

Production Docker:
- Dockerfile.prod: 3-stage build (deps → build → runtime ~150MB)
- docker-compose.prod.yml: postgres + redis + app with health checks
- .dockerignore for fast builds
- next.config.ts: output: "standalone" for minimal runtime

Health Check Endpoints:
- GET /api/health — liveness probe (200 OK, no deps)
- GET /api/ready — readiness probe (postgres + redis connectivity)

Documentation:
- docs/ci-cd-manual.md — full pipeline manual with troubleshooting
- plan.md — Product Owner strategic plan (bottlenecks, growth, automation)

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-03-19 20:33:18 +01:00
parent c02f453679
commit 0d78fe1770
9 changed files with 1070 additions and 210 deletions
@@ -0,0 +1,316 @@
# Planarchy CI/CD Manual
## Overview
Planarchy uses GitHub Actions for continuous integration and Docker for deployment. This document covers the full pipeline from code push to production.
---
## 1. CI Pipeline (Automatic on every PR and push to `main`)
### What triggers it
| Event | Result |
|-------|---------|
| Pull request to `main` | All CI jobs run |
| Push to `main` | All CI jobs run |
### Jobs and their purpose
```
PR opened / pushed
├──→ typecheck (tsc --noEmit, ~40s)
├──→ lint (ESLint via Turborepo, ~20s)
├──→ test (Vitest unit tests, ~60s, needs PostgreSQL + Redis)
└──→ build (next build, ~90s, runs after typecheck)
     └──→ e2e (Playwright, ~3-5min, runs after build)
```
**typecheck, lint, and test run in parallel** for speed. Build waits for typecheck. E2E waits for build.
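The ordering described above can be sketched as a small dependency graph. This is only an illustration of how the jobs group into parallel stages; the `needs` object and the `waves` helper are hypothetical names and are not taken from the workflow file.

```typescript
// Job dependencies as described in this section (names mirror the CI jobs).
const needs: Record<string, string[]> = {
  typecheck: [],
  lint: [],
  test: [],
  build: ["typecheck"],
  e2e: ["build"],
};

// Group jobs into stages: a job lands one stage after its latest dependency.
// Assumes the graph has no cycles (there is no cycle detection here).
function waves(graph: Record<string, string[]>): string[][] {
  const depth = new Map<string, number>();
  const visit = (job: string): number => {
    const cached = depth.get(job);
    if (cached !== undefined) return cached;
    const deps = graph[job] ?? [];
    const d = deps.length === 0 ? 0 : 1 + Math.max(...deps.map((dep) => visit(dep)));
    depth.set(job, d);
    return d;
  };
  const result: string[][] = [];
  for (const job of Object.keys(graph)) {
    const d = visit(job);
    (result[d] ??= []).push(job);
  }
  return result;
}

console.log(JSON.stringify(waves(needs)));
// [["typecheck","lint","test"],["build"],["e2e"]]
```

Read this way, the slowest path through the pipeline is typecheck, then build, then e2e; lint and test finish alongside it.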
### What each job checks
| Job | Command | What it catches |
|-----|---------|----------------|
| **typecheck** | `pnpm --filter @planarchy/web exec tsc --noEmit` | Type errors across the full web app |
| **lint** | `pnpm lint` | Code style violations, unused imports, etc. |
| **test** | `pnpm test:unit` | Unit test failures in engine, staffing, API, shared |
| **build** | `pnpm --filter @planarchy/web exec next build` | SSR errors, dynamic import issues, bundle problems |
| **e2e** | `pnpm test:e2e` | End-to-end user flow regressions |
### Required status checks
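All 5 jobs should be required to pass before a PR can be merged. GitHub only enforces this once it is configured: in the repository, go to Settings > Branches > Branch protection rules and enable "Require status checks to pass before merging", then select the five jobs.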
### Caching
The pipeline caches these artifacts to speed up subsequent runs:
| Cache | Key | Saves |
|-------|-----|-------|
| pnpm store | `pnpm-lock.yaml` hash | ~30s install time |
| Turborepo | `.turbo` directory | ~60s on unchanged packages |
| Playwright browsers | Playwright version | ~45s browser download |
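The idea behind a lockfile-based cache key can be illustrated with a short sketch: identical lockfile contents produce the same key (cache hit), and any dependency change produces a new key (cache miss, fresh install). The `cacheKey` helper and the `pnpm-store` prefix are assumptions for illustration; in a GitHub Actions workflow the built-in `hashFiles` expression normally plays this role.

```typescript
import { createHash } from "node:crypto";

// Derive a cache key from the lockfile contents: same contents, same key.
function cacheKey(prefix: string, lockfileContents: string): string {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${prefix}-${digest}`;
}

const before = cacheKey("pnpm-store", "lockfileVersion: '9.0'\n");
const after = cacheKey("pnpm-store", "lockfileVersion: '9.0'\n# dependency bumped\n");
console.log(before === after); // false: a changed lockfile invalidates the cache
```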
---
## 2. Local Development Quality Gates
Run these before pushing to catch issues early:
```bash
# Quick check (< 2 min)
pnpm --filter @planarchy/web exec tsc --noEmit && pnpm lint
# Unit tests (< 3 min)
pnpm test:unit
# Production build, completing the full check (< 5 min)
pnpm --filter @planarchy/web exec next build
```
### Pre-commit hook (optional)
You can add a Git pre-commit hook to run the quick check automatically:
```bash
# .husky/pre-commit
pnpm --filter @planarchy/web exec tsc --noEmit
pnpm lint
```
---
## 3. Health Check Endpoints
Two endpoints are available for monitoring:
### GET `/api/health` — Liveness Probe
Returns 200 if the Node.js process is running. No external dependencies checked.
```json
{ "status": "ok", "timestamp": "2026-03-19T10:00:00.000Z" }
```
**Use for:** Kubernetes/Docker liveness probe, uptime monitoring.
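A liveness handler of this kind can be very small. The following is a minimal sketch, assuming a Next.js App Router route file (for example `app/api/health/route.ts`; the exact path in this repository is an assumption) and using only the standard `Response` class. It is not necessarily how the endpoint in this commit is written.

```typescript
// Hypothetical route handler for GET /api/health.
// It touches no database or cache, so it only proves the process can answer HTTP.
export function GET(): Response {
  const body = { status: "ok", timestamp: new Date().toISOString() };
  return new Response(JSON.stringify(body), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}
```

Keeping dependencies out of the liveness probe matters: if the probe failed whenever PostgreSQL was briefly unreachable, an orchestrator would restart a perfectly healthy app process.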
### GET `/api/ready` — Readiness Probe
Checks PostgreSQL and Redis connectivity. Returns 200 if all services are reachable, 503 if not.
```json
// Healthy
{ "status": "ready", "postgres": "ok", "redis": "ok" }
// Unhealthy
{ "status": "not_ready", "postgres": "ok", "redis": "error" }
```
**Use for:** Kubernetes/Docker readiness probe, load balancer health checks, nginx upstream checks.
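The readiness logic can be sketched as two steps: probe each dependency, then fold the results into the response shape shown above. Everything here is illustrative; `pingPostgres` and `pingRedis` are hypothetical stand-ins (for example a `SELECT 1` query and a Redis `PING`), not functions from this codebase.

```typescript
type CheckResult = "ok" | "error";

interface Readiness {
  httpStatus: 200 | 503;
  body: { status: "ready" | "not_ready"; postgres: CheckResult; redis: CheckResult };
}

// Combine per-dependency results into the documented response shape.
function summarize(postgres: CheckResult, redis: CheckResult): Readiness {
  const ready = postgres === "ok" && redis === "ok";
  return {
    httpStatus: ready ? 200 : 503,
    body: { status: ready ? "ready" : "not_ready", postgres, redis },
  };
}

// Run one probe and map any thrown error or rejected promise to "error".
async function probe(check: () => Promise<unknown>): Promise<CheckResult> {
  try {
    await check();
    return "ok";
  } catch {
    return "error";
  }
}

// Probe both dependencies concurrently so one slow service does not delay the other.
async function checkReadiness(
  pingPostgres: () => Promise<unknown>,
  pingRedis: () => Promise<unknown>,
): Promise<Readiness> {
  const [postgres, redis] = await Promise.all([probe(pingPostgres), probe(pingRedis)]);
  return summarize(postgres, redis);
}
```

In a real handler each probe would likely also need a timeout, so that a hung connection yields a 503 instead of a stalled health check.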
---
## 4. Production Docker Build
### Building the production image
```bash
# Build the image
docker build -f Dockerfile.prod -t planarchy:latest .
# Test it locally
docker compose -f docker-compose.prod.yml up -d
```
### Image details
| Property | Value |
|----------|-------|
| Base | `node:20-bookworm-slim` |
| Size | ~150-200 MB (vs ~1.5 GB dev image) |
| Output | Next.js standalone mode |
| Healthcheck | `curl -f http://localhost:3000/api/health` |
| Port | 3000 (internal), mapped to 3100 externally |
### Environment variables
The production image requires these environment variables:
```env
# Required
DATABASE_URL=postgresql://user:pass@host:5432/planarchy
REDIS_URL=redis://host:6379
NEXTAUTH_URL=https://planarchy.your-domain.com
NEXTAUTH_SECRET=<random-string-of-at-least-32-chars>
# Optional
SENTRY_DSN=https://xxx@sentry.io/xxx
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=notifications@example.com
SMTP_PASSWORD=<password>
SMTP_FROM=Planarchy <notifications@example.com>
```
Generate a secure `NEXTAUTH_SECRET` (32 random bytes, base64-encoded to 44 characters):
```bash
openssl rand -base64 32
```
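Missing variables are easier to diagnose when the app fails fast at startup. The sketch below checks the four required variables listed above; the `missingEnv` helper is a hypothetical name, and whether the app already performs such a check is not stated in this manual.

```typescript
// The required variables from the list above.
const REQUIRED = ["DATABASE_URL", "REDIS_URL", "NEXTAUTH_URL", "NEXTAUTH_SECRET"] as const;

// Return the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED.filter((name) => !env[name]?.trim());
}

console.log(missingEnv({ DATABASE_URL: "postgresql://user:pass@host:5432/planarchy", REDIS_URL: " " }));
// [ 'REDIS_URL', 'NEXTAUTH_URL', 'NEXTAUTH_SECRET' ]
```

At startup this would be called as `missingEnv(process.env)`, throwing an error that names the missing variables if the returned list is not empty.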
---
## 5. Deployment
### docker-compose (simplest)
```bash
# On your server
git pull
docker compose -f docker-compose.prod.yml up -d --build
# Apply the Prisma schema (db push syncs the schema directly and records no migration history)
docker compose -f docker-compose.prod.yml exec app \
  npx prisma db push --skip-generate
# Seed initial data (first deployment only)
docker compose -f docker-compose.prod.yml exec app \
  npx prisma db seed
```
### Manual deployment (current setup)
Since `planarchy.hartmut-noerenberg.com` runs behind nginx:
```bash
# On the server
cd /home/hartmut/Documents/Copilot/planarchy
git pull origin main
pnpm install
pnpm --filter @planarchy/db exec prisma generate
pnpm --filter @planarchy/web exec next build
rm -rf apps/web/.next/cache # clear stale cache
# Restart the app (systemd, pm2, or manual)
fuser -k 3100/tcp 2>/dev/null
PORT=3100 pnpm --filter @planarchy/web start &
```
### nginx configuration
The existing nginx reverse proxy should forward to port 3100:
```nginx
server {
    server_name planarchy.hartmut-noerenberg.com;
    location / {
        proxy_pass http://127.0.0.1:3100;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # SSE support (keep connection open)
        proxy_read_timeout 86400s;
        proxy_buffering off;
    }
}
```
---
## 6. Monitoring Setup
### Sentry (error tracking)
After creating a Sentry project, add the DSN to `.env.production`:
```env
SENTRY_DSN=https://xxx@sentry.io/xxx
```
Errors are automatically captured by the Sentry integration in Next.js.
### Uptime monitoring
Point an external monitor (UptimeRobot, Better Stack, etc.) at:
```
https://planarchy.hartmut-noerenberg.com/api/health
```
Alert if status code != 200 for more than 2 consecutive checks.
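The alert rule above ("more than 2 consecutive checks") amounts to counting the trailing run of non-200 responses. A minimal sketch, with hypothetical helper names, of what a monitor would evaluate after each check:

```typescript
// Count how many of the most recent checks in a row were not 200.
function trailingFailures(statusCodes: number[]): number {
  let streak = 0;
  for (let i = statusCodes.length - 1; i >= 0 && statusCodes[i] !== 200; i--) {
    streak++;
  }
  return streak;
}

// Alert once more than `threshold` consecutive checks have failed.
function shouldAlert(statusCodes: number[], threshold = 2): boolean {
  return trailingFailures(statusCodes) > threshold;
}

console.log(shouldAlert([200, 503, 503])); // false: only 2 consecutive failures
console.log(shouldAlert([200, 503, 503, 502])); // true: 3 consecutive failures
```

Requiring a streak avoids paging on a single dropped request, at the cost of a slightly later alert.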
---
## 7. Troubleshooting
### CI job fails: "tsc --noEmit"
TypeScript error in the web app. Run locally:
```bash
pnpm --filter @planarchy/web exec tsc --noEmit
```
### CI job fails: "test:unit"
Unit test failure. Run locally:
```bash
pnpm test:unit
```
### CI job fails: "next build"
Build error (often `next/dynamic` with `ssr: false` used in a Server Component, or a missing export). Run locally:
```bash
pnpm --filter @planarchy/web exec next build
```
### CI job fails: "e2e"
Playwright test failure. Check the HTML report artifact in the GitHub Actions run.
### Production: 502 Bad Gateway
nginx cannot reach the app on port 3100, usually because the Next.js process is not running. Check:
```bash
ss -tlnp | grep 3100 # Is anything listening?
tail -50 /tmp/planarchy-dev.log # Check app logs
```
Restart:
```bash
fuser -k 3100/tcp 2>/dev/null
PORT=3100 pnpm --filter @planarchy/web start &  # production mode; use pnpm dev only for local development
```
### Production: 500 Internal Server Error
Usually a stale Prisma client after schema changes:
```bash
pnpm --filter @planarchy/db exec prisma generate
rm -rf apps/web/.next
pnpm --filter @planarchy/web exec next build
# Restart the server
```
### Database connection issues
Check the `/api/ready` endpoint:
```bash
curl -s https://planarchy.hartmut-noerenberg.com/api/ready | jq .
```
If `postgres: "error"`, verify:
```bash
docker ps | grep postgres # Is container running?
psql -h localhost -p 5433 -U planarchy -d planarchy # Can you connect?
```
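When scripting this check, the `/api/ready` body can be reduced to the list of dependencies that need attention. A small sketch, assuming the response shape documented in section 3; `failingDependencies` is a hypothetical helper name:

```typescript
// Given the parsed JSON body from /api/ready, list dependencies that did not report "ok".
function failingDependencies(body: Record<string, string>): string[] {
  return Object.entries(body)
    .filter(([key, value]) => key !== "status" && value !== "ok")
    .map(([key]) => key);
}

console.log(failingDependencies({ status: "not_ready", postgres: "ok", redis: "error" }));
// [ 'redis' ]
```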