# CapaKraken CI/CD Manual
## Overview
This is the operational runbook for the canonical CapaKraken delivery path:
1. CI validates every PR.
2. Every push to `main` publishes immutable release images.
3. Staging deploys one `sha-<commit>` tag.
4. Production promotes the same tag.
5. The host never builds application code from Git.
## 1. CI Gate
The merge gate is [ci.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/ci.yml).
It covers:
- architecture guardrails
- typecheck
- lint
- unit tests
- build
- E2E
Before merging, all required checks must pass.
Useful local commands:
```bash
pnpm --filter @capakraken/web exec tsc --project tsconfig.typecheck.json --noEmit
pnpm lint
pnpm test:unit
pnpm --filter @capakraken/web exec next build
```
## 2. Image Release
[release-image.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/release-image.yml) runs automatically on every push to `main`.
It publishes:
- `ghcr.io/<owner>/<repo>-app:sha-<commit>`
- `ghcr.io/<owner>/<repo>-migrator:sha-<commit>`
The workflow can also be triggered manually when a rebuild or a tag override is needed.
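As a rough illustration of the naming scheme above, both references can be derived from the repository slug and the commit. This is a sketch, not the workflow's actual logic: the function name `image_refs` is hypothetical, and because this manual does not say whether the tag carries the full or an abbreviated commit hash, the commit is passed through unchanged. The lowercase conversion reflects the general rule that image repository names must be lowercase.

```bash
# Hypothetical helper: print the app and migrator image references
# for a given "<owner>/<repo>" slug and commit identifier.
image_refs() {
  local slug="$1" commit="$2"
  local repo
  # Image repository names must be lowercase, so normalise mixed-case slugs.
  repo="$(printf '%s' "$slug" | tr '[:upper:]' '[:lower:]')"
  printf 'ghcr.io/%s-app:sha-%s\n' "$repo" "$commit"
  printf 'ghcr.io/%s-migrator:sha-%s\n' "$repo" "$commit"
}
```

For example, `image_refs "<owner>/<repo>" "$(git rev-parse HEAD)"` prints one reference per line, app first.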
## 3. Host Bootstrap
Each deploy target should have a dedicated directory such as `/opt/capakraken` containing:
```text
docker-compose.prod.yml
.env.production
deploy.env
tooling/deploy/deploy-compose.sh
```
Use these examples from the repo:
- [tooling/deploy/.env.production.example](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/.env.production.example)
- [tooling/deploy/deploy.env.example](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/deploy.env.example)
Important host-side rules:
- keep `RATE_LIMIT_BACKEND=redis`
- keep runtime secrets in `.env.production` or the platform secret layer
- do not rotate runtime secrets through admin settings
- ensure the host can pull from `ghcr.io`
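The first rule lends itself to a small preflight check before a deploy. The following is a minimal sketch, assuming `.env.production` uses plain `KEY=value` lines. The function name `check_rate_limit_backend` is hypothetical and is not part of the repo tooling.

```bash
# Hypothetical preflight check: succeed only if the env file sets
# RATE_LIMIT_BACKEND to redis (value may be quoted or unquoted).
check_rate_limit_backend() {
  local env_file="$1"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 2
  fi
  local value
  # Use the last assignment in case the key appears more than once,
  # and strip surrounding quotes.
  value="$(sed -n 's/^RATE_LIMIT_BACKEND=//p' "$env_file" | tail -n 1 | tr -d "\"'")"
  if [ "$value" != "redis" ]; then
    echo "RATE_LIMIT_BACKEND must be redis, found '${value:-<unset>}'" >&2
    return 1
  fi
}
```

Run it as `check_rate_limit_backend .env.production` from the deploy directory. It returns 0 when the rule holds, 1 when the value is wrong or missing, and 2 when the file cannot be read.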
Generate a secure `NEXTAUTH_SECRET` with:
```bash
openssl rand -base64 32
```
## 4. Staging Deployment
Standard path:
1. merge to `main`
2. wait for [release-image.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/release-image.yml) to publish `sha-<commit>`
3. run [deploy-staging.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/deploy-staging.yml) with that tag
The workflow uploads:
- [docker-compose.prod.yml](/home/hartmut/Documents/Copilot/capakraken/docker-compose.prod.yml)
- [tooling/deploy](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/README.md)
- a short-lived `deploy.env`
On the host, [deploy-compose.sh](/home/hartmut/Documents/Copilot/capakraken/tooling/deploy/deploy-compose.sh):
1. validates the rendered compose file
2. pulls `APP_IMAGE` and `MIGRATOR_IMAGE`
3. starts PostgreSQL and Redis
4. runs Prisma migrations with the `migrator` image
5. starts the app
6. waits for `GET /api/ready`
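The ordering of steps 1 to 5 can be sketched as a short script. This is not the contents of `deploy-compose.sh`; it only illustrates the sequence. The service names `postgres` and `redis`, the `DRY_RUN` switch, and the `run` and `deploy_sequence` helpers are assumptions made for this example. The `app` and `migrator` service names appear in the troubleshooting commands in section 9. The readiness wait from step 6 is left out.

```bash
# Illustrative only: the real logic lives in tooling/deploy/deploy-compose.sh.
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.prod.yml}"

# With DRY_RUN=1 (an assumption of this sketch) commands are printed, not run.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

deploy_sequence() {
  # 1. validate the rendered compose file
  run docker compose -f "$COMPOSE_FILE" config -q || return
  # 2. pull the app and migrator images
  run docker compose -f "$COMPOSE_FILE" pull app migrator || return
  # 3. start the backing services (service names are assumed)
  run docker compose -f "$COMPOSE_FILE" up -d postgres redis || return
  # 4. apply migrations before the app starts
  run docker compose -f "$COMPOSE_FILE" run --rm migrator || return
  # 5. start the app
  run docker compose -f "$COMPOSE_FILE" up -d app
}
```

`DRY_RUN=1 deploy_sequence` prints the five commands in order without touching Docker. Each step returns early on failure, so a failed migration never leads to an app start.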
## 5. Production Promotion
After staging is accepted:
1. run [deploy-prod.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/deploy-prod.yml)
2. use the exact same `sha-<commit>` tag
3. verify `GET /api/ready`
Production must promote the already-tested image, not rebuild from source.
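One way to guard against mixing tags is to compare the tag portion of the image references before promoting. The helpers below are hypothetical and only a sketch: they assume every reference ends in `:<tag>` and carries no digest suffix.

```bash
# Hypothetical helpers: extract the tag from an image reference and
# confirm that two references carry the same tag.
image_tag() {
  # Everything after the last ':'. Assumes the reference has a tag
  # and no digest (@sha256:...) suffix.
  printf '%s\n' "${1##*:}"
}

same_tag() {
  [ "$(image_tag "$1")" = "$(image_tag "$2")" ]
}
```

For example, `same_tag "$APP_IMAGE" "$MIGRATOR_IMAGE" || echo "tag mismatch" >&2` flags a `deploy.env` in which the app and migrator images point at different commits.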
## 6. Manual Host Dry Run
If you need to verify the host outside GitHub Actions:
```bash
cp tooling/deploy/.env.production.example .env.production
cp tooling/deploy/deploy.env.example deploy.env
# fill in real secrets and image refs first
# export every variable assigned while sourcing deploy.env
set -a
. ./deploy.env
set +a
# run the host-side deploy script
bash tooling/deploy/deploy-compose.sh staging
```
## 7. Health Endpoints
### GET `/api/health`
Process liveness only. Use it for coarse uptime checks.
### GET `/api/ready`
Checks PostgreSQL and Redis connectivity. Use it for deploy readiness and traffic admission.
For deploys, `/api/ready` is the source of truth.
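A readiness wait can be sketched as a bounded retry loop around a probe. The helper names `wait_until` and `probe_ready` are hypothetical. The probe assumes that `/api/ready` reports a failed dependency through its HTTP status code, which this manual does not state explicitly.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Probe that treats HTTP 4xx/5xx responses and connection errors as "not ready".
probe_ready() {
  curl -fsS -o /dev/null "http://127.0.0.1:${APP_HOST_PORT:-3000}/api/ready"
}
```

`wait_until 30 2 probe_ready` polls for roughly one minute and returns non-zero if the app never reports ready, which lets a deploy script fail loudly instead of hanging.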
## 8. Rollback
Rollback is image-based:
1. choose the previous healthy `sha-<commit>`
2. rerun the staging or production deploy workflow with that tag
3. confirm `GET /api/ready`
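Rollback restores the image, not the database schema. Schema changes therefore still need expand-and-contract discipline, so that the previous image keeps working against the already-migrated database.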
## 9. Troubleshooting
### CI failure
Run the failing command locally:
```bash
pnpm --filter @capakraken/web exec tsc --project tsconfig.typecheck.json --noEmit
pnpm lint
pnpm test:unit
pnpm --filter @capakraken/web exec next build
```
### Deploy fails before container start
Check the rendered compose configuration on the host:
```bash
docker compose -f docker-compose.prod.yml config -q
```
Then verify `.env.production` and `deploy.env`.
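A quick way to spot an incomplete file is to list the variables that are missing or empty. The helper below is a hypothetical sketch for plain `KEY=value` files. Which variables each file must define is not listed in this manual, so the names passed in the usage example are assumptions based on the image variables mentioned in section 4.

```bash
# Hypothetical helper: print every listed variable that is missing or empty
# in a KEY=value file. Returns non-zero if anything was printed.
missing_vars() {
  local file="$1" name status=0
  shift
  for name in "$@"; do
    # A line counts only if something other than a quote follows the '='.
    if ! grep -Eq "^${name}=[\"']?[^\"']" "$file"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_vars deploy.env APP_IMAGE MIGRATOR_IMAGE` prints nothing and returns 0 when both image references are set.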
### App never becomes ready
Check:
```bash
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs --tail 200 app
curl -sS "http://127.0.0.1:${APP_HOST_PORT:-3000}/api/ready"
```
### Database migration failure
Re-run the migrator in the foreground and inspect its output:
```bash
docker compose -f docker-compose.prod.yml run --rm migrator
```
### Registry pull failure
Verify `GHCR_USERNAME` and `GHCR_TOKEN`, then test:
```bash
printf '%s\n' "$GHCR_TOKEN" | docker login ghcr.io -u "$GHCR_USERNAME" --password-stdin
```