# CI/CD Target Architecture

## Goal

This document describes the intended delivery model for CapaKraken. The currently working manual production setup is not replaced immediately; the new path is meant to be introduced alongside it.

The target state is:

1. CI validates every pull request.
2. GitHub Actions builds immutable Docker images.
3. Staging and production pull exactly those images from a registry.
4. Database migrations run as an explicit deploy step.
5. Traffic is considered safe only after the app returns a successful response to `GET /api/ready`.
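Item 5 implies retry logic around the readiness endpoint, because a freshly started container usually needs a moment before it can answer. The following is a minimal bash sketch of that idea. `wait_until` and `check_ready` are hypothetical helper names, and the base URL in the usage line is an assumption.

```bash
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical readiness probe: curl -f turns an HTTP status >= 400 into a
# non-zero exit code, and --max-time bounds each individual request.
check_ready() {
  curl -fsS --max-time 5 "$1/api/ready" >/dev/null
}
```

For example, `wait_until 30 2 check_ready http://127.0.0.1:3000` polls for roughly a minute. Relying on `curl -f` keeps the readiness contract simple: any HTTP error status counts as "not ready".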
## Core Idea

The production host should stop building application code from a Git checkout. Instead, it should only:

- pull a versioned `app` image
- pull the matching `migrator` image
- run Prisma deploy migrations
- start the application container
- wait for readiness

This removes "works on the server but not in CI" drift, because every host runs exactly the image built in GitHub Actions. It also makes rollbacks much simpler: rolling back means redeploying an earlier image tag.
## Delivery Flow

### 1. Pull Request Validation

The existing `CI` workflow continues to validate:

- architecture guardrails for SSE audience scoping
- typecheck
- lint
- unit tests
- build
- end-to-end (E2E) tests

This remains the quality gate before merge.

The guardrail step currently enforces three invariants:

- no role-based SSE audience fan-out in [event-bus.ts](/packages/api/src/sse/event-bus.ts)
- no role-derived subscription audiences in [subscription-policy.ts](/packages/api/src/sse/subscription-policy.ts)
- no client-provided audience parsing in [route.ts](/apps/web/src/app/api/sse/timeline/route.ts)
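This document does not say how the guardrail step detects these patterns. As a rough illustration only, a text-based check could look like the sketch below. `forbid_pattern` and the example regex are hypothetical, and a real guardrail may well use lint rules or AST analysis instead of `grep`.

```bash
# Fail when <file> contains a line matching the extended regex <regex>.
# Usage: forbid_pattern <file> <regex> <description>
forbid_pattern() {
  local file=$1 regex=$2 description=$3
  # A missing file must fail loudly; otherwise grep's exit code 2 would be
  # indistinguishable from "no match" inside the if below.
  if [[ ! -f $file ]]; then
    echo "guardrail target not found: $file" >&2
    return 1
  fi
  # grep -n prints offending lines together with their line numbers.
  if grep -nE -- "$regex" "$file"; then
    echo "guardrail violated in $file: $description" >&2
    return 1
  fi
  return 0
}
```

A hypothetical invocation would be `forbid_pattern packages/api/src/sse/event-bus.ts 'audience.*role' "role-based SSE audience fan-out"`. The regex here is illustrative, not the project's actual rule.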
### 2. Image Build

The new, manually triggered workflow [release-image.yml](/.github/workflows/release-image.yml) builds two images from [Dockerfile.prod](/Dockerfile.prod):

- the `runner` target as the production app image
- the `migrator` target as the Prisma migration image

Recommended tag format:

- `sha-<git-commit>`

Example:

```text
ghcr.io/<owner>/capakraken-app:sha-abc123
ghcr.io/<owner>/capakraken-migrator:sha-abc123
```
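The sketch below shows how the two Dockerfile targets and the shared tag fit together. It only prints the `docker build` commands so they can be reviewed before running. `print_build_commands` is a hypothetical helper, and `release-image.yml` itself may build differently, for example with a dedicated build action.

```bash
# Print docker build commands for both Dockerfile.prod targets, tagged with
# the shared sha-<git-commit> tag.
# Usage: print_build_commands <owner> <git-commit>
print_build_commands() {
  # Image references must be lowercase, but GitHub owner names may not be.
  local owner=${1,,} commit=$2
  local registry="ghcr.io/${owner}" tag="sha-${commit}"
  echo "docker build -f Dockerfile.prod --target runner -t ${registry}/capakraken-app:${tag} ."
  echo "docker build -f Dockerfile.prod --target migrator -t ${registry}/capakraken-migrator:${tag} ."
}
```

For example, `print_build_commands my-org "$(git rev-parse --short HEAD)"` prints one command per target. Both commands share one tag, so the `app` and `migrator` images always refer to the same commit.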
### 3. Staging Deploy

The staging workflow [deploy-staging.yml](/.github/workflows/deploy-staging.yml) is intended to:

1. connect to the staging host over SSH
2. copy the deploy assets
3. export `APP_IMAGE` and `MIGRATOR_IMAGE`
4. run [deploy-compose.sh](/tooling/deploy/deploy-compose.sh)

The compose file used for this target flow is [docker-compose.cicd.yml](/docker-compose.cicd.yml).
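Step 3 ends up as a small `deploy.env` file on the host; the production update walkthrough later in this document describes it as temporary. Here is a minimal sketch of writing it. It assumes the registry layout from the tag example, and `write_deploy_env` is a hypothetical name.

```bash
# Write the per-deploy image references into <file>.
# Usage: write_deploy_env <file> <owner> <tag>
write_deploy_env() {
  # Image references must be lowercase, so normalise the owner name.
  local file=$1 owner=${2,,} tag=$3
  local registry="ghcr.io/${owner}"
  # Write a temp file and rename it, so a failed write never leaves a
  # half-written deploy.env behind.
  local tmp="${file}.tmp"
  {
    echo "APP_IMAGE=${registry}/capakraken-app:${tag}"
    echo "MIGRATOR_IMAGE=${registry}/capakraken-migrator:${tag}"
  } >"$tmp" && mv -- "$tmp" "$file"
}
```

For example, `write_deploy_env deploy.env my-org sha-abc123` produces two `KEY=value` lines. The compose file can then interpolate those values.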
### 4. Production Promotion

The production workflow [deploy-prod.yml](/.github/workflows/deploy-prod.yml) follows the same logic as staging, but the image tag to promote is chosen manually.

As a result, production runs an image that has already been built and may already have been exercised in staging.
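Because promotion is manual, a cheap guard is to accept only the `sha-<git-commit>` format. This rejects mutable tags such as `latest`. The 6-to-40 hex character bound is an assumption, chosen so that both the shortened `abc123` placeholder and full 40-character commit hashes pass. `is_promotable_tag` is a hypothetical name.

```bash
# Succeed only for immutable sha-<git-commit> tags (lowercase hex, 6 to 40 chars).
# Usage: is_promotable_tag <tag>
is_promotable_tag() {
  local re='^sha-[0-9a-f]{6,40}$'
  [[ $1 =~ $re ]]
}
```

A workflow step could run `is_promotable_tag "$TAG" || exit 1` before deploying. A format check does not prove the image exists or was accepted in staging; it only rules out obviously mutable tags.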
## Required Infrastructure

### Minimum

- GitHub repository with Actions enabled
- GHCR or another container registry
- one Linux host with Docker and Docker Compose
- PostgreSQL
- Redis
- a reverse proxy such as nginx
- SSH access from GitHub Actions to the host
### Recommended

- separate staging and production hosts
- GitHub Environments for `staging` and `production`
- required reviewer approval for `production`
- backup strategy for PostgreSQL volumes
- uptime monitoring and error tracking
## Secrets

### GitHub Environment Secrets

For `staging`:

- `STAGING_SSH_HOST`
- `STAGING_SSH_PORT`
- `STAGING_SSH_USER`
- `STAGING_SSH_KEY`
- `STAGING_DEPLOY_PATH`
- `STAGING_APP_HOST_PORT`
- `STAGING_GHCR_USERNAME`
- `STAGING_GHCR_TOKEN`

For `production`:

- `PROD_SSH_HOST`
- `PROD_SSH_PORT`
- `PROD_SSH_USER`
- `PROD_SSH_KEY`
- `PROD_DEPLOY_PATH`
- `PROD_APP_HOST_PORT`
- `PROD_GHCR_USERNAME`
- `PROD_GHCR_TOKEN`
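A workflow can fail fast when one of these secrets is missing. This sketch assumes the workflow maps each secret into an environment variable of the same name; GitHub Actions does not do that automatically. `check_deploy_secrets` is a hypothetical name.

```bash
# Verify that every deploy secret for one environment prefix is set and non-empty.
# Usage: check_deploy_secrets <prefix>   e.g. STAGING or PROD
check_deploy_secrets() {
  local prefix=$1 missing=0 name var
  for name in SSH_HOST SSH_PORT SSH_USER SSH_KEY DEPLOY_PATH APP_HOST_PORT \
              GHCR_USERNAME GHCR_TOKEN; do
    var="${prefix}_${name}"
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [[ -z ${!var:-} ]]; then
      echo "missing secret: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The function reports every missing name before failing, so one run lists all gaps instead of only the first.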
### Host-side Files

Each target host should already have:

- `.env.production`
- Docker installed
- network access to the container registry

The repository also contains a small host example at [tooling/deploy/.env.production.example](/tooling/deploy/.env.production.example) and an operator note at [tooling/deploy/README.md](/tooling/deploy/README.md).
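A small preflight check on the host can catch a missing prerequisite before any image is pulled. This sketch covers Docker, the Compose v2 plugin (`docker compose`), and `.env.production`; registry access is not checked. `preflight_host` is hypothetical, and the deploy directory layout is an assumption matching the bootstrap section.

```bash
# Check deploy-host prerequisites; reports every problem, not just the first.
# Usage: preflight_host <deploy-dir>
preflight_host() {
  local dir=$1 status=0
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    status=1
  elif ! docker compose version >/dev/null 2>&1; then
    # Compose v2 ships as a docker CLI plugin invoked as "docker compose".
    echo "docker compose plugin not available" >&2
    status=1
  fi
  if [[ ! -f $dir/.env.production ]]; then
    echo "missing $dir/.env.production" >&2
    status=1
  fi
  return "$status"
}
```

Running `preflight_host /opt/capakraken` on a fresh host shows which prerequisites are still missing.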
### Minimum Host Bootstrap

For each target host, create a dedicated deploy directory such as `/opt/capakraken` and place these files there:

```text
docker-compose.cicd.yml
.env.production
tooling/deploy/deploy-compose.sh
```

`.env.production` should hold the long-lived runtime settings, including:

```env
POSTGRES_PASSWORD=<long-random-password>
NEXTAUTH_URL=https://capakraken.example.com
NEXTAUTH_SECRET=<long-random-secret>
```

GitHub Actions injects only the short-lived, per-deploy image references, through `deploy.env`. The deploy script then loads both files before calling Docker Compose, so Compose variable interpolation and the container runtime environment use the same source of truth.
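Below is a minimal sketch of that loading step. It assumes both files sit in the deploy directory and contain plain `KEY=value` lines that are also valid shell. `load_deploy_env` is hypothetical. Loading `deploy.env` second, so that it wins on conflicts, is also an assumption. The real `deploy-compose.sh` may parse the files differently.

```bash
# Load .env.production, then deploy.env, exporting every assignment so both
# docker compose interpolation and child processes see the values.
# The later file wins, so deploy.env overrides .env.production on conflicts.
# Usage: load_deploy_env <deploy-dir>
load_deploy_env() {
  local dir=$1 f
  for f in "$dir/.env.production" "$dir/deploy.env"; do
    if [[ ! -f $f ]]; then
      echo "missing $f" >&2
      return 1
    fi
  done
  set -a  # auto-export every variable assigned while sourcing
  # shellcheck disable=SC1090,SC1091
  . "$dir/.env.production"
  # shellcheck disable=SC1090,SC1091
  . "$dir/deploy.env"
  set +a
}
```

Because the files are sourced as shell, any value containing spaces or shell metacharacters must be quoted. Compose's own `.env` parsing rules differ, so keeping values simple avoids surprises in either direction.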
### Runtime Secret Provisioning Policy

Production and staging secrets should be provisioned at the host or platform-secret layer, not through admin mutations and not through application database writes.

That includes at least:

```env
OPENAI_API_KEY=<optional-if-openai-used>
AZURE_OPENAI_API_KEY=<optional-if-azure-chat-used>
AZURE_DALLE_API_KEY=<optional-if-azure-image-gen-used>
GEMINI_API_KEY=<optional-if-gemini-used>
SMTP_PASSWORD=<required-if-smtp-auth-used>
ANONYMIZATION_SEED=<required-if-deterministic-anonymization-enabled>
```

Operational rules:

- prefer the host's secret manager or encrypted environment facility; keeping these values in `.env.production` is acceptable only on smaller self-managed hosts
- do not rotate or patch these values through `SystemSettings`
- use the admin settings page only to verify each secret's runtime source and status, and to clear leftover legacy copies from the database
- once secrets have been moved to the host layer, the legacy database secret fields should be empty in both staging and production
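To check a host from the shell, a status report can list which secrets are set without ever printing their values, in the same spirit as the admin status view. The key list mirrors the block above. `report_secret_status` is hypothetical, and it inspects the current shell environment, not the running container.

```bash
# Report whether each runtime secret is present, without printing any values.
# Usage: report_secret_status
report_secret_status() {
  local name
  for name in OPENAI_API_KEY AZURE_OPENAI_API_KEY AZURE_DALLE_API_KEY \
              GEMINI_API_KEY SMTP_PASSWORD ANONYMIZATION_SEED; do
    # ${!name} is bash indirect expansion; only its emptiness is inspected.
    if [[ -n ${!name:-} ]]; then
      echo "$name: set"
    else
      echo "$name: unset"
    fi
  done
}
```

Because only `set`/`unset` is printed, the output is safe to paste into a ticket or a CI log.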
## Database Policy

For release environments, use:

```bash
pnpm --filter @capakraken/db db:migrate:deploy
```

Do not use `db:push` as the main production deployment mechanism. `db:push` is convenient for local development, but because it syncs the schema directly instead of applying versioned migration files, it does not give the release traceability that a migration-based deploy requires.
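One way to make this policy hard to bypass in scripts is a guard that refuses anything other than `db:migrate:deploy` in release environments. The `staging` and `production` names mirror the GitHub Environments listed earlier. The function name and the way the environment is passed in are hypothetical.

```bash
# Refuse anything but migration-based deploys in release environments.
# Usage: assert_release_safe_db_command <environment> <package-script>
assert_release_safe_db_command() {
  local env=$1 script=$2
  case "$env" in
    staging|production)
      if [[ $script != db:migrate:deploy ]]; then
        echo "refusing '$script' in $env; use db:migrate:deploy" >&2
        return 1
      fi
      ;;
  esac
  # Other environments (e.g. local development) may use db:push freely.
  return 0
}
```

A deploy script could chain it as `assert_release_safe_db_command "$DEPLOY_ENV" db:migrate:deploy && pnpm --filter @capakraken/db db:migrate:deploy`, where `DEPLOY_ENV` is a hypothetical variable.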
## Rollback Model

Rollback should be image-based:

1. choose the previous good `sha-...` tag
2. run the production deploy workflow again with that tag
3. confirm readiness

This is only safe when schema changes follow backward-compatible expand-and-contract rules. Redeploying an older image does not undo migrations that have already been applied, so the previous image must keep working against the newer schema.
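Step 1 needs a record of which tags were deployed, and the document does not say where that record lives. This sketch assumes a hypothetical history file with one successfully deployed tag per line, oldest first (for example, appended after the readiness check passes). It returns the entry before the current one. `previous_good_tag` is also hypothetical.

```bash
# Print the tag deployed before the most recent one.
# Usage: previous_good_tag <history-file>
previous_good_tag() {
  local -a tags
  # mapfile also keeps a final line that lacks a trailing newline.
  mapfile -t tags <"$1" || return 1
  if ((${#tags[@]} < 2)); then
    echo "no previous tag recorded in $1" >&2
    return 1
  fi
  printf '%s\n' "${tags[${#tags[@]}-2]}"
}
```

The printed tag can be passed straight to the production deploy workflow. Readiness still has to be confirmed afterwards, as in step 3.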
## How A Production Update Works

The intended production update path is:

1. merge to `main` after the existing CI workflow is green
2. run [release-image.yml](/.github/workflows/release-image.yml) to build immutable `app` and `migrator` images tagged as `sha-<commit>`
3. run [deploy-staging.yml](/.github/workflows/deploy-staging.yml) with that exact image tag
4. GitHub Actions uploads the deploy bundle to the staging host and writes a temporary `deploy.env`
5. [deploy-compose.sh](/tooling/deploy/deploy-compose.sh) pulls the images, starts PostgreSQL and Redis, runs Prisma deploy migrations, starts the new app container, and waits for `GET /api/ready`
6. once staging is accepted, run [deploy-prod.yml](/.github/workflows/deploy-prod.yml) with the same tag
7. production repeats the same image-based flow, so the running artifact matches staging

As a result, the production host no longer builds from Git. It only receives a versioned image and starts it after migrations complete.

The same principle applies to secrets: the running container reads them from the deployment environment at start time, so an update only needs a new image tag unless secret material itself is being rotated.
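Step 5 can be sketched as the bash sequence below. This is not the contents of `deploy-compose.sh`. The service names (`postgres`, `redis`, `migrator`, `app`) and the `APP_HOST_PORT` variable are assumptions. The sketch also relies on Compose v2 (`up --wait`) and curl 7.71.0 or newer (`--retry-all-errors`). Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step-5 sequence: pull, start backing services, migrate, start app, wait.
# Assumes APP_IMAGE / MIGRATOR_IMAGE are already exported and that the
# compose services use the hypothetical names postgres, redis, migrator, app.
deploy_sequence() {
  local compose=(docker compose -f docker-compose.cicd.yml)
  run "${compose[@]}" pull || return 1
  # --wait blocks until the services are running (or healthy, if they define
  # a healthcheck), so migrations do not race a starting database.
  run "${compose[@]}" up -d --wait postgres redis || return 1
  # The one-off migrator container must succeed before the app starts.
  run "${compose[@]}" run --rm migrator || return 1
  run "${compose[@]}" up -d app || return 1
  # --retry-all-errors also retries HTTP errors and refused connections
  # while the app is still booting.
  run curl -fsS --retry 30 --retry-delay 2 --retry-all-errors \
    "http://127.0.0.1:${APP_HOST_PORT:-3000}/api/ready"
}
```

Each step returns early on failure, so a failed migration leaves the previous app container running instead of starting a new one against a half-migrated schema.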
## Current Status

The repository now contains the CI/CD scaffolding, but the existing manual production setup remains untouched:

- current manual compose flow: [docker-compose.prod.yml](/docker-compose.prod.yml)
- current manual runbook: [ci-cd-manual.md](/docs/ci-cd-manual.md)

This allows the team to introduce the new path gradually instead of switching production in one step.