# CI/CD Target Architecture

## Goal

This document describes the intended delivery model for CapaKraken. It does not immediately replace the manual production setup that is currently in use.

The target state is:

1. CI validates every PR.
2. GitHub Actions builds immutable Docker images.
3. Staging and production pull those exact images from a registry.
4. Database migrations run as an explicit deploy step.
5. A release is considered safe for traffic only after the app responds successfully to `GET /api/ready`.

## Core Idea

The production host should stop building application code from a Git checkout. Instead, it should only:

- pull a versioned `app` image
- pull a matching `migrator` image
- run Prisma deploy migrations
- start the application container
- wait for readiness

That removes "works on the server but not in CI" drift and makes rollbacks much simpler.
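The final step, waiting for readiness, can be sketched as a small polling loop. This is a minimal illustration, not the project's actual script: the function name `wait_for_ready`, the attempt count, the delay, and the host port in the usage comment are all assumptions.

```bash
# Sketch: poll a readiness probe until it succeeds or the attempts run out.
# Usage: wait_for_ready <max_attempts> <delay_seconds> <probe command...>
wait_for_ready() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    # The probe is any command that exits 0 once the app is ready.
    if "$@" >/dev/null 2>&1; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    # Sleep only between attempts, not after the final failure.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "not ready after $max_attempts attempt(s)" >&2
  return 1
}

# Example with an assumed host port of 3000; -f makes curl exit non-zero on HTTP errors.
# wait_for_ready 30 2 curl -fsS http://127.0.0.1:3000/api/ready
```

Passing the probe as a command keeps the loop independent of how readiness is checked, so the same function could wrap `curl`, a container health check, or a stub in tests.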

## Delivery Flow

### 1. Pull Request Validation

The existing `CI` workflow continues to validate:

- architecture guardrails for SSE audience scoping
- typecheck
- lint
- unit tests
- build
- E2E

This remains the quality gate before merge.

The guardrail step currently enforces three invariants:

- no role-based SSE audience fan-out in [event-bus.ts](../packages/api/src/sse/event-bus.ts)
- no role-derived subscription audiences in [subscription-policy.ts](../packages/api/src/sse/subscription-policy.ts)
- no client-provided audience parsing in [route.ts](../apps/web/src/app/api/sse/timeline/route.ts)

### 2. Image Build

The new manually triggered workflow [release-image.yml](../.github/workflows/release-image.yml) builds two images from [Dockerfile.prod](../Dockerfile.prod):

- `runner` target as the production app image
- `migrator` target as the Prisma migration image

Recommended tag format:

- `sha-<git-commit>`

Example:

```text
ghcr.io/<owner>/capakraken-app:sha-abc123
ghcr.io/<owner>/capakraken-migrator:sha-abc123
```
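One way to keep the two references in lockstep is to derive both from the same commit SHA. The helper below is a sketch: the function name `image_ref` and the owner value `acme` in the usage comments are hypothetical, and the hex check is an added safeguard rather than something the workflow is known to do.

```bash
# Sketch: derive an image reference from owner, image name, and commit SHA.
# Usage: image_ref <owner> <app|migrator> <commit-sha>
image_ref() {
  local owner="$1" name="$2" sha="$3"
  # Reject empty input and anything that is not lowercase hex,
  # which is how git prints commit hashes.
  case "$sha" in
    "" | *[!0-9a-f]*)
      echo "invalid commit sha: $sha" >&2
      return 1
      ;;
  esac
  printf 'ghcr.io/%s/capakraken-%s:sha-%s\n' "$owner" "$name" "$sha"
}

# Both references share one SHA, so app and migrator cannot drift apart.
# image_ref acme app abc123
# image_ref acme migrator abc123
```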

### 3. Staging Deploy

The staging workflow [deploy-staging.yml](../.github/workflows/deploy-staging.yml) is intended to:

1. connect to the staging host over SSH
2. copy the deploy assets
3. export `APP_IMAGE` and `MIGRATOR_IMAGE`
4. run [deploy-compose.sh](../tooling/deploy/deploy-compose.sh)

The compose file used for this target flow is [docker-compose.cicd.yml](../docker-compose.cicd.yml).
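The hand-off of `APP_IMAGE` and `MIGRATOR_IMAGE` from the workflow to the host could look roughly like the sketch below. It assumes the image references travel in a plain `KEY=VALUE` file; the function name `write_deploy_env` and the restrictive file mode are illustrative choices, not confirmed details of the workflow.

```bash
# Sketch: write the short-lived image references to a KEY=VALUE file.
# Usage: write_deploy_env <target-file> <app-image> <migrator-image>
write_deploy_env() {
  local target="$1" app_image="$2" migrator_image="$3"
  # The subshell keeps the tighter umask from leaking into the caller.
  (
    umask 077
    printf 'APP_IMAGE=%s\nMIGRATOR_IMAGE=%s\n' \
      "$app_image" "$migrator_image" > "$target"
  )
}

# Example with a hypothetical owner and tag:
# write_deploy_env ./deploy.env \
#   ghcr.io/acme/capakraken-app:sha-abc123 \
#   ghcr.io/acme/capakraken-migrator:sha-abc123
```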

### 4. Production Promotion

The production workflow [deploy-prod.yml](../.github/workflows/deploy-prod.yml) follows the same logic as staging, but the image tag is promoted manually.

Production therefore runs an image that was already built and, ideally, already exercised in staging.

## Required Infrastructure

### Minimum

- GitHub repository with Actions enabled
- GHCR or another container registry
- one Linux host with Docker and Docker Compose
- PostgreSQL
- Redis
- reverse proxy such as nginx
- SSH access from GitHub Actions to the host

### Recommended

- separate staging and production hosts
- GitHub Environments for `staging` and `production`
- required reviewer approval for `production`
- backup strategy for PostgreSQL volumes
- uptime monitoring and error tracking

## Secrets

### GitHub Environment Secrets

For `staging`:

- `STAGING_SSH_HOST`
- `STAGING_SSH_PORT`
- `STAGING_SSH_USER`
- `STAGING_SSH_KEY`
- `STAGING_DEPLOY_PATH`
- `STAGING_APP_HOST_PORT`
- `STAGING_GHCR_USERNAME`
- `STAGING_GHCR_TOKEN`

For `production`:

- `PROD_SSH_HOST`
- `PROD_SSH_PORT`
- `PROD_SSH_USER`
- `PROD_SSH_KEY`
- `PROD_DEPLOY_PATH`
- `PROD_APP_HOST_PORT`
- `PROD_GHCR_USERNAME`
- `PROD_GHCR_TOKEN`

### Host-side Files

Each target host should already have:

- `.env.production`
- Docker and Docker Compose installed
- network access to the container registry

The repository now also contains a small host example at [tooling/deploy/.env.production.example](../tooling/deploy/.env.production.example) and an operator note at [tooling/deploy/README.md](../tooling/deploy/README.md).

### Minimum Host Bootstrap

For each target host, create a dedicated deploy directory such as `/opt/capakraken` and place these files there:

```text
docker-compose.cicd.yml
.env.production
tooling/deploy/deploy-compose.sh
```

`.env.production` should hold the long-lived runtime settings, including:

```env
POSTGRES_PASSWORD=<long-random-password>
NEXTAUTH_URL=https://capakraken.example.com
NEXTAUTH_SECRET=<long-random-secret>
```

GitHub Actions injects only the short-lived image references, through `deploy.env`. The deploy script then loads both files before calling Docker Compose, so Compose interpolation and the container runtime environment share one source of truth.
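Loading both files into one exported environment might be done as in the following sketch. It assumes both files use shell-compatible `KEY=VALUE` lines (values with special characters would need quoting); the function name `load_env_files` is hypothetical and the real `deploy-compose.sh` may work differently.

```bash
# Sketch: source each env file and export everything it defines.
# Usage: load_env_files <file>...   (later files override earlier ones)
load_env_files() {
  local file
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "missing env file: $file" >&2
      return 1
    fi
    # set -a marks every variable assigned while sourcing for export,
    # so child processes such as docker compose inherit them.
    set -a
    . "$file"
    set +a
  done
}

# Use paths that contain a slash so the shell does not search PATH:
# load_env_files ./.env.production ./deploy.env
```

Loading `deploy.env` last means the image references supplied by the workflow take precedence if `.env.production` happens to define the same names.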

### Runtime Secret Provisioning Policy

Production and staging secrets should be provisioned at the host or platform-secret layer, not through admin mutations and not through application database writes.

That includes at least:

```env
OPENAI_API_KEY=<optional-if-openai-used>
AZURE_OPENAI_API_KEY=<optional-if-azure-chat-used>
AZURE_DALLE_API_KEY=<optional-if-azure-image-gen-used>
GEMINI_API_KEY=<optional-if-gemini-used>
SMTP_PASSWORD=<required-if-smtp-auth-used>
ANONYMIZATION_SEED=<required-if-deterministic-anonymization-enabled>
```

Operational rule:

- prefer the host's secret manager or encrypted environment facility; keep these values in `.env.production` only on smaller self-managed hosts
- do not rotate or patch these values through `SystemSettings`
- use the admin settings page only to verify runtime source/status and to clear leftover legacy database copies
- after migration, legacy database secret fields should be empty in both staging and production
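Because secrets are expected to arrive through the environment, a deploy script could fail fast when one is missing instead of starting a half-configured container. The check below is a sketch; `require_env` is a hypothetical helper, and which variables are mandatory depends on the features enabled on a given host.

```bash
# Sketch: fail when any named variable is missing or empty in the environment.
# Usage: require_env <NAME>...
require_env() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what child
    # processes such as docker compose would inherit.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: always-needed settings first, conditional secrets as applicable.
# require_env POSTGRES_PASSWORD NEXTAUTH_URL NEXTAUTH_SECRET
# require_env SMTP_PASSWORD   # only when SMTP auth is used
```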

## Database Policy

For release environments, use:

```bash
pnpm --filter @capakraken/db db:migrate:deploy
```

Do not use `db:push` as the main production deployment mechanism. `db:push` is convenient for local development, but it does not provide the release traceability that migration-based deploys give.

## Rollback Model

Rollback should be image-based:

1. choose the previous good `sha-...` tag
2. run the production deploy workflow again with that tag
3. confirm readiness

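This is safe only when schema changes follow backwards-compatible expand-and-contract rules, because rolling back the image does not roll back the database: the older app version must still work against the newer schema.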
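Choosing the previous good tag is easier when deployed tags are recorded somewhere. The sketch below assumes a hypothetical history file with one successfully deployed tag per line, oldest first; neither that file nor the `previous_good_tag` helper is part of the repository as described here.

```bash
# Sketch: print the most recent deployed tag that differs from the current one.
# Usage: previous_good_tag <history-file>
previous_good_tag() {
  local history_file="$1" current
  # The last line is treated as the tag that is currently deployed.
  current=$(tail -n 1 "$history_file") || return 1
  # Remember the latest differing entry; exit 1 when there is none.
  awk -v current="$current" '
    $0 != current { prev = $0 }
    END { if (prev == "") exit 1; print prev }
  ' "$history_file"
}

# Example: feed the result into the production deploy workflow as the image tag.
# previous_good_tag /opt/capakraken/deploy-history.log
```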

## How A Production Update Works

The intended production update path is:

1. merge to `main` after the existing CI workflow is green
2. run [release-image.yml](../.github/workflows/release-image.yml) to build immutable `app` and `migrator` images tagged as `sha-<commit>`
3. run [deploy-staging.yml](../.github/workflows/deploy-staging.yml) with that exact image tag
4. GitHub Actions uploads the deploy bundle to the staging host and writes a temporary `deploy.env`
5. [deploy-compose.sh](../tooling/deploy/deploy-compose.sh) pulls images, starts PostgreSQL and Redis, runs Prisma deploy migrations, starts the new app container, and waits for `GET /api/ready`
6. after staging is accepted, run [deploy-prod.yml](../.github/workflows/deploy-prod.yml) with the same tag
7. production repeats the same image-based flow, so the running artifact matches staging

That means the production host no longer builds from Git. It only receives a versioned image and starts it after migrations complete.

The same principle applies to secrets: the running container reads them from the deployment environment at start time, so an update needs only a new image tag unless the secret material itself is being rotated.
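The host-side order of operations described above can be sketched as a short script. This is an illustration, not the contents of `deploy-compose.sh`: the Compose service names (`postgres`, `redis`, `migrator`, `app`), the `DRY_RUN` switch, and the retry settings are assumptions, and it presumes the `migrator` service runs the migrations as its default command.

```bash
# Sketch: run a command, or only print it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: deploy_sequence <compose-file> <ready-url>
# Each step runs only if the previous one succeeded.
deploy_sequence() {
  local compose_file="$1" ready_url="$2"
  run_step docker compose -f "$compose_file" pull &&
  run_step docker compose -f "$compose_file" up -d postgres redis &&
  run_step docker compose -f "$compose_file" run --rm migrator &&
  run_step docker compose -f "$compose_file" up -d app &&
  # curl retries refused connections and transient HTTP errors such as 503.
  run_step curl -fsS --retry 30 --retry-delay 2 --retry-connrefused "$ready_url"
}

# Preview the plan without touching Docker (assumed host port 3000):
# DRY_RUN=1 deploy_sequence docker-compose.cicd.yml http://127.0.0.1:3000/api/ready
```

Running migrations before the new app container starts is what lets the readiness check stand in for "schema and code agree"; a failed migration stops the chain before any traffic-serving container is replaced.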

## Current Status

The repository now contains the CI/CD scaffolding, but the existing manual production setup remains untouched:

- current manual compose flow: [docker-compose.prod.yml](../docker-compose.prod.yml)
- current manual runbook: [ci-cd-manual.md](ci-cd-manual.md)

This allows the team to introduce the new path gradually instead of switching production in one step.