refactor(settings): adopt environment-only runtime secret flow
@@ -11,7 +11,7 @@ At the same time, the codebase still carries several risks that are typical of f

1. some critical cross-cutting concerns are only partially productized
2. several files and routers have grown beyond comfortable ownership size
3. runtime configuration and secret handling are still too application-database centric
3. runtime secret handling is now materially cleaner, but the repo still needs to standardize the operational source of truth around that model
4. the current operational model is improving, but not yet fully standardized
5. production-grade multi-instance safeguards are not complete yet

@@ -47,10 +47,10 @@ The previously critical SSE and browser parser coverage issues were addressed du
Evidence: [assistant-tools.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/assistant-tools.ts), [resource.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/resource.ts), [allocation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/allocation.ts), [timeline.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/timeline.ts), [vacation.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/vacation.ts), and large frontend files such as [SystemSettingsClient.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/admin/SystemSettingsClient.tsx) and [TimelineProjectPanel.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/timeline/TimelineProjectPanel.tsx) are each well past the size where safe ownership stays easy.
Risk: AI-generated changes become harder to review, humans lose local reasoning context, and regressions become more likely.

2. Secret handling is still application-database centric.
Evidence: system settings mutate and persist API keys and SMTP credentials in [settings.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/settings.ts).
Risk: operational secrets remain too coupled to the main app data plane for a gold-standard project.
Update: runtime resolution is now env-first for the active secret consumers, but persistence is still transitional and should be reduced further.
2. Runtime secret policy is mostly corrected, but deploy standardization still has to catch up.
Evidence: runtime resolution and admin flows now treat environment-backed secrets as the preferred source in [settings.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/router/settings.ts), [system-settings-runtime.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/lib/system-settings-runtime.ts), and [SystemSettingsClient.tsx](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/components/admin/SystemSettingsClient.tsx).
Risk: a strong secret policy is only fully effective once staging and production provisioning use one canonical deployment path and operators clear remaining legacy database copies.
Update: the application no longer persists new operational secret values through admin settings; the remaining work is rollout discipline and cleanup completion.

3. Least-privilege is materially better documented now, but it still needs long-lived enforcement rather than relying mainly on one hardening batch.
Evidence: the route audience model is now explicit in [route-access-matrix.md](/home/hartmut/Documents/Copilot/capakraken/docs/route-access-matrix.md) and backed by multiple focused auth tests, but the remaining guarantee still depends on continuing test coverage and architecture guardrails as new routes evolve.
@@ -80,9 +80,9 @@ This is materially better than a typical startup CRUD app and already has the bo

### Security Posture

`7/10`
`7.5/10`

There are good foundations, and the most obvious real-time and comment-visibility gaps were closed, but secrets policy and long-lived least-privilege enforcement still need structural work.
There are good foundations, and the most obvious real-time, comment-visibility, and runtime-secret-policy gaps were closed, but long-lived least-privilege enforcement and operational standardization still need structural work.

### Maintainability

@@ -124,8 +124,8 @@ Goals:
- Keep SSE audience scoping under test and CI guardrails.
- Keep hardened spreadsheet parser boundaries under regression coverage.
- Treat the route access matrix and narrowed auth slices as maintained architecture contracts.
- Move production secrets out of regular application settings, or add an interim encrypted-secrets layer with clear migration path.
Status: in progress. Runtime consumers now prefer environment overrides; the remaining gap is eliminating or encrypting compatibility persistence in the admin settings path.
- Enforce the environment-only runtime secret policy operationally and clear remaining legacy database secret residue.
Status: mostly completed in code. Runtime consumers prefer environment values, admin updates no longer store new secret material, and operators now need to finish rollout/bootstrap documentation plus cleanup of old database copies.

Definition of done:

@@ -222,12 +222,11 @@ Artifacts to add:

## Suggested Order Of Execution

1. secrets policy
2. router/component decomposition
3. architecture fitness checks in CI
4. full operational standardization
5. production-grade rate limiting
6. performance hotspot reduction
1. router/component decomposition
2. architecture fitness checks in CI
3. full operational standardization
4. production-grade rate limiting
5. performance hotspot reduction

## Success Criteria For The Next 60 Days

@@ -0,0 +1,89 @@
# ADR 0001: Runtime Secret Provisioning

**Status:** Accepted
**Date:** 2026-03-30

## Context

CapaKraken historically allowed some operational runtime secrets to be persisted through `SystemSettings`.

That included values such as:

- primary AI API credentials
- dedicated DALL-E credentials
- Gemini credentials
- SMTP password
- anonymization seed

This was convenient for fast iteration, but it coupled operational secret material to the main application data plane and blurred the line between configuration metadata and deployment secrets.

The project is moving toward a production model where the running artifact should be immutable and environment-driven. That model is weakened if operators can still rotate runtime secrets through normal application writes.

## Decision

Operational runtime secrets must be provisioned outside the application database.

Allowed sources:

- deployment environment variables
- host-level secret files such as `.env.production` on self-managed infrastructure
- platform secret managers or encrypted environment facilities

Disallowed source for new secret values:

- admin updates that write runtime secrets into `SystemSettings`

`SystemSettings` remains valid for non-secret runtime metadata such as:

- provider selection
- endpoints
- model names
- SMTP host/user/from settings
- anonymization mode and domain

Legacy secret values that already exist in `SystemSettings` may still be read during migration for compatibility, but they are not the target state and should be cleared after equivalent deployment secrets are provisioned.
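
The env-first rule with a temporary legacy fallback can be sketched as follows. This is a minimal illustration and not the repo's actual resolver: `resolveSecret`, `SecretSource`, and the `legacyDbValue` parameter are hypothetical names, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the project's API.
type SecretSource = "environment" | "legacy-database" | "missing";

interface ResolvedSecret {
  value: string | undefined;
  source: SecretSource;
}

type Env = Record<string, string | undefined>;

// Treat empty or whitespace-only strings as "not set".
function nonEmpty(value: string | null | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Environment wins; a legacy database value is only a migration fallback.
function resolveSecret(
  envName: string,
  env: Env,
  legacyDbValue: string | null | undefined,
): ResolvedSecret {
  const fromEnv = nonEmpty(env[envName]);
  if (fromEnv !== undefined) {
    return { value: fromEnv, source: "environment" };
  }
  const fromDb = nonEmpty(legacyDbValue);
  if (fromDb !== undefined) {
    return { value: fromDb, source: "legacy-database" };
  }
  return { value: undefined, source: "missing" };
}
```

Returning the source alongside the value lets an operator see which secrets still depend on database residue before clearing it.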

## Consequences

Positive:

- production updates become more predictable because images and runtime secrets are managed as separate deployment concerns
- operational secrets stop depending on ordinary application write paths
- admin tooling can expose status and diagnostics without pretending to be the system of record for secrets
- secret rotation becomes an infrastructure operation rather than a product mutation

Tradeoffs:

- smaller self-managed installs need a disciplined host bootstrap process
- operators must understand that updating app settings is no longer sufficient for secret rotation
- migration requires visibility into which secrets are still backed by database residue

## Implementation Notes

The implementation should follow these rules:

1. runtime consumers resolve supported secret values from environment first
2. admin settings reads expose presence and source status, not secret values
3. admin settings updates ignore incoming secret payloads
4. the UI explains the expected environment variables for each runtime secret
5. a dedicated cleanup action removes legacy database-stored secret values after migration
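
Rule 3 above (updates ignore incoming secret payloads) could look roughly like this. The field names in `SECRET_FIELDS` and the `stripSecretFields` helper are assumptions made for illustration; the real settings schema may use different names.

```typescript
// Hypothetical field names; the actual SystemSettings schema may differ.
const SECRET_FIELDS = [
  "apiKey",
  "dalleApiKey",
  "geminiApiKey",
  "smtpPassword",
  "anonymizationSeed",
] as const;

type SettingsUpdate = Record<string, unknown>;

interface StrippedUpdate {
  persistable: SettingsUpdate; // safe to write to the settings table
  ignored: string[]; // secret fields that were dropped from the payload
}

// Split an incoming admin update into non-secret metadata and ignored secret keys.
function stripSecretFields(input: SettingsUpdate): StrippedUpdate {
  const secretNames = new Set<string>(SECRET_FIELDS);
  const persistable: SettingsUpdate = {};
  const ignored: string[] = [];
  for (const [key, value] of Object.entries(input)) {
    if (secretNames.has(key)) {
      ignored.push(key);
    } else {
      persistable[key] = value;
    }
  }
  return { persistable, ignored };
}
```

Reporting the ignored keys back to the caller gives the UI a way to tell the operator that a secret input was not stored, instead of failing silently.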

## Operational Guidance

For staging and production:

1. provision runtime secrets on the host or platform before starting a new release
2. deploy the already-built application image
3. restart the application so the new process reads the current secret source
4. verify runtime status in admin settings
5. clear any leftover legacy database secret values once the environment-backed source is confirmed

Secret rotation should follow the same model. In most cases, no application data mutation is needed. The operator updates the deployment secret source and restarts or redeploys the app.

## Follow-up

Still required after this decision:

- complete the canonical image-based staging/production rollout
- ensure staging and production hosts both use the same secret provisioning rules
- periodically verify that legacy database secret fields remain empty
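
A periodic residue check might be as small as the sketch below. It assumes the settings row has already been loaded by whatever data layer the project uses; the column names and `findLegacySecretResidue` are hypothetical.

```typescript
// Hypothetical column names for the legacy secret fields.
const LEGACY_SECRET_COLUMNS = [
  "apiKey",
  "dalleApiKey",
  "geminiApiKey",
  "smtpPassword",
  "anonymizationSeed",
] as const;

type LegacySecretColumn = (typeof LEGACY_SECRET_COLUMNS)[number];
type SettingsRow = Partial<Record<LegacySecretColumn, string | null>>;

// Return the legacy secret columns that still hold a non-empty value.
function findLegacySecretResidue(row: SettingsRow): LegacySecretColumn[] {
  return LEGACY_SECRET_COLUMNS.filter((column) => {
    const value = row[column];
    return typeof value === "string" && value.trim().length > 0;
  });
}
```

An empty result means the row is clean; a scheduled job or health check could alert when the list is non-empty.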
@@ -20,6 +20,7 @@
- comment entity support is now centralized across shared constants, API registry policy, assistant tool metadata, and the web comment target API without pretending a second consumer exists
- `resource` is now onboarded as the second real comment entity, reusing the same ownership and staff-visibility rules as the resource detail route
- comment mention autocomplete now uses a dedicated entity-scoped API route instead of inheriting the narrower `user.listAssignable` audience
- runtime secret handling is now environment-first end to end: admin updates no longer persist new operational secrets, runtime status is surfaced explicitly, and legacy database secret copies can be cleared through a dedicated cleanup path

## Next Up

@@ -52,9 +52,9 @@ These files already have unrelated local edits. Audience parity work that would

## Next Major Themes

1. convert the still-open runtime secret model away from application-database centric storage
2. add broader authorization regression coverage and long-lived guardrails around the narrowed route audiences
3. reduce oversized routers and UI ownership surfaces so audience rules stay reviewable
1. add broader authorization regression coverage and long-lived guardrails around the narrowed route audiences
2. reduce oversized routers and UI ownership surfaces so audience rules stay reviewable
3. keep runtime secret policy and role/audience boundaries aligned as adjacent architecture guardrails

## Slice Definition

+16
-2
@@ -154,6 +154,11 @@ SMTP_PORT=587
SMTP_USER=notifications@example.com
SMTP_PASSWORD=<password>
SMTP_FROM=CapaKraken <notifications@example.com>
OPENAI_API_KEY=<optional-if-openai-used>
AZURE_OPENAI_API_KEY=<optional-if-azure-chat-used>
AZURE_DALLE_API_KEY=<optional-if-azure-image-gen-used>
GEMINI_API_KEY=<optional-if-gemini-used>
ANONYMIZATION_SEED=<required-if-deterministic-anonymization-enabled>
```

Generate a secure `NEXTAUTH_SECRET`:
@@ -162,6 +167,12 @@ Generate a secure `NEXTAUTH_SECRET`:
openssl rand -base64 32
```

Runtime secret policy:

- production secrets are injected through the deployment environment or host secret store
- admin settings must not be used to enter or rotate AI, SMTP, or anonymization secrets
- the admin UI is only for status checks and cleanup of legacy database-stored secret values

---

## 5. Deployment
@@ -169,13 +180,13 @@ openssl rand -base64 32
### docker-compose (simplest)

```bash
# On your server
# On your server, after updating the host-side env/secret source
git pull
docker compose -f docker-compose.prod.yml up -d --build

# Run database migrations
docker compose -f docker-compose.prod.yml exec app \
pnpm db:push
pnpm --filter @capakraken/db db:migrate:deploy

# Seed initial data (first deployment only)
docker compose -f docker-compose.prod.yml exec app \
@@ -193,6 +204,7 @@ git pull origin main
pnpm install
pnpm db:generate
pnpm db:validate
pnpm --filter @capakraken/db db:migrate:deploy
pnpm --filter @capakraken/web exec next build
rm -rf apps/web/.next/cache # clear stale cache

@@ -203,6 +215,8 @@ PORT=3100 pnpm --filter @capakraken/web start &
Use the repo-level `pnpm db:*` commands for Prisma/database operations. They load `.env`, `.env.local`, `.env.$NODE_ENV`, and `.env.$NODE_ENV.local` automatically before invoking Prisma.
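
As a rough illustration of the file set named above, the helper below lists the candidate env files for a given `NODE_ENV`. `envFileCandidates` is a hypothetical name, and the order only mirrors the sentence above: which file wins on conflicting keys is not stated here, so this sketch makes no claim about precedence.

```typescript
// Hypothetical helper: lists candidate env files in the order the text names them.
// Precedence between files is an open assumption and is NOT encoded here.
function envFileCandidates(nodeEnv: string | undefined): string[] {
  const files = [".env", ".env.local"];
  if (nodeEnv) {
    files.push(`.env.${nodeEnv}`, `.env.${nodeEnv}.local`);
  }
  return files;
}
```

For example, `envFileCandidates("production")` includes `.env.production`, which is the host-level file this commit allows for smaller self-managed installs.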

If you rotate runtime secrets during a manual deploy, update the host-side environment source first, then restart the app so the new process reads the updated values. Do not patch those values through admin settings.

### nginx configuration

The existing nginx reverse proxy should forward to port 3100:

@@ -30,6 +30,7 @@ That removes "works on the server but not in CI" drift and makes rollbacks much

The existing `CI` workflow continues to validate:

- architecture guardrails for SSE audience scoping
- typecheck
- lint
- unit tests
@@ -38,6 +39,12 @@ The existing `CI` workflow continues to validate:

This remains the quality gate before merge.

The guardrail step currently enforces three invariants:

- no role-based SSE audience fan-out in [event-bus.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/sse/event-bus.ts)
- no role-derived subscription audiences in [subscription-policy.ts](/home/hartmut/Documents/Copilot/capakraken/packages/api/src/sse/subscription-policy.ts)
- no client-provided audience parsing in [route.ts](/home/hartmut/Documents/Copilot/capakraken/apps/web/src/app/api/sse/timeline/route.ts)

### 2. Image Build

The new manual workflow [release-image.yml](/home/hartmut/Documents/Copilot/capakraken/.github/workflows/release-image.yml) builds two images from [Dockerfile.prod](/home/hartmut/Documents/Copilot/capakraken/Dockerfile.prod):
@@ -149,6 +156,28 @@ NEXTAUTH_SECRET=<long-random-secret>

GitHub Actions only injects the short-lived image references through `deploy.env`. The deploy script then loads both files before calling Docker Compose, so compose interpolation and container runtime env use the same source of truth.

### Runtime Secret Provisioning Policy

Production and staging secrets should be provisioned at the host or platform-secret layer, not through admin mutations and not through application database writes.

That includes at least:

```env
OPENAI_API_KEY=<optional-if-openai-used>
AZURE_OPENAI_API_KEY=<optional-if-azure-chat-used>
AZURE_DALLE_API_KEY=<optional-if-azure-image-gen-used>
GEMINI_API_KEY=<optional-if-gemini-used>
SMTP_PASSWORD=<required-if-smtp-auth-used>
ANONYMIZATION_SEED=<required-if-deterministic-anonymization-enabled>
```

Operational rule:

- keep these values in `.env.production` only for smaller self-managed hosts, or preferably in the host's secret manager / encrypted environment facility
- do not rotate or patch these values through `SystemSettings`
- use the admin settings page only to verify runtime source/status and to clear leftover legacy database copies
- after migration, legacy database secret fields should be empty in both staging and production
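
A startup preflight check for the variables listed above could be sketched like this. The `RuntimeFeatures` flags and `missingSecrets` are hypothetical; only the variable names and their "needed if" conditions come from the list above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical feature flags describing which integrations a deployment uses.
interface RuntimeFeatures {
  openAiUsed: boolean;
  azureChatUsed: boolean;
  azureImageGenUsed: boolean;
  geminiUsed: boolean;
  smtpAuthUsed: boolean;
  deterministicAnonymization: boolean;
}

// Return the names of secrets that are needed by enabled features but not set.
function missingSecrets(env: Env, features: RuntimeFeatures): string[] {
  const requirements: Array<[string, boolean]> = [
    ["OPENAI_API_KEY", features.openAiUsed],
    ["AZURE_OPENAI_API_KEY", features.azureChatUsed],
    ["AZURE_DALLE_API_KEY", features.azureImageGenUsed],
    ["GEMINI_API_KEY", features.geminiUsed],
    ["SMTP_PASSWORD", features.smtpAuthUsed],
    ["ANONYMIZATION_SEED", features.deterministicAnonymization],
  ];
  return requirements
    .filter(([name, needed]) => needed && !env[name]?.trim())
    .map(([name]) => name);
}
```

Running such a check before the process starts serving traffic turns a forgotten secret into a clear deploy-time error instead of a runtime failure in an AI or SMTP call.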

## Database Policy

For release environments, use:
@@ -183,6 +212,8 @@ The intended production update path is:

That means the production host no longer builds from Git. It only receives a versioned image and starts it after migrations complete.

The same principle applies to secrets: the running container reads them from the deployment environment at start time, so an update only needs a new image tag unless secret material itself is being rotated.

## Current Status

The repository now contains the CI/CD scaffolding, but the existing manual production setup remains untouched:

+2
-1
@@ -46,7 +46,8 @@ See `.github/PULL_REQUEST_TEMPLATE.md` for the security checklist that must be c

- No secrets in source code
- Environment variables for all credentials (`DATABASE_URL`, API keys)
- `SystemSettings` table for runtime-configurable secrets (AI keys, SMTP credentials)
- Runtime application secrets are provisioned outside the application data plane through environment variables or a deployment-time secret manager
- `SystemSettings` may still contain legacy secret residue during migration, but new secret values must not be written there
- `.env` files excluded from version control via `.gitignore`

## Incident Response

@@ -65,6 +65,8 @@ publicProcedure
- Runtime secrets now resolve env-first for AI, Gemini, SMTP, and anonymization seed values. Database-backed `SystemSettings` values remain transitional compatibility storage, not the preferred production source of truth.
- Recommended runtime overrides: `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, `AZURE_DALLE_API_KEY`, `GEMINI_API_KEY`, `SMTP_PASSWORD`, `ANONYMIZATION_SEED`
- Admin settings reads expose only presence flags (`hasApiKey`, `hasSmtpPassword`, `hasGeminiApiKey`) instead of returning secret values to the browser, and those flags also reflect environment-backed runtime overrides
- The admin settings mutation no longer persists new secret values into `SystemSettings`; secret inputs must be provisioned through environment or a deployment-time secret manager, and legacy database copies can be cleared explicitly
- The admin UI now exposes runtime secret source/status plus an explicit "clear legacy DB secrets" cleanup path so operators can complete the migration without direct database writes
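
The presence-flag behaviour described above can be sketched as follows. The flag names come from the notes above; the `LegacySecrets` shape, the `secretPresence` helper, and the assumption that `hasApiKey` covers both `OPENAI_API_KEY` and `AZURE_OPENAI_API_KEY` are illustrative guesses.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical shape of legacy secret columns still readable during migration.
interface LegacySecrets {
  apiKey?: string | null;
  smtpPassword?: string | null;
  geminiApiKey?: string | null;
}

interface SecretPresence {
  hasApiKey: boolean;
  hasSmtpPassword: boolean;
  hasGeminiApiKey: boolean;
}

const isSet = (value: string | null | undefined): boolean => Boolean(value?.trim());

// Build the browser-facing view: booleans only, never the secret values themselves.
function secretPresence(env: Env, legacy: LegacySecrets): SecretPresence {
  return {
    hasApiKey:
      isSet(env["OPENAI_API_KEY"]) ||
      isSet(env["AZURE_OPENAI_API_KEY"]) ||
      isSet(legacy.apiKey),
    hasSmtpPassword: isSet(env["SMTP_PASSWORD"]) || isSet(legacy.smtpPassword),
    hasGeminiApiKey: isSet(env["GEMINI_API_KEY"]) || isSet(legacy.geminiApiKey),
  };
}
```

Because the return type contains only booleans, a serialized response built from it cannot leak secret material even if the inputs do.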

### Anonymization
