CapaKraken

Author	SHA1	Message	Date
Hartmut	e2dddd30df	security: RBAC cache cross-instance invalidation + force re-login on role/perm change (#57 ) - shrink roleDefaults cache TTL from 60s to 10s (safety-net staleness bound) - publish/subscribe on capakraken:rbac-invalidate so peer instances drop their local role-defaults cache on mutation (ioredis pub/sub; lazy init so idle test files don't open connections) - after updateUserRole/setUserPermissions/resetUserPermissions: delete all ActiveSession rows for that user so the next request re-auths via tRPC's jti check, and invalidate the role-defaults cache - tests: peer-instance invalidation via FakeRedis pub/sub fan-out; mutation side-effects assert session deletion + cache invalidation on each path Without this, demoted admins kept their JWT valid until expiry and peer instances kept serving stale role defaults for up to the TTL window. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 13:01:15 +02:00
Hartmut	23c6e0e04b	security: sanitise Prisma error leaks in AI-tool helpers (#53 ) Five helper error mappers (timeline / project-creation / resource-creation / vacation-creation / task-action-execution) fell through to `return { error: error.message }` for BAD_REQUEST and CONFLICT cases. When the TRPCError wrapped a Prisma error, the message contained column names, relation paths, and the offending unique-constraint value — all of which would reach the LLM in chat context and, via audit_log.changes JSONB, the DB. Add `sanitizeAssistantErrorMessage()` that regex-detects Prisma and raw Postgres signatures (P2002/P2003/P2025, not-null, FK, check-constraint, duplicate-key) and replaces them with a generic "Invalid input". Also caps messages at 500 chars to defend against stack-trace-like payloads. Wire the helper into all five call-sites; the developer-constructed `AssistantVisibleError` branch in `normalizeAssistantExecutionError` is left untouched since those strings are hand-written. Coverage: 11 new tests in assistant-tools-error-sanitiser.test.ts; existing vacation / task-action / resource-creation / project-creation error tests (12 tests, 5 files) all remain green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:40:01 +02:00
Hartmut	019702c043	security: ReDoS hardening on blueprint field validator (#52 ) Admin-editable blueprint field patterns go through `new RegExp(pattern).test(userValue)` — a classic ReDoS sink if the admin account is compromised or the permission is ever delegated. A pattern like `^(a+)+$` against 30 'a's followed by '!' freezes the event loop for seconds per request. Three layers of defence: - Save-time: FieldValidationSchema.pattern now has `.max(200)` and a `.refine()` that rejects nested-quantifier shapes like `(x+)+`, `(?:x)+`, `(x{2,})`. - Runtime (engine/blueprint/validator.ts): - isSuspectRegexPattern() runs the same heuristic. If it fires, the field fails validation outright — regex is never compiled. - Input strings are sliced to 4096 chars before .test() so even a benign pattern against a 10 MB payload returns in < 50 ms. - RegExp compile failures are caught and treated as validation errors rather than crashing the request. Tests: 10 cases in packages/engine/src/__tests__/blueprint-validator-redos.test.ts, including the canonical `^(a+)+$` attack — completes in < 50 ms. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:33:42 +02:00
Hartmut	b9040cb328	test(security): scoped-caller forwarding preserves read-only proxy (#47 ) Adds a regression suite asserting that the read-only Prisma proxy is still in effect after a tool's executor forwards ctx.db into a scoped tRPC caller (helpers.ts::createScopedCallerContext). Covers all three attack surfaces: model writes, raw-SQL escape hatches, and interactive $transaction / $runCommandRaw calls. These tests pin the behaviour enforced by 1ff5c33; any future refactor that unwraps the proxy during forwarding will fail this suite. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:28:02 +02:00
Hartmut	3d89d7d8eb	security: redact sensitive fields in audit DB entries (#46 ) createAuditEntry now deep-walks before/after/metadata and replaces values of password, newPassword, currentPassword, passwordHash, token, accessToken, refreshToken, sessionToken, apiKey, authorization, cookie, secret, totpSecret, backupCode(s) with "[REDACTED]" before the JSONB write. The pino logger already redacts these paths for stdout (see lib/logger.ts), but DB writes had no equivalent guard — the AI chat loop at assistant-chat-loop.ts:265 blindly stores parsedArgs from tool calls (e.g. set_user_password, create_user) into the AuditLog table. Matching is case-insensitive; nested objects and arrays are recursed to a depth of 8. Diffs are computed post-redaction so UPDATE entries that only changed a sensitive field are correctly collapsed to no-op. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:25:15 +02:00
Hartmut	4ff7bc90c3	security: SSRF guard covers IPv6 + DNS-rebind defence via pinned IP (#49) Expand the SSRF blocklist from IPv4-only to IPv6 loopback/ULA (fc00::/7)/link-local (fe80::/10)/multicast/IPv4-mapped, plus the missing IPv4 ranges 0.0.0.0/8, 100.64.0.0/10 CGNAT, and TEST-NET/benchmark ranges. Replace the single-lookup SSRF guard with resolveAndValidate(): resolves all DNS records (lookup { all: true }) so a hostname returning "public + private" is rejected, and returns the first validated address for connection pinning. The webhook dispatcher now switches from plain fetch() to https.request() with a custom Agent.lookup that returns the pre-validated IP. A DNS rebind between the guard check and the TCP connect() can no longer redirect the dial to an internal address. Hostname still flows through for SNI and certificate validation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:19:07 +02:00
Hartmut	3222bec8a5	security: atomic compare-and-swap for TOTP replay window (#43, part 1) The previous SELECT → compare → UPDATE sequence let two concurrent login requests with the same valid 6-digit code both observe a stale lastTotpAt, both pass the in-JS replay check, and both succeed. A stolen TOTP (shoulder-surf, phishing-proxy replay) was usable twice within its 30 s window. Replace the three callsites (login authorize, self-service enable, self-service verify) with a shared consumeTotpWindow() helper: a single updateMany() expresses "window unused" as a SQL WHERE clause, so Postgres' row lock serialises concurrent writers and whichever commits second sees count=0 and is treated as a replay. Backup codes (ticket part 2) are tracked as follow-up work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:11:50 +02:00
Hartmut	d1075af77d	security: tighten CSP — drop provider wildcards, add object/frame/worker-src (#45 ) Browser code never calls OpenAI/Azure/Gemini directly; all AI traffic is server-side tRPC. connect-src is now locked to 'self'. Added object-src 'none', frame-src 'none', media-src 'self', and worker-src 'self' blob:. style-src keeps 'unsafe-inline' for React + @react-pdf/renderer (documented residual risk — script-src is nonce-based so CSS injection cannot escalate to JS). Added three regression tests covering connect-src no-wildcards, object/frame-src 'none', and worker-src scope. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:08:40 +02:00
Hartmut	b32160d546	security: default-deny /api middleware allowlist (#44 ) Previously middleware.ts listed /api/ as a public prefix, so any new API route added under /api/** was served without a session check unless the developer remembered to self-authenticate it. The middleware now returns 404 for any /api path not explicitly allowlisted (auth, trpc, sse, cron, reports, health, ready, perf) — adding a new API route is a deliberate allowlist edit. verifyCronSecret was already fail-closed when CRON_SECRET is unset; added unit tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:03:24 +02:00
Hartmut	d45cc00f2f	security: cookie + session hardening (#41 ) Three related fixes: - Cookie secure flag now tracks AUTH_URL scheme (https → Secure), not NODE_ENV — staging over HTTPS with NODE_ENV!=production used to ship Set-Cookie without Secure. Cookie name gains __Host- prefix when Secure is on. - jwt() callback no longer swallows session-registry write failures; concurrent-session cap is now fail-closed. - Session callback no longer copies token.sid onto session.user.jti. The tRPC route handler reads the JTI directly from the encrypted JWT via getToken() so it stays server-side. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 09:00:54 +02:00
Hartmut	93a7fbaa4c	security: fail-fast dev-bypass flag in production (#42 ) Both auth.ts and trpc.ts now delegate the E2E_TEST_MODE-in-production check to a single shared helper (packages/api/src/lib/runtime-security.ts). trpc.ts used to only console.warn; it now throws at module load time, matching the behaviour already enforced by assertSecureRuntimeEnv on the auth side. A future refactor can no longer silently drop the guard on either side. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:56:27 +02:00
Hartmut	c2d05b4b99	security: Unicode-aware prompt-injection guard (#39 ) checkPromptInjection now NFKD-normalises, strips zero-width / combining chars, and folds common Cyrillic / Greek homoglyphs before matching. 10 documented bypass examples (fullwidth, ZWJ, ZWSP, soft-hyphen, Cyrillic е/о, combining marks, LRM, BOM) are covered by unit tests. Security docs explicitly mark the guard as defense-in-depth — real boundary is per-tool requirePermission. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:53:38 +02:00
Hartmut	03030639d7	security: constant-time authorize + uniform audit summaries (#40 ) Prevent user-enumeration via login-response timing and audit-log content. All failing branches now run argon2.verify against a precomputed dummy hash (discarding the result), and emit a single "Login failed" audit summary. Detailed reason stays in the server-only pino logger. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:50:25 +02:00
Hartmut	c0ea1d0cb9	security: cap assistant chat payload + injection-guard project cover prompt (#38 ) `messages[].content` and `pageContext` had no `.max()` — a single chat turn could ship 50 MB / 200 messages and OOM JSON.parse, balloon prompt assembly, and burn arbitrary AI-provider cost. Separately, the project-cover image-generation path concatenated user free-text into the DALL-E / Gemini prompt without any injection check, so a manager could pivot the image model into "ignore previous instructions" / role-override style attacks against downstream prompt-aware infra. - assistant-procedure-support: add `.max(10_000)` per message, `.max(2_000)` on pageContext, and a `.superRefine` aggregate cap (200 KB total bytes across all messages + page context). Constants exported so call sites and tests share one source of truth. - project-cover.generateCover: run `checkPromptInjection` over the user-supplied `prompt` field; reject with BAD_REQUEST on match. - 7 schema-bound tests covering per-message, page-context, aggregate, message-count, and happy-path cases. Covers EAPPS 3.2.7 (input bounds) / EGAI 4.6.3.2 (prompt-injection detection on user inputs). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:46:03 +02:00
Hartmut	c0c5f762b8	security: bound JSONB inputs + whitelist batchUpdateCustomFields keys (#48 ) batchUpdateCustomFields used $executeRaw to merge a manager-supplied record straight into Resource.dynamicFields with no key whitelist — so a manager could pollute the JSONB namespace with arbitrary keys (e.g. ones admin tools later interpret). Separately, several user-facing JSONB fields (allocation/demand metadata, dynamicFields) were typed as unbounded z.record(z.string(), z.unknown()), letting clients ship multi-MB payloads that flow into DB writes, audit logs, and SSE frames. - Add BoundedJsonRecord helper (shared) — 64 keys / depth 4 / 8 KB strings / 32 KB serialized total. Conservative defaults; call sites needing more should use a strict object schema. - Apply BoundedJsonRecord to the highest-traffic untrusted JSONB inputs: allocation metadata (Create/CreateDemandRequirement/CreateAssignment), resource & project dynamicFields, and the createDemand router input. - batchUpdateCustomFields: * Tighten input schema (key length, value bounds, max 100 keys). * Fetch each target resource and verify all input keys are in the union of (specific blueprint defs) ∪ (active global RESOURCE blueprint defs) for that resource. Empty whitelist → reject all keys (stricter than create/update, but appropriate for a bulk escape-hatch endpoint). * Run the existing per-key value validator afterwards. * 404 if any requested id does not exist (was silently skipped). - New helper getAllowedDynamicFieldKeys() in blueprint-validation. - 7 new BoundedJsonRecord tests, 2 new batchUpdateCustomFields tests covering the whitelist-rejection and not-found paths. Covers EAPPS 3.2.7 (input bounds) / OWASP A03 (injection / mass assignment). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:44:11 +02:00
Hartmut	1ff5c3377c	security: block raw/tx escape hatches on read-only AI DB proxy (#47 ) The read-only proxy previously wrapped model delegates to block writes, but left client-level raw/escape hatches ($transaction, $executeRaw, $executeRawUnsafe, $queryRawUnsafe, $runCommandRaw) intact. A read-tool could smuggle DML via raw SQL, or open an interactive $transaction whose tx-scoped client (unproxied by construction) accepts writes. - read-only-prisma: block $transaction, $executeRaw, $executeRawUnsafe, $queryRawUnsafe, $runCommandRaw at the client level. Template-tagged $queryRaw stays allowed (read-only by API contract). - assistant-tools: add create_estimate to MUTATION_TOOLS — it uses $transaction internally and was previously bypassing the proxy only because $transaction wasn't blocked. - shared: document isReadOnly flag on ToolContext so any scoped tRPC caller a tool spawns keeps the proxied client. - helpers: note the runtime wrap at assistant-tools.ts:739 is authoritative; forwarding ctx.db verbatim is correct. - tests: cover model writes, raw escapes, and the allowed $queryRaw path (7 cases, all pass). - loosen one estimate-detail test that compared the exact db instance (fails once that instance is a proxy; the assertion's intent is the estimate id). Covers EGAI 4.1.1.2 / IAAI 3.6.22. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:38:05 +02:00
Hartmut	3c5d1d37f7	security: rate-limit IP-keyed, fail-closed on empty key (#37) Rate-limiter now accepts string | string[] so callers can key on multiple buckets simultaneously. If any bucket is exhausted the request is denied, which lets login/TOTP/reset-password throttle on BOTH user identifier and source IP without either becoming a bypass. Fail-closed: empty/whitespace-only keys now deny by default instead of silently allowing unbounded attempts (was CWE-307 gap). Degraded-fallback divisor reduced from /10 to /2 — the old aggressive clamp forced-logged-out legitimate users during brief Redis outages; /2 still meaningfully slows distributed brute-force. Callers updated: - auth.ts (login): both email: and ip: buckets - auth router requestPasswordReset: email + IP - auth router resetPassword: IP before lookup, email-reset after - invite router getInvite/acceptInvite: IP - user-self-service verifyTotp: userId + IP TRPCContext now carries clientIp; web tRPC route extracts it from X-Forwarded-For / X-Real-IP. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:19:33 +02:00
Hartmut	534945f6e3	security: bound password inputs, configure pino redact, patch deps (#36 #46 #58 ) #36 CRITICAL: add .max(128) to all password Zod schemas to prevent Argon2-based DoS from unbounded password strings. #46 HIGH: configure pino redact paths so passwords/tokens/cookies/TOTP secrets are never serialized in logs. #58 MEDIUM: upgrade dompurify to ^3.4.0 and add pnpm overrides for brace-expansion (>=5.0.5) and esbuild (>=0.25.0) to patch known CVEs. Vite moderate (path traversal, dev-only) remains — requires vitest 3.x major upgrade, deferred. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-17 08:13:25 +02:00
Hartmut	0ef9add935	ci(docker-deploy): pin DATABASE_URL to unique container name to fix split-brain The app container is attached to both `default` and `gitea_gitea` networks. Both have a container answering to "postgres" (ours on default, Gitea's core on gitea_gitea). Docker's embedded DNS returns IPs from all attached networks, so the app startup script's `prisma db push` and the seed script's `prisma.user.count()` cached different IPs and hit different postgres instances. The seed then saw "table public.users does not exist" even though `/api/health` reported db:ok. Override DATABASE_URL and REDIS_URL in docker-compose.ci.yml to use the unique compose container names (capakraken-postgres-1, capakraken-redis-1) so resolution is unambiguous. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 09:16:12 +02:00
Hartmut	bb117e9179	fix(docker): provide build-time auth/db env to next build next build collects page data for /api/auth/[...nextauth] and aborts when NEXTAUTH_URL/SECRET/DATABASE_URL are unset. The CI Build job sets these as env vars; Dockerfile.prod did not, so the prod image build failed during Release Images even though plain build worked. Add ARG defaults that mirror the CI placeholders. Real values are injected at container start, so build-time placeholders are inert. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 08:54:18 +02:00
Hartmut	4cbfb2508d	ci(release): build images with plain docker, not buildx The QNAP host kernel rejects fchmodat2 AT_EMPTY_PATH calls that newer buildkit's runc emits, breaking docker/build-push-action@v5. The docker-deploy-test job already builds the same Dockerfile.prod via plain docker build (DooD) and works, so do the same here: drop the buildx setup and use docker build + docker push directly against the host daemon. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 08:31:01 +02:00
Hartmut	69d74881dc	ci(release): use REGISTRY_TOKEN PAT for Gitea registry login The auto-provisioned GITHUB_TOKEN in Gitea Actions does not carry package-registry write permission. Use a personal access token stored as a repo secret instead.	2026-04-13 08:09:56 +02:00
Hartmut	62de038497	ci(release): hardcode external Gitea registry host GITHUB_SERVER_URL inside act_runner resolves to gitea:3000 (internal docker hostname) which is not reachable from the build job container. Use the externally-resolvable hostname instead.	2026-04-13 07:44:21 +02:00
Hartmut	a1f7abc850	ci: float setup-node to v4 to avoid act_runner cleanup race act_runner v0.3.1 occasionally cleans the action checkout dir between the main and post step; v4.0.4's post step then errors on the missing .gitignore ("remove ... .gitignore: no such file") and fails the job. Floating to v4 picks up the more defensive cleanup in v4.1+.	2026-04-13 07:21:59 +02:00
Hartmut	69c52e2875	ci(release): push images to Gitea registry, drop GHCR secret requirement The release-images job failed on every run because GHCR_USERNAME and GHCR_TOKEN are not configured on the Gitea repo — and they don't need to be: Gitea has its own container registry at the same host, reachable with the auto-provisioned GITHUB_TOKEN. - Derive the registry host from GITHUB_SERVER_URL (the Gitea base URL) - Log in with $GITHUB_TOKEN + ${{ github.actor }} - Tag images as <gitea-host>/<owner>/<repo>-{app,migrator}:sha-<commit> - Add packages: write permission - Drop the workflow_call secrets block — no external secrets needed Consumers (deploy-staging.yml, deploy-prod.yml) that previously pulled from ghcr.io/<owner>/<repo>-app will need to be updated to pull from the Gitea registry next; flagging separately.	2026-04-13 07:13:37 +02:00
Hartmut	0b330fd344	test(web/e2e): verify root redirect via HTTP not Chromium navigation Chromium on the QNAP act_runner intermittently raises ERR_CONNECTION_REFUSED on page.goto('/') even when curl on the same pinned IP returns 307 a second earlier and the other four smoke tests (api/health, /auth/signin, login, nav) all pass against the same container. The smoke suite has blocked release-images on two successive docker-deploy failures (`bee5bbf`, `e2982a8`) and a shell-level suite retry didn't help — the Chromium refusal is reproducible per run. Switch this one test to Playwright's HTTP request API with maxRedirects: 0 and assert on status + Location. Semantically equivalent (it verifies middleware wires / to /auth/signin) and bypasses whatever Chromium-specific quirk is refusing the navigation.	2026-04-13 06:44:39 +02:00
Hartmut	e2982a8bd1	ci: bump retrigger marker to force Gitea workflow run	2026-04-13 06:21:16 +02:00
Hartmut	b2d89ca4f0	ci: retrigger docker-deploy after Gitea dbfs lost task 403 log	2026-04-13 06:20:39 +02:00
Hartmut	bee5bbf25e	ci(docker-deploy): retry smoke run once after aggressive re-warm Next.js dev mode on the QNAP runner intermittently drops its listening socket for ~1-2s during route-transition compiles — smoke test #2 (page.goto('/')) has hit ERR_CONNECTION_REFUSED despite both warm-ups and the immediately preceding health test succeeding. Playwright's in-process retry fires while the socket is still down. Wrap the playwright invocation in a shell-level retry: if the first full run fails, re-warm / aggressively (up to 10 probes waiting for 307) and rerun the whole suite once.	2026-04-13 05:54:06 +02:00
Hartmut	c7d36ecbbd	test(application): extend ExcelJS read-workbook timeouts to 30s The 'rejects worksheets that exceed the row limit' test took 6599ms on the QNAP act_runner, overflowing the default 5000ms vitest timeout. Writing and parsing MAX_DISPO_WORKBOOK_ROWS+1 rows via ExcelJS is slow on constrained hardware. Extend timeout for all three writeWorkbook-dependent tests (row limit, column limit) to 30s, matching the fix already applied to excel.test.ts and workbook-export.test.ts.	2026-04-13 05:24:07 +02:00
Hartmut	d90a86c7d7	ci(docker-deploy): pin APP_IP via docker inspect, not shared DNS The 'app' hostname on gitea_gitea collides with foreign containers from other stacks that also answer /api/health. Previous logic picked the first IP whose health check returned 200 — sometimes a neighbor whose process died mid-test, producing ERR_CONNECTION_REFUSED on smoke test #2. Use 'docker compose ps -q app' + docker inspect to read our own container's gitea_gitea IP. Zero DNS ambiguity.	2026-04-13 05:07:09 +02:00
Hartmut	a984635ef3	test(web): extend timeout for ExcelJS workbook export tests Same pattern as excel.test.ts and skillMatrixParser.test.ts: ExcelJS dynamic import + writeBuffer exceeds the default 5s vitest timeout on the QNAP CI runner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 04:33:40 +02:00
Hartmut	0b718f8025	ci: re-warm routes immediately before smoke run The initial warm-up runs ~4 minutes before the smoke tests (seed, Node setup, Playwright install all take real time on the QNAP runner). Between those steps, Next.js dev server can evict or recompile routes under memory pressure — test #2 kept hitting ERR_CONNECTION_REFUSED on / (139ms, consistently) while /auth/signin, login, and authed nav all passed cleanly in the same run. Re-warm both routes right before Playwright starts so the server is guaranteed hot at the moment smoke test #2 navigates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 04:21:41 +02:00
Hartmut	97b77c29f9	ci: pin Docker Deploy to a single app container IP Smoke test #2 kept hitting ERR_CONNECTION_REFUSED on the root path even though curl warm-ups of the same path succeeded. Root cause is the same split-brain bug we just fixed for e2epg: the 'app' hostname on the shared gitea_gitea network resolves to multiple IPs (leftover containers from concurrent runs), and curl vs Chromium picked different ones. Probe each resolved IP for /api/health, pin the winner as APP_BASE_URL via GITHUB_ENV, and route health check, warm-up, and the Playwright smoke run through that explicit IP. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 03:54:19 +02:00
Hartmut	5da90af432	ci: probe every e2epg IP and pin DATABASE_URL to the one with our DB The 'e2epg' service-container hostname resolves to 3 IPs on the shared gitea_gitea network (leftover containers from concurrent / crashed runs). Prisma picked one IP, psql picked another, so push reported success but the verification query saw an empty schema. Probe every resolved IP with our credentials and lock onto the one that accepts them, then rewrite DATABASE_URL / PLAYWRIGHT_DATABASE_URL via GITHUB_ENV so every subsequent step (prisma push, seed, E2E webServer, Playwright fixtures) hits the same postgres instance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 03:52:03 +02:00
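The probe-and-pin approach described in 5da90af432 (and reused for APP_BASE_URL in 97b77c29f9) could be sketched roughly as follows. This is a minimal TypeScript illustration, not the workflow's actual script: `ProbeResult`, `pickPinnedIp`, `pinHost` and `githubEnvLine` are hypothetical names, the IPs in the usage note are made up, and the probe itself (a credential check against Postgres, or a GET on /api/health) is assumed to have run already.

```typescript
// Hypothetical sketch; none of these names come from the repository.
type ProbeResult = { ip: string; accepted: boolean };

// Pick the first resolved IP whose probe succeeded. Throw when none did,
// so the CI step fails loudly instead of talking to a leftover container.
function pickPinnedIp(results: ProbeResult[]): string {
  const winner = results.find((r) => r.accepted);
  if (winner === undefined) {
    const tried = results.map((r) => r.ip).join(", ");
    throw new Error(`no resolved IP accepted the probe (tried: ${tried})`);
  }
  return winner.ip;
}

// Replace only the host in a connection URL; scheme, credentials, port and
// path are kept. Assumes credentials contain no unescaped '/' or '@'.
function pinHost(url: string, ip: string): string {
  return url.replace(
    /^(\w+:\/\/(?:[^@\/]*@)?)[^:\/?#]+/,
    (_match: string, prefix: string) => prefix + ip,
  );
}

// GITHUB_ENV is a file of NAME=value lines read by later steps.
function githubEnvLine(name: string, value: string): string {
  return `${name}=${value}`;
}
```

For example, with probes reporting that only 172.18.0.9 accepted the credentials, `pinHost("postgresql://user:pw@e2epg:5432/db", pickPinnedIp(results))` yields `postgresql://user:pw@172.18.0.9:5432/db`, which every later step can then share.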
Hartmut	e39cae62dc	ci: retrigger after transient setup-node clone race	2026-04-13 03:31:25 +02:00
Hartmut	5dfa1e2aab	ci: warm both root and signin paths without following redirects Previous warm-up used curl -L, which followed the 307 from / to a Location target the runner could not reach (the curl output was '307000': root redirected, follow-up connection refused). That meant the warm-up never exited early on a ready server, and smoke test #2 still hit an uncompiled root occasionally. Replace with two independent warm-ups (/ expecting 307, /auth/signin expecting 200) that compile each route without following the redirect. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 03:19:56 +02:00
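The readiness rule in 5dfa1e2aab (each route checked against its own expected status, with no redirect following) might be expressed like this. It is only a sketch: the real CI step uses curl, and `EXPECTED_STATUS` and `coldRoutes` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch; the actual warm-up is a curl loop in the workflow.
const EXPECTED_STATUS: Record<string, number> = {
  "/": 307,            // unauthenticated root redirects to sign-in
  "/auth/signin": 200, // sign-in page renders directly
};

// Return the paths whose observed status does not match the expectation,
// i.e. the routes that are not yet warm. An empty array means "ready".
function coldRoutes(observed: Record<string, number>): string[] {
  return Object.keys(EXPECTED_STATUS).filter(
    (path) => observed[path] !== EXPECTED_STATUS[path],
  );
}
```

Under these assumptions, `coldRoutes({ "/": 307, "/auth/signin": 200 })` returns an empty array, while a refused connection on the root (status 0) returns `["/"]`, so a retry loop can keep polling until nothing is cold.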
Hartmut	2ca101100f	ci: fix audit_logs verification to query pg_tables directly psql's \dt meta-command interpreted 'public.' as a literal pattern on the runner's psql build, returning 'Did not find any relation named public.' even though prisma db push had succeeded. Replace with a direct query against pg_tables so the verification reflects actual schema state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 03:17:04 +02:00
Hartmut	ee84f6e316	test(web): extend timeout for ExcelJS-based excel import tests ExcelJS dynamic import + workbook writeBuffer exceeds the default 5s vitest timeout on the constrained QNAP CI runner, matching the same pattern already applied to skillMatrixParser.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 02:52:54 +02:00
Hartmut	1006167e76	ci(deploy): warm up root path before smoke tests Dockerfile.dev serves via 'pnpm dev', so Next.js JIT-compiles routes on first hit. On the QNAP runner, the cold compile of the root page + middleware can take >10s and occasionally OOM-kills a worker, causing test #2 (unauthenticated / → signin) to hit ERR_CONNECTION_REFUSED while the other smoke tests (which target /auth/signin, pre-warmed via admin-login steps) pass fine. Add an explicit curl warm-up loop so Playwright only runs against a ready server.	2026-04-13 02:42:49 +02:00
Hartmut	e7d0151d6b	ci(e2e): scope CI E2E to smoke.spec.ts only The QNAP runner's Next.js test server hits its memory threshold mid-run with the full 167-test suite and restarts, and the cascading ECONNREFUSED errors mark 96/167 tests as failed, unrelated to the code under test. Limit the CI E2E job to e2e/smoke.spec.ts (5 tests). Full suite runs locally and in a future dedicated nightly job with a beefier runner.	2026-04-13 02:17:31 +02:00
Hartmut	a0b407e92d	ci: bump skill matrix parser test timeout; install playwright in isolated dir Unit Tests flaked on QNAP: skillMatrixParser ExcelJS workbook builds exceeded the 5s default per-test timeout (runtime ~8.6s for the suite). Bumped to 30s. Docker Deploy smoke tests failed because `npm install` in the repo root tried to resolve sibling workspace:* deps (pnpm protocol, not npm-supported). Install @playwright/test into /tmp/pw-install instead and symlink the package dirs into apps/web/node_modules so the CJS require() in playwright.ci.config.ts resolves it by walking up from apps/web/.	2026-04-13 01:11:37 +02:00
Hartmut	a88db567ad	ci: fix E2E postgres-test collision and smoke @playwright/test resolution E2E: test-server.mjs always spins up its own postgres-test container and publishes port 5432 on the docker host, colliding with Gitea's core postgres on the QNAP runner. Add PLAYWRIGHT_USE_EXTERNAL_DB opt-in so CI can reuse the e2epg job-service container (which test-server still pushes+seeds into). Set the flag in the E2E job. docker-deploy smoke: install @playwright/test locally (no -g, no --save) so the CJS require() in apps/web/playwright.ci.config.ts resolves it by walking up from the config directory. Global npm install lands in a hostedtoolcache path Node does not search.	2026-04-13 00:53:19 +02:00
Hartmut	ca71be14c5	ci(e2e): provide dummy PGADMIN_PASSWORD for test-server compose test-server.mjs spawns 'docker compose --profile test up postgres-test' but compose validates env interpolation across ALL services before filtering by profile. The unused pgadmin service's PGADMIN_PASSWORD:? check fires and aborts the call. Set a dummy value in the job env.	2026-04-13 00:31:11 +02:00
Hartmut	e6b11120ab	ci(docker-deploy): symlink packages/db node_modules into scripts/ Node's ESM bare-specifier resolver walks up from the script's directory and ignores NODE_PATH (that's CJS-only). Create scripts/node_modules with symlinks to @prisma, @node-rs, and .prisma from packages/db/node_modules so setup-admin.mjs's imports resolve on the first step up.	2026-04-13 00:25:36 +02:00
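To see why a scripts/node_modules directory fixes the lookup in e6b11120ab, here is a simplified sketch of the directory walk for a bare specifier. It only lists candidate node_modules directories in order; it ignores package exports, symlink handling and other details of Node's real resolver, and `nodeModulesLookupPaths` is a hypothetical name.

```typescript
// Hypothetical, simplified model of the upward node_modules walk.
// Input is the absolute POSIX directory that contains the importing script.
function nodeModulesLookupPaths(scriptDir: string): string[] {
  const segments = scriptDir.split("/").filter((s) => s.length > 0);
  const paths: string[] = [];
  // Start at the script's own directory and step up one level at a time.
  for (let i = segments.length; i >= 0; i--) {
    const base = "/" + segments.slice(0, i).join("/");
    paths.push(base === "/" ? "/node_modules" : `${base}/node_modules`);
  }
  return paths;
}
```

For a script in /app/scripts this gives /app/scripts/node_modules, /app/node_modules, then /node_modules. packages/db/node_modules is never on that list, which matches the commit's observation that the imports only resolve once the packages are linked into scripts/node_modules.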
Hartmut	d6df582e5e	chore: stop tracking .claude/worktrees agent scratch repos	2026-04-13 00:04:43 +02:00
Hartmut	b164c4ca70	ci: fix e2e hostname collision and docker-deploy admin seed E2E: rename service hosts postgres/redis to e2epg/e2eredis; the gitea_gitea network has multiple containers answering to 'postgres' (Gitea core + concurrent job services), causing split-brain where prisma db push and db:seed connected to different databases and audit_logs ended up missing. docker-compose.ci.yml: stop attaching postgres/redis to gitea_gitea for the docker-deploy-test job; only the app needs cross-network reachability, and the compose services talk to each other on the internal default network. Docker Deploy: setup-admin.mjs imports @prisma/client and @node-rs/argon2 which only live in packages/db/node_modules. Node resolves bare specifiers from the script's directory (/app/scripts), not cwd, so pnpm --filter wrappers did not help. Set NODE_PATH to packages/db/node_modules as a fallback resolution root.	2026-04-13 00:04:32 +02:00
Hartmut	f856dd26b3	ci: diagnose e2e audit_logs mystery; fix docker-deploy admin seed - e2e: install psql; dump 'getent hosts postgres' (suspect two hosts answer to 'postgres' on gitea_gitea) and the table list after push. Fail loudly when audit_logs is missing so we see the true state at push time instead of later at seed time. - docker-deploy: setup-admin.mjs imports @prisma/client via bare specifier, which only resolves inside packages/db in pnpm workspaces. Run the script through `pnpm --filter @capakraken/db exec` so Node walks the right node_modules. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-12 23:43:10 +02:00
Hartmut	931d1f5d5f	ci: bridge docker-deploy compose to gitea_gitea; bypass turbo for e2e - docker-compose.ci.yml: attach app/postgres/redis to the external gitea_gitea network so the act_runner job container (which lives on gitea_gitea) can reach the compose services by name. Otherwise 'localhost:3100' from the job container resolves to the job container itself, not the compose-network app; all health checks and smoke tests were hitting nothing. - ci.yml: switch health/smoke URLs from localhost to http://app:3100 and expose PLAYWRIGHT_BASE_URL so the smoke config can override. - ci.yml: run E2E playwright directly via pnpm --filter, bypassing turbo which strict-filters PLAYWRIGHT_DATABASE_URL and friends. - playwright.ci.config.ts: honour PLAYWRIGHT_BASE_URL env override. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-12 23:22:50 +02:00
Hartmut	0b2d263d30	ci: use prisma db execute (no psql dep); baseline migrations after push - e2e: switch schema reset + sanity check from psql (not installed in act_runner's catthehacker/ubuntu image) to `prisma db execute --stdin` which is already a dev dep. - docker-deploy: after `db push` the schema matches schema.prisma but _prisma_migrations is empty, so the follow-up `migrate deploy` fails with P3005. Baseline each migration directory as applied via `prisma migrate resolve --applied` before deploy; the migrations themselves are idempotent supplements, so marking-as-applied is safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-12 23:01:51 +02:00
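The baselining step from 0b2d263d30 amounts to one `prisma migrate resolve --applied` call per migration directory. A rough sketch of generating those commands is shown below. `baselineCommands` is a hypothetical helper, the input is assumed to be the entry names found in the migrations folder, and the migration names in the usage note are invented.

```typescript
// Hypothetical sketch; the actual CI step may be a plain shell loop.
// `entries` are the names listed in the Prisma migrations folder.
function baselineCommands(entries: string[]): string[] {
  return entries
    // migration_lock.toml sits next to the migration directories; skip it.
    .filter((name) => name !== "migration_lock.toml")
    // Timestamp-prefixed names sort into creation order lexicographically.
    .sort()
    .map((name) => `prisma migrate resolve --applied "${name}"`);
}
```

Given `["20260102000000_b", "migration_lock.toml", "20260101000000_a"]`, this produces two commands, for `20260101000000_a` and then `20260102000000_b`. Once those have run, `_prisma_migrations` is no longer empty and `migrate deploy` has a baseline to work from.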

781 Commits