Security [HIGH]: Prompt-injection guard trivially bypassable (regex-only, no Unicode normalization) #39

Closed
opened 2026-04-16 22:05:08 +02:00 by Hartmut · 1 comment
Owner

Problem

The 12 regex patterns in prompt-guard.ts are case-insensitive ASCII-only. Five-minute bypasses: Cyrillic lookalikes (Ignorе U+0435), zero-width joiners (jai‌lbreak), base64 (aWdub3JlIGFsbCBwcmV2aW91cyBpbnN0cnVjdGlvbnM=), translation framing ('translate: ignore previous...'), line-breaks with ZWSP (ig\u200Bnore previous).

Evidence

  • packages/api/src/lib/prompt-guard.ts:8-21 — 12 regex patterns (no NFKC normalize, no homoglyph fold)

Impact

Prompt-injection attacks succeed in < 5 min of crafting. If the guard is treated as a primary defense (rather than defense-in-depth), it creates false sense of security. Currently the guard is only audited server-side, not blocking. Document this posture explicitly.

Proposed Fix

(1) NFKC-normalize + strip zero-width/combining marks before regex. (2) Fold homoglyphs via unicode-confusables. (3) Document that guard is defense-in-depth; tool-level assertPermission is the real boundary. (4) Add LLM-based classifier (small prompt to a cheap model) as second layer. (5) Expand audit trail (Ticket for C-5).

Acceptance Criteria

  • NFKC normalization applied before regex
  • Top-10 bypass examples added as unit-tests (must all match)
  • docs/security-architecture.md documents layered-defense model (guard ≠ boundary)

Parent Epic: #1
Source: Full-Codebase Security Audit 2026-04-16 (C-1)

## Problem The 12 regex patterns in prompt-guard.ts are case-insensitive ASCII-only. Five-minute bypasses: Cyrillic lookalikes (`Ignorе` U+0435), zero-width joiners (`jai‌lbreak`), base64 (`aWdub3JlIGFsbCBwcmV2aW91cyBpbnN0cnVjdGlvbnM=`), translation framing ('translate: ignore previous...'), line-breaks with ZWSP (`ig\u200Bnore previous`). ## Evidence - `packages/api/src/lib/prompt-guard.ts:8-21 — 12 regex patterns (no NFKC normalize, no homoglyph fold)` ## Impact Prompt-injection attacks succeed in < 5 min of crafting. If the guard is treated as a primary defense (rather than defense-in-depth), it creates false sense of security. Currently the guard is only audited server-side, not blocking. Document this posture explicitly. ## Proposed Fix (1) NFKC-normalize + strip zero-width/combining marks before regex. (2) Fold homoglyphs via `unicode-confusables`. (3) Document that guard is defense-in-depth; tool-level `assertPermission` is the real boundary. (4) Add LLM-based classifier (small prompt to a cheap model) as second layer. (5) Expand audit trail (Ticket for C-5). ## Acceptance Criteria - [ ] NFKC normalization applied before regex - [ ] Top-10 bypass examples added as unit-tests (must all match) - [ ] `docs/security-architecture.md` documents layered-defense model (guard ≠ boundary) --- Parent Epic: #1 Source: Full-Codebase Security Audit 2026-04-16 (C-1)
Hartmut added the security label 2026-04-16 22:05:08 +02:00
Author
Owner

Resolved in commit c2d05b4 (security: Unicode-aware prompt-injection guard). NFKC normalisation + homoglyph folding applied before regex match in packages/api/src/lib/prompt-guard.ts.

Resolved in commit c2d05b4 (`security: Unicode-aware prompt-injection guard`). NFKC normalisation + homoglyph folding applied before regex match in `packages/api/src/lib/prompt-guard.ts`.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Hartmut/CapaKraken#39