HartOMat/PLAN_REFACTOR.md
Hartmut ea31ed657c feat(refactor/phase1): foundation infrastructure for modular pipeline
Phase 1 of PLAN_REFACTOR.md — all four sub-tasks implemented:

1.1 PipelineLogger (backend/app/core/pipeline_logger.py)
  - Structured step_start/step_done/step_error/step_progress API
  - Publishes to Python logging AND Redis SSE via log_task_event
  - Context manager `pl.step("name")` for auto-timing

1.2 RenderJobDocument (backend/app/domains/rendering/job_document.py)
  - Pydantic JSONB schema: state machine + per-step records + timing
  - begin_step/finish_step/fail_step/skip_step helpers
  - Migration 048: adds render_job_doc JSONB column to order_lines
  - OrderLine model updated with render_job_doc field

1.3 TenantContextMiddleware (backend/app/core/middleware.py)
  - Decodes JWT, stores tenant_id + role in request.state
  - get_db updated to auto-apply RLS SET LOCAL from request.state
  - Registered in main.py (runs before every request)
  - JWT now embeds tenant_id claim via create_access_token()
  - Login endpoint passes tenant_id to token creation

1.4 ProcessStep Registry (backend/app/core/process_steps.py)
  - StepName StrEnum with all 20 pipeline step names
  - Single source of truth for log prefixes, DB records, UI labels

Also adds db_utils.py with set_tenant_sync() + get_sync_session()
for use inside Celery tasks (bypass-safe RLS helper).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 19:25:08 +01:00

# Schaeffler Automat — Refactor Plan
> Document date: 2026-03-08
> Branch: refactor/v2
> Author: Architecture review via Claude Code
---
## Executive Summary
### Current State
Schaeffler Automat is a working Blender-based media production pipeline with:
- Domain-driven backend structure (partially migrated, many compat shims still present)
- 7 Docker services with GPU render-worker
- PostgreSQL with tenant_id columns + Row Level Security (RLS) enabled but inconsistently
applied at the application layer
- Celery task queues with two workers (step_processing + thumbnail_rendering)
- WebSocket real-time events via Redis Pub/Sub
- React/Vite frontend with workflow editor (ReactFlow), media browser, notifications
### Core Problems
1. `step_tasks.py` is 1,170 lines — monolithic task file containing 8+ distinct pipeline steps
2. Tenant isolation is partial: RLS is defined in DB migration 036 but `set_tenant_context()`
is not called consistently in every router; Celery tasks bypass RLS entirely
3. Pillow overlay code (green bar + model name label) is dead code — all renders use
`transparent_bg=True` but the 55-line block still runs conditionally
4. STL workflow remnants: `stl_quality` setting, `VALID_STL_QUALITIES`, `stl_size_bytes` in
render_log dicts still reference the old STL-based pipeline; the actual pipeline is GLB-only
5. Render job cancellation uses a synthetic task ID (`render-{line_id}`) that does not match
actual Celery task IDs — making revoke() a no-op
6. The MATERIAL_PALETTE + palette fallback lives in `step_processor.py` — should be replaced
with `SCHAEFFLER_059999_FailedMaterial` (magenta) per the project goals
7. Log messages are inconsistent: some use Python f-strings with no prefix, others use
`[STEP_NAME]` markers; structured logging is not enforced
8. `render_order_line_task` in `step_tasks.py` duplicates most of
`render_order_line_still_task` in `domains/rendering/tasks.py`
9. The blender_render.py Blender script is 853 lines with no sub-module structure
10. No GPU-first enforcement: `cycles_device` defaults to "auto" with no explicit fallback log
### Vision
A clean, modular pipeline where:
- Every step is a named `ProcessStep` with start/progress/done log events and DB audit trail
- Render jobs are tracked as structured JSON documents (job tickets) in the DB
- Tenant isolation is enforced at the dependency-injection layer, not ad-hoc per endpoint
- Dead code (Pillow overlays, STL workflow, Flamenco shims, threejs renderer) is deleted
- The auth hierarchy supports GlobalAdmin > TenantAdmin > ProjectManager > Client
- Workers scale dynamically without service restarts
- Notifications are batched summaries, not per-render noise
---
## Architecture Overview
### Current Architecture
```
┌─────────────┐    HTTP     ┌──────────────────────────────────────────┐
│  Frontend   │ ──────────> │ backend:8888 (FastAPI)                   │
│  React/Vite │             │  ├─ domains/auth                         │
│  :5173      │ <─── WS ─── │  ├─ domains/orders                       │
└─────────────┘             │  ├─ domains/products                     │
                            │  ├─ domains/rendering                    │
                            │  ├─ domains/tenants                      │
                            │  └─ api/routers/ (compat shims)          │
                            └──────────┬───────────────────────────────┘
                                       │ Celery tasks via Redis broker
                     ┌─────────────────┼──────────────────┐
                     │                 │                  │
              ┌──────▼──────┐   ┌──────▼──────┐    ┌──────▼──────┐
              │   worker    │   │render-worker│    │    beat     │
              │  step_proc  │   │ thumbnail_  │    │  scheduler  │
              │  ai_valid   │   │  rendering  │    └─────────────┘
              │  concurr=8  │   │  concurr=1  │
              └─────────────┘   └──────▼──────┘
                                       │ subprocess
                                ┌──────▼──────┐
                                │   blender   │
                                │  /opt/blend │
                                └─────────────┘

┌──────────────┐   ┌──────────┐   ┌──────────┐
│  PostgreSQL  │   │  Redis   │   │  MinIO   │
│  :5432       │   │  :6379   │   │  :9000   │
└──────────────┘   └──────────┘   └──────────┘
```
### Target Architecture (Post-Refactor)
```
┌─────────────────────────────────────────────────────────┐
│ Frontend React/Vite :5173                               │
│  ├─ WorkflowEditor (ReactFlow) — visual pipeline        │
│  ├─ MediaBrowser — server-side filtered + virtual scroll│
│  ├─ NotificationCenter — batched summaries only         │
│  └─ Admin — tooltips on every setting                   │
└────────────────────┬────────────────────────────────────┘
                     │ HTTP + WebSocket
┌────────────────────▼────────────────────────────────────┐
│ backend:8888 (FastAPI)                                  │
│  middleware: TenantContextMiddleware (injects RLS)      │
│  ├─ domains/auth      (GlobalAdmin|TenantAdmin|PM|Client)
│  ├─ domains/pipeline  (process step registry + dispatch)│
│  ├─ domains/rendering (render job documents, workflows) │
│  ├─ domains/products  (CAD files, media assets)         │
│  ├─ domains/orders    (order state machine)             │
│  ├─ domains/tenants   (tenant management)               │
│  └─ domains/billing   (pricing, invoices)               │
└────────────────────┬────────────────────────────────────┘
                     │ Celery canvas / chain / group
       ┌─────────────┼───────────────┐
       │             │               │
  ┌────▼────┐  ┌─────▼───────┐  ┌────▼────┐
  │ worker  │  │render-worker│  │  beat   │
  │ step_   │  │ concurr=1   │  │ sched.  │
  │ process │  │ +Blender GPU│  │ recover │
  │ concr=8 │  └─────┬───────┘  │ queues  │
  └─────────┘        │          └─────────┘
                     │ subprocess (SIGTERM → SIGKILL + cleanup)
              ┌──────▼──────┐
              │   blender   │  (GPU-first, explicit CPU-fallback log)
              └─────────────┘
```
---
## Phase 1: Foundation (Weeks 1–2)
Critical infrastructure that blocks everything else.
### 1.1 Structured Logging Framework
**Current state:**
Log messages are a mix of bare `logger.info(f"...")`, `emit(order_line_id, "...")`, and
`log_task_event(task_id, "...")`. No consistent prefix, no structured fields.
**Target:**
A `PipelineLogger` class that wraps Python's `logging` module and additionally writes
structured events to the DB (`audit_log` or a new `pipeline_events` table).
**Design:**
```python
# backend/app/core/pipeline_logger.py
class PipelineLogger:
    PREFIX_FORMAT = "[{step_name}]"

    def step_start(self, step: str, context: dict): ...
    def step_progress(self, step: str, pct: int, msg: str): ...
    def step_done(self, step: str, duration_s: float, result: dict): ...
    def step_error(self, step: str, error: str, exc: Exception | None): ...
```
Every log call emits:
- Python `logging` line with `[STEP_NAME] message`
- Redis `log_task_event` for SSE streaming
- Optional DB insert into `pipeline_events(task_id, step_name, level, message, duration_s, context JSONB, created_at)`
**Files to create:**
- `backend/app/core/pipeline_logger.py` — PipelineLogger class
- `backend/alembic/versions/048_pipeline_events.py` — new table migration
**Files to modify:**
- All task files to replace bare `logger.info/error` with `PipelineLogger` calls
- `backend/app/core/task_logs.py` — keep Redis SSE publish, add DB write path
### 1.2 Render Job Document
**Current state:**
`OrderLine.render_log` is a loosely-structured JSONB dict. No schema, no state machine,
no step-level results stored.
**Target:**
A `RenderJobDocument` JSONB schema stored in `order_lines.render_job_doc`. Acts as the
single source of truth for a render job's state machine.
**Schema (JSONB):**
```json
{
  "version": 1,
  "job_id": "<order_line_id>",
  "created_at": "ISO8601",
  "state": "pending|queued|running|completed|failed|cancelled",
  "celery_task_id": "uuid",
  "steps": [
    {
      "name": "resolve_step_path",
      "status": "done",
      "started_at": "ISO8601",
      "completed_at": "ISO8601",
      "duration_s": 0.02,
      "output": {"step_path": "/app/uploads/..."}
    },
    {
      "name": "occ_glb_export",
      "status": "done",
      "duration_s": 8.4,
      "output": {"glb_path": "...", "size_bytes": 204800}
    },
    {
      "name": "blender_render",
      "status": "running",
      "started_at": "ISO8601",
      "gpu_type": "OPTIX",
      "engine": "cycles",
      "samples": 256
    }
  ],
  "error": null,
  "result": {
    "output_path": "...",
    "duration_s": 34.2,
    "engine_used": "cycles",
    "gpu": "RTX 3090"
  }
}
```
**Migration:**
- `backend/alembic/versions/049_render_job_document.py` — add `render_job_doc JSONB` to `order_lines`; keep `render_log` for backward compat (deprecate, remove in Phase 3)
**Files to create:**
- `backend/app/domains/rendering/job_document.py` — `RenderJobDocument` Pydantic model + helpers (`update_step`, `set_state`, `append_error`)
### 1.3 Tenant Context Middleware
**Current state:**
`set_tenant_context()` must be called manually in each endpoint. Celery tasks bypass RLS
entirely (they use sync engines without `SET LOCAL app.current_tenant_id`).
**Problem:**
Migration 036 enables RLS, but `build_tenant_db_dep()` in `database.py` actually yields
`db` without setting the tenant context (line 92: `yield db # context-setting happens
via set_tenant_context when needed`). This means most endpoints are silently bypassing RLS.
**Target:**
A FastAPI middleware `TenantContextMiddleware` that automatically sets RLS context for
every request based on the JWT `tenant_id` claim.
```python
# backend/app/core/middleware.py
class TenantContextMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Extract JWT, decode tenant_id
        # Store in request.state.tenant_id
        # After DB session is acquired, SET LOCAL app.current_tenant_id
        ...
```
**JWT changes:**
`create_access_token()` must embed `tenant_id` in claims:
```python
payload = {"sub": user_id, "role": role, "tenant_id": str(tenant_id), "exp": expires}
```
**Celery tasks:**
All sync DB sessions in Celery tasks must receive `tenant_id` as a task argument and
execute `session.execute(text("SET LOCAL app.current_tenant_id = :tid"), {"tid": tenant_id})`
immediately after session creation. Add a `_set_tenant(session, tenant_id)` helper in
`backend/app/core/db_utils.py`.
**Files to create:**
- `backend/app/core/middleware.py` — TenantContextMiddleware
- `backend/app/core/db_utils.py` — `_set_tenant(session, tenant_id)`
**Files to modify:**
- `backend/app/main.py` — add middleware
- `backend/app/utils/auth.py` — embed tenant_id in JWT
- All Celery task functions — accept `tenant_id: str | None` parameter, call `_set_tenant`
### 1.4 Process Step Registry
**Current state:**
Pipeline steps are implicit — scattered across `step_tasks.py`, `rendering/tasks.py`,
`step_processor.py`, `render_blender.py`. No central definition.
**Target:**
A `ProcessStep` enum and registry that all tasks reference by name.
```python
# backend/app/domains/pipeline/steps.py
class ProcessStep(str, enum.Enum):
    UPLOAD_STEP = "upload_step"
    PARSE_EXCEL = "parse_excel"
    EXTRACT_METADATA = "extract_metadata"
    OCC_GLB_EXPORT = "occ_glb_export"
    RENDER_THUMBNAIL = "render_thumbnail"
    RENDER_STILL = "render_still"
    RENDER_TURNTABLE = "render_turntable"
    EXPORT_GLB = "export_glb"
    EXPORT_BLEND = "export_blend"
    DELIVER = "deliver"
```
Each step maps to exactly one Celery task and one workflow node type. This enum becomes
the contract between the visual workflow editor and the task executor.
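A sketch of what the registry side of that contract could look like. The task paths and labels here are hypothetical illustrations, not the project's actual module layout; the point is that log prefixes, Celery routing, and UI labels all derive from the one enum:

```python
import enum

class ProcessStep(str, enum.Enum):
    # Two of the pipeline step names, for illustration
    OCC_GLB_EXPORT = "occ_glb_export"
    RENDER_STILL = "render_still"

# Hypothetical registry entries: step -> Celery task path + UI label
STEP_REGISTRY: dict[ProcessStep, dict] = {
    ProcessStep.OCC_GLB_EXPORT: {
        "task": "app.domains.pipeline.tasks.export_glb_geometry",
        "label": "GLB Export",
    },
    ProcessStep.RENDER_STILL: {
        "task": "app.domains.rendering.tasks.render_still",
        "label": "Still Render",
    },
}

def log_prefix(step: ProcessStep) -> str:
    """Derive the [STEP_NAME] log prefix from the single source of truth."""
    return f"[{step.value.upper()}]"
```

Because `ProcessStep` subclasses `str`, the same values round-trip cleanly through JSON job documents and workflow node definitions.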
---
## Phase 2: Pipeline Modularity (Weeks 3–4)
Break up `step_tasks.py` (1,170 lines). One file = one pipeline stage.
### 2.1 Decompose step_tasks.py
**Current functions and their new homes:**
| Current location | Function | Target file |
|---|---|---|
| `step_tasks.py` | `process_step_file` | `domains/pipeline/tasks/extract_metadata.py` |
| `step_tasks.py` | `render_step_thumbnail` | `domains/pipeline/tasks/render_thumbnail.py` |
| `step_tasks.py` | `generate_gltf_geometry_task` | `domains/pipeline/tasks/export_glb_geometry.py` |
| `step_tasks.py` | `generate_gltf_production_task` | `domains/pipeline/tasks/export_glb_production.py` |
| `step_tasks.py` | `regenerate_thumbnail` | `domains/pipeline/tasks/render_thumbnail.py` |
| `step_tasks.py` | `dispatch_order_line_render` | `domains/pipeline/tasks/dispatch.py` |
| `step_tasks.py` | `render_order_line_task` | **DELETE** (duplicate of `domains/rendering/tasks.render_order_line_still_task`) |
| `step_tasks.py` | `reextract_cad_metadata` | `domains/pipeline/tasks/extract_metadata.py` |
| `step_tasks.py` | `_auto_populate_materials_for_cad` | `domains/pipeline/tasks/auto_materials.py` |
| `step_tasks.py` | `_bbox_from_glb`, `_bbox_from_step_cadquery` | `domains/pipeline/tasks/bbox.py` |
| `rendering/tasks.py` | `render_order_line_still_task` | `domains/rendering/tasks/render_still.py` |
| `rendering/tasks.py` | `render_turntable_task` | `domains/rendering/tasks/render_turntable.py` |
| `rendering/tasks.py` | `export_gltf_for_order_line_task` | `domains/pipeline/tasks/export_glb_geometry.py` |
| `rendering/tasks.py` | `export_blend_for_order_line_task` | `domains/rendering/tasks/export_blend.py` |
| `rendering/tasks.py` | `publish_asset` | `domains/media/tasks.py` |
**`step_tasks.py` becomes a compatibility shim** (import-only, deprecated) until all
callers are updated. Remove it in Phase 3.
### 2.2 Render Job Document Integration
Every Celery task in the new structure:
1. Reads/creates `RenderJobDocument` at task start
2. Updates the relevant step via `job_doc.update_step(step_name, status="running")`
3. On completion: `job_doc.update_step(step_name, status="done", duration_s=elapsed)`
4. On failure: `job_doc.set_state("failed")` + `job_doc.append_error(...)`
5. Writes document back to `order_lines.render_job_doc`
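The five steps above can be sketched as one wrapper function. Persistence is injected via `load_doc`/`save_doc` callables (assumptions standing in for the DB read/write), and `steps` is simplified to a dict keyed by step name rather than the list used in the real schema:

```python
import time

def run_step(load_doc, save_doc, step_name: str, fn):
    """Run one pipeline step with the job-document round-trip (sketch)."""
    doc = load_doc()                                               # 1. read at start
    doc["steps"].setdefault(step_name, {})["status"] = "running"   # 2. mark running
    save_doc(doc)
    t0 = time.monotonic()
    try:
        result = fn()
    except Exception as exc:
        doc["state"] = "failed"                                    # 4. failure path
        doc["error"] = str(exc)
        save_doc(doc)
        raise
    doc["steps"][step_name].update(                                # 3. mark done
        status="done", duration_s=round(time.monotonic() - t0, 3))
    save_doc(doc)                                                  # 5. write back
    return result
```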
### 2.3 Render Job Cancellation (Proper)
**Current problem:**
`celery_app.control.revoke("render-{line_id}", terminate=True)` — this ID is synthetic
and does not match the actual Celery task ID, so revoke is a no-op. The Blender process
continues running.
**Solution:**
1. Store the actual Celery task ID in `render_job_doc.celery_task_id` when the task starts
2. Cancel endpoint reads `render_job_doc.celery_task_id` and revokes with that real ID
3. The render subprocess uses `start_new_session=True` (already done in `render_blender.py`)
and stores `proc.pid` in the job document
4. On SIGTERM, the Celery task's signal handler calls `os.killpg(pgid, SIGTERM)`, waits 10s,
then `os.killpg(pgid, SIGKILL)`
5. Clean up: remove partial output file, remove `_frames_*` temp directory
6. Update `render_job_doc.state = "cancelled"`, clear `OrderLine.render_status = "cancelled"`
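Steps 3 and 4 of the teardown can be sketched as a POSIX-only helper (the Celery signal wiring and cleanup of partial outputs are omitted; the grace period is parameterized instead of hardcoded to 10s):

```python
import os
import signal
import subprocess
import time

def terminate_render(proc: subprocess.Popen, grace_s: float = 10.0) -> None:
    """Kill the whole render process group: SIGTERM, wait, then SIGKILL.

    Assumes the subprocess was started with start_new_session=True, so
    proc.pid leads its own process group and killpg reaches Blender plus
    any children it spawned (POSIX only).
    """
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return                          # exited within the grace period
        time.sleep(0.2)
    os.killpg(pgid, signal.SIGKILL)         # escalate after the grace period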
**Files to modify:**
- `backend/app/api/routers/orders.py` — read celery_task_id from job doc, not synthetic ID
- `backend/app/domains/rendering/tasks/render_still.py` — store task ID + PID in job doc,
register SIGTERM handler
- `backend/app/domains/rendering/tasks/render_turntable.py` — same
### 2.4 GPU-Primary Rendering
**Current state:**
`cycles_device` defaults to "auto". When GPU is unavailable, Blender silently falls back
to CPU with no log message. The `_activate_gpu()` function in `blender_render.py` already
probes for GPU but the result is not reflected in the render job document.
**Target:**
- `cycles_device` default changes from "auto" to "gpu" in system settings
- `_activate_gpu()` result is logged with `[GPU_PROBE]` prefix:
- Success: `[GPU_PROBE] RTX 3090 activated (OPTIX) — using GPU render`
- Failure: `[GPU_PROBE] No GPU found, falling back to CPU — set cycles_device=cpu to suppress this warning`
- GPU type and fallback reason are written to `render_job_doc.result.gpu_info`
- Admin UI shows GPU status on the Settings page (already partially exists via worker activity)
**Files to modify:**
- `render-worker/scripts/blender_render.py` — enhance `_activate_gpu()` logging
- `backend/app/api/routers/admin.py` — change default `cycles_device` to "gpu"
- `backend/app/domains/rendering/job_document.py` — add `gpu_info` field to result
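Since `bpy` is only available inside Blender, a testable sketch can separate formatting the probe result from the probe itself. The device-dict shape and `gpu_info` keys below are assumptions; the log wording follows the examples above:

```python
def gpu_probe_log(devices: list[dict]) -> tuple[str, dict]:
    """Format the [GPU_PROBE] log line and the gpu_info blob for the job doc."""
    gpus = [d for d in devices if d.get("type") in ("OPTIX", "CUDA", "HIP")]
    if gpus:
        dev = gpus[0]
        msg = f"[GPU_PROBE] {dev['name']} activated ({dev['type']}) — using GPU render"
        info = {"gpu": dev["name"], "backend": dev["type"], "fallback_reason": None}
    else:
        msg = ("[GPU_PROBE] No GPU found, falling back to CPU — "
               "set cycles_device=cpu to suppress this warning")
        info = {"gpu": None, "backend": "CPU", "fallback_reason": "no_gpu_device"}
    return msg, info
```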
### 2.5 Blender Script Modularity
**Current state:**
`render-worker/scripts/blender_render.py` is 853 lines with everything inline.
**Target structure:**
```
render-worker/scripts/
├── blender_render.py — entry point, arg parsing, top-level flow
├── _blender_gpu.py — GPU probe + activation
├── _blender_import.py — GLB import, rotation, smooth shading
├── _blender_materials.py — material library application + fallback
├── _blender_camera.py — auto camera from bbox, clip planes
├── _blender_scene.py — scene setup (Mode A vs Mode B)
└── _blender_post.py — (currently Pillow overlay — DELETE THIS FILE)
```
`blender_render.py` imports from these sub-modules. Blender Python's `sys.path` is updated
at the top of the script to include the scripts directory.
---
## Phase 3: Code Deletion (Weeks 3–4, parallel with Phase 2)
### 3.1 Remove Pillow Overlay Code
**Location:** `render-worker/scripts/blender_render.py` lines 798–851
**Why it's dead:** `transparent_bg=True` is always passed for production renders. The
`else:` branch at line 802 can never execute in production. The green Schaeffler bar is
now part of the `.blend` template, not post-processing.
**Delete:**
- Lines 798–851 in `blender_render.py` (the entire `if transparent_bg: ... else: try PIL...` block)
- Remove Pillow from render-worker dependencies in `render-worker/Dockerfile`
- Remove the line `- Schaeffler green top bar + model name label via Pillow post-processing.`
from the script docstring
### 3.2 Remove STL Workflow Remnants
**What to delete:**
| Location | What to remove |
|---|---|
| `backend/app/api/routers/admin.py` | `VALID_STL_QUALITIES`, `stl_quality` from `SettingsOut`, `SettingsUpdate`, and all `SETTINGS_DEFAULTS` |
| `backend/app/api/routers/admin.py` | `generate-missing-stls` endpoint (if still present) |
| `backend/app/api/routers/cad.py` | `generate-stl/{quality}` endpoint |
| `backend/app/services/render_blender.py` | `stl_quality` parameter from `render_still()` and `render_turntable_to_file()` |
| `backend/app/services/render_blender.py` | Key `stl_duration_s` → rename to `glb_duration_s` (remove `# key kept for backward compat` comment) |
| `backend/app/tasks/step_tasks.py` | `generate_stl_cache` task (check if it still exists) |
| `render-worker/scripts/` | Any `_import_stl`, `_convert_stl`, `_scale_mm_to_m` functions |
| `backend/app/api/routers/analytics.py` | `avg_stl_s` field in analytics response |
| All render log dicts | Replace `stl_size_bytes: 0` and `stl_duration_s:` with `glb_*` equivalents |
| DB migration | `backend/alembic/versions/050_cleanup_stl_settings.py` — `DELETE FROM system_settings WHERE key = 'stl_quality'` |
**Files to delete entirely:**
- `blender-renderer/` directory (already removed from docker-compose.yml, remove directory)
- `threejs-renderer/` directory (migration 033 already removed it from services)
- `flamenco/` directory (migration 032 removed Flamenco; verify nothing still imports from it)
**Verify before deleting:**
```bash
grep -rn "blender-renderer\|threejs-renderer\|flamenco" backend/ frontend/ --include="*.py" --include="*.ts" --include="*.tsx"
```
### 3.3 Remove Compat Shims
After all callers are migrated, delete these shim files:
- `backend/app/models/user.py` (shim → `domains/auth/models.py`)
- `backend/app/models/cad_file.py` (shim → `domains/products/models.py`)
- `backend/app/services/render_dispatcher.py` (shim, 10 lines)
- `backend/app/services/material_service.py` (shim → `domains/materials/service.py`)
- `backend/app/services/render_blender.py` (move fully into `domains/rendering/`)
- `backend/app/models/` directory → all models are already in `domains/*/models.py`
### 3.4 Remove Duplicate render_order_line_task
`step_tasks.render_order_line_task` (lines 705–1050 of `step_tasks.py`) duplicates
`rendering/tasks.render_order_line_still_task`. The step_tasks version has more
baggage (compat imports, `emit()` calls, stl_quality references). Delete the step_tasks
version, migrate all queue routes to the `rendering/tasks` version.
**Migration:**
- `celery_app.py` task routes: route `app.tasks.step_tasks.*` to empty list, removing
step_tasks from the routing table after all tasks are migrated
- Update `CLAUDE.md` to reflect new task locations
---
## Phase 4: Tenant & Auth (Weeks 5–6)
### 4.1 Role Hierarchy
**Current roles:** `admin | project_manager | client`
**Target roles:**
```python
class UserRole(str, enum.Enum):
    global_admin = "global_admin"        # platform operator, bypass RLS, all tenants
    tenant_admin = "tenant_admin"        # per-tenant admin, full control within tenant
    project_manager = "project_manager"  # order/render management within tenant
    client = "client"                    # read own orders, create draft orders
```
**Permission matrix:**
| Permission | GlobalAdmin | TenantAdmin | ProjectManager | Client |
|---|---|---|---|---|
| Manage tenants | YES | no | no | no |
| Manage users (all tenants) | YES | no | no | no |
| Manage users (own tenant) | YES | YES | no | no |
| All system settings | YES | YES | no | no |
| Trigger renders | YES | YES | YES | no |
| View all orders in tenant | YES | YES | YES | no |
| Create/view own orders | YES | YES | YES | YES |
| Reject orders | YES | YES | YES | no |
| Delete renders | YES | YES | YES | no |
| View analytics | YES | YES | YES | no |
**DB migration:**
- `backend/alembic/versions/051_role_hierarchy.py` — rename `admin` → `global_admin`,
add `tenant_admin` to the `userrole` enum; backfill existing `admin` users to `global_admin`
**Auth utilities:**
- `require_global_admin()` — replaces `require_admin()`
- `require_tenant_admin_or_above()` — TenantAdmin or GlobalAdmin
- `require_pm_or_above()` — PM, TenantAdmin, GlobalAdmin
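These guards can share one implementation by ranking the hierarchy, so every "X or above" check is a single comparison. A sketch (the real dependencies would raise an HTTP 403 rather than `PermissionError`, and the factory name is hypothetical):

```python
import enum

class UserRole(str, enum.Enum):
    global_admin = "global_admin"
    tenant_admin = "tenant_admin"
    project_manager = "project_manager"
    client = "client"

# Rank the hierarchy: Client < PM < TenantAdmin < GlobalAdmin.
_RANK = {
    UserRole.client: 0,
    UserRole.project_manager: 1,
    UserRole.tenant_admin: 2,
    UserRole.global_admin: 3,
}

def require_role_at_least(minimum: UserRole):
    """Hypothetical dependency factory backing the three guards above."""
    def check(role: UserRole) -> UserRole:
        if _RANK[role] < _RANK[minimum]:
            raise PermissionError(f"requires {minimum.value} or above")
        return role
    return check

require_pm_or_above = require_role_at_least(UserRole.project_manager)
require_tenant_admin_or_above = require_role_at_least(UserRole.tenant_admin)
require_global_admin = require_role_at_least(UserRole.global_admin)
```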
### 4.2 Tenant Isolation — Consistency Audit
**The problem:**
`database.py:build_tenant_db_dep()` yields the session without setting RLS context
(line 92 comments say "context-setting happens via set_tenant_context when needed").
This means every endpoint that uses `Depends(get_db)` bypasses RLS.
**Fix — Middleware approach (preferred):**
```python
# backend/app/core/middleware.py
class TenantContextMiddleware(BaseHTTPMiddleware):
    """Set PostgreSQL RLS context on every request from JWT claims."""

    BYPASS_PATHS = {"/health", "/api/auth/login", "/api/auth/refresh"}

    async def dispatch(self, request: Request, call_next):
        if request.url.path in self.BYPASS_PATHS:
            return await call_next(request)
        token = self._extract_token(request)
        if token:
            payload = decode_token_safe(token)
            request.state.tenant_id = payload.get("tenant_id")
            request.state.role = payload.get("role")
        response = await call_next(request)
        return response
```
The `get_db` dependency is modified to read `tenant_id` from `request.state`:
```python
async def get_db(request: Request) -> AsyncGenerator[AsyncSession, None]:
    async with AsyncSessionLocal() as session:
        tenant_id = getattr(request.state, "tenant_id", None)
        role = getattr(request.state, "role", None)
        if tenant_id:
            if role == "global_admin":
                await session.execute(text("SET LOCAL app.current_tenant_id = 'bypass'"))
            else:
                await session.execute(
                    text("SET LOCAL app.current_tenant_id = :tid"),
                    {"tid": str(tenant_id)},
                )
        yield session
```
### 4.3 Tenant Isolation Strategy — Shared vs. Dedicated Containers
**Decision: Shared containers with DB-level isolation (current model)**
**Analysis:**
| Factor | Shared containers | Dedicated containers per tenant |
|---|---|---|
| Cost | Low (6 containers total) | High (6 containers × N tenants) |
| Complexity | Low | Very high (orchestration, networking) |
| Data isolation | DB-level (RLS) | Full OS-level |
| GPU sharing | Single GPU shared | Dedicated GPU per tenant (expensive) |
| Blender jobs | Queue + concurrency control | Per-tenant render queue |
| Failure blast radius | All tenants affected by worker crash | Isolated per tenant |
| Scaling | Celery autoscale | Docker Swarm / Kubernetes HPA |
| Migration effort | Weeks (Phase 3-4) | Months (new orchestration layer) |
**Recommendation:** Maintain shared containers with DB-level RLS isolation. Dedicated
containers are only justified if tenants have strict contractual data isolation requirements
(e.g., GDPR-mandated separate processing). For the current internal use case (Schaeffler
internal teams), RLS + tenant_id partitioning is sufficient.
**If dedicated containers are required in future:**
- Docker Compose override file per tenant (`docker-compose.{tenant-slug}.yml`)
- Each tenant gets own PostgreSQL schema (not separate DB) with schema-based routing
- Shared MinIO with per-tenant bucket policies
- Separate Redis database (0-15) per tenant (max 16 tenants)
- Celery routing: per-tenant queue prefix `{tenant_slug}.thumbnail_rendering`
### 4.4 Per-Tenant Feature Flags
Add a `tenant_config` JSONB column to the `tenants` table:
```python
# backend/alembic/versions/052_tenant_feature_flags.py
tenant_config JSONB DEFAULT '{
    "max_concurrent_renders": 3,
    "render_engines_allowed": ["cycles"],
    "max_order_size": 500,
    "fallback_material": "SCHAEFFLER_059999_FailedMaterial",
    "notifications_enabled": true,
    "invoice_prefix": "INV"
}'
```
Feature flags checked at render dispatch time:
- `max_concurrent_renders` — enforced in Celery queue routing
- `render_engines_allowed` — validated in OutputType creation
- `fallback_material` — passed to Blender scripts (see §6.4)
---
## Phase 5: Material & Rendering Improvements (Weeks 5–6)
### 5.1 Fallback Material — SCHAEFFLER_059999_FailedMaterial
**Current state:**
`step_processor.py:MATERIAL_PALETTE` assigns rainbow colors from a palette when material
assignment fails or no material is specified. `blender_render.py` has its own
`PALETTE_LINEAR` for the same purpose.
**Target:**
When material resolution fails (no alias, no exact match, material library link broken),
assign `SCHAEFFLER_059999_FailedMaterial` (magenta) so failed assignments are immediately
visible in renders.
**Implementation:**
- `domains/materials/service.py:resolve_material_map()` — instead of pass-through, return
`SCHAEFFLER_059999_FailedMaterial` for unresolved parts (configurable per-tenant via
`tenant_config.fallback_material`)
- `render-worker/scripts/blender_render.py` — when material library is provided but a
part name does not match any library material, assign `SCHAEFFLER_059999_FailedMaterial`
rather than palette color
- `render-worker/scripts/_blender_materials.py` — a new sub-module for material logic
with explicit logging: `[MATERIAL] part 'Outer_Ring' → 'SCHAEFFLER_010101_Steel-Bare' (alias match)`
and `[MATERIAL] part 'Unknown_Part' → 'SCHAEFFLER_059999_FailedMaterial' (no match)`
- `step_processor.py` — remove `MATERIAL_PALETTE` and `_material_to_color()`; the palette
is no longer used once fallback material is in place. Part colors for geometry GLB viewer
should come from the material library color map, not a rainbow palette.
### 5.2 Remove EEVEE Fallback
**Current state:**
`render_blender.py` has an EEVEE-to-Cycles fallback:
```python
if returncode > 0 and engine == "eevee":
logger.warning("EEVEE failed (exit %d) — retrying with Cycles", returncode)
returncode, stdout_lines2, stderr_lines2 = _run("cycles")
engine_used = "cycles (eevee fallback)"
```
This hides failures and makes debugging harder. Per the Blender 5.0.1 requirement, EEVEE
Next should work reliably. If it fails, it should be a hard failure, not a silent retry.
**Target:** Remove the EEVEE-to-Cycles fallback. If EEVEE fails, the task fails with a
clear error. Introduce an `EEVEE_FALLBACK_ENABLED` system setting that defaults to false.
### 5.3 Remove Blender Version Check
**Current state:**
`backend/app/services/render_blender.py` defines:
```python
MIN_BLENDER_VERSION = (5, 0, 1)
```
This constant is defined but the check that uses it has been removed. Search for any
remaining version-comparison code in `blender_render.py` and render scripts.
**Target:**
- Remove `MIN_BLENDER_VERSION = (5, 0, 1)` from `render_blender.py`
- Remove any `bpy.app.version` comparisons in render scripts
- Blender 5.0.1+ is assumed; older versions are not supported
---
## Phase 6: Notification Center Refactor (Week 7)
### 6.1 Current Problems
Per-render notifications (render.completed, render.failed) fire for every single
`OrderLine`. An order with 200 lines generates 200 notifications. This is too noisy.
### 6.2 Notification Architecture
**Three channels:**
1. **Activity Feed** (`/api/activity`) — per-action events: every render start/complete,
every order state change, every upload. Low-level, not shown in bell dropdown. Available
in a dedicated `/activity` page for debugging.
2. **Notification Center** (`/api/notifications`) — batch summaries only:
- "Order #ORD-2026-042 rendering complete: 47/50 succeeded, 3 failed"
- "Excel import failed: 12 products skipped (see import log)"
- "Worker recovery: 3 stalled renders requeued after 120min timeout"
3. **System Alerts** (admin only) — infrastructure issues: GPU probe failed, Blender
binary not found, Redis connection lost.
**Notification trigger rules:**
- `render.completed` per-line → suppress; emit batch when ALL lines in order reach terminal state
- `render.failed` per-line → suppress; emit batch on order completion
- `excel.imported` → one notification per upload with summary counts
- `order.submitted` → one notification (always keep)
- System alerts → always emit individually
**DB changes:**
- `audit_log` — add `channel VARCHAR(20)` column: `activity | notification | alert`
- `notification_configs` — extend `event_type` to include new batch event types
- New beat task: `batch_render_notifications` — runs every 60s, checks for orders where
all lines are terminal but no batch notification has been emitted; emits the summary
### 6.3 Per-User Notification Preferences
Current `notification_configs` table has `event_type` + `channel` + `enabled`. Extend:
- Add `frequency: str` column — `immediate | hourly | daily | never`
- Frequency is respected by the batch notification beat task
**Files to modify:**
- `backend/app/domains/notifications/models.py` — add `channel`, `frequency` columns
- `backend/app/services/notification_service.py` — add `emit_batch_notification()` function
- `backend/app/tasks/beat_tasks.py` — add `batch_render_notifications` schedule
- `frontend/src/pages/NotificationSettings.tsx` — add frequency selector per event type
- `frontend/src/pages/Notifications.tsx` — separate tabs for Activity | Notifications | Alerts
---
## Phase 7: UI/UX Improvements (Weeks 7–8)
### 7.1 Tooltip / Help Text System
Every setting, parameter, and action in the Admin UI and order wizard needs a tooltip
explaining what it does and what it affects in the pipeline.
**Architecture:**
```typescript
// frontend/src/help/helpTexts.ts
export const HELP_TEXTS: Record<string, HelpText> = {
  "setting.blender_cycles_samples": {
    title: "Cycles Samples",
    body: "Number of render samples per pixel. Higher = better quality, longer render time. 256 is a good balance for product shots. 64 is fast for previews.",
    affects: ["render quality", "render time"],
    unit: "samples",
    range: [1, 4096],
    recommendation: "256 for production, 64 for preview",
  },
  "setting.gltf_preview_linear_deflection": {
    title: "3D Viewer Mesh Quality",
    body: "Controls tessellation precision for the 3D browser viewer. Lower values = finer mesh, larger file. 0.1mm is a good default for medium-complexity parts.",
    affects: ["3D viewer file size", "viewer load time"],
    unit: "mm",
  },
  "action.regenerate_thumbnails": {
    title: "Regenerate All Thumbnails",
    body: "Re-renders thumbnails for all STEP files using current settings. This queues all files on the thumbnail_rendering worker. Expected time: N × 30s. Only needed after changing renderer settings.",
    warning: "This will queue a large number of tasks. Only run during off-peak hours.",
  },
  // ... all settings
}
```
```typescript
// frontend/src/components/HelpTooltip.tsx
interface HelpTooltipProps {
helpKey: string
position?: "top" | "right" | "bottom" | "left"
}
export function HelpTooltip({ helpKey, position = "right" }: HelpTooltipProps) {
const help = HELP_TEXTS[helpKey]
if (!help) return null
return (
<Tooltip content={<HelpContent help={help} />} position={position}>
<HelpCircle size={14} className="text-text-muted ml-1 cursor-help" />
</Tooltip>
)
}
```
**Where to add tooltips (minimum required):**
- All `system_settings` keys in Admin > Settings
- All `OutputType.render_settings` fields in the OutputType editor
- All `RenderTemplate` fields in the template editor
- All actions in Admin > Settings (regenerate thumbnails, process unprocessed, etc.)
- All fields in the Order Wizard with non-obvious meaning
### 7.2 Media Browser Refactor
**Current state:**
`frontend/src/pages/MediaBrowser.tsx` — exists, but its current filter capabilities are undocumented.
**Target:**
Server-side filtered media browser with:
- Filters: `lagertyp | category_key | render_status | asset_type | tenant_id (admin)`
- Text search on product name, pim_id
- Server-side pagination (50 per page)
- Virtual scroll for large catalogs (react-virtual or TanStack Virtual)
- Batch download selected assets
**API changes:**
```
GET /api/media/assets?
asset_type=still&
category_key=TRB&
lagertyp=Axial-Zylinderrollenlager&
render_status=completed&
page=1&
page_size=50&
q=81113
```
**DB indexes required:**
```sql
-- backend/alembic/versions/053_media_browser_indexes.py
CREATE INDEX ix_media_assets_asset_type_created ON media_assets(asset_type, created_at DESC);
CREATE INDEX ix_products_category_lagertyp ON products(category_key, lagertyp);
CREATE INDEX ix_products_name_gin ON products USING GIN(to_tsvector('simple', COALESCE(name, '') || ' ' || COALESCE(pim_id, '')));
```
**Files to modify:**
- `backend/app/domains/media/router.py` — add `GET /assets` with filter params
- `backend/app/domains/media/schemas.py` — add `MediaAssetFilter` Pydantic model
- `frontend/src/pages/MediaBrowser.tsx` — complete rewrite with virtual scroll
- `frontend/src/api/media.ts` — add `getMediaAssets(filters)` function
### 7.3 Workflow Editor — Pipeline Step Nodes
**Current state:**
`WorkflowEditor.tsx` has 5 node types (Upload, Parse, Render, Export, Deliver) but they
do not map to actual Celery tasks. `WorkflowDefinition.config` is a free-form JSONB blob
with no schema validation.
**Target:**
Node types correspond 1:1 to `ProcessStep` enum values. The workflow editor saves a
validated workflow config that the `dispatch_workflow()` function can execute.
**WorkflowDefinition config schema:**
```json
{
"version": 1,
"nodes": [
{"id": "n1", "step": "extract_metadata", "params": {}},
{"id": "n2", "step": "render_thumbnail", "params": {"engine": "cycles", "samples": 64}},
{"id": "n3", "step": "render_still", "params": {"width": 2048, "height": 2048}},
{"id": "n4", "step": "export_glb", "params": {"quality": "high"}},
{"id": "n5", "step": "deliver", "params": {}}
],
"edges": [
{"from": "n1", "to": "n2"},
{"from": "n2", "to": "n3"},
{"from": "n3", "to": "n4"},
{"from": "n4", "to": "n5"}
]
}
```
Backend validation: `workflow_router.py` validates that all `step` values are in
`ProcessStep` enum before saving.
Frontend: `WorkflowEditor.tsx` builds available node types from a `GET /api/workflows/steps`
endpoint that returns all `ProcessStep` entries with their parameter schemas.
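The backend validation might look like the following sketch; `StepName` is a stand-in for the real `ProcessStep` enum, with only a subset of the 20 steps listed:

```python
# Hypothetical validator for the workflow config schema above.
from enum import Enum

class StepName(str, Enum):  # stand-in for app.core.process_steps.StepName
    EXTRACT_METADATA = "extract_metadata"
    RENDER_THUMBNAIL = "render_thumbnail"
    RENDER_STILL = "render_still"
    EXPORT_GLB = "export_glb"
    DELIVER = "deliver"

def validate_workflow_config(config: dict) -> list[str]:
    """Return a list of validation errors; empty list means valid."""
    errors: list[str] = []
    valid_steps = {s.value for s in StepName}
    node_ids = set()
    for node in config.get("nodes", []):
        node_ids.add(node["id"])
        if node.get("step") not in valid_steps:
            errors.append(f"unknown step {node.get('step')!r} in node {node['id']}")
    for edge in config.get("edges", []):
        for key in ("from", "to"):
            if edge.get(key) not in node_ids:
                errors.append(f"edge references unknown node {edge.get(key)!r}")
    return errors
```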
### 7.4 Kanban Rejection Flow
**Current state:**
`OrderStatus.rejected` exists but the rejection flow is undefined. The admin panel has no
rejection UI. `rejected_at` column exists but there is no rejection reason field.
**Target flow:**
1. **Who can reject:** `ProjectManager`, `TenantAdmin`, `GlobalAdmin`
2. **Trigger:** `POST /api/orders/{id}/reject` with body `{"reason": "...", "notify_client": true}`
3. **What happens:**
- Order status → `rejected`, `rejected_at` = now
- `rejection_reason` stored (new `Text` column on `Order`)
- All pending/processing renders are cancelled (same as cancel-renders endpoint)
- Notification emitted to order creator: "Your order #ORD-2026-042 was rejected. Reason: ..."
- Audit log entry created
4. **Client sees:** Order status badge changes to `REJECTED` with reason visible
5. **Re-submission:** Client can `POST /api/orders/{id}/resubmit` which clears rejection,
resets to `draft`, allowing edits before re-submitting. Re-submit creates a new audit log
entry and emits notification to PMs.
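The status transition in step 3 can be sketched as a pure function; the real endpoint would wrap it in a DB transaction, cancel renders, and emit the notification and audit entries. `REJECTABLE_STATUSES` is an assumption to confirm against the actual `OrderStatus` values:

```python
from datetime import datetime, timezone

# Assumption: which statuses may be rejected — verify against OrderStatus.
REJECTABLE_STATUSES = {"submitted", "in_progress"}

def apply_rejection(order: dict, reason: str) -> dict:
    """Apply the rejection state transition described in step 3."""
    if order["status"] not in REJECTABLE_STATUSES:
        raise ValueError(f"cannot reject order in status {order['status']!r}")
    order["status"] = "rejected"
    order["rejected_at"] = datetime.now(timezone.utc)
    order["rejection_reason"] = reason
    return order
```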
**DB migration:**
- `backend/alembic/versions/054_order_rejection.py` — add `rejection_reason TEXT` to `orders`
---
## Phase 8: Scalable Workers (Week 8)
### 8.1 Current Concurrency Controls
- `worker` (step_processing): `CELERY_WORKER_CONCURRENCY` env var, default 8
- `render-worker` (thumbnail_rendering): hardcoded 1 (Blender serial access)
- Both require Docker service restart to change concurrency
### 8.2 Dynamic Worker Scaling
**Short term (no Kubernetes):**
Use Celery's built-in `autoscale` option:
```yaml
# docker-compose.yml
render-worker:
command: celery -A app.tasks.celery_app worker
--loglevel=info
-Q thumbnail_rendering
--autoscale=1,1 # Celery autoscale order is max,min; both 1 keeps Blender access serial
--concurrency=1
```
For `worker`:
```yaml
worker:
command: celery -A app.tasks.celery_app worker
--loglevel=info
-Q step_processing,ai_validation
--autoscale=${MAX_CONCURRENCY:-8},${MIN_CONCURRENCY:-2}
```
**Per-queue concurrency via DB:**
Add a `worker_configs` table:
```sql
CREATE TABLE worker_configs (
queue_name VARCHAR(100) PRIMARY KEY,
max_concurrency INT NOT NULL DEFAULT 8,
min_concurrency INT NOT NULL DEFAULT 2,
updated_at TIMESTAMP NOT NULL DEFAULT now()
);
```
A beat task `apply_worker_concurrency` runs every 5 minutes and uses Celery control
commands to adjust pool size:
```python
celery_app.control.broadcast("pool_shrink", arguments={"n": 2}, destination=["worker@host"])
celery_app.control.broadcast("pool_grow", arguments={"n": 4}, destination=["worker@host"])
```
**Long term (Kubernetes):**
Workers run as Kubernetes Deployments with HPA on `celery_queue_length` metric (exposed via
Flower or a custom `/metrics` endpoint for Prometheus). Render-workers use GPU node pools
with `nvidia.com/gpu: 1` resource requests.
### 8.3 Worker Health Recovery
**Current state:**
`beat_tasks.recover_stuck_cad_files` runs every 5 minutes and handles stuck processing state.
**Extend to:**
- Detect `render_status = 'processing'` with `render_started_at` > `render_stall_timeout_minutes` ago
- Revoke the Celery task (ID stored in `render_job_doc.celery_task_id`) and SIGTERM any still-running Blender process
- Reset `render_status` to `failed`, update `render_job_doc.state = 'failed'`
- Emit system alert notification (admin channel)
- Log with `[WORKER_RECOVERY] Stalled render for order_line {id} terminated after {N}min`
---
## Detailed Task Breakdown by Area
### A. step_tasks.py Decomposition
**Current problems:**
- 1,170 lines, 8 distinct Celery tasks, many private helpers, multiple inline DB session
creation patterns
- Imports scattered: some at module level, some inside functions (Celery pattern)
- `render_order_line_task` (lines 705-1050+) duplicates `render_order_line_still_task`
**Migration path:**
1. Create new `domains/pipeline/tasks/` directory with one file per step
2. Each new task calls `PipelineLogger` instead of bare `logger.info`
3. Each new task writes to `render_job_doc` via `job_document.py` helpers
4. Old `step_tasks.py` becomes import-only shim: `from app.domains.pipeline.tasks.extract_metadata import process_step_file`
5. After 2-week migration period, delete `step_tasks.py`
### B. Auth Token Claims
**Current:** `{"sub": user_id, "role": role, "exp": expires}` — no tenant_id in token
**Target:** `{"sub": user_id, "role": role, "tenant_id": str(tenant_id), "exp": expires}`
**Impact:** All existing tokens become invalid after deploy. Users must re-login.
**Mitigation:** Rotate `JWT_SECRET_KEY` as part of the deployment to force re-login.
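A sketch of the target claims payload; `build_token_claims` is a hypothetical helper, the real logic lives inside `create_access_token()`:

```python
from datetime import datetime, timedelta, timezone

def build_token_claims(user_id: str, role: str, tenant_id: str,
                       ttl_minutes: int = 60) -> dict:
    """Claims payload matching the target shape above (tenant_id included)."""
    return {
        "sub": user_id,
        "role": role,
        "tenant_id": tenant_id,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
    }
```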
### C. Celery Task Routing Update
After Phase 2 decomposition, update `celery_app.conf.update(task_routes={...})`:
```python
task_routes = {
"app.domains.pipeline.tasks.*": {"queue": "step_processing"},
"app.domains.rendering.tasks.*": {"queue": "thumbnail_rendering"},
"app.domains.media.tasks.*": {"queue": "step_processing"},
"app.tasks.ai_tasks.*": {"queue": "ai_validation"},
"app.tasks.beat_tasks.*": {"queue": "step_processing"},
}
```
### D. Frontend API Client Consistency
All `frontend/src/api/*.ts` files should:
- Use the axios client from `api/client.ts` (which injects `X-Tenant-ID` header)
- Export typed interfaces for all response shapes
- Use `useQuery` / `useMutation` from TanStack Query, not bare `axios.get` in components
**Audit needed:** Check each `api/*.ts` file to confirm `X-Tenant-ID` header is sent
(it is wired in the axios interceptor per commit 5da90b5, but verify all files use
the configured client, not `axios.create()` directly).
---
## Architectural Decisions (ADRs)
### ADR-001: Shared containers vs. per-tenant containers
**Decision:** Shared containers with PostgreSQL RLS
**Rationale:** Cost and complexity savings. RLS provides adequate isolation for internal use.
**Consequences:** Must ensure RLS is applied consistently (Phase 1.3). Blender sessions are
shared; GPU contention is managed via Celery queue depth, not isolation.
### ADR-002: Render Job Document as JSONB
**Decision:** Store render job state machine as JSONB in `order_lines.render_job_doc`
**Rationale:** Avoids additional `workflow_node_results` table queries for debugging;
JSONB is flexible for schema evolution; indexed for state-based queries.
**Alternatives considered:** Separate `render_job_steps` table — rejected (too many joins
for the common "show me render status" query).
### ADR-003: No per-render notifications
**Decision:** Suppress individual render.completed notifications; emit batch at order completion
**Rationale:** An order with 200 lines generates 200 notifications under the current model.
Batch summaries at order completion are actionable; per-render events are noise.
**Consequences:** Activity feed still records all events for debugging.
### ADR-004: GPU-first rendering
**Decision:** Default `cycles_device = "gpu"`, explicit log on CPU fallback
**Rationale:** The render-worker has GPU reservation in docker-compose.yml. CPU fallback
should be visible and logged, not silent.
**Consequences:** Renders on machines without GPU will always log a CPU fallback warning.
### ADR-005: Fallback material over palette
**Decision:** Replace `MATERIAL_PALETTE` rainbow fallback with `SCHAEFFLER_059999_FailedMaterial`
**Rationale:** Failed material assignments should be immediately visible (magenta) rather
than disguised as intentional palette colors.
**Consequences:** Parts with missing material mapping will render magenta in both
thumbnail and production renders. This is a feature, not a bug.
### ADR-006: Blender 5.0.1 minimum, no version guards
**Decision:** Remove all `bpy.app.version` checks and `MIN_BLENDER_VERSION` guards
**Rationale:** The project is Blender 5.0.1-only. Version shims add complexity without value.
**Consequences:** Running with an older Blender binary will cause cryptic errors. Document
the minimum version requirement clearly in the Dockerfile and README.
---
## What Gets Deleted
### Python files to delete entirely:
- `backend/app/models/user.py` — compat shim
- `backend/app/models/cad_file.py` — compat shim
- `backend/app/models/order.py` — compat shim (if exists)
- `backend/app/models/order_item.py` — compat shim
- `backend/app/models/order_line.py` — compat shim
- `backend/app/models/material.py` — compat shim
- `backend/app/models/material_alias.py` — compat shim
- `backend/app/models/render_template.py` — compat shim
- `backend/app/models/output_type.py` — compat shim
- `backend/app/models/system_setting.py` — compat shim
- `backend/app/models/template.py` — compat shim
- `backend/app/models/render_position.py` — compat shim
- `backend/app/services/render_dispatcher.py` — 10-line shim
- `backend/app/services/material_service.py` — 3-line shim
- `backend/app/tasks/step_tasks.py` — after Phase 2 migration complete
- `backend/app/domains/rendering/tasks.py` — split into per-step files in Phase 2
### Directories to delete entirely:
- `blender-renderer/` — HTTP microservice, removed from docker-compose in refactor/v2
- `threejs-renderer/` — removed in migration 033
- `flamenco/` — removed in migration 032
### Code blocks to delete (within files):
- `render-worker/scripts/blender_render.py` lines 798-851 — Pillow overlay
- `render-worker/scripts/blender_render.py` line 17 — docstring Pillow mention
- `backend/app/services/render_blender.py` line 17 — `MIN_BLENDER_VERSION = (5, 0, 1)`
- `backend/app/services/render_blender.py` lines 229-233 — EEVEE-to-Cycles fallback
- `backend/app/services/step_processor.py` lines 19-31 — `MATERIAL_PALETTE` + `_material_to_color()`
- `backend/app/api/routers/admin.py` — `VALID_STL_QUALITIES`, `stl_quality` in all schemas
### System settings to delete (DB migration):
- `stl_quality` — GLB-only pipeline, no STL concept
- `threejs_render_size` — renderer removed
- `thumbnail_renderer` — was multi-value (pillow|blender|threejs), now always blender
---
## Migration Strategy
### Deployment Order (Zero-Downtime)
**Step 1 — DB migrations (non-breaking):**
- Run migrations 048-054 (new columns: `render_job_doc`, `rejection_reason`, feature flags, etc.)
- New columns are nullable, no existing queries break
**Step 2 — Backend deploy (backward compatible):**
- Deploy new backend with compat shims in place
- New endpoints and middleware active
- Old endpoints still work
- JWT tokens are extended with `tenant_id` claim (existing tokens without it still work
via fallback in middleware)
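The middleware fallback for pre-deploy tokens can be sketched as follows; `lookup_user_tenant` is a hypothetical callable backed by the users table:

```python
def resolve_tenant_id(claims: dict, lookup_user_tenant) -> str:
    """Prefer the tenant_id claim; fall back to a DB lookup for old tokens."""
    tenant_id = claims.get("tenant_id")
    if tenant_id:
        return tenant_id
    # Token issued before the deploy: claim is absent, resolve via user record.
    return lookup_user_tenant(claims["sub"])
```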
**Step 3 — Celery worker deploy:**
- Deploy new `domains/pipeline/tasks/` structure
- `step_tasks.py` compat shim routes to new functions
- Old task names still registered via shim
**Step 4 — Frontend deploy:**
- New WorkflowEditor with validated step types
- HelpTooltip components added
- MediaBrowser refactor with virtual scroll
**Step 5 — Cleanup (breaking):**
- Remove compat shims
- Delete `step_tasks.py`
- Rotate `JWT_SECRET_KEY` to force re-login (tenant_id now required in claims)
- Run DB migration to clean up stl_quality and threejs settings
### Rollback Plan
- All migrations have `downgrade()` implemented
- Compat shims mean old task names still work during migration window
- `render_log` column kept alongside `render_job_doc` until all consumers migrated
### Testing Before Delete
Before deleting any compat shim or old code, verify:
```bash
grep -rn "<old_import_path>" backend/ frontend/ --include="*.py" --include="*.ts" --include="*.tsx"
```
Must return 0 results from non-shim files.
---
## Open Questions
These require product decisions before implementation:
1. **Tenant onboarding flow** — How are new tenants created? Self-service signup, or
admin creates tenant + TenantAdmin user manually? What is the initial data setup?
2. **Blender binary distribution** — Currently host-mounted (`/opt/blender:/opt/blender:ro`).
If multiple render-workers run on different hosts in a future cluster, how is Blender
distributed? Container image vs. network share?
3. **MinIO vs. filesystem storage** — All media assets are stored on the local filesystem
(`/app/uploads` volume). MinIO is configured but not used for primary storage yet. Should
Phase 2 migrate assets to MinIO for horizontal scaling?
4. **Invoice workflow** — `billing/models.py` has `Invoice` + `InvoiceLine` models and an
`invoices` table (migration 042). Is billing actually used? If not, should it be removed
to reduce complexity?
5. **AI validation (Azure OpenAI)** — `ai_tasks.py` and `azure_ai.py` exist but Azure
credentials are optional. Is this feature actively used or can it be removed?
6. **Email notifications** — SMTP settings exist in system_settings but email sending is
not implemented. Is this a required feature for the next phase?
7. **Rejection re-submission UX** — When a client re-submits a rejected order, do they
create a new order or update the existing one? The current data model supports only
one status per order, not a history of submissions.
8. **Media browser download format** — Bulk download: ZIP of individual files, or separate
download links? ZIP requires server-side assembly which adds load.
9. **Tooltip language** — Help texts in English (per CLAUDE.md coding standards) or German
(for end-user-facing UI)? The admin UI is currently in English labels.
10. **3D Viewer geometry quality** — The `gltf_preview_linear_deflection` default is 0.1mm.
For very small parts (sub-1mm features), this may be too coarse. Should the deflection
auto-scale based on the CAD file's bounding box dimensions?