feat: tenant AI chat agent with function calling

Actionable AI assistant that uses per-tenant Azure OpenAI credentials
to execute natural language commands against the render pipeline.

Backend:
- ChatMessage model + migration (session-based conversations)
- Chat service with 10 OpenAI function-calling tools:
  list_orders, search_products, create_order, dispatch_renders,
  get_order_status, set_material_override, set_render_overrides,
  get_render_stats, check_materials, query_database
- All tools tenant-scoped (queries filtered by tenant_id)
- Write operations use httpx to call backend API internally
- Chat API: POST /chat/messages, GET /chat/sessions, DELETE session
- Conversation history preserved in DB (last 50 messages per session)

Frontend:
- Slide-out ChatPanel (right side, w-96, animated)
- User/assistant message styling with avatars and timestamps
- Session management (new chat, session history, delete)
- Typing indicator while waiting for AI response
- Floating chat button in bottom-right corner
- Error state when the tenant has no AI credentials configured

Example: "Render all Kugellager products as WebP at 1024x1024"
→ Agent calls search_products + create_order + dispatch_renders

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Plan: Tenant AI Chat Agent (Actionable)
## Context
Each tenant has Azure OpenAI credentials stored in `tenant_config` JSONB. The goal is an **actionable AI agent** where users can type natural language commands to control the render pipeline — create orders, dispatch renders, check status, set overrides — scoped to their tenant.
Example interactions:
- "Render all Kugellager products as WebP at 1024x1024"
- "What's the status of my last order?"
- "Set material override to Steel-Bare on order SA-2026-00160"
- "How many renders failed this week?"
The agent uses **function calling** (Azure OpenAI tool use) — the LLM decides which API action to execute, the backend executes it, and returns the result. Tenants are fully isolated — each uses their own Azure API key and only sees their own data.
**What exists:**
- Per-tenant Azure OpenAI credentials in `tenant_config` JSONB
- WebSocket system scoped by tenant for real-time events
- `ai_validation` Celery queue (concurrency=8)
- Azure OpenAI integration boilerplate in `azure_ai.py`
## Affected Files
| File | Change |
|------|--------|
| `backend/app/models/chat.py` | **NEW** — ChatMessage model |
| `backend/app/models/__init__.py` | Import ChatMessage |
| `backend/app/api/routers/chat.py` | **NEW** — Chat API endpoints |
| `backend/app/services/chat_service.py` | **NEW** — Azure OpenAI chat + DB context |
| `backend/app/main.py` | Register chat router |
| `backend/alembic/versions/XXX_add_chat_messages.py` | Migration |
| `frontend/src/api/chat.ts` | **NEW** — Chat API types + functions |
| `frontend/src/components/chat/ChatPanel.tsx` | **NEW** — Chat UI component |
| `frontend/src/components/layout/Layout.tsx` | Add chat toggle button |
## Tasks (in order)
### [ ] Task 1: ChatMessage model + migration
- **File**: `backend/app/models/chat.py` (new)
- **What**: Create a ChatMessage model:
```python
import uuid
from datetime import datetime
from sqlalchemy import DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column

from app.models.base import Base  # assumed location of the declarative base


class ChatMessage(Base):
    __tablename__ = "chat_messages"

    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id: Mapped[uuid.UUID | None] = mapped_column(ForeignKey("tenants.id"), index=True)
    user_id: Mapped[uuid.UUID | None] = mapped_column(ForeignKey("users.id"))
    session_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), index=True)  # groups a conversation
    role: Mapped[str] = mapped_column(String(20))  # "user", "assistant", "system"
    content: Mapped[str] = mapped_column(Text)
    context_type: Mapped[str | None] = mapped_column(String(50))  # "order", "product", "general"
    context_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True))  # order_id or product_id
    token_count: Mapped[int | None] = mapped_column(Integer)  # usage tracking
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=datetime.utcnow)
```
- **Also**: Import in `backend/app/models/__init__.py`
- **Migration**: `alembic revision --autogenerate -m "add chat_messages table"` (expected output is sketched below this task)
- **Acceptance gate**: Table exists in DB; model importable
- **Dependencies**: None
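
For reference, autogenerate against the model above should emit roughly the following (a sketch; Alembic's generated constraint and index names may differ):

```python
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade() -> None:
    op.create_table(
        "chat_messages",
        sa.Column("id", postgresql.UUID(as_uuid=True), primary_key=True),
        sa.Column("tenant_id", postgresql.UUID(as_uuid=True), sa.ForeignKey("tenants.id"), nullable=True),
        sa.Column("user_id", postgresql.UUID(as_uuid=True), sa.ForeignKey("users.id"), nullable=True),
        sa.Column("session_id", postgresql.UUID(as_uuid=True), nullable=False),
        sa.Column("role", sa.String(20), nullable=False),
        sa.Column("content", sa.Text(), nullable=False),
        sa.Column("context_type", sa.String(50), nullable=True),
        sa.Column("context_id", postgresql.UUID(as_uuid=True), nullable=True),
        sa.Column("token_count", sa.Integer(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False),
    )
    # indexed lookups for tenant scoping and conversation loading
    op.create_index("ix_chat_messages_tenant_id", "chat_messages", ["tenant_id"])
    op.create_index("ix_chat_messages_session_id", "chat_messages", ["session_id"])


def downgrade() -> None:
    op.drop_index("ix_chat_messages_session_id", table_name="chat_messages")
    op.drop_index("ix_chat_messages_tenant_id", table_name="chat_messages")
    op.drop_table("chat_messages")
```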
### [ ] Task 2: Chat service — Azure OpenAI with function calling
- **File**: `backend/app/services/chat_service.py` (new)
- **What**: Service with Azure OpenAI **tool use / function calling**:
1. Takes a user message + session_id + tenant_id + user_id
2. Loads tenant Azure credentials from `tenant_config`
3. Defines **tools** the LLM can call (JSON schema for each):
- `list_orders(status, limit)` — list tenant's orders
- `search_products(query, category, limit)` — search products
- `create_order(product_ids, output_type_name, render_overrides, material_override)` — create & submit
- `dispatch_renders(order_id)` — dispatch renders for an order
- `get_order_status(order_id)` — check render progress
- `set_material_override(order_id, material_name)` — batch material override
- `set_render_overrides(order_id, overrides)` — batch render overrides
- `get_render_stats()` — throughput stats
- `check_materials(order_id)` — unmapped materials check
- `query_database(sql)` — read-only SQL (SELECT only, tenant-scoped)
4. Calls Azure OpenAI with `tools` parameter — the LLM decides which tool to call
5. Executes the tool call internally (same functions as MCP server but tenant-scoped)
6. Returns tool result to LLM for a natural language response
7. Stores conversation in ChatMessage table
**Tenant isolation**: All DB queries filter by `tenant_id`. The `query_database` tool auto-appends `WHERE tenant_id = '{tenant_id}'` or validates tenant scope.
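
One way to enforce that scope for `query_database`, sketched under the assumption that every table the tool may touch exposes a `tenant_id` column (`run_tenant_query` is an illustrative name):

```python
import re

from sqlalchemy import text

# keywords that must never appear in an agent-issued query (conservative check)
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)


def run_tenant_query(db, sql: str, tenant_id: str) -> list[dict]:
    """Run an agent-issued SQL query read-only and scoped to one tenant."""
    stmt = sql.strip().rstrip(";")
    if not stmt.lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    if WRITE_KEYWORDS.search(stmt):
        raise ValueError("Write/DDL keywords are not allowed")
    # Wrap the query so the tenant filter cannot be bypassed; assumes the
    # inner SELECT exposes a tenant_id column
    scoped = f"SELECT * FROM ({stmt}) AS q WHERE q.tenant_id = :tenant_id"
    rows = db.execute(text(scoped), {"tenant_id": tenant_id}).mappings().all()
    return [dict(r) for r in rows]
```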
**Tool execution**: Uses the existing backend API functions directly (not HTTP calls) — import from the routers/services.
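
A minimal sketch of that dispatcher (handler names mirror the tool list above and stand in for the real imports from the services):

```python
from uuid import UUID

# handlers are the existing tenant-scoped service functions,
# imported from the routers/services as noted above
TOOL_HANDLERS = {
    "list_orders": list_orders,
    "search_products": search_products,
    "create_order": create_order,
    # ... one entry per tool in the list above
}


def execute_tool(name: str, args: dict, tenant_id: UUID) -> dict:
    """Dispatch one LLM tool call; tenant_id is always injected server-side."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    return handler(tenant_id=tenant_id, **args)
```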
The tool schemas and the model call then look like:

```python
from openai import AzureOpenAI

# client built from this tenant's own credentials in tenant_config
# (key names here are illustrative)
client = AzureOpenAI(
    api_key=tenant_cfg["azure_api_key"],
    api_version=tenant_cfg["azure_api_version"],
    azure_endpoint=tenant_cfg["azure_endpoint"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search products by name, PIM-ID, or category",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "category": {"type": "string"},
                },
            },
        },
    },
    # ... more tools
]

response = client.chat.completions.create(
    model=deployment,  # the tenant's Azure deployment name
    messages=messages,
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call a tool
)
# handle tool_calls in the response, execute them, and feed the results
# back to the model (see the loop sketched below)
```
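
And a sketch of the tool-call loop that closes steps 4-6, using the `execute_tool` dispatcher from above:

```python
import json

msg = response.choices[0].message
while msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = execute_tool(call.function.name, args, tenant_id=tenant_id)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result, default=str),
        })
    # give the model the tool results so it can answer (or call more tools)
    response = client.chat.completions.create(
        model=deployment, messages=messages, tools=tools
    )
    msg = response.choices[0].message

# msg.content now holds the natural-language reply to store in ChatMessage
```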
- **Acceptance gate**: User can say "show my last 5 orders" and get real data back via function calling
- **Dependencies**: Task 1
### [ ] Task 3: Chat API endpoints
- **File**: `backend/app/api/routers/chat.py` (new)
- **What**: FastAPI router with the following endpoints (the POST handler is sketched after this task):
- `POST /api/chat/messages` — send a message, get AI response
- Body: `{ message: str, session_id: str | None, context_type: str | None, context_id: str | None }`
- Creates session_id if not provided
- Returns: `{ session_id: str, message: ChatMessageOut, response: ChatMessageOut }`
- Auth: `get_current_user` — uses user's tenant AI config
- `GET /api/chat/sessions` — list user's chat sessions
- Returns: `[{ session_id, last_message, message_count, created_at }]`
- `GET /api/chat/sessions/{session_id}/messages` — get conversation history
- Returns: `[{ id, role, content, created_at }]`
- `DELETE /api/chat/sessions/{session_id}` — delete a conversation
- **Also**: Register router in `backend/app/main.py`
- **Acceptance gate**: POST /api/chat/messages returns an AI response using tenant credentials
- **Dependencies**: Task 2
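
A sketch of the POST handler under those assumptions (`ChatService` and its `handle_message` method are illustrative names for the Task 2 service):

```python
from uuid import UUID, uuid4

from fastapi import APIRouter, Depends
from pydantic import BaseModel

from app.api.deps import get_current_user          # existing auth dependency (path assumed)
from app.services.chat_service import ChatService  # Task 2 service

router = APIRouter(prefix="/api/chat", tags=["chat"])


class SendMessageIn(BaseModel):
    message: str
    session_id: UUID | None = None
    context_type: str | None = None
    context_id: UUID | None = None


@router.post("/messages")
def send_message(body: SendMessageIn, user=Depends(get_current_user)):
    session_id = body.session_id or uuid4()  # start a new conversation if none given
    service = ChatService(tenant_id=user.tenant_id, user_id=user.id)
    user_msg, ai_msg = service.handle_message(
        message=body.message,
        session_id=session_id,
        context_type=body.context_type,
        context_id=body.context_id,
    )
    return {"session_id": str(session_id), "message": user_msg, "response": ai_msg}
```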
### [ ] Task 4: Frontend — Chat API types
- **File**: `frontend/src/api/chat.ts` (new)
- **What**: TypeScript interfaces and API functions:
```typescript
export interface ChatMessage { id: string; role: 'user' | 'assistant' | 'system'; content: string; created_at: string }
export interface ChatSession { session_id: string; last_message: string; message_count: number; created_at: string }
export interface ChatResponse { session_id: string; message: ChatMessage; response: ChatMessage }

// signatures only; the implementations call the Task 3 endpoints
export declare function sendMessage(message: string, sessionId?: string, contextType?: string, contextId?: string): Promise<ChatResponse>
export declare function getSessions(): Promise<ChatSession[]>
export declare function getSessionMessages(sessionId: string): Promise<ChatMessage[]>
export declare function deleteSession(sessionId: string): Promise<void>
```
- **Acceptance gate**: Types compile; functions callable
- **Dependencies**: Task 3
### [ ] Task 5: Frontend — ChatPanel component
- **File**: `frontend/src/components/chat/ChatPanel.tsx` (new)
- **What**: Slide-out chat panel (right side, similar to notification panels in modern apps):
1. **Header**: "AI Assistant" title + close button + session selector
2. **Message list**: Scrollable area with role-based styling:
- User messages: right-aligned, accent background
- Assistant messages: left-aligned, surface background, markdown support
- Timestamps below each message
3. **Input area**: Text input + send button (Enter to send)
4. **Loading state**: Typing indicator while waiting for AI response
5. **Session management**: "New conversation" button, session history dropdown
6. **Context awareness**: When opened from an order/product page, auto-includes context
**Styling**:
- Fixed right panel (w-96, full height)
- Backdrop overlay on mobile
- Smooth slide-in animation
- Use existing CSS variables (surface, content, accent)
- lucide-react icons (MessageSquare, Send, Loader2, X, Plus)
- **Acceptance gate**: Panel opens/closes, messages send and display, AI responds
- **Dependencies**: Task 4
### [ ] Task 6: Frontend — Chat toggle in Layout
- **File**: `frontend/src/components/layout/Layout.tsx`
- **What**: Add a chat toggle button:
1. Floating button in bottom-right corner (or in the sidebar)
2. Icon: `MessageSquare` from lucide-react
3. Badge with unread count (optional, for future)
4. Click toggles ChatPanel visibility
5. Only show when tenant has `ai_enabled = true`
- **Acceptance gate**: Button visible for users with AI-enabled tenant; clicking opens/closes ChatPanel
- **Dependencies**: Task 5
## Migration Check
**Yes** — one new table `chat_messages` with UUID PK, FK to tenants and users.
## Order Recommendation
1. Backend model + migration (Task 1)
2. Backend service (Task 2)
3. Backend API (Task 3)
4. Frontend types (Task 4)
5. Frontend chat UI (Task 5)
6. Frontend layout integration (Task 6)
## Risks / Open Questions
1. **Azure OpenAI availability**: If tenant hasn't configured AI credentials, the chat should show a helpful message ("AI not configured — ask your admin to set up Azure OpenAI in Tenant Settings")
2. **Token costs**: Each message uses Azure OpenAI tokens. Consider adding token counting and a configurable monthly limit per tenant.
3. **Context enrichment**: The system prompt could include live data (order counts, render status). This makes the AI more helpful but costs more tokens. Start simple, enhance later.
4. **Streaming responses**: Azure OpenAI supports streaming. V1 uses a simple request/response. V2 could stream via WebSocket for real-time typing effect.
5. **openai package**: The `openai` Python package must be installed in the backend container. Check if it's already a dependency (it may be via `azure_ai.py`).