feat: tenant AI chat agent with function calling

Actionable AI assistant that uses per-tenant Azure OpenAI credentials
to execute natural language commands against the render pipeline.

Backend:
- ChatMessage model + migration (session-based conversations)
- Chat service with 10 OpenAI function-calling tools:
  list_orders, search_products, create_order, dispatch_renders,
  get_order_status, set_material_override, set_render_overrides,
  get_render_stats, check_materials, query_database
- All tools tenant-scoped (queries filtered by tenant_id)
- Write operations use httpx to call backend API internally
- Chat API: POST /chat/messages, GET /chat/sessions, DELETE session
- Conversation history preserved in DB (last 50 messages per session)

Frontend:
- Slide-out ChatPanel (right side, w-96, animated)
- User/assistant message styling with avatars and timestamps
- Session management (new chat, session history, delete)
- Typing indicator while waiting for AI response
- Floating chat button in bottom-right corner
- Error state when the tenant has no AI credentials configured

Example: "Render all Kugellager products as WebP at 1024x1024"
→ Agent calls search_products + create_order + dispatch_renders

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Plan: Tenant AI Chat Agent (Actionable)
## Context
Each tenant has Azure OpenAI credentials stored in `tenant_config` JSONB. The goal is an **actionable AI agent** where users can type natural language commands to control the render pipeline — create orders, dispatch renders, check status, set overrides — scoped to their tenant.
Example interactions:
- "Render all Kugellager products as WebP at 1024x1024"
- "What's the status of my last order?"
- "Set material override to Steel-Bare on order SA-2026-00160"
- "How many renders failed this week?"
The agent uses **function calling** (Azure OpenAI tool use) — the LLM decides which API action to execute, the backend executes it, and returns the result. Tenants are fully isolated — each uses their own Azure API key and only sees their own data.
**What exists:**
- Per-tenant Azure OpenAI credentials in `tenant_config` JSONB
- WebSocket system scoped by tenant for real-time events
- `ai_validation` Celery queue (concurrency=8)
- Azure OpenAI integration boilerplate in `azure_ai.py`
## Affected Files
| File | Change |
|------|--------|
| `backend/app/models/chat.py` | **NEW** — ChatMessage model |
| `backend/app/models/__init__.py` | Import ChatMessage |
| `backend/app/api/routers/chat.py` | **NEW** — Chat API endpoints |
| `backend/app/services/chat_service.py` | **NEW** — Azure OpenAI chat + DB context |
| `backend/app/main.py` | Register chat router |
| `backend/alembic/versions/XXX_add_chat_messages.py` | Migration |
| `frontend/src/api/chat.ts` | **NEW** — Chat API types + functions |
| `frontend/src/components/chat/ChatPanel.tsx` | **NEW** — Chat UI component |
| `frontend/src/components/layout/Layout.tsx` | Add chat toggle button |
## Tasks (in order)
### [ ] Task 1: ChatMessage model + migration
- **File**: `backend/app/models/chat.py` (new)
- **What**: Create a ChatMessage model:
```python
import uuid
from datetime import datetime
from sqlalchemy import DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column

from app.models.base import Base  # assumed location of the declarative base


class ChatMessage(Base):
    __tablename__ = "chat_messages"

    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id: Mapped[uuid.UUID | None] = mapped_column(ForeignKey("tenants.id"), index=True)
    user_id: Mapped[uuid.UUID | None] = mapped_column(ForeignKey("users.id"))
    session_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), index=True)  # groups a conversation
    role: Mapped[str] = mapped_column(String(20))  # "user", "assistant", "system"
    content: Mapped[str] = mapped_column(Text)
    context_type: Mapped[str | None] = mapped_column(String(50))  # "order", "product", "general"
    context_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True))  # order_id or product_id
    token_count: Mapped[int | None] = mapped_column(Integer)  # usage tracking
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=datetime.utcnow)
```
- **Also**: Import in `backend/app/models/__init__.py`
- **Migration**: `alembic revision --autogenerate -m "add chat_messages table"` (expected output is sketched below this task)
- **Acceptance gate**: Table exists in DB; model importable
- **Dependencies**: None
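
For reference, autogenerate against the model above should emit roughly the following (a sketch; Alembic's generated constraint and index names may differ):

```python
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql


def upgrade() -> None:
    op.create_table(
        "chat_messages",
        sa.Column("id", postgresql.UUID(as_uuid=True), primary_key=True),
        sa.Column("tenant_id", postgresql.UUID(as_uuid=True), sa.ForeignKey("tenants.id"), nullable=True),
        sa.Column("user_id", postgresql.UUID(as_uuid=True), sa.ForeignKey("users.id"), nullable=True),
        sa.Column("session_id", postgresql.UUID(as_uuid=True), nullable=False),
        sa.Column("role", sa.String(20), nullable=False),
        sa.Column("content", sa.Text(), nullable=False),
        sa.Column("context_type", sa.String(50), nullable=True),
        sa.Column("context_id", postgresql.UUID(as_uuid=True), nullable=True),
        sa.Column("token_count", sa.Integer(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), nullable=False),
    )
    # indexed lookups for tenant scoping and conversation loading
    op.create_index("ix_chat_messages_tenant_id", "chat_messages", ["tenant_id"])
    op.create_index("ix_chat_messages_session_id", "chat_messages", ["session_id"])


def downgrade() -> None:
    op.drop_index("ix_chat_messages_session_id", table_name="chat_messages")
    op.drop_index("ix_chat_messages_tenant_id", table_name="chat_messages")
    op.drop_table("chat_messages")
```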
### [ ] Task 2: Chat service — Azure OpenAI with function calling
- **File**: `backend/app/services/chat_service.py` (new)
- **What**: Service with Azure OpenAI **tool use / function calling**:
1. Takes a user message + session_id + tenant_id + user_id
2. Loads tenant Azure credentials from `tenant_config`
3. Defines **tools** the LLM can call (JSON schema for each):
- `list_orders(status, limit)` — list tenant's orders
- `search_products(query, category, limit)` — search products
- `create_order(product_ids, output_type_name, render_overrides, material_override)` — create & submit
- `dispatch_renders(order_id)` — dispatch renders for an order
- `get_order_status(order_id)` — check render progress
- `set_material_override(order_id, material_name)` — batch material override
- `set_render_overrides(order_id, overrides)` — batch render overrides
- `get_render_stats()` — throughput stats
- `check_materials(order_id)` — unmapped materials check
- `query_database(sql)` — read-only SQL (SELECT only, tenant-scoped)
4. Calls Azure OpenAI with `tools` parameter — the LLM decides which tool to call
5. Executes the tool call internally (same functions as MCP server but tenant-scoped)
6. Returns tool result to LLM for a natural language response
7. Stores conversation in ChatMessage table
**Tenant isolation**: All DB queries filter by `tenant_id`. The `query_database` tool auto-appends `WHERE tenant_id = '{tenant_id}'` or validates tenant scope.
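
One way to enforce that scope for `query_database`, sketched under the assumption that every table the tool may touch exposes a `tenant_id` column (`run_tenant_query` is an illustrative name):

```python
import re

from sqlalchemy import text

# keywords that must never appear in an agent-issued query (conservative check)
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)


def run_tenant_query(db, sql: str, tenant_id: str) -> list[dict]:
    """Run an agent-issued SQL query read-only and scoped to one tenant."""
    stmt = sql.strip().rstrip(";")
    if not stmt.lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    if WRITE_KEYWORDS.search(stmt):
        raise ValueError("Write/DDL keywords are not allowed")
    # Wrap the query so the tenant filter cannot be bypassed; assumes the
    # inner SELECT exposes a tenant_id column
    scoped = f"SELECT * FROM ({stmt}) AS q WHERE q.tenant_id = :tenant_id"
    rows = db.execute(text(scoped), {"tenant_id": tenant_id}).mappings().all()
    return [dict(r) for r in rows]
```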
**Tool execution**: Uses the existing backend API functions directly (not HTTP calls) — import from the routers/services.
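
A minimal sketch of that dispatcher (handler names mirror the tool list above and stand in for the real imports from the services):

```python
from uuid import UUID

# handlers are the existing tenant-scoped service functions,
# imported from the routers/services as noted above
TOOL_HANDLERS = {
    "list_orders": list_orders,
    "search_products": search_products,
    "create_order": create_order,
    # ... one entry per tool in the list above
}


def execute_tool(name: str, args: dict, tenant_id: UUID) -> dict:
    """Dispatch one LLM tool call; tenant_id is always injected server-side."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    return handler(tenant_id=tenant_id, **args)
```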
The tool schemas and the model call then look like:

```python
from openai import AzureOpenAI

# client built from this tenant's own credentials in tenant_config
# (key names here are illustrative)
client = AzureOpenAI(
    api_key=tenant_cfg["azure_api_key"],
    api_version=tenant_cfg["azure_api_version"],
    azure_endpoint=tenant_cfg["azure_endpoint"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search products by name, PIM-ID, or category",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "category": {"type": "string"},
                },
            },
        },
    },
    # ... more tools
]

response = client.chat.completions.create(
    model=deployment,  # the tenant's Azure deployment name
    messages=messages,
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call a tool
)
# handle tool_calls in the response, execute them, and feed the results
# back to the model (see the loop sketched below)
```
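
And a sketch of the tool-call loop that closes steps 4-6, using the `execute_tool` dispatcher from above:

```python
import json

msg = response.choices[0].message
while msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tools
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = execute_tool(call.function.name, args, tenant_id=tenant_id)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result, default=str),
        })
    # give the model the tool results so it can answer (or call more tools)
    response = client.chat.completions.create(
        model=deployment, messages=messages, tools=tools
    )
    msg = response.choices[0].message

# msg.content now holds the natural-language reply to store in ChatMessage
```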
- **Acceptance gate**: User can say "show my last 5 orders" and get real data back via function calling
- **Dependencies**: Task 1
### [ ] Task 3: Chat API endpoints
- **File**: `backend/app/api/routers/chat.py` (new)
- **What**: FastAPI router with the following endpoints (the POST handler is sketched after this task):
- `POST /api/chat/messages` — send a message, get AI response
- Body: `{ message: str, session_id: str | None, context_type: str | None, context_id: str | None }`
- Creates session_id if not provided
- Returns: `{ session_id: str, message: ChatMessageOut, response: ChatMessageOut }`
- Auth: `get_current_user` — uses user's tenant AI config
- `GET /api/chat/sessions` — list user's chat sessions
- Returns: `[{ session_id, last_message, message_count, created_at }]`
- `GET /api/chat/sessions/{session_id}/messages` — get conversation history
- Returns: `[{ id, role, content, created_at }]`
- `DELETE /api/chat/sessions/{session_id}` — delete a conversation
- **Also**: Register router in `backend/app/main.py`
- **Acceptance gate**: POST /api/chat/messages returns an AI response using tenant credentials
- **Dependencies**: Task 2
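
A sketch of the POST handler under those assumptions (`ChatService` and its `handle_message` method are illustrative names for the Task 2 service):

```python
from uuid import UUID, uuid4

from fastapi import APIRouter, Depends
from pydantic import BaseModel

from app.api.deps import get_current_user          # existing auth dependency (path assumed)
from app.services.chat_service import ChatService  # Task 2 service

router = APIRouter(prefix="/api/chat", tags=["chat"])


class SendMessageIn(BaseModel):
    message: str
    session_id: UUID | None = None
    context_type: str | None = None
    context_id: UUID | None = None


@router.post("/messages")
def send_message(body: SendMessageIn, user=Depends(get_current_user)):
    session_id = body.session_id or uuid4()  # start a new conversation if none given
    service = ChatService(tenant_id=user.tenant_id, user_id=user.id)
    user_msg, ai_msg = service.handle_message(
        message=body.message,
        session_id=session_id,
        context_type=body.context_type,
        context_id=body.context_id,
    )
    return {"session_id": str(session_id), "message": user_msg, "response": ai_msg}
```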
### [ ] Task 4: Frontend — Chat API types
- **File**: `frontend/src/api/chat.ts` (new)
- **What**: TypeScript interfaces and API functions:
```typescript
export interface ChatMessage { id: string; role: 'user' | 'assistant' | 'system'; content: string; created_at: string }
export interface ChatSession { session_id: string; last_message: string; message_count: number; created_at: string }
export interface ChatResponse { session_id: string; message: ChatMessage; response: ChatMessage }

// signatures only; the implementations call the Task 3 endpoints
export declare function sendMessage(message: string, sessionId?: string, contextType?: string, contextId?: string): Promise<ChatResponse>
export declare function getSessions(): Promise<ChatSession[]>
export declare function getSessionMessages(sessionId: string): Promise<ChatMessage[]>
export declare function deleteSession(sessionId: string): Promise<void>
```
- **Acceptance gate**: Types compile; functions callable
- **Dependencies**: Task 3
### [ ] Task 5: Frontend — ChatPanel component
- **File**: `frontend/src/components/chat/ChatPanel.tsx` (new)
- **What**: Slide-out chat panel (right side, similar to notification panels in modern apps):
1. **Header**: "AI Assistant" title + close button + session selector
2. **Message list**: Scrollable area with role-based styling:
- User messages: right-aligned, accent background
- Assistant messages: left-aligned, surface background, markdown support
- Timestamps below each message
3. **Input area**: Text input + send button (Enter to send)
4. **Loading state**: Typing indicator while waiting for AI response
5. **Session management**: "New conversation" button, session history dropdown
6. **Context awareness**: When opened from an order/product page, auto-includes context
**Styling**:
- Fixed right panel (w-96, full height)
- Backdrop overlay on mobile
- Smooth slide-in animation
- Use existing CSS variables (surface, content, accent)
- lucide-react icons (MessageSquare, Send, Loader2, X, Plus)
- **Acceptance gate**: Panel opens/closes, messages send and display, AI responds
- **Dependencies**: Task 4
### [ ] Task 6: Frontend — Chat toggle in Layout
- **File**: `frontend/src/components/layout/Layout.tsx`
- **What**: Add a chat toggle button:
1. Floating button in bottom-right corner (or in the sidebar)
2. Icon: `MessageSquare` from lucide-react
3. Badge with unread count (optional, for future)
4. Click toggles ChatPanel visibility
5. Only show when tenant has `ai_enabled = true`
- **Acceptance gate**: Button visible for users with AI-enabled tenant; clicking opens/closes ChatPanel
- **Dependencies**: Task 5
## Migration Check
**Yes** — one new table `chat_messages` with UUID PK, FK to tenants and users.
## Order Recommendation
1. Backend model + migration (Task 1)
2. Backend service (Task 2)
3. Backend API (Task 3)
4. Frontend types (Task 4)
5. Frontend chat UI (Task 5)
6. Frontend layout integration (Task 6)
## Risks / Open Questions
1. **Azure OpenAI availability**: If tenant hasn't configured AI credentials, the chat should show a helpful message ("AI not configured — ask your admin to set up Azure OpenAI in Tenant Settings")
2. **Token costs**: Each message uses Azure OpenAI tokens. Consider adding token counting and a configurable monthly limit per tenant.
3. **Context enrichment**: The system prompt could include live data (order counts, render status). This makes the AI more helpful but costs more tokens. Start simple, enhance later.
4. **Streaming responses**: Azure OpenAI supports streaming. V1 uses a simple request/response. V2 could stream via WebSocket for real-time typing effect.
5. **openai package**: The `openai` Python package must be installed in the backend container. Check if it's already a dependency (it may be via `azure_ai.py`).