# Plan: Render Pipeline Performance Optimizations

## Context

Analysis of render logs shows the first render of a complex 140-part bearing takes 181s, while subsequent renders take 20s (OptiX cache, already fixed). Further optimizations can reduce per-render time and increase throughput.

Current baseline (2048x2048, 256 samples, Cycles GPU, OIDN denoiser):

- GLB import: 7-11s
- GPU render: 11-13s (warm cache)
- Total: 20-22s per render

## Tasks (in order of impact)

### [x] Task 1: Resolution-aware sample count for thumbnails

- **File**: `backend/app/domains/pipeline/tasks/render_order_line.py`
- **What**: When the output type resolution is <= 1024x1024 (thumbnails, previews), auto-scale samples down. Formula: `samples = max(32, base_samples * min(width, height) / 2048)`. Only apply when the output type doesn't explicitly set samples.
- **Also**: `backend/app/domains/pipeline/tasks/render_thumbnail.py`: thumbnail renders use hardcoded settings; ensure they use low samples (32-64).
- **Acceptance gate**: A 512x512 thumbnail uses ~64 samples instead of 256; a 2048x2048 HQ render still uses 256.
- **Dependencies**: None
- **Risk**: Low. Only auto-calculated samples are affected; explicit per-OT samples override this.
- **Savings**: 50-75% GPU time on thumbnail/preview renders

### [ ] Task 2: Prefer USD path over GLB when USD master exists

- **File**: `backend/app/domains/pipeline/tasks/render_order_line.py`
- **What**: The render task already checks for USD masters (lines 145-166), but the GLB tessellation step still runs as fallback. Audit the USD detection logic and ensure:
  1. When `usd_render_path` is found, skip GLB tessellation entirely (no `export_step_to_gltf` subprocess)
  2. Log when USD path is used vs GLB fallback
  3. The USD path should be the default when available
- **Also check**: `backend/app/services/render_blender.py`: verify `render_still()` skips GLB conversion when `usd_path` is provided (lines 100-101 say it does)
- **Acceptance gate**: A product with a USD master renders without the 7-11s GLB tessellation step
- **Dependencies**: None
- **Risk**: Low. The USD path already works; this just ensures it's always preferred.

### [ ] Task 3: Enable Blender persistent data for animations

- **File**: `render-worker/scripts/turntable_render.py`
- **What**: Add `scene.render.use_persistent_data = True` before rendering turntable frames. This keeps the BVH acceleration structure in memory between frames, avoiding a rebuild for each of the 12-24 frames.
- **Acceptance gate**: Turntable renders of complex products are 20-30% faster
- **Dependencies**: None
- **Risk**: Low. Blender 5.0 supports this; increases VRAM usage slightly.

### [x] Task 4: Dual render queue for light/heavy workloads

- **Files**:
  - `docker-compose.yml`: add second render-worker service for light tasks
  - `backend/app/domains/pipeline/tasks/render_thumbnail.py`: route thumbnails to light queue
  - `backend/app/domains/pipeline/tasks/render_order_line.py`: route based on resolution
- **What**: Split `asset_pipeline` into two queues:
  - `asset_pipeline`: heavy renders (2048x2048, turntables), concurrency=1
  - `asset_pipeline_light`: thumbnails and small stills (<=1024), concurrency=2
  - Route based on output resolution or task type
- **Acceptance gate**: Thumbnail generation doesn't block HQ renders; 2 thumbnails render concurrently
- **Dependencies**: Task 1 (lower samples for light queue makes concurrent rendering safer)
- **Risk**: Medium. VRAM contention if both workers render simultaneously. Mitigated by thumbnails being small (512x512, 64 samples = minimal VRAM).

### [x] Task 5: Skip re-tessellation when GLB already exists

- **File**: `backend/app/services/render_blender.py`
- **What**: In `render_still()`, the STEP→GLB tessellation runs every time. Cache the GLB file per CAD file (already stored as `gltf_geometry` MediaAsset). Before tessellating, check if a GLB MediaAsset exists for this `cad_file_id` and reuse it.
- **Also**: `backend/app/domains/pipeline/tasks/render_order_line.py`: pass the existing GLB path to the render service when available
- **Acceptance gate**: Second render of same product skips the 7-11s tessellation step; GLB is reused from MediaAsset
- **Dependencies**: Task 2 (USD path is preferred; this is fallback for products without USD)
- **Risk**: Low. GLB is deterministic per CAD file; if the CAD file changes, a new GLB is generated.

### [x] Task 6: Output format optimization (WebP for stills)

- **File**: `render-worker/scripts/_blender_scene_setup.py` (or `blender_render.py`)
- **What**: After Blender renders a PNG, optionally convert to WebP for 50-70% smaller files. Add a `webp` output format option to OutputType. When selected, render as PNG then convert via Pillow.
- **Also**: `backend/app/services/render_blender.py`: add post-render WebP conversion
- **Acceptance gate**: WebP output type produces smaller files with no visible quality loss
- **Dependencies**: None
- **Risk**: Low. WebP is widely supported; PNG is kept as default.

## Migration Check

**No**: no database changes needed. All optimizations are in the render pipeline and Docker config.

## Order Recommendation

1. Task 1 (sample scaling): simple, immediate impact
2. Task 2 (USD preference): audit + small code change
3. Task 3 (persistent data): one-liner in turntable script
4. Task 5 (GLB caching): avoids redundant tessellation
5. Task 4 (dual queue): architecture change, needs testing
6. Task 6 (WebP): new feature, lowest priority

Tasks 1-3 have no dependencies on each other and can be done in parallel. Note that Tasks 1 and 2 both touch `render_order_line.py`, so only Task 3 lives in a fully independent file.
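
For reference, the Task 1 sample-scaling formula can be sketched as a small pure function. This is an illustration rather than the code in `render_order_line.py`: the function name `auto_samples` and its parameters are hypothetical, and it assumes "<= 1024x1024" means both width and height are at most 1024.

```python
from typing import Optional

# Hypothetical constants; the plan only fixes the 2048 reference edge,
# the 1024 threshold and the floor of 32 samples.
REFERENCE_EDGE = 2048
LIGHT_MAX_EDGE = 1024
MIN_SAMPLES = 32


def auto_samples(
    width: int,
    height: int,
    base_samples: int = 256,
    explicit_samples: Optional[int] = None,
) -> int:
    """Return the sample count for a render of the given resolution."""
    # An explicit per-output-type value always wins over auto-scaling.
    if explicit_samples is not None:
        return explicit_samples
    # Assumption: auto-scaling only applies when both edges are <= 1024.
    if width > LIGHT_MAX_EDGE or height > LIGHT_MAX_EDGE:
        return base_samples
    # samples = max(32, base_samples * min(width, height) / 2048)
    scaled = base_samples * min(width, height) // REFERENCE_EDGE
    return max(MIN_SAMPLES, scaled)
```

With the baseline of 256 samples, a 512x512 thumbnail resolves to 64 samples and a 1024x1024 preview to 128, which matches the 50-75% savings range quoted in Task 1.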
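
The Task 4 routing rule (heavy vs light queue) could look roughly like the following. The queue names come from the plan; the function `select_queue` and its flags are hypothetical, and how the returned name is handed to the task dispatcher depends on the task framework in use.

```python
HEAVY_QUEUE = "asset_pipeline"        # 2048x2048 stills, turntables
LIGHT_QUEUE = "asset_pipeline_light"  # thumbnails and small stills
LIGHT_MAX_EDGE = 1024


def select_queue(
    width: int,
    height: int,
    is_thumbnail: bool = False,
    is_turntable: bool = False,
) -> str:
    """Pick the render queue from task type first, then resolution."""
    # Turntables are always heavy, regardless of frame size.
    if is_turntable:
        return HEAVY_QUEUE
    # Thumbnail tasks are always light.
    if is_thumbnail:
        return LIGHT_QUEUE
    # Remaining stills are routed by their longest edge.
    if max(width, height) <= LIGHT_MAX_EDGE:
        return LIGHT_QUEUE
    return HEAVY_QUEUE
```

Checking task type before resolution keeps a small-frame turntable off the light workers, where its multi-frame VRAM footprint would compete with concurrent thumbnails.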
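
Tasks 2 and 5 together imply a preference order for geometry input: USD master first, then a cached GLB, then fresh tessellation. A minimal sketch of that decision, assuming both candidates are available as filesystem paths (the function name and return labels are hypothetical, and the real code looks up MediaAsset records rather than bare paths):

```python
from pathlib import Path
from typing import Optional, Tuple


def choose_geometry_source(
    usd_path: Optional[Path],
    cached_glb_path: Optional[Path],
) -> Tuple[str, Optional[Path]]:
    """Return (source_kind, path) following the USD > cached GLB > tessellate order."""
    # Task 2: a USD master, when present, skips GLB tessellation entirely.
    if usd_path is not None and usd_path.is_file():
        return ("usd", usd_path)
    # Task 5: otherwise reuse a previously generated GLB for this CAD file.
    if cached_glb_path is not None and cached_glb_path.is_file():
        return ("glb_cached", cached_glb_path)
    # Fallback: no reusable geometry, so the caller must tessellate STEP to GLB.
    return ("tessellate", None)
```

Logging the returned `source_kind` at the call site would cover the "log when USD path is used vs GLB fallback" item in Task 2.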