# Plan: Render Pipeline Performance Optimizations

## Context

Analysis of render logs shows that the first render of a complex 140-part bearing takes 181s, while subsequent renders take about 20s. The difference comes from the OptiX cache, an issue that is already fixed. Further optimizations can reduce per-render time and increase throughput.

Current baseline (2048x2048, 256 samples, Cycles GPU, OIDN denoiser):
- GLB import: 7-11s
- GPU render: 11-13s (warm cache)
- Total: 20-22s per render

## Tasks (in order of impact)

### [x] Task 1: Resolution-aware sample count for thumbnails

- **File**: `backend/app/domains/pipeline/tasks/render_order_line.py`
- **What**: When the output type resolution is at most 1024x1024 (thumbnails, previews), scale the sample count down automatically. Formula: `samples = max(32, round(base_samples * min(width, height) / 2048))`, rounded because the sample count must be an integer. Apply this only when the output type does not set samples explicitly.
- **Also**: `backend/app/domains/pipeline/tasks/render_thumbnail.py` — thumbnail renders use hardcoded settings; ensure they use low samples (32-64).
- **Acceptance gate**: A 512x512 thumbnail uses ~64 samples instead of 256; a 2048x2048 HQ render still uses 256.
- **Dependencies**: None
- **Risk**: Low — only affects auto-calculated samples, explicit per-OT samples override this
- **Savings**: 50-75% GPU time on thumbnail/preview renders
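
The scaling rule for Task 1 can be sketched as a small pure function. This is a minimal illustration, not the code in `render_order_line.py`: the name `auto_samples` and its parameters are hypothetical, "<= 1024x1024" is read as "both dimensions at most 1024", and the result is rounded because sample counts are whole numbers.

```python
from typing import Optional


def auto_samples(
    width: int,
    height: int,
    base_samples: int = 256,
    explicit_samples: Optional[int] = None,
    reference_px: int = 2048,
    min_samples: int = 32,
    light_max_px: int = 1024,
) -> int:
    """Return the sample count for an output of the given size (hypothetical helper)."""
    # An explicit per-output-type value always overrides auto-scaling.
    if explicit_samples is not None:
        return explicit_samples
    # Only thumbnails and previews are scaled down; larger outputs keep the base count.
    if width > light_max_px or height > light_max_px:
        return base_samples
    scaled = round(base_samples * min(width, height) / reference_px)
    # Never go below the floor, however small the image is.
    return max(min_samples, scaled)
```

With the defaults, a 512x512 thumbnail gets 64 samples and a 2048x2048 render keeps 256, which matches the acceptance gate.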

### [ ] Task 2: Prefer USD path over GLB when USD master exists

- **File**: `backend/app/domains/pipeline/tasks/render_order_line.py`
- **What**: The render task already checks for USD masters (lines 145-166) but the GLB tessellation step still runs as fallback. Audit the USD detection logic and ensure:
  1. When `usd_render_path` is found, skip GLB tessellation entirely (no `export_step_to_gltf` subprocess)
  2. Log when USD path is used vs GLB fallback
  3. The USD path should be the default when available
- **Also check**: `backend/app/services/render_blender.py`: verify that `render_still()` skips GLB conversion when `usd_path` is provided (the code at lines 100-101 indicates that it does)
- **Acceptance gate**: A product with a USD master renders without the 7-11s GLB tessellation step
- **Dependencies**: None
- **Risk**: Low — USD path already works; this just ensures it's always preferred
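
The intended decision for Task 2 could look roughly like the sketch below. The function name `select_geometry_source` and its return convention are assumptions made for illustration; the real task may structure this differently. The point is that the USD branch returns before any tessellation work is scheduled, and that both branches are logged.

```python
import logging
from pathlib import Path
from typing import Optional, Tuple

logger = logging.getLogger(__name__)


def select_geometry_source(
    usd_render_path: Optional[Path], step_path: Path
) -> Tuple[str, Path]:
    """Return ("usd", path) when a USD master exists, else ("glb", step_path)."""
    # Prefer the USD master whenever it is present on disk.
    if usd_render_path is not None and usd_render_path.is_file():
        logger.info("Rendering from USD master: %s", usd_render_path)
        return "usd", usd_render_path
    # Only the fallback branch should ever reach the STEP -> GLB tessellation.
    logger.info("No USD master found, falling back to GLB for %s", step_path)
    return "glb", step_path
```

A caller would start the `export_step_to_gltf` subprocess only when the first element of the result is `"glb"`.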

### [ ] Task 3: Enable Blender persistent data for animations

- **File**: `render-worker/scripts/turntable_render.py`
- **What**: Add `scene.render.use_persistent_data = True` before rendering turntable frames. This keeps scene data such as the BVH acceleration structure in memory between frames, avoiding a rebuild for each of the 12-24 frames.
- **Acceptance gate**: Turntable renders of complex products are 20-30% faster
- **Dependencies**: None
- **Risk**: Low — Blender 5.0 supports this; increases VRAM usage slightly
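
As a sketch of the Task 3 change, the setting can be wrapped in a helper that accepts any scene-like object, which keeps it testable outside Blender. The helper name `enable_persistent_data` is hypothetical; `use_persistent_data` is the render setting named above. How `turntable_render.py` actually drives its frame loop is not known here, so the Blender calls appear only as comments.

```python
def enable_persistent_data(scene) -> bool:
    """Enable persistent render data on `scene` and return the previous value."""
    previous = bool(scene.render.use_persistent_data)
    # Keep render data in memory between frames instead of rebuilding it each time.
    scene.render.use_persistent_data = True
    return previous


# Inside Blender (bpy is only importable there), usage might look like:
#   import bpy
#   enable_persistent_data(bpy.context.scene)
#   bpy.ops.render.render(animation=True)
```

Returning the previous value lets the script restore the setting after the turntable finishes, if the higher VRAM usage should not carry over to later renders in the same session.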

### [x] Task 4: Dual render queue for light/heavy workloads

- **Files**:
  - `docker-compose.yml` — add second render-worker service for light tasks
  - `backend/app/domains/pipeline/tasks/render_thumbnail.py` — route thumbnails to light queue
  - `backend/app/domains/pipeline/tasks/render_order_line.py` — route based on resolution
- **What**: Split `asset_pipeline` into two queues:
  - `asset_pipeline` — heavy renders (2048x2048, turntables): concurrency=1
  - `asset_pipeline_light` — thumbnails and small stills (<=1024): concurrency=2
  - Route based on output resolution or task type
- **Acceptance gate**: Thumbnail generation doesn't block HQ renders; 2 thumbnails render concurrently
- **Dependencies**: Task 1 (lower samples for light queue makes concurrent rendering safer)
- **Risk**: Medium — VRAM contention if both workers render simultaneously. Mitigated by thumbnails being small (512x512, 64 samples = minimal VRAM)
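
The routing rule for Task 4 can be expressed as one function shared by both tasks. This is a sketch under assumptions: the helper `choose_render_queue` is hypothetical, the 1024 px threshold is applied to the longer side, and the commented usage assumes the tasks are Celery tasks.

```python
HEAVY_QUEUE = "asset_pipeline"        # 2048x2048 stills and turntables, concurrency=1
LIGHT_QUEUE = "asset_pipeline_light"  # thumbnails and small stills, concurrency=2
LIGHT_MAX_PX = 1024


def choose_render_queue(width: int, height: int, is_turntable: bool = False) -> str:
    """Pick the queue name for a render job (hypothetical helper)."""
    # Turntables are always heavy, even at low resolution, because they render many frames.
    if is_turntable:
        return HEAVY_QUEUE
    # A still is light only when its longer side fits within the threshold.
    if max(width, height) <= LIGHT_MAX_PX:
        return LIGHT_QUEUE
    return HEAVY_QUEUE


# If the tasks are Celery tasks, dispatch could pass the queue explicitly, e.g.:
#   some_render_task.apply_async(args=[...], queue=choose_render_queue(w, h))
```

Keeping the rule in one place means the thumbnail task and the order-line task cannot drift apart on what counts as "light".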

### [x] Task 5: Skip re-tessellation when GLB already exists

- **File**: `backend/app/services/render_blender.py`
- **What**: In `render_still()`, the STEP→GLB tessellation runs every time. Cache the GLB file per CAD file (already stored as a `gltf_geometry` MediaAsset). Before tessellating, check whether a GLB MediaAsset exists for this `cad_file_id` and reuse it.
- **Also**: `backend/app/domains/pipeline/tasks/render_order_line.py` — pass the existing GLB path to the render service when available
- **Acceptance gate**: Second render of same product skips the 7-11s tessellation step; GLB is reused from MediaAsset
- **Dependencies**: Task 2 (USD path is preferred; this is fallback for products without USD)
- **Risk**: Low — GLB is deterministic per CAD file; if the CAD file changes, a new GLB is generated
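
The reuse check for Task 5 might be sketched as follows. The MediaAsset lookup is left out because its query API is not shown in this plan; the hypothetical `resolve_glb` helper takes the already-looked-up path (or `None`) and a `tessellate` callable that stands in for the existing STEP→GLB step.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_glb(
    cached_glb: Optional[Path],
    step_path: Path,
    tessellate: Callable[[Path], Path],
) -> Path:
    """Return a usable GLB path, tessellating only when no cached file exists."""
    # Reuse the cached GLB if the asset record points at a real, non-empty file.
    if cached_glb is not None and cached_glb.is_file() and cached_glb.stat().st_size > 0:
        return cached_glb
    # A missing or empty file is treated as a cache miss rather than an error.
    return tessellate(step_path)
```

Checking the file on disk, not just the database record, guards against a stale MediaAsset whose file was removed.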

### [x] Task 6: Output format optimization (WebP for stills)

- **File**: `render-worker/scripts/_blender_scene_setup.py` (or `blender_render.py`)
- **What**: After Blender renders a PNG, optionally convert to WebP for 50-70% smaller files. Add a `webp` output format option to OutputType. When selected, render as PNG then convert via Pillow.
- **Also**: `backend/app/services/render_blender.py` — add post-render WebP conversion
- **Acceptance gate**: WebP output type produces smaller files with no visible quality loss
- **Dependencies**: None
- **Risk**: Low — WebP is widely supported; PNG is kept as default
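
A post-render conversion step for Task 6 could be as small as the sketch below. The function names and the `quality=90` default are illustrative assumptions, not values taken from the codebase. It assumes Pillow is installed with WebP support and that the render is an 8-bit RGB or RGBA PNG.

```python
from pathlib import Path


def webp_path_for(png_path: Path) -> Path:
    """Return the sibling .webp path for a rendered PNG."""
    return png_path.with_suffix(".webp")


def convert_png_to_webp(png_path: Path, quality: int = 90, keep_png: bool = True) -> Path:
    """Convert a rendered PNG to lossy WebP and return the new file's path."""
    # Imported lazily so this module still loads where Pillow is not installed.
    from PIL import Image

    out_path = webp_path_for(png_path)
    with Image.open(png_path) as img:
        # quality controls lossy compression; method=6 is the slowest, smallest encoding.
        img.save(out_path, format="WEBP", quality=quality, method=6)
    if not keep_png:
        png_path.unlink()
    return out_path
```

Keeping the PNG by default matches the plan's stance that PNG remains the default format; the quality value would need tuning against real renders to confirm there is no visible loss.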

## Migration Check

**No** — no database changes needed. All optimizations are in the render pipeline and Docker config.

## Order Recommendation

1. Task 1 (sample scaling) — simple, immediate impact
2. Task 2 (USD preference) — audit + small code change
3. Task 3 (persistent data) — one-liner in turntable script
4. Task 5 (GLB caching) — avoids redundant tessellation
5. Task 4 (dual queue) — architecture change, needs testing
6. Task 6 (WebP) — new feature, lowest priority

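Tasks 1-3 have no dependencies on each other and can be done in parallel. Tasks 1 and 2 both modify `render_order_line.py`, so their changes need to be merged with care.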