Hartmut ffe3eebfca perf: render pipeline optimizations — sample scaling, USD logging, persistent BVH

Task 1: Resolution-aware sample count
- Auto-scale samples for resolutions <= 1024: max(32, samples * max_dim / 2048)
- 512x512 thumbnails: 256 → 64 samples (75% GPU savings)
- Thumbnail tasks capped at 64 samples via context manager
- 2048x2048 HQ renders unchanged

Task 2: USD path preference audit + logging
- Verified USD master path is correctly preferred over GLB tessellation
- Added clear emit() messages: "Using USD master" vs "No USD master — GLB path"
- Dynamic render log label: "USD → Blender" vs "STEP → GLB → Blender"

Task 3: Persistent BVH for turntable animations
- Added scene.render.use_persistent_data = True before frame loop
- BVH acceleration structure cached between frames (not rebuilt per frame)
- Applies to both camera orbit and object rotation modes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-15 12:03:31 +01:00

5.5 KiB

Plan: Render Pipeline Performance Optimizations

Context

Analysis of render logs shows the first render of a complex 140-part bearing takes 181s, while subsequent renders take 20s (OptiX cache — already fixed). Further optimizations can reduce per-render time and increase throughput.

Current baseline (2048x2048, 256 samples, Cycles GPU, OIDN denoiser):

GLB import: 7-11s
GPU render: 11-13s (warm cache)
Total: 20-22s per render

Tasks (in order of impact)

[x] Task 1: Resolution-aware sample count for thumbnails

File: backend/app/domains/pipeline/tasks/render_order_line.py
What: When the output type's resolution is at most 1024x1024 (thumbnails, previews), scale the sample count down automatically. Formula: samples = max(32, base_samples * min(width, height) / 2048). Apply this only when the output type does not set samples explicitly.
Also: backend/app/domains/pipeline/tasks/render_thumbnail.py currently uses hardcoded settings for thumbnail renders; ensure it uses a low sample count (32-64).
Acceptance gate: A 512x512 thumbnail uses ~64 samples instead of 256; a 2048x2048 HQ render still uses 256.
Dependencies: None
Risk: Low — only affects auto-calculated samples, explicit per-OT samples override this
Savings: 50-75% GPU time on thumbnail/preview renders
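
As a minimal sketch of the formula above: the function name, its parameters, and the use of integer floor division are assumptions made for illustration, not taken from render_order_line.py. Note that the commit summary writes the formula with max_dim while this plan uses min(width, height); the two agree for square outputs, and the sketch follows the plan.

```python
def auto_scale_samples(base_samples, width, height, explicit_samples=None,
                       threshold=1024, reference_dim=2048, floor=32):
    """Return the sample count to use for a render of the given size.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # An explicit per-output-type sample count always overrides auto-scaling.
    if explicit_samples is not None:
        return explicit_samples
    # Outputs larger than the threshold in either dimension keep the base count.
    if width > threshold or height > threshold:
        return base_samples
    # Scale linearly with the smaller dimension, but never go below the floor.
    # Floor division is an assumption; the plan does not specify rounding.
    scaled = base_samples * min(width, height) // reference_dim
    return max(floor, scaled)
```

With a base of 256 samples, a 512x512 thumbnail resolves to 64 samples and a 2048x2048 render stays at 256, which matches the acceptance gate.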

[ ] Task 2: Prefer USD path over GLB when USD master exists

File: backend/app/domains/pipeline/tasks/render_order_line.py
What: The render task already checks for USD masters (lines 145-166) but the GLB tessellation step still runs as fallback. Audit the USD detection logic and ensure:
1. When usd_render_path is found, skip GLB tessellation entirely (no export_step_to_gltf subprocess)
2. Log when USD path is used vs GLB fallback
3. The USD path should be the default when available
Also check: in backend/app/services/render_blender.py, verify that render_still() skips GLB conversion when usd_path is provided (lines 100-101 indicate that it does).
Acceptance gate: A product with a USD master renders without the 7-11s GLB tessellation step
Dependencies: None
Risk: Low — USD path already works; this just ensures it's always preferred
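
The selection and logging behaviour described above could look roughly like the sketch below. The function name, the returned tuple, and the logger name are hypothetical; the two pipeline labels are the ones quoted in the commit summary, and the real task may emit its messages through a different mechanism than the standard logging module.

```python
import logging

logger = logging.getLogger("render_order_line")  # logger name is an assumption


def select_geometry_source(usd_render_path):
    """Return (source, pipeline_label, needs_glb_tessellation).

    Hypothetical helper: when a USD master path is available it is preferred
    and the GLB tessellation step is skipped entirely.
    """
    if usd_render_path:
        logger.info("Using USD master: %s", usd_render_path)
        return ("usd", "USD → Blender", False)
    # No USD master was found, so the STEP → GLB fallback is required.
    logger.info("No USD master, using GLB path")
    return ("glb", "STEP → GLB → Blender", True)
```

A caller would start the export_step_to_gltf subprocess only when the third element of the result is True.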

[ ] Task 3: Enable Blender persistent data for animations

File: render-worker/scripts/turntable_render.py
What: Add scene.render.use_persistent_data = True before rendering turntable frames. This keeps the BVH acceleration structure in memory between frames, avoiding rebuild for each of the 12-24 frames.
Acceptance gate: Turntable renders of complex products are 20-30% faster
Dependencies: None
Risk: Low — Blender 5.0 supports this; increases VRAM usage slightly
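
A sketch of where the setting would sit relative to the frame loop. The function name and the render_frame callback are placeholders for whatever turntable_render.py actually does per frame; scene is expected to be a Blender scene such as bpy.context.scene, and use_persistent_data and frame_set are the only Blender attributes used.

```python
def render_turntable(scene, frame_count, render_frame):
    """Render frame_count frames while keeping render data between frames.

    Hypothetical wrapper: `render_frame(scene, frame)` stands in for the
    script's real per-frame work (camera orbit or object rotation).
    """
    # Set once, before the loop, so Cycles can keep render data such as the
    # BVH in memory instead of rebuilding it for every frame.
    scene.render.use_persistent_data = True
    for frame in range(frame_count):
        scene.frame_set(frame)
        render_frame(scene, frame)
```

Because the flag is set on the scene rather than per frame, it applies to either rotation mode without further changes.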

[ ] Task 4: Dual render queue for light/heavy workloads

Files:
- docker-compose.yml — add second render-worker service for light tasks
- backend/app/domains/pipeline/tasks/render_thumbnail.py — route thumbnails to light queue
- backend/app/domains/pipeline/tasks/render_order_line.py — route based on resolution
What: Split asset_pipeline into two queues:
- asset_pipeline — heavy renders (2048x2048, turntables): concurrency=1
- asset_pipeline_light — thumbnails and small stills (<=1024): concurrency=2
- Route based on output resolution or task type
Acceptance gate: Thumbnail generation doesn't block HQ renders; 2 thumbnails render concurrently
Dependencies: Task 1 (lower samples for light queue makes concurrent rendering safer)
Risk: Medium — VRAM contention if both workers render simultaneously. Mitigated by thumbnails being small (512x512, 64 samples = minimal VRAM)
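
The routing rule could be expressed as a small pure function, sketched below. The queue names come from this plan; the function name, its flags, and the choice to test the larger dimension against 1024 are assumptions.

```python
HEAVY_QUEUE = "asset_pipeline"        # 2048x2048 stills and turntables
LIGHT_QUEUE = "asset_pipeline_light"  # thumbnails and small stills
LIGHT_MAX_DIM = 1024


def choose_render_queue(width, height, is_thumbnail=False, is_turntable=False):
    """Pick the queue for a render job (hypothetical helper)."""
    # Turntables render many frames, so they always go to the heavy queue.
    if is_turntable:
        return HEAVY_QUEUE
    # Thumbnail tasks are light by definition.
    if is_thumbnail:
        return LIGHT_QUEUE
    # Other stills are routed by size: both dimensions must fit the limit.
    if max(width, height) <= LIGHT_MAX_DIM:
        return LIGHT_QUEUE
    return HEAVY_QUEUE
```

The returned name would then be handed to whatever mechanism dispatches the task, so the routing decision stays testable without a running worker.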

[ ] Task 5: Skip re-tessellation when GLB already exists

File: backend/app/services/render_blender.py
What: In render_still(), the STEP→GLB tessellation runs every time. Cache the GLB file per CAD file (already stored as gltf_geometry MediaAsset). Before tessellating, check if a GLB MediaAsset exists for this cad_file_id and reuse it.
Also: backend/app/domains/pipeline/tasks/render_order_line.py — pass the existing GLB path to the render service when available
Acceptance gate: Second render of same product skips the 7-11s tessellation step; GLB is reused from MediaAsset
Dependencies: Task 2 (USD path is preferred; this is fallback for products without USD)
Risk: Low — GLB is deterministic per CAD file; if the CAD file changes, a new GLB is generated
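
One way to structure the reuse check, as a sketch only: find_cached_glb and tessellate are hypothetical callables standing in for the MediaAsset lookup and the existing STEP to GLB conversion, neither of which is shown in this plan.

```python
from pathlib import Path


def resolve_glb(cad_file_id, find_cached_glb, tessellate):
    """Return (glb_path, reused) for the given CAD file.

    Hypothetical helper:
    - find_cached_glb(cad_file_id) returns a stored GLB path or None.
    - tessellate(cad_file_id) runs the conversion and returns the new path.
    """
    cached = find_cached_glb(cad_file_id)
    # Reuse the cached file only if the record points at a file that exists.
    if cached is not None and Path(cached).is_file():
        return Path(cached), True
    # Otherwise fall back to the tessellation step.
    return Path(tessellate(cad_file_id)), False
```

Checking that the file is actually on disk guards against a stale asset record; in that case the sketch simply tessellates again.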

[ ] Task 6: Output format optimization (WebP for stills)

File: render-worker/scripts/_blender_scene_setup.py (or blender_render.py)
What: After Blender renders a PNG, optionally convert to WebP for 50-70% smaller files. Add a webp output format option to OutputType. When selected, render as PNG then convert via Pillow.
Also: backend/app/services/render_blender.py — add post-render WebP conversion
Acceptance gate: WebP output type produces smaller files with no visible quality loss
Dependencies: None
Risk: Low — WebP is widely supported; PNG is kept as default
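
A minimal sketch of the post-render conversion with Pillow. The function names and the default quality of 90 are assumptions; the quality setting that gives no visible loss would need to be checked against real renders.

```python
from pathlib import Path


def webp_path_for(png_path):
    """Return the .webp path that sits next to the given PNG."""
    return Path(png_path).with_suffix(".webp")


def convert_png_to_webp(png_path, quality=90, lossless=False):
    """Convert a rendered PNG to WebP and return the new path.

    Hypothetical helper; the original PNG is left in place.
    """
    # Pillow is imported here so the module still loads where it is absent.
    from PIL import Image

    target = webp_path_for(png_path)
    with Image.open(png_path) as img:
        # Pillow's WebP writer accepts `quality` and `lossless` options.
        img.save(target, "WEBP", quality=quality, lossless=lossless)
    return target
```

Leaving the PNG untouched keeps it available as the default output and makes the WebP step easy to switch off per output type.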

Migration Check

No — no database changes needed. All optimizations are in the render pipeline and Docker config.

Order Recommendation

1. Task 1 (sample scaling): simple, immediate impact
2. Task 2 (USD preference): audit + small code change
3. Task 3 (persistent data): one-liner in turntable script
4. Task 5 (GLB caching): avoids redundant tessellation
5. Task 4 (dual queue): architecture change, needs testing
6. Task 6 (WebP): new feature, lowest priority

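Tasks 1-3 have no dependencies on one another and can be done in parallel. Tasks 1 and 2 both modify render_order_line.py, so edits to that file need to be coordinated; Task 3 touches a separate file.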
