Root cause: render-worker and render-worker-light shared the same GPU,
causing contention. Complex TRB renders slowed from 17s to 36s (about 2x).
Changes:
- Thumbnails back to asset_pipeline queue (not asset_pipeline_light)
- Dispatch routing always uses asset_pipeline (no queue splitting)
- render-worker-light gated behind the "multi-gpu" profile; it only starts with:
    docker compose --profile multi-gpu up -d
- For single-GPU setups: all rendering is sequential on one worker
The dual queue approach is correct for multi-GPU machines where each
worker gets its own GPU. On single-GPU, serial execution is faster
than concurrent GPU contention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Task 4: Dual render queue
- render-worker: heavy (asset_pipeline, concurrency=1) — HQ 2048x2048, animations
- render-worker-light: light (asset_pipeline_light, concurrency=2) — thumbnails, <=1024
- Thumbnails routed to light queue automatically
- Order line renders routed by resolution at dispatch time
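A minimal sketch of the dispatch-time routing described in Task 4, not the project's actual dispatcher. The queue names and the 1024 limit come from the list above; the function name, the flags, and applying the limit to the longer edge are assumptions made for illustration.

```python
HEAVY_QUEUE = "asset_pipeline"        # render-worker, concurrency=1
LIGHT_QUEUE = "asset_pipeline_light"  # render-worker-light, concurrency=2
LIGHT_MAX_EDGE = 1024                 # px; assumed to apply to the longer edge


def pick_render_queue(
    width: int,
    height: int,
    *,
    is_thumbnail: bool = False,
    is_animation: bool = False,
) -> str:
    """Return the queue name for a render job (hypothetical helper)."""
    if is_animation:
        # Animations are listed as heavy work regardless of resolution.
        return HEAVY_QUEUE
    if is_thumbnail:
        # Thumbnails always go to the light queue.
        return LIGHT_QUEUE
    if max(width, height) <= LIGHT_MAX_EDGE:
        return LIGHT_QUEUE
    return HEAVY_QUEUE
```

With Celery, the returned name would typically be passed as the queue argument of apply_async.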
Task 5: GLB caching (skip re-tessellation)
- Before tessellating, check if gltf_geometry MediaAsset exists for the cad_file_id
- If found, copy to expected path — render_blender.py finds it and skips tessellation
- Saves 7-11s per re-render of the same product
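The cache check in Task 5 could look roughly like the sketch below. It assumes the caller has already resolved the gltf_geometry MediaAsset for the cad_file_id to a file path (or None on a miss); the function name and parameters are hypothetical, and the database lookup is left out.

```python
import shutil
from pathlib import Path
from typing import Optional


def reuse_cached_glb(cached_glb: Optional[Path], expected_glb: Path) -> bool:
    """Copy a previously tessellated GLB to the path the render script checks.

    Returns True on a cache hit (file copied), False when tessellation
    still has to run.
    """
    if cached_glb is None or not cached_glb.is_file():
        return False
    # Make sure the target directory exists before copying.
    expected_glb.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(cached_glb, expected_glb)
    return True
```

Copying rather than symlinking keeps the render step unaware of the cache: it only sees a GLB at the expected path.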
Task 6: WebP output format
- New 'webp' option in output_format (OutputType admin)
- Blender renders PNG intermediate, Pillow converts to WebP (quality=90, method=4)
- 50-70% smaller files with no visible quality loss
- Correct MIME type (image/webp) in MediaAsset
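A sketch of the PNG-to-WebP step from Task 6, assuming Pillow is installed in the worker image. The quality and method values are the ones listed above; the function and constant names are hypothetical.

```python
from pathlib import Path

WEBP_SAVE_OPTIONS = {"quality": 90, "method": 4}  # values from the list above
WEBP_MIME_TYPE = "image/webp"


def webp_path_for(png_path: Path) -> Path:
    """Return the WebP path next to the PNG intermediate."""
    return png_path.with_suffix(".webp")


def convert_png_to_webp(png_path: Path) -> Path:
    """Convert Blender's PNG intermediate to WebP and return the new path."""
    # Imported lazily so the path helper stays usable without Pillow.
    from PIL import Image

    out_path = webp_path_for(png_path)
    with Image.open(png_path) as img:
        img.save(out_path, format="WEBP", **WEBP_SAVE_OPTIONS)
    return out_path
```

In Pillow's WebP writer, method ranges from 0 (fastest) to 6 (slowest, smallest output), so 4 is a middle setting.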
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The OptiX cache was mounted at /root/.nv but NVIDIA writes to
/var/tmp/OptixCache_root/optix7cache.db (28MB). Fixed volume mount.
Before: first render after container restart = 181s (OptiX recompilation)
After: first render after container restart = 20s (cached kernels)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The thumbnail_rendering queue handles far more than thumbnails: OCC tessellation, USD master
generation, GLB production, order line renders, and workflow renders.
asset_pipeline better reflects its role as the render-worker's primary queue.
Updated all references in: task decorators, celery_app.py, beat_tasks.py,
docker-compose.yml worker command, worker.py MONITORED_QUEUES, admin.py,
CLAUDE.md, LEARNINGS.md, Dockerfile, helpTexts.ts, test files,
and all .claude/commands/*.md skill files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mount named volume optix-cache:/root/.nv so the OptiX ComputeCache
survives docker compose rebuild. Without this, every rebuild wiped the
cached compiled kernels, causing the first render of any complex
scene (~175 parts) to take 130–150s instead of 22s while OptiX
recompiles its kernels from scratch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- gpu_probe.py: Blender script that probes OPTIX/CUDA/HIP/ONEAPI and
exits 1 on no GPU — used at startup + on-demand from Admin UI
- blender_render.py, still_render.py, turntable_render.py: emit
RENDER_DEVICE_USED: engine=CYCLES device=GPU|CPU compute_type=...
after GPU activation; exit 2 when CYCLES_DEVICE=gpu but rendering fell back to CPU
- render_blender.py: parse RENDER_DEVICE_USED token into render_log
(device_used, compute_type, gpu_fallback); handle exit code 2 as
explicit GPU strict-mode failure
- check_version.py: check_gpu() runs gpu_probe.py at container startup;
CYCLES_DEVICE=gpu aborts startup if no GPU found
- docker-compose.yml: CYCLES_DEVICE=${CYCLES_DEVICE:-auto} env var
- gpu_tasks.py: probe_gpu Celery task on thumbnail_rendering queue;
saves result to system_settings.gpu_probe_last_result; beat every 30min
- worker.py: POST /probe/gpu (trigger) + GET /probe/gpu/result (last result)
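The token parsing described for render_blender.py might be sketched as follows. The token layout and exit code 2 come from the list above; the function names are hypothetical, and treating gpu_fallback as "a non-CPU device was requested but CPU was used" is an assumption.

```python
import re
from typing import Optional

TOKEN_RE = re.compile(r"^RENDER_DEVICE_USED:\s*(?P<fields>.+)$", re.MULTILINE)
GPU_STRICT_EXIT_CODE = 2  # CYCLES_DEVICE=gpu was set but the render used CPU


def parse_render_device(stdout: str, requested_device: str = "auto") -> Optional[dict]:
    """Extract device info from Blender stdout; None if the token is absent."""
    match = TOKEN_RE.search(stdout)
    if match is None:
        return None
    # Split "engine=CYCLES device=GPU compute_type=OPTIX" into key/value pairs.
    fields = dict(
        pair.split("=", 1) for pair in match.group("fields").split() if "=" in pair
    )
    device = fields.get("device", "").upper()
    return {
        "device_used": device,
        "compute_type": fields.get("compute_type"),
        # Assumed definition: anything other than an explicit CPU request
        # that ends up on CPU counts as a fallback.
        "gpu_fallback": requested_device.lower() != "cpu" and device == "CPU",
    }


def is_gpu_strict_failure(exit_code: int) -> bool:
    """True when the render script signalled a strict-mode GPU failure."""
    return exit_code == GPU_STRICT_EXIT_CODE
```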
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.yml: change render-worker build context from ./render-worker
to . (project root) so pyproject.toml is accessible; update dockerfile path
- render-worker/Dockerfile: update COPY paths for new build context;
install Python 3.11 via deadsnakes PPA (Ubuntu 22.04 ships 3.10, which
fails the >=3.11 requirement in pyproject.toml)
- 040_media_assets.py: rewrite upgrade() with raw idempotent SQL (CREATE TYPE
inside DO $$ EXCEPTION WHEN duplicate_object $$; CREATE TABLE IF NOT EXISTS;
CREATE INDEX IF NOT EXISTS) to handle pre-existing enum from partial runs
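The idempotent pattern used in the migration can be sketched as below, assuming 040_media_assets.py is an Alembic migration. The enum, table, column, and index names are placeholders rather than the real schema; only the three SQL constructs are taken from the description above.

```python
# Placeholder names throughout; the real enum values and columns differ.
CREATE_ENUM_SQL = """
DO $$
BEGIN
    CREATE TYPE media_asset_type AS ENUM ('gltf_geometry');
EXCEPTION
    WHEN duplicate_object THEN NULL;  -- enum left over from a partial run
END
$$;
"""

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS media_assets (
    id UUID PRIMARY KEY,
    cad_file_id UUID,
    asset_type media_asset_type NOT NULL
);
"""

CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS ix_media_assets_cad_file_id
    ON media_assets (cad_file_id);
"""

IDEMPOTENT_STATEMENTS = (CREATE_ENUM_SQL, CREATE_TABLE_SQL, CREATE_INDEX_SQL)


def upgrade() -> None:
    """Run each statement; every one is safe to repeat after a partial run."""
    # Imported here because alembic.op only works inside a migration context.
    from alembic import op

    for statement in IDEMPOTENT_STATEMENTS:
        op.execute(statement)
```

PostgreSQL has no CREATE TYPE IF NOT EXISTS, which is why the enum needs the DO block while the table and index can use IF NOT EXISTS directly.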
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>