fix: deduplicate GLB/USD generation with Redis locks + review fixes
- Add per-file Redis SET NX EX 1800 locks to generate_gltf_geometry_task and generate_usd_master_task. Concurrent duplicates (e.g. a double-click on the bulk action buttons) now log a warning and return immediately instead of running two expensive OCC tessellation subprocesses on the same file
- Fix eng.dispose() called inside the with Session() block in the cache-hit path of both tasks; moved to after the with block exits (Tasks 3+4 from plan)
- Add cad.updated_at = datetime.utcnow() in save_manual_material_overrides (was missing vs parallel save_part_materials endpoint)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,108 +1,150 @@
|
||||
# Plan: P2 USD Foundation — Commit & Verify
|
||||
# Plan: Deduplication for GLB/USD Generation + Two Review Fixes
|
||||
|
||||
## Context
|
||||
|
||||
All five P2 milestones are already implemented in the working tree as uncommitted changes.
|
||||
The task now is to apply the DB migrations, commit the work, and verify the stack runs.
|
||||
Two problems to solve:
|
||||
|
||||
### Milestone status (assessed 2026-03-12)
|
||||
**1. Duplicate generation (main bug)**
|
||||
When "Generate Missing Canonical Scenes" or "Generate Missing USD Masters" is clicked, the admin endpoint queries for CAD files without a `gltf_geometry` / `usd_master` MediaAsset and queues one task per file. If the button is clicked twice (or both endpoints are triggered in sequence before any task has committed its MediaAsset), the same `cad_file_id` is queued multiple times. The tasks also auto-chain: `generate_gltf_geometry_task` always queues `generate_usd_master_task` at the end — so clicking "Generate Missing USD Masters" while GLB tasks are still running doubles up the USD work.
|
||||
|
||||
| Milestone | Status | Key files |
|---|---|---|
| M1: `export_step_to_usd.py` with `schaeffler:partKey` | ✅ DONE | `render-worker/scripts/export_step_to_usd.py` (631 lines) |
| M2: `usd_master` MediaAsset + migrations 060–062 + Celery task | ✅ DONE | migrations 060/061/062, `generate_usd_master_task` in `export_glb.py` |
| M3: `GET /api/cad/{id}/scene-manifest` | ✅ DONE | `part_key_service.py`, `SceneManifest` schema, endpoint in `cad.py` |
| M4: `PUT /api/cad/{id}/manual-material-overrides` | ✅ DONE | New endpoint pair in `cad.py`, `saveManualOverrides` in `cad.ts` |
| M5: ThreeDViewer uses partKey, survives reload | ✅ DONE | `partKeyMap` in GLB extras, `effectiveMaterials` merge, server-side persistence |
|
||||
The existing cache check (`step_file_hash`) only short-circuits tessellation when a MediaAsset already exists — it does not prevent two concurrent tasks from both starting the expensive subprocess on the same file. Two processes writing to `_geometry.glb` simultaneously causes corruption / wasted compute.
|
||||
|
||||
## Affected Files (all uncommitted — working tree only)
|
||||
**Solution**: Apply the same Redis `SET NX EX` dedup lock that `process_step_file` uses (lock key `step_processing_lock:{id}`, released in `finally`). Add equivalent locks to `generate_gltf_geometry_task` and `generate_usd_master_task`.
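The lock pattern described above can be sketched as a small context manager. This is an illustration under stated assumptions, not the code in `export_glb.py`: `FakeRedis`, `dedup_lock`, and `run_task` are hypothetical names, and `FakeRedis` is an in-memory stand-in that only imitates the `set(..., nx=True, ex=...)` and `delete` calls of a Redis client so the example runs without a server.

```python
from contextlib import contextmanager


class FakeRedis:
    """Hypothetical in-memory stand-in for the two client calls the pattern needs."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        # Like SET NX: refuse to overwrite an existing key and return None.
        # The TTL (ex) is accepted but not simulated here.
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


@contextmanager
def dedup_lock(client, key, ttl_seconds=1800):
    """Yield True if this caller acquired the lock, False if it is already held."""
    acquired = bool(client.set(key, "1", nx=True, ex=ttl_seconds))
    try:
        yield acquired
    finally:
        # Only the owner releases. A skipped duplicate must never delete
        # the lock that belongs to the task still running.
        if acquired:
            client.delete(key)


def run_task(client, cad_file_id, work):
    """Run work() unless another task for the same file is in flight."""
    with dedup_lock(client, f"glb_geometry_lock:{cad_file_id}") as owned:
        if not owned:
            return {"skipped": True}
        return work()
```

The design point worth noting is that the release is conditional on ownership, which mirrors placing the early `return` for duplicates before the `try: ... finally:` in the plan's snippets.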
|
||||
|
||||
**Backend**
|
||||
- `backend/alembic/versions/060_usd_master_asset_type.py` — new migration
|
||||
- `backend/alembic/versions/061_material_assignment_layers.py` — new migration
|
||||
- `backend/alembic/versions/062_rename_tessellation_settings.py` — new migration
|
||||
- `backend/app/domains/media/models.py` — `MediaAssetType.usd_master` added
|
||||
- `backend/app/domains/products/models.py` — 3 new JSONB columns on `CadFile`
|
||||
- `backend/app/domains/products/schemas.py` — `SceneManifest`, `PartEntry` Pydantic models
|
||||
- `backend/app/domains/pipeline/tasks/export_glb.py` — `generate_usd_master_task` + auto-chain
|
||||
- `backend/app/domains/pipeline/tasks/extract_metadata.py` — minor update
|
||||
- `backend/app/domains/pipeline/tasks/render_thumbnail.py` — minor update
|
||||
- `backend/app/domains/pipeline/tasks/render_order_line.py` — minor update
|
||||
- `backend/app/api/routers/cad.py` — scene-manifest + manual-material-overrides endpoints
|
||||
- `backend/app/api/routers/admin.py` — generate-missing-usd-masters + generate-missing-canonical-scenes buttons
|
||||
- `backend/app/services/part_key_service.py` — new file: `build_scene_manifest()`, `generate_part_key()`
|
||||
- `backend/app/core/config_service.py` — minor update
|
||||
- `backend/app/core/tenant_context.py` — new file
|
||||
- `backend/app/tasks/step_tasks.py` — re-exports `generate_usd_master_task`
|
||||
**2. Review fix A — `eng.dispose()` inside `with Session` block**
|
||||
`export_glb.py` line 89: `eng.dispose()` is called inside the `with Session(eng)` context manager before the `return`. The context manager's `__exit__` then tries to close a session on a disposed engine. Safe in practice (no exception raised) but fragile and misleading. Move `eng.dispose()` to after the `with` block exits.
|
||||
|
||||
**Render worker**
|
||||
- `render-worker/scripts/export_step_to_usd.py` — new file: full USD exporter
|
||||
- `render-worker/scripts/export_step_to_gltf.py` — injects `partKeyMap` into GLB extras
|
||||
- `render-worker/scripts/still_render.py` — USD path support
|
||||
- `render-worker/scripts/turntable_render.py` — USD path support
|
||||
- `render-worker/Dockerfile` — `usd-core>=24.11` added
|
||||
**3. Review fix B — `save_manual_material_overrides` missing `updated_at`**
|
||||
`cad.py` line 537: `cad.manual_material_overrides = body.overrides` is committed without updating `cad.updated_at`. The parallel endpoint `save_part_materials` (line 430) does call `cad.updated_at = datetime.utcnow()`. Add the same line to `save_manual_material_overrides`.
|
||||
|
||||
**Frontend**
|
||||
- `frontend/src/api/cad.ts` — `getManualOverrides()`, `saveManualOverrides()`
|
||||
- `frontend/src/api/media.ts` — `usd_master` type added
|
||||
- `frontend/src/api/sceneManifest.ts` — new file: `SceneManifest`, `fetchSceneManifest()`
|
||||
- `frontend/src/components/cad/ThreeDViewer.tsx` — `partKeyMap`, `effectiveMaterials`, reconciliation panel
|
||||
- `frontend/src/components/cad/MaterialPanel.tsx` — dual-path save, provenance badge
|
||||
- `frontend/src/pages/Admin.tsx` — USD master bulk action buttons
|
||||
- `frontend/src/pages/ProductDetail.tsx` — `usd_master` row in asset table
|
||||
- `frontend/src/pages/Orders.tsx` — minor update
|
||||
## Affected Files
|
||||
|
||||
| File | Change |
|---|---|
| `backend/app/domains/pipeline/tasks/export_glb.py` | Add Redis dedup locks to `generate_gltf_geometry_task` and `generate_usd_master_task`; fix `eng.dispose()` placement |
| `backend/app/api/routers/cad.py` | Add `cad.updated_at = datetime.utcnow()` in `save_manual_material_overrides` |
|
||||
|
||||
## Tasks (in order)
|
||||
|
||||
### [ ] Task 1: Apply migrations 060–062
|
||||
- **What**: Run `docker compose exec backend alembic upgrade head` to apply the three pending migrations
|
||||
- **Acceptance gate**: `docker compose exec backend alembic current` shows `062` (or higher) as current
|
||||
### [x] Task 1: Add Redis dedup lock to `generate_gltf_geometry_task`
|
||||
|
||||
- **File**: `backend/app/domains/pipeline/tasks/export_glb.py`
|
||||
- **What**: At the top of `generate_gltf_geometry_task`, after `pl.step_start(...)`, acquire a Redis lock using the same pattern as `extract_metadata.py`:
|
||||
|
||||
```python
import redis as _redis_lib

_lock_key = f"glb_geometry_lock:{cad_file_id}"
_r = _redis_lib.from_url(app_settings.redis_url)
_acquired = _r.set(_lock_key, "1", nx=True, ex=1800)  # 30-min TTL
if not _acquired:
    logger.warning("generate_gltf_geometry_task: %s already in-flight — skipping duplicate", cad_file_id)
    pl.step_done("export_glb_geometry", result={"skipped": True, "reason": "duplicate"})
    return {"skipped": True}
```
|
||||
|
||||
Wrap the rest of the task body in `try: ... finally: _r.delete(_lock_key)`.
|
||||
|
||||
Note: `app_settings` is already imported inside the function. Import `redis` locally inside the function, before acquiring the lock, as `import redis as _redis_lib` (the same pattern as `extract_metadata.py`, which imports it locally).
|
||||
|
||||
- **Acceptance gate**: Trigger "Generate Missing Canonical Scenes" twice in quick succession — worker logs show `"already in-flight — skipping duplicate"` for the second batch; no file ends up being tessellated twice.
|
||||
- **Dependencies**: none
|
||||
- **Risk**: Low — each migration is additive (ADD VALUE, ADD COLUMN, UPDATE). Check for phantom drops before running.
|
||||
- **Risk**: Low — same pattern as `process_step_file`, TTL 30min covers worst-case tessellation time.
|
||||
|
||||
### [ ] Task 2: TypeScript check
|
||||
- **What**: Run `docker compose exec frontend npx tsc --noEmit` to verify no type errors in the frontend changes
|
||||
- **Acceptance gate**: Zero TypeScript errors
|
||||
- **Dependencies**: none (frontend hot-reload, no rebuild needed)
|
||||
- **Risk**: Low
|
||||
### [x] Task 2: Add Redis dedup lock to `generate_usd_master_task`
|
||||
|
||||
### [ ] Task 3: Rebuild and restart backend + render-worker
|
||||
- **What**: `docker compose up -d --build backend worker render-worker beat` — picks up new Dockerfile (usd-core), new tasks, and new migrations
|
||||
- **Acceptance gate**: `docker compose logs backend | grep "Application startup complete"` and `docker compose exec render-worker python3 -c "from pxr import Usd; print(Usd.GetVersion())"` both succeed
|
||||
- **File**: `backend/app/domains/pipeline/tasks/export_glb.py`
|
||||
- **What**: Same pattern at the top of `generate_usd_master_task`, after `pl.step_start(...)`:
|
||||
|
||||
```python
import redis as _redis_lib

_lock_key = f"usd_master_lock:{cad_file_id}"
_r = _redis_lib.from_url(app_settings.redis_url)
_acquired = _r.set(_lock_key, "1", nx=True, ex=1800)  # 30-min TTL
if not _acquired:
    logger.warning("generate_usd_master_task: %s already in-flight — skipping duplicate", cad_file_id)
    pl.step_done("usd_master", result={"skipped": True, "reason": "duplicate"})
    return {"skipped": True}
```
|
||||
|
||||
Wrap the rest of the function body in `try: ... finally: _r.delete(_lock_key)`.
|
||||
|
||||
- **Acceptance gate**: Trigger "Generate Missing USD Masters" while GLB tasks are still running — worker logs show USD tasks skipping duplicates instead of starting a second tessellation.
|
||||
- **Dependencies**: Task 1
|
||||
- **Risk**: Medium — `usd-core` pip install adds build time; if it fails the render-worker won't start
|
||||
|
||||
### [ ] Task 4: Commit all P2 work
|
||||
- **What**: Stage and commit all uncommitted P2 files in a single `feat(P2)` commit
|
||||
- **Acceptance gate**: `git status` shows clean working tree (except LEARNINGS.md and review-report.md which can be included)
|
||||
- **Dependencies**: Tasks 1–3 (verify before committing)
|
||||
- **Risk**: Low
|
||||
|
||||
### [ ] Task 5: Smoke-test end-to-end via Admin panel
|
||||
- **What**: In Admin, run "Generate Missing Canonical Scenes" to regenerate GLBs with `partKeyMap` and auto-chain USD masters for existing CAD files
|
||||
- **Acceptance gate**:
|
||||
- `GET /api/cad/{id}/scene-manifest` returns `{"parts": [...], ...}` for a processed CadFile
|
||||
- ThreeDViewer loads, click a part → MaterialPanel shows assignment provenance
|
||||
- Assign a material → reload page → assignment still present
|
||||
- **Dependencies**: Task 3
|
||||
- **Risk**: Medium — existing CAD files need backfill; may take minutes for bulk jobs to complete
|
||||
### [x] Task 3: Fix `eng.dispose()` placement in cache-hit early-return path
|
||||
|
||||
- **File**: `backend/app/domains/pipeline/tasks/export_glb.py`
|
||||
- **What**: In `generate_gltf_geometry_task`, the cache-hit path (lines 86–95) calls `eng.dispose()` at line 89 while still inside the `with Session(eng)` block, then returns. Move `eng.dispose()` to *after* the `with` block exits.
|
||||
|
||||
Current (broken):
|
||||
```python
with Session(eng) as session:
    ...
    if existing_geo:
        pl.step_done(...)
        eng.dispose()  # ← inside with block
        try:
            generate_usd_master_task.delay(cad_file_id)
            ...
        return {"cached": True, ...}
eng.dispose()  # normal path
```
|
||||
|
||||
Fixed: remove the `eng.dispose()` at line 89, and move the `generate_usd_master_task.delay()` + `return` to after the `with` block:
|
||||
|
||||
```python
_cache_hit_asset_id: str | None = None
with Session(eng) as session:
    ...
    if existing_geo:
        logger.info("[CACHE] hash match — skipping geometry GLB tessellation for %s", cad_file_id)
        pl.step_done("export_glb_geometry", result={"cached": True, "asset_id": str(existing_geo.id)})
        _cache_hit_asset_id = str(existing_geo.id)
eng.dispose()

if _cache_hit_asset_id is not None:
    try:
        generate_usd_master_task.delay(cad_file_id)
    except Exception:
        logger.debug("Could not queue generate_usd_master_task from cache-hit path (non-fatal)")
    return {"cached": True, "asset_id": _cache_hit_asset_id}

# ... rest of function (tessellation path)
```
|
||||
|
||||
- **Acceptance gate**: `docker compose exec render-worker python3 -c "import app"` (no import errors); cache-hit path still skips tessellation and chains USD master.
|
||||
- **Dependencies**: none
|
||||
- **Risk**: Low — pure refactor, no logic change.
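The ordering that Task 3 aims for (session closes first, engine is disposed second, the early return happens last) can be checked in isolation with stand-in objects. The sketch below is an assumption-laden illustration: `FakeEngine`, `FakeSession`, `cache_hit_path`, and the `events` list are hypothetical and only record call order; they are not SQLAlchemy classes and not the code in `export_glb.py`.

```python
events = []


class FakeEngine:
    """Hypothetical stand-in that only records when dispose() is called."""

    def dispose(self):
        events.append("engine.dispose")


class FakeSession:
    """Hypothetical stand-in context manager that records open and close."""

    def __init__(self, engine):
        self.engine = engine

    def __enter__(self):
        events.append("session.open")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append("session.close")
        return False  # do not swallow exceptions


def cache_hit_path(engine, cached_asset_id):
    """Remember the cache hit inside the block, act on it after the block."""
    hit_id = None
    with FakeSession(engine):
        if cached_asset_id is not None:
            hit_id = cached_asset_id  # no return and no dispose in here
    engine.dispose()  # runs after the session has been closed
    if hit_id is not None:
        return {"cached": True, "asset_id": hit_id}
    return {"cached": False}
```

Carrying the result out of the `with` block in a local variable is what lets a single `dispose()` call serve both the cache-hit and the normal path.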
|
||||
|
||||
### [x] Task 4: Add `updated_at` in `save_manual_material_overrides`
|
||||
|
||||
- **File**: `backend/app/api/routers/cad.py`
|
||||
- **What**: In `save_manual_material_overrides` (around line 537), add `cad.updated_at = datetime.utcnow()` before `await db.commit()`:
|
||||
|
||||
```python
cad.manual_material_overrides = body.overrides
cad.updated_at = datetime.utcnow()  # ← add this line
await db.commit()
```
|
||||
|
||||
- **Acceptance gate**: `PUT /api/cad/{id}/manual-material-overrides` → `GET /api/cad/{id}` shows updated `updated_at` timestamp.
|
||||
- **Dependencies**: none
|
||||
- **Risk**: None
|
||||
|
||||
## Migration Check
|
||||
|
||||
Three migrations are pending in the working tree:
|
||||
- `060_usd_master_asset_type.py` — additive enum value
|
||||
- `061_material_assignment_layers.py` — additive JSONB columns
|
||||
- `062_rename_tessellation_settings.py` — UPDATE on `system_settings` rows (already checked: migration 062 was applied per review-report)
|
||||
|
||||
**Before running**: read each migration file to confirm no unexpected DROP statements.
|
||||
No migration required — no new columns or tables.
|
||||
|
||||
## Order Recommendation
|
||||
|
||||
Migrations → TypeScript check → Rebuild → Commit → Smoke test
|
||||
Tasks 3 and 4 are independent cleanup items — implement first (low risk).
|
||||
Tasks 1 and 2 are the core dedup fix — implement after.
|
||||
|
||||
Order: Task 4 → Task 3 → Task 1 → Task 2
|
||||
|
||||
## Risks / Open Questions
|
||||
|
||||
- `usd-core` build in Docker may be slow (first build) — expected, not a problem
|
||||
- Migration 062 may already be applied (review noted "verified by 0-row SELECT") — `alembic upgrade head` is idempotent if so
|
||||
- Existing CAD files need backfill for `partKeyMap` in GLB extras — handled by "Generate Missing Canonical Scenes" bulk action
|
||||
- `resolvePartKey()` falls back to identity (raw mesh name) for GLBs generated before this change — graceful degradation, not a blocking issue
|
||||
- Redis TTL of 30 minutes: if a task crashes hard (OOM, SIGKILL) without running `finally`, the lock stays for 30 minutes. This is the same tradeoff as `process_step_file`. Acceptable.
|
||||
- `generate_usd_master_task` is also queued by the cache-hit path in `generate_gltf_geometry_task` — that chained call will be deduplicated by the lock too if the primary USD task is already running. Correct behaviour.
|
||||
- The auto-chain from `generate_gltf_geometry_task → generate_usd_master_task` is still desirable (keeps canonical scene up-to-date after a fresh GLB). The lock prevents the *duplicate*, not the *legitimate* chain.
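Related to the TTL bullet above: the converse case, a task that runs longer than the TTL, is not discussed in the plan, but it is where an unconditional `delete` in `finally` could remove a lock that a later task has since acquired. One common mitigation is to store a unique token and release only if the token still matches. The sketch below is a hypothetical illustration with an in-memory stand-in (`FakeRedis`, `acquire`, `release`, and `expire_now` are all invented for the example); against a real Redis server the compare-and-delete would need to be atomic, for instance via a server-side script, which this sketch does not attempt.

```python
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in; expire_now() simulates a TTL running out."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self._data:
            return None
        self._data[key] = value
        return True

    def get(self, key):
        # Note: a real client may return bytes rather than str.
        return self._data.get(key)

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0

    def expire_now(self, key):
        self._data.pop(key, None)


def acquire(client, key, ttl_seconds=1800):
    """Return an ownership token on success, or None if the lock is held."""
    token = uuid.uuid4().hex
    if client.set(key, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release(client, key, token):
    """Delete the lock only if it still carries this caller's token."""
    # Not atomic in this sketch: get and delete are two separate steps.
    if client.get(key) == token:
        client.delete(key)
        return True
    return False
```

Whether this extra step is worth it depends on how close real tessellation times come to the 30-minute TTL; the plan treats the simpler pattern as acceptable.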
|
||||
|
||||