71e099305c
- Add per-file Redis SET NX EX 1800 locks to generate_gltf_geometry_task and generate_usd_master_task — concurrent duplicates (e.g. double-click of bulk action buttons) now log a warning and return immediately instead of running two expensive OCC tessellation subprocesses on the same file - Fix eng.dispose() called inside with Session() block in cache-hit path of both tasks — moved to after the with block exits (Tasks 3+4 from plan) - Add cad.updated_at = datetime.utcnow() in save_manual_material_overrides (was missing vs parallel save_part_materials endpoint) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
151 lines
8.0 KiB
Markdown
151 lines
8.0 KiB
Markdown
# Plan: Deduplication for GLB/USD Generation + Two Review Fixes
|
||
|
||
## Context
|
||
|
||
Two problems to solve:
|
||
|
||
**1. Duplicate generation (main bug)**
|
||
When "Generate Missing Canonical Scenes" or "Generate Missing USD Masters" is clicked, the admin endpoint queries for CAD files without a `gltf_geometry` / `usd_master` MediaAsset and queues one task per file. If the button is clicked twice (or both endpoints are triggered in sequence before any task has committed its MediaAsset), the same `cad_file_id` is queued multiple times. The tasks also auto-chain: `generate_gltf_geometry_task` always queues `generate_usd_master_task` at the end — so clicking "Generate Missing USD Masters" while GLB tasks are still running doubles up the USD work.
|
||
|
||
The existing cache check (`step_file_hash`) only short-circuits tessellation when a MediaAsset already exists — it does not prevent two concurrent tasks from both starting the expensive subprocess on the same file. Two processes writing to `_geometry.glb` simultaneously causes corruption / wasted compute.
|
||
|
||
**Solution**: Apply the same Redis `SET NX EX` dedup lock that `process_step_file` uses (lock key `step_processing_lock:{id}`, released in `finally`). Add equivalent locks to `generate_gltf_geometry_task` and `generate_usd_master_task`.
|
||
|
||
**2. Review fix A — `eng.dispose()` inside `with Session` block**
|
||
`export_glb.py` line 89: `eng.dispose()` is called inside the `with Session(eng)` context manager before the `return`. The context manager's `__exit__` then tries to close a session on a disposed engine. Safe in practice (no exception raised) but fragile and misleading. Move `eng.dispose()` to after the `with` block exits.
|
||
|
||
**3. Review fix B — `save_manual_material_overrides` missing `updated_at`**
|
||
`cad.py` line 537: `cad.manual_material_overrides = body.overrides` is committed without updating `cad.updated_at`. The parallel endpoint `save_part_materials` (line 430) does call `cad.updated_at = datetime.utcnow()`. Add the same line to `save_manual_material_overrides`.
|
||
|
||
## Affected Files
|
||
|
||
| File | Change |
|
||
|---|---|
|
||
| `backend/app/domains/pipeline/tasks/export_glb.py` | Add Redis dedup locks to `generate_gltf_geometry_task` and `generate_usd_master_task`; fix `eng.dispose()` placement |
|
||
| `backend/app/api/routers/cad.py` | Add `cad.updated_at = datetime.utcnow()` in `save_manual_material_overrides` |
|
||
|
||
## Tasks (in order)
|
||
|
||
### [x] Task 1: Add Redis dedup locks to `generate_gltf_geometry_task`
|
||
|
||
- **File**: `backend/app/domains/pipeline/tasks/export_glb.py`
|
||
- **What**: At the top of `generate_gltf_geometry_task`, after `pl.step_start(...)`, acquire a Redis lock using the same pattern as `extract_metadata.py`:
|
||
|
||
```python
|
||
import redis as _redis_lib
|
||
_lock_key = f"glb_geometry_lock:{cad_file_id}"
|
||
_r = _redis_lib.from_url(app_settings.redis_url)
|
||
_acquired = _r.set(_lock_key, "1", nx=True, ex=1800) # 30-min TTL
|
||
if not _acquired:
|
||
logger.warning("generate_gltf_geometry_task: %s already in-flight — skipping duplicate", cad_file_id)
|
||
pl.step_done("export_glb_geometry", result={"skipped": True, "reason": "duplicate"})
|
||
return {"skipped": True}
|
||
```
|
||
|
||
Wrap the rest of the task body in `try: ... finally: _r.delete(_lock_key)`.
|
||
|
||
Note: `app_settings` is already imported inside the function. Import `redis` at the top of the `try` block as `import redis as _redis_lib` (same pattern as `extract_metadata.py` which imports it locally).
|
||
|
||
- **Acceptance gate**: Trigger "Generate Missing Canonical Scenes" twice in quick succession — worker logs show `"already in-flight — skipping duplicate"` for the second batch; no file ends up being tessellated twice.
|
||
- **Dependencies**: none
|
||
- **Risk**: Low — same pattern as `process_step_file`, TTL 30min covers worst-case tessellation time.
|
||
|
||
### [x] Task 2: Add Redis dedup lock to `generate_usd_master_task`
|
||
|
||
- **File**: `backend/app/domains/pipeline/tasks/export_glb.py`
|
||
- **What**: Same pattern at the top of `generate_usd_master_task`, after `pl.step_start(...)`:
|
||
|
||
```python
|
||
import redis as _redis_lib
|
||
_lock_key = f"usd_master_lock:{cad_file_id}"
|
||
_r = _redis_lib.from_url(app_settings.redis_url)
|
||
_acquired = _r.set(_lock_key, "1", nx=True, ex=1800) # 30-min TTL
|
||
if not _acquired:
|
||
logger.warning("generate_usd_master_task: %s already in-flight — skipping duplicate", cad_file_id)
|
||
pl.step_done("usd_master", result={"skipped": True, "reason": "duplicate"})
|
||
return {"skipped": True}
|
||
```
|
||
|
||
Wrap the rest of the function body in `try: ... finally: _r.delete(_lock_key)`.
|
||
|
||
- **Acceptance gate**: Trigger "Generate Missing USD Masters" while GLB tasks are still running — worker logs show USD tasks skipping duplicates instead of starting a second tessellation.
|
||
- **Dependencies**: Task 1
|
||
- **Risk**: Low
|
||
|
||
### [x] Task 3: Fix `eng.dispose()` placement in cache-hit early-return path
|
||
|
||
- **File**: `backend/app/domains/pipeline/tasks/export_glb.py`
|
||
- **What**: In `generate_gltf_geometry_task`, the cache-hit path (lines 86–95) calls `eng.dispose()` at line 89 while still inside the `with Session(eng)` block, then returns. Move `eng.dispose()` to *after* the `with` block exits.
|
||
|
||
Current (broken):
|
||
```python
|
||
with Session(eng) as session:
|
||
...
|
||
if existing_geo:
|
||
pl.step_done(...)
|
||
eng.dispose() # ← inside with block
|
||
try:
|
||
generate_usd_master_task.delay(cad_file_id)
|
||
...
|
||
return {"cached": True, ...}
|
||
eng.dispose() # normal path
|
||
```
|
||
|
||
Fixed: remove the `eng.dispose()` at line 89, and move the `generate_usd_master_task.delay()` + `return` to after the `with` block:
|
||
|
||
```python
|
||
_cache_hit_asset_id: str | None = None
|
||
with Session(eng) as session:
|
||
...
|
||
if existing_geo:
|
||
logger.info("[CACHE] hash match — skipping geometry GLB tessellation for %s", cad_file_id)
|
||
pl.step_done("export_glb_geometry", result={"cached": True, "asset_id": str(existing_geo.id)})
|
||
_cache_hit_asset_id = str(existing_geo.id)
|
||
eng.dispose()
|
||
|
||
if _cache_hit_asset_id is not None:
|
||
try:
|
||
generate_usd_master_task.delay(cad_file_id)
|
||
except Exception:
|
||
logger.debug("Could not queue generate_usd_master_task from cache-hit path (non-fatal)")
|
||
return {"cached": True, "asset_id": _cache_hit_asset_id}
|
||
|
||
# ... rest of function (tessellation path)
|
||
```
|
||
|
||
- **Acceptance gate**: `docker compose exec render-worker python3 -c "import app"` (no import errors); cache-hit path still skips tessellation and chains USD master.
|
||
- **Dependencies**: none
|
||
- **Risk**: Low — pure refactor, no logic change.
|
||
|
||
### [x] Task 4: Add `updated_at` in `save_manual_material_overrides`
|
||
|
||
- **File**: `backend/app/api/routers/cad.py`
|
||
- **What**: In `save_manual_material_overrides` (around line 537), add `cad.updated_at = datetime.utcnow()` before `await db.commit()`:
|
||
|
||
```python
|
||
cad.manual_material_overrides = body.overrides
|
||
cad.updated_at = datetime.utcnow() # ← add this line
|
||
await db.commit()
|
||
```
|
||
|
||
- **Acceptance gate**: `PUT /api/cad/{id}/manual-material-overrides` → `GET /api/cad/{id}` shows updated `updated_at` timestamp.
|
||
- **Dependencies**: none
|
||
- **Risk**: None
|
||
|
||
## Migration Check
|
||
|
||
No migration required — no new columns or tables.
|
||
|
||
## Order Recommendation
|
||
|
||
Tasks 3 and 4 are independent cleanup items — implement first (low risk).
|
||
Tasks 1 and 2 are the core dedup fix — implement after.
|
||
|
||
Order: Task 4 → Task 3 → Task 1 → Task 2
|
||
|
||
## Risks / Open Questions
|
||
|
||
- Redis TTL of 30 minutes: if a task crashes hard (OOM, SIGKILL) without running `finally`, the lock stays for 30 minutes. This is the same tradeoff as `process_step_file`. Acceptable.
|
||
- `generate_usd_master_task` is also queued by the cache-hit path in `generate_gltf_geometry_task` — that chained call will be deduplicated by the lock too if the primary USD task is already running. Correct behaviour.
|
||
- The auto-chain from `generate_gltf_geometry_task → generate_usd_master_task` is still desirable (keeps canonical scene up-to-date after a fresh GLB). The lock prevents the *duplicate*, not the *legitimate* chain.
|