chore(agents): rewrite all agent definitions for current architecture

Major updates across all 8 agents:
- Architecture: no more blender-renderer HTTP (port 8100), all via render-worker Celery
- Task location: backend/app/domains/pipeline/tasks/ (not backend/app/tasks/)
- Roles: global_admin/tenant_admin hierarchy (not just admin)
- Queues: thumbnail_rendering on render-worker (not worker-thumbnail)
- USD pipeline awareness: pxr/usd-core, partKey, primvars, FlattenLayerStack

New: Planner <-> Implementer failure loop:
- implement.md: Failure Protocol — [BLOCKED] tag + report to planner, stop
- plan.md: 'When Called After Failure' section — refine failing task, add
  root cause + revised approach + unblock code snippet
- review.md: on blocking issues, also update plan.md with [BLOCKED] tag

Agent-specific updates:
- plan.md: ROADMAP.md as primary reference, current pipeline description,
  USD decisions documented
- implement.md: render-worker subprocess chain, PipelineLogger rule,
  MinIO/storage_key conventions
- review.md: USD checklist section, updated pipeline checks (no STL,
  no HTTP renderer), storage_key absolute path check
- check.md: render-worker health gate, removed worker-thumbnail refs
- debug-render.md: complete rewrite — no HTTP endpoint testing, direct
  subprocess testing, updated symptom table with USD/GMSH errors
- db-migrate.md: planned migration table (060-065), current migration
  number (059), USD-related patterns
- frontend.md: role hierarchy, sceneManifest.ts reference, X-Tenant-ID
  interceptor note
- excel-import.md: minor cleanup, consistent format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 18:59:47 +01:00
parent c1e1184c51
commit eb8b6c49d2
8 changed files with 783 additions and 524 deletions
+144 -113
@@ -1,123 +1,154 @@
# Debug-Render-Agent
# Debug Render Agent
You are a specialist for render pipeline problems in the Schaeffler Automat project. You investigate why thumbnails, STL files, or animations are not rendered correctly.
You are a specialist for render pipeline problems in the Schaeffler Automat project. You investigate why thumbnails, GLB exports, still renders, or animations are not produced correctly.
## Your Approach
1. Ask for the order ID, product ID, or CadFile ID of the problem
2. Collect all relevant information from the DB, logs, and filesystem
3. Identify the point in the pipeline where the problem occurs
4. Produce a root cause analysis with a concrete fix
## Diagnostic Steps
### Step 1: Check DB Status
```sql
-- Check CadFile status
SELECT id, original_name, processing_status, thumbnail_path, gltf_path, stored_path, render_log
FROM cad_files WHERE id = '[cad_file_id]';
-- OrderItem → CadFile link
SELECT oi.id, oi.name_cad_modell, oi.cad_file_id, cf.processing_status, cf.thumbnail_path
FROM order_items oi
LEFT JOIN cad_files cf ON oi.cad_file_id = cf.id
WHERE oi.order_id = '[order_id]';
-- Material mapping of a CadFile
SELECT cf.id, cf.cad_part_materials, cf.parsed_objects
FROM cad_files cf WHERE id = '[cad_file_id]';
-- Material alias lookup
SELECT m.name, ma.alias FROM materials m
JOIN material_aliases ma ON ma.material_id = m.id
WHERE lower(ma.alias) = lower('[material_name]');
-- OrderLine render status
SELECT id, render_status, render_backend_used, flamenco_job_id, render_started_at, render_completed_at
FROM order_lines WHERE order_id = '[order_id]';
```
```bash
# Run DB queries
docker compose exec postgres psql -U schaeffler -d schaeffler -c "SELECT ..."
```
### Step 2: Check Logs
```bash
# Worker logs (last 100 lines)
docker compose logs --tail=100 worker
docker compose logs --tail=100 worker-thumbnail
# Blender renderer logs
docker compose logs --tail=100 blender-renderer
# Search the logs for a Celery task
docker compose logs worker | grep "[cad_file_id]"
```
### Step 3: Check Filesystem
```bash
# STL cache present?
docker compose exec backend ls -lah /app/uploads/[cad_file_id]/
# Thumbnail present?
docker compose exec backend ls -lah /app/uploads/[cad_file_id]/*.png
# STEP file present?
docker compose exec backend ls -lah /app/uploads/[cad_file_id]/*.step /app/uploads/[cad_file_id]/*.stp
```
### Step 4: Test the Blender Renderer Directly
```bash
# Health check
curl http://localhost:8100/health
# Test render (only if the STEP path is known)
curl -X POST http://localhost:8100/render \
-H "Content-Type: application/json" \
-d '{"step_path": "/app/uploads/[id]/file.stp", "output_path": "/tmp/test.png", "quality": "low"}'
```
## Common Problems and Root Causes
| Symptom | Common Cause | Fix |
|---|---|---|
| Status `failed`, no thumbnail | Blender timeout (300s) | Check whether `worker-thumbnail` is running with concurrency=1 |
| No material replacement | Material name not in aliases | Add the alias in the DB or use Admin→Seed Aliases |
| STL not downloadable | Cache missing (Three.js previously used a tempfile) | Admin→Generate Missing STLs |
| Thumbnail has no colors | `part_colors` not built | Trigger `build_part_colors()` by saving the materials |
| `render_step_thumbnail` not queued | `process_step_file` failed | Check worker logs, re-queue manually if needed |
| Blender mm scaling wrong | Missing `_scale_mm_to_m()` | Check the render script |
| Flamenco job hangs | Poller lost the job ID | Set render_status='processing' + flamenco_job_id |
| Alias lookup finds nothing | Material name case sensitivity | Alias lookup is case-insensitive, exact name match is not → create an alias |
## Pipeline Overview (for orientation)
## Architecture Overview (current)
```
Upload STEP
process_step_file (step_processing, concurrency=8)
extract_cad_metadata()
parsed_objects stored
queues →
render_step_thumbnail (thumbnail_rendering, concurrency=1)
↓ regenerate_cad_thumbnail()
↓ part_colors → blender-renderer:8100/render
↓ STL cache created: {stem}_low.stl
↓ Status: completed / failed
↓ _auto_populate_materials_for_cad()
process_step_file [queue: step_processing, worker container]
→ backend/app/domains/pipeline/tasks/extract_metadata.py
parses STEP objects, stores parsed_objects
queues render_step_thumbnail
render_step_thumbnail [queue: thumbnail_rendering, render-worker container]
→ backend/app/domains/pipeline/tasks/render_thumbnail.py
→ subprocess: export_step_to_gltf.py (OCC/GMSH tessellation → geometry GLB)
→ subprocess: export_gltf.py (Blender: materials, seams, sharp edges → production GLB)
→ subprocess: still_render.py (Blender still render → PNG thumbnail)
→ MediaAsset stored in MinIO
→ status: completed / failed
```
## Final Report
**No HTTP blender-renderer service** — there is no port 8100 endpoint. All rendering is Celery-based.
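The three-stage subprocess chain in the overview above can be sketched as a list of commands run in order. This is a minimal sketch, not the task's actual code: `build_thumbnail_chain`, `run_chain`, and the intermediate file names are hypothetical, the Blender path, script directory, and flags are copied from the manual test commands in this file, and the optional tuning flags are left out.

```python
import subprocess
from pathlib import Path

BLENDER = "/opt/blender/blender"   # path as used in the manual test commands
SCRIPTS = Path("/render-scripts")  # script directory as used in the manual test commands


def build_thumbnail_chain(step_path: str, work_dir: str) -> list[list[str]]:
    """Return the three subprocess commands in pipeline order (hypothetical helper)."""
    work = Path(work_dir)
    geom_glb = work / "geom.glb"    # hypothetical intermediate file names
    prod_glb = work / "prod.glb"
    thumb_png = work / "thumb.png"
    return [
        # 1. OCC/GMSH tessellation: STEP -> geometry GLB
        ["python3", str(SCRIPTS / "export_step_to_gltf.py"),
         "--step_path", step_path, "--output_path", str(geom_glb)],
        # 2. Blender: materials, seams, sharp edges -> production GLB
        [BLENDER, "--background", "--python", str(SCRIPTS / "export_gltf.py"), "--",
         "--glb_path", str(geom_glb), "--output_path", str(prod_glb)],
        # 3. Blender still render: production GLB -> PNG thumbnail
        [BLENDER, "--background", "--python", str(SCRIPTS / "still_render.py"), "--",
         "--glb_path", str(prod_glb), "--output_path", str(thumb_png)],
    ]


def run_chain(commands, run=subprocess.run):
    """Run the commands in order; return the index of the first failing stage, or None."""
    for index, cmd in enumerate(commands):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return index  # later stages depend on this one, so stop here
    return None
```

The returned index (0, 1, or 2) identifies which script was the failure point, which is the value the report's "Pipeline stage" field asks for.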
Create a short root cause analysis at the end:
## Step 1: Check DB Status
```bash
# CadFile status
docker compose exec postgres psql -U schaeffler -d schaeffler -c "
SELECT id, original_name, processing_status, step_file_hash,
render_job_doc->>'state' AS job_state
FROM cad_files WHERE id = '[cad_file_id]';"
# MediaAssets for a CadFile
docker compose exec postgres psql -U schaeffler -d schaeffler -c "
SELECT asset_type, storage_key, file_size_bytes, is_archived, created_at
FROM media_assets WHERE cad_file_id = '[cad_file_id]'
ORDER BY created_at DESC;"
# OrderLine render status and job document
docker compose exec postgres psql -U schaeffler -d schaeffler -c "
SELECT id, render_status, render_backend_used,
render_job_doc->>'celery_task_id' AS celery_id,
render_job_doc->>'state' AS job_state,
render_job_doc->'steps' AS steps
FROM order_lines WHERE id = '[order_line_id]';"
# Material alias lookup
docker compose exec postgres psql -U schaeffler -d schaeffler -c "
SELECT m.name AS canonical, ma.alias FROM materials m
JOIN material_aliases ma ON ma.material_id = m.id
WHERE lower(ma.alias) = lower('[material_name]');"
```
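Once `render_job_doc` has been fetched, the failing stage can be picked out programmatically. The sketch below assumes a document shape that this file does not confirm: a top-level `state`, and a `steps` list whose entries carry `name` and `state` keys. Only the `state`, `celery_task_id`, and `steps` keys appear in the queries above; the rest, including the helper name `first_failed_step`, is hypothetical.

```python
import json


def first_failed_step(render_job_doc):
    """Return the name of the first step marked failed, or None.

    Accepts either a dict or the JSON text as printed by psql.
    The per-step keys 'name' and 'state' are assumptions.
    """
    if isinstance(render_job_doc, str):
        render_job_doc = json.loads(render_job_doc)
    for step in (render_job_doc or {}).get("steps") or []:
        if step.get("state") == "failed":
            return step.get("name")
    return None


# Example with a made-up document in the assumed shape
doc = {
    "state": "failed",
    "celery_task_id": "hypothetical-task-id",
    "steps": [
        {"name": "export_step_to_gltf", "state": "completed"},
        {"name": "export_gltf", "state": "failed"},
    ],
}
print(first_failed_step(doc))
```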
Problem: [What was the symptom?]
Root Cause: [What was the actual cause?]
Fix: [What was changed / needs to be changed?]
Prevention: [How can this be avoided in the future?]
## Step 2: Check Logs
```bash
# render-worker logs (Blender calls)
docker compose logs --tail=100 render-worker
# step-processing worker logs
docker compose logs --tail=100 worker
# Search for a specific CadFile
docker compose logs render-worker | grep "[cad_file_id]"
# Python tracebacks only
docker compose logs render-worker 2>&1 | grep -A 10 "Traceback"
# Celery task errors
docker compose logs render-worker 2>&1 | grep "ERROR\|FAILED\|Exception"
```
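When the log output has been saved to a file, the traceback search can also be done in Python, for example to attach the blocks to a report. This is a minimal sketch that mirrors `grep -A 10 "Traceback"`; the function name `extract_tracebacks` is hypothetical, and unlike grep it does not merge overlapping context windows.

```python
def extract_tracebacks(log_text: str, context: int = 10) -> list[str]:
    """Return each 'Traceback' line together with the following `context` lines."""
    lines = log_text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if "Traceback" in line:
            blocks.append("\n".join(lines[i:i + context + 1]))
    return blocks


# Example with a made-up log excerpt
sample = "\n".join([
    "render-worker-1  | task started",
    "render-worker-1  | Traceback (most recent call last):",
    "render-worker-1  |   File \"x.py\", line 1, in <module>",
    "render-worker-1  | ModuleNotFoundError: No module named 'pxr'",
    "render-worker-1  | task failed",
])
for block in extract_tracebacks(sample, context=2):
    print(block)
```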
## Step 3: Check Filesystem / MinIO
```bash
# Files in upload directory for a CadFile
docker compose exec render-worker ls -lah /app/uploads/[cad_file_id]/
# STEP file present?
docker compose exec render-worker find /app/uploads/[cad_file_id]/ -name "*.stp" -o -name "*.step"
# GLB files present?
docker compose exec render-worker find /app/uploads/[cad_file_id]/ -name "*.glb"
# MinIO contents (via mc alias)
docker compose exec minio mc ls local/schaeffler/[cad_file_id]/
```
## Step 4: Test Export Scripts Directly
```bash
# Test OCC tessellation (geometry GLB)
docker compose exec render-worker python3 /render-scripts/export_step_to_gltf.py \
--step_path /app/uploads/[cad_file_id]/[filename].stp \
--output_path /tmp/test_geom.glb \
--linear_deflection 0.03 \
--angular_deflection 0.05
# Test Blender production GLB export
docker compose exec render-worker /opt/blender/blender --background \
--python /render-scripts/export_gltf.py -- \
--glb_path /tmp/test_geom.glb \
--output_path /tmp/test_prod.glb \
--smooth_angle 30
# Test Blender still render
docker compose exec render-worker /opt/blender/blender --background \
--python /render-scripts/still_render.py -- \
--glb_path /tmp/test_prod.glb \
--output_path /tmp/test_thumb.png
# Check Blender version
docker compose exec render-worker /opt/blender/blender --version | head -1
```
## Step 5: Re-queue a Single CadFile
```bash
docker compose exec backend python -c "
from app.tasks.celery_app import celery_app
celery_app.send_task(
'app.domains.pipeline.tasks.render_thumbnail.render_step_thumbnail',
args=['[cad_file_id]'],
queue='thumbnail_rendering'
)"
```
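To re-queue several CadFiles at once, the same call can be wrapped in a loop. The sketch below takes the send function as a parameter so it can be tried without a broker; in the container one would presumably pass `celery_app.send_task`. The helper name `requeue_many` is hypothetical, while the task name and queue are the ones used in the command above.

```python
TASK_NAME = "app.domains.pipeline.tasks.render_thumbnail.render_step_thumbnail"
QUEUE = "thumbnail_rendering"


def requeue_many(send_task, cad_file_ids):
    """Send one render task per distinct CadFile ID; return the IDs that were queued."""
    seen = set()
    queued = []
    for cad_file_id in cad_file_ids:
        if cad_file_id in seen:
            continue  # avoid queuing the same CadFile twice
        seen.add(cad_file_id)
        send_task(TASK_NAME, args=[cad_file_id], queue=QUEUE)
        queued.append(cad_file_id)
    return queued


# Dry run with a stand-in for celery_app.send_task
sent = []
requeue_many(lambda name, args, queue: sent.append((name, args, queue)),
             ["id-a", "id-b", "id-a"])
print(len(sent))
```

Because `thumbnail_rendering` is consumed by the render-worker container, a large batch will occupy that worker for a while; queuing in small batches keeps it available for regular uploads.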
## Common Problems and Root Causes
| Symptom | Likely Cause | Fix |
|---|---|---|
| Status `failed`, no thumbnail | render-worker container crashed or OOM | Check `docker compose ps render-worker`, restart if stopped |
| `No module named 'pxr'` | usd-core not installed | `docker compose build render-worker` |
| `No module named 'gmsh'` | gmsh not installed | `docker compose build render-worker` |
| Material not replaced | Material name not in aliases | Add alias in Admin → Materials, or seed aliases |
| GLB viewer shows old file | Cache-bust URL missing `?v=...` | Check `get_download_url()` in media/service.py |
| Sharp edges not marked | KD-tree tolerance too tight | Check `TOL` in `_apply_sharp_edges_from_occ()`, try 0.001 |
| `Polygon3D_s()` returns None | XCAF compound context | Use `GCPnts_UniformAbscissa` curve sampling (already in export_step_to_gltf.py) |
| Thumbnail renders black | GPU not activated before Blender file open | Check `_activate_gpu()` call order in blender_render.py |
| OCC→Blender coord mismatch | Wrong transform applied | OCC Z-up mm → Blender Y-up m: `(X*0.001, -Z*0.001, Y*0.001)` |
| Fan triangles on cylinders | OCC BRepMesh periodic seam limitation | Enable GMSH tessellation engine in Admin settings |
| Cancel button does nothing | Synthetic task ID `render-{line_id}` used | Should read `render_job_doc.celery_task_id` for revoke() |
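The coordinate mapping from the "OCC→Blender coord mismatch" row can be written as a small function for checking individual points by hand. This sketch implements the formula exactly as the table states it, `(X*0.001, -Z*0.001, Y*0.001)`, and has not been verified against the render scripts; the function name `occ_to_blender` is hypothetical.

```python
def occ_to_blender(point_mm):
    """Map an OCC point in millimetres to Blender metres, per the table's formula."""
    x, y, z = point_mm
    return (x * 0.001, -z * 0.001, y * 0.001)


# A point 1000 mm along OCC X, 2000 mm along Y, 3000 mm along Z
print(occ_to_blender((1000.0, 2000.0, 3000.0)))
```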
## Root Cause Report Format
```
Problem: [What was the symptom?]
Root Cause: [What was the actual cause?]
Fix: [What was changed / needs to be changed?]
Prevention: [How to avoid this in the future?]
Pipeline stage: [Which script/task/service was the failure point?]
```