feat: render health endpoint + test script + pipeline fixes

- GET /api/worker/health/render: checks render-worker (thumbnail_rendering
  queue), Blender availability via active_queues inspect, queue depth,
  last render recency — returns ok/degraded/down status
- scripts/test_render_pipeline.py: integration test for full pipeline
  (--health, --sample, --full modes)
- PLAN.md: appended Render Pipeline Fixes section with all B-Fixes
- LEARNINGS.md: documented 5 new learnings (queue mismatch, circular
  import, 307 redirect, worker capability detection)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 19:34:12 +01:00
parent 979b0082ec
commit 381f44bc8b
4 changed files with 703 additions and 1 deletions
+49 -1
@@ -1,7 +1,7 @@
# Refactor Plan: Schaeffler Automat v2
**Created:** 2026-03-05
**Updated:** 2026-03-06 (phases A, B, C, D, E completed)
**Updated:** 2026-03-06 (phases A, B, C, D, E completed + render pipeline fixes)
**Status:** IN PROGRESS, Phase F next
**Branch:** `refactor/render-pipeline` → target: new branch `refactor/v2`
@@ -1405,3 +1405,51 @@ curl http://localhost:8888/api/media?product_id={product_id} | jq length # →
- [x] Start phase A confirmed
- [x] Git tag `v1-stable` created on main
- [x] Git branch `refactor/v2` created
---
## Render Pipeline Fixes (2026-03-06)
### Context
After multi-tenancy was enabled (migrations 035/036), several bugs blocked the entire render pipeline. All of them have been fixed.
### Applied Fixes
| Fix | Problem | Solution | File |
|---|---|---|---|
| B-Fix-1 | `worker-thumbnail`, which has no Blender, competed for the `thumbnail_rendering` queue → 50% silent failures | Removed `worker-thumbnail` from docker-compose.yml | `docker-compose.yml` |
| B-Fix-2 | `render_order_line_task` ran on the `step_processing` queue → `worker` without Blender → Pillow fallback | Changed the queue to `thumbnail_rendering` | `step_tasks.py:247` |
| B-Fix-3 | Circular import between `template_service.py` and `domains/rendering/service.py` → `resolve_template()` was never callable | Restored the full sync SQLAlchemy implementation in `template_service.py` | `services/template_service.py` |
| B-Fix-4 | `audit_log.tenant_id NOT NULL` → broadcast notifications failed → order submit returned 500 | `ALTER TABLE audit_log ALTER COLUMN tenant_id DROP NOT NULL` | Directly in DB |
| B-Fix-5 | Shared system tables (`output_types`, `materials`, etc.) had `tenant_id NOT NULL` → create endpoints failed | `tenant_id DROP NOT NULL` for all system tables | Directly in DB |
| B-Fix-6 | STEP upload and Excel import set `tenant_id=NULL` | Passed `user.tenant_id` through all create paths | `uploads.py`, `excel_import.py`, `products/service.py` |
| B-Fix-7 | `GET /api/tenants` → 307 redirect → axios loses the Authorization header → 401 → empty tenant list | Trailing slash in the API call: `/tenants/` | `frontend/src/api/tenants.ts` |
| B-Fix-8 | Admin UI still showed Flamenco and Three.js options | Removed the Flamenco section and the Three.js picker | `Admin.tsx`, `OutputTypeTable.tsx` |
| B-Fix-9 | 5 output types still had `render_backend='flamenco'` | `UPDATE output_types SET render_backend='celery'` | Directly in DB |
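One plausible reading of the 50% figure in B-Fix-1: two consumers shared the `thumbnail_rendering` queue, and only one of them had Blender. The toy simulation below assumes the broker hands tasks to the consumers strictly in turn, which is a simplification of real Celery delivery; `failure_rate` is a hypothetical helper, not project code.

```python
from itertools import cycle


def failure_rate(num_tasks, workers):
    """Return the share of tasks that land on a worker without Blender.

    `workers` is a list of (name, has_blender) tuples. Tasks are assigned
    round-robin, which is an assumption made for illustration only.
    """
    assignment = cycle(workers)
    failed = 0
    for _ in range(num_tasks):
        _name, has_blender = next(assignment)
        if not has_blender:
            failed += 1
    return failed / num_tasks


# Before B-Fix-1: two consumers on the queue, one without Blender.
before = failure_rate(100, [("render-worker", True), ("worker-thumbnail", False)])
# After B-Fix-1: only the Blender-capable worker consumes the queue.
after = failure_rate(100, [("render-worker", True)])
```

Under these assumptions, half of the tasks fail before the fix and none fail after it, which matches the symptom described in the table.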
### New Testing Infrastructure (DONE)
**`GET /api/worker/health/render`**: render health endpoint, which checks:
- Render worker connected (Celery inspect)
- Blender reachable (HTTP GET blender-renderer:8100/health)
- `thumbnail_rendering` queue depth < 10
- Last render less than 30 min old and successful
- Response: `{ status: "ok"|"degraded"|"down", render_worker_connected, blender_available, thumbnail_queue_depth, last_render_at, ... }`
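How the individual checks combine into `ok`, `degraded`, or `down` is not spelled out here. The sketch below shows one possible mapping, assuming that a missing worker or an unreachable Blender means `down`, while a deep queue or a stale or failed last render means `degraded`. `RenderHealth`, `classify`, and the `last_render_ok` field are hypothetical names; the thresholds come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_QUEUE_DEPTH = 10                    # from the plan: queue depth < 10
MAX_RENDER_AGE = timedelta(minutes=30)  # from the plan: last render < 30 min old


@dataclass
class RenderHealth:
    render_worker_connected: bool
    blender_available: bool
    thumbnail_queue_depth: int
    last_render_at: Optional[datetime]
    last_render_ok: bool  # hypothetical field, not named in the plan


def classify(health: RenderHealth, now: datetime) -> str:
    """Assumed mapping: hard failures -> "down", soft failures -> "degraded"."""
    if not (health.render_worker_connected and health.blender_available):
        return "down"
    queue_ok = health.thumbnail_queue_depth < MAX_QUEUE_DEPTH
    render_ok = (
        health.last_render_at is not None
        and now - health.last_render_at < MAX_RENDER_AGE
        and health.last_render_ok
    )
    return "ok" if queue_ok and render_ok else "degraded"
```

Keeping the classification in a pure function like this makes it testable without a running broker or Blender container.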
**`scripts/test_render_pipeline.py`**: integration test script:
```bash
python scripts/test_render_pipeline.py --health # Health-Check only
python scripts/test_render_pipeline.py --sample   # 1 STEP + 1 output type (fast)
python scripts/test_render_pipeline.py --full     # All output types (slow)
```
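The three modes could be parsed roughly as follows. This is a sketch, not the actual script: `parse_mode` is a hypothetical helper, and treating the flags as mutually exclusive and required is an assumption.

```python
import argparse


def parse_mode(argv):
    """Return "health", "sample", or "full" for the given argument list."""
    parser = argparse.ArgumentParser(
        description="Render pipeline integration test (illustrative sketch)"
    )
    # Assumption: exactly one mode must be chosen per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--health", dest="mode", action="store_const",
                       const="health", help="health check only")
    group.add_argument("--sample", dest="mode", action="store_const",
                       const="sample", help="1 STEP file + 1 output type")
    group.add_argument("--full", dest="mode", action="store_const",
                       const="full", help="all output types")
    return parser.parse_args(argv).mode
```

With this layout, passing two modes at once or none at all makes `argparse` exit with a usage error instead of silently picking one.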
### Celery Queue Architecture (after fixes)
| Queue | Worker | Concurrency | Tasks |
|---|---|---|---|
| `step_processing` | `worker` | 8 | `process_step_file`, `dispatch_order_line_render` |
| `thumbnail_rendering` | `render-worker` (Blender 5.0.1) | 1 | `render_step_thumbnail`, `regenerate_thumbnail`, `render_order_line_task`, `generate_stl_cache` |
| `ai_validation` | `worker` | 8 | Azure AI validation |
**Key principle**: Anything that calls Blender → `thumbnail_rendering` queue → only `render-worker` → no timeouts from parallel requests.
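The routing rule above can be expressed as a Celery-style `task_routes` mapping plus a small guard that flags Blender tasks on the wrong queue. This is a sketch under assumptions: the short task names are taken from the table, but the real registered names probably include module paths, and `misrouted_blender_tasks` is a hypothetical helper.

```python
THUMBNAIL_QUEUE = "thumbnail_rendering"

# Same shape as Celery's `task_routes` setting: task name -> routing options.
TASK_ROUTES = {
    "process_step_file": {"queue": "step_processing"},
    "dispatch_order_line_render": {"queue": "step_processing"},
    "render_step_thumbnail": {"queue": THUMBNAIL_QUEUE},
    "regenerate_thumbnail": {"queue": THUMBNAIL_QUEUE},
    "render_order_line_task": {"queue": THUMBNAIL_QUEUE},
    "generate_stl_cache": {"queue": THUMBNAIL_QUEUE},
}

# Tasks that call Blender, per the table above.
BLENDER_TASKS = {
    "render_step_thumbnail",
    "regenerate_thumbnail",
    "render_order_line_task",
    "generate_stl_cache",
}


def misrouted_blender_tasks(routes):
    """Return the sorted names of Blender tasks not routed to the render queue."""
    return sorted(
        name for name in BLENDER_TASKS
        if routes.get(name, {}).get("queue") != THUMBNAIL_QUEUE
    )
```

A check like this, run in a unit test, would have caught the B-Fix-2 situation, where `render_order_line_task` sat on `step_processing`.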