Commit 7a1329958d:

- fix(render): ffmpeg overlay=0:0 -> overlay=0:0:shortest=1 to prevent hang on finite PNG sequences
- feat(ws): add core/websocket.py ConnectionManager + Redis Pub/Sub subscriber loop
- feat(ws): add /api/ws WebSocket endpoint with JWT query-param auth in main.py
- feat(ws): emit render_complete/failed + cad_processing_complete events from step_tasks.py
- feat(ws): emit order_status_change events from orders router
- feat(ws): add beat_tasks.py broadcast_queue_status task (every 10s via Redis __broadcast__)
- feat(frontend): add useWebSocket hook with auto-reconnect (exponential backoff, 25s ping)
- feat(frontend): add WebSocketContext + WebSocketProvider wrapping App
- refactor(frontend): remove polling from WorkerActivity (was 5s/3s) + OrderDetail (was 5s)
- refactor(frontend): reduce polling in Layout (8s->60s) + NotificationCenter (15s->60s)
- docs: add ffmpeg shortest=1 + WebSocket JWT auth learnings to LEARNINGS.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Plan: Phase J (WebSocket) + Turntable Bug + Phase K (Asset Library)
## Context

Analysis of the current code state showed that phases F, G, H, I, and L are already fully implemented.
| Phase | Status | Evidence |
|---|---|---|
| F - Hash caching | DONE | `domains/products/cache_service.py` + migration 041 |
| G - Billing | DONE | `domains/billing/` complete, WeasyPrint in Dockerfile |
| H - Excel sanity check | DONE | `domains/imports/service.py` `run_sanity_check()` + Upload.tsx dialog |
| I - Notification config | DONE | `notification_configs` migration 044, NotificationSettings.tsx |
| L - Dashboard | DONE | AdminDashboard.tsx + ClientDashboard.tsx complete |
| J - WebSocket | MISSING | No `core/websocket.py`, all polls still active |
In addition: critical bug in `render_blender.py` -- the ffmpeg overlay command hangs on a finite frame sequence (no `shortest=1`) -> timeout -> the turntable render fails.
## Bug fix: turntable ffmpeg timeout

Root cause, in `backend/app/services/render_blender.py:507`:

```python
"-filter_complex", "[1:v][0:v]overlay=0:0",
```

The lavfi color source stream has no defined length. Without `shortest=1`, ffmpeg keeps waiting for more frames from the color stream after the PNG sequence ends -> hangs until the timeout (300s).

Fix: `overlay=0:0` -> `overlay=0:0:shortest=1`
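As a hedged illustration of the fix, the argument list could be assembled as below; the color-source size, frame rate, and encoder flags are placeholders, not the project's actual command line:

```python
def build_turntable_cmd(frames_pattern: str, out_path: str, fps: int = 25) -> list[str]:
    """Sketch of the corrected ffmpeg invocation (all flags illustrative).

    shortest=1 ends the overlay when the shortest input -- the finite
    PNG sequence -- runs out, instead of waiting forever on the endless
    lavfi color source."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"color=c=white:s=1920x1080:r={fps}",  # input 0: endless background
        "-framerate", str(fps), "-i", frames_pattern,               # input 1: finite PNG sequence
        "-filter_complex", "[1:v][0:v]overlay=0:0:shortest=1",      # the fix
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        out_path,
    ]

cmd = build_turntable_cmd("frame_%04d.png", "turntable.mp4")
```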
## Phase J: WebSocket backend + frontend

### Architecture (ADR-05: native FastAPI + Redis Pub/Sub)

```
Backend task/router:
  -> redis.publish(f"tenant:{tenant_id}", json.dumps(event))

core/websocket.py:
  ConnectionManager: tenant_id -> set[WebSocket]
  background_task: asyncio.Task (Redis subscribe loop)

Frontend:
  useWebSocket() hook -> WebSocket('/api/ws')
  receives events, invalidates React Query caches
```
Events that are emitted:

| Event | Sender | Data |
|---|---|---|
| `render_complete` | step_tasks.py | order_line_id, status, thumbnail_url |
| `render_failed` | step_tasks.py | order_line_id, error |
| `cad_processing_complete` | step_tasks.py | cad_file_id, status |
| `order_status_change` | orders router | order_id, new_status |
| `queue_update` | beat task (every 10s) | depth per queue |
### Affected files

New files:

- `backend/app/core/websocket.py` -- ConnectionManager + Redis Pub/Sub loop
- `frontend/src/hooks/useWebSocket.ts` -- WebSocket hook with auto-reconnect
- `frontend/src/contexts/WebSocketContext.tsx` -- context provider

Files to modify:

- `backend/app/services/render_blender.py` -- ffmpeg `shortest=1` bug fix
- `backend/app/main.py` -- register WebSocket endpoint (`/api/ws`)
- `backend/app/tasks/step_tasks.py` -- emit WebSocket events
- `backend/app/domains/orders/router.py` -- emit order-status events
- `backend/app/tasks/celery_app.py` -- add `broadcast_queue_status` beat task
- `frontend/src/App.tsx` -- wrap with WebSocketProvider
- `frontend/src/pages/WorkerActivity.tsx` -- replace polling with WS
- `frontend/src/pages/OrderDetail.tsx` -- replace polling with WS
- `frontend/src/pages/Orders.tsx` -- reduce polling
- `frontend/src/components/layout/Layout.tsx` -- reduce polling
- `frontend/src/components/layout/NotificationCenter.tsx` -- replace polling with WS
After the Phase J commit -- Phase K:

- `backend/alembic/versions/045_asset_libraries.py` -- `asset_libraries` table
- `backend/app/domains/materials/models.py` -- add AssetLibrary model
- `backend/app/domains/materials/router.py` -- asset library CRUD + upload
- `render-worker/scripts/asset_library.py` -- load materials + node groups from .blend
- `render-worker/scripts/catalog_assets.py` -- read catalog from .blend
- `render-worker/scripts/export_gltf.py` -- GLB export with materials
- `render-worker/scripts/export_blend.py` -- .blend export with `pack_all()`
- `backend/app/domains/rendering/workflow_builder.py` -- asset library nodes
- `frontend/src/pages/Admin.tsx` -- asset library manager UI
- `frontend/src/api/assetLibraries.ts` -- API client
## Tasks (in order)
### Task 1: Bug fix -- ffmpeg turntable timeout [x]

- File: `backend/app/services/render_blender.py:507`
- What: `"[1:v][0:v]overlay=0:0"` -> `"[1:v][0:v]overlay=0:0:shortest=1"`
- Acceptance criterion: the turntable render for order f0436188 can be restarted and produces an MP4
- Dependencies: none
### Task 2: WebSocket backend -- core/websocket.py [x]

- File: `backend/app/core/websocket.py` (new)
- What:

```python
class ConnectionManager:
    _connections: dict[str, set[WebSocket]]  # tenant_id -> sockets

    async def connect(self, ws, tenant_id): ...
    def disconnect(self, ws, tenant_id): ...
    async def broadcast_to_tenant(self, tenant_id, event: dict): ...
    async def start_redis_subscriber(self): ...  # asyncio background task

def publish_event_sync(tenant_id: str, event: dict):
    """Sync variant for Celery tasks -- redis.publish()."""
```

- Redis Pub/Sub: subscribe to `tenant:*` channels
- On a message: notify all WebSockets of that tenant
- Auto-ping every 30s to guard against disconnects
- Acceptance criterion: `broadcast_to_tenant` sends to all connected WebSockets of the tenant
- Dependencies: none
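A minimal sketch of the registry behind Task 2, with a stand-in socket object instead of a real FastAPI `WebSocket` (its `send_json` method is the only assumed interface):

```python
import asyncio

class ConnectionManager:
    """Sketch of the tenant_id -> sockets registry; not the final implementation."""
    def __init__(self) -> None:
        self._connections: dict[str, set] = {}

    async def connect(self, ws, tenant_id: str) -> None:
        self._connections.setdefault(tenant_id, set()).add(ws)

    def disconnect(self, ws, tenant_id: str) -> None:
        self._connections.get(tenant_id, set()).discard(ws)

    async def broadcast_to_tenant(self, tenant_id: str, event: dict) -> None:
        # iterate over a copy: a failing socket may be removed mid-loop
        for ws in list(self._connections.get(tenant_id, ())):
            await ws.send_json(event)

# Demo with a fake socket standing in for FastAPI's WebSocket:
class FakeSocket:
    def __init__(self):
        self.sent = []
    async def send_json(self, data):
        self.sent.append(data)

async def demo():
    mgr, a, b = ConnectionManager(), FakeSocket(), FakeSocket()
    await mgr.connect(a, "t1")
    await mgr.connect(b, "t2")
    await mgr.broadcast_to_tenant("t1", {"type": "render_complete"})
    return a.sent, b.sent

a_sent, b_sent = asyncio.run(demo())
```

Note the tenant isolation: the broadcast reaches only sockets registered under the target tenant_id.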
### Task 3: WebSocket endpoint in main.py [x]

- File: `backend/app/main.py`
- What:

```python
@app.websocket("/api/ws")
async def ws_endpoint(websocket: WebSocket, token: str = Query(...)):
    user = await verify_ws_token(token)
    await manager.connect(websocket, str(user.tenant_id))
    try:
        while True:
            await websocket.receive_text()  # keep-alive pings
    except WebSocketDisconnect:
        manager.disconnect(websocket, str(user.tenant_id))
```

- Token auth via query parameter (the browser WebSocket API cannot send an Authorization header)
- `verify_ws_token`: JWT decode, load the user (analogous to `get_current_user`)
- `manager` as a global instance, started in the lifespan
- Acceptance criterion: `ws://localhost:8888/api/ws?token=<jwt>` opens a connection
- Dependencies: Task 2
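Purely to illustrate the verification step behind `verify_ws_token`: the real code would reuse the app's existing JWT library and signing key; the stdlib HS256 helpers, `SECRET`, and claim names below are stand-ins, not the project's API.

```python
import base64, hashlib, hmac, json

SECRET = b"dev-secret"  # placeholder; the app's real signing key differs

def _b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def make_token(payload: dict) -> str:
    """Build an HS256 JWT (illustrative; normally the login flow issues it)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def verify_ws_token(token: str) -> dict:
    """Sketch: check the HS256 signature, return the claims, else raise."""
    header, body, sig = token.encode().split(b".")
    expected = _b64url(hmac.new(SECRET, header + b"." + body, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid WebSocket token")
    pad = b"=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(body + pad))

claims = verify_ws_token(make_token({"sub": "user-1", "tenant_id": "t1"}))
```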
### Task 4: WebSocket events in step_tasks.py [x]

- File: `backend/app/tasks/step_tasks.py`
- What: in `render_order_line_task` and `render_step_thumbnail`, after success/failure:

```python
from app.core.websocket import publish_event_sync

# on render complete:
publish_event_sync(tenant_id, {"type": "render_complete", "order_line_id": str(line.id), "status": "completed"})
# on render failed:
publish_event_sync(tenant_id, {"type": "render_failed", "order_line_id": str(line.id), "error": str(exc)})
# on CAD processing complete:
publish_event_sync(tenant_id, {"type": "cad_processing_complete", "cad_file_id": str(cad_file.id), "status": "completed"})
```

- Load tenant_id from `cad_file.tenant_id` or via order_line -> order -> user.tenant_id
- Acceptance criterion: a render finishes -> the WebSocket client receives the event
- Dependencies: Task 2
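The sync publish path boils down to one Redis call; sketched here with an injected client so the channel naming is visible (the injected `redis_client` parameter and `FakeRedis` are illustration-only deviations from the signature above; the client is assumed to expose redis-py's `publish(channel, message)`):

```python
import json

def publish_event_sync(redis_client, tenant_id: str, event: dict) -> None:
    """Sketch: Celery-side publish to the tenant's Pub/Sub channel."""
    redis_client.publish(f"tenant:{tenant_id}", json.dumps(event))

class FakeRedis:
    def __init__(self):
        self.published = []
    def publish(self, channel, message):
        self.published.append((channel, message))

r = FakeRedis()
publish_event_sync(r, "t1", {"type": "render_complete", "order_line_id": "ol-1", "status": "completed"})
channel, message = r.published[0]
```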
### Task 5: WebSocket events in orders router [x]

- File: `backend/app/domains/orders/router.py`
- What: on order status changes (submit, complete, cancel):

```python
from app.core.websocket import manager

await manager.broadcast_to_tenant(
    str(current_user.tenant_id),
    {"type": "order_status_change", "order_id": str(order.id), "status": new_status},
)
```

- Acceptance criterion: order submit -> the WebSocket event reaches all browser tabs of the tenant
- Dependencies: Task 2
### Task 6: Queue-update beat task [x]

- File: `backend/app/tasks/celery_app.py`
- What: new beat task every 10s:

```python
@shared_task(name="beat.broadcast_queue_status", queue="step_processing")
def broadcast_queue_status():
    from redis import Redis
    r = Redis.from_url(settings.redis_url)
    depths = {
        "step_processing": r.llen("step_processing"),
        "thumbnail_rendering": r.llen("thumbnail_rendering"),
    }
    # broadcast to all tenants
    r.publish("__broadcast__", json.dumps({"type": "queue_update", "depths": depths}))
```

- `__broadcast__` channel: delivered to ALL connected WebSockets (not tenant-specific)
- The ConnectionManager also subscribes to `__broadcast__`
- Acceptance criterion: the WorkerActivity queue depth updates automatically every 10s
- Dependencies: Task 2
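The payload the beat task broadcasts can be sketched without Celery. Reading queue depth via LLEN relies on Celery's Redis transport storing each queue as a Redis list; `FakeRedis` below is a stand-in for the real client:

```python
import json

def build_queue_update(redis_client, queues=("step_processing", "thumbnail_rendering")) -> str:
    """Sketch of the beat task's payload: one LLEN per Celery queue."""
    depths = {q: redis_client.llen(q) for q in queues}
    return json.dumps({"type": "queue_update", "depths": depths})

class FakeRedis:
    def __init__(self, lists):
        self.lists = lists
    def llen(self, name):
        return len(self.lists.get(name, []))

# two jobs waiting in step_processing, none in thumbnail_rendering
payload = build_queue_update(FakeRedis({"step_processing": ["job-1", "job-2"]}))
```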
### Task 7: Frontend WebSocket hook [x]

- File: `frontend/src/hooks/useWebSocket.ts` (new)
- What:

```typescript
export function useWebSocketConnection() {
  // connects to ws://localhost:8888/api/ws?token=<jwt>
  // auto-reconnect: 1s, 2s, 4s, 8s, ... capped at 30s
  // emits events via onMessage callback
  // pings every 25s (keep-alive)
  // closes the connection on logout
}
```

- Acceptance criterion: the connection stays open and reconnects after a network interruption
- Dependencies: none
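The reconnect schedule above is just capped exponential backoff; the hook itself is TypeScript, but the delay formula can be stated in a few lines of Python:

```python
def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Backoff as described for the hook: 1s, 2s, 4s, 8s, ... capped at 30s."""
    return min(base * (2 ** attempt), cap)

# first seven attempts: 1, 2, 4, 8, 16, then pinned at the 30s cap
delays = [reconnect_delay(n) for n in range(7)]
```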
### Task 8: Frontend WebSocket context [x]

- Files: `frontend/src/contexts/WebSocketContext.tsx` (new), `frontend/src/App.tsx` (modify)
- What:

```typescript
export function WebSocketProvider({ children }) {
  const queryClient = useQueryClient()
  // on 'render_complete':         invalidateQueries(['orders', order_line_id])
  // on 'render_failed':           invalidateQueries(['orders', order_line_id])
  // on 'cad_processing_complete': invalidateQueries(['cad-activity'])
  // on 'order_status_change':     invalidateQueries(['orders'])
  // on 'queue_update':            queryClient.setQueryData(['queue-status'], ...)
}

// App.tsx: wrap <Router> in <WebSocketProvider>
```

- Acceptance criterion: a `render_complete` event updates OrderDetail without a poll interval
- Dependencies: Task 7
### Task 9: Replace polling -- WorkerActivity.tsx [x]

- File: `frontend/src/pages/WorkerActivity.tsx`
- What:
  - Remove `refetchInterval: 5000` -- invalidate on `cad_processing_complete`
  - Remove `refetchInterval: 3000` for queue status -- `setQueryData` on `queue_update`
- Acceptance criterion: no automatic HTTP requests in the network tab (only WS frames)
- Dependencies: Task 8
### Task 10: Replace polling -- OrderDetail.tsx [x]

- File: `frontend/src/pages/OrderDetail.tsx`
- What:
  - Remove `refetchInterval: (query) => {...}`
  - Instead: on `render_complete`/`render_failed` for a matching order_line_id -> invalidate
- Acceptance criterion: the render status in OrderDetail updates live without polling
- Dependencies: Task 8
### Task 11: Reduce polling -- Layout.tsx + NotificationCenter.tsx [x]

- Files: `frontend/src/components/layout/Layout.tsx`, `NotificationCenter.tsx`
- What:
  - Layout: `refetchInterval: 8000` -> 60000 (1 min)
  - NotificationCenter: `refetchInterval: 15_000` -> 60000; additionally invalidate on `order_status_change`
- Acceptance criterion: significantly fewer poll requests in the network tab
- Dependencies: Task 8
### Task 12: PLAN.md + LEARNINGS.md + commit [x]

- What:
  - PLAN.md: mark Phase J as DONE, set the status to "Phase K next"
  - LEARNINGS.md: add the ffmpeg `shortest=1` learning + the WebSocket auth via query-param learning
  - `git commit -m "feat(J): WebSocket live-events + replace polling + fix ffmpeg turntable timeout"`
- Dependencies: Tasks 1-11
## Phase K tasks (after the commit)
### Task K1: Migration 045 + AssetLibrary model [ ]

- Files: `backend/alembic/versions/045_asset_libraries.py` (new, autogenerate), `domains/materials/models.py`
- What:

```
class AssetLibrary(Base):
    id: UUID PK
    tenant_id: FK, nullable
    name: VARCHAR(200)
    blend_file_key: TEXT   # MinIO key
    catalog: JSONB         # {materials: [...], node_groups: [...]}
    description: TEXT
    is_active: BOOL
    created_at: TIMESTAMP
```

- `render_templates.asset_library_id` optional FK (nullable)
- `output_types.asset_library_id` optional FK (nullable)
- Acceptance criterion: `alembic upgrade head` succeeds; the `asset_libraries` table exists in the DB
### Task K2: Asset library CRUD backend [ ]

- Files: `backend/app/domains/materials/router.py` + `service.py` + `schemas.py`
- What:
  - `POST /api/asset-libraries` -- .blend upload -> MinIO `asset-libraries/{id}.blend` -> queues a catalog refresh
  - `GET /api/asset-libraries` -- list
  - `GET /api/asset-libraries/{id}/catalog` -- materials + node groups
  - `DELETE /api/asset-libraries/{id}` -- only when not in use (FK check)
  - `AssetLibraryOut` schema with a `catalog` field
- Acceptance criterion: POST + GET work; the .blend is stored in MinIO
### Task K3: Catalog-refresh Celery task + Blender script [ ]

- Files: `backend/app/domains/materials/tasks.py` (new), `render-worker/scripts/catalog_assets.py` (new)
- What:
  - Celery task `refresh_asset_library_catalog(asset_library_id)` on the `thumbnail_rendering` queue
  - Downloads the .blend from MinIO into a tmpdir
  - Runs `blender --background --python catalog_assets.py -- <blend_path>`
  - `catalog_assets.py`: opens the .blend and reads all datablocks marked as assets:

```python
import bpy, json, sys

blend_path = sys.argv[sys.argv.index('--') + 1]
bpy.ops.wm.open_mainfile(filepath=blend_path)
catalog = {
    "materials": [m.name for m in bpy.data.materials if m.asset_data],
    "node_groups": [ng.name for ng in bpy.data.node_groups if ng.asset_data],
}
print(json.dumps(catalog))
```

  - Writes the catalog into `asset_libraries.catalog` (JSONB)
- Acceptance criterion: after a .blend upload, the `catalog` JSONB contains the asset names
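The filter at the heart of `catalog_assets.py` needs no Blender to reason about: a datablock is cataloged only when its `asset_data` is set, i.e. it was marked as an asset in Blender. A bpy-free restatement with stand-in objects (`Steel`, `Scratch`, `Bevel` are made-up names):

```python
from types import SimpleNamespace

def build_catalog(materials, node_groups) -> dict:
    """Sketch of the catalog filter: keep only datablocks marked as assets."""
    return {
        "materials": [m.name for m in materials if m.asset_data],
        "node_groups": [ng.name for ng in node_groups if ng.asset_data],
    }

# asset_data is truthy only for marked assets, mirroring bpy's behavior
mats = [SimpleNamespace(name="Steel", asset_data=object()),
        SimpleNamespace(name="Scratch", asset_data=None)]
groups = [SimpleNamespace(name="Bevel", asset_data=object())]
catalog = build_catalog(mats, groups)
```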
### Task K4: Blender asset-library apply script [ ]

- File: `render-worker/scripts/asset_library.py` (new)
- What:

```python
def apply_asset_library_materials(blend_path: str, material_map: dict) -> None:
    """Load materials from the asset-library .blend and apply them to mesh parts."""
    with bpy.data.libraries.load(blend_path, link=True, assets_only=True) as (src, dst):
        dst.materials = [n for n in src.materials if n in material_map.values()]
    for obj in bpy.data.objects:
        if obj.type == 'MESH':
            for slot in obj.material_slots:
                resolved = material_map.get(slot.material.name if slot.material else '')
                if resolved and resolved in bpy.data.materials:
                    slot.material = bpy.data.materials[resolved]

def apply_asset_library_modifiers(blend_path: str, modifier_map: dict) -> None:
    """Load geometry node groups and apply them as modifiers."""
    with bpy.data.libraries.load(blend_path, link=True, assets_only=True) as (src, dst):
        dst.node_groups = [n for n in src.node_groups if n in modifier_map.values()]
    for obj in bpy.data.objects:
        if obj.type == 'MESH':
            for part_name, mod_name in modifier_map.items():
                if part_name.lower() in obj.name.lower():
                    mod = obj.modifiers.new(name=mod_name, type='NODES')
                    mod.node_group = bpy.data.node_groups.get(mod_name)
```

- Acceptance criterion: a render with the asset library shows the correct production materials
### Task K5: export_gltf + export_blend scripts [ ]

- Files: `render-worker/scripts/export_gltf.py` (new), `render-worker/scripts/export_blend.py` (new)
- What:
  - `export_gltf.py`:
    - Import the STL (`bpy.ops.import_mesh.stl`)
    - Load the asset library via `apply_asset_library_materials` + `apply_asset_library_modifiers`
    - `bpy.ops.export_scene.gltf(filepath=out, export_format='GLB', export_apply=True, export_draco_mesh_compression_enable=True)`
    - Output to MinIO `production-exports/{cad_file_id}/{run_id}.glb`
    - MediaAsset record with `asset_type=gltf_production`
  - `export_blend.py`:
    - Load the STL + asset library (as in export_gltf)
    - `bpy.ops.file.pack_all()`
    - `bpy.ops.wm.save_as_mainfile(filepath=out, compress=True, copy=True)`
    - MediaAsset record with `asset_type=blend_production`
- Acceptance criterion: the downloaded GLB opens in the Three.js viewer with materials
### Task K6: Workflow builder -- asset library nodes [ ]

- File: `backend/app/domains/rendering/workflow_builder.py`
- What:
  - New Celery tasks: `apply_asset_library_materials_task`, `apply_asset_library_modifiers_task`, `export_gltf_task`, `export_blend_task`
  - New workflow type `still_production`:

```python
chain(
    convert_step.si(order_line_id),
    group(
        chain(apply_asset_library_materials.si(order_line_id), render_still.si(order_line_id)),
        chain(apply_asset_library_materials.si(order_line_id), export_gltf.si(order_line_id)),
        chain(apply_asset_library_materials.si(order_line_id), export_blend.si(order_line_id)),
    ),
    generate_thumbnail.si(order_line_id),
    publish_asset.si(order_line_id),
)
```

- Acceptance criterion: dispatching a `still_production` workflow produces PNG + GLB + .blend
### Task K7: Asset library management UI [ ]

- Files: `frontend/src/api/assetLibraries.ts` (new), `frontend/src/pages/Admin.tsx` (extend)
- What:
  - API client: `getAssetLibraries`, `uploadAssetLibrary` (multipart), `deleteAssetLibrary`, `getAssetLibraryCatalog`
  - Admin.tsx: new "Asset Libraries" panel (after Render Templates)
    - Upload button + drag and drop
    - Table: name, material count, node-group count, actions
    - Catalog detail: material badge list (green) + node-group badge list (blue)
  - OutputTypeTable: asset-library dropdown column
- Acceptance criterion: an admin can upload a .blend, view the catalog, and assign it to an OutputType
### Task K8: PLAN.md + LEARNINGS.md + commit [ ]

- What:
  - PLAN.md: mark Phase K as DONE
  - LEARNINGS.md: asset library `link=True` pattern, GLB export via the Blender API
  - `git commit -m "feat(K): Blender Asset Library + production exports (GLB + .blend)"`
## Migration check

| Migration | Phase | Status |
|---|---|---|
| 041 step_file_hash | F | exists |
| 042 invoices | G | exists |
| 043 import_validations | H | exists |
| 044 notification_configs | I | exists |
| 045 asset_libraries | K | missing |
## Recommended order

1. Task 1 (bug fix, immediately)
2. Tasks 2-6 in parallel (backend WebSocket)
3. Tasks 7-8 in parallel (frontend hook + context)
4. Tasks 9-11 (replace polling, after Task 8)
5. Task 12 (commit)
6. Tasks K1-K3 in parallel (data model + backend + Blender catalog)
7. Tasks K4-K5 in parallel (Blender scripts)
8. Tasks K6-K7 (workflow + UI, after K1-K5)
9. Task K8 (commit)
## Risks / open questions

- WebSocket auth via query parameter: the token is visible in server logs. Acceptable for v2. For v3: generate a short-lived WS token (TTL 30s) from the JWT.
- Redis Pub/Sub scaling: with many tenants/tabs the subscriber loop can become a bottleneck. OK for v2. For v3: Redis Streams.
- Phase K -- MinIO bucket: the `asset-libraries` bucket must be created at startup (lifespan in main.py).
- Phase K -- `link=True` means the .blend must be downloaded from MinIO into a tmpdir before rendering. Already accounted for in K3.
- Existing material_libraries: the old `material_libraries` table/feature remains in place in parallel -- no breaking change. Asset libraries are additive.