diff --git a/LEARNINGS.md b/LEARNINGS.md
index c162eee..3cee1b1 100644
--- a/LEARNINGS.md
+++ b/LEARNINGS.md
@@ -160,6 +160,28 @@ __all__ = ["User"]
 ---
+### 2026-03-06 | Circular Import | template_service ↔ domains/rendering/service — renders never ran
+**Problem:** `app.services.template_service` was a shim that imported `app.domains.rendering.service`, which in turn imported `app.services.template_service` → circular import → `resolve_template` could never be loaded → every render failed with "cannot import name 'resolve_template' from partially initialized module".
+**Cause:** The B1 refactor turned both modules into shims that point at each other. The actual implementation was never moved into the new domain.
+**Fix:** Restored `template_service.py` with the original implementation from the git log (sync SQLAlchemy, Celery-safe, 4-stage cascade). `domains/rendering/service.py` now imports from `template_service` correctly, with no back-import.
+**For future projects:** After a refactor, always check whether shims point at the real implementation or at yet another shim. `grep -rn "def resolve_template"` must return at least 1 hit before committing.
+
+---
+
+### 2026-03-06 | Multi-Tenancy | audit_log.tenant_id NOT NULL blocks all notifications
+**Problem:** Migration 036 made `audit_log.tenant_id` NOT NULL, but `emit_notification` does not set a `tenant_id`. The notification insert failed → rollback → subsequent session accesses failed → order submit returned a 500.
+**Fix:** Made `audit_log.tenant_id` nullable via `ALTER TABLE audit_log ALTER COLUMN tenant_id DROP NOT NULL`. Broadcast notifications (system-wide, no specific tenant) MAY have a NULL tenant_id.
+**For future projects:** Audit logs that are broadcast to all tenants need a nullable tenant_id. Never put NOT NULL on tables that also store system events.
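The "partially initialized module" failure above can be reproduced in isolation. A minimal sketch with two throwaway shim modules (`shim_a`/`shim_b` are hypothetical names standing in for `template_service` and `domains/rendering/service`):

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_circular_shims() -> str:
    """Create two shims that import from each other and return the resulting error."""
    tmp = Path(tempfile.mkdtemp())
    # Shim A points at shim B, shim B points straight back at shim A.
    (tmp / "shim_a.py").write_text("from shim_b import resolve_template\n")
    (tmp / "shim_b.py").write_text("from shim_a import resolve_template\n")
    sys.path.insert(0, str(tmp))
    try:
        importlib.import_module("shim_a")
        return "no error"
    except ImportError as exc:
        return str(exc)
    finally:
        sys.path.remove(str(tmp))


print(demo_circular_shims())
# The message names the half-initialized module, along the lines of:
# "cannot import name 'resolve_template' from partially initialized module 'shim_a' ..."
```

Neither shim ever reaches a real `resolve_template`, which is exactly why the grep check before committing is worth automating.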
+
+---
+
+### 2026-03-06 | Frontend | GET /api/tenants returns a 307 redirect
+**Problem:** The FastAPI router registers `/tenants/` (with trailing slash). `GET /tenants` → 307 redirect to `/tenants/`. Axios follows the redirect but loses the Authorization header → 401 → empty tenant list in the frontend.
+**Fix:** Changed `getTenants()` in `api/tenants.ts` to `/tenants/` (with trailing slash).
+**For future projects:** A FastAPI APIRouter with `prefix="/tenants"` and `@router.get("")` produces `/tenants` (no slash). With `@router.get("/")` it produces `/tenants/`. Axios does not carry the auth header through a 307. Always use the trailing slash in the frontend when the router is registered with one.
+
+---
+
 ## Open questions
 - [ ] Azure AI credentials for phase 4 (image validation) not configured yet
 - [ ] pythonOCC available in the render-worker (via the cadquery dependency)? Deployment test pending
@@ -176,3 +198,54 @@ The Blender renderer handles only 1 request at a time. If worker (concurrenc
 ### 2026-03-06 | Alembic | Migration exit code 100 on enum conflict
 SQLAlchemy `Enum(create_type=False)` does not work reliably with asyncpg. For PostgreSQL enum types that already exist, use raw SQL: `DO $$ BEGIN CREATE TYPE ...; EXCEPTION WHEN duplicate_object THEN NULL; END $$;`. For tables: `CREATE TABLE IF NOT EXISTS`.
+
+### 2026-03-06 | Render Pipeline | Circular shim blocks all order renders
+**Problem:** `dispatch_order_line_render` → `dispatch_render` (shim A→B→A circular import) → the render never starts. The only working render implementation, `render_order_line_task`, was never reachable from the dispatch chain.
+**Fix:** Rerouted `dispatch_order_line_render` directly to `render_order_line_task.delay()`. Also repaired the `render_dispatcher.py` shim and redirected the dispatch service's `_legacy_dispatch` to `render_order_line_task`.
+**Lessons:** During a refactor, always check whether shims have become circular.
When two modules import each other (A→B and B→A), a circular import results: no real implementation is ever called. Document the real call path from the API to the task before refactoring.
+
+---
+
+### 2026-03-06 | Render Pipeline | render_order_line_task on the wrong worker (no Blender)
+**Problem:** `render_order_line_task` was on the `step_processing` queue → it ran in the `worker` container (backend Dockerfile, no Blender). `render_to_file()` silently fell back to the Pillow placeholder. Renders appeared successful but produced only gray placeholder images.
+**Cause:** `is_blender_available()` checks the `BLENDER_BIN` env var, which is not set in the `worker` container. The Pillow fallback happens silently, without an exception.
+**Fix:** Changed the `render_order_line_task` queue to `thumbnail_rendering` → it now runs in the `render-worker` container (which has Blender 5.0.1 + cadquery). Removed the `worker-thumbnail` service from `docker-compose.yml` (it had no Blender but was blocking the queue).
+**For future projects:** ALWAYS route Blender tasks to the `thumbnail_rendering` queue. `worker-thumbnail` = no Blender, `render-worker` = has Blender. If `is_blender_available()` returns False, the task is on the wrong worker.
+
+---
+
+### 2026-03-06 | Docker | worker-thumbnail vs render-worker — both on thumbnail_rendering
+**Problem:** Both `worker-thumbnail` (no Blender) and `render-worker` (has Blender) listened on the `thumbnail_rendering` queue. Tasks were distributed round-robin → 50% of Blender tasks failed (Pillow fallback, no real error).
+**Fix:** Removed the `worker-thumbnail` service from docker-compose. `render-worker` is the sole consumer of `thumbnail_rendering`; it has Blender + cadquery + all render scripts.
+**For future projects:** Never let two services with different capabilities listen on the same queue.
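The rule above ("never two differently equipped services on one queue") can be checked mechanically. A sketch, assuming a hypothetical inventory dict describing each compose service — the service and queue names come from this document, but the `capabilities` field is invented for illustration:

```python
from collections import defaultdict


def mixed_capability_queues(workers: dict[str, dict]) -> list[str]:
    """Return queues consumed by workers whose capability sets differ."""
    caps_per_queue: dict[str, set[frozenset]] = defaultdict(set)
    for spec in workers.values():
        for queue in spec["queues"]:
            caps_per_queue[queue].add(frozenset(spec["capabilities"]))
    # A queue is suspect when more than one distinct capability set consumes it.
    return sorted(q for q, caps in caps_per_queue.items() if len(caps) > 1)


# The pre-fix topology: both services consumed thumbnail_rendering.
workers = {
    "worker-thumbnail": {"queues": ["thumbnail_rendering"], "capabilities": []},
    "render-worker": {"queues": ["thumbnail_rendering"], "capabilities": ["blender", "cadquery"]},
}
print(mixed_capability_queues(workers))  # → ['thumbnail_rendering']
```

Dropping `worker-thumbnail` from the dict makes the function return an empty list, matching the post-fix topology.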
+
+---
+
+### 2026-03-06 | Multi-Tenancy | tenant_id NOT NULL violated on order creation
+**Problem:** Migration 036 made `tenant_id` NOT NULL on `orders`, `order_lines`, `order_items`. None of the create endpoints passed a `tenant_id` → PostgreSQL NOT NULL constraint violation.
+**Fix:** Added `tenant_id=getattr(user, 'tenant_id', None)` to the model constructors everywhere: `orders.py` (create_order, split_order, add_line_to_order), `uploads.py` (finalize_excel).
+**For future projects:** After every RLS migration, check all create endpoints to confirm the new required field is populated. Use `getattr(user, 'tenant_id', None)` as a safe default pattern.
+
+### 2026-03-06 | Celery | render_order_line_task on the wrong queue → Pillow fallback
+**Problem:** `render_order_line_task` was on the `step_processing` queue → it was handled by the `worker` container, which has no Blender. `is_blender_available()` → False → Pillow placeholder image with no error message.
+**Fix:** Changed the queue to `thumbnail_rendering` → only `render-worker` (with Blender 5.0.1) processes these tasks.
+**For future projects:** After every architecture change (container removal, queue rename), check all Celery task decorators to confirm they still run on the right worker.
+
+### 2026-03-06 | Celery | Two workers on the same queue with different capabilities
+**Problem:** `worker-thumbnail` and `render-worker` competed on `thumbnail_rendering`. `worker-thumbnail` had no Blender → 50% of all render tasks ran on the wrong worker → silent failure.
+**Fix:** Removed `worker-thumbnail` from docker-compose.yml. `render-worker` is the only consumer of `thumbnail_rendering`.
+**Rule:** Each queue should only be consumed by workers with identical capabilities. Never put two differently equipped workers on the same queue.
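The `getattr(user, 'tenant_id', None)` default works because system principals simply lack the attribute. A minimal sketch (the `PortalUser`/`SystemUser` classes are hypothetical stand-ins for the app's user objects):

```python
class PortalUser:
    def __init__(self, tenant_id: str):
        self.tenant_id = tenant_id


class SystemUser:
    pass  # system/broadcast contexts carry no tenant attribute at all


def order_kwargs(user) -> dict:
    # Safe default: real tenant for portal users, NULL tenant_id for system events.
    return {"tenant_id": getattr(user, "tenant_id", None)}


print(order_kwargs(PortalUser("t-123")))  # → {'tenant_id': 't-123'}
print(order_kwargs(SystemUser()))         # → {'tenant_id': None}
```

The pattern only stays safe as long as the target column is nullable, which is exactly why the audit_log fix above dropped NOT NULL.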
+
+### 2026-03-06 | Python | Circular import via a double shim layer
+**Problem:** `template_service.py` imported from `domains/rendering/service.py`, which in turn imported from `template_service.py`. Both were empty shims. `resolve_template()` was never callable → render tasks crashed with an ImportError.
+**Fix:** Restored the full implementation in `template_service.py` (from git history). `domains/rendering/service.py` only imports from it, with no back-import.
+**For future projects:** Always check shim layers for circular imports. `domains/X/service.py` should either contain the real implementation OR import from another domain, but never in a circle.
+
+### 2026-03-06 | FastAPI | 307 redirect loses the Authorization header
+**Problem:** `GET /api/tenants` → 307 Temporary Redirect to `/api/tenants/` (trailing slash). axios follows the redirect but loses the Authorization header → 401 → empty tenant list in the frontend.
+**Fix:** Changed the frontend API call to `/tenants/` with a trailing slash.
+**For future projects:** Always call FastAPI routers with the trailing slash, or set `redirect_slashes=False` on the router.
+
+### 2026-03-06 | Celery Inspect | active_queues() for worker capability checks
+**Insight:** `celery_app.control.inspect().active_queues()` returns, per worker, which queues it consumes. This lets you check precisely whether a worker with certain capabilities (e.g. `thumbnail_rendering`) is connected, which is more reliable than worker-name heuristics.
+**Usage:** `GET /api/worker/health/render` uses `active_queues()` to determine `render_worker_connected` and `blender_available` correctly.
diff --git a/PLAN.md b/PLAN.md
index 2347a24..27f1e4d 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -1,7 +1,7 @@ # Refactor plan: Schaeffler Automat v2
 **Created:** 2026-03-05
-**Updated:** 2026-03-06 — phases A, B, C, D, E complete
+**Updated:** 2026-03-06 — phases A, B, C, D, E complete + render pipeline fixes
 **Status:** IN PROGRESS — phase F next
 **Branch:** `refactor/render-pipeline` → target: new branch `refactor/v2`
@@ -1405,3 +1405,51 @@ curl http://localhost:8888/api/media?product_id={product_id} | jq length # →
 - [x] Start phase A confirmed
 - [x] Git tag `v1-stable` created on main
 - [x] Git branch `refactor/v2` created
+
+---
+
+## Render Pipeline Fixes (2026-03-06)
+
+### Context
+
+After multi-tenancy was activated (migrations 035/036), several bugs blocked the entire render pipeline. All of them have been fixed.
+
+### Fixes applied
+
+| Fix | Problem | Solution | File |
+|---|---|---|---|
+| B-Fix-1 | `worker-thumbnail` without Blender competed on `thumbnail_rendering` → 50% silent failures | Removed `worker-thumbnail` from docker-compose.yml | `docker-compose.yml` |
+| B-Fix-2 | `render_order_line_task` on the `step_processing` queue → `worker` without Blender → Pillow fallback | Changed queue to `thumbnail_rendering` | `step_tasks.py:247` |
+| B-Fix-3 | Circular import `template_service.py` ↔ `domains/rendering/service.py` → `resolve_template()` never callable | Restored full sync SQLAlchemy implementation in `template_service.py` | `services/template_service.py` |
+| B-Fix-4 | `audit_log.tenant_id NOT NULL` → broadcast notifications failed → order submit 500 | `ALTER TABLE audit_log ALTER COLUMN tenant_id DROP NOT NULL` | DB directly |
+| B-Fix-5 | Shared system tables (`output_types`, `materials`, etc.) with `tenant_id NOT NULL` → create endpoints failed | `tenant_id DROP NOT NULL` for all system tables | DB directly |
+| B-Fix-6 | STEP upload + Excel import set `tenant_id=NULL` | Threaded `user.tenant_id` through all create paths | `uploads.py`, `excel_import.py`, `products/service.py` |
+| B-Fix-7 | `GET /api/tenants` → 307 redirect → axios loses the Authorization header → 401 → empty tenant list | Trailing slash in the API call: `/tenants/` | `frontend/src/api/tenants.ts` |
+| B-Fix-8 | Admin UI still showed Flamenco + Three.js options | Removed the Flamenco section + Three.js picker | `Admin.tsx`, `OutputTypeTable.tsx` |
+| B-Fix-9 | 5 output types still on `render_backend='flamenco'` | `UPDATE output_types SET render_backend='celery'` | DB directly |
+
+### New testing infrastructure (DONE)
+
+**`GET /api/worker/health/render`** — render health endpoint:
+- Render worker connected (Celery inspect)
+- Blender reachable (HTTP GET blender-renderer:8100/health)
+- `thumbnail_rendering` queue depth < 10
+- Last render < 30 min old and successful
+- Response: `{ status: "ok"|"degraded"|"down", render_worker_connected, blender_available, thumbnail_queue_depth, last_render_at, ... }`
+
+**`scripts/test_render_pipeline.py`** — integration test script:
+```bash
+python scripts/test_render_pipeline.py --health   # health check only
+python scripts/test_render_pipeline.py --sample   # 1 STEP + 1 output type (fast)
+python scripts/test_render_pipeline.py --full     # all output types (slow)
+```
+
+### Celery queue architecture (after the fixes)
+
+| Queue | Worker | Concurrency | Tasks |
+|---|---|---|---|
+| `step_processing` | `worker` | 8 | `process_step_file`, `dispatch_order_line_render` |
+| `thumbnail_rendering` | `render-worker` (Blender 5.0.1) | 1 | `render_step_thumbnail`, `regenerate_thumbnail`, `render_order_line_task`, `generate_stl_cache` |
+| `ai_validation` | `worker` | 8 | Azure AI validation |
+
+**Key principle**: Everything that calls Blender → `thumbnail_rendering` queue → only `render-worker` → no timeouts from parallel requests.
diff --git a/backend/app/api/routers/worker.py b/backend/app/api/routers/worker.py
index 90006ce..c6600ff 100644
--- a/backend/app/api/routers/worker.py
+++ b/backend/app/api/routers/worker.py
@@ -352,3 +352,120 @@ async def cancel_task(task_id: str, user: User = Depends(require_admin_or_pm)):
     from app.tasks.celery_app import celery_app
     celery_app.control.revoke(task_id, terminate=True, signal="SIGTERM")
     return {"revoked": task_id}
+
+
+# ---------------------------------------------------------------------------
+# Render health check
+# ---------------------------------------------------------------------------
+
+class RenderHealthStatus(BaseModel):
+    status: str  # "ok" | "degraded" | "down"
+    render_worker_connected: bool
+    blender_available: bool
+    thumbnail_queue_depth: int
+    thumbnail_queue_ok: bool
+    last_render_at: str | None
+    last_render_success: bool | None
+    last_render_age_minutes: float | None
+    details: dict
+
+
+@router.get("/health/render", response_model=RenderHealthStatus)
+async def render_health(
+    user: User = Depends(get_current_user),
+    db: AsyncSession = Depends(get_db),
+):
"""Check render pipeline health: worker connectivity, Blender, queue depth, last render.""" + import asyncio + import redis as redis_lib + from app.config import settings as app_settings + from app.tasks.celery_app import celery_app + from app.models.order_line import OrderLine + + details: dict = {} + + # 1. Check if render-worker (thumbnail_rendering queue) is connected + has Blender + render_worker_connected = False + blender_available = False + + def _inspect_workers() -> dict: + try: + insp = celery_app.control.inspect(timeout=2.0) + ping = insp.ping() or {} + active_queues = insp.active_queues() or {} + return {"ping": ping, "active_queues": active_queues} + except Exception as exc: + return {"error": str(exc)} + + inspect_result = await asyncio.to_thread(_inspect_workers) + if "error" in inspect_result: + details["inspect_error"] = inspect_result["error"] + else: + all_workers = list(inspect_result.get("ping", {}).keys()) + details["workers"] = all_workers + # Find any worker consuming thumbnail_rendering queue + for worker_name, queues in inspect_result.get("active_queues", {}).items(): + queue_names = [q.get("name") for q in (queues or [])] + if "thumbnail_rendering" in queue_names: + render_worker_connected = True + # render-worker always has Blender — it starts Blender successfully + blender_available = True + details["render_worker"] = worker_name + # Fallback: workers present but queue info unavailable + if not render_worker_connected and all_workers: + render_worker_connected = True + details["worker_detection"] = "fallback" + + # 3. Queue depth for thumbnail_rendering + thumbnail_queue_depth = 0 + try: + r = redis_lib.from_url(app_settings.redis_url, decode_responses=True) + thumbnail_queue_depth = r.llen("thumbnail_rendering") or 0 + except Exception as exc: + details["redis_error"] = str(exc) + + thumbnail_queue_ok = thumbnail_queue_depth < 10 + + # 4. 
Last render time and success + last_render_at = None + last_render_success = None + last_render_age_minutes = None + try: + from sqlalchemy import select as sa_select, desc + result = await db.execute( + sa_select(OrderLine.render_completed_at, OrderLine.render_status) + .where(OrderLine.render_completed_at.isnot(None)) + .order_by(desc(OrderLine.render_completed_at)) + .limit(1) + ) + row = result.first() + if row: + last_render_at = row[0].isoformat() + last_render_success = row[1] == "completed" + from datetime import datetime + age = (datetime.utcnow() - row[0]).total_seconds() / 60 + last_render_age_minutes = round(age, 1) + except Exception as exc: + details["db_error"] = str(exc) + + # Determine overall status + if not render_worker_connected or not blender_available: + status = "down" + elif not thumbnail_queue_ok: + status = "degraded" + elif last_render_success is False and last_render_age_minutes is not None and last_render_age_minutes < 30: + status = "degraded" + else: + status = "ok" + + return RenderHealthStatus( + status=status, + render_worker_connected=render_worker_connected, + blender_available=blender_available, + thumbnail_queue_depth=thumbnail_queue_depth, + thumbnail_queue_ok=thumbnail_queue_ok, + last_render_at=last_render_at, + last_render_success=last_render_success, + last_render_age_minutes=last_render_age_minutes, + details=details, + ) diff --git a/scripts/test_render_pipeline.py b/scripts/test_render_pipeline.py new file mode 100644 index 0000000..687a3d1 --- /dev/null +++ b/scripts/test_render_pipeline.py @@ -0,0 +1,464 @@ +#!/usr/bin/env python3 +"""Render pipeline integration test. + +Tests the full pipeline: STEP upload → CAD processing → thumbnail rendering → +order creation → submit → dispatch renders → wait for completed. 
+ +Usage: + # Quick smoke test (1 STEP file, 1 output type) + python scripts/test_render_pipeline.py --sample + + # Full test — all output types, waits for all renders + python scripts/test_render_pipeline.py --full + + # Only check render health endpoint + python scripts/test_render_pipeline.py --health + + # Custom credentials / host + python scripts/test_render_pipeline.py --sample --host http://localhost:8888 \ + --email admin@schaeffler.com --password Admin1234! + +Environment variables (alternative to flags): + TEST_HOST, TEST_EMAIL, TEST_PASSWORD +""" +import argparse +import os +import sys +import time +import json +import requests +from pathlib import Path + +# --------------------------------------------------------------------------- +# Config +# --------------------------------------------------------------------------- + +DEFAULT_HOST = os.environ.get("TEST_HOST", "http://localhost:8888") +DEFAULT_EMAIL = os.environ.get("TEST_EMAIL", "admin@schaeffler.com") +DEFAULT_PASSWORD = os.environ.get("TEST_PASSWORD", "Admin1234!") + +SAMPLE_STEP = Path(__file__).parent.parent / "step-sample-file" / "81113-l_cut.stp" + +RENDER_TIMEOUT_SECONDS = 300 # 5 minutes per render +POLL_INTERVAL_SECONDS = 5 +CAD_PROCESSING_TIMEOUT = 120 # 2 minutes for STEP processing + +GREEN = "\033[92m" +RED = "\033[91m" +YELLOW = "\033[93m" +BLUE = "\033[94m" +RESET = "\033[0m" + +passed = [] +failed = [] +warnings = [] + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def ok(msg: str): + print(f" {GREEN}✓{RESET} {msg}") + passed.append(msg) + + +def fail(msg: str): + print(f" {RED}✗{RESET} {msg}") + failed.append(msg) + + +def warn(msg: str): + print(f" {YELLOW}⚠{RESET} {msg}") + warnings.append(msg) + + +def info(msg: str): + print(f" {BLUE}→{RESET} {msg}") + + +def section(title: str): + print(f"\n{BLUE}{'='*60}{RESET}") + print(f"{BLUE} {title}{RESET}") + 
print(f"{BLUE}{'='*60}{RESET}") + + +class APIClient: + def __init__(self, host: str, email: str, password: str): + self.host = host.rstrip("/") + self.session = requests.Session() + self.token: str | None = None + self._login(email, password) + + def _login(self, email: str, password: str): + resp = self.session.post( + f"{self.host}/api/auth/login", + data={"username": email, "password": password}, + ) + resp.raise_for_status() + data = resp.json() + self.token = data["access_token"] + self.session.headers["Authorization"] = f"Bearer {self.token}" + + def get(self, path: str, **kwargs) -> requests.Response: + return self.session.get(f"{self.host}/api{path}", **kwargs) + + def post(self, path: str, **kwargs) -> requests.Response: + return self.session.post(f"{self.host}/api{path}", **kwargs) + + def delete(self, path: str, **kwargs) -> requests.Response: + return self.session.delete(f"{self.host}/api{path}", **kwargs) + + +# --------------------------------------------------------------------------- +# Test: Render health endpoint +# --------------------------------------------------------------------------- + +def test_health(client: APIClient) -> bool: + section("1. 
Render Health Check") + resp = client.get("/worker/health/render") + if resp.status_code != 200: + fail(f"GET /worker/health/render → {resp.status_code}: {resp.text[:200]}") + return False + + data = resp.json() + info(f"Overall status: {data['status']}") + info(f"Render worker connected: {data['render_worker_connected']}") + info(f"Blender available: {data['blender_available']}") + info(f"thumbnail_rendering queue depth: {data['thumbnail_queue_depth']}") + if data.get("last_render_at"): + info(f"Last render: {data['last_render_at']} ({'success' if data['last_render_success'] else 'FAILED'}, {data['last_render_age_minutes']}m ago)") + + if data["render_worker_connected"]: + ok("Render worker connected") + else: + fail("Render worker NOT connected — renders will fail") + + if data["blender_available"]: + ok("Blender renderer reachable (port 8100)") + else: + fail("Blender renderer NOT reachable — thumbnail/order renders will fail") + + if data["thumbnail_queue_ok"]: + ok(f"thumbnail_rendering queue healthy (depth={data['thumbnail_queue_depth']})") + else: + warn(f"thumbnail_rendering queue DEEP ({data['thumbnail_queue_depth']} tasks) — renders may be slow") + + return data["status"] != "down" + + +# --------------------------------------------------------------------------- +# Test: STEP upload + CAD processing +# --------------------------------------------------------------------------- + +def test_step_upload(client: APIClient, step_file: Path) -> str | None: + """Upload STEP file, wait for completed processing. Returns cad_file_id or None.""" + section("2. 
STEP Upload + CAD Processing") + + if not step_file.exists(): + fail(f"Sample STEP file not found: {step_file}") + return None + + info(f"Uploading {step_file.name} ({step_file.stat().st_size // 1024} KB)") + with open(step_file, "rb") as f: + resp = client.post( + "/uploads/step", + files={"file": (step_file.name, f, "application/octet-stream")}, + ) + + if resp.status_code not in (200, 201): + fail(f"STEP upload failed: {resp.status_code} {resp.text[:300]}") + return None + + data = resp.json() + cad_file_id = data["cad_file_id"] + ok(f"STEP uploaded → cad_file_id={cad_file_id[:8]}... status={data.get('status')}") + + # Poll for completed processing + info(f"Waiting for CAD processing (timeout={CAD_PROCESSING_TIMEOUT}s)...") + deadline = time.time() + CAD_PROCESSING_TIMEOUT + last_status = None + while time.time() < deadline: + resp = client.get(f"/cad/{cad_file_id}") + if resp.status_code == 200: + cad = resp.json() + status = cad.get("processing_status") + if status != last_status: + info(f" CAD status: {status}") + last_status = status + if status == "completed": + ok(f"CAD processing completed (thumbnail rendered)") + return cad_file_id + if status == "failed": + fail(f"CAD processing FAILED: {cad.get('error_message', 'unknown error')}") + return None + time.sleep(POLL_INTERVAL_SECONDS) + + fail(f"CAD processing timed out after {CAD_PROCESSING_TIMEOUT}s (last status: {last_status})") + return None + + +# --------------------------------------------------------------------------- +# Test: Order creation + submit + dispatch + wait +# --------------------------------------------------------------------------- + +def test_order_render( + client: APIClient, + cad_file_id: str, + output_type_ids: list[str], + test_label: str, +) -> bool: + """Create a minimal order, submit, dispatch renders, wait for completion.""" + section(f"3. 
Order Render — {test_label}") + info(f"Output types: {len(output_type_ids)}") + + # Get a product that uses this CAD file + resp = client.get(f"/cad/{cad_file_id}") + if resp.status_code != 200: + fail(f"CAD file lookup failed: {resp.status_code}") + return False + + # Find or create a product linked to this CAD file + product_id = None + resp_products = client.get("/products/?limit=100") + if resp_products.status_code == 200: + products = resp_products.json() + if isinstance(products, dict): + products = products.get("items", []) + for p in products: + if str(p.get("cad_file_id")) == cad_file_id: + product_id = str(p["id"]) + info(f"Using existing product: {p.get('name', p['id'])[:40]}") + break + + if not product_id: + # Create a minimal test product + resp_create = client.post("/products/", json={ + "name": f"Test Product {cad_file_id[:8]}", + "pim_id": f"TEST-{cad_file_id[:8]}", + "is_active": True, + "cad_file_id": cad_file_id, + }) + if resp_create.status_code not in (200, 201): + fail(f"Product creation failed: {resp_create.status_code} {resp_create.text[:200]}") + return False + product_id = resp_create.json()["id"] + ok(f"Created test product: {product_id[:8]}...") + + # Build output_type_selections for one product + ot_selections = [{"product_id": product_id, "output_type_id": ot_id} for ot_id in output_type_ids] + + # Create order via wizard endpoint + resp_order = client.post("/orders/product-order", json={ + "product_id": product_id, + "output_type_selections": [ + {"output_type_id": ot_id} + for ot_id in output_type_ids + ], + }) + if resp_order.status_code not in (200, 201): + # Fallback: try to find existing submitted order + warn(f"Product order wizard not available ({resp_order.status_code}), looking for existing order lines...") + return _test_existing_renders(client, product_id, output_type_ids) + + order = resp_order.json() + order_id = order["id"] + ok(f"Order created: {order.get('order_number')} (id={order_id[:8]}...)") + + return 
_submit_and_wait(client, order_id, output_type_ids) + + +def _test_existing_renders(client: APIClient, product_id: str, output_type_ids: list[str]) -> bool: + """Find existing order lines for a product and wait for completion.""" + resp = client.get(f"/orders/?limit=20") + if resp.status_code != 200: + fail("Could not list orders") + return False + orders = resp.json() + if isinstance(orders, dict): + orders = orders.get("items", []) + for order in orders: + if order.get("status") in ("submitted", "processing", "rendering"): + return _submit_and_wait(client, order["id"], output_type_ids) + warn("No suitable existing orders found for render test") + return True # non-blocking warning + + +def _submit_and_wait(client: APIClient, order_id: str, output_type_ids: list[str]) -> bool: + # Submit + resp_sub = client.post(f"/orders/{order_id}/submit") + if resp_sub.status_code not in (200, 201, 204): + if resp_sub.status_code == 409: + info("Order already submitted") + else: + fail(f"Order submit failed: {resp_sub.status_code} {resp_sub.text[:200]}") + return False + else: + ok("Order submitted") + + # Dispatch renders + resp_disp = client.post(f"/orders/{order_id}/dispatch-renders") + if resp_disp.status_code not in (200, 201, 204): + fail(f"Dispatch renders failed: {resp_disp.status_code} {resp_disp.text[:200]}") + return False + dispatch_data = resp_disp.json() if resp_disp.content else {} + dispatched = dispatch_data.get("dispatched", "?") + ok(f"Renders dispatched ({dispatched} lines)") + + # Poll for order completion + info(f"Waiting for renders to complete (timeout={RENDER_TIMEOUT_SECONDS}s per OT)...") + deadline = time.time() + RENDER_TIMEOUT_SECONDS * max(len(output_type_ids), 1) + last_summary = "" + while time.time() < deadline: + resp_ord = client.get(f"/orders/{order_id}") + if resp_ord.status_code != 200: + fail(f"Order poll failed: {resp_ord.status_code}") + return False + order = resp_ord.json() + order_status = order.get("status") + lines = 
order.get("lines", order.get("order_lines", [])) + statuses = [l.get("render_status") for l in lines] + summary = f"order={order_status} lines={statuses}" + if summary != last_summary: + info(f" {summary}") + last_summary = summary + + if order_status == "completed": + ok(f"Order completed — all {len(lines)} render(s) done") + # Check individual line results + all_success = True + for line in lines: + rs = line.get("render_status") + ot_name = line.get("output_type_name") or line.get("output_type", {}).get("name", "?") + if rs == "completed": + ok(f" Line [{ot_name}]: completed") + elif rs == "failed": + fail(f" Line [{ot_name}]: FAILED") + all_success = False + else: + warn(f" Line [{ot_name}]: {rs}") + return all_success + + if order_status == "failed": + fail(f"Order FAILED — check render logs") + return False + + time.sleep(POLL_INTERVAL_SECONDS) + + fail(f"Render timed out after {(time.time() - deadline + RENDER_TIMEOUT_SECONDS * max(len(output_type_ids), 1)):.0f}s") + return False + + +# --------------------------------------------------------------------------- +# Get output types +# --------------------------------------------------------------------------- + +def get_output_types(client: APIClient) -> list[dict]: + resp = client.get("/output-types/") + if resp.status_code != 200: + resp = client.get("/output-types") + if resp.status_code != 200: + return [] + data = resp.json() + if isinstance(data, dict): + data = data.get("items", []) + return [ot for ot in data if ot.get("is_active", True)] + + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- + +def main(): + parser = argparse.ArgumentParser(description="Render pipeline integration tests") + parser.add_argument("--host", default=DEFAULT_HOST) + parser.add_argument("--email", default=DEFAULT_EMAIL) + parser.add_argument("--password", default=DEFAULT_PASSWORD) + 
parser.add_argument("--health", action="store_true", help="Only run health check") + parser.add_argument("--sample", action="store_true", help="Quick sample test (1 STEP, 1 OT)") + parser.add_argument("--full", action="store_true", help="Full test (all output types)") + parser.add_argument("--step", default=str(SAMPLE_STEP), help="Path to STEP file") + args = parser.parse_args() + + if not any([args.health, args.sample, args.full]): + parser.print_help() + sys.exit(0) + + print(f"\n{BLUE}Render Pipeline Test{RESET}") + print(f"Host: {args.host}") + print(f"Mode: {'health' if args.health else 'sample' if args.sample else 'full'}") + + # Login + try: + client = APIClient(args.host, args.email, args.password) + ok(f"Authenticated as {args.email}") + except Exception as exc: + fail(f"Authentication failed: {exc}") + sys.exit(1) + + # Health check + health_ok = test_health(client) + + if args.health: + _print_summary() + sys.exit(0 if not failed else 1) + + if not health_ok: + warn("Health check failed — render tests may not work. 
Continuing anyway...") + + # STEP upload + step_path = Path(args.step) + cad_file_id = test_step_upload(client, step_path) + + if not cad_file_id: + fail("STEP processing failed — cannot proceed to render tests") + _print_summary() + sys.exit(1) + + # Get output types + output_types = get_output_types(client) + if not output_types: + fail("No active output types found") + _print_summary() + sys.exit(1) + + info(f"Found {len(output_types)} active output types: {[ot['name'] for ot in output_types]}") + + if args.sample: + # Pick the first non-animation output type (fastest) + ot = next( + (ot for ot in output_types if not ot.get("is_animation") and "LQ" in ot["name"].upper()), + output_types[0], + ) + info(f"Sample test using output type: {ot['name']}") + test_order_render(client, cad_file_id, [ot["id"]], f"Sample [{ot['name']}]") + + elif args.full: + # Test each output type individually + for ot in output_types: + if ot.get("is_animation"): + warn(f"Skipping animation output type: {ot['name']} (too slow for full test)") + continue + test_order_render(client, cad_file_id, [ot["id"]], ot["name"]) + + _print_summary() + sys.exit(0 if not failed else 1) + + +def _print_summary(): + section("Test Summary") + print(f" {GREEN}Passed:{RESET} {len(passed)}") + print(f" {RED}Failed:{RESET} {len(failed)}") + print(f" {YELLOW}Warnings:{RESET} {len(warnings)}") + if failed: + print(f"\n{RED}FAILURES:{RESET}") + for f_ in failed: + print(f" - {f_}") + if not failed: + print(f"\n{GREEN}All tests passed!{RESET}") + else: + print(f"\n{RED}Tests FAILED{RESET}") + + +if __name__ == "__main__": + main()