20 Commits

Author SHA1 Message Date
Hartmut 3e810c74a3 chore: snapshot workflow migration progress 2026-04-12 11:49:04 +02:00
Hartmut f13cb489c1 fix: migrate runtime data to native hartomat storage 2026-04-06 18:09:51 +02:00
Hartmut 8990b80abf fix: restore existing runtime data volumes 2026-04-06 18:01:15 +02:00
Hartmut 448996b546 fix: stabilize HartOMat runtime startup 2026-04-06 13:10:51 +02:00
Hartmut b795f0e6d6 refactor: rebrand project to HartOMat 2026-04-06 12:45:47 +02:00
Hartmut daad2c64f3 fix: revert dual queue to single GPU — light worker caused 2x regression
Root cause: render-worker and render-worker-light shared the same GPU,
causing contention. Complex TRB renders went from 17s → 36s (2x slower).

Changes:
- Thumbnails back to asset_pipeline queue (not asset_pipeline_light)
- Dispatch routing always uses asset_pipeline (no queue splitting)
- render-worker-light gated behind "multi-gpu" profile — only starts with:
  docker compose --profile multi-gpu up -d
- For single-GPU setups: all rendering is sequential on one worker

The dual queue approach is correct for multi-GPU machines where each
worker gets its own GPU. On single-GPU, serial execution is faster
than concurrent GPU contention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 12:33:26 +01:00
Hartmut 5a148554c0 perf: dual queue, GLB caching, WebP output, persistent BVH
Task 4: Dual render queue
- render-worker: heavy (asset_pipeline, concurrency=1) — HQ 2048x2048, animations
- render-worker-light: light (asset_pipeline_light, concurrency=2) — thumbnails, <=1024
- Thumbnails routed to light queue automatically
- Order line renders routed by resolution at dispatch time

Task 5: GLB caching (skip re-tessellation)
- Before tessellating, check if gltf_geometry MediaAsset exists for the cad_file_id
- If found, copy to expected path — render_blender.py finds it and skips tessellation
- Saves 7-11s per re-render of the same product

Task 6: WebP output format
- New 'webp' option in output_format (OutputType admin)
- Blender renders PNG intermediate, Pillow converts to WebP (quality=90, method=4)
- 50-70% smaller files with no visible quality loss
- Correct MIME type (image/webp) in MediaAsset

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 12:07:12 +01:00
Hartmut 8c65c52271 fix: persist OptiX kernel cache — 9x faster first render after restart
The OptiX cache was mounted at /root/.nv but NVIDIA writes to
/var/tmp/OptixCache_root/optix7cache.db (28MB). Fixed volume mount.

Before: first render after container restart = 181s (OptiX recompilation)
After: first render after container restart = 20s (cached kernels)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 09:57:11 +01:00
Hartmut 1321ef2bd4 refactor: rename thumbnail_rendering queue to asset_pipeline
The queue handles far more than thumbnails: OCC tessellation, USD master
generation, GLB production, order line renders, and workflow renders.
asset_pipeline better reflects its role as the render-worker's primary queue.

Updated all references in: task decorators, celery_app.py, beat_tasks.py,
docker-compose.yml worker command, worker.py MONITORED_QUEUES, admin.py,
CLAUDE.md, LEARNINGS.md, Dockerfile, helpTexts.ts, test files,
and all .claude/commands/*.md skill files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 22:28:38 +01:00
Hartmut ac48d359e6 fix(render): persist OptiX BVH cache across render-worker rebuilds
Mount named volume optix-cache:/root/.nv so the OptiX ComputeCache
survives docker compose rebuild. Without this every rebuild wiped the
BVH acceleration structure, causing the first render of any complex
scene (~175 parts) to take 130–150s instead of 22s while OptiX
recompiles kernels and rebuilds the BVH from scratch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 21:15:45 +01:00
Hartmut 34f89cc225 feat(gpu): GPU health check + RENDER_DEVICE_USED token + strict mode
- gpu_probe.py: Blender script that probes OPTIX/CUDA/HIP/ONEAPI and
  exits 1 on no GPU — used at startup + on-demand from Admin UI
- blender_render.py, still_render.py, turntable_render.py: emit
  RENDER_DEVICE_USED: engine=CYCLES device=GPU|CPU compute_type=...
  after GPU activation; exit 2 when CYCLES_DEVICE=gpu and CPU fallback
- render_blender.py: parse RENDER_DEVICE_USED token into render_log
  (device_used, compute_type, gpu_fallback); handle exit code 2 as
  explicit GPU strict-mode failure
- check_version.py: check_gpu() runs gpu_probe.py at container startup;
  CYCLES_DEVICE=gpu aborts startup if no GPU found
- docker-compose.yml: CYCLES_DEVICE=${CYCLES_DEVICE:-auto} env var
- gpu_tasks.py: probe_gpu Celery task on thumbnail_rendering queue;
  saves result to system_settings.gpu_probe_last_result; beat every 30min
- worker.py: POST /probe/gpu (trigger) + GET /probe/gpu/result (last result)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 20:57:36 +01:00
Hartmut 07e3d1e026 feat(phase8.1-8.2): dynamic worker concurrency via worker_configs
- Migration 054: worker_configs table (queue_name PK, max/min_concurrency,
  enabled, updated_at); seeds step_processing(8/2), thumbnail_rendering(1/1),
  ai_validation(4/1)
- WorkerConfig SQLAlchemy model
- apply_worker_concurrency beat task: reads enabled configs, broadcasts
  pool_grow to all Celery workers every 5min
- GET/PUT /api/worker/configs (admin): list + update per-queue concurrency
- docker-compose.yml: worker uses --autoscale=${MAX_CONCURRENCY:-8},${MIN_CONCURRENCY:-2};
  render-worker uses --autoscale=1,1 --concurrency=1
- WorkerManagement.tsx: "Concurrency Settings" section with +/- steppers
  and Save button per queue

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 20:41:57 +01:00
Hartmut a70cb55d01 feat(N): workflow pipeline, 3D viewer, worker management, QC tests
- workflow_builder.py: fix broken stubs, add render_order_line_still_task
  (resolves step_path from DB instead of passing order_line_id as step_path)
- domains/rendering/tasks.py: add render_order_line_still_task,
  export_gltf_for_order_line_task, export_blend_for_order_line_task,
  generate_gltf_geometry_task (trimesh STL→GLB, no Blender needed)
- tasks/step_tasks.py: add generate_gltf_geometry_task for CadFile GLB export
- cad router: POST /{id}/generate-gltf-geometry endpoint (admin/PM)
- worker router: GET /celery-workers + POST /scale (docker compose subprocess)
- Dockerfile: pip install -e "[dev]" to enable pytest
- docker-compose.yml: docker socket + compose file mount on backend
- ThreeDViewer.tsx: mode toggle (geometry/production), wireframe, env presets,
  download buttons (GLB + .blend)
- CadPreview.tsx: load gltf_geometry/gltf_production/blend_production assets
  from MediaAsset table and pass URLs to ThreeDViewer
- ProductDetail.tsx: "View 3D" button → /cad/:id, "Generate GLB" button
- media router/service: cad_file_id filter on GET /api/media
- WorkerManagement.tsx: new page with worker status, queue depth, scale controls
- App.tsx + Layout.tsx: /workers route + sidebar link (admin/PM)
- tests: test_rendering_service.py, test_orders_service.py (backend)
- tests: WorkerActivity.test.tsx, WorkerManagement.test.tsx (frontend)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 22:56:53 +01:00
Hartmut ab3f9c734a fix: render pipeline + multi-tenancy bugs (B-Fix-1 through B-Fix-9)
- Remove worker-thumbnail (no Blender, was competing on thumbnail_rendering)
- Move render_order_line_task to thumbnail_rendering queue (render-worker)
- Restore template_service.py real implementation (fix circular import shim)
- Thread tenant_id through STEP upload, Excel import, product create
- Make system tables (output_types, materials, etc.) tenant_id nullable
- Fix tenants frontend 307-redirect: use trailing slash /tenants/
- Remove Flamenco + Three.js from Admin UI (unsupported)
- Set all output_types render_backend to celery (was flamenco)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 19:34:20 +01:00
Hartmut f19a6ccde8 feat(F-G-H-I): STL cache, invoices, import validation, notification settings
Phase F — STL Hash Cache:
- Migration 041: step_file_hash column on cad_files
- cache_service.py: SHA256 hash + MinIO-backed STL cache (check/store)
- render_step_thumbnail: compute+persist hash before render
- generate_stl_cache: check MinIO cache before cadquery conversion, store after

Phase G — Invoices:
- Migration 042: invoices + invoice_lines tables with RLS
- Invoice/InvoiceLine models + schemas
- billing service: generate_invoice_number (INV-YYYY-NNNN), create/list/get/delete/PDF
- WeasyPrint PDF generation; backend Dockerfile + pyproject.toml deps
- invoice_router with 6 endpoints; registered in main.py
- frontend: Billing.tsx page + api/billing.ts; route + nav link

Phase H — Import Sanity Check:
- Migration 043: import_validations table
- ImportValidation model + schemas
- run_sanity_check: material fuzzy-match (cutoff=0.8), STEP availability, duplicate detection
- validate_excel_import Celery task (queue: step_processing)
- uploads.py: create ImportValidation on /excel, fire task, expose GET /validations/{id}
- frontend: Upload.tsx polling ValidationDialog with Ampel status indicators

Phase I — Notification Settings:
- Migration 044: notification_configs table (user×event×channel toggles)
- NotificationConfig model + seeds (in_app=true, email=false)
- get/upsert/reset config endpoints on /notifications/config
- frontend: NotificationSettings.tsx page + api/notifications.ts extensions

Infrastructure:
- docker-compose.yml: add worker-thumbnail service (concurrency=1, Q=thumbnail_rendering)
- Fix Dockerfile: libgdk-pixbuf-2.0-0 (correct Debian bookworm package name)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 18:05:01 +01:00
Hartmut 7706c514c8 fix(deploy): fix render-worker build context + migration 040 idempotency
- docker-compose.yml: change render-worker build context from ./render-worker
  to . (project root) so pyproject.toml is accessible; update dockerfile path
- render-worker/Dockerfile: update COPY paths for new build context;
  install Python 3.11 via deadsnakes PPA (Ubuntu 22.04 ships 3.10 which
  fails the >=3.11 requirement in pyproject.toml)
- 040_media_assets.py: rewrite upgrade() with raw idempotent SQL (CREATE TYPE
  inside DO $$ EXCEPTION WHEN duplicate_object $$; CREATE TABLE IF NOT EXISTS;
  CREATE INDEX IF NOT EXISTS) to handle pre-existing enum from partial runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 17:32:42 +01:00
Hartmut c728358fb6 feat(A4): add MinIO service + storage abstraction (core/storage.py)
- Add MinIO service to docker-compose.yml (port 9000 API, 9001 console)
- Add minio-data volume for persistent object storage
- Create backend/app/core/storage.py: MinIOStorage + LocalStorage abstraction
  - MinIOStorage: boto3-based, auto-creates bucket, upload/download/exists/delete/presign
  - LocalStorage: fallback for dev (UPLOAD_DIR filesystem, backward compat)
  - get_storage() singleton: auto-selects based on MINIO_URL env var
- Add MINIO_URL/USER/PASSWORD/BUCKET env vars to all service definitions
- backend/pyproject.toml: docker>=6.1.0 → boto3>=1.34.0
- Add docker-compose.worker.yml: external render-worker for remote machines
- Fix .gitignore: 'core' rule was too broad, now only matches root /core dump
- Update .env.example: MinIO connection vars documented

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:51:06 +01:00
Hartmut 9d1a820295 refactor(A2): replace blender-renderer HTTP service with render-worker Celery container
- Create render-worker/ with Dockerfile (Ubuntu + cadquery + Blender via host mount)
- Add render-worker/check_version.py: verifies Blender >= 5.0.1 at startup, Exit 1 on failure
- Add render-worker/scripts/: blender_render.py, still_render.py, turntable_render.py
- Create backend/app/services/render_blender.py: direct subprocess rendering
  - convert_step_to_stl() and export_per_part_stls() using cadquery
  - render_still(): STEP → STL → PNG via Blender subprocess
  - is_blender_available(): detects BLENDER_BIN env for render-worker context
- Create backend/app/domains/rendering/tasks.py: render_still_task + render_turntable_task
- Update step_processor.py: use subprocess path when BLENDER_BIN env is set (render-worker)
- Update step_tasks.py: generate_stl_cache uses direct cadquery instead of HTTP
- Remove blender-renderer and threejs-renderer from docker-compose.yml
- Replace worker-thumbnail with render-worker (Ubuntu + cadquery + Blender mount)
- Remove Docker SDK from backend Dockerfile (was only for flamenco scaling)
- Update .env.example: BLENDER_VERSION=5.0.1 documented
- Update celery_app.py: include domains.rendering.tasks in autodiscover

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:48:46 +01:00
Hartmut 1d6864fb64 refactor(A1): remove Flamenco, simplify render pipeline to Celery-only
- Remove flamenco-manager and flamenco-worker from docker-compose.yml
- Delete flamenco_client.py, flamenco_tasks.py, docker_scaler.py
- Simplify render_dispatcher.py to Celery-only (removes ~300 lines)
- Remove Flamenco beat schedule from celery_app.py
- Clean admin.py: remove flamenco settings, endpoints, threejs validation
- Clean orders.py cancel-render: Celery revoke only
- Clean worker.py: remove flamenco_job_id from activity response
- Migration 032: cancel lingering flamenco jobs, remove flamenco settings
- PLAN.md: mark all decisions confirmed, status IN UMSETZUNG

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:38:37 +01:00
Hartmut bce762a783 feat: initial commit 2026-03-05 22:12:38 +01:00