Extract volume, surface area, part count, assembly hierarchy, and complexity from STEP files via OCC B-rep analysis.

Backend:
- extract_rich_metadata() in step_processor.py: computes per-part volume (BRepGProp), surface area, triangle/vertex count, assembly depth, instance count, complexity score, largest part identification
- cad_metadata JSONB column on Product model (DB migration)
- Auto-populated during STEP processing (non-fatal, 10s timeout)
- Also stored in cad_files.mesh_attributes["rich_metadata"]
- Batch re-extract endpoint: POST /admin/settings/reextract-rich-metadata

AI Agent:
- search_products returns part_count, volume_cm3, complexity, largest_part
- query_database tool description documents cad_metadata schema

Frontend:
- ProductDetail page: CAD Metadata section with stat cards (parts, volume, surface area, complexity, triangles, assembly depth)
- Admin System Tools: "Re-extract Rich Metadata" button for backfill

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plan: Rich Product Metadata Extraction from STEP Files
Context
The AI chat agent was asked "What is the biggest product from my order?" and couldn't answer because dimensional data wasn't available in tool results. While cad_files.mesh_attributes already stores bounding box dimensions, much more metadata is extractable from STEP files via OCC that would make the AI agent and the product library significantly more useful.
Currently extracted: part names, bounding box (xyz), sharp edges, smooth angle
Available but not extracted: per-part volume, surface area, assembly hierarchy, instance counts, embedded colors, triangle counts, geometric complexity
Goal: Expand the STEP metadata extraction to compute richer product characteristics and store them in a structured cad_metadata JSONB field, accessible to the AI agent, product search, and frontend.
Affected Files
| File | Change |
|---|---|
| backend/app/services/step_processor.py | Expand extract_step_metadata() with volume, surface area, hierarchy, complexity |
| backend/app/domains/products/models.py | Add cad_metadata JSONB column to Product |
| backend/alembic/versions/XXX_add_cad_metadata.py | Migration |
| backend/app/domains/pipeline/tasks/extract_metadata.py | Populate cad_metadata after STEP processing |
| backend/app/domains/products/schemas.py | Expose cad_metadata in ProductOut |
| backend/app/services/chat_service.py | Include metadata in search_products and system prompt |
| frontend/src/pages/ProductDetail.tsx | Display rich metadata (volume, part count, complexity) |
Tasks (in order)
[ ] Task 1: Expand STEP metadata extraction
- File: backend/app/services/step_processor.py
- What: Expand extract_step_metadata() to compute additional properties after the existing bbox/edge extraction. Add a new function extract_rich_metadata(doc, shape_tool) that returns:

      {
          "part_count": 42,                    # Number of leaf parts
          "assembly_depth": 3,                 # Max nesting depth
          "total_volume_cm3": 1250.4,          # Sum of all part volumes (cm³)
          "total_surface_area_cm2": 3400.2,    # Sum of all surface areas (cm²)
          "total_triangle_count": 45000,       # After tessellation
          "total_vertex_count": 23000,         # After tessellation
          "largest_part": {                    # Part with largest volume
              "name": "OuterRing",
              "volume_cm3": 450.2,
          },
          "smallest_dimension_mm": 0.5,        # Smallest bbox dimension across all parts
          "instance_count": 36,                # Total instances (parts may repeat)
          "unique_part_count": 12,             # Distinct shapes
          "complexity_score": "high",          # low/medium/high based on triangle count
      }

  Use OCC:
  - GProp_GProps + BRepGProp.VolumeProperties() for volume
  - BRepGProp.SurfaceProperties() for surface area
  - Poly_Triangulation for triangle/vertex counts (after tessellation)
  - Assembly tree walk (already done in _collect_part_key_map) for hierarchy depth + instance count
- Acceptance gate: extract_rich_metadata() returns all fields for a test STEP file
- Dependencies: None
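The aggregation step of Task 1 can be sketched independently of OCC. The example below is a minimal illustration, not the project's implementation: `PartMeasurement`, `summarize`, and `complexity_from_triangles` are hypothetical names, the complexity thresholds are placeholders, and it assumes the STEP model is in millimetres (so mm³ is divided by 1000 to get cm³ and mm² by 100 to get cm²). It covers only the fields that follow directly from per-instance measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartMeasurement:
    """Hypothetical record for one part instance found by the assembly walk."""
    name: str
    shape_key: str     # shared by all instances of the same underlying shape
    depth: int         # nesting level of this instance in the assembly tree
    volume_mm3: float  # B-rep volume, assumed to be in mm^3
    area_mm2: float    # B-rep surface area, assumed to be in mm^2
    triangles: int
    vertices: int


def complexity_from_triangles(count: int) -> str:
    """Bucket a triangle count; both thresholds are placeholders."""
    if count < 10_000:
        return "low"
    if count < 100_000:
        return "medium"
    return "high"


def summarize(parts: list[PartMeasurement]) -> dict:
    """Fold per-instance measurements into cad_metadata-style totals."""
    total_triangles = sum(p.triangles for p in parts)
    largest = max(parts, key=lambda p: p.volume_mm3, default=None)
    return {
        "assembly_depth": max((p.depth for p in parts), default=0),
        # 1 cm^3 = 1000 mm^3 and 1 cm^2 = 100 mm^2
        "total_volume_cm3": round(sum(p.volume_mm3 for p in parts) / 1000.0, 1),
        "total_surface_area_cm2": round(sum(p.area_mm2 for p in parts) / 100.0, 1),
        "total_triangle_count": total_triangles,
        "total_vertex_count": sum(p.vertices for p in parts),
        "largest_part": None if largest is None else {
            "name": largest.name,
            "volume_cm3": round(largest.volume_mm3 / 1000.0, 1),
        },
        "instance_count": len(parts),
        "unique_part_count": len({p.shape_key for p in parts}),
        "complexity_score": complexity_from_triangles(total_triangles),
    }
```

Keeping the aggregation free of OCC types means it can be unit-tested without a STEP file; the OCC-specific code then only has to produce the per-instance records.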
[ ] Task 2: Add cad_metadata column to Product model
- File: backend/app/domains/products/models.py
- What: Add cad_metadata: Mapped[dict | None] = mapped_column(JSONB, nullable=True, default=None) to the Product model. This stores the rich metadata at the product level (not cad_file) because products are the user-facing entity.
- Migration: alembic revision --autogenerate -m "add cad_metadata to products"
- Also: Add to ProductOut schema in backend/app/domains/products/schemas.py
- Acceptance gate: Column exists, schema includes it
- Dependencies: None
[ ] Task 3: Populate cad_metadata during STEP processing
- File: backend/app/domains/pipeline/tasks/extract_metadata.py
- What: After process_step_file extracts objects and queues thumbnail, call extract_rich_metadata() and store the result on the Product's cad_metadata field. Also store it on cad_files.mesh_attributes (merge with existing data).
- Also: Add a "reextract metadata" admin action that re-runs this for all existing products
- Acceptance gate: After STEP processing, product.cad_metadata is populated with volume, part_count, etc.
- Dependencies: Tasks 1, 2
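One way to keep the extraction non-fatal and time-bounded while merging into the existing mesh_attributes is sketched below. `extract_non_fatal` and `merge_rich_metadata` are hypothetical helper names, and the 10-second default is taken from the commit message rather than from this task description. Note that the timeout only stops waiting for the result; a Python thread cannot be killed, so a truly stuck extraction would need a subprocess or a task-level time limit instead.

```python
from __future__ import annotations

import logging
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

logger = logging.getLogger(__name__)


def extract_non_fatal(extract: Callable[[], dict], timeout_s: float = 10.0) -> dict | None:
    """Run extract() and return its result, or None on error or timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(extract)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        logger.warning("rich metadata extraction exceeded %.1fs, skipping", timeout_s)
        return None
    except Exception:
        # Any extraction failure is logged and swallowed so that the
        # surrounding STEP processing can still complete.
        logger.exception("rich metadata extraction failed, skipping")
        return None
    finally:
        # Do not block on a worker that is still running after a timeout.
        pool.shutdown(wait=False)


def merge_rich_metadata(mesh_attributes: dict | None, rich: dict | None) -> dict:
    """Return a copy of mesh_attributes with rich metadata under one key."""
    merged = dict(mesh_attributes or {})
    if rich is not None:
        merged["rich_metadata"] = rich
    return merged
```

Returning a new dict instead of mutating in place is deliberate: a plain SQLAlchemy JSON/JSONB column does not detect in-place changes unless it is wrapped in a mutable type, so assigning a fresh dict is the simpler way to make sure the update is persisted.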
[ ] Task 4: Expose metadata in AI agent tools
- File: backend/app/services/chat_service.py
- What:
  - Update _tool_search_products() to include cad_metadata fields (part_count, total_volume_cm3, complexity_score) in results
  - Update query_database tool description to mention products.cad_metadata JSONB field
  - Update system prompt to mention available metadata
- Acceptance gate: AI agent can answer "What is the biggest product?" using volume data
- Dependencies: Task 3
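As a rough sketch of what the tool result could look like, the example below flattens cad_metadata into a search row and picks the largest product by volume. The output keys (part_count, volume_cm3, complexity, largest_part) follow the commit message; the function names and the product dict shape with "id" and "name" are assumptions made for illustration.

```python
from __future__ import annotations


def search_result_row(product: dict) -> dict:
    """Flatten a product plus its cad_metadata into one tool-result row."""
    meta = product.get("cad_metadata") or {}
    largest = meta.get("largest_part") or {}
    return {
        "id": product["id"],
        "name": product["name"],
        # Every metadata field is None for products that were never extracted.
        "part_count": meta.get("part_count"),
        "volume_cm3": meta.get("total_volume_cm3"),
        "complexity": meta.get("complexity_score"),
        "largest_part": largest.get("name"),
    }


def biggest_by_volume(rows: list[dict]) -> dict | None:
    """Return the row with the largest volume, ignoring rows without one."""
    with_volume = [r for r in rows if r["volume_cm3"] is not None]
    return max(with_volume, key=lambda r: r["volume_cm3"], default=None)
```

Products without metadata are excluded from the comparison rather than treated as zero volume, so an unprocessed product cannot silently lose a "biggest" query to a tiny one.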
[ ] Task 5: Display rich metadata on ProductDetail page
- File: frontend/src/pages/ProductDetail.tsx
- What: Add a "CAD Metadata" section on the product detail page showing:
  - Part count + unique parts + instances
  - Total volume (cm³) + surface area (cm²)
  - Largest part name + volume
  - Complexity score badge (low/medium/high)
  - Triangle/vertex count
  - Assembly depth
- Acceptance gate: Metadata displayed on product page; empty gracefully when not available
- Dependencies: Task 2
[ ] Task 6: Batch re-extract metadata for existing products
- File: backend/app/api/routers/admin.py
- What: Add a "Re-extract Rich Metadata" button in System Tools that queues a Celery task to re-process all completed STEP files and populate cad_metadata for all products.
- Acceptance gate: Button triggers batch job; existing products get metadata populated
- Dependencies: Tasks 1, 3
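The body of the batch job could be structured so that one bad STEP file does not abort the whole backfill. The sketch below leaves out Celery and the database entirely: `reextract_all` and the `reextract_one` callback are hypothetical, and the callback is assumed to return True when it stored metadata and False when it had nothing to process.

```python
from __future__ import annotations

import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def reextract_all(
    product_ids: Iterable[int],
    reextract_one: Callable[[int], bool],
) -> dict:
    """Re-run extraction per product and report what happened to each."""
    summary: dict = {"updated": 0, "skipped": 0, "failed": []}
    for product_id in product_ids:
        try:
            if reextract_one(product_id):
                summary["updated"] += 1
            else:
                summary["skipped"] += 1
        except Exception:
            # Isolate failures so the remaining products still get processed.
            logger.exception("re-extraction failed for product %s", product_id)
            summary["failed"].append(product_id)
    return summary
```

Returning a summary gives the admin endpoint something concrete to show after the button is pressed, and the list of failed IDs makes a targeted retry possible.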
Migration Check
Yes — one new JSONB column on products table.
Order Recommendation
- Task 1 (extraction logic) + Task 2 (model + migration) — parallel
- Task 3 (wire up in pipeline)
- Task 4 (AI agent) + Task 5 (frontend) — parallel
- Task 6 (batch re-extract)
Risks / Open Questions
- Volume calculation accuracy: OCC BRepGProp computes exact B-rep volume, not mesh-based. This is accurate but can be slow for very complex shapes. Cap at 5s per file.
- Performance: Rich metadata extraction adds ~100-500ms per STEP file. This is acceptable since STEP processing already takes 1-5s.
- Existing products: ~45 products with STEP files need backfill. Task 6 handles this.
- Triangle count varies: It depends on the tessellation settings (linear and angular deflection). Store the count at the current tessellation quality for reference, with a note that it's approximate.