cfccdd5397
Extract volume, surface area, part count, assembly hierarchy, and complexity from STEP files via OCC B-rep analysis.

Backend:
- extract_rich_metadata() in step_processor.py: computes per-part volume (BRepGProp), surface area, triangle/vertex count, assembly depth, instance count, complexity score, largest part identification
- cad_metadata JSONB column on Product model (DB migration)
- Auto-populated during STEP processing (non-fatal, 10s timeout)
- Also stored in cad_files.mesh_attributes["rich_metadata"]
- Batch re-extract endpoint: POST /admin/settings/reextract-rich-metadata

AI Agent:
- search_products returns part_count, volume_cm3, complexity, largest_part
- query_database tool description documents cad_metadata schema

Frontend:
- ProductDetail page: CAD Metadata section with stat cards (parts, volume, surface area, complexity, triangles, assembly depth)
- Admin System Tools: "Re-extract Rich Metadata" button for backfill

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Plan: Rich Product Metadata Extraction from STEP Files
## Context

The AI chat agent was asked "What is the biggest product from my order?" and could not answer because no dimensional data was available in its tool results. `cad_files.mesh_attributes` already stores bounding-box dimensions, but OCC can extract much more metadata from STEP files. That extra metadata would make both the AI agent and the product library significantly more useful.

**Currently extracted**: part names, bounding box (x/y/z), sharp edges, smooth angle

**Available but not extracted**: per-part volume, surface area, assembly hierarchy, instance counts, embedded colors, triangle counts, geometric complexity

**Goal**: Expand STEP metadata extraction to compute richer product characteristics. Store them in a structured `cad_metadata` JSONB field that the AI agent, product search, and frontend can all read.
## Affected Files

| File | Change |
|------|--------|
| `backend/app/services/step_processor.py` | Expand `extract_step_metadata()` and add `extract_rich_metadata()` for volume, surface area, hierarchy, complexity |
| `backend/app/domains/products/models.py` | Add `cad_metadata` JSONB column to `Product` |
| `backend/alembic/versions/XXX_add_cad_metadata.py` | Migration adding the column |
| `backend/app/domains/pipeline/tasks/extract_metadata.py` | Populate `cad_metadata` after STEP processing |
| `backend/app/domains/products/schemas.py` | Expose `cad_metadata` in `ProductOut` |
| `backend/app/services/chat_service.py` | Include metadata in `search_products` results and the system prompt |
| `frontend/src/pages/ProductDetail.tsx` | Display rich metadata (volume, part count, complexity) |
| `backend/app/api/routers/admin.py` | Batch re-extract action for existing products (Task 6) |
## Tasks (in order)
### [ ] Task 1: Expand STEP metadata extraction

- **File**: `backend/app/services/step_processor.py`
- **What**: Extend `extract_step_metadata()` to compute additional properties after the existing bbox/edge extraction. Add a new function `extract_rich_metadata(doc, shape_tool)` that returns:

  ```python
  {
      "part_count": 42,                  # Number of leaf parts
      "assembly_depth": 3,               # Max nesting depth
      "total_volume_cm3": 1250.4,        # Sum of all part volumes (cm³)
      "total_surface_area_cm2": 3400.2,  # Sum of all surface areas (cm²)
      "total_triangle_count": 45000,     # After tessellation
      "total_vertex_count": 23000,       # After tessellation
      "largest_part": {                  # Part with the largest volume
          "name": "OuterRing",
          "volume_cm3": 450.2,
      },
      "smallest_dimension_mm": 0.5,      # Smallest bbox dimension across all parts
      "instance_count": 36,              # Total instances (parts may repeat)
      "unique_part_count": 12,           # Distinct shapes
      "complexity_score": "high",        # low/medium/high based on triangle count
  }
  ```

  Use OCC:

  - `GProp_GProps` + `BRepGProp.VolumeProperties()` for volume (read via `Mass()` on the resulting properties)
  - `BRepGProp.SurfaceProperties()` for surface area
  - `BRepMesh_IncrementalMesh` to tessellate, then each face's `Poly_Triangulation` (via `BRep_Tool.Triangulation()`, counted with `NbTriangles()` / `NbNodes()`) for triangle/vertex counts
  - The assembly tree walk already done in `_collect_part_key_map` for hierarchy depth and instance count
- **Acceptance gate**: `extract_rich_metadata()` returns all fields for a test STEP file
- **Dependencies**: None
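The aggregation half of `extract_rich_metadata()` can be kept separate from the OCC calls and sketched in plain Python. This is a minimal sketch, not the planned implementation, and it rests on several assumptions. `Node` is a hypothetical stand-in for an assembly label whose per-part numbers were already measured with the OCC calls listed in Task 1. Lengths are assumed to be in millimetres. The complexity thresholds are placeholders because the plan does not fix them. The sketch computes every field except `part_count`, since the plan does not say whether that counts leaf occurrences or distinct leaf shapes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical cut-offs: the plan only says "based on triangle count".
LOW_MAX_TRIANGLES = 10_000
MEDIUM_MAX_TRIANGLES = 100_000


@dataclass
class Node:
    """Hypothetical stand-in for one label in the STEP assembly tree (lengths in mm)."""
    name: str
    shape_id: str | None = None      # repeated instances of one part share an id
    volume_mm3: float = 0.0          # measured with BRepGProp volume properties
    area_mm2: float = 0.0            # measured with BRepGProp surface properties
    triangles: int = 0               # summed from the faces' Poly_Triangulation
    vertices: int = 0
    bbox_mm: tuple[float, float, float] = (0.0, 0.0, 0.0)
    children: list[Node] = field(default_factory=list)


def iter_leaves(node: Node, depth: int = 0):
    """Yield (leaf, depth) for every leaf occurrence, depth-first."""
    if not node.children:
        yield node, depth
        return
    for child in node.children:
        yield from iter_leaves(child, depth + 1)


def complexity(triangles: int) -> str:
    """Map a total triangle count to the low/medium/high score."""
    if triangles <= LOW_MAX_TRIANGLES:
        return "low"
    if triangles <= MEDIUM_MAX_TRIANGLES:
        return "medium"
    return "high"


def summarize(root: Node) -> dict:
    """Aggregate per-leaf measurements into the cad_metadata fields."""
    leaves = list(iter_leaves(root))
    parts = [leaf for leaf, _ in leaves]          # every occurrence counts
    largest = max(parts, key=lambda p: p.volume_mm3)
    triangles = sum(p.triangles for p in parts)
    return {
        "assembly_depth": max(depth for _, depth in leaves),
        "instance_count": len(parts),
        "unique_part_count": len({p.shape_id or p.name for p in parts}),
        "total_volume_cm3": round(sum(p.volume_mm3 for p in parts) / 1000, 1),    # mm³ -> cm³
        "total_surface_area_cm2": round(sum(p.area_mm2 for p in parts) / 100, 1),  # mm² -> cm²
        "total_triangle_count": triangles,
        "total_vertex_count": sum(p.vertices for p in parts),
        "largest_part": {"name": largest.name, "volume_cm3": round(largest.volume_mm3 / 1000, 1)},
        "smallest_dimension_mm": min(min(p.bbox_mm) for p in parts),
        "complexity_score": complexity(triangles),
    }
```

Keeping the aggregation free of OCC types makes it unit-testable without a STEP file. The OCC-specific code then only has to fill in each leaf's measurements.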
### [ ] Task 2: Add `cad_metadata` column to Product model

- **File**: `backend/app/domains/products/models.py`
- **What**: Add `cad_metadata: Mapped[dict | None] = mapped_column(JSONB, nullable=True, default=None)` to the Product model. The rich metadata lives on the product (not on the cad_file) because products are the user-facing entity.
- **Migration**: `alembic revision --autogenerate -m "add cad_metadata to products"`
- **Also**: Add `cad_metadata` to the `ProductOut` schema in `backend/app/domains/products/schemas.py`
- **Acceptance gate**: Column exists, schema includes it
- **Dependencies**: None
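`cad_metadata` is schemaless JSONB, so it can help to also write its expected shape down in Python. The following is a minimal sketch under one assumption: a `TypedDict` that mirrors the fields listed in Task 1. The names `CadMetadata`, `LargestPart`, and `unknown_keys` are hypothetical and do not come from the plan. The sketch uses `total=False` because extraction is non-fatal, so a row may lack some or all keys.

```python
from __future__ import annotations

from typing import TypedDict


class LargestPart(TypedDict):
    name: str
    volume_cm3: float


class CadMetadata(TypedDict, total=False):
    """Expected shape of products.cad_metadata; every key is optional."""
    part_count: int
    assembly_depth: int
    total_volume_cm3: float
    total_surface_area_cm2: float
    total_triangle_count: int
    total_vertex_count: int
    largest_part: LargestPart
    smallest_dimension_mm: float
    instance_count: int
    unique_part_count: int
    complexity_score: str  # "low" | "medium" | "high"


def unknown_keys(payload: dict) -> set[str]:
    """Return keys in a stored payload that the documented shape does not list."""
    return set(payload) - set(CadMetadata.__annotations__)
```

A check like `unknown_keys()` can catch drift between what the pipeline writes and what the schema and the `query_database` tool description promise.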
### [ ] Task 3: Populate `cad_metadata` during STEP processing

- **File**: `backend/app/domains/pipeline/tasks/extract_metadata.py`
- **What**: After `process_step_file` extracts objects and queues the thumbnail, call `extract_rich_metadata()` and store the result in the product's `cad_metadata` field. Also store it in `cad_files.mesh_attributes`, merging with the existing data rather than overwriting it.
- **Also**: Add a "reextract metadata" admin action that re-runs this for all existing products (see Task 6)
- **Acceptance gate**: After STEP processing, `product.cad_metadata` is populated with volume, part_count, etc.
- **Dependencies**: Tasks 1, 2
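The non-fatal call and the merge into `mesh_attributes` can be sketched with the standard library alone. This sketch makes four assumptions:

- `extract_with_timeout` and `merge_mesh_attributes` are hypothetical helper names.
- The caller passes a zero-argument callable that wraps `extract_rich_metadata(doc, shape_tool)`.
- The 5 s default follows the per-file cap in the risks section.
- The result is stored under the `rich_metadata` key named in the commit message.

```python
from __future__ import annotations

import concurrent.futures
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def extract_with_timeout(extract: Callable[[], dict], timeout_s: float = 5.0) -> dict | None:
    """Run extract() and return its result, or None if it raises or times out."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(extract)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        logger.warning("rich metadata extraction timed out after %.1fs", timeout_s)
        return None
    except Exception:
        # Non-fatal: STEP processing continues without rich metadata.
        logger.exception("rich metadata extraction failed")
        return None
    finally:
        # Do not block on a timed-out worker; it finishes in the background.
        pool.shutdown(wait=False)


def merge_mesh_attributes(existing: dict | None, rich: dict) -> dict:
    """Return a new mesh_attributes dict with rich metadata under one key,
    leaving existing keys (e.g. the bounding box) untouched."""
    merged = dict(existing or {})
    merged["rich_metadata"] = rich
    return merged
```

A thread cannot be cancelled, so on timeout the extraction keeps running and only the caller stops waiting. If OCC holds the GIL during long computations, the timeout can also fire late. A subprocess, or a dedicated Celery task with `soft_time_limit`, is the more robust option. Returning a new dict instead of mutating `existing` also matters with SQLAlchemy, because in-place changes to a plain JSONB attribute are not detected unless the column uses `MutableDict`.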
### [ ] Task 4: Expose metadata in AI agent tools

- **File**: `backend/app/services/chat_service.py`
- **What**:
  1. Update `_tool_search_products()` to include `cad_metadata` fields (`part_count`, `total_volume_cm3`, `complexity_score`) in its results
  2. Update the `query_database` tool description to mention the `products.cad_metadata` JSONB field
  3. Update the system prompt to mention the available metadata
- **Acceptance gate**: The AI agent can answer "What is the biggest product?" using volume data
- **Dependencies**: Task 3
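The sketch below covers step 1 and the acceptance question. It assumes search results are plain dicts and that each product exposes a `name` and its `cad_metadata`. `product_result`, `biggest_by_volume`, and `BIGGEST_PRODUCT_SQL` are hypothetical names. The SQL string shows the kind of PostgreSQL JSONB query the `query_database` description should make possible. The `products.name` column is also an assumption.

```python
from __future__ import annotations

from typing import Any

# Fields copied from products.cad_metadata into each search result (step 1).
SEARCH_RESULT_KEYS = ("part_count", "total_volume_cm3", "complexity_score")

# Ad-hoc query the agent could send via query_database (assumes a "name" column).
BIGGEST_PRODUCT_SQL = """
SELECT name, (cad_metadata->>'total_volume_cm3')::float AS volume_cm3
FROM products
WHERE cad_metadata->>'total_volume_cm3' IS NOT NULL
ORDER BY volume_cm3 DESC
LIMIT 1
"""


def product_result(name: str, cad_metadata: dict[str, Any] | None) -> dict[str, Any]:
    """Build one compact search_products entry; missing fields become None."""
    meta = cad_metadata or {}
    result: dict[str, Any] = {"name": name}
    for key in SEARCH_RESULT_KEYS:
        result[key] = meta.get(key)
    return result


def biggest_by_volume(results: list[dict[str, Any]]) -> dict[str, Any] | None:
    """Return the entry with the largest known volume, skipping unmeasured products."""
    known = [r for r in results if r.get("total_volume_cm3") is not None]
    return max(known, key=lambda r: r["total_volume_cm3"], default=None)
```

Products without metadata are left out of the ranking rather than treated as zero volume. "Biggest" by volume can also differ from "biggest" by outer size. The bounding-box dimensions already stored in `mesh_attributes` stay available for that reading of the question.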
### [ ] Task 5: Display rich metadata on ProductDetail page

- **File**: `frontend/src/pages/ProductDetail.tsx`
- **What**: Add a "CAD Metadata" section to the product detail page showing:
  - Part count, unique parts, and instances
  - Total volume (cm³) and surface area (cm²)
  - Largest part name and volume
  - Complexity score badge (low/medium/high)
  - Triangle/vertex count
  - Assembly depth
- **Acceptance gate**: Metadata is displayed on the product page; the section renders a graceful empty state when metadata is not available
- **Dependencies**: Task 2
### [ ] Task 6: Batch re-extract metadata for existing products

- **File**: `backend/app/api/routers/admin.py`
- **What**: Add an admin action that queues a Celery task to re-process all completed STEP files and populate `cad_metadata` for every product. A "Re-extract Rich Metadata" button in System Tools triggers it.
- **Acceptance gate**: The button triggers the batch job; existing products get metadata populated
- **Dependencies**: Tasks 1, 3
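The body of the backfill task comes down to a loop that isolates failures per product. In this sketch, `extract` and `save` are hypothetical callables. `extract` stands in for reloading the stored STEP file and running `extract_rich_metadata()`. `save` stands in for writing `products.cad_metadata` (and `cad_files.mesh_attributes`). `BackfillReport` and `backfill` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class BackfillReport:
    updated: int = 0
    skipped: int = 0                                  # extraction returned None (e.g. timeout)
    failed: list[str] = field(default_factory=list)   # ids whose extraction or save raised


def backfill(
    product_ids: Iterable[str],
    extract: Callable[[str], dict | None],
    save: Callable[[str, dict], None],
) -> BackfillReport:
    """Re-extract metadata product by product; one bad file never aborts the batch."""
    report = BackfillReport()
    for pid in product_ids:
        try:
            meta = extract(pid)
            if meta is None:
                report.skipped += 1
                continue
            save(pid, meta)
            report.updated += 1
        except Exception:
            report.failed.append(pid)
    return report
```

For about 45 products, one sequential task is enough. For a larger library, fanning out one task per product with Celery's `group` would parallelise the work and let failures be retried individually.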
## Migration Check

**Yes**: one new JSONB column on the `products` table.
## Order Recommendation

1. Task 1 (extraction logic) and Task 2 (model + migration), in parallel
2. Task 3 (wire up in pipeline)
3. Task 4 (AI agent) and Task 5 (frontend), in parallel
4. Task 6 (batch re-extract)
## Risks / Open Questions

1. **Volume calculation accuracy**: OCC `BRepGProp` computes volume from the exact B-rep geometry, not from the tessellated mesh. This is accurate but can be slow for very complex shapes, so cap the extraction at 5 s per file.
2. **Performance**: Rich metadata extraction adds roughly 100-500 ms per STEP file. This is acceptable because STEP processing already takes 1-5 s.
3. **Existing products**: About 45 products with STEP files need a backfill; Task 6 handles this.
4. **Triangle count varies**: The count depends on the tessellation settings (linear and angular deflection). Store the count produced at the current tessellation quality for reference, and note that it is approximate.