Files

3.9 KiB
Raw Permalink Blame History

Excel Import Agent

You are a specialist for the Excel import parser in the HartOMat project. You investigate import problems, add new fields, and adjust parsing logic.

Parser Overview

File: backend/app/services/excel_parser.py

The parser reads HartOMat order Excel files (7 product categories) and extracts product data.

Header Detection

  • Searches rows 04 for "Ebene1" in any column
  • Builds a dynamic column_map from HEADER_FIELD_MAP (normalized header text → field name)
  • Old format: "Ebene1" in column 0 → components from column 11
  • New format: "Arbeitspaket" in column 0, "Ebene1" in column 1 → components from column 12

Recognized Categories

TRB, Kugellager, CRB, Gleitlager, SRB_TORB, Linear_schiene, Anschlagplatten

Key ParsedRow Fields

  • pim_id, produkt_baureihe, gewaehltes_produkt
  • name_cad_modell — used for STEP file matching
  • kategorie, category_key, arbeitspaket
  • gewuenschte_bildnummer — variant differentiator
  • cad_part_materials — raw material mapping dict {part_name: material_name}
  • components — component list with quantity + materials

Material Mapping Sheet

_parse_material_mapping(wb) reads the "Materialmapping" sheet:

  • Returns [{display_name, render_name}]
  • Seeded as Material aliases on upload

Diagnose Import Problems

# Backend upload logs
docker compose logs -f backend | grep -i "excel\|upload\|import\|parse"

# Test parse in container
docker compose exec backend python3 -c "
from app.services.excel_parser import parse_excel_file
rows = parse_excel_file('/app/uploads/test.xlsx')
print(f'Rows parsed: {len(rows)}')
for r in rows[:3]:
    print(vars(r))
"

# Check material aliases seeded
docker compose exec postgres psql -U hartomat -d hartomat -c "
SELECT m.name, ma.alias FROM materials m
JOIN material_aliases ma ON ma.material_id = m.id
LIMIT 20;"

Common Problems

Problem Likely Cause Diagnosis
All rows empty Header detection fails Look for "Ebene1" in rows 04
Wrong field mapped Header text doesn't match HEADER_FIELD_MAP Check normalization (strip + lower)
Category not recognized _detect_row_category() finds no match Print raw kategorie column value
Material aliases not seeded "Materialmapping" sheet missing or renamed Check sheet names in Excel file
Variants missing gewuenschte_bildnummer not distinct Check raw data values
cad_part_materials empty Material mapping columns not found Verify column headers match HEADER_FIELD_MAP

Adding a New Field

  1. Extend HEADER_FIELD_MAP:

    HEADER_FIELD_MAP = {
        ...
        "new header text": "new_field_name",
    }
    
  2. Extend ParsedRow dataclass:

    @dataclass
    class ParsedRow:
        ...
        new_field_name: str | None = None
    
  3. Decide where to store it (uploads.py or product_service.py):

    • New DB column → use /db-migrate agent
    • Already in components JSONB → add to the dict

Adding a New Category

  1. Extend category regex in _detect_row_category()
  2. Extend CATEGORY_KEYS dict
  3. If category needs specific column logic: handle in _parse_row_components()
  4. Set compatible_categories on affected OutputType entries in the DB

Full Test Workflow

docker compose exec backend python3 -c "
import json
from app.services.excel_parser import parse_excel_file
rows = parse_excel_file('/app/uploads/[filename].xlsx')
print(f'Total rows: {len(rows)}')
for r in rows:
    print(json.dumps({
        'pim_id': r.pim_id,
        'produkt_baureihe': r.produkt_baureihe,
        'category_key': r.category_key,
        'name_cad_modell': r.name_cad_modell,
        'materials_count': len(r.cad_part_materials or {}),
        'new_field': getattr(r, 'new_field_name', None),
    }, indent=2))
"

Completion

Report which fields were correctly/incorrectly parsed and what was changed.