# Dispo Import Implementation
**Date:** 2026-03-14
**Purpose:** Canonical implementation document for replacing the current Planarchy planning dataset with a clean-slate import from the Dispo v2 Excel workbooks.
## Scope
This document defines how Planarchy should ingest and normalize the following source workbooks:
- `/samples/Dispov2/MandatoryDispoCategories_V3.xlsx`
- `/samples/Dispov2/DISPO_2026.xlsx`
- `/samples/Dispov2/20260309_Bi-Weekly_Chargeability_Reporting_Content_Production_V0.943_4Hartmut.xlsx`
- `/samples/Dispov2/MV_DispoRoster.xlsx`
- `/samples/Dispov2/Resource Roster_MASTER_FY26_CJ_20251201.xlsx`
The goal is not a raw workbook archive. The goal is a normalized Planarchy dataset that:
- wipes existing database data and starts from a clean baseline
- imports canonical reference data first
- stages operational Dispo data before commit
- creates real projects, assignments, vacations, availability rules, and reporting inputs
- does not create fake bookings for unassigned time
- handles public holidays through the vacation planner
## Agreed Business Rules
These rules are fixed unless superseded by a later decision record:
- There is only one canonical person identifier. `EID` and `Enterprise ID` are the same source identity.
- Existing database data should be wiped completely before the new import.
- `[BMW]` is the client token.
- `[11035763]` is the stable WBS/project key.
- `{CH80}` means utilization category `CH` and `winProbability = 80`.
- `_HB` and `_SB` suffixes can be ignored.
- `TBD` means unassigned time. It must not create a fake project or fake booking.
- `[tbd]` means unresolved project identity and should remain staged until resolved.
- `2D` and `3D` map to chapter `Digital Content Production`.
- `PM` maps to chapter `Project Management`.
- `AD` maps to chapter `Art Direction`.
- People booked on projects need the corresponding project role assigned on the booking.
- Internal work categories should be mapped as normalized internal project/utilization buckets.
- Public holidays should be modeled properly in the vacation planner.
- Project start and end dates should be inferred from earliest and latest imported assignment dates.
- Assignment granularity should normalize 50% to 4 hours and 100% to 8 hours.
- Part-time import must reduce weekday availability on the days the person is not available.
## Source Workbook Roles
### 1. `MandatoryDispoCategories_V3.xlsx`
Use as the source of:
- reference/master-data vocabulary
- client and project attribute glossary
- calendar and SAH rules
- validation guidance for resource attributes
Do not treat it as the primary source of transactional planning rows.
### 2. `DISPO_2026.xlsx`
Use as the primary source of:
- operational planning matrix
- project bookings
- internal work bookings
- absences
- public-holiday source hints
- part-time hints
- unassigned capacity
This workbook requires a parser. It is not directly importable as row-based CRUD input.
### 3. `Bi-Weekly Chargeability Reporting...xlsx`
Use as the source of:
- target and forecast reconciliation
- resource enrichment when missing elsewhere
- aggregate validation after commit
Do not treat PTD/MTD/YTD outputs as canonical source-of-truth records when Planarchy can derive them from normalized data.
### 4. `MV_DispoRoster.xlsx`
Use as the source of:
- operational resource master rows keyed by `EID`
- `SAP_data` enrichment for real display names and email addresses
- chapter, department, client-unit, FTE, and activity-window enrichment
- pseudo-demand row filtering before any resource commit
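Do not import `Demand_*` rows as real resources.
### 5. `Resource Roster_MASTER_FY26_CJ_20251201.xlsx`
Use as the source of:
- per-person `LCR` / `UCR` cost rates keyed by `EID`
- management-level average rates for fallback resolution when a person has no exact row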
## Verified Source Constraint
The workbook set now includes both a row-based resource master source and a cost-rate source.
What is actually present:
- `MandatoryDispoCategories_V3.xlsx` `EID-Attr` is a glossary/reference sheet
- `Bi-Weekly Chargeability Reporting...xlsx` provides roster identity, FTE, management group, chapter, metro city, and client unit
- `MV_DispoRoster.xlsx` `DispoRoster` provides operational roster rows and `SAP_data` provides real email/display-name coverage for many resources
- `Resource Roster_MASTER_FY26_CJ_20251201.xlsx` provides per-person `LCR` / `UCR` rows plus management-level averages for fallback resolution
- `DISPO_2026.xlsx` provides transactional planning, vacation, holiday, and part-time signals
Implementation consequence:
- staging can be completed from the current workbook set
- strict source-data blockers are cleared for the supplied workbook set
- resources missing a source email are imported with fallback email `<EID>@accenture.com`
- `demand_*` planning identities are ignored during staging and readiness checks
- unresolved `[tbd]` project rows still remain staged by design and must not be auto-committed as final projects
## Current Implementation Status
Implemented in the application layer:
- reference workbook parser and stager
- chargeability parser and stager
- planning parser and stager
- project resolution staging
- roster parser and stager with `DispoRoster` + `SAP_data` merge logic
- cost-rate parser and roster-rate merge logic with exact and management-level fallback resolution
- readiness assessment over merged resource sources
- batch staging orchestration across all current workbook inputs
Remaining import blocker for a strict production-grade commit:
- unresolved `[tbd]` project references that are intentionally kept out of final project commit
## Target Domain Model
The import commits into the existing planning model:
- `Country`
- `MetroCity`
- `OrgUnit`
- `UtilizationCategory`
- `Client`
- `ManagementLevelGroup`
- `ManagementLevel`
- `Role`
- `Resource`
- `ResourceRole`
- `Project`
- `DemandRequirement`
- `Assignment`
- `Vacation`
- `VacationEntitlement`
Relevant current schema anchors:
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L178)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L235)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L334)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L372)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L460)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L754)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L780)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L815)
## Required Implementation Changes
### 1. Canonical Person Identity
Planarchy currently stores both `eid` and `enterpriseId` on `Resource`. The import should operate on a single canonical identity.
Recommendation:
- choose `enterpriseId` as the canonical external person key
- keep `eid` synchronized to the same value during transition, or remove its operational significance in the import path
- reject imports that produce conflicting person rows for the same canonical identity
This avoids duplicate matching logic throughout staging and commit.
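
The identity rule can be sketched in TypeScript as below. This is a minimal illustration, not the importer's code: `SourcePersonRow`, `groupByCanonicalIdentity`, and the case-insensitive key normalization are assumptions. Rows that share a canonical `enterpriseId` but disagree on a populated field are returned as conflicts, which stay staged instead of being merged.

```typescript
// Hypothetical per-row shape after each workbook has been parsed.
interface SourcePersonRow {
  enterpriseId: string; // EID and Enterprise ID are the same source identity
  source: string; // workbook/sheet label kept for traceability
  displayName?: string;
  email?: string;
}

interface IdentityResult {
  resolved: Map<string, SourcePersonRow[]>;
  conflicts: Map<string, SourcePersonRow[]>;
}

// Assumption: identities compare case-insensitively and ignore surrounding whitespace.
function canonicalKey(raw: string): string {
  return raw.trim().toLowerCase();
}

function groupByCanonicalIdentity(rows: SourcePersonRow[]): IdentityResult {
  const groups = new Map<string, SourcePersonRow[]>();
  for (const row of rows) {
    const key = canonicalKey(row.enterpriseId);
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }

  const resolved = new Map<string, SourcePersonRow[]>();
  const conflicts = new Map<string, SourcePersonRow[]>();
  groups.forEach((bucket, key) => {
    // Two rows conflict when both populate the same field with different values.
    const disagrees = (field: "displayName" | "email"): boolean =>
      new Set(bucket.map((r) => r[field]).filter((v): v is string => Boolean(v))).size > 1;
    const target = disagrees("displayName") || disagrees("email") ? conflicts : resolved;
    target.set(key, bucket);
  });
  return { resolved, conflicts };
}
```
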
### 2. Staging Layer
Add dedicated staging tables or equivalent durable import records. The staging layer is required.
Recommended staging entities:
- `ImportBatch`
- `StagedResource`
- `StagedClient`
- `StagedProject`
- `StagedAssignment`
- `StagedVacation`
- `StagedAvailabilityRule`
- `StagedUnresolvedRecord`
Each staged record should retain:
- source workbook name
- sheet name
- row number
- original raw value
- normalized parsed fields
- parser warnings
- resolution status
The staging layer is the review boundary between workbook parsing and final commit.
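
One way to carry that provenance is a generic envelope around each parsed payload. The sketch below assumes the names `StagedRecord`, `stage`, and the three status values; the document only requires that the listed fields and a resolution status exist.

```typescript
// Hypothetical resolution states.
type ResolutionStatus = "resolved" | "unresolved" | "ignored";

// Provenance every staged record keeps, generic over its normalized payload.
interface StagedRecord<T> {
  batchId: string; // owning ImportBatch
  workbook: string; // source workbook name
  sheet: string; // sheet name
  rowNumber: number; // source row number
  rawValue: string; // original raw value, untouched
  parsed: T | null; // normalized parsed fields, null when parsing failed
  warnings: string[]; // parser warnings
  status: ResolutionStatus;
}

// Runs a parser over the raw value and derives the initial status from its result.
function stage<T>(
  base: Omit<StagedRecord<T>, "parsed" | "warnings" | "status">,
  parse: (raw: string) => { value: T | null; warnings: string[] },
): StagedRecord<T> {
  const { value, warnings } = parse(base.rawValue);
  return { ...base, parsed: value, warnings, status: value === null ? "unresolved" : "resolved" };
}
```
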
### 3. Clean-Slate Reset
Before final import, wipe the existing database contents.
Recommended reset scope:
- business data
- auth/session data
- derived snapshots
- previous import artifacts
After reset, reseed immediately:
- admin user
- access roles and permissions required to sign in
- essential platform defaults
- import reference seed data only when not supplied from the workbooks
Implementation note:
- take a full backup before reset
- make the reset command idempotent so that repeated runs are safe in non-production environments
- require an explicit force flag in scripts
## Two-Step Import Flow
### Step A. Stage
#### Goals
- parse all source files
- normalize tokens
- build deterministic staging rows
- surface conflicts before any planning data is committed
#### Responsibilities
1. ingest workbook files
2. parse reference/master data
3. parse planning matrix cells into typed staging records
4. detect conflicts and unresolved cases
5. generate a review report
#### Stage Output
- staged resources
- staged roles
- staged chapters/org mapping
- staged clients
- staged projects
- staged assignments
- staged vacations/public holidays
- staged part-time availability rules
- unresolved `[tbd]` project references
- unresolved identity conflicts
### Step B. Commit
#### Goals
- write only validated data
- preserve traceability to staging
- block unresolved project identities
#### Responsibilities
1. create reference data
2. create resources and roles
3. create projects and internal buckets
4. create assignments
5. create vacations/public holidays
6. apply availability overrides
7. run reconciliation against chargeability workbook aggregates
#### Commit Rules
- unresolved `[tbd]` project rows must not silently create final projects
- unassigned time must not create assignments
- weekend markers must not create vacation rows
- all project bookings must resolve to a role and utilization category
- all resources must resolve to a canonical person ID
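
The commit rules translate naturally into a guard that returns blocking reasons. The `StagedAssignmentRow` shape and its `kind` values are hypothetical; the sketch encodes the five rules above plus the fact that absences commit as `Vacation`, not `Assignment`.

```typescript
// Hypothetical staged assignment shape after project and identity resolution.
interface StagedAssignmentRow {
  enterpriseId: string | null; // canonical person ID, null if unresolved
  projectKey: string | null; // WBS or internal bucket code; "tbd" when unresolved
  kind: "project" | "internal" | "unassigned" | "absence" | "weekend";
  role: string | null;
  utilizationCategory: string | null;
}

// Returns every reason a row must not become an Assignment; empty means committable.
function commitBlockers(row: StagedAssignmentRow): string[] {
  const reasons: string[] = [];
  if (row.kind === "unassigned") reasons.push("unassigned time never creates assignments");
  if (row.kind === "weekend") reasons.push("weekend markers never create vacation rows");
  if (row.kind === "absence") reasons.push("absences commit as Vacation, not Assignment");
  if (row.projectKey === null || row.projectKey.toLowerCase() === "tbd") {
    reasons.push("unresolved project identity stays staged");
  }
  if (row.enterpriseId === null) reasons.push("no canonical person ID");
  if (row.kind === "project" && (row.role === null || row.utilizationCategory === null)) {
    reasons.push("project booking lacks role or utilization category");
  }
  return reasons;
}
```
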
## Field Mapping
### Reference Data Mapping
| Source | Field/Token | Target | Notes |
| ---------------------------------- | ------------------------ | --------------------------------------------------------------------- | -------------------------------------------- |
| `MandatoryDispoCategories_V3.xlsx` | `Country/Territory` | `Country` | source for country master data and SAH rules |
| `MandatoryDispoCategories_V3.xlsx` | `Metro City` | `MetroCity` | child of country |
| `MandatoryDispoCategories_V3.xlsx` | `Chapter` / org labels | `OrgUnit` and `Resource.chapter` | normalize level mapping during stage |
| `MandatoryDispoCategories_V3.xlsx` | `Management Level Group` | `ManagementLevelGroup` | includes target percentage reference |
| `MandatoryDispoCategories_V3.xlsx` | `Management Level` | `ManagementLevel` | child of management level group |
| `MandatoryDispoCategories_V3.xlsx` | `WBS Master Client` | `Client` parent | master client node |
| `MandatoryDispoCategories_V3.xlsx` | `WBS Client Name` | `Client` child | legal/sub-client node |
| `MandatoryDispoCategories_V3.xlsx` | SAH rules | `Country.dailyWorkingHours`, schedule logic, holiday generation rules | not all rules map 1:1 yet |
### Resource Mapping
| Source | Field | Target | Notes |
| ----------------------------------------- | ------------------------- | ----------------------------------------------------- | ------------------------------------------------------ |
| `EID-Attr`, `ChgFC`, `Dispo` row metadata | `Enterprise ID` / `EID` | canonical resource key | one identity only |
| `ChgFC` | `FTE` | `Resource.fte` | baseline contract capacity |
| `ChgFC` | `Management Level Group` | `Resource.managementLevelGroupId` | reference lookup |
| `EID-Attr` | `Management Level` | `Resource.managementLevelId` | reference lookup |
| `ChgFC` | `Metro City` | `Resource.metroCityId` | resolve via master data |
| `EID-Attr` | `Country/Territory` | `Resource.countryId` | resolve via master data |
| `ChgFC` | `MV Org Unit 1 / Chapter` | `Resource.chapter` / `Resource.orgUnitId` | normalize by agreed mapping |
| `EID-Attr` | `Unit (Client Unit)` | `Resource.clientUnitId` | lookup to client tree if modeled there |
| `EID-Attr` | `Ressource Type` | `Resource.resourceType` | enum mapping required |
| `EID-Attr` | `LCR` | `Resource.lcrCents` | normalize currency to cents |
| `EID-Attr` | `UCR` | `Resource.ucrCents` | normalize currency to cents |
| `ChgFC` | `Target (per Level)` | `Resource.chargeabilityTarget` or group target source | use management group target as primary when consistent |
Rate resolution during staging:
- `Resource Roster_MASTER_FY26_CJ_20251201.xlsx` is the primary cost-rate source.
- Exact `EID`/enterprise matches populate `lcrCents` and `ucrCents` directly.
- If a person is missing in the cost workbook, the importer falls back to same-management-level averages from that workbook.
- Only resources with neither an exact row nor a usable management-level fallback remain readiness blockers for LCR/UCR.
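
A minimal sketch of the exact-then-fallback lookup, assuming rates are already normalized to cents; `CostRateRow` and `resolveRate` are hypothetical names. It computes the fallback as the rounded mean of same-level per-person rows. Since the cost workbook also ships precomputed management-level averages, the real importer may read those directly instead.

```typescript
// Hypothetical row shape for the cost-rate workbook; rates in cents.
interface CostRateRow {
  enterpriseId: string;
  managementLevel: string;
  lcrCents: number;
  ucrCents: number;
}

interface ResolvedRate {
  lcrCents: number | null;
  ucrCents: number | null;
  source: "exact" | "level-average" | "missing"; // "missing" remains a readiness blocker
}

function resolveRate(enterpriseId: string, managementLevel: string | null, rows: CostRateRow[]): ResolvedRate {
  const exact = rows.find((r) => r.enterpriseId === enterpriseId);
  if (exact) return { lcrCents: exact.lcrCents, ucrCents: exact.ucrCents, source: "exact" };

  // Fallback: mean over people at the same management level, rounded to whole cents.
  const peers = rows.filter((r) => managementLevel !== null && r.managementLevel === managementLevel);
  if (peers.length === 0) return { lcrCents: null, ucrCents: null, source: "missing" };
  const avg = (pick: (r: CostRateRow) => number): number =>
    Math.round(peers.reduce((sum, r) => sum + pick(r), 0) / peers.length);
  return { lcrCents: avg((r) => r.lcrCents), ucrCents: avg((r) => r.ucrCents), source: "level-average" };
}
```
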
### Chapter and Role Mapping
Resource chapter mapping:
| Token | Resource chapter |
| ----- | ---------------------------- |
| `2D` | `Digital Content Production` |
| `3D` | `Digital Content Production` |
| `PM` | `Project Management` |
| `AD` | `Art Direction` |
Assignment role mapping:
| Token | Assignment role |
| ----- | ----------------- |
| `2D` | `2D Artist` |
| `3D` | `3D Artist` |
| `PM` | `Project Manager` |
| `AD` | `Art Director` |
Implementation guidance:
- chapter is a resource-organizational dimension
- role is an assignment-level delivery dimension
- assign both when both are known
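
Keeping the two lookups separate in code mirrors the chapter/role split. The constants below restate the two tables above; `mapDiscipline` is a hypothetical helper.

```typescript
type DisciplineToken = "2D" | "3D" | "PM" | "AD";

// Resource-organizational dimension.
const CHAPTER_BY_TOKEN: Record<DisciplineToken, string> = {
  "2D": "Digital Content Production",
  "3D": "Digital Content Production",
  PM: "Project Management",
  AD: "Art Direction",
};

// Assignment-level delivery dimension.
const ROLE_BY_TOKEN: Record<DisciplineToken, string> = {
  "2D": "2D Artist",
  "3D": "3D Artist",
  PM: "Project Manager",
  AD: "Art Director",
};

function isDisciplineToken(value: string): value is DisciplineToken {
  return Object.prototype.hasOwnProperty.call(CHAPTER_BY_TOKEN, value);
}

// Chapter goes on the resource profile; role goes on the assignment.
function mapDiscipline(token: string): { chapter: string; role: string } | null {
  const t = token.trim().toUpperCase();
  if (!isDisciplineToken(t)) return null;
  return { chapter: CHAPTER_BY_TOKEN[t], role: ROLE_BY_TOKEN[t] };
}
```
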
### Project Mapping
| Source token | Target | Notes |
| ------------------------ | -------------------------------------------------------------- | ------------------------------------------ |
| `[BMW]` | `Project.clientId` | resolve to client |
| `[11035763]` | stable WBS key | recommended source for `Project.shortCode` |
| `{CH80}` | `Project.utilizationCategoryId`, `Project.winProbability = 80` | parser splits prefix and numeric suffix |
| project text body | `Project.name` | cleaned of helper suffixes |
| earliest assignment date | `Project.startDate` | inferred |
| latest assignment date | `Project.endDate` | inferred |
| `[tbd]` | unresolved staged project | do not auto-commit as final project |
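
Inferring the project window is a min/max over committed assignment dates. The sketch assumes ISO `YYYY-MM-DD` strings, which sort chronologically as plain strings, and a hypothetical `DatedAssignment` shape.

```typescript
// Hypothetical committed assignment dates as ISO calendar dates.
interface DatedAssignment {
  projectKey: string;
  startDate: string;
  endDate: string;
}

// ISO dates compare correctly as strings, so no Date parsing or time zones are involved.
function inferProjectWindows(rows: DatedAssignment[]): Map<string, { startDate: string; endDate: string }> {
  const windows = new Map<string, { startDate: string; endDate: string }>();
  for (const r of rows) {
    const w = windows.get(r.projectKey);
    if (!w) {
      windows.set(r.projectKey, { startDate: r.startDate, endDate: r.endDate });
    } else {
      if (r.startDate < w.startDate) w.startDate = r.startDate; // earliest start
      if (r.endDate > w.endDate) w.endDate = r.endDate; // latest end
    }
  }
  return windows;
}
```
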
### Utilization and Planning Token Mapping
| Token family | Meaning | Commit target |
| ------------ | ---------------------------------- | --------------------------------------------------------------------- |
| `CH` | chargeable client work | project assignment |
| `MO` | management and operations | internal project/bucket assignment |
| `MD` | market development / initiative | internal project/bucket assignment |
| `PD` | personal development / recruitment | internal project/bucket assignment |
| `AB` | absence | vacation row |
| `NA` | non-bookable / calendar state | usually availability or derived calendar rule, not project assignment |
| `UN` | unassigned | no assignment commit |
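
The token-family table can be expressed as a single classifier. This sketch is illustrative: the `CommitTarget` union is hypothetical, the internal short codes come from decision D5 in the Decision Log, and treating unknown families as `unresolved` (rather than dropping them) is an assumption consistent with the staging-first approach.

```typescript
type CommitTarget =
  | { kind: "project-assignment" }
  | { kind: "internal-assignment"; projectShortCode: string; utilizationCategory: string }
  | { kind: "vacation" }
  | { kind: "calendar" }
  | { kind: "none" }
  | { kind: "unresolved"; family: string };

// Internal bucket codes and categories follow decision D5.
const INTERNAL_BUCKETS: Record<string, { projectShortCode: string; utilizationCategory: string }> = {
  MO: { projectShortCode: "INT-MO", utilizationCategory: "M&O" },
  MD: { projectShortCode: "INT-MD", utilizationCategory: "MD&I" },
  PD: { projectShortCode: "INT-PD", utilizationCategory: "PD&R" },
};

function commitTargetFor(tokenFamily: string): CommitTarget {
  const family = tokenFamily.trim().toUpperCase();
  if (family === "CH") return { kind: "project-assignment" };
  const bucket = Object.prototype.hasOwnProperty.call(INTERNAL_BUCKETS, family)
    ? INTERNAL_BUCKETS[family]
    : undefined;
  if (bucket) return { kind: "internal-assignment", ...bucket };
  if (family === "AB") return { kind: "vacation" };
  if (family === "NA") return { kind: "calendar" }; // holidays/weekends: never an assignment
  if (family === "UN") return { kind: "none" }; // unassigned: capacity stays free
  return { kind: "unresolved", family }; // unknown family: keep staged for review
}
```
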
### Assignment Mapping
Assignments should be written only when a project or internal bucket is resolved.
| Source | Target | Notes |
| --------------------- | --------------------------------------- | --------------------------------------------------------- |
| project booking cell | `Assignment` | resource, project, role, start/end, hours/day, percentage |
| internal `MO` cell | `Assignment` to internal project/bucket | utilization category `M&O` |
| internal `MD` cell | `Assignment` to internal project/bucket | utilization category `MD&I` |
| internal `PD` cell | `Assignment` to internal project/bucket | utilization category `PD&R` |
| `[_UN] unassign {UN}` | no final assignment | capacity remains free |
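
Because the parser emits one record per calendar slot, daily cells must be collapsed into `Assignment` date ranges before commit. The sketch below is one possible approach, assuming hypothetical `DailySlot`/`AssignmentRange` shapes, slots already summed to one record per person, project, and day, and that a Friday-to-Monday step counts as contiguous. Workbook cells carry 50% or 100%, so `hoursPerDay` is 4 or 8 and the percentage is derived from an 8-hour standard day.

```typescript
interface DailySlot {
  enterpriseId: string;
  projectKey: string;
  role: string;
  date: string; // ISO YYYY-MM-DD
  hoursPerDay: number; // 4 for a 50% cell, 8 for a 100% cell
}

interface AssignmentRange {
  enterpriseId: string;
  projectKey: string;
  role: string;
  startDate: string;
  endDate: string;
  hoursPerDay: number;
  percentage: number;
}

type GroupFields = Pick<DailySlot, "enterpriseId" | "projectKey" | "role" | "hoursPerDay">;
const groupKey = (r: GroupFields): string => `${r.enterpriseId}|${r.projectKey}|${r.role}|${r.hoursPerDay}`;
const cmp = (x: string, y: string): number => (x < y ? -1 : x > y ? 1 : 0);

// Next Monday-to-Friday date after `iso`; UTC arithmetic avoids DST shifts.
function nextWeekday(iso: string): string {
  const d = new Date(`${iso}T00:00:00Z`);
  do {
    d.setUTCDate(d.getUTCDate() + 1);
  } while (d.getUTCDay() === 0 || d.getUTCDay() === 6);
  return d.toISOString().slice(0, 10);
}

function mergeSlots(slots: DailySlot[], standardDayHours = 8): AssignmentRange[] {
  // Sort by group, then chronologically, so contiguous days sit next to each other.
  const sorted = [...slots].sort((a, b) => cmp(groupKey(a), groupKey(b)) || cmp(a.date, b.date));
  const ranges: AssignmentRange[] = [];
  for (const s of sorted) {
    const last = ranges.length > 0 ? ranges[ranges.length - 1] : undefined;
    if (last && groupKey(last) === groupKey(s) && nextWeekday(last.endDate) === s.date) {
      last.endDate = s.date; // extend the open range by one working day
      continue;
    }
    ranges.push({
      enterpriseId: s.enterpriseId,
      projectKey: s.projectKey,
      role: s.role,
      startDate: s.date,
      endDate: s.date,
      hoursPerDay: s.hoursPerDay,
      percentage: Math.round((s.hoursPerDay / standardDayHours) * 100),
    });
  }
  return ranges;
}
```

A weekday with no slot (for example a mid-week public holiday) breaks contiguity, so the surrounding bookings become two ranges; whether that is desirable depends on how the planner treats holidays inside assignments.
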
### Vacation and Holiday Mapping
| Source token | Target | Notes |
| ------------------------------- | ------------------------------- | -------------------------------------------------------- |
| `[_AB] ... {AB}` | `Vacation` | approved absence row |
| `[_NA] Public Holiday ... {NA}` | `Vacation(type=PUBLIC_HOLIDAY)` | preferred source of truth is geography-driven generation |
| `[_NA] Weekend {NA}` | no vacation row | derive from calendar |
Public holiday implementation should integrate with the existing vacation planner and batch holiday support in [vacation.ts](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/vacation.ts#L425).
### Availability and Part-Time Mapping
Part-time markers must alter resource availability instead of creating fake bookings.
Rules:
- `100%` maps to 8 available hours on a standard full working day
- `50%` maps to 4 available hours on a standard full working day
- part-time markers should reduce weekday availability on non-working or reduced-working days
Recommended commit target:
- `Resource.fte` remains the contractual baseline
- `Resource.availability` stores actual weekday available hours
Examples:
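- 75% with one day fully off per week:
  - availability sets one weekday to `0` and spreads the remaining 30 weekly hours over the other four weekdays (7.5 hours each); one day off with 8 hours on the other four days would already give 32 hours, i.e. 80%
- 80% with reduced time across five weekdays:
  - availability distributes the reduced hours across weekdays, e.g. 6.4 hours per weekday when spread evenly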
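If the workbook does not encode the weekday pattern explicitly, apply the default fallback from decision D4:
- keep `Resource.fte` as the contractual baseline
- distribute the reduced weekly capacity evenly across Monday to Friday
- stage the record as `availabilityPattern = unresolved` with a warning so operators can refine the weekday pattern during review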
## Parser Design
The `DISPO_2026.xlsx` parser should operate at calendar-slot level and emit normalized staging records.
### Parser Inputs
- workbook name
- sheet name
- row identity
- day/column identity
- source token text
- resource roster metadata from the row
### Parser Outputs
- normalized booking type
- resource canonical ID
- date
- slot fraction
- hours per day
- percentage
- client token
- WBS
- utilization category
- win probability
- role token
- chapter token
- ignore flags
- unresolved reason when parsing fails
### Token Normalization Rules
1. strip ignored suffixes like `_HB`, `_SB`
2. extract client from `[CLIENT]`
3. extract WBS from `[12345678]` when numeric
4. extract unresolved project marker from `[tbd]`
5. extract utilization and win probability from `{CH80}`
6. extract role prefix from leading `2D`, `3D`, `PM`, `AD`
7. classify special bracket tokens `[_AB]`, `[_NA]`, `[_UN]`, `[_MO]`, `[_MD]`, `[_PD]`
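
The normalization rules can be sketched as a single pass over the cell text. The exact cell layout is not specified here, so this parser assumes the role prefix leads the cell, bracket tokens and `{CH80}`-style tokens may appear anywhere, and utilization codes are two uppercase letters; `ParsedToken` and `parseDispoToken` are hypothetical names. Bracket and brace tokens are removed before the leading role prefix is read, which changes the rule order slightly but not the result.

```typescript
interface ParsedToken {
  special: string | null; // AB, NA, UN, MO, MD, PD from [_XX]
  rolePrefix: string | null; // 2D, 3D, PM, AD
  client: string | null; // from [CLIENT]
  wbs: string | null; // numeric [12345678]
  unresolvedProject: boolean; // [tbd]
  utilizationCategory: string | null; // CH from {CH80}
  winProbability: number | null; // 80 from {CH80}
  name: string; // remaining project text
}

function parseDispoToken(raw: string): ParsedToken {
  const result: ParsedToken = {
    special: null, rolePrefix: null, client: null, wbs: null,
    unresolvedProject: false, utilizationCategory: null, winProbability: null, name: "",
  };

  // Rule 1: drop ignored helper suffixes _HB / _SB.
  let text = raw.replace(/_(HB|SB)\b/g, "");

  // Rule 5: {CH80} -> category "CH", win probability 80 (the number is optional).
  text = text.replace(/\{([A-Z]{2})(\d{1,3})?\}/, (_match: string, category: string, probability?: string) => {
    result.utilizationCategory = category;
    result.winProbability = probability === undefined ? null : Number(probability);
    return " ";
  });

  // Rules 2, 3, 4 and 7: classify every bracket token.
  text = text.replace(/\[([^\]]+)\]/g, (_match: string, inner: string) => {
    const value = inner.trim();
    if (value.startsWith("_")) result.special = value.slice(1).toUpperCase();
    else if (/^\d+$/.test(value)) result.wbs = value;
    else if (value.toLowerCase() === "tbd") result.unresolvedProject = true;
    else result.client = value;
    return " ";
  });

  // Rule 6: leading discipline prefix as a whole word.
  text = text.trim();
  const role = /^(2D|3D|PM|AD)\b/.exec(text);
  if (role) {
    result.rolePrefix = role[1] ?? null;
    text = text.slice(role[0].length);
  }

  result.name = text.replace(/\s+/g, " ").trim();
  return result;
}
```
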
## Internal Project Strategy
Internal work should not be left as free text.
Recommendation:
- seed canonical internal projects or internal planning buckets for:
  - `M&O`
  - `MD&I`
  - `PD&R`
- assign them consistent utilization categories
- keep source token text in assignment metadata for traceability
This allows planning, reporting, and chargeability to remain normalized.
## `[tbd]` Resolution Strategy
`[tbd]` is not a valid final project identity.
Recommended behavior:
- stage as unresolved project demand
- do not auto-create final project rows during commit
- allow reviewer resolution by:
  - matching to existing WBS
  - creating a new real project
  - converting the row into intentional unassigned capacity
  - converting the row into a `DemandRequirement` when a role need is known but no confirmed project exists
This is the only area where a manual review gate is required by default.
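
The four reviewer outcomes fit a discriminated union, which lets automatic commit skip anything a reviewer has not decided yet. All names below are hypothetical; the sketch only illustrates the gate, not the commit itself.

```typescript
// The four reviewer outcomes listed above.
type TbdResolution =
  | { action: "match-existing"; projectId: string }
  | { action: "create-project"; shortCode: string; name: string }
  | { action: "unassigned" }
  | { action: "demand-requirement"; role: string };

interface StagedTbdRow {
  id: string;
  resolution: TbdResolution | null; // null until a reviewer decides
}

// Automatic commit only touches reviewer-resolved rows; the rest stay staged.
function partitionTbdRows(rows: StagedTbdRow[]) {
  const committable: Array<StagedTbdRow & { resolution: TbdResolution }> = [];
  const pending: StagedTbdRow[] = [];
  for (const row of rows) {
    if (row.resolution === null) pending.push(row);
    else committable.push({ ...row, resolution: row.resolution });
  }
  return { committable, pending };
}

function describeResolution(resolution: TbdResolution): string {
  switch (resolution.action) {
    case "match-existing":
      return `assign to existing project ${resolution.projectId}`;
    case "create-project":
      return `create project ${resolution.shortCode} (${resolution.name})`;
    case "unassigned":
      return "keep as free capacity, no assignment";
    case "demand-requirement":
      return `open DemandRequirement for ${resolution.role}`;
  }
}
```
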
## Public Holiday Strategy
Public holidays should be driven by geography and stored in the vacation planner.
Recommended approach:
1. resolve each resource to country and, where relevant, metro city/federal state
2. generate public holidays for the applicable calendar year
3. commit them as approved `Vacation(type=PUBLIC_HOLIDAY)` rows
4. reconcile workbook holiday markers against generated rows
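
Step 4 is a set comparison per resource. The sketch assumes a separate, hypothetical holiday generator has already produced ISO dates for the resource's country and region, and that the workbook's `[_NA] Public Holiday` markers were staged as ISO dates; only the reconciliation is shown.

```typescript
interface HolidayDiff {
  confirmed: string[]; // generated and marked in the workbook
  missingInWorkbook: string[]; // generated but not marked in the workbook
  unexpectedInWorkbook: string[]; // marked but not generated: check geography or holiday rules
}

// Inputs are ISO dates for one resource and one calendar year.
function reconcileHolidays(generated: string[], workbookMarkers: string[]): HolidayDiff {
  const gen = new Set(generated);
  const marked = new Set(workbookMarkers);
  const unique = (xs: string[]): string[] => Array.from(new Set(xs)).sort();
  return {
    confirmed: unique(generated.filter((d) => marked.has(d))),
    missingInWorkbook: unique(generated.filter((d) => !marked.has(d))),
    unexpectedInWorkbook: unique(workbookMarkers.filter((d) => !gen.has(d))),
  };
}
```
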
Known implementation gap:
- the chargeability forecast currently passes an empty `publicHolidays` list into SAH calculation in [chargeability-report.ts](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/chargeability-report.ts#L167)
Required follow-up:
- make forecast logic consume public-holiday vacation rows or geography-derived holiday dates
## Reconciliation and Acceptance Checks
After commit, run reconciliation against the chargeability workbook.
Required checks:
- resource count matches expected staged resource count
- FTE totals by org unit match workbook aggregates within tolerance
- chargeability targets by management group match expected values
- project/client/WBS relationships are complete for all committed assignments
- unassigned capacity is visible only as free capacity, not as fake bookings
- vacations and public holidays are visible in the vacation planner
- part-time resources show reduced weekday availability where resolved
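
The FTE check can be sketched as a grouped sum compared against workbook aggregates. `checkFteTotals` and its inputs are hypothetical, and the default tolerance of `0.01` FTE is a placeholder: the document does not fix a value.

```typescript
interface FteRow {
  orgUnit: string;
  fte: number;
}

interface FteCheck {
  orgUnit: string;
  committed: number;
  expected: number;
  withinTolerance: boolean;
}

// Sums committed FTE per org unit and compares each total to the workbook aggregate.
function checkFteTotals(committed: FteRow[], expected: Record<string, number>, tolerance = 0.01): FteCheck[] {
  const totals = new Map<string, number>();
  for (const r of committed) totals.set(r.orgUnit, (totals.get(r.orgUnit) ?? 0) + r.fte);
  // Report units present on either side, so missing and surplus units both surface.
  const units = Array.from(new Set(Object.keys(expected).concat(Array.from(totals.keys())))).sort();
  return units.map((orgUnit) => {
    const c = totals.get(orgUnit) ?? 0;
    const e = expected[orgUnit] ?? 0;
    return { orgUnit, committed: c, expected: e, withinTolerance: Math.abs(c - e) <= tolerance };
  });
}
```
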
## Suggested Implementation Order
1. add import staging schema and storage
2. add full reset + reseed command
3. add reference-data importer for `MandatoryDispoCategories_V3.xlsx`
4. add canonical resource importer and identity normalization
5. add role and chapter normalization
6. add `DISPO_2026.xlsx` parser
7. add staged project resolver for WBS and `[tbd]`
8. add assignment commit flow
9. add vacation and public-holiday import flow
10. add part-time availability overlay logic
11. add reconciliation report against `ChgFC` and aggregate sheets
12. add rollback/replay support for repeated dry runs
## Operator Commands
For the clean-slate import workflow, use dedicated DB scripts instead of ad-hoc SQL:
- reset and bootstrap a disposable environment:
  - `pnpm --filter @capakraken/db db:reset:dispo -- --force`
- reset without `pg_dump` backup only in an intentionally disposable environment:
  - `pnpm --filter @capakraken/db db:reset:dispo -- --force --skip-backup`
- seed Dispo v2 reference vocabulary after reset:
  - `pnpm --filter @capakraken/db db:seed:dispo-v2`
The reset command:
- backs up the database with `pg_dump` by default
- truncates all public tables except Prisma migration history
- recreates a bootstrap admin user and baseline `SystemSettings`
- leaves workbook-derived reference and transactional data to the import pipeline
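
The truncation step can be sketched as follows. This is not the actual `db:reset:dispo` script: `buildResetSql` is hypothetical, and the table list is assumed to come from a catalog query such as `SELECT tablename FROM pg_tables WHERE schemaname = 'public'`. `_prisma_migrations` is the table Prisma Migrate uses to record applied migrations.

```typescript
// Prisma Migrate records applied migrations here; the reset keeps it.
const PRISMA_MIGRATIONS_TABLE = "_prisma_migrations";

// Quote a PostgreSQL identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function buildResetSql(publicTables: string[], opts: { force: boolean }): string {
  if (!opts.force) throw new Error("refusing to reset without --force");
  const targets = publicTables.filter((t) => t !== PRISMA_MIGRATIONS_TABLE).sort();
  if (targets.length === 0) return "";
  // A single TRUNCATE empties tables that reference each other in one step,
  // so foreign keys between them do not block the reset.
  return `TRUNCATE TABLE ${targets.map((t) => `public.${quoteIdent(t)}`).join(", ")};`;
}
```
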
## Decision Log
The following decisions are locked for implementation and should be treated as the canonical baseline for all downstream tickets.
### D1. Canonical Resource Identity
- `enterpriseId` is the canonical external person key for all staged and committed import logic.
- `eid` remains in the schema as a compatibility alias during the transition period.
- the importer must mirror the canonical identity into both `enterpriseId` and `eid` on committed `Resource` rows unless a later cleanup ticket removes `eid`
- conflicting source rows for the same canonical identity must remain staged and unresolved
### D2. Reset And Bootstrap Scope
- the clean-slate reset wipes business data, auth/session data, notifications, audit logs, estimate data, and previous import artifacts
- the reset must reseed a bootstrap admin user, baseline platform permissions, and required singleton settings immediately after wipe
- imported workbook data is not the source of truth for platform access bootstrapping
- reset remains an explicit operator action with backup-first workflow
### D3. `[tbd]` Commit Policy
- `[tbd]` rows never auto-create final `Project` rows
- `[tbd]` rows remain staged and unresolved by default
- a reviewer may explicitly resolve a `[tbd]` row into:
  - an existing real project
  - a newly created real project
  - intentional unassigned capacity
  - a `DemandRequirement`
- automatic commit logic must skip unresolved `[tbd]` rows
### D4. Default Part-Time Fallback
- when the workbook encodes an explicit weekday pattern, the importer must commit that exact reduced availability pattern
- when only an FTE/percentage is known, the importer should distribute the reduced weekly capacity evenly across Monday to Friday
- any such fallback must be marked with a staging warning so operators can refine the weekday pattern later
- examples:
  - `100%` => `8/8/8/8/8`
  - `50%` => `4/4/4/4/4`
  - `80%` => `6.4/6.4/6.4/6.4/6.4`
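
The D4 fallback reduces to multiplying the standard 8-hour day by the FTE for each weekday. The weekday keys and the `evenAvailability` helper are assumptions about how `Resource.availability` might be shaped, not its confirmed schema.

```typescript
// Assumed weekday keys for Resource.availability.
type Weekday = "monday" | "tuesday" | "wednesday" | "thursday" | "friday";
const WEEKDAYS: Weekday[] = ["monday", "tuesday", "wednesday", "thursday", "friday"];

interface AvailabilityResult {
  hours: Record<Weekday, number>;
  warnings: string[];
}

// D4 fallback: spread the reduced weekly capacity evenly over Monday to Friday.
function evenAvailability(fte: number, standardDayHours = 8): AvailabilityResult {
  if (!(fte > 0 && fte <= 1)) throw new RangeError(`FTE out of range: ${fte}`);
  // Round to hundredths of an hour so stored values stay readable.
  const perDay = Math.round(fte * standardDayHours * 100) / 100;
  const hours = {} as Record<Weekday, number>;
  for (const day of WEEKDAYS) hours[day] = perDay;
  // Any reduced pattern inferred this way is flagged for operator refinement.
  const warnings = fte < 1 ? [`weekday pattern inferred evenly from FTE ${fte}; refine if known`] : [];
  return { hours, warnings };
}
```
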
### D5. Internal Project Buckets
Seed canonical internal planning projects/buckets as follows:
| Source token | Project short code | Project name | Utilization category |
| ------------ | ------------------ | ---------------------------------- | -------------------- |
| `MO` | `INT-MO` | `Management & Operations` | `M&O` |
| `MD` | `INT-MD` | `Market Development & Initiatives` | `MD&I` |
| `PD` | `INT-PD` | `People Development & Recruitment` | `PD&R` |
The importer should preserve the original source token in staged and committed metadata for traceability.
### D6. Seeded Role Vocabulary
The baseline role seed list required for Dispo import is:
- `2D Artist`
- `3D Artist`
- `Project Manager`
- `Art Director`
These roles are assignment-level delivery roles. They do not replace chapter/org ownership on the resource profile.
## Recommendation
Proceed with a staged importer and treat this document as the implementation baseline.
The critical architectural choices are:
- clean-slate reset
- canonical single-ID resource matching
- staged parsing before commit
- no fake unassigned bookings
- explicit assignment roles separate from resource chapters
- geography-driven public holidays in the vacation planner
- manual resolution gate only for `[tbd]` and unresolved availability patterns