CapaKraken/docs/dispo-import-implementation.md

# Dispo Import Implementation

**Date:** 2026-03-14
**Purpose:** Canonical implementation document for replacing the current Planarchy planning dataset with a clean-slate import from the Dispo v2 Excel workbooks.

## Scope

This document defines how Planarchy should ingest and normalize the following source workbooks:

- `/samples/Dispov2/MandatoryDispoCategories_V3.xlsx`
- `/samples/Dispov2/DISPO_2026.xlsx`
- `/samples/Dispov2/20260309_Bi-Weekly_Chargeability_Reporting_Content_Production_V0.943_4Hartmut.xlsx`
- `/samples/Dispov2/MV_DispoRoster.xlsx`
- `/samples/Dispov2/Resource Roster_MASTER_FY26_CJ_20251201.xlsx`

The goal is not a raw workbook archive. The goal is a normalized Planarchy dataset that:

- wipes existing database data and starts from a clean baseline
- imports canonical reference data first
- stages operational Dispo data before commit
- creates real projects, assignments, vacations, availability rules, and reporting inputs
- does not create fake bookings for unassigned time
- handles public holidays through the vacation planner

## Agreed Business Rules

These rules are fixed unless superseded by a later decision record:

- There is only one canonical person identifier. `EID` and `Enterprise ID` are the same source identity.
- Existing database data should be wiped completely before the new import.
- `[BMW]` is the client token.
- `[11035763]` is the stable WBS/project key.
- `{CH80}` means utilization category `CH` and `winProbability = 80`.
- `_HB` and `_SB` suffixes can be ignored.
- `TBD` means unassigned time. It must not create a fake project or fake booking.
- `[tbd]` means unresolved project identity and should remain staged until resolved.
- `2D` and `3D` map to chapter `Digital Content Production`.
- `PM` maps to chapter `Project Management`.
- `AD` maps to chapter `Art Direction`.
- People booked on projects need the corresponding project role assigned on the booking.
- Internal work categories should be mapped as normalized internal project/utilization buckets.
- Public holidays should be modeled properly in the vacation planner.
- Project start and end dates should be inferred from earliest and latest imported assignment dates.
- Assignment granularity should normalize 50% to 4 hours and 100% to 8 hours.
- Part-time import must reduce weekday availability on the days the person is not available.

## Source Workbook Roles

### 1. `MandatoryDispoCategories_V3.xlsx`

Use as the source of:

- reference/master-data vocabulary
- client and project attribute glossary
- calendar and SAH rules
- validation guidance for resource attributes

Do not treat it as the primary source of transactional planning rows.

### 2. `DISPO_2026.xlsx`

Use as the primary source of:

- operational planning matrix
- project bookings
- internal work bookings
- absences
- public-holiday source hints
- part-time hints
- unassigned capacity

This workbook requires a parser. It is not directly importable as row-based CRUD input.

### 3. `Bi-Weekly Chargeability Reporting...xlsx`

Use as the source of:

- target and forecast reconciliation
- resource enrichment when missing elsewhere
- aggregate validation after commit

Do not treat PTD/MTD/YTD outputs as canonical source-of-truth records when Planarchy can derive them from normalized data.

### 4. `MV_DispoRoster.xlsx`

Use as the source of:

- operational resource master rows keyed by `EID`
- `SAP_data` enrichment for real display names and email addresses
- chapter, department, client-unit, FTE, and activity-window enrichment
- pseudo-demand row filtering before any resource commit

Do not import `Demand_*` rows as real resources.

## Verified Source Constraint

The workbook set now includes both a row-based resource master source and a cost-rate source.

What is actually present:

- `MandatoryDispoCategories_V3.xlsx` `EID-Attr` is a glossary/reference sheet
- `Bi-Weekly Chargeability Reporting...xlsx` provides roster identity, FTE, management group, chapter, metro city, and client unit
- `MV_DispoRoster.xlsx` `DispoRoster` provides operational roster rows and `SAP_data` provides real email/display-name coverage for many resources
- `Resource Roster_MASTER_FY26_CJ_20251201.xlsx` provides per-person `LCR` / `UCR` rows plus management-level averages for fallback resolution
- `DISPO_2026.xlsx` provides transactional planning, vacation, holiday, and part-time signals

Implementation consequence:

- staging can be completed from the current workbook set
- strict source-data blockers are cleared for the supplied workbook set
- resources missing a source email are imported with fallback email `<EID>@accenture.com`
- `demand_*` planning identities are ignored during staging and readiness checks
- unresolved `[tbd]` project rows still remain staged by design and must not be auto-committed as final projects

## Current Implementation Status

Implemented in the application layer:

- reference workbook parser and stager
- chargeability parser and stager
- planning parser and stager
- project resolution staging
- roster parser and stager with `DispoRoster` + `SAP_data` merge logic
- cost-rate parser and roster-rate merge logic with exact and management-level fallback resolution
- readiness assessment over merged resource sources
- batch staging orchestration across all current workbook inputs

Remaining import blocker for a strict production-grade commit:

- unresolved `[tbd]` project references that are intentionally kept out of final project commit

## Target Domain Model

The import commits into the existing planning model:

- `Country`
- `MetroCity`
- `OrgUnit`
- `UtilizationCategory`
- `Client`
- `ManagementLevelGroup`
- `ManagementLevel`
- `Role`
- `Resource`
- `ResourceRole`
- `Project`
- `DemandRequirement`
- `Assignment`
- `Vacation`
- `VacationEntitlement`

Relevant current schema anchors:

- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L178)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L235)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L334)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L372)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L460)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L754)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L780)
- [schema.prisma](/home/hartmut/Documents/Copilot/planarchy/packages/db/prisma/schema.prisma#L815)

## Required Implementation Changes

### 1. Canonical Person Identity

Planarchy currently stores both `eid` and `enterpriseId` on `Resource`. The import should operate on a single canonical identity.

Recommendation:

- choose `enterpriseId` as the canonical external person key
- keep `eid` synchronized to the same value during transition, or remove its operational significance in the import path
- reject imports that produce conflicting person rows for the same canonical identity

This avoids duplicate matching logic throughout staging and commit.

### 2. Staging Layer

Add dedicated staging tables or equivalent durable import records. The staging layer is required.

Recommended staging entities:

- `ImportBatch`
- `StagedResource`
- `StagedClient`
- `StagedProject`
- `StagedAssignment`
- `StagedVacation`
- `StagedAvailabilityRule`
- `StagedUnresolvedRecord`

Each staged record should retain:

- source workbook name
- sheet name
- row number
- original raw value
- normalized parsed fields
- parser warnings
- resolution status

The staging layer is the review boundary between workbook parsing and final commit.

### 3. Clean-Slate Reset

Before final import, wipe the existing database contents.

Recommended reset scope:

- business data
- auth/session data
- derived snapshots
- previous import artifacts

After reset, reseed immediately:

- admin user
- access roles and permissions required to sign in
- essential platform defaults
- import reference seed data only when not supplied from the workbooks

Implementation note:

- take a full backup before reset
- make the reset command idempotent in non-production development environments
- require an explicit force flag in scripts

## Two-Step Import Flow

### Step A. Stage

### Goals

- parse all source files
- normalize tokens
- build deterministic staging rows
- surface conflicts before any planning data is committed

### Responsibilities

1. ingest workbook files
2. parse reference/master data
3. parse planning matrix cells into typed staging records
4. detect conflicts and unresolved cases
5. generate a review report

### Stage Output

- staged resources
- staged roles
- staged chapters/org mapping
- staged clients
- staged projects
- staged assignments
- staged vacations/public holidays
- staged part-time availability rules
- unresolved `[tbd]` project references
- unresolved identity conflicts

### Step B. Commit

### Goals

- write only validated data
- preserve traceability to staging
- block unresolved project identities

### Responsibilities

1. create reference data
2. create resources and roles
3. create projects and internal buckets
4. create assignments
5. create vacations/public holidays
6. apply availability overrides
7. run reconciliation against chargeability workbook aggregates

### Commit Rules

- unresolved `[tbd]` project rows must not silently create final projects
- unassigned time must not create assignments
- weekend markers must not create vacation rows
- all project bookings must resolve to a role and utilization category
- all resources must resolve to a canonical person ID

## Field Mapping

### Reference Data Mapping

| Source                             | Field/Token              | Target                                                                | Notes                                        |
| ---------------------------------- | ------------------------ | --------------------------------------------------------------------- | -------------------------------------------- |
| `MandatoryDispoCategories_V3.xlsx` | `Country/Territory`      | `Country`                                                             | source for country master data and SAH rules |
| `MandatoryDispoCategories_V3.xlsx` | `Metro City`             | `MetroCity`                                                           | child of country                             |
| `MandatoryDispoCategories_V3.xlsx` | `Chapter` / org labels   | `OrgUnit` and `Resource.chapter`                                      | normalize level mapping during stage         |
| `MandatoryDispoCategories_V3.xlsx` | `Management Level Group` | `ManagementLevelGroup`                                                | includes target percentage reference         |
| `MandatoryDispoCategories_V3.xlsx` | `Management Level`       | `ManagementLevel`                                                     | child of management level group              |
| `MandatoryDispoCategories_V3.xlsx` | `WBS Master Client`      | `Client` parent                                                       | master client node                           |
| `MandatoryDispoCategories_V3.xlsx` | `WBS Client Name`        | `Client` child                                                        | legal/sub-client node                        |
| `MandatoryDispoCategories_V3.xlsx` | SAH rules                | `Country.dailyWorkingHours`, schedule logic, holiday generation rules | not all rules map 1:1 yet                    |

### Resource Mapping

| Source                                    | Field                     | Target                                                | Notes                                                  |
| ----------------------------------------- | ------------------------- | ----------------------------------------------------- | ------------------------------------------------------ |
| `EID-Attr`, `ChgFC`, `Dispo` row metadata | `Enterprise ID` / `EID`   | canonical resource key                                | one identity only                                      |
| `ChgFC`                                   | `FTE`                     | `Resource.fte`                                        | baseline contract capacity                             |
| `ChgFC`                                   | `Management Level Group`  | `Resource.managementLevelGroupId`                     | reference lookup                                       |
| `EID-Attr`                                | `Management Level`        | `Resource.managementLevelId`                          | reference lookup                                       |
| `ChgFC`                                   | `Metro City`              | `Resource.metroCityId`                                | resolve via master data                                |
| `EID-Attr`                                | `Country/Territory`       | `Resource.countryId`                                  | resolve via master data                                |
| `ChgFC`                                   | `MV Org Unit 1 / Chapter` | `Resource.chapter` / `Resource.orgUnitId`             | normalize by agreed mapping                            |
| `EID-Attr`                                | `Unit (Client Unit)`      | `Resource.clientUnitId`                               | lookup to client tree if modeled there                 |
| `EID-Attr`                                | `Ressource Type`          | `Resource.resourceType`                               | enum mapping required                                  |
| `EID-Attr`                                | `LCR`                     | `Resource.lcrCents`                                   | normalize currency to cents                            |
| `EID-Attr`                                | `UCR`                     | `Resource.ucrCents`                                   | normalize currency to cents                            |
| `ChgFC`                                   | `Target (per Level)`      | `Resource.chargeabilityTarget` or group target source | use management group target as primary when consistent |

Rate resolution during staging:

- `Resource Roster_MASTER_FY26_CJ_20251201.xlsx` is the primary cost-rate source.
- Exact `EID`/enterprise matches populate `lcrCents` and `ucrCents` directly.
- If a person is missing in the cost workbook, the importer falls back to same-management-level averages from that workbook.
- Only resources with neither an exact row nor a usable management-level fallback remain readiness blockers for LCR/UCR.

### Chapter and Role Mapping

Resource chapter mapping:

| Token | Resource chapter             |
| ----- | ---------------------------- |
| `2D`  | `Digital Content Production` |
| `3D`  | `Digital Content Production` |
| `PM`  | `Project Management`         |
| `AD`  | `Art Direction`              |

Assignment role mapping:

| Token | Assignment role   |
| ----- | ----------------- |
| `2D`  | `2D Artist`       |
| `3D`  | `3D Artist`       |
| `PM`  | `Project Manager` |
| `AD`  | `Art Director`    |

Implementation guidance:

- chapter is a resource-organizational dimension
- role is an assignment-level delivery dimension
- assign both when both are known

### Project Mapping

| Source token             | Target                                                         | Notes                                      |
| ------------------------ | -------------------------------------------------------------- | ------------------------------------------ |
| `[BMW]`                  | `Project.clientId`                                             | resolve to client                          |
| `[11035763]`             | stable WBS key                                                 | recommended source for `Project.shortCode` |
| `{CH80}`                 | `Project.utilizationCategoryId`, `Project.winProbability = 80` | parser splits prefix and numeric suffix    |
| project text body        | `Project.name`                                                 | cleaned of helper suffixes                 |
| earliest assignment date | `Project.startDate`                                            | inferred                                   |
| latest assignment date   | `Project.endDate`                                              | inferred                                   |
| `[tbd]`                  | unresolved staged project                                      | do not auto-commit as final project        |

### Utilization and Planning Token Mapping

| Token family | Meaning                            | Commit target                                                         |
| ------------ | ---------------------------------- | --------------------------------------------------------------------- |
| `CH`         | chargeable client work             | project assignment                                                    |
| `MO`         | management and operations          | internal project/bucket assignment                                    |
| `MD`         | market development / initiative    | internal project/bucket assignment                                    |
| `PD`         | personal development / recruitment | internal project/bucket assignment                                    |
| `AB`         | absence                            | vacation row                                                          |
| `NA`         | non-bookable / calendar state      | usually availability or derived calendar rule, not project assignment |
| `UN`         | unassigned                         | no assignment commit                                                  |

### Assignment Mapping

Assignments should be written only when a project or internal bucket is resolved.

| Source                | Target                                  | Notes                                                     |
| --------------------- | --------------------------------------- | --------------------------------------------------------- |
| project booking cell  | `Assignment`                            | resource, project, role, start/end, hours/day, percentage |
| internal `MO` cell    | `Assignment` to internal project/bucket | utilization category `M&O`                                |
| internal `MD` cell    | `Assignment` to internal project/bucket | utilization category `MD&I`                               |
| internal `PD` cell    | `Assignment` to internal project/bucket | utilization category `PD&R`                               |
| `[_UN] unassign {UN}` | no final assignment                     | capacity remains free                                     |

### Vacation and Holiday Mapping

| Source token                    | Target                          | Notes                                                    |
| ------------------------------- | ------------------------------- | -------------------------------------------------------- |
| `[_AB] ... {AB}`                | `Vacation`                      | approved absence row                                     |
| `[_NA] Public Holiday ... {NA}` | `Vacation(type=PUBLIC_HOLIDAY)` | preferred source of truth is geography-driven generation |
| `[_NA] Weekend {NA}`            | no vacation row                 | derive from calendar                                     |

Public holiday implementation should integrate with the existing vacation planner and batch holiday support in [vacation.ts](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/vacation.ts#L425).

### Availability and Part-Time Mapping

Part-time markers must alter resource availability instead of creating fake bookings.

Rules:

- `100%` maps to 8 available hours on a standard full working day
- `50%` maps to 4 available hours on a standard full working day
- part-time markers should reduce weekday availability on non-working or reduced-working days

Recommended commit target:

- `Resource.fte` remains the contractual baseline
- `Resource.availability` stores actual weekday available hours

Examples:

- 75% with one day fully off per week:
  - availability sets one weekday to `0`
- 80% with reduced time across five weekdays:
  - availability distributes reduced hours across weekdays

If the workbook does not encode the weekday pattern explicitly:

- stage the record as `availabilityPattern = unresolved`
- commit only the `fte`
- require manual review before applying weekday reductions

## Parser Design

The `DISPO_2026.xlsx` parser should operate at calendar-slot level and emit normalized staging records.

### Parser Inputs

- workbook name
- sheet name
- row identity
- day/column identity
- source token text
- resource roster metadata from the row

### Parser Outputs

- normalized booking type
- resource canonical ID
- date
- slot fraction
- hours per day
- percentage
- client token
- WBS
- utilization category
- win probability
- role token
- chapter token
- ignore flags
- unresolved reason when parsing fails

### Token Normalization Rules

1. strip ignored suffixes like `_HB`, `_SB`
2. extract client from `[CLIENT]`
3. extract WBS from `[12345678]` when numeric
4. extract unresolved project marker from `[tbd]`
5. extract utilization and win probability from `{CH80}`
6. extract role prefix from leading `2D`, `3D`, `PM`, `AD`
7. classify special bracket tokens `[_AB]`, `[_NA]`, `[_UN]`, `[_MO]`, `[_MD]`, `[_PD]`

## Internal Project Strategy

Internal work should not be left as free text.

Recommendation:

- seed canonical internal projects or internal planning buckets for:
  - `M&O`
  - `MD&I`
  - `PD&R`
- assign them consistent utilization categories
- keep source token text in assignment metadata for traceability

This allows planning, reporting, and chargeability to remain normalized.

## `[tbd]` Resolution Strategy

`[tbd]` is not a valid final project identity.

Recommended behavior:

- stage as unresolved project demand
- do not auto-create final project rows during commit
- allow reviewer resolution by:
  - matching to existing WBS
  - creating a new real project
  - converting the row into intentional unassigned capacity
  - converting the row into a `DemandRequirement` when a role need is known but no confirmed project exists

This is the only area where a manual review gate is required by default.

## Public Holiday Strategy

Public holidays should be driven by geography and stored in the vacation planner.

Recommended approach:

1. resolve each resource to country and, where relevant, metro city/federal state
2. generate public holidays for the applicable calendar year
3. commit them as approved `Vacation(type=PUBLIC_HOLIDAY)` rows
4. reconcile workbook holiday markers against generated rows

Known implementation gap:

- the chargeability forecast currently passes an empty `publicHolidays` list into SAH calculation in [chargeability-report.ts](/home/hartmut/Documents/Copilot/planarchy/packages/api/src/router/chargeability-report.ts#L167)

Required follow-up:

- make forecast logic consume public-holiday vacation rows or geography-derived holiday dates

## Reconciliation and Acceptance Checks

After commit, run reconciliation against the chargeability workbook.

Required checks:

- resource count matches expected staged resource count
- FTE totals by org unit match workbook aggregates within tolerance
- chargeability targets by management group match expected values
- project/client/WBS relationships are complete for all committed assignments
- unassigned capacity is visible only as free capacity, not as fake bookings
- vacations and public holidays are visible in the vacation planner
- part-time resources show reduced weekday availability where resolved

## Suggested Implementation Order

1. add import staging schema and storage
2. add full reset + reseed command
3. add reference-data importer for `MandatoryDispoCategories_V3.xlsx`
4. add canonical resource importer and identity normalization
5. add role and chapter normalization
6. add `DISPO_2026.xlsx` parser
7. add staged project resolver for WBS and `[tbd]`
8. add assignment commit flow
9. add vacation and public-holiday import flow
10. add part-time availability overlay logic
11. add reconciliation report against `ChgFC` and aggregate sheets
12. add rollback/replay support for repeated dry runs

## Operator Commands

For the clean-slate import workflow, use dedicated DB scripts instead of ad-hoc SQL:

- reset and bootstrap a disposable environment:
  - `pnpm --filter @capakraken/db db:reset:dispo -- --force`
- reset without `pg_dump` backup only in an intentionally disposable environment:
  - `pnpm --filter @capakraken/db db:reset:dispo -- --force --skip-backup`
- seed Dispo v2 reference vocabulary after reset:
  - `pnpm --filter @capakraken/db db:seed:dispo-v2`

The reset command:

- backs up the database with `pg_dump` by default
- truncates all public tables except Prisma migration history
- recreates a bootstrap admin user and baseline `SystemSettings`
- leaves workbook-derived reference and transactional data to the import pipeline

## Decision Log

The following decisions are locked for implementation and should be treated as the canonical baseline for all downstream tickets.

### D1. Canonical Resource Identity

- `enterpriseId` is the canonical external person key for all staged and committed import logic.
- `eid` remains in the schema as a compatibility alias during the transition period.
- the importer must mirror the canonical identity into both `enterpriseId` and `eid` on committed `Resource` rows unless a later cleanup ticket removes `eid`
- conflicting source rows for the same canonical identity must remain staged and unresolved

### D2. Reset And Bootstrap Scope

- the clean-slate reset wipes business data, auth/session data, notifications, audit logs, estimate data, and previous import artifacts
- the reset must reseed a bootstrap admin user, baseline platform permissions, and required singleton settings immediately after wipe
- imported workbook data is not the source of truth for platform access bootstrapping
- reset remains an explicit operator action with backup-first workflow

### D3. `[tbd]` Commit Policy

- `[tbd]` rows never auto-create final `Project` rows
- `[tbd]` rows remain staged and unresolved by default
- a reviewer may explicitly resolve a `[tbd]` row into:
  - an existing real project
  - a newly created real project
  - intentional unassigned capacity
  - a `DemandRequirement`
- automatic commit logic must skip unresolved `[tbd]` rows

### D4. Default Part-Time Fallback

- when the workbook encodes an explicit weekday pattern, the importer must commit that exact reduced availability pattern
- when only an FTE/percentage is known, the importer should distribute the reduced weekly capacity evenly across Monday to Friday
- any such fallback must be marked with a staging warning so operators can refine the weekday pattern later
- examples:
  - `100%` => `8/8/8/8/8`
  - `50%` => `4/4/4/4/4`
  - `80%` => `6.4/6.4/6.4/6.4/6.4`

### D5. Internal Project Buckets

Seed canonical internal planning projects/buckets as follows:

| Source token | Project short code | Project name                       | Utilization category |
| ------------ | ------------------ | ---------------------------------- | -------------------- |
| `MO`         | `INT-MO`           | `Management & Operations`          | `M&O`                |
| `MD`         | `INT-MD`           | `Market Development & Initiatives` | `MD&I`               |
| `PD`         | `INT-PD`           | `People Development & Recruitment` | `PD&R`               |

The importer should preserve the original source token in staged and committed metadata for traceability.

### D6. Seeded Role Vocabulary

The baseline role seed list required for Dispo import is:

- `2D Artist`
- `3D Artist`
- `Project Manager`
- `Art Director`

These roles are assignment-level delivery roles. They do not replace chapter/org ownership on the resource profile.

## Recommendation

Proceed with a staged importer and treat this document as the implementation baseline.

The critical architectural choices are:

- clean-slate reset
- canonical single-ID resource matching
- staged parsing before commit
- no fake unassigned bookings
- explicit assignment roles separate from resource chapters
- geography-driven public holidays in the vacation planner
- manual resolution gate only for `[tbd]` and unresolved availability patterns