@JasonWarrenUK JasonWarrenUK commented Dec 10, 2025

Overview

This PR fixes a critical data integrity bug: learner reference numbers were being auto-generated instead of taken from the legacy values in the source data, and previous learner reference numbers were omitted from ILR files entirely.

Tip

No additional setup is required after pulling. However, any ILR files already generated with auto-generated reference numbers must be regenerated so that they carry the correct legacy references.

Changes

🔴 Critical Bug Fix: Reference Number Handling

The Problem:

  • LearnRefNumber was being auto-generated from the loop index (0001, 0002, 0003...) instead of reading actual legacy reference numbers from the CSV
  • PrevLearnRefNumber was never added to learner records, so it did not appear in ILR files at all

The Impact:

  • Learners were losing their historical reference numbers, breaking continuity with legacy systems
  • Previous learner references weren't being tracked, making it impossible to link learners to their prior records
  • Data submitted to ESFA would not match the records in ESFA's existing learner database

The Fix:

  • Now reads LearnRefNumber from column 223 of the source CSV (the actual legacy reference)
  • Now reads PrevLearnRefNumber from column 222 and includes it in learner records
  • Reference numbers are preserved exactly as they appear in source data
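
As a rough illustration of the fix described above, the sketch below contrasts an index-based reference with one read straight from the source row. It is not the code from this PR: the function signature, the assumption that 222 and 223 are zero-based indexes into an already-parsed CSV row, and the choice to throw on a missing LearnRefNumber are all hypothetical. ULN is left out because its column is not stated here.

```javascript
// Column positions quoted in this PR description.
// Assumption: they are zero-based indexes into a parsed CSV row (array of strings).
const PREV_LEARN_REF_COLUMN = 222;
const LEARN_REF_COLUMN = 223;

// The buggy approach: the value depends only on the row's position in the loop.
function generatedReference(index) {
  return String(index + 1).padStart(4, "0");
}

// Hypothetical stand-in for buildReferenceNumbers.js; the real module may differ.
function buildReferenceNumbers(row) {
  const learnRefNumber = row[LEARN_REF_COLUMN];
  const prevLearnRefNumber = row[PREV_LEARN_REF_COLUMN];

  if (learnRefNumber === undefined || learnRefNumber === "") {
    // Assumption: fail loudly rather than fall back to a generated value.
    throw new Error("Missing LearnRefNumber in source row");
  }

  // Values are copied as-is, with no trimming or reformatting.
  const refs = { LearnRefNumber: learnRefNumber };
  if (prevLearnRefNumber !== undefined && prevLearnRefNumber !== "") {
    refs.PrevLearnRefNumber = prevLearnRefNumber;
  }
  return refs;
}

// Usage with a made-up row that is wide enough to hold both columns.
const exampleRow = new Array(224).fill("");
exampleRow[PREV_LEARN_REF_COLUMN] = "OLD0042";
exampleRow[LEARN_REF_COLUMN] = "LRN0042";

console.log(generatedReference(0)); // "0001"
console.log(buildReferenceNumbers(exampleRow));
```

Throwing on an empty LearnRefNumber is only one possible policy; the point of the sketch is that no value is ever derived from the loop index.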

Refactoring: Factory Module Pattern

Extracted learner-building logic into dedicated factory modules to prevent similar bugs and improve maintainability:

  • buildReferenceNumbers.js - Handles all learner reference numbers (LearnRefNumber, ULN, PrevLearnRefNumber)
  • buildLearnerDetails.js - Core learner information (name, DOB, contact details)
  • buildHealthDetails.js - LLDD and health problem data
  • buildPriorAttainment.js - Prior attainment information

Each module has a single, clear responsibility, making it obvious where data comes from and easier to spot mistakes.

src/utils/pushLearners.js

Simplified from 50+ lines to 24 lines by composing learner objects from focused factory functions, making the data flow clearer and reducing the risk of field mapping errors.
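
The composition described above might look roughly like the following. This is a minimal sketch, not the contents of src/utils/pushLearners.js: the factory bodies are stubs, the row is shown as an object with made-up keys for readability, the output field names other than LearnRefNumber are placeholders, and the employment factory is omitted.

```javascript
// Stub factories standing in for the modules named in this PR.
// All row keys and most output field names are hypothetical.
function buildReferenceNumbers(row) {
  return { LearnRefNumber: row.learnRef };
}

function buildLearnerDetails(row) {
  return { familyName: row.familyName, givenNames: row.givenNames };
}

function buildHealthDetails(row) {
  return { healthProblem: row.healthProblem };
}

function buildPriorAttainment(row) {
  return { priorAttainment: row.priorAttainment };
}

// Each learner is assembled from the row alone; the loop index is never used.
function pushLearners(rows, learners = []) {
  for (const row of rows) {
    learners.push({
      ...buildReferenceNumbers(row),
      ...buildLearnerDetails(row),
      ...buildHealthDetails(row),
      ...buildPriorAttainment(row),
    });
  }
  return learners;
}

// Usage with a single made-up row.
const exampleLearners = pushLearners([
  {
    learnRef: "LRN0042",
    familyName: "Example",
    givenNames: "Sam",
    healthProblem: "2",
    priorAttainment: "3",
  },
]);
console.log(exampleLearners);
```

Because every field comes from a named factory, a wrong mapping can be traced to one small function rather than a long inline object literal.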

src/factories/buildEmploymentArray.js

Moved from src/utils/ to src/factories/ to align with new factory module pattern.

docs/inputs/

Updated schema documentation:

  • Renamed 25_26 Export.csv to 25-26 Export.csv (standardised naming)
  • Updated 25_26 Example.csv with current field mappings
  • Added 25_26 Properties.csv reference documenting CSV column structure

Summary

Imagine discovering that your filing cabinet has been labelled "File001, File002, File003" instead of with the actual document reference numbers written on each file, and then realising that some files are missing their "Previous Reference" labels entirely. This PR restores the proper reference numbers so that everything links up correctly with the historical record system.

Reorganise learner construction by splitting monolithic functions into focused factory modules. Move files from src/utils to src/factories to better reflect their purpose as data builders rather than general utilities.

- Extract reference number, learner details, health details, prior attainment, and employment logic into separate factory modules
- Simplify pushLearners by composing learner objects from factory functions
- Update imports to reflect new factory structure
- Clean up formatting in main.js and buildLearningDeliveryArray

@JasonWarrenUK JasonWarrenUK merged commit f1acfb5 into main Dec 10, 2025
3 checks passed
@JasonWarrenUK JasonWarrenUK deleted the fix/apply-legacy-reference-numbers branch December 10, 2025 15:11