Update 0035-metadata-csv-utf-8-file-validation.md
ian-hoyle authored Oct 16, 2024
1 parent a96216d commit a34ef71
Showing 1 changed file with 1 addition and 1 deletion.
@@ -20,7 +20,7 @@ The structure and integrity of the CSV file is checked before the actual [data i
The metadata CSV files are prepared by the transferring body. It is expected that **Microsoft Excel** will be used, and users can be required to save with the UTF-8 option.
Files saved in this manner will include the BOM.

-The tdr-draft-metadata-validator will use the presence of the BOM (Byte order mark) at the beginning of the file. This indicates the file may have been created with **Microsoft Excel** and explicitly saved as UTF-8.
+The tdr-draft-metadata-validator will use the presence of the BOM (Byte order mark) at the beginning of the file. This indicates the file may have been created with **Microsoft Excel** and explicitly saved as UTF-8. This is a simple and fast check that only looks at the first three bytes, 'EF BB BF'.
This method is restrictive, as other spreadsheet software such as LibreOffice (Linux) and Numbers (Mac) does not add the BOM.
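
The BOM check described above can be sketched as follows. This is a minimal illustration in Python, not the tdr-draft-metadata-validator's actual code; the function names `has_utf8_bom` and `file_has_utf8_bom` are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (bytes EF BB BF)."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Hypothetical helper: read only the first three bytes of a file."""
    with open(path, "rb") as f:
        return has_utf8_bom(f.read(3))
```

Because only three bytes are read, the cost does not depend on file size. A valid UTF-8 file without a BOM, such as one exported from LibreOffice, would fail this check.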

The TNA CSV validator uses the TNA utf8-validator library. This library checks the byte sequence and invalid single bytes. It does not check for the BOM, whose presence suggests a file was explicitly saved as UTF-8 from Microsoft Excel.
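
To illustrate how a byte-sequence check differs from a BOM check, here is a rough sketch that uses Python's strict UTF-8 decoder as a stand-in. It does not use or reproduce the TNA utf8-validator library, and the function name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if every byte sequence in the data is well-formed UTF-8."""
    try:
        # Strict decoding raises on invalid single bytes or truncated sequences.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Under this kind of check a file with no BOM can pass, and a file that starts with a BOM can still fail if later bytes are malformed, so the two checks answer different questions.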