fix: strip the UTF8 BOM #85

Merged 1 commit on Mar 30, 2023


pgerlach
Contributor

The input file is read as UTF-8, and the csv-parse documentation says of the bom option: "It is recommended to always activate this option when working with UTF-8 files." (https://csv.js.org/parse/options/bom/).

This fixes the case where the file starts with a BOM. In that case the first column was not detected, because the BOM character was included as the first character of the first column name.

If the file has no BOM, then the option does nothing.
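To illustrate the failure mode described above, here is a minimal Node.js sketch that uses only built-ins. It is not the code from this PR, which enables the bom option of csv-parse. The helpers `stripBom` and `headerNames` are hypothetical names introduced for illustration, and the header split is deliberately naive (no quoting support).

```javascript
// Hypothetical helper (not part of githubCsvTools or csv-parse):
// drop a leading U+FEFF if present, otherwise return the text unchanged.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Naive header split, for illustration only (does not handle quoted fields).
function headerNames(text) {
  return text.split(/\r?\n/)[0].split(",");
}

// Build a UTF-8 buffer that starts with the BOM bytes ef bb bf.
const bytes = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from("title,body\r\nUTF-8 BOM,some text", "utf8"),
]);

// Buffer#toString("utf8") keeps the BOM as U+FEFF at the start of the string.
const decoded = bytes.toString("utf8");

const before = headerNames(decoded);
const after = headerNames(stripBom(decoded));

console.log(before[0] === "title"); // false: the name is "\uFEFFtitle"
console.log(after[0] === "title");  // true
```

With no BOM present, `stripBom` returns its input unchanged, which mirrors the "does nothing" behaviour described for the option.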

We read the input file as UTF-8, and the csv-parse documentation
states: "It is recommended to always activate this option when working
with UTF-8 files."
@gavinr
Owner

gavinr commented Mar 24, 2023

Thanks for this. Can you please provide an example CSV file that currently fails to parse and that this PR fixes?

@pgerlach
Contributor Author

Sure! This is an export from Excel using the "CSV UTF-8" format.

csv-file-with-utf8-bom.csv

hexdump shows that it begins with the UTF-8 BOM 0xefbbbf.

$ hexdump -C csv-file-with-utf8-bom.csv
00000000  ef bb bf 74 69 74 6c 65  2c 62 6f 64 79 0d 0a 55  |...title,body..U|
00000010  54 46 2d 38 20 42 4f 4d  2c 68 61 6e 64 6c 65 20  |TF-8 BOM,handle |
00000020  55 54 46 2d 38 20 66 69  6c 65 73 20 77 69 74 68  |UTF-8 files with|
00000030  20 42 4f 4d                                       | BOM|
00000034
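The same check that hexdump makes visible can be sketched in Node.js. This is an illustrative snippet under stated assumptions, not code from the PR: `hasUtf8Bom` and `withoutUtf8Bom` are hypothetical helper names, and the sample buffer is just the first 16 bytes of the dump above.

```javascript
// The three bytes of the UTF-8 byte order mark.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: true if the buffer starts with ef bb bf.
function hasUtf8Bom(bytes) {
  return bytes.length >= UTF8_BOM.length &&
    bytes.subarray(0, UTF8_BOM.length).equals(UTF8_BOM);
}

// Hypothetical helper: return a view of the buffer without the leading BOM.
function withoutUtf8Bom(bytes) {
  return hasUtf8Bom(bytes) ? bytes.subarray(UTF8_BOM.length) : bytes;
}

// First 16 bytes of the hexdump above.
const firstRow = Buffer.from("efbbbf7469746c652c626f64790d0a55", "hex");

console.log(hasUtf8Bom(firstRow)); // true
console.log(JSON.stringify(withoutUtf8Bom(firstRow).toString("utf8"))); // "title,body\r\nU"
```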

githubCsvTools can't parse it, but it can parse the same file with the BOM removed.

csv-file-without-utf8-bom.csv

@gavinr gavinr merged commit 1ee65e9 into gavinr:master Mar 30, 2023
@gavinr
Owner

gavinr commented Mar 30, 2023

thanks!

github-actions bot pushed a commit that referenced this pull request Mar 30, 2023
## [3.1.7](v3.1.6...v3.1.7) (2023-03-30)

### Bug Fixes

* strip the UTF8 BOM ([#85](#85)) ([1ee65e9](1ee65e9))
@github-actions

🎉 This PR is included in version 3.1.7 🎉

The release is available on:

Your semantic-release bot 📦🚀
