UTF-8 Byte Order Markers should be ignored #156

Open
jetzerb opened this issue May 13, 2021 · 1 comment
jetzerb (Contributor) commented May 13, 2021

If a data file starts with a byte order mark (in my case, the UTF-8 BOM ef bb bf), trdsql should ignore those bytes. Instead, it currently treats them as part of the data:

$ trdsql -ih 'select [Service Type] from data.csv limit 1'
2021/05/13 10:37:05 export: no such column: Service Type [select [Service Type] from `data.csv` limit 1]

$ sed -n '1{s/,.*//; p;}' data.csv  | tee >(hexyl)
Service Type
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ ef bb bf 53 65 72 76 69 ┊ 63 65 20 54 79 70 65 0a │×××Servi┊ce Type_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

trdsql currently treats the 3-byte BOM as a prefix of the "Service Type" column name, so the query only works when the BOM is included in the name:

$ col=$(printf "%b" '\xef\xbb\xbfService Type'); echo $col | hexyl; trdsql -ih -oat "select [$col] from data.csv limit 1"
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ ef bb bf 53 65 72 76 69 ┊ 63 65 20 54 79 70 65 0a │×××Servi┊ce Type_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
+--------------------------------+
|          Service Type          |
+--------------------------------+
| Construction - General         |
+--------------------------------+

I don't know whether other encodings also have byte order marks, but I see that Go's standard library will apparently never handle BOMs, so each application must deal with them individually 😞

noborus (Owner) commented May 14, 2021

Thank you for reporting the issue.
I am aware that trdsql does not work with a UTF-8 BOM, but I would prefer not to support it if possible.
It may be addressed in the future, but for now I would ask you to handle it in some other way if you can.
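
One way to handle it outside trdsql, as a minimal sketch rather than an official recommendation: strip the BOM into a copy of the file before querying it. The helper name strip_bom and the output file name data_nobom.csv are hypothetical, and the sketch assumes a head that supports -c (GNU and BSD both do).

```shell
# strip_bom FILE: write FILE to stdout without a leading UTF-8 BOM (ef bb bf).
# strip_bom is a hypothetical helper for illustration; it is not part of trdsql.
strip_bom() {
    # Dump the first three bytes as hex and compare them with the BOM.
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"   # BOM present: emit everything from byte 4 onward
    else
        cat "$1"          # no BOM: pass the file through unchanged
    fi
}
```

Assuming the BOM is the only problem, the original query should then find the column in the cleaned copy:

$ strip_bom data.csv > data_nobom.csv
$ trdsql -ih 'select [Service Type] from data_nobom.csv limit 1'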
