bat shows BOM at the start of the line #285

Closed
rasjani opened this issue Sep 5, 2018 · 7 comments · Fixed by #337
Labels
feature-request New feature or request

Comments


rasjani commented Sep 5, 2018

When the source code has a BOM at the beginning of the line, it's shown as "<U+FEFF>". I have no huge problem with this per se, but it would be more user-friendly to strip it when showing the code and to put that information, as an interpreted value, on the same line as "File:", for example as an additional piece like "Encoding: UTF-16 Big Endian".

Happens at least with bat 0.5.0.

See https://en.wikipedia.org/wiki/Byte_order_mark for details
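
As a rough illustration of the request above (not bat's actual implementation), detecting and stripping a BOM could look like the following Rust sketch. The names `BomEncoding`, `label`, and `split_bom` are hypothetical and only the BOM byte sequences themselves are standard. One assumption to note: a buffer starting with `FF FE 00 00` is treated as UTF-32LE, although it could in principle be UTF-16LE text whose first character is NUL.

```rust
/// Encodings that can be recognized from a leading byte order mark.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum BomEncoding {
    Utf8,
    Utf16Be,
    Utf16Le,
    Utf32Be,
    Utf32Le,
}

impl BomEncoding {
    /// Human-readable label, e.g. for an "Encoding:" field in a file header.
    fn label(self) -> &'static str {
        match self {
            BomEncoding::Utf8 => "UTF-8",
            BomEncoding::Utf16Be => "UTF-16 Big Endian",
            BomEncoding::Utf16Le => "UTF-16 Little Endian",
            BomEncoding::Utf32Be => "UTF-32 Big Endian",
            BomEncoding::Utf32Le => "UTF-32 Little Endian",
        }
    }
}

/// Splits a leading BOM off `bytes`.
/// Returns the detected encoding (if any) and the bytes that follow the BOM.
fn split_bom(bytes: &[u8]) -> (Option<BomEncoding>, &[u8]) {
    // The 4-byte UTF-32 BOMs are checked first, because the UTF-32LE BOM
    // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
    let (encoding, bom_len) = if bytes.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        (BomEncoding::Utf32Be, 4)
    } else if bytes.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        (BomEncoding::Utf32Le, 4)
    } else if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        (BomEncoding::Utf8, 3)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        (BomEncoding::Utf16Be, 2)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        (BomEncoding::Utf16Le, 2)
    } else {
        // No BOM: leave the input untouched.
        return (None, bytes);
    };
    (Some(encoding), &bytes[bom_len..])
}
```

A caller would pass the raw file contents to `split_bom`, print only the returned remainder, and show `label()` of the detected encoding in the header.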

@sharkdp sharkdp added the feature-request New feature or request label Sep 5, 2018
Owner

sharkdp commented Sep 5, 2018

Thank you for your feedback.

I have recently written a small library (https://crates.io/crates/content_inspector) that I plan to use within bat. It will be able to detect BOMs, which should make this feature request possible. I love the idea of showing the encoding within the header. I was always planning to put more information there.

Owner

sharkdp commented Sep 5, 2018

Just curious: I have never encountered a UTF-16BE file "in the wild". Where do they appear? Windows?

Author

rasjani commented Sep 5, 2018

Yeah, this came up in C# code that only targets Windows. But to be honest, I didn't check if that FEFF BOM was BE or LE :)


lzybkr commented Oct 6, 2018

There was a commonly used Windows tool that generated UTF-16BE files by default (for no good reason really), but the default was thankfully changed to UTF8.

When adding BOM detection, it would be great to convert UTF16 to UTF8. Right now, UTF16-LE renders ^@ between each character, presumably because the text is being treated as UTF8, e.g.:

image
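
To illustrate the point above, here is a minimal Rust sketch of a UTF-16 to UTF-8 conversion using only the standard library. It is an assumption-laden example, not the code that bat uses: `Endian` and `utf16_to_string` are hypothetical names, the input is assumed to have had its BOM removed already, invalid surrogates are replaced with U+FFFD, and a trailing odd byte is silently ignored.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Byte order of a UTF-16 byte stream. (Hypothetical type, for illustration.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endian {
    Little,
    Big,
}

/// Decodes UTF-16 bytes (without a BOM) into a `String`, which Rust always
/// stores as UTF-8.
fn utf16_to_string(bytes: &[u8], endian: Endian) -> String {
    // Combine each pair of bytes into one 16-bit code unit.
    let units = bytes.chunks_exact(2).map(|pair| {
        let pair = [pair[0], pair[1]];
        match endian {
            Endian::Little => u16::from_le_bytes(pair),
            Endian::Big => u16::from_be_bytes(pair),
        }
    });
    // Turn code units into chars, replacing unpaired surrogates.
    decode_utf16(units)
        .map(|result| result.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}
```

For ASCII text, every UTF-16LE code unit has a zero high byte, so "hi" is stored as the bytes `68 00 69 00`. Read as UTF-8, those bytes are `h`, NUL, `i`, NUL, which would explain the `^@` shown between characters.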

Owner

sharkdp commented Oct 6, 2018

> When adding BOM detection, it would be great to convert UTF16 to UTF8.

That would be the plan, yes.

> There was a commonly used Windows tool that generated UTF-16BE files by default

Notepad? :-)


lzybkr commented Oct 7, 2018

> Notepad? :-)

That'd be funny, but no, it was powershell_ise.exe.

Thanks for the quick fix.

Owner

sharkdp commented Oct 17, 2018

Released in v0.8.0. @rasjani Could you please check if this works for you?
