Is it a bug to return BOM header after calling fs.readFile when the text encoding (like utf8) is specified? #6924
Comments
It seems developers leave this to userland: #3040
I don't understand the rationale behind such a decision; it seems to be semantically incorrect ... It would not take much to do this in core at all.
Well, I am also tired of writing this time after time:
const rli = rl.createInterface({input: fs.createReadStream(path, 'utf8')});
let lineNumber = 0;
rli.on('line', line => {
  // Only the first line can carry the BOM.
  if (++lineNumber === 1) line = line.replace(bomRE, '');
  //...
});
Maybe something will change in the distant future.
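A self-contained variant of the workaround above might look like the following. This is only a sketch: `makeBomStripper` and `forEachLine` are hypothetical helper names, not part of Node.js, and it assumes the file is UTF-8 so that the BOM shows up as a single leading U+FEFF character on the first line.

```javascript
const fs = require('fs');
const readline = require('readline');

// Returns a function that removes a leading U+FEFF from the first line it
// receives and passes every later line through untouched.
function makeBomStripper() {
  let first = true;
  return (line) => {
    if (first) {
      first = false;
      return line.charCodeAt(0) === 0xFEFF ? line.slice(1) : line;
    }
    return line;
  };
}

// Hypothetical helper: calls onLine for each line of a UTF-8 text file,
// with the BOM (if any) already removed from the first line.
function forEachLine(path, onLine) {
  const stripBom = makeBomStripper();
  const rli = readline.createInterface({
    input: fs.createReadStream(path, 'utf8'),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  rli.on('line', (line) => onLine(stripBom(line)));
  return rli;
}
```

Keeping the stripping logic in a small closure means it can be reused with any line source, not just readline.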
There are multiple (15+) different Byte Order Mark (BOM) values that can be received from a stream. The actual value of the BOM is very relevant to someone implementing a generic parser capable of handling anything thrown at it, or at least of failing gracefully. If core receives the BOM and stops passing it through, then core should also handle all the nuances of every encoding and endianness accordingly and somehow present them in a uniform way to userland streams. That seems unlikely to happen, and no one wants core to make dumbed-down assumptions as a shortcut just because the majority of users expect to receive UTF-8 on a little-endian CPU. So, it is true that requesting a stream in UTF-8 has a fixed BOM of Maybe an optional parameter for dropping BOMs from read streams could be offered?
Yeah, what you suggested could be an option for the time being ... But still ... I am not an expert in text encoding, so I do not know many of its intricacies. But to me it seems that byte ordering is platform dependent only: after having successfully decoded a block of text from the raw bytes, Node core has already made the right conversion (using the BOM) and sent the resulting text to userland. What is the use of the BOM information after that, inside userland, which is most likely doing only platform-independent programming? Those who write generic parsers (the real men :-)) can always read the raw bytes into a Buffer object and start from there. Am I missing something in my statements?
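The "read the raw bytes into a Buffer and start from there" approach can be sketched roughly as below. `detectBom` and the `BOMS` table are hypothetical names, and the table covers only the five common UTF-8/16/32 marks, not every BOM mentioned in this thread.

```javascript
// Longer marks come first: the UTF-32 LE mark begins with the same two bytes
// as the UTF-16 LE mark. Note the inherent ambiguity: FF FE 00 00 could also
// be a UTF-16 LE BOM followed by a NUL character; this sketch prefers UTF-32.
const BOMS = [
  { encoding: 'utf-32le', bytes: [0xFF, 0xFE, 0x00, 0x00] },
  { encoding: 'utf-32be', bytes: [0x00, 0x00, 0xFE, 0xFF] },
  { encoding: 'utf-8', bytes: [0xEF, 0xBB, 0xBF] },
  { encoding: 'utf-16le', bytes: [0xFF, 0xFE] },
  { encoding: 'utf-16be', bytes: [0xFE, 0xFF] },
];

// Returns { encoding, length } for a recognized BOM at the start of buf,
// or null when the buffer does not start with one of the marks above.
function detectBom(buf) {
  for (const { encoding, bytes } of BOMS) {
    if (buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b)) {
      return { encoding, length: bytes.length };
    }
  }
  return null;
}
```

A caller could read a file without an encoding argument (so a Buffer comes back), run `detectBom` on it, and then decode `buf.slice(bom.length)` with whatever decoder suits the detected encoding.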
Exactly. For example: embedded system code that gets cross-compiled to multiple platforms, code interfacing with legacy systems/hardware or mission-critical code with need to gracefully handle every possible input scenario; no matter how implausible.
You might be right about that, but I'm still arguing that BOMs are by no means an irrelevant nuisance, and exposing them makes Node.js more capable than it would be if it hid them. Streams starting with BOMs are defined in the Unicode standard. That said, I still think that it would be a good idea to offer an option for BOM-stripped streams, just for developer comfort. Also, BOMs could be used to implement
It would potentially be possible to add a
Maybe for streams it would also be very helpful.
This issue has been inactive for sufficiently long that it seems like perhaps it should be closed. Feel free to re-open (or leave a comment requesting that it be re-opened) if you disagree. I'm just tidying up and not acting on a super-strong opinion or anything like that.
fs.readFile* don't strip the BOM header, even when specifying 'utf8' as the encoding, and JSON.parse() doesn't handle it either. There are technically a bunch of BOM indicators, but this is the one seen most commonly and actually appears in a number of package.json files in the wild. See nodejs/node#6924, nodejs/node#3040 for background.
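The JSON.parse() problem described in that commit message can be worked around in userland along these lines. This is a minimal sketch, assuming the text was decoded as UTF-8 so the BOM appears as a leading U+FEFF; `stripBom` and `parseJson` are hypothetical helper names.

```javascript
// Removes a single leading U+FEFF (the decoded form of the UTF-8 BOM).
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// JSON.parse does not treat U+FEFF as whitespace, so a BOM-prefixed document
// is rejected with a SyntaxError unless the mark is removed first.
function parseJson(text) {
  return JSON.parse(stripBom(text));
}
```

Applied to the contents of a package.json read with the 'utf8' encoding, `parseJson` accepts files both with and without the mark.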
When loading text content from filepath that is UTF-8 encoded and has a BOM header, the content returned still contains the BOM header in addition to the text content. It causes a lot of wasted effort to remove such a header in custom code, and it is not the behavior seen in other languages ... Removing such a header in Node shouldn't affect most existing code that depends on it, since such a header is most likely not used by custom code anyway.
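The reported behavior can be reproduced with a short script such as the one below. It is a sketch only: `readBackWithBom` is a hypothetical name, and the temporary directory and file name are arbitrary choices for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Writes "hello" preceded by the UTF-8 BOM bytes (EF BB BF) to a temporary
// file, reads it back with the 'utf8' encoding, and returns the string.
function readBackWithBom() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bom-demo-'));
  const file = path.join(dir, 'sample.txt');
  const bom = Buffer.from([0xEF, 0xBB, 0xBF]);
  fs.writeFileSync(file, Buffer.concat([bom, Buffer.from('hello', 'utf8')]));
  const content = fs.readFileSync(file, 'utf8');
  // Clean up the temporary file and directory.
  fs.unlinkSync(file);
  fs.rmdirSync(dir);
  return content;
}

const content = readBackWithBom();
// The decoded string is one character longer than "hello": the BOM survives
// decoding as a leading U+FEFF.
console.log(content.length, content.charCodeAt(0).toString(16));
```

Callers who do not need the mark can drop it with `content.slice(1)` after checking that the first character code is 0xFEFF.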