Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to get swap endian for the Buffer class #12813

Closed
Pomax opened this issue May 3, 2017 · 10 comments
Closed

Document how to get swap endian for the Buffer class #12813

Pomax opened this issue May 3, 2017 · 10 comments
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.

Comments

@Pomax
Copy link

Pomax commented May 3, 2017

As the Buffer class currently only supports utf16 in little endian ordering, Buffer cannot be used to read in data from things like OpenType fonts, which by definition are always in big endian ordering, irrespective of the hardware or data reader (see https://www.microsoft.com/typography/OTSpec/otff.htm, "data types", All OpenType fonts use Motorola-style byte ordering (Big Endian)).

Can Buffer be given a utf16be to match the already present utf16le, so that Buffer can be used with all utf16 data, rather than only with data that is in little endian ordering?

@addaleax addaleax added buffer Issues and PRs related to the buffer subsystem. feature request Issues that request new features to be added to Node.js. labels May 3, 2017
@addaleax
Copy link
Member

addaleax commented May 3, 2017

For what it’s worth, buffer.swap16() is there pretty much for this kind of thing. Do you think that would be enough to cover your use case? (It should be, but if your requirements are not matched by it, it would be good to know why.)

@Pomax
Copy link
Author

Pomax commented May 3, 2017

This would need to get called in every single place where buffers need to be turned into strings, which in a complex, pure Big Endian data file like an OpenType program is "tons of places", so not having a utf16be serialization but having a swap + toString makes every instance of that construction a potential bug, where new devs may forget to add the crucially important swap instruction, necessitating additional linting and build chain steps just to verify that people remembered to call swap prior to toString, but only for big endian data - that's basically an impossible task without relying on runtime errors.

That's a ton of work that can be avoided by offering utf16be in parallel with utf16le, so that people (and more importantly, code) can be explicit about what's happening in the code. Adding a utf16be will be both safer, DRYer (is that a word?) code.

@mscdex
Copy link
Contributor

mscdex commented May 3, 2017

IIRC the only reason utf16le/ucs2 support exists is because that's (supposedly) how JS strings are stored internally in V8, so that encoding comes free.

I'm not particularly keen on adding more encodings to core. There are third-party modules like iconv-lite that are more suitable for converting to/from other encodings.

@Pomax
Copy link
Author

Pomax commented May 3, 2017

To be clear, if there is a swap16, then offering the encoding utf16be but implementing it by under the bonnet "reading in the data, then calling swap16 in-place without the user having to type that" would be perfectly acceptable.

That is, as long as the docs of course have a little note for that encoding to explain that using utf16be incurs some processing overhead.

And in the mean time it might be worth updating the docs to, in the valid encodings section, mention that there is no utf16be but that buffer.swap16() exists. Additionally it's probably important to update the docs for the swap16() function, as most people won't find it if they're searching with terms that make sense in this context: the utf16be term (which people may be thinking of when searching because they know utf16le exists) won't find any hits on the page, and the terms little endian, big endian, or even just endian (which people will use because they know the "correct" names for byte ordering) won't find any results either, as the swap16() documentation does not mention these at all. So even if you know what you should be finding, the docs currently don't let you.

@mscdex
Copy link
Contributor

mscdex commented May 3, 2017

I'm mainly concerned about the performance hit brought with having to check yet another encoding name, especially for an encoding that is not common (I would bet utf16le is just as uncommon). It's not so much the difficulty of the actual implementation of the encoding.

@TimothyGu TimothyGu added doc Issues and PRs related to the documentations. and removed feature request Issues that request new features to be added to Node.js. labels May 4, 2017
@TimothyGu TimothyGu changed the title add "utf16be" as encoding for the Buffer class Document how to get swap endian for the Buffer class May 4, 2017
@TimothyGu
Copy link
Member

Relabeling this issue as a documentation one then.

@TimothyGu
Copy link
Member

@mscdex To be fair V8 internally stores strings as native-endian, and we swap it internally on creation when the machine is big endian:

node/src/node_buffer.cc

Lines 628 to 629 in ff001c1

if (IsBigEndian())
SwapBytes16(reinterpret_cast<char*>(&str[0]), str_length);

@Pomax
Copy link
Author

Pomax commented May 4, 2017

I'm cool with doc improvements.

@Trott
Copy link
Member

Trott commented Aug 10, 2017

Anybody up for volunteering for these doc improvements?

@BridgeAR BridgeAR added help wanted Issues that need assistance from volunteers or PRs that need help to proceed. good first issue Issues that are suitable for first-time contributors. labels Apr 22, 2018
@BeniCheni
Copy link
Contributor

BeniCheni commented Jun 3, 2018

Hi maintainer(s), analyzed the details of this "low-hanging", "good first issue" issue. Saw the timestamp as old as "Aug 10, 2017" in last comment, so I gave the simple PR a shot in #21111. Thanks in advance for review or issue comments here. I'll update the PR according, Or happy to close the PR if it's irrelevant.

P.S: Was trying to look out for a test or any simple change with "good first issue", but there are a lot of enthusiastic contributors out there that the "triffic" is not exactly "light" 😅 Will keep looking out the labels on the issue list.

P.P.S.: If anyone has advice for any "deeper" but still "good first issue" with a tiny bit of challenge, it's greatly appreciated. But I have been having a lot of fun updating docs as well. 👍

jasnell added a commit to jasnell/node that referenced this issue Oct 19, 2018
addaleax pushed a commit that referenced this issue Oct 20, 2018
Fixes: #12813

PR-URL: #23747
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
jasnell added a commit that referenced this issue Oct 21, 2018
Fixes: #12813

PR-URL: #23747
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
MylesBorins pushed a commit that referenced this issue Oct 30, 2018
Fixes: #12813

PR-URL: #23747
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
rvagg pushed a commit that referenced this issue Nov 28, 2018
Fixes: #12813

PR-URL: #23747
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
MylesBorins pushed a commit that referenced this issue Nov 29, 2018
Fixes: #12813

PR-URL: #23747
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants