String encoding is often unspecified #968
Comments
BinaryEncoding.md should only specify length + bytes. The field is not a string but a collection of bytes, and it is treated as such. An embedder such as JavaScript can then restrict the encoding to something it can handle, such as UTF-8. This is what JS.md should do. Any divergence from this is a spec bug.
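A minimal sketch of the "length + bytes" reading described in this comment, written in Python. It assumes the length prefix is an unsigned LEB128 integer (the spec's `varuint32`); the function names `read_varuint32`, `read_name_bytes`, and `embedder_decode` are hypothetical and exist only for illustration. The binary layer returns opaque bytes, and a separate embedder-level step decides whether they must be UTF-8.

```python
from typing import Tuple


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos).

    Assumption: length prefixes are LEB128-encoded (varuint32).
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_name_bytes(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Binary level: read length + bytes and return them uninterpreted."""
    length, pos = read_varuint32(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("name field runs past the end of the buffer")
    return buf[pos:end], end


def embedder_decode(raw: bytes) -> str:
    """Hypothetical embedder policy: accept only well-formed UTF-8.

    bytes.decode raises UnicodeDecodeError on malformed input.
    """
    return raw.decode("utf-8")
```

With this split, a different embedder could replace `embedder_decode` with its own policy without touching the binary-level reader.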
Hypothetically, an embedder could also ban UTF-8. The current spec builds in no assurances of interoperability among embedders.
I'm assuming that the motivation is to emphasize that, at the binary level, strings are always treated as opaque binary keys and that the embedder has flexibility/responsibility in how it handles invalid identifiers, case-folding for case-insensitive languages, etc. If so, perhaps BinaryEncoding.md could be explicit about the opaque treatment of strings, and still state that strings are UTF-8 encoded to close the interoperability issue at the binary level? (...or was there some other motivation to leave the door open for UTF-16, etc.?)

PS - I am coming from the language compiler author's perspective, looking to the binary encoding spec to inform me about which bytes to emit. I realize that from the bytecode compiler side, it is desirable for string encoding to be an uninteresting detail.
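From the compiler author's side, the question of "which bytes to emit" could look like the Python sketch below, assuming strings are written as UTF-8 bytes behind an unsigned LEB128 (`varuint32`) length prefix. Both helper names, `encode_varuint32` and `emit_name`, are hypothetical, and the choice of UTF-8 here is exactly the assumption this thread is asking the spec to confirm.

```python
def encode_varuint32(value: int) -> bytes:
    """Encode a non-negative integer below 2**32 as unsigned LEB128."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in varuint32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, continuation bit clear
            return bytes(out)


def emit_name(name: str) -> bytes:
    """Emit a name as length + bytes.

    Assumption: the payload is UTF-8, and the length counts bytes,
    not characters.
    """
    payload = name.encode("utf-8")
    return encode_varuint32(len(payload)) + payload
```

Note that the length prefix counts encoded bytes, so a name containing non-ASCII characters gets a length larger than its character count.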
The JavaScript API requires UTF-8 for every string it uses. So, if you intend to run your compiler's output directly in a web browser, UTF-8 is what you want. The binary encoding document makes no mention of JavaScript at all. It's extremely "pure"... it tells you everything you need to know about reading and writing WASM with no discussion about how it fits into a larger ecosystem. You really need to read that, plus Semantics and JavaScript API, to get a full picture of how it actually works today.
Was this fixed by 8e5ecc3?
Thank you! :)
In BinaryEncoding.md, 'fun_name_str' and 'local_name_str' specify "valid utf8 encoding".
Other string fields are less specific about the encoding (e.g., "module string of module_len bytes").
I assume utf8 is permitted everywhere?