Unicode identifiers in the WAT format #1843
The new syntax merely requires delimiting identifiers with quote characters. Escapes are not necessary, except in the exceptional case of names that would not be allowable as unquoted identifiers anyway, such as ones that themselves contain quotes or control characters.

The Wasm text format is a lightweight interchange format used by a wide variety of tools, with varying degrees of complexity and resource constraints, on a wide range of platforms, from the Web to small embedded systems. Undelimited Unicode identifiers, if handled properly according to Unicode UAX #31, would add substantial complexity to both the specification and implementations: Unicode's definition of an identifier is complicated and requires Unicode property tables to handle. The burden would fall on all tools processing the Wasm text format, and the feature is unlikely to be implemented by all of them, causing fragmentation. In contrast, to understand quoted identifiers, tools merely need to implement UTF-8 decoding, which is a few lines of code.

As UAX #31 itself admits: "The disadvantage of working with the lexical classes defined previously is the storage space needed for the detailed definitions, plus the fact that with each new version of the Unicode Standard new characters are added, which an existing parser would not be able to recognize. In other words, the recommendations based on that table are not upwardly compatible." Unfortunately, the alternative it suggests (negative character classification) also has serious problems, such as reserving the entire code space for identifiers, and hence turning many future extensions to the language's lexical syntax that would otherwise be conservative into breaking changes.
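To give a rough sense of what "a few lines of code" means for UTF-8 decoding, here is a minimal sketch in Python. It is only an illustration, not code from any Wasm tool: the function name `decode_utf8` and its error messages are made up for this example. It turns the bytes of a quoted identifier into code points and rejects malformed input, without needing any Unicode property tables.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode code points (hypothetical helper).

    Raises ValueError on malformed input: stray continuation bytes,
    truncated sequences, overlong encodings, surrogates, or values
    above U+10FFFF.
    """
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte determines the sequence length and the smallest
        # code point that may legally use that length.
        if b0 < 0x80:
            cp, extra, min_cp = b0, 0, 0x0
        elif 0xC0 <= b0 < 0xE0:
            cp, extra, min_cp = b0 & 0x1F, 1, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, extra, min_cp = b0 & 0x0F, 2, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, extra, min_cp = b0 & 0x07, 3, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for k in range(1, extra + 1):
            b = data[i + k]
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        code_points.append(cp)
        i += extra + 1
    return code_points
```

For example, `decode_utf8("名前".encode("utf-8"))` yields the same code points as `[ord(c) for c in "名前"]`. Everything here is fixed arithmetic on bytes, so it does not change when a new version of the Unicode Standard adds characters.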
Simply because it did not make the feature cut, which already happened in 2021. But Wasm 3 is essentially done at this point, so it will be pushed into the process immediately after Wasm 2 is published.
In #1618:
We (W3C i18n WG) have two questions about the resolution:
1. Why are Unicode identifiers not allowed directly in the WebAssembly text format (i.e., string-escaping seems to be required)? Although web developers usually don't read them, devtools developers, Wasm module authors, or WebAssembly compiler developers might read them and find Unicode identifiers useful. Escapes will make the identifiers unreadable. See https://github.com/unicode-org/message-format-wg/blob/5f6657b54f60b35a8fb17653942551ebf0b862ca/spec/message.abnf#L56 for an example of a language supporting Unicode identifiers, using XML-Name related restrictions.
2. Why is it only supported in Wasm 3, but not Wasm 2 (which is not CR yet)?