[RFC] Remove "normalized to NFKC" clause from the reference manual, section 3.1 #12388
We actually don't do any normalization of source input right now; the manual is just plain wrong in claiming it. A PR removing that sentence would probably be welcome.
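Here is a minimal sketch of what "no normalization" means for a program. This is my own illustration, not from the thread. The two literals below are canonically equivalent spellings of "é", yet they remain distinct strings because their code points are kept exactly as written.

```rust
// Without a normalization step, string literals keep exactly the code points
// that were written. Both literals render as "é" in most fonts.
fn precomposed() -> &'static str {
    "\u{E9}" // U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point)
}

fn decomposed() -> &'static str {
    "e\u{301}" // U+0065 followed by U+0301 COMBINING ACUTE ACCENT (two code points)
}

fn main() {
    let (a, b) = (precomposed(), decomposed());
    // `==` on &str compares bytes, so the two spellings are not equal.
    println!("equal: {}", a == b); // equal: false
    println!("bytes: {} vs {}", a.len(), b.len()); // bytes: 2 vs 3
    println!("chars: {} vs {}", a.chars().count(), b.chars().count()); // chars: 1 vs 2
}
```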
@Kimundi IMHO, the text requires that we programmers normalize our source code before compilation, so I would ask whether that is really needed.
That's not what the manual is saying; the manual is saying that it …
On Wed, Feb 19, 2014 at 4:01 AM, OGINO Masanori <notifications@github.com> wrote:
@cmr Thank you for clarifying. Why does the spec say so?
@omasanori Probably because there is precedent in other languages for doing such normalization, at least for identifiers. (Though if the spec implies doing it in string constants, then that is probably just sloppiness in the writing of the spec.) Related bug: #2253

Update: to clarify, graydon originally wanted to do NFKC normalization in the lexer (as noted in the bug above), but he changed his mind, and so we have been in a bit of a state of limbo ever since. But as I said above, I think the scope of that normalization was intended to be restricted to identifiers, not all lexical syntax (i.e. not the interior of string constants).
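To make the identifier-only scope concrete, here is a rough sketch. Everything in it is hypothetical. `toy_nfkc_char` covers only three real NFKC compatibility mappings in place of the full Unicode tables. The "lexer" is also deliberately naive: it ignores escapes, comments and char literals, and treats any alphanumeric run as an identifier. The point is only that identifiers get normalized while the interior of string constants is copied unchanged.

```rust
// Hypothetical toy: three real NFKC compatibility mappings standing in for
// the full Unicode normalization tables.
fn toy_nfkc_char(c: char) -> Option<&'static str> {
    match c {
        '\u{FB01}' => Some("fi"), // LATIN SMALL LIGATURE FI
        '\u{FF21}' => Some("A"),  // FULLWIDTH LATIN CAPITAL LETTER A
        '\u{00B2}' => Some("2"),  // SUPERSCRIPT TWO
        _ => None,
    }
}

// Applies the toy mapping to every character of one identifier.
fn normalize_ident(ident: &str) -> String {
    let mut out = String::new();
    for c in ident.chars() {
        match toy_nfkc_char(c) {
            Some(s) => out.push_str(s),
            None => out.push(c),
        }
    }
    out
}

/// Normalizes identifier-like runs only; the contents of "..." string
/// literals are copied verbatim. Escapes inside literals are not handled.
fn normalize_identifiers_only(src: &str) -> String {
    let mut out = String::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '"' {
            // Copy the whole string literal untouched, up to the closing quote.
            out.push(c);
            for d in chars.by_ref() {
                out.push(d);
                if d == '"' {
                    break;
                }
            }
        } else if c.is_alphanumeric() || c == '_' {
            // Collect one identifier-like run, then normalize it.
            let mut ident = String::from(c);
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            out.push_str(&normalize_ident(&ident));
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    // The identifier uses the U+FB01 ligature, and so does the literal.
    let src = "let \u{FB01}le = \"\u{FB01}le\";";
    // The identifier becomes `file`; the literal keeps U+FB01.
    println!("{}", normalize_identifiers_only(src));
}
```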
@pnkfelix Thank you. I agree with normalizing identifiers; I think that is acceptable. Admittedly, some similar but distinct identifiers would then be treated as the same, but we should not rely on such tricks anyway. (The NFKC vs. NFC question remains, though.)
The reference manual said that code is interpreted as UTF-8 text and that an implementation will normalize it to NFKC. However, rustc doesn't do any normalization now. We may want to apply some normalization to symbols, but normalizing the whole text seems harmful because doing so loses information, even if we choose a non-K (canonical) normalization form. I'd suggest removing the "normalized to Unicode normalization form NFKC" phrase for now so that the manual accurately reflects the current state. When we address the problem (with an RFC?), the manual should be updated. Closes #12388. Reference: #2253
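Here is a small sketch of the information-loss argument. It uses a hypothetical `toy_nfkc_text` that implements only three real NFKC compatibility mappings. Once normalization is applied to the whole text, two different sources can become identical, so the original spelling inside the literal can no longer be recovered.

```rust
// Hypothetical toy: three real NFKC compatibility mappings standing in for
// the full Unicode tables. It is applied to the *whole* source text, so the
// contents of string literals are rewritten too.
fn toy_nfkc_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '\u{00B2}' => out.push('2'),      // SUPERSCRIPT TWO
            '\u{2460}' => out.push('1'),      // CIRCLED DIGIT ONE
            '\u{FB01}' => out.push_str("fi"), // LATIN SMALL LIGATURE FI
            other => out.push(other),
        }
    }
    out
}

fn main() {
    // Two different source texts: "m²" versus "m2" inside a string literal.
    let a = "let unit = \"m\u{00B2}\";";
    let b = "let unit = \"m2\";";
    assert_ne!(a, b);

    // After whole-text normalization they are indistinguishable.
    let (na, nb) = (toy_nfkc_text(a), toy_nfkc_text(b));
    println!("{}", na); // let unit = "m2";
    println!("same after normalization: {}", na == nb); // same after normalization: true
}
```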
From The Rust Reference Manual:
However, NFKC requires transforming some characters into different ones even inside strings or comments, so we would get different results in such cases. Even NFC has some problems if text must be preserved exactly.
(Yes, the word "different" is ambiguous: under NFKC they are treated as the same, but their glyphs differ, although sometimes that depends on the font.)
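The NFC point can be sketched the same way. The functions below are hypothetical toys covering only a few real mappings. Both forms apply one canonical composition (`e` + U+0301 to U+00E9) and one canonical singleton (U+2126 OHM SIGN to U+03A9). Only NFKC applies the one compatibility mapping included here (U+FB01 to `fi`). Even the toy NFC changes the code points of a string, which matters if text has to be preserved exactly.

```rust
// Hypothetical toy: a few real canonical mappings, not full Unicode NFC.
fn toy_nfc(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        let next = chars.peek().copied();
        match (c, next) {
            // 'e' + U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
            ('e', Some('\u{301}')) => {
                chars.next();
                out.push('\u{E9}');
            }
            // U+2126 OHM SIGN is canonically equivalent to U+03A9 GREEK CAPITAL LETTER OMEGA.
            ('\u{2126}', _) => out.push('\u{3A9}'),
            (other, _) => out.push(other),
        }
    }
    out
}

// Toy NFKC: apply the compatibility mapping first, then the canonical step,
// mirroring NFKC's order (compatibility decomposition, then canonical composition).
fn toy_nfkc(text: &str) -> String {
    let mut expanded = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '\u{FB01}' => expanded.push_str("fi"), // LATIN SMALL LIGATURE FI
            other => expanded.push(other),
        }
    }
    toy_nfc(&expanded)
}

fn main() {
    let s = "cafe\u{301}";
    println!("NFC changed the text: {}", toy_nfc(s) != s); // NFC changed the text: true
    println!("NFC keeps the ligature: {}", toy_nfc("\u{FB01}") == "\u{FB01}"); // NFC keeps the ligature: true
    println!("NFKC expands it: {}", toy_nfkc("\u{FB01}")); // NFKC expands it: fi
}
```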
I'd suggest removing the "normalized to NFKC" clause and leaving the input as-is, as Go does. From The Go Programming Language Specification: