-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Do not warn about similar ASCII-only idents #2923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See discussion at rust-lang/rust#55467 (comment) ff. Rename `less_used_codepoints` to `uncommon_codepoints` (rust-lang/rust#55467 (comment))
One caveat of ignoring ASCII-only identifiers is that the compiler still warns about mixed script identifiers with a potentially confusable ASCII character. By this metric |
Looks great. By the way, another unsolved question:
Reading the actual generated table, the answer is also clearly 'yes', i think. The mixed script confusable alphabet for ascii latin is: and a single lib.rs with cc @Manishearth |
@crlf0710 that table seems incorrectly generated. |
oh, never mind |
Yes, we need a Latin exception, because the stdlib is Latin. The RFC states this outright. |
Should I just remove the (un)resolved question about |
No, that's not how unresolved questions work, we resolve them in the tracking issue if necessary. |
Implementation adjustment is ready at rust-lang/rust#72069 |
Is there some way we can implement the notion that the actual confusable characters must be from different scripts? |
That's already in the RFC in the heuristics for mixed_script_confusables |
Aren't them the same? Person and 人 have the same meaning but there is no easy way to write el or e1 in Chinese or Japanese. So a mix of characters is still likely needed, such as 人1 or 人2, it would be weird to see 人一 or 人二. I don't know if writing variables in non-latin would be helpful but a mix for numeric might be needed. |
No, one is rén + the letter e + the number 1, the other is rén + the letter e + the lowercase letter l |
I would rather use @joshtriplett's heuristic here. mixed_script_confusables already uses this, we just need to reuse this: we lint on different strings that map to each other via mixed-script-confusables only. |
@Manishearth Ah, you are talking about the By the way, how did you type |
@pickfire sigh the point is that the current patch catches that case incorrectly because it contains a non-confusable non-ascii character as well. I have a compose key on my keyboard. I also have a chinese input method set up, but I wanted to specifically list out the word there. |
[unicode-set-allowed]: https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AIdentifier_Status%3DAllowed%3A%5D%26%5B%3AXID_Continue%3DYes%3A%5D%5D&g=&i=Script_Extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this link correct? Why do I get 404?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tool is down right now, unicode is having IT issues
I don't think this is considered anymore. |
See discussion at rust-lang/rust#55467 (comment) ff.
Rename
less_used_codepoints
touncommon_codepoints
(rust-lang/rust#55467 (comment))cc @crlf0710