Description
The current Unicode normalization policy (#5576, #5434) is to canonicalize identifiers with NFC normalization. However, NFC is too conservative a choice of canonical form: it does nothing to prevent obfuscated code that uses, for example, full-width punctuation characters in identifiers. NFC leaves such characters untouched because they are only compatibility-equivalent, not canonically equivalent, to their ASCII counterparts.
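The difference between the two normalization forms can be seen directly. This is a minimal sketch assuming Julia 1.x, where `normalize` lives in the `Unicode` standard library (an API that postdates this issue); `ident` is only an illustrative string.

```julia
using Unicode  # standard library in Julia 1.x

# "b", U+FF1D FULLWIDTH EQUALS SIGN, "3"
ident = "b\uff1d3"

nfc  = Unicode.normalize(ident, :NFC)   # canonical composition only
nfkc = Unicode.normalize(ident, :NFKC)  # canonical + compatibility mappings

println(nfc == ident)  # true: NFC leaves the full-width sign as is
println(nfkc)          # b=3: NFKC maps U+FF1D to ASCII '='
```

Note that NFKC folds many more characters than full-width forms (ligatures, superscripts, and so on), so adopting it wholesale is a broader change than fixing this one case.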
Example:
```
julia> b＝3:5 #full-width equals
ERROR: b＝3 not defined

julia> b＝3=-1
-1

julia> [b＝3:5]
7-element Array{Int64,1}:
 -1
  0
  1
  2
  3
  4
  5
```
While in general we probably don't want to build semantic knowledge of natural languages into the parser, I think we should at the very least accept, as synonyms, the default output produced by standard input method editors (IMEs). For example, with the input method set to Pinyin - Simplified on OS X 10.9, typing `bing1=3` on the keyboard selects the first Chinese character with the phonetic spelling *bing*, and the IME then emits `＝3`, with a full-width equals sign, as the rest of the input stream. When this is typed directly into the Julia REPL, the result is
```
julia> 丙＝3
ERROR: 丙＝3 not defined
```
The error arises because the full-width `＝` is parsed as part of the identifier rather than as the assignment operator, which is arguably what the typical user intended.
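One narrow way to treat IME output as a synonym is to fold the Fullwidth ASCII variants block (U+FF01 to U+FF5E) onto printable ASCII (U+0021 to U+007E) before tokenizing; the two ranges differ by a fixed offset of 0xFEE0. The sketch below is hypothetical: `fold_fullwidth` is not part of Julia, and it handles only that block.

```julia
# Hypothetical helper (not part of Julia): map each character in the
# Fullwidth ASCII variants block U+FF01..U+FF5E to its ASCII counterpart
# by subtracting the constant offset 0xFEE0; leave all others unchanged.
function fold_fullwidth(s::AbstractString)
    map(s) do c
        '\uff01' <= c <= '\uff5e' ? Char(UInt32(c) - 0xfee0) : c
    end
end

input = "丙\uff1d3"             # what the Pinyin IME emits for `bing1=3`
println(fold_fullwidth(input))  # 丙=3
```

Mapping only this block keeps the change small and predictable, at the cost of not covering other compatibility characters that full NFKC would fold.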