
Parse a minimal set of fullwidth punctuation as synonyms #5903

@jiahao


The current Unicode normalization policy (#5576, #5434) is to use NFC normalization to canonicalize identifiers. However, NFC is an overly conservative choice of canonicalization: it does nothing to prevent obfuscated code that uses, for example, full-width punctuation characters inside identifiers.
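For illustration, the difference between NFC and compatibility normalization can be seen with Julia's `Unicode` standard library (assuming Julia 0.7 or later, where `normalize` is provided by that library). NFC leaves U+FF1D FULLWIDTH EQUALS SIGN untouched, whereas NFKC folds it to the ASCII `=`:

```julia
using Unicode  # standard library; provides Unicode.normalize

# "b", U+FF1D FULLWIDTH EQUALS SIGN, "3": what a full-width IME might emit
src = "b\uff1d3"

nfc  = Unicode.normalize(src, :NFC)   # canonical composition: full-width ＝ is kept
nfkc = Unicode.normalize(src, :NFKC)  # compatibility composition: ＝ is folded to =

println(nfc == src)    # prints true
println(nfkc == "b=3") # prints true
```

NFKC is broader than what this issue asks for: it also folds characters such as superscript digits (`²` becomes `2`) and ligatures (`ﬁ` becomes `fi`), which may be one reason to prefer a hand-picked set of synonyms over adopting NFKC wholesale.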

Example:

julia> b＝3:5 # full-width equals
ERROR: b＝3 not defined

julia> b＝3=-1
-1

julia> [b＝3:5]
7-element Array{Int64,1}:
 -1
  0
  1
  2
  3
  4
  5

While in general we probably don't want to get into the business of building semantic knowledge of natural languages into the parser, I think at the very least we should support as synonyms the default output produced by standard input method editors (IMEs). For example, with the input method set to Pinyin - Simplified on OS X 10.9, typing bing1=3 on the keyboard selects the first Chinese character with the phonetic spelling bing, then continues with =3 as part of the input stream, with the = emitted as the full-width ＝. The result, when typed directly into the Julia REPL, is

julia> 丙＝3
ERROR: 丙＝3 not defined

This happens because the full-width ＝ is parsed as part of the identifier rather than as the assignment operator, which is arguably what the typical user would have intended.
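A minimal sketch of the proposal, assuming the synonyms are applied as a character-level substitution before parsing. The table `FULLWIDTH_SYNONYMS`, the helper `fold_fullwidth`, and the particular set of characters are hypothetical and chosen only for illustration; the issue does not specify which punctuation belongs in the minimal set.

```julia
# Hypothetical table: a small set of full-width punctuation treated as ASCII
# synonyms. The exact membership is an assumption, not part of the proposal.
const FULLWIDTH_SYNONYMS = Dict{Char,Char}(
    '\uff1d' => '=',  # FULLWIDTH EQUALS SIGN
    '\uff08' => '(',  # FULLWIDTH LEFT PARENTHESIS
    '\uff09' => ')',  # FULLWIDTH RIGHT PARENTHESIS
    '\uff0c' => ',',  # FULLWIDTH COMMA
    '\uff1a' => ':',  # FULLWIDTH COLON
    '\uff1b' => ';',  # FULLWIDTH SEMICOLON
)

# Hypothetical helper: replace each listed full-width character with its ASCII
# synonym and leave every other character (e.g. the identifier 丙) unchanged.
fold_fullwidth(s::AbstractString) = map(c -> get(FULLWIDTH_SYNONYMS, c, c), s)

folded = fold_fullwidth("丙\uff1d3")
println(folded)             # prints 丙=3
ex = Meta.parse(folded)     # now parsed as an assignment to the identifier 丙
println(ex.head === :(=))   # prints true
```

Because this pass runs over raw text, it would also rewrite full-width characters inside string literals and comments; a real implementation would more likely recognize these synonyms in the tokenizer, where the lexical context is known.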

Labels: needs decision, parser, unicode