Parse a minimal set of fullwidth punctuation as synonyms

The current Unicode normalization policy (#5576, #5434) is to canonicalize identifiers with NFC. However, NFC is too conservative for this purpose: it leaves full-width punctuation distinct from its ASCII counterparts, so these characters can still end up inside identifiers and be used to write obfuscated code.

Example:

``` julia
julia> b＝3:5 #full-width equals
ERROR: b＝3 not defined

julia> b＝3=-1
-1

julia> [b＝3:5]
7-element Array{Int64,1}:
 -1
  0
  1
  2
  3
  4
  5
```
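The reason NFC does not help here can be checked directly. The following is a minimal sketch, assuming a recent Julia where the `Unicode` standard library provides `Unicode.normalize` (that function is not part of the original report). The NFKC line is included for comparison only and is not what this issue proposes.

``` julia
using Unicode

s = "b＝3"   # contains U+FF1D FULLWIDTH EQUALS SIGN

# NFC applies only canonical mappings, so the full-width sign is left as-is.
nfc = Unicode.normalize(s, :NFC)
println(nfc == s)

# The full-width sign and the ASCII sign are different code points.
println('＝' == '=')

# For comparison only: NFKC applies compatibility mappings, which fold U+FF1D to '='.
nfkc = Unicode.normalize(s, :NFKC)
println(nfkc == "b=3")
```

Since NFC leaves `＝` untouched, the tokenizer sees it as just another character that may continue an identifier.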

While in general we probably don't want to get into the business of building semantic knowledge of natural languages into the parser, I think at the very least we should support as synonyms the default output produced by standard input method editors. As an example, with the input method set to Pinyin - Simplified on OS X 10.9, typing `bing1=3` on the keyboard selects the first Chinese character with the phonetic spelling `bing`, then continues with `=3` as part of the input stream. The result, when typed directly into the Julia REPL, is

``` julia
julia> 丙＝3
ERROR: 丙＝3 not defined
```

which stems from the full-width `＝` being parsed as part of the identifier rather than as the assignment operator, which is arguably what the typical user would have intended.
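To make the proposal concrete, here is a rough sketch of what treating a few full-width characters as synonyms could look like. The names `FULLWIDTH_SYNONYMS` and `fold_fullwidth` are hypothetical, and the choice of characters in the table is my own assumption about what a "minimal set" might contain. This is not how the Julia parser is implemented. A real fix would belong in the tokenizer, since a blanket fold like this one would also rewrite string literals and comments.

``` julia
# Hypothetical table: a few full-width punctuation marks and their ASCII counterparts.
const FULLWIDTH_SYNONYMS = Dict{Char,Char}(
    '＝' => '=',   # U+FF1D FULLWIDTH EQUALS SIGN
    '：' => ':',   # U+FF1A FULLWIDTH COLON
    '（' => '(',   # U+FF08 FULLWIDTH LEFT PARENTHESIS
    '）' => ')',   # U+FF09 FULLWIDTH RIGHT PARENTHESIS
    '，' => ',',   # U+FF0C FULLWIDTH COMMA
)

# Replace each listed full-width character with its ASCII synonym;
# every other character (including 丙) passes through unchanged.
fold_fullwidth(src::AbstractString) =
    map(c -> get(FULLWIDTH_SYNONYMS, c, c), src)

println(fold_fullwidth("丙＝3"))
println(fold_fullwidth("b＝3:5"))
```

With such a fold applied before tokenizing, the input from the IME example would read as an assignment to `丙` rather than as a reference to an undefined identifier.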

Parse a minimal set of fullwidth punctuation as synonyms #5903