antlr's lack of unicode is disturbing #15679
Comments
In particular, it only knows about UCS2. You need to encode characters outside of the BMP using surrogates. However, the range syntax ( |
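To illustrate the surrogate encoding mentioned above, here is a minimal sketch in Rust that splits a character into the UTF-16 code units a UCS-2-only tool would see. The helper names `to_utf16_units` and `to_ucs2_escapes` are hypothetical, and the `\uXXXX` output format is an assumption about how the escapes would be written in a grammar file.

```rust
/// Split a Unicode scalar value into its UTF-16 code units.
/// BMP characters yield one unit; everything above U+FFFF yields
/// a high/low surrogate pair.
fn to_utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}

/// Render a character as a sequence of `\uXXXX` escapes
/// (assumed escape format, one escape per UTF-16 code unit).
fn to_ucs2_escapes(c: char) -> String {
    to_utf16_units(c)
        .iter()
        .map(|unit| format!("\\u{:04X}", unit))
        .collect()
}
```

A character inside the BMP produces a single escape, while one outside it produces two, which is why a single-character range in the grammar cannot directly cover it.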
Note that this is also going to affect spans, since spans are in bytes but antlr gives them to us in characters. We can possibly correct this by translating the BytePos to CharPos.
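A rough sketch of the byte-to-character translation, assuming the source text is available as a `&str`. The function `byte_to_char_pos` is hypothetical and works on plain `usize` offsets; it is not the compiler's own `BytePos`/`CharPos` API.

```rust
/// Translate a byte offset into `src` to an offset counted in
/// Unicode scalar values (Rust `char`s).
/// Returns `None` if `byte_pos` is past the end of `src` or falls
/// in the middle of a multi-byte UTF-8 sequence.
fn byte_to_char_pos(src: &str, byte_pos: usize) -> Option<usize> {
    if !src.is_char_boundary(byte_pos) {
        return None;
    }
    // Count the characters that precede the byte offset.
    Some(src[..byte_pos].chars().count())
}
```

This walks the prefix on every call, so it is linear in the offset; a real implementation would probably cache the positions of multi-byte characters instead.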
I am currently looking into this. I've added two additional files which contain the definitions of To generate the rules, I used Node.js, because https://github.com/mathiasbynens/unicode-4.0.0 provides all code points for But when I try to convert a
Here is the Rust code I use to convert them: fhahn@e85d830#diff-cc371bf2fbfcbeb87f125d3c45fd8fc3R237 I would really appreciate any hints as to what could be wrong. Note that at the moment I have only included symbols up to \uFFFF, but according to antlr/antlr4#276 surrogates could be specified like
becomes
@fhahn, I'm going to send you a patch with other symbols. But I have an issue where antlr sees 2 surrogates, but Rust's span counts them as one.
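One way to reconcile the two counts, sketched under the assumption that antlr reports positions in UTF-16 code units: convert the byte offset to a UTF-16 offset rather than a character offset. The function `byte_to_utf16_pos` is a hypothetical helper, not part of either tool.

```rust
/// Translate a byte offset into `src` to an offset counted in
/// UTF-16 code units, so that a character outside the BMP counts
/// as two units (one per surrogate).
/// Returns `None` if `byte_pos` is out of range or not on a
/// character boundary.
fn byte_to_utf16_pos(src: &str, byte_pos: usize) -> Option<usize> {
    if !src.is_char_boundary(byte_pos) {
        return None;
    }
    // Sum the UTF-16 length (1 or 2) of each preceding character.
    Some(src[..byte_pos].chars().map(char::len_utf16).sum::<usize>())
}
```

For text that stays inside the BMP this gives the same result as counting characters; the two only diverge after the first surrogate pair.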
The [issue it refers to](rust-lang/rust#15679) was closed a year ago.
Downgrade `unused_variables` to experimental I feel problems like rust-lang#15679 are common.
It's impossible to correctly form XID_start and XID_continue in antlr, so for now the reference lexer just ignores unicode entirely.