ANTLR backend: Spurious escaping of single quotes in character sets #329

Closed
kaby76 opened this issue Dec 30, 2020 · 11 comments
Labels
bug Java/ANTLR lexer Concerning the generated lexer

@kaby76

kaby76 commented Dec 30, 2020

I'm using BNF.cf and bnfc 2.8.4 to generate an Antlr4 grammar which I then strip and use for my extension in a language server. The generated lexer code is syntactically incorrect and not accepted by the Antlr 4.9 tool.

STRINGTEXT : ~[\"\\] -> more;
CHARANY     :  ~[\'\\] -> more, mode(CHAREND);

I'm not sure why this isn't accepted by the Antlr grammar; after all, EscAny is in Antlr4's lexer rule. I will investigate further. But I recommend removing the backslashes before the single and double quotes.

Disregard if this is already fixed.

@andreasabel
Member

Thanks @kaby76 for the report!
Please make sure that this isn't already fixed in #319 and released with BNFC 2.9.0.

Should this not be fixed, please provide an MWE in the form of a small .cf file that allows me to easily reproduce the bug.

@andreasabel andreasabel added the info-needed and Java/ANTLR labels Dec 30, 2020
@kaby76
Author

kaby76 commented Dec 30, 2020

It is not fixed in v2.9.0, at least not completely. STRINGTEXT : ~["\\] -> more; is fine, but CHARANY : ~[\'\\] -> more, mode(CHAREND); is not.

Here is how to reproduce it from a Bash shell. I assume that you have Git, have downloaded antlr-4.9-complete.jar (see https://www.antlr.org/download/index.html), have installed Java (I have jdk-11.0.4), and have placed the Java executable on the PATH for your OS.

mkdir temp
cd temp
git clone https://github.com/BNFC/bnfc.git --branch v2.9.0 --single-branch
cd bnfc/source/src
cp ../BNFC.cf .
# I assume you are using bnfc 2.9.0 executable--I'm using that for Windows.
bnfc --java-antlr --force BNFC.cf
cd bnfc
# I assume you have downloaded the latest Antlr jar and java on path.
java -jar ~/Downloads/antlr-4.9-complete.jar *.g4

The output of this is:

warning(156): bnfcLexer.g4:78:16: invalid escape sequence \'
Exception in thread "main" java.lang.RuntimeException: set is empty
    at org.antlr.v4.runtime.misc.IntervalSet.getMaxElement(IntervalSet.java:421)
    at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:169)
    at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
    at org.antlr.v4.Tool.generateInterpreterData(Tool.java:745)
    at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:400)
    at org.antlr.v4.Tool.process(Tool.java:369)
    at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:328)
    at org.antlr.v4.Tool.main(Tool.java:172)

@kaby76 kaby76 changed the title from "Generate Antlr4 grammar incorrect" to "Generated Antlr4 grammar is incorrect" Dec 30, 2020
@andreasabel
Member

Ok, this seems to be a problem with more recent ANTLR versions; it still works with 4.5:

$ java  org.antlr.v4.Tool 
ANTLR Parser Generator  Version 4.5.1
...
$ java  org.antlr.v4.Tool -lib bnfc -package bnfc bnfc/bnfcLexer.g4
$ java  org.antlr.v4.Tool -lib bnfc -package bnfc bnfc/bnfcParser.g4

@kaby76
Author

kaby76 commented Dec 30, 2020

Ok, thanks. That kind of sucks. I'll check out Antlr's source and see what got messed up.

@andreasabel
Member

It seems like BNFC is still not implementing the escaping correctly, see the rules at: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md
According to these rules, a single quote should not be escaped inside brackets (char sets).

CHARANY : ~[\'\\] -> more, mode(CHAREND);

This is wrong.

I suppose ANTLR got stricter checking the rules.
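
The escaping rule described in this comment can be sketched in a few lines. This is only an illustration in Python, not BNFC's actual Haskell code; the names `escape_in_charset` and `negated_charset` are hypothetical. It assumes, following the linked lexer-rules document, that inside brackets only `]`, `\` and `-` need a backslash, and that quotes are ordinary characters there.

```python
# Characters that must be backslash-escaped inside an ANTLR 4 character set.
# Quotes are deliberately absent: ' and " are ordinary characters inside [...].
MUST_ESCAPE = {"]", "\\", "-"}

# Control characters that have a named escape in ANTLR 4.
NAMED = {"\n": "\\n", "\r": "\\r", "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def escape_in_charset(ch: str) -> str:
    """Render one character for use inside [...] in a lexer rule."""
    if ch in NAMED:
        return NAMED[ch]
    if ch in MUST_ESCAPE:
        return "\\" + ch
    return ch  # includes ' and ", which stay unescaped


def negated_charset(chars: str) -> str:
    """Build ~[...] from the characters to exclude (hypothetical helper)."""
    return "~[" + "".join(escape_in_charset(c) for c in chars) + "]"


# Exclude the single quote and the backslash, as in the CHARANY rule.
print("CHARANY : " + negated_charset("'\\") + " -> more, mode(CHAREND);")
# prints: CHARANY : ~['\\] -> more, mode(CHAREND);
```

Other control characters and non-ASCII code points would need `\uXXXX` escapes, which this sketch leaves out.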

@kaby76
Author

kaby76 commented Dec 31, 2020

I think it's a regression in the Antlr rules. This stuff was moved into lexer modes, which broke things. My first test using EscAny without modes works fine with [\'foobar], but I'm adding in the rules one by one to figure out what broke the grammar. In any case, you might want to generate the single quote without a backslash.

@kaby76
Author

kaby76 commented Dec 31, 2020

There are at least two Git issues raised and then closed on escaped characters in Antlr: antlr/antlr4#1537 antlr/antlr4#1871 . For [\'], the tool gives a warning that it's not valid, but it crashes, which it shouldn't do. It is, apparently, now an invalid escape character, at least in character sets (https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexer-rule-elements).
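
Until a fixed generator is available, a generated lexer grammar can be checked for such escapes before it is handed to the ANTLR tool. The following Python sketch is a rough scanner, not a real ANTLR parser. The function name `invalid_charset_escapes` is hypothetical, and the set of accepted escapes is an assumption based on my reading of the linked lexer-rules document. It tracks quoted literals and bracketed sets but ignores comments and actions.

```python
# Characters assumed to be valid after a backslash inside [...]:
# \n \r \t \b \f, \] \\ \-, \uXXXX, and the Unicode property forms \p{..} and \P{..}.
VALID_AFTER_BACKSLASH = set("nrtbf]\\-upP")


def invalid_charset_escapes(grammar: str):
    """Return (line_number, escape) pairs for suspicious escapes inside [...]."""
    found = []
    in_set = False      # currently inside [...]
    in_literal = False  # currently inside '...'
    i = 0
    while i < len(grammar):
        ch = grammar[i]
        if ch == "\\" and i + 1 < len(grammar):
            nxt = grammar[i + 1]
            if in_set and nxt not in VALID_AFTER_BACKSLASH:
                line = grammar.count("\n", 0, i) + 1
                found.append((line, "\\" + nxt))
            i += 2  # skip the escaped character
            continue
        if in_set:
            if ch == "]":
                in_set = False
        elif in_literal:
            if ch == "'":
                in_literal = False
        elif ch == "'":
            in_literal = True
        elif ch == "[":
            in_set = True
        i += 1
    return found


# The two rules quoted earlier in this thread, as generated by BNFC 2.9.0.
SAMPLE = r"""STRINGTEXT : ~["\\] -> more;
CHARANY : ~[\'\\] -> more, mode(CHAREND);
"""

for line, esc in invalid_charset_escapes(SAMPLE):
    print(f"line {line}: invalid escape {esc}")
# prints: line 2: invalid escape \'
```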

@andreasabel andreasabel added the bug and lexer labels and removed the info-needed label Jan 1, 2021
@andreasabel andreasabel added this to the 2.9.1 milestone Jan 1, 2021
@andreasabel andreasabel self-assigned this Jan 1, 2021
@andreasabel
Member

andreasabel commented Jan 1, 2021

I am fixing the spurious escaping of 's in the ANTLR backend, but I agree that ANTLR should not crash on spurious escaping, just ignore it.

> For [\'], the tool gives a warning that it's not valid, but it crashes, which it shouldn't do.

I left a comment at antlr/antlr4#1537.

I also filed antlr/antlr4#3024.

@andreasabel andreasabel changed the title from "Generated Antlr4 grammar is incorrect" to "ANTLR backend: Spurious escaping of single quotes in character sets" Jan 1, 2021
@andreasabel
Member

Did the fix work for you, @kaby76 ?

@kaby76
Author

kaby76 commented Jan 5, 2021

Yes, that fixed the problem. Thank you. It took me a while to get a build working because I work mostly on Windows in a MINGW64 environment. I could not get the build to work via Cabal (https://github.com/BNFC/bnfc#installing-the-development-version), but stack install --stack-yaml stack-8.10.3.yaml worked fine. I was looking to see what you have for nightly builds in order to set up a Windows env for building, but you're using Github Actions only on Linux. And, you have a .travis.yml file, which looks unused, and no Appveyor file.

@andreasabel
Member

> I was looking to see what you have for nightly builds in order to set up a Windows env for building, but you're using Github Actions only on Linux.

PRs welcome!
The current GHA workflows are automatically generated by Haskell-CI, which does not add macOS or Windows builds. These could be added manually in a separate .yaml file, similar to https://github.com/agda/agda/blob/master/.github/workflows/nightly.yml maybe.

> And, you have a .travis.yml file, which looks unused, and no Appveyor file.

I used to do CI on travis, but am migrating to Github Actions because of the new limitations for OSS builds by travis-ci.
