Have ANTLR4 prevent conflict with user rule names by behind-the-scenes renaming of its own variables #1070
Comments
I like the idea of renaming identifiers which conflict in the target language rather than reporting an error.
The C# target actually already does this in the code generation template by changing identifiers like |
Maybe it's better to restrict rule names using a common "bad-word" set? (Union of sets: Java, C#, JavaScript etc.). Because of |
I would prefer to not do this, since we end up in a bad position in the future:
In addition, users targeting a specific language should not need to be aware of the syntax of other languages unrelated to their work. |
Current targets already check for "bad words", e.g.:
protected static final String[] javaKeywords = {
    "abstract", "assert", "boolean", "break", "byte", "case", "catch",
    "char", "class", "const", "continue", "default", "do", "double", "else",
    "enum", "extends", "false", "final", "finally", "float", "for", "goto",
    "if", "implements", "import", "instanceof", "int", "interface",
    "long", "native", "new", "null", "package", "private", "protected",
    "public", "return", "short", "static", "strictfp", "super", "switch",
    "synchronized", "this", "throw", "throws", "transient", "true", "try",
    "void", "volatile", "while"
};
/** Avoid grammar symbols in this set to prevent conflicts in gen'd code. */
protected final Set<String> badWords = new HashSet<String>();
You should get a warning. E.g.,
|
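For readers unfamiliar with the check described above, here is a minimal sketch of how a keyword array can feed a "bad word" set that is consulted for each grammar symbol. The class `BadWordCheck`, the method `isBadWord`, and the warning text are hypothetical and only illustrate the idea; they are not the tool's actual code or message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the ANTLR tool: it only mirrors the idea
// of collecting a target's reserved words into a "bad word" set.
class BadWordCheck {
    // A short excerpt of the Java keyword list quoted above.
    static final String[] JAVA_KEYWORDS = { "class", "for", "if", "this", "while" };

    static final Set<String> BAD_WORDS = new HashSet<>(Arrays.asList(JAVA_KEYWORDS));

    /** Returns true if a grammar symbol would collide with a reserved word. */
    static boolean isBadWord(String grammarSymbol) {
        return BAD_WORDS.contains(grammarSymbol);
    }

    public static void main(String[] args) {
        // Illustrative rule names; the warning wording is made up.
        for (String rule : new String[] { "expr", "for", "this" }) {
            if (isBadWord(rule)) {
                System.out.println("warning: symbol " + rule + " conflicts with generated code");
            }
        }
    }
}
```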
Linked: #1851 |
@parrt I agree with @sharwell's message. Moreover, I've approved a lot of problems with symbol conflicts in grammars-v4; you can find them by the symbol-conflict label (already more than 50, and some of the issues are not marked). Grammar developers are forced to do a lot of useless work renaming identifiers ( I suggest reopening the issue and the following fixes:
It doesn't look like a big change to the core but it simplifies the development of universal grammars a lot. Also, it improves the clarity of grammars and breaks the dependency on target runtime. Also, I think grammars-v4 is very important for the ANTLR community, and its grammars are widely used. |
@KvanTTT Yes, that looks like a good solution. Finding and fixing all these symbol conflicts was ridiculous busy work. It was alleviated to some extent after I wrote trrename and applied it to create one gigantic symbol renaming in 183 files. When/if this is fixed, we can rename everything back to a time before all this terrible renaming started, again using trrename (it works across imported grammars and between the lexer and parser, but it does not yet take regular expressions like sed). I know people were not happy with the renaming that I did. I assume |
Re 1 I'd suggest a much simpler approach, where we simply add |
I suggest adding prefix or suffix to only words from existing
Yes, this change has a significant impact on user code, also suffixes everywhere look a bit redundant. Also, I think it's not much simpler than my suggestion and not so natural (if runtime supports reserved word escaping it should be used). |
I renamed |
Are you planning to parameterize the "bad word list"? Unfortunately, we're still finding them. A few weeks ago I made a PR to rename "emptyStatement" for the Go target of java/java8, javascript/ecmascript, .... If you only correct what we know, then we're back where we started, i.e., we're still going to have to manually rename some symbol in grammars-v4 once we discover it. We still haven't ported quite a few grammars in grammars-v4 to Go, and even fewer to the Cpp target. I only created CI for Cpp in grammars-v4 in the last week, and most of the grammars are skipped. I have absolutely no clue what will happen when I get around to Swift. It would be good if I could adjust the renaming (per symbol, or across a symbol class--TOKEN_REF vs RULE_REF) when I make the port rather than wait for a new version of Antlr. I'm now wondering if the Antlr tool is the right place for this renaming. @studentmain and I have been bandying around for many months the idea of a "preprocessor" for the Antlr grammars in grammars-v4. It could rename symbols based on some parameterized rule. |
Yes, but I am still afraid of huge changes in user code if we use your strategy. Also, I hope one moment in the future we'll find almost |
Hi,
not sure I agree with the proposed change.
I think rules should have the same name across all targets and all grammar names, such that it’s easier to locate rules within and across targets without having to remember per target implementation details.
As an example, rule ’this’ should be named ’this_rule’ regardless of the target, otherwise it will be named ’this_rule’ in most targets, but ’@this’ in C# and ’this’ in Python… simply horrible.
Every rule context is named xxxContext and we should follow a similar strategy for naming rules.
I agree that making this an option could reduce the immediate noise, but it would defeat the purpose i.e. help newbies avoid name collision issues.
End of the day, this might not be the right approach.
Eric
On 30 Dec 2021, at 17:57, Ivan Kochurkin wrote:
That assumes of course that we've found all the bad words for a target. Unfortunately, we're still finding them.
Yes, but I am still afraid of huge changes in user code if we use your strategy. Also, I hope one moment in the future we'll find almost all of such bad words, and maybe ANTLR will be updated more frequently. For now, I suggest using escaping for only bad words by default and maybe an option that turns on escaping for all words (boolean --escaping or string --escapingSuffix?). The final decision is up to @parrt (It looks like I'm able to implement everything).
|
Actually, what I'm arguing for is to allow the user to input the bad word list (e.g., "emptyStatement" => "emptyStatement_") or global renaming scheme ("TOKEN_REF" => "TOKEN_REF + '_'") to the Antlr tool. People who use a grammar X for target Y will discover the symbol conflicts, and they can adjust the renaming to what they want, rather than modify grammar X and check that into grammars-v4. People who use grammar X for target W won't be affected since grammar X is constant and they presumably had their own discovery of symbol conflicts for target W. You can then release Antlr with a default list for target Y, W, ..., but be able to overwrite that list when a grammar is ported and not have to change the grammar, and not wait for another release of Antlr. Otherwise, I can just run another tool, like trrename or @studentmain 's translator, prior to running CI for grammars-v4 and adjust that list rather than keep making PRs to change grammars for symbol conflict. |
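To make the proposal above concrete, here is a minimal sketch of a user-supplied renaming table combined with a global scheme per symbol class. Everything in it (`SymbolRenamer`, `rename`, `tokens`, `rules`, `apply`) is hypothetical and does not correspond to any existing ANTLR option or API. The only fact it relies on is that ANTLR token names start with an uppercase letter and parser rule names with a lowercase letter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names exist in the ANTLR tool.
class SymbolRenamer {
    // Per-symbol overrides, e.g. "emptyStatement" -> "emptyStatement_".
    private final Map<String, String> perSymbol = new LinkedHashMap<>();
    // Global scheme applied to token names (TOKEN_REF); identity by default.
    private UnaryOperator<String> tokenScheme = UnaryOperator.identity();
    // Global scheme applied to rule names (RULE_REF); identity by default.
    private UnaryOperator<String> ruleScheme = UnaryOperator.identity();

    SymbolRenamer rename(String from, String to) {
        perSymbol.put(from, to);
        return this;
    }

    SymbolRenamer tokens(UnaryOperator<String> scheme) {
        tokenScheme = scheme;
        return this;
    }

    SymbolRenamer rules(UnaryOperator<String> scheme) {
        ruleScheme = scheme;
        return this;
    }

    /** Per-symbol overrides win; otherwise the scheme for the symbol class applies. */
    String apply(String symbol) {
        String override = perSymbol.get(symbol);
        if (override != null) {
            return override;
        }
        boolean isToken = Character.isUpperCase(symbol.charAt(0));
        return isToken ? tokenScheme.apply(symbol) : ruleScheme.apply(symbol);
    }
}
```

With such a table, someone porting grammar X to target Y would adjust the mapping (for example `new SymbolRenamer().rename("emptyStatement", "emptyStatement_")`) instead of editing the grammar itself.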
Users ordinarily use a single target, as I understand. But grammar developers write grammars for all runtimes. Thus per-target implementation details are mostly their responsibility.
I don't think it's a big problem because code completion works well. In C# it will suggest For your case, maybe it makes sense to add something like
It's a very big immediate noise covering all ANTLR runtime users that ideally should be prevented. |
If @ericvergnaud solution is accepted, I suggest fixing the following issue #1615 together since all runtime identifiers will be broken anyway. The mentioned issue has 13 upvotes thus it's important. |
Looking at #1615, my tuppence-worth: does it matter? I embed actions
into the grammar. Very minimal ones, to build the AST, crash and burn on
syntax errors, and pretty well nothing else eg.
fkjoins_branch returns [FKJoinsBranch fkjb] :
SR
fkjac = fkjoins_arrow_chain
fkjbq = opt_fkjoins_branches
ER
{
var fkjac = $fkjac.fkjac;
var fkjbq = $fkjbq.fkjbq;
var res = new FKJoinsBranch(fkjac, fkjbq);
$fkjb = res;
}
;
I don't see the generated code unless something goes wrong, and the case
style is irrelevant IMO.
Is this substantially a different situation when doing
listeners/visitors, such that case naming starts to be of concern?
cheers
jan
|
It also looks good. By the way, when I was a newbie in ANTLR I was confused about how to call the parse method. In this case |
You have to call parsing methods and access parse tree nodes in a visitor/listener (or without them). I don't think it's rare code. Also, your message is a bit irrelevant to the current topic. |
‘very big noise’ vs small problem… not sure anyone will like this
On 30 Dec 2021, at 18:43, Ivan Kochurkin wrote:
I think rules should have the same name across all targets and all grammar names, such that it’s easier to locate rules within and across targets without having to remember per target implementation details.
Users ordinarily use a single target, as I understand. But grammar developers write grammars for all runtimes. Thus per-target implementation details are mostly their responsibility.
As an example, rule ’this’ should be named ’this_rule’ regardless of the target, otherwise it will be named ’this_rule’ in most targets, but ’@this’ in C# and ’this’ in Python… simply horrible.
I don't think it's a big problem because code completion works well. In C# it will suggest @this, and it will suggest this_ in Python. Moreover, _rule looks natural for the Python runtime where snake_case is used, but it's ugly for C# and Java because they use CamelCase. Anyway, generated code differs across runtimes because of different conventions.
I agree that making this an option could reduce the immediate noise, but it would defeat the purpose i.e. help newbies avoid name collision issues.
It's a very big immediate noise covering all ANTLR runtime users that ideally should be prevented.
|
Good morning! well this is a difficult decision. I personally ran into this naming issue the other day when I tried to run the Java grammar using Python or something and I got a bad word collision. I had to change the grammar and that annoyed the crap out of me. haha. welcome to the party I guess. Let me see if I understand the situation. Users access parser rule x by calling x() in their support code and they also access XContext as well as the field y associated with rule argument y and so on. The essential problem is that sometimes x and y are reserved words in the target language, which will of course cause a syntax error for the generated code. Thinking out loud...
I would propose a single default target method, such as |
Good evening! (actually almost night for me) :) If I understand correctly, you suggest almost the same algorithm that I suggested, but with only a single method But why do you prefer prefix over suffix? I guess suffix is better for code completion and clarity (private fields start with |
:) I don't prefer any translation... |
Ter,
this will not only affect new grammars but also new targets for existing grammars (which is what Ken Domino bumped into when checking grammars in our repo).
Eric
On 30 Dec 2021, at 20:22, Terence Parr wrote:
Good morning! well this is a difficult decision. I personally ran into this naming issue the other day when I tried to run the Java grammar using Python or something and I got a bad word collision. I had to change the grammar and that annoyed the crap out of me. haha. welcome to the party I guess.
Let me see if I understand the situation. Users access parser rule x by calling x() in their support code and they also access XContext as well as the field y associated with rule argument y and so on. The essential problem is that sometimes x and y are reserved words in the target language, which will of course cause a syntax error for the generated code.
Thinking out loud...
I take it as axiomatic that we don't want to force everyone to manually go edit all of their support code (visitors etc...) to change every single x and y reference for existing working projects. Existing grammar builds + support code should continue to work, even if someone had to alter the grammar rule names to make it work with that target.
I think it's reasonable to suggest that we remove the bad word error message and simply ask the target to tweak the symbol for generation purposes. E.g., x -> _x or whatever. This can only affect new grammars because previously it was an error, hence, no backward compatibility issues. It does mean that programmers will have to be aware that rule class becomes _class in the generated code, but we are used to this kind of thing as programmers.
It seems that we should do the minimal change, at the cost of a bit of inconsistency. Rule x -> _x but also to XContext. I'm not sure I like _xContext or _XContext.
I'm not concerned that it might be x in Python but _x in DART support code; everything will be self consistent within a specific target world.
I would propose a single default target method, such as escageTargetSymbol() or something, that added _ as a prefix or something; individual targets could do as they like. Moreover, this method would only be called for the list of known bad words. I value the relationship between a grammar rule name and argument name and the same name in the generated code more than I dislike the inconsistency of few symbols that must be escaped.
|
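The proposal quoted above (one default escaping method on the target, applied only to known bad words, with individual targets free to override it) could look roughly like the sketch below. All class and method names here are hypothetical and the keyword sets are tiny excerpts; the real ANTLR `Target` API may differ. The C# override relies on the language rule that a keyword prefixed with `@` can be used as an identifier.

```java
import java.util.Set;

// Hypothetical class names; the real ANTLR Target API may differ.
abstract class SketchTarget {
    /** Reserved words of the target language (excerpt only). */
    abstract Set<String> badWords();

    /** Default escape: prefix with an underscore. Targets may override. */
    String escape(String identifier) {
        return "_" + identifier;
    }

    /** Only known bad words are rewritten; every other name is left alone. */
    final String escapeIfNeeded(String identifier) {
        return badWords().contains(identifier) ? escape(identifier) : identifier;
    }
}

class JavaSketchTarget extends SketchTarget {
    @Override
    Set<String> badWords() {
        return Set.of("class", "for", "this");
    }
}

class CSharpSketchTarget extends SketchTarget {
    @Override
    Set<String> badWords() {
        return Set.of("class", "for", "this");
    }

    // C# allows a keyword as an identifier when it is prefixed with '@'.
    @Override
    String escape(String identifier) {
        return "@" + identifier;
    }
}

class PythonSketchTarget extends SketchTarget {
    @Override
    Set<String> badWords() {
        return Set.of("class", "for"); // 'this' is not reserved in Python
    }

    // A trailing underscore, as mentioned in the thread for Python.
    @Override
    String escape(String identifier) {
        return identifier + "_";
    }
}
```

Because `escapeIfNeeded` leaves every non-reserved name untouched, existing grammars and support code that already build keep their identifiers, which matches the backward-compatibility argument made above.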
Whether "newbie" or "power user", people just want to get something from grammars-v4 to work. And, believe it or not, people like Antlr because of the huge grammar library in grammars-v4. So, yes, as an advocate of Antlr, this matters. Either:
"Newbies" don't understand what a "symbol conflict" means (see this or this). |
Deprecate USE_OF_BAD_WORD
Just a little addition. Even when using different suffixes/prefixes, the final identifiers may in theory conflict with runtime keywords (but it's a much rarer case). |
From the PR, I am seeing both |
Ivan points out rule |
Ok, it can be resolved by checking the method |
Probably we should just choose an escape that always works for that target. |
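One possible way to handle the rare clash mentioned above, where an escaped name collides with a reserved word or with another symbol already present in the grammar, is to keep extending the escape until the result is free. This is only a sketch under that assumption; `UniqueEscaper` and its parameters are hypothetical, and the thread does not say the tool works this way.

```java
import java.util.Set;

// Hypothetical helper for illustration; not taken from the ANTLR code base.
class UniqueEscaper {
    /**
     * Returns the name unchanged if it is not reserved. Otherwise appends '_'
     * until the candidate is neither reserved nor used by another symbol.
     */
    static String escape(String name, Set<String> badWords, Set<String> usedNames) {
        if (!badWords.contains(name)) {
            return name;
        }
        String candidate = name + "_";
        while (badWords.contains(candidate) || usedNames.contains(candidate)) {
            candidate += "_";
        }
        return candidate;
    }
}
```

For a grammar that already defines both `for` and `for_`, the rule `for` would come out as `for__` under this scheme.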
Unfortunately, it's not so easy anyway because the grammar semantics analyzer and the grammar generator use different names during processing. The analyzer always uses original names, which can reference other rules in the original grammar. The generator uses mostly runtimeName (but not always, as I understand). It's important not to mix them. The current code is written without the concept of different grammar/generated rule names. |
The point is about native words escaping in C# and Swift. |
BTW, targets at least use case transformation for some entities, i.e. |
But usually as part of another word, I think right? |
Yes, for all targets except Go. It has |
…arrt version) Deprecate USE_OF_BAD_WORD
Now I think maybe it's more complicated for C# and Swift. |
Yep, every target can implement the simple rewriting as they want. current thoughts:
|
I like the first option more, and I'm trying to implement both of them for comparison: 1, 2. It looks like they require a comparable amount of changes, but the first one requires more template changes, the second one requires more tool changes. But the first choice looks preferable in my opinion because escaping is related only to conflicting keywords and symbols, and original rule names without escaping can be obtained. But I also accept the second option, if it's more preferable for all. |
it looks like you are saying you like the second option more and then you say the first option. :) Oh, i see. you like the second option but you think we should implement the first. |
Sorry, I messed up. I meant I like the first choice when a rule |
Yeah makes most sense I think. |
* Escape reserved words during grammar generation, fixes #1070 (for -> for_ but RULE_for)
  Deprecate USE_OF_BAD_WORD
* Make name and escapedName consistent across tool and codegen classes
  Fix other pull request notes
* Rename NamedActionChunk to SymbolRefChunk
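The "for -> for_ but RULE_for" note in the commit message can be read as follows: a name is escaped only where it stands alone as an identifier, while names that are already embedded in a longer identifier use the original grammar name. Below is a minimal sketch of that distinction. The `name` and `escapedName` field names follow the commit message, but `RuleSketch`, its methods, and the trailing-underscore escape are assumptions made for illustration, not the tool's actual classes.

```java
import java.util.Set;

// Hypothetical model class; the real tool classes are not reproduced here.
class RuleSketch {
    // Excerpt of reserved words, for illustration only.
    static final Set<String> BAD_WORDS = Set.of("for", "if", "while");

    final String name;        // name as written in the grammar
    final String escapedName; // name used where the identifier stands alone

    RuleSketch(String name) {
        this.name = name;
        this.escapedName = BAD_WORDS.contains(name) ? name + "_" : name;
    }

    /** The parsing method name stands alone, so it uses the escaped form. */
    String methodName() {
        return escapedName;
    }

    /** The RULE_ prefix already avoids the clash, so the original name is kept. */
    String ruleIndexConstant() {
        return "RULE_" + name;
    }
}
```

Keeping both fields side by side also reflects the earlier point in the thread that analysis works on original grammar names while generation works on escaped ones, and that the two should not be mixed.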
Escape bad words during grammar generation (#3451)
Signed-off-by: Terence Parr <parrt@antlr.org> * Refactor js runtime in preparation of future improvements * refactor, 1 file per class, use import, use module semantics, use webpack 5, use eslint * all tests pass * simplifications and alignment with standard js idioms * simplifications and alignment with standard js idioms * support reading legacy ATN * support both module and non-module imports * fix failing tests * fix failing tests * No longer necessary too generate sets or single atom transit that are bigger than 16bits. (#3620) * Updated getting started with Cpp documentation. (#3628) Included specific examples of using ANTLR4_TAG and ANTLR4_ZIP_REPOSITORY in the sample CMakeLists file. * [C++] Free ATNConfig lookup set in readonly ATNConfigSet (#3630) * [C++] Implement configurable PredictionContextMergeCache (#3627) * Allow to choose to switch off building tests in C++ (#3624) The new option to cmake ANTLR_BUILD_CPP_TESTS is default on (so the behavior is as before), but it provides a way to switch off if not needed. The C++ tests pull in an external dependency (googletests), which might conflict if ANTLR is used as a subproject in another cmake project. Signed-off-by: Henner Zeller <h.zeller@acm.org> * Fix NPE for undefined label, fix #2788 * An interval ought to be a value Interval was a pointer to 2 Ints it ought to be just 2 Ints, which is smaller and more semantically correct, with no need for a cache. However, this technically breaks metadata and AnyObject conformance but people shouldn't be relying on those for an Interval. * [C++] Remove more dynamic_cast usage * [C++] Introduce version macros * add license prefix * Prep 4.10 (#3599) * Tweak doc * Swift was referring to hardcoded version * Start version update script. * add files to update * clean up setup * clean up setup * clean up setup * don't need file * don't need file * Fixes #3600. add instructions and associated code necessary to build the xpath lexers. 
* clean up version nums * php8 * php8 * php8 * php8 * php8 * php8 * php8 * php8 * tweak doc * ok, i give up. php won't bump up too v8 * tweak doc * version number bumped to 4.10 in runtime. * Change the doc for releasing and update to use latest ST 4.3.2 * fix dart version to 4.10.0 * cmd files Cannot use export bash command. * try fixing php ci again * working on deploy Signed-off-by: Terence Parr <parrt@antlr.org> * php8 always install. * set js to 4.10.0 not 4.10 * turn off apt update for php circleci * try w/o cimg/php * try setting branch * ok i give up * tweak * update docs for release. * php8 circleci * use 3.5.3 antlr * use 3.5.3-SNAPSHOT antlr * use full 3.5.3 antlr * [Swift] reduce Optionals in APIs (#3621) * ParserRuleContext.children see comment in removeLastChild * TokenStream.getText * Parser._parseListeners this might require changes to the code templates? * ATN {various} * make computeReachSet return empty, not nil * overrides refine optionality * BufferedTokenStream getHiddenTokensTo{Left, Right} return empty not nil * Update Swift.stg * avoid breakage by adding overload of `getText` in extension * tweak to kick off build Signed-off-by: Terence Parr <parrt@antlr.org> * try parallelism: 4 circleci * Revert "[Swift] reduce Optionals in APIs (#3621)" This reverts commit b5ccba0. * tweaks to doc * Improve the deploy script and tweak the released doc. 
* use 4.10 not Snapshot for scripts Co-authored-by: Ivan Kochurkin <kvanttt@gmail.com> Co-authored-by: Alexandr <60813335+Alex-Andrv@users.noreply.github.com> Co-authored-by: 100mango <100mango@users.noreply.github.com> Co-authored-by: Biswapriyo Nath <nathbappai@gmail.com> Co-authored-by: Benjamin Spiegel <bspiegel11@gmail.com> Co-authored-by: Justin King <jcking@google.com> Co-authored-by: Eric Vergnaud <eric.vergnaud@wanadoo.fr> Co-authored-by: Harry Chan <harry.chan@codersatlas.com> Co-authored-by: Ken Domino <kenneth.domino@domemtech.com> Co-authored-by: chenquan <chenquan.dev@gmail.com> Co-authored-by: Marcos Passos <marcospassos@users.noreply.github.com> Co-authored-by: Henner Zeller <h.zeller@acm.org> Co-authored-by: Dante Broggi <34220985+Dante-Broggi@users.noreply.github.com> Co-authored-by: chris-miner <94078897+chris-miner@users.noreply.github.com>
I have a grammar with rules named xif, xcontinue, and so on. They were renamed from if, continue, etc. because ANTLR generates variables with the same names as the rules, and those names clash with Java reserved words. It would be more polished if ANTLR kept its own generated identifiers distinct from any possible back-end reserved word.
A consistent prefixing scheme might suffice: it would prevent clashes while keeping a human-comprehensible correspondence between the grammar and the generated back-end code (perhaps with 'ANTLR4_' as a prefix reserved for ANTLR itself?).
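To make the prefixing idea concrete, here is a minimal sketch in Java. It is not how ANTLR works today and not part of its API: the class `RuleNameEscaper`, the method `escape`, and the shortened keyword list are all hypothetical names made up for illustration. Only the 'ANTLR4_' prefix comes from the suggestion above. The sketch assumes Java 9+ for `Set.of`.

```java
import java.util.Set;

/**
 * Hypothetical helper (not part of ANTLR): maps a grammar rule name to an
 * identifier that is safe to emit in generated Java code.
 */
final class RuleNameEscaper {
    /** Reserved prefix, as suggested in the issue text. */
    static final String PREFIX = "ANTLR4_";

    /** A small subset of Java reserved words, for illustration only. */
    private static final Set<String> JAVA_KEYWORDS = Set.of(
        "if", "else", "continue", "break", "for", "while",
        "return", "class", "switch", "case", "default");

    /**
     * Prefixes names that collide with a keyword. Names that already start
     * with the prefix are prefixed again, so two different rule names can
     * never map to the same generated identifier.
     */
    static String escape(String ruleName) {
        if (JAVA_KEYWORDS.contains(ruleName) || ruleName.startsWith(PREFIX)) {
            return PREFIX + ruleName;
        }
        return ruleName;
    }

    public static void main(String[] args) {
        for (String rule : new String[] {"if", "continue", "expr"}) {
            System.out.println(rule + " -> " + escape(rule));
        }
        // prints:
        // if -> ANTLR4_if
        // continue -> ANTLR4_continue
        // expr -> expr
    }
}
```

With something along these lines, a grammar could keep rules named if and continue, and the generated code would still show which rule each identifier came from. An alternative to re-prefixing would be for the tool to reject user rule names that start with the reserved prefix.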