-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add "writing target-agnostic grammars".
Signed-off-by: Ken Domino <kenneth.domino@domemtech.com>
- Loading branch information
Showing
2 changed files
with
59 additions
and
37 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,46 +1,66 @@ | ||
# Target agnostic grammars | ||
# Writing target-agnostic grammars | ||
|
||
If your grammar is targeted to Python only, you may ignore the following. But if your goal is to get your Java parser to also run in Python, then you might find it useful. | ||
Some grammars may require side-effecting | ||
actions or [semantic predicates](https://github.com/antlr/antlr4/blob/dev/doc/predicates.md) | ||
in order to add context-sensitive parsing to what would normally be a context-free grammar. | ||
|
||
1. Do not embed production code inside your grammar. This is not portable and will not be. Move all your code to listeners or visitors. | ||
1. The only production code absolutely required to sit with the grammar should be semantic predicates, like: | ||
For example: | ||
* In Fortran90, [lines that being with a 'C' in column 1 | ||
denote a comment](https://github.com/antlr/grammars-v4/blob/43fbb16fec1d474d38a603cc6a6bcbe5edf07b1e/fortran/fortran90/slow/hw.f90#L1). | ||
These types of comments should be tokenized and placed on a comment token stream, | ||
not the default token stream. But, if the 'C' does not begin in | ||
column 1, then the input is invalid and should be flagged as so. | ||
```fortran | ||
c Hello World. | ||
c This is a syntax error because 'c' does not start in column 1 | ||
program hello | ||
print *, 'Hello World!' | ||
end | ||
``` | ||
ID {$text.equals("test")}? | ||
``` | ||
|
||
Unfortunately, this is not portable, as Java and Python (and other target languages) have different syntaxes for all but the simplest language elements. But you can work around it. The trick involves: | ||
|
||
* deriving your parser from a parser you provide, such as BaseParser | ||
* implementing utility methods, such as "isEqualText", in this BaseParser, in different files for each target language | ||
* invoking your utility methods in the semantic predicate from the `$parser` object | ||
|
||
Thanks to the above, you should be able to rewrite the above semantic predicate as follows: | ||
|
||
File `MyGrammarParser.g4`: | ||
``` | ||
options { superClass = MyGrammarBaseParser; } | ||
... | ||
ID {$parser.isEqualText($text,"test")}? | ||
* In CSharp, double angle-brackets `'>>'` can either mean | ||
[a right shift expression](https://github.com/antlr/grammars-v4/blob/43fbb16fec1d474d38a603cc6a6bcbe5edf07b1e/csharp/examples/AllInOneNoPreprocessor.cs#L657C15-L657C17) | ||
or [a declaration that contains a type with | ||
nested templates](https://github.com/antlr/grammars-v4/blob/master/csharp/examples/AllInOneNoPreprocessor.cs#L463C33-L463C35). | ||
Since lexers in Antlr are not parser aware, | ||
the lexer must tokenize the double angle-brackets as two separate tokens, one for each angle bracket. | ||
A semantic predicate should be added to disallow a space between the angle brackets when parsing an | ||
expression. | ||
```C# | ||
class Foo { | ||
void Func() | ||
{ | ||
int x = 1000 > > 2; // syntax error if a space exists in the double-angle bracket | ||
} | ||
Dictionary<int, List<int> > mapping; // nested template declaration, valid | ||
} | ||
``` | ||
|
||
File `MyGrammarBaseParser.py`: | ||
```python | ||
from antlr4 import * | ||
|
||
class MyGrammarBaseParser(Parser): | ||
Antlr does not have a general-purpose language for actions and predicates; | ||
these must be written in the target language of the generated the parser. | ||
The problem is that if one wants a grammar that "works" across a number of | ||
targets, like CSharp, Java, and Cpp, the grammar would need to be forked for | ||
each target. This adds a | ||
burdon of maintaining a version of the grammar for each target. _Or, so it would | ||
seem so._ | ||
|
||
def isEqualText(a, b): | ||
return a is b | ||
``` | ||
It turns out that it is possible to write the grammar | ||
so that can appear to be target-independent called _target agnostic format_. | ||
|
||
File `MyGrammarBaseParser.java`: | ||
```java | ||
import org.antlr.v4.runtime.*; | ||
## Rules in writing target-agnostic grammars | ||
|
||
public abstract class MyGrammarBaseParser extends Parser { | ||
1) Move all actions and semantic predicate written in target-specific | ||
code to base-class methods. | ||
2) Replace all actions and semantic predicates in the grammar with a single | ||
call to the method. For actions, use `{ this.name_of_action() }`. For predicates, | ||
use `{ this.name_of_predicate() }?`. | ||
3) Add a Python script called "transformGrammar.py" that replaces strings | ||
in the grammar per target. | ||
a) For Cpp: replace `this.` strings with `this->`. | ||
b) For PHP: replace `this.` strings with `$this->`. | ||
c) For Python: replace `this.` strings with `self.`, `l.`, or `p.` depending on | ||
where the action or predicate is in the grammar. | ||
d) Run `python transformGrammar.py *.g4` prior to generating the parser and lexer. | ||
|
||
public static boolean isEqualText(a, b) { | ||
return a.equals(b); | ||
} | ||
} | ||
``` | ||
## Examples of target agnostic grammars | ||
* [fortran90](https://github.com/antlr/grammars-v4/tree/master/fortran/fortran90) | ||
* [csharp](https://github.com/antlr/grammars-v4/tree/master/csharp) |