
Commit

Add "writing target-agnostic grammars".
Signed-off-by: Ken Domino <kenneth.domino@domemtech.com>
kaby76 authored and parrt committed Jun 28, 2023
1 parent f024ec1 commit b6ea82f
Showing 2 changed files with 59 additions and 37 deletions.
2 changes: 2 additions & 0 deletions doc/index.md
@@ -59,6 +59,8 @@ For those using Java, here's a great [set of ANTLR in Intellij notes](https://do

* [Parser and lexer interpreters](interpreters.md)

* [Writing target-agnostic grammars](target-agnostic-grammars.md)

* [Resources](resources.md)

# Building / releasing ANTLR itself
94 changes: 57 additions & 37 deletions doc/target-agnostic-grammars.md
@@ -1,46 +1,66 @@
# Target agnostic grammars
# Writing target-agnostic grammars

If your grammar is targeted to Python only, you may ignore the following. But if your goal is to get your Java parser to also run in Python, then you might find it useful.
Some grammars may require side-effecting
actions or [semantic predicates](https://github.com/antlr/antlr4/blob/dev/doc/predicates.md)
in order to add context-sensitive parsing to what would normally be a context-free grammar.

1. Do not embed production code inside your grammar. This is not portable and will not be. Move all your code to listeners or visitors.
1. The only production code absolutely required to sit with the grammar should be semantic predicates, like:
For example:
* In Fortran90, [lines that begin with a 'C' in column 1
denote a comment](https://github.com/antlr/grammars-v4/blob/43fbb16fec1d474d38a603cc6a6bcbe5edf07b1e/fortran/fortran90/slow/hw.f90#L1).
These comments should be tokenized and placed on a comment token stream,
not the default token stream. But if the 'C' does not begin in
column 1, then the input is invalid and should be flagged as such.
```fortran
c Hello World.
  c This is a syntax error because 'c' does not start in column 1
      program hello
         print *, 'Hello World!'
      end
```
ID {$text.equals("test")}?
```

Unfortunately, this is not portable, as Java and Python (and other target languages) have different syntaxes for all but the simplest language elements. But you can work around it. The trick involves:

* deriving your parser from a parser you provide, such as BaseParser
* implementing utility methods, such as "isEqualText", in this BaseParser, in different files for each target language
* invoking your utility methods in the semantic predicate from the `$parser` object

Thanks to the above, you should be able to rewrite the above semantic predicate as follows:

File `MyGrammarParser.g4`:
```
options { superClass = MyGrammarBaseParser; }
...
ID {$parser.isEqualText($text,"test")}?
* In CSharp, double angle-brackets `'>>'` can either mean
[a right shift expression](https://github.com/antlr/grammars-v4/blob/43fbb16fec1d474d38a603cc6a6bcbe5edf07b1e/csharp/examples/AllInOneNoPreprocessor.cs#L657C15-L657C17)
or [a declaration that contains a type with
nested templates](https://github.com/antlr/grammars-v4/blob/master/csharp/examples/AllInOneNoPreprocessor.cs#L463C33-L463C35).
Since ANTLR lexers are not parser-aware,
the lexer must tokenize the double angle brackets as two separate tokens, one for each angle bracket.
A semantic predicate should then be added to disallow a space between the angle brackets when parsing a
shift expression.
```C#
class Foo {
    void Func()
    {
        int x = 1000 > > 2; // syntax error if a space exists in the double-angle bracket
    }
    Dictionary<int, List<int> > mapping; // nested template declaration, valid
}
```
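
The two checks described above could live in base-class helpers along the following lines. This is only a minimal sketch, not the code used in grammars-v4: `Tok` is a hypothetical stand-in for a runtime token (it assumes the real token exposes comparable start/stop character indexes and a column), and the helper names `is_column_one` and `are_adjacent` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a runtime token (not the ANTLR class)."""
    text: str
    start: int   # index of the first character in the input
    stop: int    # index of the last character in the input (inclusive)
    column: int  # 0-based position within its line


def is_column_one(tok: Tok) -> bool:
    """Fortran case: a 'C' comment marker is only valid in the first column."""
    return tok.column == 0


def are_adjacent(first: Tok, second: Tok) -> bool:
    """C# case: two '>' tokens form a shift only when nothing separates them."""
    return first.stop + 1 == second.start


# "1000 >> 2": the '>' tokens occupy indexes 5 and 6
shift = are_adjacent(Tok(">", 5, 5, 5), Tok(">", 6, 6, 6))
# "1000 > > 2": a space sits between indexes 5 and 7
spaced = are_adjacent(Tok(">", 5, 5, 5), Tok(">", 7, 7, 7))
```

In a real grammar, a predicate would pass the tokens it has just matched to helpers like these, and each target would supply its own version of the same two methods.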

File `MyGrammarBaseParser.py`:
```python
from antlr4 import *

class MyGrammarBaseParser(Parser):
ANTLR does not have a general-purpose language for actions and predicates;
these must be written in the target language of the generated parser.
The problem is that if one wants a grammar that "works" across a number of
targets, like CSharp, Java, and Cpp, the grammar would need to be forked for
each target. This adds the
burden of maintaining a version of the grammar for each target. _Or so it would
seem._
    def isEqualText(self, a, b):
        # compare by value; `is` would only test object identity
        return a == b
```
It turns out that it is possible to write the grammar
so that it appears to be target-independent, in what is called _target-agnostic format_.

File `MyGrammarBaseParser.java`:
```java
import org.antlr.v4.runtime.*;
## Rules in writing target-agnostic grammars

public abstract class MyGrammarBaseParser extends Parser {
1) Move all actions and semantic predicates written in target-specific
code to base-class methods.
2) Replace all actions and semantic predicates in the grammar with a single
call to the method. For actions, use `{ this.name_of_action() }`. For predicates,
use `{ this.name_of_predicate() }?`.
3) Add a Python script called "transformGrammar.py" that replaces strings
in the grammar per target.
   a) For Cpp: replace `this.` strings with `this->`.
   b) For PHP: replace `this.` strings with `$this->`.
   c) For Python: replace `this.` strings with `self.`, `l.`, or `p.` depending on
   where the action or predicate is in the grammar.
   d) Run `python transformGrammar.py *.g4` prior to generating the parser and lexer.

    public static boolean isEqualText(String a, String b) {
        return a.equals(b);
    }
}
```
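
The string replacement in rule 3 could be sketched as below. This is an illustrative assumption about what "transformGrammar.py" might contain, not the script shipped with any grammar: the `PREFIX` table and the `transform` function are hypothetical names, and the Python entry always uses `self.`, ignoring the `l.` and `p.` cases that depend on where the action sits.

```python
import re

# Replacement for the `this.` prefix, keyed by target name.
# Only the three targets named in the rules are covered (an assumption of
# this sketch); the Python entry does not handle the `l.` / `p.` variants.
PREFIX = {
    "Cpp": "this->",
    "PHP": "$this->",
    "Python": "self.",
}


def transform(grammar_text: str, target: str) -> str:
    """Rewrite every standalone `this.` for the given target."""
    prefix = PREFIX.get(target)
    if prefix is None:
        # e.g. Java or CSharp, where `this.` is already valid
        return grammar_text
    # \b leaves identifiers that merely end in "this" untouched
    return re.sub(r"\bthis\.", lambda _m: prefix, grammar_text)
```

A full script would read each `.g4` file named on the command line, apply `transform`, and write the result back before the parser and lexer are generated.
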
## Examples of target-agnostic grammars
* [fortran90](https://github.com/antlr/grammars-v4/tree/master/fortran/fortran90)
* [csharp](https://github.com/antlr/grammars-v4/tree/master/csharp)
