Python backend #485

Open
wants to merge 9 commits into
base: master
98 changes: 98 additions & 0 deletions docs/user_guide.rst
@@ -284,3 +284,101 @@ BNFC adds the grammar name as a file extension. So if the grammar file is
named ``Calc.cf``, the lexer will be associated to the file extension
``.calc``. To associate other file extensions to a generated lexer, you need to
modify (or subclass) the lexer.

Python Backend
===============

The BNF Converter's Python backend generates a Python frontend that uses
ANTLR4 to parse input into an abstract syntax tree (AST).

The Python package ``antlr4``, the ANTLR4 jar file, and Python 3.10 or higher are required.

Example usage: ::

bnfc --python -m Calc.cf


.. list-table:: The result is a set of files:
:widths: 25 25
:header-rows: 1

* - Filename
- Description
* - bnfcPyGenCalc/CalcLexer.g4
- Provides the grammar for the lexer.
* - bnfcPyGenCalc/CalcParser.g4
- Provides the grammar for the parser.
* - bnfcPyGenCalc/Absyn.py
- Provides the classes for the abstract syntax.
* - bnfcPyGenCalc/PrettyPrinter.py
- Provides printing for both the AST and the linearized tree.
* - genTest.py
- A ready-made test file that uses the generated frontend to convert input into an AST.
* - skele.py
- Provides skeleton code to deconstruct an AST, using structural pattern matching.
* - Makefile
- The makefile, which uses an Antlr jar file to produce the lexer and parser for Python.

Make sure the ANTLR jar is accessible from the generated Makefile, then
run the Makefile. For example, on Linux, one can export the following
variable from ``.profile``:

``export ANTLR="$HOME/Downloads/antlr/antlr-4.13.2-complete.jar"``

Then run ``make``. The generated lexer and parser are placed inside the
``bnfcPyGenCalc`` folder listed above.

Testing the frontend
....................

Input can be piped to the test file::

echo "(1 + 2) * 3" | python3 genTest.py

or::

python3 genTest.py < file.txt

A file name can also be passed as an argument::

python3 genTest.py file.txt
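
The two input styles above (a file argument or standard input) follow a common pattern that can be sketched as below. This is not the code of the generated ``genTest.py``; the helper name ``read_input`` and its ``stream`` parameter are hypothetical and only illustrate how a script might choose between the two sources.

```python
import io
import sys
from typing import Sequence, TextIO


def read_input(argv: Sequence[str], stream: TextIO = sys.stdin) -> str:
    """Return the text to parse.

    Hypothetical helper: if a file name follows the script name, read that
    file; otherwise read everything from the given stream (stdin by default).
    """
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as handle:
            return handle.read()
    return stream.read()


# Simulate piped input with an in-memory stream instead of the real stdin.
demo_text = read_input(["genTest.py"], stream=io.StringIO("(1 + 2) * 3"))
print(demo_text)
```

Passing the stream as a parameter keeps the helper easy to try out without a terminal; a real script would simply call ``read_input(sys.argv)``.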


Caveats
.......

Maximum elements for hand-made lists:
If one defines custom rules for lists, such as::

(:) [C] ::= 'a' C 'b' [C] 'c'

the Python backend cannot rewrite the rule into an iterative form
for the parser. Lists with more than about 1000 elements then fail with a
maximum recursion depth error. Lists defined with the ``terminator`` or
``separator`` pragmas should work fine.
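
The limit comes from Python itself: each nested call consumes one level of the interpreter's recursion budget (1000 by default). The following sketch is not the generated parser code; the functions ``count_recursive`` and ``count_iterative`` are hypothetical stand-ins that show why a one-call-per-element traversal fails on long lists while a loop does not.

```python
import sys


def count_recursive(items, index=0):
    """Count elements with one nested call per element (recursive style)."""
    if index == len(items):
        return 0
    return 1 + count_recursive(items, index + 1)


def count_iterative(items):
    """Count elements with a plain loop; the call depth stays constant."""
    total = 0
    for _ in items:
        total += 1
    return total


# A list that is guaranteed to be longer than the current recursion limit.
too_deep = [0] * (sys.getrecursionlimit() + 100)

try:
    count_recursive(too_deep)
    recursion_failed = False
except RecursionError:
    # Raised once the nesting exceeds sys.getrecursionlimit().
    recursion_failed = True

iterative_count = count_iterative(too_deep)
print(recursion_failed, iterative_count == len(too_deep))
```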

Skeleton code for using lists as entrypoints:
Matchers for list entrypoints, such as ``[Exp]``, are not generated in the
skeleton code. If the grammar uses several different list categories, such
matchers could lead users to pattern match on lists without
checking what type the elements have. Users are instead encouraged to use
non-list entrypoints.

Several entrypoints:
By default, the test file ``genTest.py`` uses only the first entrypoint.

Using multiple separators:
Declaring multiple separators for the same category, as below, generates
Python functions with overlapping names, which causes runtime errors::

separator Exp1 "," ;
separator Exp1 ";" ;
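
The text above does not show how the overlapping names lead to errors. As an illustration of one general Python behaviour that may be involved, the sketch below shows that a second ``def`` with the same name silently replaces the first. The function name ``print_list_exp1`` is hypothetical and is not taken from the generated code.

```python
def print_list_exp1(items):
    """First definition: joins the items with a comma."""
    return ", ".join(items)


def print_list_exp1(items):  # same name: rebinds it and discards the first
    """Second definition: joins the items with a semicolon."""
    return "; ".join(items)


# Only the most recent definition is reachable under this name.
rendered = print_list_exp1(["a", "b"])
print(rendered)
```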

Results from the parameterized tests:
One error is reported among the regression tests: the Java BNFC example
grammar contains mutually left-recursive rules, which ANTLR4 does not support.

Escaped characters in haskell-hcr:
Attempting to parse ``ParCore.hcr`` from the BNFC example grammar
haskell-hcr yields errors for escaped characters.

230 changes: 230 additions & 0 deletions document/BNF_Converter_Python_Mode.html
@@ -0,0 +1,230 @@
<!DOCTYPE html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>BNF Converter Python Mode</title>
</head>
<style>
table {
font-family: arial, sans-serif;
border-collapse: collapse;
width: 100%;
}

td, th {
text-align: left;
padding: 4px;
}

</style>
<body>
<div style="text-align: center;">
<h2>BNF Converter</h2>
<h2>Python Mode</h2>
</div>
<h3>By Björn Werner</h3>

<h3>2024</h3>
<p>
The BNF Converter's Python backend generates a Python frontend that uses
ANTLR4 to parse input into an abstract syntax tree (AST).
</p>
<p>
BNFC on Github:<br>
<a href="https://github.com/BNFC/bnfc">https://github.com/BNFC/bnfc</a>
</p>
<p>
Antlr on Github:<br>
<a href="https://github.com/antlr/antlr4">https://github.com/antlr/antlr4</a>
</p>
<p>
Requirements are: the jar file for ANTLRv4, the Python package antlr4, and
Python 3.10 or higher.
</p>
<h3>Usage</h3>
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
bnfc --python -m NAME.cf</span></big><br style="font-family: monospace; ">
</div>
<p>
There should now exist the following files:
</p>
<table style="padding: 1cm;">
<tr>
<th>Filename:</th><th>Description:</th>
</tr>
<tr>
<td>bnfcPyGenNAME/NAMELexer.g4</td><td>Provides the grammar for the lexer.</td>
</tr>
<tr>
<td>bnfcPyGenNAME/NAMEParser.g4</td><td>Provides the grammar for the parser.</td>
</tr>
<tr>
<td>bnfcPyGenNAME/Absyn.py</td><td>Provides the classes for the abstract syntax.</td>
</tr>
<tr>
<td>bnfcPyGenNAME/PrettyPrinter.py</td><td>Provides printing for both the AST and the linearized tree.</td>
</tr>
<tr>
<td>genTest.py</td><td>A ready test-file, that uses the generated frontend to convert input into an AST.</td>
</tr>
<tr>
<td>skele.py</td><td>Provides skeleton code to deconstruct an AST, using structural pattern matching.</td>
</tr>
</table>
<p>
Make sure the ANTLR jar is accessible from the generated Makefile, then run the Makefile. The generated lexer and parser are placed inside the bnfcPyGenNAME folder listed above.
</p>
<p>
For example, on Linux, export the following variable from .profile:
</p>
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
export ANTLR="$HOME/Downloads/antlr/antlr-4.13.2-complete.jar"</span></big><br style="font-family: monospace; ">
</div>
<p>
After that it should be possible to run the makefile:
</p>
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
make</span></big><br style="font-family: monospace; ">
</div>
<h3>Testing the frontend</h3>
<p>
The following example uses a frontend that is generated from a C-like grammar.
</p>
<p style="font-family: monospace;">
$ python3 genTest.py &lt; hello.c
</p>
<p style="font-family: monospace;">
Parse Successful!<br>
<br>
[Abstract Syntax]<br>
(PDefs [(DFun Type_int "main" [] [(SExp (EApp "printString" [(EString "Hello world")])), (SReturn (EInt 0))])])<br>
<br>
[Linearized Tree]<br>
int main ()<br>
{<br>
&nbsp;printString ("Hello world");<br>
&nbsp;return 0;<br>
}<br>
</p>
<h3>The Abstract Syntax Tree</h3>
<p>
The AST is built from instances of Python classes defined with the dataclass decorator, such as:
</p>
<p style="font-family: monospace;">
@dataclass<br>
class EAdd:<br>
&nbsp;exp_1: Exp<br>
&nbsp;exp_2: Exp<br>
&nbsp;_ann_type: _AnnType = field(default_factory=_AnnType)
</p>
<p>
The "_ann_type" variable is a placeholder that can be used to store useful information,
for example type-information in order to create a type-annotated AST.
</p>
<h3>Using the skeleton file</h3>
<p>
The skeleton file serves as a template, for example for writing an interpreter.
Two kinds of matchers are generated: one that handles all the value
categories together, and one where each matcher handles a single
value category, as in the example below:
</p>
<p style="font-family: monospace;">
def matcherExp(exp_: Exp):<br>
&nbsp;match exp_:<br>
&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
&nbsp;&nbsp;&nbsp;raise Exception('EAdd not implemented')<br>
&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
To return the sum of the two evaluated arguments, this can be
changed to:
</p>
<p style="font-family: monospace;">
def matcherExp(exp_: Exp):<br>
&nbsp;match exp_:<br>
&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
&nbsp;&nbsp;&nbsp;return matcherExp(exp_1) + matcherExp(exp_2)<br>
&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
The function can now be imported and used in the generated test file
(similarly to how the pretty printer is imported and used):
</p>
<p style="font-family: monospace;">
from skele import matcherExp<br>
...<br>
print(matcherExp(ast))
</p>

<h3>Known issues</h3>
<h4>
Maximum elements for hand-made list rules:
</h4>
<p>
If one defines custom rules for lists, such as:
</p>
<p style="font-family: monospace;">
(:) [C] ::= 'a' C 'b' [C] 'c'
</p>
<p>
the Python backend cannot rewrite the rule into an iterative form
for the parser. Lists with more than about 1000 elements then fail with a
maximum recursion depth error. Lists defined with the terminator or separator pragmas should work fine.
</p>
<h4>
Skeleton code for using lists as entrypoints:
</h4>
<p>
Matchers for list entrypoints, such as [Exp], are not generated in the
skeleton code. If the grammar uses several different list categories, such
matchers could lead users to pattern match on lists without
checking what type the elements have. Users are instead encouraged to use
non-list entrypoints.
</p>
<p>
The improper way to iterate over lists, as the value category is unknown:
</p>
<p style="font-family: monospace;">
&nbsp;case list():<br>
&nbsp;&nbsp;for ele in ast:<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
The proper way to deconstruct lists, where we know the value category:
</p>
<p style="font-family: monospace;">
&nbsp;case RuleName(listexp_):<br>
&nbsp;&nbsp;for exp in listexp_:<br>
&nbsp;&nbsp;&nbsp;...
</p>
<h4>Several entrypoints:</h4>
<p>
The testfile genTest.py only uses the first entrypoint by default.
</p>
<h4>
Using multiple separators:
</h4>
<p>
Using multiple separators for the same category, such as below, generates
Python functions with overlapping names, causing runtime errors.
</p>
<p style="font-family: monospace;">
separator Exp1 "," ;<br>
separator Exp1 ";" ;
</p>
<h4>
Results from the parameterized tests:
</h4>
<p>
One error is reported among the regression tests: the Java BNFC example grammar contains mutually left-recursive rules, which ANTLR4 does not support.
</p>
<h4>
Example for grammar haskell-hcr:
</h4>
<p>
Attempting to parse ParCore.hcr from the haskell-hcr example BNFC grammar yields an error for escaped characters.
</p>
28 changes: 14 additions & 14 deletions source/BNFC.cabal
@@ -32,9 +32,8 @@ Description:

-- Support range when build with cabal
tested-with:
GHC == 9.10.1
GHC == 9.8.2
GHC == 9.6.5
GHC == 9.8.1
GHC == 9.6.3
GHC == 9.4.8
GHC == 9.2.8
GHC == 9.0.2
@@ -44,6 +43,7 @@ tested-with:
GHC == 8.4.4
GHC == 8.2.2
GHC == 8.0.2
GHC == 7.10.3

extra-doc-files:
README.md
@@ -81,9 +81,6 @@ executable bnfc
other-modules:
-- Generated by cabal
Paths_BNFC
autogen-modules:
-- Generated by cabal
Paths_BNFC
default-extensions:
-- Keep in alphabetical order.
LambdaCase
@@ -157,14 +154,6 @@ library
-- BNFC.Lex
-- -- Generated by happy
-- BNFC.Par
-- 2023-11-03 We cannot add BNFC.{Lex,Par} as then the Lex.x and Par.y files
-- are not bundled by cabal dist.
-- Just make sure that there is no src/BNFC/{Lex,Par}.hs before running cabal sdist,
-- otherwise we will end up with both Lex.hs and Lex.x (resp. Par.{hs,y})
-- which will cause alex/happy to not be run, leading to build failures.
autogen-modules:
-- Generated by cabal
Paths_BNFC
other-modules:
-- Generated by cabal
Paths_BNFC
@@ -266,6 +255,17 @@ library
BNFC.Backend.Java.RegToAntlrLexer
BNFC.Backend.Java.Utils

-- Python backend
BNFC.Backend.Python
BNFC.Backend.Python.CFtoPyAbs
BNFC.Backend.Python.CFtoPyPrettyPrinter
BNFC.Backend.Python.RegToFlex
BNFC.Backend.Python.PyHelpers
BNFC.Backend.Python.CFtoPySkele
BNFC.Backend.Python.CFtoAntlr4Lexer
BNFC.Backend.Python.CFtoAntlr4Parser
BNFC.Backend.Python.Antlr4Utils

-- XML backend
BNFC.Backend.XML
