BNFC · AiStudent · Aug 27, 2024
diff --git a/docs/user_guide.rst b/docs/user_guide.rst
@@ -284,3 +284,101 @@ BNFC adds the grammar name as a file extension. So if the grammar file is
 named ``Calc.cf``, the lexer will be associated to the file extension
 ``.calc``. To associate other file extensions to a generated lexer, you need to
 modify (or subclass) the lexer.
+
+Python Backend
+===============
+
+The BNF Converter's Python backend generates a Python frontend that uses
+ANTLR4 to parse input into an AST (abstract syntax tree).
+
+It requires the Python package antlr4, the ANTLR4 jar file, and Python 3.10 or higher.
+
+Example usage: ::
+
+    bnfc --python -m Calc.cf
+
+
+.. list-table:: The result is a set of files:
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Filename
+     - Description
+   * - bnfcPyGenCalc/CalcLexer.g4
+     - Provides the grammar for the lexer.
+   * - bnfcPyGenCalc/CalcParser.g4
+     - Provides the grammar for the parser.
+   * - bnfcPyGenCalc/Absyn.py
+     - Provides the classes for the abstract syntax.
+   * - bnfcPyGenCalc/PrettyPrinter.py
+     - Provides printing for both the AST and the linearized tree.
+   * - genTest.py
+     - A ready-made test file that uses the generated frontend to convert input into an AST.
+   * - skele.py
+     - Provides skeleton code to deconstruct an AST, using structural pattern matching.
+   * - Makefile
+     - The Makefile, which uses the ANTLR jar file to generate the lexer and parser for Python.
+
+Make sure the ANTLR jar file is accessible from the generated Makefile, then
+run the Makefile. For example, on Linux, one can export the following
+variable from ``.profile``:
+
+``export ANTLR="$HOME/Downloads/antlr/antlr-4.13.2-complete.jar"``
+
+Then run ``make``. The generated lexer and parser are placed inside the
+``bnfcPyGenCalc`` folder listed above.
+
+Testing the frontend
+....................
+
+It's possible to pipe input, like::
+
+    echo "(1 + 2) * 3" | python3 genTest.py
+
+or::
+
+    python3 genTest.py < file.txt
+
+Alternatively, pass the input file as an argument::
+
+    python3 genTest.py file.txt
+
+
+Caveats
+.......
+
+Maximum elements for hand-made lists:
+  If one defines custom rules for lists, such as::
+
+    (:) [C] ::= "a" C "b" [C] "c" ;
+
+  the Python backend cannot rewrite the rule into an iterative form for the
+  parser. The rule stays recursive, so at most 1000 elements can be parsed
+  before Python raises a maximum recursion depth error. Using the
+  ``terminator`` or ``separator`` pragmas should work fine.
+
+Skeleton code for using lists as entrypoints:
+  Matchers for list categories, such as ``[Exp]``, are not generated in the
+  skeleton code. If the grammar uses several different list categories,
+  a user might otherwise try to pattern match on lists without checking
+  the type of their elements. Users are instead encouraged to use
+  non-list entrypoints.
+
+Several entrypoints:
+  By default, the test file ``genTest.py`` only uses the first entrypoint.
+
+Using multiple separators:
+  Declaring multiple separators for the same category, as below, generates
+  Python functions with overlapping names, which causes runtime errors::
+
+    separator Exp1 "," ;
+    separator Exp1 ";" ;
+
+Results from the parameterized tests:
+  One error is reported among the regression tests: the BNFC Java example
+  grammar contains mutually left-recursive rules, which ANTLR4 rejects.
+
+Escaped characters in haskell-hcr:
+  Attempting to parse ParCore.hcr from the BNFC example grammar
+  haskell-hcr yields errors for escaped characters.
+
diff --git a/document/BNF_Converter_Python_Mode.html b/document/BNF_Converter_Python_Mode.html
@@ -0,0 +1,230 @@
+<!DOCTYPE html>
+<head>
+  <meta http-equiv="content-type"
+ content="text/html; charset=ISO-8859-1">
+  <title>BNF Converter Python Mode</title>
+</head>
+<style>
+  table {
+    font-family: arial, sans-serif;
+    border-collapse: collapse;
+    width: 100%;
+  }
+
+  td, th {
+    text-align: left;
+    padding: 4px;
+  }
+
+  </style>
+<body>
+<div style="text-align: center;">
+<h2>BNF Converter</h2>
+<h2>Python Mode</h2>
+</div> 
+<h3>By Björn Werner</h3>
+
+<h3>2024</h3>
+<p>
+  The BNF Converter's Python backend generates a Python frontend that uses
+  ANTLR4 to parse input into an AST (abstract syntax tree).
+</p>
+<p>
+  BNFC on Github:<br>
+  <a href="https://github.com/BNFC/bnfc">https://github.com/BNFC/bnfc</a>
+</p>
+<p>
+  Antlr on Github:<br>
+  <a href="https://github.com/antlr/antlr4">https://github.com/antlr/antlr4</a>
+</p>
+<p>
+  Requirements are: the jar file for ANTLRv4, the Python package antlr4, and
+  Python 3.10 or higher.
+</p>
+<h3>Usage</h3>
+<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
+    bnfc --python -m NAME.cf</span></big><br style="font-family: monospace; ">
+</div>
+<p>
+The following files should now exist:
+</p>
+<table style="padding: 1cm;">
+  <tr>
+    <th>Filename:</th><th>Description:</th>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/NAMELexer.g4</td><td>Provides the grammar for the lexer.</td>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/NAMEParser.g4</td><td>Provides the grammar for the parser.</td>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/Absyn.py</td><td>Provides the classes for the abstract syntax.</td>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/PrettyPrinter.py</td><td>Provides printing for both the AST and the linearized tree.</td>
+  </tr>
+  <tr>
+    <td>genTest.py</td><td>A ready-made test file that uses the generated frontend to convert input into an AST.</td>
+  </tr>
+  <tr>
+    <td>skele.py</td><td>Provides skeleton code to deconstruct an AST, using structural pattern matching.</td>
+  </tr>
+</table>
+<p>
+Make sure the ANTLR jar file is accessible from the generated Makefile, then run the Makefile. The generated lexer and parser are placed inside the bnfcPyGenNAME folder listed above.
+</p>
+<p>
+  For example, on Linux, export the following variable from .profile:
+</p>
+<div style="margin-left: 40px; "><big><span style="font-family: monospace; "> 
+export ANTLR="$HOME/Downloads/antlr/antlr-4.13.2-complete.jar"</span></big><br style="font-family: monospace; ">
+</div>
+<p>
+  After that it should be possible to run the makefile:
+</p>
+<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
+    make</span></big><br style="font-family: monospace; ">
+</div>
+<h3>Testing the frontend</h3>
+<p>
+  The following example uses a frontend that is generated from a C-like grammar.
+</p>
+<p style="font-family: monospace;">
+  $ python3 genTest.py &lt; hello.c
+</p>
+<p style="font-family: monospace;">
+  Parse Successful!<br>
+  <br>
+  [Abstract Syntax]<br>
+  (PDefs [(DFun Type_int "main" [] [(SExp (EApp "printString" [(EString "Hello world")])), (SReturn (EInt 0))])])<br>
+  <br>
+  [Linearized Tree]<br>
+  int main ()<br>
+  {<br>
+    &nbsp;printString ("Hello world");<br>
+    &nbsp;return 0;<br>
+  }<br>
+</p>
+<h3>The Abstract Syntax Tree</h3>
+<p>
+  The AST is built from instances of Python classes defined with the dataclass decorator, such as:
+</p>
+<p style="font-family: monospace;">
+@dataclass<br>
+class EAdd:<br>
+&nbsp;exp_1: Exp<br>
+&nbsp;exp_2: Exp<br>
+&nbsp;_ann_type: _AnnType = field(default_factory=_AnnType)
+</p>
+<p>
+  The "_ann_type" variable is a placeholder that can be used to store useful information,
+  for example type-information in order to create a type-annotated AST.
+</p>
+<h3>Using the skeleton file</h3>
+<p>
+  The skeleton file serves as a template, for example to create an interpreter.
+  Two kinds of matchers are generated: one that handles all the value
+  categories together, and one per individual value category, as in the
+  example below:
+</p>
+<p style="font-family: monospace;">
+def matcherExp(exp_: Exp):<br>
+&nbsp;match exp_:<br>
+&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
+&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
+&nbsp;&nbsp;&nbsp;raise Exception('EAdd not implemented')<br>
+&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
+&nbsp;&nbsp;&nbsp;...
+</p>
+<p>
+  This can be modified to return the sum of the two evaluated
+  arguments:
+</p>
+<p style="font-family: monospace;">
+  def matcherExp(exp_: Exp):<br>
+  &nbsp;match exp_:<br>
+  &nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
+  &nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
+  &nbsp;&nbsp;&nbsp;return matcherExp(exp_1) + matcherExp(exp_2)<br>
+  &nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
+  &nbsp;&nbsp;&nbsp;...
+</p>
+<p>
+  The function can now be imported and used in the generated test file 
+  (similarly to how the pretty printer is imported and used):
+</p>
+<p style="font-family: monospace;">
+  from skele import matcherExp<br>
+  ...<br>
+  print(matcherExp(ast))
+</p>
+
+<h3>Known issues</h3>
+<h4>
+  Maximum elements for hand-made list rules:
+</h4>
+<p>
+  If one defines custom rules for lists, such as:
+</p>
+<p style="font-family: monospace;"> 
+ (:) [C] ::= "a" C "b" [C] "c" ;
+</p>
+<p>
+  the Python backend cannot rewrite the rule into an iterative form for the
+  parser. The rule stays recursive, so at most 1000 elements can be parsed
+  before Python raises a maximum recursion depth error. Using the terminator or separator pragmas should work fine.
+</p>
+<h4>
+  Skeleton code for using lists as entrypoints:
+</h4>
+<p>
+  Matchers for list categories, such as [Exp], are not generated in the
+  skeleton code. If the grammar uses several different list categories,
+  a user might otherwise try to pattern match on lists without checking
+  the type of their elements. Users are instead encouraged to use
+  non-list entrypoints.
+</p>
+<p>
+  An improper way to iterate over a list, since the value category of its elements is unknown:
+</p>
+<p style="font-family: monospace;">
+  &nbsp;case list():<br>
+  &nbsp;&nbsp;for ele in ast:<br>
+  &nbsp;&nbsp;&nbsp;...
+</p>
+<p>
+  The proper way to deconstruct lists, where we know the value category:
+</p>
+<p style="font-family: monospace;">
+  &nbsp;case RuleName(listexp_):<br>
+  &nbsp;&nbsp;for exp in listexp_:<br>
+  &nbsp;&nbsp;&nbsp;...
+</p>
+<h4>Several entrypoints:</h4>
+<p>
+  The test file genTest.py only uses the first entrypoint by default.
+</p>
+<h4>
+  Using multiple separators:
+</h4>
+<p>
+  Using multiple separators for the same category, such as below, generates
+  Python functions with overlapping names, causing runtime errors.
+</p>
+<p style="font-family: monospace;">
+  separator Exp1 "," ;<br>
+  separator Exp1 ";" ;
+</p>
+<h4>
+Results from the parameterized tests:
+</h4>
+<p>
+  One error is reported among the regression tests: the BNFC Java example grammar contains mutually left-recursive rules, which ANTLR4 rejects.
+</p>
+<h4>
+  Example for grammar haskell-hcr:
+</h4>
+<p>
+  Attempting to parse ParCore.hcr from the haskell-hcr example BNFC grammar yields an error for escaped characters.
+</p>
diff --git a/source/BNFC.cabal b/source/BNFC.cabal
@@ -32,9 +32,8 @@ Description:
 
 -- Support range when build with cabal
 tested-with:
-  GHC == 9.10.1
-  GHC == 9.8.2
-  GHC == 9.6.5
+  GHC == 9.8.1
+  GHC == 9.6.3
   GHC == 9.4.8
   GHC == 9.2.8
   GHC == 9.0.2
@@ -44,6 +43,7 @@ tested-with:
   GHC == 8.4.4
   GHC == 8.2.2
   GHC == 8.0.2
+  GHC == 7.10.3
 
 extra-doc-files:
   README.md
@@ -81,9 +81,6 @@ executable bnfc
   other-modules:
     -- Generated by cabal
     Paths_BNFC
-  autogen-modules:
-    -- Generated by cabal
-    Paths_BNFC
   default-extensions:
     -- Keep in alphabetical order.
     LambdaCase
@@ -157,14 +154,6 @@ library
   --   BNFC.Lex
   --   -- Generated by happy
   --   BNFC.Par
-  -- 2023-11-03 We cannot add BNFC.{Lex,Par} as then the Lex.x and Par.y files
-  -- are not bundled by cabal dist.
-  -- Just make sure that there is no src/BNFC/{Lex,Par}.hs before running cabal sdist,
-  -- otherwise we will end up with both Lex.hs and Lex.x (resp. Par.{hs,y})
-  -- which will cause alex/happy to not be run, leading to build failures.
-  autogen-modules:
-    -- Generated by cabal
-    Paths_BNFC
   other-modules:
     -- Generated by cabal
     Paths_BNFC
@@ -266,6 +255,17 @@ library
     BNFC.Backend.Java.RegToAntlrLexer
     BNFC.Backend.Java.Utils
 
+    -- Python backend
+    BNFC.Backend.Python
+    BNFC.Backend.Python.CFtoPyAbs
+    BNFC.Backend.Python.CFtoPyPrettyPrinter
+    BNFC.Backend.Python.RegToFlex
+    BNFC.Backend.Python.PyHelpers
+    BNFC.Backend.Python.CFtoPySkele
+    BNFC.Backend.Python.CFtoAntlr4Lexer
+    BNFC.Backend.Python.CFtoAntlr4Parser
+    BNFC.Backend.Python.Antlr4Utils
+
     -- XML backend
     BNFC.Backend.XML