Commit 0073398

Add an example for Python
1 parent e414a70 commit 0073398

12 files changed, +2246 -1 lines changed

.github/workflows/tests.yml

+10-1
@@ -62,6 +62,15 @@ jobs:
6262
ref: d64aefb55228d9584d3e5b2433f720ea8fd00c82
6363
persist-credentials: false
6464

65+
- name: Checkout CPython
66+
if: ${{ matrix.platform != 'win32' }}
67+
uses: actions/checkout@v3
68+
with:
69+
path: cpython
70+
repository: python/cpython
71+
ref: 3979150a0d406707f6d253d7c15fb32c1e005a77
72+
persist-credentials: false
73+
6574
- name: 'Build & Test'
6675
run: |
67-
${{ env.DC }} -run runtests.d --compiler ${{ env.DC }} -m${{ matrix.model }} --json-test-dir JSONTestSuite/test_parsing --dmd-dir dmd --avoid-parallel-memory-usage --github ${{ matrix.extra_args }}
76+
${{ env.DC }} -run runtests.d --compiler ${{ env.DC }} -m${{ matrix.model }} --json-test-dir JSONTestSuite/test_parsing --dmd-dir dmd --python-test-dir cpython/Lib/test --avoid-parallel-memory-usage --github ${{ matrix.extra_args }}

README.md

+4
@@ -51,6 +51,10 @@ The parser for C++ uses GLR, while the grammar for the preprocessor can
5151
use LALR. The example application shows the parse tree for a C++ file,
5252
which needs to be already preprocessed.
5353

54+
An example for parsing Python is in folder [examples/python/](examples/python/).
55+
It uses a wrapper around the generated lexer, which keeps track of the
56+
indentation level.
57+
5458
The folder [tests/grammars/](tests/grammars/) also contains example grammars, but some
5559
of them test corner cases and should not be used as examples for
5660
real grammars.

docs/api.md

+2
@@ -66,6 +66,8 @@ different things, like for example:
6666
[lexer hack](https://en.wikipedia.org/wiki/Lexer_hack) for C.
6767
* Store or process comments, which are ignored by the parser.
6868
* Add debug output without modifying lexer or parser directly.
69+
* Keep track of the indentation level for languages like Python, see
70+
example in [examples/python/](../examples/python/).
6971

7072
## Tree Creator
7173

examples/python/README.md

+31
@@ -0,0 +1,31 @@
1+
# Example grammar for Python
2+
3+
This is an example for parsing [Python](https://www.python.org/).
4+
The grammar is in file grammarpython.ebnf. Application testpython.d uses
5+
it to parse Python files and print a parse tree.
6+
7+
Python uses the indentation to define the structure of the source code.
8+
The generated lexer does not directly implement this. Instead the
9+
grammar contains the tokens `Indent` and `Dedent` without a definition.
10+
A wrapper around the lexer in the application keeps track of the current
11+
indentation and generates these tokens, so the generated parser can use
12+
them.
13+
14+
The application can be built with the following command:
15+
```sh
16+
dub build
17+
```
18+
19+
It is also possible to test the grammar on test cases from
20+
https://github.com/python/cpython/tree/main/Lib/test using the argument `--test-dir`:
21+
```sh
22+
git clone https://github.com/python/cpython.git
23+
git -C cpython checkout 3979150a0d406707f6d253d7c15fb32c1e005a77
24+
./example_python --test-dir cpython/Lib/test/
25+
```
26+
It will only print on errors, so the expected output is empty.
27+
28+
The grammar is based on the official grammar at https://docs.python.org/3/reference/lexical_analysis.html
29+
and https://docs.python.org/3/reference/grammar.html.
30+
The grammar was converted with the program grammarpythongen.d, which
31+
uses grammarpeg.ebnf. Some manual changes were also made.
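
Reviewer note: the README added above describes a wrapper around the lexer that tracks indentation and emits `Indent` and `Dedent` tokens. The following is a minimal Python sketch of that general technique, not the code from testpython.d. The function name `add_indentation_tokens`, the `(kind, text)` token shape and the `Line` token are assumptions made for illustration, and the sketch only handles indentation made of spaces.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical shape for this sketch


def add_indentation_tokens(lines: List[str]) -> Iterator[Token]:
    """Yield Line tokens, with Indent/Dedent tokens around indented blocks."""
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        stripped = line.lstrip(" ")
        # Blank lines and comment-only lines do not change the indentation.
        if not stripped.strip() or stripped.startswith("#"):
            continue
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: open a new block.
            stack.append(width)
            yield ("Indent", "")
        while width < stack[-1]:
            # Shallower: close blocks until the widths match again.
            stack.pop()
            yield ("Dedent", "")
        if width != stack[-1]:
            raise ValueError(f"inconsistent dedent: {line!r}")
        yield ("Line", stripped.rstrip("\n"))
    # Close all blocks still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        yield ("Dedent", "")
```

A stack of widths is used so that a single line can close several nested blocks at once, which is why the dedent step is a loop rather than a single check.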

examples/python/dub.json

+27
@@ -0,0 +1,27 @@
1+
{
2+
"name": "example_python",
3+
"description": "Example Python for DParserGen",
4+
"authors": ["Tim Schendekehl"],
5+
"license": "BSL-1.0",
6+
"targetType": "executable",
7+
"dependencies": {
8+
"dparsergen:core": {
9+
"version": "*",
10+
"path": "../.."
11+
},
12+
"dparsergen:generator": {
13+
"version": "*",
14+
"path": "../.."
15+
}
16+
},
17+
"sourceFiles": [
18+
"testpython.d",
19+
"grammarpython.d",
20+
"grammarpython_lexer.d"
21+
],
22+
"lflags-windows": ["/STACK:10485760"],
23+
"preBuildCommands": [
24+
"\"$DUB\" run --root=../../ :generator -- grammarpython.ebnf -o grammarpython.d --lexer grammarpython_lexer.d"
25+
],
26+
"buildRequirements": ["allowWarnings"]
27+
}

examples/python/grammarpeg.ebnf

+71
@@ -0,0 +1,71 @@
1+
PEG = Definition+;
2+
3+
Definition
4+
= Name BracketExpression? Memo? ":" Newline* "|"? Productions Newline
5+
| Newline
6+
| "@" "trailer" StringLiteralLong Newline
7+
;
8+
9+
Memo = "(" Expression ")";
10+
11+
Productions @array
12+
= Production
13+
| Productions Newline? "|" Production
14+
;
15+
16+
Production = NamedExpression+ Code?;
17+
18+
NamedExpression @backtrack
19+
= Name BracketExpression? "=" PrefixExpression
20+
| <PrefixExpression
21+
;
22+
PrefixExpression
23+
= "!" Expression
24+
| "&" PrefixExpression
25+
| "&&" PrefixExpression
26+
| StringLiteral "." Expression "+"
27+
| StringLiteral "..." StringLiteral
28+
| <Expression
29+
| "~"
30+
;
31+
Expression @noOptDescent
32+
= Name
33+
| StringLiteral
34+
| ExtraText
35+
| <BracketExpression
36+
| Expression "*"
37+
| Expression "+"
38+
| Expression "?"
39+
| "(" Productions ")"
40+
;
41+
BracketExpression
42+
= "[" Productions "]"
43+
;
44+
45+
token StringLiteralLong @minimalMatch
46+
= "'''" [^]* "'''"
47+
;
48+
token StringLiteral
49+
= "'" [^']* "'" !"'"
50+
| "\"" [^"]* "\""
51+
;
52+
token ExtraText
53+
= "<" [^<>]* ">"
54+
;
55+
56+
token Code = "{" {[^{}"'] | StringLiteral}* "}";
57+
58+
token Name @lowPrio
59+
= [a-zA-Z_] [a-zA-Z_0-9]*
60+
| "`" [a-zA-Z_] [a-zA-Z_0-9]* "`"
61+
;
62+
63+
token Space @ignoreToken
64+
= [ \t]+
65+
;
66+
token Comment @ignoreToken
67+
= "#" [^\n]*
68+
;
69+
token Newline
70+
= "\n" | "\r" | "\r\n"
71+
;
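
Reviewer note: the token definitions at the end of grammarpeg.ebnf can be approximated with regular expressions to see what they match. This is a rough Python sketch and not how DParserGen builds its lexer: ordered regex alternation only approximates annotations such as `@lowPrio` and `@minimalMatch`, the `Code` token is omitted, and the `Punct` group and the `tokenize` function are names invented here.

```python
import re

# One named group per token from the grammar. Order matters, because Python
# tries the alternatives from left to right.
TOKEN_RE = re.compile(
    r"""
      (?P<StringLiteralLong>'''.*?''')
    | (?P<StringLiteral>'[^']*'(?!')|"[^"]*")
    | (?P<ExtraText><[^<>]*>)
    | (?P<Name>[a-zA-Z_][a-zA-Z_0-9]*|`[a-zA-Z_][a-zA-Z_0-9]*`)
    | (?P<Space>[ \t]+)
    | (?P<Comment>\#[^\n]*)
    | (?P<Newline>\r\n|\n|\r)
    | (?P<Punct>\.\.\.|&&|.)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(text):
    """Return (kind, text) pairs, dropping Space and Comment tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        # Space and Comment are marked @ignoreToken in the grammar.
        if kind in ("Space", "Comment"):
            continue
        tokens.append((kind, match.group()))
    return tokens
```

For example, `tokenize("atom: NAME | 'True'  # comment\n")` yields the two names, the `:` and `|` punctuation, the string literal and a final `Newline` token, with the spaces and the comment dropped.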
