Merge branch 'master' into match-python3
# Conflicts:
#	lark/grammars/python.lark
joseph-e-k committed Mar 6, 2022
2 parents e20b5ae + a9a60f1 commit 9be9ac5
Showing 57 changed files with 1,722 additions and 658 deletions.
10 changes: 6 additions & 4 deletions CHANGELOG.md
@@ -2,10 +2,12 @@ v1.0

- `maybe_placeholders` is now True by default

- `use_accepts` in `UnexpectedInput.match_examples()` is now True by default
- Renamed TraditionalLexer to BasicLexer, and 'standard' lexer option to 'basic'

- Default priority is now 0, for both terminals and rules (used to be 1 for terminals)

- Token priority is now 0 by default
- Discard mechanism is now done by returning Discard, instead of raising it as an exception.

- `v_args(meta=True)` now gives meta as the first argument. i.e. `(meta, children)`
- `use_accepts` in `UnexpectedInput.match_examples()` is now True by default

- Renamed TraditionalLexer to BasicLexer, and 'standard' lexer option to 'basic'
- `v_args(meta=True)` now gives meta as the first argument. i.e. `(meta, children)`
21 changes: 15 additions & 6 deletions README.md
@@ -26,7 +26,7 @@ Most importantly, Lark will save you time and prevent you from getting parsing h

- [Documentation @readthedocs](https://lark-parser.readthedocs.io/)
- [Cheatsheet (PDF)](/docs/_static/lark_cheatsheet.pdf)
- [Online IDE](https://lark-parser.github.io/ide)
- [Online IDE](https://lark-parser.org/ide)
- [Tutorial](/docs/json_tutorial.md) for writing a JSON parser.
- Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/)
- [Gitter chat](https://gitter.im/lark-parser/Lobby)
@@ -98,15 +98,15 @@ Lark is great at handling ambiguity. Here is the result of parsing the phrase "f
- **LALR(1)** parser
- Fast and light, competitive with PLY
- Can generate a stand-alone parser ([read more](docs/tools.md#stand-alone-parser))
- **CYK** parser, for highly ambiguous grammars
- **EBNF** grammar
- **Unicode** fully supported
- **Python 2 & 3** compatible
- Automatic line & column tracking
- Interactive parser for advanced parsing flows and debugging
- Grammar composition - Import terminals and rules from other grammars
- Standard library of terminals (strings, numbers, names, etc.)
- Import grammars from Nearley.js ([read more](/docs/tools.md#importing-grammars-from-nearleyjs))
- Extensive test suite [![codecov](https://codecov.io/gh/lark-parser/lark/branch/master/graph/badge.svg?token=lPxgVhCVPK)](https://codecov.io/gh/lark-parser/lark)
- MyPy support using type stubs
- Type annotations (MyPy support)
- And much more!

See the full list of [features here](https://lark-parser.readthedocs.io/en/latest/features.html)
@@ -164,6 +164,7 @@ Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more detail
- [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer
- [harmalysis](https://github.com/napulen/harmalysis) - A language for harmonic analysis and music theory
- [gersemi](https://github.com/BlankSpruce/gersemi) - A CMake code formatter
- [MistQL](https://github.com/evinism/mistql) - A query language for JSON-like structures

Using Lark? Send me a message and I'll add your project!

@@ -173,13 +174,19 @@ Lark uses the [MIT license](LICENSE).

(The standalone tool is under MPL2)

## Contribute
## Contributors

Lark is currently accepting pull-requests. See [How to develop Lark](/docs/how_to_develop.md)

Big thanks to everyone who contributed so far:

<a href="https://github.com/lark-parser/lark/graphs/contributors">
<img src="https://contributors-img.web.app/image?repo=lark-parser/lark" />
</a>

## Sponsor

If you like Lark, and want to see it grow, please consider [sponsoring us!](https://github.com/sponsors/lark-parser)
If you like Lark, and want to see us grow, please consider [sponsoring us!](https://github.com/sponsors/lark-parser)

## Contact the author

@@ -188,3 +195,5 @@ Questions about code are best asked on [gitter](https://gitter.im/lark-parser/Lo
For anything else, I can be reached by email at erezshin at gmail com.

-- [Erez](https://github.com/erezsh)


2 changes: 1 addition & 1 deletion docs/classes.rst
@@ -39,7 +39,7 @@ Tree

.. autoclass:: lark.Tree
:members: pretty, find_pred, find_data, iter_subtrees, scan_values,
iter_subtrees_topdown
iter_subtrees_topdown, __rich__

Token
-----
13 changes: 3 additions & 10 deletions docs/features.md
@@ -11,11 +11,11 @@
- Flexible error handling by using an interactive parser interface (LALR only)
- Automatic line & column tracking (for both tokens and matched rules)
- Automatic terminal collision resolution
- Grammar composition - Import terminals and rules from other grammars
- Standard library of terminals (strings, numbers, names, etc.)
- Unicode fully supported
- Extensive test suite
- MyPy support using type stubs
- Python 2 & Python 3 compatible
- Type annotations (MyPy support)
- Pure-Python implementation

[Read more about the parsers](parsers.md)
@@ -27,13 +27,6 @@
- Import grammars from Nearley.js ([read more](tools.html#importing-grammars-from-nearleyjs))
- CYK parser
- Visualize your parse trees as dot or png files ([see_example](https://github.com/lark-parser/lark/blob/master/examples/fruitflies.py))


### Experimental features
- Automatic reconstruction of input from parse-tree (see examples)
- Use Lark grammars in Julia and Javascript.

### Planned features (not implemented yet)
- Generate code in other languages than Python
- Grammar composition
- LALR(k) parser
- Full regexp-collision support using NFAs
17 changes: 11 additions & 6 deletions docs/grammar.md
@@ -99,11 +99,13 @@ num_list: "[" _separated{NUMBER, ","} "]" // Will match "[1, 2, 3]" etc.

### Priority

Terminals can be assigned priority only when using a lexer (future versions may support Earley's dynamic lexing).
Terminals can be assigned a priority to influence lexing. Terminal priorities
are signed integers with a default value of 0.

Priority can be either positive or negative. If not specified for a terminal, it defaults to 1.
When using a lexer, the highest priority terminals are always matched first.

Highest priority terminals are always matched first.
When using Earley's dynamic lexing, terminal priorities are used to prefer
certain lexings and resolve ambiguity.
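The idea that the highest-priority terminals are tried first can be sketched in plain Python. This is a toy model for illustration only, not Lark's lexer: the `tokenize` function and the terminal tables are hypothetical, and ties are broken here by declaration order, whereas Lark applies its own tie-breaking rules.

```python
import re

# Hypothetical terminal tables: (name, regex, priority). Default priority is 0.
WITH_PRIORITY = [("NAME", r"[a-z]+", 0), ("IF", r"if", 1)]
WITHOUT_PRIORITY = [("NAME", r"[a-z]+", 0), ("IF", r"if", 0)]


def tokenize(text, terminals):
    """Toy lexer: at each position, try terminals from highest priority down."""
    # sorted() is stable, so equal priorities keep their declaration order.
    ordered = sorted(terminals, key=lambda t: -t[2])
    compiled = [(name, re.compile(pattern)) for name, pattern, _ in ordered]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # skip whitespace, like %ignore WS
            pos += 1
            continue
        for name, regex in compiled:
            match = regex.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"No terminal matches at position {pos}")
    return tokens


print(tokenize("if x", WITH_PRIORITY))
print(tokenize("if x", WITHOUT_PRIORITY))
```

In a Lark grammar the same preference would be written by attaching the priority to the terminal name, for example `IF.1: "if"`.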

### Regexp Flags

@@ -228,9 +230,12 @@ four_words: word ~ 4
### Priority
Rules can be assigned priority only when using Earley (future versions may support LALR as well).
Like terminals, rules can be assigned a priority. Rule priorities are signed
integers with a default value of 0.
Priority can be either positive or negative. If not specified for a terminal, it's assumed to be 1 (i.e. the default).
When using LALR, the highest priority rules are used to resolve collision errors.
When using Earley, rule priorities are used to resolve ambiguity.
<a name="dirs"></a>
## Directives
@@ -321,4 +326,4 @@ Can also be used to implement a plugin system where a core grammar is extended b
%extend NUMBER: /0x\w+/
```

For both `%extend` and `%override`, there is no requirement for a rule/terminal to come from another file, but that is probably the most common use case.
For both `%extend` and `%override`, there is no requirement for a rule/terminal to come from another file, but that is probably the most common use case.
5 changes: 5 additions & 0 deletions docs/how_to_develop.md
@@ -54,6 +54,11 @@ You can also run the tests using pytest:
pytest tests
```

### Code Style

Lark does not follow a predefined code style.
We accept any code style that makes sense, as long as it's Pythonic and easy to read.

### Using setup.py

Another way to run the tests is using setup.py:
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -113,7 +113,7 @@ Resources

.. _Examples: https://github.com/lark-parser/lark/tree/master/examples
.. _Third-party examples: https://github.com/ligurio/lark-grammars
.. _Online IDE: https://lark-parser.github.io/ide
.. _Online IDE: https://lark-parser.org/ide
.. _How to write a DSL: http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/
.. _Program Synthesis is Possible: https://www.cs.cornell.edu/~asampson/blog/minisynth.html
.. _Cheatsheet (PDF): _static/lark_cheatsheet.pdf
18 changes: 13 additions & 5 deletions docs/json_tutorial.md
@@ -37,6 +37,7 @@ The dictionaries and lists are recursive, and contain other json documents (or "

Let's write this structure in EBNF form:

```lark
value: dict
| list
| STRING
@@ -47,7 +48,7 @@ Let's write this structure in EBNF form:
dict : "{" [pair ("," pair)*] "}"
pair : STRING ":" value

```

A quick explanation of the syntax:
- Parentheses let us group rules together.
@@ -58,25 +59,31 @@ Lark also supports the rule+ operator, meaning one or more instances. It also su

Of course, we still haven't defined "STRING" and "NUMBER". Luckily, both these literals are already defined in Lark's common library:

```lark
%import common.ESCAPED_STRING -> STRING
%import common.SIGNED_NUMBER -> NUMBER
```

The arrow (->) renames the terminals. But that only adds obscurity in this case, so going forward we'll just use their original names.

We'll also take care of the white-space, which is part of the text.
We'll also take care of the white-space, which is part of the text, by simply matching and then throwing it away.

```lark
%import common.WS
%ignore WS
```

We tell our parser to ignore whitespace. Otherwise, we'd have to fill our grammar with WS terminals.

By the way, if you're curious what these terminals signify, they are roughly equivalent to this:

```lark
NUMBER : /-?\d+(\.\d+)?([eE][+-]?\d+)?/
STRING : /".*?(?<!\\)"/
%ignore /[ \t\n\f\r]+/
```

Lark will accept this, if you really want to complicate your life :)
Lark will accept this way of writing too, if you really want to complicate your life :)

You can find the original definitions in [common.lark](https://github.com/lark-parser/lark/blob/master/lark/grammars/common.lark).
They don't strictly adhere to [json.org](https://json.org/) - but our purpose here is to accept json, not validate it.
@@ -150,6 +157,7 @@ We now have a parser that can create a parse tree (or: AST), but the tree has so

I'll present the solution, and then explain it:

```lark
?value: dict
| list
| string
@@ -161,6 +169,7 @@ I'll present the solution, and then explain it:
...
string : ESCAPED_STRING
```

1. Those little arrows signify *aliases*. An alias is a name for a specific part of the rule. In this case, we will name the *true/false/null* matches, and this way we won't lose the information. We also alias *SIGNED_NUMBER* to mark it for later processing.

@@ -444,5 +453,4 @@ This is the end of the tutorial. I hoped you liked it and learned a little about

To see what else you can do with Lark, check out the [examples](/examples).

For questions or any other subject, feel free to email me at erezshin at gmail dot com.

Read the documentation here: https://lark-parser.readthedocs.io/en/latest/
4 changes: 2 additions & 2 deletions docs/tree_construction.md
@@ -9,9 +9,9 @@ Using `item+` or `item*` will result in a list of items, equivalent to writing `

Using `item?` will return the item if it matched, or nothing.

If `maybe_placeholders=False` (the default), then `[]` behaves like `()?`.
If `maybe_placeholders=True` (the default), then using `[item]` will return the item if it matched, or the value `None`, if it didn't.

If `maybe_placeholders=True`, then using `[item]` will return the item if it matched, or the value `None`, if it didn't.
If `maybe_placeholders=False`, then `[]` behaves like `()?`.

## Terminals

Expand Down
1 change: 1 addition & 0 deletions docs/visitors.rst
@@ -113,6 +113,7 @@ Discard

.. autoclass:: lark.visitors.Discard


VisitError
----------

2 changes: 1 addition & 1 deletion examples/advanced/extend_python.py
@@ -11,7 +11,7 @@
from python_parser import PythonIndenter

GRAMMAR = r"""
%import .python3 (compound_stmt, single_input, file_input, eval_input, test, suite, _NEWLINE, _INDENT, _DEDENT, COMMENT)
%import python (compound_stmt, single_input, file_input, eval_input, test, suite, _NEWLINE, _INDENT, _DEDENT, COMMENT)
%extend compound_stmt: match_stmt
94 changes: 94 additions & 0 deletions examples/advanced/py3to2.py
@@ -0,0 +1,94 @@
"""
Python 3 to Python 2 converter (tree templates)
===============================================
This example demonstrates how to translate between two trees using tree templates.
It parses Python 3, translates it to a Python 2 AST, and then outputs the result as Python 2 code.
Uses reconstruct_python.py for generating the final Python 2 code.
"""


from lark import Lark
from lark.tree_templates import TemplateConf, TemplateTranslator

from lark.indenter import PythonIndenter
from reconstruct_python import PythonReconstructor


#
# 1. Define a Python parser that also accepts template vars in the code (in the form of $var)
#
TEMPLATED_PYTHON = r"""
%import python (single_input, file_input, eval_input, atom, var, stmt, expr, testlist_star_expr, _NEWLINE, _INDENT, _DEDENT, COMMENT, NAME)
%extend atom: TEMPLATE_NAME -> var
TEMPLATE_NAME: "$" NAME
?template_start: (stmt | testlist_star_expr _NEWLINE)
%ignore /[\t \f]+/ // WS
%ignore /\\[\t \f]*\r?\n/ // LINE_CONT
%ignore COMMENT
"""

parser = Lark(TEMPLATED_PYTHON, parser='lalr', start=['single_input', 'file_input', 'eval_input', 'template_start'], postlex=PythonIndenter(), maybe_placeholders=False)


def parse_template(s):
    return parser.parse(s + '\n', start='template_start')

def parse_code(s):
    return parser.parse(s + '\n', start='file_input')


#
# 2. Define translations using templates (each template code is parsed to a template tree)
#

pytemplate = TemplateConf(parse=parse_template)

translations_3to2 = {
    'yield from $a':
        'for _tmp in $a: yield _tmp',

    'raise $e from $x':
        'raise $e',

    '$a / $b':
        'float($a) / $b',
}
translations_3to2 = {pytemplate(k): pytemplate(v) for k, v in translations_3to2.items()}

#
# 3. Translate and reconstruct Python 3 code into valid Python 2 code
#

python_reconstruct = PythonReconstructor(parser)

def translate_py3to2(code):
    tree = parse_code(code)
    tree = TemplateTranslator(translations_3to2).translate(tree)
    return python_reconstruct.reconstruct(tree)


#
# Test Code
#

_TEST_CODE = '''
if a / 2 > 1:
    yield from [1,2,3]
else:
    raise ValueError(a) from e
'''

def test():
    print(_TEST_CODE)
    print(' -----> ')
    print(translate_py3to2(_TEST_CODE))

if __name__ == '__main__':
    test()
10 changes: 1 addition & 9 deletions examples/advanced/python_parser.py
@@ -12,17 +12,9 @@
import glob, time

from lark import Lark
from lark.indenter import Indenter
from lark.indenter import PythonIndenter


class PythonIndenter(Indenter):
    NL_type = '_NEWLINE'
    OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

kwargs = dict(postlex=PythonIndenter(), start='file_input')

# Official Python grammar by Lark