Peginator is a PEG (Parsing Expression Grammar) parser generator written in Rust. It is specifically made to parse into ASTs (Abstract Syntax Trees), as opposed to most streaming-style parsers out there.
It generates both the tree structure and the parsing code that can create that tree from a &str. The generated parsing code is deliberately very simple, straightforward Rust code, which is usually optimized very well by the compiler.
There is an opt-in memoization feature that makes it a proper packrat parser that can parse any input in linear time and space.
Left-recursion is also supported using said memoization feature (also opt-in).
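To make the packrat idea concrete, here is a minimal hand-written sketch of memoization in a PEG-style recognizer. It is not the code peginator generates: the `Packrat` struct, the `Rule` enum, and the `recognizes` helper are hypothetical names made up for this illustration, and the toy grammar is given in the comments. The point is only that each (rule, position) pair is evaluated at most once, which is what bounds the work when alternatives backtrack.

```rust
use std::collections::HashMap;

/// Identifies a grammar rule in the memo table (hypothetical, for illustration).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub enum Rule {
    Expr,
    Term,
}

/// A toy packrat recognizer for the grammar
///   Expr = Term '+' Expr | Term;
///   Term = '0'..'9' {'0'..'9'};
pub struct Packrat<'a> {
    input: &'a [u8],
    /// (rule, start position) -> end position on success, None on failure.
    memo: HashMap<(Rule, usize), Option<usize>>,
    /// Number of rule bodies actually evaluated, i.e. memo table misses.
    pub evaluations: usize,
}

impl<'a> Packrat<'a> {
    pub fn new(input: &'a str) -> Self {
        Packrat { input: input.as_bytes(), memo: HashMap::new(), evaluations: 0 }
    }

    /// Look up (rule, pos) in the memo table; evaluate the rule only on a miss.
    pub fn apply(&mut self, rule: Rule, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached;
        }
        self.evaluations += 1;
        let result = match rule {
            Rule::Expr => self.expr(pos),
            Rule::Term => self.term(pos),
        };
        self.memo.insert((rule, pos), result);
        result
    }

    fn expr(&mut self, pos: usize) -> Option<usize> {
        // First alternative: Term '+' Expr
        if let Some(after_term) = self.apply(Rule::Term, pos) {
            if self.input.get(after_term) == Some(&b'+') {
                if let Some(end) = self.apply(Rule::Expr, after_term + 1) {
                    return Some(end);
                }
            }
        }
        // Second alternative: Term. If the first alternative already tried
        // Term at this position, the result comes from the memo table.
        self.apply(Rule::Term, pos)
    }

    fn term(&mut self, pos: usize) -> Option<usize> {
        let mut end = pos;
        while matches!(self.input.get(end), Some(b) if b.is_ascii_digit()) {
            end += 1;
        }
        if end > pos { Some(end) } else { None }
    }
}

/// Returns true if the whole input matches `Expr`.
pub fn recognizes(input: &str) -> bool {
    let mut parser = Packrat::new(input);
    parser.apply(Rule::Expr, 0) == Some(input.len())
}
```

The memo table is also why the space usage grows with the input: in the worst case it holds one entry per rule per input position.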
This documentation describes how peginator implements PEGs. A basic understanding of PEGs is assumed. There are good introductions on Wikipedia or in the docs of other parser generators.
Peginator is bootstrapped using its own syntax and grammar file, which is somewhat easy to read.
Please see the syntax reference and the API documentation.
The tests can also be used as examples.
The grammars for peginator are written in a syntax similar to EBNF (extended Backus-Naur form):
@export
FunctionDef = 'fn' name:Ident '(' param_list:ParamList ')' [ '->' return_value:Type ];
ParamList = self_param:SelfParam {',' params:Param} | params:Param {',' params:Param} | ;
Param = name:Ident ':' typ: Type;
SelfParam = [ref_type:ReferenceMarker] 'self';
Type = [ref_type:ReferenceMarker] typename:Ident;
ReferenceMarker = @:MutableReference | @:ImmutableReference;
ImmutableReference = '&';
MutableReference = '&' 'mut';
@string
@no_skip_ws
Ident = {'a'..'z' | 'A'..'Z' | '_' | '0'..'9'};
Based on the above grammar, peginator will generate the following types:
pub struct FunctionDef {
    pub name: Ident,
    pub param_list: ParamList,
    pub return_value: Option<Type>,
}
pub struct ParamList {
    pub self_param: Option<SelfParam>,
    pub params: Vec<Param>,
}
pub struct Param {
    pub name: Ident,
    pub typ: Type,
}
pub struct SelfParam {
    pub ref_type: Option<ReferenceMarker>,
}
pub struct Type {
    pub ref_type: Option<ReferenceMarker>,
    pub typename: Ident,
}
pub enum ReferenceMarker {
    ImmutableReference(ImmutableReference),
    MutableReference(MutableReference),
}
pub struct ImmutableReference;
pub struct MutableReference;
pub type Ident = String;
impl PegParser for FunctionDef { /* omitted */ }
Parsing then looks like this:
FunctionDef::parse("fn example(&self, input:&str, rectified:&mut Rect) -> ExampleResult;")
Which results in the following structure:
FunctionDef {
    name: "example",
    param_list: ParamList {
        self_param: Some(SelfParam {
            ref_type: Some(ImmutableReference(ImmutableReference)),
        }),
        params: [
            Param {
                name: "input",
                typ: Type {
                    ref_type: Some(ImmutableReference(ImmutableReference)),
                    typename: "str",
                },
            },
            Param {
                name: "rectified",
                typ: Type {
                    ref_type: Some(MutableReference(MutableReference)),
                    typename: "Rect",
                },
            },
        ],
    },
    return_value: Some(Type {
        ref_type: None,
        typename: "ExampleResult",
    }),
}
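Because the result is a plain typed tree, working with it is ordinary Rust. The sketch below walks such a tree and renders it back to source text. To keep it compilable on its own, the types are hand-written copies of the ones listed above (in a real project they come from the generated code), and the value is built by hand instead of being parsed. The `marker_text`, `render`, and `example` functions are hypothetical helpers, not part of peginator.

```rust
// Hand-written copies of the generated types, so the sketch stands alone.
pub struct ImmutableReference;
pub struct MutableReference;
pub enum ReferenceMarker {
    ImmutableReference(ImmutableReference),
    MutableReference(MutableReference),
}
pub type Ident = String;
pub struct Type { pub ref_type: Option<ReferenceMarker>, pub typename: Ident }
pub struct Param { pub name: Ident, pub typ: Type }
pub struct SelfParam { pub ref_type: Option<ReferenceMarker> }
pub struct ParamList { pub self_param: Option<SelfParam>, pub params: Vec<Param> }
pub struct FunctionDef {
    pub name: Ident,
    pub param_list: ParamList,
    pub return_value: Option<Type>,
}

/// Hypothetical helper: turn an optional reference marker back into text.
fn marker_text(marker: &Option<ReferenceMarker>) -> &'static str {
    match marker {
        Some(ReferenceMarker::ImmutableReference(_)) => "&",
        Some(ReferenceMarker::MutableReference(_)) => "&mut ",
        None => "",
    }
}

/// Hypothetical helper: pretty-print a function definition from its AST.
pub fn render(def: &FunctionDef) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(self_param) = &def.param_list.self_param {
        parts.push(format!("{}self", marker_text(&self_param.ref_type)));
    }
    for p in &def.param_list.params {
        parts.push(format!("{}: {}{}", p.name, marker_text(&p.typ.ref_type), p.typ.typename));
    }
    let ret = match &def.return_value {
        Some(t) => format!(" -> {}{}", marker_text(&t.ref_type), t.typename),
        None => String::new(),
    };
    format!("fn {}({}){}", def.name, parts.join(", "), ret)
}

/// Builds, by hand, the same value as the parse result shown above.
pub fn example() -> FunctionDef {
    let imm = || Some(ReferenceMarker::ImmutableReference(ImmutableReference));
    FunctionDef {
        name: "example".to_string(),
        param_list: ParamList {
            self_param: Some(SelfParam { ref_type: imm() }),
            params: vec![
                Param {
                    name: "input".to_string(),
                    typ: Type { ref_type: imm(), typename: "str".to_string() },
                },
                Param {
                    name: "rectified".to_string(),
                    typ: Type {
                        ref_type: Some(ReferenceMarker::MutableReference(MutableReference)),
                        typename: "Rect".to_string(),
                    },
                },
            ],
        },
        return_value: Some(Type { ref_type: None, typename: "ExampleResult".to_string() }),
    }
}
```

Calling `render(&example())` gives back a normalized form of the original input, without the trailing semicolon, since the grammar does not capture it.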
We have pretty errors, based on the first failure of the longest match (à la Python's parser).
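One common way to get this kind of error is to remember only the failure that occurred furthest into the input while alternatives are being tried. The following is a rough sketch of that general technique under that assumption; it is not peginator's actual error-handling code. `Failure`, `FurthestFailure`, `expect_literal`, and `greeting` are hypothetical names, and the toy grammar is given in the comments.

```rust
/// A recorded parse failure: where it happened and what was expected there.
#[derive(Debug, Clone, PartialEq)]
pub struct Failure {
    pub position: usize,
    pub expected: &'static str,
}

/// Keeps only the failure that got furthest into the input.
#[derive(Default)]
pub struct FurthestFailure {
    best: Option<Failure>,
}

impl FurthestFailure {
    /// Record a failure. A later failure at the same position does not
    /// replace an earlier one, so the first failure of the longest attempt wins.
    pub fn record(&mut self, position: usize, expected: &'static str) {
        let replace = match &self.best {
            Some(best) => position > best.position,
            None => true,
        };
        if replace {
            self.best = Some(Failure { position, expected });
        }
    }
}

/// Try to match `text` at `pos`; record a failure if it is not there.
fn expect_literal(
    input: &str,
    pos: usize,
    text: &'static str,
    errors: &mut FurthestFailure,
) -> Option<usize> {
    if input[pos..].starts_with(text) {
        Some(pos + text.len())
    } else {
        errors.record(pos, text);
        None
    }
}

/// Toy grammar: Greeting = 'hello' ' ' 'world' | 'hello' ' ' 'there' | 'hi';
/// Returns the end position on success, or the furthest failure otherwise.
pub fn greeting(input: &str) -> Result<usize, Failure> {
    let mut errors = FurthestFailure::default();
    let alternatives: [&[&'static str]; 3] =
        [&["hello", " ", "world"], &["hello", " ", "there"], &["hi"]];
    for sequence in alternatives {
        let mut pos = Some(0);
        for &text in sequence {
            pos = pos.and_then(|p| expect_literal(input, p, text, &mut errors));
        }
        if let Some(end) = pos {
            return Ok(end);
        }
    }
    Err(errors.best.expect("every failed alternative records a failure"))
}
```

For the input `hello wurld`, the last alternative fails at position 0, but the reported failure is the one at position 6, which is usually the more helpful place to point the user at.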
Parse tracing is available too (opt-in, no cost if not used).
There are multiple ways to integrate a Peginator grammar into your project:
- Compile your grammars directly with the peginator-cli binary
- Inline your grammars with the peginate! macro from the peginator_macro package
- Or you can use the buildscript helper
The Peginator CLI can also create a nice railroad graph of your grammar.
At this point, I'd be happy if simply more people used this code. Please reach out if you need any help.
The project is meant to be an almost drop-in replacement for Tatsu, and its fantastic Model Builder. This is why the grammar looks the way it does.
There are a ton of other PEG parser implementations in Rust; please check them out. Non-exhaustive list in no particular order:
Special mention: lalrpop
Licensed under the MIT license