-
-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract the parser
and validator
from the pest_derive
crate into pest_meta
#159
Conversation
Just a note on bootstrapping: Pest itself uses a language for defining the grammar rules for generating a parser. A parser for said language could even be generated by Pest from a description of the language, written in the Pest language itself. That parser could, in turn, be used to implement a new version of Pest. This process is called bootstrapping. The problem is that the new version of Pest would depend on the old version of Pest, both of which would need to be compiled. Assuming this worked (which is still unclear at the moment), the users of the pest library would have to incur at least the extra compilation time required for the bootstrapping operation. Unless there can be a way around this (e.g. cargo starts shipping binary artifacts), the benefits of bootstrapping would need to outweigh the compilation time costs. Another problem is that the parser for the new version of the language would be generated from a description written in an older version of the language. This means that the language itself would need versioning in order to avoid confusion and incompatibilities. In general, the benefits of bootstrapping the Pest language are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice job with extracting this!
Versioning for the beta is a bit dirty: I've been publishing new versions of each crate even when not fully required, so I guess that can work out until 1.0 is launched.
pest_derive/src/lib.rs
Outdated
let (ast, defaults) = parser::consume_rules(pairs); | ||
fn format_errors<Errors: IntoIterator>(errors: Errors) -> String where Errors::Item: Display { | ||
errors.into_iter() | ||
.map(|error| { format!("{}", error) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual style I used was to align methods calls vertically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I tried to do that as well, but I think I missed this one :) I really need to get Vim to do it for me :)
pest_meta/Cargo.toml
Outdated
|
||
[dependencies] | ||
quote = "^0.3" | ||
maplit = "^0.1.5" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would optimize for having as few dependencies as possible in order to facilitate compile times for end users. The pest crates compile somewhat slowly and the grammars can also take quite a bit during the first compilation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maplit
is just a bunch of macros, so it doesn't really take any compilation time at all aside from downloading it (<200 lines of code including tests and documentation).
I thought you meant "quote", which is quite a bit of an annoyance since we only use one struct: quote::Ident
. Of course, quote
is already a dependency of pest_derive
, so for that use case there's no added cost, but for projects who don't care much about producing a rust plugin it may be annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant both, actually. The only reason I'd give up on maplit
is because it's easy to replace and one request less to worry about when compiling big projects with a lot of dependencies.
About quote
, it may be a smart idea to give up on using quote::Ident
in pest_meta
since it doesn't make sense at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add the circumvention of the quote
crate as a second to do.
pest_meta/src/parser.rs
Outdated
) -> Result<Position<I>, Position<I>> { | ||
pos.sequence(|pos| { | ||
soi(pos, state).and_then(|pos| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a funny change. :D I used to have this whole file as close to what the generator would spit out, hoping to be able to bootstrap it completely sooner or later, but of course it makes no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually going to ask about this one, because I wasn't sure if it was there for historical reasons or if was written in expectation of an upcoming feature. Is it OK that I took it out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup!
pest_meta/src/parser.rs
Outdated
GrammarRule::expression => expression(pos, &mut state), | ||
GrammarRule::term => term(pos, &mut state), | ||
GrammarRule::positive_predicate_operator => { | ||
PestRule::grammar_rules => grammar_rules(pos, &mut state), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this could be transformed into a macro since it got quite big since I started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is a lot of repetition a lot of repetition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is so meta. haha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New macro pushed :)
pest_meta/src/validator.rs
Outdated
use super::parser::{PestRule, ParserExpr, ParserNode, ParserRule}; | ||
|
||
pub fn validate_pairs<I: Input>(pairs: Pairs<PestRule, I>) -> Result<Vec<Ident>, Vec<Error<PestRule, I>>> { | ||
let rust_keywords = hashset!{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does look good. I think it would be very easy to just have a macro for this defined in the crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! But I would really hope you gave maplit another thought. The code for this macro (16 lines) is more than 25% of the library code (16 + 16 + 13 + 14 lines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I guess this is a perfect use case and there is pretty much no other dependency.
pest_meta/src/validator.rs
Outdated
Ok(defaults.into_iter().map(|string| Ident::new(*string)).collect()) | ||
} | ||
|
||
pub fn validate_rust_keywords<'a, I: Input+'a, Definitions: IntoIterator<Item=&'a Span<I>>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would normally used spaces around the +
in types as well, but guess it really depends. Also, I would put the lifetimes before types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing! I tried rustfmt, but had to revert because it had some very different opinions 😄 and I really wanted to reproduce the style of the other crates as much as possible. Is there any plan to use it at some point when it is stable (and consistent) enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with rustfmt
is an old one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, whoops! it was right there! Good thing rustfmt is open source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but, unfortunately, the hang bug seems to be pretty non-trivial.
I guess bootstrapping can currently stay in its current form. The reason bootstrapping might make more sense in the future would if pest will gain error-flexible parsing capabilities. |
pest_meta/src/parser.rs
Outdated
} | ||
|
||
pest::state(input, move |mut state, pos| parse_rule!{ | ||
pos, state for rule in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, creative syntax! 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like rust macros! They can allow for some really intuitive and readable code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think about it, maybe something with match
in the syntax might be more appropriate, as the rule is only matched to exactly one of the variants.
The |
@dragostis any idea what I did to bring the code coverage down so much? I think it might be due to all of the text I removed when I made the validator return Result instead of panicking in a2a2980. I guess those cases tested both the validator and the parser at once. |
@Victor-Savu It seems to be the case, yes. The integration tests in |
@Victor-Savu tarpaulin might help. I don't have a machine to test it on right now, so maybe you could try |
|
As `quote::Ident` is the only thing from `quote` used by `pest_meta`, the dependency on `quote` is overkill. Replace uses of `quote::Ident` by `String` and remove the dependency on `quote`. As `pest_derive` relies on `quote::Ident` for the code generation part, it is OK to keep `quote` as a dependency. Create the required `quote::Ident` instances from the identifier names which are now provided by `pest_meta` as `String`.
Wow! Tarpaulin is awesome! |
Does the example tick the documentation box? |
parser
and validator
from the pest_derive
crate into pest_meta
parser
and validator
from the pest_derive
crate into pest_meta
Hi, @dragostis! I am not sure what would be left to do for this PR and I'd like to do hear your thoughts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, sorry for the late reply. Had a pretty crazy week.
This looks really good and is very close to being merged modulo some small issues. I've tried the colorized example and it looks absolutely sweet!
pest_meta/Cargo.toml
Outdated
@@ -12,3 +12,9 @@ readme = "_README.md" | |||
[dependencies] | |||
maplit = "^0.1.5" | |||
pest = { path = "../pest", version = "^1.0.0-beta" } | |||
|
|||
[[example]] | |||
name="colorize" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure, but the example entry seems to be redundant.
pest_meta/examples/colorize.rs
Outdated
// processed (styled), in order to: | ||
// - skip sub-trees which overlap with already styled ones to avoid duplicating the text | ||
// - print the input spans between the styled sub-trees | ||
// . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The period here seems a bit too much. 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are linguists -- of sorts :)
pest_meta/examples/colorize.rs
Outdated
// Now, we print everything between the last processed code point and the beginning of the | ||
// current sub-tree span using the default text output style. Then we print the span of the | ||
// sub-tree itself using the selected style. | ||
print!("{}{}", unsafe {input.slice(processed, span.start())}, style(span.as_str())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the code leave spaces around the unsafe {}
braces. (still a muddy affair until rustfmt will work)
pest_meta/examples/colorize.rs
Outdated
processed = span.end(); | ||
} | ||
// To finish up, we print whatever is left from the input using the default style. | ||
println!("{}", unsafe {input.slice(processed, input.len())}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@dragostis No worries! Thanks a lot for taking the time to review this! I applied the fixes (I hope 🤞) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Would you like to merge this PR as is for now and leave documentation for later or do you think it's best to do it here?
Great! If you think the current interface looks good, I can get to the documentation. I think documentation should be part of the PR. The example colorizer was the first step in that direction, but I feel there needs to be a way for people to find it. Should I add a small section in the top-level README.md? If that gets too long, maybe a dedicated In the meantime, I will write the rust API documentation. |
Ah, very good point, the benchmark should include both debug and release builds. Your point regarding when this change should land brought up a rather scary question to mind: Can the bootstrapped parser benefit from other advances in |
It would be stuck in the 1.0 era, yes. But this might not be such a huge issue since the grammar should be stable from now on. However, this means that 1.0 will have to be supported indefinitely. As for the case of pre-built binaries, I don't think this will hit Cargo anytime soon. |
I finally bootstrapped the parser, but I am getting two errors that I don't really know how to fix. I suspect they have similar causes (at least in spirit). |
pest_meta/src/grammar/pest.pest
Outdated
@@ -0,0 +1,99 @@ | |||
grammar_rules = _{ soi ~ grammar_rule* ~ eoi } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be grammar_rules = _{ soi ~ grammar_rule+ ~ eoi }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grammar looks good. This should improve maintainability quite a bit. Unfortunately, there is quite a bit more work to be able to merge this on v1.1
mainly concerning the changes to the AST.
pest_derive/src/lib.rs
Outdated
GrammarRule::range_operator => "`..`".to_owned(), | ||
GrammarRule::single_quote => "`'`".to_owned(), | ||
error.renamed_rules(|rule| { match *rule { | ||
Rule::grammar_rule => "rule".to_owned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A translation rule for _push
would also be needed.
pest_derive/src/optimizer.rs
Outdated
name: Ident::new("rule"), | ||
ty: RuleType::Atomic, | ||
name: "rule", | ||
ty: RuleType::Atomic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space here is not needed per rustfmt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I run rustfmt on this file, I get a bunch of changes (many not related to the changes I made) Should I just go ahead and use rustfmt, or should I just remove these spaces? I am using rustfmt 0.8.0
pest_derive/src/optimizer.rs
Outdated
name: Ident::new("rule"), | ||
ty: RuleType::Atomic, | ||
name: "rule", | ||
ty: RuleType::Atomic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
pest_derive/src/optimizer.rs
Outdated
name: Ident::new("rule"), | ||
ty: RuleType::Atomic, | ||
name: "rule", | ||
ty: RuleType::Atomic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space.
pest_derive/src/optimizer.rs
Outdated
name: Ident::new("rule"), | ||
ty: RuleType::Atomic, | ||
name: "rule", | ||
ty: RuleType::Atomic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space.
pest_meta/Cargo.toml
Outdated
pest_derive = "1.0.3" | ||
pest = "1.0.3" | ||
|
||
[dev_dependencies] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be dev-dependencies
. Not sure I'd keep this here though. It's kind of pointless to have every debug build download a copy of colored
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions as to what to do with the example code that depends on colored
?
pest_meta/Cargo.toml
Outdated
|
||
[dependencies] | ||
maplit = "^0.1.5" | ||
pest_derive = "1.0.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pest dependencies here should be ^1.0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! I suppose 1.1 would already be available by the time this PR lands, so it might make sense to use that instead and release these changes as v 1.2. (same reasoning if 1.2 is released as well before this PR)
pest_meta/src/ast.rs
Outdated
@@ -26,33 +24,33 @@ pub enum RuleType { | |||
} | |||
|
|||
#[derive(Clone, Debug, Eq, PartialEq)] | |||
pub enum Expr { | |||
pub enum Expr<'i> { | |||
Str(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the lifetime was added, it probably makes sense to make Str(&'i str)
, Insens(&'i str)
and Range(&'i str, &'i str)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, yes! :) I was wondering why this wasn't part of the PR which introduced lifetimes to begin with. I will put it in, though it does feel a bit out of scope for this PR.
pest_meta/src/lib.rs
Outdated
|
||
pub fn unwrap_or_report<T, E>(res: Result<T, E>) -> T where E: IntoIterator, E::Item: Display { | ||
res.unwrap_or_else(|e| panic!("grammar error\n\n".to_owned() + &e.into_iter() | ||
.map(|error| { format!("{}", error) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this formatting. Maybe a newer version of rustfmt needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will use rustfmt
on this file. Would you recommend using cargo fmt
to format all 4 crates? That would create a lot of changes, and in some places arguably wrong formatting. I feel like rustfmt is not really ready for prime time yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, rustfmt
is not ready for prime-time. :/ I'm still kind of torn between whether or not to use it, but for now, I guess it's fine. I expect formatting to get better in time. Anyway, I definitely don't mind the changes if they're concentrated in a last commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do a cargo fmt
as part of a dedicated PR which only deals with formatting. Merging (non-fast-forward) can be quite difficult after such changes since there would be conflicts on so many lines.
pest_meta/src/validator.rs
Outdated
rules: &HashMap<Ident, &ParserNode<'i>>, | ||
trace: &mut Vec<Ident> | ||
) -> Option<Error<'i, GrammarRule>> { | ||
fn left_recursion<'a, 'i: 'a>(rules: HashMap<&'i str, &'a ParserNode<'i>>) -> Vec<Error<'i, Rule>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does 'a
need to be introduced here? I would have imagined that using 'i
everywhere would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second lifetime is needed because the nodes can't live as long as the slices they reference. If we forced &'i ParserNode<'i>
the nodes referenced in the HashMap would need to live at least as long as lifetime 'i
, which is also the lifetime of the slice that they reference. This, however, is not possible, since the node cannot live beyond the input as that would invalidate the references that it holds to the input.
Thanks for the review! Any suggestions regarding the two failing tests? I really don't know why it tries to match those extra rules. |
Yes, it's the first comment. 😛
The error reporting can be a bit flaky, I suppose. The problem is that the logic doesn't expect multiple reports on the same exact position the way it happens here. Anyway, this is a paper cut for a subsequent release. I feel like it doesn't make sense to release By the way, I really like your work with the PR and I think it makes sense for you to join the team if you want to continue developing pest. I definitely have high hopes for the future of the project! |
Thanks! I think I missed that comment. Al green now! Thank you very much for the invitation! I'd be happy to contribute more! |
I did some measurements for the build times from scratch of the entire cargo workspace and this change increases the build time by 50% on average both in debug (from 16.3s to 24.6s) as well as in release (from 52.6s to 76.2s). |
1 similar comment
I did some measurements for the build times from scratch of the entire cargo workspace and this change increases the build time by 50% on average both in debug (from 16.3s to 24.6s) as well as in release (from 52.6s to 76.2s). |
I think that's fine. This number is only problematic when using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to merge this on v1.1
. The borrowed strings in the AST are the only blockers and it seems like all of them will need to be changed, unless we want to duplicate work in the optimizer by having another to convert to.
pest_derive/src/optimizer.rs
Outdated
@@ -91,23 +91,6 @@ pub fn optimize(rules: Vec<Rule>) -> Vec<Rule> { | |||
ty, | |||
expr: expr.map_bottom_up(rotate_right) | |||
.map_bottom_up(unroll_loops) | |||
.map_bottom_up(|expr| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the issue here: since strings are not owned anymore, concatenation is not possible. Seems like we'll have to revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to have any measurements comparing the string concatenation optimization with the slice optimization? The slice optimization prevents memory allocation, prevents copying and improves referential locality, whereas the string concatenation optimization reduces the number of nodes. On first glance I don't really see a clear cut advantage of one optimization over another. However, they can coexist :)
My proposal would be to keep the string slice references and to postpone the concatenation optimization for a separate PR. The reason for this is that the The concatenated strings could be represented later using another Rule
variant containing a String
(or another data structure, such as a Vec[&'i str]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The slice optimization happens to make compilation faster, while the concatenation works on improving parse performance once the actual code is generated which is why I would definitely go for the latter. Apart from that, the actual grammar parsing/optimization/generation step is very small during the whole compilation phase which is why I didn't bother at all to do any optimizations on this side and went for code clarity. (as best as I could while still trying to get features out 😃)
Having another variant for concatenations is not a bad idea, but wouldn't it flood the optimization and generation steps too much? Since this would require you to deal with multiple cases that, in the end, reach the same point. The reason why I find the &str
to be advantageous is because it is way easier to write tests without all the ugly .to_owned()
labels everywhere. However, this could be improved by writing some macros for test improvement. Quite frankly, tests are extremely verbose since I haven't had the time to invest in making them a bit nicer. However, I find their explicitness to be rather helpful. (if you kind of know your way around the code)
So, I would definitely go for the owned approach.
If there is any way I can help with that, please let me know. I wanted to ask if |
Also, Hipchat or another chat service for development would be useful :) I have really abused this PR thread for discussions which are not related to the changes, but rather to the |
I'm aware of GitFlow, but I guess I got sidetracked at one particular point. I'll merge
The Gitter channel is in active use. :D |
This is the first step towards #158 . It provides the new crate
pest_meta
, which grants access to the implementations of the parser and validator used bypest_derive
. However, in order to declare them usable, there needs to be some experimentation outside their original intended use.This PR is a WIP mainly to get early feedback and to prevent working inside the proverbial cave for too long.
Open questions:
pest_meta
as for all the other crates in thepest
workspace but I am not sure if that's what I was supposed to do 😃Once the design & implementation kinks are ironed out and any new ideas are implemented (or postponed, dropped), this PR still needs:
pest_meta
targeted at people who want to use it for building tools forpest
(syntax highlighting, linting, completion, etc.)