perf(parse)!: Switch from pure parsers to stream mutation #268
Pull Request Test Coverage Report for Build 5470633585
💛 - Coveralls |
examples/json/parser.rs
Outdated
 fn json_value<'i, E: ParseError<Stream<'i>> + ContextError<Stream<'i>, &'static str>>(
-    input: Stream<'i>,
-) -> IResult<Stream<'i>, JsonValue, E> {
+    input: &mut Stream<'i>,
+) -> PResult<JsonValue, E> {
     // `alt` combines the each value parser. It returns the result of the first
     // successful parser, or an error
     alt((
-        unpeek(null).value(JsonValue::Null),
-        unpeek(boolean).map(JsonValue::Boolean),
-        unpeek(string).map(JsonValue::Str),
+        null.value(JsonValue::Null),
+        boolean.map(JsonValue::Boolean),
+        string.map(JsonValue::Str),
         float.map(JsonValue::Num),
-        unpeek(array).map(JsonValue::Array),
-        unpeek(object).map(JsonValue::Object),
+        array.map(JsonValue::Array),
+        object.map(JsonValue::Object),
     ))
-    .parse_peek(input)
+    .parse_next(input)
 }
This is an example of what a lot of the changes look like (`unpeek` and `parse_peek` are approaches for making the old way work with the new API)
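A minimal, self-contained sketch of how such adapters could bridge the two styles. It does not use winnow: `unpeek_sketch`, `parse_peek_sketch`, `PureResult`, and `MutResult` are hypothetical names invented for illustration, and the real `unpeek` and `parse_peek` may be implemented differently.

```rust
/// Old, pure style: take the input by value, return the remaining input with the output.
type PureResult<'s, O> = Result<(&'s str, O), String>;

/// New, mutating style: advance the stream in place, return only the output.
type MutResult<O> = Result<O, String>;

/// Hypothetical stand-in for `unpeek`: wrap a pure parser so it can be called
/// through the mutating signature.
fn unpeek_sketch<'s, O>(
    mut pure: impl FnMut(&'s str) -> PureResult<'s, O>,
) -> impl FnMut(&mut &'s str) -> MutResult<O> {
    move |input: &mut &'s str| {
        let (rest, out) = pure(*input)?;
        *input = rest; // the stream only advances when the pure parser succeeds
        Ok(out)
    }
}

/// Hypothetical stand-in for `parse_peek`: run a mutating parser on a local copy
/// of the input and hand back the remaining input alongside the output.
fn parse_peek_sketch<'s, O>(
    mut parser: impl FnMut(&mut &'s str) -> MutResult<O>,
    mut input: &'s str,
) -> PureResult<'s, O> {
    let out = parser(&mut input)?;
    Ok((input, out))
}

/// A pure-style parser for the literal `null`.
fn null(input: &str) -> PureResult<'_, ()> {
    match input.strip_prefix("null") {
        Some(rest) => Ok((rest, ())),
        None => Err(format!("expected `null` at {input:?}")),
    }
}
```

Wrapping `null` with `unpeek_sketch` lets it be driven through a `&mut &str`, and `parse_peek_sketch` converts back, so the two adapters are inverses of each other for successful parses.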
src/combinator/core.rs
Outdated
trace("cond", move |input: &mut I| {
    if b {
        f.parse_next(input).map(Some)
    } else {
        Ok(None)
    }
})
}
In some ways, this makes things a lot cleaner
 //! }
 //!
-//! fn decimal(input: &str) -> IResult<&str, &str> {
+//! fn decimal<'s>(input: &mut &'s str) -> PResult<&'s str> {
This change can sometimes force explicit lifetimes
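A rough illustration of why the lifetime has to be spelled out, written without winnow and with a plain `Result<_, String>` standing in for `PResult` (an assumption made to keep the example self-contained). With `&mut &str` there are two input lifetimes, the short `&mut` borrow and the longer-lived text, so Rust's elision rules cannot pick one for the returned slice.

```rust
/// Mutating style: `input` carries two lifetimes, the short `&mut` borrow and
/// the longer-lived `'s` of the underlying text. The returned slice must be
/// tied to `'s`, which is why it has to be named explicitly.
fn decimal<'s>(input: &mut &'s str) -> Result<&'s str, String> {
    // Copy the inner `&'s str` out so the slices below borrow from the text,
    // not from the short-lived `&mut` reference.
    let text: &'s str = *input;
    let end = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    if end == 0 {
        return Err(format!("expected a digit at {text:?}"));
    }
    let (digits, rest) = text.split_at(end);
    *input = rest; // advance the caller's stream past the digits
    Ok(digits)
}
```

Because the output is tied to `'s`, the caller can keep the returned digits around while continuing to mutate the stream.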
src/ascii/tests.rs
Outdated
 let f = "βèƒôřè\rÂßÇáƒƭèř";
 assert_eq!(
     not_line_ending.parse_peek(f),
-    Err(ErrMode::Backtrack(Error::new(f, ErrorKind::Tag)))
+    Err(ErrMode::Backtrack(Error::new(&f[12..], ErrorKind::Tag)))
This is an example of how error reporting changes with this. We now report errors (and in general leave the shared mutable stream) at the most specific location where the error occurred
I wonder if we could provide a wrapper that does this. I don't get where the offset of 12 comes from here, and other users probably wouldn't either.
I'd be curious what you are thinking for what this wrapper would be.
In this case, the offset of 12 is the `\r` character, which is what caused the error.
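For anyone else puzzled by the 12: it is a byte offset, not a character count. Each of the six characters before the `\r` takes two bytes in UTF-8. A small standalone check, where `first_line_ending_offset` is a hypothetical helper written only for this illustration:

```rust
/// Byte offset of the first line-ending character (`\r` or `\n`), if any.
fn first_line_ending_offset(text: &str) -> Option<usize> {
    text.find(|c: char| c == '\r' || c == '\n')
}

/// Number of characters that precede the given byte offset.
/// Panics if `offset` is not on a character boundary.
fn chars_before(text: &str, offset: usize) -> usize {
    text[..offset].chars().count()
}
```

On the test string above, the offset is 12 while only 6 characters come before it, and `&f[12..]` starts at the `\r`.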
Oh, I see it now, never mind then. I should go to bed...
+$input.reset($start.clone());
 match $self.$it.parse_next($input) {
     Err(ErrMode::Backtrack(e)) => {
         let err = $err.or(e);
-        succ!($it, alt_trait_inner!($self, $input, err, $($id)+))
+        succ!($it, alt_trait_inner!($self, $input, $start, err, $($id)+))
     }
One of the main changes is that if a user needs to backtrack, instead of preserving the original input, they take a `checkpoint` and `reset` the stream to it.
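A small standalone sketch of that pattern for a two-way choice. It does not use winnow: `Cursor`, `Checkpoint`, `SketchResult`, `literal`, and `alt2` are hypothetical names, and the real `alt` additionally distinguishes backtracking errors from fatal ones, which this sketch ignores.

```rust
type SketchResult<O> = Result<O, String>;

/// Hypothetical minimal stream: a cursor over the unparsed remainder.
#[derive(Debug)]
struct Cursor<'s> {
    rest: &'s str,
}

/// A saved position; cheap to copy because it is only a slice.
#[derive(Clone, Copy)]
struct Checkpoint<'s>(&'s str);

impl<'s> Cursor<'s> {
    fn checkpoint(&self) -> Checkpoint<'s> {
        Checkpoint(self.rest)
    }
    fn reset(&mut self, cp: Checkpoint<'s>) {
        self.rest = cp.0;
    }
}

/// Consume `expected` from the front of the stream, or fail without advancing.
fn literal<'s>(input: &mut Cursor<'s>, expected: &str) -> SketchResult<&'s str> {
    let text: &'s str = input.rest;
    if text.starts_with(expected) {
        let (matched, rest) = text.split_at(expected.len());
        input.rest = rest;
        Ok(matched)
    } else {
        Err(format!("expected {expected:?}"))
    }
}

/// Consumes `a` then `b`. If `b` is missing, the stream is left after the `a`.
fn ab(input: &mut Cursor<'_>) -> SketchResult<()> {
    literal(input, "a")?;
    literal(input, "b")?;
    Ok(())
}

/// Consumes `a` then `c`.
fn ac(input: &mut Cursor<'_>) -> SketchResult<()> {
    literal(input, "a")?;
    literal(input, "c")?;
    Ok(())
}

/// Try `first`; on failure rewind to the checkpoint and try `second`.
fn alt2<'s, O>(
    input: &mut Cursor<'s>,
    mut first: impl FnMut(&mut Cursor<'s>) -> SketchResult<O>,
    mut second: impl FnMut(&mut Cursor<'s>) -> SketchResult<O>,
) -> SketchResult<O> {
    let start = input.checkpoint();
    match first(input) {
        Ok(o) => Ok(o),
        Err(_) => {
            input.reset(start); // undo whatever `first` consumed before failing
            second(input)
        }
    }
}
```

Only the combinator that wants to retry takes the checkpoint; `ab` and `ac` themselves are free to leave the stream wherever they failed.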
I find the checkpoint/reset concept very elegant. Maybe `checkpoint` needs a different name. I instantly thought of Postgres WAL checkpoints. Not too far off though. Maybe `freeze` could be a possible alternative?
Yeah, I didn't want to block the experiment by waiting to come up with a name, so I just took what combine named things.
I had also been considering `snapshot`, though `save` and `bookmark` could also work. `state` would overlap with `Stateful`.
For me, `freeze` carries the wrong connotation, that you are locking things down like in Python's `frozenmap`.
Also, I wasn't sure if I could make checkpoints as small as they are. I was surprised with what you can do with traits, like
- 8b3a1bc (passing an associated type as generic parameter to a trait)
- ade7912 (passing an associated type of the trait currently being defined to a generic parameter of a super trait)
The initialization-order aspects of those I find wild.
I learnt something new today: didn't know that you could use `as` in generic parameter position. 😃
`winnow` has pushed my knowledge of what is possible...
src/combinator/multi.rs
Outdated
 loop {
-    let i_ = input.clone();
+    let start = input.checkpoint();
     let len = input.eof_offset();
-    match f.parse_peek(i_) {
-        Ok((i, o)) => {
+    match f.parse_next(input) {
+        Ok(o) => {
             // infinite loop check: the parser must always consume
-            if i.eof_offset() == len {
-                return Err(ErrMode::assert(i, "`repeat` parsers must always consume"));
+            if input.eof_offset() == len {
+                return Err(ErrMode::assert(
+                    input.clone(),
+                    "`repeat` parsers must always consume",
+                ));
             }

             res = g(res, o);
-            input = i;
         }
         Err(ErrMode::Backtrack(_)) => {
-            return Ok((input, res));
+            input.reset(start);
+            return Ok(res);
         }
         Err(e) => {
             return Err(e);
Similarly, for repeating parsers, we have to `checkpoint` before each parse attempt and then `reset` when a backtracking error signals the end of the repetition.
In general, this type of scheme means only a fraction of the code needs to do checkpointing. The alternative would be for every parser to perform a "transaction", and that seemed too invasive
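The loop can be sketched without winnow as follows. Here the checkpoint is just a copy of the `&str`, `repeat_sketch` and `digit_then_comma` are hypothetical names, and every error is treated as a backtrack, whereas the diff above only rewinds on `ErrMode::Backtrack` and propagates other errors.

```rust
/// Collect outputs of `item` until it fails, then rewind the stream to where
/// the failed attempt began. Simplification: any error ends the repetition.
fn repeat_sketch<'s, O>(
    input: &mut &'s str,
    mut item: impl FnMut(&mut &'s str) -> Result<O, String>,
) -> Result<Vec<O>, String> {
    let mut out = Vec::new();
    loop {
        let start: &'s str = *input; // checkpoint before each attempt
        let len = input.len();
        match item(input) {
            Ok(o) => {
                // infinite loop check: a successful item must consume input
                if input.len() == len {
                    return Err("`repeat` parsers must always consume".to_owned());
                }
                out.push(o);
            }
            Err(_) => {
                *input = start; // reset: discard the partial progress
                return Ok(out);
            }
        }
    }
}

/// Parses one digit followed by `,`. If the comma is missing, the stream is
/// left after the digit, so the caller is responsible for rewinding.
fn digit_then_comma(input: &mut &str) -> Result<u32, String> {
    let text = *input;
    let mut chars = text.chars();
    let digit = chars
        .next()
        .and_then(|c| c.to_digit(10))
        .ok_or("expected a digit")?;
    let after_digit = chars.as_str();
    *input = after_digit;
    let rest = after_digit.strip_prefix(',').ok_or("expected `,`")?;
    *input = rest;
    Ok(digit)
}
```

The item parser never saves or restores anything; only the repeating combinator does, which is the "fraction of the code" point made above.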
I think checkpoint/reset is better than the status quo, since it makes it more explicit when and where input rewinding happens. See the changes to the `expr_inner` function in `parser/expr.rs` in my hcl-edit branch. I think with the checkpointing it's clearer what's going on now.
@martinohmann I know you have limited time but I would appreciate feedback on this API change. The primary motivation is to reduce overhead when using |
I had an hour of spare time and compiled an The branch is here: martinohmann/hcl-rs@main...hcl-edit/winnow-stream-mutation Benchmark results also look nice so far. I'm seeing ~15% performance improvements with this. Current
The
|
Thanks for looking into that! I might build off of that to see if there is further room for improvement. If this is about return types, your error type is likely the remaining limiting factor. We could instead store a The standard approach of just boxing all of the error wouldn't work well because that would make backtracking more expensive. |
Yeah, that's a good idea. I'll try that out on my branch the next time I have a spare hour and report back. On another note: the excessive explicit lifetime annotations I added are just a result of how I defined my own |
The `FnMut` impl is still focused on `parse_peek`
This will make some `parse_peek` to `parse_next` conversions to be automatic.
I think the I think the |
As for reproducing the performance gains seen from this PR, I ran $ cargo bench -p benchmarks --bench parse -- parse/hcl-edit/large.tf
(yeah, my machine is jittery and I do about 4-5 runs of criterion) |
Another thing that might be useful to simplify the migration: I had to replace uses of |
Good call. I'll be evaluating the migration more when we are getting ready for release. In that case, I might make changes to the previous release to help. |
A couple are still left but I decided to defer working on them as they are a bit more complicated to move over.
I've since split this out into individual PRs I also just released 0.4.8 which includes |
This is great! Looking forward to v0.5. I'm also curious to see if it enables further performance improvements than what we already saw. Also, maybe the workaround described in https://docs.rs/winnow/latest/winnow/_topic/performance/index.html#built-time-performance to reduce build times in some cases isn't really needed anymore given that the overall signature of |
I'm assuming it's not enough of a difference in the signature but we'll see. However, if you use |
Everything is there for still writing pure parsers when people want to.
Remaining work
unpeek
Located
to the input #72 and see if it's addressed
Fixes #72