
perf(parse)!: Switch from pure parsers to stream mutation #268

Closed
wants to merge 26 commits

Conversation

@epage (Collaborator) commented Jun 30, 2023

Everything is still in place for writing pure parsers when people want to.

Remaining work

Fixes #72

@coveralls commented Jun 30, 2023

Pull Request Test Coverage Report for Build 5470633585

  • 412 of 620 (66.45%) changed or added relevant lines in 15 files are covered.
  • 58 unchanged lines in 8 files lost coverage.
  • Overall coverage decreased (-1.5%) to 49.001%

Changes missing coverage:

| File | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/combinator/branch.rs | 12 | 13 | 92.31% |
| src/error.rs | 0 | 1 | 0.0% |
| src/trace/internals.rs | 0 | 1 | 0.0% |
| src/trace/mod.rs | 0 | 3 | 0.0% |
| src/binary/mod.rs | 63 | 67 | 94.03% |
| src/token/mod.rs | 24 | 28 | 85.71% |
| src/binary/bits/mod.rs | 32 | 40 | 80.0% |
| src/combinator/core.rs | 25 | 37 | 67.57% |
| src/parser.rs | 12 | 25 | 48.0% |
| src/combinator/parser.rs | 37 | 68 | 54.41% |
Files with coverage reduction:

| File | New Missed Lines | % |
|---|---|---|
| src/binary/bits/mod.rs | 1 | 85.14% |
| src/token/mod.rs | 1 | 86.09% |
| src/combinator/parser.rs | 2 | 59.34% |
| src/error.rs | 2 | 32.78% |
| src/combinator/core.rs | 3 | 49.33% |
| src/combinator/multi.rs | 6 | 68.35% |
| src/binary/mod.rs | 18 | 75.54% |
| src/ascii/mod.rs | 25 | 49.55% |
Totals:

  • Change from base Build 5448486419: -1.5%
  • Covered Lines: 1202
  • Relevant Lines: 2453

💛 - Coveralls

Comment on lines 39 to 53

```diff
 fn json_value<'i, E: ParseError<Stream<'i>> + ContextError<Stream<'i>, &'static str>>(
-    input: Stream<'i>,
-) -> IResult<Stream<'i>, JsonValue, E> {
+    input: &mut Stream<'i>,
+) -> PResult<JsonValue, E> {
     // `alt` combines each value parser. It returns the result of the first
     // successful parser, or an error
     alt((
-        unpeek(null).value(JsonValue::Null),
-        unpeek(boolean).map(JsonValue::Boolean),
-        unpeek(string).map(JsonValue::Str),
+        null.value(JsonValue::Null),
+        boolean.map(JsonValue::Boolean),
+        string.map(JsonValue::Str),
         float.map(JsonValue::Num),
-        unpeek(array).map(JsonValue::Array),
-        unpeek(object).map(JsonValue::Object),
+        array.map(JsonValue::Array),
+        object.map(JsonValue::Object),
     ))
-    .parse_peek(input)
+    .parse_next(input)
 }
```
@epage (Collaborator, Author):

This is an example of what a lot of the changes look like

(unpeek and parse_peek are approaches for making the old way work with the new API)
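To illustrate the bridging idea with self-contained types (this is a sketch, not winnow's actual definitions): model an old-style pure parser as a function from an owned stream to `(rest, output)`, and the new style as a function that mutates the stream in place; `unpeek` and `parse_peek` are then thin adapters in either direction.

```rust
// Hypothetical stand-ins for illustration; winnow's real traits differ.
type PResult<O> = Result<O, String>;

// A tiny old-style pure parser: split off the first char of a &str.
fn first_char(i: &str) -> Result<(&str, char), String> {
    let mut chars = i.chars();
    let c = chars.next().ok_or_else(|| "eof".to_string())?;
    Ok((chars.as_str(), c))
}

// `unpeek`-like adapter: run an old-style pure parser in the new
// mutate-in-place shape.
fn unpeek<I: Clone, O>(
    f: impl Fn(I) -> Result<(I, O), String>,
) -> impl FnMut(&mut I) -> PResult<O> {
    move |input: &mut I| {
        let (rest, out) = f(input.clone())?;
        *input = rest; // commit the remaining input on success
        Ok(out)
    }
}

// `parse_peek`-like adapter: drive a new-style parser with the old
// owned-input signature, returning (rest, output).
fn parse_peek<I, O>(
    mut f: impl FnMut(&mut I) -> PResult<O>,
    mut input: I,
) -> Result<(I, O), String> {
    let out = f(&mut input)?;
    Ok((input, out))
}

fn main() {
    // Old parser running through the new API: the stream advances in place.
    let mut stream = "abc";
    let c = unpeek(first_char)(&mut stream).unwrap();
    assert_eq!((c, stream), ('a', "bc"));

    // New-style parser driven through the old API: (rest, output) comes back.
    let (rest, c) = parse_peek(unpeek(first_char), "xyz").unwrap();
    assert_eq!((rest, c), ("yz", 'x'));
}
```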

Comment on lines 116 to 123
trace("cond", move |input: &mut I| {
if b {
f.parse_next(input).map(Some)
} else {
Ok(None)
}
})
}
@epage (Collaborator, Author):

In some ways, this makes things a lot cleaner

```diff
 //! }
 //!
-//! fn decimal(input: &str) -> IResult<&str, &str> {
+//! fn decimal<'s>(input: &mut &'s str) -> PResult<&'s str> {
```
@epage (Collaborator, Author):

This change can sometimes force explicit lifetimes

Comment on lines 241 to 245

```diff
     let f = "βèƒôřè\rÂßÇáƒƭèř";
     assert_eq!(
         not_line_ending.parse_peek(f),
-        Err(ErrMode::Backtrack(Error::new(f, ErrorKind::Tag)))
+        Err(ErrMode::Backtrack(Error::new(&f[12..], ErrorKind::Tag)))
```
@epage (Collaborator, Author):

This is an example of how error reporting changes with this. We now report errors (and generally leave the shared mutable stream) at the most specific location where the error occurred.

@martinohmann (Contributor) commented Jul 1, 2023:

I wonder if we could provide a wrapper that does this. I don't get where the offset of 12 comes from here, and other users probably wouldn't either.

@epage (Collaborator, Author) commented Jul 1, 2023:

I'd be curious what you have in mind for what this wrapper would be.

In this case, the offset of 12 is the \r character, which is what caused the error.
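The 12 is a byte offset, not a character index: each of the six characters in βèƒôřè encodes as two bytes in UTF-8, so the \r sits at byte 12. A plain-Rust check, independent of winnow:

```rust
fn main() {
    let f = "βèƒôřè\rÂßÇáƒƭèř";
    // Each of β è ƒ ô ř è is 2 bytes in UTF-8: 6 chars * 2 bytes = 12.
    assert_eq!("βèƒôřè".len(), 12);
    // So the error slice &f[12..] starts exactly at the offending '\r'.
    assert_eq!(f.find('\r'), Some(12));
    assert!(f[12..].starts_with('\r'));
}
```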

@martinohmann (Contributor):

Oh, I see it now, nevermind then. I should go to bed...

Comment on lines +150 to 155

```diff
+            $input.reset($start.clone());
             match $self.$it.parse_next($input) {
                 Err(ErrMode::Backtrack(e)) => {
                     let err = $err.or(e);
-                    succ!($it, alt_trait_inner!($self, $input, err, $($id)+))
+                    succ!($it, alt_trait_inner!($self, $input, $start, err, $($id)+))
                 }
```
@epage (Collaborator, Author):

One of the main changes is if a user needs to backtrack, instead of preserving the original input, they take a checkpoint and reset the stream to it.
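The checkpoint/reset pattern can be shown with a toy stream type (the names follow the PR; the types here are invented for illustration):

```rust
// A toy stream over a byte slice; a `Checkpoint` is just the saved position.
struct Stream<'i> {
    input: &'i [u8],
    pos: usize,
}

#[derive(Clone)]
struct Checkpoint(usize);

impl<'i> Stream<'i> {
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.pos)
    }
    fn reset(&mut self, start: Checkpoint) {
        self.pos = start.0;
    }
    fn next_token(&mut self) -> Option<u8> {
        let t = self.input.get(self.pos).copied();
        if t.is_some() {
            self.pos += 1;
        }
        t
    }
}

fn main() {
    let mut s = Stream { input: b"abc", pos: 0 };
    let start = s.checkpoint();
    assert_eq!(s.next_token(), Some(b'a')); // speculative parse consumes input...
    s.reset(start); // ...and backtracking rewinds it
    assert_eq!(s.next_token(), Some(b'a'));
}
```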

@martinohmann (Contributor):

I find the checkpoint/reset concept very elegant. Maybe checkpoint needs a different name; I instantly thought of Postgres WAL checkpoints. Not too far off, though. Maybe freeze could be a possible alternative?

@epage (Collaborator, Author):

Yeah, I didn't want to block the experiment by waiting to come up with a name, so I just took what combine named things.

I had also been considering snapshot, though save and bookmark could also work. state would overlap with Stateful.

For me, freeze carries the wrong connotation, that you are locking things down like in Python's frozenmap.

@epage (Collaborator, Author):

Also, I wasn't sure if I could make checkpoints as small as they are. I was surprised with what you can do with traits, like

  • 8b3a1bc (passing an associated type as generic parameter to a trait)
  • ade7912 (passing an associated type of the trait currently being defined to a generic parameter of a super trait)

The initialization-order aspects of those I find wild.

@martinohmann (Contributor):

I learnt something new today: I didn't know that you could use as in generic parameter position. 😃

@epage (Collaborator, Author):

winnow has pushed my knowledge of what is possible...

Comment on lines 842 to 860

```diff
         loop {
-            let i_ = input.clone();
+            let start = input.checkpoint();
             let len = input.eof_offset();
-            match f.parse_peek(i_) {
-                Ok((i, o)) => {
+            match f.parse_next(input) {
+                Ok(o) => {
                     // infinite loop check: the parser must always consume
-                    if i.eof_offset() == len {
-                        return Err(ErrMode::assert(i, "`repeat` parsers must always consume"));
+                    if input.eof_offset() == len {
+                        return Err(ErrMode::assert(
+                            input.clone(),
+                            "`repeat` parsers must always consume",
+                        ));
                     }
 
                     res = g(res, o);
-                    input = i;
                 }
                 Err(ErrMode::Backtrack(_)) => {
-                    return Ok((input, res));
+                    input.reset(start);
+                    return Ok(res);
                 }
                 Err(e) => {
                     return Err(e);
```
@epage (Collaborator, Author):

Similarly, for repeating parsers, we have to checkpoint before each parse attempt and then reset when the attempt ends in an error.

In general, this type of scheme means only a fraction of the code needs to do checkpointing. The alternative would be for every parser to perform a "transaction", and that seemed too invasive.
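That division of labor can be sketched with toy types (not winnow's): the repetition driver is the only place that checkpoints, while the inner parser freely mutates the stream and fails. For `&str`, a checkpoint is just a copy of the slice:

```rust
// Toy inner parser: consume one leading 'a' from a &str, mutating in place.
fn one_a(input: &mut &str) -> Result<char, ()> {
    match input.strip_prefix('a') {
        Some(rest) => {
            *input = rest;
            Ok('a')
        }
        None => Err(()),
    }
}

// repeat-style driver: checkpoint before each attempt, reset on backtrack.
fn repeat_a(input: &mut &str) -> Vec<char> {
    let mut out = Vec::new();
    loop {
        let start = *input; // checkpoint: for &str this is just a copy
        match one_a(input) {
            Ok(c) => out.push(c),
            Err(()) => {
                *input = start; // reset and stop repeating
                return out;
            }
        }
    }
}

fn main() {
    let mut s = "aab";
    assert_eq!(repeat_a(&mut s), vec!['a', 'a']);
    assert_eq!(s, "b"); // stream left at the first unmatched token
}
```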

@martinohmann (Contributor) commented Jun 30, 2023:

I think checkpoint/reset is better than the status quo, since it makes it more explicit when and where input rewinding happens. See the changes to the expr_inner function in parser/expr.rs in my hcl-edit branch. I think with the checkpointing it's clearer what's going on now.

@epage (Collaborator, Author) commented Jun 30, 2023:

@martinohmann I know you have limited time but I would appreciate feedback on this API change. The primary motivation is to reduce overhead when using Located but I'm still working on verifying that it helps with that case. At minimum, it doesn't seem to be hurting performance. If it doesn't pan out, I'm assuming having a pure-functional API would be preferred and I'd reject this PR.

@martinohmann (Contributor):

> @martinohmann I know you have limited time but I would appreciate feedback on this API change. The primary motivation is to reduce overhead when using Located but I'm still working on verifying that it helps with that case. At minimum, it doesn't seem to be hurting performance. If it doesn't pan out, I'm assuming having a pure-functional API would be preferred and I'd reject this PR.

I had an hour of spare time and compiled an hcl-edit branch which uses the changes in this PR to see how this goes. I have to say that the migration was fairly easy, and I was able to do the majority of the necessary adjustments via search & replace (mostly changing Input<'a> to &mut Input<'a> and replacing IResult with PResult). I didn't look at whether there are new opportunities for performance optimizations or code simplifications yet; I just tried to make it work.

The branch is here: martinohmann/hcl-rs@main...hcl-edit/winnow-stream-mutation

Benchmark results also look nice so far. I'm seeing ~15% performance improvements with this.

Current main:

```text
parse/hcl-edit/deeply_nested.tf
                        time:   [18.422 µs 18.459 µs 18.511 µs]
                        thrpt:  [38.176 MiB/s 38.283 MiB/s 38.360 MiB/s]
parse/hcl-edit/large.tf time:   [1.8049 ms 1.8123 ms 1.8230 ms]
                        thrpt:  [44.719 MiB/s 44.983 MiB/s 45.166 MiB/s]
parse/hcl-edit/medium.tf
                        time:   [414.88 µs 415.94 µs 417.18 µs]
                        thrpt:  [34.418 MiB/s 34.521 MiB/s 34.609 MiB/s]
parse/hcl-edit/small.tf time:   [24.656 µs 24.675 µs 24.698 µs]
                        thrpt:  [38.342 MiB/s 38.379 MiB/s 38.409 MiB/s]
```

The hcl-edit/winnow-stream-mutation branch:

```text
parse/hcl-edit/deeply_nested.tf
                        time:   [15.302 µs 15.306 µs 15.312 µs]
                        thrpt:  [46.153 MiB/s 46.169 MiB/s 46.182 MiB/s]
                 change:
                        time:   [-17.253% -16.961% -16.692%] (p = 0.00 < 0.05)
                        thrpt:  [+20.037% +20.425% +20.850%]
                        Performance has improved.
parse/hcl-edit/large.tf time:   [1.5211 ms 1.5299 ms 1.5418 ms]
                        thrpt:  [52.876 MiB/s 53.287 MiB/s 53.594 MiB/s]
                 change:
                        time:   [-16.262% -15.584% -14.790%] (p = 0.00 < 0.05)
                        thrpt:  [+17.357% +18.461% +19.420%]
                        Performance has improved.
parse/hcl-edit/medium.tf
                        time:   [355.48 µs 356.44 µs 357.55 µs]
                        thrpt:  [40.158 MiB/s 40.283 MiB/s 40.392 MiB/s]
                 change:
                        time:   [-14.652% -14.305% -13.994%] (p = 0.00 < 0.05)
                        thrpt:  [+16.270% +16.693% +17.168%]
                        Performance has improved.
parse/hcl-edit/small.tf time:   [20.405 µs 20.481 µs 20.590 µs]
                        thrpt:  [45.992 MiB/s 46.239 MiB/s 46.410 MiB/s]
                 change:
                        time:   [-17.433% -16.925% -16.355%] (p = 0.00 < 0.05)
                        thrpt:  [+19.552% +20.373% +21.113%]
                        Performance has improved.
```

@epage (Collaborator, Author) commented Jun 30, 2023:

Thanks for looking into that!

I might build off of that to see if there is further room for improvement. If this is about return types, your error type is likely the remaining limiting factor. We could instead store a Stream::Checkpoint inside of it and store the cause in Context, which should shrink ParseError from being 9-10 pointers wide to 5 pointers wide. I doubt it's needed, but storing the context in an Option<Box<Vec<Context>>> might save another pointer-width without much or any performance impact. Likely even the Checkpoint could be dropped, because the Stream is left pointing to where the error is when everything is done.
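The pointer-width arithmetic here can be checked in isolation: a `Vec` is three words inline (pointer, capacity, length), while `Option<Box<Vec<_>>>` collapses to a single word thanks to the non-null niche of `Box`, at the cost of an allocation on the error path. A quick check on a typical 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // Vec<T> is (ptr, cap, len): three words inline.
    assert_eq!(size_of::<Vec<u32>>(), 3 * word);
    // Box is one word, and Option<Box<_>> uses the null niche: still one word.
    assert_eq!(size_of::<Option<Box<Vec<u32>>>>(), word);
}
```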

The standard approach of just boxing all of the error wouldn't work well because that would make backtracking more expensive.

@martinohmann (Contributor):

Yeah, that's a good idea. I'll try that out on my branch the next time I have a spare hour and report back.

On another note: the excessive explicit lifetime annotations I added are just a result of how I defined my own PResult for now, because I was being lazy. They are not really a side effect of your changes (except in the cases that you pointed out further up).

@epage (Collaborator, Author) commented Jun 30, 2023:

I think the cause change slowed things down: https://github.com/epage/hcl-rs/tree/hcl-edt/winnow-error-cause

I think the input change was neutral on performance: https://github.com/epage/hcl-rs/tree/hcl-edt/winnow-error-input

@epage (Collaborator, Author) commented Jun 30, 2023:

As for reproducing the performance gains seen from this PR, I ran

```console
$ cargo bench -p benchmarks --bench parse -- parse/hcl-edit/large.tf
```

  • Before: results ranged from 1.18 ms - 1.30 ms
  • After: results ranged from 1.04 ms - 1.14 ms

(Yeah, my machine is jittery, so I do about 4-5 runs of criterion.)

@martinohmann (Contributor) commented Jul 1, 2023:

Another thing that might be useful to simplify the migration: I had to replace uses of Offset::offset_to with Offset::offset_from in some places. Maybe don't remove offset_to but rather provide a default impl which uses offset_from under the hood.

@epage (Collaborator, Author) commented Jul 3, 2023:

> Another thing that might be useful to simplify the migration: I had to replace uses of Offset::offset_to with Offset::offset_from in some places. Maybe don't remove offset_to but rather provide a default impl which uses offset_from under the hood.

Good call. I'll be evaluating the migration more when we are getting ready for release. In that case, I might make changes to the previous release to help.
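The suggested shim is ordinary trait machinery: keep the old method as a default implementation that delegates to the new one. A sketch with an invented `Offset` trait over `&str` (winnow's real trait is shaped differently, so treat this as the idea only):

```rust
trait Offset {
    /// New name: how far `self` has advanced past `start`.
    fn offset_from(&self, start: &Self) -> usize;

    /// Old name, kept as a default impl so downstream code keeps compiling.
    fn offset_to(&self, end: &Self) -> usize {
        end.offset_from(self)
    }
}

impl Offset for &str {
    fn offset_from(&self, start: &Self) -> usize {
        // assumes `self` is a suffix of `start`
        start.len() - self.len()
    }
}

fn main() {
    let whole = "hello world";
    let rest = &whole[6..];
    assert_eq!(rest.offset_from(&whole), 6);
    assert_eq!(whole.offset_to(&rest), 6); // old call sites still work
}
```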

@epage (Collaborator, Author) commented Jul 6, 2023:

I've since split this out into individual PRs

I also just released 0.4.8, which includes Parser::parse_peek, Stream::peek_token, Stream::peek_slice, and Offset::offset_from. These wrap existing functionality but make it easier to prep for the upgrade within the 0.4 branch.

@martinohmann (Contributor):

This is great! Looking forward to v0.5. I'm also curious to see whether it enables further performance improvements beyond what we already saw. Also, maybe the workaround described in https://docs.rs/winnow/latest/winnow/_topic/performance/index.html#built-time-performance to reduce build times isn't really needed anymore in some cases, given that the overall signature of FnMut parsers is smaller now, which might simplify the generated types. At least for me, that would allow removing a substantial amount of boilerplate from my parser code. But maybe I'm completely off here.

@epage (Collaborator, Author) commented Jul 6, 2023:

> overall signature of FnMut parsers is smaller now

I'm assuming it's not enough of a difference in the signature, but we'll see.

However, if you use trace to help debugging then you are always doing it anyways...

@epage epage closed this Jul 10, 2023
@epage epage deleted the mut branch July 10, 2023 15:16
Successfully merging this pull request may close these issues.

Parsing takes 30% longer just by adding Located to the input
3 participants