
Parsing takes 30% longer just by adding Located to the input #72

Closed
epage opened this issue Jan 25, 2023 · 4 comments
Labels
A-input Area: input / parsing state C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Comments


epage commented Jan 25, 2023

When evaluating location / span solutions and checking their performance, I found:

  • Located<I> takes about a 30% performance hit over I without actually capturing spans
  • pori's Located took the json benchmark from 1.1126 µs to 1.4623 µs
  • nom_locate's Located took the json benchmark from 1.4362 µs (pori) to 1.9520 µs
  • Stateful<I, ()> has negligible performance impact
    • Stateful with usize, (usize, usize), and (usize, usize, usize) mirrors the Located performance and continues to get slower from there.

See also


epage commented Jan 26, 2023

combine doesn't have the parser in the return type but uses &mut I. It then saves and restores the position. It seems like this would get us to the Stateful<I, ()> level of performance.

Geal tried that but felt it was brittle. Unfortunately, I can't find where we talked about that.

pom takes the approach of parser inputs being a (&str, usize), with the return type including a usize, so you are constantly indexing into the input and have to pass the original input around. I'd be worried about the cost of that indexing / bounds checking.

@settings settings bot removed the enhancement label Jan 26, 2023
@epage epage added C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change. A-input Area: input / parsing state labels Jan 26, 2023
@epage epage pinned this issue Jan 27, 2023
@epage epage added this to the 0.2 milestone Jan 30, 2023

epage commented Feb 22, 2023

I'm assuming the combine approach of &mut I is the best one. The biggest problem is correctly reverting the input on errors.
While combine gets away with this design, one reason it likely can is that it is more of a framework model: users generally don't hand-implement parsers, so the framework can handle all the hard parts while users just describe the parser they want.

Something we can do to make this easier for parser writers: the Parser::parse_next impl for FnMut could wrap functions in a Stream transaction that does this for users.

Probably my biggest concern is losing out on the beauty of the pure-function model. I'm especially concerned with how users of the library will react.


epage commented Mar 6, 2023

What if the API guarantee for &mut I was not that it rolls back on error but that it instead points to the error location?

This means only combinators like opt, alt, and many* need to handle transactions. We might want to offer adapters for the functional style for when people write non-declarative parsers, as that will still be the easiest.

This also means that errors won't need to track the input anymore unless you are doing an error tree.


epage commented Jul 6, 2023

There was no single PR that "closed" this. The main PRs are

There is some remaining documentation work: a lot of the parser doc comments still use the old-style function signature. They just get called directly, so no unpeeks were needed.

PRs that serve as migration examples include

For the common cases, this appears to be performance-neutral or a slight improvement (despite the double indirection):

v0.4.7

json/basic/canada       time:   [10.272 ms 10.308 ms 10.347 ms]
json/verbose/canada     time:   [32.539 ms 32.714 ms 32.893 ms]
json/dispatch/canada    time:   [8.9747 ms 8.9926 ms 9.0118 ms]
json/streaming/canada   time:   [10.612 ms 10.667 ms 10.726 ms]

#279

json/basic/canada       time:   [9.7645 ms 9.7882 ms 9.8147 ms]
json/unit/canada        time:   [9.1609 ms 9.1819 ms 9.2036 ms]
json/verbose/canada     time:   [30.859 ms 30.906 ms 30.957 ms]
json/dispatch/canada    time:   [8.5941 ms 8.6272 ms 8.6656 ms]
json/streaming/canada   time:   [10.044 ms 10.108 ms 10.172 ms]

In #268, we confirmed that for parsers like hcl-edit, there was a consistent 10% performance gain. My best guess is that the return value of some parsers was large enough that it was passed on the stack instead of in registers (though I would have thought that threshold was smaller, which would have made the basic benchmarks hit it as well).

The main risk was that we were going to lose the "core" of what made nom, and what makes winnow: straightforward, easy-to-write code. The pure-functional nature of the old API seemed to make this trivial. With the way the API contract is set up now, the code feels just as simple, or simpler. We did check with users of winnow in #268, VorpalBlade/chezmoi_modify_manager#20, metent/uair#11, organize-rs/organize#2, and resistor/rsrc-rs#1 to see if there were concerns over the new API; there weren't, only interest in making the upgrade path smooth.
