
Parsing takes 30% longer just by adding Located to the input #72

Closed
epage opened this issue Jan 25, 2023 · 4 comments
Labels
A-input Area: input / parsing state C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Comments


epage commented Jan 25, 2023

When evaluating location / span solutions and checking their performance, I found:

  • Located<I> takes about a 30% performance hit over I without actually capturing spans
  • pori's Located took the json benchmark from 1.1126 µs to 1.4623 µs
  • nom_locate's Located took the json benchmark from 1.4362 µs (pori) to 1.9520 µs
  • Stateful<I, ()> has negligible performance impact
    • Stateful with usize, (usize, usize), and (usize, usize, usize) mirrors the Located performance and continues to get slower from there.

See also


epage commented Jan 26, 2023

combine doesn't have the parser in the return type but uses &mut I. It then saves and restores the position. It seems like this would get us to the Stateful<I, ()> level of performance.

Geal tried that but felt it was brittle. Unfortunately, I can't find where we talked about that.

pom takes the approach of parser inputs being a (&str, usize), with the return type including a usize, so you are constantly indexing into the input and have to pass the original input around. I'd be worried about the cost of that indexing / bounds checking.

@settings settings bot removed the enhancement label Jan 26, 2023
@epage epage added C-enhancement Category: Raise on the bar on expectations M-breaking-change Meta: Implementing or merging this will introduce a breaking change. A-input Area: input / parsing state labels Jan 26, 2023
@epage epage pinned this issue Jan 27, 2023
@epage epage added this to the 0.2 milestone Jan 30, 2023

epage commented Feb 22, 2023

I'm assuming the combine approach of &mut I is the best one. The biggest problem is correctly reverting the input on errors.
While combine gets away with this design, one reason it likely can is that it is more of a framework model: users generally don't hand-implement parsers, so the framework can handle all the hard parts while users just describe the parser they want.

Something we can do to make this easier for parser writers: the Parser::parse_next impl for FnMut could wrap functions in a Stream transaction that does this for users.

Probably my biggest concern is losing out on the beauty of the pure-function model. I'm especially concerned with how users of the library will react.


epage commented Mar 6, 2023

What if the API guarantee for &mut I was not that it rolls back on error but that it instead points to the error location?

This means only combinators like opt, alt, and many* need to handle transactions. We might want to offer adapters for the functional style for when people write non-declarative parsers, as that will still be the easiest.

This also means that errors won't need to track the input anymore unless you are doing an error tree.


epage commented Jul 6, 2023

There was no single PR that "closed" this. The main PRs are

There is some remaining documentation work: a lot of the parser doc comments still use the old-style function signature. They just get called directly, so no unpeeks were needed.

PRs that serve as migration examples include

For the common cases, this appears to be performance-neutral or a slight improvement (despite the double indirection):

v0.4.7

json/basic/canada       time:   [10.272 ms 10.308 ms 10.347 ms]
json/verbose/canada     time:   [32.539 ms 32.714 ms 32.893 ms]
json/dispatch/canada    time:   [8.9747 ms 8.9926 ms 9.0118 ms]
json/streaming/canada   time:   [10.612 ms 10.667 ms 10.726 ms]

#279

json/basic/canada       time:   [9.7645 ms 9.7882 ms 9.8147 ms]
json/unit/canada        time:   [9.1609 ms 9.1819 ms 9.2036 ms]
json/verbose/canada     time:   [30.859 ms 30.906 ms 30.957 ms]
json/dispatch/canada    time:   [8.5941 ms 8.6272 ms 8.6656 ms]
json/streaming/canada   time:   [10.044 ms 10.108 ms 10.172 ms]

In #268, we confirmed that for parsers like hcl-edit, there was a consistent 10% performance gain. My best guess is that the return value of some parsers was large enough that it was passed on the stack instead of in registers (though I would have thought that threshold was smaller, which would have made the basic benchmarks hit it as well).

The main risk was that we were going to lose the "core" of what made nom, and what makes winnow: straightforward, easy-to-write code. The pure-functional nature of the old API seemed to make this trivial. With the way the API contract is set up now, the code feels just as simple, or simpler. We did check with users of winnow in #268, VorpalBlade/chezmoi_modify_manager#20, metent/uair#11, organize-rs/organize#2, and resistor/rsrc-rs#1 to see if there were concerns over the new API; there weren't, only interest in making the upgrade path smooth.
