feat(input): Add span support #61

epage · 2022-12-27T18:46:13Z

There are two main crates that provide location support for nom

nom-locate: wide adoption for long time
pori: more recent, only really being used by one application

While pori is less mature, I think it iterates quite well on nom-locate; so let's break this down.

How to track location

pori: tracks how much input has advanced
- nom-locate: same as above but also tracks the line number
leave initial input unchanged and only change begin/end indices, doing unchecked slicing of the input to get the current input
- Note: this is fairly similar to pom whose input is the Input + location
keep initial and current input and use offset
keep start pointer from initial and current input

For now, I'm going with (initial, input) because it is relatively easy to implement and the bookkeeping costs are paid when doing location lookups. The downside is it takes up more space, especially if Located isn't the inner-most wrapper type. We could reduce this somewhat by using <I as IntoOutput>::Output for initial. If we make more assumptions on inner types, we could just capture the start pointer and do math with that when needed.

The problem with nom-locate tracking line numbers is it pessimizes performance. Knowing the line number is more of a UI concern and should be deferred until there (if the UI will even ask for it).

How to capture a span

pori combinators that returns (usize, usize)
nom-locate use the wrapper type
- This makes any recognize, take, etc capture a span which is convenient

I was originally attracted to nom-locates approach but the downside is that this makes these types more awkward to work with when not capturing a span. I decided to instead use Parser::span and Parser::with_span combinators, similar to recognize and with_recognized. poris combinators returned (usize, usize), meaning (start, length) which could be confused with (start, end) with "end" being inclusive or not. To make this clearer, I used Range<usize>. Now how it works is embedded in the type system.

While the approach in this commit is a new design, it iterates on previous designs with an effort to simplify them down to their most basic parts. I feel fairly confident that this is something we can commit API stability to.

There are two main crates that provide location support for nom - `nom-locate`: wide adoption for long time - `pori`: more recent, only really being used by one application While `pori` is less mature, I think it iterates quite well on `nom-locate`; so let's break this down. How to track location - `pori`: tracks how much `input` has advanced - `nom-locate`: same as above but also tracks the line number - leave initial input unchanged and only change begin/end indices, doing unchecked slicing of the input to get the current input - keep initial and current input and use offset - keep start pointer from initial and current input For now, I'm going with (initial, input) because it is relatively easy to implement and the bookkeeping costs are paid when doing location lookups. The downside is it takes up more space, especially if `Located` isn't the inner-most wrapper type. We could reduce this somewhat by using `<I as IntoOutput>::Output` for `initial`. If we make more assumptions on inner types, we could just capture the start pointer and do math with that when needed. The problem with nom-locate tracking line numbers is it pessimizes performance. Knowing the line number is more of a UI concern and should be deferred until there (if the UI will even ask for it). How to capture a span - `pori` combinators that returns `(usize, usize)` - `nom-locate` use the wrapper type - This makes any `recognize`, `take`, etc capture a span which is convenient I was originally attracted to `nom-locate`s approach but the downside is that this makes these types more awkward to work with when **not** capturing a span. I decided to instead use `Parser::span` and `Parser::with_span` combinators, similar to `recognize` and `with_recognized`. `pori`s combinators returned `(usize, usize)`, meaning (start, length) which could be confused with (start, end) with "end" being inclusive or not. To make this clearer, I used `Range<usize>`. Now how it works is embedded in the type system. While the approach in this commit is a new design, it iterates on previous designs with an effort to simplify them down to their most basic parts. I feel fairly confident that this is something we can commit API stability to.

coveralls · 2022-12-27T18:48:07Z

Pull Request Test Coverage Report for Build 3789359607

0 of 112 (0.0%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-2.01%) to 56.221%

Changes Missing Coverage	Changed/Added Lines	%
src/parser.rs	2	0.0%
src/combinator/mod.rs	14	0.0%
src/input.rs	96	0.0%

Files with Coverage Reduction	New Missed Lines	%
src/character/complete.rs	1	79.58%

Totals
Change from base Build 3789169739:	-2.01%
Covered Lines:	1821
Relevant Lines:	3239

💛 - Coveralls

epage · 2022-12-30T17:54:26Z

Stateful<I, ()> has negligible performance impact.

Located<I> takes about a 30% parsing hit

epage · 2022-12-30T18:14:16Z

I just implemented pori style Located and it seems be 30% slower than what is implemented (#63)

I also tested Stateful with usize, (usize, usize), and (usize, usize, usize) and it mirrored the Located<I> performance and then continued to get slower from there.

epage · 2022-12-30T18:27:56Z

#64 shows that nom-locate is even slower.

epage · 2022-12-30T18:29:07Z

I suspect the only way to capture back some of the performance is if we switch from the input being returned to it being a &mut Input parameter. This would reduce the size of the return type which I suspect is the biggest slow down.

epage · 2022-12-30T18:32:37Z

Interesting, pom takes the approach of Parser inpuits being a (&str, usize) and the return type includes usize, so you are constantly indexing into the input but have to pass the original input around.

epage merged commit fb3a8a1 into main Dec 27, 2022

epage deleted the locate branch December 27, 2022 18:49

epage mentioned this pull request Dec 27, 2022

WIP: perf(input): Store minimal initial data for Located #62

Closed

epage mentioned this pull request Dec 30, 2022

blog/nom-error-recovery/ ebkalderon/ebkalderon.github.io#3

Open

epage mentioned this pull request Jan 25, 2023

Parsing takes 30% longer just by adding Located to the input #72

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(input): Add span support #61

feat(input): Add span support #61

epage commented Dec 27, 2022 •

edited

Loading

coveralls commented Dec 27, 2022

epage commented Dec 30, 2022

epage commented Dec 30, 2022 •

edited

Loading

epage commented Dec 30, 2022

epage commented Dec 30, 2022

epage commented Dec 30, 2022

feat(input): Add span support #61

feat(input): Add span support #61

Conversation

epage commented Dec 27, 2022 • edited Loading

coveralls commented Dec 27, 2022

Pull Request Test Coverage Report for Build 3789359607

💛 - Coveralls

epage commented Dec 30, 2022

epage commented Dec 30, 2022 • edited Loading

epage commented Dec 30, 2022

epage commented Dec 30, 2022

epage commented Dec 30, 2022

epage commented Dec 27, 2022 •

edited

Loading

epage commented Dec 30, 2022 •

edited

Loading