Replace error `Vec` with a linked list of slab `Vec`s #75

domenicquirl · 2022-01-16T20:22:01Z

You mentioned that you wanted to take some stabs at performance. While I am not familiar enough with the codebase to suggest any structural changes (and sadly don't have the time to change that, as interesting as chumsky is), I had a look at the benchmarks and also perfed them. The flamegraph has the usual problem of recursive and/or deeply layered parsing function that any parser expectedly consists mostly of its sub-parsers and it's hard to make out problematic areas. One part that showed up repeatedly was Vec::push and Vec::append.

If I saw correctly, some parsers like Seq use a Vec as output as well, but on top-level I noticed all of the parsing errors are returned in a Vec, so many of the parsers copy them from outputs of their constituents into an aggregated result Vec. I replaced that with a wrapper around LinkedList<Vec<T>> to obtain the O(1) characteristics of its append. It doesn't have everything that Vec has (in particular, no methods to remove elements), but everything that you use in chumsky plus iter and into_iter, so it's possible to get back to a Vec after parsing . On the Linux machine I tested this, making this change through all the parsers and also the recovery Strategys made about a 10% difference. This depends heavily on the presence of errors, obviously.

Notables:

Breaking: the container type is exposed through PResult
Right now this PR increases the MSRV to current stable. That's only because I used unwrap_unchecked though, so if you want to support earlier versions that's an easy change
I added tests for the data structure, and also documented it briefly. I put the thing in its own new module, which is missing the quote in its module comment because I didn't feel qualified ^^

zesterer · 2022-02-13T12:47:33Z

Hey, thanks for this! I'm probably not going to merge this now because I'm working on a pretty big overhaul in #82, but I'll definitely be using this PR as a reference!

domenicquirl · 2022-02-13T14:57:02Z

Very welcome! I somewhat expected that after hearing about the (very cool!) zero-copy rewrite. If you may be looking at this later, let me quickly note down three results of trying out some different structures and perfing them:

in general, obviously the less code the better and in particular the less additional branching the better
that said, one of the most impactful things regarding performance was to make sure to keep the "no allocations until some element is actually added" that all of the std data structures have (in particular the vec and list used). Pre-allocation basically always yielded worse results, presumably due to the fact that - barring highly malformed input - most parser invocations will not return any errors, so you don't want to add unnecessary allocation work to all these cases. Having an emptiness check on insertion therefore consistently performs better than always having space available.
I did experiment with having a separate Vec for individual elements instead of pushing them onto the last current buffer in the list. In theory, this makes the data structure more independent from how big Vecs being pushed on are - if you always make the separate Vec have the same capacity that the one before ended up needing, that probably re-allocates less than potentially re-sizing every time because you add elements to incoming Vecs. In practice, there probably again aren't enough errors for that to be worth it (since iirc the first allocation of a Vec over-allocates, so if there are only a few elements in there there is probably space left over). At the very least with the additional logic required to do this it turned out to not be much of an improvement in the benches.

Do I remember correctly that there was also someone mentioning the possibility of chumsky being generic over these data structures? If so, my flat list could just be one of these probably quite easily.

zesterer · 2022-02-13T23:23:50Z

Thanks, this is very valuable information!

zesterer · 2023-03-08T10:07:04Z

I'm going to close this because it's now out of date. However, we did end up using a very similar approach (and more besides), which was a big factor on the rewrite's very competitive performance. Thanks for the ideas!

replace error Vec with a list of slabs

5566766

zesterer force-pushed the master branch from 93608d3 to f8d2a77 Compare February 7, 2022 14:27

zesterer force-pushed the master branch from 7f23a42 to 3556618 Compare September 30, 2022 19:58

zesterer closed this Mar 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace error `Vec` with a linked list of slab `Vec`s #75

Replace error `Vec` with a linked list of slab `Vec`s #75

domenicquirl commented Jan 16, 2022

zesterer commented Feb 13, 2022

domenicquirl commented Feb 13, 2022

zesterer commented Feb 13, 2022

zesterer commented Mar 8, 2023

Replace error Vec with a linked list of slab Vecs #75

Replace error Vec with a linked list of slab Vecs #75

Conversation

domenicquirl commented Jan 16, 2022

zesterer commented Feb 13, 2022

domenicquirl commented Feb 13, 2022

zesterer commented Feb 13, 2022

zesterer commented Mar 8, 2023

Replace error `Vec` with a linked list of slab `Vec`s #75

Replace error `Vec` with a linked list of slab `Vec`s #75