Reduce the amount of generic code for ParallelExtend #887

cuviper · 2021-10-07T19:08:29Z

For unindexed parallel iterators, we've implemented ParallelExtend
for most collections using an intermediate LinkedList<Vec<T>> like:

    par_iter
        .into_par_iter()
        .fold(Vec::new, vec_push)
        .map(as_list)
        .reduce(LinkedList::new, list_append)

However, this introduces Fold, Map, and Reduce types that are all
dependent on the input iterator type. When it comes to very complicated
cases like nested tuple unzips, this can add up quickly. For example, in
rust-lang/rust#68926 an 8-way unzip leads to 3.7GB of LLVM IR, with
lines up to 67K characters in long generic types.
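
For readers unfamiliar with the chain above, here is a minimal std-only sketch of what the three helpers might look like. The helper names come from the snippet, but their bodies are assumptions, and `collect_chunks` is a hypothetical sequential stand-in in which each slice chunk plays the role of one parallel split. It is not rayon's actual code.

```rust
use std::collections::LinkedList;

// Fold step: push one item onto the vector owned by the current split.
fn vec_push<T>(mut vec: Vec<T>, item: T) -> Vec<T> {
    vec.push(item);
    vec
}

// Map step: wrap a finished vector in a single-node list.
fn as_list<T>(vec: Vec<T>) -> LinkedList<Vec<T>> {
    let mut list = LinkedList::new();
    list.push_back(vec);
    list
}

// Reduce step: splice the right list onto the left one without copying items.
fn list_append<T>(
    mut left: LinkedList<Vec<T>>,
    mut right: LinkedList<Vec<T>>,
) -> LinkedList<Vec<T>> {
    left.append(&mut right);
    left
}

// Hypothetical sequential stand-in for the parallel chain.
// `chunk_size` must be non-zero, because `slice::chunks` panics on 0.
fn collect_chunks<T: Clone>(items: &[T], chunk_size: usize) -> LinkedList<Vec<T>> {
    items
        .chunks(chunk_size)
        .map(|chunk| chunk.iter().cloned().fold(Vec::new(), vec_push))
        .map(as_list)
        .fold(LinkedList::new(), list_append)
}
```

In the real parallel version, every adaptor in the chain is a type parameterized by the iterator it wraps, which is where the long generic type names come from.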

Now we add a new ListVecConsumer that is not generic at all itself,
and implements Consumer<T> etc. generic only on the item type. So each
collection now gets the same LinkedList<Vec<T>> as before with:

    par_iter.into_par_iter().drive_unindexed(ListVecConsumer);
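
As a rough illustration of a consumer that is "not generic at all itself", the sketch below uses a unit struct whose methods are generic only on the item type. The names `ListVecSketch` and `VecFolderSketch` are hypothetical, the inherent methods only loosely mirror the roles of rayon's `Consumer`, `Folder`, and `Reducer` traits, and skipping empty vectors in `complete` is an assumption made for this sketch.

```rust
use std::collections::LinkedList;

// Unit struct with no type parameters: it never mentions the input iterator type.
struct ListVecSketch;

// Per-split state, generic only on the item type T.
struct VecFolderSketch<T> {
    vec: Vec<T>,
}

impl ListVecSketch {
    // Start collecting items for one split.
    fn into_folder<T>(self) -> VecFolderSketch<T> {
        VecFolderSketch { vec: Vec::new() }
    }

    // Combine the results of two splits by splicing the lists together.
    fn reduce<T>(
        &self,
        mut left: LinkedList<Vec<T>>,
        mut right: LinkedList<Vec<T>>,
    ) -> LinkedList<Vec<T>> {
        left.append(&mut right);
        left
    }
}

impl<T> VecFolderSketch<T> {
    // Take one item from the producer.
    fn consume(mut self, item: T) -> Self {
        self.vec.push(item);
        self
    }

    // Finish the split; an empty vector yields an empty list (assumption).
    fn complete(self) -> LinkedList<Vec<T>> {
        let mut list = LinkedList::new();
        if !self.vec.is_empty() {
            list.push_back(self.vec);
        }
        list
    }
}
```

Because nothing here names the iterator type, the compiler should only need one copy of this code per item type, no matter how deeply nested the input iterator is.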

Each implementation now also moves the code that doesn't need to be
iterator-specific into a separate function, covering the reserve and
the final extend from the list data.
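
The split between iterator-specific and item-specific code could be sketched as follows for `Vec<T>`. This is a std-only approximation under stated assumptions: `extend_vec` is a hypothetical wrapper, and its sequential `collect` merely stands in for the `drive_unindexed(ListVecConsumer)` call.

```rust
use std::collections::LinkedList;

// Generic only on T: instantiated once per item type,
// regardless of how many iterator types end up calling it.
fn vec_append<T>(vec: &mut Vec<T>, list: LinkedList<Vec<T>>) {
    // Reserve once for the combined length of all chunks.
    let total: usize = list.iter().map(Vec::len).sum();
    vec.reserve(total);
    // Move each chunk's items onto the end of the target vector.
    for mut chunk in list {
        vec.append(&mut chunk);
    }
}

// Thin wrapper generic on the iterator type I: only the collection step
// depends on I, and everything else is delegated to `vec_append`.
fn extend_vec<T, I>(vec: &mut Vec<T>, iter: I)
where
    I: IntoIterator<Item = T>,
{
    // Sequential stand-in for driving the parallel iterator (assumption).
    let mut list = LinkedList::new();
    list.push_back(iter.into_iter().collect::<Vec<T>>());
    vec_append(vec, list);
}
```

Keeping the wrapper this small means each new iterator type instantiates little more than the call into the shared function.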

That 8-way unzip is now only 1.5GB with lines up to 17K characters.
Compile time drops from 12.8s to 7.7s debug, 32.1s to 26.9s release.


cuviper · 2022-04-01T16:44:20Z

bors r+

bors · 2022-04-01T16:59:11Z

Build succeeded:

cuviper force-pushed the extend-diet branch from b226b13 to 7d2444a Compare October 7, 2021 19:10

cuviper mentioned this pull request Oct 7, 2021: Slow compilation rust-lang/rust#68926 (Open)

bors bot merged commit 5298d6a into rayon-rs:master Apr 1, 2022

cuviper deleted the extend-diet branch February 25, 2023 17:58
