Reduce the amount of generic code for ParallelExtend #887

Merged 1 commit on Apr 1, 2022

@cuviper cuviper commented Oct 7, 2021

For unindexed parallel iterators, we've implemented `ParallelExtend`
for most collections using an intermediate `LinkedList<Vec<T>>` like:

```rust
par_iter
    .into_par_iter()
    .fold(Vec::new, vec_push)
    .map(as_list)
    .reduce(LinkedList::new, list_append)
```
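
To make the data flow concrete, here is a minimal sequential sketch of the same fold/map/reduce shape using only the standard library. The helper bodies are assumptions written to match the names in the snippet above, not the crate's own code, and `collect_chunks` is a hypothetical stand-in in which each slice chunk plays the role of one parallel split.

```rust
use std::collections::LinkedList;

// Hypothetical helper: push one item and hand the vector back.
fn vec_push<T>(mut vec: Vec<T>, item: T) -> Vec<T> {
    vec.push(item);
    vec
}

// Hypothetical helper: wrap one vector as a single-node list.
fn as_list<T>(vec: Vec<T>) -> LinkedList<Vec<T>> {
    let mut list = LinkedList::new();
    list.push_back(vec);
    list
}

// Hypothetical helper: concatenate two lists without copying the vectors.
fn list_append<T>(
    mut left: LinkedList<Vec<T>>,
    mut right: LinkedList<Vec<T>>,
) -> LinkedList<Vec<T>> {
    left.append(&mut right);
    left
}

// Sequential stand-in for the parallel pipeline; `chunk_size` must be non-zero.
fn collect_chunks(items: &[u32], chunk_size: usize) -> LinkedList<Vec<u32>> {
    items
        .chunks(chunk_size)
        // "fold": each chunk accumulates into its own Vec
        .map(|chunk| chunk.iter().copied().fold(Vec::new(), vec_push))
        // "map": each Vec becomes a one-node list
        .map(as_list)
        // "reduce": the lists are spliced together in order
        .fold(LinkedList::new(), list_append)
}

fn main() {
    let list = collect_chunks(&[1, 2, 3, 4, 5], 2);
    let lens: Vec<usize> = list.iter().map(Vec::len).collect();
    println!("{:?}", lens); // prints [2, 2, 1]
}
```

The list nodes keep the per-chunk vectors intact, so joining two partial results is a cheap list splice rather than a copy.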

However, this introduces `Fold`, `Map`, and `Reduce` types that are all
dependent on the input iterator type. When it comes to very complicated
cases like nested tuple unzips, this can add up quickly. For example, in
rust-lang/rust#68926 an 8-way unzip leads to 3.7GB of LLVM IR, with
lines up to 67K characters in long generic types.
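
As a rough illustration of how adaptor types accumulate, the sketch below uses standard-library iterators (not rayon's) and `std::any::type_name` to compare the type-name length of a base iterator with one and two adaptor layers on top. The function names are hypothetical, and `type_name` output is a best-effort diagnostic string, so treat the numbers as indicative only.

```rust
use std::any::type_name;

// Returns the compiler's diagnostic name for the type of `value`.
fn type_name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

// Type-name lengths for a base iterator, then with one and two adaptors.
fn adaptor_name_lengths() -> (usize, usize, usize) {
    let base = 0u32..10;
    let base_len = type_name_of(&base).len();

    // Each adaptor wraps the previous type, so its name embeds the inner one.
    let one = base.map(|x| x + 1);
    let one_len = type_name_of(&one).len();

    let two = one.filter(|x| x % 2 == 0);
    let two_len = type_name_of(&two).len();

    (base_len, one_len, two_len)
}

fn main() {
    let (base, one, two) = adaptor_name_lengths();
    println!("base: {base_len}, one adaptor: {one_len}, two adaptors: {two_len}",
        base_len = base, one_len = one, two_len = two);
}
```

Every extra layer repeats the full inner type, which is the same effect that makes deeply nested unzips so expensive.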

Now we add a new `ListVecConsumer` that is not generic at all itself,
and implements `Consumer<T>` etc. generic only on the item type. So each
collection now gets the same `LinkedList<Vec<T>>` as before with:

```rust
par_iter.into_par_iter().drive_unindexed(ListVecConsumer);
```
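
The idea of a consumer that is generic only on the item type can be sketched without rayon. The `Consumer` and `Folder` traits below are simplified, hypothetical stand-ins (no splitting, no reducer, no `Send` bounds), and `drive` is an assumed sequential substitute for `drive_unindexed`; only the shape of the types is meant to carry over.

```rust
use std::collections::LinkedList;

// Simplified stand-in: accumulates items of type `T` into a result.
trait Folder<T> {
    type Result;
    fn consume(self, item: T) -> Self;
    fn complete(self) -> Self::Result;
}

// Simplified stand-in: generic over the item type only.
trait Consumer<T> {
    type Folder: Folder<T, Result = Self::Result>;
    type Result;
    fn into_folder(self) -> Self::Folder;
}

// A unit struct with no type parameters, so it never names an iterator type.
struct ListVecConsumer;

struct ListVecFolder<T> {
    vec: Vec<T>,
}

impl<T> Consumer<T> for ListVecConsumer {
    type Folder = ListVecFolder<T>;
    type Result = LinkedList<Vec<T>>;

    fn into_folder(self) -> ListVecFolder<T> {
        ListVecFolder { vec: Vec::new() }
    }
}

impl<T> Folder<T> for ListVecFolder<T> {
    type Result = LinkedList<Vec<T>>;

    fn consume(mut self, item: T) -> Self {
        self.vec.push(item);
        self
    }

    fn complete(self) -> LinkedList<Vec<T>> {
        // Assumption of this sketch: empty vectors are not added to the list.
        let mut list = LinkedList::new();
        if !self.vec.is_empty() {
            list.push_back(self.vec);
        }
        list
    }
}

// The only iterator-generic code: a small loop feeding the folder.
fn drive<I, C>(iter: I, consumer: C) -> C::Result
where
    I: Iterator,
    C: Consumer<I::Item>,
{
    let mut folder = consumer.into_folder();
    for item in iter {
        folder = folder.consume(item);
    }
    folder.complete()
}

fn main() {
    let list = drive(vec![1u8, 2, 3].into_iter(), ListVecConsumer);
    println!("{:?}", list); // prints [[1, 2, 3]]
}
```

In this arrangement `ListVecFolder<T>` is instantiated once per item type, however complicated the iterator feeding it is.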

Each implementation now also moves the code that doesn't need to be
iterator-specific into a separate function, covering the `reserve` and
the final `extend` from the list data.
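
A minimal sketch of that split for `Vec<T>`, using only the standard library: `vec_append` and `extend_vec` are hypothetical names chosen for illustration, and the single-node list built in `extend_vec` merely stands in for the result of driving a parallel iterator.

```rust
use std::collections::LinkedList;

// Iterator-independent part: generic over `T` only, so it is compiled once
// per item type regardless of which iterator produced the list.
fn vec_append<T>(vec: &mut Vec<T>, list: LinkedList<Vec<T>>) {
    // Reserve once for the total number of incoming items.
    let total: usize = list.iter().map(Vec::len).sum();
    vec.reserve(total);
    // Move each chunk's contents onto the end, preserving list order.
    for mut chunk in list {
        vec.append(&mut chunk);
    }
}

// Iterator-specific part: kept as thin as possible.
fn extend_vec<I: IntoIterator>(vec: &mut Vec<I::Item>, iter: I) {
    // Stand-in for collecting the iterator into a `LinkedList<Vec<T>>`.
    let mut list = LinkedList::new();
    list.push_back(iter.into_iter().collect::<Vec<_>>());
    vec_append(vec, list);
}

fn main() {
    let mut v = vec![0u32];
    extend_vec(&mut v, 1..4);
    println!("{:?}", v); // prints [0, 1, 2, 3]
}
```

Only `extend_vec` is monomorphized per iterator type; the reserve and append logic lives in a function that the compiler can share.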

That 8-way unzip is now _only_ 1.5GB with lines up to 17K characters.
Compile time drops from 12.8s to 7.7s debug, 32.1s to 26.9s release.

cuviper commented Apr 1, 2022

bors r+

@bors bors bot merged commit 5298d6a into rayon-rs:master Apr 1, 2022
@cuviper cuviper deleted the extend-diet branch February 25, 2023 17:58