Slow compilation #68926
This is kind of impressive! Running … So I ran … The debug binary is also large at 245M, but most of that is debuginfo; the actual text+data is about 12M.
I've reduced it somewhat in rayon-rs/rayon#887:

For unindexed parallel iterators, we've implemented `ParallelExtend` for most collections using an intermediate `LinkedList<Vec<T>>` like:

```rust
par_iter
    .into_par_iter()
    .fold(Vec::new, vec_push)
    .map(as_list)
    .reduce(LinkedList::new, list_append)
```

However, this introduces `Fold`, `Map`, and `Reduce` types that all depend on the input iterator type. In very complicated cases like nested tuple unzips, this adds up quickly: in rust-lang/rust#68926, an 8-way unzip leads to 3.7GB of LLVM IR, with lines up to 67K characters in long generic types.

Now we add a new `ListVecConsumer` that is not generic at all itself, and implements `Consumer<T>` etc. generic only on the item type. So each collection still gets the same `LinkedList<Vec<T>>` as before, via:

```rust
par_iter.into_par_iter().drive_unindexed(ListVecConsumer);
```

Each implementation also moves the code that doesn't need to be iterator-specific into a separate function, for its `reserve` and the final `extend` from the list data.

That 8-way unzip is now _only_ 1.5GB, with lines up to 17K characters. Compile time drops from 12.8s to 7.7s in debug and from 32.1s to 26.9s in release.
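The helpers named in the first snippet (`vec_push`, `as_list`, `list_append`) aren't defined in the quoted description. A minimal sketch of what they might look like, assuming the obvious roles implied by the fold/map/reduce pipeline:

```rust
use std::collections::LinkedList;

// Assumed definitions for the helpers referenced above: push one item into a
// per-job Vec, wrap a finished Vec in a one-element list, and splice two
// lists together when parallel results are reduced.
fn vec_push<T>(mut vec: Vec<T>, item: T) -> Vec<T> {
    vec.push(item);
    vec
}

fn as_list<T>(vec: Vec<T>) -> LinkedList<Vec<T>> {
    let mut list = LinkedList::new();
    list.push_back(vec);
    list
}

fn list_append<T>(mut a: LinkedList<Vec<T>>, mut b: LinkedList<Vec<T>>) -> LinkedList<Vec<T>> {
    a.append(&mut b);
    a
}
```

The point of the PR is that the `Fold`, `Map`, and `Reduce` adapters wrapping these helpers are generic over the whole input iterator type, while `ListVecConsumer` sidesteps that by being generic only over the item type.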
I'm not sure if there's anything the compiler could do though -- there's just a lot of code to expand here. In … Specialization might avoid that, but in the meantime maybe we could add a …
Thank you for looking into this and writing a patch to improve things. The …
If you look at the current implementation that matches … I took a look though, and I don't see how to do indexed unzip in rayon's design. The …
887: Reduce the amount of generic code for ParallelExtend r=cuviper a=cuviper
Co-authored-by: Josh Stone <cuviper@gmail.com>
Using nested tuples in this example leads to painfully long compile times:
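The original reproduction isn't preserved in this excerpt. As a rough sketch of the kind of code being discussed, here is an 8-way unzip built from nested tuple pairs (the element type, range length, and variable names are assumptions), relying on rayon's `ParallelExtend` impls for tuple pairs:

```rust
use rayon::prelude::*;

fn main() {
    // Hypothetical repro: each level of tuple nesting doubles the number of
    // collections driven by the unzip, and the generic adapter types grow with it.
    let (((a, b), (c, d)), ((e, f), (g, h))): (
        ((Vec<u64>, Vec<u64>), (Vec<u64>, Vec<u64>)),
        ((Vec<u64>, Vec<u64>), (Vec<u64>, Vec<u64>)),
    ) = (0..1_000_000u64)
        .into_par_iter()
        .map(|i| (((i, i), (i, i)), ((i, i), (i, i))))
        .unzip();

    assert_eq!(a.len(), b.len());
    let _ = (c, d, e, f, g, h);
}
```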