chain() makes collect very slow #63340
cc @scottmcm
Sounds like a good change; please make a PR, @Stargateur! Let me know if you need any assistance.
There are some resources about why, linked from #45595.
I need to do a lot more research, because so far I have only found simple cases. I found some benchmark tests for `Vec`, so I will try to add some and see whether something like this helps:

```rust
// Inside `impl<T> Vec<T>`; `ptr` refers to `core::ptr`.
fn extend_desugared<I: Iterator<Item = T>>(&mut self, iterator: I) {
    // This is the case for a general iterator.
    //
    // This function should be the moral equivalent of:
    //
    //     for item in iterator {
    //         self.push(item);
    //     }
    let (lower, upper) = iterator.size_hint();
    if let Some(upper) = upper {
        self.reserve(upper); // why not?
    } else {
        self.reserve(lower);
    }
    // We use `for_each()`, which should allow more efficient code.
    iterator.for_each(|element| {
        let len = self.len();
        if len == self.capacity() {
            self.reserve(1);
        }
        unsafe {
            // Write into the spare slot through the raw pointer; indexing
            // the slice at `len` would be out of bounds.
            ptr::write(self.as_mut_ptr().add(len), element);
            // NB can't overflow since we would have had to alloc the address space
            self.set_len(len + 1);
        }
    })
}
```

The main difference from the current code is that we can no longer re-check the lower bound of the iterator inside the loop. I also wonder why we don't use the upper bound when it exists. I suppose this is to save memory, but I'm not sure that makes sense, because `reserve` already allocates more than requested in many cases. So why not just reserve what the maximum should be? Also, Rust takes forever to compile on my PC, so this is going to take a while.
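The trade-off behind the `lower` versus `upper` question shows up in the `size_hint` values that standard adapters report. In this small, std-only sketch, `chain()` reports exact bounds, while `filter()` reports a lower bound of 0 and keeps the inner iterator's upper bound, so reserving `upper` would set aside room for 1,000 elements when only 10 are produced.

```rust
fn main() {
    // `chain()` adds the hints of both halves: 3 + 5 items, known exactly.
    let chained = (0..3).chain(5..10);
    assert_eq!(chained.size_hint(), (8, Some(8)));

    // `filter()` can't know how many items will pass the predicate, so it
    // reports a lower bound of 0 and keeps the inner iterator's upper bound.
    let filtered = (0..1_000).filter(|n| n % 100 == 0);
    assert_eq!(filtered.size_hint(), (0, Some(1_000)));

    // Only 10 items actually come out, far below the reported upper bound.
    let v: Vec<i32> = (0..1_000).filter(|n| n % 100 == 0).collect();
    assert_eq!(v.len(), 10);
}
```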
You could try https://internals.rust-lang.org/t/gcc-compile-farm-for-rustc/9511, but it will take
#50481. Well, I'd better give up for now; I can't wait 10 hours every time I compile Rust for each little change.
Here are my benchmark results:
While the average performance is about the same, the variance is still larger. Another run...
I don't even know anymore...
@rustbot claim
@hbina `cargo bench`'d on my machine,
It could be due to different performance characteristics between microarchitectures (especially AMD vs. Intel), although the variance in hbina's results is quite high.
Results of `cargo bench`:

```
Finished bench [optimized] target(s) in 0.01s
Running target/release/deps/hash_bench-140dea619a7d2d1c

running 2 tests
test tests::test_collect ... bench: 145,083 ns/iter (+/- 3,624)
test tests::test_for_each ... bench: 172,960 ns/iter (+/- 4,167)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out
```
Maybe something has already fixed it (like an LLVM update). Current nightly on Arch Linux (Ryzen 2700X):
@mati865 yes:

```
$ rustc -vV
rustc 1.40.0-nightly (4a8c5b20c 2019-10-23)
binary: rustc
commit-hash: 4a8c5b20c7772bc5342b83d4b0696ea216ef75a7
commit-date: 2019-10-23
host: x86_64-unknown-linux-gnu
release: 1.40.0-nightly
LLVM version: 9.0
```
@csmoe are you running on Windows? On Windows 10 (the same PC) with the windows-gnu toolchain:
Benchmark results with jemalloc, as @mati865 suggested:

```
running 2 tests
test tests::test_collect ... bench: 253,160 ns/iter (+/- 27,068)
test tests::test_for_each ... bench: 283,811 ns/iter (+/- 46,165)
```
@Stargateur I compiled Rust with #63340 (comment), but it gave me slightly worse results on the windows-gnu toolchain (the linux-gnu toolchain doesn't reproduce the issue for me, most likely because I have the latest glibc). @hbina, if you still want to work on this issue, you will have to create an environment where you can reproduce the slowness (maybe by running old distros in Docker?).
perf: Use `for_each` in `Vec::extend`

`for_each` is specialized for iterators such as `chain`, allowing faster iteration than a normal `for`/`while` loop. Note that since this only checks `size_hint` once at the start, it may end up calling `reserve` more often in cases where `size_hint` would return a larger, more accurate lower bound during iteration. This could maybe be alleviated with an implementation closer to the current one, but the extra complexity will likely end up harming the normal case of an accurate or 0 (think `filter`) lower bound.

```rust
while let Some(element) = iterator.next() {
    let (lower, _) = iterator.size_hint();
    self.reserve(lower.saturating_add(1));
    unsafe {
        let len = self.len();
        ptr::write(self.as_mut_ptr().add(len), element);
        // NB can't overflow since we would have had to alloc the address space
        self.set_len(len + 1);
    }
    // Fill only the remaining spare capacity, not `capacity()` more items.
    let spare = self.capacity() - self.len();
    iterator.by_ref().take(spare).for_each(|element| {
        unsafe {
            let len = self.len();
            ptr::write(self.as_mut_ptr().add(len), element);
            // NB can't overflow since we would have had to alloc the address space
            self.set_len(len + 1);
        }
    });
}

// OR

let (lower, _) = iterator.size_hint();
self.reserve(lower);
loop {
    // Write items while there is room; hand back the first one that doesn't fit.
    let result = iterator.by_ref().try_for_each(|element| {
        if self.len() == self.capacity() {
            return Err(element);
        }
        unsafe {
            let len = self.len();
            ptr::write(self.as_mut_ptr().add(len), element);
            // NB can't overflow since we would have had to alloc the address space
            self.set_len(len + 1);
        }
        Ok(())
    });
    match result {
        Ok(()) => break,
        Err(element) => {
            let (lower, _) = iterator.size_hint();
            self.reserve(lower.saturating_add(1));
            self.push(element);
        }
    }
}
```

Closes #63340
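The `try_for_each` alternative can be tried outside the standard library as a safe free function over `Vec<T>`. This is a sketch under assumptions: `extend_via_try_for_each` is a hypothetical name, and it uses `push` (which cannot reallocate once capacity has been checked) instead of the raw `ptr::write`/`set_len` pair. It keeps the idea of growing only when the vector is full and re-reading `size_hint` at that point.

```rust
/// Hypothetical helper mirroring the PR's `try_for_each` alternative.
fn extend_via_try_for_each<T, I: Iterator<Item = T>>(dst: &mut Vec<T>, mut iter: I) {
    let (lower, _) = iter.size_hint();
    dst.reserve(lower);
    loop {
        // Push while there is spare capacity; stop and return the first
        // item that doesn't fit instead of reallocating inside the loop.
        let result = iter.try_for_each(|item| {
            if dst.len() == dst.capacity() {
                return Err(item);
            }
            dst.push(item); // capacity was just checked, so no reallocation
            Ok(())
        });
        match result {
            Ok(()) => break,
            Err(item) => {
                // Full: refresh the lower bound, grow once, then resume.
                let (lower, _) = iter.size_hint();
                dst.reserve(lower.saturating_add(1));
                dst.push(item);
            }
        }
    }
}

fn main() {
    let mut v = vec![1, 2];
    extend_via_try_for_each(&mut v, (3..6).chain(10..12));
    assert_eq!(v, [1, 2, 3, 4, 5, 10, 11]);

    // `filter` reports a lower bound of 0, so this exercises the `Err` path.
    let mut w: Vec<i32> = Vec::new();
    extend_via_try_for_each(&mut w, (0..100).filter(|n| n % 7 == 0));
    assert_eq!(w.len(), 15);
}
```

Because the capacity check happens inside the closure, the common case (enough room) never leaves `try_for_each`, so adapters like `chain` can still use their internal-iteration path.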
Triage: let's clean up the assignment.
While working on an SO question, we were wondering whether `chain()` would produce an acceptable speed. After some digging and benchmarking, we came to the conclusion that `collect()` is slow because it uses `while let`. Unfortunately, this makes `collect()` very slow; I don't really understand why, but it's a fact. But we saw that the `for_each()` implementation of `chain()` (probably thanks to `fold()`) doesn't have this problem and produces something a lot faster.

So, should we change the implementation of `collect()` to use `for_each()`? Note that a `for` loop doesn't solve the problem. For this to be optimized, we need to use `for_each()`.
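The mechanism described above can be illustrated with a simplified, hypothetical `MyChain` type (not the real `std::iter::Chain` source). Its `next()` must re-check which half is active on every call, which is the cost a `while let` or `for` loop pays; its `fold()` simply folds the first iterator and then the second. Since `Iterator::for_each` is implemented on top of `fold` by default, a `for_each` + `push` loop picks up the faster path. `collect_with_for_each` is likewise a hypothetical helper, not a std API.

```rust
/// Hypothetical, simplified stand-in for `std::iter::Chain`.
struct MyChain<A, B> {
    a: Option<A>, // set to `None` once the first iterator is exhausted
    b: B,
}

impl<A, B> Iterator for MyChain<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    // External iteration (`while let`, `for`): each call re-checks which
    // half is active before yielding an item.
    fn next(&mut self) -> Option<A::Item> {
        if let Some(a) = self.a.as_mut() {
            if let Some(item) = a.next() {
                return Some(item);
            }
            self.a = None;
        }
        self.b.next()
    }

    // Internal iteration: fold the first half, then the second, with no
    // per-item branch on the chain's state.
    fn fold<Acc, F>(self, init: Acc, mut f: F) -> Acc
    where
        F: FnMut(Acc, Self::Item) -> Acc,
    {
        let mut acc = init;
        if let Some(a) = self.a {
            acc = a.fold(acc, &mut f);
        }
        self.b.fold(acc, f)
    }
}

/// Hypothetical helper: build a `Vec` with `for_each` + `push`, which
/// reaches `MyChain::fold` through `Iterator::for_each`'s default body.
fn collect_with_for_each<I: Iterator>(iter: I) -> Vec<I::Item> {
    let mut v = Vec::with_capacity(iter.size_hint().0);
    iter.for_each(|item| v.push(item));
    v
}

fn main() {
    let via_for_each: Vec<i32> = collect_with_for_each(MyChain { a: Some(0..3), b: 10..13 });
    // Ordinary `collect()` for comparison; both must yield the same items.
    let via_collect: Vec<i32> = MyChain { a: Some(0..3), b: 10..13 }.collect();
    assert_eq!(via_for_each, via_collect);
    assert_eq!(via_for_each, [0, 1, 2, 10, 11, 12]);
}
```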