Optimize implementations of FromIterator and Extend for Vec #22681
Use one loop, efficient for both sized and size-ignorant iterators (including iterators lying about their size).
For the very first element put into a vector, the branching conditions are more predictable.
Implement both Vec::from_iter and extend in terms of an internal method working with Iterator. Otherwise, the code below ends up using two monomorphizations of extend that differ only in the IntoIterator implementation: let mut v = Vec::from_iter(iterable1); v.extend(iterable2);
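A minimal sketch of this structure, assuming a hypothetical newtype `MyVec` in place of the real `Vec` internals. Both `FromIterator::from_iter` and `Extend::extend` call `into_iter()` once and hand the resulting `Iterator` to one shared loop, so the two calls in `demo` reuse a single instantiation of that loop. The name `extend_desugared` follows the PR discussion, but the body (including `reserve(lower.saturating_add(1))`) is an illustrative assumption, and it uses safe `push`, which keeps `push`'s own capacity check that an unchecked write would avoid.

```rust
use std::iter::FromIterator;

/// Hypothetical newtype standing in for `Vec<T>` in this sketch.
pub struct MyVec<T>(pub Vec<T>);

impl<T> MyVec<T> {
    /// One loop shared by `from_iter` and `extend`. It only needs an
    /// `Iterator`, so exact, pessimistic, and wrong size hints all work.
    fn extend_desugared<I: Iterator<Item = T>>(&mut self, mut iterator: I) {
        while let Some(element) = iterator.next() {
            if self.0.len() == self.0.capacity() {
                // Buffer is full: ask how much is left (plus the element
                // already in hand) and grow. Assumed growth request.
                let (lower, _) = iterator.size_hint();
                self.0.reserve(lower.saturating_add(1));
            }
            // Safe stand-in for an unchecked write into spare capacity.
            self.0.push(element);
        }
    }
}

impl<T> FromIterator<T> for MyVec<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Self {
        let mut v = MyVec(Vec::new());
        // Convert to an Iterator once, then reuse the shared loop.
        v.extend_desugared(iterable.into_iter());
        v
    }
}

impl<T> Extend<T> for MyVec<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iterable: I) {
        self.extend_desugared(iterable.into_iter());
    }
}

/// Mirrors the description: build from one iterable, extend with another.
/// Both calls here instantiate `extend_desugared::<std::vec::IntoIter<i32>>`.
pub fn demo() -> Vec<i32> {
    let mut v = MyVec::from_iter(vec![1, 2, 3]);
    v.extend(vec![4, 5]);
    v.0
}
```

The design choice is that the `IntoIterator` conversion happens at the thin trait entry points, so only the conversion is duplicated per input type, not the loop itself.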
r? @gankro (rust_highfive has picked a reviewer for you, use r? to override)
Can you post your experimental results?
Does it optimize to a memcpy if the src is contiguous? If not, then it will have to be changed anyway.
// self.push(item);
// }
loop {
    match iterator.next() {
Can you use while let here?
I can, good point.
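The two loop shapes below are equivalent; the summation is a hypothetical stand-in for the push logic in the diff, chosen only so the control flow can be compared directly.

```rust
/// Shape from the diff: an explicit `loop` with a `match` on `next()`.
pub fn sum_with_loop<I: Iterator<Item = u32>>(mut iterator: I) -> u32 {
    let mut total = 0;
    loop {
        match iterator.next() {
            Some(item) => total += item,
            None => break,
        }
    }
    total
}

/// Reviewer's suggestion: `while let` expresses the same control flow.
pub fn sum_with_while_let<I: Iterator<Item = u32>>(mut iterator: I) -> u32 {
    let mut total = 0;
    while let Some(item) = iterator.next() {
        total += item;
    }
    total
}
```

A plain `for` loop would not fit here: it takes ownership of the iterator (or mutably borrows it for the whole loop when written as `for x in &mut iterator`), so the body could not call `iterator.size_hint()` as the PR's loop does.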
Results of benchmarking on i686: https://gist.github.com/mzabaluev/df41c2f50464416b0a26
tl;dr: Notable losers (observed consistently over multiple bench runs):
Winners:
Overall, I don't think too much trust should be put in microbenchmarks that repeatedly test a single call-site-optimized function.
How much is "winning" and "losing"?
@mahkoh These changes should not drastically change the performance with slice iterators, in which case it's as close to
@huonw I've added percentages to the comment above.
Two of these methods have
I actually ran benchmarks once with
There should be more benchmarks with extending from sizeless/pessimistic iterators. Any good candidates in libcollections?
BTreeMap's
@pczarn Without looking too closely into the results (I easily get frustrated with the long build times of the library crates and their tests), I assume the optimizer has more "reason" to share
How do we all feel about this PR today?
Oh geez, this slipped right through the cracks! I don't have a great gut feeling on this since it seems to just be shuffling perf around. r? @huonw
if vector.len() == vector.capacity() {
for element in iterator {
    vector.push(element);
let mut vector = match iterator.next() {
Hm, does this actually improve performance over the simpler version:
let mut vector = Vec::with_capacity(iterator.size_hint().0);
vector.extend_desugared(iterator);
vector
This avoids a branch that is present in extend_desugared. That branch tends not to be taken, but in the first iteration the vector is always expanded.
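A safe sketch of the peel-first shape that begins with `let mut vector = match iterator.next() {` in the diff: an empty iterator returns without allocating, and otherwise the buffer is sized for the element already in hand plus the iterator's lower bound, so storing the first element never goes through the growth branch. `extend_rest` and `from_iter_peeled` are hypothetical names, the `saturating_add(1)` sizing is an assumption, and safe `push` is used where the real code may write directly into spare capacity.

```rust
/// Hypothetical stand-in for the PR's internal loop.
fn extend_rest<T, I: Iterator<Item = T>>(vector: &mut Vec<T>, mut iterator: I) {
    while let Some(element) = iterator.next() {
        if vector.len() == vector.capacity() {
            let (lower, _) = iterator.size_hint();
            vector.reserve(lower.saturating_add(1));
        }
        vector.push(element);
    }
}

/// Peel off the first element before entering the shared loop.
pub fn from_iter_peeled<T, I: IntoIterator<Item = T>>(iterable: I) -> Vec<T> {
    let mut iterator = iterable.into_iter();
    let mut vector = match iterator.next() {
        // Empty input: return an unallocated vector.
        None => return Vec::new(),
        Some(element) => {
            // Room for the element in hand plus the promised remainder.
            let (lower, _) = iterator.size_hint();
            let mut vector = Vec::with_capacity(lower.saturating_add(1));
            vector.push(element);
            vector
        }
    };
    extend_rest(&mut vector, iterator);
    vector
}
```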
Hm, why does it avoid that branch? It seems that if the iterator has a non-zero size hint, the with_capacity will ensure that it isn't taken, and if the iterator has a zero size hint, both versions seem equally badly off?
Basically, I'm asking if this was noticeable in practice.
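The reviewer's argument can be checked with an instrumented version of the simpler shape: `with_capacity(size_hint().0)` followed by the shared loop. The growth-branch counter is hypothetical instrumentation, and the loop body is an assumed sketch of the PR's loop. With an exact hint such as a range, the branch never fires; with a zero lower bound such as `filter`, the buffer starts empty and the very first element takes the branch, just as it would without the up-front `with_capacity`.

```rust
/// Simpler `from_iter` shape, returning the vector together with how many
/// times the "buffer full" branch was taken (illustrative instrumentation).
pub fn from_iter_simple<T, I: Iterator<Item = T>>(mut iterator: I) -> (Vec<T>, usize) {
    let mut vector = Vec::with_capacity(iterator.size_hint().0);
    let mut grow_branches = 0;
    while let Some(element) = iterator.next() {
        if vector.len() == vector.capacity() {
            grow_branches += 1;
            let (lower, _) = iterator.size_hint();
            vector.reserve(lower.saturating_add(1));
        }
        vector.push(element);
    }
    (vector, grow_branches)
}
```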
You are right, and it was somehow lost on me that Vec::with_capacity(1) allocates exactly one element. I should simplify the code to your suggestion; the performance difference will likely be negligible.
With the suggested change, vec::bench_from_iter_0000 regresses from 29 ns/iter (+/- 2) to 87 ns/iter (+/- 48) in my testing (rebased against commit 3dbfa74), while the other benchmarks seem unaffected. I wonder if it can be considered an edge case.
@huonw @mzabaluev What's up with this?
Closing due to inactivity.
@huonw Any reason not to merge this?
Huon's busy.
while let Some(element) = iterator.next() {
    let len = self.len();
    if len == self.capacity() {
        let (lower, _) = iterator.size_hint();
Calling size_hint in a loop seems really bad. This is not necessarily a straightforward or cheap method. Is hoisting it out not worth it in your testing?
It is called exponentially rarely, so it seems fine?
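To see why the `size_hint` call is cheap in aggregate, the sketch below wraps an iterator in a hypothetical `CountHints` adapter that counts `size_hint` calls, then feeds it through a loop of the same shape as the diff. The call only happens when `len == capacity`, and `Vec` grows its buffer geometrically in practice (an implementation detail; the documented guarantee is amortized O(1) `push`), so the count grows roughly with log2 of the element count rather than linearly.

```rust
use std::cell::Cell;

/// Hypothetical adapter that counts how often `size_hint` is called.
pub struct CountHints<'a, I> {
    inner: I,
    calls: &'a Cell<usize>,
}

impl<'a, I: Iterator> Iterator for CountHints<'a, I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.calls.set(self.calls.get() + 1);
        self.inner.size_hint()
    }
}

/// Same loop shape as the diff: `size_hint` is consulted only when full.
fn extend_desugared<T, I: Iterator<Item = T>>(vector: &mut Vec<T>, mut iterator: I) {
    while let Some(element) = iterator.next() {
        if vector.len() == vector.capacity() {
            let (lower, _) = iterator.size_hint();
            vector.reserve(lower.saturating_add(1));
        }
        vector.push(element);
    }
}

/// Pushes `n` elements from a `filter` (lower size bound 0, so every
/// growth step relies on the loop) and returns the number of hint calls.
pub fn hint_calls(n: u32) -> usize {
    let calls = Cell::new(0);
    let source = CountHints { inner: (0..n).filter(|_| true), calls: &calls };
    let mut vector = Vec::new();
    extend_desugared(&mut vector, source);
    assert_eq!(vector.len(), n as usize);
    calls.get()
}
```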
Oh derp, sure.
At worst I like this better than the old code. r+
@bors r+
📌 Commit 7b464d3 has been approved by
Instead of a fast branch with a sized iterator falling back to a potentially poorly optimized iterate-and-push loop, a single efficient loop can serve all cases. In my benchmark runs, I see some good gains, but also some regressions, possibly due to different inlining choices by the compiler. YMMV.