Specialize len in ExactSizeIterator implementations #91998
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override)
Hum... Many of these one line functions are good |
In other PRs I got the feedback that we shouldn't generally |
☔ The latest upstream changes (presumably #95241) made this pull request unmergeable. Please resolve the merge conflicts.
7b2d47d to ce4ec10
☔ The latest upstream changes (presumably #95837) made this pull request unmergeable. Please resolve the merge conflicts.
ce4ec10 to 5afc44d
This is still waiting on review.
r? @the8472 - Do you have time to review this? I think you're our Iterator expert. :)
Have you done any measurements or looked at assembly and seen improvements? Or what motivated this change? Not that I am opposed to it, it looks quite reasonable.
In other PRs I got the feedback that we shouldn't generally #[inline] stuff willy-nilly and should let the compiler decide. It seems ridiculous to me to slap #[inline] on a (generic) function that simply returns self.length - surely the compiler knows there's no point in creating a function for a single opcode. But I also noticed that splitting off part of a function that is #[inline] to a new function that is not #[inline] often leads to a performance backlash.
The impact varies a lot. It depends on whether the function is part of a loop body or outside, whether inlining it unlocks other optimizations, and also on the number of codegen units. AFAIK the CGU splitting algorithm takes inlining annotations into account and tries to group related methods together so they can be optimized together.
count() consumes the iterator, so it's less likely that the count will be used in a bounds check of a following loop. len(), on the other hand, may feature as a condition on some loops, so inlining it could help, especially when compiling with multiple CGUs and LTO=off. That's presumably why the default impl of len() and many size_hint impls are inline.
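To make the count()/len() contrast concrete, here is a minimal sketch. The helper names len_via_size_hint and sum_first_half are made up for illustration and do not appear in the PR. The first function mirrors what the default ExactSizeIterator::len does (derive the length from size_hint and assert that both bounds agree); the second shows len() borrowing the iterator so it can still drive a loop afterwards, which count() cannot do because it takes the iterator by value.

```rust
/// Hypothetical helper: mirrors the shape of the default
/// `ExactSizeIterator::len`, which derives the length from `size_hint`
/// and asserts that the lower and upper bounds agree.
fn len_via_size_hint<I: Iterator>(iter: &I) -> usize {
    let (lower, upper) = iter.size_hint();
    assert_eq!(upper, Some(lower));
    lower
}

/// Hypothetical helper: `len()` only borrows the iterator, so the value
/// it returns can act as a loop bound while the iterator is still usable.
/// `count()` would have consumed `iter` instead.
fn sum_first_half(values: &[u32]) -> u32 {
    let iter = values.iter();
    let half = iter.len() / 2; // borrows `iter`, does not consume it
    let mut total = 0;
    for (i, v) in iter.enumerate() {
        if i >= half {
            break;
        }
        total += *v;
    }
    total
}
```

Whether inlining len() in a loop like this pays off is exactly the kind of thing that depends on CGU count and LTO settings, so treat the sketch as illustration only, not as a benchmark.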
So I only put #[inline] where size_hint now calls the new function.
You might want to consider the case where size_hint already is #[inline]. Since the default impl also is inline, that means it previously was inlined all the way.
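For reference, the pattern under discussion looks roughly like the sketch below. Countdown is a hypothetical iterator invented for this example, not one of the library types touched by the PR. size_hint delegates to the specialized len, and both carry #[inline] so the chain that used to be inlined end to end (default len calling an inline size_hint) stays inlinable in the other direction.

```rust
/// Hypothetical iterator (not from the PR): counts down from
/// `remaining - 1` to 0.
struct Countdown {
    remaining: usize,
}

impl Iterator for Countdown {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining)
        }
    }

    // size_hint now calls the specialized len, so it keeps #[inline]
    // to avoid introducing a non-inlined hop.
    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = self.len();
        (len, Some(len))
    }
}

impl ExactSizeIterator for Countdown {
    // Specialized len: returns the stored length directly instead of
    // going through the default impl, which calls size_hint and asserts.
    #[inline]
    fn len(&self) -> usize {
        self.remaining
    }
}
```

Note that overriding len here is what prevents infinite recursion: the default len would call size_hint, which in this sketch calls len.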
Since some of those iterators are very hot (although I don't know how much code relies on len()), I think it makes sense to put this through perf.
@bors try @rust-timer queue
Looks like the bots don't listen to review comments. @bors try @rust-timer queue
⌛ Trying commit 5afc44d28a758bb1559ea8ec0d34434e00030357 with merge 6354a6adb63febaea3c269075326e12e45008c86...
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
☀️ Try build successful - checks-actions
Queued 6354a6adb63febaea3c269075326e12e45008c86 with parent b12708f, future comparison URL.
Finished benchmarking commit (6354a6adb63febaea3c269075326e12e45008c86): comparison url.
Result sections: instruction count, max RSS (memory usage), cycles.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
No. I just saw the inconsistency where
I've recently seen it matter for “correctness” too: #[inline] can avoid a stack overflow by preventing moved values from being copied on the stack.
There is one way to speculate less about it… make them abort and see how far we get with the test suite. I haven't completed this
Well, what I pursued is a more general case - I don't even care whether |
5afc44d to 84125b2
So I yanked |
Then I'm not quite seeing the motivation here. Consistency of impls isn't that important, since the API surface doesn't change and the impls are non-uniform for other reasons anyway. As long as the behavior is correct and fast, the increase in lines of code doesn't seem to be worth it.
Two commits (tell me if you prefer separate PRs):
1. Override the default implementation of Iterator::count in ExactSizeIterator iterators without side effects, much like #62316 ("When possible without changing semantics, implement Iterator::last in terms of DoubleEndedIterator::next_back for types in liballoc and libcore"), making them O(1) instead of O(n). Though I doubt count is used often or at all. Some count implementations simply delegate to an inner implementation, which means the ones in HashMap/HashSet don't actually shortcut until hashbrown's would do so.
ExactSizeIterator.
2. Override the default implementation of ExactSizeIterator::len, where size_hint clearly returns equal lower and upper limits. This contradicts ExactSizeIterator's advice "The len method has a default implementation, so you usually shouldn’t implement it", but I think it makes code easier to understand, might improve performance, and most of the library iterators already do it: I counted 50 iterators that do, plus an unexplored number of slice iterators through a macro, against 31 iterators that didn't and are in this PR, and these that can't easily be:
ExactSizeIterator
FlatMap is only conditionally TrustedLen and not even ExactSizeIterator
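As a rough illustration of the two commits combined, here is a minimal sketch. Ticks is a hypothetical iterator made up for this example, not one of the types changed in the PR. Because next has no side effects beyond decrementing the stored length, count can return len() in O(1) instead of stepping through every element the way the default implementation does.

```rust
/// Hypothetical side-effect-free iterator (not from the PR): yields `()`
/// exactly `remaining` times.
struct Ticks {
    remaining: usize,
}

impl Iterator for Ticks {
    type Item = ();

    fn next(&mut self) -> Option<()> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        let len = self.len();
        (len, Some(len))
    }

    // O(1) override: the default count() walks the whole iterator, which
    // is O(n). Only valid because next() has no observable side effects.
    #[inline]
    fn count(self) -> usize {
        self.len()
    }
}

impl ExactSizeIterator for Ticks {
    // Specialized len: the length is stored, so return it directly.
    #[inline]
    fn len(&self) -> usize {
        self.remaining
    }
}
```

An iterator whose next has side effects (for example, one wrapping a user closure) should not take this shortcut, since skipping the calls would change observable behavior.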