remove all 4 RC clones in min_candidates. allowing it to be inlined #5625

Merged
merged 1 commit into from
Jun 27, 2018

Eh2406
Contributor

@Eh2406 Eh2406 commented Jun 10, 2018

So I was looking at a profile and noticed that `DepsFrame::min_candidates` was taking ~10% of the runtime. The odd part is that it should be a thin wrapper around `Vec::len()`, and so should be completely inlined away. Also, it is the key for the `BinaryHeap`, so it gets called a lot! Looking into it, `remaining_siblings.clone()` clones the `Rc` in the `RcVecIter`, then `.next()` clones `T`, which is a `DepInfo`, each part of which is an `Rc` that needs to be cloned. All 4 of these `Rc` clones can be removed, but that is apparently too much for the optimizer. So I added a `peek` method that uses a normal reference to the inner value instead of an `Rc` clone. After this, `DepsFrame::min_candidates` does not appear in the profile results, probably because the call is inlined away. But is the inlined code faster?

before: 20000000 ticks, 104s, 192.308 ticks/ms
after: 20000000 ticks, 87s, 229.885 ticks/ms

So yes ~16% faster!
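
For anyone checking the numbers, here is a small sketch of the arithmetic behind them. The helper names `ticks_per_ms` and `time_reduction` are made up for illustration and are not part of Cargo. The ~16% figure corresponds to the drop in wall-clock time (104s to 87s); measured as throughput, the same change is roughly a 19.5% increase in ticks/ms.

```rust
// Throughput in ticks per millisecond for a run of `ticks` that took `seconds`.
// Hypothetical helper, only used to reproduce the figures quoted above.
fn ticks_per_ms(ticks: u64, seconds: u64) -> f64 {
    ticks as f64 / (seconds as f64 * 1000.0)
}

// Fractional reduction in wall-clock time when going from `before_s` to `after_s`.
// Hypothetical helper.
fn time_reduction(before_s: u64, after_s: u64) -> f64 {
    (before_s as f64 - after_s as f64) / before_s as f64
}
```

Calling `ticks_per_ms(20_000_000, 104)` and `ticks_per_ms(20_000_000, 87)` reproduces the two throughput values, and `time_reduction(104, 87)` gives the fraction of time saved.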

All profiling/benchmark was done by commenting out the code from #5213 so its test case would run for a long time. But this should improve the happy path as well.
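
To illustrate the idea, here is a minimal sketch of the clone-then-`next()` pattern next to a `peek` that only borrows. It is not the actual Cargo code: the field names, the simplified `DepInfo` alias (three `Rc` parts standing in for the real tuple), and the two free functions are assumptions made for the example.

```rust
use std::ops::Range;
use std::rc::Rc;

// Simplified stand-in for the iterator described above: a shared vector plus
// the range of indices not yet yielded. Field names are assumptions.
#[derive(Clone)]
struct RcVecIter<T> {
    vec: Rc<Vec<T>>,
    rest: Range<usize>,
}

impl<T> RcVecIter<T> {
    fn new(vec: Rc<Vec<T>>) -> Self {
        let rest = 0..vec.len();
        RcVecIter { vec, rest }
    }

    // Borrow the next element without advancing the iterator and without
    // touching any reference count.
    fn peek(&self) -> Option<(usize, &T)> {
        let i = self.rest.start;
        if i < self.rest.end {
            self.vec.get(i).map(|v| (i, v))
        } else {
            None
        }
    }
}

impl<T: Clone> Iterator for RcVecIter<T> {
    type Item = (usize, T);

    // Yields an owned copy of each element, so every call clones `T`.
    fn next(&mut self) -> Option<(usize, T)> {
        self.rest
            .next()
            .and_then(|i| self.vec.get(i).map(|v| (i, v.clone())))
    }
}

// Hypothetical stand-in for `DepInfo`: three reference-counted parts.
type DepInfo = (Rc<String>, Rc<Vec<u32>>, Rc<Vec<String>>);

// Old shape: clone the iterator (1 Rc clone), then `next()` clones the
// element (3 more Rc clones), all just to read a length.
fn min_candidates_cloning(it: &RcVecIter<DepInfo>) -> usize {
    it.clone()
        .next()
        .map(|(_, (_, candidates, _))| candidates.len())
        .unwrap_or(0)
}

// New shape: borrow the element through `peek`, with no Rc clones at all.
fn min_candidates_peeking(it: &RcVecIter<DepInfo>) -> usize {
    it.peek()
        .map(|(_, (_, candidates, _))| candidates.len())
        .unwrap_or(0)
}
```

Both functions return the same value for the same iterator; the difference is only in how many reference counts are incremented and decremented along the way, which is the work the optimizer was apparently unable to remove on its own.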

@rust-highfive

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member

@bors: r+

Nice find!

Generally, inlined code is faster in that it avoids setting up a call frame and, primarily, enables further optimizations, but if the runtime is faster it's faster, so no need to worry about things like #[inline]!

@bors
Contributor

bors commented Jun 27, 2018

📌 Commit edc516c has been approved by alexcrichton

@bors
Contributor

bors commented Jun 27, 2018

⌛ Testing commit edc516c with merge 1e35888...

bors added a commit that referenced this pull request Jun 27, 2018
remove all 4 RC clones in min_candidates. allowing it to be inlined

@bors
Contributor

bors commented Jun 27, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 1e35888 to master...

@bors bors merged commit edc516c into rust-lang:master Jun 27, 2018
@Eh2406 Eh2406 deleted the min_candidates_is_slow branch June 29, 2018 18:32
@ehuss ehuss added this to the 1.29.0 milestone Feb 6, 2022