remove all 4 RC clones in min_candidates. allowing it to be inlined #5625
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
So I was looking at a profile, and noted that
DepsFrame::min_candidates
was taking ~10% of the runtime. The odd part is that it should be a thin wrapper aroundVec::len()
, and so should be completely inlined away. Also it is the key for theBinaryHeap
so it gets called a lot! Looking into itremaining_siblings.clone()
clones the RC in theRcVecIter
then.next()
clonesT
witch is aDepInfo
each part of which is an RC that needs to be cloned. All 4 of these RC clones can be removed, but it is apparently too much for the optimizer. So I added a 'peek' method that uses a normal reference to the inner value instead of an RC clone. After thisDepsFrame::min_candidates
does not appear in the profile results. Probably as the name is inlined away. But is the inlined code faster?before: 20000000 ticks, 104s, 192.308 ticks/ms
after: 20000000 ticks, 87s, 229.885 ticks/ms
So yes ~16% faster!
All profiling/benchmark was done by commenting out the code from #5213 so its test case would run for a long time. But this should improve the happy path as well.