Switch the BB CFG cache from postorder to RPO #112638
Sanity check: @bors try @rust-timer queue
⌛ Trying commit f134101 with merge 7f0a9c6034d4ceaeed051ae99e90c2d96d1a2c5e...
let rpo = body.basic_blocks.reverse_postorder().to_vec();
for bb in rpo {
@tmiasko This is where it would have been nice to avoid this existing allocation, as it's not easy to combine lazy RPO with as_mut_preserves_cfg. Probably not worth it just for this single call site though; I don't remember seeing this pattern elsewhere.
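To illustrate why the call site above copies the order with `.to_vec()`, here is a minimal sketch on a toy type. `Blocks`, `visit_rank`, and `stamp_in_rpo` are hypothetical stand-ins, not the rustc types; the two method names only mirror the ones mentioned in this comment, and their toy signatures are an assumption.

```rust
/// Toy stand-in for MIR basic blocks (hypothetical, not the rustc types).
struct Blocks {
    /// Per-block payload that a pass wants to mutate.
    visit_rank: Vec<usize>,
    /// Cached reverse postorder over block indices.
    rpo: Vec<usize>,
}

impl Blocks {
    /// Shared borrow of the cached order.
    fn reverse_postorder(&self) -> &[usize] {
        &self.rpo
    }

    /// Mutable access to block data only; the CFG shape (and so `rpo`) stays valid.
    fn as_mut_preserves_cfg(&mut self) -> &mut Vec<usize> {
        &mut self.visit_rank
    }
}

/// Stamps each block with the step at which RPO reaches it.
fn stamp_in_rpo(blocks: &mut Blocks) {
    // Copy the order out first: iterating the `&[usize]` directly would keep
    // `blocks` borrowed immutably while the loop body needs `&mut blocks`.
    let rpo = blocks.reverse_postorder().to_vec();
    for (step, bb) in rpo.into_iter().enumerate() {
        blocks.as_mut_preserves_cfg()[bb] = step;
    }
}
```

In this sketch the copy is the price of mutating blocks while walking a cached order; avoiding it would need the order and the block data to be borrowed separately.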
☀️ Try build successful - checks-actions
Finished benchmarking commit (7f0a9c6034d4ceaeed051ae99e90c2d96d1a2c5e): comparison URL.
Overall result: ❌ regressions - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 649.904s -> 650.513s (0.09%)
bitmaps currently looks bimodal on these 0.3% swings
also see #112288
@Nilstrieb incredible timing :) To me it looks like there are so few non-cached postorder traversals that using the cache for them is a wash (all the uses of the postorder cache are the RPO traversals). Now, if we cache anything, at the very least let's cache the ordering we actually use: RPO. (But I saw this when looking to do RPO on the reverse CFG...) That's a small but noticeable icount win in cg, and I think a cleanup. What do you think? Making
This is a nice cleanup. We should definitely get this merged.
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt
Probably no need for another perf run. r? @cjgillot
@bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (677710e): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): This benchmark run did not return any relevant results for this metric.
Cycles: This benchmark run did not return any relevant results for this metric.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 656.258s -> 656.279s (0.00%)
The BasicBlocks CFG cache is interesting:
- traversal::postorder doesn't use it
- traversal::reverse_postorder does traverse the postorder cache backwards

This PR switches the order of the cache, and makes a bit more use of it. This is a tiny win locally, but it's also for consistency and aesthetics.
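
As a rough illustration of the idea (not the rustc implementation), the sketch below stores the reverse postorder directly in a lazily filled cache, so the common RPO consumers read it front to back, while postorder is recomputed on demand. `Cfg`, `succs`, and `rpo_cache` are hypothetical names, and the use of `std::cell::OnceCell` is an assumption made for this toy example.

```rust
use std::cell::OnceCell;

/// Hypothetical toy CFG: `succs[b]` lists the successors of block `b`;
/// block 0 is the entry.
struct Cfg {
    succs: Vec<Vec<usize>>,
    /// Lazily computed reverse postorder, stored in the order callers iterate it.
    rpo_cache: OnceCell<Vec<usize>>,
}

impl Cfg {
    fn new(succs: Vec<Vec<usize>>) -> Self {
        Cfg { succs, rpo_cache: OnceCell::new() }
    }

    /// Uncached postorder from the entry block, using an iterative DFS.
    fn postorder(&self) -> Vec<usize> {
        let mut order = Vec::new();
        if self.succs.is_empty() {
            return order;
        }
        let mut visited = vec![false; self.succs.len()];
        // Each stack entry is (block, index of the next successor to try).
        let mut stack = vec![(0usize, 0usize)];
        visited[0] = true;
        while let Some(top) = stack.last_mut() {
            let (bb, next) = *top;
            if let Some(&succ) = self.succs[bb].get(next) {
                top.1 += 1;
                if !visited[succ] {
                    visited[succ] = true;
                    stack.push((succ, 0));
                }
            } else {
                // All successors handled: emit the block after them.
                order.push(bb);
                stack.pop();
            }
        }
        order
    }

    /// Cached reverse postorder: computed once, then handed out as a slice.
    fn reverse_postorder(&self) -> &[usize] {
        self.rpo_cache.get_or_init(|| {
            let mut order = self.postorder();
            order.reverse();
            order
        })
    }
}
```

For a diamond-shaped graph built with `Cfg::new(vec![vec![1, 2], vec![3], vec![3], vec![]])`, `reverse_postorder()` yields the entry block first and the join block last, and repeated calls reuse the cached vector.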
r? @ghost