-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[perf experiments] Clone all MIR bodies #125025
Conversation
Oops, sorry about that. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf experiments] Clone all MIR bodies We keep saying things like cloning all the MIR would be expensive, but... how expensive actually?
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c0bcf54): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 673.703s -> 675.385s (0.25%) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf experiments] Clone all MIR bodies We keep saying things like cloning all the MIR would be expensive, but... how expensive actually?
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (f0a6c88): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 676.399s -> 675.548s (-0.13%) |
Hm... so maybe Miri should clone the body and apply the substitution once, rather than applying it on each statement? For hot loops that should help a lot. But for big functions where we only run a small part of the code, less so. |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (6fd1dde): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -2.3%, secondary 1.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 1.1%, secondary -1.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.4%, secondary -0.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 671.52s -> 677.804s (0.94%) |
29038ee
to
c49fd8f
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf experiments] Clone all MIR bodies We keep saying things like cloning all the MIR would be expensive, but... how expensive actually?
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[perf experiments] Clone all MIR bodies We keep saying things like cloning all the MIR would be expensive, but... how expensive actually?
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e881ea0): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (primary -3.2%, secondary 2.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 3.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.4%, secondary -0.2%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 671.997s -> 675.131s (0.47%) |
@bors try |
[perf experiments] Clone all MIR bodies We keep saying things like cloning all the MIR would be expensive, but... how expensive actually?
☀️ Try build successful - checks-actions |
I've updated the PR description with my conclusions from these experiments. |
We keep saying things like cloning all the MIR would be expensive, but... how expensive actually?
Based on this PR, it looks like the literal clone is cheap enough to be considered free, but running basically any MIR transform on the cloned MIR is a measurable slowdown. The hope here was that we could do optimizations on monomorphized MIR. I think ther eare two problems with that:
The first problem is this:
rust/compiler/rustc_mir_transform/src/add_call_guards.rs
Line 28 in 032be6f
LLVM needs a certain structure around calls, which is generally wasteful in terms of the MIR that it takes to generate. We currently cope with this by creating the required structure at the end of the
optimized_mir
passes. This is very problematic, because the most obvious post-mono optimization is running something likeSimplifyCfg
to removeif true
andif false
and the resultant goto chains, but that deletes all the work ofAddCallGuard
, so now you need to run at least two post-mono MIR transforms.The second problem is that naively monomorphizing the entire Body doesn't skip unreachable blocks. I tried adding some code that uses the same reachability analysis to skip dead code, but something subtle happens that causes one of our secondary benchmarks to regress a lot. That's this perf run: #125025 (comment). I never figured out what causes this slowdown. The later perf runs are me trying to figure out what that's from.