[experiment] add loop deletion llvm pass #79774
(rust-highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 3f1d957 with merge a89aa1b166f929b236fd540bdca9e9ae44473760...
☀️ Try build successful - checks-actions
Queued a89aa1b166f929b236fd540bdca9e9ae44473760 with parent 0f6f2d6, future comparison URL.
Finished benchmarking try commit (a89aa1b166f929b236fd540bdca9e9ae44473760): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
That's more one-sided than I expected, but oh well. 😓
@the8472 perf measures the compiler itself here, and it obviously gets slower when we add more optimization passes, because there is more work to do. Maybe we need to benchmark some other program to see the difference?
@AngelicosPhosphoros it also made
My findings in #79308 suggest that adding one extra loop deletion pass in the right place can eliminate some loops, so that the algorithmic complexity of a function improves categorically. Adding it via -Cpasses doesn't quite work on its own (at least with the x86_64 defaults), probably due to pass ordering. So this PR just hacks it directly into the LLVM pass manager builder. I'm probably not doing this the right way, but locally it's the approach that requires the fewest extra passes to get the desired assembly.
If the checks pass I'd like a perf run to see what the impact is.
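As a hedged illustration of the kind of loop meant here (this is a hypothetical example, not the code from #79308): the loop below only advances a counter whose final value is never used and has no side effects, so once LLVM can prove it terminates, a loop deletion pass may remove it, turning an O(n) function into an O(1) one. Whether a given pipeline actually deletes it depends on pass ordering, which is what this PR is probing. The function name `len_after_dead_loop` is made up for the example.

```rust
/// Hypothetical example: returns the slice length after running a loop
/// whose only effect is incrementing a local counter that is then discarded.
/// The loop is side-effect free and finite, so it is a candidate for LLVM's
/// loop deletion pass; if it is deleted, the function no longer scales with
/// the slice length.
pub fn len_after_dead_loop(v: &[u32]) -> usize {
    let mut i = 0;
    // Dead loop: `i` is never read after the loop exits.
    while i < v.len() {
        i += 1;
    }
    v.len()
}
```

To check whether the loop survives, one could compare the generated assembly, e.g. with `rustc -C opt-level=3 --crate-type=lib --emit=asm`, with and without the extra pass.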