MIR opt: separate constant predecessors of a switch #85646
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @varkor (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
Thanks for this optimization. We already saw its potential on Zulip. I left some nits.
Please also rebase on master instead of merging.
}

let blocks = body.basic_blocks_mut();
for (pred_id, target_id) in new_edges {
Can the two loops be merged?
The reason I separated the loops is that according to somebody on Zulip, mutating the blocks vector invalidates the predecessor cache which is apparently expensive to re-compute. So this way I can have an immutable loop with a stable copy of the predecessors without having to clone it. I don't know if it's worth it. The other approach would be to clone the predecessors beforehand so we don't use the recomputed one.
I could not come up with a good way to merge the loops. If you have a suggestion, I'd be happy to reconsider.
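For readers following the thread, here is a minimal sketch of the two-loop structure being discussed: a read-only loop that collects edges from a stable view of the predecessors, followed by a loop that only mutates. `Body`, `predecessors` and `split_predecessors` are hypothetical stand-ins written for this illustration, not rustc's actual types or API.

```rust
/// Toy stand-in for a MIR body (an assumption for illustration, not rustc's type).
struct Body {
    /// successors[b] lists the blocks that block b can jump to.
    successors: Vec<Vec<usize>>,
}

impl Body {
    /// Recomputes predecessors from scratch; stands in for the cached,
    /// expensive-to-rebuild predecessor information mentioned above.
    fn predecessors(&self) -> Vec<Vec<usize>> {
        let mut preds = vec![Vec::new(); self.successors.len()];
        for (block, succs) in self.successors.iter().enumerate() {
            for &s in succs {
                preds[s].push(block);
            }
        }
        preds
    }
}

/// Gives each selected predecessor of `target` its own copy of `target`.
/// Returns the number of copies made.
fn split_predecessors(
    body: &mut Body,
    target: usize,
    should_split: impl Fn(usize) -> bool,
) -> usize {
    // Loop 1: read-only. Predecessors are computed once and never invalidated here.
    let preds = body.predecessors();
    let new_edges: Vec<(usize, usize)> = preds[target]
        .iter()
        .copied()
        .filter(|&p| should_split(p))
        .map(|p| (p, target))
        .collect();

    // Loop 2: mutation only. No predecessor lookups happen past this point.
    for &(pred_id, target_id) in &new_edges {
        let copy_id = body.successors.len();
        let copied = body.successors[target_id].clone();
        body.successors.push(copied);
        for s in body.successors[pred_id].iter_mut() {
            if *s == target_id {
                *s = copy_id;
            }
        }
    }
    new_edges.len()
}

fn main() {
    // Edges: 0 -> 2, 1 -> 2, 2 -> 3.
    let mut body = Body { successors: vec![vec![2], vec![2], vec![3], vec![]] };
    let copies = split_predecessors(&mut body, 2, |p| p == 0);
    println!("copies made: {copies}");
    println!("{:?}", body.successors);
}
```

Merging the two loops in this toy would mean calling `predecessors()` again after each push, which is the recomputation cost the split is meant to avoid.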
New developments revealed that:
More information in updated OP. |
bf8cf36 to d7ccc64
After destroying my branch a couple times trying solutions found online, I can assert with a great amount of certainty that I have no idea how to rebase on master in the current state of things. Could somebody guide me through it?
r? @cjgillot seems to have this under control :)
Try this
which resulted in this (in your case it should update this PR). If this ends up failing, you should be able to return to your original state by checking out your original branch/commit (possibly after aborting the rebase first, depending on when things go wrong).
Thank you for the answer @nagisa.
which makes no sense to me as I just did it. Do you perhaps know what is going on? @cjgillot I tried the instructions you sent to me in private to different but similar results. Notably, after doing what you suggested (to simply rebase using
(this is the end of the report) edit: I think I'll just copy my changes to a new branch and force push that |
b1faed9 to e857e11
I think it is ready for further review now. Once you are happy with its state, would it make sense to run some performance tests to know its impact on build time and runtime performance?
587193e to 2f3662c
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 8b1094be858f3079777566ab0e0ad25405de6362 with merge 54784f9434beb0631f268648b9c03ce81a2116d1...
☀️ Try build successful - checks-actions
Queued 54784f9434beb0631f268648b9c03ce81a2116d1 with parent 40c1623, future comparison URL.
Finished benchmarking try commit (54784f9434beb0631f268648b9c03ce81a2116d1): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
78db715 to 4f965dd
As discussed on Zulip, I decided to keep this pass simple for now, so it ignores projections. I also improved documentation.
If we proceed with this, I think it is important to decide what a canonically optimized MIR looks like, so that all the optimizations can work towards the same goal. This specific optimization sounds a lot like something a simplifycfg-like pass could undo right after this transformation is done.
That's true. I think there is something about size vs speed to consider here. Clearly this pass gives up on generated code size in favor of (theoretical) performance. How would we translate the will of the user with respect to this with the current infrastructure?
un-update itertools
improve predecessor amount short-circuiting
cleanup and comments
somewhat improved drawing
c31a69f to a77e2ad
@bors r+
📌 Commit a77e2ad has been approved by |
☀️ Test successful - checks-actions
For each block S ending with a switch, this pass copies S for each of S's predecessors that seem to assign the value being switched over as a const. This is done using a somewhat simple heuristic to determine what seems to be a const transitively.
More precisely, this is what the pass does:
This pass is not optimal and could probably duplicate in more cases, but the intention was mostly to address cases like in #85133 or #85365, to avoid creating new enums that get destroyed immediately afterwards (notably making the new try v2 ? desugar zero-cost).
A benefit of this pass working the way it does is that it is easy to ensure its correctness: the worst that can happen is for it to needlessly copy a basic block, which is likely to be destroyed by cleanup passes afterwards. The complex parts where aliasing matters are only heuristics and the hard work is left to further passes like ConstProp.
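To make the idea concrete, here is a rough sketch of the transformation on a toy control-flow graph. Everything in it (`Block`, `Statement`, `Terminator`, `seems_const`, `separate_const_switch`) is a hypothetical simplification written for illustration and not the code in this PR: it only looks at predecessors that end in a plain goto, and its heuristic only checks whether the last write to the switched local in that predecessor is a constant.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Terminator {
    Goto(usize),
    /// Switch on a local: (value, target) pairs plus a fallback target.
    Switch { local: usize, targets: Vec<(i64, usize)>, otherwise: usize },
    Return,
}

#[derive(Clone, Debug, PartialEq)]
enum Statement {
    AssignConst { local: usize, value: i64 },
    /// Assignment whose value is not known statically.
    AssignOpaque { local: usize },
}

#[derive(Clone, Debug, PartialEq)]
struct Block {
    statements: Vec<Statement>,
    terminator: Terminator,
}

/// Toy heuristic: the last write to `local` in this block is a constant.
fn seems_const(block: &Block, local: usize) -> bool {
    for stmt in block.statements.iter().rev() {
        match stmt {
            Statement::AssignConst { local: l, .. } if *l == local => return true,
            Statement::AssignOpaque { local: l } if *l == local => return false,
            _ => {}
        }
    }
    false
}

/// Copies each switch block once per goto-predecessor that seems to assign
/// a constant to the switched local. Returns the number of copies made.
fn separate_const_switch(blocks: &mut Vec<Block>) -> usize {
    // Collect the edges to split without mutating anything.
    let mut new_edges: Vec<(usize, usize)> = Vec::new();
    for (switch_id, block) in blocks.iter().enumerate() {
        let local = match &block.terminator {
            Terminator::Switch { local, .. } => *local,
            _ => continue,
        };
        for (pred_id, pred) in blocks.iter().enumerate() {
            let jumps_here = matches!(pred.terminator, Terminator::Goto(t) if t == switch_id);
            if jumps_here && seems_const(pred, local) {
                new_edges.push((pred_id, switch_id));
            }
        }
    }
    // Duplicate the switch block and retarget the predecessor to its private copy.
    for &(pred_id, switch_id) in &new_edges {
        let copy_id = blocks.len();
        let copy = blocks[switch_id].clone();
        blocks.push(copy);
        blocks[pred_id].terminator = Terminator::Goto(copy_id);
    }
    new_edges.len()
}

fn main() {
    let switch = Terminator::Switch { local: 0, targets: vec![(1, 3)], otherwise: 4 };
    let mut blocks = vec![
        Block { statements: vec![Statement::AssignConst { local: 0, value: 1 }], terminator: Terminator::Goto(2) },
        Block { statements: vec![Statement::AssignOpaque { local: 0 }], terminator: Terminator::Goto(2) },
        Block { statements: vec![], terminator: switch },
        Block { statements: vec![], terminator: Terminator::Return },
        Block { statements: vec![], terminator: Terminator::Return },
    ];
    let copies = separate_const_switch(&mut blocks);
    println!("copies: {copies}, blocks: {}", blocks.len());
}
```

In this toy, block 0 ends up jumping to its own copy of the switch, where a later constant-propagation step could in principle resolve the branch, while block 1 keeps the original. A needless copy costs only code size, which matches the correctness argument above.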
LLVM blocker
Unfortunately, I believe it would be unwise to enable this optimization by default for now. Indeed, currently switch lowering passes like SimplifyCFG in LLVM lose the information on the set of possible variant values, which means it tends to actually generate worse code with this optimization enabled. A fix would have to be done in LLVM itself. This is something I also want to look into. I have opened a bug report at the LLVM bug tracker.
When this is done, I hope we can enable this pass by default. It should be fairly fast and I think it is beneficial in many cases. Notably, it should be a sound alternative to simplify-arm-identity. By the way, ConstProp only seems to pick up the optimization in functions that are not generic. This is however most likely an issue in ConstProp that I will look into afterwards.
This is my first contribution to rustc, and I would like to thank everyone on the Zulip mir-opt chat for the help and support, and especially @scottmcm for the guidance.