gh-104635: Eliminate redundant STORE_FAST instructions in the compiler #105040
corona10 commented May 28, 2023
- Issue: dead store elimination in the compiler #104635
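The core rewrite can be illustrated on a toy instruction list. This is a minimal sketch that assumes instructions are plain (opname, oparg) tuples; it is not CPython's flowgraph code, and the names below are hypothetical. When a STORE_FAST is immediately followed by another STORE_FAST to the same local, the first store is dead. It cannot simply be deleted, because its value still has to leave the stack, so it becomes a POP_TOP.

```python
from typing import List, Optional, Tuple

# Hypothetical toy representation: (opname, oparg), not CPython's real instruction struct.
Instr = Tuple[str, Optional[int]]


def eliminate_adjacent_dead_stores(code: List[Instr]) -> List[Instr]:
    """Rewrite STORE_FAST x; STORE_FAST x into POP_TOP; STORE_FAST x."""
    out = list(code)
    for i in range(len(out) - 1):
        op, arg = out[i]
        next_op, next_arg = out[i + 1]
        if op == "STORE_FAST" and next_op == "STORE_FAST" and arg == next_arg:
            # The stored value is overwritten before it can be read,
            # but it must still be popped off the value stack.
            out[i] = ("POP_TOP", None)
    return out


# Hand-written sequence shaped like `x = x = 1` inside a function
# (roughly LOAD_CONST, COPY 1, STORE_FAST x, STORE_FAST x); local 0 is x.
before = [("LOAD_CONST", 0), ("COPY", 1), ("STORE_FAST", 0), ("STORE_FAST", 0)]
print(eliminate_adjacent_dead_stores(before))
```

POP_TOP and STORE_FAST both pop exactly one value, so the stack depth at every point is unchanged by the rewrite.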
This seems worth doing if the reported 1% win from pyperformance is real. We can also do a bit better here by allowing this optimization to help apply_static_swaps.
I'm not sure whether there are reasons to avoid new super-instructions; I'd like feedback from @markshannon on that.
Branch updated from 103c2c7 to a9f902c.
This looks reasonable to me! Seems ok to do the swap thing either in this PR or separately. Interested in any feedback from @iritkatriel / @markshannon / @brandtbucher.
Misc/NEWS.d/next/Core and Builtins/2023-05-31-16-53-17.gh-issue-104635.iwf0d8.rst
We might also want to expand this so it can look more than one instruction ahead for another write to the same location, skipping over an allowlist of instructions that we know don't read that same local, don't execute arbitrary code, and can't raise an exception. E.g.
@mdboom Hi, can we run the pyperformance benchmarks for this PR?
Yeah, great idea. A case like _, _, a, _ = foo is not fully handled yet, but if we expand the optimization, we can do it.
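The look-ahead idea can be sketched as follows, again on hypothetical (opname, oparg) tuples rather than CPython's real data structures. The allowlist in `is_transparent` is an assumption for illustration: it only admits NOP and STORE_FAST to a different local. Whether such a store truly qualifies (overwriting a local can run a finalizer) is exactly the kind of question a real allowlist would have to settle; the sketch simply assumes it does. Inside a function, `_, _, a, _ = foo` unpacks and then stores to `_`, `_`, `a`, `_`, so the second store to `_` is dead but not adjacent to the next one.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # hypothetical (opname, oparg) pair


def is_transparent(instr: Instr, local: int) -> bool:
    """Hypothetical allowlist: instructions assumed not to read `local`,
    run arbitrary code, or raise."""
    op, arg = instr
    if op == "NOP":
        return True
    # Assumption: a store to a *different* local is safe to skip over
    # (this ignores finalizers triggered by overwriting that local).
    return op == "STORE_FAST" and arg != local


def eliminate_dead_stores_lookahead(code: List[Instr]) -> List[Instr]:
    out = list(code)
    for i, (op, arg) in enumerate(out):
        if op != "STORE_FAST":
            continue
        # Skip forward over instructions that cannot observe this local.
        j = i + 1
        while j < len(out) and is_transparent(out[j], arg):
            j += 1
        if j < len(out) and out[j] == ("STORE_FAST", arg):
            # Overwritten before any possible read: keep the pop, drop the store.
            out[i] = ("POP_TOP", None)
    return out


# Hand-written shape of `_, _, a, _ = foo` in a function where
# foo is local 0, `_` is local 1, and `a` is local 2.
code = [
    ("LOAD_FAST", 0),
    ("UNPACK_SEQUENCE", 4),
    ("STORE_FAST", 1),
    ("STORE_FAST", 1),
    ("STORE_FAST", 2),
    ("STORE_FAST", 1),
]
print(eliminate_dead_stores_lookahead(code))
```

An adjacent-only rule would rewrite just the first store to `_`; looking past STORE_FAST to `a` also catches the second one.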
…e-104635.iwf0d8.rst Co-authored-by: Carl Meyer <carl@oddbird.net>
Looks good to me, presuming pyperformance comes back looking neutral or better.
Do we have to expand the optimization in this PR, or should we handle it in a separate PR?
Either way seems fine to me.
('NOP', 0, 3),
('POP_TOP', 0, 4),
This is a cool benefit I didn't realize we'd also get! LOAD_*/POP_TOP pairs can be eliminated entirely.
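The LOAD_*/POP_TOP observation can be sketched as a second peephole rule on the same kind of hypothetical (opname, oparg) tuples: a push with no side effects that is immediately popped does nothing, so both instructions can become NOPs. The set of pushes treated as removable here (LOAD_CONST, LOAD_FAST, COPY) is an assumption for illustration, not a list taken from the PR.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # hypothetical (opname, oparg) pair

# Assumption: pushes that have no side effects and cannot raise.
PURE_PUSHES = {"LOAD_CONST", "LOAD_FAST", "COPY"}


def cancel_push_pop(code: List[Instr]) -> List[Instr]:
    """Replace a side-effect-free push followed by POP_TOP with NOP, NOP."""
    out = list(code)
    i = 0
    while i < len(out) - 1:
        if out[i][0] in PURE_PUSHES and out[i + 1][0] == "POP_TOP":
            out[i] = ("NOP", None)
            out[i + 1] = ("NOP", None)
            i += 2  # both instructions of the pair are consumed
        else:
            i += 1
    return out


print(cancel_push_pop([("LOAD_FAST", 0), ("POP_TOP", None)]))
```

Instructions that may run arbitrary code, such as a call, are left alone because their result being discarded does not make them removable.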
Ideally this would just become LOAD_CONST NOP NOP STORE_FAST, though, right?
Yes; I think that's an orthogonal improvement to the LOAD/POP elimination optimization, allowing it to ignore intervening NOPs. Or maybe it has more to do with making it multi-pass? Or both? I haven't checked.
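Letting the LOAD/POP rule ignore intervening NOPs could look like the sketch below: instead of checking the immediately following instruction, it looks for the next non-NOP one. This assumes the hypothetical (opname, oparg) tuples and pure-push set used for illustration, and it ignores the line-number bookkeeping that real NOPs may carry.

```python
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # hypothetical (opname, oparg) pair

# Assumption: pushes that have no side effects and cannot raise.
PURE_PUSHES = {"LOAD_CONST", "LOAD_FAST", "COPY"}


def next_non_nop(code: List[Instr], i: int) -> int:
    """Index of the first non-NOP instruction after position i, or len(code)."""
    j = i + 1
    while j < len(code) and code[j][0] == "NOP":
        j += 1
    return j


def cancel_push_pop_skipping_nops(code: List[Instr]) -> List[Instr]:
    """Cancel a pure push and a POP_TOP even when NOPs sit between them."""
    out = list(code)
    for i in range(len(out)):
        if out[i][0] in PURE_PUSHES:
            j = next_non_nop(out, i)
            if j < len(out) and out[j][0] == "POP_TOP":
                out[i] = ("NOP", None)
                out[j] = ("NOP", None)
    return out


print(cancel_push_pop_skipping_nops([("LOAD_CONST", 0), ("NOP", None), ("POP_TOP", None)]))
```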
Maybe we can rerun the optimization loop until no more opcodes change (I haven't tested this yet).
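Rerunning the passes until nothing changes can be sketched as below, assuming each pass is a pure function over hypothetical (opname, oparg) tuples. The hypothetical `one_forward_pass` applies both the push/POP_TOP rule and the STORE_FAST rule in a single left-to-right scan. On the `x = x = 1` shape, the scan visits COPY before the following STORE_FAST has been turned into POP_TOP, so the pair is missed; a second iteration finds it and reaches LOAD_CONST NOP NOP STORE_FAST. This only illustrates how pass ordering can hide a reduction; it is not a reproduction of the PR's code.

```python
from typing import Callable, List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # hypothetical (opname, oparg) pair
Pass = Callable[[List[Instr]], List[Instr]]

# Assumption: pushes that have no side effects and cannot raise.
PURE_PUSHES = {"LOAD_CONST", "LOAD_FAST", "COPY"}


def one_forward_pass(code: List[Instr]) -> List[Instr]:
    """Apply both peephole rules in a single left-to-right scan."""
    out = list(code)
    for i in range(len(out) - 1):
        op, arg = out[i]
        next_op, next_arg = out[i + 1]
        if op in PURE_PUSHES and next_op == "POP_TOP":
            out[i] = out[i + 1] = ("NOP", None)
        elif op == "STORE_FAST" and next_op == "STORE_FAST" and arg == next_arg:
            out[i] = ("POP_TOP", None)
    return out


def optimize_until_fixpoint(code: List[Instr], pass_fn: Pass = one_forward_pass,
                            max_iters: int = 10) -> List[Instr]:
    """Rerun pass_fn until the instruction list stops changing."""
    for _ in range(max_iters):
        new_code = pass_fn(code)
        if new_code == code:
            break
        code = new_code
    return code


# Hand-written shape of `x = x = 1` in a function; local 0 is x.
before = [("LOAD_CONST", 0), ("COPY", 1), ("STORE_FAST", 0), ("STORE_FAST", 0)]
print(one_forward_pass(before))
print(optimize_until_fixpoint(before))
```

The `max_iters` cap is a defensive choice: in this toy, each rule only replaces instructions with POP_TOP or NOP, so the loop would terminate without it.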
I remeasured the benchmark on my bare-metal machine. I would like to merge this PR if possible, but it looks like we have to wait for Mark's task. The best option would be to improve this PR with more optimizations, without adding a super-instruction or a new single opcode that combines multiple opcodes. Do you have any ideas or suggestions?
I think we should wait for #105230 to land, given the motivations outlined in #105229. Then we should re-measure the performance impact of this. I think it would be good to do a stats run of pyperformance with this change and see how often
When I ran pyperformance, it showed a very low ratio of executions.
Replaced by #105320