gh-104635: Eliminate redundant STORE_FAST instructions in the compiler #105040

corona10 · 2023-05-28T13:43:15Z

Issue: dead store elimination in the compiler #104635

import pyperf

# Time a tuple unpacking whose first three targets are throwaway `_` stores.
runner = pyperf.Runner()
runner.timeit(name="bench dead_store",
              stmt="""
_, _, _, a = 1, 2, 3, 4
""")


Mean +- std dev: [base] 10.8 ns +- 0.2 ns -> [super] 10.1 ns +- 0.1 ns: 1.07x faster
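The idea named in the PR title can be illustrated on a toy instruction list. When a STORE_FAST to a local is immediately followed by another STORE_FAST to the same local, the first stored value can never be read, so that store can become a POP_TOP (the value still has to come off the stack). This is a minimal sketch under assumed names: the `(opname, arg)` tuple representation and `eliminate_dead_stores` are hypothetical and are not the code in Python/flowgraph.c. It also ignores the same-line-number restriction that appears in the commit list.

```python
from typing import List, Tuple

# Hypothetical toy instruction: (opname, oparg). Not CPython's real cfg_instr.
Instr = Tuple[str, object]


def eliminate_dead_stores(block: List[Instr]) -> List[Instr]:
    """Replace a STORE_FAST that is immediately overwritten by another
    STORE_FAST to the same local with POP_TOP.

    Both instructions pop exactly one value, so the stack stays balanced.
    """
    out = list(block)
    for i in range(len(out) - 1):
        op, arg = out[i]
        next_op, next_arg = out[i + 1]
        if op == "STORE_FAST" and next_op == "STORE_FAST" and arg == next_arg:
            out[i] = ("POP_TOP", None)
    return out


# Illustrative store sequence for `_, _, _, a = ...` (not exact CPython output).
block = [
    ("STORE_FAST", "_"),
    ("STORE_FAST", "_"),
    ("STORE_FAST", "_"),
    ("STORE_FAST", "a"),
]
print(eliminate_dead_stores(block))
```

Only the last store to `_` survives, because it is followed by a store to a different local.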

carljm

This seems worth doing if the reported 1% win from pyperformance is real. We can also do a bit better here by allowing this optimization to help apply_static_swaps.

I'm not sure whether there are reasons to avoid adding new superinstructions; I'd like feedback from @markshannon on that.

Lib/test/test_peepholer.py

Python/flowgraph.c

Python/instrumentation.c

carljm

This looks reasonable to me! Seems ok to do the swap thing either in this PR or separately. Interested in any feedback from @iritkatriel / @markshannon / @brandtbucher.

Lib/test/test_peepholer.py

Misc/NEWS.d/next/Core and Builtins/2023-05-31-16-53-17.gh-issue-104635.iwf0d8.rst

Lib/test/test_peepholer.py

carljm · 2023-05-31T19:02:26Z

We might also want to expand this so it can look more than one instruction ahead for another write to the same location, skipping over an allowlist of instructions that we know don't read that same local, don't execute arbitrary code, and can't raise an exception. E.g. LOAD_FAST of a different local, POP_TOP, NOP, probably others.
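A minimal sketch of that lookahead, using the same hypothetical `(opname, arg)` tuples: scan forward from a STORE_FAST, skip allowlisted instructions that touch no local or a different local, and treat the store as dead only if the same local is written again before the end of the block. The allowlist below (`NOP`, `POP_TOP`, and `LOAD_FAST`/`STORE_FAST` of a different local) is an assumption; skipping a STORE_FAST of another local is what would let a pattern like `_, _, a, _ = foo` be optimized.

```python
from typing import List, Tuple

# Hypothetical toy instruction: (opname, oparg). Not CPython's real cfg_instr.
Instr = Tuple[str, object]

# Assumed allowlist: ops that never run arbitrary code and cannot raise.
# LOAD_FAST / STORE_FAST are only skippable when they name a *different*
# local; that check happens in store_is_dead().
SKIPPABLE = {"NOP", "POP_TOP", "LOAD_FAST", "STORE_FAST"}


def store_is_dead(block: List[Instr], i: int) -> bool:
    """True if the STORE_FAST at block[i] is overwritten before any read."""
    _, name = block[i]
    for op, arg in block[i + 1:]:
        if op == "STORE_FAST" and arg == name:
            return True   # same local written again: old value never read
        if op not in SKIPPABLE or arg == name:
            return False  # op not on the allowlist, or a LOAD_FAST of this local
    return False          # end of block: a successor block may read it


def eliminate_dead_stores_lookahead(block: List[Instr]) -> List[Instr]:
    """Turn every provably dead STORE_FAST into POP_TOP (same stack effect)."""
    return [
        ("POP_TOP", None) if op == "STORE_FAST" and store_is_dead(block, i) else (op, arg)
        for i, (op, arg) in enumerate(block)
    ]


# Illustrative store sequence for `_, _, a, _ = foo` (not exact CPython output).
block = [("STORE_FAST", "_"), ("STORE_FAST", "_"),
         ("STORE_FAST", "a"), ("STORE_FAST", "_")]
print(eliminate_dead_stores_lookahead(block))
```

Reaching the end of the block is treated conservatively as "maybe live", since this sketch does no cross-block liveness analysis.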

corona10 · 2023-06-01T05:08:37Z

Hi @mdboom, can we run the pyperformance benchmark suite for this PR?

corona10 · 2023-06-01T09:29:56Z

We might also want to expand this so it can look more than one instruction ahead for another write to the same location, skipping over an allowlist of instructions that we know don't read that same local, don't execute arbitrary code, and can't raise an exception.

Yeah, great idea.
Currently, we cannot optimize the following case.

_, _, a, _ = foo

But if we expand the optimization, we can do it.
Do we have to expand the optimization in this PR or handle it on a separate PR?


carljm

Looks good to me, presuming pyperformance comes back looking neutral or better.

Do we have to expand the optimization in this PR or handle it on a separate PR?

Either way seems fine to me.

carljm · 2023-06-01T23:06:35Z

Lib/test/test_peepholer.py

+            ('NOP', 0, 3),
+            ('POP_TOP', 0, 4),


This is a cool benefit I didn't realize we'd also get! LOAD_*/POP_TOP pairs can be eliminated entirely.

Ideally this would just become LOAD_CONST NOP NOP STORE_FAST though, right?

Yes; I think that's an orthogonal improvement to the LOAD/POP elimination optimization, allowing it to ignore intervening NOP. Or maybe it has to do rather with making it multi-pass? Or both? I haven't checked.
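A rough sketch of LOAD/POP_TOP elimination that looks back past intervening NOPs, again on hypothetical `(opname, arg)` tuples. Only LOAD_CONST and LOAD_FAST are treated as removable here, on the assumption that they cannot run arbitrary code or raise; other LOAD_* instructions (for example LOAD_GLOBAL or LOAD_ATTR) can, so they are left alone.

```python
from typing import List, Tuple

# Hypothetical toy instruction: (opname, oparg).
Instr = Tuple[str, object]

# Assumed side-effect-free loads that cannot raise.
PURE_LOADS = {"LOAD_CONST", "LOAD_FAST"}


def remove_load_pop_pairs(block: List[Instr]) -> List[Instr]:
    """Replace a pure load and the POP_TOP that discards it with NOPs,
    looking back past any NOPs in between."""
    out = list(block)
    for i in range(len(out)):
        if out[i][0] != "POP_TOP":
            continue
        j = i - 1
        while j >= 0 and out[j][0] == "NOP":
            j -= 1  # NOPs have no stack effect, so skip over them
        if j >= 0 and out[j][0] in PURE_LOADS:
            out[j] = ("NOP", None)
            out[i] = ("NOP", None)
    return out


block = [("LOAD_CONST", 1), ("LOAD_CONST", 2), ("POP_TOP", None),
         ("POP_TOP", None), ("LOAD_CONST", 3), ("STORE_FAST", "a")]
print(remove_load_pop_pairs(block))
```

The outer `LOAD_CONST 1` / second `POP_TOP` pair is only separated by NOPs after the inner pair has been removed, which is why skipping NOPs lets one pass clean up both.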

Maybe we can run the optimization loop until no more opcodes change.
(I haven't tested this yet.)
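A sketch of that fixed-point idea, using two deliberately simple toy passes that only match adjacent instructions (all names here are hypothetical): each round runs every pass in order, and the loop stops once a full round changes nothing. With the pass order used below, one round is not enough to reach `LOAD_CONST NOP NOP STORE_FAST`; the second round removes the LOAD_CONST/POP_TOP pair that the first round created.

```python
from typing import Callable, List, Tuple

# Hypothetical toy instruction: (opname, oparg).
Instr = Tuple[str, object]
Pass = Callable[[List[Instr]], List[Instr]]


def drop_adjacent_load_pop(block: List[Instr]) -> List[Instr]:
    """Toy pass: LOAD_CONST immediately followed by POP_TOP -> NOP, NOP."""
    out = list(block)
    for i in range(len(out) - 1):
        if out[i][0] == "LOAD_CONST" and out[i + 1][0] == "POP_TOP":
            out[i] = out[i + 1] = ("NOP", None)
    return out


def drop_adjacent_dead_store(block: List[Instr]) -> List[Instr]:
    """Toy pass: STORE_FAST x immediately followed by STORE_FAST x -> POP_TOP."""
    out = list(block)
    for i in range(len(out) - 1):
        if out[i][0] == out[i + 1][0] == "STORE_FAST" and out[i][1] == out[i + 1][1]:
            out[i] = ("POP_TOP", None)
    return out


def optimize_to_fixed_point(block: List[Instr], passes: List[Pass],
                            max_rounds: int = 100) -> List[Instr]:
    """Re-run every pass until a full round leaves the block unchanged."""
    for _ in range(max_rounds):
        new = block
        for p in passes:
            new = p(new)
        if new == block:
            break
        block = new
    return block


block = [("LOAD_CONST", 1), ("LOAD_CONST", 2),
         ("STORE_FAST", "x"), ("STORE_FAST", "x")]
passes = [drop_adjacent_load_pop, drop_adjacent_dead_store]
print(optimize_to_fixed_point(block, passes))
```

The `max_rounds` cap is a safety net for passes that might not converge; the toy passes here only ever replace instructions with NOPs or POP_TOPs, so they settle quickly.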

corona10 · 2023-06-02T16:43:08Z

@carljm

Looks good to me, presuming pyperformance comes back looking neutral or better.

I remeasured the benchmark on my bare-metal machine.
As you expected, it shows 1.0x faster (neutral) with the latest pyperformance, 1.0.8.

I would like to merge this PR if possible, but it looks like we have to wait for Mark's task.
And if runtime superinstructions are removed, we should reconsider adding POP_TOP__POP_TOP and POP_TOP__STORE_FAST.
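For reference, a compile-time superinstruction pass in its simplest form just fuses adjacent opcode pairs. The sketch below is a toy that collapses each matching pair into one entry named after the opcodes proposed in this PR; it does not model how CPython actually encodes or dispatches superinstructions, and `fuse_superinstructions` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction: (opname, oparg).
Instr = Tuple[str, object]

# Pair -> fused name, using the names proposed in this PR.
SUPERINSTRUCTIONS: Dict[Tuple[str, str], str] = {
    ("POP_TOP", "POP_TOP"): "POP_TOP__POP_TOP",
    ("POP_TOP", "STORE_FAST"): "POP_TOP__STORE_FAST",
}


def fuse_superinstructions(block: List[Instr]) -> List[Instr]:
    """Greedily fuse adjacent pairs, left to right."""
    out: List[Instr] = []
    i = 0
    while i < len(block):
        if i + 1 < len(block):
            fused = SUPERINSTRUCTIONS.get((block[i][0], block[i + 1][0]))
            if fused is not None:
                # POP_TOP has no oparg, so keep the second instruction's oparg.
                out.append((fused, block[i + 1][1]))
                i += 2
                continue
        out.append(block[i])
        i += 1
    return out


block = [("POP_TOP", None), ("POP_TOP", None),
         ("POP_TOP", None), ("STORE_FAST", "a")]
print(fuse_superinstructions(block))
```

Because matching is greedy, how a run of POP_TOPs gets paired depends on where the run starts.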

The best option would be to improve this PR with further optimizations that don't require adding superinstructions or new opcodes that combine multiple opcodes.

Do you have any ideas or suggestions?

carljm · 2023-06-02T20:38:15Z

Do you have any ideas or suggestions?

I think we should wait for #105230 to land, given the motivations outlined in #105229. Then we should re-measure the performance impact of this. I think it would be good to do a stats run of pyperformance with this change and see how often POP_TOP is followed by another POP_TOP or by STORE_FAST, in order to help inform whether we should add POP_TOP_POP_TOP and POP_TOP_STORE_FAST superinstructions (in the compiler).
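A crude static approximation of that measurement can be sketched with the standard `dis` module: count adjacent opcode pairs in a function's compiled bytecode. This only counts pairs as they appear in a single code object (nested functions are not visited), whereas a pyperformance stats run counts executed instructions, so the numbers are not comparable. `count_pairs`, `static_pairs`, and `sample` are hypothetical helpers.

```python
import dis
from collections import Counter
from typing import Iterable


def count_pairs(opnames: Iterable[str]) -> Counter:
    """Count adjacent (first, second) opcode-name pairs."""
    names = list(opnames)
    return Counter(zip(names, names[1:]))


def static_pairs(func) -> Counter:
    """Static pair counts for one function's bytecode (not execution counts)."""
    return count_pairs(ins.opname for ins in dis.get_instructions(func))


def sample():
    _, _, _, a = 1, 2, 3, 4
    return a


if __name__ == "__main__":
    pairs = static_pairs(sample)
    for key in [("POP_TOP", "POP_TOP"), ("POP_TOP", "STORE_FAST")]:
        print(key, pairs[key])
```

The exact counts depend on the CPython version that compiles `sample`, so no output is shown here.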

corona10 · 2023-06-04T11:28:12Z

I think it would be good to do a stats run of pyperformance with this change and see how often POP_TOP is followed by another POP_TOP or by STORE_FAST,

When I ran pyperformance, both pairs accounted for a very low share of executed instructions:

POP_TOP__POP_TOP: 4,846,500 counts == 0.0%
POP_TOP__STORE_FAST: 19,918,740 counts == 0.0%
https://gist.github.com/corona10/810255492523f2fa802396ca1deac73f

corona10 · 2023-06-05T14:39:38Z

Replaced by #105320

bedevere-bot mentioned this pull request May 28, 2023

dead store elimination in the compiler #104635

Open

corona10 added the skip news label May 28, 2023

corona10 changed the title ~~gh-104635: Naive dead store elimination in cfg opt~~ gh-104635: Naive dead store elimination in cfg opt (experiment) May 28, 2023

carljm reviewed May 30, 2023


Lib/test/test_peepholer.py (outdated)

Python/flowgraph.c

Python/instrumentation.c

corona10 force-pushed the dead_store branch 2 times, most recently from 103c2c7 to a9f902c May 31, 2023 07:45

corona10 removed the skip news label May 31, 2023

corona10 requested a review from carljm May 31, 2023 09:23

corona10 marked this pull request as ready for review May 31, 2023 09:24

corona10 requested review from brandtbucher and markshannon as code owners May 31, 2023 09:24

bedevere-bot added the awaiting core review label May 31, 2023

corona10 changed the title ~~gh-104635: Naive dead store elimination in cfg opt (experiment)~~ gh-104635: Naive dead store elimination in cfg opt May 31, 2023

corona10 changed the title ~~gh-104635: Naive dead store elimination in cfg opt~~ gh-104635: Naive dead store elimination in basic block optimization May 31, 2023

corona10 force-pushed the dead_store branch from 8604e56 to 030b989 May 31, 2023 16:21

carljm reviewed May 31, 2023


corona10 changed the title ~~gh-104635: Naive dead store elimination in basic block optimization~~ gh-104635: Eliminate redundant :STORE_FAST instructions in the compiler Jun 1, 2023

corona10 changed the title ~~gh-104635: Eliminate redundant :STORE_FAST instructions in the compiler~~ gh-104635: Eliminate redundant STORE_FAST instructions in the compiler Jun 1, 2023

corona10 requested review from carljm and iritkatriel June 1, 2023 06:44

corona10 added 7 commits June 1, 2023 22:55

Implement simple dead store elimination in cfg opt

7b5afb0

Implement POP_TOP__STORE_FAST super instruction

2a886ba

Only when same line number

553b361

remove white space

0b601dc

Implement POP_TOP__POP_TOP

9039dfe

Address code review

067fc53

Add NEWS.d

827fd69

corona10 and others added 6 commits June 1, 2023 22:58

Add test code

99a2cea

fix

69c1aae

Update Misc/NEWS.d/next/Core and Builtins/2023-05-31-16-53-17.gh-issu…

c119a6f

…e-104635.iwf0d8.rst Co-authored-by: Carl Meyer <carl@oddbird.net>

Address code review

6d159bb

Address code review

ba46ff3

Move apply_static_swap to the last phase of bb optimization

b0e0a06

corona10 force-pushed the dead_store branch from 70c1a9e to b0e0a06 June 1, 2023 14:01

carljm approved these changes Jun 1, 2023


bedevere-bot added awaiting merge and removed awaiting core review labels Jun 1, 2023

corona10 closed this Jun 5, 2023

corona10 mentioned this pull request Jul 9, 2023

gh-104635: Expand optimization for dead store elimination #106571

Closed

2 tasks


corona10 commented May 28, 2023 (edited)

carljm left a comment

carljm left a comment

carljm commented May 31, 2023 (edited)

corona10 commented Jun 1, 2023

corona10 commented Jun 1, 2023 (edited)

carljm left a comment

carljm Jun 1, 2023

brandtbucher Jun 2, 2023

carljm Jun 2, 2023 (edited)

corona10 Jun 3, 2023 (edited)

corona10 commented Jun 2, 2023 (edited)

carljm commented Jun 2, 2023

corona10 commented Jun 4, 2023

corona10 commented Jun 5, 2023
