Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tier2 peephole optimization: remove extraneous _CHECK_STACK_SPACE ops #116168

Closed
Tracked by #115419
lazorchakp opened this issue Feb 29, 2024 · 23 comments
Closed
Tracked by #115419

Tier2 peephole optimization: remove extraneous _CHECK_STACK_SPACE ops #116168

lazorchakp opened this issue Feb 29, 2024 · 23 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement

Comments

@lazorchakp
Copy link
Contributor

lazorchakp commented Feb 29, 2024

Implement a new peephole optimization for the tier2 optimizer that removes _CHECK_STACK_SPACE if we see that this check is already present

Linked PRs

@lazorchakp
Copy link
Contributor Author

cc @Fidget-Spinner

This might need some labels. I also looked around for a parent ticket, but didn't find anything particularly obvious

@Fidget-Spinner Fidget-Spinner added type-feature A feature request or enhancement performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 1, 2024
@Fidget-Spinner
Copy link
Member

I will link to #115419

@gvanrossum
Copy link
Member

I don't know anything about the plan here, so this may not be news. :-) Couldn't you check for the needed stack space once at the beginning of the executor? And if there's a loop, lift it out of there. (And if it fits within the stack space of the code object already, drop it altogether.)

@Fidget-Spinner
Copy link
Member

Makes sense. Let's go with consolidating them first then another follow-up PR for lifting. Lifting stuff out of loops requires _JUMP_ABSOLUTE and I can just point to the code from my old PR to copy. (Though the heuristics need someone else to work on it).

@gvanrossum
Copy link
Member

Is anyone working on this? Otherwise I'd like to give it a go.

@Fidget-Spinner
Copy link
Member

@gvanrossum I think @lazorchakp might be working on this, @lazorchakp do you mind if Guido takes over? If you have something with an implementation up but no tests that's okay, you can just submit a draft PR and Guido can probably take over if you don't mind.

@gvanrossum
Copy link
Member

I'd also be happy to mentor @lazorchakp !

@lazorchakp
Copy link
Contributor Author

Sorry everyone - I've been busier than anticipated and haven't made concrete progress on this yet. If it's alright though, I'd still like to try it, and I should be able to get a draft up tonight if that works. @Fidget-Spinner has given me some guidance over email, but I'd gladly accept as much mentoring as you both are willing to provide 🙂

That being said, @gvanrossum if you'd rather go for it yourself, that's quite alright! Just let me know.

@gvanrossum
Copy link
Member

No, I'd love to have you do it!

My guess is that the critical trick should be to change the uop that checks for stack space to take a number indicating the amount of stack space needed, rather than digging that value out of the code object (which must be retrieved from the function object, which must be dug out of the stack at position -2-oparg, IIRC).

Maybe this number can be put in its operand field by the initial trace projection pass (which needs the code object anyway). Then a later dedicated pass can combine stack checks as long as they are within the same "scope" (i.e., _PUSH_FRAME/_POP_FRAME pair).

@lazorchakp
Copy link
Contributor Author

That sounds great. It makes sense to me that we'd now need _CHECK_STACK_SPACE to know up-front how much space it's looking for. I'll see if I can take your suggestion and modify the uop so that this information is available for the optimization pass.

lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 21, 2024
@lazorchakp
Copy link
Contributor Author

I'm at the point where I have the operand for each _CHECK_STACK_SPACE being correctly assigned to its corresponding function's framesize (pushed a rough branch). Before I implement the actual optimization though, I want to make sure I'm reasoning about this correctly - I'm making some heavy assumptions about the underlying machinery. Are the following cases correct? (other ops removed for simplicity):


CASE 1 - sequential function calls:

_CHECK_STACK_SPACE A
_PUSH_FRAME
_POP_FRAME
_CHECK_STACK_SPACE B
_PUSH_FRAME
_POP_FRAME

proposed optimization:

_CHECK_STACK_SPACE max(A, B)
_PUSH_FRAME
_POP_FRAME
_PUSH_FRAME
_POP_FRAME

CASE 2 - nested function calls:

_CHECK_STACK_SPACE A
_PUSH_FRAME
_CHECK_STACK_SPACE B
_PUSH_FRAME
_POP_FRAME
_POP_FRAME

proposed optimization:

_CHECK_STACK_SPACE A + B
_PUSH_FRAME
_PUSH_FRAME
_POP_FRAME
_POP_FRAME

If these cases are correct, I should be able to combine everything into a single _CHECK_STACK_SPACE uop per trace, as @gvanrossum mentioned above.

@gvanrossum
Copy link
Member

Yes, both cases look right.

lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 22, 2024
@lazorchakp
Copy link
Contributor Author

@gvanrossum / @Fidget-Spinner

I have this mostly working, but I'm stuck on one critical piece: actually using the new _CHECK_STACK_SPACE operand (the amount of space to check) at execution time. I think I need to update the definition of _CHECK_STACK_SPACE in bytecodes.c. My fear is that I'll only have the operand in tier2 after trace projection, so tier1 and tier2 behavior would have to differ.

Does anyone have ideas about how I might be able to modify _CHECK_STACK_SPACE so that for tier2 (executor_cases.c.h...?) it will use the operand populated during trace projection but for tier1 (generated_cases.c.h...?) it will check the value on the stack? If the behavior must be identical, do I need to modify the bytecode cache in the specializer too? (It's entirely possible that these questions don't make sense or that I have some fundamental misunderstanding)

@Fidget-Spinner
Copy link
Member

@lazorchakp if the instruction will not fit the format in tier 1, you can add a new uop _CHECK_STACK_SPACE_COMBINED or something. Then manually swap out for that uop in the optimizer analysis.

@gvanrossum
Copy link
Member

Tier 1 should be unaffected. IIRC there are #ifdefs you can check for.

@gvanrossum
Copy link
Member

I don't know what the "eyes" reaction means. Do you need more info? If so please ask another question. :-)

@lazorchakp
Copy link
Contributor Author

Ah sorry, didn't want to generate too much noise - I'll be more clear in the future. I just restarted work on this for the evening and these comments are enough to unblock me for now. I won't hesitate to ask another question if I get stuck somewhere!

lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 23, 2024
@lazorchakp
Copy link
Contributor Author

I'm getting ready to open the PR here and am wrapping up some final tests. One of my new tests fails on main though; is opening a new issue appropriate? I haven't had time to investigate yet.

This is it, if anyone's interested (under test_opt.TestUopsOptimziation)

    def test_many_nested(self)
        # overflow the trace_stack
        def dummy_a(x):
            return x
        def dummy_b(x):
            return dummy_a(x)
        def dummy_c(x):
            return dummy_b(x)
        def dummy_d(x):
            return dummy_c(x)
        def dummy_e(x):
            return dummy_d(x)
        def dummy_f(x):
            return dummy_e(x)
        def dummy_g(x):
            return dummy_f(x)
        def dummy_h(x):
            return dummy_g(x)
        def testfunc(n):
            a = 0
            for _ in range(n):
                a += dummy_h(n)
            return a

        self._run_with_optimizer(testfunc, 32)

error:

Assertion failed: ((this_instr + 2)->opcode == _PUSH_FRAME), function optimize_uops, file optimizer_cases.c.h, line 1600.
Fatal Python error: Aborted

Current thread 0x000000020140a100 (most recent call first):
  File "..../cpython/Lib/test/test_capi/test_opt.py", line 978 in testfunc
  File "..../cpython/Lib/test/test_capi/test_opt.py", line 588 in _run_with_optimizer
  ...

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Mar 23, 2024

Seems like a bug. Please open an issue with the repro. I will fix it. Thanks!

@lazorchakp
Copy link
Contributor Author

@Fidget-Spinner I opened #117180. Thanks, and lmk if you need more details.

lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 25, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 25, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 26, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 26, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 26, 2024
@lazorchakp
Copy link
Contributor Author

Still have to iterate with the pipeline a bit in my draft PR. I'll move it to Open and send an update here when it's officially ready for review.

lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 28, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 28, 2024
@lazorchakp
Copy link
Contributor Author

Just opened #117242 for review

lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 29, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Mar 30, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Apr 2, 2024
lazorchakp added a commit to lazorchakp/cpython that referenced this issue Apr 3, 2024
gvanrossum pushed a commit that referenced this issue Apr 3, 2024
This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
@gvanrossum
Copy link
Member

This is done -- thanks again!

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
This merges all `_CHECK_STACK_SPACE` uops in a trace into a single `_CHECK_STACK_SPACE_OPERAND` uop that checks whether there is enough stack space for all calls included in the entire trace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants