Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-119866: Spill before escaping calls in all cases. #124392

Merged
merged 62 commits into from
Oct 7, 2024

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Sep 23, 2024

Also fixes #123391

This PR spills the stack pointer across escaping call, also writing the contents of the stack to memory where necessary.

How it works:

Adds a new Storage class which combines the state of the stack and the local variables within a micro-op.
Adds copying and merging functionality to that class, and the underlying stack and locals, to support tracking state across divergent flow.
Splits and merges the storage in if statements.
Tracks the state of conceptual stack and local variables, ensuring the necessary values are written to memory when needed.

The code generator needs to tell when inputs are dead, in order to known when the stack_pointer should be reduced when a call escapes. We can track explicit PyStackRef_CLOSE and DECREF_INPUTS, but many cases are implicit.
For these cases we add the DEAD macro to mark the variable as dead.

To simplify parsing, the code generator also enforces PEP 7 rules for braces.
A few of the changes in bytecodes.c are a result of this change.

Both interpreter performance and JIT performance show no slowdown.

Replaces #123397

@markshannon
Copy link
Member Author

I've addressed all the issues.

I've increased the number of functions that we treat as non-escaping, but not every function mentioned above.

I think it safer to do this than to try to mark everything that is non-escaping and risk making a mistake, or any of those functions changing in the future such that it does escape.

Since the cost of spilling is low, spilling around such maybe-non-escaping calls has no measurable performance impact.

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve this on the condition that usage of DEAD is properly documented in a follow-up PR. I don't think it's so clear when to use it correctly and when it might be misused.

@Fidget-Spinner
Copy link
Member

Alternatively, you can add the documentation in, since you need to fix the failing test cases anyways.

@markshannon
Copy link
Member Author

I will leave the documentation of DEAD to another PR, as ERROR_IF and other macros are not documented either.
It will be easier to do them all together.

self.assertListEqual(gc.get_referrers(exc), [])

asyncio.run(main())
self.assertListEqual(gc.get_referrers(exc), [main_coro])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are spilling the stack pointer, so the GC can see the locals of the generator.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to asyncio look fine to me, but maybe a comment (along the lines of what you explained to Kumar) would be helpful.

Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm far from an expert on the cases generator, but I didn't see any obvious issues there. A few notes:

  • It should be an error if somebody uses a name after it has been killed by either DEAD(...) or INPUTS_DEAD().
  • It should be an error to kill a name twice in the same path.
  • Can you remind me why DEAD(...)/INPUTS_DEAD() needs to be explicit in the code, and can't just be inferred from the location of the last use? I vaguely remember discussing this at the sprint, but I forget why.
  • Can we handle scalars-under-arrays in the analyzer, so we don't have these awkward length-one arrays?

I'm approving this PR because I think it probably needs to happen and I trust that this is the best way of doing this. But I think in general, we might need to take a step back and consider the huge amount of complexity the has accumulated in bytecodes.c, and the thousands of lines of code in the cases generator that process it. The DSL was originally introduced to reduce the boilerplate and complexity in the interpreter loop... I think it still does a good job of this, as evidenced by this PR. But the mental load when reading and editing this file has crept up incrementally with time, and I'm worried that rate is accelerating. It's beyond the scope of this issue, but we should probably consider if the DSL should be reworked to better support the needs of the cases generator.

@brandtbucher
Copy link
Member

Thanks for tackling this, by the way. It was a big project and it's really cool to see it working correctly (the newly-failing test was neat to see).

@markshannon
Copy link
Member Author

Can you remind me why DEAD(...)/INPUTS_DEAD() needs to be explicit in the code, and can't just be inferred from the location of the last use? I vaguely remember discussing this at the sprint, but I forget why.

Because variables hold references to objects the last use doesn't kill the variable. PyStackRef_CLOSE() closes the reference and kills the variable.
However, for calls that consume references and immortal objects, the code generator needs to be told that the reference is dead with DEAD

@markshannon
Copy link
Member Author

Can we handle scalars-under-arrays in the analyzer, so we don't have these awkward length-one arrays?

Theoretically yes, but mixing scalars and arrays is awkward. We want to be able to move scalars into registers, but having gaps in the in-memory stack makes things complicated.

@markshannon
Copy link
Member Author

#125046 for the other issues.

@markshannon markshannon merged commit da071fa into python:main Oct 7, 2024
62 of 63 checks passed
graingert added a commit to graingert/cpython that referenced this pull request Oct 14, 2024
1st1 pushed a commit that referenced this pull request Oct 14, 2024
…p and _PyFuture refcycles ... (#125486)

* Revert "gh-125472: Revert "gh-124958: fix asyncio.TaskGroup and _PyFuture refcycles (#12… (#125476)"

This reverts commit e99650b.

* fix incompatability with gh-124392
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interpreter code generators need to be able to handle basic flow control
6 participants