Description
Bug report
Bug description:
The popular `greenlet` package implements cooperative multitasking by moving parts of the C stack around. The active greenlet has all of its stack in the expected place, but a suspended greenlet might have spilled part of its stack to the heap in order to allow the active greenlet to use the same region of stack. For many years, this has worked fine in practice because storage on the stack is generally not reachable from Python objects on the heap. The introduction with #96319 of interpreter frames stored on the C stack broke this assumption; under Python 3.12 and later, if you can get ahold of a frame object from a suspended greenlet, you can crash the interpreter by following `f_back` links until you reach one that would traverse an entry frame. A workaround was added in python-greenlet/greenlet@40646dc, but it only protects the innermost greenlet frame (which is the easiest one to access, since greenlets provide a `gr_frame` attribute to retrieve it), severing its link with the rest of the greenlet stack when the greenlet is suspended. This hampers the ability to understand what a suspended greenlet is doing, and it doesn't even completely resolve the crash because there are other ways to obtain a non-innermost greenlet frame.
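For readers unfamiliar with frame introspection, here is a minimal sketch of what "following `f_back` links" means. It uses only the standard library and ordinary function calls, so it does not involve greenlet and does not reproduce the crash; the helper name `walk_back` is made up for illustration. The problem described above arises when the same kind of walk starts from a frame that belongs to a suspended greenlet and has to pass an entry frame whose C stack storage has been moved.

```python
import sys


def walk_back(frame):
    """Collect function names by following f_back from `frame` to the outermost frame."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        # Each step asks the interpreter for the calling frame. On 3.12+ this
        # step may need to look past an entry frame that lives on the C stack.
        frame = frame.f_back
    return names


def inner():
    # sys._getframe() with no argument returns the frame of the current function.
    return walk_back(sys._getframe())


def outer():
    return inner()


names = outer()
print(names[:2])
```

The first two names are `inner` and `outer`; whatever follows depends on how the script itself was invoked.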
I filed python-greenlet/greenlet#388 against greenlet to discuss ways greenlet could work around the C-stack-based interpreter frames. None of the options are really palatable; they all involve taking new dependencies on CPython internals, as well as some tradeoff between unsoundness (exposing frame objects whose `f_back` attribute will crash the interpreter when accessed) and poor performance (needing to walk the stack on every greenlet suspend/resume). I'm wondering if there's anything that could be done on the CPython side to better support this use case.
The easiest solution from greenlet's perspective would be to just not store interpreter frames on the C stack. It appears likely feasible to store the entry frames on the per-thread frame stack instead; to maintain stack discipline, the entry frame for evaluating an owned-by-thread frame would need to be allocated before the owned-by-thread frame, but that doesn't look like a blocker (in fact both could be allocated simultaneously). Another option would be to use a single static interpreter frame object for all entry frames, and to store their `previous` pointers (the only portion that definitely needs to be variable from one entry frame to the next) on a new per-thread stack. Since entry frames return using a different bytecode instruction than non-entry frames, this wouldn't introduce additional branching in the eval loop, only in frame introspection (the `f_back` getter, etc.).
Another category of potential solution would still keep entry frames on the C stack, but would store enough information in the interpreter frame object under evaluation that it would be able to skip its entry-frame parent without accessing any portion of it. The easiest approach here would be to add a new `previous_heap` pointer (name for discussion purposes only) which is like `previous` but skips entry frames; but that's increasing the size of the interpreter frame structure, which might not be acceptable. If taking that size bump is OK then the rest of the solution is trivial; just make `f_back` follow `previous_heap` instead of `previous`.
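To make the `previous_heap` idea concrete, here is a toy Python model of it. This is not CPython code: the `Frame` class and the `push` and `f_back_chain` helpers are all hypothetical, and real interpreter frames are C structs. The sketch only illustrates the invariant being proposed, namely that each frame records its nearest non-entry ancestor at push time, so a later walk never has to read an entry frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    name: str
    is_entry: bool = False
    previous: Optional["Frame"] = None       # immediate parent, possibly an entry frame
    previous_heap: Optional["Frame"] = None  # nearest non-entry ancestor


def push(parent, name, is_entry=False):
    """Create a frame on top of `parent`, computing previous_heap while the parent is still live."""
    if parent is None:
        heap_parent = None
    elif parent.is_entry:
        heap_parent = parent.previous_heap  # skip over the entry frame
    else:
        heap_parent = parent
    return Frame(name, is_entry, parent, heap_parent)


def f_back_chain(frame):
    """Walk outward using only previous_heap, never reading an entry frame."""
    names = []
    frame = frame.previous_heap
    while frame is not None:
        names.append(frame.name)
        frame = frame.previous_heap
    return names


main = push(None, "main")
entry1 = push(main, "<entry>", is_entry=True)
run = push(entry1, "run")
entry2 = push(run, "<entry>", is_entry=True)
callback = push(entry2, "callback")

# Simulate the entry frames' stack storage being clobbered after a switch.
for entry in (entry1, entry2):
    entry.previous = entry.previous_heap = None

print(f_back_chain(callback))
```

Even with both entry frames wiped, the walk from `callback` still reaches `run` and then `main`, because neither step reads anything stored in an entry frame. The cost, as noted above, is one extra pointer per frame.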
Maybe someone who's more familiar with interpreter internals than I am can come up with an option that's better than any of these. But it would be really useful for `greenlet` if we could somehow eliminate the recently-introduced requirement to access the C stack in the course of walking the Python stack. Thanks for your consideration.
CPython versions tested on:
3.12, 3.13, CPython main branch
Operating systems tested on:
Linux