Keeping 'stack_pointer` in sync with frame->stacktop is possibly too tricky #94666

ericsnowcurrently · 2022-07-07T19:41:26Z

While considering the situation of gh-94215, I realized there is a broader concern worth discussing, of which that issue is a specific instance. FWIW, this isn't some massive, pervasive, nor high-priority problem that needs immediate attention. It is worth consideration at some point, though, as it's bitten us already and I expect it will bite us again.

Context:

In the eval loop (in ceval.c), we copy some interpreter/thread/frame state into local variables, to reduce overhead a little. Of these variables, several get modified during execution. One of them, stack_pointer, must be kept in sync with the frame->stacktop, which can be modified via _PyFrame_SetStackPointer(), _PyFrame_StackPop(), and _PyFrame_StackPush(). We pass stack_pointer in various calls (e.g. vectorcall) throughout the eval loop, so it is important that it stay in sync, at least at the points where we use it. (I covered this a bit more in the other issue.)

The Concern:

From where I'm at, it's hard to have a lot of confidence that we are preserving the invaraiant (that stack_pointer is properly in sync with the frame state) in all cases. There doesn't seem to be any structural mechanism by which we get guarantees (relative to use of _PyFrame_SetStackPointer(), etc.). Instead it seems like knowing when to synchronize stack_pointer, and when we can avoid unnecessary synchronization, currently requires deep knowledge of a lot of pieces.

Consequently, it seems like it is relatively easy to break the invariant accidentally. The fact that it happened to someone as knowledgeable as @markshannon really says a lot.

Am I out of touch here? If not, is it worth doing something about all that?

CC @iritkatriel @tiran @markshannon @pablogsal

The text was updated successfully, but these errors were encountered:

ericsnowcurrently · 2022-07-07T19:41:38Z

Some possible "solutions":

leave it alone (the performance benefit is enough to offset the very infrequent mistakes?)
add tests that catch the mistakes somehow
add asserts to every use of stack_pointer to check synchronization
add equivalent checks under Py_DEBUG
drop the local variable and use _PyFrame_GetStackPointer() directly instead
add stack_pointer = _PyFrame_SetStackPointer(frame); right before every call/use of stack_pointer (which makes the local irrelevant)
add some mechanism that ties _PyFrame_SetStackPointer(), etc. to the local variable somehow (unrealistic)

encukou · 2022-07-08T13:16:32Z

For starters, it would be nice to properly define the invariant. There are obviously cases when stack_pointer goes out of sync with frame->stacktop, and currently it's not clear if they are intentional – and if so, when the eventual sync is expected.
Same goes for next_instr/frame->prev_instr (and anything else that'll be made local in the future).

markshannon · 2022-07-25T15:12:08Z

The invariant is well defined.
For years, the invariant has been that if the stack-pointer stored in the frame was invalid (used to be NULL, now its -1) then the stack pointer was being handled in the interpreter, _PyEval_EvalFrameDefault.

https://github.com/python/cpython/blob/main/Python/ceval.c#L1706 explains why the in-memory stack offset is made invalid when the stack pointer is held in a register.

There are obviously cases when stack_pointer goes out of sync with frame->stacktop

When are they out of sync?

markshannon · 2022-07-26T14:52:20Z

I should say that the stack pointer is (along with the instruction pointer) the hottest variable int the entire VM and needs to be in a register. No harm in adding more asserts, though.

ericsnowcurrently · 2022-07-26T15:03:14Z

When are they out of sync?

See #94215 (and my comment at #94215 (comment)).

markshannon · 2022-07-26T16:39:10Z

Where in the code?

They can only be out of sync when both are valid and reachable.
In other words, it can only happen in _PyEval_EvalFrameDefault when frame->stacktop >= 0.
That should only happen in a few places dealing with tracing and the like.

brandtbucher · 2022-10-13T21:14:59Z

Looking at the code, it appears that we don't set frame->stacktop = -1; after TRACE_FUNCTION_ENTRY();. I don't know if this is intentional or not, but it seems incorrect to me.

ericsnowcurrently added the type-bug An unexpected behavior, bug, or error label Jul 7, 2022

ericsnowcurrently mentioned this issue Jul 7, 2022

GC crash _PyObject_AssertFailed with pdb #94215

Closed

iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keeping 'stack_pointer` in sync with frame->stacktop is possibly too tricky #94666

Keeping 'stack_pointer` in sync with frame->stacktop is possibly too tricky #94666

ericsnowcurrently commented Jul 7, 2022

ericsnowcurrently commented Jul 7, 2022

encukou commented Jul 8, 2022

markshannon commented Jul 25, 2022 •

edited

Loading

markshannon commented Jul 26, 2022

ericsnowcurrently commented Jul 26, 2022

markshannon commented Jul 26, 2022

brandtbucher commented Oct 13, 2022 •

edited

Loading

Keeping 'stack_pointer` in sync with frame->stacktop is possibly too tricky #94666

Keeping 'stack_pointer` in sync with frame->stacktop is possibly too tricky #94666

Comments

ericsnowcurrently commented Jul 7, 2022

ericsnowcurrently commented Jul 7, 2022

encukou commented Jul 8, 2022

markshannon commented Jul 25, 2022 • edited Loading

markshannon commented Jul 26, 2022

ericsnowcurrently commented Jul 26, 2022

markshannon commented Jul 26, 2022

brandtbucher commented Oct 13, 2022 • edited Loading

markshannon commented Jul 25, 2022 •

edited

Loading

brandtbucher commented Oct 13, 2022 •

edited

Loading