Skip to content

GH-133231: Changes to executor management to support proposed sys._jit module #133287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 4, 2025

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented May 2, 2025

This PR:

  • Tracks the current executor on the thread-state, not the previous one
  • Sets the current executor to NULL when entering PyEval_EvalDefault and resetting it when leaving.
  • Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.

Should simplify the implementation of sys._jit.is_enabled() as it is simply tstate->current_executor != NULL and doesn't require additional state to be tracked.

… Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
@markshannon
Copy link
Member Author

The msvc jit failure might be fixed by #132746

@Fidget-Spinner
Copy link
Member

The msvc jit failure might be fixed by #132746

Wow I did not expect that to fix it.

Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overall approach is neat, but I think there are a couple of potential issues (and possible cleanups) here.

Long story short, anything that uses the entry_frame for this purpose is going to run into issues with the tail-calling interpreter (whether it's your approach, or mine). It's a hard problem.

Comment on lines +993 to +996
typedef struct {
_PyInterpreterFrame frame;
_PyStackRef stack[1];
} _PyEntryFrame;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? We only need one slot in localsplus, which _PyInterpreterFrame already has.

Copy link
Member Author

@markshannon markshannon May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now need two slots. One on the stack for the return value, as before, and one local variable to hold the current executor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Forgot about the return value.

Maybe add a comment that the one stack slot is for the return value, and the one local is for the executor.

@bedevere-app
Copy link

bedevere-app bot commented May 3, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@brandtbucher
Copy link
Member

Idead: if we have a struct struct { PyExecutorObject *executor, struct node *prev} node;, we can stack allocate one of these in every GOTO_TIER_TWO call and restore it afterwards. Then keep the top on the tstate.

Something like this:

diff --git a/Python/ceval_macros.h b/Python/ceval_macros.h
index e1d2673848c..c77c2cafee7 100644
--- a/Python/ceval_macros.h
+++ b/Python/ceval_macros.h
@@ -360,10 +360,13 @@ do {                                                   \
     OPT_STAT_INC(traces_executed);                     \
     _PyExecutorObject *_executor = (EXECUTOR);         \
     jit_func jitted = _executor->jit_code;             \
+    node prev = tstate->current_executor;              \
+    tstate->current_executor = (node){_executor, &prev}; \
     /* Keep the shim frame alive via the executor: */  \
     Py_INCREF(_executor);                              \
     next_instr = jitted(frame, stack_pointer, tstate); \
     Py_DECREF(_executor);                              \
+    tstate->current_executor = prev;                   \
     Py_CLEAR(tstate->previous_executor);               \
     frame = tstate->current_frame;                     \
     stack_pointer = _PyFrame_GetStackPointer(frame);   \
@@ -379,6 +382,9 @@ do { \
     OPT_STAT_INC(traces_executed); \
     next_uop = (EXECUTOR)->trace; \
     assert(next_uop->opcode == _START_EXECUTOR); \
+    /* prev is declared local to the tier two interpreter: */ \
+    prev = tstate->current_executor; \
+    tstate->current_executor = (node){_executor, &prev}; \
     goto enter_tier_two; \
 } while (0)
 #endif
@@ -386,6 +392,7 @@ do { \
 #define GOTO_TIER_ONE(TARGET)                                         \
     do                                                                \
     {                                                                 \
+        tstate->current_executor = prev;                              \
         next_instr = (TARGET);                                        \
         OPT_HIST(trace_uop_execution_counter, trace_run_length_hist); \
         _PyFrame_SetStackPointer(frame, stack_pointer);               \

No entry frame needed.

@markshannon
Copy link
Member Author

I don't see how that would work, as macros don't have scope. But its moot anyway as we can use the entry frame.

The entry frame exists in the tail-calling interpreter, it is the entry variable that we can't use. But we don't need to. Anywhere we return, either in INTERPRETER_EXIT or exit_unwind, the frame is the entry frame: frame == &entry.frame.

@markshannon
Copy link
Member Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented May 3, 2025

Thanks for making the requested changes!

@brandtbucher: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from brandtbucher May 3, 2025 10:53
@brandtbucher
Copy link
Member

I don't see how that would work, as macros don't have scope.

I mean, it would just be a local declared in the _PyEval_EvalFrameDefault itself for the tail-calling interpreter, and would become local to whatever bytecode is using the macro on other builds.

But as you say...

But its moot anyway as we can use the entry frame.

The entry frame exists in the tail-calling interpreter, it is the entry variable that we can't use. But we don't need to. Anywhere we return, either in INTERPRETER_EXIT or exit_unwind, the frame is the entry frame: frame == &entry.frame.

Ah, this is what I missed. Makes sense: we're always able to set it when entering because we have entry handy. And we're always able to reset it when exiting because it's just the current frame. Thanks for explaining.

Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks. I'm just going to kick off some JIT benchmarks to be safe.

next_instr = (TARGET); \
assert(tstate->current_executor == NULL); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this seems redundant, given that we're setting it one line above.

__attribute__((musttail)) return jitted(frame, stack_pointer, tstate); \
} while (0)

#undef GOTO_TIER_ONE
#define GOTO_TIER_ONE(TARGET) \
do { \
tstate->current_executor = NULL; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can either do this here, or in the second half of the _Py_JIT implementation of GOTO_TIER_TWO in ceval_macros.h, after the JIT code returns. No need to do it in both places.

@@ -5316,7 +5322,6 @@ dummy_func(
}

tier2 op(_START_EXECUTOR, (executor/4 --)) {
Copy link
Member

@brandtbucher brandtbucher May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, this instruction now exists solely to set the current_executor in the tier two interpreter. We should try to remove it as a follow-up (doesn't need to be in this PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can replace all uses of current_executor with an operand, which will remove the current_executor local variable from the tier 2 interpreter and the _JIT_EXECUTOR patch from the JIT template.

We can also shrink _CHECK_VALIDITY a bit by replacing current_executor->vm_data.valid with *validity_ptr:

tier2 op(_CHECK_VALIDITY, (validity_ptr/4 --)) {
    DEOPT_IF(*validity_ptr == 0);
}

For another PR.

@brandtbucher
Copy link
Member

Looks fine.

@markshannon markshannon merged commit ac7d5ba into python:main May 4, 2025
71 checks passed
diegorusso added a commit to diegorusso/cpython that referenced this pull request May 4, 2025
* origin/main: (111 commits)
  pythongh-91048: Add filename and line number to external inspection routines (pythonGH-133385)
  pythongh-131178: Add tests for `ast` command-line interface (python#133329)
  Regenerate pcbuild.sln in Visual Studio 2022 (python#133394)
  pythongh-133042: disable HACL* HMAC on Emscripten (python#133064)
  pythongh-133351: Fix remote PDB's multi-line block tab completion (python#133387)
  pythongh-109700: Improve stress tests for interpreter creation (pythonGH-109946)
  pythongh-81793: Skip tests for os.link() to symlink on Android (pythonGH-133388)
  pythongh-126835: Rename `ast_opt.c` to `ast_preprocess.c` and related stuff after moving const folding to the peephole optimizier (python#131830)
  pythongh-91048: Relax test_async_global_awaited_by to fix flakyness (python#133368)
  pythongh-132457: make staticmethod and classmethod generic (python#132460)
  pythongh-132805: annotationlib: Fix handling of non-constant values in FORWARDREF (python#132812)
  pythongh-132426: Add get_annotate_from_class_namespace replacing get_annotate_function (python#132490)
  pythongh-81793: Always call linkat() from os.link(), if available (pythonGH-132517)
  pythongh-122559: Synchronize C and Python implementation of the io module about pickling (pythonGH-122628)
  pythongh-69605: Add PyREPL import autocomplete feature to 'What's New' (python#133358)
  bpo-44172: Keep reference to original window in curses subwindow objects (pythonGH-26226)
  pythonGH-133231: Changes to executor management to support proposed `sys._jit` module (pythonGH-133287)
  pythongh-133363: Fix Cmd completion for lines beginning with `! ` (python#133364)
  pythongh-132983: Introduce `_zstd` bindings module (pythonGH-133027)
  pythonGH-91048: Add utils for printing the call stack for asyncio tasks (python#133284)
  ...
@markshannon markshannon deleted the current-executor branch May 5, 2025 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants