-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
GH-133231: Changes to executor management to support proposed sys._jit
module
#133287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
The msvc jit failure might be fixed by #132746 |
Wow I did not expect that to fix it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall approach is neat, but I think there are a couple of potential issues (and possible cleanups) here.
Long story short, anything that uses the entry_frame
for this purpose is going to run into issues with the tail-calling interpreter (whether it's your approach, or mine). It's a hard problem.
typedef struct { | ||
_PyInterpreterFrame frame; | ||
_PyStackRef stack[1]; | ||
} _PyEntryFrame; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? We only need one slot in localsplus
, which _PyInterpreterFrame
already has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now need two slots. One on the stack for the return value, as before, and one local variable to hold the current executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Forgot about the return value.
Maybe add a comment that the one stack slot is for the return value, and the one local is for the executor.
When you're done making the requested changes, leave the comment: |
Idead: if we have a struct Something like this: diff --git a/Python/ceval_macros.h b/Python/ceval_macros.h
index e1d2673848c..c77c2cafee7 100644
--- a/Python/ceval_macros.h
+++ b/Python/ceval_macros.h
@@ -360,10 +360,13 @@ do { \
OPT_STAT_INC(traces_executed); \
_PyExecutorObject *_executor = (EXECUTOR); \
jit_func jitted = _executor->jit_code; \
+ node prev = tstate->current_executor; \
+ tstate->current_executor = (node){_executor, &prev}; \
/* Keep the shim frame alive via the executor: */ \
Py_INCREF(_executor); \
next_instr = jitted(frame, stack_pointer, tstate); \
Py_DECREF(_executor); \
+ tstate->current_executor = prev; \
Py_CLEAR(tstate->previous_executor); \
frame = tstate->current_frame; \
stack_pointer = _PyFrame_GetStackPointer(frame); \
@@ -379,6 +382,9 @@ do { \
OPT_STAT_INC(traces_executed); \
next_uop = (EXECUTOR)->trace; \
assert(next_uop->opcode == _START_EXECUTOR); \
+ /* prev is declared local to the tier two interpreter: */ \
+ prev = tstate->current_executor; \
+ tstate->current_executor = (node){_executor, &prev}; \
goto enter_tier_two; \
} while (0)
#endif
@@ -386,6 +392,7 @@ do { \
#define GOTO_TIER_ONE(TARGET) \
do \
{ \
+ tstate->current_executor = prev; \
next_instr = (TARGET); \
OPT_HIST(trace_uop_execution_counter, trace_run_length_hist); \
_PyFrame_SetStackPointer(frame, stack_pointer); \ No entry frame needed. |
I don't see how that would work, as macros don't have scope. But its moot anyway as we can use the entry frame. The entry frame exists in the tail-calling interpreter, it is the |
I have made the requested changes; please review again |
Thanks for making the requested changes! @brandtbucher: please review the changes made to this pull request. |
I mean, it would just be a local declared in the But as you say...
Ah, this is what I missed. Makes sense: we're always able to set it when entering because we have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks. I'm just going to kick off some JIT benchmarks to be safe.
next_instr = (TARGET); \ | ||
assert(tstate->current_executor == NULL); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: this seems redundant, given that we're setting it one line above.
__attribute__((musttail)) return jitted(frame, stack_pointer, tstate); \ | ||
} while (0) | ||
|
||
#undef GOTO_TIER_ONE | ||
#define GOTO_TIER_ONE(TARGET) \ | ||
do { \ | ||
tstate->current_executor = NULL; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can either do this here, or in the second half of the _Py_JIT
implementation of GOTO_TIER_TWO
in ceval_macros.h
, after the JIT code returns. No need to do it in both places.
@@ -5316,7 +5322,6 @@ dummy_func( | |||
} | |||
|
|||
tier2 op(_START_EXECUTOR, (executor/4 --)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, this instruction now exists solely to set the current_executor
in the tier two interpreter. We should try to remove it as a follow-up (doesn't need to be in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can replace all uses of current_executor
with an operand, which will remove the current_executor
local variable from the tier 2 interpreter and the _JIT_EXECUTOR
patch from the JIT template.
We can also shrink _CHECK_VALIDITY
a bit by replacing current_executor->vm_data.valid
with *validity_ptr
:
tier2 op(_CHECK_VALIDITY, (validity_ptr/4 --)) {
DEOPT_IF(*validity_ptr == 0);
}
For another PR.
* origin/main: (111 commits) pythongh-91048: Add filename and line number to external inspection routines (pythonGH-133385) pythongh-131178: Add tests for `ast` command-line interface (python#133329) Regenerate pcbuild.sln in Visual Studio 2022 (python#133394) pythongh-133042: disable HACL* HMAC on Emscripten (python#133064) pythongh-133351: Fix remote PDB's multi-line block tab completion (python#133387) pythongh-109700: Improve stress tests for interpreter creation (pythonGH-109946) pythongh-81793: Skip tests for os.link() to symlink on Android (pythonGH-133388) pythongh-126835: Rename `ast_opt.c` to `ast_preprocess.c` and related stuff after moving const folding to the peephole optimizier (python#131830) pythongh-91048: Relax test_async_global_awaited_by to fix flakyness (python#133368) pythongh-132457: make staticmethod and classmethod generic (python#132460) pythongh-132805: annotationlib: Fix handling of non-constant values in FORWARDREF (python#132812) pythongh-132426: Add get_annotate_from_class_namespace replacing get_annotate_function (python#132490) pythongh-81793: Always call linkat() from os.link(), if available (pythonGH-132517) pythongh-122559: Synchronize C and Python implementation of the io module about pickling (pythonGH-122628) pythongh-69605: Add PyREPL import autocomplete feature to 'What's New' (python#133358) bpo-44172: Keep reference to original window in curses subwindow objects (pythonGH-26226) pythonGH-133231: Changes to executor management to support proposed `sys._jit` module (pythonGH-133287) pythongh-133363: Fix Cmd completion for lines beginning with `! ` (python#133364) pythongh-132983: Introduce `_zstd` bindings module (pythonGH-133027) pythonGH-91048: Add utils for printing the call stack for asyncio tasks (python#133284) ...
This PR:
Should simplify the implementation of
sys._jit.is_enabled()
as it is simplytstate->current_executor != NULL
and doesn't require additional state to be tracked.sys._jit
#133231