Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-104584: Remove ip_offset and simplify EXIT_TRACE #108961

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions Include/internal/pycore_opcode_metadata.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 3 additions & 5 deletions Python/bytecodes.c
Original file line number Diff line number Diff line change
Expand Up @@ -3006,6 +3006,7 @@ dummy_func(
tstate->py_recursion_remaining--;
LOAD_SP();
LOAD_IP();
frame->prev_instr = _PyCode_CODE(_PyFrame_GetCode(frame));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment in ceval_macros.c -- perhaps this should also be a macro?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, frame->prev_instr is set during frame creation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we want to reuse this code for SEND we don't want to be messing with frame->prev_instr

Copy link
Member Author

@brandtbucher brandtbucher Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, frame->prev_instr is set during frame creation.

It's set to an index of -1 (since the frame hasn't started yet), which is invalid. So we either need to set it like this, or at least do an increment.

#if LLTRACE && TIER_ONE
lltrace = maybe_lltrace_resume_frame(frame, &entry_frame, GLOBALS());
if (lltrace < 0) {
Expand Down Expand Up @@ -3794,7 +3795,7 @@ dummy_func(
}

op(SAVE_IP, (--)) {
frame->prev_instr = ip_offset + oparg;
frame->prev_instr = (_Py_CODEUNIT *)(uintptr_t)operand;
}

op(SAVE_CURRENT_IP, (--)) {
Expand All @@ -3808,10 +3809,7 @@ dummy_func(
}

op(EXIT_TRACE, (--)) {
frame->prev_instr--; // Back up to just before destination
_PyFrame_SetStackPointer(frame, stack_pointer);
Py_DECREF(self);
return frame;
goto deoptimize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In uops debug mode (PYTHONUOPSDEBUG=2 or higher) this would (confusingly, IMO) print DEOPT: [Opcode ...]. Doesn't that bother you? Or do you consider EXIT_TRACE as just another form of deoptimization?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely is annoying. Actual deoptimizations are rare, but exiting a trace is common. So I get a lot of log spam containing endless repetitions of

DEOPT: [Opcode 300, operand 0]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tier 2 only, so it doesn't matter much what the implementation is. I can see it changing quite a few times in the near future.
Whatever it is, it shouldn't have the context dependent frame->prev_instr--; // Back up to just before destination

}

op(INSERT, (unused[oparg], top -- top, unused[oparg])) {
Expand Down
2 changes: 1 addition & 1 deletion Python/ceval_macros.h
Original file line number Diff line number Diff line change
Expand Up @@ -396,7 +396,7 @@ stack_pointer = _PyFrame_GetStackPointer(frame);
#if TIER_TWO

#define LOAD_IP() \
do { ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive; } while (0)
do {} while (0)
Comment on lines 398 to +399
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be curious what @markshannon thinks of this change. It makes me think that the "abstractions" embodied by these LOAD/STORE macros are not as solid as we thought. And I wonder if the line that sets prev_instr should also be a macro. (Note that the _SP macros don't actually differ by tier.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me. LOAD_IP should be a no-op in tier 2 execution. The IP should be virtual.

(In this context "virtual" means tracked by the region selector (translate_bytecode_to_trace) not at runtime)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _SP may not differ by tier, but the copy-and-patch compiler will need to treat them differently from the tier 2 interpreter.


#define STORE_SP() \
_PyFrame_SetStackPointer(frame, stack_pointer)
Expand Down
1 change: 0 additions & 1 deletion Python/executor.c
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,6 @@ _PyUopExecute(_PyExecutorObject *executor, _PyInterpreterFrame *frame, PyObject
CHECK_EVAL_BREAKER();

OBJECT_STAT_INC(optimization_traces_executed);
_Py_CODEUNIT *ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive;
int pc = 0;
int opcode;
int oparg;
Expand Down
8 changes: 3 additions & 5 deletions Python/executor_cases.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions Python/generated_cases.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 4 additions & 8 deletions Python/optimizer.c
Original file line number Diff line number Diff line change
Expand Up @@ -479,7 +479,6 @@ translate_bytecode_to_trace(
#define TRACE_STACK_PUSH() \
if (trace_stack_depth >= TRACE_STACK_SIZE) { \
DPRINTF(2, "Trace stack overflow\n"); \
ADD_TO_TRACE(SAVE_IP, 0, 0); \
goto done; \
} \
trace_stack[trace_stack_depth].code = code; \
Expand All @@ -503,7 +502,7 @@ translate_bytecode_to_trace(
top: // Jump here after _PUSH_FRAME
for (;;) {
RESERVE_RAW(2, "epilogue"); // Always need space for SAVE_IP and EXIT_TRACE
ADD_TO_TRACE(SAVE_IP, INSTR_IP(instr, code), 0);
ADD_TO_TRACE(SAVE_IP, 0, (uintptr_t)instr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask to keep (redundantly) the INSTR_IP(...) argument in all cases? It's helpful for debugging -- the 64-bit hex operand is useless, but the code-relative offset can be related directly to the output from dis, and I regularly do that.

If you worry about the cost of computing INSTR_IP() when no debugging is selected, maybe define the macro differently (or define another macro)?

Also, I think the operand may have to be printed in hex (both here and in executor.c) -- debug output like 106858786520180 is not very helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will ultimately want to change from instruction pointer back to the offset (as we had in 3.9 and earlier), so let's keep the redundant (for now) macro.

The reason for wanting the offset, not the pointer is that we expect writes to frame->instr_ptr to be rare, thanks to optimization, most reads actually want the offset, and it means we only need a 32 bit operand (less copying and cache pressure).


uint32_t opcode = instr->op.code;
uint32_t oparg = instr->op.arg;
Expand Down Expand Up @@ -554,7 +553,7 @@ translate_bytecode_to_trace(
uint32_t uopcode = opcode == POP_JUMP_IF_TRUE ?
_POP_JUMP_IF_TRUE : _POP_JUMP_IF_FALSE;
ADD_TO_TRACE(uopcode, max_length, 0);
ADD_TO_STUB(max_length, SAVE_IP, INSTR_IP(target_instr, code), 0);
ADD_TO_STUB(max_length, SAVE_IP, 0, (uintptr_t)target_instr);
ADD_TO_STUB(max_length + 1, EXIT_TRACE, 0, 0);
break;
}
Expand Down Expand Up @@ -614,7 +613,7 @@ translate_bytecode_to_trace(
ADD_TO_TRACE(next_op, 0, 0);

ADD_TO_STUB(max_length + 0, POP_TOP, 0, 0);
ADD_TO_STUB(max_length + 1, SAVE_IP, INSTR_IP(target_instr, code), 0);
ADD_TO_STUB(max_length + 1, SAVE_IP, 0, (uintptr_t)target_instr);
ADD_TO_STUB(max_length + 2, EXIT_TRACE, 0, 0);
break;
}
Expand Down Expand Up @@ -668,7 +667,7 @@ translate_bytecode_to_trace(
oparg = orig_oparg & 0xF;
break;
case OPARG_SAVE_IP: // op==SAVE_IP; oparg=next instr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the comment.

oparg = INSTR_IP(instr + offset, code);
operand = (uintptr_t)(instr + offset);
break;

default:
Expand Down Expand Up @@ -707,15 +706,13 @@ translate_bytecode_to_trace(
PyUnicode_AsUTF8(new_code->co_qualname),
PyUnicode_AsUTF8(new_code->co_filename),
new_code->co_firstlineno);
ADD_TO_TRACE(SAVE_IP, 0, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Because you added an assignment to frame->prev_instr to _PUSH_FRAME, you don't need this dummy SAVE_IP any more. Same when bailing on version mismatch below and ditto if func == NULL. Right?

goto done;
}
if (new_code->co_version != func_version) {
// func.__code__ was updated.
// Perhaps it may happen again, so don't bother tracing.
// TODO: Reason about this -- is it better to bail or not?
DPRINTF(2, "Bailing because co_version != func_version\n");
ADD_TO_TRACE(SAVE_IP, 0, 0);
goto done;
}
// Increment IP to the return address
Expand All @@ -731,7 +728,6 @@ translate_bytecode_to_trace(
2 * INSTR_IP(instr, code));
goto top;
}
ADD_TO_TRACE(SAVE_IP, 0, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look twice before understanding how we even get here. :-) I think we could use a comment here explaining that here func == NULL. (And/or a DPRINT() call?)

goto done;
}
}
Expand Down
5 changes: 4 additions & 1 deletion Tools/cases_generator/flags.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,10 @@ def fromInstruction(instr: parsing.Node) -> "InstructionFlags":
)
and not has_free,
HAS_EVAL_BREAK_FLAG=variable_used(instr, "CHECK_EVAL_BREAKER"),
HAS_DEOPT_FLAG=variable_used(instr, "DEOPT_IF"),
HAS_DEOPT_FLAG=(
variable_used(instr, "DEOPT_IF")
or variable_used(instr, "deoptimize")
),
HAS_ERROR_FLAG=(
variable_used(instr, "ERROR_IF")
or variable_used(instr, "error")
Expand Down
Loading