
gh-119786: Edit InternalDocs/frames.md, Python/vm-state.md, Python/tier2_engine.md #124450

Open · wants to merge 11 commits into `main`
16 changes: 16 additions & 0 deletions InternalDocs/frames.md
@@ -36,6 +36,20 @@ This seems to provide the best performance without excessive complexity.
The specials have a fixed size, so the offset of the locals is known. The
interpreter needs to hold two pointers, a frame pointer and a stack pointer.

### Fast locals and evaluation stack

The frame contains a single array of object pointers, `localsplus`,
which contains both the fast locals and the stack. The top of the
stack, including the locals, is indicated by `stacktop`.
For example, in a function with three locals, if the stack contains
one value, `frame->stacktop == 4`.
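
A minimal sketch of this layout (illustrative only; the real `_PyInterpreterFrame`
declaration lives in CPython's internal headers and differs in detail):

```c
#include <Python.h>

/* Sketch: localsplus[0..nlocals-1] hold the fast locals and the evaluation
   stack starts at localsplus[nlocals]; stacktop counts locals plus stack
   values, so with three locals and one stack value, stacktop == 4. */
typedef struct {
    int stacktop;
    PyObject *localsplus[1];   /* really variable-length */
} frame_layout_sketch;

/* Pushing one value onto the evaluation stack grows stacktop by one. */
static void
push_value(frame_layout_sketch *f, PyObject *v)
{
    f->localsplus[f->stacktop++] = v;
}
```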

The interpreters share an implementation which uses the same memory
but caches the depth (as a pointer) in a C local, `stack_pointer`.
We aren't sure yet exactly how the JIT will implement the stack;
likely some of the values near the top of the stack will be held in registers.
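
The caching convention this describes might look roughly like the following
(a sketch reusing the `frame_layout_sketch` type above; the interpreter's real
helpers are internal, not these):

```c
/* Turn the stored integer depth into a cached C-local pointer on entry... */
static PyObject **
load_stack_pointer(frame_layout_sketch *frame)
{
    return frame->localsplus + frame->stacktop;
}

/* ...and write it back whenever the canonical, in-memory form is needed. */
static void
spill_stack_pointer(frame_layout_sketch *frame, PyObject **stack_pointer)
{
    frame->stacktop = (int)(stack_pointer - frame->localsplus);
}
```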


#### Alternative layout

An alternative layout that was used for part of 3.11 alpha was:
@@ -124,6 +138,8 @@ if the frame were to resume. After `frame.f_lineno` is set, `instr_ptr` points to
the next instruction to be executed. During a call to a Python function,
`instr_ptr` points to the call instruction, because this is what we would expect
to see in an exception traceback.
Dispatching on `instr_ptr` would be very inefficient, so in Tier 1 we cache the
upcoming value of `instr_ptr` in the C local `next_instr`.
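
As a rough sketch of that caching (only `instr_ptr` and `next_instr` are names
taken from the text; the types and the dispatch itself are stand-ins):

```c
#include <stdint.h>

typedef uint16_t code_unit;                /* stand-in for _Py_CODEUNIT */

struct frame_ip_sketch {
    code_unit *instr_ptr;                  /* canonical, in-memory copy */
};

static void
eval_sketch(struct frame_ip_sketch *frame)
{
    code_unit *next_instr = frame->instr_ptr;  /* cache in a C local */

    /* ... the dispatch loop reads *next_instr and advances it ... */
    next_instr += 1;

    frame->instr_ptr = next_instr;  /* write back before calls, exceptions,
                                       tracing, etc. */
}
```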

The `return_offset` field determines where a `RETURN` should go in the caller,
relative to `instr_ptr`. It is only meaningful to the callee, so it needs to
10 changes: 10 additions & 0 deletions Python/tier2_engine.md
@@ -148,3 +148,13 @@ TO DO.
The implementation will change soon, so there is no point in
documenting it until then.


# Tier 2 IR format

The tier 2 IR (Internal Representation) format is also the basis for the Tier 2 interpreter (though the two formats may eventually differ). This format is also used as the input to the machine code generator (the JIT compiler).

Tier 2 IR entries are all the same size; there is no equivalent to `EXTENDED_ARG` or trailing inline cache entries. Each instruction is a struct with the following fields (all integers of varying sizes; a rough C sketch follows the list):

- **opcode**: Sometimes the same as a Tier 1 opcode, sometimes a separate micro opcode. Tier 2 opcodes are 9 bits (as opposed to Tier 1 opcodes, which fit in 8 bits). By convention, Tier 2 opcode names start with `_`.
- **oparg**: The argument. Usually the same as the Tier 1 oparg after expansion of `EXTENDED_ARG` prefixes. Up to 32 bits.
- **operand**: An additional argument, typically the value of *one* cache item from the Tier 1 inline cache; up to 64 bits.
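
For illustration, such an entry could be declared along these lines (the nearest standard integer widths are used; the actual declaration in CPython's internal headers may differ in names and layout):

```c
#include <stdint.h>

/* Illustrative only: every Tier 2 IR entry has the same fixed size. */
typedef struct {
    uint16_t opcode;    /* Tier 2 opcode; only 9 bits are needed        */
    uint32_t oparg;     /* Tier 1 oparg after EXTENDED_ARG expansion    */
    uint64_t operand;   /* one Tier 1 inline-cache value, up to 64 bits */
} uop_instruction_sketch;
```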
59 changes: 11 additions & 48 deletions Python/vm-state.md
@@ -3,39 +3,22 @@
## Definition of Tiers

- **Tier 1** is the classic Python bytecode interpreter.
This includes the specializing adaptive interpreter described in [PEP 659](https://peps.python.org/pep-0659/) and introduced in Python 3.11.
- **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format.
It will be introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology that is likely to be introduced at the same time (but, unlike the Tier 2 interpreter, hasn't landed in the main branch yet).
This includes the specializing [adaptive interpreter](../InternalDocs/adaptive.md).
- **Tier 2**, also known as the micro-instruction ("uop") interpreter, is a new execution engine.
It was introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology. See [Tier 2](tier2_engine.md) for more information.

# Frame state

Almost all interpreter state is nominally stored in the frame structure.
A pointer to the current frame is held in `frame`. It contains:

- **local variables** (a.k.a. "fast locals")
- **evaluation stack** (tacked onto the end of the locals)
- **stack top** (an integer giving the top of the evaluation stack)
- **instruction pointer**
- **code object**, which holds things like the array of instructions, lists of constants and names referenced by certain instructions, the exception handling table, and the table that translates instruction offsets to line numbers
- **return offset**, only relevant during calls, telling the interpreter where to return

There are some other fields in the frame structure of less importance; notably frames are linked together in a singly-linked list via the `previous` pointer, pointing from callee to caller.
The frame also holds a pointer to the current function, globals, builtins, and the locals converted to dict (used to support the `locals()` built-in).
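
Put together, a simplified picture of those fields might look like this (names, types, and order are illustrative; this is not the actual `_PyInterpreterFrame` declaration):

```c
#include <Python.h>

struct frame_fields_sketch {
    PyObject *code;                         /* code object                  */
    struct frame_fields_sketch *previous;   /* callee -> caller link        */
    PyObject *funcobj;                      /* current function             */
    PyObject *globals, *builtins, *locals;  /* globals, builtins, locals()  */
    void *instr_ptr;                        /* instruction pointer          */
    int stacktop;                           /* top of the evaluation stack  */
    int return_offset;                      /* where a RETURN goes in the
                                               caller, relative to instr_ptr */
    PyObject *localsplus[1];                /* fast locals, then the stack  */
};
```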

## Fast locals and evaluation stack

The frame contains a single array of object pointers, `localsplus`, which contains both the fast locals and the stack.
The top of the stack, including the locals, is indicated by `stacktop`.
For example, in a function with three locals, if the stack contains one value, `frame->stacktop == 4`.
# Thread state and interpreter state

The interpreters share an implementation which uses the same memory but caches the depth (as a pointer) in a C local, `stack_pointer`.
We aren't sure yet exactly how the JIT will implement the stack; likely some of the values near the top of the stack will be held in registers.
An important piece of VM state is the **thread state**, held in `tstate`.
The current frame pointer, `frame`, is always equal to `tstate->current_frame`.
The thread state also holds the exception state (`tstate->exc_info`) and the recursion counters (`tstate->c_recursion_remaining` and `tstate->py_recursion_remaining`).

## Instruction pointer
The thread state is also used to access the **interpreter state** (`tstate->interp`), which is important since the "eval breaker" flags are stored there (`tstate->interp->ceval.eval_breaker`, an "atomic" variable), as well as the "PEP 523 function" (`tstate->interp->eval_frame`).
The interpreter state also holds the optimizer state (`optimizer` and some counters).
Note that the eval breaker may be moved to the thread state soon as part of the multicore (PEP 703) work.
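
A sketch of the access paths just described (field names follow the text above and are assumptions about the internal headers rather than a verified excerpt):

```c
#include <Python.h>

/* Only tstate->interp is touched as real code here; the internal fields
   are shown as comments because they need CPython's internal headers. */
static void
state_sketch(PyThreadState *tstate)
{
    /* frame == tstate->current_frame                                   */
    /* exception state: tstate->exc_info                                */
    /* recursion: tstate->c_recursion_remaining,
                  tstate->py_recursion_remaining                        */

    PyInterpreterState *interp = tstate->interp;
    /* eval breaker flags: interp->ceval.eval_breaker ("atomic")        */
    /* PEP 523 hook:       interp->eval_frame                           */
    (void)interp;
}
```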

The canonical, in-memory, representation of the instruction pointer is `frame->instr_ptr`.
It always points to an instruction in the bytecode array of the frame's code object.
Dispatching on `frame->instr_ptr` would be very inefficient, so in Tier 1 we cache the upcoming value of `frame->instr_ptr` in the C local `next_instr`.

## Tier 2

@@ -47,7 +30,7 @@ The Tier 2 instruction pointer is strictly internal to the Tier 2 interpreter, s

## Unwinding

Unwinding uses exception tables to find the next point at which normal execution can occur, or fail if there are no exception handlers.
Unwinding uses exception tables to find the next point at which normal execution can occur, or fail if there are no exception handlers. For more information on what exception tables are, see [exception handling](exception_handling.md).
During unwinding both the stack and the instruction pointer should be in their canonical, in-memory representation.

## Jumps in bytecode
@@ -68,23 +51,3 @@ Patching exits should be fairly straightforward in the interpreter.
It will be more complex in the JIT.

(We might also consider deoptimizations as a separate jump type.)

# Thread state and interpreter state

Another important piece of VM state is the **thread state**, held in `tstate`.
The current frame pointer, `frame`, is always equal to `tstate->current_frame`.
The thread state also holds the exception state (`tstate->exc_info`) and the recursion counters (`tstate->c_recursion_remaining` and `tstate->py_recursion_remaining`).

The thread state is also used to access the **interpreter state** (`tstate->interp`), which is important since the "eval breaker" flags are stored there (`tstate->interp->ceval.eval_breaker`, an "atomic" variable), as well as the "PEP 523 function" (`tstate->interp->eval_frame`).
The interpreter state also holds the optimizer state (`optimizer` and some counters).
Note that the eval breaker may be moved to the thread state soon as part of the multicore (PEP 703) work.

# Tier 2 IR format

The tier 2 IR (Internal Representation) format is also the basis for the Tier 2 interpreter (though the two formats may eventually differ). This format is also used as the input to the machine code generator (the JIT compiler).

Tier 2 IR entries are all the same size; there is no equivalent to `EXTENDED_ARG` or trailing inline cache entries. Each instruction is a struct with the following fields (all integers of varying sizes):

- **opcode**: Sometimes the same as a Tier 1 opcode, sometimes a separate micro opcode. Tier 2 opcodes are 9 bits (as opposed to Tier 1 opcodes, which fit in 8 bits). By convention, Tier 2 opcode names start with `_`.
- **oparg**: The argument. Usually the same as the Tier 1 oparg after expansion of `EXTENDED_ARG` prefixes. Up to 32 bits.
- **operand**: An additional argument, typically the value of *one* cache item from the Tier 1 inline cache; up to 64 bits.