
Update DSL docs for cases generator #105753

Merged
merged 3 commits into from
Jun 14, 2023
104 changes: 60 additions & 44 deletions Tools/cases_generator/interpreter_definition.md
@@ -67,26 +67,24 @@ parts of instructions, we can reduce the potential for errors considerably.

## Specification

This specification is at an early stage and is likely to change considerably.
This specification is a work in progress.
We update it as the need arises.

Syntax
------
### Syntax

Each op definition has a kind, a name, a stack and instruction stream effect,
and a piece of C code describing its semantics:

```
file:
(definition | family)+
(definition | family | pseudo)+

definition:
"inst" "(" NAME ["," stack_effect] ")" "{" C-code "}"
|
"op" "(" NAME "," stack_effect ")" "{" C-code "}"
|
"macro" "(" NAME ")" "=" uop ("+" uop)* ";"
|
"super" "(" NAME ")" "=" NAME ("+" NAME)* ";"

stack_effect:
"(" [inputs] "--" [outputs] ")"
@@ -122,16 +120,17 @@ and a piece of C code describing its semantics::
object "[" C-expression "]"

family:
"family" "(" NAME ")" = "{" NAME ("," NAME)+ "}" ";"
"family" "(" NAME ")" = "{" NAME ("," NAME)+ [","] "}" ";"

pseudo:
"pseudo" "(" NAME ")" = "{" NAME ("," NAME)+ [","] "}" ";"
```

The following definitions may occur:

* `inst`: A normal instruction, as previously defined by `TARGET(NAME)` in `ceval.c`.
* `op`: A partial instruction (a component) from which macros can be constructed.
* `macro`: A bytecode instruction constructed from ops and cache effects.
* `super`: A super-instruction, such as `LOAD_FAST__LOAD_FAST`, constructed from
normal or macro instructions.

`NAME` can be any ASCII identifier that is a C identifier and not a C or Python keyword.
`foo_1` is legal. `$` is not legal, nor is `struct` or `class`.
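
The `NAME` rule above can be sketched as a small C check. This is only an illustration: `is_legal_name` and `RESERVED_WORDS` are hypothetical names, not part of the cases generator, and the keyword list is deliberately abbreviated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Abbreviated, illustrative list only: a real check would need the
   complete sets of C and Python keywords. */
static const char *const RESERVED_WORDS[] = {
    "struct", "int", "if", "else", "while", "return",  /* some C keywords */
    "class", "def", "lambda", "import", "pass",        /* some Python keywords */
};

static bool is_ascii_letter_or_underscore(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static bool is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical helper, not part of the cases generator. */
static bool is_legal_name(const char *name)
{
    if (name == NULL || !is_ascii_letter_or_underscore(name[0])) {
        return false;  /* empty, or starts with a digit or a symbol */
    }
    for (const char *p = name + 1; *p != '\0'; p++) {
        if (!is_ascii_letter_or_underscore(*p) && !is_ascii_digit(*p)) {
            return false;  /* non-identifier character such as '$' */
        }
    }
    size_t count = sizeof(RESERVED_WORDS) / sizeof(RESERVED_WORDS[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, RESERVED_WORDS[i]) == 0) {
            return false;  /* C or Python keyword */
        }
    }
    return true;
}
```

With this sketch, `foo_1` is accepted while `$`, `struct` and `class` are rejected, matching the examples in the text.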
@@ -159,15 +158,21 @@ By convention cache effects (`stream`) must precede the input effects.

The name `oparg` is pre-defined as a 32-bit value fetched from the instruction stream.

### Special functions/macros

The C code may include special functions that are understood by the tools as
part of the DSL.

Those functions include:

* `DEOPT_IF(cond, instruction)`. Deoptimize if `cond` is met.
* `ERROR_IF(cond, label)`. Jump to error handler if `cond` is true.
* `ERROR_IF(cond, label)`. Jump to error handler at `label` if `cond` is true.
* `DECREF_INPUTS()`. Generate `Py_DECREF()` calls for the input stack effects.

Note that the use of `DECREF_INPUTS()` is optional -- manual calls
to `Py_DECREF()` or other approaches are also acceptable
(e.g. calling an API that "steals" a reference).

Variables can either be defined in the input, output, or in the C code.
Variables defined in the input may not be assigned in the C code.
If an `ERROR_IF` occurs, all values will be removed from the stack;
@@ -187,17 +192,39 @@ These requirements result in the following constraints on the use of
intermediate results.)
3. No `DEOPT_IF` may follow an `ERROR_IF` in the same block.

Semantics
---------
(There is some wiggle room: these rules apply to dynamic code paths,
not to static occurrences in the source code.)

If code detects an error condition before the first `DECREF` of an input,
two idioms are valid:

- Use `goto error`.
- Use a block containing the appropriate `DECREF` calls ending in
`ERROR_IF(true, error)`.

An example of the latter would be:
```cc
res = PyObject_Add(left, right);
if (res == NULL) {
    DECREF_INPUTS();
    ERROR_IF(true, error);
}
```

### Semantics

The underlying execution model is a stack machine.
Operations pop values from the stack, and push values to the stack.
They can also inspect, and consume, values from the instruction stream.
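
As a rough illustration of this execution model, here is a minimal sketch of a stack machine in plain C. It is not the generated interpreter: `Machine`, `op_add`, `op_scale` and `run_demo` are hypothetical names, and values are plain `long` integers rather than Python objects.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_MAX 16

typedef struct {
    long stack[STACK_MAX];
    int sp;                /* number of values currently on the stack */
    const uint16_t *next;  /* next code unit in the instruction stream */
} Machine;

static void push(Machine *m, long value)
{
    assert(m->sp < STACK_MAX);
    m->stack[m->sp++] = value;
}

static long pop(Machine *m)
{
    assert(m->sp > 0);
    return m->stack[--m->sp];
}

/* Effect (left, right -- res): pops two values, pushes one. */
static void op_add(Machine *m)
{
    long right = pop(m);
    long left = pop(m);
    push(m, left + right);
}

/* Effect (scale/1, value -- res): consumes one code unit from the
   instruction stream, pops one value, pushes one. */
static void op_scale(Machine *m)
{
    uint16_t scale = *m->next++;
    long value = pop(m);
    push(m, value * scale);
}

/* Runs both operations on a tiny hand-built program. */
static long run_demo(void)
{
    static const uint16_t stream[] = { 3 };
    Machine m = { .sp = 0, .next = stream };
    push(&m, 2);
    push(&m, 5);
    op_add(&m);    /* stack: 7 */
    op_scale(&m);  /* stack: 21, one code unit consumed */
    return pop(&m);
}
```

Each operation touches only the top of the stack and the next code units, which is what lets a stack effect and a cache effect fully describe its interface.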

All members of a family must have the same stack and instruction stream effect.
All members of a family
(which represents a specializable instruction and its specializations)
must have the same stack and instruction stream effect.

The same is true for all members of a pseudo instruction
(which is mapped by the bytecode compiler to one of its members).
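
The "same effect" requirement can be sketched as a simple comparison. This is an assumption about how such a check might look, not the generator's actual code: `InstructionEffect` and `effects_match` are hypothetical names, and each effect is reduced to three counts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one instruction's effects. */
typedef struct {
    const char *name;
    int popped;       /* values popped from the stack */
    int pushed;       /* values pushed to the stack */
    int cache_units;  /* code units consumed from the instruction stream */
} InstructionEffect;

/* Returns true if every member has the same effect as the first one. */
static bool effects_match(const InstructionEffect *members, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (members[i].popped != members[0].popped ||
            members[i].pushed != members[0].pushed ||
            members[i].cache_units != members[0].cache_units) {
            return false;
        }
    }
    return true;
}
```

A check of this shape would apply equally to the members of a family and to the members of a pseudo instruction.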

Examples
--------
## Examples

(Another source of examples can be found in the [tests](test_generator.py).)

@@ -237,27 +264,6 @@ This would generate:
}
```

### Super-instruction definition

```C
super ( LOAD_FAST__LOAD_FAST ) = LOAD_FAST + LOAD_FAST ;
```
This might get translated into the following:
```C
TARGET(LOAD_FAST__LOAD_FAST) {
PyObject *value;
value = frame->f_localsplus[oparg];
Py_INCREF(value);
PUSH(value);
NEXTOPARG();
next_instr++;
value = frame->f_localsplus[oparg];
Py_INCREF(value);
PUSH(value);
DISPATCH();
}
```

### Input stack effect and cache effect
```C
op ( CHECK_OBJECT_TYPE, (owner, type_version/2 -- owner) ) {
@@ -339,14 +345,26 @@ For explanations see "Generating the interpreter" below.)
}
```

### Define an instruction family
These opcodes all share the same instruction format:
### Defining an instruction family

A _family_ represents a specializable instruction and its specializations.

Example: these opcodes all share the same instruction format:
```C
family(load_attr) = { LOAD_ATTR, LOAD_ATTR_INSTANCE_VALUE, LOAD_SLOT };
```

### Defining a pseudo instruction

A _pseudo instruction_ is used by the bytecode compiler to represent a set of possible concrete instructions.

Example: `JUMP` may expand to `JUMP_FORWARD` or `JUMP_BACKWARD`:
```C
family(load_attr) = { LOAD_ATTR, LOAD_ATTR_INSTANCE_VALUE, LOAD_SLOT } ;
pseudo(JUMP) = { JUMP_FORWARD, JUMP_BACKWARD };
```
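
One way to picture how a pseudo instruction is replaced by a concrete member is the sketch below. It assumes the choice depends only on the jump direction. The function `resolve_jump`, the `ConcreteJump` enum and its values are hypothetical, not the compiler's actual code or opcode numbers.

```c
/* Hypothetical stand-ins for the two concrete opcodes. */
typedef enum { JUMP_FORWARD, JUMP_BACKWARD } ConcreteJump;

/* Assumed rule: a target after the jump is a forward jump,
   anything else is a backward jump. */
static ConcreteJump resolve_jump(int source_offset, int target_offset)
{
    return target_offset > source_offset ? JUMP_FORWARD : JUMP_BACKWARD;
}
```

Because both members must have the same stack and instruction stream effect, the substitution does not change what the surrounding code sees on the stack.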

Generating the interpreter
==========================

## Generating the interpreter

The generated C code for a single instruction includes a preamble and dispatch at the end
which can be easily inserted. What is more complex is ensuring the correct stack effects
@@ -401,9 +419,7 @@ rather than popping and pushing, such that `LOAD_ATTR_SLOT` would look something
}
```

Other tools
===========
## Other tools

From the instruction definitions we can generate the stack marking code used in `frame.set_lineno()`,
and the tables for use by disassemblers.