Commit 3ceda79

author
Anselm Kruis
committed
Stackless issue python#290: bpo-36974, PEP 590 Vectorcall-protocol
Enhance the Stackless-protocol to support the PEP-590 vectorcall protocol. Patch the relevant functions to use the new functionality. Now all test cases pass again.
1 parent e9d9b3d commit 3ceda79

12 files changed

+279
-68
lines changed

Doc/c-api/stackless.rst

Lines changed: 119 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,28 @@
33
|SLP| C-API
44
===========
55

6+
|SLP| uses two fundamentally different methods to switch control
7+
flow from one tasklet to another. One method, called *hard-switching*,
8+
manipulates the C-stack with hardware-dependent assembly code. This is always
9+
possible, but somewhat costly. The other method, called *soft-switching*, is
10+
only possible under special conditions, but is cheap. Moreover, soft-switching
11+
allows the storage (pickling) and recovery (unpickling) of active tasklets.
12+
13+
Soft-switching avoids recursive calls to the |PY| interpreter, such as those
14+
that occur when calling a |PY| function, by maintaining a chained list of
15+
tasks that are processed sequentially. This list consists of
16+
:c:type:`PyFrameObject` and :c:type:`PyCFrameObject`
17+
objects chained by their :c:member:`PyFrameObject.f_back` pointer. In the C-function
18+
:c:func:`slp_dispatch` (and :c:func:`slp_dispatch_top`) the list is processed
19+
in a loop. In order to proceed
20+
to the processing of the next (C)frame, all C-functions involved in the
21+
processing of the current (C)frame must return. A special return value,
22+
the *unwind-token*, is used for this purpose. If a C-function returns the value :c:data:`Py_UnwindToken`,
23+
its caller must add any unfinished tasks to the (C)frame list and return
24+
:c:data:`Py_UnwindToken` itself. It follows that *soft-switching* is only possible if
25+
it is supported by every C-function in the current call chain. If this is not the case,
26+
*hard-switching* remains as a fallback.
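
The dispatch scheme described above can be sketched in plain C. This is a simplified model under stated assumptions, not the |SLP| implementation: ``frame_t``, ``UNWIND_TOKEN``, ``execute_frame``, ``dispatch`` and ``run_chain`` are hypothetical stand-ins for :c:type:`PyFrameObject`, :c:data:`Py_UnwindToken` and :c:func:`slp_dispatch`. A frame that wants to "call" another frame links it into the chain and returns the token instead of recursing.

```c
#include <stddef.h>

#define MAX_FRAMES 8

/* Hypothetical stand-in for PyFrameObject / PyCFrameObject. */
typedef struct frame {
    struct frame *f_back;  /* frame to continue with once this one finishes */
    struct frame *callee;  /* frame this one wants to "call", or NULL */
    int id;
    int started;           /* non-zero once the callee has been scheduled */
} frame_t;

/* The address of this variable plays the role of Py_UnwindToken. */
static int unwind_marker;
#define UNWIND_TOKEN ((void *)&unwind_marker)

static frame_t *current;          /* head of the chained frame list */
static int depth, max_depth;      /* C call depth bookkeeping */
static int finished[MAX_FRAMES];  /* ids in the order the frames finish */
static int n_finished;

static void *execute_frame(frame_t *f)
{
    void *result = f;             /* any non-token value means "finished" */
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    if (!f->started && f->callee != NULL) {
        /* Instead of calling the callee recursively, record the
         * unfinished work (this frame) and unwind to the loop. */
        f->started = 1;
        f->callee->f_back = f;
        current = f->callee;
        result = UNWIND_TOKEN;
    } else {
        finished[n_finished++] = f->id;
    }
    depth--;
    return result;
}

/* Model of the dispatch loop: process the frame list sequentially. */
static void dispatch(frame_t *start)
{
    current = start;
    while (current != NULL) {
        frame_t *f = current;
        if (execute_frame(f) == UNWIND_TOKEN)
            continue;             /* `current` already points at the callee */
        current = f->f_back;      /* frame finished: resume its caller */
    }
}

/* Build a chain of n frames (1 <= n <= MAX_FRAMES) in which frame i
 * calls frame i + 1, run it, and return the maximum C call depth. */
static int run_chain(int n)
{
    frame_t frames[MAX_FRAMES] = {{0}};
    int i;
    depth = max_depth = n_finished = 0;
    for (i = 0; i < n; i++) {
        frames[i].id = i;
        frames[i].callee = (i + 1 < n) ? &frames[i + 1] : NULL;
    }
    dispatch(&frames[0]);
    return max_depth;
}
```

In this model the innermost frame finishes first and the C call depth stays at one regardless of the chain length, which is the property that makes soft-switching cheap.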
27+
628
.. note::
729

830
Some switching functions have a variant with the
@@ -383,21 +405,21 @@ Soft-switchable extension functions
383405
The API for soft-switchable extension functions has been added on a
384406
provisional basis (see :pep:`411` for details.)
385407
386-
A soft switchable extension function or method is a function or method defined
408+
A soft-switchable extension function or method is a function or method defined
387409
by an extension module written in C. In contrast to a normal C-function, you
388410
can soft-switch tasklets while this function executes. Soft-switchable functions
389-
obey the stackless-protocol. At the C-language level
411+
obey the Stackless-protocol. At the C-language level
390412
such a function or method is made from 3 C-definitions:
391413
392414
1. A declaration object of type :c:type:`PyStacklessFunctionDeclaration_Type`.
393415
It declares the soft-switchable function and must be declared as a global
394416
variable.
395417
2. A conventional extension function, that uses
396-
:c:func:`PyStackless_CallFunction` to call the soft switchable function.
418+
:c:func:`PyStackless_CallFunction` to call the soft-switchable function.
397419
3. A C-function of type ``slp_softswitchablefunc``. This function provides the
398-
implementation of the soft switchable function.
420+
implementation of the soft-switchable function.
399421
400-
To create a soft switchable function declaration simply define it as a static
422+
To create a soft-switchable function declaration simply define it as a static
401423
variable and call :c:func:`PyStackless_InitFunctionDeclaration` from your
402424
module init code to initialise it. See the example code in the source
403425
of the extension module `_teststackless <https://github.com/stackless-dev/stackless/blob/master-slp/Stackless/module/_teststackless.c>`_.
@@ -410,7 +432,7 @@ Typedef ``slp_softswitchablefunc``::
410432
411433
.. c:type:: PyStacklessFunctionDeclarationObject
412434
413-
This subtype of :c:type:`PyObject` represents a Stackless soft switchable
435+
This subtype of :c:type:`PyObject` represents a Stackless soft-switchable
414436
extension function declaration object.
415437
416438
Here is the structure definition::
@@ -437,7 +459,7 @@ Typedef ``slp_softswitchablefunc``::
437459
.. c:var:: PyTypeObject PyStacklessFunctionDeclaration_Type
438460
439461
This instance of :c:type:`PyTypeObject` represents the Stackless
440-
soft switchable extension function declaration type.
462+
soft-switchable extension function declaration type.
441463
442464
.. c:function:: int PyStacklessFunctionDeclarationType_CheckExact(PyObject *p)
443465
@@ -446,7 +468,7 @@ Typedef ``slp_softswitchablefunc``::
446468
447469
.. c:function:: PyObject* PyStackless_CallFunction(PyStacklessFunctionDeclarationObject *sfd, PyObject *arg, PyObject *ob1, PyObject *ob2, PyObject *ob3, long n, void *any)
448470
449-
Invoke the soft switchable extension, which is represented by *sfd*.
471+
Invoke the soft-switchable extension, which is represented by *sfd*.
450472
Pass *arg* as initial value for argument *retval* and *ob1*, *ob2*, *ob3*,
451473
*n* and *any* as general purpose in-out-arguments.
452474
@@ -457,42 +479,67 @@ Typedef ``slp_softswitchablefunc``::
457479
Initialize the fields :c:member:`PyStacklessFunctionDeclarationObject.name` and
458480
:c:member:`PyStacklessFunctionDeclarationObject.module_name` of *sfd*.
459481
460-
Within the body of a soft switchable extension function (or any other C-function that obeys the stackless-protocol)
482+
Within the body of a soft-switchable extension function (or any other C-function that obeys the Stackless-protocol)
461483
you need the following macros.
462484
463-
Macros for the "stackless-protocol"
485+
Macros for the "Stackless-protocol"
464486
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
465487
466-
How to does Stackless Python decide, if a function may return an unwind-token?
467-
There is one global variable "_PyStackless_TRY_STACKLESS"[#]_ which is used
468-
like an implicit parameter. Since we don't have a real parameter,
469-
the flag is copied into the local variable "stackless" and cleared.
470-
This is done by the STACKLESS_GETARG() macro, which should be added to
471-
the top of the function's declarations.
472488
473-
The idea is to keep the chances to introduce error to the minimum.
474-
A function can safely do some tests and return before calling
475-
anything, since the flag is in a local variable.
476-
Depending on context, this flag is propagated to other called
477-
functions. They *must* obey the protocol. To make this sure,
478-
the STACKLESS_ASSERT() macro has to be called after every such call.
489+
How does a C-function in |SLP| decide whether it may return
490+
:c:data:`Py_UnwindToken`? (After all, this is only allowed if the caller can handle
491+
:c:data:`Py_UnwindToken`.) The obvious approach would be to add a dedicated function
492+
argument, but that would change the function prototypes and thus
493+
Python's C-API. This is not practical. Instead, the global variable
494+
"_PyStackless_TRY_STACKLESS"[#f1]_ is used as an implicit parameter.
495+
496+
The content of this variable is moved to the local variable "stackless"
497+
at the beginning of a C function. In the process, "_PyStackless_TRY_STACKLESS"
498+
is set to 0, indicating that no unwind-token may be returned.
499+
This is done with the macro :c:func:`STACKLESS_GETARG` or, for vectorcall [#f2]_ functions,
500+
with the macro :c:func:`STACKLESS_VECTORCALL_GETARG`, which should be added at the
501+
beginning of the function's local declarations.
502+
503+
This design minimizes the possibility of introducing errors due to improper
504+
return of :c:data:`Py_UnwindToken`. The function can contain arbitrary code because the
505+
flag is hidden in a local variable. If the function is to support
506+
*soft-switching*, it must be further adapted. The flag may only be passed to
507+
other called functions if they adhere to the Stackless-protocol. The macros
508+
STACKLESS_PROMOTExxx() serve this purpose. To ensure compliance with the
509+
protocol, the macro :c:func:`STACKLESS_ASSERT` must be called after each such call.
510+
An exception is the call of vectorcall functions. The call of a vectorcall
511+
function must be framed with the macros :c:func:`STACKLESS_VECTORCALL_BEFORE` and
512+
:c:func:`STACKLESS_VECTORCALL_AFTER` or - more simply - performed with the macro
513+
:c:func:`STACKLESS_VECTORCALL`.
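
The interplay of the flag and the macros can be illustrated with a small self-contained model. It is only a sketch: ``try_stackless``, ``MODEL_GETARG``, ``MODEL_PROMOTE_ALL``, ``MODEL_ASSERT`` and the functions below are hypothetical names that imitate the behaviour described above. They are not the real macro definitions, which live in the |SLP| headers.

```c
#include <assert.h>
#include <stddef.h>

/* Plays the role of the implicit parameter _PyStackless_TRY_STACKLESS. */
static int try_stackless;

/* Distinct addresses stand in for Py_UnwindToken and an ordinary result. */
static int unwind_marker, result_marker;
#define UNWIND_TOKEN ((void *)&unwind_marker)
#define RESULT       ((void *)&result_marker)

/* Move the global flag into a local value and clear the global. */
static int take_flag(void)
{
    int flag = try_stackless;
    try_stackless = 0;
    return flag;
}

#define MODEL_GETARG()      int stackless = take_flag()
#define MODEL_PROMOTE_ALL() (try_stackless = stackless)
#define MODEL_ASSERT()      assert(try_stackless == 0)

/* A callee that obeys the protocol: it returns the token only if the
 * caller announced that it can handle it. */
static void *leaf(void)
{
    MODEL_GETARG();
    if (stackless)
        return UNWIND_TOKEN;   /* pretend the work was queued as a frame */
    return RESULT;
}

/* An intermediate function that passes the permission on. */
static void *middle(void)
{
    MODEL_GETARG();
    void *r;
    MODEL_PROMOTE_ALL();       /* allowed because leaf() obeys the protocol */
    r = leaf();
    MODEL_ASSERT();            /* the callee must have cleared the flag */
    return r;
}

/* A caller that cannot handle the token never sets the flag. */
static void *plain_caller(void)
{
    return middle();
}

/* A caller that can handle the token sets the flag before the call. */
static void *soft_caller(void)
{
    try_stackless = 1;
    return middle();
}
```

Because the permission travels in a local variable, a function that does nothing special after its ``MODEL_GETARG()`` can never leak the token to a caller that did not ask for it.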
479514
480515
Many internal functions have been patched to support this protocol.
481516
Their first action is a direct or indirect call of the macro
482-
:c:func:`STACKLESS_GETARG`.
517+
:c:func:`STACKLESS_GETARG` or :c:func:`STACKLESS_VECTORCALL_GETARG`.
483518
484519
.. c:function:: STACKLESS_GETARG()
485520
486-
Define the local variable ``int stackless`` and move the global
487-
"_PyStackless_TRY_STACKLESS" flag into the local variable "stackless".
488-
After a call to :c:func:`STACKLESS_GETARG` the value of
521+
Define and initialize the local variable ``int stackless``.
522+
The value of *stackless* is non-zero if the function may return
523+
:c:data:`Py_UnwindToken`.
524+
After a call to :c:func:`STACKLESS_GETARG` the value of the global variable
489525
"_PyStackless_TRY_STACKLESS" is 0.
490526
527+
.. c:function:: STACKLESS_VECTORCALL_GETARG(func)
528+
529+
.. versionadded:: 3.8.0
530+
531+
Vectorcall variant of the macro :c:func:`STACKLESS_GETARG`. Functions of type
532+
:c:type:`vectorcallfunc` must use :c:func:`STACKLESS_VECTORCALL_GETARG` instead
533+
of :c:func:`STACKLESS_GETARG`. The argument *func* must be set to the vectorcall
534+
function itself. See function :c:func:`_PyCFunction_FastCallKeywords` for an example.
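
The difference between the two GETARG variants can be modelled as follows. This sketch rests on an assumption derived from the flag values documented in this commit (0, 1, or the address of a C-function): the plain variant accepts only the value 1, while the vectorcall variant also accepts its own address. All names (``model_func``, ``MODEL_GETARG``, ``MODEL_VECTORCALL_GETARG``, ``call_with_flag``) are hypothetical, and the real :c:type:`vectorcallfunc` signature is reduced to a parameterless function for brevity.

```c
#include <stdint.h>

/* Simplified stand-in for vectorcallfunc (the real type takes
 * callable, args, nargsf and kwnames). */
typedef int (*model_func)(void);

/* Plays the role of _PyStackless_TRY_STACKLESS, now pointer-sized. */
static intptr_t try_stackless;

/* Plain variant: only the value 1 grants permission (assumption). */
static int take_flag(void)
{
    intptr_t flag = try_stackless;
    try_stackless = 0;
    return flag == 1;
}

/* Vectorcall variant: 1 or the function's own address grants
 * permission (assumption based on the documented flag values). */
static int take_flag_for(model_func func)
{
    intptr_t flag = try_stackless;
    try_stackless = 0;
    return flag == 1 || flag == (intptr_t)func;
}

#define MODEL_GETARG()                int stackless = take_flag()
#define MODEL_VECTORCALL_GETARG(func) int stackless = take_flag_for(func)

/* Two vectorcall-style functions; each reports whether it would be
 * allowed to return the unwind token. */
static int aware_func(void)
{
    MODEL_VECTORCALL_GETARG(aware_func);
    return stackless;
}

static int other_func(void)
{
    MODEL_VECTORCALL_GETARG(other_func);
    return stackless;
}

/* A conventional function using the plain variant. */
static int plain_func(void)
{
    MODEL_GETARG();
    return stackless;
}

/* Set the implicit parameter, then call func. */
static int call_with_flag(intptr_t flag, model_func func)
{
    try_stackless = flag;
    return func();
}
```

In this model a flag that carries the address of ``aware_func`` grants permission to that function only; any other function that happens to be called sees the flag as cleared.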
535+
491536
.. c:function:: STACKLESS_PROMOTE_ALL()
492537
493-
All STACKLESS_PROMOTE_xxx macros are used to propagate the stackless-flag
538+
All STACKLESS_PROMOTExxx() macros are used to propagate the stackless-flag
494539
from the local variable "stackless" to the global variable
495-
"_PyStackless_TRY_STACKLESS". The macro :c:func:`STACKLESS_PROMOTE_ALL` does
540+
"_PyStackless_TRY_STACKLESS". These macros can't be used to call a vectorcall [#f2]_ function.
541+
542+
The macro :c:func:`STACKLESS_PROMOTE_ALL` does
496543
this unconditionally. It is used for cases where we know that the called
497544
function will take care of our object, and we need no test. For example,
498545
:c:func:`PyObject_Call` and all other Py{Object,Function,CFunction}_*Call*
@@ -539,6 +586,40 @@ Their first action is a direct or indirect call of the macro
539586
Set the global variable "_PyStackless_TRY_STACKLESS" unconditionally to 0.
540587
Rarely used.
541588
589+
.. c:function:: STACKLESS_VECTORCALL_BEFORE(func)
590+
591+
.. c:function:: STACKLESS_VECTORCALL_AFTER(func)
592+
593+
.. versionadded:: 3.8.0
594+
595+
If a C-function needs to propagate the stackless-flag
596+
from the local variable "stackless" to the global variable
597+
"_PyStackless_TRY_STACKLESS" in order to call a vectorcall [#f2]_ function, it
598+
must frame the call with these macros. Set the argument *func* to the called
599+
function. The called function *func* is not required to support the
600+
Stackless-protocol. [#f3]_ Example:
601+
602+
.. code-block:: C
603+
604+
STACKLESS_GETARG();
605+
vectorcallfunc func = a_vectorcall_function;
606+
607+
/* other code */
608+
609+
STACKLESS_VECTORCALL_BEFORE(func);
610+
PyObject * result = func(callable, args, nargsf, kwnames);
611+
STACKLESS_VECTORCALL_AFTER(func);
612+
return result;
613+
614+
.. c:function:: STACKLESS_VECTORCALL(func, callable, args, nargsf, kwnames)
615+
616+
.. versionadded:: 3.8.0
617+
618+
Call the vectorcall function *func* with the given arguments and return
619+
the result. It is a convenient alternative to the macros
620+
:c:func:`STACKLESS_VECTORCALL_BEFORE` and :c:func:`STACKLESS_VECTORCALL_AFTER`.
621+
The called function *func* is not required to support the Stackless-protocol.
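
Why the called function need not support the protocol can be seen in a small caller-side model. The sketch assumes that the "before" step stores the address of the called function in the flag and that the "after" step clears the flag unconditionally. Both are assumptions about behaviour, and ``MODEL_VECTORCALL_BEFORE``, ``MODEL_VECTORCALL_AFTER``, ``legacy_double`` and ``call_framed`` are hypothetical names.

```c
#include <stdint.h>

/* Simplified stand-in for vectorcallfunc. */
typedef long (*model_func)(long);

/* Plays the role of _PyStackless_TRY_STACKLESS. */
static intptr_t try_stackless;

/* Assumed behaviour: announce permission for exactly one function ... */
#define MODEL_VECTORCALL_BEFORE(func) \
    (try_stackless = stackless ? (intptr_t)(func) : 0)
/* ... and clear the flag afterwards, whatever the callee did. */
#define MODEL_VECTORCALL_AFTER(func) \
    ((void)(func), try_stackless = 0)

/* Records what a protocol-unaware callee found in the flag. */
static intptr_t flag_seen_by_legacy;

/* A function that knows nothing about the protocol: it neither reads
 * the permission nor clears the flag. */
static long legacy_double(long x)
{
    flag_seen_by_legacy = try_stackless;
    return 2 * x;
}

/* Frame the call as described above; `stackless` is the caller's
 * local permission flag. */
static long call_framed(model_func func, long x, int stackless)
{
    long result;
    MODEL_VECTORCALL_BEFORE(func);
    result = func(x);
    MODEL_VECTORCALL_AFTER(func);
    return result;
}
```

Even though ``legacy_double`` never touches the flag, the flag is zero again after the framed call, and the value 1 ("any called C-function may try to be stackless") is never visible to code that did not opt in.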
622+
542623
Examples
543624
~~~~~~~~
544625
@@ -557,9 +638,18 @@ Another, more realistic example is :py:const:`_asyncio._task_step_impl_stackless
557638
"Modules/_asynciomodule.c".
558639
559640
560-
.. [#] Actually "_PyStackless_TRY_STACKLESS" is a macro that expands to a C L-value. As long as
641+
.. [#f1] Actually "_PyStackless_TRY_STACKLESS" is a macro that expands to a C L-value. As long as
561642
|CPY| uses the GIL, this L-value is a global variable.
562643
644+
.. [#f2] See :pep:`590` Vectorcall: a fast calling protocol for CPython
645+
646+
.. [#f3] If a |PY| type supports the :pep:`590` Vectorcall-protocol, the actual :c:type:`vectorcallfunc`
647+
C-function is a per-object property. This speeds up calling vectorcall functions on classes,
648+
but the consequence is that it is no longer possible to use a flag in the type to indicate
649+
whether the vectorcall slot supports the Stackless-protocol. Therefore |SLP|
650+
has special macros to deal with vectorcall functions.
651+
652+
563653
Debugging and monitoring Functions
564654
----------------------------------
565655

Include/cpython/abstract.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -128,7 +128,9 @@ _PyObject_Vectorcall(PyObject *callable, PyObject *const *args,
128128
return _PyObject_MakeTpCall(callable, args, nargs, kwnames);
129129
}
130130
STACKLESS_GETARG();
131+
STACKLESS_VECTORCALL_BEFORE(func);
131132
PyObject *res = func(callable, args, nargsf, kwnames);
133+
STACKLESS_VECTORCALL_AFTER(func);
132134
return _Py_CheckFunctionResult(callable, res, NULL);
133135
}
134136

Include/cpython/slp_structs.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -317,7 +317,7 @@ PyAPI_DATA(PyTypeObject) PyChannel_Type;
317317
******************************************************/
318318

319319
#ifndef _PyStackless_TRY_STACKLESS
320-
PyAPI_DATA(int * const) _PyStackless__TryStacklessPtr;
320+
PyAPI_DATA(intptr_t * const) _PyStackless__TryStacklessPtr;
321321
#define _PyStackless_TRY_STACKLESS (*_PyStackless__TryStacklessPtr)
322322
#endif
323323
#ifndef STACKLESS__GETARG_ASSERT

Include/internal/pycore_slp_pystate.h

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -70,16 +70,28 @@ typedef struct {
7070
*/
7171
struct _stackless_runtime_state {
7272
/*
73-
* flag whether the next call should try to be stackless.
74-
* The protocol is: This flag may be only set if the called
73+
* try_stackless: flag whether the next call should try to be stackless.
74+
*
75+
* Possible values:
76+
* 0: don't be stackless
77+
* 1: any called C-function shall try to be stackless.
78+
* other: only the C-function with address try_stackless shall try to
79+
* be stackless.
80+
*
81+
* The protocol is: This flag may be only set to 1 if the called
7582
* thing supports it. It doesn't matter whether it uses the
7683
* chance, but it *must* set it to zero before returning.
84+
*
85+
* This flag may be set to the address of a directly called C-function.
86+
* It is not required that the called function supports stackless
87+
* calls.
88+
*
7789
* This flag in a way serves as a parameter that we don't have.
7890
*
7991
* As long as the GIL is shared between sub-interpreters,
8092
* try_stackless can be a field in the runtime state.
8193
*/
82-
int try_stackless;
94+
intptr_t try_stackless;
8395

8496
/* Used to manage free C-stack objects, see stacklesseval.c */
8597
int cstack_cachecount;
