-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert static types to heap types: use PyType_FromSpec() #84258
Comments
Some of modules is not using PyType_FromSpec. This changes can bring
|
Let me elaborate. Static types have multiple issues:
|
Wouldn't having less static types slow down startup time? |
Pablo:
That's possible, even if I only expect a minor or non significant overhead. But before starting talking about performances, we should focus on the correctness. Static types are causing issues with subinterpreters and the Python finalization. |
Yes, and not only startup time: |
In the _json module, PyModule_GetState() (called by get_json_state()) is only used by the garbage collector (traverse function) and to unload the module (clear and free functions). It's not used in "hot code" (let me consider that the GC is not part of the usual "hot code" :-)). But maybe we should measure the overhead of future changes if PyModule_GetState() starts to be used in "hot code" by running a few microbenchmarks. -- Stefan Krah:
Aha, that's interesting. I didn't know that it could have an impact on runtime performance as well. _decimal_pep3121-384_v1.patch attached to bpo-15722 seems to use: #define _decimal_state_global ((_decimal_state *)PyModule_GetState(PyState_FindModule(&_decimal_module))) whereas the commit 33f15a1 only uses: static inline _jsonmodulestate*
get_json_state(PyObject *module)
{
void *state = PyModule_GetState(module);
assert(state != NULL);
return (_jsonmodulestate *)state;
} Maybe PyState_FindModule() adds a little overhead, even if this function is simple: in short, it gets the i-th item of a list (from PyInterpreterState.modules_by_index). PyModule_GetState() function is really simple: it only reads PyModuleObject.md_state attribute. Or maybe _decimal_state_global was used in "hot code". If PyState_FindModule() or PyModule_GetState() is the source of the overhead, maybe we could try to optimise these functions, or pass directly the module state to inner functions. -- PyState_FindModule() doesn't work for a module using PyModuleDef_Init(). The PEP-573 is going to give access to the module state in functions which didn't access to it previsouly. The commit 33f15a1 removed a few assertions checking that "self" has the expected type. It wasn't possible to get the module state to get the types, because PEP-573 is not implemented yet and PyState_FindModule() doesn't work in _json (which uses PyModuleDef_Init()). I decided that it's ok to remove these assertions: it should not be possible to call these functions with another type in practice. -- In his PR 19177, Dong-hee started by replacing direct access to PyTypeObject fields, like replacing:
with:
I asked him to revert these changes. I'm interested to experiment a few C extension modules of the stdlib using the limited C API (PEP-384, stable ABI), but I don't think that it should done while converting modules to PyType_FromSpec(). We should separate the two changes. And I would prefer to first see the overhead of PyType_FromSpec(), and discuss the advantages and drawbacks. |
Yes, _decimal has problems here that most modules don't have. I just posted it as an example why one should be a bit cautious.
That could help, we'll see. |
Should we stop the work until the overhead is measured? |
Can you try to measure the _abc._abc_instancecheck() and _abc._abc_subclasscheck() functions performance before/after your change? Functions like register() are rarely call, so I don't care much of their performance (I expect a small overhead or no overhead). |
It shown 1% slower for performance. +--------------------------+-----------------+----------------------------+ +--------------------------+-------------------+----------------------------+ |
I 've submitted the benchmark :) |
IMO 1.01x slower on a *microbenchmark* is not significant so it's ok. Thanks for checking. You pass a *type to isinstance() in bench_isinstance_check.py. You should pass *an instance* instead. Like (complete the (...) ;-)): runner.timeit(name="bench _abc_instancecheck", Would you mind to fix your microbenchmark and re-run it? |
Thanks for the catch my mistake. The result is showing: |
FWIW, I've been considering an approach where the main interpreter |
See also bpo-40601: [C API] Hide static types from the limited C API. |
I tried to finalize static types in Py_Finalize(), but it didn't work: |
PR 20960 (_bz2 module) triggers an interesting question. The effect of converting a static type to a heap type on pickle, especially for protocols 0 and 1. pickle.dumps(o, 0) calls object.__reduce__(o) which calls copyreg._reduce_ex(o, 0). copyreg._reduce_ex() behaves differently on heap types:
There are 3 things which impacts serialization/deserialization:
Examples:
|
Search also for issues with "384" in their title (34 open issues): Search for issues with PEP-3121 keyword (40 open issues): |
Progress: 43% (89/206) of types are declared as heap types on a total of 206 types. $ grep -E 'static PyTypeObject .* =' $(find -name "*.c"|grep -v Doc/)|wc -l
117
$ grep -E 'PyType_Spec .* =' $(find -name "*.c")|wc -l
89 |
Should Modules/_testcapimodule.c stay untouched? |
Yes, please keep _testcapimodule.c as it is. Static types are still supported and need to be tested. |
Ok, perhaps we should leave a comment about that? It already has a comment about multi-phase init. What about Modules/_testbuffer.c and the xx-modules? $ grep -E 'static PyTypeObject .* =' $(find . -name "*.c"|grep -vE '(Doc/|Modules/_testcapimodule)') | wc -l
94
$ grep -E 'PyType_Spec .* =' $(find . -name "*.c")|wc -l (master)cpython.git
95 We're almost halfway there. |
See also bpo-46417 "[subinterpreters] Clear static types in Py_Finalize()". |
Converting further static types to heap types require a discussion. See what the Python Steering Council wrote at Feb 8, 2021: "The Steering Council discussed the ongoing work on porting types in the standard library to heap-types and the subinterpreter-related changes. It was decided that through Pablo, the Steering Council will ask the core developers driving those changes to create an informational PEP and not to make any more changes in this area after beta 1, as per our general policy." |
I used the following shell command to search remaining static types: I found 86 static types in 17 files:
|
Of these 135 static types, most are cleared since bpo-46417 was implemented:
|
I marked bpo-23769 "valgrind reports leaks for test_zipimport" as duplicate of this issue. At exit, Python doesn't clear the static types of the _collections, itertools and _struct extensions:
|
On Windows PC/winreg.c has PyHKEY_Type static type which isn't cleared at exit too. |
PEP 687 was accepted |
Note that this does not imply that all types shall be converted to heap types. This is also mentioned in the PEP; please read it carefully. |
We are on the same page, every module needs to considered individually and I am against converting any built-in modules or types unless it's necessary. |
Superseded by #103092 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: