[subinterpreters] Meta issue: per-interpreter GIL #84692
To be able to run multiple (sub)interpreters in parallel, the unique global interpreter lock, aka "GIL", should be replaced with multiple GILs: one "GIL" per interpreter. The scope of such a per-interpreter GIL would be a single interpreter. The current CPython code base is not fully ready to have one GIL per interpreter. TODO:
Until we can ensure that no Python object is shared between two interpreters, we may need to make PyObject.ob_refcnt, PyGC_Head (_gc_next and _gc_prev) and _dictkeysobject.dk_refcnt atomic. C extension modules should be modified as well.
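As a rough illustration of the overhead being discussed, here is a minimal sketch of an atomic reference count using C11 atomics. The names (shared_object, shared_incref, shared_decref) are invented for this example and do not reflect how CPython actually declares PyObject; the sketch only shows what "making ob_refcnt atomic" would mean per INCREF/DECREF.

```c
/* Illustration only: the shape of an atomic reference count with C11 atomics.
 * shared_object/shared_incref/shared_decref are hypothetical names, not
 * CPython code; they just show the per-INCREF/DECREF cost being discussed. */
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_ptrdiff_t ob_refcnt;   /* stands in for an atomic Py_ssize_t */
    const void *ob_type;          /* placeholder for the type pointer */
} shared_object;

static inline void
shared_incref(shared_object *op)
{
    /* One atomic read-modify-write per INCREF. */
    atomic_fetch_add_explicit(&op->ob_refcnt, 1, memory_order_relaxed);
}

static inline int
shared_decref(shared_object *op)
{
    /* Returns 1 when the count drops to zero and the object can be freed. */
    return atomic_fetch_sub_explicit(&op->ob_refcnt, 1,
                                     memory_order_acq_rel) == 1;
}
```

Every INCREF/DECREF becomes an atomic read-modify-write, which is the single-threaded overhead that the opt-in build option discussed below is meant to avoid while the isolation work is in progress.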
I'm not sure how to handle C extensions which are bindings for a C library that has state and so should not be used multiple times in parallel. Some C extensions use a "global lock" for that; the open question is how such a lock can work across multiple interpreters. Most of these tasks are already tracked in Eric Snow's "Multi Core Python" project. This issue is related to PEP-554 "Multiple Interpreters in the Stdlib", but is not required by that PEP. This issue is a tracker for sub-issues related to the goal "have one GIL per interpreter".

--

Some changes have a negative impact on "single threaded" Python applications. Even if the overhead is low, one option to be able to move faster on this issue may be to add a new temporary configure option: an opt-in build mode to better isolate subinterpreters. Examples:
That would be a temporary solution to "unblock" the development on this list. For the long term, free lists should be made per-interpreter, pymalloc should support multiple interpreters, no Python object should be shared by two interpreters, etc.

--

One idea to detect whether a Python object is shared by two interpreters *in debug mode* would be to store a reference to the interpreter which created it, and then check that the current interpreter is the same. If not, fail with a Python fatal error (a rough sketch of this idea appears after this description).

--

During the Python 3.9 development cycle, many states moved from the global _PyRuntimeState to the per-interpreter PyInterpreterState:
Many corner cases related to daemon threads have also been fixed:
And more code is now shared for the initialization and finalization of the main interpreter and subinterpreters (e.g. see bpo-38858). Subinterpreter builtins and sys modules are now really isolated from the main interpreter (bpo-38858).

--

Obviously, there are likely tons of other issues which are not known at this stage. Again, this issue is a placeholder to track them all. It may be more efficient to create one sub-issue per sub-task, rather than discussing all tasks in the same place. |
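As a purely hypothetical sketch of the debug-mode "owning interpreter" idea described above: record the creating interpreter on each object and abort if the object is later used from a different interpreter. The names owned_object and check_owning_interp are invented for illustration; CPython does not implement such a field.

```c
/* Hypothetical sketch of the debug-mode check described in the issue body:
 * record the creating interpreter on each object and fail fast if the object
 * is later used from a different interpreter. owned_object and
 * check_owning_interp are invented names, not CPython code. */
#include <Python.h>

#ifdef Py_DEBUG
typedef struct {
    PyObject_HEAD
    PyInterpreterState *owning_interp;  /* interpreter that created the object */
} owned_object;

static void
check_owning_interp(owned_object *op)
{
    /* PyInterpreterState_Get() (Python 3.9+) returns the interpreter of the
     * calling thread; it must be called with a thread state attached. */
    if (op->owning_interp != PyInterpreterState_Get()) {
        Py_FatalError("Python object shared by two interpreters");
    }
}
#endif
```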
I created bpo-40513: "Move _PyRuntimeState.ceval to PyInterpreterState". |
I created bpo-40514: "Add --experimental-isolated-subinterpreters build option". |
I created bpo-40522: "Subinterpreters: get the current Python interpreter state from Thread Local Storage (autoTSSkey)". |
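For context, this is how an extension can already fetch the interpreter of the calling thread through the public C API (Python 3.9+); the bpo issue above is about how CPython resolves this internally through a thread-local (autoTSSkey) rather than a global. A minimal sketch:

```c
/* Sketch (public C API, Python 3.9+): get the interpreter that owns the
 * calling thread. Must be called with the GIL / a thread state held. */
#include <Python.h>

static PyInterpreterState *
current_interp(void)
{
    PyThreadState *tstate = PyThreadState_Get();    /* current thread state */
    return PyThreadState_GetInterpreter(tstate);    /* its interpreter */
}
```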
Attached demo.py: benchmark to compare performance of sequential execution, threads and subinterpreters. |
(oops, there was a typo in my script: the threads and subinterpreters benchmarks were actually the same) |
Hum, demo.py is not reliable for threads: the standard deviation is quite large. I rewrote it using pyperf to compute the average and the standard deviation. |
I updated demo-pyperf.py to also benchmark multiprocessing. |
I created bpo-40533: "Subinterpreters: don't share Python objects between interpreters". |
See also bpo-39465: "Design a subinterpreter friendly alternative to _Py_IDENTIFIER". Currently, this C API is not compatible with subinterpreters. |
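For context, a minimal sketch of how _Py_IDENTIFIER was typically used around Python 3.9 (an internal, underscore-prefixed API; get_name_attr is an invented helper). The macro creates a static _Py_Identifier whose cached interned string is one PyObject shared by the whole process, which is exactly why it is not subinterpreter-friendly:

```c
/* Typical _Py_IDENTIFIER usage (Python 3.9-era internal API). The static
 * _Py_Identifier caches one interned PyObject* for the whole process, so the
 * same Python object ends up shared by all interpreters. */
#include <Python.h>

static PyObject *
get_name_attr(PyObject *obj)
{
    _Py_IDENTIFIER(name);                         /* static, process-wide cache */
    return _PyObject_GetAttrId(obj, &PyId_name);  /* new reference, or NULL */
}
```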
By the way, tracemalloc is not compatible with subinterpreters. test.support.run_in_subinterp() skips the test if tracemalloc is tracing. |
I marked bpo-36877 "[subinterpreters][meta] Move fields from _PyRuntimeState to PyInterpreterState" as a duplicate of this issue. |
I created a new "Subinterpreters" component in the bug tracker. It may help to better track all issues related to subinterpreters. |
Currently, the import lock is shared by all interpreters. It would also help for performance to make it per-interpreter to parallelize imports. |
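A purely hypothetical sketch of what a "per-interpreter import lock" would mean in practice, assuming an invented per-interpreter struct (my_import_state); only the PyThread_* lock API shown here is real CPython API:

```c
/* Hypothetical sketch of a per-interpreter import lock: instead of a single
 * process-wide lock, each interpreter owns its own lock, so imports running
 * in different interpreters no longer serialize each other. The struct and
 * field names are invented for illustration; this is not CPython's code. */
#include <Python.h>
#include <pythread.h>   /* PyThread_* lock API (also pulled in by Python.h) */

typedef struct {
    PyThread_type_lock import_lock;   /* one lock per interpreter (hypothetical) */
} my_import_state;

static int
acquire_import_lock(my_import_state *state)
{
    if (state->import_lock == NULL) {
        state->import_lock = PyThread_allocate_lock();
        if (state->import_lock == NULL) {
            return -1;
        }
    }
    PyThread_acquire_lock(state->import_lock, WAIT_LOCK);
    return 0;
}

static void
release_import_lock(my_import_state *state)
{
    PyThread_release_lock(state->import_lock);
}
```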
Update of the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS status. I made many free lists and singletons per interpreter in bpo-40521. TODO:
I'm investigating the performance of my _PyUnicode_FromId() PR: #20058. This PR now uses the "atomic functions" proposed in a second PR: #20766. The atomic functions avoid the need to declare a variable or a structure member as atomic, which would cause various issues if they were declared in Python's public headers (which is the case for the _Py_Identifier used by _PyUnicode_FromId()). |
Also:
Misc notes:
|
FYI I'm also using https://pythondev.readthedocs.io/subinterpreters.html to track the progress on isolating subinterpreters. |
See also bpo-15751: "Make the PyGILState API compatible with subinterpreters". |
I created bpo-42745: "[subinterpreters] Make the type attribute lookup cache per-interpreter". |
I played with ./configure --with-experimental-isolated-subinterpreters. I tried to run "pip list" in parallel in multiple interpreters. I hit multiple issues:
To run "pip list", I used: CODE = """
import runpy
import sys
import traceback
sys.argv = ["pip", "list"]
try:
runpy.run_module("pip", run_name="__main__", alter_sys=True)
except SystemExit:
pass
except Exception as exc:
traceback.print_exc()
print("BUG", exc)
raise
""" |
Attached resolve_slotdups.patch works around the issue by removing the cache. |
FYI I wrote an article about this issue: "Isolate Python Subinterpreters" |
See bpo-43313: "feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state". |
PyStructSequence_InitType2() is not compatible with subinterpreters: it uses static types. Moreover, it allocates tp_members memory which is not released when the type is destroyed. But I'm not sure that the type is ever destroyed, since this API is designed for static types. |
IMO, we should create a new function, PyStructSequence_FromModuleAndDesc(module, desc, flags), which creates a heap type and does not allocate a separate memory block for tp_members, something like PyType_FromModuleAndSpec(). I don't know whether there is anything blocking this conversion, but I can take a look. @petr ping, Petr, do you have any better idea about this question :) |
Hai Shi:
Please create a new issue. If possible, I would prefer to have a sub-issue for that, to keep this issue as a tracking issue for all issues related to subinterpreters. |
bpo-45113: [subinterpreters][C API] Add a new function to create PyStructSequence from Heap. |
PyStructSequence_NewType exists, and is the same as the proposed PyStructSequence_FromModuleAndDesc except it doesn't take the module (which isn't necessary: PyStructSequence_Desc has no way to define functionality that would need the module state). |
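To make that concrete, here is a small sketch using the documented PyStructSequence_NewType() to build the type as a heap type from a PyStructSequence_Desc. The "example.point" type is illustrative, and a real extension would keep the type pointer in per-module state rather than a C static:

```c
/* Sketch: building the struct-sequence type as a heap type with the existing
 * PyStructSequence_NewType(), so no static PyTypeObject is shared between
 * interpreters. The "example.point" type is purely illustrative. */
#include <Python.h>

static PyStructSequence_Field point_fields[] = {
    {"x", "x coordinate"},
    {"y", "y coordinate"},
    {NULL, NULL},
};

static PyStructSequence_Desc point_desc = {
    "example.point",   /* fully qualified name */
    "A 2D point",      /* docstring */
    point_fields,
    2,                 /* fields visible from Python */
};

static PyTypeObject *PointType = NULL;   /* illustrative; use module state in real code */

static int
init_point_type(void)
{
    PointType = PyStructSequence_NewType(&point_desc);
    return (PointType != NULL) ? 0 : -1;
}

static PyObject *
make_point(long x, long y)
{
    PyObject *p = PyStructSequence_New(PointType);
    if (p == NULL) {
        return NULL;
    }
    /* NULL checks on PyLong_FromLong() omitted for brevity. */
    PyStructSequence_SetItem(p, 0, PyLong_FromLong(x));   /* steals reference */
    PyStructSequence_SetItem(p, 1, PyLong_FromLong(y));
    return p;
}
```

Since the result is a heap type, each interpreter that runs the module's init code gets its own type object instead of sharing a static PyTypeObject.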
Sadly, I don't have the bandwidth to work on this issue, so I am closing it. @ericsnowcurrently is now working on https://peps.python.org/pep-0684/ which is a little bit different. |