At this point it makes sense to have a PEP about interpreter isolation. This will be a companion to PEP 554. The PEP will cover the following:
- objective
  - enable multi-core parallelism for Python code (incl. per-interpreter GIL)
  - concurrency model (see PEP 554)
- strategy: make static globals per-interpreter
  - move globals to `PyInterpreterState` (or module state)
    - incl. `PyInterpreterState.global_objects` (`Include/internal/pycore_global_objects.h`)
    - share pointers in main interpreter with subinterpreters
  - tool to identify globals (& test to verify)
- compatibility (fully compatible)
- maintenance burden impact (more maintainable?)
- performance impact
- C-API obstacles
  - N objects exposed in public C-API
    - N exception types (`PyObject *` variable)
    - N other types (`PyTypeObject` variable, not pointer)
    - 5 singletons (macro to address of `PyObject` variable, not pointer)
  - most are in limited API (N exceptions, M other types, 5 singletons)
  - solution A:
    - add per-interpreter lookup functions to C-API
    - replace objects with calls to the per-interpreter lookup functions
    - stop exposing the objects directly in `Include/*.h`; keep using them for the main interpreter
    - non-pointer objects require trickiness
    - limited API (< 3.11)
      - keep exporting all existing symbols from object files (used for main interpreter too)
      - disallow such extensions in subinterpreters
  - solution B:
    - make all the C-API objects "immortal"
  - solution C:
    - (too much work and too fragile)
    - for the main interpreter use the existing objects as-is
    - for subinterpreters, do a lookup using the existing objects as keys into a per-interpreter mapping
    - requires that every C-API function possibly taking one of the objects be updated to do that lookup on its args
- extension modules
  - concerns & impact
  - mitigation strategy
  - assistance
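
To make solution C from the outline above more concrete, here is a rough model in plain Python. It is only a sketch: `StaticObject`, `Interpreter`, and `resolve()` are hypothetical names invented for illustration, and the real lookup would have to happen in C inside every affected C-API function.

```python
class StaticObject:
    """Stand-in for a statically allocated C-API object (hypothetical)."""

    def __init__(self, name):
        self.name = name


# Plays the role of a global such as an exception type exposed in a header.
STATIC_VALUE_ERROR = StaticObject("ValueError")


class Interpreter:
    """Toy model of an interpreter with its own object mapping."""

    def __init__(self, is_main):
        self.is_main = is_main
        # Maps id(static object) -> this interpreter's own copy.
        self._copies = {}

    def resolve(self, obj):
        # The main interpreter keeps using the existing objects as-is.
        if self.is_main or not isinstance(obj, StaticObject):
            return obj
        # Subinterpreters use the static object as a key into their mapping.
        key = id(obj)
        if key not in self._copies:
            self._copies[key] = StaticObject(obj.name)
        return self._copies[key]
```

The model also hints at why the outline calls this approach fragile: every entry point that might receive one of the static objects has to remember to call the lookup first.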
draft
PEP: NNN
Title: Isolating Multiple Interpreters in a Process, including the GIL
Author: Eric Snow <ericsnowcurrently@gmail.com>
BDFL-Delegate: ...
Status: Draft
Type: Standards Track ???
Content-Type: text/x-rst
Created: DD-MMM-2021
Python-Version: 3.11
Post-History: DD-MMM-2021
Abstract
========
CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C-API. [c-api]_ PEP 554 discusses some of the value of
subinterpreters and the merits of exposing them to Python code.
However, that PEP purposefully avoids discussion about isolation,
especially related to the GIL. This PEP fills that role.
The more isolation there is between interpreters, the more value they
can offer. Currently subinterpreters operate in
`relative isolation from one another <Interpreter Isolation_>`_. If they
were fully isolated then they could operate in parallel on multi-core
hosts.
This proposal identifies a path forward to reach full isolation between
interpreters. This includes making the GIL per-interpreter.
Proposal
========
TBD
Rationale
=========
TBD
Concerns
--------
TBD
About Subinterpreters
=====================
(copied from PEP 554, needs editing)
Concurrency
-----------
Concurrency is a challenging area of software development. Decades of
research and practice have led to a wide variety of concurrency models,
each with different goals. Most center on correctness and usability.
One class of concurrency models focuses on isolated threads of
execution that interoperate through some message passing scheme. A
notable example is `Communicating Sequential Processes`_ (CSP) (upon
which Go's concurrency is roughly based). The isolation inherent to
subinterpreters makes them well-suited to this approach.
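
The message-passing discipline described above can be sketched with the standard library alone. This example uses a thread as a stand-in for an interpreter, which is an assumption made purely for illustration: threads are not isolated, so only the communication pattern carries over. The names ``worker`` and ``run_demo`` are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Square each number received until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


def run_demo(values):
    """Send values to one worker and collect its replies in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for value in values:
        inbox.put(value)
    inbox.put(None)  # tell the worker to stop
    thread.join()
    return [outbox.get() for _ in values]
```

The worker never touches the caller's variables; everything it knows arrives through ``inbox`` and everything it produces leaves through ``outbox``.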
Shared data
-----------
Unlike threads, subinterpreters are inherently isolated (with caveats
explained below), so the communicate-via-shared-memory approach that
threads rely on does not work. Without an alternative, effective use of
concurrency via subinterpreters is significantly limited.
The key challenge here is that sharing objects between interpreters
faces complexity due to various constraints on object ownership,
visibility, and mutability. At a conceptual level it's easier to
reason about concurrency when objects only exist in one interpreter
at a time. At a technical level, CPython's current memory model
limits how Python *objects* may be shared safely between interpreters;
effectively objects are bound to the interpreter in which they were
created. Furthermore, the complexity of *object* sharing increases as
subinterpreters become more isolated, e.g. after GIL removal.
Consequently, the mechanism for sharing needs to be carefully considered.
There are a number of valid solutions, several of which may be
appropriate to support in Python. This proposal provides a single basic
solution: "channels". Ultimately, any other solution will look similar
to the proposed one, which will set the precedent. Note that the
implementation of ``Interpreter.run()`` will be done in a way that
allows for multiple solutions to coexist, but doing so is not
technically a part of the proposal here.
Regarding the proposed solution, "channels", it is a basic, opt-in data
sharing mechanism that draws inspiration from pipes, queues, and CSP's
channels. [fifo]_
As described in the API summary of PEP 554,
channels have two operations: send and receive. A key characteristic
of those operations is that channels transmit data derived from Python
objects rather than the objects themselves. When objects are sent,
their data is extracted. When the "object" is received in the other
interpreter, the data is converted back into an object owned by that
interpreter.
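
A toy model may help show what "data derived from objects" means in practice. The ``ModelChannel`` class below is hypothetical and runs inside a single interpreter; it is not the PEP 554 API and it handles only ``str`` and ``bytes`` to stay short. What it illustrates is that the channel stores raw bytes plus a type tag, never the original object.

```python
from collections import deque


class ModelChannel:
    """Single-interpreter model of a channel (illustration only)."""

    def __init__(self):
        self._items = deque()

    def send(self, obj):
        # Extract plain data from the object; the object itself is not kept.
        if isinstance(obj, str):
            self._items.append(("str", obj.encode("utf-8")))
        elif isinstance(obj, bytes):
            self._items.append(("bytes", bytes(obj)))
        else:
            raise TypeError(f"cannot send {type(obj).__name__}")

    def recv(self):
        # Convert the stored data back into an object for the receiver.
        kind, raw = self._items.popleft()
        return raw.decode("utf-8") if kind == "str" else raw
```

The receiver gets an equal value rather than a reference to the sender's object, which is the property that lets each interpreter own its objects outright.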
To make this work, the mutable shared state will be managed by the
Python runtime, not by any of the interpreters. Initially we will
support only one type of object for shared state: the channels provided
by ``create_channel()``. Channels, in turn, will carefully manage
passing objects between interpreters.
This approach, including keeping the API minimal, helps us avoid further
exposing any underlying complexity to Python users. Along those same
lines, we will initially restrict the types that may be passed through
channels to the following:
* None
* bytes
* str
* int
* channels
Limiting the initial shareable types is a practical matter, reducing
the potential complexity of the initial implementation. There are a
number of strategies we may pursue in the future to expand supported
objects and object sharing strategies.
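
The allow-list above could be checked along the following lines. This is a sketch under stated assumptions: ``ChannelID`` is a hypothetical stand-in for a channel handle, ``is_shareable`` is an invented helper, and matching on the exact type (which rejects subclasses, including ``bool``) is a choice made for this example rather than something the text specifies.

```python
class ChannelID:
    """Hypothetical stand-in for a channel handle; not a real API."""

    def __init__(self, cid):
        self.cid = cid


# The initial set of shareable types listed above.
SHAREABLE_TYPES = (type(None), bytes, str, int, ChannelID)


def is_shareable(obj):
    """Return True if obj's exact type is on the allow-list."""
    # Assumption: subclasses are rejected because they may carry extra state.
    return type(obj) in SHAREABLE_TYPES
```

Keeping the check to a fixed tuple makes it easy to extend later as more types become shareable.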
Interpreter Isolation
---------------------
CPython's interpreters are intended to be strictly isolated from each
other. Each interpreter has its own copy of all modules, classes,
functions, and variables. The same applies to state in C, including in
extension modules. The CPython C-API docs explain more. [caveats]_
However, there are ways in which interpreters share some state. First
of all, some process-global state remains shared:
* file descriptors
* builtin types (e.g. dict, bytes)
* singletons (e.g. None)
* underlying static module data (e.g. functions) for
builtin/extension/frozen modules
There are no plans to change this.
Second, some isolation is faulty due to bugs or implementations that did
not take subinterpreters into account. This includes things like
extension modules that rely on C globals. [cryptography]_ In these
cases bugs should be opened (some are already):
* readline module hook functions (http://bugs.python.org/issue4202)
* memory leaks on re-init (http://bugs.python.org/issue21387)
Finally, some potential isolation is missing due to the current design
of CPython. Work is under way to close the following gaps:
* GC is not run per-interpreter [global-gc]_
* at-exit handlers are not run per-interpreter [global-atexit]_
* extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
* interpreters share memory management (e.g. allocators, gc)
* interpreters share the GIL
Existing Usage
--------------
Subinterpreters are not a widely used feature. In fact, the only
documented cases of widespread usage are
`mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_,
`OpenStack Ceph <https://github.com/ceph/ceph/pull/14971>`_, and
`JEP <https://github.com/ninia/jep>`_. On the one hand, these cases
provide confidence that existing subinterpreter support is relatively
stable. On the other hand, there isn't much of a sample size from which
to judge the utility of the feature.
Alternate Python Implementations
================================
(not affected? this is CPython-only)
Interpreter "Isolated" Mode
===========================
(copied from PEP 554, needs editing)
By default, every new interpreter created by ``interpreters.create()``
has specific restrictions on any code it runs. This includes the
following:
* importing an extension module fails if it does not implement the
PEP 489 API
* new threads of any kind are not allowed
* ``os.fork()`` is not allowed (so no ``multiprocessing``)
* ``os.exec*()``, AKA "fork+exec", is not allowed (so no ``subprocess``)
This represents the full "isolated" mode of subinterpreters. It is
applied when ``interpreters.create()`` is called with the "isolated"
keyword-only argument set to ``True`` (the default). If
``interpreters.create(isolated=False)`` is called then none of those
restrictions is applied.
One advantage of this approach is that it allows extension maintainers
to check subinterpreter compatibility before they implement the PEP 489
API. Also note that ``isolated=False`` represents the historical
behavior when using the existing subinterpreters C-API, thus providing
backward compatibility. For the existing C-API itself, the default
remains ``isolated=False``. The same is true for the "main" module, so
existing use of Python will not change.
We may choose to later loosen some of the above restrictions or provide
a way to enable/disable granular restrictions individually. Regardless,
requiring PEP 489 support from extension modules will always be a
default restriction.
Documentation
=============
TBD
Deferred Functionality
======================
TBD
Rejected Ideas
==============
TBD
Implementation
==============
TBD
References
==========
.. [c-api]
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support
.. [caveats]
   https://docs.python.org/3/c-api/init.html#bugs-and-caveats
.. [petr-c-ext]
   https://mail.python.org/pipermail/import-sig/2016-June/001062.html
   https://mail.python.org/pipermail/python-ideas/2016-April/039748.html
.. [cryptography]
   https://github.com/pyca/cryptography/issues/2299
.. [global-gc]
   http://bugs.python.org/issue24554
.. [gilstate]
   https://bugs.python.org/issue10915
   http://bugs.python.org/issue15751
.. [global-atexit]
   https://bugs.python.org/issue6531
.. [bug-rate]
   https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
.. [benefits]
   https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
.. [main-thread]
   https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
   https://mail.python.org/pipermail/python-dev/2017-September/149566.html
.. [reset_globals]
   https://mail.python.org/pipermail/python-dev/2017-September/149545.html
.. [multi-core-project]
   https://github.com/ericsnowcurrently/multi-core-python
.. [cache-line-ping-pong]
   https://mail.python.org/archives/list/python-dev@python.org/message/3HVRFWHDMWPNR367GXBILZ4JJAUQ2STZ/
.. [extension-docs]
   https://docs.python.org/3/extending/index.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: