diff --git a/Architecture.md b/Architecture.md index 2f66598c7e3..2b081a487e4 100644 --- a/Architecture.md +++ b/Architecture.md @@ -99,7 +99,7 @@ Thus, when copying a Rust struct to a Python object, we first allocate `PyClassO move `T` into it. The primary way to interact with Python objects implemented in Rust is through the `Bound<'py, T>` smart pointer. -By having the `'py` lifetime of the `Python<'py>` token, this ties the lifetime of the `Bound<'py, T>` smart pointer to the lifetime of the GIL and allows PyO3 to call Python APIs at maximum efficiency. +By having the `'py` lifetime of the `Python<'py>` token, this ties the lifetime of the `Bound<'py, T>` smart pointer to the lifetime for which the thread is attached to the Python interpreter and allows PyO3 to call Python APIs at maximum efficiency. `Bound<'py, T>` requires that `T` implements `PyClass`. This trait is somewhat complex and derives many traits, but the most important one is `PyTypeInfo` diff --git a/Cargo.toml b/Cargo.toml index 030174f62e9..a6e0ed838c2 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -127,7 +127,8 @@ generate-import-lib = ["pyo3-ffi/generate-import-lib"] # Changes `Python::attach` to automatically initialize the Python interpreter if needed. auto-initialize = [] -# Enables `Clone`ing references to Python objects `Py<T>` which panics if the GIL is not held. +# Enables `Clone`ing references to Python objects `Py<T>` which panics if the +# thread is not attached to the Python interpreter. py-clone = [] # Adds `OnceExt` and `MutexExt` implementations to the `parking_lot` types diff --git a/guide/src/class.md b/guide/src/class.md index 8eb71669d02..d296339044c 100644 --- a/guide/src/class.md +++ b/guide/src/class.md @@ -203,9 +203,10 @@ mod my_module { It is often useful to turn a `#[pyclass]` type `T` into a Python object and access it from Rust code. The [`Py<T>`] and [`Bound<'py, T>`] smart pointers are the ways to represent a Python object in PyO3's API.
More detail can be found about them [in the Python objects](./types.md#pyo3s-smart-pointers) section of the guide. -Most Python objects do not offer exclusive (`&mut`) access (see the [section on Python's memory model](./python-from-rust.md#pythons-memory-model)). However, Rust structs wrapped as Python objects (called `pyclass` types) often *do* need `&mut` access. Due to the GIL, PyO3 *can* guarantee exclusive access to them. +Most Python objects do not offer exclusive (`&mut`) access (see the [section on Python's memory model](./python-from-rust.md#pythons-memory-model)). However, Rust structs wrapped as Python objects (called `pyclass` types) often *do* need `&mut` access. +However, the Rust borrow checker cannot reason about `&mut` references once an object's ownership has been passed to the Python interpreter. -The Rust borrow checker cannot reason about `&mut` references once an object's ownership has been passed to the Python interpreter. This means that borrow checking is done at runtime using with a scheme very similar to `std::cell::RefCell`. This is known as [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html). +To solve this, PyO3 does borrow checking at runtime using a scheme very similar to `std::cell::RefCell`. This is known as [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html). Users who are familiar with `RefCell` can use `Py<T>` and `Bound<'py, T>` just like `RefCell`. @@ -685,7 +686,8 @@ impl MyClass { } ``` -Calls to these methods are protected by the GIL, so both `&self` and `&mut self` can be used. +Both `&self` and `&mut self` can be used, due to the use of [runtime borrow checking](#bound-and-interior-mutability). + The return type must be `PyResult<T>` or `T` for some `T` that implements `IntoPyObject`; the latter is allowed if the method cannot raise Python exceptions.
@@ -828,7 +830,12 @@ impl MyClass { ## Classes as function arguments -Free functions defined using `#[pyfunction]` interact with classes through the same mechanisms as the self parameters of instance methods, i.e. they can take Python-bound references, Python-bound reference wrappers or Python-independent references: +Class objects can be used as arguments to `#[pyfunction]`s and `#[pymethods]` in the same way as the self parameters of instance methods, i.e. they can be passed as: +- `Py<T>` or `Bound<'py, T>` smart pointers to the class Python object, +- `&T` or `&mut T` references to the Rust data contained in the Python object, or +- `PyRef<T>` and `PyRefMut<T>` reference wrappers. + +Examples of each of these are shown below: ```rust,no_run # #![allow(dead_code)] @@ -838,21 +845,21 @@ struct MyClass { my_field: i32, } -// Take a reference when the underlying `Bound` is irrelevant. +// Take a reference to Rust data when the Python object is irrelevant. #[pyfunction] fn increment_field(my_class: &mut MyClass) { my_class.my_field += 1; } // Take a reference wrapper when borrowing should be automatic, -// but interaction with the underlying `Bound` is desired. +// but access to the Python object is still needed. #[pyfunction] -fn print_field(my_class: PyRef<'_, MyClass>) { +fn print_field_and_return_me(my_class: PyRef<'_, MyClass>) -> PyRef<'_, MyClass> { println!("{}", my_class.my_field); + my_class } -// Take a reference to the underlying Bound -// when borrowing needs to be managed manually. +// Take (a reference to) a Python object smart pointer when borrowing needs to be managed manually. #[pyfunction] fn increment_then_print_field(my_class: &Bound<'_, MyClass>) { my_class.borrow_mut().my_field += 1; @@ -860,7 +867,8 @@ fn increment_then_print_field(my_class: &Bound<'_, MyClass>) { println!("{}", my_class.borrow().my_field); } -// Take a GIL-independent reference when you want to store the reference elsewhere.
+// When the Python object smart pointer needs to be stored elsewhere prefer `Py<T>` over `Bound<'py, T>` +// to avoid the lifetime restrictions. #[pyfunction] fn print_refcnt(my_class: Py<MyClass>, py: Python<'_>) { println!("{}", my_class.get_refcnt(py)); diff --git a/guide/src/conversions/tables.md b/guide/src/conversions/tables.md index 9c2bd0add1f..498e7028a53 100644 --- a/guide/src/conversions/tables.md +++ b/guide/src/conversions/tables.md @@ -51,9 +51,9 @@ It is also worth remembering the following special types: | What | Description | | ---------------- | ------------------------------------- | -| `Python<'py>` | A GIL token, used to pass to PyO3 constructors to prove ownership of the GIL. | -| `Bound<'py, T>` | A Python object connected to the GIL lifetime. This provides access to most of PyO3's APIs. | -| `Py<T>` | A Python object isolated from the GIL lifetime. This can be sent to other threads. | +| `Python<'py>` | A token used to prove attachment to the Python interpreter. | +| `Bound<'py, T>` | A Python object with a lifetime which binds it to the attachment to the Python interpreter. This provides access to most of PyO3's APIs. | +| `Py<T>` | A Python object not connected to any lifetime of attachment to the Python interpreter. This can be sent to other threads. | | `PyRef<T>` | A `#[pyclass]` borrowed immutably. | | `PyRefMut<T>` | A `#[pyclass]` borrowed mutably. | diff --git a/guide/src/faq.md b/guide/src/faq.md index e00bec71f76..3743a7fad72 100644 --- a/guide/src/faq.md +++ b/guide/src/faq.md @@ -166,7 +166,8 @@ print(f"a: {a}\nb: {b}") a: b: ``` -The downside to this approach is that any Rust code working on the `Outer` struct now has to acquire the GIL to do anything with its field. +The downside to this approach is that any Rust code working on the `Outer` struct potentially has to attach to the Python interpreter to do anything with the `inner` field.
(If `Inner` is `#[pyclass(frozen)]` and implements `Sync`, then `Py::get` +may be used to access the `Inner` contents from `Py<Inner>` without needing to attach to the interpreter.) ## I want to use the `pyo3` crate re-exported from dependency but the proc-macros fail! diff --git a/guide/src/features.md b/guide/src/features.md index f9706b586b5..efb0b90646b 100644 --- a/guide/src/features.md +++ b/guide/src/features.md @@ -67,11 +67,11 @@ This is a first step towards adding first-class support for generating type anno ### `py-clone` -This feature was introduced to ease migration. It was found that delayed reference counts cannot be made sound and hence `Clon`ing an instance of `Py<T>` must panic without the GIL being held. To avoid migrations introducing new panics without warning, the `Clone` implementation itself is now gated behind this feature. +This feature was introduced to ease migration. It was found that delayed reference counting (which PyO3 used historically) could not be made sound and hence `Clone`-ing an instance of `Py<T>` is impossible when not attached to the Python interpreter (it will panic). To avoid migrations introducing new panics without warning, the `Clone` implementation itself is now gated behind this feature. ### `pyo3_disable_reference_pool` -This is a performance-oriented conditional compilation flag, e.g. [set via `$RUSTFLAGS`][set-configuration-options], which disabled the global reference pool and the associated overhead for the crossing the Python-Rust boundary. However, if enabled, `Drop`ping an instance of `Py<T>` without the GIL being held will abort the process. +This is a performance-oriented conditional compilation flag, e.g. [set via `$RUSTFLAGS`][set-configuration-options], which disabled the global reference pool and the associated overhead for the crossing the Python-Rust boundary. However, if enabled, `Drop`ping an instance of `Py<T>` when not attached to the Python interpreter will abort the process.
### `macros` diff --git a/guide/src/free-threading.md b/guide/src/free-threading.md index 317e413a1aa..9c6186a693d 100644 --- a/guide/src/free-threading.md +++ b/guide/src/free-threading.md @@ -1,26 +1,26 @@ # Supporting Free-Threaded CPython -CPython 3.13 introduces an experimental "free-threaded" build of CPython that -does not rely on the [global interpreter -lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) -(often referred to as the GIL) for thread safety. As of version 0.23, PyO3 also -has preliminary support for building Rust extensions for the free-threaded -Python build and support for calling into free-threaded Python from Rust. - -If you want more background on free-threaded Python in general, see the [what's -new](https://docs.python.org/3/whatsnew/3.13.html#whatsnew313-free-threaded-cpython) -entry in the 3.13 release notes, the [free-threading HOWTO -guide](https://docs.python.org/3/howto/free-threading-extensions.html#freethreading-extensions-howto) -in the CPython docs, the [extension porting -guide](https://py-free-threading.github.io/porting-extensions/) in the -community-maintained Python free-threading guide, and [PEP -703](https://peps.python.org/pep-0703/), which provides the technical background +CPython 3.14 declared support for the "free-threaded" build of CPython that +does not rely on the [global interpreter lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) +(often referred to as the GIL) for thread safety. Since version 0.23, PyO3 +supports building Rust extensions for the free-threaded Python build and +calling into free-threaded Python from Rust. 
+ +If you want more background on free-threaded Python in general, see the +[what's new](https://docs.python.org/3/whatsnew/3.13.html#whatsnew313-free-threaded-cpython) +entry in the 3.13 release notes (when the "free-threaded" build was first added as an experimental +mode), the +[free-threading HOWTO guide](https://docs.python.org/3/howto/free-threading-extensions.html#freethreading-extensions-howto) +in the CPython docs, the +[extension porting guide](https://py-free-threading.github.io/porting-extensions/) +in the community-maintained Python free-threading guide, and +[PEP 703](https://peps.python.org/pep-0703/), which provides the technical background for the free-threading implementation in CPython. -In the GIL-enabled build, the global interpreter lock serializes access to the -Python runtime. The GIL is therefore a fundamental limitation to parallel -scaling of multithreaded Python workflows, due to [Amdahl's -law](https://en.wikipedia.org/wiki/Amdahl%27s_law), because any time spent +In the GIL-enabled build (the only choice before the "free-threaded" build was introduced), +the global interpreter lock serializes access to the Python runtime. The GIL is therefore +a fundamental limitation to parallel scaling of multithreaded Python workflows, due to +[Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law), because any time spent executing a parallel processing task on only one execution context fundamentally cannot be sped up using parallelism. @@ -123,9 +123,7 @@ free-threaded build. The free-threaded interpreter does not have a GIL. Many existing extensions providing mutable data structures relied on the GIL to lock Python objects and -make interior mutability thread-safe. Historically, PyO3's API was designed -around the same strong assumptions, but is transitioning towards more general -APIs applicable for both builds. +make interior mutability thread-safe. 
Calling into the CPython C API is only legal when an OS thread is explicitly attached to the interpreter runtime. In the GIL-enabled build, this happens when diff --git a/guide/src/parallelism.md b/guide/src/parallelism.md index a937b49764f..260865eb63e 100644 --- a/guide/src/parallelism.md +++ b/guide/src/parallelism.md @@ -1,8 +1,15 @@ # Parallelism -CPython has the infamous [Global Interpreter Lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forces developers to accept the overhead of multiprocessing. There is an experimental "free-threaded" version of CPython 3.13 that does not have a GIL, see the PyO3 docs on [free-threaded Python](./free-threading.md) for more information about that. +Historically, CPython was limited by the [global interpreter lock](https://docs.python.org/3/glossary.html#term-global-interpreter-lock) (GIL), which only allowed a single thread to drive the Python interpreter at a time. This made threading in Python a bad fit for [CPU-bound](https://en.wikipedia.org/wiki/CPU-bound) tasks and often forced developers to accept the overhead of multiprocessing. + +Rust is well-suited to multithreaded code, and libraries like [`rayon`] can help you leverage safe parallelism with minimal effort. The [`Python::detach`] method can be used to allow the Python interpreter to do other work while the Rust work is ongoing. + +To enable full parallelism in your application, consider also using [free-threaded Python](./free-threading.md) which is supported since Python 3.14. + +## Parallelism under the Python GIL + +Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [`rayon`] crate to count words in parallel. 
-In PyO3 parallelism can be easily achieved in Rust-only code. Let's take a look at our [word-count](https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs) example, where we have a `search` function that utilizes the [rayon](https://github.com/rayon-rs/rayon) crate to count words in parallel. ```rust,no_run # #![allow(dead_code)] use pyo3::prelude::*; @@ -32,6 +39,7 @@ fn search(contents: &str, needle: &str) -> usize { ``` But let's assume you have a long running Rust function which you would like to execute several times in parallel. For the sake of example let's take a sequential version of the word count: + ```rust,no_run # #![allow(dead_code)] # fn count_line(line: &str, needle: &str) -> usize { @@ -175,3 +183,4 @@ collecting the results from the worker threads. You should always call cases where worker threads need to acquire the GIL, to prevent deadlocks. [`Python::detach`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.detach +[`rayon`]: https://github.com/rayon-rs/rayon diff --git a/guide/src/performance.md b/guide/src/performance.md index 998d1f8b940..85b6c337ffd 100644 --- a/guide/src/performance.md +++ b/guide/src/performance.md @@ -98,6 +98,7 @@ impl PartialEq for FooBound<'_> { ``` ## Calling Python callables (`__call__`) + CPython support multiple calling protocols: [`tp_call`] and [`vectorcall`]. [`vectorcall`] is a more efficient protocol unlocking faster calls. PyO3 will try to dispatch Python `call`s using the [`vectorcall`] calling convention to archive maximum performance if possible and falling back to [`tp_call`] otherwise. This is implemented using the (internal) `PyCallArgs` trait. It defines how Rust types can be used as Python `call` arguments. 
This trait is currently implemented for @@ -110,6 +111,18 @@ Rust tuples may make use of [`vectorcall`] where as `Bound<'_, PyTuple>` and `Py [`tp_call`]: https://docs.python.org/3/c-api/call.html#the-tp-call-protocol [`vectorcall`]: https://docs.python.org/3/c-api/call.html#the-vectorcall-protocol +## Detach from the interpreter for long-running Rust-only work + +When executing Rust code which does not need to interact with the Python interpreter, use [`Python::detach`] to allow the Python interpreter to proceed without waiting for the current thread. + +On the GIL-enabled build, this is crucial for best performance as only a single thread may ever be attached at a time. + +On the free-threaded build, this is still best practice as there are several "stop the world" events (such as garbage collection) where all threads attached to the Python interpreter are forced to wait. + +As a rule of thumb, attaching and detaching from the Python interpreter takes less than a millisecond, so any work which is expected to take multiple milliseconds can likely benefit from detaching from the interpreter. + +[`Python::detach`]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#method.detach + ## Disable the global reference pool PyO3 uses global mutable state to keep track of deferred reference count updates implied by `impl<T> Drop for Py<T>` being called without being attached to the interpreter. The necessary synchronization to obtain and apply these reference count updates when PyO3-based code next attaches to the interpreter is somewhat expensive and can become a significant part of the cost of crossing the Python-Rust boundary.
diff --git a/guide/src/python-from-rust.md b/guide/src/python-from-rust.md index 7ab39161e70..74d530041d4 100644 --- a/guide/src/python-from-rust.md +++ b/guide/src/python-from-rust.md @@ -12,13 +12,14 @@ The subchapters also cover the following topics: ## The `'py` lifetime -To safely interact with the Python interpreter a Rust thread must have a corresponding Python thread state and hold the [Global Interpreter Lock (GIL)](#the-global-interpreter-lock). PyO3 has a `Python<'py>` token that is used to prove that these conditions -are met. Its lifetime `'py` is a central part of PyO3's API. +To safely interact with the Python interpreter a Rust thread must be [attached] to the Python interpreter. +PyO3 has a `Python<'py>` token that is used to prove that this condition is met. +Its lifetime `'py` is a central part of PyO3's API. The `Python<'py>` token serves three purposes: * It provides global APIs for the Python interpreter, such as [`py.eval()`][eval] and [`py.import()`][import]. -* It can be passed to functions that require a proof of holding the GIL, such as [`Py::clone_ref`][clone_ref]. +* It can be passed to functions that require a proof of attachment, such as [`Py::clone_ref`][clone_ref]. * Its lifetime `'py` is used to bind many of PyO3's types to the Python interpreter, such as [`Bound<'py, T>`][Bound]. PyO3's types that are bound to the `'py` lifetime, for example `Bound<'py, T>`, all contain a `Python<'py>` token. This means they have full access to the Python interpreter and offer a complete API for interacting with Python objects. @@ -27,9 +28,13 @@ Consult [PyO3's API documentation][obtaining-py] to learn how to acquire one of ### The Global Interpreter Lock -Concurrent programming in Python is aided by the Global Interpreter Lock (GIL), which ensures that only one Python thread can use the Python interpreter and its API at the same time. This allows it to be used to synchronize code.
See the [`pyo3::sync`] module for synchronization tools PyO3 offers that are based on the GIL's guarantees. +Prior to the introduction of free-threaded Python (first available in 3.13, fully supported in 3.14), the Python interpreter was made thread-safe by the [global interpreter lock]. +This ensured that only one Python thread could use the Python interpreter and its API at the same time. +Historically, Rust code was able to use the GIL as a synchronization guarantee, but the introduction of free-threaded Python removed this possibility. -Non-Python operations (system calls and native Rust code) can unlock the GIL. See [the section on parallelism](parallelism.md) for how to do that using PyO3's API. +The [`pyo3::sync`] module offers synchronization tools which abstract over both Python builds. + +To enable any parallelism on the GIL-enabled build, and best throughput on the free-threaded build, non-Python operations (system calls and native Rust code) should consider detaching from the Python interpreter to allow other work to proceed. See [the section on parallelism](parallelism.md) for how to do that using PyO3's API. ## Python's memory model @@ -41,6 +46,8 @@ PyO3's API reflects this by providing [smart pointer][smart-pointers] types, `Py Because of the lack of exclusive `&mut` references, PyO3's APIs for Python objects, for example [`PyListMethods::append`], use shared references. This is safe because Python objects have internal mechanisms to prevent data races (as of time of writing, the Python GIL).
+[attached]: https://docs.python.org/3.14/glossary.html#term-attached-thread-state +[global interpreter lock]: https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock [smart-pointers]: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html [obtaining-py]: {{#PYO3_DOCS_URL}}/pyo3/marker/struct.Python.html#obtaining-a-python-token [`pyo3::sync`]: {{#PYO3_DOCS_URL}}/pyo3/sync/index.html diff --git a/pyo3-benches/Cargo.toml b/pyo3-benches/Cargo.toml index 7cf5de3768d..982b21c1c5c 100644 --- a/pyo3-benches/Cargo.toml +++ b/pyo3-benches/Cargo.toml @@ -23,6 +23,10 @@ hashbrown = "0.15" name = "bench_any" harness = false +[[bench]] +name = "bench_attach" +harness = false + [[bench]] name = "bench_call" harness = false @@ -47,10 +51,6 @@ harness = false name = "bench_frompyobject" harness = false -[[bench]] -name = "bench_gil" -harness = false - [[bench]] name = "bench_intopyobject" harness = false diff --git a/pyo3-benches/benches/bench_gil.rs b/pyo3-benches/benches/bench_attach.rs similarity index 70% rename from pyo3-benches/benches/bench_gil.rs rename to pyo3-benches/benches/bench_attach.rs index b8fc8496225..745a7555b89 100644 --- a/pyo3-benches/benches/bench_gil.rs +++ b/pyo3-benches/benches/bench_attach.rs @@ -2,20 +2,20 @@ use codspeed_criterion_compat::{criterion_group, criterion_main, Bencher, Criter use pyo3::prelude::*; -fn bench_clean_acquire_gil(b: &mut Bencher<'_>) { +fn bench_clean_attach(b: &mut Bencher<'_>) { // Acquiring first GIL will also create a "clean" GILPool, so this measures the Python overhead. b.iter(|| Python::attach(|_| {})); } -fn bench_dirty_acquire_gil(b: &mut Bencher<'_>) { +fn bench_dirty_attach(b: &mut Bencher<'_>) { let obj = Python::attach(|py| py.None()); // Drop the returned clone of the object so that the reference pool has work to do. 
b.iter(|| Python::attach(|py| obj.clone_ref(py))); } fn criterion_benchmark(c: &mut Criterion) { - c.bench_function("clean_acquire_gil", bench_clean_acquire_gil); - c.bench_function("dirty_acquire_gil", bench_dirty_acquire_gil); + c.bench_function("clean_attach", bench_clean_attach); + c.bench_function("dirty_attach", bench_dirty_attach); } criterion_group!(benches, criterion_benchmark); diff --git a/src/sync.rs b/src/sync.rs index c1e3ff177e2..8557d83cd30 100644 --- a/src/sync.rs +++ b/src/sync.rs @@ -1,9 +1,14 @@ -//! Synchronization mechanisms based on the Python GIL. +//! Synchronization mechanisms which are aware of the existence of the Python interpreter. //! -//! With the acceptance of [PEP 703] (aka a "freethreaded Python") for Python 3.13, these -//! are likely to undergo significant developments in the future. +//! The Python interpreter has multiple "stop the world" situations which may block threads, such as +//! - The Python global interpreter lock (GIL), on GIL-enabled builds of Python, or +//! - The Python garbage collector (GC), which pauses attached threads during collection. //! -//! [PEP 703]: https://peps.python.org/pep-703/ +//! To avoid deadlocks in these cases, threads should take care to be detached from the Python interpreter +//! before performing operations which might block waiting for other threads attached to the Python +//! interpreter. +//! +//! This module provides synchronization primitives which are able to synchronize under these conditions. use crate::{ internal::state::SuspendAttach, sealed::Sealed,