From 2acdaebc69be78a5d5dd7923adfd0b4a2ebe2dfd Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Wed, 22 May 2024 13:51:05 +0200 Subject: [PATCH 1/9] Add thread_spawn_hook rfc. --- text/3642-thread-spawn-hook.md | 238 +++++++++++++++++++++++++++++++++ 1 file changed, 238 insertions(+) create mode 100644 text/3642-thread-spawn-hook.md diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md new file mode 100644 index 00000000000..51a5122fad6 --- /dev/null +++ b/text/3642-thread-spawn-hook.md @@ -0,0 +1,238 @@ +- Feature Name: `thread_spawn_hook` +- Start Date: 2024-05-22 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary + +Add `std::thread::add_spawn_hook` to register a hook that runs every time a thread spawns. +This will effectively provide us with "inheriting thread locals", a much requested feature. + +```rust +thread_local! { + static MY_THREAD_LOCAL: Cell = Cell::new(0); +} + +std::thread::add_spawn_hook(|_| { + // Get the value of X in the spawning thread. + let value = MY_THREAD_LOCAL.get(); + + Ok(move || { + // Set the value of X in the newly spawned thread. + MY_THREAD_LOCAL.set(value); + }) +}); +``` + +# Motivation + +Thread local variables are often used for scoped "global" state. +For example, a testing framework might store the status or name of the current +unit test in a thread local variable, such that multiple tests can be run in +parallel in the same process. + +However, this information will not be preserved across threads when a unit test +will spawn a new thread, which is problematic. + +The solution seems to be "inheriting thread locals": thread locals that are +automatically inherited by new threads. + +However, adding this property to thread local variables is not easily possible. +Thread locals are initialized lazily. And by the time they are initialized, the +parent thread might have already disappeared, such that there is no value left +to inherit from. +Additionally, even if the parent thread was still alive, there is no way to +access the value in the parent thread without causing race conditions. + +Allowing hooks to be run as part of spawning a thread allows precise control +over how thread locals are "inherited". +One could simply `clone()` them, but one could also add additional information +to them, or even add relevant information to some (global) data structure. + +For example, not only could a custom testing framework keep track of unit test +state even across spawned threads, but a logging/debugging/tracing library could +keeps track of which thread spawned which thread to provide more useful +information to the user. + +# Public Interface + +```rust +// In std::thread: + +/// Registers a function to run for every new thread spawned. +/// +/// The hook is executed in the parent thread, and returns a function +/// that will be executed in the new thread. +/// +/// The hook is called with the `Thread` handle for the new thread. +/// +/// If the hook returns an `Err`, thread spawning is aborted. In that case, the +/// function used to spawn the thread (e.g. `std::thread::spawn`) will return +/// the error returned by the hook. +/// +/// Hooks can only be added, not removed. +/// +/// The hooks will run in order, starting with the most recently added. +/// +/// # Usage +/// +/// ``` +/// std::thread::add_spawn_hook(|_| { +/// ..; // This will run in the parent (spawning) thread. +/// Ok(move || { +/// ..; // This will run it the child (spawned) thread. +/// }) +/// }); +/// ``` +/// +/// # Example +/// +/// ``` +/// thread_local! { +/// static MY_THREAD_LOCAL: Cell = Cell::new(0); +/// } +/// +/// std::thread::add_spawn_hook(|_| { +/// // Get the value of X in the spawning thread. +/// let value = MY_THREAD_LOCAL.get(); +/// +/// Ok(move || { +/// // Set the value of X in the newly spawned thread. +/// MY_THREAD_LOCAL.set(value); +/// }) +/// }); +/// ``` +pub fn add_spawn_hook(hook: F) +where + F: 'static + Sync + Fn(&Thread) -> std::io::Result, + G: 'static + Send + FnOnce(); +``` + +# Implementation + +The implementation could simply be a static `RwLock` with a `Vec` of +(boxed/leaked) `dyn Fn`s, or a simple lock free linked list of hooks. + +Functions that spawn a thread, such as `std::thread::spawn` will eventually call +`spawn_unchecked_`, which will call the hooks in the parent thread, after the +child `Thread` object has been created, but before the child thread has been +spawned. The resulting `FnOnce` objects are stored and passed on to the child +thread afterwards, which will execute them one by one before continuing with its +main function. + +# Downsides + +- The implementation requires allocation for each hook (to store them in the + global list of hooks), and an allocation each time a hook is spawned + (to store the resulting closure). + +- A library that wants to make use of inheriting thread locals will have to + register a global hook, and will need to keep track of whether its hook has + already been added (e.g. in a static `AtomicBool`). + +- The hooks will not run if threads are spawned through e.g. pthread directly, + bypassing the Rust standard library. + (However, this is already the case for output capturing in libtest: + that does not work across threads when not spawned by libstd.) + +# Rationale and alternatives + +## Use of `io::Result`. + +The hook returns an `io::Result` rather than the `FnOnce` directly. +This can be useful for e.g. resource limiting or possible errors while +registering new threads, but makes the signature more complicated. + +An alternative could be to simplify the signature by removing the `io::Result`, +which is fine for most use cases. + +## Global vs thread local effect + +`add_spawn_hook` has a global effect (similar to e.g. libc's `atexit()`), +to keep things simple. + +An alternative could be to store the list of spawn hooks per thread, +that are inherited to by new threads from their parent thread. +That way, a hook added by `add_spawn_hook` will only affect the current thread +and all (direct and indirect) future child threads of the current thread, +not other unrelated threads. + +Both are relatively easy and efficient to implement (as long as removing hooks +is not an option). + +However, the first (global) behavior is conceptually simpler and allows for more +flexibility. Using a global hook, one can still implement the thread local +behavior, but this is not possible the other way around. + +## Add but no remove + +Having only an `add_spawn_hook` but not a `remove_spawn_hook` keeps things +simple, by 1) not needing a global (thread safe) data structure that allows +removing items and 2) not needing a way to identify a specific hook (through a +handle or a name). + +If a hook only needs to execute conditionally, one can make use of an +`if` statement. + +## Requiring storage on spawning + +Because the hooks run on the parent thread first, before the child thread is +spawned, the results of those hooks (the functions to be executed in the child) +need to be stored. This will require heap allocations (although it might be +possible for an optimization to save small objects on the stack up to a certain +size). + +An alternative interface that wouldn't require any store is possible, but has +downsides. Such an interface would spawn the child thread *before* running the +hooks, and allow the hooks to execute a closure on the child (before it moves on +to its main function). That looks roughly like this: + +```rust +std::thread::add_spawn_hook(|child| { + // Get the value on the parent thread. + let value = MY_THREAD_LOCAL.get(); + // Set the value on the child thread. + child.exec(|| MY_THREAD_LOCAL.set(value)); +}); +``` + +This could be implemented without allocations, as the function executed by the +child can now be borrowed from the parent thread. + +However, this means that the parent thread will have to block until the child +thread has been spawned, and block for each hook to be finished on both threads, +significantly slowing down thread creation. + +Considering that spawning a thread involves several allocations and syscalls, +it doesn't seem very useful to try to minimize an extra allocation when that +comes at a significant cost. + +## `impl` vs `dyn` in the signature + +An alternative interface could use `dyn` instead of generics, as follows: + +```rust +pub fn add_spawn_hook( + hook: Box io::Result> + Sync> +); +``` + +However, this mostly has downsides: it requires the user to write `Box::new` in +a few places, and it prevents us from ever implementing some optimization tricks +to, for example, use a single allocation for multiple hook results. + +# Unresolved questions + +- Should the return value of the hook be an `Option`, for when the hook does not + require any code to be run in the child? + +- Should the hook be able to access/configure more information about the child + thread? E.g. set its stack size. + (Note that settings that can be changed afterwards by the child thread, such as + the thread name, can already be set by simply setting it as part of the code + that runs on the child thread.) + +# Future possibilities + +- Using this in libtest for output capturing (instead of today's + implementation that has special hardcoded support in libstd). From 71364c4c00376659e9d1bd6ebbb50540088dc5bc Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Wed, 22 May 2024 15:20:43 +0200 Subject: [PATCH 2/9] Add relevant history. --- text/3642-thread-spawn-hook.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index 51a5122fad6..98fcbe0e42e 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -236,3 +236,10 @@ to, for example, use a single allocation for multiple hook results. - Using this in libtest for output capturing (instead of today's implementation that has special hardcoded support in libstd). + +# Relevant history + +- The original reason I wrote [RFC 3184 "Thread local Cell methods"](https://github.com/rust-lang/rfcs/pull/3184) + was to simplify thread spawn hooks (which I was experimenting with at the time). + Without that RFC, thread spawn hooks would look something like `let v = X.with(|x| x.get()); || X.with(|x| x.set(v))`, instead of just `let v = X.get(); || X.set(v)`, + which is far less ergonomic (and behaves subtly differently). This is the reason I waited with this RFC until that RFC was merged and stabilized. From 1e18872a9030a448700ba9cd294ec40b01435b81 Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Wed, 22 May 2024 16:25:36 +0200 Subject: [PATCH 3/9] Add note on a global registration lang feature. --- text/3642-thread-spawn-hook.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index 98fcbe0e42e..bf703eff467 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -221,6 +221,24 @@ However, this mostly has downsides: it requires the user to write `Box::new` in a few places, and it prevents us from ever implementing some optimization tricks to, for example, use a single allocation for multiple hook results. +## A regular function vs some lang feature + +Just like `std::panic::set_hook`, `std::thread::add_spawn_hook` is just regular function. + +An alternative would be to have some special attribute, like `#[thread_spawn_hook]`, +similar to `#[panic_handler]` in `no_std` programs, or to make use of +a potential future [global registration feature](https://github.com/rust-lang/rust/issues/125119). + +While such things might make sense in a `no_std` world, spawning threads (like +panic hooks) is an `std` only feature, where we can use global state and allocations. + +The only potential advantage of such an approach might be a small reduction in overhead, +but this potential overhead is insignificant compared to the overall cost of spwaning a thread. + +The downsides are plenty, including limitations on what your hook can do and return, +needing a macro or special syntax to register a hook, potential issues with dynamic linking, +additional implementation complexity, and possibly having to block on a language feature. + # Unresolved questions - Should the return value of the hook be an `Option`, for when the hook does not From 17da70abf0dea3f41a38130ba4f76cb34b275e00 Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Fri, 7 Jun 2024 15:37:13 +0200 Subject: [PATCH 4/9] Suggest Once instead of AtomicBool. Co-authored-by: Kevin Reid --- text/3642-thread-spawn-hook.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index bf703eff467..f739e792ca3 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -128,7 +128,7 @@ main function. - A library that wants to make use of inheriting thread locals will have to register a global hook, and will need to keep track of whether its hook has - already been added (e.g. in a static `AtomicBool`). + already been added (e.g. in a static `std::sync::Once`). - The hooks will not run if threads are spawned through e.g. pthread directly, bypassing the Rust standard library. From d9097340edd47abc881bdcf161b4b607a0cdb079 Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Thu, 24 Oct 2024 11:18:00 +0200 Subject: [PATCH 5/9] Update RFC. 1. Make effect thread local. 2. Remove use of io::Error. 3. Add opt-out to thread builder. --- text/3642-thread-spawn-hook.md | 109 +++++++++++++++++++-------------- 1 file changed, 62 insertions(+), 47 deletions(-) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index f739e792ca3..8d9d97c8a98 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -5,7 +5,7 @@ # Summary -Add `std::thread::add_spawn_hook` to register a hook that runs every time a thread spawns. +Add `std::thread::add_spawn_hook` to register a hook that runs for newly spawned threads. This will effectively provide us with "inheriting thread locals", a much requested feature. ```rust @@ -17,10 +17,8 @@ std::thread::add_spawn_hook(|_| { // Get the value of X in the spawning thread. let value = MY_THREAD_LOCAL.get(); - Ok(move || { - // Set the value of X in the newly spawned thread. - MY_THREAD_LOCAL.set(value); - }) + // Set the value of X in the newly spawned thread. + move || MY_THREAD_LOCAL.set(value) }); ``` @@ -56,21 +54,21 @@ information to the user. # Public Interface +For adding a hook: + ```rust // In std::thread: -/// Registers a function to run for every new thread spawned. +/// Registers a function to run for every newly thread spawned. /// /// The hook is executed in the parent thread, and returns a function /// that will be executed in the new thread. /// /// The hook is called with the `Thread` handle for the new thread. /// -/// If the hook returns an `Err`, thread spawning is aborted. In that case, the -/// function used to spawn the thread (e.g. `std::thread::spawn`) will return -/// the error returned by the hook. -/// -/// Hooks can only be added, not removed. +/// The hook will only be added for the current thread and is inherited by the threads it spawns. +/// In other words, adding a hook has no effect on already running threads (other than the current +/// thread) and the threads they might spawn in the future. /// /// The hooks will run in order, starting with the most recently added. /// @@ -79,39 +77,62 @@ information to the user. /// ``` /// std::thread::add_spawn_hook(|_| { /// ..; // This will run in the parent (spawning) thread. -/// Ok(move || { +/// move || { /// ..; // This will run it the child (spawned) thread. -/// }) +/// } /// }); /// ``` /// /// # Example /// +/// A spawn hook can be used to "inherit" a thread local from the parent thread: +/// /// ``` +/// use std::cell::Cell; +/// /// thread_local! { -/// static MY_THREAD_LOCAL: Cell = Cell::new(0); +/// static X: Cell = Cell::new(0); /// } /// +/// // This needs to be done once in the main thread before spawning any threads. /// std::thread::add_spawn_hook(|_| { /// // Get the value of X in the spawning thread. -/// let value = MY_THREAD_LOCAL.get(); -/// -/// Ok(move || { -/// // Set the value of X in the newly spawned thread. -/// MY_THREAD_LOCAL.set(value); -/// }) +/// let value = X.get(); +/// // Set the value of X in the newly spawned thread. +/// move || X.set(value) /// }); +/// +/// X.set(123); +/// +/// std::thread::spawn(|| { +/// assert_eq!(X.get(), 123); +/// }).join().unwrap(); /// ``` pub fn add_spawn_hook(hook: F) where - F: 'static + Sync + Fn(&Thread) -> std::io::Result, + F: 'static + Sync + Fn(&Thread) -> G, G: 'static + Send + FnOnce(); ``` +And for opting out when spawning a hook: + +```rust +// In std::thread: + +impl Builder { + /// Disables running and inheriting [spawn hooks](add_spawn_hook). + /// + /// Use this if the parent thread is in no way relevant for the child thread. + /// For example, when lazily spawning threads for a thread pool. + pub fn no_hooks(mut self) -> Builder; +} +``` + # Implementation -The implementation could simply be a static `RwLock` with a `Vec` of -(boxed/leaked) `dyn Fn`s, or a simple lock free linked list of hooks. +The implementation is a *thread local* linked list of hooks, which is inherited by newly spawned threads. +This means that adding a hook will only affect the current thread and all (direct and indirect) future child threads of the current thread. +It will not globally affect all already running threads. Functions that spawn a thread, such as `std::thread::spawn` will eventually call `spawn_unchecked_`, which will call the hooks in the parent thread, after the @@ -123,12 +144,12 @@ main function. # Downsides - The implementation requires allocation for each hook (to store them in the - global list of hooks), and an allocation each time a hook is spawned + list of hooks), and an allocation each time a hook is spawned (to store the resulting closure). - A library that wants to make use of inheriting thread locals will have to - register a global hook, and will need to keep track of whether its hook has - already been added (e.g. in a static `std::sync::Once`). + register a global hook (e.g. at the start of `main`), + and will need to keep track of whether its hook has already been added. - The hooks will not run if threads are spawned through e.g. pthread directly, bypassing the Rust standard library. @@ -137,43 +158,37 @@ main function. # Rationale and alternatives -## Use of `io::Result`. - -The hook returns an `io::Result` rather than the `FnOnce` directly. -This can be useful for e.g. resource limiting or possible errors while -registering new threads, but makes the signature more complicated. - -An alternative could be to simplify the signature by removing the `io::Result`, -which is fine for most use cases. - ## Global vs thread local effect -`add_spawn_hook` has a global effect (similar to e.g. libc's `atexit()`), -to keep things simple. +Unlike e.g. libc's `atexit()`, which has a global effect, `add_spawn_hook` has a thread local effect. + +This means that adding a hook will only affect the current thread and all (direct and indirect) future child threads of the current thread. +In other words, adding a hook has no effect on already running threads (other than the current thread) and the threads they might spawn in the future. -An alternative could be to store the list of spawn hooks per thread, -that are inherited to by new threads from their parent thread. -That way, a hook added by `add_spawn_hook` will only affect the current thread -and all (direct and indirect) future child threads of the current thread, -not other unrelated threads. +An alternative could be to have a global set of hooks that affects all newly spawned threads, on any existing and future thread. Both are relatively easy and efficient to implement (as long as removing hooks is not an option). -However, the first (global) behavior is conceptually simpler and allows for more -flexibility. Using a global hook, one can still implement the thread local -behavior, but this is not possible the other way around. +The global behavior was proposed in an earlier version of this RFC, +but the library-api team expressed a preference for exploring a "more local" solution. + +Having a "lexicographically local" solution doesn't seem to be possible other than for scoped threads, however, +since threads can outlive their parent thread and then spawn more threads. + +A thread local effect (affecting all future child threads) seems to be the most "local" behavior we can achieve here. ## Add but no remove Having only an `add_spawn_hook` but not a `remove_spawn_hook` keeps things -simple, by 1) not needing a global (thread safe) data structure that allows -removing items and 2) not needing a way to identify a specific hook (through a +simple, by not needing a way to identify a specific hook (through a handle or a name). If a hook only needs to execute conditionally, one can make use of an `if` statement. +If no hooks should be executed or inherited, one can use `Builder::no_hooks`. + ## Requiring storage on spawning Because the hooks run on the parent thread first, before the child thread is From dd6bfb37da0778eafe3411789029a12d28c82e11 Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Thu, 24 Oct 2024 11:19:49 +0200 Subject: [PATCH 6/9] Fix link to rfc itself. --- text/3642-thread-spawn-hook.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index 8d9d97c8a98..8836064b50d 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -1,6 +1,6 @@ - Feature Name: `thread_spawn_hook` - Start Date: 2024-05-22 -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- RFC PR: [rust-lang/rfcs#3642](https://github.com/rust-lang/rfcs/pull/3642) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) # Summary From 1f89d4551e346728cb9768a188449a7c13b81698 Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Thu, 24 Oct 2024 11:36:04 +0200 Subject: [PATCH 7/9] Require Send+Sync, not just Sync. --- text/3642-thread-spawn-hook.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index 8836064b50d..9103fb8cd04 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -110,7 +110,7 @@ For adding a hook: /// ``` pub fn add_spawn_hook(hook: F) where - F: 'static + Sync + Fn(&Thread) -> G, + F: 'static + Send + Sync + Fn(&Thread) -> G, G: 'static + Send + FnOnce(); ``` @@ -228,7 +228,7 @@ An alternative interface could use `dyn` instead of generics, as follows: ```rust pub fn add_spawn_hook( - hook: Box io::Result> + Sync> + hook: Box io::Result> + Send + Sync> ); ``` From b7994dd0850a895eebb3b74088256171929b0e25 Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Thu, 24 Oct 2024 14:52:04 +0200 Subject: [PATCH 8/9] Remove old io::Result. --- text/3642-thread-spawn-hook.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index 9103fb8cd04..53ae699cdee 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -228,7 +228,7 @@ An alternative interface could use `dyn` instead of generics, as follows: ```rust pub fn add_spawn_hook( - hook: Box io::Result> + Send + Sync> + hook: Box Box> ); ``` From dde61446d1bcb415f720d779b9ba829b62a88aee Mon Sep 17 00:00:00 2001 From: Mara Bos Date: Mon, 18 Nov 2024 17:07:12 +0100 Subject: [PATCH 9/9] Add tracking issue. --- text/3642-thread-spawn-hook.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3642-thread-spawn-hook.md b/text/3642-thread-spawn-hook.md index 53ae699cdee..e4838e007e9 100644 --- a/text/3642-thread-spawn-hook.md +++ b/text/3642-thread-spawn-hook.md @@ -1,7 +1,7 @@ - Feature Name: `thread_spawn_hook` - Start Date: 2024-05-22 - RFC PR: [rust-lang/rfcs#3642](https://github.com/rust-lang/rfcs/pull/3642) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) +- Rust Issue: [rust-lang/rust#132951](https://github.com/rust-lang/rust/issues/132951) # Summary