Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for cleaning up std's thread_local implementation details #110897

Open
20 of 24 tasks
m-ou-se opened this issue Apr 27, 2023 · 14 comments
Open
20 of 24 tasks

Tracking issue for cleaning up std's thread_local implementation details #110897

m-ou-se opened this issue Apr 27, 2023 · 14 comments
Assignees
Labels
A-thread-locals Area: Thread local storage (TLS) C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@m-ou-se
Copy link
Member

m-ou-se commented Apr 27, 2023

std::thread_local, std::thread::local, std::thread::local_impl, std::sys_common::thread_local_key, std::sys_common::thread_local_dtor, std::sys::thread_local_key, etc. etc. are all messy and form quite a confusing maze. Just look at this map I tried to draw of it all:

tlmap

AAAaaaaAaa

Time to clean it up.

And maybe also fix some bugs while we're at it. ^^


Related issues to solve, maybe:

@m-ou-se m-ou-se added C-cleanup Category: PRs that clean code up or issues documenting cleanup. A-thread-locals Area: Thread local storage (TLS) T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 27, 2023
@m-ou-se m-ou-se self-assigned this Apr 27, 2023
@joboet
Copy link
Member

joboet commented Apr 27, 2023

LazyKeyInner unfortunately can't just be OnceCell because of reentrant initialization: Because OnceCell::get_or_init returns a shared reference to the data that lives for the duration of self, it cannot change the value after the first inner initialization. LazyKeyInner can do that (and does) because references cannot outlive LocalKey::with.

@m-ou-se
Copy link
Member Author

m-ou-se commented Apr 27, 2023

@joboet In what situation would that mem::replace call replace a Some instead of a None? Is that the right behaviour? Do we have a test for that situation?

@joboet
Copy link
Member

joboet commented Apr 27, 2023

It's very cursed, but this works. I don't think it's currently tested, however.

@m-ou-se
Copy link
Member Author

m-ou-se commented Apr 27, 2023

That just seems like a bug to me. A (thread local) static &'static str (not mut, not a cell) shouldn't be able to have different values at different times (on the same thread).

@joboet
Copy link
Member

joboet commented Apr 27, 2023

Maybe 😄? It's sound however, as there can be no references to the data from the first initialization when the new value is written.

@m-ou-se
Copy link
Member Author

m-ou-se commented Apr 27, 2023

It's sound

Sure, but so would OnceCell be. ^^ And I think users should be able to safely assume that an immutable static never changes. (And as a bonus we get to delete code.)

@m-ou-se
Copy link
Member Author

m-ou-se commented Apr 27, 2023

I did some software archeology and found the original issue and PR that introduced that mem::replace call: #30228 and #30267.

The issue was originally about thread locals, but then most of the discussion was about drop on assignment in general. The issue description suggests:

It should either drop one of the two values (perhaps after putting the TLS slot in "destructor being called" mode) or panic, if a multiple initialization condition is detected.

Without a preference for any of these solutions. The PR also doesn't include a reason for picking one solution over the other.

So I don't think the mem::replace behaviour was purposely picked over another solution (like dropping the other value or panicking).

@workingjubilee
Copy link
Member

"We've all come to rely on undocumented endpoints in your API." "And this gives you power over me?"

JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 28, 2023
…-key, r=cuviper

Remove unused std::sys_common::thread_local_key::Key

Part of rust-lang#110897

This `Key` type seems unused. Let's remove it and see if anything explodes. :)
@m-ou-se
Copy link
Member Author

m-ou-se commented Apr 28, 2023

In this case I don't think anyone actually relies on LazyKeyInner being a OnceOrMaybeTwiceIfYouTryReallyHardCell instead of just a OnceCell. ^^

@joboet
Copy link
Member

joboet commented Apr 28, 2023

I'd advocate for a crater run, just in case, but yes.

@workingjubilee
Copy link
Member

It is extremely unlikely someone is going to have tests that stress this exact point sufficiently for it to be revealed.

@joboet
Copy link
Member

joboet commented Apr 29, 2023

Ok, yes, it's actually part of the documentation. Nevermind, then.

@thomcc
Copy link
Member

thomcc commented Jun 5, 2023

Thanks for this, I've been very unhappy with the thread_local! impl for a long time (@ChrisDenton has had to listen to me gripe about it many times). It's something I was hoping to get to back last fall, but then got a dayjob that takes up much of my time. I think you hit the big issues. Here are some other ones

  1. Fixing our (mis)use of __dso_handle/__cxa_thread_atexit_impl (thread_local! dtor registration can pass wrong __dso_handle if dynamically linked #88737). That issue has a lot of writing and some of it is made with incomplete understanding of the issue (I think), so I've summarized in a gist: https://gist.github.com/thomcc/8c4a8003cf1a282949a539e899cb45aa (initially it was going to be part of this comment).

  2. Lots of code does unnecessary reads from #[thread_local]s, which, while generally fast, can be very slow in cases like dynamic libraries (where it has to call a function that sometimes takes a lock, walks linked lists, etc).

    Some of this is just a quirk of the compiler (https://doc.rust-lang.org/nightly/src/std/sys/common/thread_local/fast_local.rs.html#182-198), but our "fast" impl actually uses two #[thread_local] variables now, which guarantees such behavior (that said, I only realized this yesterday and intend to fix it today).

  3. Related to that code, the non-const version ends up pessimizing things (especially the !needs_drop case) to avoid extra TLS reads on macOS:

    // `try_initialize` is only called once per fast thread local variable,
    .

    This might be fixed in LLVM now, and if not could be worked around by using either compiler_fence or (if all else fails) black_box.

    (Alternatively, it's possible for us to do semi-dubious things with the fact that std is allowed to unstabley use needs_drop<T>() as a const fn...)

There are a few others I had written down (multiple implementations of the manual tls-dtor list) but you touch on them and it might be outdated given #109858.

thomcc pushed a commit to tcdi/postgrestd that referenced this issue Jul 18, 2023
…uviper

Remove unused std::sys_common::thread_local_key::Key

Part of rust-lang/rust#110897

This `Key` type seems unused. Let's remove it and see if anything explodes. :)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 15, 2023
…n, r=dtolnay

Broaden the consequences of recursive TLS initialization

This PR updates the documentation of `LocalKey` to clearly disallow the behaviour described in [this comment](rust-lang#110897 (comment)). This allows using `OnceCell` for the lazy initialization of TLS variables, which panics on reentrant initialization instead of updating the value like TLS variables currently do.

`@rustbot` label +T-libs-api
r? `@m-ou-se`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 15, 2023
…n, r=dtolnay

Broaden the consequences of recursive TLS initialization

This PR updates the documentation of `LocalKey` to clearly disallow the behaviour described in [this comment](rust-lang#110897 (comment)). This allows using `OnceCell` for the lazy initialization of TLS variables, which panics on reentrant initialization instead of updating the value like TLS variables currently do.

``@rustbot`` label +T-libs-api
r? ``@m-ou-se``
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 15, 2023
Rollup merge of rust-lang#116172 - joboet:recursive_tls_initialization, r=dtolnay

Broaden the consequences of recursive TLS initialization

This PR updates the documentation of `LocalKey` to clearly disallow the behaviour described in [this comment](rust-lang#110897 (comment)). This allows using `OnceCell` for the lazy initialization of TLS variables, which panics on reentrant initialization instead of updating the value like TLS variables currently do.

``@rustbot`` label +T-libs-api
r? ``@m-ou-se``
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 16, 2024
Rewrite native thread-local storage

(part of rust-lang#110897)

The current native thread-local storage implementation has become quite messy, uses indescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.

This does not change the recursive initialization behaviour I described in [this comment](rust-lang#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? `@m-ou-se`
`@rustbot` label +T-libs
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 18, 2024
Rewrite native thread-local storage

(part of rust-lang#110897)

The current native thread-local storage implementation has become quite messy, uses indescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.

This does not change the recursive initialization behaviour I described in [this comment](rust-lang#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? `@m-ou-se`
`@rustbot` label +T-libs
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 18, 2024
Rewrite native thread-local storage

(part of rust-lang#110897)

The current native thread-local storage implementation has become quite messy, uses indescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.

This does not change the recursive initialization behaviour I described in [this comment](rust-lang#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? `@m-ou-se`
`@rustbot` label +T-libs
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Mar 20, 2024
Simplify key-based thread locals

This PR simplifies key-based thread-locals by:
* unifying the macro expansion of `const` and non-`const` initializers
* reducing the amount of code in the expansion
* simply reallocating on recursive initialization instead of going through `LazyKeyInner`
* replacing `catch_unwind` with the shared `abort_on_dtor_unwind`

It does not change the initialization behaviour described in rust-lang#110897.
bors added a commit to rust-lang-ci/rust that referenced this issue May 23, 2024
Rewrite native thread-local storage

(part of rust-lang#110897)

The current native thread-local storage implementation has become quite messy, uses indescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.

This does not change the recursive initialization behaviour I described in [this comment](rust-lang#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? `@m-ou-se`
`@rustbot` label +T-libs
bors added a commit to rust-lang-ci/rust that referenced this issue May 24, 2024
Rewrite TLS on platforms without threads

The saga of rust-lang#110897 continues!

r? `@m-ou-se` if you have time
github-actions bot pushed a commit to rust-lang/miri that referenced this issue May 24, 2024
Rewrite native thread-local storage

(part of #110897)

The current native thread-local storage implementation has become quite messy, uses indescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.

This does not change the recursive initialization behaviour I described in [this comment](rust-lang/rust#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? `@m-ou-se`
`@rustbot` label +T-libs
bors added a commit to rust-lang-ci/rust that referenced this issue May 24, 2024
Simplify key-based thread locals

This PR simplifies key-based thread-locals by:
* unifying the macro expansion of `const` and non-`const` initializers
* reducing the amount of code in the expansion
* simply reallocating on recursive initialization instead of going through `LazyKeyInner`
* replacing `catch_unwind` with the shared `abort_on_dtor_unwind`

It does not change the initialization behaviour described in rust-lang#110897.
flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue May 24, 2024
Rewrite native thread-local storage

(part of #110897)

The current native thread-local storage implementation has become quite messy, uses indescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.

This does not change the recursive initialization behaviour I described in [this comment](rust-lang/rust#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? `@m-ou-se`
`@rustbot` label +T-libs
joboet added a commit to joboet/rust that referenced this issue Jun 15, 2024
As discovered by Mara in rust-lang#110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with rust-lang#117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` functions to the `thread_local` module in `sys` and given them a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.

Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that

Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.

@rustbot label +A-thread-locals
joboet added a commit to joboet/rust that referenced this issue Jun 15, 2024
As discovered by Mara in rust-lang#110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with rust-lang#117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.

Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that

Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.

@rustbot label +A-thread-locals
compiler-errors added a commit to compiler-errors/rust that referenced this issue Jun 23, 2024
… r=Mark-Simulacrum

std: refactor the TLS implementation

As discovered by Mara in rust-lang#110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with rust-lang#117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.

Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that

Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.

`@rustbot` label +A-thread-locals
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 24, 2024
…=Mark-Simulacrum

std: refactor the TLS implementation

As discovered by Mara in rust-lang#110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with rust-lang#117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.

Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that

Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.

`@rustbot` label +A-thread-locals
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 24, 2024
…=Mark-Simulacrum

std: refactor the TLS implementation

As discovered by Mara in rust-lang#110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with rust-lang#117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.

Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that

Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.

`@rustbot` label +A-thread-locals
@robertbastian
Copy link
Contributor

This might have caused rust-lang/cargo#14889

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jun 29, 2024
std: separate TLS key creation from TLS access

Currently, `std` performs an atomic load to get the OS key on every access to `StaticKey` even when the key is already known. This PR thus replaces `StaticKey` with the platform-specific `get` and `set` function and a new `LazyKey` type that acts as a `LazyLock<Key>`, allowing the reuse of the retreived key for multiple accesses.

Related to rust-lang#110897.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jun 29, 2024
Rollup merge of rust-lang#126953 - joboet:lazy_key, r=jhpratt

std: separate TLS key creation from TLS access

Currently, `std` performs an atomic load to get the OS key on every access to `StaticKey` even when the key is already known. This PR thus replaces `StaticKey` with the platform-specific `get` and `set` function and a new `LazyKey` type that acts as a `LazyLock<Key>`, allowing the reuse of the retreived key for multiple accesses.

Related to rust-lang#110897.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-thread-locals Area: Thread local storage (TLS) C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants