Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify TLS key creation on Windows #102103

Closed
wants to merge 1 commit into from

Conversation

joboet
Copy link
Member

@joboet joboet commented Sep 21, 2022

By using a static array instead of a linked list for the storage of destructor function, the creation of new keys does not need to be synchronized. Also, running destructors should be a bit faster because of cache locality (using an atomic counter to store the maximum index of destructors avoids having to iterate over the whole array, assuming keys are mostly dense). This optimization comes at the cost of a minor runtime memory increase, however.

Using a fixed-size array is possible since the total number of TLS keys is capped at 1088. Since TLS keys are always numbers less than 1088, we can just use the value of the key to index into the array.

@rustbot label +O-windows-gnu

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Sep 21, 2022
@rustbot
Copy link
Collaborator

rustbot commented Sep 21, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

r? @thomcc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 21, 2022
@rustbot rustbot added the O-windows-gnu Toolchain: GNU, Operating system: Windows label Sep 21, 2022
@joboet joboet force-pushed the windows_thread_local branch from 8027ba1 to e26bec5 Compare September 21, 2022 15:23
@joboet joboet force-pushed the windows_thread_local branch from e26bec5 to ecca8ce Compare September 21, 2022 16:08
Copy link
Member

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think a bit about this. In general I'm not sure it's worth hard-coding limits like this, even if they can't change (at least not on existing targets).

CC @ChrisDenton (we've chatted at length about the TLS situation on Windows -- thoughts here?)

// If the destructors are run in a signal handler running after this
// code, we need to guarantee that the changes have been performed
// before the handler is triggered.
compiler_fence(Release);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This micro-optimization doesn't really feel likely to be worth it, and it's plausibly not even totally sound.

We should just use the correct orderings for these atomics -- it really shouldn't make a performance difference and is easier to reason about.

@bjorn3
Copy link
Member

bjorn3 commented Sep 22, 2022

Shouldn't we rather work around the 1088 cap on number of tls keys instead of fixating on it even more? Libstd consumes a fair amount of tls keys by itself (a hello world seems to use 20 keys, might have counted wrong though). I can easily imagine running out of tls keys in complex programs in the future. For example once more libraries start using rust internally and include their own copy of libstd.

@ChrisDenton
Copy link
Member

I don't think TLS link you provided supports the assertion that the maximum is forever capped at 1,088. That is the current maximum but that could conceivably change in the future as it has in the past (it used to be 64). And I don't think the documentation of TlsSetValue makes the case either:

In particular, it succeeds if dwTlsIndex is in the range 0 through (TLS_MINIMUM_AVAILABLE – 1).

TLS_MINIMUM_AVAILABLE is 64, so it's only guaranteeing those values. In contrast, TlsAlloc has this to say:

The value of the TLS index should be treated as an opaque value; do not assume that it is an index into a zero-based array.

So in summary the first 64 TLS keys are guaranteed and a further 1024 are currently possible but this is not a guaranteed upper limit.

Shouldn't we rather work around the 1088 cap on number of tls keys instead of fixating on it even more? Libstd consumes a fair amount of tls keys by itself (a hello world seems to use 20 keys, might have counted wrong though).

If libstd does use that many then, yeah, we should definitely workaround that.

@thomcc
Copy link
Member

thomcc commented Sep 22, 2022

Shouldn't we rather work around the 1088 cap on number of tls keys instead of fixating on it even more? Libstd consumes a fair amount of tls keys by itself (a hello world seems to use 20 keys, might have counted wrong though). I can easily imagine running out of tls keys in complex programs in the future. For example once more libraries start using rust internally and include their own copy of libstd.

20 is surprising. I went through looking for these recently, and didn't find nearly that many. Looking now, I only count 6 that arent part of tests:

  1. Hashmap keys
  2. Thread info
  3. thread panic state (not used until a thread panics).
  4. stdio output capture (only used by libtest).
  5. threadid address byte in recursive mutex.
  6. one in library/backtrace, probably hard to combine with anything else without coupling (altho it should use a const ctor EDIT: it can't because of the backtrace crate's MSRV, and it's not a thing that matters enough to cfg on).

3 and 4 don't use slots in most programs. 5 could be avoided (at the cost of either perf or system specific code), and 1/2 can be combined (at the cost of coupling).

There are also a bunch in library/proc_macro, but I think they don't matter?

For example once more libraries start using rust internally and include their own copy of libstd

I'm not sure it's generally okay to have multiple copies of libstd in your program at the same time. I guess it's probably fine if they're all fully independent.


All that said, I think in practice we hope to eventually have#[cfg(target_thread_local)] enabled for most/all windows targets (disabled by #95429, which might be fixable?), so the hard-coded limit might not matter that much here.

(Although... IIRC a limit like this might still exist for static TLS on windows, so maybe it's still worth thinking about)

@ChrisDenton
Copy link
Member

(Although... IIRC a limit like this might still exist for static TLS on windows, so maybe it's still worth thinking about)

Static TLS is just one big array embedded in the exe. I guess the only limit would be the maximum size of a single section but I've never needed to find out.

@thomcc
Copy link
Member

thomcc commented Sep 22, 2022

In contrast, TlsAlloc has this to say:

The value of the TLS index should be treated as an opaque value; do not assume that it is an index into a zero-based array.

Ah, nice catch. IMO we definitely shouldn't do this then.

@bjorn3
Copy link
Member

bjorn3 commented Sep 22, 2022

I'm not sure it's generally okay to have multiple copies of libstd in your program at the same time. I guess it's probably fine if they're all fully independent.

Indeed. Linking multiple rust staticlib together is not fine as the libsyd symbols are exported outside of the staticlib. Linking multiple cdylib together is fine as it only exports user defined #[no_mangle] symbols.

20 is surprising. I went through looking for these recently, and didn't find nearly that many.

I counted the amount of callers of thread_local_key::StaticKey::lazy_init as proxy, but that may indeed have been wrong.

@joboet
Copy link
Member Author

joboet commented Sep 22, 2022

Ok, so this is definitely not possible then. An alternative solution would be to use fiber local storage, as it allows destructors and can be called even from outside fiber environments (TLS was implemented on top of FLS on older Windows versions, so this is probably a stable guarantee). This is actually the solution used by libc++, so we would be in good company. However, it would be quite surprising for projects using native Windows fibers as a green thread implementation, assuming those exist (I could not find any).

@thomcc
Copy link
Member

thomcc commented Sep 22, 2022

I'm pretty sure there are projects using fibers as green threads, and even if there aren't IMO we should support code that wants to do that (at least, I think this optimization is not worthwhile enough to make a difference there). So, I'm not sure std is the right place for that. ISTM there's just a lot of ways it could go wrong (issues around Thread/ThreadId, the fact that the dtor runs on fiber destruction instead of thread destruction, ...).

I have done it in my own code before though -- it's nice to have a version of TlsAlloc that takes a dtor, it's just only really a good idea if you won't be using fibers for anything else (or know you're using it for a case where it wouldn't matter).

@joboet joboet closed this Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-windows-gnu Toolchain: GNU, Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants