Simplify TLS key creation on Windows #102103
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
r? @thomcc (rust-highfive has picked a reviewer for you, use r? to override)
I need to think a bit about this. In general I'm not sure it's worth hard-coding limits like this, even if they can't change (at least not on existing targets).
CC @ChrisDenton (we've chatted at length about the TLS situation on Windows -- thoughts here?)
// If the destructors are run in a signal handler running after this
// code, we need to guarantee that the changes have been performed
// before the handler is triggered.
compiler_fence(Release);
This micro-optimization doesn't really feel likely to be worth it, and it's plausibly not even totally sound.
We should just use the correct orderings for these atomics -- it really shouldn't make a performance difference and is easier to reason about.
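To illustrate what "the correct orderings" could look like, here is a minimal sketch, not the code from this PR. `compiler_fence` only constrains compiler reordering, so it is not enough when another thread may observe the data. The names `DTOR_ARG`, `REGISTERED`, `register` and `observe` are hypothetical stand-ins for the registration state.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

// Hypothetical registration state; illustrative names only.
static DTOR_ARG: AtomicUsize = AtomicUsize::new(0);
static REGISTERED: AtomicBool = AtomicBool::new(false);

/// Writes the payload first, then publishes it with a Release store.
fn register(arg: usize) {
    DTOR_ARG.store(arg, Ordering::Relaxed);
    // Release: everything written above becomes visible to any thread
    // that observes `true` through an Acquire load.
    REGISTERED.store(true, Ordering::Release);
}

/// Reads the payload only after an Acquire load has seen the flag.
fn observe() -> Option<usize> {
    if REGISTERED.load(Ordering::Acquire) {
        Some(DTOR_ARG.load(Ordering::Relaxed))
    } else {
        None
    }
}

fn main() {
    let writer = thread::spawn(|| register(42));
    writer.join().unwrap();
    println!("observed: {:?}", observe());
}
```

The Release/Acquire pair covers both the cross-thread case and the same-thread signal-handler case, which is why it is easier to reason about than a bare compiler fence.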
Shouldn't we rather work around the 1088 cap on the number of TLS keys instead of fixating on it even more? Libstd consumes a fair number of TLS keys by itself (a hello world seems to use 20 keys, though I might have miscounted). I can easily imagine running out of TLS keys in complex programs in the future, for example once more libraries start using Rust internally and include their own copy of libstd.
I don't think the TLS link you provided supports the assertion that the maximum is forever capped at 1,088. That is the current maximum, but that could conceivably change in the future as it has in the past (it used to be 64). And I don't think the documentation of TlsSetValue makes the case either:
So in summary the first 64 TLS keys are guaranteed and a further 1024 are currently possible but this is not a guaranteed upper limit.
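As a small sketch of that arithmetic: the constants below restate the numbers from this thread (64 is `TLS_MINIMUM_AVAILABLE` in the Windows headers), while `KeyRange` and `classify` are hypothetical names introduced only for illustration. The third variant exists because the current total is an observation, not a documented guarantee.

```rust
/// Slots guaranteed to every process.
const GUARANTEED_SLOTS: u32 = 64;
/// Additional slots that are currently available, per the summary above.
const EXPANSION_SLOTS: u32 = 1024;
/// Today's total of 1088, treated here as an observation rather than a contract.
const CURRENT_MAX_SLOTS: u32 = GUARANTEED_SLOTS + EXPANSION_SLOTS;

#[derive(Debug, PartialEq)]
enum KeyRange {
    Guaranteed,
    Expansion,
    BeyondCurrentLimit,
}

/// Classifies a TLS index by the range it falls into.
fn classify(key: u32) -> KeyRange {
    if key < GUARANTEED_SLOTS {
        KeyRange::Guaranteed
    } else if key < CURRENT_MAX_SLOTS {
        KeyRange::Expansion
    } else {
        // Code that indexes a fixed-size table must handle this case
        // explicitly if the limit is ever raised.
        KeyRange::BeyondCurrentLimit
    }
}

fn main() {
    for key in [0, 63, 64, 1087, 1088] {
        println!("key {}: {:?}", key, classify(key));
    }
}
```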
If libstd does use that many then, yeah, we should definitely work around that.
20 is surprising. I went through looking for these recently, and didn't find nearly that many. Looking now, I only count 6 that aren't part of tests:
3 and 4 don't use slots in most programs. 5 could be avoided (at the cost of either perf or system specific code), and 1/2 can be combined (at the cost of coupling). There are also a bunch in library/proc_macro, but I think they don't matter?
I'm not sure it's generally okay to have multiple copies of libstd in your program at the same time. I guess it's probably fine if they're all fully independent. All that said, I think in practice we hope to eventually have (Although... IIRC a limit like this might still exist for static TLS on Windows, so maybe it's still worth thinking about)
Static TLS is just one big array embedded in the exe. I guess the only limit would be the maximum size of a single section but I've never needed to find out.
Ah, nice catch. IMO we definitely shouldn't do this then.
Indeed. Linking multiple Rust staticlibs together is not fine as the libstd symbols are exported outside of the staticlib. Linking multiple cdylibs together is fine as it only exports user defined
I counted the number of callers of
Ok, so this is definitely not possible then. An alternative solution would be to use fiber local storage, as it allows destructors and can be called even from outside fiber environments (TLS was implemented on top of FLS on older Windows versions, so this is probably a stable guarantee). This is actually the solution used by libc++, so we would be in good company. However, it would be quite surprising for projects using native Windows fibers as a green thread implementation, assuming those exist (I could not find any).
I'm pretty sure there are projects using fibers as green threads, and even if there aren't, IMO we should support code that wants to do that (at least, I think this optimization is not worthwhile enough to make a difference there). So, I'm not sure I have done it in my own code before though -- it's nice to have a version of TlsAlloc that takes a dtor, it's just only really a good idea if you won't be using fibers for anything else (or know you're using it for a case where it wouldn't matter).
By using a static array instead of a linked list to store the destructor functions, the creation of new keys does not need to be synchronized. Running destructors should also be a bit faster because of cache locality: an atomic counter stores the maximum index with a registered destructor, which avoids iterating over the whole array, assuming keys are mostly dense. This optimization comes at the cost of a minor increase in runtime memory use, however.
Using a fixed-size array is possible since the total number of TLS keys is capped at 1088. Since TLS keys are always numbers less than 1088, we can just use the value of the key to index into the array.
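The idea described above can be sketched roughly as follows. This is a portable illustration under stated assumptions, not the code in this PR: `DtorTable`, `register`, `run_all`, `record` and `SUM` are hypothetical names, keys are modelled as plain `usize` indices rather than values returned by `TlsAlloc`, and the per-key value lookup is passed in as a closure instead of calling `TlsGetValue`.

```rust
use std::mem;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

/// Assumed cap on key values (64 + 1024), as stated in the description.
const MAX_KEYS: usize = 1088;

/// Destructor signature used by this sketch (hypothetical).
type Dtor = fn(usize);

pub struct DtorTable {
    /// One slot per possible key; null means "no destructor".
    slots: [AtomicPtr<()>; MAX_KEYS],
    /// One past the highest key that has a registered destructor.
    len: AtomicUsize,
}

impl DtorTable {
    pub const fn new() -> Self {
        const EMPTY: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());
        DtorTable { slots: [EMPTY; MAX_KEYS], len: AtomicUsize::new(0) }
    }

    /// Registers `dtor` for `key` without taking a lock.
    /// Returns false if the key does not fit in the table.
    pub fn register(&self, key: usize, dtor: Dtor) -> bool {
        match self.slots.get(key) {
            Some(slot) => {
                slot.store(dtor as *mut (), Ordering::Release);
                // Raise the upper bound so `run_all` scans far enough.
                self.len.fetch_max(key + 1, Ordering::AcqRel);
                true
            }
            None => false,
        }
    }

    /// Runs every registered destructor whose key currently has a value.
    /// `value_of` stands in for the per-thread value lookup.
    pub fn run_all(&self, mut value_of: impl FnMut(usize) -> Option<usize>) -> usize {
        let len = self.len.load(Ordering::Acquire);
        let mut ran = 0;
        for (key, slot) in self.slots[..len].iter().enumerate() {
            let raw = slot.load(Ordering::Acquire);
            if raw.is_null() {
                continue;
            }
            // SAFETY: non-null slots only ever hold pointers cast from `Dtor`.
            let dtor: Dtor = unsafe { mem::transmute::<*mut (), Dtor>(raw) };
            if let Some(value) = value_of(key) {
                dtor(value);
                ran += 1;
            }
        }
        ran
    }
}

// Demo state: a global table and a destructor that sums the values it sees.
static TABLE: DtorTable = DtorTable::new();
static SUM: AtomicUsize = AtomicUsize::new(0);

fn record(value: usize) {
    SUM.fetch_add(value, Ordering::Relaxed);
}

fn main() {
    TABLE.register(3, record);
    TABLE.register(70, record);
    let ran = TABLE.run_all(|key| Some(key * 10));
    println!("ran {} destructors, sum = {}", ran, SUM.load(Ordering::Relaxed));
}
```

Note that `register` returns `false` for an out-of-range key instead of panicking, so the sketch degrades gracefully if the key limit were ever raised.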
@rustbot label +O-windows-gnu