[NativeAOT] Optimize the access pattern to threadstatics. #84373
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
Issue Details
Prior to implementing JIT inlining of threadstatic access, it makes sense to simplify the access pattern. There could be some opportunities. For example we may not need per-module indirection.
According to @MichalStrehovsky, the multimodule scenario is not commonly used, but it is functional (there are tests), and it may make sense not to break it, if possible.
The multimodule scenario leaves performance on the table in general. It is OK to use a slow helper for it if we want to keep it working. We should focus on optimizing the thread-static access pattern for the single-module case, with the assumption that the JIT can help by inlining it. How far do you think it makes sense to push it? Should we push it to be as efficient as thread statics in C where possible (non-GC, non-generic statics)?
C threadstatic can be very efficient. Here is the access to the current thread in the native code:

Thread * pThread = ThreadStore::GetCurrentThread();
00007FF7BAB06FD2 mov edx,dword ptr [_tls_index (07FF7BACB2B48h)]
00007FF7BAB06FD8 mov rax,qword ptr gs:[58h]
00007FF7BAB06FE1 mov ebx,10h
00007FF7BAB06FE6 add rbx,qword ptr [rax+rdx*8]

This is basically getting native thread static

I think we can have this for unmanaged threadstatics only if we emit unmanaged threadstatics in the same way as C and let the C runtime manage them. It would be attractive for threadstatics such as managed thread ID. Managed and dynamic threadstatics will need something similar to what we have now, but I think we can reduce number of indirections. If the above is too hard or fragile, then unmanaged stuff could be on the same plan as managed.
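For comparison, here is a minimal C++ sketch of the kind of native thread static the comment above refers to. `ThreadData`, `t_current_thread`, `GetCurrentThreadData` and `OtherThreadHasOwnCopy` are hypothetical names for illustration only, not the actual NativeAOT runtime types. With a `thread_local` variable the compiler and C runtime pick the access sequence; on Windows x64 this is typically a short sequence like the one shown (load `_tls_index`, read the TLS array through `gs:[58h]`, add the variable's offset).

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a per-thread runtime structure
// (not the real NativeAOT Thread type).
struct ThreadData {
    std::uint64_t managed_thread_id = 0;
};

// One copy per thread; storage is managed by the C/C++ runtime.
thread_local ThreadData t_current_thread;

// Analogous in spirit to a "get current thread" accessor.
ThreadData* GetCurrentThreadData() {
    return &t_current_thread;
}

// Returns true if a second thread sees its own, freshly initialized copy.
bool OtherThreadHasOwnCopy() {
    ThreadData* mine = GetCurrentThreadData();
    assert(mine == GetCurrentThreadData());  // same thread, same address
    mine->managed_thread_id = 1;

    bool distinct_address = false;
    std::uint64_t their_id = 99;
    std::thread worker([&] {
        ThreadData* theirs = GetCurrentThreadData();
        distinct_address = (theirs != mine);
        their_id = theirs->managed_thread_id;  // untouched by the first thread
    });
    worker.join();

    return distinct_address && their_id == 0 && mine->managed_thread_id == 1;
}
```

Calling `OtherThreadHasOwnCopy()` checks that a write on one thread is not visible through the other thread's copy, which is the property that makes something like a managed thread ID a natural fit for this kind of storage.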
For the current scheme, I think allocating a storage per type is a bit of a waste, at least in the nondynamic case. I think allocating just one instance per thread that provides storage for all nondynamic threadstatics could work better. As I looked through the use of threadstatics in libraries tests, the per-type array can get up to 512 bytes; that is enough to hold up to 64 references to per-type storages. I think the "combo" instance will rarely be larger than 1K.
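The "combo" idea can be sketched as follows. This is an illustration under stated assumptions, not the runtime's implementation: `ComboLayout` and `ComboStorage` are hypothetical names, and the real storage would be a GC-visible managed object rather than a byte vector. The point is that each type's thread statics get a fixed offset into one per-thread block, so an access becomes "block base + constant offset" instead of indexing an array of per-type storage references.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout built ahead of time: every type with nondynamic
// thread statics reserves a fixed, aligned slice of one shared block.
struct ComboLayout {
    std::size_t total_size = 0;

    // Reserves `size` bytes aligned to `align` (a power of two) and
    // returns the offset of the reserved slice.
    std::size_t Reserve(std::size_t size, std::size_t align) {
        std::size_t offset = (total_size + align - 1) & ~(align - 1);
        total_size = offset + size;
        return offset;
    }
};

// One zero-initialized allocation per thread instead of one per type.
class ComboStorage {
public:
    explicit ComboStorage(const ComboLayout& layout)
        : bytes_(layout.total_size, 0) {}

    template <typename T>
    void Store(std::size_t offset, const T& value) {
        assert(offset + sizeof(T) <= bytes_.size());
        std::memcpy(bytes_.data() + offset, &value, sizeof(T));
    }

    template <typename T>
    T Load(std::size_t offset) const {
        assert(offset + sizeof(T) <= bytes_.size());
        T value;
        std::memcpy(&value, bytes_.data() + offset, sizeof(T));
        return value;
    }

private:
    std::vector<std::uint8_t> bytes_;
};
```

A thread would create one `ComboStorage` from the finished layout and reuse the offsets handed out by `Reserve` for every access. Dynamic threadstatics do not fit this scheme because their set is not known when the layout is built.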
As a first approximation, I'd have two slots on the native thread instance:
The tricky part about "combo" instance is that
CC: @kunalspathak
Actually, code does not know statically if it is in module0, so we will need one check. Either for the module number==0 or whether there are multiple modules.
The compiler knows whether it is emitting multi-module code ( |
That is good! Then we can tell the jit to just call the helper.
Note that we have two helpers:
The reason we have
My plan is to simplify the storage as much as we can, which should result in a simpler asm helper for the fast path. As the next step, @kunalspathak will teach the JIT to inline that piece of asm, so that
An initial implementation of simpler access to threadstatics: #84566. It separates the fast "inlineable" case from the multimodule/dynamic case and preinitializes the storage, so that the asm helper does not need to check for anything or call anything. It does not yet introduce a "combo" instance. It looks like that could require some changes in how the JIT deals with threadstatics, i.e. the helper would return a
Fixes: #84373
- [x] separate "fast" inlinable case (used for singlemodule, not dynamic cases, when optimizing).
- [x] make the storage for fast threadstatics a single "combo" instance instead of array of instances.
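The split described in the checklist can be sketched as below. All names (`AccessContext`, `ChooseAccess`, `t_storage`, `FastThreadStaticAddress`) and the 1024-byte block size are assumptions made for illustration; the actual compiler and runtime code differ, and the real storage is a managed object. The sketch shows two things: the fast path is chosen only for single-module, nondynamic, optimized code, and because the storage is preinitialized, the fast path is a plain address computation with no check and no call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class AccessKind { InlinableFastPath, SlowHelper };

// Hypothetical facts the compiler knows when emitting an access.
struct AccessContext {
    bool multi_module;     // emitting multi-module code
    bool dynamic_storage;  // the thread static is in dynamic storage
    bool optimizing;       // optimizations are enabled
};

// Fast path only for single-module, nondynamic, optimized code;
// everything else goes through the slow helper.
AccessKind ChooseAccess(const AccessContext& ctx) {
    if (!ctx.multi_module && !ctx.dynamic_storage && ctx.optimizing) {
        return AccessKind::InlinableFastPath;
    }
    return AccessKind::SlowHelper;
}

// Assumed preinitialized per-thread block (size chosen for illustration).
thread_local std::uint8_t t_storage[1024] = {};

// Fast path: base plus constant offset, no null check, no helper call.
inline void* FastThreadStaticAddress(std::size_t offset) {
    return t_storage + offset;
}
```

Keeping the decision at compile time means the emitted fast path needs no runtime module check, which is what makes it a candidate for inlining by the JIT.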
Prior to implementing JIT inlining of threadstatic access, it makes sense to simplify the access pattern.
There could be some opportunities. For example we may not need per-module indirection.
Re: #82973 (comment)