Optimize threadpool paths in CoreCLR used by ASP.NET #8299
To be clear, this is Linux-specific, right?
Actually we want both. The traces I have are actually for Windows, but you make a good point that we care about Linux more.
Sorry, I should have just looked at the trace :) This is with thread dispatch in Kestrel, right? It looks like some of this is related to thread dispatch. That said, most seems related to I/O completion. This is particularly worrisome (profile excerpt; Name / Inc % / Inc / Exc % / Exc columns omitted): between coreclr!ThreadpoolMgr::CompletionPortThreadStart and system.net.sockets!System.Net.Sockets.SocketAsyncEventArgs+<>c.<.cctor>b__200_0(UInt32, UInt32, System.Threading.NativeOverlapped*), something is taking up 13.8% of CPU time. That seems huge. That said, in past runs I have usually not seen this so high. (Those were typically Plaintext runs or similar.) Do you see this consistently across runs? @kouvel, who may have some insight here.
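For context, a minimal sketch (not taken from this trace) of how a handle is associated with the runtime's I/O completion port and how a managed completion callback is packed into a NativeOverlapped. The b__200_0 frame above is the SocketAsyncEventArgs version of such a callback; the IocpSketch type, its Bind method, and the handle parameter here are illustrative only.

using System;
using System.Threading;
using Microsoft.Win32.SafeHandles;

static class IocpSketch
{
    // Hypothetical: 'handle' is assumed to have been opened for overlapped I/O.
    public static unsafe void Bind(SafeFileHandle handle)
    {
        // Associate the OS handle with the runtime's I/O completion port; completed
        // operations are then dequeued by a CLR completion-port thread
        // (the coreclr!ThreadpoolMgr::CompletionPortThreadStart frame in the trace).
        ThreadPool.BindHandle(handle);

        // Pack a NativeOverlapped* that carries the managed callback through the
        // native overlapped I/O call; on completion the runtime invokes Callback.
        var overlapped = new Overlapped();
        NativeOverlapped* pOverlapped = overlapped.Pack(Callback, null);

        // pOverlapped would be passed to the native overlapped call (e.g. WSARecv).
    }

    // Same shape as the SocketAsyncEventArgs completion callback seen in the trace.
    private static unsafe void Callback(uint errorCode, uint numBytes, NativeOverlapped* pOverlapped)
    {
        // ... hand the result to the waiting operation ...
        Overlapped.Free(pOverlapped);
    }
}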
BTW @vancem: You said this is 1-2%, but if I'm reading the trace right, it's actually more like 13%, right? 1351 out of 10774? Am I missing something?
The trace above was filtered to just show you the costs in Coreclr.dll.
Yes, it is 13% of the cost in CoreCLR (which is a lot), but the total CPU for the process is 71K (not 10K), so you have to divide by 7, which is what gets you the 1-2%.
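Worked out from the numbers quoted above: 1351 / 10,774 ≈ 12.5% of the CoreCLR-only samples, but 1351 / ~71,000 ≈ 1.9% of the whole process, which is where the 1-2% figure comes from.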
@stephentoub @vancem is there still work here? This is marked 2.1 and I wonder whether it is superseded or should be marked future, etc.
It's still relevant.
Might be reading this wrong, but is it resolving the managed callsite each time? If so, is this due to the AppDomain being given as the first param to e.g.
void ManagedThreadBase::ThreadPool(ADID pAppDomain, Context::ADCallBackFcnType pTarget, LPVOID args)
{
WRAPPER_NO_CONTRACT;
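// Note: the call below performs a full AppDomain transition on every dispatch.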
ManagedThreadBase_FullTransitionWithAD(pAppDomain, pTarget, args, ThreadPoolThread);
}
from the CoreCLR sources.
The native preamble looks to be 5% of running something on the ThreadPool thread,
i.e. 5% is already lost when it gets to
If you go up another level, then it loses 91.69% before it hits
Linux is more like:
Inc % Exc %
100.0 0.0 + 2.27.so <<libpthread-2.27.so!start_thread>>
100.0 31.9 + libcoreclr.so <<libcoreclr.so!unknown>>
38.3 4.8 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue::Dispatch()>>
21.8 4.1 + 2.27.so <<libc-2.27.so!__sched_yield>>
4.4 0.6 + libcoreclrtraceptprovider.so <<libcoreclrtraceptprovider.so!unknown>>
0.6 0.6 + kernel.kallsyms <<kernel.kallsyms!swapgs_restore_regs_and_return_to_usermode>>
0.5 0.5 + 2.27.so <<ld-2.27.so!__tls_get_addr>>
0.5 0.5 + tracepoint.so.0.0.0 <<liblttng-ust-tracepoint.so.0.0.0!tp_rcu_read_lock_bp>>
0.4 0.4 + unknown <<unknown!/tmp/perf-4281.map>>
0.3 0.1 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue::EnsureCurrentT
0.1 0.1 + tracepoint.so.0.0.0 <<liblttng-ust-tracepoint.so.0.0.0!tp_rcu_read_unlock_bp>>
0.1 0.1 + 2.27.so <<libpthread-2.27.so!__pthread_getspecific>>
0.1 0.1 + <<stub GenerateResolveStub !/tmp/perf-4281.map>>
0.1 0.1 + <<stub AllocateTemporaryEntryPoints !/tmp/perf-4281.map>>
0.1 0.1 + ust.so.0.0.0 <<liblttng-ust.so.0.0.0!unknown>>
0.1 0.1 + 2.27.so <<libc-2.27.so!unknown>>
0.1 0.1 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.QueueUserWorkItemCallbackDefaultCon
0.1 0.1 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.QueueUserWorkItemCallback::ExecuteW
0.1 0.1 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading._ThreadPoolWaitCallback::PerformWai
0.0 0.0 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue::Dequeue(class
0.0 0.0 + 2.27.so <<libc-2.27.so!__clock_gettime>>
0.0 0.0 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.QueueUserWorkItemCallbackDefaultCon
0.0 0.0 + 2.27.so <<libpthread-2.27.so!__lll_unlock_wake>>
0.0 0.0 + System.Net.Sockets <<System.Net.Sockets!System.Net.Sockets.SocketAsyncEventArgs::TransferCompletionC
0.0 0.0 + tracepoint.so.0.0.0 <<liblttng-ust-tracepoint.so.0.0.0!tp_rcu_dereference_sym_bp>>
0.0 0.0 + System.Net.Sockets <<System.Net.Sockets!System.Net.Sockets.SocketAsyncEventArgs::CompletionCallback(
0.0 0.0 + tracepoint.so.0.0.0 <<liblttng-ust-tracepoint.so.0.0.0!__tls_get_addr@plt>>
0.0 0.0 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.ExecutionContext::RunInternal(class
0.0 0.0 + 2.27.so <<libpthread-2.27.so!__pthread_mutex_lock>>
0.0 0.0 + <<stub AllocateTemporaryEntryPoints !/tmp/perf-4281.map>>
0.0 0.0 + kernel.kallsyms <<kernel.kallsyms!hyperv_callback_vector>>
0.0 0.0 + System.Net.Sockets <<System.Net.Sockets!System.Net.Sockets.SocketAsyncContext+OperationQueue`1[Syste
0.0 0.0 + <<stub AllocateTemporaryEntryPoints !/tmp/perf-4281.map>>
0.0 0.0 + System.Net.Sockets <<System.Net.Sockets!System.Net.Sockets.SocketAsyncEventArgs::FinishOperationAsyn
0.0 0.0 + System.Private.CoreLib <<System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue::EnsureThreadRe
0.0 0.0 + System.Net.Sockets <<System.Net.Sockets!System.Net.Sockets.SocketAsyncContext+OperationQueue`1+<>c[S
0.0 0.0 + <<stub AllocateTemporaryEntryPoints !/tmp/perf-4281.map>>
I think most of the overhead has been eliminated by now with several changes. One was to remove overhead from entering/exiting an AppDomain; there may have been others. A couple of other changes moved to a mostly managed implementation of the thread pool, which avoids the frequent long call chain, but there wasn't much overhead left there by then.
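To illustrate what the managed dispatch loop executes these days, here is a minimal sketch using the public IThreadPoolWorkItem and ThreadPool.UnsafeQueueUserWorkItem(workItem, preferLocal) APIs (available since .NET Core 3.0). The ParseWorkItem type and its payload are made up for the example.

using System;
using System.Threading;

// Illustrative work item; the managed dispatch loop invokes Execute() directly,
// with no ExecutionContext capture and no per-call delegate allocation.
sealed class ParseWorkItem : IThreadPoolWorkItem
{
    private readonly string _payload;
    public ParseWorkItem(string payload) => _payload = payload;

    public void Execute()
    {
        // Runs on a thread-pool worker thread.
        Console.WriteLine($"processing {_payload.Length} chars");
    }
}

class Program
{
    static void Main()
    {
        // preferLocal: true pushes to the current thread's work-stealing queue when possible.
        ThreadPool.UnsafeQueueUserWorkItem(new ParseWorkItem("hello"), preferLocal: true);
        Thread.Sleep(100); // crude wait so the work item runs before the process exits
    }
}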
See the profile
You will see that
uses 1351 msec (approximately 1-2% of the total CPU time in a TechEmpower benchmark). There is just a lot of 'goo' between the OS getting a callback on a socket and the managed code getting called.
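As a rough illustration of the managed end of that path, a minimal SocketAsyncEventArgs receive sketch follows (the socket parameter is assumed to be an already-connected Socket, and the buffer size is arbitrary). Everything between the kernel completing the receive and the Completed handler below running is the 'goo' this issue is about.

using System;
using System.Net.Sockets;

static class ReceiveSketch
{
    public static void StartReceive(Socket socket)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[4096], 0, 4096);
        args.Completed += (sender, e) =>
        {
            // Invoked by the runtime's completion plumbing once the OS signals the receive.
            Console.WriteLine($"received {e.BytesTransferred} bytes ({e.SocketError})");
        };

        // ReceiveAsync returns false when the operation completed synchronously,
        // in which case Completed is not raised and the result is handled inline.
        if (!socket.ReceiveAsync(args))
        {
            Console.WriteLine($"completed synchronously: {args.BytesTransferred} bytes");
        }
    }
}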
@stephentoub @geoffkizer