-
I'll try to answer all your questions here - let me know if I miss anything.

From what I can see, you do have legitimately high JIT time. Per-method jitting that takes 1.8 seconds is crazy high - most jitting that doesn't involve assembly loads is sub-millisecond to maybe a few milliseconds. From the images above, it looks like you are doing a lot of dynamic code generation. I'm curious whether this is intentional - often it's unintentional (the lambda isn't being cached), or the lambda can be refactored so that it can be cached and re-used.

Additionally, I would recommend looking at the CPU stacks and/or blocked time stacks during the time range of one or more of these jitting operations. You can limit the stack viewer to the time range of one jit operation and then include just the thread where the jit operation is happening. Hopefully that will help to clarify why the jit operation is taking so long.

"BG JIT not enabled" just means that the multi-core JIT feature isn't enabled. This is a feature that was implemented in .NET Framework several years back to help with critical-path JIT latency. It wouldn't help in this case because it can't pre-JIT dynamic code.

I don't think you're missing any flags - I suspect we just need to understand why the jitting takes so long. Before answering your last question, I'd also need to understand why the JIT operations take so long. LINQ is often considered slow because it hides algorithmic complexity. That may be true in this case, but what we're seeing here is high JIT time, so this happens before the algorithm even runs.
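To illustrate the caching point, here is a minimal sketch (the type, method, and values are made up, not taken from the trace): compiling an expression tree on every call produces a fresh dynamic method that has to be JIT-compiled each time, whereas compiling once and reusing the delegate pays that cost a single time.

```csharp
using System;
using System.Linq.Expressions;

static class SelectorCache
{
    // Uncached: every call builds and compiles a new expression tree, so the
    // runtime generates and JIT-compiles a fresh dynamic method each time.
    public static Func<int, int> BuildUncached(int offset)
    {
        Expression<Func<int, int>> expr = x => x + offset;
        return expr.Compile();
    }

    // Cached: the expression is compiled once into a delegate that callers
    // reuse, so the JIT cost is paid only on first use.
    private static readonly Func<int, int> AddTen =
        ((Expression<Func<int, int>>)(x => x + 10)).Compile();

    public static int ApplyCached(int value) => AddTen(value);
}
```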
-
Thank you @brianrob for the answers.
We are not intentionally loading any assemblies; it's normal LINQ. Yes, some paths can be optimized.
I looked at both the CPU stacks and the blocked time stacks for one of the JIT operations, per your guidance. Here is what I did: I took the JIT event that happens at 2,138.817 ms and takes 1,614.4 ms on thread 14620, opened the CPU stacks by right-clicking the 2,138.817 column in JIT Events, and did the following steps:
Here is what I see in the By Name view. The Call Tree view shows a similar cost: the thread is entering a critical section (taken before the JIT compiles synchronously), so it is blocked on a lock.

I then opened the Thread Time view for the exact time range and thread ID. It shows an additional call, "WaitForAlertByThreadId". After removing the grouping and folding patterns, it shows that all of the time is spent in SwapContext, as seen below.

Why SwapContext takes that long is the question, and I don't know exactly. If I had to guess, the thread pool is starved and the thread is waiting to be swapped in: the CPU is not saturated, there is no GC event - neither suspension nor pause - taking close to a second in the same time range (seen below), and the thread pool's new worker thread count stands at 519 after climbing moves, even though I set both the min worker and IO threads to 100. I see the same behavior - SwapContext taking seconds - in individual traces taken by Azure Diagnostic Profiling.

One last thing is what I see in the Any Stacks view for the same time range and thread - it hints at some ETW tracing, as seen below. Is it possible that the Azure App Service Diagnostic basic tracing that is enabled is interfering here, given that rundown falls outside the time range? Or can we conclude that it is thread pool starvation?

One additional thing I am noticing in the JIT start events in the Events view is that these expensive JIT operations happen when multiple requests land on the same endpoint. This seems less concerning.
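For reference, the minimum-thread configuration mentioned above ("min worker and IO threads at 100") is normally applied roughly like this - a sketch, not our actual startup code:

```csharp
using System;
using System.Threading;

// Sketch of the settings described above: raise the minimum worker and
// IO-completion threads so the pool injects threads immediately instead of
// ramping up through hill-climbing under a burst of requests.
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);

// The number of threads currently in the pool can be checked at runtime
// (ThreadPool.ThreadCount is available on .NET Core 3.0 and later).
Console.WriteLine($"Pool threads: {ThreadPool.ThreadCount}");
```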
-
This doesn't look like a thread pool starvation issue. It looks like there is a lock being taken during JIT. I think the question is what is happening while that lock is held. You should be able to tell this by looking to see what's using the CPU during the blocked time. What do you see if you filter down to one particular jitted method in the CPU stacks? You'll see the time in
-
Yes. There will be a thread that is holding this lock. I'm wondering what that thread is doing.
-
I see this stack trace in the dump, but the process did not crash. Does this mean that it happened in a different app domain?
-
Context
JIT Statistics
The total time taken by foreground JIT is 280,833.0 ms.
After exporting the events to Excel, I see that it says "BG JIT not enabled" in the Blocked Reason column (the last column), as seen below.
When I open Any Stacks from the Events window, I can see which methods are causing the JIT - mostly LINQ.
Please note that not all JIT method traces enter the critical section.
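For completeness, since the Blocked Reason column calls out "BG JIT not enabled": multi-core (background) JIT is an opt-in feature enabled via ProfileOptimization, roughly as sketched below. As noted above, it cannot pre-JIT dynamically generated code, so it likely would not help here; the directory and profile names are just examples.

```csharp
using System.Runtime;

// Opt in to multi-core (background) JIT early in startup. The runtime records
// which methods get jitted into a profile file under the given directory and
// uses it on subsequent runs to JIT them ahead of the critical path.
// (The directory and profile names here are only placeholders.)
ProfileOptimization.SetProfileRoot(@"C:\MyAppProfiles");
ProfileOptimization.StartProfile("Startup.Profile");
```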
Rundown Events
As mentioned in the other issue #1889 (comment), this could be caused by the rundown events - although the blocking trace above falls outside the rundown events.
The rundown events are below.
When looking at JIT events, how do I ensure that they do not include any rundown costs and are not blocked by rundown?
Thread Time with Start Stop Activities
When looking at the Thread Time view, 98% of the samples fall in Non-Activities; it is always in the high 90s across multiple traces taken with thread time enabled.
Overall CPU stays between 30-45% during significant load of 2K requests per instance per 5 minutes.
The thread pool is configured with min IO and worker threads set to 100, but it reports a new worker thread count in the low 500s under max load (up to 500 per second on each instance). The reason is listed as ClimbingMove.
Looking into the Activities view, it shows 69% blocked time.
Under blocked time, it again spends around 27% in JIT, and it enters the critical section - perhaps blocked by GC.
The overall GC is sort of acceptable, but Gen 0 is always at 100% allocation and almost all trigger reasons are AllocSmall.
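To illustrate what typically drives the AllocSmall trigger reason (a hypothetical snippet, not taken from our codebase): LINQ-heavy request paths allocate many small, short-lived objects - closures, iterators, intermediate collections - which repeatedly fill the Gen 0 budget and trigger the Gen 0 collections reported as AllocSmall.

```csharp
using System.Collections.Generic;
using System.Linq;

static class BatchFilters
{
    // Hypothetical example of the allocation pattern behind AllocSmall-triggered
    // Gen 0 collections: each call allocates a closure (capturing 'threshold'),
    // two LINQ iterators, and a new list - all small, short-lived objects.
    public static List<int> FilterBatch(IEnumerable<int> batch, int threshold)
    {
        return batch.Where(x => x > threshold)
                    .Select(x => x * 2)
                    .ToList();
    }
}
```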
Questions and Help
Any guidance to pin down the root cause is greatly appreciated. Tagging @brianrob.
Share Trace
I will have to check the policy on whether I can share the trace or not. I appreciate the offer.