Correctly Document UNMANAGED_CODE_TIME vs CPU_TIME #1166

Closed
josalem opened this issue May 26, 2020 · 17 comments · Fixed by microsoft/perfview#1613
Labels: documentation, enhancement


josalem commented May 26, 2020

Is there any documentation about what UNMANAGED_CODE_TIME is? When I look at the Thread Time stacks, UNMANAGED_CODE_TIME always takes up 100% of the time. It is very unclear what exactly it is collecting.

Originally posted by @iliamosko in #976 (comment)


josalem commented May 26, 2020

@iliamosko, could you post a sample trace with this behavior and describe your environment, e.g., what version of .NET the target app is running, what version of dotnet-trace was used, what platform you were running on, what you were using to view the trace, and whether you were running inside a container?

@iliamosko

My application runs on .NET Core 3.1 and tries to simulate high CPU load. The dotnet-trace version I'm using is 3.1.120604. The trace was collected with the default dotnet-trace collect command and a one-minute duration:

dotnet-trace collect -p <PID> --duration 00:00:01:00

I've used that command on both a virtual Linux machine (WSL) and a Windows 10 machine. I'm getting similar results for both traces, but I'll focus on the trace collected on Windows. I'm using PerfView version 2.0.52 to view the collected trace file.

By Name stack:
[screenshot]

Call-Tree stack:
[screenshot]

Callers stack:
[screenshot]

So my question is:

What exactly is UNMANAGED_CODE_TIME? What is it collecting?

When comparing that trace to one that uses this dotnet-trace collect command:

dotnet-trace collect -p <PID> --providers '*' --duration 00:00:01:00

UNMANAGED_CODE_TIME gets split into CPU_TIME, which is understandable, but it still doesn't answer what UNMANAGED_CODE_TIME is collecting.

[screenshot]


josalem commented May 26, 2020

Can you expand on what your test app is doing? UNMANAGED_CODE_TIME means what the name implies: it is time where unmanaged code was on the stack, e.g., calls into native code such as the runtime. The example stack you have there shows UNMANAGED_CODE_TIME on top of System.DateTime.get_UtcNow(), most likely because the implementation of the UtcNow getter calls into native code to retrieve the system time. The in-box CPU sampler in the runtime can only walk managed code, which means that native code will show up in traces as UNMANAGED_CODE_TIME. If you want native frames as well as managed frames, you need to use a different tool for your CPU profiling (see perf on Linux or ETW on Windows).

@iliamosko

So just taking a look at the DoSomeMath method, here is what I am doing:

Main:
[screenshot]

DoSomeMath method:
[screenshot]

All it does is get the current time and run the DoSomeMath method on a different thread for a certain amount of time. Other methods in this app follow the same approach.
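(The screenshots above don't reproduce here, so the following is a minimal sketch of roughly what such an app might look like, reconstructed from the description; the duration and the specific math operation are assumptions.)

    using System;
    using System.Threading;

    class Program
    {
        static void Main()
        {
            // Run the math loop on a separate thread for a fixed amount of time.
            var worker = new Thread(() => DoSomeMath(TimeSpan.FromMinutes(1)));
            worker.Start();
            worker.Join();
        }

        static void DoSomeMath(TimeSpan duration)
        {
            // Busy-loop until the requested duration has elapsed. The repeated
            // DateTime.UtcNow calls are what surface in the trace as
            // System.DateTime.get_UtcNow with UNMANAGED_CODE_TIME on top.
            var start = DateTime.UtcNow;
            double acc = 0;
            while (DateTime.UtcNow - start < duration)
            {
                acc += Math.Acos(0.5); // arbitrary CPU-bound work
            }
            GC.KeepAlive(acc);
        }
    }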

Would it be possible to determine which actions within UNMANAGED_CODE_TIME are spending CPU time? Or is the CPU_TIME stack, which was collected in the second trace, what I am looking for?

@baal2000

@josalem
To give a bit of context: we are moving towards OS-agnostic performance tracing on Windows/Linux by fully replacing PerfView with the dotnet-trace tool. Considering that a good measure of .NET calls end up in unmanaged space, production incident troubleshooting (with PerfView) often finds CPU being burnt in a system call rather than a managed call. If, by design, dotnet-trace cannot tell whether any given .NET call (whether system-bound or managed) ends up eating CPU, the usability of such a tool is significantly limited, and we would need to look elsewhere and hope it gets better over time. Linux is the main concern because there is no PerfView there. This is what we are trying to understand.


baal2000 commented May 27, 2020

Other methods in this app follow the same approach

This is a test app to understand trace limitations.


josalem commented May 27, 2020

A stack sample that has UNMANAGED_CODE_TIME at the top still has the chain of managed frames that resulted in that native code call. For example, in your sample images above from the PerfView UI, you collected samples of the following stack:

UNMANAGED_CODE_TIME                               (18.4%)
System.DateTime.get_UtcNow                        (23.4%)
TestApp.Program.DoSomeMath                        (33.4%)
System.Threading.ThreadHelper.ThreadStart_context (33.4%)
...

where those percentages are inclusive samples. This tells you which managed calls lead to spending CPU time in native code, as well as the % of stack samples that inclusively had that frame in that position. When you put the PerfView UI into Caller-Callee view and bucket on UNMANAGED_CODE_TIME, it's going to treat all UNMANAGED_CODE_TIME frames as the same thing and you're going to get a very distorted view of the trace.

If, by design, dotnet-trace cannot tell whether any given .NET call (whether system-bound or managed) ends up eating CPU

I'm not quite sure I understand your statement here. The trace shows you which managed calls had the most inclusive samples and, by extension, spent the most time on the stack. In your images above, you can see that ~23% of stacks sampled on thread 14448 had the frame System.DateTime.get_UtcNow in that position on the stack. 18.4% of samples on that thread were stacks with calls from System.DateTime.get_UtcNow into native code. The % isn't a measure of time, but you can extrapolate that if there is a high % of samples with that stack, a good amount of time was spent with that method on the stack. That is all relative to the number of samples on a given thread, though. That should indicate which calls took the most time. Even if a managed frame calls into native code, the managed frame is still on the stack, and you can derive an understanding of what "took the most time" from that.
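As a rough worked example with the numbers above (and assuming, for illustration, that UNMANAGED_CODE_TIME is the only callee of get_UtcNow in that stack):

    inclusive(System.DateTime.get_UtcNow)  ~ 23.4% of samples on thread 14448
    inclusive(UNMANAGED_CODE_TIME callee)  ~ 18.4% (the getter had already called into native code)
    exclusive(System.DateTime.get_UtcNow)  ~ 23.4% - 18.4% = 5.0% (the getter's own managed code was at the leaf)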

I think what might be causing confusion here is the parsing of the data rather than the data itself. It sounds like you want to answer the question: "What managed calls took the most time?" If that is the question you want answered, looking at the unfiltered Caller-Callee view won't be the best way to answer it. I would recommend looking at a flamegraph (try dotnet-trace convert <trace-file> --format Speedscope and open the result at https://www.speedscope.app to do this without PerfView). Additionally, I would filter out the UNMANAGED_CODE_TIME frames when using PerfView views other than CallTree and Caller-Callee, as those views will bucket them all as the same frame despite them not being the same. If you filter them out, you will be able to get a clearer picture of just the managed frames using the standard PerfView views.

If you want to answer the question, "What percent of samples were collected while executing kernel code?" or "How much time was spent inside DoStuff() from MyNativeCode.so?" then you will need to use a different tracing tool. Specifically, you will need a CPU sampling tool that is capable of collecting native frames, e.g., perf on Linux.

Let me know if that helps/answered your question 😄

CC @shirhatti @noahfalk


baal2000 commented May 27, 2020

@josalem
Thank you John. That makes perfect sense for

UNMANAGED_CODE_TIME                               (18.4%)
System.DateTime.get_UtcNow                        (23.4%)
TestApp.Program.DoSomeMath                        (33.4%)
System.Threading.ThreadHelper.ThreadStart_context (33.4%)
...

I'm not quite sure I understand your statement here.

I think the sticking point is what "the most time on the stack" means in

which managed calls had the most inclusive samples and by extension spent the most time on the stack

Is the thread time CPU or BLOCKED?

For example, what about this screenshot:

UNMANAGED_CODE_TIME                              (66.2%)
System.Console.il!ConsolePal.ReadKey(bool)       (19.3%)
System.DateTime.get_UtcNow                       (17.0%)
...

[screenshot]

[screenshot]

The ReadKey call is not CPU-bound as far as we know, yet it is lumped together with get_UtcNow under the same UNMANAGED_CODE_TIME bucket.

It's hard to tell without specialized knowledge of how the framework implements a given API. It would be much easier if we had something like this:

CPU_TIME                                             (79.2%)
    UNMANAGED_CODE_TIME                              (46.2%)
        System.DateTime.get_UtcNow                   (17.0%)
        ...
     Math.Acos()                                     (33.0%)
BLOCKED_TIME                                         (19.3%)
    UNMANAGED_CODE_TIME                              (19.3%)
      System.Console.il!ConsolePal.ReadKey(bool)     (19.3%)    

Thank you for looking into this!


josalem commented May 27, 2020

I took a bit of a deeper look into what causes PerfView to output CPU_TIME vs UNMANAGED_CODE_TIME and it looks like there might be an issue in which stacks are getting labelled under the wrong category. I'll investigate and create an issue in the Microsoft/PerfView repo if I can pinpoint what's happening. My understanding is that UNMANAGED_CODE_TIME is supposed to be "external" code frames, while CPU_TIME is general "working time" for a stack.
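To make that distinction concrete, here is a rough sketch of the intended labeling as I understand it (illustrative only; the helper name and parameter are hypothetical, and this is not PerfView's actual logic):

    // Hypothetical illustration of the intended meaning; not PerfView's real code
    // (see SampleProfilerThreadTimeComputer.cs for the actual implementation).
    static string PseudoFrameForSample(bool leafFrameIsManaged)
    {
        // CPU_TIME: general "working time" for a stack the sampler could walk to the leaf.
        // UNMANAGED_CODE_TIME: the sample ended in "external"/native code that the
        // EventPipe sampler cannot walk, so it is folded into a single pseudo-frame.
        return leafFrameIsManaged ? "CPU_TIME" : "UNMANAGED_CODE_TIME";
    }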

As for differentiating between "blocked" time and "working" time, that is unfortunately something that I don't think dotnet-trace will be capable of. The traces collected by dotnet-trace were designed to contain the maximum amount of information possible in a platform agnostic way. This means they won't contain information pertaining to the OS scheduler and by extension things like "when a thread has been put to sleep waiting on something to complete". You will be able to see how long managed methods were on the stack and derive information from that, but if you want OS Scheduler details, then you'll need to use ETW on Windows or something like perf on Linux (see perfcollect for something already tuned for .NET).

@baal2000

looks like there might be an issue in which stacks are getting labelled under the wrong category.

We are going to try https://www.speedscope.app/ then.

Thanks.

tommcdon added the documentation label Jun 1, 2020
tommcdon added this to the 5.0 milestone Jun 1, 2020
josalem changed the title from "Is there any documentation about what is UNMANAGED_CODE_TIME?" to "Correctly Document UNMANAGED_CODE_TIME vs CPU_TIME" Nov 24, 2020
josalem modified the milestones: 5.0, 6.0 Nov 24, 2020
josalem self-assigned this Nov 24, 2020

josalem commented Nov 24, 2020

The code responsible for adding the UNMANAGED_CODE_TIME and CPU_TIME frames for EventPipe traces is found here:

    public void LogThreadStack(double timeRelativeMSec, StackSourceCallStackIndex stackIndex, TraceThread thread, SampleProfilerThreadTimeComputer computer, bool onCPU)
    {
        if (onCPU)
        {
            if (ThreadRunning) // continue running
            {
                AddCPUSample(timeRelativeMSec, thread, computer);
            }
            else if (ThreadBlocked) // unblocked
            {
                AddBlockTimeSample(timeRelativeMSec, thread, computer);
                LastBlockStackRelativeMSec = -timeRelativeMSec;
            }

            LastCPUStackRelativeMSec = timeRelativeMSec;
            LastCPUCallStack = stackIndex;
        }
        else
        {
            if (ThreadBlocked) // continue blocking
            {
                AddBlockTimeSample(timeRelativeMSec, thread, computer);
            }
            else if (ThreadRunning) // blocked
            {
                AddCPUSample(timeRelativeMSec, thread, computer);
            }

            LastBlockStackRelativeMSec = timeRelativeMSec;
            LastBlockCallStack = stackIndex;
        }
    }

https://github.com/microsoft/perfview/blob/e75e3856d53c123c7f7ba202579d44ba4d91a7b8/src/TraceEvent/Computers/SampleProfilerThreadTimeComputer.cs#L477-L508

AddCPUSample adds the CPU_TIME frame, while AddBlockTimeSample adds UNMANAGED_CODE_TIME. Unfortunately, I believe there may be an issue where managed stacks are being marked as "external" code, so stacks that should have CPU_TIME as the leaf frame instead have UNMANAGED_CODE_TIME. Regardless of that issue, we should document the intended and correct meanings of these frames. This issue will be used to track that documentation. (#515)

@baal2000

@josalem: I believe there may be an issue where managed stacks are being marked as "external" code

Then this would rather be a bug, not an enhancement issue?


josalem commented Jun 10, 2021

This issue is only tracking the documentation of what those values mean. The underlying issue is being tracked here: dotnet/runtime#45179. I'm not sure why they weren't linked.

@baal2000

Thanks for the update @josalem, now I know this does not apply to our situation (on Linux).


noahfalk commented Jun 2, 2022

@josalem I see this got closed but was there a doc change that completed it? I wasn't able to find it.


josalem commented Jun 2, 2022

I added docs on the PerfView side here: microsoft/perfview#1613

I didn't add any docs in the dotnet docs though. If we do, we should just link to the PerfView ones.


noahfalk commented Jun 2, 2022

Ah, that's right, I remember talking about the PerfView docs now. Thanks for reminding my forgetful ole self ;p

ghost locked as resolved and limited conversation to collaborators Jun 27, 2023