Hello, I noticed a change in the DD Tracer for .NET in v2.49.
Prior to PR #5304 being merged (DD Tracer releases < v2.49), I think the DD runtime metric runtime.dotnet.threads.count was more closely related to Process.Threads.Count.
After PR #5304 was merged (DD Tracer releases >= v2.49), I think runtime.dotnet.threads.count on Windows 8.1+ platforms now comes from Kernel32!Pss*Snapshot.
Lastly, there is a gate/guard around the Kernel32!Pss*Snapshot P/Invoke calls (dd-trace-dotnet/tracer/src/Datadog.Trace/RuntimeMetrics/RuntimeMetricsWriter.cs, lines 318 to 322 in a200693), which can create a further discontinuity when comparing the same app running on Linux vs Windows 8.1+.
The source of this discontinuity seems to be how these thread counts are sourced in the NT Kernel. That is, the "Threads Captured" value from Kernel32!Pss*Snapshot can differ from the number reported by Process.Threads.Count (i.e. "Active Threads").
In other words, I don't think we should assume that PSS_THREAD_INFORMATION.ThreadsCaptured from Kernel32!Pss*Snapshot is equivalent to the previous reporting of runtime.dotnet.threads.count, which used Process.Threads.Count ("Active Threads"). See dd-trace-dotnet/tracer/src/Datadog.Trace/RuntimeMetrics/ProcessSnapshotRuntimeInformation.cs (line 93 in a200693) and dd-trace-dotnet/tracer/src/Datadog.Trace/RuntimeMetrics/RuntimeMetricsWriter.cs (line 343 in a200693).
Basically, what I think is missing in PR #5304 is filtering the snapshot against PSS_THREAD_FLAGS_NONE. If we filter the "Captured Threads" from Kernel32!Pss*Snapshot against PSS_THREAD_FLAGS_NONE, we will more closely align runtime.dotnet.threads.count with Process.Threads.Count ("Active Threads"). To do that, we'd likely need to walk the snapshot handle and count the matching threads, rather than taking threadCount = threadInformation.ThreadsCaptured at face value like we do now.
Otherwise, the PSS_THREAD_INFORMATION.ThreadsCaptured value we get from Kernel32!Pss*Snapshot includes both PSS_THREAD_FLAGS_NONE and PSS_THREAD_FLAGS_TERMINATED threads, and the latter are not "active threads" from the NT Kernel's perspective.
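To make the proposed filtering concrete, here is a minimal C++ sketch. It is not the tracer's code. The names ThreadFlags, ThreadCounts, CountFromSource, CountFromVector and CountProcessThreads are hypothetical and exist only for illustration. The portable part just models the counting rule, so it can be exercised without a live process. The Windows-only part assumes the snapshot is captured with PSS_CAPTURE_THREADS and walked with the documented PssWalkSnapshot( PSS_WALK_THREADS ) call from processsnapshot.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <processsnapshot.h>
#endif

// Hypothetical portable mirror of PSS_THREAD_FLAGS_NONE / PSS_THREAD_FLAGS_TERMINATED.
enum class ThreadFlags : std::uint32_t { None = 0, Terminated = 1 };

struct ThreadCounts {
    std::uint32_t captured = 0;  // every walked entry (what ThreadsCaptured reports)
    std::uint32_t active = 0;    // only entries whose flags are None
};

// Pulls entries from `next` until it returns false, counting as it goes.
inline ThreadCounts CountFromSource(const std::function<bool(ThreadFlags&)>& next) {
    assert(next && "a callable entry source is required");
    ThreadCounts counts;
    ThreadFlags flags = ThreadFlags::None;
    while (next(flags)) {
        ++counts.captured;
        if (flags == ThreadFlags::None) ++counts.active;
    }
    return counts;
}

// Convenience wrapper: feeds an in-memory list of flags through the same counter.
inline ThreadCounts CountFromVector(const std::vector<ThreadFlags>& entries) {
    std::size_t i = 0;
    return CountFromSource([&](ThreadFlags& out) {
        if (i == entries.size()) return false;
        out = entries[i++];
        return true;
    });
}

#ifdef _WIN32
// Assumed Windows adapter: captures a thread snapshot of `process` and walks it.
// Returns ERROR_SUCCESS and fills `result` when the whole walk completes.
inline DWORD CountProcessThreads(HANDLE process, ThreadCounts* result) {
    HPSS snapshot = nullptr;
    DWORD err = PssCaptureSnapshot(process, PSS_CAPTURE_THREADS, 0, &snapshot);
    if (err != ERROR_SUCCESS) return err;

    HPSSWALK marker = nullptr;
    err = PssWalkMarkerCreate(nullptr, &marker);
    if (err == ERROR_SUCCESS) {
        DWORD walk = ERROR_SUCCESS;
        *result = CountFromSource([&](ThreadFlags& out) {
            PSS_THREAD_ENTRY entry{};
            walk = PssWalkSnapshot(snapshot, PSS_WALK_THREADS, marker,
                                   &entry, sizeof(entry));
            if (walk != ERROR_SUCCESS) return false;  // end of walk or failure
            out = (entry.Flags == PSS_THREAD_FLAGS_NONE) ? ThreadFlags::None
                                                         : ThreadFlags::Terminated;
            return true;
        });
        // ERROR_NO_MORE_ITEMS is the normal end of the walk; anything else is an error.
        if (walk != ERROR_NO_MORE_ITEMS) err = walk;
        PssWalkMarkerFree(marker);
    }
    PssFreeSnapshot(GetCurrentProcess(), snapshot);
    return err;
}
#endif
```

Fed a snapshot with four PSS_THREAD_FLAGS_NONE entries and one PSS_THREAD_FLAGS_TERMINATED entry, this counting rule gives captured = 5 and active = 4. The cost is one extra walk over the snapshot per sample, compared with reading ThreadsCaptured directly.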
Supporting Evidence
With some reverse engineering and NT Kernel debugging, we find the supporting evidence below.
Let's consider the following simple .NET program:
    using System.Diagnostics;
    using static System.Console;

    namespace HelloWorld
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                using var p = Process.GetCurrentProcess();
                WriteLine($"Hello, World! My PID={p.Id}");
                ReadLine();
            }
        }
    }
We run the program above, which prints the PID of HelloWorld.exe. Additionally, we have two "monitoring" programs that use the Kernel32!Pss*Snapshot API:
Monitor.exe - a managed .NET program that contains ProcessSnapshotRuntimeInformation.cs extracted from PR #5304 (Optimize runtime metrics).
SnapshotCPP.exe - a C++ version of the same Kernel32!Pss*Snapshot calls using the #include <processsnapshot.h> header file, so we know we are getting the structs directly from Microsoft. This program also has code to "walk" the thread snapshot and dump the PSS_THREAD_FLAGS of each thread.
Pointing Monitor.exe and SnapshotCPP.exe at HelloWorld.exe, and running procexp64 and Task Manager with the Threads column enabled, we find the following:
We observe:
Monitor.exe (the managed .NET P/Invoke implementation) picks up 5 threads with ProcessSnapshotRuntimeInformation and 4 threads with Process.Threads.Count. The count of 4 is consistent with what procexp64 and Task Manager's Threads column report.
SnapshotCPP.exe (the unmanaged native binary), walking the snapshot, finds 4 threads with PSS_THREAD_FLAGS_NONE and 1 thread with PSS_THREAD_FLAGS_TERMINATED.
So right out of the gate, we find a slight discrepancy in the thread count for the given process: Kernel32!Pss*Snapshot "Captured Threads" vs Process.Threads.Count.
Let's dig a bit more and try to find where Kernel32!Pss*Snapshot, procexp64, and Task Manager's Threads column get these thread counts...
🚀 💍 A small trip into Kernel space ring0
Breaking WinDbg into the NT Kernel; our target for posterity:
With !process 0 0 we find HelloWorld.exe:
The Kernel _EPROCESS structure at ffffc48741c30080
The _PEB structure at 64e943e000
kd> dt nt!_EPROCESS ffffc48741c30080:
And we find the _EPROCESS.ActiveThreads = 4 field at offset 0x5f0.
Cool. So a good guess is that the NT Kernel sources its reporting of "Active Threads" (aka Process.Threads.Count) from the _EPROCESS NT Kernel structure. We have accounted for 4 threads, but we are still short 1 terminated thread. Where is it?
What about this ONE PSS_THREAD_FLAGS_TERMINATED thread, TID: 4348? Where in the NT Kernel is this accounting information being kept?
🌊 ⛵ Traversing _EPROCESS.ThreadListHead
The field adjacent to ActiveThreads (offset 0x5f0) is ThreadListHead (offset 0x5e0):
Again, my best guess is that _EPROCESS.ThreadListHead is a list of _ETHREAD structures, linked through _ETHREAD.ThreadListEntry at field offset 0x4e8. So we need to subtract 0x4e8 from the first _EPROCESS.ThreadListHead.FLink to get the base address of the first _ETHREAD, then walk the thread list from this base address with dt.
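The subtraction above is the usual "containing record" trick for intrusive lists. Here is a small, portable C++ sketch of it. FakeThread, ListEntry and the helper functions are hypothetical stand-ins, not kernel definitions. The 0x4e8 offset is specific to the build examined here. The link address 0xffffc4874188c568 is derived by adding 0x4e8 to the first _ETHREAD base address found in this session, not copied from debugger output. Only TID 4344 is known to be first in the list, so the order of the other TIDs is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of _ETHREAD.ThreadListEntry on the build examined here (build-specific).
constexpr std::uint64_t kThreadListEntryOffset = 0x4e8;

// Address arithmetic only: list-link address -> base of the containing _ETHREAD.
constexpr std::uint64_t EthreadFromListEntry(std::uint64_t entry) {
    return entry - kThreadListEntryOffset;
}

// Derived value: base 0xffffc4874188c080 + 0x4e8 = 0xffffc4874188c568.
static_assert(EthreadFromListEntry(0xffffc4874188c568ULL) == 0xffffc4874188c080ULL,
              "subtracting the field offset recovers the structure base");

// Hypothetical user-mode model of the same layout idea.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

struct FakeThread {      // stand-in for _ETHREAD
    std::uint32_t tid;
    bool terminated;     // a terminated thread can still be linked in the list
    ListEntry link;      // plays the role of ThreadListEntry
};

// Same trick as above, with the offset taken from the struct layout.
inline FakeThread* ThreadFromLink(ListEntry* entry) {
    return reinterpret_cast<FakeThread*>(
        reinterpret_cast<char*>(entry) - offsetof(FakeThread, link));
}

// Links head <-> threads[0] <-> ... <-> threads[n-1] <-> head (circular list).
inline void LinkThreads(ListEntry* head, std::vector<FakeThread>& threads) {
    ListEntry* prev = head;
    for (FakeThread& t : threads) {
        prev->Flink = &t.link;
        t.link.Blink = prev;
        prev = &t.link;
    }
    prev->Flink = head;
    head->Blink = prev;
}

// Follows Flink from the head until the walk arrives back at the head.
inline std::vector<std::uint32_t> WalkThreadIds(ListEntry* head) {
    assert(head != nullptr && head->Flink != nullptr);
    std::vector<std::uint32_t> tids;
    for (ListEntry* e = head->Flink; e != head; e = e->Flink) {
        tids.push_back(ThreadFromLink(e)->tid);
    }
    return tids;
}
```

Linking five FakeThread records and calling WalkThreadIds on the head returns all five TIDs, including the terminated one, which mirrors why walking ThreadListHead can show more entries than ActiveThreads reports.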
It looks like we are in the right area, because the first _ETHREAD.Cid (Client ID) matches PID=2148 and TID=4344 (alive; Flags: 0).
Enumerating all the _ETHREADs from the base _ETHREAD address calculated at ffffc487`4188c080, we identify all 5 threads: the alive TID: 4344, TID: 1340, TID: 3704, TID: 5672, and the terminated (dead) TID: 4348.
Let's compare an alive and a dead _ETHREAD to understand the difference; we'll dump the following TIDs:
TID: 1340 (alive) ThreadListEntry.FLink at 0xffffc487`40b8e080 (.NET Event Pipe)
TID: 4348 (dead) ThreadListEntry.FLink at 0xffffc487`4188b080
Our first indication that something is drastically different is _ETHREAD.CrossThreadFlags.Terminated = 1 at offset 0x510 for the dead thread TID: 4348. Our second indication is _ETHREAD.KernelStackReference = 0 at offset 0x55c for the same thread. I am only guessing, but since this dead thread has no kernel stack reference, its kernel-side call stack is presumably no longer allocated, so there is no chance for the thread to 'revive', and it can no longer make NT system service calls/transitions via syscall.
🔎 👻 Zooming Out and Getting out of whack
The problem can get much worse in heavily loaded, larger, more complex systems. I was able to create Trouble.exe, which greatly exacerbates the discrepancy between Kernel32!Pss*Snapshot "Thread Capture" counts and Process.Threads.Count; see below:
🗺 🧭 Guidance and Direction, and Background Context
If we want the DataDog Tracer's runtime.dotnet.threads.count to stay closer to the behavior of DD Tracer < v2.49 (before PR #5304 was introduced), with some semblance of parity across Linux and Windows thread reporting and with operating-system monitoring tools, then we'll need to augment PR #5304 with additional code that calls ::PssWalkSnapshot( PSS_WALK_THREADS ) and counts only threads with PSS_THREAD_ENTRY.Flags = PSS_THREAD_FLAGS_NONE, to get an accurate count of "active threads".
Background Context: After a DD Tracer update at my work (>= v2.49), the DataDog Tracer led us to believe there was a serious threading problem with one of our apps, with strangely different behavior between Windows and Linux, and we weren't sure why. So I spent a few Sundays investigating this super fun problem and topic, and now we know why. 🙏
Hope this helps anyone who is interested in the internals of DD Tracer runtime.dotnet.threads.count and the Kernel32!Pss*Snapshot APIs on Windows 8.1+.
Thank you,
Brian Chavez
bchavez changed the title from "Kernel32!Pss*Snapshot Thread Capture Counts not the same as an Active Thread Counts for runtime.dotnet.threads.count in DD Tracer v2.49+" to "Kernel32!Pss*Snapshot Thread Capture Counts are not the same as Active Thread Counts for runtime.dotnet.threads.count in DD Tracer v2.49+" on Oct 21, 2024