-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize runtime metrics #5304
Optimize runtime metrics #5304
Conversation
|
||
if (_pssConsecutiveFailures >= 3) | ||
{ | ||
Log.Error("Pss failed 3 times in a row, falling back to the Process API"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a Warning message instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it an error because it's not supposed to ever happen, and I'd like to see it in telemetry logs if it does.
Datadog ReportBranch report: ✅ 0 Failed, 330087 Passed, 1597 Skipped, 35m 15.86s Wall Time |
public long CreateTime; | ||
public long ExitTime; | ||
public long KernelTime; | ||
public long UserTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the FILETIME structure is composed of two DWORDS (which I believe are unsigned 32-bit values), should this be a ulong
value instead of a long
value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately I don't believe it will change anything because we give them to TimeSpan.FromTicks
which expects a long (so we have to cast them back). Still, I suppose it's better to get the marshalled structure right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0fe295c
namespace Datadog.Trace.RuntimeMetrics; | ||
|
||
// ReSharper disable InconsistentNaming UnusedMember.Local | ||
internal class PssRuntimeInformation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super small nit: Can we expand Pss
to ProcessSnapshot
in the class name, so ProcessSnapshotRuntimeInformation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0fe295c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM! Nice work!
Regarding the app trimming and assembly reference snapshot diffs, this PR adds a reference to the |
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing the following branches/commits: Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5304) - mean (71ms) : 69, 73
. : milestone, 71,
master - mean (73ms) : 65, 82
. : milestone, 73,
section CallTarget+Inlining+NGEN
This PR (5304) - mean (1,008ms) : 991, 1024
. : milestone, 1008,
master - mean (1,005ms) : 978, 1033
. : milestone, 1005,
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5304) - mean (111ms) : 108, 115
. : milestone, 111,
master - mean (111ms) : 108, 114
. : milestone, 111,
section CallTarget+Inlining+NGEN
This PR (5304) - mean (739ms) : 719, 759
. : milestone, 739,
master - mean (733ms) : 711, 754
. : milestone, 733,
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5304) - mean (94ms) : 91, 96
. : milestone, 94,
master - mean (93ms) : 91, 96
. : milestone, 93,
section CallTarget+Inlining+NGEN
This PR (5304) - mean (686ms) : 660, 711
. : milestone, 686,
master - mean (683ms) : 665, 702
. : milestone, 683,
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5304) - mean (187ms) : 184, 191
. : milestone, 187,
master - mean (187ms) : 185, 190
. : milestone, 187,
section CallTarget+Inlining+NGEN
This PR (5304) - mean (1,075ms) : 1049, 1100
. : milestone, 1075,
master - mean (1,079ms) : 1060, 1099
. : milestone, 1079,
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5304) - mean (271ms) : 268, 275
. : milestone, 271,
master - mean (271ms) : 267, 275
. : milestone, 271,
section CallTarget+Inlining+NGEN
This PR (5304) - mean (882ms) : 859, 906
. : milestone, 882,
master - mean (883ms) : 861, 904
. : milestone, 883,
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (5304) - mean (260ms) : 256, 264
. : milestone, 260,
master - mean (260ms) : 256, 264
. : milestone, 260,
section CallTarget+Inlining+NGEN
This PR (5304) - mean (858ms) : 830, 886
. : milestone, 858,
master - mean (861ms) : 841, 882
. : milestone, 861,
|
Throughput/Crank Report:zap:Throughput results for AspNetCoreSimpleController comparing the following branches/commits: Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards! gantt
title Throughput Linux x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (5304) (11.100M) : 0, 11099977
master (11.029M) : 0, 11029393
benchmarks/2.9.0 (11.199M) : 0, 11198849
section Automatic
This PR (5304) (7.533M) : 0, 7532883
master (7.725M) : 0, 7725218
benchmarks/2.9.0 (8.073M) : 0, 8073104
section Trace stats
This PR (5304) (7.949M) : 0, 7948656
master (7.999M) : 0, 7999147
section Manual
This PR (5304) (9.574M) : 0, 9574277
master (9.720M) : 0, 9720192
section Manual + Automatic
This PR (5304) (7.234M) : 0, 7233944
master (7.300M) : 0, 7299991
section Version Conflict
This PR (5304) (6.502M) : 0, 6501952
master (6.535M) : 0, 6534605
gantt
title Throughput Linux arm64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (5304) (9.451M) : 0, 9451208
master (9.645M) : 0, 9645161
benchmarks/2.9.0 (9.424M) : 0, 9424414
section Automatic
This PR (5304) (6.661M) : 0, 6660745
master (6.570M) : 0, 6569649
section Trace stats
This PR (5304) (6.797M) : 0, 6796817
master (6.920M) : 0, 6919843
section Manual
This PR (5304) (8.324M) : 0, 8323730
master (8.279M) : 0, 8278546
section Manual + Automatic
This PR (5304) (6.232M) : 0, 6231846
master (6.119M) : 0, 6118527
section Version Conflict
This PR (5304) (5.513M) : 0, 5512778
master (5.557M) : 0, 5557390
gantt
title Throughput Windows x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (5304) (10.072M) : 0, 10071694
master (9.952M) : 0, 9952258
benchmarks/2.9.0 (9.559M) : 0, 9558711
section Automatic
This PR (5304) (7.191M) : 0, 7190754
master (6.660M) : 0, 6659881
benchmarks/2.9.0 (7.049M) : 0, 7049098
section Trace stats
This PR (5304) (7.388M) : 0, 7387714
master (7.461M) : 0, 7460529
section Manual
This PR (5304) (8.764M) : 0, 8764246
master (8.877M) : 0, 8876838
section Manual + Automatic
This PR (5304) (6.855M) : 0, 6854979
master (6.824M) : 0, 6823787
section Version Conflict
This PR (5304) (6.262M) : 0, 6262132
master (6.200M) : 0, 6199775
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM asfar as I can tell, but I'm trusting you that the structs are all correct etc :D
tracer/src/Datadog.Trace/RuntimeMetrics/ProcessSnapshotRuntimeInformation.cs
Outdated
Show resolved
Hide resolved
|
||
if (result != 0) | ||
{ | ||
Log.Error<IntPtr, int>("PssFreeSnapshot returned an error, the tracer might be leaking memory. Handle: {Handle}. Error code: {Result}.", snapshotHandle, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:panic:
} | ||
catch | ||
{ | ||
_pssConsecutiveFailures += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can technically have more than one RuntimeMetricsWriter in version conflict, right? Should we use interlocked here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then they're different RuntimeMetricsWriter and they all have their own counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. And we never have more than one instance of the same type, when we replace the tracer instance for example (before the old one has been disposed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though honestly I can't think of any reason not to use an interlocked here so I guess I can play safe and add one
…Information.cs Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Benchmarks Report for tracer 🐌Benchmarks for #5304 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored. Benchmark detailsBenchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ILoggerBenchmark - Slower
|
Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.ILoggerBenchmark.EnrichedLog‑net6.0 | 1.141 | 1,447.84 | 1,652.56 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 1.45μs | 0.629ns | 2.35ns | 0.0231 | 0 | 0 | 1.65 KB |
master | EnrichedLog |
netcoreapp3.1 | 2.11μs | 0.959ns | 3.71ns | 0.0221 | 0 | 0 | 1.65 KB |
master | EnrichedLog |
net472 | 2.54μs | 1.16ns | 4.34ns | 0.25 | 0 | 0 | 1.57 KB |
#5304 | EnrichedLog |
net6.0 | 1.65μs | 0.571ns | 2.21ns | 0.023 | 0 | 0 | 1.65 KB |
#5304 | EnrichedLog |
netcoreapp3.1 | 2.22μs | 1.38ns | 5.18ns | 0.0225 | 0 | 0 | 1.65 KB |
#5304 | EnrichedLog |
net472 | 2.55μs | 1.2ns | 4.48ns | 0.25 | 0 | 0 | 1.57 KB |
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 113μs | 197ns | 763ns | 0 | 0 | 0 | 4.23 KB |
master | EnrichedLog |
netcoreapp3.1 | 118μs | 102ns | 381ns | 0.0592 | 0 | 0 | 4.23 KB |
master | EnrichedLog |
net472 | 145μs | 92.8ns | 360ns | 0.654 | 0.218 | 0 | 4.41 KB |
#5304 | EnrichedLog |
net6.0 | 114μs | 48.8ns | 183ns | 0.0574 | 0 | 0 | 4.23 KB |
#5304 | EnrichedLog |
netcoreapp3.1 | 118μs | 124ns | 479ns | 0.0593 | 0 | 0 | 4.23 KB |
#5304 | EnrichedLog |
net472 | 147μs | 53.4ns | 207ns | 0.662 | 0.221 | 0 | 4.41 KB |
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 3μs | 1.16ns | 4.03ns | 0.0303 | 0 | 0 | 2.21 KB |
master | EnrichedLog |
netcoreapp3.1 | 4.28μs | 2.4ns | 8.97ns | 0.0298 | 0 | 0 | 2.21 KB |
master | EnrichedLog |
net472 | 4.84μs | 2.15ns | 8.06ns | 0.321 | 0 | 0 | 2.02 KB |
#5304 | EnrichedLog |
net6.0 | 3.19μs | 11.7ns | 58.4ns | 0.0311 | 0 | 0 | 2.21 KB |
#5304 | EnrichedLog |
netcoreapp3.1 | 4.14μs | 3.33ns | 12.9ns | 0.0287 | 0 | 0 | 2.21 KB |
#5304 | EnrichedLog |
net472 | 4.97μs | 1.09ns | 4.09ns | 0.32 | 0 | 0 | 2.02 KB |
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | SendReceive |
net6.0 | 1.41μs | 0.571ns | 2.21ns | 0.0162 | 0 | 0 | 1.18 KB |
master | SendReceive |
netcoreapp3.1 | 1.87μs | 1.14ns | 4.27ns | 0.0159 | 0 | 0 | 1.18 KB |
master | SendReceive |
net472 | 2.3μs | 1ns | 3.89ns | 0.187 | 0 | 0 | 1.18 KB |
#5304 | SendReceive |
net6.0 | 1.5μs | 0.986ns | 3.55ns | 0.0165 | 0 | 0 | 1.18 KB |
#5304 | SendReceive |
netcoreapp3.1 | 1.81μs | 1.43ns | 5.35ns | 0.0163 | 0 | 0 | 1.18 KB |
#5304 | SendReceive |
net472 | 2.23μs | 0.704ns | 2.63ns | 0.186 | 0 | 0 | 1.18 KB |
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 2.69μs | 0.905ns | 3.39ns | 0.0215 | 0 | 0 | 1.55 KB |
master | EnrichedLog |
netcoreapp3.1 | 3.9μs | 1ns | 3.89ns | 0.0213 | 0 | 0 | 1.6 KB |
master | EnrichedLog |
net472 | 4.43μs | 2.12ns | 8.23ns | 0.315 | 0 | 0 | 1.99 KB |
#5304 | EnrichedLog |
net6.0 | 2.75μs | 1.07ns | 4.01ns | 0.022 | 0 | 0 | 1.55 KB |
#5304 | EnrichedLog |
netcoreapp3.1 | 4.1μs | 15ns | 58.1ns | 0.0219 | 0 | 0 | 1.6 KB |
#5304 | EnrichedLog |
net472 | 4.56μs | 1.79ns | 6.69ns | 0.316 | 0 | 0 | 1.99 KB |
Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️
Slower ⚠️ in #5304
Benchmark
diff/base
Base Median (ns)
Diff Median (ns)
Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0
1.191
451.45
537.69
Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.191 | 451.45 | 537.69 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | StartFinishSpan |
net6.0 | 451ns | 0.527ns | 1.97ns | 0.00771 | 0 | 0 | 552 B |
master | StartFinishSpan |
netcoreapp3.1 | 682ns | 1.07ns | 4.15ns | 0.00736 | 0 | 0 | 552 B |
master | StartFinishSpan |
net472 | 730ns | 0.393ns | 1.52ns | 0.0877 | 0 | 0 | 554 B |
master | StartFinishScope |
net6.0 | 600ns | 0.714ns | 2.67ns | 0.00952 | 0 | 0 | 672 B |
master | StartFinishScope |
netcoreapp3.1 | 813ns | 1.04ns | 4.04ns | 0.0089 | 0 | 0 | 672 B |
master | StartFinishScope |
net472 | 1.01μs | 1.03ns | 3.86ns | 0.1 | 0 | 0 | 634 B |
#5304 | StartFinishSpan |
net6.0 | 537ns | 0.331ns | 1.28ns | 0.00767 | 0 | 0 | 552 B |
#5304 | StartFinishSpan |
netcoreapp3.1 | 700ns | 1.22ns | 4.41ns | 0.00733 | 0 | 0 | 552 B |
#5304 | StartFinishSpan |
net472 | 770ns | 1.5ns | 5.79ns | 0.0878 | 0 | 0 | 554 B |
#5304 | StartFinishScope |
net6.0 | 586ns | 0.62ns | 2.32ns | 0.00946 | 0 | 0 | 672 B |
#5304 | StartFinishScope |
netcoreapp3.1 | 848ns | 1.14ns | 4.41ns | 0.00881 | 0 | 0 | 672 B |
#5304 | StartFinishScope |
net472 | 954ns | 1.68ns | 6.52ns | 0.1 | 0 | 0 | 634 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | RunOnMethodBegin |
net6.0 | 700ns | 0.762ns | 2.95ns | 0.00944 | 0 | 0 | 672 B |
master | RunOnMethodBegin |
netcoreapp3.1 | 993ns | 1.81ns | 7.01ns | 0.00908 | 0 | 0 | 672 B |
master | RunOnMethodBegin |
net472 | 1.09μs | 1.73ns | 6.72ns | 0.101 | 0 | 0 | 634 B |
#5304 | RunOnMethodBegin |
net6.0 | 750ns | 1.05ns | 4.08ns | 0.00948 | 0 | 0 | 672 B |
#5304 | RunOnMethodBegin |
netcoreapp3.1 | 987ns | 0.626ns | 2.42ns | 0.00922 | 0 | 0 | 672 B |
#5304 | RunOnMethodBegin |
net472 | 1.04μs | 1.79ns | 6.93ns | 0.1 | 0 | 0 | 634 B |
Summary of changes
Optimize the process information collection in the runtime metrics, on Windows.
Reason for change
We have seen multiple cases where it appeared as a hotspot in customers who enabled the continuous profiler (I'm not saying that the CP is the cause, just that we're blind for other customers). Also, I often see dead
ProcessThread
objects in memory dumps, indicating that it's a significant source of allocations.Implementation details
The old version uses the .NET
Process
object to retrieve the required information. On Windows, it relies onNtQuerySystemInformation
, which is very slow because it actually queries the information for all processes on the machine. To make things worse, .NET allocatesProcessThread
objects for each thread, even though we're only interested in the count. To make worse things worse,ProcessThread
objects have a finalizer, increasing the probability that they end up in gen 2.This PR relies on the process snapshotting APIs introduced in Windows 8.1. They're much faster because they target a single process, and use copy-on-write. The old API is kept as a fallback in case we encounter a non-transient error (arbitrarily defined as "3 consecutive errors") or if the target application uses an older version of Windows.
I had to declare separate
PSS_PROCESS_INFORMATION_32
andPSS_PROCESS_INFORMATION_64
structures because to my surprise thePack
argument ofStructLayout
is 8 even on 32 bit (I naively thought it used the pointer size). It doesn't look like there is any good solution for this today, except duplicating the code: dotnet/runtime#60573.Test coverage
Added a unit test for the process metrics. They do only basic validation, as anything more complex would likely flake.