[Exception Replay] Fixed capturing issue of async methods with await in finally block + added missing snapshot attributes + better frame matching algorithm #6549

Merged
merged 9 commits into from
Jan 30, 2025

Conversation

GreenMatan
Contributor

@GreenMatan GreenMatan commented Jan 15, 2025

Summary of changes

  • Certain exceptions were not captured because their stack traces were misleading. The API ExceptionDispatchInfo.Capture(ex).Throw() is used in async methods to capture exceptions that may be rethrown on another thread. It is mostly emitted by the .NET compiler, although it's a public API that can also be called directly. The compiler emits this call when there's an await in a finally block, and certain code constructs, such as await using and await foreach statements, result in an await in finally.
  • In those situations it was difficult to capture the exception because the stack trace being analyzed for instrumentation was incorrect. It showed a causality chain that never actually happened: frames were duplicated not because of a recursive call but because the exception was frozen and unfrozen for the rethrow. To overcome that, the algorithm now determines, by analyzing the method's bytecode, whether a method is 'misleading' (duplicated in the stack trace due to ExceptionDispatchInfo usage) and handles the enter/leave capturing accordingly.
  • The matching algorithm between the exception stack trace and the snapshots was also improved. It's now based on the string representation of the exception instead of solely on new StackTrace(ex). This reduces mismatches between a frame and a snapshot, for a better user experience.
  • Added the following missing attributes on snapshots produced from Exception Replay:
    • exceptionHash: the hash of the exception. It should be identical among snapshots produced for similar exceptions, in the same or in different occurrences.
    • exceptionId: a UUID generated when an exception is captured. It's identical among the snapshots uploaded together for the same occurrence of an exception.
    • frameIndex: the index, in exception.ToString(), of the frame the current snapshot belongs to (indexed from 0 to the outermost user frame index, where 0 represents the innermost frame, i.e. the throwing method).
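
To illustrate the duplicated frames described above, here is a minimal sketch that calls the public ExceptionDispatchInfo API directly rather than relying on compiler-generated async code. The class and method names (EdiDemo, Thrower, CaptureAndRethrow) are made up for this example and are not part of the PR.

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class EdiDemo
{
    // Hypothetical method that throws the original exception.
    private static void Thrower() => throw new InvalidOperationException("boom");

    // Captures an exception and rethrows it through ExceptionDispatchInfo,
    // then returns the resulting exception text for inspection.
    public static string CaptureAndRethrow()
    {
        try
        {
            try
            {
                Thrower();
            }
            catch (Exception inner)
            {
                // Preserves the original stack trace and appends the rethrow site to it.
                ExceptionDispatchInfo.Capture(inner).Throw();
            }
        }
        catch (Exception ex)
        {
            return ex.ToString();
        }

        return string.Empty;
    }
}
```

The returned text contains an "End of stack trace from previous location" marker, and CaptureAndRethrow typically shows up on both sides of it even though the method was entered only once. That is the kind of false duplication the improved algorithm has to account for.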

Reason for change

  • Improve capturing accuracy for exception stack traces that hold misleading duplicated frames.
  • Match frame <-> snapshot more accurately, based on the actual exception.ToString() instead of new StackTrace(ex).

Implementation details

  • Analyzing a method's bytecode to determine whether it calls ExceptionDispatchInfo.Capture.
  • Improved the instrumentation logic to account, at runtime, for potentially false duplicated frames.
  • The frame indices reported as span tags represent the actual location in exception.ToString, to avoid confusing frame <-> snapshot mismatches.

Test coverage

  • New Exception Replay tests were added for exceptions that previously couldn't be captured.
  • Added ILAnalyzerTests, which validate the ILAnalyzer class that was added to determine whether a given method's IL contains a call to ExceptionDispatchInfo.Throw.
  • Added MethodMatcherTests, a set of tests that make sure the new MethodMatcher class works properly. This class checks whether a string representation of a method (taken from exception.ToString) matches a given MethodBase instance.
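
As a rough idea of what matching a stack-trace line against a MethodBase can look like, here is a simplified sketch. It is not the PR's MethodMatcher: the FrameMatchSketch class and its Matches method are hypothetical, and the sketch compares only the declaring type and method name, ignoring parameters, generics, and overloads.

```csharp
using System;
using System.Reflection;

public static class FrameMatchSketch
{
    // Returns true when a frame line such as
    // "   at Namespace.Type.Method(String arg)" names the given method.
    // Assumption: only "Type.Method" is compared; overloads are not disambiguated.
    public static bool Matches(string frameLine, MethodBase method)
    {
        string text = frameLine.Trim();
        if (text.StartsWith("at ", StringComparison.Ordinal))
        {
            text = text.Substring(3);
        }

        int paren = text.IndexOf('(');
        if (paren < 0)
        {
            return false;
        }

        string qualifiedName = text.Substring(0, paren);

        // Reflection separates nested types with '+', while stack traces print '.'.
        string typeName = method.DeclaringType?.FullName?.Replace('+', '.') ?? string.Empty;

        return string.Equals(qualifiedName, typeName + "." + method.Name, StringComparison.Ordinal);
    }
}
```

A real matcher would also need to compare parameter lists and handle generic and compiler-generated names, which is presumably where most of the test cases come from.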

Fixes

DEBUG-2789

@GreenMatan GreenMatan force-pushed the matang/exception-replay-capturing-improvements branch 3 times, most recently from a3a217d to fa2c06d Compare January 15, 2025 13:24
@GreenMatan GreenMatan changed the title [Exception Replay] Better capturing async methods with await in finally block + added missing snapshot attributes + better frame matching algorithm [Exception Replay] Fixed capturing issue of async methods with await in finally block + added missing snapshot attributes + better frame matching algorithm Jan 15, 2025
@andrewlock
Member

andrewlock commented Jan 15, 2025

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test at a 5% significance level
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6549) - mean (69ms)  : 66, 73
     .   : milestone, 69,
    master - mean (69ms)  : 67, 72
     .   : milestone, 69,

    section CallTarget+Inlining+NGEN
    This PR (6549) - mean (984ms)  : 963, 1006
     .   : milestone, 984,
    master - mean (983ms)  : 953, 1012
     .   : milestone, 983,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6549) - mean (108ms)  : 105, 112
     .   : milestone, 108,
    master - mean (108ms)  : 106, 110
     .   : milestone, 108,

    section CallTarget+Inlining+NGEN
    This PR (6549) - mean (681ms)  : 667, 694
     .   : milestone, 681,
    master - mean (678ms)  : 662, 695
     .   : milestone, 678,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6549) - mean (92ms)  : 90, 94
     .   : milestone, 92,
    master - mean (91ms)  : 89, 94
     .   : milestone, 91,

    section CallTarget+Inlining+NGEN
    This PR (6549) - mean (632ms)  : 616, 648
     .   : milestone, 632,
    master - mean (632ms)  : 615, 648
     .   : milestone, 632,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6549) - mean (190ms)  : 186, 194
     .   : milestone, 190,
    master - mean (190ms)  : 185, 195
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (6549) - mean (1,090ms)  : 1059, 1121
     .   : milestone, 1090,
    master - mean (1,083ms)  : 1060, 1106
     .   : milestone, 1083,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6549) - mean (277ms)  : 272, 282
     .   : milestone, 277,
    master - mean (276ms)  : 273, 280
     .   : milestone, 276,

    section CallTarget+Inlining+NGEN
    This PR (6549) - mean (869ms)  : 842, 897
     .   : milestone, 869,
    master - mean (868ms)  : 834, 901
     .   : milestone, 868,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6549) - mean (264ms)  : 260, 267
     .   : milestone, 264,
    master - mean (264ms)  : 259, 268
     .   : milestone, 264,

    section CallTarget+Inlining+NGEN
    This PR (6549) - mean (850ms)  : 813, 888
     .   : milestone, 850,
    master - mean (842ms)  : 809, 875
     .   : milestone, 842,


@andrewlock
Member

andrewlock commented Jan 15, 2025

Benchmarks Report for tracer 🐌

Benchmarks for #6549 compared to master:

  • 2 benchmarks are slower, with geometric mean 1.119
  • 1 benchmark has more allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test at a 5% significance level
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 8.23μs 45.9ns 315ns 0.0205 0.0123 0.0041 5.6 KB
master StartStopWithChild netcoreapp3.1 9.82μs 50.9ns 233ns 0.0146 0.00487 0 5.8 KB
master StartStopWithChild net472 16.1μs 58.7ns 219ns 1.04 0.306 0.0967 6.21 KB
#6549 StartStopWithChild net6.0 7.97μs 43.9ns 252ns 0.0153 0.00383 0 5.62 KB
#6549 StartStopWithChild netcoreapp3.1 10μs 55.6ns 356ns 0.0188 0.00471 0 5.8 KB
#6549 StartStopWithChild net472 16.1μs 50.3ns 195ns 1.07 0.339 0.105 6.22 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 521μs 668ns 2.5μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 650μs 343ns 1.33μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 856μs 490ns 1.9μs 0.428 0 0 3.3 KB
#6549 WriteAndFlushEnrichedTraces net6.0 474μs 353ns 1.37μs 0 0 0 2.7 KB
#6549 WriteAndFlushEnrichedTraces netcoreapp3.1 641μs 384ns 1.44μs 0 0 0 2.7 KB
#6549 WriteAndFlushEnrichedTraces net472 853μs 819ns 3.06μs 0.422 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 129μs 404ns 1.56μs 0.195 0 0 14.47 KB
master SendRequest netcoreapp3.1 140μs 564ns 2.18μs 0.207 0 0 17.27 KB
master SendRequest net472 0.0224ns 0.00436ns 0.0169ns 0 0 0 0 b
#6549 SendRequest net6.0 130μs 576ns 2.23μs 0.13 0 0 14.47 KB
#6549 SendRequest netcoreapp3.1 148μs 127ns 474ns 0.221 0 0 17.27 KB
#6549 SendRequest net472 0ns 0ns 0ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #6549

Benchmark Base Allocated Diff Allocated Change Change %
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑net6.0 41.41 KB 41.69 KB 278 B 0.67%

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 563μs 2.68μs 11.1μs 0.566 0 0 41.41 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 660μs 2.53μs 9.45μs 0.327 0 0 41.62 KB
master WriteAndFlushEnrichedTraces net472 838μs 3.2μs 12μs 8.22 2.47 0.411 53.29 KB
#6549 WriteAndFlushEnrichedTraces net6.0 560μs 2.62μs 10.5μs 0.546 0 0 41.69 KB
#6549 WriteAndFlushEnrichedTraces netcoreapp3.1 661μs 3.58μs 19.9μs 0.317 0 0 41.79 KB
#6549 WriteAndFlushEnrichedTraces net472 829μs 3μs 11.6μs 8.28 2.48 0.414 53.34 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.25μs 0.848ns 3.28ns 0.0144 0 0 1.02 KB
master ExecuteNonQuery netcoreapp3.1 1.71μs 1.23ns 4.59ns 0.0137 0 0 1.02 KB
master ExecuteNonQuery net472 1.92μs 2.24ns 8.68ns 0.157 0.000961 0 987 B
#6549 ExecuteNonQuery net6.0 1.2μs 0.899ns 3.48ns 0.0144 0 0 1.02 KB
#6549 ExecuteNonQuery netcoreapp3.1 1.74μs 2.35ns 9.11ns 0.0138 0 0 1.02 KB
#6549 ExecuteNonQuery net472 1.97μs 1.25ns 4.68ns 0.156 0.000994 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6549

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net6.0 1.117 1,254.30 1,400.60

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.29μs 0.681ns 2.55ns 0.0136 0 0 976 B
master CallElasticsearch netcoreapp3.1 1.5μs 0.374ns 1.35ns 0.0128 0 0 976 B
master CallElasticsearch net472 2.51μs 1.28ns 4.8ns 0.158 0 0 995 B
master CallElasticsearchAsync net6.0 1.25μs 0.539ns 2.02ns 0.0132 0 0 952 B
master CallElasticsearchAsync netcoreapp3.1 1.62μs 2.58ns 9.66ns 0.0137 0 0 1.02 KB
master CallElasticsearchAsync net472 2.62μs 1.68ns 6.28ns 0.167 0 0 1.05 KB
#6549 CallElasticsearch net6.0 1.23μs 1.17ns 4.53ns 0.0136 0 0 976 B
#6549 CallElasticsearch netcoreapp3.1 1.51μs 1.09ns 4.23ns 0.0129 0 0 976 B
#6549 CallElasticsearch net472 2.58μs 2.17ns 8.13ns 0.157 0 0 995 B
#6549 CallElasticsearchAsync net6.0 1.4μs 0.773ns 2.99ns 0.0133 0 0 952 B
#6549 CallElasticsearchAsync netcoreapp3.1 1.65μs 0.595ns 2.3ns 0.0141 0 0 1.02 KB
#6549 CallElasticsearchAsync net472 2.69μs 1.4ns 5.42ns 0.167 0 0 1.05 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.31μs 0.619ns 2.4ns 0.0132 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.63μs 0.778ns 3.01ns 0.0124 0 0 952 B
master ExecuteAsync net472 1.84μs 0.383ns 1.43ns 0.145 0 0 915 B
#6549 ExecuteAsync net6.0 1.29μs 0.372ns 1.44ns 0.0128 0 0 952 B
#6549 ExecuteAsync netcoreapp3.1 1.69μs 0.925ns 3.46ns 0.0128 0 0 952 B
#6549 ExecuteAsync net472 1.85μs 3.86ns 14.4ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.42μs 1.51ns 5.66ns 0.0309 0 0 2.31 KB
master SendAsync netcoreapp3.1 5.38μs 1.63ns 6.1ns 0.0376 0 0 2.85 KB
master SendAsync net472 7.38μs 1.79ns 6.95ns 0.496 0 0 3.12 KB
#6549 SendAsync net6.0 4.33μs 1.49ns 5.56ns 0.0327 0 0 2.31 KB
#6549 SendAsync netcoreapp3.1 5.35μs 2.06ns 7.7ns 0.0374 0 0 2.85 KB
#6549 SendAsync net472 7.38μs 1.53ns 5.91ns 0.493 0 0 3.12 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.52μs 0.699ns 2.61ns 0.0228 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 2.26μs 1.61ns 6.24ns 0.0214 0 0 1.64 KB
master EnrichedLog net472 2.53μs 0.661ns 2.47ns 0.249 0 0 1.57 KB
#6549 EnrichedLog net6.0 1.68μs 0.754ns 2.72ns 0.0226 0 0 1.64 KB
#6549 EnrichedLog netcoreapp3.1 2.22μs 1.95ns 7.56ns 0.022 0 0 1.64 KB
#6549 EnrichedLog net472 2.61μs 1.02ns 3.95ns 0.249 0 0 1.57 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 116μs 146ns 545ns 0.0582 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 120μs 166ns 644ns 0.0597 0 0 4.28 KB
master EnrichedLog net472 150μs 82.9ns 321ns 0.677 0.226 0 4.46 KB
#6549 EnrichedLog net6.0 116μs 149ns 576ns 0.0578 0 0 4.28 KB
#6549 EnrichedLog netcoreapp3.1 119μs 159ns 615ns 0.0595 0 0 4.28 KB
#6549 EnrichedLog net472 150μs 104ns 374ns 0.673 0.224 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 3.16μs 1.52ns 5.48ns 0.03 0 0 2.2 KB
master EnrichedLog netcoreapp3.1 4.34μs 1.67ns 6.47ns 0.0283 0 0 2.2 KB
master EnrichedLog net472 5.02μs 1.31ns 5.06ns 0.319 0 0 2.02 KB
#6549 EnrichedLog net6.0 2.88μs 1.2ns 4.65ns 0.0306 0 0 2.2 KB
#6549 EnrichedLog netcoreapp3.1 4.34μs 5.08ns 19.7ns 0.0301 0 0 2.2 KB
#6549 EnrichedLog net472 4.77μs 1.42ns 5.49ns 0.319 0 0 2.02 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.46μs 0.5ns 1.94ns 0.0161 0 0 1.14 KB
master SendReceive netcoreapp3.1 1.89μs 0.609ns 2.28ns 0.0151 0 0 1.14 KB
master SendReceive net472 2.15μs 0.785ns 3.04ns 0.183 0 0 1.16 KB
#6549 SendReceive net6.0 1.38μs 0.489ns 1.83ns 0.016 0 0 1.14 KB
#6549 SendReceive netcoreapp3.1 1.76μs 1.54ns 5.56ns 0.0156 0 0 1.14 KB
#6549 SendReceive net472 2.12μs 0.684ns 2.65ns 0.183 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.83μs 0.957ns 3.7ns 0.0213 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 3.94μs 3.02ns 11.7ns 0.0217 0 0 1.65 KB
master EnrichedLog net472 4.46μs 2.86ns 11.1ns 0.322 0 0 2.04 KB
#6549 EnrichedLog net6.0 2.75μs 1.14ns 4.12ns 0.022 0 0 1.6 KB
#6549 EnrichedLog netcoreapp3.1 3.85μs 1.1ns 4.27ns 0.0212 0 0 1.65 KB
#6549 EnrichedLog net472 4.39μs 2.69ns 10.4ns 0.322 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6549

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 1.121 396.60 444.55

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 396ns 0.4ns 1.55ns 0.0081 0 0 576 B
master StartFinishSpan netcoreapp3.1 577ns 0.721ns 2.79ns 0.00786 0 0 576 B
master StartFinishSpan net472 645ns 1.54ns 5.97ns 0.0916 0 0 578 B
master StartFinishScope net6.0 539ns 0.536ns 2.07ns 0.00986 0 0 696 B
master StartFinishScope netcoreapp3.1 688ns 1.1ns 4.25ns 0.00943 0 0 696 B
master StartFinishScope net472 799ns 2.8ns 10.8ns 0.104 0 0 658 B
#6549 StartFinishSpan net6.0 443ns 0.9ns 3.49ns 0.008 0 0 576 B
#6549 StartFinishSpan netcoreapp3.1 605ns 0.231ns 0.865ns 0.00789 0 0 576 B
#6549 StartFinishSpan net472 661ns 1.33ns 5.14ns 0.0918 0 0 578 B
#6549 StartFinishScope net6.0 576ns 1.02ns 3.95ns 0.00989 0 0 696 B
#6549 StartFinishScope netcoreapp3.1 764ns 0.959ns 3.46ns 0.00941 0 0 696 B
#6549 StartFinishScope net472 754ns 1.73ns 6.72ns 0.104 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 655ns 0.62ns 2.4ns 0.00988 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 939ns 1.56ns 6.05ns 0.00936 0 0 696 B
master RunOnMethodBegin net472 1.05μs 3.37ns 12.6ns 0.104 0 0 658 B
#6549 RunOnMethodBegin net6.0 595ns 0.827ns 3.2ns 0.00979 0 0 696 B
#6549 RunOnMethodBegin netcoreapp3.1 963ns 1.46ns 5.64ns 0.00914 0 0 696 B
#6549 RunOnMethodBegin net472 1.09μs 2.57ns 9.94ns 0.104 0 0 658 B

@datadog-ddstaging

datadog-ddstaging bot commented Jan 15, 2025

Datadog Report

Branch report: matang/exception-replay-capturing-improvements
Commit report: 9ad36af
Test service: dd-trace-dotnet

✅ 0 Failed, 254144 Passed, 2966 Skipped, 32h 54m 57.98s Total Time
❄️ 1 New Flaky

New Flaky Tests (1)

  • SubmitTraces - Datadog.Trace.ClrProfiler.IntegrationTests.RabbitMQTests - Last Failure

    Expand for error
     The sample did not exit in 600000ms. Memory dump taken: True. Killing process.
    

@andrewlock
Member

andrewlock commented Jan 16, 2025

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput results for the PR are worse than latest master (5% drop or greater) are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6549) (11.209M)   : 0, 11208594
    master (11.186M)   : 0, 11186080
    benchmarks/2.9.0 (11.045M)   : 0, 11045405

    section Automatic
    This PR (6549) (7.276M)   : 0, 7276189
    master (7.398M)   : 0, 7397628
    benchmarks/2.9.0 (7.885M)   : 0, 7885346

    section Trace stats
    master (7.551M)   : 0, 7550829

    section Manual
    master (11.324M)   : 0, 11323840

    section Manual + Automatic
    This PR (6549) (6.694M)   : 0, 6693660
    master (6.734M)   : 0, 6734233

    section DD_TRACE_ENABLED=0
    master (10.344M)   : 0, 10344009

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6549) (9.803M)   : 0, 9803091
    master (9.606M)   : 0, 9606391
    benchmarks/2.9.0 (9.586M)   : 0, 9586476

    section Automatic
    This PR (6549) (6.470M)   : 0, 6469733
    master (6.521M)   : 0, 6520510

    section Trace stats
    master (6.884M)   : 0, 6883687

    section Manual
    master (9.760M)   : 0, 9760374

    section Manual + Automatic
    This PR (6549) (6.042M)   : 0, 6041924
    master (6.105M)   : 0, 6104998

    section DD_TRACE_ENABLED=0
    master (9.122M)   : 0, 9122141

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6549) (9.994M)   : 0, 9994343
    master (9.907M)   : 0, 9907248

    section Automatic
    This PR (6549) (6.502M)   : 0, 6501677
    master (6.540M)   : 0, 6540109

    section Trace stats
    master (6.964M)   : 0, 6963736

    section Manual
    master (10.035M)   : 0, 10035406

    section Manual + Automatic
    This PR (6549) (6.092M)   : 0, 6091650
    master (6.140M)   : 0, 6139744

    section DD_TRACE_ENABLED=0
    master (9.414M)   : 0, 9414341


@GreenMatan GreenMatan force-pushed the matang/exception-replay-capturing-improvements branch from fa2c06d to 4d8bd48 Compare January 19, 2025 14:25
Contributor

github-actions bot commented Jan 19, 2025

Snapshots difference summary

The following differences have been observed in committed snapshots. This summary is meant to help the reviewer.
The diff is simplistic, so please check some files anyway while we improve it.

49 occurrences of:

+        "exceptionHash": "ScrubbedValue",
+        "exceptionId": "ScrubbedValue",
+        "frameIndex": "ScrubbedValue",

@GreenMatan GreenMatan marked this pull request as ready for review January 19, 2025 14:50
@GreenMatan GreenMatan requested a review from a team as a code owner January 19, 2025 14:50
@GreenMatan GreenMatan force-pushed the matang/exception-replay-capturing-improvements branch 7 times, most recently from 752dcbe to 9ad36af Compare January 21, 2025 17:21
Contributor

@dudikeleti dudikeleti left a comment

LGTM. Left some tiny comments

@GreenMatan GreenMatan force-pushed the matang/exception-replay-capturing-improvements branch from 9ad36af to 4312d59 Compare January 28, 2025 11:19
@datadog-ddstaging

datadog-ddstaging bot commented Jan 28, 2025

Datadog Report

Branch report: matang/exception-replay-capturing-improvements
Commit report: f6733e9
Test service: dd-trace-dotnet

✅ 0 Failed, 257238 Passed, 3054 Skipped, 32h 30m 56.07s Total Time

@GreenMatan GreenMatan force-pushed the matang/exception-replay-capturing-improvements branch 2 times, most recently from eaaacd4 to 761fc13 Compare January 28, 2025 20:49
@GreenMatan GreenMatan force-pushed the matang/exception-replay-capturing-improvements branch from d6bfd1a to f6733e9 Compare January 29, 2025 18:21
@GreenMatan GreenMatan merged commit 548aa89 into master Jan 30, 2025
139 of 142 checks passed
@GreenMatan GreenMatan deleted the matang/exception-replay-capturing-improvements branch January 30, 2025 12:20
@github-actions github-actions bot added this to the vNext-v3 milestone Jan 30, 2025
nhulston pushed a commit that referenced this pull request Jan 30, 2025
…` in finally block + added missing snapshot attributes + better frame matching algorithm (#6549)