Skip to content

Commit 01158ef

Browse files
authored
Fix orphaned spans in WCF traces (#7842)
## Summary of changes Fixes orphan traces in WCF ## Reason for change Many customers have reported that in v3, with "delayed WCF" instrumentation enabled, they get orphan spans. This can be easily reproduced, so this PR fixes the issue. ## Background There are two "modes" WCF can operate in: delayed instrumentation, and "simple" instrumentation. Delayed instrumentation was introduced in #1992 to avoid a problematic error scenario. Instead of instrumenting early across the whole "channel", we delayed instrumentation. A subsequent PR #5206 updated the instrumentation point to ensure the wcf span was active while `IDispatchMessageInspector` were running. This was somewhat complex, due to the convoluted and legacy way wcf uses async and threading. Basically, we use a weak table to hold the current span, and us the `MessageRpc` object as a key. Later on, we retrieve the scope and dispose it. ## Implementation details The missing part was that we were not "restoring" the wcf scope for when the actual Invoker implementations executed. The process was: - Update the tests and snapshots to: - Record _all_ spans and traces (it makes it easier to [understand the snapshots using](https://andrewlock.github.io/DatadogSpanSnapshotViewer/)) - Create a span inside the "service" calls. - These _should_ have been children of the wcf server spans, but they were not. They were orphaned - Update the "Invoker" implementations to grab the wcf spans, make it active, and then deactivate it afterwards - Regenerate the snapshots, and confirm the custom spans are no longer orphaned 🎉 Additionally, do a small amount of refactoring: - "Fix" nullability in some places - Extract the "restore wcf span" functionality to `WcfCommon` as it's used in multiple places ## Test coverage Covered by existing, after the tweaks above. The snapshots change like this: Before the fix, after adding the additional custom spans, `WSHttpBinding` traces: <img width="1468" height="622" alt="image" src="https://github.com/user-attachments/assets/7973514c-660e-4871-8c4d-7495473e9944" /> After the fix, `WSHttpBinding` traces: <img width="1459" height="96" alt="image" src="https://github.com/user-attachments/assets/0698e5d3-cb06-4ea6-ab00-5a3680f7e3ee" /> Before the fix, after adding the additional custom spans, WebHttp traces: <img width="1458" height="268" alt="image" src="https://github.com/user-attachments/assets/b0a9f4c9-8584-4fa2-9441-74268d6fd279" /> After the fix, Web Http traces <img width="1465" height="126" alt="image" src="https://github.com/user-attachments/assets/b06f0255-0818-4516-a6a1-b1807c9781ad" /> Unfortunately, this ends up being somewhat unstable in regards to sorting, so I did some more refactoring to separate the root span for each separate httpcall. This should make sure we have more stable ordering of the spans using the built-in sort order The result is we get separate traces for each API call instead: <img width="1458" height="713" alt="image" src="https://github.com/user-attachments/assets/989a0cc8-befa-429f-9a9e-ede09c2abc48" /> ## Other details I wonder if we really need a weak table here? 🤔 seeing as we have an instrumentation point that explicitly disposes the span, shouldn't we just remove the entry at the same time?
1 parent 55cc088 commit 01158ef

File tree

44 files changed

+22200
-2498
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

44 files changed

+22200
-2498
lines changed

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/AfterReceiveRequestIntegration.cs

Lines changed: 5 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ internal static CallTargetState OnMethodBegin<TTarget, TMessageRpc>(TTarget inst
7070
/// <param name="exception">Exception instance in case the original code threw an exception.</param>
7171
/// <param name="state">Calltarget state value</param>
7272
/// <returns>A response value, in an async scenario will be T of Task of T</returns>
73-
internal static CallTargetReturn OnMethodEnd<TTarget>(TTarget instance, Exception exception, in CallTargetState state)
73+
internal static CallTargetReturn OnMethodEnd<TTarget>(TTarget instance, Exception? exception, in CallTargetState state)
7474
{
7575
// Add an exception and close the span only if there was an uncaught exception,
7676
// which should only happen if one of the IClientMessageInspector's has thrown an exception.
@@ -80,17 +80,10 @@ internal static CallTargetReturn OnMethodEnd<TTarget>(TTarget instance, Exceptio
8080
state.Scope.DisposeWithException(exception);
8181
}
8282

83-
if (state.Scope != null)
84-
{
85-
// OnMethodBegin started an active span that can be accessed by IDispatchMessageInspector's and the actual WCF endpoint
86-
// Before returning, we must reset the scope to the previous active scope, so that callers of this method do not see this scope
87-
// Don't worry, this will be accessed and closed by the BeforeSendReplyIntegration
88-
if (Tracer.Instance.ScopeManager is IScopeRawAccess rawAccess)
89-
{
90-
rawAccess.Active = state.PreviousScope;
91-
DistributedTracer.Instance.SetSpanContext(state.PreviousDistributedSpanContext);
92-
}
93-
}
83+
// OnMethodBegin started an active span that can be accessed by IDispatchMessageInspector's and the actual WCF endpoint
84+
// Before returning, we must reset the scope to the previous active scope, so that callers of this method do not see this scope
85+
// Don't worry, this will be accessed and closed by the BeforeSendReplyIntegration
86+
WcfCommon.RestorePreviousScope(in state);
9487

9588
return CallTargetReturn.GetDefault();
9689
}

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/AsyncMethodInvoker_InvokeBegin_Integration.cs

Lines changed: 15 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,6 @@
99
using System;
1010
using System.ComponentModel;
1111
using Datadog.Trace.ClrProfiler.CallTarget;
12-
using Datadog.Trace.DuckTyping;
1312

1413
namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
1514
{
@@ -29,27 +28,27 @@ namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
2928
[EditorBrowsable(EditorBrowsableState.Never)]
3029
public class AsyncMethodInvoker_InvokeBegin_Integration
3130
{
32-
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception exception, in CallTargetState state)
31+
internal static CallTargetState OnMethodBegin<TTarget>(TTarget instance, object? instanceArg, object[]? inputs, ref AsyncCallback? callback, object? state)
3332
{
34-
if (exception is not null)
33+
var tracer = Tracer.Instance;
34+
if (!tracer.CurrentTraceSettings.Settings.IsIntegrationEnabled(WcfCommon.IntegrationId) || !tracer.Settings.DelayWcfInstrumentationEnabled || WcfCommon.GetCurrentOperationContext is null)
3535
{
36-
var operationContext = WcfCommon.GetCurrentOperationContext?.Invoke();
36+
return CallTargetState.GetDefault();
37+
}
3738

38-
if (operationContext != null && operationContext.TryDuckCast<IOperationContextStruct>(out var operationContextProxy))
39-
{
40-
var requestContext = operationContextProxy.RequestContext;
39+
return WcfCommon.ActivateScopeFromContext();
40+
}
4141

42-
// Retrieve the scope that we saved during InvokeBegin
43-
if (((IDuckType?)requestContext)?.Instance is object requestContextInstance
44-
&& WcfCommon.Scopes.TryGetValue(requestContextInstance, out var scope))
45-
{
46-
// Add the exception but do not dispose the span.
47-
// BeforeSendReplyIntegration is responsible for closing the span.
48-
scope.Span?.SetException(exception);
49-
}
50-
}
42+
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception? exception, in CallTargetState state)
43+
{
44+
if (state.Scope is not null && exception is not null)
45+
{
46+
// Add the exception but do not dispose the scope.
47+
// BeforeSendReplyIntegration is responsible for closing the span.
48+
state.Scope.Span.SetException(exception);
5149
}
5250

51+
WcfCommon.RestorePreviousScope(in state);
5352
return new CallTargetReturn<TReturn>(returnValue);
5453
}
5554
}

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/AsyncMethodInvoker_InvokeEnd_Integration.cs

Lines changed: 17 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,6 @@
99
using System;
1010
using System.ComponentModel;
1111
using Datadog.Trace.ClrProfiler.CallTarget;
12-
using Datadog.Trace.DuckTyping;
1312

1413
namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
1514
{
@@ -29,6 +28,17 @@ namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
2928
[EditorBrowsable(EditorBrowsableState.Never)]
3029
public class AsyncMethodInvoker_InvokeEnd_Integration
3130
{
31+
internal static CallTargetState OnMethodBegin<TTarget>(TTarget instance, object? instanceArg, object[]? outputs, IAsyncResult? result)
32+
{
33+
var tracer = Tracer.Instance;
34+
if (!tracer.CurrentTraceSettings.Settings.IsIntegrationEnabled(WcfCommon.IntegrationId) || !tracer.Settings.DelayWcfInstrumentationEnabled || WcfCommon.GetCurrentOperationContext is null)
35+
{
36+
return CallTargetState.GetDefault();
37+
}
38+
39+
return WcfCommon.ActivateScopeFromContext();
40+
}
41+
3242
/// <summary>
3343
/// OnMethodEnd callback
3444
/// </summary>
@@ -39,27 +49,16 @@ public class AsyncMethodInvoker_InvokeEnd_Integration
3949
/// <param name="exception">Exception instance in case the original code threw an exception.</param>
4050
/// <param name="state">Calltarget state value</param>
4151
/// <returns>A response value, in an async scenario will be T of Task of T</returns>
42-
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception exception, in CallTargetState state)
52+
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception? exception, in CallTargetState state)
4353
{
44-
if (exception is not null)
54+
if (state.Scope is not null && exception is not null)
4555
{
46-
var operationContext = WcfCommon.GetCurrentOperationContext?.Invoke();
47-
48-
if (operationContext != null && operationContext.TryDuckCast<IOperationContextStruct>(out var operationContextProxy))
49-
{
50-
var requestContext = operationContextProxy.RequestContext;
51-
52-
// Retrieve the scope that we saved during InvokeBegin
53-
if (((IDuckType?)requestContext)?.Instance is object requestContextInstance
54-
&& WcfCommon.Scopes.TryGetValue(requestContextInstance, out var scope))
55-
{
56-
// Add the exception but do not dispose the span.
57-
// BeforeSendReplyIntegration is responsible for closing the span.
58-
scope.Span?.SetException(exception);
59-
}
60-
}
56+
// Add the exception but do not dispose the scope.
57+
// BeforeSendReplyIntegration is responsible for closing the span.
58+
state.Scope.Span.SetException(exception);
6159
}
6260

61+
WcfCommon.RestorePreviousScope(in state);
6362
return new CallTargetReturn<TReturn>(returnValue);
6463
}
6564
}

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/BeforeSendReplyIntegration.cs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ internal static CallTargetState OnMethodBegin<TTarget, TMessageRpc>(TTarget inst
4848
}
4949

5050
var rpcProxy = rpc.DuckCast<MessageRpcStruct>();
51-
if (((IDuckType?)rpcProxy.OperationContext.RequestContext)?.Instance is object requestContextInstance
51+
if (rpcProxy.OperationContext.RequestContext?.Instance is { } requestContextInstance
5252
&& WcfCommon.Scopes.TryGetValue(requestContextInstance, out var scope))
5353
{
5454
return new CallTargetState(scope);
@@ -65,7 +65,7 @@ internal static CallTargetState OnMethodBegin<TTarget, TMessageRpc>(TTarget inst
6565
/// <param name="exception">Exception instance in case the original code threw an exception.</param>
6666
/// <param name="state">Calltarget state value</param>
6767
/// <returns>A response value, in an async scenario will be T of Task of T</returns>
68-
internal static CallTargetReturn OnMethodEnd<TTarget>(TTarget instance, Exception exception, in CallTargetState state)
68+
internal static CallTargetReturn OnMethodEnd<TTarget>(TTarget instance, Exception? exception, in CallTargetState state)
6969
{
7070
state.Scope.DisposeWithException(exception);
7171
return CallTargetReturn.GetDefault();

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/ChannelHandlerIntegration.cs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,7 @@ internal static CallTargetState OnMethodBegin<TTarget, TRequestContext, TOperati
6363
/// <param name="exception">Exception instance in case the original code threw an exception.</param>
6464
/// <param name="state">Calltarget state value</param>
6565
/// <returns>A response value, in an async scenario will be T of Task of T</returns>
66-
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception exception, in CallTargetState state)
66+
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception? exception, in CallTargetState state)
6767
{
6868
state.Scope.DisposeWithException(exception);
6969
return new CallTargetReturn<TReturn>(returnValue);

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/IRequestContext.cs

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,12 +5,14 @@
55

66
#nullable enable
77

8+
using Datadog.Trace.DuckTyping;
9+
810
namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
911
{
1012
/// <summary>
1113
/// System.ServiceModel.Channels.RequestContext interface for duck-typing
1214
/// </summary>
13-
internal interface IRequestContext
15+
internal interface IRequestContext : IDuckType
1416
{
1517
/// <summary>
1618
/// Gets the request message

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/SyncMethodInvokerIntegration.cs

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,6 @@
99
using System;
1010
using System.ComponentModel;
1111
using Datadog.Trace.ClrProfiler.CallTarget;
12-
using Datadog.Trace.DuckTyping;
1312

1413
namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
1514
{
@@ -40,12 +39,13 @@ public class SyncMethodInvokerIntegration
4039
/// <returns>Calltarget state value</returns>
4140
internal static CallTargetState OnMethodBegin<TTarget>(TTarget instance, object instanceArg, object[] inputs, ref object[] outputs)
4241
{
43-
if (!Tracer.Instance.CurrentTraceSettings.Settings.IsIntegrationEnabled(WcfCommon.IntegrationId) || !Tracer.Instance.Settings.DelayWcfInstrumentationEnabled || WcfCommon.GetCurrentOperationContext is null)
42+
var tracer = Tracer.Instance;
43+
if (!tracer.CurrentTraceSettings.Settings.IsIntegrationEnabled(WcfCommon.IntegrationId) || !tracer.Settings.DelayWcfInstrumentationEnabled || WcfCommon.GetCurrentOperationContext is null)
4444
{
4545
return CallTargetState.GetDefault();
4646
}
4747

48-
return new CallTargetState(Tracer.Instance.ActiveScope as Scope);
48+
return WcfCommon.ActivateScopeFromContext();
4949
}
5050

5151
/// <summary>
@@ -58,15 +58,16 @@ internal static CallTargetState OnMethodBegin<TTarget>(TTarget instance, object
5858
/// <param name="exception">Exception instance in case the original code threw an exception.</param>
5959
/// <param name="state">Calltarget state value</param>
6060
/// <returns>A response value, in an async scenario will be T of Task of T</returns>
61-
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception exception, in CallTargetState state)
61+
internal static CallTargetReturn<TReturn> OnMethodEnd<TTarget, TReturn>(TTarget instance, TReturn returnValue, Exception? exception, in CallTargetState state)
6262
{
6363
if (state.Scope is not null && exception is not null)
6464
{
6565
// Add the exception but do not dispose the scope.
6666
// BeforeSendReplyIntegration is responsible for closing the span.
67-
state.Scope.Span?.SetException(exception);
67+
state.Scope.Span.SetException(exception);
6868
}
6969

70+
WcfCommon.RestorePreviousScope(in state);
7071
return new CallTargetReturn<TReturn>(returnValue);
7172
}
7273
}

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/TaskMethodInvokerIntegration.cs

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,6 @@
99
using System;
1010
using System.ComponentModel;
1111
using Datadog.Trace.ClrProfiler.CallTarget;
12-
using Datadog.Trace.Configuration;
13-
using Datadog.Trace.DuckTyping;
1412

1513
namespace Datadog.Trace.ClrProfiler.AutoInstrumentation.Wcf
1614
{
@@ -40,12 +38,13 @@ public class TaskMethodInvokerIntegration
4038
/// <returns>Calltarget state value</returns>
4139
internal static CallTargetState OnMethodBegin<TTarget>(TTarget instance, object instanceArg, object[] inputs)
4240
{
43-
if (!Tracer.Instance.CurrentTraceSettings.Settings.IsIntegrationEnabled(WcfCommon.IntegrationId) || !Tracer.Instance.Settings.DelayWcfInstrumentationEnabled || WcfCommon.GetCurrentOperationContext is null)
41+
var tracer = Tracer.Instance;
42+
if (!tracer.CurrentTraceSettings.Settings.IsIntegrationEnabled(WcfCommon.IntegrationId) || !tracer.Settings.DelayWcfInstrumentationEnabled || WcfCommon.GetCurrentOperationContext is null)
4443
{
4544
return CallTargetState.GetDefault();
4645
}
4746

48-
return new CallTargetState(Tracer.Instance.ActiveScope as Scope);
47+
return WcfCommon.ActivateScopeFromContext();
4948
}
5049

5150
/// <summary>
@@ -58,15 +57,16 @@ internal static CallTargetState OnMethodBegin<TTarget>(TTarget instance, object
5857
/// <param name="exception">Exception instance in case the original code threw an exception.</param>
5958
/// <param name="state">Calltarget state value</param>
6059
/// <returns>A response value, in an async scenario will be T of Task of T</returns>
61-
internal static TResponse OnAsyncMethodEnd<TTarget, TResponse>(TTarget instance, TResponse returnValue, Exception exception, in CallTargetState state)
60+
internal static TResponse OnAsyncMethodEnd<TTarget, TResponse>(TTarget instance, TResponse returnValue, Exception? exception, in CallTargetState state)
6261
{
6362
if (state.Scope is not null && exception is not null)
6463
{
6564
// Add the exception but do not dispose the scope.
6665
// BeforeSendReplyIntegration is responsible for closing the span.
67-
state.Scope.Span?.SetException(exception);
66+
state.Scope.Span.SetException(exception);
6867
}
6968

69+
WcfCommon.RestorePreviousScope(in state);
7070
return returnValue;
7171
}
7272
}

tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Wcf/WcfCommon.cs

Lines changed: 49 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
using System.Net;
1313
using System.Reflection;
1414
using System.Runtime.CompilerServices;
15+
using Datadog.Trace.ClrProfiler.CallTarget;
1516
using Datadog.Trace.Configuration;
1617
using Datadog.Trace.DuckTyping;
1718
using Datadog.Trace.ExtensionMethods;
@@ -77,7 +78,11 @@ internal class WcfCommon
7778
httpMethod = httpRequestPropertyProxy.Method?.ToUpperInvariant();
7879

7980
// try to extract propagated context values from http headers
80-
if (tracer.ActiveScope == null)
81+
if (tracer.ActiveScope is { } activeScope)
82+
{
83+
Log.Warning("Skipped extracting headers due to existing scope: {ActiveScope}", activeScope.Span);
84+
}
85+
else
8186
{
8287
try
8388
{
@@ -216,6 +221,49 @@ internal class WcfCommon
216221

217222
return null;
218223
}
224+
225+
public static CallTargetState ActivateScopeFromContext()
226+
{
227+
// First, capture the active scope
228+
var tracer = Tracer.Instance;
229+
var activeScope = tracer.InternalActiveScope;
230+
231+
var operationContext = WcfCommon.GetCurrentOperationContext?.Invoke();
232+
if (operationContext != null && operationContext.TryDuckCast<IOperationContextStruct>(out var operationContextProxy))
233+
{
234+
var requestContext = operationContextProxy.RequestContext;
235+
236+
// Retrieve the scope that we saved during InvokeBegin
237+
if (requestContext?.Instance is { } requestContextInstance
238+
&& Scopes.TryGetValue(requestContextInstance, out var scope)
239+
&& scope is not null
240+
&& scope.Span.SpanId != activeScope?.Span.SpanId)
241+
{
242+
// capture the raw context for later activation
243+
var spanContextRaw = DistributedTracer.Instance.GetSpanContextRaw() ?? activeScope?.Span.Context;
244+
Log.Debug("Activating scope from operation context {ActivatedSpan}", scope.Span);
245+
246+
tracer.ActivateSpan(scope.Span);
247+
// Add the exception but do not dispose the span.
248+
// BeforeSendReplyIntegration is responsible for closing the span.
249+
return new CallTargetState(scope, activeScope, spanContextRaw);
250+
}
251+
}
252+
253+
return CallTargetState.GetDefault();
254+
}
255+
256+
public static void RestorePreviousScope(in CallTargetState state)
257+
{
258+
// Reset the scope to the previous active scope, so that callers of this method do not see this scope
259+
// Don't worry, this will be accessed and closed by the BeforeSendReplyIntegration
260+
if (state.Scope is not null && Tracer.Instance.ScopeManager is IScopeRawAccess rawAccess)
261+
{
262+
Log.Debug("Restoring previous scope from {Previous} to {Restored}", state.Scope.Span, state.PreviousScope?.Span);
263+
rawAccess.Active = state.PreviousScope;
264+
DistributedTracer.Instance.SetSpanContext(state.PreviousDistributedSpanContext);
265+
}
266+
}
219267
}
220268
}
221269
#endif

0 commit comments

Comments
 (0)