Synchronize initialization in ServiceFabric instrumentation #105

pellared · 2021-04-06T10:52:56Z

Why

initialized field is not synchronized at all which can lead to loss of traces.

What

Initialization is the only use-case where I find using volatile to be OK.
http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility

pjanotti

Subtle, but it is true :)

zacharycmontoya · 2021-04-06T16:10:30Z

src/Datadog.Trace/ServiceFabric/ServiceRemotingClient.cs

@@ -13,7 +13,7 @@ internal static class ServiceRemotingClient
        private static readonly Logging.IDatadogLogger Log = Logging.DatadogLogging.GetLoggerFor(typeof(ServiceRemotingClient));

        private static int _firstInitialization = 1;
-        private static bool _initialized;
+        private static volatile bool _initialized;


I'm not sure why we need to make this change. Initialization is protected by _firstInitialization

It is not enough. The methods checking if it is initialized are checking _initialized field. _firstInitialization only makes sure that it would be initialized at most once. It does not guarantee that other threads see that it is already initialized.

Ok, I see what you're saying that the read of _initialized may be stale. I asked @kevingosse about this and it doesn't actually fix the issue you're trying to solve because the volatile read doesn't flush the cache. To truly fix this, a full fence would be needed, but that seems like high overhead for a scenario that has a very low probability to happen.

Q.: is it safe to call ServiceRemotingHelpers.FinishSpan if the corresponding start span was not called?

@zacharycmontoya @kevingosse
IMO it should be enough

The volatile keyword instructs the compiler to generate an acquire-fence on every read from that field, and a release-fence on every write to that field. An acquire-fence prevents other reads/writes from being moved before the fence; a release-fence prevents other reads/writes from being moved after the fence.

Because _initilized = true is the last line in initialization therefore I think this PR makes it thread-safe.

@pjanotti
I quickly looked at ServiceRemotingHelpers.FinishSpan and looked correct.

@pjanotti I do not think it would be possible

Just to clarify what volatile does and doesn't:

Volatile ensures that the operation is translated into an actual read/write to memory. Without volatile, .NET is allowed to just read/write from a register and never commit the changes to memory. Note that all writes in .NET 2.0+ are implicitly volatile (in the official implementation at least, I don't know about Mono), so we should only worry about reads. Given that _initialized is read only once in the method, there is no opportunity to cache the value in a register so this isn't a concern. Unless the method is inlined into its caller, but this is an event so it can't be inlined.

Volatile ensures that operations are consistent related to one another. Basically, it there's is an operation A followed by an operation B, proper usage of volatile ensures that B is visible only if A is visible. But, and that's the subtlety here, it does not ensure that A is visible to begin with. With a concrete example, imagine a thread is writing true to A then true to B. From another thread, without volatile you could in theory read A false and B true. With volatile, you will either see A and B both false, or A true and B false, or A true and B true, but never A false and B true.

But volatile has no impact on how soon the value is published to other threads. If that's important, you need an operation that forces the cache to be flushed, that is Thread.MemoryBarrier or any Interlocked operation.
If you're skeptical, you can try playing with this code (taken from https://stackoverflow.com/q/49203936/869621) :

class Program { class Foo { public bool a; public bool b; public bool c; public void SetA() { a = true; TestAB(); } public void SetB() { b = true; TestAB(); } public void TestAB() { if (a && b) { c = true; } } } static void Main(string[] args) { int timesCWasFalse = 0; for (int i = 0; i < 100000; i++) { var f = new Foo(); var t1 = Task.Run(() => f.SetA()); var t2 = Task.Run(() => f.SetB()); Task.WaitAll(t1, t2); if (!f.c) { timesCWasFalse++; } } Console.WriteLine($"timesCWasFalse: {timesCWasFalse}"); Console.WriteLine("Finished. Press Enter to exit"); Console.ReadLine(); } }

You'll see that no amount of volatile will prevent timesCWasFalse to fail. Only inserting a memory barrier (or an interlocked operation) in TestAB does.

@kevingosse @zacharycmontoya
TL;DR;
What do you think about adding Thread.MemoryBarrier(); after assigning _initialized and before reading _initialized for sake of "simplicity" and more trust. I added volatile because often in other PRs you were extremely concerned about performance 😄

@kevingosse (if you have a few minutes) 😉

As I have written before

Because _initilized = true is the last line in initialization, therefore, I think this PR makes it thread-safe.

Regarding:

But volatile has no impact on how soon the value is published to other threads.

It depends on how it is used. If done correctly we can make sure that it is published like doing a memory barrier. We just need to make sure that the same field is used and used in the correct manner (pattern).

The code you presented is different b/c there are loops and volatile writes before volatile reads. And it uses different fields

This is beautifully explained here: http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility

Notice that applying volatile doesn’t prevent a write followed by a read from being swapped

But here it is not a problem as we have only one field initialized which is assigned to true only once.

in the official implementation at least, I don't know about Mono

And also we have .NET Core and .NET >= 5.0.... dotnet/runtime#6257

Well I don't think it's worth debating hours on this. Adding volatile won't affect the codegen (except on ARM but I don't see ServiceFabric coming to ARM anytime soon), so if adding it makes you feel better I really have no reason to oppose

It can affect codegen.

IL_0007: volatile. IL_0009: ldsfld requiredmodifier Datadog.Trace.ServiceFabric.ServiceRemotingService::_initialized

vs

IL_0007: ldsfld bool Datadog.Trace.ServiceFabric.ServiceRemotingService::_initialized

The interesting thing is that on my dev container with .NET 5.0 when I build with Release configuration. Then the .volatile is ALWAYS added to the IL 😄

I am almost certain that volatile can help on x86/x64 from practice. It solved an issue on Intel Atom (which is x86/x64) (AFAIK AMD processors also have different caching architecture) . It was a very similar issue to the current one here 😄

Found in background: dotnet/fsharp#122

pellared · 2021-04-07T19:48:33Z

Unstable test failure:

[xUnit.net 00:08:08.72] Datadog.Trace.ClrProfiler.IntegrationTests: FAILURE: Datadog.Trace.ClrProfiler.IntegrationTests.CallTargetNativeTests.MethodArgumentsInstrumentation(1, True, True) (2.1781022s)
[xUnit.net 00:08:08.72]     Datadog.Trace.ClrProfiler.IntegrationTests.CallTargetNativeTests.MethodArgumentsInstrumentation(numberOfArguments: 1, fastPath: True, inlining: True) [FAIL]
[xUnit.net 00:08:08.72] Datadog.Trace.ClrProfiler.IntegrationTests: STARTED: Datadog.Trace.ClrProfiler.IntegrationTests.Elasticsearch5Tests.SubmitsTraces(5.5.0, False, False)
  X Datadog.Trace.ClrProfiler.IntegrationTests.CallTargetNativeTests.MethodArgumentsInstrumentation(numberOfArguments: 1, fastPath: True, inlining: True) [2s 178ms]
  Error Message:
   Process exited with code 139
Expected: True
Actual:   False
  Stack Trace:
     at Datadog.Trace.ClrProfiler.IntegrationTests.CallTargetNativeTests.MethodArgumentsInstrumentation(Int32 numberOfArguments, Boolean fastPath, Boolean inlining) in /project/test/Datadog.Trace.ClrProfiler.IntegrationTests/CallTargetNativeTests.cs:line 42

pjanotti · 2021-04-07T19:51:46Z

Merging per earlier conversation on SIG meeting (also not waiting on CI since it was triggered just because branch update).

Synchronize initilization in ServiceFabric instrumentation

9656ee0

pellared requested a review from a team April 6, 2021 10:52

RassK approved these changes Apr 6, 2021

View reviewed changes

pjanotti approved these changes Apr 6, 2021

View reviewed changes

zacharycmontoya reviewed Apr 6, 2021

View reviewed changes

pellared requested a review from zacharycmontoya April 6, 2021 17:34

nrcventura approved these changes Apr 6, 2021

View reviewed changes

Merge branch 'main' into fix-race-condition-in-servicefabric

5eff58a

pjanotti merged commit 716974d into open-telemetry:main Apr 7, 2021

pellared deleted the fix-race-condition-in-servicefabric branch April 7, 2021 19:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Synchronize initialization in ServiceFabric instrumentation #105

Synchronize initialization in ServiceFabric instrumentation #105

pellared commented Apr 6, 2021

pjanotti left a comment

zacharycmontoya Apr 6, 2021

pellared Apr 6, 2021 •

edited

Loading

zacharycmontoya Apr 6, 2021

pjanotti Apr 6, 2021

pellared Apr 6, 2021 •

edited

Loading

pellared Apr 6, 2021

kevingosse Apr 6, 2021 •

edited

Loading

pellared Apr 6, 2021 •

edited

Loading

kevingosse Apr 6, 2021

pellared Apr 7, 2021 •

edited

Loading

pellared commented Apr 7, 2021

pjanotti commented Apr 7, 2021

Synchronize initialization in ServiceFabric instrumentation #105

Synchronize initialization in ServiceFabric instrumentation #105

Conversation

pellared commented Apr 6, 2021

Why

What

pjanotti left a comment

Choose a reason for hiding this comment

zacharycmontoya Apr 6, 2021

Choose a reason for hiding this comment

pellared Apr 6, 2021 • edited Loading

Choose a reason for hiding this comment

zacharycmontoya Apr 6, 2021

Choose a reason for hiding this comment

pjanotti Apr 6, 2021

Choose a reason for hiding this comment

pellared Apr 6, 2021 • edited Loading

Choose a reason for hiding this comment

pellared Apr 6, 2021

Choose a reason for hiding this comment

kevingosse Apr 6, 2021 • edited Loading

Choose a reason for hiding this comment

pellared Apr 6, 2021 • edited Loading

Choose a reason for hiding this comment

kevingosse Apr 6, 2021

Choose a reason for hiding this comment

pellared Apr 7, 2021 • edited Loading

Choose a reason for hiding this comment

pellared commented Apr 7, 2021

pjanotti commented Apr 7, 2021

pellared Apr 6, 2021 •

edited

Loading

pellared Apr 6, 2021 •

edited

Loading

kevingosse Apr 6, 2021 •

edited

Loading

pellared Apr 6, 2021 •

edited

Loading

pellared Apr 7, 2021 •

edited

Loading