-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synchronize initialization in ServiceFabric instrumentation #105
Synchronize initialization in ServiceFabric instrumentation #105
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subtle, but it is true :)
@@ -13,7 +13,7 @@ internal static class ServiceRemotingClient | |||
private static readonly Logging.IDatadogLogger Log = Logging.DatadogLogging.GetLoggerFor(typeof(ServiceRemotingClient)); | |||
|
|||
private static int _firstInitialization = 1; | |||
private static bool _initialized; | |||
private static volatile bool _initialized; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we need to make this change. Initialization is protected by _firstInitialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not enough. The methods checking if it is initialized are checking _initialized
field. _firstInitialization
only makes sure that it would be initialized at most once. It does not guarantee that other threads see that it is already initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see what you're saying that the read of _initialized
may be stale. I asked @kevingosse about this and it doesn't actually fix the issue you're trying to solve because the volatile read doesn't flush the cache. To truly fix this, a full fence would be needed, but that seems like high overhead for a scenario that has a very low probability to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q.: is it safe to call ServiceRemotingHelpers.FinishSpan
if the corresponding start span was not called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zacharycmontoya @kevingosse
IMO it should be enough
The volatile keyword instructs the compiler to generate an acquire-fence on every read from that field, and a release-fence on every write to that field. An acquire-fence prevents other reads/writes from being moved before the fence; a release-fence prevents other reads/writes from being moved after the fence.
Because _initilized = true
is the last line in initialization therefore I think this PR makes it thread-safe.
@pjanotti
I quickly looked at ServiceRemotingHelpers.FinishSpan
and looked correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pjanotti I do not think it would be possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify what volatile does and doesn't:
- Volatile ensures that the operation is translated into an actual read/write to memory. Without volatile, .NET is allowed to just read/write from a register and never commit the changes to memory. Note that all writes in .NET 2.0+ are implicitly volatile (in the official implementation at least, I don't know about Mono), so we should only worry about reads. Given that
_initialized
is read only once in the method, there is no opportunity to cache the value in a register so this isn't a concern. Unless the method is inlined into its caller, but this is an event so it can't be inlined. - Volatile ensures that operations are consistent related to one another. Basically, it there's is an operation A followed by an operation B, proper usage of volatile ensures that B is visible only if A is visible. But, and that's the subtlety here, it does not ensure that A is visible to begin with. With a concrete example, imagine a thread is writing true to A then true to B. From another thread, without volatile you could in theory read A false and B true. With volatile, you will either see A and B both false, or A true and B false, or A true and B true, but never A false and B true.
But volatile has no impact on how soon the value is published to other threads. If that's important, you need an operation that forces the cache to be flushed, that is Thread.MemoryBarrier
or any Interlocked
operation.
If you're skeptical, you can try playing with this code (taken from https://stackoverflow.com/q/49203936/869621) :
class Program
{
class Foo
{
public bool a;
public bool b;
public bool c;
public void SetA()
{
a = true;
TestAB();
}
public void SetB()
{
b = true;
TestAB();
}
public void TestAB()
{
if (a && b)
{
c = true;
}
}
}
static void Main(string[] args)
{
int timesCWasFalse = 0;
for (int i = 0; i < 100000; i++)
{
var f = new Foo();
var t1 = Task.Run(() => f.SetA());
var t2 = Task.Run(() => f.SetB());
Task.WaitAll(t1, t2);
if (!f.c)
{
timesCWasFalse++;
}
}
Console.WriteLine($"timesCWasFalse: {timesCWasFalse}");
Console.WriteLine("Finished. Press Enter to exit");
Console.ReadLine();
}
}
You'll see that no amount of volatile will prevent timesCWasFalse
to fail. Only inserting a memory barrier (or an interlocked operation) in TestAB
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevingosse @zacharycmontoya
TL;DR;
What do you think about adding Thread.MemoryBarrier();
after assigning _initialized
and before reading _initialized
for sake of "simplicity" and more trust. I added volatile
because often in other PRs you were extremely concerned about performance 😄
@kevingosse (if you have a few minutes) 😉
As I have written before
Because _initilized = true is the last line in initialization, therefore, I think this PR makes it thread-safe.
Regarding:
But volatile has no impact on how soon the value is published to other threads.
It depends on how it is used. If done correctly we can make sure that it is published like doing a memory barrier. We just need to make sure that the same field is used and used in the correct manner (pattern).
The code you presented is different b/c there are loops and volatile writes before volatile reads. And it uses different fields
This is beautifully explained here: http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility
Notice that applying volatile doesn’t prevent a write followed by a read from being swapped
But here it is not a problem as we have only one field initialized
which is assigned to true
only once.
in the official implementation at least, I don't know about Mono
And also we have .NET Core and .NET >= 5.0.... dotnet/runtime#6257
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I don't think it's worth debating hours on this. Adding volatile won't affect the codegen (except on ARM but I don't see ServiceFabric coming to ARM anytime soon), so if adding it makes you feel better I really have no reason to oppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can affect codegen.
IL_0007: volatile.
IL_0009: ldsfld requiredmodifier Datadog.Trace.ServiceFabric.ServiceRemotingService::_initialized
vs
IL_0007: ldsfld bool Datadog.Trace.ServiceFabric.ServiceRemotingService::_initialized
The interesting thing is that on my dev container with .NET 5.0 when I build with Release
configuration. Then the .volatile
is ALWAYS added to the IL 😄
I am almost certain that volatile
can help on x86/x64 from practice. It solved an issue on Intel Atom (which is x86/x64) (AFAIK AMD processors also have different caching architecture) . It was a very similar issue to the current one here 😄
Found in background: dotnet/fsharp#122
Unstable test failure:
|
Merging per earlier conversation on SIG meeting (also not waiting on CI since it was triggered just because branch update). |
Why
initialized
field is not synchronized at all which can lead to loss of traces.What
Initialization is the only use-case where I find using
volatile
to be OK.http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility