-
Notifications
You must be signed in to change notification settings - Fork 765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Histogram synchronization issue #3534
Fix Histogram synchronization issue #3534
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3534 +/- ##
==========================================
+ Coverage 87.26% 87.28% +0.01%
==========================================
Files 277 277
Lines 9931 9938 +7
==========================================
+ Hits 8666 8674 +8
+ Misses 1265 1264 -1
|
@@ -337,16 +337,18 @@ internal void Update(double number) | |||
var sw = default(SpinWait); | |||
while (true) | |||
{ | |||
if (Interlocked.Exchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1) == 0) | |||
if (Interlocked.CompareExchange(ref this.histogramBuckets.IsCriticalSectionOccupied, 1, 0) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious if there were more learnings based on the questions raised here #3458 (comment).
Does CompareExchange
help alleviate the concerns? Or did you learn the concern was unfounded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced CompareExchange
in this PR because I think it's much better to try to update IsCriticalSectionOccupied
to 1 only when it's actually 0. With the Interlocked.Exchange
we would be setting that variable to 1 even when it's already set to 1, and that seems wasteful.
Regarding the learnings for the questions asked in #3458:
-
We can use Interlocked operations to devise a "lock" that would synchronize the access to histogram related fields here among multiple threads. However, the part where we release this "lock" i.e., where we set
IsCriticalSectionOccupied
= 0 has to be done using anInterlocked
statement as well. This is becauseInterlocked
creates a memory fence around that variable. This prevents certain reorderings of instructions that might setIsCriticalSection
to 0 before the original thread is done processing the histogram fields as that might allow another thread to enter the critical section while the original thread has not exited it. -
Also, any other thread which might be accessing these histogram fields must use the same synchronization mechanism. That's why I have updated the
TakeSnapshot
method to use the same Interlocked operations to updateIsCriticalSectionOccupied
to determine if it's safe to read these fields. -
The instructions within this "lock" can be reordered, in this case, the code under
unchecked
. That's okay because we don't care what order they get executed in. -
Regarding the freshness of the values, as long as all the threads that access the same set of fields use the same synchronization mechanism (could be the C#
lock
statement or our Interlocked based "lock"), any thread that enters the critical section would see the most up-to-date value of these fields. -
The only thing to be wary of in this Interlocked based "lock" is exceptions. If an exception is thrown while a thread was in the critical section,
IsCriticalSectionOccupied
would not be set back to 0 and other threads waiting to enter the critical section would never be able to enter it. In our case, the instructions inside the "lock" would not throw an exception so we don't have to worry about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced CompareExchange in this PR because I think it's much better to try to update IsCriticalSectionOccupied to 1 only when it's actually 0. With the Interlocked.Exchange we would be setting that variable to 1 even when it's already set to 1, and that seems wasteful.
^ might be good to see if perf is impacted with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent summary! Thank you. Number 5 in particular had not occurred to me, but absolutely makes sense.
I'd be interested in perf too. Given that the snapshot and update threads were previously not synchronizing with one another, I'm curious if perf is impacted now that they are correctly synchronized. Maybe not by much, since the snapshot thread only runs seldomly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this as-is but we should probably do some perf testing using lock
, SpinLock
, and this new Interlocked
code. I think this will be the fastest way but I don't want to blindly trust the old test which proved "Interlocked is faster than lock" because it was bugged when tested 😄
PS: Special shout-out to @noahfalk for helping us on this 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for correctness, we need to modify this tests https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/test/OpenTelemetry.Tests/Metrics/MetricAPITest.cs#L1175
It validates multiple threads writing scenario.
We can add a reader thread also, and ultimately everything should add up correctly. (ideally we want to modify the stress tests to do it, so you can run it for hours and at the end, it'll confirm if the metrics are correct).
I ran Benchmark and Stress tests for Expand for codeusing System.Threading;
using BenchmarkDotNet.Attributes;
namespace Benchmarks.Metrics
{
[MemoryDiagnoser]
public class TestBenchmarks
{
private int isLocked = 1;
[Benchmark]
public void Exchange()
{
var temp = Interlocked.Exchange(ref this.isLocked, 1) == 0;
}
[Benchmark]
public void CompareExchange()
{
var temp = Interlocked.CompareExchange(ref this.isLocked, 1, 0) == 0;
}
}
}
I have updated the code to go back to using |
Beware that single threaded, zero contention lock perf benchmarks can (and probably will) give very different results vs. a test with multiple threads and some contention. Even if threads are never explicitly waiting for another to release the lock, down at the hardware level the cache line backing the critical section memory has to be moved between the caches connected to different cores. When the cores are continuously renegotiating which one has exclusive access to the memory that creates large additional latencies on those Interlocked instructions. |
Check the PRs for the other two synchronization techniques: Performance comparison of the different synchronization techniquesHigh contention scenario:
Low contention scenario:
The Interlocked based "lock" outperforms the classic lock and SpinLock for high contention scenario where multiple threads try to update the same MetricPoint in parallel. It had ~35% more loops/sec than the other techniques. It also has significantly lower CPU cycles/loop: ~83% less than classic lock and ~23% less than SpinLock. For low contention scenario, there is still some perf gain for Interlocked based "lock", but it's not as high. The benchmark results for Interlocked based "lock" are comparable to classic lock. The benchmark numbers for SpinLock are lower than that of Interlocked based "lock" and classic lock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Changes
Update
andTakeSnapshot
methodsHistogramBuckets.LockObject
Stress Test Results (high contention) with Prometheus Exporter scraping interval = 10s
Multiple threads updating the same Histogram MetricPoint
Stress Test Results (low contention) with Prometheus Exporter scraping interval = 10s
Multiple threads updating random Histogram MetricPoints
Benchmark Results
// * Summary *
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=6.0.302
[Host] : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
DefaultJob : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT