-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MINOR: Fix a race and add JMH bench for HdrHistogram #17184
MINOR: Fix a race and add JMH bench for HdrHistogram #17184
Conversation
After a bunch of experiments, I did arrive at this test, which on my development machine fails >50% without this change and passes 100% (in more than 500 attempts) with this change:
It does rely on numbers and it takes >2 sec locally when it succeeds, so I am inclined not to include it. Let me know if you prefer otherwise. I did spend some time trying to make the iteration body smaller by using a |
private volatile Histogram yammerHistogram; | ||
private volatile HdrHistogram hdrHistogram; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: On a second look these might not need to be volatile
as the @Setup
method probably happens-before the @Benchmark
methods. The accesses through the volatile
references are really cheap compared to what's being measured though, so it might also be OK to leave them as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! left some comments
Timestamped<Histogram> latest = timestampedHistogramSnapshot.get(); | ||
while (now - latest.timestamp > maxSnapshotAgeMs) { | ||
Histogram currentSnapshot = recorder.getIntervalHistogram(); | ||
boolean updatedLatest = timestampedHistogramSnapshot.compareAndSet( | ||
latest, new Timestamped<>(now, currentSnapshot)); | ||
|
||
latest = timestampedHistogramSnapshot.get(); | ||
if (updatedLatest) { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to reproduce the bug mentioned and it fails deterministically in the previous implementation
@Test
public void testConcurrentAccessDoesNotLosePreviousState() {
long maxSnapshotAgeMs = 10L;
CountDownLatch latch = new CountDownLatch(1);
Recorder mockRecorder = new MockRecorder(latch);
HdrHistogram hdrHistogram = new HdrHistogram(maxSnapshotAgeMs, MAX_VALUE, 3, mockRecorder);
AtomicReference<org.HdrHistogram.Histogram> asyncHistogram = new AtomicReference<>();
org.HdrHistogram.Histogram mainHistogram;
CompletableFuture<Void> asyncFuture = CompletableFuture.runAsync(() -> {
asyncHistogram.set(hdrHistogram.latestHistogram(11));
});
mainHistogram = hdrHistogram.latestHistogram(11);
asyncFuture.join();
assertNotSame(asyncHistogram.get(), mainHistogram);
}
private static class MockRecorder extends Recorder {
private final CountDownLatch latch;
private final AtomicInteger invocationCounter = new AtomicInteger(0);
public MockRecorder(CountDownLatch latch) {
super(3);
this.latch = latch;
}
@Override
public org.HdrHistogram.Histogram getIntervalHistogram() {
if (invocationCounter.compareAndSet(0, 1)) {
try {
latch.await();
} catch (Exception e) {
e.printStackTrace();
}
} else {
latch.countDown();
}
return super.getIntervalHistogram();
}
}
This test hangs in the new implementation because the call to getIntervalHistogram
is synchronized on lock
. Which makes sense since we want to ensure that the thread that calls getIntervalHistogram is the only thread that can proceed (and update the state). I wonder if we can modify this test to pass when it hangs for x seconds. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some improvements to the test I mentioned here, and as a result it now runs much faster (well under a second on my development machine) and fails reliably without this change (failed 500 out of 500 attempts locally when run via Gradle).
After these improvements I prefer the aforementioned test over the one here, as the latter requires changes in the histogram code, is still not deterministically proving things if we are going to make it pass on a timeout, and is a bit harder to understand.
@Measurement(iterations = 5) | ||
@BenchmarkMode(Mode.AverageTime) | ||
@OutputTimeUnit(TimeUnit.NANOSECONDS) | ||
public class HistogramBenchmark { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To understand the benchmark:
- @group annotation allows the read and writes to happen concurrently
- Simulate "slower" histogram reads compared to writes
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@group annotation allows the read and writes to happen concurrently
The @Group
annotation allows the reads and writes to happen concurrently (if in the same group), but also asymmetrically, with some threads doing reads and some doing writes.
Check out https://github.com/melix/jmh-gradle-example/blob/master/src/jmh/java/org/openjdk/jmh/samples/JMHSample_15_Asymmetric.java
Simulate "slower" histogram reads compared to writes
Not slower, just more rare than writes, which is what we would expect in the general case.
@Benchmark | ||
@Group("runner") | ||
@GroupThreads(3) | ||
public void write_YammerHistogram() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is this snake-case naming convention used in jmh benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some, but apparently not in the ones in AK, so I'll fix this (e.g. making this writeYammerHistogram
).
2a14f85
to
f1e7954
Compare
@chia7712 did you merge this intentionally? This was in the process of being rebased in addition to having some of its comments addressed. I don't think reopening is an option now, which means the history will be split between this and a subsequent PR, but this shouldn't be that big of an issue. |
I did not merge it ... this (GitHub) bug happened before but I do have no idea why Github so hate me 😢 I guess GitHub get confused by the force-push from the bot "airlock-confluentinc"? |
About
Addresses the race described here and adds a JMH benchmark comparing
HdrHistogram
with Yammer's histogram.Testing
I haven't included a test reproducing the race, because on my development machine I wasn't able to see consistent reproducibility without more sophisticated testing tools like jcstress or Byteman/BMUnit. The race is easily reproducible manually while debugging the test.
Committer Checklist (excluded from commit message)