8345668: ZoneOffset.ofTotalSeconds performance regression #22854
Co-authored-by: Roger Riggs <Roger.Riggs@Oracle.com>
return SECONDS_CACHE.computeIfAbsent(totalSeconds, totalSecs -> {
ZoneOffset result = new ZoneOffset(totalSecs);
Integer totalSecs = totalSeconds;
ZoneOffset result = SECONDS_CACHE.get(totalSecs);
Here, each call may allocate an Integer object. The maximum number of ZoneOffsets that need to be cached here is only 148, so an AtomicReferenceArray would be a better fit than a ConcurrentHashMap.
For example:
static final AtomicReferenceArray<ZoneOffset> MINUTES_15_CACHE = new AtomicReferenceArray<>(37 * 4);
public static ZoneOffset ofTotalSeconds(int totalSeconds) {
    // ...
    int minutes15Rem = totalSeconds / (15 * SECONDS_PER_MINUTE);
    if (totalSeconds - minutes15Rem * 15 * SECONDS_PER_MINUTE == 0) {
        int cacheIndex = minutes15Rem + 18 * 4;
        ZoneOffset result = MINUTES_15_CACHE.get(cacheIndex);
        if (result == null) {
            result = new ZoneOffset(totalSeconds);
            if (!MINUTES_15_CACHE.compareAndSet(cacheIndex, null, result)) {
                // Lost the race: read the winner's value from the same slot.
                result = MINUTES_15_CACHE.get(cacheIndex);
            }
        }
        return result;
    }
    // ...
}
Hi Shaojin,
Thanks for the suggestion, but I am not planning to improve the code beyond backing out the offending fix at this time. (By the way, the cache size would be 149, as 18:00 and -18:00 are inclusive.)
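The array size can be checked with a little arithmetic. The sketch below (QuarterHourSlots and its methods are hypothetical names, not part of the JDK) counts the multiples of 15 minutes between -18:00 and +18:00 inclusive and maps each one to an array index. By this count 145 slots are needed, so sizes of 37 * 4 = 148 or 149 would leave a few unused slots rather than be too small.

```java
// Sketch only: QuarterHourSlots is a hypothetical helper for this illustration.
final class QuarterHourSlots {
    static final int SECONDS_PER_QUARTER_HOUR = 15 * 60;   // 900 seconds
    static final int MAX_SECONDS = 18 * 60 * 60;           // +18:00, the ZoneOffset limit
    static final int MAX_QUARTERS = MAX_SECONDS / SECONDS_PER_QUARTER_HOUR; // 72

    /** Number of offsets in [-18:00, +18:00] that are exact multiples of 15 minutes. */
    static int slotCount() {
        return 2 * MAX_QUARTERS + 1; // negative side + positive side + zero
    }

    /** Array index for a quarter-hour offset, or -1 if it is not a multiple of 15 minutes. */
    static int indexOf(int totalSeconds) {
        if (totalSeconds % SECONDS_PER_QUARTER_HOUR != 0) {
            return -1;
        }
        return totalSeconds / SECONDS_PER_QUARTER_HOUR + MAX_QUARTERS;
    }

    public static void main(String[] args) {
        System.out.println(slotCount());            // 145
        System.out.println(indexOf(-MAX_SECONDS));  // 0
        System.out.println(indexOf(MAX_SECONDS));   // 144
    }
}
```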
Can I submit a PR to make this improvement?
@wenshao I agree with your proposal. Also for this part:
ZoneOffset result = MINUTES_15_CACHE.get(cacheIndex);
if (result == null) {
    result = new ZoneOffset(totalSeconds);
    if (!MINUTES_15_CACHE.compareAndSet(cacheIndex, null, result)) {
        result = MINUTES_15_CACHE.get(cacheIndex);
    }
}
I recommend a rewrite:
ZoneOffset result = MINUTES_15_CACHE.getPlain(cacheIndex);
if (result == null) {
    result = new ZoneOffset(totalSeconds);
    ZoneOffset existing = MINUTES_15_CACHE.compareAndExchange(cacheIndex, null, result);
    return existing == null ? result : existing;
}
The getPlain is safe because ZoneOffset is thread safe, so you can use the object whenever you can observe a ZoneOffset object reference. Also, compareAndExchange avoids extra operations if we fail to racily set the computed ZoneOffset.
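To make the suggested pattern concrete, here is a minimal self-contained sketch. It assumes a hypothetical immutable Offset class standing in for ZoneOffset (whose constructor is private) and a hypothetical QuarterHourOffsetCache holder; only getPlain and compareAndExchange are real AtomicReferenceArray methods (Java 9+). Offsets that are not multiples of 15 minutes are simply not cached here, which is a simplification for the example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch only: class and field names are hypothetical, not the JDK's.
final class QuarterHourOffsetCache {
    /** Immutable stand-in for ZoneOffset; the final field makes racy publication safe. */
    static final class Offset {
        final int totalSeconds;
        Offset(int totalSeconds) { this.totalSeconds = totalSeconds; }
    }

    private static final int QUARTER = 15 * 60;          // seconds in 15 minutes
    private static final int MAX_QUARTERS = 18 * 4;      // +18:00 in quarter hours
    private static final AtomicReferenceArray<Offset> CACHE =
            new AtomicReferenceArray<>(2 * MAX_QUARTERS + 1);

    static Offset ofTotalSeconds(int totalSeconds) {
        if (totalSeconds < -MAX_QUARTERS * QUARTER || totalSeconds > MAX_QUARTERS * QUARTER) {
            throw new IllegalArgumentException("Offset not in the range -18:00 to +18:00");
        }
        if (totalSeconds % QUARTER != 0) {
            return new Offset(totalSeconds);             // uncached in this sketch
        }
        int cacheIndex = totalSeconds / QUARTER + MAX_QUARTERS;
        Offset result = CACHE.getPlain(cacheIndex);      // plain read, no ordering cost
        if (result == null) {
            result = new Offset(totalSeconds);
            // One atomic step: install our value or learn which value won the race.
            Offset existing = CACHE.compareAndExchange(cacheIndex, null, result);
            return existing == null ? result : existing;
        }
        return result;
    }

    public static void main(String[] args) {
        Offset a = ofTotalSeconds(3600);
        Offset b = ofTotalSeconds(3600);
        System.out.println(a == b); // true: the second call hits the cache
    }
}
```

Because no boxed Integer key is involved, the cached path allocates nothing.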
@Benchmark
public void ofTotalSeconds() {
    for (int i = 0; i < 1_000; i++) {
        ZoneOffset.ofTotalSeconds(0);
This benchmark method should accept a Blackhole, and the return value of ofTotalSeconds must be sent to the Blackhole.consume method.
This benchmark probably works as written only because of the cache interactions in ofTotalSeconds, which prevent the JIT compiler from proving that the call is side-effect free. Had it been as simple as a decimal computation, or if the cache becomes a stable map, the JIT compiler could eliminate the static factory method call entirely, and the benchmark would be measuring the performance of a no-op invocation.
I decided to remove this benchmark, as the fix merely reverts the previous change and does not provide any performance improvement over the original.
store = createStore(field, locale);
CACHE.putIfAbsent(key, store);
store = CACHE.get(key);
Should this be
store = CACHE.computeIfAbsent(key, e -> createStore(e.getKey(), e.getValue()));
That would still allow the optimistic/concurrent get call to succeed most of the time (when already cached) but reduce the interactions with the map when a value is created, set, and accessed for the first time.
Alternatively, the result of putIfAbsent could be checked and used to avoid the second call to get.
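The second alternative could look roughly like the sketch below. StoreCache, createStore and findStore are hypothetical names with a String standing in for the real store type; the point is only that putIfAbsent returns the previously mapped value (or null), so the second get is unnecessary.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: a simplified stand-in for the cache discussed above.
final class StoreCache {
    private static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();

    /** Hypothetical factory; creates a fresh value on every call. */
    static String createStore(String key) {
        return "store:" + key;
    }

    static String findStore(String key) {
        String store = CACHE.get(key);            // optimistic read, the common case
        if (store == null) {
            store = createStore(key);
            // putIfAbsent returns the existing value if another thread won the race.
            String previous = CACHE.putIfAbsent(key, store);
            if (previous != null) {
                store = previous;                 // use the winner, drop our copy
            }
        }
        return store;
    }

    public static void main(String[] args) {
        String first = findStore("DayOfWeek");
        String second = findStore("DayOfWeek");
        System.out.println(first == second);      // true: same cached instance
    }
}
```

Under a race, more than one thread may call createStore, but every caller ends up with the single instance that was stored.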
For sure we should use the result of putIfAbsent. Let's do this for all cases. See how it was implemented in my first commit - 73a2f6c
For sure we should use result of putIfAbsent
Drive-by comment...
From what I can infer, the performance regression being addressed here is caused in part by the fact that (for example) ConcurrentHashMap.computeIfAbsent() provides an atomicity guarantee, which is a stronger requirement than is necessary here. Therefore, by splitting that call into two separate, non-atomic get() and put() calls, we get (counter-intuitively) faster execution, even though there are more lines of code. Note that putIfAbsent() also guarantees atomicity, so the same slowness caused by "unnecessary atomicity" might occur with it as well.
Indeed, I just noticed that both computeIfAbsent and putIfAbsent may acquire the lock when the key is present, while get never acquires a lock for read-only access.
Maybe the implementation was written back when locking was less costly (with biased locking, etc.). Now we might have to reconsider locking until we know for sure a plain get fails.
This scenario is discussed in Effective Java by Joshua Bloch. His observation then (Java 5/6 time frame?) was that optimistically calling get first, and only calling putIfAbsent or computeIfAbsent if the get returned null, was 250% faster, because calls to putIfAbsent/computeIfAbsent have contention. There have been changes made to those methods since then to try to avoid synchronization when the key is already present, but the observation here seems to confirm that calling the optimistic get first is still faster (though by a much smaller margin).
My comment was not to revert back to the prior change of just calling computeIfAbsent, but rather just to change the (expected rare) case when the first get returns null: replace the putIfAbsent and second get call with a single computeIfAbsent (or utilize the result of putIfAbsent to avoid the second call to get).
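For illustration, the first option (optimistic get, then a single computeIfAbsent on the rare miss) might be sketched as follows. OptimisticComputeCache, the String store type and the CREATED counter are hypothetical and exist only for this example; the key is a Map.Entry of field and locale to mirror the lambda quoted earlier.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: names and types are assumptions, not the JDK's internal code.
final class OptimisticComputeCache {
    private static final ConcurrentMap<Map.Entry<String, Locale>, String> CACHE =
            new ConcurrentHashMap<>();
    /** Counts factory invocations so the caching effect is observable. */
    static final AtomicInteger CREATED = new AtomicInteger();

    static String createStore(String field, Locale locale) {
        CREATED.incrementAndGet();
        return field + "@" + locale.toLanguageTag();
    }

    static String findStore(String field, Locale locale) {
        Map.Entry<String, Locale> key = Map.entry(field, locale);
        String store = CACHE.get(key);            // lock-free fast path
        if (store == null) {
            // Rare slow path: atomic create-and-store, factory runs at most once per key.
            store = CACHE.computeIfAbsent(key, e -> createStore(e.getKey(), e.getValue()));
        }
        return store;
    }

    public static void main(String[] args) {
        findStore("DayOfWeek", Locale.US);
        findStore("DayOfWeek", Locale.US);
        System.out.println(CREATED.get());        // 1: the second call never reaches the factory
    }
}
```

Compared with using the result of putIfAbsent, this variant never creates a value that is then discarded, at the cost of holding the map's internal lock for that key while the factory runs.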
Thanks for your observations. I think Archie's analysis sounds right, although I have not confirmed it. Will use the result from putIfAbsent() for all cases.
Looks good to me.
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2012, 2023, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2012, 2024, Oracle and/or its affiliates. All rights reserved.
Shouldn't this be 2025 too?
This PR was published last year and ZoneOffset has not changed since then. So I think 2024 is fine.
Thank you for the reviews!
Going to push as commit 9a60f44.
Your commit was automatically rebased without conflicts.
The change made in JDK-8288723 seems innocuous, but it caused this performance regression. This PR partially reverts that change (the parts that involve computeIfAbsent()) to the original code. Provided a benchmark that calls ZoneOffset.ofTotalSeconds(0) 1,000 times, which shows the operation time improving from 3,946ns to 2,241ns.
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22854/head:pull/22854
$ git checkout pull/22854
Update a local copy of the PR:
$ git checkout pull/22854
$ git pull https://git.openjdk.org/jdk.git pull/22854/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 22854
View PR using the GUI difftool:
$ git pr show -t 22854
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22854.diff