Lazy access minor speedup #1577
Conversation
@@ -44,8 +44,6 @@
// read http://javarevisited.blogspot.de/2014/05/double-checked-locking-on-singleton-in-java.html
private transient volatile Supplier<? extends T> supplier;

// does not need to be volatile, visibility piggy-backs on volatile read of `supplier`
The comments are kept in the history if somebody is interested; otherwise, in my opinion, they're just a distraction.
Maybe this comment is crap, but some lines of Lazy (including this one) definitely need to be commented.
I'm also no friend of blowing up concise code with blah, blah. But something important is happening behind the curtain here. We need to mention that non-volatile instance variables are synced to main memory when volatile instance variables are updated. Our code strongly relies on this property.
What does volatile do?
(...) Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
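To make the quoted rule concrete, here is a minimal, self-contained sketch (not from the PR) of visibility piggy-backing on a volatile field:

```java
// Minimal sketch of visibility piggy-backing on a volatile field (illustrative only).
final class PiggyBack {

    private int payload;                // plain (non-volatile) field
    private volatile boolean published; // volatile "publication flag"

    void writer() {
        payload = 42;                   // plain write...
        published = true;               // ...published by the volatile write (monitor-release-like)
    }

    void reader() {
        if (published) {                 // volatile read (monitor-acquire-like) observing the write above
            System.out.println(payload); // guaranteed to print 42, not a stale 0
        }
    }
}
```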
Now I got what the comment meant; I thought it had historical significance only.
You're right, some comment IS needed here :)
// using a local var speeds up the double-check idiom by 25%, see Effective Java, Item 71
Supplier<? extends T> tmp = supplier;
if (tmp != null) {
if ((value == null) && (supplier != null)) { |
Checking a non-volatile variable is faster. Since the actual `value` can also be null (but the `supplier` cannot), we can short-circuit by checking it first.
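For context, here is a sketch of the whole fast path this diff is heading towards, reconstructed under the assumption of the usual double-checked locking body; later comments in this thread argue that returning `value` without a volatile read of `supplier` on every path is an unsafe publication, so treat it as the pattern under discussion, not a recommendation:

```java
import java.util.function.Supplier;

// Reconstruction of the fast-path pattern discussed here (field names follow the diff above).
final class LazySketch<T> {

    private transient volatile Supplier<? extends T> supplier;
    private T value; // not volatile; visibility is meant to piggy-back on `supplier`

    LazySketch(Supplier<? extends T> supplier) {
        this.supplier = supplier;
    }

    public T get() {
        if ((value == null) && (supplier != null)) { // cheap plain read short-circuits the volatile read
            synchronized (this) {
                if (supplier != null) {
                    value = supplier.get();
                    supplier = null; // volatile write: release
                }
            }
        }
        return value;
    }
}
```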
Wonderful idea! I have two remarks:
1) `supplier.get()` may return null
Our value has two possible null states: null as undefined and null as the initialized state. If it is initialized with null, the get() calls will be slower than in #1576.
2) Two reads from main memory within the synchronized block
Because `supplier` is volatile, all occurrences will be read from main memory, in particular:
- if (supplier != null) { ... }
- value = supplier.get();
The following solution solves 1) and 2) by fusing the ideas of #1577 with those of #1576 and additionally adding an UNDEFINED object:
private static final Object UNDEFINED = new Object();

private transient volatile Supplier<? extends T> supplier;

@SuppressWarnings("unchecked")
private T value = (T) UNDEFINED;

public T get() {
    if (value == UNDEFINED && this.supplier != null) {
        synchronized (this) {
            final Supplier<? extends T> supplier = this.supplier;
            if (supplier != null) {
                value = supplier.get();
                this.supplier = null;
            }
        }
    }
    return value;
}
I wonder if the isEvaluated method could now just use `value` like this:

public boolean isEvaluated() {
    // fast read, memory synchronized with volatile write of supplier
    return value != UNDEFINED;
}

instead of

public boolean isEvaluated() {
    // slow read from main memory
    return supplier == null;
}
IMPORTANT UPDATE
When I interpret this right:

anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.

then the check `value == UNDEFINED` may not reflect the actual main memory state.
Example:
- Thread A calls get(), acquires the lock and writes `value` and `supplier` (but is still in the synchronized block)
- Thread B calls get() and checks `value == UNDEFINED`. Because Thread B did not read `supplier`, the initialized `value` does not become visible to Thread B. Then `supplier != null` is checked, which is false because Thread A set it to null. Because `supplier` was read, the initialized `value` is now visible to Thread B and the `return value` makes sense.

If the above cannot be proved to be wrong then, for the sake of correctness and safety, we should remove the first `value == UNDEFINED` check completely again. What remains is this:
private transient volatile Supplier<? extends T> supplier;

private T value;

public T get() {
    if (this.supplier != null) {
        synchronized (this) {
            final Supplier<? extends T> supplier = this.supplier;
            if (supplier != null) {
                value = supplier.get();
                this.supplier = null;
            }
        }
    }
    return value;
}
This is exactly the solution proposed in #1576.
Also `isEvaluated()` stays as is.
I would keep the volatile check before assuming `value` is correctly initialized (otherwise I think it is possible that it points to a "not finished" object).
I am also currently benchmarking lazy-init schemes, and I found that keeping the `get` method as short as possible is key to triggering inlining and the associated speedup. So the following may be preferred:
@Override
public T get() {
    return supplier == null
            ? value
            : computeValue();
    //      : computeValue_withLocalVar(); (is it faster?)
}

private T computeValue() {
    synchronized (this) {
        if (supplier != null) {
            value = supplier.get();
            supplier = null; // free mem
        }
    }
    return value;
}

private T computeValue_withLocalVar() {
    T v;
    // using a local var speeds up the double-check idiom by 25% (maybe??), see Effective Java, Item 71
    synchronized (this) {
        Supplier<? extends T> tmp = supplier;
        if (tmp != null) {
            value = v = tmp.get();
            supplier = null; // free mem
        } else {
            v = value;
        }
    }
    return v;
}
I could not get @paplorinc's benchmark working, so this still needs to be validated...
Hi @jbgi, inlining by keeping it small is a great idea!!
I also think that the read of `value` is dirty.
The local var `T v` most probably isn't needed because `value` isn't volatile and reads are cheap.
I will further play around with different solutions....
In other words, your version is not deterministic and may produce unexpected results.
Worse than that: reading a naked `value` is reading via the race. There are no guarantees that something you have written into the value's fields themselves is visible to the readers of the value reference. You should read `supplier` first to get the "acquire" part.
@shipilev In our case that might be ok. We first read `value` because it is a fast read. We then have three cases:
- the value was not written by any thread yet
- the read reflects the written value, i.e. it is visible to the thread
- the written value is not visible to the current thread

In case 1, `value` is null and will be computed. After that it is visible to the current thread and will be returned by `get()`.
In the second case the write to `value` is visible. Then we are finished and can safely return it.
In the third case the write to `value` is not visible; it seems to be still null. Then `supplier` is read and it turns out that we already computed the value. Because of the volatile read of `supplier`, our `value` is now ✨automagically✨ visible to the current thread and it can be safely returned by `get()`.
In other words, we use `value == null` as a shortcut, and we can detect not-yet-visible writes by having the volatile 'afterburner' read of `supplier`.
Benchmarks say that is faster in most cases.
In the second case the write to value is visible. Then we are finished and can safely return it.
DANGEROUSLY WRONG. Let me try to explain this again. You cannot safely return something you have read off the race; you have just constructed an unsafe publication. Even if you have observed the reference to `value`, you are not guaranteed to observe the transitive reads correctly; you have to piggyback on happens-before for this. And to piggyback on all paths, you have to read `supplier` on all paths.
In other words, with the `value == null` shortcut, you have just exposed your users to this (I am speculating about the syntax here, sorry, in a hurry):
class MyObject {
    int x;
    MyObject(int x) { this.x = x; }
}

static final Lazy<MyObject> lazy = Lazy.with(() -> new MyObject(42));
...
MyObject o = lazy.get(); // discover some other thread had already initialized
assert (o.x == 42);      // intermittently FAILS, because MyObject within Lazy was unsafely published
Or, to repeat myself from this thread:
It all boils down to a simple "safe publication" rule: one thread does "writes, writes, writes --> VOLATILE WRITE (release)", the other thread does "VOLATILE READ (acquire) THAT OBSERVES THE VOLATILE WRITE VALUE --> reads, reads, reads". No need to think deeply about the JMM if you follow this rule. Everything goes downhill if you start coming up with weird notions like "effectively volatile", racy shortcuts, etc.
See also this example: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-semi-sync
@shipilev thanks for clarifying it again and for being so patient with us 🙏🏼
The missing part we did not consider was the one with the transitive dependencies.
@paplorinc please then follow this solution. Further improvements can be made later if necessary. I want this to get merged so it can be added to the upcoming 2.0.4.
@danieldietrich, pushed a new version - though I'm not sure about speed or correctness anymore :p
FYI, `get` is small enough to be inlinable now, i.e. `javaslang.Lazy::get (19 bytes) inline (hot)`
@@ -516,7 +516,7 @@ default CharSeq toCharSeq() {
*/
@SuppressWarnings("unchecked")
@GwtIncompatible("reflection is not supported")
default T[] toJavaArray(Class<T> componentType) {
default T[] toJavaArray(Class<? super T> componentType) { |
@danieldietrich, is this correct?
Otherwise I should call `.toJavaArray(Lazy<Integer>.class)`, which isn't valid.
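To illustrate why `Class<? super T>` makes the call site valid, here is a self-contained sketch with a hypothetical container (`Box` and its `toJavaArray` are made-up names, not the actual Value API); the raw class literal is accepted because the raw type is a supertype of the parameterized type:

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical container demonstrating the Class<? super T> trick (made-up names).
final class Box<T> {

    private final List<T> elements = new ArrayList<>();

    void add(T element) {
        elements.add(element);
    }

    // With Class<T>, a caller holding a Box<Supplier<Integer>> would need the impossible
    // literal Supplier<Integer>.class. With Class<? super T>, the raw Supplier.class is
    // accepted, because the raw type Supplier is a supertype of Supplier<Integer>.
    @SuppressWarnings("unchecked")
    T[] toJavaArray(Class<? super T> componentType) {
        final T[] array = (T[]) Array.newInstance(componentType, elements.size());
        return elements.toArray(array);
    }

    public static void main(String[] args) {
        final Box<Supplier<Integer>> box = new Box<>();
        box.add(() -> 42);
        final Supplier<Integer>[] suppliers = box.toJavaArray(Supplier.class); // compiles
        System.out.println(suppliers[0].get()); // prints 42
    }
}
```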
This is...... awesome :)
It seems to solve the unsolvable problem of getting the class of a generic type. I've played around with it and couldn't construct an unsafe example (yet).
Please put this change into a separate PR; this little change is really worth it. I will pull it before this PR then.
done in #1578
Current coverage is 96.84% (diff: 100%)

@@             master     #1577   diff @@
=========================================
  Files            88        88
  Lines         10677     10674       -3
  Methods           0         0
  Messages          0         0
  Branches       1818      1818
=========================================
- Hits          10339     10337       -2
  Misses          194       194
+ Partials        144       143       -1
Wow, so many gems hidden in a small class like Lazy. Would love to see updated benchmark results after applying the code review comments.
Considering concurrent reads of
Sure :)
Indeed, but we're talking about
Yes, the same for
If I'm mistaken, could you please provide a test that fails?
@paplorinc
Not sure if I'm able to do it. It requires multiple threads and one thread that reads an outdated
I'm sure that I'm right here.
Will see what is possible...
assertThat(lazy.get()).isEqualTo(1);
}

@Test
public void shouldBeConsistentFromMultipleThreads() throws Exception {
    CompletableFuture.allOf(
@danieldietrich, I couldn't manage to make it work with slang `Future` (that fails for non-double check), only with `CompletableFuture`.
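For readers following along, here is a rough, hypothetical sketch of such a consistency check built on `CompletableFuture.allOf`; the actual test in the PR differs (it builds an array of Lazy instances, as the diff below shows), and the names and sizes here are made up:

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

import javaslang.Lazy;

// Sketch only: hammer the same Lazy from many threads; every get() must observe the same value.
public class LazyConsistencySketch {
    public static void main(String[] args) {
        final Lazy<Integer> lazy = Lazy.of(() -> 42);
        CompletableFuture.allOf(
                IntStream.range(0, 100)
                        .mapToObj(i -> CompletableFuture.runAsync(() -> {
                            if (lazy.get() != 42) {
                                throw new AssertionError("inconsistent value observed");
                            }
                        }))
                        .toArray(CompletableFuture[]::new)
        ).join();
    }
}
```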
final Lazy<Integer> lazy = Lazy.of(() -> i);
lazy.get();
return lazy;
}).toJavaList().toArray(new Lazy[0]); |
@danieldietrich, this is ugly :/
I agree. It will be a little less ugly if we add `T[] Value.toJavaArray(T[])` - the only safe solution :-/
See #1582
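For illustration, a minimal sketch of what such a `T[]`-accepting overload could look like, mirroring the `java.util.Collection.toArray(T[])` convention (hypothetical interface, not the actual Value API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a T[]-accepting overload: the caller passes an array (typically of
// length 0) whose reified component type is reused, so no Class literal for a generic
// element type is needed.
interface ValueSketch<T> extends Iterable<T> {

    default T[] toJavaArray(T[] array) {
        final List<T> list = new ArrayList<>();
        for (T element : this) {
            list.add(element);
        }
        return list.toArray(array); // the JDK grows the array reflectively if it is too small
    }
}
```

A caller could then write something like `lazies.toJavaArray(new Lazy[0])` instead of passing a `Class` literal, the same idiom `Collection.toArray` users already know.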
Very nice PR this one. We've learned so much :)
Alternative to #1576
Memoization (mutation / multithreading in general) performance is difficult to measure properly.
The benchmark compares simple eager access (`java_eager`) to `Lazy`'s first access (`slang_lazy`) and subsequent, initialized access (`slang_inited_lazy`):

Since running the same measurement multiple times would measure different things (the first access would memoize, the rest would return the result of the calculation), I created a small array of `Lazy` values (small enough to fit in the cache but big enough to make the array access cost dissolve) and measure consuming it, recreating it before every measurement to ensure uninitialized `Lazy` instances.
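A minimal JMH sketch of this setup (hypothetical: the benchmark name, the array size, and the exact javaslang API usage are illustrative, not taken from the actual `javaslang.control.LazyBenchmark`):

```java
import javaslang.Lazy;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class LazyAccessSketch {

    private static final int SIZE = 1000; // small enough for the cache, big enough to amortize array access (assumed)

    private Lazy<Integer>[] lazies;

    @SuppressWarnings("unchecked")
    @Setup(Level.Invocation) // recreate before every measurement so each Lazy is uninitialized
    public void setUp() {
        lazies = new Lazy[SIZE];
        for (int i = 0; i < SIZE; i++) {
            final int value = i;
            lazies[i] = Lazy.of(() -> value);
        }
    }

    @Benchmark
    public void slang_lazy_first_access(Blackhole bh) {
        for (Lazy<Integer> lazy : lazies) {
            bh.consume(lazy.get()); // the first access memoizes the supplier's result
        }
    }
}
```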
These are the results (please run them on your PC also via `javaslang.control.LazyBenchmark#main`):