
Lazy access minor speedup #1577

Merged
merged 1 commit into from
Sep 24, 2016

Conversation

@l0rinc (Contributor) commented Sep 19, 2016

Alternative to #1576
Memoization (and mutation / multi-threading in general) is difficult to benchmark properly.

The benchmark compares simple eager access (java_eager) to Lazy's first access (slang_lazy) and subsequent, initialized access (slang_inited_lazy):

Since running the same measurement multiple times would measure different things (the first access would memoize; the rest would return the cached result), I created a small array of Lazy values (small enough to fit in the cache, but big enough to make the array-access cost dissolve) and measured consuming it, recreating the array before every measurement to ensure the Lazy values are uninitialized.

These are the results (please also run them on your machine via javaslang.control.LazyBenchmark#main):

Before:
  java_eager/slang_inited_lazy 1.11×
  java_eager/slang_lazy        6.79×

After:
  java_eager/slang_inited_lazy 1.06×
  java_eager/slang_lazy        6.78×
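The measurement idea described above, recreating an array of uninitialized Lazy values before each pass so that every `get()` measured is a genuine first access, can be sketched roughly as follows. This is an illustrative stand-in, not the actual javaslang.control.LazyBenchmark code; the class and field names are made up:

```java
import java.util.function.Supplier;

public class LazyFirstAccessBench {
    // small enough to fit in cache, big enough to amortize the array-access cost
    static final int SIZE = 1024;

    // tiny stand-in for javaslang's Lazy (double-checked, supplier is volatile)
    static final class Lazy<T> {
        private volatile Supplier<? extends T> supplier;
        private T value;
        Lazy(Supplier<? extends T> supplier) { this.supplier = supplier; }
        T get() {
            if (supplier != null) {
                synchronized (this) {
                    final Supplier<? extends T> s = supplier;
                    if (s != null) {
                        value = s.get();
                        supplier = null; // free mem, marks "initialized"
                    }
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        long sum = 0;
        long elapsed = 0;
        for (int pass = 0; pass < 100; pass++) {
            // recreate the array before every pass so each get() is a first access
            @SuppressWarnings("unchecked")
            Lazy<Integer>[] lazies = new Lazy[SIZE];
            for (int i = 0; i < SIZE; i++) {
                final int v = i;
                lazies[i] = new Lazy<>(() -> v);
            }
            long t0 = System.nanoTime();
            for (Lazy<Integer> lazy : lazies) {
                sum += lazy.get(); // uninitialized access: memoizes on first call
            }
            elapsed += System.nanoTime() - t0;
        }
        // elapsed / (100.0 * SIZE) approximates the cost of one first access
        System.out.println(sum == 100L * (SIZE * (SIZE - 1) / 2));
    }
}
```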

@@ -44,8 +44,6 @@

// read http://javarevisited.blogspot.de/2014/05/double-checked-locking-on-singleton-in-java.html
private transient volatile Supplier<? extends T> supplier;

// does not need to be volatile, visibility piggy-backs on volatile read of `supplier`
@l0rinc (Contributor, Author) Sep 19, 2016

The comments are kept in history if somebody is interested; otherwise, in my opinion, they're just a distraction.

@danieldietrich (Contributor) Sep 19, 2016

Maybe this comment is crap but some lines of Lazy (including this) definitely need to be commented.

I'm also no friend of bloating concise code with chatter. But something important is happening behind the curtain here: we need to mention that non-volatile instance variables are synced to main memory when volatile instance variables are updated. Our code strongly relies on this property.

What does volatile do?

(...) Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.

- JSR 133 FAQ
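A minimal, self-contained illustration of the quoted guarantee (all names here are hypothetical): the plain write below piggybacks on the volatile write and becomes visible to any thread that subsequently reads the volatile field.

```java
public class PiggybackDemo {
    static int plainValue;           // NOT volatile
    static volatile boolean ready;   // volatile "carrier" field

    static void writer() {
        plainValue = 42; // plain write...
        ready = true;    // ...published by the volatile write (release)
    }

    static Integer reader() {
        if (ready) {           // volatile read (acquire)
            return plainValue; // guaranteed to see 42, piggybacking on `ready`
        }
        return null;           // writer has not published yet
    }

    public static void main(String[] args) throws InterruptedException {
        Thread w = new Thread(PiggybackDemo::writer);
        w.start();
        w.join();
        System.out.println(reader());
    }
}
```

This is exactly the pattern Lazy relies on: `value` plays the role of `plainValue` and the volatile `supplier` plays the role of `ready`.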

@l0rinc (Contributor, Author)

Now I got what the comment meant; I thought it had historical significance only.
You're right, some comment IS needed here :)

// using a local var speeds up the double-check idiom by 25%, see Effective Java, Item 71
Supplier<? extends T> tmp = supplier;
if (tmp != null) {
if ((value == null) && (supplier != null)) {
@l0rinc (Contributor, Author) Sep 19, 2016

Checking a non-volatile variable is faster.
Since the actual value can also be null (but the supplier cannot), we can short-circuit by checking the value first.

@danieldietrich (Contributor)

Wonderful idea! I have two remarks:

1) supplier.get() may return null

Our value has two possible null-states: null as undefined and null as initialized state. If it is initialized with null, the get() calls will be slower than in #1576.

2) two reads from main memory within the synchronized block

Because supplier is volatile, all occurrences will be read from main memory, in particular:

  • if (supplier != null) { ... }
  • value = supplier.get();

The following solution solves 1) and 2) by fusing the ideas of #1577 with those of #1576 and additionally adding an UNDEFINED object:

    private static final Object UNDEFINED = new Object();

    private transient volatile Supplier<? extends T> supplier;

    @SuppressWarnings("unchecked")
    private T value = (T) UNDEFINED;

    public T get() {
        if (value == UNDEFINED && this.supplier != null) {
            synchronized(this) {
                final Supplier<? extends T> supplier = this.supplier;
                if (supplier != null) {
                    value = supplier.get();
                    this.supplier = null;
                }
            }
        }
        return value;
    }

I ask myself if the isEvaluated method could now just use value like this:

public boolean isEvaluated() {
    // fast read, memory synchronized with volatile write of supplier
    return value != UNDEFINED;
}

instead of

public boolean isEvaluated() {
    // slow read from main memory
    return supplier == null;
}

@danieldietrich (Contributor) Sep 19, 2016

IMPORTANT UPDATE

If I interpret this right:

anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.

then the check value == UNDEFINED may not reflect the actual main memory state.

Example:

  1. Thread A calls get(), acquires the lock and writes value and supplier (but is still in the synchronized block)
  2. Thread B calls get() and checks value == UNDEFINED. Because Thread B did not read supplier, the initialized value does not become visible to Thread B. Then supplier != null is checked, which is false because Thread A set it to null. Because supplier was read, the initialized value is now visible to Thread B and the return value makes sense.

If the above cannot be proved to be wrong then, for the sake of correctness and safety, we should remove the first value == UNDEFINED check completely again. What remains is this:

    private transient volatile Supplier<? extends T> supplier;

    private T value;

    public T get() {
        if (this.supplier != null) {
            synchronized(this) {
                final Supplier<? extends T> supplier = this.supplier;
                if (supplier != null) {
                    value = supplier.get();
                    this.supplier = null;
                }
            }
        }
        return value;
    }

This is exactly the solution proposed in #1576.

Also isEvaluated() stays as is.

@jbgi

I would keep the volatile check before assuming value is correctly initialized (otherwise I think it is possible that it points to a "not finished" object).

I am also currently benchmarking lazy-init schemes, and I found that keeping the get method as short as possible is key to triggering inlining and the associated speedup. So the following may be preferred:

    @Override
    public T get() {
        return supplier == null
            ? value
            : computeValue();
            //: computeValue_withLocalVar(); (is it faster ?)
    }

    private T computeValue() {
        synchronized (this) {
            if (supplier != null) {
                value = supplier.get();
                supplier = null; // free mem
            }
        }
        return value;
    }

    private T computeValue_withLocalVar() {
        T v;
        // using a local var speeds up the double-check idiom by 25% (maybe??), see Effective Java, Item 71
        synchronized (this) {
            Supplier<? extends T> tmp = supplier;
            if (tmp != null) {
                value = v = tmp.get();
                supplier = null; // free mem
            } else {
                v = value;
            }
        }
        return v;
    }

I could not get @paplorinc's benchmark working, so this still needs to be validated...

@danieldietrich (Contributor)

Hi @jbgi, inlining by keeping it small is a great idea!!
I also think that the read of value is dirty.
The local var T v most probably isn't needed because value isn't volatile and reads are cheap.
I will further play around with different solutions....

@shipilev

In other words your version is not deterministic and may produce unexpected results.

Worse than that: reading a naked value is reading via the race. There are no guarantees that anything you have written into the value's fields is visible to readers of the value reference. You should read supplier first to get the "acquire" part.

@danieldietrich (Contributor)

@shipilev In our case that might be ok. We first read value because it is a fast read. We can have three cases then.

  1. value was not written by any thread yet
  2. the read reflects the written value, i.e. it is visible to the thread
  3. the written value is not visible to the current thread

In case 1. value is null and will be computed. After that it is visible to the current thread and will be returned by get().

In the second case the write to value is visible. Then we are finished and can safely return it.

In the third case the write to value is not visible, it seems to be still null. Then supplier is read and it turns out that we already computed the value. Because of the volatile read of supplier our value is now ✨automagically✨ visible to the current thread and it can be safely returned by get().

In other words we use value == null as a shortcut and we can detect not-visible writes by having the volatile 'afterburner' read of supplier.

Benchmarks say that it is faster in most cases.

@shipilev Sep 23, 2016

In the second case the write to value is visible. Then we are finished and can safely return it.

DANGEROUSLY WRONG. Let me try to explain this again. You cannot safely return something you have read off the race, you have just constructed the unsafe publication. Even if you have observed the reference to value, you are not guaranteed to observe the transitive reads correctly, you have to piggyback on happens-before for this. And to piggy-back on all paths, you have to read supplier on all paths.

In other words, with value == null shortcut, you have just exposed your users to this (I am speculating about the syntax here, sorry, in hurry):

  class MyObject {
    int x; 
    MyObject(int x) { this.x = x; }
  }

  static final Lazy<MyObject> lazy = Lazy.with(() -> new MyObject(42));
  ...
  MyObject o = lazy.get(); // discover some other thread had already initialized
  assert (o.x == 42); // intermittently FAILS, because MyObject within Lazy was unsafely published

Or, to repeat myself from this thread:

It all boils down to a simple "safe publication" rule: one thread does "writes, writes, writes --> VOLATILE WRITE (release)", the other thread does "VOLATILE READ (acquire) THAT OBSERVES THE VOLATILE WRITE VALUE --> reads, reads, reads". No need to think deeply about the JMM if you follow this rule. Everything goes downhill if you start coming up with weird notions like "effectively volatile", racy shortcuts, etc.

See also this example: https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#pitfall-semi-sync

@danieldietrich (Contributor)

@shipilev thanks for clarifying it again and for being so patient with us 🙏🏼
The missing part we did not consider was the one with the transitive dependencies.

@paplorinc please follow this solution then. Further improvements can be made later if necessary. I want this to get merged so it can go into the upcoming 2.0.4.

@l0rinc (Contributor, Author)

@danieldietrich, pushed a new version, though I'm not sure about speed or correctness anymore :p
FYI, get is small enough to be inlinable now, i.e. javaslang.Lazy::get (19 bytes) inline (hot)

@@ -516,7 +516,7 @@ default CharSeq toCharSeq() {
*/
@SuppressWarnings("unchecked")
@GwtIncompatible("reflection is not supported")
-default T[] toJavaArray(Class<T> componentType) {
+default T[] toJavaArray(Class<? super T> componentType) {
@l0rinc (Contributor, Author)

@danieldietrich, is this correct?
Otherwise I should call .toJavaArray(Lazy<Integer>.class), which isn't valid
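For illustration, here is a standalone sketch of why `Class<? super T>` accepts a raw class literal where `Class<T>` cannot: for `T = Lazy<Integer>` there is no `Lazy<Integer>.class` literal, but the raw type is a supertype of the parameterized type, so `Lazy.class` fits the `? super T` bound. The helper below is hypothetical, using `java.util.List` in place of Lazy:

```java
import java.lang.reflect.Array;
import java.util.List;

public class VarianceDemo {
    // hypothetical stand-in for Value.toJavaArray: the relaxed bound
    // Class<? super T> lets callers pass a raw class literal for a generic T
    @SuppressWarnings("unchecked")
    static <T> T[] toJavaArray(List<T> list, Class<? super T> componentType) {
        T[] array = (T[]) Array.newInstance(componentType, list.size());
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i);
        }
        return array;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = List.of(List.of(1), List.of(2));
        // List<Integer>.class is not valid Java, but List.class compiles
        // here because raw List is a supertype of List<Integer>
        List<Integer>[] arr = toJavaArray(lists, List.class);
        System.out.println(arr.length + " " + arr[1].get(0));
    }
}
```

With the original `Class<T>` signature, the call with `List.class` would not type-check against `T = List<Integer>`.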

@danieldietrich (Contributor)

This is...... awesome :)

It seems to solve the unsolvable problem of getting the class of a generic type. I've played around with it and couldn't construct an unsafe example (yet).

Please do this change to a separate PR, this little change is really worth it. I will pull it before this PR then.

@l0rinc (Contributor, Author)

done in #1578

@codecov-io commented Sep 19, 2016

Current coverage is 96.84% (diff: 100%)

Merging #1577 into master will increase coverage by <.01%

@@             master      #1577   diff @@
==========================================
  Files            88         88          
  Lines         10677      10674     -3   
  Methods           0          0          
  Messages          0          0          
  Branches       1818       1818          
==========================================
- Hits          10339      10337     -2   
  Misses          194        194          
+ Partials        144        143     -1   

Powered by Codecov. Last update 5a48c11...743cb07

@danieldietrich (Contributor)
Wow, so many gems hidden in a small class like Lazy.

Would love to see updated benchmark results of the applied code review comments.

@danieldietrich (Contributor)

Considering concurrent reads of value, the optimization may lead to inconsistent reads (see comment above). Therefore we should implement a safe solution.
Thx for the great benchmarks!

@l0rinc (Contributor, Author) commented Sep 20, 2016

Please do this change to a separate PR, this little change is really worth it.

Sure :)

the get() calls will be slower than in #1576.

Indeed, but we're talking about <5% here; no other class is this optimized, so I don't think this should be a criterion.

then the check value == UNDEFINED may not reflect the actual main memory state

Yes, the same holds for null values, but that's what the subsequent synchronized block is for: it checks a volatile field, guaranteeing correctness. Once all the values are out of the cache, the non-volatile field check will suffice, as it should only change once.

If I'm mistaken, could you please provide a test that fails?

@danieldietrich (Contributor)

@paplorinc

If I'm mistaken, could you please provide a test that fails?

Not sure if I'm able to do it. It requires multiple threads and one thread that reads an outdated value while checking at the same time that the value was updated.

I'm sure that I'm right here.

@danieldietrich (Contributor)

If I'm mistaken, could you please provide a test that fails?

Will see what is possible...

assertThat(lazy.get()).isEqualTo(1);
}

@Test
public void shouldBeConsistentFromMultipleThreads() throws Exception {
CompletableFuture.allOf(
@l0rinc (Contributor, Author)

@danieldietrich, I couldn't manage to make it work with slang's Future, only with CompletableFuture (which does fail for the non-double-checked version)
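A rough, self-contained sketch of such a CompletableFuture-based consistency test (the Lazy stand-in and all names are illustrative, not the actual javaslang test code): several tasks race on the first access of the same Lazy and must all observe the same memoized value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class LazyRaceTest {
    // tiny double-checked stand-in for javaslang's Lazy
    static final class Lazy<T> {
        private volatile Supplier<? extends T> supplier;
        private T value;
        Lazy(Supplier<? extends T> supplier) { this.supplier = supplier; }
        T get() {
            if (supplier != null) {
                synchronized (this) {
                    final Supplier<? extends T> s = supplier;
                    if (s != null) {
                        value = s.get();
                        supplier = null;
                    }
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        // computed at most once, whichever thread wins the race
        final Lazy<Double> lazy = new Lazy<>(Math::random);
        @SuppressWarnings("unchecked")
        CompletableFuture<Double>[] futures = new CompletableFuture[8];
        for (int i = 0; i < futures.length; i++) {
            futures[i] = CompletableFuture.supplyAsync(lazy::get);
        }
        CompletableFuture.allOf(futures).join();
        double first = futures[0].join();
        boolean consistent = true;
        for (CompletableFuture<Double> f : futures) {
            consistent &= (f.join() == first); // every task must see the same value
        }
        System.out.println(consistent);
    }
}
```

A broken (non-double-checked) Lazy can compute the supplier more than once under this kind of race, which is what makes Math::random a useful probe here.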

@l0rinc changed the title from "Lazy access speedup" to "Lazy access minor speedup" on Sep 21, 2016
final Lazy<Integer> lazy = Lazy.of(() -> i);
lazy.get();
return lazy;
}).toJavaList().toArray(new Lazy[0]);
@l0rinc (Contributor, Author)

@danieldietrich, this is ugly :/

@danieldietrich (Contributor)

I agree. It will be a little less ugly if we add T[] Value.toJavaArray(T[]) - the only safe solution :-/
See #1582

@danieldietrich (Contributor) left a comment

Very nice PR this one. We've learned so much :)


@danieldietrich danieldietrich merged commit 7b9e258 into vavr-io:master Sep 24, 2016
@l0rinc l0rinc deleted the Lazy branch October 7, 2016 16:55