Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified function memoization #1837

Merged
merged 1 commit into from
Jan 26, 2017
Merged

Simplified function memoization #1837

merged 1 commit into from
Jan 26, 2017

Conversation

l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Jan 24, 2017

Addresses #1762

  • nulls are avoided by always using a Tuple wrapper
  • concurrent modifications are allowed now, work on Java 9 also.

@@ -165,15 +165,6 @@ public void shouldLiftTryPartialFunction() {
assertThat(unknown.getCause().getMessage()).isEqualToIgnoringCase("Unknown MessageDigest not available");
}

private static final CheckedFunction2<Integer, Integer, Integer> recurrent1 = (i1, i2) -> i1 <= 0 ? i1 : CheckedFunction2Test.recurrent2.apply(i1 - 1, i2) + 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danieldietrich, @ruslansennov, how can we solve this properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-

Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR #513 shows that we should avoid ConcurrentHashMap
Also I believe we should keep recursive memoization if it possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ruslansennov, good find indeed.
Unfortunately however, using HashMap isn't safe either.
In Java 9 they added checks for concurrent modifications, i.e.
http://download.java.net/java/jdk9/docs/api/java/util/HashMap.html#computeIfAbsent-K-java.util.function.Function-

The mapping function should not modify this map during computation.

This method will, on a best-effort basis, throw a ConcurrentModificationException if it is detected that the mapping function modifies this map during computation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we should use non-single operation solution inside sync block (if.. contains.. etc), or I'm wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force pushed a new version, all tests seem to be passing now on Java 9 also (from Idea only, not yet from Maven).
Thanks for your help @ruslansennov, @zsolt-donca.
@danieldietrich, do you think we should add more tests here?

Copy link

@zsolt-donca zsolt-donca Jan 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned around

if (cache.containsKey(key)) {
    return cache.get(key);
} else {
    synchronized (cache) {
        ...
    }
}

It seems an awful lot to me like that previous discussion on Lazy that we've head with Aleksey Shipilev.

The key idea: the then branch above doesn't use any volatile variable, neither does it pass through a synchronized block, nor does cache have any volatile reads inside (it's a HashMap). Because no memory barrier is passed, it is unsafe. See Pitfall: Semi-Synchronized Is Fine.

Copy link
Contributor Author

@l0rinc l0rinc Jan 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't use any volatile variable

No, but it uses an effectively final variable, that's meant to avoid the need for local volatile ... (Shipilev would cry, seeing that we still don't understand this).
However, I don't trust HashMap in a non synchronized environment either ... maybe I should avoid the double locking and just put everything in a syncronized block :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I should avoid the double locking and just put everything in a syncronized block :/

it can be a good first step

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

@codecov-io
Copy link

codecov-io commented Jan 24, 2017

Current coverage is 97.62% (diff: 100%)

Merging #1837 into master will increase coverage by 0.12%

@@             master      #1837   diff @@
==========================================
  Files            87         87          
  Lines         11314      11252    -62   
  Methods           0          0          
  Messages          0          0          
  Branches       1875       1875          
==========================================
- Hits          11031      10985    -46   
+ Misses          169        153    -16   
  Partials        114        114          

Powered by Codecov. Last update 951e4e0...77e2b23

@l0rinc
Copy link
Contributor Author

l0rinc commented Jan 24, 2017

Another 0.12% codecov increase, we're almost at 98% :)
cc: @AlparSzabados

@l0rinc l0rinc changed the title RFC: Simplified function memoization Simplified function memoization Jan 24, 2017
""" else xs"""
final Object lock = new Object();
final ${im.getType("java.util.Map")}<Tuple$i<$generics>, R> cache = new ${im.getType("java.util.HashMap")}<>();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new WeakHashMap<>(); to avoid memory leaks?
@danieldietrich, @ruslansennov, @zsolt-donca?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new WeakHashMap<>(); to avoid memory leaks?

What about this scenario?

Function1<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> f;
Function1<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> mem = f.memoized();
Tuple2<Integer, Integer> keyInstance = Tuple.of(1, 2);
Tuple2<Integer, Integer> result = mem.apply(keyInstance);

        //
        //  hard work

keyInstance = null;

        ///  hard work
        ///

Tuple2<Integer, Integer> newInstance = Tuple.of(1, 2);
Tuple2<Integer, Integer> result2 = mem.apply(newInstance); // calculation again (?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a concern, as weak references can be eliminated at any time. The problem is even more problematic for FunctionN where N ≥ 2, as the key (tuple) is built internally, immediately discarding any other reference to it.

The above problem with the GC prematurely dropping the data could be eliminated by using soft references; for details, see this:

SoftReferences aren't required to behave any differently than WeakReferences, but in practice softly reachable objects are generally retained as long as memory is in plentiful supply. This makes them an excellent foundation for a cache, such as the image cache described above, since you can let the garbage collector worry about both how reachable the objects are (a strongly reachable object will never be removed from the cache) and how badly it needs the memory they are consuming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paplorinc @zsolt-donca I follow @ruslansennov that the function should not compute a value twice (imagine Math::random). Not sure how we can ensure that a memoized function in GC'ed if not referenced any more but one of its cached values is still in use 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think memory out of bounds exception is worse, but I'm ok with both :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge this version in. Memory leaks may also appear in other cases, e.g. when capturing the outer object in a lambda. Optimizations need to be done carefully. Actually we have a working version, which is great. Thx!

@@ -74,5 +75,17 @@ default boolean isMemoized() {
* Zero Abstract Method (ZAM) interface for marking functions as memoized using intersection types.
*/
interface Memoized {
static <T extends Tuple, R> R of(Map<T, R> cache, T key, Function1<T, R> tupled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good solution, we can do that. I currently have not a better solution for the synching problem. It is hard to test.

I'm not that familiar with proper cache invalidation (Weak/Soft-Refs) but we need to keep the tuples in the cache as long as the memoized function is referenced.

@danieldietrich
Copy link
Contributor

@paplorinc thank you, this is an important change toward Java 9!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants