-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplified function memoization #1837
Conversation
@@ -165,15 +165,6 @@ public void shouldLiftTryPartialFunction() { | |||
assertThat(unknown.getCause().getMessage()).isEqualToIgnoringCase("Unknown MessageDigest not available"); | |||
} | |||
|
|||
private static final CheckedFunction2<Integer, Integer, Integer> recurrent1 = (i1, i2) -> i1 <= 0 ? i1 : CheckedFunction2Test.recurrent2.apply(i1 - 1, i2) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danieldietrich, @ruslansennov, how can we solve this properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR #513 shows that we should avoid ConcurrentHashMap
Also I believe we should keep recursive memoization if it possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ruslansennov, good find indeed.
Unfortunately however, using HashMap
isn't safe either.
In Java 9
they added checks for concurrent modifications, i.e.
http://download.java.net/java/jdk9/docs/api/java/util/HashMap.html#computeIfAbsent-K-java.util.function.Function-
The mapping function should not modify this map during computation.
This method will, on a best-effort basis, throw a ConcurrentModificationException if it is detected that the mapping function modifies this map during computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we should use non-single operation solution inside sync block (if.. contains.. etc), or I'm wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force pushed a new version, all tests seem to be passing now on Java 9
also (from Idea only, not yet from Maven).
Thanks for your help @ruslansennov, @zsolt-donca.
@danieldietrich, do you think we should add more tests here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned around
if (cache.containsKey(key)) {
return cache.get(key);
} else {
synchronized (cache) {
...
}
}
It seems an awful lot to me like that previous discussion on Lazy
that we've head with Aleksey Shipilev.
The key idea: the then branch above doesn't use any volatile variable, neither does it pass through a synchronized
block, nor does cache
have any volatile reads inside (it's a HashMap
). Because no memory barrier is passed, it is unsafe. See Pitfall: Semi-Synchronized Is Fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't use any volatile variable
No, but it uses an effectively final variable, that's meant to avoid the need for local volatile ... (Shipilev would cry, seeing that we still don't understand this).
However, I don't trust HashMap
in a non synchronized environment either ... maybe I should avoid the double locking and just put everything in a syncronized
block :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I should avoid the double locking and just put everything in a syncronized block :/
it can be a good first step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
Current coverage is 97.62% (diff: 100%)@@ master #1837 diff @@
==========================================
Files 87 87
Lines 11314 11252 -62
Methods 0 0
Messages 0 0
Branches 1875 1875
==========================================
- Hits 11031 10985 -46
+ Misses 169 153 -16
Partials 114 114
|
Another |
""" else xs""" | ||
final Object lock = new Object(); | ||
final ${im.getType("java.util.Map")}<Tuple$i<$generics>, R> cache = new ${im.getType("java.util.HashMap")}<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new WeakHashMap<>();
to avoid memory leaks?
@danieldietrich, @ruslansennov, @zsolt-donca?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new WeakHashMap<>();
to avoid memory leaks?
What about this scenario?
Function1<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> f;
Function1<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> mem = f.memoized();
Tuple2<Integer, Integer> keyInstance = Tuple.of(1, 2);
Tuple2<Integer, Integer> result = mem.apply(keyInstance);
//
// hard work
keyInstance = null;
/// hard work
///
Tuple2<Integer, Integer> newInstance = Tuple.of(1, 2);
Tuple2<Integer, Integer> result2 = mem.apply(newInstance); // calculation again (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a concern, as weak references can be eliminated at any time. The problem is even more problematic for FunctionN
where N ≥ 2
, as the key (tuple) is built internally, immediately discarding any other reference to it.
The above problem with the GC prematurely dropping the data could be eliminated by using soft references; for details, see this:
SoftReferences aren't required to behave any differently than WeakReferences, but in practice softly reachable objects are generally retained as long as memory is in plentiful supply. This makes them an excellent foundation for a cache, such as the image cache described above, since you can let the garbage collector worry about both how reachable the objects are (a strongly reachable object will never be removed from the cache) and how badly it needs the memory they are consuming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paplorinc @zsolt-donca I follow @ruslansennov that the function should not compute a value twice (imagine Math::random). Not sure how we can ensure that a memoized function in GC'ed if not referenced any more but one of its cached values is still in use 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think memory out of bounds exception is worse, but I'm ok with both :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge this version in. Memory leaks may also appear in other cases, e.g. when capturing the outer object in a lambda. Optimizations need to be done carefully. Actually we have a working version, which is great. Thx!
@@ -74,5 +75,17 @@ default boolean isMemoized() { | |||
* Zero Abstract Method (ZAM) interface for marking functions as memoized using intersection types. | |||
*/ | |||
interface Memoized { | |||
static <T extends Tuple, R> R of(Map<T, R> cache, T key, Function1<T, R> tupled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good solution, we can do that. I currently have not a better solution for the synching problem. It is hard to test.
I'm not that familiar with proper cache invalidation (Weak/Soft-Refs) but we need to keep the tuples in the cache as long as the memoized function is referenced.
@paplorinc thank you, this is an important change toward Java 9! |
Addresses #1762
null
s are avoided by always using aTuple
wrapper