Iterate once to create two iterators in partition #2577
Stream<T> second = Stream.empty();
for (T t : this) {
    if (predicate.test(t)) {
        first = first.append(t);
As far as I remember, `Stream.append` is an O(n) operation, so the whole operation will be O(n^2).
I believe you should use two `java.util.ArrayList` objects to partition instead.
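A minimal sketch of that suggestion, assuming a plain `java.util.Iterator` as the source. The class `EagerPartitionSketch` and the `Halves` holder are hypothetical names used only for illustration; they are not Vavr types. Each element is visited once, the predicate runs once per element, and `ArrayList.add` is amortized O(1), so the whole pass is O(n):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class EagerPartitionSketch {

    /** Hypothetical holder for the two halves (Vavr would use a Tuple2). */
    static final class Halves<T> {
        final List<T> matching;
        final List<T> rest;

        Halves(List<T> matching, List<T> rest) {
            this.matching = matching;
            this.rest = rest;
        }
    }

    /** Consumes the source exactly once and tests each element exactly once. */
    static <T> Halves<T> partition(Iterator<T> source, Predicate<? super T> predicate) {
        final List<T> matching = new ArrayList<>();
        final List<T> rest = new ArrayList<>();
        while (source.hasNext()) {
            final T element = source.next();
            if (predicate.test(element)) {
                matching.add(element);
            } else {
                rest.add(element);
            }
        }
        return new Halves<>(matching, rest);
    }

    public static void main(String[] args) {
        final Halves<Integer> halves =
                partition(List.of(1, 2, 3, 4, 5).iterator(), n -> n % 2 == 0);
        System.out.println(halves.matching + " " + halves.rest); // [2, 4] [1, 3, 5]
    }
}
```

Note that this variant is eager: it cannot return before the source is exhausted, so it does not suit infinite sources.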
Thanks for the pointer, @ruslansennov. This should be fixed now.
final Iterator<T> first = that.iterator().filter(predicate);
final Iterator<T> second = that.iterator().filter(predicate.negate());
return Tuple.of(first, second);
final java.util.List<T> first = new ArrayList<>();
I do not know if it is an issue here, but this solution filters eagerly, whereas the previous solution filtered lazily.
If we would like to keep the same behaviour, one option is to create a memoized predicate and use it for filtering, e.g.
final Predicate<? super T> memoizedPredicate = Function1.of(predicate::test).memoized()::apply;
final Iterator<T> first = that.iterator().filter(memoizedPredicate);
final Iterator<T> second = that.iterator().filter(memoizedPredicate.negate());
I found that my solution with the memoized predicate will not work when a mutable object mutates into a value equal to another object that is still to be filtered. That other object is then not evaluated; its result is taken from the cache, and that result is wrong for it.
Example test:
final HashSet<MutableInteger> integers = HashSet.of(MutableInteger.of(1), MutableInteger.of(2));
final Tuple2<HashSet<MutableInteger>, HashSet<MutableInteger>> partition = integers.partition(mutableInteger -> mutableInteger.incrementAndCheckGreaterThan(2));
assertThat(partition._1).isEqualTo(HashSet.of(MutableInteger.of(3)));
assertThat(partition._2).isEqualTo(HashSet.of(MutableInteger.of(2)));
where `MutableInteger` is:
private static class MutableInteger {
private int integer;
...
boolean incrementAndCheckGreaterThan(int i) {
return ++integer > i;
}
...
}
In this case, the result of evaluating `MutableInteger.of(1).incrementAndCheckGreaterThan(2)` is put into the cache as the entry `MutableInteger.of(2) -> false`.
Then, when the next object from the set, `MutableInteger.of(2)`, is filtered, it is found in the cache and evaluates to `false`, whereas normally it would be evaluated as `MutableInteger.of(3) -> true`.
But the question remains whether we want to keep the partitioning lazy or not.
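The cache collision described above can be reproduced without Vavr. The sketch below is an assumption-laden stand-in: `memoize` is a hypothetical helper that caches by key equality and stores the result after evaluation, and `MutableInteger` is assumed to implement `equals` and `hashCode` on its current value (the original test elides those members). The `value()` accessor is added here only for inspection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

final class MemoizedPredicatePitfall {

    /** Mutable value; equality follows the current state (assumed, as in the test above). */
    static final class MutableInteger {
        private int integer;

        MutableInteger(int integer) {
            this.integer = integer;
        }

        boolean incrementAndCheckGreaterThan(int i) {
            return ++integer > i;
        }

        int value() {
            return integer;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableInteger && ((MutableInteger) o).integer == integer;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(integer);
        }
    }

    /** Hypothetical memoizer: looks up by key, evaluates on a miss, then stores the result. */
    static <T> Predicate<T> memoize(Predicate<T> predicate) {
        final Map<T, Boolean> cache = new HashMap<>();
        return t -> {
            Boolean cached = cache.get(t);
            if (cached == null) {
                cached = predicate.test(t); // may mutate t before it becomes a cache key
                cache.put(t, cached);
            }
            return cached;
        };
    }

    public static void main(String[] args) {
        final Predicate<MutableInteger> memoized = memoize(m -> m.incrementAndCheckGreaterThan(2));
        final MutableInteger one = new MutableInteger(1);
        final MutableInteger two = new MutableInteger(2);
        System.out.println(memoized.test(one)); // false: 1 becomes 2, cached under a key equal to 2
        System.out.println(memoized.test(two)); // false: stale cache hit, `two` is never incremented
        System.out.println(two.value());        // 2
    }
}
```

Without the cache, the second call would increment `two` to 3 and return `true`, which is what the test expects.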
This is a bit too technical for me... Maybe @danieldietrich has an answer to this.
@@ -1722,10 +1722,16 @@ public U next() {
if (!hasNext()) {
    return Tuple.of(empty(), empty());
} else {
    final Stream<T> that = Stream.ofAll(this);
There is also the `partition` method implemented by `Stream`, which behaves incorrectly too.
I do not know of any other places. Maybe @danieldietrich could point them out.
I was not aware of the problem with `Stream#partition`. Thanks for pointing it out! 🙇 I checked all data structures in VAVR: apart from `Stream`, all other implementations work correctly. As for `Stream` itself, I'm afraid I don't know how to fix it, and I'm not sure it's really a bug. In Scala 2.13.2, the `Stream` implementation applies the predicate twice (link), as shown below:
override def partition(p: A => Boolean): (Stream[A], Stream[A]) = (filter(p(_)), filterNot(p(_)))
If I modify your example in issue #2559 by changing `Set` to `Stream`, you will see the following result in Scala:
"Stream" should "partition correctly" in {
val fruitsToEat = Stream("apple", "banana")
val partition = fruitsToEat.partition(name => biteAndCheck(name))
partition._1 shouldEqual Stream()
partition._2 shouldEqual Stream() // not Stream("apple", "banana")
fruitsBeingEaten.get("apple").get.name shouldEqual "apple"
fruitsBeingEaten.get("apple").get.bites shouldEqual 2 // not 1
fruitsBeingEaten.get("banana").get.name shouldEqual "banana"
fruitsBeingEaten.get("banana").get.bites shouldEqual 2 // not 1
}
So I would say VAVR is aligned with Scala on `Stream`'s behavior. Also, my current approach (`ArrayList`) does not fit `Stream`'s requirements, because `Stream` is a lazy sequence of elements which may be infinitely long, and I don't know how to achieve that while applying the predicate only once... So my suggestion is to keep `Stream` as it is and avoid modifying it.
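To make the "predicate applied twice" behavior concrete, here is a small sketch that uses `java.util.stream` purely as a stand-in for a lazy sequence; it is not the Vavr or Scala `Stream`. The helper name `predicateCallsForTwoFilters` and the `startsWith("a")` predicate are hypothetical. Partitioning as a pair of independent filters over the same source tests every element twice, which matters as soon as the predicate has side effects:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class TwoFilterPartitionSketch {

    /** Partitions by filtering the same source twice and returns how often the predicate ran. */
    static int predicateCallsForTwoFilters(List<String> source) {
        final AtomicInteger calls = new AtomicInteger();
        final Predicate<String> counting = name -> {
            calls.incrementAndGet(); // side effect, comparable to "biting" the fruit
            return name.startsWith("a");
        };
        // Same shape as (filter(p), filterNot(p)): two independent passes over the source.
        final List<String> matching =
                source.stream().filter(counting).collect(Collectors.toList());
        final List<String> rest =
                source.stream().filter(counting.negate()).collect(Collectors.toList());
        System.out.println(matching + " / " + rest);
        return calls.get();
    }

    public static void main(String[] args) {
        // Two elements, two passes: the predicate runs 4 times.
        System.out.println(predicateCallsForTwoFilters(List.of("apple", "banana")));
    }
}
```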
> Stream is lazy sequence of elements which may be infinitely long

Oh sorry guys, I completely forgot about it :(
Thanks for your review @kefasb, please see my comments inline.
Hi @mincong-h, thank you for your PR!
I dived into Scala's source code and checked their strategy. I summed up my findings on the original issue.
I think we need to align our `Iterator.partition` implementation with Scala's. That way we will have the best of both worlds: laziness and iteration-once.
Additionally, we might also check whether there are Scala collections that override the lazy evaluation strategy and provide a strict iteration-once implementation. I will do that tomorrow.
If you are not sure what to do, we can further discuss on the original issue.
Thanks!
Daniel
Thank you, Daniel, for your comments (both here and in the issue). I will take a look this weekend.
Codecov Report
@@ Coverage Diff @@
## master #2577 +/- ##
============================================
+ Coverage 92.81% 92.85% +0.03%
- Complexity 5338 5340 +2
============================================
Files 89 89
Lines 12743 12743
Branches 1609 1611 +2
============================================
+ Hits 11828 11832 +4
+ Misses 727 723 -4
Partials 188 188
Codecov Report
@@ Coverage Diff @@
## master #2577 +/- ##
============================================
+ Coverage 92.81% 92.84% +0.02%
- Complexity 5338 5355 +17
============================================
Files 89 89
Lines 12743 12773 +30
Branches 1609 1622 +13
============================================
+ Hits 11828 11859 +31
+ Misses 727 723 -4
- Partials 188 191 +3
I added some comments to facilitate the review, @danieldietrich. I mainly have 3 concerns: correctness in concurrent situations; whether or not the `duplicate` method should be part of the public API; and whether you see room for improvement in the testing. `Iterator.duplicate` in Scala 2.13 was added without tests; see scala/scala#6578.
 *
 * @return a pair of iterators
 */
default Tuple2<Iterator<T>, Iterator<T>> duplicate() {
This method uses almost exactly the same logic as Iterator.duplicate in Scala 2.13
if (gap.isEmpty()) {
    ahead.set(this);
}
Two `Partner`s (duplicate iterators) may move forward at different speeds. The one moving faster is called "ahead", the other one is "behind". The distance between them is the "gap". When the gap becomes zero, both iterators have reached the same position, and we need to reset "ahead".
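The ahead/behind/gap bookkeeping can be sketched on top of `java.util.Iterator`. This is an illustration under assumptions rather than the code in this PR: `DuplicateIteratorSketch` and `Pair` are hypothetical names, the method is static and takes the source as a parameter to keep the example self-contained, elements are assumed to be non-null (`ArrayDeque` rejects `null`), and no thread-safety is attempted. The `AtomicReference` serves only as a mutable holder.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

final class DuplicateIteratorSketch {

    /** Hypothetical pair holder (Vavr would return a Tuple2). */
    static final class Pair<T> {
        final Iterator<T> first;
        final Iterator<T> second;

        Pair(Iterator<T> first, Iterator<T> second) {
            this.first = first;
            this.second = second;
        }
    }

    static <T> Pair<T> duplicate(Iterator<T> source) {
        // Elements the "ahead" partner has read but the "behind" partner has not.
        final Queue<T> gap = new ArrayDeque<>();
        // The partner currently in front; plain mutable holder, not used for concurrency.
        final AtomicReference<Iterator<T>> ahead = new AtomicReference<>();

        final class Partner implements Iterator<T> {
            @Override
            public boolean hasNext() {
                // The partner behind can still drain the gap; otherwise ask the source.
                return (this != ahead.get() && !gap.isEmpty()) || source.hasNext();
            }

            @Override
            public T next() {
                if (gap.isEmpty()) {
                    ahead.set(this); // both partners are level: the caller takes the lead
                }
                if (this == ahead.get()) {
                    final T element = source.next(); // the source is read exactly once
                    gap.add(element);                // remember it for the other partner
                    return element;
                }
                return gap.remove(); // replay what the leader already consumed
            }
        }
        return new Pair<>(new Partner(), new Partner());
    }

    public static void main(String[] args) {
        final Pair<Integer> pair = duplicate(List.of(1, 2, 3).iterator());
        System.out.println(pair.first.next());  // 1 (first is ahead, gap = [1])
        System.out.println(pair.second.next()); // 1 (replayed from the gap)
        System.out.println(pair.second.next()); // 2 (gap was empty, second takes the lead)
        System.out.println(pair.first.next());  // 2 (replayed from the gap)
    }
}
```

Either partner can run arbitrarily far ahead; the price is that the gap queue grows with the distance between the two.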
}

@Override
public synchronized T next() {
I don't like the `synchronized` keyword: it is an intrinsic lock and, from what I learnt from the book "Java Concurrency in Practice", it can cause performance problems. But I don't know a good way to encapsulate all the mutable state, `gap` and `ahead` here, and protect it from concurrent access. I would like to know if you have any suggestions. In Scala, the `duplicate` method is written as follows:
def next(): A = self.synchronized { ... }

@Override
public synchronized boolean hasNext() {
    return (this != ahead.get() && !gap.isEmpty()) || Iterator.this.hasNext();
Daniel, you suggested that I create this method in another interface called `IteratorModule`. But we need to use the "parent" iterator (not sure what the right word is here) via `Iterator.this`, and I'm not sure how to achieve that if the logic is moved to `IteratorModule`. The trade-off of declaring the method here is that it becomes a public API, while it is meant to be a utility method.
> But we need to use the "parent" iterator

The static `duplicate` method receives the parent `Iterator` as a parameter:
IteratorModule.duplicate(Iterator iter)
Ah yes 😂 👍
@mincong-h thank you! please find my answers to your 3 concerns below.
An Iterator is stateful. It does not offer thread-safety on its own. The user has two options in a concurrent setting:
Please remove the occurrences of synchronized. The
For now, I would hide it in the
For Iterators, Therefore, please remove The unit tests cannot use the It is great to see that we will get the |
@danieldietrich, thanks for your suggestions and comments, much appreciated! I addressed most of the comments; here is what is left.
I cannot remove
Vavr has it own implementation about
+1 :) |
0022fa7
to
4481b97
Compare
Hi Mincong,
thanks for applying the changes!
Of course you are absolutely right about the `AtomicReference` object holder and about `IterableAssert`. As long as `Iterator` does not implement `equals` and `hashCode`, we are free to use any of our assertion strategies. 👍
Special thanks for adding the issue test! 😊
Hi Daniel. Thanks for your review, I learnt a lot from it. I'm looking forward to contributing more in the future 😀
Hi Mincong, I'm also looking forward to your future contributions, it is fun to work with you! Currently I have to finish some heavy-duty tasks, such as restructuring the core Gradle project a bit and pulling some Git projects back into the core multi-module project. We need a clean version baseline in order to prevent versioning conflicts. Additionally, I will change Try, Option and Either according to the current 2.0.0 branch (which will be deleted afterwards, it is just an arbitrary name). Finally, after some polishing, we will be able to release Vavr 1.0. After that, I will focus on a rework of the web page and the documentation. I also expect that there will be many questions and migration issues, as well as several new feature requests. This is where new PRs come into play, but we will see. Thanks!
The future looks exciting and your plan makes sense. Thanks for doing this, Daniel! Do not hesitate to ping me if you need any help. I will try to grab some issues whenever I can as well. Best,
* Reproduce the problem
* Iterate once to create two iterators in partition
* Avoid using io.vavr.collection.Stream
* Test behavior of `partition` on different classes
* Test that Stream.partition() is lazy
* Create Iterator.duplicate() and add tests
* Change the implementation of Iterator.partition()
* Fix Set
* Fix Map
* Fix Multimap
* Move duplicate to IteratorModule
* Remove synchronized keyword
* Remove hashCode and equals
* Avoid using isEqualTo
* Remove redundant tests
Fix #2559