Iterate once to create two iterators in partition #2577

mincong-h · 2020-05-05T10:20:00Z

mincong-h · 2020-05-05T10:29:48Z

CI: https://travis-ci.org/github/vavr-io/vavr/builds/683313113

ruslansennov · 2020-05-05T12:36:26Z

src/main/java/io/vavr/collection/Iterator.java

+            Stream<T> second = Stream.empty();
+            for (T t : this) {
+                if (predicate.test(t)) {
+                    first = first.append(t);


As far as I remember, Stream.append is an O(n) operation, so the whole op will be O(n^2).
I believe you should use two java's ArrayList objects to partition.

Thanks for your pointer @ruslansennov . This should be fixed now.

kefasb · 2020-05-08T06:12:01Z

src/main/java/io/vavr/collection/Iterator.java

-            final Iterator<T> first = that.iterator().filter(predicate);
-            final Iterator<T> second = that.iterator().filter(predicate.negate());
-            return Tuple.of(first, second);
+            final java.util.List<T> first = new ArrayList<>();


I do not know if it is an issue here, but...
This solution is filtering eagerly and previous solution was filtering lazily.

If we would like to keep the same behaviour there is an option of creating a memoized predicate and then use it during filtering, e.g.

final Predicate<? super T> memoizedPredicate = Function1.of(predicate::test).memoized()::apply; final Iterator<T> first = that.iterator().filter(memoizedPredicate); final Iterator<T> second = that.iterator().filter(memoizedPredicate.negate());

I found that my solution with memoized predicate will not work in case when a mutable object mutates to the value that corresponds to another object to be filtered. Then another object is not evaluated but taken from cache with wrong result for it.
Example test:

final HashSet<MutableInteger> integers = HashSet.of(MutableInteger.of(1), MutableInteger.of(2)); final Tuple2<HashSet<MutableInteger>, HashSet<MutableInteger>> partition = integers.partition(mutableInteger -> mutableInteger.incrementAndCheckGreaterThan(2)); assertThat(partition._1).isEqualTo(HashSet.of(MutableInteger.of(3))); assertThat(partition._2).isEqualTo(HashSet.of(MutableInteger.of(2)));

Where

private static class MutableInteger { private int integer; ... boolean incrementAndCheckGreaterThan(int i) { return ++integer > i; } ... }

In this case the result of evaluation MutableInteger.of(1).incrementAndCheckGreaterThan(2) is put into cache with entry MutableInteger.of(2) -> false.
Then when the next object from set MutableInteger.of(2) is filtered and it is found in the cache and evaluated to false. While normally it would evaluate to MutableInteger.of(3) -> true.

But the question stays whether we want to keep the partitioning lazy or not.

This is a bit too technical for me... Maybe @danieldietrich can have an answer about it.

kefasb · 2020-05-08T07:51:58Z

src/main/java/io/vavr/collection/Iterator.java

@@ -1722,10 +1722,16 @@ public U next() {
        if (!hasNext()) {
            return Tuple.of(empty(), empty());
        } else {
-            final Stream<T> that = Stream.ofAll(this);


There is also partition method implemented by Stream which works wrongly too.

I do not know another places. Maybe @danieldietrich could point them out.

I was not aware of the problem of Stream#partition. Thanks for pointing it out! 🙇 I checked all data-structures in VAVR, apart from Stream, all other implementations work correctly. About the Stream itself, I'm afraid that I don't know how to fix it and I'm not sure if it's really a bug. In Scala 2.13.2, the Stream implementation uses the predicates twice (link) as below:

override def partition(p: A => Boolean): (Stream[A], Stream[A]) = (filter(p(_)), filterNot(p(_)))

If I modify your example in issue #2559 by changing Set into Stream, you will see result in Scala as follow:

"Stream" should "partition correctly" in { val fruitsToEat = Stream("apple", "banana") val partition = fruitsToEat.partition(name => biteAndCheck(name)) partition._1 shouldEqual Stream() partition._2 shouldEqual Stream() // not Stream("apple", "banana") fruitsBeingEaten.get("apple").get.name shouldEqual "apple" fruitsBeingEaten.get("apple").get.bites shouldEqual 2 // not 1 fruitsBeingEaten.get("banana").get.name shouldEqual "banana" fruitsBeingEaten.get("banana").get.bites shouldEqual 2 // not 1 }

So I would say the VAVR is aligned with Scala on Stream's behaviors. Also, my current approach (ArrayList) does not fit the Stream requirement, because Stream is lazy sequence of elements which may be infinitely long. So I don't know how to use only one predicate to achieve that... So my suggestion is let's keep stream as it is and avoid modifying it.

Stream is lazy sequence of elements which may be infinitely long

Oh sorry guys I completely forgot about it :(

mincong-h

Thanks for your review @kefasb, please see my comments inline.

mincong-h · 2020-05-08T15:23:47Z

src/main/java/io/vavr/collection/Iterator.java

@@ -1722,10 +1722,16 @@ public U next() {
        if (!hasNext()) {
            return Tuple.of(empty(), empty());
        } else {
-            final Stream<T> that = Stream.ofAll(this);


I was not aware of the problem of Stream#partition. Thanks for pointing it out! 🙇 I checked all data-structures in VAVR, apart from Stream, all other implementations work correctly. About the Stream itself, I'm afraid that I don't know how to fix it and I'm not sure if it's really a bug. In Scala 2.13.2, the Stream implementation uses the predicates twice (link) as below:

override def partition(p: A => Boolean): (Stream[A], Stream[A]) = (filter(p(_)), filterNot(p(_)))

If I modify your example in issue #2559 by changing Set into Stream, you will see result in Scala as follow:

"Stream" should "partition correctly" in { val fruitsToEat = Stream("apple", "banana") val partition = fruitsToEat.partition(name => biteAndCheck(name)) partition._1 shouldEqual Stream() partition._2 shouldEqual Stream() // not Stream("apple", "banana") fruitsBeingEaten.get("apple").get.name shouldEqual "apple" fruitsBeingEaten.get("apple").get.bites shouldEqual 2 // not 1 fruitsBeingEaten.get("banana").get.name shouldEqual "banana" fruitsBeingEaten.get("banana").get.bites shouldEqual 2 // not 1 }

So I would say the VAVR is aligned with Scala on Stream's behaviors. Also, my current approach (ArrayList) does not fit the Stream requirement, because Stream is lazy sequence of elements which may be infinitely long. So I don't know how to use only one predicate to achieve that... So my suggestion is let's keep stream as it is and avoid modifying it.

mincong-h · 2020-05-08T15:27:49Z

src/main/java/io/vavr/collection/Iterator.java

-            final Iterator<T> first = that.iterator().filter(predicate);
-            final Iterator<T> second = that.iterator().filter(predicate.negate());
-            return Tuple.of(first, second);
+            final java.util.List<T> first = new ArrayList<>();


This is a bit too technical for me... Maybe @danieldietrich can have an answer about it.

danieldietrich

Hi @mincong-h, thank you for your PR!

I dived into Scala's source code and checked their strategy. I summed up my findings on the original issue.

I think we need to align our Iterator.partition implementation to Scala. That way we will have the best of both worlds: laziness and iteration-once.

Additionally we might also check, if there are Scala collections that override the lazy evaluation strategy and provide a strict iteration-once implementation. I will do that tomorrow.

If you are not sure what to do, we can further discuss on the original issue.

Thanks!

Daniel

mincong-h · 2020-05-14T05:38:02Z

Thank you Daniel for your comments (both here and in the issue). I will take a look this weekend.

codecov-io · 2020-05-16T16:06:25Z

Codecov Report

Merging #2577 into master will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #2577      +/-   ##
============================================
+ Coverage     92.81%   92.85%   +0.03%     
- Complexity     5338     5340       +2     
============================================
  Files            89       89              
  Lines         12743    12743              
  Branches       1609     1611       +2     
============================================
+ Hits          11828    11832       +4     
+ Misses          727      723       -4     
  Partials        188      188

Impacted Files	Coverage Δ	Complexity Δ
src/main/java/io/vavr/collection/Iterator.java	`89.61% <100.00%> (+0.05%)`	`184.00 <0.00> (+2.00)`
src/main/java/io/vavr/collection/Map.java	`98.07% <0.00%> (+3.63%)`	`33.00% <0.00%> (ø%)`
src/main/java/io/vavr/collection/Seq.java	`93.75% <0.00%> (+3.75%)`	`40.00% <0.00%> (ø%)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b7a72c...42117bf. Read the comment docs.

codecov-commenter · 2020-05-21T05:25:57Z

Codecov Report

Merging #2577 into master will increase coverage by 0.02%.
The diff coverage is 95.74%.

@@             Coverage Diff              @@
##             master    #2577      +/-   ##
============================================
+ Coverage     92.81%   92.84%   +0.02%     
- Complexity     5338     5355      +17     
============================================
  Files            89       89              
  Lines         12743    12773      +30     
  Branches       1609     1622      +13     
============================================
+ Hits          11828    11859      +31     
+ Misses          727      723       -4     
- Partials        188      191       +3

Impacted Files	Coverage Δ	Complexity Δ
src/main/java/io/vavr/collection/Traversable.java	`96.65% <ø> (+0.08%)`	`108.00 <0.00> (+2.00)`
src/main/java/io/vavr/collection/Iterator.java	`89.57% <91.66%> (+0.01%)`	`183.00 <1.00> (+1.00)`
...main/java/io/vavr/collection/AbstractMultimap.java	`98.44% <100.00%> (+0.03%)`	`141.00 <0.00> (+2.00)`
src/main/java/io/vavr/collection/BitSet.java	`98.02% <100.00%> (-0.01%)`	`82.00 <0.00> (ø)`
src/main/java/io/vavr/collection/Collections.java	`94.73% <100.00%> (+0.14%)`	`139.00 <3.00> (+3.00)`
src/main/java/io/vavr/collection/HashSet.java	`95.72% <100.00%> (-0.04%)`	`140.00 <1.00> (ø)`
...rc/main/java/io/vavr/collection/LinkedHashSet.java	`96.58% <100.00%> (-0.03%)`	`145.00 <1.00> (ø)`
src/main/java/io/vavr/collection/Maps.java	`99.24% <100.00%> (+0.02%)`	`80.00 <0.00> (+2.00)`
src/main/java/io/vavr/collection/TreeSet.java	`97.85% <100.00%> (-0.02%)`	`166.00 <1.00> (-2.00)`
src/main/java/io/vavr/collection/Queue.java	`95.76% <0.00%> (-0.69%)`	`202.00% <0.00%> (ø%)`
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b7a72c...b75a1b6. Read the comment docs.

mincong-h

I added some comments to facilitate the review, @danieldietrich . I mainly have 3 concerns: the correctness in concurrent situation; whether or not the duplicate method should be part of the public API; whether you see improvement about testing, the Iteractor.duplicate in Scala 2.13 was added without tests... see scala/scala#6578

mincong-h · 2020-05-21T05:30:23Z

src/main/java/io/vavr/collection/Iterator.java

+     *
+     * @return a pair of iterators
+     */
+    default Tuple2<Iterator<T>, Iterator<T>> duplicate() {


This method uses almost exactly the same logic as Iterator.duplicate in Scala 2.13

mincong-h · 2020-05-21T05:35:19Z

src/main/java/io/vavr/collection/Iterator.java

+                if (gap.isEmpty()) {
+                    ahead.set(this);
+                }


Two Partners (duplicate iterators) may move forwards in different speed. The one moving faster is the one called "ahead", the other one is "behind". The distance between them is the "gap". When the gap becomes zero, it means that both iterators reached to the same position. We need to reset the "ahead".

mincong-h · 2020-05-21T05:46:02Z

src/main/java/io/vavr/collection/Iterator.java

+            }
+
+            @Override
+            public synchronized T next() {


I don't like the synchronized keyword, this is an intrinsic lock. From what I learnt from book "Java Concurrency in Practice", this can cause performance problem. But I don't know what is the good way to encapsulate all mutable states, gap and ahead here, and protect it from concurrent access. I would like to know if you have any suggestion about it. In Scala, the duplicate method is written as follows:

def next(): A = self.synchronized { ... }

mincong-h · 2020-05-21T05:52:28Z

src/main/java/io/vavr/collection/Iterator.java

+
+            @Override
+            public synchronized boolean hasNext() {
+                return (this != ahead.get() && !gap.isEmpty()) || Iterator.this.hasNext();


Daniel, you suggested me to create this method in another interface called IteratorModule. But we need to use the "parent" iterator (not sure what is the right word here) via Iterator.this. I'm not sure how to achieve that if the logic is moved to IteratorModule. The trade-off of declaring the method here is that it becomes a public API... while it is meant to be a utility method.

But we need to use the "parent" iterator

The static duplicate method receives the parent Iterator as parameter:

IteratorModule.duplicate(Iterator iter)

Ah yes 😂 👍

danieldietrich · 2020-05-21T23:13:38Z

@mincong-h thank you! please find my answers to your 3 concerns below.

the correctness in concurrent situation

An Iterator is stateful. It does not offer thread-safety on its own. The user has two options in a concurrent setting:

explicitly synchronize iterator access from the outside
alternatively use a persistent collection instead

Please remove the occurrences of synchronized. The AtomicReference also isn't needed.

whether or not the duplicate method should be part of the public API

For now, I would hide it in the IteratorModule you mentioned. We offer a static method:

public interface Iterator<T> {
    // ...
}

interface IteratorModule {
  static <T> Tuple2<Iterator<T>, Iterator<T>> duplicate(Iterator<T> iterator) {
    // ...
  }
}

whether you see improvement about testing

For Iterators, equals and hashCode aren't overwritten, we use Object equality because an Iterator may be infinite and there is no way to compare the elements. (Note: A Stream may also be infinite but we are able to defined equals and hashCode in a meaningful way because elements are persisted for later use).

Therefore, please remove hasSame, equals and hashCode from Partner.

The unit tests cannot use the isEqualTo method on Iterators. We need to first consume the elements, e.g. by storing them in a List or by joining them in a String. These then can be compared. Maybe the cheapest solution is to compare joined strings.

It is great to see that we will get the duplicate method :)

mincong-h · 2020-05-22T19:52:25Z

@danieldietrich , thanks for your suggestion and comments, really appreciated! I addressed most of the comments, here are something left.

Please remove the occurrences of synchronized. The AtomicReference also isn't needed.

I cannot remove AtomicReference, because I need a mutable object holder for holding the reference. It's not for being thread safe here. Assigning the reference to Iterator<T> ahead is not an option, because it fails the compilation with warning: "Variable 'ahead' is accessed from within innerclass, needs to be final or effectively final."

The unit tests cannot use the isEqualTo method on Iterators. We need to first consume the elements, e.g. by storing them in a List or by joining them in a String. These then can be compared. Maybe the cheapest solution is to compare joined strings.

Vavr has it own implementation about assertThat(Iterator) in IteratorTest as IterableAssert, which transforms the actual iterator and expected iterator into two lists and then compare them. So I think it's fine to use isEqualsTo. But just in case of something I didn't think about, I use what you suggested, i.e. joining them into a String and compare.

It is great to see that we will get the duplicate method :)

+1 :)

danieldietrich

Hi Mincong,
thanks for applying the changes!
Of course you are absolutely right about the AtomicReference object holder and about IterableAssert. As long as Iterator does not implement equals and hashCode we are free to use any of our assertion strategies. 👍
A special thank that you added the issue test! 😊

mincong-h · 2020-05-24T06:02:07Z

Hi Daniel. Thanks for your review, I learnt a lot from them. I'm looking forwards to contributing more in the future 😀

danieldietrich · 2020-05-24T13:33:39Z

Hi Mincong, I‘m also looking forward to your future contributions, it is fun to work with you!

Currently I have to finish some heavy duty tasks, such as restructuring the core Gradle project a bit and pulling some Git projects back into the core multi-module project. We need a clean version baseline in order to prevent versioning conflicts.

Additionally, I will change Try, Option and Either accordingly to the current 2.0.0 branch (which will be deleted afterwards, it is just an arbitrary name).

Finally, after some polishing, we will be able to release Vavr 1.0. After that, I will focus on a rework of the web page and the documentation. I expect also that there will be many questions and migration issues.

Additionally, I expect several new feature requests. Here come new PRs into play but we will see.

Thanks!

mincong-h · 2020-05-24T16:51:07Z

The future looks exciting and your plan makes sens. Thanks for doing this Daniel! Do not hesitate to ping me if you need any help. I will try to grab some issues whenever I can as well.

Best,
Mincong

* Reproduce the problem * Iterate once to create two iterators in partition * Avoid using io.vavr.collection.Stream * Test behavior of `partition` on different classes * Test that Stream.partition() is lazy * Create Iterator.duplicate() and add tests * Change the implementation of Iterator.partition() * Fix Set * Fix Map * Fix Multimap * Move duplicate to IteratorModule * Remove synchronized keyword * Remove hashCode and equals * Avoid using isEqualTo * Remove redundant tests

mincong-h added 2 commits May 5, 2020 10:34

Reproduce the problem

e305cb2

Iterate once to create two iterators in partition

1d25766

mincong-h requested a review from danieldietrich as a code owner May 5, 2020 10:20

mincong-h mentioned this pull request May 5, 2020

partition method seems to not work correctly, comparing to scala #2559

Closed

ruslansennov reviewed May 5, 2020

View reviewed changes

Avoid using io.vavr.collection.Stream

7c52962

kefasb reviewed May 8, 2020

View reviewed changes

kefasb suggested changes May 8, 2020

View reviewed changes

mincong-h commented May 8, 2020

View reviewed changes

danieldietrich suggested changes May 13, 2020

View reviewed changes

danieldietrich added design/refactoring/improvement «vavr-collection» labels May 15, 2020

danieldietrich added this to the v1.0.0 milestone May 15, 2020

Test behavior of partition on different classes

42117bf

mincong-h added 6 commits May 17, 2020 14:25

Test that Stream.partition() is lazy

5829a64

Create Iterator.duplicate() and add tests

10ff3d4

Change the implementation of Iterator.partition()

7316baa

Fix Set

0dce6b9

Fix Map

d486446

Fix Multimap

b75a1b6

mincong-h commented May 21, 2020

View reviewed changes

mincong-h added 4 commits May 22, 2020 20:59

Move duplicate to IteratorModule

64be4e4

Remove synchronized keyword

87e5bab

Remove hashCode and equals

17c6f4c

Avoid using isEqualTo

ba54ca5

Remove redundant tests

4481b97

mincong-h force-pushed the issue-2559-partition branch from 0022fa7 to 4481b97 Compare May 22, 2020 20:07

danieldietrich approved these changes May 23, 2020

View reviewed changes

danieldietrich merged commit 1c90106 into vavr-io:master May 23, 2020

mincong-h deleted the issue-2559-partition branch May 24, 2020 06:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterate once to create two iterators in partition #2577

Iterate once to create two iterators in partition #2577

mincong-h commented May 5, 2020

mincong-h commented May 5, 2020

ruslansennov May 5, 2020

mincong-h May 5, 2020

kefasb May 8, 2020

kefasb May 8, 2020 •

edited

Loading

mincong-h May 8, 2020

kefasb May 8, 2020 •

edited

Loading

mincong-h May 8, 2020

ruslansennov May 8, 2020

mincong-h left a comment

mincong-h May 8, 2020

mincong-h May 8, 2020

danieldietrich left a comment

mincong-h commented May 14, 2020

codecov-io commented May 16, 2020

codecov-commenter commented May 21, 2020

mincong-h left a comment

mincong-h May 21, 2020

mincong-h May 21, 2020

mincong-h May 21, 2020

mincong-h May 21, 2020 •

edited

Loading

danieldietrich May 21, 2020

mincong-h May 22, 2020

danieldietrich commented May 21, 2020 •

edited

Loading

mincong-h commented May 22, 2020

danieldietrich left a comment

mincong-h commented May 24, 2020

danieldietrich commented May 24, 2020

mincong-h commented May 24, 2020

Iterate once to create two iterators in partition #2577

Iterate once to create two iterators in partition #2577

Conversation

mincong-h commented May 5, 2020

mincong-h commented May 5, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kefasb May 8, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kefasb May 8, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mincong-h left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danieldietrich left a comment

Choose a reason for hiding this comment

mincong-h commented May 14, 2020

codecov-io commented May 16, 2020

Codecov Report

codecov-commenter commented May 21, 2020

Codecov Report

mincong-h left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mincong-h May 21, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danieldietrich commented May 21, 2020 • edited Loading

mincong-h commented May 22, 2020

danieldietrich left a comment

Choose a reason for hiding this comment

mincong-h commented May 24, 2020

danieldietrich commented May 24, 2020

mincong-h commented May 24, 2020

kefasb May 8, 2020 •

edited

Loading

kefasb May 8, 2020 •

edited

Loading

mincong-h May 21, 2020 •

edited

Loading

danieldietrich commented May 21, 2020 •

edited

Loading