partition method seems to not work correctly, comparing to scala #2559

kefasb · 2020-01-31T11:32:45Z

Hello, I spotted a weird behaviour of partition method.
I include code examples from Java and Scala which I used to make the comparison.

Java

public class PartitionVavr {
    private static final Map<String, Eat> fruitsBeingEaten = new HashMap<>();

    public static void main(String[] args) {

        final Set<String> fruitsToEat = HashSet.of("apple", "banana");

        final var partition = fruitsToEat.partition(PartitionVavr::biteAndCheck);

        System.out.println(partition);
    }

    private static boolean biteAndCheck(String name) {
        final var eat = fruitsBeingEaten.getOrDefault(name, Eat.prepare(name)).bite();
        fruitsBeingEaten.put(name, eat);
        return eat.isEaten();
    }

    private static class Eat {
        final int bitesLimit = 2;
        final int bites;
        final String name;

        public static Eat prepare(String name) {
            return new Eat(0, name);
        }

        private Eat(int bites, String name) {
            this.bites = bites;
            this.name = name;
        }

        public Eat bite() {
            return new Eat(bites + 1, name);
        }

        public boolean isEaten() {
            return bites >= bitesLimit;
        }
    }
}

Scala

object PartitionScala extends App {
  private val fruitsBeingEaten: mutable.Map[String, Eat] = mutable.Map()

  private val fruitsToEat = Set("apple", "banana")

  private val partition = fruitsToEat.partition(name => biteAndCheck(name))

  println(partition)

  private def biteAndCheck(name: String): Boolean = {
    val eat = fruitsBeingEaten.getOrElse(name, Eat.prepare(name)).bite
    fruitsBeingEaten += (name -> eat)
    eat.isEaten
  }

  private object Eat {
    def prepare(name: String) = new Eat(0, name)
  }

  private class Eat private(val bites: Int, val name: String) {
    final private val bitesLimit = 2

    def bite = new Eat(bites + 1, name)

    def isEaten: Boolean = bites >= bitesLimit
  }
}

The Java output is:
(HashSet(), HashSet())

The Scala output is:
(Set(),Set(apple, banana))

I think the difference is because vavr traverses the collection and checks the predicate twice while scala does it once.

Is this a bug or expected behaviour?

The text was updated successfully, but these errors were encountered:

kefasb · 2020-01-31T11:48:39Z

I used vavr 0.10.2

danieldietrich · 2020-01-31T11:57:13Z

You are right! We need to change that.

Generally, we focus on immutable objects. Your case is very special, because the predicate also mutates the object on every call. Normally, you would expect that a predicate reflects a state without changing the state.

But I understand your concern, our goal is to align to Scala.

The affected code lines in HashSet are:

    @Override
    default Tuple2<Iterator<T>, Iterator<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        if (!hasNext()) {
            return Tuple.of(empty(), empty());
        } else {
            final Stream<T> that = Stream.ofAll(this);
            final Iterator<T> first = that.iterator().filter(predicate);
            final Iterator<T> second = that.iterator().filter(predicate.negate());
            return Tuple.of(first, second);
        }
    }

Here we filter twice. We should do it in one turn.

mincong-h · 2020-05-05T10:22:58Z

@kefasb, thanks for reporting and @danieldietrich, thanks for you hints. I sent PR #2577 to fix the problem.

danieldietrich · 2020-05-13T20:48:09Z

Thanks I will take a look!

danieldietrich · 2020-05-13T23:04:32Z

By default, Scala's Iterable type, the base of all collections, provides a default implementation of partition that lazily iterates all elements twice. It is based on views:

def partition(p: A => Boolean): (C, C) = {
  val first = new View.Filter(this, p, false)
  val second = new View.Filter(this, p, true)
  (fromSpecific(first), fromSpecific(second))
}

Scala has so-called strict and lazy collections. The docs state that strict collections provide their own implementation in order to ensure that elements are iterated only twice:

*  The default implementation provided here needs to traverse the collection twice.
*  Strict collections have an overridden version of `partition` in `StrictOptimizedIterableOps`,
*  which requires only a single traversal.

Because Scala's Stream is already lazy, the default implementation of partition is overridden, based of filter and filterNot:

override def partition(p: A => Boolean): (Stream[A], Stream[A]) = (filter(p(_)), filterNot(p(_)))

As pointed out by you, we already do the same because we align to Scala:

@Override
public final Tuple2<Stream<T>, Stream<T>> partition(Predicate<? super T> predicate) {
    Objects.requireNonNull(predicate, "predicate is null");
    return Tuple.of(filter(predicate), filter(predicate.negate()));
}

Scala's Iterator is also special, because its element can be iterated only once. Therefore it lazily duplicates the elements and then filters both resulting iterators, very much like we saw in Stream:

def partition(p: A => Boolean): (Iterator[A], Iterator[A]) = {
   val (a, b) = duplicate
  (a filter p, b filterNot p)
}

Scala's Iterator.duplicate method isn't trivial.

I think it is sufficient to take the following actions:

Align our Iterators partition method to Scala's partition strategy
Double-check other Scala implementations than Iterator and Stream for strictness in the way that they override the default Iterable.partition implementation. We should align to that behavior. Scala's LazyList is currently out of scope because we don't have it, yet.

mincong-h · 2020-05-16T16:52:32Z

@danieldietrich , I'm a bit lost and I need your help on this issue. So far, I created some tests to compare the behavior between Scala 2.13.2 and VAVR 1.0.0-SNAPSHOT, but I don't know which direction to continue.

On Scala side, I created a Git repo and added a test Issue2559Spec.scala to demonstrate the behavior of different Scala classes in Scala 2.13.2. I never used Scala myself and maybe more test-cases need to be added. From that file, we can see that:

All classes with trait StrictOptimizedIterableOps, predicate is used exactly N times for partitioning the values, where N is the size of the values.
Array, Steam, LazyList are not strict. In Array, predicate is used N times as well. In Stream and LazyList, predicate is used N * 2 times. Array uses N times because the partition method is overridden by ArrayOps.scala (source code).

On VAVR side, I added a test in my PR as commit 42117bf to demonstrate the behavior of partition() on different final classes. Predicate is used exactly N times for all the classes, except Stream, where it is used N * 2 times. So the behavior is similar to Scala.

Several things confuse me here:

How to test the laziness? The laziness of partition() in particular. In my PR, I changed the implementation of Iterator, but none of the tests failed, I don't know what is expected.
You said "Align our Iterators partition method to Scala's partition strategy", then where should I put the optimized, one-iteration implementation of partition? Should we create StrictOptimizedIterableOps? This is not trivial...

Personally I would like to use the one-iteration implementation as the default partition strategy. And override this strategy in Stream (and LazyList later on). It means keeping my PR #2577 as it is. But the biggest concerns I have are the laziness and the partition-strategy alignment as stated above.

danieldietrich · 2020-05-17T15:26:12Z

Thank you for your analysis! That's interesting. We do it as follows:

How to test the laziness?

We could create an infinite Stream of 0, 1, 0, 1, 0, 1, ... and partition it. If partition isn't lazy, the test would run forever, which is ok. We attribute that test with @Test(timeout=5000) or something similar.

Should we create StrictOptimizedIterableOps?

No, I see currently no use-case. We already have Traversable.isLazy(), which means: lazy := !strict. So we need no special interface in order to reflect the strictness property.

where should I put the optimized, one-iteration implementation of partition?

Sorry, I misunderstood the Scala implementation. There, the duplicate method does not solve the single iteration problem. It just ensures that the underlying Iterator can be iterated twice. We need that too, it will solve one task of #1945.

1st step: Add `duplicate` to `Iterator`

public interface Iterator<T> {
    @Override
    default Tuple2<Iterator<T>, Iterator<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        if (!hasNext()) {
            return Tuple.of(empty(), empty());
        } else {
            final Tuple<Iterator<T>, Iterator<T>> dup = duplicate();
            return Tuple.of(dup._1.filter(p), dup._2.filterNot(p));
        }
    }
}

interface IteratorModule {
    static <T> Tuple2<Iterator<T>, Iterator<T>> duplicate(Iterator<T> iter) {
        // see https://github.com/scala/scala/blob/v2.13.2/src/library/scala/collection/Iterator.scala#L839-L884
    }
}

Hint: We keep the existing Stream.partition() method.

2nd step: All non-lazy collections (= all other than Stream and Iterator) need to have a strict partition method.

Many already have it, for example Array:

    @Override
    public Tuple2<Array<T>, Array<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        final java.util.List<T> left = new ArrayList<>(), right = new ArrayList<>();
        for (T t : this) {
            (predicate.test(t) ? left : right).add(t);
        }
        return Tuple.of(ofAll(left), ofAll(right));
    }

I suggest to move that functionality to Collections.java:

// caution, this is from the top of my head
class Collections {
    static <C> Tuple2<C, C> partition(Iterable<T> iterable, Predicate<? super T> predicate, Function<Iterable<T>, C> ofAll) {
        Objects.requireNonNull(predicate, "predicate is null");
        final java.util.List<T> left = new ArrayList<>(), right = new ArrayList<>();
        for (T t : this) {
            (predicate.test(t) ? left : right).add(t);
        }
        return Tuple.of(ofAll(left), ofAll(right));
    }
}

Then Array and all other strict collections would be implemented like this:

class Array<T> {
    @Override
    public Tuple2<Array<T>, Array<T>> partition(Predicate<? super T> predicate) {
        return Collections.partition(this, Array::ofAll, predicate);
    }
}

I think this should also work for Maps:

class HashMap<K, V> {
    @Override
    public Tuple2<HashMap<K, V>, HashMap<K, V>> partition(Predicate<? super Tuple2<K, V>> predicate) {
        return Collections.partition(this, predicate, this::createFromEntries);
    }
}

If so, the partition method in the utility class Maps.java can be deleted.

Please don't hesitate to ask on any further questions!

Thanks for your help!

- Daniel

mincong-h · 2020-05-17T17:00:51Z

Thanks for your hints 👀 , Daniel! I will continue the work on it and ping you when I have progress. I'm not sure I can finish it today. If not, we may need to wait until next weekend.

danieldietrich · 2020-05-17T17:22:25Z

@mincong-h thanks! you don't need to rush

danieldietrich added this to the v1.0.0 milestone Jan 31, 2020

danieldietrich added bug «vavr-collection» labels Jan 31, 2020

mincong-h mentioned this issue May 5, 2020

Iterate once to create two iterators in partition #2577

Merged

danieldietrich assigned mincong-h May 16, 2020

danieldietrich closed this as completed in #2577 May 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

partition method seems to not work correctly, comparing to scala #2559

partition method seems to not work correctly, comparing to scala #2559

kefasb commented Jan 31, 2020

kefasb commented Jan 31, 2020

danieldietrich commented Jan 31, 2020

mincong-h commented May 5, 2020

danieldietrich commented May 13, 2020

danieldietrich commented May 13, 2020

mincong-h commented May 16, 2020

danieldietrich commented May 17, 2020

mincong-h commented May 17, 2020

danieldietrich commented May 17, 2020

partition method seems to not work correctly, comparing to scala #2559

partition method seems to not work correctly, comparing to scala #2559

Comments

kefasb commented Jan 31, 2020

kefasb commented Jan 31, 2020

danieldietrich commented Jan 31, 2020

mincong-h commented May 5, 2020

danieldietrich commented May 13, 2020

danieldietrich commented May 13, 2020

mincong-h commented May 16, 2020

danieldietrich commented May 17, 2020

1st step: Add duplicate to Iterator

2nd step: All non-lazy collections (= all other than Stream and Iterator) need to have a strict partition method.

mincong-h commented May 17, 2020

danieldietrich commented May 17, 2020

1st step: Add `duplicate` to `Iterator`