Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition method seems to not work correctly, comparing to scala #2559

Closed
kefasb opened this issue Jan 31, 2020 · 9 comments · Fixed by #2577
Closed

partition method seems to not work correctly, comparing to scala #2559

kefasb opened this issue Jan 31, 2020 · 9 comments · Fixed by #2577
Assignees
Milestone

Comments

@kefasb
Copy link

kefasb commented Jan 31, 2020

Hello, I spotted a weird behaviour of partition method.
I include code examples from Java and Scala which I used to make the comparison.

Java

public class PartitionVavr {
    private static final Map<String, Eat> fruitsBeingEaten = new HashMap<>();

    public static void main(String[] args) {

        final Set<String> fruitsToEat = HashSet.of("apple", "banana");

        final var partition = fruitsToEat.partition(PartitionVavr::biteAndCheck);

        System.out.println(partition);
    }

    private static boolean biteAndCheck(String name) {
        final var eat = fruitsBeingEaten.getOrDefault(name, Eat.prepare(name)).bite();
        fruitsBeingEaten.put(name, eat);
        return eat.isEaten();
    }

    private static class Eat {
        final int bitesLimit = 2;
        final int bites;
        final String name;

        public static Eat prepare(String name) {
            return new Eat(0, name);
        }

        private Eat(int bites, String name) {
            this.bites = bites;
            this.name = name;
        }

        public Eat bite() {
            return new Eat(bites + 1, name);
        }

        public boolean isEaten() {
            return bites >= bitesLimit;
        }
    }
}

Scala

object PartitionScala extends App {
  private val fruitsBeingEaten: mutable.Map[String, Eat] = mutable.Map()

  private val fruitsToEat = Set("apple", "banana")

  private val partition = fruitsToEat.partition(name => biteAndCheck(name))

  println(partition)

  private def biteAndCheck(name: String): Boolean = {
    val eat = fruitsBeingEaten.getOrElse(name, Eat.prepare(name)).bite
    fruitsBeingEaten += (name -> eat)
    eat.isEaten
  }

  private object Eat {
    def prepare(name: String) = new Eat(0, name)
  }

  private class Eat private(val bites: Int, val name: String) {
    final private val bitesLimit = 2

    def bite = new Eat(bites + 1, name)

    def isEaten: Boolean = bites >= bitesLimit
  }
}

The Java output is:
(HashSet(), HashSet())

The Scala output is:
(Set(),Set(apple, banana))

I think the difference is because vavr traverses the collection and checks the predicate twice while scala does it once.

Is this a bug or expected behaviour?

@kefasb
Copy link
Author

kefasb commented Jan 31, 2020

I used vavr 0.10.2

@danieldietrich
Copy link
Contributor

You are right! We need to change that.

Generally, we focus on immutable objects. Your case is very special, because the predicate also mutates the object on every call. Normally, you would expect that a predicate reflects a state without changing the state.

But I understand your concern, our goal is to align to Scala.

The affected code lines in HashSet are:

    @Override
    default Tuple2<Iterator<T>, Iterator<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        if (!hasNext()) {
            return Tuple.of(empty(), empty());
        } else {
            final Stream<T> that = Stream.ofAll(this);
            final Iterator<T> first = that.iterator().filter(predicate);
            final Iterator<T> second = that.iterator().filter(predicate.negate());
            return Tuple.of(first, second);
        }
    }

Here we filter twice. We should do it in one turn.

@mincong-h
Copy link
Member

@kefasb, thanks for reporting and @danieldietrich, thanks for you hints. I sent PR #2577 to fix the problem.

@danieldietrich
Copy link
Contributor

Thanks I will take a look!

@danieldietrich
Copy link
Contributor

By default, Scala's Iterable type, the base of all collections, provides a default implementation of partition that lazily iterates all elements twice. It is based on views:

def partition(p: A => Boolean): (C, C) = {
  val first = new View.Filter(this, p, false)
  val second = new View.Filter(this, p, true)
  (fromSpecific(first), fromSpecific(second))
}

Scala has so-called strict and lazy collections. The docs state that strict collections provide their own implementation in order to ensure that elements are iterated only twice:

*  The default implementation provided here needs to traverse the collection twice.
*  Strict collections have an overridden version of `partition` in `StrictOptimizedIterableOps`,
*  which requires only a single traversal.

Because Scala's Stream is already lazy, the default implementation of partition is overridden, based of filter and filterNot:

override def partition(p: A => Boolean): (Stream[A], Stream[A]) = (filter(p(_)), filterNot(p(_)))

As pointed out by you, we already do the same because we align to Scala:

@Override
public final Tuple2<Stream<T>, Stream<T>> partition(Predicate<? super T> predicate) {
    Objects.requireNonNull(predicate, "predicate is null");
    return Tuple.of(filter(predicate), filter(predicate.negate()));
}

Scala's Iterator is also special, because its element can be iterated only once. Therefore it lazily duplicates the elements and then filters both resulting iterators, very much like we saw in Stream:

def partition(p: A => Boolean): (Iterator[A], Iterator[A]) = {
   val (a, b) = duplicate
  (a filter p, b filterNot p)
}

Scala's Iterator.duplicate method isn't trivial.


I think it is sufficient to take the following actions:

  • Align our Iterators partition method to Scala's partition strategy
  • Double-check other Scala implementations than Iterator and Stream for strictness in the way that they override the default Iterable.partition implementation. We should align to that behavior. Scala's LazyList is currently out of scope because we don't have it, yet.

@mincong-h
Copy link
Member

@danieldietrich , I'm a bit lost and I need your help on this issue. So far, I created some tests to compare the behavior between Scala 2.13.2 and VAVR 1.0.0-SNAPSHOT, but I don't know which direction to continue.

On Scala side, I created a Git repo and added a test Issue2559Spec.scala to demonstrate the behavior of different Scala classes in Scala 2.13.2. I never used Scala myself and maybe more test-cases need to be added. From that file, we can see that:

  • All classes with trait StrictOptimizedIterableOps, predicate is used exactly N times for partitioning the values, where N is the size of the values.
  • Array, Steam, LazyList are not strict. In Array, predicate is used N times as well. In Stream and LazyList, predicate is used N * 2 times. Array uses N times because the partition method is overridden by ArrayOps.scala (source code).

On VAVR side, I added a test in my PR as commit 42117bf to demonstrate the behavior of partition() on different final classes. Predicate is used exactly N times for all the classes, except Stream, where it is used N * 2 times. So the behavior is similar to Scala.

Several things confuse me here:

  • How to test the laziness? The laziness of partition() in particular. In my PR, I changed the implementation of Iterator, but none of the tests failed, I don't know what is expected.
  • You said "Align our Iterators partition method to Scala's partition strategy", then where should I put the optimized, one-iteration implementation of partition? Should we create StrictOptimizedIterableOps? This is not trivial...

Personally I would like to use the one-iteration implementation as the default partition strategy. And override this strategy in Stream (and LazyList later on). It means keeping my PR #2577 as it is. But the biggest concerns I have are the laziness and the partition-strategy alignment as stated above.

@danieldietrich
Copy link
Contributor

Thank you for your analysis! That's interesting. We do it as follows:

How to test the laziness?

We could create an infinite Stream of 0, 1, 0, 1, 0, 1, ... and partition it. If partition isn't lazy, the test would run forever, which is ok. We attribute that test with @Test(timeout=5000) or something similar.

Should we create StrictOptimizedIterableOps?

No, I see currently no use-case. We already have Traversable.isLazy(), which means: lazy := !strict. So we need no special interface in order to reflect the strictness property.

where should I put the optimized, one-iteration implementation of partition?

Sorry, I misunderstood the Scala implementation. There, the duplicate method does not solve the single iteration problem. It just ensures that the underlying Iterator can be iterated twice. We need that too, it will solve one task of #1945.

1st step: Add duplicate to Iterator

public interface Iterator<T> {
    @Override
    default Tuple2<Iterator<T>, Iterator<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        if (!hasNext()) {
            return Tuple.of(empty(), empty());
        } else {
            final Tuple<Iterator<T>, Iterator<T>> dup = duplicate();
            return Tuple.of(dup._1.filter(p), dup._2.filterNot(p));
        }
    }
}

interface IteratorModule {
    static <T> Tuple2<Iterator<T>, Iterator<T>> duplicate(Iterator<T> iter) {
        // see https://github.com/scala/scala/blob/v2.13.2/src/library/scala/collection/Iterator.scala#L839-L884
    }
}

Hint: We keep the existing Stream.partition() method.

2nd step: All non-lazy collections (= all other than Stream and Iterator) need to have a strict partition method.

Many already have it, for example Array:

    @Override
    public Tuple2<Array<T>, Array<T>> partition(Predicate<? super T> predicate) {
        Objects.requireNonNull(predicate, "predicate is null");
        final java.util.List<T> left = new ArrayList<>(), right = new ArrayList<>();
        for (T t : this) {
            (predicate.test(t) ? left : right).add(t);
        }
        return Tuple.of(ofAll(left), ofAll(right));
    }

I suggest to move that functionality to Collections.java:

// caution, this is from the top of my head
class Collections {
    static <C> Tuple2<C, C> partition(Iterable<T> iterable, Predicate<? super T> predicate, Function<Iterable<T>, C> ofAll) {
        Objects.requireNonNull(predicate, "predicate is null");
        final java.util.List<T> left = new ArrayList<>(), right = new ArrayList<>();
        for (T t : this) {
            (predicate.test(t) ? left : right).add(t);
        }
        return Tuple.of(ofAll(left), ofAll(right));
    }
}

Then Array and all other strict collections would be implemented like this:

class Array<T> {
    @Override
    public Tuple2<Array<T>, Array<T>> partition(Predicate<? super T> predicate) {
        return Collections.partition(this, Array::ofAll, predicate);
    }
}

I think this should also work for Maps:

class HashMap<K, V> {
    @Override
    public Tuple2<HashMap<K, V>, HashMap<K, V>> partition(Predicate<? super Tuple2<K, V>> predicate) {
        return Collections.partition(this, predicate, this::createFromEntries);
    }
}

If so, the partition method in the utility class Maps.java can be deleted.

Please don't hesitate to ask on any further questions!

Thanks for your help!

- Daniel

@mincong-h
Copy link
Member

Thanks for your hints 👀 , Daniel! I will continue the work on it and ping you when I have progress. I'm not sure I can finish it today. If not, we may need to wait until next weekend.

@danieldietrich
Copy link
Contributor

@mincong-h thanks! you don't need to rush

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants