
Optimize Alternative (part 2): add prependK/appendK specializations for std containers #4052

Merged (1 commit) on Nov 27, 2021

Conversation

@satorg (Contributor) commented Nov 23, 2021

See initial PR #4014 for details.
Subsequent PRs:

  • add prependK/appendK specializations for cats.data collections.
  • add prependK/appendK specializations for all other cats containers.

This PR adds optimized specializations for the new prependK/appendK methods introduced in the initial PR.

@satorg (Contributor Author) commented Nov 23, 2021

It seems there are no laws or test suites for the Zip* wrappers, nor any docs for them. I'm wondering – are these wrappers used anywhere?

@armanbilge (Member)

You mean like this?

checkAll("ZipStream[Int]", CommutativeApplyTests[ZipStream].apply[Int, Int, Int])

@satorg (Contributor Author) commented Nov 23, 2021

Yeah, I didn't notice that test because it resides inside StreamSuite... But frankly speaking, it can hardly be called satisfactory.
There's only a single CommutativeApply test for ZipStream and nothing more, and the same goes for ZipLazyList. It doesn't make things clearer, unfortunately.

For example, this PR adds specializations for Alternative instances. Both ZipStream and ZipLazyList implement this typeclass (for some purpose, I'd guess), but it seems these implementations have never been tested.

Furthermore, both the ZipLazyList and ZipStream wrappers have zero documentation available. Is there any clue as to what they are for and what expectations they should meet?

@armanbilge (Member) commented Nov 23, 2021

Is there any clue as to what they are for and what expectations they should meet?

IIUC the Zip* newtypes are used to implement the "parallel" Apply instances for various collections.

As an analogy: suppose I want a semigroup for integers. Both + and * form valid semigroups. One way to resolve the ambiguity would be to define the canonical semigroup for Int using addition and create a newtype MultiplicativeInt(value: Int) whose canonical semigroup is defined in terms of multiplication.
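That analogy could be sketched like this (MultiplicativeInt is hypothetical, purely for illustration; only the additive Semigroup[Int] is what cats actually ships):

```scala
import cats.Semigroup

// The canonical Semigroup[Int] in cats is additive; this hypothetical
// newtype selects the multiplicative semigroup instead.
final case class MultiplicativeInt(value: Int)

object MultiplicativeInt {
  implicit val multiplicativeSemigroup: Semigroup[MultiplicativeInt] =
    Semigroup.instance((x, y) => MultiplicativeInt(x.value * y.value))
}

Semigroup[Int].combine(2, 3)                                              // 5
Semigroup[MultiplicativeInt].combine(MultiplicativeInt(2), MultiplicativeInt(3)).value // 6
```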

Similarly, we have List and ZipList to distinguish between the two possible implementations of Apply.

Apply[List].product(List(1, 2), List('a', 'b'))
// List((1,a), (1,b), (2,a), (2,b))

Apply[ZipList].product(ZipList(List(1, 2)), ZipList(List('a', 'b'))).value
// List((1,a), (2,b))

Try it out: https://scastie.scala-lang.org/zqPsMxHmTSe2qhIY4yIjyQ

This is exactly the basis for Parallel:

implicit def catsStdNonEmptyParallelForListZipList: NonEmptyParallel.Aux[List, ZipList] =
  new NonEmptyParallel[List] {
    type F[x] = ZipList[x]
    def flatMap: FlatMap[List] = cats.instances.list.catsStdInstancesForList
    def apply: Apply[ZipList] = ZipList.catsDataCommutativeApplyForZipList
    def sequential: ZipList ~> List =
      new (ZipList ~> List) { def apply[A](a: ZipList[A]): List[A] = a.value }
    def parallel: List ~> ZipList =
      new (List ~> ZipList) { def apply[A](v: List[A]): ZipList[A] = new ZipList(v) }
  }

johnynek previously approved these changes Nov 25, 2021

@johnynek (Contributor)

we should be testing that the Zip* Applicative instances are lawful. I'm surprised we aren't doing that. Similarly for the Alternative instances.

@armanbilge (Member)

we should be testing that the Zip* Applicative instances are lawful.

I wonder if we can add this to the laws for Parallel?

@johnynek (Contributor)

trait NonEmptyParallelLaws[M[_]] {

yeah, it seems we should be testing that the Apply instance and the FlatMap instance are lawful there, similarly for Parallel.

@johnynek (Contributor)

Parallel is kind of a weird typeclass... it's like: here are two type constructors; they have an isomorphic Functor and pure, but all bets are off when it comes to the Apply instances, and one is maybe "parallel" (but not necessarily, because cats.Id can have a Parallel instance with itself).

@armanbilge (Member) commented Nov 25, 2021

one is maybe "parallel" (but not necessarily, because cats.Id can have a Parallel instance with itself).

Just because Id is its own Parallel instance doesn't mean it's not "parallel", I think? For example, isn't Option its own Parallel as well? But that definition seems consistent with the Zip* parallel definitions for other collections, which I assume are "parallel".

@johnynek (Contributor)

I guess I just mean the typeclass is underdefined...

I think when we actually use it we almost always actually know the concrete type.

Like, try to write an interesting abstract function using just Parallel. We have parTraverse, which is virtually only called in an abstract context where we already know the type constructor is, for instance, some kind of effect-like type.

I think it is a bit of a false generality....

Like, you can use Parallel with Either because you know behind the scenes what it is going to do, but not because you write interesting abstract functions with Parallel.

Does that make sense?

@armanbilge (Member) commented Nov 25, 2021

Oh, it is absolutely underdefined. But I wonder if we can do better?

I feel like I have an intuitive understanding of what "parallel" means, and I wouldn't necessarily say it is a false generality. Something about how the "parallel" Applicative is one for which no consistent Monad exists, i.e., it lacks any notion of sequentiality. And I think from there you may be (?) able to draw some useful (general) inferences, e.g., that the effect does not "short-circuit".

But maybe I'm just babbling. Also not sure if this reasoning generalizes to NonEmptyParallel as well.

Edit: alas, I see my "no-consistent-Monad" reasoning has already excluded the Id and Option "parallels" 😆

@satorg (Contributor Author) commented Nov 26, 2021

I am struggling to understand why ZipLazyList implements Applicative#pure this way:

def pure[A](x: A): ZipLazyList[A] = new ZipLazyList(LazyList.continually(x))

It is definitely impossible to implement this way for strict collections like List or Vector. Does this mean that Parallel[List] is impossible in principle and we can only have NonEmptyParallel[List]?

@armanbilge (Member) commented Nov 26, 2021

@satorg I think you need that definition of pure to satisfy this law:

def applicativeIdentity[A](fa: F[A]): IsEq[F[A]] =
  F.pure((a: A) => a).ap(fa) <-> fa
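
As a plain-Scala sketch (modeling ZipLazyList's ap as zip-then-apply, not using the actual cats instance), the infinite pure is exactly what keeps this law from truncating fa:

```scala
val fa = LazyList(1, 2, 3)

// pure via continually: an infinite stream of the identity function
val pureId = LazyList.continually((a: Int) => a)

pureId.zip(fa).map { case (f, a) => f(a) }.toList
// List(1, 2, 3): fa comes back intact

// A finite pure of length 1 would truncate fa down to one element:
LazyList((a: Int) => a).zip(fa).map { case (f, a) => f(a) }.toList
// List(1)
```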

Makes sense about the strict collections 👍

@satorg (Contributor Author) commented Nov 26, 2021

Right, but I guess this law cannot be verified if pure is defined via LazyList#continually – is that correct?

Along with applicativeHomomorphism and applicativeInterchange...

@satorg (Contributor Author) commented Nov 26, 2021

FYI: it seems this behavior was first introduced in PR #1938 for ZipStream and then repeated for ZipLazyList.

@armanbilge (Member)

I could be mistaken, but I'm pretty sure you can, so long as fa is terminating. Try it and see?

@satorg (Contributor Author) commented Nov 26, 2021

I actually did already... ApplicativeTests for ZipLazyList hangs forever (that is what I meant).
But why do you think fa might be terminating?
I cannot find any code that could make it terminate when executing ApplicativeTests[ZipLazyList].

@armanbilge (Member)

fa is an arbitrary LazyList, right? And an arbitrary LazyList can be terminating, I think?

@satorg (Contributor Author) commented Nov 26, 2021

Ah, yeah, I see what you mean – that is correct for sure if we talk about applicativeIdentity and some other Applicative laws.

But unfortunately there are also applicativeHomomorphism and applicativeUnit in ApplicativeLaws – unlike applicativeIdentity, these two don't take fa as a parameter but rather take plain values and produce F[A] with F.pure on both sides of the <-> operator. Apparently, F.pure produces infinite sequences in such cases, and these laws hang on execution:

  def applicativeHomomorphism[A, B](a: A, f: A => B): IsEq[F[B]] =
    F.pure(f).ap(F.pure(a)) <-> F.pure(f(a))

  def applicativeUnit[A](a: A): IsEq[F[A]] =
    F.unit.map(_ => a) <-> F.pure(a)

@armanbilge (Member)

Yup, those laws would definitely hang :) I think if you filter the tests to remove those troublesome laws, at least we can test the rest?

@armanbilge (Member)

Oh, here's one more idea: maybe you can override Eq for LazyList in the tests to just do like .take(100) or something?

@satorg (Contributor Author) commented Nov 26, 2021

Yes, it works for Applicative[ZipLazyList] indeed:

    implicit def limitedZipLazyListEq[A: Eq]: Eq[ZipLazyList[A]] = Eq.by(_.value.take(100))
    checkAll("ZipLazyList[Int]", ApplicativeTests[ZipLazyList].applicative[Int, Int, Int])

The only concern here is that the limit value (100) is hardcoded. That could be solved by generating it, but it's a minor detail for now.

The more important thing is that, since this particular PR is about the Alternative typeclass, I am more interested in AlternativeLaws. Unfortunately, it seems these laws are not respected by the current ZipLazyList instance:

    implicit def limitedZipLazyListEq[A: Eq]: Eq[ZipLazyList[A]] = Eq.by(_.value.take(100))
    checkAll("ZipLazyList[Int]", AlternativeTests[ZipLazyList].alternative[Int, Int, Int])

One of the Alternative laws (namely nonEmptyAlternativeRightDistributivity) fails verification:

failing seed for alternative.right distributivity is AXCCUaoqcBo2HTR10KI05rEEh3w_ev5SHXHKbuU2dEK=
failing seed for alternative.right distributivity is zu7cOAK1kuUfCZYUP8FV9HR54TXPpdnOTQOE4fkOz0K=
==> X cats.tests.LazyListSuite.ZipLazyList[Int]: alternative.right distributivity  0.03s munit.FailException: /Users/storgashov/Projects/public/cats/tests/src/test/scala-2.13+/cats/tests/LazyListSuite.scala:54
53:
54:    checkAll("ZipLazyList[Int]", AlternativeTests[ZipLazyList].alternative[Int, Int, Int])
55:  }

Failing seed: SPVn4KbEqAoFqs1jTEAi5tNDFVeie69H52mPPaqQUNF=
You can reproduce this failure by adding the following override to your suite:

  override val scalaCheckInitialSeed = "SPVn4KbEqAoFqs1jTEAi5tNDFVeie69H52mPPaqQUNF="

Falsified after 8 passed tests.
> Labels of failing property: 
Expected: cats.data.ZipLazyList@44b0cfdb
Received: cats.data.ZipLazyList@38d030fd
> ARG_0: cats.data.ZipLazyList@725f26df
> ARG_1: cats.data.ZipLazyList@adf8e75c
> ARG_2: cats.data.ZipLazyList@62e58123

@satorg (Contributor Author) commented Nov 26, 2021

It seems that Alternative[ZipLazyList] is unlawful, doesn't it? I bet the same applies to Alternative[ZipStream]...

@armanbilge (Member)

Yes, it seems to me that NonEmptyAlternative is unlawful for any Zip* collection.

def nonEmptyAlternativeRightDistributivity[A, B](fa: F[A], ff: F[A => B], fg: F[A => B]): IsEq[F[B]] =
  ((ff |+| fg).ap(fa)) <-> ((ff.ap(fa)) |+| (fg.ap(fa)))

Suppose fa, ff, fg are all some ZipCollection of length 1. Then the left-hand side of the equality will have length 1 and the right-hand side will have length 2.
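
To see the mismatch concretely, here is a plain-Scala sketch that models the Zip* semantics directly (zip-then-apply for ap, concatenation for |+| on F[A => B]), not the actual cats instances:

```scala
// Model ZipCollection's ap as zip-then-apply.
def zipAp[A, B](ff: List[A => B], fa: List[A]): List[B] =
  ff.zip(fa).map { case (f, a) => f(a) }

val fa = List(1)                  // length 1
val ff = List((x: Int) => x + 1)  // length 1
val fg = List((x: Int) => x * 10) // length 1

val lhs = zipAp(ff ++ fg, fa)          // zip truncates: List(2), length 1
val rhs = zipAp(ff, fa) ++ zipAp(fg, fa) // List(2, 10), length 2
// lhs != rhs, so right distributivity fails
```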

@satorg (Contributor Author) commented Nov 26, 2021

Yes, agreed. I would guess it may not be lawful even in theory. I'm struggling to imagine how Semigroup#combine (or SemigroupK#combineK) could be implemented for Zip* collections to make it work properly with Parallel. I mean, Semigroup#combine looks to me like a sequential operation by its nature (unless I've missed something).

@satorg (Contributor Author) commented Nov 26, 2021

Btw, I've also found another piece of weirdness in ScalaVersionSpecificFoldableSuite, check it out:

  // Can't test Parallel here, as Applicative[ZipLazyList].pure doesn't terminate
  checkAll("Parallel[LazyList]", NonEmptyParallelTests[LazyList].nonEmptyParallel[Int, String])

  checkAll("Parallel[NonEmptyLazyList]", ParallelTests[NonEmptyLazyList].parallel[Int, String])

The thing is that there's no overridden Eq[NonEmptyLazyList] in scope, but ParallelTests[NonEmptyLazyList].parallel[Int, String] succeeds while ParallelTests[LazyList].parallel[Int, String] hangs. Guess why?

implicit def catsDataParallelForNonEmptyLazyList: Parallel.Aux[NonEmptyLazyList, OneAnd[ZipLazyList, *]] =
    new Parallel[NonEmptyLazyList] {
      type F[x] = OneAnd[ZipLazyList, x]

      def applicative: Applicative[OneAnd[ZipLazyList, *]] =
        OneAnd.catsDataApplicativeForOneAnd(ZipLazyList.catsDataAlternativeForZipLazyList)

      ...
    }

I.e., Parallel[NonEmptyLazyList] completely delegates its Applicative to OneAnd, but the latter defines it this way:

  implicit def catsDataApplicativeForOneAnd[F[_]](implicit F: Alternative[F]): Applicative[OneAnd[F, *]] =
    new Applicative[OneAnd[F, *]] {
      ...

      def pure[A](x: A): OneAnd[F, A] = OneAnd(x, F.empty)

      override def ap[A, B](ff: OneAnd[F, A => B])(fa: OneAnd[F, A]): OneAnd[F, B] = {
        val (f, tf) = (ff.head, ff.tail)
        val (a, ta) = (fa.head, fa.tail)
        val fb = F.ap(tf)(F.combineK(F.pure(a), ta))
        OneAnd(f(a), F.combineK(F.map(ta)(f), fb))
      }
    }

I.e., it redefines pure to always produce a single item. That makes the Parallel[NonEmptyLazyList] implementation inconsistent with Parallel[LazyList]; in other words, Parallel[NonEmptyLazyList] is not actually "parallel".

But the problem is that all checks from ParallelTests[NonEmptyLazyList].parallel[Int, String] succeed, so it seems they do not detect that the instance is incorrect.

WDYT?

@johnynek (Contributor) commented Nov 26, 2021

Can we summarize the issues here?

  1. pure(a) making singleton list vs infinite list
  2. non-termination of laws with an infinite pure (though maybe we could prove them to be lawful "on paper"; however, it's not clear how useful such instances are, since we might often trip over their infinite-ness).
  3. the issue with distributivity in Alternative laws. This seems to be a counter-example of the current idea (combineK is concat but product is zip: Optimize Alternative (part 2): add prependK/appendK specializations for std containers #4052 (comment) )

I think we have an issue that we have instances that seem untested. And another issue about Parallel possibly being either ill-defined or under-defined.

@armanbilge (Member) commented Nov 26, 2021

Nice summary. IMO:

  1. pure defined as an infinite list is correct to satisfy the identity law. See Optimize Alternative (part 2): add prependK/appendK specializations for std containers #4052 (comment).
  2. I think these instances may still be useful, since in practice LazyLists can also be finite.
  3. I see you edited with my counter-example. Seems to me like there is probably no valid NonEmptyAlternative for Zip* collections.

I think we have an issue that we have instances that seem untested. And another issue about Parallel possibly being either ill-defined or under-defined.

Yes, I have more thoughts about Parallel. I think that can be a separate issue. The issue of untested instances for Zip* is entangled with this PR.

@johnynek (Contributor)

I wonder if something like this combineK works for parallel infinite streams: for a <+> b, if a is finite, we return a ++ b.drop(a.size).

Well, I guess if a is infinite, then that also works: a #::: b.drop(a.size) since the b part is lazily evaluated, and only evaluated if a is finite.
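
A sketch of that idea for LazyList (an assumption about the proposal, not the current cats instance); since #::: takes its right-hand side by name, b.drop(a.size) is only forced if a turns out to be finite:

```scala
// Point-wise-aligning combineK: keep all of `a`, then the remainder of `b`.
def zipCombineK[A](a: LazyList[A], b: => LazyList[A]): LazyList[A] =
  a #::: b.drop(a.size)

zipCombineK(LazyList(1, 2, 3), LazyList(10, 20, 30, 40)).toList
// List(1, 2, 3, 40): length is max(3, 4) = 4

zipCombineK(LazyList.continually(0), LazyList(1, 2, 3)).take(5).toList
// List(0, 0, 0, 0, 0): the infinite left side wins and b is never forced
```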

@satorg (Contributor Author) commented Nov 26, 2021

Hmm... a #::: b.drop(a.size) – dropping elements while combining collections looks a bit suspicious to me... Could it be that we would break some other laws this way (even if such laws are not currently implemented in Cats)?

@johnynek (Contributor)

If it doesn't break any laws, but the current ones do, I would argue it isn't suspicious.

Basically, this is point-wise aligning items, which feels like "parallel" to me. If that passes the distributivity laws, I don't see any issue with it.

Also, note it follows len(a <+> b) == max(len(a), len(b)), whereas for monadic alternatives we have len(a <+> b) == len(a) + len(b). This is reminiscent of the tropical semiring:

https://en.wikipedia.org/wiki/Tropical_semiring

@armanbilge (Member)

It looked strange to me too at first but I think it might work actually :)

I think maybe before this PR, we should have another PR that:

  1. Adds tests for the untested instances for Zip collections.
  2. Makes those instances lawful as necessary.

Then this PR can stay focused on its original mission to implement prependK etc. @satorg WDYT?

@satorg (Contributor Author) commented Nov 26, 2021

Yeah, that makes sense to me too. I'd feel more confident about this PR if there were some tests for the code before the changes, so I could see that the changes don't break them. I'll take this on.

One more thing to confirm: as I understand it, it is not possible to implement Applicative (and thus Alternative) for non-lazy Zip collections (like ZipList or ZipVector) – is that correct?

I mean, apparently Applicative#pure cannot be implemented in terms of continually for non-lazy collections, and otherwise there's no way to produce a sequence that would align with the other Zip operations.

@armanbilge (Member)

@satorg Thanks! Sounds great.

Yup, that's exactly my understanding about strict collections.

BTW, I just saw this relevant issue:

Actually, I'm not sure I agree with it. I am writing up an issue about Parallel and I will express my thoughts there.

@johnynek (Contributor)

I think we invented Parallel in cats, so we are in charge of what it means. It isn't entirely clear what the laws should be; I think that is worth exploring in more detail. It was basically developed to abstract over Either <-> Validated and IO <-> IO.Par. Basically, those were the motivating use cases, IIRC.

We could have overfit on those two with some of the laws. But since what we mean is now so vague, it is hard to say. Basically, the idea is that sometimes there are multiple different applicatives that could be defined, but in those cases some of them cannot be extended to a Monad. I'm not sure it is much deeper than that.
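
For reference, the Either <-> Validated pair mentioned above can be demonstrated with cats syntax; the par operations route through Validated's accumulating Applicative instead of Either's short-circuiting Monad:

```scala
import cats.syntax.all._

val e1: Either[String, Int] = Left("a")
val e2: Either[String, Int] = Left("b")

(e1, e2).tupled    // Left("a"): sequential, stops at the first error
(e1, e2).parTupled // Left("ab"): "parallel", errors accumulate via Semigroup[String]
```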

@satorg satorg force-pushed the ne-alternative-specializations-std branch from 530006f to efa971c Compare November 27, 2021 19:09
@satorg satorg requested a review from johnynek November 27, 2021 19:10
@satorg satorg marked this pull request as ready for review November 27, 2021 19:11
@satorg (Contributor Author) commented Nov 27, 2021

@armanbilge I decided to remove the specializations for Zip collections from this PR since they apparently deserve a separate PR. That keeps this PR focused on specializations for standard collections only, and I think it is in pretty good shape now, wdyt?

@@ -16,6 +16,8 @@ trait StreamInstances extends cats.kernel.instances.StreamInstances {

def combineK[A](x: Stream[A], y: Stream[A]): Stream[A] = x #::: y

override def prependK[A](a: A, fa: Stream[A]): Stream[A] = a #:: fa
Member:

For LazyList you overrode both prependK and appendK, why only prependK for Stream, are the implementations different?

satorg (Contributor Author):

Right, this is because appendK for LazyList utilizes LazyList#appended, which is optimized for LazyList. But the appended/prepended methods were first introduced in Scala 2.13, and Stream in Scala 2.12 does not have them. Therefore, it seems the best we can do here is to call

def appendK[A](fa: Stream[A], a: A) = fa #::: Stream(a)

but that is exactly what combineK (along with pure) already does.

@satorg (Contributor Author) commented Nov 27, 2021

On second thought, Stream in Scala 2.13 does have the prepended/appended methods, but as far as I can see they simply come from Seq and aren't optimized for Stream (unlike those in LazyList).

Member:

Makes sense, thanks for explaining!
