
Optimize Alternative (part 2): add prependK/appendK specializations for std containers #4052

Merged (1 commit) on Nov 27, 2021

Conversation

@satorg (Contributor) commented Nov 23, 2021

See initial PR #4014 for details.
Subsequent PRs:

  • add prependK/appendK specializations for cats.data collections.
  • add prependK/appendK specializations for all other cats containers.

This PR adds optimized specializations for the new prependK/appendK methods introduced in the initial PR.

@satorg (Contributor Author) commented Nov 23, 2021

It seems there are no laws or test suites for the Zip* wrappers, nor any docs for them. I'm wondering – are these wrappers used anywhere?

@armanbilge (Member)

You mean like this?

checkAll("ZipStream[Int]", CommutativeApplyTests[ZipStream].apply[Int, Int, Int])

@satorg (Contributor Author) commented Nov 23, 2021

Yeah, I didn't notice that test because it resides inside StreamSuite... But frankly speaking, it can hardly be called satisfactory.
There's only a single CommutativeApply test for ZipStream and nothing more, and the same goes for ZipLazyList. It doesn't make things clearer, unfortunately.

For example, this PR adds specializations for Alternative instances. Both ZipStream and ZipLazyList implement this typeclass (for some purpose, I'd guess), but it seems these implementations have never been tested.

Furthermore, both the ZipLazyList and ZipStream wrappers have zero documentation available. Is there any clue as to what they are for and what expectations they should meet?

@armanbilge (Member) commented Nov 23, 2021

Is there any clue as to what they are for and what expectations they should meet?

IIUC the Zip* newtypes are used to implement the "parallel" Apply instances for various collections.

As an analogy: suppose I want a semigroup for integers. Both + and * form valid semigroups. One way to resolve the ambiguity would be to define the canonical semigroup for Int using addition and create a newtype MultiplicativeInt(value: Int) whose canonical semigroup is defined in terms of multiplication.
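That analogy could be sketched like this (MultiplicativeInt is hypothetical, purely for illustration; only the additive Semigroup[Int] is what cats actually ships):

```scala
import cats.Semigroup

// The canonical Semigroup[Int] in cats is additive; this hypothetical
// newtype selects the multiplicative semigroup instead.
final case class MultiplicativeInt(value: Int)

object MultiplicativeInt {
  implicit val multiplicativeSemigroup: Semigroup[MultiplicativeInt] =
    Semigroup.instance((x, y) => MultiplicativeInt(x.value * y.value))
}

Semigroup[Int].combine(2, 3)                                              // 5
Semigroup[MultiplicativeInt].combine(MultiplicativeInt(2), MultiplicativeInt(3)).value // 6
```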

Similarly, we have List and ZipList to distinguish between the two possible implementations of Apply.

Apply[List].product(List(1, 2), List('a', 'b'))
// List((1,a), (1,b), (2,a), (2,b))

Apply[ZipList].product(ZipList(List(1, 2)), ZipList(List('a', 'b'))).value
// List((1,a), (2,b))

Try it out: https://scastie.scala-lang.org/zqPsMxHmTSe2qhIY4yIjyQ

This is exactly the basis for Parallel:

implicit def catsStdNonEmptyParallelForListZipList: NonEmptyParallel.Aux[List, ZipList] =
  new NonEmptyParallel[List] {
    type F[x] = ZipList[x]
    def flatMap: FlatMap[List] = cats.instances.list.catsStdInstancesForList
    def apply: Apply[ZipList] = ZipList.catsDataCommutativeApplyForZipList
    def sequential: ZipList ~> List =
      new (ZipList ~> List) { def apply[A](a: ZipList[A]): List[A] = a.value }
    def parallel: List ~> ZipList =
      new (List ~> ZipList) { def apply[A](v: List[A]): ZipList[A] = new ZipList(v) }
  }

johnynek previously approved these changes Nov 25, 2021

@johnynek (Contributor)

we should be testing that the Zip* Applicative instances are lawful. I'm surprised we aren't doing that. Similarly for the Alternative instances.

@armanbilge (Member)

we should be testing that the Zip* Applicative instances are lawful.

I wonder if we can add this to the laws for Parallel?

@johnynek (Contributor)

trait NonEmptyParallelLaws[M[_]] {

yeah, it seems we should be testing that the Apply instance and the FlatMap instance are lawful there, similarly for Parallel.

@johnynek (Contributor)

Parallel is kind of a weird typeclass... it's like: here are two type constructors; they have an isomorphic Functor and pure, but all bets are off when it comes to the Apply instances, and one is maybe "parallel" (but not necessarily, because cats.Id can have a Parallel instance with itself).

@armanbilge (Member) commented Nov 25, 2021

one is maybe "parallel" (but not necessarily, because cats.Id can have a Parallel instance with itself).

Just because Id is its own Parallel instance doesn't mean it's not "parallel", I think? For example, isn't Option its own Parallel as well? But that definition seems consistent with the Zip* parallel definitions for other collections, which I assume are "parallel".

@johnynek (Contributor)

I guess I just mean the typeclass is underdefined...

I think when we actually use it we almost always actually know the concrete type.

Like, try to write an interesting abstract function using just Parallel. We have parTraverse, which is virtually only called in an abstract context where we already know the type constructor is, for instance, some kind of effect-like type.

I think it is a bit of a false generality....

Like, you can use Parallel with Either because you know behind the scenes what it is going to do, but not because you write interesting abstract functions with Parallel.

Does that make sense?

@armanbilge (Member) commented Nov 25, 2021

Oh, it is absolutely underdefined. But I wonder if we can do better?

I feel like I have an intuitive understanding of what "parallel" means, and I wouldn't necessarily say it is a false generality. Something about how the "parallel" Applicative is one for which no consistent Monad exists, i.e., it lacks any notion of sequentiality. And I think from there you may be (?) able to draw some useful (general) inferences, e.g., that the effect does not "short-circuit".

But maybe I'm just babbling. Also not sure if this reasoning generalizes to NonEmptyParallel as well.

Edit: alas, I see my "no-consistent-Monad" reasoning has already excluded the Id and Option "parallels" 😆

@satorg (Contributor Author) commented Nov 26, 2021

I am struggling to understand why ZipLazyList implements Applicative#pure this way:

def pure[A](x: A): ZipLazyList[A] = new ZipLazyList(LazyList.continually(x))

It is definitely impossible to implement this way for strict collections like List or Vector. Does this mean that Parallel[List] is impossible in principle and we can only have NonEmptyParallel[List]?

@armanbilge (Member) commented Nov 26, 2021

@satorg I think you need that definition of pure to satisfy this law:

def applicativeIdentity[A](fa: F[A]): IsEq[F[A]] =
  F.pure((a: A) => a).ap(fa) <-> fa
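
As a plain-Scala sketch (modeling ZipLazyList's ap as zip-then-apply, not using the actual cats instance), the infinite pure is exactly what keeps this law from truncating fa:

```scala
val fa = LazyList(1, 2, 3)

// pure via continually: an infinite stream of the identity function
val pureId = LazyList.continually((a: Int) => a)

pureId.zip(fa).map { case (f, a) => f(a) }.toList
// List(1, 2, 3): fa comes back intact

// A finite pure of length 1 would truncate fa down to one element:
LazyList((a: Int) => a).zip(fa).map { case (f, a) => f(a) }.toList
// List(1)
```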

Makes sense about the strict collections 👍

@satorg (Contributor Author) commented Nov 26, 2021

Right, but I guess this law cannot be verified if pure is defined via LazyList#continually – is that correct?

Along with applicativeHomomorphism and applicativeInterchange...

@satorg (Contributor Author) commented Nov 26, 2021

FYI: it seems this behavior was first introduced in PR #1938 for ZipStream and then repeated for ZipLazyList.

@armanbilge (Member)

I could be mistaken, but I'm pretty sure you can, so long as fa is terminating. Try it and see?

@satorg (Contributor Author) commented Nov 26, 2021

I actually did already... ApplicativeTests for ZipLazyList hangs forever (that is what I meant).
But why do you think fa might be terminating?
I cannot find any code that could make it terminate when executing ApplicativeTests[ZipLazyList].

@armanbilge (Member)

fa is an arbitrary LazyList, right? And an arbitrary LazyList can be terminating, I think?

@satorg (Contributor Author) commented Nov 26, 2021

Ah, yeah, I see what you mean – that is correct for sure if we talk about applicativeIdentity and some other Applicative laws.

But unfortunately there are also applicativeHomomorphism and applicativeUnit in ApplicativeLaws – unlike applicativeIdentity, these two don't take fa as a parameter but rather take plain values and produce F[A] with F.pure on both sides of the <-> operator. Apparently, F.pure produces infinite sequences in such cases, and these laws hang on execution:

  def applicativeHomomorphism[A, B](a: A, f: A => B): IsEq[F[B]] =
    F.pure(f).ap(F.pure(a)) <-> F.pure(f(a))

  def applicativeUnit[A](a: A): IsEq[F[A]] =
    F.unit.map(_ => a) <-> F.pure(a)

@armanbilge (Member)

Yup, those laws would definitely hang :) I think if you filter the tests to remove those troublesome laws, at least we can test the rest?

@armanbilge (Member)

Oh, here's one more idea: maybe you can override Eq for LazyList in the tests to just do like .take(100) or something?

@satorg (Contributor Author) commented Nov 26, 2021

Yes, it works for Applicative[ZipLazyList] indeed:

    implicit def limitedZipLazyListEq[A: Eq]: Eq[ZipLazyList[A]] = Eq.by(_.value.take(100))
    checkAll("ZipLazyList[Int]", ApplicativeTests[ZipLazyList].applicative[Int, Int, Int])

The only concern here is that the limit value (100) is hardcoded. That could be solved by generating it, but it's a minor detail for now.

The more important thing is that, since this particular PR is about the Alternative typeclass, I am more interested in AlternativeLaws. Unfortunately, it seems these laws are not respected by the current ZipLazyList instance:

    implicit def limitedZipLazyListEq[A: Eq]: Eq[ZipLazyList[A]] = Eq.by(_.value.take(100))
    checkAll("ZipLazyList[Int]", AlternativeTests[ZipLazyList].alternative[Int, Int, Int])

One of the Alternative laws (namely nonEmptyAlternativeRightDistributivity) fails verification:

failing seed for alternative.right distributivity is AXCCUaoqcBo2HTR10KI05rEEh3w_ev5SHXHKbuU2dEK=
failing seed for alternative.right distributivity is zu7cOAK1kuUfCZYUP8FV9HR54TXPpdnOTQOE4fkOz0K=
==> X cats.tests.LazyListSuite.ZipLazyList[Int]: alternative.right distributivity  0.03s munit.FailException: /Users/storgashov/Projects/public/cats/tests/src/test/scala-2.13+/cats/tests/LazyListSuite.scala:54
53:
54:    checkAll("ZipLazyList[Int]", AlternativeTests[ZipLazyList].alternative[Int, Int, Int])
55:  }

Failing seed: SPVn4KbEqAoFqs1jTEAi5tNDFVeie69H52mPPaqQUNF=
You can reproduce this failure by adding the following override to your suite:

  override val scalaCheckInitialSeed = "SPVn4KbEqAoFqs1jTEAi5tNDFVeie69H52mPPaqQUNF="

Falsified after 8 passed tests.
> Labels of failing property: 
Expected: cats.data.ZipLazyList@44b0cfdb
Received: cats.data.ZipLazyList@38d030fd
> ARG_0: cats.data.ZipLazyList@725f26df
> ARG_1: cats.data.ZipLazyList@adf8e75c
> ARG_2: cats.data.ZipLazyList@62e58123

@satorg (Contributor Author) commented Nov 26, 2021

It seems that Alternative[ZipLazyList] is unlawful, doesn't it? I bet the same applies to Alternative[ZipStream]...

@armanbilge (Member)

Yes, it seems to me that NonEmptyAlternative is unlawful for any Zip* collection.

def nonEmptyAlternativeRightDistributivity[A, B](fa: F[A], ff: F[A => B], fg: F[A => B]): IsEq[F[B]] =
  ((ff |+| fg).ap(fa)) <-> ((ff.ap(fa)) |+| (fg.ap(fa)))

Suppose fa, ff, fg are all some ZipCollection of length 1. Then the left-hand side of the equality will have length 1 and the right-hand side will have length 2.
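
To see the mismatch concretely, here is a plain-Scala sketch that models the Zip* semantics directly (zip-then-apply for ap, concatenation for |+| on F[A => B]), not the actual cats instances:

```scala
// Model ZipCollection's ap as zip-then-apply.
def zipAp[A, B](ff: List[A => B], fa: List[A]): List[B] =
  ff.zip(fa).map { case (f, a) => f(a) }

val fa = List(1)                  // length 1
val ff = List((x: Int) => x + 1)  // length 1
val fg = List((x: Int) => x * 10) // length 1

val lhs = zipAp(ff ++ fg, fa)          // zip truncates: List(2), length 1
val rhs = zipAp(ff, fa) ++ zipAp(fg, fa) // List(2, 10), length 2
// lhs != rhs, so right distributivity fails
```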

@satorg (Contributor Author) commented Nov 26, 2021

Yes, agreed. I would guess it may not be lawful even in theory. I'm struggling to imagine how Semigroup#combine (or SemigroupK#combineK) could be implemented for Zip* collections to make it work properly with Parallel. I mean, Semigroup#combine looks to me like a sequential operation by its nature (unless I've missed something).

@satorg (Contributor Author) commented Nov 26, 2021

Btw, I've also found another piece of weirdness in ScalaVersionSpecificFoldableSuite, check it out:

  // Can't test Parallel here, as Applicative[ZipLazyList].pure doesn't terminate
  checkAll("Parallel[LazyList]", NonEmptyParallelTests[LazyList].nonEmptyParallel[Int, String])

  checkAll("Parallel[NonEmptyLazyList]", ParallelTests[NonEmptyLazyList].parallel[Int, String])

The thing is that there's no overridden Eq[NonEmptyLazyList] in scope, but ParallelTests[NonEmptyLazyList].parallel[Int, String] succeeds while ParallelTests[LazyList].parallel[Int, String] hangs. Guess why?

implicit def catsDataParallelForNonEmptyLazyList: Parallel.Aux[NonEmptyLazyList, OneAnd[ZipLazyList, *]] =
    new Parallel[NonEmptyLazyList] {
      type F[x] = OneAnd[ZipLazyList, x]

      def applicative: Applicative[OneAnd[ZipLazyList, *]] =
        OneAnd.catsDataApplicativeForOneAnd(ZipLazyList.catsDataAlternativeForZipLazyList)

      ...
    }

I.e., Parallel[NonEmptyLazyList] completely delegates its Applicative to OneAnd, but the latter defines it this way:

  implicit def catsDataApplicativeForOneAnd[F[_]](implicit F: Alternative[F]): Applicative[OneAnd[F, *]] =
    new Applicative[OneAnd[F, *]] {
      ...

      def pure[A](x: A): OneAnd[F, A] = OneAnd(x, F.empty)

      override def ap[A, B](ff: OneAnd[F, A => B])(fa: OneAnd[F, A]): OneAnd[F, B] = {
        val (f, tf) = (ff.head, ff.tail)
        val (a, ta) = (fa.head, fa.tail)
        val fb = F.ap(tf)(F.combineK(F.pure(a), ta))
        OneAnd(f(a), F.combineK(F.map(ta)(f), fb))
      }
    }

I.e., it redefines pure to always produce a single item. That makes the Parallel[NonEmptyLazyList] implementation inconsistent with Parallel[LazyList]; in other words, Parallel[NonEmptyLazyList] is not actually "parallel".

But the problem is that all checks from ParallelTests[NonEmptyLazyList].parallel[Int, String] succeed, so it seems they do not detect that the instance is incorrect.

WDYT?

@johnynek (Contributor) commented Nov 26, 2021

Can we summarize the issues here?

  1. pure(a) making singleton list vs infinite list
  2. non-termination of laws with an infinite pure (though maybe we could prove them to be lawful "on paper"; however, it's not clear how useful such instances are, since we might often trip over their infinite-ness).
  3. the issue with distributivity in Alternative laws. This seems to be a counter-example of the current idea (combineK is concat but product is zip: Optimize Alternative (part 2): add prependK/appendK specializations for std containers #4052 (comment) )

I think we have an issue that we have instances that seem untested. And another issue about Parallel possibly being either ill-defined or under-defined.

@armanbilge (Member) commented Nov 26, 2021

Nice summary. IMO:

  1. pure defined as an infinite list is correct to satisfy the identity law. See Optimize Alternative (part 2): add prependK/appendK specializations for std containers #4052 (comment).
  2. I think these instances may still be useful, since in practice LazyLists can also be finite.
  3. I see you edited with my counter-example. Seems to me like there is probably no valid NonEmptyAlternative for Zip* collections.

I think we have an issue that we have instances that seem untested. And another issue about Parallel possibly being either ill-defined or under-defined.

Yes, I have more thoughts about Parallel. I think that can be a separate issue. The issue of untested instances for Zip* is entangled with this PR.

@johnynek (Contributor)

I wonder if something like this combineK works for parallel infinite streams: for a <+> b, if a is finite, we return a ++ b.drop(a.size).

Well, I guess if a is infinite, then that also works: a #::: b.drop(a.size) since the b part is lazily evaluated, and only evaluated if a is finite.
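
A sketch of that idea for LazyList (an assumption about the proposal, not the current cats instance); since #::: takes its right-hand side by name, b.drop(a.size) is only forced if a turns out to be finite:

```scala
// Point-wise-aligning combineK: keep all of `a`, then the remainder of `b`.
def zipCombineK[A](a: LazyList[A], b: => LazyList[A]): LazyList[A] =
  a #::: b.drop(a.size)

zipCombineK(LazyList(1, 2, 3), LazyList(10, 20, 30, 40)).toList
// List(1, 2, 3, 40): length is max(3, 4) = 4

zipCombineK(LazyList.continually(0), LazyList(1, 2, 3)).take(5).toList
// List(0, 0, 0, 0, 0): the infinite left side wins and b is never forced
```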

@satorg (Contributor Author) commented Nov 26, 2021

Hmm... a #::: b.drop(a.size) – dropping elements while combining collections looks a bit suspicious to me... Could it be that we would break some other laws this way (even if such laws are not currently implemented in Cats)?

@johnynek (Contributor)

If it doesn't break any laws, but the current ones do, I would argue it isn't suspicious.

Basically, this is point-wise aligning items, which feels like "parallel" to me. If that passes the distributivity laws, I don't see any issue with it.

Also, note it follows len(a <+> b) == max(len(a), len(b)), whereas for monadic alternatives we have len(a <+> b) == len(a) + len(b). This is reminiscent of the tropical semiring:

https://en.wikipedia.org/wiki/Tropical_semiring

@armanbilge (Member)

It looked strange to me too at first but I think it might work actually :)

I think maybe before this PR, we should have another PR that:

  1. Adds tests for the untested instances for Zip collections.
  2. Makes those instances lawful as necessary.

Then this PR can stay focused on its original mission to implement prependK etc. @satorg WDYT?

@satorg (Contributor Author) commented Nov 26, 2021

Yeah, that makes sense to me too. I'd feel more confident about this PR if there were some tests for the code before the changes, so I could see that the changes don't break them. I'll take this on.

One more thing to confirm: as I understand it, it is not possible to implement Applicative (and thus Alternative) for non-lazy Zip collections (like ZipList or ZipVector) – is that correct?

I mean, apparently Applicative#pure cannot be implemented in terms of continually for non-lazy collections, and otherwise there's no way to produce a sequence that would align with the other Zip operations.

@armanbilge (Member)

@satorg Thanks! Sounds great.

Yup, that's exactly my understanding about strict collections.

BTW, I just saw this relevant issue:

Actually, I'm not sure I agree with it. I am writing up an issue about Parallel and I will express my thoughts there.

@johnynek (Contributor)

I think we invented Parallel in cats, so we are in charge of what it means. It isn't entirely clear what the laws should be; I think that is worth exploring in more detail. It was basically developed to abstract over Either <-> Validated and IO <-> IO.Par. Basically, those were the motivating use cases, IIRC.

We could have overfit on those two with some of the laws. But since what we mean is now so vague, it is hard to say. Basically, the idea is that sometimes there are multiple different applicatives that could be defined, but in those cases some of them cannot be extended to a Monad. I'm not sure it is much deeper than that.
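
For reference, the Either <-> Validated pair mentioned above can be demonstrated with cats syntax; the par operations route through Validated's accumulating Applicative instead of Either's short-circuiting Monad:

```scala
import cats.syntax.all._

val e1: Either[String, Int] = Left("a")
val e2: Either[String, Int] = Left("b")

(e1, e2).tupled    // Left("a"): sequential, stops at the first error
(e1, e2).parTupled // Left("ab"): "parallel", errors accumulate via Semigroup[String]
```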

@satorg satorg force-pushed the ne-alternative-specializations-std branch from 530006f to efa971c Compare November 27, 2021 19:09
@satorg satorg requested a review from johnynek November 27, 2021 19:10
@satorg satorg marked this pull request as ready for review November 27, 2021 19:11
@satorg (Contributor Author) commented Nov 27, 2021

@armanbilge I decided to remove the specializations for Zip collections from this PR since they apparently deserve a separate PR. That keeps this PR focused on specializations for standard collections only, and I think it is in pretty good shape now, wdyt?

@@ -16,6 +16,8 @@ trait StreamInstances extends cats.kernel.instances.StreamInstances {

def combineK[A](x: Stream[A], y: Stream[A]): Stream[A] = x #::: y

override def prependK[A](a: A, fa: Stream[A]): Stream[A] = a #:: fa
Member:

For LazyList you overrode both prependK and appendK, why only prependK for Stream, are the implementations different?

satorg (Contributor Author):

Right, this is because appendK for LazyList utilizes LazyList#appended, which is optimized for LazyList. But the appended/prepended methods were first introduced in Scala 2.13, and Stream in Scala 2.12 does not have them. Therefore, it seems the best we can do here is to call

def appendK[A](fa: Stream[A], a: A) = fa #::: Stream(a)

but that is exactly what combineK (along with pure) already does.

@satorg (Contributor Author) commented Nov 27, 2021

On second thought, Stream in Scala 2.13 does have the prepended/appended methods, but as far as I can see they simply come from Seq and aren't optimized for Stream (unlike those in LazyList).

Member:

Makes sense, thanks for explaining!
