Optimize Chain length methods #4166

bplommer · 2022-04-07T19:06:14Z

No description provided.

johnynek · 2022-04-07T19:21:47Z

core/src/main/scala/cats/data/Chain.scala

+      case Empty                   => 0
+      case Wrap(seq)               => seq.length.toLong
+      case Singleton(a)            => 1
+      case Append(leftNE, rightNE) => leftNE.length + rightNE.length


this can blow the stack. I think you need to use the iterator in this case or write a specific recursion.

satorg · 2022-04-07T19:24:12Z

core/src/main/scala/cats/data/Chain.scala

+      case Empty                   => 0
+      case Wrap(seq)               => seq.length.toLong
+      case Singleton(a)            => 1
+      case Append(leftNE, rightNE) => leftNE.length + rightNE.length


This case for Append triggers recursive evaluation of length (if the Chain consist of a bunch of Append-s). It may not be stack safe. IMO it would be better to keep the old iteration based evaluation for the Append case.

... on the second thought (and as an alternative to just iteration), we can consider implementing a separate (perhaps private) method based on Eval structure:

def lengthEval: Eval[Long]

This method can handle all the length calculations in a stack-safe way.
Then the regular length can call that method to trigger all the calculation.

Not sure if it is really worth it, though...

johnynek · 2022-04-07T19:27:22Z

core/src/main/scala/cats/data/Chain.scala

-    while (iter.hasNext) { i += 1; iter.next(); }
-    i
-  }
+  final def length: Long =


actually I think what we really might want is:

def foldMap[B](fn: A => B)(implicit B: Monoid[B]): B = { @annotation.tailrec def loop(chains: List[Chain[A]], acc: B): B = chains match { case Nil => acc case h :: tail => h match { case Empty => loop(tail, acc) case Wrap(seq) => loop(tail, B.combine(B.combineAll(acc, seq.iterator.map(fn)))) case Singleton(a) => loop(tail, B.combine(acc, fn(a))) case Append(l, r) => loop(l :: r :: tail, acc) } } // we need to be careful and test this with a non-commutative monoid to be sure we get the order // right compared to `toList.foldMap` loop(this :: Nil, B.empty) }

That is stack safe and leverages associtivity (and we can override the foldMap in the Foldable instance).

With that, you can do: def length: Long = foldMap(Function.const(1L))

Or you could write a custom loop based on the same idea and make it stack safe.

Just a note: using such foldMap for calculating length will undermine the original efforts for Wrap case optimization.

...although the method foldMap itself is nice to have for sure.

yeah, I mean you can make 3 specialized methods: combineAll, foldMap, length and each of them can have their own code.

If you just care about length right now, you can just add that one and change the Wrap(seq) line to be loop(tail, acc + seq.length.toLong) and the Singleton(_) line to be loop(tail, acc + 1L)

Just curious: is it going to be faster than a stack-safe implementation based on Eval? I know Eval uses closures, but this code also does some additional allocations of List items all the way while recursing. Wdyt?

Eval also allocates lists to manage a stack. I think this will be faster since it won't also allocate and call lambdas.

Another option (what did I say about getting sucked down rabbit-holes...)

private def foldMapCase[B](f: A => B, g: Seq[A] => B)(implicit M: Monoid[B]): B = { @tailrec def loop(chains: List[Chain[A]], acc: B): B = chains match { case Nil => acc case h :: t => h match { case Empty => loop(t, acc) case Singleton(a) => loop(t, M.combine(f(a), acc)) case Wrap(as) => loop(t, M.combine(g(as), acc)) case Append(l, r) => loop(l :: r :: t, acc) } } loop(this :: Nil, M.empty) } final def foldMap[B](f: A => B)(implicit M: Monoid[B]): B = foldMapCase(f, Foldable[Seq].foldMap(_)(f)) final def length: Long = foldMapCase(_ => 1L, _.length.toLong)

I wouldn't sweat it. Just copy the code. The inlined copied code will also be more efficient since there is no Long boxing.

We prefer a copy of the code if we can get efficiency wins in library code. We aren't trying to make the internals of cats as beautiful as possible. IMO, we want the API to be beautiful, then we want the library to be stack safe, then we want it to be fast, then we want it to be implemented in a beautiful fashion.

thinking more about this, I am not sure we actually want to implement foldMap the way I suggested. Doing so would likely defeat the optimizations of Monoid.combineAll. I think the right way is just Monoid.combineAll(toIterator.map(fn)) which is free to use internal mutability on the monoid. If we implement the way I suggested above, we force the monoid to continue to concatenate (which could make things like string concatenation quadratic vs linear).

bplommer · 2022-04-07T19:41:28Z

Oops - I'm going to replace the recursion for Append with a todo to avoid getting sucked down a rabbit hole.

johnynek · 2022-04-07T19:48:35Z

core/src/main/scala/cats/data/Chain.scala

+      case Wrap(seq)    => seq.length.toLong
+
+      // TODO: consider implementing this case as a stack-safe recursion.
+      case Append(_, _) => iterator.length.toLong


this will undo the optimization if there is any nesting of Wrap. I think since I've already written the code, it isn't such a heavy lift to copy and paste that version in (with the modifications).

Fair point 😅

bplommer · 2022-04-07T19:48:56Z

core/src/main/scala/cats/data/Chain.scala


-  /**
+      // TODO: consider implementing this case as a stack-safe recursion.
+      case Append(_, _) => iterator.length.toLong


This ends up being implemented by counting through the iterator.

johnynek · 2022-04-07T20:07:53Z

core/src/main/scala/cats/data/Chain.scala

@@ -566,32 +566,26 @@ sealed abstract class Chain[+A] {
   * Returns the number of elements in this structure
   */
  final def length: Long = {


I don't know if we have a test for this, but if not we should add that chain.length == chain.toList.length.toLong

I'm really wondering how come Chain#length became Long rather than just Int (which is used for all other collections' length). Is there any chance that Chain#length can outgrow Int type?

Yeah, it's annoying - it means Chain can never extend Iterable or Seq.

sure it can, just cast to Int, but I don't think that's a good idea anyway. Make a wrapper and call toSeq and just implement the methods that way.

Why is the length an Int in scala? I think because java did this in java.util. Why did java.util do it? In those days computers were all 32 bits, so Long was more expensive (that isn't the case now, I don't think), also the idea of an arraylist with more than 2 billion items seemed crazy in the late 90s.

With scala List, there is nothing that I know of that prevents you from making a list more than 2 billion long: it is just a series of pointers. But if you do, I guess length will just throw or return junk.

In cats, we preferred Long for two reasons:

Long is as cheap as Int now.

We can actually imagine cases where Int is overflowed, but Long will basically never be overflowed because 2^63 is a gigantic number (we may live long enough to see computers with that much ram, but I doubt it).

So, if you want to implement def toSeq: Seq[A] on Chain, you can totally do it. The length being cast to Int is no more of a lie than when List does it, and we live with that all the time. I don't see the problem.

sure it can, just cast to Int

Not without return type polymorphism (the method name length is already taken by the Long version)- but yeah, you can still do a conversion.

Good point, but the toSeq or toIterable without doing a deep copy does work.

johnynek

thanks!

I verified that length is tested because there is a law that size.toInt == .toList.size

bplommer added 2 commits April 7, 2022 19:55

Optimize Chain.length implementation

f331cc8

Optimize Chain.knownLength

4ec151c

johnynek reviewed Apr 7, 2022

View reviewed changes

satorg reviewed Apr 7, 2022

View reviewed changes

johnynek reviewed Apr 7, 2022

View reviewed changes

Remove stack-unsafe recursion

237c2ed

johnynek reviewed Apr 7, 2022

View reviewed changes

bplommer commented Apr 7, 2022

View reviewed changes

bplommer added 4 commits April 7, 2022 20:51

Fix scala 3 compilation

d5000f3

Tail-recursive length loop

bb24501

Remove dead code

718dc8d

Fix comment

2125218

johnynek reviewed Apr 7, 2022

View reviewed changes

bplommer marked this pull request as ready for review April 7, 2022 20:44

johnynek approved these changes Apr 9, 2022

View reviewed changes

johnynek merged commit abd3fe6 into typelevel:main Apr 9, 2022

armanbilge mentioned this pull request Apr 9, 2022

Migrate to sbt-typelevel #4160

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Chain length methods #4166

Optimize Chain length methods #4166

bplommer commented Apr 7, 2022

johnynek Apr 7, 2022

satorg Apr 7, 2022

satorg Apr 7, 2022

johnynek Apr 7, 2022 •

edited

Loading

satorg Apr 7, 2022

satorg Apr 7, 2022

johnynek Apr 7, 2022 •

edited

Loading

satorg Apr 7, 2022

johnynek Apr 7, 2022

bplommer Apr 7, 2022 •

edited

Loading

johnynek Apr 7, 2022

johnynek Apr 9, 2022

bplommer commented Apr 7, 2022

johnynek Apr 7, 2022

bplommer Apr 7, 2022

bplommer Apr 7, 2022

johnynek Apr 7, 2022

satorg Apr 7, 2022

bplommer Apr 7, 2022

johnynek Apr 7, 2022

bplommer Apr 7, 2022 •

edited

Loading

johnynek Apr 7, 2022

johnynek left a comment

Optimize Chain length methods #4166

Optimize Chain length methods #4166

Conversation

bplommer commented Apr 7, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnynek Apr 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnynek Apr 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bplommer Apr 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bplommer commented Apr 7, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bplommer Apr 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnynek left a comment

Choose a reason for hiding this comment

johnynek Apr 7, 2022 •

edited

Loading

johnynek Apr 7, 2022 •

edited

Loading

bplommer Apr 7, 2022 •

edited

Loading

bplommer Apr 7, 2022 •

edited

Loading