Reconsider the use of fold in Validated #1951

travisbrown · 2017-10-06T06:22:10Z

A lot of the method implementations in Validated are extremely inefficient, which isn't ideal for projects like circe that rely on it extensively and are aiming for reasonable performance. For example, given this simplified definition (where eqF is the current === and eqP is a lightly optimized version):

import cats.kernel.Eq

sealed abstract class Validated[+E, +A] extends Product with Serializable {
  import Validated.{Invalid, Valid}

  def fold[B](fe: E => B, fa: A => B): B = this match {
    case Invalid(e) => fe(e)
    case Valid(a) => fa(a)
  }

  def eqF[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = fold(
      a => that.fold(EE.eqv(a, _), _ => false),
      b => that.fold(_ => false, AA.eqv(b, _))
    )

  def eqP[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = this match {
      case Invalid(e) => that match {
        case Invalid(ee) => EE.eqv(e, ee)
        case _ => false
      }
      case Valid(a) => that match {
        case Valid(aa) => AA.eqv(a, aa)
        case _ => false
      }
    }
}

object Validated {
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]
}

And a benchmark like this:

import cats.kernel.instances.all._
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
class ValidatedBenchmark {
  val a: Validated[String, Int] = Validated.Valid(1)
  val b: Validated[String, Int] = Validated.Valid(2)
  val c: Validated[String, Int] = Validated.Invalid("bad")
  val d: Validated[String, Int] = Validated.Invalid("worse")

  @Benchmark def eqF: Boolean =
    a.eqF(a) && !a.eqF(b) && !a.eqF(c) && c.eqF(c) && !c.eqF(d) && !c.eqF(a)

  @Benchmark def eqP: Boolean =
    a.eqP(a) && !a.eqP(b) && !a.eqP(c) && c.eqP(c) && !c.eqP(d) && !c.eqP(a)
}

The pattern matching implementation gets over four times the throughput on 2.12:

Benchmark                Mode  Cnt          Score         Error  Units
ValidatedBenchmark.eqF  thrpt   40   27131124.981 ±  164113.218  ops/s
ValidatedBenchmark.eqP  thrpt   40  118711782.519 ± 1166236.166  ops/s

The situation is even worse on 2.11, where the difference is a little larger and the fold version also involves a lot of unnecessary allocations:

Benchmark                                        Mode  Cnt          Score         Error   Units
ValidatedBenchmark.eqF:gc.alloc.rate.norm       thrpt    5        144.000 ±       0.001    B/op
ValidatedBenchmark.eqP:gc.alloc.rate.norm       thrpt    5         ≈ 10⁻⁵                  B/op

For most use cases these differences are unlikely to matter at all, but it seems like a shame to impose this extra cost on everyone just for the sake of slightly more concise implementations.

(There are similar issues in IndexedStateT, etc., but Validated is the one piece I personally really care that much about, because of how it affects circe.)

The text was updated successfully, but these errors were encountered:

adelbertc · 2017-10-06T07:10:13Z

👍 , it's the reasoning we took in #1289 as well

julienrf · 2017-10-06T07:10:19Z

What if fold is an abstract method in Validated and implemented in Invalid and Valid? (we would trade pattern matching for dynamic dispatch)

johnynek · 2017-10-06T07:45:51Z

+1 In general core library code in my view should have a nice API but then be implemented as efficiently as possible. I’d love to see us use the highest performance approach we can find.

On Thu, Oct 5, 2017 at 21:10 Julien Richard-Foy ***@***.***> wrote: What if fold is an abstract method in Validated and implemented in Invalid and Valid? (we would trade pattern matching for dynamic dispatch) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1951 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEJdkvM7HPVPALDnEoC2dUPjC-cRCTlks5spdJbgaJpZM4PwDNP> .

-- P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

kailuowang · 2017-10-06T11:46:14Z

It will be interesting to benchmark @julienrf 's abstract fold approach.

travisbrown · 2017-10-06T14:12:07Z

@julienrf @kailuowang I don't think pattern matching is the real culprit here—it's more the overhead of the functions. Making fold abstract does seem to help a bit on 2.12:

Benchmark                 Mode  Cnt          Score        Error  Units
ValidatedBenchmark.eqF   thrpt   40   26938203.527 ± 189709.343  ops/s
ValidatedBenchmark.eqF2  thrpt   40   29927603.497 ± 199789.080  ops/s
ValidatedBenchmark.eqP   thrpt   40  118540469.078 ± 754414.614  ops/s

…but there's no measurable difference on 2.11 and the allocation situation is the same on both.

johnynek · 2017-10-06T17:19:34Z

functions, I've always heard, are very hard on the jit due to them being megamorphic. So, one of the best optimizations you can often do in scala is remove a function and replace with literal code that the JIT won't skip over.

I would love if this could be solved in the future with dotty linker, or changes to the JIT to better handle lambdas (maybe java will start to care about this now that they have lambdas).

julienrf · 2017-10-06T19:48:44Z

How can we apply that in our context?

travisbrown · 2017-10-06T20:45:45Z

@julienrf I think just rewriting fold in all of the implementations to use pattern matching (as in my example above) is a reasonable way to avoid unnecessary functions.

Users can decide whether for their own cases which approach they want. For example I use Json#fold all the time in application code that depends on circe, but don't use it anywhere internally in circe itself.

kailuowang added help wanted low-hanging fruit labels Oct 6, 2017

kailuowang added this to the 1.0.0 milestone Oct 10, 2017

kailuowang mentioned this issue Oct 16, 2017

Getting ready for RC1 #1974

Merged

12 tasks

kailuowang self-assigned this Oct 16, 2017

kailuowang mentioned this issue Oct 16, 2017

reduced usage of fold in Validated #1976

Merged

kailuowang closed this as completed in #1976 Oct 17, 2017

durban mentioned this issue Nov 26, 2017

Queue#peek1 and timedPeek1 typelevel/fs2#996

Merged

kailuowang mentioned this issue Dec 13, 2017

Add Arrow Choice #2096

Merged

LukaJCB mentioned this issue Mar 16, 2018

Add Bracket type class typelevel/cats-effect#113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconsider the use of fold in Validated #1951

Reconsider the use of fold in Validated #1951

travisbrown commented Oct 6, 2017

adelbertc commented Oct 6, 2017

julienrf commented Oct 6, 2017

johnynek commented Oct 6, 2017 via email

kailuowang commented Oct 6, 2017

travisbrown commented Oct 6, 2017

johnynek commented Oct 6, 2017

julienrf commented Oct 6, 2017

travisbrown commented Oct 6, 2017

Reconsider the use of fold in Validated #1951

Reconsider the use of fold in Validated #1951

Comments

travisbrown commented Oct 6, 2017

adelbertc commented Oct 6, 2017

julienrf commented Oct 6, 2017

johnynek commented Oct 6, 2017 via email

kailuowang commented Oct 6, 2017

travisbrown commented Oct 6, 2017

johnynek commented Oct 6, 2017

julienrf commented Oct 6, 2017

travisbrown commented Oct 6, 2017