Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider the use of fold in Validated #1951

Closed
travisbrown opened this issue Oct 6, 2017 · 8 comments · Fixed by #1976
Closed

Reconsider the use of fold in Validated #1951

travisbrown opened this issue Oct 6, 2017 · 8 comments · Fixed by #1976

Comments

@travisbrown
Copy link
Contributor

A lot of the method implementations in Validated are extremely inefficient, which isn't ideal for projects like circe that rely on it extensively and are aiming for reasonable performance. For example, given this simplified definition (where eqF is the current === and eqP is a lightly optimized version):

import cats.kernel.Eq

sealed abstract class Validated[+E, +A] extends Product with Serializable {
  import Validated.{Invalid, Valid}

  def fold[B](fe: E => B, fa: A => B): B = this match {
    case Invalid(e) => fe(e)
    case Valid(a) => fa(a)
  }

  def eqF[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = fold(
      a => that.fold(EE.eqv(a, _), _ => false),
      b => that.fold(_ => false, AA.eqv(b, _))
    )

  def eqP[EE >: E, AA >: A](that: Validated[EE, AA])
    (implicit EE: Eq[EE], AA: Eq[AA]): Boolean = this match {
      case Invalid(e) => that match {
        case Invalid(ee) => EE.eqv(e, ee)
        case _ => false
      }
      case Valid(a) => that match {
        case Valid(aa) => AA.eqv(a, aa)
        case _ => false
      }
    }
}

object Validated {
  final case class Valid[+A](a: A) extends Validated[Nothing, A]
  final case class Invalid[+E](e: E) extends Validated[E, Nothing]
}

And a benchmark like this:

import cats.kernel.instances.all._
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
class ValidatedBenchmark {
  val a: Validated[String, Int] = Validated.Valid(1)
  val b: Validated[String, Int] = Validated.Valid(2)
  val c: Validated[String, Int] = Validated.Invalid("bad")
  val d: Validated[String, Int] = Validated.Invalid("worse")

  @Benchmark def eqF: Boolean =
    a.eqF(a) && !a.eqF(b) && !a.eqF(c) && c.eqF(c) && !c.eqF(d) && !c.eqF(a)

  @Benchmark def eqP: Boolean =
    a.eqP(a) && !a.eqP(b) && !a.eqP(c) && c.eqP(c) && !c.eqP(d) && !c.eqP(a)
}

The pattern matching implementation gets over four times the throughput on 2.12:

Benchmark                Mode  Cnt          Score         Error  Units
ValidatedBenchmark.eqF  thrpt   40   27131124.981 ±  164113.218  ops/s
ValidatedBenchmark.eqP  thrpt   40  118711782.519 ± 1166236.166  ops/s

The situation is even worse on 2.11, where the difference is a little larger and the fold version also involves a lot of unnecessary allocations:

Benchmark                                        Mode  Cnt          Score         Error   Units
ValidatedBenchmark.eqF:gc.alloc.rate.norm       thrpt    5        144.000 ±       0.001    B/op
ValidatedBenchmark.eqP:gc.alloc.rate.norm       thrpt    5         ≈ 10⁻⁵                  B/op

For most use cases these differences are unlikely to matter at all, but it seems like a shame to impose this extra cost on everyone just for the sake of slightly more concise implementations.

(There are similar issues in IndexedStateT, etc., but Validated is the one piece I personally really care that much about, because of how it affects circe.)

@adelbertc
Copy link
Contributor

👍 , it's the reasoning we took in #1289 as well

@julienrf
Copy link
Contributor

julienrf commented Oct 6, 2017

What if fold is an abstract method in Validated and implemented in Invalid and Valid? (we would trade pattern matching for dynamic dispatch)

@johnynek
Copy link
Contributor

johnynek commented Oct 6, 2017 via email

@kailuowang
Copy link
Contributor

It will be interesting to benchmark @julienrf 's abstract fold approach.

@travisbrown
Copy link
Contributor Author

@julienrf @kailuowang I don't think pattern matching is the real culprit here—it's more the overhead of the functions. Making fold abstract does seem to help a bit on 2.12:

Benchmark                 Mode  Cnt          Score        Error  Units
ValidatedBenchmark.eqF   thrpt   40   26938203.527 ± 189709.343  ops/s
ValidatedBenchmark.eqF2  thrpt   40   29927603.497 ± 199789.080  ops/s
ValidatedBenchmark.eqP   thrpt   40  118540469.078 ± 754414.614  ops/s

…but there's no measurable difference on 2.11 and the allocation situation is the same on both.

@johnynek
Copy link
Contributor

johnynek commented Oct 6, 2017

functions, I've always heard, are very hard on the jit due to them being megamorphic. So, one of the best optimizations you can often do in scala is remove a function and replace with literal code that the JIT won't skip over.

I would love if this could be solved in the future with dotty linker, or changes to the JIT to better handle lambdas (maybe java will start to care about this now that they have lambdas).

@julienrf
Copy link
Contributor

julienrf commented Oct 6, 2017

How can we apply that in our context?

@travisbrown
Copy link
Contributor Author

@julienrf I think just rewriting fold in all of the implementations to use pattern matching (as in my example above) is a reasonable way to avoid unnecessary functions.

Users can decide whether for their own cases which approach they want. For example I use Json#fold all the time in application code that depends on circe, but don't use it anywhere internally in circe itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants