Don't depend on random sampling to determine function equivalence #2577

ceedubs · 2018-10-21T23:38:17Z

This is a work in progress and there is a bit more work that should
probably be done before merging this. However, I have already put a fair
amount of time into this and wanted to see what people thought about it
before pushing through to do all of the relevant work.

the problem

Cats has a lot of instances for function-like types. When we go to check
the laws for these types, we are required to supply an Eq instance.
But defining equality for functions is awkward. So far we've been
approaching this by generating a bunch of values and passing them into
both functions and ensuring that we get the same results from both. This
can produce false positives if two functions differ but we just happen
to sample values that produce the same output in both. For some
purposes, it isn't a big deal if we get some occasional false positives,
because over many runs of the tests with different RNG seeds, we should
eventually catch any discrepancies.

But here be dragons. Some tests use the results of these equality checks
on the left side of an implication, so a false positive in the equality
check becomes a false negative (and thus a build failure) in the test result.
See here for further discussion.

This is where my adventure with this PR begins. Cats builds have been
timing out quite a bit recently, so I tried to reduce the number of
random values that we sample when comparing two functions for equality.
While this did speed up the build a little bit, it started leading to a
much higher frequency of build failures due to false negatives in tests.

the proposed solution

I started to rethink how we determine function equivalence. Instead
of relying on nondeterministic behavior for equality, we can only
provide function equality for functions whose domains are small enough
to exhaustively check. If two functions produce the same output for the
entirety of their domain, they are equivalent.

I've introduced an ExhaustiveCheck[A] type class that is similar to
Gen[A] but produces a Stream[A] of the entire domain of A. I made
the name somewhat specific to tests as opposed to something like
Finite[A], because types like Int have a finite domain but would be
prohibitively expensive to exhaustively check in tests and therefore
shouldn't have an instance for this type class.

I also added some Eq instances for function-like types whose domains
have ExhaustiveCheck instances. For the sake of compatibility, I
didn't remove the old Eq instances, but I've put them in a lower
implicit priority scope, and I've changed the sites that were using them
to use the new instances (well not quite all of them yet -- that's why
this PR isn't quite complete yet).

benefits

The main benefits of this change as I see it are:

Remove some nondeterministic behavior from the build.
Allow for shrinking of the number of values checked to improve build
times without triggering build failures.
Allow for future deprecation of some problematic instances that are
exposed through cats-laws but that users should probably not depend on.

downsides

The main potential downside that I can think of is that we might be
checking 15 examples where we were checking 50 before, which could be
considered a reduction in test coverage. However, I think that all of
the places where this sort of approach is used are parametric on the
type, so I don't think that it should matter much that the domain for
this type is much smaller.

request for feedback

Let me know what you think. If people like this approach then I can
switch over the remaining bits.

This is a work in progress and there is a bit more work that should probably be done before merging this. However, I have already put a fair amount of time into this and wanted to see what people thought about it before pushing through to do all of the relevant work. Cats has a lot of instances for function-like types. When we go to check the laws for these types, we are required to supply an `Eq` instance. But defining equality for functions is awkward. So far we've been approaching this by generating a bunch of values and passing them into both functions and ensuring that we get the same results from both. This can produce false positives if two functions differ but we just happen to sample values that produce the same output in both. For some purposes, it isn't a big deal if we get some occasional false positives, because over many runs of the tests with different RNG seeds, we should eventually catch any discrepancies. But here be dragons. Some tests use the results of these equality checks on the left side of an implication, so a false positive in the equality check becomes a false _negative_ (and thus a build failure) in the test result. See [here](typelevel#1666 (comment)) for further discussion. This is where my adventure with this PR begins. Cats builds have been timing out quite a bit recently, so I tried to reduce the number of random values that we sample when comparing two functions for equality. While this did speed up the build a little bit, it started leading to a much higher frequency of build failures due to false negatives in tests. So I started to rethink how we determine function equivalence. Instead of relying on nondeterministic behavior for equality, we can only provide function equality for functions whose domains are small enough to exhaustively check. If two functions produce the same output for the entirety of their domain, they are equivalent. I've introduced an `ExhaustiveCheck[A]` type class that is similar to `Gen[A]` but produces a `Stream[A]` of the entire domain of `A`. I made the name somewhat specific to tests as opposed to something like `Finite[A]`, because types like `Int` have a finite domain but would be prohibitively expensive to exhaustively check in tests and therefore shouldn't have an instance for this type class. I also added some `Eq` instances for function-like types whose domains have `ExhaustiveCheck` instances. For the sake of compatibility, I didn't remove the old `Eq` instances, but I've put them in a lower implicit priority scope, and I've changed the sites that were using them to use the new instances (well not quite all of them yet -- that's why this PR isn't quite complete yet). The main benefits of this change as I see it are: 1. Remove some nondeterministic behavior from the build. 2. Allow for shrinking of the number of values checked to improve build times without triggering build failures. 3. Allow for future deprecation of some problematic instances that are exposed through cats-laws but that users should probably not depend on. The main potential downside that I can think of is that we might be checking 15 examples where we were checking 50 before, which could be considered a reduction in test coverage. However, I think that all of the places where this sort of approach is used are parametric on the type, so I don't think that it should matter much that the domain for this type is much smaller. Let me know what you think. If people like this approach then I can switch over the remaining bits.

johnynek · 2018-10-22T16:14:48Z

laws/src/main/scala/cats/laws/discipline/ExhaustiveCheck.scala

+trait ExhaustiveCheck[A] extends Serializable { self =>
+  def allValues: Stream[A]
+
+  def map[B](f: A => B): ExhaustiveCheck[B] = new ExhaustiveCheck[B] {


This isn’t safe. I think it should be imap to preserve the exhaustivity.

Thanks, good catch!

johnynek · 2018-10-22T16:16:42Z

laws/src/main/scala/cats/laws/discipline/MiniInt.scala

+ * Similar to `Int`, but with a much smaller domain. The exact range of [[MiniInt]] may be tuned
+ * from time to time, so consumers of this type should avoid depending on its exact range.
+ */
+final class MiniInt private (val asInt: Int) extends AnyVal with Serializable


Why not just use Short or Byte or something?

Do you mean on the inside of the MiniInt or using Byte or Short directly without a wrapper?

I chose to use a custom type so that we could tune the range as desired. A lot of times these checks come up when a law is checking the equality inside of a forAll, so we'll end up with 50 items in an outer loop (based on our Scalacheck config). If we were to use byte then we'd end up with 50 * 256 comparisons (or perhaps 50 * 256 * 256 if we are comparing two Function2 instances for equality). So I thought that something with a smaller domain might be desirable.

If you were just referring to why not use Byte instead of Int inside the MiniInt class, then I have no good reason and can change if you'd like.

In case it's of interest at one point I went down a rabbit hole of reimplementing parts of smallcheck within Cats laws so that we could have tests declare the "depth" that they wanted to use when comparing for equality. But it seemed like overkill and testing over a subset of a domain of Int didn't feel as safe to me as testing against the entire domain of a smaller type.

kailuowang · 2018-10-22T16:51:15Z

laws/src/main/scala/cats/laws/discipline/ExhaustiveCheck.scala

+ * domain of values as opposed to generating a random sampling of values.
+ */
+trait ExhaustiveCheck[A] extends Serializable { self =>
+  def allValues: Stream[A]


Hmm, my thinking is that Stream as a type doesn't enforce exhaustiveness, you probably have a practical case for favoring Stream over List/Chain somewhere right?

The Eq instances that use ExhaustiveCheck use forall, so my thought was that you wouldn't necessarily need to generate all instances for tests that fail fast. But since this is intended for small domains, there's really no reason to think that Stream would be better than List. I'm happy to change it if you'd like.

kailuowang · 2018-10-22T16:56:38Z

laws/src/main/scala/cats/laws/discipline/ExhaustiveCheck.scala

+        a <- A.allValues
+        b <- B.allValues
+        c <- C.allValues
+      } yield (a, b, c)


for MiniInt, it will be 349 values, just curious do we actually need this instance or you added it preemptively?

It is used by some tests, but I tried to make those places use (MiniInt, Boolean, Boolean) to reduce the domain size a bit. Let me know if you found a place where I failed to do that.

kailuowang · 2018-10-22T17:05:23Z

laws/src/main/scala/cats/laws/discipline/MiniInt.scala

+ * Similar to `Int`, but with a much smaller domain. The exact range of [[MiniInt]] may be tuned
+ * from time to time, so consumers of this type should avoid depending on its exact range.
+ */
+final class MiniInt private (val asInt: Int) extends AnyVal with Serializable


Since this MiniInt will be used as a type parameter a lot, maybe we should give https://github.com/typelevel/cats/blob/master/core/src/main/scala/cats/data/Newtype.scala a try to see if that makes an impact in test time? Of course that could be in a separate PR.

I would imagine that this overhead would be pretty small compared to other things that tests are doing, but it's certainly something that we could explore.

johnynek · 2018-10-22T17:44:53Z

Paging @non doesn’t this smell like Kronecker?

I like Stream since it is like and immutable iterator. You can avoid holding everything in memory by not holding the head. Not the case for List or Chain.

ceedubs · 2018-10-22T17:51:51Z

Ah that's right, I was pretty sure that Erik had done something like this, but I didn't look in any of the right repositories!

ceedubs · 2018-10-23T02:55:55Z

I took a look, and this does seem to be doing a (fairly small) subset of what Kronecker does. However, Kronecker has a few dependencies including Spire which would introduce a cyclical dependency were we to have cats-laws depend on Kronecker. I'm inclined to think that ExhaustiveCheck and the few instances that I've created for testing purposes is simple enough that it's okay for it to live inside cats-laws and duplicate some Kronecker functionality. But I'm also open to options (or to reasons that what I'm doing in this PR otherwise isn't a good idea).

kailuowang

I don't feel strongly about list vs stream. I will leave that to you.

kailuowang · 2018-10-31T14:44:40Z

build fails a serialization test on 2.13

[info] - Eq[Op[Function1, Int, MiniInt]].serializable.can serialize and deserialize *** FAILED ***
[info] NotSerializableException was thrown during property evaluation.
[info] Message: scala.collection.immutable.Stream$Cons

I am not sure but it could be related to scala/scala-dev#562
also related
#2360

ceedubs · 2018-11-06T15:00:06Z

I don't expect to have time to work on this for the next couple of weeks. Once I have some availability should I finish out this effort? So far it sounds like Kai might be a yes vote, but I'd like to have more buy-in before continuing.

ceedubs · 2019-01-15T15:55:55Z

Okay it looks like we have at least a couple of maintainers onboard for this sort of change. I'll work through the remaining pieces as I find the time and update this PR.

ceedubs · 2019-01-18T01:39:37Z

FWIW I ran into #2701 while working on this, so I put this effort on hold until that was resolved. But if we can't solve that one in a binary-compatible way then I can probably work around it in the tests here.

CsvCodec is looking to be problematic, since it checks equivalence of CsvCodec types which can't be converted into types with easily exhaustible input domains. I have it commented-out locally for now but will put some more thought into what our options are.

codecov-io · 2019-02-17T05:22:30Z

Codecov Report

Merging #2577 into master will decrease coverage by 0.66%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master    #2577      +/-   ##
==========================================
- Coverage   95.14%   94.47%   -0.67%     
==========================================
  Files         365      367       +2     
  Lines        6816     6877      +61     
  Branches      302      303       +1     
==========================================
+ Hits         6485     6497      +12     
- Misses        331      380      +49

Impacted Files	Coverage Δ
...n/scala/cats/laws/discipline/ExhaustiveCheck.scala	`100% <100%> (ø)`
...rc/main/scala/cats/laws/discipline/Arbitrary.scala	`99.15% <100%> (+0.01%)`	⬆️
laws/src/main/scala/cats/laws/discipline/Eq.scala	`33.33% <88%> (-61.31%)`	⬇️
.../src/main/scala/cats/laws/discipline/MiniInt.scala	`97.29% <97.29%> (ø)`
core/src/main/scala/cats/data/Kleisli.scala	`90.99% <0%> (-6.31%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2718d5e...571a120. Read the comment docs.

ceedubs · 2019-02-17T05:58:54Z

Hmm strange. The build is failing with a compile error that I don't get locally. I'm not sure what's going on here.

Edit: never mind it looks like it must have been building off of some changes that have been included in master? I'll merge master into this branch again and resolve.

ceedubs · 2019-02-17T17:00:11Z

Okay I think that this PR is ready for review again. Some of it ended up getting a bit involved; for example the CSV Codec tests were checking equality of CSV codecs, which isn't feasible to do in an exhaustive way. I changed them to use a much more constrained list with a max of 6 bits.

The code coverage drop is because I've left all of the old equality instances around (but deprecated), but the tests aren't hitting them. I guess that I could add some tests specifically for them if people would like me to do that; but I'm not even sure how to call them without fatal deprecation warnings.

ceedubs · 2019-02-21T03:00:45Z

@LukaJCB would you be up for taking a look at this now that it's updated?

In order to test Invariant[Fractional], we need some way to define an Eq[Invariant[Fractional]] for some fractional type. We do this using Float as the fractional type, but this requires the use of the deprecated cats.laws.discipline.DeprecatedEqInstances#catsLawsEqForFn1. If we instead use MiniFloat, we can take advantage of ExhaustiveCheck[MiniFloat] to use cats.laws.discipline.eq#catsLawsEqForFn1Exhaustive. See typelevel#2577 for a broader explanation of this.

ceedubs added the speculative label Oct 21, 2018

ceedubs requested review from kailuowang and johnynek and removed request for johnynek October 22, 2018 15:25

johnynek reviewed Oct 22, 2018

View reviewed changes

kailuowang reviewed Oct 22, 2018

View reviewed changes

Remove ExhaustiveCheck.map method

637fbe7

kailuowang previously approved these changes Oct 26, 2018

View reviewed changes

LukaJCB previously approved these changes Jan 13, 2019

View reviewed changes

Merge branch 'master' into exhaustive-checks

e5dd6a0

ceedubs added 2 commits February 3, 2019 18:12

Merge branch 'master' into exhaustive-checks

0151836

Fix InvariantMonoidal tests

c468414

ceedubs dismissed stale reviews from LukaJCB and kailuowang via c468414 February 17, 2019 05:22

ceedubs added 2 commits February 16, 2019 21:59

Merge branch 'master' into exhaustive-checks

96d78c2

Fix tests with failing implicit resolution

571a120

LukaJCB approved these changes Feb 21, 2019

View reviewed changes

kailuowang approved these changes Apr 2, 2019

View reviewed changes

kailuowang merged commit 276a948 into typelevel:master Apr 2, 2019

kailuowang added testing and removed speculative labels Apr 11, 2019

kailuowang added this to the 2.0 milestone Apr 11, 2019

aeons mentioned this pull request Jun 14, 2019

Scala 2.13.0 support http4s/http4s#2493

Merged

11 tasks

tmccarthy mentioned this pull request Nov 1, 2021

Add small-domain floating-point type MiniFloat to discipline package #4033

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't depend on random sampling to determine function equivalence #2577

Don't depend on random sampling to determine function equivalence #2577

ceedubs commented Oct 21, 2018

johnynek Oct 22, 2018

ceedubs Oct 22, 2018

johnynek Oct 22, 2018

ceedubs Oct 22, 2018

kailuowang Oct 22, 2018

ceedubs Oct 22, 2018

kailuowang Oct 22, 2018

ceedubs Oct 22, 2018

kailuowang Oct 22, 2018 •

edited

Loading

ceedubs Oct 22, 2018

johnynek commented Oct 22, 2018

ceedubs commented Oct 22, 2018

ceedubs commented Oct 23, 2018

kailuowang left a comment

kailuowang commented Oct 31, 2018 •

edited

Loading

ceedubs commented Nov 6, 2018

ceedubs commented Jan 15, 2019

ceedubs commented Jan 18, 2019

codecov-io commented Feb 17, 2019 •

edited

Loading

ceedubs commented Feb 17, 2019 •

edited

Loading

ceedubs commented Feb 17, 2019

ceedubs commented Feb 21, 2019

Don't depend on random sampling to determine function equivalence #2577

Don't depend on random sampling to determine function equivalence #2577

Conversation

ceedubs commented Oct 21, 2018

the problem

the proposed solution

benefits

downsides

request for feedback

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kailuowang Oct 22, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnynek commented Oct 22, 2018

ceedubs commented Oct 22, 2018

ceedubs commented Oct 23, 2018

kailuowang left a comment

Choose a reason for hiding this comment

kailuowang commented Oct 31, 2018 • edited Loading

ceedubs commented Nov 6, 2018

ceedubs commented Jan 15, 2019

ceedubs commented Jan 18, 2019

codecov-io commented Feb 17, 2019 • edited Loading

Codecov Report

ceedubs commented Feb 17, 2019 • edited Loading

ceedubs commented Feb 17, 2019

ceedubs commented Feb 21, 2019

kailuowang Oct 22, 2018 •

edited

Loading

kailuowang commented Oct 31, 2018 •

edited

Loading

codecov-io commented Feb 17, 2019 •

edited

Loading

ceedubs commented Feb 17, 2019 •

edited

Loading