Skip to content
This repository has been archived by the owner on Oct 9, 2019. It is now read-only.

Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

Closed
avh4 opened this issue Jul 6, 2017 · 13 comments · Fixed by #199
Closed

Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

avh4 opened this issue Jul 6, 2017 · 13 comments · Fixed by #199
Labels

Comments

@avh4
Copy link
Member

avh4 commented Jul 6, 2017

In the following scenario, we manually run a shrink, simulating a test failure when the list is non-empty. The shrink behaves very inefficiently in that it produces empty list every other shrink (sometimes more often!) despite the fact that a pass for empty list has already been recorded. With this seed, 12 of the 21 produces values are the empty list (all seeds show a similar percentage).

Ideally the shrinker should not repeat values that have already been tested. But if that's not practical, at least Fuzz.list should be improved to avoid repeating empty list so much.

https://ellie-app.com/3G7LHDSJ2M4a1/1

module Main exposing (..)

import Html exposing (Html, text)

import Fuzz
import Test.Runner
import Random.Pcg

main : Html a
main =
    let
       first =
          Test.Runner.fuzz (Fuzz.list Fuzz.int)
          |> flip Random.Pcg.step (Random.Pcg.initialSeed 1000)
          |> Tuple.first
          
       step (v, shrink) acc b =
          case Test.Runner.shrink (v == []) shrink of
             Nothing -> (v :: acc)
             Just next ->
               step next (v :: acc) (not b)

    in
    
     Html.text (toString <| step first [] False)
@jwoudenberg
Copy link
Collaborator

Counterpoint: I've seen scenario's where the fuzzer would benefit from generating the 'nil' case more often. The example involves a test which fuzzes two strings like so: map2 (,) string string. The test fails on the value ("", "") which is only encountered on a small amount of runs because the odds of two fuzzers fuzzing the empty list simultaneously is relatively small. This case deals with strings instead of lists but I think the same principle applies: that when fuzzing larger composite structures generating the minimal value simultaneously in different parts of the structure can create valuable test cases.

I don't bring this up to attack your argument AV which I think is completely valid, just wonder if we can find a solution that does the right thing more often for both cases. I believe quickcheck ramps up the complexity of the generated values as the test runs? That might be an approach that would work here too: Start generating small values (empty list / string is likely) and then send the likelihood of those values being generated down as the test ramps up.

@rtfeldman
Copy link
Member

rtfeldman commented Jul 6, 2017 via email

@avh4
Copy link
Member Author

avh4 commented Jul 6, 2017

I'm not saying the shrinker shouldn't produce empty list first; I'm saying that after it tries empty list once it shouldn't keep trying it again later. (Currently, sometimes it will even produce empty list multiple times in a row!)

I would expect the map2 (,) string string to produce something like the following (assuming a simplified string:

  • ("sdjfklwejk...sdfkwe", "weuropew...erutoerpt")
  • FAIL -> ("", "weuropew...erutoerpt")
  • FAIL -> ("", "")
  • PASS -> ("sdjfklwejk...sdfkwe", "")
  • FAIL -> ("", "a") (should not produce ("", "") again)
  • FAIL -> ("a", "") (should not produce ("", "") again)
  • FAIL -> Done (should not produce ("", "") again)

^^ (I don't really care about the order of shrinking the first/second here; just about the fact that ("", "") doesn't keep getting produced.)

The shrinker currently does:

  • ("f","9j_^^")
  • FAIL -> ("","9j_^^")
  • FAIL -> ("","")
  • PASS -> ("","_^^")
  • FAIL -> ("","") (this was already tested!)
  • PASS -> ("","^^")
  • FAIL -> ("","") (this was already tested!)
  • PASS -> ("","^")
  • FAIL -> ("","") (this was already tested!)
  • PASS -> (""," ")
  • FAIL -> ("","") (this was already tested!)
  • PASS -> Done

@avh4 avh4 changed the title Fuzz.list repeats values a lot when shrinking Shrinker repeats values a lot (maybe specific to Fuzz.list, Fuzz.map2, etc?) Jul 6, 2017
@mgold
Copy link
Member

mgold commented Jul 7, 2017

It sounds like this would benefit from the same solution discussed in #168: "complexity credits" that allow shrinkers to control what value they try. So shrinking a pair of strings should try ("", "") first, and then never try it again. The idea being that Fuzz.map2, given zero credits, can pass only zero to the string fuzzers, which will be obliged to return "".

I wonder if random generation and shrinking should be combined? Why generate a long string if the empty string will cause a failure? That is, type alias Fuzzer a : Random.Pcg.Seed -> Int -> a? Or -> List a? Or Nonempty a? This would solve shrinking and Maybe this could even lend to a sensible definition of Fuzz.andThen?

Will need to brainstorm...

@mgold mgold added the fuzzers label Jul 7, 2017
@mgold
Copy link
Member

mgold commented Jul 7, 2017

Here's a proof-of-concept for how this would work: https://ellie-app.com/3GsVhJyy8xja1/1

Notice that neither of the numbers gets very large. That's because the most credits int gets called with is 50, since they are split in half. A proper implementation would have a helper function to randomly divvy up the credits in an off-balance way.

A proper solution will also make sure we get all of the simple cases like (0,1).

By cutting out laziness entirely, we simplify our dependency stack and quite possibly improve performance, since we generate at most 100 items (less if a test fails quickly) and don't need complex lazy data structures.

My biggest concern is that this will harm shrinking performance for complex items. It would be very helpful to have a catalog of real-world failing tests and what we expect them to shrink to.

@rtfeldman @zkessin what do you think of this idea?

@mgold mgold changed the title Shrinker repeats values a lot (maybe specific to Fuzz.list, Fuzz.map2, etc?) Shrinker repeats values a lot / Max's new idea for shrinker implementation Jul 7, 2017
@avh4
Copy link
Member Author

avh4 commented Jul 7, 2017

Is this suggesting that the fuzzer should always generate simple values first? That seems to lose the benefit of fuzz testing, which is that fuzzing will generate examples that are likely to find new, unknown bugs and then reduce them to minimal examples; not that it will test hundreds of simple examples that probably are all going to exercise the same code paths.

I don't think we should optimize for cases that can be easily tested without fuzz tests; we should optimize for cases where fuzz tests are necessary to find bugs in complex systems.

@avh4
Copy link
Member Author

avh4 commented Jul 7, 2017

Also, regarding better shrinking, see https://www.cs.indiana.edu/~lepike/pubs/smartcheck.pdf

To improve counterexample reduction and generalization, we introduce SmartCheck. SmartCheck is a debugging tool that reduces algebraic data using generic search heuristics to efficiently find smaller counterexamples. In addition to shrinking, SmartCheck also automatically generalizes counterexamples to formulas representing classes of counterexamples. SmartCheck has been implemented for Haskell and is freely available.

@mgold
Copy link
Member

mgold commented Jul 7, 2017

Is this suggesting that the fuzzer should always generate simple values first?

Yes.

fuzzing will generate examples that are likely to find new, unknown bugs and then reduce them to minimal examples

The point of fuzzing is to write unit tests for you. If we wind up shrinking to small values anyway, wouldn't it have been much better to have generated them first?

The initial complaint is that fuzzers repeat many values to improve the statistical likelihood of generating those values together for mapN. This idea tries to guarantee that those values will be generated exactly once. It seems like the number of simple examples hasn't changed, except now they are slightly different simple examples, and they get tried first.

What makes fuzzers work is the tension between a random sampling of many complex values, and the deterministic shrinking that simplifies them. Roughly speaking, it seems like generating the same value many times is a result of trying to make the random generators more predictable.

@avh4
Copy link
Member Author

avh4 commented Jul 7, 2017

(For the record, I disagree that the main use of fuzz tests is to avoid writing unit tests; I think that ignores their more important use which is finding bugs that are impossible or hard to find with unit tests. But not expecting to resolve that here; just adding that as a note)

To the point of this issue, I guess I wasn't clear in the original post, which was meant to report the issue that the shrinker generates the same value many times (not that the fuzzer is generating poor values to start with). I think the originally-generated values are good; it's just when there's a failure that needs to be shrunk that there is a lot of duplicated computation, namely that with list and mapN, more than 50% of the generated shrink cases are the exact same input value.

For my use, I am running tests where each evaluation takes 5-20 seconds, so reducing the number of shrink examples by >50% with no loss of coverage would be a huge win.

@mgold
Copy link
Member

mgold commented Jul 7, 2017

finding bugs that are impossible or hard to find with unit tests

Hmm. It's been my experience that most bugs get shrunken pretty small, even if they start huge. But yes, sometimes bugs only happen for large examples.

the shrinker generates the same value many times (not that the fuzzer is generating poor values to start with)

Ah! Those live at elm-community/shrink and I am open to new ideas for improving those, including prior art in Haskell. I think it likes to shrink lists to the empty list because that's a big win if the empty list fails and we tried it immediately. (And again, my questionable belief that most counterexamples are small.)

@avh4
Copy link
Member Author

avh4 commented Jul 7, 2017

Ah, got it -- should I move this issue to https://github.com/elm-community/shrink ?

@jwoudenberg
Copy link
Collaborator

jwoudenberg commented Jul 7, 2017

I'm really sorry @avh4 for dragging your post of topic, I failed miserably at reading it.

It seems that shrinking responsibility is split a bit across the Shrink library and the Fuzz library at the moment. Most primitive shrinkers come from the shrink package but the list shrinking is actually implemented in elm-test directly at the moment (the shrink package exports one but it isn't used in elm-test). So I think the issue is at home here.

Looking at the code, the current algorithm seems to be that it tries to shrink the list by removing an element and if that doesn't work shrink the list by shrinking an element. Then if it succeeds shrinking any element it starts over by trying to remove an element again. It seems hard to encode into this logic the rule that it's useless to shrink a value before removing it (which is happening over and over).

Perhaps a better way to look at the shrinking of the list is to say we only do the shrinking of elements part, with the difference that the smallest value for any element in the list is it not-existing (which would leave that element out of the list). In that context it's much easier to express the idea that the shrinker can try the smallest value (removing it from the list) once but should never return to it if that fails.

@rtfeldman
Copy link
Member

rtfeldman commented Jul 7, 2017

Random aside: I don't think it's important that shrinking live in a separate package. If the best shrinking strategy for elm-test turns out to be coupled with the elm-test API, that's fine. 😄

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants