Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

avh4 · 2017-07-06T07:17:48Z

In the following scenario, we manually run a shrink, simulating a test failure when the list is non-empty. The shrink behaves very inefficiently in that it produces empty list every other shrink (sometimes more often!) despite the fact that a pass for empty list has already been recorded. With this seed, 12 of the 21 produces values are the empty list (all seeds show a similar percentage).

Ideally the shrinker should not repeat values that have already been tested. But if that's not practical, at least Fuzz.list should be improved to avoid repeating empty list so much.

https://ellie-app.com/3G7LHDSJ2M4a1/1

module Main exposing (..)

import Html exposing (Html, text)

import Fuzz
import Test.Runner
import Random.Pcg

main : Html a
main =
    let
       first =
          Test.Runner.fuzz (Fuzz.list Fuzz.int)
          |> flip Random.Pcg.step (Random.Pcg.initialSeed 1000)
          |> Tuple.first
          
       step (v, shrink) acc b =
          case Test.Runner.shrink (v == []) shrink of
             Nothing -> (v :: acc)
             Just next ->
               step next (v :: acc) (not b)

    in
    
     Html.text (toString <| step first [] False)

The text was updated successfully, but these errors were encountered:

jwoudenberg · 2017-07-06T10:13:25Z

Counterpoint: I've seen scenario's where the fuzzer would benefit from generating the 'nil' case more often. The example involves a test which fuzzes two strings like so: map2 (,) string string. The test fails on the value ("", "") which is only encountered on a small amount of runs because the odds of two fuzzers fuzzing the empty list simultaneously is relatively small. This case deals with strings instead of lists but I think the same principle applies: that when fuzzing larger composite structures generating the minimal value simultaneously in different parts of the structure can create valuable test cases.

I don't bring this up to attack your argument AV which I think is completely valid, just wonder if we can find a solution that does the right thing more often for both cases. I believe quickcheck ramps up the complexity of the generated values as the test runs? That might be an approach that would work here too: Start generating small values (empty list / string is likely) and then send the likelihood of those values being generated down as the test ramps up.

rtfeldman · 2017-07-06T15:27:51Z

Another benefit of the "start small and increase complexity" approach is that shrinking gets faster: you try the minimal case first, so if it fails, you have no shrinking to do.

…

On Thu, Jul 6, 2017, 6:13 AM Jasper Woudenberg ***@***.***> wrote: Counterpoint: I've seen scenario's where the fuzzer would benefit from generating the 'nil' case more often. The example involves a test which fuzzes two strings like so: map2 (,) string string. The test fails on the value ("", "") which is only encountered on a small amount of runs because the odds of two fuzzers fuzzing the empty list simultaneously is relatively small. This case deals with strings instead of lists but I think the same principle applies: that when fuzzing larger composite structures generating the minimal value simultaneously in different parts of the structure can create valuable test cases. I don't bring this up to attack your argument AV which I think is completely valid, just wonder if we can find a solution that does the right thing more often for both cases. I believe quickcheck ramps up the complexity of the generated values as the test runs? That might be an approach that would work here too: Start generating small values (empty list / string is likely) and then send the likelihood of those values being generated down as the test ramps up. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#187 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABCxwJYQB6ThZdz8HiJN69cTbxK5bXbWks5sLLNFgaJpZM4OPP4H> .

avh4 · 2017-07-06T16:26:19Z

I'm not saying the shrinker shouldn't produce empty list first; I'm saying that after it tries empty list once it shouldn't keep trying it again later. (Currently, sometimes it will even produce empty list multiple times in a row!)

I would expect the map2 (,) string string to produce something like the following (assuming a simplified string:

("sdjfklwejk...sdfkwe", "weuropew...erutoerpt")
FAIL -> ("", "weuropew...erutoerpt")
FAIL -> ("", "")
PASS -> ("sdjfklwejk...sdfkwe", "")
FAIL -> ("", "a") (should not produce ("", "") again)
FAIL -> ("a", "") (should not produce ("", "") again)
FAIL -> Done (should not produce ("", "") again)

^^ (I don't really care about the order of shrinking the first/second here; just about the fact that ("", "") doesn't keep getting produced.)

The shrinker currently does:

("f","9j_^^")
FAIL -> ("","9j_^^")
FAIL -> ("","")
PASS -> ("","_^^")
FAIL -> ("","") (this was already tested!)
PASS -> ("","^^")
FAIL -> ("","") (this was already tested!)
PASS -> ("","^")
FAIL -> ("","") (this was already tested!)
PASS -> (""," ")
FAIL -> ("","") (this was already tested!)
PASS -> Done

mgold · 2017-07-07T00:29:08Z

It sounds like this would benefit from the same solution discussed in #168: "complexity credits" that allow shrinkers to control what value they try. So shrinking a pair of strings should try ("", "") first, and then never try it again. The idea being that Fuzz.map2, given zero credits, can pass only zero to the string fuzzers, which will be obliged to return "".

I wonder if random generation and shrinking should be combined? Why generate a long string if the empty string will cause a failure? That is, type alias Fuzzer a : Random.Pcg.Seed -> Int -> a? Or -> List a? Or Nonempty a? This would solve shrinking and Maybe this could even lend to a sensible definition of Fuzz.andThen?

Will need to brainstorm...

mgold · 2017-07-07T06:16:13Z

Here's a proof-of-concept for how this would work: https://ellie-app.com/3GsVhJyy8xja1/1

Notice that neither of the numbers gets very large. That's because the most credits int gets called with is 50, since they are split in half. A proper implementation would have a helper function to randomly divvy up the credits in an off-balance way.

A proper solution will also make sure we get all of the simple cases like (0,1).

By cutting out laziness entirely, we simplify our dependency stack and quite possibly improve performance, since we generate at most 100 items (less if a test fails quickly) and don't need complex lazy data structures.

My biggest concern is that this will harm shrinking performance for complex items. It would be very helpful to have a catalog of real-world failing tests and what we expect them to shrink to.

@rtfeldman @zkessin what do you think of this idea?

avh4 · 2017-07-07T06:47:28Z

Is this suggesting that the fuzzer should always generate simple values first? That seems to lose the benefit of fuzz testing, which is that fuzzing will generate examples that are likely to find new, unknown bugs and then reduce them to minimal examples; not that it will test hundreds of simple examples that probably are all going to exercise the same code paths.

I don't think we should optimize for cases that can be easily tested without fuzz tests; we should optimize for cases where fuzz tests are necessary to find bugs in complex systems.

avh4 · 2017-07-07T06:50:39Z

Also, regarding better shrinking, see https://www.cs.indiana.edu/~lepike/pubs/smartcheck.pdf

To improve counterexample reduction and generalization, we introduce SmartCheck. SmartCheck is a debugging tool that reduces algebraic data using generic search heuristics to efficiently find smaller counterexamples. In addition to shrinking, SmartCheck also automatically generalizes counterexamples to formulas representing classes of counterexamples. SmartCheck has been implemented for Haskell and is freely available.

mgold · 2017-07-07T15:28:45Z

Is this suggesting that the fuzzer should always generate simple values first?

Yes.

fuzzing will generate examples that are likely to find new, unknown bugs and then reduce them to minimal examples

The point of fuzzing is to write unit tests for you. If we wind up shrinking to small values anyway, wouldn't it have been much better to have generated them first?

The initial complaint is that fuzzers repeat many values to improve the statistical likelihood of generating those values together for mapN. This idea tries to guarantee that those values will be generated exactly once. It seems like the number of simple examples hasn't changed, except now they are slightly different simple examples, and they get tried first.

What makes fuzzers work is the tension between a random sampling of many complex values, and the deterministic shrinking that simplifies them. Roughly speaking, it seems like generating the same value many times is a result of trying to make the random generators more predictable.

avh4 · 2017-07-07T15:41:16Z

(For the record, I disagree that the main use of fuzz tests is to avoid writing unit tests; I think that ignores their more important use which is finding bugs that are impossible or hard to find with unit tests. But not expecting to resolve that here; just adding that as a note)

To the point of this issue, I guess I wasn't clear in the original post, which was meant to report the issue that the shrinker generates the same value many times (not that the fuzzer is generating poor values to start with). I think the originally-generated values are good; it's just when there's a failure that needs to be shrunk that there is a lot of duplicated computation, namely that with list and mapN, more than 50% of the generated shrink cases are the exact same input value.

For my use, I am running tests where each evaluation takes 5-20 seconds, so reducing the number of shrink examples by >50% with no loss of coverage would be a huge win.

mgold · 2017-07-07T15:55:52Z

finding bugs that are impossible or hard to find with unit tests

Hmm. It's been my experience that most bugs get shrunken pretty small, even if they start huge. But yes, sometimes bugs only happen for large examples.

the shrinker generates the same value many times (not that the fuzzer is generating poor values to start with)

Ah! Those live at elm-community/shrink and I am open to new ideas for improving those, including prior art in Haskell. I think it likes to shrink lists to the empty list because that's a big win if the empty list fails and we tried it immediately. (And again, my questionable belief that most counterexamples are small.)

avh4 · 2017-07-07T16:30:15Z

Ah, got it -- should I move this issue to https://github.com/elm-community/shrink ?

jwoudenberg · 2017-07-07T17:23:45Z

I'm really sorry @avh4 for dragging your post of topic, I failed miserably at reading it.

It seems that shrinking responsibility is split a bit across the Shrink library and the Fuzz library at the moment. Most primitive shrinkers come from the shrink package but the list shrinking is actually implemented in elm-test directly at the moment (the shrink package exports one but it isn't used in elm-test). So I think the issue is at home here.

Looking at the code, the current algorithm seems to be that it tries to shrink the list by removing an element and if that doesn't work shrink the list by shrinking an element. Then if it succeeds shrinking any element it starts over by trying to remove an element again. It seems hard to encode into this logic the rule that it's useless to shrink a value before removing it (which is happening over and over).

Perhaps a better way to look at the shrinking of the list is to say we only do the shrinking of elements part, with the difference that the smallest value for any element in the list is it not-existing (which would leave that element out of the list). In that context it's much easier to express the idea that the shrinker can try the smallest value (removing it from the list) once but should never return to it if that fails.

rtfeldman · 2017-07-07T17:48:53Z

Random aside: I don't think it's important that shrinking live in a separate package. If the best shrinking strategy for elm-test turns out to be coupled with the elm-test API, that's fine. 😄

avh4 mentioned this issue Jul 6, 2017

Add Fuzz.set, Fuzz.dict #188

Open

avh4 changed the title ~~Fuzz.list repeats values a lot when shrinking~~ Shrinker repeats values a lot (maybe specific to Fuzz.list, Fuzz.map2, etc?) Jul 6, 2017

mgold added the fuzzers label Jul 7, 2017

mgold changed the title ~~Shrinker repeats values a lot (maybe specific to Fuzz.list, Fuzz.map2, etc?)~~ Shrinker repeats values a lot / Max's new idea for shrinker implementation Jul 7, 2017

jwoudenberg mentioned this issue Jul 27, 2017

Improve list shrinking #199

Merged

jwoudenberg closed this as completed in #199 Aug 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

avh4 commented Jul 6, 2017

jwoudenberg commented Jul 6, 2017

rtfeldman commented Jul 6, 2017 via email

avh4 commented Jul 6, 2017 •

edited

Loading

mgold commented Jul 7, 2017

mgold commented Jul 7, 2017 •

edited

Loading

avh4 commented Jul 7, 2017

avh4 commented Jul 7, 2017

mgold commented Jul 7, 2017

avh4 commented Jul 7, 2017 •

edited

Loading

mgold commented Jul 7, 2017

avh4 commented Jul 7, 2017

jwoudenberg commented Jul 7, 2017 •

edited

Loading

rtfeldman commented Jul 7, 2017 •

edited

Loading

Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

Shrinker repeats values a lot / Max's new idea for shrinker implementation #187

Comments

avh4 commented Jul 6, 2017

jwoudenberg commented Jul 6, 2017

rtfeldman commented Jul 6, 2017 via email

avh4 commented Jul 6, 2017 • edited Loading

mgold commented Jul 7, 2017

mgold commented Jul 7, 2017 • edited Loading

avh4 commented Jul 7, 2017

avh4 commented Jul 7, 2017

mgold commented Jul 7, 2017

avh4 commented Jul 7, 2017 • edited Loading

mgold commented Jul 7, 2017

avh4 commented Jul 7, 2017

jwoudenberg commented Jul 7, 2017 • edited Loading

rtfeldman commented Jul 7, 2017 • edited Loading

avh4 commented Jul 6, 2017 •

edited

Loading

mgold commented Jul 7, 2017 •

edited

Loading

avh4 commented Jul 7, 2017 •

edited

Loading

jwoudenberg commented Jul 7, 2017 •

edited

Loading

rtfeldman commented Jul 7, 2017 •

edited

Loading