Shrinker repeats values a lot / Max's new idea for shrinker implementation #187
Comments
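Counterpoint: I've seen scenarios where the fuzzer would benefit from generating the 'nil' case more often. The example involves a test which fuzzes two strings like so: map2 (,) string string. The test fails on the value ("", ""), which is only encountered on a small number of runs because the odds of two fuzzers producing the empty string simultaneously are relatively small. This case deals with strings instead of lists, but I think the same principle applies: when fuzzing larger composite structures, generating the minimal value simultaneously in different parts of the structure can create valuable test cases. I don't bring this up to attack your argument AV, which I think is completely valid; I just wonder if we can find a solution that does the right thing more often for both cases. I believe quickcheck ramps up the complexity of the generated values as the test runs? That might be an approach that would work here too: start by generating small values (empty list / string is likely) and then lower the likelihood of those values being generated as the test ramps up. |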
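A rough sketch of the ramp-up idea, written in Python rather than Elm so it can be run standalone. The names `ramped_string` and `ramped_pairs` and the linear length cap are assumptions made for illustration; this is not how elm-test or quickcheck actually size their values.

```python
import random
import string


def ramped_string(rng, run, total_runs, max_len=20):
    """Hypothetical generator: the length cap grows linearly with the run index."""
    cap = (max_len * run) // max(total_runs - 1, 1)
    length = rng.randint(0, cap)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def ramped_pairs(seed, total_runs=100):
    """Generate one pair of strings per run, both drawn at that run's size."""
    rng = random.Random(seed)
    return [
        (ramped_string(rng, run, total_runs), ramped_string(rng, run, total_runs))
        for run in range(total_runs)
    ]
```

Because both components share the same cap, the earliest runs can only produce ("", ""), so the simultaneous-minimum case is covered before longer strings show up.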
Another benefit of the "start small and increase complexity" approach is
that shrinking gets faster: you try the minimal case first, so if it fails,
you have no shrinking to do.
On Thu, Jul 6, 2017, 6:13 AM Jasper Woudenberg wrote:
Counterpoint: I've seen scenarios where the fuzzer would benefit from
generating the 'nil' case more often. The example involves a test which
fuzzes two strings like so: map2 (,) string string. The test fails on the
value ("", ""), which is only encountered on a small number of runs
because the odds of two fuzzers producing the empty string simultaneously
are relatively small. This case deals with strings instead of lists, but I
think the same principle applies: when fuzzing larger composite structures,
generating the minimal value simultaneously in different parts of the
structure can create valuable test cases.
I don't bring this up to attack your argument AV, which I think is
completely valid; I just wonder if we can find a solution that does the
right thing more often for both cases. I believe quickcheck ramps up the
complexity of the generated values as the test runs? That might be an
approach that would work here too: start by generating small values (empty
list / string is likely) and then lower the likelihood of those values
being generated as the test ramps up.
|
I'm not saying the shrinker shouldn't produce the empty list first; I'm saying that after it tries the empty list once, it shouldn't keep trying it again later. (Currently, it will sometimes even produce the empty list multiple times in a row!) I would expect the
^^ (I don't really care about the order of shrinking the first/second here; just about the fact that The shrinker currently does:
|
It sounds like this would benefit from the same solution discussed in #168: "complexity credits" that allow shrinkers to control what value they try. So shrinking a pair of strings should try I wonder if random generation and shrinking should be combined? Why generate a long string if the empty string will cause a failure? That is, Will need to brainstorm... |
Here's a proof-of-concept for how this would work: https://ellie-app.com/3GsVhJyy8xja1/1
Notice that neither of the numbers gets very large. That's because the most credits
A proper solution will also make sure we get all of the simple cases like
By cutting out laziness entirely, we simplify our dependency stack and quite possibly improve performance, since we generate at most 100 items (fewer if a test fails quickly) and don't need complex lazy data structures.
My biggest concern is that this will harm shrinking performance for complex items. It would be very helpful to have a catalog of real-world failing tests and what we expect them to shrink to.
@rtfeldman @zkessin what do you think of this idea? |
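One possible reading of "complexity credits", sketched in Python. Everything here is an assumption made for illustration: the helper names, the rule that run number N gets N credits, and the way a pair splits its budget. It is not the code behind the Ellie link above.

```python
import random


def int_with_credits(rng, credits):
    """Spend up to `credits` on magnitude; zero credits can only yield 0."""
    return rng.randint(-credits, credits)


def pair_with_credits(rng, credits):
    """Split the budget between the two components of the pair."""
    left = rng.randint(0, credits)
    return (int_with_credits(rng, left), int_with_credits(rng, credits - left))


def generate_runs(seed, runs=100):
    """Eagerly generate one pair per run; run number N gets N credits (assumed)."""
    rng = random.Random(seed)
    return [pair_with_credits(rng, run) for run in range(runs)]
```

Under these assumptions the first pair is always (0, 0), the combined magnitude of a pair never exceeds its run number, and the whole batch is a plain list of at most 100 values with no lazy structure involved.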
Is this suggesting that the fuzzer should always generate simple values first? That seems to lose the benefit of fuzz testing, which is that fuzzing will generate examples that are likely to find new, unknown bugs and then reduce them to minimal examples; not that it will test hundreds of simple examples that probably are all going to exercise the same code paths. I don't think we should optimize for cases that can be easily tested without fuzz tests; we should optimize for cases where fuzz tests are necessary to find bugs in complex systems. |
Also, regarding better shrinking, see https://www.cs.indiana.edu/~lepike/pubs/smartcheck.pdf
|
Yes.
The point of fuzzing is to write unit tests for you. If we wind up shrinking to small values anyway, wouldn't it have been much better to have generated them first? The initial complaint is that fuzzers repeat many values to improve the statistical likelihood of generating those values together for What makes fuzzers work is the tension between a random sampling of many complex values, and the deterministic shrinking that simplifies them. Roughly speaking, it seems like generating the same value many times is a result of trying to make the random generators more predictable. |
(For the record, I disagree that the main use of fuzz tests is to avoid writing unit tests; I think that ignores their more important use which is finding bugs that are impossible or hard to find with unit tests. But not expecting to resolve that here; just adding that as a note) To the point of this issue, I guess I wasn't clear in the original post, which was meant to report the issue that the shrinker generates the same value many times (not that the fuzzer is generating poor values to start with). I think the originally-generated values are good; it's just when there's a failure that needs to be shrunk that there is a lot of duplicated computation, namely that with For my use, I am running tests where each evaluation takes 5-20 seconds, so reducing the number of shrink examples by >50% with no loss of coverage would be a huge win. |
Hmm. It's been my experience that most bugs get shrunk pretty small, even if they start huge. But yes, sometimes bugs only happen for large examples.
Ah! Those live at elm-community/shrink and I am open to new ideas for improving those, including prior art in Haskell. I think it likes to shrink lists to the empty list because that's a big win if the empty list fails and we tried it immediately. (And again, my questionable belief that most counterexamples are small.) |
Ah, got it -- should I move this issue to https://github.com/elm-community/shrink ? |
I'm really sorry @avh4 for dragging your post off topic; I failed miserably at reading it.
It seems that shrinking responsibility is currently split across the Shrink library and the Fuzz library. Most primitive shrinkers come from the shrink package, but list shrinking is implemented in elm-test directly (the shrink package exports a list shrinker, but elm-test doesn't use it). So I think the issue is at home here.
Looking at the code, the current algorithm seems to be: try to shrink the list by removing an element and, if that doesn't work, shrink the list by shrinking an element. If it succeeds in shrinking any element, it starts over by trying to remove an element again. It seems hard to encode into this logic the rule that it's useless to shrink a value before removing it (which is happening over and over).
Perhaps a better way to look at list shrinking is to say we only do the element-shrinking part, with the difference that the smallest value for any element in the list is its non-existence (which would leave that element out of the list). In that context it's much easier to express the idea that the shrinker can try the smallest value (removing the element from the list) once but should never return to it if that fails. |
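A minimal Python sketch of the "removal is each element's smallest value, tried once" idea. The function names, the integer shrinker, and the two-pass structure are assumptions for illustration; this is not the elm-test implementation.

```python
def shrink_int(n):
    """Hypothetical element shrinker: candidates strictly closer to zero, smallest first."""
    if n == 0:
        return []
    step = 1 if n > 0 else -1
    candidates = [0, int(n / 2), n - step]
    return list(dict.fromkeys(c for c in candidates if c != n))


def shrink_list(xs, shrink_elem, fails):
    """Return a smaller failing list; fails(candidate) is True when the test fails."""
    current = list(xs)
    # The empty list is tried exactly once, up front.
    if current and fails([]):
        return []
    # Pass 1: try dropping each element once; a kept element is never dropped later.
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if candidate and fails(candidate):  # skip [] because it was already tested
            current = candidate
        else:
            i += 1
    # Pass 2: shrink the surviving elements in place, without retrying removal.
    for i in range(len(current)):
        improved = True
        while improved:
            improved = False
            for smaller in shrink_elem(current[i]):
                candidate = current[:i] + [smaller] + current[i + 1:]
                if fails(candidate):
                    current = candidate
                    improved = True
                    break
    return current
```

For a test that fails whenever any element is at least 10, `shrink_list([3, 50, 7, 12], shrink_int, fails)` ends at `[10]` and evaluates the empty list once. The trade-off is that an element kept in pass 1 stays in the list even if later shrinking would have made it removable.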
Random aside: I don't think it's important that shrinking live in a separate package. If the best shrinking strategy for |
In the following scenario, we manually run a shrink, simulating a test failure when the list is non-empty. The shrinker behaves very inefficiently: it produces the empty list every other shrink (sometimes more often!) even though a pass for the empty list has already been recorded. With this seed, 12 of the 21 produced values are the empty list (all seeds show a similar percentage).
Ideally the shrinker should not repeat values that have already been tested. But if that's not practical, at least Fuzz.list should be improved to avoid repeating the empty list so much.
https://ellie-app.com/3G7LHDSJ2M4a1/1
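As a stopgap for expensive tests, the test runner could remember outcomes so that a repeated candidate is never evaluated twice. A minimal Python sketch follows; `remember_results` is a hypothetical helper, and using `repr` as the cache key is an assumption that only holds for values with a faithful printed form.

```python
def remember_results(test):
    """Wrap a test predicate so each distinct candidate is evaluated at most once."""
    seen = {}

    def cached(candidate):
        key = repr(candidate)  # assumption: repr uniquely identifies the value
        if key not in seen:
            seen[key] = test(candidate)
        return seen[key]

    cached.seen = seen  # exposed so callers can inspect how many evaluations ran
    return cached
```

With a shrinker that emits `[]`, `[1]`, `[]`, `[0]`, `[]`, `[]`, the wrapped test runs three times instead of six. This only hides the cost of the duplicates; the shrinker still generates them.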