Comparing generated instances by `print` cb output #38

silentbicycle · 2018-02-27T16:39:25Z

With how autoshrinking works, there currently isn't a good way to detect generated instances that are structurally equivalent, but produced by a different random bitstream. This can lead to reporting the same failure several times, because it's generating the same input value by different code paths.

For example, the built-in generators have tables of interesting values, but the shrinker doesn't know whether a random uint16_t chose 65535 through random generation, or because it came from the table.

It might work well to treat instances whose print callback output is the same as equivalent. The multi-core branch will already be adding buffering for each worker's stdout -- we could print to a buffer, check its hash in the bloom filter, and use that to determine whether it has already been tried. This probably shouldn't be enabled by default (at least not until it's already been optional for a release), because it could cause surprising interactions with existing tests.

If this causes strange behavior during shrinking, it may not be worthwhile, but it would be a good experiment. (For example, it might get hung up on input generated one way, but refuse to shrink further because the intermediate output has already been marked as tried. Other bugs caused that to happen while working on autoshrinking.)
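A rough sketch of the print-to-a-buffer idea, to make it concrete. This is not theft's code: `print_fn`, `struct bloom`, `hash_printed`, and `seen_before` are hypothetical names, the callback type is a simplified stand-in rather than the library's real signature, and the filter is a toy two-probe bloom filter over a 64-bit FNV-1a hash. It captures output with standard C `tmpfile()`, assuming a writable temp directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical print callback: writes a textual form of the instance. */
typedef void print_fn(FILE *f, const void *instance);

#define BLOOM_BITS 4096u

/* Toy bloom filter; zero-initialize before use. */
struct bloom {
    uint8_t bits[BLOOM_BITS / 8];
};

/* Run the print callback into a temporary file and compute a 64-bit
 * FNV-1a hash of the bytes it wrote. Returns false if capture failed. */
bool hash_printed(print_fn *print, const void *instance, uint64_t *out)
{
    assert(print != NULL && out != NULL);
    FILE *f = tmpfile();
    if (f == NULL) { return false; }

    print(f, instance);
    rewind(f);                      /* switch from writing to reading */

    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    char buf[256];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= (unsigned char)buf[i];
            h *= 0x100000001b3ULL;           /* FNV prime */
        }
    }
    fclose(f);
    *out = h;
    return true;
}

/* Returns true if an instance with the same printed form has probably
 * been seen already (false positives are possible). Otherwise marks it
 * as seen and returns false. A capture failure is treated as "new". */
bool seen_before(struct bloom *b, print_fn *print, const void *instance)
{
    uint64_t h;
    if (!hash_printed(print, instance, &h)) { return false; }

    /* Two probe positions taken from the low and high halves of the hash. */
    uint32_t idx[2] = {
        (uint32_t)(h % BLOOM_BITS),
        (uint32_t)((h >> 32) % BLOOM_BITS),
    };
    bool all_set = true;
    for (size_t i = 0; i < 2; i++) {
        uint8_t mask = (uint8_t)(1u << (idx[i] % 8));
        if ((b->bits[idx[i] / 8] & mask) == 0) { all_set = false; }
        b->bits[idx[i] / 8] |= mask;
    }
    return all_set;
}

/* Example callback for a uint16_t instance. */
void print_u16(FILE *f, const void *instance)
{
    fprintf(f, "%u", (unsigned)*(const uint16_t *)instance);
}
```

With this, two `uint16_t` instances holding 65535 hash to the same value regardless of which bitstream produced them, so the second `seen_before` call on an initially zeroed `struct bloom` reports a probable duplicate. Because a bloom filter can give false positives, a real version would need to decide whether occasionally skipping a genuinely new instance is acceptable.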


DRMacIver · 2018-02-27T16:53:15Z

I'd strongly recommend not using this to indicate already tried while shrinking. Moving to a smaller representation of the same thing is great for shrinking! It decreases the example size, which improves the performance (and sometimes quality) of future shrinks.

silentbicycle · 2018-02-27T17:09:04Z

Perhaps not for shrinking, but it seems like there's less downside for reporting -- there's probably little value in reporting another failure that is represented with a byte-identical string, beyond noting it as another seed that leads to a failure.

It's a different issue if it was initially a different failure that became shadowed by a common, simpler failure while shrinking, but failure tagging (#14) attempts to address that directly.
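For the reporting-only variant, a minimal sketch of what suppressing byte-identical reports could look like. Everything here is an assumption for illustration: `failure_log`, `failure_log_note`, and `failure_log_free` are hypothetical names, the printed form is assumed to be a NUL-terminated string, and the table is a fixed-size linear scan rather than anything tuned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FAILURES 64

/* One distinct printed failure and the seeds that led to it. */
struct failure_entry {
    char *printed;        /* owned copy of the print output */
    uint64_t first_seed;  /* seed that first produced it */
    size_t seed_count;    /* number of seeds seen for this output */
};

/* Zero-initialize before use. */
struct failure_log {
    struct failure_entry entries[MAX_FAILURES];
    size_t used;
};

/* Record a failure. Returns true if it should be reported in full,
 * false if a byte-identical printed form was already reported (in which
 * case it is only counted as another seed that leads to the failure). */
bool failure_log_note(struct failure_log *log, const char *printed,
                      uint64_t seed)
{
    assert(log != NULL && printed != NULL);
    for (size_t i = 0; i < log->used; i++) {
        if (strcmp(log->entries[i].printed, printed) == 0) {
            log->entries[i].seed_count++;
            return false;
        }
    }
    /* If the table is full or allocation fails, err on the side of
     * reporting rather than hiding a failure. */
    if (log->used == MAX_FAILURES) { return true; }
    size_t len = strlen(printed) + 1;
    char *copy = malloc(len);
    if (copy == NULL) { return true; }
    memcpy(copy, printed, len);

    struct failure_entry *e = &log->entries[log->used++];
    e->printed = copy;
    e->first_seed = seed;
    e->seed_count = 1;
    return true;
}

/* Release the stored copies and reset the log. */
void failure_log_free(struct failure_log *log)
{
    for (size_t i = 0; i < log->used; i++) {
        free(log->entries[i].printed);
    }
    log->used = 0;
}
```

This compares the strings exactly instead of hashing them, so it never merges two different outputs. It only decides what gets reported; it does not mark anything as already tried during shrinking.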

DRMacIver · 2018-02-27T17:38:11Z

> Perhaps not for shrinking, but it seems like there's less downside for reporting -- there's probably little value in reporting another failure that is represented with a byte-identical string, beyond noting it as another seed that leads to a failure.

Yeah, I agree, that seems perfectly sensible! Though I wonder if it has enough of a hit rate to be worth it. Do people often include pointer values in their output? If so, the output probably isn't even stable for a fixed bit stream.

(Hypothesis doesn't do this because the only time we show multiple final examples is when they lead to a different error)

silentbicycle · 2018-02-27T17:56:59Z

I don't know if people include pointers, but I've been watching theft report an identical failure about once a minute today. (Generating source to fuzz a compiler.)

The crux of the problem is: how do I detect that it IS a different error? (I'd rather not depend on instrumentation from a specific compiler, or other non-portable approaches.)

DRMacIver · 2018-02-27T18:11:58Z

> I don't know if people include pointers, but I've been watching theft report an identical failure about once a minute today. (Generating source to fuzz a compiler.)

Fair enough!

> The crux of the problem is: how do I detect that it IS a different error? (I'd rather not depend on instrumentation from a specific compiler, or other non-portable approaches.)

Yeah this is easier in languages that aren't C! In Hypothesis it's just line number + exception type.

silentbicycle · 2018-11-04T19:37:29Z

This is another case where the optional coverage reporting hook (#43) could be useful.

silentbicycle added the enhancement, question, and usability labels on Feb 27, 2018
