Comparing generated instances by print cb output #38

Open
silentbicycle opened this issue Feb 27, 2018 · 6 comments

Comments

@silentbicycle
Owner

With how autoshrinking works, there currently isn't a good way to detect generated instances that are structurally equivalent, but produced by a different random bitstream. This can lead to reporting the same failure several times, because it's generating the same input value by different code paths.

For example, the built-in generators have tables of interesting values, but the shrinker doesn't know whether a random uint16_t chose 65535 through random generation, or because it came from the table.
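To make the problem concrete, here is a minimal sketch of a generator with two code paths. The generator, its table, and the way it consumes random words are hypothetical stand-ins, not theft's built-in uint16_t generator. The point is only that two different random streams can produce the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of "interesting" values (not theft's actual table). */
static const uint16_t interesting_u16[] = { 0, 1, 0x7fff, 0x8000, 0xffff };
#define N_INTERESTING (sizeof interesting_u16 / sizeof interesting_u16[0])

/* Hypothetical generator driven by two random words:
 * bits[0] selects the code path, bits[1] supplies the value. */
static uint16_t gen_u16(const uint64_t *bits)
{
    assert(bits != NULL);
    if (bits[0] & 1) {
        /* Path A: pick an entry from the interesting-values table. */
        return interesting_u16[bits[1] % N_INTERESTING];
    }
    /* Path B: use the low 16 random bits directly. */
    return (uint16_t)(bits[1] & 0xffff);
}
```

With these definitions, the streams `{1, 4}` (table path, index 4) and `{0, 0xffff}` (raw path) both return 65535. A check keyed only on the random stream would treat them as different inputs.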

It might work well to treat instances whose print callback output is the same as equivalent. The multi-core branch will already be adding buffering for each worker's stdout, so we could print to a buffer, check its hash in the bloom filter, and use that to determine whether it has already been tried. This probably shouldn't be enabled by default (at least not until it has been optional for a release), because it could cause surprising interactions with existing tests.
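Here is a minimal sketch of the proposed check, assuming the print callback can write into a buffer instead of stdout. The filter size, hash count, FNV-1a hash, and the `print_instance` helper are illustrative assumptions, not anything theft currently provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sizes for a small Bloom filter over printed outputs. */
#define BLOOM_BITS 8192u
#define BLOOM_HASHES 4u

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };

/* 64-bit FNV-1a hash of a byte buffer. */
static uint64_t fnv1a64(const char *s, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Marks the printed output as seen. Returns true if every probed bit was
 * already set, i.e. the output was (probably) seen before. Probe positions
 * come from double hashing: h1 + i*h2, using the two halves of one hash. */
static bool bloom_check_and_mark(struct bloom *b, const char *out, size_t len)
{
    uint64_t h = fnv1a64(out, len);
    uint32_t h1 = (uint32_t)h;
    uint32_t h2 = (uint32_t)(h >> 32) | 1u;
    bool all_set = true;
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = (h1 + i * h2) % BLOOM_BITS;
        uint8_t mask = (uint8_t)(1u << (bit % 8));
        if (!(b->bits[bit / 8] & mask)) {
            all_set = false;
            b->bits[bit / 8] |= mask;
        }
    }
    return all_set;
}

/* Hypothetical print callback that formats an instance into a buffer. */
static size_t print_instance(char *buf, size_t size, uint16_t value)
{
    int n = snprintf(buf, size, "%u", (unsigned)value);
    assert(n >= 0 && (size_t)n < size);
    return (size_t)n;
}
```

A Bloom filter can report false positives, so an instance that was never tried will occasionally be treated as already tried. That is one more reason to keep this behind an option.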

If this causes strange behavior during shrinking, it may not be worthwhile, but it would be a good experiment. (For example, it might get hung up on input generated one way, but refuse to shrink further because the intermediate output has already been marked as tried. Other bugs caused that to happen while working on autoshrinking.)

@DRMacIver

I'd strongly recommend not using this to indicate already tried while shrinking. Moving to a smaller representation of the same thing is great for shrinking! It decreases the example size, which improves the performance (and sometimes quality) of future shrinks.
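One way to put this point in code is a shrink-acceptance rule that ignores printed output. It asks only whether the candidate still fails and came from a smaller random stream. This sketch assumes the recorded stream is a byte array compared in shortlex order (shorter first, then bytewise), which is the ordering Hypothesis uses for its buffers. theft's autoshrinker may order its bit pools differently, and `struct bitstream` and `accept_shrink` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of the random bytes consumed to generate an instance. */
struct bitstream { const uint8_t *bytes; size_t len; };

/* Shortlex order: shorter streams are smaller; equal lengths compare bytewise. */
static bool shortlex_less(const struct bitstream *a, const struct bitstream *b)
{
    if (a->len != b->len) {
        return a->len < b->len;
    }
    return memcmp(a->bytes, b->bytes, a->len) < 0;
}

/* Accept a candidate whenever it still fails and its stream is smaller,
 * even if it prints identically to the current instance. */
static bool accept_shrink(const struct bitstream *current,
                          const struct bitstream *candidate,
                          bool candidate_fails)
{
    assert(current != NULL && candidate != NULL);
    return candidate_fails && shortlex_less(candidate, current);
}
```

Under this rule, if a table lookup and a raw draw both print 65535, the shrinker can still move from the longer stream to the shorter one. A "same output means already tried" check would block exactly that move.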

@silentbicycle
Owner Author

Perhaps not for shrinking, but it seems like there's less downside for reporting -- there's probably little value in reporting another failure that is represented with a byte-identical string, beyond noting it as another seed that leads to a failure.
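For reporting, the comparison can be exact instead of probabilistic, since a run usually finds only a few distinct failures. Here is a minimal sketch, assuming each failure's printed output is available as a string and each run has a 64-bit seed. `struct report_log`, `report_failure`, and the fixed capacity are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REPORTED 64  /* illustrative capacity */

/* Hypothetical log of failures already reported, keyed by printed output. */
struct report_log {
    size_t count;
    char *outputs[MAX_REPORTED];      /* owned copies of printed instances */
    uint64_t first_seed[MAX_REPORTED];
    size_t times_seen[MAX_REPORTED];
};

/* Returns true if the failure was reported in full, or false if a
 * byte-identical output was already reported (then only the seed is noted). */
static bool report_failure(struct report_log *rl, const char *output, uint64_t seed)
{
    for (size_t i = 0; i < rl->count; i++) {
        if (strcmp(rl->outputs[i], output) == 0) {
            rl->times_seen[i]++;
            printf("seed 0x%016llx also fails like seed 0x%016llx (failure #%zu)\n",
                   (unsigned long long)seed,
                   (unsigned long long)rl->first_seed[i], i);
            return false;
        }
    }
    assert(rl->count < MAX_REPORTED);
    size_t len = strlen(output) + 1;
    char *copy = malloc(len);
    assert(copy != NULL);
    memcpy(copy, output, len);
    rl->outputs[rl->count] = copy;
    rl->first_seed[rl->count] = seed;
    rl->times_seen[rl->count] = 1;
    printf("failure #%zu (seed 0x%016llx):\n%s\n",
           rl->count, (unsigned long long)seed, output);
    rl->count++;
    return true;
}

/* Frees the stored output copies. */
static void report_log_free(struct report_log *rl)
{
    for (size_t i = 0; i < rl->count; i++) {
        free(rl->outputs[i]);
    }
    rl->count = 0;
}
```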

It's a different issue if it was initially a different failure that became shadowed by a common, simpler failure while shrinking, but failure tagging (#14) attempts to address that directly.

@DRMacIver

> Perhaps not for shrinking, but it seems like there's less downside for reporting -- there's probably little value in reporting another failure that is represented with a byte-identical string, beyond noting it as another seed that leads to a failure.

Yeah, I agree, that seems perfectly sensible! Though I wonder if it has enough of a hit rate to be worth it. Do people often include pointer values in their output? If so, the output probably isn't even stable for a fixed bit stream.

(Hypothesis doesn't do this because the only time we show multiple final examples is when they lead to a different error)

@silentbicycle
Copy link
Owner Author

I don't know if people include pointers, but I've been watching theft report an identical failure about once a minute today. (Generating source to fuzz a compiler.)

The crux of the problem is: how do I detect that it IS a different error? (I'd rather not depend on instrumentation from a specific compiler, or other non-portable approaches.)

@DRMacIver

> I don't know if people include pointers, but I've been watching theft report an identical failure about once a minute today. (Generating source to fuzz a compiler.)

Fair enough!

> The crux of the problem is: how do I detect that it IS a different error? (I'd rather not depend on instrumentation from a specific compiler, or other non-portable approaches.)

Yeah this is easier in languages that aren't C! In Hypothesis it's just line number + exception type.
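A rough C analog of "line number + exception type" needs no compiler instrumentation. Each check in the property records its own location through the standard `__FILE__` and `__LINE__` macros, plus a short tag. This is a minimal sketch under that assumption. The `CHECK` macro, `struct failure_sig`, and the toy property are hypothetical and not part of theft's API. It only works if property authors route their checks through such a macro. It also merges distinct bugs that trip the same check and separates one bug that trips different checks, much like the line-based heuristic it imitates.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical failure signature: where the property's check failed. */
struct failure_sig {
    const char *file;
    int line;
    const char *tag;   /* short label chosen by the property author */
};

/* Records the location of a failing check into *sig and returns false
 * from the enclosing property function. */
#define CHECK(sig, cond, tag_str)          \
    do {                                   \
        if (!(cond)) {                     \
            (sig)->file = __FILE__;        \
            (sig)->line = __LINE__;        \
            (sig)->tag = (tag_str);        \
            return false;                  \
        }                                  \
    } while (0)

/* Two signatures (both filled in by a failing CHECK) name the same failure
 * if they came from the same check site. */
static bool same_failure(const struct failure_sig *a, const struct failure_sig *b)
{
    assert(a != NULL && b != NULL);
    return a->line == b->line && strcmp(a->file, b->file) == 0
        && strcmp(a->tag, b->tag) == 0;
}

/* Toy property with two distinct ways to fail. */
static bool prop_toy(unsigned value, struct failure_sig *sig)
{
    CHECK(sig, value < 1000, "too large");
    CHECK(sig, value % 7 != 0, "multiple of 7");
    return true;
}
```

Here, inputs 5000 and 70000 fail at the same check and compare as the same failure, while 14 fails at the second check and is reported as distinct.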

@silentbicycle
Owner Author

This is another case where the optional coverage reporting hook (#43) could be useful.
