
Implement swarm testing and use it for rule based stateful tests #2238

Merged: 12 commits from DRMacIver/stateful-swarms into master on Nov 28, 2019

Conversation

DRMacIver (Member)

I have marking to do today, which is why I decided to suddenly do a thing I've been meaning to do for years and implement proper swarm testing in Hypothesis. Currently this is only used for stateful testing, because it's the place where it's most obviously a win.

Closes #1637 though we should open some new tickets for actually making use of it more broadly and including it in our public API. I will do that after this has been merged.
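For anyone unfamiliar with the term: swarm testing means that each generated test case runs with only a random subset of the available actions enabled, rather than mixing every rule into every run. Below is a minimal, self-contained sketch of that general idea, not the implementation in this PR; the rule names and structure are made up for illustration.

```python
import random

# Hypothetical rules for a stack-like system under test; in rule-based
# stateful testing these would correspond to the test's rules.
RULES = ["push", "pop", "clear", "peek"]

def run_swarm_test_case(n_steps=50, seed=None):
    rng = random.Random(seed)
    # Swarm testing: enable a random non-empty subset of rules for this
    # whole test case, instead of always allowing every rule.
    enabled = [r for r in RULES if rng.random() < 0.5] or [rng.choice(RULES)]
    stack = []
    for _ in range(n_steps):
        rule = rng.choice(enabled)
        if rule == "push":
            stack.append(rng.randint(0, 100))
        elif rule == "pop" and stack:
            stack.pop()
        elif rule == "clear":
            stack.clear()
        elif rule == "peek" and stack:
            _ = stack[-1]
    return enabled, stack
```

Disabling some rules per test case tends to reach states that a uniform mix rarely hits - for example, very long stacks when "pop" and "clear" happen to be switched off together.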

@DRMacIver (Member, Author)

CC @agroce for interest.

@DRMacIver force-pushed the DRMacIver/stateful-swarms branch 2 times, most recently from fc2a23f to 7c90547 on November 27, 2019 14:17
@agroce commented Nov 27, 2019

I'll be really interested to see how this does!

@DRMacIver (Member, Author)

I'll be really interested to see how this does!

I will too, though unfortunately we don't have great ways of finding out. There isn't currently an established set of benchmarks for Hypothesis's stateful testing that would give us good feedback on its effectiveness.

@agroce commented Nov 27, 2019

@regehr probably will want to know this is in the works, too

@Zac-HD added the new-feature ("entirely novel capabilities or strategies") label on Nov 28, 2019
@Zac-HD (Member) left a comment

Super exciting to see this coming to Hypothesis 😁

Review comments on hypothesis-python/RELEASE.rst (outdated, resolved)
@Zac-HD (Member) left a comment

🎉

@DRMacIver merged commit eaac6c3 into master on Nov 28, 2019
@DRMacIver deleted the DRMacIver/stateful-swarms branch on November 28, 2019 15:19
@Zac-HD (Member) commented Dec 9, 2019

we should open some new tickets for actually making use of it more broadly and including it in our public API. I will do that after this has been merged.

@DRMacIver - anything specific in mind? I'm happy to write up some initial directions if that would help.

@agroce commented Dec 10, 2019

So in Hypothesis, what does swarm mean outside rule-based testing? (There it's obvious that there's a small finite choice set, and swarm is built for that.) characters and one_of are obvious candidates; or is this going to live at a deeper level?

@DRMacIver (Member, Author)

is this going to live at a deeper level?

I don't think it's going to live at a deeper level - the plan is definitely to make it explicit.

As well as characters and one_of, we could also use it in the Lark integration with swarm features for each production rule.

I'm also keen to get it into the public API, but that's blocked on sorting out how we print things for which we only have the right repr at the end of test execution - I know how to make this work, I just haven't done it yet.
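
As an aside, the flavour of a swarm-style one_of can already be approximated with today's public strategies. The sketch below uses only existing public API (st.composite, st.booleans, st.one_of) and is just an illustration of the idea, not the planned built-in mechanism; note that each draw re-chooses its own subset, so it only approximates true per-test-case swarm behaviour.

```python
import hypothesis.strategies as st

@st.composite
def swarm_one_of(draw, *options):
    """Draw a value from a randomly chosen subset of the given strategies.

    Each option is independently switched on or off; if everything ends up
    switched off, fall back to all options so a value can still be produced.
    """
    enabled = [opt for opt in options if draw(st.booleans())]
    if not enabled:
        enabled = list(options)
    return draw(st.one_of(*enabled))

# Usage: values come mostly from whichever options happen to be enabled.
mixed = swarm_one_of(st.integers(), st.text(), st.booleans())
```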

@agroce commented Dec 10, 2019

DeepState has it now, though I am annoyed that the binary rep of a test T with exactly the same semantics is different under swarm, because it has to record the coin flips. Having a utility to convert tests to/from the swarm rep will work, but it seems really kludgy.

BTW did I get OneOf from your one_of? I was wondering why I used that rather than choose or McCarthy's amb... seeing it in hypothesis and thinking it looked nice might explain :)

@agroce commented Dec 10, 2019

(I can't even have DeepState "fix up" a test after it's done by removing the swarm and changing the choice bytes, because the input bytes are coming from AFL or libFuzzer or somebody, usually, and I don't really get to do anything but execute them)

@DRMacIver (Member, Author)

DeepState has it now, though I am annoyed that the binary rep of a test T with exactly the same semantics is different under swarm, because it has to record the coin flips. Having a utility to convert tests to/from the swarm rep will work, but it seems really kludgy.

Yeah, the fragility of the underlying choice sequence representation for test cases is the biggest drawback of the whole thing. There are a bunch of tricks being used in the Hypothesis implementation of swarm testing to offset that fragility, but they're there more to make reduction work well than anything else, and they only work because Hypothesis has a lot of scope to rewrite the byte stream.

BTW did I get OneOf from your one_of? I was wondering why I used that rather than choose or McCarthy's amb... seeing it in hypothesis and thinking it looked nice might explain :)

It's entirely possible! It's been called that in Hypothesis from fairly early on (2015ish certainly). I don't remember the exact chain of events that led to that naming.

@agroce commented Dec 10, 2019

Fortunately, reduction works fine; DeepState's reducer is pretty capable, and it even tends to produce the minimal swarm config needed. In a sense, in the AFL world, the swarm info is not "useless" bytes, because AFL may have tuned the choices, even if they're not needed in "this here" test.

I bet I got this from you, then!

@DRMacIver (Member, Author)

Fortunately, reduction works fine; DeepState's reducer is pretty capable, and it even tends to produce the minimal swarm config needed. In a sense, in the AFL world, the swarm info is not "useless" bytes, because AFL may have tuned the choices, even if they're not needed in "this here" test.

It would work fine in Hypothesis if we made all the decisions up front, but because the decisions are made lazily a little more care is needed. Essentially the problem is what happens if you delete the first place a decision is made.
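
To make that concrete (a toy illustration, not Hypothesis's internals): if feature flags are decided lazily from the choice sequence the first time a feature is queried, then deleting the start of the sequence shifts which bytes answer which queries, and every later flag can silently flip.

```python
class LazyFlags:
    """Decide each feature flag lazily from a stream of choice bytes."""

    def __init__(self, choices):
        self.choices = list(choices)
        self.pos = 0
        self.cache = {}

    def is_enabled(self, feature):
        # The first time a feature is queried, consume one byte from the
        # choice sequence to decide it; afterwards, reuse that answer.
        if feature not in self.cache:
            byte = self.choices[self.pos % len(self.choices)]
            self.pos += 1
            self.cache[feature] = byte >= 128
        return self.cache[feature]


choices = [200, 10, 250, 3]

original = LazyFlags(choices)
print([original.is_enabled(f) for f in "abcd"])  # [True, False, True, False]

# Deleting the first byte (say, because the step that consumed it was
# removed) changes which byte answers which query, so the flags flip:
shrunk = LazyFlags(choices[1:])
print([shrunk.is_enabled(f) for f in "abcd"])    # [False, True, False, False]
```

The LazyFlags class here is made up purely to show the failure mode; the real mitigation, as mentioned above, relies on Hypothesis being able to rewrite the byte stream.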

@agroce commented Dec 10, 2019

OK, we're lazy too, so we have the same issue, but it seems to work in practice. You probably have a much more hands-on view of the test, though; in some sense we're just a "test replay" API with a huge number of bells and whistles (except in symbolic mode), so bytes radically changing meaning during reduction is fine, so long as you satisfy the criteria. And in symbolic mode, for now I just always use the full configuration (though long term, I want to automatically concretize and fork on a set of "well-chosen" swarm configs, based on empirical data about how many things are needed for bugs).

@DRMacIver (Member, Author)

bytes radically changing meaning in reduction is fine, so long as you satisfy the criteria

This is true for us too!

The difference is probably mostly one of guarantees. If your reduction relies on reinterpreting things then it tends to only succeed if you get lucky - e.g. because that part of the test case doesn't matter. Reduction tries enough things that it has a lot of opportunities to get lucky so it's usually fine, but I prefer to design things so that they're guaranteed to make certain kinds of progress. Partly because I'm slightly obsessive, but also because I've found that reductions that rely on luck often end up being weirdly slow because they keep being able to make small amounts of progress which unlock other small amounts of progress.

Labels: new-feature (entirely novel capabilities or strategies)
Linked issue: Support automatic 'swarm testing' for example selection
4 participants