Implement swarm testing and use it for rule based stateful tests #2238
Conversation
CC @agroce for interest.
Force-pushed from fc2a23f to 7c90547
I'll be really interested to see how this does!
I will too, though unfortunately we don't have great ways of finding out. There's not currently any sort of established set of benchmarks for Hypothesis's stateful testing that would give us good feedback on its effectiveness.
@regehr probably will want to know this is in the works, too.
Super exciting to see this coming to Hypothesis 😁
🎉
Force-pushed from d8a876c to 42ee3be
@DRMacIver - anything specific in mind? I'm happy to write up some initial directions if that would help.
So in Hypothesis, what does swarm mean outside rule-based testing? There it's obvious: there's a small finite choice set, and swarm is built for that.
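For readers unfamiliar with the technique, here is a minimal sketch of the core swarm-testing idea in the rule-based setting: instead of drawing every step from the full rule set, each test case first flips a coin per rule to pick an enabled subset, and all steps are drawn from that subset only. The names and structure here are hypothetical illustration, not the Hypothesis API or this PR's implementation.

```python
import random

# Hypothetical rule names for a stateful test of, say, a list wrapper.
RULES = ["push", "pop", "clear", "reverse", "sort"]

def swarm_config(rules, rng, p_enabled=0.5):
    """Enable each rule independently with probability p_enabled.

    Retry until at least one rule survives, so every test can make progress.
    """
    while True:
        enabled = [r for r in rules if rng.random() < p_enabled]
        if enabled:
            return enabled

def run_one_test(rng, steps=10):
    """One swarm-tested run: fix a config up front, then draw steps from it."""
    enabled = swarm_config(RULES, rng)
    return [rng.choice(enabled) for _ in range(steps)]

rng = random.Random(0)
trace = run_one_test(rng)
# Every step comes from the (random) enabled subset, never the full set.
assert set(trace) <= set(RULES)
```

The payoff is that a config which disables, say, `pop` and `clear` lets the list grow far larger than uniform rule selection ever would, reaching states that trigger different bugs.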
I don't think it's going to live at a deeper level - the plan is definitely to make it explicit. I'm also keen to get it into the public API, but that's blocked on sorting out printing of things where we only have the right repr for them at the end of the test execution. I know how to make this work, I just haven't yet.
DeepState has it now, though I am annoyed that the binary rep of a test T with exactly the same semantics is different under swarm, because it has to record the coin flips. Having a utility to convert tests to/from the swarm rep would work, but seems really kludgy. BTW did I get
(I can't even have DeepState "fix up" a test after it's done by removing the swarm and changing the choice bytes, because the input bytes are coming from AFL or libFuzzer or somebody, usually, and I don't really get to do anything but execute them.)
Yeah, the fragility of the underlying choice sequence representation for test cases is the biggest drawback of the whole thing. There are a bunch of tricks being used in the Hypothesis implementation of swarm testing to offset that fragility, but they're more about making reduction work well than anything else, and they only work because Hypothesis has a lot of scope to rewrite the byte stream.
It's entirely possible! It's been called that in Hypothesis from fairly early on (2015ish certainly). I don't remember the exact chain of events that led to that naming.
Fortunately, reduction works fine. DeepState's reducer is pretty capable, and nicely tends to produce the minimal swarm config needed, even. In a sense, in the AFL world, the swarm info is not "useless" bytes, because AFL may have tuned the choices, even if not needed in "this here" test. I bet I got this from you, then!
It would work fine in Hypothesis if we made all the decisions up front, but because the decisions are made lazily a little more care is needed. Essentially the problem is what happens if you delete the first place a decision is made.
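The laziness being described can be sketched as follows: the enabled/disabled coin flip for a rule is drawn and recorded only the first time that rule is considered, so if a reducer deletes the region of the test containing that first flip, every later byte shifts meaning. This is a hypothetical illustration of the problem, not the Hypothesis or DeepState internals.

```python
import random

class LazySwarm:
    """Swarm config whose per-rule coin flips are drawn on first use."""

    def __init__(self, rng):
        self.rng = rng
        self.flips = {}      # rule -> bool, filled in lazily
        self.recorded = []   # order in which flips were drawn, i.e. the
                             # order they'd appear in a recorded byte stream

    def is_enabled(self, rule):
        if rule not in self.flips:
            # First time this rule is considered: flip and record the coin.
            self.flips[rule] = self.rng.random() < 0.5
            self.recorded.append(rule)
        return self.flips[rule]

swarm = LazySwarm(random.Random(1))
swarm.is_enabled("pop")
swarm.is_enabled("push")
swarm.is_enabled("pop")   # cached: no new flip is recorded
assert swarm.recorded == ["pop", "push"]
```

If a reducer now deletes the step where `"pop"` was first considered, the flip for `"push"` becomes the first one drawn, and the remaining recorded decisions are reinterpreted against the wrong rules.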
OK, we're lazy too, so we have the same issue, but it seems to work in practice. You probably have a much more hands-on view of the test, though; in some sense we're just a "test replay" API with a huge number of bells and whistles (except in symbolic mode), so bytes radically changing meaning during reduction is fine, as long as you satisfy the criteria. And in symbolic mode, for now I just always use the full configuration (though long term, I want to automatically concretize and fork on a set of "well-chosen" swarm configs, based on empirical data about how many things are needed for bugs).
This is true for us too! The difference is probably mostly one of guarantees. If your reduction relies on reinterpreting things, then it tends to only succeed if you get lucky - e.g. because that part of the test case doesn't matter. Reduction tries enough things that it has a lot of opportunities to get lucky, so it's usually fine, but I prefer to design things so that they're guaranteed to make certain kinds of progress. Partly because I'm slightly obsessive, but also because I've found that reductions that rely on luck often end up being weirdly slow, because they keep being able to make small amounts of progress which unlock other small amounts of progress.
I have marking to do today, which is why I decided to suddenly do a thing I've been meaning to do for years and implement proper swarm testing in Hypothesis. Currently this is only used for stateful testing, because it's the place where it's most obviously a win.
Closes #1637, though we should open some new tickets for actually making use of it more broadly and including it in our public API. I will do that after this has been merged.