Implement swarm testing and use it for rule based stateful tests #2238
Conversation
CC @agroce for interest.
Force-pushed from fc2a23f to 7c90547
I'll be really interested to see how this does!
I will too, though unfortunately we don't have great ways of finding out. There's not currently any sort of established set of benchmarks for Hypothesis's stateful testing that would give us good feedback on its effectiveness.
@regehr probably will want to know this is in the works, too.
Super exciting to see this coming to Hypothesis 😁
🎉
Force-pushed from d8a876c to 42ee3be
@DRMacIver - anything specific in mind? I'm happy to write up some initial directions if that would help.
So in Hypothesis, what does swarm mean outside rule-based testing? There it's obvious: there's a small finite choice set, and swarm is built for that.
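For readers unfamiliar with the technique, here is a minimal sketch of the core swarm-testing idea in the rule-based setting: instead of drawing every step from the full rule set, each test case first flips a coin per rule to pick an enabled subset, and all steps are drawn from that subset only. The names and structure here are hypothetical illustration, not the Hypothesis API or this PR's implementation.

```python
import random

# Hypothetical rule names for a stateful test of, say, a list wrapper.
RULES = ["push", "pop", "clear", "reverse", "sort"]

def swarm_config(rules, rng, p_enabled=0.5):
    """Enable each rule independently with probability p_enabled.

    Retry until at least one rule survives, so every test can make progress.
    """
    while True:
        enabled = [r for r in rules if rng.random() < p_enabled]
        if enabled:
            return enabled

def run_one_test(rng, steps=10):
    """One swarm-tested run: fix a config up front, then draw steps from it."""
    enabled = swarm_config(RULES, rng)
    return [rng.choice(enabled) for _ in range(steps)]

rng = random.Random(0)
trace = run_one_test(rng)
# Every step comes from the (random) enabled subset, never the full set.
assert set(trace) <= set(RULES)
```

The payoff is that a config which disables, say, `pop` and `clear` lets the list grow far larger than uniform rule selection ever would, reaching states that trigger different bugs.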
I don't think it's going to live at a deeper level - the plan is definitely to make it explicit. I'm also keen to get it into the public API, but that's blocked on sorting out printing of things where we only have the right repr for them at the end of the test execution. I know how to make this work, I just haven't yet.
DeepState has it now, though I am annoyed that the binary rep of a test T with exactly the same semantics is different under swarm, because it has to record the coin flips. Having a utility to convert tests to/from the swarm rep would work, but seems really kludgy. BTW did I get
(I can't even have DeepState "fix up" a test after it's done by removing the swarm and changing the choice bytes, because the input bytes are coming from AFL or libFuzzer or somebody, usually, and I don't really get to do anything but execute them.)
Yeah, the fragility of the underlying choice sequence representation for test cases is the biggest drawback of the whole thing. There are a bunch of tricks being used in the Hypothesis implementation of swarm testing to offset that fragility, but they're more about making reduction work well than anything else, and they only work because Hypothesis has a lot of scope to rewrite the byte stream.
It's entirely possible! It's been called that in Hypothesis from fairly early on (2015ish certainly). I don't remember the exact chain of events that led to that naming.
Fortunately, reduction works fine. DeepState's reducer is pretty capable, and nicely tends to produce the minimal swarm config needed, even. In a sense, in the AFL world, the swarm info is not "useless" bytes, because AFL may have tuned the choices, even if not needed in "this here" test. I bet I got this from you, then!
It would work fine in Hypothesis if we made all the decisions up front, but because the decisions are made lazily a little more care is needed. Essentially the problem is what happens if you delete the first place a decision is made.
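The laziness being described can be sketched as follows: the enabled/disabled coin flip for a rule is drawn and recorded only the first time that rule is considered, so if a reducer deletes the region of the test containing that first flip, every later byte shifts meaning. This is a hypothetical illustration of the problem, not the Hypothesis or DeepState internals.

```python
import random

class LazySwarm:
    """Swarm config whose per-rule coin flips are drawn on first use."""

    def __init__(self, rng):
        self.rng = rng
        self.flips = {}      # rule -> bool, filled in lazily
        self.recorded = []   # order in which flips were drawn, i.e. the
                             # order they'd appear in a recorded byte stream

    def is_enabled(self, rule):
        if rule not in self.flips:
            # First time this rule is considered: flip and record the coin.
            self.flips[rule] = self.rng.random() < 0.5
            self.recorded.append(rule)
        return self.flips[rule]

swarm = LazySwarm(random.Random(1))
swarm.is_enabled("pop")
swarm.is_enabled("push")
swarm.is_enabled("pop")   # cached: no new flip is recorded
assert swarm.recorded == ["pop", "push"]
```

If a reducer now deletes the step where `"pop"` was first considered, the flip for `"push"` becomes the first one drawn, and the remaining recorded decisions are reinterpreted against the wrong rules.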
OK, we're lazy too, so we have the same issue, but it seems to work in practice. You probably have a much more hands-on view of the test, though; in some sense we're just a "test replay" API with a huge number of bells and whistles (except in symbolic mode), so bytes radically changing meaning during reduction is fine, as long as you satisfy the criteria. And in symbolic mode, for now I just always use the full configuration (though long term, I want to automatically concretize and fork on a set of "well-chosen" swarm configs, based on empirical data about how many things are needed for bugs).
This is true for us too! The difference is probably mostly one of guarantees. If your reduction relies on reinterpreting things, then it tends to only succeed if you get lucky - e.g. because that part of the test case doesn't matter. Reduction tries enough things that it has a lot of opportunities to get lucky, so it's usually fine, but I prefer to design things so that they're guaranteed to make certain kinds of progress. Partly because I'm slightly obsessive, but also because I've found that reductions that rely on luck often end up being weirdly slow, because they keep being able to make small amounts of progress which unlock other small amounts of progress.
I have marking to do today, which is why I decided to suddenly do a thing I've been meaning to do for years and implement proper swarm testing in Hypothesis. Currently this is only used for stateful testing, because it's the place where it's most obviously a win.
Closes #1637, though we should open some new tickets for actually making use of it more broadly and including it in our public API. I will do that after this has been merged.