
Expand our use of swarm testing #2643

Open
Zac-HD opened this issue Oct 17, 2020 · 1 comment
Labels: new-feature (entirely novel capabilities or strategies)

Comments

Zac-HD (Member) commented Oct 17, 2020

To paraphrase Swarm Testing (Groce et al., 2012):

Swarm testing is a way to improve the diversity of generated test cases. Instead of potentially including all features in every test case, a large “swarm” of randomly generated configurations is used, each of which omits some features. ... First, some features actively prevent the system from executing interesting behaviors; e.g., pop calls may prevent a stack overflow from being triggered. Second, test features compete for space in each test, limiting the depth to which logic driven by features can be explored. Experimental results show that swarm testing increases coverage and can improve fault detection dramatically.

I first proposed that Hypothesis should use this trick in #1637, and a more advanced and shrinker-friendly variant was implemented in #2238 - but only used in rule-based stateful tests (where it has been very useful). In this issue I propose adding swarm testing logic in three more areas, though still without a public API.

st.one_of()

This is perhaps the most obvious place to add swarm testing: just disable a random subset of the strategies being combined in each test case. one_of() is also common enough that doing so might have performance implications, but "measure, don't guess" applies, and improved example quality may justify a slight slowdown anyway.
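To make the idea concrete, here's a rough user-level sketch using only public APIs; the real implementation would live inside the engine rather than in strategies, and `swarm_one_of` is a hypothetical helper name, not a proposed API:

```python
import hypothesis.strategies as st
from hypothesis import given


def swarm_one_of(*strategies, key="swarm-config"):
    # Draw one "swarm configuration" per test case: a non-empty subset
    # of the strategies to enable.  st.shared() ensures that every draw
    # within the same example sees the same configuration.
    enabled = st.shared(
        st.sets(st.sampled_from(range(len(strategies))), min_size=1),
        key=key,
    )
    # Only ever generate values from the enabled strategies.
    return enabled.flatmap(
        lambda idx: st.one_of(*(strategies[i] for i in sorted(idx)))
    )


@given(st.lists(swarm_one_of(st.integers(), st.text(), st.booleans()), min_size=5))
def test_demo(xs):
    # Every element of xs comes from the same randomly-chosen subset of
    # {integers, text, booleans}; the subset varies between test cases.
    pass
```

An engine-level version could avoid the metadata overhead of these extra draws, and (per the "shrink open" trick mentioned below) presumably shrink towards everything-enabled so that final examples don't depend on the swarm configuration.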

In conversation with @Stranger6667 we estimated that this would cover most downstream use-cases, which makes me inclined to keep swarm testing as an implementation detail with no public API at least for now.

Unicode strings (i.e. st.characters())

AKA #1401. This is a little trickier, as we'd be making many swarm decisions (hence a high ratio of swarm metadata to actual generated data), and the "shrink open" trick would need several layers. Performance is more likely to be a problem here. I can imagine memoizing our way out of that with chained lookups and the "make your own luck" trick, but we'll see.
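Again as a hedged user-level sketch (the category list below is truncated for illustration, and `swarm_text` is a made-up name): each test case could enable a random subset of Unicode general categories, and draw all characters from just those.

```python
import hypothesis.strategies as st

# A few Unicode general categories, for illustration only.
CATEGORIES = ["Lu", "Ll", "Nd", "Po", "Zs", "Sm", "Cc", "So"]


def swarm_text(key="swarm-categories"):
    # One shared subset of categories per test case; every character in
    # the example is then drawn from just those categories.  (Newer
    # versions of Hypothesis spell this argument `categories=`.)
    cats = st.shared(
        st.sets(st.sampled_from(CATEGORIES), min_size=1),
        key=key,
    )
    return cats.flatmap(
        lambda cs: st.text(st.characters(whitelist_categories=tuple(cs)))
    )
```

Even this coarse version pays for an extra subset-draw per example; the finer-grained decisions #1401 wants would multiply that overhead, which is why memoization might be needed.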

from_lark() and grammar-based strategies

This is the original use case for swarm testing, from CSmith, and I'd really like it to work for hypothesmith.

The complexity here is that we would want to analyse the grammar to decide the order in which to consider disabling production rules, and also ensure that the logic is aware of dependencies between productions. I'm pretty sure that I've seen John Regehr write about this somewhere, but can't find the paper or post now.
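To make the dependency concern concrete, here's a toy Lark grammar (entirely illustrative, names are mine):

```python
import lark
from hypothesis.extra.lark import from_lark

# Toy grammar: `dict` depends on `pair`, and `pair` recurses back into
# `value`.  A swarm configuration that disables `pair` must therefore
# also disable `dict`, or generation from `dict` becomes impossible.
GRAMMAR = r"""
value: dict | list | NUMBER
dict: "{" [pair ("," pair)*] "}"
pair: NUMBER ":" value
list: "[" [value ("," value)*] "]"
%import common.NUMBER
"""

values = from_lark(lark.Lark(GRAMMAR, start="value"))
```

A hypothetical swarm-aware from_lark() would pick a subset of productions per test case, then prune any rule whose remaining alternatives all depend on a disabled production before generating.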

Zac-HD added the new-feature (entirely novel capabilities or strategies) label on Oct 17, 2020
auvipy commented Nov 12, 2020

this seems great
