Ensure boundary cases are drawn by strategies #1847

Closed
cxong opened this issue Feb 28, 2019 · 2 comments

Comments

@cxong
Contributor

cxong commented Feb 28, 2019

The range strategies, e.g. `BoundedIntStrategy`, seem to draw examples from throughout the range. For example:

```python
from hypothesis.strategies import integers

s = integers(min_value=0, max_value=1000)
for _ in range(100):
    print(s.example())
```

```
186
472
808
843
240
81
262
138
37
754
682
...
```

The output above typically won't contain 0 or 1000, but defects tend to occur at the boundaries, so a more useful strategy would always try to include the min and max values.

I could work around this by using an explicit `@example`, but that cannot be combined with `st.data()`. I could also use a hack like `data.draw(st.integers().map(lambda x: max(0, min(1000, x))))`, which ends up testing the min/max values a lot. Is there a better way? If not, the strategy should minimise developer effort and just try the boundary values every time.
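A minimal sketch of the clamping workaround described above, written as a test that draws interactively through `st.data()`. The `clamp` helper, the `LOW`/`HIGH` constants and the test name are hypothetical, introduced here only for illustration:

```python
from hypothesis import given, strategies as st

LOW, HIGH = 0, 1000  # the bounds from the example above (illustrative)


def clamp(x: int, low: int = LOW, high: int = HIGH) -> int:
    """Pin x into [low, high]; any out-of-range draw lands on a boundary."""
    return max(low, min(high, x))


# Unbounded integers mapped through clamp: every draw below 0 becomes 0 and
# every draw above 1000 becomes 1000, so both endpoints show up often.
clamped_ints = st.integers().map(clamp)


@given(st.data())
def test_with_clamped_draw(data):
    # Draw interactively via data.draw(); per the issue, @example can't be used here.
    n = data.draw(clamped_ints, label="n")
    assert LOW <= n <= HIGH
```

Because every out-of-range draw collapses onto an endpoint, 0 and 1000 come up much more often than with `integers(min_value=0, max_value=1000)`; the trade-off is that values strictly inside the range are drawn correspondingly less often.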

Here's a naive implementation; I'm not familiar with Hypothesis's design, so I'm not sure whether this will break other things:

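```python
# Hypothesis internals as of early 2019 (4.x); these module paths have since moved.
import hypothesis.internal.conjecture.utils as d
from hypothesis.searchstrategy.strategies import SearchStrategy


class BoundedIntStrategy(SearchStrategy):
    """A strategy for providing integers in some interval with inclusive
    endpoints."""

    def __init__(self, start, end):
        SearchStrategy.__init__(self)
        self.start = start
        self.end = end
        # Proposed change: remember whether each endpoint has been returned.
        # These flags live on the strategy instance, so they persist across
        # every test case that reuses this strategy.
        self._start_used = False
        self._end_used = False

    def do_draw(self, data):
        # The first draw returns the lower bound, the second the upper bound...
        if not self._start_used:
            self._start_used = True
            return self.start
        elif not self._end_used:
            self._end_used = True
            return self.end
        # ...and every later draw falls back to the usual ranged draw.
        return d.integer_range(data, self.start, self.end)
```
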
@Zac-HD
Member

Zac-HD commented Feb 28, 2019

Hi @cxong - we've thought about this one a fair bit, actually!

Unfortunately the naive approach does indeed break lots of important invariants. It's possible to implement a version which works, but then you have a serious problem in tuning the heuristic: integers are often used in ways where it doesn't make sense to just generate the endpoints more often. See #1754 for example, where we actually reduced the occurrence of large numbers!
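The comment doesn't list the invariants, but one that is easy to demonstrate (our reading, not something stated in the thread) is reproducibility: Hypothesis replays and shrinks a failure by re-running the strategy on the same recorded random choices, so a draw should depend only on those choices. The toy model below uses plain `random.Random` seeds as a stand-in for Hypothesis's recorded choices; `StatefulBoundary`, `replay` and `stateless_draw` are hypothetical names, and the 5% endpoint probabilities are arbitrary, which is exactly the tuning problem the comment describes.

```python
import random


class StatefulBoundary:
    """Toy model of the naive proposal: endpoint flags live on the instance."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self._start_used = False
        self._end_used = False

    def draw(self, rng):
        if not self._start_used:
            self._start_used = True
            return self.start
        if not self._end_used:
            self._end_used = True
            return self.end
        return rng.randint(self.start, self.end)


def replay(strategy, seed):
    """Re-run a draw from the same 'recorded' randomness (here, a seed)."""
    return strategy.draw(random.Random(seed))


def stateless_draw(start, end, rng):
    """Boundary bias decided by the randomness itself, so replays agree."""
    r = rng.random()
    if r < 0.05:  # arbitrary tuning knob (hypothetical)
        return start
    if r < 0.10:
        return end
    return rng.randint(start, end)


s = StatefulBoundary(0, 1000)
first = replay(s, seed=42)   # 0: the start flag was still unset
second = replay(s, seed=42)  # 1000: same seed, different value

same_a = stateless_draw(0, 1000, random.Random(42))
same_b = stateless_draw(0, 1000, random.Random(42))
```

With the stateful version, a failure first seen on the `0` draw cannot be reproduced from the same input, because the flag has already flipped; the stateless version keeps a boundary bias but still maps identical input to identical output.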

We'd also need to think about frequency in relation to the `max_examples` setting, and there are a bunch of other interlocking pieces. Longer term, we want to handle all these cases with a better fuzzing mode (#171), including e.g. swarm testing (#1637).
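To put rough numbers on the frequency point: in a simplified model where each test case independently lands on an endpoint with probability p, the chance of hitting one at least once in n test cases (n playing the role of `max_examples`) is 1 - (1 - p)^n. The sketch below assumes uniform draws over 0..1000, so p = 2/1001; real Hypothesis draws are not uniform, and `hit_probability` is a hypothetical helper:

```python
def hit_probability(p: float, n: int) -> float:
    """P(at least one success in n independent trials with success prob p)."""
    return 1 - (1 - p) ** n


# Uniform draws over 0..1000: two of the 1001 possible values are endpoints.
p_uniform = 2 / 1001

for n in (10, 100, 1000):
    print(n, round(hit_probability(p_uniform, n), 3))
```

Under this model, the default `max_examples` of 100 gives only about an 18% chance of seeing either endpoint, and raising p to force endpoints necessarily takes test cases away from every other value.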

So I don't think we're up for changing this at the moment, but it's fantastic to see you're interested in this level of detail! Maybe I'll see you at PyCon AU in Sydney this year?

Zac-HD closed this as completed Feb 28, 2019
@Zalathar
Contributor

Note that calling `example()` repeatedly isn't a good simulation of how the strategy will behave during a real test run. In fact, `example()` goes out of its way to avoid returning the simplest possible example (such as 0 in this case).

Hypothesis already has some internal heuristics that should make it pretty good at generating 0, or the boundary closest to 0.

Consistently generating the opposite boundary is more difficult, unfortunately.
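A rough way to see what a real test run generates is to record the values inside a `@given` test instead of calling `example()`. This is a sketch: the counter and test name are hypothetical, and the exact counts will vary between runs and Hypothesis versions:

```python
from collections import Counter

from hypothesis import given, settings, strategies as st

seen = Counter()


@settings(max_examples=500, database=None)  # skip the example database for a throwaway check
@given(st.integers(min_value=0, max_value=1000))
def test_record_draws(n):
    seen[n] += 1  # count how often each value is generated


test_record_draws()
print("distinct values:", len(seen))
print("0 drawn:", seen[0], "times; 1000 drawn:", seen[1000], "times")
```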
