Ensure boundary cases are drawn by strategies #1847

cxong · 2019-02-28T05:25:56Z

The range strategies, e.g. BoundedIntStrategy, seem to draw examples from throughout the range without favouring the endpoints. For example:

from hypothesis.strategies import integers
s = integers(min_value=0, max_value=1000)
for _ in range(100):
    print(s.example())

Output:

186
472
808
843
240
81
262
138
37
754
682
...

The above example typically won't contain 0 or 1000, but defects tend to occur at the boundaries so a more useful strategy would try to always include the min and max values.

I could work around this by using explicit @example cases, but those cannot be used with data(). I could also use a hack like data.draw(st.integers().map(lambda x: max(0, min(1000, x)))), which clamps every out-of-range draw to a boundary and so ends up testing with the min/max values a lot. Is there a better way? Otherwise the strategy should minimise developer effort, and just try out boundary values all the time.
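
One possible middle ground, sketched here under assumptions rather than as a recommended Hypothesis idiom: mix the two endpoints into the bounded strategy with sampled_from, so that it still works with data(). The helper name integers_with_bounds is hypothetical, and how often the endpoints actually appear depends on Hypothesis's own heuristics.

```python
from hypothesis import given, settings, strategies as st

LO, HI = 0, 1000  # bounds taken from the example above


def integers_with_bounds(lo, hi):
    """Hypothetical helper: either an endpoint or an ordinary bounded draw."""
    return st.sampled_from([lo, hi]) | st.integers(min_value=lo, max_value=hi)


seen = set()  # collected only to inspect what was drawn


@settings(max_examples=200, database=None)
@given(st.data())
def test_draws_stay_in_range(data):
    x = data.draw(integers_with_bounds(LO, HI))
    seen.add(x)
    assert LO <= x <= HI


test_draws_stay_in_range()
print("min drawn:", min(seen), "max drawn:", max(seen))
```

Unlike the clamping hack, this never maps a large share of draws onto the endpoints; it only adds them as explicit alternatives.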

Here's a naive implementation; I'm not familiar with hypothesis design so not sure if this will break some other things:

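# Relies on Hypothesis internals being in scope: SearchStrategy, and `d`,
# the conjecture utils module (hypothesis.internal.conjecture.utils).
class BoundedIntStrategy(SearchStrategy):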
    """A strategy for providing integers in some interval with inclusive
    endpoints."""

    def __init__(self, start, end):
        SearchStrategy.__init__(self)
        self.start = start
        self.end = end
        self._start_used = False
        self._end_used = False

    def do_draw(self, data):
        if not self._start_used:
            self._start_used = True
            return self.start
        elif not self._end_used:
            self._end_used = True
            return self.end
        return d.integer_range(data, self.start, self.end)

Zac-HD · 2019-02-28T05:54:27Z

Hi @cxong - we've thought about this one a fair bit, actually!

Unfortunately the naive approach does indeed break lots of important invariants. It's possible to implement a version which works, but then you have a serious problem in how to tune the heuristic: integers are often used in ways where it doesn't make sense to just generate the endpoints more often. See #1754 for example, where we actually reduced the occurrence of large numbers!
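
The comment does not list the invariants. One that a stateful strategy plausibly violates is replay: Hypothesis re-runs recorded choices when shrinking and when replaying saved failures, so the same choices should produce the same value. The sketch below is a toy model, not Hypothesis code; ToyData and StatefulBounded are hypothetical names used only to illustrate the effect.

```python
class ToyData:
    """Toy stand-in for a recorded sequence of choices (not a Hypothesis class)."""

    def __init__(self, choices):
        self.choices = list(choices)
        self.index = 0

    def draw_integer(self, lo, hi):
        # Map the next recorded choice into [lo, hi].
        value = lo + self.choices[self.index] % (hi - lo + 1)
        self.index += 1
        return value


class StatefulBounded:
    """Mirrors the naive proposal: endpoints first, then ordinary draws."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self._start_used = False
        self._end_used = False

    def do_draw(self, data):
        if not self._start_used:
            self._start_used = True
            return self.lo
        if not self._end_used:
            self._end_used = True
            return self.hi
        return data.draw_integer(self.lo, self.hi)


strategy = StatefulBounded(0, 1000)
recorded = [186]  # the same recorded choices are replayed three times
results = [strategy.do_draw(ToyData(recorded)) for _ in range(3)]
print(results)  # [0, 1000, 186]
```

Because the result depends on hidden state in the strategy object rather than on the recorded choices, a failing example found on the first draw could not be reproduced by replaying it.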

We'd also need to think about frequency in relation to the max_examples setting, and there's a bunch of other interlocking pieces. Longer term, we want to handle all these cases with a better fuzzing mode (#171), e.g. including swarm testing (#1637).

So I don't think we're up for changing this at the moment, but it's fantastic to see you're interested in this level of detail! Maybe I'll see you at PyCon AU in Sydney this year?

Zalathar · 2019-02-28T11:42:08Z

Note that calling example() repeatedly isn't a good simulation of how the strategy will behave during a real test run. In fact, example() will go out of its way to not return the simplest possible example (such as 0 in this case).

Hypothesis already has some internal heuristics that should make it pretty good at generating 0, or the boundary closest to 0.

Consistently generating the opposite boundary is more difficult, unfortunately.
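
To see what a real test run draws, rather than what example() returns, one option is to record the values inside a @given test. This is a minimal sketch; the function name record and the max_examples value are arbitrary choices, and the output varies between runs, so no particular result is implied.

```python
from hypothesis import given, settings, strategies as st

seen = []  # every value passed to the test body during the run


@settings(max_examples=100, database=None)
@given(st.integers(min_value=0, max_value=1000))
def record(x):
    seen.append(x)


record()
print("distinct values:", len(set(seen)))
print("0 drawn:", 0 in seen, "| 1000 drawn:", 1000 in seen)
```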

Zac-HD closed this as completed Feb 28, 2019
