Distribution of floats() has regressed its ability to find bugs since 1.11.0 #469
Comments
I think you're actually identifying two separate issues:

1. Minimisation is hard, etc. - but I actually think the
2. This doesn't look like an issue to me.
While we obviously want failing examples, I think the effectiveness of the distribution of examples is influenced more by the test function than the distribution. If the new distribution is better at finding logic bugs but worse at finding floating-point edge cases, IMO that's a net win. Of course, it would be best to do both, so the real question is 'what parts of the floating-point distribution are not being explored any more?'
Yeah, now that I think about it (@alexwlchan and I had talked about this before filing), the failing test case presented is actually absolutely correct by modern Hypothesis's heuristics for less data always being better - it was correct on the old heuristics as well, but due to reasons old Hypothesis would never have found that shrink.

Re the distribution: it would not surprise me to learn that the distribution of floating-point numbers has got worse in some manner, but the problem is that we don't currently have any sort of good empirical data about what sort of floating-point bugs people actually care about and how to trigger them, so the distribution is rather guesswork. I'm definitely happy to consider finding this bug less reliably as at least suggestive of an error, if not concrete proof.
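One way to get the empirical data mentioned above is to bucket a large sample from a generator and count how often each class of float appears. This is a hypothetical sketch, not Hypothesis's real API: the bucket names, thresholds, and the naive uniform generator below are all invented for illustration.

```python
import random

# Classify a double into coarse buckets that matter for bug-finding:
# zeros, subnormals, exactly-integer-valued floats, "boring" mid-range
# values, and extreme magnitudes.
def classify(x):
    if x == 0.0:
        return "zero"
    if abs(x) < 2.2250738585072014e-308:   # below the smallest normal double
        return "subnormal"
    if x == int(x) and abs(x) < 2**53:
        return "integer-valued"
    if 1e-3 <= abs(x) <= 1e3:
        return "mid-range"
    return "extreme"

def bucket_counts(sample, n=10_000, seed=0):
    # Draw n values from `sample` and tally which bucket each one lands in.
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        b = classify(sample(rng))
        counts[b] = counts.get(b, 0) + 1
    return counts

# A naive uniform generator essentially never produces subnormals or exact
# integers - the kind of gap this measurement would expose when run against
# two versions of a float strategy.
naive = bucket_counts(lambda rng: rng.uniform(0.0, 1e6))
```

Running this against the 1.11.0 and 1.11.1 strategies side by side would turn "the distribution got worse" from a guess into a diff of bucket frequencies.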
Closed by #816?
The minimization quality part is. I don't think the bug-finding part is, though (it might be - integer floats are now significantly more likely, which I've only just realised I should have highlighted - but I don't know if that's enough).
```python
from hypothesis import given
from hypothesis.strategies import floats, lists

@given(lists(floats(allow_infinity=False, allow_nan=False, min_value=1), min_size=1))
def test_mean(xs):
    mean = sum(x / len(xs) for x in xs)
    assert min(xs) <= mean <= max(xs), mean
```

With Hypothesis 3.31.2, this code always finds the failing example.
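As a stand-alone illustration (not from the thread) of the rounding that test_mean probes: because the test divides before summing, n copies of x / n need not add back up to x. With x = 1.0 and n = 10, each term is the double nearest to 0.1, and ten of those sum to just under 1.0.

```python
# Summing x / n a total of n times does not always reproduce x.
def mean_by_parts(x, n):
    return sum(x / n for _ in range(n))

m = mean_by_parts(1.0, 10)
print(m)        # 0.9999999999999999
print(m < 1.0)  # True: for a list of ten 1.0s, min(xs) <= mean fails
```

This is why a long list of a single repeated value is enough to break the property: any accumulated rounding drift pushes the computed mean strictly outside [min(xs), max(xs)].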
I haven't done a full sensitivity analysis, but in practice to trigger this assertion we have to generate a list of length >= 6 consisting of a single floating-point value. The following test reliably finds the failing example:

```python
from hypothesis import given
from hypothesis.strategies import floats, integers

@given(integers(1, 100), floats(allow_infinity=False, allow_nan=False, min_value=1))
def test_mean_effective(n, x):
    mean = sum(x / n for _ in range(n))
    assert x == mean
```

So implementing swarm testing (#1637) will fix this, but anything less probably won't.
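For context, swarm testing (Groce et al.) means fixing a random subset of "feature" flags per test case, so every value drawn within one test case respects the same flags. The sketch below is purely illustrative - the feature names and drawing logic are invented, not Hypothesis's implementation - but it shows why swarm testing makes an all-equal list of length 6+ (the shape test_mean needs) vastly more likely: a whole test case can have "repeat the previous element" switched on.

```python
import random

# Minimal sketch of the swarm-testing idea: pick feature flags once per test
# case, then draw every element under those flags.
FEATURES = ("subnormals", "integers", "repeat_previous")

def swarm_draw_list(rng, size):
    # Each feature is independently enabled or disabled for the whole list.
    enabled = {f for f in FEATURES if rng.random() < 0.5}
    values = []
    for _ in range(size):
        if "repeat_previous" in enabled and values:
            values.append(values[-1])            # duplicate an earlier element
        elif "subnormals" in enabled:
            values.append(5e-324 * rng.randrange(1, 100))
        elif "integers" in enabled:
            values.append(float(rng.randrange(1, 1000)))
        else:
            values.append(rng.uniform(1.0, 1e6))
    return values
```

Without per-case flags, each element is drawn independently and a six-element list of one repeated value is astronomically unlikely; with them, roughly half of all test cases duplicate aggressively.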
Closing this issue because I don't think there's a good fix short of swarm testing, and we have #1637 for that.
Via @pjdelport in IRC:
There’s an example in an old PyCon UK talk that would reliably find a failing test case in Hypothesis 1.11.0:
If you run this example on Hypothesis 3.6.1, most of the time this won't trigger a failure – or when you do, it isn't as nice (xs=[5e-324, 5e-324] has been found on three independent systems).

According to the changelog, the distribution of floats() changed between 1.11.0 and 1.11.1 – and I can't trigger the example in 1.11.1.

It would be good to understand what caused this example to fall off the radar – and in particular, whether there are other floating-point bugs that are no longer being caught.
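As an aside not in the original report: assuming the old example checks the same min <= mean <= max property as the test_mean quoted earlier in the thread, the xs=[5e-324, 5e-324] failure works through underflow rather than accumulated drift.

```python
# 5e-324 is the smallest positive double (2**-1074). Each term x / len(xs)
# is exactly 2**-1075 in real arithmetic, which is halfway between 0.0 and
# 2**-1074 and rounds to 0.0 under the default round-half-to-even mode.
# The computed mean is therefore 0.0, strictly below min(xs).
xs = [5e-324, 5e-324]
term = xs[0] / len(xs)
mean = sum(x / len(xs) for x in xs)
print(term)                        # 0.0
print(min(xs) <= mean <= max(xs))  # False: the assertion fires
```

That makes it a rather degenerate counterexample - "isn't as nice" as a failure that exercises ordinary rounding on normal floats.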