
Distribution of floats() has regressed its ability to find bugs since 1.11.0 #469

Closed
alexwlchan opened this issue Feb 19, 2017 · 7 comments
Labels
enhancement it's not broken, but we want it to be better

Comments

@alexwlchan
Contributor

Via @pjdelport in IRC:

There’s an example in an old PyCon UK talk that would reliably find a failing test case in Hypothesis 1.11.0:

import math
from hypothesis import given, assume, example
from hypothesis.strategies import lists, floats


def mean(xs):
    return sum(x / len(xs) for x in xs)

@given(lists(floats(), min_size=1))
def test_mean(xs):
    assume(not any(math.isnan(x) or math.isinf(x) for x in xs))
    assert min(xs) <= mean(xs) <= max(xs)
xs = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

    @example([1.0] * 6)
    @given(lists(floats(), min_size=1))
    def test_mean(xs):
        assume(not any(math.isnan(x) or math.isinf(x) for x in xs))
>       assert min(xs) <= mean(xs) <= max(xs)
E       assert 1.0 <= 0.9999999999999999
E        +  where 1.0 = min([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
E        +  and   0.9999999999999999 = mean([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

means.py:17: AssertionError
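The arithmetic behind that traceback can be checked directly: each `1.0 / 6` term carries a rounding error, and six additions leave the total one ulp below 1.0 (a standalone sketch of the `mean()` above):

```python
def mean(xs):
    # Same divide-then-sum definition as in the test above.
    return sum(x / len(xs) for x in xs)

xs = [1.0] * 6
m = mean(xs)
# Each 1.0/6 is rounded, and six additions leave the sum just below 1.0,
# so min(xs) <= m fails.
assert m == 0.9999999999999999
assert not (min(xs) <= m <= max(xs))
```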

If you run this example on Hypothesis 3.6.1, most of the time it won’t trigger a failure - and when it does, the example isn’t as nice (xs=[5e-324, 5e-324] has been found on three independent systems).

According to the changelog, the distribution of floats() changed between 1.11.0 and 1.11.1 – and I can’t trigger the example in 1.11.1.

It would be good to understand what caused this example to fall off the radar – and in particular, whether there are other floating-point bugs that are no longer being caught.

@Zac-HD
Member

Zac-HD commented Mar 23, 2017

I think you're actually identifying two separate issues:

  1. The simplified example is harder to understand

Minimisation is hard, etc. - but I actually think the xs=[5e-324, 5e-324] example is clearer. With ones, you have to understand a buildup of imprecision; with large values you have to recognise overflow to infinity. Basically, I think the 'shorter lists are better' heuristic is a good one.

E       assert inf <= 5e-324
E        +  where 5e-324 = min([5e-324, 5e-324])
E        +  and   inf = mean([5e-324, 5e-324])

This doesn't look like an issue to me.
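Incidentally, with the divide-then-sum `mean()` from the original report, this particular two-element pair fails through underflow rather than overflow: 5e-324 is the smallest positive subnormal double, and halving it rounds to zero (a quick sketch, assuming that `mean()` definition):

```python
def mean(xs):
    # Divide-then-sum, as in the original report.
    return sum(x / len(xs) for x in xs)

xs = [5e-324, 5e-324]  # 5e-324 is the smallest positive subnormal double
# 5e-324 / 2 lies exactly halfway between 0.0 and 5e-324;
# round-half-to-even picks 0.0, so each term underflows.
assert 5e-324 / 2 == 0.0
assert mean(xs) == 0.0
assert not (min(xs) <= mean(xs) <= max(xs))
```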

  2. Failing examples are much less common than in earlier versions

While we obviously want failing examples, I think the effectiveness of the distribution of examples is influenced more by the test function than the distribution. If the new distribution is better at finding logic bugs but worse at finding floating-point edge cases, IMO that's a net win. Of course, it would be best to do both, so the real question is 'what parts of the floating-point distribution are not being explored any more?'

@DRMacIver
Member

Yeah, now that I think about it (@alexwlchan and I had talked about this before filing), the failing test case presented is actually absolutely correct by modern Hypothesis's heuristic that less data is always better. It was correct under the old heuristics as well, but for historical reasons old Hypothesis would never have found that shrink.

RE the distribution: it would not surprise me to learn that the distribution of floating-point numbers has got worse in some manner, but the problem is that we don't currently have any good empirical data about what sorts of floating-point bugs people actually care about or how to trigger them, so the distribution is largely guesswork. I'm definitely happy to treat finding this bug less reliably as at least suggestive of an error, if not concrete proof.

@Zac-HD
Member

Zac-HD commented Sep 8, 2017

Closed by #816?

@DRMacIver
Member

The minimization-quality part is. I don't think the bug-finding part is, though it might be - integer-valued floats are now significantly more likely, which I've only just realised I should have highlighted - but I don't know if that's enough.

@Zac-HD
Member

Zac-HD commented Oct 4, 2017

from hypothesis import given
from hypothesis.strategies import floats, lists


@given(lists(floats(allow_infinity=False, allow_nan=False, min_value=1), min_size=1))
def test_mean(xs):
    mean = sum(x / len(xs) for x in xs)
    assert min(xs) <= mean <= max(xs), mean

With Hypothesis 3.31.2, this code always finds the [1.0] * 6 counterexample - but if the min_value=1 is removed, it doesn't. Setting min_value=0.9 finds a list of 0.9s as the falsifying example; setting it to zero finds no falsifying example at all.

@Zac-HD
Member

Zac-HD commented Oct 19, 2018

I haven't done a full sensitivity analysis, but in practice to trigger this assertion we have to generate a list of length >=6 consisting of a single repeated floating-point value. The following test reliably finds the [1.0] * 6 example as of Hypothesis 3.78.0, and is numerically equivalent:

from hypothesis import given
from hypothesis.strategies import floats, integers


@given(integers(1, 100), floats(allow_infinity=False, allow_nan=False, min_value=1))
def test_mean_effective(n, x):
    mean = sum(x / n for _ in range(n))
    assert x == mean

So implementing swarm testing (#1637) will fix this, but anything less probably won't.

@Zac-HD Zac-HD added the enhancement it's not broken, but we want it to be better label Oct 19, 2018
@Zac-HD
Member

Zac-HD commented Jul 9, 2019

Closing this issue because I don't think there's a good fix short of swarm testing, and we have #1637 for that.
