Bizarre quirk with how Hypothesis formats the error message for falsifying examples #4183

Closed
tmaxwell-anthropic opened this issue Nov 22, 2024 · 3 comments · Fixed by #4239
Labels: bug (something is clearly wrong here), legibility (make errors helpful and Hypothesis grokable)

Comments

tmaxwell-anthropic commented Nov 22, 2024

Consider this test:

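import hypothesis
from hypothesis import strategies as st

# Shared across examples, so the test fails on every fifth call.
counter = 0

@hypothesis.given(dummy=st.text())
def test_that_prints_rich_error(dummy: str):
    global counter
    counter += 1
    if counter % 5 == 0:
        raise ExceptionGroup(
            "outer exceptiongroup", [ZeroDivisionError("this is a fake error")]
        )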

It will fail nondeterministically, of course. Hypothesis reports a rich error message describing the failure:

...
  |   File "/opt/homebrew/Caskroom/miniforge/base/envs/py311/lib/python3.11/site-packages/hypothesis/core.py", line 1722, in wrapped_test
  |     raise the_error_hypothesis_found
  | hypothesis.errors.FlakyFailure: Hypothesis test_that_prints_rich_error(dummy='\x02\x1d\U000c4bc0') produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
  | Falsifying example: test_that_prints_rich_error(
  |     dummy='\x02\x1d\U000c4bc0',
  | )
  | Failed to reproduce exception. Expected:
  | + Exception Group Traceback (most recent call last):
  |   |   File "/Users/tmaxwell/code/anthropic/belt/tests/test_hypothesis_utils.py", line 399, in test_that_prints_rich_error
  |   |     raise ExceptionGroup(
  |   | ExceptionGroup: outer exceptiongroup (1 sub-exception)
  |   +-+---------------- 1 ----------------
  |     | ZeroDivisionError: this is a fake error
  |     +------------------------------------
  |
  | Explanation:
  |     These lines were always and only run by failing examples:
  |         /Users/tmaxwell/code/anthropic/belt/tests/test_hypothesis_utils.py:399
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    ...

Now consider this one:

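import hypothesis
from hypothesis import strategies as st

# Shared across examples, so the test fails on every fifth call.
counter = 0

@hypothesis.given(dummy=st.text())
def test_that_does_not_print_rich_error(dummy: str):
    global counter
    counter += 1
    if counter % 5 == 0:
        try:
            raise ZeroDivisionError("this is a fake error")
        except* Exception as e:
            # The bare raise propagates an implicit ExceptionGroup.
            raise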

This also fails nondeterministically, but with a much less helpful message:

...
  |   File "/opt/homebrew/Caskroom/miniforge/base/envs/py311/lib/python3.11/site-packages/hypothesis/core.py", line 1722, in wrapped_test
  |     raise the_error_hypothesis_found
  | ExceptionGroup:  (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/tmaxwell/code/anthropic/belt/tests/test_hypothesis_utils.py", line 389, in test_that_does_not_print_rich_error
    |     raise ZeroDivisionError("this is a fake error")
    | ZeroDivisionError: this is a fake error
    +------------------------------------

What is the difference...? I feel like I'm going insane.

This is with Hypothesis v6.112.1.

tmaxwell-anthropic (Author) commented

I tried @settings(phases=(Phase.generate,)) in case it was somehow related to the two tests having different values in the example database, but that didn't change anything. Also, it occurs pretty reliably even if I change other aspects of the test. (I found this by bisecting a much more complicated example.)

Zac-HD added the bug (something is clearly wrong here) and legibility (make errors helpful and Hypothesis grokable) labels on Nov 22, 2024
jobh (Contributor) commented Jan 8, 2025

This seems to happen because the implicit ExceptionGroup in case 2 has an empty traceback, so filepath here ends up as hypothesis/core.py (and the error is hence treated as fatal):

# If an unhandled (i.e., non-Hypothesis) error was raised by
# Hypothesis-internal code, re-raise it as a fatal error instead
# of treating it as a test failure.
filepath = traceback.extract_tb(e.__traceback__)[-1][0]
if is_hypothesis_file(filepath) and not isinstance(e, HypothesisException):
    raise
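
To illustrate why the last traceback entry matters for a check like this, here is a minimal stdlib-only sketch. It is not Hypothesis code: user_test, runner_calls_test, runner_raises_itself and innermost_frame_name are hypothetical names, and it compares function names rather than file paths because everything lives in one file.

```python
import traceback


def innermost_frame_name(exc):
    """Name of the function in the last traceback entry, or None if there is none."""
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[-1].name if frames else None


def user_test():
    # Hypothetical test body: the exception is raised here, so this frame
    # becomes the last traceback entry.
    raise ZeroDivisionError("this is a fake error")


def runner_calls_test():
    # Hypothetical stand-in for library code that runs the test.
    try:
        user_test()
    except ZeroDivisionError as exc:
        return exc


def runner_raises_itself():
    # Same exception type, but raised by the runner: no user_test frame exists.
    try:
        raise ZeroDivisionError("this is a fake error")
    except ZeroDivisionError as exc:
        return exc


from_test = runner_calls_test()
from_runner = runner_raises_itself()
print(innermost_frame_name(from_test))    # user_test
print(innermost_frame_name(from_runner))  # runner_raises_itself
```

An exception whose traceback lacks the test's own frame is attributed to whoever called the test, which is the situation described above.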

jobh (Contributor) commented Jan 8, 2025

Empty was the wrong word. It would be more precise to say that the implicit ExceptionGroup, according to the traceback, seems to be constructed by the caller (Hypothesis) rather than the raiser (the method under test).

I'm not planning to follow this up right now, but at least a pointer in the right direction.
