Failing health check #115
Comments
@mattbarrett98 Do you know which test case this was for? This is most likely a problem on our end!
It seems to pop up for a variety of functions. From memory I've seen it for test_reshape, test_inv, test_remainder and I think I'm probably forgetting a few. Would it be possible to add a flag to allow disabling of the health check? Thanks! |
Okay I just tried testing with jax and the test suite won't work with it anyway (missing namespaced dtypes e.g. Anywho, I can't figure out why say a test like
Not keen on this unless it becomes a common use case, for now see https://hypothesis.readthedocs.io/en/latest/settings.html (e.g. you could decorate the failing tests with |
We're diving deeper into what the exact issue is right now. We'll try to provide a minimal example ASAP. For a bit more context, this is an example commit where the health check has failed, tested against this Array API test suite commit, with this unit test. A stack trace for the failure is as follows:
Before continuing the discussion, we'll determine whether Ivy-specific inefficiencies are causing the failures, or perhaps JAX-specific slowdowns on the first forward pass during the JIT compilation or something similar. We'll sync back when we know a bit more. |
The health checks are there for a reason, so it would be a bad idea to just ignore them. The most likely cause is that our test generation strategies are written poorly in some way. Hypothesis health checks are also very often a symptom of a much more serious problem, like a strategy that doesn't actually generate what we thought it did. So it's always a good idea to dig into this when it happens. Of course, it could just be that the array library is slower than hypothesis expects. If we determine that's really all that's going on here, it might make sense to modify the hypothesis array_api submodule to be smarter about this. It looks like this is the same test as in #117, which has some other problems (by all accounts it isn't generating what it should be), so there really is some more serious issue going on here.
Perhaps it was confusing to show you the health check for matrix_power which has another issue. Here's another example for reshape
and we experience the same thing for other functions, always with JAX. We never experience issues with NumPy, torch or tensorflow.
My guess is the unhealthy generation is coming from how |
I have done some timings of just the execution of inv in
and when using JAX the timings and inputs look like this:
Every time a new shape or a new dtype is used, presumably JAX has to jit compile again resulting in very slow runs. The last time (0.4795) is about 580 times slower than the previous already compiled run (0.0008). You can see that when a shape and dtype is reused, the execution is much quicker. Naturally hypothesis will cover many different shapes and types, meaning that JAX is constantly recompiling functions. For comparison, these are the first few timings when using NumPy:
The average time here being about 0.0003, 500x faster than JAX's average of 0.15, which seems sufficient to cause hypothesis' speed concerns. The health checks being caused by JAX's frequent jit recompilation is also supported by the fact that we don't experience issues with NumPy, tensorflow or torch. |
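For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a timing harness that groups call times by input signature. It is not the script used above: `time_by_signature` and `fake_jitted_sum` are hypothetical names, and the stand-in function only simulates a one-off compilation cost with a sleep so the example runs without JAX installed.

```python
import time
from collections import defaultdict


def time_by_signature(fn, inputs, key):
    """Call fn on each input and group the elapsed times by key(input)."""
    timings = defaultdict(list)
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        timings[key(x)].append(time.perf_counter() - start)
    return dict(timings)


# Stand-in for a JIT-compiled function (an assumption for illustration):
# it pays a simulated compilation cost the first time it sees a new length.
_compiled_lengths = set()


def fake_jitted_sum(xs):
    n = len(xs)
    if n not in _compiled_lengths:
        time.sleep(0.05)  # simulated compilation
        _compiled_lengths.add(n)
    return sum(xs)


inputs = [[1.0] * 2, [1.0] * 2, [1.0] * 3, [1.0] * 3, [1.0] * 2]
timings = time_by_signature(fake_jitted_sum, inputs, key=len)
for signature, times in timings.items():
    print(signature, [f"{t:.4f}" for t in times])
```

To time a real backend, the stand-in could be swapped for the function under test with something like `key=lambda x: (x.shape, x.dtype)`. With JAX, results are dispatched asynchronously, so the timed call would probably need to wait on the result (for example via `block_until_ready()`) to give meaningful numbers.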
Ah thanks for the hard numbers, that's really good to know. There's still a problem of faulty array creation/manipulation (i.e. why are there invalid examples for Ideally we'd wait and see how an official JAX Array API namespace works, but no idea when that'll happen. As @asmeurer said we really really don't want to ignore health checks—maybe a slightly-less-worse solution is introducing a flag which can change |
Makes sense 🙂 Although for some like
so this might just be happening for the reason explained before. When I add
to the code it resolves the issue. This has the effect of disabling only the too_slow health check. Adding a flag for deadline may be useful as well, but this doesn't resolve the errors we get related to the too_slow health check. Would it not also be possible to add a flag which allows the user to disable specific health checks? In our case this would enable us to disable just the too_slow health check and leave all others enabled (and just leave everything enabled when JAX isn't the backend). As long as the default behaviour for the test suite is that all health checks are enabled, would that be okay?
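A rough sketch of what suppressing only the too_slow check for one backend could look like, using Hypothesis's `settings` and `HealthCheck`. The helper `settings_for_backend` and the toy test are hypothetical and are not part of the test suite; this is just one way to express the idea.

```python
from hypothesis import HealthCheck, given, settings, strategies as st


def settings_for_backend(backend_name):
    """Return Hypothesis settings for a backend (hypothetical helper)."""
    if backend_name == "jax":
        # Suppress only too_slow; every other health check stays enabled.
        return settings(suppress_health_check=[HealthCheck.too_slow])
    # Otherwise fall back to whatever the active Hypothesis profile specifies.
    return settings()


jax_settings = settings_for_backend("jax")


@jax_settings
@given(st.lists(st.integers(), max_size=5))
def test_reverse_twice(xs):
    # Toy property, only here so the decorator has something to wrap.
    assert list(reversed(list(reversed(xs)))) == xs
```

The same `suppress_health_check` value can also be passed to `settings.register_profile` so that it applies to a whole run rather than to individual tests.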
Note even 1 invalid example should very rarely happen for a first-party Hypothesis strategy, and I don't see how it could for
As a last resort 😅 |
okay thanks 🙂 I've just run test_reshape against the NumPy array api (not with Ivy) to see how it compares with invalid examples. I get
The test statistics show 196 invalid examples. Should this not be occurring?
I'm guessing all that filtering is from our custom strategies, like here
and everything we use from |
So we should expect some invalid examples to occur? Earlier you were questioning why this was happening.
We expect invalid examples for our custom strategies, certainly. But I'm assuming* the ones you saw were from
*seems quite likely for multiple reasons, but yes, this all needs a proper review
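To illustrate how filtering in a custom strategy leads to rejected draws (which can surface as invalid examples in the statistics), here is a small sketch. The predicate, the counter, and the toy test are made up for illustration and are unrelated to the suite's actual strategies.

```python
from hypothesis import given, strategies as st

# Counts how often the filter predicate is consulted and how often it passes.
attempts = {"seen": 0, "kept": 0}


def is_even(n):
    attempts["seen"] += 1
    ok = n % 2 == 0
    attempts["kept"] += int(ok)
    return ok


# Roughly half of the drawn integers are rejected by the filter.
even_integers = st.integers(min_value=0, max_value=100).filter(is_even)


@given(even_integers)
def test_even(n):
    assert n % 2 == 0


test_even()
print(attempts)
```

Running a suite under pytest with `--hypothesis-show-statistics` reports the invalid example counts per test, which is presumably where numbers like the 196 above come from.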
@mattbarrett98 Yeah so running your FYI to run
import ivy
ivy.set_framework("jax")
params = [pytest.param(ivy, make_strategies_namespace(ivy), id="ivyjax")]
Now this could very well be a hard limitation of JAX's use of JIT, where we might struggle without indeed introducing a flag like
This all makes sense, thanks for looking into it 🙂 I've actually just found a way to disable the too_slow health check on our end to allow us to just focus on checking the functionality. We no longer seem to have any issues with health checks. Thanks again! |
HypothesisWorks/hypothesis#3369 might be relevant, will explore but low prio. |
E hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 6 valid examples in 1.04 seconds (14 invalid ones and 2 exceeded maximum size). Try decreasing size of the data you're generating (with e.g. max_size or max_leaves parameters).
E See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.
When testing against JAX, this error pops up every now and then. Is there an easy way to disable this health check?