Add an option to not fail on value-based casting errors #113
Comments
Well ideally we don't test for strictness already! But likely we haven't got this right 100% of the time—any examples off the top of your head? (excluding signature tests, which are being reworked—noted strictness in #110 (comment))
Thanks @honno, that's good to know! I have a few errors for Dask that I thought were because of strictness, but I may be wrong. I will have another look at them and report back.
Most of the errors seem to be to do with value-based casting differences. I've pasted a failing Array API test below, but here's a summary:

```python
import numpy as np
import numpy.array_api as nxp

assert (np.array([1], dtype=np.uint8) + 256).dtype == np.uint16  # 256 cannot fit in uint8, so promoted to uint16
assert (nxp.asarray([1], dtype=nxp.uint8) + 256).dtype == nxp.uint8
```

I wonder if it would be possible to have a flag to ignore failures due to value-based casting differences? They can occur in many of the tests in […]. This would help when running the tests against the main NumPy namespace to check compliance there too. Also, there's work in numpy/numpy#21103 to change the behaviour of NumPy, but it's in an early stage.
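For context, a minimal sketch of the value-based casting at play here, assuming NumPy's legacy (pre-NEP 50) promotion rules; results differ on NumPy ≥ 2.0:

```python
import numpy as np

# Under legacy promotion, a Python scalar's *value* (not its type alone)
# decides the result dtype when mixed with an array dtype:
print(np.result_type(np.uint8, 255))  # uint8  (255 fits in uint8)
print(np.result_type(np.uint8, 256))  # uint16 (256 does not fit)

# Between two non-0-d arrays, only the dtypes matter:
print(np.result_type(np.array([1], dtype=np.uint8),
                     np.array([1], dtype=np.uint32)))  # uint32
```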
Ok @tomwhite, I've puzzled on this for a bit, but think I understand now 😅 Please clarify anything I might have misunderstood!
So in this scenario, we don't test the "logic" of […]:

(array-api-tests/array_api_tests/test_operators_and_elementwise_functions.py, lines 210 to 213 at 9816011)

i.e. if the reference implementation (in this case […]) […]. But I do think […].

In any case, this is a problem for testing Dask right now, and as you say, for testing NumPy-proper. I should explore reworking how elements are generated again—as you see, we generate them naively (for the most part), and throw away bad examples after-the-fact. IIRC, I couldn't find an easy way of never generating inputs which aren't quite valid for many of the Array API operations, but since then we've gone through a nice refactor that might lead to a more manageable solution. (FYI testing NumPy-proper won't work in the test suite upstream, but you can use #112.)

Hmm just to clarify, that failing […]:

```python
>>> from dask import array as da
>>> x1 = da.asarray(257, dtype="uint32")
>>> x2 = da.asarray([1], dtype="uint8")
>>> int(da.bitwise_and(x1, x2))
ValueError: ...
```

But if the right argument was a 0d array as opposed to a 1d array, this works as expected:

```python
>>> x3 = da.asarray(1, dtype="uint8")
>>> out = da.bitwise_and(x1, x3)
>>> out
dask.array<bitwise_and, shape=(), dtype=uint32, chunksize=(), chunktype=numpy.ndarray>
>>> int(out)
1
```

Is the issue here actually to do with broadcasting behaviour? 0d arrays acting as NumPy "scalars", and so inversely being compliant? Or is this actually a value-based casting issue? Another example like this, I think, can be seen in […]:

```python
>>> int(da.add(da.asarray(255, dtype="uint32"), da.asarray(1, dtype="uint8")))
256
>>> int(da.add(da.asarray(255, dtype="uint32"), da.asarray([1], dtype="uint8")))
0  # should be 256 - the final array is uint32 but I'm guessing uint8 as an intermediate array caused problems
```
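For reference, Dask presumably inherits this from NumPy's legacy (pre-NEP 50) promotion, under which a 0d array participates in value-based casting like a scalar. A minimal NumPy-proper sketch; the outputs shown assume legacy promotion (NumPy ≥ 2.0 gives uint32 and 256 in both cases):

```python
import numpy as np

# 0-d uint32 with value 255: legacy promotion treats it like a scalar,
# and 255 fits in uint8, so the result dtype is uint8 and 255 + 1 wraps.
out = np.asarray(255, dtype="uint32") + np.asarray([1], dtype="uint8")
print(out.dtype, int(out[0]))  # uint8 0

# A 1-d uint32 array instead: only the dtypes decide the result.
out = np.asarray([255], dtype="uint32") + np.asarray([1], dtype="uint8")
print(out.dtype, int(out[0]))  # uint32 256
```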
Thanks for investigating @honno. Sorry I haven't been able to characterise the problem very clearly.
That would be great. In general, testing that value-based casting is not happening is what we want (since that's what the spec says), but the ability to disable that check across all tests would be handy.
I think that could be a problem, even if it's not exactly the one that I'm hitting here. BTW I've created another branch of Dask for the Array API implementation. It includes a workflow to run it here to track how the implementation is going.
I should have clarified—what seems to be the problem is that for both NumPy-proper and your […]. And yep, if there still end up being areas where Dask has erroneous behaviour due to value-based casting for in-scope scenarios, I can see the utility of such a flag. I'm pretty sure this is going to be a problem with NumPy-proper, due to internal "NumPy scalar" shenanigans. But yeah, we can revisit this after I sort out the aforementioned issue.
Sounds great - thanks @honno!
@tomwhite Okay, I see why you weren't sure if what I was saying was the problem you were facing heh, as it doesn't look like fixing the problem I identified (testing output dtypes/shapes from unspecified operations) does much for your failing […]. Soooo, to go back to the failing […]
Like @honno said, the test suite shouldn't be checking anything that isn't explicitly required by the spec. If it is, that's a bug in the test suite. If you want to implement "strictness" in your library, you'll have to add tests for it in your own test suite (e.g., virtually all of the numpy.array_api tests are for strictness, leaving the other tests to the array_api_tests test suite).
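For instance, the spec leaves mixed-kind array promotion unspecified, and numpy.array_api chooses to raise there; a hypothetical library-side strictness test (not one from either suite) could pin that choice down:

```python
import pytest
import numpy.array_api as xp  # available in NumPy 1.22-1.26

def test_mixed_kind_promotion_raises():
    # Stricter than the spec: uint8 + float64 promotion is unspecified
    # there, so numpy.array_api raises rather than silently picking a dtype.
    with pytest.raises(TypeError):
        xp.asarray([1], dtype=xp.uint8) + xp.asarray([1.0], dtype=xp.float64)
```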
"Value-based casting" is a pretty broad term. NumPy has somewhat specific instances where it applies it (with some exceptions), and presumably Dask follows NumPy (?), but there's no reason some other library might be implementing value-based casting in other ways. Really there's not a distinction in the test suite between "value-based casting" and "correct type promotion". The only difference is that "value-based casting" might be different depending on the value, but that's just a question of which values hypothesis happens to generate for the example arrays. So it sounds like what you would really want is either a) more general control either over what types of inputs hypothesis generates as inputs to the tests (i.e., never generate 0-D inputs, because you know those have value-based casting), or b) a better ability to ignore certain checks (like tests for type promotion) while still being able to test other things like shapes and other semantics. Both of these are somewhat challenging to do. The first I would say is probably more challenging just due to the way hypothesis works, unless @honno can think of any clever ways to work around it. The best thing I can think of would be to manually patch the hypothesis_helpers file and/or the @given decorators for specific tests to restrict the inputs. For instance, if you know a test will fail for 0-D inputs but still want it to run on non-0-D inputs you could manually add some The second would require refactoring the tests so that instead of having one test function per spec function, we have multiple test functions, effectively one per assert, so that you can easily disable the checks that you already know won't work (like type promotion) while still checking the others. This would be a pretty significant reworking of the test suite, and it's one we've resisted doing because it would add a lot of complexity to the code. It's also hard because sometimes the checks actually depend on each other, so it's not as simple as just splitting a test function into multiple functions. Also, as I noted above, hypothesis does report individual errors separately, and we've specifically given each assertion error a different message so they can be reported distinctly. So as long as some inputs can pass the first checks in a test function to get to the later ones, you will see errors from those later ones. |
Thanks for the detailed response @asmeurer. I can see that it's not particularly straightforward to control for the failures in these type promotion edge cases.

I've run the test suite a few more times against Dask with a higher setting for max examples, which has found a few more of these edge cases, and there aren't as many as I first feared. (Current list of tests to skip: https://github.com/tomwhite/dask/blob/60405597c2777f0a40216cc1f517fab987591ce0/.github/workflows/array-api.yml#L116-L145.) So in light of your second point (that hypothesis does report individual errors separately), I think the current behaviour is actually fine, and I'd be happy to close this issue.
Original issue

The new (and very useful!) document on Array API Standard Compatibility describes a type of difference (from regular NumPy) called strictness, for "[things that] aren’t actually required by the spec, and other conforming libraries may not follow them".

As implementations start implementing the spec, it would be useful to be able to run the tests with a flag saying: don't fail on strictness errors.
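For what it's worth, a hypothetical sketch of what such an opt-out could look like via pytest hooks in the suite's conftest.py; the flag and marker names are made up for illustration and are not existing options:

```python
# conftest.py (illustrative only; not an existing array-api-tests option)
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--disable-strictness",
        action="store_true",
        help="skip checks that go beyond what the spec requires",
    )

def pytest_collection_modifyitems(config, items):
    if not config.getoption("--disable-strictness"):
        return
    skip = pytest.mark.skip(reason="strictness checks disabled")
    for item in items:
        # Skip any test tagged with a (hypothetical) strictness marker.
        if "strictness" in item.keywords:
            item.add_marker(skip)
```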