New Dask Arrow-based strings cause test failures #335
Comments
Please feel free to edit and fill this issue out Marvin 🙂 Just needed a placeholder for tracking
Would also be good to make a note in the issue where feedback is being collected: dask/dask#10139. Ideally with a simple reproducer
Despite the passing tests, potentially users who installed dask-image over conda would still experience the above described problem when using Perhaps a suitable fix for this on the Edit: dask-image/dask_image/ndmeasure/__init__.py Lines 243 to 247 in 67540af
and dask-image/dask_image/ndmeasure/_utils/_find_objects.py Lines 68 to 74 in 67540af
fixes the errors.
I've made a comment here, but no reproducer (I'm not planning to do more work on this, it's open for anyone who wants it)
Yeah, I think we are not seeing this in CI as it requires a newer version of Dask than we are testing. Perhaps we should upgrade one of the CI environments (like 3.11) to a very recent Dask version.
Tbh I've not looked deeply into the Dask Arrow work. Have heard about it mainly in passing. So not sure how
Should add this pain point is not unique to us. We had to disable this feature in Dask-SQL recently as well (dask-contrib/dask-sql#1206). Unclear whether this is due to upstream bugs or if we need to make changes
We could add an "upstream" CI environment, that just uses whatever the latest (or even pre-release?) versions are, maybe?
There are Dask nightly packages. So that would be easy to add
As far as I understand, for reproducing the test failure in CI, next to a recent dask version we'd need
Is this still an issue with recent Dask releases? Asking as they may have fixed something upstream since this occurred
I'm not seeing any flaky/failing tests, so I don't think this is still happening currently. I'll close the issue, and if it pops up again we can re-open.
Yeah I think part of the issue before was CI doesn't capture this edge case. Though maybe it should
Reopening as this issue just came up in #355.
Think this may have been fixed in the intervening time. The test suite no longer fails for me |
(Edited by @m-albert)
In the presence of pyarrow, dask by default assumes dataframe columns of dtype object to be pyarrow strings (see dask/dask#10139 (comment)). This creates problems revealed by failing tests (e.g. test_dask_image/test_ndmeasure/test_find_objects.py::test_3d_find_objects):
dask-image/dask_image/ndmeasure/_utils/_find_objects.py
Lines 68 to 70 in 67540af
dd.from_delayed(df1, meta=meta).compute().dtypes
Working install:
Failing install:
The failing test came up when releasing v2023.08.0 in conda-forge/dask-image-feedstock#14.
@jakirkham found that pyarrow is installed with the conda distribution of dask, but not when installing via pip, where it is just part of the [complete] target.
@jakirkham also found that the conflicting behaviour described above can be turned off using the dask configuration. He did this for the tests performed by the dask-image conda feedstock on v2023.08.0.
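As a minimal sketch of such a configuration change: the issue text does not name the option, so the key `dataframe.convert-string` used below is an assumption about which setting is meant, and the exact way the feedstock applied it may differ.

```python
import dask

# Assumption: "dataframe.convert-string" is the option referred to above.
# Setting it to False asks dask not to convert object columns to pyarrow strings.
dask.config.set({"dataframe.convert-string": False})

# Read the value back to confirm that the setting is active in this process.
convert_string = dask.config.get("dataframe.convert-string")
print("dataframe.convert-string =", convert_string)
```

Called like this, outside a `with` block, the setting applies for the rest of the Python process, so it would need to run before the test suite builds any dask dataframes.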