Avoid importing from dask_cudf.core
#593
@rjzamora adding the following env variable to both
DASK_DATAFRAME__QUERY_PLANNING=False pytest \
DASK_DATAFRAME__QUERY_PLANNING=False python -m pytest -n 8 ./python/cuxfilter/tests
Thanks @AjayThorve! Just to clarify: My primary goal here is to find/fix the problems with
Okay, that makes sense! Thanks for the PR!
I am only seeing a failure in one test now ( As far as I can tell, we are comparing the following output DataFrame:
...to an "expected" DataFrame:
Before I dig in too far here, is it possible that both results are correct? That is, does the order of the rows matter here, or just the relationship between
     )

-    assert res.to_pandas().equals(result.to_pandas())
+    assert_eq(res, result, check_divisions=False, check_index=False)
We can only use check_index=False if the ordering of rows is not important. Not sure if this is the case for calc_connected_edges, but I have a feeling it might be.
Any thoughts @AjayThorve ? Seems like this PR resolves query-planning failures if this test can be modified. Otherwise, I'll need to investigate the differences in row ordering further.
You are right, the ordering is not important for calc_connected_edges, so it's safe to use check_index=False.
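For anyone reading along, here is a rough sketch of the difference between a strict comparison and an order-insensitive one. It uses plain pandas rather than cudf/dask_cudf so it runs without a GPU, and it is not how assert_eq is implemented. The helper name frames_match_ignoring_row_order and the src/dst columns are made up for illustration.

```python
import pandas as pd


def frames_match_ignoring_row_order(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if both frames hold the same rows in any order.

    The index is ignored entirely; only column names and row contents count.
    """
    if sorted(left.columns) != sorted(right.columns):
        return False
    cols = sorted(left.columns)
    # Sort both frames by every column, then drop the original index so that
    # neither row order nor index labels affect the comparison.
    left_sorted = left[cols].sort_values(cols).reset_index(drop=True)
    right_sorted = right[cols].sort_values(cols).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


# Illustrative edge list (made-up data, not the output of calc_connected_edges).
expected = pd.DataFrame({"src": [0, 1, 2], "dst": [1, 2, 0]})
shuffled = expected.iloc[[2, 0, 1]]          # same rows, different order
different = expected.assign(dst=[1, 2, 2])   # one row actually changed

strict_equal = expected.equals(shuffled)     # strict check is order/index-sensitive
relaxed_equal = frames_match_ignoring_row_order(expected, shuffled)
relaxed_different = frames_match_ignoring_row_order(expected, different)
```

The strict check rejects the shuffled frame even though the rows are identical, while the relaxed check accepts it and still rejects the frame whose contents changed. That is the trade-off being accepted by relaxing the index check in this test.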
LGTM! Thanks for this @rjzamora
@rjzamora should I go ahead and merge this? LGTM.
Seems fine to me as long as you think the
/merge
This PR is intended to resolve test failures related to the recent dask-expr migration in dask.dataframe. I noticed that cuxfilter was importing from dask_cudf.core (no longer allowed when query-planning is enabled). There may be other issues as well.