Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cudf.pandas failure on test_convert_matrix_order_cuml_array #5882

Conversation

dantegd
Copy link
Member

@dantegd dantegd commented May 3, 2024

Error came from the fact that pandas and cudf convert to numpy by default with different order.

Towards fixing #5876

@github-actions github-actions bot added the Cython / Python Cython or Python issue label May 3, 2024
@dantegd dantegd added bug Something isn't working non-breaking Non-breaking change labels May 3, 2024
@dantegd
Copy link
Member Author

dantegd commented May 6, 2024

Just confirmed that this PR indeed fixes the test_convert_matrix_order_cuml_array test in the new optional integration jobs

@dantegd dantegd marked this pull request as ready for review May 6, 2024 22:41
@dantegd dantegd requested a review from a team as a code owner May 6, 2024 22:41
@betatim
Copy link
Member

betatim commented May 8, 2024

I think the test stopped failing. Out of interest, how did you check it stopped failing (in CI)? I opened https://github.com/rapidsai/cuml/actions/runs/8976060393/job/24678633926?pr=5882 and looked at the output/scrolled to the end of the output and then searched for test_convert_matrix_order_cuml_array. The search found nothing, which I think means it passed

@betatim
Copy link
Member

betatim commented May 8, 2024

Making sure I understand the change: you added an explicit step to the input handling that changes the order for pandas. As a result it now does the same thing as cudf. Which means we don't need two separate if clauses in the test anymore?

# by default pandas converts to numpy 'C' major, which
# does not keep the original order
if order == "K":
X = X.reshape(X.shape, order="F")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that you could change the order via reshape and was wondering whether or not this leads to a copy of the data. So I played around a bit to see it happen with my own eyes.

df = pd.DataFrame({'a': [1,2,3], 'b':[11,22,33]})
X = df.to_numpy()
print(X.flags) # C_CONTIGUOUS is false, F_CONTIGUOUS is true
X = X.reshape(X.shape, order="F")
print(X.flags) # C_CONTIGUOUS is false, F_CONTIGUOUS is true :-/

Which left my puzzled. Is my assumption of what should be happening wrong? There is np.asfortranarray() which does change C to F, but at the cost of making a copy of the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, got myself confused. For me the result of to_numpy() is already F contiguous?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the result can be inconsistent depending on the wrapping that cudf is doing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, it should have no cost if it's already in the correct order

@dantegd dantegd changed the base branch from branch-24.06 to branch-24.08 June 3, 2024 23:42
@dantegd
Copy link
Member Author

dantegd commented Jun 6, 2024

/merge

@rapids-bot rapids-bot bot merged commit dc32b02 into rapidsai:branch-24.08 Jun 6, 2024
59 of 62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants