-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cudf.pandas
failure on test_convert_matrix_order_cuml_array
#5882
Fix cudf.pandas
failure on test_convert_matrix_order_cuml_array
#5882
Conversation
…rix_order_cuml_array
Just confirmed that this PR indeed fixes the |
I think the test stopped failing. Out of interest, how did you check it stopped failing (in CI)? I opened https://github.com/rapidsai/cuml/actions/runs/8976060393/job/24678633926?pr=5882 and looked at the output/scrolled to the end of the output and then searched for |
Making sure I understand the change: you added an explicit step to the input handling that changes the order for pandas. As a result it now does the same thing as cudf. Which means we don't need two separate |
# by default pandas converts to numpy 'C' major, which | ||
# does not keep the original order | ||
if order == "K": | ||
X = X.reshape(X.shape, order="F") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that you could change the order via reshape
and was wondering whether or not this leads to a copy of the data. So I played around a bit to see it happen with my own eyes.
df = pd.DataFrame({'a': [1,2,3], 'b':[11,22,33]})
X = df.to_numpy()
print(X.flags) # C_CONTIGUOUS is false, F_CONTIGUOUS is true
X = X.reshape(X.shape, order="F")
print(X.flags) # C_CONTIGUOUS is false, F_CONTIGUOUS is true :-/
Which left my puzzled. Is my assumption of what should be happening wrong? There is np.asfortranarray()
which does change C to F, but at the cost of making a copy of the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got myself confused. For me the result of to_numpy()
is already F contiguous?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like the result can be inconsistent depending on the wrapping that cudf is doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, it should have no cost if it's already in the correct order
…rix_order_cuml_array
/merge |
Error came from the fact that pandas and cudf convert to numpy by default with different order.
Towards fixing #5876