Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass columns instead of Series to cudf.DataFrame in split-combine workflow #1429

Merged
merged 4 commits into from
Aug 7, 2024

Conversation

mroeschke
Copy link
Contributor

Description

closes #1426

It appears there was a change in 24.08 that broke a notebook demonstrating a merge on two geometry columns. It seems like the merge result tries to reconstruct a GeoDataFrame from a dict[Any, GeoSeries | Series] but the Series.index alignment requires the types to be recognized cudf types (not "geometry")

I don't think this alignment is entirely necessary though since it goes through the _split_out_geometry_columns/_recombine_columns methods which appears to be used on operations that maintain row ordering so index alignment isn't required.

This PR instead passes a dict[Any, GeoColumn | Column] to cudf.DataFrame._from_data given that this row ordering is preserved.

(This PR also includes the fix for #1427)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke requested a review from a team as a code owner August 3, 2024 00:02
@mroeschke mroeschke requested review from trxcllnt and thomcom August 3, 2024 00:02
@github-actions github-actions bot added the Python Related to Python code label Aug 3, 2024
@jameslamb jameslamb added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 5, 2024
@harrism
Copy link
Member

harrism commented Aug 5, 2024

Is this fix needed in 24.08?

@mroeschke
Copy link
Contributor Author

Is this fix needed in 24.08?

I wouldn't necessarily call this a show-stopping bug, so IMO it's OK to target for 24.10 (especially since the Python unit tests didn't break due to this bug). But I'm happy to backport it if others think it's necessary

@jameslamb
Copy link
Member

Based on @harrism 's 👍🏻 on #1429 (comment) and our offline conversation about this, seems to me this fix belongs in 24.10.

I'm going to merge this into 24.10, to help with #1424. I also am available to help if we do decide later to backport it to 24.08.

@jameslamb
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 6865f7c into rapidsai:branch-24.10 Aug 7, 2024
69 checks passed
@mroeschke mroeschke deleted the fix/_recombine_columns branch August 7, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Related to Python code
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

[BUG]: ZipCodes_Stops_PiP_cuSpatial failing: "data type 'geometry' not understood"
3 participants