Pass columns instead of Series to cudf.DataFrame in split-combine workflow #1429
Description
closes #1426
It appears that a change in 24.08 broke a notebook demonstrating a `merge` on two geometry columns. The merge result seems to try to reconstruct a `GeoDataFrame` from a `dict[Any, GeoSeries | Series]`, but the `Series.index` alignment requires the types to be recognized cudf types (not `"geometry"`).

I don't think this alignment is entirely necessary, though, since the data goes through the `_split_out_geometry_columns`/`_recombine_columns` methods, which appear to be used on operations that maintain row ordering, so index alignment isn't required.

This PR instead passes a `dict[Any, GeoColumn | Column]` to `cudf.DataFrame._from_data`, given that this row ordering is preserved.

(This PR also includes the fix for #1427)
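
To illustrate why positional recombination is enough when row order is preserved, here is a minimal pure-Python sketch. It is not the cuspatial implementation: plain lists stand in for columns, and `split_out_geometry_columns`, `recombine_columns`, and the sample data are hypothetical names chosen only to mirror the methods mentioned above.

```python
from typing import Any

# Hypothetical stand-in: a plain list plays the role of a column.
Column = list


def split_out_geometry_columns(data: dict[Any, Column], geometry_names: set):
    """Split a column mapping into (geometry columns, other columns)."""
    geo = {k: v for k, v in data.items() if k in geometry_names}
    other = {k: v for k, v in data.items() if k not in geometry_names}
    return geo, other


def recombine_columns(geo: dict, other: dict, column_order: list):
    """Recombine by position; no index labels are consulted."""
    merged = {**geo, **other}
    lengths = {len(col) for col in merged.values()}
    if len(lengths) > 1:
        raise ValueError("columns must have the same number of rows")
    # Restore the original column order.
    return {name: merged[name] for name in column_order}


data = {
    "geometry": ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"],
    "value": [10, 20, 30],
}
geo, other = split_out_geometry_columns(data, {"geometry"})

# A row-order-preserving operation applied only to the non-geometry columns.
other = {name: [v * 2 for v in col] for name, col in other.items()}

result = recombine_columns(geo, other, column_order=list(data))
```

Because the operation in the middle never reorders rows, row `i` of every column still describes the same record, so the columns can be reassembled without any label-based alignment.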
Checklist