Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] GeoSeries.__getitem__ needs to support arbitrary slices, integer columns, lists, tuples, and boolmasks. #591

Closed
thomcom opened this issue Jul 15, 2022 · 1 comment
Assignees
Labels
feature request New feature or request Python Related to Python code

Comments

@thomcom
Copy link
Contributor

thomcom commented Jul 15, 2022

Depends on #585.

We need to support the more abstract slicing techniques, not just integer ILoc

import geopandas
import cuspatial
df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
gdf = cuspatial.from_geopandas(df)
gdf[0] # Polygon
gdf[0:2] # NotImplementedError

Optimally we can do:

gdf[0:2]
gdf[[0, 1]]
gdf[(0, 1)]
gdf[[True, False]]
@thomcom thomcom added feature request New feature or request Needs Triage Need team to review and classify labels Jul 15, 2022
@harrism harrism added the Python Related to Python code label Jul 19, 2022
rapids-bot bot pushed a commit that referenced this issue Jul 28, 2022
…host storage with pyarrow (#585)

This PR: Closes #575, #584, #588, #591, and #592.

This PR removes all the old `GeoArrowBuffers` logic that was implementing GeoArrow from scratch.
Now, new geometry objects are first parsed into a `pyarrow` `DenseUnion`, then copied to the GPU as `ListSeries`. A `DenseUnion/Like` class `GeoSeries` implements the `input_types` and `union_offsets` for indexing into the four separate `ListSeries`.

When host indexing, the `ListColumns` and `types` and `offsets` are copied into a pyarrow `DenseUnion` that is used for accessing the host side coordinates. At present indexing always copies to host.

This deletes three times as many loc as it creates and trims down the GeoArrow code to nearly the minimum. It is ready for review. 

The next issue will describe the next four tasks that I or @isVoid will implement:
1. Testing and validation of the many `data` arguments types that can be made to the `GeoSeries` constructor.
2. Finalizing an arrow io loop: `x = cuspatial.from_arrow(dense_union); dense_union = x.to_arrow()`
3. Finally, the `__getitem__` method needs to be optimized for zero-copy and memory-direct buffer transfers. Presently it always copies the data into new Shapely objects for serialization.

This closes the original issue #575. I'll create another issue to validate that the new format is correct and notify/compare it with the existing error report and geopandas team.

@trxcllnt @exactlyallan @isVoid @harrism

Authors:
  - H. Thomson Comer (https://github.com/thomcom)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Michael Wang (https://github.com/isVoid)
  - Mark Sadang (https://github.com/msadang)

URL: #585
@thomcom
Copy link
Contributor Author

thomcom commented Jul 28, 2022

Done

@thomcom thomcom closed this as completed Jul 28, 2022
@thomcom thomcom moved this to Done in cuSpatial Jul 28, 2022
@jarmak-nv jarmak-nv removed the Needs Triage Need team to review and classify label Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request Python Related to Python code
Projects
Status: Done
Development

No branches or pull requests

4 participants