Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] cuSpatial does not correctly implement GeoArrow format for MultiLineString and MultiPolygon #575

Closed
6 tasks done
thomcom opened this issue Jun 21, 2022 · 3 comments · Fixed by #585
Closed
6 tasks done
Assignees
Labels
bug Something isn't working

Comments

@thomcom
Copy link
Contributor

thomcom commented Jun 21, 2022

Describe the bug
My implementation of the named datatypes uses a poorly imagined "brackets" data structure to identify where the Multi features are located within the offsets arrays for LineString and Polygon coordinates. Instead of using brackets, to match the GeoArrow format, the offsets for LineStrings and Polygons should simply be 1, and the offset for Multi's should be the number of that feature that the Multi is made from in sequence.

Steps/Code to reproduce bug

The issue is documented in precise detail here: https://notebooksharing.space/view/517f3172b12354804179f248247ab5ffd6573214e9f9810d13494533f1aefd8a#displayOptions=

Plan
As part of fixing the bug, I'll convert cuspatial to use pyarrow for all of the Shapely/GeoPandas serialization. Presently I am doing this serialization manually, which requires many lines of code and is hard for other developers to understand. Additionally, that means that all indexing, host/device copies, and deserialization are manual, too. Fixing this bug will have four steps:

As each step begins I'll create an issue and a PR for it and document them here.

@thomcom thomcom added bug Something isn't working Needs Triage Need team to review and classify labels Jun 21, 2022
@thomcom thomcom self-assigned this Jun 21, 2022
@thomcom thomcom changed the title [BUG] Correctly implement GeoArrow format for MultiLineString and MultiPolygon [BUG] cuSpatial does not correctly implement GeoArrow format for MultiLineString and MultiPolygon Jun 21, 2022
@harrism
Copy link
Member

harrism commented Jun 21, 2022

Thanks @thomcom . Can you target 22.08 for the fix?

@thomcom
Copy link
Contributor Author

thomcom commented Jun 22, 2022

Yes!

@thomcom
Copy link
Contributor Author

thomcom commented Jul 15, 2022

@harrism this is done in #585 for 22.08

@rapids-bot rapids-bot bot closed this as completed in #585 Jul 28, 2022
rapids-bot bot pushed a commit that referenced this issue Jul 28, 2022
…host storage with pyarrow (#585)

This PR: Closes #575, #584, #588, #591, and #592.

This PR removes all the old `GeoArrowBuffers` logic that was implementing GeoArrow from scratch.
Now, new geometry objects are first parsed into a `pyarrow` `DenseUnion`, then copied to the GPU as `ListSeries`. A `DenseUnion/Like` class `GeoSeries` implements the `input_types` and `union_offsets` for indexing into the four separate `ListSeries`.

When host indexing, the `ListColumns` and `types` and `offsets` are copied into a pyarrow `DenseUnion` that is used for accessing the host side coordinates. At present indexing always copies to host.

This deletes three times as many loc as it creates and trims down the GeoArrow code to nearly the minimum. It is ready for review. 

The next issue will describe the next four tasks that I or @isVoid will implement:
1. Testing and validation of the many `data` arguments types that can be made to the `GeoSeries` constructor.
2. Finalizing an arrow io loop: `x = cuspatial.from_arrow(dense_union); dense_union = x.to_arrow()`
3. Finally, the `__getitem__` method needs to be optimized for zero-copy and memory-direct buffer transfers. Presently it always copies the data into new Shapely objects for serialization.

This closes the original issue #575. I'll create another issue to validate that the new format is correct and notify/compare it with the existing error report and geopandas team.

@trxcllnt @exactlyallan @isVoid @harrism

Authors:
  - H. Thomson Comer (https://github.com/thomcom)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Michael Wang (https://github.com/isVoid)
  - Mark Sadang (https://github.com/msadang)

URL: #585
@rapids-bot rapids-bot bot moved this to Done in cuSpatial Jul 28, 2022
@jarmak-nv jarmak-nv removed the Needs Triage Need team to review and classify label Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment