-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] cuSpatial does not correctly implement GeoArrow format for MultiLineString and MultiPolygon #575
Closed
6 tasks done
Labels
bug
Something isn't working
Comments
thomcom
added
bug
Something isn't working
Needs Triage
Need team to review and classify
labels
Jun 21, 2022
thomcom
changed the title
[BUG] Correctly implement GeoArrow format for MultiLineString and MultiPolygon
[BUG] cuSpatial does not correctly implement GeoArrow format for MultiLineString and MultiPolygon
Jun 21, 2022
Thanks @thomcom . Can you target 22.08 for the fix? |
Yes! |
This was referenced Jun 29, 2022
rapids-bot bot
pushed a commit
that referenced
this issue
Jul 28, 2022
…host storage with pyarrow (#585) This PR: Closes #575, #584, #588, #591, and #592. This PR removes all the old `GeoArrowBuffers` logic that was implementing GeoArrow from scratch. Now, new geometry objects are first parsed into a `pyarrow` `DenseUnion`, then copied to the GPU as `ListSeries`. A `DenseUnion/Like` class `GeoSeries` implements the `input_types` and `union_offsets` for indexing into the four separate `ListSeries`. When host indexing, the `ListColumns` and `types` and `offsets` are copied into a pyarrow `DenseUnion` that is used for accessing the host side coordinates. At present indexing always copies to host. This deletes three times as many loc as it creates and trims down the GeoArrow code to nearly the minimum. It is ready for review. The next issue will describe the next four tasks that I or @isVoid will implement: 1. Testing and validation of the many `data` arguments types that can be made to the `GeoSeries` constructor. 2. Finalizing an arrow io loop: `x = cuspatial.from_arrow(dense_union); dense_union = x.to_arrow()` 3. Finally, the `__getitem__` method needs to be optimized for zero-copy and memory-direct buffer transfers. Presently it always copies the data into new Shapely objects for serialization. This closes the original issue #575. I'll create another issue to validate that the new format is correct and notify/compare it with the existing error report and geopandas team. @trxcllnt @exactlyallan @isVoid @harrism Authors: - H. Thomson Comer (https://github.com/thomcom) Approvers: - Paul Taylor (https://github.com/trxcllnt) - Michael Wang (https://github.com/isVoid) - Mark Sadang (https://github.com/msadang) URL: #585
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
My implementation of the named datatypes uses a poorly imagined "brackets" data structure to identify where the Multi features are located within the offsets arrays for LineString and Polygon coordinates. Instead of using brackets, to match the GeoArrow format, the offsets for LineStrings and Polygons should simply be 1, and the offset for Multi's should be the number of that feature that the Multi is made from in sequence.
Steps/Code to reproduce bug
The issue is documented in precise detail here: https://notebooksharing.space/view/517f3172b12354804179f248247ab5ffd6573214e9f9810d13494533f1aefd8a#displayOptions=
Plan
As part of fixing the bug, I'll convert
cuspatial
to usepyarrow
for all of the Shapely/GeoPandas serialization. Presently I am doing this serialization manually, which requires many lines of code and is hard for other developers to understand. Additionally, that means that all indexing, host/device copies, and deserialization are manual, too. Fixing this bug will have four steps:pyarrow
native types for all serialization.[FEA] Use
pyarrow
to read and serialize geoPandas geometry #582Create
pygeoarrow
and use it for cuSpatial feature storage and i/o #583pyarrow
based indexing.getitem
forUnionArray
in pyarrow.pyarrow
native code.pyarrow
buffers.As each step begins I'll create an issue and a PR for it and document them here.
The text was updated successfully, but these errors were encountered: