Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove GeoArrow glue code replacing gpu storage with cudf.Series and host storage with pyarrow #585

Merged
merged 43 commits into from
Jul 28, 2022

Conversation

thomcom
Copy link
Contributor

@thomcom thomcom commented Jul 8, 2022

This PR: Closes #575, #584, #588, #591, and #592.

This PR removes all the old GeoArrowBuffers logic that was implementing GeoArrow from scratch.
Now, new geometry objects are first parsed into a pyarrow DenseUnion, then copied to the GPU as ListSeries. A DenseUnion/Like class GeoSeries implements the input_types and union_offsets for indexing into the four separate ListSeries.

When host indexing, the ListColumns and types and offsets are copied into a pyarrow DenseUnion that is used for accessing the host side coordinates. At present indexing always copies to host.

This deletes three times as many loc as it creates and trims down the GeoArrow code to nearly the minimum. It is ready for review.

The next issue will describe the next four tasks that I or @isVoid will implement:

  1. Testing and validation of the many data arguments types that can be made to the GeoSeries constructor.
  2. Finalizing an arrow io loop: x = cuspatial.from_arrow(dense_union); dense_union = x.to_arrow()
  3. Finally, the __getitem__ method needs to be optimized for zero-copy and memory-direct buffer transfers. Presently it always copies the data into new Shapely objects for serialization.

This closes the original issue #575. I'll create another issue to validate that the new format is correct and notify/compare it with the existing error report and geopandas team.

@trxcllnt @exactlyallan @isVoid @harrism

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added Python Related to Python code cmake Related to CMake code or build configuration conda Related to conda and conda configuration labels Jul 8, 2022
@thomcom thomcom marked this pull request as ready for review July 14, 2022 22:33
@thomcom thomcom requested review from a team as code owners July 14, 2022 22:33
@thomcom thomcom requested review from vyasr, trxcllnt and isVoid July 14, 2022 22:33
@thomcom thomcom added bug Something isn't working non-breaking Non-breaking change 4 - Needs Reviewer Waiting for reviewer to review or respond improvement Improvement / enhancement to an existing function and removed conda Related to conda and conda configuration cmake Related to CMake code or build configuration labels Jul 15, 2022
@github-actions github-actions bot added cmake Related to CMake code or build configuration conda Related to conda and conda configuration labels Jul 15, 2022
@thomcom thomcom removed the bug Something isn't working label Jul 15, 2022
@thomcom
Copy link
Contributor Author

thomcom commented Jul 27, 2022

All comments are handle. In the last push I added GeoSeriesLocIndexer as it was on my mind.

@thomcom thomcom requested a review from isVoid July 27, 2022 18:25
Copy link
Contributor

@trxcllnt trxcllnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@thomcom thomcom requested a review from trxcllnt July 27, 2022 21:49
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@isVoid
Copy link
Contributor

isVoid commented Jul 27, 2022

I think it also closes #592 ?

@harrism
Copy link
Member

harrism commented Jul 28, 2022

So is this ready to merge? Good work @thomcom and thanks!

@thomcom thomcom requested a review from ajschmidt8 July 28, 2022 19:07
Copy link
Contributor

@msadang msadang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomcom thomcom removed the request for review from ajschmidt8 July 28, 2022 21:49
@thomcom
Copy link
Contributor Author

thomcom commented Jul 28, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit afdd260 into rapidsai:branch-22.08 Jul 28, 2022
@thomcom thomcom deleted the fea-drop-glue-code branch July 29, 2022 17:26
ajschmidt8 added a commit to ajschmidt8/cuxfilter that referenced this pull request Aug 3, 2022
Similarly to rapidsai/cuspatial#585, this PR updates the `geopandas` version specifier to ensure compatibility with the rest of RAPIDS.
rapids-bot bot pushed a commit to rapidsai/cuxfilter that referenced this pull request Aug 3, 2022
Similarly to rapidsai/cuspatial#585, this PR updates the `geopandas` version specifier to ensure compatibility with the rest of RAPIDS.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: #390
rapids-bot bot pushed a commit that referenced this pull request Sep 8, 2022
This PR introduces 3 documents for developer guides. The overall goal for these docs is to provide a place to centralize best practices and provide guidance to developers of all levels to bootstrap development for cuspatial.

This PR deprecates internals.md. The design of geocolumn has improved dramatically since #585 , rendering many of the description obsolete. Further, examples of geoarrow and detail description of geoarrow should be hosted in [geoarrow specification](https://github.com/geopandas/geo-arrow-spec). `library_design.md` replaces `internal.md`.

`CONTRIBUTING.md` is deprecated and only leaves a pointer to `contributing_guide.md`. The contents are highly identical to the overall RAPIDS contribution guide: https://docs.rapids.ai/contributing. This link is included in the contributing guide and is thus removed to maintain a single source of truth.

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - H. Thomson Comer (https://github.com/thomcom)
  - Mark Harris (https://github.com/harrism)

URL: #625
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Reviewer Waiting for reviewer to review or respond cmake Related to CMake code or build configuration conda Related to conda and conda configuration improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Related to Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cuSpatial does not correctly implement GeoArrow format for MultiLineString and MultiPolygon
5 participants