Create `pygeoarrow` and use it for cuSpatial feature storage and i/o #583
… committing before I delete a big block of code.
The next PR will have tons more `-` than `+`, I think.
This PR is messy because the old wrapper and …
This is my first review of the geoarrow PRs, so bear with me if I have missed some background knowledge. I have a high-level question about `GeoColumn`. Why is `GeoColumn` a `NumericalColumn`? It seems to me that no single existing cudf column type is sufficient to represent what `GeoColumn` expresses, and it seems a bit awkward to piggyback on existing methods defined for `NumericalColumn`.
However, each "pure geometry" array can be represented as a `ListColumn` in cuDF. My intuition is that we can use a `ListColumn` for each of the pure geometry arrays. The indexing logic introduced here for `GeoColumn` can continue to serve as the top-level indexing (similar to UnionArray indexing).
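To make the suggestion concrete, here is a minimal pure-Python sketch of dense-union-style top-level indexing over per-geometry list arrays. The class name `DenseUnionSketch` and its fields are hypothetical and only illustrate the idea; this is not the `GeoColumn` implementation in this PR, and plain Python lists stand in for cuDF `ListColumn`s.

```python
from dataclasses import dataclass


@dataclass
class DenseUnionSketch:
    """Hypothetical stand-in for top-level indexing over pure geometry arrays."""

    type_ids: list  # per row: which child array holds the feature
    offsets: list   # per row: position of the feature inside that child
    children: list  # one list-like array per geometry kind

    def __len__(self):
        return len(self.type_ids)

    def __getitem__(self, i):
        # Two lookups: pick the child by type id, then the row by offset.
        return self.children[self.type_ids[i]][self.offsets[i]]


# Child 0 holds points, child 1 holds linestrings (lists of xy pairs).
points = [[0.0, 0.0], [5.0, 5.0]]
linestrings = [[[0.0, 0.0], [1.0, 1.0]]]

# Row order: point, linestring, point.
features = DenseUnionSketch(
    type_ids=[0, 1, 0],
    offsets=[0, 0, 1],
    children=[points, linestrings],
)

for i in range(len(features)):
    print(i, features[i])
```

Each child stays homogeneous, so it could be stored as a list-typed column, while the `type_ids` and `offsets` pair carries the mixed row order.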
Thanks for your comments! I'll cover each and hopefully we can merge this. This is a partial contribution, and many of your concerns will disappear in the next two.
Thanks for the reminder, I'll be sure to create an issue for …
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
…spatial into fea-use-pyarrow-for-geopandas
Awesome. As requested, I only went broad-strokes this time, and I see no red flags. I see there are internal calls to `assert_eq` that can hopefully be replaced in future PRs.
And just to be clear, `groupby` is only promised at the `indexed_frame` level in cudf and is very much a separate implementation unrelated to columns. See:
https://github.com/rapidsai/cudf/blob/branch-22.08/python/cudf/cudf/core/groupby/groupby.py
It totally makes sense to implement geoseries and geodataframe by inheriting from series and dataframe, to make use of the compute APIs for non-geometric columns. But `geocolumn` is eccentric enough that I think it's best to implement an underlying `UnionColumn` type in cuspatial that `geocolumn` can inherit from.
@gpucibot merge
This closes #582.
This PR removes the input based on an iterative Shapely reader and `cudf` buffers. Now the input is stored directly in a pyarrow `UnionArray`. The next PR will remove most of the interior functionality that is not necessary.