
GeoTable.from_arrow doesn't recognize geometry column from PointArray #589

Open
deanm0000 opened this issue Mar 27, 2024 · 3 comments

Comments

@deanm0000

I started from something like

import pyarrow as pa
from geoarrow.rust.core import GeoTable, PointArray

my_point_array = PointArray.from_xy(
    pa.array([-160.49, -87.35, -88.01], pa.float64()),
    pa.array([55.34, 33.46, 31.01], pa.float64()),
)

arrow_table = pa.Table.from_arrays(
    [pa.array([1, 2, 3], pa.int32()), my_point_array],
    names=["a", "geometry"],
)

then tried

GeoTable.from_arrow(arrow_table)

but got

PanicException: no geometry column in table

I also tried a few things around ChunkedPointArray.from_arrow_arrays([my_point_array]) but none of it worked.

@kylebarron
Member

Thanks for trying it out!

I agree having better geometry constructors will be necessary for usability. GeoArrow declares an array to be a geometry through extension metadata (for points, ARROW:extension:name set to geoarrow.point) stored on the array's field. Your issue is that when you call pa.Table.from_arrays, the field for each array is inferred from the array's data type alone. The inferred field carries no metadata, so the geometry column is no longer marked as a geometry.

One way to fix this is to use the schema parameter of from_arrays to ensure there's GeoArrow metadata on the geometry column.

The other way is to register the pyarrow extension types provided in geoarrow-pyarrow. In that case, I believe the extension metadata will be automatically inferred.

For now, I've put more effort into the IO readers and writers and into the GeoPandas and Shapely interoperability. So a simple way to get a GeoTable is to first create a geopandas.GeoDataFrame and then use geoarrow.rust.core.from_geopandas.

@deanm0000
Author

> first create a geopandas.GeoDataFrame

I'm trying to quit doing that ;)

> The other way is to register the pyarrow extension

That's what I really needed.

import geoarrow.pyarrow as ga
ga.register_extension_types()

Now, with my df coming from polars, I can just do

df_geo = GeoTable.from_arrow(
    df.to_arrow().add_column(
        0, "geometry", [PointArray.from_xy(df["x"].to_arrow(), df["y"].to_arrow())]
    )
)

I see that it says my geometry is a Struct but I thought it'd be a FixedSizeList. Is that always the case or is that related to how I constructed it?

@kylebarron
Member

> I'm trying to quit doing that ;)

Yes of course, but baby steps!

> I see that it says my geometry is a Struct but I thought it'd be a FixedSizeList. Is that always the case or is that related to how I constructed it?

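GeoArrow allows either FixedSizeList (interleaved x, y, x, y, ...) or Struct (separate x and y arrays) for coordinate buffers. PointArray.from_xy always creates a StructArray because the x and y values are passed in as separate arrays, which is exactly the separated layout.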

#578 will allow you more control over interleaved vs separated layout when constructing arrays from raw buffers. We should also add a helper to go back and forth between them when you already have your arrays.
