Geopanda functionalities in polars #1830

Closed
roger120981 opened this issue Nov 19, 2021 · 15 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@roger120981

Is there a known way to work with geo shapes in Polars, like GeoPandas does in Python?

@ghuls
Collaborator

ghuls commented Nov 23, 2021

Not that I know of. I have never used GeoPandas, but after a quick look it seems to rely on storing Python objects (Shapely objects?) in a pandas DataFrame. It also depends on Shapely and other packages that call into C++ code, which is probably where most of the time is spent.

@ghuls
Collaborator

ghuls commented Nov 23, 2021

Seems like I was right:

https://blog.dask.org/2017/09/21/accelerating-geopandas-1

...
Unfortunately GeoPandas is slow. This limits interactive exploration on larger datasets. For example the Chicago crimes data (the first dataset above) has seven million entries and is several gigabytes in memory. Analyzing a dataset of this size interactively with GeoPandas is not feasible today.
This slowdown is because GeoPandas wraps each geometry (like a point, line, or polygon) with a Shapely object and stores all of those objects in an object-dtype column. When we compute a GeoPandas operation on all of our shapes we just iterate over these shapes in Python. As an example, here is how one might implement a distance method in GeoPandas today.
...
Cythonizing gives us speedups in the 10x-100x range. We use a single core as effectively as is possible with the GEOS library. Now we move on to using multiple cores in parallel. This gives us an extra 3-4x on a standard 4 core laptop. We can also scale to clusters, though I’ll leave that for a future blogpost.

That work led to the issue geopandas/geopandas#473, which in turn resulted in the PyGEOS package:

PyGEOS is a C/Python library with vectorized geometry functions. The geometry operations are done in the open-source geometry library GEOS. PyGEOS wraps these operations in NumPy ufuncs providing a performance improvement when operating on arrays of geometries.

Installing the optional PyGEOS dependency (see https://geopandas.org/en/latest/getting_started/install.html) should give you a much faster experience.
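
To make the pattern described in the quoted post concrete, here is a minimal stdlib-only sketch. It is not the code elided from the blog post and it uses neither Shapely nor PyGEOS: the `Point` class and both function names are hypothetical stand-ins. The first function mirrors an object column with one Python-level method call per row; the second keeps coordinates in flat numeric buffers, which is the kind of layout a compiled kernel could process without touching Python objects.

```python
import math
from array import array


class Point:
    """Hypothetical stand-in for a per-row geometry object (not Shapely)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance(self, other):
        return math.hypot(self.x - other.x, self.y - other.y)


def distance_object_column(left, right):
    # Object-column style: one Python method call per pair of rows.
    return [a.distance(b) for a, b in zip(left, right)]


def distance_flat_columns(ax, ay, bx, by):
    # Columnar style: coordinates live in flat double buffers.
    # This loop still runs in Python here; the point is only the layout,
    # which a vectorized or compiled implementation could consume directly.
    return array(
        "d",
        (math.hypot(x1 - x2, y1 - y2) for x1, y1, x2, y2 in zip(ax, ay, bx, by)),
    )
```

In pure Python the two versions perform about the same; the speedups mentioned above come from moving the loop over the flat buffers into compiled code.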

@roger120981
Author

In Rust there is the https://georust.org/ project. The idea was to implement functionality for Polars similar to what GeoPandas offers, for manipulating geographic data, which is widely used these days. Thanks!

@nmandery
Contributor

In this context it is probably good to know that there is currently an ongoing effort to create a common specification for storing geodata in arrow and parquet: https://github.com/geopandas/geo-arrow-spec

@kylebarron
Contributor

I was just musing about geospatially-extended Polars today, and was surprised to see that there are already multiple issues about it here.

Some thoughts:

In my mind, this gives us all we need to create an add-on library for geospatial support in polars. The todo list would look something like:

  • Use geoarrow as the geometry format used by this extension package.
    • The geoarrow spec currently uses WKB as its geometry format, but there are ongoing discussions about an Arrow-native implementation that uses Arrow nested lists and/or structs.
    • geozero already supports WKB. When there's progress on an Arrow-native spec, that would also be added to geozero. But we could use current geoarrow today via WKB without making an intermediate representation.
  • The extension package would wrap Polars, and all geospatial functionality would be kept in the extension package.
  • Ideally we'd use algorithms defined in the geo crate, without binding to native GEOS, so that algorithms could work on the zero-copy geometries. This would also allow this extension package to be compiled to a target such as WASM.
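
As a rough illustration of what "Arrow nested lists" could mean for geometries, the sketch below stores a column of linestrings as one flat coordinate buffer plus a list-offsets buffer. This is a simplified, stdlib-only picture of the general idea and not the geoarrow spec itself; the `LineStringColumn` class and its methods are hypothetical.

```python
from array import array


class LineStringColumn:
    """Hypothetical columnar layout: flat coordinates plus list offsets."""

    def __init__(self, linestrings):
        self.coords = array("d")         # interleaved x, y, x, y, ...
        self.offsets = array("q", [0])   # row i spans vertices offsets[i]..offsets[i + 1]
        for line in linestrings:
            for x, y in line:
                self.coords.extend((x, y))
            self.offsets.append(self.offsets[-1] + len(line))

    def __len__(self):
        return len(self.offsets) - 1

    def vertices(self, i):
        # Slice the shared buffer; no per-row geometry object is stored.
        start, stop = self.offsets[i], self.offsets[i + 1]
        flat = self.coords[2 * start : 2 * stop]
        return list(zip(flat[0::2], flat[1::2]))
```

With this layout, an algorithm can walk the coordinate buffer directly instead of decoding WKB row by row, which is what makes the zero-copy idea attractive.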

An initial question is just: can an extension package wrap Polars and can it implement a simple item-wise operation like .bounds? I might explore some of these ideas but I have limited time, so no promises.
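
For a sense of how small an item-wise operation like bounds is when the column holds WKB, here is a stdlib-only sketch that works on a plain list of WKB byte strings. It is not Polars, geozero, or geopolars code: the function names are hypothetical, and only 2D Point and LineString geometries are handled.

```python
import struct


def encode_linestring_wkb(coords):
    """Encode (x, y) pairs as a little-endian WKB LineString (geometry type 2)."""
    header = struct.pack("<BII", 1, 2, len(coords))
    body = b"".join(struct.pack("<dd", x, y) for x, y in coords)
    return header + body


def wkb_bounds(wkb):
    """Return (minx, miny, maxx, maxy) for a 2D WKB Point or LineString."""
    endian = "<" if wkb[0] == 1 else ">"   # first byte is the byte-order flag
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type == 1:                      # Point: a single coordinate pair
        count, offset = 1, 5
    elif geom_type == 2:                    # LineString: vertex count, then pairs
        (count,) = struct.unpack_from(endian + "I", wkb, 5)
        offset = 9
    else:
        raise ValueError(f"unsupported WKB geometry type: {geom_type}")
    if count == 0:
        return None                         # empty geometry has no bounds
    values = struct.unpack_from(f"{endian}{2 * count}d", wkb, offset)
    xs, ys = values[0::2], values[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def bounds_column(column):
    # Item-wise operation: one result per geometry in the column.
    return [wkb_bounds(value) for value in column]
```

An extension package would presumably apply the same per-element logic to a binary column, with the decoding done in compiled code rather than in Python.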

@roger120981
Author

@kylebarron The idea of keeping the projects separate seems perfect to me.

  • Perhaps they could initially be called polars-geopolars
  • Using code written in Rust would also be great, because in my case I want to create a binding for this library in Elixir through the Rustler library, and the path GEOS -> Rust/Rustler -> Elixir seems a bit complex to me
  • A library is needed to handle Coordinate Reference Systems and possible conversions. The same concern as above applies: there is a binding for the proj4 project, but a native Rust library for this task would be ideal. Is there one?
  • Using the GEO crate algorithms seems fine to me initially.

@kylebarron
Contributor

  • It is necessary to have a library to handle Coordinate Reference Systems and possible conversions

I don't know of a native proj port in Rust, and I think it would be a massive undertaking. Presumably proj would be an optional feature; if the user needs reprojection support they can include it.

@nmandery
Contributor

I would think having Rust-only dependencies would be preferable. While GEOS is really powerful and well tested, it will surely complicate things when targeting multiple platforms or providing binary artifacts. On the other hand, there are things missing in the Rust ecosystem which are already present elsewhere. Going with geozero and the georust ecosystem sounds reasonable to me, though.

Regarding a proj-alternative in rust, there is https://github.com/busstoptaktik/geodesy , but the project already mentions in the Readme that it is currently at an early stage.

@stuartlynn

@kylebarron I am also very interested in working on this. I have been doing a lot of geo work using just the arrow2 and geo crates, but having GeoPandas-like functionality in Polars would make things a lot easier.

If it's something you're working on and you're looking for a collaborator, let me know!

@roger120981
Author

@stuartlynn This is the geopolars repo, you can visit it and contribute https://github.com/kylebarron/geopolars

@kylebarron
Contributor

Hey @stuartlynn. I did start hacking in a repo as mentioned above. So far there's enough to serve as a minimal proof of concept. It's currently on hold while researching an RFC to georust on using Arrow memory as a zero-copy format for georust algorithms. You're welcome to create an issue or discuss on geopolars/geopolars#1.

@stuartlynn

Love it thanks!

@MarcoGorelli
Collaborator

https://github.com/geopolars/geopolars exists, and it looks like it's coming together nicely - going to close this issue then, but please let me know if I've misunderstood and can reopen

@b-a0

b-a0 commented Jul 9, 2024

It appears geopolars is on hold: the last commit to main is from 7 months ago, and the last alpha release about 1.5 years ago. Is it perhaps possible to re-open the current issue to signal that other solutions are welcome as well?

@kylebarron
Contributor

kylebarron commented Jul 9, 2024

The main blocker for geopolars is #9112. I don't want to build out a full subclassing approach like GeoPandas does, which I think has a pretty high maintenance overhead. I'm working on Arrow-based Rust/Python functionality in https://github.com/geoarrow/geoarrow-rs, and then whenever #9112 is solved (see also discussion in #7288) we can resume integration in geopolars.

9 participants