Support GeoPandas GeoDataFrames #1006
@jorisvandenbossche Can you fill in some details?
@ablythed thanks for the reminder. I started a draft at the time and have now finished it up. I updated the top post.
I think that this can be useful elsewhere and should live in GeoPandas. It may also be easier for us to maintain it than for Datashader.
+1 for this idea. I would also say that we should find a way to have a direct interface between GeoPandas and Datashader, one that does not depend on spatialpandas. From what I understood from @jbednar, if there is an efficient interface of this kind, they will be more than happy to retire spatialpandas (I guess once we manage to have a similar one in dask-geopandas).
Very cool! Thanks for all this; I'd love to see direct Datashader support for GeoPandas data! As a secondary priority, I'd also love to see all of the SpatialPandas functionality disappear into other suitable libraries; we want to keep the ecosystem/landscape small and manageable for users except when deep and genuine differences in requirements dictate. Putting the "GEOS -> ragged coordinate arrays" conversion code into GeoPandas seems to me like it would make the most sense, with Datashader and potentially Dask-GeoPandas working directly with that output. As Martin indicates, the raw format could be useful for other algorithms as well. I agree that standardizing on "some kind of If we do that, can we further remove the underlying need for SpatialPandas to exist at all? As background, I can recall four reasons we created SpatialPandas instead of using or extending existing libraries like GeoPandas and now Dask-GeoPandas:
I'm very happy to revisit these four considerations now and think about where we've ended up a few years later. The situation has definitely improved, largely through the hard work of people on the GeoPandas team:
So, where does that leave us? It seems like GeoPandas has addressed reason 1, and there's a good plan to address reason 3. What about reasons 2 and 4? Is it reasonable for code that does not assume it's used with geographic shapes to live in GeoPandas? Would GeoPandas be OK with stating on the home page that "Geo" here means both "Geographic" and "Geometric", and saying that while the algorithms in GeoPandas are largely inspired by geographic applications, they should also be fully usable for 2D geometry in general? If so, I don't think we'd need to continue with SpatialPandas at all, and we could coalesce around GeoPandas as the data structure and spatial algorithms while supporting Datashader for bulk rendering.
@jbednar GeoPandas talks about geographic data, while in reality it supports any planar geometry, no matter the origin.
Yes. While we don't mention it in the docs at the moment, GeoPandas data structures and functionality are fully usable for 2D geometry in general, in the same way shapely/pygeos are. We just add projection support on top if one needs it. I'll open an issue in the GeoPandas repo to clarify the documentation in this sense -> geopandas/geopandas#1971
Perfect, thanks! If I can tell people to use GeoPandas for all their 2D planar shapes regardless of what they are, then I am very happy for Datashader to work directly with whatever the rawest form of coordinate access GeoPandas can provide, as the way to work with ragged shapes using Numba and Dask. (Non-ragged shapes like dense n-D arrays of same-length lines can already be supported by xarray and numpy.) Excellent!
Hi,
I have started looking at this; the work-in-progress PR is #1285. I am happy to talk about it there or here.
Context:
`from_geopandas` conversion code, e.g. here). With the latest release of PyGEOS, the conversion from geometries to (ragged) coordinate arrays can be done much more efficiently, though.
Function using PyGEOS to convert an array of GEOS geometries to arrays of coordinates/offsets (and put those in a spatialpandas array):
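The PyGEOS-based function is not reproduced in this copy of the issue. As a rough illustration of the target layout only, here is a pure-NumPy sketch that flattens polygons into one coordinate array plus two levels of offsets. It assumes each polygon is given as nested Python sequences (rings of `(x, y)` pairs, exterior first), i.e. the shape of a Polygon's `__geo_interface__["coordinates"]`; the name `polygons_to_ragged` and the `(n, 2)` coordinate layout are choices made for this sketch, not the spatialpandas format (which, as far as I know, stores interleaved x/y values). The real speedup comes from vectorized functions such as `pygeos.get_coordinates` (and, in Shapely 2.0, `shapely.to_ragged_array`) instead of this Python-level loop.

```python
import numpy as np


def polygons_to_ragged(polygons):
    """Flatten polygons into (coords, ring_offsets, geom_offsets).

    Hypothetical helper. Each polygon is a sequence of rings (exterior first,
    then holes); each ring is a sequence of (x, y) pairs.
    """
    chunks, ring_lengths, rings_per_geom = [], [], []
    for rings in polygons:
        rings_per_geom.append(len(rings))
        for ring in rings:
            arr = np.asarray(ring, dtype=np.float64).reshape(-1, 2)
            ring_lengths.append(len(arr))
            chunks.append(arr)

    # All vertices of all rings, stacked into one (n_vertices, 2) array
    coords = np.concatenate(chunks) if chunks else np.empty((0, 2))

    # Offsets are cumulative counts with a leading 0, so item i spans
    # offsets[i]:offsets[i + 1]. ring_offsets index into coords (vertices);
    # geom_offsets index into ring_offsets (rings).
    ring_offsets = np.concatenate([[0], np.cumsum(ring_lengths)]).astype(np.int64)
    geom_offsets = np.concatenate([[0], np.cumsum(rings_per_geom)]).astype(np.int64)
    return coords, ring_offsets, geom_offsets
```

For a square with one triangular hole plus a separate triangle, this yields 13 vertices, `ring_offsets == [0, 5, 9, 13]` and `geom_offsets == [0, 2, 3]`: three contiguous buffers that a Numba kernel can walk without any per-geometry Python objects.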
With such a faster conversion available, it becomes more interesting for Datashader to directly support `geopandas.GeoDataFrame`, instead of requiring an up-front conversion to `spatialpandas.GeoDataFrame`. Currently, the spatialpandas requirement is hardcoded here (for `polygons()`): datashader/datashader/core.py, lines 694 to 701 in 1ae52b6.
Adding support for GeoPandas can be done, using the function I defined above, with something like this (leaving aside imports of geopandas/pygeos):
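The patch itself is not reproduced here either. As a hedged sketch of the kind of dispatch it describes, the helper below returns ragged buffers for the `source`/`geometry` pair that `Canvas.polygons()` receives: a spatialpandas-style array is used as is, and anything else is converted up front. It uses duck typing on `buffer_values`/`buffer_offsets` instead of an `isinstance(source, geopandas.GeoDataFrame)` check only so that it runs without either library; `polygon_buffers` and `to_ragged` are hypothetical names, and `to_ragged` stands in for a conversion function like the one described above.

```python
def polygon_buffers(source, geometry, to_ragged):
    """Return ragged (values, offsets) buffers for ``source[geometry]``.

    Hypothetical helper: ``to_ragged`` stands in for a GeoPandas -> ragged-array
    conversion function and must return a (values, offsets) pair.
    """
    column = source[geometry]
    values = getattr(column, "buffer_values", None)
    offsets = getattr(column, "buffer_offsets", None)
    if values is not None and offsets is not None:
        # spatialpandas-style array: the ragged buffers already exist
        return values, offsets
    # Anything else (e.g. a GeoPandas GeoSeries) is converted up front
    return to_ragged(column)
```

The design point is that the glyph code downstream only ever sees the two buffers, so supporting a new container means adding one conversion branch rather than touching the rendering kernels.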
This patch is what I tried in the following notebook, first using a smaller countries/provinces dataset from NaturalEarth, and then with a larger NYC building footprints dataset (similar to https://examples.pyviz.org/nyc_buildings/nyc_buildings.html).
Notebook: https://nbviewer.jupyter.org/gist/jorisvandenbossche/3e7ce14cb5118daa0f6097d686981c9f
Some observations:
`.cx` spatial subsetting step in my patch above, filtering the data before rendering). For spatialpandas, such subsetting is only added for the dask version.
Gif of the notebook in action (the buildings dataset is fully loaded in memory and not parallelized with dask, unlike the PyViz gallery example), interactively zooming into a GeoPandas dataframe with Datashader and HoloViews:
(Note: this was done a bit manually with a HoloViews DynamicMap and a callback with Datashader code, because the integrated datashade functionality of HoloViews/hvPlot wouldn't preserve the `geopandas.GeoDataFrame` with the current versions.)
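The `.cx` subsetting step mentioned in the observations pays off when zoomed in, because only geometries near the viewport reach the rasterizer. A minimal sketch of the same idea, assuming per-geometry bounds in `(minx, miny, maxx, maxy)` column order (as returned by `GeoSeries.bounds`): this is a cheaper bounding-box overlap test swapped in for illustration, not what `.cx` does internally, and it can keep a few geometries whose boxes overlap the viewport while their shapes do not. `bbox_viewport_mask` is a hypothetical name.

```python
import numpy as np


def bbox_viewport_mask(bounds, x_range, y_range):
    """Boolean mask of geometries whose bounding box overlaps the viewport.

    ``bounds`` is an (n, 4) array of (minx, miny, maxx, maxy) per geometry.
    """
    bounds = np.asarray(bounds, dtype=np.float64).reshape(-1, 4)
    xmin, xmax = x_range
    ymin, ymax = y_range
    # Two boxes overlap iff they overlap on both axes
    return (
        (bounds[:, 0] <= xmax)
        & (bounds[:, 2] >= xmin)
        & (bounds[:, 1] <= ymax)
        & (bounds[:, 3] >= ymin)
    )
```

With a GeoDataFrame `gdf`, this would be applied as `gdf[bbox_viewport_mask(gdf.geometry.bounds.to_numpy(), x_range, y_range)]` before calling `Canvas.polygons()`.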
So, what's the way forward here? I think I showed that it can be useful for Datashader to directly support GeoPandas, and that it can also be done with a relatively small change to datashader.
The big question, though, is about the "GEOS -> ragged coordinate arrays -> spatialpandas array" conversion. Where should this live, and how should Datashader and GeoPandas interact?
Some initial thoughts about this:
`pyarrow.ListArray` to then convert it to a spatialpandas `MultiPolygonArray`. But in the end, what Datashader needs is only the raw coordinate and offset arrays. For example, for rendering polygons, you access `.buffer_values` and `.buffer_offsets` of the `MultiPolygonArray`, which give back the raw coordinate and offset arrays.
So in theory, this roundtrip through pyarrow and spatialpandas is not needed: some method could convert GeoPandas geometries into coordinate/offset arrays that Datashader could handle directly as is. This would, however, require somewhat larger changes in Datashader to the way data gets passed down from `Canvas.polygons()` into the `glyph` rendering (as that currently uses the spatialpandas array as the container for the coordinates/offsets).
One possible idea (relating to the third bullet point) is to standardize on some kind of `__geo_arrow_arrays__` interface (returning the coordinate + offset arrays), similar to the existing `__geo_interface__` that returns the geometries as a GeoJSON-like dictionary (and which can already be used to accept any "geometry-like" object, even from libraries you don't know).