Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with geopandas to plot shapely geometries #608

Closed
weiji14 opened this issue Sep 17, 2020 · 12 comments · Fixed by #1000
Closed

Integration with geopandas to plot shapely geometries #608

weiji14 opened this issue Sep 17, 2020 · 12 comments · Fixed by #1000
Assignees
Labels
feature request New feature wanted help wanted Helping hands are appreciated scipy-sprint Things to work on during the SciPy sprint!
Milestone

Comments

@weiji14
Copy link
Member

weiji14 commented Sep 17, 2020

Description of the desired feature

Geopandas is somewhat of a defacto geographic data structure in Python nowadays, and is used to hold vector shapes like Polygons, Lines and Points. What might be useful is if we can have PyGMT plot geopandas Data Structures (e.g. geopandas.GeoDataFrames) directly. Something like so:

import geopandas as gpd
import pygmt

cities: gpd.GeoDataFrame = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
cities.head()
# 	name	geometry
# 0	Vatican City	POINT (12.45339 41.90328)
# 1	San Marino	POINT (12.44177 43.93610)
# 2	Vaduz	POINT (9.51667 47.13372)
# 3	Luxembourg	POINT (6.13000 49.61166)
# 4	Palikir	POINT (158.14997 6.91664)

fig = pygmt.Figure()
fig.plot(data=cities, style="s0.2c", color="black")
fig.show()

One way would be to change plot to accept geopandas type objects, look for the 'geometry' column, and plot the 'Shapely' geometries.

Currently, users need to go through a convoluted process such as with the following (adapted from https://forum.generic-mapping-tools.org/t/shapefile-to-gmt-python/834/21, also xref GenericMappingTools/foss4g2019oceania#7):

fig = pygmt.Figure()
fig.coast(region="g", land="gray", water="lightblue")
points = [point for point in cities.geometry[:10]]  # plot 10 cities only
for point in points:
    x, y = point.coords.xy
    fig.plot(x=x, y=y, style="s0.2c", color="black")
fig.show()

produces:

cities

Using data=geodataframe seems like the easiest, but we could consider alternative ways of implementing the geopandas integration. There might need to be separate code pathways for handling Points, Lines, and Polygons.

👋 Would be good to hear from the geo-community: What is your preferred way of putting data into a plot function?

  1. The plot-function-first-then-data way, i.e. fig.plot(data=geodata)
  2. The data-first-then-plot accessor way geodata.gmt.plot(style=...)

Are you willing to help implement and maintain this feature? Yes, but discuss first 😄

Related issues that may benefit from this:
#392, #493

@weiji14 weiji14 added feature request New feature wanted help wanted Helping hands are appreciated labels Sep 17, 2020
@mattijn
Copy link

mattijn commented Dec 3, 2020

I prefer the first option:

fig.plot(data=geodata)

Last year I made an accepted PR to support any objects with a __geo_interface__ attribute (including geopandas) for Altair. See vega/altair#1664.

It would be great to have similar possibilities for PyGMT!

@weiji14
Copy link
Member Author

weiji14 commented Dec 3, 2020

Ah cool, haven't heard of __geo_interface__ before but it looks like a good way to use it as the integration glue! Would be good to have some links to find out more about this API.

I haven't got much time recently, but if you (or someone else) are willing to start, I'm sure it will be much appreciated. Maybe start with a single geometry type (e.g. points or polygon) and we can gradually expand it from there.

@mattijn
Copy link

mattijn commented Dec 3, 2020

Here is the Python protocol: https://gist.github.com/sgillies/2217756. Its basically GeoJSON. I'm willing to make a start, once I can prioritise time for it!

@mattijn
Copy link

mattijn commented Dec 10, 2020

Little progress. Currently investigated an approach using geopandas and Fiona. While I'm typing this, I think that GMT might be using ogr2ogr under the hood and also can recognizes GeoJSON directly. Something to check for later.

Somehow I'm not able to succeed using the GMTTempFile. I think it is already trying to plot while the file is still being written?

import geopandas as gpd
import pygmt
import os

geodataframe to pygmt (works)

# read geodataframe
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf = gdf[gdf.continent == 'Africa']
gdf = gdf[['name', 'geometry']]

# collect extent region
bound = gdf.total_bounds
region = [bound[0], bound[2], bound[1], bound[3]]

# write to (tmp) file 
# !geopandas (fiona) is not able to overwrite this format
filename = 'example_gdf.gmt'
gdf.to_file(filename, driver='OGR_GMT')

# plot
fig = pygmt.Figure()
fig.basemap(region=region, projection="M20c", frame=True)
fig.plot(data='example_gdf.gmt')
fig.show()  

geodataframe to pygmt using GMTTempFile (does not work)

from pygmt.helpers import GMTTempFile

with GMTTempFile() as tmpfile:
    gdf.to_file(tmpfile.name, driver='OGR_GMT') 

    # plot
    fig = pygmt.Figure()
    fig.basemap(region=region, projection="M20c", frame=True)
    fig.plot(data=tmpfile.name)
    fig.show()   
plot [WARNING]: File /var/folders/w_/fd4w81vd0lj2fvsvjtfl_rjr0000gn/T/pygmt-71vo6id9.txt is empty!

__geo_interface__ to pygmt using fiona (works)

import fiona
from fiona.crs import from_epsg
import collections

# make sure all types are equal
gdf = gdf.explode()
geo_if = gdf.__geo_interface__
filename_fiona = 'example_fiona.gmt'
schema = {'properties': collections.OrderedDict([('name', 'str')]), 'geometry': 'Polygon'}

# ! fiona is not able to overwrite this format
with fiona.open(filename_fiona, 'w', crs=from_epsg(4326), driver='OGR_GMT', schema=schema) as output:
    for feature in geo_if['features']:
        output.write(feature)

# plot
fig = pygmt.Figure()
fig.basemap(region=region, projection="M20c", frame=True)
fig.plot(data='example_fiona.gmt')
fig.show() 

__geo_interface__ to pygmt using fiona and GMTTempFile (does not work)

from pygmt.helpers import GMTTempFile

with GMTTempFile() as tmpfile:
    
    with fiona.open(tmpfile.name, 'w', crs=from_epsg(4326), driver='OGR_GMT', schema=schema) as output:
        for feature in geo_if['features']:
            output.write(feature)    

    # plot
    fig = pygmt.Figure()
    fig.basemap(region=region, projection="M20c", frame=True)
    fig.plot(data=tmpfile.name)
    fig.show()  
plot [WARNING]: File /var/folders/w_/fd4w81vd0lj2fvsvjtfl_rjr0000gn/T/pygmt-gri6a776.txt is empty!

@weiji14
Copy link
Member Author

weiji14 commented Dec 11, 2020

Wow, this is looking really promising! I was wondering how the GeoJSON -> GMT path would work but seems like you found a great solution here with OGR (never have I been so glad that this standard exists) 😆

On the GMTTempFile not working, could you try (for Option 2) with pygmt.helpers.GMTTempFile(suffix=".gmt") and see if it works? Not sure if GMT requires that the file extension be .gmt, but won't hurt to try. I can't see why else this won't work if regular GeoJSON files are ok.

@weiji14
Copy link
Member Author

weiji14 commented Dec 15, 2020

Seems like we need to do os.remove(tmpfile.name) first, i.e. this works:

import geopandas as gpd
import pygmt
import os

# read geodataframe
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
gdf = gdf[gdf.continent == "Africa"]
gdf = gdf[["name", "geometry"]]

# collect extent region
bound = gdf.total_bounds
region = [bound[0], bound[2], bound[1], bound[3]]

with pygmt.helpers.GMTTempFile(suffix=".gmt") as tmpfile:
    os.remove(tmpfile.name)
    gdf.to_file(filename=tmpfile.name, driver="OGR_GMT", mode="w")
    # plot
    fig = pygmt.Figure()
    fig.basemap(region=region, projection="M20c", frame=True)
    fig.plot(data=tmpfile.name)
fig.show()

Perhaps we should use one of the other Python tempfile functions instead? Anyways, the main question is should we have the intermediate file be GeoJSON or OGR_GMT? Is one more flexible that the other in some respects (e.g. MultiPolygons, FeatureCollections, etc)?

@mattijn
Copy link

mattijn commented Dec 15, 2020

Sorry for not coming back earlier. Always busy end of year. Maybe I can find a moment between Christmas and New year otherwise early January again👍

@mattijn
Copy link

mattijn commented Jan 2, 2021

I have been trying a bit with the __geo_interface__ (geojson-structure), but since there is no direct support for it in GMT it is probably better to stick with the OGR_GMT format support in geopandas through fiona and leave the __geo_inferface__ support for later.
It could be possible to create OGR_GMT format from the __geo_interface__ through fiona in python or ogr2ogr in a subprocess if fiona is not available, but it complicates matters more than just aiming on geopandas.

OGR_GMT is flexible enough for all sorts of geometries, even when you are mixing types.

import shapely
import geopandas as gpd

def instance(obj):
    return type(obj).__name__

def plot(geom):
    fig = pygmt.Figure()
    if hasattr(geom, '__geo_interface__') is True:
        with pygmt.helpers.GMTTempFile(suffix=".gmt") as tmp_ascii:
            os.remove(tmp_ascii.name)        

            # when possible, use geopandas `to_file` fuction to save geom as OGR_GMT format
            if instance(geom) == "GeoDataFrame":
                # collect extent region
                WSEN = geom.total_bounds
                WESN = [WSEN[0], WSEN[2], WSEN[1], WSEN[3]]  

                geom.to_file(filename=tmp_ascii.name, driver="OGR_GMT", mode="w")

            fig.basemap(region=WESN, projection="M20c", frame=True)
            fig.plot(data=tmp_ascii.name, pen="2p,tomato,4_2:2p", color='blue', )
    return fig.show()

a = shapely.geometry.LineString([(20, 15), (30, 15)])
b = shapely.geometry.Polygon([(20, 10), (23, 10), (23, 14), (20, 14)])
c = shapely.geometry.shape({
    "type": "MultiPolygon",
    "coordinates": [
        [
            [[0, 0], [20, 0], [10, 20], [0, 0]],  # CCW
            [[3, 2], [10, 16], [17, 2], [3, 2]],  # CW
        ], 
        [
            [[6, 4], [14, 4], [10, 12], [6, 4]]  # CCW
        ],  
        [
            [[25, 5], [30, 10], [35, 5], [25, 5]]
        ]
    ]
})
gdf = gpd.GeoDataFrame(geometry=[a, b, c])  # multipolygon first
gdf.plot()

image

plot(gdf)

image
but not always

gdf = gpd.GeoDataFrame(geometry=[a, b, c])  # linestring first
plot(gdf)

image

I tried setting up the development environment to be able to start a PR, but failed miserably. Hopefully all above provides enough infofor someone more skilled than me to implement it.

Two notes:

  • The OGR_GMT format stores the extent in the file, but I'm not sure if it is possible to access it for usage for the region parameter. Currently this still needs to be derived from the geodataframe first to initiate plotting. It would be nice to just throw the geodataframe to the plot command without figuring out the extent.
  • In example 51 of GMT (link: https://docs.generic-mapping-tools.org/6.1/gallery/ex51.html) is the OGR_GMT format converted into a binary file.
    This could be done as such:
# store `OGR_GMT` ascii in binary format to reduce size.
with pygmt.helpers.GMTTempFile(suffix=".bf2") as tmp_bf2:
    os.remove(tmp_bf2.name)
    cnv = f'{tmp_ascii.name} -bo2f > {tmp_bf2.name}'
    with pygmt.clib.Session() as session:
        session.call_module('convert', cnv)
  • But I could not find a way to access the -bi2f command in the plot routine, so I gave up this route.

@weiji14 weiji14 added this to the 0.4.0 milestone Feb 15, 2021
@steo85it
Copy link

steo85it commented Feb 20, 2021

Hi! This would be super-useful, and your "hack" already works great for plotting geoDataFrames of Polygons imported from .shp files on top of .grd "images/maps".

image

LMK if you need help with testing and such!

@michaelgrund
Copy link
Member

Definitely support the idea to integrate geopandas (geo) dataframes. I assume that the possibility to easily plot shape file stuff atop GMT maps allows us to attract lot's of new users to use PyGMT.

@weiji14
Copy link
Member Author

weiji14 commented Feb 26, 2021

FYI, I'm working on #961 which will lay the groundwork for geopandas integration (by providing a single I/O entrypoint into GMT virtualfiles). Still need to work on the implementation details, but hopefully 🤞 this will mean that any PyGMT function which accepts a table-like input will be able to take in geopandas.GeoDataFrame too (i.e. not just fig.plot), but will need more testing on points/lines/polygons to see if this does actually work!

@weiji14 weiji14 self-assigned this Mar 24, 2021
@weiji14 weiji14 pinned this issue Apr 23, 2021
@weiji14 weiji14 added the scipy-sprint Things to work on during the SciPy sprint! label May 2, 2021
@weiji14
Copy link
Member Author

weiji14 commented May 18, 2021

Ok, the initial PyGMT-GeoPandas integration has been done in #1000! To try it out, do:

  1. Install the pygmt development version using pip install --pre --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pygmt Edit: feature is now generally available in PyGMT v0.4.0!
  2. Install geopandas following https://geopandas.org/getting_started/install.html
  3. Try passing in a geopandas.GeoDataFrame or shapely.geometry object into the data/table parameter in fig.plot/pygmt.info.

We'll still need to add some documentation around this, but we would appreciate it if people can test this out and give us feedback before PyGMT v0.4.0 release (~June 2021). I anticipate issues around complex multipolygon features, and mixed geometry types, but plotting basic points/lines/polygons should be ok 🤞. Just leave a comment below or start a new issue if you find anything that could be improved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature wanted help wanted Helping hands are appreciated scipy-sprint Things to work on during the SciPy sprint!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants