Question: recommended way to load columns containing SRID info? #356

codeananda · 2024-01-31T15:27:20Z

I've got some csv files where the geometry column is named 'geom' and looks like this

SRID=27700;POLYGON()
SRID=27700;POLYGON((10 10, 20 20, 30 10, 10 10))
SRID=27700;MULTIPOLYGON()

To load it, I manually check and modify the columns but wonder if there is a way for pyogrio to handle this itself?

Some things I've tried

Looking at the GDAL CSV Driver docs but that wasn't too helpful.
Using GEOM_POSSIBLE_NAMES=['geom'] in pyogrio.read_dataframe - but think it got confused with SRID info.

My working code

import pandas as pd
import geopandas as gpd
from shapely import wkt
from io import StringIO

# What I actually do
shlaa_gdf = gpd.read_file(
    "SHLAA.csv",
    engine='pyogrio',
    encoding='utf-8',
)

# Toy example
csv_content = """fid,geom
1,"SRID=27700;POLYGON()"
2,"SRID=27700;POLYGON((10 10, 20 20, 30 10, 10 10))"
3,"SRID=27700;MULTIPOLYGON()" """

shlaa_gdf = pd.read_csv(StringIO(csv_content))
shlaa_gdf = gpd.GeoDataFrame(shlaa_gdf)

def load_to_geom(x):
    if x == 'POLYGON()':
        return wkt.loads('POLYGON EMPTY')
    elif x == 'MULTIPOLYGON()':
        return wkt.loads('MULTIPOLYGON EMPTY')
    else:
        return wkt.loads(x)

if 'geom' in shlaa_gdf.columns:
    split_df = shlaa_gdf['geom'].str.split(';', expand=True)
    shlaa_gdf['srid'] = split_df[0]
    shlaa_gdf['wkt_geom'] = split_df[1]
    
    shlaa_gdf['geometry'] = shlaa_gdf['wkt_geom'].apply(load_to_geom)
    shlaa_gdf = shlaa_gdf.set_geometry('geometry')

The text was updated successfully, but these errors were encountered:

martinfleis · 2024-01-31T15:39:12Z

This is EWKT, or Extended Well-Known Text representation of geometries. With the exception of empty geometries, which are invalid here. I can't seem to find a reference to it in GDAL docs so I suppose that it supports only standard WKT, which is everything after ;.

We could potentially try to detect it, parse it and create geometry column, assuming all SRID's are the same, though it is not supported at the moment. Though you'd need to ensure that POLYGON() is not present as that is incorrect WKT.

My recommendation for now is to use what you have, maybe trying to use vectorized shapely.from_wkt rather than wkt.loads but that is a minor difference.

theroggy · 2024-01-31T17:55:17Z

For information, as mentioned by @martinfleis , if you would have to process larger datasets it would probably be more efficiënt to use only vectorized functions.

E.g. like this:

import pandas as pd
import geopandas as gpd
import shapely
from io import StringIO

# Toy example
csv_content = """fid,geom
1,"SRID=27700;POLYGON()"
2,"SRID=27700;POLYGON((10 10, 20 20, 30 10, 10 10))"
3,"SRID=27700;MULTIPOLYGON()"
"""

shlaa_df = pd.read_csv(StringIO(csv_content))

shlaa_df[["srid", "wkt"]] = shlaa_df["geom"].str.split(";", expand=True)
shlaa_df.loc[shlaa_df["wkt"] == "POLYGON()", "wkt"] = "POLYGON EMPTY"
shlaa_df.loc[shlaa_df["wkt"] == "MULTIPOLYGON()", "wkt"] = "MULTIPOLYGON EMPTY"
crs = shlaa_df.iloc[0]["srid"].split("=")[1]

shlaa_gdf = gpd.GeoDataFrame(geometry=shapely.from_wkt(shlaa_df["wkt"]), crs=crs)

print(shlaa_gdf)
print(shlaa_gdf.crs)

codeananda · 2024-02-01T10:16:02Z

Amazing thank you!

@theroggy could you write 'python' after the triple backticks? It's so much nicer to read.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: recommended way to load columns containing SRID info? #356

Question: recommended way to load columns containing SRID info? #356

codeananda commented Jan 31, 2024

martinfleis commented Jan 31, 2024

theroggy commented Jan 31, 2024 •

edited by martinfleis

Loading

codeananda commented Feb 1, 2024 •

edited

Loading

Question: recommended way to load columns containing SRID info? #356

Question: recommended way to load columns containing SRID info? #356

Comments

codeananda commented Jan 31, 2024

martinfleis commented Jan 31, 2024

theroggy commented Jan 31, 2024 • edited by martinfleis Loading

codeananda commented Feb 1, 2024 • edited Loading

theroggy commented Jan 31, 2024 •

edited by martinfleis

Loading

codeananda commented Feb 1, 2024 •

edited

Loading